Is Gemini better than GPT4? Self-created benchmark - Fact Retrieval/Checking, Coding, Tool Use

Channel: John Tan Chong Min
Subscribers: 5,450
Published: December 8, 2023, 11:16:58 AM
Video link: https://www.youtube.com/watch?v=kHexvmsZwWo
Duration: 11:19
Views: 386

In December 2023, Google unveiled its latest Large Language Model (LLM), Gemini. It is multi-modal and reportedly outperforms GPT4.

Does it really? Let's find out.

In my limited experiments on text-based tasks, GPT4 still outperforms Gemini Pro in Constrained Generation, Fact Checking, Advanced Coding and Tool Use.

That said, Gemini Pro still performs better than ChatGPT and Llama 2, so it is a good first step.

With more user data collected from Gemini via Bard, it will likely get even better.

Also, Gemini Ultra (the most capable version) may perform better, and I look forward to evaluating it when it is released.

~~~~~~~~~~~~

Gemini Technical Report: https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf

Slides: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/Gemini%20vs%20GPT4.pdf

Fireball Dodger Game Prompt + Code: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Fireball_Dodger/Gemini_Comparison.ipynb

Previous Video on ChatGPT vs Llama 2 Comparison: https://www.youtube.com/watch?v=SBBFxwnABLM

~~~~~~~~~~~~~

0:00 Introduction
0:40 Classification and JSON Prompting
1:11 Creative Generation
3:09 Fact Retrieval and Checking
5:06 Math
5:50 Coding
7:46 Tool Use
9:57 Verdict

~~~~~~~~~~~~~~

AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain them in a simple and relatable way. I am also an avid game creator.

Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin
