Claude 3 vs GPT-4 vs Gemini: Which is Better in 2024?

Favour Kelvin
5 min read · Mar 29, 2024

I’ve been testing different large language models (LLMs) and comparing their responses. Each LLM seems to have its strengths, and the most suitable one depends entirely on the task at hand. Here’s my personal take on these three LLMs: GPT-4, Claude 3 Opus, and Gemini Advanced. I’ll share what I’ve used them for and highlight their strengths and weaknesses.

Note: To get the most out of these LLMs, I recommend using the paid versions. Free versions have their limitations and can be frustrating.

Round 1: Creative Writing

Gemini excels at producing the most human-like writing. It also offers good suggestions, criticisms, recommendations, etc., for writing tasks. Recently, I was writing a newsletter and shared the same prompt with Gemini, Claude 3, and GPT-4. Gemini provided the best ideas for me to refine and work with.

Claude 3 can be made to sound more natural than GPT-4 with little effort. It's also the one to use if you need longer outputs, e.g. exceeding 1,000 words.

GPT-4, on the other hand, sounds robotic and lazy, which makes it frustrating at times. It also struggles with generating longer content, capping its output at around 600–700 words.

So for Twitter posts, newsletter subject lines, email writing, cover letters, stories, and the like, Gemini shines. However, its censorship can limit its ability to fully respond to some prompts.

Round 2: Maths and Logical Reasoning

GPT-4 can solve math problems effortlessly, even when simply provided with an image of the problem statement. It's also great for logical reasoning involving difficult word problems, multi-step mathematical questions, and the like.

Claude 3 also performs well, but it falls slightly behind GPT-4 in terms of accuracy and performance on some advanced logical reasoning and mathematical tasks. In this area, GPT-4 exhibits slightly more intelligence and problem-solving skills.

Gemini can sometimes be bad at understanding instructions, so I don't expect it to excel at logical reasoning tasks.

Round 3: Coding

Gemini doesn't excel at coding tasks compared to GPT-4 and Claude 3. That doesn't mean it's entirely useless; it just struggles with some advanced coding tasks or refuses them due to censorship. Claude 3 and GPT-4 both perform well with coding. However, GPT-4 has limitations: it usually provides short outputs, leading to a back-and-forth exchange where you try to coax out the full code. This can consume a lot of tokens, and once you run out, you need to wait 4–5 hours before using it again. This is where Claude 3 shines. With a single prompt, you can generate the entire code, saving tokens and time.

Overall, both Claude 3 and GPT-4 are good options and can be used interchangeably to achieve the best results.

Round 4: Context Window

Claude 3 has a larger context window (200K tokens) than GPT-4 and Gemini, enabling it to handle longer inputs and process large amounts of information. It also does a better job of remembering all your original instructions.

GPT-4's and Gemini's context windows are around 128K tokens, far smaller than Claude 3's. This makes Gemini struggle more with remembering information: after some back-and-forth within a single chat session, it starts to lose track of the conversation entirely. GPT-4 behaves the same way after a while, but it retains context better than Gemini.
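If you're curious whether a long prompt will even fit, you can estimate its token count yourself. Below is a minimal Python sketch using OpenAI's tiktoken library. Note that cl100k_base is GPT-4's tokenizer, so the counts are only rough approximations for Claude 3 and Gemini (which use their own tokenizers), and the hard-coded window sizes are just the figures discussed above.

```python
# Minimal sketch: estimate whether a prompt fits a model's context window.
# Caveat: cl100k_base is GPT-4's tokenizer; Claude 3 and Gemini use their
# own tokenizers, so these counts only approximate theirs.
import tiktoken

CONTEXT_WINDOWS = {
    "gpt-4-turbo": 128_000,
    "gemini-advanced": 128_000,  # figure used in this post
    "claude-3-opus": 200_000,
}

enc = tiktoken.get_encoding("cl100k_base")

def fits(text: str, model: str) -> bool:
    n = len(enc.encode(text))
    print(f"{model}: {n:,} tokens / {CONTEXT_WINDOWS[model]:,} window")
    return n <= CONTEXT_WINDOWS[model]

# Example: ~150K tokens overflows a 128K window but fits Claude 3's 200K.
long_text = "word " * 150_000
for m in CONTEXT_WINDOWS:
    fits(long_text, m)
```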

Round 5: Internet Access

Both Gemini and GPT-4 can access the internet, providing real-time access to web data. However, GPT-4's browsing mode has limitations: it gives generic content instead of focusing on your specific needs. Gemini, for its part, can hallucinate and refuse to give some answers; it sometimes behaves as if it were an offline model.

Claude 3, on the other hand, lacks direct internet access. While it can’t access real-time data, it excels in other areas.

Round 6: Generating Images

Gemini told me it can't create images and then went ahead and created one for me 😄. Image generation is definitely not its strong suit right now; I've found it declines requests to create images for my blog. GPT-4 can easily generate images. Claude 3 cannot generate images, but it can interpret and analyze the images you send.

Round 7: Extracting file data (PDF, CSV, Docx, etc.)

When I feed PDFs to GPT-4 and ask a specific question, or even just ask for a summary, it spits out a short, underwhelming paragraph that neither answers my questions nor provides a good summary. With Claude, analyzing and answering questions based on a PDF works much better.

For Gemini, I upload my PDFs to Google Drive and ask it to summarize them by filename.
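When a chat UI's built-in file handling underwhelms, a simple workaround is to extract the text yourself and paste it in, chunked if needed. Here's a minimal Python sketch using the pypdf library; report.pdf is a placeholder filename, and this only works well for PDFs with selectable text rather than scanned pages.

```python
# Minimal sketch: pull plain text out of a PDF locally, then paste it
# (or one chunk at a time) into the chat instead of uploading the file.
from pypdf import PdfReader

reader = PdfReader("report.pdf")  # placeholder filename
text = "\n\n".join(page.extract_text() or "" for page in reader.pages)

print(f"{len(reader.pages)} pages, roughly {len(text.split())} words")
# For long documents, send a section at a time so you stay inside the
# model's context window (see Round 4 above).
```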

Here's a summary breakdown of Gemini Advanced, Claude 3, and GPT-4:

Gemini Advanced

  • Strengths: Fastest, integrates with Google Suite, excellent UI, readable outputs, polished product, handles poor grammar well, finds unique solutions.
  • Weaknesses: Prone to errors, overly conservative with token usage (may omit details), struggles with image analysis, lacks customer support, dismissive of improvement suggestions, misinterprets user intent in coding contexts.
  • Best suited for: Creative writing (newsletters, tweets, emails, etc.) and exploring coding solutions (may require additional verification).

Claude 3

  • Strengths: Fast, efficient token usage (provides detailed answers), unique responses, handles large PDFs and inputs well, sounds natural compared to GPT-4, generates longer outputs.
  • Weaknesses: Can be stubborn about correcting its own errors; my least favourite UI (no sharing or editing of chats); more likely to suggest errors than GPT-4.
  • Best suited for: Tasks requiring longer outputs, human-like communication, creative writing, and coding (generally comparable to GPT-4).

GPT-4

  • Strengths: Least likely to suggest errors, effective at fixing errors in its own code, good UI, active community, most features.
  • Weaknesses: Slowest of the three, limited chat history, less likely to offer unique solutions compared to others, prone to crashes, feels less polished.
  • Best suited for: Educational purposes, professional use, brainstorming ideas, and situations where limits are not a concern.

Final Notes

  • All LLMs are susceptible to generating inaccurate information. They hallucinate a lot, so do your bit and double-check their outputs.
  • Good prompts can significantly impact the quality of responses.

