Large Language Models

The following tutorial is a summary of Andrej Karpathy’s video A Deep Dive into Large Language Models like ChatGPT. I highly recommend watching the video, as it contains a lot of useful information and visualizations that are hard to capture in text.

Why should you read this?

  • You want to understand how LLMs actually work, not just at the surface level.

  • You want to understand confusing fine-tuning terms like chat_template and ChatML (especially if you’re using Axolotl).

  • You want to get better at prompt engineering by understanding why some prompts work better than others.

  • You’re trying to reduce hallucinations and want to know how to keep LLMs from making things up.

  • You want to understand why DeepSeek-R1 is such a big deal right now.

Note: If you are looking for the Excalidraw diagram that Andrej made for the video, you can download it here. He shared it through Google Drive, and that link expires after a certain time, so I also host it on my CDN.

Preview of Things to Come

Future LLMs will expand in several key areas:

  • Multimodal Capabilities → Not just text, but also understanding and generating images, audio, and video.

  • Agent-Based Models → Moving beyond single tasks to long-term memory, reasoning, and correction of mistakes.

  • Pervasive & Invisible AI → AI will be integrated into workflows in a way that becomes second nature.

  • Computer-Using AI → AI models that interact with software and take actions beyond just text generation.

  • Test-Time Training → AI adapting itself in real time to improve accuracy.

Keeping Track of LLMs

If you’re interested in following developments in this space, here are some great resources:

  • LM Arena → A leaderboard that ranks language models based on human comparisons.

  • AI News → A newsletter covering AI research.

Where to Find LLMs

Want to try different LLMs? Here’s where to find them:

  • Proprietary Models → OpenAI (GPT-4), Google (Gemini), Anthropic (Claude), etc.

  • Open-Weight Models → DeepSeek, Meta (Llama), etc. Try them via Together.ai.

  • Run Locally → Use Ollama or LM Studio.

  • Base Models → Try them on Hyperbolic.