Large Language Models
The following tutorial is a summary of Andrej Karpathy’s video A Deep Dive into Large Language Models like ChatGPT. I highly recommend watching the video, as it contains a lot of useful information and visualizations that are hard to capture in text.
Why should you read this?
You want to understand how LLMs actually work, not just at the surface level.
You want to understand confusing fine-tuning terms like chat_template and ChatML (especially if you’re using Axolotl).
You want to get better at prompt engineering by understanding why some prompts work better than others.
You’re trying to reduce hallucinations and want to know how to keep LLMs from making things up.
You want to understand why DeepSeek-R1 is such a big deal right now.
Note: If you are looking for the Excalidraw diagram that Andrej made for the video, you can download it here. He shared it through Google Drive, where the link expires after a certain time, so I have also hosted it on my CDN.
Preview of Things to Come
Future LLMs will expand in several key areas:
Multimodal Capabilities → Not just text, but also understanding and generating images, audio, and video.
Agent-Based Models → Moving beyond single tasks to long-term memory, reasoning, and correction of mistakes.
Pervasive & Invisible AI → AI will be integrated into workflows in a way that becomes second nature.
Computer-Using AI → AI models that interact with software and take actions beyond just text generation.
Test-Time Training → AI adapting itself in real time to improve its accuracy.
Keeping Track of LLMs
If you’re interested in following developments in this space, here are some great resources:
Where to Find LLMs
Want to try different LLMs? Here’s where to find them:
Proprietary Models → OpenAI (GPT-4), Google (Gemini), Anthropic (Claude), etc.
Open-Weight Models → DeepSeek, Meta (Llama), etc. Try them via Together.ai.
Base Models → Explore them via Hyperbolic.