Are you frustrated with the AI hype and the talk about software engineers becoming “obsolete”? Wouldn't you rather understand LLMs simply and use them just like any other tool?
I was frustrated too. And I had two choices.
Instead, I chose a third option. Which is what my book is about. Incidentally, this is the option software engineers naturally elect as the solution to any problem they face. And it is the best solution. (There is indeed some truth to the smartness of programmers.)
What I liked about this book is that it tries to explain LLMs from first principles, in a simple and structured way, especially around patterns, tokens, context, prediction, and reasoning.
For me, the biggest value is that it gives a cleaner mental model to think about AI more practically, instead of treating it as magic.
AbulAsar Sayyad
Engineers have long learned that neither trying to reason relentlessly, nor staying silent, really works out for them.
Instead, they focus on a third option. Which is to understand the technology better and act on that understanding. Then, no one really is able to argue with the results, even if they have misplaced opinions about the process.
In short, engineers build.
It is through that understanding that:
In that same spirit, LLMs are just next-token prediction tools. With just that one principle, we have a plethora of emergent behaviors like assisting with queries, coding, designing, or even solving arcane problems.
However, to truly understand LLMs, it’s not enough to merely believe that they predict the next token. It’s important to actually know it and the emergent behaviors it leads to. (Just like how calling Python “simple” doesn’t make it simple for someone.)
How AI Thinks is my attempt at communicating that understanding, so that you don’t have to scour the internet or read obscure papers just to get a gist of what’s really happening. It took me several months to track down enough details to be happy with my understanding, but you can do the same in just a few hours with this book.
The purpose of this book isn’t to dive into complex math, code, or intricate neural network architectures. The goal is to stay at a conceptual level while abstracting away all the unnecessary details. That way, we get an intuition for how LLMs work, how they might perform under certain circumstances, and how to design features that effectively use them.
Sure, just calling an LLM a “next-token prediction” tool appears to solve the problem. Anyone can do that. But that’s mistaking simplicity for a starting point; it isn’t.
True simplicity is a culmination. Once you cut through the contradictions, fluff, and hype, and really understand why hallucination and creativity are not so different, everything starts to look simple in hindsight. Simplicity is the destination.
Here are a few things you'll understand better as a consequence of reading this book:
Not every problem should be solved with AI (contrary to what the world largely believes). You’ll gain the intuition to tell the difference before you build the wrong thing.
Instead of trial and error, you’ll understand why certain prompts work, why others fail, and why outputs vary. LLMs don't work just because the user asked nicely.
Not just that they happen, but why they happen, and which kinds of problems make them more likely. The source of hallucinations is also the source of creativity, so it's a tough balance.
You’ll see how LLMs can appear to reason, what that means mechanically, and where that illusion breaks down. You'll never again be fooled by the loose usage of the word "reasoning."
You’ll be able to think in terms of inputs, constraints, and tradeoffs. When you see Kafka as a log, you're able to design systems better. You'll get a similar mental model for LLMs.
Taking architectural decisions is tough enough. But coming to consensus about LLMs, about which everyone has a different idea entirely, is a real challenge. You'll be able to back your solutions up with explanations that everyone understands.
Here’s the table of contents. Note that we stay away from math as much as possible. Even if it’s included, it’s at a very shallow level. The better thing to do is to get the intuition of how it works, rather than trying to do a rigorous mathematical analysis.
We keep using words like “intelligence,” “thinking,” or “reasoning” very loosely. Most marketing materials have incorporated these inaccurate buzzwords so that everyone gets confused about what’s really being implied.
The fundamentals of next-token prediction are distilled in this chapter. We then look at how that principle translates to tasks like drafting emails and writing code, and at the relationship between text and meaning.
LLMs operate on tokens, although we can loosely say that they operate on “words” for simplicity. However, as software engineers, we should always understand the details of any approach, including the kinds of edge cases it tackles. In this chapter, we understand what the LLM actually sees behind each word, and how tokens allow us to have a more concise vocabulary while also allowing us to express novel words effectively, without having to re-train the LLMs all the time.
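To make the idea of tokens concrete, here is a minimal sketch of how a word can be split into smaller known pieces. It is not how any particular LLM tokenizes text: real tokenizers learn their vocabulary from data, whereas the `VOCAB` set and the greedy longest-match `tokenize` function below are hypothetical and hand-written purely for illustration.

```python
def tokenize(word, vocab):
    """Greedily split a word into the longest known subword pieces."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # Nothing matched: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


# Hypothetical, hand-picked vocabulary (an assumption for this sketch).
VOCAB = {"un", "token", "ize", "d", "re", "train", "ing"}

print(tokenize("untokenized", VOCAB))  # ['un', 'token', 'ize', 'd']
print(tokenize("retraining", VOCAB))   # ['re', 'train', 'ing']
```

Neither "untokenized" nor "retraining" is in the vocabulary as a whole word, yet both can still be expressed from a handful of reusable pieces. That is the sense in which a small vocabulary can cover words it has never seen.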
Instead of diving into the nitty-gritty of neural network architecture, we take a high-level view of how LLM training is prepared (in contrast to traditional machine learning models). We also look into the basics of instruction-tuning.
You will also learn how to look at text generation with the LLM’s eyes, and how the LLM manages to predict and accurately use words in the right context, even if it can’t grasp their meaning.
Finally, we discuss the two major things the LLM learns during training.
We look at how the decoding process works, and why adjusting the temperature changes the variability of the output. There is some light math here to drive the point across, because it will come in handy for the next chapter too.
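As a rough illustration of why temperature changes variability, the sketch below divides a set of raw scores by the temperature before turning them into probabilities with a softmax. The candidate words and their scores are made-up assumptions, and `softmax_with_temperature` is a hypothetical helper, not code from the book or from any specific library.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after scaling by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability; the result is unchanged.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens (assumption).
candidates = ["cat", "dog", "pizza"]
logits = [2.0, 1.0, 0.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    summary = ", ".join(f"{c}={p:.2f}" for c, p in zip(candidates, probs))
    print(f"temperature={t}: {summary}")
```

At a low temperature, most of the probability piles onto the top-scoring token, so sampling gives nearly the same output every time. At a high temperature, the probabilities flatten out and less likely tokens are picked more often.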
Most discussions about hallucinations never look at the underlying mechanisms that cause them. If we completely eliminate hallucinations, we will be left with a model that's not creative at all. You will learn to understand the tradeoff, and to some extent, control it.
We carry over the same mental models and abstractions we've been building so far, and finally apply them to reasoning. You'll understand what "reasoning" really is, why LLMs need to do it, and why it looks like self-talk.
The book might not appeal to everyone. However, if you like first-principles thinking, and prefer simplicity in your design more than arbitrary complexity, you'll probably love it.
What's described below are possibilities for how you might apply the knowledge in this book. These do not represent topics covered in the book, but are just ideas of its application.
Senior leaders who are responsible for the technical direction, and don't want to make fundamental mistakes that can stall their AI progress.
Leaders managing teams building with or around LLMs who need accurate intuition to guide decisions and expectations.
Senior engineers responsible for designing systems that incorporate LLMs.
Builders using LLMs as core technology who need clarity before scaling ideas, teams, or narratives.
Managers of ML or data teams who want a deeper explanation of model behavior beyond surface-level explanations.
Product managers and directors working with AI-powered features.
This book won't be the right fit if you're looking for quick fixes or surface-level answers. Because this book is about thinking clearly, not shipping faster.
People looking for step-by-step tutorials or prompt recipes won't find them here. This book teaches the why, not the how.
Readers who want hype, futurism, or AGI speculation won't find it here. This book is grounded in reality, not marketing.
Teams seeking implementation patterns without understanding fundamentals won't find shortcuts. This book requires thinking.
Honestly? I don’t know. But the question you should ask yourself is, how many engineers truly become good at what they do, without ever touching the fundamentals of their craft?
And wouldn’t you want to be someone who is proud of whatever you create, instead of cobbling a few packages, snippets, and questionable designs into a mess, only to produce digital junk food?
I strongly believe that the purpose of new technology is to make new things possible. With AI, a good chunk of things have now become possible for non-technical folks. And those possibilities overlap somewhat with what software engineers used to do. However, new frontiers have also opened up for software engineers: things they couldn’t do before that have now become feasible.
Rather than focusing on things we don’t have to do anymore, we should focus on the new things that we can. And what made us engineers in the first place was our insatiable curiosity. This book is a light nudge in that direction.
Writing this book was hard. One of the primary reasons is that LLMs are prone to regression to the mean. Which means, they opposed almost everything that I wrote in this book. Only when I submitted additional evidence did the LLMs actually “agree” (if they can actually do that) to whatever I had to say.
Incidentally, it’s also the reason why I couldn’t possibly have used an LLM to “generate” this book. It took good old hard writing. And then editing. And then revising. And then making sure that all the principles covered flow effortlessly across chapters, so that you feel like you're reading a cohesive book rather than a collection of blog posts. Finally, I illustrated every major concept myself to ensure that the illustrations communicated the ideas well, and also designed the book in InDesign for a pleasant reading experience. (I’d be embarrassed to talk about how much time I spent just deciding on the fonts.)
It took me just a week to conceive the book's idea, but 5 months to condense it into a fluff-free form.
Perfect for personal learning
For teams and organizations
I'm currently the Director of AI for a leading MarTech SaaS, and primarily focus on setting the AI vision, researching and developing interesting features through a combination of LLMs and computer science, and packaging them into great product experiences. The best part of my job is my team, who surprise me every day by pushing new possibilities.
My background: I've been programming since the age of 11 (almost 25 years). I hold an undergraduate degree in Electronics & Telecommunications Engineering and a Master's in Computer Science from Georgia Tech, specializing in Machine Learning.
I started my career by freelancing. Then, I decided to join EdCast, an ed-tech startup specializing in micro-learning, as a Technical Architect. Within a few years, I became the Chief Architect. At EdCast, I also worked on classical NLP as part of our recommendation systems.
After that, I joined WebEngage as a Director of Engineering and helped design and build really high-scale stuff. Some of the major accomplishments here were getting the organization its first ISO 27001 compliance, introducing some of its first AI features, and improving the architecture to rein in skyrocketing cloud costs.
After that, I had a brief stint at a health-tech startup, and post that I came back to WebEngage as the Director of AI. This time, the game was purely focused on AI. I've built sophisticated features and products like Significant Factors (automated hypothesis generation and data science), our core conversational agents (for journeys, segments, content, and image assets), a few data-mining algorithms, and a lot more. My focus is turning my knowledge of computer science into product differentiators.
In writing this book, I combine my AI knowledge with my first-principles thinking approach, putting everything in the context of the business of technology so that what I cover is both clear and useful. I've taken all my experience with technology, as well as the hard lessons learned from putting sophisticated AI systems in production, and tried to distill the widely applicable basics into this book.