4/15/2026 at 10:49:12 AM
Donald Knute -> Donald Ervin Knuth is the author of the book "The Art of Computer Programming" (in progress for decades; volume 4C is currently being written). It is quite advanced, and it will likely not cover compilers anymore (Addison-Wesley had commissioned a compiler book from Knuth when he was a doctoral candidate; now he is retired and has stated that his goal for the series has changed).
I disagree with the author's point: Chapter 2 of the "Dragon Book" ("Compilers: Principles, Techniques, and Tools" by Aho et al.) is a self-sufficient, end-to-end introduction to compilers, and it can be read on its own, ignoring the rest of that excellent book.
Another fantastic intro to compiler writing is the short book "Compilers" by Niklaus Wirth, which explains, and contains, the surprisingly short source code of a complete compiler (the whole book is highly understandable, with pristine clarity, really), all in under 100 pages (99 total).
(I learned enough from these two sources to write a compiler in high school.)
by jll29
4/15/2026 at 1:21:43 PM
The dragon book almost convinced me never to try to write a compiler. I don't know why people recommend it. I guess you're a lot smarter than I am.
There are some excellent books out there. In its own way, the dragon book is excellent, but it is a terrible starting place.
Here are a bunch of references from the same vintage as OP. I recommend starting with a book that actually walks through the process of building a compiler and doesn't spend its time exclusively with theory.
by projektfu
4/15/2026 at 4:08:29 PM
You're not the only one. In college I took a compilers course that used the dragon book; to me, it sucked the joy out of the magical concept of making a compiler.
Some years later I (re-)discovered Forth, thought "why not?", and built my own Forth in 32-bit Intel assembly. _That_ brought back the wonder and "magical" feeling of compilers, all in less than 4KB.
I guess I wasn't the right audience for the dragon book.
by rangerelf
4/15/2026 at 6:05:15 PM
Great thread. If you have one hour to get started, I recommend opening Engineering a Compiler and studying Static Single-Assignment (SSA) form in chapter 9.3.
The book is famous for its SSA treatment, and chapters 1-8 are not required to understand it, so you can walk away with a clear win. Refer to 9.2 if you're struggling with dominance and liveness.
http://www.r-5.org/files/books/computers/compilers/writing/K...
by _false
4/15/2026 at 9:58:24 PM
I bought this book when I was working on a toy language, and I think I was too stupid to understand most of it. The first few chapters were great, but it quickly surpassed my capacity to understand. Seeing it mentioned makes me want to revisit it.
by unclad5968
4/15/2026 at 7:26:21 PM
It was a product of its time, I guess. There are much better ones from a similar vintage:
The Tiger book (with C, Standard ML, and Java variants)
https://www.cs.princeton.edu/~appel/modern/
Compiler Design in C (freely available nowadays; beware that its dialect sits between K&R C and C89)
lcc, A Retargetable Compiler for ANSI C
Or if one wants to go with more clever stuff:
Compiling with Continuations
Lisp in Small Pieces
by pjmlp
4/15/2026 at 7:55:54 PM
Another vote for Lisp in Small Pieces. Great high-level compiler book that teaches you how to build a Lisp and doesn't get bogged down in lexing and parsing.
by drob518
4/16/2026 at 4:18:42 PM
Instead of Lisp in Small Pieces, I'd recommend SICP. No continuation passing, but much better written.
by rurban
4/17/2026 at 5:55:24 AM
And it has no information on how to actually build a compiler end to end, only a self-hosted interpreter. The authors don't have the same audience in mind.
I would recommend both, one is about actual Lisp compilers, the other alternative computation models.
by pjmlp
4/15/2026 at 4:19:15 PM
Imho the problem is the fixation on parser generators and BNF. It's just a lot easier to write a recursive descent parser than to figure out the correct BNF for anything other than a toy language with horrible syntax.
by randomNumber7
4/16/2026 at 8:24:09 AM
Imo BNF (or some other formal notation) is quite useful for defining your syntax. My biggest gripe with BNF in particular is the way it handles operator precedence (through nested recursive expression rules), which can get messy quite fast.
Pratt parsers don't use this recursion at all; they only have a concept of 'binding strength'. In layman's terms: if I'm parsing the left side of a binary expression, I've managed to parse a subexpression, and the next token I'm looking at is another binary operator, do I continue parsing that subexpression, making it the RHS of the original operator, or do I finish my original expression, which will then be the LHS of the new one?
It represents this through the concept of stickiness, with one simple rule: the subexpression always sticks to the operator that's more sticky.
This is both quite easy to imagine, and easy to encode, as stickiness is just a number.
I think a simpler, more straightforward notation that incorporates precedence would be better.
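To make the stickiness idea concrete, here is a minimal Pratt-style expression parser sketched in Python (the names BINDING and parse_expr are made up for this illustration, and tokens are assumed to be pre-lexed number literals and one-character operator strings):

```python
BINDING = {"+": 10, "-": 10, "*": 20, "/": 20}  # higher number = stickier

def parse_expr(tokens, pos=0, min_bp=0):
    lhs = tokens[pos]  # assume the current token is a number literal
    pos += 1
    while pos < len(tokens):
        op = tokens[pos]
        bp = BINDING.get(op)
        if bp is None or bp < min_bp:
            break  # next operator is less sticky: finish this subexpression
        # The operator is sticky enough: it claims lhs, and we parse its RHS.
        # Passing bp + 1 makes operators of equal strength left-associative.
        rhs, pos = parse_expr(tokens, pos + 1, bp + 1)
        lhs = (op, lhs, rhs)
    return lhs, pos

ast, _ = parse_expr([1, "+", 2, "*", 3])
# ast == ("+", 1, ("*", 2, 3)): "*" is stickier than "+", so 2 * 3 binds first
```

Note that precedence here is just a number per operator; no nesting of grammar rules is involved.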
by torginus
4/16/2026 at 10:01:31 AM
> I think a simpler, more straightforward notation that incorporates precedence would be better.
For example, YACC provides a way to specify how a shift/reduce conflict
> https://www.gnu.org/software/bison/manual/html_node/Shift_00...
and a reduce/reduce conflict
> https://www.gnu.org/software/bison/manual/html_node/Reduce_0...
should be resolved; see also
> https://www.gnu.org/software/bison/manual/html_node/Mysterio...
This is actually a way to specify operator precedence that is much simpler and more straightforward than nested recursive expression rules.
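As a concrete illustration, a classic expression grammar can be written with flat rules plus precedence declarations (a sketch in Bison syntax; the token and rule names are illustrative):

```yacc
%token NUM
/* Lowest precedence first; %left also makes the operators left-associative. */
%left '+' '-'
%left '*' '/'
%%
expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '(' expr ')'
     | NUM
     ;
```

Without the %left lines this grammar is ambiguous; with them, Bison resolves every shift/reduce conflict according to the declared precedence, so no expr/term/factor nesting is needed.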
by aleph_minus_one
4/15/2026 at 11:48:09 PM
I would argue the opposite: being describable in BNF is exactly the hallmark of sensible syntax in a language, and of a language easily amenable to recursive descent parsing. Wirth routinely published (E)BNF for the languages he designed.
by microtherion
4/15/2026 at 5:18:10 PM
The problem with recursive descent parsers is that they don't restrict you to simple grammars.
But then, pushing regular language theory into the curriculum just to rush over it so you can use it for parsing is way worse.
by marcosdumay
4/15/2026 at 10:42:36 PM
> But then, pushing regular languages theory into the curriculum, just to rush over it so you can use them for parsing is way worse.
At least in the typical curriculum of German universities, students already know the whole theory of regular languages quite well from their Theoretical Computer Science lectures, so in a compilers lecture the lecturer can indeed rush over this topic, because it is just repetition.
by aleph_minus_one
4/16/2026 at 4:28:05 PM
(Same for US universities, at least 30-ish years ago.)
by jjtheblunt
4/15/2026 at 2:15:47 PM
When I was professionally writing a compiler (see https://ciex-software.com/intro-to-compilers.html), the Dragon book was the second book that I read, and I found it very helpful. That was the first Dragon book; I got the second one later. It would have been fine to start with the Dragon book; the Compiler Generator book was a harder study.
by wglb
4/15/2026 at 1:31:36 PM
I started with the dragon book, and I found it to be a good introductory text.
A lot of people say the dragon book is difficult, so I suppose there must be something to that. But I don't see what it is; I thought it was quite accessible.
I'm curious, what parts/aspects of the dragon book make it difficult to start with?
by tovej
4/15/2026 at 2:02:42 PM
It's been a few years since I worked with the dragon book, but I think the most common complaint was that it starts with something like 350 pages of parser theory (generating bottom-up and top-down parsers from context-free grammars, optimizing lexers for systems that don't have enough RAM to store an entire source file, and so on) before ever getting to what most people who want to write a compiler care about: implementing type inference, optimizing intermediate representations, generating assembly code. Of course parsing is important, and very interesting to some. But there's a reason most modern resources skip over all of that and just have the reader write a recursive descent parser.
by hmry
4/15/2026 at 4:43:06 PM
I guess "back in the day" you had to be able to write an efficient parser yourself, as no parser generators existed. If you couldn't implement what you wanted due to memory shortage at the parser level, then obviously parsing is going to be a huge topic. Even now I believe it is good to know about this, if only to avoid pitfalls in your own grammar.
I repeatedly skip parts that are not important to me when reading books like this. I grabbed a book about embedded design and skipped about half of it (bus protocols), as I knew I wouldn't need it. There is no need to read the dragon book from front to back.
> But there's a reason most modern resources skip over all of that and just make the reader write a recursive descent parser.
Unless the reason is explicitly stated, there is no way to verify it's any good.
There's a reason people use AI to do their homework; it just doesn't mean it's a good one.
I can think of plenty of arguments for why you wouldn't look into the pros and cons of different parsing strategies in an introduction to compilers, but "everyone is (or isn't) doing it" is not among them.
In the end, it has to be written down somewhere, and if no other book covers it, for whatever reason, then the dragon book it shall be. You can always recommend skipping that part if someone asks which book to use.
by ablob
4/16/2026 at 8:30:57 AM
The thing about parsing (and algorithms in general) is that it can be hair-raisingly complex for arbitrary grammars. But in practice, people have discovered that designing simple, unambiguous grammars and avoiding problems like context-dependent parsing makes the parsing problem trivial.
Accepting such constraints is quite practical and leads to little or no loss of power.
In fact, most modern languages are designed with little to no necessary backtracking and simple parsing, Go and Rust being noteworthy examples.
by torginus
4/16/2026 at 10:43:58 AM
> In fact, most modern languages are designed with little to no necessary backtracking and simple parsing, Go and Rust being noteworthy examples.
But to understand how to design grammars for languages that are easy to parse, you have, in my opinion, to dive quite deeply into parsing theory to understand which subtle aspects make parsing complicated.
by aleph_minus_one
4/16/2026 at 8:41:22 PM
Some personal context to understand where I'm coming from: I'm working on my own language, a curly-brace C-style language where I didn't try to stray too far from established norms, and the syntax is not that fancy (at least not in that regard). I also want my language to look familiar to most programmers, so I'm deliberately sticking close to established norms.
I'm thankfully past the parsing stage, and so far I haven't encountered many issues with ambiguity; when I did, I was able to fix them.
Also, in certain cases I'm quite liberal about allowing omission of parentheses and other such control tokens, which I know leads to some cases where either the code is ambiguous (as in, there's no strictly defined way the compiler is supposed to interpret it) or valid code fails to parse.
So far I have not tackled this issue, as it can always be worked around by the programmer manually adding those parentheses back. I know this is not up to professional standards, but I like the cleanliness of the syntax and the simplicity of the compiler, and the issue is always fixable for me later. So this is a firm TODO for me.
Additionally, I have some features planned that would crowd the syntax space in a way that would probably need some academic chops to fix, but I'm holding off on those, as they are not central to the main gimmick(TM), and I want to release this thing in a reasonable timeframe.
I don't really have much of a formal education in this, other than reading a few tutorials and looking through a few implementations.
Btw, besides parsing itself, there are other concerns in modern languages, such as IDE support: files should be parseable independently, with error recovery, readable errors, and autocomplete hints. I'm not sure these are addressed in depth in the dragon book, and they are features I do want.
My two cents is that for a simple modern language, you can get quite far with zero semantic model, while with stuff like C++ (with macros), my brain would boil at the thought of having to write a decent IDE backend.
by torginus
4/15/2026 at 5:46:47 PM
I actually think the parsing part is more important for laymen. There may be a total of 10K programmers interested in learning compiler theory, but maybe only 100 of them will ever write the backend; the rest are stuck with either toy languages, or use parsing to help with their job. Parsing is definitely more useful for most of us who are not smart enough :D
by markus_zhang
4/15/2026 at 7:30:28 PM
Yeah, I agree, that seems very true. Although the average person probably also benefits more from learning about recursive descent and Pratt parsing than about LL(k) parser generators, automata, and finding FIRST and FOLLOW sets :)
by hmry
4/15/2026 at 2:07:03 PM
the dragon book is how to write a production grade thing, i guess. it elaborates on all the interesting concepts, which is great, but it dives quickly into things that can clutter a project if it's just for fun.
by saidnooneever
4/15/2026 at 5:36:24 PM
It’s academic and comprehensive; that’s the issue. It’s not about writing a production-grade compiler, though, in my humble opinion. There are more things to learn for that, unfortunately… it’s just a pretty big topic with lots of stuff to learn.
by emigre
4/15/2026 at 9:22:51 PM
the dragon book is all i have on the topic. it was a big investment for me.
it taught me to think very differently, but i am sure i am still not ready to write a compiler :D
by saidnooneever
4/15/2026 at 11:52:46 PM
The dragon book almost convinced me never to try to write a compiler.
That was the point. That's why it's not a cute beaver on the cover :)
by keyle
4/15/2026 at 1:30:59 PM
The "Dragon Book" is big on parsing, but I wouldn't recommend it if you want to write many optimisation passes or a back-end.
The first edition was my first CS textbook, back in the '90s, and as a young programmer I learned a lot from it. A couple of years ago, however, I started on a modern compiler back-end and found that I needed to update quite a lot of my knowledge.
The 2nd edition covers data-flow analysis, which is very important. However, modern compilers (GCC, LLVM, Cranelift, ...) are built around an intermediate representation in Static Single Assignment (SSA) form. The 2nd edition has only a single page about SSA, and you'd need to learn a lot of additional theory about its properties to actually use it properly.
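For readers who haven't met SSA: every variable is assigned exactly once, and control-flow join points choose among the incoming definitions with φ-functions. A minimal sketch in a hypothetical pseudo-IR (not any particular compiler's syntax):

```
; before SSA                  ; after SSA conversion
x = 1                         x1 = 1
if c goto join                if c goto join
x = 2                         x2 = 2
join:                         join:
use(x)                        x3 = φ(x1, x2)  ; x1 if the branch was taken, else x2
                              use(x3)
```

Because each name now has a single definition site, analyses like constant propagation and dead-code elimination become much simpler, which is part of why GCC and LLVM build on SSA.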
by Findecanor
4/15/2026 at 2:07:55 PM
Parsing is the front end of a compiler. You can't get semantics without first recognizing syntax. I have a hard time thinking about programming languages without seeing them as a parsing exercise first, every time.
4/15/2026 at 2:43:54 PM
The recommended advice is to start with semantics first. Syntax will change; there is not much point fixing it down too early.
Most of the work is actually the backend, and people sort of delude themselves into "creating a language" just because they have an AST.
by gf000
4/15/2026 at 4:30:13 PM
Syntax and semantics are never orthogonal, and you always need syntax, so it must be considered from the start. Any reasonable syntax quickly becomes much more pleasant for generating an AST or IR than manually building those objects in the host language of the compiler, which is what the semantics-first crowd seems to propose.
It is also only the case that most of the work is the backend for some compilers, though of course all of this depends on how "backend" is defined. Is the backend just codegen, or is it all of the analysis between parsing and codegen? If you target a high-level language, which is very appropriate for one's first few compilers, the backend can be quite simple. At the simplest, no AST is even necessary, and the compiler can just mechanically translate one syntax into another in a single pass.
by thefaux
4/15/2026 at 4:47:44 PM
I think his point is that "form follows function": if you know what kind of semantics you're going to have, you can use that to construct a syntax that lends itself to using them properly.
by ablob
4/15/2026 at 11:56:20 PM
> The recommended advice is to start with semantics first. Syntax will change, there is not much point fixing it down too early.
It's actually the reverse, in my opinion: semantics can change much more easily than syntax. You can see this in the fact that small changes in syntax can cause massive changes in a recursive-descent parser, while the semantics can change from pass-by-reference to pass-by-value and barely make it budge.
There is a reason practically every modern language has adopted syntax sigils like (choosing Zig):
pub fn is_list(arg: arg_t, len: ui_t) bool {
This allows identifying the various parts and types without referencing or compiling the universe. That's super important, and something that must be baked into the syntax at the start, or there is nothing you can do about it later.
by bsder
4/15/2026 at 3:28:04 PM
Getting an overview of parsing theory is mainly useful for avoiding ambiguous or otherwise hard-to-parse grammars. Usually one can't go too wrong with a hand-written recursive descent parser, and most general-purpose languages are so complicated that parser generators can't really handle them anyway. The really interesting parts of compiling happen in the backend.
Another alternative is basing the language on S-expressions, for which a parser is extremely simple to write.
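To illustrate just how simple: a complete S-expression reader fits in a few lines of Python (a sketch with hypothetical names tokenize and read; it handles only atoms and parentheses, with no error reporting):

```python
import re

def tokenize(src):
    # Parens are their own tokens; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", src)

def read(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(read(tokens))
        tokens.pop(0)  # consume the closing ")"
        return lst
    return tok  # an atom

ast = read(tokenize("(define (sq x) (* x x))"))
# ast == ["define", ["sq", "x"], ["*", "x", "x"]]
```

The reader produces nested lists directly, so the "parse tree" is just the program's own structure; there is no separate grammar to maintain.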
by samus
4/15/2026 at 11:47:21 AM
There is still hope for a compiler book. From Knuth's website:
> And after Volumes 1--5 are done, God willing, I plan to publish Volume 6 (the theory of context-free languages) and Volume 7 (Compiler techniques), but only if the things I want to say about those topics are still relevant and still haven't been said.
by antiquark
4/15/2026 at 1:23:01 PM
I don't think there is much hope if you look at actuarial tables and Knuth's age; it's not clear to me whether he'll even be able to finish volume 4. The outline he has seems, to my eyes, to have enough material to fill volumes 4C-4G, and he isn't exactly cranking out the volumes.
Admittedly, volumes 5-7 wouldn't be as massive as volume 4 (it turns out that almost all interesting algorithms end up being categorized as belonging in volume 4), so you probably wouldn't have a half-dozen subvolumes per topic, but it's still too many books down the line, especially if he plans to revise volumes 1-3 before working on anything else.
by jcranmer
4/16/2026 at 1:18:17 AM
Have no fear, we'll just train an LLM on TAOCP and have it automatically generate the remaining volumes‽
by kibwen
4/15/2026 at 1:14:07 PM
I hope that God is indeed willing, but the man is 88 years old and he’s not done with the third tome of volume four. It would require a minor miracle for him to finish volume 7 within this lifetime.
by gdwatson
4/15/2026 at 1:36:05 PM
I really hope he ends up completing the whole series. I started volume one recently and it is excellent.
by mghackerlady
4/15/2026 at 11:30:14 AM
> "Compilers" by Niklaus Wirth
This one? https://people.inf.ethz.ch/wirth/CompilerConstruction/Compil...
by Hendrikto
4/15/2026 at 2:18:52 PM
I found the same file, but that's only 44 pages long? This (https://github.com/tpn/pdfs/blob/master/Compiler%20Construct...) seems to be a previous version (2005), and it's 131 pages long.
by znpy
4/15/2026 at 3:41:42 PM
You'll want part 2 as well, for a total of 107 pages.
by guenthert
4/15/2026 at 3:37:08 PM
I'd never seen Knuth's middle name until your comment. I think it could safely be left out of an article.
by LoganDark