alt.hn

1/10/2025 at 7:23:50 PM

Flattening ASTs and other compiler data structures (2023)

https://www.cs.cornell.edu/~asampson/blog/flattening.html

by aw1621107

1/11/2025 at 6:08:52 AM

I thought a reddit comment on this article had an interesting point:

https://www.reddit.com/r/rust/comments/1d3b356/my_new_favori...

[–]Timzhy0 3 points 7 months ago

Btw I think one can go a step further than the author: there is no need to keep two explicit ExprRefs baked into a binary node (lhs, rhs). You can exploit locality. Basically the AST, seen the LISP way, is just an arbitrarily nestable list, where elements are atoms or other lists. Hence all you need to know is where each list ends (and if it's an atom you can assume it spans one node), and one bit to mark whether it is the last entry in its list is quite ergonomic as well (because then you can distinguish whether moving to the next slot in the AST means there is a sibling). Basically it's easier to keep in sync while constructing and takes up less memory per node. I pay 40 bits per node, stored interleaved for best cache locality (some unaligned accesses, but I think it's still worthwhile): 8 bits for the tag, 32 for the data; if the data is bigger, the 32 bits are an index into some auxiliary segment (basically a ptr).
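A minimal Rust sketch of the 40-bits-per-node scheme described above, assuming a flat byte buffer of interleaved 5-byte records (all names are hypothetical):

```rust
// One bit of the 8-bit tag is reserved to mark "last sibling in its list".
const LAST_SIBLING: u8 = 0b1000_0000;

struct FlatAst {
    bytes: Vec<u8>, // interleaved 5-byte records: [tag][data: 4 bytes, unaligned]
}

impl FlatAst {
    fn push(&mut self, tag: u8, last: bool, data: u32) {
        self.bytes.push(tag | if last { LAST_SIBLING } else { 0 });
        self.bytes.extend_from_slice(&data.to_le_bytes());
    }

    // Decode record i back into (tag, is-last-sibling, data).
    fn node(&self, i: usize) -> (u8, bool, u32) {
        let off = i * 5;
        let tag = self.bytes[off];
        let data = u32::from_le_bytes(self.bytes[off + 1..off + 5].try_into().unwrap());
        (tag & !LAST_SIBLING, tag & LAST_SIBLING != 0, data)
    }
}

fn main() {
    let mut ast = FlatAst { bytes: Vec::new() };
    ast.push(1, false, 42); // a node with a following sibling
    ast.push(2, true, 7);   // last entry in its list
    assert_eq!(ast.node(0), (1, false, 42));
    assert_eq!(ast.node(1), (2, true, 7));
}
```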

by jgrowl

1/11/2025 at 6:33:45 PM

An arbitrarily nestable list is a tree, no?

by catgary

1/10/2025 at 9:49:04 PM

"Instead of allocating Expr objects willy-nilly on the heap, we’ll pack them into a single, contiguous array." Zig compiler pipeline (AST, Zir, Air, Sema) does exactly this on all layers. Not only contiguous, but instead of array-of-structs it is struct-of-arrays, so walking the tree is even more cache friendly. For AST see: https://github.com/ziglang/zig/blob/master/lib/std/zig/Ast.z...

by dmagyari

1/11/2025 at 7:32:05 AM

I work on a C dialect where everything is flattened. JSON and other trees in particular. Binary heaps are flat, merge sort and iterator heaps are absolutely great, can build LSM databases with that. Stacks, circular buffers, hash maps, etc, all flat. Templated output (PHP like) is done by a flat data structure.

https://github.com/gritzko/librdx/blob/master/abc/B.md

Apart from locality and lifetimes, these flat data structures improve composability. When every data structure is a flat buffer, you can mmap them or zip them or send them by the network, all by the same routine. They are uniform like bricks, in a sense.

by gritzko

1/11/2025 at 3:52:13 PM

I worked in a language where all data structures were "flattened", could be trivially serialized to disk, and read in again. These were called print and read. The language was called lisp. All flat, just parens.

Some of my compilers export the AST as lisp trees. Much smaller and more readable than json, and it can be executed. Uniform like bricks.

by rurban

1/11/2025 at 6:36:37 PM

> All flat, just parens.

So not flat then. Prefix is not postfix. Forth, and most concatenative languages, are much closer to actually being flat.

Lisp is trivial to flatten, but that's not the same thing.

by vanderZwan

1/11/2025 at 12:43:13 AM

Makes me wonder whether people in the APL/J/K community have been influenced by, or have influenced, this kind of technique. IIRC Aaron Hsu does tree processing through arrays (but I'm not skilled enough to analyze his code).

by agumonkey

1/11/2025 at 7:10:07 AM

Do you have a link to such an example of Aaron's code? Thank you in advance!

by gsf_emergency

1/11/2025 at 11:37:49 AM

https://scholarworks.iu.edu/dspace/items/3ab772c9-92c9-4f59-...

Iverson's 1962 book also mentions tree representations, see pp45-62: https://archive.org/details/aprogramminglanguage1962/page/n6...

by 082349872349872

1/13/2025 at 5:50:39 AM

Thanks. Was hoping for low-level code that might even blow a deue(I?)ng(m?)ker's mind on THE semantics-syntax duality..

This was closest: https://www.dyalog.com/blog/2018/01/stackless-traversal/

Your links should still be useful for orienteering!

by gsf_emergency

1/11/2025 at 11:54:14 AM

Can't remember where exactly but he did demo his code in talks/conferences with links.

by agumonkey

1/13/2025 at 5:49:20 AM

I've given up on finding relevant low-level CUDA(? tacit? asm??!!) code from Hsu*, but I did push the following to my to-watch stack

https://youtu.be/z8MVKianh54

Does APL need a type system

Guess it's time to reverse whatever else I can find

https://www.dyalog.com/uploads/conference/dyalog16/prerequis...

*Bad faith, or just a run o' the mill, aka compleatly forgiveable profiteering?

by gsf_emergency

1/13/2025 at 10:17:53 AM

Ahh apologies

https://news.ycombinator.com/item?id=13797797

https://news.ycombinator.com/item?id=13576176

by gsf_emergency

1/14/2025 at 2:59:32 PM

He must be referring to https://dl.acm.org/doi/pdf/10.1145/2935323.2935331 ?

Just looking at refs for the moment: Henglein and Hinze's discriminators are interesting, whenever you come back up for air. (are they also amenable to sorting codata?)

The oft-cited R Bernecky is, IIRC, also known as "Boolean Bob" for his SWAR-style algos. EDIT: nope, confused him with R Smith: https://aplwiki.com/wiki/Bob_Smith

(I once asked Boolean Bob if any of his tricks went back to the card processing days —I could even believe the keyed tree* might?— but he was too young to know, and the people he'd have liked to ask are no longer available.)

EDIT: Aardappel also has some interesting languages: https://strlen.com/#programming-languages

* for manipulating Bills of Materials despite the linearity of card decks?

EDIT2: compare "§3.2.1 Constructing Node Coordinates" (p34) with [T in "Table 1.19 Full list matrices of the tree of Fig. 1.16" in A Programming Language (p50): https://archive.org/details/aprogramminglanguage1962/page/n6...

        Fc
  1 0 0 0 0
  1 1 0 0 0
  1 1 1 0 0
  1 1 1 1 0
  1 1 1 1 1
  1 1 1 1 2
  1 1 1 1 3
  1 1 2 0 0
  1 1 2 2 0
        Ec
  1 0 0
  1 1 0
  1 1 1
  1 1 2
  1 1 3
  1 2 0
  1 3 0
  1 3 4
  1 3 5
  1 3 6
[all three appear to be 1-origin, probably due to the phenomenon mentioned in the footnote on p49 of APL]

by 082349872349872

1/15/2025 at 4:32:51 AM

Thanks!

[Too low global b/w to feel the adrenalin.. wow.. what was your deduction chain for figuring how BoMs was next item on my stack? Guessing you started from assumption that pure bits mongering are not on the boundary of my ikigais(yet)]

On the heap, I (re)surfaced "Typed Array Intermediate Language" slides, but too low (local) b/w to try to find out^W^W^W^W sus out if this or smth v similar is already in dyalog.com's workflow.

https://news.ycombinator.com/item?id=11974936

>Bernecky

https://news.ycombinator.com/item?id=11963548

What were those slides about formalizing Euclid ?

[Medium b/w vibing that multiplicity of edits is a precise estimate of flow-of-war, ~ webpage memory usage is an accurate estimate of just how well run the entire organization is..]

by gsf_emergency

1/16/2025 at 12:27:04 PM

still haven't refound the slides — every 5 or 6 weeks I get a new idea of who it might have been, but so far to no avail...

[BoMs were complete coincidence]

by 082349872349872

1/11/2025 at 11:20:39 AM

I guess Rust's contribution to high performance programming is that its borrow checker is so loathsome that it pushes people into using ideas like ECS or arenas just to not have to bother with it.

by torginus

1/10/2025 at 10:21:39 PM

> Instead of allocating Expr objects willy-nilly on the heap, we’ll pack them into a single, contiguous array.

This happens naturally if you bump-allocate them in a garbage-collected run-time, particularly under a copying collector. Free lists also tend to co-locate because they are produced during sweep phases of GC which run through heaps in order of address.

Don't make me bring out the L word for the billionth time.

> A flat array of Exprs can make it fun and easy to implement hash consing

OK, it's not a case of L-ignorance, just willful neglect.

by kazinator

1/10/2025 at 10:34:04 PM

FWIW I did acknowledge this in the article:

> A sufficiently smart memory allocator might achieve the same thing, especially if you allocate the whole AST up front and never add to it

> Again, a really fast malloc might be hard to compete with—but you basically can’t beat bump allocation on sheer simplicity.

by samps

1/11/2025 at 6:50:21 PM

And if you don’t need more than 32 GB of heap space, the JVM also gives you the ability to reduce reference sizes to 32 bits, with compressed references. (Due to alignment requirements, the lower 3 bits of a pointer are zero and hence do not need to be stored, which effectively gives you a 35-bit address space.) Of course, the presence of object headers counteracts this to a certain extent.

by layer8

1/11/2025 at 11:25:34 AM

About 10 years ago, working with ASTs, I (re)invented a flat structure representing trees in a flat array. It reminds me of what is described here, but not exactly. In my case I needed only two indices per node: the total sub-region length covering all children and sub-children, and the parent index (so no need to store the indices of all children). The total sub-length can basically be interpreted as the offset of the next sibling. With such a structure it's easy/cheap to implement FindFirstChild/FindNextSibling.

Later I learned the theory behind such structures, the "nested set model" [1]. The article doesn't seem to mention the internal representation, but I think the implementation would use something like my solution, i.e. a fixed number of references per node.

[1] https://en.wikipedia.org/wiki/Nested_set_model
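A minimal Rust sketch of the two-indices-per-node layout described above (field names hypothetical), with the tree stored in preorder:

```rust
struct FlatTree {
    subtree_len: Vec<usize>, // the node itself plus all descendants
    parent: Vec<usize>,      // parent index (root points at itself)
}

impl FlatTree {
    // In preorder, the first child (if any) sits immediately after the node.
    fn first_child(&self, i: usize) -> Option<usize> {
        (self.subtree_len[i] > 1).then(|| i + 1)
    }

    // Next sibling: skip this node's whole subtree, then check we are
    // still inside the parent's subtree.
    fn next_sibling(&self, i: usize) -> Option<usize> {
        let p = self.parent[i];
        if p == i {
            return None; // root has no siblings
        }
        let next = i + self.subtree_len[i];
        (next < p + self.subtree_len[p]).then(|| next)
    }
}

fn main() {
    // Preorder layout: root(0) -> { a(1) -> { c(2) }, b(3) }
    let t = FlatTree {
        subtree_len: vec![4, 2, 1, 1],
        parent: vec![0, 0, 1, 0],
    };
    assert_eq!(t.first_child(0), Some(1));
    assert_eq!(t.next_sibling(1), Some(3)); // skip over c(2)
    assert_eq!(t.next_sibling(3), None);
    assert_eq!(t.first_child(2), None);
}
```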

by Agraillo

1/10/2025 at 9:50:15 PM

Amazing article, very good advice to keep your data structures flat.

Adding to that, it also makes editing the AST vastly more efficient.

I discovered that principle on my own when I worked on an editor that operated directly on the AST instead of text. I found manipulating the tree-style AST so painful, constantly traversing the tree and all. Once I made it flat, my life was a hell of a lot easier. You can just directly index any part of the AST.

by cardanome

1/11/2025 at 8:27:04 AM

Rediscovering techniques that were somewhat well-known in the 70s and 80s.

See also: https://en.wikipedia.org/wiki/Binary_heap

by userbinator

1/11/2025 at 8:35:59 AM

heh - I built compilers like this back in the 70s because the machine I was working on didn't really do pointers as a 1st class data structure (B6700 Algol). It's not really surprising to find someone doing something similar in another language that makes pointers difficult to deal with

by Taniwha

1/11/2025 at 11:23:45 AM

Yup, and chances are if you're using a good C++ STL implementation, most containers already use packed storage internally. The STL doesn't even have a dedicated heap container: it uses an std::vector with a set of helper functions (std::make_heap and friends).

by torginus

1/11/2025 at 2:40:32 PM

Twee (an equational theorem prover in Haskell used by QuickSpec) has an interesting take on this. Terms are slices of arrays, but you get a normal interface, including pattern matching, via pattern synonyms. It can also be nice to use phantom types for your references (array offsets), so when you project them into flat view types you can do so type-safely.

Requires the language to have something equivalent to pattern synonyms to be as invisible as twee, though.

In twee a TermList is a slice of a bytearray (two ints for offset/length plus a pointer).

And a term is an int for the function symbol and an unpacked TermList for the arguments.

The pattern match synonyms load a flat representation from the array into a view type, and the allocation of the view type cancels out with the pattern matching so everything remains allocation free.

https://hackage.haskell.org/package/twee-lib-2.4.2/docs/Twee...

by Tarean

1/11/2025 at 8:49:21 PM

Forgot to mention: In the twee style, the int for the function id contains metadata (is it a unification variable or constant name? how many args does it take?). That way f1(f3(f5(), f7())) would be serialised as something like [1,3,5,7], without even references to other offsets
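A hedged Rust sketch of that serialisation: if each symbol's arity is known (here from a lookup table rather than packed into the id, and the ids/arities are invented for the example), a preorder walk can recover subterm boundaries with no stored offsets at all:

```rust
// Arity of function symbol `id` (hypothetical table matching the example).
fn arity(id: u32) -> usize {
    match id {
        1 => 1,      // f1 takes one argument
        3 => 2,      // f3 takes two
        5 | 7 => 0,  // f5, f7 are nullary
        _ => unreachable!(),
    }
}

// Index one past the end of the subterm starting at `pos`,
// computed purely from arities.
fn subterm_end(flat: &[u32], pos: usize) -> usize {
    let mut end = pos + 1;
    for _ in 0..arity(flat[pos]) {
        end = subterm_end(flat, end);
    }
    end
}

fn main() {
    // f1(f3(f5(), f7())) serialised as a flat preorder sequence.
    let flat = [1u32, 3, 5, 7];
    assert_eq!(subterm_end(&flat, 0), 4); // the whole term
    assert_eq!(subterm_end(&flat, 1), 4); // the f3(...) subterm
    assert_eq!(subterm_end(&flat, 2), 3); // f5()
}
```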

by Tarean

1/10/2025 at 9:44:26 PM

This is a fantastic idea. An AST works well in an array-based allocation block since there is no need to free individual nodes. It's an add-only allocation.

by ww520

1/11/2025 at 9:17:06 AM

What about transforming the AST after it is built, or deriving a new tree which partly aliases the original in persistent structure fashion?

by JonChesterfield

1/10/2025 at 10:16:07 PM

Cool! Carbon is doing exactly this. I had asked leads if there was a paper on this approach, but they didn't have anything for me. I'll send them this post!

by ndesaulniers

1/11/2025 at 4:31:19 PM

Chandler discusses it in this video though! https://youtu.be/ZI198eFghJk

You get some traversals for free with this layout (preorder, reverse post order). Can search for subtrees with string searching algorithms or more complex things with regex.

by wrsh07

1/12/2025 at 2:25:31 AM

Ah, forgot if they had talked about that pattern externally yet. Thanks for the link.

by ndesaulniers

1/11/2025 at 2:25:19 PM

One advantage to this is the ease with which it handles ephemeral annotations.

Suppose you want to attach additional information to some of the nodes of the AST. Different algorithms on the AST will attach different information; you don't necessarily need them all at the same time or know ahead of time what you'll need.

With nodes, you have to have some sort of node/value hash table, or hang a key/value map off each node. But with this flattened representation, each datum gets its own flat array as well, which can be independently allocated and deallocated.
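A small Rust sketch of such a side table (names hypothetical): the annotation lives in its own flat array keyed by node index, allocated by the pass that needs it and dropped afterwards, while the AST itself never changes shape:

```rust
struct Ast {
    tags: Vec<u8>, // the permanent, flattened AST (details elided)
}

// Ephemeral annotation owned by one pass, e.g. a type checker.
struct TypeInfo {
    ty: Vec<Option<&'static str>>, // one slot per AST node
}

fn main() {
    let ast = Ast { tags: vec![0, 1, 2] };

    // Allocate the side table in one shot, sized to the AST.
    let mut info = TypeInfo { ty: vec![None; ast.tags.len()] };
    info.ty[1] = Some("int");

    assert_eq!(info.ty[1], Some("int"));
    assert_eq!(info.ty[0], None);
    // `info` is dropped here; the AST is untouched.
}
```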

One other thing I noticed about this flat representation is that it throws static typing into a cocked hat! All you have to refer to other nodes with is indices, and all the different kinds of nodes are stored in the same array.

by pfdietz

1/11/2025 at 12:17:43 AM

As the article mentions, this makes it quite similar to a bytecode vm. I think the traditional wisdom is that an AST walker is easy to write, but for speed you'd want a bytecode interpreter. It'd be interesting to see how close the performance gets with this flattened AST.

In practice I think there are more differences. E.g. AST interpreters tend to pass environments around while bytecode interpreters often store these on a stack (though I guess there's nothing stopping you from doing this with an AST either). I wonder if there's some goldilocks zone for ease of implementation with decent performance.

by hencq

1/11/2025 at 12:44:30 AM

If you instead flatten the expression tree into RPN, then you can execute it like that, with a stack machine.

I seem to recall that the Red Dragon Book (Compilers: Principles, Techniques and Tools, Aho, Sethi, Ullman [1988]) describes a technique whereby intermediate code is represented in RPN, and transformations are performed by pattern matches on it.
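A minimal stack-machine evaluator over an RPN-flattened expression, as the first paragraph suggests (the opcode set here is invented for illustration):

```rust
enum Op {
    Push(i64),
    Add,
    Mul,
}

// Evaluate a postfix (RPN) program with an explicit operand stack.
fn eval(rpn: &[Op]) -> i64 {
    let mut stack = Vec::new();
    for op in rpn {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a * b);
            }
        }
    }
    stack.pop().unwrap()
}

fn main() {
    // (2 + 3) * 4  flattens to  2 3 + 4 *
    let prog = [Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul];
    assert_eq!(eval(&prog), 20);
}
```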

by kazinator

1/11/2025 at 1:46:36 AM

The sample flat program in the post is exactly RPN, no?

by finnh

1/11/2025 at 2:59:47 AM

I think it would be more like RPN if it used a stack, and operands were specified as relative offsets (i.e., stack offsets). In the version I wrote, operands are still represented as absolute offsets in the expression table.

by samps

1/11/2025 at 1:58:12 PM

You can also be smart about memory with lexing for great winnage. Have a struct for your tokens that has an enum for your token type and either a pointer, indices, or a string_view (or a &str, but lotsa luck with the borrow checker). You can then have a vector of your token structs for fast allocation and iteration, and each token holds a slice back into the original input, with no substring copying.
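A sketch of that token layout in Rust, assuming byte offsets into the source rather than a borrowed &str (all names are illustrative):

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Kind {
    Ident,
    Number,
}

#[derive(Clone, Copy)]
struct Token {
    kind: Kind,
    start: usize, // byte offsets into the original source text
    end: usize,
}

impl Token {
    // Borrow the token's text straight out of the source: no copying.
    fn text<'a>(&self, src: &'a str) -> &'a str {
        &src[self.start..self.end]
    }
}

fn main() {
    let src = "foo 42";
    // A real lexer would produce these; hand-built here for illustration.
    let toks = vec![
        Token { kind: Kind::Ident, start: 0, end: 3 },
        Token { kind: Kind::Number, start: 4, end: 6 },
    ];
    assert_eq!(toks[0].text(src), "foo");
    assert_eq!(toks[1].text(src), "42");
    assert_eq!(toks[1].kind, Kind::Number);
}
```

Storing offsets instead of &str also sidesteps the borrow-checker issue the comment alludes to: the token vector doesn't borrow the source at all.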

by jonstewart

1/11/2025 at 3:06:49 PM

Yes, the C FE I write (in C) does exactly this, and often outputs a flattened intermediate code in one pass. I did not see it as an AST because it does semantic analysis during parsing, but it still has all the information.

by uecker

1/11/2025 at 12:55:12 PM

Did the author just reinvent Forth?

by efitz
