3/14/2026 at 12:58:01 PM
XML is notoriously expensive to parse properly in many languages. Essentially, the entire world centers on three open source implementations (libxml2, Expat, and Xerces) if you want to get anywhere close to actual compliance. Even with them, you might hit challenges (libxml2 has been largely unmaintained recently, yet it is the basis for bindings in many other languages).

The main property of SGML-derived languages is that they make "list" a first-class object and nesting second-class (by requiring "end" tags), and they have two axes for adding metadata: one being the tag name, the other being attributes.
So while it is a suitable DSL for many things (it is also seeing new life in web component definitions), we are mostly talking only about an XML-lookalike language, and not XML proper. If you go XML proper, you need to throw "cheap" out the window.
Another comment to make here is that you can have an imperative looking DSL that is interpreted as a declarative one: nothing really stops you from saying that
totalOwed = totalTax - totalPayments
totalTax = tentativeTaxNetNonRefundableCredits + totalOtherTaxes
totalPayments = totalEstimatedTaxesPaid +
totalTaxesPaidOnSocialSecurityIncome +
totalRefundableCredits
means exactly the same as the XML-alike DSL you've got.

One declarative language that looks like an imperative language but really uses "equations", which I know about, is METAFONT. See e.g. https://en.wikipedia.org/wiki/Metafont#Example (the example might not demonstrate it well, but you can reorder all the equations and it should produce exactly the same result).
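The reordering claim is easy to check with a tiny sketch (hypothetical names; a minimal demand-driven resolver, not METAFONT's actual equation solver): each "assignment" becomes a named equation, and values are computed on demand, so the order in which the equations are listed never matters.

```python
# A minimal demand-driven resolver: each "assignment" is a named
# equation, and lookups recurse on demand, so the equations can be
# listed in any order and still mean the same thing.
def solve(equations):
    cache = {}
    def value(name):
        if name not in cache:
            cache[name] = equations[name](value)
        return cache[name]
    return {name: value(name) for name in equations}

equations = {
    "totalOwed": lambda v: v("totalTax") - v("totalPayments"),
    "totalTax": lambda v: 900 + 100,            # stand-in leaf values
    "totalPayments": lambda v: 400 + 200 + 50,
}

# Reordering the equations produces exactly the same result.
assert solve(equations) == solve(dict(reversed(list(equations.items()))))
print(solve(equations)["totalOwed"])  # 350
```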
by necovek
3/14/2026 at 2:00:20 PM
I keep seeing people make the same mistake XML made, over and over, without learning from it. I will state the problem thusly:

> The more capabilities you add to an interchange format, the harder that format is to parse.

There is a reason why JSON is so popular: it supports so little that it is legitimately easy to import. Whereas XML supports attributes, namespaces, CDATA, DTDs, QNames, xml:base, xml:lang, XInclude, etc. etc. They gave it everything, including the kitchen sink.
There was a thread here the other day about using SQLite as an interchange format to REDUCE complexity. Look, I love SQLite as an application-specific data store. But much like XML it has a ton of capabilities, which is good for a data store but awful for an interchange format with multiple producers/consumers, each with their own ideas.
CSV may be under-specified, but it remains popular largely due to its simplicity to produce/consume. Unfortunately, we're seeing people slowly ruin JSON by adding e.g. comments to the format, with others then using those "comments" to hold data (e.g. type information), which must be parsed. That is a bad version of an XML attribute.
by Someone1234
3/14/2026 at 2:58:55 PM
I think JSON has the opposite problem: it is too simple. The lack of comments in particular is especially bad for many common usages of the format today.

I know some implementations of JSON support comments and other things, but that is not true JSON, in the same way that most simple XML implementations are not true XML. That's why I say "opposite problem": XML is too complex, and most practical uses of XML rely on incomplete implementations, while many practical uses of JSON rely on extended implementations.
By the way, this is not a problem for what JSON was designed for: a text interchange format, with JS being the language of choice, but it has gone beyond its design: configuration files, data stores, etc...
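The "not true JSON" point is mechanical: a spec-strict parser rejects a commented document outright. A quick check with Python's stdlib parser (the document below is made up for illustration):

```python
import json

doc = """{
    // comments are not part of JSON (RFC 8259)
    "retries": 3
}"""

try:
    json.loads(doc)
except json.JSONDecodeError as err:
    # A strict decoder fails at the first comment character.
    print("rejected:", err.msg)
```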
by GuB-42
3/14/2026 at 3:17:26 PM
A lot of people dislike the decision not to include comments in JSON, but I think, while shocking, it was and is totally correct.

In a programming language it's usually free to have comments because the comment is erased before the program runs; we usually render comments in grey text because they can't change the meaning of the program.
In a data language you have no such luxury. In a data language there's no comment erasure happening between the producer and the consumer, so comments are just dangerous as they would without doubt evolve into a system of annotations -- an additional layer of communication which would then not be standardized at all and which then would grow into a wild west of nonstandard features and compatibility workarounds.
by conartist6
3/14/2026 at 4:27:47 PM
I don't dislike the decision at all, FWIW! For data interchange it's totally reasonable. But it does make JSON ill-suited for a bunch of applications to which JSON has been forcefully and unfortunately applied.
by phlakaton
3/14/2026 at 6:07:29 PM
> so comments are just dangerous as they would without doubt evolve into a system of annotations -- an additional layer of communication which would then not be standardized at all and which then would grow into a wild west of nonstandard features and compatibility workarounds

IIRC Douglas Crockford explicitly stated that he saw people initially using comments for a purpose like ad hoc preprocessor directives.
by jancsika
3/14/2026 at 5:00:59 PM
No, it was obviously and flagrantly incorrect, as evidenced by the success of interchange formats that do allow for comments, including many real-world systems that pragmatically allow comments even when JSON says they shouldn't. This is Stockholm Syndrome.

But what can we expect from a spec that somehow deems comments bad but can't define what a number is?
by quotemstr
3/14/2026 at 10:35:15 PM
How do you feel numbers are ill-defined in json? The syntactical definition is clear and seems to yield a unique and obvious interpretation of json numbers as mathematical rational numbers.

A given programming language may not have a built-in representation for rational numbers in general. That isn't the fault of json.
by colonwqbang
3/14/2026 at 10:59:41 PM
I can't really tell what you're trying to say; JSON also has no representation for rational numbers in general. The only numeric format it allows is the standard floating point "2.01e+25" format. Try representing 1/3 that way.

The usual complaint about numbers not being well-defined in JSON is that you have to provide all numbers as strings; 13682916732413492 is ill-advised JSON, but "13682916732413492" is fine. That isn't technically a problem in JSON; it's a problem in Javascript, but JSON parsers that handle literals the same way Javascript would turn out to be common.
Your "defense", on the other hand, actually is a lack in JSON itself. There is no way to represent rational numbers numerically.
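For what it's worth, the large-integer half of this is easy to demonstrate: Python's json module decodes integer literals exactly, while any consumer that decodes numbers into an IEEE 754 double (as JavaScript does) cannot hold every such value.

```python
import json

n = 13682916732413493  # odd and above 2**53, so no double can represent it

assert json.loads(str(n)) == n   # Python keeps the exact integer
assert float(n) != n             # a double-based consumer silently rounds it
assert json.loads(f'"{n}"') == str(n)  # the string workaround survives anywhere
```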
by thaumasiotes
3/15/2026 at 11:10:44 AM
I didn't say that json can represent all rational numbers. I said that all json numbers have an obvious interpretation as a rational number.

So far you haven't really shown an example of a json number which has an ambiguous or ill-defined interpretation.
Maybe you mean that json numbers may not fit into 32 bit integers or double floats. That's certainly true but I don't see it as a deficiency in the standard. There is no limit on the size of strings in json, so why have a limit on numbers?
by colonwqbang
3/15/2026 at 11:46:16 AM
>> A given programming language may not have a built in representation for rational numbers in general.

Why did you say this?
by thaumasiotes
3/14/2026 at 9:43:18 PM
As long as they stay comments there's no harm. As soon as they become struct tags and stripping comments affects the document's meaning you lose the plot.
by Spivak
3/14/2026 at 3:35:00 PM
Could you imagine hitting a rest api and like 25% of the bytes are comments? lol
by blackcatsec
3/14/2026 at 5:11:55 PM
Worse than that - people will start tagging "this value is a Date" via comments, and you'll need to parse ad-hoc tags in the comments to decode the data. People already do tagging in-band, but at least it's in-band and you don't have to write a custom parser.
by dunham
3/14/2026 at 9:18:28 PM
See also: PostScript. The document structure extensions being comments always bothered me. I mean surely, surely in a Turing-complete language there is somewhere to fit document structure information. Adobe: nah, we will jam it in the comments.

https://dn790008.ca.archive.org/0/items/ps-doc-struc-conv-3/...
by somat
3/15/2026 at 8:52:17 AM
Not sure it's a fair comparison. The spec says:

"Use of the document structuring conventions... allows PostScript language programs to communicate their document structure and printing requirements to document managers in a way that does not affect the PostScript language page description"
The idea being that those document managers did not themselves have to be PostScript interpreters in order to do useful things with PostScript documents given to them. Much simpler.
For example, a page imposition program, which extracts pages from a document and places them effectively on a much larger sheet, arranged in the way they need to be for printing 8- or 16- or 32-up on a commercial printing press, can operate strictly on the basis of the DSC comments.
To it, each page of PostScript is essentially an opaque blob that it does not need to interpret or understand in the least. It is just a chunk of text between %%BeginPage and %%EndPage comments.
This is tremendously useful. A smaller scale of two-up printing is explicitly mentioned as an example on p. 9 of the spec.
by f30e3dfed1c9
3/14/2026 at 9:52:46 PM
Reminds me how old versions of .NET used to serialize dates as "\/Date(1198908717056)\/".
by troupo
3/14/2026 at 3:39:37 PM
HTML and JS both have comments, I don't see the problem
by bmacho
3/14/2026 at 5:16:21 PM
And both are poor interchange formats. When things stay in their lane, there is no "problem." When you try to make an interchange format using a language with too many features, or comments that people abuse to add parsable information (e.g. "type information"), then there is a BIG problem.
by Someone1234
3/14/2026 at 11:23:05 PM
« HTML is a poor interchange format. » - quote of the century -
by lolive
3/15/2026 at 2:34:43 AM
It caused all kinds of problems, though those tend to be more directly traceable to the "be liberal in what you accept" ethos than to the format per se.by thaumasiotes
3/14/2026 at 11:03:52 PM
> Could you imagine hitting a rest api and like 25% of the bytes are comments? lol

That's pretty much what already happens. Getting a numeric value like "120" by serializing it through JSON takes three bytes. Getting the same value through a less flagrantly wasteful format would take one.
I guess that's more than 25%. In the abstract ASCII integers are about 50% waste. ASCII labels for the values you're transferring are 100% waste; those labels literally are comments.
If you're worried about wasting bandwidth on comments, JSON shouldn't be a format you ever consider, for any purpose.
lol
by thaumasiotes
3/14/2026 at 6:25:15 PM
> In a programming language it's usually free to have comments because the comment is erased before the program runs

That's inherent to the language specification, but it isn't inherent to the document. You have to have a system with rules that require that erasure.
Nothing prevents one from mandating a system that strips those comments out of JSON. You could even "compile" JSON to, I don't know, BSON or msgpack or something.
Just as nothing prevents one from creating tooling to, say, extract type annotations from comments in a dynamically typed language.
by zahlman
3/14/2026 at 4:12:40 PM
> while shocking it was and is totally correct

Agreed. Consider how comments have been abused in HTML, XML, and RSS.
Any solution or technology that can be abused will be abused if there are no constraints.
by heresie-dabord
3/14/2026 at 10:49:52 PM
> In a data language there's no comment erasure happening between the producer and the consumer, so comments are just dangerous as they would without doubt evolve into a system of annotations -- an additional layer of communication which would then not be standardized at all and which then would grow into a wild west of nonstandard features and compatibility workarounds.

But there's nothing stopping you from commenting your JSON now. There's no obligation to use every field. There can't be, because the transfer format is independent of the use to which the transferred data is put after transfer.
And an unused field is a comment.
{
  "customerUUID": "3",
  "comment": "it has to be called a 'UUID' for historical reasons"
}
If this would 'without doubt' evolve into a system of annotations, JSON would already have a system of annotations.
by thaumasiotes
3/14/2026 at 10:09:10 PM
> that decision not to include comments in JSON, but I think while shocking it was and is totally correct.

YAML is fugly, but it emerged from JSON being unsupportive of comments. Now we're stuck with two languages for configuration of infrastructure: a beautiful one that is unusable because it has no comments, and another where I can never format a list correctly on the first try, but comments are OK.
by eastbound
3/15/2026 at 6:18:30 AM
YAML also expanded to add arbitrary scripting via a pile of bolt-on capabilities, so that it's now a serialisation language that's Turing-complete, or that includes Turing-complete capabilities within it. Everything from:

  command:
    - /bin/sh
    - -c
    - rm -rf $HOME

to:

  state: >
    {% set foo = states('...') %}
    {% set bar = states('...') %}
    {% if foo == FOO and bar == BAZ %}
    ...
This makes it damn annoying to work with because everyone's way of doing it is different and since it's not a first-class element you have to rethink everything you want to do into strange patterns to work with how YAML does things.
by pseudohadamard
3/15/2026 at 1:07:48 PM
This scripting is not a part of YAML. It could be done in JSON as well:

  {"command": [
    "/bin/sh",
    "-c",
    "rm -rf $HOME"
  ]}
In fact, this is completely equivalent to your YAML.
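The equivalence is checkable: parsing the JSON yields an inert list of strings; the "script" is just data until some external tool decides to run it.

```python
import json

doc = '{"command": ["/bin/sh", "-c", "rm -rf $HOME"]}'
parsed = json.loads(doc)

# Parsing is inert: no shell runs; the command is just three strings.
assert parsed == {"command": ["/bin/sh", "-c", "rm -rf $HOME"]}
```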
by xigoi
3/16/2026 at 3:24:01 AM
The difference is that in YAML it's kind of expected (the second pseudocode example is from Home Assistant, where almost everything nontrivial requires embedding scripting inside your YAML), while I've never seen it done in JSON.
by pseudohadamard
3/16/2026 at 11:38:14 AM
The use cases for YAML that don't involve any sort of scripting vastly outnumber the use cases for YAML that involve embedding scripts into a document; so it's a little unfair and inaccurate to say that "in YAML it's kind of expected".

It is more fair to say that if your document needs to contain scripting, YAML is a better choice than JSON, for the singular reason that YAML allows for unquoted multiline strings, which means you can easily copy/paste scripts in and out of a YAML document without needing to worry about escaping and unescaping quotes and newline characters when editing the document.
by drysart
3/16/2026 at 7:24:19 AM
Jupyter notebooks are a form of scripting in JSON. Anyway, all this is the fault of specific tools, not of YAML. This is like saying that laundry pods are bad because people eat them.
by xigoi
3/14/2026 at 10:14:11 PM
JSON is obviously perfectly usable, given how widely it's used. Even Douglas Crockford suggested just using a JSON interpreter that strips out comments, if you need them.

And if you want something like JSON that allows comments, and you aren't working on the web, Lua tables are fine.
by krapp
3/15/2026 at 5:00:45 AM
Many years ago I worked for a company that did EDI software. When XML was introduced they had to add support for that, just the primitive XML 1.0 that was around at the time with none of the modern complexities. With the same backend code, just switching the parsing, they found either a 100x slowdown in parsing and a 10x increase in memory use, or the other way around (so 10x slower, 100x the memory). The functionality was identical; all they did was switch the frontend from EDI to XML.

Since EDI is meant for processing large numbers of transactions as quickly as possible, I hate to think what the move to XML did to that. I moved on years ago, so I don't know whether they just threw more hardware at the problem to achieve the same thing that EDI already gave them but now with angle brackets, or whether the industry gave up on XML because of its poor performance.
Come to think of it I'm pretty sure they would have tried blockchain when that got trendy as well.
by pseudohadamard
3/14/2026 at 8:50:15 PM
I've said it before, but I maintain that XML has only two real problems:

1. Attributes should not exist. They make the document suddenly have two dimensions instead of one, which significantly increases complexity. Anything that could be an attribute should actually be a child element.
2. There should be a single close tag, `</>`, which closes the most recent element; named closing tags burn a significant amount of space on useless syntax. Other than that and the self-closing `<tag />` (which is itself less useful without attributes) there isn't much that you need. Maybe a document close tag like `<///>`
You'll notice that, yes, JSON solves both of those things. That's a part of why it's so popular. The other is just that a lot more effort was put into maximizing the performance of JavaScript than shredding XML, and XSLT, the intended solution to this problem, is infamous at this point.
The problem of comments is kind of a non-issue in practice, IMO. You can just add a `"_COMMENT"` key or similar. Sure, yes, it will get parsed. But you shouldn't have so many comments that it causes a genuine performance issue.
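A sketch of that convention (the `"_COMMENT"` key name is this comment's own suggestion, not a standard): decode normally, then strip the comment members before use.

```python
import json

def strip_comments(node):
    """Recursively drop "_COMMENT" members from decoded JSON."""
    if isinstance(node, dict):
        return {k: strip_comments(v) for k, v in node.items() if k != "_COMMENT"}
    if isinstance(node, list):
        return [strip_comments(v) for v in node]
    return node

doc = json.loads('{"_COMMENT": "retries chosen per vendor advice", "retries": 3}')
assert strip_comments(doc) == {"retries": 3}
```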
However, JSON still has two problems:
1. Schema support. You can't validate a file before deserializing it in your application. JSON Schema does exist, but its support is still thin, IMX.
2. Many serializers are pretty bad with tabular data, and nearly all of them are bad with tabular data by default. So sometimes it's a data serialization format that's bad at serializing bulk data. Yeah, XML is worse at this. Yeah, you can use the `"colNames": ["id", ...], "rows": [ [1,...],[2,...] ]` method or go columnar with `"id": [1,2,...], "name": [...], "createDate": [...]`, but you had better be sure both ends can support that format.
In both cases, it seems like there is an attempt to resolve both of those issues. OpenAPI 3.1 has JSON schema included in it. The most popular JSON parsers seem to be adding tabular data support. I guess we'll see.
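For reference, converting between the verbose row-of-objects shape and the compact `"colNames"`/`"rows"` shape mentioned above takes only a few lines (field names here are illustrative):

```python
records = [
    {"id": 1, "name": "ada", "createDate": "2026-01-02"},
    {"id": 2, "name": "brin", "createDate": "2026-01-05"},
]

# Compact shape: column names stated once, then bare value rows.
col_names = list(records[0])
compact = {
    "colNames": col_names,
    "rows": [[rec[c] for c in col_names] for rec in records],
}

# Restoring the verbose shape is the inverse zip.
restored = [dict(zip(compact["colNames"], row)) for row in compact["rows"]]
assert restored == records
```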
by da_chicken
3/14/2026 at 11:42:17 PM
XML is a Markup Language. The text is what is being marked up, and the attributes are how to mark it up. Try writing the equivalent of <font family="Arial">Hello world</font> without attributes. I'll wait.

Using XML as a structured data interchange format is abuse. Of course the square peg doesn't fit in the round hole. You propose filing off the corners of the square, making it an octagon, so it will fit the round hole better.
by pocksuppet
3/15/2026 at 6:05:50 AM
While XML/XHTML aren't spec'ed/evolved to support your fun font-sans-attributes challenge, certainly modern HTML does...

  <p>
    <style>
      @scope { font-family: "Arial"; }
    </style>
    Prospero: Where in the world is my teapot? Hello? I'm waiting!
  </p>
I know one could argue that that CSS rule property is essentially an attribute, but it illustrates, like XML plists[1], that one can define the tags arbitrarily to have their content be meta upon sibling/nested content, subsuming attributes' role.

To wit, it seems to me a style issue.
[1] Apple has long used XML plists for data ~ interchange or even archival storage such as .webarchive (ie just a plist flavor). Of course they soon added a simple binary version to compress out some redundancy and encoding waste.
They used an XML nested tag approach, not attributes. Maybe not well rounded pegs and holes but it has worked for them on a large scale over a long time.
by danhite
3/14/2026 at 9:26:32 PM
I disagree on several points here:

1. I think attributes absolutely should exist. They're great for describing metadata related to the tag: e.g. element ID, language, datatype, source annotation, namespacing. They add little in complexity.
2. The point of a close tag with a name is to make it unambiguous what it's trying to close off.
It sounds to me like what you want is not a better XML, but just s-exprs. Which is fine, but not quite solving the same problem.
3. As far as schema support, it seems to me that JSON Schema is well-established and perfectly cromulent – so much so that YAML authors are trying to use it to validate their stuff (the poor bastards) – and XML schema validation, while robust, is a complex and fragmented landscape around DTD, XSD, RELAX-NG, and Schematron. So although XML might have the edge, it's a more nuanced picture than XML proponents are claiming.
4. As far as tabular data, neither XML nor JSON were built for efficient tabular data representation, so it shouldn't be a surprise that they're clunky at this. Use the right tool for the job.
by phlakaton
3/14/2026 at 10:56:44 PM
> 1. I think attributes absolutely should exist. They're great for describing metadata related to the tag: e.g. element ID, language, datatype, source annotation, namespacing. They add little in complexity.

No, they're barely adequate for those purposes. And you could (and if you have an XSD you probably should) still replace them with elements. If you argue that you can't, then you're arguing that JSON does not function. You can just inline metadata alongside data. That works just fine. That's the thing about metadata. It's data!
You don't need attributes. Having worked in information systems for 25 years now, they are the most heavily, heavily, heavily misused feature of XML and they are essentially always wrong.
Because when someone represents data like this:

  <Person>
    <ID>90034</ID>
    <FirstName>Anthony</FirstName>
    <MiddleName />
    <LastName>Perkins</LastName>
    <Site>4302</Site>
  </Person>
You can write an XSD with the full set of rules for schema validation. On the other hand, if you do this:
  <Person ID="90034"
          FirstName="Anthony"
          MiddleName=""
          LastName="Perkins"
          Site="4302" />
Well, now you're a bit stuck. You can make the XSD look at basic data types, and that's it. You can never use complex types. You can never use multiple values if you need them, or if you do you'll have to make your attribute a delimited string. You can't use order. You're limiting your ability to extend or advance things.

That's the problem with XML. It's so flexible it lets developers be stupid, while also claiming strictness and correctness as goals.
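The asymmetry shows up in consuming code too; a minimal stdlib illustration of why a consumer must know in advance which axis the producer chose:

```python
import xml.etree.ElementTree as ET

element_form = ET.fromstring(
    "<Person><ID>90034</ID><LastName>Perkins</LastName></Person>")
attribute_form = ET.fromstring(
    '<Person ID="90034" LastName="Perkins" />')

# Same data, but two different access axes: child text vs. attributes.
assert element_form.findtext("ID") == "90034"
assert attribute_form.get("ID") == "90034"
assert element_form.get("ID") is None          # reading the wrong axis finds nothing
assert attribute_form.findtext("ID") is None
```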
> 2. The point of a close tag with a name is to make it unambiguous what it's trying to close off.
Sure, but given that closing tags in the proper order is mandatory, you're not actually adding any information at all. The only thing you're doing is introducing trivial syntax errors.

Because the truth is that this is 100% unambiguous in XML, since the rules changed:
  <Person>
    <ID>90034</>
    <FirstName>Anthony</>
    <MiddleName />
    <LastName>Perkins</>
    <Site>4302</>
  </>
The reason SGML had a problem with the generic close tag was that SGML didn't require a closing tag at all. That was a problem: it didn't have `<tag />`. It let you say `<tag1><tag2>...</tag1>` or `<tag1><tag2>...</>`.

Named closing tags had more of a point when we were actually writing XML by hand and didn't have text editors that could find the open and close tags for you, but that is solved. And now we have syntax highlighting and hierarchical code folding in any text editor, never mind dedicated XML editors.
> 3. As far as schema support, it seems to me that JSON Schema is well-established and perfectly cromulent
Then my guess is that you have worked exclusively in the tech industry, for customers that are also exclusively in the tech industry. If you have worked in any other business with any other group of organizations, you would know that the rest of the world is absolute chaos. I think I've seen 3 examples of a published JSON Schema, and hundreds of APIs that do not publish one.
> 4. As far as tabular data, neither XML nor JSON were built for efficient tabular data representation, so it shouldn't be a surprise that they're clunky at this. Use the right tool for the job.
No, I think you're looking at what the format was intended to do 25 years ago and trying to claim that that should not be extended or improved ever. You're ignoring what it's actually being used for.
Unless you're going to make data queries return large tabular data sets to the user interface as more or less SQLite or DuckDB databases so the browser can freely manipulate them for the user... you're kind of stuck with XML or JSON or CSV. All of which suck for different reasons.
by da_chicken
3/15/2026 at 1:59:46 AM
1. I don't disagree that attributes have been abused – so have elements – but you yourself identified the right way to use them. Yes, you can inline attributes, but that also leads to a document that's harder to use in some cases. So long as you use them judiciously, it's fine. In actual text markup cases, they're indispensable, as HTML illustrates.

2. As far as JSON Schema, you're wrong on all accounts – wrong that I haven't seen Some Stuff, wrong that JSON Schema doesn't get used (see Swagger/OpenAPI), and wrong that XML Schema doesn't also get underutilized when a group of developers gets lackadaisical.
3. As far as what historical use has been, I'm less interested in exhuming historical practice than simply observing which of the many use cases over the last 20 years worked well (and still work) and which didn't. The answer isn't that none of them worked, and it certainly isn't that XML users had a better bead on how to use it 20 years ago – it went through a massive hype curve just like a lot of techs do.
4. Regarding tabular data exchange, I stand by my statement. Use XML or JSON if you must, and sometimes you must, but there are better tools for the job.
by phlakaton
3/14/2026 at 9:29:36 PM
Attributes exist due to XML's origin as a markup language. XML is actually (big surprise) a pretty good markup language, where the tags are sort of like function calls and the attributes are args, with little to no information to be gleaned out of the text. The big sin was to say "hey, the tooling is getting pretty good for these SGML-like markup languages. Let's use it as a structured data interchange format. It's almost the same thing." Now all the data is in the text, and the attributes are not just superfluous but actively harmful, as there is a weird extra data axis that people will aggressively use.
by somat
3/14/2026 at 9:54:52 PM
Hard disagree about attributes: each tag should be a complete object, and attributes describe the object.

  <myobject foo="bar" />
  // means roughly
  new MyObject(foo="bar")
But objects can also be containers, and that's what nesting is for. There shouldn't ever be two dimensions in the way you're describing. The pattern of

  <myobject>
    <foo>bar</foo>
  </myobject>
is the root of most XML evil. Now you have to know if myobject is a container or a franken-object with a strict sub-schema in order to parse it. The biggest win of JSON is that .loads/.dump make it really obvious that it's for serializing complete objects where a lot of tooling surrounding XML makes you poke at the document tree.
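The tag-as-constructor reading maps directly onto code; a small sketch (MyObject is a made-up class) of how an element's attribute dict becomes keyword arguments:

```python
import xml.etree.ElementTree as ET

class MyObject:
    def __init__(self, foo):
        self.foo = foo

# Under the tag-is-a-complete-object reading, the attribute dict IS
# the constructor's keyword arguments.
node = ET.fromstring('<myobject foo="bar" />')
obj = MyObject(**node.attrib)
assert obj.foo == "bar"
```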
by Spivak
3/14/2026 at 4:46:51 PM
I've been working on an XML parser of my own recently and, to be honest, as long as you're fine with a non-validating parser (which is still compliant), it's really not that bad. You have to parse DTDs, but you don't need to actually _do_ anything with them. Namespaces are annoying, but they're not in the main spec. CDATA sections aren't all that useful, but they're easy to parse. As far as I'm aware, parsers don't actually need to handle xml:lang/xml:space/etc. themselves - they're for use by applications using the parser. Really the only thing that's been particularly frustrating for me is entity expansion.

If you want to support the wider XML ecosystem, with all the complex auxiliary standards, then yes, it's a lot of work, but the language itself isn't that awful to parse. It's a little messy, but I appreciate it at least being well-specified, which JSON is absolutely not.
by python-b5
3/14/2026 at 2:14:54 PM
Just gonna drop this here : ) https://docs.bablr.org/guides/cstml

CSTML is my attempt to fix all these issues with XML and revive the idea of HTML as a specific subset of a general data language.
As you mention one of the major learnings from the success of JSON was to keep the syntax stupid-simple -- easy to parse, easy to handle. Namespaces were probably the feature to get the most rework.
In theory it could also revive the ability we had with XHTML/XSLT to describe a document in a minimal, fully-semantic DSL, only generating the HTML tag structure as needed for presentation.
by conartist6
3/14/2026 at 4:37:55 PM
I unfortunately disagree that your syntax is "stupid-simple." But it highlights an impedance mismatch between XML users and JSON users.

JSON treats text as one of several equally-supported datatypes, and quotes all strings. Great if your data is heavily structured, and text is short and mixed with other types of data. Awful if your data is text.
XML and other SGML apps put the text first and foremost. Anything that's not text needs to be tagged, maybe with an attribute to indicate the intended type. It's annoying to express lots of structured, short-valued data. But it's simple and easy for text markup where the text predominates.
CSTML at first glance seems to fall into the JSON camp. Quoting every string literal makes plenty of sense in JSON, but not in the HTML/text-markup world you seem to want to play in.
by phlakaton
3/14/2026 at 5:21:51 PM
Yeah, "impedance mismatch" is a good way of putting it.

I wouldn't say we fall into the JSON camp at all, though, but quite squarely into the XML-ish camp! We just wrap the inner text in quotes to make sure there's no confusion between the formatting of the text stored IN the document and the formatting of the document itself. HTML is hiding a lot of complexity here: https://blog.dwac.dev/posts/html-whitespace/. We're actually doing exactly what the author of that detailed investigation recommends.
You can see how it plays out when CSTML is used to store an HTML document https://github.com/bablr-lang/bablr-docs/blob/1af99211b2e31f.... Having the string wrappers makes it possible to precisely control spaces and newlines shown to the user while also having normal pretty-formatting. Compare this to a competing product SrcML which uses XML containers for parse trees and no wrapper strings. Take a look at the example document here: https://www.srcml.org/about.html. A simple example is three screens wide because they can't put in line breaks and indentation without changing the inner text!
by conartist6
3/14/2026 at 5:30:56 PM
As to the simplicity of the syntax, I think you would understand what I mean if you were writing a parser.

It's particularly gratifying that you can easily interpret CSTML with a stream parser. XML cannot work this way because this particular case is ambiguous:
<Name
What does Name mean in this fragment of syntax? Is it the name of a namespace? Or the name of a node? We won't know until we look forward and see if the next character is `:`.

That's why we write `<Namespace:Name />` as `:Namespace: <Name />` - it means there's no point in the left-to-right parse at which the meaning is ambiguous. And finally, CSTML has no entity lookups, so there's no need to download a DTD to parse it correctly.
by conartist6
3/14/2026 at 4:47:56 PM
I realised the other day that some of my test code has 'jumped' rather than 'jumps' for the intended pangram. Glad to see I'm not alone. :^)
by Chaosvex
3/14/2026 at 4:59:34 PM
Haha yeah someone pointed that out to me and I decided to leave it. I just needed a sentence, I'm not actually trying to show off every glyph in a font.
by conartist6
3/14/2026 at 5:17:05 PM
That was my reasoning for not fixing it, too. Fair!
by Chaosvex
3/14/2026 at 5:19:51 PM
The problem is that engineers of data formats have ignored the concept of layers. With network protocols, you make one layer (Ethernet), you add another layer (IP), then another (TCP), then another (HTTP). Each one fits inside the last, but is independent, and you can deal with them separately or together. Each one has a specialty and is used for certain things. The benefits are 1) you don't need "a kitchen sink", 2) you can replace layers as needed for your use-case, 3) you can ship them together or individually.

I don't think anyone designs formats this way, and I doubt any popular formats are designed for this. I'm not that familiar with enterprise/big-data formats so maybe one of them is?
For example: CSV is great, but obviously limited, and not specified all that well. A replacement table data format could be binary (it's 2026, let's stop "escaping quotes", and make room for binary data). Each row can have header metadata to define which columns are contained, so you can skip empty columns. Each cell can be any data format you want (specifically so you can layer!). The header at the beginning of the data format could (optionally) include an index of all the rows, or it could come at the end of the file. And this whole table data format could be wrapped by another format. Due to this design, you can embed it in other formats, you can choose how to define cells (pick a cell-data-format of your choosing to fit your data/type/etc, replace it later without replacing the whole table), you can view it out-of-order, you can stream it, and you can use an index.
by 0xbadcafebee
3/14/2026 at 8:32:19 PM
> With network protocols, you make one layer (Ethernet), you add another layer (IP), then another (TCP), then another (HTTP). Each one fits inside the last, but is independent, and you can deal with them separately or together.
It looks neat when you illustrate it with stacked boxes or concentric circles, but real-world problems quickly show the ugly seams. For example, how do you handle encryption? There are arguments (and solutions!) for every layer, each with its own tradeoffs. But it can't be neatly slotted into the layered structure once and for all. Then you have things like session persistence, network mobility, you name it.
Data formats have other sets of tradeoffs pulling them in different directions, but I don't think that layered design would come near to solving any of them.
by inejge
3/14/2026 at 5:42:01 PM
Some early binary formats followed similar concepts. Look up Interchange File Format, AIFF, RIFF, and their applications, and all the file formats using this structure to this day.
by gmueckl
3/16/2026 at 7:23:04 AM
I would say that most of the video file formats today are a bit like that too: they allow different stream data encoding schemes, with metadata being the definition of a particular format (mostly to bring up a more familiar example that is not as generic).
by necovek
3/14/2026 at 7:37:20 PM
Have a look at Asset Administration Shells (AAS) -- it is a data exchange format built on top of JSON and XML (and RDF, and OPC UA and Protobuf, etc.).https://industrialdigitaltwin.org/
(Disclaimer: I work on AAS SDKs https://github.com/aas-core-works.)
by mristin
3/14/2026 at 9:41:05 PM
Eh, this escaping problem was basically solved ages ago. If we really wanted to make a UTF-8 data interchange format that needs minimal escaping, we already have ␜ (FS, File Separator, U+001C), ␝ (GS, Group Separator, U+001D), ␞ (RS, Record Separator, U+001E), and ␟ (US, Unit Separator, U+001F). The problem is that they suck to type out, so they suck for character-based interchange. But we could add them to that emoji keyboard widget on modern OSs that usually gets bound to <Meta> + <.>.
If we put those someplace people could easily type them, that would resolve the problem.
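A minimal sketch of what such a separator-based format could look like (purely illustrative; the encode/decode names are invented, and nothing here is a standard):

```python
# Illustrative sketch: ASCII control separators instead of CSV commas.
# No quoting or escaping machinery is needed, as long as field values
# never contain these control characters themselves.
RS = "\x1e"  # Record Separator: between rows
US = "\x1f"  # Unit Separator: between fields

def encode(rows):
    return RS.join(US.join(fields) for fields in rows)

def decode(blob):
    return [record.split(US) for record in blob.split(RS)]

rows = [["id", "note"], ["1", 'He said "hi", then left']]
assert decode(encode(rows)) == rows  # commas and quotes round-trip untouched
```

As the sibling comment points out, this only moves the in-band-signaling problem: a field that itself contains U+001E would still break it.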
But, binary data? Eh, that really should be transmitted as binary data and not as data encoded in a character format. Like not only not using Base64, but also not using a character representation of a byte stream like "0x89504E470D0A1A0A...". Instead you should send a byte stream as a separate file.
So we need a way to combine a bunch of files into a streaming, compressed format.
And the thing is, we already have that format. It's .tar.lz4!
by da_chicken
3/15/2026 at 7:43:47 PM
Row separator is great, until you find that someone has put one in a data field. Like your comment. It just moves the problem (control and data mixed together) to a less-used control character.
by adammarples
3/14/2026 at 3:19:43 PM
Constant erosion of data formats into the shittiest DSLs in existence is annoying. "Oh, hey, instead of writing Python, how about you write in:
* YAML, with magical keywords that turn data into conditions/commands
* a template language for the YAML, in places where that isn't enough
* ...Python, because you need to eventually write stuff that ingests the above either way
...ansible is great, isn't it?"
... and for some reason others decide "YES THIS IS AWESOME" and we now have a bunch of declarative YAML+template garbage.
> There was a thread here the other day about using Sqlite as an interchange format to REDUCE complexity. Look, I love Sqlite, as an application specific data-store. But much like XML it has a ton of capabilities, which is good for a data-store, but awful for an interchange format with multiple producers/consumers with their own ideas.
It's just a bunch of records put in tables with pretty simple data types. And it's trivial to convert into other formats while being compact and queryable on its own. So as far as formats go, you could do a whole lot worse.
by PunchyHamster
3/14/2026 at 4:33:38 PM
Basic dicts, arrays and templates might be the killer feature set for declarative data languages. If everyone coalesces to those eventually, it means there's something to it.
by gaigalas
3/14/2026 at 5:21:12 PM
One issue with SQLite is that it's _not_ rewritten every time like JSON and XML, so if you forget to vacuum it or round-trip it through SQL, you can easily leak deleted data in the binary file.
by 01HNNWZ0MV43FF
3/14/2026 at 4:39:55 PM
Funnily enough, XML was an attempt to simplify SGML so it is easier to parse (as SGML only ever had one compliant parser, nsgml).
by necovek
3/14/2026 at 5:50:44 PM
SGML has at least SP/OpenSP, sgmljs, and nsgml as full-featured, stand-alone parsers. There are also parsers integrated into older versions of products such as MarkLogic, ArborText, and other pre-XML authoring suites, renderers, and CMSs. Then there are language runtime libs such as SWI Prolog's, with a fairly complete basic SGML parser.
ISO 8879 (SGML) doesn't define an API or a set of required language features; it just describes SGML from an authoring perspective and leaves the rest to an application linked to a parser. It even uses that term for the original form of stylesheets ("link types", reusing other SGML concepts such as attributes to define rendering properties).
SGML doesn't even require a parser implementation to be able to parse an SGML declaration which is a complex formal document describing features, character sets, etc. used by an SGML document, the idea being that the declaration could be read by a human operator to check and arrange for integration into a foreign document pipeline. Even SCRIPT/VS (part of IBM's DCF and the origin of GML) could thus technically be considered SGML.
There are also a number of historical/academic parsers, and SGML-based HTML parsers used in old web browsers.
by tannhaeuser
3/15/2026 at 1:34:58 AM
What do you think about Apache Arrow binary formats in this context?by neonstatic
3/14/2026 at 2:03:45 PM
> Whereas XML supports attributes, namespaces, CDATA, DTDs, QNames, xml:base, xml:lang, XInclude, etc etc. They gave it everything, including the kitchen sink.
But you don't have to use all those things. Configure your parser without namespace support, DTD support, etc. I'd much rather have a tool with tons of capabilities that can be selectively disabled than a "simple" one that requires _me_ to bolt on said extra capabilities.
by xienze
3/14/2026 at 2:22:39 PM
It has the same problem as YAML: there are many, many ways to misconfigure your parser, and therein lie interesting security vulnerabilities. Complex DSLs are difficult to implement parsers for.
A simple DSL can be implemented in many programming languages very cheaply and can easily be verified against a specification. S-expressions are probably the most trivial language to write parsers for.
JSON is also pretty simple, but the spec being underspecified leads to ambiguous parsing (another security issue). In particular: duplicate key handling, key order, and array item order are not specified and different parsers may treat them differently.
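To make the duplicate-key point concrete: Python's stdlib parser, for example, silently keeps the last occurrence, and the only way to even notice the ambiguity is to hook the pair list yourself (a sketch, not a complete defense):

```python
import json

doc = '{"a": 1, "a": 2}'

# Python's json module keeps the last duplicate silently; other parsers
# may keep the first, raise an error, or preserve both.
assert json.loads(doc) == {"a": 2}

def reject_duplicates(pairs):
    # object_pairs_hook sees every key/value pair, duplicates included,
    # so we can refuse ambiguous documents instead of guessing.
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate keys: ambiguous document")
    return dict(pairs)

assert json.loads('{"a": 1}', object_pairs_hook=reject_duplicates) == {"a": 1}
```

Two conforming parsers can thus legitimately disagree about what `doc` means, which is exactly the interoperability gap described above.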
by catlifeonmars
3/14/2026 at 4:24:21 PM
If you do not go with DTD or XSD, you are only using an XML-lookalike language, as these are XML's mechanisms for actually defining the schema: a compliant parser won't be able to validate it, or maybe even to parse it.
Thus people go with custom parsers (how hard can it be, right?), and then have to keep fixing issues as someone or other submits an XML document with CDATA in it or similar.
by necovek
3/14/2026 at 6:28:55 PM
What if we just formalize some reasonable minimal subset, and call it something else?by zahlman
3/14/2026 at 2:16:21 PM
As a data interchange format, you can only depend on the lowest commonly implemented features, which for XML is the base XML spec. For example, Namespaces is a "recommendation", and a conformant XML parser doesn't need to support it.by cbm-vic-20
3/14/2026 at 3:02:18 PM
The problem comes when malicious actors start crafting documents with extra features that should not be parsed, but much software will wrongly parse them because it uses the default, full-featured parser. Or various combinations of this.
It's a pretty well understood problem, and best practices exist; not everyone implements them.
by smashed
3/15/2026 at 7:51:29 AM
The problem with this is that it only works as long as everyone instinctively knows that you don't use all the kitchen-sink stuff. It's there, but everyone knows you don't use it, because that way insanity lies.
And it works more or less OK until someone comes along who doesn't know that you don't use X, and it's in the standard, so your implementation isn't standards-compliant, and we'll go with your competitor over there instead because, unlike you, they do support it.
And so, over time, all the crap that "everyone knows" you don't use, gets activated and used. Speaking from experience here, not an invented edge case.
by pseudohadamard
3/14/2026 at 2:30:22 PM
I consider CSV to be a signal of an unserious organization. The kind of place that uses thousand-line Excel files with VBA macros instead of just buying a real CRM already. The kind of place that thinks junior developers are cheaper than senior developers. The kind of place where the managers browbeat you into working overtime by arguing from a single personal perspective that "this is just how business is done, son."
People will blithely parrot, "it's a poor workman who blames his tools." But I think the saying, as I've always heard it used to suggest that someone who is complaining is just bad at their job, is a backwards sentiment. Experts in their respective fields do not complain about their tools, but not because they are internalizing failure as their own fault. They don't complain because they insist on only using the best tools and thus have nothing to complain about.
by moron4hire
3/14/2026 at 3:20:23 PM
Ah, such youthful ignorance. You just classified probably every single bank in existence as an "unserious organization".
by PunchyHamster
3/14/2026 at 5:28:15 PM
Yep: healthcare, grocery, logistics, data science. Heck, it would be easier to list industries that DON'T have any CSV. There aren't many.
In terms of interchange formats, EDI (serialized as text or binary), CSV, XML, ASN.1, and JSON are extremely popular.
I 100% assure everyone reading that their personal information was transmitted as CSV at least once in the last week; but once is a very low estimate.
by Someone1234
3/14/2026 at 6:43:33 PM
They kind of actually are, though. Not because they use CSVs, but because, as an industry, they have not figured out how to reliably create, exchange, and parse well-formed CSVs.
by clhodapp
by clhodapp
3/14/2026 at 5:02:47 PM
Most people's salary transfers & healthcare offers literally run on a mix of CSV and XML! CSV is probably the most low-tech, stack-insensitive way to pass data, even these days.
(I run & maintain long term systems which do exactly that).
by thibaut_barrere
3/14/2026 at 4:46:47 PM
LOL, I chose a Google Sheet and CSV for my current project, and I'm very serious about it. It's a short-term solution, and it fits my needs perfectly.
by phlakaton
3/14/2026 at 4:54:36 PM
> The kind of place that thinks junior developers are cheaper than senior developers
…Unless the junior developers start accepting lower salaries once they become senior developers, that is a fact. Do you mean that they think junior developers are cheaper even when considering the cost per output, maybe?
by brabel
3/14/2026 at 6:49:19 PM
I believe they're referring to the fact that if almost all of your code is written by junior developers without mentorship, you will end up wasting a lot of your development budget because your codebase is a mess.
by clhodapp
3/14/2026 at 2:53:41 PM
Boy. Wait until you see how much of the world runs on Unix tabular columns.
by groundzeros2015
3/14/2026 at 4:10:41 PM
> XML supports attributes, namespaces, CDATA, DTDs, QNames, xml:base, xml:lang, XInclude, etc etc. They gave it everything, including the kitchen sink.
Ah, the old "throw a bag of nouns at the reader and hope he's intimidated" rhetorical flourish. These things are either non-issues (like QName), things a parser does for you, or optional standards adjacent to XML but not essential to it, e.g. XInclude.
by quotemstr
3/14/2026 at 11:45:12 PM
The parser does everything for me. It helpfully loads the external URL in an inline entity definition for me. Oops! All /etc/passwd!
There are two kinds of XML parsers: those which are secure and those which are correct.
by pocksuppet
3/14/2026 at 4:30:17 PM
> Ah, the old "throw a bag of nouns at the reader and hope he's intimidated" rhetorical flourish.
The accusation here is a deflection. OP's point isn't a gish gallop; it's that XML is absolutely littered with edge cases and complexities that all need to be understood.
> optional standards adjacent to XML but not essential
This is exactly OP's point. The standard is everything and the kitchen sink, except for all the bits it doesn't include, which are almost indistinguishable from the actual standard because of how widely used they are.
by maccard
3/14/2026 at 4:35:02 PM
XInclude isn't part of the standard, and IME a minority of systems support it anyway. The OP's comment is an obvious gish gallop. You can assemble a similarly scary noun list for practically any technology.
Probably the same kind of person who tries to praise JSON's lack of comments as a feature or something.
by quotemstr
3/14/2026 at 7:25:28 PM
> things a parser does for you
IME there are two kinds of XML implementations: ones that handle DTDs and entity definitions for you and are insecure by default (XXE and SSRF vulnerabilities), and ones that don't and reject valid XML documents.
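A crude sketch of the defensive half of that tradeoff, using Python's stdlib (the DTD pre-check here is a blunt instrument that also rejects some valid documents, which is exactly the point; real code should use a hardened parser such as the defusedxml package):

```python
import xml.etree.ElementTree as ET

# Classic XXE payload shape: an external entity pointing at a local file.
EVIL = """<?xml version="1.0"?>
<!DOCTYPE r [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<r>&xxe;</r>"""

def parse_hardened(text):
    # Blunt defense sketch: refuse any document that declares a DTD,
    # since that's where entity definitions (and thus XXE) live.
    # This also rejects perfectly valid documents that use a DTD.
    if "<!DOCTYPE" in text:
        raise ValueError("DTDs not allowed")
    return ET.fromstring(text)

assert parse_hardened("<r>ok</r>").tag == "r"
try:
    parse_hardened(EVIL)
    assert False, "should have been rejected"
except ValueError:
    pass
```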
by thayne
3/14/2026 at 1:58:04 PM
Author here. I agree with all this, and I think it's important to note that nothing precludes you from doing a declarative specification that looks like imperative math notation, but it's also somewhat beside the point. Yes, you could make your own custom language, but then you have created the problem that the article is about: you need to port your parser to every single place you want to use it.
That's to say nothing of all the syntax decisions you have to make now. If you want to do infix math notation, you're going to be making a lot of choices about operator precedence. The article is using a lot of simple functions to explain the domain, but we also have switch statements—how are those going to be expressed? Ditto functions that don't have a common math notation, like stepwise multiply. All of these can be solved, but they also make your parser much more complicated and create a situation where you are likely to only have one implementation of it.
If you try to solve that by standardizing on prefix notations and parenthesis, well, now you have s-expressions (an option also discussed in the post).
That's what "cheap" means in this context: There's a library in every environment that can immediately parse it and mature tooling to query the document. Adding new ideas to your XML DSL does not at all increase the complexity of your parsing. That's really helpful on a small team! I agonized over the word "cheap" in the title and considered using something more obviously positive like "cost-effective" but I still think "cheap" is the right one. You're making a cost-cutting choice with the syntax, and that has expressiveness tradeoffs like OP notes, but it's a decision that is absolutely correct in many domains, especially one where you want people to be able to widely (and cheaply) build on the thing you're specifying.
by alexpetros
3/15/2026 at 4:07:43 PM
But there are already multiple existing configuration languages that are far more legible and robust than custom languages implemented on top of XML. Take Nickel. This:
let
totalOwed = totalTax - totalPayments,
totalTax = tentativeTaxNetNonRefundableCredits + totalOtherTaxes,
totalPayments = totalEstimatedTaxesPaid +
totalTaxesPaidOnSocialSecurityIncome +
totalRefundableCredits,
in
totalPayments
is easy to read, unlike XML. It's written in a small configuration language that's easy to learn. It's pure and declarative. It handles complex configurations well. It provides tools to quickly pinpoint configuration errors. It can be integrated into existing software and workflows. Compared to bespoke languages built on top of XML, it's an improvement in every way conceivable.
There are also a variety of other languages to choose from. Using a bespoke XML-based language will inflict needless suffering upon people.
by soraminazuki
3/14/2026 at 4:34:12 PM
You are right that your other examples (like s-expressions) are actually better than going with a fully custom language.
But as you note elsewhere, you were benefiting from the schema (DTD or XSD) being done elsewhere, which provided at least some validation: in my experience, building this layer (either in code or with a new DTD/XSD) without a proper XML schema is the hardest part of doing XML well.
By ignoring this cost, it appeared much cheaper than it really is.
I also think including proper XML parsing libraries (which are sometimes huge) is not always feasible either (think embedded devices, or even if you need to package it with your mobile app, the size will be relatively big).
by necovek
3/14/2026 at 11:49:58 PM
But your XML document also has syntax! You just pushed it up one level of abstraction.
Your proto-math XML dialect of:
<subtract><minuend>5</minuend><subtrahend>3</subtrahend></subtract>
instead of: 5-3
still has higher level syntax. What does: <subtract><minuend>5</minuend><subtrahend>i</subtrahend></subtract>
mean? Is it a syntax error? Or does it subtract imaginary numbers? What about exponential notation?
You will have a parser anyway, whether you like it or not. Given that, perhaps "5-3" is the simpler notation after all, even though it requires a specialized (albeit trivial) parser to be carried along with it.
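Even the "trivial" parser forces decisions, though. A sketch of just how small (and how opinionated) such a parser is — integers only, binary minus only, no unary minus, no floats:

```python
import re

def parse_subtraction(text):
    # Toy grammar for "5-3" style input. Even this forces explicit
    # decisions (unary minus? whitespace? exponents?) that an XML
    # encoding sidesteps by making the structure explicit.
    m = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", text)
    if not m:
        raise SyntaxError(f"not a subtraction: {text!r}")
    return int(m.group(1)) - int(m.group(2))

assert parse_subtraction("5-3") == 2
```

With this grammar, "5-i" is simply a syntax error; a different author might instead decide it means complex arithmetic, which is the ambiguity the comment is pointing at.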
by xorcist
3/15/2026 at 12:15:19 AM
Quick aside: some Dutch folks did a more language-y DSL for tax codes, which might be of interest. I don't know if it is still being used, though.
https://resources.jetbrains.com/storage/products/mps/docs/MP...
by jacques_chester
3/14/2026 at 2:05:14 PM
Why did you hardly engage with the article on the subject of schema-driven validation?
3/14/2026 at 2:12:25 PM
This is a good question! We do it, it works, and it's definitely an advantage of XML over alternatives. I just personally haven't had the time to dig in and learn it well enough to write a blog post about it. In practice I think people update the Fact Dictionary largely based on pattern matching, so that's what I focused on here.
by alexpetros
3/14/2026 at 2:59:04 PM
I used XML and XPath a lot in the early 2000s when they were popular, and I never wrote or learned about schema validation. It's totally optional, and I never found a need for it.
It's probably helpful for "standard data interchange between separate parties" use cases; in what I was doing, I totally controlled the production and the interpretation of the XML.
by SoftTalker
3/14/2026 at 4:49:58 PM
For this application, where you might have a lot of authors and apps working with the rule data, I think schema-based validation at some level is going to be a must if you don't want to end in sorrow.
by phlakaton
3/14/2026 at 1:44:47 PM
> XML Is a Cheap [...]
> XML is notoriously expensive to properly parse in many languages.
I'm glad this is the top comment. I have extensive experience in enterprise-y Java and XML and XML is anything but cheap. In fact, doing anything non-trivial with XML was regularly a memory and CPU bottleneck.
by petcat
3/14/2026 at 1:52:15 PM
That's if you parse it into a DOM and work on that. If you use SAX parsing, the memory footprint is much better.
But of course, working with SAX parsing is yet another, very different, bag of snakes.
I still wish that JSON parsing had the same support for stream processing as XML (I know that there are existing solutions for that, but it's much less common than in the XML world).
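For illustration, a minimal SAX-style handler in Python's stdlib; the element name "fact" is just an example choice, not anything from the article:

```python
import xml.sax

class FactCounter(xml.sax.ContentHandler):
    """Counts <fact/> elements without ever building a DOM,
    so memory use stays flat regardless of document size."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag as the stream is consumed.
        if name == "fact":
            self.count += 1

handler = FactCounter()
xml.sax.parseString(b"<facts><fact/><fact/><fact/></facts>", handler)
assert handler.count == 3
```

The "bag of snakes" part is that all state you care about (ancestors, accumulated text, etc.) has to be tracked by hand across callbacks, rather than read off a tree.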
by diffuse_l
3/14/2026 at 2:16:42 PM
In the context of the article, "cheap" means "easy to set up" not "computationally efficient." The article is making the argument that there are situations in which you benefit from sacrificing the latter in favor of the former. You're right that it's annoyingly slow to parse though and that does cause issues I'd like to fix.by alexpetros
3/14/2026 at 5:24:59 PM
If you want a parser that actually checks the XML spec and various edge cases, then parsing goes from human-readable config to O(n^2) string handling. The funny part is how often people silently accept partial or broken XML in prod because revisiting schema validation years later is a nightmare. If you want cheap parsing, you end up writing a regex or DOM walker and hoping for the best, which raises the question of why not just use JSON or invent a different DSL to start with.
by hrmtst93837
3/14/2026 at 7:52:02 PM
(Properly formatted) XML can be parsed, and streamed, by a visibly-pushdown automaton[1][2].
"Visibly Pushdown Expressions"[3] can simplify parsing with a terse syntax styled after regular expressions, and there's an extension to SQL which can query XML documents using VPAs[4].
JSON can also be parsed and validated with visibly pushdown automata. There's an interesting project[5] which aims to automatically produce a VPA from a JSON-schema to validate documents.
In theory these should be able to outperform parsers based on deterministic pushdown automata (i.e., (LA)LR parsers), but they're less widely used and understood, as they're much newer than the conventional parsing techniques and absent from the popular literature (Dragon Book, EAC, etc.).
[1]:https://madhu.cs.illinois.edu/www07.pdf
[2]:https://www.cis.upenn.edu/~alur/Cav14.pdf
[3]:https://homes.cs.aau.dk/~srba/courses/MCS-07/vpe.pdf
[4]:https://web.cs.ucla.edu/~zaniolo/papers/002_R13.pdf
[5]:https://www.gaetanstaquet.com/ValidatingJSONDocumentsWithLea...
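For anyone else new to the idea, the "visibly" part just means the input symbol alone dictates the stack action: open tags always push, close tags always pop, and everything else leaves the stack untouched. A toy sketch of that property on XML-ish tokens (the tokenization is invented for illustration):

```python
def well_nested(tokens):
    """Check tag nesting with a stack whose push/pop is fully
    determined by the input symbol - the visibly-pushdown property
    that makes streaming XML-like input tractable."""
    stack = []
    for t in tokens:
        if t.startswith("</"):
            # Return symbol: always pop, and the names must match.
            if not stack or stack.pop() != t[2:-1]:
                return False
        elif t.startswith("<"):
            # Call symbol: always push the tag name.
            stack.append(t[1:-1])
        # Internal symbols (text) never touch the stack.
    return not stack

assert well_nested(["<a>", "<b>", "hi", "</b>", "</a>"])
assert not well_nested(["<a>", "</b>"])
```

Because the stack discipline is visible in the input, two such automata can be intersected or complemented, which is what makes the schema-to-VPA validation work in [5] possible.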
by sparkie
3/14/2026 at 8:32:22 PM
Without looking, I guessed that all your quotes come from academic papers. I was right.
Because real life is nothing like what is taught in CS classes.
by g947o
3/14/2026 at 8:40:13 PM
I'm not an academic and have extensive experience with parsing.But for whataver reason, VPAs have slipped under my radar until very recently - I only discovered them a few weeks ago and have been quite fascinated. Have been reading a lot (the citations I've given are some of my recent reading), and am currently working on a visibly pushdown parser generator. I'm more interested in the practical use than the acamedic side, but there's little resources besides academic papers for me to go off.
Thought it might be interesting to share in case others like me have missed out on VPAs.
by sparkie
3/14/2026 at 1:50:45 PM
Yup. SAP and their glorious IDocs with German acronyms.
by bubbleRefuge
3/14/2026 at 2:12:12 PM
Much of XML’s complexity derives from either the desire to be round-trip compatible with any number of existing character and data encodings or the desire to be largely forward-compatible with SGML.A parser that only had to support a specified “profile” of XML (say, UTF-8 only, no user-defined entities or DTD support generally) could be much simpler and more efficient while still capturing 99% of the value of the language expressed by this post.
by twoodfin
3/14/2026 at 2:20:20 PM
That's beside the point of this post. You're welcome to enforce such a profile on your documents, but the point of this post is the ease of throwing the whole ecosystem of out-of-the-box XML tools at it, tools which don't assume any such profile.
(Now ITOT they may have implicit or explicit profiles of their own, e.g. where safe parsing, validation, and XSLT support are concerned, but they have a large overlap.)
by phlakaton
3/14/2026 at 2:54:09 PM
Indeed, I was agreeing that the XML ecosystem as currently constituted has all the problems necovek pointed out.
But the W3C might have made some different choices in what to prioritize—notably, identifying a common "XML: The Good Parts" profile and providing the standards infrastructure for tools to support such a thing independent of more esoteric alternatives for more specialized use cases like round-tripping data from French mainframes.
Instead they chased a variety of coherent but insufficiently practical ideas (the Semantic Web), alongside design-by-committee monsters like XHTML, XSLT (I love this one, but it’s true), and beyond.
by twoodfin
3/14/2026 at 2:58:43 PM
Your first counterpoint seems unnecessarily picky.> So while it is a suitable DSL for many things (it is also seeing new life in web components definition), we are mostly only talking about XML-lookalike language, and not XML proper. If you go XML proper, you need to throw "cheap" out the window.
But the TWE did not embrace all that stuff. It’s not required for its purpose. And to call it “xml lookalike” on that basis seems odd. It’s objectively XML. It doesn’t use every xml feature, but it’s still XML.
It’s as if you’re saying, a school bus isn’t a bus, it’s just a bus-lookalike. Buses can have cup holders and school buses lack cup holders. Therefore a school bus is not really a bus.
I don’t see the validity or the relevance.
by PantaloonFlames
3/14/2026 at 4:45:32 PM
As discussed in the thread, the author has not dived deep into schema validation, but the org does use it.
Ignoring that part (schema definition and subsequent validation) is exactly why it seems "cheap" on the surface.
So, TWE is not using an XML lookalike language, but someone has done the expensive part before the author joined in.
by necovek
3/15/2026 at 4:15:36 PM
I shipped 20MB of XML with a product back in 2014; we loaded it at startup, validated it against the XSD, and the performance for this use case was fine. It was big because we did something kinda like what TFA suggests: I designed a declarative XML "DSL" and then wrote a bunch of "code" in it. We had lots of performance problems in that project, but the XML DSL wasn't the cause of any of them; that part was fine. I think "expensive" can mean a lot of different things. It was cheap in terms of development time, and the loading/validation time, even on 20MB of XML, was not a problem. Visual Studio ships a tool that generates C# classes from the XSDs, which was handy. I just wrote the XSDs and the framework provided the parsing, validation, node classes, and tree construction. This is as "XML proper" as I think it's possible to get.
I don't believe that .NET's XML serializer uses any of the open source projects mentioned in your post, so maybe we just have especially good XML support in .NET. I think Java has its own XML serializer, too. I bet most XML generated and consumed in the world is not via one of those three open source C/C++ libraries. I think Java alone might be responsible for more than half of it.
by electroly
3/14/2026 at 2:14:13 PM
Unless you are compiling really large systems of DSL specification, speed of parsing is not the operation you want to be optimizing. XML for this use case, even if you DOM it, is plenty fast.
What are more concerning are the issues that result in unbounded parses, but there are several ways to control for those.
by phlakaton
3/14/2026 at 2:20:07 PM
> XML for this use case, even if you DOM it, is plenty fast.
This mindset is why we now have computers that are three+ orders of magnitude faster than a C64 yet have worse latency.
by Hendrikto
3/14/2026 at 2:23:55 PM
Interesting that you should complain about that with a legacy technology that's almost 30 years old (or 50 years old if you count SGML). In particular, XML has gotten no more complex or slow than it was 20 years ago, when development largely stopped.
For this application it's plenty fast. Even if you've got a Pentium machine.
by phlakaton
3/14/2026 at 1:03:30 PM
FWIW, this is also one of the reasons MathML has never become the "input" language for mathematics, and the layout-focused (La)TeX remains the de-facto standard.Ergonomics of input are important because they increase chances of it being correct, and you can usually still keep it strict and semantic enough (eg. LaTeX is less layout-focused than Plain TeX)
by necovek
3/14/2026 at 2:27:02 PM
But there, as with any DSL, you are trading-off ease of expression with ease of processing (e.g. interiperability). Every embedded DSL, XML included, chooses some amount of ease of processing.by phlakaton
3/14/2026 at 2:47:51 PM
MathML is used a lot in standards/publishing, such as with JATS and EPUB. MathML is also natively supported in the HTML specification.
by rhdunn
3/14/2026 at 2:41:58 PM
You don't even need to specify a DSL to make that code declarative. It can be real code that's manipulating expression objects instead of numbers (though not in JavaScript, where there's no operator overloading), with the graph of expression objects being the result.
by twic
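A sketch of the expression-object approach from the comment above, in Python (all class and variable names are invented for illustration): operator overloading lets ordinary-looking arithmetic build an expression graph instead of computing numbers.

```python
class Expr:
    # Overloaded operators return graph nodes, not numeric results.
    def __add__(self, other): return Op("+", self, as_expr(other))
    def __sub__(self, other): return Op("-", self, as_expr(other))

class Var(Expr):
    def __init__(self, name): self.name = name
    def __repr__(self): return self.name

class Op(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def __repr__(self): return f"({self.left} {self.op} {self.right})"

def as_expr(x):
    # Wrap plain values so mixed expressions still build a graph.
    return x if isinstance(x, Expr) else Var(str(x))

totalTax = Var("totalTax")
totalPayments = Var("totalPayments")
totalOwed = totalTax - totalPayments      # looks imperative...
assert repr(totalOwed) == "(totalTax - totalPayments)"  # ...yields a graph
```

The resulting graph can then be evaluated, reordered, serialized, or translated, which is what makes the "real code" double as a declarative specification.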
3/14/2026 at 2:19:01 PM
That's a strange comment... "Cheap" here is semantically different from "cheap" in the article. Here it means "how hard it hits the CPU", and in the article it's "how hard it is to specify and widely support your DSL".
You also posted a piece of code that the author himself acknowledged is not bad, and omitted the one pathological example where implementation details leak when translating to JavaScript.
It just seems like you didn't approach reading the article willing to understand what the author was trying to say, as if you already decided the author is wrong before reading.
by gchamonlive
3/14/2026 at 4:48:07 PM
Nope: "not cheap" in my comment means expensive to implement. Defining the XML schema, which has been done by someone else, and then using that schema properly, is what makes the use of XML expensive (it is a lot of things to learn for more than one engineer on the team).
3/15/2026 at 2:55:32 PM
I misunderstood that part of the comment, sorry about that.
by gchamonlive
3/14/2026 at 2:28:46 PM
Some people just comment on the title. Maybe that's what happened here.by phlakaton
3/14/2026 at 1:37:01 PM
While this can give a notation for the domain, you'd still need an engine to process it. Prolog+CLP(FD) perhaps meets it well (I'm not too familiar with the tax domain), and one could perhaps paraphrase Greenspun's tenth rule to this combo too.
3/14/2026 at 1:04:55 PM
> and have two axes for adding metadata: one being the tag name, another being attributes
Yes, let's not even get started on implementations that do <something value="value"></something>
by raverbashing
3/14/2026 at 9:02:41 PM
> The main property of SGML-derived languages is that they make "list" a first class object, and nesting second class (by requiring "end" tags) ...
I think you're missing the forest for the trees ;)
The major point of SGML in this context is that elements have content models defined by regular expressions, just like any other grammar productions eg. BNF.
by tannhaeuser
3/14/2026 at 4:08:31 PM
> The main property of SGML-derived languages is that they make "list" a first class object, and nesting second class (by requiring "end" tags),
As opposed to JSON, which famously lacks lists? What does "second class" even mean here? How is having an end-indicator somehow a demotion?
> talking about XML-lookalike language, and not XML proper. If you go XML proper, you need to throw "cheap" out the window.
libxml2 and expat are plenty fast. You can get ~120MB/s out of them, and that's nowhere near the limit. Something like pugixml or VTD can go faster once you've detected you're not working with some kind of exotic document with DTD entities.
by quotemstr