4/17/2026 at 2:29:51 AM
I read this article a week or so ago and immediately implemented a VS Code extension that I've always wanted: a static analysis tool for targets pipelines. targets is an R package that provides Make-like pipelines for data science and analysis work. You write your pipeline as a DAG, and targets orchestrates the analysis, re-running downstream nodes only when upstream ones are invalidated and their output changes. Fantastic tool, but at a certain level of complexity the DAG becomes a bit hard to navigate and reason about ("wait, what targets are downstream of this one again?"). This isn't really a targets problem, since it happens with any analysis of decent complexity, but the structure targets adds to the analysis actually allows for a decent amount of static analysis of the environment and code. Enter tree-sitter. I wrote a VS Code extension that analyzes the pipeline and provides useful hover information (size, time last invalidated, computation time for that target, and children/parent info), as well as links to quickly jump to different targets and their children/parents. I've dogfooded the hell out of it, and it's already vastly improved my targets workflow within a week. Things like better error hints in the IDE for targets-specific malformed inputs and showing which targets are emitting errors take a lot of the friction out of an analysis.
All that to say: nice work on extending tree-sitter to R!
tarborist: targets + tree-sitter https://open-vsx.org/extension/tylermorganwall/tarborist
by tylermw
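For readers unfamiliar with targets: a pipeline is declared as a list of `tar_target()` calls in a `_targets.R` file, and the package infers the DAG from which targets reference which. A minimal sketch (the file path and column names here are hypothetical, not from the extension or the thread):

```r
# _targets.R -- minimal sketch of a targets pipeline.
# Each tar_target() is a node; dependencies are inferred from the
# variable names used inside each command.
library(targets)

list(
  tar_target(raw_data, read.csv("data/tweets.csv")),  # upstream node
  tar_target(cleaned,  na.omit(raw_data)),            # depends on raw_data
  tar_target(stats,    summary(cleaned))              # depends on cleaned
)
```

Running `tar_make()` executes the pipeline and caches each target's output; `tar_visnetwork()` renders the DAG, which is the structure the extension described above can analyze statically.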
4/17/2026 at 6:53:14 AM
I only dabble in data analysis. I scratch the surface of what R can do, and my most complicated analysis fits in 100 or so lines of code I manage manually rather than with the help of tools like targets. What sort of work do you do where you get to play around with fun tools like that?
by kqr
4/17/2026 at 11:30:51 AM
It's not necessarily the number of lines that motivates these tools. Say you're running an NLP pipeline where you want to do sentiment analysis on a large text corpus (tweets, for example) and then relate sentiment over time to some other variables. Each of those steps might only be a dozen lines of code, but the sentiment analysis might take a non-negligible amount of time. If you can avoid rerunning it when only the later analysis has changed, that can save you considerable time while iterating on the second step of the analysis. The old-fashioned way to do this in R is to use the REPL and only rerun the lines of the script that have changed, with the earlier part staying in the environment. But it's easy to make mistakes doing it manually that way; having the computer track what has changed and needs to be rerun is much less error-prone.
by CrazyStat
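The two-step workflow described above maps directly onto a targets pipeline. A sketch, where `score_sentiment()` and `fit_model()` are hypothetical stand-ins for the real analysis functions:

```r
# _targets.R -- sketch of the slow-then-cheap pipeline described above.
library(targets)

list(
  tar_target(corpus,    readRDS("tweets.rds")),     # raw text corpus
  tar_target(sentiment, score_sentiment(corpus)),   # slow step, cached on disk
  tar_target(model,     fit_model(sentiment))       # cheap step, iterated often
)
```

Editing only `fit_model()` invalidates just the `model` target, so a subsequent `tar_make()` skips `corpus` and `sentiment` and reruns only the final step; `tar_outdated()` reports which targets would rerun before you commit to it.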
4/17/2026 at 1:46:21 PM
Yes, the main benefit is caching and reproducibility: with targets (or any other DAG-based approach), you only recompute what needs to be recomputed, and you are assured that no stale inputs or temporary analysis artifacts end up in the final product. That matters especially if you don't own the underlying data sources and they can change at any point.
by tylermw
4/17/2026 at 8:55:53 AM
Long time lurker on HN but this totally deserves my first (edit: second) ever post. Looks amazing, thank you!
by adamalt
4/18/2026 at 1:55:13 AM
Honored!
by tylermw
4/17/2026 at 11:45:47 AM
It has been a lot of fun watching you iterate on this via bluesky updates!
by davisvaughan
4/17/2026 at 1:40:07 PM
Thanks for all the work you (and the rest of the contributors) have done putting this together! I think bringing tree-sitter to R has already shown massive benefits: Air alone has been a big improvement to my workflow.
by tylermw
4/17/2026 at 4:06:25 PM
What is the advantage of targets over nextflow or snakemake?
by kjkjadksj