alt.hn

3/31/2025 at 6:32:28 PM

LLM Workflows then Agents: Getting Started with Apache Airflow

https://github.com/astronomer/airflow-ai-sdk

by alittletooraph2

4/1/2025 at 9:26:16 AM

Nice to see some workflow engine action on Hacker News! :-)

I'm currently building pgflow, which is a simple, postgres-first engine that uses task queues to perform real work.

It has an explicit DAG approach, strong type safety, a nice DSL in TypeScript, and a dedicated task queue worker that allows it to run solely on Supabase without any external tools.

I'm super close to the alpha release, if you guys want more info, check out the readme for SQL core (https://github.com/pgflow-dev/pgflow/tree/main/pkgs/core#rea...) or my Twitter (https://x.com/pgflow_dev).

Hope that grabs someone's attention :-) Cheers

by jumski

4/1/2025 at 10:00:29 AM

Exactly what I was looking for without even knowing it :) EDIT: well, I knew I needed something like this, but I thought I'd have to build a very rudimentary version myself. Thank you for saving me tons of time on my project.

by CjHuber

4/1/2025 at 12:13:33 PM

Thanks! That's the reason I'm building it - I needed something like this but there was nothing available.

I'm very close to releasing an alpha, will post here when ready!

by jumski

3/31/2025 at 10:03:07 PM

this is really cool!

That said, my impression is that Airflow is a really dated choice for a greenfield project. There isn't a clear successor, though. I looked into this recently and was quickly overwhelmed by Prefect, Dagster, Temporal, and even newer ones like Hatchet and Hamilton.

Most of these frameworks now have docs / plugins / sister libraries geared around AI agents

It would be really helpful to read a good technical blog surveying the design patterns across these different approaches, with thoughts on how to fit things together into a pipeline given the various quirks of LLMs (e.g. nondeterminism).

This page is a good start, even if it is written as an airflow-specific how-to!

by mushufasa

3/31/2025 at 11:06:02 PM

Dated doesn’t mean bad (usually the opposite in my experience!) What issues do you have with Airflow?

by Hasz

4/1/2025 at 12:59:14 AM

Here's my problem with MWAA (Amazon-hosted Airflow). I have about 100 DAGs, which maxes out the scheduler thread. Airflow parses all the files every minute, so it's always parsing at around 94% CPU. I could run a second scheduler thread if I coordinated with my SRE team and got the Terraform deployed... it's really tedious.
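
For reference, that parsing pressure is tunable. A sketch of the relevant `airflow.cfg` scheduler settings (values are illustrative, not recommendations; on MWAA these are set as configuration overrides rather than by editing the file):

```ini
[scheduler]
# Seconds between re-parses of each DAG file (default 30).
min_file_process_interval = 300
# Seconds between scans of the DAGs directory for new files (default 300).
dag_dir_list_interval = 300
# Number of DAG-parsing processes; raising this trades CPU for parse latency.
parsing_processes = 2
```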

Possibly related: my DAGs get kill -9'd for no apparent reason. RAM usage is not that high (maybe 2 GB of the 8 GB system RAM in use), and no reason is given in the logs.

I am trying to switch to Dagster, not because it's awesome, but because it hasn't crashed randomly on me.

by gre

4/1/2025 at 1:54:50 PM

This feels like an MWAA issue but I understand how that often gets conflated with it being an Airflow issue.

by alittletooraph2

4/1/2025 at 3:00:55 PM

You're right, it doesn't happen when developing locally, only in MWAA. This was the answer given by the Airflow team as well, and I figured they would punt before I asked.

I realize Amazon is taking an open source project and making a ton of money on it (the instance prices are ridiculous for what you get), and the incentives are misaligned: the Airflow team has little reason to help AWS make it better unless AWS pays them to fix it.

It's crap all around, and Airflow gets a bad rap from AWS's terrible MWAA product based on it.

by gre

4/1/2025 at 3:04:09 AM

MWAA is hot garbage. I had similar issues and switched to running it on EKS instead.

by mblast311

4/1/2025 at 3:20:54 AM

> What issues do you have with Airflow?

Their operational perspective is catastrophic; how does one view the logs for a DAG through the UI[1]? Why can't it store the Python in the database attached to the deployment, instead of making me jump through 80,000 hoops to put the files in the right magic directory on the disk of every worker[2]?

1: no, not <https://airflow.apache.org/docs/apache-airflow/stable/ui.htm...> I mean the log, you know, like in the old days of $(tail -f /var/log/the.thing). I'm open to the answer hiding somewhere in this gobbledygook <https://airflow.apache.org/docs/apache-airflow/stable/admini...> but who is the target audience for having such a fancy UI and omitting log viewing from it, doubly so if there's some alleged http server just for viewing logs

2: https://airflow.apache.org/docs/apache-airflow/stable/core-c... and double-plus-good anytime python software mentions PYTHONPATH -- that's how you know you're in for a hot good time https://airflow.apache.org/docs/apache-airflow/stable/admini...
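
For what it's worth, the "magic directory" and the log locations are configurable paths; a minimal sketch of the relevant `airflow.cfg` settings (paths are illustrative):

```ini
[core]
# Every scheduler and worker must see the same files at this path.
dags_folder = /opt/airflow/dags

[logging]
# Task logs land here, organized per DAG/run/task; remote_logging can
# ship them to S3/GCS instead of local disk.
base_log_folder = /opt/airflow/logs
remote_logging = False
```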

by mdaniel

4/1/2025 at 3:50:21 AM

We deploy on K8s in OpenStack from a scheduled GitHub Actions pipeline that aggregates DAGs into a new container build based on hashes of hashes. This works well with almost no intervention.

WRT your 1, above: any DAG output to stdout/stderr is available via the Logs tab from the graph view of the individual tasks. Almost all our DAGs use the PythonOperator, though; I'm not sure if that standardises this for us and your experience is muddied by more complexity than we currently have.

WRT 2: we generate an uber requirements.txt by running pipreqs from the pipeline and install everything in the container automatically. Again, no issues currently, although we do need to manually add the installation of test libraries to the pipeline job, as for some reason auto-discovery is flakier for unit-test frameworks.
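
The "hashes of hashes" rebuild check described above can be sketched in a few lines of Python (the file layout and the surrounding pipeline steps are assumptions, not the commenter's actual setup):

```python
import hashlib
from pathlib import Path

def aggregate_hash(dag_dir: str) -> str:
    """Hash each DAG file, then hash the sorted per-file hashes, so the
    result changes if and only if some file's content changes."""
    file_hashes = sorted(
        hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(dag_dir).rglob("*.py")
    )
    return hashlib.sha256("".join(file_hashes).encode()).hexdigest()

# The pipeline would then rebuild the DAG container only when the
# aggregate hash differs from the one recorded at the last build, e.g.:
#   if aggregate_hash("dags/") != last_built_hash: trigger_image_build()
```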

by 6LLvveMx2koXfwn

4/1/2025 at 2:59:05 AM

I'd be curious if this scratches your itch:

https://www.dbos.dev/blog/durable-execution-crashproof-ai-ag...

by jedberg

4/1/2025 at 8:27:55 AM

Pleasantly surprised to see the name Mike Stonebraker in the About Us.

by hbarka

4/1/2025 at 4:04:10 PM

About to jump into an eng meeting with him right now!

by jedberg

4/1/2025 at 12:14:45 AM

This space is honestly a mess. I did an in-depth survey around 1.5 years ago, and my eventual conclusion was just to build with Airflow.

You either get simplicity, with the caveat that your systems need to align perfectly with the tool,

or you get complexity that works with basically anything (Airflow).

by bashfulpup

4/1/2025 at 12:57:55 AM

Would be interested to know what drawbacks you found with Dagster or Prefect.

by febed

4/2/2025 at 9:10:51 PM

The other guy said it right. These tools work and are fine, but you lose the legacy ecosystem. If you know your limits and where the eventual system will end up, they're great and probably better.

If you are building an expandable, long-term system and you want all the goodies baked in, choose Airflow.

It's pretty much the same as any architecture choice: ugly/hard often means control and features; pretty/easy means less of both.

On the surface the differences are not very noticeable, other than the learning curve of getting started.

by bashfulpup

4/1/2025 at 1:47:52 AM

Prefect is amazing. I built out an ETL pipeline system with it at my last job and would love to get it incorporated into the current one, but unfortunately we have a lot of legacy stuff in Airflow. Being able to debug stuff locally was amazing, and the K8s integration is super clean.

by jt_b

4/1/2025 at 2:28:18 AM

+1 to this. Other solutions over-promise, under-deliver, have poor developer relations and communication, and run "open-source, but pay us" style open source. It is indeed a mess.

by nikolayasdf123

3/31/2025 at 9:34:18 PM

Truthfully have been a little skeptical of how many workloads will actually need “agents” vs doing something totally deterministic with a little LLM augmentation. Seems like I’m not the only one that thinks the latter works a lot of the time!

by itsallrelative

4/1/2025 at 12:45:54 AM

Extremely bearish on existing tools solving agentic workflows well. If anyone does, it will be Temporal. Airflow and the like simply were not designed for highly dynamic execution, and so have all sorts of annoyances that will make them lose.

by ldjkfkdsjnv

4/1/2025 at 1:13:06 AM

Temporal’s great! That being said, there is something to being able to orchestrate LLMs and agents with what many already use to orchestrate their data workflows, because the reliability, scalability, observability, etc. are already proven out. I’m sure there are boundary conditions for really advanced agentic workflows, though…

by alittletooraph2

4/1/2025 at 7:16:15 AM

Temporal is for a static graph with idempotent nodes. Powerful LLM workflows don’t fit this model.

by acchow

4/1/2025 at 1:46:31 PM

Temporal is absolutely not for a static graph, idempotent nodes yes. Please explain your argument more

by ldjkfkdsjnv

4/3/2025 at 2:00:44 AM

> Temporal is absolutely not for a static graph

I'd clarify this to say "Temporal is absolutely not limited to a static graph." It can certainly handle a static graph, but it can also handle a dynamic one. Here is an example in Go (https://github.com/temporalio/samples-go/tree/main/choice-mu...), there are similar ones for other languages.

I think the confusion might stem from the determinism requirement in Temporal (and other replay-based Durable Execution platforms). It's not the Workflow Definition (i.e., the code) that must be deterministic, it's the Workflow Execution (i.e., a specific running instance of that code) that must be deterministic. Each running instance is allowed to take a different path through that code, so long as it does so consistently when executed with the same input.
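
That execution-level determinism can be illustrated without Temporal at all. A toy replay sketch (names are illustrative; real Temporal records an event history, not a plain list): each activity result is recorded on first execution and reused on replay, so the branching can be fully dynamic while any single execution replays identically.

```python
def workflow(execute, history):
    """Toy durable-execution loop. `execute` maps activity names to
    functions; `history` records results so a replay never re-runs them."""
    step = 0

    def activity(fn, *args):
        nonlocal step
        if step < len(history):      # replay: reuse the recorded result
            result = history[step]
        else:                        # first run: execute and record
            result = fn(*args)
            history.append(result)
        step += 1
        return result

    n = activity(execute["fetch_count"])
    # Dynamic graph: how many downstream activities run depends on n.
    return [activity(execute["process"], i) for i in range(n)]
```

Replaying with the same history reproduces the same output even if the underlying functions are unavailable, which is the sense in which a Workflow Execution must be deterministic.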

by tomwheeler

4/1/2025 at 2:30:55 AM

Have you checked out DBOS Transact[0]? DBOS is designed for high dynamic execution, and doesn't have the overhead or complexity of Temporal [1].

Disclosure, I'm the CEO of DBOS.

[0] https://github.com/dbos-inc/dbos-transact-py

[1] https://www.dbos.dev/blog/durable-execution-coding-compariso...

by jedberg

4/1/2025 at 1:46:53 PM

I have seen it! And appreciate your response, just haven't had the time to dive in.

If you want a product hint from me, I think that adding integrations natively into the platform that would allow vibe coders to build asynchronous agents easier would really boost revenue. Like email, text, etc.

Probably not your vision, just a suggestion

by ldjkfkdsjnv

4/1/2025 at 4:03:54 PM

> that would allow vibe coders

We've experimented with that actually! Six months ago it was terrible, but the new models are getting pretty good.

And it's definitely easier for an AI to generate DBOS code to make a fully formed distributed system than a fully formed distributed system somewhere else.

by jedberg

4/2/2025 at 11:50:45 AM

Making an asynchronous AI agent is still hard; there is a disconnect between the agentic LLM code (LangGraph, OpenAI Agents, etc.) and asynchronous distributed systems / message passing. True AI agents will need a cohesive joining of the two.

by ldjkfkdsjnv

4/1/2025 at 2:32:32 AM

maybe it is not "highly dynamic execution" in the first place. a daily/hourly schedule for batch processing is not too bad. and of course, rarely-run jobs (e.g. github review, slack, etc., as the author says in the post) are definitely ok

by nikolayasdf123

4/1/2025 at 1:56:05 AM

I'm sorry, I don't really know Airflow, but what's the point of `@task.agent`, as compared to plain old `return my_agent.run_sync(...)`? To me it feels like a more restrictive[1], and possibly less intuitive[2] API.

[1]: Limited to what decorator arguments can do. I suspect it could become an issue with `@task.branch` if some post-processing were needed to adjust for smaller models' finickiness.

[2]: As the final step is described at the top of the function.

by drdaeman

4/1/2025 at 3:44:07 PM

Disclaimer: author of the SDK here.

It is _potentially_ more restrictive than writing pure Python functions, but the plus side is that we can interject certain Airflow-specific features into how the agent runs. And this isn't meant for someone who knows agents inside and out / wants low-level customizability.

The best example of this today is log groups: Airflow lets you log things out as part of a "group" which has some UI abstractions to make it easier. This SDK takes the raw agent tool calls and turns them each into a log group, so you can see a) at a high level what the agent is doing, and b) drill down into a specific tool call to understand what's happening within the tool call.

To your point about `@task.llm_branch`: the SDK and Pydantic AI (which the SDK uses under the hood) will re-prompt the LLM up to a certain number of attempts if it receives output that isn't the name of a downstream task, so there shouldn't be much finickiness.
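
That validate-and-retry behaviour can be sketched as a plain loop. This is a hypothetical stand-in, not the SDK's or Pydantic AI's actual interface; `call_llm` here is any function taking the message list and returning a string:

```python
def llm_branch(call_llm, prompt, downstream_tasks, max_attempts=3):
    """Ask the model to pick a downstream task; if the reply is not a
    valid task name, feed the error back and retry up to max_attempts."""
    messages = [prompt]
    for _ in range(max_attempts):
        choice = call_llm(messages).strip()
        if choice in downstream_tasks:
            return choice
        messages.append(
            f"'{choice}' is not a valid task; "
            f"answer with one of: {sorted(downstream_tasks)}"
        )
    raise ValueError(f"no valid branch after {max_attempts} attempts")
```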

by jlaneve

3/31/2025 at 9:28:03 PM

I'm looking into using LLM calls inside SQL triggers to make agents / "agentic" workflows. LLM-powered workflows can get you powerful results and are basically the equivalent of "spinning up" an agent.

by datadrivenangel

4/1/2025 at 9:23:09 AM

I'm not sure if this would scratch your itch, but I'm building a Postgres-native workflow engine that separates orchestration from execution. I want to be able to start my flows from within db triggers.

It just exposes a set of functions that propagate the DAG through its states and queue tasks for a task worker, which performs the actual work and acknowledges completion or failure back to the SQL orchestrator.

I've been working on it for the last few months and it will be ready in the upcoming weeks; the first version is dedicated to Supabase, but I plan to make it agnostic.

If you want to learn more, check out the SQL Core readme which explains the whole concept (https://github.com/pgflow-dev/pgflow/tree/main/pkgs/core#rea...) or my Twitter for updates and some demos (https://x.com/pgflow_dev).
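
The orchestration/execution split described above can be sketched in memory. This is a hypothetical illustration of the pattern only, not pgflow's SQL API (which does this with functions and queues inside Postgres):

```python
from collections import deque

class DagState:
    """Orchestrator side: tracks dependencies and queues any task whose
    parents have all completed; a separate worker pops the queue, does
    the real work, and acknowledges completion back."""

    def __init__(self, deps):
        self.deps = deps  # task name -> set of parent task names
        self.done = set()
        self.queue = deque(t for t, d in deps.items() if not d)

    def ack_complete(self, task):
        """Worker acknowledges `task`; unlock newly-ready tasks."""
        self.done.add(task)
        for t, parents in self.deps.items():
            if t not in self.done and t not in self.queue and parents <= self.done:
                self.queue.append(t)

def run_all(state, work):
    """Minimal worker loop: pop a task, perform it, acknowledge it."""
    while state.queue:
        task = state.queue.popleft()
        work(task)
        state.ack_complete(task)
```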

by jumski

4/1/2025 at 8:57:25 AM

Been having a great time with PostgresML for this exact kind of thing. If you don't need a complex DAG but have a simple pipeline or work queue that can be easily represented in Postgres anyway, it's very straightforward to work with, and it nicely encapsulates all of your processing (traditional data munging and LLM calls) together, with only a modest extension of a familiar system.

by fancy_pantser

3/31/2025 at 11:30:34 PM

Know of any online examples of the same?

by jackthetab

4/1/2025 at 9:15:46 AM

The decorators in the usage example look useless, more showing off than a real convenience.

In a real-life program, I don't think you will have so many hundreds of LLM or agent calls that the decorator saves you any code. On the contrary, the decorator makes it very hard to use parametric values, or values that come from config rather than being hard-coded or set up at application startup like globals. That is bad practice...

by greatgib

4/1/2025 at 3:45:27 PM

Disclaimer: author of the SDK here.

Airflow actually uses decorators to indicate that something is an explicit task in a data pipeline rather than just a utility function, so this follows that pattern!

It also uses an "operator" under the hood (Airflow's term for a pre-built, parameterized task), which can be subclassed if you want to do any customization.

by jlaneve

3/31/2025 at 11:35:01 PM

This is about workflows that use AI, but it led me to think of the inverse: has anyone experimented with AI agents defining and iterating upon long-running workflows?

by falcor84

4/1/2025 at 2:24:14 AM

nice. airflow is a good fit for this

by nikolayasdf123
