alt.hn

5/19/2025 at 11:25:32 AM

Airport for DuckDB

https://airport.query.farm/

by jonbaer

5/23/2025 at 1:34:42 AM

I was almost going to build a lakehouse* with DuckDB because I low-key love it. It's the easiest and strongest analytical engine I've found yet: it scales from laptops to big metal, stays mostly out-of-core when doing sane stuff, and avoids distributed computing for SQL in the process (looking at you, Spark).

That is, until I found out it does not support Iceberg writes[1]. That's a big no-no, as I would need another engine for inserts, and I want a simple stack :(. What a bummer.

[1] https://github.com/duckdb/duckdb_iceberg/issues/37

*That is what they are called now, aren't they? I just can't follow the terms anymore haha.

by mrbungie

5/23/2025 at 6:03:23 AM

Fivetran tried to upstream write support but it was not accepted https://github.com/duckdb/duckdb-iceberg/pull/95

by nicornk

5/23/2025 at 10:42:30 AM

That sounds less like "not accepted" and more like "will implement, rewrite required". It was only a couple of months ago.

by shakna

5/23/2025 at 6:34:10 PM

I'm curious, did you consider delta tables? Pretty sure duckdb supports them nicely. If you did, how come you chose not to go with them?

by benrutter

5/23/2025 at 2:15:01 AM

This is one of the ideas behind using DuckDB in github.com/spiceai/spiceai

by jeadie

5/23/2025 at 11:41:56 AM

That looks like an amazing "swiss army knife"...!

by anentropic

5/23/2025 at 2:37:04 AM

Looks very cool! I will take a look, tysm!

by mrbungie

5/23/2025 at 11:48:23 AM

Not just for building a new one, it can also complement existing data-warehouse/lakehouses: https://github.com/buremba/universql

The flight extension is excellent as it removes the need to write C++ extensions and lets you use your favorite language to develop native DuckDB catalogs. It's straightforward to build data lake connectors and plug them in as a flight catalog, thanks to Airport!

by buremba

5/23/2025 at 1:52:19 AM

It's coming. They already have Hive-style Parquet writes. Iceberg is more complicated than that, but it's certainly doable.
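For anyone unfamiliar with the term, "Hive-style" refers to the directory layout, where each partition column becomes a `key=value` path segment. Below is a minimal sketch of that layout only. It does not call DuckDB or write real Parquet files, and the function names, the `events` root, and the `data_0.parquet` file name are assumptions made up for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def hive_partition_path(root, partition_values, filename="data_0.parquet"):
    """Build a Hive-style path such as root/year=2025/month=5/data_0.parquet.

    The file name is a hypothetical placeholder, not a guaranteed engine default.
    """
    parts = [f"{key}={value}" for key, value in partition_values.items()]
    return str(PurePosixPath(root, *parts, filename))


def group_rows_by_partition(rows, partition_keys, root="events"):
    """Group rows by their partition values; each group maps to one file path."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple((k, row[k]) for k in partition_keys)
        # Partition columns are encoded in the path, so drop them from the row.
        groups[key].append({k: v for k, v in row.items() if k not in partition_keys})
    return {
        hive_partition_path(root, dict(key)): grouped
        for key, grouped in groups.items()
    }
```

A table format like Iceberg adds table metadata, manifests, and snapshots on top of the data files, which is a large part of why write support is more work than a partitioned file dump.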

by mritchie712

5/23/2025 at 1:56:57 AM

Yeah, it would just be great if it already did so, and I hope it supports Iceberg soon, as it would let me swap expensive (and bad) engines like AWS Athena for something more manageable.

Don't get me wrong, I'm just being a tongue-in-cheek egotistical bastard data engineer from hell. DuckDB is a fine piece of software as it is, and those maintainers deserve heaven.

by mrbungie

5/23/2025 at 11:43:45 AM

Same here, man. I ended up going with Trino explicitly for writing and data management, and using chDB/DuckDB to process data for front-ends etc. (mostly Ethereum data, so chDB "support" for uint256 is quite important).

by sukhavati

5/23/2025 at 3:15:00 AM

I love DuckDB. We use it a ton for indexing and organizing system/kernel-level metrics exported by eBPF.

Check out our sandbox:

https://yeet.cx/play

by r3tr0

5/23/2025 at 12:27:49 AM

It's a cool thought exercise: everything we do in the data world can be done in SQL, from SQL. In a sense, this is MCP for the DuckDB world.

by blef

5/23/2025 at 11:14:47 AM

Thanks for taking the time to understand the philosophy of the extension.

by rustyconover

5/23/2025 at 6:21:02 AM

Not clear. Will this finally allow loading IPC files in DuckDB? That's been my biggest issue, since I use IPC files for append operations before I turn them into Parquet files.

by k_bx

5/23/2025 at 11:12:36 AM

That’s possible with the arrow extension today.

by rustyconover

5/23/2025 at 4:10:28 AM

Does this mean the data source and destination both have to set up flight servers? I imagine then this won’t be useful for integration of third-party services.

by rubenvanwyk

5/23/2025 at 11:13:39 AM

Only the data source.

by rustyconover

5/23/2025 at 7:05:06 AM

This is very nice. I also love the fuzzycomplete and lindel from the same org/authors.

by vkaku

5/23/2025 at 7:17:28 AM

fuzzycomplete - https://github.com/Query-farm/fuzzycomplete "This fuzzycomplete extension serves as an alternative to DuckDB's autocomplete extension, with several key differences: ..."

lindel - https://github.com/Query-farm/lindel "This lindel extension adds functions for the linearization and delinearization of numeric arrays in DuckDB. It allows you to order multi-dimensional data using space-filling curves. ... Linearization maps multi-dimensional data into a one-dimensional sequence while preserving locality, enhancing the efficiency of data structures and algorithms for spatial data, such as in databases, GIS, and memory caches."
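To make "linearization" concrete, here is a minimal pure-Python sketch of Z-order (Morton) encoding, one well-known space-filling curve. It is not lindel's implementation or API; the function names and the `bits` parameter are assumptions chosen for illustration.

```python
def morton_encode(x, y, bits=16):
    """Interleave the bits of x and y into a single Z-order (Morton) code.

    x occupies the even bit positions and y the odd ones.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_decode(code, bits=16):
    """Invert morton_encode: split the interleaved bits back into (x, y)."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


def sort_by_locality(points, bits=16):
    """Order 2-D points along the Z-order curve so nearby points tend to cluster."""
    return sorted(points, key=lambda p: morton_encode(p[0], p[1], bits))
```

Sorting rows by a code like this before writing them out tends to place spatially close points near each other in storage, which is the locality benefit the description refers to.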

by code_biologist

5/23/2025 at 11:13:16 AM

Thanks for the compliments!

by rustyconover

5/23/2025 at 1:41:30 AM

What’s the situation where this is useful? Seems like ‘replace your remote DuckDB instance (itself used to replace a DB server) with a DuckDB instance + a Flight server (or a bunch of them!)’. Who has a problem for which this is the solution?

by the_optimist

5/23/2025 at 2:44:40 AM

A Flight server paired with duckdb is a good way to get concurrent writes.
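The general pattern here is a single server process that owns the database and serializes writes arriving from many clients. The sketch below illustrates only that pattern and makes several swaps that should be read as assumptions: a thread-safe queue stands in for the Flight server's request handling, Python's built-in `sqlite3` stands in for DuckDB, and `run_single_writer` is a made-up name.

```python
import queue
import sqlite3
import threading


def run_single_writer(num_clients=4, rows_per_client=50):
    """Funnel writes from many client threads through one writer thread."""
    requests = queue.Queue()
    result = {}

    def writer():
        # Only this thread touches the database, mirroring a server process
        # that owns the one read-write connection.
        conn = sqlite3.connect(":memory:")  # stand-in for the real database
        conn.execute("CREATE TABLE events (client INTEGER, seq INTEGER)")
        while True:
            item = requests.get()
            if item is None:  # sentinel: no more writes are coming
                break
            conn.execute("INSERT INTO events VALUES (?, ?)", item)
        conn.commit()
        result["count"] = conn.execute("SELECT count(*) FROM events").fetchone()[0]
        conn.close()

    def client(client_id):
        # Clients never open the database; they only submit write requests.
        for seq in range(rows_per_client):
            requests.put((client_id, seq))

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    clients = [threading.Thread(target=client, args=(i,)) for i in range(num_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    requests.put(None)
    writer_thread.join()
    return result["count"]
```

Calling `run_single_writer(4, 50)` has four clients submit 50 rows each while only one thread ever writes, so no client needs its own read-write handle on the database.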

by simlevesque