5/23/2025 at 1:34:42 AM
I was almost going to build a lakehouse* with DuckDB because I low-key love it. It's the easiest and strongest analytical engine I've found yet: it scales from laptops to big metal, stays mostly out-of-core when doing sane stuff, and avoids distributed computing for SQL in the process (looking at you, Spark). That is, until I found out it does not support Iceberg writes[1]. That's a big no-no, as I would need another engine for inserts, and I want a simple stack :(. What a bummer.
[1] https://github.com/duckdb/duckdb_iceberg/issues/37
*That is what they are called now, aren't they? I just can't follow the terms anymore haha.
by mrbungie
5/23/2025 at 6:03:23 AM
Fivetran tried to upstream write support but it was not accepted: https://github.com/duckdb/duckdb-iceberg/pull/95
by nicornk
5/23/2025 at 10:42:30 AM
That sounds less like "not accepted" and more like "will implement, rewrite required". It was only a couple of months ago.
by shakna
5/23/2025 at 6:34:10 PM
I'm curious, did you consider Delta tables? Pretty sure DuckDB supports them nicely. If you did, how come you chose not to go with them?
by benrutter
5/23/2025 at 2:15:01 AM
This is one of the ideas behind using DuckDB in github.com/spiceai/spiceai
by jeadie
5/23/2025 at 11:41:56 AM
That looks like an amazing "swiss army knife"...!
by anentropic
5/23/2025 at 2:37:04 AM
Looks very cool! I will take a look, tysm!
by mrbungie
5/23/2025 at 11:48:23 AM
It's not just for building a new one; it can also complement existing data warehouses/lakehouses: https://github.com/buremba/universql
The flight extension is excellent as it removes the need to write C++ extensions and lets you use your favorite language to develop native DuckDB catalogs. It's straightforward to build data lake connectors and plug them in as a flight catalog, thanks to Airport!
by buremba
5/23/2025 at 1:52:19 AM
It's coming. They already have Hive-style Parquet writes. Iceberg is more complicated than that, but it's certainly doable.
by mritchie712
5/23/2025 at 1:56:57 AM
Yeah, it would just be great if it already did so, and I hope it supports Iceberg soon, as it would let me swap expensive (and bad) engines like AWS Athena for something more manageable.
Don't get me wrong, I'm just being a tongue-in-cheek egotistical bastard data engineer from hell. DuckDB is a fine piece of software as it is, and those maintainers deserve heaven.
by mrbungie
5/23/2025 at 11:43:45 AM
Same here man, ended up going with Trino explicitly for writing and data management, and using chdb/DuckDB to process data for front-ends etc. (mostly Ethereum data, so chdb "support" for uint256 is quite important)
by sukhavati