alt.hn

3/1/2026 at 8:48:30 PM

Dbslice: Extract a slice of your production database to reproduce bugs

https://github.com/nabroleonx/dbslice

by rbanffy

3/5/2026 at 3:37:32 PM

I built tools like this at several startups to copy production customer data onto a dev instance for the purpose of bug reproduction.

When I moved to big tech the rules against doing this were honestly one of the biggest drivers of reduced velocity I encountered. Many, many bugs and customer issues are very data dependent and can’t easily be reproduced without access to the actual customer data.

Obviously I get why the rules against data access like that exist, and yes, many companies have ways to get consent for this access, but it tends to be cumbersome and last-resortish. I think it’s under-appreciated how much it slows down the real-world progress of fixing customer-reported issues.

by semiquaver

3/5/2026 at 2:04:41 PM

Cool project.

I haven’t looked at the code too much (yet). I’d be curious to know how you’re handling some of the trickier edge cases when it comes to following foreign key constraints. Things like circular dependencies come to mind, as well as complex joins.

I feel ok posting this because it’s archived, but this problem is basically what we designed for with Neosync [1]. It was probably the hardest feature to fully solve for the customers that needed it the most, which were the ones with the most complex data sets and foreign key dependencies.

It got to the point where it was almost impossible to do this, at least when syncing directly to another Postgres database with everything intact. Meaning that if on the other side you want another pg database that has all of the same constraints, it is difficult to ensure you got the full sliced dataset. At least the way we were thinking about it.

[1]: https://github.com/nucleuscloud/neosync

by nickzelei

3/5/2026 at 5:21:24 PM

that is a valid point. dbslice finds cycles in the fk graph and usually resolves them by nulling a nullable fk for insert order, then patching it back with deferred updates after inserts. if a cycle has no nullable fks, postgres output can still work when deferred fk checking is enabled and the cycle constraints are deferrable, otherwise it fails fast with a clear error.
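
to make the cycle handling above concrete, here is a minimal sketch of the general idea in python. it is not dbslice's actual code: the `FK` tuple, `plan_inserts`, and the example tables are all hypothetical names made up for illustration. it sorts tables so parents come first, and whenever it hits a cycle it picks a nullable fk on that cycle to insert as NULL and patch with an UPDATE afterwards. if a cycle has no nullable fk it just raises (the deferrable-constraint path is not modeled here).

```python
from graphlib import CycleError, TopologicalSorter
from typing import NamedTuple


class FK(NamedTuple):
    """Hypothetical description of one foreign key (not a dbslice type)."""
    child: str      # table that holds the FK column
    column: str     # FK column name on the child table
    parent: str     # table the FK points to
    nullable: bool  # whether the FK column accepts NULL


def plan_inserts(tables, fks):
    """Return (insert_order, deferred_fks).

    insert_order lists tables so that every enforced parent comes first.
    deferred_fks are nullable FKs to insert as NULL and fill in later.
    """
    active = set(fks)   # FKs that must be satisfied at insert time
    deferred = []       # FKs nulled on insert, patched by UPDATE afterwards
    while True:
        # Map each table to the set of parent tables it depends on.
        graph = {t: set() for t in tables}
        for fk in active:
            graph.setdefault(fk.parent, set())
            graph.setdefault(fk.child, set()).add(fk.parent)
        try:
            return list(TopologicalSorter(graph).static_order()), deferred
        except CycleError as exc:
            cycle = exc.args[1]  # e.g. ["a", "b", "a"]; first and last match
            pairs = set(zip(cycle, cycle[1:]))
            # Match edges in either orientation so the sketch does not
            # depend on the direction in which the cycle is reported.
            candidates = sorted(
                (fk for fk in active if fk.nullable and (
                    (fk.parent, fk.child) in pairs
                    or (fk.child, fk.parent) in pairs)),
                key=lambda fk: (fk.child, fk.column),
            )
            if not candidates:
                raise ValueError(f"cycle without a nullable FK: {cycle}")
            active.remove(candidates[0])
            deferred.append(candidates[0])


if __name__ == "__main__":
    fks = [
        FK("users", "team_id", "teams", nullable=False),
        FK("teams", "owner_id", "users", nullable=True),
        FK("orders", "user_id", "users", nullable=False),
    ]
    order, deferred = plan_inserts(["users", "teams", "orders"], fks)
    print(order)     # tables in parent-first order
    print(deferred)  # FKs to null on insert and patch afterwards
```

in this made-up schema the `users`/`teams` cycle is broken at `teams.owner_id`, so teams go in first with a NULL owner, then users, then orders, and the owner column is patched at the end.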

traversal automatically pulls in parent records so you don’t end up with dangling references, and a validator (enabled by default) can double-check the slice before output. for complex joins, you can opt into subqueries in seed where clauses.
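
a rough sketch of what "pull in parents, then validate" can look like, using plain in-memory dicts as a stand-in for the database. again this is an illustration under assumptions, not the real implementation: `collect_with_parents`, `find_dangling`, and the data layout are hypothetical.

```python
from collections import deque


def collect_with_parents(db, fks, seeds):
    """Follow FKs upward from the seed rows and return every row reached.

    db:    {table: {pk: row_dict}}            (stand-in for real tables)
    fks:   {table: [(column, parent_table)]}  (hypothetical FK metadata)
    seeds: iterable of (table, pk) pairs to start from
    """
    selected = set()
    queue = deque(seeds)
    while queue:
        table, pk = queue.popleft()
        if (table, pk) in selected:
            continue  # already visited; this also guards against FK cycles
        selected.add((table, pk))
        row = db[table][pk]
        for column, parent_table in fks.get(table, []):
            parent_pk = row.get(column)
            if parent_pk is not None:  # NULL FKs reference nothing
                queue.append((parent_table, parent_pk))
    return selected


def find_dangling(db, fks, selected):
    """Return (table, pk, column) for each FK pointing outside the slice."""
    problems = []
    for table, pk in sorted(selected):
        row = db[table][pk]
        for column, parent_table in fks.get(table, []):
            parent_pk = row.get(column)
            if parent_pk is not None and (parent_table, parent_pk) not in selected:
                problems.append((table, pk, column))
    return problems


if __name__ == "__main__":
    db = {
        "orders": {1: {"user_id": 10}},
        "users": {10: {"team_id": 5}, 11: {"team_id": 5}},
        "teams": {5: {"owner_id": None}},
    }
    fks = {
        "orders": [("user_id", "users")],
        "users": [("team_id", "teams")],
        "teams": [("owner_id", "users")],
    }
    rows = collect_with_parents(db, fks, [("orders", 1)])
    print(sorted(rows))
    print(find_dangling(db, fks, rows))
```

seeding from one order pulls in its user and that user's team but not the unrelated user 11, and the validator pass reports nothing because every non-null fk in the slice points at a row that is also in the slice.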

it covers a lot of messy cases, but i won’t claim it’s fully solved yet. there’s no automatic discovery of relationships that only exist in app code (beyond heuristic hints), and real production schemas will still surface new edge cases. it’s still early-stage, so the more people test it on messy production-like datasets, the faster i can iron those out.

i would also love to hear what you think of the implementation if you check out the code.

by nabroleonx

3/5/2026 at 9:56:57 AM

I made one of these; however, I still have to solve the PII issues and convince the data custodians that it's safe to use.

by patpatpat

3/5/2026 at 9:27:40 AM

This is extremely valuable. Every time we get a problem that we are not able to reproduce, usually an extreme edge case, we end up replicating our entire production DB just to get to the error.

I'll surely try this. Thanks for posting it here.

by thunderbong

3/5/2026 at 9:30:00 AM

[dead]

by mergisi