3/3/2026 at 6:59:22 AM
RFC 4180 [1], Section 2.6, says: "Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes."
If the DMS output isn’t quoting fields that contain commas, that’s technically invalid CSV.
A small normalization step before COPY (or ensuring the writer emits RFC-compliant CSV in the first place) would make the pipeline robust without renaming countries or changing delimiters.
That way, if/when the DMS output is fixed upstream, nothing downstream needs to change.
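Here is a minimal Java sketch of such a normalization step. It depends on two assumptions: exactly one known column (here a country name) can contain commas, and no field contains quotes or line breaks. Under those assumptions every surplus comma on a line must belong to that column, so the line can be re-split and re-emitted with RFC 4180 quoting. `CsvNormalize`, `repairLine` and the sample row are hypothetical. They are not part of any DMS or COPY tooling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvNormalize {

    // RFC 4180, sections 2.6-2.7: quote a field that contains a comma, a double
    // quote, CR or LF, and escape embedded double quotes by doubling them.
    static String quoteField(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\r') >= 0 || field.indexOf('\n') >= 0;
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Hypothetical repair for one unquoted line. ASSUMES exactly one column
    // (freeTextCol) may contain commas and no field contains quotes or newlines;
    // under that assumption every surplus comma belongs to that column.
    static String repairLine(String line, int expectedCols, int freeTextCol) {
        String[] parts = line.split(",", -1); // -1 keeps trailing empty fields
        int extra = parts.length - expectedCols;
        if (extra < 0) {
            throw new IllegalArgumentException("too few fields: " + line);
        }
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < freeTextCol; i++) {
            fields.add(parts[i]);
        }
        // Glue the free-text column back together from its comma-split pieces.
        int end = freeTextCol + extra + 1;
        fields.add(String.join(",", Arrays.copyOfRange(parts, freeTextCol, end)));
        for (int i = end; i < parts.length; i++) {
            fields.add(parts[i]);
        }
        List<String> out = new ArrayList<>();
        for (String f : fields) {
            out.add(quoteField(f));
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        // Sample row: id, country name, ISO code (hypothetical layout).
        System.out.println(repairLine("42,Korea, Republic of,KR", 3, 1));
        // 42,"Korea, Republic of",KR
    }
}
```

Fixing the writer is still the better option. With two free-text columns, an unquoted line is genuinely ambiguous, and no post-processing can recover the original split.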
by simula67
3/3/2026 at 8:19:06 AM
That's the real shame, but also the lesson: a perfectly good, well-specified format, yet its apparent simplicity makes everyone ignore the spec and yolo out broken stuff. This is why SQL is "broken": it's powerful and simple, and people will always do the wrong thing.
I was teaching a class on SQL and spent half of it reminding students that building queries by concatenating strings was bad and that they should use prepared statements (JDBC).
Come practice time, half the class used string concatenation anyway.
This is why I love LINQ and the modern parameterized query strings in JS: they make the right thing easier than the wrong thing.
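For anyone who hasn't seen the two approaches side by side, here is a JDBC sketch. The `users` table, its columns and the class name are hypothetical. `Connection.prepareStatement`, `PreparedStatement.setString` and `executeQuery` are the standard `java.sql` calls.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class UserLookup {

    // The pattern students reach for: user input is spliced into the SQL text,
    // so quotes in the input can change the query itself.
    // (Table "users" and its columns are hypothetical.)
    static String concatenatedQuery(String name) {
        return "SELECT id FROM users WHERE name = '" + name + "'";
    }

    // The prepared-statement version: the SQL text is fixed, and the value is
    // handed to the driver separately as a bind parameter.
    static List<Long> findIdsByName(Connection conn, String name) throws SQLException {
        String sql = "SELECT id FROM users WHERE name = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, name); // JDBC parameter indices start at 1
            try (ResultSet rs = ps.executeQuery()) {
                List<Long> ids = new ArrayList<>();
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
                return ids;
            }
        }
    }

    public static void main(String[] args) {
        // No database is needed to see the problem with concatenation:
        System.out.println(concatenatedQuery("x' OR '1'='1"));
        // SELECT id FROM users WHERE name = 'x' OR '1'='1'
    }
}
```

With the prepared statement, the quotes in that input reach the driver as data, so they can't change the shape of the query.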
by whizzter
3/3/2026 at 9:27:46 AM
I also really like the way AndroidX's Room handles query parameters and the corresponding APIs.
    @Dao
    public interface UserDao {
        @Query("SELECT * FROM user")
        List<User> getAll();
        @Query("SELECT * FROM user WHERE uid IN (:userIds)")
        List<User> loadAllByIds(int[] userIds);
        @Query("SELECT * FROM user WHERE first_name LIKE :first AND " +
               "last_name LIKE :last LIMIT 1")
        User findByName(String first, String last);
        @Insert
        void insertAll(User... users);
        @Delete
        void delete(User user);
    }
by kuschku
3/3/2026 at 2:07:32 PM
It's one of the better abstractions given Java's lack of first-class expressions, though having used EF Core/LINQ for a while, I'd be hard-pressed to go back. LINQ code is native C# that can be strongly typed (for IDs, etc.), but you can still "think" in SQL terms by writing Where, Select, OrderBy and so on. (I'll admit the C# world hasn't really gotten around to promoting strongly typed DB IDs yet, but the support is there.)
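Strongly typed IDs aren't tied to C#. As a rough Java analogue (records, Java 16+), wrapping each key in its own type lets the compiler reject a user ID passed where an order ID is expected. `UserId`, `OrderId`, `Order` and the in-memory map are hypothetical stand-ins. They are not EF Core's API or any ORM's API.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedIds {

    // Hypothetical ID types: both wrap a long, but the compiler treats them as distinct.
    record UserId(long value) {}
    record OrderId(long value) {}

    record Order(OrderId id, UserId owner) {}

    // In-memory stand-in for an orders table, keyed by OrderId only.
    static final Map<OrderId, Order> ORDERS = new HashMap<>();

    static Order findOrder(OrderId id) {
        return ORDERS.get(id);
    }

    public static void main(String[] args) {
        UserId alice = new UserId(7);
        OrderId firstOrder = new OrderId(7); // same raw number, different meaning
        ORDERS.put(firstOrder, new Order(firstOrder, alice));

        // Records compare by value, so a freshly built OrderId(7) finds the entry.
        System.out.println(findOrder(new OrderId(7)) != null); // true

        // findOrder(alice); // would not compile: a UserId is not an OrderId
    }
}
```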
by whizzter
3/3/2026 at 4:05:42 PM
In that case, I'd recommend jOOQ, which is just LINQ in Java :D
    create.select(AUTHOR.FIRST_NAME, AUTHOR.LAST_NAME, count())
          .from(AUTHOR)
          .join(BOOK).on(AUTHOR.ID.equal(BOOK.AUTHOR_ID))
          .where(BOOK.LANGUAGE.eq("DE"))
          .and(BOOK.PUBLISHED.gt(date("2008-01-01")))
          .groupBy(AUTHOR.FIRST_NAME, AUTHOR.LAST_NAME)
          .having(count().gt(5))
          .orderBy(AUTHOR.LAST_NAME.asc().nullsFirst())
          .limit(2)
          .offset(1)
by kuschku
3/4/2026 at 7:03:20 AM
No, it's not. It's a SQL builder.
by mrsmrtss
3/3/2026 at 11:29:38 PM
Yeah, I don't understand. The article opens with "we want ["value"] but got [value]", so why wasn't doing exactly that the solution?
by hahn-kev
3/4/2026 at 10:55:53 AM
This. Pulling in Parquet and all of its dependencies is utter overkill.
by julik
3/4/2026 at 11:39:52 AM
Not to mention that while Parquet fixes the "delimiter problem", it doesn't fix the "encoding problem".
In (simplistic) CSV, you have to pick the right delimiter or it mangles some of your data.
In Parquet you have to pick the right data type encodings for each column for your data or it gets mangled.
Your clean monetary fixed-precision decimal data from the source system becomes floating-point slop in your "I didn't want to think about data types"-encoded Parquet file, and then it starts behaving differently (or even changes value!) because of floating-point rounding. Or your blanks become 0s or nulls, etc., etc.
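This small Java example shows that failure mode with hypothetical amounts. It compares exact `BigDecimal` values with the same values pushed through `double`. (Parquet does define a DECIMAL logical type. The damage comes from a writer that picks a floating-point column instead.)

```java
import java.math.BigDecimal;

public class MoneyPrecision {

    // Simulates a column written as DOUBLE instead of DECIMAL: convert, then
    // recover the exact binary value the double actually holds.
    static BigDecimal throughDouble(BigDecimal exact) {
        return new BigDecimal(exact.doubleValue());
    }

    public static void main(String[] args) {
        // Hypothetical amounts that were exact DECIMAL values in the source system.
        BigDecimal price = new BigDecimal("0.10");
        BigDecimal fee = new BigDecimal("0.20");

        // Decimal arithmetic stays exact.
        System.out.println(price.add(fee)); // 0.30
        // The same sum in binary floating point picks up rounding error.
        System.out.println(price.doubleValue() + fee.doubleValue()); // 0.30000000000000004

        // Large amounts can change value outright when squeezed into a double.
        BigDecimal big = new BigDecimal("12345678901234567.89");
        System.out.println(throughDouble(big).compareTo(big) == 0); // false
    }
}
```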
And don't get me started on character set encodings!
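The character-set version of the same problem can be sketched in Java: bytes written as UTF-8 but decoded as ISO-8859-1. The country name is sample data and the method names are hypothetical. `String.getBytes` and the `String(byte[], Charset)` constructor are standard JDK APIs.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMangling {

    // Simulates a reader that assumes ISO-8859-1 (Latin-1) for a file written as UTF-8.
    static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes that specific mistake. This only works because Latin-1 maps every
    // byte to exactly one char, so no bytes were lost; other mix-ups can be lossy.
    static String undoLatin1Misread(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String country = "C\u00f4te d'Ivoire"; // sample value with an o-circumflex
        String garbled = misreadAsLatin1(country);
        // The one UTF-8 character (bytes 0xC3 0xB4) shows up as two Latin-1 characters.
        System.out.println(garbled.length() - country.length()); // 1
        System.out.println(undoLatin1Misread(garbled).equals(country)); // true
    }
}
```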
by gregw2