What's your opinion on "ledgers"? I.e. ingest raw data and leave it as is (e.g. JSON, .txt), and if you need extra metadata, keep a separate table with e.g. timestamp, file_path, status (e.g. success, 404, 500, timeout). I think it's nice because 1) the raw data is untouched, with no extra metadata fields in the data itself, and 2) if I want to e.g. select a subset of those raw files, or check how many there are, it's easier and faster to do that by querying the ledger table than by discovering the files with filesystem list operations.
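To make the idea concrete, here is a rough sketch of what such a ledger could look like, assuming SQLite as the store. The table name `ledger`, the column `fetched_at`, the helper functions, and the sample paths are all made up for illustration; only `file_path` and `status` come from the description above.

```python
import sqlite3


def open_ledger(db_path=":memory:"):
    """Open (or create) a ledger database; in-memory by default for the demo."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ledger (
            file_path  TEXT PRIMARY KEY,  -- where the untouched raw file lives
            fetched_at TEXT NOT NULL,     -- ISO 8601 timestamp of the ingest attempt
            status     TEXT NOT NULL      -- e.g. 'success', '404', '500', 'timeout'
        )
        """
    )
    return conn


def record(conn, file_path, fetched_at, status):
    """Insert or overwrite the ledger row for one raw file."""
    conn.execute(
        "INSERT OR REPLACE INTO ledger (file_path, fetched_at, status) "
        "VALUES (?, ?, ?)",
        (file_path, fetched_at, status),
    )
    conn.commit()


def count_by_status(conn):
    """Count ledger rows per status without touching the filesystem."""
    rows = conn.execute("SELECT status, COUNT(*) FROM ledger GROUP BY status")
    return dict(rows.fetchall())


def paths_with_status(conn, status):
    """Return the raw file paths recorded with the given status, sorted."""
    rows = conn.execute(
        "SELECT file_path FROM ledger WHERE status = ? ORDER BY file_path",
        (status,),
    )
    return [row[0] for row in rows]


# Hypothetical sample data: the raw files themselves are never modified.
conn = open_ledger()
record(conn, "raw/a.json", "2024-01-01T00:00:00Z", "success")
record(conn, "raw/b.json", "2024-01-01T00:00:05Z", "404")
record(conn, "raw/c.txt", "2024-01-01T00:00:09Z", "success")

print(count_by_status(conn))
print(paths_with_status(conn, "success"))
```

Both questions ("how many are there?" and "which ones succeeded?") are answered by a query against the ledger, so no directory listing is needed. Using `file_path` as the primary key is just one possible choice here; if the same path can be fetched more than once, a separate row id would probably fit better.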
To me, "raw" data means I didn't make it and don't control its schema or integrity. Maybe it's well-formed, maybe it's not, but it's probably not in the form I ultimately want.