Why Open Table Formats and Why Now

COMMENTS • 1

  • @Studiful, 2 months ago

    There are a whole lot of assumptions made here that, in my experience, just are not true. E.g., processing is almost always much more expensive than storage, so there are no cost savings in saving space at the expense of processing. Also, the warehouse is not a business model at all. This seems to come from the very narrow perspective of someone who focuses mainly on data science rather than on data engineering. There is huge value in having a centralised location for data where the definitions are the same, the data is enriched from many sources into a cohesive and unified view, and there is much less duplication of data (so the marginal cost savings on space in this presentation are nullified anyway). The main reason this approach is popular at the moment is that people don't have to wait so long to access the data, and they can run off and do their own empire building in a silo. This is great for the individual, but not great at all for the business as a whole. Data scientists are used to mass duplication of data and massive processing costs in building their models, so this is normal for them. There is big trouble ahead for those who are designing everything around the very specific needs of machine learning. That is only a small part of the whole.