Efficient CSV Parsing - On the Complexity of Simple Things - Pedro Holanda

  • Published Apr 8, 2024
  • DSDSD - THE DUTCH SEMINAR ON DATA SYSTEMS DESIGN:
    We hold bi-weekly talks on Fridays from 3:30 PM to 5 PM CET for and by researchers and practitioners designing (and implementing) data systems. The objective is to establish a new forum for the Dutch Data Systems community to unite, foster collaborations between its members, and bring in high-quality international speakers. We would like to invite all researchers, especially PhD students, who are working on related topics to join the events. It is an excellent opportunity to receive feedback early on from researchers in your field.
    Website: dsdsd.da.cwi.nl/
    X: x.com/dsdsdnl
    Speaker: Pedro Holanda
    Title: Efficient CSV Parsing: On the Complexity of Simple Things
    Abstract: In this talk, we will revisit different CSV parsing
    implementations in DuckDB and compare them with the current
    implementation. The bulk of the talk is to discuss the design and
    implementation decisions in DuckDB's current CSV Parser. In particular,
    we will examine the parallel algorithm, the CSV buffer manager, and the
    transitions of the CSV state machine. Disclaimer: This talk is not for
    the faint of heart; some very exotically built CSV files will be depicted.
    Bio: Pedro is an early contributor to DuckDB and currently works as a
    software engineer at DuckDB Labs, focusing on core and integration
    aspects of DBMS technology. He completed his PhD at the Database
    Architectures group at CWI, researching Indexes for Interactive Data
    Analysis.
  • Science & Technology
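The abstract refers to the transitions of a CSV state machine. The following is a rough, hypothetical illustration of that general idea. It is not DuckDB's parser, which the talk describes, and it is not from the talk. It is a minimal single-threaded state machine for comma-separated input with RFC 4180-style doubled-quote escaping. The names `State` and `parse_csv` are invented for this sketch. It assumes `\n` line endings and ignores parallelism and buffering.

```python
from enum import Enum, auto


class State(Enum):
    STANDARD = auto()    # in an unquoted field, or at the start of a field
    QUOTED = auto()      # in a quoted field: delimiters/newlines are data
    QUOTE_SEEN = auto()  # saw a quote while QUOTED: escape ("") or end of value


def parse_csv(text: str, delim: str = ",", quote: str = '"') -> list[list[str]]:
    # Hypothetical sketch: assumes "\n" record separators and "" as the
    # escape for a literal quote inside a quoted field.
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = State.STANDARD

    for ch in text:
        if state is State.STANDARD:
            if ch == delim:
                row.append("".join(field))
                field = []
            elif ch == "\n":
                row.append("".join(field))
                rows.append(row)
                row, field = [], []
            elif ch == quote and not field:
                state = State.QUOTED  # quote at field start opens a quoted value
            else:
                field.append(ch)      # a stray quote mid-field is kept literally
        elif state is State.QUOTED:
            if ch == quote:
                state = State.QUOTE_SEEN
            else:
                field.append(ch)
        else:  # State.QUOTE_SEEN
            if ch == quote:
                field.append(quote)   # doubled quote -> literal quote
                state = State.QUOTED
            elif ch == delim:
                row.append("".join(field))
                field = []
                state = State.STANDARD
            elif ch == "\n":
                row.append("".join(field))
                rows.append(row)
                row, field = [], []
                state = State.STANDARD
            else:
                raise ValueError(f"unexpected {ch!r} after closing quote")

    if state is State.QUOTED:
        raise ValueError("unterminated quoted field")
    # Flush the last record when the input does not end with a newline.
    if field or row or state is State.QUOTE_SEEN:
        row.append("".join(field))
        rows.append(row)
    return rows
```

For example, `parse_csv('a,b\n1,"x,y"\n')` returns `[['a', 'b'], ['1', 'x,y']]`. Each character is read once, and the current state alone determines its meaning. That makes a state-machine formulation attractive for a parser. Handling `\r\n`, configurable escapes, and splitting the input across threads is deliberately left out of this sketch.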