Andrew Lamb
Joined Mar 20, 2011
2024 12 18 Chicago DataFusion Meetup 04 Andrew Lamb
Describes how InfluxDB 3.0 was built using DataFusion (unfortunately it has no sound)
Andrew Lamb is a Staff Engineer at InfluxData, working on InfluxDB 3.0, focused on query processing, the Apache DataFusion query engine and the Apache Arrow ecosystem. He is a member of the Apache Software Foundation and the Apache DataFusion PMC (2024 Chair), and Apache Arrow PMC (2023 Chair).
86 views
Videos
2024 12 18 Chicago DataFusion Meetup 03 Xiangpeng Hao
356 views · 1 month ago
Building a disaggregated cache for modern data analytics using Apache DataFusion. Xiangpeng Hao is a fourth-year PhD student at the University of Wisconsin-Madison, studying and building database and storage systems. He is a former intern at InfluxData, where he worked on the StringView integration in Apache DataFusion.
2024 12 18 Chicago DataFusion Meetup 01 Adrian Garcia Badaracco
136 views · 1 month ago
Overview of how Pydantic uses DataFusion to build a near real time data lake for observability data, with a deeper dive into indexing and their metadata store. Adrian Garcia Badaracco, founding engineer at Pydantic, leads the team building the database for an observability platform.
2024 12 18 Chicago DataFusion Meetup 02 Tim Saucer
86 views · 1 month ago
Data science in robotics and how DataFusion can be used to address some of the challenges particular to that field. Tim Saucer is a contributor to DataFusion, specifically focused on the Python bindings to DataFusion.
6 Piotr Findeisen The Types
185 views · 3 months ago
@findepi from SDF wrapped up the talks with a detailed exploration of types and functions in the context of Apache Arrow vs DataFusion. He explained how types are handled in Arrow and DataFusion. @findepi's insights shed light on the potential improvements that could further enhance DataFusion’s handling of data types. Huge thanks to Sam from Synnada for this description: github.com/apache/data...
5 Nick Karlov DataFusion as a heart of modern HTAP DB
251 views · 3 months ago
@karlovnv from Tarantool followed, sharing insights on how his team is pushing the limits of big data. @karlovnv showcased their work on really massive datasets, such as handling a 3,000-column dataset, processing 70TB of data in RAM, and doing all of this very fast (quicker than 10ms for a fraud detection use case!?). His talk demonstrating how DataFusion plays a key role in enabling thes...
4 Marko Grujić - Database replication with DataFusion
182 views · 3 months ago
@gruuya, senior staff engineer at EDB, the hero of the day who gathered all of us for this amazing event, gave a talk focused on database replication using the FDAP (Flight, DataFusion, Arrow, and Parquet) stack. @gruuya explained how this powerful combination of open-source tools enables efficient and scalable data replication, particularly in analytic environments. By leveraging Apache Arrow ...
3. Mehmet Ozan Kabak One Compute to Rule Them All: Unifying Data & AI Workflows with DataFusion
243 views · 3 months ago
@ozankabak, co-founder and CEO of Synnada, spoke about the challenges of building data-intensive applications, referring to the Data Chasm - a complex landscape with many moving parts that makes it difficult to manage data efficiently. He explained how DataFusion helps break down these barriers, allowing for a more streamlined approach to data processing. @ozankabak highlighted Synnada's contri...
Artjoms Iškovs Reducing query latency via a caching object store layer
417 views · 3 months ago
Next, @mildbyte, Principal Engineer at EDB, delivered a highly technical talk on caching optimization using DataFusion in EDB. @mildbyte explained how EDB utilizes DataFusion to optimize query caching, which leads to significant performance improvements. These optimizations are crucial for managing large-scale data systems, showcasing how EDB leverages DataFusion’s capabilities effectively. Huge...
1 Andrew Lamb - DataFusion Introduction
665 views · 3 months ago
The talks kicked off with @alamb, who provided an in-depth introduction to the origins and goals of Apache DataFusion. He started by describing DataFusion as LLVM for data systems, enabling innovation in data-intensive systems. @alamb highlighted DataFusion’s architecture, built with industrial best practices, and its ability to compete with tightly integrated systems. Finally, @alamb touched on the...
Profiling Apache DataFusion using flamegraph
248 views · 4 months ago
Shows how to use flamegraphs to profile your application. This video shows how to profile a particular query using datafusion-cli and then understand the resulting flamegraph.svg
Faster DataFusion with StringView - Xiangpeng Hao (Aug 15, 2024)
514 views · 5 months ago
Xiangpeng Hao summarizes what Apache Arrow StringView is, why it can improve performance, and the practical challenges overcome when realizing the potential. Xiangpeng Hao presents his 2024 Summer Intern project at @influxdata8893: improving performance in Apache DataFusion, the query engine used in InfluxDB 3.0. Talk Abstract: We implemented a new string representation, StringView, in the Rust i...
SIGMOD 2024 Practice: Apache Arrow DataFusion A Fast, Embeddable, Modular Analytic Query Engine
2.8K views · 7 months ago
Presentation: docs.google.com/presentation/d/1gqcxSNLGVwaqN0_yJtCbNm19-w5pqPuktII5_EDA6_k/edit#slide=id.p
Paper: github.com/apache/datafusion/files/14789704/DataFusion_Query_Engine SIGMOD_2024-FINAL.pdf
2024-03-25 DataFusion Meetup Introduction
467 views · 10 months ago
Profiling DataFusion with Instruments (part of Xcode on macOS)
508 views · 10 months ago
Shows how to use the Instruments tool that comes with Xcode on macOS to see where a Rust program (in this case `cargo bench ...`) is spending its time, with a brief summary of how to interpret the visualization.
Apache Arrow DataFusion Architecture Part 3
2K views · 1 year ago
Apache Arrow DataFusion Architecture Part 2
2.8K views · 1 year ago
Apache Arrow DataFusion Architecture Part 1
7K views · 1 year ago