OSACon
Proton: A single binary to tackle streaming and historical analytics
Presented by Ken Chen at OSA Con 2023
Proton is a unified streaming and historical analytics engine built on top of the ClickHouse code base and shipped as a single binary. It is the core engine that powers the Timeplus core product, and it is open source under the Apache License 2.0 at github.com/timeplus-io/proton.

In this talk, I will cover its technical internals, such as watermarking, streaming query state management, its internal streaming store, and how it connects historical data with live streams. Along the way, I will also present core features such as tumble / hop / session window processing, streaming joins, aggregation, and the newly designed materialized view.
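To make two of the concepts named above more concrete (tumbling windows and watermarking), here is a minimal Python sketch. It is not Proton's implementation or API: the function name `tumble_counts`, the `allowed_lateness` parameter, and the watermark policy (largest event time seen so far minus a fixed lateness) are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumble_counts(
    events: Iterable[Tuple[int, str]],
    window_size: int,
    allowed_lateness: int = 0,
) -> List[Tuple[int, int, int]]:
    """Count events per tumbling window (hypothetical helper, not a Proton API).

    events: (event_time, payload) pairs in arrival order.
    Returns (window_start, window_end, count) tuples in emission order.
    """
    open_windows: Dict[int, int] = defaultdict(int)
    emitted: List[Tuple[int, int, int]] = []
    max_event_time = float("-inf")
    watermark = float("-inf")

    for event_time, _payload in events:
        # Each event belongs to exactly one fixed-size, non-overlapping window.
        start = (event_time // window_size) * window_size
        # An event whose window end is at or below the watermark is too late: drop it.
        if start + window_size <= watermark:
            continue
        open_windows[start] += 1
        # Assumed watermark policy: largest event time seen, minus allowed lateness.
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        # Emit every open window that the watermark has now passed.
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                emitted.append((s, s + window_size, open_windows.pop(s)))

    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        emitted.append((s, s + window_size, open_windows.pop(s)))
    return emitted
```

With `allowed_lateness=0`, an out-of-order event that arrives after its window has been emitted is dropped; a larger value keeps windows open longer so that such events are still counted, at the cost of later results.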

Timeplus is also committed to contributing Proton's streaming query processing features back to the ClickHouse community, and this work is already in progress (GitHub issue github.com/ClickHouse/ClickHouse/issues/54776 and PR github.com/ClickHouse/ClickHouse/pull/54870).

We would therefore like to present the porting roadmap and hear feedback from the community. We believe that adding stream processing capability will let ClickHouse OSS serve its community users better.
Views: 62

Videos

Open Source Project Report: Evidence - Business Intelligence as Code
Views: 142, 8 months ago
Presented by Sean Hughes at OSA Con 2023 Evidence is an open source business intelligence tool where all content is defined in markdown and SQL. This session will give an overview of the project: what it is, why we're building it, why we chose open source, and the upcoming roadmap. It will also include a look at the newest release, which introduces a client-side SQL runtime powered by DuckDB We...
Data Alchemy: Transforming Raw Data to Gold with Apache Hudi and DBT
Views: 167, 8 months ago
Presented by Nadine Farah at OSA Con 2023 The medallion architecture graduates raw data sitting in operational systems into a set of refined tables in a series of stages, ultimately processing data to serve analytics from gold tables. While there is a deep desire to build this architecture incrementally, it is very challenging with current technologies available on lakehouses. Many technologies...
Who needs ChatGPT? Rock solid AI pipelines with Hugging Face and Kedro
Views: 63, 8 months ago
Presented by Juan Luis Cano Rodríguez at OSA Con 2023 Artificial Intelligence is all the rage, largely thanks to generative systems like ChatGPT, Midjourney, and the like. These commercial systems are very sophisticated and powerful, but also a bit opaque if you want to learn how they work or adapt them to your needs. What happens inside the 'black box'? Luckily there are open AI models that yo...
How to implement Data Contracts with DataHub
Views: 648, 8 months ago
Presented by Shirshanka Das at OSA Con 2023 Data contracts have been much discussed in the community of late, with a lot of curiosity around how to approach this concept in practice. We believe data contracts need a harmonizing layer to manage data quality in a uniform manner across a fragmented stack. We are calling this harmonizing layer the Control Plane for Data - powered by the common thre...
Unlocking Scalable and Efficient Data Storage with Apache Ozone
Views: 95, 8 months ago
Presented by Uma Maheswara Rao Gangumalla at OSA Con 2023 In today's data-driven world, organizations are faced with unprecedented volumes of data and increasingly complex storage requirements. To address these challenges, Apache Ozone emerges as a game-changing solution, redefining the landscape of distributed object storage systems. Apache Ozone is an open-source, highly scalable, and efficie...
Unveiling the Power of dbt and DuckDB: Hype vs. Reality
Views: 161, 8 months ago
Presented by Cameron Cyr at OSA Con 2023 Data professionals and analysts are constantly searching for efficient ways to streamline their ETL/ELT processes. dbt, with its focus on transformation, modeling, and testing, has gained significant traction in the industry. On the other hand, DuckDB, a high-performance analytical database, has gained recognition for its speed and versatility. In this s...
Getting Started with Polars
Views: 220, 8 months ago
Presented by Matt Harrison at OSA Con 2023 Get ready to revolutionize your data analysis with Polars - the newest, most highly optimized dataframe library on the market! In this talk, we'll introduce you to the power of Polars and show you how it compares to the popular Pandas library.
Query Live Data Using Open Source SQL Engines
Views: 28, 8 months ago
Presented by Jove Zhong & Gang Tao at OSA Con 2023 Streaming data is rapidly becoming a key component in modern applications, and Apache Kafka, Redpanda and Apache Pulsar have emerged as popular and powerful platforms for managing and processing these data streams. However, as the volume and complexity of streaming data continue to grow, it becomes increasingly critical to have efficient and e...
Going beyond Observability: Grafana for Analytics
Views: 58, 8 months ago
Presented by Kyle Cunningham at OSA Con 2023 Grafana is a powerful platform for infrastructure observability and visualization, allowing for easy access to a wide array of operational metrics. It is more than that too however, as Grafana has been quickly adding a multitude of features to support a large variety of data analysis use cases; all while using the same intuitive user experience that ...
Apache Pulsar: Finally an Alternative to Kafka?
Views: 406, 8 months ago
Presented by Julien Jakubowski at OSA Con 2023 Today, when you think about building event-driven and real-time applications, the words that come to you spontaneously are probably: RabbitMQ, ActiveMQ, or Kafka. These are the solutions that dominate this landscape. But have you ever heard of Apache Pulsar? After a brief presentation of the fundamental concepts of messaging, you'll discover the Ap...
From Click to Insight: Transforming Streams with Apache Flink
Views: 37, 8 months ago
Presented by Andrey Gusarov at OSA Con 2023 In this talk, I'll delve into using Apache Flink for real-time distributed data processing in diverse product initiatives. From implementing counters and windowed analytics to online data enrichment, I'll highlight the challenges faced and share insights on harnessing Flink's capabilities to address these scenarios in high-demand environments.
Unlocking Financial Data with Real-Time Pipelines
Views: 113, 8 months ago
Presented by Timothy Spann at OSA Con 2023 Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologi...
You put OLTP in my OLAP! Analytics and Real-time Converged
Views: 49, 8 months ago
Presented by Felipe Mendes at OSA Con 2023 Analytics (OLAP) and Real-time (OLTP) workloads serve distinctly different purposes. OLAP is optimized for data analysis and reporting, while OLTP is optimized for real-time low-latency traffic. Most databases are designed to primarily benefit from one of them. Worse, concurrently running both workloads under the same datastore will frequently introduc...
Maybe The Real Modern Data Stack Was the Open Source Tools We Got Along The Way
Views: 51, 8 months ago
Presented by Pedram Navid at OSA Con 2023 There are many misconceptions of the Modern Data Stack, and it's easy to forget the real pain it solved and the value it unlocked. While some people still view the Modern Data Stack as marketing-fluff, I'd like to demonstrate how powerful it can be by reclaiming it using Open Source tooling. With tools like sling, dbt, duckdb, dagster, and more, we can ...
Many Faces of Real-time Analytics
Views: 34, 8 months ago
"New" Workflow Orchestrator in town: "Apache Airflow 2.x"
Views: 84, 8 months ago
Make data movement limitless and secure with Open Source
Views: 28, 8 months ago
Maximizing Query Speed and Minimizing Costs in Data Lakes with Open-Source Caching
Views: 24, 8 months ago
ETL with Meltano + Singer in the LLM era
Views: 276, 8 months ago
Navigating the Landscape of a Fully Open Source Data Stack in 2023
Views: 75, 8 months ago
A Guide to Responsible Data Collection In Open Source
Views: 53, 8 months ago
Open Source means Open! Or Does it? The State of Licensing in 2023
Views: 15, 8 months ago
Leveraging object storage: Tiered Storage for ClickHouse
Views: 88, 8 months ago
Open Formats: The Happy Accident Disrupting the Data Industry
Views: 40, 8 months ago
Open Source BI FTW - Building Compelling Dashboards with Apache Superset
Views: 167, 8 months ago
The Need for an Open Standard for Semantic Layer
Views: 60, 8 months ago
Unlocking Advanced Log Analytics With ClickHouse and Kafka
Views: 119, 8 months ago
StarRocks: Fast Real-Time Analytics for User-Facing Applications
Views: 126, 8 months ago
QuestDB: The building blocks of a fast open-source time-series database
Views: 42, 8 months ago

COMMENTS

  • @EnriqueParedes_mdz, 2 months ago

    One of the most smart ideas I’ve seen in a long time. Thanks!!

  • @sudhanshuagariya9288, 4 months ago

    Hi, can you please share the GitHub link for this project?

  • @krist17860, 7 months ago

    Takk Shirshanka, good content exceptional delivery as always.