CelerData
United States
Joined Feb 9, 2022
CelerData enables enterprises to quickly and easily grow their business with the world’s most performant query engine for open lakehouses. Powered by StarRocks, CelerData delivers 3X the performance/cost of any other solution on the market and is the only platform uniquely designed to let users simplify their lakehouse architecture and eliminate the need for a data warehouse. CelerData is used worldwide by market-leading brands including Airbnb, Pinterest, and Trip.com to generate critical new insights for these data-driven companies.
Stream Processing Must Haves
Ben Gamble, Field CTO at Ververica (the original creators of Apache Flink), and Sida Shen, Product Manager at CelerData, break down the essentials of real-time stream processing, sharing insights into its benefits, challenges, and practical ways to get started.
Take a closer look at VERA: The Path to Cloud-Native Apache Flink, and how it’s designed to support today’s real-time data needs. The session concludes with a live demo that showcases how CelerData (managed StarRocks) and Ververica (managed Flink) integrate to form a powerful data stack. If you're interested in real-time data processing and its practical applications, this is a perfect place to start!
----------------------------------------------------------------------------------------------------------------------
00:00 Introduction
02:30 What is Stream Processing?
04:15 Does Stream Processing Mean it is “Always On”? Does it Make it Expensive? How Do You Manage the Cost?
08:47 How Do I Know if Stream Processing is Actually Useful for My Business?
11:08 The Technology Complexity of Stream Processing
17:20 With the Technology Available Today, What Use Cases Can Stream Processing Specifically Excel In? What Kind of Use Cases Are There for Stream Processing?
22:16 Understanding Real Time: "Soft Real Time," "Near Real Time," and "Real Real Time": What Real Time Means
24:36 Challenges in Real-Time Analytics in Analytical Settings and How Databases Like StarRocks Address These Challenges
28:19 The Evolution of Table Formats and Introduction to Apache Paimon
32:38 The Inherent Complexity Compared to Batch Processing and the Knowledge Required to Manage Workloads
35:22 Solutions to Managing the Complexity in Stream Processing: Apache Flink and VERA Cloud Native
40:27 Demo: CelerData + Ververica
50:34 Q&A
50:40 When Not to Use Apache Flink
53:16 Does lineage work with custom Flink jobs or only with Flink SQL? If it does, how does it inspect the custom code to determine the lineage?
55:58 How do you get the first batch of data in place before turning on streaming? For example, in a MySQL database → Flink → StarRocks pipeline, what is the best method to load all historical data?
58:12 Data lake analytics is gaining interest, but it still seems too slow to fully support streaming/real-time. What does the future look like for streaming analytics as people shift towards data lakes?
59:21 From the Paimon perspective, what do you think will happen in this space, especially with Snowflake doubling down on Iceberg and Databricks acquiring Tabular?
1:01:59 How do you make a streaming solution compatible with a transactional database like StarRocks? I understand that inserts in StarRocks generate both partitions and replicas, which could pose issues with frequent small inserts.
1:04:02 How does Flink/Ververica handle scaling (e.g., when a large volume of data reaches the initial PostgreSQL database) and resource sharing? Can I run multiple small jobs on the same cluster/VM?
---------------------------------------------------------------------------------------------------------------------
Learn more at celerdata.com/
Connect with us:
LinkedIn: www.linkedin.com/company/celerdata/
Twitter: celerdata
CelerData Website: celerdata.com/
StarRocks GitHub: github.com/StarRocks/StarRocks
StarRocks Website: www.starrocks.io/
Slack: starrocks.io/redirecting-to-slack
#DataEngineering #StreamProcessing #RealTimeAnalytics #ApacheFlink #DataAnalytics #CloudNative #ApachePaimon #BigData #StreamingData #DataPipelines #OpenSourceData #EventDrivenArchitecture #DataInfrastructure #DataStreaming #DataOps #DataInnovation #DataLineage #ModernDataStack #FlinkSQL #DataManagement #DataScalability #ETL #ChangeDataCapture #DataProcessing #DataIntegration
226 views
Videos
What’s Next for Lakehouse in 2025 With Databricks and CelerData
571 views · 14 days ago
🌟Take a deep dive into the future of lakehouse technology with this recorded fireside chat featuring Databricks Product Expert Michelle Leon and Sida Shen from CelerData. In this session, we unpack the latest developments, including Unity Catalog, Databricks’ acquisition of Tabular, key differences between Databricks and Snowflake on open formats, and more. You'll also see live demos showcasing...
StarRocks X Tencent - Introducing Vector Similarity Search
342 views · 28 days ago
In this video, Petri Zhang from Tencent reveals how Tencent leverages StarRocks for Retrieval-Augmented Generation (RAG) with hybrid search. He details their journey in building a billion-scale image repository that enables both reverse image search and traditional BI dashboards within a unified platform. Petri also shares the challenges they faced along the way and the key reasons why Tencent ...
Tencent Games Unifies Their Gaming Analytics with StarRocks
85 views · 1 month ago
✅ 15x storage cost savings
✅ 50% better efficiency when developing new data pipelines
✅ 100% of pre-aggregations eliminated
Discover how Tencent Games, boasting 200 million users globally and renowned titles like "League of Legends" & "PUBG Mobile," unifies its gaming analytics across multiple studios, and why StarRocks was the only solution that could make this possible. Tencent Gam...
Demo: Achieving Interoperability with Unity Catalog, Delta UniForm, and StarRocks
229 views · 1 month ago
This video introduces Databricks’ Unity Catalog, which was open-sourced at the Data + AI Summit 2024. Unity Catalog helps manage data across different data and AI applications. In this demo, StarRocks, a powerful query engine, is used to show how Unity Catalog makes it easy to work with multiple table formats. For example, StarRocks is able to query data stored as Delta but treat it like an ...
Query Engine Must-Haves for the Best Apache Superset Experience
569 views · 1 month ago
Apache Superset promises quick insights through interactive, no-code dashboards. Still, it can struggle to deliver because of the poor performance and limited capabilities of the query engines chosen to power it. In this session, CelerData's Sida Shen will explore the essential capabilities your query engine must have to achieve the BI performance everyone hopes for but few actually get,...
How to Accelerate Apache Iceberg Metadata Retrieval
385 views · 2 months ago
Many Iceberg users struggle with job planning taking too long, particularly when dealing with high concurrency or large metadata files. In this video, Chelsea (Simo) Wang covers:
🌟 What makes job planning in Iceberg so challenging.
🌟 How StarRocks 3.3’s new features provide effective solutions for both large and small metadata use cases, supported by real-world examples.
🌟 StarRocks' adaptive job...
StarRocks Virtual Meetup: Version 3.3.x and What’s Next
374 views · 2 months ago
Catch the recorded session of the StarRocks Community August Meetup, where Harrison (Heng) Zhao, a TSC member, and Sida Shen, a StarRocks contributor and project expert, discussed the latest developments and shared their expertise. Highlights: 🌟 New Features in 3.3.x and the Roadmap Ahead: Dive into the latest features introduced in the 3.3.x release of StarRocks, and discover the exciting deve...
Rockset Acquired by OpenAI: What's Next for Its Users?
488 views · 3 months ago
Whether you're affected by Rockset's shutdown this September or simply exploring new database options, this video will guide you through a range of alternatives and help you find the best fit for your needs. Webinar Highlights: 🌟What the Rockset acquisition means for its users. 🌟Immediate steps users should take now to ensure continuity in their operations. 🌟The pros and cons of multiple open-s...
Data Lake Query Engines: Trino vs StarRocks
327 views · 4 months ago
Explore how Trino and StarRocks differ as data lake query engines: their architectures, performance benchmarks, and suitable use cases for modern data analytics.
Timestamps
00:00 Intro
00:10 What is Trino?
00:56 What is StarRocks?
01:59 Performance Comparison: Trino vs StarRocks
02:33 Trino vs. StarRocks: Which to Use for Specific Use Cases
03:28 Conclusion
Join StarRocks on Slack: ...
What Is StarRocks and What Are Its Use Cases?
705 views · 4 months ago
StarRocks is an Apache-licensed open-source project under the Linux Foundation. As of June 2024, it has garnered 8.3 thousand stars on GitHub and has 354 contributors globally. Its adoption by major industry players attests to its reliability and efficiency in handling various data analytics needs. Get to know StarRocks in this video and discover how it excels as a data warehouse for real-time ...
How to Solve Data Upserts Challenges in OLAP Databases
319 views · 4 months ago
Discover the challenges of data upserts in OLAP systems and explore effective strategies, including the delete-and-insert method, with real-world examples like Airbnb’s fraud detection.
Timestamps
00:00 Intro
00:05 The Challenges of Data Upserts in OLAP
01:25 Delete and Insert Strategy
03:24 Data Upsert Example: Airbnb Fraud Detection - Challenges
04:42 Data Upsert Example: Airbnb Fr...
Materialized Views: Tips, Tricks, and Use Cases
478 views · 4 months ago
Materialized views are one of StarRocks’ most popular and powerful features, but are you getting the most out of them? Murphy Wang, the technical mind behind the project’s materialized views, is ready to share all the latest tips and tricks to help you get the best query performance for your data pipeline. Session Highlights: 🌟Best practices for rolling out materialized views: Learn what causes...
[Coinbase Lakehouse Architecture] Achieving Data Warehouse Performance on a Data Lakehouse
883 views · 4 months ago
Join Sida Shen from CelerData and Eric Sun from Coinbase in this video as they dive into the latest advancements in data lakehouse querying and share tips to make the most out of your data lakehouse. They'll cover: 🌟Why you shouldn't rely on proprietary data warehouses just to speed up queries 🌟The latest cool stuff in query engines boosting lakehouse performance 🌟A close look at how Coinbase i...
StarRocks 3.3 is Here: Key Features and Improvements
841 views · 5 months ago
StarRocks 3.3 is here, and it's more powerful than ever! In this video, we'll walk you through everything you need to know to get the most out of this release. Let's dive in and explore the new features and enhancements together!
00:00 Intro & Agenda
01:21 StarRocks Use Cases - Lakehouse Query Engine
03:07 StarRocks Use Cases - Real-Time Analytics Workloads
05:24 StarRocks 3.3: Shared-Data
05:3...
Getting Started with CelerData Cloud Serverless: Intro and Live Demo
118 views · 5 months ago
The Register & CelerData: Ditch Your Data Warehouse with Superior Lakehouse Performance
143 views · 5 months ago
Going Serverless for Warehouse-Free Lakehouse Analytics
326 views · 6 months ago
Challenges and Accelerations in Running Data Warehouse Workloads on Open Data Lake
81 views · 6 months ago
What Is Single Instruction Multiple Data and the Role of SIMD in Boosting OLAP Database Efficiency
304 views · 6 months ago
StarRocks Connect: Sling - Extract & Load Data From Your CLI With Ease and Speed
392 views · 6 months ago
Unlock User Behavior with 87M Events Using Hudi, StarRocks & MinIO - Apache Hudi Community Call
214 views · 6 months ago
Tencent's A/B Testing SaaS Platform Unifies All SQL Workloads on the Data Lakehouse with StarRocks
55 views · 7 months ago
How to Accelerate Apache Iceberg Queries
162 views · 7 months ago
Apache Iceberg + StarRocks: Your Recipe for Superior Lakehouse Performance
2.2K views · 7 months ago
StarRocks Architecture: StarRocks as a Data Warehouse & StarRocks as a Lakehouse Query Engine
383 views · 7 months ago
Getting Started Tutorial: Building a Data Lakehouse With StarRocks, Apache Hudi, and MinIO
664 views · 7 months ago
StarRocks on Open Data Lakehouse Tutorial: StarRocks + Apache Iceberg + MinIO
773 views · 8 months ago