DuckDB Tutorial - DuckDB course for beginners

  • Published Jan 18, 2025

COMMENTS • 51

  • @MHelmers77 • 1 year ago +16

    Would love a full Udemy Course on DuckDB & dbt from you!

  • @gatono25 • 1 year ago +1

    That's game changing, very nice video, Marc, Thank you.

  • @tech-n-data • 1 year ago +1

    This is a great intro video, thank you.

  • @robertbutscher6824 • 1 year ago

    Great tutorial, thanks for the inspiration and your interesting lessons.

  • @interestingamerican3100 • 1 year ago +1

    Dude.....I like your style! Subscribed and hit the bell....

  • @HitAndMissLab • 6 months ago

    Brilliant tutorial, appreciate your work.

  • @mathieujasinski2383 • 1 year ago +1

    I did smash the like "booton", thanks brother, very helpful content!

  • @TheBryhanS • 4 months ago

    Thank you!

  • @BD_warriors • 1 year ago +1

    Please make more videos

  • @AliMasri91 • 1 year ago

    Thank you! Great content as always!

  • @rayusaki88 • 7 months ago

    Thanks Marc

  • @yangsong6111 • 6 months ago

    thx a lot, I learned a lot

  • @adiliophi • 1 year ago

    Awesome content! Thanks a lot!

  • @theaudiomelon • 3 months ago +1

    The link in the description only provides one dataset. How do I get all the CSVs?

  • @user-fv1576 • 6 months ago

    do you have a video on your vscode config setup?

  • @stnico • 1 year ago

    Great, thanks!

  • @davidjackson7675 • 1 year ago

    Thanks.

  • @ularkadutdotnet • 2 years ago

    Awesome ❤

  • @jerrynwaeze9269 • 2 years ago

    Hi, this is the best video on DuckDB atm. Where do you find resources on it? I heard you can connect to a Postgres DB using the Postgres scanner. How does that work?
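
    For reference, this is what DuckDB's postgres extension does. A minimal sketch, assuming a reachable Postgres instance; the connection string and the pg.public.orders table are placeholders:

    import duckdb

    conn = duckdb.connect()
    conn.execute("INSTALL postgres")  # one-time download of the extension
    conn.execute("LOAD postgres")

    # Attach a running Postgres database; connection details are placeholders.
    conn.execute("ATTACH 'host=localhost dbname=mydb user=me password=secret' AS pg (TYPE postgres)")

    # Postgres tables can then be queried with regular DuckDB SQL.
    print(conn.execute("SELECT * FROM pg.public.orders LIMIT 10").df())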

  • @CaribouDataScience • 1 year ago

    Why not use the pandas SQL functions?

    • @LeSaboteur3981 • 11 months ago

      For example, I don't think it supports the nice extended syntax like EXCLUDE. But the main reason is speed: DuckDB will execute way faster, pandas SQL is still pandas in the end. (Correct me if I'm wrong.)
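
      For reference, DuckDB can query a pandas DataFrame in place, EXCLUDE included. A minimal sketch; the DataFrame here is made up:

      import duckdb
      import pandas as pd

      df = pd.DataFrame({"id": [1, 2], "secret": ["a", "b"], "price": [9.5, 3.2]})

      # DuckDB picks up local DataFrames by variable name; EXCLUDE drops columns
      # from SELECT * without listing all the remaining ones.
      print(duckdb.sql("SELECT * EXCLUDE (secret) FROM df").df())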

  • @georgelza • 5 months ago

    I'm sorta stuck... your VSCode is running like a Jupyter notebook... How so? What did I miss? Your screen shows "Run Cell", "Run Above", "Debug Cell"... Where does that come from?
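
    Those buttons come from the VS Code Python and Jupyter extensions: a # %% marker in a plain .py file starts an interactive cell. A minimal sketch of what that looks like:

    # %% everything until the next marker is one runnable cell
    import duckdb

    conn = duckdb.connect()

    # %% each cell gets "Run Cell | Run Above | Debug Cell" links in VS Code
    print(conn.execute("SELECT 42 AS answer").df())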

  • @dataslinger6379 • 1 year ago

    Thanks as always for the great content, Marc! Just as a sanity check for myself: OLAP databases don't necessarily need to be columnar-based, correct? Recent big-data databases use columnar storage, but any database that processes analytical workloads is considered an OLAP database even if it is row-based, correct?

  • @olga6163 • 1 year ago

    Thanks, great lesson! Could you help fix an error? It occurs when I try to query the sales table that we've created. The error is "Conversion Error: Could not convert DATE (0019-04-19) to nanoseconds". I don't understand why it's trying to get nanoseconds with the date format. Thanks!

  • @LeSaboteur3981 • 11 months ago

    Like always... way too few likes for such a great video!🎉

  • @leonvictorrichard3959 • 1 year ago

    Would love a dbt + DuckDB course from you on Udemy. Big fan of yours 🎉

  • @aruniyer1833 • 1 year ago

    I am getting this error:
    Conversion Error: Could not convert DATE (0019-04-19) to nanoseconds
    when I create a sales table and run conn.execute("from sales").df(). Not sure if you have seen it?

    • @MarcLamberti • 1 year ago +1

      Let me double check that

    • @emilychase4610 • 1 year ago

      I'm getting this error too!

    • @aruniyer1833 • 1 year ago

      I was able to solve it by dropping the non-date values (the header row repeats inside the data):
      df = df.loc[df['Order Date'] != 'Order Date']
      df['Order Date'] = pd.to_datetime(df['Order Date'], format='%m/%d/%y %H:%M')
      In the SQL block I made this change to match the df format: strptime("Order Date", '%Y-%m-%d %H:%M:%S')
      Looks like the try_cast is not working, or it's some bug.
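
      For anyone hitting the same error, a self-contained version of that workaround; the data/*.csv path is a placeholder for wherever the monthly CSVs live:

      import glob

      import duckdb
      import pandas as pd

      # Stack all the monthly CSVs; stray header rows show up as data values.
      df = pd.concat(pd.read_csv(f) for f in glob.glob("data/*.csv"))
      df = df.loc[df["Order Date"] != "Order Date"]
      df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y %H:%M")

      # The column is now a real datetime, so the nanoseconds conversion no longer fails.
      conn = duckdb.connect()
      conn.execute("CREATE OR REPLACE TABLE sales AS SELECT * FROM df")
      print(conn.execute("FROM sales LIMIT 5").df())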

  • @durgeshkshirsagar5160 • 1 year ago

    At 4:28 it is written "redhshit" :D Please don't mind. Thanks for the video.

  • @shogun8-9 • 1 year ago

    The files are not available anymore :(

    • @MarcLamberti • 1 year ago

      What files?

    • @shogun8-9 • 1 year ago

      @MarcLamberti In order to work through this tutorial, you provided the sales dataset in the materials. There is a link in the description, and on that page there is a link to the Kaggle dataset that you used in the video. However, that one got removed, so it is not possible anymore to follow this tutorial :(

    • @MarcLamberti • 1 year ago

      oh oh, let me check if I can fix that

    • @shogun8-9 • 1 year ago

      @MarcLamberti Thank you! Let us know if it works again.

    • @MarcLamberti • 1 year ago

      Here we go: www.kaggle.com/datasets/kushagra1211/usa-sales-product-datasetcleaned

  • @pba1957 • 5 months ago

    Here is important information: the Jupyter notebook extension needs to be added to VSCode. How about using the Stack Overflow principle and avoiding the worthless thank-you comments?

  • @rj_nelson_97 • 1 year ago

    Oh man, oooofff! I tried following your video here and your blog with the stock analysis data. I've run into too many errors, unfortunately. That's the thing about these dependencies: there are too many moving parts where errors are persistent. I can sort through them, but it would take up too much time. We all got work to do :-).
    For data analysis work, my go-to at this point is still an AWS S3 bucket to load the raw data, AWS Glue to create a database, and an AWS Glue Crawler to upload the data while creating a table within that database. I can also use AWS Glue Studio to convert the data from .csv format to .parquet format. From there, I can use AWS Athena to query the data.
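
    For comparison, DuckDB can also query S3 directly through its httpfs extension, which covers part of that pipeline without Glue. A minimal sketch; the bucket, path, and credentials are placeholders:

    import duckdb

    conn = duckdb.connect()
    conn.execute("INSTALL httpfs")
    conn.execute("LOAD httpfs")

    # Placeholder credentials; use your real AWS config or an IAM role instead.
    conn.execute("SET s3_region = 'us-east-1'")
    conn.execute("SET s3_access_key_id = '...'")
    conn.execute("SET s3_secret_access_key = '...'")

    # Query Parquet (or CSV) files in the bucket without loading them anywhere first.
    print(conn.execute("SELECT * FROM 's3://my-bucket/raw/*.parquet' LIMIT 10").df())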

  • @durgeshkshirsagar5160 • 1 year ago

    Except that baziz, everything is perfectly fine.😀

  • @davidjackson7675 • 1 year ago

    When I ran this Python/DuckDB, it only returned shape (186862, 6):
    # with duckdb
    import time
    import duckdb

    conn = duckdb.connect()

    cur_time = time.time()
    # Read every CSV matched by the glob in a single query.
    df = conn.execute("""
    SELECT *
    FROM '/kaggle/input/sales-product-data/*.csv'
    """).df()
    print(f"time: {(time.time() - cur_time)}")
    print(df.shape)
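
    If the row count looks low, one way to check which files the glob actually matched, reusing conn from the snippet above (same placeholder path):

    # filename=true adds the source file as a column, so rows can be counted per CSV.
    per_file = conn.execute("""
        SELECT filename, COUNT(*) AS n_rows
        FROM read_csv_auto('/kaggle/input/sales-product-data/*.csv', filename=true)
        GROUP BY filename
        ORDER BY filename
    """).df()
    print(per_file)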