Dask
United States
Joined 22 Feb 2016
Content, tutorials, and more on how to use Dask effectively.
Dask is a flexible open-source Python library for parallel computing. Dask scales Python code from multi-core local machines to large distributed clusters in the cloud. Dask provides a familiar user interface by mirroring the APIs of other libraries in the PyData ecosystem including Pandas, Scikit-learn, and NumPy. It also exposes low-level APIs that help programmers run custom algorithms in parallel.
Dask was created by Matthew Rocklin in 2014 and is used by retail, financial, and governmental organizations, as well as life science and geophysical institutes.
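A minimal sketch of what that familiar interface looks like in practice (the file pattern and column names below are invented for illustration, not from any Dask example):

import dask
import dask.dataframe as dd
from dask import delayed

# High-level API: mirrors pandas, but runs lazily across many partitions.
df = dd.read_csv("transactions-*.csv")             # hypothetical input files
daily_totals = df.groupby("date")["amount"].sum()  # looks and feels like pandas
print(daily_totals.compute())                      # .compute() triggers the parallel run

# Low-level API: wrap ordinary Python functions to build custom parallel algorithms.
@delayed
def double(x):
    return 2 * x

total = delayed(sum)([double(i) for i in range(10)])
print(total.compute())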
Dask Demo Day 2024-09-05
Today's Talks:
00:00 Intro
00:25 Dask Applications in Astronomy with LINCC Frameworks - @dougbrn
21:12 dask-jobqueue demo - @jacobtomlinson
Next Demo Day is October 3rd, sign up here:
github.com/dask/community/issues/307
---
What is Dask Demo Day?
Each month we solicit 5-10 minute demos that show off ongoing and/or lesser-known work. Meetings will be recorded and advertised on social. Hopefully, this helps educate folks on some of the great work people do.
If you're interested, please reply to this issue with a brief description (a couple of sentences). If you have colleagues who you think would be interested, please let them know. If you would like to present, but not this month, check out the dates and sign up for an upcoming one:
coiled.io/dask-demo-days
----
What is Dask?
Dask is a free and open-source library for parallel computing in Python. Dask is a community project maintained by developers and organizations.
Share your feedback on this video in the comments and let us know:
- Did you find this video helpful?
- Have you used Dask before?
Learn more at dask.org
397 views
Videos
Dask Demo Day 2024-03-21
1.1K views · 9 months ago
Today's Talks: 00:00 Intro 00:38 Dask DataFrame is Fast - @fjetter 14:15 Large scale population of vector databases for RAG - @mrocklin 26:36 Easy GPU access with Coiled - @jrbourbeau Next Demo Day is April 18th, sign up here: github.com/dask/community/issues/307 What is Dask Demo Day? Each month we solicit 5-10 minute demos that show off ongoing and/or lesser-known work. Meetings will be recor...
Dask Demo Day - 2024-02-15
765 views · 10 months ago
Today's Talks: 00:00 Intro 01:18 One trillion row challenge - @mrocklin 06:20 Deploying Dask on Databricks - @jacobtomlinson 15:09 Deploying Prefect workflows on the cloud with Coiled - @jrbourbeau 29:22 Scaling embedding pipelines (LlamaIndex Dask) - @quasiben 46:45 Using AWS Cost Explorer to see the cost of public IPv4 addresses - @ntabris Next Demo Day is March 21st, sign up here: github.com...
Dask Demo Day - 2024-01-18
733 views · 1 year ago
Today's Talks: 00:00 Intro 00:47 Apache Beam DaskRunner - @cisaacstern 15:45 Array expressions - @mrocklin 26:27 One billion row challenge - @scharlottej13 What is Dask Demo Day? Each month we solicit 5-10 minute demos that show off ongoing and/or lesser-known work. Meetings will be recorded and advertised on social. Hopefully, this helps educate folks on some of the great work people do. If yo...
Dask Demo Day - 2023.10.19
847 views · 1 year ago
October 19th, 2023 Today's Talks: 00:00 Intro 00:31 @jacobtomlinson - "Who uses RAPIDS?" 10:51 @mrocklin - TPC-H benchmarks for Spark, Dask, Polars, DuckDB 24:27 @jhamman Dask - Arraylake integration 37:24 @mrchtr - Fondant We'd like to solicit 5-10 minute demos that show off ongoing or lesser-known work. I hope to have 3-5 of these during the meeting. Meetings will be recorded and advertised o...
Dask Demo Day - 2023-09-21
414 views · 1 year ago
Today's Talks 00:00 Intro 00:21 @fjetter - Performance with P2P array rechunking 14:04 @phofl - Dask expressions 27:07 @sjcharlotte13 @dcherian - Processing a quarter petabyte geospatial dataset in the cloud We'd like to solicit 5-10 minute demos that show off ongoing or lesser-known work. I hope to have 3-5 of these during the meeting. Meetings will be recorded and advertised on social. Hopefu...
Dask Demo Day - 2023-08-17
377 views · 1 year ago
Last Dask Demo Day of the summer! Today's Talks: @fjetter - Memray Integration for Memory Management @mrocklin - Some new updates and news @jrbourbeau - Analyzing Sea Levels in the Cloud with Earthaccess and Coiled We'd like to solicit 5-10 minute demos that show off ongoing or lesser-known work. I hope to have 3-5 of these during the meeting. Meetings will be recorded and advertised on social....
How to Install Dask
1.2K views · 1 year ago
Learn how to install Dask and the Dask JupyterLab extension with either conda or pip. This video goes through how to set up with a clean working environment with Dask 00:00 Introduction 00:51 Pip install Dask 02:21 Create LocalCluster 03:27 Use Dashboard in JupyterLab
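Roughly the steps listed in the chapters above, as a hedged sketch (the dask[complete] extra and the dask-labextension package are the usual route to the JupyterLab dashboard panes; the exact commands in the video may differ):

# In a terminal (pip shown; conda install -c conda-forge dask works too):
pip install "dask[complete]" dask-labextension

# Then in Python / a Jupyter notebook:
from dask.distributed import Client, LocalCluster

cluster = LocalCluster()        # scheduler and workers on the local machine
client = Client(cluster)
print(client.dashboard_link)    # open this URL, or use the Dask panes in JupyterLab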
Dask Demo Day - 2023-07-20
566 views · 1 year ago
Today's talks @hendrikmakait - Shuffle resilience @Matt711 - Dask-Kubernetes update @GueroudjiAmal - External tasks in Dask distributed (github.com/GueroudjiAmal/distributed) @skrawcz Dask - Hamilton integration We'd like to solicit 5-10 minute demos that show off ongoing or lesser-known work. I hope to have 3-5 of these during the meeting. Meetings will be recorded and advertised on social. Ho...
Dask Demo Day - 2023-06-15
539 views · 1 year ago
Today's Talks dask-geopandas demo by @martinfleis Fine performance dask metrics and spans @crusaderky (10-15 min) GIL monitoring on Dask @milesgranger We'd like to solicit 5-10 minute demos that show off ongoing or lesser-known work. I hope to have 3-5 of these during the meeting. Meetings will be recorded and advertised on social. Hopefully, this helps educate folks on some of the great work p...
Dask Demo Day 2023-05-18
479 views · 1 year ago
These are 5-10 minute demos that show off ongoing or lesser-known work. We hope to have 3-5 of these during the meeting. Meetings will be recorded and advertised on social. Hopefully, this helps to educate folks on some of the great work people are up to. Meetings are 3rd Thursday of every month at 11am EDT on zoom, Zoom link: us06web.zoom.us/j/89383035703?pwd=WkRJSzNnRTh4T2R1ZjJuVVdJWlMxQT09 W...
Dask Demo Day 2023-04-20
507 views · 1 year ago
Talks: Lindsey Gray - dask-awkward and dask-histogram for high energy physics analysis Amine Diro - daskqueue : a dask-based distributed task queue James Bourbeau - Pyarrow strings in Dask DataFrames Jacob Tomlinson - Launching a Jupyter/Dask cluster on NVIDIA Base Command Platform Want to present in one of the upcoming Dask Demo Days? Sign up here: github.com/dask/community/issues/307 Key Mome...
Dask Demo Day - 2023-03-16
467 views · 1 year ago
Dask Demo Days Talks: Analyzing Terabytes of Ocean Simulation model output with Xarray, xgcm and xhistogram - Tom Nicholas P2P shuffling - Hendrik Makait Scaling weather radar data analysis with Dask - Max Grover Automatic package synchronization in Coiled Dask Clusters - David Chudzicki Graph Neural Networks training with Dask - Vibhu Jawa Want to present at one of the upcoming Dask Demo Days?...
Dask Demo Day - 2023-02-16
502 views · 1 year ago
Monthly Dask Demo Day: February 2023 Talks: 00:00 Intro 00:28 New Dask integration in Flyte - Bernhard Stadlbauer 11:37 Parallelizing FTP downloads from a janky government server - Paul Hobson 22:45 Configurable Dataframe backends - Rick Zamora 34:36 Parallelize HPO of XGBoost with Optuna and Dask (multi-cluster) - Guido Imperiale 43:20 Accelerated Jaccard similarity using RAPIDS and Dask - Jiw...
Dask Demo Day - 2022-11-16
1K views · 2 years ago
Monthly demo day for Dask for November 2022 Github Issue: github.com/dask/community/issues/286 Talks: 00:00 Intro 03:05 2,000,000,000 lightning flashes - @ktyle 14:44 Dask CLI - @douglasdavis 21:44 Optuna - @jrbourbeau 32:00 Community Interlude - @mrocklin 34:02 Dask Awkward - @douglasdavis 46:02 Dask PySpy - @gjoseph92 01:03:30 Closing Follow us on twitter @dask_dev or sign up for the newslett...
Dask in Production | How Dask Can Help in Production
546 views · 2 years ago
Dask Use Case | Who Uses Dask: GrubHub
293 views · 2 years ago
Dask Use Case | Who Uses Dask: CapitalOne
302 views · 2 years ago
Dask Use Case | Who Uses Dask: Geophysical Sciences Studying Ocean Currents
317 views · 2 years ago
Dask Use Case | Who Uses Dask: UK Meteorology Office
191 views · 2 years ago
Dask Use Case | Who Uses Dask: WalMart
318 views · 2 years ago
Dask Use Case | CapitalOne: Adding Dask to Your Existing Pipeline
322 views · 2 years ago
Dask Scientific Libraries | Scaling Science | Genevieve Buckley
357 views · 2 years ago
New Dask Branding | Dask Gets an Upgrade
1.1K views · 2 years ago
Dask Use Case | Who Uses Dask: Financial Institutions
558 views · 2 years ago
Dask Best Practices | Scaling Up Science | Genevieve Buckley
3.6K views · 2 years ago
Dask for Science | Dask Example | Genevieve Buckley
357 views · 2 years ago
Scientific Computing & Dask | Leveraging Dask for Life Sciences | Genevieve Buckley
707 views · 2 years ago
360p, you have to be kidding me.
Great video!
based...
Cool, my team uses Dask at very large scale for transaction management: credit card, debit card, and ATM transactions. Basically, one CSV file comes in and we have to do line-by-line validation and transformations by joining a couple of MariaDB tables. Our goal is to finish 20 million records within 45 minutes. For step 1, file validation, I'm using a Dask bag with enumeration to record which lines have problems, and using Dask's lazy evaluation with delayed for the step 2 transformations. But we're not able to process 20 million in 45 minutes for now. Any insight would be highly appreciated 🎉
👋 Thanks for the question! It's hard to say without knowing more details. If you haven't already, I'd try posting your question in the Dask forum at dask.discourse.group/, ideally with a minimal reproducer so folks can jump in and help figure out where the bottleneck is.
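For readers following along, a rough, hypothetical sketch of the pattern described in the question above (dask.bag for per-line validation, dask.delayed for the transformation step); the file name, column count, and transform body are placeholders, not the original pipeline:

import dask
import dask.bag as db

# Step 1: validate each line of the incoming CSV in parallel.
lines = db.read_text("transactions.csv", blocksize="64MiB")   # placeholder file name

def validate(line):
    fields = line.rstrip("\n").split(",")
    return {"ok": len(fields) == 12, "raw": line}              # 12 columns is a made-up check

records = lines.map(validate)
n_bad = records.filter(lambda r: not r["ok"]).count().compute()

# Step 2: lazy transformations, one delayed task per partition of validated records.
@dask.delayed
def transform(partition):
    # placeholder for the MariaDB joins and business-rule transformations
    return [r for r in partition if r["ok"]]

results = dask.compute(*[transform(p) for p in records.to_delayed()])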
nixtla wasted a lot of my time and my friends' time with their broken forecasting tools. Exogenous variables do not work. Data scientists are toiling away with a crap product and they don't even know it, getting excited about this bullshit framework, and then they finally try exogenous variables and boom, all of this programming was a waste, because we can get much better accuracy without the nixtla dependencies when we add exogenous variables, which we can NOT do using this nixtla shit product. nixtla, it turns out, is a bullshit organization.
Gemini 1.5 Pro: The video mentions that group-by operations can fail due to large datasets and unsorted data. Here are the reasons for failure and how to compensate for them:
* **Large datasets:** When dealing with large datasets, it is recommended to tune the split_out parameter. This parameter determines the size of the output partitions, and a good starting point is to target 100-megabyte partitions. You can estimate the split_out value by considering the number of groups in your data and the size of each group.
* **Unsorted data:** Dask performs better when the data is sorted by the group-by fields. If your data is not sorted, Dask will shuffle the data to group it, which can be expensive. There are two ways to address this:
  * Sort your data before performing the group-by operation.
  * Use map_partitions. map_partitions can be used when your data is already sorted by an index matching one of your group-by fields. In this case, Dask can perform the group-by operation on each partition without shuffling the data.
Here are additional tips to improve the performance of group-by operations in Dask:
* **Optimize memory usage:**
  * Use the pandas string dtype instead of the object dtype for strings.
  * Use categorical data types when applicable. Categoricals are efficient when you have a small number of unique strings and the strings are large.
  * Drop unnecessary columns before performing the group-by operation.
* **Repartition your data:** Repartitioning your data ensures that the partitions are uniform in size. This can improve the performance of group-by operations by avoiding situations where some partitions are significantly larger than others.
* **Prioritize reductions before group-by:** Perform any filtering or data-reduction operations before the group-by operation. This will reduce the amount of data that needs to be shuffled or grouped.
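A hedged sketch pulling a few of those suggestions together (the Parquet path, column names, and split_out value are invented for illustration):

import dask.dataframe as dd

df = dd.read_parquet("events/")                          # hypothetical dataset

# Reduce first: keep only the columns you need and filter early.
df = df[["user_id", "country", "amount"]]
df = df[df["amount"] > 0]

# Cheaper dtypes: categorical for low-cardinality strings, pyarrow strings otherwise.
df = df.astype({"country": "category"})
df["user_id"] = df["user_id"].astype("string[pyarrow]")

# split_out spreads the group-by result over several output partitions
# instead of collapsing everything into one.
totals = df.groupby("user_id")["amount"].sum(split_out=32)
result = totals.compute()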
Gemini 1.5 Pro: This video is about Dask Bag, a library for processing large datasets in parallel.

The video starts with a basic introduction to Dask Bag. It explains that Dask Bag is useful for embarrassingly parallel analyses and a lot of pre-processing, especially on text, JSON, or Avro data. Then the video dives into details with an example. The speaker constructs a bag with ten elements, separated into four different partitions, to demonstrate what a bag is: a bag is like a bunch of lists. Users can apply map, filter, and reduce functions to the bag. For instance, the speaker uses the map function to square every element in the bag, and the filter function to keep only the even elements.

Next, the video shows how to use Dask Bag on real data. The data used in the example is a set of JSON files from a web service called MyBinder. The speaker reads the data using the read_text function from Dask Bag, then uses the map function to convert the JSON-encoded text into Python dictionaries. After that, the speaker uses the frequencies function to count how many times each GitHub repository shows up. The result shows that ipython is the most common repository in the data.

The video then talks about how to use Dask Bag to pre-process data. The speaker filters out records that do not have "task" in the "spec" field, converts the data back into JSON format, and finally writes the data to a text file.

The last part of the video talks about the DataFrame. The speaker mentions that Dask Bag may not be the right choice for complex analyses; Dask DataFrame might be a better option for such cases. The speaker also mentions that a Dask Bag can be converted to a Dask DataFrame using the to_dataframe function.
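A short sketch of that workflow, hedged: the file pattern is made up, and the "spec" field comes from the summary above, so details in the actual video may differ:

import json
import dask.bag as db

# Read JSON-lines text and turn each line into a Python dict.
records = db.read_text("mybinder-events-*.jsonl").map(json.loads)   # hypothetical file pattern

# Count how often each repository spec shows up.
top_specs = records.pluck("spec").frequencies(sort=True).compute()

# Pre-processing: keep records whose "spec" mentions "task", write back out as JSON text.
(records
    .filter(lambda r: "task" in r.get("spec", ""))
    .map(json.dumps)
    .to_textfiles("filtered-*.jsonl"))

# For more involved analyses, hand off to a Dask DataFrame.
df = records.to_dataframe()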
The resolution is very bad.
Excellent video, I wish all tech videos were this good.
Very interesting. Thank you for this view on the new dask_databricks functionalities.
Dask on Databricks is really cool. There's so many times you're on Databricks doing Python data science and don't want to use Spark.
Question regarding Array Expressions: how do they play together with the Dask (high-level) graph? A concrete xarray example: a problem with very large arrays is that even just their computational graph is too large to be materialized. A strategy is to read them without Dask (chunks=None), slice, and then again turn them into a dask-backed array by chunking. Would Array Expression simplify this, pushing the slicing before the graph materialization, or are those operating at different levels?
Expressions will eventually replace high-level graphs. They generate low-level task graphs directly. Slicing is definitely pushed through before graph generation, which will likely help reduce overall graph-generation overhead. It's still possible to create large graphs, though, just less likely. We're also shipping the expressions directly to the scheduler, so large graphs will be less painful (they won't have to travel over the wire).
@@Coiled Thanks for the answer! That actually sounds great, would help our workflows quite a bit.
Show its use with xarray
Thank you for having subtitles in Portuguese.
Where can I get Paul Hobson's source code ?
Awesome video, Trevor. Do you have any idea about resources I can use to learn more about Zarr and its built-in configurations? I have seen the documentation, but it seems a little overwhelming to me.
Nice video. Is there a detailed review of how your colleagues analyze billions of records? You mentioned it here: ua-cam.com/video/8aQ3xcX8e9Y/v-deo.htmlsi=0FRQOT9TEnDz9FUs&t=1621
@martinfleis can we access your notebook?
Had some issues with Ray, but Dask worked out of the Box! Congratulations to the Developers!
What is the name of this environment where you are running these commands?
It's a Jupyter notebook
Great intro. Also, how do I show those additional panes on the right, shown at 2:05, that display memory usage, progress, etc.? That is pretty awesome. Thanks so much.
Great work you guys
1:08:00
Could I use async/await with dask?
55:15
Is there an official Dask community channel?
Hi Matt. Amazing stuff as always. Do you know if there is something similar for VScode? Thank you!
pin me
Kept checking my Slack because I didn't realize it was coming from the video...
Is the notebook for the local GPU available?
Thank you!
Dask is the bomb.
Hiya, you mentioned Xarray in passing. Is there a multi-dimensional equivalent to cuDF?
Please correct me if I am wrong, but maybe it is better to open the file for writing at 4:48 in 'a' (append) mode, or every worker will overwrite the data inside and you will only have the result of the last worker to finish.
where can we download the CSV files?
Highest resolution available is 360p. It’s hard to read the code
These videos are fantastic but sometimes difficult to hear (even with my volume set to max)
Hi, since this video was posted, the dask-report.html page has an extra tab called "Summary" - is there a doc where I can read what the various stats in that summary mean?
Can't even create a dataframe from a Python list; you need to create a pandas dataframe first, which kinda defeats the whole purpose.
Thank you for this recorded Dask Demo Day! Are these Jupyter notebooks available for users?
We don't have a single repo for this, yet. My notebooks are available here github.com/fjetter/dask-demo
Hi! Thank you for this! Regards
Does Dask have some kind of linter?
Really a great talk!
Thank you for the explanation. Now it clears up my confusion on compute() vs persist()
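For anyone else landing here, a small sketch of that compute() vs persist() distinction (assuming a dask.distributed Client is running; the array sizes are arbitrary):

import dask.array as da
from dask.distributed import Client

client = Client()                       # local cluster for illustration

x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
y = (x + x.T).mean(axis=0)

y = y.persist()        # start computing now, keep the result as Dask chunks on the workers
result = y.compute()   # block and pull a single concrete NumPy array back to this process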
Where can I get access to the notebooks used here?
Hi Carlos, you can find them here: github.com/quasiben/rapids-dask-summit-2021
The quality of this video makes it impossible to read the code
This lib is awesome!!! Thanks a lot 😍😍
What a boring speaker, such disgusting English!
Thanks for the great explanation!