Scalable Machine Learning with Dask

  • Published 30 Jul 2024
  • AnacondaCon 2018. Tom Augspurger. Scikit-Learn, NumPy, and pandas form a great toolkit for single-machine, in-memory analytics. Scaling them to larger datasets can be difficult, as you have to adjust your workflow to use chunking or incremental learners. Dask provides NumPy- and pandas-like data containers for manipulating larger-than-memory datasets, and dask-ml provides estimators and utilities for modeling larger-than-memory datasets. These tools scale your usual workflow out to larger datasets. We’ll discuss some of the challenges data scientists run into when scaling out to larger datasets. We’ll then focus on demonstrations of how dask and dask-ml solve those challenges. We’ll see examples of how dask can expose a cluster of machines to scikit-learn’s built-in parallelization framework. We’ll see how dask-ml can train estimators on large datasets.
  • Science & Technology
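
The description mentions that scaling past memory usually means rewriting a workflow around chunking or incremental learners. As a rough illustration of what that manual bookkeeping looks like, here is a minimal pure-Python sketch. It is not code from the talk and does not use the Dask or dask-ml APIs: `iter_chunks` and `IncrementalLogisticRegression` are hypothetical names invented for this example, and the data is synthetic.

```python
import math
import random
from typing import Iterator, List, Tuple

Row = Tuple[List[float], int]  # (features, label)


def iter_chunks(n_rows: int, chunk_size: int, seed: int) -> Iterator[List[Row]]:
    """Yield synthetic rows one chunk at a time.

    Hypothetical stand-in for reading successive pieces of a dataset
    that does not fit in memory.
    """
    rng = random.Random(seed)
    chunk: List[Row] = []
    for _ in range(n_rows):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        label = 1 if x[0] + x[1] > 0 else 0  # simple linear decision rule
        chunk.append((x, label))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter chunk
        yield chunk


class IncrementalLogisticRegression:
    """Logistic regression trained by SGD, one chunk at a time."""

    def __init__(self, n_features: int, learning_rate: float = 0.5) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x: List[float]) -> float:
        z = self.bias + sum(w * v for w, v in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, chunk: List[Row]) -> None:
        """Update the model using only the rows in this chunk."""
        for x, label in chunk:
            error = self.predict_proba(x) - label
            self.weights = [
                w - self.learning_rate * error * v
                for w, v in zip(self.weights, x)
            ]
            self.bias -= self.learning_rate * error

    def predict(self, x: List[float]) -> int:
        return 1 if self.predict_proba(x) >= 0.5 else 0


# Only one chunk is held in memory at any time during training.
model = IncrementalLogisticRegression(n_features=2)
for chunk in iter_chunks(n_rows=10_000, chunk_size=500, seed=0):
    model.partial_fit(chunk)

# Evaluate on freshly generated rows from a different seed.
test_rows = [row for chunk in iter_chunks(2_000, 500, seed=1) for row in chunk]
accuracy = sum(model.predict(x) == label for x, label in test_rows) / len(test_rows)
print(f"held-out accuracy: {accuracy:.3f}")
```

The point of the sketch is the shape of the loop: the caller has to manage chunk iteration and call an incremental update by hand, which is the kind of workflow change the description says dask and dask-ml are meant to spare you.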

COMMENTS • 15

  • @bharatggaikwad 6 years ago +1

    Thanks Tom, excellent introduction to dask. I am trying it out.

  • @bordeivlad 5 years ago +3

    great presentation! Thank you Tom.

  • @AkshayRoyal 4 years ago +1

    Great Presentation Tom

  • @thomas.moerman 6 years ago +2

    Good talk! Dask rules.

  • @internationalscholarhw 4 years ago

    Thanks Tom

  • @user-ky5yz7pw8g 3 years ago

    Great, thank you Tom.

  • @divyamaskar6950 4 years ago

    Can't we use a Dask DataFrame for modelling, e.g. for logistic regression? I tried and got a NotImplementedError.

  • @clarissalee7414 5 years ago

    Nice. Will use for some jobs

  • @julianhecker944 4 years ago

    Would reeeeally like to make some distributed computing library using this!

  • @mrmuranga 3 years ago

    Thanks.
    Are you able to share your code?
    Where would you recommend one learn Dask and scikit-learn?

  • @kidou123456 5 years ago +1

    Regarding the 50 cluster workers in the demo, I guess they are processes on a bare-metal server (correct me if I am wrong). How do you actually set up servers for a distributed Dask job?

    • @kevinfortier556 5 years ago

      If you have access to an HPC cluster, check out dask-jobqueue. If you want to set up your own cluster, try Kubernetes. I believe Dask has support for Kubernetes.

  • @NabinKhadka14 5 years ago

    So dask is for statistical calculations?

    • @owisagrom 5 years ago +1

      No, it scales existing machine learning tools for parallel processing

  • @shubhambansal532 4 years ago

    The sound in your video is very low. It makes it difficult to focus on your topics. I think you should improve it.
    And thanks for this beautiful explanation.