Dask on HPC Introduction

  • Published 19 Oct 2024
  • This thirty-minute screencast takes users through setting up an interactive computing environment on a traditional HPC supercomputer using Miniconda, Jupyter, and Dask. It covers the following:
    1. Install Miniconda
    2. Use dask-jobqueue to deploy Dask (a sketch follows this list)
    3. Build a configuration file to simplify things and share with colleagues
    4. Use SSH tunneling to get access to the Dask dashboard
    5. Use JupyterLab on the supercomputer from our laptops
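
A minimal sketch of step 2 and the parallel demo, assuming a SLURM-based machine and the dask-jobqueue API; the scheduler class (SLURMCluster vs. PBSCluster, etc.), queue name, and resource requests are placeholders that would need to match your own site:

```python
# Sketch: deploy Dask on an HPC batch scheduler with dask-jobqueue (assumes SLURM).
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Each batch job submitted to the queue becomes one Dask worker.
cluster = SLURMCluster(
    queue="regular",        # placeholder partition/queue name
    cores=1,
    memory="4GB",
    walltime="00:30:00",
)

cluster.scale(jobs=10)      # submit 10 worker jobs to the scheduler
client = Client(cluster)    # connect the local session to the cluster

# A trivial parallel demonstration in the spirit of the video's example.
futures = client.map(lambda x: x + 1, range(100))
print(sum(client.gather(futures)))
```

Calling `print(cluster.job_script())` shows the batch script that dask-jobqueue submits on your behalf, which is what the "Under the hood" part of the video inspects.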

COMMENTS • 13

  • @Xnowornever · 4 years ago · +8

    1:10 Setting up Miniconda.
    3:40 Installing ipython, dask and dask_jobqueue.
    4:57 Making a cluster with dask_jobqueue: requesting resources, submitting jobs and connecting to our dask cluster.
    8:08 Example: simple demonstration of parallelization.
    9:51 Under the hood: the job script submitted by dask.
    10:43 Cluster configuration.
    15:35 Distributed configuration (a sample configuration file follows this comment).
    17:50 SSH port forwarding + accessing the dask dashboard from your local machine.
    24:00 Set up your interactive work environment with jupyter lab.
    30:10 Further information.
    Thanks for the video!
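
As a companion to the "Cluster configuration" and "Distributed configuration" timestamps above: dask-jobqueue can read its defaults from a YAML file under ~/.config/dask/ (commonly jobqueue.yaml), so with such a file in place the cluster object can be created with no arguments. The scheduler section, queue, and resource values below are illustrative placeholders, not the video's exact settings:

```yaml
# Illustrative ~/.config/dask/jobqueue.yaml -- values are placeholders
jobqueue:
  slurm:
    queue: regular
    cores: 1
    memory: 4GB
    walltime: "00:30:00"
    interface: ib0    # network interface the workers should use, if applicable
```

For the SSH port forwarding step (17:50), a typical pattern is to forward the dashboard port from the machine running the scheduler to your laptop, e.g. `ssh -L 8787:localhost:8787 username@login-node`, then open localhost:8787 in a local browser; the same idea forwards the JupyterLab port (often 8888) for the interactive-environment step at 24:00. The host names and port numbers here are assumptions, not taken from the video.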

  • @neilmvaytet · 6 years ago · +7

    Hi Matthew, awesome, crystal clear and really useful video! One thing you may consider next time is not placing the bottom of the terminal all the way at the bottom of the video frame. The YouTube progress and control bar hides your last typed command whenever I press pause, which I do while trying to implement your setup as I watch the video. Thanks again for the great tutorial!

  • @deeplearningexplained · 4 years ago · +1

    Thanks Matthew for this, it really helped me get set up on my university cluster!

  • @lijodxl1 · 6 years ago · +2

    Thank you for this video. Along with explaining Dask's usage, you have introduced a very useful workflow.

  • @codea1273 · 4 years ago

    Excited to try this. Thanks!

  • @salmon9130 · 4 years ago

    Thank you, very enlightening.

  • @zapy422 · 5 months ago

    This is great for solving access to compute.
    How do you handle data and code dependencies?

    • @MatthewRocklin · 5 months ago

      In this case we're just using the network file system to handle software dependencies and data. This is how people commonly use HPC systems.

  • @zapy422 · 5 years ago

    Thank you for this explanation, very clear. I am having a hard time running parallel computation with ipyparallel inside Jupyter: I lose contact with my workers/engines after reconnecting to the notebook server.

  • @Suresh-sj9nk · 5 years ago

    Thank you very much for the tutorial!
    Can I use Jupyter Notebook instead of JupyterLab?

  • @Ellie-kv8oi · 4 years ago

    Hi and thanks for the video. For the number of processes, after following your steps, sometimes I only see 0/10 or 4/10 when I print out the client. Do you know what might be causing this? Also, when the jobs are done processing in the queue, the client's process count goes back to 0 for me.

    • @MatthewRocklin · 4 years ago

      Maybe your worker jobs haven't yet started?
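
To make the reply concrete: with dask-jobqueue the client shows 0/10 processes while the worker jobs are still waiting in the batch queue, and it typically drops back toward 0 once those jobs reach their walltime and exit. A hedged sketch, assuming the `cluster` and `client` objects from a dask-jobqueue setup like the one near the top of the page:

```python
# Worker jobs may sit in the batch queue; the client reports them only once they start.
cluster.scale(jobs=10)

# Block until at least 4 workers have connected, or raise after ~10 minutes.
client.wait_for_workers(n_workers=4, timeout=600)
print(client)  # should now show the connected workers
```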

  • @danielkrajnik3817 · 3 years ago

    1:00 dashdashboardboard