Parallel Load in Spark Notebook - Questions Answered

  • Published Sep 5, 2024

COMMENTS • 5

  • @vivekupadhyay6663 (5 months ago)

    For CPU-intensive operations, would this still work, given that it uses threading? Also, couldn't we use multiprocessing if we want to achieve true parallelism?
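    A minimal sketch of the thread-based pattern the question refers to, assuming the parallel load is done with Python's concurrent.futures.ThreadPoolExecutor. The load_table function and table names are hypothetical stand-ins; time.sleep simulates a load that mostly waits (in a real notebook, a driver thread waiting on a Spark job running in the JVM). In CPython, threads only overlap while they are waiting, because the GIL is released then; CPU-bound pure-Python work on the driver would not speed up this way, and worker processes started with multiprocessing generally cannot share the notebook's SparkSession.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_table(name: str) -> str:
    # Hypothetical loader: sleep stands in for waiting on a Spark job.
    # Sleeping releases the GIL, so other threads can run meanwhile.
    time.sleep(0.2)
    return f"loaded {name}"


def load_all(names, max_workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every load at once and remember which future is which table.
        futures = {pool.submit(load_table, n): n for n in names}
        for fut in as_completed(futures):
            # result() re-raises any exception raised inside the thread.
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    start = time.perf_counter()
    out = load_all(["orders", "customers", "products", "payments"])
    elapsed = time.perf_counter() - start
    print(out)
    print(f"elapsed: {elapsed:.2f}s")
```

    With four workers, the four simulated loads overlap, so the total time is close to one load (about 0.2 s) rather than four.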

  • @neerajnaik5161 (8 months ago)

    I tried this. However, I ran into an issue: I have a single notebook that creates multiple threads, and each thread calls a function that creates Spark local temp views. The views get overwritten by the second thread, because all the threads share the same Spark session.
    How do I get around this?

    • @DustinVannoy (8 months ago)

      I would parameterize it so that each temp view has a unique name.

    • @neerajnaik5161 (8 months ago)

      @DustinVannoy Yeah, I had that in mind, but unfortunately I can't change it, because the existing jobs are stable in production. It is definitely useful for new implementations, though.

    • @neerajnaik5161 (8 months ago)

      I figured it out. Instead of calling the function, I can use dbutils.notebook.run to invoke the notebook in a separate Spark session. Thanks!
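    The two fixes discussed in this thread can be sketched as follows. This is a minimal sketch under stated assumptions, not code from the video. register_unique_view follows the suggestion to give each thread's temp view a unique name (the base name and key are hypothetical); it relies on the PySpark DataFrame method createOrReplaceTempView. run_notebooks_parallel follows the dbutils.notebook.run workaround, whose Databricks signature is dbutils.notebook.run(path, timeout_seconds, arguments). The runner is passed in as a parameter so the sketch can be exercised outside Databricks with a stub; the notebook paths and arguments are hypothetical.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def unique_view_name(base: str, key: str) -> str:
    # Keep only identifier-safe characters, then add a short random suffix
    # so two threads working on the same key still get distinct names.
    safe_key = "".join(c if c.isalnum() else "_" for c in key)
    return f"{base}_{safe_key}_{uuid.uuid4().hex[:8]}"


def register_unique_view(df, base: str, key: str) -> str:
    # df is assumed to be a PySpark DataFrame. The view is registered in the
    # shared SparkSession, but under a name no other thread will reuse.
    name = unique_view_name(base, key)
    df.createOrReplaceTempView(name)
    return name


def run_notebooks_parallel(run, jobs, timeout_seconds=3600, max_workers=4):
    # run: a callable shaped like dbutils.notebook.run(path, timeout_seconds, arguments).
    # jobs: list of (notebook_path, arguments_dict) pairs (hypothetical values).
    # As reported in the thread, each child notebook run gets its own Spark
    # session, so temp views created inside one run do not clash with another.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run, path, timeout_seconds, args) for path, args in jobs]
        # Collect results in submission order; result() re-raises failures.
        return [f.result() for f in futures]


# Usage inside a Databricks notebook (hypothetical paths and arguments):
# results = run_notebooks_parallel(
#     dbutils.notebook.run,
#     [("./load_orders", {"date": "2024-09-05"}),
#      ("./load_customers", {"date": "2024-09-05"})],
# )
```

    The unique-name approach keeps everything in one session but requires changing the function that creates the views; the notebook-per-thread approach leaves existing code untouched at the cost of starting a separate notebook run for each job.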