Building ML Models in Snowflake Using Python UDFs and Snowpark | DEMO

  • Published Dec 2, 2024

COMMENTS • 11

  • @nagasai5029 • A year ago

    Where can I find the dataset used in this video?

  • @octo3010 • A year ago • +1

    How would you solve this with a vectorized UDF?
    Is there a demo of the same?

    • @sonny.rivera • A year ago • +2

      Chris and I did not vectorize the UDF. That's a great idea; I'll sync with Chris and see what we can do. Thanks for the great suggestion.
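
For anyone who wants to experiment before such a demo exists: below is a minimal sketch of a vectorized (pandas) UDF in Snowpark Python, assuming an active Snowpark session; the function name and the FLOAT column type are illustrative, not code from this video. Snowflake hands a vectorized UDF a whole batch of rows at once, so the body uses vectorized pandas operations instead of one Python call per row.

    from snowflake.snowpark.functions import pandas_udf
    from snowflake.snowpark.types import FloatType, PandasSeriesType, PandasDataFrameType

    # Hypothetical vectorized UDF: doubles a numeric column.
    # The lambda receives a pandas DataFrame holding a batch of rows;
    # df[0] is the first (and only) input column.
    double_sales = pandas_udf(
        lambda df: df[0] * 2.0,
        name="double_sales",
        return_type=PandasSeriesType(FloatType()),
        input_types=[PandasDataFrameType([FloatType()])],
        replace=True,
    )

It is called like any scalar UDF (for example, df.select(double_sales("SALES"))), but each invocation of the Python function processes a batch, which is where the speedup over a row-by-row UDF comes from.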

    • @octo3010 • A year ago

      @sonny.rivera That would be very helpful.
      On a side note, is there any reference material on optimising costs for Snowflake compute resources?

    • @snowflakedevelopers • A year ago • +1

      Here are a few resources to get you started:
      medium.com/snowflake/best-practices-to-optimize-snowflake-spend-73b8f66d16c1
      medium.com/snowflake/using-snowflakes-scale-to-zero-capabilities-for-fun-profit-f326a1d222d0
      medium.com/snowflake/deep-dive-into-managing-latency-throughput-and-cost-in-snowflake-2fa658164fa8
      medium.com/snowflake/improve-snowflake-price-performance-by-optimizing-storage-be9b5962decb
      medium.com/snowflake/compute-primitives-in-snowflake-and-best-practices-to-right-size-them-b3add53933a3

    • @octo3010 • A year ago

      Thank you!

  • @tahabekmez5072 • A year ago

    When you run the ML, does it run on the local machine or within Snowflake?

    • @sonny.rivera • A year ago • +1

      I often develop and test using VS Code/Python on my local machine, and then deploy the code to Snowflake and Snowpark, where it runs in the cloud.
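
For context, that local-develop/cloud-run workflow starts with a Snowpark session created from local credentials; everything the session executes (queries, UDFs, stored procedures) runs in Snowflake, not on the laptop. The connection parameters below are placeholders:

    from snowflake.snowpark import Session

    # Placeholder credentials; substitute your own account details.
    connection_parameters = {
        "account": "<account_identifier>",
        "user": "<user>",
        "password": "<password>",
        "warehouse": "<warehouse>",
        "database": "<database>",
        "schema": "<schema>",
    }

    # Code written locally in VS Code drives this session, but the
    # compute happens on the Snowflake virtual warehouse.
    session = Session.builder.configs(connection_parameters).create()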

  • @saeedrahman8362 • A year ago

    If we do the per-category training and predictions in the UDF generate_auto_arima_predictions via a pandas DataFrame, we wouldn't get any parallelization benefit, right? We would process all the categories sequentially.
    Shouldn't we use a UDTF for this kind of operation?

    • @snowflakedevelopers • A year ago

      Thanks for your comment! A UDTF would be a stronger option, as it could leverage parallel partitioning to run the per-category work concurrently (as you mention); a minimal handler sketch follows this comment. Check out the following two articles on training ARIMA models:
      interworks.com/blog/2022/11/22/a-definitive-guide-to-creating-python-udtfs-directly-within-the-snowflake-user-interface/
      interworks.com/blog/2022/11/29/a-definitive-guide-to-creating-python-udtfs-in-snowflake-using-snowpark/
      For some more information on UDTFs and how they work, see:
      interworks.com/blog/2022/11/15/an-introduction-to-python-udtfs-in-snowflake/
      Thanks!
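
Here is a minimal sketch of such a partitioned Python UDTF handler, under stated assumptions: the single SALES input column, the pmdarima dependency, and the 12-period forecast horizon are illustrative, not the demo's actual code.

    # Hypothetical handler for a UDTF called with
    # OVER (PARTITION BY category), so each partition holds one
    # category's time series.
    class ForecastByCategory:
        def __init__(self):
            self._values = []          # buffered rows for this partition

        def process(self, sales: float):
            # Called once per row; emit nothing until the partition ends.
            self._values.append(sales)

        def end_partition(self):
            # One model per category, trained once the partition is complete.
            import pmdarima as pm
            model = pm.auto_arima(self._values, suppress_warnings=True)
            for step, yhat in enumerate(model.predict(n_periods=12), start=1):
                yield (step, float(yhat))

Registered via CREATE FUNCTION or session.udtf.register and invoked with PARTITION BY category, the partitions can be spread across the warehouse and trained concurrently.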

    • @sonny.rivera • A year ago

      The models will run concurrently on the virtual warehouse. The UDTF is really just calling the 'predict' function; the model training happens in the stored proc.
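
To make that division of labor concrete, here is a hedged sketch of the training half: a Snowpark stored procedure that fits the model once and stages the pickled result for the predict-side UDTF to load. The table name, stage path, and packages are assumptions, not the video's code.

    import io
    import pickle

    from snowflake.snowpark import Session

    def train_model(session: Session) -> str:
        import pmdarima as pm
        # Pull the training data into pandas inside the procedure.
        df = session.table("DAILY_SALES").to_pandas()   # hypothetical table
        model = pm.auto_arima(df["SALES"], suppress_warnings=True)
        # Stage the fitted model so the UDTF can download it and
        # call predict() at query time.
        session.file.put_stream(
            io.BytesIO(pickle.dumps(model)),
            "@models/auto_arima.pkl",                   # hypothetical stage
            auto_compress=False,
            overwrite=True,
        )
        return "model trained and staged"

    # session.sproc.register(train_model, name="train_model", replace=True,
    #                        packages=["snowflake-snowpark-python", "pmdarima"])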