Where can I find the data set that is used in this video?
How would you solve this with a vectorized UDF?
Is there a demo of the same?
Chris and I did not vectorize the UDF. That's a great idea. I'll sync with Chris and see what we do. Thanks for the great suggestion.
@sonny.rivera That would be very helpful.
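In the meantime, here is a rough sketch of what the vectorized (pandas) UDF pattern looks like in Snowpark Python. The names are hypothetical and it assumes an existing Snowpark `session`; it is not the UDF from the video:

```python
# Rough sketch only: hypothetical names, assumes an existing Snowpark `session`.
import pandas as pd

from snowflake.snowpark.functions import pandas_udf
from snowflake.snowpark.types import FloatType, PandasSeriesType


@pandas_udf(
    name="scale_sales",                               # hypothetical UDF name
    input_types=[PandasSeriesType(FloatType())],
    return_type=PandasSeriesType(FloatType()),
    replace=True,
    session=session,                                  # assumed existing Session
)
def scale_sales(values: pd.Series) -> pd.Series:
    # The handler receives a whole batch of rows as a pandas Series,
    # so the work is done per batch instead of row by row.
    return values * 1.1
```

Note that a vectorized UDF batches rows into pandas objects to speed up per-row scoring; it doesn't by itself split the work per category the way a partitioned UDTF does.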
On a side note, is there any reference material on optimising costs for Snowflake compute resources?
Here are a few resources to get you started:
medium.com/snowflake/best-practices-to-optimize-snowflake-spend-73b8f66d16c1
medium.com/snowflake/using-snowflakes-scale-to-zero-capabilities-for-fun-profit-f326a1d222d0
medium.com/snowflake/deep-dive-into-managing-latency-throughput-and-cost-in-snowflake-2fa658164fa8
medium.com/snowflake/improve-snowflake-price-performance-by-optimizing-storage-be9b5962decb
medium.com/snowflake/compute-primitives-in-snowflake-and-best-practices-to-right-size-them-b3add53933a3
Thank you!
When you run the ML, does it run on a local machine or within Snowflake?
I often develop and test using VS Code/Python on my local machine and then deploy the code to Snowflake & Snowpark, where it runs in the cloud.
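For example, the local part of that workflow looks roughly like this (placeholder connection values and a made-up table name, not the project's actual config):

```python
# Minimal sketch of local dev with Snowpark: the session is created on the
# laptop, but the DataFrame work is pushed down and executed in Snowflake.
from snowflake.snowpark import Session

connection_parameters = {
    "account":   "<account_identifier>",   # placeholders, not real values
    "user":      "<user>",
    "password":  "<password>",
    "role":      "<role>",
    "warehouse": "<warehouse>",
    "database":  "<database>",
    "schema":    "<schema>",
}

session = Session.builder.configs(connection_parameters).create()

# This aggregation runs on the virtual warehouse, not on the local machine.
df = session.table("SALES").group_by("CATEGORY").count()
df.show()
```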
If we do the per-category training and predictions in the UDF generate_auto_arima_predictions via a pandas DataFrame, we wouldn't get any parallelization benefit, right? We would process all the categories sequentially.
Shouldn't we use a UDTF for these kinds of operations?
Thanks for your comment! A UDTF would be a stronger option, as it could leverage parallel partitioning to process the categories concurrently instead (as you mention). Check out the following two articles on training ARIMA models:
interworks.com/blog/2022/11/22/a-definitive-guide-to-creating-python-udtfs-directly-within-the-snowflake-user-interface/
interworks.com/blog/2022/11/29/a-definitive-guide-to-creating-python-udtfs-in-snowflake-using-snowpark/
For some more information on UDTFs and how they work, see:
interworks.com/blog/2022/11/15/an-introduction-to-python-udtfs-in-snowflake/
Thanks!
The models will run concurrently on the virtual warehouse. The UDTF is really just calling the 'predict' function. The model training is happening in the stored proc.
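To make the pattern concrete, here is a rough sketch of a partitioned UDTF in Snowpark Python. The table, columns, and placeholder forecast are made up, and, as noted above, the real model training/loading happens in the stored proc rather than in the UDTF:

```python
# Sketch of the partitioned-UDTF pattern: one handler instance per category,
# so partitions can be processed concurrently on the virtual warehouse.
from snowflake.snowpark.functions import col
from snowflake.snowpark.types import (
    FloatType, IntegerType, StringType, StructField, StructType
)


class ForecastHandler:
    def __init__(self):
        self.rows = []

    def process(self, category: str, period: int, value: float):
        # Buffer this partition's rows; nothing is emitted until end_partition.
        self.rows.append((category, period, value))

    def end_partition(self):
        # In the real implementation this is where the pre-trained model for
        # the category would be loaded and its predict() method called.
        category = self.rows[0][0]
        for horizon in range(1, 4):              # dummy 3-step horizon
            yield (category, horizon, 0.0)       # placeholder forecast


output_schema = StructType([
    StructField("category", StringType()),
    StructField("horizon", IntegerType()),
    StructField("forecast", FloatType()),
])

# Assumes an existing Snowpark `session`.
forecast_udtf = session.udtf.register(
    ForecastHandler,
    output_schema=output_schema,
    input_types=[StringType(), IntegerType(), FloatType()],
    name="forecast_per_category",                # hypothetical name
    replace=True,
)

# Partition by category so each category gets its own handler instance and
# the partitions can run in parallel across the warehouse.
df = session.table("SALES_HISTORY")
result = df.join_table_function(
    forecast_udtf(col("CATEGORY"), col("PERIOD"), col("VALUE"))
    .over(partition_by="CATEGORY")
)
result.show()
```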