Great video Simon, thanks for sharing
Thanks Simon for the suggestion of putting the checkpoints alongside the destination table. For a recent client, I've built a streaming pipeline with an AvailableNow trigger into a UC managed Delta table with Apply Changes. Is it still possible to put the checkpoint folder inside a managed table like that? The example provided only works for external tables.
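For reference, a minimal sketch of the external-table version of the pattern being discussed, run batch-style with trigger(availableNow=True). Source/target names and the storage path are illustrative placeholders, not from the video; `spark` is assumed to be the notebook's session.

```python
# Minimal sketch: checkpoint folder kept alongside an external Delta table,
# processed incrementally with trigger(availableNow=True).
# Table names and the storage path below are illustrative placeholders.
table_path = "abfss://lake@storageacct.dfs.core.windows.net/silver/orders"

df = spark.readStream.table("bronze.orders")

(df.writeStream
   .format("delta")
   .option("checkpointLocation", f"{table_path}/_checkpoint")  # checkpoint sits next to the table's data
   .trigger(availableNow=True)   # process everything currently available, then stop
   .start(table_path))           # external table: we control this path, so the layout works
```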
Great video, love it. I have a question: if I have, for example, 3 ETL pipelines that use readStream and writeStream (let's say every pipeline is a separate notebook), what are the best practices for including them in my workflow? Should I add them as tasks without dependencies?
If you want them all to be using the same job cluster (which makes sense from a cost POV!) then they need to be in the same workflow, and all the restartability is around each task. Three tasks without dependencies makes the most sense, unless you have them as children of a single "initialisation task". If they're not sharing a cluster, then put them on separate workflows so you can stop/start/manage them independently - it all depends how inter-dependent the streams are.
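For illustration, roughly the shape of that single-workflow layout as a Databricks job definition: three notebook tasks with no depends_on between them, all pointing at one shared job cluster. Job name, notebook paths and cluster settings are placeholders; check the Jobs docs for the exact fields.

```python
# Rough sketch of a Databricks job definition: three independent streaming
# notebooks sharing one job cluster. Names, paths and cluster settings are
# illustrative placeholders.
job_spec = {
    "name": "streaming_etl",
    "job_clusters": [{
        "job_cluster_key": "shared_stream_cluster",
        "new_cluster": {
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
    }],
    "tasks": [
        {
            "task_key": f"stream_{name}",            # no depends_on -> the three streams run in parallel
            "job_cluster_key": "shared_stream_cluster",
            "notebook_task": {"notebook_path": f"/ETL/{name}"},
        }
        for name in ("orders", "customers", "products")
    ],
}
```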
@@AdvancingAnalytics thanks!
Good video Simon. Question: we always hear that you can run streaming in a batch-like mode (via the trigger), and that's true... BUT we have not been able to yield the correct output when performing left or right outer joins between 2 streaming data sets (with slow-moving data). To solve the problem, we have to treat the outer side as static, and that shouldn't be the case. Any thoughts?
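For reference, stream-stream outer joins in Spark need a watermark on both inputs plus an event-time range condition in the join; the unmatched (null-extended) rows are only emitted once the watermark passes them, so with slow-moving sources they can show up very late. A minimal sketch with placeholder table and column names:

```python
from pyspark.sql.functions import expr

# Sketch of a stream-stream left outer join with the watermarks and event-time
# constraint Spark requires before it will emit unmatched rows.
# Table and column names are illustrative placeholders.
impressions = (spark.readStream.table("bronze.impressions")
               .withWatermark("impression_time", "2 hours"))

clicks = (spark.readStream.table("bronze.clicks")
          .withWatermark("click_time", "3 hours"))

joined = impressions.join(
    clicks,
    expr("""
        impression_id = click_impression_id AND
        click_time >= impression_time AND
        click_time <= impression_time + interval 1 hour
    """),
    "leftOuter",   # unmatched impressions appear only after the watermark advances past them
)
```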
Good explanation, as always Simon. A suggestion if you don't mind. Can you please shrink the size of your video a little bit, so we get to see more of your presentation? Thanks a lot.
More content, less of my face, gotcha :)
I'll have a tweak!
@@AdvancingAnalytics , I'm a fan of your expressions too. Thanks.
Very good content. Thank you!