Delta Lake: Optimizing Merge

  • Published Jan 28, 2025

COMMENTS • 12

  • @jacek_laskowski
    1 year ago +2

    The pace, tone and fairly detailed slides made this talk so pleasant to tune in. Thanks Justin! 👏👏👏

  • @guambomber448
    1 year ago +1

    I don't understand how pruning on the left has any effect, because the left is the source of the distinct dates in the first place.

  • @vdsg
    1 year ago

    Is the slide deck available as a PDF somewhere?

  • @sushantpachipulusu8646
    2 years ago

    Can you please share the deck or the KB articles shared in the slides?

  • @alessiocesaretti3614
    1 year ago

    Hello, I was trying to apply your suggestion about partition pruning on the existing table by using the distinct values of the partitioning column in the incoming table to be merged. I wanted to do this dynamically, but I found a corner case: I have a history (historization) table that I'd like to update, and if a new record has a new date (e.g. 2024) while the old version of that record was created in 2023, my merge condition "hist.BK_id == incoming.BK_id AND partition_year in (2024)" won't let me update the old record (set is_current = False). I end up with duplicates, and I cannot figure out an efficient way to include the partition without doing another expensive lookup. Do you have any suggestions for this use case?
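    The approach described in the comment above (building the MERGE condition from the distinct partition values of the incoming data) and the corner case it runs into can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the speaker's method: the helper names `build_merge_condition` and `matched_target_rows`, the aliases `hist`/`incoming`, and the column names `BK_id`/`partition_year` are hypothetical, and the in-memory "tables" are just lists of dicts standing in for Delta tables.

```python
# Hypothetical helper: builds a MERGE condition string that adds a
# partition-pruning predicate derived from the source's distinct partition values.
def build_merge_condition(key_cols, partition_col, source_partition_values,
                          target_alias="hist", source_alias="incoming"):
    # Equality predicates on the business keys
    key_preds = [f"{target_alias}.{k} = {source_alias}.{k}" for k in key_cols]
    # Literal IN-list on the *target* partition column, so only those partitions are scanned
    values = ", ".join(repr(v) for v in sorted(set(source_partition_values)))
    prune_pred = f"{target_alias}.{partition_col} IN ({values})"
    return " AND ".join(key_preds + [prune_pred])


# Hypothetical stand-in for what the MERGE would match, using in-memory rows.
def matched_target_rows(target_rows, source_rows, key_col, partition_col):
    allowed_partitions = {r[partition_col] for r in source_rows}
    source_keys = {r[key_col] for r in source_rows}
    return [t for t in target_rows
            if t[key_col] in source_keys and t[partition_col] in allowed_partitions]


# The corner case from the comment: the current version of BK_id 1 lives in 2023,
# but the incoming change is dated 2024, so the pruned condition never reaches it.
target = [{"BK_id": 1, "partition_year": 2023, "is_current": True}]
incoming = [{"BK_id": 1, "partition_year": 2024}]

print(build_merge_condition(["BK_id"], "partition_year",
                            [r["partition_year"] for r in incoming]))
# prints: hist.BK_id = incoming.BK_id AND hist.partition_year IN (2024)
print(matched_target_rows(target, incoming, "BK_id", "partition_year"))
# prints: []  (the old 2023 row is not matched, so is_current is never set to False)
```

    In PySpark, a string like this could be passed as the `condition` argument of `DeltaTable.merge`. The sketch only shows why pruning on the source's partition values is unsafe when the matching target row can live in a different partition; it does not resolve that case.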

  • @schallereqo
    4 years ago

    Great talk Justin! Learned a lot of new things

  • 4 years ago

    Thanks Justin, this explanation is excellent.

  • @Universal_MisiQ
    3 years ago

    Very informative Justin

  • @titowoche
    3 years ago

    Great talk

  • @harshikamahesh9459
    1 year ago

    Just because you get a separate bucket for a table doesn't mean that the bucket will be in its own partition. S3 will scale