Advancing Spark - Databricks Feature Store

  • Published 13 Jul 2024
  • We've talked about the idea of feature stores, and how they're used to help Data Science teams scale, so it's about time we had a look at using one in anger.
    In this video, Simon welcomes back Gavi, our resident Data Science expert, to walk us through an example Feature Engineering workflow, saving the feature tables down to the store then pulling them into a model training workflow.
    For a step-by-step guide to getting started with the Databricks Feature Store, see Gavi's excellent blog over on the AA website: www.advancinganalytics.co.uk/...
    As always, don't forget to Like & Subscribe, and let us know if we can help your Data Science journey!

COMMENTS • 13

  • @manishmishra341
    @manishmishra341 1 year ago +3

    Thanks Simon, that was very helpful as always. Just wondering why the step-by-step guide link isn't working - would you be able to share the link to Gavi's notebook or update the description if possible?

  • @peterbutsch
    @peterbutsch 11 months ago +3

    Hey! The link to the previous video doesn't work. Is it only me?

  • @hubert_dudek
    @hubert_dudek 2 years ago +2

    After all, a feature store is just a Delta table/DataFrame registered in the Hive metastore. With the ready-made feature store options it isn't possible to specify the kind of join. It should also handle storing images - not just as URLs to S3, but natively, with the possibility to preview images in the table preview.

  • @mimmakutu
    @mimmakutu 2 years ago +3

    No online/offline sync is a big missing feature.

  • @antoruby
    @antoruby 2 years ago +3

    18:48 I missed where df_batch came from

    • @AdvancingAnalytics
      @AdvancingAnalytics  2 years ago +2

      I'll do a follow-up vid this week looking at the engineering behind this one - I'll get back to you when I go through the notebook!

    • @Asiagosik
      @Asiagosik 2 years ago +1

      Any follow-up on this? Same question

  • @torito_misiu
    @torito_misiu 1 year ago

    Once you've cleaned the data, what do you do with the columns? Are they stored in a feature store, or do they go back into the raw data?

    • @AdvancingAnalytics
      @AdvancingAnalytics  1 year ago +1

      Yep, the assumption in these steps is that you're cleaning up the data that's used in features, and it's stored directly in the feature store (as Delta tables under the hood!)
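As a hedged sketch of that "stored directly in the feature store" step: the table name `fs_demo.customer_features`, the `customer_id` key column, and the import guard below are illustrative assumptions, while `FeatureStoreClient.create_table` is the Databricks entry point that registers the DataFrame as a Delta table. The guard lets the snippet degrade gracefully outside a Databricks runtime.

```python
# Hedged sketch: saving cleaned features to the Databricks Feature Store.
# Outside a Databricks runtime the client library is unavailable, so we
# fall back to a message instead of raising.
try:
    from databricks.feature_store import FeatureStoreClient
except ImportError:
    FeatureStoreClient = None

def save_features(df, table_name="fs_demo.customer_features"):
    """Register a cleaned feature DataFrame as a feature table (Delta under the hood)."""
    if FeatureStoreClient is None:
        return "databricks-feature-store not installed"
    fs = FeatureStoreClient()
    fs.create_table(
        name=table_name,              # assumed catalog/schema/table name
        primary_keys=["customer_id"],  # assumed key column for lookups
        df=df,
        description="Cleaned customer features",
    )
    return table_name

print(save_features(None))
```

On an actual Databricks cluster `df` would be the cleaned Spark DataFrame from the feature-engineering notebook, and the resulting table can be queried like any other Delta table.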

  • @mchiranjeeb
    @mchiranjeeb 2 years ago +1

    What if I don't have primary keys? How would I process command 31? Creating the feature store throws the following ValueError:
    Non-unique rows detected in input dataframe for key combination ['column1', 'column2',...'columnN']

    • @AdvancingAnalytics
      @AdvancingAnalytics  2 years ago +2

      You either need to make a primary key by hashing over several columns, or drop duplicates (and accept that the feature lookup would return the same result for matching records). Similar constraints to performing a merge.
      Simon
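The two options Simon describes can be sketched in plain Python (the helper names and sample rows are hypothetical; on Databricks itself you'd typically build the same surrogate key with Spark's `sha2(concat_ws(...))` and use `dropDuplicates` before calling `create_table`):

```python
import hashlib

def surrogate_key(row, key_cols):
    """Option 1: build a deterministic primary key by hashing several columns."""
    raw = "|".join(str(row[c]) for c in key_cols)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]

def dedupe_on_key(rows, key_cols):
    """Option 2: drop duplicate key combinations, keeping the first occurrence."""
    seen, out = set(), []
    for row in rows:
        k = surrogate_key(row, key_cols)
        if k not in seen:
            seen.add(k)
            out.append({**row, "feature_key": k})
    return out

rows = [
    {"column1": "a", "column2": 1, "value": 10},
    {"column1": "a", "column2": 1, "value": 99},  # duplicate key combination
    {"column1": "b", "column2": 2, "value": 20},
]
unique = dedupe_on_key(rows, ["column1", "column2"])
print(len(unique))  # 2
```

The hash makes a usable primary key, but as Simon notes, any rows sharing the same key columns collapse to one feature row, so lookups for those records all return the surviving values.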

  • @asanahidi3231
    @asanahidi3231 5 months ago

    The link doesn't work.
