Data Ingestion From APIs to Warehouses - Adrian Brudaru

  • Published Sep 12, 2024

COMMENTS • 20

  • @fahadshoaib8735 · 7 months ago · +26

    Starts at 10:44

  • @manonleroux5861 · 7 months ago

    Really well-made class. I appreciate having a complete .md file with all the instructions written out — it's really convenient! Thanks!

    • @dltHub · 3 months ago

      Thank you! The intention was to deliver the class in a human format here, so the student can take their time to deep dive on their own.

  • @OskarLindberg-v5h · 7 months ago · +2

    Seems like a great tool! Combined with Mage, which already has great integrations with dbt, it seems you can build a powerful and easy-to-set-up data pipeline :-)

    • @dltHub · 3 months ago

      Yep, we are orchestrator-agnostic, so you can run it on Mage or whatever you want. We offer a dbt runner too, so you don't have to set up credentials and config twice.

  • @easypeasy5523 · 6 months ago

    Amazing content and discussion. As a data engineer myself, I'm looking forward to contributing to the project.

    • @dltHub · 3 months ago

      Thank you!

  • @fabmeyer_ch · 7 months ago

    If I run both examples in Google Colab for all rows — not just the first 5 in example 3 — I get about a 10x speed-up with example 3 vs. example 2. How is this possible?

    • @dltHub · 3 months ago

      Because a full download at once is faster than a full download as a stream. What we demonstrate in the timing is that one does a full download while the other fetches only 5 rows.
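The benchmark asymmetry described in this reply can be sketched with stand-in functions — the names, the row counts, and the in-memory data source below are illustrative assumptions, not the actual notebook code:

```python
import itertools
import time

N = 1_000_000  # assumed row count, for illustration only

def fetch_all(n):
    # "example 2" style: download/parse every row up front
    return [{"row": i} for i in range(n)]

def fetch_stream(n):
    # "example 3" style: a generator that yields rows lazily
    for i in range(n):
        yield {"row": i}

# Timing only the first 5 rows of the stream is not a fair comparison:
# the generator materializes 5 rows, while fetch_all builds all N.
start = time.perf_counter()
first_five = list(itertools.islice(fetch_stream(N), 5))
stream_time = time.perf_counter() - start

start = time.perf_counter()
everything = fetch_all(N)
full_time = time.perf_counter() - start

print(len(first_five), len(everything))
```

Once both variants process all rows, the streamed version can win or lose on other factors (chunking, parsing overhead), which is why the full-run timings in Colab differ from the truncated demo.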

  • @tobiasfsdfsd · 7 months ago

    Might be a stupid question, but at 33:25 you say that if we have a 4 GB file, we end up with 8 GB in memory. Why is that?

    • @MAHDY51 · 7 months ago

      Because it's held twice

    • @DylanTan · 7 months ago · +2

      You keep the contents of the 4 GB file in memory twice, in both the "data" and "parsed_data" variables, so 8 GB of memory is consumed.

    • @dltHub · 3 months ago

      Besides it being kept twice, we assume efficient data storage. In reality, if you load 4 GB into a DataFrame, you might see much more RAM usage.
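A minimal sketch of the doubling discussed in this thread, reusing the "data" and "parsed_data" variable names from the question — the file layout (a JSON array, and newline-delimited JSON for the streaming variant) is an assumption:

```python
import json

def load_all(path):
    # Non-streaming: the raw text ("data") and the parsed objects
    # ("parsed_data") are both alive at the same moment, so a 4 GB
    # file needs roughly 8 GB — and often more, because Python
    # objects carry per-object overhead beyond the raw bytes.
    with open(path) as f:
        data = f.read()              # full raw text in memory
    parsed_data = json.loads(data)   # parsed copy, raw text still referenced
    return parsed_data

def stream_rows(path):
    # Streaming alternative (assumes newline-delimited JSON): only
    # one row is materialized at a time, so memory use stays flat
    # regardless of file size.
    with open(path) as f:
        for line in f:
            yield json.loads(line)
```

Dropping the reference early (e.g. `parsed_data = json.loads(f.read())` without a named `data` variable) lets the garbage collector reclaim the raw text, but only streaming keeps peak memory independent of file size.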

  • @user-md8np5wo6s · 7 months ago · +2

    LEGO 75192 Millennium Falcon?

  • @dawei7 · 7 months ago · +5

    Quite hard to watch and follow.
    A workshop should be about writing code step by step, not showing ready-made code.

    • @MegaTarino · 7 months ago · +3

      The topic is too broad to write the code live — he had about 1 hour to demonstrate 3 dlt features, make an introduction, and show generators. Writing it all would have taken a few hours. Moreover, in the case of dlt the code itself is 5 lines, and one has the notebook... I agree that it was sometimes hard to follow, but better that than watching a video several hours long (IMHO).

    • @tobiasfsdfsd · 7 months ago · +6

      Yeah, well, most of the time it was like reading aloud what is written in the file. There wasn't much free-form speaking, and only a few examples were explained better than in the text. It seems sufficient to just read the file. But I was happy when Alex asked some questions or explained some things differently. Apart from that, I think there is no need to watch the video.

    • @dltHub · 3 months ago

      @tobiasfsdfsd This was a live workshop, which had to be pre-prepared to fit the time and cover everything effectively; there would be little benefit in going off topic. A class is different — you are welcome to suggest it to us, and if there is enough demand we will do it!

    • @dltHub · 3 months ago

      @MegaTarino That was exactly what we were aiming to do: convey the info effectively in a short amount of time. As with any class, there's an expectation that the learner dedicates 8x the time themselves to practice and comprehend, and for that we provided the notebook and homework assignment. We barely got started covering the topic, since it's so broad. We are working on a more comprehensive pipeline-building course, but that will take 6 hours and be similarly packed, with the expectation that the student invests at least 24 hours on their own.
      Covering an entire domain in a few hours is quite efficient, I would say — there's a ton of possible complexity around ELT.