Python Libraries You Should Know As A Data Engineer - Python For Beginners

  • Published 18 Aug 2024
  • What Python libraries should data engineers know?
    Here is a list from beginner to advanced!
    Beginner
    - Requests
    - Paramiko
    - Psycopg2 or SQLAlchemy
    - Datetime
    Mid
    - BeautifulSoup
    - Airflow
    - All the cloud libraries (AWS, GCP, Azure)
    Advanced
    - PySpark
    - PyKafka
    0:00 Intro
    2:10 Requests
    2:44 Paramiko
    3:02 Psycopg2
    4:00 Basic Data Engineering Project Idea
    4:42 BeautifulSoup
    5:02 Datetime
    6:00 Airflow
    6:33 All the cloud libraries (AWS, GCP, Azure)
    8:30 PySpark and PyKafka
    If you enjoyed this video, check out some of my other top videos.
    Top Courses To Become A Data Engineer In 2022
    • Top Courses To Become ...
    What Is The Modern Data Stack - Intro To Data Infrastructure Part 1
    • What Is The Modern Dat...
    If you would like to learn more about data engineering, then check out Google's GCP certificate
    bit.ly/3NQVn7V
    If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
    seattledataguy...
    Or check out my blog
    www.theseattle...
    And if you want to support the channel, then you can become a paid member of my newsletter
    seattledataguy...
    Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
    _____________________________________________________________
    Subscribe: / @seattledataguy
    _____________________________________________________________
    About me:
    I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.
    *I do participate in affiliate programs, if a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.

COMMENTS • 28

  • @SeattleDataGuy 1 year ago

    If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/

  • @shravanshenoy3873 1 year ago +18

    Beginner -
    1. Requests (and sftp)
    2. Psycopg2 and similar database libraries
    3. Beautifulsoup and scrapy
    4. Datetime
    5. Virtualenv
    Intermediate -
    6. Airflow
    7. Boto3 and similar libraries to interact with cloud
    8. Flask/Django
    Advanced (based on need to know) -
    9. Pyspark
    10. Pyarrow

  • @RSKriegs 1 year ago +11

    Some other cool libraries from my side:
    - Pandas - you've mentioned it, but I don't think you put it in a context that shows one should know it (see the case from your Facebook interviews) - I think it's essential for any sort of data wrangling with Python.
    - NumPy - essential stuff for any sort of algebra if you want to dive deeper into ML
    - MyPy/Pydantic - for data validation & static typing
    - Pytest - for testing
    - matplotlib & seaborn - for data visualization in Python
    - any sort of file libraries for specific file formats like json, csv, avro-python etc.
    - ML libraries like scikit-learn
    - FastAPI as an alternative to Django/Flask
    - Selenium
    - argparse for scripting
    Although I haven't used most of these in my job on a regular basis - I think it doesn't hurt to know them :)
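
Several of the libraries in this list (argparse, csv, json) ship with Python, so they can be combined without installing anything. The following is a minimal sketch of a small conversion script; the function names and the `csv_path` argument are hypothetical and only there for illustration.

```python
import argparse
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text (with a header row) into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


def build_parser():
    """Describe the command-line interface: one positional file path."""
    parser = argparse.ArgumentParser(description="Convert a CSV file to JSON.")
    parser.add_argument("csv_path", help="path to the input CSV file")
    return parser


def main(argv=None):
    """Read the CSV file named on the command line and print it as JSON."""
    args = build_parser().parse_args(argv)
    with open(args.csv_path, newline="", encoding="utf-8") as handle:
        print(csv_to_json(handle.read()))
```

Calling `main(["data.csv"])` (or wiring `main()` to a script entry point) would run the conversion on a file; every CSV value comes back as a string, so type conversion is left to the caller.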

    • @data-dylan 1 year ago

      sympy is more of an algebra library. I think you meant numpy is a linear algebra library. This can be a good way of thinking about it for a beginner who wants to learn ML, but I find it gets used a lot for stuff where you want to try and represent continuous mathematics as closely as possible on a computer. For example, numpy would also be good for stuff like signal processing or creating a function of best fit for your data that can be plotted.
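
As a rough illustration of the "function of best fit" use case, here is a minimal sketch using `np.polyfit` and `np.poly1d`. The sample data is made up and chosen to lie exactly on a line so the result is easy to check by eye.

```python
import numpy as np

# Hypothetical sample data that lies exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Least-squares fit of a degree-1 polynomial.
# Coefficients come back highest power first: [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

# poly1d wraps the coefficients in a callable that can be evaluated
# at any x, for example to draw the fitted line on a plot.
best_fit = np.poly1d([slope, intercept])
predicted = best_fit(x)
```

The `predicted` array can be passed to a plotting library alongside `x` to draw the fitted line over the raw points.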

  • @hdr-tech4350 1 year ago +1

    Requests
    Psycopg
    Bigquery
    Beautifulsoup & scrapy
    Datetime
    Boto 3
    Flask
    Virtualenv
    Spark
    Pyarrow
    Pykafka
    Snowflake

    • @SeattleDataGuy 1 year ago

      Thanks! I finally added in the agenda so these are now included.

  • @matthewwiese6972 1 year ago

    Psycho pg2 is how I've heard folks say it too!

  • @luizhenriquecudo125 1 year ago

    Great content as usual! I'd add the json library to that

  • @shashankemani1609 1 year ago +1

    amazing thank you!

  • @lkellermann 1 year ago

    Watching the premiere... expecting to hear about the tenacity library here xD

  • @EH-it8pj 1 year ago +7

    I'm stuck in a "data engineer" position where all my boss will let me do is debug SQL script and it's killing me

    • @gavinkalaher7314 1 year ago

      how long have you been there?

    • @jeffGordon852 1 year ago +1

      QUIT

    • @playea123 1 year ago +2

      Leave if you can. You are doing yourself no favors by wasting years at a job you don’t like and especially one that isn’t improving your skills

  • @redrum4486 1 year ago

    I have to use a shell script to execute MySQL queries, then pass the result as an argument to my Python scripts >_< Wish I could just use mysql connector
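
For the workflow described here, a minimal sketch of the Python side could look like the following. It assumes the shell script runs the `mysql` client in batch mode, which prints tab-separated rows with a header line, and that the whole result arrives as a single argument. The function and argument names are hypothetical.

```python
import argparse


def parse_batch_output(text):
    """Turn tab-separated text with a header row into a list of row dicts."""
    lines = text.strip("\n").splitlines()
    if not lines:
        return []
    header = lines[0].split("\t")
    # Pair each data row's fields with the column names from the header.
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


def main(argv=None):
    """Read the query result from the command line and print each row."""
    parser = argparse.ArgumentParser(
        description="Consume a query result passed as one argument."
    )
    parser.add_argument("result", help="tab-separated query output, header row first")
    args = parser.parse_args(argv)
    for row in parse_batch_output(args.result):
        print(row)
```

From the shell this might be invoked as `python report.py "$(mysql --batch -e 'SELECT id, name FROM users' mydb)"`, where `report.py`, `users`, and `mydb` are placeholder names. All values arrive as strings, and very large results can exceed the shell's argument-length limit, in which case piping through standard input is the safer route.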

  • @data-dylan 1 year ago

    How can you know pandas every which direction, but not understand a dictionary? You wouldn't know how to construct a dataframe from a dictionary of lists (often my approach when webscraping) or know how to use the map function to change categorical names. Wes McKinney (who created pandas) even says that a pandas series data structure is similar to an ordered dictionary.
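
The two techniques mentioned here can be sketched in a few lines. This is a minimal example with made-up data: the column names, values, and lookup dictionary are all hypothetical.

```python
import pandas as pd

# Hypothetical scraped data, collected as a dictionary of equal-length lists.
scraped = {
    "library": ["requests", "psycopg2", "pyspark"],
    "level": ["b", "b", "a"],
}

# Each key becomes a column; each list supplies that column's values.
df = pd.DataFrame(scraped)

# Series.map with a dict replaces each value by its lookup result.
# Values missing from the dict would become NaN.
level_names = {"b": "beginner", "a": "advanced"}
df["level"] = df["level"].map(level_names)
```

Both steps lean directly on plain dictionaries: one as the source of the columns, the other as the lookup table for renaming categories.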

  • @pcargolo1 1 year ago

    I've gone through possibly all the Python courses on Udemy but have never seen a course focused on Data Engineering and the good-to-know libraries. Sometimes there is one short chapter about one of them, but nothing complete. Does anyone have any tips?

  • @SanjeevKumar-dr6qj 1 year ago +1

    You are awesome.

  • @EbeneezerGumb 1 year ago

    good list, but most of your psycopg2 stuff prob would have been easier with sqlalchemy

  • @gabrielkolletalves493 1 year ago +1

    Regarding APIs, I always thought we should learn how to pull from them, not actually create them. So where does Flask fit into all that?

    • @playea123 1 year ago

      Depends on what product is built on top of your db/dw. You might need to build an api on top of your warehouse to power your product.

    • @gabrielkolletalves493 1 year ago

      @playea123 Cool. And do you know what kind of custom API could run over a DW? I could only think of such a case in an OLTP context...

    • @playea123 1 year ago +1

      @gabrielkolletalves493 Depends on how you model your DW. If you want something similar to an OLTP, Snowflake rolled out hybrid tables a few months ago

  • @alexanderpotts8425 1 year ago

    hey! leave gcp libs alone 😂