Things I Wish I Knew When I Started As A Data Engineer

  • Published 26 Nov 2024

COMMENTS • 127

  • @SeattleDataGuy
    @SeattleDataGuy  3 years ago +4

    If you're enjoying this content, then you should sign up for my newsletter! seattledataguy.substack.com/p/how-to-improve-your-data-analytics

  • @malcorub
    @malcorub 3 years ago +41

    Seattle Data Guy is #1. I listen while washing the dishes and doing laundry. 😀

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +2

      I see, you are also with the D3, apparently I need to start a podcast

    • @malcorub
      @malcorub 3 years ago +3

      @SeattleDataGuy A podcast with guests would be super awesome. There are many analytics podcasts out there, but DE podcasts are hard to find.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +2

      Yeah I occasionally interview people. Maybe I will set it up on my other channel

    • @nanomartin
      @nanomartin 2 years ago +1

      Hey, what is that "other" channel?
      I'm just about to move into the data field. I discovered you a couple of weeks ago and found your videos really interesting. 😉

  • @eck1997rock
    @eck1997rock 2 years ago +1

    Look how well this guy puts it. Thanks for the content!

  • @snyab0354
    @snyab0354 3 years ago +27

    Super helpful! I was getting overwhelmed by the number of tools I thought I had to learn. Now I'm just focusing on building my Python and SQL skills, then going from there. Thanks as always for being a reliable source of data engineering content!

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +4

      Yeah! Focus on the basics. The rest comes, whether you want it to or not... I was literally walking someone through setting up a Docker Compose file and a Dockerfile to spin up Airflow, Redis and Postgres from memory... and I was like, why do I know this?

    • @GuyThompsonFWTX
      @GuyThompsonFWTX 3 years ago +6

      SQL, Python, the Linux CLI, a text editor (I use Sublime Text all the time for one-off changes to files that I'm having formatting issues with), and then maybe Node.js will get you really far.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +3

      @Guy Thompson What this guy says! If you have these skills, then even if you dislike data engineering you can shift your role easily.

    • @mustafakara7739
      @mustafakara7739 1 year ago

      Wait, don't we have to know this? :D @SeattleDataGuy

  • @LukeBarousse
    @LukeBarousse 3 years ago +38

    "Saying Yes to every request"... So guilty of this one; just because you want to do something doesn't mean you should.
    Great tips Ben!!

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +2

      For sure! It's so hard when you both want to be helpful and love every idea you hear.

    • @Mahj111
      @Mahj111 2 years ago

      lol, a lot of people working with data have this misconception that if business people write to them all the time, it means those people like them. Unfortunately, that's not true in most cases; those people are just trying to use you as a tool (I've been on both sides, so I know it perfectly well). The easiest way to get past this and do the work that actually matters and is planned for you is to redirect them to your manager and, even better, categorize the requests with your manager. If something needs to be done quickly, do it without consulting your manager, as it can have a meaningful impact on the business side (which means money). But I think it comes with experience. Unfortunately, I'm seeing a lot of newcomers who are pushed so hard by business people that out of nowhere they have 10 tasks to do and can't concentrate on their actual goals. I assume they're trying hard to impress, and that's not necessary, because your work will be evaluated by your manager, not by some random person from the company that you're helping out all the time.

  • @GuyThompsonFWTX
    @GuyThompsonFWTX 3 years ago +8

    I end up having to use the Accounting Software as a source of truth to gauge precision, accuracy, and alignment. Believe it or not, sometimes Accountants get the data entry wrong (2020 instead of 2021 happens all the time). The human factor in the source of truth needs to always be considered.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +1

      Oh yes! Yeah, I have seen issues like that. That's what anomaly detection is for... sometimes. "Oh, an accountant put in a -300k charge when they have never put in more than a -5k charge. Let's make sure that's right."
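
A minimal sketch of the kind of check described in the reply above, assuming charges arrive as plain numbers. The function name, the sample amounts, and the 10x threshold are all hypothetical, chosen only to mirror the "-300k versus never more than -5k" example; a production anomaly check would likely need something more robust.

```python
def flag_unusual_charges(history, new_charges, factor=10.0):
    """Return new charges far larger in magnitude than anything seen before.

    `factor` is an assumed tolerance: a charge is flagged when its absolute
    value exceeds `factor` times the largest absolute value in `history`.
    """
    largest_seen = max(abs(amount) for amount in history)
    threshold = largest_seen * factor
    return [amount for amount in new_charges if abs(amount) > threshold]


# Hypothetical data: past charges never exceeded -5k.
history = [-1200.0, -5000.0, -300.0, -4200.0]
incoming = [-2500.0, -300000.0, -4800.0]

flagged = flag_unusual_charges(history, incoming)
print(flagged)  # charges to send back to a human for confirmation
```

The flagged list is meant to go to a person for review rather than being rejected automatically, since a large charge can still be legitimate.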

  • @calin997
    @calin997 3 years ago +9

    If possible it would be great to hear more about your story going from BI to Data Engineering (I remember you mentioned this in one of your videos). I am in a similar place right now and would love to learn from your experience/steps you took. Otherwise, really great video, as always!

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +1

      I will see if I can put together another video on a similar subject but with some form of twist

  • @miguelalcorta3558
    @miguelalcorta3558 3 years ago +3

    Great video!! Really glad I found your channel. I'm currently navigating my first data engineer job and it has been hectic (but good).

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +2

      Yeah we are still grasping a lot of the DE space and trying to define a lot! Let me know if you have any questions.

  • @amdtoon
    @amdtoon 4 months ago

    I recommend making maps of resources and documenting scripts/processes.
    Basically something that can be referred to when you're on leave

  • @BJTangerine
    @BJTangerine 3 years ago +6

    There's so much to learn about data engineering; from programming skills, tooling knowledge, to interview prep. One could spend a wealth of time just on each.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +7

      You could spend 5 lifetimes learning and by the time you finished there would be 5 more lifetimes of tech to learn

    • @BJTangerine
      @BJTangerine 3 years ago +2

      @SeattleDataGuy hahaha that's right

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +1

      @BJTangerine Just learn the basics well, all the rest is fluff

    • @BJTangerine
      @BJTangerine 3 years ago +1

      @SeattleDataGuy thanks for the advice, I'll do my best

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +1

      Good luck!

  • @philippeaddelia2007
    @philippeaddelia2007 1 year ago

    This is a lot of experience served in a small, easy-to-digest format. Good advice.

  • @liy1561
    @liy1561 3 years ago +4

    Ure spoilin us with these vids Ben

  • @shrutijain1628
    @shrutijain1628 2 years ago +2

    This is just amazing
    Thank you for all these tips

  • @Goku-br7yt
    @Goku-br7yt 3 years ago +2

    Oh my man, someone spoke the truth.
    Fellow mates, yes, don't accept every damn ad hoc request 🙏
    Also, the CRM thing happened to me recently: a change in an existing SFDC construct just made my sync pipeline useless..
    & the ideal data source doesn't exist 🥲

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +2

      Yeah, there are so many problems that will continue to persist for...maybe all of eternity

  • @TheChrystiann
    @TheChrystiann 2 years ago +2

    Well done Ben!!! "Save your SQL query": I used to spend a lot of time just trying to remember my code haha, I learned the hard way!! Nice videos!

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +1

      It's such a small thing...but makes a big difference.

  • @antonkostov1691
    @antonkostov1691 2 years ago +1

    Great advice. Following several of your strategies, I am doing really well in my career. Thanks.

  • @tejalkarande
    @tejalkarande 3 years ago +3

    We like to have two sources of truth: one that just reads the data (no transformations) and another with transformations, both reading independently from the logging pipeline. That way, we know there is no integrity loss due to any of the transformations that are performed on the data.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      Yeah, I agree with this. Then you can usually run some QA and compare the two sources. In theory, if you write a query that does similar work to your pipelines, but on the raw form of the data, you should get the same output.
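
As a rough illustration of the QA comparison discussed in this thread, the sketch below recomputes a pipeline's aggregate directly from raw rows and lists any keys where the two disagree. The table names (`raw_events`, `daily_totals`), the columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for whatever warehouse is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw data, loaded with no transformations.
    CREATE TABLE raw_events (user_id INTEGER, amount REAL);
    INSERT INTO raw_events VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 2.0);

    -- Hypothetical pipeline output; user 3 was dropped somewhere upstream.
    CREATE TABLE daily_totals (user_id INTEGER, total REAL);
    INSERT INTO daily_totals VALUES (1, 15.0), (2, 7.5);
""")


def find_mismatches(connection):
    """Recompute totals from raw data and return rows that disagree.

    Only catches keys that exist in the raw data; keys that appear solely
    in the pipeline output would need the reverse join as well.
    """
    query = """
        SELECT r.user_id, r.total AS raw_total, t.total AS pipeline_total
        FROM (
            SELECT user_id, SUM(amount) AS total
            FROM raw_events
            GROUP BY user_id
        ) AS r
        LEFT JOIN daily_totals AS t ON t.user_id = r.user_id
        WHERE t.total IS NULL OR ABS(t.total - r.total) > 1e-9
        ORDER BY r.user_id
    """
    return connection.execute(query).fetchall()


mismatches = find_mismatches(conn)
print(mismatches)
```

An empty result means the transformed table agrees with the raw data for every key the raw data contains.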

  • @priteshugrankar6815
    @priteshugrankar6815 3 years ago +2

    I'm not a data engineer, yet I watch your videos. You are a good chap 👍🏻

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +2

      Glad you enjoyed it! What do you do?

    • @priteshugrankar6815
      @priteshugrankar6815 3 years ago

      @SeattleDataGuy I'm a storage guy currently working as a Service Account Manager handling tech and non-tech tasks. I just have a wavering interest in data engineering. 😂 I use Excel to create performance and capacity reports, and at times use ready-made templates from within Tableau. I have done a bit of scripting and automation before.

  • @andrewkim2891
    @andrewkim2891 2 years ago +2

    "a self-service BI tool or a single source of truth as overused marketing terms"...was therapeutic to hear. It made me think it might be time we drop this unrealistic bar I find myself striving towards. What do you think the tech field is closer to achieving, the concepts of self-service BI tool/single data source of truth or having SQL as a standard competency for non-data professions (i.e. product, marketing, ops, management)?

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago

      Yeah, I have heard those terms...for the last decade.

  • @uranio58
    @uranio58 1 year ago

    Thanks for your videos and advice!!! ❤

  • @HenioGracie
    @HenioGracie 3 years ago +1

    Hey SeattleDataGuy, I don't know if you remember, but we talked a few months ago about MS Azure certs. I've done three of them so far: AZ-900, DP-900 and now DP-203. The first two are entry-level exams covering general knowledge; however, AZ-900 is harder than you would imagine for such trivia. DP-900 was very easy, but it is a fundamentals exam, so that's kind of what you'd expect from entry level. However, the MS associate-level exams can be a bit of a pain: DP-203 (data engineer) was a HARD exam, no walkover. There were a lot of data warehousing and processing questions, mostly about Azure Synapse, and I think an equal emphasis on Databricks-related questions. I think some experience with these two, plus free content from the internet, should get you through this exam easily.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +2

      Wow, congrats on the certs! Also, thanks for sharing what they were about. It's not only helpful to me but, I am sure, to everyone else who reads this comment in the future!

    • @HenioGracie
      @HenioGracie 3 years ago +1

      @SeattleDataGuy np! I am looking forward to doing at least two more on Azure: at least one associate-level and one expert-level. I will share here how it went when I get through them, for sure.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      @HenioGracie Good luck to you!!

  • @alexanderbernard7846
    @alexanderbernard7846 3 years ago +1

    Great video, biggest thing like you said is to focus on certain things and build upon that foundation, don’t get caught up on the newest technology because it’ll be an endless cycle.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      It really is a hellish cycle. Thank you marketing teams...

  • @enpassant7358
    @enpassant7358 2 years ago +1

    I follow a YouTuber who flies paramotors, Tucker Gott. You remind me so much of him, you two could be brothers.

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +1

      Interesting! I get this every once in a while. Not with this specific YouTuber...just other YouTubers.

  • @peteintania
    @peteintania 2 years ago +1

    This video is very useful. Thank you, Ben!

  • @chadheitman3646
    @chadheitman3646 3 years ago +4

    Hey man, I'm enjoying your channel, I think it's quality content! Just a request: would it be possible to get videos or a series in the future on data engineering projects and walkthroughs, implementing various data engineering tools? I think you'd be great at making those types of videos and it would be a huge value add! Thanks

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +2

      I really do want to do some of these videos. Just juggling a lot right now. But I am going to try to get these done!

    • @chadheitman3646
      @chadheitman3646 3 years ago

      @SeattleDataGuy Just saw your reply, appreciate you getting back to me! Can't wait till you're able to make it happen. Thanks for making the content you do while balancing a life! Take care!

  • @day77012
    @day77012 3 years ago +3

    I get data engineer interview calls where they keep asking for Spark, but I don't use Spark in my current job (we use Dataflow, Cloud Functions, Airflow, BigQuery and Snowflake for our ETLs). I was wondering whether I should start learning Spark, if it is something a data engineer really needs. What are your thoughts?

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +2

      It depends, do you want to work for these companies? If so, then yeah. If not, then maybe just have a high level understanding. The truth is, in 10 years, the DE tools we work with will likely change...I mean it might even be 5 years or 2. So if you don't want to work with the companies, then you might find that in a few years that tool isn't the cool thing to use anymore.

    • @day77012
      @day77012 3 years ago

      @SeattleDataGuy Thanks for responding! You are the best! 🙂😇

  • @minthura24
    @minthura24 3 years ago +1

    Thanks for sharing!

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago

      Man, I missed so many of your comments! My bad. Thank you!

  • @digithat6496
    @digithat6496 3 years ago +3

    Great helpful video! So, as a data engineer, what are the first things/checks you do when you're given a dataset/table?

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +4

      Generally, I try to figure out where the data is coming from, the logic that populates that table, if there are similar fields in other tables, duplicates, correct data types, and going forward, probably set some anomaly checks.

    • @GuyThompsonFWTX
      @GuyThompsonFWTX 3 years ago +1

      @SeattleDataGuy Checking data types and using trim for leading/trailing spaces, hidden characters, or zero-width characters should really be higher in the data audit methodology.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      @Guy Thompson Hahahaha very true :). The hidden character one...always the worst. "Whoops, I was expecting a different kind of line break, so I just inserted everything into one line."

    • @loner007
      @loner007 3 years ago +1

      @SeattleDataGuy That could be a nice video to share with an example.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +1

      Yes it really could be. Also probably very boring for 90% of people sadly. But I like the idea. I could do it with a touch of humor.
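
The checks listed in this thread (duplicates, data types, stray whitespace, hidden or zero-width characters) could be sketched roughly as follows. This assumes rows are already loaded as Python dictionaries; the `audit_rows` helper, the column names, and the sample rows are made up for illustration and are not taken from the video.

```python
import unicodedata
from collections import Counter


def audit_rows(rows, key, expected_types):
    """Return a list of human-readable issues found in `rows`.

    `key` is the column expected to be unique; `expected_types` maps
    column names to the Python type each value should have.
    """
    issues = []

    # Duplicate check on the supposed unique key.
    for value, count in Counter(row[key] for row in rows).items():
        if count > 1:
            issues.append(f"duplicate {key}={value!r} ({count} rows)")

    for index, row in enumerate(rows):
        for column, expected in expected_types.items():
            value = row[column]
            if not isinstance(value, expected):
                issues.append(
                    f"row {index}: {column} is {type(value).__name__}, "
                    f"expected {expected.__name__}"
                )
            if isinstance(value, str):
                if value != value.strip():
                    issues.append(f"row {index}: {column} has leading/trailing whitespace")
                # Unicode categories Cc (control) and Cf (format) cover line
                # breaks, tabs and zero-width characters such as U+200B.
                if any(unicodedata.category(ch) in ("Cc", "Cf") for ch in value):
                    issues.append(f"row {index}: {column} contains hidden characters")
    return issues


# Hypothetical sample: a duplicate id, a padded name, a number stored as
# text, and a zero-width space hidden inside a name.
rows = [
    {"id": 1, "name": "Alice", "amount": 10.5},
    {"id": 2, "name": " Bob", "amount": "7"},
    {"id": 2, "name": "Car\u200bol", "amount": 3.0},
]
issues = audit_rows(rows, key="id", expected_types={"id": int, "name": str, "amount": float})
for issue in issues:
    print(issue)
```

In practice the same checks are often written in SQL against the table itself; the Python version here is only meant to make each check explicit.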

  • @NoobCoderDiaries
    @NoobCoderDiaries 3 years ago +1

    That's really helpful. Thank you for uploading these kinds of videos. Sir, will you make a Discord channel?

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +1

      Perhaps at some point! I am juggling a few things right now, so it's not on the current roadmap.

  • @valterbarros100
    @valterbarros100 3 years ago +2

    Hi Ben! Thank you for your video! I'm from Brazil and looking forward to learning more about data engineering. Where can I learn more about using SFTP servers, as seen in the roadmap?

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      That's a solid question. I think I will likely put out a video on the concept at some point, but you can read more here: www.ssh.com/academy/ssh/sftp

  • @AlphaWatt
    @AlphaWatt 2 years ago +1

    Great video

  • @justsomeguy1408
    @justsomeguy1408 3 years ago +1

    I appreciate this kind of insight as someone who's more or less spearheading the burgeoning data culture at my work. It often feels like I have to learn all aspects of data at once, since we basically don't have governance or management, so I have to research and implement those principles myself. Do you have any recommended resources for simple frameworks that someone can use if they have to be a data manager, engineer, and analyst at the same time? Thanks for the video!

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +1

      This will always be hard. The best place to start is to try to get buy-in from management. If you don't have stakeholder buy-in...it's going to be an uphill battle.
      But after that you can check out DAMA-DMBOK: Data Management Body of Knowledge. This is kind of a broad view of data management. From there I would scale a strategy that works for you. It sounds like you have a small or mid-sized company, so you will need to pick your battles. What makes the most sense to focus on? What has the highest ROI? These are questions you will need to ask. Also, probably limit the number of dashboards and insights you deploy. You want to avoid trying to support too much early on. Just focus on applying some basic concepts that you pick up from something like the DMBOK and, once you have that set up, continue to grow the whole data strategy.

    • @justsomeguy1408
      @justsomeguy1408 3 years ago +1

      @SeattleDataGuy Thank you for the advice! I checked out DMBOK and ordered their Navigating the Labyrinth book for starters. I'll definitely focus more on choosing appropriate battles and being mindful of how much analysis I'm trying to support. Thanks!

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      Good luck! Hopefully things go well.

  • @LeX-fr-
    @LeX-fr- 3 years ago +2

    Great video! Just as you mentioned at 8:24, I am at that stage currently. I recently got an opportunity to work as a data engineer, which starts after my 4-month internship. Among projects, data tools and courses, what would you suggest I do in the meantime to get ready for the overwhelming experience? 😬

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      Oh, that is an amazing stage. Everything works, and it takes like 2 days to put up some piece of code that takes 3 weeks in a company. I would say practice interviews...sadly, interviews are different from work...but you have to interview to get work.

    • @TheCheukhin
      @TheCheukhin 3 years ago

      Azure/AWS/GCP, Apache Spark, Hadoop, T-SQL, pgSQL, SQL (Oracle), CosmosDB, MongoDB, Python, data pipelines, DevOps, CI, CD, MLOps

  • @pushpanthkumar9028
    @pushpanthkumar9028 3 years ago +1

    Unrelated question:
    I am curious to know how you track/monitor hundreds of data pipelines: whether they completed successfully or failed, and whether they are meeting the SLA or not. I am looking for input on a solution to this problem.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +1

      How are you currently managing this? Part of this can be managed by having at least a few DEs to manage hundreds of pipelines, because at a certain point, regardless of the tool, you will need to break up some of the work. As far as tools go, Airflow has a good UI that can be used to see failures and track complex DAGs. It doesn't have SLAs per se, but it does have all that data stored, so you can see runtimes and landing times and do some analysis from there. Plus, I am sure there are a few plugins you can add to do that automatically.
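
The "do some analysis from there" idea can be sketched without tying it to any particular orchestrator. The example below assumes run records (pipeline name, start time, finish time, success flag) have already been exported from wherever the scheduler stores them; the `PipelineRun` class, the pipeline names, and the SLA values are hypothetical, and no Airflow API is used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PipelineRun:
    """One exported run record; field names are assumptions for this sketch."""
    pipeline: str
    started_at: datetime
    finished_at: datetime
    succeeded: bool


def summarize_runs(runs, sla_by_pipeline, default_sla=timedelta(hours=1)):
    """Group pipelines into those that failed and those that ran past their SLA."""
    failed, late = set(), set()
    for run in runs:
        if not run.succeeded:
            failed.add(run.pipeline)
            continue
        allowed = sla_by_pipeline.get(run.pipeline, default_sla)
        if run.finished_at - run.started_at > allowed:
            late.add(run.pipeline)
    return {"failed": sorted(failed), "late": sorted(late)}


# Hypothetical nightly runs, all starting at 02:00.
start = datetime(2024, 1, 1, 2, 0)
runs = [
    PipelineRun("orders", start, start + timedelta(minutes=20), True),
    PipelineRun("billing", start, start + timedelta(minutes=95), True),
    PipelineRun("crm_sync", start, start + timedelta(minutes=5), False),
]
slas = {"orders": timedelta(minutes=30), "billing": timedelta(minutes=60)}

report = summarize_runs(runs, slas)
print(report)
```

A summary like this scales to hundreds of pipelines because a person only has to read the two short lists, not every run.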

  • @jaymedina1991
    @jaymedina1991 2 years ago +1

    Great tips!
    One other thing about data is how often the quality of the data is so bad. Many times I’ve been banging my head trying to get my query right, when it’s actually the data that is of bad quality.

  • @torontonian77
    @torontonian77 2 years ago +1

    What the heck is that book with Godzilla on the cover?

  • @JohnLee-vb8we
    @JohnLee-vb8we 3 years ago +1

    Hi SDG, a question on data modeling. I am a new data engineer and wondering where I can improve my data modeling skills. I know a lot of people mention Ralph Kimball's book; is there anything else? For example, where can I find database designs from companies like Uber/Facebook?

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +1

      Maybe their engineering blogs? Even though I don't think they talk much about data modeling. Sidenote, in Facebook's interview they will likely ask questions about data modeling that can often best be solved with Kimball. So that's a great place to start. Also, Data Vaults are making a resurgence here and there. So maybe read up on that. hanshultgren.files.wordpress.com/2012/09/data-vault-modeling-guide.pdf

  • @melanyjimenez757
    @melanyjimenez757 2 years ago +1

    How do you keep track of your SQL versions? I usually add comments with the last update date but I'm looking for a more functional way to do it

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +1

      I think most of us use Git (GitHub) or some other version control tool. dbt can help manage versions as well, although that's less for ad hoc queries.

  • @chacmool2581
    @chacmool2581 2 years ago

    I want to master Spark and Mahout. I am putty in your hands. Tell me how to achieve my goal.

  • @shauryajain4851
    @shauryajain4851 3 years ago +1

    Would it be a good decision to pursue a master's in data engineering, considering its future scope is bright?

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      There are master's degrees in DE! I haven't seen one. I think most engineering skills don't need a master's degree. For us it's more about experience, unless you're coming from a non-technical background and looking to break into tech.

    • @shauryajain4851
      @shauryajain4851 3 years ago +1

      @SeattleDataGuy It's Northeastern University in Boston; I hope you've heard the name 😂😅
      I agree with your point, but breaking into data engineering as a fresher is tough.

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      Yes I have heard of it. It is tough, but I know a lot more companies like FB are looking for talent out of college for the role

  • @antoniobogdan8679
    @antoniobogdan8679 3 years ago +3

    Did you really say "bună dimineața" at the beginning of the video, or is it just an illusion because I'm Romanian?

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +2

      frati ma! Hahaha, my parents both moved from Romania to the US in the 1980s. Prior to the wall falling and the whole Christmas thing. My Romanian is terrible but I can understand a decent amount. You will also notice that the Funko pop in my background is Florian Munteanu i.e. Razor Fist.

    • @antoniobogdan8679
      @antoniobogdan8679 3 years ago +2

      @SeattleDataGuy Haha, I didn't expect that, to be honest :D. I am really happy to know this, because since I decided to try a career switch from SAP consulting to data engineering I found your channel, and your advice and explanations are very useful. Now, knowing that you have Romanian roots, I will have an even bigger smile on my face when you upload a new video!

    • @calin997
      @calin997 3 years ago +1

      I was about to post the same thing when I saw your comment. 😁😁

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      @calin997 Hahaha! There are plenty of us Romanians in tech!...also hacking

    • @antoniobogdan8679
      @antoniobogdan8679 3 years ago +1

      @SeattleDataGuy True 🤣🤣 We are waiting for you in Romania in the coming years to drink some țuică or palincă and talk about data.

  • @nerwin3409
    @nerwin3409 3 years ago +1

    Hello sir, I'm in my 2nd year of college and I want to be a DE after graduating. I'm still learning Python and SQL. Can you give me some tips or topics I should learn after the basics? I don't know what to do or study next. Thank you in advance :)

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago +1

      Hello! I have a video on a data engineering roadmap. This should help answer what you should study next. ua-cam.com/video/SpaFPPByOhM/v-deo.html

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      Thought I replied to this. I have a video on a roadmap for data engineering. ua-cam.com/video/SpaFPPByOhM/v-deo.html

  • @ashishagrawal5003
    @ashishagrawal5003 3 years ago +1

    Learning to say no is the hardest skill IMO.

  • @chessopenings
    @chessopenings 3 years ago +1

    10:30 yes DBT 👍

  • @jesuispac
    @jesuispac 1 year ago +1

    Why did you greet in Romanian?

    • @SeattleDataGuy
      @SeattleDataGuy  1 year ago

      It's where my family is from

    • @jesuispac
      @jesuispac 1 year ago

      @SeattleDataGuy Good day! Thank you for the content. Very good! :)

  • @dataArtists
    @dataArtists 3 years ago +2

    Seems us data folk are always asking 'Where' the source of truth 'SofT' is of our corporate brethren. I guess that makes us 'SofTWhere' developers! 😁🤣

    • @SeattleDataGuy
      @SeattleDataGuy  3 years ago

      I need to get a snare drum or something...dun dun tttsssss. Thanks for the humor!

  • @letechnicaljames
    @letechnicaljames 3 years ago +1

    Hello darkness my old friend.

  • @victorpinasarnault9135
    @victorpinasarnault9135 2 years ago +1

    Watching in 2022.

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago

      Responding in 2022! Thanks for supporting the channel

  • @MirasBlackbox
    @MirasBlackbox 2 years ago +1

    I am a BI analyst/engineer and feel attacked xD. Kidding. Everything is so relatable ~~~ 👍🥲