If you enjoyed this video, then consider signing up for my weekly newsletter! seattledataguy.substack.com/p/airbyte-is-open-source-the-way-forward
I appreciate this video so much. This is the real "day in the life" video that I've been looking for. These other videos are almost always shown in a glorified manner. I don't care what you had for lunch; tell me what you do at work in said profession.
Thanks again!
Yup, they aren't day in the lives of a professional, but of a general human🤷♀️
Thank you for this video, I've been working as a data engineer intern for 6 months now and this video shows everything I'm seeing in my job
I am so glad it was helpful!
Mic so big they call it Michael
Amazing video..
I watched your video and it helped me a lot...
I'm now a data engineer.
Thanks a lot
You're making my day! How are you enjoying being a DE?
I wasn't expecting a "day in the life" video so soon. Thanks
I had been planning to make it for a while. So ta-da!
@SeattleDataGuy I'm going to sign up for classes at my local community college. They have three programs: general programming, networking, and cyber security. I want to go into the data field but don't know which program to start with. Any suggestions?
My fav forgotten print statement.
print("did you make it here")
Overall, a very easy-to-understand explanation video. Good job. Currently I work as an engineering specialist and my role combines data engineering and data analysis. I feel that having some understanding of the data engineering perspective will help you out tremendously when it comes to analyzing said data.
I agree. From both perspectives. If you're a data engineer who understands the goals of a data analyst or a data analyst who understands the data engineering process, then you can communicate better with the other role.
@SeattleDataGuy hi guys, thanks for opening this topic of Data Scientist vs Engineer workflow & duties! Very interesting and informative.
For me, as a beginner in the data sector who wants to take an educational course, which is the better starting point for learning the subject: Data Science or Engineering?
Please advise!
Looking forward to receiving your support on that!
+ @Chris Ellis - your opinion is very welcome too 🙏
I'm in a small shop where I started doing Analyst work (i.e. writing queries, preparing reports for dashboard uploads, troubleshooting alignment between the frontend and the backend) and now I'm learning more about the data engineering aspect of how to manage the pipeline. The biggest problem for me is understanding the legacy code (which is honestly great learning). But I constantly feel like there's a better way, I just don't have the knowledge of how to do it yet. And like other creators have mentioned, there are just so many tools out there now.
Sounds about right. The legacy code can be great to learn but it can also be difficult to maintain. There honestly is probably a better way, but I assume it would be time consuming to migrate. What is the code base built off of?
I have been spending a lot of time moving some companies away from custom code and over to more maintainable solutions like Fivetran and Stitch or, if they still want to write code, Astronomer or Google Cloud Composer.
@SeattleDataGuy The data and pipeline are managed through GCP using Node.js and running Cloud Scheduler, Cloud Functions, Pub/Sub, etc. A lot of what I do is gathering data from another Accounting Software and trying to duplicate or create custom reports from the raw data. Part of my problem is that the Accounting Software is also a black box in that they don't have much documentation bc they're worried about competitors obtaining source information, so I'm constantly flying blind trying to reverse engineer their tables while also maintaining accounting standards. There's also some HubSpot, Slack, and Node-RED stuff in there as well. I feel that there's already some tooling within GCP that could make this easier to manage using Dataflow and Dataproc. But we're also talking about moving to Prefect to manage the pipelines as well. I wish there was another me: one to manage the day-to-day data analysis work, and another to research the pipeline and options for optimization and improvement.
I can at least tell you you're not alone! I think Dataflow or Cloud Composer would be good choices. Also, with HubSpot and Slack you could use a low-code solution, as most low-code solutions have data connectors to manage the big business apps. Are they expecting you to build all of this?
@SeattleDataGuy I won't build all of it. But once a lot of the analysis backlog (mainly updating SQL queries) is fixed, or as I become more knowledgeable and able to fix things faster, then I'll transition over to more of the DevOps/DE solutions. I'm already troubleshooting and monitoring the current pipelines which are based off BEAM structure - Source > Staging > DW > Creating Views > import into dashboard, so it's not far-fetched to pick up that task as well, right? This is my first Analyst/Engineering role after a decade in Accounting. I feel like I'm doing okay, always could be faster.
Sounds like a great move! It's pretty typical. How are you enjoying the shift over to the DE/Analyst world?
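For anyone following this thread who wants to see the Source > Staging > DW > Views flow in miniature, here is a rough sketch. It is not the commenter's actual pipeline: it assumes an in-memory SQLite database as a stand-in for the warehouse, and the table, view, and column names (`stg_invoices`, `dw_invoices`, `v_revenue_by_customer`) are made up for illustration.

```python
import sqlite3

# Hypothetical raw records, standing in for an export from a source system.
SOURCE_ROWS = [
    ("2024-01-05", "ACME", "100.50"),
    ("2024-01-06", "ACME", "20.00"),
    ("2024-01-06", "Globex", "75.25"),
]


def run_pipeline(conn, rows):
    """Load rows through staging -> DW -> view and return what a dashboard would read."""
    cur = conn.cursor()

    # Staging: land the data as-is, with every column kept as text.
    cur.execute(
        "CREATE TABLE stg_invoices (invoice_date TEXT, customer TEXT, amount TEXT)"
    )
    cur.executemany("INSERT INTO stg_invoices VALUES (?, ?, ?)", rows)

    # DW: a typed copy of the staged data.
    cur.execute(
        """
        CREATE TABLE dw_invoices AS
        SELECT invoice_date, customer, CAST(amount AS REAL) AS amount
        FROM stg_invoices
        """
    )

    # View: the reporting shape that the dashboard imports.
    cur.execute(
        """
        CREATE VIEW v_revenue_by_customer AS
        SELECT customer, ROUND(SUM(amount), 2) AS revenue
        FROM dw_invoices
        GROUP BY customer
        """
    )
    conn.commit()

    return cur.execute(
        "SELECT customer, revenue FROM v_revenue_by_customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(connection, SOURCE_ROWS))
```

Keeping the staging table untyped means a bad source value fails at the DW step, which is usually an easier place to troubleshoot than the dashboard.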
Although building pipelines is more of a challenge, I am really enjoying Azure Synapse Analytics (serverless SQL) for data engineering. Essentially, there is just a data store (with documents) and a data warehouse definition. Data visualization tools just tap the data warehouse definition and an engine serves the relevant transformed data from the (virtual?) warehouse. It feels much cleaner than other approaches I have worked with. I could see this being where things are going: businesses have large object document storage, and Google/Amazon/Microsoft sell MPP engines that enable businesses to easily consume their data as if everything were a small SQL database with structured data. Pipelines might still populate the data store, but a lot of the transformation can happen on end-user read.
I still haven't gotten to play in the new Microsoft environment. This sounds a little like Looker in terms of how you have defined models and metrics in their LookML files. Would you say that is accurate?
I think many companies are switching to the EL(T) process where they use a low-code solution like Azure Data Factory or Fivetran just to get the data into a data warehouse and then transform.
Do you think this puts data governance at risk? Whenever I hear end-users can set up their own transforms, it gets concerning because then you have 5 definitions of the same field floating around.
So many questions.
@SeattleDataGuy Data governance is definitely a concern. The security side can be handled multiple ways, such as native row-level security or dataset truncating by the client (for embedded analytics). As far as multiple definitions of the same field, I think that is more likely to cause havoc. It can directly impact the quality of decision-making, along with the confidence to use the information to make decisions.
I have not had a chance to try out Looker yet. I am using Power BI, although, from what I have read, I expect Tableau, Power BI and Looker to have a lot of functional overlap.
One of the interesting things about Power BI and Synapse is that a person can define Synapse views in a Power BI query. Synapse can be completely empty of any tables, views or anything resulting in data. In this case, an analyst would strictly use Synapse as an engine to query the data store from a data visualization tool. Naturally, Power BI will not let a person connect front-end parameters to these queries :)
Yup! I agree about havoc being created. I have now worked on several projects with this issue. Not to mention, in some cases, low-code projects where analysts were just pulling billions of rows into Looker or Power BI and wondering why it was taking so long to pull in the data.
But yes, I am loving a lot of these tools. They are making managing all the different data sources so much easier.
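To make the two worries in this thread concrete (several definitions of the same field, and dashboards pulling raw rows), here is a small sketch. It assumes SQLite as a stand-in for the warehouse, and the `orders` table, the "paid orders only" rule, and the `net_revenue_by_region` view are all hypothetical. The idea is that the metric is defined once in a shared view, and the BI tool reads the aggregated view rather than the raw table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table; in a real warehouse this could hold billions of rows.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "west", 10.0, "paid"),
        (2, "west", 5.0, "refunded"),
        (3, "east", 7.5, "paid"),
        (4, "east", 2.5, "paid"),
    ],
)

# One shared definition of "net revenue" (assumed here to mean paid orders only),
# so every dashboard reads the same number instead of re-deriving it.
conn.execute(
    """
    CREATE VIEW net_revenue_by_region AS
    SELECT region, SUM(amount) AS net_revenue
    FROM orders
    WHERE status = 'paid'
    GROUP BY region
    """
)


def dashboard_rows(connection):
    """Return the pre-aggregated rows a BI tool would import."""
    return connection.execute(
        "SELECT region, net_revenue FROM net_revenue_by_region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    print(dashboard_rows(conn))
```

The view returns one row per region no matter how many orders exist, so the heavy lifting stays in the warehouse engine instead of the visualization tool.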
Love the background! 🤌
Thank you for noticing 🙏
@SeattleDataGuy Please consider doing a "how to set up a YouTube channel" video. Maybe covering what you use to produce your content: software, hardware, etc.
Are you planning to start a channel? Yeah, I want to do that when I finally get a real camera. I am just using my phone.
@SeattleDataGuy Yes. Maybe I should just start instead of trying to save for expensive equipment. My phone's camera is actually good. Which affordable mic, on Amazon, would you recommend?
Awesome! Also on data engineering? (Let's grow the community.) Also, I am just using a Blue Yeti mic. The classic affordable option.
Better background!
Thanks for noticing! We had fun setting it up
Informative video, thank you!
Glad you enjoyed it!
Currently I am working as a data engineer and I spend my days on data migration, ETL dev, AM and report development.
Do you like doing that work?
The data janitor said that data engineers don't have many meetings, which is why he likes it over data science and machine learning. Not true? Also, can you get hired as a data engineer without a prior IT job (for example, if you did some machine learning as a biologist, and passed some SQL certification exams).
Hello! Thank you for the insightful content. I'd like to ask if archiving data is part of a DE's job. If yes, please make a separate video about it. Thanks.
Ben, thanks for the video. Do you have an A-Z course on how to become a data engineer? Or can you recommend one?
Thank you
Hey Serik! Always good to see you. I think you have seen my data engineering roadmap video correct? That kind of is a hodge podge of courses. But perhaps I should put together a video A-Z of DEs.
Because I don't know if I know of a good udemy or coursera course
@SeattleDataGuy that would be amazing, and you could sell the course as well. I feel there is so much information out there, and it is hard to find an A-Z course for DEs. Thank you
I don't know about selling a course, but maybe! It would take some time to put together a course of everything we do
Your explanation is a bit fast for a beginner, but it is good and clear.
Hey, I was curious about what the workload is. Do they expect you to do x amount of data models per week? In other words, do they just give you the data, say "we want to find x, y and z," and you go out and do it? In a given week do you have to do, like, thousands of datasets? For example, the volume of things they want done. I am looking for the day to day.
I love your content.
Thank you!!!
Dude, 5:38 -> "Queiries" is an 8-character string, while the correct word has only 7. Look at the English language data to figure that out.
We're engineers but that doesn't mean we can't spell (ironically, this is a word/term Data people use quite a lot :D).
thanks for sharing, I wonder if I can dabble in DE without statistics background or not. Is it a degree-specific occupation?
Ben lookin swole af I see you👀
The key is the angle...
Also, truthfully I have thought it might be funny to do a video "Data Engineer Vs Data Scientist Vs Data Analyst" Who is Stronger? And have Ken Jee and Luke Barousse and I do some form of tech bro-y challenge. Just because both Ken Jee and Luke Barousse have both put clips of them working out in their videos. So far I think Luke wins with his Muscle Ups.
Exactly. He does pipeline curls.
curl -X GET -H 'Content-Type: application/json' 0.0.0.0:5001/api/get/swole
@SeattleDataGuy Lol no
Per your request I smashed the like button ; p
Thank you kind sir!
Hey Ben, I was wondering if you'll be doing another resume review in the near future. Thanks
Yes, I will plan to make a post about it in the next day or two
@SeattleDataGuy great, I'll be looking for it!
@oresttokovenko Thanks for the support!
Quick question, are you a senior data engineer? If so, how long did it take for you to become one from when you started as a junior?
It took me about 7 years to become a senior data engineer. But that really depends on the company. I have seen people get the senior title straight from college because they had a master's degree. I would worry less about titles and more about the work. The more you chase something, the farther it gets from you. Zach Wilson talks about this on LinkedIn. He couldn't get a senior DE position at FB, quit, took a year off, went to Netflix and got hired as a Sr. SWE and is now a tech lead at Airbnb.
How many DevOps tasks does an average engineer cover? Is it a data engineer's job to monitor pipelines and be on call?
Now I realized that I was hired as data analyst, but in fact I mainly do data engineer's job. Lol
That happens a lot. Time to ask for a raise
samedt
Basically we do plumbing work with data 😆
Thank you Andreas Kretz for pushing the term!
Plumbing of data science 💪
Great video Ben!
@andreaskayy hahaha
The gang's all here!!
thank you
You're welcome!
I'm a Computer Engineering student, can I consider choosing Data Engineer as my career path? Thank you Ben :)
If you're into data, then yes! What are your goals?
@SeattleDataGuy Yes, I can say right now that I am into it! Data is everywhere and it is something that won't expire in the future. I just watched your video about 'Breaking into Data Engineering and How to be DE'. My goal is to be a Data Analyst first to gain experience and then later be a DE. Also, I might consider being a DE intern after I graduate, but when I search for DE internships here in my country, there are fewer results. Thank you so much Ben!
Hey Ben,
Great video. wrt Data Modelling, is The Data Warehouse Toolkit still the best resource for learning? If I'm not mistaken, it was last updated in 2013 so I'm wondering if there's anything more recent. Thanks!
also, would be great if you could summarize your videos at the beginning or split them up so people could find things in the video better and also recap what they learned.
Alright, I can work on that! Splitting up my videos that is.
@SeattleDataGuy Thanks, prob easier for you to just divide the videos up. Any comment on my question about Data Modelling? The Toolkit is a long book and very detailed, so I'm assuming it's not necessary to go through all of it. Is going through a couple of Udemy courses on Data Modelling enough to do interviews?
How can we stay up to date with technology news about DE?
This is always a challenge. I would say don't get too caught up with every new tool. Pick 1-2 a year to learn but focus more on what you're working on at work.
We're basically janitors of data?
I prefer plumbers 😂
@SeattleDataGuy Oh yeah, pipelines! 😅
The pipelines, data modeling, and occasionally adhoc queries.
Does a data engineer work often with software engineers and app developers or is that more of a database administrator role?
Depends on the company's needs. I've already worked in a company that didn't want to use any third-party paid software outside AWS, so I worked a lot with Python creating connections and managing Airflow and Kafka in EKS and Hadoop in EMR, but now I work at a company that uses a lot of third-party tools and software and my main focus is modeling data using SQL and DBT for Airflow. Data Engineering roles vary a lot from company to company, but what Ben said in this video is mainly the day-to-day routine of all data engineering jobs I've been in.
I agree with Daniel. It really does depend company to company. For example, at large tech companies you will likely work closely with software engineers because they are building products you are pulling from. Whereas in other companies you might be working with external third-party solutions, so you talk more with solutions engineers and sales engineers. It all depends on how close you are to the data!
Thank you Daniel for responding! I love it when people share their experiences(besides just me). Like you said, the role of a DE depends company to company. You could work a lifetime and not realize that some DEs do completely different work. So having other perspectives is exactly what this community needs!
Hi man, I'm Brazilian and I'm doing a 2.5-year degree in data science. I want to be a data engineer but a data analyst too. Have you heard about analytics engineers? I don't wanna be overwhelmed with data engineering and data science, what should I do?
Yeah there are a lot of terms out there. I would focus more on the work you like doing and not the title. Do you prefer analytics or engineering? Because even with analytics engineers, I imagine some people will be either leaning towards the analytics side of that role or the DE side of that role
@SeattleDataGuy idk what I like omg, but for an online job data analytics is better... right?
By online, do you mean remote? Data analytics or DE work can be done remote pretty easily. Been doing it for 2 years now...thanks to the cough
@SeattleDataGuy thanks
@henrique4171 Thank you for the question! I appreciate all the support.
How much SQL is required to begin with data engineering?
I would say it is very SQL heavy. So you need to know the basics as well as understand how to re-model data with SQL.
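As a rough idea of what "re-modeling data with SQL" can look like, here is a small sketch that splits a flat table into a dimension table and a fact table. It assumes SQLite run from Python purely so the example is self-contained, and the names (`raw_sales`, `dim_customer`, `fact_sales`) are hypothetical, not from the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Flat source table: customer attributes repeated on every sale.
    CREATE TABLE raw_sales (
        sale_id INTEGER, customer_name TEXT, customer_city TEXT, amount REAL
    );
    INSERT INTO raw_sales VALUES
        (1, 'Ana', 'Seattle', 30.0),
        (2, 'Ana', 'Seattle', 12.0),
        (3, 'Raj', 'Austin', 8.0);

    -- Dimension: one row per distinct customer, with a surrogate key.
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_name TEXT,
        customer_city TEXT
    );
    INSERT INTO dim_customer (customer_name, customer_city)
    SELECT DISTINCT customer_name, customer_city
    FROM raw_sales
    ORDER BY customer_name;

    -- Fact: one row per sale, pointing at the dimension by key.
    CREATE TABLE fact_sales AS
    SELECT r.sale_id, d.customer_key, r.amount
    FROM raw_sales AS r
    JOIN dim_customer AS d
      ON d.customer_name = r.customer_name
     AND d.customer_city = r.customer_city;
    """
)

# Typical analyst query against the re-modeled tables.
rows = conn.execute(
    """
    SELECT d.customer_name, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_customer AS d USING (customer_key)
    GROUP BY d.customer_name
    ORDER BY d.customer_name
    """
).fetchall()

if __name__ == "__main__":
    print(rows)
```

The basics here are `SELECT DISTINCT`, `JOIN`, and `GROUP BY`; being comfortable combining them like this covers a lot of day-to-day modeling work.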
@SeattleDataGuy Thanks for the reply.
Can you make one video comparing AWS, GCP and Azure for data engineering?(job market, difficulty etc)
I am moving on path with Azure but just want a heads up regarding what is better.
And lastly, Keep up the good work!
Hey, it was a great video first of all, but a quick question: can a complete fresher (college student) get an entry-level job as a data engineer?
This is challenging. I have a few videos talking about how I got my first DE job. There are some internships and jr. positions. But the other route is usually analyst to DE or SWE to DE. Here is a video on the concept ua-cam.com/video/lGzh-QendJc/v-deo.html
Forgotten print statement xD
thats me
Are there enough openings for data engineers? I'm planning to pursue a master's in the USA.
Yes I believe so
You talk so quickly that it's hard to understand
audio on youtube can be sped up or slowed down.
This and my upspeak are two areas I am trying to work on. You guys don't even see the times I borderline start saying a new sentence before I finish the last one. I am working on it. Thank you for your patience!
Which one do you do malcorub?