Really good project! Two things to point out. 1) You don't need to calculate the total medals won by each country, because each row is unique per country, so the data is already in a usable format for that task. 2) Your query for the average number of entries by gender is not correct. Again, each row is unique (here, one row per discipline), so AVG() will not work. Here is the right code:
SELECT
    Discipline,
    CAST(Female AS float) / Total AS average_female,
    CAST(Male AS float) / Total AS average_male
FROM entriesgender;
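To sanity-check the point about AVG() on one-row-per-discipline data, here is a tiny self-contained sketch using Python's sqlite3 with made-up sample numbers (not the real Tokyo dataset):

```python
import sqlite3

# Tiny stand-in for the entriesgender table: one row per discipline.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entriesgender (Discipline TEXT, Female INT, Male INT, Total INT)"
)
conn.executemany(
    "INSERT INTO entriesgender VALUES (?, ?, ?, ?)",
    [("Archery", 64, 64, 128), ("Boxing", 102, 187, 289)],
)

# AVG() over a one-row group just returns the row's own values unchanged.
avg_rows = conn.execute(
    "SELECT Discipline, AVG(Female), AVG(Male) "
    "FROM entriesgender GROUP BY Discipline"
).fetchall()
print(avg_rows)

# Dividing by Total gives the actual gender proportion per discipline.
ratio_rows = conn.execute(
    "SELECT Discipline, CAST(Female AS REAL) / Total, CAST(Male AS REAL) / Total "
    "FROM entriesgender"
).fetchall()
print(ratio_rows)
```

The AVG() version returns every row as-is, because each GROUP BY group contains exactly one row; dividing by Total yields the per-discipline proportions instead.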
Exactly
Hi sir,
I personally thank you very much for the AWS Data Engineering Project series; we are learning a lot from it. We are all really grateful for this level of generosity. We humbly request you to create an Azure Data Engineering project using SQL & SSMS, as there is currently only one Azure Data Engineering project. We want to learn more, since there is no Azure Data Engineering project that uses Bronze, Silver & Gold layer transformations in PySpark on Databricks and SSMS respectively.
I hope our request will soon be accepted.
Thank you so much for your valuable guidance & support.
It's really supportive even for beginners. Great work. Love from Pakistan
It's an awesome project, end to end, clearly explained. I really loved it!! You helped all the way!!
Very good project for understanding. Great work Darshil.
Really nice project to learn, Thanks a lot Darshil
loved the way of teaching
Thank you Darshil. It was a very good tutorial on how to work with different tools.
Your video is very clear and convenient to understand and gain confidence. Thanks a lot.
Thank you so much. Great work. Please upload more project videos.
The explanation was on point and really good. I liked it.
Thank you very much Darshil Sir. The way you have explained the project is awesome. You made the explanation much simpler for us to understand. I loved it.
Exactly what I was looking for, thank you so much!
Great explanation and this helps me a lot.
Very informative and helpful video... Please make more videos related to Azure data engineering with different activities and pipelines.
Thanks for sharing; I'm taking the same basis from Part 1 and converting it to Microsoft Fabric.
Excellent tutorial, I will definitely get in touch for your other courses
Nice work❤
Thank you so much!! really well explained
If we want the query at 19:11 to give the same result as the notebook, wouldn't the query be:
SELECT Discipline, (CAST(Female AS float)/CAST(Total AS float)) AS Fe_Average, (CAST(Male AS FLOAT)/CAST(Total AS FLOAT)) AS Ma_AVG FROM entriesGender;
Right now you are just grouping by Discipline, and I think Discipline is already unique, which is why it's pulling the same result as SELECT * FROM entriesGender.
Thank you for this project sir.
Brilliant tutorial. If we wanted to post this on our Git for portfolio reasons, would it be possible to keep all of this active, or would charges be incurred even though we aren't using the resources?
Quick question: in the real world, will companies use Synapse Analytics alone for both data processing and storage? If yes, in what scenarios would they use it, and wouldn't it be a challenge for testing?
That won't happen bro, he is giving this for free
Hello Darshil, I have finally finished this section as well. I sincerely appreciate your efforts. I have a question for you now: will you offer certification upon completion of your course? I emailed you earlier, but I never heard back. I understand that you are very busy, but I really wanted your combo course.
Can you share how to connect to Synapse Analytics using a serverless SQL pool?
But we can only continue with the visualization part if we have a Premium or Pro Power BI account.
Thank you for this, Darshil. After ETL, if we don't want to do analytics and instead want to apply some ML library for prediction, what would be the best way to achieve that on the Azure platform?
How is there a new resource group at 0:25?
19:11
-- Calculate the average number of entries by gender for each discipline
SELECT Discipline, AVG(Female) AverageFemale, AVG(Male) AverageMale
FROM entriesgender
GROUP BY Discipline;
Nice tutorial, do you have a similar one for GCP cloud?
In the real world, which service would you choose between Data Factory, Databricks, and Synapse?
I got a "triggerer async thread was blocked" warning when I ran airflow standalone on Ubuntu. I did export PYTHONASYNCIODEBUG=1 but nothing is output. Any tips?
Check all dependencies and the versions of your environment and packages, and see if there are any upgrades. Check the log files too, to see if you find any errors.
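One more note on the PYTHONASYNCIODEBUG side of the question: that variable must be exported before the Python process starts, and even then it only adds extra warnings when the event loop actually detects a problem, so "nothing is output" can simply mean nothing was flagged. A small stand-alone sketch (plain asyncio, not Airflow) showing how to confirm debug mode is really on:

```python
import asyncio

# PYTHONASYNCIODEBUG=1 only takes effect if it is set before the
# interpreter starts; it adds warnings (e.g. about slow callbacks that
# block the loop) rather than printing anything unconditionally.
# Debug mode can also be requested explicitly, which is easy to verify:

async def main() -> bool:
    loop = asyncio.get_running_loop()
    return loop.get_debug()  # True when the loop runs in debug mode

print(asyncio.run(main(), debug=True))  # True: debug mode forced on
print(asyncio.run(main()))              # False unless PYTHONASYNCIODEBUG or dev mode is set
```

Inside Airflow you can't call asyncio.run yourself, but the same principle applies: export the variable in the shell that launches `airflow standalone`, then watch the triggerer's log file for the extra asyncio warnings.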
Thank you!
One question Darshil: is this project, together with the Coursera preparation course, enough for one to sit the Azure Data Engineer Associate certification?
Nice. Quick question: after creating Databricks and Synapse, how much would it cost if we keep them in a paused state when not in use?
Also, is there a pause option for Databricks?
If you keep the cluster in a terminated state, there won't be any charges in Databricks. Not sure whether that is what you were asking.
good work bro
Sir, I request you to make more project videos on Azure technology.
Excellent video!
How do I access Power BI when it's only available for business users?
Create a new user and give it owner access, then sign up with that account.
While creating the table I am getting "Review and update the file format settings to allow file schema detection" in Synapse Studio.
How do I solve this?
I am getting the same error. Did you find how to resolve it?
Sir, in Part 1 at the very end, I am not able to create the 'transformed-data' folders for entriesgender and medals; it's only creating folders for the other 3 files. :( Did I miss something or make a mistake?
can you share the code?
@@sumanthhabib8028 I did exactly the same as instructed in the tutorial, and if things were wrong it would not have created 3 of the 5 folders. I don't have the code now, as I deleted the whole resource group after completion. Thanks for showing concern; I will try the whole thing one more time for better understanding and practice ☺️
thank you
Thaankkss!!!
Synapse Analytics does not work on a student account
same issue
Did you find an alternative?
@@vemedia5850 Bro, you can go into your account details and add Synapse to the list of programs you can use in your free account. It will then show up.
Good one
Sir, at 8:45 of Part 2 it won't let me create the external table after I press Continue. It says "Failed to detect schema. Please review and update the file format settings to allow file schema detection." I had no problem in Part 1; everything went smoothly. When I click Details it says "Failed to execute query. Error: Error encountered while parsing data: 'Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.'. Underlying data description: fil" I even googled it and it seems like there is no solution. PLEASE HELP!!!
I solved it by changing the format to .parquet:
athletes.repartition(1).write.mode("overwrite").option("header", "true").parquet("/mnt/tokyoolymic/transformed-data/athletes.parquet")
coaches.repartition(1).write.mode("overwrite").option("header", "true").parquet("/mnt/tokyoolymic/transformed-data/coaches.parquet")
entriesgender.repartition(1).write.mode("overwrite").option("header", "true").parquet("/mnt/tokyoolymic/transformed-data/entriesgender.parquet")
medals.repartition(1).write.mode("overwrite").option("header", "true").parquet("/mnt/tokyoolymic/transformed-data/medals.parquet")
teams.repartition(1).write.mode("overwrite").option("header", "true").parquet("/mnt/tokyoolymic/transformed-data/teams.parquet")
Also modify the DB (or create a new one) to use the Parquet file format.
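For anyone curious about the error message itself: the "magic bytes" Synapse complains about are the literal PAR1 marker that every valid Parquet file carries in its first and last four bytes, which is why a CSV written under a Parquet-looking path still fails. A quick stand-alone check in plain Python (the file names here are made up for illustration):

```python
MAGIC = b"PAR1"  # every valid Parquet file starts and ends with these 4 bytes

def looks_like_parquet(path: str) -> bool:
    """Cheap sanity check: header and footer magic, as Synapse's error describes."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # jump to 4 bytes before the end of the file
        foot = f.read(4)
    return head == MAGIC and foot == MAGIC

# A CSV written under a .parquet-looking folder name is still a CSV:
with open("athletes.csv", "w") as f:
    f.write("PersonName,Country\nA,Japan\n")
print(looks_like_parquet("athletes.csv"))  # False
```

Running this against one of the part-00000-...-c000.csv files from the transformed-data folder would confirm they are plain CSV, which is exactly why rewriting them with .parquet() fixes the external table creation.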
@@fmwihler many thanks bro
I had the same issue. Switching to .parquet solved it. Thank you @fmwihler
One question about where you do the data modeling, I mean the star schema: can I do it in Azure? And can I also create the measures in Azure and use Power BI only as a visualizer?
You can do the data modeling in Power BI too.
I have an issue while publishing the data: InternalServerError executing request:
were you able to resolve the error?
It would have been great if there were a tutorial on incremental loading too...
While creating any Azure resource, a new user might get an error saying "Resource not registered", like I got while creating Synapse: "The Azure Synapse resource provider (Microsoft.Synapse) needs to be registered with the selected subscription."
To fix this you can follow this method: ua-cam.com/video/fvdCWbadIko/v-deo.htmlsi=EXgMD2Eq_RKJYmkV
Hope it helps!
Has anyone managed to successfully complete the project, and if so, in how much time?
2 hrs
@@srijanbansal6078 thanks! Did you upload it anywhere? How easy is it to make it part of your portfolio via Git etc.?
Encountered this error: "Failed to detect schema. Please review and update the file format settings to allow file schema detection."
Failed to execute query. Error: Error encountered while parsing data: 'Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.'. Underlying data description: file 'tokyoolympicdatasnug.dfs.core.windows.net/tokyo-olympic-data/transformed-data/athletes/part-00000-tid-446593903816833556-9250b5ec-d2a1-4ff8-a184-6e2e4276fe0e-17-1-c000.csv'. The batch could not be analyzed because of compile errors.
I encountered the same issue…
Please, how did you go about it?
@@pspc890121
Facing a similar challenge whenever I attempt to create tables.
As an alternative, I decided to go back to Azure Databricks and write my transformed datasets as Parquet instead of CSV format, and everything worked out fine. Try this approach instead.
Legend!
Where is Dashboard?😠
This is a free course. Don't show that expression.
Show some respect. He is doing all this for free
How do we create the dashboard? Do you have any idea?
See the whole video; he mentions it at 14:50.
Excellent!