Excellent, sir! I have never watched a full video with this much clarity on batch pipelines. Keep going at the same speed, and very good explanation. 🎉🎉🎉
I was thinking of building a project on GCP, and your video arrived. Great work, sir! Thank you.
Very good video. I would recommend it to anyone who is new to GCP.
I sincerely recommend this to people who want to explore DE pipeline orchestration on GCP.
Just discovered this channel, so there may be some more recent videos that address some of this. I worry about CSV files that have 100+ columns; that .js file and .json file will get crazy very fast (although I suppose you could use Python to create the JSON file). I'm also more interested in being able to append new data to a table once it's created, uploaded either daily or weekly. Overall, great job! I definitely learned a few things here!
This is great! I followed your video step-by-step, and now it's time for me to do a project of my own based on your stuff! Will use something more European though, like soccer or basketball haha :D Thanks!!!
True...better for you not to use Cricket 😅😅
Very good video to understand data engineering workflow
Thanks is a small word for you, sir. 🙏
This is the best explanation I have ever seen on YouTube. It is very helpful to me. I have completed this project end to end and have learnt so many things.
Glad that it helped you.
Good job. Looks like the best video for GCP ELT & other GCP stuff.
Glad it was helpful!
I was looking for this type of video for a long time. Thanks.
Learnt a lot from you. Thank you sir
Thank you, learned a lot from you sir
Happy to know. Keep learning brother 🎉
Hello, sir! Great video.
If we need to implement CDC or append new data to a table, do we have to extract the data date-wise and load it to GCS? And how do we append that data to an existing table in BigQuery?
Cloud Composer: Extract data from an API and load it to GCS.
Cloud Function: Trigger the event to load a new CSV file to BigQuery using Dataflow.
So where do we need to write the logic to append the new data to an existing table in BigQuery?
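For anyone with the same question: one common place for the append logic is the load step itself. This is only a sketch, not what the video shows (the project, dataset, bucket, and file names below are placeholders), using the BigQuery Python client with write_disposition set to WRITE_APPEND so each day's CSV is added to the existing table:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project id
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    # WRITE_APPEND adds the new rows to the existing table;
    # WRITE_TRUNCATE would overwrite it instead.
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/data_2024-01-01.csv",  # placeholder date-wise file
    "my-project.dataset.players",          # placeholder table id
    job_config=job_config,
)
load_job.result()  # waits for the load job to finish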
Hi Vishal, this is a really great video, but it would be very helpful if you could also explain the code that you have written from 6:01.
Thanks a lot for such a great explanation. Can you please share which video recording/editing tool you use?
I am getting stuck on the Airflow code. I think it might be an issue with the file path in bash_command='python /home/airflow/gcs/dags/scripts/extract_data_and_push_gcs.py'. I have uploaded extract_data_and_push_gcs.py to the scripts folder under dags.
However, is there any way to check the path /home/airflow/gcs/dags/scripts/?
/home/airflow/gcs/dags = your dags GCS bucket
It's the same path.
Thanks
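If it helps anyone else: Composer syncs the dags folder of the environment's GCS bucket to /home/airflow/gcs/dags on the workers, so you can check the path with a throwaway task. A minimal sketch (the DAG id and dates here are made up):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="check_scripts_path",  # hypothetical one-off DAG
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Lists whatever Composer has synced from the bucket's dags/scripts/
    # folder to the local path used in bash_command.
    list_scripts = BashOperator(
        task_id="list_scripts",
        bash_command="ls -l /home/airflow/gcs/dags/scripts/",
    )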
I tried the same way as in your video, but I got this error when running the Dataflow job through the template. Could you please help me figure out exactly what mistake I have made? I used the same schema you used.
Error message from worker: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: Failed to serialize json to table row: 1,Babar Azam,Pakistan
Are you using the same JSON files?
Yes @techtrapture
Below is the JSON file
{
  "BigQuery Schema": [
    { "name": "rank",    "type": "STRING" },
    { "name": "name",    "type": "STRING" },
    { "name": "country", "type": "STRING" }
  ]
}
I tried the rank column with both STRING and INTEGER data types. I am getting the same issue with both.
@sampathgoud8108 I was getting the same error; it was resolved after I put 'transform' in the JavaScript UDF name field under Optional Parameters while setting up the Dataflow job.
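To make the fix concrete: the Cloud Storage Text to BigQuery template only runs your .js file if you tell it which function to call; without it, the raw CSV line is handed to BigQuery as-is, which matches the "Failed to serialize json to table row: 1,Babar Azam,Pakistan" error above. A sketch of launching the template from Python with those parameters set (all bucket, project, and table names are placeholders):

from googleapiclient.discovery import build

dataflow = build("dataflow", "v1b3")
response = dataflow.projects().templates().launch(
    projectId="my-project",  # placeholder
    gcsPath="gs://dataflow-templates/latest/GCS_Text_to_BigQuery",
    body={
        "jobName": "csv-to-bq",
        "parameters": {
            "inputFilePattern": "gs://my-bucket/batsmen.csv",           # placeholder
            "JSONPath": "gs://my-bucket/schema.json",                   # placeholder
            "javascriptTextTransformGcsPath": "gs://my-bucket/udf.js",  # placeholder
            # Must match the function name defined inside udf.js; this is
            # the 'JavaScript UDF name' optional parameter from the console.
            "javascriptTextTransformFunctionName": "transform",
            "outputTable": "my-project:dataset.batsmen",                # placeholder
            "bigQueryLoadingTemporaryDirectory": "gs://my-bucket/tmp",  # placeholder
        },
    },
).execute()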
Hi Vishal, in this and your other Composer videos you use standard Airflow operators (for example, Python or Bash). Do you know how to install the Google Cloud Airflow package for Google-Cloud-specific operators? I've tried uploading the wheel to the /plugins bucket, but nothing happens. Composer can't import Google Cloud operators (like Pub/Sub), and DAGs with these operators are listed as broken.
Thanks!
I usually refer to this code sample:
airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/index.html
@techtrapture thanks! But how do I use these operators in Composer?
In Airflow I just pip install the package. How do I do this in Composer?
Ohh okay, got your doubt now... you have to add the package to a requirements.txt and install it as a PyPI dependency of your environment. Other options are also available here:
cloud.google.com/composer/docs/how-to/using/installing-python-dependencies
@techtrapture yes, this is exactly what I needed. I can use both of these options, depending on the DAGs. Great!
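Once the apache-airflow-providers-google package is installed in the environment, the operators import like any other. A small sketch of a DAG using one of them (the bucket and table names are placeholders):

from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="gcs_to_bq_example",  # hypothetical
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Loads a CSV from GCS into BigQuery without a separate Dataflow job.
    load_csv = GCSToBigQueryOperator(
        task_id="load_csv",
        bucket="my-bucket",                                              # placeholder
        source_objects=["batsmen.csv"],
        destination_project_dataset_table="my-project.dataset.batsmen", # placeholder
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_APPEND",
    )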
Nice video. Just one question: why do you create a Dataflow job? Couldn't you insert the rows using Python?
Yes, I agree, but as a project I want to show the complete orchestration process and use multiple services.
@techtrapture thanks for the quick answer. I will watch all your videos.
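For the curious, the direct-Python alternative the question mentions would look roughly like this with the BigQuery client (the table id is a placeholder); the video's Dataflow route just demonstrates orchestrating more services:

from google.cloud import bigquery

client = bigquery.Client()
rows = [{"rank": "1", "name": "Babar Azam", "country": "Pakistan"}]
# Streaming insert straight into the table, no GCS or Dataflow needed.
errors = client.insert_rows_json("my-project.dataset.batsmen", rows)  # placeholder table
if errors:
    print("Insert failed:", errors)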
I have been facing issues invoking the Dataflow job while using the default App Engine service account. Could you let me know if you were using a specific service account with the Cloud Function?
No, I am using the same default service account. What error are you getting?
This tutorial is 😩 a waste of time for beginners. He did not show how to connect Python to GCP before storing data in the bucket. There are a lot of missing steps.
You didn't show how to connect to GCP before storing data in the bucket. You have skipped a lot of steps, and the video lacks quality. You should also include which dependencies to use. Just running your code and uploading it to GitHub is not everything.
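For anyone stuck on that step: connecting Python to GCP is usually just the client library plus Application Default Credentials (for example, GOOGLE_APPLICATION_CREDENTIALS pointing at a service-account key, or gcloud auth application-default login locally). A minimal sketch of uploading a CSV to a bucket (the project and bucket names are placeholders):

from google.cloud import storage

# Picks up credentials from the environment (ADC); inside Composer or
# Cloud Functions, the attached service account is used automatically.
client = storage.Client(project="my-project")  # placeholder project id
bucket = client.bucket("my-bucket")            # placeholder bucket name
bucket.blob("batsmen.csv").upload_from_filename("batsmen.csv")

The only dependency needed for this part is google-cloud-storage.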