Excellent, sir! I have never watched a full video with this much clarity on batch pipelines. Keep going at the same speed, and very good explanation. 🎉🎉🎉
I was thinking of building a project on GCP, and your video arrived. Great work, sir! Thank you.
Very good video. I would recommend it to anyone who is new to GCP.
I sincerely recommend this to people who want to explore DE pipeline orchestration on GCP.
Just discovered this channel, so there may be some more recent videos that address some of this. I worry about CSV files that have 100+ columns; that .js file and .json file will get crazy very fast (although I suppose you could use Python to create the JSON file). I'm also more interested in being able to append new data to a table once it's created, uploaded either daily or weekly. Overall, great job! I definitely learned a few things here!
This is great! I followed your video step-by-step, and now it's time for me to do a project of my own based on your stuff! Will use something more European though, like soccer or basketball haha :D Thanks!!!
True...better for you not to use Cricket 😅😅
Very good video to understand data engineering workflow
Thanks is a small word for you, sir. 🙏
This is the best explanation I have ever seen on YouTube. It is very helpful to me. I have completed this project end to end and have learnt so many things.
Glad that it helped you.
Good job. Looks like the best video for GCP ELT & other GCP stuff.
Glad it was helpful!
I was looking for this type of video for a long time. Thanks.
Learnt a lot from you. Thank you sir
Thank you, learned a lot from you sir
Happy to know. Keep learning brother 🎉
Hello, sir! Great video.
If we need to implement CDC or append new data to a table, do we have to extract the data date-wise and load it to GCS? And how do we append that data to an existing table in BigQuery?
Cloud Composer: Extract data from an API and load it to GCS.
Cloud Function: Trigger the event to load a new CSV file to BigQuery using Dataflow.
So where do we need to write the logic to append the new data to an existing table in BigQuery?
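For anyone with the same question: one common place for the append logic is the load step itself. This is only a sketch, not what the video shows (the project, dataset, bucket, and file names below are placeholders), using the BigQuery Python client with write_disposition set to WRITE_APPEND so each day's CSV is added to the existing table:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project id
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    # WRITE_APPEND adds the new rows to the existing table;
    # WRITE_TRUNCATE would overwrite it instead.
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/data_2024-01-01.csv",  # placeholder date-wise file
    "my-project.dataset.players",          # placeholder table id
    job_config=job_config,
)
load_job.result()  # waits for the load job to finish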
Hi Vishal, this is a really great video, but it would be very helpful if you could also explain the code that you have written from 6:01.
Thanks a lot for such a great explanation. Can you please share which video recording/editing tool you use?
I am getting stuck on the Airflow code. I think it might be an issue with the file path in bash_command='python /home/airflow/gcs/dags/scripts/extract_data_and_push_gcs.py'. I have uploaded extract_data_and_push_gcs.py to the scripts folder under dags.
However, is there any way to check the path /home/airflow/gcs/dags/scripts/?
/home/airflow/gcs/dags = your dags GCS bucket
It's the same path.
Thanks
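If it helps anyone else: Composer syncs the dags folder of the environment's GCS bucket to /home/airflow/gcs/dags on the workers, so you can check the path with a throwaway task. A minimal sketch (the DAG id and dates here are made up):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="check_scripts_path",  # hypothetical one-off DAG
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Lists whatever Composer has synced from the bucket's dags/scripts/
    # folder to the local path used in bash_command.
    list_scripts = BashOperator(
        task_id="list_scripts",
        bash_command="ls -l /home/airflow/gcs/dags/scripts/",
    )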
I tried the same way as in your video, but I got this error when running the Dataflow job through the template. Could you please help me figure out exactly what mistake I have made? I used the same schema you used.
Error message from worker: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: Failed to serialize json to table row: 1,Babar Azam,Pakistan
Are you using the same JSON files?
Yes @techtrapture
Below is the JSON file
{
  "BigQuery Schema": [
    { "name": "rank",    "type": "STRING" },
    { "name": "name",    "type": "STRING" },
    { "name": "country", "type": "STRING" }
  ]
}
I tried the rank column with both STRING and INTEGER data types. I am getting the same issue with both.
@sampathgoud8108 I was getting the same error; it was resolved after I put 'transform' in the JavaScript UDF name field under Optional Parameters while setting up the Dataflow job.
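To make the fix concrete: the Cloud Storage Text to BigQuery template only runs your .js file if you tell it which function to call; without it, the raw CSV line is handed to BigQuery as-is, which matches the "Failed to serialize json to table row: 1,Babar Azam,Pakistan" error above. A sketch of launching the template from Python with those parameters set (all bucket, project, and table names are placeholders):

from googleapiclient.discovery import build

dataflow = build("dataflow", "v1b3")
response = dataflow.projects().templates().launch(
    projectId="my-project",  # placeholder
    gcsPath="gs://dataflow-templates/latest/GCS_Text_to_BigQuery",
    body={
        "jobName": "csv-to-bq",
        "parameters": {
            "inputFilePattern": "gs://my-bucket/batsmen.csv",           # placeholder
            "JSONPath": "gs://my-bucket/schema.json",                   # placeholder
            "javascriptTextTransformGcsPath": "gs://my-bucket/udf.js",  # placeholder
            # Must match the function name defined inside udf.js; this is
            # the 'JavaScript UDF name' optional parameter from the console.
            "javascriptTextTransformFunctionName": "transform",
            "outputTable": "my-project:dataset.batsmen",                # placeholder
            "bigQueryLoadingTemporaryDirectory": "gs://my-bucket/tmp",  # placeholder
        },
    },
).execute()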
Hi Vishal, in this and your other Composer videos you use standard Airflow operators (for example, Python or Bash). Do you know how to install the Google Cloud Airflow package for Google-Cloud-specific operators? I've tried uploading the wheel to the /plugins bucket, but nothing happens. Composer can't import Google Cloud operators (like Pub/Sub), and DAGs with these operators are listed as broken.
Thanks!
I usually refer to this code sample:
airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/index.html
@techtrapture thanks! But how do I use these operators in Composer?
In Airflow I just pip install the package. How do I do this in Composer?
Ohh okay, got your doubt now... you have to add the package to a requirements.txt and install it as a PyPI dependency of your environment. Other options are also available here:
cloud.google.com/composer/docs/how-to/using/installing-python-dependencies
@techtrapture yes, this is exactly what I needed. I can use both of these options, depending on the DAGs. Great!
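Once the apache-airflow-providers-google package is installed in the environment, the operators import like any other. A small sketch of a DAG using one of them (the bucket and table names are placeholders):

from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="gcs_to_bq_example",  # hypothetical
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Loads a CSV from GCS into BigQuery without a separate Dataflow job.
    load_csv = GCSToBigQueryOperator(
        task_id="load_csv",
        bucket="my-bucket",                                              # placeholder
        source_objects=["batsmen.csv"],
        destination_project_dataset_table="my-project.dataset.batsmen", # placeholder
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_APPEND",
    )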
Nice video. Just one question: why do you create a Dataflow job? Couldn't you insert the rows using Python?
Yes, I agree, but as a project I want to show the complete orchestration process and use multiple services.
@techtrapture thanks for the quick answer. I will watch all your videos.
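For the curious, the direct-Python alternative the question mentions would look roughly like this with the BigQuery client (the table id is a placeholder); the video's Dataflow route just demonstrates orchestrating more services:

from google.cloud import bigquery

client = bigquery.Client()
rows = [{"rank": "1", "name": "Babar Azam", "country": "Pakistan"}]
# Streaming insert straight into the table, no GCS or Dataflow needed.
errors = client.insert_rows_json("my-project.dataset.batsmen", rows)  # placeholder table
if errors:
    print("Insert failed:", errors)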
I have been facing issues invoking the Dataflow job while using the default App Engine service account. Could you let me know if you were using a specific service account with the Cloud Function?
No, I am using the same default service account. What error are you getting?
This tutorial is 😩 a waste of time for beginners. He did not show how to connect Python to GCP before storing data in the bucket. There are a lot of missing steps.
You didn't show how to connect to GCP before storing data in the bucket. You have skipped a lot of steps, and the video lacks quality. You should also include which dependencies to use. Just running your code and uploading it to GitHub is not everything.
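For anyone stuck on that step: connecting Python to GCP is usually just the client library plus Application Default Credentials (for example, GOOGLE_APPLICATION_CREDENTIALS pointing at a service-account key, or gcloud auth application-default login locally). A minimal sketch of uploading a CSV to a bucket (the project and bucket names are placeholders):

from google.cloud import storage

# Picks up credentials from the environment (ADC); inside Composer or
# Cloud Functions, the attached service account is used automatically.
client = storage.Client(project="my-project")  # placeholder project id
bucket = client.bucket("my-bucket")            # placeholder bucket name
bucket.blob("batsmen.csv").upload_from_filename("batsmen.csv")

The only dependency needed for this part is google-cloud-storage.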