Load Data from GCS to BigQuery using Dataflow

  • Published 25 Sep 2024
  • Looking to get in touch?
    Drop me a line at vishal.bulbule@gmail.com, or schedule a meeting using the provided link topmate.io/vis...
    Unlock the potential of Google Cloud Dataflow in seamlessly transferring data from Google Cloud Storage (GCS) to BigQuery! This tutorial dives deep into the intricacies of leveraging Dataflow for efficient data loading. Gain valuable insights into the step-by-step process, optimizations, and best practices to orchestrate a smooth and scalable data transfer journey from GCS to BigQuery using Google Cloud Dataflow.
    Associate Cloud Engineer -Complete Free Course
    • Associate Cloud Engine...
    Google Cloud Data Engineer Certification Course
    • Google Cloud Data Engi...
    Google Cloud Platform(GCP) Tutorials
    • Google Cloud Platform(...
    Generative AI
    • Generative AI
    Getting Started with Duet AI
    • Getting started with D...
    Google Cloud Projects
    • Google Cloud Projects
    Python For GCP
    • Python for GCP
    Terraform Tutorials
    • Terraform Associate C...
    Linkedin
    / vishal-bulbule
    Medium Blog
    / vishalbulbule
    Github
    Source Code
    github.com/vis...
    Email - vishal.bulbule@techtrapture.com
    #googlecloud #devops #python #devopsproject #kubernetes #cloudcomputing #video #tutorial #genai #generativeai #aiproject #python
  • Science & Technology

COMMENTS • 43

  • @NangunuriKarthik • 4 months ago

    Hi, can you please tell me how to move tables from Oracle to BigQuery using Google Dataflow?

  • @chandrasekharborapati4599 • 6 months ago

    Hi, good day. I have one query: is it possible to delete BigQuery records, after a Dataflow job in GCP has processed all the records, using the Java API? Please provide a solution if it is possible.

  • @chetanbulla9185 • 10 months ago

    Nice video. I was able to execute the Dataflow job. Thanks!

  • @mulshiwaters5312 • 3 months ago

    Good real-time hands-on experience. I understand that when I create a data pipeline using Dataflow, it gets executed when I click RUN JOB. How can I use this pipeline for a daily data load from GCS to BQ? Is this possible with Dataflow, or do I need a tool like Cloud Composer to schedule this job at certain intervals?

    • @techtrapture • 3 months ago

      Cloud Composer is too costly; you can schedule it using Cloud Scheduler. Check this video for your use case:
      ua-cam.com/video/b593huRgXic/v-deo.html

  • @iloveraw100 • 1 year ago

    I need to remove the header row, as it is getting loaded along with the data. How do I do that?
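One common approach with the "Text Files on Cloud Storage to BigQuery" template is to filter the header inside the JavaScript UDF. This is a sketch under two assumptions: that the template skips lines for which the UDF returns no value, and that the column names (id, name, city) match your file — adjust both to your data.

```javascript
// Hypothetical UDF sketch: drop the header line, map all other lines.
// ES5-style code, since the classic templates run on an ES5 JS engine.
function transform(line) {
  // Assumption: the header starts with the first column name "id,".
  if (line.indexOf('id,') === 0) {
    return; // returning nothing should cause the template to skip this row
  }
  var values = line.split(',');
  var obj = {
    id: values[0],
    name: values[1],
    city: values[2]
  };
  return JSON.stringify(obj);
}
```

A more robust variant would compare the whole line against the known header string instead of just its first field.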

  • @Sriharibabup-w6f • 8 months ago

    What are the transformations used in Dataflow?

  • @AdityaBajaj-z7d • 7 months ago

    How to upsert data in Dataflow?

  • @archanajain99 • 8 months ago

    Hi,
    I need your help: I need to create a GCP Dataflow pipeline using Java. The pipeline should take a file in a GCS bucket as input and write the data into Bigtable. How do I work on this? Please guide me.

    • @techtrapture • 8 months ago

      Here's an idea from another video:
      ua-cam.com/video/KrB6DpkvICE/v-deo.htmlsi=ZWBjt3CrCVJmwkQ5

  • @MiguelPumapillo-jd3ug • 6 months ago

    thanks

  • @sikondyer2068 • 1 year ago +1

    How do I load a CSV file with commas in the data? Do you know how to escape the comma? Thanks.

    • @techtrapture • 1 year ago

      Is the comma the delimiter, or is it part of the data?

    • @sikondyer2068 • 1 year ago

      @@techtrapture It's part of the data; for example, the Address column has a value of "Bangkok, Thailand".

    • @gnm280 • 8 months ago

      @@sikondyer2068 I have exactly the same issue with data rows containing commas.
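A plain `line.split(',')` in the UDF breaks whenever a field contains a comma. One way around it is a small quote-aware parser in the UDF, assuming the CSV wraps such fields in double quotes (RFC 4180 style). The column names here (name, address) are illustrative assumptions.

```javascript
// Sketch: parse one CSV line, treating commas inside double quotes as data,
// so a value like "Bangkok, Thailand" stays in a single field.
function parseCsvLine(line) {
  var values = [];
  var current = '';
  var inQuotes = false;
  for (var i = 0; i < line.length; i++) {
    var c = line.charAt(i);
    if (c === '"') {
      if (inQuotes && line.charAt(i + 1) === '"') {
        current += '"'; // doubled quote = escaped quote inside a quoted field
        i++;
      } else {
        inQuotes = !inQuotes; // entering or leaving a quoted field
      }
    } else if (c === ',' && !inQuotes) {
      values.push(current); // field boundary only outside quotes
      current = '';
    } else {
      current += c;
    }
  }
  values.push(current); // last field
  return values;
}

function transform(line) {
  var values = parseCsvLine(line);
  var obj = { name: values[0], address: values[1] };
  return JSON.stringify(obj);
}
```

If you control the export, another option is to switch the delimiter to a character that never appears in the data (e.g. a pipe) and set the template's delimiter parameter accordingly.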

  • @adijos92 • 1 year ago +2

    Can you send that CSV file and all three files to my email ID?

  • @premsoni0143 • 1 year ago

    Is there a need to configure a VPC for streaming between Cloud Spanner and GCP Pub/Sub? I tried to set it up and it failed with: "Failed to start the VM, launcher-202xxxx, used for launching because of status code: INVALID_ARGUMENT, reason: Invalid Error: Message: Invalid value for field 'resource.networkInterfaces[0].network': 'global/networks/default'. The referenced network resource cannot be found. HTTP Code: 400."

    • @techtrapture • 1 year ago

      It depends on how you are streaming. If you are doing it using Dataflow, which I can see from the error, then it's an error from the Dataflow worker VM, so you are missing details in the Dataflow configuration.

  • @chandanpatil2704 • 10 months ago

    Hi,
    I have been using the same approach as you, but with a different CSV file (the UDF is the same), and I am getting the following error (Loyalty Number is an integer column):
    Error message from worker: org.apache.beam.sdk.util.UserCodeException: java.util.concurrent.CompletionException: javax.script.ScriptException: :5:12 Expected ; but found Number
    obj.Loyalty Number = values[0];
    ^ in at line number 5 at column number 12
    Can you tell me what the error actually is?

    • @techtrapture • 10 months ago

      Check whether the data type of the BigQuery column matches the CSV data.
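Judging from the error message itself, the immediate problem is the space in `obj.Loyalty Number`: JavaScript dot notation cannot contain spaces, which is exactly what ":5:12 Expected ; but found Number" is complaining about. BigQuery column names cannot contain spaces either, so the column is presumably named something like `Loyalty_Number` (an assumption), and the JSON key emitted by the UDF must match that BigQuery column name. A corrected sketch:

```javascript
// Hypothetical fix: use a legal identifier that matches the BigQuery
// column name, and parse the value as an integer for an INTEGER column.
function transform(line) {
  var values = line.split(',');
  var obj = {};
  obj.Loyalty_Number = parseInt(values[0], 10); // no space in the property name
  return JSON.stringify(obj);
}
```

If the key really had to contain a space, bracket notation (`obj["Loyalty Number"]`) would be syntactically valid JavaScript, but BigQuery would still reject such a column name, so renaming is the practical fix.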

  • @natannascimento7388 • 1 year ago

    Hello, I am getting the error below:
    org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: Error parsing schema gs://fazendo/mentloja.json
    Caused by: java.lang.RuntimeException
    Caused by: org.json.JSONException
    Can you help me?

  • @rahulhundare • 1 year ago

    One more question: why do we need to specify temp folders here?

    • @techtrapture • 1 year ago

      During job execution, Dataflow stores some metadata and temporary staging files in the temp folder. You can monitor it while the job runs.

  • @VishalKumar-z4p9v • 1 year ago

    How can we load the same data from a CSV file to a Pub/Sub topic, and then into BigQuery through a Dataflow job?

    • @techtrapture • 1 year ago

      First, you need to create a Dataflow job with the template "Text Files on Cloud Storage to Pub/Sub". To load data from Pub/Sub to BigQuery you don't need Dataflow: Google added a new subscription type for Pub/Sub that writes directly to BQ.

  • @yadavakshay53 • 3 months ago

    Can you share the CSV file?

    • @techtrapture • 3 months ago

      Share your email ID with me and I will send it to you.

  • @shwetarawat4027 • 1 year ago +1

    Can you also attach the .csv file so that we can download and use it?

    • @techtrapture • 1 year ago

      Sure, can you share your email ID? I will send it to you.

    • @shwetarawat4027 • 1 year ago +1

      @@techtrapture I've used another .csv file for now... thank you

    • @shwetarawat4027 • 1 year ago

      Also, when trying to give the BigQuery dataset name while creating the job, i.e. projectID:datasetname, it gives the error: "Error: value must be of the form ".+:.+\..+"". How do I resolve this? Also, when I give the table name, it says 'Table not found'.

    • @techtrapture • 1 year ago

      Use the format
      project:dataset.tablename (the pattern in the error, ".+:.+\..+", expects a colon after the project ID and a dot before the table name)

    • @shwetarawat4027 • 1 year ago

      @@techtrapture I am doing the same, still the same error

  • @jaykay7057 • 1 year ago

    How do I create the UDF? I do not have any Java knowledge.

  • @VarshiniAleti • 1 year ago

    Can you please share the .csv file?

  • @rahulhundare • 1 year ago

    Hello, I am getting the error below:
    org.apache.beam.sdk.util.UserCodeException: java.lang.NoSuchMethodException: No such function transform
    at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
    Why is that?

    • @techtrapture • 1 year ago

      This is something related to the code you are using; I don't think it is related to the GCP environment.

    • @rahulhundare • 1 year ago +1

      @@techtrapture Yes, you are correct; it was an invalid function name inside the code.
      Thanks for the prompt reply :)

    • @techtrapture • 1 year ago

      @@rahulhundare Glad you found it.
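As the thread above concludes, "No such function transform" means the template could not find a function whose name matches the "JavaScript UDF name" job parameter. A minimal sketch, assuming the parameter is set to `transform` and a single illustrative column `id`:

```javascript
// The function name defined in the UDF file must exactly match the
// "JavaScript UDF name" parameter of the Dataflow job. If the parameter
// says "transform", this file must define function transform(line).
function transform(line) {
  var values = line.split(',');
  var obj = { id: values[0] }; // illustrative single-column mapping
  return JSON.stringify(obj);
}
// Naming it e.g. "transformLine" while the parameter says "transform"
// triggers: NoSuchMethodException: No such function transform
```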