I've no words for Google. It always lives in the next generation. I'm happy to work on GCP products.
This is what everyone wants. Bang on. We used Cloud Dataflow in the past and it was a nightmare. Not the development, but a pipeline had to go through a very long process before it could be deployed into production: code review, testing, SIT, code quality checks, checks for usage of unapproved libraries, etc. This looks like Informatica + Cognos + Control-M to me.
Can we execute stored procedures in BQ via Fusion? Thanks
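I haven't seen a dedicated plugin for this, but an action that runs arbitrary BigQuery SQL should be able to issue a CALL statement. For reference, here's what the same call looks like from plain Python with the google-cloud-bigquery client; mydataset.my_proc is just a placeholder procedure name:

    from google.cloud import bigquery

    client = bigquery.Client()
    # CALL runs a BigQuery stored procedure; mydataset.my_proc is hypothetical.
    job = client.query("CALL mydataset.my_proc()")
    job.result()  # block until the procedure finishes

If Fusion's SQL action accepts the same statement, that would be the equivalent inside a pipeline.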
Does it help to bring marketing analytics data from various sources into clouds like Azure or Google Cloud?
Precisely Presented
Does Data Fusion have a pipeline resume capability in case of manual errors, so we don't need to run the whole pipeline again?
How do you list files from a bucket, apply arbitrary logic, and load only some of them? Can I run a Python script with gsutil capabilities in Data Fusion? Too many demonstrations of easy things that do not apply to real life :(
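The listing-plus-filtering part is doable outside Fusion with a short Python script and the google-cloud-storage client instead of gsutil. A minimal sketch, where my-bucket, the incoming/ prefix, and the CSV filter are all placeholders standing in for your "arbitrary logic":

    from google.cloud import storage

    client = storage.Client()

    # List objects under a prefix, keep only those matching some rule,
    # then download (or hand off) just those files.
    for blob in client.list_blobs("my-bucket", prefix="incoming/"):
        if blob.name.endswith(".csv") and blob.size > 0:
            blob.download_to_filename("/tmp/" + blob.name.split("/")[-1])

You could run something like this ahead of the pipeline and only feed the surviving files into Fusion.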
Following
Great presentation, very important tool in our Data/ML pipelines
Amazing, exactly what I was looking for!
Hi, is there any way to trigger pipelines from outside events, like from Cloud Functions, or to trigger a pipeline on file arrival in GCS? Please direct me to any documentation on that.
Hi, did you get a solution? If yes, please let me know, because I'm also looking for that.
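One approach that should work: a Cloud Function on the GCS "finalize" event that starts the pipeline through the Data Fusion (CDAP) REST API. A rough sketch, assuming I recall the CDAP start path correctly; the endpoint URL and pipeline name below are placeholders, and the real API endpoint comes from your instance details:

    import requests
    import google.auth
    from google.auth.transport.requests import Request

    # Placeholders: substitute your instance's API endpoint and pipeline name.
    CDAP_ENDPOINT = "https://my-instance.example.datafusion.googleusercontent.com/api"
    PIPELINE = "my_pipeline"

    def trigger_pipeline(event, context):
        """Cloud Function entry point for a GCS object-finalize event."""
        creds, _ = google.auth.default()
        creds.refresh(Request())  # obtain an access token for the API call
        url = (CDAP_ENDPOINT + "/v3/namespaces/default/apps/" + PIPELINE
               + "/workflows/DataPipelineWorkflow/start")
        resp = requests.post(url, headers={"Authorization": "Bearer " + creds.token})
        resp.raise_for_status()

The function's service account needs permission to call the Data Fusion instance.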
Great presentation, and Great tool
I have 2 questions: can you tell me what exactly you mean by the type File in (Source, Sink), and is it also possible to send the result of the pipeline directly to an FTP server?
Thank you
Source is the source file and sink is the target location.
I have worked with a few major banks using GCP. Why are none of them using Fusion? They are all building their own data pipelines using Airflow and Dataflow.
Because Data Fusion is too basic.
GCP's response to Azure ADF and AWS Glue / Data Pipeline.
While every programmer shouts at the top of their voice about why hand coding is better than drag-and-drop ETL, the big boys are creating tools for everyone to adopt.
I am trying to use my own Python code in the Python transformer plugin, but I am getting a "no module named py4j" error even though I installed it and set it in the PYTHONPATH and Python binary path, in NATIVE execution mode. Can anyone please help me with this Python transformer?
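One way to debug this: if I remember the plugin's contract right, the Python Evaluator expects a transform(record, emitter, context) function, so you can dump the interpreter's search path from inside it and check in the pipeline logs whether the directory containing py4j is actually visible to the plugin. A minimal sketch, with the signature being my assumption from the docs:

    import sys

    def transform(record, emitter, context):
        # Write the module search path to stderr so the pipeline logs show
        # whether the directory where py4j was installed is on the path here.
        sys.stderr.write("sys.path = %r\n" % (sys.path,))
        emitter.emit(record)

If py4j's install directory is missing from that output, the PYTHONPATH you set isn't reaching the interpreter the plugin launches.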
We are trying to bring data from AWS RDS to BigQuery using a Data Fusion pipeline. Can you please tell us how to connect Data Fusion to RDS without making the RDS endpoint publicly available (0.0.0.0/0)? Presently we are getting a connection timeout error.
Data Fusion is the worst tool that I have worked on, really pathetic.
What are those shortcomings that you came across?