In your case, each job run will grab all the data (instead of only the new data) from the RDS table into Redshift and then do the merge. Let's say the table is very big, over several hundred gigabytes; the operation will be very expensive. Correct? Can you add a SQL filter transformation step in between to grab only the data changed since the last job run, so that only the new data is merged?
Thank you for watching my videos.
Indeed, I shall make videos on this point.
This is really very good input.
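To sketch the incremental-load idea from the question above: assuming the source table has a change timestamp column (here a hypothetical `updated_at`), the filter could be built as a query string and used in a SQL transform so only new rows are pulled. This is only a minimal sketch, not the channel's actual job:

```python
from datetime import datetime, timezone

def build_incremental_query(table, ts_column, last_run_ts):
    """Build a query that selects only rows changed since the last job run."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} > '{last_run_ts.isoformat()}'"
    )

# Timestamp of the previous successful run (in a real job this would be
# persisted somewhere, e.g. a small state table, and read back each run).
last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
query = build_incremental_query("public.orders", "updated_at", last_run)
print(query)
```

The table name, column name, and state handling here are all assumptions for illustration; the point is simply that filtering at the source keeps each run proportional to the changed data, not the whole table.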
Thank you so much for the session. It's really helpful for a beginner like me.
Thank you for watching my videos.
Glad that it helped you.
The reason it got appended into the target table is that the "Matching Keys" include all of the columns. Had it been just "industry_name_anzsic" in the matching keys, it would have updated the row instead. Actually, I think you assumed that just the leftmost column is the matching key, which is true most of the time, since the leftmost column is usually the primary key and we do merges and joins on it. So this was an honest mistake caused by old habits. Old habits die hard.
Thank you for watching my videos.
It's a built-in capability of Glue that I have used. But I am happy to explore more about it.
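To illustrate the matching-keys behaviour described above, here is a small Python simulation (not the actual Glue implementation): when the matching keys include every column, a row whose value changed never matches and so gets appended, while matching on the key column alone updates it in place.

```python
def merge_rows(target, incoming, matching_keys):
    """Simulate a MERGE: update rows whose matching keys align, else append."""
    merged = [dict(row) for row in target]
    for new_row in incoming:
        for row in merged:
            if all(row[k] == new_row[k] for k in matching_keys):
                row.update(new_row)  # keys match: update in place
                break
        else:
            merged.append(dict(new_row))  # no match found: append
    return merged

target = [{"industry_name_anzsic": "Mining", "value": 10}]
incoming = [{"industry_name_anzsic": "Mining", "value": 99}]

# Matching on the key column alone updates the existing row:
assert len(merge_rows(target, incoming, ["industry_name_anzsic"])) == 1
# Matching on ALL columns finds no match (value differs), so it appends:
assert len(merge_rows(target, incoming, ["industry_name_anzsic", "value"])) == 2
```

The column names mirror the example discussed in the comment; the merge logic itself is a toy model of the behaviour, not Glue's internals.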
Hello, will it move the whole data from RDS to Redshift, or only a copy of the RDS data to Redshift?
Thank you for watching my videos.
A copy of the data will be moved; it is not a lift and shift, so the source data stays in RDS.
Thanks so much for the video. Can you please share the script/code which was generated in the Glue ETL job?
Thank you for watching my videos.
Glad that it helped you.
This time I did not collect the scripts.
But if you follow the scenarios as explained in the video, you will get the required script there.
Can we do it the opposite way, that is, load data from Redshift to RDS PostgreSQL? I tried but it doesn't work. Can you make it work and make a video?
Thank you for watching my videos.
It's a unique requirement though; I shall try creating a video on this soon.
Please tell me what policies you have attached to the IAM role.
Thank you for watching my videos.
As this is a demo video, I am using 'admin' access, which is not recommended in production.
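For a tighter production setup than admin access, the role would at minimum need a trust policy that lets Glue assume it, plus the AWS managed policy `AWSGlueServiceRole` and permissions scoped to the specific data stores used. A sketch of the standard Glue trust policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "glue.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

The exact additional permissions depend on your sources and targets (RDS connection secrets, Redshift, S3 staging paths), so treat this as a starting point, not a complete policy.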
Which policies in the IAM role? I am asking as I am facing a timeout error in AWS Glue. @cloudquicklabs
Hi brother,
I'm able to collect data one table at a time, but when I try to establish a connection through the crawler, it says it is unable to connect or establish a connection. Is it possible to add all tables at a time?
Thank you for watching my videos.
There could be multiple reasons, like below.
1. Check if the VPC endpoints for RDS are in place.
2. Check if the inbound security group has the required ports enabled.
3. Check if the credentials are correctly provided.
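A quick way to test point 2 from a machine in the same VPC is a plain TCP probe against the database endpoint and port (5432 for PostgreSQL, 3306 for MySQL). The endpoint below is a hypothetical placeholder, not a real instance:

```python
import socket

def can_reach(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace with your RDS instance's endpoint and port (hypothetical values):
# print(can_reach("mydb.abc123.us-east-1.rds.amazonaws.com", 5432))
```

If this returns False, the problem is networking (security groups, routing, VPC endpoints) rather than credentials, which narrows down points 1 and 2 before you look at point 3.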
Please make a video on the pyspark script.
Thank you for watching my videos.
Indeed, I shall make PySpark script videos.
How do I get classes?
Thank you for watching my videos.
I don't take classes, but I help through my videos. Let me know if you have any topic to cover in a video.
That clicking sound from windows 98 is very distracting.
It was my old system.
New videos are being recorded on a new Windows 11 machine.
Hope you appreciate it.