Dynamic ETL from AWS S3 to Amazon Redshift with AWS Lambda
- Published 22 Jan 2022
This video shows a complete end-to-end demo of dynamic ETL using the AWS Lambda (Python) service, where data is loaded from an AWS S3 bucket into Amazon Redshift. It includes a clean architectural explanation, a part-by-part walkthrough, and a demo of configuring the complete automation setup.
Please find the relevant files in the repo:
github.com/RekhuGopal/PythonH...
#etl #amazonredshift #s3 #lambda - Science & Technology
Thanks for sharing knowledge. Looking forward to seeing more videos from you. 🙏🙏🙏
Thank you for watching my videos.
Thank you so much for sharing this video
Thank you for watching my videos.
Glad that it helped you.
Thank you for the video. Can you please let me know how we can change the data type? My table has other data types, whereas my CSV file gives only VARCHAR. It would be helpful if you could show a demo on a table that has data types other than VARCHAR.
Thank you for watching my videos.
Sure, I shall create a new video on it ASAP. Stay tuned.
Thank you for the video.
I am facing one issue: I am unable to see host, db-name, pwd, or table-name in the Lambda environment variables (28:44).
Did I miss anything? Please help.
Thank you for watching my videos.
Did you check the code file that I shared in the description? It has logic to read the environment variables, and you need to set those variables on the Lambda as well.
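A minimal sketch of that environment-variable logic. The variable names here (host, db_name, user, pwd, table_name) are assumptions based on the comment above; use whatever names the shared code actually expects.

```python
import os

def get_redshift_config():
    """Collect Redshift connection settings from Lambda environment variables.

    Fails fast with a clear message when a variable has not been set
    in the function's configuration, instead of a bare KeyError later.
    """
    required = ["host", "db_name", "user", "pwd", "table_name"]
    missing = [name for name in required if name not in os.environ]
    if missing:
        raise KeyError(f"Missing Lambda environment variables: {missing}")
    return {name: os.environ[name] for name in required}
```

If any of these names are not visible in the Lambda console, they simply have not been created yet: they are set manually under Configuration → Environment variables, not by the code.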
Excellent tutorial.
Thank you for watching my videos.
Happy learning.
The Lambda is in a VPC and S3 is outside the VPC, so my code is giving a timeout error. What should I do?
Thank you for watching my videos.
Maybe you need to check whether a VPC endpoint for S3 is enabled here, and also make sure that the AWS Lambda function is integrated with the VPC.
Hello, how do I get that S3 key? Please clarify; I am a little bit confused.
Thank you for watching my videos.
Those are the IAM user access key and secret; please check the description for the shared files.
How did you generate the AWS secret and access key, and how do you use them in the Lambda function with the role?
Thank you for watching my videos.
What are the acess_secrete and acess_key? Are these IAM role creds (if yes, then an IAM role for what)?
Thank you for watching my videos.
Those are the IAM user's acess_secrete and acess_key; the required access policy is attached to that user.
Sorry to sound like a broken record. I am still very confused.
Basically,
If I am reading it correctly, we need to create an IAM user with Redshift access and pass the creds for that IAM user to the query string.
Do we need to attach some special role to the Lambda itself?
Does that IAM role need to have some permission over Lambda?
When uploading a new file, the table is not being updated with the values from the new file. Can you please help?
Thank you for watching my videos.
Can you please check whether the Lambda is getting triggered when you upload the file to the S3 bucket?
There you could figure out the reason the data is not being loaded into the Redshift cluster. The error log might help me to help you.
How did you create the role, access key, secret key, and so on?
Thank you for watching my videos.
Is there a Redshift Serverless version of this tutorial?
Thank you for watching my videos.
Please check my latest video on Redshift; I have used Serverless there.
I have tried to implement the same but am getting an error in Lambda. I think my Redshift is not able to connect to S3. How can I figure this out?
Thank you for watching my videos.
May I know what error you are seeing here?
@@cloudquicklabs Thanks for the reply. Pasting the error right here; it appears after the query:
[ERROR] InternalError_: S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid S5RM15AXTZNQ248Z,ExtRid 7K6TdTJUNhNS+fKGY2HgUYjq40HjYsI6Hdes69txVhXESjZkqZM8myV5tQcagVwJh/uvB4564IM=,CanRetry 1
DETAIL:
-----------------------------------------------
error: S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid S5RM15AXTZNQ248Z,ExtRid 7K6TdTJUNhNS+fKGY2HgUYjq40HjYsI6Hdes69txVhXESjZkqZM8myV5tQcagVwJh/uvB4564IM=,CanRetry 1
code: 8001
Please upload a video on your Lambda and Redshift setup.
Thank you for watching my videos. I shall create a video on this.
Thanks!
Thank you for watching my videos.
I followed the same steps, but it seems that when I upload a CSV to S3, the trigger does not invoke the Lambda function. What could be the reason?
Thank you for watching my videos.
The following could be the reasons:
1. Check whether a trigger for S3 bucket object creation has been added.
2. Check whether the Lambda role has a trust relationship with the AWS Lambda service.
@@cloudquicklabs I will double-check, thanks much. I also got an error saying psycopg2 does not have the attribute 'connect' while testing the Lambda. I am using the Python 3.7 runtime; I am not sure what caused this issue.
You need to create a Lambda layer. I have attached the zip file of the psycopg2 module; please create a Lambda layer from it.
@@cloudquicklabs I did create the Lambda layer using the Python 3.7 runtime.
@@cloudquicklabs Finally, I successfully loaded the data from S3 into the Redshift table. However, when printing print(curs.fetchmany(3)), it gives [ERROR] ProgrammingError: no results to fetch, even though the data is available in the Redshift table. What could be the reason?
Thank you for sharing the video; this works. Please let me know how I can ignore the header while using this code, and also how I can load a TSV (tab-separated) file with it. That would be helpful.
Thank you for watching my videos.
As we are loading the data with Lambda, you can skip the first row in code.
To load TSV files, first convert them to CSV files, which is easy, and then follow the same steps mentioned in this video.
www.google.com/amp/s/www.geeksforgeeks.org/python-convert-tsv-to-csv-file/amp/
I got confused between the IAM role and the IAM user. Can you please help?
Thank you for watching my videos.
Here the IAM role is the AWS Lambda service role, used when the function is invoked by a file upload to the S3 bucket. The IAM user credentials are for the SQL 'COPY' command, which copies the data from the S3 bucket into the Amazon Redshift cluster table. Watch again; I hope you get it.
Happy learning.
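To make that split concrete: the Lambda's execution role lets S3 invoke the function, while the IAM user's keys travel inside the COPY statement itself. A hedged sketch of building that statement (table, bucket, and key names are placeholders; the CREDENTIALS clause is standard Redshift COPY syntax):

```python
def build_copy_command(table, bucket, key, access_key_id, secret_access_key):
    """Build the Redshift COPY statement that pulls a CSV object from S3.

    The IAM *user* credentials are embedded in the CREDENTIALS clause;
    the Lambda's own IAM *role* plays no part in this SQL.
    """
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"CREDENTIALS 'aws_access_key_id={access_key_id};"
        f"aws_secret_access_key={secret_access_key}' "
        "CSV;"
    )
```

The Lambda code would then execute this string over a psycopg2 connection to the cluster.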
I don't see Python 3.7; I selected Python 3.8 and get this error: [ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'psycopg2._psycopg' Traceback (most recent call last):
Thank you for watching my videos.
I am creating a new video on this topic in which I will use the latest Python version; it will be released next weekend.
Hi bro, I am doing a similar thing for a course project, but multiple CSV files are uploaded each time, and there are three different tables that need separate queries. However, it takes 40 seconds to insert into each table, which is too long. Would a connection pool make it quicker? What is your opinion on solving this? Thank you.
Thank you for watching my videos.
There are multiple options here:
1. Even though multiple files are uploaded, you will still know the file names, so you could iterate over them in a loop and load each into its table (identifying which table is something you could do with code logic and naming conventions; write three COPY commands as shown in my video and execute them accordingly).
2. I believe time is less of an issue here, as Lambda can run for up to 15 minutes per invocation.
3. If it still exceeds 15 minutes, you could do it with something like what is shown in ua-cam.com/video/uf8heaG0IhU/v-deo.html
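A sketch of option 1, assuming the target table can be inferred from a key prefix (the prefix-to-table mapping here is invented for illustration):

```python
def route_records(event, prefix_to_table):
    """Map each uploaded object in an S3 event to its target table.

    Returns (table, bucket, key) triples; keys with unknown
    prefixes are skipped rather than raising.
    """
    jobs = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        for prefix, table in prefix_to_table.items():
            if key.startswith(prefix):
                jobs.append((table, bucket, key))
                break
    return jobs
```

Each resulting triple would drive one COPY command, so a single invocation handles all the files delivered in the event.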
I am getting an error message in the CloudWatch log events after uploading my CSV file to S3: "[ERROR] InternalError_: The specified S3 prefix 'Weather+readings+Analysis.csv' does not exist".
Please enable a VPC endpoint for the S3 service, of type Gateway. Then it should work.
Because the VPC here is a private network and S3 is public, they are not able to communicate; I believe adding the endpoint will solve the issue.
Do let me know if you are still facing issues after that.
@@cloudquicklabs Still getting the same error even after adding the endpoints.
Okay, it looks like a file-name issue. Is it possible to change the file name to 'WeatherReadingAnalysis.csv'? It looks like the spaces in the name within the S3 URL are the issue.
@@cloudquicklabs Thank you, sir; the error is resolved.
Wow, happy to know that your issue is resolved.
Thank you so much for watching my videos.
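For reference, S3 event notifications URL-encode object keys, which is why 'Weather readings Analysis.csv' arrived as 'Weather+readings+Analysis.csv'. Rather than renaming files, the Lambda can decode the key before using it; a small sketch:

```python
from urllib.parse import unquote_plus

def decode_s3_key(raw_key):
    """Decode the URL-encoded object key delivered in an S3 event,
    turning '+' back into spaces and %XX escapes into characters."""
    return unquote_plus(raw_key)
```

Decoding the key this way lets files with spaces in their names load without renaming, as long as the decoded key is what goes into the COPY statement's S3 path.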
I am getting an error while testing the code in the Lambda function:
{
"errorMessage": "'Records'",
"errorType": "KeyError",
"stackTrace": [
" File \"/var/task/lambda_function\", line 7, in lambda_handler
for record in event['Records']:
"
]
}
Thank you for watching my video.
It is a dictionary key mismatch issue. First print the event, check the dictionary's contents, and adjust the key access accordingly.
@@cloudquicklabs I printed the event:
import os
def lambda_handler(event, context):
print(event)
When I print the event with the above code, I get this:
Test Event Name
test
Response
null
Function Logs
START RequestId: 753eb791-2312-447a-bf06-3becf8f90926 Version: $LATEST
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
END RequestId: 753eb791-2312-447a-bf06-3becf8f90926
REPORT RequestId: 753eb791-2312-447a-bf06-3becf8f90926 Duration: 1.69 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 36 MB Init Duration: 109.30 ms
Request ID
753eb791-2312-447a-bf06-3becf8f90926
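The log above shows the cause: the console's default test event is {'key1': 'value1', ...}, which has no 'Records' key, so event['Records'] raises a KeyError. A minimal hand-written test event shaped like a real S3 put notification (bucket and key values are placeholders) avoids this:

```python
# A minimal S3 "ObjectCreated:Put" test event; paste it as JSON in the
# Lambda console's test-event editor, or pass it to the handler directly.
s3_put_test_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-demo-bucket"},
                "object": {"key": "sample.csv"},
            },
        }
    ]
}

# The same access pattern the Lambda uses at line 7 of lambda_function.py.
for record in s3_put_test_event["Records"]:
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
```

Real events carry many more fields, but these are the ones the handler in this demo reads.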
Is this an incremental data load into the table?
Thank you for watching my videos.
This uploads the data of the new files generated on the S3 side each time, but it is not an incremental upload.
This works with an S3 bucket, but what about an S3 folder? I mean a folder inside the bucket.
Thank you for watching my videos.
As it is event-driven, it should work in that case as well. But if there are any errors, they can be handled in code. Let me know if you face any errors in these cases.
@@cloudquicklabs Yes, it does work. Cool ETL. Do you have any videos about ETL jobs that are hard to build, like real-life jobs?
Could you please provide the sample data CSV file in the GitHub repo?
Thank you for watching my videos.
Yes, I am sharing it soon.
Please stay tuned.
Thanks. Could you make a video with AWS Redshift Serverless?
Thank you for watching my videos.
Sure, I shall create a video on it.
@@cloudquicklabs thanks in advance waiting ✋️ 😊
Hi sir,
First of all, I would like to say thank you for sharing such a wonderful tutorial video with the code; it is a great video lecture. Looking forward to seeing some more videos on AWS Glue, EMR, Athena, and EC2 as well.
I am unable to download the zip file from the Git repo; is there anything you can suggest?
Thank you for watching my videos.
Indeed, I shall make more videos on the mentioned services.
You could clone the repo to download the .zip file.
@@cloudquicklabs thank you sir
Unable to import module 'lambda_function': No module named 'psycopg2._psycopg'
What should we do about it?
Thank you for watching my videos.
Please check the video description, where I have given the link to the 'psycopg2' package for creating the required Lambda layer; that layer makes the module importable. I encourage you to go through the video again.
Is the CSV file not uploaded to the GitHub repo?
Thank you for watching my videos.
In this video, the CSV files are uploaded directly to the S3 bucket.
For anyone watching the video, you just have to upload any csv file with 3 columns to the s3 bucket. No special format needed. You're welcome :)
How do I provide the access_secrets and access_key?
Thank you for watching my videos.
Here I have used acess_secrete and acess_key as Lambda environment variables.
You can try following docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html
Kindly add a Glue job so that the schema can also be figured out automatically.
Thank you for watching my videos.
I shall create a new video on it soon.
Thanks for the video. I have the following problem when executing the Lambda function; I get this error: "Unable to import module 'LambdaTest::LambdaTest': No module named 'LambdaTest::LambdaTest'"
Thank you for watching my videos.
I believe you have created the Lambda as suggested in the video. Please try to recreate the Lambda with the Python 3.7 runtime, create the Lambda layer using the same python.zip folder that I shared, and attach that layer to the Lambda you created.
@@cloudquicklabs I have done everything as in the video, but it keeps throwing the same error that the LambdaTest module is missing.
Try creating the Lambda first with the 'Author from scratch' option in the AWS console, then copy and paste the code I have given in the repo link. It should work this way.
@@cloudquicklabs I found the problem: in the runtime configuration, the handler was wrong. I configured it as in the video and it worked perfectly. I only have the VPC part left. Thank you very much.
Happy that it worked for you now.
Happy learning ahead.
YouTube should allow us to give a DOUBLE LIKE, because that's what you deserve.
Thank you for watching my videos.
And thank you very much for the encouraging note. Keep watching; happy learning.
How do I load a JSON file into Redshift using Lambda? Can anyone please tell me?
Thank you for watching my videos.
In the JSON file case, the Lambda code may have to be changed to download, read, and upload to Redshift.
May I know what your JSON file size would be?
@@cloudquicklabs Less than 100 MB; I am working on live-stream data.
If you have code, please share it.
Thank you for coming back on this.
The code should be very simple in this case; I shall create a new video on this. Please stay tuned.
@@cloudquicklabs Please share the link when it's completed.
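For context, Redshift's COPY can ingest JSON directly with FORMAT AS JSON 'auto', so for files in this size range the Lambda may only need a different COPY statement rather than new download-and-parse logic. A hypothetical builder (table, bucket, and key names are placeholders):

```python
def build_json_copy(table, bucket, key, creds):
    """Build a COPY statement that loads a JSON object from S3,
    letting Redshift match top-level JSON fields to columns
    automatically via the 'auto' option."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"CREDENTIALS '{creds}' "
        "FORMAT AS JSON 'auto';"
    )
```

For JSON layouts that do not map one-to-one onto columns, COPY also accepts a jsonpaths file in place of 'auto'.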
Nice effort, but a very unorganized flow. It would have been at best a 20-minute video if you had spoken concisely and in the correct order.
Thank you for watching my videos.
I shall take your valuable feedback and work on it.
@@cloudquicklabs Please don't mind, brother. I know it takes a lot of time to prepare, record, and edit, and what you are explaining is very resourceful.
But I guess that instead of explaining things on the fly, it is better to script, so that you can minimize the words and avoid jumping between topics. That was my experience:
I wanted to watch and learn but could not maintain that attention span.
Thank you very much for this valuable input. I shall apply it in my next videos.
I CANNOT UNDERSTAND ANYTHING YOU SAY!
Thank you for watching my videos.
Apologies that you are not able to follow the video. Could you please watch it again, perhaps pausing and resuming as you go?
@@cloudquicklabs Eh?
While I am testing the code in Lambda, I am getting an error like:
Test Event Name
(unsaved) test event
Response
{
"errorMessage": "could not connect to server: No such file or directory
\tIs the server running locally and accepting
\tconnections on Unix domain socket \"/tmp/.s.PGSQL.5439\"?
",
"errorType": "OperationalError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 25, in lambda_handler
password = password)
",
" File \"/opt/python/psycopg2/__init__.py\", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
"
]
}
What is the problem?
Thank you for watching my videos.
Could you please check whether you have enabled a VPC endpoint for the S3 service on the VPC where the Redshift cluster is hosted? And please make sure that you have integrated your Lambda with the same VPC as well.
@@cloudquicklabs May I know how to check and integrate it?
I am getting the below error:
{
unable to import module 'lambda function'
}
START RequestId: 40274256-8381-4eaf-84b9-51d8c833df43 Version: $LATEST
Unable to import module 'lambda_function': No module named 'psycopg2._psycopg'
I created the layer per the instructions, uploaded the python.zip, and selected runtime version Python 3.6.
I am using this for PostgreSQL.
Thank you for watching my videos.
Please try to use the 3.7 or greater runtime versions, as 3.6 has already been decommissioned by AWS.
Do follow all the steps as shown in the video.