Awesome tutorial! Eagerly waiting for the next part that you promised at the end!!
This was great. I hope you will continue to make more videos. Thanks
Pretty solid video, hopefully I can use this to get a job.
Great demo
Thanks for this. Been looking for a similar solution. This is really great.
Nice architecture project. Awesome demo. Thanks for the effort of making and uploading this video. Could you kindly rename the videos as part 1/3, etc.? That would be better for beginners. Again, awesome channel. Subscribed!
Thanks Satya! Absolutely, great idea!
Dude, your tutorial is very good! Congratulations! I am building a very similar solution to collect data from an SDK for iOS and Android, with S3 as the final destination. To capture dozens of events and structure them, do you recommend using Glue, or simply persisting everything to S3 as Parquet and then enriching the data with Spark? Thanks.
Hi Israel. Sorry for the late response. Personally I prefer the S3 Parquet with Spark on EMR solution over Glue; however, it all depends on the complexity of your problem. Glue is expensive, has a 10-minute minimum on jobs, and runs Spark under the hood anyway. It has some AWS-native connectors, but overall it may not be the best bang for your buck. I noticed you also mentioned that this involves only dozens of events. Depending on how fresh you need this data to be, another option is Kinesis -> Lambda -> S3, but again this depends heavily on the quantity of the data and the required freshness.
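For the Kinesis -> Lambda -> S3 option, the Lambda side can be a very small function. A minimal sketch, assuming a Python Lambda and a hypothetical bucket name (the handler shape and the base64 decoding are the standard Kinesis-event contract; everything else here is illustrative):

```python
import base64
import json

# Hypothetical bucket name -- substitute your own.
BUCKET = "my-event-archive"

def decode_records(event):
    """Kinesis hands Lambda its record data base64-encoded; decode each
    payload back into a Python dict."""
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event["Records"]
    ]

def handler(event, context):
    """Lambda entry point: write the decoded batch to S3 as one object
    of newline-delimited JSON."""
    import boto3  # available in the AWS Lambda Python runtime
    events = decode_records(event)
    body = "\n".join(json.dumps(e) for e in events)
    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key=f"events/{context.aws_request_id}.json",
        Body=body.encode("utf-8"),
    )
    return {"written": len(events)}
```

Batching each invocation into a single S3 object keeps the object count down, which matters for later query performance.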
We are planning to use API Gateway -> Lambda -> Kinesis -> Elasticsearch. Should we reconsider using Lambda in between and instead send directly to Kinesis?
Hi, I want to build the exact same architecture, but instead of JSON I need the data to be in XML format. I am not sure what to set for the "Content-Type" HTTP header in the POST method. I was using "application/xml" in the mapping template, but that doesn't work for the HTTP header. Can you please help me figure out what to use? I have been stuck on this issue for a long time.
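From the client side, the header value is the same string as the mapping-template key: "application/xml", spelled exactly (a typo like "applicaiton/xml" in either place will make API Gateway fall through to the default template). A minimal sketch of the request, with a hypothetical invoke URL:

```python
from urllib.request import Request, urlopen

# Hypothetical invoke URL -- substitute your deployed stage's URL.
URL = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/streams/records"

def xml_request(url, body):
    """Build a POST whose Content-Type is exactly "application/xml"; it
    must match the content type the mapping template is registered under
    in the API Gateway integration request."""
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )

# Usage (actually sends the request):
# response = urlopen(xml_request(URL, "<event><id>1</id></event>"))
```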
Great tutorial! I have a stupid question: with API Gateway, what's the benefit of using Kinesis -> S3 instead of having API Gateway invoke Lambda -> S3? Is it more cost-effective? Better performance? Thanks for your insight!
Good one!
Can we push data into the Kinesis data stream with, let's say, the faker.js API?
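Yes — anything that can call the Kinesis `PutRecord` API can feed the stream fake data. A sketch in Python (using stdlib `random` as a stand-in generator so the snippet has no dependencies; with Faker, or faker.js on the Node side, you would serialise its output the same way):

```python
import json
import random

# Stand-in event generator; the user list is made up for illustration.
USERS = ["alice", "bob", "carol"]

def fake_event():
    return {
        "user": random.choice(USERS),
        "amount": round(random.uniform(1.0, 100.0), 2),
    }

def put_fake_records(stream_name, n=10):
    """Push n fake events into a Kinesis data stream."""
    import boto3  # requires AWS credentials at call time
    kinesis = boto3.client("kinesis")
    for _ in range(n):
        event = fake_event()
        kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps(event).encode("utf-8"),
            PartitionKey=event["user"],
        )
```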
Thanks a lot.
Can you please produce some basic videos on API Gateway for all three configurations (HTTP API, REST API, and RESTful API), with details, diagrams, and the differences between them?
Thanks a lot 🎉
Hi Seth, thank you for your video. I was able to create my API Gateway with Kinesis to S3 in AWS by following your video. I asked a third party to send a sample data set to the URL I created, and was told that they got the Missing Authentication Token error. I feel like it has something to do with the method, because it may have been set up for GET only. I may be wrong; I'm fairly new to API Gateway. How can I set up the URL, or the correct method, so the third party can push data to my URL?
More often than not, Missing Authentication Token errors are due to trying to hit a method or URL that doesn't exist. I would first confirm that the path is correct, the stage has been properly deployed, and the client is using the proper method when accessing the API. See this link for more details: stackoverflow.com/questions/39655048/missing-authentication-token-while-accessing-api-gateway
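A quick way to sanity-check the first two items is to reassemble the invoke URL piece by piece. All the names below are placeholders, but the shape is the standard REST API invoke URL:

```python
# Placeholder values for illustration.
API_ID = "abc123"
REGION = "us-east-1"
STAGE = "prod"
RESOURCE = "streams/records"

def invoke_url(api_id=API_ID, region=REGION, stage=STAGE, resource=RESOURCE):
    """A REST API invoke URL has four moving parts; forgetting the stage
    segment (or hitting a stage that was never deployed) is the classic
    cause of "Missing Authentication Token"."""
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}/{resource}"
```

If the URL checks out, the remaining suspect is the method — a POST against a resource that only has GET configured produces the same error.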
I'd appreciate it if you could add a diagram before any configuration, for new people. I did not get why you created two Kinesis streams. It would be more helpful if you explained it through a diagram. Please consider my request for those new to AWS.
Great suggestion! There may be a bit more explanation in the attached article, but to cover the basics: a Kinesis data stream is used as a broker, typically between microservices, while Kinesis Firehose (the second "Kinesis") is used for delivery, especially with the many AWS integrations such as S3. A Kinesis data stream alone cannot actually write to S3; however, it can be used to trigger many other services, most commonly Lambda. For more information on the difference between the two, check out this link: jayendrapatil.com/aws-kinesis-data-streams-vs-kinesis-firehose/#:~:text=Kinesis%20data%20streams%20%E2%80%93%20Kinesis%20data,streaming%20data%20for%20specialized%20needs.&text=Firehose%20also%20allows%20for%20streaming,for%20processing%20through%20additional%20services.
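The split shows up directly in the API: the delivery half of the pipeline talks to the `firehose` client, not the `kinesis` one. A minimal sketch (stream name hypothetical; `put_record_batch` is the real Firehose batch call):

```python
import json

def to_records(events):
    """Newline-delimited JSON, so the objects Firehose lands in S3 are
    easy to query later with Athena or Spark."""
    return [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]

def deliver(delivery_stream, events):
    """Send a batch to a Firehose *delivery* stream -- the piece that can
    actually write to S3 (a plain data stream cannot)."""
    import boto3  # requires AWS credentials at call time
    firehose = boto3.client("firehose")
    return firehose.put_record_batch(
        DeliveryStreamName=delivery_stream,
        Records=to_records(events),
    )
```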
24M DAU × 4 requests ≈ 100M requests per day; at $3.50 per million that is about $350 per day for Amazon API Gateway. This is very expensive.
In hindsight, I'd look into using IoT Core, which is somewhere around $1 per million messages. From there I'd send messages directly to Firehose, let Firehose handle the transformation to Parquet, and write to S3. Cheaper, less complexity, and better query performance.
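To make the comparison concrete, here is a back-of-the-envelope cost helper plus the publish side of the IoT Core route. The topic name and rule SQL are illustrative, and the prices are the rough per-million figures mentioned in this thread, not a quote:

```python
import json

def daily_cost_usd(messages_per_day, price_per_million):
    """Back-of-the-envelope messaging cost, e.g. 100M/day at $3.50/million
    (API Gateway) vs roughly $1/million (IoT Core)."""
    return messages_per_day / 1_000_000 * price_per_million

def publish(topic, event):
    """Publish one device event to IoT Core; an IoT topic rule (e.g.
    SELECT * FROM 'events/#') can then forward it straight to Firehose."""
    import boto3  # requires AWS credentials at call time
    boto3.client("iot-data").publish(
        topic=topic,
        qos=1,
        payload=json.dumps(event).encode("utf-8"),
    )
```

At 100M messages/day the rough difference is $350/day versus $100/day before Firehose and S3 costs, which is where the "cheaper" claim comes from.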