This was by far the best video I've seen on DynamoDB. So clear and concise
This is the most valuable introduction to DynamoDB!
The questions of the participants were on point. It's like they're reading my mind.
Thank you, best introduction to DynamoDB
best video to get started learning DynamoDB design
Awesome! I'm migrating an App to DynamoDB, this brings light to me! Thank you!
By faaaar the best video about modeling single-table in Dynamo. In comparison, the official AWS videos assume you know a lot; for those who are just starting with NoSQL + Dynamo, the AWS videos are advanced.
Awesome presentation! Didn't know anything about DynamoDb before this presentation, great explanation!
Great explanations, Alex. It clarifies a lot about single-table apps.
You saved my day, Alex!
Better than AWS talks!!!!!!!!!!!!!
Really a great lecture on one-to-many relationship design use cases. It is really useful.
Excellent!!
I'll use this info to create my first serverless app :D
The Amplify framework uses multiple DynamoDB tables (each GraphQL model creates a new DynamoDB table), which is a little confusing as it goes against the single-table principle.
Single-table design is not a principle. It is one design pattern. It does not fit every use case.
I am convinced that single-table design is a good pattern for working with DynamoDB. But does it defy the database-per-service style of microservice patterns? Thoughts?
He answers this at 59:30: one table per service.
Each service with its own table.
I'm not sure it does. Each service can have a config for its key prefixes and use the DB via those keys only, so each service only has a view through its permissible keys.
Say, a users service dealing with only the USER key space for all user-permissible operations 🤔
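For reference, one way to enforce that kind of key-space scoping is IAM fine-grained access control with the dynamodb:LeadingKeys condition. A minimal sketch, assuming a shared table named my-application-table and a hypothetical users-service role that should only touch USER# items:

```python
# Sketch only: restrict a service's role to the USER# key space of a shared
# single table. Role, policy, table, and account identifiers are made up.
import json

import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query",
                   "dynamodb:PutItem", "dynamodb:UpdateItem"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/my-application-table",
        "Condition": {
            # Only allow access to items whose partition key starts with USER#
            "ForAllValues:StringLike": {"dynamodb:LeadingKeys": ["USER#*"]}
        },
    }],
}

iam.put_role_policy(
    RoleName="users-service-role",
    PolicyName="users-keyspace-only",
    PolicyDocument=json.dumps(policy),
)
```

With only this policy attached, the users service can't read or write another service's items even though everything lives in one table.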
Great explanation, very clean.
Great material, thanks a lot!
This is such a great explanation.
For a single table, how do you efficiently keep track of DynamoDB stream events and make sure not every write produces a stream event? Or do we just let them all produce stream events and filter them out with "--filter-criteria"?
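For what it's worth, every item-level change on a table with streams enabled does land in the stream; the usual approach is to filter at the event source mapping so your consumer is only invoked for the items it cares about. A rough boto3 sketch, the programmatic equivalent of the CLI's --filter-criteria flag (the ORDER# prefix, ARNs, and function name are placeholders):

```python
# Sketch: only invoke the consumer Lambda for stream records whose partition
# key starts with ORDER#. All other records are dropped before invocation.
import json

import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn=(
        "arn:aws:dynamodb:us-east-1:123456789012:table/"
        "my-application-table/stream/2024-01-01T00:00:00.000"
    ),
    FunctionName="order-events-handler",
    StartingPosition="LATEST",
    FilterCriteria={
        "Filters": [{
            "Pattern": json.dumps({
                "dynamodb": {"Keys": {"PK": {"S": [{"prefix": "ORDER#"}]}}}
            })
        }]
    },
)
```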
One question: we calculate a hash of the partition key to identify the partition, so when new partition(s) are added, don't we have to reshuffle all the data?
What happens when you have the sort key encoded with your partition key (such as a user name) and then the user name changes? Do you have to go update the user name in a bunch of locations (in the partition key and then in all the sort keys)?
imo you should not sort on something that users can change
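A minimal sketch of that advice: keep the immutable user ID in the partition/sort keys and store the changeable display name as a plain attribute, so a rename is a single UpdateItem rather than a rewrite of every key that embedded the name. Table and attribute names here are illustrative:

```python
# Sketch: immutable IDs in the keys, mutable display name as an attribute.
import boto3

table = boto3.resource("dynamodb").Table("my-application-table")

# The profile item is keyed by a user id that never changes.
table.put_item(Item={
    "PK": "USER#12345",
    "SK": "PROFILE#12345",
    "display_name": "old-name",   # mutable, deliberately kept out of the keys
})

# Renaming the user touches one attribute on one item -- no key rewrites.
table.update_item(
    Key={"PK": "USER#12345", "SK": "PROFILE#12345"},
    UpdateExpression="SET display_name = :name",
    ExpressionAttributeValues={":name": "new-name"},
)
```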
The reason to limit the projection in a GSI is not primarily performance; it is storage cost. Do you want to pay real money to store data in a GSI that you may never use?
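Agreed. For anyone curious what a narrow projection looks like in code, here's a rough boto3 sketch of adding a GSI that stores only the keys plus one attribute instead of a full copy of every item (index and attribute names are assumptions, and the table is assumed to use on-demand billing, so no ProvisionedThroughput block is needed):

```python
# Sketch: create a GSI with a narrow INCLUDE projection so the index only
# stores the keys plus "status", not the whole item.
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.update_table(
    TableName="my-application-table",
    AttributeDefinitions=[
        {"AttributeName": "GSI1PK", "AttributeType": "S"},
        {"AttributeName": "GSI1SK", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "GSI1",
            "KeySchema": [
                {"AttributeName": "GSI1PK", "KeyType": "HASH"},
                {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
            ],
            # KEYS_ONLY is even cheaper if the keys alone answer the query.
            "Projection": {
                "ProjectionType": "INCLUDE",
                "NonKeyAttributes": ["status"],
            },
        }
    }],
)
```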
Single-table design would be great if we could create more than 20 GSIs per table, but for now it seems we are limited to 20 GSIs per table.
This is generally solvable with GSI overloading. The number of access patterns that 20 GSIs enable when you're using generic GSI PKs is insanely high.
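A quick illustration of GSI overloading, in case it helps: different entity types write different values into the same generic GSI1PK/GSI1SK attributes, so one index serves several access patterns. The key layouts below are invented for the example:

```python
# Sketch: two unrelated access patterns sharing one overloaded index.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("my-application-table")

# Pattern A: tickets by status.
table.put_item(Item={
    "PK": "TICKET#T-1", "SK": "TICKET#T-1",
    "GSI1PK": "STATUS#OPEN", "GSI1SK": "TICKET#T-1",
})

# Pattern B: users by organization, reusing the same GSI attributes.
table.put_item(Item={
    "PK": "USER#12345", "SK": "PROFILE#12345",
    "GSI1PK": "ORG#acme", "GSI1SK": "USER#12345",
})

# The same index answers either question depending on the key you query.
open_tickets = table.query(
    IndexName="GSI1",
    KeyConditionExpression=Key("GSI1PK").eq("STATUS#OPEN"),
)
acme_users = table.query(
    IndexName="GSI1",
    KeyConditionExpression=Key("GSI1PK").eq("ORG#acme"),
)
```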
In his "why no JOIN" section, he goes over why 1 table is better than joining across 2 tables. But I have a use case where, e.g., I will only ever want either the org's info or all the org's user's info. So in that scenario, I could do 2 separate tables and never need a JOIN. Is this a scenario in which 2 table is okay, and 1 table won't offer better performance?
His example of "keeping track of LeBron James Twitter followers" is very similar to my use case. So in that scenario, I either want who is being followed (metadata) or their full list of followers, no JOINs needed.
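Two tables can certainly work there, but note that a single item collection also serves both reads without anything JOIN-like: the org's metadata and its users share a partition key, and the sort key picks which slice you fetch. A sketch with made-up key shapes:

```python
# Sketch: one partition holds the org metadata item plus its user items.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("my-application-table")

# Read 1: just the org's info.
org = table.get_item(Key={"PK": "ORG#acme", "SK": "#METADATA"}).get("Item")

# Read 2: all of the org's users, in a single Query on the same partition.
users = table.query(
    KeyConditionExpression=(
        Key("PK").eq("ORG#acme") & Key("SK").begins_with("USER#")
    )
)["Items"]
```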
You also have to think about things like monitoring... You now have to monitor two tables... Two might not be too much, but if the number of tables grows, it can become consequential. Like in everything, it's all a question of trade-offs. Understand what you're gaining/losing with the choice you make.
awesome!
Here in Brazil the book is very expensive. I really wanted to buy it, but it costs the equivalent of $500.
Hey Alex, I have a question regarding the secondary index you mentioned while discussing ticket handling. Couldn't you just create USER#USER_ID as the partition key and TICKET#TICKET_ID as the sort key, so we would have an additional record with the user as the partition key and their ticket items as sort keys?
Hi, Alex doesn't see questions in here, but you can ping him on Twitter: @alexbdebrie
This was basically my question as well. Did you get an answer?
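Not an answer from Alex, but here is roughly the layout the question above describes: a second item collection with USER#<id> as the partition key and TICKET#<id> items under it, so a user's tickets come back from one Query. Table, key, and attribute names are illustrative:

```python
# Sketch of the proposed layout: tickets stored under the user's partition.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("my-application-table")

# Ticket item written with the user as the partition key.
table.put_item(Item={
    "PK": "USER#12345",
    "SK": "TICKET#T-1",
    "subject": "Cannot log in",
})

# Fetch all of a user's tickets with one Query.
tickets = table.query(
    KeyConditionExpression=(
        Key("PK").eq("USER#12345") & Key("SK").begins_with("TICKET#")
    )
)["Items"]
```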
“Know your access patterns in advance”… it's so much harder to get this right than it seems, and it makes DynamoDB development so much slower and more frustrating to work with… I found using much simpler access patterns to be easier to manage.
The audio is out of sync with the slides by a lot.
It's hard to follow when he's talking about something and the slide has already moved on to another topic.
Might be on your end
I am a little bit confused. Say I have a table named my-application-table with these items:
1st item: pk = USER#12345, sk = PROFILE#12345, image = {some_url}, name = Taman
2nd item: pk = USER#123456, sk = PROFILE#123456, image = {some_url}, name = Alex
3rd item: pk = USER#12345, sk = POST#1, postbody = This is some tutorial
4th item: pk = POST#1, sk = COMMENT#1, comment = my own comment, user = 123456
Now I want to show the comments on a post along with each commenter's image and name, but the comment items don't carry the user's image and name. Which approach should I take?
1. Complex data type: I can't use this because the user might change their image or name.
2. Duplication: I can't use this either, because the user's image and name might change.
What would be the correct approach for this kind of structure, or should I restructure the table?
You have to allow some data duplication when switching from a SQL to a NoSQL database. So, to handle user profile changes, write some cloud functions to ensure the data is changed in every duplicated place.
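A rough sketch of that suggestion, assuming the user's name/image are duplicated onto their comment items and a Lambda is subscribed to the table's stream. The "byUser" GSI, its GSI1PK/GSI1SK attributes, and the NEW_IMAGE stream view are all assumptions for the example:

```python
# Sketch: when a user's PROFILE item changes, copy the new name/image onto
# that user's duplicated comment items. Assumes the stream is configured
# with NEW_IMAGE and a "byUser" GSI keyed on GSI1PK = USER#<id>.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("my-application-table")

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "MODIFY":
            continue
        new = record["dynamodb"]["NewImage"]
        if not new["SK"]["S"].startswith("PROFILE#"):
            continue

        user_key = new["PK"]["S"]          # e.g. "USER#123456"
        name = new["name"]["S"]
        image = new["image"]["S"]

        # Find this user's comment copies via the (hypothetical) byUser GSI.
        comments = table.query(
            IndexName="byUser",
            KeyConditionExpression=(
                Key("GSI1PK").eq(user_key) & Key("GSI1SK").begins_with("COMMENT#")
            ),
        )["Items"]

        for item in comments:
            table.update_item(
                Key={"PK": item["PK"], "SK": item["SK"]},
                UpdateExpression="SET #n = :n, #img = :i",
                # Placeholders avoid reserved-word collisions (e.g. "name").
                ExpressionAttributeNames={"#n": "name", "#img": "image"},
                ExpressionAttributeValues={":n": name, ":i": image},
            )
```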
16:35