Best Practices Working with Billion-row Tables in Databases

Поділитися
Вставка
  • Опубліковано 3 жов 2024

КОМЕНТАРІ • 146

  • @elultimopujilense
    @elultimopujilense 4 роки тому +201

    The only thing about this channel that makes me feel awful is that i didnt discovered it earlier. The topics discussed here are not really common, but extremely important in my opinion. I havent found anything that gets near this channel, some topics are so high level that I havent heard of them at all. Makes me feel like I know nothing, which is the best feeling ever. There is no much to learn! Thank you so much!

    • @oscarbarajas3610
      @oscarbarajas3610 4 роки тому +6

      I totally agree with you. This guy makes my time spent worthful.

    • @saurabhsharma7123
      @saurabhsharma7123 3 роки тому +6

      Exactly! UA-cam hasn't reported me any similars yet😂

  • @zacharythatcher7328
    @zacharythatcher7328 2 роки тому +17

    This idea of transitioning the delay to the writer AND then using queues for writes is true architecting foresight. I love it.

  • @achraf3117
    @achraf3117 4 роки тому +42

    I'm a junior Software Engineer and working my way through to specialize with Backend. I always learn new concepts from your videos. This one in particular hits home for me because I've worked on couple of projects were I had to make the design choice of the database. I found it quite difficult to make the right choices because I always end up building a search endpoint with full text search and other search parameters.
    To put it into context, the databases I had problems with were filled with recipes. I had multiple tables with 100k+ entries filled only with IDs (the problem you mentioned). After two projects I think I switched to the latest idea you mentioned with the list/json column and that one worked the best for me. Because not only it avoids searching through a big table but also saves me an extra query to another table.
    This is kind of irrelevant the this video but when implementing full text search I think it's better to go with postgresql rather than mysql since it supports gin and gist indexes and fuzzy searching that can help build a nice, affordable and quick solution to meduim sized databases.
    Keep doing the nice work.

    • @hnasr
      @hnasr  4 роки тому +5

      Thanks for sharing your thoughts Ashraf appreciate it. this is useful

    • @0161rt
      @0161rt 3 роки тому

      Isn't that best text search db is Elasticsearch??

    • @tikz.-3738
      @tikz.-3738 3 роки тому

      It's kind of related to data structure instead of one to one ur having one to many relation per row

  • @andreivilla9009
    @andreivilla9009 3 роки тому +4

    Man your content is GOLD. I come from a front-end stack and was underestimating the work with databases. But your content has helped in understanding the pitfalls of backend engineering.

  • @elmehdisaniss2731
    @elmehdisaniss2731 4 роки тому +24

    Your content is top bro.

  • @t529615
    @t529615 4 роки тому +8

    Yet another great video! Truly educational, even for someone who have been in the game for over 15 years. 👏

  • @joaopedromoreiradeoliveira4273
    @joaopedromoreiradeoliveira4273 3 роки тому +1

    You help not only software engineers with your channel, but also data analysts like me that works a lot with data engineering. Thanks! Greetings from Brazil.

    • @hnasr
      @hnasr  3 роки тому

      Love to all subs from beautiful Brazil 🇧🇷

  • @insearchof9090
    @insearchof9090 4 місяці тому

    I love you man. You are so crystal clear. you are legend.

  • @nilanjansarkar100
    @nilanjansarkar100 4 роки тому +27

    generally, it's helpful to think in terms of read and write paths of your data. On the read side, on top of partitioning you can add bloom filters to quicky test where a value exists or not to reduce searching the B-Tree or other persistent data structures

    • @hnasr
      @hnasr  4 роки тому +9

      nilanjan sarkar nice idea Nilanjan, Completely forgot about bloom filters probabilistic data structure..
      For anyone interested check out this video I discussed it before Bloom Filters Explained by Example
      ua-cam.com/video/gBygn3cVP80/v-deo.html

    • @vinny142
      @vinny142 4 роки тому +7

      " On the read side, on top of partitioning you can add bloom filters to quicky test where a value exists"
      That would mean scanning the bloom for every partition and the results you get will contain false-positives meaning you will still scan indexesof many partitions that wil not contain any matches. It's better than doing an index scan on all partitions, but I get the impression that we're not talking about the same thing when we say "partitioning'.
      The point of partitioning is that you divide rows up by the value of fields so you can predict the partitions-to-search by looking at the query. If I partition a phonebook by the first letter of the last name then querying for "Johnes" will only use the 'J' partition and the btree of that table wil be 1/26th of the original table (roughly) and so scan much faster. I don't have to pre-filter to find out that I need partition 'J', the database knows that from the query.

    • @nilanjansarkar100
      @nilanjansarkar100 4 роки тому

      @@vinny142 Yes. This is more from an LSM tree perspective where an entry in a partition can reside both in memory and on disk via multiple versions

  • @PiyushChauhan2011
    @PiyushChauhan2011 3 роки тому

    I’m front end developer acquired backend skills, all because of good content from this channel. Amazing content and energy, thank you 😊

  • @Bilo_7
    @Bilo_7 4 місяці тому

    I really like listening to you while doing other stuff, like driving eating or walking. Always being productive and learning new things "sidely" by your videos. Habibi ❤

  • @arianseyedi
    @arianseyedi 3 роки тому +3

    Maybe I didn't understand the second last section about eliminating the need to update both ends of a connection, but your solution will crumble when person A who is following person B closes their account since the information on who follows person B is only in the person's B's records. So when person A exits the medium, we won't know who to update.

  • @rajendiranvenkat5992
    @rajendiranvenkat5992 4 роки тому

    I've got many solution on database side from your video. Thanks for your support.

  • @aX4UTL1O9
    @aX4UTL1O9 4 роки тому +7

    2 extra ideas:
    1. table archiving: most large tables are caused by timeseries records -- just archive the old records in separate tables and keep the live table small
    2. use modern databases that are more scalable than traditional single-host databases: cockroachdb, spanner, aurora, tidb, fauna, etc.

    • @hnasr
      @hnasr  4 роки тому

      Great idea if its time series and you can afford to archive do it..

    • @vinny142
      @vinny142 4 роки тому +1

      I'm a big fan of archiving, just make sure that you don't shift the problem from the "live" table to the archive table, if you keep archiving to the same table for a year then that table will become impossibly huge and essentially unusable and you end updeleting it. A simple cronjob can create a new monthly table or something.
      Partitioning is in effect a type of archiving but with a transparent api so you can just run a query and the database will choose the partitions by itself.
      As for more scalable database engines; that would depend on wether those can still deliver the functionality you need. For example; any app dealing with money cannot live without transactions and referental integrity because one mistake can cost millions.
      Also; is't not exactly true that traditional databases are single-host. Replication is easy to setup to get more nodes involved and sharding is realtively easy to built into an app or a proxy.

    • @filippetrovic2596
      @filippetrovic2596 7 місяців тому

      Are we archiving entire db or just a table? How do we join between archived and non archived dbs? Because if there are joined users that do not exist anymore archive will not work. I guess we need to snapshot the db or at least more than one table that is problematic

  • @Sarwaan001
    @Sarwaan001 3 роки тому +1

    Woah the json method literally blew my mind.

  • @OmarBravotube
    @OmarBravotube 3 роки тому +1

    Couldn't have explained it simpler! Big ups dude!!

  • @4am4i
    @4am4i 3 роки тому +1

    Good video. The last trick in the video is called denormalization. Also as soon as you introduce sharding you need also add replication because the probability of failures increases.

  • @videosforthegoodlife2253
    @videosforthegoodlife2253 Рік тому +2

    I had to deal with tables with a few billion records per month, and MySql merge engine (merge tables) let us slice and dice them any way we needed. You can even have more specific indexes in the real tables themselves, as long as each merge index exists in all tables.
    The downside of Merge tables is that it multiplies the open file handles on the system, which can be tricky for a machine doing public networking, but with latest kernels, you can get kind of crazy.
    High memory makes a huge difference, of course.

  • @GurpreetSingh-pl8uz
    @GurpreetSingh-pl8uz 3 роки тому +2

    In json design, the followers list column can grow very large to millions and can cause speed issues there

  • @rashmidewangan8627
    @rashmidewangan8627 3 роки тому +1

    Fabulous content on your channel. Subscribed. Thanks !! Keep up the good work :)

  • @simonedeluca104
    @simonedeluca104 4 роки тому +1

    Hi Hussein, thank you for your awesome videos!!! They're really intresting!

    • @hnasr
      @hnasr  4 роки тому

      Simillo De Simo ❤️❤️ thanks 😊

  • @alichoobdar9030
    @alichoobdar9030 Рік тому +2

    I believe the last concept is actually called denormalization. Another option could be considering NoSQL.
    By the way, you are awesome

  • @sachinkalkur
    @sachinkalkur 4 роки тому

    top notch content, i just encountered similar problem at my work place. thanks.

  • @AbraKadabra-lr9nq
    @AbraKadabra-lr9nq 2 місяці тому

    great work there Hussein. could you please shed some light on database migration strategies. Thanks.

  • @dinakaranonline
    @dinakaranonline 4 роки тому +3

    Thanks for the video . I did a first time UA-cam 'applaud' feature with a small token of 100 Rupees . Hope you recieved it . Keep going 🙏🙏🙏

    • @hnasr
      @hnasr  4 роки тому

      Dinakaran S thank you so much Dinakaran for the support!

  • @p4s153
    @p4s153 3 роки тому

    Apart from great contents you inspires me to love and play with technology....you are awsm brother..

  • @salehdanbous6783
    @salehdanbous6783 2 роки тому +4

    Sometimes you cannot redesign the table or escape getting at billion rows tables. For example, the journal entry for real-time payment system gets very large in very few months. Moreover, you cannot resort to NoSQL databases since these data are ACID in nature in which eventual consistency is not applicable. I guess you will end up using all the aforementioned techniques like: Indexing (Obvious One), Partitioning and Sharding. One problem with Partitioning though is that most ORMs are not Partition aware. On the other hand, you have to be careful of new distributed and horizontally scalable databases like Cockroach DB, YugaByte,etc,. Because there is always a catch.

    • @vilkazz
      @vilkazz Рік тому

      I did have a field day implementing partitioning on top of JPA and Postgres a few years back. Works like a charm, but requires a lot of low level fiddling to get it to work

  • @brucewayne2480
    @brucewayne2480 4 роки тому +3

    In case of a twitter like app, I will go certainly go for a graph database like Neo4j and use another database to store indexes. Each user is a node and each followingX relation is an edge, It will limit the join operations

    • @hnasr
      @hnasr  4 роки тому +2

      Bruce Wayne fantastic thoughts, Graph databases are designed for this use cases so they will be optimized for this.. that will be another way to avoid that large table .. I don’t know how graph databases work though so not sure of their scale and how do they store things

    • @brucewayne2480
      @brucewayne2480 4 роки тому

      @@hnasr If I'm not wrong , they store the addresses in one file and the data in other ones , so if you find a node using indexes , you will find next to it the physicial address of each edge related to it , it's just a O(1) read operation, and so navigate in the graph by just simple read operations instead of complex joins.
      And this is an article introducing sharding for neo4j
      neo4j.com/whats-new-in-neo4j/scalability/
      Thanks for your videos , I'm learning a lot of new things each day

    • @vinny142
      @vinny142 4 роки тому +2

      Does the twitter example lend itself to Graph? The twitter database is basically three tables with single-field relations.The joins in the queries don't go deeper than one level. Graph is excellent for large...well graphs,where nodes are linked to nodes who are linked to nodes etc etc , but twitter doesn't d that.(unless you perhaps want to query all the users who follow all users who follow all users who follow allusers who follow Obama,but that's easily solved using recursion in SQLand really doesn't require graph.

    • @MartinsTalbergs
      @MartinsTalbergs 4 роки тому

      DGraph has nicer query syntax than Neo4j imo

  • @sany2k8
    @sany2k8 4 роки тому

    Awesome as always, hands on examples of sharding, partitioning and indexing will double awesome bro.

    • @hnasr
      @hnasr  4 роки тому +1

      Glad it helped Thank you so much!

  • @mennahamedo8204
    @mennahamedo8204 4 роки тому

    Thank you for your amazing content, I am really happy that I found your channel

    • @hnasr
      @hnasr  4 роки тому

      Thanks Menna 🙏😊

  • @Makersdiary小匠记
    @Makersdiary小匠记 3 роки тому

    Truly a hidden gem

  • @ghassanaljaafri9368
    @ghassanaljaafri9368 11 місяців тому

    Thank you for the content it's helpful,
    I think the problem with your design is that when you try to discover all the profiles you follow, you will end up in trouble.

  • @decipheringthematrix56
    @decipheringthematrix56 9 місяців тому

    Sounds like describing Cassandra and MongoDB =D. Cheers

  • @AchwaqKhalid
    @AchwaqKhalid 4 роки тому +4

    GPU based query is the way to go 💥

    • @minscj
      @minscj 3 роки тому

      how is that done?

  • @lucasbastos8019
    @lucasbastos8019 4 роки тому +1

    Awesome content, thanks!!!

  • @EddyCaffrey
    @EddyCaffrey 4 роки тому

    Great topic. Very helpful 👍🏾. Truthfully, I always go with the last option 😅 because most of my projects will never have a billion rows. Now I have others options too. Thanks.

    • @hnasr
      @hnasr  4 роки тому

      Glad it was helpful Eddy! thanks

  • @natem889
    @natem889 3 роки тому

    I liked all your ideas. Although I was waiting if you could talk about data archival and purging. Something I tried was on archival of data to another database and purging it from the original database which may not be accessed frequently.

  • @sarveshsawant7232
    @sarveshsawant7232 10 місяців тому

    Great video,question onf twitter design: what happens when someone unfllows you?

  • @javaprogramming7235
    @javaprogramming7235 8 місяців тому

    I am planning to make steak of daily watching your video(s). It's really amazing content 😁✌️ Great work. Let's see how much consistent I be in this activity 🎉
    #🔥 (1)

  • @ryanisthewind
    @ryanisthewind 10 місяців тому

    wait saving data like that OK?
    I had the same mindset but I thought It was wrong because no one did that around me.
    thank you, amazing talk!

  • @IdreesDargahwala47
    @IdreesDargahwala47 4 роки тому +3

    Thanks for the video man! Loved it!
    In the redesigning the DB part of your video, taking the Profile example, where you have a column or a key which stores the list of followers, my doubt is will that not exceed the maximum limit of the row or a document and make it extremely heavy while retrieving one follower ? I think it's an open-ended discussion and totally depends on the use case.

  • @flygonfiasco9751
    @flygonfiasco9751 3 роки тому +1

    Kinda don’t like the idea of having JSON tbh because you can’t enforce foreign keys and you, in this example, couldn’t get a list of your followers.
    That said, love your content. Thanks for the videos!

  • @TheRizwanSharp
    @TheRizwanSharp 2 роки тому

    Great video Hussein. Just wanted to share my thoughts as I have first hand experience with a table container over a half billion rows.
    I don’t believe, a billion row table is that big of a deal for a modern RDBMS if the schema is well designed and proper indexes are in place (to support the common query patterns).
    I have a database with over half billion rows which works flawless on a medium sized hardware. (Azure Sql database)
    Indexes go a long way, then partition and then shard as you mentioned.
    Also before making things complicated by adding shards etc, you can always ask yourself “is there data in the table that I can offload/archive/delete to some other table?” The answer in most cases would be yes. This should save your from a lot of additional complexity.

    • @user-bk5xo1gj7k
      @user-bk5xo1gj7k Рік тому +1

      Rizwan, might i ask what database technology you are using?
      I have been working on a similar table (well, multiple tables with around half a billion rows) and i have faced the performance issues. I have tried it on postgres as well as mysql. Indexes definitely make things better, however, in case of searches, even they are not enough. iterating the table sometimes took me days. I didn't design the structure, but have to maintain it. The only way left was sharding for me. If you hhave any other tricks and tips, please share!

    • @TheRizwanSharp
      @TheRizwanSharp Рік тому +1

      @@user-bk5xo1gj7k Hello, I am using Azure SQL Database but I would think others (MySql, Postgresql etc) perform very similar.
      For instance, here is the table I had that has:
      213,461,562 Rows (almost a quarter million)
      I ran a query on it by filtering on a column that is indexed (datetime) and selected top 100 rows and here are the results:
      SQL Server parse and compile time:
      CPU time = 0 ms, elapsed time = 0 ms.
      (100 rows affected)
      SQL Server Execution Times:
      CPU time = 0 ms, elapsed time = 0 ms.
      Completion time: 2023-01-24T08:34:59.6096640-06:00
      As you can see, it was barely anything for the server.
      A few things to look at:
      1) Is your table indexed for the query you are running against it?
      2) How much data is part of the query result?
      This is why my original comment stated that you barely need sharding (partition should come before sharding) until maybe you get into multi billions and space on a single server becomes a problem). Sharding is no easy topic when it comes to properly implementing it.
      Here is something that may be of help in your optimization scenario:
      use-the-index-luke.com/
      Best of luck and let me know if you have any questions.

    • @user-bk5xo1gj7k
      @user-bk5xo1gj7k Рік тому +1

      @@TheRizwanSharp thanks a bunch for reply!

  • @maikalkumar4382
    @maikalkumar4382 4 роки тому

    Shard - partition - index - parallel processing

  • @positivity1016
    @positivity1016 4 роки тому +1

    Great content 👍👌

  • @_SoundByte_
    @_SoundByte_ 4 роки тому +1

    Can you please talk about different databases and their pros and cons. Lsm, btree, column oriented, rdbms etc

    • @hnasr
      @hnasr  4 роки тому

      swapnil kale discussed this some what in my database engine course
      Database Engines Crash Course (MyISAM, Aria, InnoDB, XtraDB, LevelDB & RocksDB)
      ua-cam.com/video/K9Qd3UMHUQ4/v-deo.html

  • @serhiihorun6298
    @serhiihorun6298 3 роки тому

    You are amazing man!

  • @mayankpandey8271
    @mayankpandey8271 3 роки тому

    One thing you could add id vertical partition. Otherwise, top content.

  • @CrzyGazara
    @CrzyGazara 4 роки тому +10

    Best Practices Working with Billion-row Tables in Databases
    :
    Don't Work with Billion-row Table :)

  • @JoaoPinela
    @JoaoPinela 3 роки тому

    Very Nice content 🥇
    I would just ask: isn't sharding just a type of partitioning?
    Sharding = multi-host-partitioning?
    Keep up the good work.
    Very interesting 👏👏

  • @shashibhushan3289
    @shashibhushan3289 4 роки тому +1

    your content is dope bro

    • @hnasr
      @hnasr  4 роки тому

      Shashi Bhushan your dope thanks Shashi ❤️

    • @happyandinformedlife1212
      @happyandinformedlife1212 4 роки тому

      @@hnasr Can you make a similar one for API design?

  • @shubhamchopra5518
    @shubhamchopra5518 3 роки тому

    Hi Hussein, Love your videos. Can you please create a video on creating index on production tables without having a lock on the table.Maybe you can talk about Online DDL, percona , memory consumption and write latency impact.

    • @hnasr
      @hnasr  3 роки тому

      I actually discussed this on my indexing video. You use create index concurrently on postgres ua-cam.com/video/-qNSXK7s7_w/v-deo.html

  • @tobiowolawi5044
    @tobiowolawi5044 4 роки тому

    I'm currently having this problem right now with my current company. another way could be just archiving old data if they are not often accessed and creating a mechanism to load up the archive data whenever needed. for example transaction tables

    • @hnasr
      @hnasr  4 роки тому +1

      Correct archiving is another remedy for this Tobi agree.

  • @RealEstateAPBRDev
    @RealEstateAPBRDev Рік тому

    Will your last solution work for the user with millions followers ? I can not wrap my head around. That's a huge amount of data for a single document/column

  • @sureshchaudhari4465
    @sureshchaudhari4465 10 місяців тому

    if we partition data how do we run sort order select query efficiently since it will hit multiple partition

  • @woltron4o
    @woltron4o 3 роки тому

    Nice video!

  • @auroshisray9140
    @auroshisray9140 3 роки тому

    Effective content

  • @rabbyhossain6150
    @rabbyhossain6150 Рік тому

    What about writing to a big postgres table. let's say, we have to do 1000 writes per second on a particular table.

  • @gmggx8944
    @gmggx8944 3 роки тому +1

    can NoSQL store Billions of datas... ??? or
    use SQL tables stuff... is good ?
    or is their NoSQL + SQL DB like the row json

  • @treyheaney9483
    @treyheaney9483 3 роки тому

    Is there a method of grouping the data by vector similarity and when querying, the subset that is queried is chosen by a vectorized term in the query?

  • @tmanley1985
    @tmanley1985 4 роки тому

    I'd like to see an example of how to handle sql joins when using shards. I'm guessing since you can't do joins using sharding, you'll have to pull everything in on the application side and glue everything together.

    • @hnasr
      @hnasr  4 роки тому +1

      correct that is why I try to avoid sharding when possible

    • @tmanley1985
      @tmanley1985 4 роки тому

      @@hnasr I guess that's a nightmare for reporting too.

  • @tanzimchowdhury320
    @tanzimchowdhury320 Рік тому

    I would like to ask you just one thing, how did you gather so much knowledge? As a software engineer myself, I can assure you this isn't something you get from work experience. And I'm guessing it was books? If so please do share your recommended reading list

  • @krramesh5
    @krramesh5 3 роки тому

    Will this also help in reducing the query time for JOIN? Can someone explain how join will work?

  • @nathanbenton2051
    @nathanbenton2051 4 роки тому

    Wow great video. What about using both Spark+Hadoop in conjunction with the index-partitioning-sharding approach. Would such a strategy even makes sense?

    • @hnasr
      @hnasr  4 роки тому

      Nathan Benton Works but so much infrastructure to support so the question becomes : is it worth it?

  • @toxiclife
    @toxiclife Рік тому

    what to do when I want to overwrite 100 millions of rows into new table, in minutes?
    df.write.mode("overwrite").saveAsTable("FINAL"), if you could please help with this?

  • @rashadarbab2769
    @rashadarbab2769 3 роки тому

    what about a hashmap type algo where you can be guided to where you need to go.

  • @na_you_mess_am
    @na_you_mess_am 3 роки тому

    This is a typical job interview question

  • @enlgn7050
    @enlgn7050 3 роки тому

    But....is there not a problem with consistency in case of sharding?

  • @mmu200
    @mmu200 3 роки тому

    What is your thoughts on searching a varchar(1000) column in a billion rows table. Note , only one column and keyword search. Like %xyz% . I know like query will be supper slow. So what will be the alternates. How to solve

  • @112BALAGE112
    @112BALAGE112 4 роки тому

    Hussein has done a fenomenal job of impersonating Nigel Thornberry in the thumbnail picture.

  • @echoky3189
    @echoky3189 3 роки тому

    What if someone writes a bot that just do follow and unfollows? Will that overload the database?

    • @hnasr
      @hnasr  3 роки тому

      It will of course, that is why you shouldn’t make an API that will easily allow anyone to do so. Or charge them so much to call that api

  • @pavankumardittakavi5325
    @pavankumardittakavi5325 3 роки тому

    Hussain, how does a day in your life as an Engineer span out?. How do you manage to get time for your youtube videos?.

  • @muhammadkhan-qd-c5996
    @muhammadkhan-qd-c5996 3 роки тому

    awsome

  • @Flankymanga
    @Flankymanga 4 роки тому

    Regarding sharding: Isn't it better to put the database instances into a cluster and let the master take care of sharding so that the client will not need to know which db instance to query - it will only need to know the master? Im not a DB guru but i always thought that this is how it works???

    • @hnasr
      @hnasr  4 роки тому

      Correct, Some database vendors provide such feature yes but not every DB has this feature and for the longest time the engineer has to built this themself. I believe Vitess solves this for mySQL. Cassandra also has consistent hashing which distribute the data across shards..

    • @vinny142
      @vinny142 4 роки тому

      @@hnasr PostgreSQL can do it with Foreign-Data-Wrappers; essentially you create a partition for a table and stick it on a different node through FWD.
      www.percona.com/blog/2019/05/24/an-overview-of-sharding-in-postgresql-and-how-it-relates-to-mongodbs/
      If you route all traffic for all shards through the root node that means you can still only go as fast as the root node, you're not leveraging the full capacity of each shard. Look at geaographical sharding where each shard holds the data that is used for a region, completely bypassing the 'root' node.

  • @andreigatej6704
    @andreigatej6704 4 роки тому

    Thanks for the great video. I'm learning a lot from this channel. Could a junction table be used in this case as well? For example: id(primary key), followed_id(foreign key), follower_id(foreign key) ...; this would mean a single added row each time the 'follow' button is pressed and, if needed, an inner join to see the relevant details of the followers

    • @user-bk5xo1gj7k
      @user-bk5xo1gj7k Рік тому +1

      no i don't think that'd work. see, the issue is the number of rows in the end. junction table would be larger than the 2 tables that table is a junction between. the junction table could grow to multiple billions of rows and we would end up with massive table.
      (hope i understood your question correctly)

    • @andreigatej6704
      @andreigatej6704 Рік тому

      @@user-bk5xo1gj7k yes, that's right. Thanks! I'm wondering now whether there are any other approaches to this problem, apart from those already mentioned. Maybe as far as the DB schema is concerned.

  • @santoshbhatnagar2155
    @santoshbhatnagar2155 Рік тому

    How can we efficiently search in the JSON data? can someone share some tips on this?

  • @sojuthomas7727
    @sojuthomas7727 3 роки тому

    HI, Thanks for the video.This gives me an insights of how to design a very large database. But can we do this partitioning and sharding on a relational database like MYSQL for example? if we do so I believe we can't leverage the full benefits of MYSQL IMO ,a data base with lots of partitioned table how JOINS will work then?

    • @videosforthegoodlife2253
      @videosforthegoodlife2253 Рік тому

      MySql has something even cooler - Merge Tables. Doesn't support putting tables on multiple drives, but there are a few ways to deal with that.

  • @sohahm
    @sohahm 3 роки тому

    Some people like me don't design the table in advance rather they look for a solution once there are 2 billion rows already.

  • @claushellsing
    @claushellsing 3 роки тому

    Can't we use something like a Time Series DB, Elastic or Postgress or Cassandra, they are famous for being able handle heavy loads of data. Btw how the Bitcoin DB works ? There are millions of transactions

  • @nixietubes
    @nixietubes 2 роки тому +1

    I keep hearing sharting

  • @rahulmasurkar6971
    @rahulmasurkar6971 3 роки тому

    What if there are millions of followers?? Would that create an issue for that column??

  • @m.m.4589
    @m.m.4589 4 роки тому

    With nosql no problems, just an JSON array of IDs

    • @hnasr
      @hnasr  4 роки тому

      hmmmm I think there must be a limit so we need to think scalability

  • @benmoskovits5816
    @benmoskovits5816 Рік тому +2

    You had great ideas until the 10 minute mark. What happens a follower of 3 or n people leaves or changes their name. How will you find the follower unless you scan every single user? Stick to a real relational design and this will never be an issue.

    • @avinashthakur80
      @avinashthakur80 6 місяців тому

      Can you give any reference to what design are you talking about ?
      What is the real relational version of it ?

  • @TechPrinc
    @TechPrinc 4 роки тому

    Help!! i am user of database so indexing is not granted to me and all the rows except id are of datatype text so i can't make partition with those. partitioning with Id is not an option because data stored randomly. i am junior developer so sharding is not option again. my requirement is to search user based on parameter like Country or job title or company or all of them. data is huge almost 1B+. so please give idea what i can do ?

    • @hnasr
      @hnasr  4 роки тому

      I don't think you can do much if, any backend dev/DBA can help you with your queries? you need to be able to have the proper indexes on the table to get better performance

  • @kamillatosinski3054
    @kamillatosinski3054 3 роки тому

    But why are billion-row tables slow? Why having nice indexes is enough for a couple million rows, but it’s not enough for a > 1 billion rows?

    • @hnasr
      @hnasr  3 роки тому +1

      Because the index grows as well and it eventually becomes slow to traverse it.

  • @a.yashwanth
    @a.yashwanth 4 роки тому +1

    putting followers in one column violates 1NF i.e., atomic values.

    • @hnasr
      @hnasr  4 роки тому +1

      assuming there is a field type called "list" than it shouldn't

  • @nagrajankola1323
    @nagrajankola1323 11 місяців тому

    ❤❤

  • @brazenbull36
    @brazenbull36 3 роки тому

    No one talks about scaling publicly like you do, and you do it for free.

  • @insearchof9090
    @insearchof9090 4 місяці тому

    you missed: one more such video.

  • @rahulgoyal3780
    @rahulgoyal3780 4 роки тому

    your content is great but @10:20 u created list of users(mongodb/json) and that can grow infinite but also there is a limit of mongodb that 1 document cant exceed 16mb size.

    • @hnasr
      @hnasr  4 роки тому

      Rahul Goyal ohhh I didn’t know Mongo has that limit.. I wouldn’t say infinite though, Tyler swift had 50 million followers which is still manageable but still you are right .. someone mentioned graph database which can also be useful
      Thanks for sharing thoughts

    • @Iamhomkar
      @Iamhomkar 4 роки тому +1

      We could maintain a separate table, where each record being (user,list of follower Ids) which is queried only when required. You could break the total number of followers into multiple records. Something like paging from OS. Maybe userid+record_number as your primary key?

    • @rahulgoyal3780
      @rahulgoyal3780 4 роки тому

      Yes we can..@omkar

    • @vinny142
      @vinny142 4 роки тому

      @@Iamhomkar Technicaly yes, but if the database cannot hold the amount of data you need it to hold then you should move to a different database that can.

  • @alooooooola
    @alooooooola 3 роки тому

    So I can compress the whole video content into a single quote: "Move to no sql database" xD

  • @0xfaizan
    @0xfaizan 3 роки тому

    Try to make smaller videos

  • @mikinyaa
    @mikinyaa 3 роки тому

  • @auseralastname8466
    @auseralastname8466 4 роки тому

    𝘨𝘳𝘦𝘢𝘵 𝘪𝘥𝘦𝘢 𝘵𝘰 𝘴𝘦𝘵 𝘢 𝘵𝘦𝘹𝘵 𝘤𝘰𝘭𝘶𝘮𝘯 𝘸𝘪𝘵𝘩 𝘢 .𝘑𝘚𝘖𝘕 .
    𝕋𝕐𝕍𝕄 𝟜𝕥𝕙𝕚𝕤 𝕜𝕚𝕔𝕜𝕒𝕤𝕤 𝕔𝕠𝕟𝕥𝕖𝕟𝕥 𝕊𝕚𝕣.

  • @shei69
    @shei69 4 роки тому

    Can you share your views on fauna db ?

  • @rajendiranvenkat5992
    @rajendiranvenkat5992 4 роки тому

    I've got many solution on database side from your video. Thanks for your support.