How to Check if a User Exists Among Billions! - 4 MUST Know Strategies

  • Published 21 Nov 2024

COMMENTS • 215

  • @codeillustrated
    @codeillustrated 2 months ago +109

    It's worth clarifying that with proper sharding, indexing and powerful enough machines (which a company with a billion users can afford), a single DB lookup can complete in under a millisecond. So one lookup on its own is not expensive. The actual reason a better solution like an in-memory cache is necessary is that the number of simultaneous lookups (i.e. the number of users trying to sign up) is huge in such a scenario, which makes even a sub-millisecond lookup per query infeasible.

    • @Cassp0nk
      @Cassp0nk 2 months ago +2

      Databases have in-memory caches.

    • @rishisoni3386
      @rishisoni3386 2 months ago +5

      I was thinking the same: a binary search needs only about 50 steps (log2 of 10^15) to search a database of a quadrillion rows, so what is the need for these techniques? But simultaneous lookups are a valid reason to use them.
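
As a quick check of the "about 50" figure in the comment above, here is a minimal sketch. The function name `max_comparisons` is my own, not something from the video. It relies on the fact that binary search over n sorted keys needs at most floor(log2 n) + 1 probes.

```python
def max_comparisons(n_keys: int) -> int:
    """Worst-case probes for binary search over n_keys sorted keys.

    For n >= 1 this is floor(log2(n)) + 1, which equals n.bit_length().
    """
    if n_keys < 1:
        raise ValueError("n_keys must be at least 1")
    return n_keys.bit_length()


print(max_comparisons(10**9))   # one billion keys → 30
print(max_comparisons(10**15))  # one quadrillion keys → 50
```

The step count grows very slowly with data size, which is why the thread points at concurrency rather than single-lookup cost as the real bottleneck.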

    • @tarunpahuja3443
      @tarunpahuja3443 2 months ago

      @Cassp0nk What about network delay?

    • @ayeameen
      @ayeameen 1 month ago

      What will be your shard key? We are trying to find if any user exists with the email address.

    • @antnauman
      @antnauman 1 month ago

      @ayeameen We shard a DB, not a table. So sharding here would make sense if the DB has only one table or only user-related tables. Anyway, I think if we keep it simple and just grow the data, we are better off using partitioning. It creates virtual sub-tables of a table, each with its own B-tree for indexing. We could also partition by starting letter, maybe. Even with this model we will have to search among tens of millions of rows instead of a billion.

  • @adi36908
    @adi36908 1 month ago +10

    Compared with the many OOP-concept videos on YouTube tech channels, I found this very useful, as she explains a real-world use case. Thank you.

  • @universeguide1996
    @universeguide1996 1 month ago +12

    - Unique indexing or hashing: Standard and most effective for quick lookups.
    - Sharding: Ideal for distributed systems and extremely large datasets.
    - Bloom filters: Fast in-memory probabilistic checks to avoid unnecessary database lookups.
    - In-memory caching: Extremely fast for frequently queried user data.
    - Partitioning: Optimizes database lookups by reducing the size of search spaces.
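
To make the Bloom filter item in the list above concrete, here is a minimal sketch, not the implementation from the video. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the choice of deriving positions from one SHA-256 digest via double hashing are all my own assumptions.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, but false positives are possible."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter(num_bits=10_000, num_hashes=7)
seen.add("alice@example.com")
print(seen.might_contain("alice@example.com"))  # → True (added items are always found)
```

A `False` answer lets the signup path skip the database entirely; a `True` answer still needs confirmation from the cache or database.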

  • @anothermouth7077
    @anothermouth7077 2 months ago +21

    This was brilliant. As beginners, people do not think of advanced techniques, which is fine, but after a while in the industry you do need to look at the advanced techniques used by the top players.

    • @TechCareerBytes
      @TechCareerBytes 2 months ago +3

      Agreed! Let me know what you think about my latest video, on HLL usage in Redis.

  • @2bitsbyab
    @2bitsbyab 25 days ago +4

    The video itself is great but the comments are gold. Learnt so much.

    • @TechCareerBytes
      @TechCareerBytes 25 days ago

      Glad to hear it! Please check our other videos too 🙏

  • @ashwanigupta9991
    @ashwanigupta9991 2 months ago +56

    Please upload more videos of this type, where you teach how tech giants optimize their APIs. One of the best videos on YouTube, please keep it up.

    • @TechCareerBytes
      @TechCareerBytes 2 months ago +5

      Sure, pls share it in your circle too and support this channel. 🙏

  • @pushpendratiwari7800
    @pushpendratiwari7800 2 months ago +9

    This is the first time I have seen one of your videos. I'm saying this on behalf of all viewers: you are an amazing teacher with brilliant communication, visualization and presentation skills.
    Subscribed🎉

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Wow, thank you! Means a lot to me! Please share it in your circle 🙏

    • @AryaArsh
      @AryaArsh 2 months ago

      Agree.

  • @phoneix24886
    @phoneix24886 1 month ago +1

    Even before watching this video I can give my thoughts on it and what I use on a daily basis.
    1. Hash partitioning - you generate a hash of your data (maybe in the range 1 to 100000) and then generate a secondary hash in a smaller range, say 1 to 1000. Create a composite index on both. Generate the hashes before querying and you can look up the user efficiently, because the data set to search is reduced.
    2. Sharding is a good way to augment the above strategy, for example by using the hash of the hash as the shard key.
    3. On top of that, a persistent cache like Redis can be very useful.
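
The two-level hashing idea in item 1 above could be sketched as follows. This is only an illustration under my own assumptions: the names `bucket_key` and `BucketedUserIndex` are hypothetical, both hashes are taken from different bytes of one SHA-256 digest, and an in-memory dict stands in for the composite database index the comment describes.

```python
import hashlib
from collections import defaultdict

PRIMARY_BUCKETS = 100_000   # primary hash range suggested in the comment
SECONDARY_BUCKETS = 1_000   # secondary hash range suggested in the comment


def bucket_key(email: str) -> tuple[int, int]:
    """Map an email to a (primary, secondary) bucket pair, both 1-based."""
    digest = hashlib.sha256(email.strip().lower().encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % PRIMARY_BUCKETS + 1
    secondary = int.from_bytes(digest[8:16], "big") % SECONDARY_BUCKETS + 1
    return primary, secondary


class BucketedUserIndex:
    """In-memory stand-in for a table with a composite index on both hashes."""

    def __init__(self) -> None:
        self._buckets = defaultdict(set)

    def add(self, email: str) -> None:
        self._buckets[bucket_key(email)].add(email.strip().lower())

    def exists(self, email: str) -> bool:
        # Only the one small bucket for this key is searched.
        return email.strip().lower() in self._buckets.get(bucket_key(email), set())


index = BucketedUserIndex()
index.add("Alice@Example.com")
print(index.exists("alice@example.com"))  # → True
print(index.exists("bob@example.com"))    # → False
```

Because the hashes are computed before querying, the lookup touches one bucket instead of the whole table, which is the reduction the comment describes.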

  • @UChmn
    @UChmn 2 months ago +22

    Thanks, YouTube, for recommending this.

  • @andytube07
    @andytube07 1 day ago +1

    Thank you Mam, you can very well name your channel as "Tech Goldmine"!

  • @sidhanshuraghuvanshi1
    @sidhanshuraghuvanshi1 2 months ago +5

    I will surely recommend your channel to my team at the office as well. Thanks a lot for this type of video.

  • @YoungGrizzly
    @YoungGrizzly 1 month ago +1

    My first thought is to optimize your query to cut down your dataset. I think that might be after/within step 3. Great video, it’s awesome to see what bigger companies do.

  • @aniketsharma3154
    @aniketsharma3154 2 months ago +2

    Wow, Ma'am. Not many people will watch your content because it is very niche, but please, please, please do not stop sharing the knowledge you hold. I am amazed by your knowledge, which I know is the result of many years of hard work in the industry. Thank you very much for making this content and sharing your knowledge for free.

  • @universeguide1996
    @universeguide1996 1 month ago +2

    - Use a hash-based index to map user information (e.g. an in-game username) to hash values. Hash tables have an average time complexity of O(1) for lookups.
    - Add a cache like Redis if the project's budget allows.

  • @chetananand4037
    @chetananand4037 1 month ago +1

    This bloom filter stuff is ingenious.

  • @damien9255
    @damien9255 2 months ago +7

    Wow. Please upload more of such system design videos

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Definitely. Please share it in your circle 🙏

  • @amanpreetsingh2696
    @amanpreetsingh2696 2 months ago +2

    Understood the concept. Keep sharing your precious knowledge with us.

  • @shashikantmarskole
    @shashikantmarskole 2 months ago +1

    Thank you for providing such a clear explanation with examples of production services. Great content, keep up the amazing work, Ma'am

  • @TheBikerDude2024
    @TheBikerDude2024 1 month ago +1

    Starting my day with good learning through this video and comments.
    Greatly explained. Just subscribed.

  • @YouTube-Joy-Happy
    @YouTube-Joy-Happy 1 month ago +2

    Even if you use an increment/decrement (counting) array instead of a bit array, it won't solve the false-positive problem. That depends heavily on the hash function, so that is what we should focus on more.
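
The point above can be illustrated with a small counting Bloom filter sketch. This is my own illustrative code, not from the video: the class name `CountingBloomFilter` and its parameters are assumptions, and the deliberately tiny one-counter filter at the end exists only to force a collision.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with counters instead of bits, so items can be removed."""

    def __init__(self, num_counters: int, num_hashes: int) -> None:
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item: str) -> list[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % len(self.counters)
                for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item: str) -> None:
        # Only call this for items that were previously added.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def might_contain(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))


# With a single counter every item maps to the same slot, forcing a collision.
tiny = CountingBloomFilter(num_counters=1, num_hashes=1)
tiny.add("alice@example.com")
print(tiny.might_contain("bob@example.com"))  # → True (a false positive)
tiny.remove("alice@example.com")
print(tiny.might_contain("bob@example.com"))  # → False
```

Counters make deletion possible, but collisions still produce false positives; only more counters or better-spread hash positions reduce their rate.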

  • @overunityinventor
    @overunityinventor 2 months ago +11

    Have you ever looked for a word in the dictionary?
    A dictionary has hundreds of thousands of words, but you still take less than 30 seconds to find your word, let's say "luck".
    You go to the section for L,
    then you go to the section for U,
    then you go to the section for C,
    and then you go to the section for K,
    and then you find your word.
    This works only on a sorted database, and the database should be re-sorted periodically so that newly added, deleted and modified data stays in order and lookup time stays low.

    • @hydtechietalks3607
      @hydtechietalks3607 2 months ago

      you made all that Killer BIT process, simple man...Thank yoU!!

    • @anothermouth7077
      @anothermouth7077 2 months ago +5

      Isn't this database partitioning in a nutshell?

    • @jasonbourn29
      @jasonbourn29 2 months ago

      Thnxxx

    • @dom4068
      @dom4068 2 months ago

      Yes, that is why databases use an index: to speed up data lookup.
      What you explained is the process used to find data in an index-organized table.
      For other tables, if they are indexed and the DB decides the index is going to be useful, the same lookup process is used on the index, and the result is the location where the data is stored.
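
The dictionary analogy and the index lookup described in this thread can be sketched with a sorted list and binary search. This is a simplified stand-in, not how a real database index is implemented; the helper names `contains` and `add_key` are my own.

```python
import bisect


def contains(sorted_keys: list[str], key: str) -> bool:
    """Binary-search a sorted list, like flipping to the right dictionary page."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


def add_key(sorted_keys: list[str], key: str) -> None:
    """Insert at the correct position so the list never needs a full re-sort."""
    if not contains(sorted_keys, key):
        bisect.insort(sorted_keys, key)


words = ["lack", "luck", "lucky", "lump"]
add_key(words, "lucid")
print(words)                     # → ['lack', 'lucid', 'luck', 'lucky', 'lump']
print(contains(words, "luck"))   # → True
print(contains(words, "lunch"))  # → False
```

Keeping the structure ordered on every insert is the same idea an index relies on: it stays sorted as data changes instead of being re-sorted in batches.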

    • @puspendertanwar9378
      @puspendertanwar9378 1 month ago

      @anothermouth7077 Not partitioning. It's indexing.

  • @pradeep8841
    @pradeep8841 1 month ago +1

    Superb exploration!
    Keep posting such tutorials.

  • @peacelover2002
    @peacelover2002 2 months ago +1

    Explained in a very simple, plain and understandable way! Excellent explanation. Please keep more videos coming. Subscribed!

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Thanks, will do! Please share it in your circle 🙏

  • @RohithVishwanath
    @RohithVishwanath 2 months ago +2

    Thanks for the video. It surely expands the knowledge of engineering with all the conversation going on in the comments.

  • @rafysiddiky2463
    @rafysiddiky2463 1 month ago +1

    Learned a nice concept and strategy today. Thank you.

  • @schrodingerscat6189
    @schrodingerscat6189 2 months ago +1

    Never thought this video would be this informative!!!!

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Glad it was helpful! Please share it in your circle and support this channel 🙏

  • @Adiyen1974
    @Adiyen1974 2 months ago +1

    Excellent technical presentation. Very good

  • @kiran.i
    @kiran.i 12 days ago +1

    Excellent explanation thanks a lot..

    • @TechCareerBytes
      @TechCareerBytes 12 days ago

      Glad you liked it. Please check our other videos too 🙏

  • @shamboghosh7340
    @shamboghosh7340 1 month ago +1

    Thank you for the nice and detailed explanation. I have a question: what is the maximum length of the values that can be passed to the hash function? In this case, what would the maximum email length be? Is there any rule of thumb for that?

    • @TechCareerBytes
      @TechCareerBytes 1 month ago +1

      SHA-256 always generates a fixed 256-bit (32-byte) hash, regardless of input length, so it doesn’t limit the maximum length of an email. Email length limits are typically defined by standards or application constraints - usually up to 254 characters as per the RFC guidelines.
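
The fixed-size property mentioned in the reply above is easy to check with Python's standard `hashlib`. This is a small sketch; the helper name `email_digest` and the sample addresses are made up for illustration.

```python
import hashlib


def email_digest(email: str) -> bytes:
    """Return the raw SHA-256 digest of an email address."""
    return hashlib.sha256(email.encode("utf-8")).digest()


short_email = "a@b.co"
long_email = "x" * 242 + "@example.com"  # 254 characters in total

print(len(long_email))                 # → 254
print(len(email_digest(short_email)))  # → 32
print(len(email_digest(long_email)))   # → 32
```

Both digests are 32 bytes (256 bits), so the storage and comparison cost per hashed value does not depend on how long the email is.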

  • @bandhammanikanta
    @bandhammanikanta 1 month ago +1

    Good one. It could have been a YouTube Short. Good luck.

  • @uubaidullah
    @uubaidullah 2 months ago +1

    Thanks, YouTube algo. Very well explained video, subbed.

  • @antarikshverma8999
    @antarikshverma8999 2 months ago +1

    This is very informative. Thank you. Hope to see more videos like this.

  • @tarungrover9841
    @tarungrover9841 2 months ago +1

    Wow, I really loved it. I just watched it out of curiosity and learned a lot.

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Happy to hear that! Please share it in your circle 🙏

  • @ĀRYAN_GENE
    @ĀRYAN_GENE 2 months ago +4

    instantly subscribed 🙏
    .
    system design and concepts for optimal performance

  • @Arjunsingh-cf7nf
    @Arjunsingh-cf7nf 2 months ago +1

    Great, Please upload more videos on these concepts !!!!

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Thank you, I will. Please share it in your circle and support this channel 🙏

  • @Fen-i3n
    @Fen-i3n 2 months ago +2

    Straight to the point , instant sub :)

    • @TechCareerBytes
      @TechCareerBytes 2 months ago +1

      Thanks! Do check out our other videos 🙏

    • @Fen-i3n
      @Fen-i3n 2 months ago

      @@TechCareerBytes Ya

  • @aryankr
    @aryankr 2 months ago +2

    Thanks for a great explanation.

  • @josephkohilan6230
    @josephkohilan6230 1 month ago +1

    First time watching your videos and it was very informative. Thank you for your efforts and clear explanation.

  • @nikhilarya7712
    @nikhilarya7712 2 months ago +3

    What an explanation, subscribed. So basically we need Bloom filters only when we have data in the lakhs or crores, not in the hundreds or thousands, where caching can be used efficiently, right?
    Also, will this Bloom filter be used for signup only, or are there other scenarios where it is beneficial?

    • @TechCareerBytes
      @TechCareerBytes 2 months ago +2

      Yes, for the data size you mentioned, caching with database sharding and indexing would be good enough. Better to check with your architect.
      I have mentioned a few other scenarios in which companies and systems like Google, Facebook and HBase use Bloom filters. Please check.

    • @nikhilarya7712
      @nikhilarya7712 2 months ago

      @@TechCareerBytes ok, thanks, will check that.

  • @kishan.0296
    @kishan.0296 2 months ago +2

    What an insightful video, thank you for sharing such an amazing knowledge. Subscribed!

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Awesome, thank you! Please share in your circle 🙏

  • @abimbolaobadare6691
    @abimbolaobadare6691 2 months ago +1

    This was so insightful, thank you so much.

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Glad it was helpful! Please share it in your circle 🙏

  • @kumarsharwanofficial
    @kumarsharwanofficial 2 months ago +1

    Excellent, Ma'am, you have explained real-world scenarios. I am expecting more such videos. It would be even better if you created a Spring Boot application and implemented the scenarios you have explained. That would be so helpful. Thank you 🙏

  • @SpiritOfIndiaaa
    @SpiritOfIndiaaa 2 months ago +2

    Thanks Rupa mam

  • @Kc-nn8mn
    @Kc-nn8mn 2 months ago +7

    AI has two meanings.
    Artificial Indian and An Instructor.

  • @satyasaineelapala570
    @satyasaineelapala570 2 months ago +1

    Great work ma'am!!

  • @praveenkumar5419
    @praveenkumar5419 2 months ago +1

    Insightful video tutorial with very good real-world examples. Thank you, Ma'am. Keep sharing your knowledge and experience.

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Thanks for liking. Please share it in your circle 🙏

  • @omkargurme20
    @omkargurme20 2 months ago +1

    Very good and straight-to-the-point video. Also, how would one implement these methods in other languages?

  • @dgtemp
    @dgtemp 2 months ago +1

    I would say handling multiple users logging in at the same time is the only concern. Most of the time Redis will take care of this, and for a smaller application the application's in-memory database would be sufficient.
    Also, many people try to solve the problem at the application level while their own database does not have indexes. A well-designed table is far more efficient than hand-written hash functions.

  • @Dipj01
    @Dipj01 2 months ago +1

    Wow, I learned quite some new things from this! Thanks

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Glad it was helpful! Please share it in your circle 🙏

  • @TheKumarAshwin
    @TheKumarAshwin 1 month ago +1

    You have a new subscriber, ma'am. 🎉

  • @dom4068
    @dom4068 2 months ago +2

    To find a user in milliseconds, we need a combination of geolocation-based routing, caching and database sharding.
    Yes, if the objective is only to check whether the user exists or not, a Bloom filter may be the way to go.

  • @b21hirejayeshnanaji71
    @b21hirejayeshnanaji71 2 months ago +3

    The video was really helpful mam. Thank you for the video.

  • @QuintessentialDio
    @QuintessentialDio 2 months ago +1

    Thanks 4 the info, You've now got a new sub😁

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Thanks for the sub! Please check my other videos too 🙏

  • @dailydose7904
    @dailydose7904 2 months ago +1

    Explained so well!

  • @yasabhishek
    @yasabhishek 1 month ago +1

    very knowledgeable video, thanks mam.

  • @abhishekkumarxxx123
    @abhishekkumarxxx123 2 months ago +1

    Such concept in this short video, really appreciate it. ❤

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Glad you liked it!. Please share it in your circle 🙏

  • @ArjunKumar-zu2kl
    @ArjunKumar-zu2kl 2 months ago +1

    Nice tutorial, learnt something new today, thank you so much Mam...

    • @TechCareerBytes
      @TechCareerBytes 2 months ago +1

      Glad to hear that. Please share it in your circle too! 🙏

  • @prithvisingh40
    @prithvisingh40 1 month ago +1

    This was impressive

  • @shyamgurunath5876
    @shyamgurunath5876 2 months ago +2

    Sharding the database also helps in querying speed & performance!

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      It does, but it also adds complexity to querying. Database sharding definitely helps with scaling, as it distributes the load across multiple servers. However, even with sharding, a cache and Bloom filters add an extra layer of speed by reducing direct database queries, which is crucial for minimizing latency at massive scale.

  • @TabishAbbasiDev
    @TabishAbbasiDev 2 months ago +2

    Please post more videos like this.

  • @MeetKoriya-cu7bm
    @MeetKoriya-cu7bm 2 months ago +1

    Great video. I would suggest improving the quality of the screenshots and investing in a good-quality microphone.

  • @vishaldas3439
    @vishaldas3439 2 months ago +1

    Awesome explanation and informative video, but I have a question: what if we add a unique constraint on the email column itself? How would the DB behave then? Will it check all the entries, and would that be the same as querying all the records manually? Thank you.
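
One way to explore the question above is to try it on a small database. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a stand-in; the table layout and the helper `try_signup` are my own assumptions, and other database engines may plan queries differently. In SQLite, a UNIQUE constraint is backed by an index, so the duplicate check is an index search rather than a scan of every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))


def try_signup(email: str) -> bool:
    """Insert the email; the UNIQUE constraint rejects duplicates."""
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_signup("alice@example.com"))  # → False (already registered)
print(try_signup("bob@example.com"))    # → True

# Ask SQLite how it would look up an email; the last column describes the plan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT 1 FROM users WHERE email = ?",
    ("alice@example.com",),
).fetchall()
plan_detail = plan[0][-1]
print(plan_detail)  # mentions the automatically created index on email
```

The exact wording of the plan text varies between SQLite versions, so it is printed here rather than compared against a fixed string.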

  • @sudheerkumar-tp1mg
    @sudheerkumar-tp1mg 22 days ago +1

    Super Madam.

  • @abhish_mazumder
    @abhish_mazumder 2 months ago +1

    Great info 🙌🙌

  • @VishalJangid1
    @VishalJangid1 2 months ago +1

    Thank you 🙏 please upload video in 4K resolution if possible

  • @kiranpai8
    @kiranpai8 29 days ago +1

    Love your videos mam. Can you please make a video on sorted sets data structure?

  • @arai_19999
    @arai_19999 1 month ago +1

    Great. Useful.

  • @jayesh_15
    @jayesh_15 2 months ago +1

    Nice video 😊

  • @RajAmitSingh
    @RajAmitSingh 1 month ago +1

    amazing video thanks for sharing

  • @chinmoykarmoker2185
    @chinmoykarmoker2185 2 months ago +1

    Great lesson! keep it up ! Thanks! :)

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Thanks! Please share it in your circle and support this channel 🙏

  • @girishanker3796
    @girishanker3796 2 months ago +1

    What a way to explain💐📈

  • @richardharris202
    @richardharris202 1 month ago +1

    Helpful 🙏🏼

  • @supun_sandaruwan
    @supun_sandaruwan 2 months ago +1

    Wow, unique video, great content, nice explanation. Thank you so much, Madam. Please make more unique videos like this. Subscribed ♥

  • @srivijaykalki4279
    @srivijaykalki4279 2 months ago +1

    Just awesome 💯💯

  • @_krishnaIsHere
    @_krishnaIsHere 2 months ago +1

    I appreciate the amazing knowledge you shared, but please buy a good-quality mic. Your audio should be cleaner.

  • @abhaytiwari5991
    @abhaytiwari5991 2 months ago +1

    Keep it up ma'am 👏

  • @kannaiyand2707
    @kannaiyand2707 2 months ago +2

    When to use consistent hashing ? Please explain with real use cases. Thanks Ruba

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Sure. You can check my videos on data partition and data replication. They cover consistent hashing.

  • @nitindhemiwal9174
    @nitindhemiwal9174 2 months ago +1

    Really helpful

  • @Thierry4Teen
    @Thierry4Teen 2 months ago +2

    The Bloom filter is quite interesting. However, if we combine all three, won't we inherit the downsides of each? Imagine we use a Bloom filter for its low memory footprint and Redis for further validation: does Redis still need to store all these records? And how can we query the database for further validation with fast responses?

    • @TechCareerBytes
      @TechCareerBytes 2 months ago +3

      Great question! Combining Bloom filters, Redis, and database queries balances trade-offs. Bloom filters reduce unnecessary queries, while Redis stores frequently accessed data for faster lookups. Redis doesn't need to store all records, just recent or frequently used ones. For database queries, we rely on sharding and indexing to maintain speed, with Redis acting as a buffer to reduce load.
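
The layered flow described in the reply above (Bloom filter first, then cache, then database) could be sketched as follows. This is a simplified illustration under my own assumptions: `email_taken` is a hypothetical helper, a plain dict stands in for Redis, an exact set stands in for a real Bloom filter, and `query_database` is a stub that records how often it is called.

```python
from typing import Callable


def email_taken(
    email: str,
    bloom_might_contain: Callable[[str], bool],
    cache: dict[str, bool],
    query_database: Callable[[str], bool],
) -> bool:
    # Layer 1: a Bloom filter "no" is definitive, so stop here.
    if not bloom_might_contain(email):
        return False
    # Layer 2: the cache holds answers for recently checked emails only.
    if email in cache:
        return cache[email]
    # Layer 3: fall back to the (sharded, indexed) database and remember the answer.
    found = query_database(email)
    cache[email] = found
    return found


# Stubs for illustration only.
registered = {"alice@example.com"}
db_calls: list[str] = []


def query_database(email: str) -> bool:
    db_calls.append(email)
    return email in registered


def bloom_might_contain(email: str) -> bool:
    return email in registered  # a real Bloom filter could also say True wrongly


cache: dict[str, bool] = {}
print(email_taken("bob@example.com", bloom_might_contain, cache, query_database))    # → False
print(email_taken("alice@example.com", bloom_might_contain, cache, query_database))  # → True
print(email_taken("alice@example.com", bloom_might_contain, cache, query_database))  # → True
print(len(db_calls))  # → 1 (only the first lookup for alice reached the database)
```

The cache only ever holds entries that were actually looked up, which matches the point that Redis does not need to store every record.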

  • @chandansahoo2925
    @chandansahoo2925 2 months ago +1

    Incredible !!!! spot on

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Thank you! Pls share it in your circle 🙏

  • @younesessaadani9303
    @younesessaadani9303 2 months ago +1

    The video is great, but I wish the quality of the images provided as examples were better.

  • @DevAdityaGupta
    @DevAdityaGupta 2 months ago +1

    Super Awesome video, make more like it.

    • @TechCareerBytes
      @TechCareerBytes 2 months ago +1

      I will try my best. Thanks. Please share it in your circle 🙏

  • @diveshrajdhar
    @diveshrajdhar 2 months ago +1

    great video mam....Thanks

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Thanks! Please share it in your circle 🙏

  • @ravikirankalal
    @ravikirankalal 2 months ago +1

    Thank you

  • @dheebanm3207
    @dheebanm3207 2 months ago

    Extraordinary mam.

    • @TechCareerBytes
      @TechCareerBytes 2 months ago +1

      Thanks a lot 🙏 please share it in your circle.

  • @pransukh8250
    @pransukh8250 1 month ago +1

    Finally some insights.

  • @MrXperx
    @MrXperx 23 days ago +1

    Most startups (99%) don't have a billion customers. Those that do have already implemented a one-time custom solution to this problem. I don't understand the reason for such interview questions. Just do a DB query on an index.

  • @DR-qz6ti
    @DR-qz6ti 2 months ago +1

    Thank you so much for the video, Ma'am. Can you please provide the link to the code? It's not clearly visible in the video.

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Yes, sure. Please check the description box for the link. Don't forget to share the video in your circle 🙏

  • @sharma-cartoon-channel
    @sharma-cartoon-channel 2 months ago +1

    Great Video

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Thanks! Please share it in your circle and support this channel 🙏

  • @fatakful
    @fatakful 2 months ago +1

    Nice content Mam.

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Thanks a lot. Please share it in your circle and support this channel. 🙏

  • @anishbishnoi29xD
    @anishbishnoi29xD 2 months ago +1

    ❤ Nice. Please make a video on schema migration and database migration.

  • @charuwaka1
    @charuwaka1 25 days ago +1

    That's where Cassandra and HBase come in.

  • @a_l_o_k_1991
    @a_l_o_k_1991 1 month ago +1

    Please tell about sharding and other concepts

    • @TechCareerBytes
      @TechCareerBytes 1 month ago

      Please check this video - ua-cam.com/video/EoHh1NMeUJM/v-deo.html

  • @MeetKoriya-cu7bm
    @MeetKoriya-cu7bm 2 months ago +4

    I don't understand how caching will help with this particular problem. If I want to check whether an email is already in use, hardly anyone else will check the same email soon afterwards (before the cache entry expires).

    • @TechCareerBytes
      @TechCareerBytes 2 months ago +2

      Good point! Caching helps mainly with frequently checked or popular usernames/emails. For unique queries, cache hits are rare, but it still reduces load on the database for cases where multiple users may check the same email (e.g., typos or common names). Caching shines more in scenarios with repeated access patterns, but other techniques like Bloom filters handle the uniqueness aspect efficiently.

    • @sangram6848
      @sangram6848 2 months ago

      I'd assume there would be a lot of queries checking the most common emails, like
      Max@mail
      John@mail etc.
      Of course it is not effective for very specific email IDs, and that's why I agree with the solution that combines multiple approaches.
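
The observation in this thread, that a cache mostly pays off for popular names, is what a bounded least-recently-used (LRU) cache captures. The sketch below is my own illustration, not the setup from the video: the class `LRUCache` is hypothetical and an in-process `OrderedDict` stands in for a cache server such as Redis.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Keeps only the most recently checked emails, evicting the oldest."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, email: str) -> Optional[bool]:
        if email not in self._entries:
            return None  # cache miss: caller must ask the database
        self._entries.move_to_end(email)  # mark as recently used
        return self._entries[email]

    def put(self, email: str, exists: bool) -> None:
        self._entries[email] = exists
        self._entries.move_to_end(email)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("max@example.com", True)
cache.put("john@example.com", True)
print(cache.get("max@example.com"))   # → True (hit, and now most recent)
cache.put("rare.name.1987@example.com", False)
print(cache.get("john@example.com"))  # → None (evicted as least recently used)
print(cache.get("max@example.com"))   # → True (popular entry survived)
```

Frequently requested addresses keep getting refreshed and stay cached, while one-off lookups fall out quickly, so memory is spent where repeated checks actually happen.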

  • @2005kpboy
    @2005kpboy 2 months ago +3

    Bloom filter

  • @rajnikam5101
    @rajnikam5101 2 months ago +2

    What about indexing? Wouldn't it be helpful, and the best approach?

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Indexing and database sharding will definitely help. But, at large scale we also need a cache and bloom filter to speed up the process.

  • @dilipkumarsharma2492
    @dilipkumarsharma2492 2 months ago +1

    One of the great videos.

    • @TechCareerBytes
      @TechCareerBytes 2 months ago

      Glad you think so! Pls share it in your circle 🙏

  • @ak-vo8ip
    @ak-vo8ip 2 months ago +1

    Really awesome.. (y)

  • @GnomeEU
    @GnomeEU 1 month ago +2

    You would never put all users in one database. You would create one database for every 1M users or something like that, and then you just need a lookup for which database the user would be found in, e.g. the first 3 digits of the customer number or whatever.
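
The routing idea in the comment above might look like this minimal sketch. The function name `database_for`, the `users_` naming scheme and the sample customer number are all hypothetical, chosen only to illustrate prefix-based routing.

```python
def database_for(customer_number: str, prefix_length: int = 3) -> str:
    """Pick a database name from the leading digits of the customer number."""
    prefix = customer_number[:prefix_length]
    if len(prefix) < prefix_length or not prefix.isdigit():
        raise ValueError(f"customer number needs {prefix_length} leading digits")
    return f"users_{prefix}"


print(database_for("1234567"))  # → users_123
print(database_for("9870001"))  # → users_987
```

Note that this routes by customer number; checking whether an email is already registered would still need a key derived from the email itself, or a separate email-to-customer lookup.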

  • @RahulThakur-th1cb
    @RahulThakur-th1cb 2 months ago +1

    Nice