RAND is too slow (in MySQL)

Поділитися
Вставка
  • Опубліковано 14 січ 2025

КОМЕНТАРІ • 45

  • @ijf03208rek
    @ijf03208rek Рік тому +10

    hey man your videos kick ass and i cannot thank you enough for your approach with these. your videos can be watched once and understood... every single one of them... i don't know how you do it, but the way you have picked to teach anything you teach is incredible. you freaking rock! thank you!

    • @PlanetScale
      @PlanetScale  Рік тому +2

      Gah that really encourages me. Thank you

  • @Wundero
    @Wundero Рік тому +6

    One other option for the "hash" variant is to convert the hash into a numeric column (take the bottom half or quarter of md5 (a 128 bit hash), for example) and then modulo that by some arbitrary number. The modulo would help produce groups that you could then filter by and randomize the order of. Ideally the number is determined by your data size, since a small number gives larger groups, so if you have like 1000 rows, % 3 or % 7 might work well, while on millions of rows, something % 17737 might work better. The concept is similar, in theory, to a hash table, where you would take the hash % the size of the table to figure out what group to put them in. This also lets you "seed" some of the randomness, especially if you do the remaining randomness in your application, which can let you have randomized pages of data, and you can just change the modulo if you want to change the "seed".

  • @olivierm.594
    @olivierm.594 Рік тому +1

    On your first example, you may have collisions (same id appears twice or more), i think a simple "distinct" could avoid that (if the list of ids you're parsing is small enough)

  • @samho321
    @samho321 Рік тому +3

    in my case, primary key is uuid, i can randomly pick 4 char inside 32 length from primary key and sort it and limit it, also i able to make those 4 random positions in to a random seed for pagination

  •  Рік тому +20

    Another way: Create a generated decimal column, with default value is rand() function. Add index to them and use-it.

    • @PlanetScale
      @PlanetScale  Рік тому +3

      Good idea! That would lead to rows being really "clumpy" though. Rows with the same or similar random numbers would be returned together. Also a generated column can't have a RAND default unfortunately

    • @IsaacWalkercox
      @IsaacWalkercox Рік тому +1

      @@PlanetScale why can't Rand() be the default for a generated column? Too expensive computationally?

    • @devhammed
      @devhammed Рік тому +4

      @@IsaacWalkercox Because the function is not deterministic, A function is deterministic if, given the same data in tables, multiple invocations produce the same result, independently of the connected user. This is generally in place for features like master-slave database replication.

    • @IsaacWalkercox
      @IsaacWalkercox Рік тому +2

      @@devhammed makes sense. The index would need the same information per user. Seems like something they could get round with some sort of wider context though

    • @RandomGeometryDashStuff
      @RandomGeometryDashStuff Рік тому +1

      does this make result always same?

  • @tylerslater
    @tylerslater 11 місяців тому +1

    What client GUI are you using?

    • @troublescoot
      @troublescoot 11 місяців тому

      Looks like he's using TablePlus

  • @kriansa
    @kriansa Рік тому +3

    Would that inner join at 13:14 be effectively replaceable by a where exists keeping the same performance?

    • @mme725
      @mme725 Рік тому

      I'm wondering this as well.

    • @balduin_b4334
      @balduin_b4334 Рік тому

      If I had to make a guess, I would say that it is less performant, the inner join is a subquery, so it is executed first, and then it is getting joined on the primary key of the bookmarks table.
      "where exists " I would think is getting looked up (or even executed) for each row. (not an expert btw)

    • @OliverONeill
      @OliverONeill Рік тому

      He did explain this around 6:00. RAND() is a volatile function, so it being in the WHERE clause calls RAND() over and over again for each row.

  • @___dyego
    @___dyego 11 місяців тому

    You're an amazing teacher!

  • @gfni
    @gfni Рік тому +3

    Great video as always! 👏

  • @jacksonzamorano
    @jacksonzamorano Рік тому +1

    Wow! I never knew about covering indexes. I don't need to do a random pull like in the video but I could definitely put those indexes to use. Great video, keep it up!

    • @PlanetScale
      @PlanetScale  Рік тому

      Check out the video right before this I've, it's all about covering indexes! The one with the timer in the thumbnail

  • @tamasnogradi4524
    @tamasnogradi4524 Рік тому

    This was really interesting mate! Love your videos, you filled a huge gap in related db topic for a lot of people.

  • @sungvadan
    @sungvadan Рік тому

    Thank you so much for the video. I have an application that I use to learn vocabulary. It sends me ten random words by email every day. I'm using the random() function, which is fine because the vocabulary table is small. However, I've always wondered if there are better ways to do this. Thanks to your video, I now know some alternative methods.

  • @cafehodge
    @cafehodge Рік тому

    for gaps in auto incremented ids or uuids you could create a row_number with a window function to use the rand() function against and guarantee results will be returned. but idk how computationally expensive that’d be.

  • @roganl
    @roganl Рік тому

    Excellent Video. Picking up where you left off, using the hash column index to expedite the lookup, how would generating rand (n) hex values improve randomness (16^4 space) over just using two hex buckets ? Would it make sense to generate multiple two digit hash columns and use them both?

  • @Kane0123
    @Kane0123 Рік тому

    If its fast enough, its fast enough... relevant in so many places...

  • @AlexCernat
    @AlexCernat Рік тому

    yeah, deferred joins and covering indexes rules! great idea using hashes for "randomness"!

  • @IvanMontilla
    @IvanMontilla Рік тому

    I'm pretty sure someone will implement a random password generator this way, by storing all possible passwords in a table and then letting MySQL do its magic.

  • @mamad-dev
    @mamad-dev Рік тому

    i have a question, what is the application that he's using ?

  • @vasiovasio
    @vasiovasio Рік тому

    Super Useful Video! Thank you, Aaron!

  • @DuniC0
    @DuniC0 Рік тому

    I love videos that get me into start thinking.
    What if you strip down the size of the data several orders of magnitude with something like WHERE RAND() < 0.001
    And then just ORDER BY RAND() only for the subset?

  • @hannielgutierrez9139
    @hannielgutierrez9139 Рік тому

    If i try to make a virtual field an index, then Maria DB tells me this:
    1901 - Function or expression 'concat(`fecha`,' ',`hora`)' cannot be used in the GENERATED ALWAYS AS clause of `fecha_hora`
    Does it works just in MySQL?

  • @daniel_petrica
    @daniel_petrica Рік тому

    I got a similar problem with my plantescale free db where my bad code hanered the rows read the first month with the order by rand().
    I ended up doing a less random solution using cache and code.
    In php I generate a random int inside the count of table rows (so i dont care about id gaps), and i get n rows from the db with that int as an offset. Then I randomly order them in php and return an array. As I increase the amount of rows extracted, I'm caching that query in Laravel for a x amount of time and on each request I pick from the cache as with a big enough amount of rows the limited randomnes is good enough in my application.
    To reduce the time in the wire, I only select the columns I need for the use case, and I have an index on the columns needed.
    The cache can be generated by a worker to avoid having a slow request every x second to the web user.

  • @ahmad-murery
    @ahmad-murery Рік тому

    I liked the bucket method,
    Thanks Aaron!

  • @Voltra_
    @Voltra_ Рік тому

    Reservoir Sampling. 'nough said

    • @PlanetScale
      @PlanetScale  Рік тому +1

      Perhaps not quite enough, I've never heard of Reservoir Sampling! Will have to look that up.

    • @Voltra_
      @Voltra_ Рік тому

      @@PlanetScale here's the cppcon talk I've learned about it from: ua-cam.com/video/YA-nB2wjVcI/v-deo.htmlsi=H40wG1Iha9yOygHZ (I resisted the urge to make it a Rick roll)

  • @RayHorn5128088056
    @RayHorn5128088056 Рік тому

    Redis the rows and choose randomly from cache. Problem solved.

    • @PlanetScale
      @PlanetScale  Рік тому

      Redis the rows? I'm not sure what that means

    • @RayHorn5128088056
      @RayHorn5128088056 Рік тому

      @@PlanetScale Cache = Performance. Kudos on doing all that work to get your database to cough up the data but caching is always faster.

    • @PlanetScale
      @PlanetScale  Рік тому

      I understand that 🤗 But are you saying store all the rows in Redis? "Redis the rows" is unclear to me

    • @RayHorn5128088056
      @RayHorn5128088056 Рік тому

      @@PlanetScale I am always saying cache just enough data to speed up the process. This should always be the goal. Trust me, there is an optimal solution involving a cache.

  • @youtubehotdogdogyoutube
    @youtubehotdogdogyoutube Рік тому

    Yeah, maybe in your SQL