06 - Hash Tables (CMU Databases Systems / Fall 2019)

Поділитися
Вставка
  • Опубліковано 28 чер 2024
  • Prof. Andy Pavlo (www.cs.cmu.edu/~pavlo/)
    Slides: 15445.courses.cs.cmu.edu/fall...
    Notes 15445.courses.cs.cmu.edu/fall...
    15-445/645 Intro to Database Systems (Fall 2019)
    Carnegie Mellon University
    15445.courses.cs.cmu.edu/fall...
  • Наука та технологія

КОМЕНТАРІ • 33

  • @marcoq7160
    @marcoq7160 3 роки тому +29

    0:15 DJ Drop Tables
    0:56 Administrivia
    1:45 Course status
    3:04 Data structures
    4:47 Design decisions (data organization, concurrency)
    6:55 Hash tables
    8:38 Constant factors matter a lot
    9:28 Static hash table
    10:54 Static hash table: assumptions
    12:20 Static hash table: perfect hash function
    13:24 Hash table design decisions: hash function, hashing scheme
    16:12 Today's agenda: hash functions, static/dynamic hashing schemes
    16:43 Hash functions
    18:35 Hash functions: CRC-64, {Murmur,City,XX,Farm}Hash
    20:30 Hash function benchmark (XXHash3 was (still is?) the best)
    22:42 A question on hash collision attack
    24:45 Static hashing schemes (linear probe, robin hood, cuckoo)
    26:23 Linear probe hashing (open addressing)
    29:28 Linear probe hashing - deletes (tombstones)
    35:22 Non-unique keys (separate linked list, redundant keys)
    37:30 Robin Hood hashing
    43:01 Cuckoo hashing
    47:22 Observation: resizing, dynamic hash tables
    48:49 Chained hashing
    50:35 Extendible hashing
    56:59 What is the relationship between a hash table and a buffer pool?
    1:00:10 Linear hashing
    1:07:28 Linear hashing - deletes
    1:10:40 Conclusion

    • @luisponce3580
      @luisponce3580 3 роки тому +4

      beast

    • @marcoq7160
      @marcoq7160 3 роки тому +2

      @@luisponce3580 haha, man, you're doing great! Thanks for these comments, they make me smile :) This is the penultimate timestamp comment I made. But you motivate me to get back to it again!

    • @luisponce3580
      @luisponce3580 3 роки тому +1

      @@marcoq7160 lol, appreciated!

  • @ashishacharya4001
    @ashishacharya4001 4 роки тому +23

    No "Hit it" at the end? I'm disappointed.

  • @hansu7474
    @hansu7474 4 роки тому +38

    SHA256 is not reversible. But I think professor conflated it with, say, RSA, which is reversible. Lecturers have to convey a lot of information so those small mistakes are negligible. But I'm surprised why students who used SHA256 and MD5 didn't point out the mistake so that everyone who are watching the video could benefit from it.

    • @andypavlo
      @andypavlo 4 роки тому +37

      Yes, you are right. I corrected myself about this in the next lecture: ua-cam.com/video/JHZFc4hMGhk/v-deo.html

    • @zhjwpku
      @zhjwpku Рік тому

      One more point, MD5 and SHA256 are digest algorithms(or hash functions), and RSA and AES are encryption algorithms.

  • @garfieldnate
    @garfieldnate 3 роки тому +3

    Java checks if the collection's keys implement Comparable, and if so it uses TreeMaps in the buckets instead of simple LinkedLists. In my undergrad data structures class, the first thing we learned about hashes is "this linear probing thing sucks, use linked lists in the buckets instead" (I never heard the term "chained hashing" before this lecture). So it was really eye-opening to hear the lecturer say, "in practice everyone uses linear probing"! I suspect the JVM default is a balanced generic implementation for many disparate use-cases, which has quite different requirements from what we would use in an RDBMS. In general I feel that this lecture filled lots of holes in my knowledge. Thanks for another great lecture!

    • @NaderHGhanbari
      @NaderHGhanbari 2 роки тому

      I believe his statement about linear probing is in the context of databases, not for general purpose programming.

  • @willw2596
    @willw2596 2 роки тому

    Nice lecture. For extendible hashing, when the directory needs doubling due to a bucket split, you don't need to re-map all the the directory entries if using the suffix hash bits as the directory index. Just bulk copy the first half of the directory content to the second half of the new directory would work. Basically the two halves of the directory entries both pointing to the same buckets after the doubling.

  • @AndersonSilva-dg4mg
    @AndersonSilva-dg4mg 4 роки тому +9

    thank you Andy

  • @xuejianpan3020
    @xuejianpan3020 3 роки тому

    Great lecture!
    But I have some questions:
    1)is chained hashing the most popular one?
    2)Extendible hashing and linear hashing might be used when there are a lot of collisions, but do we need that if we have a good hash function? In practice are they used widely?
    3)are static hash tables actually used in some real systems? If so why are they better than chained hashing?
    Thanks!

  • @monitoringtsdb3884
    @monitoringtsdb3884 2 роки тому

    such fire beats

  • @paulchiang4752
    @paulchiang4752 4 роки тому

    Thanks, andy. So the tombstone will be considered when we calculate the fill factor(load factor) since it is a kind of entry?

    • @theUnorthodoxxx
      @theUnorthodoxxx 4 роки тому

      I do not think so. Because we have used it for our easiness only.It is just a special empty entry.

    • @paulchiang4752
      @paulchiang4752 4 роки тому

      I understand what you mean, but in this video, from 30:15 to 31:00, Professor said that "the tombstone is wasting space and contributing to the fill factor". I am not quite sure and thus want to confirm.

    • @AshishNegi1618
      @AshishNegi1618 3 роки тому

      @@paulchiang4752 Think what happens if you add N keys and delete N keys where hash table size is N. Should you consider hash table completely filled ?

    • @zhenghanghu2430
      @zhenghanghu2430 Рік тому

      Yes, tombstones are considered for the fill factor.

  • @420_gunna
    @420_gunna 4 роки тому

    feeling blessed to hear my professor say the crc hash "sucks ass" in 2020 year of our lord

  • @cacheman
    @cacheman 4 роки тому +3

    12:33 I will admit to some confusion here; this section makes it sound like perfect hashing is only of theoretical interest, but of course there's plenty of software to generate even a minimal mapping (like gperf, or CMPH) that is used in production software.
    19:03 Using CRC for hash tables, yes probably a bad idea. That said, modern CPUs (both AMD64/SSE4.2 and some ARM variants) include CRC32 instructions which should in some sense be "fast", for their intended use-case.
    22:42 Is he saying he's _never_ heard about this before, or that he's heard of it before? Okay kids, having a seed is not enough; see Aumasson & Bernstein on breaking murmur. More generally, look up 'algorithmic complexity attacks'. It's cool to ignore it, but only when you know what you're doing.

  • @abdelrhmanahmed1378
    @abdelrhmanahmed1378 2 роки тому

    In Robin hode what if we want to move the richer one down by one but the place is already taken ?

    • @abdelrhmanahmed1378
      @abdelrhmanahmed1378 2 роки тому

      i have example where we will have the number of jumps = 3 ,
      if we have A[0],B[0],C[1],D[2] , and now we want to insert M and its location is where A at .
      so M[0] == A[0] , so next B[0]

  • @alfin3644
    @alfin3644 4 роки тому +6

    SHA-256 is not reversible

    • @andypavlo
      @andypavlo 4 роки тому +9

      Ah you are correct! I got this very wrong. I will correct this in the next lecture. Thanks!
      My point about that we don't care about leaking information is the more important part though.

    • @anarionzuo1425
      @anarionzuo1425 4 роки тому

      @@andypavlo It seems some chinese guy has made this reversible...

  • @jimwichen7978
    @jimwichen7978 Рік тому

    beep

  • @zeevtarantov
    @zeevtarantov 4 роки тому

    Why is the a lesson on basics of hash functions in a "how to build a RDBMS" course?

    • @AshishNegi1618
      @AshishNegi1618 3 роки тому +3

      First 5 mins of this lecture talks about this.

  • @zeevtarantov
    @zeevtarantov 4 роки тому +1

    SHA-256 is not an asymmetric cryptographic primitive, and it is not reversible, and the author should bleep out the audio of that part if they don't want to take down the whole video.