05 - Buffer Pools + Memory Management (CMU Database Systems / Fall 2019)

  • Published 29 Sep 2024

COMMENTS • 44

  • @kuntaliem 4 years ago +11

    Thank you for the awesome video. I guess PostgreSQL also supports some form of scan sharing. For each relation, it keeps track of the starting location (implemented in ss_get_location()) from where a scan should begin. It tries to synchronize multiple scans on the same relation.
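The synchronized-scan idea described in this comment can be sketched roughly as follows. This is a minimal illustration in Python, not PostgreSQL's actual code: the class `SyncScanTable`, its methods, and `scan_order` are hypothetical names, and the sketch assumes a new scan simply starts at the block most recently reported by a running scan and wraps around.

```python
class SyncScanTable:
    """Per relation, remembers the block a running sequential scan last reported.

    Hypothetical sketch of the idea only; not PostgreSQL's implementation.
    """

    def __init__(self):
        self._last_block = {}  # relation name -> last reported block number

    def get_location(self, relation, num_blocks):
        # A new scan starts where a running scan currently is, or at block 0
        # when nothing has been reported or the hint is out of range.
        start = self._last_block.get(relation, 0)
        return start if start < num_blocks else 0

    def report_location(self, relation, block):
        # A running scan periodically publishes its current position.
        self._last_block[relation] = block


def scan_order(table, relation, num_blocks):
    """Blocks in the order a newly started scan would visit them:
    from the shared start position to the end, then wrapping to the front."""
    start = table.get_location(relation, num_blocks)
    return [(start + i) % num_blocks for i in range(num_blocks)]
```

Starting a second scan near the first one's position means both scans touch the same pages at about the same time, so the pages are more likely to still be in the buffer pool. The wrap-around is also why results without an ORDER BY can come back in a different order from run to run.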

  • @marcoq7160 4 years ago +42

    0:15 DJ Drop Tables has some problems
    0:48 Administrivia
    1:18 Database workloads
    1:47 Bifurcated environment: OLTP data silos -> Extract-Transform-Load (ETL) -> OLAP data warehouse
    3:17 HTAP shifts some of the analytical queries to the OLTP side
    3:59 Database storage
    4:54 Spatial vs Temporal control
    5:48 Disk-oriented DBMS
    6:49 Today's agenda: buffer pool manager, replacement policies, other memory pools
    7:22 Buffer pool organization: array of fixed-size frames
    9:11 Buffer pool meta-data: page table, dirty flag, pin counter
    12:30 Locks (for DB's logical contents) vs latches (for internal structures, mutex)
    15:19 Page table (in-memory) vs Page directory (on disk)
    16:30 Allocation policies: global vs local policies
    17:53 Buffer pool optimizations: multiple buffer pools, pre-fetching, scan sharing, buffer pool bypass
    18:42 Multiple buffer pools (reduce latch contention and improve locality)
    21:26 Multiple buffer pools: 1) object ID, 2) hashing
    22:58 Pre-fetching (based on a query plan)
    28:24 Scan sharing
    32:55 Where do intermediate query results go?
    35:07 Reminder: relational model is unordered
    36:05 A question from a student
    36:44 Buffer pool bypass
    38:21 OS page cache (use direct I/O, O_DIRECT to bypass it)
    40:13 Postgres demo
    55:06 Why don't other systems use OS page cache as Postgres does? (hard to guarantee consistent behavior across different OSs)
    58:19 Buffer replacement policies
    1:00:15 Least-recently used (LRU)
    1:01:00 Clock (approximation of LRU)
    1:03:36 Sequential flooding
    1:05:23 Better policies: LRU-K (estimates the interarrival times of references on a page by page basis)
    1:06:21 Better policies: localization (evict on a per txn/query basis)
    1:07:10 Better policies: priority hints
    1:08:39 Dirty pages (trade-off)
    1:10:10 Background writing
    1:11:06 Other memory pools
    1:11:20 Conclusion
    1:11:47 Project #1
    1:12:45 Task #1 - clock replacement policy
    1:13:37 Task #2 - buffer pool manager
    1:14:23 Getting started
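For readers following the outline above, the clock policy (1:01:00) can be sketched as below. This is a minimal illustration under stated assumptions, not the course's reference solution: the class name `ClockReplacer` and the `pin`/`unpin`/`victim` methods are hypothetical, and the sketch assumes a frame becomes evictable when it is unpinned and gets its reference bit set at that moment.

```python
class ClockReplacer:
    """Clock (second-chance) replacement: an approximation of LRU.

    Hypothetical sketch; the interface is an assumption for illustration.
    """

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.evictable = [False] * num_frames  # True once a frame is unpinned
        self.ref = [False] * num_frames        # reference bit per frame
        self.hand = 0                          # current clock hand position

    def unpin(self, frame_id):
        # The frame may now be evicted; mark it as recently used.
        self.evictable[frame_id] = True
        self.ref[frame_id] = True

    def pin(self, frame_id):
        # A pinned frame must never be chosen as a victim.
        self.evictable[frame_id] = False

    def victim(self):
        """Return the frame id to evict, or None if every frame is pinned."""
        if not any(self.evictable):
            return None
        while True:
            frame_id = self.hand
            self.hand = (self.hand + 1) % self.num_frames
            if not self.evictable[frame_id]:
                continue
            if self.ref[frame_id]:
                # Second chance: clear the bit and keep sweeping.
                self.ref[frame_id] = False
            else:
                self.evictable[frame_id] = False
                return frame_id
```

Unlike exact LRU, this keeps one bit per frame instead of a timestamp or an ordered list, at the cost of only approximating the least recently used frame.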

  • @linziye7714 5 years ago +17

    A precious resource for databases! Thank you, CMU!

  • @drakenguyen2390 4 years ago +1

    Thank you for this high-quality content. This is a long shot, but is there any way we can have a solution for the project? Sample code, for example.

  • @yashwinenamadi1673 3 years ago +1

    Why does an OS allow a DBMS to maintain its own memory, i.e., the buffer pool? A DBMS is merely a process running on the OS. Can't the buffer pool be swapped out of memory by the OS to accommodate other processes' memory? Or is there hardware support for a buffer pool dedicated to being managed by DBMS software?

    • @meamzcs 3 years ago

      Sure, the OS can, but that's not the DB's job to worry about. The OS does its own bookkeeping to decide which memory pages to swap, and if the buffer pool is what gets swapped out, then most likely every other page in memory is more worth keeping than the buffer pool... And once the DBMS accesses the buffer pool again, you get a page fault anyway, which makes the OS pull it back into memory...

    • @sebastiannyberg3008 1 year ago

      The memory used by the buffer pool is no different from any other virtual memory given by sbrk. The buffer pool memory can be "swapped out" to a swap partition to avoid OOMkill, but not written out by the OS like pages from a file opened without O_DIRECT.
      Note that with a DBMS you usually don't want swap partitions due to the hard-to-debug performance degradation.

    • @sebastiannyberg3008 1 year ago

      Note that the Linux kernel used to separate the buffer cache from the page cache, but the two were merged some time back. The OS obviously needs to bring in the blocks from disk, and does some (minimal) caching even when using O_DIRECT.
      You can read more about the O_DIRECT flag in the open(2) syscall manpage.

  • @tianhaowang4045 3 years ago

    Rarely do you see a video with zero thumbs-down.

  • @bisakhmondal8371 3 years ago +5

    Coolest prof ever. Thanks for the awesome content.
    Btw, who is "tk" (he/she is in every video)!! XD

  • @anuragshah3433 8 months ago

    Why did query performance not improve after every tuple was moved into the buffer pool at 49:13? The query run with tuples fetched from disk is at 43:37.

  • @AndersonSilva-dg4mg 5 years ago +8

    Hooray! New video. Thank you very much!

  • @roly4301 4 years ago +5

    Thank you for making these high-quality lectures and programming assignments available. I was wondering if there's any chance you could release the grading scripts for previous offerings of the course?

    • @andypavlo 4 years ago +12

      We're working on it.

    • 4 years ago

      Yes, I'd love that. This would be another great resource for this amazing course.

    • @AshishNegi1618 4 years ago

      @Vipul Bhardwaj I am also interested, but I think Andy's concern is that an online discussion forum would cause solutions to leak.
      We can discuss privately over email.

  • @ljratner 4 years ago +3

    Slide 16 is missing from the PDF.

  • @himanshuhbk953 5 years ago +5

    Can non-CMU students submit these assignments?

    • @jcdyer3 4 years ago +6

      Probably not; that would be asking for a big donation of unpaid work from the course staff. But you can do them yourself and then research the answers to check whether you were right.

  • @peijunwu7354 2 years ago

    23:00

  • @willw2596 3 years ago

    I'm not buying the benefit of pre-fetching; it assumes the pages on disk are prearranged to be adjacent for the particular query type. In real life, pages are allocated in an order that is effectively random with respect to the ordering of the data. The index-page scanning example looks dubious: B+Tree pages are only allocated on a split, which happens at random points depending on the data inserted. You could say that one big chunk of pages holds all the index pages and load them all in one shot, but mmap can also load a big chunk of pages in one shot.

  • @williamweiler449 3 years ago +1

    How do multiple buffer pools reduce latch contention if threads still must contend for latches on the same page?

    • @rofamohamed2398 3 years ago +1

      Each buffer pool has its own page table, so with multiple buffer pools you also have multiple page tables. That reduces latch contention because threads are spread across more page tables.
      Hope that clears it up.

    • @williamweiler449 3 years ago

      @rofamohamed2398 Ah, so latches are placed on the page table, not the buffer page itself?
      Thanks for the reply.

    • @rofamohamed2398 3 years ago +1

      @williamweiler449 Yeah, latches are placed on the page table to prevent concurrent access to the slot that holds the pointer to the buffer pool frame. That matters when we look up a specific page to find which frame it resides in, which is the job of the page table (tracking the positions of pages inside the buffer pool).

  • @rosswang688 3 years ago

    Rarely a vid with zero thumbs-down.

  • @alielzahaby3315 1 year ago

    Should I implement the assignments while following along online, or are they an extra?

    • @alielzahaby3315 1 year ago

      Like, would it matter that much for understanding the course?

  • @prathibhapb 3 years ago

    22:50 - How does Approach #2 (Hashing) help if each database has its own buffer pool?

    • @zhenghanghu2430 2 years ago +1

      I think the point is that if you don't have the one-buffer-pool-per-table setup, you can use hashing to find the buffer pool each page belongs to. If the one-buffer-pool-per-table setup is used, you don't need hashing. (Or you can have multiple buffer pools for each table, in which case hashing is useful again.) We also see in the lecture that we can have a buffer pool for each query; in that case we don't need hashing either.
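The combination mentioned in parentheses in this reply (several buffer pools per table, with hashing choosing among them) could look roughly like this. The function `pick_pool`, the dictionary layout, and the pool names are all hypothetical and only illustrate how the two approaches compose.

```python
def pick_pool(pools_per_object, object_id, page_id):
    """Route a page to a buffer pool.

    Approach 1 (object ID): each object has its own group of pools.
    Approach 2 (hashing): the page id picks one pool inside that group.
    Hypothetical sketch; page ids are assumed to be integers.
    """
    pools = pools_per_object[object_id]
    return pools[page_id % len(pools)]


# Example layout: two pools for a large table, one for a small table.
pools_per_object = {"orders": ["orders_pool_0", "orders_pool_1"],
                    "items": ["items_pool_0"]}
```

When a table has exactly one pool, the modulo always yields index 0, which matches the observation above that hashing adds nothing in the one-pool-per-table setup.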

  • @sheikhmuhammedshoaib1124 3 years ago

    Can we get project source code for reference?

  • @drevil7vs13 4 years ago

    30:58 I wonder if Oracle supports cursor sharing for 2 queries with the same sql_id but with different bind variable values

    • @moazeldefrawy4379 3 years ago

      It probably does, since the values for each row are stored together (i.e., it's a row-oriented database).

  • @anikethdeshpande8336 2 years ago

    @4:33: What is the new hardware that lets us push execution logic down to the disk?

    • @andypavlo 2 years ago

      pages.cs.wisc.edu/~jignesh/publ/SmartSSD.pdf

    • @rodypar317 2 years ago

      ua-cam.com/play/PL5Q2soXY2Zi-Mnk1PxjEIG32HAGILkTOF.html

  • @arhyth 4 years ago +1

    Do distributed databases also have buffer pools? I would assume they don't, as the communication cost of keeping members in sync with each other would defeat the purpose of buffer pools.

    • @adamjaffe1712 4 years ago +6

      I think each node in the distributed database will manage its own buffer pool. There is no need to synchronize it since the buffer pool is an optimization to avoid disk access - it doesn't affect the correctness of the underlying data.

    • @AshishNegi1618 4 years ago +2

      This lecture covers the internals of a single-machine database. Different machines don't share memory, and hence don't share buffer pools or similar data structures. A buffer pool is a way to access a 100 GB database file with 1 GB of RAM, like virtual memory, without the higher layers knowing about it.

    • @prathibhapb 3 years ago

      Distributed databases can still have node-local buffer pools, as they have some data storage on each node.
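The replies in this thread describe a buffer pool as a node-local cache that lets a small amount of memory front a much larger file. A minimal read-only sketch of that idea follows. It is an illustration under assumptions, not the course project's interface: `BufferPool`, `fetch`, `unpin`, and the `read_page` callback are hypothetical, eviction is plain LRU over unpinned pages, and dirty pages are not handled.

```python
from collections import OrderedDict


class BufferPool:
    """Fixed number of frames caching pages of a larger file (read-only sketch)."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity       # number of frames available in memory
        self.read_page = read_page     # callable: page_id -> bytes (the "disk")
        self.frames = OrderedDict()    # page_id -> [data, pin_count], oldest first
        self.disk_reads = 0

    def fetch(self, page_id):
        """Return the page's bytes, pinning it; read from disk only on a miss."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [self.read_page(page_id), 0]
            self.disk_reads += 1
        entry = self.frames[page_id]
        entry[1] += 1
        return entry[0]

    def unpin(self, page_id):
        """Signal that the caller is done with the page."""
        self.frames[page_id][1] -= 1

    def _evict(self):
        # Pick the least recently used page that nobody has pinned.
        victim = next((pid for pid, (_, pins) in self.frames.items()
                       if pins == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        del self.frames[victim]
```

Callers only ever see `fetch` and `unpin`, so the layers above never learn whether a page came from memory or from disk. Each node in a distributed system could run its own instance over its own local file, with no coordination between instances.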