Rust at speed - building a fast concurrent database

Поділитися
Вставка
  • Опубліковано 10 січ 2025

КОМЕНТАРІ • 170

  • @distrologic2925
    @distrologic2925 5 років тому +305

    21:36 This is actually one of the best explanations of the rust ownership system I have heard yet.

    • @juliavanderkris5156
      @juliavanderkris5156 5 років тому +13

      Damn yeah, that was great. I'm starting with learning Rust and the concept of borrowing seemed cool but also really confusing. This explanation made it so much easier to understand.

    • @gloatsticks
      @gloatsticks 4 роки тому +6

      I'm slowly but surely learning Borrowing. The compiler is one of the most helpful with error messages!

    • @someonlinevideos
      @someonlinevideos 2 роки тому +3

      @@gloatsticks For some type T you have:
      Here is the text from that slide because Jon really does create a wonderfully succinct explanation of ownership in less words than this comment has used.
      T, &mut T, &T
      T is owned,
      &mut T is exclusive,
      &T is shared

    • @davidkeller6334
      @davidkeller6334 2 роки тому +2

      That explanation should be "borrowed" by every educator/communicator who wants to explain ownership

    • @morisonjazaj7920
      @morisonjazaj7920 6 місяців тому

      ​@@gloatsticks6

  • @Codeaholic1
    @Codeaholic1 5 років тому +170

    Jon I really appreciate what you do. You inspired me to sit down and learn Rust. You've made a difference in my programming journey. Thanks

    • @allesarfint
      @allesarfint 2 роки тому +6

      Something tells me you liked the language

    • @wrong1029
      @wrong1029 2 роки тому +2

      He's extremely based. Thanks to him I doubled my salary and fell in love with a programming language.

    • @johanhellberg9677
      @johanhellberg9677 Рік тому

      Congratulations@@wrong1029 ! Good work. Would you like to share how that came about, and what made the previous difference?

  • @bdafeesh
    @bdafeesh 4 роки тому +49

    How you speak is perfectly paced and you hardly stutter/'um'. I'm always trying to improve my speaking abilities and you offer a great example to follow. Keep it up man

  • @plorio
    @plorio 5 років тому +31

    I used that exact same atomic counter technique to decide when to the other copy of data is safe to write to at work. Felt quite clever when I got it to work. Really cool to see the same thing in the wild.

  • @aemmelpear5788
    @aemmelpear5788 3 роки тому +12

    I absolutely love how you can tell, how much you are excited about the Rust programming language. Always with a big beaming smile when talking about the nice features Rust has.

  • @nickmaxwellambient6615
    @nickmaxwellambient6615 5 років тому +13

    Jon, just found your channel today. It's rare to find someone who both covers really in-depth topics and does so in an entertaining way. Fabulous job you are doing here.

  • @dsincl12
    @dsincl12 6 років тому +13

    Really enjoyed this talk. The epiphany for me when hitting the borrow checker wall was when I realized that it was helping me not penalizing me for the code I've written. This small mental shift made everything all the easier and suddenly Rust became a pure joy to work with. I have to admit I didn't read through the whole Rust book from start to finish before hitting the wall. I usually never do that when learning a new language but you are absolutely right. Reading the book from start to finish will save you a lot of pain and frustration with the language.

  • @LiamDennehy
    @LiamDennehy 2 роки тому +2

    I'm so glad i found this presentation again. I'm really getting into Rust and even advocating for it in my work, and recalled this presentation does a great job introducing Rust and proving how it can solve problems.

  • @MrGeissT
    @MrGeissT 6 років тому +55

    Well done, Jon! Fantastic talk and really interesting to hear more about what you're doing when you're not Rusting on UA-cam!

  • @yashashav_dk3766
    @yashashav_dk3766 5 років тому +12

    This talk is worth it's weight in gold. Thank you kind, Jon!

  • @JohnHAdams-vo2pk
    @JohnHAdams-vo2pk Рік тому

    Brilliant explanation of ownership. Simple and to the point. Better than any book I've read. Thank you for this, now it's pretty solid in my head.

  • @simonray4713
    @simonray4713 6 років тому +17

    I pretty sure he love this language. Full of energy with a great talk

  • @clinton9129
    @clinton9129 Рік тому

    Jon, thank you and your friends for digging deep and discovering the possibilities.
    Since our last discussion, I am now learning how to code in python and I am learning other languages also. But my idea behind RUST is low latency and Jupyter notebook style coding. In fact Jupyter notebook was also one of my crazy ideas.
    Now that I am learning how to code, my writings' will be phenomenal given my skills.
    Expect more movies, lyrics to songs, Comedy segments, etc.
    Anything is possible now as always.

  • @jonwise3419
    @jonwise3419 5 років тому +48

    I have a difference experience with rapid prototyping. To me rapid prototyping is MUCH easier with a good powerful type system if you do it in the right way.
    I'm still learning Rust, so I'll use Haskell as an example, but I already see it will be the same in Rust. In Haskell rapid prototyping is a breeze. Remember, that you don't need to write implementations or correct code! In Haskell you can write "undefined" and compiler will shut up. I normally, have several shorthands like "u = undefined" and then write a type and `u` as an implementation, compile, write more code replacing `u` with a more concrete implementation (that still can have more `u`s inside it) and then slowly an implementation writes itself as you replace undefined parts of your code with more and more concrete implementations. I heard this being called as type-driven or hole-driven development, because you follow the type, put holes in your code and fill those holes. Also, remember that with type inference you can replace anything with `_` in Haskell or with a sham type in Rust (like u8 when you know something is not u8) to figure its type or what type you need to pass, because it will complain and show you exactly which type something needs.
    Rapid prototyping is all about speeding up the feedback cycle and code-reuse. The latter is about not writing code at all, because you already have a library somewhere. Both are enabled by a strong and flexible type system.
    Starting with feedback, in Python to detect that your code is not working you'll have to wait until runtime .In Haskell or Rust, you'll immediately get feedback the moment you press a key. In fact, I usually wrap a function into another function that will print some useful output and I have very fast key shortcuts to both compile and to show some runtime feedback as well (like to print a datastructure I'm working on or run just tests that matter, keeping it fast). Like there will usually be a predefined name for me in a file, like `c`, which I can use at any time when working on a file. If I want to print something I just put one line in a file like `c = my_function testArg` and this is what will start getting printed out as I'm working on that function because that `c` will be like a main for that file, while I'm working on it that will constantly print output, and a script will automatically import that file and run it as I'm working on it. Same for tracing statements, pretty printing, and other tools that you would normally use. I want all of them to be a short names that I can use anywhere as I'm rapidly typing and working on something, preferably as a Swiss knife personal library that you can import in a few keystrokes into any file.
    Using other libraries is easier in a languages with a good type system due to a) type inference b) faster feedback cycles c) confidence in that if it compiles it will work. So, you can again just put a bunch of undefines, type holes / sham types, and slowly figure out what do you really need to some interface of that library or how to get there. In Python you can pass invalid data and you'll have to wait until runtime to figure this out. In Haskell or Rust, you will get immediate feedback that something is wrong and you have type inference to figure out EXACTLY what data needs to be there or how it should be used, and by using type holes can try to arrive at that point in any way you want by splitting the problem into smaller parts and following the types.
    What Rust misses here is more libraries, but when it will have a lot of libraries, figuring out those libraries will be much easier for reasons I've mentioned. Prototyping in Python can be easier DESPITE the lack of type system because there are more libraries so a lot of code you don't have to write. But when I have to write it, it's more painful. Python also has some good tools to speed up feedback cycles, like IPython, so some people naturally use workflows with better feedback cycles. But REPL is really not necessary if you have a good workflow which imitates REPL automatically. In fact, REPL can''t MAXIMIZE speed. Like you don't want to be typing code in REPL and then also in your file. You want your file to automatically give you REPL-like feedback. There are several ways to do it, beyond having scrips and conventions to automatically run things. You can have one file that acts as a RELP, imports a library you are working on and gets continuously run as you work on that library. I just find that more direct approach is faster.
    I guess to summarize this, I would say that if you don't fight the compiler but optimize your workflow then rapid then prototyping is easier with a more powerful type system. Rust can have no type system at all if you don't want it to, because you can ignore any parts of the code with unimplemented or panic.

    • @blo0mfilter868
      @blo0mfilter868 3 роки тому +9

      nice blog post

    • @QmVuamFtaW4
      @QmVuamFtaW4 2 роки тому

      uhh can you explain all of it in under like 17 words? my brain hurts

    • @julians.2597
      @julians.2597 9 місяців тому +1

      ​@@QmVuamFtaW4 Grug like Rust. Rust compiler not quite REPL, but compiler know lots things. Compiler often tell Grug when mistake in thing Grug makes. Tells also where problem abd how to make mistake go away.

  • @rajsahu4644
    @rajsahu4644 2 роки тому

    Jon is inspiring in so many ways. The work (which he explains in the presentation) and the way he described every topic! Just awesome. I probably never watched any hour long talk in one go until this one!

  • @yurikhomyakov9826
    @yurikhomyakov9826 5 років тому +4

    Thank you for an amazing talk. As soon as Rust 1.0 came out I knew something interesting was cooking!

  • @sgdfdsgs
    @sgdfdsgs Рік тому

    Your talk was fantastic! Your passion and delivery were both amazing, and it really got me excited about Rust. Thank you for sharing your knowledge and enthusiasm with us.

  •  4 роки тому +3

    This talk is amazing! You explain things in a very approachable way!

  • @BohdanTrotsenko
    @BohdanTrotsenko Рік тому

    33:30 the datastructure suggested implements a "weaker map", meaning that writes are not seen.

  • @BenjaminCronce
    @BenjaminCronce 5 років тому +1

    Super-linear is real, up to a point. Good memory access patterns can cause different threads to share cached data, such that 2 threads is more than 2x faster than 1 and 4 threads is more than 2x faster than 2. But in my limited experience, I never noticed past 4-6 threads, but at least beyond that was essentially perfectly linear.

  • @raysonlogin
    @raysonlogin 4 роки тому

    The 1 map for readers and another one for writer mechanism is like RCU - whenever the writer is ready then the pointer to the data structure is published such that the readers get the updated version.

    • @jonhoo
      @jonhoo  4 роки тому +1

      Yup, that's exactly right, except that in this scheme we only ever have two copies and switch between them. In RCU you create a new copy for each update.

  •  6 років тому +2

    Brilliant talk, thanks a lot! I really like how you've covered the ownership model, along with how you've implemented the lock free evmap datastructure and why that is safe. Well done.

  • @scriptozavr
    @scriptozavr 6 років тому +8

    Enjoyed the talk, Jon is indeed a phenomenal speaker!

  • @flesnuk
    @flesnuk 6 років тому +4

    Thank you for uploading this, the second part of the talk is awesome and easy to follow.

  • @veramentegina
    @veramentegina 4 роки тому

    wow, what a great talk. Passion and enthusiasm makes such a difference. It captivated me. Thank you so much.

  • @_akshaydeep
    @_akshaydeep 6 років тому +52

    Database talk start at 13:40

    • @VishalAnand24
      @VishalAnand24 6 років тому +17

      Don't jump, this talk is awesome

    • @eMbry00s
      @eMbry00s 5 років тому +6

      @@VishalAnand24 if you're into Rust, though, you've already been introduced to the Rust basics in like fifty other talks. So the timestamp is useful for some of us!

    • @VishalAnand24
      @VishalAnand24 5 років тому

      @@eMbry00s okay

  • @yurizappo5726
    @yurizappo5726 4 роки тому +1

    Love the idea of double increment and check if number is even. Simple and effective.

  • @bobwatson957
    @bobwatson957 4 роки тому

    Excellent description of some Rust features that will trip up the newb. Excellent video.

  • @thisiswill
    @thisiswill 4 роки тому +3

    You’re good at explaining this - and I’m very new to exploring Rust. Thanks for posting this keynote.

  • @corv882002
    @corv882002 Рік тому +1

    Very interesting talk. I'm curious why you couldn't just store an atomic counter of how many readers are active per map instead of doing the whole epoch and ignore it if it's even thing.

  • @vhiremath4
    @vhiremath4 4 роки тому +1

    This was an incredible talk. Thank you. I shared this with my team - I'd love for us to consider using Rust for some lower-level video stuff. :-)

  • @mahmoudabuzamel7038
    @mahmoudabuzamel7038 4 роки тому +1

    Thank you very much Jon, this is a valuable presentation rich with guidance and recommendations. I learned a lot from it!

  • @hanskessock3941
    @hanskessock3941 Рік тому +1

    Jon is a wonderful teacher, and inspirational.

  • @user-kn4wt
    @user-kn4wt 6 місяців тому

    @40:47 is that kdb with the benchmarking clause? 😅

  • @ZenTrickz
    @ZenTrickz 5 років тому +55

    49:16 "... your program will not compile unless everything is documented. It's fantastic!"
    You know you are a Computer Scientist when... 🤣

    • @YoloMonstaaa
      @YoloMonstaaa 3 роки тому +1

      @Ramon Oliveira you'll get used to it :)

    • @ekrem_dincel
      @ekrem_dincel 3 роки тому

      @Ramon Oliveira not really.

  • @ozank7327
    @ozank7327 6 років тому +11

    great talk, rust looks amazing

  • @samferrer
    @samferrer 2 роки тому +1

    Wonderful lecture ... and the switchable rw-lanes is a must takeaway... thank you!

  • @distrologic2925
    @distrologic2925 5 років тому +2

    37:03 Instead of counting each read begin/finish, why not just keep a boolean value, which indicates whether the thread is currently reading or not. When starting a read it will set it to true, after finishing a read, it sets it back to false.

    • @jonhoo
      @jonhoo  5 років тому +3

      If you did that, you might continuously read that readers are active if reads are sufficiently frequent and fast. That would block switching to the other map, even though it's safe to do so the moment all readers have finished at least one read.

  • @ankurleonardo
    @ankurleonardo 3 роки тому

    Every second in this video was worth watching. Awesome DS glimpse.

  • @MaxLambrecht
    @MaxLambrecht 4 роки тому

    Amazing presentation. Now I love Rust more.

  • @colinmaharaj
    @colinmaharaj 2 роки тому

    16:40 That is spectacular, I like it, will try that. (Update the read caches on write)

  • @guibirow
    @guibirow 4 роки тому

    At 37:00, to check if there are no readers still reading or suspended before starting writing, he suggested to increment the counter twice. One on starting the read, the other when completed, then check if the counter is even before writing. With this approach, if there are two readers or any even number, will cause the writer to falsely assume there is no reader because the counter will be even.
    I would assume the right approach is increment on start and decrease when finished. Then, if the counter is 0, the writer can confidently guarantee no reader is pending finalization before start writing.
    Is there anything I am missing?

    • @jonhoo
      @jonhoo  4 роки тому +1

      Ah, no, each reader has its own counter :)

    • @guibirow
      @guibirow 4 роки тому

      @@jonhoo Nice, I thought it was a global counter

  • @flyingsquirrel3271
    @flyingsquirrel3271 5 років тому +1

    Thanks for the very impressive and interesting talk! If I wouldn't love rust already, you would have convinced me to learn it ;)

  • @prestigek1ngs
    @prestigek1ngs 3 роки тому

    I’m currently getting into rust with no basically no prior knowledge. It’s kind of a pain but it’s been fun. I’m on the fence still because I can’t find the best tutorials but I’m trying hard. I think it’s the future

  • @MIHIRAMRELIA
    @MIHIRAMRELIA 2 роки тому

    Love your work, enjoyed the session !

  • @sanderbos4243
    @sanderbos4243 2 роки тому

    I adored this talk, thank you!

  • @steveharris3627
    @steveharris3627 4 роки тому

    Nice, sort of an advanced/smart “refresh ahead”, like caching systems do, that is built into the DB.

  • @FrancescoCielo
    @FrancescoCielo 5 років тому +2

    Isn't storing nodes of a graph in a list and then using indices basically just doing pointers while hiding it from the borrow checker so it doesn't bother you?

    • @jonhoo
      @jonhoo  5 років тому

      Hehe, yes, pretty much!

  • @giganooz
    @giganooz 2 роки тому

    As for the database thing, how does it work when you have multiple queries that try to change data through addition at once. Wouldn't it read the same value multiple times first an then change that multiple times instead of going in parallel?

  • @bozhidaratanasov7800
    @bozhidaratanasov7800 Рік тому

    Do you think Rust has caught up to speed on async networking stuff? I am trying to follow the Rust trends, but I am not sure what the difference between now and 4 years ago is. Tokio seems to have been doing a fair bit of work there.

  • @sluongng
    @sluongng 6 років тому +13

    at 52:00 he was describing that he had PTSD from fighting the Compiler ^_^

  • @passermelon7
    @passermelon7 5 років тому +1

    I have a question...When you exchange the reader/writer pointer, you need to ensure all the reader pointer number are even.What if you check the latter pointer, the former pointer's number change.It seems like we still need a mutex?

    • @jonhoo
      @jonhoo  5 років тому +2

      Ah, no, it's sufficient to observe an even number for each reader _separately_. For any given reader, if you observe their epoch number being even, you know that _that_ reader has seen the updated pointer, and so you don't need to check it again until the next time you want to swap the pointer.

    • @magnum789
      @magnum789 4 роки тому

      ​@@jonhoo Hey Jon, very exciting talk! I'm new to Rust but have experience with Go. How does the writer reapply the writes it has done to map B into map A after swapping the pointers and revealing map B to readers and thus obtaining exclusive access to A? It seems like the writer must internally keep track of the missing writes (surely more than one missing write in map A due to possibly a reader being slower than the writer)?
      Another question: if the writer observes the even numbers separately for each reader, how does it prevent a reader from starting a long read just after the writer made the observations and just before it does the swap?
      Edit: I think I got the answers from reading the evmap docs now. There is a log, of course. The swap is just done at some unspecified point in time and the replay is done AFTER waiting for the epoches to get even. I'm thinking of trying to implement something like this in Go.

    • @jonhoo
      @jonhoo  4 роки тому

      Hey there! Almost missed this comment since it's under a relatively old one, and UA-cam tends to hide those. Looks like you figured it out though! There is indeed an operation log so that the writer knows how to bring the "old" reader map up to speed when it becomes the "new" writer map. For the epochs, the way it works is that the writer will not start writing to what used to be the reader map until it has observed an even counter from every reader _since_ the swap. That ensures that any subsequent reader operation *must* be on the swapped map, and therefore there cannot be any readers left in the old reader map.

  • @shirshak6738
    @shirshak6738 6 років тому +1

    thank you ! Looking forward for your new stream. Rust is the best programming language i have ever used :)

  • @benoitranque7791
    @benoitranque7791 Рік тому

    This was fantastic, thank you

  • @robheusd
    @robheusd 2 роки тому

    Isn't there a chance that just after the writer sees that all read counts are even and it is safe to do a write, and just before the writer/reader pointers get interchanged bfore the actual write, a reader starts a read, or what prevents the reader from doing that?

    • @jonhoo
      @jonhoo  2 роки тому

      The writer swaps the pointer *first*, then waits for all the counters to change, and the starts modifying the old pointer. If a reader starts doing another read, they'd follow the swapped in pointer, which points to the other copy (the one the writer isn't about to modify)

  • @colinmaharaj
    @colinmaharaj 2 роки тому

    7:56 hmmm, I rely on null pointers in C++ to know about the state of my objects.

  • @thomascorbin5371
    @thomascorbin5371 4 роки тому

    This was a fantastic talk

  • @randomplan9537
    @randomplan9537 4 роки тому +1

    thank you for this great intro to noria. is there a way to try noria also ?

    • @jonhoo
      @jonhoo  4 роки тому +2

      github.com/mit-pdos/noria is what you'll want to look at :)

  • @DavidBeaumont
    @DavidBeaumont 5 років тому

    I'm interested in whether this approach is safe in shared memory multi-cpu systems. Aren't you relying on the order in which data pages are published to the different CPUs to be able to promise that an even count reflects the true state of the reader in another thread? Does Rust have some promises about the semantics of the memory architecture? (e.g. are you implicitly relying on the counter and the table pointers being in the same memory / same cache lines?)

    • @jonhoo
      @jonhoo  5 років тому +1

      The epoch counters use atomic loads and stores (fech-add specifically) with well-defined memory orderings. Take a look at the AtomicUsize type in the Rust standard library :)

  • @TheSunscratch
    @TheSunscratch 6 років тому

    This is a really great talk!

  • @kompeterPC
    @kompeterPC 6 років тому

    Really interesting talk, well done

  • @darkenblade986
    @darkenblade986 2 роки тому

    awesome stuff. I learned so much

  • @arhyth
    @arhyth 6 років тому

    good stuff and great presentation!

  • @danielfielding1938
    @danielfielding1938 3 роки тому +3

    If he were creating a fairly normal relational DB in Rust, then it would be a cool way to see the advantages of Rust, but those advantages seem to be somewhat obscured by the unconventional design of his unusual database. In other words, maybe the radical design is what makes Noria cool, not Rust?

  • @upevolutionism
    @upevolutionism 5 років тому

    Hi Jon, this is a great talk and I am reading your paper as well. I found this video while searching on the topic of Read/Write separation in databases because Datomic is not open sourced thus I can't study its implementation. Have you ever evaluated Datomic? If so, what are your thoughts of Datomic and how different it is from Noria?

  • @markocvejic6416
    @markocvejic6416 3 роки тому

    Link for voting on your youtube chanel, intro video is broken :)

  • @linuxsir-tw
    @linuxsir-tw 6 років тому

    Great talk, many thanks for your sharing

  • @TimLF
    @TimLF 5 років тому

    I was expecting "Multiversion concurrency control" of PostgreSQL, and CockroachDB to be compared, but then the video was over. I'll have to add that to my needs more research or testing list.

    • @joshuayanovski5765
      @joshuayanovski5765 4 роки тому

      MySQL also uses MVCC (albeit a slightly different variant of it, but one that actually makes reads by primary keys faster than Postgres at the cost of reads through secondary keys). The thing he's leaving out of this talk is that Noria does not fully support transactions, so it is not quite apples to apples; however, it is plausible that the queries he's testing are nonetheless transaction-safe (or have semantics that don't require full transactional isolation).

  • @junaid1464
    @junaid1464 2 роки тому

    Can't postgres function idx do that?

  • @dan-vw4ve
    @dan-vw4ve 4 роки тому

    Really awesome talk, one thing I don't get is the reasoning for an epoch counter instead of flipping between 0 and 1? edit: is it an ABA thing?

    • @jonhoo
      @jonhoo  4 роки тому +4

      So, you could get pretty far by just flipping between 0 and 1, but using an actual counter enables some cool optimizations. For example, in the real implementation, all writes are delayed until the _next_ swap, and only then are they applied to the "write" side of the map, which is then immediately made the "read" map and exposed to readers. This allows the swap to only block until the _previous_ pointer swap has been observed by all threads, at which point the "write" side is safe to modify. This, in turn, means that usually the swap doesn't have to wait _at_ _all_ , since in the time between the previous swap and the current swap, most reader threads will have finished any pending operation they had. _But_, this relies on being able to distinguish between a reader that is still doing the _same_ operation it was doing when the previous swap happened (and so hasn't seen the latest pointer swap) and one that is doing a _new_ operation (and so _has_ seen the latest pointer swap). And 0/1 wouldn't let you do that, so you would _always_ have to wait to observe a 0 from each thread.

  • @dbaldwin2803
    @dbaldwin2803 2 роки тому +1

    Does anyone know if this has been abandoned? The GitHub repo hasn’t been updated in quite some time and I really want this to flourish.

    • @jonhoo
      @jonhoo  2 роки тому +1

      For Noria, the thing to watch is readyset.io/. For the data structure described in the talk, check out github.com/jonhoo/left-right/ !

    • @dbaldwin2803
      @dbaldwin2803 2 роки тому +1

      @@jonhoo so some of it is open-source and some is proprietary?

    • @jonhoo
      @jonhoo  2 роки тому +1

      @@dbaldwin2803 All of Noria that was part of my research is open source on GitHub. However, since I graduated, it's not an active project, just the final state of the research prototype. ReadySet is building on top of the Noria code base and making it production ready. Some of that is proprietary, although much of it is also developed in the open over at github.com/readysettech/readyset

    • @dbaldwin2803
      @dbaldwin2803 2 роки тому

      @@jonhoo can you elaborate on which parts are proprietary? I tried pitching Datomic to our CTO for a new product and it being proprietary was a non-starter.

    • @jonhoo
      @jonhoo  2 роки тому +1

      You'd be best off reaching out to them directly! While I'm a co-founder, I'm not involved with any of the ongoing work or plans :)

  • @davidboreham
    @davidboreham 2 роки тому

    Quite surprised MySQL and Memcached are so fast by comparison. I'd expect more like O(10^2) difference.

  • @micknamens8659
    @micknamens8659 5 років тому

    What happens when a read client on the partial materialized view A has an odd number of points in his access accounting, and the associated client thread died or is in an endless loop and will never end its query, i.e. won't increment its counter?

    • @jonhoo
      @jonhoo  5 років тому +1

      It's a good question! The current implementation has no story for that - it would cause the writer to wait indefinitely. If you use Rust with unwinding, the trick to pull would be a "panic guard": a type created just before the user code is called that, if dropped during a panic, will increment the epoch counter to restore parity. If you want to try to submit a PR to evmap with that change, I'd be happy to mentor and review it! If the reader runs indefinitely, that requires the writer to wait as well.

    • @micknamens8659
      @micknamens8659 5 років тому

      @@jonhoo I'm currently just "interested" in Rust, so no PR from me 😉 . What about a time-out for Read handles. It expires after x seconds. That's then part of the API contract. Could be a semantic extension of Rust language or a user-defined wrapper type: when the read client tries to call a function of this type after expiry time an exception is raised on client side. So the writer need not wait for an expired read handle.

    • @jonhoo
      @jonhoo  5 років тому

      @@micknamens8659 Hehe, that's okay! I think getting the semantics of expiring read handles right would be quite a challenge. And especially because you'd need some way to preempt a running thread, which is far from trivial!

    • @jonhoo
      @jonhoo  5 років тому

      Just implemented this in github.com/jonhoo/rust-evmap/commit/73a67292b5588d6109b6845f6e771d804a9ac906 !

    • @micknamens8659
      @micknamens8659 5 років тому

      @@jonhoo Another approach to overcome this exceptional situation could be to create a temporary third materialized view for writing.

  • @paolophoenix
    @paolophoenix 5 років тому +4

    We did reasearch on this topic 25 years ago. The approach fail on non monotone schemas. Sometimes you have to recompute all the db. Easy to demonstrate. I can write a db in wich if you add a record in a table (cache) that cause all other cache to empty. The future is in functional databases. Where tables are functions.

    • @jonhoo
      @jonhoo  5 років тому +8

      It's true that there are cases like that, but the argument we're making here is that that is not true for *common* queries for web applications specifically. And even in those cases, if your application is read-heavy, it is far better to re-compute the query result only when the result changes, rather than on *every* read. If you have a particular problematic use-case in this domain in mind though, I'd be happy to take a look!

  • @adesanoyesamson668
    @adesanoyesamson668 2 роки тому

    2022 and I find this talk interesting

  • @chang8106
    @chang8106 3 роки тому

    Thanks for sharing. Appreciate it

  • @ddystopia8091
    @ddystopia8091 Рік тому

    It is just incredible talk

  • @lucasa8710
    @lucasa8710 2 місяці тому

    wait, is the guy talking Jon?

  • @marcusklaas4088
    @marcusklaas4088 6 років тому

    Lovely talk. Thanks!

  • @Ethan-gu9hm
    @Ethan-gu9hm 6 років тому

    The idea of caching query results cannot be that new or novel isn’t it? I recall that oracle db has the feature with the name “materialized view” for a long time now

    • @jonhoo
      @jonhoo  6 років тому +2

      There's a little bit of discussion around this in response to www.reddit.com/r/rust/comments/acucrs/rust_at_speed_building_a_fast_concurrent_database/edbxqv8/. In general, materialized views in commercial relational databases are limited, both in terms of performance and flexibility. I'd recommend giving the Noria paper a read if you want more in-depth analysis, as we evaluate that there too!

    • @panstromek
      @panstromek 6 років тому +1

      The idea of it is just so straghtforward... Basically since I started working with DBs, the whole inefficiency of their work has been bugging me, especially considering many of them were built at times when computational resources were sparse.
      I just don't get it..
      I am glad that someone just took this pretty logical idea and just built it. Great work and talk @@jonhoo ;)

    • @robheusd
      @robheusd 2 роки тому

      If I'm correct, even before materialized views were added, queries, parsed queries and their result sets were chached and stored in the shared pool of the PGA.

  • @serejkus
    @serejkus 5 років тому

    Great talk, thank you!

  • @SahilDawka
    @SahilDawka 5 років тому

    You are a genius! Thank you!

  • @maherkhalil007
    @maherkhalil007 5 років тому

    Can you make tutorial about using rust + databases + web assembly to produce MVC software?

  • @LaurentLaborde
    @LaurentLaborde 4 роки тому

    i came here mostly for the database part of the talk :)

  • @haydarinda
    @haydarinda 2 роки тому

    Thanks. Additionally, 0.8 is a reasonable speed for non-native speakers.

  • @jonabirdd
    @jonabirdd 5 років тому +1

    how can I upvote this talk twice?

    • @KhoaNguyen96
      @KhoaNguyen96 5 років тому +1

      Create another account ... perhaps :)

  • @haystackdmilith
    @haystackdmilith 6 років тому +2

    Great stuff

  • @mykra2939
    @mykra2939 6 років тому

    Thanks a lot, great talk

  • @mxoliv_
    @mxoliv_ 9 місяців тому

    amazing talk!

  • @KFlorent13
    @KFlorent13 2 роки тому

    Thas was really really good.

  • @gregoryterzian8145
    @gregoryterzian8145 4 роки тому

    Nice presentation, and I'm wondering if you've considered a potentially more simple solution to your problem: use multiple fine-grained locks, and use locks optimized for unfairness.
    It seems somewhat of a straw-man to compare a lock-free hashmap, with a non-concurrent hashmap protected by a single Rwlock, in the sense that one wouldn't expect the read throughput to scale linearly with the number of threads(beyond those few initial threads), when those would be contending on the same lock.
    Also I think you're taking a bit of a shortcut when saying that "with short critical sections, the lock acquisition/release becomes the bottle neck".
    With short critical sections, the point is doing work in parallel, and outside of the critical section. So if you compare the time spent acquiring and releasing a lock(entering/exiting the critical section), it should be very small compared to the time spent performing work outside of the critical section.
    The time spent in the critical section might be smaller than the time spent acquiring/releasing the lock, but that's besides the point, the real work is done outside of the critical section, and a thread should only enter the critical section to "publish" the result of parallel work for others to read(and take a snapshot to use in further parallel work?).
    ----
    So I think there are two problems you're trying to solve, and going from "a single lock doesn't work", to "let's use lock-free algorithms", is a big jump from one side of the complexity spectrum to the other, one that is missing the low-hanging fruits in the middle:
    1. When the individual threads in your workflow are making progress via lots of short-critical sections, you can optimize parallelism by allowing threads to re-acquire a lock they've just released, the so-called "unfairness" of the lock.
    2. To improve your ability to scale the throughput of operations on the shared-data when those involve more than just a few threads, one could use many fine-grained locks, for example one around each bucket of data.
    Another benefit of actual locks, is that they can be paired with condvars for signalling, which can be useful if your business logic is not just about achieving lots of parallelism in reads or writes, but also cares about ordering or other types of logical synchronization between threads doing the reading or writing.
    I recommend the following articles:
    This one, which talks about optimizing locks for small size, cheap uncontended operations, and unfairness: webkit.org/blog/6161/locking-in-webkit/
    This one, about locks, condvar, and concurrency in general(it has a nice paragraph on why one should avoid atomics as much as possible, and how locks can actually show better performance): www.chromium.org/developers/lock-and-condition-variable
    And off-course, in Rust, there is already a library offering you the kind of locks you would seem to need: docs.rs/parking_lot/

    • @jonhoo
      @jonhoo  4 роки тому +1

      Hi there! I am aware of all those options - I do research in this area. Fine grained locks do not work well when you have a skewed access pattern, because the threads still end up contending on the shared cache line for the lock semaphore. A shared reader-writer lock helps, but is harder to manage on top of a work stealing runtime where tasks may move between threads and cores decently often. As I say in the talk, this is a data structure specifically optimized for when you do not want stronger consistency, and that relaxed requirement allows you to perform additional optimizations that would otherwise be disallowed.
      That all said, I agree with you that this is not the right approach everywhere, and I also never claimed that it was. There was a lot of content to fit into a one hour talk, and some subtlety inevitably gets lost. I expect anyone with serious needs for concurrency to carefully weigh their options.

    • @gregoryterzian8145
      @gregoryterzian8145 4 роки тому

      @@jonhoo Ok I see, interesting.
      Have you tried this without any shared data?
      I can imagine having a "master thread" own the data, with reader/writers owning a local clone of the part of the data they're interested in. Then the master thread would receive streams of writes from writers, and propagate them as streams of updates to the readers? When a reader wants to access a different part of the data, it's a bit like subscribing to a new stream, where the first chunk is a clone of the initial data, followed by a stream of updates?

    • @jonhoo
      @jonhoo  4 роки тому

      You'd then up with many copies of the underlying data, which would be unfortunate. It would also require readers to synchronize with the writer to ensure they get the latest updates, which could be costly. It also adds significant costs if you have many reader threads accessing the same underlying data, which is often what happens when you have skewed access patterns. With the design I outline in this talk, you could have 16 threads all reading the same key at the same time with no contention among them at all.

    • @gregoryterzian8145
      @gregoryterzian8145 4 роки тому

      @@jonhoo
      I don't think you'd necessarily need the readers to synchronize with the writers, what I mean is that the writers publish updates to the "master thread", and the "master thread" then publishes updates to each readers(subscribed to updates for that particular piece of data).
      Or the readers pull updates, or a combination of both.
      Yes if all readers ask for an update to the same data from the master thread, that's going to end-up being serialized, and so would access to the same key on a concurrent map that offered strong consistency requirements.
      So looser consistency could also be encoded in either the behavior of the readers(don't ask for updates), or behavior of the master thread(pre-publish stale updates to meet request from readers).
      Optionally the writers could publish something to readers that "they've sent an update to the master thread", which would be a signal for readers to pull updates from it(or something like that).
      With regards to the data, a local (partial) "copy" also allows readers to read from it at full speed, since it's local data they own. So there is a trade-off there between copying data and avoiding shared-state(a concurrent map can be fast, but never as fast as a local map).
      That local copy is not going to be up to date, hence the option for readers to get up to speed by pulling updates from a master thread, or some other business logic.
      I understand there are lots of details and it's hard to argue this here, my main point is that you don't necessarily need to encode your parallel logic through a fast concurrent map.
      A fast concurrent map can be great, and it can be even better to simply have threads operate on local data in parallel, especially if you have loose consistency requirements.
      So when I read "16 threads all reading in parallel from the same key on a concurrent map" I think "16 threads all reading in parallel from local data, backed by some stream of updates implemented through message-passing(and hopefully doing some useful work besides reading from local data at a high-throughput)".
      Such a model also potentially allows for much more precise business logic in terms of requesting (consistent) updates, signalling that an update is available and so on...

    • @jonhoo
      @jonhoo  4 роки тому +1

      This discussion is very poorly suited for UA-cam comments :) I agree there are alternative designs, I don't think the design you outline above has any advantages over the one I outline in the talk in the read-heavy workload context, and I don't think it is any simpler. It also comes with disadvantages, such as more cross-thread data movement and duplication. My guess is also that the version I outline will be faster since it allows more sharing of CPU cache state, though that's hard to say that for sure without empirical evidence. Local data is also not any "faster" if it is not concurrently modified, since the data will be marked as Shared in the CPUs, and they will all be allowed to read and cache it. I _also_ don't agree that it allows for "more precise business logic". They both present the same consistency guarantees for updates if you swap on every write. It's true that it buys you condvars/notifications, but if you don't need those, then that is just pure overhead. You are right that it's possible to model this in a sort of actor model with message-passing, but I don't think that is an obviously better design, it's just a different one, which presents a different point in the design space.

  • @WolfBoy2700
    @WolfBoy2700 5 років тому

    Really interesting! you did that very well! Thank you .

  • @thingsiplay
    @thingsiplay 3 роки тому +1

    40 years from now, no one will understand how someone could even write unsafe code.
    You can quote me, if I am wrong.

  • @psychicopus
    @psychicopus 5 років тому

    My inspiration

  • @piyushkatariya1040
    @piyushkatariya1040 5 років тому +1

    Old wine in a new bottle. This is very similar to following the even sourcing model with a materialized view (MV).
    or Simply use PostgreSQL 11 + CitusDB (distributed computing and sharding) + PipelineDB (continuous computation and MV)

    • @jonhoo
      @jonhoo  5 років тому +14

      Noria is actually very different from both of your proposed systems, but this was a relatively high-level talk that didn't talk much about the actual contributions that Noria makes. If you're interested in a more technical discussion, I'd recommend giving the research paper from OSDI'18 a read: www.usenix.org/conference/osdi18/presentation/gjengset

    • @jawad9757
      @jawad9757 2 роки тому

      Interesting that you used the word similar

  • @mq110
    @mq110 6 років тому

    Hi, thanks a lot

  • @PaulSebastianM
    @PaulSebastianM 4 роки тому

    33:00 - yeah but people bashed the author of Actix-Web and even got him to give up maintaining it, because they didn't like unsafe in their Rust libraries. Who's more right here?

    • @jonhoo
      @jonhoo  4 роки тому +1

      That whole affair was never about "not liking unsafe". It was about the maintainer's apparent disregard for the importance of being really careful with unsafe code. Unsafe is a useful tool to be sure, but it's also a sharp tool, and you should only use it if you *really* need it, and only then with great care and extensive documentation and testing (like with miri).

    • @PaulSebastianM
      @PaulSebastianM 4 роки тому

      ​@@jonhoo As I recall, he demonstrated on reddit that the exact code someone was complaining about being unsafe, and which demonstrated how it was unsafe, was indeed unsafe in just one condition (which they both agreed on), but that condition should and WOULD never happen because the only way it could happen, was from explicit triggering from inside of the crate, which no-one would ever do..., or would never get merged to master if PR'ed, and also nobody using the crate could accidentally trigger it either! Seemed convincing to me... He also seemed rather upset at the fact that other programmers thought of him as not knowledgeable enough to decide whether his use of unsafe code was going to end up as actual unsafe behavior. I think he felt/feels rather proud of his library, but when people started constantly throwing the "unsafe" word around his actually quite marvelous library, everywhere... on social media, on boards, on PRs... constantly... I think he felt rather wounded, cornered, beaten and tired. And because that didn't stop, he just gave in to his anger.

    • @jonhoo
      @jonhoo  4 роки тому +2

      This is a discussion poorly suited for UA-cam comments :) Exposing unsafe code as safe even within your library is a foot-gun waiting to happen, and should be avoided wherever possible. It is too easy to forget the exact safety contract while you are working on code elsewhere. It has nothing to do with being "smart enough" to handle it. Unsafe should cause _anyone_ to be extremely careful, and should be contained in safe wrappers with well-documented safety contracts. Knowing when unsafe code is safe, and more generally, what code can trigger undefined behavior, is fraught with corner-cases, and so care is needed. _More_ care than in C/C++ land, because all safe Rust code relies on the unsafe code upholding the global Rust safety guarantees. The maintainer seemed largely unsympathetic to the potential dangers of unsafe, and required proof that an _end-user_ could cause undefined behavior, which is a very low bar in a core library like actix. In this, and other, situations, the maintainer was also not willing to accept patches that remove the source of the unsoundness, even when they were offered. That does not, of course, excuse harassing the maintainer, but as I recall that was an isolated incident, and not applicable to the community response overall.
      In any case, this has been hashed through enough times in a million places online already, so I do not think continuing this discussion here is worthwhile, as it's not really relevant to the message of the presentation. Feel free to email me if you want to continue one-on-one though.

  • @emdeization
    @emdeization 2 роки тому

    Amazing

  • @batorshikh.baavgaikhuu
    @batorshikh.baavgaikhuu 5 років тому

    Interesting!

  • @jwyliecullick8976
    @jwyliecullick8976 5 років тому

    Bravo.