The New Best Way To Search for Values in .NET 8

Поділитися
Вставка
  • Опубліковано 23 лис 2023
  • Use code BLACKFRIDAY23 and get 40% off any course and 20% off any bundle on Dometrain: dometrain.com/courses?coupon_...
    Get the source code: mailchi.mp/dometrain/vi8f453djn
    Become a Patreon and get special perks: / nickchapsas
    Hello everybody, I'm Nick, and in this video, I will introduce you to a new type added in .NET 8 called SearchValues which makes searching for value matches extremely efficient and fast. In fact it is the fastest way to do this in .NET.
    Workshops: bit.ly/nickworkshops
    Don't forget to comment, like and subscribe :)
    Social Media:
    Follow me on GitHub: github.com/Elfocrash
    Follow me on Twitter: / nickchapsas
    Connect on LinkedIn: / nick-chapsas
    Keep coding merch: keepcoding.shop
    #csharp #dotnet #dotnetconf

КОМЕНТАРІ • 144

  • @lordmetzgermeister
    @lordmetzgermeister 8 місяців тому +246

    I'm adding the quote "It's small but extremely efficient" to my list of excuses in the bed.

    • @Delphi80SVK
      @Delphi80SVK 8 місяців тому +6

      you killed it with this one :-D

    • @vytah
      @vytah 7 місяців тому +2

      Does "efficient" mean it finishes very fast?

    • @handlez411
      @handlez411 7 місяців тому

      69

  • @zachemny
    @zachemny 8 місяців тому +159

    Performance comparison with a naive HashSet approach would also be interesting.

    • @ardavaztterterian1211
      @ardavaztterterian1211 8 місяців тому +6

      Yes. That's what I was thinking

    • @Crozz22
      @Crozz22 8 місяців тому

      I did another benchmark with hashset (check foreach character in ExampleText whether it is in the hashset, return on first mismatch). And it was about the same speed as the ExampleText.AsSpan().IndexOfAnyExcept(Base64CharsArr) == -1; .. So SearchValues still insanely faster

    • @Assgier
      @Assgier 8 місяців тому +11

      Also a solution using LINQ should be joined into the mix.

    • @lordmetzgermeister
      @lordmetzgermeister 8 місяців тому

      @@Assgier If you deeply care about performance, you should not use LINQ.

    • @volan4ik.
      @volan4ik. 8 місяців тому +6

      ​@@Assgier it would be less efficient than the simple loop sample

  • @keithprice3369
    @keithprice3369 7 місяців тому +32

    It would be helpful if, when you share cool tips like this, you included a handful of practical use cases. I watch your videos and about half the time I leave thinking, "That's cool, but when would I use this?"

    • @jannowak2413
      @jannowak2413 7 місяців тому +1

      But in this video he actually did? He presented how microsoft is using it. Which in my opinion is even more valuable then imaginary examples :D

    • @zirexpl6395
      @zirexpl6395 7 місяців тому +3

      I have the same feeling everytime i watch his videos of performance 😅

    • @jeroboam4486
      @jeroboam4486 7 місяців тому

      It just means you have no use for it at the moment. But in my case I can see how it could massively improve a project at work.

  • @der.Schtefan
    @der.Schtefan 8 місяців тому +19

    What I find so crazy is that SearchValues has so many cases it handles, with so many classes, jumps, method calls, many years ago this would have been slow on .NET. But nowadays all of them are so aggressively inlined, and so JIT optimized, they use SSE3, packed instructions, SIMD, etc. that they still complete in an average of 6 CPU clock cycles (which of course is not true. it is only like this in the benchmark case, but due to hyperthreading and pipelining, speculative execution, etc. the throughput delay will be probably around 20 clock cycles).

    • @SirusStarTV
      @SirusStarTV 7 місяців тому

      By integrating extended processor instructions that almost no one knows about behind the scene anyone gets the benefits that those instructions provide

  • @AlFasGD
    @AlFasGD 8 місяців тому +30

    The performance improvements and tools they are adding in newer .NET versions are top-notch

  • @ibrahimhussain3248
    @ibrahimhussain3248 8 місяців тому +1

    I do a lot of searches within List. Is there a similar approach we can use with List?

    • @gfx2
      @gfx2 7 місяців тому

      SearchValues only supports byte and char with .NET 8, and string with .NET 9. Getting the Span of your List and you can use SearchValues if your T is any of those 3 types

    • @AKW91
      @AKW91 7 місяців тому

      If it's ordered, List provides a BinarySearch function with complexity O(log N), else there is only a looping search with an average complexity O(N). If your searches on the list only requires a specific property, consider using a Dictionary instead, accessing an Element by its key has an insanely fast complexity of O(1)

  • @tomchapman128
    @tomchapman128 7 місяців тому +1

    I love the example string you used for the valid base64 string in the benchmarks lol

  • @nocturne6320
    @nocturne6320 8 місяців тому +13

    Missed a chance to compare it with a compiled regex, would like to see the comparison there.

    • @J_i_m_
      @J_i_m_ 7 місяців тому +1

      Regex and performance.... it will be under the bottom

    • @nocturne6320
      @nocturne6320 7 місяців тому

      @@J_i_m_ Regexes can now be source generated into a function, even the Regex class has an option to compile the regex into a function at runtime, so it's never interpreted -> should be pretty fast

    • @_mihazupan
      @_mihazupan 7 місяців тому +2

      Regex will use SearchValues internally now where it makes sense. For such a simple case you'll just pay a little bit of overhead to call through the Regex logic instead

  • @sam2943
    @sam2943 4 місяці тому

    Thanks for bringing this golden nugget to people's attention.

  • @YuriBez2023
    @YuriBez2023 8 місяців тому +10

    I'd like to see the benchmarks for 1k, 10k, 100k, 1Mb size strings. I wonder if it scales.

    • @SteffenBauer
      @SteffenBauer 8 місяців тому +4

      You just can download the source code and try it ;)

  • @keyser456
    @keyser456 8 місяців тому +3

    I downloaded the sample (asking for email is kind of... ehh... but w/e) to do some comparisons to HashSet implementation (I see in the comments I'm not the only one who instantly thought of that). Rider is complaining (red squiggly error) about the initializer syntax on the character array, but dotnet builds it without errors. I have the latest .NET SDK. Are you using an EAP version of Rider maybe? I've been holding off hoping the new stuff makes it into the mainline soon.

  • @davepusey
    @davepusey 8 місяців тому +6

    I'd like to see a benchmark of doing this with a regex as that is how I would have likely implemented that method.

    • @jongeduard
      @jongeduard 8 місяців тому

      Maybe this story is even going to be opposite. It would not surprise me if they are using the SearchValues type inside the dotnet Regex engine as well.

  • @duramirez
    @duramirez 7 місяців тому

    Hey Nick, any tips on "warm up" services after configuring them in DI ? I have a thread that needs to keep running on my singleton, but it is only started when a new instance of the service is created, which happens when I access a method in the controller, but I want it to happen on the API initialization process. 🤔 I saw some solutions on Stackoverflow, but I found them dirty and too coupled. Help, please. :D Also my project is .net 7 🙂

  • @davestorm6718
    @davestorm6718 8 місяців тому

    My God! This has applications everywhere!

  • @markharwood6794
    @markharwood6794 8 місяців тому +10

    I think the "hat" char is called a "Carat" :)

    • @denissorn
      @denissorn 8 місяців тому

      Lol I thought it's 'carrot'?

    • @nick066hu
      @nick066hu 8 місяців тому +4

      Caret

    • @alfflasymphonyx
      @alfflasymphonyx 8 місяців тому

      in French, it is called Circonflex accent

    • @SteffenBauer
      @SteffenBauer 8 місяців тому +1

      its called "Circumflex". Caret is a old name for it. en.wikipedia.org/wiki/Caret_(proofreading)

  • @klaxxon__
    @klaxxon__ 8 місяців тому +5

    Do Regexes used this internally? I would expect at least a compiled regex would either use this internally, or build up something internally when checking for character ranges. Using a Regex would have been my first instinct when performing this check (not knowing about SearchValues before watching this video)

  • @brandonpearman9218
    @brandonpearman9218 7 місяців тому +1

    When I need to save a few nano seconds, I will definitely use this.

  • @erikgook598
    @erikgook598 8 місяців тому +8

    isn't those characters in a continuous range in ascii? You could just test for validity by casting them to ints and then do arithmetic instead of looping through an array on each character in the input.

    • @mk72v2oq
      @mk72v2oq 8 місяців тому +7

      You don't even need to cast. "if (c >= 'A' && c

    • @erikgook598
      @erikgook598 7 місяців тому

      But perhaps it's more complicated if the text is not utf8 @@mk72v2oq

    • @mk72v2oq
      @mk72v2oq 7 місяців тому +3

      @@erikgook598 it can't be UTF-8 in the first place. Dotnet strings are UTF-16 and 'char' size is fixed 2 bytes. Considering that in strings you enumerate over whole characters, not over raw bytes, this construct always works.

    • @ChyK24
      @ChyK24 6 місяців тому

      Actually the `SearchValues.Create` method takes this into account. Try to call `SearchValues.Create("ABCDEFGHIJKLM");` or `SearchValues.Create(@"UVWXYZabcdef[\]^_`abcd");` and see in debugger that the actual class that is instantiated is called "RangeCharSearchValues" while for `SearchValues.Create("abcABC");` it is `AsciCharSearchValues`.
      I guess the execution time of `SearchValues.Create` can be quite substantial. But this method is really meant to be called once to save the result in static field.

    • @tmnt9001
      @tmnt9001 6 місяців тому

      ​@@mk72v2oq actually it would still work but not for the reason you said. you can have characters that are bigger than 16 bits, which means they wouldn't fit in a single char. But that doesn't matter because all base64 characters fit in a single char and the comparison would fail in the expected situation (but this also means that it would work with 8 bit characters).

  • @nvmuzrowrihk
    @nvmuzrowrihk 8 місяців тому +12

    10:12 It's actually insane how many implementations they have depending on the characteristics of the data.

    • @kRySt4LGaMeR
      @kRySt4LGaMeR 8 місяців тому +2

      you'd be surprised how the C++ stdlib looks like :P

    • @reubendeleon-ji6up
      @reubendeleon-ji6up 8 місяців тому +6

      You need this because of how intrinsics work. Your cpu probably supports 128 and 256, maybe even 512. This means you need that many values preloaded into a vector (think span, but allocated for the cpu). The cpu then has a special instruction (single instruction) that will perform an operation on all 128-512 values in one tick. This is actually the craziest part. The cpu has to physically lay out each unit in logic gates somewhere physically in the cpu. So that logic (other than the optimizations for 1, 2 and 3) are just logic to create sets of those vectors efficiently. Most of intrinsic support got added in .net 6, and really utilized in .net 7. They’re just finding every last nook to stuff it in with 8.

  • @tetheredsun950
    @tetheredsun950 7 місяців тому

    Am I missing something or is it just implemented for char and byte? It would be lovely to have it for ints.

  • @MrCakeLP
    @MrCakeLP 8 місяців тому +10

    Your benchmarks never gonna give us up.

  • @DucaTech
    @DucaTech 8 місяців тому +1

    can we compare this to a regex?

  • @georgebelletty7861
    @georgebelletty7861 8 місяців тому

    Very cool

  • @JohnTasler
    @JohnTasler 7 місяців тому

    I would also like to see this benchmarked using AOT. Most of your performance videos, actually.

  • @oddikaro8236
    @oddikaro8236 8 місяців тому +1

    I guess the new search algorithm was part of a Microsoft interview for a junior position. Interesting way to improve .NET.

  • @fernandoluiz4593
    @fernandoluiz4593 8 місяців тому

    It would be nice if you also compared with regex match

  • @cheebadigga4092
    @cheebadigga4092 8 місяців тому

    Thank you!! This is awesome!

  • @martinprohn2433
    @martinprohn2433 7 місяців тому

    I really wonder, how a realistic approach would have performed. I mean, why would you iterate over 64 chars (having 64 comparisons), when a simple expression like "ch >= 'a' && ch

    • @_mihazupan
      @_mihazupan 7 місяців тому

      SearchValues can process multiple characters at once. You can try to benchmark your idea and you'll see it's still much slower when looking through a longer span.

  • @uncommonbg
    @uncommonbg 8 місяців тому +3

    Honestly, I like this more because of the name they chose than the performance itself. The performance might come in handy, but more importantly I believe the new type will make the code read better.

  • @saniel2748
    @saniel2748 7 місяців тому +2

    Love seeing such a high level language taking performance probably more seriously than any of its competitors
    To me last thing to hope for would be *some* manual memory management, like Go's arena allocators which would be unsafe but also very useful for applications such as games
    Like having per frame arena that gets wiped for temporary data or arena for spawning a bunch of objects really quickly and then destroying all at once

  • @williamliu8985
    @williamliu8985 8 місяців тому +16

    IMO, much better feature than primary constructor in net8...

    • @lordmetzgermeister
      @lordmetzgermeister 8 місяців тому +9

      There are no primary constructors in .NET 8, that's a C# 12 feature. Note the difference between those two.

    • @TheMonk72
      @TheMonk72 8 місяців тому

      ​@@lordmetzgermeisterC#12 depends on and was released with dotnet 8. When you start a new project in dotnet 8 you're using C#12 by default. So they're basically the same thing as far as most people are concerned.

  • @ChaimMosheStern
    @ChaimMosheStern 7 місяців тому

    I'm surprised you didn't try hash set of strings

  • @winterboltgames
    @winterboltgames 8 місяців тому

    Are search values similar to database indices?

    • @GumbootMan
      @GumbootMan 8 місяців тому +1

      They're similar in that they are both trying to optimise the problem "Do values A, B, C exist in data set D?". But SearchValues optimises A, B, C while a database index optimises data set D. The closest thing to a database index in .NET is probably SortedSet.

    • @winterboltgames
      @winterboltgames 8 місяців тому

      @GumbootMan makes sense. Thanks for the explanation!

  • @Agge_24
    @Agge_24 7 місяців тому

    Γειαα σου είμαι η μαθητρια τις μαμάς σου ❤❤❤

  • @kevinbro518
    @kevinbro518 8 місяців тому +3

    Is it faster than a source generated regex?

    • @nickchapsas
      @nickchapsas  8 місяців тому +3

      Yeap

    • @Jellow2202
      @Jellow2202 8 місяців тому +1

      @@nickchapsas actually the regex source code generator uses SearchValues and if you have larger input data, the overhead of the regex stack gets less noticeable and their performance converge. I think regex is a good middle ground, with its additional flexibility if you expect larger inputs.
      Using your benchmark code I added a generated regex using "^[A-Za-z0-9+/]*={0,3}$" which also only allows up to 3 equals at the end, and tested with a 5k character string:
      IsBase64_SearchValues: 124 ns
      IsBase64_CharArray: 179 ns
      IsBase64_Naive: 51,231 ns
      IsBase64_RegexGenerated: 147 ns

    • @carldaniel6510
      @carldaniel6510 8 місяців тому +1

      In .NET 8 source generated Regex uses SearchValues automatically where it's beneficial!

  • @CharlesBurnsPrime
    @CharlesBurnsPrime 8 місяців тому

    I can scarcely believe the performance difference.

  • @kondziossj3
    @kondziossj3 8 місяців тому +4

    I would like to have a project where I would need to care about such optimizations
    ehh... Maybe... someday

  • @ardavaztterterian1211
    @ardavaztterterian1211 8 місяців тому +10

    So, why don't you mention what kind of data structure is used behind the scenes?

    • @fusedqyou
      @fusedqyou 8 місяців тому

      Not the point of the video

    • @metaltyphoon
      @metaltyphoon 8 місяців тому

      Seriously… it’s probably the more interesting parts

    • @stefano_schmidt
      @stefano_schmidt 8 місяців тому

      what kind of data structure is used behind the scenes?

    • @oddikaro8236
      @oddikaro8236 8 місяців тому

      The internal algo probably uses several algorithms for different scenarios and all of them probably come from state-of-art papers. So, yeah, Nick is going to read 10 papers for a 10min video.

    • @ardavaztterterian1211
      @ardavaztterterian1211 8 місяців тому

      So what's the point of the video? Just that there is a new type which is fast? You would already know that there is a new type if you are following the release drafts.@@fusedqyou

  • @ciorchinos
    @ciorchinos 8 місяців тому +3

    this can be used for a web crawler to collect data for an AI learning algo. would be interesting to showcase a ML algo that learns something form a given document and cross reference to some online sites.

  • @KPAMCATEJlb
    @KPAMCATEJlb 8 місяців тому +4

    XcQ... so familiar symbols :D

  • @antonmartyniuk
    @antonmartyniuk 8 місяців тому

    I knew spans have something to do with this 😅

  • @NoName-1337
    @NoName-1337 7 місяців тому

    I think they use a Set in the background.

  • @wlbmonizuka
    @wlbmonizuka 7 місяців тому

    WTF why naive realisation is not a HashSet ??

  • @cheezyskipper
    @cheezyskipper 8 місяців тому +7

    He rick rolled us.

  • @neilbroomfield3080
    @neilbroomfield3080 7 місяців тому +3

    Shame you didn't compare to a HashSet 🤷‍♂️

  • @Baby4Ghost
    @Baby4Ghost 8 місяців тому

    Audio synch issues?

    • @nickchapsas
      @nickchapsas  8 місяців тому +5

      None that I can spot

    • @Baby4Ghost
      @Baby4Ghost 8 місяців тому +1

      @@nickchapsas might be local on my pc, LOVE the vid though! Will be using this when appropriate. That difference is HUGE!

  • @KingOfBlades27
    @KingOfBlades27 8 місяців тому

    I guess using this is more efficient than using regex when doing these kinds of comparisons? 🤔

    • @nickchapsas
      @nickchapsas  8 місяців тому +1

      Oh yeah regex is not even close

    • @KingOfBlades27
      @KingOfBlades27 8 місяців тому

      @@nickchapsas Damn. Good to know. Thanks for answering! 💪

    • @davidmartensson273
      @davidmartensson273 8 місяців тому +1

      @@KingOfBlades27 Even if, as mentioned in an other comment, regex will use this technique when applicable, since regex supports so many other things it will add overhead compared to the raw implementation and that is very often the case, high performance code is very often not elegant or easy to read so whenever MS can hide away some of that it helps everyone else build better code that is both fast and readable :)

  • @eejdownes
    @eejdownes 8 місяців тому

    I hate that my brain can recognize the first example string as Base64 in O(1) time from internet survival optimizations

    • @davidmartensson273
      @davidmartensson273 8 місяців тому +1

      The brain is extremely good at pattern matching in parallel, its one of its main features :)

  • @qwerty01453
    @qwerty01453 7 місяців тому

    Nick your videos are great but please speed them up! It took 5 minutes to get the interesting point! It’s too much!

  • @K1llRay64
    @K1llRay64 4 місяці тому

    don't be junior, explain how it works

  • @XmasApple
    @XmasApple 8 місяців тому

    Lmao I was Rick rolled

  • @TioPew
    @TioPew 7 місяців тому

    It looks a example made by ChatGPT transcribed into a video.

  • @Kingside88
    @Kingside88 8 місяців тому +2

    Maybe useful for Libraries but not for daily business

    • @RakodilGena
      @RakodilGena 8 місяців тому +1

      Useful in any case when you have some list of chars that should not appear in username or anywhere else. Build this once, check faster always

    • @davidmartensson273
      @davidmartensson273 8 місяців тому +1

      As mentioned, MS uses it within their methods, so even if you do not actively use it you can still benefit from is, like faster regex or HTTP methods, or serialization, deserialization, encryption and many many more examples.

    • @Kingside88
      @Kingside88 7 місяців тому +1

      @@davidmartensson273 I didn't think about that, but of course you're right

  • @wboumans
    @wboumans 8 місяців тому +2

    another feature that 'simple' devs wont ever use. The compiler needs to be smart and transform stuff to this.

    • @krccmsitp2884
      @krccmsitp2884 8 місяців тому +2

      What is a "simple dev" anyways? I can imagine several uses cases for two of my current projects.

    • @Dhaiky
      @Dhaiky 8 місяців тому

      @@krccmsitp2884 Can you tell me of some of these use cases? I'm just curious where this would be used for without being an replacement for some Regex

    • @davidmartensson273
      @davidmartensson273 8 місяців тому +3

      Since MS is adding this under the hood in .NET every dev will benefit to some degree even if they do not use it them self.
      And also the compiler already makes many rewrites of your code for better performance as has been shown using tools that can display the "lowered" code.
      Often convenient methods gets changed into more efficient before it really starts compiling to simplify the compiler and make it easier to reuse highly optimized solutions instead of re-implementing almost the same algorithm slightly different. "while" is often changed into a for loop, foreach can also be changed into a for loop or even to use if statement and goto ;)
      Its all about if its a common type of expression that warrants the extra time, and since many developers use examples from the web like from Stack overflow or other, some patterns will be very common and MS can then try to find such and rewrite them for better performance.

    • @TheMonk72
      @TheMonk72 8 місяців тому

      ​@@davidmartensson273opposite is true: for is implemented as while, decompilers just recognise for-like constructs.

    • @davidmartensson273
      @davidmartensson273 8 місяців тому

      @@TheMonk72 yes your right, I mixed them up :). But the concept still holds true :)

  • @youssefwael8
    @youssefwael8 8 місяців тому

    First

  • @JackBauerDev
    @JackBauerDev 8 місяців тому

    You cant get paid to write that... Just spend free time contributing to the runtime... Did you see this in the allow whitespace PR? I wish runtime was hiring...

  • @kumailn7662
    @kumailn7662 7 місяців тому

    Sorry your courses are way more expensive even after blackfriday discount.

  • @AndreViannaRJ
    @AndreViannaRJ 8 місяців тому +28

    So... Microsoft finally discovered binary search.

    • @AvenDonn
      @AvenDonn 8 місяців тому +2

      You jest, but have a look what switch actually compiles down to

    • @der.Schtefan
      @der.Schtefan 8 місяців тому +32

      Its not binary search, it is SIMD, SS3, packed instructions. Before writing, you should ED-UC-ATE-YOUR-SELF.

    • @yurkoflisk
      @yurkoflisk 8 місяців тому +9

      Binary search is irrelevant here, what it boils down to in most cases is basically boolean table lookup in one form or another, e.g. maybe the table itself is shoved into some YMM register (it's exactly 32 bytes, so can hold 256-bit mask) if possible, though the easiest way (and what I would do manually) is to just create 256-element boolean array.

    • @Kavisexe
      @Kavisexe 7 місяців тому

      😂 Max Microsoft Hater;

  • @Agge_24
    @Agge_24 7 місяців тому

    Γειαα σου είμαι η μαθητρια τις μαμάς σου ❤❤❤