Given readability and complexity I would only consider using the method in hot paths that really needed it. But good info to know.
great, now it needs to be simplified with syntactic sugar
No, efficient code that uses anything related to "unsafe" or "marshal" must be complicated and mind-blowing to show off in front of C# newcomers how good you are. This is what we never have to change🙂
Yep, the very first thing I'd do is create extension methods.
public static V GetOrPut<K, V>(this Dictionary<K, V> source, K key, Func<V> valueGenerator) where K : notnull {}
@volan4ik. literally none of that is complicated but ok
@saniel2748 literally it's not, but might look like it is to people not familiar with that mechanism
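For what it's worth, here is a minimal sketch of what such an extension could look like on .NET 6 or later. The `GetOrPut` name comes from the comment above; the `DictionaryExtensions` class name and the `Func<V>` factory are my assumptions, and this is not how the video or the BCL does it. Mind the caveats in the comments.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class DictionaryExtensions
{
    // Hypothetical helper: returns the existing value for the key, or stores and
    // returns the value produced by valueGenerator, looking the key up only once.
    // Caveats: valueGenerator must not add to or remove from the same dictionary,
    // and if it throws, the default entry that was just inserted stays behind.
    public static V GetOrPut<K, V>(this Dictionary<K, V> source, K key, Func<V> valueGenerator)
        where K : notnull
    {
        ref V? slot = ref CollectionsMarshal.GetValueRefOrAddDefault(source, key, out bool exists);
        if (!exists)
        {
            slot = valueGenerator(); // writes straight into the dictionary's entry
        }
        return slot!;
    }
}
```

Usage would be something like `cache.GetOrPut(id, () => LoadPlayer(id))`, where `LoadPlayer` is a made-up loader. The delegate call has a cost of its own, so this gives back some of the speed-up in exchange for readability.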
I value maintainability over everything else and probably won’t be using this, except for very demanding and performance sensitive loops
Thanks! This helped me shave about 15% off the runtime of the pathfinding code for the game I'm working on!
Haskell has the most elegant solution to this that I'm aware of. Haskell uses the function alter with a type like this (translated to C# syntax and mutable dictionaries)
void alter<K, V>(Dictionary<K, V> dict, K key, Func<Option<V>, Option<V>> f)
Now, the argument that is passed to `f` is `None` iff the key was not found and `Some(theValueAtTheKey)` otherwise, and returning `None` from f will remove the binding from the map.
There is also alterF that allows f to return in an arbitrary functor, but that doesn't translate to C# all that well.
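To make the alter idea concrete, here is a rough C# sketch. C# has no built-in option type, so `Option<T>` below is a hypothetical minimal one, and `DictionaryAlter` is a made-up class name. It does a plain lookup followed by a write, so it shows the semantics only, not a single-lookup implementation.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical minimal option type, only here for illustration.
public readonly struct Option<T>
{
    private readonly T? _value;
    public bool HasValue { get; }
    public T Value => HasValue ? _value! : throw new InvalidOperationException("None has no value");

    private Option(T value) { HasValue = true; _value = value; }

    public static Option<T> Some(T value) => new Option<T>(value);
    public static Option<T> None => default;
}

public static class DictionaryAlter
{
    // f receives None when the key is missing and Some(value) otherwise.
    // Returning None removes the key; returning Some(v) stores v.
    public static void Alter<K, V>(this Dictionary<K, V> dict, K key, Func<Option<V>, Option<V>> f)
        where K : notnull
    {
        Option<V> current = dict.TryGetValue(key, out V? existing)
            ? Option<V>.Some(existing)
            : Option<V>.None;

        Option<V> next = f(current);

        if (next.HasValue)
        {
            dict[key] = next.Value;
        }
        else
        {
            dict.Remove(key);
        }
    }
}
```

With this, "increment or start at zero" becomes `counts.Alter(key, o => Option<int>.Some(o.HasValue ? o.Value + 1 : 0))`, and `dict.Alter(key, _ => Option<V>.None)` deletes the binding.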
I'm almost sure that last time I checked there was a "TryAdd" function on Dictionary, could it be used too?
Yes.
True. Thinking about it, internally it could be optimized to call the dictionary only once. The only difference is that with a ref you can handle structs as values more efficiently, but to be fair, who uses structs as values in dictionaries anyway? It's rare, and it's rarer still to need to make it more performant because it has become a performance bottleneck.
In fact, the TryAdd method, which directly calls the private TryInsert method, has the comment "NOTE: this method is mirrored in CollectionsMarshal.GetValueRefOrAddDefault below", so in this case TryAdd makes more sense for clarity.
@Eternal--Sunshine TryAdd takes a precomputed value. What if that value is expensive and you want to compute it only if you need to insert it into the dictionary? In that case a factory method/delegate parameter would come in handy. But doesn't that represent potential overhead for dictionary access? Just thinking...
@brianviktor8212 Nobody does, precisely because of the problems listed in the video. People have to go as far as creating custom crappy dictionary types because of this.
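A small sketch of the "precomputed value" point, in case it helps. `TryAddDemo`, `Expensive` and the call counter are made-up names for illustration. The argument to TryAdd is an ordinary method argument, so it is evaluated before TryAdd gets a chance to check the key.

```csharp
using System.Collections.Generic;

public static class TryAddDemo
{
    // Counts how often the (hypothetical) expensive computation actually ran.
    public static int ExpensiveCalls;

    // Hypothetical stand-in for a costly computation.
    public static string Expensive(int id)
    {
        ExpensiveCalls++;
        return "value-" + id;
    }

    // Expensive(id) runs on every call, even when the key already exists
    // and TryAdd ends up returning false.
    public static bool AddEagerly(Dictionary<int, string> dict, int id)
        => dict.TryAdd(id, Expensive(id));
}
```

So TryAdd is a good fit when the value is cheap or already at hand, and less so when building it is the expensive part.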
Would've been nice if you'd added a benchmark for the TryAdd method.
Why do we need to use ContainsKey before assigning or using a Dictionary value?
We can assign directly
var x = new Dictionary<int, string>();
x[123] = "abc";
x[123] = "def";
It will add or replace the value
Because I don’t want to add the value if it already exists. It is a valid use case in many scenarios. Consider that a player in an online game needs to join the queue to find a match to play. If they don’t exist then we add them to the dictionary. If they do exist we show the "you are already in the queue" message.
@nickchapsas Yep, but in your case it's not a performance matter anymore. In the case of a reference type object being added to the dictionary, it depends on:
1) Whether the object already exists (has been created) when the check is performed
2) The actual object stored in the dictionary should always be the same (hash code evaluated on the object's fields), so there is no reason to be afraid of replacing the same instance (unless it has to be created before the add operation, see point 1)
3) If you only need to inform the caller that the object already exists, it's better to have two methods, one for checking and one for adding/replacing, and use the one appropriate for your scenario
It's kind of a tricky question you've raised, and in the case of reference types I would not recommend using this approach. But in the case of structs, where non-obvious copy operations exist, it's possible, but only if properly tested and measured.
@keksoid4 I think these methods would be of help when considered for multi-level dictionaries.
@keksoid4 Objects with the same key are allowed to exist in your application, just not in the dictionary.
Depending on your requirements you can handle the dictionary with "overwrite/replace" allowed, or not. x3 gave the code for 'always overwrite', Nick for 'never overwrite'.
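Side by side, the two policies look roughly like this. `OverwritePolicies`, `Put` and `JoinQueue` are names I made up, and the queue message is borrowed from the reply above.

```csharp
using System.Collections.Generic;

public static class OverwritePolicies
{
    // 'Always overwrite': the indexer adds the key or replaces its value.
    public static void Put(Dictionary<int, string> dict, int key, string value)
        => dict[key] = value;

    // 'Never overwrite': TryAdd leaves an existing value alone and returns false.
    public static string JoinQueue(Dictionary<int, string> queue, int playerId, string name)
        => queue.TryAdd(playerId, name)
            ? "joined the queue"
            : "you are already in the queue";
}
```

Both do a single lookup per call; which one is right depends only on whether replacing an existing entry is acceptable.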
Multiple things can have the same hash, just as multiple passwords can have the same hash (part of brute forcing is that you don't need to find the exact password, just one with the same hash). The number of collisions is just low (though it does go up quickly, as in the birthday problem).
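A quick way to convince yourself that equal hashes are not equal keys: a sketch with a deliberately terrible hash function. `CollidingKey` and `CollisionDemo` are hypothetical types made up for this example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code is the same for every instance.
public sealed class CollidingKey : IEquatable<CollidingKey>
{
    public string Name { get; }
    public CollidingKey(string name) => Name = name;

    public bool Equals(CollidingKey? other) => other is not null && other.Name == Name;
    public override bool Equals(object? obj) => Equals(obj as CollidingKey);
    public override int GetHashCode() => 42; // every key collides on purpose
}

public static class CollisionDemo
{
    // Both keys share hash code 42, yet the dictionary keeps them as separate
    // entries because Equals is what finally decides whether two keys match.
    public static Dictionary<CollidingKey, int> Build()
    {
        var dict = new Dictionary<CollidingKey, int>();
        dict[new CollidingKey("a")] = 1;
        dict[new CollidingKey("b")] = 2;
        return dict;
    }
}
```

The dictionary stays correct with colliding keys; it just gets slower, because colliding entries have to be compared one by one.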
Thanks for sharing this. I will continue with the old way. Why? It's easier and makes more "readable" sense to junior developers who will be supporting my code.
I would like to see a performance comparison where the dictionaries are operating in a multi-threaded environment and we need to make sure that this code executes in a critical section.
Nice video, I've encountered the double lookup as well, and could have used this code several times before. Does the Unsafe class mean I need to compile with unsafe enabled, btw?
you don't
Why not add this as an extension method similar to ConcurrentDictionary.GetOrAdd(TKey key, Func<TKey, TValue> valueFactory)?
Because that delegate slows it down. I could see that being useful if C# eventually adds value delegates, though.
@protox4 The best solution would be if the JIT could inline the delegate
So I use Unity Engine, and I don't think we have access to these functions unfortunately, because Unity is stuck on an older .NET version. But do you think there are some internal functions that exist in the older .NET versions that I can do some reflection on to get access to?
Btw, I really enjoy watching these videos where you show hidden gems of devious things you can do in C#. It gets me excited!
You'd have to check whether Mono has some, since those might be implementation details and aren't standardized.
Unity might switch to .NET 6 soon-ish, though I sadly don't know when.
Really nice, but curious what would happen if the dictionary changes during the lookup and setting? Undefined behavior?
potentially a seg fault ig
It's a collection changed exception.
Can't we use a lock in this scenario?
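On the lock question, a minimal sketch, assuming one private lock object guards every access to the dictionary. `LockedCounter` and `Increment` are made-up names. The CollectionsMarshal documentation says items should not be added or removed while the ref is in use, and as far as I can tell nothing checks that for you, so I wouldn't count on getting an exception. Here the ref never leaves the lock.

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;

public sealed class LockedCounter
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, int> _counts = new Dictionary<string, int>();

    // Adds the key with 0 if it is missing, then increments it in place.
    // The ref is obtained and used entirely inside the lock, so no other
    // caller of this class can change the dictionary while it is alive.
    public int Increment(string key)
    {
        lock (_gate)
        {
            ref int slot = ref CollectionsMarshal.GetValueRefOrAddDefault(_counts, key, out _);
            slot++;
            return slot;
        }
    }
}
```

Whether the lock eats the gain from the single lookup is something only a benchmark on the real workload can answer.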
I really appreciate all of your videos!
9 minutes ago. What a great start to my afternoon.
"Dictionary" and "performance" in the title??? SOLD
It's a cool idea, but I can't condone using unsafe code. I am sure there are good use cases for it - but you need to weigh the risk before considering a path like this.
You are right, but Nick is here to push our boundaries.
It's awesome :), but I hope I will never find such tricky and magical code in my daily work. Even ChatGPT won't help me understand what the heck is going on here XD.
I wanted to know what happened to you after the ads? Why did you change your hoodie?
7:45 In between the time GetValueRefOrAddDefault is called and the time valOrNew is updated, does it mean some other piece of code could get the entry from the dictionary and get a null value?
Yes, I imagine so. The dictionary is not locked in any way, and the marshal call only inserts a default. Would be "fixed" with a factory method instead of default.
There is a TryAdd method on dictionary now. Idk if this is necessary.
With TryAdd, you cannot create the element only if it doesn't exist. You have to create it in any case.
Why do they not handle it like ConcurrentDictionary, with a GetOrAdd function, or maybe with a factory method instead of default?
because it would need to allocate the delegate
@antonofka9018 right, i forgot about that.
The struct example seems more a warning about misusing structs than anything else.
Could we lock the dictionary and run this code then unlock it?
Hi Nick, I follow and like many of your videos. Dumb question: would you ever consider making videos about Golang or Java? :p
Stay tuned for the first of April. There will be a Java video going up 😂
🤣
@nickchapsas awesome!
How are you able to see into the Dictionary's set definition? (3:29) Is this something specific to this IDE or can it be done in Visual Studio also?
This is one of the features of ReSharper or the Rider IDE (in his case) :)
Pretty sure this has worked in Visual Studio as well since ... forever. There's a "Locals" window while debugging, or the inline code has context sensitive drop-downs (like you're seeing in this video), or you can probe stuff live with code including Intellisense from the "Immediate" window. I just switched to Rider about a month ago and it's very impressive, but a lot of this stuff can also be found in VS (including Community version).
@keyser456 To be fair, it doesn't always work well in VS, which is probably what the OP is referring to. For example, many times for the stdlib it just shows you the method's "signatures" and not the body for some reason.
So how would one guarantee, for instance in a web API, that the dictionary value won't change during the lookup, without lowering performance?
IReadOnlyDictionary can guarantee no modifications, or you can create your own wrapper that locks all operations on a dictionary so only one thing can be done to it at a time. ConcurrentDictionary also has a GetOrAdd method that does what is shown in the video
Locks, single thread, changes only during initialization, etc. Either way it doesn't look pretty
Aren't locks affecting performance, potentially cancelling the gain that was created by this way of setting the value?
ref var x = ref blahblahblah f(var out y, struct butOnlyOnATuesday). LOL, now take that and syntactic-sugar-ize it. Been coding for 40+ years and I can't help but feel that over time all the things that made C# nice and readable are being obfuscated and re-obfuscated and soon it may read again like (my beloved but almost incomprehensible) C ! Having said that, thanks for the video, good one!
was close to trying this out today already, but... can't use refs in async code
(well I guess it'd be running into the aforementioned issues when updating the dictionary potentially as well)
but still sad :D
Does it work in an async context?
Yes, as long as no other code modifies the dictionary before you complete whatever operation you are trying to carry out.
@alexisfibonacci ref locals are forbidden in async methods because they're actually fields, no?
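One way around the async restriction, sketched under the assumption that the dictionary is only touched from one logical flow at a time: keep the ref inside a small synchronous helper and call that from the async method. `AsyncRefDemo`, `AddOrIncrement` and `CountAsync` are made-up names. As far as I know, newer compilers (C# 13) also accept ref locals inside async methods as long as they don't live across an await, but the helper works on older versions too.

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

public static class AsyncRefDemo
{
    // All ref work happens in this synchronous helper,
    // so no ref local ever has to survive an await.
    private static int AddOrIncrement(Dictionary<string, int> counts, string key)
    {
        ref int slot = ref CollectionsMarshal.GetValueRefOrAddDefault(counts, key, out _);
        slot++;
        return slot;
    }

    public static async Task<int> CountAsync(Dictionary<string, int> counts, string key)
    {
        await Task.Yield(); // stand-in for real asynchronous work
        return AddOrIncrement(counts, key);
    }
}
```

The helper returns a plain value rather than the ref, so nothing pointing into the dictionary escapes into the async state machine.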
And what about the following? (dict[key]
Hashmap! Haaaashmappp! => hired
I am making big animal type dictionary with different species and their attributes
Amazing 10/10
guud
Man, where were you before I retired? Your videos are the best and excellent!!!
I did not get why you are using the "ref" keyword?
The CollectionsMarshal methods return a reference to the storage slot itself, so you use the ref keyword to interact directly with it.
The specific method Nick uses puts a properly keyed and very unsafe null into the dictionary; the ref keyword allows you to replace this null with the actual value.
You can also do some serious damage with it; it's borderline unsafe and fast for a reason.
Short version: this "forces" the compiler to use the variable as a pure memory pointer, even if it MAY point to a value/struct type, or even a "null" value of said type.
It's a memory address, not a variable reference. This code behaves like C++ pointers, where you can store in a variable either the value (val) or the reference/address (ref) to that value.
Basically:
int a = 5; // Declares memory space and assigns a value.
int *b = &a; // Stores the address of a in another variable (& fetches the memory location of the variable, * indicates a pointer variable).
*b = 6; // Stores a NEW value directly at the memory location, by dereferencing the address b points to.
printf("%d", *b); // Would print the value AT position b by dereferencing, hence would show "6".
printf("%p", (void *)b); // Would print the address of the memory position, which would most likely NOT be "6".
Using this method in C# (reference pointers) you can manipulate the value of BOTH reference types (classes) and value types (structs and values) via pointers. Hence, this is *unsafe* because you override the usual C# logic for referencing values, and directly tamper with memory locations: something the language was made to prevent for usability in most contexts.
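For comparison, the C# counterpart of the pointer snippet, as a sketch with made-up names (`RefSlotDemo`, `CountWords`). One difference from the C version worth knowing: a C# ref local is a managed reference that the runtime tracks, and this compiles without the `unsafe` keyword or any special compiler switch.

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class RefSlotDemo
{
    public static Dictionary<string, int> CountWords(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            // 'slot' is an alias for the int stored inside the dictionary entry,
            // much like *b above; a missing key is first added with the default 0.
            ref int slot = ref CollectionsMarshal.GetValueRefOrAddDefault(counts, word, out _);
            slot++; // updates the stored value in place, no second lookup
        }
        return counts;
    }
}
```

Without the `ref` on both sides of the assignment you would get a copy of the int, and incrementing the copy would leave the dictionary untouched.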
@francoiscoupal7057 Merci Francois 🙂
There is an issue in the sample for the struct scenario: when the ID of the value inside the dictionary is updated, the key referring to it is not updated. The key still refers to the old value while the ID of the struct instance has changed. This could result in bugs in some scenarios. ua-cam.com/video/rygIP5sh00M/v-deo.html
11:33
simplified your link :)
(YT makes time comments into clickable timestamps)
WHY WHY WHY does your website not use Google as an authentication provider? Teachable? Really?
So what you're saying is that C# has finally caught up to Perl. With much uglier syntax.
time to test tryadd() :p
The "GetValueRefOrNullRef" example you made was really awkward... in the end you have an item that is saved with an incorrect key.
I just think you should have mentioned that one should not do that specifically.
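To show what goes wrong, here is a small reconstruction of that kind of mistake. It is my own sketch, not the code from the video: `Player` and `KeyMismatchDemo` are hypothetical, and the struct carries a copy of its own key in `Id`.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical value type that duplicates its dictionary key in a field.
public struct Player
{
    public int Id;
    public string Name;
}

public static class KeyMismatchDemo
{
    public static Dictionary<int, Player> Build()
    {
        var players = new Dictionary<int, Player>
        {
            [1] = new Player { Id = 1, Name = "Nick" }
        };

        ref Player slot = ref CollectionsMarshal.GetValueRefOrNullRef(players, 1);
        if (!Unsafe.IsNullRef(ref slot))
        {
            // Mutates the stored struct in place. The dictionary key is still 1,
            // so the entry now claims an Id that differs from the key it lives under.
            slot.Id = 2;
        }
        return players;
    }
}
```

The dictionary itself stays consistent (looking up key 1 still works); it is the value's own `Id` that now disagrees with the key, which is why changing key-like fields through the ref is best avoided.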
interesting again, but I'm never going to use it or allow it to be used!
I don't like such syntax sugar personally because there is no significant benefit. Just another way to do common things...
Isn't this just the same thing as the built-in ConcurrentDictionary?
That is built specifically for concurrent/multi-threaded access. That scenario does not imply that you are looking to cut out the double hash computation, which in a tight loop saves CPU time and allocations.
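For reference, the ConcurrentDictionary version looks roughly like this. `ConcurrentDemo`, `Cache` and `GetName` are names I made up for the sketch.

```csharp
using System.Collections.Concurrent;

public static class ConcurrentDemo
{
    public static readonly ConcurrentDictionary<int, string> Cache =
        new ConcurrentDictionary<int, string>();

    // GetOrAdd returns the existing value, or stores the factory's result.
    // Under contention the factory may run more than once for the same key,
    // but only one result ends up in the dictionary.
    public static string GetName(int id)
        => Cache.GetOrAdd(id, static key => "player-" + key);
}
```

It reads nicer, but you pay for thread safety on every access, which is a different trade-off from shaving a lookup off a plain Dictionary in a hot loop.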
*Ad* ::
"Do you want to use C# in the cloud and have muscles and hot babes like Nick Chapsas?!"
*Me* :: _panic buys all courses_
Interesting, but no thanks, I'll pass and take the slower code that will not blow up in my (or my coworkers') face.
At time index 7:22 a good example of why (IMO) `var` is undesirable. This is code I've never seen before and my comprehension is compromised a little by the lack of type information in the code. Especially in a team of developers where code gets included in the product via pull requests after review, the use of `var` hampers the review process even more because code is reviewed by Web based before/after UIs that don't let you hover over the `var` to tell you what the type is (and even having to do that is worse than simply using `bool` in the place of `var` for example).
True. I personally almost never use var. The only way to make it workable in Visual Studio is to activate some option which shows the actual type of vars right next to it. But as you said, outside of VS the problem is still there and makes life harder.
var was only intended for cases where the type is obvious. Unfortunately, most developers I've worked with decided to use it everywhere. Ironically, those same developers often complain about the difficulties they encounter when working with variables in languages like JavaScript or python because they're dynamically typed and so variable declarations don't give type information.
@ItsTheMojo - In JS you're forced to always keep track of the types of everything, because otherwise you'll end up with some bad bugs. In C# the compiler does the work for you. Having the machine deal with it lessens the cognitive load of the developer and is much less error prone. There's a reason languages like TS, Python3 and PHP have moved towards more static analysis.
Personal opinion: Types are for the compiler to ensure correctness. If they can be inferred, all the better.
See also every FP language.
@brianviktor8212 Code is much cleaner if you simply use the var keyword, especially if your class has mile-long names and has generics inside generic classes or collections. With Ctrl + left click or a mouse-over you can easily see what type a method returns or a property has.
@anderskehlet4196 and using var when the type is not obvious forces the reader to keep track of types too.
The syntax is awful. Really should be added directly into..
Given what this does, and how it has massive footgun potential, that awful syntax might be deliberate. (See Scala's choice of "isInstanceOf".) If you really need it, you can put up with the ugliness. And if you don't... the ugliness reminds you that you likely shouldn't be using this technique :)
2X FASTER is not the same as double performance. It is the same as triple performance.
2X AS FAST is the same as double performance.
Its a click bait :-) my honest opinion :-)
It's not :-) numbers don't lie :-)
@nickchapsas It is :) It's one edge case that 95% will not experience, and even if they do, I'm sure they have more serious problems in their code :) Don't get me wrong, it's useful information, but the thumbnail is clickbait :)
@maciekwar It's not :) Just because the use case is limited doesn't make the video clickbait. I am showing you a way to double your performance. Whether you can use that in your very specific use case or not is irrelevant. :)
@nickchapsas You have 4 scenarios; in 75% of your scenarios the difference is NOT 2x faster :)
Again the low-level stuff, even using !Unsafe. I am so glad I do not even know what it does. Read the word UNSAFE: it's something you should not use if you want to create readable and maintainable code. If you are really worried about the penalty of ContainsKey, maybe assembler is the better language of choice for you to work with.
Yeah, using structs that are bigger than 32 bytes is really, really bad. You should never do that 🙂
While I appreciate this type of video, tbh it's been a long time since I found them useful. Maybe you can stop with the obsession of posting performance things for a 1 ms change or 100 KB less memory used. Please do something else. I know your actually useful videos are behind a paywall as course content, but maybe posting an introduction to those would actually help people get started, and maybe generate more sales for those videos.
I’m not looking to sell more courses. I’m looking to showcase things that I find cool to a wider audience. If I wanted to sell more courses I would make freemium videos that show 5% of a topic and then tell you that to get the rest you have to buy a course
Unsafe indeed. You changed the ID of the struct when using GetValueRefOrNullRef. So now the hash points to an item that doesn't have that hash.