Exactly what you said at the end: if I have to deal with unsafe code, why am I even writing C#? Then I could just use something different that is made for pure performance.
Very true. A slight performance hit - and we're talking about a 0.7 ms vs 0.9 ms for a million items here - is what I'm more than willing to pay to use friendlier semantics in a programming language. Although I have to say that I like to overoptimize some of my projects as a hobby and I'll gladly take that 0.2 ms ;)
When I want code to run fast I use the for loop. When I want to understand the code tomorrow morning I use Linq. When I was a kid I'd just allocate a pointer to a struct and increment the pointer. The good old days.
There are rare occasions when you really *need* to eke out every bit of performance, especially when iterating through extremely large collections, and this is a place I would use such a thing. Interestingly, since much of what I have worked on lately has been stuck in .NET 3.5, I didn't have access to Span, so I had to write my own ArraySlice class for a similar purpose.
I don't think it's an argument about whether you should be using one language over the other. If you need that level of performance, C# is probably not the best candidate. But it's still another tool in your toolset. Thanks for sharing!
I love how C# can be both a comfortable, safe, relatively high level language, but recent versions also allow low level trickery like this. Not useful for your average Web or GUI app, but great if you're writing system software like databases (see RavenDB) or in fact the CLR.
This is an interesting insight into how different techniques compile, but the whole endeavour looks very much like premature optimisation. Clearly some scenarios have been optimised in the compiler, and there is no reason why this might not change in future compilers. These techniques could become pointless obfuscation in the future at best, or even less efficient at worst. For me, the use cases for this kind of optimisation are very specific and probably dubious.
I love how we have to write all this code in these new languages now to do something which lower-level languages do by default. Assembly, C, and Pascal have used direct memory pointer operations to iterate through arrays since the dawn of time, but to get C# to do it at the same level you have to write a bunch of code... why can't the language simply be smart enough to use pointers internally, automatically... Great video though, it's good to know this stuff...
Reminds me of how on older C compilers looping over an array by index would generate worse code than looping with a pointer. Not true on a modern, optimizing C compiler, but I wonder if this is a similar deal. And judging by the use of unsafe it sounds like it might elide bounds-checking, which would certainly also explain some of the difference, though I'd expect it to be very very small on large arrays due to branch prediction.
I think it is good that you are showing some use cases of the "unsafe territory". The "unsafeness" with pointers and all that is actually part of the language and should not be feared, although it should be treated carefully, of course. But as it is part of the language, I think it is better to educate people about it. I mean, memory management is kind of tricky if you are not used to it, but it is not magic and evil. Once you know how it works it isn't that bad, and it opens up doors. And yes, if you are coming from C# or Java or other GC langs, be sure to pack a lot of feet, because learning memory management will have you shooting yourself in the foot many times before it becomes natural ;)
Cool experiment and good to know! What you do inside the iteration is almost always going to make iteration-speed optimisations insignificant in relative terms. Probably not worth the unsafe memory risk in the majority of cases.
My half cent on "If I need to write code like this, why am I using C#": I think that if you are going into a project where these kinds of optimizations are critical, then yes, pick something closer to C. However, I think it's a great thing that C# allows for this, since in some cases this kind of optimization need may appear as a surprise, so the ability to "go down a level" is really useful to deal with that kind of nasty surprise.
If you are responsible for developing and maintaining millions of lines of code, which will be touched by developers of different competencies, I would choose comprehensibility over minor performance gains. Maintainable, less cryptic code is always better. Though performance optimization is an important concern, it is not required to this degree (a 1-2% improvement while making the code less understandable). It is really required when one is developing generic libraries, application frameworks and utilities, where developers really know what they are doing - but not for real-world enterprise scenarios.
Compilers should just do this for you when possible honestly. People say that premature optimization is the root of all evil in programming. But they also say if it ain't broke, don't fix it, so optimization won't happen except when it is in one critical location. Following that, they also say that software engineers have managed to reverse the hardware progress of the past 10 years to where modern software is as sluggish as their 10-year-old counterparts. It is death by a thousand cuts.
This is C/C++ buffer traversal in C#. Please never do this, so I don't have to come along later and debug a nightmare intermittent failure in your code. It's not named unsafe arbitrarily. If you truly want/need speed, write it in C/C++. (edited some snark out of my comment)
Nice video! How does the performance compare to optimized unsafe-context code? I ended up in unsafe coming from C++, and it boosted my algorithms. But I'm worried people after me will not be familiar with the unsafe context. This is a little nicer syntax-wise (although it still is not for everybody), so I'm all for Span and co. when performance stays maximal.
4:50 funny you mention a reference to the 0th element... One of my favourite things in the newer .NETs is MemoryMarshal.GetArrayDataReference. And it allowed me to write this funny little extension. It's safe even when the array is empty.
```
public static int IndexOf<T>(this T[] array, ref T entry) where T : struct
{
    var id = Unsafe.ByteOffset(ref MemoryMarshal.GetArrayDataReference(array), ref entry).ToInt64()
             / Unsafe.SizeOf<T>();
    if (id < 0 || id >= array.Length) return -1;
    return (int)id;
}
```
FAKE EDIT: and Nick mentioned GetArrayDataReference 4 minutes later 😂
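A quick usage sketch for that extension. This assumes the reconstructed generic signature above (the comment platform stripped the angle brackets); Unsafe lives in System.Runtime.CompilerServices and MemoryMarshal in System.Runtime.InteropServices:
```
int[] values = { 10, 20, 30 };
ref var second = ref values[1];
Console.WriteLine(values.IndexOf(ref second)); // 1 (byte offset 4 / sizeof(int) 4)
```
Note that a ref which doesn't point into `values` only usually falls into the -1 branch; the byte offset between unrelated heap objects is arbitrary, so don't rely on it for foreign refs.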
This is like being hired by a big company as some kind of business manager and being proud after one year because you have decreased the internal costs of your company by some big figure (just because you decided to... stop buying 1 million coffees for your employees every year...). Anyway, it is great that C# offers this possibility. It might help some companies in very specific scenarios; who knows, they can now tell their clients "this click runs in half the time!!".
When working with array or list with fixed count, I have a habit to cache `array.Length` when using for-loops, i.e. instead of writing `for (int i = 0; i < array.Length; i++)`, I write `for (int i = 0, iMax = array.Length; i < iMax; i++)` so that it only needs to access `array.Length` once.
It won’t make a visible difference even in those benchmarks I showed. There is a section in the video showing the benchmark delta with and without caching the value and it’s less than a nanosecond
@@nickchapsas Normally it wouldn't. But the code you have as it is has the `for` loop doing a full field access every iteration in order to access the `_items.Length` value. This is in contrast to your `foreach` (and other benchmarks) that do the field lookup ONCE and only once. Your `for` loop code is doing the field lookup N*2 times for the size of the array (once for the `Length` and once for the indexer value access). Even if you keep hitting the `Length` property each iteration, you need to make a local variable reference to the `_items` field rather than hitting it twice for every single iteration. There's a very real chance this is the cause of your curious benchmarking times that you pointed out where the `foreach` version outperformed the `for` version.
If I am not mistaken, this practice may hurt performance. With for loops from 0 to arr.Length, the JIT can remove the bounds check. When you change to use a variable, I am not sure it will do the same.
@@shingok It may depend on runtimes, particularly the earlier ones that didn't have the bounds checking optimization. Doing some quick tests, LINQPad's reported disassembly for both versions are very similar: www.diffchecker.com/a2KWo9Zv and a quick benchmark comparing them across 10 million entries yielded functionally identical results: i.imgur.com/5YQZ9Iq.png I don't think tossing the Length value into a local variable makes a difference. But if your collection is a field that you have to load every single iteration, that can have an impact. So it's important to have a local variable reference to `_items` here rather than hitting the field twice each loop.
Will there be an increase in speed if you assign the "length" of the collection to a variable (before the for loop) and use this variable in the for expression, instead of getting the length of the collection every time?
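For reference, a minimal sketch of the two patterns this thread is comparing (field and method names are hypothetical). The JIT's bounds-check elimination kicks in when the loop bound is the Length of the same local array variable being indexed:
```
private readonly int[] _items = new int[1_000_000];

// Field accessed inside the loop: the array reference may be re-loaded
// from the field, and the bounds-check elimination may not apply.
public int SumViaField()
{
    var sum = 0;
    for (var i = 0; i < _items.Length; i++)
        sum += _items[i];
    return sum;
}

// Field copied to a local first: the JIT can prove i < items.Length
// and drop the per-iteration bounds check, matching what foreach
// emits for arrays.
public int SumViaLocal()
{
    var items = _items;
    var sum = 0;
    for (var i = 0; i < items.Length; i++)
        sum += items[i];
    return sum;
}
```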
I share your sentiment that if I am required to write C-ish code for performance reasons, I should probably write C (or Visual C++ and then reference the DLL from my C# project; never done that, but I think it should work). I feel like this is not code I would write, because even if I know what it means, co-workers or future co-workers might not. On a side note, the StdDev of this approach makes the improvement look more hit-or-miss: iterating once can be super fast, but it can also be slower. The nice thing about the regular loop with Spans is that it is fast AND fairly consistent in execution time.
It seems weird to me that a very high level language which has such low level access to memory/pointers does not automatically do some of this stuff for you if it can
Yeah that syntax for getting a managed pointer essentially looks like how you would do a reference (or const reference) in C++. That's essentially what it's doing anyway, which is why it's so fast.
Since the span does not take into account new changes to the list, it is a naturally cool way to perform the Gauss-Jordan iterative method for solving a linear system. I mean Gauss-Jordan, not Gauss-Seidel.
This kind of information should be used by the compiler to automatically optimize clearly written code. The answer is not to use difficult to read and understand code in order to trick the compiler into using a faster implementation.
I think it's good to put the knowledge out there and I appreciate the video. However, I feel like there needs to be a strong caveat, because beginners and even intermediate devs will want to use these techniques because they are "cool". My advice is that you should always write correct, clear code first. That means: always use foreach loops, and only optimize after profiling. Because as the benchmark shows, you don't really get any tangible benefits unless that code is called 100,000+ times, which is true for framework libraries but rarely true for everyday line-of-business applications. Not to mention that compilers and the JIT are pretty sophisticated and in some cases can do these optimizations automatically, as in the case of int[], so there you have uglier code for no benefit whatsoever. I'm sure you already agree with this, but you need to make it clear for beginners and intermediate devs who stumble across this.
Is there a risk of going over the stack limit if I use the Span method? It's not stackalloc, but does casting the list into a span put all the data onto the stack, ready to use? Let's say I have a mesh with a list of a million vertices, and I declare that as a span at the start of the function and then proceed to do operations on them - might that put more pressure on the stack limit?
First thing when I saw it - can't you just use this to create a really performant ForEach<T>(this ICollection<T> list, Action<T> action) and ForEach<T>(this T[] array, Action<T> action) and get an easy boost without changing much of your code? More importantly, can it use Memory<T> instead of Span<T> to support async?
@@protox4 Correct, you would incur a penalty for any closure in the lambda, at least. But if you're calling another method in the loop anyway, which is very likely, it wouldn't be that bad. This would still probably be faster than LINQ, because LINQ is based on IEnumerable enumeration, which is the slowest method.
Also, you wouldn't be protected against collection mutation while iterating, that would be very dangerous to use as a generic solution for all cases. IMO it has its place in deep optimization scenarios, only when there are a lot of iterations to be made and only when you fully control the collection.
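A minimal sketch of the ForEach idea from this thread - a hypothetical helper, not a BCL API, assuming .NET 5+ for CollectionsMarshal.AsSpan. As the replies above note, it gives up the usual protection against the collection being mutated mid-iteration:
```
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class FastForEachExtensions
{
    public static void FastForEach<T>(this List<T> list, Action<T> action)
    {
        // View over the backing array; do NOT add/remove while iterating.
        var span = CollectionsMarshal.AsSpan(list);
        for (var i = 0; i < span.Length; i++)
            action(span[i]);
    }

    public static void FastForEach<T>(this T[] array, Action<T> action)
    {
        // foreach over an array is already bounds-check free.
        foreach (var item in array)
            action(item);
    }
}
```
On the async part of the question: Span<T> cannot live across an await, so a Memory<T>-based variant would be the route there, at the cost of going through .Span for each access.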
Great video. I usually do fixed and use pointers if I need performance in C#. It performs as fast as C. UNSAFE is only really unsafe for NOOBS who don't know what they are doing; for those of us in the know, unsafe is totally 100% safe :)
Is that something you could benchmark, and maybe explore to a degree? Is there a benefit to writing a highly optimised library in C++, and adding it as an unmanaged dependency. Can you get further performance boosts by passing a collection (or pointer) to unmanaged code? Where should one draw the line?
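On the "pass a pointer to unmanaged code" option, a hedged sketch of what the interop boundary looks like ("fastlib" and "sum_ints" are made-up names for illustration; a real benchmark would need an actual native build per platform):
```
using System.Runtime.InteropServices;

internal static class NativeMath
{
    // Hypothetical native export: int64_t sum_ints(const int32_t* first, int32_t length);
    [DllImport("fastlib", EntryPoint = "sum_ints")]
    private static extern long SumInts(ref int first, int length);

    public static long Sum(int[] values)
    {
        if (values.Length == 0) return 0;
        // Blittable ref args are pinned for the duration of the call.
        // Each managed->native transition has a fixed overhead, so prefer
        // few coarse calls over many tiny ones when drawing the line.
        return SumInts(ref values[0], values.Length);
    }
}
```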
Maybe I'm wrong, but marshalling, aside from particular use cases, is mandatory when you deal with a different language or an existing API that exposes its functions or data via pointers/references (DLLs in C or other APIs...). Then we have to deal with so-called "unsafe code", and anybody might run into that. Span is the C# way to deal with performance matters, IMHO. What do you think? By the way, I like your videos and keep watching them! Good job... Sincerely yours...
You say several times that this is for read-only access but it seems to me you could increment the integers in the span. Would that work? To put it another way, can you use this for map as well as reduce?
@@mildmanneredjanitor0 You can even use Unsafe.Add to mutate a string... essentially grab the pointer and rewrite it. It can actually be used to implement optimized decoding scenarios, like converting bytes to HEX strings. In that case you would allocate the string at the size you need and grab a ref with Unsafe.Add... ref var foo = ref Unsafe.Add(ref searchSpace, i); and then foo can be mutated.
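To the original question: yes, the elements are writable, so this works for map as well as reduce. A minimal sketch over a list (assumes .NET 5+ for CollectionsMarshal.AsSpan):
```
var numbers = new List<int> { 1, 2, 3 };
var span = CollectionsMarshal.AsSpan(numbers);

for (var i = 0; i < span.Length; i++)
    span[i] *= 2; // writes through to the list's backing array

// Or with the ref-based iteration from the video:
ref var start = ref MemoryMarshal.GetReference(span);
for (var i = 0; i < span.Length; i++)
{
    ref var item = ref Unsafe.Add(ref start, i);
    item += 1; // mutates the element in place
}
```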
IMO all this stuff should be a matter for the compiler. For example, the compiler, in many cases, knows the real underlying type, so if I use foreach just as a matter of "fashion" with an array of value types, the compiler should "translate" it into a "for" if that is more efficient.
My guess would be that C(++) is faster than C# because memory pointer operations are faster than memory managed operations. And you can improve the operation of C# code by bypassing memory management by pointing directly to memory. But memory management makes coding a lot safer.
If you look at the disassembly, the span is faster because there are no bounds checks, and also because it is fixed-length. CollectionsMarshal.AsSpan is really just reaching into the List's internals and giving you a span pointing at the internal array; you're not supposed to add to or remove from the List, because then your span (I think?) gets invalidated.
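A minimal demonstration of that invalidation hazard (whether the growth reallocates depends on the list's current capacity, hence "may"):
```
var list = new List<int> { 1, 2, 3, 4 };
var span = CollectionsMarshal.AsSpan(list);

list.Add(5); // may reallocate the backing array...

// ...after which 'span' still points at the OLD array: reads are stale
// and writes are silently lost. Never resize the list while holding
// a span over it.
span[0] = 42; // no longer guaranteed to be visible via list[0]
```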
Would have liked this placed on a chart, as you would get a much more visual impact of the comparisons; however, very nice work, and very interesting how different optimisations can be applied.
The reason to use the whole "unsafe" C# stuff is simple: even unsafe C# code is still better than the C++ equivalent :) Not saying C++ won't be faster, or that you can't optimize this even more in C++, but it's still C++ with all its weirdness. At least that's my opinion; I left C++ in 2011 after 10 years of using it professionally and never looked back.
Good to know, but it also smells pretty 1990s. One of my biggest gripes with all C-based modern languages is their failure to handle maps or folds well. C gave us many great ideas, but its focus on loops* is becoming a real problem (one that has been stewing for ~20 years now). Since MMX, code has not been strictly linear; CPUs have not been limited to handling SIMD by repeating instructions, yet our languages are still written as though they are. Map and fold really should be primitives, because high-level code will never know how the silicon wants something vectorised. It bugs me that I have to go out of my way to avoid telling the compiler that there are no dependencies between the blocks in order to get performance (not least because it also makes it harder for future readers). A good compiler will spot where there are no dependencies and actually turn your loops into vectorised maps.
*As a side note, functional languages' over-dependence on recursion is a similar issue. Both these control-flow graphs have nice properties, but sometimes the reality of a program is that it has a different set of nice properties that I would like to be able to tell the compiler about, but which are lost if you try to coax it into a loop- or function-based program.
But there are so many algorithms where the order does matter, and where things need to be mutated during the loop... As you say, a good compiler will (and should) spot when there is no dependency, you can also hint to the compiler, but also many of these optimizations are hardware-dependent... I wouldn't mind better map and fold support, but if I had to pick I would prefer loops to be the first class citizens, programming in something like R shows how obnoxious it can be to try to vectorize every algorithm to get decent performance when so many are super trivial to write as a loop.
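For what it's worth, .NET does expose an explicit "map over lanes" primitive via System.Numerics.Vector<T>. A minimal elementwise-add sketch (assumes src is at least as long as dst; the lane count varies by CPU):
```
using System;
using System.Numerics;

static void AddInPlace(Span<int> dst, ReadOnlySpan<int> src)
{
    var width = Vector<int>.Count; // e.g. 8 ints with AVX2
    var i = 0;
    for (; i <= dst.Length - width; i += width)
    {
        // Load, add and store one SIMD register's worth of elements.
        var sum = new Vector<int>(dst.Slice(i)) + new Vector<int>(src.Slice(i));
        sum.CopyTo(dst.Slice(i));
    }
    for (; i < dst.Length; i++) // scalar tail
        dst[i] += src[i];
}
```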
If the goal is ultimately to use the unsafe method, why convert the list to a span? Can't we just fetch the address of the first element of the list, like we did for the array?
I believe that if you want this level of performance, you should use C/C++ instead of doing all this, which moreover reproduces exactly what we do when we handle an array in C/C++. The sad thing is that some developers do this to impress people, to say "look how good I am! My algos are very performant. I don't do it like everybody else!... I'm different... I'm better!!!!" - but most of the time, developers of this kind not only make their code painful to read, they also tend to architect their applications in a way that makes them dramatically slow and impossible to maintain. Still, it's fun to see that we can reproduce C algorithms in .NET! :D
The biggest performance benefit seen here is around 10 microseconds on a list of 1M objects. For reference, if you could speed up an algorithm you're running by even a single millisecond (maybe by avoiding an extra loop somewhere), it'd be a performance improvement 100x greater than this. It's interesting to see what C# is capable of in this case, but I feel like videos like these are misleading and harmful when they're not prefaced with big, massive disclaimers that put performance into context for devs. Microsoft can use obscure methods to squeeze performance out of every line of code in LINQ, but that doesn't mean you should be replacing the foreach loops in your bubble sort with pointer arithmetic... or any of your foreach loops, really. I know that's not what's being argued in the video, but I often see people bring up points like this one when discussing optimization, and I think information without the right context bears some of the blame for devs missing the forest for the trees when thinking about performance. Don't get me wrong, I think the video is very well done. I just wish it felt more like "here's an interesting finding about how some loops are optimized" and less "you wanna write the fastest code? here's how Microsoft does it".
I think it's very cool we get these low-level functionalities. I know, you could argue, if you need to manage memory, why not write C or C++? But you don't have to use these - but you *can* - I wonder, is it doable to get performance out of C# to make a game engine that can compete with C++?
Hey Nick. Thanks a lot for sharing these amazing techniques. I wish you could make some videos on the fastest SQL Server queries, please. Thanks from Afghanistan, and pray for us to get rid of the barbarian Taliban.
A nice video. However: Spans were introduced to get the performance of plain arrays of value types, not the other way around. The reason lists are normally slower to iterate is only partly the overhead of the iterator: the checks that the collection is not being changed in the meantime are the biggest penalty. As your own data shows, when looping over int arrays directly nothing can be gained; the iterator is a value type that has exactly the same memory footprint as the for loop with its index, and the spans do basically the same. Long story short: the trick you showed lets you access data directly, bypassing the encapsulating data structures. In most cases, that encapsulation was there on purpose...
I don't think it's simply "it's not fit for C#"; it's extremely likely that the time someone needs to spend on an iteration like this (especially if done from scratch) is more valuably spent on other things in the project, even purely on optimization. In most projects there would be bigger optimization potentials than this. But otherwise, cool trick; it could be really fun to trial-and-error with it for fun.
This is actually the best way to loop through multi-dimensional arrays: you get clean and fast code without having to calculate the dimension indices.
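A minimal sketch of that, leaning on the fact that a multi-dimensional array is one contiguous row-major block (assumes a non-empty array, since grid[0, 0] is used to get the starting ref):
```
int[,] grid = new int[480, 640];

// grid.Length is the TOTAL element count (480 * 640), so one flat
// loop replaces the nested row/column loops and their index math.
ref var first = ref grid[0, 0];
for (var i = 0; i < grid.Length; i++)
    Unsafe.Add(ref first, i) = 1;

// Or wrap it once and use ordinary span code from there:
Span<int> flat = MemoryMarshal.CreateSpan(ref grid[0, 0], grid.Length);
```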
What do you think about adding a delegate as an input parameter for this super_crazy_fastest_forloop_ever()? Would that kill the performance, or could it be worth it?
How the hell do you write code outside of classes, C++ style? I don't get it - is it some tool for prototyping in Rider that hides the Program class and Main method in a console app?
A great example of why I would never use this language. People still struggle to understand that no level of "safety" stops programmers from breaking their code. So all this "safe" stuff ends up mostly getting in the way. If you try to make efficient code, it looks like this mess. And if you fail, you end up with an unreadable undebuggable mess full of unsafe stuff. I think one big advantage of (modern) C++ is that you get plenty of safety options but none of it is required. And unsafe code ends up being simple to read and follow so it's easier to debug.
@@nickchapsas writing C++ is not as bad as reading it. It is designed to have a feature for every programmer, but you aren't every programmer so you only really need to know the stuff relevant to your style. And most of the features are high level stuff so you don't need them (and shouldn't use them) in high-performance code. I'm not trying to make you switch languages. I just want you to seriously question the idea of writing weird stuff that obviously isn't how the language is intended to be used.
For the avoidance of doubt: The technique shown in the video is still considered unsafe (as I mentioned in the video's conclusion) and should not be used unless you really know what you're doing. The reason it is included in the video is that it doesn't require the use of the unsafe keyword, hence why I said "unsafe code territory" in the beginning. This doesn't make it safe. It just makes it a candidate to compare with other things that don't require the unsafe keyword to compile. If we were to use the unsafe keyword, then I'd include pointers too, and that would be a different story.
Hi Nick, please can you post a link to the source code shown in this video?
How do you interpret the standard deviation difference between SpanUnsafe and Span?
I don't know statistics that well, but from what I can tell, the std dev for the unsafe span is sometimes 2-3 times bigger depending on the type, which doesn't seem to justify its merely better mean performance over span.
So would Span more often give reliably good performance compared to unsafe span?
Does the std dev difference even make sense to watch in this statistic?
I presume we would do more logic in these loops, so can it, like... explode exponentially, etc.?
Just wondering.
@@Downicon3
Hi! The StdDev value by itself can differ between groups for a variety of reasons, and it can't really be interpreted in isolation. Though you are probably on the right track as to why it varies so much between different data types: the more complex the logic, the more factors potentially contribute to the random uncertainty of the measurement - what resources it depends on (and how random or correlated those are)...
But the "Mean" and "Error" (the latter being a misnomer here) are more useful for comparisons. "Error" here represents the half-width of a 99.9% confidence interval: one can be 99.9% confident that the true mean of the measurement lies within "Mean" ± "Error", so to say. If the mean value of group B falls within group A's confidence interval, then there is some non-negligible chance that A and B are actually the same. If it falls outside of it, then you can be pretty sure that group A and group B are statistically different. But a statistically significant difference doesn't automatically have practical significance.
C# reinventing C pointers 😀
When you say "You should obviously NOT use this in real day to day code" how do think about an open source library that may be used in production?
If it's thoroughly tested an there's no way the collections that are iterated can be muted at the same time?
A lot of programming more or less follows the pareto principle: 20% of the code gets 80% of the effort. A lot of your code doesn't need to be tightly optimized, because it just isn't in the hot paths. But there are going to be small chunks of very performance-critical code where optimization techniques like this may be worth the effort. Like all tools, it's good to keep this in your toolbox, even if you aren't planning to use it often.
This is basically raw memory iteration with all the safeguards that a high-level language provides bypassed. It is interesting to know that you CAN do this without switching to another language... but if you need to do this more than once in your code base, you'd be better off writing it as a lower-level library in a language that does not have the safeguards. There's less risk of someone who does not understand that the safety is off doing something that writes to, or in some other way reuses, that memory, resulting in hard-to-trace crashes. I agree on the pareto principle though. For 80% of the code, readable and safe is more important than fast. Out of the remaining 20, another pareto split gives you the small fraction where sacrificing safety for speed is justified.
@@kidmosey ToList() is just a shallow copy, no? But yeah, it causes iterating the damn thing twice, at the very least. Have to admit - in my 20 years of coding I've used a profiler only once, on a Java application I think. Usually, the problem has been identified by other, more crude means.
@@MadMakerWorkshop ToList/ToArray/etc basically are writing out the output from the iterators in the source, so it's essentially already going through the collection at least once. Then you need to do your own iterating over that again, so yeah, it's very much less efficient.
That said, it is also cleaner and safer, which in many situations, is more valuable than being performant.
@@MadMakerWorkshop Don't want safeguards and want speed? You don't need to change language; just open an "unsafe" function and do whatever you want. People tend to forget about unsafe, and it brings C# nearly to the same level of performance as C...
@@drgusman The problem with that is that people do not expect such code to be present in C#, and that can lead to trouble...
Looks like c++ pointers but more "boilerplated"
It's pointer arithmetic, only with methods instead of operators, not that different.
It's pointers, but the garbage collector still keeps track of them. With raw pointers, the GC cannot keep track.
C# has pointers too, this is a slightly more "safe", or rather idiomatic, way of iterating without using pointers explicitly.
exactly at this point just use c++
@@trustytrojan your reply makes absolutely no sense
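For comparison, a minimal sketch of the "just open an unsafe function" option this thread describes (requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file):
```
static unsafe long Sum(int[] values)
{
    fixed (int* begin = values) // pin the array so the GC can't move it
    {
        var end = begin + values.Length;
        long sum = 0;
        for (var p = begin; p < end; p++) // bump the pointer, no index math
            sum += *p;
        return sum;
    }
}
```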
Completely agree on your second question: if you have to write C# like that, C# probably isn't the best language to use, IMO. It seems like it would have to be some sort of hyper-specialized use case or an artificial limitation. AFAIK one of the entire points of C# in the first place is to abstract those sorts of details.
Somebody has to write those low-level abstractions somehow, and if you start using C++ for this, then you bring in the cost of interop/marshalling and you need to maintain the binaries for multiple hardware platforms and OSes.
You're also likely to only need it for a very small fraction of your codebase, if at all.
@@tajkris part of exactly why the perceived need for it should be proven objectively with a test. If that doesn't happen it isn't sufficiently needed IMO.
@@tajkris Why should C++ have any special issues with this? Your objections there seem bizarre. I don't get what you think the problem is.
@@zvxcvxcz I'm not saying C++ has any particular issues in this context. I'm saying that if 95% of your project is in C# and the remaining 5% needs to be super optimized and performant, it is better (easier) to have the option to write it in C#. If C# didn't have such language constructs, you would be forced to do interop with C++ (or another language). This has a non-trivial invocation cost (which may diminish the perf benefits), maintainability costs, hiring costs, etc.
My point is that it's good to have the ability to write low-level code in C# when needed, and enjoy the abstractions and safety mechanisms for the rest of the codebase.
You might want to optimize something with this in a library that a lot of code is dependent on and would benefit from better performance there.
Well, if you can isolate the parts where you need the most performance with this type of code, I think that's a good reason to use it; most projects probably wouldn't benefit much from it, though.
I suspect there's a way to mock out a test to demonstrate just how much if any value it might add.
You can say that for basically any of these kinda speed micro-optimizations. They're for people who DO need to make something as fast as possible, or are just curious about it
Use managed C++, call into code that compiles to native instructions.
Premature optimization is the root of all evil
Honestly this "solution" really looks like something the compiler should be able to do for you under the hoot when it's able to. I think doing this manually is smelly af.
You should try to get the last element of an array and store it as 'last'.
Now, using a while loop, for each element 'current', set it to Unsafe.Add(ref current, 1) of itself,
until 'current' is no longer in the span (typically, when the 'current' pointer is greater than the 'last' pointer).
This is probably the fastest way possible to iterate over an array, as there is no multiplication (index x size of item).
This method is used in C and other pointer-friendly programming languages;
I believe C++ foreach loops are lowered to this, and Rust's iterators may be optimized to equivalent assembly after closure inlining.
That's right. The below is actually the fastest way. Also worth mentioning that it's only the fastest because the MemoryMarshal and Unsafe classes are especially well optimized by the JIT. Since it's quite ugly (at least to the eyes of .NET devs), it's only beneficial to use this on hot paths.
```
// Managed pointer to the first element of the array.
ref var itemRef = ref MemoryMarshal.GetArrayDataReference(_array32);
// Managed pointer one element PAST the end - never dereference this.
ref var lastItemRef = ref Unsafe.Add(ref itemRef, _array32.Length);
for (;
     Unsafe.IsAddressLessThan(ref itemRef, ref lastItemRef);
     itemRef = ref Unsafe.Add(ref itemRef, 1)) // bump the ref by one element
{
    _dummy32 = itemRef;
}
```
(Well, lastItemRef is not the last item but the item after the last. One may use the actual last item instead, to prevent accidentally dereferencing a memory address, which could even cause an access violation.)
Sounds like aids
Generally, multiplication won't make much difference on any non-embedded hardware, tbh.
It might make a difference when you get into ALU usage and pipelines.
But boy, that's complicated and probably hardware-specific.
@@affegpus4195 No, it does matter. When you do this kind of optimisation, most memory reads are L1 cache hits, so multiplication - or actually any operation - will matter, especially if you do SIMD.
This kind of stuff is worth doing only on very hot paths in software that really needs it, like game engine renderers or anything HPC-like, such as simulations and so on.
The fact that you can build abstractions in C# that will compete with C on performance is impressive. You should see this thread as testing the limits of "safe" C#.
Just so you have an example:
I did some simulation code, and when the whole simulation takes several weeks to run, with many, many machines working together non-stop, every optimization counts and can save several hours at least.
I did not use C# in that case, but it is a valid case for avoiding multiplication.
Great video as always! The thing about the for loop not being as fast as foreach when using an array: you did not copy the array to a local variable, but accessed the field in the loop. With foreach loops, this happens automatically. I had the same issues with my benchmarks a few months back.
That's right, but the performance benefit does not come from accessing a field vs. accessing a local var. When foreaching, the JIT (obviously) won't emit a bounds check inside the loop. When using a for loop, it only omits the intra-loop bounds checking when the array is in a local variable (one can even declare it as a class field and assign it to a local var before looping).
I love this type of knowledge. I would think another thing to factor in when deciding whether or not to use this is the overall skill level of your team. If I put something like this in a project, my biggest concern would be how much extra time I am costing someone in the future who has to troubleshoot the code. I know it could be documented and all, but it's still something to consider, I would think.
True, it's not really a standard way of iterating... Thus a lot of people will have to spend more time analyzing and refactoring in case something goes awry. Which only means one thing, if you're still with the team that developed it... MORE WORK
C++ iteration style loop is even faster: for (int* ptr = begin; ptr < end; ++ptr) {}. You can create a wrapper class Ptr to avoid unsafe code.
The problem with integer index is that on 64-bit platforms it requires one extra cpu instruction to convert 32-bit index to 64-bit index.
It is also out of the question. Like I mentioned in the intro, the solutions we are testing do not require the use of the unsafe keyword.
@@nickchapsas why don't you just use C++ and get the best performance overall :D ... of course, productivity has something to do with experience...
@@nandhan6637
because there's no need for the "best performance overall" with the cost of the overall productivity
@@diadetediotedio6918 I'd argue that converting all your array indexes to the "fast" C# code would be more harmful to readability and productivity than just programming it C/C++/Rust from the start.
@@user-di5wj8is7i
This is a good point, but at the same time, that's why normally we don't do that with all array indexes, this is an approach for very specific C# libraries and specific code that needs to be very fast and maintains a readable C# code for C# developers
I am going to stay with the last point. If the project requires customized memory management and specific performance optimizations, dotnet might be the wrong tool for the job. Aside of that point, great information as always!
You mean like doing something with this in C# Unity? Unless the C# somehow gets converted to C++, there isn't really a way around it is there?
with new dotnet it's important for multi-platform support, that may be much more valuable for 90% of the codebase.
@@MrRizzyWizzy Funny you should mention that because a lot of unity games are converted to C++ using IL2CPP
Cool! Now we are introducing the out-of-bounds error possibilities from C and ASM into higher-level languages that tried to eliminate them.
Isn't the whole point of using high-level loops that you avoid these methods? In my mind, if I use foreach, C# should do all that internally, maybe even in native code, instead of me doing it explicitly.
I don't get why a foreach on an array/list is not optimized to be the same as a for loop under the hood in the simplest scenario. Why can't they look at the body of the loop and, in most scenarios, optimize it to the same code even though they were written differently? Yes, I know foreach creates iterators etc. and is much more flexible. But if I am iterating over a base type like an array, instead of calling an IEnumerable function, then you should be able to just optimize that into the same code as a for loop, without the iterators etc. Give me the iterator when I need it for the more complex scenarios, but derive my intent from my code in most scenarios and optimize that added functionality away, making them just as fast.
Very interesting! I would definitely use this technique in my classes that parse C/C++ header files, documentation and doxygen commands.
It sounds like this technique is similar to char* pointer arithmetic in C, where you can get the value and advance the pointer to the next item.
So it's basically this: char *ptr = myMemory + offset; while (ptr < ptrEnd) doSomething(*ptr++);
not only char* but for any pointer type
The fact that you speak English so well and have even built a career on youtube is incredible. I wish you even more success in the future.
I'd love to see a comparison of this versus actual unsafe code with real pointers. The fact that you can use the Unsafe class to get a reference to the zeroth element of on array on the heap and then iterate by basically doing pointer arithmetic makes me assume that somewhere buried in there, the runtime must be pinning the array - either that or Unsafe.Add does some address translation behind the scenes to compensate for the GC potentially moving the data around? Either way I wonder what overhead that adds versus just going straight into fully unsafe code and doing it with pointers and pinning.
This is a fascinating video though - I love that C# is "safe by default" but you can override that safety in gradual layers.
"ref" variables in C# are sometimes referred to as "managed pointers". That's because unlike regular pointer variables in C# ("unmanaged pointers"), they are tracked by the garbage collector and updated when the GC moves the data they are pointing to (yes, even if the ref variable is pointing to a location in the middle of an array). Thus there is no need to pin the array, and Unsafe.Add is just a simple addition (which is why it's so fast). But you're absolutely right that there is some GC trickiness going on; it's just not where you thought it was 🙂
This guy is every lead developer's worst nightmare! Most code doesn't need to be this "tightly optimized", but do you think the junior or mid-level devs care? NOPES! Instead of readable for or foreach loops we now have Spans and refs and unsafe adds everywhere! Thanks dude! Thanks a lot!
Ah, nice to see a video about this. I have recently been experimenting with the Unsafe class, not with the Add method shown here (there is also a Subtract method, by the way), but I played around with the CopyBlockUnaligned method and also Buffer.MemoryCopy; that last one turned out to be a little bit slower.
This was all for an algorithm that I had written in C, but for which I wanted to see how far I could go in C# to get performance as near to C as I could.
I started with real unsafe code, but then I tried to get everything into the "safe" zone, hoping to still get reasonably high performance. Even Spans still give some overhead, I noticed. Probably because they are still wrappers around actual pointers.
CollectionsMarshal is a new one for me though, so that one is nice to know about now.
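For reference, the two copy approaches mentioned look roughly like this side by side (a sketch; the buffer sizes are made up, and the unsafe block needs AllowUnsafeBlocks enabled in the project):

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

byte[] src = new byte[1024];
byte[] dst = new byte[1024];

// Unsafe.CopyBlockUnaligned works on managed refs, so no pinning is needed
Unsafe.CopyBlockUnaligned(
    ref MemoryMarshal.GetArrayDataReference(dst),
    ref MemoryMarshal.GetArrayDataReference(src),
    (uint)src.Length);

// Buffer.MemoryCopy wants raw pointers, hence unsafe + fixed (pinning)
unsafe
{
    fixed (byte* pSrc = src, pDst = dst)
    {
        Buffer.MemoryCopy(pSrc, pDst, dst.Length, src.Length);
    }
}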
In your benchmark timing tests you should consider not just DEBUG vs RELEASE builds, but more importantly, running within the IDE or running your app outside of it.
Your app will run the fastest being a RELEASE build running without the debugger/IDE attached.
@@GregMoress I can definitely confirm what you say! With another performance testing project that I recently did using VS Code with OmniSharp, I was actually making this mistake for a while: using Release mode with a debugger attached.
I was experimenting with different scenarios of CPU-intensive synchronous and asynchronous code in a multithreaded situation.
I was not 100% certain in this case whether it would still actually attach the debugger in Release mode. But as it turned out, the difference compared to dotnet watch / run (for which no shortcut key exists by default in VS Code) was very big, and therefore my earlier results were wrong.
I hate stuff like this outside of the hot path. I can see using spans in for loops because it is a reasonable abstraction. The only thing there is that I think they should "fix the compiler" to make foreach loops on certain constructs lower themselves to the faster code whenever it makes sense (I know that is sometimes easier said than done, but they have done similar things with high-level constructs in C++).
I think foreach on a span would be a nice tradeoff between clean(er) syntax and performance if that were the case. TBH what I really want is a world where LINQ runs just as fast as the equivalent loop, but I guess I won't hold my breath on that.
Remember folks, at the machine code level the REL (or equivalent) instruction addresses an array element as a relative offset from a base address. This takes 1 clock cycle to action. If you write your code so the compiler can work at the pointer level and ++ the pointer from the base address, the loop will go as fast as possible. My concern here is that some languages make this difficult.
I work with very big NGS and other biological data, and there is nothing like C or C++ for operating on pointers directly, giving the best performance in low-level methods/functions, since the compiler can easily generate efficient machine code using both relative addressing and small memory manipulations. For the best performance you need to understand compiler design, machine code design and low-level hardware memory behaviour to take advantage of this - and to avoid pitfalls - especially IO waits to memory or even L3 cache.
For most people (especially in high-level languages that don't have primitive data) the performance improvement for this level of effort is unimportant or irrelevant, but for low-level numerical analytics algorithms on big data it is worth the effort. It comes down to what you are trying to do, and whether your requirements dictate that one small part of the code needs to be high performance while for the rest maintainability is key - so just make that code readable.
Also remember that the algorithm is key - it does not matter how you write your loops if you nested them the wrong way round, or if the algorithm has a bad asymptotic order; then you are wasting your time.
Exactly what you said at the end: if I have to deal with unsafe code, why am I even writing C#? I could just use something different that is made for pure performance.
Very true. A slight performance hit - and we're talking about a 0.7 ms vs 0.9 ms for a million items here - is what I'm more than willing to pay to use friendlier semantics in a programming language. Although I have to say that I like to overoptimize some of my projects as a hobby and I'll gladly take that 0.2 ms ;)
It would be nice if the compiler could identify safe times to do the optimization for us.
And where do you benchmark the conversions from whatever you have the data loaded into, to a span?
When I want code to run fast I use the for loop. When I want to understand the code tomorrow morning I use Linq. When I was a kid I'd just allocate a pointer to a struct and increment the pointer. The good old days.
There are rare occasions when you really *need* to eke out every bit of performance, especially when iterating through extremely large collections, and this is a place I would use such a thing. Interestingly, since much of what I have worked on lately has been stuck in .NET 3.5, I didn't have access to any Span, so I had to write my own ArraySlice class for a similar purpose.
I don't think it's an argument about whether you should be using one language over the other. If you need that level of performance, C# is probably not the best candidate. But it's still another tool in your toolset. Thanks for sharing!
Are you saying that you should not think about optimizing your code just because it's C#?
I love how C# can be both a comfortable, safe, relatively high level language, but recent versions also allow low level trickery like this. Not useful for your average Web or GUI app, but great if you're writing system software like databases (see RavenDB) or in fact the CLR.
So C# aspires to become a *lower* level language?
This is actually a good way to keep yourself employed, as not many others in the company would want to maintain that code.
This is an interesting insight into how different techniques compile, but the whole endeavour looks very much like premature optimisation. Clearly some scenarios have been optimised in the compiler, and there is no reason why this might not change in future compilers. These techniques could become pointless obfuscation in the future at best, or even less efficient at worst. For me the use cases for this kind of optimisation are very specific and probably dubious.
always learning something new from Nick, great vid!
Thanks Nick! Just added the Unsafe_For_Span to a project I’m working on. Was it necessary? Absolutely not. Was it fun to try? Hell yes!
It's basically how we traverse stuff in old fashioned C. Moving the pointer to the next node, moving the array pointer one step further etc
I love how we have to write all this code in these new languages to do something which lower-level languages do by default. Assembly, C, and Pascal have used direct memory pointer operations to iterate through arrays since the dawn of time, but to get C# to do it at the same level you have to write a bunch of code… why can't the language simply be smart enough to use pointers internally, automatically… Great video though, it's good to know this stuff…
Reminds me of how on older C compilers looping over an array by index would generate worse code than looping with a pointer.
Not true on a modern, optimizing C compiler, but I wonder if this is a similar deal. And judging by the use of unsafe it sounds like it might elide bounds-checking, which would certainly also explain some of the difference, though I'd expect it to be very very small on large arrays due to branch prediction.
Use nint for indexing with Unsafe.Add for better codegen than int; it will be faster than the span that way. So instead of writing var i = 0, write nint i = 0.
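If I'm reading that right, the shape would be something like this (sketch; the array is just an example):

using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

int[] array = new int[1_000_000];

ref int start = ref MemoryMarshal.GetArrayDataReference(array);
long sum = 0;
// nint is the machine's native pointer width, so the JIT doesn't have to
// sign-extend an int index to a native int on every element offset
for (nint i = 0; i < array.Length; i++)
{
    sum += Unsafe.Add(ref start, i);
}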
Love delving into the deep magic. Great video, Nick, thanks!
I think it is good that you are showing some use cases of the "unsafe territory". The "unsafeness" of pointers and all that is actually part of the language and should not be feared, although it should be treated carefully, of course. But as it is part of the language, I think it is better to educate people about it. I mean, memory management is kind of tricky if you are not used to it, but it is not magic and evil. Once you know how it works it isn't that bad, and it opens up doors. And yes, if you are coming from C# or Java or other GC langs, be sure to pack a lot of feet, because learning memory management will have you shooting yourself in the foot many times before it becomes natural ;)
This feels like we've come full circle from the 90s and we're back to linked lists.
Super insightful and good stuff to know, even though I may not use it, I'm sure if the case arises, I will be recalling this video from you Nick.
Cool experiment and good to know! What you do inside the iteration is almost always going to make iteration-speed optimisations insignificant in relative terms. Probably not worth the unsafe memory risk in the majority of cases.
My half cent on "If I need to write code like this, why am I using C#".
I think that if you are going with a project where these kinds of optimizations are critical, then yes, pick something closer to C. However, I think it's a great thing that C# allows for this, since in some cases the need for this kind of optimization may appear as a surprise, so the ability to "go down a level" is really useful to deal with that kind of nasty surprise.
Loving the videos! Keep them coming.
Did not expect the answer to be pointer arithmetic.
Very interesting.
If you are responsible for developing and maintaining millions of lines of code, which will be touched by developers of different competencies, I would choose comprehensibility over minor performance gains. Maintainable, less cryptic code is always better. Though performance optimization is an important concern, it is not required to this degree (a 1-2% improvement while making the code less understandable). It is really required when one is developing generic libraries, application frameworks and utilities, where developers really know what they are doing. But for real-world enterprise scenarios, it rarely is.
Compilers should just do this for you when possible honestly. People say that premature optimization is the root of all evil in programming. But they also say if it ain't broke, don't fix it, so optimization won't happen except when it is in one critical location. Following that, they also say that software engineers have managed to reverse the hardware progress of the past 10 years to where modern software is as sluggish as their 10-year-old counterparts. It is death by a thousand cuts.
This is C/C++ buffer traversal in C#. Please never do this, so I don't have to come along later and debug a nightmare intermittent failure in your code. It's not named unsafe arbitrarily. If you truly want/need speed, write it in C/C++. (edited some snark out of my comment)
I agree with what you said at the end. If you _really_ need that kind of performance, maybe c# isn't your tool.
Nice video! How does the performance compare to optimized unsafe-context code? I ended up in unsafe coming from C++, and that boosted my algorithms. But I'm worried people after me will not be familiar with the unsafe context. This is a little nicer syntax-wise (although it still is not for everybody), so I'm all in on span and co. as long as performance stays maximal.
Thank you!!! All your videos are great!
You can also use this technique to index a multidimensional array as a normal one
4:50 funny you mention a reference to the 0th element...
One of my favourite things in the newer .NETs is MemoryMarshal.GetArrayDataReference.
And it allowed me to write this funny little extension. It's safe even when the array is empty.
public static int IndexOf<T>(this T[] array, ref T entry)
    where T : struct
{
    var id = Unsafe.ByteOffset(ref MemoryMarshal.GetArrayDataReference(array), ref entry).ToInt64() / Unsafe.SizeOf<T>();
    if (id < 0 || id >= array.Length) return -1;
    return (int)id;
}
FAKE EDIT: and nick mentioned GetArrayDataReference 4 minutes later 😂
This is like being hired by a big company as some kind of business manager and being proud after a year because you have decreased the internal costs of your company by some big figure (just because you decided to... stop buying 1 million coffees for your employees every year...). Anyway, it is great that C# offers this possibility. It might help some companies in very specific scenarios; who knows, they can now tell their clients "this click" runs in half the time!!
When working with array or list with fixed count, I have a habit to cache `array.Length` when using for-loops, i.e. instead of writing `for (int i = 0; i < array.Length; i++)`, I write `for (int i = 0, iMax = array.Length; i < iMax; i++)` so that it only needs to access `array.Length` once.
It won’t make a visible difference even in those benchmarks I showed. There is a section in the video showing the benchmark delta with and without caching the value and it’s less than a nanosecond
@@nickchapsas Normally it wouldn't. But the code you have as it is has the `for` loop doing a full field access every iteration in order to access the `_items.Length` value. This is in contrast to your `foreach` (and other benchmarks) that do the field lookup ONCE and only once. Your `for` loop code is doing the field lookup N*2 times for the size of the array (once for the `Length` and once for the indexer value access). Even if you keep hitting the `Length` property each iteration, you need to make a local variable reference to the `_items` field rather than hitting it twice for every single iteration. There's a very real chance this is the cause of your curious benchmarking times that you pointed out where the `foreach` version outperformed the `for` version.
If I am not mistaken, this practice may hurt performance. With for loops from 0 to arr.Length, the JIT can remove the bounds check. When you change to use a variable, I am not sure it will do the same.
@@shingok It may depend on runtimes, particularly the earlier ones that didn't have the bounds checking optimization. Doing some quick tests, LINQPad's reported disassembly for both versions are very similar: www.diffchecker.com/a2KWo9Zv and a quick benchmark comparing them across 10 million entries yielded functionally identical results: i.imgur.com/5YQZ9Iq.png I don't think tossing the Length value into a local variable makes a difference. But if your collection is a field that you have to load every single iteration, that can have an impact. So it's important to have a local variable reference to `_items` here rather than hitting the field twice each loop.
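Concretely, the hoist being discussed is just this (a sketch; the class and field names are placeholders):

class Benchmark
{
    private readonly int[] _items = new int[1_000_000];

    public long Sum()
    {
        var items = _items; // one field load, instead of one (or two) per iteration
        long total = 0;
        // with a local array and "i < items.Length", the JIT elides the bounds check
        for (int i = 0; i < items.Length; i++)
        {
            total += items[i];
        }
        return total;
    }
}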
Will there be an increase in speed if you assign the "length" of the collection to a variable (before the for loop) and use this variable in the for expression, instead of getting the length of the collection every time?
I share your sentiment, which is: if I am required to write C-ish code for performance reasons, I should probably write C (or Visual C++ and then reference the DLL from my C# project; never done that, but I think it should work).
I feel like this is not some code I would write, because even if I know what it means, co-workers or future co-workers might not.
On a side note, judging by the StdDev, this approach looks more like a hit-or-miss improvement. Iterating once can be super fast, but it can also be slower. The nice thing about the regular loop with Spans is that it is fast AND fairly consistent in execution time.
p/invokes have overhead though
It seems weird to me that a very high-level language which has such low-level access to memory/pointers does not automatically do some of this stuff for you when it can.
Yeah that syntax for getting a managed pointer essentially looks like how you would do a reference (or const reference) in C++. That's essentially what it's doing anyway, which is why it's so fast.
Since the span does not take into account new changes to the list, it is naturally a cool way to perform the Gauss-Jordan iterative method for solving a linear system. I mean Gauss-Jordan, not Gauss-Seidel.
This kind of information should be used by the compiler to automatically optimize clearly written code. The answer is not to use difficult to read and understand code in order to trick the compiler into using a faster implementation.
I think it's good to put the knowledge out there and I appreciate the video. However, I feel like there needs to be a strong caveat because beginners and even intermediate devs will want to use these techniques because they are "cool".
My advice is that you should always write correct, clear code first. That means: always use foreach loops, and only after profiling do you optimize. Because, as the benchmark shows, you don't really get any tangible benefits unless that code is called 100,000+ times, which is true for framework libraries but rarely true for everyday line-of-business applications. Not to mention that compilers and the JIT are pretty sophisticated and in some cases can do these optimizations automatically, as in the case of int[], so in those cases you have uglier code for no benefit whatsoever.
I'm sure you already agree with this but you need to make it clear for beginners and intermediate devs who stumble across this
Is there a risk of going over the stack limit if I use the Span method? It's not stackalloc, but does casting the list into a span put all the data onto the stack, ready to use?
Let's say I have a mesh with a list of a million vertices, and I declare that as a span at the start of the function and then proceed to do operations on them, which might put more pressure on the stack limit.
Yes you can cause a stack overflow if you are not careful
First thing when I saw it - can't you just use this to create a really performant ForEach<T>(this ICollection<T> list, Action<T> action) and ForEach<T>(this T[] array, Action<T> action) and get an easy boost without changing much of your code? More importantly, can it use Memory<T> instead of Span<T> to support async?
That would be slower due to the delegate indirection. And Linq already has that.
@@protox4 Correct, you would at least incur a penalty for any closure in the lambda. But if you're calling another method in the loop anyway, which is very likely, it wouldn't be that bad. This would still probably be faster than LINQ, because LINQ is based on IEnumerable, and enumerating is the slowest method.
Also, you wouldn't be protected against collection mutation while iterating, that would be very dangerous to use as a generic solution for all cases. IMO it has its place in deep optimization scenarios, only when there are a lot of iterations to be made and only when you fully control the collection.
@@robertnull Right, same problem as a for loop, the extra checks may be the main reason foreach is a bit slower.
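For what it's worth, the extension being debated would look something like this (a sketch; FastForEach is a made-up name, and per the thread above it's only safe if nothing mutates the list while it runs):

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class ListExtensions
{
    public static void FastForEach<T>(this List<T> list, Action<T> action)
    {
        // Span over the list's backing array - do NOT add/remove while iterating
        var span = CollectionsMarshal.AsSpan(list);
        for (int i = 0; i < span.Length; i++)
        {
            action(span[i]); // the delegate invocation is the main remaining cost
        }
    }
}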
Great video. I usually do fixed and use pointers if I need performance in C#. It performs as fast as C. UNSAFE is only really unsafe for NOOBS who don't know what they are doing, for us in the know unsafe is totally 100% safe :)
Is that something you could benchmark, and maybe explore to a degree? Is there a benefit to writing a highly optimised library in C++, and adding it as an unmanaged dependency. Can you get further performance boosts by passing a collection (or pointer) to unmanaged code? Where should one draw the line?
Really interesting content. Thanks!!
Maybe I'm wrong, but marshalling, aside from particular use cases, is mandatory when you deal with a different language or an existing API that exposes its functions or data via pointers/references (DLLs in C or other APIs...).
Then we have to deal with so-called "unsafe code", and anybody can understand that. Span is the C# way to deal with performance matters, IMHO.
What do you think?
By the way, I like your videos; I'll keep watching them!! Good job...
Sincerely yours...
You say several times that this is for read-only access but it seems to me you could increment the integers in the span. Would that work? To put it another way, can you use this for map as well as reduce?
Never said it is read only access. You can totally mutate the data.
@@nickchapsas thank you. That makes sense. Obviously you can't add or remove list elements but you can mutate individual elements.
@@mildmanneredjanitor0 You can even use Unsafe.Add to mutate a string... essentially grab the pointer and rewrite it... it can actually be used to implement optimized decoding scenarios, like converting bytes to HEX strings. In that case you would allocate the string at the size you need and grab a ref with Unsafe.Add... ref var foo = ref Unsafe.Add(ref searchSpace, i); and then foo can be mutated.
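For completeness, the supported way to express "allocate the string at the right size, then write into it" is string.Create, which hands you a writable Span<char> over the brand-new string (a sketch of the bytes-to-HEX scenario):

using System;

Console.WriteLine(ToHex(new byte[] { 0xDE, 0xAD, 0xBE, 0xEF })); // DEADBEEF

static string ToHex(byte[] bytes) =>
    string.Create(bytes.Length * 2, bytes, (chars, src) =>
    {
        const string digits = "0123456789ABCDEF";
        for (int i = 0; i < src.Length; i++)
        {
            // two hex chars per byte, written straight into the new string
            chars[i * 2] = digits[src[i] >> 4];
            chars[i * 2 + 1] = digits[src[i] & 0xF];
        }
    });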
Is there a difference between IList and [] / List?
IMO all this stuff should be a matter for the compiler. In many cases the compiler knows what the real underlying type is, so if I use foreach with an array of value types, just as a matter of "fashion", the compiler should translate it into a "for" if that is more efficient.
Why not Memory?
My guess would be that C(++) is faster than C# because memory pointer operations are faster than memory managed operations. And you can improve the operation of C# code by bypassing memory management by pointing directly to memory. But memory management makes coding a lot safer.
Having started in binary, then moving to c, I always liked pointers... I do not use references very often now since the code gets complex
You say binary, but you really mean assembly I'm sure.
If you look at the disassembly, span is faster because there are no bounds checks, partly because it is fixed length. CollectionsMarshal.AsSpan is really just reaching into the List's internals and giving you a span pointing at the internal array; you're not supposed to add to or remove from the List, because then your span (I think?) gets invalidated.
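The invalidation mentioned above looks like this in practice (a sketch of what not to do):

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var list = new List<int>(capacity: 3) { 1, 2, 3 };
var span = CollectionsMarshal.AsSpan(list);

list.Add(4); // the list is full, so this reallocates the backing array

span[0] = 42;               // writes into the OLD, abandoned array...
Console.WriteLine(list[0]); // ...so this still prints 1, not 42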
Very nice. Unfortunately Span cannot be used in async methods. Any solution for that?
You can wrap the looping code in a sync function and call it from your async method.
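i.e. something like this (sketch):

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

Console.WriteLine(await SumAsync(new List<int> { 1, 2, 3 })); // prints 6

async Task<long> SumAsync(List<int> list)
{
    var total = SumCore(list); // no Span crosses an await boundary
    await Task.Yield();        // stand-in for whatever async work follows
    return total;
}

static long SumCore(List<int> list)
{
    // all the Span work stays inside this synchronous helper
    long total = 0;
    foreach (var value in CollectionsMarshal.AsSpan(list))
        total += value;
    return total;
}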
Would have liked to see this on a chart, as it would give the comparisons a much more visual impact; however, very nice work, and it's very interesting how different optimisations can be applied.
The reason to use the whole "unsafe" C# stuff is simple: even unsafe C# code is still better than the C++ equivalent :) Not saying C++ won't be faster, or that you can't optimize this even more in C++, but it's still C++ with all its weirdness. At least that's my opinion; I left C++ in 2011 after 10 years of using it professionally and never looked back.
... as a pentester, I very much welcome pointer arithmetic usage in C# xD
Good to know, but also smells pretty 1990s.
One of my biggest gripes with all C-based modern languages is their failure to handle maps or folds well. C gave us many great ideas, but its focus on loops* is becoming a real problem (one that has been stewing for ~20 years now). Since MMX, code has not been strictly linear; CPUs are not limited to handling SIMD by repeating instructions, yet our languages are still written as though they are. Map and fold really should be primitives, because high-level code will never know how the silicon wants something vectorised. It bugs me that I have to go out of my way just to tell the compiler that there are no dependencies between the blocks in order to get performance (not least because it also makes the code harder for future readers). A good compiler will spot where there are no dependencies and actually turn your loops into vectorised maps.
*As a side note, functional languages' over-dependence on recursion is a similar issue. Both of these control-flow shapes have nice properties, but sometimes the reality of a program is that it has a different set of nice properties, which I would like to be able to tell the compiler about but which are lost if you try to coax the program into a loop- or recursion-based form.
But there are so many algorithms where the order does matter, and where things need to be mutated during the loop... As you say, a good compiler will (and should) spot when there is no dependency, you can also hint to the compiler, but also many of these optimizations are hardware-dependent... I wouldn't mind better map and fold support, but if I had to pick I would prefer loops to be the first class citizens, programming in something like R shows how obnoxious it can be to try to vectorize every algorithm to get decent performance when so many are super trivial to write as a loop.
Ah yes, horrors beyond our understanding. Joking aside, this is technically just pointer arithmetic.
If the goal is to ultimately use an unsafe method, why convert the list to a span at all? Can we not fetch the address of the first element of the list too, just like we did for the array?
In the original video, did he run the looping in different orders? I've seen very different times doing that. He also didn't include Linq.
I believe that if you want this level of performance, you should write C/C++ instead of doing all of this, which moreover reproduces exactly what we do when we handle an array in C/C++.
The sad thing is that some developers do this to impress people, to say "look how good I am! My algos are very performant. I don't do it like everybody else!... I'm different... I'm better!!!!", but most of the time, developers of this kind not only make their code painful to read, they also tend to architect their applications in ways that make them dramatically slow and impossible to maintain.
Still, it's fun to see that we can reproduce C algorithms in .NET! :D
The biggest performance benefit seen here is around 10 microseconds on a list of 1M objects. For reference, if you could speed up an algorithm you're running by even a single millisecond (maybe by avoiding an extra loop somewhere), it'd be a performance improvement 100x greater than this. It's interesting to see what C# is capable of in this case, but I feel like videos like these are misleading and harmful when they're not prefaced with big, massive disclaimers that put the performance into context for devs.
Microsoft can use obscure methods to squeeze performance out of every line of code in LINQ, but that doesn't mean you should be replacing the foreach loops in your bubble sort with pointer arithmetic... or any of your foreach loops, really. I know that's not what's being argued in the video, but I often see people bring up points like this one when discussing optimization, and I think information without the right context bears some of the blame for devs missing the forest for the trees when thinking about performance.
Don't get me wrong, I think the video is very well done. I just wish it felt more like "here's an interesting finding about how some loops are optimized" and less "you wanna write the fastest code? here's how Microsoft does it".
How do you get the lowered C# in Rider? I only see IL code.
It's coming in the next version of Rider.
I like how it starts with "let's make a fast loop without unsafe code" and proceeds to APIs that do unsafe operations. Nonetheless, it is informative.
I think it's very cool we get these low-level functionalities. I know, you could argue, if you need to manage memory, why not write C or C++?
But you don't have to use these - though you *can* - and I wonder: is it doable to get enough performance out of C# to make a game engine that can compete with C++?
Hey Nick.
Thanks a lot for sharing these amazing techniques.
I wish you could make some videos on writing the fastest SQL Server queries.
Please.
Thanks from Afghanistan, and pray for us to get rid of the barbarian Taliban.
I'm a C++ programmer; I like the fair conclusion :).
Very interesting. Could you please compare this with C++? I really love the awesomeness of these improvements in C#.
I second this. I have a very specific application for this and am wondering how much slower C# is.
Nice video. However: Spans were introduced to get the performance of plain arrays of value types, not the other way around. The reason that lists are normally slower to iterate is only partly due to the overhead of the iterator: the biggest penalty is the checks that the collection is not being changed in the meantime.
As your own data shows, when looping over int arrays directly nothing can be gained; the iterator is a value type that has exactly the same memory footprint as the for loop with its index, and the spans do basically the same.
Long story short: the trick you showed lets you directly access data, bypassing the encapsulating data structures. In most cases, that encapsulation was there on purpose...
Coming from C/C++, the adding makes perfect sense.
Does C# have a "nomicon" book for learning about unsafe like Rust does?
Any chance that using this with something like dapper would allow span through the data chain?
I don't think it's simply "it's not a fit for C#"; it's extremely likely that the time someone needs to spend on an iteration like this (especially if done from scratch) is more valuably spent on other things in the project, even purely on optimization. In most projects there would be bigger optimization potential than this. But otherwise, cool trick; it could be really fun to trial-and-error with it for fun.
This is actually the best way to loop through multi-dimensional arrays: you get clean and fast code without having to calculate the dimension indices.
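e.g. walking an int[,] as flat storage with a single index (a sketch; needs .NET 6+, where GetArrayDataReference also accepts a System.Array):

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

int[,] grid = new int[1000, 1000];

// a rectangular array is one contiguous block, so one index covers it all
ref byte raw = ref MemoryMarshal.GetArrayDataReference((Array)grid);
ref int first = ref Unsafe.As<byte, int>(ref raw);
for (int i = 0; i < grid.Length; i++) // Length of int[,] is rows * cols
{
    Unsafe.Add(ref first, i) = i;
}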
What do you think about adding a delegate as an input parameter to this super_crazy_fastest_forloop_ever()? Is that probably going to kill the performance, or could it be worth it?
How the hell do you write code outside of classes, C++ style? I don't get it; is it some tool for prototyping in Rider that hides the Program class and Main method in a console app?
Nah, Main is implied in Program.cs since C# 9.
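For reference, this is all a top-level-statements Program.cs needs to contain:

// Program.cs - the whole file; the compiler generates Program/Main around it
using System;

Console.WriteLine("Hello from top-level statements");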
Why are there unicode chars in your identifiers?
A great example of why I would never use this language. People still struggle to understand that no level of "safety" stops programmers from breaking their code. So all this "safe" stuff ends up mostly getting in the way. If you try to make efficient code, it looks like this mess. And if you fail, you end up with an unreadable undebuggable mess full of unsafe stuff.
I think one big advantage of (modern) C++ is that you get plenty of safety options but none of it is required. And unsafe code ends up being simple to read and follow so it's easier to debug.
On the other hand, you have to write C++
@@nickchapsas writing C++ is not as bad as reading it. It is designed to have a feature for every programmer, but you aren't every programmer so you only really need to know the stuff relevant to your style. And most of the features are high level stuff so you don't need them (and shouldn't use them) in high-performance code.
I'm not trying to make you switch languages. I just want you to seriously question the idea of writing weird stuff that obviously isn't how the language is intended to be used.
One question: is the cost of marshalling the collection as a span small enough for this code to still be more efficient than a for loop?
In this case they're just getting the pointer to the internal array of the list and returning it as a span; it should not cost much.
The benchmarks fully answer that question, don't they?
Weird. Been using C# since 2003 and I didn't know they had a way to do pointer arithmetic without actually using pointers.