Just a quick clarification. The hand-written version of the host extraction assumes that there are no query strings or hashes on the URLs. That's just because of the specific use case. If you want the behavior to match the Uri class you need to add index checks for a question mark and a hash to extract the correct host. Adding this logic won't affect the memory shown in the video or the execution speed at all (I tested it).
Or the port, before the slash or end of string! (example:8080)
And it doesn't support IP address hosts, but maybe that's desirable.
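For anyone curious, here is a minimal sketch (not the code from the video) of what those extra checks could look like in a span-based version: the host ends at the first '/', '?' or '#'. Stripping a port, as mentioned above, would need one more check for ':'.

using System;

static ReadOnlySpan<char> GetHost(ReadOnlySpan<char> url)
{
    // Skip the scheme if there is one
    var schemeIndex = url.IndexOf("://".AsSpan());
    var host = schemeIndex == -1 ? url : url.Slice(schemeIndex + 3);

    // The host ends at the first '/', '?' or '#', whichever comes first
    var end = host.IndexOfAny('/', '?', '#');
    return end == -1 ? host : host.Slice(0, end);
}

Console.WriteLine(GetHost("https://nickchapsas.com/blog?x=1").ToString()); // nickchapsas.com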
We used to use a StringPool when we were using a middleware messaging library because there was a fixed number of strings we could use to create the recipient and sender strings. We were generating potentially hundreds of these messages per second so we couldn't be thrashing around in memory like that. StringPool to the rescue!
I've come to the conclusion you've bugged my house. Only yesterday I was having a conversation about strings and their performance... and you post this video. Only last week I had a discussion about Dictionaries and their performance, only to find out 5 minutes later you'd posted a video about something you could do to improve their performance in some cases. Keep up the work! These videos do help a lot!
Let's check if you both have quantum entanglement, think about something... say WeakReference😁
Out of curiosity I benchmarked using a regular expression. The expression below benchmarked at ~1.5 microseconds, using 3.5KB of memory for the shorter URL and 11.5KB for the longer one. So memory usage doesn't scale very well at all (and is significantly worse when compared to the benchmarks in the video, both in time and memory).
(\w+:\/\/)[A-Za-z\.]+
I was wondering about that, thanks!
heh, I had rolled my own string pool and added it into our project. We're not using it properly yet though.
We do repeat some strings (particularly user names) a stupidly absurd number of times, so I intended it to be used to coalesce those into a single allocation.
The basic structure was an int -> weakref dictionary wrapped by a ReaderWriterLock, with the int being the string's GetHashCode.
I did it that way so the GC can clean up any strings that fall out of use, and I also made it force a vacuum after every 32768 inserts.
I had no idea that NuGet package existed though. I'm looking at its code, and it does some crazy stuff with structs and refs, but I don't think it would GC the way I would expect it to.
At least it can be reset unlike .NET's Interned strings.
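For context, a rough sketch of the kind of structure described above (my own reconstruction under those assumptions, not the commenter's actual code; I've used ReaderWriterLockSlim, and a real version would also need to handle GetHashCode collisions properly):

using System;
using System.Collections.Generic;
using System.Threading;

public sealed class WeakStringPool
{
    private readonly Dictionary<int, WeakReference<string>> _entries = new();
    private readonly ReaderWriterLockSlim _lock = new();
    private int _insertsSinceVacuum;

    public string GetOrAdd(string value)
    {
        var key = value.GetHashCode(); // simplification: hash collisions map distinct strings to one slot

        _lock.EnterReadLock();
        try
        {
            if (_entries.TryGetValue(key, out var weak) &&
                weak.TryGetTarget(out var existing) &&
                existing == value)
            {
                return existing; // reuse the already-pooled instance
            }
        }
        finally { _lock.ExitReadLock(); }

        _lock.EnterWriteLock();
        try
        {
            _entries[key] = new WeakReference<string>(value);
            if (++_insertsSinceVacuum >= 32768)
            {
                Vacuum();
                _insertsSinceVacuum = 0;
            }
            return value;
        }
        finally { _lock.ExitWriteLock(); }
    }

    // Drop entries whose strings have already been collected by the GC
    private void Vacuum()
    {
        var dead = new List<int>();
        foreach (var (key, weak) in _entries)
        {
            if (!weak.TryGetTarget(out _)) dead.Add(key);
        }
        foreach (var key in dead) _entries.Remove(key);
    }
}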
It is a very tricky memory/CPU tradeoff, only applicable to code working with low-diversity strings (sounds pretty intolerant 🙂). A memory optimization can easily become a memory disaster when you collect strings that are never needed again, like grannies collect garbage 🙂. This feature must be used either wisely or never; I don't think it works like a cache with sliding expiration, because that is expensive to manage and would wipe out all the benefits of those micro-optimizations.
Fair point regarding string diversity... Unless you're expecting the same ones to occur regularly, it's not obvious whether the StringPool approach adds that much.
I mean, hostnames might well be frequently identical... I've got immense mileage out of string pools in the past.
StringPool has a fixed size (which you can also choose upon creation) and it internally uses an LRU schema to automatically drop old string instances when it reaches full capacity. So it doesn't really have a risk of just completely filling your memory, it can't ever exceed the maximum capacity you decided 🙂
@@Sergio0694 Ok, that makes sense, it will not exhaust memory, but it will still be less efficient in the wrong use cases compared to not using it at all 🙂
@@iGexogen Yes, using things incorrectly or in the wrong use case is generally expected to not produce great results, that is a universal principle not specific to StringPool 😄
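For reference, a minimal usage sketch of what's described above, assuming the CommunityToolkit.HighPerformance package (the capacity and strings are just examples):

using System;
using CommunityToolkit.HighPerformance.Buffers;

// A pool created with an explicit capacity, queried with spans so no throwaway
// string is allocated when the value is already cached.
var pool = new StringPool(512);

var url = "https://nickchapsas.com/some/path";
var host = pool.GetOrAdd(url.AsSpan(8, 15));            // adds "nickchapsas.com" once

var again = pool.GetOrAdd("nickchapsas.com".AsSpan());  // same chars -> same cached instance
Console.WriteLine(ReferenceEquals(host, again));        // True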
I think it would be worthwhile to create a demo app where you can show that `300 ns` translates to `3 s` under some inputs. And then reuse that demo app for all your real-world benchmarks.
E.g., an API which runs an ETL pipeline over millions of rows. Here you could have "GetHost" as an ETL step, and then show that this takes 1 million * 300ns for the whole step; and then issue lots of requests, and suddenly you've got significant performance problems.
The purpose of the API/ETL could be to summarise the last (simulated) few days of IoT sensor events, or something where processing millions of rows / request makes sense.
What I wanna do is get a project that performs fine and then apply multiple optimizations to show how speed and memory improve the overall performance of the app. That's gonna be a good example
@@nickchapsas I would love to see this, it would be a great way to demonstrate the overall process of identifying problem areas and methods of addressing them.
Great video Nick. Concerning the substring, you should try an alternative that Microsoft provides in Microsoft.Extensions.Primitives: StringSegment.
I just don’t trust myself to implement uri properly haha
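A small sketch of the StringSegment idea mentioned above, assuming the Microsoft.Extensions.Primitives package (the URL and offsets are made up for illustration): the segment just records (buffer, offset, length) over the original string, so nothing is copied until .Value is actually requested.

using System;
using Microsoft.Extensions.Primitives;

var url = "https://nickchapsas.com/blog/some-post";
var host = new StringSegment(url, 8, 15);   // points into the original string

Console.WriteLine(host.AsSpan().Length);    // 15 - still no new string
Console.WriteLine(host.Value);              // "nickchapsas.com" - allocates only here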
I may try this. I made a web scraper that parses all found links using Uri to get hosts. This would certainly improve performance / memory allocations a bit, as it gets called a lot in this use case.
Hello, I think you should have mentioned String.Intern(), a mechanism similar to StringPool but available in the core libraries, and what the pros and cons of both are. Thank you.
In my opinion, this is an excellent solution when you are making a small DSL full of small needs with repeated strings. Perhaps it is not ideal for all cases, since it considerably increases the complexity, but it is one more tool to have at hand when needed.
I love these optimization videos, sadly I don't need to optimize much at my job
Technically there is another optimization you could do.
There should be an overload of IndexOf() that takes a start index to begin searching from. At least Java has one.
Meaning one Substring can be eliminated and you come down to 1 Substring allocation instead of 2.
With spans, of course, that isn't required, but it saves on allocation too.
I came here to post this. Indeed, in the documentation for the overloads of System.String.IndexOf in the .NET Framework, there's one for (char value, int startIndex) and another for (string value, int startIndex). That would eliminate the call to System.String.Substring in statement 3 of GetHostGood(string).
GetHostBetter shows this.
@@alexisfibonacci GetBetterHosts only stack-allocates instead of heap-allocating.
But the same optimization can still be applied, because a stack allocation still has to be moved. Yes, it's faster, but this optimization can still be used.
Since a second stack allocation isn't required either with my suggestion.
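For clarity, a hypothetical GetHost variant along the lines the commenters describe (the name is mine, not the video's): the startIndex overloads of IndexOf remove the intermediate Substring, so only the final result allocates.

using System;

static string GetHost(string url)
{
    var schemeIndex = url.IndexOf("://", StringComparison.Ordinal);
    var start = schemeIndex == -1 ? 0 : schemeIndex + 3;

    var slashIndex = url.IndexOf('/', start);   // search from 'start' directly, no Substring needed
    return slashIndex == -1
        ? url.Substring(start)
        : url.Substring(start, slashIndex - start);
}

Console.WriteLine(GetHost("https://nickchapsas.com/blog")); // nickchapsas.com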
One thing to note: while your implementation may be faster and more memory efficient, the behaviour is also different. The Uri version will not break with a query param while yours does.
Why would it break?
@@nickchapsas Apparently I can't give example URIs in comments as they are blocked by UA-cam.
You search for a trailing slash, but a fragment or query param can start directly after the hostname without a trailing slash.
Your parsed hostname would include the query param and/or the fragment.
@@prowhiskey2678 Sure, so you just add an IndexOf check for a question mark or hash after the forward-slash one fails and you're done. I just tested it and it doesn't affect execution speed or memory at all.
Wait... if it's a pool, shouldn't you return the strings back to the pool once you don't need them anymore? Or is StringPool not threadsafe?
I didn't even notice the memory diff tool was in Visual Studio also! Nice!!!
How stringpool performance compares to interning?
holy smokes what a great sponsor from AWS!!!
What if you created a private readonly field for the "://" char array, then created a method which gets the string already as a ReadOnlySpan, performed the search while saving the indexes in a Span on the stack, and only returned the substring at the end, without always converting to a span, since it is converted implicitly in the parameter?
You don't need to create anything for literal strings. They are always in memory and never recreated.
Is manually stack-allocating a char array necessary for the span version? I can't remember where I saw it or what the name was (or I'd google it), but I was under the impression the compiler (or maybe the runtime?) sort of converts string literals into constants so that it can reuse the same instance every time. In which case, wouldn't just using a string literal (instead of the much harder to read char array) be equally non-allocating? Alternatively, you could manually declare a const string yourself, if the compiler isn't doing it for you like I think. Or is there some reason a const string would still end up allocating that I'm not aware of?
Well, guess I should've watched the next video, because I was definitely thinking of interning. Still curious whether there's some reason using a string literal over a manual char array would be worse for memory, despite interning.
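A tiny sketch of that point: the "://" literal is interned once per app, and wrapping it as a span is just a view over it, so neither line below allocates anything per call (the helper name is mine, for illustration only).

using System;

static int FindSchemeSeparator(string url)
{
    ReadOnlySpan<char> separator = "://";     // view over the interned literal, no copy
    return url.AsSpan().IndexOf(separator);
}

Console.WriteLine(FindSchemeSeparator("https://nickchapsas.com")); // 5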
Could you not just return a Span (or a readonly one)?
Span needs to be returned as a string at some point. You can't visualize a span without eventually allocating it on the heap as a string. The benchmark just cuts to that point directly
@@nickchapsas right but the span is a view of the originally passed string, so that's where it is allocated right?
@@Dustyy01 That's for the original string. The host is nickchapsas.com so even though the span points to that, when it is turned into a string it is basically "chopped off" into a whole new string. It detaches from the span.
Thank you for your video. A few days ago, talking to some IT friends, a question came up: do you actually code? Like, do you work on real projects, or not any more? Or never?
Thank you.
I do. I've spent the better part of the last 4 years working for one of the largest payment gateways in the world (a Stripe competitor), building systems dealing with thousands of requests per second and millions of users. Currently I am educating full-time, while still practicing the same techniques I used at work on my own projects. I'm currently building something that I can't announce yet.
@@nickchapsas Thank you, Nick, for your reply. I hope my question was not invasive or impolite. And thank you for your videos. We all learn a lot from them.
Nick, I think you can use the string.Intern pool, which is part of the BCL, to do this.
Yep, all this modern memory efficiency things in dotnet are great additions. By the way, we have at least 3 different of those pool types: MemoryPool, ArrayPool, StringPool.
Note that those are also thread-safe! This might explain why you noticed a bit of lower performance when using the StringPool, because of some protective overhead.
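As a quick illustration of the pool family mentioned above, a minimal ArrayPool example (buffer size and contents are arbitrary): rent a buffer, use it, and return it instead of allocating a fresh array every time.

using System;
using System.Buffers;

var pool = ArrayPool<char>.Shared;
var buffer = pool.Rent(256);                     // may hand back a larger array than asked for
try
{
    "nickchapsas.com".AsSpan().CopyTo(buffer);
    Console.WriteLine(new string(buffer, 0, 15));
}
finally
{
    pool.Return(buffer);                         // buffer becomes reusable by the next caller
}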
I like your videos and how you explain all those things. Thank you!
Do you have any videos on how to find out what the bottleneck is first, before trying to do these optimizations?
I recently optimized some code for printing long (~250,000 characters) strings to the console, specifically for Linux. The tricky part was to turn the long C# StringBuilder's UTF-16 characters into a UTF-8 string without the GC going amok. With a bit of unsafe code to allow manual memory allocation, I was able to turn the StringBuilder into a Span, then convert it to UTF-8 and store it in a Span, and then use P/Invoke to print it to the console using libc's printf function. The speed is heavily CPU-dependent, but I managed to print around 12.5 million characters to the console every second (most of that being ANSI escape sequences to apply colors to the console output), all without the GC going crazy, and only using around 12-25 MB of RAM according to Visual Studio.
Uh, char is 1 byte, but utf16 is 2 byte, utf8 is 1-4 bytes, I see your pain !
@@29Aios Converting from UTF16 (char/ushort) to UTF8 (byte) was actually the easy part. System.Text.Encoding.UTF8 has a "GetByteCount" method which tells how many bytes are needed to encode a UTF16 input to UTF8. It also has a "GetBytes" method which has an overload that takes an input Span and an output Span, and it then handles the actual conversion. Then you just gotta remember to add an extra byte to the byte span, and set it to zero, so the UTF8 string is properly null terminated.
@@MZZenyl That's cool !
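A hedged sketch of that conversion step (variable names are mine; the original used manual/unsafe allocation rather than a plain array): measure the UTF-8 size first, then encode straight into a byte span and null-terminate it for printf-style consumers.

using System;
using System.Text;

ReadOnlySpan<char> utf16 = "Hello, \u001b[31mconsole\u001b[0m";

var byteCount = Encoding.UTF8.GetByteCount(utf16);
Span<byte> utf8 = new byte[byteCount + 1];

Encoding.UTF8.GetBytes(utf16, utf8);   // encode the UTF-16 chars into the UTF-8 span
utf8[byteCount] = 0;                   // trailing null terminator

Console.WriteLine($"{utf16.Length} chars -> {byteCount} UTF-8 bytes");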
"I managed to print around 12.5 million characters to the console every second"
what?
your "print around 12.5M" = 12.5M fps
it's impossible by definition
I work with consoles, and heavy console output is almost trivial.
Count the time of the output lines, and if this time exceeds some delta, skip the entire output buffer.
You need to separate the internal console buffer from the rendering of that buffer, but you need to write your own console for this.
Then you will be able to output as many lines to the console as you like, you just won't see them))) and you will always see the last buffer.
And yes, what you are doing is crazy, because by definition millions of lines are impossible to read, and therefore it is not necessary.
The console should output the required 400 buffer lines - but that's not right either, the console should output 25 lines and no more.
You have 60 frames per second, let's say 120; 25 lines per frame is 120*25 = 3000.
A typical renderer renders 30-40k lines per second on built-in video cards without drivers, only GDI; it doesn't get any faster.
This means that continuous line rendering at a fixed 60 fps (above) is rendering 1500 lines per second, and this is great because it does not load the processor (about 3-5%).
Yes, yes, rendering any huge number of lines is about 3-5% CPU load; I have such a self-written console) in the most difficult scenarios it is 3-5%.
And yes, if you write your own console, don't forget to add a pool of graphical objects, because it makes no sense to rebuild the same graphical objects - keep them ready for rendering, because usually a lot of lines are repeated or contain repeated parts, which means these are repeating graphic objects.
@@AEF23C20 Alright, let's take this step-by-step:
- You need to calm down. You are needlessly aggressive and judgemental over something that does not concern you. You are jumping to conclusions that, as you might imagine, are wildly inaccurate.
- The project in question is a TUI framework I'm doing as a hobby project, and the numbers from my original comment relate to a stress test, hence the large number of characters being written to the console. As you might imagine, such a framework prints over the same area of the screen buffer for each render pass. Most of those characters are part of ANSI escape sequences, specifically the 38;2 and 48;2 sequences which dictate the 24-bit RGB value of the fore- and background, respectively.
- I said 12.5 million characters, not frames. Please learn to read. The performance is naturally heavily dependent on the system it runs on, and the console host it is executed within. My home PC can print around 6-7 "FPS" with conhost set to the default 120x30 character size, while Windows Terminal seems far less consistent in terms of "FPS", jumping between around 30-100 "FPS".
4:24 IndexOf accepts a startIndex argument! No need to substring here!
When comparing results in the benchmark, the difference is very negligible between the Good & Better solutions, so I'd suggest using the more intuitive and readable one (which in my opinion would be the homemade one but everyone is free to have their own opinion).
Worth noting that some of these methods won't work with URLs that have fragments, query strings or (the admittedly rare) username:password style syntax.
hi
I'm curious to know: in the real world, when we write code and we are familiar with Docker, Kubernetes and so on, should we write the most performant code that can handle a lot of requests as a single instance, or is it more common to write the most human-readable code (by writing "new Uri(urlString)") and use load balancing and other techniques to handle more requests?
Love you!
i thought you loved me?
Beefy Donkey, I love you too
@@nickchapsas im feeling REALLY left out here. i guess im about as needed as a string allocation:(
rngesus, I love you too too
@@nickchapsas hahaha 😄
Good stuff, thanks Nick
Maybe you could cover string.Intern as well.
Good suggestion! Will add it in the list
Will using the ref keyword and changing the url variable that is passed in be as memory efficient as StringPool but as quick as a span? You are not creating a new string but changing one that you already have, so you only need to allocate the string once?
Java has a string-compacting garbage collector that will consolidate multiple instances of the same string down to one pointer - it would be a great addition to .NET in some circumstances.
Sometimes an external library hands you lots of repeat strings (I'm looking at you, Azure Blob Storage SDK) and you just have to take the memory hit.
I really enjoy the perf videos👍🏽
Thanks for the video. I wonder how String.Intern works. I had the impression it was a kind of string pool to avoid keeping duplicate strings around.
I thought .Net does string pooling by default to reduce allocations whenever possible. Or is that only for string literals?
String literals and some other cases. Since so many people misunderstand this, I will be making a video this week to explain it
What's that site that has all the boilerplate projects you can download?
I thought that in C#, strings were reused via reference, and if you create two string objects with the same content, they point to the same memory location. Why would we need this pool then?
That only happens for strings created at compile time, and it's called string interning. Checking every string created at runtime against every string in memory is very expensive.
@@vivan000 Then it seems like we need pools for everything 😀
Especially for fewer GC runs.
@@nayanchoudhary4353 absolutely not. It's just another optimization technique that might help in some cases, but also could lead to severe problems.
@@vivan000 Which is why the StringPool is slower even though it doesn't allocate. The coder needs to be clear on what they are doing and optimize appropriately.
See, when you add a new string, you would need to compare it with all strings in the heap, and as Ivan said, that's too expensive.
Instead it is always added to the heap, and references to the same string can be equal only at compile time.
I suppose when you add a Span to a StringPool there are 2 cases: 1st, the happy one, where the hash code is calculated from the span and, if the same entry exists in the dictionary buckets, it is fully compared byte by byte and the existing string reference is returned; or 2nd, a new string is created from the Span and returned.
"AsSpan" just for "IndexOf"? A lot of those string created with the first use of the "Uri" class were static, weren't they?
I thought that using a Dictionary for caching the results from the GetHost function would be better than using a StringPool. What are the advantages of using a StringPool vs Dictionary for caching the results? Wouldn't the Dictionary be better since it's faster?
There’s quite a few actually. StringPool has logic to automatically unload data that isn’t accessed frequently, it has a fixed size, it is thread safe, it uses a lot of custom set logic to be as efficient as possible.
Guys, MemoryCache has the same logic as StringPool and uses a ConcurrentDictionary inside, as well as linked entries and much more.
However, a big advantage of StringPool.GetOrAdd() is that it accepts a Span parameter (MemoryCache/Dictionary do not). I suppose it calculates the Span's hash and, if it exists in the buckets, compares whether it is the same, like a Dictionary does, but Span == string; if so, it returns a reference to the existing string, otherwise it allocates a new one from the Span. So there is no need to allocate a new string on the heap each time, whereas for MemoryCache/Dictionary you instead have to allocate a string key on the heap to pass as a parameter to Get()/TryGetValue().
PS. It would be nice to have a string.Intern(Span) overload, so the behavior is almost the same as StringPool, but a singleton.
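A small sketch of the difference being described, assuming CommunityToolkit's StringPool: the dictionary needs a string key, so you pay an allocation just to look something up, while the pool can be queried with the span itself and only allocates on a miss.

using System;
using System.Collections.Generic;
using CommunityToolkit.HighPerformance.Buffers;

ReadOnlySpan<char> hostChars = "nickchapsas.com".AsSpan();

var cache = new Dictionary<string, string>();
var key = new string(hostChars);                      // allocation just to ask the dictionary
cache.TryGetValue(key, out _);

var pooled = StringPool.Shared.GetOrAdd(hostChars);   // allocates only the first time this content is seen
Console.WriteLine(pooled);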
I think this is literally what I always imagined for creating allocation-less string returns. If I'm correct, StringPool is *essentially* like StringBuilder, except it's a singleton, meaning it's accessed from anywhere. It's essentially a "scratch pad" for your string data! This video made me ask so many other questions though. Like if StringPool was being used across threads, would there be (a need for) a separate pool for each thread, to avoid memory conflicts? Either way the concept is amazing.
This would help so much when it comes to games/UI where you are creating new strings each frame, instead of trying to avoid it by doing value checks or something.
Also, at 5:56 (line 8), I thought if you used a string literal, it is stored globally in read-only memory, so passing in a regular string would not allocate new memory. So no need to stackalloc right?
Why don't they make the implementation of the `Uri` class span-based?
You're calling the AsSpan method 3 times. Is there any mileage performance-wise in creating the span instance initially and then manipulating that one object?
It won't make a difference, no. Those methods are simply casting.
The solution to all the world's problems is span if you ask Nick x3
Only with Spans
spans let u SIMD :)
10:23 - wasn't that the one without Span?
Actually pooling is very important in my game framework Velaptor!!
I use the pooling technique when loading textures and sounds, which are expensive operations, and it's fantastic. Especially in game development, where performance is important as well as saving those memory allocations.
Great video!!
Re-inventing the wheel for minor performance boosts on standard tasks that have good general purpose implementations is usually not a good idea.
Surely, it's a CPU/memory performance optimization which should be used on customer demand and after profiling.
The maintainability of the code will also suffer because of these optimizations.
This is why I'm up at 10:30
Almost got goofed typing span
Any reason stackalloc would be better than a static readonly instance?
In this case, there isn't much of a difference. You can even use the const as it was originally and you won't see much of a diff there. If anything it might be a bit faster.
@@nickchapsas Yeah, I would have just used the string "://" and if the extensions don't allow for it directly, just: "://".AsSpan().
aren't strings already pooled ?
Not really: ua-cam.com/video/Xeq8YGEyyp8/v-deo.html
Why do I need a string pool when the CLR manages an internal string pool itself?
That's not quite true. People in the comments seem to completely misunderstand how string interning works. I think I need to make a video to explain this
That 'string pool' applies to the entire App Domain, and cannot ever be emptied or truncated without unloading said AppDomain.
It contains all constants and literals loaded within the assemblies thus far, and also anything that was explicitly added via String.Intern.
However, unless you access it via specific paths (e.g. Enum.ToString), or explicitly call String.IsInterned or String.Intern, you will NOT be using these instances, but rather heap allocated copies, and to even call the latter methods, you're still very likely to be using a heap allocated copy to reference and compare.
In summary, the biggest draw of this StringPool over the built in CLR one is that it can release its references to these strings, and allow them to be GC'd when no longer required or in use, a secondary draw is that you can use Spans to avoid needless allocations.
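A quick sketch illustrating the interning behaviour described above (the string content is arbitrary): literals come from the intern pool, runtime-built strings do not unless you intern them explicitly, and anything interned then lives for the lifetime of the app domain.

using System;

var literal = "nickchapsas.com";
var built = string.Concat("nickchapsas", ".com");   // built at runtime, separate heap copy

Console.WriteLine(ReferenceEquals(literal, built));                 // False
Console.WriteLine(string.IsInterned(built) is not null);            // True - the literal interned this content
Console.WriteLine(ReferenceEquals(literal, string.Intern(built)));  // True - Intern returns the pooled instance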
Have you seen overload with startIndex?
public int IndexOf (string value, int startIndex);
public int IndexOf (string value, int startIndex, int count, StringComparison comparisonType);
Also, replacing the literal string with a Span doesn't make sense, as a literal string is in memory once and forever, but the span is recreated each time.
string GetHost(string url)
{
var i = url.IndexOf("://", StringComparison.Ordinal);
i = i == -1 ? 0 : i+3;
var j = url.IndexOf('/', i);
return j == -1 ? url.Substring(i) : url.Substring(i, j-i);
}
But actually they are all wrong, as the host must end not only on '/' but also on '?' and '#'.
I would make a static array of these 3 chars and use IndexOfAny(ends, i).
I thought string.Intern does the same. Extract an existing string from the pool or adds one. It's time for me to reread the documentation...
9:23: "I'm going to change this to release mode" - misclick - no? Sorry, wasn't looking straight.
There is an unnecessary Substring call in the "good" implementation :D
Is there honestly a problem here you're solving? I mean you could say "every byte is one too many" but does it *really* matter for any other purpose than to learn the internals? If I had a problem with slow services I'd just throw 4 extra nodes against the problem and call it a day. I worked on extremely high traffic APIs (millions of requests per hour) and these things have not once been a problem for us.
Could you switch to C++ at this rate? This kind of optimization is more of a C++ thing; C# is more about readability.
String pooling is a language agnostic concept. On top of that there is nothing less readable in the final example. It's exactly the same as the previous example with a single line being different
@@nickchapsas I don't think so. In C++ we have pointers to strings. Any mutation of a string in any thread is global, but in C# it will first be copied, i.e. it will be local.
@@nickchapsas Hey, love your videos. I just see that you make a lot of optimizations that are not ordinary for C# but a normal day in the office in C++.
Memory is cheap, CPU performance isn't...unless you're doing embedded stuff.
If a developer on my team wrote the substring code instead of using Uri, I'd kick him from the team :-) KISS is always better than mess.
👆 Great team lead over here. Wouldn't sit down and talk about it, but simply kick the developer from the team
That depends on where this code is used. Say, in a high-load application where there are thousands/millions of function calls per second, it should be optimized as much as possible; sometimes it would even be preferable to move the code to an unmanaged C++ lib, or even an ASM lib, where processor-specific code is implemented.
Just for example: a Medical Visual Systems company here in Russia was going to hire me. They write 4K/60 from 8 cams around a medical operating room to their data center, with the cam streams encoded by specific Intel CPUs (not GPUs) to H.264 or H.265 (I don't remember which), and they are going to provide a high-tech medical operating room to the Antarctica Vostok station soon.
So there is no need to optimize code which is not frequently used; it should be decided based on customer demand and then code profiling, and code maintainability is very valuable.
30% of your video shows the explorer and the IDE; people watch these videos on their smartphones.
First you create a language with a GC, then you waste your time making it fast.
I think GetHostBetter will be my preferred approach.