PadRight ties with the CharArray approach for second best in speed at 14 ns, but it allocates less memory than the char array: 80 B compared to 96 B.
I'd be curious to see how this compares to stackalloc char[] and then passing that span to the string constructor. Granted, that's even more constrained, because the buffer lives on the stack and has to stay small, but it seems like it would have comparable performance.
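For comparison, a minimal sketch of the stackalloc-then-string-constructor idea in safe C# (Span<char> stackalloc needs C# 7.2+, and using it inside a conditional needs C# 8+). stackalloc accepts a length known only at run time; the practical limit is stack size, so this version falls back to a heap array above a cut-off. The class name, the `visible` parameter and the 256-char limit are illustrative assumptions, and it assumes `visible <= value.Length`.

```csharp
using System;

public static class StackMask
{
    // Hypothetical cut-off: buffers up to this many chars go on the stack.
    private const int StackLimit = 256;

    public static string Mask(string value, int visible)
    {
        // stackalloc takes a runtime length, but a large buffer could overflow
        // the stack, so longer inputs use a heap array instead.
        Span<char> buffer = value.Length <= StackLimit
            ? stackalloc char[value.Length]
            : new char[value.Length];

        buffer.Fill('*');                          // everything masked first
        value.AsSpan(0, visible).CopyTo(buffer);   // then restore the visible prefix

        // The string(ReadOnlySpan<char>) constructor copies the buffer into a new heap string.
        return new string(buffer);
    }
}
```

The final string is still a heap allocation; only the scratch buffer moves to the stack.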
The thing you have to remember about the string.Create method is that it takes a delegate. So if the delegate uses values from outside its own scope, it'll allocate memory for the closure.
The whole point is that you should not be passing anything from outside, that's why the length and the initial string are parameters in the delegate in the first place, so they are not captured in closures.
@@nickchapsas Yes, you should not. But you can. If you want to make some computed string and you know the absolute length limit (you can trim the empty space later if it matters), you might be misled into believing that string.Create would be good for that. I've watched a bit further now and you've actually mentioned closures, so that's on me for my comment eagerness. I've just stumbled upon a similar task that involves string concatenation and considered all the options (remembering this particular video). I'm actually curious now whether closures would make it worse than a regular concatenation.
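To make the closure point concrete, a minimal sketch with hypothetical class and method names, assuming `visible <= value.Length`. The first method captures `visible`, so each call allocates a closure object and a delegate; the second passes everything through the `state` argument as a tuple and marks the lambda `static` (C# 9+), which makes the compiler reject any accidental capture.

```csharp
using System;

public static class ClosureDemo
{
    // Captures the 'visible' parameter, so every call allocates a closure
    // object for it plus a new delegate instance.
    public static string MaskCapturing(string value, int visible)
    {
        return string.Create(value.Length, value, (span, v) =>
        {
            v.AsSpan(0, visible).CopyTo(span);
            span[visible..].Fill('*');
        });
    }

    // Passes everything through the 'state' tuple (a ValueTuple, i.e. a struct).
    // The lambda captures nothing, so the compiler can cache the delegate;
    // 'static' turns any accidental capture into a compile error.
    public static string MaskWithState(string value, int visible)
    {
        return string.Create(value.Length, (Value: value, Visible: visible), static (span, state) =>
        {
            state.Value.AsSpan(0, state.Visible).CopyTo(span);
            span[state.Visible..].Fill('*');
        });
    }
}
```

Both return the same string; only the hidden allocations differ, which a benchmark with a memory diagnoser would show.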
30 years ago this wasn't "a lot" of memory; today it is literally nothing. That is like saying a paper cut is the same as an amputation. String manipulation is a big deal when dealing with HUGE files, like when the file is bigger than the RAM on the PC. Parsing a 3 GB file with 32 MB of RAM into every possible combination of mailing addresses for insertion into a database was a real challenge, and managing memory and optimizing for speed became critical for success.
This isn’t about the amount of memory. It’s about allocating the memory in the first place. What we’re trying to prevent is the GC pausing our app to collect memory whose allocation we could have avoided, and ultimately that improves stability and speed. The amount of memory itself isn’t really the problem.
Great video. Thanks for mentioning not to optimize if you don't need to. I hate when people write highly optimized, hard-to-read code in situations where it offers no advantage. My approach is to always make the code as readable as possible first. If performance is an issue (rarely), then optimize. As Donald Knuth famously wrote, "premature optimization is the root of all evil."
How about filling in the asterisks by addressing the string as an array?
string pwd = ClearValue;
for (int i = 3; i < pwd.Length; i++) { pwd[i] = '*'; }
I'm sure this won't be the optimal solution, but I would've liked to see how it compares to the other methods.
Pretty sure this will create a new string per array allocation. Should be as slow as the first one. It’s definitely not better than string builder and below
And for completeness' sake, here are the results for both StringAsArray and StringAsPointer... pretty interesting that they both give (almost) exactly the same results.
public unsafe string StringAsPointer()
{
    int len = ClearValue.Length;
    fixed (char* pwd = ClearValue)
    {
        for (int i = 3; i < len; i++)
        {
            pwd[i] = '*';
        }
        return new string(pwd, 0, len);
    }
}
| Method          | Mean     | Error    | StdDev   | Gen 0  | Allocated |
|---------------- |---------:|---------:|---------:|-------:|----------:|
| StringAsArray   | 44.12 ns | 0.976 ns | 1.335 ns | 0.0114 | 48 B      |
| StringAsPointer | 44.20 ns | 0.452 ns | 0.423 ns | 0.0114 | 48 B      |
The char array approach is 16 ns. Didn’t add it because it would bloat the video and it’s not as common as the previous ones. Unsafe code was out of the question.
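On the array idea: in C# the string indexer is read-only, so assigning `pwd[i] = '*'` on a string does not compile (error CS0200). The safe way to treat it as an array is to copy it into a char[] first; this is presumably what the char array approach refers to, though that is an assumption. A minimal sketch with hypothetical names:

```csharp
using System;

public static class CharArrayMask
{
    public static string Mask(string value, int visible)
    {
        // string's indexer is get-only, so copy the characters into a
        // mutable array first (first allocation).
        char[] chars = value.ToCharArray();
        for (int i = visible; i < chars.Length; i++)
        {
            chars[i] = '*';
        }
        // Build the result from the array (second allocation).
        return new string(chars);
    }
}
```

This allocates twice, once for the array and once for the final string, whereas string.Create writes straight into the final string.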
@@nickchapsas Safety has a cost... a huge cost. After you have designed, debugged and all, optimizing with unsafe code should bring the most performance, and then you add some unit tests for safety.
I disagree with the use of var because strong data typing helps to reduce issues during development and maintenance. I only use var if forced to by either 3rd party code or those who have authority over the development. I generally use string except when the string is very long, then use string builder. (Captain Obvious can be a good role model:)
var has nothing to do with strong typing. It is type inference. Good naming should make the use of the actual type redundant. If I don’t know what the type is just by looking at the name then I should fix the name not cover up the problem with an explicit type
Even faster version:
[Benchmark]
public string MaskUnsafe()
{
    var s = new String('*', Pass.Length);
    unsafe
    {
        fixed (char* c = s)
        {
            c[0] = Pass[0];
            c[1] = Pass[1];
            c[2] = Pass[2];
        }
    }
    return s;
}
(string.Create takes ~14.26 ns on my computer; the above takes ~10.86 ns, while the above but with a for (var i = 0; i < 3; ++i) loop takes ~12.00 ns)
Like I’ve mentioned in the video and in the pinned comment, unsafe methods are out of the question. If unsafe were allowed, there would be even faster approaches than this one, but it’s not.
I would have just used PadRight, although it's probably not the quickest or the best with memory: Console.WriteLine(firstChars.PadRight(ClearValue.Length, '*'));
@@nickchapsas Got it. For large strings maybe worth thinking about, but only for the few of us working with large amounts of text :) Thanks for your videos!
Thanks for the promised video :) I think it's really useful for your followers, because the fact is that not a lot of people know about this new feature.
Disappointed you didn't show unsafe methods, as they do have some use cases within Unity. E.g. displaying a millisecond-precision timer on screen: Unity requires a string for the UI text component, but you can't create a new string every millisecond, or modify one. So you overwrite the original string and tell the text component to force-refresh; immutable is just a suggestion in unsafe land.
In general, I would advise not using strings directly. You might have to localize your program at some point, and I can assure you that you don't want to go through your entire program removing all of the hardcoded strings when that happens. Think ahead: use enums along with some class that returns the correct string for each enum value. In my opinion, in the end it's more readable and more future-proof, and it should stop your team from concatenating strings everywhere (what works for the English language probably doesn't for other languages like Arabic or Japanese).
"localize your program at some point" I disagree. You usually know at a very early stage whether you're going to localize in the near future (2-3 years, maybe even longer for certain apps); if you don't plan for it, then it's time and money wasted on future-proofing. If you do plan it, then you have a lot more to consider than a simple "use enum instead of string": fonts that support diacritics, layout (some text in another language may be waaay longer), right-to-left text, input methods, icons, graphics or even colors (some symbols are offensive in some regions), even a complete UI revamp if habits and culture are vastly different and a different UI could result in better sales. Not saying simple "string replace" localization is bad, but doing it blindly without thinking through the scenarios to cover is definitely not the recommended way.
It looks like it's a tiny bit faster to manually copy the first three chars instead of using AsSpan(); for me it's 20% faster:
public string MaskStringCreateManualCopy()
{
    return string.Create(ClearValue.Length, ClearValue, (span, value) =>
    {
        for (var i = 0; i < 3; i++) span[i] = value[i];
        span[3..].Fill('*');
    });
}
I've tested that against the already improved version where only 3 chars are copied to the span ('value.AsSpan()[..3].CopyTo(span);'). Not sure why doing it manually is faster, but it only actually is when copying 3 chars or fewer... I guess CopyTo() is faster per char but has some fixed initial cost.
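For reference, a sketch of the two string.Create variants being compared in this thread, with hypothetical method names and assuming the input has at least 3 chars: one copies the whole source string and then overwrites the tail, the other copies only the three visible chars with CopyTo. Both produce the same string; which one is faster for short inputs is a benchmarking question, and no timings are implied here.

```csharp
using System;

public static class CreateVariants
{
    // Copies every source char, then overwrites everything after index 3.
    public static string CopyWholeThenFill(string value) =>
        string.Create(value.Length, value, static (span, v) =>
        {
            v.AsSpan().CopyTo(span);
            span[3..].Fill('*');
        });

    // Copies only the first three chars; the rest is written once as '*'.
    public static string CopyPrefixThenFill(string value) =>
        string.Create(value.Length, value, static (span, v) =>
        {
            v.AsSpan(0, 3).CopyTo(span);
            span[3..].Fill('*');
        });
}
```

The difference is the number of chars written: the whole-copy variant touches every position twice except the first three, which only matters as the input grows.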
Am I the only one who would write this?
[TestMethod]
public async Task TestString()
{
    string work = "somestring";
    string encrypted = string.Join(String.Empty, work.ToCharArray()
        .Select((c, index) => index > 2 ? '*' : c)
    );
}
@@nickchapsas Yeah I guess they thought the benefits are not worth the effort then. Don't have that much knowledge about C# tbh. Just assumed it because that's what Java does.
Java doesn’t have Span or something equivalent and every string “mutation” is a new allocation so in java the 4th approach doesn’t exist. Do you have a scenario where Java will optimise this?
"Don't prematurely optimize something just to use a feature." I have fallen into this trap so many times. Especially when I first learned Generics. I feel bad for anyone who had to revisit that code later.
The .NET compiler is not very good at optimizing ;-) But the main thing is: the programmer must know how to optimize too! Otherwise it's just "monkey coding".
@@moestietabarnak No, it doesn't do that much. If I remember correctly, Java automatically converts the + operator to a StringBuilder, so you can use + without a performance penalty. (I know StringBuilder is better, but I think + is more readable.)
This could beat them all:
[Benchmark]
public string Unsafe() // 50 times faster than Native, no memory
{
    unsafe
    {
        fixed (char* p = ClearValue)
        {
            for (int i = 3; i < ClearValue.Length; i++)
            {
                *(p + i) = '*';
            }
        }
    }
    return ClearValue;
}
The obvious most performant Unsafe approach was one of the things excluded as “acceptable” for this. You’re pinning the string in memory so it’s technically safe but not something that should be used IMO unless the team is heavily using unsafe in other places
hehe (slower than string.Create, but also only allocates the final string)
public unsafe string MaskCharsStack()
{
    var length = Password.Length;
    var chars = stackalloc char[length];
    for (var i = 0; i < length; ++i)
    {
        chars[i] = i < 3 ? Password[i] : '*';
    }
    // new string(char*) alone reads until a '\0'; pass the length explicitly
    return new string(chars, 0, length);
}
The reason I gave you a thumbs down is that you claim this is the most efficient way, while: 1. it clearly is not; 2. even if it was, you didn't explore several other approaches, which could be more efficient (and some of them actually are); 3. your comment in the comments about heap vs. stack shows you don't fully understand what is really going on underneath. Other people before me already made good points about why this is not the most efficient way and what the other possible options are, so I will not repeat them. Apart from that, I admire the goal of this video: to make "the lazy C# developers" think about "what happens under the bonnet" and about performance in general! Just be very careful with claims like "this is the most efficient way"; all you had to do was say "this is a much more efficient way" and you would have been golden! :)
Firstly, thumbs up or down, they both count as engagement, so thank you! Now for your points. 1. Why isn't it? Do you have a better approach that isn't using unsafe code? I'd really like to know. 2. The ones I didn't explore are not faster. I tried A LOT of them and they are all slower than string.Create. The only faster ones use unsafe code, which is out of the question. 3. Sounds like you're the one not understanding heap vs stack. I would really like to know which part is wrong.
@@nickchapsas Well, I really don't want to turn this into a public argument, but for the sake of education (including mine, because I could be wrong and actually learn something from this), let me try to explain myself a bit more. Most of these points were already made by @Aidiakapi though. 1. string.Create probably is the fastest way without unsafe code, and probably just as fast as (correct) unsafe code; after all, it is designed to be just that. The reason your code is not the most efficient is that you're copying the whole original string into the new string, instead of only the first 3 characters. I know for this string length the difference is minuscule, but it's still faster. For huge strings the difference will be noticeable. 2. One variant worth exploring is to use StringBuilder with a preallocated length (so only 1 memory allocation is done, no reallocating), then copy the first 3 characters with something like stringBuilder.Append(ClearValue, 0, 3), then add the extra '*' characters using stringBuilder.Append('*', length - 3). This is probably the best we could do in the old days (before string.Create and Span) without unsafe code. It is still slower than string.Create because of the additional memory allocation which happens on stringBuilder.ToString(); otherwise (before the .ToString()) it should be comparable in performance (as it is literally doing the same thing, just not in the final memory). 3. I am pretty sure all strings reside on the heap. I cannot guarantee (without disassembly) that string.Create allocates heap memory directly, instead of allocating stack memory and then copying it to newly allocated heap memory after the delegate is finished, but I don't see why it wouldn't; the alternative would only make things worse (less efficient). Imagine you string.Create'd a 100MB string: what would be the point of allocating 100MB of stack memory (which you may not even have), just to copy it into the heap later?
And stack memory by itself is not faster than the heap; it is the same kind of memory. Also, Span is perfectly capable of "pointing" to heap memory: being a ref struct only means that the Span variable itself always lives on the stack, not the memory it points to. Think of Span as just a "pointer" and a size, except that the "pointer" is a little more complicated than a regular unsafe pointer. One more thing about unsafe code: using it to overwrite the internal buffer of the input string is a very bad idea, because of shared strings, precomputed hash codes, etc. Using it to change the internal buffer of a newly allocated string is still a hack, but apparently it works (and was the best we could do before string.Create and Span). If you think about it, string.Create is just the better (safer) way of doing the same thing: allocate some memory on the heap and let you fill that memory at the correct time (before the string initialization is finished, not after it), using a Span instead of an unsafe pointer. And one more thing worth mentioning: one has to be careful with the delegate passed as the callback. If it uses some local variable from the calling function, it becomes a closure, which means a heap allocation, and all the efficiency goes out the window, unless the compiler / JIT does some very clever optimisation, which I doubt. This is why the delegate has a 'state' argument passed to it.
Meh, doesn’t really matter, no need to obsess over minor details. You only live once. Code it and move on. If there’s a problem fix it. It’s only software.
Someone tried it in the comments. No difference at all but it’s most likely due to the size of the span. If it was a waaaay bigger one it should be faster
You know what this tells me? C# lied to us when it said it was trying to make things simpler. Wtf, why do I have to know all this? Why can't I just make a string and flip on an "optimize for this or that" flag? fuck.
You don't have to know all this, and you don't have to optimize any of it. You only have to worry about all that when you've optimized everything else at a high level and you now need to do micro-optimizations.
@@MikeWardNet True! I like C# because it lets you create something useful with very little knowledge. But the more you know, the better your program becomes. Same with Python.
For those of you wondering about missing methods and how they would perform, I updated the code with more benchmarks and I added their results here: gist.github.com/Elfocrash/14dc6de96917c564c80e88a319effb32 (Options that require unsafe code are disqualified from the benchmarks)
@@Bertiii That's very bad practice, because you are modifying the existing string, ClearValue, which is supposed to be immutable. There are no guarantees that the string instance won't be used again elsewhere in .NET especially in the case where you would perform a hash on that password for storage into a database.
StringBuilder can be written in a simpler way that will take 40B less.
var sb = new StringBuilder(ClearValue.Length);
sb.Append(ClearValue, 0, 3);
sb.Append('*', ClearValue.Length - 3);
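Wrapped into a complete method, the pre-sized builder might look like this. It is a sketch: the class name and the `visible` parameter are illustrative, and it assumes `visible <= value.Length`. Note that ToString() still copies the buffer into a new string, which is one allocation more than string.Create needs.

```csharp
using System.Text;

public static class BuilderMask
{
    public static string Mask(string value, int visible)
    {
        // Pre-size the builder to the final length so it never has to grow.
        var sb = new StringBuilder(value.Length);
        sb.Append(value, 0, visible);            // copy only the visible prefix
        sb.Append('*', value.Length - visible);  // repeat '*' for the rest
        // ToString() copies the builder's buffer into a brand-new string.
        return sb.ToString();
    }
}
```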
Nick, I've been a C# dev for almost two decades, and you keep teaching me new things. This time, it's that string has constructors. Thanks!
I'm surprised that you did not mention another way to solve it, which is the second best in memory and time: ClearValue.Substring(0, 3).PadRight(ClearValue.Length, '*'); and is the first one that came to mind (and IMO the one that you should be using instead of the string.Create approach as it should be the easiest to understand). Nice explanation about string.Create, didn't know I could use it like that tho ;)
I could not mention every possible way, there are probably 10s of ways to solve the problem and I wanted to show some fairly common ones that everyone would have used. PadRight on string builder will have similar performance as the string builder one
Would be my first choice too
That approach is awesome to know! I didn't know about PadRight; I like string manipulation in general. One error I can see, though, is that you should write PadRight(ClearValue.Length - 3, '*'), because since your substring is the first three characters and the rest of the string is asterisks, you should subtract three from the overall original length.
@@nickchapsas oh, no worries, I do understand that ;) I didn't mean to sound like "bUt yOU DiDnT Do THiS or ThAT"; I was just surprised that one of the most straightforward (or at least easiest to understand, imo) wasn't included instead of one of the others. As you've said, there are 10s of ways of solving the same problem, so it's good to look at some of them either way. Keep up the good work with the videos ;)
@@ZeroSleap actually, no. PadRight takes final string length as parameter, not amount of symbols to be added.
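To illustrate: PadRight's first argument is the total width of the result, so passing the original length is already correct. A sketch with hypothetical names, assuming `visible <= value.Length`:

```csharp
using System;

public static class PadRightMask
{
    public static string Mask(string value, int visible)
    {
        // PadRight(totalWidth, paddingChar): totalWidth is the length of the
        // returned string, not the number of padding chars to append.
        return value.Substring(0, visible).PadRight(value.Length, '*');
    }
}
```

Passing `value.Length - 3` instead would return a string three characters shorter than the original.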
I use string.Create to construct cache keys (whose lengths are known in advance): it happens on a very hot path and is worth doing.
I immediately had a use case (shockingly) for this, and it took about 40% off my execution times. Thanks for this and all your other great videos!
Your example is very simple in terms of what you're trying to achieve; most of the time the algorithm is much more complicated. I appreciate it, though, and we should keep your suggestions in mind.
Span is really cool. It feels like engaging with memory in a low-level C way with the benefits of type safety
Yeah I like it! Span is essentially a non-owning, bounds checked pointer.
Ok, now I'm scared. I was just working with strings yesterday, and now here's exactly the same approach to string optimization. Great video!
The more I learn about Java and C#, the more I fall in love with C++. That low-level control, so lovely. How in the world is string immutable if it's just an array of chars? Let me edit it, append to it, do math with the pointers... Give me memory, and let me assign values to it. If I want something to be a read-only cache or thread-safe, I can do so with mutexes, atomics, flags, etc.
Thanks for sharing this string.Create() it is an amazing tool, although the mask creation logic was also really clever!
You can mutate a string even in C# if you really want to (requires enabling unsafe code):
string result = new string('*', original.Length);
fixed (char* mutableString = result)
mutableString[0] = original[0]; // P***********
You can't really append to a C++ std::string without reallocation going on in the background either. You could make a point for realloc, but if the space is no longer available, that will fail; besides, realloc is much more C than C++, which would use new and delete instead. The advantage is that you can modify already allocated values, but I would argue that string.Create is practically the same here.
@@raphaelbaier6984 It's pretty different. string.Create doesn't allow any mutability either; it just allows you to specify how to fill the initial allocation in a much more flexible way.
The string.Create approach is equivalent in performance to init a Span via stackalloc with SkipLocalsInit in the method with much, much cleaner code. Nice to have in the toolbelt.
SkipLocalsInit requires unsafe code. It is a big no-no. Also, it is not faster.
| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|----------------- |---------:|----------:|----------:|-------:|----------:|
| StackAllocCreate | 9.503 ns | 0.0781 ns | 0.0730 ns | 0.0029 | 48 B |
| MaskStringCreate | 7.846 ns | 0.0701 ns | 0.0655 ns | 0.0029 | 48 B |
I've been really getting into using Spans where appropriate. I recently rewrote some CSS variable generation code using Spans and a couple of other funky optimisations, and it's now a whopping 2000 times quicker and allocates 17 times less. Most of that came from parsing strings for any hex values and then creating Color structs from those; that particular bit of code is over 13000 times faster and allocates 0 bytes compared to nearly 10KB before.
hi, may i ask you if you have this code on a public repository just to check it for learning purposes? Or if you have any resources to share about it. Thanks!
@ilh86 Do you have some "getting started with spans" tutorials that you can recommend?
String builder is my go-to
Span<char> result = stackalloc char[value.Length];
result.Fill('*');
for (int i = 0; i < 3; i++)
{
    result[i] = value[i];
}
return new string(result);
This is faster than string.Create (no time wasted on the delegate call, and you don't need to copy all the source chars).
It is not. I've been benchmarking it and it always performs worse, by an average of 1 nanosecond.
This is insane optimization!......Loved your explanation :)
Love it that u r using jetbrains rider :)
Does it matter tho, Im just asking?
@@rade6063 looks more usable than vs community im using
Just a word of warning: a C# char is a UTF-16 code unit, and up to 2 UTF-16 code units can form a single Unicode code point (which is closer, but not quite equal, to the abstract concept of a "character"). So any of these solutions could split a code point in half and effectively change how the visible part of the string is rendered on screen.
This has already been mentioned in the comments multiple times. It was originally part of the video, but it was cut out because I thought it was out of scope. The focus is on the method, not the string/char. The char size applies to all of them, so they cost equally and can be sped up equally.
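If the kept prefix does need to respect surrogate pairs, one option is to keep one extra char whenever the prefix would end on a high surrogate. A minimal sketch with hypothetical names; it only avoids splitting a pair at the boundary. Each masked pair in the tail still becomes two asterisks, and multi-code-point graphemes (combining marks, ZWJ emoji sequences) would need System.Globalization.StringInfo instead.

```csharp
using System;

public static class SurrogateSafeMask
{
    public static string Mask(string value, int visible)
    {
        int keep = Math.Min(visible, value.Length);

        // If the last kept char is the first half of a surrogate pair,
        // keep its second half too so the pair is not split.
        if (keep > 0 && keep < value.Length && char.IsHighSurrogate(value[keep - 1]))
        {
            keep++;
        }

        return string.Create(value.Length, (Value: value, Keep: keep), static (span, state) =>
        {
            state.Value.AsSpan(0, state.Keep).CopyTo(span);
            // Any surrogate pair in the tail simply becomes two '*' chars.
            span[state.Keep..].Fill('*');
        });
    }
}
```

For example, with "ab" followed by an emoji and "cd", asking for 3 visible chars keeps 4 (the whole emoji) and masks the last 2.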
Cool ...
Most of the time I use string concatenation, and rarely StringBuilder..
And never span 😑
Thanks for sharing this one 👌
Changing it to var made me happy 😊
Just want to say that I appreciate your videos :)
Would love to see how string interpolation would compare to this. I’ve always heard to use it over concatenation
Looks like a prime candidate for a string extension method. Great video.
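As a sketch of that suggestion: a hypothetical Mask extension method built on string.Create. The name, the defaults (3 visible characters, '*') and the clamping behaviour are illustrative choices, not anything from the video. Inputs travel through the state tuple so the lambda captures nothing.

```csharp
using System;

public static class StringMaskExtensions
{
    public static string Mask(this string value, int visible = 3, char maskChar = '*')
    {
        if (value is null) throw new ArgumentNullException(nameof(value));

        // Clamp so short strings (or a negative count) don't throw.
        int keep = Math.Clamp(visible, 0, value.Length);

        return string.Create(value.Length, (Value: value, Keep: keep, MaskChar: maskChar),
            static (span, state) =>
            {
                state.Value.AsSpan(0, state.Keep).CopyTo(span);   // visible prefix
                span[state.Keep..].Fill(state.MaskChar);          // masked tail
            });
    }
}
```

Usage would be `"Password123!".Mask()` or `secret.Mask(1, '#')`.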
Thank you for this. Was searching for an efficient way to fill a string for my batch file serializer
Last optimization was kinda very new syntax and hard to get a grasp on imo. I'll be taking into consideration the new string method and Stringbuilder ones. Cheers !
To improve the StringBuilder approach, specify the estimated number of characters required for the builder (in this case 12); that way, the StringBuilder doesn't have to keep reallocating new arrays each time you Append().
Done that already and replied in a different comment. For this size, it saved 1 ns so it wasn’t noticeable
@@nickchapsas No probs.
The default capacity of StringBuilder is 16, so it will not reallocate in this specific scenario. But you can still save 4 chars (8 bytes) of buffer by passing 12 to the constructor.
Great video. I would use the StringCreate one, but I didn't know about the span method. Question 1: what if you did the span "the other way"? Fill the entire span initially with "*******" and then "replace" only the first 3 elements with the ones from your desired string. I often do something like this: create an arbitrarily large "**********************" string (call it asterisks) and concatenate a substring of asterisks. Question 2: how do those two approaches compare with your 4?
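Question 1 amounts to the stackalloc + Fill pattern. For question 2, a sketch of the prebuilt-asterisks idea, assuming a hypothetical maximum length of 128 and `visible <= value.Length`. Using string.Concat over two ReadOnlySpan<char> slices (available since .NET Core 3.0) avoids the intermediate substrings, so only the result string is allocated per call.

```csharp
using System;

public static class PrebuiltAsterisksMask
{
    // Hypothetical upper bound on the lengths we expect to mask.
    private const int MaxLength = 128;

    // Built once and reused by every call.
    private static readonly string Asterisks = new string('*', MaxLength);

    public static string Mask(string value, int visible)
    {
        if (value.Length > MaxLength)
            throw new ArgumentException("Value is longer than the prebuilt asterisk run.", nameof(value));

        // AsSpan slices allocate nothing; Concat allocates only the result.
        return string.Concat(
            value.AsSpan(0, visible),
            Asterisks.AsSpan(0, value.Length - visible));
    }
}
```

How this compares with the four approaches in the video would need its own benchmark.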
Thanks for this. I thought StringBuilder was slower than using string variables. I also liked the benchmark tools used in the video.
Why would you think it would be slower?
Bit late to the party, but I think that up to 3 or 4 appends StringBuilder is slightly slower, purely because of setting up the StringBuilder in the first place (though it still uses less memory). After that it's faster. I wouldn't bother using a StringBuilder for two strings, for example.
And for specific things like file paths you have methods to deal with those like Path.Join() etc.
Filling a span is nice as long as you know you are using 16-bit characters and not emoji. Alas, that is a problem with the char type in general. JavaScript buffers are similar to spans and have similar problems. Still, we've come a long way since StringBuilder was C#'s tool of choice.
It's not just emoji: in UTF-8 a single character can take up to 4 bytes.
You can't create something like this:
char x = '😉';
but you can
string y = "😉";
and while debugging there are actually 2 chars in the *y* string: 55357 (\ud83d) and 56841 (\ude09), or 0xF09F9889 as UTF-8 bytes
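A small sketch for checking this in code rather than in the debugger (the helper name is hypothetical): the emoji is one code point, U+1F609, stored as two UTF-16 code units, and it takes 4 bytes in UTF-8.

```csharp
using System;

public static class SurrogateDemo
{
    // Returns each UTF-16 code unit of `s` as a 4-digit uppercase hex string.
    public static string[] CodeUnitsHex(string s)
    {
        var result = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
            result[i] = ((int)s[i]).ToString("X4");
        return result;
    }
}
```

For "😉", `CodeUnitsHex` returns `["D83D", "DE09"]` (55357 and 56841 in decimal), `char.ConvertToUtf32("😉", 0)` returns `0x1F609`, and `Encoding.UTF8.GetByteCount("😉")` returns 4.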
@@29Aios Alas, even in JavaScript adding the 4-byte characters was a hack. A reminder that you can try to future-proof but you'll fail.
@@BobFrTube Well, the problem is the variable size of UTF-8 chars; a span is supposed to hold fixed-size elements
You did not pass an initial capacity to your StringBuilder. That might have improved its performance even more.
He actually did
Default capacity is 16 so it wouldn't have made much difference (other than allocating 4 less bytes, but it wouldn't have extended the capacity)
@@lucassaarcerqueira4088 He should have used new StringBuilder(ClearValue.Length) to initialize it to the full required size. When it was created with just the first 3 characters, it would have to grow its buffer when appending. I'd need to double-check the constructors in case one lets you assign both the initial string and the capacity, but the resize that takes place is what's in question.
Your videos are awesome, man! How did you learn all this? What's your learning methodology?
Great explanation, but you didn't mention one more method, PadRight; it would be interesting to compare.
PadRight is the second best alongside the CharArray approach at 14ns in terms of speed but allocated less memory than char array at 80B compared to 96B
@@nickchapsas I see, thanks
Couldn't you just create one span from "Password123" for "Pas", another for "********", and then combine them?
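Something like this should work without intermediate strings, assuming .NET Core 3.0+ for the `string.Concat(ReadOnlySpan<char>, ReadOnlySpan<char>)` overload. The asterisk source is a hypothetical pre-built static string, so the masked tail can't be longer than it (128 chars here, an arbitrary choice).

```csharp
using System;

public static class SpanConcatMask
{
    // Hypothetical pre-built source of asterisks; the masked tail must not exceed its length.
    private static readonly string Asterisks = new string('*', 128);

    public static string Mask(string clearValue, int visible = 3)
    {
        if (clearValue.Length <= visible) return clearValue;

        ReadOnlySpan<char> head = clearValue.AsSpan(0, visible);                     // "Pas", no copy
        ReadOnlySpan<char> tail = Asterisks.AsSpan(0, clearValue.Length - visible);  // "********", no copy

        // Only the resulting string is allocated.
        return string.Concat(head, tail);
    }
}
```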
Thanks for the valuable info!
I'd be curious to see how this compares to stackalloc char[] and then passing that span to the string constructor. Granted, that's even more constrained because the length has to stay small enough to fit on the stack, but it seems like it would have comparable performance.
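A sketch of the stackalloc idea in safe code (hypothetical names, C# 8+ and .NET Core 2.1+ assumed). The stackalloc size doesn't need to be a compile-time constant; the usual caveat is to cap it, so this version falls back to a heap array above an arbitrary 256-char limit.

```csharp
using System;

public static class StackallocMask
{
    // Hypothetical cap so a long input can't overflow the stack.
    private const int MaxStackChars = 256;

    public static string Mask(string clearValue, int visible = 3)
    {
        if (clearValue.Length <= visible) return clearValue;

        // Safe code: stackalloc assigned to a Span<char> needs no `unsafe` block.
        Span<char> buffer = clearValue.Length <= MaxStackChars
            ? stackalloc char[clearValue.Length]
            : new char[clearValue.Length];

        clearValue.AsSpan(0, visible).CopyTo(buffer);
        buffer.Slice(visible).Fill('*');
        return new string(buffer); // copies the span into the final heap string
    }
}
```

Compared with string.Create, this writes into a temporary buffer and then copies it once more into the string.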
Very impressive
The thing you have to remember about the string.Create method is that it takes a delegate. So if the delegate uses values from outside its own scope, it will allocate memory for the closure.
The whole point is that you should not be passing anything from outside, that's why the length and the initial string are parameters in the delegate in the first place, so they are not captured in closures.
@@nickchapsas Yes, you should not. But you can. If you want to build some computed string and you know the absolute length limit (you can trim the empty space later if it matters), you might be misled into believing that string.Create would be good for that. I've watched a bit further now and you've actually mentioned closures, so that's on me for commenting too eagerly
I've just stumbled upon a similar task that involves string concatenation and considered all the options (remembering this particular video). I'm actually curious now if closures will make it worse than a regular concatenation
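To make the closure point concrete, here is a sketch (hypothetical class and method names, .NET Core 3.0+ and C# 9 assumed for the `static` lambda). The first version captures `visible`, so each call allocates a closure object and a new delegate; the second passes everything through the state argument, which is why that argument exists.

```csharp
using System;

public static class StringCreateState
{
    // Captures `visible` from the enclosing method: the compiler generates a
    // closure class, allocated (together with a new delegate) on every call.
    public static string WithClosure(string clearValue, int visible)
    {
        if (clearValue.Length <= visible) return clearValue;
        return string.Create(clearValue.Length, clearValue, (span, value) =>
        {
            value.AsSpan(0, visible).CopyTo(span);
            span.Slice(visible).Fill('*');
        });
    }

    // Passes everything through the state tuple instead. The `static` modifier
    // makes the compiler reject any accidental capture.
    public static string WithState(string clearValue, int visible)
    {
        if (clearValue.Length <= visible) return clearValue;
        return string.Create(clearValue.Length, (Value: clearValue, Visible: visible), static (span, state) =>
        {
            state.Value.AsSpan(0, state.Visible).CopyTo(span);
            span.Slice(state.Visible).Fill('*');
        });
    }
}
```

Both return the same string; only the allocation profile differs, which a benchmark with a memory diagnoser would show.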
I did not know about string constructors at all
string is a class ;-)
Even 30 years ago this wasn't "a lot" of memory; today it is literally nothing. That's like saying a paper cut is the same as an amputation. String manipulation is a big deal when dealing with HUGE files, like when the file is larger than the RAM on the PC. Parsing a 3 GB file with 32 MB of RAM into every possible combination of mailing addresses for insertion into a database was a real challenge, and managing memory and optimizing for speed become critical for success.
This isn't about the amount of memory. It's about allocating the memory in the first place. What we're trying to prevent is the GC locking up our app to collect memory whose allocation we could have avoided, and ultimately that improves stability and speed. The memory itself isn't really that much of a problem.
very good man, thanks
This is pretty sick
Insane! What about setting the "count" parameter in the CopyTo call? Would it be even 1 micro-nanosecond faster? lol
Great video. Thanks for mentioning not to optimize if you don't need it. I hate when people write highly optimal, hard to read code in situations where it offers no advantage. My approach is to always try to make the code as readable as possible first. If performance is an issue (rarely), then optimize.
Donald Knuth, writing about optimization back in 1974, famously said that "premature optimization is the root of all evil."
Is this actually the same as using a char array and a for loop?
In the MaskStringBuilder method you can combine the first and third lines ;)
How about filling in the asterisks by addressing the string as an array?
string pwd = ClearValue;
for (int i = 3; i < pwd.Length; i++) {
    pwd[i] = '*';
}
I'm sure this won't be the optimal solution, but I would've liked to see it compared to the other methods.
Pretty sure this will create a new string per array allocation. Should be as slow as the first one. It’s definitely not better than string builder and below
@@nickchapsas Actually, nevermind... the indexer of a string is actually readonly :facepalm:
But with a small change:
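public string StringAsArray() {
    char[] pwd = ClearValue.ToCharArray();
    for (int i = 3; i < pwd.Length; i++) {
        pwd[i] = '*';
    }
    // char[].ToString() returns the type name "System.Char[]", not the contents;
    // new string(pwd) builds the masked string.
    return new string(pwd);
}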
We get better memory results than StringBuilder.
| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|-------------- |---------:|---------:|---------:|-------:|----------:|
| StringAsArray | 44.61 ns | 0.935 ns | 1.183 ns | 0.0114 | 48 B |
And for completeness' sake, here are the results for both StringAsArray and StringAsPointer... interestingly, they both appear to give almost exactly the same results.
public unsafe string StringAsPointer() {
    int len = ClearValue.Length;
    // Warning: this writes into ClearValue itself, mutating a supposedly immutable string.
    fixed (char* pwd = ClearValue) {
        for (int i = 3; i < len; i++) {
            pwd[i] = '*';
        }
        return new string(pwd, 0, len);
    }
}
| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---------------- |---------:|---------:|---------:|-------:|----------:|
| StringAsArray | 44.12 ns | 0.976 ns | 1.335 ns | 0.0114 | 48 B |
| StringAsPointer | 44.20 ns | 0.452 ns | 0.423 ns | 0.0114 | 48 B |
@@alcoholrelated4529 Yes, I should have provided a baseline for comparison.
Here are the results for the three test runs:

Run 1:

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|----------------- |---------:|---------:|---------:|-------:|----------:|
| StringAsArray | 44.26 ns | 0.963 ns | 1.412 ns | 0.0114 | 48 B |
| StringAsPointer | 39.98 ns | 0.346 ns | 0.324 ns | 0.0114 | 48 B |
| MaskStringCreate | 98.69 ns | 1.081 ns | 0.958 ns | 0.0440 | 184 B |

Run 2:

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|----------------- |----------:|---------:|---------:|-------:|----------:|
| StringAsArray | 45.67 ns | 0.842 ns | 0.788 ns | 0.0114 | 48 B |
| StringAsPointer | 45.79 ns | 1.012 ns | 1.243 ns | 0.0114 | 48 B |
| MaskStringCreate | 104.77 ns | 2.106 ns | 3.955 ns | 0.0440 | 184 B |

Run 3:

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|----------------- |----------:|---------:|---------:|-------:|----------:|
| StringAsArray | 44.48 ns | 0.676 ns | 0.600 ns | 0.0114 | 48 B |
| StringAsPointer | 43.52 ns | 0.584 ns | 0.546 ns | 0.0114 | 48 B |
| MaskStringCreate | 101.39 ns | 1.448 ns | 1.355 ns | 0.0440 | 184 B |

I think that StringAsArray is the easiest to implement, and the optimal solution for handling very large strings (not the fastest though).
You can also use char arrays or unsafe char*s
The char array approach is 16ns. I didn't add it because it would bloat the video and it's not as common as the previous ones. Unsafe code was out of the question.
@@nickchapsas fits the theme of "the best way you shouldn't use" though :D
Haha true true. Might make another video with unsafe stuff as well. I’m always fascinated by how much you can do with unsafe code in C#
@@nickchapsas Safety has a cost... a huge cost, after you have designed, debugged and all...
Optimizing with unsafe code should bring the most performance, and then you add some unit tests for safety.
@@moestietabarnak If I wanted to write unsafe code I'd write C++. This is all within the context of safe code.
I disagree with the use of var because strong data typing helps to reduce issues during development and maintenance. I only use var if forced to by either 3rd party code or those who have authority over the development. I generally use string except when the string is very long, then use string builder. (Captain Obvious can be a good role model:)
var has nothing to do with strong typing. It is type inference. Good naming should make the use of the actual type redundant. If I don’t know what the type is just by looking at the name then I should fix the name not cover up the problem with an explicit type
@@nickchapsas I would prefer "StringBuilder stringBuilder = new(blah)", as it's clear from the start what the data type is, with no guessing!
Even faster version:
[Benchmark]
public string MaskUnsafe()
{
    var s = new String('*', Pass.Length);
    unsafe {
        fixed (char* c = s)
        {
            c[0] = Pass[0];
            c[1] = Pass[1];
            c[2] = Pass[2];
        }
    }
    return s;
}
(string.Create takes ~14.26ns on my computer; the above takes ~10.86ns, while the above but with a for (var i = 0; i < 3; ++i) loop takes ~12.00ns)
Like I've mentioned in the video and in the pinned comment, unsafe methods are out of the question. If unsafe code were allowed, there would be even faster approaches than this one, but it isn't.
I would have just used PadRight, although it's probably not the quickest or best with memory.
var firstChars = ClearValue.Substring(0, 3);
Console.WriteLine(firstChars.PadRight(ClearValue.Length, '*'));
Actually, it's the second best in my benchmarks in terms of speed and memory. It uses Span internally.
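For reference, the PadRight approach as a sketch (hypothetical class name). `PadRight` takes the total width of the result, so the argument is `ClearValue.Length`, not `ClearValue.Length - 3`; `ClearValue[..3]` does the same as `Substring(0, 3)` here. It allocates the substring plus the result.

```csharp
using System;

public static class PadRightMask
{
    public static string Mask(string clearValue, int visible = 3)
    {
        if (clearValue.Length <= visible) return clearValue;

        // PadRight's first argument is the total width of the result,
        // not the number of characters to add.
        return clearValue.Substring(0, visible).PadRight(clearValue.Length, '*');
    }
}
```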
It also reads incredibly well.. Wonder what the performance is like.
In the last one, is CopyTo writing every char beyond the 3rd unnecessarily?
In the first iteration yeah but at this length there is no performance hit. I’ve benchmarked it.
@@nickchapsas Got it. For large strings maybe worth thinking about, but only for the few of us working with large amounts of text :) Thanks for your videos!
Span CopyTo has no offset and length parameters? Feels unnecessary to copy more than 3 characters.
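Right, `Span<T>.CopyTo` only takes a destination, so offset and count come from slicing the source (and the destination, if needed). A hypothetical helper emulating an (offset, count) overload:

```csharp
using System;

public static class SpanCopy
{
    // Hypothetical helper: copies `count` chars from source[sourceIndex..]
    // into destination[destinationIndex..] by slicing both spans.
    public static void CopyRange(ReadOnlySpan<char> source, int sourceIndex,
                                 Span<char> destination, int destinationIndex, int count)
    {
        source.Slice(sourceIndex, count).CopyTo(destination.Slice(destinationIndex));
    }
}
```

Inside the string.Create delegate, `SpanCopy.CopyRange(value, 0, span, 0, 3)` would copy only the first three chars; inline, that is just `value.AsSpan(0, 3).CopyTo(span)`.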
what about ClearValue[..3].PadRight(ClearValue.Length, '*')?
Thanks for the promised video. :) I think it's really useful for your followers, because not many people know about this new feature.
Disappointed you didn't show unsafe methods, as they do have some use cases within Unity.
E.g. displaying a millisecond-precision timer on the screen: Unity requires a string for the UI text component, but you can't create a new string every millisecond, or modify one.
So you overwrite the original string and tell the text component to force-refresh; immutability is just a suggestion in unsafe land.
Unsafe methods were explicitly out of the video's scope
Is it faster than using an unsafe block?
No but unsafe code is a no-no in several codebases, so this is the closest you can get without unsafe code
What if you create a char array of that length, fill it with the data and return new string(charArray)? Would it be the same as the string.Create solution?
No it would be between the new string() and the string.Create
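A sketch of that char-array variant (hypothetical class name, .NET Core 2.0+ assumed for the ranged `Array.Fill` overload). It allocates the array and then the final string, which would explain why it lands between new string and string.Create.

```csharp
using System;

public static class CharArrayMask
{
    public static string Mask(string clearValue, int visible = 3)
    {
        if (clearValue.Length <= visible) return clearValue;

        var chars = new char[clearValue.Length];                  // allocation 1: the buffer
        clearValue.CopyTo(0, chars, 0, visible);                  // copy the visible prefix
        Array.Fill(chars, '*', visible, chars.Length - visible);  // mask the rest
        return new string(chars);                                 // allocation 2: the final string
    }
}
```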
Thank you!
This will add so many fps to new PC games.
Can I know what IDE program you are using?
The upper left shows an RD logo, so that has to be Rider, made by JetBrains.
In general, I would advise not using strings directly. You might have to localize your program at some point, and I can assure you you don't want to go through your entire program to remove all of the hardcoded strings when that happens. Think ahead, use enums along with some class that returns the correct string for each enum value.
In my opinion, in the end it's more readable and more futureproof, and it should keep your team from concatenating strings everywhere (what works for English probably doesn't work for other languages like Arabic or Japanese).
"localize your program at some point": I disagree. You usually know at a very early stage whether you're going to localize in the near future (2-3 years, maybe even longer for certain apps); if you don't plan for it, then futureproofing is wasted time and money. If you do plan for it, you have a lot more to consider than a simple "use enums instead of strings": fonts that support diacritics, layout (text in other languages may be waaay longer), right-to-left text, input methods, icons, graphics, or even colors (some symbols are offensive in some regions), and even a complete UI revamp if habits and culture are vastly different and a different UI could result in better sales.
Not saying simple "string replace" localization is bad, but doing it blindly without thinking through the scenarios to cover is definitely not recommended.
Would using a char array give similar performance to the StringBuilder?
It's between new string and string.Create. In my tests it runs at 16ns with 96B allocated (not with StringBuilder, but with new string(charArray)).
It looks like it's a tiny bit faster to manually copy the first three chars instead of using AsSpan(); for me it's 20% faster.
public string MaskStringCreateManualCopy()
{
    return string.Create(ClearValue.Length, ClearValue, (span, value) =>
    {
        for (var i = 0; i < 3; i++)
            span[i] = value[i];
        span[3..].Fill('*');
    });
}
The reason why this is faster is because in my example I first copy the full string in the span and then overwrite the last characters
I've tested that against the already improved version where only 3 chars are copied to the span ('value.AsSpan()[..3].CopyTo(span);'). Not sure why doing it manually is faster, but it only is when copying 3 chars or fewer... I guess CopyTo() scales better but has some fixed startup cost.
so, back to C/C++
Have to admit, being able to just access strings as arrays right off the bat in C and C++ is convenient.
Am I the only one who would write this?
[TestMethod]
public void TestString()
{
    // Requires: using System.Linq;
    string work = "somestring";
    string encrypted = string.Join(string.Empty, work.ToCharArray()
        .Select((c, index) => index > 2 ? '*' : c)
    );
}
NDC is great
One more trick for implementing the irreplaceable-developer pattern 😄
There's no way the compiler doesn't optimize this.
Obviously an easy case for an example, but I imagine quite a common one.
The compiler isn’t that smart. It will do what you tell it and in this case, it won’t.
@@nickchapsas Yeah I guess they thought the benefits are not worth the effort then. Don't have that much knowledge about C# tbh.
Just assumed it because that's what Java does.
Java doesn’t have Span or something equivalent and every string “mutation” is a new allocation so in java the 4th approach doesn’t exist. Do you have a scenario where Java will optimise this?
@@nickchapsas
String numbers = "";
for(int i = 0; i
I’m curious why the 29 people disliked this video!!
Before watching the video, my guess is string.Create
"Don't prematurely optimize something just to use a feature."
I have fallen into this trap so many times. Especially when I first learned Generics. I feel bad for anyone who had to revisit that code later.
Premature optimization quotes are the worst thing that has happened to software engineers.
I was thinking the simple solution would be optimized by the compiler, so I didn't need to care that much. But it wasn't T_T
.NET compiler is not very good at optimizing ;-)
But the main thing is - programmer must know how to optimize too!
Otherwise it's just "monkey coding".
I've never heard of a compiler that optimizes a bubble sort into a quicksort... or whatever algorithm... do they now?
-signed Old C programmer
@@moestietabarnak No, it doesn't do that much. If I remember correctly, I heard that Java automatically converts the + operator into StringBuilder calls, so you can use + without a performance penalty. (I know StringBuilder is better, but I think + is more readable.)
@@mix5003 Yup, readability trumps optimization
public string MaskNewStringV2()
{
    var length = ClearValue.Length - 3;
    return $"{ ClearValue.Substring(0, 3) }{ new String('*', length) }";
}
This could beat them all:
[Benchmark]
public string Unsafe () // 50 times faster than Native, no memory
{
    unsafe
    {
        fixed ( char* p = ClearValue )
        {
            for ( int i = 3 ; i < ClearValue.Length ; i++ )
            {
                *( p + i ) = '*';
            }
        }
    }
    return ClearValue;
}
The obvious most performant Unsafe approach was one of the things excluded as “acceptable” for this. You’re pinning the string in memory so it’s technically safe but not something that should be used IMO unless the team is heavily using unsafe in other places
hehe (slower than string.create, but also only allocates the final string)
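public unsafe string MaskCharsStack()
{
    var length = Password.Length;
    char* chars = stackalloc char[length];
    for (var i = 0; i < length; ++i)
    {
        chars[i] = i < 3 ? Password[i] : '*';
    }
    // new string(char*) would read until a '\0' terminator, which this buffer
    // doesn't have; pass the length explicitly.
    return new string(chars, 0, length);
}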
#3 is the best because it's very easy to read. #4 is not worth the tiny performance boost.
I don't find #4 to be unreadable.
The reason I gave you a thumbs down is because you claim this is the most efficient way, while:
1. it clearly is not
2. even if it was, you didn't explore several other approaches, which could be more efficient (and some of them actually are)
3. your comment in the comments about heap vs. stack, which shows you don't fully understand what is really going on underneath
Other people before me already made good points of why this is not the most efficient way and what are the other possible options, so I will not repeat them.
Apart from that, the goal of this video (to make "the lazy C# developers" think about "what happens under the bonnet", and about performance in general) is something I admire! Just be very careful with claims like "this is the most efficient way"; all you had to do was say "this is a much more efficient way" and you would have been golden! :)
Firstly, thumbs up or down, they both count as engagement so thank you!
Now for your points.
1. Why isn't it? Do you have a better approach that isn't using unsafe code? I'd really like to know
2. The ones I didn't explore are not faster. I tried A LOT of them and they are all slower than string.Create. The only faster ones use unsafe code, which is out of the question.
3. Sounds like you are one not understanding heap vs stack. I would really like to know which part is wrong.
@@nickchapsas Well, I really don't want to turn this into a public argument, but for the sake of education (including mine, because I could be wrong and actually learn something from this), let me try to explain myself a bit more. Most of these points were already made by @Aidiakapi, though.
1. string.Create is probably the fastest way without unsafe code, and probably just as fast as (correct) unsafe code; after all, it is designed to be just that. The reason your code is not the most efficient is that you're copying the whole original string into the new string, instead of only the first 3 characters. I know that for this string length the difference is minuscule, but it's still faster. For huge strings the difference will be noticeable.
2. One variant worth exploring is to use StringBuilder with preallocated length (so only 1 memory allocation is done, no reallocating), then copy the first 3 characters with something like stringBuilder.Append(ClearValue, 0, 3), then add the extra '*' characters using stringBuilder.Append('*', length - 3). This is probably the best we could do in the old days (before string.Create and Span) without unsafe code. It is still slower than string.Create because of the additional memory allocation which will happen on stringBuilder.ToString(), otherwise (before the .ToString()) it should be comparable in performance (as it is literally doing the same thing, just not in the final memory).
3. I am pretty sure all strings reside on the heap. I cannot guarantee (without disassembly) that string.Create doesn't directly allocate heap memory, instead of allocating stack memory and then copying it to a newly allocated heap memory after the delegate is finished, but I don't see why it would - it would only make things worse (less efficient). Imagine you string.Create'd a 100MB string - what would be the point of allocating 100MB of stack memory (which you may not even have), just to copy it later into the heap? And stack memory by itself is not faster than the heap - it is the same kind of memory. Also Span is perfectly capable of "pointing" to heap memory - being ref struct only means that the Span variable itself is always on the stack, not the memory it points to. Think of Span as just a "pointer" and a size, except that "pointer" is a little bit more complicated than a regular unsafe pointer.
One more thing about unsafe code: using it to overwrite the internal buffer of the input string is a very bad idea, because of shared strings, precomputed hash codes, etc. Using it to change the internal buffer of a newly allocated string is still a hack, but apparently it works (and was the best we could do before string.Create and Span). If you think about it, string.Create is just the better (safer) way of doing the same thing: allocate some memory on the heap and let you fill that memory at the correct time (before the string initialization is finished, not after), using Span instead of an unsafe pointer.
And one more thing worth mentioning - one has to be careful with the delegate passed as callback - if it uses some local variable from the calling function, it becomes a closure, which means heap allocation and all the efficiency goes out the window. Unless the compiler / JIT does some very clever optimisation, which I doubt. This is why the delegate has a 'state' argument passed to it.
500 bites allocated :DDDD
Meh, doesn’t really matter, no need to obsess over minor details. You only live once. Code it and move on. If there’s a problem fix it. It’s only software.
"Quickly change it to a var".. Instant dislike
Classic
I bet you can save a couple more microseconds if you only copy the first 3 characters from the span in the last method.
Someone tried it in the comments. No difference at all but it’s most likely due to the size of the span. If it was a waaaay bigger one it should be faster
You know what this tells me? C# lied to us when it said it was trying to make things simpler. Wtf, why do I have to know all this? Why can't I just make a string and flip on an "optimize for this or that" flag? fuck.
You don't have to know all this, and you don't have to optimize any of it. You only have to worry about all that once you've optimized everything else at a high level and now need to do micro-optimizations.
I'm sorry, but you have to know how things work or you will just be "monkey coding" :-)
Programming is difficult business. It should never be undertaken in ignorance. - Douglas Crockford
@@MikeWardNet True!
I like C# because it lets you create something useful with very little knowledge. But the more you know, the better your program becomes.
Same with Python.
Having to keep track of a lot of flags would not make anything simpler. :)
First
Everyone: "this code is way too complex." Me as a JavaScript developer: "yeah, this is totally how I would do it in JavaScript."
Could also Slice that Span before copying, since only the first 3 chars are needed. Haven't measured though.