How Regex in C# can kill your app

  • Published 1 Oct 2024

COMMENTS • 129

  • @KevinInPhoenix 1 year ago +16

    There is an old saying: if you have a problem that requires Regex, then you now have two problems.

  • @pilotboba 1 year ago +39

    I know this video wasn't about email... but...
    I think MS or some other people have determined there is no way to really verify an email with a regex. I think even MS changed it so they basically look for a single @ in the string to call it valid email format.
    The way to validate it is to send an email with a confirmation link. :)

    • @Mario-cr1ik 1 year ago +5

      This approach is the recommended way mentioned somewhere in the ms docs

    • @rezataba6204 1 year ago +1

      What about the login situation? It's not common to send verification emails for logins.

    • @pilotboba 1 year ago +3

      @rezataba6204 Make the account pending until the email has been verified. Pending accounts get no access.

    • @albe8479 1 year ago

      @rezataba6204 For login to an existing verified account it does not matter. If a user with that email as login exists, it's all OK. Maybe just do a length check.

  • @nidustash6964 1 year ago +81

    "Not everyone can do that, mainly because nobody knows how to write a Regex"
    TBH this caught me off guard!! I even choked on my own saliva for a brief moment. In the end, still very educational and fantastic content as usual, sir!

    • @FrederickMarcoux 1 year ago +9

      But it's so true. Nobody understands Regex.

    • @mandoschMUh 1 year ago

      That one is pure gold, I agree :D

    • @robertnull 1 year ago +3

      I'd say that most people can write a regular expression, but nobody can read it after that, including the author. It's easier to rewrite it than to understand it ;)

    • @GufNZ 1 year ago +2

      It's not true tho - I am very fluent in Regex, in various dialects, and for everyone else there's RegexBuddy.

    • @GufNZ 1 year ago +1

      @FrederickMarcoux I do, very well.

  • @myroslavberlad4428 1 year ago +63

    If you have a problem and you have a solution via Regex - now you have two problems

    • @LCTesla 1 year ago +1

      Do people really believe that or is it about making a cute "zinger" for the uncritical masses

    • @myroslavberlad4428 1 year ago +3

      @LCTesla Yes, they do. And there are reasons for that: hard to master, hard to debug, hard to update without breaking existing cases. It is not that the tool is bad. Regexes are actually a powerful instrument and there are certainly good places to use them, but they are hard to master. That is why this saying was born.

    • @LCTesla 1 year ago +2

      @myroslavberlad4428 Seems that just applying the KISS principle and restricting its use to appropriate use cases counters all that. The fact that a tool can be misused is a case against the user, not the tool.

    • @myroslavberlad4428 1 year ago

      @LCTesla I do agree

  • @frossen123 1 year ago +23

    In cybersecurity, abusing regex like this is a category of DoS attack called ReDoS.

    • @rapzid3536 1 year ago +1

      Interesting, we call it the same thing outside the cybersecurity industry.
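
A minimal sketch of the ReDoS effect mentioned in this thread, written in Python rather than C#. The pattern `(a+)+$` and the input sizes are illustrative assumptions, not the email regex from the video. On a backtracking engine, each extra character roughly doubles the time needed to reject a near-miss input:

```python
import re
import time

# Illustrative pattern with nested quantifiers (an assumption for this sketch,
# not the regex shown in the video).
PATTERN = re.compile(r"(a+)+$")

def time_failed_match(n: int) -> float:
    """Return the seconds spent rejecting 'a' * n followed by a single 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = PATTERN.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing 'b' makes a full match impossible
    return elapsed

if __name__ == "__main__":
    # Keep n small: the work grows exponentially with the input length.
    for n in (8, 12, 16, 18):
        print(f"n={n:2d}  {time_failed_match(n):.4f}s")
```

Exact timings depend on the machine and the engine, so none are shown here; the point is only the growth pattern between successive sizes.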

  • @HalasterBlackmantle 1 year ago +9

    What's the downside to using NonBacktracking? Or rather, what would be a scenario where you would not want to use it?

  • @Kommentierer 1 year ago +12

    Everything I see on your channel is super interesting and special. I never knew about those issues, but it is nice to know how to fix them.
    Sharing this with my colleagues.

  • @pilotboba 1 year ago +2

    Developer has a problem.
    Developer uses RegEx
    Developer now has 2 problems.
    :)

  • @IvanRandomDude 1 year ago +6

    Chapsas flexing with 32 cores on us mortals @4:52

    • @AcidNeko 1 year ago +2

      and rtx 4090 and 128gb of ram :)
      it can run 100 instances of Rider, or 6 instances of Visual Studio 2022

  • @billy65bob 1 year ago +1

    2:30 that is some very bad and inefficient code for that pattern.
    I'm guessing this tool is more to break down what the various regex implementations will do in an easy to understand manner, rather than to generate something actually worth using.
    I had looked at the specification of email addresses some time ago, I wanted to know what was valid, and how sub addressing was defined.
    Just the bits in common use are very complicated, and that's before you get to all the weird emails that no one sane would use, but are actually allowed by the standard,
    such as using quotes, escaping quotes inside the quotes, double @'s, non-ascii symbols, a % to set the route, sub addressing, etc.
    What the standard allows is insane, and trying to handle it via regex is a fool's errand.
    You're way better off writing a small program (or library) for the dedicated purpose of validating emails, by having it identify fragments, and validating them as defined.

  • @rumplin 1 year ago +5

    What a subtle way to show us that you have a RTX 4090 :)

    • @nickchapsas 1 year ago +8

      It’s the only reason I made the video

  • @RayanMADAO 1 year ago +2

    that regex visualization site is really cool

  • @nickhubbard3671 1 year ago +1

    The best way to avoid issues with Regex is to not use it; and to avoid people that do use it!🙃

  • @FunWithBits 1 year ago +11

    Regex is super powerful. I just wish people would format it a little bit more. Usually I see regex and it is just a line of characters. Regex code can be much easier to read when there is spacing, multiple lines, different indenting, comments, etc. Programmers don't put CSharp code in a single line with no spaces or comments, but in regex this is accepted (and because it's hard to read, it's impossible to see any performance issues it might have).

    • @chriskruining 1 year ago +2

      could you give me an example of such formatted regex? Because I always assumed it had to be a line of chars because every space and newline used to format is part of the query as far as I am aware. So I am curious how you do this, because I love clearly formatted code :D

    • @robertnull 1 year ago +4

      @chriskruining There is a (?x) regex modifier that enables free-spacing mode, i.e. you can put spaces and newlines in your regex and they will be ignored, so you can make your expression multi-line, with each line containing a part that captures something significant. What's more, in this mode you can even use # comments at the end of each line!

    • @PeterK6502 1 year ago

      @robertnull True, but most input to be parsed is dependent on spaces, therefore this mode is useless in that situation (you could add comments however to increase readability).

    • @robertnull 1 year ago +2

      @PeterK6502 Fret not, kind sir, for in free-spacing mode you just escape spaces with a backslash to make them part of the important expression and not part of the unimportant formatting :)

    • @PeterK6502 1 year ago

      @robertnull I did not know that, thanks for the info.
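
For anyone who wants to see the free-spacing mode from this thread in action, here is a small sketch in Python, where `re.VERBOSE` plays the role of the `(?x)` modifier (in .NET the corresponding option is `RegexOptions.IgnorePatternWhitespace`). The log-line pattern and its group names are made up for illustration:

```python
import re

# re.VERBOSE is Python's equivalent of the (?x) free-spacing modifier.
# The pattern and group names below are hypothetical examples.
LOG_LINE = re.compile(
    r"""
    (?P<level> [A-Z]+ )   # log level, e.g. INFO
    \ -\                  # a literal " - ": spaces must be escaped in this mode
    (?P<message> .+ )     # the rest of the line
    """,
    re.VERBOSE,
)

match = LOG_LINE.fullmatch("INFO - service started")
print(match.group("level"), "|", match.group("message"))
# prints: INFO | service started
```

Unescaped whitespace and everything after `#` is ignored, while `\ ` still matches a real space, which is the trick described above.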

  • @Denominus 1 year ago +4

    Excellent video and great advice. We've fallen prey to this twice in the past. First an attack directly against one of our APIs and then during Cloudflare's global outage due to a bad regex on their side (not our fault in this case, but still an outage).
    At the time we changed the regex, but there are only a handful of people who know how to do this confidently on a complex regex. I really like these "safety net" approaches.

  • @brianviktor8212 1 year ago +1

    10 seconds to check if a given string is a valid e-mail? Sounds great! I mean I could do it with a little custom algorithm with ~0.001µs, but hey, it's regex!! We all love regex, don't we guys?!
    An E-Mail is setup like this: [text]@[domain].[ending] - Either split at the @ or get the index of it. If the result is !=2 elements in the array or -1 as index, you have either no @ or more than 1. Both should return "false" for the check. After that you get the last index of "." (apparently you can have multiple dots?). If it's -1, return false. Otherwise first part is the domain, the second part is the ending. Here you can verify if it's a valid e-mail address.
    It's really simple... I thought everybody would do this? Why even bother with Regex for this?

    • @brianviktor8212 1 year ago

      @billy65bob - Hmm yeah, that would require adjustments then. I've never seen those before though. In the worst case I'd have to loop through every char manually, but only once.
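
The split-and-check idea described in this thread could be sketched roughly as follows, in Python. The function name `looks_like_email` is hypothetical, and this is only a structural check under the `[text]@[domain].[ending]` assumption: as pointed out elsewhere in the comments, it rejects quoted local parts containing `@` that the standard allows.

```python
def looks_like_email(address: str) -> bool:
    """Structural check only: exactly one '@' and a dot inside the domain part."""
    parts = address.split("@")
    if len(parts) != 2:            # no '@' at all, or more than one
        return False
    local, domain = parts
    if not local:                  # nothing before the '@'
        return False
    last_dot = domain.rfind(".")   # multiple dots are fine, use the last one
    if last_dot == -1:
        return False
    name, ending = domain[:last_dot], domain[last_dot + 1:]
    return bool(name) and bool(ending)

print(looks_like_email("nick@n.n.n.n.n.n.n.n.n.c"))  # True
print(looks_like_email("no-at-sign.example"))        # False
```

It walks the string a constant number of times, so a long input cannot trigger the backtracking blow-up discussed in the video.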

  • @5hunt3r 1 year ago +13

    just a note: don't try to validate emails. It's nearly impossible to check if a mail is valid because so many special cases exist where it looks invalid but still is valid.

    • @nickchapsas 1 year ago +10

      The actual RFC regex is HUUUUGE

    • @humanesque 1 year ago +2

      Pretty much this; about the furthest you can go is checking if the domain exists; short of asking the receiving server if it will accept it. Useless checks like these are worse than the blindly copying code (which is what this RegEx is) and being surprised when it goes wrong.

    • @orterves 1 year ago +10

      My understanding is the best way to validate an email, is to send a verification email.

    • @nooftube2541 1 year ago +2

      @nickchapsas the real RFC Regex does not exist 😂 Because an email, like a domain, cannot be parsed with regex.
      Actually there are 2 normal solutions: either check for the @ sign and for symbols before and after it, or check that the email is real. But the second option does not handle localhosts...

    • @EmptyGlass99 1 year ago +2

      The only 100% guaranteed way to validate an email is to force the user to respond to an email sent to them i.e. sending a validation link or one-time validation code.

  • @TribalBoss 1 year ago +5

    Few years ago I had to check if an HTML string contained any email addresses using Regex. Needless to say, I had to reboot the Azure server after pushing to production 😂

  • @zxopink 1 year ago +3

    What are the drawbacks of NonBacktracking?

    • @adassko6091 1 year ago +1

      The option can’t be used in conjunction with RegexOptions.RightToLeft or RegexOptions.ECMAScript, and it doesn’t allow for the following constructs in the pattern:
      Atomic groups
      Backreferences
      Balancing groups
      Conditional
      Lookarounds
      Start anchors (\G)

  • @nickandrews1985 1 year ago +2

    My second biggest takeaway from this video is that Nick already has himself a RTX 4090 LOL

  • @shingok 1 year ago +2

    I wonder if the Source Generator version was slower because it was compiled as debug. Maybe the dynamic compiled version generate optimized version regardless of compilation mode.

  • @carmineos 1 year ago +2

    DataAnnotations should be safe as RegularExpressionAttribute has a default timeout of 2s (at least from .NET 5, idk before)

  • @StasAbrosimov 1 year ago

    If you decide to solve the problem with regular expressions... You now have two problems: the original problem and the regular expression.
    It's an old joke....

  • @peledzohar 1 year ago

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. ~ Jamie Zawinski

  • @TonoNamnum 1 year ago +12

    Regex are not extremely hard lol. If you study them for about a week you should be able to create very powerful stuff. And also the secret to regexes in my opinion is to separate them in little chunks.
    When you study them you definitely learn what Nick is describing. I don't regret learning/using regexes.
    I also agree that they are not the most efficient option but if you understand what you are doing it saves a lot of time.

    • @ProtectedClassTest 1 year ago +7

      well, wait until you maintain other people's regex and come back here cryin hahaha

    • @RealMathewAdams 1 year ago

      You aren't coding for yourself, you are coding for the future. Regex can be unmaintainable if the use-case is non-trivial.

    • @TonoNamnum 1 year ago

      @ProtectedClassTest the crying will be for people that do not understand them like you 🤣

    • @TonoNamnum 1 year ago

      Also this video encourages you to use them ua-cam.com/video/R5BcHIMZMxc/v-deo.html and that channel has a lot of subscribers.
      I guess the bottom line is you have to understand what you are doing just like everything else.

  • @gregcyrus2739 1 year ago

    Hate regex! If you re-engineer foreign code you will never know what was intended to validate for. The LIKE operator is not that flexible but I could always validate everything (maybe with a sequence of LIKE-lines - and it was human readable)

  • @mastermati773 1 year ago

    Validating emails is so ubiquitous that I wonder why tf Regex can't have a special symbol only for emails xD

  • @nooftube2541 1 year ago

    I love that regex for email... and it doesn't work, because email cannot be parsed by regex.

  • @djupstaten2328 1 year ago

    These patterns overuse capturing groups. (x) should be (?:x) more often than not, i.e. non-capturing groups. It makes a ton of difference in regards to bloat and lag.

  • @speakoutloud7293 1 year ago

    Soo you got the 4090, wondering what kind of games you are playing :P

  • @jspesh 1 year ago

    Nice RTX4090 & 128gb ram, bro!

  • @a13w1 1 year ago +1

    That timeout option is quite cool when you know how long a normal regex will take to pass even under load. Plan to use it next time if it makes sense when I write regex.

  • @GaryJohnWalker1 1 year ago

    Regex kills my brain so why not the computer too

  • @tarsala1995 1 year ago

    Wut? You already have RTX 4090? 5:00

  • @KanashimiMusic 1 year ago

    I find it funny that people keep saying "nobody knows how to write RegEx", because I don't find it TOO difficult. I mean it still takes me a while to do anything remotely complex, but like, it's manageable imo. Usually I will have RegExr open in another tab, since it contains a cheat sheet with the most important features, and it quickly lets me validate that my RegEx works the way it should

    • @KanashimiMusic 1 year ago

      @karlfimm I really need to start using GitHub Copilot.

  • @cn-ml 1 year ago +2

    Thanks for the video, i already started using timeouts for regex wherever possible. However I don't fully understand what the non-backtracking option does. Why does it change the performance of the regex and what changes with the results?

    • @humanesque 1 year ago +1

      Non-Backtracking is basically lazy evaluation for your regular expressions, and it's implementation dependent. Unless you're using it for a throwaway match (instead of parsing, which is what regex is for), it will introduce weird, platform specific bugs and grief.

    • @cn-ml 1 year ago

      @humanesque okay thanks, so it's basically unsafe but faster

  • @Hamza-Shreef 1 year ago

    this kinda thing has been really useful
    keep it up bro

  • @janneforsell525 1 year ago

    Once again I've opened a PR during the video 😅

  • @mirabilis 1 year ago

    No backtracking will break the regex.

  • @deepakkulkarni5356 1 year ago +1

    Hey Nick, does SQL validation also increase exponentially with more records. Can you share any document link which proves the same?

    • @klekaelly 1 year ago

      I thought the same thing, SQL validation uses Regex a lot

  • @IAmFeO2x 1 year ago +7

    Great video as always! Personally I avoid Regex like the devil - it always takes so long to read and understand them in code.

    • @infeltk 1 year ago

      I use Regex for simple things. Everything has its purpose and limitations. And the problem described in this episode is covered on the Microsoft Learn page on .NET fundamentals - it is not secret information.

  • @PeterK6502 1 year ago

    This kind of behaviour is frequently solved by using lazy capture instead of greedy capture, for example instead of using ()+ you should use ()+?
    I can see at least one greedy capture group in the shown expression.
    You should always try to avoid greedy captures, because of backtracking.
    Use ()*? or ()+? instead of ()* or ()+
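
A small sketch, in Python, of what switching from a greedy to a lazy quantifier changes. The sample text and patterns are illustrative assumptions. As far as I know, a lazy quantifier still backtracks when the rest of the pattern fails, so it changes which match is found rather than guaranteeing protection from the blow-up shown in the video:

```python
import re

text = "<b>one</b> <b>two</b>"

# Greedy: .+ grabs as much as it can, then gives characters back
# until the final </b> can match.
greedy = re.findall(r"<b>(.+)</b>", text)

# Lazy: .+? takes as little as it can and extends one character at a time.
lazy = re.findall(r"<b>(.+?)</b>", text)

print(greedy)  # ['one</b> <b>two']
print(lazy)    # ['one', 'two']
```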

  • @ToadieBog 1 year ago

    To me, Regex has always had the smell of something confusing to use, that I never really cared for. I'm looking forward to a replacement that humans can actually read.

  • @masonwheeler6536 1 year ago

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

  • @alirezanet 1 year ago

    Nick I know regex 😊 stop saying that if you don't man 😂
    PS. just kidding ... I just can write regex but after a while only god knows what it is doing 😂😅

  • @zedmagdy 1 year ago

    I've tried this regex with php preg_match and it works fine I don't know if it's CSharp specific or what?

  • @gerakore8948 1 year ago

    I've never decided to bother with regex. I see how it can be useful, but it's a cluttered mess. Debugging and code maintenance would be a nightmare. I've done a lot of parsing and I doubt regex would be able to handle some of the inputs I've dealt with. For instance, receipts with various formats, printed so that they are cut off mid-receipt and with inconsistent headers/footers, scanned in low quality into an image format and placed into a PDF on which I would have to use OCR to extract the text. As you can imagine, all the text is scrambled: 5's turn into S's, 1's turn into I's, etc. Sometimes characters are missing and you can't really rely on identifiable tags.

  • @tanglesites 1 year ago

    Excellent video as usual! I was wondering if anyone knows of any resources on how to scan assemblies. I am trying to build a setup for a minimal API project I am working on. I would like to pull all the classes that implement a particular interface or interfaces and register them in the IoC container, so that it kind of auto-magically works. Do you, Nick, have any videos on this, or does anyone know of anywhere I can look? Everything I have found covers particular use cases. Sorry, new to C#. I could figure it out, I'm sure, given enough time; just looking to speed up development a little and make the code a little more organized. Again, great content. You have taught me more in the last month than I have learned in a year, and it's more than beginner level. Loving it.

  • @Victor_Marius 1 year ago

    I once managed to freeze my browser tab while testing a regex for matching file paths (in JS). It wasn't because of the length of the input but more because of some spaces in the input. Why does it use backtracking? Can it be avoided with the format of the regex? If you use something as simple as /w0rd/, is it still going to use backtracking?

  • @ws_stelzi79 1 year ago

    Well what is the saying "If you try to solve one problem with RegEx you have now two problems!"

  • @magashkinson 1 year ago

    Very useful video. Didn't know about this problem

  • @DuelingTreeMike 1 year ago

    Amazing find sir. I had no idea backtracking can be so dangerous. Thank you so much for creating this video.

  • @tmhchacham 1 year ago +1

    Very nice, as usual. Keep it up!

  • @nothingisreal6345 1 year ago

    My rule of thumb is: if possible, avoid regex. Hard to write. Extremely hard for others to read. If you use properly typed data you will not need it. And no matter how much effort you put into testing and thinking about edge cases, there are still too many times it will fail. For many strings there are alternative ways to verify them: IP address, URI, file path… Very often the need for regex is based on a bad design or comes from having to connect to legacy systems.

  • @anon0 1 year ago

    ooh very cool i just started doing my phd on symbolic automata regex. glad to see it being relevant

  • @codeforme8860 1 year ago +2

    Does anyone actually know how to use Regex

    • @ryanzwe 1 year ago +2

      Nope, I can't read or write it

    • @RougeEric 1 year ago +2

      I think it's fair to assume that anyone who's spent enough time with it can comfortably create some shorter regex and know what they're doing. But as soon as you start playing with complex nested systems and tons of lookahead stuff, even with significant practice, I have to test things extensively just to make sure they are doing what I think they're supposed to.

    • @geomorillo 1 year ago +1

      regwhat?

  • @casperhansen826 1 year ago

    I use Regex for small strings with simple use cases,

  • @pmashurenko 1 year ago

    Well, it's also worth reading RFC 2821, and very quickly it becomes clear that regular expressions are a bad tool for email validation - the variety of options for names and domain names is so huge that it makes almost no sense to check beyond the point that there's an "@" that isn't preceded by a "\" character somewhere in the middle.

    • @billy65bob 1 year ago

      Not even that is foolproof, as a @ inside quotes is also escaped. :)
      Granted, no one uses quotes in their email addresses, but it is allowed by the standard.

  • @theMagos 1 year ago +1

    128 GB RAM? Yikes...

    • @FunWithBits 1 year ago

      Maybe for video editing?

    • @nickchapsas 1 year ago +3

      I wish I had a good reason….but I don’t….

  • @jerryjeremy4038 1 year ago

    Wow that's a monster computer! Too many cores

  • @coced 1 year ago

    6:36
    I felt it

  • @dmytrk 1 year ago

    In some cases, I write my own algorithm to scan the string, so I can actually debug that.

    • @McNerdius 1 year ago +1

      This is why i love the new regex source generators, being able to view and step through the C# equivalent is a great learning aid for me. I comprehend the basics of regex but if a string + nontrivial regex combo doesn't pass a unit test or whatever and i can't figure out why... i can step through that particular scenario now, yay !

  • @katerinaandrasko3755 1 year ago

    how about - don't do regex?... i know crazy, but with emails check if there is "@" symbol if it is, cool, accept it. applications should try to send you that email to continue with whatever you want. want to register? cool - type in the verification code? want to recover your account? cool, click on the link in your email. at the end of the day that's what truly validates your email address - you get an email.

  • @FunWithBits 1 year ago

    That's odd. I wrote a longer comment and saw it in the comments but then it disappeared after a few minutes. Maybe the YouTube engine removed it after post-processing?

    • @nickchapsas 1 year ago +1

      YouTube is notorious for auto deleting comments especially in programming content. I don’t delete any comments so maybe try to repost it

    • @FunWithBits 1 year ago

      @nickchapsas - I think that happened before on other channels also. I wish YouTube would be more careful about what they delete, as it had nothing negative/bad. I'll repost. Thank you for the awesome channel - I learn so much here. I also like how you consider performance a higher priority in most of your videos.

  • @attribute-4677 1 year ago

    Which version is the NonBacktracking enum in? I'm targeting .Net framework 4.8 and it can't seem to find it (VS2022 automatically selects the language version, but even when forced to C# 8 it fails to find it).

  • @antonmartyniuk 1 year ago

    nice call on the Regex problem!

  • @parkercrofts6210 1 year ago

    Thank u for this ❤❤

  • @HadrielWonda 1 year ago

    Thanks for the insight nick

  • @anonimzwx 1 year ago +1

    Regex is very easy to do tbh, the nonbacktracking option affects the result??

  • @matthewsheeran 1 year ago

    Brilliant!

  • @rbogdan8980 1 year ago

    Thanks!

  • @abhishekbagchi6052 1 year ago +1

    Clicked so fast

  • @Max_Jacoby 1 year ago

    nick@n.n.n.n.n.n.n.n.n.c should be a CPU benchmark.

  • @AvenDonn 1 year ago +4

    Brb gonna go try signing up to everything with nick@n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.c

    • @jedimastermaniac 1 year ago

      lol. we still have to take into account for every action that the end user is gonan end up notories stupid bastard :D :P

  • @claudiufarcas 1 year ago

    Nice seeing you in person @dotnetdays.
    Keep doing great things!
    You're awesome!