C++ Weekly - Ep 425 - Using string_view, span, and Pointers Safely!

  • Published Jun 9, 2024
  • ☟☟ Awesome T-Shirts! Sponsors! Books! ☟☟
    Upcoming Workshop: Understanding Object Lifetime, C++ On Sea, July 2, 2024
    ► cpponsea.uk/2024/sessions/und...
    Upcoming Workshop: C++ Best Practices, NDC TechTown, Sept 9-10, 2024
    ► ndctechtown.com/workshops/c-b...
    This episode is sponsored by think-cell. think-cell.com/cppweekly
    Episode details: github.com/lefticus/cpp_weekl...
    T-SHIRTS AVAILABLE!
    ► The best C++ T-Shirts anywhere! my-store-d16a2f.creator-sprin...
    WANT MORE JASON?
    ► My Training Classes: emptycrate.com/training.html
    ► Follow me on twitter: / lefticus
    SUPPORT THE CHANNEL
    ► Patreon: / lefticus
    ► Github Sponsors: github.com/sponsors/lefticus
    ► Paypal Donation: www.paypal.com/donate/?hosted...
    GET INVOLVED
    ► Video Idea List: github.com/lefticus/cpp_weekl...
    JASON'S BOOKS
    ► C++23 Best Practices
    Leanpub Ebook: leanpub.com/cpp23_best_practi...
    ► C++ Best Practices
    Amazon Paperback: amzn.to/3wpAU3Z
    Leanpub Ebook: leanpub.com/cppbestpractices
    JASON'S PUZZLE BOOKS
    ► Object Lifetime Puzzlers Book 1
    Amazon Paperback: amzn.to/3g6Ervj
    Leanpub Ebook: leanpub.com/objectlifetimepuz...
    ► Object Lifetime Puzzlers Book 2
    Amazon Paperback: amzn.to/3whdUDU
    Leanpub Ebook: leanpub.com/objectlifetimepuz...
    ► Object Lifetime Puzzlers Book 3
    Leanpub Ebook: leanpub.com/objectlifetimepuz...
    ► Copy and Reference Puzzlers Book 1
    Amazon Paperback: amzn.to/3g7ZVb9
    Leanpub Ebook: leanpub.com/copyandreferencep...
    ► Copy and Reference Puzzlers Book 2
    Amazon Paperback: amzn.to/3X1LOIx
    Leanpub Ebook: leanpub.com/copyandreferencep...
    ► Copy and Reference Puzzlers Book 3
    Leanpub Ebook: leanpub.com/copyandreferencep...
    ► OpCode Puzzlers Book 1
    Amazon Paperback: amzn.to/3KCNJg6
    Leanpub Ebook: leanpub.com/opcodepuzzlers_book1
    RECOMMENDED BOOKS
    ► Bjarne Stroustrup's A Tour of C++ (now with C++20/23!): amzn.to/3X4Wypr
    AWESOME PROJECTS
    ► The C++ Starter Project - Gets you started with Best Practices Quickly - github.com/cpp-best-practices...
    ► C++ Best Practices Forkable Coding Standards - github.com/cpp-best-practices...
    O'Reilly VIDEOS
    ► Inheritance and Polymorphism in C++ - www.oreilly.com/library/view/...
    ► Learning C++ Best Practices - www.oreilly.com/library/view/...
  • Science & Technology

COMMENTS • 61

  • @yokozombie
    @yokozombie 1 month ago +10

    all cool on hello world example, until you find out address sanitizer is out of memory to instrument your legacy 32-bit system

  • @progammler
    @progammler 1 month ago +24

    I had to optimize genome analysis code 2-3 years ago. The input strings were ~200 GB, and the code made millions of copies (small substrings) during runtime. It was very difficult to change their code to allow the use of string_views, but it paid off: in the end, runtime dropped from 45 min to 8 min, half of which was writing the results to disk! So yes, it can be painful to implement when the code was not designed accordingly, but it may still pay off!
    Also, using memory-mapped file I/O instead of streams sped things up a lot! I couldn't use it in the end because the OS pages out only when RAM is almost full. Since this ran on an HPC cluster, memory was limited via software (Slurm). The process was killed because the memory-mapped file exceeded the job's memory limit. I tried forcing the OS to page out before the software limit was reached, but it didn't work for some reason... I didn't have the time to investigate further but would be very interested in a solution.
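The substring-copy problem described above can be sketched in a few lines. This is only an illustration, not the commenter's code: `kmers` and `demo_kmers` are hypothetical names, and the fixed-length windows are an assumed stand-in for whatever substrings the real analysis extracted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: returns every window of length k as a view into `genome`.
// No characters are copied; each element is just a pointer plus a length.
// The caller must keep the underlying buffer alive and unmodified
// for as long as the returned views are in use.
std::vector<std::string_view> kmers(std::string_view genome, std::size_t k) {
    std::vector<std::string_view> out;
    if (k == 0 || genome.size() < k) {
        return out;
    }
    out.reserve(genome.size() - k + 1);
    for (std::size_t i = 0; i + k <= genome.size(); ++i) {
        out.push_back(genome.substr(i, k));  // string_view::substr does not allocate
    }
    return out;
}

void demo_kmers() {
    const std::string genome = "ACGTACGT";  // named object that outlives the views
    const auto views = kmers(genome, 3);
    assert(views.size() == 6);
    assert(views.front() == "ACG");
    assert(views.back() == "CGT");
    // Each view points into the original buffer rather than into a copy.
    assert(views[2].data() == genome.data() + 2);
}
```

The same code with `std::string` elements would allocate or copy once per window, which is the kind of cost the comment reports removing.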

    • @markusasennoptchevich2037
      @markusasennoptchevich2037 1 month ago +1

      That's why C is still preferable in high performance code. You are in control of literally anything, and almost always, C code doesn't imply uncontrollable allocations or copying. But for many people, being in control of "everything" is too uncomfortable, so they prefer C++

    • @markusasennoptchevich2037
      @markusasennoptchevich2037 1 month ago +2

      Also, about your problem: there is no guaranteed solution for file buffering. If you do not close or flush file descriptors explicitly, the cache manager of your OS handles flushes to disk according to its internal algorithm. All opened files are already mapped in memory by the cache manager, and writing to the file just updates the underlying page and sets its dirty bit, telling the memory/cache manager (I don't remember exactly, and it may differ depending on the OS) that this page is ready to be flushed (synchronized) with the on-disk representation of the file. I think a possible way to optimize your disk I/O is to check what flags your OS provides when you open a file (of course, the standard library won't give you those), set up some cache manager settings, etc. The general approach to increasing disk I/O throughput is that your flushes must correlate with the speed of your data processing. Generally, a disk can handle 64-256 KB of data in one request, but that also differs depending on the disk vendor, your OS, etc. Anyway, this is where you must go beyond C++, and researching the underlying OS mechanisms comes into play!

    • @sledgex9
      @sledgex9 1 month ago +6

      @@markusasennoptchevich2037 And with assembly you're even more in control of everything. /troll

    • @markusasennoptchevich2037
      @markusasennoptchevich2037 1 month ago +1

      @@sledgex9 sure it do. as for c vs c++, if you don't care about implicit allocations, copying and readability, then, c++ is not arguably better choice than c. that's, unfortunately, not the case for many products

    • @sledgex9
      @sledgex9 1 month ago +5

      @@markusasennoptchevich2037 If you do care about implicit allocations and copying then you can certainly avoid those with C++ by choosing the correct idioms and structures. This improves both readability and coding efficiency.

  • @otanoshimi4
    @otanoshimi4 1 month ago +2

    I learned a lot.

  • @raymundhofmann7661
    @raymundhofmann7661 1 month ago

    What are the drawbacks of letting compilers treat obvious cases that can be statically checked as errors?

  • @catsolstice
    @catsolstice 1 month ago +3

    I ran into a similar but more subtle issue: build a string, take a string_view of that string, then move the string into a container. The string_view may now be invalid. In my tests it only happened because of SSO.

    • @quademasters249
      @quademasters249 1 month ago

      Yeah you can't modify the source of the view without invalidating the view. A "std::move" modifies the string by clearing it during the move
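One way to sidestep the move problem discussed above is to take the view only after the string has reached its final home. The following is a minimal sketch under that assumption; `store_and_view` and `demo_store_and_view` are hypothetical names, not anything from the video.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: stores `s` in `store` and returns a view of the stored string.
// The view is created only after the move, so it refers to the string that now
// lives in the container, not to the moved-from parameter.
// Caveat: a later push_back that reallocates the vector moves the stored strings,
// which can invalidate views of small (SSO) strings all over again.
std::string_view store_and_view(std::vector<std::string>& store, std::string s) {
    store.push_back(std::move(s));
    return store.back();
}

void demo_store_and_view() {
    std::vector<std::string> store;
    store.reserve(2);  // no reallocation below, so the first view stays valid
    const std::string_view a = store_and_view(store, "short");
    const std::string_view b = store_and_view(store, std::string(100, 'x'));
    assert(a == "short");
    assert(b.size() == 100);
    assert(a.data() == store[0].data());
}
```

The `reserve` call matters here: without it, the second `push_back` could reallocate and move the first string, which is the same hazard in a different place.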

  • @AlfredoCorrea
    @AlfredoCorrea 1 month ago +4

    3:12 “turn on Address sanitizer”. How was it done? Using the godbolt interface?

    • @pierrecolin6376
      @pierrecolin6376 1 month ago +13

      -fsanitize=address

    • @davidfrischknecht8261
      @davidfrischknecht8261 1 month ago +2

      Does MSVC have a similar feature?

    • @edwincarlsson9014
      @edwincarlsson9014 1 month ago +1

      You can pass -fsanitize=address to cl as well. target_compile_options(target PRIVATE -fsanitize=address) and the same for target_link_options() for CMake.

    • @nielsdegroot9138
      @nielsdegroot9138 1 month ago +2

      @@davidfrischknecht8261 Yes, in Visual Studio 2019 version 16.9. First link when you Google msvc addresssanitizer.

    • @cihatkececi2310
      @cihatkececi2310 1 month ago +2

      @@davidfrischknecht8261 Yes, it does. Use the compiler option /fsanitize=address

  • @AlfredoCorrea
    @AlfredoCorrea 1 month ago +4

    4:26 If the conversion from string (vector) to string_view (span) were not automatic, it would have been difficult to sell (and I don’t like it either).

  • @piggy8435
    @piggy8435 1 month ago +1

    Cool vid. Would love a video on how coroutines work/are supposed to work. I feel confused by the current version of coroutines, like I’m supposed to implement everything myself / thread creation. Whereas in Golang I can just say “go x()” to do something async in a thread

    • @bencekeomley-horvath385
      @bencekeomley-horvath385 1 month ago

      There are many videos from the most recent CppCon about how to use coroutines; of course, a video from him would also be helpful. But I also don't think you understand what a coroutine is. It is definitely not a goroutine from Go; a goroutine is practically a thread abstraction (with thread pools and everything). A coroutine is a way to pause a function's execution by yielding (the coroutine way of returning, but it can be done multiple times), and later you can resume the execution with the whole stack intact. Explaining it is hard and there are many videos about it, but coroutines are not yet another way to handle multithreading.

    • @vetirtal1168
      @vetirtal1168 1 month ago

      You shouldn't implement everything yourself, you're expected to use a library (std or 3rd party)

    • @KX36
      @KX36 1 month ago

      2015/2016 CppCon talks about C++/WinRT are some of the best about coroutines, as they were only in the standard from C++20 and not much use on their own since, but they were apparently in C++/WinRT long before, and it seems like that's somewhere you could actually learn to use and get used to coroutines.
      I would recommend:
      Introduction to C++ Coroutines by James McNellis (2016) followed immediately by
      Putting Coroutines to Work with the Windows Runtime by Kenny Kerr and James McNellis (2016)
      There is also C++ Coroutines - Under The Covers by Gor Nishanov (2016) which was part of the same day of talks, but I found it to be less practical and more about convincing sceptics that the compiler can still optimize coroutines.
      There is also C++ Coroutines by Gor Nishanov (2015) from the year before but I can't remember how good it is.
      Every other talk I have found, especially the more recent CppCon talks is just an hour of the same boilerplate skeletons out of context and if you're lucky you get a generator for an infinite series at the end, which James did more clearly in 2016. I think some of the other C++/WinRT talks from around the same time just casually drop in co_await etc without much explanation like it's common practice, which is probably still more helpful than an hour of the same boilerplate skeleton code.

    • @marcs9451
      @marcs9451 1 month ago +1

      it's called being "a poorly designed language"

    • @anon_y_mousse
      @anon_y_mousse 1 month ago

      @@marcs9451 If you mean Go, then I wholeheartedly agree.

  • @alskidan
    @alskidan 1 month ago

    Is address sanitizer available for memory-constrained and low-power environments, like embedded systems?

    • @GreenJalapenjo
      @GreenJalapenjo 1 month ago

      Depends on what you mean by "embedded systems". It's available for embedded Linux systems, but not on things like MCUs.

    • @alskidan
      @alskidan 1 month ago

      @@GreenJalapenjo This was a rhetorical question. Point being, address sanitizer is not available everywhere, unfortunately. These defects in the language and its standard library need systemic fixes.

    • @cppweekly
      @cppweekly 1 month ago

      @@alskidan This is why "designing for portability" is one of my best practices. You need to be able to compile as much of your code on as many platforms as possible so you can use as many tools as possible.
      I just had to have a conversation about this with a team that is highly embedded on custom hardware.
      There is great value to be gained in making sure as much of your code as possible can run on your native host too.

    • @alskidan
      @alskidan 1 month ago

      @@cppweekly Agree 👍🏻 What you say is perfectly reasonable to me. But as always, there is that pesky 80/20 rule. The devil would be in those 20 (target specific) percent. :^) I was trying to make a point that relying solely on a safety net of sanitizers and tools is not enough.

  • @von_nobody
    @von_nobody 1 month ago +1

    I think we should add attributes to prevent code like this, like `std::string_view foo([[return_as_ref]] const std::string& x)`, so that calling `return foo("xx")` will cause a compiler error.

    • @not_ever
      @not_ever 1 month ago +4

      It seems possible for clang to catch this, so why is another attribute needed? Why can't compiler vendors have a flag for it? More people are likely to use a flag than an attribute. You don't have to remember to type it, it doesn't take any mental overhead once you set up your compiler flags, and frankly it doesn't litter your code with yet another attribute that wouldn't be necessary if the language wasn't so fond of implicit conversions and footguns.

    • @von_nobody
      @von_nobody 1 month ago

      @@not_ever Clang can catch it because this is a `std::` type. Will it be able to do the same for `my::string_view`, an opaque type compiled in another translation unit? At some point we will hit the halting problem. Attributes restrict what we can do in code, and this allows static analyzers to catch bugs.
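For comparison with the `[[return_as_ref]]` idea in this thread (which is the commenter's proposal, not an existing attribute), Clang does ship a `[[clang::lifetimebound]]` parameter attribute with a similar intent. The sketch below is an assumption-laden illustration: `first_word`, `demo_first_word` and the `LIFETIMEBOUND` macro are hypothetical names, and the macro expands to nothing on other compilers so the code still builds there.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// LIFETIMEBOUND expands to Clang's attribute where available and to nothing elsewhere.
#if defined(__clang__)
#define LIFETIMEBOUND [[clang::lifetimebound]]
#else
#define LIFETIMEBOUND
#endif

// Hypothetical function: the returned view refers to the argument's buffer.
// The annotation tells Clang that the result must not outlive `text`.
std::string_view first_word(const std::string& text LIFETIMEBOUND) {
    return std::string_view(text).substr(0, text.find(' '));
}

void demo_first_word() {
    const std::string text = "hello world";  // named object, so the view stays valid
    assert(first_word(text) == "hello");
    // Binding first_word(std::string("temp value")) to a variable would dangle.
    // With the annotation, Clang can warn about such a call; it is a warning,
    // not a hard error, unless warnings are promoted to errors.
}
```

Because the attribute sits on the declaration, it also works for user-defined view types in other translation units, which is the case raised in the last reply.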

  • @VioletGiraffe
    @VioletGiraffe 1 month ago +4

    In order to use views safely you must not return views. Only take them as arguments or use as local(!!!) variables.

    • @quademasters249
      @quademasters249 1 month ago +1

      I don't agree. I'll pass in a const string& and return a view into that const string&, for example when applying an RE to the string. It's a question of the lifetime of the thing the view is referencing. As long as the source of the view lives longer than the view, you're OK. You can't modify the source of the view; that'll invalidate the view.
      You certainly don't return a view to a temporary thing from inside the function.
      std::string_view FindFilename(const std::string& sSrce);
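Only the signature of `FindFilename` appears in the comment above; the body below is a guess at what such a function might do, written as a minimal sketch. `demo_find_filename` is a hypothetical name, and treating both `/` and `\` as separators is an assumption.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Assumed implementation: returns the part of `sSrce` after the last path separator.
// The result is a view into `sSrce`, so it is valid only while that string is
// alive and unmodified.
std::string_view FindFilename(const std::string& sSrce) {
    const std::string_view view = sSrce;
    const auto pos = view.find_last_of("/\\");
    return pos == std::string_view::npos ? view : view.substr(pos + 1);
}

void demo_find_filename() {
    const std::string path = "/usr/local/bin/clang";  // outlives the returned view
    assert(FindFilename(path) == "clang");
    assert(FindFilename(path).data() == path.data() + path.size() - 5);

    const std::string bare = "plain.txt";
    assert(FindFilename(bare) == "plain.txt");
}
```

Calling it with a temporary, such as `FindFilename(std::string("a/b"))`, and keeping the result would leave a dangling view, which is the lifetime rule the comment states.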

    • @cppweekly
      @cppweekly 1 month ago +1

      I disagree in one simple sense: If I receive a view, I should be allowed to return a view.
      If you don't allow them to be returned ever, then you miss out on the huge power of using string_view for creating parsers.
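The "receive a view, return a view" pattern for parsers might look roughly like this. It is a sketch rather than anything shown in the episode: `next_token` and `demo_next_token` are hypothetical names, and splitting on single spaces is an assumed, deliberately simple grammar.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical tokenizer: removes the next space-delimited token from the front
// of `input` and returns it. Both the argument and the result are views into the
// same caller-owned buffer, so nothing is copied or allocated.
std::string_view next_token(std::string_view& input) {
    const auto start = input.find_first_not_of(' ');
    if (start == std::string_view::npos) {
        input = std::string_view{};  // only spaces (or nothing) left
        return std::string_view{};
    }
    input.remove_prefix(start);
    const std::string_view token = input.substr(0, input.find(' '));
    input.remove_prefix(token.size());
    return token;
}

void demo_next_token() {
    const std::string source = "  let x = 42";  // owns the characters
    std::string_view rest = source;
    assert(next_token(rest) == "let");
    assert(next_token(rest) == "x");
    assert(next_token(rest) == "=");
    assert(next_token(rest) == "42");
    assert(next_token(rest).empty());  // input exhausted
}
```

The returned tokens never outlive anything the caller did not already have, since they point into the buffer the caller passed in.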

  • @piggy8435
    @piggy8435 1 month ago +3

    At 2:10, why does creating a string do a heap allocation? I thought the string literal itself is stored in some static memory and the string object is a stack allocation to that part of memory. Not sure myself

    • @samuelpolacek6762
      @samuelpolacek6762 1 month ago +6

      The string literal is copied into the allocated memory owned by the string. See basic_string constructors on cppreference: "Constructs the string with the contents initialized with a copy of the null-terminated character string pointed to by `s`"

    • @CyberDork34
      @CyberDork34 1 month ago +5

      It's a heap allocation because you want to be able to edit the string, append new text, and so on

    • @yntfwyk
      @yntfwyk 1 month ago +3

      It is not always a heap allocation, because the library implementers have added SSO (small string optimization): if your string is smaller than the SSO buffer, the contents get copied into this buffer. If not, then it is a heap allocation. Different standard library implementations may have different SSO buffer sizes, though.

    • @yokozombie
      @yokozombie 1 month ago +1

      the correct answer above is "because it has to be editable"; everything else is implementation details

    • @X_Baron
      @X_Baron 1 month ago +1

      Wikipedia says: "The string class provided by the C++ standard library was specifically designed to allow copy-on-write implementations in the initial C++98 standard, but not in the newer C++11 standard". Several other libraries provide copy-on-write string classes, so maybe some of them will skip allocation completely if possible.
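The SSO point in this thread can be probed directly. The check below is a sketch, not a portable guarantee: `stored_inline` and `demo_stored_inline` are hypothetical names, and the result for short strings depends on the standard library in use.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Reports whether the character buffer of `s` lies inside the std::string object
// itself, which is where the small string optimization keeps short contents.
bool stored_inline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto buffer = reinterpret_cast<std::uintptr_t>(s.data());
    return buffer >= object_begin && buffer < object_end;
}

void demo_stored_inline() {
    // Far too long to fit inside the object, so the buffer must live elsewhere.
    const std::string long_text(sizeof(std::string) * 4, 'x');
    assert(!stored_inline(long_text));

    // Typically stored inline on mainstream standard libraries, but the standard
    // does not require SSO, so the result is deliberately not asserted.
    const std::string short_text = "hi";
    (void)stored_inline(short_text);
}
```

An inline buffer is also why moving a short string can invalidate a string_view: the characters live in the object, so a new object means a new address.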

  • @TsvetanDimitrov1976
    @TsvetanDimitrov1976 1 month ago +1

    Implicit conversions are just evil, I get it that's more comfortable, but it's just not worth it

  • @GreenJalapenjo
    @GreenJalapenjo 1 month ago +2

    The advantages of string_view over C strings are already pretty weak (and the downsides of string_view significant). If casting from std::string to std::string_view were more verbose than a cast from std::string to a C string, I would certainly be using const char* as my go-to "reference to string-like data" type rather than string_view.

  • @anon_y_mousse
    @anon_y_mousse 1 month ago

    Once again, I'm going to have to disagree with you regarding implicit conversions. The problem is not that they occur, but how and when they occur, as well as the additional problem here of how string_view and string are implemented in the general case. They'll never change them in the standard to make more sense, but it was always a mistake that string wasn't already a view of a string, allowing the underlying data to actually COW. The other mistake they made in designing C++ (not to imply there's only two, but rather you're only demonstrating two in the video) was not allowing the caller to take ownership of returned values automatically and have the caller initiate destruction when it leaves scope.
    I'm sure most in the comments will disagree with me, but I think that if the default string type in a given language were a view of a string and made allowances for copying it only when it came from a mismatched source or when writing to it, it would be a better situation overall, especially considering how frequently string literals are used. It's really not as difficult for a compiler to track a source as it is made to be in C++ if you have a proper module system. This invariably means that closed source libraries will always be copied from, as the compiler can't guess what's being done, but even then we could attempt cooperation with hinting.

  • @TheMrKeksLp
    @TheMrKeksLp 1 month ago +5

    Something something Rust