Rethinking Java String Concatenation

  • Published 15 Sep 2024
  • Presented by Claes Redestad - Principal Member of Technical Staff (Java Platform Group - Oracle) during the JVM Language Summit (August 2024 - Santa Clara, CA).
    String concatenation got a big overhaul in JDK 9 with JEP 280, bringing throughput improvements at the cost of startup, warmup and footprint overheads. This video goes through the current implementation, discusses some past efforts at getting things up to speed, and covers a few workarounds we’ve recently had to implement to avoid resource starvation issues when expressions get too complex. It then outlines a prototype which stays true to the spirit of JEP 280 while solving most of these edge issues, all while improving significantly over the current baseline on startup, footprint and warmup metrics.
    Make sure to check the JVM Language Summit 2024 playlist.
    Resources
    • Indify String Concatenation ➤ openjdk.org/je...
    • String Templates (Preview) ➤ openjdk.org/je...
    • Class-File API (2nd Preview) ➤ openjdk.org/je...
    • JDK23 String Concat Fix ➤ bugs.openjdk.o...
    • String Concatenation, Redux ➤ cl4es.github.i...
    • Project Leyden ➤ jdk.java.net/l...
    • PR - Consolidate and streamline String concat code shapes ➤ github.com/ope...
    • PR - Efficient hidden class-based string concatenation strategy ➤ github.com/ope...
    • Inside Java ➤ inside.java/
    • Dev.Java ➤ dev.java
    • JVMLS ➤ openjdk.org/pr...
    Tags #Java #OpenJDK
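
    Editor's sketch: since JEP 280 (JDK 9), javac compiles a `+` expression on strings into an invokedynamic call site bootstrapped by java.lang.invoke.StringConcatFactory instead of a StringBuilder chain. The example below links such a call site by hand to show the moving parts. The class name IndyConcatDemo and the method greet are hypothetical, and this is only an approximation of what the compiler and runtime do for you, not the implementation discussed in the talk.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Hypothetical demo class: hand-links the kind of call site javac emits for
//   "Hello, " + name + "! Age: " + age
final class IndyConcatDemo {

    // Linked once, like a real invokedynamic call site.
    private static final MethodHandle CONCAT = link();

    private static MethodHandle link() {
        try {
            // The call site takes (String, int) and returns String.
            MethodType type = MethodType.methodType(String.class, String.class, int.class);
            // In the recipe, each \u0001 character marks where the next argument goes;
            // the literal text around it is baked into the call site.
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(), "concat", type, "Hello, \u0001! Age: \u0001");
            return site.dynamicInvoker();
        } catch (Throwable t) {
            throw new ExceptionInInitializerError(t);
        }
    }

    static String greet(String name, int age) {
        try {
            return (String) CONCAT.invokeExact(name, age);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(greet("Duke", 29));
    }
}
```

    The linking step is where the startup, warmup and footprint costs mentioned above come from: each distinct expression shape needs its own linked call site at run time.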

COMMENTS • 27

  • @norbu_la
    @norbu_la 28 days ago +12

    Skipping through this, I didn't realize I know nothing about the simplest things

  • @mjduigou
    @mjduigou 27 days ago +9

    I am curious if you have a histogram of arity usage. If it were me, I would be fine with there being a performance cliff at some point, perhaps at two standard deviations above the mean arity. The cliff serves as a warning to app developers that they should probably be using a different approach. There has to be a limit to how much effort is spent on making bad code perform well; put that effort into making the good and average code even faster.

    • @ClaesRedestad
      @ClaesRedestad 27 days ago +4

      I don't have any histograms to share, but most apps I've looked at have a lot of low-arity concats then a long tail. But then there are those apps that generate massive concat expressions on the fly and skew the picture. 😅
      I think the now integrated implementation strikes a good balance. Small expressions inline and optimize *very* well, then as things get larger regular inlining heuristics in the JIT will gracefully degrade performance without ever really falling off a cliff. More a gentle slope.

  • @andmal8
    @andmal8 22 days ago +1

    Thank you!

  • @dispatch-indirect9206
    @dispatch-indirect9206 25 days ago +2

    Interesting talk, thanks. If I'm generating Java code that needs to target JVMs before 23, and might generate a complex concatenation, what's a reasonable threshold to swap out the concatenation operator in favor of emitting a StringBuilder expression?

    • @ClaesRedestad
      @ClaesRedestad 24 days ago +2

      20? 😊 I'd suggest measuring. Appropriate thresholds might be different on some older JDK versions and on non-HotSpot VMs, so making it configurable is probably good.

    • @dispatch-indirect9206
      @dispatch-indirect9206 22 days ago

      @ClaesRedestad Thanks, will definitely measure, but my guess was pretty close and it's nice to confirm you're seeing the blowup about where I thought it might be.
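
Editor's sketch of the approach discussed in this thread: a code generator that emits a `+` expression for small arities and switches to a StringBuilder chain above a configurable threshold. The class ConcatEmitter and its parameter maxOperandsForPlus are hypothetical names. The value 20 is only the tongue-in-cheek starting point from the reply above and should be measured per target JDK. Operands are assumed to be simple or already parenthesized source expressions.

```java
import java.util.List;

// Hypothetical helper for a Java source generator.
final class ConcatEmitter {

    // Starting point taken from the thread; measure and tune per target JVM.
    static final int DEFAULT_MAX_OPERANDS_FOR_PLUS = 20;

    private final int maxOperandsForPlus;

    ConcatEmitter(int maxOperandsForPlus) {
        this.maxOperandsForPlus = maxOperandsForPlus;
    }

    // Returns Java source text that concatenates the given operand expressions.
    String emit(List<String> operands) {
        if (operands.isEmpty()) {
            return "\"\"";
        }
        if (operands.size() <= maxOperandsForPlus) {
            // The leading "" forces string concatenation even if the first
            // operands are numeric expressions.
            return "\"\" + " + String.join(" + ", operands);
        }
        // Large arity: emit an explicit StringBuilder chain instead.
        StringBuilder src = new StringBuilder("new StringBuilder()");
        for (String operand : operands) {
            src.append(".append(").append(operand).append(')');
        }
        return src.append(".toString()").toString();
    }
}
```

With a threshold of 2, `emit(List.of("a", "b"))` yields `"" + a + b`, while three operands yield `new StringBuilder().append(a).append(b).append(c).toString()`. The sketch does not handle operands whose StringBuilder.append overload behaves differently from `+`, such as char[] values or a bare null literal.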

  • @yaderanibal
    @yaderanibal 28 days ago +4

    I'm a fan of this language, it's my favorite.

  • @hiEroneta
    @hiEroneta 28 days ago +4

    Hope something like string interpolation comes to Java soon.

    • @alecbg919
      @alecbg919 28 days ago +1

      Look up string templates in Java 21.

    • @_SG_1
      @_SG_1 28 days ago

      I assume that String interpolation would use this "low-level" String concatenation under the hood anyway.

    • @ClaesRedestad
      @ClaesRedestad 28 days ago +6

      @_SG_1 Yes, this was briefly mentioned near the end of this talk. While the templates feature has been pulled out for now, if/when we redo it, it will benefit from the work we presented here, increasing confidence in the runtime side of it.

  • @hkupty
    @hkupty 28 days ago +9

    One of the things I miss in Java is the ability to get immutable views of a string that don't incur an allocation. If I take a substring of an already immutable string, why does it have to allocate? In this sense, Rust's mutable and immutable string types might be a good inspiration, though arguably that would require a better API for Java.

    • @sebastianb7496
      @sebastianb7496 28 days ago +7

      Allocation by the JVM is surprisingly fast already and doing substring this way would probably prevent garbage collection of the original string, so idk...

    • @vasiliigulevich9202
      @vasiliigulevich9202 28 days ago +4

      The substring method used to share the original String's backing array. This caused memory leaks, because a small substring kept the whole original's data reachable. This was changed in JDK 7 (update 6). Basically, the behavior you describe is a bug that was fixed, not a desired behaviour.
      Search "substring memory leak".

    • @hkupty
      @hkupty 28 days ago +1

      @vasiliigulevich9202 Funny how what you're arguing is a bug is a language feature in other languages. This is a false equivalence: the bug in the previous implementation isn't necessarily the only solution to the problem, so you shouldn't consider the problem "a bug to be fixed".
      I'm advocating for a string view, which could be a different class, instead of reusing the same class with an added responsibility.

    • @hkupty
      @hkupty 28 days ago +2

      @sebastianb7496 It really depends on what you're doing and the order of magnitude we're talking about. Operations that happen tens or hundreds of times per request can significantly impact request handling through tiny allocations and increased GC pressure.

    • @vasiliigulevich9202
      @vasiliigulevich9202 28 days ago +2

      @hkupty A string view has the same problem if it holds a strong reference. Arguably, the language could intercept the garbage collection event and make a copy of the substring whenever the original is collected, but that would degrade GC performance. Languages with strict lifetime control can afford non-owning references at the cost of dangerous or complex lifetime management.
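
Editor's sketch of the "string view as a separate class" idea from this thread. StringView is a hypothetical class, not a JDK API. It avoids copying characters but, as noted in the reply above, holds a strong reference that keeps the whole source String reachable for as long as the view lives. The view object itself is still a small allocation unless the JIT manages to eliminate it.

```java
import java.util.Objects;

// Hypothetical read-only view over a range of a String; no characters are copied.
final class StringView implements CharSequence {
    private final String source; // strong reference: keeps the whole source alive
    private final int start;     // inclusive
    private final int end;       // exclusive

    StringView(String source, int start, int end) {
        Objects.checkFromToIndex(start, end, source.length());
        this.source = source;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        Objects.checkIndex(index, length());
        return source.charAt(start + index);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        Objects.checkFromToIndex(from, to, length());
        // A narrower view over the same source, still without copying.
        return new StringView(source, start + from, start + to);
    }

    @Override
    public String toString() {
        // Copies only when an actual String is requested.
        return source.substring(start, end);
    }
}
```

For example, `new StringView("hello world", 6, 11)` exposes the characters of "world" without copying them. The JDK's `CharBuffer.wrap(CharSequence, int, int)` offers a comparable read-only view, with the same retention trade-off.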

  • @mchiareli
    @mchiareli 28 days ago +5

    We want String interpolation like every other cool language

  • @VuLinhAssassin
    @VuLinhAssassin 27 days ago

    After years, String is still a pain to work with in Java (not counting text block)

    • @zappini
      @zappini 23 days ago

      Which languages are better?

    • @VuLinhAssassin
      @VuLinhAssassin 21 days ago

      @zappini The interpolation in .NET looks amazing

  • @cmyanmar13
    @cmyanmar13 28 days ago +8

    I hated JEP 280 from the moment I heard about it. And now I know who to blame for it. It's SO OBVIOUSLY an absolutely grotesque abuse of dynamic class generation, SO OBVIOUSLY doomed to bog down startup and waste memory and clog the code cache. It has no redeeming value whatsoever and should be rolled back as the bug it is. Invokedynamic is the hammer that makes everything look like a nail. It prompted me to MANUALLY use StringBuilder more often, knowing that trusting `+` for concatenation is going to suck. Oh what's that, they had to install a workaround in the VM to use StringBuilder after all? No surprise. PATHETIC.

    • @ClaesRedestad
      @ClaesRedestad 28 days ago +16

      Ouch! I had (almost) nothing to do with JEP 280 initially, but have picked it up and worked on the implementation after it got integrated, reducing the footprint and runtime impact over the releases. I admit that on a number of occasions I've thought, and probably argued internally, that we should roll this back entirely or make it opt-in. Alas. Still, a number of indirect benefits have come out of persevering, including a great number of optimizations in several related and unrelated areas.
      The recent StringBuilder workaround in JDK 23 is ugly, yes, and prompted the prototype rework that this talk concludes with. (Now integrated in the main OpenJDK repo and slated for JDK 24.)
      On the flip side, we now also have a very straightforward path to link- and assembly-time static code generation that will allow pre-generating any string concats while safely getting the peak performance benefits.

  • @Sir_Ray_LegStrong_Bongabong
    @Sir_Ray_LegStrong_Bongabong 28 days ago

    Hello