"A JVM threading model for the containerized times" by Luiz Hespanha and Flavio Brasil

  • Published Oct 11, 2023
  • The threading model of JVM applications has become a common source of instability and inefficiency in containerized environments. In a company like Nubank, one of the largest fintechs, with a microservices architecture comprising over 1500 services, manually tuning the ideal number of threads becomes both daunting and risky.
    This presentation introduces a novel approach that addresses these issues by implementing a dynamic control loop and fine-grained load-shedding mechanism. The solution continuously adjusts the number of threads during application execution, utilizing real-time signals such as CPU throttling, CPU usage, memory usage, and runtime pressure. Meanwhile, based on a configurable maximum queuing delay, our load-shedding mechanism ensures that the application remains functional under stress by only rejecting work exceeding its capacity.
    This comprehensive solution has significantly improved the stability and performance of our applications while reducing associated costs.
    Luiz Hespanha
    Senior Staff Software Engineer @ Nubank
    @luiz_hespanha
    Luiz Hespanha is a Systems Performance engineer at Nubank, the most influential Latin American fintech. Working in the area for more than 20 years, he's been building microservices since before they were called that and has also built useful large-scale systems using RESTful principles and messaging-centric architectures.
    Flavio Brasil
    Principal Engineer
    @fbrasisil
    Flavio is a seasoned engineer with a specialization in high-performance solutions for functional programming on the JVM. Throughout his career, he has navigated a wide range of technologies, spanning from intricate JIT compilers to sophisticated libraries for functional programming. Driven by his passion for open source, Flavio has contributed to numerous projects such as Quill, a prominent library for seamless database access in Scala, and Finagle, the robust library at the heart of Twitter's scalable platform.
    ----
    Recorded Sept 21, 2023 at Strange Loop 2023 in St. Louis, MO.
    thestrangeloop.com
  • Science & Technology
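
The load-shedding idea in the description (reject only the work that would exceed a configurable maximum queuing delay) can be illustrated with a minimal Java sketch. This is not the speakers' implementation: the class `DelayBoundedQueue`, its methods, and the choice to pass timestamps in explicitly are all assumptions made for illustration. Work is stamped on arrival, and anything that has waited longer than the budget is dropped when a worker asks for the next task.

```java
import java.util.ArrayDeque;
import java.util.Optional;

/** Hypothetical sketch: a FIFO queue that sheds tasks that waited longer than a maximum delay. */
class DelayBoundedQueue<T> {
    /** A task together with the time it entered the queue. */
    private static final class Entry<T> {
        final T task;
        final long enqueuedNanos;
        Entry(T task, long enqueuedNanos) {
            this.task = task;
            this.enqueuedNanos = enqueuedNanos;
        }
    }

    private final ArrayDeque<Entry<T>> queue = new ArrayDeque<>();
    private final long maxDelayNanos;
    private long rejected = 0;

    DelayBoundedQueue(long maxDelayNanos) {
        this.maxDelayNanos = maxDelayNanos;
    }

    /** Enqueue a task, recording its arrival time (e.g. System.nanoTime()). */
    synchronized void offer(T task, long nowNanos) {
        queue.addLast(new Entry<>(task, nowNanos));
    }

    /**
     * Return the oldest task that is still within the delay budget.
     * Tasks that have waited too long are dropped and counted as rejected.
     */
    synchronized Optional<T> poll(long nowNanos) {
        Entry<T> e;
        while ((e = queue.pollFirst()) != null) {
            if (nowNanos - e.enqueuedNanos <= maxDelayNanos) {
                return Optional.of(e.task);
            }
            rejected++; // shed: this task exceeded the maximum queuing delay
        }
        return Optional.empty();
    }

    synchronized long rejectedCount() {
        return rejected;
    }
}
```

In a real service the caller would pass `System.nanoTime()` and would answer each shed request with an explicit overload error; taking the clock as a parameter here only keeps the sketch easy to test.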

COMMENTS • 4

  • @mikeswierczek 6 months ago +2

    That's amazing. I'm especially impressed by how Nauvoo throttles based on multiple metrics together, not just CPU usage, memory usage, or number of threads.
    One part I'd like to understand better is how systems reject work when they're overloaded. Does everything that calls into the services using Nauvoo have some sort of automatic monitor-and-retry mechanism for failed requests, or do they have to engineer specific rejection handling based on the request type?
    Also, I love Clojure and the JVM, but I wonder whether Nauvoo's metric collection and throttling logic is itself resource-intensive enough that it might need to be written in C/C++/Rust/etc. and accessed through the JVM foreign function interface.

  • @dmg46664 7 months ago +2

    Interesting

  • @runderwo 7 months ago

    Don't recent JVMs solve this by allowing the number of GC threads to be limited by config flags (e.g. to the number of cores reserved to the pod)?

    • @jayvkman 6 months ago +3

      Containers, processes, JVMs, and executors can all be limited statically, but that may only work if you have a homogeneous workload, so this is an interesting way to monitor the liveliness of the system and use that as a hint to step on the gas or the brakes for more dynamic, adaptive scaling.
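
The contrast drawn in this thread between static limits and a dynamic "gas/brakes" adjustment could look roughly like the following Java sketch. It is an illustration only, not the mechanism presented in the talk: `PoolSizeController`, its thresholds, and the one-thread step size are hypothetical, and the throttling and CPU readings are assumed to come from some external sampler (for example the container's cgroup statistics). The only real API used is `java.util.concurrent.ThreadPoolExecutor`.

```java
import java.util.concurrent.ThreadPoolExecutor;

/** Hypothetical sketch of one control-loop step that resizes a thread pool from runtime signals. */
class PoolSizeController {
    private final int minThreads;        // must be >= 1
    private final int maxThreads;
    private final double throttleLimit;  // tolerated fraction of throttled CPU periods
    private final double cpuTarget;      // grow only while CPU usage is below this fraction

    PoolSizeController(int minThreads, int maxThreads, double throttleLimit, double cpuTarget) {
        this.minThreads = minThreads;
        this.maxThreads = maxThreads;
        this.throttleLimit = throttleLimit;
        this.cpuTarget = cpuTarget;
    }

    /** Decide the next pool size: brake when throttled, accelerate when there is CPU headroom. */
    int nextSize(int current, double throttledFraction, double cpuUsage) {
        if (throttledFraction > throttleLimit) {
            return Math.max(minThreads, current - 1);
        }
        if (cpuUsage < cpuTarget) {
            return Math.min(maxThreads, current + 1);
        }
        return current;
    }

    /** Apply the decision to a fixed-size pool (core size == maximum size). */
    void apply(ThreadPoolExecutor pool, double throttledFraction, double cpuUsage) {
        int target = nextSize(pool.getCorePoolSize(), throttledFraction, cpuUsage);
        if (target > pool.getMaximumPoolSize()) {
            // Growing: raise the maximum first so the core size never exceeds it.
            pool.setMaximumPoolSize(target);
            pool.setCorePoolSize(target);
        } else {
            // Shrinking or unchanged: lower the core size first, then the maximum.
            pool.setCorePoolSize(target);
            pool.setMaximumPoolSize(target);
        }
    }
}
```

A scheduler would call `apply` periodically with fresh samples. Changing the size by one thread per step is a deliberately conservative choice for the sketch, since it limits oscillation at the cost of slower reaction.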