Mastering Your Microservices With Observability

  • Published 7 May 2024
  • It's hard to argue against observability being a good thing. However, when you are working with a microservices architecture or another distributed system, it can prove extremely challenging to monitor or observe. Observability is often left as an afterthought in many organisations, leading to much more complicated problems in the future.
    In this episode, Dave Farley explains what observability is and more specifically, how to apply it to microservices.
    -
    📄 FREE MICROSERVICES HOW TO GUIDE:
    Advice to help you get started and focus on the right parts of the problem when you are creating a new microservices system. Includes tips on Design and Messaging. DOWNLOAD FOR FREE HERE ➡️ www.subscribepage.com/microse...
    -
    ⭐ PATREON:
    Join the Continuous Delivery community and access extra perks & content! ➡️ bit.ly/ContinuousDeliveryPatreon
    -
    🗣️ THE ENGINEERING ROOM PODCAST:
    Apple - apple.co/43s2e0h
    Spotify - spoti.fi/3VqZVIV
    Amazon - amzn.to/43nkkRl
    Audible - bit.ly/TERaudible
    -
    👕 T-SHIRTS:
    A fan of the T-shirts I wear in my videos? Grab your own, at reduced prices EXCLUSIVE TO CONTINUOUS DELIVERY FOLLOWERS! Get money off the already reasonably priced t-shirts!
    🔗 Check out their collection HERE: ➡️ bit.ly/3Uby9iA
    🚨 DON'T FORGET TO USE THIS DISCOUNT CODE: ContinuousDelivery
    -
    🔗 LINKS:
    "Observability Defined" ➡️ en.wikipedia.org/wiki/Observa...)
    🔗 "Microservices Observability Patterns" ➡️ lumigo.io/microservices-monit...
    🔗 "Observability and Introduction" ➡️ www.splunk.com/en_us/blog/lea...
    🔗 "How, When & What to Measure" ➡️ / microservices-observab...
    -
    BOOKS:
    📖 Dave’s NEW BOOK "Modern Software Engineering" is available as paperback, or kindle here ➡️ amzn.to/3DwdwT3
    and NOW as an AUDIOBOOK available on iTunes, Amazon and Audible.
    📖 The original, award-winning "Continuous Delivery" book by Dave Farley and Jez Humble ➡️ amzn.to/2WxRYmx
    📖 "Continuous Delivery Pipelines" by Dave Farley
    Paperback ➡️ amzn.to/3gIULlA
    ebook version ➡️ leanpub.com/cd-pipelines
    NOTE: If you click on one of the Amazon Affiliate links and buy the book, Continuous Delivery Ltd. will get a small fee for the recommendation with NO increase in cost to you.
    -
    CHANNEL SPONSORS:
    Equal Experts is a product software development consultancy with a network of over 1,000 experienced technology consultants globally. They increase the pace of innovation by using modern software engineering practices that embrace Continuous Delivery, Security, and Operability from the outset ➡️ bit.ly/3ASy8n0
    TransFICC provides low-latency connectivity, automated trading workflows and e-trading systems for Fixed Income and Derivatives. TransFICC resolves the issue of market fragmentation by providing banks and asset managers with a unified low-latency, robust and scalable API, which provides connectivity to multiple trading venues while supporting numerous complex workflows across asset classes such as Rates and Credit Bonds, Repos, Mortgage-Backed Securities and Interest Rate Swaps ➡️ transficc.com
    Semaphore is a CI/CD platform that allows you to confidently and quickly ship quality code. Trusted by leading global engineering teams at Confluent, BetterUp, and Indeed, Semaphore sets new benchmarks in technological productivity and excellence. Find out more ➡️ bit.ly/CDSemaphore
    #softwareengineer #developer #microservices
  • Science & Technology

COMMENTS • 26

  • @ContinuousDelivery
    @ContinuousDelivery  23 days ago +1

    📄 FREE MICROSERVICES HOW TO GUIDE: Advice to help you get started and focus on the right parts of the problem when you are creating a new microservices system. Includes tips on Design and Messaging. DOWNLOAD FOR FREE HERE ➡ www.subscribepage.com/microservices-guide

  • @joyfulprogramming
    @joyfulprogramming 17 days ago

    Fantastic video! Thanks for covering this, Dave. Observability is so important, and I've seen the benefits of it in production - before and after.

  • @jasondbaker
    @jasondbaker 23 days ago +3

    Dave is absolutely right that observability is often left as an afterthought in many organizations. I attribute this to a few reasons. First, many organizations take the approach that building out observability is everyone's responsibility, but because there's no one individual or team providing strategic direction, it fails to make much headway. Second, I rarely see observability requirements incorporated into developer stories or the "definition of done". It's common to see companies launch new services into production with little or no observability in place; they might circle back later and add monitoring after an unplanned service outage. Finally, commercial observability tools are getting quite complex, and many companies lack a training budget for their engineers. I can't tell you how many times I've walked into organizations and found that they were on their 3rd observability platform in the past 5 years and had only completed about 10% of the setup. Every year or two they decide that their current observability platform isn't providing any value, so they go looking for a new one. Rinse and repeat.

    • @retagainez
      @retagainez 22 days ago +1

      Anecdotal, but I agree. I've worked in a system where observability of things like resources and infrastructure health was there, but we wouldn't necessarily have anything like a "trace" to debug issues or "metrics" to track customer usage. For such a complex system, where we needed to ship new features quickly, it was odd to see such a lack of business intelligence.
      The observability/monitoring was most certainly added AFTER an outage; that is how I became familiar with the tools that observed the production system.

    • @TheEvertw
      @TheEvertw 18 days ago

      Observability needs to be designed into the system by the architect, who needs to take responsibility for ensuring it is implemented. You need observability to test and maintain a system, so it DEFINITELY should be in the list of key stories for a new system. "As a developer, I want to know what is happening inside the system in order to debug it" -- and take it from there.

    • @TheEvertw
      @TheEvertw 18 days ago

      An introspection / monitoring system is usually over-engineered. Learn from the UNIX principles: you cannot predict at design time how you will want to use the data from that system. So let it generate a text-based stream, and make it interactive so that you can tell it what you want to see in the output -- like the log level. Then you can design your heart out for any storage, post-processing, logging or visualisation backend you may want.
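      A minimal sketch of that principle in Python (the service name and the LOG_LEVEL variable are illustrative assumptions, not anything from the video): the service writes a plain text stream to stdout, honours a single log-level knob, and leaves storage and visualisation to whatever consumes the stream.

      import logging
      import os
      import sys

      def make_stream_logger(name):
          # The only interactive knob: tell the stream what you want to see.
          level = os.environ.get("LOG_LEVEL", "INFO").upper()
          handler = logging.StreamHandler(sys.stdout)
          handler.setFormatter(logging.Formatter(
              "%(asctime)s %(levelname)s %(name)s %(message)s"))
          logger = logging.getLogger(name)
          logger.addHandler(handler)
          logger.setLevel(level)
          return logger

      log = make_stream_logger("payments")
      log.debug("only emitted when LOG_LEVEL=DEBUG")
      log.info("order accepted id=42")  # any backend can parse this line later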

    • @retagainez
      @retagainez 18 days ago

      @@TheEvertw In my experience, the monitoring system was heavily under-engineered.
      There isn't any need for deep insight, but a service should be able to state its dependencies, every interaction it has with them, and the chain of errors that occurs downstream in other services. I think most of the time that sort of in-depth insight lives in the current test suite rather than in observability tools. The tools don't need to dive much deeper than tracking performance, stability, or user interaction.
      Rarely did I ever need to tie a specific log line to a line of code. The real issue was tracking down the origin of an error: which service was producing it, and the circumstances needed to reproduce it.
      I agree that observability should be designed and engineered into microservices. But first and foremost, make sure your microservices are as close as possible to being "micro" services. After all, observing is how you attribute your work and show the benefits it has reaped from and for your customers.

    • @retagainez
      @retagainez 18 days ago +1

      @@TheEvertw Also, I would further agree that designing observability in is a bit more involved than simply adding it to production. The same benefits were often sorely needed in testing/staging environments, where devs had no clue what was broken and why. Adding monitors retrospectively is fine for keeping customers from getting angry at your issues, but it misses the point of adding monitors: thinking ahead about how the overall system should function and what one should expect from it.

  • @mrpocock
    @mrpocock 23 days ago +2

    Ordering rarely, if ever, requires timestamps. You can often use a causal graph, so that later events reference some ID that came from the event/transaction that triggered them. Events that are input to a transaction are "before" in the causal graph, and events triggered by the transaction are "after". You don't always need distinct transaction log entries if you don't have things that are triggered by multiple events. (See the sketch below.)
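    A minimal sketch of that causal-graph idea in Python, with illustrative event names: each event carries the IDs of the events that triggered it, and a topological sort recovers a valid "happened-before" order without any clock.

    from dataclasses import dataclass, field
    from graphlib import TopologicalSorter

    @dataclass
    class Event:
        event_id: str
        caused_by: list = field(default_factory=list)  # IDs of triggering events

    events = [
        Event("order-placed"),
        Event("payment-taken", caused_by=["order-placed"]),
        Event("stock-reserved", caused_by=["order-placed"]),
        Event("order-shipped", caused_by=["payment-taken", "stock-reserved"]),
    ]

    # Map each event to its causal predecessors and sort: causes come first.
    graph = {e.event_id: set(e.caused_by) for e in events}
    print(list(TopologicalSorter(graph).static_order()))
    # e.g. ['order-placed', 'payment-taken', 'stock-reserved', 'order-shipped']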

    • @karsh001
      @karsh001 22 days ago +1

      You can use vector clocks as well; they work in many distributed systems. (Sketch below.)
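      A minimal vector-clock sketch in Python (illustrative code, not any particular library): each node increments its own counter on a local event and merges counters on message receipt; comparing two clocks yields "before", "after", or "concurrent".

      def tick(clock, node):
          # Local event: increment this node's own slot.
          c = dict(clock)
          c[node] = c.get(node, 0) + 1
          return c

      def merge(local, received, node):
          # Message receipt: element-wise max of both clocks, then tick.
          keys = set(local) | set(received)
          return tick({k: max(local.get(k, 0), received.get(k, 0)) for k in keys}, node)

      def happened_before(a, b):
          keys = set(a) | set(b)
          return a != b and all(a.get(k, 0) <= b.get(k, 0) for k in keys)

      a = tick({}, "A")             # {'A': 1}
      b = merge({}, a, "B")         # B saw A's event: {'A': 1, 'B': 1}
      print(happened_before(a, b))  # True
      print(happened_before(b, a))  # False; False both ways means concurrent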

    • @ulrichborchers5632
      @ulrichborchers5632 21 days ago +3

      I do not want to say that this can be a smell of a distributed monolith :-) And I guess it is a good approach to give a unique id to every event and to include it in every log output, whenever an event is processed.
      But with the suggested approach alone, you have to make sure that all services, without any exception, are event-based, and in a way that gives them a system-wide common structure, at least with respect to such an ID. By declaring time obsolete for understanding the sequence of events, you are imposing a hard requirement: all events across the whole system must have an ID which is unique across all the different systems, different services and different instances of the same service.
      But can you really access that event id everywhere logging happens? First of all, this would mean that all services must be designed to have internal knowledge of events, and even of a basic data structure for them (which violates encapsulation).
      Second, independently of the architectural style of microservices, all "lower-level" code must then have internal knowledge of events. This is a coupling which you would normally want to avoid. Or at least the ID of the processed event must be injected into log output fully transparently (and there must always be an event). But what if you are using libraries which have no knowledge of the fact that they are executed in an event-based system?
      So this means that you are introducing a global design constraint on all services and even on "implementation details", which can contradict the basic idea of slicing microservices in the first place: to reduce coupling.
      On top of that, you have to create a large tree of events (which you are calling a graph), and you then depend on a tool for analyzing your log data which must be capable of this. Does it even exist? And does a logged "event" always depend on a single other event? And is there always a dependency on another event which originated somewhere in the system? And what if an event has to be processed again for technical reasons ... does it keep its identity, or is it re-created with a different id?
      So when wanting to rely on such IDs alone, you have a very hard architectural and system-wide design constraint, which is even based on the assumption of a deterministic system, while distributed systems tend to be non-deterministic by nature.
      And you are still missing context data, which is the log output from pieces of the system that do not have access to the current event:
      Let me give you an example. I once faced a defect where a buffer overflow in a driver happened in a web server child process which was being re-used across different requests. The "event" which triggered the problem and the observable defect from a different request were not functionally related at all. What helped me correlate those things to understand and fix the problem were two things, fortunately included in different log files: time and the process id.
      So while having a globally unique event id and including it in log output whenever possible is a very good thing to do ... it only takes into account the "known unknowns", covering only the happy path of expected failure, but not the "unknown unknowns" which are often the root cause of an unexpected situation.
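      For what it's worth, the "fully transparent injection" mentioned above can be sketched in Python without the lower-level code knowing about events, using a context variable plus a logging filter (names are illustrative, and this is only one possible shape):

      import contextvars
      import logging

      current_event_id = contextvars.ContextVar("event_id", default="-")

      class EventIdFilter(logging.Filter):
          def filter(self, record):
              # Attach the ambient event ID; callers never pass it explicitly.
              record.event_id = current_event_id.get()
              return True

      handler = logging.StreamHandler()
      handler.addFilter(EventIdFilter())
      handler.setFormatter(logging.Formatter("%(asctime)s [%(event_id)s] %(message)s"))
      root = logging.getLogger()
      root.addHandler(handler)
      root.setLevel(logging.INFO)

      def some_library_code():
          logging.getLogger("library").info("doing work")  # no event knowledge here

      def handle_event(event_id):
          token = current_event_id.set(event_id)  # set once at the event boundary
          try:
              some_library_code()
          finally:
              current_event_id.reset(token)

      handle_event("evt-123")  # logs: ... [evt-123] doing work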

    • @mrpocock
      @mrpocock 21 days ago +1

      @@ulrichborchers5632 Thanks, that was interesting. I'm not convinced IDs are quite as horrible as you lay out, but they certainly address issues along the axis of the cascading shrapnel of work from an original triggering event, and not so much along the axis of all the things that happened to service x, which is what you needed for your buffer overflow.

    • @ulrichborchers5632
      @ulrichborchers5632 18 days ago +1

      @@mrpocock Thanks for that reply. Logging (event) IDs is not horrible, I did not write that. On the contrary, if you have them, they belong in the log data and will provide useful context for sure! That is important advice from you.
      I just gave an example of why time can in fact be relevant for correlating log data from different sources, as described by Dave, and you seemed to opt out of that ;-) Yet the topic is not only about event-based and asynchronous systems (the mentioned flavour of microservices, for example).
      I could give more examples, so I was not referring to a single, special case there. Of course the question might arise what observability means ... if you particularly want to make the runtime behaviour of an event-based system more transparent, those event ids are obviously the way to go. Due to the asynchronous nature of such systems, time can even be misleading, of course, depending on what you want to query your log data for.
      But if observability means having useful information available when needed, without always knowing in advance what will be needed, so in a more "general" sense, then you will benefit a lot from putting as much context as possible into your log data, in a structured form so you can correlate values, without crashing the log partition (avoiding duplicates on high-traffic systems, for example).
      So back to time: you can also see in this video that there is a bit more detail to logging and interpreting time entries in log data. First of all, you will want a much more fine-grained timestamp, because a unix timestamp down to the second alone is sometimes not enough. Second, for a distributed system spanning multiple physical machines, you will want to synchronize clocks, while of course this might not give enough precision for some situations, I guess (and don't forget the runtime of THAT signal :-D ).
      But I am only repeating here, and for a reason ;-) Let me give you a second example: recently we implemented a page with asynchronous events in Javascript in the UI, to update multiple snippets on the same page at once via Ajax. Each event triggered an Ajax request to fetch a partial and update the DOM.
      Unfortunately there was a race condition on the session in the backend. I will spare you most of the details about THAT here. But fortunately the backend logged a timestamp down to the microsecond, and fortunately this all happened on the same machine, so different clocks were not an issue. This allowed us to reconstruct the actual sequence of the otherwise asynchronous events, to understand which request from which route came a little bit earlier and which came next, and then to understand that one request overwrote the changes from the other one because there was a non-blocking session handler.
      So there were asynchronous events, but those were really simple and not at all part of an explicit event-based system. The timestamp did in fact help a lot to understand the sequence of events, where there was no event id after all :-)
      But the main point of this video is to collect useful log data which can be correlated to make a system observable, and it is often greatly underestimated how useful this actually is. So we might be on the same side here after all.
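      As an aside on the microsecond timestamps mentioned above: Python's default %(asctime)s only goes down to milliseconds, so a sketch like this (assuming one machine, or synchronized clocks) is one way to get finer-grained stamps for reconstructing request order:

      import logging
      from datetime import datetime, timezone

      class MicrosecondFormatter(logging.Formatter):
          def formatTime(self, record, datefmt=None):
              # record.created is a float epoch time; keep all six digits.
              ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
              return ts.isoformat(timespec="microseconds")

      handler = logging.StreamHandler()
      handler.setFormatter(MicrosecondFormatter("%(asctime)s %(message)s"))
      log = logging.getLogger("web")
      log.addHandler(handler)
      log.setLevel(logging.INFO)

      log.info("session read")   # e.g. 2024-05-07T12:00:00.000123+00:00 session read
      log.info("session write")  # order of near-simultaneous requests is now visible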

    • @mrpocock
      @mrpocock 18 days ago

      @@ulrichborchers5632 Thanks. Yes, logs are essential. I had one service where I was timestamping, and it was actually throttled by going to the system call for the time. I fixed it by reverting to a seconds clock and a counter, which was guaranteed to only count upwards within each second but could reset between seconds. Obviously completely useless for comparing times between boxes, but with documented linearity on the one box.
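      A rough sketch of that scheme in Python (my reconstruction of what the commenter describes, not his actual code): a whole-second value plus an in-process counter that is strictly increasing within each second.

      import threading
      import time

      class SecondCounterClock:
          def __init__(self):
              self._lock = threading.Lock()
              self._second = int(time.time())
              self._counter = 0

          def stamp(self):
              with self._lock:
                  now = int(time.time())  # a real version might refresh this from
                  if now != self._second: # a background thread to avoid even this
                      self._second = now  # per-call time read
                      self._counter = 0   # counter may reset between seconds
                  self._counter += 1
                  return (self._second, self._counter)

      clock = SecondCounterClock()
      print(clock.stamp())  # e.g. (1715083200, 1)
      print(clock.stamp())  # e.g. (1715083200, 2) -- ordered within the second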

  • @JorgeEscobarMX
    @JorgeEscobarMX 22 days ago

    Observability is a must that we don't really have at my job. I work as a data QA engineer.
    What tools could be used to generate dashboards fed by actual database metrics, generated by the database engine or otherwise?

    • @karsh001
      @karsh001 21 days ago +1

      Splunk, Grafana, Graylog, ... there are quite a few.
      Define your needs.
      Define your pain points.
      Refine your needs.
      Define requirements.
      Research solutions that fit your needs.
      If you have an infrastructure provider, they may have a solution you must/should use. Having a clear idea of your needs and pain points before starting the dialogue helps everyone involved.
      And never assume that everyone knows what you are talking about. Start from the beginning: what is your problem, what are you trying to achieve, and what are the scope and context (in-house development, COTS, SaaS, partner dev, and so on)?
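      To make the Grafana/Prometheus-style option above concrete, here is a hedged sketch (my illustration, not a recommendation from the thread) of one common shape: a small exporter polls the database and exposes gauges for a scraper, which a dashboard tool then charts. It assumes the third-party prometheus_client package and an "orders" table; sqlite3 stands in for a real database engine.

      import sqlite3
      import time
      from prometheus_client import Gauge, start_http_server

      row_count = Gauge("orders_total_rows", "Number of rows in the orders table")

      def poll(db_path="app.db"):
          # Any SQL that yields a number can feed a gauge.
          with sqlite3.connect(db_path) as conn:
              (n,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
          row_count.set(n)

      if __name__ == "__main__":
          start_http_server(9100)  # metrics served at http://localhost:9100/metrics
          while True:
              poll()
              time.sleep(15)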

  • @SalihGoncu
    @SalihGoncu 23 days ago

    I would like to see the results of the experimental development in the neighbouring power plant. (Nuclear or non-nuclear) 😊

  • @br3nto
    @br3nto 23 days ago

    To what extent should logs and metrics be part of our data model versus separate from it? In my mind, log files seem to represent all the things we want to know but haven't incorporated into our data persistence model. In theory, each log entry represents an actual event in our system that should match a well-defined process. Logs seem like a lazy and incomplete solution. If we instead logged these well-defined events to a database, we could query, join and filter that data using one solution, instead of using a separate log technology.
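    A small sketch of that suggestion in Python (the schema and event names are made up for illustration): well-defined events go into a table as rows, so ordinary SQL replaces a separate log-search stack.

    import json
    import sqlite3
    from datetime import datetime, timezone

    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    conn.execute("""CREATE TABLE events (
        at TEXT NOT NULL, kind TEXT NOT NULL, payload TEXT NOT NULL)""")

    def record_event(kind, **payload):
        # Each well-defined event becomes a queryable row, not a text line.
        conn.execute("INSERT INTO events VALUES (?, ?, ?)",
                     (datetime.now(timezone.utc).isoformat(), kind,
                      json.dumps(payload)))

    record_event("order_placed", order_id=42, total=99.5)
    record_event("payment_failed", order_id=42, reason="card declined")

    # The payoff: plain SQL filtering/joining instead of a log technology.
    for row in conn.execute(
            "SELECT at, kind, payload FROM events WHERE kind LIKE 'payment%'"):
        print(row)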

    • @retagainez
      @retagainez 22 days ago +2

      Well, as Dave mentioned, microservice developers might have different standards from team to team. If done separately from the data model, logging that data is most useful if it's all connected together in some form or another (correlation IDs so that you can query it), and that can require disciplined teams.
      I agree it is lazy. In my experience, you'd be able to create a partial picture from the logs and query it using something like ElasticSearch, but it was hardly ever conclusive enough, and mostly a scratch on the surface of something that needed to be reproduced further. This is THE problem, and it is solved by carefully navigating how teams are organized, along with smart solutions that provide the necessary and exhaustive set of logs/metrics/traces for any particular event.
      I'm mostly drawing on my own anecdote of working in an ElasticSearch logging system that had logs, but was still not valuable on its own without even more context and data.
      I guess one question would be: how easy would it be to add testing for the logs associated with the transactional data? Whereas if you do it separately, you might not be able to test for the existence of logs. It might just require discipline. Maybe if your business is to sell data with logs associated with that data to your customers? Otherwise I'm not sure.

    • @mrpocock
      @mrpocock 22 days ago

      I think the intuition is that you can ingest those logs into batch jobs or microservices that are a good fit for that particular use of them, e.g. for a particular dashboard widget or report.

    • @ContinuousDelivery
      @ContinuousDelivery  22 days ago +2

      I think it really depends on the rigour with which you use logs. An RDBMS is a log-based system: it is implemented by keeping and processing a "transaction log" of changes that modify the data, so as long as you collect ALL of the significant events, you can accurately replay a complete picture. We did a similar thing, in a totally different way, at LMAX, building one of the world's highest-performance financial exchanges. So logs aren't necessarily a poor tool, but we often don't use them very well.
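      A toy sketch of that replay property in Python (bank-style events are my illustration, not LMAX's design): with every state-changing event kept in order, the current state is a fold over the log, and any past state is a replay of a prefix.

      transaction_log = [
          ("deposit",  {"account": "A", "amount": 100}),
          ("withdraw", {"account": "A", "amount": 30}),
          ("deposit",  {"account": "B", "amount": 50}),
      ]

      def replay(log):
          # Fold the ordered events into the state they imply.
          balances = {}
          for kind, e in log:
              delta = e["amount"] if kind == "deposit" else -e["amount"]
              balances[e["account"]] = balances.get(e["account"], 0) + delta
          return balances

      print(replay(transaction_log))      # {'A': 70, 'B': 50}
      print(replay(transaction_log[:1]))  # state after the first event: {'A': 100}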

  • @Flobyby
    @Flobyby 23 days ago

    I'm fairly sure today's shirt is specifically not a qwertee one