That's only if you do basic software testing. In reality, there are like 15 levels/variations of testing, and one of those is throwing random inputs at it repeatedly to see what fails
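A minimal sketch of that random-input (fuzzing) idea in C; parse_dose() here is a toy stand-in for real input-handling code, not anything from the Therac-25:

```c
/* Minimal fuzz loop: hammer input-parsing code with random bytes and
   check a safety invariant on every result. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy parser: returns a dose in rads, or -1 if input is invalid. */
static int parse_dose(const char *s) {
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 0 || v > 200) return -1;
    return (int)v;
}

int main(void) {
    srand(12345); /* fixed seed so any failure is reproducible */
    for (int i = 0; i < 1000000; i++) {
        char buf[8];
        for (size_t j = 0; j + 1 < sizeof buf; j++)
            buf[j] = (char)(' ' + rand() % 95); /* printable noise */
        buf[sizeof buf - 1] = '\0';
        int dose = parse_dose(buf);
        /* Invariant: garbage is rejected or lands in the safe range. */
        assert(dose == -1 || (dose >= 0 && dose <= 200));
    }
    puts("no invariant violations in 1,000,000 random inputs");
    return 0;
}
```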
When I graduated I was offered a job as a software engineer at a biomedical company that sold medical hardware to hospitals. I didn't read too much into the details, but it was a machine built to automatically feed (on a timed interval, or when certain conditions are met, etc.) doses of medicine/substances via IV to patients. They also sold heart rate monitors etc. The pay was good and the job was very enticing, but I could not bear to accept it, precisely because of things like this that were shown in this video. I could not handle the stress. Constantly having to worry if my spaghetti code is going to end up costing someone their life (accidental overdose). Fuck that! I know there are engineers out there who write better code than me and would be better suited. I have no problem admitting that. I don't need this level of worry in my life. I am good. Something like this happening and me being responsible has to be one of my biggest nightmares as a software engineer.
You are also self-aware enough to anticipate these things happening, which alone makes you more qualified than most. Had these managers had more of that, those deaths could have been avoided. But greed clouds judgement.
By writing what you wrote, you are more qualified for that position than 99% of people working in the medical field. I used to have your attitude, but then I found a lot of programmers in medicine just have a better "fuck it" attitude than me. "Who could have known?" You, you dummy, if you did your homework.
Just a guess (I don't understand this subject), but I think it's because the X-ray mode projects a strong beam that is then "regulated". The problem was that the "regulator" was not in position.
@@shimadabr you're pretty close. To produce X-rays, the machine accelerates electrons and then crashes them into a tungsten target. The target stops the electrons and X-rays are produced. The dose rate from the X-rays is less than 1% of the dose rate from the electrons - most of the energy is lost in the target as heat. To produce electron treatments, electrons are accelerated with no target in place and deposit their energy in the patient directly. So for the same electron beam current, the X-ray dose is orders of magnitude less than the electron dose. Or, put another way, to get the same dose, the beam current must be orders of magnitude higher in X-ray mode than in electron mode.
I would say it's because the software doesn't check the dose before sending it (i.e. the dosage doesn't have a preset limit for each mode in the software), so it gets sent anyway
Let's say you want to send 25000 electrons but you put it on X-rays: it will do it, because there's no safeguard telling the system not to, and it doesn't have any hardware safeguards either
@@DevinBaillie Thank you for that info. I've always wondered how these machines could produce lethal radiation doses. The explanation of the software glitch made perfect sense, especially given the vintage of the equipment, but the magnitude of the radiation overdose never made sense to me. I'm betting most people reading these comments after watching these Therac videos still don't get it either. The now-known software glitch would cause the unit to enter X-ray mode without enabling the electromagnetic beam deflector to hit the tungsten target (so instead the patient became the target of 10-20,000 rads). Poor victims of these machines, ☢️ probably one of the longest, most agonizing ways to go! 😱
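A rough back-of-the-envelope of why the overdose was so large, based on the figures in the reply above (the <1% target efficiency and ~100x beam current are that commenter's numbers; the constants here are purely illustrative):

```c
/* Illustrative only: if X-ray mode drives the electron beam ~100x
   harder because the tungsten target wastes >99% of the energy as
   heat, then firing that beam with the target out of place delivers
   on the order of 100x the intended dose. */
#include <stdio.h>

int main(void) {
    double xray_mode_current = 100.0; /* ~100x electron mode, per above */
    double target_efficiency = 0.01;  /* <1% of beam energy becomes X-rays */

    double intended_dose = xray_mode_current * target_efficiency; /* = 1.0 */
    double actual_dose   = xray_mode_current * 1.0; /* target missing */

    printf("overdose factor: ~%.0fx\n", actual_dose / intended_dose);
    return 0;
}
```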
as a programmer, I've always been afraid of going into something as serious as the medical field. I'm not always 100% confident about the code that goes out, as I don't have testers; my code gets tested in the live environment. I can't have blood on my hands.
I work in industrial automation, and there is a similar issue for us when something is critical for human safety. Safety critical aspects of our programs must always be tested and validated by a third party, and I wouldn't have it any other way. That said, usually safety stuff is pretty simplistic and pretty much guaranteed not to fail even before testing. It's a whole other world working on a medical device like this.
I took a safety class that presented an interesting perspective on the question of "can software fail". You seem to say yes. In the class, they claimed that software does not fail, because it always does what you tell it to. Whether what you told it was what you wanted, that's where you get problems. But that's not the software failing, that's you failing.
"can software fail?" Can it fail at what? Can it fail to do what we expect? Absolutely. Can it fail to do what it should do according to its instructions? Also yes, because rarely you can get a random error like a flipped bit in RAM, or even an error in the design of the CPU. So I would say software can fail either way you look at it.
@@WilcoVerhoef Which in turn could cause the software to fail or do something it wasn't supposed to do, which could've been prevented if you had made it better. So software can fail and have glitches, so I don't understand your point.
I actually was on cancer treatment in 2023 for a tumor I have in my brain, and I've been in one of these machines. So scary how it could have gone wrong. Thank god I didn't live in the years when these machines were that dangerous
This is exactly why I fight so hard as a programmer to employ good testing strategies, a testing plan is always better than a good lawyer in my opinion. I'd not want people to die for my mistakes, I'd dedicate heart and soul to good software engineering.
As a programmer myself, this was like watching a horror movie. Nowadays even the internet form that you use to order socks has more automated tests and testing process than that machine that x-rayed those people to death. Really, I cannot imagine the despair of being one of the ones that got that killing dosage :(. Like... every programmer I know uses more or less defensive coding strategies; I just cannot imagine I would even allow the machine to emit that dosage in too short a timeframe. I am shocked.
As a Software Engineer, I can say the statement "Software Will Fail" is very true. The only real way around this is redundancy, and in software, that typically means multiple independently developed systems which must all agree on an answer for it to be executed
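A minimal sketch of that voting idea in C; in a real system the three channels would be independently developed and run on separate hardware, so the example below only shows the voting step itself:

```c
/* 2-out-of-3 majority voter: accept a value only if at least two
   independent channels agree; otherwise fail safe. */
#include <stdbool.h>
#include <stdio.h>

static bool vote(int a, int b, int c, int *out) {
    if (a == b || a == c) { *out = a; return true; }
    if (b == c)           { *out = b; return true; }
    return false; /* no majority: caller must fail safe (beam off) */
}

int main(void) {
    int dose;
    /* one channel returning garbage is outvoted by the other two */
    if (vote(200, 200, 13567, &dose))
        printf("channels agree: deliver %d rads\n", dose);
    else
        puts("channel disagreement: abort and shut the beam off");
    return 0;
}
```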
Don't forget, in the 1970's even 16 kilobytes was a lot of DRAM. Devoting a few KB's to error codes or safety redundancies would have been a huge deal.
This situation sent a chill down my spine. It reminded me of the time when I was designing security and door opening systems and the fears I had of software bugs or electronic design flaws. The extended weeks I would leave one system working alone 24/7 while another system monitored it. I can't believe that an industry that ships a machine with the lethal potential of this one would not test for it or even be tempted to eliminate fail-safe mechanical systems.
One of my Comp Sci professors, Clark Turner, was part of the investigation into the Therac-25 incident and I remember him telling us the story about how he and another person found the race condition that led to these people's demise. They wrote a paper about the investigation. Crazy stuff
Having a hobbyist write the program honestly isn't a huge error in my eyes, as I'm sure he was plenty skilled. What blows my mind is that it was never properly tested to ensure this type of thing was impossible. It doesn't matter how skilled you are at programming, you will make mistakes. We rely on others to help us catch them and correct them.
In this case, the programmer was programming in assembly. Assembly is an extremely difficult low level language that hobbyists should not be using to make medical devices with
I'm putting exactly 0% on the developer. The company that contracted them didn't do their due diligence, and you can't expect a solo dev to account for EVERY single edge case. They chose to test in prod.
I'd put _some_ culpability on the programmer, but the majority of it definitely falls elsewhere, from the lack of a physical failsafe (compared to previous models) to the cultural perception that "software doesn't fail". The simple fact that these incidents were preceded by any error message _at all_ indicates that the software itself detected something amiss, it just wasn't capable of identifying specifically what or why.
I was a surgical lighting service technician who spent 20 years and 6 months on the road and in the workshop repairing, designing, developing, and modifying imported equipment to meet local standards with occasional type testing. I learnt to no longer be surprised at how manufacturers used inappropriate materials, components, and mechanical and/or electrical designs that were sometimes fundamentally unsafe. Often, the worst features of a product would be forgotten and repeated a few equipment generations later, with each new model being fundamentally more complex, less reliable, more costly to own and with an ever shorter lifespan. I'm so glad to have left the industry and hopefully all of my trailing liability behind.
Yesterday I took an exam for a computer science / electrical engineering course, and those race conditions were also part of it. Now I feel a little guilty for having somewhat skipped over that part.
As someone who knew better and (like EVERYONE) found testing "boring", I have now done a complete 180-degree turn and testing is ALWAYS on my mind! Test your software, people! Write A LOT of SIMPLE and easy-to-debug tests (because remember, tests are code as well and they may have bugs)! And try to think about edge cases!
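In that spirit, a sketch of the kind of tiny, single-purpose test worth writing in bulk; clamp_dose is a made-up example function, not from any real device:

```c
/* Tiny, single-purpose tests: each one is trivial to read and debug. */
#include <assert.h>
#include <stdio.h>

/* Made-up example: clamp a requested dose to the allowed range. */
static int clamp_dose(int rads) {
    if (rads < 0)   return 0;
    if (rads > 200) return 200;
    return rads;
}

int main(void) {
    assert(clamp_dose(50)   == 50);  /* normal case */
    assert(clamp_dose(-1)   == 0);   /* edge: below range */
    assert(clamp_dose(0)    == 0);   /* edge: lower boundary */
    assert(clamp_dose(200)  == 200); /* edge: upper boundary */
    assert(clamp_dose(9999) == 200); /* edge: way above range */
    puts("all tests passed");
    return 0;
}
```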
For me, as a programmer for machinery, this is the typical "blame everything on the programmer" thing. It is normal that code has bugs, and you will never find all of them. Therefore mechanical and electrical safety locks have to be implemented to prevent such malfunctions. In this case the software didn't do much wrong; it even gave an error message. The main software problem was that it was possible to skip the message and continue. The main problems in this case were the hardware lock that was removed from this newer model for cost reduction, and the decision of the management to let customers continue using these devices even after more than one accident had been reported with this machine type.
As a programmer myself, it's absolutely horrendous seeing stuff like this. There are so many people who don't consider what impact their code might have. For example, when your phone decides to force an update, I guarantee you people have died because they have been using a flashlight and their phone decides to update at a critical moment. I almost had this happen to me once. Schools really need to teach programmers things similar to engineering ethics, you really need to consider the most extreme cases for what you are doing. Someone's life is on the line.
bro wtf why does it start before you even hit the start button, bro why doesn't it double check the conditions with something that dangerous and change if it notices new values!?!?
This highlights how even simple syntax errors can compile and run, but not work as intended. There's an old joke that only people who code will get, but it's hilarious because everyone who codes in multiple languages has had to contend with the differences in syntax:
if (GoNuclear = 1) { launch_nukes(); } else { remain_chill(); }
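For the non-coders: in C, = assigns while == compares, so the condition above always evaluates to 1 (true) and the nukes always launch. A compilable demo of the joke and its fix; note that gcc or clang with -Wall will warn about the buggy line:

```c
#include <stdio.h>

static void launch_nukes(void) { puts("launching..."); }
static void remain_chill(void) { puts("remaining chill"); }

int main(void) {
    int GoNuclear = 0;

    if (GoNuclear = 1)    /* BUG: assigns 1; condition is always true */
        launch_nukes();   /* always runs, regardless of intent */
    else
        remain_chill();   /* dead code */

    GoNuclear = 0;
    if (GoNuclear == 1)   /* the fix: comparison, not assignment */
        launch_nukes();
    else
        remain_chill();   /* now this branch runs as intended */
    return 0;
}
```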
Underground explosives engineer here. If I made even a minor mistake (in millisecond timings by Primadet), people could die, or the blast would go terribly wrong. Note: everything on earth dies, the people would die anyway.
One example I have from my father's experience: he had assembled and installed robotic arms and the PLCs he'd designed at a car plant, and the programmer came in to do the software setup and calibration. My father made him aware the safety isolation switches hadn't been completed, but he was like "no, it's fine" and proceeded to send inputs to the robotic arms, which had in-production car bodies under them. As you can guess, an arm slammed through the roofs of thirty cars before my father could stop the incident; the following day those cars were marked as scrap.
Why would the machine be able to physically give such a lethal dose in the first place, regardless of the software...I mean NO ONE is going to be prescribed such a high dose... ever !! 🙄
Even in the ‘90s there was still that attitude of software doesn’t fail. Take the despatch software that the London Ambulance Service started using in the ‘90s (LASCAD).
I know hindsight's 20/20, but I don't know why there wasn't an event handler, even a basic one, so that nothing would happen without operator input. I understand this was one person, and I truly admire that they built this by themselves. They had to be under a shit ton of stress, because someone that talented should be able to foresee the issues with reading inputs prematurely.
@@khatdubell I mean an explicit event handler, such as a button. The machine was doing stuff while she was still entering/correcting data. Nothing should have occurred until she was done entering the information and hit a commit button.
Good example of how dangerous OT bugs can be. If you're going to rely solely on software to control equipment, then you had better do some serious testing to make sure bugs that could kill someone don't exist.
The problem with bugs is that you would have to test every condition you didn't plan for. It's always some obscure condition that no one thought about that happens and causes the issue. There is no way to test every freak user accident. You can test code for function, but you can't test every user situation. There will always end up being some strange case where an operator did something you had not planned for.
Well said. In other words, reliance on testing can never deliver defect-free software. Instead, it's necessary to somehow _prevent_ errors in the first place.
"Software can't fail" has got to be the single most terrifying thing to hear someone say when they're creating medical equipment. I mean, even in hardware you don't rely on a single point of failure, why would you do any different for software?
Yeah, forget about "nuclear deterrence" - the only reason humanity has managed to refrain from starting a nuclear war yet is the collective fear of some off-by-one error in an old piece of COBOL code causing the missile to detonate right at the start. How embarrassing would that be!
As an IT specialist I can say we love redundancies... for this exact reason: a machine reserves the right to fail anytime it wants. So you need at the very least two failsafes.
@@309electronics5 it's not about the developer; it's that the company didn't fully test the system. The right thing is to employ separate QA to handle this kind of edge case
This reminds me of a picture of a Tesla recently that stopped functioning completely due to a software update error. Maybe it's time for humanity to wake up and realise that relying on software too much creates a fragile world that can collapse at any moment. We should maintain some mechanical aspects and use software to optimise, not to fully control, stuff.
As badly as the code might have been written, you gotta give it to the guy for actually putting in an error message that tells the operator they are at fault and should rethink their settings.
I feel bad for this dude. He might not even have known they defeated the safety device, and he might not have been rad qualified or given a source to really check with other than timing and position, and the year... Imagine him giving the hospital a hundred miles of docs and procedures to test and they just went to production... Imagine the operator not even reading the manual for a device like that, or watching it work with the covers off to understand its operation. Still though, get a code review at least. That said, if I was this dude's associate and he said "hey, look at this", I wouldn't admit my eyes hit it unless I could tell him it was champion, bug-free; and even then, was the 8-second race condition in the docs? The reviewer might assume a quick click-and-done, sub-half-second race. What a tragedy all around.
@Tartarus144 it's insane that these corporations will do anything for profit, even if it puts someone's life at risk. I'm certain some employee asked for better testing and got their concerns ignored. I see that even today
This is not a programming mistake. This is the result of a poor software development process aka a software quality assurance issue. This issue should have been found during testing.
Your explanation of the actual failure is not correct, as far as I can tell, reading a report into the incident. Firstly, you can't shape X-Rays (or any photon beam) with magnets. These magnets are used to shape the electron beam when in electron mode. Secondly, the machine produces an electron beam as its native source of radiation, and needs a target in front of the beam to produce X-Rays instead, as well as a flattening filter which attenuates a lot of the beam energy. Producing X-Rays requires the maximum electron beam power, a power around 100 times greater than that used in electron mode. Therefore, the patient did NOT "receive X-Ray radiation at electron radiation doses", they actually received electron radiation at X-Ray radiation doses, since neither the target nor the flattener is placed in the beam (7:43)
For many systems today, "hardware interlocks" are not feasible. It is not possible to implement, say, anti-lock brakes or fly-by-wire systems with hardware performing the safety role. Or, say, medication calculation software or a patient medical record system: a wrong dose, a missed re-call for a patient that has cancer, or the wrong patient's data shown to a doctor can all kill.
I used to work for an electronics company that designed and manufactured the electronics for Bobcat. They were obsessive about software testing. Every product had a 2" thick testing manual that took at least a week to perform. And any change to the software, no matter how insignificant, required full testing, and approval. Even if the software wasn't changed, but just recompiled for production, we were required to perform full testing and approval before use. Their argument was that it was a small price to pay to avoid a problem in the field with their customers.
🏫 COURSES 🏫 Check out my new courses and find the SECRET discount code at lowlevel.academy
hi
no
yes @@gipugly
That looks similar to the Varian 600 I used to work on in 2003 in the UK.
Training was completed in Milpitas.
I assume this couldn't happen in our machines?
Great video, new subscriber here. Just pointing out something constructive - "interface" and "interlocks", not "innerlocks", etc. They are literal opposites. Thanks for the efforts
"never tested until it arrived at the hospital..." Thats got to be the worst case of testing in production ever recorded
Speaking of which, wanna go to the Titanic in a submersible?
only to be topped by the OceanGate Titan sub.
Maybe the software was tested only on a hardware emulator, i.e. another piece of software sending/receiving data, which may lack simulation of the actual hardware delays.
Tested that it COULD work, or that it CAN work? The first is the dev POV: "set it like that, it works". The second is a UX POV: stories and personas are taken into account.
Man, when some minor employee screws up a document they fire him and his job is done; he can retire and enjoy staying at home with no chance to screw up anything else ever again.
I just dunno who can fire this...
Self made, self tested, self guaranteed.
If you can't do it, don't start in the first place. You murderer
My first year CS professor started the entire class with a lecture on how bad code can kill people and how we should take bugs seriously. It's always chilling hearing stories about this.
im a self taught engineer and this is just common sense, like bruh wtf
Can you share any other significant cases?
@@r4ych836 I'm not sure if bad code has ever killed people other than via the Therac-25... I'm also not sure I'd want examples to exist, for obvious reasons...
@@erikkonstas Teslas lmaoo
@@r4ych836 my last lecture in an AI course was solely dedicated to ethics. Our professor shared some examples; one was that an AI made to set insurance rates for people started to give racist rates. Because it was trained on data going back to the 40s or something, the AI was biased toward giving certain minorities worse rates than they should have been given
Been there. In the mid 1980s I was treated in a Therac-25. Fortunately, mine didn't fail. I did hear about the failure, and it took all I had to walk in there every weekday for 30 treatment days, knowing that it COULD fail. I worked in software and had a subroutine called SNO - for Should Not Occur - that printed an error message and exited. I was amazed at how many times I hit that routine. I was very glad it was there. FYI - I am now up to 5 rounds with old man cancer and I am still here. The average for Mom and the three kids stands at 7, so I get to look forward to two more. Yea Me!
Happy you're still here :D
Hell yeah, rock on! ❤ :)
Glad you’re still alive
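The SNO routine described a few comments up is cheap insurance and easy to replicate. A minimal C sketch; the macro is an assumption reconstructed from that description, not the original code:

```c
/* "Should Not Occur" guard: log loudly and stop, rather than carry on
   in a state the program's logic says is impossible. */
#include <stdio.h>
#include <stdlib.h>

#define SNO(msg) do { \
    fprintf(stderr, "SNO: %s (%s:%d)\n", (msg), __FILE__, __LINE__); \
    exit(EXIT_FAILURE); \
} while (0)

int main(void) {
    int mode = 3; /* pretend a corrupted state slipped through */
    switch (mode) {
    case 0:  puts("electron mode"); break;
    case 1:  puts("x-ray mode");    break;
    default: SNO("unknown beam mode"); /* impossible states die loudly */
    }
    return 0;
}
```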
You'd think the medical industry would get better at preventing this kind of thing. Instead, they've hired better lawyers to deny this kind of thing.
i hope the family got a large payout from the company and the hospital
Their entire industry is a scam. House of cards built on lies used to prop up another house of cards. This is standard dark triad operating procedure. 100% expected, in fact, you should just assume it and look for the exceptions.
There is actually an ISO standard with very strict guidelines on how to develop such critical software. One thing which particularly stood out to me is static memory, i.e. you know all the necessary resources beforehand and prepare for the worst case.
There's also a tracing level for the documentation, where the max level is being able to trace all use cases through requirements to every line of code responsible for them.
So there definitely are methods for prevention.
NASA also has some interesting design processes, if you're interested in reading. Using an old, specific JavaScript engine in space is one of the consequences I find quite funny.
@@blacklistnr1 which one? Could you share the number of the ISO standard?
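Whatever the exact standard, the static-memory rule described above looks roughly like this in C (sizes purely illustrative): every buffer is sized for the documented worst case at compile time, so nothing can fail to allocate mid-treatment:

```c
/* No malloc after startup: every buffer is sized for the documented
   worst case at compile time, so there is nothing to fail at runtime. */
#include <stdio.h>

#define MAX_FIELDS   16   /* worst-case prescription fields (illustrative) */
#define MAX_LOG_LEN  256  /* worst-case log line length (illustrative) */

struct prescription {
    int fields[MAX_FIELDS];
    int nfields;
};

static struct prescription rx;     /* fixed storage, known address */
static char log_line[MAX_LOG_LEN]; /* reused, never reallocated */

int main(void) {
    rx.nfields  = 2;
    rx.fields[0] = 25;  /* e.g. energy */
    rx.fields[1] = 200; /* e.g. dose */
    snprintf(log_line, sizeof log_line, "fields=%d", rx.nfields);
    puts(log_line);
    return 0;
}
```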
Who allows this industry to function like this?
If there is something wrong with them, why don't they make them stop?
This is not a kindergarten to play with laser guns, or playing doctor
I wish the entire source code were online, but for those interested, excerpts of it can be viewed in an investigative report online.
could you link? i can't seem to find any code
AECL never released the source code
I suspect the reason they never released it is that there *might* have been way more horrendous bugs than that one (yes, exactly what I said)...
@@erikkonstas it's proprietary code, they're not just going to give it away for free. That same code is based on older IP from the previous machine. And I imagine future iterations will also have common code. Just like how Windows still has code from 20+ years ago in it.
Proprietary code is a crime against humanity
The more you learn about engineering, not only software but hardware, mechanical, etc., the more you learn that the world around you is held together by duct tape and prayers
As someone who works in industrial automation, this is way too true! Our ability to put duct tape on a system without shutting it down, even the code, is pretty spectacular though!
Hahaha, what a perfect way to describe “held by duct tape and PRAYERS”
Praise the Omnissiah.
@@youkofoxy As a former electrician I tell you: Yeah, you better do, He is your only hope and salvation!
As an engineer, not entirely true. I am in the biomedical industry, and maybe it's just us, but our products get tested. Like, a lot. Some of the tests sometimes seem ridiculous, but you are frequently reminded that regulations are written in blood.
Race conditions are notoriously hard to debug. Because you only have a couple milliseconds for the exact right conditions to occur to trigger the bug.
This is a race condition with an EIGHT. SECOND. WINDOW.
Had they tested it properly, they would have been almost guaranteed to find this. This is not just negligence, this is recklessness.
this is also one of the best examples of why you need to pay people to try and break your software. you will by default always enter things correctly, you wrote the software, you wrote the procedure manual, you know what you are doing. the user DOES NOT KNOW WHAT THEY ARE DOING, and therefore is not restricted by your assumptions. this allows them the freedom to screw things up in ways you never imagined. This is why it is preferable to pay highly trained chimpanzees (also called QA testers) to find these issues first (no offense to QA people, you are literal life-savers)
@@skellious I worked on FDA regulated software, and we'd recruit complete noobs to test it, maybe a project manager or someone who has no knowledge of software or the subject area. We called them "monkey testers" and they'd misunderstand just about every instruction, thus flushing out all sorts of bugs that knowledgeable users would navigate around.
@@d00dEEE We also do the exact same thing where I work in the NHS. I build an application and test it thoroughly. We then send it off to doctors, clinicians and ward staff to test before actually going live. Because they do everything so "wrong", they are able to produce errors me and my team wouldn't have thought of ourselves, and then we're able to fix them before making the application accessible to the whole organisation.
Rust
@@bob450v4 🤣
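To make the race-condition discussion above concrete, here's a minimal C sketch of the same class of bug, loosely modeled on the Therac-25 failure; the names are invented and the timing is contrived so the bug fires deterministically, where a real race would be timing-dependent:

```c
/* Check-then-act race: the setup path snapshots the mode, then acts on
   it seconds later; an edit that lands in between makes the snapshot
   stale. The shared variable is deliberately unsynchronized. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int mode = 1; /* 0 = electron, 1 = x-ray; shared, no lock */

static void *operator_edit(void *arg) {
    (void)arg;
    sleep(1);  /* the correction lands inside the setup window */
    mode = 0;  /* operator switches from x-ray to electron mode */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, operator_edit, NULL);

    int snapshot = mode; /* check: x-ray, so configure high beam current */
    sleep(2);            /* slow magnet setup: the "8 second window" */
    printf("beam configured for mode %d, actual mode is now %d -> %s\n",
           snapshot, mode,
           snapshot == mode ? "consistent" : "OVERDOSE CONDITION");

    pthread_join(t, NULL);
    return 0;
}
```

Compile with `cc -pthread`; the fix is to re-validate (or lock) the shared state at the moment of acting, not at the moment of checking.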
I'm a full-time software engineer and part-time nuclear/radiation nerd. I've heard this story on other channels and read about it online, but nobody else goes into detail about the software aspects, which I find the most interesting. Great stuff LLL!
Well, too much detail is impossible, the code isn't out there at all.
They probably don't want it public cause it's recycled code that still gets used today 💀
I’m writing a book for junior developers, what in your opinion was the root cause? Failure to test?
@@danmurad8080 I think the testing aspect is very important and would absolutely have reduced the severity of the failure, but in this case, when hardware is involved, relying on software to assume the hardware state without any way to verify it is begging for disaster. Even something as simple as a 3D printer would be a lot more hazardous without sensors to ensure the hardware is in the right state.
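A sketch of that "verify, don't assume" point, with an entirely invented hardware interface: command the mechanism, then read its position back from an independent sensor before allowing anything dangerous:

```c
/* Don't trust the commanded state: read it back from a sensor and
   refuse to fire on any mismatch. All hardware calls are hypothetical
   stubs for the example. */
#include <stdbool.h>
#include <stdio.h>

enum position { TARGET_IN, TARGET_OUT };

static enum position commanded = TARGET_IN;
static enum position sensor    = TARGET_OUT; /* turntable never moved */

static void command_position(enum position p) { commanded = p; }
static enum position read_position_sensor(void) { return sensor; }

static bool safe_to_fire(void) {
    /* The independent readback the Therac-25 effectively lacked. */
    return read_position_sensor() == commanded;
}

int main(void) {
    command_position(TARGET_IN);
    if (safe_to_fire())
        puts("firing beam");
    else
        puts("interlock: position mismatch, beam disabled");
    return 0;
}
```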
I have been writing assembly language code for over 30 years. I was cringing... watching this video.
Writing async code in assembly. Of course it had bugs.
More like: Writing async code. Of course it had bugs.
Most languages do not prevent data races, and I have yet to hear of a language that would help in this specific occasion without support from the hardware itself, i.e.: in this case the magnets and filter.
More like: writing async code without knowing that it is async, so it is written as if it weren't
It had been written by a hobbyist student, of course it would be dangerous. A hobbyist never has the mindset to think about every critical safety system that has to be implemented to make software safe
I feel like the bigger problem was the primary design choice of the programmer to have an interface where you can freely write any kind of data and have a confirm command at the bottom.
Every user would assume that nothing happens to the machine until the confirm command is sent. Why would he make the machine read certain values instantly long before the confirm command is read?
@@rivershen8199 Probably why you should hire a professional, not a hobbyist.
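What the comment above describes is essentially an edit buffer plus an explicit commit. A minimal sketch with invented names: keystrokes only touch a staging copy, and the hardware is reconfigured in exactly one place:

```c
/* Edits accumulate in a staging copy; the machine state changes only
   when the operator explicitly confirms. Nothing moves on keystrokes. */
#include <stdio.h>

struct settings { int mode; int dose_rads; };

static struct settings staged  = {0, 0}; /* what the operator is typing */
static struct settings applied = {0, 0}; /* what the hardware is set to */

static void edit_mode(int m)    { staged.mode = m; }       /* no hardware effect */
static void edit_dose(int rads) { staged.dose_rads = rads; } /* no hardware effect */

static void confirm(void) {
    applied = staged; /* the ONLY place hardware setup happens */
    printf("applying: mode=%d dose=%d\n", applied.mode, applied.dose_rads);
}

int main(void) {
    edit_mode(1); /* operator types x-ray... */
    edit_mode(0); /* ...then corrects to electron: still nothing applied */
    edit_dose(200);
    confirm();    /* hardware reconfigured once, from a consistent state */
    return 0;
}
```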
That error message though... "Radiation is either too high or too low" WTF 😂.
"50/50, let's do this!" - Medical staff
It just means radiation is out of permitted range.
@@darylphuah Yeah, but the error message is kind of useless. The doctor can't know if the patient is about to get some sweet cancer or not haha.
@@shimadabr The doctor also has no clue that that is the intended message; all they got was error 54, not the actual error reason
The message didn't say that, the message said 'dosage input 2' error. The 'dosage too high or too low' message was associated later on. The interface also said the user had only received 6 rads out of 202. The operator was totally correct in unpausing it. If the process only just started, there's no problem unpausing it and continuing to the end of the procedure
Reminds me of those 'scientists' measuring radiation in Chernobyl disaster to be exactly the level of the maximum value the device could show on the scale. Was quite plausible. 😀
I still don't get it, nobody blamed that 50/50 thinking medic ? I am not allowed to do things that have a 0.000000001% chance of harming someone.
It would be a public service to have a series on this topic: "Code that kills". There are many cases like this, in which code that runs essential infrastructure ends up costing lives.
Thanks for sharing this!
I sure fucking hope not, at least I'm not aware of any...
Considering how many other failings there were, most of the blame isn't on the code.
The decision to remove the hardware interlock and just reuse the previous software (which was designed with the assumption the interlock was there), without extensive testing and examination, was the biggest failing.
Code that kills
Would it write it for me?
With your hand so still, it makes me believe
In the software's sins
Let me compile now and never die
I'm alive
Boeing MCAS
Reminds me of when I went to the dentist to get an X-ray, and saw that the machine was running Windows Vista. I felt like I was in a Final Destination movie.
Uh, sorry to say, but you practically were...
*[CONTENT WARNING] Careful before clicking "Read More"*
Amongst other shit, very early versions of Vista were notorious for just up and crashing out of nowhere, or even not booting at all for no reason (the then infamous Red Screen of Death); if it crashed, who knows what would happen to the ray emission???
@@erikkonstas that’s more terrifying to think about
Luckily for dental, usually the X-ray emitter is its own device and doesn't use an external PC. The 'film' or receiver is what is connected to the PC. Unless you are getting a panoramic, then it's up to the manufacturer lol. Hopefully the engineers put hardware interlocks on everything now.
hell nah the military also uses windows xp america’s screwed
Just a reminder: most of the metro systems are running from a floppy.
I hate this "too high or too low" type of error. It's like searching for an email on Lotus Notes: "Your search returned no results or too many results". Please be specific with error messages
ERROR 418 I'm a teapot.
ERROR 88: something's wrong but I won't tell you what
This is common in trying to sense things with computers- they usually have a limited sensing range and once they reach their limit, they typically don't give any other readouts besides the highest or last recorded value. This can be either because of software or hardware limitations and the programmer can only put out his guesses on what went wrong. I have only worked on low voltage electronics so it wasn't that bad but you get the point- the sensor was clueless because it was overwhelmed.
I forgot how much I hate Lotus Notes - thank you for the reminder 😅
@@keithou4389 That is not limited to computers...
... 3.6 Roentgen.
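Regarding the saturated-sensor point a few comments up: a small C sketch of the safer convention, where a pinned reading is reported as "at least this much" rather than as a measurement (the values are illustrative, not from any real device):

```c
/* A pinned reading is a lower bound, not a value: report saturation
   explicitly instead of pretending the ceiling is the measurement. */
#include <stdbool.h>
#include <stdio.h>

#define ADC_MAX 1023 /* 10-bit sensor ceiling (illustrative) */

static bool read_dose(int raw, int *rads) {
    if (raw >= ADC_MAX) {
        *rads = ADC_MAX;
        return false; /* saturated: true value unknown, >= ceiling */
    }
    *rads = raw;
    return true;
}

int main(void) {
    int rads;
    if (!read_dose(1023, &rads))
        printf("dose >= %d (sensor saturated, actual value unknown)\n", rads);
    return 0;
}
```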
"Entirely software controlled in 1986." Is a scary sentence.
Sounds like a chain of failure, from the machine to the hospital, in all regards. The machine manufacturer did not care, and the hospital did not care either.
I can understand the mistake by the hospital administrators. They are paying top dollar for cutting edge equipment, so they kind of expect it to be made with high standards. But the main fault is at the company, it's a chain of negligence.
Not caring should be a crime for things like this
@@Rin-qj7zt It's called negligence
There were six incidents though in different hospitals. The machine did something it was not supposed to do, and the user interface lied about it.
@@Rin-qj7zt it already is. They just hire good lawyers.
The bigger takeaway from this story isn't that ancient code lacked logic and user input safeguards. Rather, that Therac's upper management made unethical design choices to lower the cost of production. Coupled with minimal pre-shipment testing of said units. "It was decided to remove the physical (electro/mechanical) safeguards and rely entirely on software to lower costs"!
Even the name "Malfunction 54" sounds scary for a simple bug, and then is MORE scary when you see "Malfunction 54 (12777 rads delivered)"
Yes like those radiotherapy machine go particle accelerator
The pure helplessness is so sad
@@PencilPlane I can't imagine the dead look from the operator looking at the screen. Thinking that they might kill a person due to a high dose of radiation. Imagine the trauma that they have to endure.
Definitely gives the same vibe as the AZ5 button at Chernobyl
I remember the Kyle Hill video on this; he glossed over the software-bugs part of the Therac tragedy. This shines a different light on the importance of software safety, especially in mission-critical or life-saving tools. Kyle focused on the tragedies and their relationship to the ongoing nuclear age.
Very different and interesting perspectives.
As a software engineer, it is always chilling to recall this story.
@@dixztube Kyle? Yeah, his videos, even about very serious topics, started to feel like History Channel talking about aliens.
@@vanjazed7021 I thought his target audience were kids/young adults.
I agree. I think both videos shine in their own different (but valid) intended contexts and thus their own perspectives.
While Kyle's video focuses more on the incidents as a whole, since his target audience is the broader masses, this video focuses more on the software itself. Nevertheless, I also think both videos succeed in conveying the negligence and recklessness of AECL, and hopefully they can add more to the topic of code safety as a cautionary tale.
One of the most amazing things about this ordeal is at one point, AECL issued a bulletin telling the hospitals to use a screwdriver to pry the up arrow key off the VT100 keyboard, and to glue the key switch in place so it couldn’t be activated. The FDA was not amused by this “fix”.
Kyle being a bad person? noooooo!!! how could that be???????? 😱
Software never got tested until it was shipped? Sounds like all the AAA games coming out
*Modern AAA games.
The difference being of course, that AAA games are extensively tested, despite their many bugs, and that they are not safety critical systems.
Yeah, but john, when the pirates of the Caribbean game breaks, the pirates don't eat the user.
@@khatdubell Scribbles down game idea
Yea
Something like this, where someone's health is at stake, should have had a team of programmers agreeing on, and reviewing each others code. The root cause wasn't the lone programmer - it was all those above him who signed off on that lone programmer. Disgusting working practice and yes, that lone programmer should also have recognised the danger immediately.
Basically the plot of Jurassic Park
To me the worst part of this was the removal of hardware interlocks. Software can NEVER be relied on 100%, even if it has been extensively tested. Physical switches and relays should ALWAYS be in place for safety-critical applications. If there were hardware interlocks in place in the Therac-25, this would never have happened. Sure, the bug would still have been there, but the machine couldn't have hurt anybody, as the emitter PHYSICALLY would not have been able to activate without the magnets in place.
He was a lone programmer, working in assembly on a rather complicated machine. He may have been a "hobbyist", but I reckon he is more skilled than many current software engineers.
Mistakes like these happen; logical errors and race conditions are incredibly common when working on any complex system. "He should have caught it" is not a realistic expectation. In fact, current software engineering practice expects programmers to make mistakes like these, which is why, as you said, we have pair programming, code reviews, unit testing, etc.
In critical systems like this, break testing should have been done to identify potential failure points.
@@nelsonahlvik6650 Not just that: what if some hardware filter breaks off the machine and BOOM, EVERYTHING within a 5km radius is exposed??? Yes, the software would be bug-free, but the plastic broke physically, so the radiation core was out there, not controlled by the software anymore...
@@erikkonstas The Therac-25 (and its older sibling, the Therac-20) used a double-pass accelerator that did not use a radiation source (such as cobalt or cesium) like older machines. The double-pass system uses a magnetron to create a beam, which only activates upon operator input to start a treatment. So, thankfully, if you're not in the same room as it, you're probably fine. This was probably the ONE good thing the Therac-25 had going for it.
As a sidenote though, incidents of exposure via radiation sources from old radiotherapy and X-ray machines have happened before, and it is not pretty. I would imagine most radiotherapy machines nowadays use a magnetron instead of a radiation source, as it's much safer, easier to maintain, and easier to decommission. No deadly radiation sources; all you need to do is disconnect the power and it's powerless.
I'm working for an organisation that creates training checklists for operators working on and operating machines in the manufacturing sector. This video is an eye-opener as to why I must be more focused when writing my code. People's lives depend upon what I write.
Please don’t kill me. Thanks
Sobering and sad. A reminder that clean, thoroughly tested code is crucial, together with the assumption that there still may be bugs no matter how many edge cases are accounted for in the tests.
And hardware locks are the most important; they make sure that even if the software goes wrong, nobody gets hurt.
@@nelsonahlvik6650 Or if vulnerable parts of the hardware go wrong, the locks protect the entire vicinity (e.g. if the locks had worked correctly, Chernobyl wouldn't have exploded).
I'm studying programming in university right now and this was one of the examples my professor used to demonstrate how a mistake in code could have massive and sometimes even fatal consequences. He also pointed out that with more testing and a better graphical user interface this all could've been avoided.
Testing can NEVER demonstrate the absence of defects.
I started my career at a dental X-ray design and manufacturing company as a junior embedded systems R&D engineer.
There the hardware team always had the master role; they criticized and had less trust in the software 😂. I remember how rigorous their regression tests were before launching a product.
You are my man... if I may ask you: I've noticed I'm not always given those lead-filled radiation-protective aprons (I don't know the exact name) anymore nowadays when a dental X-ray is done on me. Am I right in thinking it's because the newer (cone-shaped beam) machines produce a lower radiation dose, and also less stray radiation with the cone-shaped beams? Or is it just negligence and I should ask for one?
I first heard of and learned about the Therac-25 in a college technological ethics class.
But I never knew what exactly happened in the code! So interesting and tragic!
This style of video is really interesting, it would be pretty cool if you could produce more videos with stories like this
Examples that immediately come to mind are the assembler bug in the Moon landing (it could be fixed) and entering imperial values into a metric control system at NASA, I think (crash and burn).
Yeah, I really like these kinds of videos. Kevin Fang has been doing these kinds of videos for a little while now.
@@lodgin Woah, somebody else knows that name! His channel is severely underrated, if I may say...
I agree, except for the part where there are casualties... Ariane 5 (wrong direction, which led to a crash due to an FP error) comes to mind.
Yea, simple and well explained and demonstrated. More, MORE!
I'm a cancer survivor; I had chemo and radio and pills, so I was curious.
The nuking machine is intense for sure; when the nurse used a 2-inch lead vest: "oh, it's just to protect me from being nuked alive by your treatment"
They literally told me, "to kill the cancer cells we kill you and the cancer, and hope you survive while the cancer dies"
O_O ok, let's try lol
New fear unlocked: going into a surgery and the machine just bluescreens mid-surgery.
Writing software is a weird experience. It doesn't matter how many scenarios you've simulated and prepared for, there's always something that WILL go wrong.
If you go into military/FAA spec hardware verification, it reaches a point where EVERY bit of every variable MUST be toggled. The most advanced testing methods either spam your inputs with every possible combination of data, or they use Mathematical proof software (!) that verifies that no failures are physically possible. The airplane control software CANNOT fail, and you must prove it as such.
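As a toy illustration of that "every possible combination" idea (the function names here are hypothetical; real avionics verification is far more involved): a routine with a single 16-bit input has only 65,536 possible inputs, so you can literally try them all against an independent reference model.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical device routine and an independently written reference model. */
static uint16_t scale_dose(uint16_t raw)     { return (uint16_t)(raw / 4u); }
static uint16_t scale_dose_ref(uint16_t raw) { return (uint16_t)(raw >> 2); }

int main(void) {
    for (uint32_t i = 0; i <= UINT16_MAX; i++)  /* all 65,536 possible inputs */
        assert(scale_dose((uint16_t)i) == scale_dose_ref((uint16_t)i));
    puts("all 65,536 inputs verified");
    return 0;
}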
One guy. Assembly. No testing. I might not sleep tonight...
The problem with testing is that you test what you think you should test. If no one had ever had the idea of changing the mode afterwards, this bug might have gone unnoticed despite testing.
That's only if you do basic software testing. In reality, there are like 15 levels/variations of testing, and one of those is throwing random inputs at it repeatedly to see what fails.
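That "random inputs" idea (fuzzing) can be as simple as this sketch. The routine under test here is a made-up stand-in, and the assert is the invariant we expect to hold no matter what garbage comes in.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Made-up routine under test: returns a dose in rads, or -1 on bad input. */
static int parse_dose_entry(const char *s) {
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || v < 1 || v > 200) return -1;
    return (int)v;
}

int main(void) {
    char buf[16];
    srand(42);  /* fixed seed, so any crash is reproducible */
    for (int iter = 0; iter < 1000000; iter++) {
        for (int i = 0; i < 15; i++)
            buf[i] = (char)(rand() % 256);      /* random bytes */
        buf[15] = '\0';
        int dose = parse_dose_entry(buf);
        assert(dose == -1 || (dose >= 1 && dose <= 200));  /* the invariant */
    }
    puts("survived 1,000,000 random inputs");
    return 0;
}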
When I graduated I was offered a job as a software engineer at a biomedical company that sold medical hardware to hospitals. I didn't read too much into the details, but it was a machine built to automatically feed doses of medicine/substances via IV to patients (on a timed interval, or when certain conditions are met, etc.). They also sold heart rate monitors etc.
The pay was good and the job was very enticing, but I could not bear to accept it, precisely because of things like those shown in this video. I could not handle the stress of constantly having to worry whether my spaghetti code is going to end up costing someone their life (accidental overdose). Fuck that! I know there are engineers out there who write better code than me and would be better suited. I have no problem admitting that. I don't need this level of worry in my life. I am good.
Something like this happening and me being responsible has to be one of my biggest nightmares as a software engineer.
You are also self-aware enough to anticipate these things happening, which alone makes you more qualified than most. Had these managers had more of that, those deaths could have been avoided. But greed clouds judgement.
I am with you haha, I don't want to feel guilty for the rest of my life
So work on Windows Update. Where failure is not an option, it is a certainty! :)
By writing what you wrote, you are more qualified for that position than 99% of those working in the medical field.
I used to have your attitude, but then I found a lot of programmers in medicine just have a better "fuck it" attitude than me.
"Who could have known"?
You, you dummy, if you did your homework.
I'm still confused (other than the profit motive) as to why the same machine used for 180-rad treatments would even be capable of 12.5k-rad dosages.
Just a guess (I don't understand this subject), but I think it's because the X-ray mode projects a strong beam that is then "regulated". The problem was that the "regulator" was not in position.
@@shimadabr You're pretty close.
To produce X-rays, the machine accelerates electrons and then crashes them into a tungsten target. The target stops the electrons and X-rays are produced. The dose rate from the X-rays is less than 1% of the dose rate from the electrons - most of the energy is lost in the target as heat.
To produce electron treatments, electrons are accelerated with no target in place and deposit their energy in the patient directly.
So for the same electron beam current, the X-ray dose is orders of magnitude less than the electron dose. Or, put another way, to get the same dose, the beam current must be orders of magnitude higher in X-ray mode than in electron mode.
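To put rough numbers on that (illustrative, using the <1% figure above): if electron mode delivers a 200-rad treatment at beam current I, then X-ray mode needs on the order of 100×I to deliver the same 200 rads through the target. Fire that 100×I beam with the target out of the way and the patient receives on the order of 100 × 200 = 20,000 rads, which is the same order of magnitude as the overdoses reported in the Therac-25 investigations.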
I would say it's because the software doesn't check the dose before sending it (i.e., the dosage doesn't have a preset limit for each mode in the software), so it gets sent anyway.
Let's say you want to send 25,000 electrons but you put it on X-rays: it will do it, because there's no safeguard telling the system not to, and it doesn't have any hardware safeguards either.
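Something like this sketch is what that missing safeguard could have looked like (all names and limits here are invented for illustration): a hard per-mode ceiling, checked before the beam is ever allowed to fire.

#include <stdio.h>

enum beam_mode { MODE_ELECTRON, MODE_XRAY };

/* Invented per-mode ceilings; real limits would come from the physics spec. */
static const long max_setting[] = {
    [MODE_ELECTRON] = 25000,  /* electron mode tolerates a high beam setting */
    [MODE_XRAY]     = 300     /* x-ray mode must stay far lower */
};

/* Returns 1 only if the requested setting is sane for the selected mode. */
static int beam_permitted(enum beam_mode m, long setting) {
    if (setting <= 0 || setting > max_setting[m]) {
        fprintf(stderr, "REFUSED: setting %ld out of range for mode %d\n",
                setting, (int)m);
        return 0;
    }
    return 1;
}

int main(void) {
    /* The scenario from the comment above: an electron-sized setting
       requested while the machine is in x-ray mode gets refused. */
    printf("%d\n", beam_permitted(MODE_XRAY, 25000));     /* 0: refused */
    printf("%d\n", beam_permitted(MODE_ELECTRON, 25000)); /* 1: allowed */
    return 0;
}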
@@DevinBaillie - Thank you for that info. I've always wondered why these machines could produce lethal radiation doses. The explanation of the software glitch made perfect sense, especially given the vintage of the equipment, but the orders-of-magnitude radiation overdose never made sense to me. I'm betting most people reading these comments after watching these Therac videos still don't get it either. The now-known software glitch would cause the unit to enter X-ray mode without the beam apparatus positioned to hit the tungsten target, leaving the patient as the target of 10-20,000 rads. Poor victims of these machines, ☢️ probably one of the longest, most agonizing ways to go! 😱
As a programmer, I've always been afraid of going into something as serious as the medical field. I'm not always 100% confident about the code that goes out, as I don't have testers; my code gets tested in the live environment. I can't have blood on my hands.
I work in industrial automation, and there is a similar issue for us when something is critical for human safety. Safety critical aspects of our programs must always be tested and validated by a third party, and I wouldn't have it any other way. That said, usually safety stuff is pretty simplistic and pretty much guaranteed not to fail even before testing. It's a whole other world working on a medical device like this.
And idk if you go to jail for it too, yeah.
Everyone will die in the end...
I took a safety class that presented an interesting perspective on the question of "can software fail". You seem to say yes. In the class, they claimed that software does not fail, because it always does what you tell it to. Whether what you told it was what you wanted, that's where you get problems. But that's not the software failing, that's you failing.
"can software fail?" Can it fail at what? Can it fail to do what we expect? Absolutely. Can it fail to do what it should do according to its instructions? Also yes, because rarely you can get a random error like a flipped bit in RAM, or even an error in the design of the CPU. So I would say software can fail either way you look at it.
Those are hardware failures though.
@@WilcoVerhoef Cosmic rays can also cause bit flips.
@@n1ppe Yup, that's the hardware failing
@@WilcoVerhoef Which in turn could cause the software to fail or do something it wasn't supposed to do, which could've been prevented if the software had been written more defensively. So software can fail and have glitches either way, so I don't understand your point.
I was actually in cancer treatment in 2023 for a tumor I have in my brain, and I've been in one of these machines. So scary how it could have gone wrong. Thank God I didn't live in the years when these machines were that dangerous.
I feel the enjoyment you put into this one, thanks for the great content LLL!
I had a great time with this one :) Thanks for watching!
This is exactly why I fight so hard as a programmer to employ good testing strategies, a testing plan is always better than a good lawyer in my opinion.
I'd not want people to die for my mistakes, I'd dedicate heart and soul to good software engineering.
8:35 Hardware interlocks and oversight should always be included.
“To make things cheaper”
Ah, another money-over-lives situation.
As a programmer myself, this was like watching a horror movie. Nowadays even the internet form you use to order socks has more automatic tests and testing process than that machine that X-rayed those people to death. Really, I cannot imagine the despair of being one of the ones who got that killing dosage :(.
Like... every programmer I know uses more or less defensive coding strategies. I just cannot imagine I would even allow the machine to emit that dosage in too short a timeframe. I am shocked.
As a Software Engineer, I can say the statement "Software Will Fail" is very true. The only real way around this is redundancy, and in software, that typically means multiple independently developed systems which must all agree on an answer for it to be executed.
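A minimal sketch of that idea (N-version programming with majority voting; the three implementations here are trivially small stand-ins): the result is only used if independently written versions agree, and disagreement fails safe.

#include <stdio.h>
#include <stdlib.h>

/* Three independently written versions of the same (toy) calculation. */
static long dose_a(long units) { return units * 4; }
static long dose_b(long units) { return units << 2; }
static long dose_c(long units) { return units + units + units + units; }

/* 2-out-of-3 vote: act only on a majority answer, otherwise fail safe. */
static long voted_dose(long units) {
    long a = dose_a(units), b = dose_b(units), c = dose_c(units);
    if (a == b || a == c) return a;  /* majority includes a */
    if (b == c) return b;            /* majority is b and c */
    fprintf(stderr, "no majority; refusing to act\n");
    abort();
}

int main(void) {
    printf("%ld\n", voted_dose(50)); /* all three agree: prints 200 */
    return 0;
}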
Unbelievable that they were so careless about such a critical piece of code.
Don't forget, in the 1970s even 16 kilobytes was a lot of DRAM. Devoting a few KBs to error codes or safety redundancies would have been a huge deal.
This machine was ahead of its time. The computers in 1986 were terrible; it's amazing they even attempted to make a software-controlled machine.
We had a whole unit on this in undergrad comp ethics. Crazy how this went on for so long.
I'm impressed enough that the one single developer writing assembly got anything to work to start with
Well, they also made an error when naming that thing. Therac-6 plus Therac-20 would be Therac-26. Classic off-by-one error.
"Therac-6 plus Therac-20 would be Therac-26. Classical off-by-one-error."
Good one
This situation sent a chill down my spine. It reminded me of the time when I was designing security and door opening systems and the fears I had of software bugs or electronic design flaws. The extended weeks I would leave one system working alone 24/7 while another system monitored it.
I can't believe that an industry shipping a machine with this kind of lethal potential would not test for it, or would even be tempted to eliminate fail-safe mechanical systems.
the virgin pre-production testing vs the chad testing in production
I beg of you, in cases like THIS, BE the virgin!
One of my Comp Sci professors, Clark Turner, was part of the investigation into the Therac-25 incident and I remember him telling us the story about how he and another person found the race condition that led to these people's demise. They wrote a paper about the investigation. Crazy stuff
Having a hobbyist write the program honestly isn't a huge error in my eyes, as I'm sure he was plenty skilled. What blows my mind is that it was never properly tested to ensure this type of thing was impossible. It doesn't matter how skilled you are at programming, you will make mistakes. We rely on others to help us catch them and correct them.
Hey, Boeing outsourced the MCAS coding to India for only $4 an hour; they saved a ton of money.
In this case, the programmer was programming in assembly. Assembly is an extremely difficult low level language that hobbyists should not be using to make medical devices with
True, but I don't think it was a hobbyist, unless he wanted to suffer. Because assembly...
@@kwiky5643 well, back then there wasn't anything else
@@EperkeDashh total lie. As a devout FOCAL programmer, you disgust me.
I'm putting exactly 0% on the developer.
The company that contracted them didn't do their due diligence, and you can't expect a solo dev to account for EVERY single edge case.
They chose to test in prod.
I'd put _some_ culpability on the programmer, but the majority of it definitely falls elsewhere, from the lack of a physical failsafe (compared to previous models) to the cultural perception that "software doesn't fail". The simple fact that these incidents were preceded by any error message _at all_ indicates that the software itself detected something amiss, it just wasn't capable of identifying specifically what or why.
Great Video!
I can also recommend Kyle Hill's video about this software bug :)
I was a surgical lighting service technician who spent 20 years and 6 months on the road and in the workshop repairing, designing, developing, and modifying imported equipment to meet local standards with occasional type testing. I learnt to no longer be surprised at how manufacturers used inappropriate materials, components, and mechanical and/or electrical designs that were sometimes fundamentally unsafe. Often, the worst features of a product would be forgotten and repeated a few equipment generations later, with each new model being fundamentally more complex, less reliable, more costly to own and with an ever shorter lifespan.
I'm so glad to have left the industry and hopefully all of my trailing liability behind.
Yesterday I took an exam for a computer science / electrical engineering course, and those race conditions were also part of it. Now I feel a little guilty for having somewhat skipped over that part.
Absolutely insane. Incredible video.
YO! Thanks Lewis
Hi Lewis!
Already saw Kyle Hill's video on the topic, but it's cool to see a more programming-oriented approach.
5:10 "Single hobbyist programmer alone in assembly" was all we needed to know
As someone who knew better and (like EVERYONE) was too "bored" to do testing on my software, I have now done a complete 180-degree turn, and testing is ALWAYS on my mind!
Test your software, people! Write A LOT of SIMPLE and easy-to-debug tests (because remember, tests are code as well and they may have bugs)! And try to think about edge cases!
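In the spirit of that comment, here's a sketch of what "a lot of simple tests" can mean in practice (the routine under test is hypothetical): one tiny assert per edge case, so a failure points straight at the broken case.

#include <assert.h>
#include <stdio.h>

/* Hypothetical routine under test. */
static int dose_is_valid(int rads) { return rads >= 1 && rads <= 200; }

int main(void) {
    assert(!dose_is_valid(0));    /* just below the lower edge */
    assert( dose_is_valid(1));    /* lower edge */
    assert( dose_is_valid(200));  /* upper edge */
    assert(!dose_is_valid(201));  /* just above the upper edge */
    assert(!dose_is_valid(-5));   /* nonsense input */
    puts("all edge cases pass");
    return 0;
}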
For me as a programmer for machinery, it is the typical "blame everything on the programmer" thing. It is normal that code has bugs, and you will never find all of them. Therefore mechanical and electrical safety locks have to be implemented to prevent such malfunctions. In this case the software didn't do much wrong; it even gave an error message. The main software problem was that it was possible to skip the message and continue.
The main problems in this case were the hardware lock removed from this newer model of the machine for cost reduction, and the decision of the management to let the customers continue using these devices even after more than one accident was reported with this machine type.
As a programmer myself, it's absolutely horrendous seeing stuff like this. There are so many people who don't consider what impact their code might have. For example, when your phone decides to force an update: I guarantee you people have died because they were using a flashlight and their phone decided to update at a critical moment. I almost had this happen to me once.
Schools really need to teach programmers something similar to engineering ethics; you really need to consider the most extreme cases for what you are doing. Someone's life is on the line.
“If something can go wrong IT WILL go wrong, sooner or later!”
What a great video! It's definitely wild that people once thought that software was invincible.
bro wtf why does it start before you even hit the start button, bro why doesn't it double check the conditions with something that dangerous and change if it notices new values!?!?
This highlights how even simple syntax errors can compile and run, but not work as intended. There's an old joke that only people who code will get, but it's hilarious because everyone who codes in multiple languages has had to contend with the differences in syntax:
if (GoNuclear = 1) {
launch_nukes();
}
else {
remain_chill();
}
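For what it's worth, modern C compilers will flag that classic: GCC and Clang warn when an assignment is used as a condition (-Wparentheses, enabled by -Wall), and the fix is a single character:

if (GoNuclear == 1) {   /* comparison, not assignment */
    launch_nukes();
}
else {
    remain_chill();
}

Some style guides also write the constant first, if (1 == GoNuclear), so a typo'd = fails to compile instead of silently launching.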
Underground explosives engineer here. If I made even a minor mistake (in millisecond timings with Primadet), people could die, or the blast would go terribly wrong. Note: everything on earth dies; the people would die anyway.
One example I have from my father's experience: he had assembled and installed robotic arms and the PLCs he'd designed at a car plant, and the programmer came in to do the software setup and calibration. My father made him aware the safety isolation switches hadn't been completed, but he was like "no, it's fine", so he proceeded to send inputs to the robotic arms, which also had in-production car bodies on the line. As you can guess, the arm slammed through the roofs of thirty cars while my father attempted to stop the incident; the following day those cars had "scrap" marked onto them.
So from how I understand things, it was an absolute miracle that more people didn't die from this... thing. Absolutely astonishing.
Shoutout Kyle Hill for covering this 2 years ago, his Half-life Histories series is phenomenal!
that's horrifying. Especially since so much of our society now depends on software.
Why would the machine be able to physically give such a lethal dose in the first place, regardless of the software... I mean, NO ONE is going to be prescribed such a high dose... ever!! 🙄
That’s almost as good as “an unspecified error has occurred.”
Legends test in production: OceanGate
Even in the ‘90s there was still that attitude of software doesn’t fail. Take the despatch software that the London Ambulance Service started using in the ‘90s (LASCAD).
I know hindsight's 20/20, but I don't know why there wasn't an event handler, even a basic one, so that nothing would happen without operator input. I understand this was one person, and I truly admire that they built this by themselves. They had to be under a shit ton of stress, because someone that talented should be able to foresee the issues with reading inputs prematurely.
Everything happened with operator input.
@@khatdubell I mean an explicit event handler, such as a button. The machine was doing stuff while she was still entering/correcting data. Nothing should have occurred until she was done entering the information and hit a commit button.
@@theinquisitor18 I see.
Good example of how dangerous OT bugs can be. If you're going to rely solely on software to control equipment, then you had better do some serious testing to make sure bugs that could kill someone don't exist.
2:20
It still said OP. MODE: TREAT X-RAY in the bottom left. Unfortunate that the operator didn't see or recognize that.
I am guessing that the video was a recreation and wasn't accurate to what was actually displayed on the machine, but I could be wrong.
"software can't fail" after it's been coded?? I literally cant imagine anyone ever thinking that. People had some hubris back then
The problem with bugs is that you would have to test every condition you didn't plan for. It's always some obscure condition that no one thought about that happens and causes the issue. There is no way to test every freak user accident. You can test code for function, but you can't test every user situation. There will always end up being some strange case where an operator did something you had not planned for.
Well said. In other words, reliance on testing can never deliver defect-free software. Instead, it's necessary to somehow _prevent_ errors in the first place.
Man, that sound when you said the rads scared the hell out of me 😂
"This error says the radiation delivered was either too high or too low..."
"I would greatly appreciate more specificity."
problem is definitely testing.
Please tell me the family sued the manufacturer for negligence
This is also the case I learned about in my university embedded systems class. A good case study in how a simple code error leads to a big problem.
The Therac-25: a mandatory study for all engineering students.
"Software can't fail" has got to be the single most terrifying thing to hear someone say when they're creating medical equipment.
I mean, even in hardware you don't rely on a single point of failure, why would you do any different for software?
I wonder what some software bugs from old times could do in software controlling the nukes. I mean those systems are pretty old...
Yeah, forget about "nuclear deterrence"; the only reason humanity has managed to refrain from starting a nuclear war yet is the collective fear of some off-by-one error in an old piece of COBOL code causing the missile to detonate right at the start. How embarrassing would that be!
But that can apply to software from the "new times" just as well...
As an IT specialist I can say we love redundancies... for this exact reason: a machine reserves the right to fail anytime it wants. So you need at the very least two failsafes.
This is why I won't ever code anything where human life is at risk
If I were the company, I would hire a software engineer who is verified and thinks about everything, not a hobbyist coding student.
@@309electronics5 It's not about the developer; the company didn't fully test the system. The right thing is to employ separate QA to handle these kinds of edge cases.
Except that verifications are usually a matter of paying the fee and sitting through the course and have next to nothing to do with competence.
@@309electronics5 They want money, sweet sweeeeet money, that's all. Sadly.
Hoare calculus and verification
This reminds me of a picture of a Tesla car recently that stopped functioning completely due to a software update error. Maybe it's time for humanity to wake up and realise that relying on software too much creates a fragile world that can collapse at any moment. We should maintain some mechanical aspects and use software to optimise, not to fully control, stuff.
As badly as the code might have been written, you've got to give it to the guy for actually putting in an error message that informs the operator that they are at fault and should rethink their settings.
It sounded like, from the video, he had gotten a lethal dose before the error message.
@@khatdubell Not what they were saying at all...
@@erikkonstas Sure about that?
ua-cam.com/users/clipUgkxNT7FBU-YOqzPFtTk6qApltMuSbuyyFMB?si=78CImXtDNe6S-y9-
Everything the operator entered was correct by the time the operator actually initiated the procedure.
I feel bad for this dude. He might not even have known they defeated the safety devices, and might not have been rad qualified or given a source to really check with, other than timing and position. And the year... Imagine him giving the hospital a hundred miles of docs and procedures to test, and they just went to production. Imagine the operator not even reading the manual for a device like that, or watching it work with the covers off to understand its operation.
Still though, get a code review at least. That said, if I were this dude's associate and he said "hey, look at this", I wouldn't admit my eyes had hit it unless I could tell him it was champion, bug-free. And even then, was the 8-second race condition in the docs? A reviewer might assume a quick click-and-done, sub-half-second race. What a tragedy all around.
The fact that they had no audio or video monitoring yet still proceeded with treatment is baffling.
that's so screwed up wtf
Why? That was a mistake... no one did it on purpose.
@@ishark7822 I phrased that wrong; I just think it's insane that bad code can literally kill in some cases.
@Tartarus144 It's insane that these corporations will do anything for profit, even if it puts someone's life at risk. I'd bet some employee asked for better testing and got his concerns ignored. I see that even today.
@@666pss agreed
This is not a programming mistake. This is the result of a poor software development process aka a software quality assurance issue. This issue should have been found during testing.
Paying attention at 2:30, I noticed the op mode still says x-ray. Oh no…
this is so reckless, I cannot comprehend it
The one person that wrote all the assembly and didn't test it managed to kill only 6 people!
Honestly, I'm very impressed.
haha. I mean, back in the day everything was written in assembly because programming languages weren't a thing yet.
@@woosix7735 agreed!
But just thinking about all that could go wrong, I wouldn't have had the shoulders for the job xD
Absolutely true
@@woosix7735 Programming languages have been a thing since the '40s, what are you talking about????
It was a student who was a hobbyist, what d'ya expect? A real engineer would have thought about everything.
Your explanation of the actual failure is not correct, as far as I can tell, reading a report into the incident. Firstly, you can't shape X-Rays (or any photon beam) with magnets. These magnets are used to shape the electron beam when in electron mode.
Secondly, the machine produces an electron beam as its native source of radiation, and needs a target in front of the beam to produce X-Rays instead, as well as a flattening filter which attenuates a lot of the beam energy. Producing X-Rays requires the maximum electron beam power, a power around 100 times greater than that used in electron mode. Therefore, the patient did NOT "receive X-Ray radiation at electron radiation doses", they actually received electron radiation at X-Ray radiation doses, since neither the target nor the flattener is placed in the beam (7:43)
To me the worst part of this was the removal of hardware interlocks. Software can NEVER be relied on 100%, even if it has been extensively tested. Physical switches and relays should ALWAYS be in place for safety-critical applications. If there had been hardware interlocks in place in the Therac-25, this would never have happened. Sure, the bug would still have been there, but the machine couldn't have hurt anybody, as the beam PHYSICALLY would not have been able to activate without the magnets in place.
For many systems today "hardware interlocks" are not feasible. It is not possible to implement, say, antilocking brakes or fly-by-wire systems with hardware performing the safety role. Or say a medication calculation software or patient medical record system. A wrong dose or a missing re-call for a patient that has cancer or the wrong patient's data shown to a doctor can all kill.
I used to work for an electronics company that designed and manufactured the electronics for Bobcat. They were obsessive about software testing. Every product had a 2" thick testing manual that took at least a week to perform. And any change to the software, no matter how insignificant, required full testing, and approval. Even if the software wasn't changed, but just recompiled for production, we were required to perform full testing and approval before use. Their argument was that it was a small price to pay to avoid a problem in the field with their customers.
Alternate title: How a simple loading screen could've saved 6 lives
These ppl wanna be saved