Why Japan's Moon Lander Crashed Due to An Unbelievable Computer Bug

  • Published 8 Sep 2024

COMMENTS • 3.1K

  • @mcarpenter2917
    @mcarpenter2917 1 year ago +3215

    That's what happens when you keep changing the software specs of a project. It's a bit hard to believe that they changed the landing site without rerunning the simulations.

    • @MarlinMay
      @MarlinMay 1 year ago +133

      This! This all day.

    • @pjotrtje0NL
      @pjotrtje0NL 1 year ago +120

      Your first remark is very true, and not just in an aerospace environment!

    • @Powertampa
      @Powertampa 1 year ago +218

      That's like releasing software without running unit tests right after the remote guy pushed ten thousand lines of code.

    • @ailivac
      @ailivac 1 year ago +228

      I feel like something in the sensor processing design isn't fundamentally robust enough if it can be this easily confused by real terrain features. Maybe they can add a second radar or lidar sensor for dissimilar redundancy or to differentiate unexpected yet real inputs from sensor faults.
      We all know what happens when you run a safety-critical algorithm on a single AoA sensor...

    • @Ni999
      @Ni999 1 year ago +58

      Exactly! Mission creep eats into the project timeline, and system tests degrade into delta testing for success instead of system testing for non-failure.

  • @kjgoebel7098
    @kjgoebel7098 1 year ago +902

    I'd like to see an episode of "Things KSP Doesn't Teach" about instrumentation. How air/spacecraft instruments work, their limitations and quirks, and how they can fail.

    • @JohnWilliamNowak
      @JohnWilliamNowak 1 year ago +39

      I'll second that.
      The Soviets had a number of uncrewed vehicle losses because they used ionic sensors to determine the orientation of the vehicle, which would fail on occasion. On the other hand, the gyroscopes aboard Apollo 13 held true despite being pushed well outside their comfort zone. Some sort of video about orientation sensors would be very enlightening.

    • @BabyMakR
      @BabyMakR 1 year ago +4

      Yes please!!! We need more of those videos please Scott.

    • @ferdievanschalkwyk1669
      @ferdievanschalkwyk1669 1 year ago +3

      Another vote. I see it in formula racing where drivers are having to "fail" various sensors to address issues with the power train.

    • @eekee6034
      @eekee6034 1 year ago +2

      Yep, me too. I think I'm aware of the issues already, but I'd like to know how different real sensors would be.

    • @Spacedog49
      @Spacedog49 1 year ago +6

      @ferdievanschalkwyk1669 As a former Formula 500 driver, I can say the fastest lap times were NOT the shortest-distance lines around the track. A computer simulation takes the shortest distance, while the faster drivers took a slightly longer, but faster path that defied logic.

  • @miroslavhoudek7085
    @miroslavhoudek7085 1 year ago +1123

    In my personal experience, people insufficiently care about aerospace software. I worked in a software company that worked for ESA and we were always pretty much ignored (e.g. in all presentations of our local space agency). But when some other company made a screw for a satellite, it was plastered all over their presentations. There were literal delegations going to take a look at the space-screw-producing machine. Such an interesting visit, you see, to a hall with machining equipment, clean rooms - that's the "space stuff" in the minds of people. Something you can touch and see. How do you brag about a company with people sitting at their PCs? Nobody cares. Even if these are the guys whose work ultimately decides whether these magical screws end up doing something or are splattered over the moon.
    I don't care about the publicity but it's the mindset. Everyone focuses on the aluminum this and titanium that - and software is always the afterthought. We can change that anytime. We can even send an update to space ... so, why should we think about it too hard? Bam!

    • @deang5622
      @deang5622 1 year ago

      Good point.
      And I think it is because it takes a higher level of intelligence and technical knowledge to understand software systems and the media and others can't understand it.
      You only have to look at any news article published by the mainstream media, on television, in newspapers and you will see the errors the journalists make, the incorrect use of terminology, the lack of detail and you walk away realising the news article has told you almost nothing.

    • @jayasuriyas2604
      @jayasuriyas2604 1 year ago +20

      oof

    • @windywaz
      @windywaz 1 year ago +137

      Boom! As a retired architect for space sensor payloads, I can say you are spot on. I watched management spend all sorts of money on convenience tooling but if SW wanted licenses for software production and testing tools, oh God, you got run through the gauntlet.
      So how many times must a company learn these lessons? Simple, once per program.

    • @Mernom
      @Mernom 1 year ago +55

      It's the same attitude all over the place. Games no longer ship out as completed projects... 'we can just patch it later'.
      Many other fields also do shit like this.

    • @B_dev
      @B_dev 1 year ago +19

      Software in general too

  • @ReverendTed
    @ReverendTed 1 year ago +563

    It continues to amaze me that we managed to safely land astronauts on the moon AND have them take off from the lunar surface and return home, several times. Obviously, having actual humans present makes a ton of difference, but the number of things that could have gone wrong but didn't is mind-boggling.

    • @MarlinMay
      @MarlinMay 1 year ago +231

      The brain is a wonderful flight computer.
      Lander: I'm going to land here.
      Human: Dummy, there's a rock the size of a McMansion there! Gimme manual control.

    • @a4d9
      @a4d9 1 year ago +169

      The first moon landing was saved by the astronauts: the automation on the lander was going to put it down in a field of big boulders.

    • @unflexian
      @unflexian 1 year ago +92

      think about it like this: humans have managed to control powered airplanes since the start of the 20th century, while autonomous aircraft have only just appeared in the last decade or two. humans are just that versatile

    • @raifikarj6698
      @raifikarj6698 1 year ago +31

      @MarlinMay I am howling; I pictured this in my head with an astronaut slapping their computer and calling it dumb.

    • @technocracy90
      @technocracy90 1 year ago +110

      One of the NASA research reports justified the cost and risk of sending human astronauts to the Moon with an allegory that says "Human brain is the most lightweight and easy-to-acquire real-time non-linear computer"

  • @maurice_walker
    @maurice_walker 1 year ago +481

    In their official debriefing, ispace actually admitted that it's primarily a (project / program) management issue, not an engineering issue. That gives me hope that they might actually learn something from this.

    • @rspawn
      @rspawn 1 year ago +22

      most underrated comment

    • @curtislowe4577
      @curtislowe4577 1 year ago +22

      Life imitates art: a common problem in the Dilbert comic results in utter failure.

    • @philkarn1761
      @philkarn1761 1 year ago +54

      It's almost *ALWAYS* a project/program management issue, not an engineering issue. This was also true for Mars Polar Lander and for Mars Climate Orbiter (the one that famously mixed up imperial and metric units).

    • @tomhenry897
      @tomhenry897 1 year ago +2

      Don’t bet on it

    • @SayAhh
      @SayAhh 1 year ago +14

      @Josh_728 Get with the program: in 2023, we measure things in bananas

  • @robertbarron7660
    @robertbarron7660 1 year ago +100

    It's very interesting that this is almost the exact reverse of the famous 1201 alarm on Apollo 11.
    In that case the computer restarted and generated errors on the astronauts' control panels. But because they knew that they were at the right altitude per the flight plan, they had confidence that they were still flying correctly, and Neil Armstrong brought the lander down safely.

    • @warrenpierce5542
      @warrenpierce5542 1 year ago +8

      The source of the 1202 and 1201 alarms was traced to the rendezvous radar, used for rejoining the command/service module, which was inadvertently left on at the same time as the landing radar, the only one needed for the descent phase. This overwhelmed the lunar module computer, but mission control knew it was still safe to land because of one man at Houston.

    • @robertbarron7660
      @robertbarron7660 1 year ago +5

      @warrenpierce5542 yes, when you go into the details then these are different cases. But in the abstract, in both cases the computer was confused because it got signals which were unexpected and didn't handle them well. In Apollo's case, the human was able to use additional information to recognize that the problem wasn't severe and in this case - there was no human.

    • @larrybud
      @larrybud 1 year ago +1

      @warrenpierce5542 In Mike Collins' excellent book, he mentioned the 1201 and 1202 weren't exactly "well known" issues. It took a bit of "looking up" (albeit quickly).

    • @richardmogie9675
      @richardmogie9675 1 year ago

      That second antenna wasn't inadvertently left on. I saw Buzz sheepishly confess in an interview that the engineers didn’t think the same way he did.

    • @purnachandran87
      @purnachandran87 1 year ago +2

      Just realized that manned missions are technologically easier (skill of pilot) than unmanned soft landings that are possible now due to the progress of software systems.

  • @_Mentat
    @_Mentat 1 year ago +620

    My experience of being a software engineer is that the code has to be tested every time. It's amazing how often things that can't go wrong do go wrong.

    • @hanskloss7726
      @hanskloss7726 1 year ago +6

      It is not the software that changed but the parameters of the flight. You may of course argue that the software was made for the particular landing zone, which I do not buy.
      I may be mistaken, as the video is the only source of my knowledge of the situation - sort of like this radar was. So you take a peek at the surface with radar and see this crater with it, or rather a human having visual would have seen the crater - the landing module saw just a point on the surface which was 3km higher than the previous point it peeked at. I suspect what they would have needed to do is to have more points that the radar is measuring, especially from a distance, and make an average out of them or use some other technique to see where one is. When much lower, this would also need to be done to see if there is no big stone occupying part of the landing zone. I suppose this last thing was eliminated by the assumption that the landing is going to be done on a flat empty surface by choice of the mission control. I suspect if they were landing on a water/liquid surface this radar error could only occur due to a massive tsunami - well, no water surface and no tsunamis, but a hard landing.
      Interesting to know all this tho, ain't it?

    • @simonmultiverse6349
      @simonmultiverse6349 1 year ago +27

      Been there! Written lots of software... made some unbelievable bone-headed mistakes, which are all *BLINDINGLY OBVIOUS* in retrospect. "This change is SOOOOOOOOOO OBVIOUS that we don't need to test it" ... ha ha ha... this is when reality bites you on the backside, informing you that you definitely *DO* need to test it again.

    • @simonmultiverse6349
      @simonmultiverse6349 1 year ago +4

      @hanskloss7726 HA! Then you discover it's high tide instead of low tide... maybe you simulated it with mean sea level but a mile away someone opened the sluice gates and there was a large wave from the reservoir... etc.

    • @roguedrones
      @roguedrones 1 year ago

      This moon lander crash is an example of space sabotage. Deliberate.

    • @hanskloss7726
      @hanskloss7726 1 year ago +1

      @simonmultiverse6349 low tide vs. high tide does not cut it here - the surface is still mostly flat, at least from a 5km perspective. The crater is a different story, so you need to have many points, possibly also a map? Not sure what is easier here, but their method obviously failed.
      We know this is not a shame - we all have been there....

  • @martinmacphee3262
    @martinmacphee3262 1 year ago +533

    Scott - great video as usual - thank you!
    But really this is not a software 'bug', is it? It's a systemic design and control failure. The software was designed to work as it did, but the specifications do not seem to have included passing over a crater like this. In other words the initial flight plan was intended to avoid this situation, and the software was designed to work within that flight plan.
    The first error was changing the flight plan without checking if the software could still function with the new one. The second error was not testing the software under the revised conditions it would have to work in.
    Both errors are symptomatic of inadequate control over change management.
    In other words, the flaw did not lie in the programming, but the organization's approach to change management.

    • @anotheruser676
      @anotheruser676 1 year ago +51

      ...and perhaps a Third error of the program disregarding the radar altimeter instead of querying it again. 'Say what? That result is outside of parameters. Please take your reading again'

    • @LezamaDamian
      @LezamaDamian 1 year ago +57

      I agree this probably shouldn't be called a bug. Requirements were not properly validated, so it's a failure in their systems engineering process.

    • @nosuchanimal6947
      @nosuchanimal6947 1 year ago +13

      came here to say that!
      also, even if the result later on would be inside parameters again: the device has already been proven to be unreliable. it might be an intermittent error, or it might be a bias that only on this occasion was noticed but existed all the time. revalidating system reliability would be a tough nut to crack on its own if it didn't come with a redundant 2nd and 3rd system, though it should have notified ground control and gotten an update/patch. to my understanding that is how system failures are generally resolved. i don't know if their mission profile put an artificial time delay on that to prepare for longer ranged versions, or what happened.

    • @TheSheepwall
      @TheSheepwall 1 year ago +22

      Haven't read the report so might be wrong, but if they use something like a Kalman filter, it is likely that they did not simply stop querying the sensor, but that the calculated variance associated with the sensor readings spiked. In that case, the sensor would still be queried, but is _effectively_ disregarded since the resulting effect on the output would be so low (due to the change in the assigned variance). Someone can correct me if I am wrong here.
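
The effect described in the comment above can be sketched with a one-dimensional Kalman measurement update. This is only an illustration under stated assumptions: the function name `kalman_update` and all numbers are hypothetical, and nothing here is taken from the actual flight software.

```python
def kalman_update(x_est, p_est, z, r):
    """One scalar Kalman measurement update.

    x_est: prior state estimate (here, altitude in metres)
    p_est: variance of the prior estimate
    z:     new measurement
    r:     variance assigned to the measurement
    Returns the updated estimate, its variance, and the gain used.
    """
    k = p_est / (p_est + r)            # Kalman gain, between 0 and 1
    x_new = x_est + k * (z - x_est)    # blend prior and measurement
    p_new = (1.0 - k) * p_est          # updated uncertainty
    return x_new, p_new, k


# Hypothetical numbers: the inertial estimate says 5000 m, the radar says 8000 m.
prior_alt_m, prior_var = 5000.0, 100.0 ** 2
radar_alt_m = 8000.0

# Radar trusted (sigma = 10 m): the estimate jumps almost all the way to the radar.
trusted_alt, _, trusted_gain = kalman_update(prior_alt_m, prior_var, radar_alt_m, 10.0 ** 2)

# Radar distrusted (sigma = 100 km): it is still "queried", but barely moves the estimate.
distrusted_alt, _, distrusted_gain = kalman_update(prior_alt_m, prior_var, radar_alt_m, 100_000.0 ** 2)
```

With the small measurement variance the gain is close to 1 and the estimate follows the radar; with the inflated variance the gain is close to 0 and the radar is effectively ignored, even though it is still being read.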

    • @sciencecompliance235
      @sciencecompliance235 1 year ago +35

      There's also the design of the spacecraft that has to be called into question, specifically the AD&C architecture. Relying on a single altimeter means that you can't verify the data with a redundant sensor. Since accelerometers and gyroscopes can't really capture things like topography from orbit, it's like flying with one eye. I don't know how much mass, power, and space another altimeter would have taken up, but perhaps a redundant altitude sensor, possibly one with a lower resolution and/or sample rate, could have been used to verify the data coming from the primary one.

  • @rhymereason3449
    @rhymereason3449 1 year ago +400

    It fascinates me that as you look at the history of disasters how many of them are ultimately caused by cutting corners to meet time pressures or budget targets. In this case you have to wonder (A) why the target zone was changed late in the game, and (B) why simulations with the new target zone weren't run. I would bet a dollar that engineers thought of it, but they were over-ruled because of time pressures or a budget target.

    • @aarondavis8943
      @aarondavis8943 1 year ago +7

      Your question (A) is a great one.
      It could be that the new landing site could be reached with less expenditure of propellant or something like that. They thought it was a lower margin of error. Or was it the opposite? Was there a "better" more ambitious site with more interesting geography?

    • @rhymereason3449
      @rhymereason3449 1 year ago +2

      @aarondavis8943 It is interesting to speculate. IMHO "less expenditure of propellant" would fit into theory about disasters and cutting corners to meet budget targets. On the "better geography" thought... unless an asteroid suddenly impacted an area close to their original site, one would think that the geography question would have been settled long ago... the lunar surface is pretty well documented (at least the front side).

    • @pierQRzt180
      @pierQRzt180 1 year ago +7

      Proverbs have a sort of statistical truth. "Haste makes waste" exists exactly because of that.
      The sad part is that seemingly we keep making the same mistakes.

    • @rhymereason3449
      @rhymereason3449 1 year ago +3

      @pierQRzt180 Yes it is sad to think about all the people who have lost their lives due to decisions on someone's part to save a few bucks by cutting corners. One of the latest examples appears to be that partial collapse of the apartment building in Davenport. Looking like the owner went with a cheaper contractor who would forego shoring up the building before proceeding.

    • @Beregorn88
      @Beregorn88 1 year ago +7

      And C) why there weren't redundant systems with a majority check before deciding to discard the most vital part of your data...
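
The "majority check" idea raised in this thread can be sketched as median voting across three independent altitude sources. This is a minimal sketch, assuming three comparable readings are available; the function name `vote_altitude`, the tolerance, and the readings are all hypothetical.

```python
from statistics import median


def vote_altitude(readings_m, tolerance_m):
    """Median-vote across redundant altitude readings.

    Returns the voted altitude and the indices of any readings that
    disagree with the median by more than tolerance_m.
    """
    voted = median(readings_m)
    suspects = [i for i, r in enumerate(readings_m) if abs(r - voted) > tolerance_m]
    return voted, suspects


# One sensor disagrees with the other two: it is flagged, the majority wins.
voted_fault, suspects_fault = vote_altitude([5010.0, 4995.0, 8000.0], tolerance_m=100.0)

# All sensors agree on a sudden 3 km jump: nothing is flagged, so the jump is
# treated as real terrain rather than as a sensor failure.
voted_crater, suspects_crater = vote_altitude([8005.0, 7990.0, 8000.0], tolerance_m=100.0)
```

The second case is the interesting one: with a single sensor a 3 km jump looks like a fault, but when independent sensors agree on it there is a basis for believing the terrain instead.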

  • @Nioub
    @Nioub 1 year ago +286

    There was a similar bug in the LEM: if the module had flown above a circular crater of a certain size, the radar altimeter would have shut off all propulsion, probably leading to a crash. Fortunately the bug was never triggered (mainly because the onboard crew had taken over manual controls at this point) and was found decades after the landings.

    • @alamrasyidi4097
      @alamrasyidi4097 1 year ago +7

      why are lunar manned missions not done anymore these days?

    • @jessepollard7132
      @jessepollard7132 1 year ago

      @alamrasyidi4097 Congress dropped funding, so NASA had no money for going to the moon (canceled the last planned 4 trips).

    • @vast634
      @vast634 1 year ago +80

      @alamrasyidi4097 No Soviets to beat

    • @dr.cheeze5382
      @dr.cheeze5382 1 year ago +11

      @alamrasyidi4097 isn't NASA planning to go back? Starting with an (unmanned?) mission sometime after 2024?

    • @alamrasyidi4097
      @alamrasyidi4097 1 year ago +4

      @dr.cheeze5382 so ive heard. but compared to the alternative of having to lose these spacecraft to software error, i think "no soviet to beat" is a ridiculous excuse. so i still really dont understand why lunar exploration has been strictly rover based these past few years...

  • @ksbs2036
    @ksbs2036 1 year ago +456

    About 30 years ago I had a single page photocopied from Computer World or some such industry publication taped to the outside of my cubicle. On that page was listed the ten most expensive software defects (bugs). I was astounded that the most expensive defects caused hundreds of millions of dollars of loss. When you read the list, you found out that the top five defects (again, multi-million losses) were all losses of spacecraft and/or their payload. Flight software is tremendously complex and a single error will cost you your whole vehicle and years of effort. Now that page would have to be scaled to near billions of loss, I expect.

    • @a.p.2356
      @a.p.2356 1 year ago +40

      Maybe not most expensive, but Therac-25 should be on that list somewhere. Ya know, because it ended up maiming and killing a bunch of people with intense doses of radiation.

    • @RoryMacdonald-pfff
      @RoryMacdonald-pfff 1 year ago +44

      There you go Scott - that’s an epic video right there. Top 10 most expensive Astro/Software defects.

    • @o0alessandro0o
      @o0alessandro0o 1 year ago +31

      @a.p.2356 In a way, that is possibly the most expensive software bug ever; in another, it's quite cheap. Consider: we know for a fact that cars kill people, all the time, in every way, yet we do not ban cars.
      The value of a human being's life has been calculated, and apparently it's cheaper than you would expect. Electricity production has a cost measured in lives per TWh. You can look it up. Biofuel has a cost of 12 people per TWh. Solar is 0.44. Wind is 0.15, and new/clear is 0.07.
      The average American consumes 0.1-0.2 GWh per year. In other words, over the course of your entire life you will likely kill less than one fiftieth of a person in order to keep the lights on. This does stack with the people you kill while driving, however - I'm talking about tyre particulates and excess deaths from pollution, not running somebody over.
      Ain't that grand?

    • @travelbugse2829
      @travelbugse2829 1 year ago +10

      @o0alessandro0o It's not easy to respond to that kind of information. I do know that training and regular checking of pilots contributes to a high level of safety for commercial aviation (ignoring mechanical failures). For drivers, I reckon that similar processes should be followed. It would not be popular among the general public, but I have said for years that licenses should be graded, based on years of experience and how many training courses a driver takes. Governments balk at the idea, however, and go on putting up cameras and roadside radars, more draconian speed limits, but never addressing the fact that poor situational awareness, slow and inappropriate reactions, and limited skills are the biggest factors in car accident rates. But I'm going down a rabbit hole!

    • @malbacato91
      @malbacato91 1 year ago +11

      Not strictly a bug, rather bad design; but implicit nullability - first introduced in ALGOL in 1965 and later copied into most programming languages - was famously called a billion-dollar mistake by its creator.
      I think I read somewhere that at the time the estimate was quite accurate, but that was 2009 so by now it wouldn't be surprising if it is an order of magnitude too low.

  • @sharizabel2582
    @sharizabel2582 1 year ago +35

    I flew fighters for over 20 years. The Kalman filter was the bane of the navigation and bombing solution. It would actually discount most of the updates I would insert. It thought it knew more than I did … it didn’t.

  • @johnbuchman4854
    @johnbuchman4854 1 year ago +157

    This is why you also have timers for expected milestones (earliest and latest time a milestone can be validly sensed). My background is that I worked on the Attitude and Articulation Flight Software for the Galileo and Cassini spacecraft when I worked at JPL. For a very simple and solid method they could have used what the Surveyor landers did.

    • @danrbarlow
      @danrbarlow 1 year ago +10

      Thanks for your awesome contribution to space science!

    • @nocturnal6863
      @nocturnal6863 1 year ago +9

      I'm sure mission control had a plot of the expected altitude changes, the lander may have had one as well. Problem is that the expected rate of change of the altitude, was outside what had been set as acceptable for the altitude radar. It was probably written in the specs somewhere. Proper simulation of the landing would have caught this, it could possibly even have been dealt with after launch. It's changing the landing site without simulating it that screwed them.

    • @nocturnal6863
      @nocturnal6863 1 year ago +2

      What did Galileo and Cassini use for altitude readings? and would they have been equally screwed if forced to switch over to gyro / accelerometer readings with an apparent failed altitude radar?

    • @u1zha
      @u1zha 1 year ago +7

      @nocturnal6863 John's point was that the "forced" switch is averted if the switch algorithm is completely disabled at such an early phase of flight. Reread the part about "earliest time.. a milestone can be validly sensed".

    • @nocturnal6863
      @nocturnal6863 1 year ago +2

      @u1zha except you wouldn’t disable the software monitoring a sensor for failure. Not unless you knew in advance it might give faulty readings at that point.
      Thinking further, I think I see what you are suggesting: that it should have been expecting the dip in altitude, and its failure to see it means it should have known its altitude was off.
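
The "earliest and latest time a milestone can be validly sensed" idea from the top of this thread could be sketched as a simple time-window check. This is an illustrative sketch only; `MilestoneWindow`, `classify_event`, and the timings are hypothetical and not drawn from Galileo, Cassini, Surveyor, or Hakuto-R.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MilestoneWindow:
    """Time window (seconds after descent start) in which a milestone is plausible."""
    name: str
    earliest_s: float
    latest_s: float


def classify_event(window, t_s):
    """Classify a sensed milestone by when it was sensed."""
    if t_s < window.earliest_s:
        return "too_early"   # sensed before it could physically happen: do not act on it
    if t_s > window.latest_s:
        return "too_late"    # should have happened already: estimate is suspect
    return "valid"


# Hypothetical window: touchdown is only believable between 3500 s and 3700 s.
touchdown = MilestoneWindow("touchdown", earliest_s=3500.0, latest_s=3700.0)

early = classify_event(touchdown, 3200.0)    # e.g. estimated altitude hits zero far too soon
on_time = classify_event(touchdown, 3600.0)
late = classify_event(touchdown, 3900.0)
```

Under this scheme, an estimated altitude of zero well before the earliest plausible touchdown time would be rejected as a milestone rather than acted on.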

  • @Papershields001
    @Papershields001 1 year ago +500

    I feel such compassion for the Hakuto-R team. They are going to accomplish it!

    • @serronserron1320
      @serronserron1320 1 year ago +22

      I hope that they can make a new one and land it on the moon in the next few years

    • @emileriksson76
      @emileriksson76 1 year ago +18

      I watched the landing live stream and I felt so bad for them. Their nervous faces really hurt me too. I bet they do it next time!

    • @abarratt8869
      @abarratt8869 1 year ago +17

      They may not accomplish it. Very often such incidents reveal a whole load of issues that have been swept under the carpet, and the necessary organisational change required to address them all can easily break a small team / organisation.
      Even big companies can be killed by this. This is what is going on in Boeing right now. They caused the crashes of two 737MAXes and killed people. Since then they've tried to institute root and branch reform of how they run their business. Yet, they're still having problems. The most recent one was a fuselage manufacturing defect (they were building them wrong) that had gone unnoticed for approx 700 airframes (yep they're flying, possibly with Southwest today!). Fine, they've found it, repairs needed, not immediately dangerous, but cannot be ignored.
      Trouble is the manner of them finding it was accidental; someone was in the right place, at the right time and realised what was going wrong. The issue is that, if despite the introduction of a root and branch reform about how they approach quality (= safety, reliability) they're still finding major issues by chance, then the root and branch reforms are junk and are not working. They should be finding such problems as part of a systematic continuous improvement process, and they're not. So the bet-your-life question is, what else have they missed, given that they've essentially admitted that they've not been looking hard enough?
      It's similar with 787 (fuselage barrel joints), brand new 737MAXs with FOD and rodent damage, etc.
      This suggests to me that Boeing are in no way adequately reformed following the MAX crashes, the problem most likely being in the senior management who never understood it before and are still there today. It's worryingly possible that they're going to make another fatal mistake. Ok, the FAA is now (belatedly) keeping a much beadier eye on Boeing, but they can't see and check everything; certification engineers / inspectors are not there to do basic QC and basic QC improvement.
      The Hakuto team's best bet, if they're to try it again, is to just fix that one core issue and try again, and do as much simming as they can muster. Unlike Boeing, crashes are just disappointments and money.

    • @99guspuppet8
      @99guspuppet8 1 year ago

      ❤❤❤❤❤❤❤❤❤❤ Yes they will succeed…… After they spend a lot of someone else’s money……… Let’s all go to Sugar rock Candy Mountain

    • @thePronto
      @thePronto 1 year ago +1

      But they launched knowing that their testing was invalid. Kinda like practicing parachute landings in a field, then jumping over water. I hope they don't ask me for a donation, because polite refusal often offends.

  • @subhakantagmail
    @subhakantagmail 1 year ago +6

    Finally the software bug is fixed and the Vikram lander from Chandrayaan-3 landed safely on lunar surface by ISRO. Hope most of the space agencies share data among themselves so that space progress is accelerated faster, instead of each one reinventing the wheel. Knowledge for Humanity...👍

    • @henrikibjensen3869
      @henrikibjensen3869 1 year ago

      Sorry, Humanity doesn't land on the Moon, nations do - or don't.

  • @user-jz1su8bh5t
    @user-jz1su8bh5t 1 year ago +114

    Another outstanding episode Scott! Being a software safety engineer for the last 39 years, I have to agree with previous comments that point out this is not a software bug, but more of a people problem during design, testing, management, etc. I believe the first Ariane 5 launch was a similar issue where the software worked perfectly per its specifications (from Ariane 4) and doomed the flight to failure. Like in this case, proper testing would have prevented the expensive tragedy. Also wanted to give a shout to "How To Destroy Wayward Rockets - Flight Termination Systems Explained". My 39 years were all spent on Range Safety Software with the last 13 years working on autonomous flight termination systems. That was another outstanding episode! Keep up the awesome work!

    • @Icowom2
      @Icowom2 1 year ago

      Pop op o99⁹9th kiwi's😊

    • @xGOKOPx
      @xGOKOPx 1 year ago +1

      It is a software bug though. People problem is that the bug wasn't caught

    • @vast634
      @vast634 1 year ago +1

      Have you ever experienced a flight termination system not working instantly, but 50 seconds late, as with the Starship launch?

    • @user-jz1su8bh5t
      @user-jz1su8bh5t 1 year ago +4

      @vast634 Depends on the type of Flight Termination System (FTS). For solid rocket motors, they use a shaped linear charge that opens the casing and exposes the fuel which burns up quickly in an impressive display. (I think Scott mentioned that in his previous video.) For chemical fuels, things are different. You have more choices. The basic idea is to stop thrusting the vehicle so it falls into an unpopulated area, such as a broad ocean area in the case of SpaceX. Based on the video of the flight, the FTS worked properly and detonated explosive devices that created holes in the fuel tanks. That reduced or stopped the fuel flow to the engines. The FTS did its job. After that, it's all physics. If the fuels are hypergolic, they will combust on contact and you get a near-instant explosion. Otherwise, you need combustible fuel, oxygen, and an ignition source. Guessing, it took about 40 seconds before the three elements came together in the right quantities in the case of SpaceX. An FTS doesn't need to create an explosion. Rather than connect to explosives, the FTS can connect to fuel valves that terminate fuel flow.

    • @user-jz1su8bh5t
      @user-jz1su8bh5t 1 year ago +2

      @xGOKOPx I understand your perspective. My point is that the bug should have been avoided during design or implementation, and if not, then detected during development testing. Find and correct all the bugs before deployment. Since their development testing failed to react properly to "unexpected terrain" (kind of a silly term considering the moon's terrain is pretty stable), the people failed in the software development cycle and left in a failure mode (i.e., the bug) so it could be exposed during execution. The software did what it was designed to do so it worked properly. The people failed to account for something. The same thing happens with hardware but folks don't usually blame the hardware. The failure of Galloping Gertie wasn't blamed on the bridge. The people who designed and built it were blamed for not accounting for potential wind loading.

  • @BeardyBaldyBob
    @BeardyBaldyBob 1 year ago +27

    I'd argue it's due to inadequate testing and making assumptions they shouldn't make rather than just blaming the software.
    To move the landing site and NOT run a series of full simulations for the new site is just an astonishing degree of incompetence!

    • @mcgilliman
      @mcgilliman 1 year ago +3

      This.

    • @BeardyBaldyBob
      @BeardyBaldyBob 1 year ago +5

      @@mcgilliman I like to think of an F1 analogy... Imagine if you set your car up to race in good sunny weather in Monaco at sea level, and they changed the race to be in Mexico in soaking wet weather at 2,260m above sea level... You would NEVER just race the car with the exact same set up and no testing before the race!!

    • @Myndale
      @Myndale 1 year ago +3

      True, but if history has taught us anything it's that the incompetence almost certainly wasn't the software engineers themselves and was instead a cumulative effect of multiple levels of bureaucracy repeatedly ignoring the recommendations and pleas of the people who actually knew what they were doing and what additional work had to be done. I suspect this is a scaled-down version of Challenger all over again, albeit thankfully with no loss of life this time.

  • @peterweston1356
    @peterweston1356 1 year ago +12

    Makes the Apollo landings even more amazing, considering the precision of the sensors and the computational resources available, both to simulate and to support the landing.

  • @hjalfi
    @hjalfi 1 year ago +46

    There's an argument to be made that if a sensor is critical enough that if it fails you're going to land on non-existent terrain 5km up, then you just assume it won't fail. If you handle failure gracefully but then don't have enough data to avoid crashing, what's the point of handling it gracefully?
    Of course, ideally you'd have a backup. Like another radar, or GPS, or a video camera capable of estimating height using machine vision and a map, so you can sanity check it. The next best thing is just have a map: the vehicle knows where it is, so if it knows the terrain it can estimate what the radar values _should_ be, so instead of going 'eek, a delta of 3km in ten seconds is clearly wrong' you go 'the radar has shown a delta of 3km in ten seconds, what does the map say the delta should be? Right, 3km, moving on'.
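
    The map-based sanity check described above can be sketched roughly as follows. This is only an illustration, not anything from the actual lander: the terrain profile, the tolerance, and the names `map_elevation_m` and `radar_is_plausible` are all made up for the example.

```python
from bisect import bisect_right

# Hypothetical terrain profile along the ground track: (downrange km, elevation m).
# The 3 km drop stands in for a crater rim; the numbers are invented.
TERRAIN_PROFILE = [(0.0, 0.0), (10.0, 0.0), (10.5, -3000.0), (20.0, -3000.0)]


def map_elevation_m(downrange_km):
    """Linearly interpolate the stored terrain elevation at a downrange position."""
    xs = [x for x, _ in TERRAIN_PROFILE]
    i = bisect_right(xs, downrange_km)
    if i == 0:
        return TERRAIN_PROFILE[0][1]
    if i == len(xs):
        return TERRAIN_PROFILE[-1][1]
    (x0, h0), (x1, h1) = TERRAIN_PROFILE[i - 1], TERRAIN_PROFILE[i]
    return h0 + (h1 - h0) * (downrange_km - x0) / (x1 - x0)


def radar_is_plausible(radar_alt_m, nav_alt_m, downrange_km, tolerance_m=300.0):
    """Accept the radar if it matches what the map says it should read here."""
    expected_m = nav_alt_m - map_elevation_m(downrange_km)
    return abs(radar_alt_m - expected_m) <= tolerance_m


# Before the rim: the inertial estimate says 5000 m above datum and the radar agrees.
print(radar_is_plausible(5000.0, 5000.0, 5.0))
# Past the rim: the radar jumps to 8000 m, and the map expects exactly that.
print(radar_is_plausible(8000.0, 5000.0, 15.0))
# The same jump over terrain the map says is flat would be flagged as suspect.
print(radar_is_plausible(8000.0, 5000.0, 5.0))
```

    The check is only as good as the map and the position estimate, so the tolerance would have to cover both map error and navigation drift.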

    • @stoic.little
      @stoic.little 1 year ago +4

      You can have a video camera that is very good at finding the distance by using phase detect autofocus, same principle as a rangefinder.

    • @driedurchin
      @driedurchin 1 year ago +5

      I work in flight software and you're right. At a certain point, if a system is so critical and irreplaceable, you just have to trust it won't fail because, as you said, detecting the failure isn't helpful if you're SOL.

    • @Spillerrec
      @Spillerrec 1 year ago

      There is an argument to be made that if a $90 million project can go up in smoke due to a single sensor failure, and you have any expectation that the sensor could fail, you should really have some sort of redundancy, even if failure is unlikely. Or some other form of backup plan. The question is whether anyone actually considered that this sensor could fail, or whether it just used the same failure detection and handling as any other sensor without further consideration.

  • @dmacpher
    @dmacpher 1 year ago +243

    Such a bummer that an error correction filter with an edge case nailed them. Lots of amazing data and at least it’s a software fix!

    • @sliceofbread2611
      @sliceofbread2611 1 year ago +41

      Cliff case*

    • @dmacpher
      @dmacpher 1 year ago +3

      🎢

    • @thePronto
      @thePronto 1 year ago +11

      Edge case? A crater on the moon? But it's not just a software fix is it? Or are we talking about a KSP do-over?

    • @slcpunk2740
      @slcpunk2740 1 year ago +17

      Seems a pretty basic error. In what universe did they think they could figure out the exact altitude without the radar? Even if it was broken, too bad: damned if you do, damned if you don't.

    • @dmacpher
      @dmacpher 1 year ago +5

      @@thePronto They moved their landing site to align with NASA South Pole targets super late in development (post validation). The threshold for culling/re-baselining seems to be the issue. The sudden change in relative altitude wasn’t expected from their simulations.

  • @ezequielblanco8659
    @ezequielblanco8659 1 year ago +41

    Being a software developer, I have seen this happen countless times in multiple companies. Software is often overlooked. Testing is usually considered redundant and a waste of time/money. Developers' warnings and requests are normally disregarded or displaced by other departments' concerns, which are non-technical and even non-functional.

    • @old_guard2431
      @old_guard2431 1 year ago

      In my experience the software developers/engineers are kept out of the decision-making inner circle. Actually, this goes for engineering/tech in general. It’s fine, just change this, this and that: what’s the worst that can happen?
      (Changing the Moon’s landscape to more closely resemble a seedy neighborhood in Brooklyn, one spacecraft at a time.)

    • @harshu2651
      @harshu2651 1 year ago

      Even after full testing, I still fear my code will break in some case we haven't looked at 😂. It's scary for a space mission.

  • @regolith1350
    @regolith1350 1 year ago +104

    Software may have been the proximate cause but you can argue the real problem was somewhere in the development and quality control procedures. How can you not re-run a full landing simulation after changing the landing location? It reminds me of Starliner's problems in 2019. The software glitch where the flight computer grabbed the wrong "time" was the proximate cause, but the real problem was Boeing never ran a full end-to-end launch simulation.

    • @srinitaaigaura
      @srinitaaigaura 1 year ago +3

      Actually these days so much of manufacturing and coding is outsourced that the management, hardware and software teams are no longer next to each other - quality control begins to suffer massively. The more people outsource stuff, the more the work gets into the hands of rookies paid on cheap wages, who then end up making rookie mistakes that then require even more time and energy to fix. Boeing turned from an engineering firm to a management firm and the rest is history - 787, 737 max, 777x, Starliner.
      And as more and more automation comes in there's less and less human intervention to take care of the times where the computers reach their limits.

    • @user-cr4sc1ht9t
      @user-cr4sc1ht9t 1 year ago

      Feels like they might not have great CI indeed, probably more like a bunch-of-artifacts-in-Git-LFS type of management. But the Starliner glitch might be a slightly different topic IMO.

    • @BubblefishOfTrem
      @BubblefishOfTrem 1 year ago

      I was also wondering how expensive such a simulation would be. If they aren't too expensive, I was wondering if you couldn't run landing simulations from randomized positions and flag anomalies from there. Not so much that you can just fling the lander at the moon arbitrarily, but more so you can find starting conditions which result in something weird.
      IDK, maybe we're getting into a space where "moon lander software testing" and later "asteroid lander software testing" might be a market, that would be amazing. With the costs of these missions, there might be some money on the line for a testing company - especially if they end up with a body of "known problematic situations" like the one from the video.

    • @MrJdsenior
      @MrJdsenior 1 year ago +2

      How can you put a tank that has experienced both problems and damage in test into Apollo 13? Exactly like that, only different. Or get km and miles crossed up and smash a probe into Mars (IIRC), or ... ad infinitum.
      You can run all the simulations in the universe and still have problems, but not running ANY sims to cover a deviation in the program...yeah, that's just begging for it. I would think, in this day and age, that you could pretty much run that sim real time in parallel with the mission, for the problem they had there, knowing the path and surface profile, I'm guessing, and have it fire up a quick "do not ignore the damned properly functioning radar" command, or some such. It might even be good to have REAL TIME simulations running against the truth of the mission.
      Having done some aerospace hardware design, I'm guessing that there were schedulers and/or bean counters directly in the problematical loop. Or maybe idiotic MBA wielding managers that think they are engineers, or worse know BETTER than the engineers, because they know a few buzz words, and then maybe hold people's feet to the fire to get them to sign off on VERY cold Shuttle launches, or what have you. That's the sort of feedback you do NOT want in, say, a servo. :-/ Sometimes I look back and am glad I am retired, frankly. Some of it was fun, some of it SUCKED.
      Doc requirements come to mind as some of the latter. I had one junior documentation fiefdom wannabe tell me that the real output of a program was the documentation. When I finally quit laughing I told her that if she actually believed that she should go talk to some F16 pilot and ask them which they'd rather have with them on a mission, a working LANTIRN pod, or the documentation that describes it. She wasn't happy, because then a couple of people standing around laughed too. She wasn't a nice person (that's putting it mildly), or I wouldn't have said it that way. My bad, I guess.

    • @i-love-space390
      @i-love-space390 1 year ago +1

      Armchair quarterbacks are a dime a dozen. You can certainly crow if you ever land a vehicle on the moon or even achieve orbit. Perhaps we can talk about "how obvious" the solution was when we stop whining about how LONG it takes to build and fly a vehicle and how the contractors are "milking the American public" for so much money.
      I thank Providence every day for Kathy Lueders and NASA for riding herd on SpaceX to make the Dragon 2 safe. Everyone had lots of criticism for NASA for being conservative and "delaying" the first launch of the manned spacecraft. But all that effort kept the astronauts safe. (Also SpaceX had a real leg up on Boeing, because they had a working cargo spacecraft in Dragon 1 to build on. The last time Boeing designed a manned spacecraft was the 1970s and the Space Shuttle. All those engineers are long since retired.)

  • @IsMaski
    @IsMaski 1 year ago +127

    Unfortunate to see what led to the failure of this mission. But glad to see that they have found the issue. Really hoping they succeed on their next attempt. Thanks Scott for the comprehensive explanation on this!

    • @MrPaxio
      @MrPaxio 1 year ago +3

      They didn't find the issue, they made the issue.

    • @MonkeyJedi99
      @MonkeyJedi99 1 year ago

      Sounds like the software took the path of flat-Earth "science".
      What I see doesn't fit my preconceptions, ignore it!

    • @togowack
      @togowack 1 year ago

      People need to wake up, controversy surrounded moon landings because there is stuff there. The issue / bug was in there on purpose. They will probably never let us see the real moon.

    • @davidbeppler3032
      @davidbeppler3032 1 year ago +1

      They did not find the issue. The issue was management. The software was fine. Software did not change the landing location, management did.

    • @togowack
      @togowack 1 year ago

      @@davidbeppler3032 The whole thing was planned. It is every time, with every country. Why do people not see this? Every single machine that lands on the moon has issues: #1 because the surface is covered in glass domes and other hanging debris, #2 to cover up such things from the public in a convincing way.

  • @AMeierhoefer
    @AMeierhoefer 1 year ago +13

    Scott, I am surprised that you did not touch on redundancy. I was a fighter jet aviator and one of the things we always did was use multiple sensors to allow the software to compare and then estimate probability. If they had three radar altimeters they could see the rate of change of the surface as the spacecraft travels. Even if each had shown the cliff, a probability calculation would have told it that it is virtually impossible for all three to go bad at the same moment. Redundancy would be one answer in my book.
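
    The "virtually impossible" part is easy to put numbers on. A minimal sketch, assuming the sensors fail independently and using a made-up per-sample fault probability (`p_single` is not a real altimeter spec):

```python
def prob_all_fail(p_single, n_sensors):
    """Chance that n independent sensors are all faulty at the same instant."""
    return p_single ** n_sensors


p = 1e-3  # hypothetical per-sample fault probability for one altimeter
for n in (1, 2, 3):
    print(n, prob_all_fail(p, n))
```

    The independence assumption is the weak point: a shared power supply or a shared software bug is a common-mode failure and does not multiply out this nicely.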

    • @thierrybriand2413
      @thierrybriand2413 1 year ago +1

      Agree and also on my part, I always thought that radar altimeters were used « closer » to the surface.

    • @drill_fiend1097
      @drill_fiend1097 1 year ago

      Probably budget constrained.

    • @AMeierhoefer
      @AMeierhoefer 1 year ago

      @@drill_fiend1097 This is a commercial effort so they could have just gotten one normally used in aircraft. It's not NASA where they cost $750K each just because...

  • @perishmokrat8257
    @perishmokrat8257 1 year ago +18

    Working as a software tester, I often see managers accept the risk of malfunctioning SW to save some money, especially when it comes to error handling.

    • @Henglaar
      @Henglaar 1 year ago

      Which is a shame, really. The more expensive the project, the less management should feel like cutting corners on error handling and verification. Ah, well, what "should" happen in the real world doesn't agree closely with what actually happens in the real world.

  • @CodeKujo
    @CodeKujo 1 year ago +53

    My reaction to just the title is "There are no unbelievable computer bugs".
    Now that I've watched the video: *very* believable. Accumulation of error is nasty and dead reckoning is very hard. Changing something that "can't possibly affect the outcome" late in the process and not doing a full test happens often enough that it's a subject of comic strips and many high profile failures.

    • @Hebdomad7
      @Hebdomad7 1 year ago

      Except the one that flew into one of the first computers and caused a short circuit.

    • @Ergzay
      @Ergzay 1 year ago +2

      Scott's been moving to more and more clickbait titles of late. It's unfortunate to see him doing it.

    • @winebartender6653
      @winebartender6653 1 year ago

      When you're using accelerometer and gyroscopic data alone for position on a 2D plane, it can become hilariously inaccurate quickly, no matter how good your algo is.
      Doing this in 3D space would be basically impossible if I'm being honest.
      As an example, there is a reason VR relies so heavily on video processing for limb positioning. Obviously these aren't in the same ballpark of cost/importance, but the same rules apply.
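
      To give a feel for how quickly pure inertial navigation drifts, here is a small sketch that double-integrates a constant accelerometer bias. The bias value (0.01 m/s^2, about one milli-g) is an assumption picked for illustration and says nothing about the lander's actual IMU.

```python
def position_error_from_bias(bias_m_s2, duration_s, dt_s=0.01):
    """Double-integrate a constant accelerometer bias; return position error in metres."""
    velocity_err = 0.0
    position_err = 0.0
    steps = round(duration_s / dt_s)
    for _ in range(steps):
        velocity_err += bias_m_s2 * dt_s      # bias accumulates into velocity
        position_err += velocity_err * dt_s   # velocity error accumulates into position
    return position_err


for seconds in (10, 60, 600):
    err = position_error_from_bias(0.01, seconds)
    print(f"{seconds:4d} s -> {err:8.1f} m")
```

      The error grows with the square of time (roughly 0.5 * bias * t^2), so ten times longer means about a hundred times more drift, which is why an external reference such as a radar is normally needed to rein it in.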

    • @VarenRoth
      @VarenRoth 1 year ago

      The unbelievable part here, honestly, is how someone expected this to work without simulating the actual final flight plan at least once.

    • @CodeKujo
      @CodeKujo 1 year ago

      @@winebartender6653 US missile submarines can pull it off, but their inertial navigation hardware is larger than the entire lunar probe and submarines experience much smaller accelerations.
      It does seem like it was selected as a fallback with rather optimistic expectations of how well it would stay accurate. In hindsight, it would have been better to try turning the radar off and back on, relying on inertial navigation only as long as it took the radar to come back on. Also, redundant radar.

  • @i_Kruti
    @i_Kruti 1 year ago +2

    7:50 Yeah, the Vikram lander from Chandrayaan-2 lost communication and went out of control, but with improvements in software, dampers, etc., we are again ready for Chandrayaan-3 to attempt it in July, according to the official announcement...

  • @dust1209
    @dust1209 1 year ago +25

    This reminds me of an Alastair Reynolds novel where an automated system recorded the sudden vanishing of a planet but disregarded the data because the event was so far out of expected results that it assumed there was some kind of fault.

    • @letsburn00
      @letsburn00 1 year ago +5

      It then accidentally creates a cult.

    • @yogiwp_
      @yogiwp_ 1 year ago

      Which novel is this?

    • @dust1209
      @dust1209 1 year ago +7

      @@yogiwp_ Absolution Gap, it's the third book in the Revelation Space series which is kind of weird. If you're looking to check out the author, I'd recommend Pushing Ice!

    • @ShoeTheGreyCat
      @ShoeTheGreyCat 1 year ago

      @@letsburn00 And also liquifying the poor guy's wife stuck in the scrimshaw suit

    • @letsburn00
      @letsburn00 1 year ago

      @@ShoeTheGreyCat I forgot about that bit. Given that series largely relates to characters that are functionally aging immortal, it's wild how easily they torture and kill each other.

  • @kennethng8346
    @kennethng8346 1 year ago +96

    I've never done it, but from what I have read, sensor fusion is an enormously complicated and fuzzy technique. You have to take a bunch of sensors, account for nonlinearities and malfunctions, and you need to figure out which ones are correct, which ones are sorta correct and by how much, and which to ignore. On top of this you have enormous weight and power constraints. And there must be a million fudge factors that have to be played with. Move it one way and you get a false positive, move it the other way and you get a false negative.

    • @andrewahern3730
      @andrewahern3730 1 year ago +3

      I wonder if this would be a good application for AI? A computer would definitely be able to interpret way more inputs than a human pilot ever could and in real time

    • @JKa244
      @JKa244 1 year ago +1

      It's a satisfying problem to work on.

    • @Niosus
      @Niosus 1 year ago +28

      @@andrewahern3730 AI isn't a magic fix. Those sensor fusion algorithms are supported by a deep understanding of the system and statistics. Like with the Falcon 9, they are extremely reliable once properly tuned.
      Obviously an advanced enough AI system can always do the job. But if, like in this case, you simply didn't test the system with enough variations of inputs, you're not going to get good results either. The amount of simulations needed to properly train the AI would also have been plenty to find this bug in the old control code.
      The lesson here is that more robust testing is needed. I have a feeling that spaceflight is often seen as hardware-first. That's understandable, but without proper software the hardware is useless. I think more modern software engineering practices could be useful here.

    • @Orieni
      @Orieni 1 year ago +4

      IRL, nothing says you can’t have false positives and false negatives at the same time, while you struggle to understand the data. That’s no fun at all.

    • @GeorgeTsiros
      @GeorgeTsiros 1 year ago +6

      Kalman filtering is pretty damn straightforward. It's a basic method, not something extraordinary. It has been known for more than 50 years and is optimal for typical sensors (i.e., those with the usual, roughly Gaussian noise distribution).
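
      For readers who haven't seen one, a scalar Kalman filter really is only a few lines. The sketch below estimates a constant altitude from noisy readings; the noise levels, the true altitude, and the function names are invented for the example and have nothing to do with ispace's flight code.

```python
import random


def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: prior estimate and its variance
    z:    new measurement
    q, r: process and measurement noise variances
    """
    p = p + q              # predict: state assumed constant, uncertainty grows a little
    k = p / (p + r)        # Kalman gain: how much to trust the new measurement
    x = x + k * (z - x)    # move the estimate toward the measurement
    p = (1.0 - k) * p      # uncertainty shrinks after the update
    return x, p


def run_demo(seed=0, true_alt=1000.0, sigma=20.0, n=50):
    """Feed n noisy readings of a fixed altitude through the filter."""
    rng = random.Random(seed)
    x, p = 0.0, 1e6        # deliberately poor initial guess with a huge variance
    for _ in range(n):
        z = true_alt + rng.gauss(0.0, sigma)
        x, p = kalman_step(x, p, z, q=0.01, r=sigma**2)
    return x, p


x, p = run_demo()
print(f"estimate {x:.1f} m, variance {p:.1f} m^2")
```

      The hard part in practice is less the filter itself and more choosing `q` and `r`, and deciding which measurements to let in at all.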

  • @bretthoffstadt
    @bretthoffstadt 1 year ago +4

    I can't believe they didn't simulate their final landing site but that's what you are saying. Thanks for the explanation. Such a shame, they picked the wrong thing for a shortcut!

  • @connecticutaggie
    @connecticutaggie 1 year ago +6

    Yea, that is the challenge of small projects with limited resources. It is great that this is not a problem for larger projects (cough-cough-Starliner) that have the money and resources to allocate to proper SW verification.😆

  • @Alex-og3ev
    @Alex-og3ev 1 year ago +55

    A similar thing happened in 2017 with the second launch from the new Vostochny cosmodrome: old software logic applied to new geography without a double check. It didn't happen on the first launch because that used the very rare Volga upper stage, but the second launch was in the default configuration that had flown for decades from launch pads everywhere, including South America. After final separation, the Fregat upper stage was scheduled to make a 10-degree counter-clockwise turn, but due to the geography of the new cosmodrome and the flight trajectory, the software decided that it needed a 350° clockwise turn instead. Didn't end well. It turned out that there was a narrow set of input parameters that could make the upper stage behave like this, and the new launch pad hit the jackpot.
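
    The 10° versus 350° confusion is the classic angle wrap-around trap. The sketch below is not a reconstruction of the Fregat software, just a generic illustration of how a raw angle difference goes the long way round when the two headings straddle the 0/360 boundary; both function names and the sign convention are assumptions.

```python
def naive_turn_deg(current_deg, target_deg):
    """Raw difference without wrapping; misbehaves across the 0/360 boundary."""
    return target_deg - current_deg


def shortest_turn_deg(current_deg, target_deg):
    """Signed smallest rotation from current to target, in [-180, 180).

    Positive means counter-clockwise under this sketch's convention.
    """
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0


print(naive_turn_deg(355.0, 5.0))     # -350.0: the long way round
print(shortest_turn_deg(355.0, 5.0))  # 10.0: the intended small turn
```

    Any launch azimuth that never crosses the boundary would pass every test with the naive version, which fits the "narrow set of input parameters" description.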

    • @JohnMullee
      @JohnMullee 1 year ago

      Wasn't there something about thermal modelling and pipes freezing in the Fregat upper stage? Or am I misremembering?

    • @Alex-og3ev
      @Alex-og3ev 1 year ago

      @@JohnMullee No, that was definitely some other story

  • @henrymalone422
    @henrymalone422 1 year ago +7

    Been watching you since 2015! You have helped keep me interested in space flight! Thank you for doing what you do, Mr. Manley.

  • @ns219000
    @ns219000 1 year ago +75

    Japan, sorry for your loss, but thanks for the software design lesson. Rockets are hard and this is how we learn. Thanks for sharing this one, Scott!

  • @glennpearson9348
    @glennpearson9348 1 year ago +35

    Excellent explanation, Scott. Thanks for putting it all together for us to easily digest. Nice Kerbal recreation, too!

  • @AllAmericanGuyExpert
    @AllAmericanGuyExpert 1 year ago +2

    My Dad helped design the Apollo lunar landing software ... and curiously enough, it was never used due to a sensor overload ... the famous DSKY error 1202. When Neil Armstrong disabled my Dad's software for Apollo 11, that was the end of it. The LM landing program was always over-ridden by future LM pilots and the LM was landed manually. The fault was in a completely unrelated system ... I guess a lot of people wonder if it would have done its job. My dad says it was pretty robust and he never saw a simulation that it would have failed if given the chance to run to completion.
    It's a good thing Armstrong was a good pilot!
    My dad would go on to be famous for mockups, and then later, he worked on the avionics of the world's most capable fighter jet. He's getting old, but still with us. I wish he was more of a storyteller ... but the one he thought was the funniest (and most irrelevant) was meeting the president in the restroom at NASA ... as in, _um, nice day, isn't it Lyndon?_ as they conducted their business. I am guessing it was during LBJ's visit to Houston in 1968, the same time frame that my dad was working there.

    • @PT-xi5rt
      @PT-xi5rt 1 year ago

      You still believe in this fable? Open your eyes

    • @AllAmericanGuyExpert
      @AllAmericanGuyExpert 1 year ago

      You @@PT-xi5rt didn't know that LBJ was president? Or that he used the bathroom like the rest of us?

  • @chouseification
    @chouseification 1 year ago +70

    hey Scott - thanks for the analysis. I remember this one (as well as the Israeli and Indian ones) and seeing the disbelief in the control room was sad. It is easy to tell who has a clue and who is a bureaucrat by their expressions, etc. :P

    • @adarsh4764
      @adarsh4764 1 year ago +3

      Hope there's no software issue when NASA lands back on the moon!😂

    • @chouseification
      @chouseification 1 year ago

      @@adarsh4764 agreed - one would have thought that even a small lander would have a pretty robust navigation system these days, but obviously they met an edge condition they hadn't properly tested for... and a sad oversight too as nearly all landing trajectories will have the radar return affected by craters you're passing over. There are many of them after all, and although most are small, many are large/deep and you need to keep their profile in mind as you use the radar/laser/etc surface measurement.
      The state vector routine needs a sanity check to make sure the drift never disagrees from projected too much without it doing some form of reliable recheck.

  • @Hagop64
    @Hagop64 1 year ago +29

    If it stopped to a speed of 0, then fell to a speed of 500 km/h, then it would have had to fall for ~86 seconds. Moon gravity acceleration = 1.62 m/s^2. That means it was in free fall for a distance of about 6.0 km. That's all based on the "500 km/h" crash speed given.
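
    The arithmetic above checks out. A short worked version, using only the two numbers from the comment (500 km/h and 1.62 m/s^2) and the constant-acceleration formulas v = g*t and v^2 = 2*g*h; the function name is just for this example:

```python
MOON_G = 1.62  # m/s^2, the value used in the comment


def freefall_from_impact_speed(impact_speed_kmh, g=MOON_G):
    """Return (fall time in s, fall height in m) for a drop from rest in vacuum."""
    v = impact_speed_kmh / 3.6        # km/h -> m/s
    fall_time_s = v / g               # from v = g * t
    fall_height_m = v**2 / (2 * g)    # from v^2 = 2 * g * h
    return fall_time_s, fall_height_m


t, h = freefall_from_impact_speed(500.0)
print(f"{t:.0f} s, {h / 1000:.1f} km")  # prints "86 s, 6.0 km"
```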

    • @scottmanley
      @scottmanley 1 year ago +46

      Actually, I figured out 500 km/h based upon the amateur radar measurements of 88 seconds of freefall.

    • @Hagop64
      @Hagop64 1 year ago +11

      @@scottmanley Love how reliable basic physics equations are! With either bits of data it still comes up with the same results! If only the rest of landing on the moon were that simple.

    • @travelbugse2829
      @travelbugse2829 1 year ago

      What I want to know is how that equates to a violent impact on earth. Do I divide by six, which comes to 83.3km/h or just under 52mph? That's bad enough for it to need airbags...

    • @highdefinist9697
      @highdefinist9697 1 year ago +3

      @@travelbugse2829 You multiply by the square root of six - assuming there is no air resistance, so with air resistance you might end up with something not too different from 500 km/h for this type of vehicle.
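
      The square-root scaling comes from v = sqrt(2*g*h): for the same drop height, impact speed goes with the square root of gravity. A quick sketch, ignoring air resistance and assuming a drop height of about 5954 m (the height that gives roughly 500 km/h on the Moon) and Earth gravity of 9.81 m/s^2:

```python
import math

MOON_G = 1.62    # m/s^2
EARTH_G = 9.81   # m/s^2


def impact_speed_kmh(height_m, g):
    """Impact speed after a vacuum drop from rest, in km/h."""
    return math.sqrt(2.0 * g * height_m) * 3.6


height_m = 5954.0  # assumed drop height, chosen to give about 500 km/h on the Moon
moon = impact_speed_kmh(height_m, MOON_G)
earth = impact_speed_kmh(height_m, EARTH_G)
print(f"Moon {moon:.0f} km/h, Earth {earth:.0f} km/h, ratio {earth / moon:.2f}")
```

      In vacuum that works out to somewhere around 1,230 km/h on Earth, which is why air resistance matters so much for the real answer.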

    • @Kromaatikse
      @Kromaatikse 1 year ago +4

      @@travelbugse2829 When it comes to the moment of impact, 500 km/h is 500 km/h. It's about Mach 0.4 at sea level. You know those old war movies where they show fighters shot down and augering in? *That.*

  • @gonun13
    @gonun13 1 year ago +4

    Putting aside changes in mission plans, missing redundant systems, or even software bugs, I think the main issue here is overly strict programming. Assuming something is defective just because of a sudden change that is out of scope is a bit extreme. It baffles me how it could hover waiting for the moon while letting propellant go to zero without at some point trying to salvage itself with something like "this is not working, maybe I should take another look at that system I think is dead".

    • @ahadsuleymanli9572
      @ahadsuleymanli9572 1 year ago +2

      What you're describing is human decision making, where you're ready to scrap the plan and try something better when the moment comes. You can't just imagine every scenario branching out at every step and hard-code solutions to each. At some point you'll realize you need a generic decision-making algorithm. In fact, the mission failed because they had a specific solution, switching off a reading, since that had given them success in previous simulations.

  • @bobboonstra3484
    @bobboonstra3484 1 year ago +71

    Not a software bug, it was a design bug. The software functioned as specified.

    • @pigsnoutman
      @pigsnoutman 1 year ago +2

      How do you know? Did you read the design spec? If the design spec stated it should be able to handle multiple lunar landing locations, then it's not a design spec issue.

    • @simongeard4824
      @simongeard4824 1 year ago +2

      Definitely a process bug that this wasn't picked up in testing - but premature to say that it wasn't also a software bug.

    • @marcusdirk
      @marcusdirk 1 year ago +1

      @@pigsnoutman 6:17

    • @DavidEsp1
      @DavidEsp1 1 year ago +1

      Mismatch at Requirements and/or Expectation levels. Activated by beyond test envelope operation. Needed a calm (seasoned?) "captain" to hold a steady, pre-planned course.

    • @Spillerrec
      @Spillerrec 1 year ago +4

      @@simongeard4824 I think the video was quite clear that the software started ignoring that sensor because it was programmed to do so. An intentional feature that behaved differently than expected *because* it was put into a situation that was not considered while designing it. And this only happened because they changed the mission plan after the software was developed and did not test it again with the new landing site; their tests would have detected the issue. That last part really hurts because they reasonably could have avoided the crash.

  • @firefly4f4
    @firefly4f4 1 year ago +99

    By, "unbelievable", I'm pretty sure you meant, "Completely realistic, very common scenario when the software is put in an untested environment."
    Note that I am saying this as a software developer myself. I actually just identified a scenario where our existing tests were thought to be sufficient, but then some surrounding parameters changed and a bug was found.

    • @jarisundell8859
      @jarisundell8859 1 year ago +19

      As a software developer myself, I'm actually asking myself why those simulations were not set up to run like a CI.

    • @firefly4f4
      @firefly4f4 1 year ago +7

      @@jarisundell8859
      Good question. Seems like actually running the sim again once the final site was chosen should have caught this, maybe allowing them to upload the fix.
      For the record, CI is how the one I looked at was caught... prior to release 👍

    • @danstenger1
      @danstenger1 1 year ago +2

      Scott is also a dev by trade, too, lol, he works at Apple.

    • @cinquine1
      @cinquine1 1 year ago +8

      @@firefly4f4 I think it's a joke, since the bug happened because the computer didn't "believe" the radar

    • @scottmanley
      @scottmanley 1 year ago +76

      By unbelievable, I mean the software stopped believing the radar

  • @wChris_
    @wChris_ 1 year ago +4

    It's amazing how Apollo didn't have such bugs, despite it being written in pure Assembly!

    • @PMA65537
      @PMA65537 1 year ago

      They chose tamer landing sites.

    • @phloxie
      @phloxie 1 year ago

      @@PMA65537 Apollo 15 would like to have a word with you

    • @castafioreomg
      @castafioreomg 1 year ago

      Apollo missions had some issues but they handled them well. The engineers couldn't even visit their families because of the work pressure.

  • @thePronto
    @thePronto 1 year ago +9

    A lunar lander encountered a crater and got confused. Total freak accident: one in a million. I can totally relate: today, I encountered a Starbucks in a strip mall.

  • @codediporpal
    @codediporpal 1 year ago +8

    I'm very impressed with the abilities to diagnose what went wrong. Even amateurs helped! Another case study for future designers of "fail-safe" systems.

  • @elleryhorton44
    @elleryhorton44 1 year ago +4

    Redundant systems to help the mission don't matter if the mission never starts. I worked on a Single/Dual/Triple redundancy system a long time ago. I think the probability of a single incorrect signal per million samples for each device was 75/93/98 percent (roughly, I don't recall the exact number). A huge bonus from single to dual redundancy but rarely worth the extra 33% in cost between Dual and Triple. However, each module had to boot up on its own and if they did not, then the system wouldn't run anyway.

  • @bobbun9630
    @bobbun9630 1 year ago +11

    From this description it sounds like the software worked as intended based on the circumstances. It sounds more like they need to rethink the system level design to have more inputs that can be used to sanity check one another, and perhaps have a means for a one-time instrument glitch (at least in the design interpretation) to be "forgiven" if later sanity checks pass.
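
    One way the "forgiving" idea could look in code is sketched below. It is purely illustrative: the class name, the thresholds, and the rule of re-admitting a sensor once its own readings have been steady for a few samples are all assumptions, not anything known about the real flight software.

```python
class ForgivingSensorMonitor:
    """Rejects a sensor after a big disagreement with the estimate, but
    re-admits it once its own readings have been steady for a while."""

    def __init__(self, max_disagreement, max_step, samples_to_forgive):
        self.max_disagreement = max_disagreement
        self.max_step = max_step
        self.samples_to_forgive = samples_to_forgive
        self.trusted = True
        self._steady_count = 0
        self._last_reading = None

    def update(self, reading, estimate):
        """Return True if this reading should be used in the current cycle."""
        if self.trusted:
            if abs(reading - estimate) > self.max_disagreement:
                self.trusted = False          # looks like a glitch: stop using it
                self._steady_count = 0
        else:
            steady = (self._last_reading is not None
                      and abs(reading - self._last_reading) <= self.max_step)
            self._steady_count = self._steady_count + 1 if steady else 0
            if self._steady_count >= self.samples_to_forgive:
                self.trusted = True           # self-consistent again: forgive it
        self._last_reading = reading
        return self.trusted


def demo(readings, start_estimate=5000.0):
    """Run readings through the monitor; the estimate simply follows accepted readings."""
    monitor = ForgivingSensorMonitor(1000.0, 50.0, 3)
    estimate, decisions = start_estimate, []
    for r in readings:
        used = monitor.update(r, estimate)
        if used:
            estimate = r                      # stand-in for a real filter update
        decisions.append(used)
    return decisions


print(demo([5000, 5000, 8000, 8010, 7990, 8000, 8005]))  # cliff: rejected, then forgiven
print(demo([5000, 12000, 300, 9000, 5000]))              # erratic: stays rejected
```

    A sensor stuck at a constant wrong value would also look "steady", so a rule like this would need to be combined with other checks before anyone trusted it on a real vehicle.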

    • @u1zha
      @u1zha 1 year ago +1

      Yes, that makes sense, and the "forgiving" part is commonly solved by Kalman filtering, which Scott also mentioned. Here it sounds like ispace overengineered a little bit, overeagerly dropping sensor data on the floor before giving the filter a chance.

  • @joelcorley3478
    @joelcorley3478 1 year ago +66

    But what if the radar altimeter actually did fail around the time it passed over that crater? It sounds like it would have produced the same result. I think the only way to deal with this in the design is to have at least one redundant sensor for something this mission critical. Of course, the problem with just one extra sensor is that you need to try to figure out which one is actually the broken sensor. That's why there are often 3 sensors or 3 computer systems used in this kind of redundancy...

    • @sonaxaton
      @sonaxaton 1 year ago +13

      Sounds like a redundant sensor wouldn't have helped this particular issue though, because it would have just gotten the same confusing measurements of the cliff wall. I think they just need to thoroughly run simulations of the actual mission to catch edge cases like this early.

    • @a4d9
      @a4d9 Рік тому +3

      On a vehicle like this, without humans onboard, the space and weight requirements might be too costly compared to the risk of a failed sensor.

    • @SashaNaronin
      @SashaNaronin Рік тому +4

      @@sonaxaton exactly. A proper simulation campaign would've caught that.

    • @Damien.D
      @Damien.D Рік тому +16

      @@sonaxaton Three redundant sensors and a voting system are the way to go. That has worked flawlessly in many aeronautical systems, from the Concorde autopilot to missile guidance systems.
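
A minimal sketch of the voting idea in Python, assuming three altitude readings in metres. The function name `vote` and the `tolerance` parameter are made up for illustration and are not taken from any real flight software. The median acts as the voted value, and any reading far from it is reported as suspect.

```python
from statistics import median


def vote(readings, tolerance):
    """Return (voted_value, suspects) for three redundant sensor readings.

    voted_value is the median of the three readings; suspects lists the
    indices of readings that differ from the median by more than tolerance.
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    mid = median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - mid) > tolerance]
    return mid, suspects


# One sensor drops out: the other two outvote it.
print(vote([5000.0, 5003.0, 120.0], tolerance=50.0))
# All three see the crater at once: nobody is flagged, the jump is kept.
print(vote([8000.0, 8002.0, 7999.0], tolerance=50.0))
```

The second call is the case raised earlier in the thread: identical sensors all see the same cliff, so voting only helps if the rest of the system then accepts an agreed-upon jump as real.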

    • @travcollier
      @travcollier Рік тому +19

      The dead reckoning system combined with prior knowledge (a map of roughly what is expected) should have been enough of a redundant system. Seems like they should have included a reassessment/recovery routine to check if that apparent altimeter glitch (which wasn't a glitch of course) cleared and the instrument was giving reasonable data.
      This stuff is really tricky without a human in the loop.

  • @Anacronian
    @Anacronian Рік тому +6

    It's crazy to me that they didn't redo the simulations when a new landing site was chosen.

  • @Anvilshock
    @Anvilshock Рік тому +36

    Hardly a "bug" when it worked correctly for the data input it was programmed to handle. At best, it encountered data it _wasn't_ programmed to handle, which makes this more a missing feature.

    • @mikehartsough489
      @mikehartsough489 Рік тому +3

      I was thinking same thing. Sounds like the software did exactly what it was supposed to do.

    • @1224chrisng
      @1224chrisng Рік тому +11

      well, a bug is just unintended behaviour. The computer did exactly what you told it to do, just not what you wanted it to do

    • @ddnguyen278
      @ddnguyen278 Рік тому

      Can't imagine why they didn't run simulations of this. It's not like the moon's topography isn't known down to the meter. Stick it in Kerbal and run simulations.

    • @RemyPorter
      @RemyPorter Рік тому +7

      @@ddnguyen278 Uh, the moon's topography *isn't* known down to the meter. Some areas of the moon are, but generating meaningful maps of the moon is actually quite hard and time consuming. There are folks whose entire job is to take lower res digital elevation maps and apply reasonable interpolations to generate higher fidelity maps than we actually have.
      Not saying they shouldn't have done more sims, but it's harder than it sounds.

    • @davidwright7193
      @davidwright7193 Рік тому +1

      Repeat after me “That’s not a bug it’s a feature”

  • @dorsetdumpling5387
    @dorsetdumpling5387 Рік тому +103

    Unbelievable that they had only one method of determining altitude!

    • @manuelsilva8640
      @manuelsilva8640 Рік тому +3

      My thought exactly.

    • @theqwert3305
      @theqwert3305 Рік тому +50

      And that that one method could be turned off for the rest of the landing!

    • @EmpereurHector
      @EmpereurHector Рік тому +6

      I guess that's part and parcel for those very small landers.

    • @GlutenEruption
      @GlutenEruption Рік тому +22

      I mean to be fair, even the Apollo lunar module only had a single non-redundant landing radar altimeter for determining exact altitude. The astronauts were fairly confident they could manage to land without it but if it failed, mission rules called for an immediate abort. The weight constraints for landers are so tight, engineers have no choice but to make those trade offs.

    • @dorsetdumpling5387
      @dorsetdumpling5387 Рік тому +46

      @@GlutenEruption Ah, but they had the backup that was the Mk. 1 Eyeball and its associated biological computer!

  • @ytashu33
    @ytashu33 Рік тому +8

    Love this! Thank you for reminding me of Kalman filters. I studied those in my M. Tech., loved them but never thought I would ever hear of them again. I still remember how the "location estimation" part, based on current velocity and direction integrated over time (aka dead reckoning), can provide smooth and accurate predictions over short durations, but errors tend to accumulate in a physics-based predictor like this, so it needs to be augmented with an independent measurement (i.e. the radar), even if the radar data is not accurate. Amazing to see how stuff like that led to this outcome. It is a tough one though... I wish you had shared your thoughts on how a "faulty sensor" should be detected then. I mean, you could say that a sudden 3 km jump in the sensor output means the sensor is probably broken, right? If not, how else would you do that and handle the case when the sensor actually is broken?
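
For anyone who hasn't met the predict/update split described here, a one-dimensional sketch in Python may help. It assumes altitude is the only state and that velocity is known; the names `predict`, `update`, `process_var` and `meas_var` are illustrative, and a real lander filter would track many more states.

```python
def predict(alt, var, velocity, dt, process_var):
    """Dead-reckoning step: move the altitude estimate and grow its variance."""
    return alt + velocity * dt, var + process_var * dt


def update(alt, var, measured, meas_var):
    """Blend in a radar measurement, weighted by the relative uncertainties."""
    gain = var / (var + meas_var)
    return alt + gain * (measured - alt), (1.0 - gain) * var


alt, var = 10000.0, 100.0
alt, var = predict(alt, var, velocity=-50.0, dt=1.0, process_var=25.0)
print(alt, var)   # uncertainty has grown after dead reckoning
alt, var = update(alt, var, measured=9900.0, meas_var=125.0)
print(alt, var)   # the measurement pulls the estimate and shrinks the variance
```

Without calls to `update`, the variance only ever grows, which is the error accumulation the comment describes.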

    • @Beregorn88
      @Beregorn88 Рік тому +2

      Redundant systems and majority check: if all three of your radar sensors report a sudden altitude change, then that's what actually happened. What surprises me is that the sudden altitude change eventuality was never accounted for...

  • @lyoha5028
    @lyoha5028 Рік тому +84

    I wonder what all these people in mission control were doing during the landing. Were they analyzing the telemetry in real-time? I assume they were supposed to notice that the radar altimeter was considered faulty and disabled. If so, perhaps they could have reviewed its readings and realized that after passing the edge of the crater, the readings returned back to normal. In that case, they could have just manually reenabled the radar altimeter. Since it is not Mars, the signal delay is small enough to allow for manual corrections during the landing.

    • @katho8472
      @katho8472 Рік тому +1

      Word!

    • @ooooneeee
      @ooooneeee Рік тому +11

      They lost telemetry. If they had a connection they could have saved it.

    • @pavanshetty9806
      @pavanshetty9806 Рік тому

      There might also be delay in communication.

  • @BILLY-px3hw
    @BILLY-px3hw Рік тому +11

    It tore me apart watching the team come so close. It really has to weigh on the people who didn't catch the glitch; I am sure some are still lying awake in bed at night. Can't wait to see the team bounce back with a flawless mission.

    • @OhNiceMatt
      @OhNiceMatt Рік тому

      Those software engineers were laid off, hence the lying in bed awake at night

  • @Songfugel
    @Songfugel Рік тому +53

    Having seen in person how Japanese programmers work, how specialized and narrow their programming skills are, how ridiculously rigid their management approaches are, and how many non-unified standards they use, this sort of thing doesn't surprise me at all
    ps. the Ron Burgundy clip was priceless and so on the money xD

    • @goodlife1302
      @goodlife1302 Рік тому +3

      I actually did not get your point. Could you please explain a little bit more?

    • @JosePineda-cy6om
      @JosePineda-cy6om Рік тому +3

      The point being this was a bug that should've been relatively easy to find, if they had simulated a couple of "landing site changed at last minute" scenarios that included heavily cratered areas or craters with steep walls. Just doing some tests on random landing sites would've triggered this. But nobody thought of this, and because of corporate culture, everybody was disincentivized from even raising the question

    • @StudioVRM
      @StudioVRM Рік тому +10

      The software was built by Astrobotic, an American company. Not sure how stereotypes of Japanese corporate culture come into this.

    • @goodlife1302
      @goodlife1302 Рік тому +2

      @@JosePineda-cy6om Oh ok. Thanks a lot for the explanation

    • @Dr.Kraig_Ren
      @Dr.Kraig_Ren Рік тому +4

      They outsource programming.
      It happened due to budget and time constraints. I'm pretty sure engineers wanted to rerun the simulation

  • @rorykeegan1895
    @rorykeegan1895 Рік тому +4

    Seems pretty sloppy not realising a change in landing site might cause the craft problems ... Sounds like bad project management to me.

  • @0x8badbeef
    @0x8badbeef Рік тому +9

    6:20 Planned landing site change? That would normally require a revalidation of the software in the industry. I would blame this on the people who decided not to do that. I would investigate those guys and why the change. I would not blame the software as the software was not designed to be used that way.

  • @bertram-raven
    @bertram-raven Рік тому +9

    I would add optical recognition and stored high resolution images to the package. These would optically compare the expected position and orientation to the visible information and so call out anomalies. This entire apparatus would be as small as a Raspberry Pi, using off the shelf components. DJI drones use something similar in their RTB software. When the drone sets off, it takes photographs of its starting location and compares them to the downward facing camera live images when returning to land.

    • @Topcoatdetail
      @Topcoatdetail Рік тому +3

      One of the reasons Chandrayaan-2 failed was the mapping. When the lander moved away from the photographed landing site it tried to overcorrect and failed.

  • @RogHawk
    @RogHawk Рік тому +6

    Thank you, Scott! You answered questions I've had for the last few years about the landers crashing on the moon.

  • @riccardob9026
    @riccardob9026 Рік тому +40

    To be honest (and a bit philosophical), I would not call this a "bug," in the sense that by "bug" you sometimes mean an error in the software that makes it behave differently from the behavior specified in the design phase. In this case the software had to face a situation that was not expected, that is, a sudden increase in altitude due to a deep crater. It was not an error introduced at implementation time (that is, when they wrote the software), but at design time. Like a bridge that breaks down, not because of some error during construction, but because of a strong wind that was not considered at design time.

    • @DrDeuteron
      @DrDeuteron Рік тому +10

      I agree. This was a planning error, or a failure-to-test error, or changing the landing into a regime that had not been tested, or all of the above. It's been known for a long time that radar altimeters can be spoofed by terrain; it is nothing new.

    • @serronserron1320
      @serronserron1320 Рік тому +1

      An engineering oversight

    • @aspuzling
      @aspuzling Рік тому

      As a software engineer I agree lol but that's not to say it is not also partly the responsibility of software engineers to raise potential bugs in the design.

    • @chaz720
      @chaz720 Рік тому +3

      Agreed, and came to write this. As a space systems engineer, this was a systems engineering failure, not a software bug.

    • @bbgun061
      @bbgun061 Рік тому

      They should have tested their software with real data.

  • @thetooginator153
    @thetooginator153 Рік тому +8

    It would be interesting to try an optical parallax system to verify the radar readings. If both systems agree, then the data is correct. Cameras could be a few meters apart, so, the parallax would be measurable from pretty far away.
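
A rough back-of-the-envelope sketch of the parallax idea in Python, using the standard pinhole stereo relation Z = f * B / d. The baseline, focal length in pixels and the function names are all hypothetical values picked for illustration, not figures from any real lander camera.

```python
def range_from_disparity(baseline_m, focal_px, disparity_px):
    """Distance to the surface from stereo disparity: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def range_error_per_pixel(baseline_m, focal_px, range_m):
    """Approximate range error from a one-pixel disparity error: Z^2 / (f * B)."""
    return range_m ** 2 / (focal_px * baseline_m)


# Hypothetical setup: cameras 2 m apart, focal length of 4000 pixels.
print(range_from_disparity(2.0, 4000.0, 1.6))
print(range_error_per_pixel(2.0, 4000.0, 5000.0))
print(range_error_per_pixel(2.0, 4000.0, 100.0))
```

With these assumed numbers the range error per pixel grows with the square of the distance, so a few-metre baseline looks more useful as a close-in cross-check than as a replacement for the radar at several kilometres.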

    • @EnricoGolfettoMasella
      @EnricoGolfettoMasella Рік тому +1

      That’s a very creative solution! ✌🏼✌🏼Pretty sure would work!

    • @xonx209
      @xonx209 Рік тому +1

      If they don't agree, then what do you do?

    • @4k8t
      @4k8t Рік тому

      @@xonx209 In sci-fi usually it would be three independent systems with two having to agree as to what they were seeing. A two-system setup would require that both systems agree, and if one system cut out a sensor as malfunctioning and the other didn't, something would have to be present to break the disagreement deadlock.

  • @ardag1439
    @ardag1439 Рік тому +11

    The lander knows where it is at all times. It knows this because it knows where it isn't. By integrating the difference between where it is and where it was just a moment ago, it doesn't have to rely on the radar altimeter to figure out just where exactly it is. By filtering and evaluating the radar altimeter measurement data, the lander can choose to ignore the upcoming measurements, should a discrepancy be detected between where it seems to be and where it could likely be. In a hypothetical scenario, where the radar altimeter measurement rises and lowers by a few kilometers over the span of several seconds, during which both the linear rate of change of difference between where it is and where it was a few moments ago, and the doppler shift both remain fairly constant, the lander may conclude the radar altimeter data is untrustworthy. After removing the radar altimeter input from the sensor fusion algorithms, the lander can estimate where it is using the Lunar topography data and a trustworthy data point about where it was some moments ago, along with its profile of the rate of change of the derivative of where it is, recorded until the time that it currently is in.

  • @bobblum5973
    @bobblum5973 Рік тому +6

    This sort of situation is actually a good point to use against those who claim the Apollo moon landings were impossible back then because of the limited computer power available. Having a human (or two!) at the controls made the landings difficult but not impossible. Comparing it to the success/failure rate of the even earlier Surveyor unmanned landers shows it can be hard to do, and losing a lander is expensive but you can try again.

  • @dandeprop
    @dandeprop Рік тому +3

    Hi Scott: Very nicely done! (but then, I say that a lot about your stuff...). This scenario is directly reminiscent of the situation on the Apollo landings where passing over a crater (or any other feature like that) would cause a 'jump' in the Radar Altimeter-portrayed altitude, and it would 'jump' from the PGNCS altitude. Remember 'Delta H'? The difference between RA and PGNCS altitudes. In order to keep things from diverging in the PGNCS, they had to incorporate a 'terrain map' into the software that accounted for local differences in surface elevation. Remember the landing of Apollo 17? At some time in the PDI maneuver, one of the crewmen (I can't tell which one--they sound a lot alike) said 'We went over the hump, and Delta H just jumped'. It sounds (at least at first blush) like a feature similar to the Apollo 'terrain map' might have been appropriate here (?) Thank you.

  • @SashaNaronin
    @SashaNaronin Рік тому +4

    Sounds like they tightened the Mahalanobis check margins in the Kalman filter. That's the check that the real measurement at each time step, the expected measurement and the estimated measurement errors are all in accordance with each other. You usually hardcode the acceptable margins for that, i.e. the difference between the real and expected measurements must be less than 4 times the expected error. If it isn't, the measurement is bad (e.g. the accelerometer physically fell off its mounting). Unfortunately the margins are often set too tight.
    It could've been another problem tho, related to algorithms similar to simultaneous localization and mapping, but I don't have enough experience with them to judge.
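
The check described here can be sketched for a single scalar measurement in Python. This is a simplified illustration, assuming the expected error is the square root of the summed prediction and measurement variances; `passes_gate` and its parameters are hypothetical names, and the 4-sigma default only mirrors the figure quoted in the comment.

```python
import math


def passes_gate(measured, predicted, pred_var, meas_var, n_sigma=4.0):
    """Accept a measurement if its innovation is within n_sigma expected errors."""
    innovation = measured - predicted
    sigma = math.sqrt(pred_var + meas_var)
    return abs(innovation) <= n_sigma * sigma


# Expected error of 25 m: a 100 m surprise is just accepted.
print(passes_gate(5100.0, 5000.0, pred_var=400.0, meas_var=225.0))
# A 3 km jump over a crater rim is rejected by the same gate.
print(passes_gate(8000.0, 5000.0, pred_var=400.0, meas_var=225.0))
# If the prediction is honestly uncertain, the same jump is admitted.
print(passes_gate(8000.0, 5000.0, pred_var=1_000_000.0, meas_var=225.0))
```

The third call shows why the margin matters: a gate that scales with the filter's own uncertainty widens as dead reckoning drifts, while a tight fixed margin never lets the sensor back in.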

    • @dsewtz3139
      @dsewtz3139 Рік тому +2

      Good point. However, I believe (hope?) Scott was only using it as a well-known example and they don't really use a "naive"/textbook implementation of either Kalman filters or Mahalanobis distance... 🤷🏻 I mean, the moon's surface is NOT a hidden probabilistic distribution - so at least four dimensions could use euclidean distance, verified against the same using surface maps - not some exhaustive search in simulation training data for planned approach vectors 🧐 ...if they need help implementing that - I actually wanted to visit a friend in Japan for some time now 🤣 (just kidding, compared to them I also don't have enough experience - but I'm 100% sure, if the control-loop "broke", it was more in-depth than due to a wrong geometry of the probability space)

  • @snwendland
    @snwendland Рік тому +8

    I seem to recall the University of Wyoming having a "Missile Guidance for Dummies" audio description of a guidance system for knowing where the missile is by knowing where it isn't - it seemed pretty rock solid. I have to wonder why this method hasn't been adapted for spacecraft yet.

    • @frodo9649
      @frodo9649 Рік тому +7

      It subtracts where it should be, from where it wasn't.

    • @H-S.
      @H-S. Рік тому +2

      Exactly. It would be especially helpful in this case; if the lander knew where it isn't, it would not waste fuel by trying to land as if it was just above the surface. :)

    • @u1zha
      @u1zha Рік тому +3

      I believe that's just a sentence for lulz, engineers expressing themselves purposefully obtuse. Kalman filters are exactly the "knowing" part, and a closed loop control system is exactly the "subtracting" part.

    • @frodo9649
      @frodo9649 Рік тому

      @@u1zha ua-cam.com/video/bZe5J8SVCYQ/v-deo.html This video is full of these sentences, that are close to how control loops work, but not quite, which I find quite funny, especially if you know how it works

    • @simongeard4824
      @simongeard4824 Рік тому +2

      @@u1zha Unfortunately, it also inspires a lot of morons to quote that line constantly on UA-cam, perhaps under the mistaken impression that it makes them look smart.

  • @AleXsSpaceXTalks
    @AleXsSpaceXTalks Рік тому +2

    Very good explanation and top video! I guess also the loss of the Mars Polar Lander was caused by a software issue, telling the landing thrusters to ignite too early, causing the probe to run out of fuel...

  • @antonioloma2327
    @antonioloma2327 Рік тому +8

    If they tested by simulating landing on other spots but not on the selected one, then they didn't test! This isn't a software bug but a project management issue (specifically testing). It's like when you "test" your computer program on your desktop but then deploy it on a server and the faster hardware makes apparent a race condition that borks the system. Testing is expensive, testing is hard, but not testing the actual flight plan is dumb.

    • @RobertBlair
      @RobertBlair Рік тому

      Software engineering does not start at the keyboard, and end when it gets sent to a testing team that is somehow not software engineering.
      Engineers have a responsibility to work with testing crew, to validate the test scenarios. The teams failed to run enough variations of realistic input, so inputs outside the limited sets caused a fault.
      Specifically, several bugs in the system as a whole:
      1. Spacecraft is unable to land without altimeter inputs. Relying on only inertial guidance cannot be accurate enough to land, due to inherent input noise. If the altimeter signal is discarded more than X seconds before touchdown, error margins cause failure rates approaching 100%.
      2. Guidance system (apparently) had no way to recover confidence in sensors.
      3. Guidance system would erroneously flag valid altimeter inputs as a broken sensor.
      4. Testing was not done to cover the new landing site (and yes, a senior engineer should have balked at the change).

  • @jbirdmax
    @jbirdmax Рік тому +7

    Enjoyed hearing you on NSF, Mr. Manley.

  • @Aditya-gp2ih
    @Aditya-gp2ih Рік тому +1

    Came here after the successful landing of India's Chandrayaan-3... best of luck to Japan for future projects...

  • @mshepard2264
    @mshepard2264 Рік тому +9

    Stuff like that is really hard to catch beforehand. Perfectly good sensors spit out garbage sometimes. I honestly think it might be worth having dual altimeters just to make the signal more robust (yeah, I know it would weigh more), but maybe lidar + radar. Then you make the software unable to disqualify both the radar and lidar at once or something.

    • @dounyamonty
      @dounyamonty Рік тому +1

      Calling it a bug seems wrong tho to me, dual altimeters at least would continue measuring.
      Sounds to me if an unusual number pops up you don't straight up then ignore further readings maybe?
      Like did they set up the software to turn a key component off once an odd reading occurred?
      Having a backup for dual if not triple check incoming data on the fly instead of relying on past information should be normal.
      Just because a road was empty 30 seconds ago doesn't mean I'll blindly trust there won't be something speeding around the corner.
      Expect the unexpected especially when it's not on the same floating rock right.

    • @patreekotime4578
      @patreekotime4578 Рік тому +3

      @@dounyamonty The system was designed to ignore all new data from a sensor if it felt like some of the readings were out of bounds. They could have had 20 altimeters and the system would have ignored them all because they all would have shown the same "error".

    • @Kromaatikse
      @Kromaatikse Рік тому +3

      ​@@patreekotime4578 When you have only one sensor for a given parameter, treating it as failed after seeing out-of-bounds data from it is just about the only thing you *can* do. But when you have more than one sensor, you can cross-check them, and *that* usually becomes your primary method of determining whether a failure has occurred. If both sensors do something unusual *at the same time,* you might reasonably infer that the problem is *not* in the sensors.
      In this particular case, the spacecraft should have been able to cross-check the radar-altimeter reading with the topography it was flying over. Seeing the distance-to-ground reading increase rapidly as the ground abruptly drops away beneath the spacecraft is an entirely expected condition.
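
The terrain cross-check suggested here could look roughly like this in Python. It is a sketch under simple assumptions: the lander has an altitude estimate above some reference datum and a map elevation for the ground below it. The names `expected_range` and `classify`, and the numbers in the example, are hypothetical.

```python
def expected_range(alt_above_datum, terrain_elev):
    """Radar range the lander should see over terrain of a given elevation."""
    return alt_above_datum - terrain_elev


def classify(measured_range, alt_above_datum, terrain_elev, tolerance):
    """Label a radar reading by comparing it with the map-based expectation."""
    residual = measured_range - expected_range(alt_above_datum, terrain_elev)
    return "consistent" if abs(residual) <= tolerance else "suspect"


# Lander estimated at 5000 m above the datum, radar suddenly reads 8000 m.
# With a map that knows about a 3000 m deep crater, the reading is expected.
print(classify(8000.0, 5000.0, terrain_elev=-3000.0, tolerance=200.0))
# With a flat-plain assumption, the very same reading looks like a fault.
print(classify(8000.0, 5000.0, terrain_elev=0.0, tolerance=200.0))
```

The two calls differ only in the assumed terrain, which is the point made above: the reading is only "out of bounds" relative to what the software expects the ground to look like.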

    • @GntlTch
      @GntlTch Рік тому

      Right! Really hard. Who would consider that the Moon might have craters?

  • @brentboswell1294
    @brentboswell1294 Рік тому +5

    Didn't Neil Armstrong have to do some on the fly recoding to overcome the 1202 error when the Eagle lunar module was getting overwhelmed with input? (Which was fixed on later Apollo missions through code fixes and turning off an unneeded radar as part of the checklist?) Seems like they could have used an altimeter, but on the moon the altimeter setting is always "00.00" 😅

  • @scvcebc
    @scvcebc Рік тому +2

    Neil Armstrong took over the controls and manually landed on the moon when he saw rougher terrain than expected at the final approach of the first manned landing in 1969. He was a true test pilot who was able to think fast and take action without losing his nerve. He barely had enough fuel for the extra maneuver, so he was also lucky. The problem with depending on robotics is that software doesn't have "common sense" and enough experience to handle the unexpected. However, these crashed robot landers are much cheaper than manned missions, so with trial and error they will eventually work.

    • @ClickClack_Bam
      @ClickClack_Bam Рік тому

      And then a unicorn ran up & they rode the unicorn all around the Moon going 240,000 miles back to Earth.
      The Unicorn didn't run 28,000mph like they would've had to go in the pop rivet aluminum can they brought them there.

  • @jpdemer5
    @jpdemer5 Рік тому +4

    Given that it was not possible to land the craft without the altimeter data, it's an odd programming decision to permanently ignore that data the moment it starts to look janky. Odds of the altimeter recovering its wits may be slight, but I'd rather give that a go than rely on an approximation that's +/- 5 km in altitude.

    • @HweolRidda
      @HweolRidda Рік тому

      Exactly what I was going to say. Since radar altimeters are highly robust there is almost no situation where you should ignore one. If it does fail your mission is doomed, whether you ignore it or not.

  • @mballer
    @mballer Рік тому +6

    Why haven't they orbited GPS satellites around the moon yet?
    Why don't they drop transmitters to the surface first as beacons?

    • @TheMonthlyJack
      @TheMonthlyJack Рік тому +6

      The Moon has very few stable orbits, and they still require propellant. The moon is very lumpy and the earth tends to fling you off.

    • @samuraidriver4x4
      @samuraidriver4x4 Рік тому +2

      Dropping beacons on the surface is the exact same thing as putting a lander like this down.

    • @jhonbus
      @jhonbus Рік тому +3

      @@samuraidriver4x4 To make it easier to land our probe, we will first land three probes.

    • @dalel3608
      @dalel3608 Рік тому

      Apollo did that once using a Surveyor lander as the beacon. But that was just for position, not altitude.

    • @mballer
      @mballer Рік тому

      @@samuraidriver4x4
      Really? Can you describe your design?
      I was thinking about a lightweight baseball sized package on the end of a long collapsible pole, throw a dozen of them out in hopes a few survived to do the job.
      Or how about a huge air bag with the probe suspended by rubber bands in the center, no need for an accurate landing of the beacon.
      Did y'all really think I was suggesting to land a huge piece of equipment to be a beacon?

  • @mikeburch2998
    @mikeburch2998 Рік тому

    I'm so sorry to hear that this happened. I hope they try again and maybe send back some remarkable pictures. Don't give up. Greetings from Arizona.

  • @tertiaryobjective
    @tertiaryobjective Рік тому +5

    Like when you're walking down the stairs and miss that last step.

  • @witchdoctor6502
    @witchdoctor6502 Рік тому +15

    such a stupid error... I'm sure the engineers were banging their heads against the wall when they found out. Hopefully they will be able to build it again and fix the SW for a proper landing.

    • @laimejannister5627
      @laimejannister5627 Рік тому

      imagine failing a moon landing when even china did it successfully

    • @witchdoctor6502
      @witchdoctor6502 Рік тому

      @@laimejannister5627 you do realize you just compared a private company to a government, right? Also China has a ton of experience in space - they have their own space station, mars mission, satellites, multiple rockets... just because cheap crap is produced there for people not willing to spend $$$ doesn't mean that they can't produce quality items

    • @arealperson641
      @arealperson641 Рік тому

      @@witchdoctor6502 well to be fair Space X is also a private company yet they are better than all except maybe a handful of government agencies on Earth. So it doesn’t mean much what entity it is. Plus they already got a pass since it was launched by a spacex rocket.

  • @therealzilch
    @therealzilch Рік тому

    Another fascinating and instructive example of Robert Burns' "The best laid schemes o' mice an' men / Gang aft a-gley.”.
    Cheers from sunny Vienna, Scott.

  • @seann4678
    @seann4678 Рік тому +13

    Hi Scott, during the iSpace debriefing, they reported that their velocimeter did not start reporting data when it expected to be 2km above the surface (event 9 in the schedule).
    Do you know if this is a separate issue or a consequence of being too high from the ground?

  • @yashrajb5251
    @yashrajb5251 Рік тому +3

    India's Chandrayaan-3 has finally soft-landed on the Moon's south pole successfully. 🎉

  • @noahserio4182
    @noahserio4182 Рік тому +2

    I’m surprised they didn’t have a redundant altimeter to verify the suspect altimeter reading against.

  • @brianjrichman
    @brianjrichman Рік тому +5

    When flying on instruments, pilots are trained to trust what the instruments are showing them, not what the software in their heads is telling them "I can't see the horizon, so I think I'm upside down..." No... You are the right way up etc.

    • @thePronto
      @thePronto Рік тому +1

      We too low...

    • @brianjrichman
      @brianjrichman Рік тому

      @@thePronto We too high when we ran out of fuel... OOPS. Instruments didn't lie then?

    • @thePronto
      @thePronto Рік тому +1

      @@brianjrichman bang ding ow.

    • @oohhboy-funhouse
      @oohhboy-funhouse Рік тому +3

      Been through the training, you absolutely cannot trust your feelings. It varies for each person, for me, I felt I was leaning one left and right. I had to actively fight to focus on the instruments, draining your energy very quickly. Even if you are completely focused on the instruments, your arms will try to 'Level' the plane. You acclimate with each subsequent flight easier, eventually it's nothing. Instrument flying is still draining as you have to fuse all these sensors. In comparison, looking outside is about as stressful as driving.

    • @brianjrichman
      @brianjrichman Рік тому

      @@thePronto

  • @DishNetworkDealerNEO
    @DishNetworkDealerNEO Рік тому +12

    Software is hard…

    • @dorsetdumpling5387
      @dorsetdumpling5387 Рік тому +3

      So, as the lander found, is the moon.

    • @bobbun9630
      @bobbun9630 Рік тому +1

      There are very few human activities that are as complex and as routine and yet have such potential to fail catastrophically from even the smallest of errors. So yes, it's objectively a hard thing to consistently do well.

  • @mrpocock
    @mrpocock Рік тому

    FYI if the landing location and approach is part of the software spec then a change to the landing site and approach is a change to the software spec and requires a full end-to-end revalidation of the software.

  • @MattMcIrvin
    @MattMcIrvin Рік тому +8

    Speaking as a veteran in the software industry, no bug, no matter how absurd, is unbelievable to me.

  • @wolf7115
    @wolf7115 Рік тому +6

    Damn. As a software developer myself, this is totally something that I feel like I could have happen to me as well. I feel for the ispace team, and know they'll get it next time!

    • @Henglaar
      @Henglaar Рік тому

      Agreed. You would have had no way of knowing, or testing for, a way that valid real world data could trigger the failure detection routine. I'm thinking that they need to do a hardware design change: add a second sensor, or better, a second TYPE of sensor and cross check the two. If the data only goes wonky on one of the sensors, declare a failure. And radio home that the mission is probably fatally compromised, given the nature of the failure.

    • @viCoN24
      @viCoN24 Рік тому

      Guess what? This is how it works. The problem here is the definition of what should be interpreted as error in the input. If you have incorrect assumption, another array of sensors will reach the same consensus about being out of order based on unexpected range of values read. Initial landing site was a plain so you wouldn't expect drastic changes in altitude. Compare that to the new landing site near a rim of a crater where rim was high enough to fit the definition of unexpected error for the assumed scenario of landing on a plain. It seems that everything worked correctly for the purpose it was designed and implemented for but someone decided to change critical part of assumptions and cut corners based on gut feeling instead of thorough simulations.
      I wonder if Japanese culture worked against this project, where raising this issue was frowned upon by the higher-ups so nobody raised any concerns.

    • @virt1one
      @virt1one 11 months ago

      I don't normally have to deal with "what do I do if this important sensor fails?" scenarios. (In most of my work, all the data I have can be trusted 100%.) So here I would envision asking "what do we do if the altimeter returns implausible data?" "Rely on other systems" seems like a good answer, but it DEMANDS you also ask "how long can we rely on them?" The answer here is "not very long". Which brings up an important question: "what do we do if we've been relying on this other system while its accuracy decays?" If THAT question had been asked, they would have realized they needed to reconsider using the data from the altimeter.
      You solve this by assigning a "quality" to each of your sensors and going with the highest quality. If you get a wonky reading, you drop the quality to 80% or 60% or 20% or whatever you feel is appropriate. (ONE weird reading might drop it from 95% to 70%, and successive apparent glitches might each subtract another 5%?) And when you're using inertial guidance, which is normally "recalibrated", its quality should decay over time AND reset to the quality of the recalibration source every time it's recalibrated. That's just how that's done.
      In this scenario, say the quality of the altimeter was changed from 95% to 60% when it glitched. OK. The inertial guidance would start ticking down from 95%. Hopefully it would cross below 60% soon, at which point you revert to using the altimeter because its quality is higher. This approach might have "re-enabled" the altimeter and saved the lander.
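      A minimal Python sketch of the quality scheme described above, not anything taken from the lander's software. The class and function names are made up for illustration, the 95% and 60% figures come from the comment, and the decay rate of one point per second is purely an assumption.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """An altitude source with a trust score in whole percent (0 to 100)."""
    name: str
    quality: int

    def penalize(self, points: int) -> None:
        """Lower the score, e.g. after a glitch or as drift accumulates."""
        self.quality = max(0, self.quality - points)

    def recalibrate_from(self, other: "Source") -> None:
        """A recalibrated source inherits the score of whatever fixed it."""
        self.quality = other.quality


def pick(sources):
    """Return the most trusted source; on a tie the earlier one wins."""
    return max(sources, key=lambda s: s.quality)


def simulate(seconds: int, decay_per_second: int = 1):
    """List which source would be used during each second after the glitch."""
    altimeter = Source("altimeter", 95)
    inertial = Source("inertial", 95)
    altimeter.penalize(35)  # one wonky reading: 95% -> 60% (assumed penalty)
    chosen = []
    for _ in range(seconds):
        inertial.penalize(decay_per_second)  # no recalibration, so trust decays
        chosen.append(pick([inertial, altimeter]).name)
    return chosen
```

      With these made-up rates, simulate(40) stays on inertial guidance for the first 35 seconds and then switches back to the altimeter once the inertial score has dropped below 60.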

  • @robertst-laurent6452
    @robertst-laurent6452 11 months ago +1

    Mr. Manley, for the whole planet you are our 21st century Eugene Kranz.
    At 01:13 your video proves that we now have available:
    ‘A da Vinci World of Creativity at Home’
    The video shows that they used a $170 Airspy R2 receiver (with a $620 LNC + antenna) with the mind-blowing power of the software available for the Airspy, so for less than $900 USD you can have the same setup at home!
    Your use of the Kerbal simulator, to help us better understand the sequence of events, is of jaw dropping beauty.

  • @averystablegenius
    @averystablegenius 1 year ago +6

    Yet another argument for building a constellation of Lunar Position Satellites prior to attempting to land on the surface. Like GPS, but LPS for the Moon.
    As always, failure to invest in infrastructure costs more in the long run.
    Anyone wanting to create a commercial venture to put four satellites in Lunar orbit for licensing LPS services to other companies and governments, please reach out to me.

    • @7tonsofsalt865
      @7tonsofsalt865 1 year ago +1

      sadly lunar gravity is very lumpy
      that means no stable orbits
      and iirc the moon doesn't have a big enough sphere of influence for stationary orbit

    • @thePronto
      @thePronto 1 year ago

      If we can't land on the moon 50+ years after it first happened, we shouldn't be wasting money on tech that could be just as error-prone.

    • @averystablegenius
      @averystablegenius 1 year ago

      @@7tonsofsalt865 Good input, 7, but I wonder if the lumps can be compensated for, even dynamically through INS, in software... LRO has been orbiting productively for, what, 14 years?

    • @averystablegenius
      @averystablegenius 1 year ago +1

      @@thePronto What tech did you have in mind?

    • @7tonsofsalt865
      @7tonsofsalt865 1 year ago +1

      @@averystablegenius the problem with the lumps is that they make spacecraft use a lot more fuel for station keeping than in, say, earth orbit.
      even the lunar reconnaissance orbiter, which is on one of the few "stable" orbits, will eventually run out of fuel
      and with the cost of lunar injection flights i just can't see that being profitable
      maybe once we can launch and refuel them from the moon, but we'll see

  • @Twenty-Seven
    @Twenty-Seven 1 year ago +4

    I remember one of my computational methods professors said, "There is no such thing as software bugs, only human error," in one of our first lectures, so that we would code more carefully and make sure everything had correct syntax before moving on to a new line. It'll save you a lot of time compared with overconfidently writing 100 more lines of code and then having to scroll through it all only to find that you typed "fr" instead of "for."

    • @effedrien
      @effedrien 1 year ago +3

      Most bugs are not caused by typos but by overlooking consequences somewhere else in the code, like creating a potential timing/synchronization issue. Or by misinterpreting the functional requirements, or by misreading the documentation of an external library, and things like that. Typo bugs are rare, except in the UI, but that is because certain programmers cannot write decent English sentences 😅

    • @aakksshhaayy
      @aakksshhaayy 1 year ago

      @@effedrien I think he was just giving a basic example.

  • @kaineis
    @kaineis 1 year ago

    I love the ksp2 animations you added. That was really nice to watch.

  • @AeroGraphica
    @AeroGraphica 1 year ago +4

    I suppose that with the rapid advances in technology and AI, these kinds of problems will soon disappear.
    A simple camera pair, for example, could recreate human-like vision and give an AI enough information to perform a landing, especially if run in parallel with all the already existing sensors.

    • @LaBamba690
      @LaBamba690 1 year ago +1

      Excellent point.

    • @alwayshiking_
      @alwayshiking_ 1 year ago

      And do you really think the Japanese didn't deploy that?

    • @AeroGraphica
      @AeroGraphica 1 year ago

      @@alwayshiking_ Well, apparently not since it crashed after thinking for too long that it had landed, 5km above the surface ...

    • @drill_fiend1097
      @drill_fiend1097 1 year ago +1

      With AI comes the need for more processing power to multiply large matrices. This increases the electrical power requirements and requires more R&D to create radiation-hardened variants of processors.
      The Snapdragon 801 in the Ingenuity Mars drone is probably the state-of-the-art SoC being used. But that's the same one used in the Galaxy S5 a decade ago.

    • @pavanshetty9806
      @pavanshetty9806 1 year ago

      I doubt we'll see AI in spacecraft any time soon unless we develop more efficient processors.

  • @Bill_Woo
    @Bill_Woo 1 year ago +4

    A moment of silence for the Mars programmer who couldn't handle math, metric and standard.
    Although, having some direct experience with atrocious egomaniacal team leaders and managers, including one in particular at JPL, I would suspect that the programmers and reviewers did exactly what they were told, while the team leader should get a "black mark in his permanent record" for being the one who was the true culprit.

  • @zrohit
    @zrohit 1 year ago

    Maybe multiple countries could drop beacons around common landing areas that everyone could use during landing. Not foolproof, but it could help.

  • @Jonathan_Doe_
    @Jonathan_Doe_ 1 year ago +3

    Really makes you appreciate Margaret Hamilton’s achievements given the computing power of the time. Yes the Apollo landings were manned missions, but they were still very dependent on that code functioning correctly.

  • @younameme2
    @younameme2 1 year ago +5

    Perhaps a stupid question but why didn't it have redundancy on sensors?

    • @linecraftman3907
      @linecraftman3907 1 year ago +1

      Probably weight limit

    • @samuraidriver4x4
      @samuraidriver4x4 1 year ago

      Weight would be the first thing that comes to mind.
      More mass = more capable spacecraft needed = more costs.

    • @abdullahkhalil9284
      @abdullahkhalil9284 1 year ago

      It does. All those "radio sensors" (I think it had 3, for collecting data from different directions) that were measuring the height were identified as "faulty" by the bug, and it turned off the input from those sensors. Marking the sensors as faulty based on their own output, and then turning all of them off, was the bug.
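      To make that failure mode concrete, here is a small Python sketch. It is not the lander's actual logic: the function name, the 500 m step limit, and the altitude readings are all invented for illustration. It compares a check that latches a sensor as faulty after one implausible jump with one that lets the sensor back in once its readings become consistent again.

```python
def believed_altitudes(readings, max_step_m, latch):
    """Return the altitude the guidance would believe after each reading.

    A reading that differs from the previous raw reading by more than
    max_step_m is treated as implausible and ignored. With latch=True the
    first implausible reading marks the sensor as faulty for the rest of
    the descent; with latch=False only the jump itself is ignored.
    """
    believed = [readings[0]]
    faulty = False
    for prev, cur in zip(readings, readings[1:]):
        jumped = abs(cur - prev) > max_step_m
        faulty = faulty or jumped
        if jumped or (latch and faulty):
            believed.append(believed[-1])  # hold the last trusted value
        else:
            believed.append(cur)
    return believed


# Invented radar readings in metres: the ground drops away by 3000 m
# when the lander crosses a crater rim between the third and fourth sample.
readings = [5000, 4900, 4800, 7800, 7700, 7600]

latched = believed_altitudes(readings, max_step_m=500, latch=True)
recovering = believed_altitudes(readings, max_step_m=500, latch=False)
```

      In the latched case the believed altitude stays at 4800 m while the real terrain-relative altitude is thousands of metres higher. In the recovering case the estimate catches up one sample after the rim. A real system would hold the estimate with inertial propagation rather than a frozen value, so treat this only as an illustration of the latching problem.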

    • @oohhboy-funhouse
      @oohhboy-funhouse 1 year ago

      $$$ and weight.

    • @olasek7972
      @olasek7972 1 year ago

      find me a spacecraft that has/had a redundant radar altimeter, even Apollo didn't have one

  • @LightsEnd304
    @LightsEnd304 1 year ago

    Your explanation reminded me quite a bit of dynamic positioning systems on ships / oil rigs

  • @jeffstaples347
    @jeffstaples347 1 year ago +22

    Dunno if you'll see this, but I'd be suuuuper interested to hear your thoughts on whether software engineers should have the same certification processes that physical engineers have.

    • @Esablaka
      @Esablaka 1 year ago +6

      FYI: they do in some countries.
      One isn't allowed to develop software for medical gear for example here in Germany without certain qualifications. There are also legal requirements and standards that the software development teams working on critical or potentially dangerous software have to adhere to here both in regards to coding and testing, but also in regards to their overall software design, risk analysis and much much more.

    • @jhonbus
      @jhonbus 1 year ago +4

      The kind of software you're working on makes an enormous difference to the consequences of getting it wrong, and I don't think it necessarily makes sense to try and develop a certification scheme that's simultaneously rigorous enough to assess people working on nuclear reactor control software while not failing 90% of candidates going into a career in casual game development...

    • @torinor6703
      @torinor6703 1 year ago +2

      They do. Software engineer, computer engineer, computer scientist, etc. are all degrees that you can get by studying in college. The issue is that you have all the 3-month-long React crash course Jimmys that call themselves "software engineers" when they are software developers instead. So yeah, there are many different certificates for software engineers, it's just that the name is often not respected.
      Btw I'm not trying to shit on self-taught or non-engineer software devs. The same way people like Michael Faraday did amazing things in the name of scientific research without a degree, some software developers are incredible at what they do. I don't want to sound like I'm implicitly blaming the lack of certificate enforcement as the reason why software bugs like this exist. This could very well have been coded by a software engineer, who would also not be to blame, because stuff like this has to be peer reviewed and tested by many people and on many levels. It was simply an incredibly unfortunate event.

    • @jeffstaples347
      @jeffstaples347 1 year ago +1

      A degree is not certification. And for physical engineers that assume accountability for the designs of their companies, Principal Engineers, their job titles do not come easily. What would be interesting to see is if there is any push to require engineer ACCOUNTABILITY, not just responsibility, as things are currently in the physical space.

    • @CarFreeSegnitz
      @CarFreeSegnitz 1 year ago +4

      This brings me back to my software engineering course. There are supposed to be several stages like requirements, specifications, design, implementation, testing, maintenance. And each stage is supposed to be evaluated and fed back to improve processes. The instructor stated that there had only ever been ONE software project to have ever followed the theory: Space Shuttle avionics.

  • @acasualviewer5861
    @acasualviewer5861 1 year ago +11

    As a software developer I know there's no such thing as an unbelievable software bug. Whatever can happen, will happen.. you have to be incredibly diligent to remove as many bugs as possible, and still there will be bugs.

    • @teslatrooper
      @teslatrooper 1 year ago +1

      okay but what if we just don't write the bugs in the first place, ever thought about that? j/k

    • @acasualviewer5861
      @acasualviewer5861 1 year ago

      @@teslatrooper haha.. tell me you don't know anything about software development without telling me you don't know anything about software development.

  • @jeechun
    @jeechun 1 year ago +2

    This story reminds me that I once planned to make a simulation for spaceships/probes that goes down almost to the hardware level, where the subsystems (sensors) could be configured with a certain precision, sampling rate, processing delay, and the way they communicate with the CPU, the flight computer, so that designing such a vehicle architecture would be closer to reality. The propulsion units could also be configured with a delay to start, stop, or change their output, and a function describing how that change happens.
    Maybe in KSP3? :D
    (Feel free to use this idea, most probably I won't have time to develop it.)

  • @henningerhenningstone691
    @henningerhenningstone691 1 year ago +5

    Unfortunate that this happened, but kinda funny that it essentially boils down to "some dev forgot the moon has craters".
    Makes me feel better about my own mistakes, you can be solving the most complex of problems yet miss some way too obvious issues

    • @GntlTch
      @GntlTch 1 year ago +1

      Not just one - the entire team!