Thanks for watching! Some additional details not explained in the video, which might help to better understand the irregularities observed:
- I didn't show all the AI inputs in the video to keep things simple. In reality it has access to more information, such as x, y, z velocity, velocity rates, roll/pitch/yaw rates, etc. But maybe it's still missing some crucial information; it's hard to know.
- The irregularities observed are not due to hardware or framerate issues. Everything I'm showing in the video is made with a tool called TMInterface. This tool allows me to inject action commands into the game at precise timings, in a way that is 100% repeatable. The same sequence of actions on the same map will always lead to the same outcome, even on a different computer. It's completely deterministic, and it's used a lot for Tool-Assisted Speedruns (TAS); you can find many examples of that on YouTube.
I have a bit of extra footage that I couldn't fit into this long video. I plan to post some of this extra content on my Patreon in the coming weeks. Any support there is a great motivation to keep making these videos :) patreon.com/Yoshtm
Do you know if the inputs are completely deterministic independent of time? If you run the same set of inputs offset in time, will the result be identical?
@@andrewbrown2038 If you are speaking of in-game time, it has an effect. Starting to accelerate at t=0.00s vs starting to accelerate at t=0.01s are two different inputs, which will have different outcomes, as shown in the video around 32:00. But it's still deterministic: starting to accelerate at t=0.01s will always have the same outcome in successive races.
From what I remember about neural networks, each neuron takes its inputs as numbers multiplied by weights and passes them through a function to output another number, which is taken as input again. I don't know the specifics of your neural network, such as which activation function you are using, but combine this with the fact that the network's outputs affect its inputs over time, and changing a specific input at any specific time, even slightly, creates a chain reaction that produces your random variations after a short period (usually 1 or 2 seconds), as seen at 18:44. So the random variation is not necessarily a problem of incorrect inputs but rather an inherent aspect of neural networks. Perhaps I am misunderstanding something, or maybe you had already answered this question, but I believe that is the cause of the random variation.
@@mattanderson5239 a few seconds later, around 18:50, I show that even when I force all cars to use the exact same sequence of actions after the perturbation, the randomness is still there. Maybe the neural network is part of the problem, but according to this result, it can't be the only source of randomness
Did you try adding small randomness to the AI's actions? Like +/- 0.001% of steering. It should force the AI to drive more safely, because of the unpredictability of each action's outcome. Not a data scientist btw, just an idea to try.
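A minimal sketch of this idea in Python, assuming the steering command is a float in [-1, 1]; the function name and noise scale are made up for illustration, not anything from Yosh's actual setup:

```python
import random

def noisy_steer(steer, rel_noise=0.001, rng=random):
    """Perturb a steering command by a small amount proportional to the
    command itself, so the agent trains against unpredictable outcomes.

    Hypothetical helper: `rel_noise` is the relative jitter magnitude
    (0.001 = 0.1%); zero steering stays exactly zero.
    """
    jitter = steer * rel_noise * rng.uniform(-1.0, 1.0)
    perturbed = steer + jitter
    # Keep the command inside the game's valid steering range.
    return max(-1.0, min(1.0, perturbed))
```

During training you would route every action the network emits through this function; at evaluation time you could disable the noise to recover deterministic play.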
I came here to see AI destroy humans on Trackmania, but I got a 37 minute essay on how pipes in Trackmania are an exercise in chaos theory instead. 10/10
PhD student in deep RL here. The behavior at the end seems mathematically chaotic and might lead you to the conclusion that you cannot deterministically predict the behavior of the car. However, that does not mean you can't improve the performance by a long shot with a few simple tricks. You are doing model-free reinforcement learning, which basically means that you don't need to predict exactly what is going to happen (extremely hard here); you just need to figure out which actions are best in a given situation. In most environments this is actually much easier to do (and a reason why learning an accurate model is often harder than just straight-up optimizing for performance with model-free methods). The second problem is that RL assumes you are in a Markov Decision Process (MDP), i.e. that you have full observability of the world, i.e. that the best next action depends ONLY on the information you currently have and not on the past, i.e. that adding information about the past does NOT help you make better decisions. HOWEVER, you are NOT in an MDP here, since you lack critical information about the state to make decisions, as many pointed out. In practice, what you can do will most certainly improve performance:
- Either manually add all the missing info (speed, rotations, ...).
- Or, more simply, create a concatenated vector of the previous N past states and inputs and use that as input to your NN (an LSTM/RNN works too, but is more complex and often not necessary when dealing with a few steps of history).
- Use more layers (deeper network = more complex functions can be learned by the network).
- Sticky/random actions: manually introduce randomness into the agent's actions. For example, randomly repeat the past action with a small probability. Do NOT decrease this probability over time.
What this means is that the agent will have to learn the whole time in a now-stochastic (i.e. random) environment and will have no choice but to cope with the inherent randomness that you added, making it far more robust than a deterministic agent. This is a common flaw of fully deterministic agents in these kinds of environments, btw.
- Look at extensions like maximum-entropy RL (Soft Actor-Critic, for instance), where you maximize both reward and the entropy of the policy. Some papers proved that it makes the learned policy more robust to out-of-distribution perturbations, i.e. perturbations that were never seen during training. In your case, this will help the agent learn "recovering behaviors" to recover from deviations caused by physics bugs/randomness/whatever.
- Max-entropy RL will also help a lot with exploration, which might help the AI be more creative in finding solutions.
- Look at other tricks that improve performance "for free". I don't know how much you've implemented already, but a basic start is looking at Rainbow DQN, PPO... though you are probably already using those?
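The frame-stacking suggestion above (concatenating the previous N states into one input vector) can be sketched like this; the class and dimensions are hypothetical, not from Yosh's actual code:

```python
from collections import deque

class FrameStack:
    """Concatenate the last N observations into one flat input vector,
    a cheap alternative to an RNN for partially observable settings."""

    def __init__(self, n_frames, obs_dim):
        # Pre-fill with zero frames so the stack is full from step one.
        self.frames = deque(
            [[0.0] * obs_dim for _ in range(n_frames)], maxlen=n_frames
        )

    def step(self, obs):
        """Push the newest observation and return the stacked vector,
        ordered oldest-first so the network sees a consistent layout."""
        self.frames.append(list(obs))
        stacked = []
        for frame in self.frames:
            stacked.extend(frame)
        return stacked
```

The same trick extends to past actions: append the previous action to each observation before stacking, so the network can infer rates of change on its own.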
Roboticist here. Some regions of configuration space are more chaotic than others; the demo you did where you perturbed a car that was about to fall off demonstrated a smooth increase in success rate as you moved the perturbation further away from the point at which outcomes diverged, suggesting that the failure was in a fairly narrow chaotic _region_ and moving out of that region made the system more stable and enabled more consistent success. This is what unnamed did to complete the jumps track: Found a strategy that consistently avoided highly-divergent regions of configuration space. One problem is that, if the display of the inputs is representative, the AI doesn't have the right sensors. Specifically, the AI has its position and speed but not its _velocity_; it doesn't know how much of its speedo reading is in each component of down the track vs. across the track vs. away from the track. More critically, it also doesn't have roll-pitch-yaw rates. It only knows its current orientation, not the rate at which its orientation is changing. I suspect that the car also needs to know the difference between a simple pipe elbow, a tee, and a cross. Even if the extra geometry isn't in the direction of its travel it'll almost certainly still interact. Finally, consider increasing the numerical precision of your network. This is one of those cases where running floats vs. doubles might actually matter.
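The velocity-component point can be illustrated with a small sketch: given a world-frame velocity vector and (assumed) unit vectors for the car's forward and up directions, split the speedo reading into down-track, across-track, and off-surface components. All names here are illustrative; the game exposes its state differently:

```python
import math

def decompose_velocity(velocity, forward, up):
    """Split a world-frame velocity vector into components along an
    assumed car/track frame. `forward` and `up` must be unit vectors.

    Returns the components plus the scalar speed, which is all the
    speedo shows on its own.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    # Across-track axis; the sign convention depends on the frame.
    lateral = cross(forward, up)
    return {
        "down_track": dot(velocity, forward),
        "across_track": dot(velocity, lateral),
        "off_surface": dot(velocity, up),
        "speed": math.sqrt(dot(velocity, velocity)),
    }
```

For example, a car moving at 5 units of speed could be doing all 5 down the track, or 3 down-track and 4 straight up off a pipe; the scalar speed alone cannot distinguish the two.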
Adding rate inputs, or more generally some memory of past frames (recurrent neural networks), should indeed improve the results. Unless Yosh is already using those and has just kept the narration simple. :)
I hadn't noticed that, but that's a great point. If you think of the AI as trying to force configuration space into the few fast values where it doesn't fall off, being able to actually see the full space can be a great help. Otherwise it becomes impossible to figure out the optimal action. Of course, there might be performance and training concerns if you take it to the extreme, but experimenting with adding more inputs could be a good way to see what improves consistency.
Fully agree: the AI needs more information to make the right decision. But I don't think giving it all the information would solve all the problems, because the game engine itself is just a simulation of real-world physics. There is also a tick rate and some approximations/simplifications being done. Some things will be hardcoded, like material coefficients for energy conservation during different types of bounces (the side of a wheel against a wall behaves vastly differently than the bottom of a wheel on the road/pipe surface you land on). Imo Yosh needs a model well trained on the game engine itself, in a way that lets the AI predict the result of each action it makes.
I think this is it. The inputs look like they are suited for a track with a 2D layout like in his last videos, but these pipes need a better 3D understanding of angular momentum. The AI is doing the best it can with a blind spot in its vision.
In previous videos, Yosh explained he uses the last X frames in addition to the current frame as input, so I'd assume he's doing the same here. (At least I think Yosh does, it's hard to remember which AI video does what :E)
most certainly. it also means that all of thousands of times you failed were simply bad luck, and everyone to ever succeed at a world record simply got lucky, and this is not at all a skill issue
Thanks for using my map :D I have heard people saying that Trackmania is a deterministic game hundreds of times. I think from now on I will prefer to use the word "chaotic" :D
Unironically one of the best Trackmania / AI-beats-things videos I have ever seen; the visual representation of everything and your clear explanations made this so satisfying to watch. The showcase you did when explaining the randomness of tiny input changes was so well done, and it's really clear how much time you put into this topic and video. Also, your humor is just amazing lmao
Wow, this was not only informative and entertaining but a cinematic masterpiece with perfect pacing and music selection and everything, I didn’t even realise it was 40 minutes long until after I finished watching it!
I’m not a TrackMania player but I love watching TrackMania videos. I also love game AI vids, and this video really did it for me. Being fascinated by chaos theory I’ve been screaming at my screen throughout the whole video ”bro it’s chaos theory”, and then you even mention it yourself. Top tier content, it’s what I’m here for. You just gained a new subscriber 😄
Haha same! I'm into chaos theory too, so the second he was like "it's deterministic, but it still feels so random...little differences in input..." I was yelling at my computer!
Genuine comment though: if you give the AI full knowledge of the decimal precision you're getting, it might actually be able to do something about it. Give it the smallest of minute details; only then can it do something about it.
Your analysis about chaos theory is exactly what is going on, and TrackMania's chaotic physics are very familiar to me from another TrackMania project. The best way to control that chaos is by going slower, as the deviations caused by the bugs are less severe at lower speeds. Perhaps giving less reward for speed and more reward for reliably getting far would help with consistency.
In fact, that's probably why the backwards driving cars were much more consistent. There's not as much deviation in speed when they're going backwards, so any bug has less impact and the performance reward is more focused on getting far too.
I dealt with similar issues in some TAS work with Neon White. In that case, small variations in frametimes would lead to different interpolation of player movement (even with fixed-tickrate physics, the actual player position was still interpolated based on frametimes), which would occasionally add up to significant positional desyncs. Vsync helped reduce the frequency of this (as it better synced frametimes to monitor refreshes), but it still wouldn't fix it entirely. Sometimes, it's just about the little things.
I don't wanna say one thing is the reason for the other without 100% confidence, but forwards = more speed = more chaos, backwards = less speed = less chaos. Coincidence?
@@Bruno-cb5gk I think you will find such a calculation not only impractical but next to impossible to perform... The Lyapunov time will only be a few frames/refreshes of the physics engine. Anyway, we know all the hidden variables here, as he showed in the video, so there isn't much point: it is still deterministic, and we can actually view all the precise starting information. So why can't we just let the AI view that data as well? Correct me if I am mistaken, please.
At a high level, and as someone who does not necessarily understand the mathematical underpinnings of chaos theory or neural network training, I would say the key to solving the inherent imperceptible randomness here is adaptability. I noticed that at certain points training on the pipe courses, you changed the reward system or the parameters that the AI could use to operate and judge its success, allowing it to achieve a desired outcome by adapting to specific scenarios or ways of thinking. At the end of the Calm Down course, for example, you gave it specific parameters at one point in the track that encouraged it to follow Wirtual's line and try a completely different technique that it simply would not have discovered on its own using your existing parameters.

All that to say, the creativity that you pointed out is key, and I think that could also be called adaptability. How do we define in code the ability to see a possible improvement and actually try to make it happen, then give up and move on if no appreciable progress is achieved? You might not just have to train the control model itself; you may need to train the rewards. By changing its goals as part of the training process, you might get it to prioritize certain things like stability, consistency, or pace depending on the situation.

I think humans operate a lot more like that. If we are trying to do the pipes, for example, we may start by trying to just stay on the pipes and stabilize. We might learn how to recover from a mistake or a random element introduced by the system. As we are learning the course, our priority might shift mid-run from going as fast as possible to not falling off, and that changes the behavior. In this video, it seems like you only changed the goals yourself in between training sessions. How do we put all of these pieces together to train a model that can understand how to change its own rewards to best suit the situation and desired outcome?
At the end of the day, it was fascinating to see how “human” the trained models looked. It continued to refine how it applied inputs, but it struggled with the inherent minute inconsistency in the game engine.
The butterfly transition at 17:27 is a pure masterpiece! The many small things you do while editing don't get enough appreciation! Congrats on a beautiful video!
Bro, I was like no fkn way he beats Wirtual again upside down, and then he does it backwards just to show off that Wirtual is just driving like a grandma on that track lol
What an insanely good video! I haven't even played Trackmania in many years... but the way you explain the details and visualize everything just makes this soooo interesting! 😌👏👏👏
I love how the AI on the 65km pipe track looks like it's just blissfully skipping along while the human records look like they're terrified. it's so silly.
The editing, the music choice, the compactness of the information while still remaining coherent, the pacing. This video is a masterpiece! Even small things like using the butterfly-shaped pinhole transition when bringing up that topic for the first time, or how the hue of the cars matches up with which generation they're from. There are so many little details that I could write my own essay detailing every bit of extra effort you put in. I was enraptured the whole time. Bravo!
Your comment perfectly captures the things I noticed most: the continuous audio and visual cues where important concepts were being introduced, and then the release of focus where it was okay to relax. That is just as interesting as the material itself, which I stayed engaged enough to learn thanks to the editing and composition. A goddamn MASTERCLASS!!!!
Mathematician here. I fully agree with your chaos-theoretic conclusions, and I think the pipe experiment at the end strongly supports this (as you realised). Why not give the AI control after a random amount of time has passed at the beginning? I think it is really interesting; I would like to see if the AI could somehow abuse the chaos-theoretic nature of the situation. I remember a video about a Pokémon AI that figured out some elementary RNG manipulation so that it wouldn't get grass encounters :D If you want to talk actual mathematics, then the YouTube comment section is perhaps inadequate. I can't answer every question either (I only took one chaos theory class as a grad student), but perhaps we can try to understand some things.
What I'm curious about as a programmer, thinking about this issue, is whether for whatever reason the physics system is tied to a clock/tick rate that could potentially cause that bizarre result where a certain delay always means success. Like, if you tried the sim on another machine, would the delay carry over? Is it a processing limitation? There could also just be mathematical issues within the physics engine itself; e.g. if you hit the ground hard and, let's say, it spikes the suspension values to some crazy number, maybe the engine tosses values over a certain threshold or under another. @Yosh I wonder if you could teach it what stability "means", and set up a reward system for "good angles" on drops that are more likely to produce consistent results.
This video is incredible from start to finish, the subject matter is fascinating, the editing is sublime, the commentary and timing is great... The wirtual music, and the carrot sound byte. all of it is just so good. Thank you Yosh
this video is about something very technical, yet, it touches something deep inside me, something more emotional rather than rational. I sincerely congratulate you on this, I feel this turned out to be more philosophic and even artistic than I expected it to be. (also, huge detail to use wirtual theme and editing style, it adds a whole new emotional layer)
Your humor with "the song" and "the AI got this run" is just too damn much man. On top of being a cinematic masterpiece, it's also funny as hell. Insane video, genuinely hope you blow up beyond belief.
I didn't expect there to be micro-fluctuations in a still car, but it's equally fascinating and frustrating to learn the game is so sensitive that milliseconds can, in niche scenarios, determine your success. It may not be truly random, but it's close enough that you can't account for it.
But that's actually what happens to a car IRL. A straight line is a mathematical convention with absolutely no correspondence to anything in the observable universe
It does make sense in a way, where else can you see that much self-induced suffering lol Still, nice to see a fellow countryman show so much perseverance - inspiring, in a way.
"In the Hall of the Mountain King" playing while all the cars rush to their demise like a cascade, is so fitting. Your videos are so interesting and it was worth every second spent.
When I saw that butterfly, I thought you're gonna get deep into chaos theory. Thank you for the great video and for not complicating that topic further this video. I would love to see a project discovering the chaotic behaviour of collision with pipes, as an example, in Trackmania.
This is really stellar work. Not just the AIs but the videos. I'd equate these with the likes of Summoning Salt who are just very good at not only doing stellar work themselves (speed runs in his case) but also present it in an enjoyable and captivating fashion. I have not binged watched your material, but from the videos I have seen this is some of your best work. Keep em coming.
Even deterministically chaotic systems can be controlled; in practice this is when the Lyapunov exponent is small and the agent has a quick enough reaction time (and steering capability) to keep control. But the touch at 34:06 looks similar to a landing bug: a sudden deviation from usual behavior. What you can do is add another deep neural network (or part of the overall one) to act as a predictor. That would detect such "buggy surprises", and you can focus the score on surviving them. As for making the competition with humans fairer, you could make the AI commit to its actions some time (say 0.3 seconds) into the future. This would lower its control, so it would have to drive slower in order not to lose control. (Edited typos.)
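The action-commitment idea can be sketched as a simple delay queue, assuming one entry per physics tick; the class name and interface are made up for illustration, not TMInterface's API:

```python
from collections import deque

class DelayedActuator:
    """Force the agent to commit to actions ahead of time: an action
    chosen now is only applied `delay_ticks` later, so the agent cannot
    react instantly and must drive more conservatively."""

    def __init__(self, delay_ticks, neutral_action=0.0):
        # Pre-fill with neutral actions so the first ticks are defined.
        self.queue = deque([neutral_action] * delay_ticks)

    def step(self, chosen_action):
        """Queue the agent's new choice; return the action that was
        committed `delay_ticks` ago and is actually applied this tick."""
        self.queue.append(chosen_action)
        return self.queue.popleft()
```

With the game running at, say, 100 ticks per second (an assumption), a 0.3-second commitment would be `DelayedActuator(30)` sitting between the network's output and the game input.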
Splitting the AI up into multiple smaller ones seems like the way to go. For example one for prediction, one for recovering from mistakes/bugs/chaos, one for just trying to go fast (pretty much what the AI in this video was) then a controller AI that decides when to switch between them. This approach would also make harder tracks more possible, if there would be a separate AI just for speedslides, bug slides and all the other techniques, a few prediction AIs and then a decision maker AI.
But landing bugs are deterministic. They are called bugs, but if you execute the exact same inputs, the landing bug will happen; that's how Trackmania saves replays, in fact. I think what is nondeterministic is how long it takes the AI to compute an update, so it gets a slightly different view of the world at the next step.
This definitely is your best video ever! I've enjoyed the previous ones a lot, but this is a masterpiece! Keep going at it; the humor, the theorycrafting, your thought processes... it's all just one giant jigsaw that fits together perfectly.
Goosebumps indeed, but I gotta say, the pun at the end that more Patreon subscribers would be a great "reward" for him made me crack up even more 😂 Suddenly the guy is just an AI himself 😅
Every time I see these videos appear, I can't help but watch. It's super interesting to see the steps required to train the model and the steps you take as you iterate. On top of that, the way you've visualised all of the attempts and with the use of colour to demonstrate how close the AI is to ideal for the scenario is just pure eye-candy. I love it!
This was such a masterpiece video! I’ve been following you for a long time but it is incredible on how far you have come. I enjoyed every minute of this video. Thanks. ❤
Software developer with a focus on ML here. Just from 32:03 you can see a lot of variance in the game engine's physics calculations. This is most likely due to floating-point precision error accumulating over time. To a human player this variance isn't perceptible; e.g. a hundred-thousandth of a velocity change looks the same in practice, but the different input may cause the AI to make a different decision. The AI may be overfitting to the noise floor in the signal. To make the AI more consistent and human-like, I think you need to actually DECREASE the precision of the values fed into the model. That should help reduce the chaos being injected into the system. This is a balance, because reducing the precision too far is bad too. A simple trick to try is converting the inputs to int8 or int16; that gives lower-precision input and performs faster with simpler math. Or you can just truncate each float to the first 4-5 significant digits.
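A rough sketch of the truncation idea: round each observation to a few significant digits before it reaches the network, throwing away the floating-point noise floor. The digit count is an assumed tuning knob, not a known-good value:

```python
import math

def quantize_observation(values, sig_digits=4):
    """Round each input to `sig_digits` significant figures, so tiny
    floating-point jitter below that resolution is discarded before
    the values are fed to the network."""
    out = []
    for v in values:
        if v == 0.0:
            out.append(0.0)
            continue
        # Scale so the leading digit lands in the right place, round,
        # then scale back.
        exponent = math.floor(math.log10(abs(v)))
        factor = 10 ** (sig_digits - 1 - exponent)
        out.append(round(v * factor) / factor)
    return out
```

The int8/int16 variant is the same idea with a fixed range instead of fixed significant digits; either way, two states that differ only by imperceptible noise now map to the same network input.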
New French AI dev here. Chaotic behavior can be stopped if you return to a stable, deterministic, comfortable position/state. In your pipe jump race, you can reduce the problem circuit to a single jump: start in any position on a pipe, jump to the next pipe, and stop. Then reward the AI when it stops for a moment on each pipe. That way the chaos gets reset.
But it would have to stop at the same place each time, which may not be possible: the initial situation means you have a limited number of movements you can make and a limited number of different speeds you can reach. It's all in discrete steps, so there may not be any way to reset the chaos like this.
@@ledernierutopiste Yes, that could be a small problem. As he did before, he can spawn the car randomly on the pipe; then the AI will learn how to go from any initial position. Then on the next pipe, after all four wheels have been in contact and stable for a moment, the AI will know how to continue to the pipe after that. And so on.
The only thing is, the game isn't deterministic. He showed that the position of the car changes by an imperceptible amount hundreds of times a second, which leads to drastically different outcomes. So it could never be in the same spot twice.
@@cantingpython We don't need it to be in the same spot twice. We need the car to have all four wheels on the floor at the same time for a moment, to be stable and recover a deterministic state. Then jump, and repeat these operations at each stage.
I can't even imagine the amount of work behind the design/training of the AI and the writing/editing of this video. The amount of work you put in is INSANE, and the result is beautiful and astonishing! The way you highlighted the chaotic behavior was also very interesting! It was a real pleasure to watch!
First video of yours I've seen, and it's an instant subscribe. I know nothing about Trackmania and I'm not much of a gamer generally, but this was absolutely fascinating, and the quality of your shots and editing is outstanding.
Yo man was just gonna say that you inspired me to dive deeper into reinforcement learning for my master thesis after seeing your amazing work. Keep it up!❤
Fellow NN dabbler here, I have a few thoughts/suggestions. There's a degree of unpredictability from floating-point bit precision and, as is obvious to all, the system is inherently unstable. But I don't think this is insurmountable.

Firstly, there's a limit to what you can achieve with just two hidden layers and no temporal context. The lack of layers translates into a lack of abstraction: all you'll get is a "this high, that low" kind of reaction. Worse still, the NN is making decisions based on the instantaneous state, when the most important thing is momentum, which has a time component. You've therefore given it an unsolvable problem, like trying to play chess without knowing where your opponent's pieces are. You can test this hypothesis by calculating angular and linear momentum and feeding them in as inputs. If I'm right, you'll see an instant improvement. Much better, though, is to allow access to the past: try a simple recurrent NN first to see if it improves matters, although you'll have to watch the expense.

Finally, just like the pipe it rides, the instability of the system becomes critical when you approach loss-of-control circumstances. Right now that's not being trained for; the change in the input just alters which corner fails, so the loss is not factored into the reward, meaning that you stay in the same position in latent space. You need to find the point before the failure where it is most recoverable. Say it's 3 seconds before the fall: find a failure point and mass-spawn there with a random time offset of, say, 0.5 seconds, and train the AI that, on average, has the fewest falls.

Fundamentally though, an input/reaction AI is insufficient. I would probably look at using an attention vector, where I'd try to get it to recognise certain situations and contextualise its response. Just a thought. Very fun vid btw!
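The momentum-as-input suggestion is cheap to try. A sketch, assuming a mass and a diagonal body-frame inertia approximation are available; all names here are illustrative, not actual TrackMania internals:

```python
def momentum_features(mass, velocity, inertia_diag, angular_velocity):
    """Compute linear and angular momentum components from quantities
    the state may already contain, to append to the network's input.

    `velocity` and `angular_velocity` are 3-component sequences;
    `inertia_diag` is a hypothetical diagonal inertia approximation.
    """
    # Linear momentum p = m * v, componentwise.
    linear = [mass * v for v in velocity]
    # Angular momentum L ~= I * w with a diagonal inertia tensor.
    angular = [i * w for i, w in zip(inertia_diag, angular_velocity)]
    return linear + angular
```

Since mass and inertia are constants for a given car, this is really just giving the network pre-scaled velocity and rotation-rate inputs; the point is that the time-derivative information reaches the network at all.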
Great comment! 👍 With a lot of effort and sophisticated methods I don't think it would be unfeasible to train a model to the point where it never "bugs", i.e. it avoids and/or works around situations where the chaotic nature of the simulation surfaces. Maybe even an AI that could uber at will... 👀
What's worrying about the floating point imprecision is that some of the car's parameters change by an order of magnitude even when it's at rest (32:02) which seems to indicate the game engine doesn't really try to limit fp error propagation.
@@JerehmiaBoaz That's just the physics engine implementation. There's a "gravitational" acceleration and then a hitbox on the wheels which counteracts it. Obviously it doesn't quite return the body to the same place, but it's imperceptible, so nobody is going to worry about it. These things are an art, not a science: there'll be "suspension" on the car and "friction", and they'll mess with the settings until the car feels right. Bit precision on these tiny effects just scrambles the numbers a little; it's not the cause as such. If you run the same computer programme twice you always get the same result; bit imprecision always gives the same result. The real issue here is how low-entropy the solution is. The count of microstates of the failed-outcome macrostate is particularly large compared to that of the success macrostate, so small changes escalate quickly. Compare that to noughts and crosses / tic-tac-toe, where the success macrostate is a substantial proportion of the total. Sorry, that was an overly academic take, but it's the fundamental issue.
@@davidmurphy563 I understand how physics engines work, and I've dabbled in AI myself a bit (I'm a contributor to the Stockfish chess engine). My point is that the number of microstates of the successful-outcome macrostate might be further reduced by the granularity of the fp calculations performed by the game engine: it might be impossible to take some corners perfectly because the engine's inaccuracy makes the car either understeer or oversteer depending on the involved physics parameters.
@@JerehmiaBoaz Oh wow, really?! I'm a big chess fan myself and Stockfish is an awesome engine! Hats off to you. Sorry, didn't mean to teach you to suck eggs, I just assumed layman and honestly, I just wrote what I'd been mulling over anyway. Helps me think it over. Yeah sure, I'll accept that point in principle, although perhaps not in practice here. In general, the more bites at an apple you have to correct the divergence the better. If you were to map the latent space onto a discrete cosine transform (I don't recommend fourier as the ends need to meet) with the outer product then you could likely identify the frequency ceiling and set the framerate accordingly. Although if I recall correctly I believe the dude tried upping the frames without result which - if you squint through rose tinted glasses - vaguely supports my suspicion that missing temporal data led to intractability.
What an interesting video, this is unironically one of the most entertaining videos I've watched on this website in a long time. Absolutely crazy concept and amazing work with great editing.
Oh my god... not only was this topic SUPER interesting, but it was also presented and explained beautifully, masterfully shot and edited... the list goes on. The way you exhibit the improvements with colors, showing hundreds or probably thousands of runs simultaneously is visually stunning but also very easy for the average viewer to digest. I'm blown away. I subscribed after being suggested and watching your last video, and I'm so glad I did. Just unbelievably well done. Massive props! 🤩
Bro i recently started watching ur vids but they are absolutely amazing. The editing to show all car runs are amazing along with the color coding and the overall video. Much love
To any professor out there trying to find a way to create an inspiring course on modern computer science and AI: THIS is how you do it. In this 40-minute video, the following essential concepts are demonstrated: the basics of neural networks, the evolution of perturbations in a nonlinear system leading to chaos, visualization to support human analysis of complex systems, gamification, inventing and using classifications to measure change in complex systems, combining evolutionary techniques with reinforcement learning, using focused training in AI systems to overcome plateaus, establishing metrics to quantify progress, competition, monetization, community building, the importance of endurance... you name it. And all using a fun game! Take this video, write all the concepts on a board, and start teaching. (Male) students will love it. It amazes me that a single guy, without even attempting or knowing it, did all this on his own. Congratulations! An outstanding achievement.
@@aesop1451 Yes? I fail to see why you'd ask such a dumb question. My little cousin loved Lightning McQueen so much that she used to have a Lightning McQueen bed.
There are two deterministic systems to distinguish here: the engine and the trained instance. The game engine is deterministic, and the many degrees of freedom plus the digital nature of floating point numbers introduce nonlinearities that increase the chaotic behavior of the system. Each neural net decides deterministically with given equations, which are nonlinear in the inputs (because of the many layers and activation functions). Now, that chaotic/nonlinear physics feeds into the chaotic/nonlinear trained model instance. Finding the weights of the model that give the perfect solution might be impossible, but there might be infinitely many solutions that bring the car from start to finish in a WR time (infinite for the continuous model; countably many using floating point numbers). The bigger challenge is to define the perfect solution and find it. For chaotic systems these parameters sometimes cannot be found efficiently. Using evolution as an approach is great in this case, as you portrayed beautifully! Still, this approach does not guarantee the optimal solution. Numerical optimization can make problems practical, but the theoretical optimum can only be found in the least chaotic cases. Thanks for the great video, loved watching it! (PhD student in complex physics)
If you plug two chaotic systems together and use the same initial conditions, they will still produce the same output. It doesn't matter that you stacked them. There has to be nondeterminism somewhere to get the result of this video without manual perturbation
Well, each of finitely many IEEE floats can take up to 2^32-ish valid states (minus NaN and other special values). So while the number of permutations is absurdly large, there are, technically, only finitely many of them @@drdca8263
I think it simply comes down to the rounding errors induced by variations in the clock when certain calculations are performed. It's the difference in the initial timing parameter that causes the divergence despite the same inputs.
Among all the AI videos I have watched, this one was by far the most interesting. Your storytelling was spot on. Thank you so much! I really enjoyed it.
Incredible video, seriously, congrats on the work. I really loved watching your thought process evolve as the problems did. It was so cool!
I think you're close with your conclusion about chaos theory. Here are my thoughts about it (but bear with me: I am not an expert on reinforcement learning, I work in supervised learning): 1. I believe one big problem with Trackmania specifically is that, because of the bugs you described (like landing bugs), it is NOT a chaotic system where a small deviation leads to vastly different outcomes far into the future. The future deviates extremely slowly, but a bug can introduce a sudden extreme change that cannot be foreseen. This is just an observation, but if you could somehow detect these bugs on the pipes and give reward for surviving them, it may help. 2. In supervised learning there is a common technique of bagging/boosting (slightly different from what I am about to describe, but the idea is the same): take specifically the runs of your agent that went wrong (like one that falls off), and train on each such run specifically (maybe starting 1 second before it fell, with the exact same conditions) for some percentage of your training time (this percentage is usually adjusted based on how much you value reducing false negatives vs. false positives; in your case it might be a hyperparameter you adjust for the speed/consistency trade-off). You can also change the reward function in these runs specifically to 'just reach the next corner safely'. 3. Bit of a strange idea, but it's worth an experiment (this idea comes from time series forecasting): introduce tiny noise to the inputs and/or outputs of your agent. This way, the agent may become more reactive and may learn to deal better with unexpected deviations.
I didn't understand your point 1. : why do you deduce from the presence of bugs that trackmania might not be a chaotic system ? Points 2. and 3. are good ideas
@@yoshtm To build on idea 3, you may want to also try reducing the precision of the inputs. If it has a lot of fluctuating low-significance noise even at rest, you may get a better model if you just ignore it by converting to a smaller FP type.
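The precision-reduction idea could be prototyped by round-tripping observations through a smaller float type. A minimal sketch using NumPy's float16 (the "sensor" values below are made up for illustration):

```python
import numpy as np

def quantize_inputs(inputs):
    """Round observations through float16 to discard low-significance noise.

    Two readings that differ only in fluctuating low bits collapse to the
    same value, so the network sees a cleaner, more stable state.
    """
    return np.asarray(inputs, dtype=np.float32).astype(np.float16).astype(np.float32)

# Two hypothetical 'at rest' readings that differ only by tiny jitter:
a = quantize_inputs([12.300001, -0.4999996, 98.76543])
b = quantize_inputs([12.300002, -0.4999999, 98.76544])
print(np.array_equal(a, b))  # the jitter is quantized away
```

Whether the lost precision hurts more than the removed noise helps is an empirical question, which is why it's framed as an experiment.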
I'm writing an exam tomorrow about evolutionary algorithms - so to speak, the backbone of this kind of reinforcement learning - and it's so cool to finally see all these mathematical theories and concepts in a "real life" example. Good job :)
It's most likely the floating point math. Basically the program checks every cycle, looks at the state, then tries to apply its action, but because computers are funny, a bunch of unexpected values show up down the line. I'm pretty sure you can find the information you need in a video about why computers can do calculus or algebra or whatever higher-level math, yet still struggle with simple decimal arithmetic.
The quality of your methodology description and all the documentation you provide is incredible. You'd put a lot of PhDs to shame; this is professor-level work.
Amazing video! I'm surprised how consistent and dedicated you've been to this project. I have no doubt you've spent as much time on it as some pro players have spent playing the game.
Not to mention the sheer computational power. It's like pitting a normal human learning to draw and replicate the Mona Lisa, spending a day per attempt, against a human learning it by drawing a thousand Mona Lisas at once with a thousand limbs in a minute, over and over again. I wonder which one will learn faster!
I got so wrapped up in the Wirtual-esque backwards driving montage that when 26:18 happened I just burst out laughing lol. Amazing video as always, between the writing, the editing, the visuals, everything is top-notch.
30:55 "it seems to lack creativity." Which is a casual way of saying: "The AI is performing a gradient descent on its loss function. One of the limitations of gradient descent is that it will find a _local_ minimum rather than the global minimum. After all, gradient descent is a greedy approach." Humans are "creative" because we can use a dynamic approach. We often use a greedy approach too, since it requires less planning, but we are also always questioning, "Maybe there's a better solution that's out of the box?" which is what a dynamic approach does: it keeps checking whether a unique set of subsolutions can beat the previously discovered optimal set of subsolutions when combined with a new possible "frontier" of choices. In the case of turning the car around to forwards-facing, the AI skipped over the possibility of a long-term improvement because it is using a greedy approach (it only sees the immediately-next best thing it could do, and turning the car around at this moment will slow it down). A human driver, however, will consider that turning the car around increases its speed for the rest of the track. They will therefore check whether a solution set in which the car is turned forward-facing can beat solution sets where the car is left backwards. E.g. let's say the track had only 4 turns: S1 = {left, left, turn car around, right} vs. S2 = {left, left, right, right}. Both the greedy approach and the dynamic approach realize solution set S2 is faster, as turning the car around takes too much time. The difference is, greedy will now NEVER try turning around again, even if the track had 8 turns rather than just 4. Dynamic will STILL experiment with turning around: S3 = {left, left, turn car around, right, right, left, left, right} vs. S4 = {left, left, right, right, right, left, left, right}. Turns out, solution set S3 is faster than S4, because turning the car around forwards makes it faster on the next 4 turns.
Dynamic has found this solution because it always considers the entire solution space. Greedy will not find S3 because it extends S1, and since S1 was discarded by greedy, this extended optimal solution was never evaluated. So yeah, this "What if?" curiosity in the dynamic approach is what makes it capable of finding the global minimum.
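The greedy-vs-global distinction above can be made concrete with a toy cost model. All the numbers here are made up just to reproduce the S1..S4 story (a corner is slower backwards than forwards, and spinning around has a one-off cost):

```python
from itertools import product

# Toy cost model with made-up numbers: a corner takes 1.0 s driven backwards
# and 0.5 s driven forwards; spinning around to face forwards costs 2.1 s.
TURN_COST, BACK, FWD = 2.1, 1.0, 0.5

def run_time(plan):
    """plan[i] is True if we spin the car to forwards at corner i."""
    forwards, t = False, 0.0
    for spin in plan:
        if spin and not forwards:
            forwards = True
            t += TURN_COST
        t += FWD if forwards else BACK
    return t

def greedy(n_corners):
    """Pick the locally cheapest action at every corner (never looks ahead)."""
    forwards, t = False, 0.0
    for _ in range(n_corners):
        keep_cost = FWD if forwards else BACK
        spin_cost = (0.0 if forwards else TURN_COST) + FWD
        if spin_cost < keep_cost:
            forwards = True
            t += spin_cost
        else:
            t += keep_cost
    return t

def exhaustive(n_corners):
    """Evaluate every possible plan: the 'dynamic' / global view."""
    return min(run_time(plan) for plan in product([False, True], repeat=n_corners))

print(greedy(4), exhaustive(4))  # equal: on 4 corners, staying backwards wins
print(greedy(8), exhaustive(8))  # exhaustive finds that an early spin pays off
```

On 4 corners both methods agree that spinning never pays; on 8 corners only the exhaustive search discovers that eating the 2.1 s spin cost early is faster overall, which is exactly why greedy "never tries turning around again."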
Real-time physics is like a very complex mathematical formula with multiple variables and multiple steps, where each step consists of different operations (sum, multiply, divide, root, etc.). All this means that even very small differences in the values of the variables can yield wildly different results. That's why they seem random to us humans. For example, if a formula involves squaring a number 9 times (raising it to the 512th power), then when x is 1.01 the result is about 163, but when x is 1.02 the result is about 25,300.
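A quick sketch of that arithmetic (squaring 9 times is raising to the 512th power; bases of 1.01 and 1.02 reproduce the magnitudes mentioned):

```python
def square_repeatedly(x, times=9):
    """Square x nine times, i.e. compute x ** 512."""
    for _ in range(times):
        x = x * x
    return x

# A 1% difference in the starting value becomes a ~150x difference:
print(square_repeatedly(1.01))  # about 163
print(square_repeatedly(1.02))  # about 25,300
```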
This definitely does seem like a chaotic system, but that doesn't mean it's not possible to complete the maps. You cannot predict what is going to happen, but you can make the system robust to small changes. My suggestion is to add a little noise to the control output during training to force the AI to avoid situations that are sensitive to small deviations. Best regards from a control engineer 😊
Yup. Another way of looking at it is that the system of the game physics is chaotic if given fixed inputs -- but the larger system that contains the game physics and the control algorithm may or may not be chaotic. And "making the control system robust to small changes" is another way of saying that you're making that overall system non-chaotic. That's a lot like how an inverted pendulum is unstable, but an inverted pendulum in combination with a properly-tuned PID controller will be stable. In both cases, mathematically, you want all the poles (of the combined system's behavior) in the left half of the complex plane, and then things are good.
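That pendulum-plus-controller picture can be shown numerically. The sketch below uses a linearized inverted pendulum and a PD controller (the proportional and derivative terms of a PID); the plant, gains, and time step are made-up values chosen just to contrast unstable vs. stabilized behavior:

```python
# Linearized inverted pendulum: theta'' = (g/L)*theta + u. Uncontrolled, the
# upright equilibrium is unstable; a PD law u = -Kp*theta - Kd*omega moves
# the closed-loop poles into the left half-plane and stabilizes it.
G_OVER_L = 9.81  # g/L with L = 1 m (hypothetical plant)
DT = 0.001       # integration step in seconds

def final_tilt(kp=0.0, kd=0.0, theta0=0.01, seconds=5.0):
    theta, omega = theta0, 0.0   # angle (rad) and angular rate (rad/s)
    for _ in range(int(seconds / DT)):
        u = -kp * theta - kd * omega   # PD control torque
        alpha = G_OVER_L * theta + u   # angular acceleration
        omega += alpha * DT            # forward-Euler integration
        theta += omega * DT
    return abs(theta)

print(final_tilt())               # no control: a tiny tilt blows up
print(final_tilt(kp=30, kd=10))   # PD control: the tilt decays toward zero
```

With kp=30, kd=10 the closed-loop dynamics become theta'' = -20.19*theta - 10*omega, whose poles sit around -2.8 and -7.2: both in the left half-plane, hence stable.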
@@yoshtm Here's an example of a different controller (in this case, one powered by reinforcement learning) interacting with an inverted pendulum. (I don't know if links work, so here's just the end: watch?v=lM6rYjM6HBU). I tried to find non-inverted-pendulum examples, but inverted pendulums are pretty much the classic example of a chaotic system. A solution you might not have heard yet (I'm sure other people have said 'cap the progress/speed reward after a certain point and just penalise falls after that') is to keep a record of every fall in the last 10000 attempts, continuously train an evaluator AI to predict 'how likely is this position to fall in the next five seconds', and penalise the car's AI for how likely the evaluator thinks the car is to fall off. "Continuously train a decision-making AI and a predictive AI against each other" is a pretty useful way of turning abstract, faraway, discrete incentives that are hard for AIs to master (death in videogames especially; this approach has been used on games like Mario, Atari Breakout, Pong, and various obscure dungeon games) into nearer, more concrete incentives that are easier for a network to learn.
The game calculates the position of the car with small rounding errors. Those errors are deterministic, but they happen and are corrected every fraction of a second. It's too fast to be analyzed and acts as RNG. A dedicated AI could possibly crack the rounding errors' secrets, but that seems like a lot of work.
Thanks for watching!
Some additional details not explained in the video, which might help to better understand the irregularities observed:
- I didn't show all the AI inputs in the video to keep things simple. In reality it has access to more information, such as x,y,z velocity, velocity rates, roll-pitch-yaw rates, etc. But maybe it's still missing some crucial information, it's hard to know.
- The irregularities observed are not due to hardware or framerate issues. Everything I'm showing in the video is made with a tool called TMinterface. This tool allows me to inject action commands into the game at precise timings, in a way that is 100% repeatable. The same sequence of actions on the same map will always lead to the same outcome, even on a different computer. It's completely deterministic. It's also used a lot for Tool-Assisted Speedruns (TAS); you can find many examples of that on YouTube.
I have a bit of extra footage that I couldn't fit into this long video. I plan to post some of this extra content on my Patreon in the next weeks. Any support there is a great motivation to keep making these videos :)
patreon.com/Yoshtm
Do you know if the inputs are completely deterministic independent of time? If you run the same set of inputs offset in time, will the result be identical?
@@andrewbrown2038 If you are speaking of ingame time, it has an effect. Starting to accelerate at t=0.00s vs starting to accelerate at t=0.01s are two different inputs, which will have different outcomes, as shown in the video around 32:00. But it's still deterministic: starting to accelerate at t=0.01s will always have the same outcome in successive races.
So from what I remember about neural networks, each neuron takes the inputs as numbers multiplied by weights and passes them through a function to output another number, which is taken as input once again. Now I don't know the specifics of your neural network, such as what activation function you are using, but combining this with how the outputs of the network affect the inputs over time, changing a specific input at any specific time, even slightly, would create a chain reaction that produces your random variations after some period (usually 1 or 2 seconds), as seen at 18:44. Thus, the random variation is not necessarily a problem of incorrect inputs but rather an inherent aspect of neural networks. Perhaps I am misunderstanding in some way, or maybe you have already answered this question, but I believe that is the cause of the random variation.
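That chain-reaction intuition holds for any nonlinear feedback loop, not just neural networks. A minimal stand-in (the logistic map playing the role of the combined network-plus-physics loop; this is not the actual game or network, just the smallest chaotic example):

```python
def divergence(x0, y0, steps=60, r=3.9):
    """Run two copies of a deterministic nonlinear feedback loop from
    nearby starting states; return the largest gap seen between them."""
    x, y, gap = x0, y0, abs(x0 - y0)
    for _ in range(steps):
        x = r * x * (1 - x)   # state -> output -> next state, no randomness
        y = r * y * (1 - y)
        gap = max(gap, abs(x - y))
    return gap

# Perturbing the 6th decimal place: the two runs fully decorrelate anyway.
print(divergence(0.200000, 0.200001))
print(divergence(0.200000, 0.200000))  # identical starts stay identical
```

Everything here is deterministic, yet the sixth-decimal perturbation grows until the two trajectories bear no resemblance, which is the "random-looking" divergence the comment describes.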
@@mattanderson5239 a few seconds later, around 18:50, I show that even when I force all cars to use the exact same sequence of actions after the perturbation, the randomness is still there. Maybe the neural network is part of the problem, but according to this result, it can't be the only source of randomness
Did you try adding small randomness to the AI's actions? Like +/- 0.001% of steering. It should force the AI to drive more safely, because of the unpredictability of each action's outcome. Not a data scientist btw, just an idea to try.
I came here to see AI destroy humans on Trackmania, but I got a 37 minute essay on how pipes in Trackmania are an exercise in chaos theory instead. 10/10
While still seeing an AI smash the human WRs! I see it as a win-win scenario 😊
Dude took the most niche and least enjoyed playstyle ... to make a 10/10 documentary of his project.
Easily best TM youtuber atm
thanks for the save
This is better than school 🤣
Not only Pipes are an exercise, obviously the... (what was it, eo3?) track as well. 😂
PhD Student in Deep RL here. The behavior at the end seems mathematically chaotic and might lead you to the conclusion that you cannot predict deterministically the behavior of the car. However, that does not mean you can't improve the performance by a long shot with a few simple tricks.
You are doing model-free reinforcement learning, which basically means that you don't need to predict exactly what is going to happen (which is extremely hard here); you just need to figure out which actions are best in a given situation. In most environments, this is actually much easier to do (and a reason why learning an accurate model is often harder than just straight-up optimizing for performance model-free).
Second problem is that you are using an RL algorithm, and RL assumes you are in a Markov Decision Process (MDP), i.e. that you have full observability of the world, i.e. that the choice of the next action ONLY depends on the current information that you have and not the past, i.e. that adding information about the past does NOT help you make better decisions. HOWEVER, you are NOT in an MDP here, since you lack critical information about the state to make decisions, as many pointed out.
In practice, what you can do will most certainly improve performance:
- either manually add all the missing info (speed, rotations, ...)
- or more simply, create a concatenated vector of the previous N past states and inputs and use that as input to your NN (LSTM/RNN works too but is more complex and often not necessary when we are dealing with a few steps of history).
- Use more layers (deeper network = more complex function can be learned by the network)
- Sticky/random actions: manually introduce randomness into the agent's actions. For example, randomly repeat the past action with a small probability. Do NOT decrease this probability over time. What this means is that the agent has to learn the whole time in a now stochastic (i.e. random) environment and has no choice but to cope with the inherent randomness that you added, making it far more robust than a deterministic agent. This is a common flaw of fully deterministic agents in these kinds of environments, btw.
- Look at extensions like Max Entropy RL (Soft Actor-Critic, for instance), where you maximize both reward and the entropy of the policy. Some papers proved that it makes the learned policy more robust to out-of-distribution perturbations, i.e. perturbations that were never seen during training. In your case, this will help the agent learn "recovery behaviors" to recover from deviations caused by physics bugs/randomness/whatever.
- Max Entropy RL will also help with exploration a lot, which might help the AI be more creative in finding solutions.
- Look at other tricks that improve performance "for free". I don't know how much you've implemented already, but a basic start is looking at Rainbow DQN, PPO... But you're probably already using those..?
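Two of those suggestions, frame stacking and sticky actions, are cheap to prototype together as an environment wrapper. This is a generic sketch with illustrative class/method names and hyperparameters, not Yosh's actual pipeline:

```python
import random
from collections import deque

class FrameStackStickyWrapper:
    """Wraps an environment so the policy sees the last N observations
    (letting it infer rates of change) and, with a fixed probability,
    repeats the previous action instead of the chosen one."""

    def __init__(self, env, n_stack=4, sticky_prob=0.1):
        self.env = env
        self.n_stack = n_stack
        self.sticky_prob = sticky_prob
        self.frames = deque(maxlen=n_stack)
        self.prev_action = None

    def reset(self):
        obs = self.env.reset()
        self.frames.clear()
        for _ in range(self.n_stack):
            self.frames.append(obs)   # pad the stack with the first frame
        self.prev_action = None
        return self._stacked()

    def step(self, action):
        # Sticky action: occasionally ignore the agent and repeat the last
        # input, so the policy must learn to cope with a stochastic world.
        if self.prev_action is not None and random.random() < self.sticky_prob:
            action = self.prev_action
        self.prev_action = action
        obs, reward, done = self.env.step(action)
        self.frames.append(obs)
        return self._stacked(), reward, done

    def _stacked(self):
        # Concatenate the N most recent observation vectors into one input.
        return [x for frame in self.frames for x in frame]
```

Keeping sticky_prob constant through training (as the bullet point insists) means the agent can never rely on perfectly deterministic outcomes, which is exactly the robustness being argued for.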
I am a PhD student myself but in Sexology and I have nothing to offer here.
@night0x453 Hi do you have any course recommendations/ or youtube channel for beginners in ML/RL?
@@7dedlysins193 look at Sergey Levine course on Deep RL, it's one of the best out there to get started in deep RL/robot learning
lol@@raskreia8326
I'm not a PHD student in either sexology or Deep RL, but I can tell you that this wall of advice right here is sexy af.
Roboticist here. Some regions of configuration space are more chaotic than others; the demo you did where you perturbed a car that was about to fall off demonstrated a smooth increase in success rate as you moved the perturbation further away from the point at which outcomes diverged, suggesting that the failure was in a fairly narrow chaotic _region_ and moving out of that region made the system more stable and enabled more consistent success. This is what unnamed did to complete the jumps track: Found a strategy that consistently avoided highly-divergent regions of configuration space.
One problem is that, if the display of the inputs is representative, the AI doesn't have the right sensors. Specifically, the AI has its position and speed but not its _velocity_; it doesn't know how much of its speedo reading is in each component of down the track vs. across the track vs. away from the track. More critically, it also doesn't have roll-pitch-yaw rates. It only knows its current orientation, not the rate at which its orientation is changing.
I suspect that the car also needs to know the difference between a simple pipe elbow, a tee, and a cross. Even if the extra geometry isn't in the direction of its travel it'll almost certainly still interact.
Finally, consider increasing the numerical precision of your network. This is one of those cases where running floats vs. doubles might actually matter.
Adding rate inputs, or more generally some memory of past frames (recurrent neural networks), should indeed improve the results. Unless Yosh is already using those and has just kept the narration simple. :)
I hadn't noticed that, but that's a great point. If you think of the AI as trying to force configuration space into the few fast values where it doesn't fall off, being able to actually see the full space can be a great help. Otherwise it becomes impossible to figure out the optimal action.
Of course there might be performance and training concerns if you take it to the extreme, but experimenting with adding more inputs could be a good way to see what improves consistency.
fully agree - the AI needs more information to make the right decision. And I don't think giving it all the information would solve every problem, because the game engine itself is just a simulation of real-world physics. There is also a tick rate and some approximations/simplifications being made. Some things will be hardcoded, like material coefficients for energy conservation during different types of bounces (the side of a wheel against a wall is vastly different from the bottom of a wheel landing on a road/pipe surface).
Imo Yosh needs a well-trained model of the game engine itself, so that the AI is able to predict the result of each action it makes.
I think this is it. The inputs look like they are suited for a track with a 2D layout like in his last videos but these pipes need a better 3D understanding of angular momentum. The AI is doing the best it can with a blind spot in its vision.
In previous videos, Yosh explained he uses the last X frames in addition to the current frame as input, so I'd assume he's doing the same here. (At least I think Yosh does, it's hard to remember which AI video does what :E)
That ending sequence was an absolute joy. Perfect syncing with the music
Wirtual should trademark that line LOL
So you’re saying my world record fails are actually the butterfly effect’s fault…
Most certainly. It also means that all of the thousands of times you failed were simply bad luck, everyone who ever succeeded at a world record simply got lucky, and it is not at all a skill issue
funny how there is some truth to the sarcasm@@gabrielandradeferraz386
New coping mechanism just dropped
@@Vaaaaadim the most mecchy player.
coping 100
Or your successes...
your visualizations are just amazing, thank you for all the editing!
Amazing video! Interesting work
Yeah, these 1000+ AI car attempt renders are just visual candy.
Thanks for using my map :D
I have heard people saying that Trackmania is a deterministic game hundreds of times. I think from now on I will prefer to use the word "chaotic" :D
Heeey cool to see you here, I had a lot of fun in the replay editor with this map :D And gg for finishing such a map, it must be painful ahah
I want to see a merger with the world of "micromouse"
Gotta say man, your visualization skills are insanely good. Great job. Perfect execution.
Unironically one of the best Trackmania / AI-beats-things videos I have ever seen; the visual representation of everything and your clear explanations made this so satisfying to watch. The showcase you did when explaining the randomness of tiny input changes was so well done, and it's really clear how much time you put into this topic and video. Also your humor is just amazing lmao
Wow, this was not only informative and entertaining but a cinematic masterpiece with perfect pacing and music selection and everything, I didn’t even realise it was 40 minutes long until after I finished watching it!
thats exactly what i thought too!
It was 40 minutes?!🫣
The song that played when you showed Wirtual's WR was just peak video making lol
Haha I was looking for this comment 😂
Time stamp?
@@jamport5973 9:00
9:03? You've probably already seen it @@jamport5973
Then the AI got this run,
I’m not a TrackMania player but I love watching TrackMania videos. I also love game AI vids, and this video really did it for me. Being fascinated by chaos theory I’ve been screaming at my screen throughout the whole video ”bro it’s chaos theory”, and then you even mention it yourself. Top tier content, it’s what I’m here for. You just gained a new subscriber 😄
Haha same! I'm into chaos theory too, so the second he was like "it's deterministic, but it still feels so random...little differences in input..." I was yelling at my computer!
yooo such an honor that you used my map for the 3rd part ♥really enjoying your AI videos :) keep up the great work!
Ooooh hey corzo nice to see you here!!! I remember playing with you on the tm2 rpg titlepack sever many years ago :) I'm glad you liked it!!
28:40 was a comedic masterpiece, the editing, the pacing, incroyable!
Absolutely poetic.
God, I wanted to see turtle on pipes so badly 😢
"I've tried though" - how many hours were lost to these 3 words
😂😂 I was laughing so hard at that point
Genuine comment though: if you give the AI full knowledge of the decimal precision you're getting, it might actually be able to do something about it. Give it the smallest of minute details; only then can it act on them.
Your analysis about chaos theory is exactly what is going on. And TrackMania's chaotic physics are very familiar to me from another TrackMania project. The best way to control that chaos is by going slower, as the deviations caused by the bugs are less severe at lower speeds. Perhaps giving less reward for speed and more reward by having a high chance of getting far would help with consistency.
In fact, that's probably why the backwards driving cars were much more consistent. There's not as much deviation in speed when they're going backwards, so any bug has less impact and the performance reward is more focused on getting far too.
I dealt with similar issues in some TAS work with Neon White - in that case, small variations in frametimes would lead to different interpolation of player movement (even with fixed tickrate physics, they still interpolated the actual player position based on frametimes), which would add up to significant positional desyncs occasionally. Vsync helped reduce the frequency of this (as it would better sync frametimes to monitor refreshes), but it still wouldn't fix it entirely.
sometimes, it's just about the little things.
I don't wanna say one thing is the reason for the other without 100% confidence, but going forward = more speed = more chaos; backwards = less speed = less chaos. Coincidence?
@@BoxTM I'm only familiar with the basics of chaos theory, but I feel that trying to find the Lyapunov time of the system could be a good first step.
@@Bruno-cb5gk I think you will find such a calculation not only impractical but next to impossible to perform… The Lyapunov time will only be a few frames/ticks of the physics engine. Anyway, we know all the hidden variables here, as he showed in the video, so there isn't much point: it's still deterministic, and we can view all the precise starting information, so why can't we just let the AI view that data as well? Correct me if I am mistaken, please.
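Estimating the Lyapunov time empirically isn't necessarily impractical if you already have diverging replays: record how fast two perturbed runs separate and fit an exponential. A sketch with entirely made-up divergence data:

```python
import math

# Hypothetical measurements: mean positional gap (metres) between two runs
# differing by one perturbed input, sampled once per second.
gaps = [1e-6, 3e-6, 1.1e-5, 3.4e-5, 1.0e-4, 3.1e-4]

# If gaps grow like d0 * exp(lam * t), then log(d_t) is linear in t,
# so fit the slope lam by least squares on the log of the data.
ts = range(len(gaps))
logs = [math.log(d) for d in gaps]
t_mean = sum(ts) / len(gaps)
l_mean = sum(logs) / len(gaps)
lam = (sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
       / sum((t - t_mean) ** 2 for t in ts))

print(f"Lyapunov exponent ~ {lam:.2f} per second")
print(f"Lyapunov time ~ {1 / lam:.2f} seconds")
```

With this fake data the gap roughly triples each second, giving a Lyapunov time under a second, which would indeed be only a handful of physics ticks.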
At a high level-and as someone who does not necessarily understand the mathematical underpinnings of chaos theory or neural network training-I would say the key to solving the inherent, imperceptible randomness here is adaptability. I noticed that at certain points training on the pipe courses, you changed the rewards system or the parameters that the AI could use to operate and judge its success, allowing it to adapt to specific scenarios or ways of thinking. At the end of the Calm Down course, for example, you gave it specific parameters at one point in the track that encouraged it to follow the course of Wirtual and try a completely different technique that it simply would not have discovered on its own using your existing parameters. All that to say, the creativity that you pointed out is key, and I think that could also be called adaptability.
How do we define in code the ability to see a possible improvement and actually try to make that happen, then give up and move on if no appreciable progress is achieved? You might not just have to train the control model itself-you may need to train the rewards. Maybe by changing its goals as a part of the training process, you might get it to prioritize certain things like stability, consistency, or pace depending on the situation. I think humans operate a lot more like that. If we are trying to do the pipes, for example, we may start by trying to just stay on the pipes and stabilize. We might learn how to recover from a mistake or a random element introduced by the system. As we are learning the course, our priority might shift in the middle of the run from going as fast as possible to not falling off, and that changes the behavior. In this video, it seems like you only changed the goals yourself in between training sessions. How do we put all of these pieces together to train a model that can understand how to change its own rewards to best suit the situation and desired outcome?
At the end of the day, it was fascinating to see how “human” the trained models looked. It continued to refine how it applied inputs, but it struggled with the inherent minute inconsistency in the game engine.
22:48 "THE AI GOT THIS RUN" omg even the song, that's great man
but then, hefes... heu i mean the AI got this run...
@@theaypisamfpv but then AIHefest got this run:
"And then hefest got this run"
Been looking for this comment
666 LIKES??????????
Every other comment has like 15!!
The butterfly transition at 17:27 is a pure masterpiece! The many small things you do while editing don't get enough appreciation! Congrats on a beautiful video!
Beat me to it.
Me too, wanted to say the exact same thing 😊
“Foreshadowing is a literary device in which a writer gives an advance hint of what is to come later in the story”
I was so happy when I noticed it
What if it's linked to the Butterfly Effect in chaos theory? Nice detail!
28:50 “Now let’s have the AI drive upside down” had me rolling. For a hot second I really thought you were about to blow my mind once again 🤣
Bro, i was like no fkn way he beats wirtual again upside down and then does that backwards just to show off that wirtual is just driving like a grandma on that track lol
My stunned disbelief in that moment was at peak levels. I 100% thought it was going to happen haha
@@ArmageddonNerd I’m glad it wasn’t just me 🤣
LOLL
HE EVEN HAD TRAINING PREPPED FOR IT 🤣
What an insanely good video! I haven't even played Trackmania in many years... but the way you explain the details and visualize everything just makes this soooo interesting! 😌👏👏👏
I saw you on the free zone speed server last week
I love how the AI on the 65km pipe track looks like it's just blissfully skipping along while the human records look like they're terrified. it's so silly.
The editing, the music choice, the compactness of the information while still remaining coherent, the pacing. This video is a masterpiece! Even small things like using the butterfly-shaped pinhole transition when bringing up that topic for the first time, or how the hue of the cars matches up with what generation they're from — so many little details that I could write up my own essay detailing every little thing you put extra effort into. I was enraptured the whole time. Bravo!
I don't even play this game, and this video is a masterpiece!
Your comment perfectly captures the things I noticed most. The continuous audio and visual cues where important concepts were being introduced and then the release of focus where it was ok to do so. This is just as interesting as the stuff I remained engaged enough to learn due to the editing and composition. A god damn MASTERCLASS!!!!
@@J3TL4GG Neither do I, yet I sat here and watched the entire thing because it was just that entertaining!
@@dazeoliver this that perfect 3AM video to watch
i was going to write this post. thank you for doing it for me ^^
Mathematician here. I fully agree with your chaos-theoretic conclusions, and I think the pipe experiment at the end strongly supports this (as you realised). Why not give the AI control after a random amount of time has passed at the beginning? I think it is really interesting; I would like to see if the AI could somehow abuse the chaos-theoretic nature of the situation - I remember a video about a Pokémon AI that figured out some elementary RNG manipulation so that it wouldn't get grass encounters :D If you want to talk actual mathematics, then I think perhaps the YouTube comment section is inadequate - I can't answer every question either, I only took one chaos theory class as a grad student, but perhaps we can try to understand some things.
I'm a mathematician specializing in Chaos theory, AMA
@@Life0 does chaos theory apply to women because i never seem to understand what the hell is going on
What I'm curious about as a programmer, thinking about this issue, is if for whatever reason the physics system is tied to a clock/tick rate that could potentially cause that bizarre result where a certain delay always means success. Like if you tried the sim on another machine would the delay carry over? Is it a processing limitation?
There could also just be mathematical issues within the physics engine itself; e.g. if you hit the ground hard and, let's say, it spikes the suspension values to some crazy number, maybe the engine tosses values over a certain threshold or under another.
@Yosh I wonder if you could teach it what stability "means", and set a reward system up for "good angles" that it finds for drops that are more likely to produce consistent results.
This game is insane. I could never manage to complete it better than yellow maps
@@Life0 what are your favorite examples of everyday phenomena that involves chaos theory?
This video is incredible from start to finish, the subject matter is fascinating, the editing is sublime, the commentary and timing is great... The wirtual music, and the carrot sound byte. all of it is just so good. Thank you Yosh
I like how yosh's attitude towards the learning model went from "We shouldn't have to sacrifice consistency for speed" to "Fuck it we ball"
16:32 and there he is, introducing Brownian motion in a virtual world to isolate entropy. Good job!
Yosh: "I am going to lock your breaks and acceleration so you cant slow down"
AI: "Pfft check this out"
and that it still figured out how to manage its speed with airtime despite those limitations. and he says the AI isn't creative.
@@binz2056 That is entirely deterministic. The increased airtime led to success. It is just maths masquerading as creativity.
@@bertram-raven call it whatever you want. But it is an interesting workaround given the constraints.
Yosh: "I am going to make you drive backwards then"
AI: "Pfft check this out"
@@bertram-raven isn't that true of all creativity that generates something that can be represented mathematically?
this video is about something very technical, yet, it touches something deep inside me, something more emotional rather than rational. I sincerely congratulate you on this, I feel this turned out to be more philosophic and even artistic than I expected it to be. (also, huge detail to use wirtual theme and editing style, it adds a whole new emotional layer)
Waiting for all the Trackmania YouTubers to react to this.
sex update when?
Yeah
Yeah
Yeah
Yeah
Your humor with "the song" and "the AI got this run" is just too damn much man. On top of being a cinematic masterpiece, it's also funny as hell. Insane video, genuinely hope you blow up beyond belief.
I didn't expect there to be micro-fluctuations in a still car, but it's as fascinating as it is frustrating to learn the game is so sensitive that milliseconds can, in niche scenarios, determine your success. It may not be truly random, but it's close enough that you can't account for it.
But that's actually what happens to a car IRL. A straight line is a mathematical convention with absolutely no correspondence to anything in the observable universe
As a Slovak, the Slovak flag on the Unnamed player's car made me smile :) Thank you for the amazing video! :)
It does make sense in a way, where else can you see that much self-induced suffering lol
Still, nice to see a fellow countryman show so much perseverance - inspiring, in a way.
I have no idea what's going on but it was really satisfying watching huge cascades of race cars falling off pipes, thank you!
As a former kid, I concur. Watching cars fall off pipes is satisfying.
"In the Hall of the Mountain King" playing while all the cars rush to their demise like a cascade, is so fitting.
Your videos are so interesting and it was worth every second spent.
I loved how it just randomly turned to a Wirtual documentary for a while at 21:55, even the Norwegian flag is here.
When I saw that butterfly, I thought you're gonna get deep into chaos theory.
Thank you for the great video and for not complicating the topic further in this video.
I would love to see a project discovering the chaotic behaviour of collision with pipes, as an example, in Trackmania.
This is really stellar work. Not just the AIs but the videos. I'd equate these with the likes of Summoning Salt who are just very good at not only doing stellar work themselves (speed runs in his case) but also present it in an enjoyable and captivating fashion.
I have not binge-watched your material, but from the videos I have seen this is some of your best work. Keep 'em coming.
Thanks a lot
Wow. The way you paint the picture between about 17:00 and 20:00 is so captivating, wonderfully done!
What a wonderful video - the quality, the effort, the scripting, even the turtle-driving joke totally got me. A piece of art!
I’ve never played track mania. Only watched a handful of these videos and all I have to say is WOW! this video was amazing! Please keep making them.
Even deterministically chaotic systems can be controlled; in practice, it works when the Lyapunov exponent is small and the agent has a quick enough reaction time (and steering capability) to keep control. But the touch at 34:06 looks similar to a landing bug - a sudden deviation from the usual behavior.
What you can do is add another (part of the overall) deep neural network to act as a predictor. That would detect such "buggy surprises", and you can focus the score on surviving such surprises.
As for making the competition with humans more fair, what you could do is make the AI commit to its actions some time (say, 0.3 seconds) into the future. This would lower its control, so it would have to drive slower in order not to lose control.
(Edited typos.)
+1
+2
👍
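The commit-to-actions-in-advance handicap suggested above could be prototyped as a simple FIFO action buffer. This is only a sketch; the class and names are hypothetical, not from the project in the video:

```python
from collections import deque

class DelayedActuator:
    """Apply each chosen action only after `delay_steps` simulation steps,
    forcing the policy to commit to actions ahead of time."""
    def __init__(self, delay_steps, neutral_action=0.0):
        # Pre-fill the queue so the first `delay_steps` outputs are neutral.
        self.queue = deque([neutral_action] * delay_steps)

    def step(self, chosen_action):
        # Enqueue the new decision, dequeue the one made `delay_steps` ago.
        self.queue.append(chosen_action)
        return self.queue.popleft()

# With a 3-step delay (e.g. 0.3 s at 10 decisions per second), the action
# chosen at step 0 is only applied at step 3.
actuator = DelayedActuator(delay_steps=3)
applied = [actuator.step(a) for a in [1.0, 2.0, 3.0, 4.0, 5.0]]
# applied -> [0.0, 0.0, 0.0, 1.0, 2.0]
```

Because the policy can no longer react instantly, it would presumably have to keep more safety margin, much like a human with reaction time.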
Splitting the AI up into multiple smaller ones seems like the way to go: for example, one for prediction, one for recovering from mistakes/bugs/chaos, one for just trying to go fast (pretty much what the AI in this video was), then a controller AI that decides when to switch between them.
This approach would also make harder tracks more feasible: there could be a separate AI just for speedslides, bugslides and all the other techniques, a few prediction AIs, and then a decision-maker AI.
But landing bugs are deterministic. They are called bugs, but if you execute the exact same inputs, the landing bug will happen. That's how Trackmania saves replays, in fact. I think what is nondeterministic is how long it takes the AI to compute an update, so it gets a slightly different view of the world at the next step.
I love your video style and how you're displaying the learning process. Thanks for the great video!
This definitely is your best video ever! I've enjoyed the previous ones a lot, but this is a masterpiece! Keep going at it; the humor, the theorycrafting, your thought processes... it's all just one giant jigsaw that fits together perfectly.
Thanks a lot
17:27 The butterfly effect and the evolution of your AI are absolutely fascinating. Thank you so much for this content!! A+ editing too
25:55 was the funniest part, no questions asked. Long but great set-up to a satisfying slapstick punchline.
Goosebumps indeed, but I gotta say, the pun at the end that more Patreon subscribers would be a great "reward" for him made me crack up even more 😂 suddenly the guy is just an AI himself 😅
Every time I see these videos appear, I can't help but watch. It's super interesting to see the steps required to train the model and the steps you take as you iterate. On top of that, the way you've visualised all of the attempts and with the use of colour to demonstrate how close the AI is to ideal for the scenario is just pure eye-candy. I love it!
OMG I was literally watching the previous AI video and this JUST DROPPED, I've been BLESSED. Actually huge timing
same!
Same!
Best Trackmania AND AI video by far😱
This was such a masterpiece video! I've been following you for a long time and it is incredible how far you have come. I enjoyed every minute of this video. Thanks. ❤
Software developer with a focus on ML here. Just from 32:03 you can see a lot of variance in the game engine's physics calculations. This is most likely due to floating-point precision error over time. To a human player this variance isn't perceptible - e.g. a hundred-thousandth of a velocity change looks the same in reality - but the different input may cause the AI to make a different decision. The AI may be overfitting to the noise region of the signal. To make the AI more consistent and human-like, I think you need to actually DECREASE the precision of the values fed into the model. That should help reduce the chaos being injected into the system. This is a balance, because reducing the precision too far is bad too.
A simple trick to try is to convert inputs to int8 or int16. That will give lower-precision input and perform faster with simpler math. Or you can just truncate the float to the first 4-5 significant digits.
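A sketch of what that precision reduction could look like. The function names are hypothetical, not from the project's code; the point is that observations differing only in low-significance noise map to identical network inputs:

```python
import math

def truncate_sig(x, digits=4):
    """Keep only the first `digits` significant digits of x."""
    if x == 0.0:
        return 0.0
    scale = digits - 1 - math.floor(math.log10(abs(x)))
    return round(x, scale)

def quantize_int8(x, lo, hi):
    """Map x in [lo, hi] onto 256 discrete levels (int8-style)."""
    x = min(max(x, lo), hi)
    # Feed this integer (or level / 255) to the network instead of the raw float.
    return round((x - lo) / (hi - lo) * 255)

# Two velocities that differ only in low-significance noise become identical:
a = truncate_sig(123.456789)   # -> 123.5
b = truncate_sig(123.456111)   # -> 123.5
```

The range `[lo, hi]` for the quantizer would have to be picked per input feature (e.g. plausible velocity bounds).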
Great video. I've totally watched it 44 seconds after release
I'm a new French AI dev.
Chaotic behavior can be stopped if you get into a stable, deterministic, comfortable position/state.
In your pipe-jump race, you can reduce the problem to a single jump: start at any position on a pipe, jump to the next pipe, and stop. Then reward it if it stops on the pipe.
So just try to reward it when it stops for a moment on each pipe. Then the chaos will be reset.
But it would have to stop at the same place each time, which may not be possible: the initial situation means you have a limited number of movements you can make and a limited number of speeds you can achieve, it's all quantized, so there may not be any way to reset the chaos like this.
@@ledernierutopiste
Yes, this can be a little problem.
As he did before, he can spawn the car randomly on the pipe.
Then the AI will learn how to go from any initial position.
Then on the next pipe, after its four wheels are in contact and fixed for a moment, the AI will know how to continue on that next pipe.
And so on.
The only thing is, the game isn't deterministic. He showed that the position of the car changes by an imperceptible amount hundreds of times a second, which leads to drastically different outcomes. So it could never be in the same spot twice.
@@cantingpython
We don't need it to be in the same spot twice.
We need the car to have all four wheels on the floor at the same time for a moment, to be stable and recover a deterministic state. Jump. And repeat these operations at each level.
You make the best videos on reinforcement learning! They inspired me a lot to want to learn how to do this, keep up the great work :)
I can't even imagine the amount of work behind the design/training of the AI, and the writing/editing of this video. The amount of work you put in is INSANE and the result is beautiful and astonishing! The way you highlighted chaotic behavior was also very interesting!
It was a real pleasure to watch!
I never comment on YouTube but this was an absolutely amazing video! Content and script/editing were both perfect. Thank you for making it!!
Hahaha the In The Hall Of The Mountain King montage at the end was really a master stroke of editing. Well done, as always.
A race without 🥕 = chaos
Congrats Yosh! Love watching this little car grow 🧡
First video of yours I've seen, and it's an instant subscribe. I know nothing about Trackmania and I'm not much of a gamer generally, but this was absolutely fascinating, and the quality of your shots and editing is outstanding.
I enjoyed this so much! Your voice is so calming! Keep up the good work and thanks for making good videos!
He sounds French; it's like he uses syllabic intonation, while English is stress-timed. Interesting.
Thanks! Yes, I'm French
Wasn't expecting to watch this video for long and got to the end, barely blinking. Fascinating work, thank you
This is one of the best videos I've ever seen. Amazing work!
Apart from the excellent research, I'd just like to comment on how amazing the editing was for this! That kept it interesting AND funny!
babe wake up, new trackmania ai video
... just dropped
The visualization of the attempts are honestly the best part for me.
Also makes you think of the human players and their many attempts.
Yo man was just gonna say that you inspired me to dive deeper into reinforcement learning for my master thesis after seeing your amazing work. Keep it up!❤
Fellow NN dabbler here, I have a few thoughts / suggestions:
There's a degree of unpredictability from floating-point bit precision and, as is obvious to all, the system is inherently unstable. But I don't think this is insurmountable.
Firstly, there's a limit to what you can achieve with just two hidden layers and no temporal context. The lack of layers translates into a lack of abstraction. All you'll get is a "this high, that low" kind of reaction. Worse still, the NN is making decisions based on the instantaneous state when the most important thing is momentum, which has a time component. Therefore you've given it an unsolvable problem, like trying to play chess without knowing where your opponent's pieces are. You can test this hypothesis by calculating angular and linear momentum and feeding them in as inputs. If I'm right, you'll see an instant improvement.
Much better though is to allow access to the past. Try a simple recurrent NN first to see if it improves matters. Although you'll have to watch the expense.
Finally, just like the pipe it rides, the instability of the system becomes critical when you approach loss-of-control circumstances. Right now that's not being trained for; the change in the input just alters which corner fails. So the loss is not factored into the reward, meaning that you stay in the same position in latent space.
You need to find the window before the failure where it is most recoverable. Say it's 3 seconds before the fall. Find a failure point and mass-spawn there with a random time offset of, say, 0.5 seconds. Train toward the AI that, on average, has the fewest falls.
Fundamentally though, an input / reaction AI is insufficient. I would probably look at using an attention vector where I'd look to get it to recognise certain situations and contextualise its response.
Just a thought. Very fun vid btw!
Great comment! 👍 With a lot of effort and sophisticated methods I don't think it would be unfeasible to train a model to the point where it never "bugs", i.e. it avoids and/or works around situations where the chaotic nature of the simulation surfaces. Maybe even an AI that could uber at will... 👀
What's worrying about the floating point imprecision is that some of the car's parameters change by an order of magnitude even when it's at rest (32:02) which seems to indicate the game engine doesn't really try to limit fp error propagation.
@@JerehmiaBoaz That's just the physics engine implementation. There's a "gravitational" acceleration and then a hitbox on the wheels which counteracts it. Obviously it doesn't quite return the body to the same place, but it's imperceptible, so nobody is going to worry about it. These things are an art, not a science: there'll be "suspension" on the car and "friction", and they'll mess with the settings until the car feels right. Bit precision on these tiny effects just scrambles the numbers a little; it's not the cause as such. If you run the same computer programme twice you always get the same result; bit imprecision always gives the same result.
The real issue here is how low-entropy the solution is. The count of microstates of the failed-outcome macrostate is particularly large compared to that of the success macrostate, so small changes escalate quickly. Compare that to noughts and crosses / tic-tac-toe, where the success macrostate is a substantial proportion of the total. Sorry, that was an overly academic take, but it's the fundamental issue.
@@davidmurphy563 I understand how physics engines work and I've dabbled in AI myself a bit (I'm a contributor to the stockfish chess engine). My point is that the number of microstates of the successful outcomr macrostate might be further reduced by the granularity of the fp calculations performed by the game engine, it might be impossible to take some corners perfectly because the engine's inaccuracy makes the car either understeer or oversteer depending on the involved physics parameters.
@@JerehmiaBoaz Oh wow, really?! I'm a big chess fan myself and Stockfish is an awesome engine! Hats off to you.
Sorry, didn't mean to teach you to suck eggs, I just assumed layman and honestly, I just wrote what I'd been mulling over anyway. Helps me think it over.
Yeah sure, I'll accept that point in principle, although perhaps not in practice here. In general, the more bites at the apple you have to correct the divergence, the better. If you were to map the latent space onto a discrete cosine transform (I don't recommend Fourier as the ends need to meet) with the outer product, then you could likely identify the frequency ceiling and set the framerate accordingly. Although, if I recall correctly, I believe the dude tried upping the framerate without result, which - if you squint through rose-tinted glasses - vaguely supports my suspicion that missing temporal data led to intractability.
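The momentum/temporal-context suggestion earlier in this thread can be sketched even without a recurrent network: keep a short history of raw observations and append finite-difference velocity estimates, so a plain feed-forward policy sees momentum-like features. A sketch only; the class and its interface are hypothetical:

```python
from collections import deque

class HistoryFeatures:
    """Augment an instantaneous observation with finite-difference
    velocity estimates over the last few frames."""
    def __init__(self, history_len=3):
        self.history = deque(maxlen=history_len)

    def __call__(self, obs, dt):
        self.history.append(obs)
        if len(self.history) < 2:
            deriv = [0.0] * len(obs)   # no history yet -> zero velocity
        else:
            prev = self.history[-2]
            deriv = [(cur - old) / dt for cur, old in zip(obs, prev)]
        # Feed both position-like and velocity-like features to the net.
        return list(obs) + deriv

feats = HistoryFeatures()
f1 = feats((0.0, 0.0), dt=0.5)    # first frame: velocities default to 0
f2 = feats((1.0, -0.5), dt=0.5)   # finite-difference velocities appear
# f2 -> [1.0, -0.5, 2.0, -1.0]
```

A recurrent network would learn richer temporal features, but this kind of explicit derivative input is the cheap first experiment.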
You've completely nailed the editing, pacing and comedy in this video. Keep it up, and thank you for such a masterpiece!
I find the thought of you trying so hard to make the ai drive upside down on the pipe hilarious. Poor ai 🤣
What an interesting video, this is unironically one of the most entertaining videos I've watched on this website in a long time.
Absolutely crazy concept and amazing work with great editing.
Oh my god... not only was this topic SUPER interesting, but it was also presented and explained beautifully, masterfully shot and edited... the list goes on. The way you exhibit the improvements with colors, showing hundreds or probably thousands of runs simultaneously is visually stunning but also very easy for the average viewer to digest. I'm blown away. I subscribed after being suggested and watching your last video, and I'm so glad I did. Just unbelievably well done. Massive props! 🤩
Thanks a lot :D
Bro, I recently started watching your vids, but they are absolutely amazing. The editing to show all the car runs is amazing, along with the color coding and the overall video. Much love
That ending... I spent the whole video yelling "it's not random, it's just chaotic!"
To any professor out there trying to find a way to create an inspiring course on modern computer science and AI: THIS is how you do it. In this 40-min video the following essential concepts have been demonstrated: basics of neural networks, evolution of perturbations in nonlinear systems leading to chaos, visualization to support human analysis of complex systems, gamification, inventing and using classifications to measure change in complex systems, using evolutionary techniques combined with reinforcement learning, using focused training in AI systems to overcome plateaus, establishing metrics to quantify progress, competition, monetization, community building, the importance of endurance... you name it. And all using a fun game! Take this video. Write all the concepts on a board. Start teaching. (Male) students will love it. It amazes me that a single guy, without even attempting or knowing it, did all this on his own. Congratulations! An outstanding achievement.
I cringe at the (male) part. Was it really necessary to add to the sexist stereotype behind people in Computer Sciences?
@@Etrancical Do little girls love Lightning McQueen and Thomas the Tank Engine?
and possibly even propagation of error depending on whether float or double was used
@@aesop1451Yes? I fail to see why you ask such a dumb question?
My little cousin loved Lightning McQueen so much that she used to have a Lightning McQueen bed.
@@Etrancical Bet she loves Anna, Rapunzel, or Ariel more.
There are two deterministic systems here to distinguish: the engine and the trained instance.
The game engine is deterministic, and the many degrees of freedom, plus the digital nature of floating-point numbers introducing nonlinearities, increase the chaotic behavior of the system.
Each of the neural nets decides deterministically with given equations, which are nonlinear in the inputs (by the many layers and activation functions).
Now, that chaotic/nonlinear physics feeds into the chaotic/nonlinear trained model instance.
Finding the weights of the model that give the perfect solution might be impossible, but there might be infinitely many solutions that bring the car from start to finish in a WR time (infinitely many for the continuous model; countably many using floating-point numbers).
The bigger challenge is defining the perfect solution and finding it. For chaotic systems these parameters sometimes cannot be found efficiently. Using evolution as an approach is great in this case, as you portrayed beautifully! Still, this approach does not guarantee the optimal solution.
Numerical optimization can make problems practical, but the theoretical optimum can only be found in the least non-chaotic cases.
Thanks for the great video, loved watching it!
(phd student in complex physics)
If you plug two chaotic systems together and use the same initial conditions then they will still produce the same output. It doesn’t matter that you stacked them. There has to be nondeterminism somewhere, to get the result of this video without manual perturbation
Oh you are talking about optimization my bad, disregard my comment.
Surely only finitely many for floats? :P
Well, each IEEE float can take up to 2^32-ish valid states (minus NaN and other special values). Thus, while the scale of the permutations is absurd, there are, by technicality, a finite number of them @@drdca8263
I think it simply lies with the rounding errors induced by variations in when, on the clock, certain calculations are performed. It's the difference in that initial timing parameter that causes the divergence despite the same inputs.
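The kind of rounding error being discussed in these comments is easy to demonstrate: floating-point addition is not associative, so the same terms combined in a different order give bit-different results.

```python
# Floating-point addition is not associative: the same three terms,
# summed in a different order, differ in the last bits of the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
diff = abs(left - right)
```

The difference is around 1e-16, far below anything a player could perceive, but in a chaotic system a perturbation that small is enough to diverge within seconds.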
Your videos are, in my humble opinion, the best visual representations of how AI works!
Among all the AI videos I have watched, this one was by far the most interesting. Your storytelling was spot on. Thank you so much! I really enjoyed it.
Incredible editing, story telling, and informative! Well done!!
The tiny turn to make a "butterfly effect" was very clever man.
Incredible video, honestly congrats on the work, and I really loved watching your reasoning evolve as the problems evolved. It was so cool!
The video is really clean, the work is insane, it's interesting, it's fun. Hats off, it's really beautiful work.
So good, it's truly fascinating, and with this quality of editing and storytelling it's even better, thank you!
I think you're close with your conclusion about chaos theory. Here are my thoughts about it (but bear with me: I am not an expert on reinforcement learning, I work in supervised learning):
1. I believe one big problem with Trackmania specifically is: because of the bugs you described (like landing bugs), it is NOT a chaotic system where a small deviation leads to vastly different outcomes far into the future. The future deviates extremely slowly, but a bug might introduce a sudden extreme change that cannot be foreseen. This is just an observation, but if you could somehow detect these bugs on the pipes and give reward for surviving them, it may help.
2. In supervised learning there is a common technique of bagging/boosting (slightly different from what I am about to describe, but the idea is the same): take specifically the runs of your agent that went wrong (like one that falls off), and train on those runs specifically (maybe starting 1 second before the fall, with the exact same conditions) for some percentage of your training time. (This percentage is usually adjusted based on how much you value reducing false negatives vs. false positives; in your case it might be a hyperparameter you adjust for the speed/consistency trade-off.) You can also change the reward function in these runs specifically to 'just reach the next corner safely'.
3. Bit of a strange idea, but it's worth an experiment (this comes from time-series forecasting): introduce tiny noise to the inputs and/or outputs of your agent. This way the agent may become more reactive and may learn to deal better with unexpected deviations.
I didn't understand your point 1: why do you deduce from the presence of bugs that Trackmania might not be a chaotic system? Points 2 and 3 are good ideas
@@yoshtm To build on idea 3, you may want to also try reducing the precision of the inputs. If it has a lot of fluctuating low-significance noise even at rest, you may get a better model if you just ignore it by converting to a smaller FP type.
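The noise-injection idea from point 3 above is a standard robustness trick; a minimal sketch (the observation vector and noise scale are made up for illustration):

```python
import random

def noisy_observation(obs, sigma=0.01, rng=random):
    """Add small Gaussian noise to each input feature during training,
    so the policy can't overfit to exact, bit-level state values."""
    return [x + rng.gauss(0.0, sigma) for x in obs]

rng = random.Random(42)              # seeded for reproducibility
clean = [1.0, -0.5, 0.25]            # e.g. speed, pitch, steering angle
noisy = noisy_observation(clean, sigma=0.01, rng=rng)
```

At evaluation time the noise would be switched off; it only serves to smear out the training distribution around each visited state.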
I'm writing an exam tomorrow about evolutionary algorithms - the backbone, so to speak, of reinforcement learning - and it's so cool to finally see all these mathematical theories and concepts in a "real life" example, good job :)
back bone of RL? more like one particular example
@@busTedOaS oh, you're right, I'm sorry! I guess I misunderstood my professor's explanation
@@MalikMehsi hehe, this does sound like something a professor would say about his lecture topic. :D also backbone is a loose term I guess.
It's most likely the floating-point math. Basically the program checks every cycle, looks at the state, then tries to press 0, but because computers are funny, it ends up with a bunch of unexpected values down the line.
I'm pretty sure you can find the information needed in a video about why computers struggle with calculus or algebra or whatever higher-level math.
The detail with which you create these videos is beyond impressive. Thank you for this story! Can't wait for the next one
The quality of your methodology description and all the documentation you provide is incredible; you'd put a lot of PhDs to shame. This is professor-level work.
"I don't want to dream of this pipes anymore" 😂 Love the video
i love the 'butterfly effect' detail at 17:27 ^^
Amazing video! I'm surprised how consistent and dedicated you've been to this project. I have no doubt you've spent as much time on it as some pro players have played the game.
AIs are like humans: doing something over and over to get better at it, but without the drawback of going insane with each try.
Not to mention sheer computational power. It's like pitting a normal human learning to draw and replicate the Mona Lisa, spending a day per attempt, against a human learning it by drawing a thousand Mona Lisas at once with a thousand limbs in a minute, over and over again. I wonder which one will learn faster!
@@Dice-Z and solid memory (not RAM)
Very much like rabbit that way
@@Dice-Z Mona Lisa?
I got so wrapped up in the Wirtual-esque backwards driving montage that when 26:18 happened I just burst out laughing lol. Amazing video as always, between the writing, the editing, the visuals, everything is top-notch.
ikr haha
This is the pure essence of YouTube, why this platform is amazing. This video is an absolute masterpiece; I am forever impressed, congrats
30:55
"it seems to lack creativity."
Which is a casual way of saying:
"The AI is performing a gradient descent on its loss function. One of the limitations of gradient descent is that it will find the _local_ minima rather than the global minimum. Because after all, gradient descent is a greedy approach."
Humans are "creative" because we can use a dynamic approach. We do often use a greedy approach too as greedy approach requires less planning, but we are also always questioning, "Maybe there's a better solution that's out of the box?" which is what dynamic approach does: it is always comparing if a unique set of subsolutions can beat the previously discovered optimal set of subsolutions, when combined with a new possible "frontier" of choices.
In the case of turning the car around to forwards-facing, the AI skipped over the possibility of a long-term improvement because it is using a greedy approach (it only sees the immediately-next best thing it could do, and turning the car around at this moment will slow it down). A human driver will, however, consider that turning the car around increases its speed for the rest of the track. They will therefore check whether a solution set with the car turned forward-facing can beat the solution sets where the car is left backwards.
E.g.
Let's say the track had only 4 turns
S1= {left, left, turn car around, right}
VS.
S2= {left, left, right, right}
Both greedy approach and dynamic approach realize solution set S2 is faster as turning the car around takes too much time.
The difference is, greedy will now NEVER try turning around again - even if the track had 8 turns rather than just 4. Dynamic will STILL experiment with turning around.
S3= {left, left, turn car around, right, right, left, left, right}
S4= {left, left, right, right, right, left, left, right}
Turns out, solution set S3 is faster than S4, because turning the car around to forwards makes it faster on the next 4 turns. Dynamic found this solution because it always considers the entire solution space. Greedy will not find S3 because it is a superset of S1, and remember, S1 was discarded by greedy, so this superset optimal solution was never evaluated.
So yeah, this "What if?" curiosity in dynamic approach is what makes it capable of finding the global minimum.
27:45
not only did he teach an ai to beat a wr backwards
but he also was the first person to traumatize an ai
Real-time physics is like a very complex mathematical formula with multiple variables and multiple steps, where each step consists of different operations (sum, multiply, division, root, etc.).
All this means that even very small differences in the values of the variables can yield wildly different results. That's why they seem random to us humans.
For example, if a formula involves squaring a number 9 times (raising it to the 512th power), then when x is 1.01 the result is about 163, but when x is 1.02 the result is about 25,000.
1.01 and 1.02
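A quick sanity check of those numbers (squaring 9 times is the same as raising to the 2^9 = 512th power):

```python
def square_n_times(x, n=9):
    """Square x repeatedly; equivalent to x ** (2 ** n)."""
    for _ in range(n):
        x = x * x
    return x

small = square_n_times(1.01)   # ~163
large = square_n_times(1.02)   # ~25000
```

A 1% difference in the starting value ends up as a factor of roughly 150 in the result, which is exactly the exponential divergence the comment is describing.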
This was actually incredible to see, especially given that we're only talking about a game's physics. I learned A LOT
This definitely does seem like a chaotic system, but that doesn't mean it's not possible to complete the maps. You cannot predict what is going to happen, but you can make the system robust to small changes. My suggestion is to add a little noise to the control output during training to force the AI to avoid situations that are sensitive to small deviations. Best regards from a control engineer 😊
Yup. Another way of looking at it is that the system of the game physics is chaotic if given fixed inputs -- but the larger system that contains the game physics and the control algorithm may or may not be chaotic. And "making the control system robust to small changes" is another way of saying that you're making that overall system non-chaotic.
That's a lot like how an inverted pendulum is unstable, but an inverted pendulum combined with a properly tuned PID controller is stable. In both cases, mathematically, you want all the poles (of the combined system's behavior) in the left half of the complex plane, and then things are good.
Interesting, do you have other examples of controller interacting in a robust / non-chaotic way with chaotic systems ?
@@yoshtm here's an example of a different controller (in this case, one powered by reinforcement learning) interacting with an inverted pendulum. (I don't know if links work, so here's just the end: watch?v=lM6rYjM6HBU). I tried to find non-inverted-pendulum examples, but inverted pendulums are pretty much the classic example of a chaotic system.
A solution you might not have heard yet (I'm sure other people have said 'cap the progress/speed reward after a certain point and just penalise falls after that') is to keep a record of every fall in the last 10000 attempts, continuously train an evaluator AI to predict 'how likely is this position to fall in the next five seconds', and penalise the car's AI for how likely the evaluator thinks the car is to fall off. "Continuously train a decision-making AI and a predictive AI against each other" is a pretty useful trick for turning abstract, faraway, discrete incentives that are hard for AIs to master (death in video games especially so: this approach has been used on stuff like Mario, Atari Breakout, Pong, various obscure dungeon games, etc.) into nearer and more concrete incentives that are easier for a network to learn.
@@yoshtm humans
@@yoshtm literally all of life is built on this, actually
The game calculates the position of the car with small rounding errors. Those errors are deterministic, but they happen and are corrected every fraction of a second. It's too fast to be analyzed and acts as RNG. A dedicated AI could possibly crack the rounding errors' secrets, but that seems like a lot of work.
There is a moment at 10:14 where the one little car chose life XD
A moment of self-actualization, satisfied with the reward it had.