Here from Rainbolt's vid. You're awesome, can't wait for the AI to improve.
You can use the Street View API to get screenshots much faster, and with the FOV set to 90 you can ensure they connect seamlessly
I had a similar thought.
would be cool to see an update vid if he makes these changes.
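To make the Street View API suggestion above concrete, here's a rough sketch of how four 90° FOV tiles (headings 0/90/180/270) could cover a full panorama via the Static API. The API key is a placeholder and you'd need your own paid key; the parameter names (`size`, `location`, `heading`, `fov`, `key`) are the real ones from the Static API.

```python
# Build Street View Static API request URLs for four 90-degree tiles
# that together cover a full 360-degree panorama.
# API_KEY is a placeholder -- you need your own (paid) key.
API_KEY = "YOUR_API_KEY"

def panorama_urls(lat, lon, size="640x640", fov=90):
    """Return one URL per compass heading; fov=90 makes the tiles meet seamlessly."""
    base = "https://maps.googleapis.com/maps/api/streetview"
    return [
        f"{base}?size={size}&location={lat},{lon}&heading={h}&fov={fov}&key={API_KEY}"
        for h in (0, 90, 180, 270)
    ]

urls = panorama_urls(48.8584, 2.2945)
print(urls[1])  # the heading=90 tile
```

Downloading each URL with any HTTP client would then give you the four tiles to stitch side by side.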
Incredible idea! Will be super interesting to see how this progresses. Good luck with the AI!
Thank you!
Wow, I'm so happy someone actually took the time to develop that! I second Stjepan's suggestion of creating a classification model that just outputs a country label. The model would probably end up relying on known metas like car color, antenna, cam gen, but also poles, road signs, etc... Then if that works well, you could potentially train different country-specific models for pinpointing the coordinates within each country. That would be a lot of work, but I'm guessing it would have a much higher chance of beating pros.
Good luck with your project, I'm definitely subscribing in the hope of seeing more in the future :D
As suggested, you could first train the model on countries, and once that works well, take one of the last layers as an embedding and continue training from there on lat/lon. By not having one model per country you simplify the problem, but you will probably also pick up non-country-specific information such as mountains, sun position and terrain, which will all help your overall predictions.
In the video it uses categorical_crossentropy. This is a loss function for labels, meaning the model already uses labels. There is a comment later explaining that the labels used are geohashes, but that is often not accurate enough to beat pro players.
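For anyone curious, categorical cross-entropy over one-hot geohash-cell labels boils down to the negative log of the probability the model assigns to the true cell. A minimal NumPy sketch (the 1024-class count matches the video; the cell index and probabilities here are made up):

```python
import numpy as np

num_classes = 1024              # geohash cells at precision 2
true_cell = 42                  # index of the correct cell (made up for this demo)

# One-hot label vector for the true cell
y_true = np.zeros(num_classes)
y_true[true_cell] = 1.0

# Fake softmax output: mostly uniform, with extra mass on the true cell
y_pred = np.full(num_classes, 0.5 / (num_classes - 1))
y_pred[true_cell] = 0.5

# Categorical cross-entropy: -sum(y_true * log(y_pred)) = -log(p_true)
loss = -np.sum(y_true * np.log(y_pred))
print(round(loss, 4))  # -log(0.5) ≈ 0.6931
```

So the loss only "sees" which discrete cell was correct, which is why the cell size caps how precise the guess can be.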
Great video!
One suggestion, though I don't know if it would be considered "cheating": rewarding the AI for guessing the country right would be very beneficial (probably even more than distance) and could be built into the model (not easily though, because of how country borders work).
That's an excellent idea - I'll do that!
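As a conceptual sketch of what "rewarding the country" could look like as a training signal: a score that mixes distance error with a flat penalty for getting the country wrong. The weights here are arbitrary placeholders, not tuned values, and a real model would need a differentiable version of this.

```python
def geo_loss(dist_km, pred_country, true_country,
             dist_scale=2000.0, country_penalty=1.0):
    """Combine a normalized distance error with a flat penalty for a wrong
    country. dist_scale and country_penalty are arbitrary knobs."""
    loss = dist_km / dist_scale
    if pred_country != true_country:
        loss += country_penalty
    return loss

# Close guess, right country -> small loss
print(geo_loss(50.0, "NO", "NO"))   # 0.025
# Same distance but wrong country (e.g. near a border) -> much larger loss
print(geo_loss(50.0, "SE", "NO"))   # 1.025
```

The border problem the comment mentions shows up here directly: two guesses 50 km apart get very different losses depending on which side of the line they land.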
This is super fascinating! Can't wait to see how this progresses :D
Thank you! I'll probably share an update in 2 weeks or so! 🤖
this is incredibly impressive; awesome work!
Thank you!
Great work, here from Rainbolt
Thank you! ☺️
Very cool project, dude! Appreciate the collab with rainbolt
Thank you!
I love this implementation. Have you considered using the browser extension's JavaScript code to monitor the network traffic and download the panoramic image files as they are fetched by the browser? That could let you acquire the training data faster than by interacting with the screen and screenshotting, and might also give you cleaner, non-overlapping images.
I also like the idea of classifying countries first as another commenter suggested.
Great project, really enjoy seeing the improvement
This is the way
how would you do this exactly ?
@@supremezzzzzzz presumably by overriding the XHR/fetch methods and, whenever an image you need is being downloaded, saving it locally either with the FileSystem API or by posting it to a local web server
That's a good idea, but in the context of the game the idea is that both parties get to see "limited information". Wouldn't it be an unfair advantage when playing against a human if the AI gets to see the whole panoramic view while a human can only see by scrolling around? Would be cool to have both options available, maybe.
1:43 a couple of months later this seems like dinosaur technology, now we go straight to GPT
This would GREATLY benefit from attention layers. If you think about how Rainbolt thinks, he only really cares about certain very specific features; the AI must do the same. I also recommend using a U-Net architecture rather than the basic CNN architecture.
Very interesting. Thank you for doing this.
AI is usually very bad at recognizing where an object ends and another begins, so if it ends up being better than humans, it will probably be through color palette or something like that. That being said, I think it has a major disadvantage in not being able to recognize bollards, signs, road lines and telephone poles. It probably cannot learn what a country is either? But it does have the advantage of long learning time. I'm excited to see what happens!
Yeah it doesn't have an understanding of countries. With higher resolutions it could theoretically pick up clues from signs and billboards, but yeah - AI learns much slower than humans, they just get more data to learn from.
AI usually knows where objects begin and end through colors, but without a human pointing out a specific object it will probably need thousands of images of a specific country to start recognising bollards and such, and even more to learn the differences between countries when they have similar ones.
Pro tip: take another image looking downward. It may need some more coding, but I predict it will greatly and quickly increase accuracy. It's the most important clue in the game.
everyone gangsta until the ai 5k's the dirt
Make the 2nd part possible of the AI vs PRO video! This is cool asf.
Very soon ;)
Would try multi-label classification :-) (with separate class sets). People were suggesting classification into countries since that is the most common meta, but past that you can say how rich the region is, how dry/wet it is, how mountainous/flat it is, what biome it is, what latitude it is, what biogeographical realm it is... You can definitely extract a lot of these from some rendered maps with a bit of effort; it might be possible to get most of them in tabular format. There was a suggestion of doing a per-country regression model, but I expect it would just learn that the southern half of this country is mountainous etc., and you'll be able to do that more accurately with a multi-label classifier. You can also do the final step of joining the different labels into one still inside the model, and hence have a weighted ranking that is trained with everything else. E.g. you can have a list of geohashes, each with a class assigned from all the other categories; based on that, each geohash has an edge linking it to the relevant label in each set with one connection, and hence one weight per set.
Regression will eventually work best but that would probably take orders of magnitude more data than you have?
You mentioned only 14 layers because 40k images isn't enough for more? But since these are very standard images, you could just use a pretrained model and only train the last bit. I've trained a ResNet50 on a minute of video and it worked better than smaller models...
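The "separate class sets" idea above can be sketched as one one-hot target vector per label set, with a model having one softmax head per set. The class sets below are invented examples, not the actual sets anyone proposed:

```python
# Hypothetical separate class sets for multi-label classification
CLASS_SETS = {
    "country": ["NO", "SE", "BR", "JP"],
    "biome":   ["taiga", "desert", "rainforest", "grassland"],
    "terrain": ["flat", "hilly", "mountainous"],
}

def make_targets(labels):
    """Build one one-hot vector per class set; a multi-head model would
    train one softmax output (and one loss term) against each vector."""
    targets = {}
    for name, classes in CLASS_SETS.items():
        vec = [0.0] * len(classes)
        vec[classes.index(labels[name])] = 1.0
        targets[name] = vec
    return targets

t = make_targets({"country": "NO", "biome": "taiga", "terrain": "mountainous"})
print(t["terrain"])  # [0.0, 0.0, 1.0]
```

Each head then gets supervision from cheap map-derived labels, even for images whose exact location cell is rare in the data.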
I'll give multi-level classification a try, thank you for all the suggestions.
Yeah, I started with regression before I switched to geohashes, but that didn't work well, likely because of the limited data.
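Since regression on lat/lon keeps coming up: the natural error metric for it is great-circle distance, e.g. the standard haversine formula. A self-contained sketch (using the common 6371 km mean Earth radius):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Paris to London is roughly 340 km
print(round(haversine_km(48.8566, 2.3522, 51.5074, -0.1278)))
```

Evaluating a regression model with this instead of raw lat/lon error avoids the distortion near the poles, where a degree of longitude covers far less ground.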
It will be interesting watching development of this technology.
Nice! I wonder how an adaptation of a more developed image classification model (ResNet, inception, etc.) would do here. Especially if you can keep the imagenet pretrained weights.
Yes, I'll eventually try a pre-trained model and fine-tune it for geolocation! 🤖👍 Also curious to see the results of that.
Cool idea, gl!
Thanks! ☺️🤖
great video! very nice explanations.
hope the AI improves much further.
Amazing video! I'm really curious why you decided to take on this project. You obviously have a lot of experience with coding, but you definitely seem to have a decent grasp of Geoguessr as well (which is mainly why I ended up here). I'm not a very good coder, but I enjoyed your explanation of your processes and I hope you create some more content in the future!
Thank you! I just graduated from my masters and had some free time! I like the game and thought it would be a nice challenge to train a network on it ☺️. I didn't have much experience with Geoguessr but I learned a bit about it on the way. I'll definitely do some more videos on geoguessr or other AI projects!
Very cool project! I hope i didn´t miss it in the video, what kind of hardware are you using to do your training on(CPU/GPU)?
Thanks Fabi! I used a GPU through the AWS infrastructure. To reduce costs I've now switched to my own GPU (an RTX 3060).
You could add a GAN (generative adversarial network) setup to your CNN to make it more robust; adversarial training may help it generalize better and improve consistency.
Also I recommend increasing the width of your layers, even if you need to remove a layer or two at the end. A small layer can bottleneck the whole model.
Very nice project, good luck!
Thank you!
Great video! A few questions about the model: it seems you're doing classification between 1024 classes, and I'm assuming each one encodes one specific location. Have you considered training for regression, i.e. regressing the longitude and latitude directly? I feel like the model could perform better if you encoded this spatial dependency in the target variable better. Also, using a more advanced architecture like ResNet would probably boost performance (although it would slow down training -- but here you could probably lower the image resolution noticeably without much loss :) ). Anyways, good luck!
Yeah, the 1024 classes come from geohashes (a standard for encoding locations as text) with a precision of 2 (32² = 1024 cells). I tried regression initially, but that was much less accurate, I think because the relationship between images and coordinates is not linear but rather complicated. I might try that again someday though, or at least more classes/locations. ResNet / transfer learning is also on the to-do list, but I'm currently limited by hardware. Thank you for your comment!!
@@TraversedTV I suggest colab pro and kaggle for the hardware
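For anyone wondering how the precision-2 geohash labels above work: each geohash character encodes 5 bits that alternately bisect the longitude and latitude ranges, so precision 2 gives 32² = 1024 cells. A minimal encoder following the standard algorithm:

```python
def geohash(lat, lon, precision=2):
    """Minimal geohash encoder; precision 2 gives 32**2 = 1024 possible cells."""
    base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, bit_count, even = 0, 0, True
    result = []
    while len(result) < precision:
        if even:  # longitude bit
            mid = (lon_lo + lon_hi) / 2
            bits = bits * 2 + (1 if lon >= mid else 0)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:     # latitude bit
            mid = (lat_lo + lat_hi) / 2
            bits = bits * 2 + (1 if lat >= mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
        bit_count += 1
        if bit_count == 5:  # 5 bits -> one base32 character
            result.append(base32[bits])
            bits, bit_count = 0, 0
    return "".join(result)

print(geohash(57.64911, 10.40744, precision=2))  # "u4"
```

At precision 2 each cell is huge (roughly 1250 km × 625 km at the equator), which is exactly why classification at this precision can't pinpoint locations the way pros do.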
Idk if it's possible, but if you could hard-code some tips that are super typical for different countries (road lines, camera generation etc.), it should be easier and more efficient for the AI to learn. Now, as it uses the general-look method, it's easy to make a big mistake, because for example British Columbia looks very similar to northern Norway. Great project anyway, excited to see how it plays after a few months
It's usually rather difficult and not always beneficial to hardcode any hints, as the system should eventually learn those by itself. However, I can provide some more information than just the position to the system. Then it has more information to learn with. Good thought though!
This is amazing. The first time I played Geoguessr about 2 years ago I thought "I wonder if someone has or will create AI for this, and I wonder how". Then of course your AI played in the tournaments, etc. I actually first thought you used some brute-force method of taking your program through every Google-covered street in the world, reducing it for speed, and then comparing the game image to your massive database. But that's because I'm no good at AI. I'm glad it works more like a human player does.
Anyway, I really enjoyed seeing how you did this. I'm wondering what uses this might have outside the game. I first thought about geolocating for news networks (you know how CNN and others check if images sent to them were actually taken in the claimed location etc), then about military uses, then about police in crime investigations. Of course I understand pictures taken by phone will often not match those taken by Google's street view camera, but perhaps some version could move between sources of image.
Anyway, I'm sure some other use can be found. Thank you for sharing your methods!
What about training from a pretrained model such as EfficientNet? Fine-tune it on this data, then run it 5 times with square images and average those predictions. Also, will you kindly share the dataset please?
This is actually insanely cool, and I've recently been super into ML and CNNs. One question I have is how you got your current dataset. Did you have a bot play through thousands of games to collect it? Also, how do you plan on increasing the data for this model? Repeat the same process? Is there a better way? And lastly (this is a bit of a noob question), how do you recognize whether the inaccuracies of this model come from a lack of data vs. the model just not working well? In that case, how many thousands of images would you reckon would produce a reasonably working model? 100,000? 1 million? More?
Also a suggestion: have you considered using cloud computing (i.e. AWS or Azure) to bypass hardware limitations? If you're worried about costs, I'm sure you could set up a Patreon or something, and many AI or Geoguessr enthusiasts would gladly fund your cloud computing.
The current data set is roughly 40,000 images. I'm definitely planning to increase the data. Data augmentation (rotation etc.) also helps to reduce the data requirements.
No idea how many are needed for a very good model, but so far accuracy increased significantly with more data.
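One augmentation that fits panoramas particularly well (my own suggestion, not something from the video): a 360° equirectangular image wraps around horizontally, so rolling it sideways produces another fully valid panorama of the same place, essentially for free.

```python
import numpy as np

def roll_panorama(pano, shift_px):
    """Horizontally roll a 360-degree panorama. Because the image wraps
    around, any shift yields another valid view of the same location."""
    return np.roll(pano, shift_px, axis=1)

# Toy 4x8 "panorama" with 3 channels
pano = np.arange(4 * 8 * 3).reshape(4, 8, 3)
rolled = roll_panorama(pano, 2)
print(rolled.shape)  # (4, 8, 3) -- shape unchanged, content shifted right by 2
```

Unlike arbitrary rotations or crops, this never distorts the scene or cuts off clues; it just changes which heading is "front".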
I'm actually using AWS already, so it boils down to the costs. I might consider Patreon! Good idea 👍!
@@TraversedTV and the patrons are then allowed to play against the AI.
@@TraversedTV Hey, great video! I actually had a very similar idea where I used Selenium and geopy to make a dataset of 50k Geoguessr images and feed it to a convnet, but I think the panorama input format is much more informative. Would it be possible for you to make the panorama data public at some point? Would love to play around with it. Either way, the vid inspired me to revisit the project, so thank you :)
Any chance you'd be able to open source some of your code for this? I'm most interested in the frameworky bits for talking to Geoguessr programmatically so I can play with the model part myself. I spent a little bit of time playing with it tonight, took a different approach on the data collection and am trying to use Selenium rather than a browser plugin. I am struggling with getting the camera point of view to pan around so I can try to capture several points of view as you did. I have something working by having it press the arrow key a bunch of times, but it's slow. I suspect you might have been clicking on the compass needle, but in the newer geoguessr UI the compass at the top can't be interacted with. If that's the case, have you found a way to do this with the new UI?
Either way, awesome project, thanks for sharing!
Well explained and neatly solved; such a smart idea to just merge all the screenshots together. How many conv layers did you use in the end, and do you think the model could improve with more layers, or do you think the limitation is in the training data?
Hi Malte, currently 14 conv layers. It's limited by the data atm. I think the layers are sufficient, it could use more filters/kernels per layer though!
Wow man, great work! I want to help you. Would you be interested in teaming up to beat Rainbolt? 😂 I'd like to develop your ideas even further
Great Video! Which Browser Plugin is used for taking the images?
Thanks Kedi, I wrote the plugin myself.
@@TraversedTV Oh okay, thanks for replying!
If Geoguessr is taking images from Google Street View, why don't you skip the middleman and scrape the data directly from Google?
Instead of training the network from scratch isn't it beneficial to take an existing, pretrained, model (like resnet) and go from there?
Firstly, you would get a solid architecture. Secondly, its weights won't be completely random, so retraining could take less time.
And finally, the idea of guessing the country first and then the coordinates within it could be quite beneficial (but harder to implement)
The Google Street View API is kinda expensive, I think
Good work! Do you plan to release the source code?
Currently not, I don't want people to use it in competitive. ;)
@@TraversedTV ah yes, I was not even thinking about that haha. I would like to do something similar but testing different architecture.
Is this project open source? As a geoguessr fan and ML researcher, I would like to try improving upon this as a hobbyist project. Pretty sure that using semi-supervised learning would instantly yield significantly improved results given the currently very limited dataset.
how did you do that ?
All it needs to know is: this forest looks Polish, telephone poles, roads, this dirt looks Brazilian...
hi, I wonder if you can give out the code with dataset
I don't want people to mess with it in competitive mode. And as for the dataset, there are probably some licence issues if I upload it.
If you use Selenium with Python, you can take screenshots faster
I'll check it out! It still has to load the street view image though.
@@TraversedTV With Selenium you can run multiple instances in parallel. I can also recommend Puppeteer, which is a kind of Selenium but without a graphical Chromium, and it comes directly from Google.
I'm wondering if you plan on releasing the AI so that others can duel it when you're finished with it?
Currently not, but I might do some live streams where people can play against it in September.
Potential to be the next AlphaGo
Have you thought of making it open-source?
Try this with country streak (it's a game mode on Geoguessr)
I'll give it a try 😁
will it ever be possible to play against the bot? i'd really love to
i am subscriber #420
Does the plugin follow Geoguessr's terms of service? I'm asking because each time you create a new game, an API call is made, which costs the game money.
nice ai.
Thanks!
@@TraversedTV 👍
I swear in 20 years all the aimbots and cheats are gonna be run by AI
I hope people don't use it to cheat...
I hope so too! But that's why I don't make it available to the public. People would need to redo it by themselves.
i fell asleep listening ngl