This guy talks to ChatGPT exactly the same way interviewers do during my technical interviews xd
hello, HumanGPT
You didn't even have to go this hard. Ask GPT-4 to solve a simple Caesar cipher. You can even tell it the exact letter shift and it will still fail to apply it.
ChatGPT doesn't do well with the actual details of a string of text. It can't really store information you give it, it just knows what word probably comes next.
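For reference, the Caesar cipher mentioned above is a purely mechanical letter shift. Here is a minimal Python sketch; the function name caesar_shift and the sample text are my own, picked just for illustration:

```python
def caesar_shift(text: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` positions, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            # Work relative to 'A' or 'a' so that case is preserved.
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            # Spaces, digits and punctuation pass through unchanged.
            result.append(ch)
    return "".join(result)


encoded = caesar_shift("Hello, World!", 3)
decoded = caesar_shift(encoded, -3)
print(encoded)  # Khoor, Zruog!
print(decoded)  # Hello, World!
```

Decoding is the same function with the negated shift, which is why knowing the exact shift makes the task trivial for ordinary code.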
What's funny about all these "Coded an entire website using ChatGPT" claims is that 1. It's not really an entire website. It's just basic stuff. Unless you count a one-page site with some simple functionality and buttons as an "entire website"... And 2. There were plenty of corrections before ending up with whatever was made.
4:09 FYI it didn't bug out, it ran out of tokens for the answer. You can tell it "continue" or "go on", and it will go on with the answer!
yes, this and other things make me think he doesn't understand language models and their limitations too.
Thought this was like a 50k subs channel, only 148? Greatly underrated
3 months later it's at 23k subs
One year later, 117 subs
It produced text that looks convincingly like an answer to a logic puzzle, which is exactly what it's trained to do, so 10/10.
This channel is destined for big stuff
you're destined for big stuff
I was trying to concentrate on the puzzles, but I kept getting distracted by THE LICC
ChatGPT can't solve anything because it doesn't understand the meaning of words. All it does is pattern matching and probability models. The answers to simple puzzles probably come out right just because the training input already had them and the bot correlated these answers with the questions in the input.
If anyone wants some peak humour:
Ask chatgpt to draw an ascii art of yoshi.
You’ll be surprised at what you find
It's only good for stuff that does not require too much thinking or calculation. When I told it to give me certain logic in code, it was only able to do it for very common things such as a Levenshtein distance algorithm, but not for something lesser known.
That's because it doesn't do any thinking or calculations. It's a word predictor. It predicts words.
Don't hammer your nails with a sponge. Don't run mathematical calculations with a word predictor.
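For anyone who hasn't seen it, the Levenshtein distance mentioned above is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal sketch of the usual dynamic-programming version (the function name and test words are mine, not from the thread):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, keeping only two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

This is exactly the kind of heavily documented algorithm that shows up all over the training data.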
underrated channel is underrated
ChatGPT is a word predictor with a bias towards attempting to match the current conversation context.
The more you try to correct it in a conversation, the more tied up in the context it gets.
Don't correct the bot with further conversation; edit your statements or restart the conversation entirely.
Yes! These puzzles are really difficult and I'm not so sure it would have gotten them anyway, but the way he was talking to GPT was not exactly "correct". As you said, it relies heavily on context. It is possible to correct simple mistakes, but on difficult tasks correcting it does more harm than good, because it gets confused about what is fact and what is not, based on both what it said previously and what the user has entered. The majority of the hallucinations I have witnessed with GPT come from trying to correct it without doing it the right way, if that makes sense.
ChatGPT is a language model based thing. Don't expect it to understand problems that fall outside of the scope of basic logic and language comparison.
This absolutely. I wonder though if a similar model trained purely on mathematics would have better success with maths problems?
So, it failing to provide a word with 11 letters has nothing to do with language?
Kinda feeling happy for being a subscriber before 1k, you'll do great if you keep at it!
that LifeAdviceLamp "Buy Lottery Tickets" tweet, is the king of the city of my heart
this is really cool
you're pretty cool yourself, IcecubeX 2000
Are you the Kevin Fang that works at Jane Street or is that a different Kevin Fang? He’s on LinkedIn if you’re not him and want to find him.
Hitting the AI with "bruh" is what's gonna lead to the robot uprising isn't it
You are very underrated ):
Man I would die on the hill about seed being an early stage of a plant.
Lol
7:08 all that math looks impossible to me.
I asked ChatGPT if it knows the game Bulls and Cows and suggested we play it. The bot thought of a number and I had to guess it. After the third answer, the limitations of a bot that just tries "to continue a sequence of words with the most probable candidate" became very obvious :) Its answers were inconsistent, and when I pointed out the inconsistency it agreed it had made a mistake, but the new answer it gave was as inconsistent as the previous one. To sum up: when ChatGPT has seen something similar to a problem in its training set, as I believe was the case for the single-cross problem, it can produce wonderful results, but do not expect real reasoning from it.
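To show what consistent answers in Bulls and Cows look like: a bull is a correct digit in the correct position, a cow is a correct digit in the wrong position. A small Python sketch of the scoring, assuming the secret and the guess are digit strings of the same length (the function name score_guess is just for illustration):

```python
from typing import Tuple


def score_guess(secret: str, guess: str) -> Tuple[int, int]:
    """Return (bulls, cows) for a guess against the secret."""
    # Bulls: same digit in the same position.
    bulls = sum(s == g for s, g in zip(secret, guess))
    # Digits shared by both strings, regardless of position.
    common = sum(min(secret.count(d), guess.count(d)) for d in set(guess))
    # Cows are the shared digits that are not already bulls.
    return bulls, common - bulls


print(score_guess("1234", "1243"))  # (2, 2)
```

The score depends only on the secret and the guess, so a player who actually holds a fixed secret can never contradict an earlier answer.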
I created a simple steganography challenge for ChatGPT, which only required 5 steps to uncover my pseudo Google account details. The bot could not solve it. I used no password encryption, only standard ASCII reversal, encoded to binary, then to Base64. I then advanced every 3rd character in the Base64 by 1. I then added the resulting string into the metadata of a standard JPEG file depicting a red rose. It would have been cool if the AI could have uncovered the hidden data. Perhaps we still have far to go before AI can achieve this?
ChatGPT is really just a fancy next-word predictor, so it can't really do stuff like that and probably won't for a long time. It's like trying to use a fork to eat soup. That being said, if there's an AI that is specially trained for this task, it may be able to recover your account details.
@@samuelthecamel Agreed
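For what it's worth, the steps in the challenge above are mechanical enough that ordinary code can run them in both directions. A rough Python sketch, with loud assumptions: I read "ASCII reversal" as reversing the string, "encoded to binary" as taking its ASCII bytes, and "advanced by 1" as adding 1 to the code point of every 3rd character. The original poster may have meant something different, the JPEG metadata step is left out, and the names hide and reveal are made up:

```python
import base64


def bump_every_third(text: str, delta: int) -> str:
    """Shift the code point of every 3rd character (positions 3, 6, 9, ...) by delta."""
    return "".join(
        chr(ord(ch) + delta) if (i + 1) % 3 == 0 else ch
        for i, ch in enumerate(text)
    )


def hide(secret: str) -> str:
    """Reverse the string, Base64-encode its ASCII bytes, then bump every 3rd character."""
    reversed_bytes = secret[::-1].encode("ascii")
    b64 = base64.b64encode(reversed_bytes).decode("ascii")
    return bump_every_third(b64, 1)


def reveal(hidden: str) -> str:
    """Undo hide(): un-bump, Base64-decode, reverse back."""
    b64 = bump_every_third(hidden, -1)
    return base64.b64decode(b64).decode("ascii")[::-1]


hidden = hide("user@example.com")  # placeholder value, not real account details
print(hidden)
print(reveal(hidden))
```

The hard part for any solver is guessing which steps were applied, not applying them.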
The issue is that you found these tests online. ChatGPT has scanned the internet so it can get many of the word riddles. For some of them, it just doesn't know what you're asking.
Yup, I suspected as much when I copy-pasted Einstein's famous riddle into ChatGPT and it immediately blurted out the right answer. However, when I simply replaced "German guy" with "Russian guy" within the puzzle's description, ChatGPT fucking exploded with wrong answers, illusions of logical thinking, and other nonsense.
So, in the end, it couldn't even match my input against the IDENTICAL input it was trained on, save for ONE replaced word.
@@AgentFire0 I have managed to actually get it to do problem solving by making word riddles that are not anywhere on the internet.
I asked it something like this.
There are three boxes, box A, B, and C, all placed side by side. These all look identical.
Fred places a coin in box A for storage, and leaves the room.
While Fred is not anywhere nearby, box A switches places with box C.
The coin is removed, and placed in box B.
When Fred returns, he looks in the storage where he placed his coin. Which box will Fred check first?
It got this right for me.
And then as another test, you could say the boxes are labelled BUT you need a way to ask it "Which position does he check" rather than "Which box does he check". Because, if they are labelled, he will see it has switched places, and check the third position (where box C was) BUT this is still technically box A. So even though the answer is the same if they're labelled (He will check box A) the place where box A is has changed.
Can you add what bgm you used to the description?
All original music in this one - the intro one is on this channel (the davie504 video)
After seeing so many reports of astonishing things ChatGPT can do, it appears the worm has turned and now we find it interesting where it fails.
it's an nlp language model, not the oracle of delphi, i wouldn't be concerned if your job requires brain power.
Not really, seeing the difficulty of the puzzles I would be surprised if it got any of the reasoning correct. (Although I did suspect it would have seen at least some of those answers on the internet previously, but I guess not.)
A one story house with a basement is still considered one story. Only above grade level counts. Edge cases are everywhere.
Tug of war: You might be able to convince ChatGPT that the correct answer is incorrect.
Also is a roof balcony counted as an additional story?
@@Alex_Vir Balcony/deck is not a habitable space, not even inside the house, so no. Inside there could be stairs up to a roof deck in a one story house.
I once tried something like this and it did the same crap. Now I wonder how many hours they spent to get those ad results.
Any improvement with ChatGPT4?
Not sure (not going to sign up for premium). I think Bing AI uses GPT4 though...
I have been using Bing for a while now to search for things in a very obscure language. To be very honest, it's good at providing general answers but really bad at giving very precise results; in that case Google actually beats it. Also I hate that it doesn't open YouTube videos on YouTube. In conclusion I don't really see much of an improvement atm :)
Language models are horrible at letter-wise questions like the first one. A model basically doesn't even read letter by letter or know where the letters are, since it turns the words into chunks and the chunks into embedding space vectors. Language models are also not really built for math, since the neural networks they're based on have no way to perform calculations outside of memorization and pattern matching.
That being said, even if these two limitations didn't exist it would probably still get the questions wrong, since AI just isn't there yet to do this level of multi-level creative reasoning... Yet.
I'd be curious to see the results with GPT-4 and "chain of thought" prompting, as I'm sure that would perform much better.
I gave up on chatgpt when it just kept giving me the same code over and over again.
It's so painful.
chatgpt is a very chatty talking dice.
I asked it to play tic tac toe with me. It's a game famous for always ending in a draw, solved by computers since the 50s and famously used in the 80s film WarGames to teach the rogue AI in that film about no-win scenarios. I won every game against ChatGPT. It's very clearly a word prediction algorithm, but intelligence it is not.
I'm not surprised by this. Puzzle solving was probably a fairly low percentage of the training data, and since it can't access the internet it's not able to learn or look up how it works, so it's just randomly guessing like most people probably would.
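On the "always ending in a draw" point: with perfect play from both sides, tic tac toe is a draw, and a small exhaustive search confirms it. A minimal minimax sketch in Python; the board encoding (a 9-character string read row by row) and the function names are my own choices:

```python
from functools import lru_cache
from typing import Optional

# The eight winning lines, as index triples into a 9-character board string.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: str) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> int:
    """Game value from X's point of view: +1 X wins, 0 draw, -1 O wins."""
    won = winner(board)
    if won == "X":
        return 1
    if won == "O":
        return -1
    if " " not in board:
        return 0  # board full with no winner
    other = "O" if player == "X" else "X"
    scores = [
        minimax(board[:i] + player + board[i + 1:], other)
        for i, cell in enumerate(board)
        if cell == " "
    ]
    # X picks the best outcome for X, O picks the worst outcome for X.
    return max(scores) if player == "X" else min(scores)


print(minimax(" " * 9, "X"))  # 0, i.e. a draw with perfect play
```

So an opponent that loses every game is not searching the game tree at all.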
I'm actually crying, this was hilarious.
"Difficulty" of a puzzle (to humans) is not a relevant metric here. Rather, it's more a matter of how well-documented the solutions are in its training dataset.
Answer: No, but it sure thinks it can.
Puzzle one sounds like the ABCs song.
bruh is my most used comment back to chat-gpt's answers hahaha had me laughing there
Would be fun to see this revisited with chat gpt 4
Funny how the language model is terrible at the language puzzle, but great at the maths one
ChatGPT can't actually read letters. Instead, words are simplified into "tokens," which may be a full word or a part of a word. This puts it at a severe disadvantage with any word puzzles.
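A toy illustration of the token point above. The vocabulary here is entirely made up and the greedy longest-match rule is a simplification, not the algorithm any particular model uses; it only shows how a word can end up as a couple of integer ids with no letters visible:

```python
from typing import Dict, List

# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of entries.
TOY_VOCAB: Dict[str, int] = {
    "straw": 0, "berry": 1,
    "s": 2, "t": 3, "r": 4, "a": 5, "w": 6, "b": 7, "e": 8, "y": 9,
}


def tokenize(word: str, vocab: Dict[str, int]) -> List[int]:
    """Split word into the longest vocabulary pieces, left to right, and return their ids."""
    ids: List[int] = []
    pos = 0
    while pos < len(word):
        # Try the longest remaining piece first, then shrink it.
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary entry covers {word[pos]!r}")
    return ids


print(tokenize("strawberry", TOY_VOCAB))  # [0, 1]
```

From the ids [0, 1] alone there is no direct way to tell how many letters the word has or how many of them are "r", which is the disadvantage the comment describes.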
Try GPT-4
You have truly doubled Dolly with extra care.
Duuude, you new, you good! please keep it up! :D
Sup from me
The model sees token ids, not words or letters
I'd give it a score of 1/2.
Language models are impressively bad with anything related to math. I once gave one a string and asked it to count the characters and it failed in the most spectacularly impressive ways.
It can't answer simple questions and you're talking about jane street puzzles
i was here before he got big.
ChatGPT doesn't actually understand or analyze anything, it just makes shit up. Its main goal is to "sound" true, whether or not it's actually true doesn't matter.
Worth trying again with v4
The Python code is cringe.
important note: you started with GPT-4 and continued with ChatGPT. Next time try to use GPT-4, it's much smarter :)
yoo
Prompt engineering is an actual skill, sir. You are not leveraging the complete potential of ChatGPT with your prompts. Try this again, but this time craft the prompts in instruction format. Look at its training for reference.
Bot solves 0/3 problems, scores 2/5... Sure, ok.