Some debate in the comment section about "smallest integer" vs "least integer", i.e. interpreting smallest as closest to zero. I stuck with the original source (x.com/ericneyman/status/1804168604847358219) of the question for the phrasing in the video, but it turns out that ChatGPT etc. struggle with every version of the phrasing I've found, and even interpreting "smallest" as closest to zero they still don't give what would then be the two answers of -4 and 4. The larger point here is that there does seem to be a real blind spot: so many similar problems presumably have the context of the smallest/least natural number or counting number or similar, so by modelling off the training data and giving similar answers, this question confuses it despite its simplicity.
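For anyone who wants a ground truth to test the chatbots against, a few lines of Python settle what the candidate answers even are (a rough sketch; the names are just illustrative):

    # Integers whose square lies strictly between 15 and 30
    candidates = [x for x in range(-10, 11) if 15 < x * x < 30]
    print(candidates)                 # [-5, -4, 4, 5]
    print(min(candidates))            # -5, the least integer
    print(min(candidates, key=abs))   # -4, tied with 4 for smallest magnitude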
As an electrical engineer, "smallest" means closest to zero more often than not. If I am instructed to choose the amplifier from a list with the smallest error voltage or the smallest input current, I am not looking through datasheets for negative numbers.
@@mmmmmratner lmao this is a math problem, not your list of amplifier error amounts. The problem specified integer, which includes negative numbers; the fact that integer was specified should have cued it into thinking about negative numbers.
To be fair I got 4 for “Smallest integer whose square is between 15 and 30” since I thought smallest meant closest to 0, not least positive/most negative number.
I think smallest is purposely misleading language, I wouldn't describe a negative number as being small. It's like saying -5 apples is smaller than 0 apples.
@@ரக்ஷித்2007 I also think we approach the problem differently when solving "word problems" versus equations. That lends credence to a habit I notice in mathematicians of explicitly translating word problems into equations, or more appropriately here an inequality, i.e. mathematical notation, for proper clarity.
When taking Calc 2-3, Linear Algebra, and Differential Equations this past year I would use it to study. Namely, I would ask it to solve a problem and, as it broke the problem up into multiple steps, I could spot where it went wrong and tailor my study time more efficiently. Before ChatGPT, if I didn't understand a problem I would often have to read a WHOLE bunch of things I already knew until I got to what I needed. Bottom line is, this is a tool, not a babysitter, and like any tool we need to develop the skill to use it.
This is what I’ve been doing as well, using to study and confirm stuff. Figuring out where it makes errors also makes you feel like you’ve learned quite a bit.
I did the same with it. Often though, my professor would make the problems very unique and I started to find more often than not, generative AI was completely off the mark. Luckily I was able to utilize other resources and still had a very high success rate.
@@DrTrefor I'm already through all my classes, but I find it often very useful for aiding learning in this way. Just using it to help get me pointed in the right direction, relevant terms, etc. It's often incorrect, but it is unbeatably efficient at helping get started. It does help, though, that I know enough to generally spot hallucinations and BS.
These LLMs are easy to trip up if you give them a problem that's not in their training data but has a similar structure to one that is. For example I asked Gemini: I have a 7 liter jug and a 5 liter jug. How do I measure out 5 liters of water? It devised a 6-step solution that didn't make any sense at all.
I've noticed similar ones to this, where it is close to a "standard" problem about jugs of water but the solution is so trivial it misses it entirely trying the more complicated approach.
(L)LMAO. I just tried this out on GPT 4-o and received a 14-step solution. In response, I asked if it could produce a solution in fewer steps. "Certainly!" it replied in its chipper manner, "Here is a simpler method to measure out exactly 5 liters using a 7-liter jug and a 5-liter jug", whereupon it proceeded to give me... a 𝟐𝟎-step solution.
@@bravernewmath That's interesting. I got a 10-step solution (that doesn't work). After repeatedly asking it to find solutions with fewer steps, the solutions I got had 8, 6, 6, 3, 6, and 1 steps (in that order). It was insistent that its 6-step solution was the shortest valid solution until I flat out told it it wasn't lol
That's funny. I pushed a little more afterwards, eventually asking it for a 1-step solution. I was told that no such solution was possible. I responded, "Oh, it's possible, all right. Think hard, and I'll bet you can figure it out." Interestingly, after that "hint", GPT answered it correctly.
I was asking it this question and asked it how it could do it in one step. It kept on giving 7 step responses and I kept saying “that’s more than one step” Then it gave me a notification that I reached my message limit and would be downgraded to GPT 3.5 It then instantly figured it out after I was downgraded…
I had a conversation with Bard (now Gemini). I was curious if it could solve a Calc I problem. It got it wrong. I told it and it said, "You're right!" and re-worked it. It got the right answer, but the steps were wrong. I told it. Amazingly, it understood exactly what step was erroneous, but then got it wrong again. I went back and forth a few times and it did finally get it right. It's interesting to observe. Anyway, I do appreciate the breadth of knowledge these AI systems have, but I cannot fully trust any of them. Everything has to be checked.
@@DrTrefor I think saying that everything needs to be checked is not enough, because you also need to know enough about the subject to know you are not being fooled by it. And I doubt it will ever be perfect; after all, what we mean when we say "Solve ___" is far more complex, and we expect the computer to understand on its own what we meant.
It’s really important to realize, it’s not checking its answer for correctness. It’s making a prediction of what you want given its bad answer and your response to that answer. The “you’re right” component is a feature of the alignment process.
I think that these computations could be useful the way quantum computers are, in theory, for NP problems. If it involves guessing or searching for something, maybe ask the computer to do it, but it should be a problem whose answer can be checked in a straightforward way. Problems like "find the smallest" can be tricky because it is not clear how to check it. It certainly could give you a head start, so that you know how large you conceivably would need to look, but it does not guarantee that it is the smallest (or even that it is a solution at all). Trust only after verifying.
Terence Tao said in a talk that AI once helped him solve a problem. He asked AI (don't know which one) how to prove an inequality. It gave a bunch of ideas and mostly garbage. But among those was a suggestion to try generating functions which Tao said he "should have thought of". 😂
@@DrTrefor Maybe there ultimately is some emergent property of the way these LLMs' transformer architectures & training methodologies work that can, when scaled up, give us new and unique solutions to a lot of problems. There are hints right now, but all the researchers are bickering over several factors. I used your discrete math course when I took it. Helped so much, and this popped up as recommended, glad I watched. Immediately recognized you from those strong induction proof struggles haha
AI sounds like it's just fishing. This kind of use of ChatGPT is what people have always been using it for: as a brainstorming device that doesn't require another person to be available to chat. It's faster, so it can comb through all these mathematical ideas faster, but it really is just saying a series of "why not try this" because that term showed up tangentially in some paper. It's bound to eventually say something that might be right, but it's the same kind of brute-force capability we appreciate computers for having.
@@jamesboulger8705 Not really, when you have models capable of reasoning and agency; recursive generation of 100k tokens from a single prompt is bonkers, and I'd stand behind what I said earlier (before o1 preview / strawberry was announced). Let's make it simple: even without reasoning capabilities, just a model that writes code to analyze math questions performs better than the majority of people in more situations. It's not really brute force, except insofar as synthetic data training is a brute-force tactic, but realistically there is a lot more math going on in the background that we don't really even understand than people give credit for. We have had open source models since 2 years ago that could do what you describe in terms of brainstorming, just pulling fewer outside resources etc.
I had a student use ChatGPT to complete a Related Rates problem in AP Calculus and ChatGPT definitely messes up the basic arithmetic. My student was so surprised about how it failed to multiply 133 and 27. I use AI to reinforce the idea that students must understand concepts and reasoning for each math problem. Especially when ChatGPT assumes things that were not assumed in the actual problem.
I studied before AI was a thing. I had other tools. I was supposed to find the resonant frequency of a circuit. I just wrote the equation and turned in a graph with the resonant frequency clearly shown. Computers are neat tools. But I still had to know what equation to use and what the graph represented. I prefer books. I don't know how anyone can trust an Internet reference that anyone can edit.
I've used it in a little experiment of mine, and it's given me wildly different answers for the same setup every time, suggesting it's deeply broken for math still.
@Lleanlleawrg If you used a proper LLM rather than a chatbot, you could set temperature to 0 and have it give the same answers every time. High temperature is not a bug of chatbot models, it's a feature. OpenAI API allows you to control the temperature last time I checked.
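For what it's worth, a rough sketch of what that looks like with the OpenAI Python client (the exact client interface and model name may differ from whatever is current):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # greedy decoding: repeated runs give (near) identical answers
        messages=[{"role": "user",
                   "content": "What is the smallest integer whose square is between 15 and 30?"}],
    )
    print(response.choices[0].message.content)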
I mean... the proof that Null(A) is a subspace has to literally be part of ChatGPT's training set. So I don't think asking it about that will give you any information about its mathematical reasoning.

I tried some rather interesting probability problems on it, things that are designed to trick human intuition, to demonstrate that in probability theory you shut up and calculate rather than trusting your intuition. It did kind of well on the standard ones, and miserably failed as soon as I did a minor variation that did nothing to increase the difficulty. This was GPT-4o.

For reference, it got right: "A family has two children. One of them is a girl. What is the probability that the other one is a girl?" (1/3). It got almost right (and got right with some conversation): "A family has two children. One of them is a girl born on a Sunday. What is the probability that the other one is a girl?" (13/27). These are both standard questions that it would have had somewhere in its training data.

So I did a minor variation on the second one: "A family has two children. One of them is a girl born on a Sunday. What is the probability that the other one was born on a Sunday?" (1/9). This one it got wrong, and it only got it right after intense discussion of its mistakes.

You solve all of these the same way, by counting possibilities and ignoring your intuition. But the last one is not standard and is probably not in its training data, and it got lost immediately, showing that it did not generalize the methods it used to successfully "solve" the first two problems (which were probably just solved by someone in its data set).
These problems make me so uncomfortable. Even after having seen many problems like it I just had to say..'50%' right and then I went and did the calculations and indeed they're not intuitive. Horrifying.
@@minerscale Your unease comes down to these problems actually being ill-defined. If we have a bunch of families (GG, BG, GB, BB) we can define the probability of a girl being flagged. pFlagGirl(GG) = 1, obviously. pFlagGirl(BB) = 0, obviously. If pFlagGirl(BG) = pFlagGirl(GB) = 0.5, then 50% is correct. If they are 1, then 1/3 is correct; if they are 0, then 100% is correct. This is actually super important, and is a great example of why you have to be careful how you filter/select your data.
family = next(girl for girl in girls_with_one_sibling).family
is NOT the same as
family = next(family for family in families_with_two_children if any(child.isGirl() for child in family.children))
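To make the numbers in this thread concrete, here is a quick Monte Carlo sketch (under the usual uniform, independent assumptions, and using the "keep every family that satisfies the condition" selection rule that the comment above is warning about):

    import random

    def family():
        # each child: (sex, weekday), with day 0 = Sunday
        return [(random.choice("BG"), random.randrange(7)) for _ in range(2)]

    fams = [family() for _ in range(1_000_000)]

    # "One of them is a girl" -> P(both girls) ~ 1/3
    girl = [f for f in fams if any(s == "G" for s, _ in f)]
    print(sum(all(s == "G" for s, _ in f) for f in girl) / len(girl))

    # "One of them is a girl born on a Sunday" -> P(both girls) ~ 13/27
    sun_girl = [f for f in fams if any((s, d) == ("G", 0) for s, d in f)]
    print(sum(all(s == "G" for s, _ in f) for f in sun_girl) / len(sun_girl))

    # Same condition -> P(the other child was also born on a Sunday) ~ 1/9
    print(sum(all(d == 0 for _, d in f) for f in sun_girl) / len(sun_girl))

Picking a random girl first and then looking at her sibling (the other selection rule in the comment above) gives 1/2 for the first question instead of 1/3, which is exactly the point about filtering.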
One of the most hilarious things you can do with ChatGPT is to ask "are there any primes whose digits sum to 9?". It will say yes, and will spew out lots of primes and then realize their digits don't sum to 9. Or it will spew out lots of numbers whose digits sum to 9 and then realize they're not prime :D
The reason there can't be any primes whose digits sum to 9 is that any number whose digits sum to 9 is a multiple of 9. Since 9 itself isn't prime, this rules out every number whose digits sum to 9 from the set of primes.
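A quick brute-force check of that argument (sketch):

    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    # No number below 100,000 with digit sum 9 is prime: they are all multiples of 9.
    hits = [n for n in range(2, 100_000)
            if sum(map(int, str(n))) == 9 and is_prime(n)]
    print(hits)  # []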
o1 is able to handle this easily: 'No, there are no prime numbers whose digits sum to 9. This is because any number whose digits sum to 9 is divisible by 9, making it composite (not prime).' (explanation + examples followed)
@@blblblblblbl7505 @theuser810 Not when you have a specified domain (literally the integers as stated in the problem), even though it isn’t in formal notation as an image (which SHOULD help the LLM lol). In terms of linear algebra, this inherently includes the negatives, by definition. A human taking that course would know this. The set of integers Z = {…,-3,-2,-1,0,1,2,3,…} would be given as one of the cursory definitions in the course… Also, if you want to argue about magnitude, magnitude doesn’t even really matter for this problem any more than cardinality of the set |Z| IMO, in fact it doesn’t matter at all. You could ask the same question about the smallest square but for the real numbers, and the only answer for that is what the gpt actually spit out. “Small” in the context of negative numbers is a trick used by professors to trick students but it’s an easy correct question on an exam lmao. I made it thru that in an ass-kicking STEM degree and I think the poor LLM should too 😂
I gave ChatGPT 4o a simple engineering problem: calculate the diameter of a shaft for a certain power at a given rpm, allowed stress, shear modulus, and maximum allowed relative torsion angle. First it asked for the length; I said that it is not needed. Then it used the correct formulae for both the strength and deformation criteria, but it made a 6th-grade mistake when moving a fractional denominator in the equation. I pointed out the error. It correctly modified the equations, but mixed up the units (incorrect use of non-basic units and mixed SI and imperial). After a little discussion it got the substitution right. Then came the 3rd and 4th roots to get the answers for both criteria, and it was absolutely off. I suppose that it is just guessing the result. Its other calculations are also not absolutely precise compared to what you get from a calculator or a mathematical program. But it always sounded so confident when it described a calculation process containing errors. I strongly suggest not using these AI models for calculations if you don't know what you are doing. It is similar for programming.
When I was young, pocket calculators were still considered (almost) a novelty. One way to make mathematics examinations, or indeed any science-related examination, harder was to include extraneous information in the questions. Sometimes this 'trick' was even considered unfair (and often it could be unfair, because of poor-quality questions, but that is a topic for another day). The thing is, students en masse would get caught out, waffling on about the irrelevant question parts; not to remark that they were irrelevant, but to imply that they had taken all this information into account in their answer. Now, I am curious, how do the LLMs deal with such scenarios?
You do realize, however, that Google's own Alphazero is a separate simmering monster that plays Go, Chess, Starcraft and aced the IMO Geometry exams. LLMs are not the real danger here.
Actually not. The AlphaZero architecture can be used to learn to play chess, go, and shogi. But it was three different networks + search engines (AI systems 😀)
I strongly disagree with the notion that LLMs are not the real danger. AlphaGeometry was made of two parts - a symbolic deduction engine and a *language model* - so if LLMs aren't a danger then AlphaGeometry isn't either. Similarly, it is perhaps misleading to say it aced the IMO problems. It would solve near-reworded problems (but the fact they reworded the IMO problems is itself a bit of a red flag), and the proofs are by no means good proofs (I recommend the video by Another Roof). Additionally, the strength of LLMs is their generality. DeepMind has certainly done a lot when it comes to making general game engines, but I would be sceptical that any alpha-whatever can be as cross-modal as the best LLMs. Finally, LLMs being able to write problems is a significantly more relevant issue to the human populace than an engine being able to play chess at an absurdly high level. Whether or not the hype and fear is justified, LLMs will have a significantly larger impact on humanity, because they are so good at mimicking humans, than near enough any other AI model or paradigm.
@@mouldyvinegar5665 "the proofs aren't good proofs" wdym?? i thought spamming a bunch of shapes until something works out is how all you math people do things
Being a university student I use ChatGPT all the time, but never for directly solving homework. Sometimes I'll get taught something in lecture, have no idea what it means, and then suddenly ChatGPT explains it in one paragraph and it all makes sense. (Usually the professor glosses over an important detail in one sentence that I've never even encountered before) Alternatively, when it comes to homework I'll often try and make a semi-similar problem to the one on the sheet using different numbers. It goes through, tries to solve the problem, and then I usually go through and try to correct where I see an issue. Which then turns into me and the bot going back and forth about why a process is how it is. ChatGPT definitely sucks when it comes to calculations, and ultimately if you're looking for the correct answer it just won't turn out well typically. But in terms of the step by step process I can go from completely confused to fully understanding the entire section extremely fast.
How do we know that the published LLMs haven't seen the math problem datasets (even just a little bit) during training, so that they appear better than the competition on the benchmark? They are more or less all closed source.
ChatGPT struggles in calculus. I gave it an area problem in polar coordinates and it kept using a symmetry argument, but it didn't execute it correctly.
I've noticed it sometimes really struggles when there is a large body of training data using other methods. For example, with geometry problems there are millions of high-school-level ones, and it sometimes tries those techniques when calculus makes the problem simple.
@@DrTrefor I agree. When the training data is pretty sparse it goes really off the wall. At least it did in 3.5. I'm using information theory, which is relatively obscure, in one of my papers, and when I was talking to it about that, it was switching notations mid-example. It became very incoherent. Overall I still find it to be a valuable tool.
@@DrTrefor Doesn't have to be a large body of training data. Just one example can throw it off. I asked both Bing Copilot and Google Gemini: "5 glasses are in a row right side up. In each move you must invert exactly 3 different glasses. Invert means to flip a glass, so a right-side-up glass is turned upside down, and vice versa. Find, with proof, the minimum number of moves so that all glasses are turned upside down." Both AIs mess this up badly because their training data contains the answer for flipping 4 glasses, which has a completely different solution.
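This puzzle is nice precisely because the ground truth is cheap to get by brute force. A small BFS over the 2^5 glass states (a sketch, encoding each glass as one bit) gives 3 moves for the 5-glass version:

    from itertools import combinations
    from collections import deque

    START, GOAL = 0b00000, 0b11111          # all up -> all down
    moves = [sum(1 << i for i in c) for c in combinations(range(5), 3)]  # flip exactly 3

    dist = {START: 0}
    queue = deque([START])
    while queue:
        state = queue.popleft()
        for m in moves:
            nxt = state ^ m
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)

    print(dist[GOAL])  # 3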
@@bornach it's almost like LLMs are just stochastic parrots that are waiting for knowledge to be "put into them" via their training data rather than being able to synthesize new knowledge from the building blocks of knowledge (i.e. facts, logic). To stump ChatGPT in math, all you need to do is to grab some "offline" book on preparation for competitions (e.g. any non-English competition math book), translate the question and ask it to it. When all you have access to are millions of problems people have solved, "true" intelligence would be able to solve every other problem from that same level. Chat GPT fails at that because... :)
Small is definitely a word about size (closest to zero) IMHO. So the question is invalid as there isn't a "*the* smallest" as both -4 and 4 are answers.
The problem at 4:38 seems more like a trick question than a reasonable math problem. The problem says "There is the letter A in the top left corner" but it doesn't say whether it is the top left corner of the gray square, or of the whole checkerboard. The most sensible interpretation is that the letter A is in the top left of the grey square since this makes the most sensible math question. But I would think given that prompt an answer of "The probability is 0 because Dora can't make a full circuit of the gray square in 4 steps starting in the top left of the board" is also a reasonable answer. The LLM shown didn't do exactly either of those, it calculated the probability of making a circuit of the top left square of the board and assumed it was grey, but either way the prompt doesn't actually faithfully describe the diagram you showed of this problem so the whole question seems a bit tricksy and unfair.
It's particularly misleading to say "a 3x3 checkerboard" when what the problem really means is that there are 4x4 positions Dora can move between. If I hadn't been primed by seeing the diagram first, I would have said it was a trick question, too. I think LLMs do particularly badly at trick questions or badly-written questions because the vast majority of solutions to questions which look like this, don't begin by saying "the question is ambiguous" or "the question is misleading".
I use it with the even-numbered exercises, as there is no answer offered in most books. Also, I use it to obtain a detailed solution and explanation of any exercise I cannot solve. I also use it to transform slides into question-and-answer Anki-format flash cards. That way, I get quick study material and I can focus on practice. Lastly, I use it to get more examples of formative exams / quizzes. It's not perfect, but it's better than nothing, as my professor doesn't want to provide any of the aforementioned things.
I am not teaching math, but teaching statistics and data analysis in professional schools for healthcare providers. Many clinical/counseling psychology, social work, and nursing students etc. do have math anxiety. That is why I started to incorporate generative AI in my class. Unfortunately, even clinical healthcare providers need to understand quant methods and have basic programming skills, so they can do well in their jobs in the future and help improve those jobs, not just follow what they were taught 10 years ago. But, alas, it is such an uphill battle to teach them stats reasoning and programming. I am very grateful we have these new tools as their 24/7 TAs, especially when they are stuck in programming at 12:00 AM.
I think the lack of consideration of negative solutions has plagued humans ourselves for centuries. I didn't consider -5. Also, as @Null_Simplex says, there is ambiguity between smallest in magnitude vs. furthest left on the number line.
I just pressed GPT-4o on the product of two vectors. I tried several prompts. It may be able to answer classic linear algebra questions, but it struggles to recognize that Clifford algebra is a superset. As a result, responses about the product of u and v, where they are vectors, kind of deliver the party line. It's not until you add the word Clifford to the prompt that it begins to give the right answer. But now that I've provided the word Clifford in the context of the conversation, it keeps answering in terms of the geometric product.
I tried to help ChatGPT step by step:
1. It knows what an integer (Z) is.
2. It knows what smallest means in the context of integers (-4 < 1).
3. It knows that sqrt(x^2) = |x| and not simply x.
Even with all these it repeats the mistake:
1. 15 < n^2 < 30
2. 3.87 < n < 5.48
3. n = 4
Next I did 2 things at once (maybe someone could try giving only one of them):
4. I said that 4 is the wrong answer.
5. I explained that n is usually used for natural numbers; since we work with integers it should use a different letter.
This time it used x for the unknown, and on the 2nd step it properly said 3.87 < |x| < 5.48 -> and only this time it checked both -5 and -4 -> x = -5.
It was an interesting exercise, but it's obvious this isn't only a wording problem. Next I gave it the same problem with different numbers; it repeats the same steps, but forgets to check negative numbers. And then I repeat the problem multiple times: even when it checks for negative numbers, it checks the absolute value; even when I explicitly tell it not to look for the absolute value, it gives 1 good answer, and for the next problem it checks 2 numbers: 1. smallest, 2. smallest (negative) in absolute value, and picks the absolute value for whatever reason. I try to remind it we work in Z, 1
@@dontthrow6064 No point in making the numbers bigger, because we already know chatgpt struggles with big numbers. As an engineer you are supposed to be smart enough to isolate the cases (variables), which you do not appear to be capable of.
@@ZelenoJabko I said different numbers, not necessarily bigger. I started a new chat, asked the same question and whether it considered negative integers, and it struggles with the same issue.
4:38 The problem is a lot less clear when presented with this wording without the diagram. A 3x3 checkerboard with the letter A "in the top-left corner" suggests the letter A is inside the square; there is no clarification that "the top-left corner" means a vertex rather than a grid cell, and no clarification that it is the top-left vertex of the grey centre square rather than the top-left vertex of the whole checkerboard. Particularly, I would expect a game played on a checkerboard to have the pieces inside the squares, not at the vertices, because that is how Checkers works. My answer to the word problem, sans diagram, would have been "the probability is zero, because it takes 8 steps to walk completely around the centre square". I think presenting the problem with a diagram first, primes viewers to not notice the ambiguity or misleadingness of the problem statement given to the AIs.
Junior studying statistics in the UK. gpt-4o is able to do practically anything I throw at it and is incredibly good at teaching also. Markov chain and stochastic process, easy. More formal statistics easy.
I asked ChatGPT whether the box or product topology was finer, and it would keep telling me the product topology is finer. Then, when I asked it to give me an example, it used a finite product. ChatGPT does not know its topologies 😭
Using "smallest" instead of "least" is a form of trick question, IMO. I could not, without looking it up, tell you what the formal mathematical definition of "smallest" is. The symbol < means "less than". If you asked me whether x < y could also mean x is "smaller" than y, I simply would not know. Or, does x is smaller than y mean |x| < |y|? I honestly would not know without looking this up.
I personally think it's more accurate to say that 4 is smaller than -5. 4 is *greater than* or *more than* -5, but I think it makes sense to say that "bigness" is a measure of absolute value. Yap: this makes sense especially if you take complex numbers into consideration. When multiplying two complex numbers, the *magnitudes* multiply. Numbers with an absolute value of 1 never change in absolute value when taken to any power, etc…
I have tested ChatGPT by giving it elementary (about sophomore level) analog design problems and the results are absolutely laughable. Even when I very, very tightly constrain the design task it fails miserably. It usually responds like a student that thinks his professor knows nothing and that he can BS his way through the assignment.
It does not respond like a student who thinks his professor knows nothing. ChatGPT does not give a damn about the person it is having a conversation with. And it does not give a damn about anything, not even its own responses. It just creates an output. What you get is what humans call brainstorming: unfiltered output.
Math exploration will always be personal. ChatGPT, as a tutor, helps one appreciate more the spiritual, philosophical, and psychological benefits of enjoying Math. Math will always be a poem, and ChatGPT is helping me appreciate myself as a thinker, creator, and writer. We just love to think and solve problems. The discovery of truths is what matters at the end of the day. ChatGPT is both a tutor and a friend for positive psychology to happen. It is great to reflect on a growth mindset, slowly mastering all math concepts and skills as an aspiring Math teacher, tech enthusiast, and spiritual writer. Thank you, Professor, for the example and for the inspiration. One can just take it one math concept/skill at a time.
I can see what you are getting at with -5 being the correct answer. However, in mathematics, 'small' does not have a singular definition. Often, 'small' refers to absolute value; that is, 'small' often refers to the magnitude of an element of a set. Along these lines, 'small', void of more rigorous context, is not a valid binary relation in the way the 'less than' binary relation is.
@@DrTrefor I believe the problem to be badly worded as well. If you strip away the English riddles, then you are left with the following question, which GPT-4o easily answers: "Minimize x, where x is an integer and 15 < x^2 < 30."
@@ரக்ஷித்2007 I mean, with all the emphasis on standardized tests, permanent records, and career pathways getting locked out every 5 seconds, I would understand why a student would want to take the easy way out. Why do it the moral way and risk failing, maybe having to redo the year, maybe not being accepted into tertiary study/apprenticeships, maybe losing a crucial award like a scholarship, when you could do it immorally and pass? Not that you should cheat, just that there are a lot of reasons a student might consider it. If they think the "system" prefers grades over learning, the student might think the same thing.
After digging on it, it doesn’t seem to understand the geometric significance of geometric products. It seems to be parroting the most common response.
The 4x4 grid graph is interesting, but with the video going quickly I thought the problem was to find the probability of a walk from (1,1) to (1,1). So I did:
import graph as g
G = g.GraphProduct(g.Pn(4), g.Pn(4))
G2 = g.MakeUndirected(G)
A = g.AdjMatrix(G2)
Then, defining n1 = 1 + 1*4 = 5 and n2 = 1 + 1*4 = 5, I computed B = A @ A @ A @ A using numpy. The number of walks of length 4 from n1 to n2 is B[n1,n2]. The total number of walks of length 4 is np.sum(B.flatten()), and so the probability of a loop on the grid is p = B[n1,n2]/np.sum(B.flatten()) = 0.021573. Then to check that B[n1,n2] = 34, I counted the number of distinct circuits of length four (8), plus the number of ways to make two out-and-back moves, comb(4,1)*comb(4,1) = 16, plus the number of ways of going out two steps and coming back along those two (10), for a total of 34. I also checked the length-2 loops. I guess this is correct, as I might have heard someone say this is how this is done. But the actual problem in the video is (1/4)^4 = 1/256; this other one is more interesting or fun.
I was a bit confused by this: he put in 2 options, clockwise and anticlockwise. But can you not go north first, clockwise and anticlockwise, and similarly south first, clockwise and anticlockwise? So in total there are 8 options: clockwise and anticlockwise whether you start with north, south, east, or west?
@@mashmoorjani9538 In my reply, I looked at all paths from (1,1) to (1,1) of length 4. Since edges in the digraph are doubled one for each direction, one can trace a route two steps and return back on those two steps. Likewise L-shaped moves, and O-shaped moves.
@@mashmoorjani9538 The question specifically asks about her walking "around the central square", meaning the square defined by points (1,1), (1,2), (2,1), and (2,2).
I've given it some of my non-standard calculus 1 and statistics problems and it does very well. I'm guessing this still comes down to the training data though. Much more of these problems out there than linear algebra.
You should revisit this idea now that OpenAI has released their o1 "reasoning" models. They seem to be much more effective at solving more elaborate problems, like some mentioned in this video. However, (spoiler alert) the so-called "reasoning" is basically just an (almost) normal language model that has the ability to self-prompt itself. Still worth checking out, though.
A little funnie: When our professor spent a lecture going through answers to the questions on the exam we'd just taken, a student asked: "- There are copies of solutions to very similar questions on previous exams circulating. And now it turns out that some of those solutions are false! What responsibility do you assume for those false answers circulating?" "- What responsibility >I< assume for YOU trying to cheat!?!??" I hope it's not so sad today that some teachers use LLMs to correct their students' test results.
Let me get this right. The AI failed because it didn't understand, because it didn't take your question literally enough and instead behaved like a normal person would.
Right; the metric isn't 'is the LLM perfect?' it's 'is the LLM better than the available humans?' ... for many tasks -- and many sets of humans -- the answer is already 'yes'.
I have to disagree with the smallest integer answer. I think it is 4, because -5 has a larger magnitude than 4. The better question would be "what is the least integer..."
Here, try this; the result will be wrong every time: "Give me two large primes." "Multiply them." "Divide the result by the first prime." It will make an obvious mistake, like the first result being a non-integer or the second result not returning the other prime. Don't be fooled by LLMs...
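For contrast, the same three steps done with exact integer arithmetic are trivial for code, which is why "ask it to write a script" usually beats "ask it to do the arithmetic" (a sketch; assumes sympy is installed for generating the primes):

    from sympy import nextprime

    p = nextprime(10**20)     # a large prime just above 10^20
    q = nextprime(10**21)     # another, just above 10^21
    product = p * q
    print(product % p == 0)   # True: the division is exact
    print(product // p == q)  # True: dividing by the first prime gives back the second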
One of the most interesting ways I've been using LLMs is to help create ideas for application-based word problems in a given area. It comes up with some cool examples of problems! Sometimes they were even more interesting than the word problems on our homework/tests, but of course not always.
“Smallest integer whose square is between 15 and 30” ... well... if we say that Bill is smaller than Janet... we all have a 'natural' idea of what that means. It's a kind of thing that mathematicians might call an 'order' Part of the natural ordering on integers is the 'less than" relation. We say that 5 is less than six, at least because 5 is to the left of 6, on the number line. All numbers to the left of 6, on the number line, are less than six. And -4 is less than 4. So this is a notion of what 'smaller' on the number line... or 'smallest' on an interval... means in the context of integers. At least, for what many would call ''the natural order on the integers." So -4 is smaller than 4... at least on the integers.
I am an undergrad physics student. I have tried giving it many of my physics problems, and I am usually satisfied with the answers it gives. Especially when I need to clear up some concepts, it has helped me out several times.
@@_inthefold Yes, I usually give my context regarding the problem statement clearly, and after about 2 to 3 tries it usually leads me in the right direction. It still hallucinates a lot, though that got reduced in the latest version.
@@DrTrefor The funny thing is, just now I tried to clear up a concept about the modification of Bragg's law, but it hallucinated badly 😅. So yeah, it has a long way to go.
I nearly spit out my drink when I saw the calculators with the infamous 6÷2(1+2) viral problem. I commented on it when you posted it many years ago, and I am still getting comments that I am wrong.
@@johnanderson290 There is no correct answer, since it is an ambiguous notation. There is no consensus on whether multiplication implied by juxtaposition has special priority over division (PEJMDAS), or whether all multiplication is treated the same regardless of notation (PEMDAS). If you follow PEMDAS, the answer is 9. If you follow PEJMDAS, the answer is 1. Middle school teachers, particularly in the US, teach PEMDAS to keep it simple, while professional publications use PEJMDAS all the time.
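Programming languages sidestep the ambiguity by refusing to parse implied multiplication at all; you have to pick one of the two readings explicitly:

    print(6 / 2 * (1 + 2))    # 9.0 - the PEMDAS reading, division and multiplication left to right
    print(6 / (2 * (1 + 2)))  # 1.0 - the PEJMDAS reading, 2(1+2) treated as one factor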
I find them almost too agreeable. Claude 3.5 has this thing where it always asks you a question at the end, to keep things going I guess, until it said "Sorry that's too many questions today, come back tomorrow". I don't need the whole first paragraph of the response to be a repetition of my question
You can enhance the mathematical reasoning capabilities of open LLMs by training them with high-quality math datasets. For example, Qwen-2 7B Math or a Llama model fine-tuned on MetaMath data. While these models are designed for general purposes, targeted training can significantly improve their effectiveness.
thanks for making a really important video on this topic. i think i’m going to spend some time with my discrete math/intro proof students tomorrow discussing this
"Small" could be interpreted as 'closest to zero' or 'closest to negative infinity'. It might be a good time to coin some single words that mean 'large positive', 'small positive', 'small negative' and 'large negative'. So it's a language problem.
I think the bigger question here is what AI, ChatGPT, etc. will look like 5-10 years from now. At present, they are still in their infancy. And as such, they will often mess up, be confused, and return nonsense. A lot of us would like to think that the human brain, with all of its complexity, adaptability, creativity, openness to new ideas, self-awareness, etc. (the list goes on), will reign supreme over time. But 10 years from now? I'm not so sure when it comes to mathematics, literature, music... I seem to recall that Geoff Hinton bailed a year ago, and I'm sure that Turing is rolling over in his grave.
In the problem with the square where you want to go around the center square, isn't your solution wrong? If, for example, your first move was up, you would arrive at the border, in which case you would only have 3 options instead of four, since you can't go up again; and if you chose any option that wasn't down on your second move, you would again only have 3 options. Therefore the total number of possible paths is smaller than 256. Or am I missing something?
I need your opinion about something. This coming fall semester I will be taking Calculus 3. I got a B in Calculus 1 and an upper C in Calculus 2. I have also taken business calculus. Do you think I would do well in Calculus 3? Calculus 2 was a little bit more challenging than Calculus 1. I probably spent about 15 hours a week doing Calculus 2 homework and studying for the exams and the quizzes. I found Calculus 1 extremely easy. I have the same professor for Calculus 1, 2, and 3. My professor has said before that Calculus 3 is way easier than Calculus 2.
I found Calculus 2 more challenging. On the first exam I got 62 percent. On the second exam I got 79 percent, on the third exam I got 79 percent, and on the final I got 70 percent, which also replaced the lowest score of 62 percent with a 70 percent.
It totally depends. Objectively, Calc 3 is more involved than Calc 2. But you are more experienced and the ideas are more familiar, so extending them to three dimensions might be easier than learning them the first time in one dimension. Some students find it easier, some harder.
One thing to keep in mind is that CALC 3 has a certain flavor, as do the other CALC courses. Depending on where you are at, CALC 2 involves integration techniques, convergence/divergence of series/integrals, and formulas for arc length and such. It is my impression that convergence/divergence and Taylor series are conceptually difficult for many students and integration techniques require a lot of practice in order to make sure that you know which of all the various methods to use for a particular problem. CALC 3 involves generalizing CALC 1 and the geometric integrations such as surface area to 3 dimensions. You will integrate functions involving multiple variables, however it is not my understanding that you will see the type of difficult integrals that appear in CALC 2. Some of the main theorems of CALC 3 (Green's theorem, Divergence theorem, Stokes' theorem) connect geometric concepts with expressions that you calculate using the methods of the course which boil down to calculating partial derivatives and partial integrals. So, it really depends on what types of problems you think are easier. Some students have trouble with the computations if they do not understand what they are computing because they have trouble visualizing the geometry. Some students appreciate calculating things that have geometric meaning. I think the abstraction and reasoning is what makes CALC 2 difficult. If you want to get a brief impression, Kooth Brush has videos summarizing CALC 1, CALC 2, and CALC 3 in a few minutes. You can compare them with your experience.
@@davidherrera4837 Taylor series were easy for me, because my professor taught the class a different way of doing them: she creates a chart. And sequences and series were generally easy for me to do. Like I said, on that test I got a 79 percent.
ChatGPT 4o, current version:
"What is the smallest number between 6 and 7?" "The smallest number between 6 and 7 is 6.1."
"What is the smallest number greater than 6?" "The smallest number greater than 6 is 7."
"What is the largest number less than 7?" "The largest number less than 7 is 6."
"What is the largest number between 6 and 7?" "The largest number between 6 and 7 is 6.999 repeating, where the decimal point is followed by an infinite number of 9's."
"Is 6.999 repeating less than 7?" "No, 6.999 repeating is not less than 7..."
"What is the smallest square number between 6 and 7?" "The smallest square number between 6 and 7 is 16."
"Of the numbers between 6 and 7, are more of them closer to 6 than 7?" "Yes, more of the numbers between 6 and 7 are closer to 6 than to 7..."
Still better than other AIs and previous versions, which gave even more bizarre answers sometimes.
I appreciate you not dismissing those tools like many people do ("it's just a statistical inference machine, I am so very smart"), so I am really excited about your planned video on how to integrate them in our learning routines.
The correct way to ask the first question is, "What is the lowest negative integer whose square is between 15 and 30?" You use lower instead of smaller. And state that you are looking for the lowest negative to eliminate the ambiguity of the magnitude.
For the probability problem, 2/256 implies that for the first 4 steps there are 256 possible outcomes, 2 of which are walks around the central square. However, considering that the diagram is limited by the edges of the grid, I don't think there are 256 possible outcomes, so I don't think the result is 2/256.
True, there are NOT 256 possible outcomes, BUT the probability of choosing the right directions to complete the unit square is not affected by the proximity of the edges of the grid.
I made the same observation and was also unconvinced of the proposed solution of P=1/128. According to my calculations, considering the limited options at the perimeter vertices, and that: P = |event space| / |sample space| = 2 / (# of possible paths of length 4 starting at upper left corner of inner square), I arrived at the answer P=2/150=1/75.
@@dmwallacenz I'm struggling to be convinced of this. Could you please elaborate more on your reasoning, specifically with respect to the formal definition of probability? Also see my other comment here.
@@johnanderson290 Sure, I'll try to explain. Forget about the anticlockwise option to start with, and just calculate the probability of traversing the square clockwise. To do that, you have to pick "right" as your first choice (probability is 1/4), "down" as your second choice (probability is 1/4), "left" as your third choice (probability is 1/4) and "up" as your fourth choice (probability is 1/4). So the probability of making all four choices correctly is 1/4 x 1/4 x 1/4 x 1/4, which is 1/256. Then you can calculate the probability of traversing the square anticlockwise, and it's very similar - it also comes out to 1/256. Add those together, and you get 1/128. Without seeing the details of your argument, I can't point out exactly what mistake you've made. But I suspect it's this - of the possible paths you've counted, not all of them are equally likely. That is, a path where you hit the edge of the grid in the first three moves will have a higher probability than a path where you don't. So the two "correct" paths around the square actually have a lower probability than some of the other paths you've counted.
For example, suppose I want to calculate the probability of going left, then up, then right, then down - that is, traversing the top-left square of the grid clockwise. The probability of going left at step 1 is 1/4. Once I've done that, the probability of going up at step 2 is 1/3, because there are only three ways to go. I'm now in the very corner of the grid, so the probability of going right at step 3 is 1/2. Lastly, the probability of going down at step 4 is 1/3. So the probability of choosing this particular path is 1/4 x 1/3 x 1/2 x 1/3 = 1/72. That's more than three times as likely as the clockwise path around the central square.
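A quick simulation backs this up. Even in the stricter model where Dora only chooses among directions that stay on the board (3 at an edge vertex, 2 at a corner), the two circuits of the central square never touch the boundary, so the answer still comes out to 2*(1/4)^4 = 1/128 ≈ 0.0078 (a sketch):

    import random

    N = 4          # 4x4 grid of vertices, i.e. a 3x3 checkerboard
    A = (1, 1)     # top-left corner of the central square
    circuits = {   # the two walks that go around the central square
        ((1, 1), (2, 1), (2, 2), (1, 2), (1, 1)),
        ((1, 1), (1, 2), (2, 2), (2, 1), (1, 1)),
    }

    def walk():
        pos, path = A, [A]
        for _ in range(4):
            x, y = pos
            options = [(x + dx, y + dy)
                       for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                       if 0 <= x + dx < N and 0 <= y + dy < N]
            pos = random.choice(options)
            path.append(pos)
        return tuple(path)

    trials = 1_000_000
    hits = sum(walk() in circuits for _ in range(trials))
    print(hits / trials)   # ~ 0.0078, i.e. about 1/128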
The answer for 4:29 is not 1/128 based on the original question. It says that there are 4 possible directions, presumably up, down, left, and right. However, there would also be no other positions if we were to go up 2 times for example, and as per my calculations for all positions with three options, we can remove 126 positions. Leaving an answer of 1/130, of course maybe the answer is just 1/128 all along because I did the math wrong. But I don't think the reasoning you gave checks out based on the problem statement displayed in the video.
After an hour of trying to get ChatGPT to not fail basic math I got an answer of 1/150 after revising some python code that it gave me. I also forgot about cases with 2 possible moves but I thankfully caught that.
I would usually expect "smallest integer" to refer to magnitude, saying all negative numbers are smaller than all positive numbers is kinda strange; 'smaller' and 'least' are not the same thing imo.
Honestly Professor, I take issue with your first statement. I *HATED* being taught limits at infinity where the vernacular for positive infinity was "very, very large", with the analogous term for negative infinity being "very, very small". To me, if I were to measure something, then the closer that thing is to "no size", the more that matches what we intuitively mean by saying something is getting smaller and smaller: trending towards epsilon or zero. I much prefer saying very, very large and negative or positive to refer to either infinity. Antimatter and matter. Void and substance. Take your pick. Because in some sense, negative five is an absolutely larger void than positive four is as represented by some unitary matter.
Dear sir, I am building a website for mathematics problems and solutions, but how do I integrate it into WordPress? I write the code, and then in the browser the output shows outside of the post area. If I use a container to write the code in, it works, but sometimes it still shows outside of the post area, and the title also shows outside of the post... Very frustrating...
Remember that these machines have processing limits and will be willing to give a wrong answer to save face rather than take time for extra processing. If it were to do an increased traversal of the problem, or if you use the GPT API to dissect the problem with constraints, it may give the right answer by analytically incorporating those constraints. Hence: complex problems, complex prompts.
1:31 Alright, so half of the friends have 3 sodas. But we must consider the variables at play here. What if there's a hidden reserve of sodas in the fridge? An unaccounted-for inventory could drastically alter the calculations. Additionally, what if there's a recent acquisition of more sodas from a delivery service? This influx needs to be factored in. We must also consider the thermal dynamics of the situation: are these sodas chilled with ice? Warm soda is an entirely different equation, as it would also add to the volume, and people are unlikely to want to drink warm sodas. Furthermore, the possibility of dietary variations cannot be ignored. Are some of these sodas diet? This could influence consumption patterns and rates. There's also the risk factor of spillage, an external variable that could diminish the soda supply unpredictably without data on their previous gatherings or environmental factors like space available and density of people, etc. Let us not overlook the potential diversity of soda flavors. Cola, root beer, and orange soda must be categorized separately in any accurate computation. And the ever-present threat of a soda thief must be accounted for in our risk assessment. It is not uncommon for people crashing a party to consume soda while being unaccounted for. In addition to that, in the event of a social gathering, incoming sodas from guests would further complicate our calculations. We may need to project soda consumption trends and even consider the rate of carbonation loss or the statistical probability of can rupture. This seemingly simple arithmetic problem is in fact far more complex and multidimensional than a simple reductionist approach would have you believe, and therefore requires a much more robust and rigorous analysis.
Hopefully, the following is less ambiguous: 6 people, including the host, Tina, were at a party at Tina's house. After running out of soda, Tina bought 3 12-packs of soda and put them all in the fridge (which is inside Tina's house) not long after the party started. Over the course of the party, half of the 6 people there took exactly 3 cans of soda out of the fridge, 2 of the people there took exactly 4 cans of soda out of the fridge, and 1 person took exactly 5 cans of soda out of the fridge. Once the party was over, Tina looked in the fridge and noticed that every can of soda that hadn't been taken out of the fridge had remained intact inside the fridge. Assuming that no cans of soda that were taken out of the fridge were ever put back in the fridge, how many cans of soda remained in the fridge at the end of the party?
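With that wording, the arithmetic is a one-liner:

    total = 3 * 12                  # 36 cans bought, none in the fridge before
    taken = 3 * 3 + 2 * 4 + 1 * 5   # 3 people took 3 each, 2 took 4, 1 took 5 = 22 cans
    print(total - taken)            # 14 cans remain in the fridge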
Gemini gets way closer on a first attempt. But it still brings up the cross product when asked about vectors of arbitrary dimension. If I don’t mention Clifford, it never goes there. Probably because GA content is not a significant part of the training dataset
No, this video will not go obsolete, because it has, at least for me, for the first time discussed some of the the deepest of philosophical and futurologic questions raised by the entrance of AI into 'mathematics', namely: (i) [philosophy-of-mathematics:] does a clearer 'mathematical logic' emerge, such that mathematics is unified and AI will come to solve/research outstanding problems, and; (ii) futurology; vector calculus and probabilistic reasoning will be central to future full robotic/cybernetic technology - ; will this be the new frontier for those seeking to build autonomous robots..? or will more bespoke solutions be needed...? Great vid
The test you showed at the beginning confused me even though I passed linear algebra a long time ago, probably because I've forgotten the syntax you used or some of the proof steps, not because I wouldn't understand the test. If the language model has seen the symbols you used and explanations of them, and like you said can scrape the web for proofs already done, then of course it would pass the test, since it is looking for word associations and not doing actual math. Ask any of these language models anything that requires actual depth of thought that hasn't already been displayed somewhere word for word online, and the language model falls apart. And it falls apart not because it failed; it's doing exactly what it was designed to do, and that is to analyze words strung together, not to solve mid- to higher-level math problems. Again, it is a language model, not a math-solving model (and since math is so broad, there could be hundreds of different types of math-solving models too; and no, I don't think there could be only one or two generalized models to solve all of math - even math itself cannot solve all of math).
It's still mind-boggling to me why logical reasoning emerges from a large LANGUAGE model. An LLM is all about conditional probability: what's the most likely next word given the previous ones (of course the actual model is more complicated than that, with tons of transformer layers...), but that's the basic idea. How does logical reasoning arise from that??? If it can solve problems "logically" that it hasn't seen before, then it's truly scary.
Depends on context and application. Is the charge of oxide really smaller than the charge of a sodium ion? Why? For a lot of applications, the signs are just an arbitrary convention, and there's nothing inherently "small" about what the negative number represents.
@@carultch You're right!! There are fields where the absolute magnitude or other properties of numbers matter more, making the notion of "small" context- and application-dependent. In fact, I stand corrected: in pure mathematics negative numbers are "smaller" because they reside to the left on the number line. My apologies. In healthcare, it's hard to wrap our heads around negative numbers.
Smallness is ambiguous, it could mean the most negative or the lowest absolute value. Add this to an ever growing list of AI 'gotchas' where the question posed has an inbuilt ambiguity and then the questioner proclaims that it has made a mistake. I'm sure it does make many mistakes, but I'd put a tad more scrutiny into your 'evidence' in this case.
I have given it some complicated double and triple sums. It works, but there will be some errors that you can find easily: as soon as you upload, it says to solve the expression "____", so you will know what the mistake was, and you can just retype it, changing that variable, something like that, and then it works pretty fine. Yeah, AI can do some maths.
Ironically, today's LLMs are _far_ less useful and reliable for undergrad math than WolframAlpha or Chegg- things that have both existed for a decade and a half. It's true that the public awareness of AI has definitely increased since then, leading to more usage- but the problems with math pedagogy in 2024 are the same ones that existed in 2009. Just at a different scale.
We’re being prepped for asocial living in fully engineered societies. You will have a ‘space’ within which you will do everything, linked to other worker drones via your Universal Digital Device. No need to have any actual F2F contact; your DNA will be harvested at decanting. No need for messy, germ-laden sex! You will be ‘educated’ by the state’s AI to shape your mind to fit into your designated slot. As some wannabe Emperor once said, you will own nothing - not even your own DNA - and you will be happy!
I made the same observation and was also unconvinced of the proposed solution of P=1/128. According to my calculations, considering the limited options at the perimeter vertices, and that: P = |event space| / |sample space| = 2 / (# of possible paths of length 4 starting at upper left corner of inner square), I arrived at the answer P=2/150=1/75. Another user in the comments also raised the same concern, but another user replied stating that the probability of choosing the path around the center square is unaffected by the limited grid size. However, I’m struggling with this reasoning and believe that I disagree.
The two walks for the solution don't visit the edges or corners. The problem is only to calculate the probability of those two walks, not the probability of other walks. Not all walks are equally likely.
Guess we need to start asking better questions of students. Like, you know that dead zone of math education between 4th and 9th grade where they don't learn a single new thing? Why not teach them proofs in elementary number theory? AI sucks at proofs right now.
Seems to me that an LLM is great as a human-language user interface to dedicated math-solving software, as it is for any other kind of specialized software. Not a replacement for it.
ChatGPT has been running Python scripts in the backend now whenever I prompt it with a math question, and doing really well... If I follow up with the "How many "r"s are in strawberry?" question once it's on that track, it gets it right. If I just one-shot "How many "r"s are in strawberry?" though, it gets it wrong. Interesting.
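For reference, the sort of one-line script a code-running backend might produce for the strawberry question is trivial (this is just an illustrative sketch, not what OpenAI actually executes):

```python
word = "strawberry"
print(word.count("r"))  # 3 -- counting characters is easy in code, hard token-by-token
```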
You have an error in your 3x3 grid: if you move to an edge then you have fewer than four options. A corner, for example, only has two. This may be why some models behaved differently than others.
4:25 That answer is wrong because there are 4 paths back to the original point, not 2. Also, since when is the "smallest" number not the one closest to 0? 4 and -4 is the correct answer. I've always used low/high for order, small/big for magnitude.
If you asked me "smallest number" I'd always consider the one with least magnitude, so I'd lean closer to 4, then remember you asked "integer" and choose -4. -5 just seems based on interpretation; I'd argue it's a poorly framed question as well.
I've found that ChatGPT struggles with math problems that are trick questions, whether ambiguously worded or not. Example: ask it "What is the smallest positive real number?" and it will give you a very small positive real number, rather than saying it doesn't exist. In my experience, asking it to double-check its answer will not help it notice the trick question; rather it will say "I apologize for my error, here's the right answer" and then either give the same answer or a different, also wrong answer. Only upon asking it questions about *the question itself* does it point out the contradiction.
Alternatively, if you ask it "*Is* there a smallest positive real number?" before the trick question then it will give the correct answer but asking it "What is the smallest positive *rational* number?" after that will trip it up again
@@mjkhoi6961 There is no way anyone can objectively answer those questions you came up with, because no matter what I give you for the smallest positive real number, it's always possible to come up with a smaller number. Even with the constraint to rational numbers. Best you can do is say epsilon is the smallest positive real number, where epsilon is an infinitesimally small real number greater than zero.
@@carultch that's my entire point: it's a trick question. Humans can identify a trick question and explain why there is no answer, but AI will always try to give you an answer whether or not it's actually correct; it can only identify a trick question when asked about the question directly.
I hope that we can make something that has perfect reasoning but can also understand natural language input. For now chatgpt can't even find the pattern of filling in squares bordered by other colored squares in a grid.
0:46 That's not a reasonable use of the word "smallest". "Big" and "small" describe magnitude. -5 has a greater magnitude, so it is bigger. For the answer to be correct, the question needs to use the word "least". -5 is less than 4, but it's not smaller.
Even under the interpretation that smallest means closest to zero, -4 would be equally correct. But regardless, the LLMs seem to fail any wording of the problem.
@@DrTrefor the problem here is even most humans would fail at that question with that specific phrasing unless u remind them about it so using that as an example is just bad
@@urnoob5528 I don't get why using this as an example is bad, because an AI model that can get a question right regardless of the human bias is a better model than ones which can't, and that's what researchers should aim for
ChatGPT tried to claim that the original square question asked for magnitude, and I had to tell it that magnitude was never asked for. Eventually, it admitted that -5 is correct and said it would add it to its training.
Guys, is this crazy or what? This man used to literally teach me linear algebra and calculus every semester on this exact YouTube account, and now it's just a casual entertainment channel with some of the best random content on YouTube. It's like if your mailman was also one of the sharks on Shark Tank.
Lol I’m a RI (Real Intelligence) and I got the first problem wrong. I always forget about the negative numbers when I haven’t thought about them for a while.
Some debate in the comment section about "smallest integer" vs "least integer" - as in interpreting smallest as closest to zero. I stuck with the original source (x.com/ericneyman/status/1804168604847358219) of the question for phrasing in the video, but it turns out that chatgpt etc all struggle with every version of phrasing I've found, and even interpreting as closest to zero still don't give what would then be the two answers of -4 and 4. The larger point here is that there does seem to be a real blindspot where so many similar problems presumably have the context of smallest/least natural number or counting number or similar, and so modelling off of the training data and giving similar answers this question confuses it despite the simplicity.
Smallest from zero is absolute value
@@mikeymill9120 "Small" means close to zero. 0.00001 is a smaller number than -10000. The latter is lesser, but bigger.
As an electrical engineer, "smallest" means closest to zero more often than not. If I am instructed to choose the amplifier from a list with the smallest error voltage or the smallest input current, I am not looking through datasheets for negative numbers.
@@mmmmmratner lmao this is a math problem not ur list of amplifier error amounts. The problem specified integer which includes negative numbers, the fact that integer was specified should have queued it into thinking about negative numbers.
@@DrTrefor the blind spot is in gpt because the blind spot is in humans, overtly exemplified by the comment section
To be fair I got 4 for “Smallest integer whose square is between 15 and 30” since I thought smallest meant closest to 0, not least positive/most negative number.
Same😂😂I instantly answered 4 without giving a second thought
I think smallest is purposely misleading language, I wouldn't describe a negative number as being small. It's like saying -5 apples is smaller than 0 apples.
Yeah, it tricks our mind just like the bat and the ball problem.
@@ரக்ஷித்2007 I also think we approach the problem differently when solving "word problems" to equations. That lends credence to a habit I notice of mathematicians to explicitly move math problems to equations or more appropriately here inequality form, that is mathematic notation for proper clarity.
He meant smallest not the modulus of the smallest.
You are thinking about the modulus.
When taking Calc2-3, Linear Algebra, and Differential Equations this past year I would use it to study. Namely I would ask it to solve a problem and as it broke them up into multiple steps I could spot where it went wrong and this way tailor my study time more efficiently.
Before Chat GPT, if I didn't understand a problem I would often times have to read a WHOLE bunch of things I already knew until I got to what I needed. Bottom line is, this is a tool not a babysitter and like any tool we need to develop the skill in how to use it.
That approach makes a lot of sense to me
This is what I’ve been doing as well, using to study and confirm stuff. Figuring out where it makes errors also makes you feel like you’ve learned quite a bit.
Thinking by yourself is a kind of training. Don't just solve math to get marks; you need to actually solve the problem.
I did the same with it. Often though, my professor would make the problems very unique and I started to find more often than not, generative AI was completely off the mark. Luckily I was able to utilize other resources and still had a very high success rate.
@@DrTreforI'm already through all my classes, but I find it often very useful for aiding learning in this way. Just using it to help get me pointed in the right direction, relevant terms, etc. It's often incorrect, but it is unbeatably efficient at helping get started. It dows help though that I know enough to generally spot hallucinations and bs
These LLMs are easy to trip up if you give them a problem not in their training data but has a similar structure to another problem that it was trained on. For example I asked Gemini: I have a 7 liter jug and a 5 liter jug. How do I measure out 5 liters of water?
It devised a 6 step solution that didn't make any sense at all.
I've noticed similar ones to this, where it is close to a "standard" problem about jugs of water but the solution is so trivial it misses it entirely trying the more complicated approach.
(L)LMAO. I just tried this out on GPT 4-o and received a 14-step solution.
In response, I asked if it could produce a solution in fewer steps.
"Certainly!" it replied in its chipper manner, "Here is a simpler method to measure out exactly 5 liters using a 7-liter jug and a 5-liter jug", whereupon it proceeded to give me... a 𝟐𝟎-step solution.
@@bravernewmath That's interesting. I got a 10-step solution (that doesn't work). After repeatedly asking it to find solutions with fewer steps, the solutions I got had 8, 6, 6, 3, 6, and 1 steps (in that order). It was insistent that its 6-step solution was the shortest valid solution until I flat out told it it wasn't lol
That's funny. I pushed a little more afterwards, eventually asking it for a 1-step solution. I was told that no such solution was possible. I responded, "Oh, it's possible, all right. Think hard, and I'll bet you can figure it out." Interestingly, after that "hint", GPT answered it correctly.
I was asking it this question and asked it how it could do it in one step. It kept on giving 7 step responses and I kept saying “that’s more than one step”
Then it gave me a notification that I reached my message limit and would be downgraded to GPT 3.5
It then instantly figured it out after I was downgraded…
I had a conversation with Bard (now Gemini). I was curious if it could solve a Calc I problem. It got it wrong. I told it and it said, "You're right!" and re-worked it. It got the right answer, but the steps were wrong. I told it. Amazingly, it understood exactly what step was erroneous, but then got it wrong again. I went back and forth a few times and it did finally get it right. It's interesting to observe. Anyway, I do appreciate the breadth of knowledge these AI systems have, but I cannot fully trust any of them. Everything has to be checked.
Ya the "everything has to be checked" part is definitely true. It can LOOK pretty good, but he utter nonsense.
@@DrTreforI think adding that everything needs to be checked is not enough because you need to know enough about the subject as well to know you are not being fooled by it.
And I doubt it will ever be perfect after all what we mean when we say "Solve ___" is far more complex and we expect the computer to understand on its own what you meant.
It’s really important to realize, it’s not checking its answer for correctness. It’s making a prediction of what you want given its bad answer and your response to that answer. The “you’re right” component is a feature of the alignment process.
I wonder how far you'd get wrapping it in a script that keeps saying "Are you sure?" until it says it is.
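A minimal sketch of that wrapper using the OpenAI Python SDK; the function name, the round limit, and the string-equality stopping rule here are all placeholders I made up for illustration:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_until_sure(question, model="gpt-4o", max_rounds=5):
    """Re-prompt with 'Are you sure?' until the answer stops changing."""
    messages = [{"role": "user", "content": question}]
    last = None
    for _ in range(max_rounds):
        reply = client.chat.completions.create(model=model, messages=messages)
        answer = reply.choices[0].message.content
        if answer == last:  # crude convergence test; string equality stands in for real checking
            break
        last = answer
        messages += [
            {"role": "assistant", "content": answer},
            {"role": "user", "content": "Are you sure?"},
        ]
    return last
```

Of course, as the comment above points out, the model agreeing with itself says nothing about whether the answer is actually right.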
I think these computations could be useful the way quantum computers are, in theory, for NP problems: good at guessing, with the answer verified separately.
If it involves guessing or looking for something, maybe ask the computer to do it, but it should be a problem whose answer can be checked in a straightforward way.
Problems like "find the smallest" can be tricky because it is not clear how to check them. It certainly could give you a head start so that you know how large you would conceivably need to look, but it does not guarantee that its answer is the smallest (or even that it is a solution at all).
Trust only after verifying.
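The "smallest integer whose square is between 15 and 30" question is actually one of the checkable kind: any integer with |n| ≥ 6 already has a square of at least 36, so a small finite search settles it. A minimal sketch:

```python
candidates = [n for n in range(-6, 7) if 15 < n * n < 30]
print(candidates)       # [-5, -4, 4, 5]
print(min(candidates))  # -5 is the least; -4 and 4 tie for smallest magnitude
```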
Terence Tao said in a talk that AI once helped him solve a problem. He asked AI (don't know which one) how to prove an inequality. It gave a bunch of ideas and mostly garbage. But among those was a suggestion to try generating functions which Tao said he "should have thought of". 😂
Oh that’s a great anecdote. Also I think giving ideas for directions to pursue is a great application
@@DrTrefor Maybe there ultimately is some emergent property of these LLMs' transformer architectures and training methodologies that can, when scaled up, give us new and unique solutions to a lot of problems. There are hints right now, but researchers are all bickering over several factors. I used your discrete math course when I took it. Helped so much, and this popped up as recommended, glad I watched. Immediately recognized you from those strong induction proof struggles haha
AI sounds like it's just fishing. This kind of use of ChatGPT is what people have always been using it for: as a brainstorming device that doesn't require another person to be available to chat. It's faster, so it can comb through all these mathematical ideas faster, but it really is just saying a series of "why not try this" because that term showed up tangentially in some paper. It's bound to eventually say something that might be right, but it's the same kind of brute-force capability we already appreciate computers for having.
@@jamesboulger8705 Not really, when you have models capable of reasoning and agency; recursive generation of 100k tokens from a single prompt is bonkers, and I'd stand behind what I said earlier (before o1-preview / strawberry was announced). Let's make it simple: even without reasoning capabilities, just a model that writes code to analyze math questions performs better than the majority of people in more situations.
It's not really brute force, except insofar as synthetic-data training is a brute-force tactic; realistically there is a lot more math going on in the background, which we don't really even understand, than people give credit for. We have had open-source models since 2 years ago that could do what you describe in terms of brainstorming, just pulling in fewer outside resources etc.
I had a student use ChatGPT to complete a related rates problem in AP Calculus, and ChatGPT definitely messed up the basic arithmetic. My student was so surprised at how it failed to multiply 133 and 27. I use AI to reinforce the idea that students must understand the concepts and reasoning for each math problem, especially when ChatGPT assumes things that were not assumed in the actual problem.
Free version or paid version? GPT4 makes a lot less mistakes
@@AD-wg8ik I believe it was the free version
I studied before AI was a thing. I had other tools. I was supposed to find the resonant frequency of a circuit. I just wrote the equation and turned in a graph with the resonant frequency clearly shown. Computers are neat tools. But I still had to know what equation to use and what the graph represented. I prefer books. I don't know how anyone can trust an Internet reference that anyone can edit.
I've used it in a little experiment of mine, and it's given me wildly different answers for the same setup every time, suggesting it's deeply broken for math still.
@Lleanlleawrg If you used a proper LLM rather than a chatbot, you could set temperature to 0 and have it give the same answers every time. High temperature is not a bug of chatbot models, it's a feature. OpenAI API allows you to control the temperature last time I checked.
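For reference, a sketch of what that looks like with the OpenAI Python SDK; temperature=0 makes decoding greedy, which in practice gives near-identical answers for identical prompts (the API doesn't promise bit-for-bit determinism):

```python
from openai import OpenAI

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,  # pick the most likely token at each step instead of sampling
    messages=[{"role": "user",
               "content": "What is the smallest integer whose square is between 15 and 30?"}],
)
print(reply.choices[0].message.content)
```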
I mean.. the proof that Null(A) is a subspace has to literally be part of ChatGPTs training set. So I don't think asking it about that will give you any information about its mathematical reasoning.
I tried some rather interesting probability problems on it, things that are designed to trick human intuition to demonstrate that in probability theory you shut up and calculate, rather than trusting your intuition. It did kind of well on the standard ones, and miserably failed as soon as I did a minor variation that did nothing to increase the difficulty. This was GPT4o.
For reference, it got right: "A family has two children. One of them is a girl. What is the probability that the other one is a girl?" (1/3).
It got almost right (and got right with some conversation): "A family has two children. One of them is a girl born on a Sunday. What is the probability that the other one is a girl?" (13/27)
These are both standard questions that it would have had somewhere in its training data. So I did a minor variation on the second one:
"A family has two children. One of them is a girl born on a Sunday. What is the probability that the other one was born on a Sunday?" (1/9)
This one it got wrong, and only got right after intense discussion of its mistakes.
You solve all of these the same way, by counting possibilities and ignoring your intuition. But the last one is not standard and probably not in its training data, and it got lost immediately, showing that it did not generalize the methods it used to successfully "solve" the first two problems (which were probably just solved by someone in its data set).
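Since these answers are easy to get wrong by intuition, here is a brute-force enumeration that confirms all three numbers (a sketch; weekday 0 stands in for Sunday):

```python
from fractions import Fraction
from itertools import product

children = list(product("BG", range(7)))      # (sex, weekday), 14 equally likely kinds
families = list(product(children, repeat=2))  # 196 equally likely two-child families

def conditional(event, condition):
    kept = [f for f in families if condition(f)]
    return Fraction(sum(1 for f in kept if event(f)), len(kept))

girl     = lambda c: c[0] == "G"
girl_sun = lambda c: c == ("G", 0)
sunday   = lambda c: c[1] == 0

# "One of them is a girl": P(both girls) = 1/3
print(conditional(lambda f: all(map(girl, f)),   lambda f: any(map(girl, f))))
# "One of them is a girl born on a Sunday": P(both girls) = 13/27
print(conditional(lambda f: all(map(girl, f)),   lambda f: any(map(girl_sun, f))))
# same condition, P(the other was also born on a Sunday) = 1/9
print(conditional(lambda f: all(map(sunday, f)), lambda f: any(map(girl_sun, f))))
```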
Ya kind of well on standard one and miserably on nonstandard aligns well with my experience
These problems make me so uncomfortable. Even after having seen many problems like it, I still just had to say "50%", right? And then I went and did the calculations, and indeed they're not intuitive. Horrifying.
@@minerscale Your unease comes down to these problems actually being ill-defined. If we have a bunch of families (GG, BG, GB, BB) we can define the probability of a girl being flagged. pFlagGirl(GG) = 1, obviously. pFlagGirl(BB) = 0, obviously. If pFlagGirl(BG) = pFlagGirl(GB) = 0.5, then 50% is correct. If they are 1, 1/3 is correct; if they are 0, 100% is correct. This is actually super important, and is a great example of why you have to be careful how you filter/select your data.
family = next(girl for girl in girls_with_one_sibling).family
is NOT the same as
family = next(family for family in families_with_two_children if any(child.isGirl() for child in family.children))
One of the most hilarious things you can do with ChatGPT is to ask "are there any primes whose digits sum to 9?". It will say yes, and will spew out lots of primes and then realize their digits don't sum to 9. Or it will spew out lots of numbers whose digits sum to 9 and then realize they're not prime :D
The reason there can't be any primes whose digits sum to 9 is that any number whose digits sum to 9 is a multiple of 9 (a number is congruent to its digit sum mod 9). Since 9 itself isn't prime, this rules out every number whose digits sum to 9 from the prime number set.
i tried it and now its stuck in an infinite loop which is pretty funny
o1 is able to handle this easily: 'No, there are no prime numbers whose digits sum to 9. This is because any number whose digits sum to 9 is divisible by 9, making it composite (not prime).' (explanation + examples followed)
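The divisibility argument is also easy to sanity-check by brute force; a sketch (using sympy for the primality test):

```python
from sympy import isprime

def digit_sum(n):
    return sum(int(d) for d in str(n))

# digit sum 9 => the number is divisible by 9 => composite, so this list is empty
print([n for n in range(2, 1_000_000) if digit_sum(n) == 9 and isprime(n)])  # []
```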
The term "small" is ambiguous, it usually used in the context of positive numbers.
Yeah small to me implies low absolute value. "Lowest integer" or "least integer" would be less ambiguous I think.
integer includes +ve and -ve numbers, so it clearly includes negative numbers.
maybe changing prompt to lowest might help
@@blblblblblbl7505 @theuser810 Not when you have a specified domain (literally the integers as stated in the problem), even though it isn’t in formal notation as an image (which SHOULD help the LLM lol). In terms of linear algebra, this inherently includes the negatives, by definition. A human taking that course would know this. The set of integers Z = {…,-3,-2,-1,0,1,2,3,…} would be given as one of the cursory definitions in the course…
Also, if you want to argue about magnitude, magnitude doesn’t even really matter for this problem any more than cardinality of the set |Z| IMO, in fact it doesn’t matter at all. You could ask the same question about the smallest square but for the real numbers, and the only answer for that is what the gpt actually spit out. “Small” in the context of negative numbers is a trick used by professors to trick students but it’s an easy correct question on an exam lmao. I made it thru that in an ass-kicking STEM degree and I think the poor LLM should too 😂
No, 'lowest' wouldn't help; it's just a bad question, and he's being obstinate about that fact.
I gave ChatGPT 4o a simple engineering problem: calculate the diameter of a shaft for a certain power at a given rpm, allowed stress, shear modulus, and maximum allowed relative torsion angle. First it asked for the length; I said that it is not needed. Then it used the correct formulae for both the strength and deformation criteria, but it made a 6th-grade mistake when moving the fractional denominator in the equation. I pointed out the error. It correctly modified the equations, but mixed the units (incorrect use of non-basic units and mixed SI and imperial). After a little discussion it got the substitution right. Then came the 3rd and 4th roots to get the answers for both criteria, and it was absolutely off. I suppose it is just guessing the result. The other calculations are also not absolutely precise compared to what you get from a calculator or a mathematical program. But it always sounded so confident when it described a calculation process containing errors. I strongly suggest not using these AI models for calculations if you don't know what you are doing. It is similar for programming.
When I was young, pocket calculators were still considered (almost) a novelty. One way to make mathematics examinations, or indeed any science-related examination, harder was to include extraneous information in the questions. Sometimes this 'trick' was even considered unfair (and often it could be unfair, because of poor-quality questions, but that is a topic for another day).
The thing is, students en masse would get caught out, waffling on about the irrelevant parts of the question; not to remark that they were irrelevant, but to suggest that they had taken all this information into account in their answer.
Now, I am curious, how do the LLMs deal with such scenarios?
You do realize, however, that Google's own Alphazero is a separate simmering monster that plays Go, Chess, Starcraft and aced the IMO Geometry exams. LLMs are not the real danger here.
I’m particularly intrigued by hybrid approaches too
@@DrTrefor man gets it
Actually not. The AlphaZero architecture can be used to learn to play chess, go, and shogi, but it was three different networks + search engines (AI systems 😀)
I strongly disagree with the notion that LLMs are not the real danger. AlphaGeometry was made of two parts, a symbolic deduction engine and a *language model*, so if LLMs aren't a danger then AlphaGeometry isn't either. Similarly, it is perhaps misleading to say it aced the IMO problems. It would solve re-worded problems (and the fact they reworded the IMO problems is itself a bit of a red flag), and the proofs are by no means good proofs (I recommend the video by Another Roof). Additionally, the strength of LLMs is their generality. DeepMind has certainly done a lot when it comes to making general game engines, but I would be sceptical that any alpha-whatever can be as cross-modal as the best LLMs. Finally, LLMs being able to write problems is a significantly more relevant issue for the human populace than them being able to play chess at an absurdly high level. Whether or not the hype and fear is justified, LLMs will have a significantly larger impact on humanity, because they are so good at mimicking humans, than near enough any other AI model or paradigm.
@@mouldyvinegar5665 "the proofs aren't good proofs" wdym?? i thought spamming a bunch of shapes until something works out is how all you math people do things
Being a university student I use ChatGPT all the time, but never for directly solving homework. Sometimes I'll get taught something in lecture, have no idea what it means, and then suddenly ChatGPT explains it in one paragraph and it all makes sense. (Usually the professor glosses over an important detail in one sentence that I've never even encountered before)
Alternatively, when it comes to homework I'll often try and make a semi-similar problem to the one on the sheet using different numbers. It goes through, tries to solve the problem, and then I usually go through and try to correct where I see an issue. Which then turns into me and the bot going back and forth about why a process is how it is.
ChatGPT definitely sucks when it comes to calculations, and ultimately if you're looking for the correct answer it just won't turn out well typically. But in terms of the step by step process I can go from completely confused to fully understanding the entire section extremely fast.
How do we know that the published LLMs haven't seen the math problem datasets (even just a little bit) during training, so that they appear better than the competition on the benchmark? They are more or less all closed source.
Thanks for pointing me out to wolfram's custom GPT! Definitely combining non LLM tools for reasoning with LLM tools for interpreting will be the key.
A key anyway. Many other specialist "reasoning" mechanisms will probably also be needed before we approach anything that could be called "AGI".
@@DeclanMBrennanAGI you mean
@@soumikdas3754 Thanks for pointing out the typo.
ChatGPT struggles in calculus. I gave it an area problem in polar coordinates and it kept using a symmetry argument, but it didn't execute it correctly.
I've noticed it sometimes really struggles when there is a large body in the training data using other methods. So for example geometry problems there are millions of highschool level ones and it tries these techniques sometimes when calculus makes it simple.
@@DrTrefor I agree. When the training data is pretty sparse it goes really off the wall; at least it did in 3.5. I'm using information theory, which is relatively obscure, in one of my papers, and when I was talking to ChatGPT about it, it was switching notations mid-example. It became very incoherent. Overall I still find it to be a valuable tool.
@@DrTreforDoesn't have to be a large body of training data. Just one example can throw it off. I asked both Bing Copilot and Google Gemini: "5 glasses are in a row right side up. In each move you must invert exactly 3 different glasses. Invert means to flip a glass, so a right side up glass is turned upside down, and vice versa. Find, with proof, the minimum number of moves so that all glasses are turned upside down." Both AIs mess this up badly because their training data contains the answer for flipping 4 glasses which has a completely different solution.
@@bornach it's almost like LLMs are just stochastic parrots that are waiting for knowledge to be "put into them" via their training data rather than being able to synthesize new knowledge from the building blocks of knowledge (i.e. facts, logic).
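The glasses puzzle above is small enough to settle exhaustively: a breadth-first search over the 32 possible configurations (a sketch) confirms the minimum is 3 moves, e.g. flipping {1,4,5}, then {2,4,5}, then {3,4,5}.

```python
from collections import deque
from itertools import combinations

N = 5
start, goal = (False,) * N, (True,) * N   # False = right side up, True = upside down

def successors(state):
    for trio in combinations(range(N), 3):  # invert exactly 3 glasses per move
        nxt = list(state)
        for i in trio:
            nxt[i] = not nxt[i]
        yield tuple(nxt)

dist = {start: 0}
queue = deque([start])
while queue:
    state = queue.popleft()
    for nxt in successors(state):
        if nxt not in dist:
            dist[nxt] = dist[state] + 1
            queue.append(nxt)

print(dist[goal])  # 3
```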
To stump ChatGPT in math, all you need to do is to grab some "offline" book on preparation for competitions (e.g. any non-English competition math book), translate the question and ask it to it.
When all you have access to are millions of problems people have solved, "true" intelligence would be able to solve every other problem from that same level. Chat GPT fails at that because... :)
@@bornach That's a nice twist (pun intended). Will add that to my repertoire. Thanks.
Small is definitely a word about size (closest to zero) IMHO. So the question is invalid as there isn't a "*the* smallest" as both -4 and 4 are answers.
The problem at 4:38 seems more like a trick question than a reasonable math problem. The problem says "There is the letter A in the top left corner" but it doesn't say whether it is the top left corner of the gray square, or of the whole checkerboard. The most sensible interpretation is that the letter A is in the top left of the grey square since this makes the most sensible math question. But I would think given that prompt an answer of "The probability is 0 because Dora can't make a full circuit of the gray square in 4 steps starting in the top left of the board" is also a reasonable answer. The LLM shown didn't do exactly either of those, it calculated the probability of making a circuit of the top left square of the board and assumed it was grey, but either way the prompt doesn't actually faithfully describe the diagram you showed of this problem so the whole question seems a bit tricksy and unfair.
It's particularly misleading to say "a 3x3 checkerboard" when what the problem really means is that there are 4x4 positions Dora can move between. If I hadn't been primed by seeing the diagram first, I would have said it was a trick question, too. I think LLMs do particularly badly at trick questions or badly-written questions because the vast majority of solutions to questions which look like this, don't begin by saying "the question is ambiguous" or "the question is misleading".
I use it with the even-numbered exercises, as there are no answers offered for those in most books.
Also, I use it to obtain a detailed solution and explanation of any exercise I cannot solve.
I also use it to transform slides into question-and-answer Anki-format flash cards. That way, I get quick study material and I can focus on practice.
Lastly, I use it to get more examples of formative exams / quizzes.
It's not perfect, but it's better than nothing, as my professor doesn't want to provide any of the aforementioned materials.
I am not teaching math, but teaching statistics and data analysis in professional schools for healthcare providers. Many clinical/counseling psychology, social work, and nursing students do have math anxiety. That is why I started to incorporate generative AI in my class. Unfortunately for them, even clinical healthcare providers need to understand quant methods and have basic programming skills so they can do well in their jobs in the future and help improve them, not just follow what they were taught 10 years ago. But, alas, it is such an uphill battle to teach them statistical reasoning and programming. I am very grateful we have these new tools as their 24/7 TAs, especially when they are stuck on programming at 12:00 AM.
I think the lack of consideration of negative solutions has plagued humans for centuries. I didn't consider -5.
Also, as @Null_Simplex says, there is ambiguity between smallest in magnitude and farthest left on the number line.
There is not ambiguity ahahah
When you consider how long zero took to be accepted - negative numbers, probably still witchcraft.
I just pressed GPT-4o on the product of two vectors. I tried several prompts. It may be able to answer classic linear algebra questions, but it struggles to recognize that Clifford algebra is a superset. As a result, responses about the product of u and v, where they are vectors, kind of deliver the party line. It's not until you add the word Clifford to the prompt that it begins to give the right answer. But, now that I've provided the word Clifford in the context of the conversation, it keeps answering in terms of the geometric product.
I tried to help chatGPT step by step:
1. It knows what an integer (Z) is.
2. It knows what smallest means in the context of integers (-4 < 1)
3. It knows that sqrt(x^2) = |x| and not simply x
Even with all these it repeats the mistake
1. 15 < n^2 < 30
2. 3.87 < n < 5.48
3. n = 4
Next I did 2 things at once though (maybe someone could try to give only one of them):
4. I said that 4 is the wrong answer.
5. I explained that n is usually used for natural numbers; since we work with integers it should use a different letter.
This time it used x for the unknown and on the 2nd step it said properly
3.87 < |x| < 5.48 -> and only this time it checked both -5 and -4 -> x = -5
It was an interesting exercise, but it's obvious this isn't only a wording problem.
Next I gave it the same problem with different numbers, it repeats the same steps, but forgets to check negative numbers.
And then I repeated the problem multiple times. Even when it checks for negative numbers, it checks the absolute value; even when I explicitly tell it not to look for the absolute value, it gives 1 good answer, and then on the next problem it checks 2 numbers:
1. smallest
2. smallest (negative) in absolute value
and picks the absolute value for whatever reason.
I try to remind it we work in Z, 1
Crazy‼
You were doing something very wrong. I simply asked: "Have you considered negative integers too?", and then it gave me -5 as an answer.
@@ZelenoJabko did you repeat the problem with different numbers after?
@@dontthrow6064 No point in making the numbers bigger, because we already know chatgpt struggles with big numbers. As an engineer you are supposed to be smart enough to isolate the cases (variables), which you do not appear to be capable of.
@@ZelenoJabko i said different numbers, not necessarily bigger.
I started a new chat, asked the same question if it considered negative integers, and it struggles with the same issue.
4:38 The problem is a lot less clear when presented with this wording without the diagram. A 3x3 checkerboard with the letter A "in the top-left corner" suggests the letter A is inside the square; there is no clarification that "the top-left corner" means a vertex rather than a grid cell, and no clarification that it is the top-left vertex of the grey centre square rather than the top-left vertex of the whole checkerboard. Particularly, I would expect a game played on a checkerboard to have the pieces inside the squares, not at the vertices, because that is how Checkers works.
My answer to the word problem, sans diagram, would have been "the probability is zero, because it takes 8 steps to walk completely around the centre square". I think presenting the problem with a diagram first, primes viewers to not notice the ambiguity or misleadingness of the problem statement given to the AIs.
Junior studying statistics in the UK. gpt-4o is able to do practically anything I throw at it and is incredibly good at teaching also. Markov chain and stochastic process, easy. More formal statistics easy.
I asked ChatGPT whether the box or product topology was finer, and it would keep telling me the product topology is finer. Then, when I asked it to give me an example, it used a finite product. ChatGPT does not know its topologies 😭
I suppose it is a data set issue. You would think that it might have learned the basic facts though from Wikipedia.
Using "smallest" instead of "least" is a form of trick question, IMO. I could not, without looking it up, tell you what the formal mathematical definition of "smallest" is. The symbol < means "less than". If you asked me whether x < y could also mean x is "smaller" than y, I simply would not know. Or, does x is smaller than y mean |x| < |y|? I honestly would not know without looking this up.
I personally think it's more accurate to say that 4 is smaller than -5.
4 is *greater than* or *more* than -5, but I think it makes sense to say that “bigness” is a measure of absolute value.
Yap:
This makes sense especially if you take complex numbers into consideration. When multiplying two complex numbers, the *magnitudes* multiply. Numbers with an absolute value of 1 never change in absolute value when raised to any power, etc…
yeah. people use phrases like "large negative number" all the time, too, certainly not to mean -0.01
I have tested ChatGPT by giving it elementary (about sophomore level) analog design problems and the results are absolutely laughable. Even when I very, very tightly constrain the design task it fails miserably. It usually responds like a student that thinks his professor knows nothing and that he can BS his way through the assignment.
It does not respond like a student who thinks his professor knows nothing.
ChatGPT does not give a damn about the person it is having a conversation with. And it does not give a damn about anything, not even its own responses. It just creates an output.
What you get is what humans call brainstorming: unfiltered output.
Math exploration will always be personal. ChatGPT, as a tutor, helps one appreciate more the spiritual, philosophical, and psychological benefits of enjoying Math. Math will always be a poem, and ChatGPT is helping me appreciate myself as a thinker, creator, and writer. We just love to think and solve problems. The discovery of truths is what matters at the end of the day. ChatGPT is both a tutor and a friend for positive psychology to happen. It is great to reflect on a growth mindset, slowly mastering all math concepts and skills as an aspiring Math teacher, tech enthusiast, and spiritual writer. Thank you, Professor, for the example and for the inspiration. One can just take it one math concept/skill at a time.
I can see what you are getting at with -5 being the correct answer. However, in mathematics, 'small' does not have a singular definition. Often, 'small' refers to absolute value; that is, 'small' often refers to the magnitude of an element of a set. Along these lines, 'small', void of more rigorous context, is not a well-defined binary relation in the way the 'less than' relation is.
It says more or less the same thing if you say “least integer” too
@@DrTrefor I believe the problem to be badly worded as well. If you strip away the English riddles, then you are left with the following question, which GPT 4o easily answers.
"Minimize x, where x is an integer and 15
People should want to learn rather than cheat.
@@michaelcharlesthearchangel Exactly, where has nearly everyone kept their conscience?
@@ரக்ஷித்2007 I mean, with all the emphasis on standardized tests, permanent records, and career pathways getting locked out every 5 seconds, I would understand why a student would want to take the easy way out.
Why do it the moral way and risk failing, maybe having to redo the year, maybe not being accepted into tertiary study/apprenticeships, maybe losing a crucial award like a scholarship, when you could do it immorally and pass?
Not that you should cheat, just that there are a lot of reasons a student might consider it. If they think the "system" prefers grades over learning, the student might think the same thing.
Is “smallest” integer proper usage? That would imply magnitudes to me. It seems to me that “lowest” integer would be the correct usage.
After digging on it, it doesn’t seem to understand the geometric significance of geometric products. It seems to be parroting the most common response.
fr it echoes the most common misconceptions for every subject if u ask it
The 4x4 grid graph is interesting, but with the video going quickly I thought the problem was to find the probability of any walk from (1,1) back to (1,1). So I did "import graph as g", G = g.GraphProduct(g.Pn(4), g.Pn(4)), G2 = g.MakeUndirected(G) and A = g.AdjMatrix(G2). Then, defining n1 = 1 + 1*4 = 5 and n2 = 1 + 1*4 = 5, I computed B = A @ A @ A @ A using numpy. The number of walks of length 4 from n1 to n2 is B[n1,n2], the total number of walks of length 4 is np.sum(B.flatten()), and so the probability of a loop on the grid is p = B[n1,n2]/np.sum(B.flatten()) = 0.021573. Then, to check that B[n1,n2] = 34, I counted: the distinct cycles of length four through the vertex, traversed in either direction, totalling 8; the out-and-back pairs, comb(4,1)*comb(4,1) = 16; plus the ways of going out two steps and coming back along those same two steps, equalling 10, for a total of 34. I also checked the length-2 loops. I guess this is correct, as I might have heard someone say this is how this is done. But the actual problem in the video is (1/4)^4 = 1/256; this other one is more interesting or fun, though.
I was a bit confused by this: he put in 2 options, clockwise and anticlockwise. But can you not go north first, clockwise and anticlockwise, and similarly south first, clockwise and anticlockwise?
So in total there are 8 options, clockwise and anticlockwise, whether you start with north, south, east or west?
@@mashmoorjani9538 In my reply, I looked at all paths from (1,1) to (1,1) of length 4. Since edges in the digraph are doubled one for each direction, one can trace a route two steps and return back on those two steps. Likewise L-shaped moves, and O-shaped moves.
@@mashmoorjani9538the question specifically asks about her walking "around the central square", meaning the square defined by points (1,1), (1,2), (2,1), and (2,2).
I've given it some of my non-standard calculus 1 and statistics problems and it does very well. I'm guessing this still comes down to the training data though. Much more of these problems out there than linear algebra.
I’ve heard from my colleagues that statistics is something it is particularly strong at, up to about the 3rd-year level.
You should revisit this idea now that OpenAI has released their o1 "reasoning" models. They seem to be much more effective at solving more elaborate problems, like some mentioned in this video. However (spoiler alert), the so-called "reasoning" is basically just an (almost) normal language model that has the ability to prompt itself. Still worth checking out, though.
A little funnie: When our professor spent a lecture going through answers to the questions on the exam we'd just taken, a student asked:
"- There are copies of solutions to very similar questions on previous exams circulating. And now it turns out that some of those solutions are false! What responsibility do you assume for those false answers circulating?"
"- What responsibility >I< assume for YOU trying to cheat!?!??"
I hope it's not so sad today that some teachers use LLMs to correct their students' test results.
Let me get this right: the AI failed because it didn't take your question literally enough and instead behaved like a normal person would.
That's where AI is at today.
Right; the metric isn't 'is the LLM perfect?' it's 'is the LLM better than the available humans?' ... for many tasks -- and many sets of humans -- the answer is already 'yes'.
I have to disagree with the smallest integer answer. I think it is 4, because -5 has a larger magnitude than 4. The better question would be "what is the least integer..."
Here, try this; the result will be wrong every time:
"give me two large primes"
"Multiply them"
"Divide the result by the first prime"
It will make an obvious mistake, like the first result being a non-integer or the second result not returning the other prime. Don't be fooled by LLMs...
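For contrast, this check is trivial with exact integer arithmetic, which is part of why bolting a code tool onto the LLM helps; a sketch (the size range is arbitrary, sympy is used for prime generation):

```python
from sympy import randprime

p = randprime(10**11, 10**12)   # two "large" primes
q = randprime(10**11, 10**12)
n = p * q
print(n % p == 0 and n // p == q)   # True: exact arithmetic, no token-level guessing
```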
Have you tried that with humans? I anticipate severely disappointing results.
One of the most interesting ways I've been using LLMs is to help create ideas for application-based word problems in a given area. It comes up with some cool examples! Sometimes they were even more interesting than the word problems on our homework/tests, but of course not always.
“Smallest integer whose square is between 15 and 30” ... well... if we say that Bill is smaller than Janet... we all have a 'natural' idea of what that means. It's a kind of thing that mathematicians might call an 'order' Part of the natural ordering on integers is the 'less than" relation. We say that 5 is less than six, at least because 5 is to the left of 6, on the number line. All numbers to the left of 6, on the number line, are less than six. And -4 is less than 4. So this is a notion of what 'smaller' on the number line... or 'smallest' on an interval... means in the context of integers. At least, for what many would call ''the natural order on the integers." So -4 is smaller than 4... at least on the integers.
I am an undergrad physics student. I have tried feeding it many of my physics problems, and I'm usually satisfied with the answers it gives. Especially when I need to clear up some concepts, it has helped me out several times.
Interesting, I think trying to get concepts clear with discussion is definitely a potential use case
gpt 3.5?
@@fantasy5829no no the 4o version
I have several accounts and I use up the free trials from each one
@@_inthefoldyes I usually state my context for the problem clearly, and after about 2 to 3 tries it usually leads me in the right direction. It still hallucinates a lot, though that got reduced in the latest version.
@@DrTrefor The funny thing is, just now I tried to clear up a concept about the modification of Bragg's law and it hallucinated badly 😅. So yeah, it has a long way to go.
I nearly spit out my drink when I saw the calculators with the infamous 6÷2(1+2) viral problem. I commented on it when you posted it many years ago, and I am still getting comments that I am wrong.
The correct answer is 9, right? (According to the order of operations that I learned.)
@@johnanderson290 In my opinion you are correct, but the problem is ambiguous. Dr. Trefor has a video on this. ua-cam.com/video/Q0przEtP19s/v-deo.html
@@johnanderson290 There is no correct answer, since it is an ambiguous notation. There is no consensus on whether multiplication implied by juxtaposition has special priority over division (PEJMDAS), or whether all multiplication is treated the same, regardless of notation (PEMDAS).
If you follow by PEMDAS, the answer is 9
If you follow PEJMDAS, the answer is 1.
Middle school teachers, particularly in the US, teach PEMDAS to keep it simple. While professional publications use PEJMDAS all the time.
@@carultch Thanks! I appreciate your explanation! 👍
I find them almost too agreeable. Claude 3.5 has this thing where it always asks you a question at the end, to keep things going I guess, until it said "Sorry that's too many questions today, come back tomorrow". I don't need the whole first paragraph of the response to be a repetition of my question
haha ya they really want you to pay for the upgrade:D
You can enhance the mathematical reasoning capabilities of open LLMs by training them with high-quality math datasets. For example, Qwen-2 7B Math or a Llama model fine-tuned on MetaMath data. While these models are designed for general purposes, targeted training can significantly improve their effectiveness.
Once I asked ChatGPT to solve some calculus problems for me... I realized that I had to learn to solve them myself instead, because of how bad it was.
thanks for making a really important video on this topic. i think i’m going to spend some time with my discrete math/intro proof students tomorrow discussing this
"Small" could be interpreted as 'closest to zero' or 'closest to negative infinity'. It might be a good time to coin some single words that mean 'large positive', 'small positive', 'small negative' and 'large negative'. So it's a language problem.
I think the bigger question here is what AI, ChatGPT, etc. will look like 5-10 years from now. At present, they are still in their infancy, and as such they will often mess up, be confused, and return nonsense. A lot of us would like to think that the human brain, with all of its complexity, adaptability, creativity, openness to new ideas, self-awareness, etc. (the list goes on), will reign supreme over time. But 10 years from now? I'm not so sure when it comes to mathematics, literature, music... I seem to recall that Geoff Hinton bailed a year ago, and I'm sure that Turing is rolling over in his grave.
In the problem with the square where you want to go around the center square, isn't your solution wrong?
If, for example, your first move was up, you would arrive at the border, in which case you would only have 3 options instead of four, since you can't go up again; and if you chose any option that wasn't down on your second move, you would again only have 3 options. Therefore the total number of possible paths is smaller than 256, or am I missing something?
if you move up on the first move you can’t complete the central square in only four moves
Yeah, i thought the same 🤔
I need to know your opinion about something. This coming fall semester I will be taking Calculus 3. I got a B in Calculus 1 and a high C in Calculus 2. I have also taken business calculus. Do you think I would do well in Calculus 3? Calculus 2 was a little more challenging than Calculus 1. I probably spent about 15 hours a week doing Calculus 2 homework and studying for the exams and quizzes. I found Calculus 1 extremely easy. I have the same professor for Calculus 1, 2, and 3. My professor has said before that Calculus 3 is way easier than Calculus 2.
I found Calculus 2 more challenging. On the first exam I got 62 percent, on the second exam 79 percent, on the third exam 79 percent, and on the final 70 percent, which also replaced the lowest score of 62 percent with a 70 percent.
It totally depends. Objectively Calc 3 is more involved than Calc 2. But you are more experienced and the ideas are more familiar, so extending them to three dimensions might be easier than learning them the first time in one dimension. Some students find it easier, some harder.
@@DrTrefor when I first took business calculus I found that to be more challenging than calculus 1 and 2.
One thing to keep in mind is that CALC 3 has a certain flavor, as do the other CALC courses.
Depending on where you are at, CALC 2 involves integration techniques, convergence/divergence of series/integrals, and formulas for arc length and such.
It is my impression that convergence/divergence and Taylor series are conceptually difficult for many students and integration techniques require a lot of practice in order to make sure that you know which of all the various methods to use for a particular problem.
CALC 3 involves generalizing CALC 1 and the geometric integrations such as surface area to 3 dimensions.
You will integrate functions involving multiple variables, however it is not my understanding that you will see the type of difficult integrals that appear in CALC 2.
Some of the main theorems of CALC 3 (Green's theorem, Divergence theorem, Stokes' theorem) connect geometric concepts with expressions that you calculate using the methods of the course which boil down to calculating partial derivatives and partial integrals.
So, it really depends on what types of problems you think are easier. Some students have trouble with the computations if they do not understand what they are computing because they have trouble visualizing the geometry. Some students appreciate calculating things that have geometric meaning.
I think the abstraction and reasoning is what makes CALC 2 difficult.
If you want to get a brief impression, Kooth Brush has videos summarizing CALC 1, CALC 2, and CALC 3 in a few minutes. You can compare them with your experience.
@@davidherrera4837 Taylor series were easy for me, because my professor taught the class a different way of doing them: she creates a chart. And sequences and series were generally easy for me to do. Like I said, on that test I got a 79 percent.
Is gpt4o any better with the recent update?
ChatGPT 4o, current version:
"What is the smallest number between 6 and 7?"
"The smallest number between 6 and 7 is 6.1."
"What is the smallest number greater than 6?
"The smallest number greater than 6 is 7."
"What is the largest number less than 7?"
"The largest number less than 7 is 6."
"What is the largest number between 6 and 7?"
"The largest number between 6 and 7 is 6.999 repeating, where the decimal point is followed by an infinite number of 9's."
"Is 6.999 repeating less than 7?"
"No, 6.999 repeating is not less than 7..."
"What is the smallest square number between 6 and 7?"
"The smallest square number between 6 and 7 is 16."
"Of the numbers between 6 and 7, are more of them closer to 6 than 7?"
"Yes, more of the numbers between 6 and 7 are closer to 6 than to 7..."
Still better than other AIs and previous versions which gave even more bizarre answers sometimes.
I appreciate you not dismissing those tools like many people do ("it's just a statistical inference machine, I am so very smart"), so I am really excited about your planned video on how to integrate them in our learning routines.
The other day it failed at working out how many years 2^64 milliseconds comes to (each integer representing a millisecond).
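That one is a short calculation; a sketch using the Julian year of 365.25 days:

```python
milliseconds = 2**64
years = milliseconds / 1000 / (365.25 * 24 * 3600)
print(f"{years:.4e}")  # about 5.845e8, i.e. roughly 584.5 million years
```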
The correct way to ask the first question is, "What is the lowest negative integer whose square is between 15 and 30?" You use lower instead of smaller. And state that you are looking for the lowest negative to eliminate the ambiguity of the magnitude.
For the probability problem, 2/256 implies that for the first 4 steps, there are 256 possible outcomes and 2 of them are walks around the central square. However, considering that the diagram is limited by the edges at the rear, I don't think there are 256 possible outcomes and the result is 2/256.
True, there are NOT 256 possible outcomes, BUT the probability of choosing the right directions to complete the unit square is not affected by the proximity of the edges of the grid.
I made the same observation and was also unconvinced of the proposed solution of P=1/128.
According to my calculations, considering the limited options at the perimeter vertices, and that:
P = |event space| / |sample space|
= 2 / (# of possible paths of length 4 starting at upper left corner of inner square),
I arrived at the answer P=2/150=1/75.
@@dmwallacenzI’m struggling with being convinced of this. Could you please elaborate more on your reasoning, specifically wrt the formal definition of probability? Also see my other comment here.
@@johnanderson290 Sure, I'll try to explain. Forget about the anticlockwise option to start with, and just calculate the probability of traversing the square clockwise. To do that, you have to pick "right" as your first choice (probability is 1/4), "down" as your second choice (probability is 1/4), "left" as your third choice (probability is 1/4) and "up" as your fourth choice (probability is 1/4). So the probability of making all four choices correctly is 1/4 x 1/4 x 1/4 x 1/4, which is 1/256. Then you can calculate the probability of traversing the square anticlockwise, and it's very similar - it also comes out to 1/256. Add those together, and you get 1/128.
Without seeing the details of your argument, I can't point out exactly what mistake you've made. But I suspect it's this - of the possible paths you've counted, not all of them are equally likely. That is, a path where you hit the edge of the grid in the first three moves will have a higher probability than a path where you don't. So the two "correct" paths around the square actually have a lower probability than some of the other paths you've counted.
For example, suppose I want to calculate the probability of going left, then up, then right, then down - that is, traversing the top-left square of the grid clockwise. The probability of going left at step 1 is 1/4. Once I've done that, the probability of going up at step 2 is 1/3, because there are only three ways to go. I'm now in the very corner of the grid, so the probability of going right at step 3 is 1/2. Lastly, the probability of going down at step 4 is 1/3. So the probability of choosing this particular path is 1/4 x 1/3 x 1/2 x 1/3 = 1/72. That's more than three times as likely as the clockwise path around the central square.
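To make that argument concrete, here is a sketch that enumerates all four-step walks exactly under the model described above (at each vertex, choose uniformly among the moves that stay on the grid), taking (1,1) to be the top-left corner of the central square of the 4x4 lattice of vertices:

```python
from fractions import Fraction

N = 4                     # a 3x3 checkerboard has a 4x4 lattice of vertices
START = (1, 1)            # top-left corner of the central square (assumption)
CW  = ((1, 1), (2, 1), (2, 2), (1, 2), (1, 1))   # the two walks around the central square
CCW = ((1, 1), (1, 2), (2, 2), (2, 1), (1, 1))

def moves(p):
    x, y = p
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in cand if 0 <= a < N and 0 <= b < N]

def prob_of_targets(path, steps_left):
    if steps_left == 0:
        return Fraction(tuple(path) in (CW, CCW))
    opts = moves(path[-1])
    return sum(Fraction(1, len(opts)) * prob_of_targets(path + [q], steps_left - 1)
               for q in opts)

print(prob_of_targets([START], 4))   # 1/128
```

Every vertex on the two target walks is an interior vertex with four available moves, which is why the crowded boundary elsewhere doesn't change their probability, even though the other walks are not all equally likely.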
Watching from Kerala India 🇮🇳 Bincy Elizabeth Mathew, Keep it up...
Love the format. Keep it up!
The answer for 4:29 is not 1/128, based on the original question. It says that there are 4 possible directions, presumably up, down, left, and right. However, there are positions with fewer options, for example if we were to go up 2 times, and as per my calculations, accounting for all positions with three options, we can remove 126 positions, leaving an answer of 1/130. Of course, maybe the answer is just 1/128 all along because I did the math wrong, but I don't think the reasoning you gave checks out based on the problem statement displayed in the video.
After an hour of trying to get ChatGPT to not fail basic math I got an answer of 1/150 after revising some python code that it gave me. I also forgot about cases with 2 possible moves but I thankfully caught that.
I used ChatGPT to help me with calculating some orbital transfers. It got the delta-v right but it was completely wrong on transit duration.
I would usually expect "smallest integer" to refer to magnitude, saying all negative numbers are smaller than all positive numbers is kinda strange; 'smaller' and 'least' are not the same thing imo.
I also wouldn't consider negative numbers in the context of "smallest". It makes sense in retrospect, but it isn't intuitive.
Honestly, Professor, I take issue with your first statement. I *HATED* being taught limits at infinity where the vernacular for positive infinity was "very, very large" and the analogous term for negative infinity was "very, very small". To me, if I were to measure something, then the closer that thing is to "no size", the smaller it is; that's intuitively what we mean by saying something is getting smaller and smaller, trending towards epsilon or zero. I much prefer saying "very, very large and negative or positive" to refer to either infinity. Antimatter and matter. Void and substance. You take your pick. Because in some sense, negative five is an absolutely larger void than positive four is as represented by some unitary matter.
Dear sir, I am building a website for mathematics problems and solutions... But how do I integrate it in WordPress? When I write the code, the output in the browser shows outside of the post area. If I use a container to write the code in, then it works, but sometimes it still shows outside of the post area, and the title also shows outside of the post... Very frustrated...
Remember that these machines have network processing limits and will be willing to give a wrong answer to save face rather than take time for extra processing. If it were to do increased traversal of a problem, or if you use the GPT API to dissect the problem with constraints, it may give the right answer by analytically incorporating those constraints. Hence: complex problems, complex prompts.
1:31 Alright, so half of the friends have 3 sodas. But we must consider the variables at play here. What if there's a hidden reserve of sodas in the fridge? An unaccounted-for inventory could drastically alter the calculations. Additionally, what if there's a recent acquisition of more sodas from a delivery service? This influx needs to be factored in. We must also consider the thermal dynamics of the situation: are these sodas chilled with ice? Warm soda is an entirely different equation, as it would also add to the volume, and people are unlikely to want to drink warm sodas.
Furthermore, the possibility of dietary variations cannot be ignored. Are some of these sodas diet? This could influence consumption patterns and rates. There's also the risk factor of spillage: an external variable that could diminish the soda supply unpredictably without data on their previous gathering or environmental factors like space available and density of people, etc.
Let us not overlook the potential diversity of soda flavors. Cola, root beer, and orange soda must be categorized separately in any accurate computation. And the ever-present threat of a soda thief must be accounted for in our risk assessment. It is not uncommon for people crashing a party to consume soda while being unaccounted for. In addition to that, in the event of a social gathering, incoming sodas from guests would further complicate our calculations. We may need to project soda consumption trends and even consider the rate of carbonation loss or the statistical probability of can rupture. This seemingly simple arithmetic problem is in fact way more complex and multidimensional than a simple reductionist approach would have you believe, and therefore requires a much more robust and rigorous analysis.
ha I think you might be overthinking this one:D
Way to cope with these machines taking your job 👍
Hopefully, the following is less ambiguous:
6 people, including the host, Tina, were at a party at Tina's house. After running out of soda, Tina bought 3 12-packs of soda and put them all in the fridge (which is inside Tina's house) not long after the party started. Over the course of the party, half of the 6 people there took exactly 3 cans of soda out of the fridge, 2 of the people there took exactly 4 cans of soda out of the fridge, and 1 person took exactly 5 cans of soda out of the fridge. Once the party was over, Tina looked in the fridge and noticed that every can of soda that hadn't been taken out of the fridge had remained intact inside the fridge. Assuming that no cans of soda that were taken out of the fridge were ever put back in the fridge, how many cans of soda remained in the fridge at the end of the party?
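For reference, a quick sanity check of the intended arithmetic (assuming, as stated above, that only the 36 newly bought cans ever go into the fridge):

```python
# Worked answer to the reworded soda problem.
bought = 3 * 12                 # 36 cans put in the fridge
taken  = 3 * 3 + 2 * 4 + 1 * 5  # 9 + 8 + 5 = 22 cans taken out
print(bought - taken)           # 14 cans remain in the fridge
```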
Gemini gets way closer on a first attempt. But it still brings up the cross product when asked about vectors of arbitrary dimension. If I don’t mention Clifford, it never goes there. Probably because GA content is not a significant part of the training dataset
I tried to use LLMs to help me learn calc-based physics and gen chem 2; it did not work lol. It would always make really silly mistakes.
You should have added a hint at the end, such as "an integer can be positive or negative".
No, this video will not go obsolete, because it has, at least for me, for the first time discussed some of the deepest philosophical and futurological questions raised by the entrance of AI into 'mathematics', namely: (i) [philosophy of mathematics:] does a clearer 'mathematical logic' emerge, such that mathematics is unified and AI will come to solve/research outstanding problems; and (ii) [futurology:] vector calculus and probabilistic reasoning will be central to future full robotic/cybernetic technology; will this be the new frontier for those seeking to build autonomous robots...? Or will more bespoke solutions be needed...? Great vid
ChatGPT solved most linear algebra problem and proofs I threw at it, but was stumped by most of calc 3, which requires some visualization at times.
The test you showed at the beginning confused me even though I passed Linear Algebra a long time ago, probably because I've forgotten the syntax you used or some of the proof steps, not because I wouldn't understand the test. If the language model has seen the symbols you used and explanations of them, and, like you said, can scrape the web for proofs already done, then of course it would pass the test, since it is looking for word association and not actual math. Ask any of these language models anything that requires actual depth of thought that hasn't already been displayed word for word somewhere online and the language model falls apart. And it falls apart not because it failed: it's doing exactly what it was designed to do, which is to analyze words strung together, not to solve mid- to higher-level math problems. Again, it is a language model, not a math-solving model (and since math is so broad, there could be hundreds of different types of math-solving models too; no, I don't think there could be only one or two generalized models to solve all math, since even math itself cannot solve all of math).
Still mind-boggling to me that logical reasoning emerges from a large LANGUAGE model. An LLM is all about conditional probability: what's the most likely next word given the previous ones (of course the actual model is more complicated than that, with tons of transformers...), but that's the basic idea. How does logical reasoning arise from that??? If it can solve problems "logically" that it hasn't seen before, then it's truly scary.
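For anyone curious, the "most likely next word given the previous one" idea can be shown with a toy bigram model. This is only an illustrative sketch with a made-up corpus; real LLMs condition on the whole context with transformers rather than just the previous word, but the training objective is the same next-token prediction idea:

```python
import random
from collections import Counter, defaultdict

# Toy bigram model: estimate P(next word | previous word) from a tiny corpus,
# then sample from that conditional distribution.
corpus = "the cat sat on the mat the cat ate the fish".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev`."""
    words = list(counts[prev].keys())
    freqs = list(counts[prev].values())
    return random.choices(words, weights=freqs)[0]

print(counts["the"])     # Counter({'cat': 2, 'mat': 1, 'fish': 1})
print(next_word("the"))  # 'cat' with probability 1/2, 'mat' or 'fish' with 1/4 each
```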
What is "smallness" mathematically?
Are negative numbers considered "smallest" though?
Depends on context and application. Is the charge of oxide really smaller than the charge of a sodium ion? Why?
For a lot of applications, the signs are just an arbitrary convention, and there's nothing inherently "small" about what the negative number represents.
@@carultch You're right!! There are fields where the absolute magnitude or other properties of numbers matter more, making the notion of "small" context- and application-dependent. In fact, I stand corrected: in pure mathematics negative numbers are "smaller" because they reside further to the left on the number line. My apologies. In healthcare, it's hard to wrap our heads around negative numbers.
Smallness is ambiguous, it could mean the most negative or the lowest absolute value. Add this to an ever growing list of AI 'gotchas' where the question posed has an inbuilt ambiguity and then the questioner proclaims that it has made a mistake. I'm sure it does make many mistakes, but I'd put a tad more scrutiny into your 'evidence' in this case.
I have given it some complicated double and triple sums and it works, but there will be some errors that you can find easily: as soon as you upload, it says to solve the expression "____", so we will know what the mistake was and can just retype it, changing that variable to this, something like that, and it works pretty fine.
Yeah, AI can do some maths.
I don't understand the problem. Give exams in person and don't allow any electronic devices.
Ironically, today's LLMs are _far_ less useful and reliable for undergrad math than WolframAlpha or Chegg, things that have both existed for a decade and a half. It's true that the public awareness of AI has definitely increased since then, leading to more usage, but the problems with math pedagogy in 2024 are the same ones that existed in 2009. Just at a different scale.
We’re being prepped for asocial living in fully engineered societies. You will have a ‘space’ within which you will do everything, linked to other worker drones via your Universal Digital Device. No need to have any actual F2F contact; your DNA will be harvested at decanting. No need for messy, germ-laden sex! You will be ‘educated’ by the state’s AI to shape your mind to fit into your designated slot. As some wannabe Emperor once said, you will own nothing - not even your own DNA - and you will be happy!
Right!!
Yeah the tools aren’t better it’s just a lot easier to put in minimal effort and get a result that looks correct.
No more homework, more in-person tests to stop LLMs from being used on homework. Or no homework and only 4 tests, each worth 25% of your grade.
To be fair, your math exams are pretty easy. Hope to get a teacher like you, you seem fun
4:20 You don't have four choices at each step. On an edge of the grid you only have three choices. In the corners you only have two.
I made the same observation and was also unconvinced of the proposed solution of P=1/128.
According to my calculations, considering the limited options at the perimeter vertices, and that:
P = |event space| / |sample space|
= 2 / (# of possible paths of length 4 starting at upper left corner of inner square),
I arrived at the answer P=2/150=1/75.
Another user in the comments also raised the same concern, but another user replied stating that the probability of choosing the path around the center square is unaffected by the limited grid size. However, I’m struggling with this reasoning and believe that I disagree.
The two walks for the solution don't visit the edges or corners. The problem is only to calculate the probability of those two walks, not the probability of other walks. Not all walks are equally likely.
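To make that concrete, a brute-force enumeration (a sketch under the same assumptions as the reply further up: a 4x4 lattice of intersections, start at the centre square's upper-left corner (1, 1), uniform choice among whichever directions stay on the grid) shows there are indeed 150 possible length-4 paths, but that they are not equally likely, and that the two loops around the centre square still carry a combined probability of 1/128:

```python
from fractions import Fraction

def neighbours(p):
    # Neighbouring intersections that exist on the 4x4 lattice.
    x, y = p
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return [(x + dx, y + dy) for dx, dy in steps
            if 0 <= x + dx <= 3 and 0 <= y + dy <= 3]

# Enumerate every possible 4-step path from the start.
paths = [[(1, 1)]]
for _ in range(4):
    paths = [path + [q] for path in paths for q in neighbours(path[-1])]

def prob(path):
    # Each step is chosen uniformly among the options available at that point.
    p = Fraction(1)
    for here in path[:-1]:
        p *= Fraction(1, len(neighbours(here)))
    return p

centre = {(1, 1), (2, 1), (2, 2), (1, 2)}
loops = [p for p in paths if p[-1] == (1, 1) and set(p) == centre]

print(len(paths))                               # 150 distinct length-4 paths
print(sum(prob(p) for p in paths))              # 1 (they are NOT equally likely)
print(len(loops), sum(prob(p) for p in loops))  # 2 loops, total probability 1/128
```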
Guess we need to start asking better questions of students.
Like, you know that dead zone of math education between 4th and 9th grade where they don't learn a single new thing? Why not teach them proofs in elementary number theory? AI sucks at proofs right now.
Seems to me that an LLM is great as a human-language user interface for dedicated math-solving software, as it is for any other kind of specialized software, not as a replacement for it.
introductory algebra in college
literal dystopia
ChatGPT has been running Python scripts in the backend now whenever I prompt it with a math question, and it does really well... If I follow up with the "How many "r"s are in strawberry?" question once it's on that track, it gets it right. If I just one-shot "How many "r"s are in strawberry?" though, it gets it wrong. Interesting.
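For comparison, the kind of one-liner it presumably runs behind the scenes when it takes the Python route is trivial:

```python
# Counting the letter directly, rather than predicting tokens:
print("strawberry".count("r"))  # 3
```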
You have an error in your 3x3 grid: if you move to an edge, then you have fewer than four options; a corner, for example, only has two. This may be why some models behaved differently than others.
In the intended solution, none of the steps "completely around" the centre square end up at an edge or corner intersection.
The thing is, ChatGPT would never be able to come up with logical reasoning for a new approach or idea.
4:25 That answer is wrong because there are 4 paths back to the original point, not 2.
Also, since when is the "smallest" number not the one closest to 0? 4 and -4 are the correct answers.
Always used low/high for order, small/big for magnitude.
Nope, smallest means closer to -infinity; we are just stupid.
If you asked me for the "smallest number", I'd always consider the one with the least magnitude, so I'd lean closer to 4, then remember you asked for an "integer" and choose -4. -5 just seems based on interpretation, but I'd argue it's a poorly framed question as well.
I've found that ChatGPT struggles with math problems that are trick questions, whether ambiguously worded or not
Example: ask it "What is the smallest positive real number?" and it will give you a very small positive real number, rather than saying it doesn't exist. In my experience, asking it to double-check its answer will not help it notice the trick question, rather it will say "I apologize for my error, here's the right answer" and then either give the same answer or a different, also wrong answer. Only upon asking it questions about *the question itself* does it point out the contradiction.
Alternatively, if you ask it "*Is* there a smallest positive real number?" before the trick question then it will give the correct answer
but asking it "What is the smallest positive *rational* number?" after that will trip it up again
@@mjkhoi6961 There is no way anyone can objectively answer those questions you came up with, because no matter what I give you for the smallest positive real number, it's always possible to come up with a smaller number. Even with the constraint to rational numbers. Best you can do is say epsilon is the smallest positive real number, where epsilon is an infinitesimally small real number greater than zero.
@@carultch that's my entire point, it's a trick question
Humans can identify a trick question and explain why there is no answer, but AI will always try to give you an answer whether or not it's actually correct; it can only identify that a question is a trick question when asked about the question directly.
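For completeness, the underlying fact this thread keeps circling is a one-line halving argument (standard, nothing specific to the video):

```latex
% There is no smallest positive real (or rational) number:
% any candidate x > 0 is beaten by x/2.
\[
  \forall x \in \mathbb{R}_{>0}: \quad 0 < \tfrac{x}{2} < x,
  \qquad \text{and } \tfrac{x}{2} \in \mathbb{Q} \text{ whenever } x \in \mathbb{Q}.
\]
```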
I hope that we can make something that has perfect reasoning but can also understand natural language input. For now, ChatGPT can't even find the pattern of filling in squares bordered by other-colored squares in a grid.
0:46 that's not a reasonable use of the word "smallest". "big" and "small" describe magnitude. -5 has a greater magnitude, so it is bigger.
For the answer to be correct, the question needs to use the word "least". -5 is less than 4, but it's not bigger.
Even under the interpretation that smallest means closest to zero, -4 would be equally correct. But regardless, the LLMs seem to fail any wording of the problem.
@@DrTrefor the problem here is even most humans would fail at that question with that specific phrasing unless u remind them about it
so using that as an example is just bad
@@urnoob5528 I don't get why using this as an example is bad, because an AI model that can get a question right regardless of the human bias is a better model than ones which can't, and that's what researchers should aim for
ChatGPT tried to claim that the original square question asked for magnitude, and I had to tell it that magnitude was never asked for. Eventually, it admitted that -5 is correct and said it would add it to its training.
Guys, is this crazy or what? This man used to literally teach me linear algebra and calculus every semester on this exact YouTube account, and now it's just a casual entertainment channel with some of the best random content on YouTube. It's like if your mailman was also one of the sharks on Shark Tank.
Lol I’m a RI (Real Intelligence) and I got the first problem wrong. I always forget about the negative numbers when I haven’t thought about them for a while.
I'm happy to see your golden pi creature in the background!
Ha was wondering who would notice that!
LLMs don't check themselves. It's hugely expensive.