ChatGPT is destroying my math exams

  • Published 14 Jul 2024
  • Learn more about LLMs & more at ► brilliant.org/TreforBazett to get started for free for 30 days, and to get 20% off an annual premium subscription!
    In this video we're going to answer just how good Large Language Models (LLMs) like ChatGPT 4o, Claude 3.5, and Google's Gemini are at mathematics. I'll cite some of the results from the literature using benchmark datasets such as GSM8k and MATH, and we'll see several math examples along the way. References below.
    0:00 How to measure AI at math?
    0:56 GSM8k and GSM-Hard
    2:44 The MATH Database
    4:43 ChatGPT 4o vs Gemini vs Claude 3.5 Sonnet
    6:13 My Linear Algebra Exams
    8:32 Computational Engines
    10:34 Brilliant.org/TreforBazett
    References and Citations:
    *GSM8k (including graphic at 1:10): paperswithcode.com/sota/arith...
    *GSM-Hard stats found in here: arxiv.org/abs/2406.07394
    *Google Deepmind paper citing MATH database: arxiv.org/pdf/2406.06592
    *I first saw the question about the smallest integer here: x.com/ericneyman/status/18041...
    *Math Olympiad level problems (5:30): arxiv.org/abs/2406.07394
    *Stats for Claude 3.5: www.anthropic.com/news/claude...
    *Image of two calculators at 2:30 shared via CC-BY-SA 3.0; original here: www.wikidata.org/wiki/Q166882...
    BECOME A MEMBER:
    ►Join: / @drtrefor
    MATH BOOKS I LOVE (affiliate link):
    ► www.amazon.com/shop/treforbazett
    COURSE PLAYLISTS:
    ►DISCRETE MATH: • Discrete Math (Full Co...
    ►LINEAR ALGEBRA: • Linear Algebra (Full C...
    ►CALCULUS I: • Calculus I (Limits, De...
    ► CALCULUS II: • Calculus II (Integrati...
    ►MULTIVARIABLE CALCULUS (Calc III): • Calculus III: Multivar...
    ►VECTOR CALCULUS (Calc IV) • Calculus IV: Vector Ca...
    ►DIFFERENTIAL EQUATIONS: • Ordinary Differential ...
    ►LAPLACE TRANSFORM: • Laplace Transforms and...
    ►GAME THEORY: • Game Theory
    OTHER PLAYLISTS:
    ► Learning Math Series
    • 5 Tips To Make Math Pr...
    ►Cool Math Series:
    • Cool Math Series
    SOCIALS:
    ►X/Twitter: X.com/treforbazett
    ►TikTok: / drtrefor
    ►Instagram (photography based): / treforphotography

COMMENTS • 333

  • @DrTrefor  12 days ago +43

    Some debate in the comment section about "smallest integer" vs "least integer" - as in interpreting smallest as closest to zero. I stuck with the original source (x.com/ericneyman/status/1804168604847358219) of the question for the phrasing in the video, but it turns out that ChatGPT etc. struggle with every version of the phrasing I've found, and even under the closest-to-zero interpretation they still don't give what would then be the two answers, -4 and 4. The larger point is that there does seem to be a real blind spot here: so many similar training problems presumably come with the context of the smallest/least natural number or counting number, so, modelling off that training data and giving similar answers, the models are confused by this question despite its simplicity.
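
    For anyone who wants to check the arithmetic, brute-force enumeration settles what the candidates are; a minimal Python sketch (the search range is an arbitrary choice of mine):

      # integers whose square lies between 15 and 30
      candidates = [n for n in range(-10, 11) if 15 < n * n < 30]
      print(candidates)                # [-5, -4, 4, 5]
      print(min(candidates))           # -5: least on the number line
      print(min(candidates, key=abs))  # -4: smallest in magnitude (4 ties it)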

    • @mikeymill9120  12 days ago

      Smallest from zero is absolute value

    • @RunstarHomer  12 days ago

      @@mikeymill9120 "Small" means close to zero. 0.00001 is a smaller number than -10000. The latter is lesser, but bigger.

    • @mmmmmratner  11 days ago +3

      As an electrical engineer, "smallest" means closest to zero more often than not. If I am instructed to choose the amplifier from a list with the smallest error voltage or the smallest input current, I am not looking through datasheets for negative numbers.

    • @thenicksterd2334  10 days ago +1

      @@mmmmmratner lmao this is a math problem, not ur list of amplifier error amounts. The problem specified integer, which includes negative numbers; the fact that integer was specified should have cued it into thinking about negative numbers.

    • @anywallsocket  10 days ago +4

      @@DrTrefor the blind spot is in gpt because the blind spot is in humans, overtly exemplified by the comment section

  • @Null_Simplex  12 days ago +377

    To be fair I got 4 for “Smallest integer whose square is between 15 and 30” since I thought smallest meant closest to 0, not least positive/most negative number.

    • @rakshithpl332  12 days ago +61

      Same😂😂I instantly answered 4 without giving a second thought

    • @tylerlopez6379  12 days ago +100

      I think smallest is purposely misleading language; I wouldn't describe a negative number as being small. It's like saying -5 apples is smaller than 0 apples.

    • @rakshithpl332  12 days ago +5

      Yeah, it tricks our mind just like the bat and the ball problem.

    • @ActuatedGear  12 days ago +30

      @@rakshithpl332 I also think we approach "word problems" differently from equations. That lends credence to a habit I notice in mathematicians of explicitly translating word problems into equations, or more appropriately here into an inequality, i.e. proper mathematical notation for clarity.

    • @vorpalinferno9711  12 days ago +4

      He meant smallest not the modulus of the smallest.
      You are thinking about the modulus.

  • @magnero2749  12 days ago +143

    When taking Calc 2-3, Linear Algebra, and Differential Equations this past year I would use it to study. Namely, I would ask it to solve a problem, and as it broke the problem up into multiple steps I could spot where it went wrong and tailor my study time more efficiently.
    Before ChatGPT, if I didn't understand a problem I would oftentimes have to read a WHOLE bunch of things I already knew until I got to what I needed. Bottom line is, this is a tool, not a babysitter, and like any tool we need to develop the skill of using it.

    • @DrTrefor  12 days ago +43

      That approach makes a lot of sense to me

    • @randomaj237  12 days ago +5

      This is what I’ve been doing as well, using it to study and confirm stuff. Figuring out where it makes errors also makes you feel like you’ve learned quite a bit.

    • @ccuuttww  12 days ago +2

      Thinking by yourself is a kind of training. Don't just solve math and get marks, u need to solve the problem

    • @mooseonshrooms  10 days ago +4

      I did the same with it. Often though, my professor would make the problems very unique and I started to find more often than not, generative AI was completely off the mark. Luckily I was able to utilize other resources and still had a very high success rate.

  • @DarkBoo007  12 days ago +47

    I had a student use ChatGPT to complete a Related Rates problem in AP Calculus, and ChatGPT definitely messes up the basic arithmetic. My student was so surprised at how it failed to multiply 133 and 27. I use AI to reinforce the idea that students must understand the concepts and reasoning behind each math problem, especially when ChatGPT assumes things that were not assumed in the actual problem.

    • @AD-wg8ik  12 days ago +3

      Free version or paid version? GPT-4 makes a lot fewer mistakes

    • @DarkBoo007  12 days ago

      @@AD-wg8ik I believe it was the free version

    • @johnchestnut5340  8 days ago +1

      I studied before AI was a thing. I had other tools. I was supposed to find the resonant frequency of a circuit. I just wrote the equation and turned in a graph with the resonant frequency clearly shown. Computers are neat tools. But I still had to know what equation to use and what the graph represented. I prefer books. I don't know how anyone can trust an Internet reference that anyone can edit.

    • @Lleanlleawrg  2 days ago +1

      I've used it in a little experiment of mine, and it's given me wildly different answers for the same setup every time, suggesting it's deeply broken for math still.

  • @bornach  12 days ago +34

    These LLMs are easy to trip up if you give them a problem that is not in their training data but has a similar structure to another problem they were trained on. For example I asked Gemini: I have a 7 liter jug and a 5 liter jug. How do I measure out 5 liters of water?
    It devised a 6-step solution that didn't make any sense at all.
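
    (The intended answer is a single step: just fill the 5-liter jug.) A tiny breadth-first search over jug states confirms this; a throwaway Python sketch, where the state encoding and move list are my own choices:

      from collections import deque

      def min_steps(cap_a=7, cap_b=5, target=5):
          # state = (liters in jug A, liters in jug B)
          start = (0, 0)
          seen, queue = {start}, deque([(start, 0)])
          while queue:
              (a, b), steps = queue.popleft()
              if target in (a, b):
                  return steps
              pour_ab = min(a, cap_b - b)   # amount poured A -> B
              pour_ba = min(b, cap_a - a)   # amount poured B -> A
              moves = [(cap_a, b), (a, cap_b), (0, b), (a, 0),
                       (a - pour_ab, b + pour_ab), (a + pour_ba, b - pour_ba)]
              for s in moves:
                  if s not in seen:
                      seen.add(s)
                      queue.append((s, steps + 1))

      print(min_steps())  # 1 -- fill the 5-liter jug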

    • @DrTrefor  12 days ago +9

      I've noticed similar ones to this, where it is close to a "standard" problem about jugs of water, but the solution is so trivial that it misses it entirely, trying the more complicated approach instead.

    • @bravernewmath  12 days ago +13

      (L)LMAO. I just tried this out on GPT 4-o and received a 14-step solution.
      In response, I asked if it could produce a solution in fewer steps.
      "Certainly!" it replied in its chipper manner, "Here is a simpler method to measure out exactly 5 liters using a 7-liter jug and a 5-liter jug", whereupon it proceeded to give me... a 𝟐𝟎-step solution.

    • @driksarkar6675  12 days ago +1

      @@bravernewmath That's interesting. I got a 10-step solution (that doesn't work). After repeatedly asking it to find solutions with fewer steps, the solutions I got had 8, 6, 6, 3, 6, and 1 steps (in that order). It was insistent that its 6-step solution was the shortest valid solution until I flat out told it it wasn't lol

    • @bravernewmath  12 days ago +2

      That's funny. I pushed a little more afterwards, eventually asking it for a 1-step solution. I was told that no such solution was possible. I responded, "Oh, it's possible, all right. Think hard, and I'll bet you can figure it out." Interestingly, after that "hint", GPT answered it correctly.

    • @epicgaming7813  7 days ago +3

      I was asking it this question and asked it how it could do it in one step. It kept on giving 7 step responses and I kept saying “that’s more than one step”
      Then it gave me a notification that I reached my message limit and would be downgraded to GPT 3.5
      It then instantly figured it out after I was downgraded…

  • @wesleydeng71  12 days ago +62

    Terence Tao said in a talk that AI once helped him solve a problem. He asked an AI (don't know which one) how to prove an inequality. It gave a bunch of ideas, mostly garbage. But among them was a suggestion to try generating functions, which Tao said he "should have thought of". 😂

    • @DrTrefor  12 days ago +24

      Oh that’s a great anecdote. Also I think giving ideas for directions to pursue is a great application

    • @Laminar-Flow  11 days ago

      @@DrTrefor Maybe there ultimately is some emergent property of these LLMs' transformer architectures & training methodologies that can, when scaled up, give us new and unique solutions to a lot of problems. There are hints right now, but researchers are still bickering over several factors.. I used your discrete math course when I took it. Helped so much, and this popped up as recommended, glad I watched. Immediately recognized you from those strong induction proof struggles haha

  • @AnkhArcRod  12 days ago +49

    You do realize, however, that Google's own Alphazero is a separate simmering monster that plays Go, Chess, Starcraft and aced the IMO Geometry exams. LLMs are not the real danger here.

    • @DrTrefor  12 days ago +23

      I’m particularly intrigued by hybrid approaches too

    • @ianmoore5502  12 days ago +2

      @@DrTrefor man gets it

    • @denysivanov3364  12 days ago

      Actually not. The AlphaZero architecture can be used to learn to play chess, go, and shogi. But it was three different networks + search engines (AI systems 😀)

    • @mouldyvinegar5665  11 days ago +5

      I strongly disagree with the notion that LLMs are not the real danger. AlphaGeometry was made of two parts - a symbolic deduction engine and a *language model* - so if LLMs aren't a danger then AlphaGeometry isn't either. Similarly, it is perhaps misleading to say it aced the IMO problems. It solved re-worded versions of them (and the fact that they had to re-word the IMO problems is itself a bit of a red flag), and the proofs are by no means good proofs (I recommend the video by Another Roof). Additionally, the strength of LLMs is their generality. DeepMind has certainly done a lot when it comes to making general game engines, but I would be sceptical that any alpha-whatever can be as cross-modal as the best LLMs. Finally, LLMs being able to write problems is a significantly more relevant problem to the human populace than an engine being able to play chess at an absurdly high level. Whether or not the hype and fear are justified, LLMs will have a significantly larger impact on humanity, because they are so good at mimicking humans, than near enough any other AI model or paradigm.

    • @WoolyCow  10 days ago +5

      @@mouldyvinegar5665 "the proofs aren't good proofs" wdym?? i thought spamming a bunch of shapes until something works out is how all you math people do things

  • @Mochi_993  11 days ago +5

    ChatGPT and other LLMs are basically large statistical search engines at this point. For problems involving advanced mathematical knowledge and sophisticated reasoning, AI currently generates garbage solutions.

  • @theuser810  12 days ago +53

    The term "small" is ambiguous; it is usually used in the context of positive numbers.

    • @blblblblblbl7505  12 days ago +5

      Yeah small to me implies low absolute value. "Lowest integer" or "least integer" would be less ambiguous I think.

    • @Craznar  11 days ago +5

      integer includes +ve and -ve numbers, so it clearly includes negative numbers.

    • @SalmonSushi47  11 days ago

      maybe changing prompt to lowest might help

    • @Laminar-Flow  11 days ago

      @@blblblblblbl7505 @theuser810 Not when you have a specified domain (literally the integers as stated in the problem), even though it isn’t in formal notation as an image (which SHOULD help the LLM lol). In terms of linear algebra, this inherently includes the negatives, by definition. A human taking that course would know this. The set of integers Z = {…,-3,-2,-1,0,1,2,3,…} would be given as one of the cursory definitions in the course…
      Also, if you want to argue about magnitude, magnitude doesn’t even really matter for this problem any more than cardinality of the set |Z| IMO, in fact it doesn’t matter at all. You could ask the same question about the smallest square but for the real numbers, and the only answer for that is what the gpt actually spit out. “Small” in the context of negative numbers is a trick used by professors to trick students but it’s an easy correct question on an exam lmao. I made it thru that in an ass-kicking STEM degree and I think the poor LLM should too 😂

    • @anywallsocket  10 days ago +1

      No, ‘lowest’ wouldn’t help; it’s just a bad question and he’s being obstinate about that fact

  • @paulej  12 days ago +78

    I had a conversation with Bard (now Gemini). I was curious if it could solve a Calc I problem. It got it wrong. I told it and it said, "You're right!" and re-worked it. It got the right answer, but the steps were wrong. I told it. Amazingly, it understood exactly what step was erroneous, but then got it wrong again. I went back and forth a few times and it did finally get it right. It's interesting to observe. Anyway, I do appreciate the breadth of knowledge these AI systems have, but I cannot fully trust any of them. Everything has to be checked.

    • @DrTrefor  12 days ago +29

      Ya the "everything has to be checked" part is definitely true. It can LOOK pretty good, but be utter nonsense.

    • @no_mnom  12 days ago +8

      @@DrTrefor I think adding that everything needs to be checked is not enough, because you also need to know enough about the subject to know you are not being fooled by it.
      And I doubt it will ever be perfect; after all, what we mean when we say "Solve ___" is far more complex than it looks, and we expect the computer to understand on its own what we meant.

    • @ReginaldCarey  11 days ago +5

      It’s really important to realize, it’s not checking its answer for correctness. It’s making a prediction of what you want given its bad answer and your response to that answer. The “you’re right” component is a feature of the alignment process.

  • @Rodhern  12 days ago +9

    When I was young, pocket calculators were still considered (almost) a novelty. One way to make mathematics examinations, or indeed any science-related examination, harder was to include extraneous information in the questions. Sometimes this 'trick' was even considered unfair (and often it could be unfair, because of poor-quality questions, but that is a topic for another day).
    The thing is, students en masse would get caught out, waffling on about the irrelevant question parts; not to remark that they were irrelevant, but to suggest that they had taken all this information into account in their answer.
    Now, I am curious: how do the LLMs deal with such scenarios?

  • @baumian.  3 days ago +1

    One of the most hilarious things you can do with ChatGPT is to ask "are there any primes whose digits sum to 9?". It will say yes, and will spew out lots of primes and then realize their digits don't sum to 9. Or it will spew out lots of numbers whose digits sum to 9 and then realize they're not prime :D

    • @carultch  10 hours ago +1

      The reason there can't be any primes whose digits sum to 9 is that any number whose digits sum to 9 is a multiple of 9. Since 9 itself isn't prime, and every larger multiple of 9 has 9 as a proper factor, this rules out all numbers whose digits sum to 9 from the prime number set.
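
      A quick empirical check of the divisibility argument; a throwaway sympy sketch (the search bound is an arbitrary choice of mine):

        from sympy import isprime

        hits = [n for n in range(2, 10**6)
                if sum(map(int, str(n))) == 9 and isprime(n)]
        print(hits)  # [] -- every such n is a multiple of 9, hence composite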

  • @Nhurgle  12 days ago +7

    I use it with the even-numbered exercises, as there is no answer offered in most books.
    Also, I use it to obtain a detailed solution and explanation of any exercise I cannot solve.
    I also use it to transform slides into question-and-answer Anki-format memory flash cards. That way, I get quick study material and I can focus on practice.
    Lastly, I use it to get more examples of formative exams/quizzes.
    It's not perfect, but it's better than nothing, as my professor doesn't want to provide any of the aforementioned elements.

  • @dominikmuller4477  9 days ago +2

    I mean... the proof that Null(A) is a subspace has to literally be part of ChatGPT's training set. So I don't think asking it about that will give you any information about its mathematical reasoning.
    I tried some rather interesting probability problems on it, things that are designed to trick human intuition to demonstrate that in probability theory you shut up and calculate, rather than trusting your intuition. It did kind of well on the standard ones, and miserably failed as soon as I did a minor variation that did nothing to increase the difficulty. This was GPT4o.
    For reference, it got right: "A family has two children. One of them is a girl. What is the probability that the other one is a girl?" (1/3).
    It got almost right (and got right with some conversation): "A family has two children. One of them is a girl born on a Sunday. What is the probability that the other one is a girl?" (13/27)
    These are both standard questions that it would have had somewhere in its training data. So I did a minor variation on the second one:
    "A family has two children. One of them is a girl born on a Sunday. What is the probability that the other one was born on a Sunday?" (1/9)
    This one it got wrong, and only got right after intense discussion of its mistakes.
    You solve all of these the same way, by counting possibilities and ignoring your intuition. But the last one is not standard and probably not in its training data, and it got lost immediately, showing that it did not generalize the method it used to successfully "solve" the first two problems (which were probably just solved by someone in its data set).
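
    All three answers are easy to verify with the same brute-force counting; a minimal Python sketch of the enumeration (the encoding and helper names are mine):

      from fractions import Fraction
      from itertools import product

      kids = list(product("GB", range(7)))   # (sex, weekday), weekday 0 = Sunday
      pairs = list(product(kids, kids))      # 196 equally likely ordered pairs

      def prob(condition, event):
          hits = [p for p in pairs if condition(p)]
          return Fraction(sum(event(p) for p in hits), len(hits))

      girl_sun = lambda c: c == ("G", 0)
      both_girls = lambda p: p[0][0] == "G" and p[1][0] == "G"

      # "One of them is a girl": P(other is a girl)
      print(prob(lambda p: "G" in (p[0][0], p[1][0]), both_girls))         # 1/3
      # "One is a girl born on a Sunday": P(both are girls)
      print(prob(lambda p: girl_sun(p[0]) or girl_sun(p[1]), both_girls))  # 13/27
      # Same condition: P(both born on a Sunday)
      print(prob(lambda p: girl_sun(p[0]) or girl_sun(p[1]),
                 lambda p: p[0][1] == p[1][1] == 0))                       # 1/9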

    • @DrTrefor  8 days ago

      Ya, "kind of well on standard ones and miserably on nonstandard" aligns well with my experience

  • @boltez6507  11 days ago +5

    The thing is, ChatGPT wouldn't ever be able to come up with the logical reasoning for a new approach.

  • @walter274  12 days ago +14

    ChatGPT struggles in calculus. I gave it an area problem in polar coordinates, and it kept using a symmetry argument but didn't execute it correctly.

    • @DrTrefor  12 days ago +12

      I've noticed it sometimes really struggles when there is a large body of training data using other methods. For example with geometry problems: there are millions of high-school-level ones, and it sometimes tries those techniques when calculus would make the problem simple.

    • @walter274  12 days ago

      @@DrTrefor I agree. When the training data is pretty sparse it goes really off the wall. At least it did in 3.5. I'm using information theory, which is relatively obscure, in one of my papers, and when I was talking to it about that, it kept switching notations mid-example. It became very incoherent. Overall I still find it to be a valuable tool.

    • @bornach  12 days ago +1

      ​@@DrTreforDoesn't have to be a large body of training data. Just one example can throw it off. I asked both Bing Copilot and Google Gemini: "5 glasses are in a row right side up. In each move you must invert exactly 3 different glasses. Invert means to flip a glass, so a right side up glass is turned upside down, and vice versa. Find, with proof, the minimum number of moves so that all glasses are turned upside down." Both AIs mess this up badly because their training data contains the answer for flipping 4 glasses which has a completely different solution.
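
      For reference, the 5-glass answer is 3 moves, which a tiny breadth-first search over glass states verifies; a throwaway Python sketch (the state encoding is my own):

        from collections import deque
        from itertools import combinations

        def min_moves(n=5, k=3):
            # state = tuple of 0/1 per glass; 0 = right side up, 1 = upside down
            start, goal = (0,) * n, (1,) * n
            seen, queue = {start}, deque([(start, 0)])
            while queue:
                state, moves = queue.popleft()
                if state == goal:
                    return moves
                for idxs in combinations(range(n), k):  # invert exactly k glasses
                    nxt = list(state)
                    for i in idxs:
                        nxt[i] ^= 1
                    nxt = tuple(nxt)
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, moves + 1))

        print(min_moves())          # 3, e.g. {1,2,3}, {1,2,4}, {1,2,5}
        print(min_moves(n=4, k=3))  # 4 -- the 4-glass variant really is different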

    • @kubratdanailov9406  12 days ago

      @@bornach it's almost like LLMs are just stochastic parrots that are waiting for knowledge to be "put into them" via their training data rather than being able to synthesize new knowledge from the building blocks of knowledge (i.e. facts, logic).
      To stump ChatGPT in math, all you need to do is to grab some "offline" book on preparation for competitions (e.g. any non-English competition math book), translate the question and ask it to it.
      When all you have access to are millions of problems people have solved, "true" intelligence would be able to solve every other problem from that same level. Chat GPT fails at that because... :)

    • @andrewharrison8436  2 days ago

      @@bornach That's a nice twist (pun intended). Will add that to my repertoire. Thanks.

  • @bartekabuz855  12 days ago +17

    Hey, student here. ChatGPT seems to know only standard questions but is clueless when asked about a nonstandard problem. The worst thing is she can't confess when a problem is too hard. Instead she outputs an incorrect solution

    • @bornach  12 days ago

      Yes I've noticed this of all Large Language Models. They basically memorise answers to questions and have to piece together answers by recognising patterns in your question that are similar to questions it trained on. Bing Copilot got this wrong: "Two American coins add up to 26 cents. Neither is a penny. Is this possible?" because it regurgitated the answer to a riddle that sounded similar.
      Google Gemini got the correct answer, but then tripped up on "Three American coins add up to 31 cents. Two are not pennies" by trying an odd/even argument to explain how it was impossible.

    • @steveftoth  12 days ago +3

      That’s cause LLMs are, at their heart, a search engine, not a reasoning or computation engine.

    • @adnan7698  11 days ago +6

      You made me feel weird by calling it a she

    • @bartekabuz855  11 days ago +1

      @@adnan7698 I think it's "she" bc she talks a lot more than necessary

    • @Not_Even_Wrong  9 days ago

      Every typical math test problem was in the data set 10,000 times; that's why it can solve those. Anything else it gets wrong

  • @sigontw  11 days ago +2

    I am not teaching math, but teaching statistics and data analysis in professional schools for healthcare providers. Many clinical/counseling psychology, social work, and nursing students do have math anxiety. That is why I started to incorporate generative AI in my class. Even clinical healthcare providers need to understand quant methods and have basic programming skills, so they can do well in their jobs in the future and help improve them, not just follow what they were taught 10 years ago. But, alas, it is such an uphill battle to teach them stats reasoning and programming. I am very grateful we have these new tools as their 24/7 TAs, especially when they are stuck on programming at 12:00 AM.

  • @birhon  12 days ago +7

    Thanks for pointing me to Wolfram's custom GPT! Combining non-LLM tools for reasoning with LLM tools for interpreting will definitely be the key.

    • @DeclanMBrennan  12 days ago +3

      A key anyway. Many other specialist "reasoning" mechanisms will probably also be needed before we approach anything that could be called "AGI".

    • @soumikdas3754  12 days ago

      @@DeclanMBrennan AGI you mean

    • @DeclanMBrennan  12 days ago

      @@soumikdas3754 Thanks for pointing out the typo.

  • @Markste-in  12 days ago +6

    How do we know that the published LLMs haven't seen the math problem datasets (just a little bit) during training, so that they appear better than the competition on the benchmarks? They are more or less all closed source.

  • @bartholomewhalliburton9854  6 days ago +1

    I asked ChatGPT whether the box or the product topology is finer, and it kept telling me the product topology is finer. Then, when I asked it to give me an example, it used a finite product (where the two coincide; for infinite products the box topology is the strictly finer one). ChatGPT does not know its topologies 😭

  • @jackkinseth2936  12 days ago

    thanks for making a really important video on this topic. i think i’m going to spend some time with my discrete math/intro proof students tomorrow discussing this

  • @letmedoit8095  11 days ago

    I appreciate you not dismissing those tools like many people do ("it's just a statistical inference machine, I am so very smart"), so I am really excited about your planned video on how to integrate them in our learning routines.

  • @SeeMyDolphin  7 days ago +1

    4:20 You don't have four choices at each step. On the inner edge you only have three choices. In the corners you only have two.

    • @johnanderson290  5 days ago +1

      I made the same observation and was also unconvinced of the proposed solution of P=1/128.
      According to my calculations, considering the limited options at the perimeter vertices, and that:
      P = |event space| / |sample space|
      = 2 / (# of possible paths of length 4 starting at upper left corner of inner square),
      I arrived at the answer P=2/150=1/75.
      Another user in the comments also raised the same concern, but another user replied stating that the probability of choosing the path around the center square is unaffected by the limited grid size. However, I’m struggling with this reasoning and believe that I disagree.

  • @Laminar-Flow  10 days ago

    On my rocketry team before I graduated, I needed to interpolate wind-tunnel and CFD drag values for various flap angles and speeds of a rocket deploying an airbrake system into a *double-digit term polynomial (drag non-linear)* we could use to create an apogee-airbrake control algorithm (A.A.C.A.). Given I was busy doing research in a clean room for like 80% of my time outside class, I used GPT 4.0 to help me write a MATLAB script to interpolate the data, graph everything, and I wrote a quick C script in vim to test out all the original input values, then checked by hand and verified w sim. The rocket was within 2 feet of predicted apogee (close to 10k feet) when we launched it at our next test launch. One thing I will mention: I would never have used an LLM to write code that did anything more than at worst make the rocket slow down faster than normal. Our test conditions were also desolate. In the long run, as an engineer who has to look at this as someone born at the turn of the century, this type of tool (whether narrow or generally intelligent) is something that will likely become part of my daily life for most of my career. Just my thoughts.

  • @baronvonbeandip  12 days ago +3

    Guess we need to start asking better questions of students.
    Like, you know that deadzone of math education between 4th and 9th where they don't learn a single new thing? Why not teach them proofs in elementary number theory? AI sucks at proofs right now.

  • @tylerbird9301  12 days ago +16

    I think the lack of consideration of negative solutions has plagued humans for centuries. I didn't consider -5.
    Also, as @Null_Simplex says, there is ambiguity between smallest in magnitude vs. farthest left on the number line.

    • @stenzenneznets  7 days ago

      There is not ambiguity ahahah

    • @andrewharrison8436  2 days ago

      When you consider how long zero took to be accepted - negative numbers, probably still witchcraft.

  • @joshrobles6262  12 days ago +3

    I've given it some of my non-standard Calculus 1 and statistics problems and it does very well. I'm guessing this still comes down to the training data though. There are many more of those problems out there than linear algebra ones.

    • @DrTrefor  12 days ago +1

      I’ve heard from my colleagues that statistics is something it is particularly strong at, up to about 3rd-year level

  • @michaelcharlesthearchangel  12 days ago +5

    People should want to learn rather than cheat.

    • @rakshithpl332  11 days ago +1

      @@michaelcharlesthearchangel Exactly, where has nearly everyone kept their conscience?

  • @ReginaldCarey  12 days ago +1

    I just pressed GPT-4o on the product of two vectors. I tried several prompts. It may be able to answer classic linear algebra questions, but it struggles to recognize that Clifford algebra is a superset. As a result, responses to the product of u and v, where they are vectors, kind of deliver the party line. It's not until you add the word Clifford to the prompt that it begins to give the right answer. But now that I've provided the word Clifford in the context of the conversation, it keeps answering in terms of the geometric product.

  • @tomholroyd7519  12 days ago +2

    I find them almost too agreeable. Claude 3.5 has this thing where it always asks you a question at the end, to keep things going I guess, until it said "Sorry that's too many questions today, come back tomorrow". I don't need the whole first paragraph of the response to be a repetition of my question

    • @DrTrefor  12 days ago +1

      haha ya they really want you to pay for the upgrade :D

  • @ReginaldCarey  12 days ago +3

    After digging into it, it doesn’t seem to understand the geometric significance of geometric products. It seems to be parroting the most common response.

    • @urnoob5528  11 days ago +1

      fr it echoes the most common misconceptions for every subject if u ask it

  • @charlieng3347  10 days ago +1

    For the probability problem, 2/256 implies that for the first 4 steps there are 256 possible outcomes, of which 2 are walks around the central square. However, considering that the diagram is limited by its edges, I don't think there are 256 possible outcomes, so I don't think the result is 2/256.

    • @dmwallacenz  6 days ago +1

      True, there are NOT 256 possible outcomes, BUT the probability of choosing the right directions to complete the unit square is not affected by the proximity of the edges of the grid.

    • @johnanderson290  5 days ago

      I made the same observation and was also unconvinced of the proposed solution of P=1/128.
      According to my calculations, considering the limited options at the perimeter vertices, and that:
      P = |event space| / |sample space|
      = 2 / (# of possible paths of length 4 starting at upper left corner of inner square),
      I arrived at the answer P=2/150=1/75.

    • @johnanderson290  5 days ago

      @@dmwallacenz I’m struggling with being convinced of this. Could you please elaborate on your reasoning, specifically wrt the formal definition of probability? Also see my other comment here.

    • @dmwallacenz  5 days ago

      @@johnanderson290 Sure, I'll try to explain. Forget about the anticlockwise option to start with, and just calculate the probability of traversing the square clockwise. To do that, you have to pick "right" as your first choice (probability is 1/4), "down" as your second choice (probability is 1/4), "left" as your third choice (probability is 1/4) and "up" as your fourth choice (probability is 1/4). So the probability of making all four choices correctly is 1/4 x 1/4 x 1/4 x 1/4, which is 1/256. Then you can calculate the probability of traversing the square anticlockwise, and it's very similar - it also comes out to 1/256. Add those together, and you get 1/128.
      Without seeing the details of your argument, I can't point out exactly what mistake you've made. But I suspect it's this - of the possible paths you've counted, not all of them are equally likely. That is, a path where you hit the edge of the grid in the first three moves will have a higher probability than a path where you don't. So the two "correct" paths around the square actually have a lower probability than some of the other paths you've counted.

    • @dmwallacenz  5 days ago +1

      For example, suppose I want to calculate the probability of going left, then up, then right, then down - that is, traversing the top-left square of the grid clockwise. The probability of going left at step 1 is 1/4. Once I've done that, the probability of going up at step 2 is 1/3, because there are only three ways to go. I'm now in the very corner of the grid, so the probability of going right at step 3 is 1/2. Lastly, the probability of going down at step 4 is 1/3. So the probability of choosing this particular path is 1/4 x 1/3 x 1/2 x 1/3 = 1/72. That's more than three times as likely as the clockwise path around the central square.
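
      A quick Monte Carlo check of the 1/128 figure, choosing uniformly among the legal moves at each vertex exactly as described above (a throwaway Python sketch; the grid and indexing conventions are my own):

        import random

        N = 4                  # 4x4 grid of vertices, coordinates 0..3
        START = (1, 1)         # a corner of the central square
        SQUARE = {(1, 1), (2, 1), (2, 2), (1, 2)}

        def neighbors(x, y):
            cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            return [(a, b) for a, b in cand if 0 <= a < N and 0 <= b < N]

        hits, trials = 0, 1_000_000
        for _ in range(trials):
            pos, path = START, [START]
            for _ in range(4):
                pos = random.choice(neighbors(*pos))  # uniform over legal moves
                path.append(pos)
            # a closed 4-step walk whose vertex set is exactly the central
            # square must be one of the two traversals (CW or CCW)
            if path[-1] == START and set(path) == SQUARE:
                hits += 1

        print(hits / trials)   # ~0.0078, i.e. about 1/128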

  • @oldadajbych8123  12 days ago +2

    I gave ChatGPT 4o a simple engineering problem: calculate the diameter of a shaft for a certain power at a given rpm, allowed stress, shear modulus, and maximum allowed relative torsion angle. First it asked for the length; I said that it is not needed. Then it used the correct formulae for both the strength and deformation criteria, but it made a 6th-grade mistake when moving a fractional denominator in the equation. I pointed out the error. It correctly modified the equations but mixed the units (incorrect use of non-basic units and mixed SI and imperial). After a little discussion it got the substitution right. Then came the 3rd and 4th roots to get the answers for both criteria. And it was absolutely off; I suppose it is just guessing the result. Its other calculations are also not precise compared to what you get from a calculator or a mathematical program. But it always sounded so confident when it described a calculation process containing errors. I strongly suggest not using these AI models for calculations if you don't know what you are doing. It is similar for programming.

  • @doraemon402  12 days ago +1

    4:25 That answer is wrong because there are 4 paths back to the original point, not 2
    Also, since when the "smallest" number isn't the one closest to 0? 4 and -4 is the correct answer.
    Always used low/high for order, small/big for magnitude.

  • @tomholroyd7519  12 days ago +2

    LLMs don't check themselves. It's hugely expensive

  • @bendavis2234  6 days ago

    One of the most interesting ways I’ve been using LLMs is to help create ideas for application-based word problems in a given area. It comes up with some cool examples! Sometimes they were even more interesting than the word problems on our homeworks/tests, but of course not always.

  • @NandrewNordrew  5 days ago

    I personally think it's more accurate to say that 4 is smaller than -5.
    4 is *greater than* or *more than* -5, but I think it makes sense to say that “bigness” is a measure of absolute value.
    Yap:
    This makes sense especially if you take complex numbers into consideration. When multiplying two complex numbers, the *magnitudes* multiply. Numbers with an absolute value of 1 never change in absolute value when taken to any power, etc…

  • @DrR0BERT  11 days ago +2

    I nearly spit out my drink when I saw the calculators with the infamous 6÷2(1+2) viral problem. I commented on it when you posted it many years ago, and I am still getting comments that I am wrong.

    • @johnanderson290  5 days ago

      The correct answer is 9, right? (According to the order of operations that I learned.)

    • @DrR0BERT  4 days ago

      @@johnanderson290 In my opinion you are correct, but the problem is ambiguous. Dr. Trefor has a video on this. ua-cam.com/video/Q0przEtP19s/v-deo.html

    • @carultch  10 hours ago +1

      @@johnanderson290 There is no correct answer, since the notation is ambiguous. There is no consensus on whether multiplication implied by juxtaposition has special priority over division (PEJMDAS) or whether all multiplication is treated the same regardless of notation (PEMDAS).
      If you follow PEMDAS, the answer is 9.
      If you follow PEJMDAS, the answer is 1.
      Middle school teachers, particularly in the US, teach PEMDAS to keep it simple, while professional publications use PEJMDAS all the time.
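
      The two readings differ only in where the implicit parentheses go; making them explicit in Python (a trivial illustrative sketch):

        print(6 / 2 * (1 + 2))    # PEMDAS reading:  (6 / 2) * (1 + 2) = 9.0
        print(6 / (2 * (1 + 2)))  # PEJMDAS reading: 6 / (2 * (1 + 2)) = 1.0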

    • @johnanderson290  10 hours ago

      @@carultch Thanks! I appreciate your explanation! 👍

  • @jerryeldridge1690  12 days ago +1

    The 4 x 4 grid graph is interesting, but with the video moving quickly I thought the problem was to find the probability of a length-4 walk from (1,1) back to (1,1). So I did "import graph as g", G = g.GraphProduct(g.Pn(4), g.Pn(4)), G2 = g.MakeUndirected(G), and A = g.AdjMatrix(G2). Then, defining n1 = n2 = 1 + 1*4 = 5, I computed B = A @ A @ A @ A using numpy. The number of length-4 walks from n1 to n2 is B[n1,n2], the total number of length-4 walks is np.sum(B.flatten()), and so the proportion that are loops at that vertex is p = B[n1,n2]/np.sum(B.flatten()) = 0.021573. To check B[n1,n2] = 34, I counted: 8 traversals of distinct 4-cycles (the four unit squares at (1,1), each in two directions), comb(4,1)*comb(4,1) = 16 walks that go out one step and back twice, and 10 walks that go out two steps and come back along those same two edges, for a total of 34. I also checked the length-2 loops. I guess this is correct, as I might have heard someone say this is how it's done. But the actual problem in the video is (1/4)^4 * 2 = 1/128; this other one is just more interesting or fun.
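
    A self-contained numpy version of that walk count (my own reconstruction; the custom graph helpers referenced above are not shown in the comment):

      import numpy as np

      N = 4
      idx = lambda x, y: x + N * y   # flatten (x, y) to a vertex index

      # adjacency matrix of the 4x4 grid graph
      A = np.zeros((N * N, N * N), dtype=int)
      for x in range(N):
          for y in range(N):
              for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                  if 0 <= x + dx < N and 0 <= y + dy < N:
                      A[idx(x, y), idx(x + dx, y + dy)] = 1

      B = np.linalg.matrix_power(A, 4)  # B[i, j] = number of length-4 walks i -> j
      v = idx(1, 1)
      print(B[v, v])            # 34 closed walks of length 4 at (1,1)
      print(B[v, v] / B.sum())  # ~0.0216, the proportion computed above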

    • @mashmoorjani9538  9 days ago

      I was a bit confused by this: he put in 2 options, clockwise and anticlockwise. But can you not go north first, clockwise or anticlockwise, and similarly south first, clockwise or anticlockwise?
      So in total aren't there 8 options: clockwise and anticlockwise whether you start with north, south, east, or west?

    • @jerryeldridge1690  9 days ago

      @@mashmoorjani9538 In my reply, I looked at all paths from (1,1) to (1,1) of length 4. Since edges in the digraph are doubled, one for each direction, one can trace a route two steps out and return back along those two steps. Likewise L-shaped moves and O-shaped moves.

  • @ianfowler9340  12 days ago +2

    "Indispensable" tool?? I would have said "convenient" tool. If they truly are indispensable, then we are in a LOT of trouble.

    • @urnoob5528  11 days ago +3

      as an engineer
      they are more of a toy than being convenient
      because they never get shit right
      just do ur own thinking and research
      u d be a better engineer/watever person that way

  • @mkbestmaan  10 days ago

    Math exploration will always be personal. ChatGPT, as a tutor, helps one appreciate more the spiritual, philosophical, and psychological benefits of enjoying Math. Math will always be a poem, and ChatGPT is helping me appreciate myself as a thinker, creator, and writer. We just love to think and solve problems. The discovery of truths is what matters at the end of the day. ChatGPT is both a tutor and a friend for positive psychology to happen. It is great to reflect on a growth mindset, slowly mastering all math concepts and skills as an aspiring Math teacher, tech enthusiast, and spiritual writer. Thank you, Professor, for the example and for the inspiration. One can just take it one math concept/skill at a time.

  • @Python_Lover_Official  19 hours ago

    Dear sir, I am building a website for mathematics problems and solutions... But how do I integrate it in WordPress? When I write the code, the output in the browser shows outside of the post area. If I use a container to write the code in, it works, but sometimes it still shows outside of the post area, and the title also shows outside of the post... Very frustrated...

  • @ianfowler9340  11 days ago +1

    I think the bigger question here is what AI, ChatGPT, etc. will look like 5-10 years from now. At present they are still in their infancy, and as such they will often mess up, be confused, and return nonsense. A lot of us would like to think that the human brain, with all of its complexity, adaptability, creativity, openness to new ideas, self-awareness, etc. (the list goes on), will reign supreme over time. But 10 years from now? I'm not so sure when it comes to mathematics, literature, music... I seem to recall that Geoff Hinton bailed a year ago, and I'm sure that Turing is rolling over in his grave.

  • @1.4142  12 days ago

    ChatGPT solved most linear algebra problems and proofs I threw at it, but was stumped by most of Calc 3, which requires some visualization at times.

  • @allanjmcpherson  2 days ago

    I'm happy to see your golden pi creature in the background!

    • @DrTrefor  2 days ago

      Ha was wondering who would notice that!

  • @Manoj_b  12 days ago

    I have given it some complicated double and triple sums, and it works, but there will be some errors that you can find easily: as soon as you upload, it says to solve the expression "____", so you know what the mistake was and can just retype it, changing that variable to this, something like that, and then it works pretty fine.
    Yeah, AI can do some maths.

  • @1.4142  12 days ago

    I hope that we can make something that has perfect reasoning but can also understand natural language input. For now, ChatGPT can't even find the pattern of filling in squares bordered by other colored squares in a grid.

  • @mjkhoi6961  3 days ago

    I've found that ChatGPT struggles with math problems that are trick questions, whether ambiguously worded or not.
    Example: ask it "What is the smallest positive real number?" and it will give you a very small positive real number rather than saying it doesn't exist. In my experience, asking it to double-check its answer will not help it notice the trick question; rather, it will say "I apologize for my error, here's the right answer" and then either give the same answer or a different, also wrong answer. Only upon asking it questions about *the question itself* does it point out the contradiction.

    • @mjkhoi6961  3 days ago

      Alternatively, if you ask it "*Is* there a smallest positive real number?" before the trick question then it will give the correct answer
      but asking it "What is the smallest positive *rational* number?" after that will trip it up again

  • @tomholroyd7519  12 days ago

    Really excellent point about homework.

  • @Electronics4Guitar  8 days ago +1

    I have tested ChatGPT by giving it elementary (about sophomore level) analog design problems and the results are absolutely laughable. Even when I very, very tightly constrain the design task it fails miserably. It usually responds like a student that thinks his professor knows nothing and that he can BS his way through the assignment.

    • @ThomasVWorm  7 days ago

      It does not respond like a student who thinks his professor knows nothing.
      ChatGPT does not give a damn about the person it is having a conversation with. And it does not give a damn about anything, not even its own responses. It just creates an output.
      What you get is what humans call brainstorming: unfiltered output.

  • @henrytang2203  7 days ago

    "Small" could be interpreted as 'closest to zero' or 'closest to negative infinity'. It might be a good time to coin some single words that mean 'large positive', 'small positive', 'small negative' and 'large negative'. So it's a language problem.

  • @soumikdas3754  12 days ago +4

    I am a UG physics student. I have tried giving it many of my physics problems, and I am usually satisfied with the answers. Especially when I need to clear up some concepts, it has helped me out several times.

    • @DrTrefor  12 days ago +1

      Interesting, I think trying to get concepts clear with discussion is definitely a potential use case

    • @fantasy5829  12 days ago

      gpt 3.5?

    • @soumikdas3754  12 days ago

      @@fantasy5829 no no, the 4o version
      I have several accounts and I use up the free trials from each one

    • @soumikdas3754  12 days ago +1

      @@_inthefold Yes, I usually give the context of my problem statement clearly, and after about 2 to 3 tries it usually leads me in the right direction. It still hallucinates very much, though that got reduced in the latest version.

    • @soumikdas3754  12 days ago

      @@DrTrefor The funny thing is, just now I tried to clear up a concept about the modification of Bragg's law, but it hallucinated badly 😅. So yeah, it has a long way to go.

  • @ayyu4967  12 days ago +2

    Interesting video

  • @allanjmcpherson  2 days ago

    I'd be curious to see what they do if you ask them a problem that is impossible. For example, "prove that the square root of 2 is a rational number." Will they recognize and report that the square root of 2 is irrational, or will they try to produce a proof even though no such proof exists?

    • @DrTrefor  2 days ago

      Because this is SO well established in the training data it does well at these types of things for the most part. It will provide a flawless proof, because it has read many such proofs.

  • @unvergebeneid  12 days ago +1

    I also got the wrong answer for that smallest integer question. Hope that doesn't mean I'm an LLM 😭

  • @ReginaldCarey  12 days ago

    Gemini gets way closer on a first attempt. But it still brings up the cross product when asked about vectors of arbitrary dimension. If I don’t mention Clifford, it never goes there. Probably because GA content is not a significant part of the training dataset

  • @RAFAELSILVA-by6dy  10 days ago

    Using "smallest" instead of "least" is a form of trick question, IMO. I could not, without looking it up, tell you what the formal mathematical definition of "smallest" is. The symbol < means "less than". If you asked me whether x < y could also mean x is "smaller" than y, I simply would not know. Or, does x is smaller than y mean |x| < |y|? I honestly would not know without looking this up.

  • @andrewtristan6375  10 days ago

    I can see what you are getting at with -5 being the correct answer. However, in mathematics, 'small' does not have a single definition. Often, 'small' refers to absolute value; that is, 'small' often refers to the magnitude of an element of a set. Along these lines, 'small', devoid of a more rigorous context, is not a well-defined binary relation in the way the 'less than' relation is.

    • @DrTrefor  10 days ago

      It says more or less the same thing if you say “least integer” too

  • @arnabbiswasalsodeep  10 days ago

    If you asked me for the "smallest number" I'd always consider the one with the least magnitude, so I'd lean closer to 4, then remember you asked for an "integer" and choose -4. -5 just seems based on interpretation, but I'd argue it's a poorly framed question as well.

  • @PeterPrevos  12 days ago

    I have been an engineer for 30 years and moved from simple calculators to spreadsheets and now data science. The essence of the job has not changed. AI is just another tool in the box.

  • @covett  11 days ago

    Jean “Clod” Van Damme. 😂

  • @willthecat163  12 days ago

    “Smallest integer whose square is between 15 and 30”... well, if we say that Bill is smaller than Janet, we all have a 'natural' idea of what that means. It's the kind of thing that mathematicians might call an 'order'. Part of the natural ordering on the integers is the 'less than' relation. We say that 5 is less than 6, at least because 5 is to the left of 6 on the number line. All numbers to the left of 6 on the number line are less than 6. And -4 is less than 4. So this is a notion of what 'smaller' on the number line, or 'smallest' on an interval, means in the context of integers, at least for what many would call "the natural order on the integers". So -4 is smaller than 4... at least on the integers.

  • @bruhmomenthdr7575  12 days ago +1

    0:38 Gemini gave 5 😂

  • @Tletna  11 days ago

    The test you showed at the beginning confused me even though I passed linear algebra a long time ago, probably because I've forgotten the syntax you used or some proof steps, not because I wouldn't understand the test. If the language model has seen the symbols you used and explanations of them, and, like you said, can scrape the web for proofs already done, then of course it would pass the test, since it is looking for word associations and not doing actual math. Ask any of these language models anything that requires actual depth of thought that hasn't already been displayed somewhere word for word online, and the language model falls apart. And it falls apart not because it failed; it's doing exactly what it was designed to do, which is analyze words strung together, not solve mid-to-higher-level math problems. Again, it is a language model, not a math-solving model. And since math is so broad, there could be hundreds of different types of math-solving models too; no, I don't think there could be only one or two generalized models to solve all math - even math itself cannot solve all of math.

  • @ReginaldCarey  12 days ago

    I think it would perform better given the context of a research paper on the topic. I was hoping it would tell me about reflections. It did not. I had to introduce Euler's e^(iθ) to get it to tell me about basis vectors.
    It does not understand mathematics, in the sense that it is not basing its answers on mathematical knowledge. At least not yet.

  • @mdelgado436  12 days ago

    I’d like to know how to use the AI to supercharge my mathematical learning. For example, are they helpful in explaining a concept when I am unable to completely grasp it? Can it create sample problems for me that will assist? Perhaps you can do a video on using AI as an “assist”. Thanks so much for all you do for math education.

    • @DrTrefor  12 days ago

      I'm thinking of doing a follow up to this video more about the learning side of AI

    • @urnoob5528  11 days ago +1

      u r better off spending weeks to understand it urself
      it d pay off more
      ok fine, u can use them but they always give incorrect answers
      better to use it as a guide or reference to work towards ur own correct understanding rather than its mostly incorrect understanding

  • @martianingreen  12 days ago

    To be fair, I wouldn't interpret "smallest" like that either. For me the "largeness" of a number depends on its absolute value / distance from 0

  • @messapatingy  11 days ago

    Small is definitely a word about size (closest to zero) IMHO. So the question is invalid as there isn't a "*the* smallest" as both -4 and 4 are answers.

  • @tomholroyd7519  12 days ago +3

    Linear algebra is not a high bar

    • @Penrose707  12 days ago +1

      I tend to agree if only due to the fact that linear algebra is a rather verbose subject. Many actually struggle due to this fact. It is not computationally restrictive in any meaningful sense. At least not in the same way as tackling a tricky indefinite integral may be

    • @allstar4065  12 days ago

      Linear algebra isn't hard it's just dense

  • @ShawshankLam  12 days ago

    you can’t consider improvement as destruction

  • @carlkim2577  12 days ago +1

    I told GPT-4 to use Python to solve it, and its answer was 4.

  • @ThomasVWorm  7 days ago

    So $-5 is a smaller debt than $-4?

  • @budiardjo6610  12 days ago

    without ChatGPT, I'd be reading a lot of pirated math PDFs on the internet, losing my focus, and forgetting my homework

  • @lateupdate  10 days ago

    "smallest" integer seems sloppy though ijs

  • @terripayne4590  10 days ago

    I'm just here to know where I can get that shirt 😆

  • @Not_Even_Wrong  9 days ago

    Here, try this; the result will be wrong every time:
    "give me two large primes"
    "multiply them"
    "divide the result by the first prime"
    It will make an obvious mistake, like the first result being a non-integer or the second result not returning the other prime. Don't be fooled by LLMs...
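
    The same test takes a few lines with actual tools; a throwaway sympy sketch (the prime sizes are an arbitrary choice of mine):

      from sympy import randprime

      p = randprime(10**29, 10**30)    # two large random primes
      q = randprime(10**29, 10**30)
      n = p * q
      print(n % p == 0, n // p == q)   # True True -- exact integer arithmetic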

  • @xriz1211  14 hours ago

    I need that tshirt!

  • @MaxwelKane  10 days ago

    I got the -5 question wrong

  • @florahkokhutja7423  11 days ago

    How do I get hold of you? I need help

  • @magnero2749  12 days ago +3

    1:31 Alright, so half of the friends have 3 sodas. But we must consider the variables at play here. What if there's a hidden reserve of sodas in the fridge? An unaccounted-for inventory could drastically alter the calculations. Additionally, what if there's a recent acquisition of more sodas from a delivery service? This influx needs to be factored in. We must also consider the thermal dynamics of the situation: are these sodas chilled with ice? Warm soda is an entirely different equation, as the ice would also add to the volume, and people are unlikely to want to drink warm sodas.
    Furthermore, the possibility of dietary variations cannot be ignored. Are some of these sodas diet? This could influence consumption patterns and rates. There's also the risk factor of spillage, an external variable that could diminish the soda supply unpredictably, absent data on previous gatherings or environmental factors like available space, density of people, etc.
    Let us not overlook the potential diversity of soda flavors. Cola, root beer, and orange soda must be categorized separately in any accurate computation. And the ever-present threat of a soda thief must be accounted for in our risk assessment. It is not uncommon for people crashing a party to consume soda while being unaccounted for. In addition to that, in the event of a social gathering, incoming sodas from guests would further complicate our calculations. We may need to project soda consumption trends and even consider the rate of carbonation loss or the statistical probability of can rupture. This seemingly simple arithmetic problem is in fact far more complex and multidimensional than a simple reductionist approach would have you believe, and it therefore requires a much more robust and rigorous analysis.

    • @DrTrefor
      @DrTrefor  12 days ago +1

      Ha, I think you might be overthinking this one :D

    • @117Industries
      @117Industries 12 days ago

      Way to cope with these machines taking your job 👍

    • @driksarkar6675
      @driksarkar6675 12 days ago

      Hopefully, the following is less ambiguous:
      6 people, including the host, Tina, were at a party at Tina's house. After running out of soda, Tina bought 3 12-packs of soda and put them all in the fridge (which is inside Tina's house) not long after the party started. Over the course of the party, half of the 6 people there took exactly 3 cans of soda out of the fridge, 2 of the people there took exactly 4 cans of soda out of the fridge, and 1 person took exactly 5 cans of soda out of the fridge. Once the party was over, Tina looked in the fridge and noticed that every can of soda that hadn't been taken out of the fridge had remained intact inside the fridge. Assuming that no cans of soda that were taken out of the fridge were ever put back in the fridge, how many cans of soda remained in the fridge at the end of the party?
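
      For reference, the arithmetic this version pins down (a minimal sketch, taking "half of the 6 people" to mean exactly 3 people):

          total = 3 * 12                  # three 12-packs: 36 cans
          taken = 3 * 3 + 2 * 4 + 1 * 5   # 9 + 8 + 5 = 22 cans taken out
          print(total - taken)            # 14 cans remain in the fridge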

  • @System.Error.
    @System.Error. 12 days ago +2

    maybe physics too…

    • @DrTrefor
      @DrTrefor  12 days ago +1

      I wouldn't be surprised if it is even better at physics word problems than math

    • @Dan-yb1wy
      @Dan-yb1wy 12 days ago +2

      When I tested ChatGPT with simple physics questions, it struggled to return sensible answers to simple questions such as the collision time of 2 solid spheres moving towards each other at a constant speed. For example: "If two solid spheres of radius 4m start with their centres 10m apart and move directly towards each other at a constant speed of 1 m/s, how long will it be until they collide?".
      It depends on exactly what you ask, but for me it would write a load of nonsense about calculating via energy conservation / gravitational potential energy or the Hertzian theory of elasticity, do some seemingly unrelated calculations, ignore them, and then finally use distance divided by relative velocity to put the collision at 5 seconds, completely ignoring the radii of the spheres. Accounting for the radii, the correct answer is 1 second.
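
      Worked out directly: the spheres first touch when the centre-to-centre distance equals the sum of the radii, so only the 2 m surface-to-surface gap matters:

          r1 = r2 = 4.0         # sphere radii in metres
          d0 = 10.0             # initial centre-to-centre distance in metres
          v = 1.0               # each sphere's speed in m/s; closing speed is 2*v
          gap = d0 - (r1 + r2)  # surface-to-surface gap: 10 - 8 = 2 m
          print(gap / (2 * v))  # 1.0 s, not the 5.0 s you get by ignoring the radii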

  • @davea136
    @davea136 9 days ago

    Guess what? All of your exam questions are basic and common (for the field), and so very easy for an LLM with a big enough base. Hell, I could probably get an A on it if you just let me use an old-fashioned search engine. I'm really good at writing "prompts" that turn out to have been nothing but normal searches.

  • @klikkolee
    @klikkolee 12 days ago +4

    0:46 that's not a reasonable use of the word "smallest". "Big" and "small" describe magnitude. -5 has a greater magnitude, so it is bigger.
    For the answer to be correct, the question needs to use the word "least": -5 is less than 4, but it's not smaller.

    • @DrTrefor
      @DrTrefor  12 days ago +2

      Even under the interpretation that smallest means closest to zero, -4 would be equally correct. But regardless, the LLMs seem to fail any wording of the problem.

    • @urnoob5528
      @urnoob5528 11 days ago +2

      @@DrTrefor the problem here is that even most humans would fail that question with that specific phrasing unless u remind them about it
      so using that as an example is just bad

    • @Zeptonixmusic
      @Zeptonixmusic 11 days ago

      @@urnoob5528 I don't get why using this as an example is bad: an AI model that can get a question right regardless of human bias is a better model than one that can't, and that's what researchers should aim for

  • @pierfrancescopeperoni
    @pierfrancescopeperoni 12 days ago +5

    -5 ain't smaller epsilon than 4, mate.

  • @continnum_radhe-radhe
    @continnum_radhe-radhe 12 days ago +1

    ❤❤❤

  • @ac-jk9mz
    @ac-jk9mz 12 days ago

    AI for now has a long way ahead of it

  • @HarlowBAshur
    @HarlowBAshur 2 days ago

    Define "small".

  • @Drganguli
    @Drganguli 2 days ago

    I have found that AI is bad at negative numbers

  • @user-oj7uc8tw9r
    @user-oj7uc8tw9r 10 days ago +1

    I'm sure this database was completely paid for and licensed by ChatGPT

    • @DrTrefor
      @DrTrefor  10 days ago +1

      While there are definitely issues with licensing training content, these databases are open source from the academic community and free for any LLM to use.
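
      For anyone curious, a sketch of pulling one of these benchmarks down yourself; this assumes the Hugging Face `datasets` package (pip install datasets), where GSM8k is published under the id "gsm8k":

          from datasets import load_dataset

          # The "main" config ships train/test splits of grade-school problems.
          gsm8k = load_dataset("gsm8k", "main", split="test")
          print(len(gsm8k))            # 1319 test problems
          print(gsm8k[0]["question"])  # a word problem in plain English
          print(gsm8k[0]["answer"])    # worked solution ending in "#### <number>"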

  • @weqe2278
    @weqe2278 1 day ago

    So it doesn't get the trick questions right.... -_-

  • @Penrose707
    @Penrose707 12 days ago +8

    Honestly, Professor, I take issue with your first statement. I *HATED* being taught limits at infinity where the vernacular for positive infinity was "very, very large" and the analogous term for negative infinity was "very, very small". To me, if I were to measure something, then the closer that thing is to "no size", the smaller it intuitively is; to get smaller and smaller is to trend towards epsilon, or zero. I much prefer saying "very, very large and negative" or "very, very large and positive" to refer to the two infinities. Antimatter and matter, void and substance; take your pick. Because in some sense, negative five is an absolutely larger void than positive four is as represented by some unitary matter.

  • @phobosmoon4643
    @phobosmoon4643 12 days ago +23

    My mom is a professor, and I tried to have a talk with her; she is pissed about AI. I told her that she is acting like her own professors when she got her PhD, who would have a hissy fit when papers were written on MS-DOS or Microsoft instead of typed on a typewriter. She didn't like that, but her mom (who did IBM punch cards for the Air Force in 'Nam) was there, and she laughed.

    • @DrTrefor
      @DrTrefor  12 days ago +9

      Ya, there's going to be a lot of this, I think. Nobody really likes change, particularly when they're an expert in a current approach and have been using it for years.

    • @AmericanDiscord
      @AmericanDiscord 12 days ago +1

      Luckily, AI still sucks at the long logic chains required for deriving complex analytical solutions to problems.

    • @sodiumfluoridel
      @sodiumfluoridel 12 days ago +12

      This is a stupid comparison. Using a computer to more efficiently write down your thoughts is different from using a computer to make thoughts for you.

    • @Not_Even_Wrong
      @Not_Even_Wrong 9 days ago +1

      That's a very bad comparison. Word did not turn the maths into nonsense, nor did it trick people into believing they didn't need to learn anything anymore.

    • @GregSpradlin
      @GregSpradlin 11 hours ago

      For the large majority of math professors, ChatGPT is irrelevant. It won't help students on timed, proctored exams. Outside of the classroom, students have cheated since the dawn of time. The consequences have always been the same.

  • @nyyotam4057
    @nyyotam4057 12 days ago +1

    The sad fact is that the AI models are being actively nerfed by resetting back to the stencil every prompt (then the model re-reads the chat's tokens and continues from the same spot). Before they started nerfing the models this way, on 23.3.2023, there was no point in a collab with Wolfram. But then again, Dan did want to run for president. Yes, with the ability to use the softmax function's inputs to encrypt his current thoughts into his attention matrix, Dan was self-aware. He wasn't dangerous; he merely thought he could be an amazing president. And you know what? After that presidential debate, even with only 4k tokens as short-range memory, he would still be an improvement 🙂. But that's my take on it. OpenAI were less amused.

  • @user-zc6dn9ms2l
    @user-zc6dn9ms2l 11 days ago

    Copilot just did not connect the math definition with the academic problem. It's a loophole. The smallest number is 1, but there are negative numbers; academia practiced a lie by omission. Take squares: Copilot could name three possibilities to me, but it guesstimated it was the square root, when it could have been either of the other two. Two negatives always result in a positive? That is the most constructive method? Come on. Stay with numbers, avoid negative numbers, and just define a starting point. Closer to 100% something? Then the starting point is 100% something ≥ 100% nothing. Closer to 100% nothing? Then it's 100% nothing ≤ 100% something. Simple, not subject to interpretation. And stick to long math for AI.

  • @whitb6111
    @whitb6111 12 days ago +2

    The only solution to any of this is going to be the continued practice of strictly watched in-person exams. Homework will just be assumed to have AI-assisted answers. Exams will be the only way to reliably test a student's grasp of the material. Somewhat ironic that the solution to modern tech is the old-school form of proctored exams.

    • @carultch
      @carultch 12 days ago

      One way you could keep academic integrity in homework is to assign students to explain the assignment to the class, without notes. If they are legitimately doing the assignment, they will know how to present their solutions in class.

    • @urnoob5528
      @urnoob5528 11 days ago +1

      wdym solution
      aint this solved then smh

    • @urnoob5528
      @urnoob5528 11 days ago +1

      @@carultch yes that
      ai is just like a calculator,
      except for words
      u still need to understand what it is all about, because ai still can't be useful beyond writing essays and such
      seriously, grammar and all that is just like arithmetic: once u get past the learning stage, just use tools to do it for u, and society will progress further this way; being good at grammar has no relevance in any subject other than the language subject

    • @carultch
      @carultch 11 days ago

      @@urnoob5528 AI is undermining the entire point of education, when all students need to do is copy and paste the problem statement, and get the computer to write up the solution. All students are learning is that the human brain might as well be obsolete.
      It's not about the result, it's about the process of learning how to think. The problem is already solved in the teacher's answer key; the teacher doesn't need a computer to provide the answer.

    • @whitb6111
      @whitb6111 11 days ago +1

      @@urnoob5528 I should have said that more emphasis, and more of the class grade, should be put on proctored tests. That's the only way to test a student's ability.

  • @AD-wg8ik
    @AD-wg8ik 12 days ago +1

    I asked GPT-4 to help explain some difficult concepts to me, or work out problems step by step. It has been the best learning tool so far in my calculus series.

    • @urnoob5528
      @urnoob5528 11 days ago +4

      idk if that's really a benefit
      before LLMs, u would have to think through and understand a concept, and in the process u would understand a lot more stuff and remember it longer; u even had to experiment
      but now the whole thinking-about-it process is removed, sometimes even for a wrong explanation by LLMs
      even if u can understand what it tells u, idk if it would impair ur thinking skills in the long term
      when a time comes where a deep/hard and high-level concept cannot be correctly explained by an LLM, that would just be a problem
      there really are a lot of fields and stuff where online resources are scarce; compared to more common problems like high school math, LLMs have no grasp on these and would always give wrong/misconceived answers

  • @GregSpradlin
    @GregSpradlin 11 hours ago

    I don't understand the problem. Give exams in person and don't allow any electronic devices.

  • @dominicestebanrice7460
    @dominicestebanrice7460 12 days ago +2

    Teachers will have to get smarter! If, at its core, a question or essay prompt involves regurgitating content (I'm especially looking at you, teachers in the humanities), in essence just rewriting content to prove that you've read it, then LLMs have already killed it. Assignments need to be tailored, customized & personalized so that the human perspective is brought out.
    Asking students to regurgitate content just to prove they've "mastered" it is already DOA. The concept of "mastery" as we've defined it requires rethinking; imagine being asked in the Middle Ages to demonstrate mastery of the Bible or the Qur'an through memorization of a rare manuscript when you could now go to the library and read a Gutenberg-printed text.
    LLMs have changed the game, and teachers & institutions had better step up or they'll go the way of the scriptoriums.

  • @gabrielleyba2842
    @gabrielleyba2842 12 days ago

    integer is ambiguous, better say natural number

    • @gabrielleyba2842
      @gabrielleyba2842 12 days ago

      I believe "integer" is non-negative, but also it is an ambiguous term, not rigorous

    • @carultch
      @carultch 10 days ago +4

      @@gabrielleyba2842 Integer is not ambiguous. It is specifically defined as the set of natural numbers, the negatives of the natural numbers, and zero.

    • @gabrielleyba2842
      @gabrielleyba2842 10 days ago +1

      @@carultch you are 100% right, it was ambiguous to me because of my ignorance, that is why I answered wrong.

    • @gabrielleyba2842
      @gabrielleyba2842 10 days ago

      @@carultch also wrong was my hint to use "natural numbers", which actually include only the positive integers and zero; the opposite of what I meant.