AlphaGeometry: Solving olympiad geometry without human demonstrations (Paper Explained)

  • Published 17 May 2024
  • #deepmind #alphageometry #llm
    AlphaGeometry, by Google DeepMind, combines a symbolic solver with a large language model to tackle IMO geometry questions without any human-generated training data.
    OUTLINE:
    0:00 - Introduction
    1:30 - Problem Statement
    7:30 - Core Contribution: Synthetic Data Generation
    9:30 - Sampling Premises
    13:00 - Symbolic Deduction
    17:00 - Traceback
    19:00 - Auxiliary Construction
    25:20 - Experimental Results
    32:00 - Problem Representation
    34:30 - Final Comments
    Paper: www.nature.com/articles/s4158...
    Abstract:
    Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning [1-4], owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges [1,5], resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004.
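    The neuro-symbolic loop the abstract describes (a language model guiding a symbolic deduction engine through branching points) can be sketched roughly as follows. This is a hypothetical illustration, not the paper's actual code; every name in it (solve, deduce_closure, propose_construction) is made up:

```python
# Hypothetical sketch of the alternation the abstract describes: a symbolic
# engine forward-chains deduction rules; when the goal is still out of reach,
# a language model proposes an auxiliary construction and deduction resumes.
# All names are illustrative, not the paper's API.

def solve(premises, goal, propose_construction, deduce_closure, max_constructions=10):
    """Alternate symbolic deduction with LM-suggested auxiliary constructions."""
    facts = set(premises)
    for _ in range(max_constructions + 1):
        facts |= deduce_closure(facts)                 # symbolic deduction step
        if goal in facts:
            return True                                # proof found
        new_facts = propose_construction(facts, goal)  # LM guesses a construction
        if not new_facts:
            return False                               # LM out of ideas
        facts |= new_facts                             # add its defining facts
    return False

# Toy instantiation: facts are ("eq", x, y) triples; the only deduction rule
# is transitivity of equality.
def deduce_closure(facts):
    return {("eq", a, d)
            for (_, a, b) in facts
            for (_, c, d) in facts
            if b == c}

suggestions = iter([{("eq", "B", "C")}])               # scripted "LM" suggestion
found = solve({("eq", "A", "B")}, ("eq", "A", "C"),
              lambda facts, goal: next(suggestions, None),
              deduce_closure)
```

    The scripted "LM" here stands in for the real model's suggestions; the design point is only that deduction and construction alternate until the goal becomes derivable.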
    Authors: Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He & Thang Luong
    Links:
    Homepage: ykilcher.com
    Merch: ykilcher.com/merch
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: ykilcher.com/discord
    LinkedIn: / ykilcher
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

COMMENTS • 70

  • @paxdriver
    @paxdriver 3 months ago +28

    Thanks Yan, I love these paper review videos.

  • @AndyHOTlife
    @AndyHOTlife 3 months ago +9

    I remember that back in school most of the people my age had difficulties with geometry specifically due to these extra constructions. Some had such difficulties with these that several jokes inevitably emerged. It is very interesting to see that, similar to school pupils, an ML model also struggles with these. But unlike school pupils with their geometry homework, an ML model living in the cloud has enough compute power to propose and reason about several constructions in the blink of an eye. Thank you for the video!

  • @DrDanielCho-Kee
    @DrDanielCho-Kee 1 month ago +2

    Great video very insightful! What programs do you use to record your videos? I love the paper backdrop with annotations with you at the bottom!

  • @meinbherpieg4723
    @meinbherpieg4723 1 month ago +1

    This video is criminally underappreciated. You have succinctly covered some of the most nuanced aspects of cutting edge AI research. Thank you.

  • @piratepartyftw
    @piratepartyftw 3 months ago +27

    I expect this time next year (or sooner?), we'll be reading a similar paper where someone applied this kind of method to mathematics more broadly, leveraging something like the Lean language. It might be Coq or Isabelle instead, though, as those languages have more mature "autoprovers" of the kind used in this paper.

    • @piratepartyftw
      @piratepartyftw 3 months ago +6

      My money's on Lean, long term. It's growing fastest, has the best developer ecosystem, and it's got a cohesive single library of proofs that can act as training data. The things it's missing are a good autoprover and a good tokenizer (it's hard to tokenize because it uses tons of Unicode), but both are actively being developed.

    • @HughBlackstone-tm6bw
      @HughBlackstone-tm6bw 3 months ago

      That's on God

  • @Buddharta
    @Buddharta 3 months ago +22

    Geometry facts:
    The first triangle is isosceles, not equilateral.
    The triangles A and B at around 14:00 are congruent (have the same angles).
    Four points are cyclic (lie on a circle) if the sums of opposite angles are 180°.
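    The 180° criterion can be checked numerically; a quick sketch in plain Python (the helper and the specific points are my own illustration, placed on the unit circle by construction so the quadrilateral is cyclic):

```python
import math

def angle_at(p, a, b):
    """Angle at vertex p between rays p->a and p->b, in degrees."""
    v1 = (a[0] - p[0], a[1] - p[1])
    v2 = (b[0] - p[0], b[1] - p[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

# Four points on the unit circle, in order around it -> cyclic quadrilateral ABCD.
A, B, C, D = [(math.cos(t), math.sin(t)) for t in (0.3, 1.4, 2.9, 5.0)]

# Opposite interior angles of a cyclic quadrilateral sum to 180 degrees.
sum_A_C = angle_at(A, D, B) + angle_at(C, B, D)
sum_B_D = angle_at(B, A, C) + angle_at(D, C, A)
```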

    • @warrenarnold
      @warrenarnold 3 months ago +3

      Good thing you kept attention in class coz that thing is still coming to replace you 💔

    • @costadekiko
      @costadekiko 3 months ago

      Adding here that opposite angles summing to 180° is only one of the criteria for cyclic quadrilaterals, and it is the one that applies to EADH (two right angles at E and D).
      The one that most easily applies to EBCD: a side of the quadrilateral lies “under” the other two points with equal angles (in this case BC is “under” E and D, with right angles).

    • @RajarshiBose
      @RajarshiBose 3 months ago

      @Buddharta Congruent doesn't just mean triangles with the same angles, though congruent triangles do have the same angles. If two triangles are congruent, you can place one completely on the other (they are equal in every respect: sides and angles). If only the angles are the same, the triangles are called similar.
      Two similar triangles might not be congruent.
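      The distinction can be seen numerically; a small sketch (the triangle coordinates are my own example): scaling a triangle by 2 keeps every angle but doubles every side, so the copy is similar, not congruent.

```python
import math

def angles(tri):
    """Interior angles of a triangle (degrees), sorted."""
    def ang(p, a, b):
        v1 = (a[0] - p[0], a[1] - p[1])
        v2 = (b[0] - p[0], b[1] - p[1])
        cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        return math.degrees(math.acos(cos))
    p, q, r = tri
    return sorted((ang(p, q, r), ang(q, p, r), ang(r, p, q)))

def sides(tri):
    """Side lengths of a triangle, sorted."""
    p, q, r = tri
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    return sorted((dist(p, q), dist(q, r), dist(r, p)))

t1 = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]
t2 = [(2 * x, 2 * y) for x, y in t1]   # same angles (similar), doubled sides (not congruent)
```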

    • @Buddharta
      @Buddharta 3 months ago

      @@RajarshiBose True, the triangles are similar, not congruent. I confused the definitions.

  • @mahiainti678
    @mahiainti678 3 months ago +9

    Very interesting paper. While quite limited, this approach leaves no room for any hallucinations/inaccuracies, which is a great plus. I'm interested in how/where it can be applied further.

    • @asdfasdfasdf1218
      @asdfasdfasdf1218 3 months ago +2

      Not much further yet, because it's narrow: it can only find auxiliary points for 2D geometry problems. I'm sure DeepMind is working on AI for math in general, though it'll likely come in increments, step by step. But it'll definitely be huge if it ever masters all of math, not just 2D geometry.

  • @-mwolf
    @-mwolf 3 months ago +8

    There might be an application in chip design actually...?

  • @googleyoutubechannel8554
    @googleyoutubechannel8554 3 months ago +2

    The search space for these geometry problems, if you go about it even marginally reasonably, seems very small compared with the scale of data processing available today for, say, 1024x1024 diffusion. It seems very reasonable to 'solve' such tightly constrained problems.

  • @oncedidactic
    @oncedidactic 3 months ago +2

    Imagine a Go board of infinite size with randomly initialized "walls" defining a usable play area. This is AlphaGo for that.

  • @ai-interview-questions
    @ai-interview-questions 3 months ago +1

    Thank you, Yan!

  • @boffo25
    @boffo25 3 months ago +3

    Beautiful lesson

  • @greengoblin9567
    @greengoblin9567 3 months ago +8

    If the large language model were somehow able to change the symbolic engine and the synthetic data, it could be adapted to any problem. For example, the language model could switch at will between different symbolic engines it has created, continuously, for better logical reasoning at inference.

    • @wege8409
      @wege8409 3 months ago +9

      The researchers said in the paper that readers should be advised it may be difficult to adapt this solution to other domains. Geometry has really nice properties that allow them to do it this way. I'm sure we'll figure something out soon, though.

    • @greengoblin9567
      @greengoblin9567 3 months ago +3

      @@wege8409 I am talking about other domains of math. Basically, you can have a library of finely tuned symbolic engines and toggle between them.

    • @wurstelei1356
      @wurstelei1356 3 months ago

      @@greengoblin9567 The paper also stated that it required a lot of compute power to get to that level. It would be interesting if the weights were released; I can find the GitHub page with the code, but no weights. Then one could try to adapt it to other domains.

    • @Gingnose
      @Gingnose 3 months ago

      @@wege8409 I believe geometry carries over into many fields of mathematics, though.

    • @andreaterlizzi
      @andreaterlizzi 3 months ago +1

      @@greengoblin9567 The problem is that other domains of math (e.g. calculus, linear algebra, differential topology) are very different from Euclidean geometry. The latter is probably a complete logical system, and is MUCH less expressive than e.g. calculus, in Gödelian terms. This is because the other (and all the relevant) domains of math are probably logically incomplete by Gödel's theorem, so it's impossible to have a set of statements you can sample from to get reasonable coverage of the domain. All the things that AlphaGeometry does with, say, 90% accuracy, a symbolic solver could have done efficiently with 100% accuracy (and that's why they could exploit it for training the LLM).

  • @LeePenkman
    @LeePenkman 3 months ago +3

    I'm thinking most problem domains, even code, can be treated as following long reasoning chains, but they would likely also involve fuzzy review steps, like LLM code review, to help speed things up.

  • @AntonMilan
    @AntonMilan 3 months ago

    Is the sole purpose of the LLM to suggest which construct to sample at each step? If so, why use an LLM at all? Is it because it is able to better approximate the optimal sampling strategy?

  • @lerpmmo
    @lerpmmo 2 months ago

    Could we apply the same concept to train a model to write software, like Devin AI?

  • @albinoameise
    @albinoameise 24 days ago

    Can someone shed some light on why to use an LLM instead of some Q-learning approach with a fixed action space? I don't see why AlphaGeometry relies on LLMs to search for the best action...

  • @onurozkan1077
    @onurozkan1077 3 months ago

    I think that reverse engineering could largely benefit from this approach (going from assembly to a language), especially if the target language is known (and imperative). Then a compiler becomes a solver, and statements/expressions become premises (constructs, if you will) in that sense. However, I may be wrong...

  • @robertfontaine3650
    @robertfontaine3650 3 months ago +1

    Using agents and LLMs to drive tactical changes to achieve objectives seems to be everyone's trick of the week.

  • @kaikapioka9711
    @kaikapioka9711 3 months ago

    Thz!

  • @volkerengels5298
    @volkerengels5298 3 months ago

    I can't think that properly myself - but this thing may help reasoning...

  • @paxdriver
    @paxdriver 3 months ago +2

    Isn't that an "isocolese" or something like that?

    • @user-wd8wx5md5z
      @user-wd8wx5md5z 3 months ago +1

      Yeah, an isosceles triangle. Yannic is not a high-school math teacher hahaha

    • @ekstrapolatoraproksymujacy412
      @ekstrapolatoraproksymujacy412 3 months ago +1

      @@user-wd8wx5md5z Yannic is not a native English speaker. I'm not one either, and I didn't know it was called that in English, but of course we both know what it is lol

    • @paxdriver
      @paxdriver 3 months ago

      It might actually be equilateral. I'm not a high-school math teacher either, and I am a native English speaker lol

  • @AngouSana69
    @AngouSana69 3 months ago +3

    Every video makes my eyes water, I'm not used to light.

    • @Hexanitrobenzene
      @Hexanitrobenzene 3 months ago

      Yeah, I use grey background for pdfs. I don't know any trick for videos, though...

  • @luke2642
    @luke2642 3 months ago

    This video has saddle geometry: it starts off with an exciting premise, then rapidly becomes incredibly disappointing (a classical algorithm would be fine), then recovers at the end, if the compute budget really was modest.

  • @amafuji
    @amafuji 3 months ago +5

    Stable Diffusion occasionally solves non-Euclidean geometry. It's horrifying.

    • @drdca8263
      @drdca8263 3 months ago

      Like, hyperbolic geometry, or spherical geometry, or...?

    • @leolrg2783
      @leolrg2783 3 months ago +1

      Riemannian geometry

    • @drdca8263
      @drdca8263 3 months ago

      @@leolrg2783 Hm? That's more of a general topic than a specific geometry, though?
      Like, Euclidean geometry, spherical geometry, and hyperbolic geometry are all described within it?

    • @Hexanitrobenzene
      @Hexanitrobenzene 3 months ago +2

      @amafuji
      Could you provide some links ? Because your statement is rather bold.

    • @timmygilbert4102
      @timmygilbert4102 3 months ago +1

      People... That's a joke about how mangled image generations are 😂

  • @theosalmon
    @theosalmon 3 months ago +5

    ... suggest you capture a new profile picture around 8:20.

  • @user-if1ly5sn5f
    @user-if1ly5sn5f 3 months ago

    You should look into sacred geometry. It sounds weird, but if you understand it (and it's simple), then you can use that understanding, like the AI does, to see things not in our current view. Kinda like how the Wright brothers saw a plane in the pieces of reality, assisted by math and language and such to help find it.

  • @nevokrien95
    @nevokrien95 3 months ago

    This is ridiculously impressive. The olympiad draws top young mathematicians, so an AI doing it is a HUGE breakthrough by itself.

  • @nikitastaf1996
    @nikitastaf1996 3 months ago +1

    Take my words with a grain of salt; I deduced this with Claude's help. They are not solving the IMO per se. What they are saying with this paper: we can now reliably generate a huge amount of synthetic data (at least for geometry). They use an extremely small model (around 1 GB in size), too weak on its own, to help with an essentially brute-force approach to solving geometry. My predictions: if we apply this approach to a large model, and on top of it apply something like chain of thought (which they essentially do in this paper, but with a brute-force solver), it will perform significantly better on geometry problems than any human alive.

  • @atypocrat1779
    @atypocrat1779 3 months ago +7

    0:17: 📐 Innovative paper introduces Alpha Geometry model for solving challenging geometry problems without human demonstrations.
    5:18: 🔍 Solving complex geometry problem without human demonstrations requires specific language model and training.
    10:27: ⚙ Automated geometry problem solving using algebraic reasoning and a small set of representations.
    15:57: 🔍 Logical deduction and proof process in geometry problem solving.
    20:59: ⚙ Symbolic deduction and traceback in problem solving process.
    25:58: 📝 Advanced system solves math Olympiad problems with lengthy proofs, outperforming previous methods.
    31:32: ⚙ Challenges in adapting math problems to a specialized language for proof steps and the correlation of problem difficulty with the number of required steps.
    Recap by Tammy AI

    • @vzxvzvcxasd7109
      @vzxvzvcxasd7109 3 months ago +2

      I was gonna say, that reply was way too quick.
      AI

    • @notu483
      @notu483 3 months ago

      Is Tammy better than Harpa?

  • @nevokrien95
    @nevokrien95 3 months ago

    Isn't 25 75% of 30...

    • @Hexanitrobenzene
      @Hexanitrobenzene 3 months ago

      No? 25/30 = 5/6 ≈ 0.8333...

    • @ShA-ib1em
      @ShA-ib1em 2 months ago +2

      The researchers seem to say that their model can solve 75% of the problems, and then that they solved 25 out of the 30 problems in the competition.
      However, they seem to mean that 75% of all the problems can be represented in the language they chose to use.
      They then included only the problems that can be represented in that language,
      and solved 25 of the 30 problems they tested the model on...
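      The arithmetic in this thread, as a quick sanity check:

```python
solved, total = 25, 30
solve_rate = solved / total   # 25/30 = 5/6, about 83%, not 75%

# The 75% figure is about representability, not the solve rate:
# roughly three quarters of IMO geometry problems could be stated
# in the system's formal language at all.
```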

  • @ruchirgarg2725
    @ruchirgarg2725 3 months ago +1

    Get a grip on geometry sensei.

  • @moeinhasani8718
    @moeinhasani8718 3 months ago

    I wonder why a group like this still cares about getting their papers published in journals. They were done with this project in April, but because of the paper they only released the code in December. That's roughly 9 months during which the community could have had access to the source code, but they waited for journal acceptance.

  • @telesniper2
    @telesniper2 3 months ago +1

    Wow they discovered how to do backtracking, but the hard way. LOL nubs. I love "research" that is just like nubbing for stock pumps

    • @lilithstenhouse267
      @lilithstenhouse267 3 months ago

      Isn't the key that they managed to reduce the search space on a problem with an infinite search domain down far enough that it can be backtracked? It's certainly not widely applicable yet, but it's potentially a very powerful premise

    • @Hexanitrobenzene
      @Hexanitrobenzene 3 months ago

      @@lilithstenhouse267
      Yes, @telesniper2 is too dismissive for some reason. Brute force search would run into combinatorial explosion very quickly. LLMs are precisely for reducing the search space, that is, suggesting promising solutions.