Is the Future of Linear Algebra.. Random?

  • Published May 1, 2024
  • The machine learning consultancy: truetheta.io
    "Randomization is arguably the most exciting and innovative idea to have hit linear algebra in a long time." - First line of the Blendenpik paper, H. Avron et al.
    Follow up post: truetheta.io/concepts/linear-...
    SOCIAL MEDIA
    LinkedIn : / dj-rich-90b91753
    Twitter : / duanejrich
    Github: github.com/Duane321
    SUPPORT
    / mutualinformation
    SOURCES
    Source [1] is the paper that caused me to create this video. [3], [7] and [8] provided a broad and technical view of randomization as a strategy for NLA. [9] and [12] informed me about the history of NLA. [2], [4], [5], [6], [10], [11], [13] and [14] provide concrete algorithms demonstrating the utility of randomization.
    [1] Murray et al. Randomized Numerical Linear Algebra. arXiv:2302.11474v2 2023
    [2] Melnichenko et al. CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT). arXiv:2311.08316v1 2023
    [3] P. Drineas and M. Mahoney. RandNLA: Randomized Numerical Linear Algebra. Communications of the ACM. 2016
    [4] N. Halko, P. Martinsson, and J. Tropp. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. arXiv:0909.4061v2 2010
    [5] Tropp et al. Fixed Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data. NeurIPS Proceedings. 2017
    [6] X. Meng, M. Saunders, and M. Mahoney. LSRN: A Parallel Iterative Solver for Strongly Over- or Underdetermined Systems. SIAM. 2014
    [7] D. Woodruff. Sketching as a Tool for Numerical Linear Algebra. IBM Research Almaden. 2015
    [8] M. Mahoney. Randomized Algorithms for Matrices and Data. arXiv:1104.5557v3. 2011
    [9] G. Golub and H van der Vorst. Eigenvalue Computation in the 20th Century. Journal of Computational and Applied Mathematics. 2000
    [10] J. Duersch and M. Gu. Randomized QR with Column Pivoting. arXiv:1509.06820v2 2017
    [11] Erichson et al. Randomized Matrix Decompositions Using R. Journal of Statistical Software. 2019
    [12] J. Gentle et al. Software for Numerical Linear Algebra. Springer. 2017
    [13] H. Avron, P. Maymounkov, and S. Toledo. Blendenpik: Supercharging LAPACK's Least-Squares Solver. SIAM. 2010
    [14] M. Mahoney and P. Drineas. CUR Matrix Decompositions for Improved Data Analysis. Proceedings of the National Academy of Sciences. 2009
    TIMESTAMPS
    0:00 Significance of Numerical Linear Algebra (NLA)
    1:35 The Paper
    2:20 What is Linear Algebra?
    5:57 What is Numerical Linear Algebra?
    8:53 Some History
    12:22 A Quick Tour of the Current Software Landscape
    13:42 NLA Efficiency
    16:06 Rand NLA's Efficiency
    18:38 What is NLA doing (generally)?
    20:11 Rand NLA Performance
    26:24 What is NLA doing (a little less generally)?
    31:30 A New Software Pillar
    32:43 Why is Rand NLA Exceptional?
    34:01 Follow Up Post and Thank You's

COMMENTS • 351

  • @charilaosmylonas5046
    @charilaosmylonas5046 23 days ago +229

    Great video! I want to add a couple of references to what you mentioned in the video related to neural networks:
    1. Ali Rahimi got the NeurIPS 2017 "test of time" award for a method called random kitchen sinks (a kernel method with random features).
    2. Choromanski (from Google) made a variation of this idea to alleviate the quadratic memory cost of self-attention in transformers (which also works like a charm - I tried it myself, and I'm still perplexed that it didn't become one of the main efficiency improvements for transformers). Check "Rethinking Attention with Performers".
    Thank you for the great work on the video - keep them coming please! :)
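
    To make the "random kitchen sinks" idea concrete, here is a minimal NumPy sketch of random Fourier features. It assumes an RBF kernel k(x, y) = exp(-||x - y||^2 / (2*sigma^2)); the function name, feature count, and bandwidth below are illustrative choices, not anything from the original paper.

      import numpy as np

      rng = np.random.default_rng(0)

      def random_fourier_features(X, n_features=500, sigma=1.0):
          # Map X (n_samples, d) to features whose dot products approximate the RBF kernel.
          d = X.shape[1]
          W = rng.normal(0.0, 1.0 / sigma, size=(d, n_features))  # random frequencies
          b = rng.uniform(0.0, 2 * np.pi, size=n_features)        # random phases
          return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

      X = rng.normal(size=(200, 10))
      Z = random_fourier_features(X)
      K_approx = Z @ Z.T                                                 # approximate kernel matrix
      K_exact = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)  # exact RBF kernel, sigma = 1
      print(np.abs(K_approx - K_exact).max())                            # error shrinks as n_features grows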

    • @howuhh8960
      @howuhh8960 22 days ago +8

      It didn't catch on because all the efficient variations have significantly worse performance on retrieval tasks (associative recall, for example), as recent papers have demonstrated.

    • @Arithryka
      @Arithryka 18 days ago

      The Quadratic Memory Cost of Self-Attention in Transformers is my new band name

  • @octavianova1300
    @octavianova1300 23 days ago +668

    reminds me of that episode of veggie tales when larry was like "in the future, linear algebra will be randomly generated!"

    • @NoNameAtAll2
      @NoNameAtAll2 23 days ago +41

      W E E D E A T E R

    • @rileymurray7437
      @rileymurray7437 22 days ago +11

      Reminds you of what???

    • @jedediahjehoshaphat
      @jedediahjehoshaphat 22 days ago +6

      xD

    • @Godfather-qr6ej
      @Godfather-qr6ej 21 days ago +3

      I thought it would be some nice science show, but it turns out to be some kids show : (

    • @notsojharedtroll23
      @notsojharedtroll23 21 days ago

      ​@@rileymurray7437 he means this video: ua-cam.com/video/j4Ph02gzqmY/v-deo.htmlsi=wb2atwfoSQaefrjL

  • @BJ52091
    @BJ52091 23 days ago +393

    As a mathematician specializing in probability and random processes, I approve this message. N thumbs up where N ranges between 2.01 and 1.99 with 99% confidence!

    • @Mutual_Information
      @Mutual_Information  23 days ago +31

      Great to have you here!

    • @purungo
      @purungo 21 days ago +32

      So you're saying there's a 1 chance in roughly 10^16300 that you're giving him 3 thumbs up...

    • @frankjohnson123
      @frankjohnson123 19 days ago +5

      My brother in Christ, use a discrete probability distribution.

    • @nile6076
      @nile6076 18 days ago +11

      Only if you assume a normal distribution! ​@@purungo

    • @sylv256
      @sylv256 17 days ago +2

      Is this just one big late april fool's? What the hell

  • @laurenwrubleski7204
    @laurenwrubleski7204 21 days ago +224

    As a developer at AMD I feel somewhat obligated to note we have an equivalent to cuBLAS called rocBLAS, as well as an interface layer hipBLAS designed to compile code to make use of either AMD or NVIDIA GPUs.

    • @sucim
      @sucim 19 days ago +14

      but can your cards train imagenet without crashing?

    • @389martijn
      @389martijn 18 days ago +10

      @@sucim sheeeeeeeeesh

    • @johnothwolo
      @johnothwolo 18 days ago

      Are you guys hiring?

    • @Zoragna
      @Zoragna 18 days ago

      OP forgot about BLAS being a standard so most implementations have been forgotten, it's weird to point at Nvidia

    • @cannaroe1213
      @cannaroe1213 18 days ago +7

      As an AMD customer who recently bought a 6950XT for €600, I am disappointed to learn rocBLAS is not supported on my outdated 2 year old hardware.

  • @TimL_
    @TimL_ 23 days ago +107

    The part about matrix multiplication reminded me of studying cache hit and miss patterns in university. Interesting video.

  • @charlesloeffler333
    @charlesloeffler333 22 days ago +46

    Another tidbit about LINPACK: one of its major strengths at the time it was written was that all of its double precision algorithms were truly double precision. At that time, other packages often had double precision calculations hidden within the single precision routines, whereas their double precision counterparts did not have quad precision anywhere inside. The LINPACK folks were extraordinarily concerned about numerical precision in all routines. It was a great package.
    It also provided the basis for MATLAB.

  • @scottmiller2591
    @scottmiller2591 22 days ago +73

    Brunton, Kutz et al. in the paper you mentioned here "Randomized Matrix Decompositions using R," recommended in their paper using Nathan Halko's algo, developed at the CU Math department. B&K give some timing data, but the time and memory complexity were already computed by Halko, and he had implemented it in MATLAB for his paper - B&K ported it to R. Halko's paper from 2009 "FINDING STRUCTURE WITH RANDOMNESS: STOCHASTIC ALGORITHMS FOR CONSTRUCTING APPROXIMATE MATRIX DECOMPOSITIONS" laid this all out 7 years before the first draft of the B&K paper you referenced. Halko's office was a mile down the road from me at that time, and I implemented Python and R code based on his work (it was used in medical products, and my employer didn't let us publish). It does work quite well.
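
    The Halko-style randomized SVD mentioned above fits in a few lines of NumPy. This is only a minimal sketch of the idea (random range finder, then an exact SVD of the small projected matrix); the function and parameter names are illustrative, not from any particular package, and refinements like power iterations are omitted.

      import numpy as np

      def randomized_svd(A, rank, n_oversamples=10, seed=0):
          rng = np.random.default_rng(seed)
          # Stage A: find an orthonormal Q whose columns approximately span range(A).
          Omega = rng.normal(size=(A.shape[1], rank + n_oversamples))  # random test matrix
          Q, _ = np.linalg.qr(A @ Omega)
          # Stage B: exact SVD of the much smaller matrix B = Q^T A, then map back.
          U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
          return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

      rng = np.random.default_rng(1)
      A = rng.normal(size=(2000, 50)) @ rng.normal(size=(50, 500))        # a low-rank 2000 x 500 matrix
      U, s, Vt = randomized_svd(A, rank=50)
      print(np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A))  # tiny relative error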

    • @Mutual_Information
      @Mutual_Information  22 days ago +13

      Very cool! The more I researched this, the more I realized the subject was deeper (and older) than I had realized from the first few papers I read. It's interesting to hear your on-the-ground experience with it, and I'm glad the video got your attention.

    • @ajarivas72
      @ajarivas72 10 days ago

      @@Mutual_Information
      Has anyone tried genetic algorithms instead of purely random approaches?
      In my experience, genetic algorithms are 100x faster than Monte Carlo simulations at finding an optimum.

  • @RandomDude-yv5zf
    @RandomDude-yv5zf 20 days ago +27

    Randomized numerical linear algebra has been common for over two decades. A good resource is Finding Structure with Randomness. The authors on it (Halko, PG Martinsson, Tropp) are very big in the field. Others are Drineas and E. Liberty. The general idea shows up everywhere. My advisor used it for solving classes of PDEs

    • @Mutual_Information
      @Mutual_Information  19 days ago +7

      Yes, it is older than I originally appreciated. The term was coined in the early 2010s, and that's where most of my reading started.

    • @SirCutRy
      @SirCutRy 18 days ago

      ​@@Mutual_Information
      Do you anticipate these methods being used in ML and other computing libraries within the next few years?

  • @danielsantiagoaguilatorres9973
    @danielsantiagoaguilatorres9973 23 days ago +35

    I'm writing a paper on a related topic. Didn't know about many of these papers, thanks for sharing! I really enjoyed your video

  • @pietheijn-vo1gt
    @pietheijn-vo1gt 23 days ago +31

    I have seen a very similar idea in compressed sensing. In compressed sensing we also use a randomized sampling matrix, because the errors can be considered white noise. We can then use a denoising algorithm to recover the original data. In fact, I know Philips MRI machines use this technique to speed up scans, because you have to take fewer pictures. Fascinating.

    • @tamineabderrahmane248
      @tamineabderrahmane248 22 days ago

      random sampling to reconstruct the signal

    • @pietheijn-vo1gt
      @pietheijn-vo1gt 21 days ago

      @@tamineabderrahmane248... what?

    • @MrLonelyrager
      @MrLonelyrager 19 days ago +2

      Compressed sensing is also useful for wireless communications. I studied its usage for sampling ultra-wideband signals and indoor positioning. It only works accurately under certain sparsity assumptions. In MRI scans, the "Fourier transform" can be considered sparse, so we can use l1 denoising algorithms to recover the original signal.

    • @pietheijn-vo1gt
      @pietheijn-vo1gt 18 days ago

      @@MrLonelyrager yes correct, that's exactly what I used. In the form of ISTA (iterative shrinkage and thresholding) algorithms and its many (deep-learning) derivatives
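
      For readers who haven't met ISTA, here is a minimal sketch of the idea on a toy compressed-sensing problem: a sparse signal, a random Gaussian measurement matrix, and iterative soft-thresholding to minimize 0.5*||Ax - b||^2 + lam*||x||_1. The sizes, iteration count, and lam are illustrative choices only.

        import numpy as np

        def soft_threshold(v, t):
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        def ista(A, b, lam=0.01, n_iter=500):
            L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient of the smooth term
            x = np.zeros(A.shape[1])
            for _ in range(n_iter):
                x = soft_threshold(x + A.T @ (b - A @ x) / L, lam / L)
            return x

        rng = np.random.default_rng(0)
        n, m, k = 400, 100, 10                     # 400-dim signal, 100 measurements, 10 nonzeros
        x_true = np.zeros(n)
        x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
        A = rng.normal(size=(m, n)) / np.sqrt(m)   # random Gaussian sensing matrix
        b = A @ x_true
        x_hat = ista(A, b)
        print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # small: the sparse signal is recovered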

  • @richardyim8914
    @richardyim8914 20 days ago +19

    Golub and Van Loan’s textbook is goated. I loved studying and learning numerical linear algebra for the first time in undergrad.

  • @makapaka8247
    @makapaka8247 23 days ago +55

    I'm finally far enough in education to see how well made your stuff is. Super excited to see a new one from you. Thanks for expanding people's horizons!

  • @zyansheep
    @zyansheep 22 days ago +14

    Dang, I absolutely love videos and articles that summarize the latest in a field of research and explain the concepts well!

  • @KipIngram
    @KipIngram 22 days ago +6

    Fascinating. Thanks very much for filling us in on this.

  • @noahgsolomon
    @noahgsolomon 19 days ago +5

    You discussed all the priors incredibly well. I didn’t even understand the premise of random in this context and now I leave with a lot more.
    Keep it up man ur videos are the bomb

  • @charlesity
    @charlesity 17 days ago +6

    As always, this is BRILLIANT. I've been following your videos since I saw the GP regression video. Great content! Thank you very much.

  • @marcegger7411
    @marcegger7411 19 days ago +5

    Damn... your videos are getting beyond excellent!

  • @aleksszukovskis2074
    @aleksszukovskis2074 22 days ago +5

    It's always a pleasure to watch this channel.

  • @bluearctik3980
    @bluearctik3980 18 days ago +4

    My first thought was "this is like journal club with DJ"! Great stuff - well researched and crisply delivered. More of this, if you please.

  • @Daniel-bn8ws
    @Daniel-bn8ws 13 days ago +1

    Outstanding content, instant sub. Keep up the good work!

  • @AjaniTea
    @AjaniTea 3 days ago +1

    This is a world class video. Thanks for posting this and keep it up!

  • @jondor654
    @jondor654 19 days ago +2

    Lovely type, great clarity.

  • @from_my_desk
    @from_my_desk 20 days ago +1

    thanks a ton! this was eye-opening 😊

  • @piyushkumbhare5969
    @piyushkumbhare5969 20 days ago +1

    This is a really well made video, nice!

  • @JonathanPlasse
    @JonathanPlasse 11 days ago +1

    Awesome presentation, thank you!

  • @moisesbessalle
    @moisesbessalle 23 days ago +6

    Amazing video!

  • @wiktorzdrojewski890
    @wiktorzdrojewski890 20 days ago +2

    this feels like a good presentation topic for numerical methods seminar

  • @gaussology
    @gaussology 15 days ago

    Wow, so much research went into this! It makes me even more motivated to read papers and produce videos 😀

  • @Stephen_Kelley
    @Stephen_Kelley 12 days ago +1

    Excellent video, really well paced.

  • @mgostIH
    @mgostIH 22 days ago +6

    I started reading this paper when you mentioned it on Twitter, forgot it was you who I got it from and was now so happy to see a video about it!

  • @braineaterzombie3981
    @braineaterzombie3981 18 days ago +1

    This is exactly what i needed. Subscribed

  • @deltaranged
    @deltaranged 23 days ago +22

    It feels like this video was made to match my exact interests LOL
    I've been interested in NLA for a while now, and I've recently studied more "traditional" randomized algorithms in uni for combinatorial tasks (e.g. Karger's min-cut). It's interesting to see how they've recently found ways to combine the two paradigms. I'm excited to see where this field goes. Thanks for the video and for introducing me to the topic!

    • @Rockyzach88
      @Rockyzach88 22 days ago +1

      YouTube has you in its palms. _laughs maniacally_

    • @Sino12
      @Sino12 19 days ago

      where do you study?

  • @tiwiatg2186
    @tiwiatg2186 3 days ago +1

    Loving it loving it loving it!! Amazing video, amazing topic 👏

  • @AlexGarel-xr9ri
    @AlexGarel-xr9ri 16 days ago

    Incredible video with very good animations and script. Thank you !

  • @JoeBurnett
    @JoeBurnett 19 days ago +1

    You are an amazing teacher! Thank you for explaining the topic in this manner. It really motivates me to continue learning about all things linear algebra!

  • @billbez7465
    @billbez7465 2 days ago +1

    Amazing video with great presentation. Thank you

  • @EkShunya
    @EkShunya 22 days ago

    Been a while since ur last post
    thanks
    Please make more often
    I like what u make

  • @lbgstzockt8493
    @lbgstzockt8493 23 days ago +5

    Very good video on a very interesting topic. Who would have thought that there is this much to gain in such a commonly used piece of mathematics.

  • @tantzer6113
    @tantzer6113 22 days ago +1

    I enjoyed this video. Thank you.

  • @hozaifas4811
    @hozaifas4811 23 days ago +23

    We need more content creators like you ❤

    • @Mutual_Information
      @Mutual_Information  23 days ago +4

      Thank you. These videos take a while, so I wish I could upload more. But I'm confident I'll be doing YouTube for a long time.

    • @hozaifas4811
      @hozaifas4811 23 days ago +2

      @@Mutual_Information Well, this news made my day!

  • @CyberBlaster-fu2dz
    @CyberBlaster-fu2dz 23 days ago +1

    Great video, thank you!

  • @Otakutaru
    @Otakutaru 9 days ago +1

    Adequate density of new information, and sublime narrative. Also, on point visuals

  • @Pedritox0953
    @Pedritox0953 23 days ago +2

    Great video!

  • @iamr0b0tx
    @iamr0b0tx 23 days ago +4

    This is a really good video 💯

  • @pygmalionsrobot1896
    @pygmalionsrobot1896 22 days ago +2

    Whoa - very cool stuff !!

  • @vNCAwizard
    @vNCAwizard 13 days ago +1

    An excellent presentation.

  • @ernestoherreralegorreta137
    @ernestoherreralegorreta137 17 days ago +3

    Amazing explanation of a complex topic! You've got yourself a new subscriber.

  • @oceannuclear
    @oceannuclear 15 days ago

    Oh my god, this forms a small part of my PhD thesis, where I've been trying to understand LAPACK's advantages/disadvantages when it comes to inverting matrices. Having this video really helps me put things into context! Thank you very much for making this!

  • @broccoli322
    @broccoli322 23 days ago +4

    Great stuff

  • @MachineLearningStreetTalk
    @MachineLearningStreetTalk 22 days ago +4

    Great video brother! 😍

    • @Mutual_Information
      @Mutual_Information  22 days ago

      Thank you MLST! You're among a rare bunch providing takes on AI/ML that aren't hyped or otherwise crazy, so it means a lot coming from you.

  • @StratosFair
    @StratosFair 16 days ago

    As a grad student in theoretical machine learning, I have to say I'm blown away by the quality of your content. Please keep videos like these coming!

  • @the_master_of_cramp
    @the_master_of_cramp 22 days ago +2

    Great and clear video!
    Makes me wanna study more numerical LA... combined with probability theory,
    because it shows how inefficient many currently used algorithms likely are, and that randomized algorithms are usually insanely faster while being approximately correct.
    So those randomized algorithms can basically be used anywhere we don't need to be 100% sure about the result (which is basically always, because our mathematical models are only approximations of what's going on in the world and thus inaccurate anyway, and, as you mentioned, if data is used, it's noisy).

  • @ihatephysixs
    @ihatephysixs 15 days ago +2

    Awesome video

  • @DocM221
    @DocM221 10 days ago +1

    I've been through some basic linear algebra courses, but the covariance problem really struck me as something obvious to a statistician. A statistician would never go and sample everybody; they would first determine how accurate they needed to be, and then go about sampling exactly the number of people that satisfies that requirement. I actually had to do this in my job! I can totally see how this will be a great tool, used with data prediction and maybe hardware accelerators, to make MASSIVE gains. We are in for a huge wild ride! Thanks for the video!
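
    The sample-size calculation alluded to above can be written down directly. A minimal sketch, assuming a normal-approximation confidence interval for a mean; sigma and the margin of error are made-up numbers.

      import math

      def sample_size_for_mean(sigma, margin, z=1.96):
          # Smallest n so that a ~95% confidence interval for the mean has half-width <= margin,
          # assuming independent draws with standard deviation sigma.
          return math.ceil((z * sigma / margin) ** 2)

      print(sample_size_for_mean(sigma=15, margin=2))  # 217: no need to sample everybody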

  • @EE-wo5ty
    @EE-wo5ty 23 days ago +5

    the quality on this editing is top notch, congratulations!!!

  • @scottmiller2591
    @scottmiller2591 22 days ago +7

    This was a nice walk down memory lane for me, and a good introduction to the beginner. It's nice to see SWE getting interested in these techniques, which have a very long history (like solving finite elements with diffusion decades ago, and compressed sensing). I enjoyed your video.
    A few notes:
    It's useful to note that "random" projections started out as Gaussian, but it turns out very simple, in-memory, transformations let you use binary random numbers at high speed with little to no loss of accuracy. I think you had this in mind when talking about the random matrix S in sketch-and-solve.
    BLAS sounds like blast, but without the t. I'm sure there's people who pronounce it like blahs. Software engineers mangle the pronunciation of everything, including other SWE packages, looking at you, Ubuntu users. However the first pronunciation is the pronunciation I have always heard in the applied linear algebra field.
    FORTRAN doesn't end like "fortune," but rather ends with "tran," but maybe people pronounce "fortran" (uncapitalized) that way these days - IDK (see note above re: mangling; FORTRAN has been decapitalized since I started working with it).
    Cholesky starts with a hard "K" sound, which is the only pronunciation you'll ever hear in NLA and linear algebra. It certainly is the way Cholesky pronounced it.
    Me, I always pronounce Numpy to sound like lumpy just to tweak people, even though I know better ☺. I've always pronounced CQRRPT as "corrupt," too, but because that's what the acronym looks like (my eyes are bad).
    One way to explain how these work intuitively is to look at a PCA, similar to what you touched on with the illustration of covariance. If you know the rank is low, then there will be, say, k large PCA directions, and the rest will be small. If you perform random projection on the data, those large directions will almost certainly show up in your projections, with the remaining PCA directions certainly being no bigger than they were originally (projection is always non-expanding). This means the random projections will still contain large components of the strong PCA directions, and you only need to make sure you took enough random projections to avoid being unlucky enough to accidentally be very nearly normal to the strong PCA directions every time. The odds of you being unlucky go down with every random projection you add. You'd have to be very unlucky to take a photo of a stick from random directions, and have every photo of the stick be taken end-on. In most photos, it will look like a stick, not a point. Similarly, taking a photo of a piece of paper from random directions will look like a distorted rectangle, not a line segment. It's one case where the curse of dimensionality is actually working in your favor - several random projections almost guarantee they won't all be projections to an object that's the thickness of the paper.
    I've been writing randomized algos for a long time (I have had arguments w engineers about how random SVD couldn't possibly work!), and love seeing random linear algebra libraries that are open and unit tested.
    I agree with your summary - a good algorithm is worth far more than good hardware. Looking forward to you tracking new developments in the future.
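
    The "random matrix S in sketch-and-solve" idea above looks roughly like this in NumPy. A minimal sketch: the +/-1 (Rademacher) sketching matrix is formed densely here purely for clarity - real implementations use structured or sparse transforms so applying S is fast - and all sizes are illustrative.

      import numpy as np

      rng = np.random.default_rng(0)
      m, n = 10_000, 50                                         # tall least-squares problem
      A = rng.normal(size=(m, n))
      b = A @ rng.normal(size=n) + 0.1 * rng.normal(size=m)

      s = 500                                                   # sketch size, a small multiple of n
      S = rng.choice([-1.0, 1.0], size=(s, m)) / np.sqrt(s)     # random +/-1 sketch

      x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)           # solve min ||Ax - b||
      x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # solve min ||SAx - Sb||

      print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))  # close, from a much smaller problem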

    • @Mutual_Information
      @Mutual_Information  22 days ago +4

      This is the real test of a video: when an expert watches it and, with some small corrections, agrees that it gets the bulk of the message right. It's a reason I try to roll in a subject matter expert where I can. So I'm quite happy to have covered the topic appropriately in your view. (It's also a relief!)
      And I also wish I had thought of the analogy: "You'd have to be very unlucky to take a photo of a stick from random directions, and have every photo of the stick be taken end-on. In most photos, it will look like a stick, not a point." I would have included that if I had thought of it!

    • @scottmiller2591
      @scottmiller2591 22 days ago

      @@Mutual_Information Agree absolutely!

    • @rileyjohnmurray7568
      @rileyjohnmurray7568 21 days ago +3

      Jim Demmel and Jack Dongarra pronounced it "blahs" the last time I spoke with each of them. (~This morning and one month ago, respectively.) 😉

    • @Mutual_Information
      @Mutual_Information  21 days ago +1

      @@rileyjohnmurray7568 lol

    • @scottmiller2591
      @scottmiller2591 21 days ago +1

      @@rileyjohnmurray7568 I hope they perk up ☺

  • @DawnOfTheComputer
    @DawnOfTheComputer 8 days ago +1

    The math presentation and explanation alone was worth a sub, let alone the interesting topic.

  • @DavidS-ji6qv
    @DavidS-ji6qv 16 days ago

    Phenomenal video

  • @TrungHieuTu
    @TrungHieuTu 22 days ago +1

    Very useful, thanks

  • @Ohmriginal722
    @Ohmriginal722 22 days ago +1

    Whenever randomness is involved, you've got me wanting to use analogue processors for fast and low-power processing.

  • @tanithrosenbaum
    @tanithrosenbaum 18 days ago +1

    "They're quite good" - Understatement of the decade 😄

  • @h.b.1285
    @h.b.1285 20 days ago +1

    Excellent video! This topic is not easy for the layperson (admittedly, the layperson that likes linear algebra), but it was clearly presented and very well structured.

  • @antiguarocks
    @antiguarocks 1 day ago

    Reminds me of what my high school maths teacher said about being able to assess product quality on a production line with high accuracy by only sampling a few percent of the product items.

  • @michaeln.8185
    @michaeln.8185 22 days ago +2

    Great video! Thank you for producing this!

  • @chakrasamik
    @chakrasamik 19 days ago +1

    Excellent ❤

  • @user-gv6fn6yt2u
    @user-gv6fn6yt2u 15 days ago +1

    It's really mind-blowing how random numbers can achieve something this fast.

  • @mohammedbelgoumri
    @mohammedbelgoumri 23 days ago +4

    No better way to start the day than with an MI upload 🥳

  • @catcoder12
    @catcoder12 22 days ago +1

    anotha banger by DJ

  • @nikita_x44
    @nikita_x44 22 days ago +4

    The linearity @ 4:43 is a different linearity. Linear functions in the sense of linear algebra must always pass through (0,0).

    • @sufyanali3992
      @sufyanali3992 15 days ago

      I thought so too, the 2D line shown on the right is an affine function, not a linear function in the rigorous sense.

    • @KepleroGT
      @KepleroGT 6 days ago

      Yep, otherwise the linearity of addition and multiplication, which he just skipped over, wouldn't apply and they wouldn't be linear functions (or rather, the correct term is linear map/transformation). Example: for F(x,y,z) = (2x+y, 3y, z+5), claiming F(0,0,0) = (0,0,0) is incorrect because F(0,0,0) = (0,0,5). The intent is to preserve the linearity of these operations so they can be applied similarly: if I want 2+2 or 2*2, I can't have 5.
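
      In symbols, the standard definitions the thread is appealing to (textbook facts; W and c below just denote an arbitrary matrix and a nonzero offset):

        f \text{ is linear} \iff f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)
        \;\Rightarrow\; f(0) = f(0 \cdot x) = 0 \cdot f(x) = 0,
        \quad\text{so } g(x) = Wx + c \text{ with } c \neq 0 \text{ is affine, not linear.}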

  • @wafikiri_
    @wafikiri_ 7 days ago

    The first program I fed a computer was one I wrote in FORTRAN IV. It almost exhausted the memory capacity of the IBM machine, which was about 30 KBytes for the user (it used memory overlays, which we'd call banked memory today, in order not to exceed the available memory for programs).

  • @General12th
    @General12th 23 days ago +3

    Hi DJ!
    I love improvements in algorithmic efficiency.

  • @metromap9618
    @metromap9618 22 days ago

    great video!

  • @robmorgan1214
    @robmorgan1214 21 days ago +2

    Of course. This isn't a surprise. I've been using these techniques for optimization for a long time. Simulated annealing was proven (decades ago) to scale better than many optimization algorithms. If your big O is bigger than sim annealing's, use sim annealing! Always calculate your big O and THEN measure your implementation to make sure you hit it. Same thing goes for your error... and controlling that can blow out your big O, and that's data dependent, not algorithm dependent! ALWAYS MEASURE! If you have to pre-sort before accumulating to minimize error, you are not going to hit your scaling numbers and you're going to murder your cache and memory pipelining. The key with that 1/e term is to recall that floating point math is going to accumulate rounding errors at a precision of about 0.1-1.0 in 1M. This sets your floor and the sensitivity of your eigenvalues (if they vary by more than about one part in 1M, your answers will be dominated by errors, so you take the hit and use doubles). This kind of stuff used to be explicitly covered in scientific computing classes when resources were limited and the hardware was MUCH less complex. It's interesting that this complexity has managed to hide potential optimizations of order 20-1000x. But it makes sense: in order to use the HW efficiently you need to be an expert in so many things that the problems you're actually trying to solve become something of an afterthought, and resource allocation in universities and other organizations focused on numerical methods faces the pressures of silos and hyperspecialization. Conway's law strikes again, as our software matches the organizational structures that create it.
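
    The "about 0.1-1.0 in 1M" figure is single-precision machine epsilon (~1.2e-7) showing up in accumulation. A small sketch of the effect, assuming NumPy; the array size is chosen to echo the one-in-a-million figure.

      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.random(1_000_000).astype(np.float32)

      acc = np.float32(0.0)
      for v in x:                         # naive sequential float32 accumulation
          acc += v
      exact = x.astype(np.float64).sum()  # reference sum in double precision

      print(abs(acc - exact) / exact)     # typically ~1e-4: rounding error piles up far above float32 eps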

  • @midou6104
    @midou6104 10 days ago +1

    Okay, objectively, that's the hardest thing in linear algebra I've ever seen.

  • @johannguentherprzewalski
    @johannguentherprzewalski 16 days ago

    Very interesting content! I did find that the video felt longer than expected. I was intrigued by the thumbnail and the promise of at least 10x speed improvement. However, it took quite a while to get to the papers and even longer to get to the explanation. The history definitely deserves its own video and most chapters could be much shorter.

  • @nandanshettigar7261
    @nandanshettigar7261 11 days ago +1

    Another beautiful global optimum of priceless information to pull me out of my local tunnels :) Thank you as always

  • @rainaldkoch9093
    @rainaldkoch9093 11 days ago +1

    Thanks!

  • @jonmichaelgalindo
    @jonmichaelgalindo 22 days ago +2

    "Rasterizing triangles to pixels--gone." I was like, "Unreal's not using triangles???" LOL but it was just a very confusingly worded statement.

  • @pr0crastinatr
    @pr0crastinatr 19 days ago

    Another neat explanation for why the randomized least-squares problem works is the Johnson-Lindenstrauss lemma. That lemma states that most vectors don't change length a lot when you multiply them by a random gaussian matrix, so the norm of S(Ax - b) is within (1-eps) to (1+eps) of the norm of Ax-b with high probability.
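
    A quick numerical check of that statement (a sketch; the dimensions, sketch size, and number of test vectors are arbitrary choices):

      import numpy as np

      rng = np.random.default_rng(0)
      d, k, n_vectors = 10_000, 400, 100
      X = rng.normal(size=(n_vectors, d))          # vectors whose lengths we want preserved
      S = rng.normal(size=(k, d)) / np.sqrt(k)     # random Gaussian sketch, scaled so E||Sx||^2 = ||x||^2

      ratios = np.linalg.norm(X @ S.T, axis=1) / np.linalg.norm(X, axis=1)
      print(ratios.min(), ratios.max())            # concentrated near 1: lengths barely change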

  • @pythonguytube
    @pythonguytube 15 days ago

    Worth pointing out that there is a modern sparse linear algebra package called GraphBLAS, which can be used not just for graphs (which generalize to sparse matrices) but for any sparse matrix multiplication operation.

  • @rr00676
    @rr00676 15 days ago +1

    I've been hoping some advances in probabilistic numerics and random matrix theory bring PGM's some love. Computing matmuls/inverses every iteration of MCMC makes me sad :(. As expected, great video!

  • @sherifffruitfly
    @sherifffruitfly 17 days ago +2

    That's cool as hell - thanks!
    1) An interesting thing you didn't address/answer: why is data generally expected to contain so much redundancy that a "small" subset suffices?
    2) Seems like LLMs/NNs would be the place RandNLA evangelists would want to go. If they can convert the drivers of LLMs/NNs to RandNLA, pretty much everyone else will likely follow.

    • @Mutual_Information
      @Mutual_Information  17 days ago +1

      Glad you like it. To your points:
      1) Yea, good Q. If A is m-by-n, then solving Ax = b (special case: distance to b is zero) only requires n rows of A (assumes it's full rank). So you could say, the extra m - n rows are redundant. So necessarily, A has a lot of redundancy.
      2) Yes! I'm sure the researchers are thinking of this. But things get tricky (it's hard to prove theorems) when you go from the pure linear algebra questions to the messy and wild west of NNs and LLMs.

    • @cahdoge
      @cahdoge 16 days ago +2

      @@Mutual_Information The way I understand it now: the redundancy is a result of using the method of least squares to compute a function that describes the result of your matrix multiplication, and of using computers to calculate it.
      Since it's a type of regression, it is already an approximation. If you use a subset it gets faster but also becomes less precise.
      Next, your computer has an upper limit on precision, so as long as you choose a subset that gives results within that limit you are fine.
      The tricky bit is finding a way to choose the subset and optimizing your error to be as close to the computer's "natural" one as possible.

  • @psl_schaefer
    @psl_schaefer 22 days ago +1

    As always great (very educative) content. I very much appreciate all the work you put into those videos!

  • @minsookim-ql1he
    @minsookim-ql1he 12 days ago +1

    This is very interesting

  • @damondanieli
    @damondanieli 22 days ago +6

    Great video! One thing: “processor registers” not “registries”

  • @HelloWorlds__JTS
    @HelloWorlds__JTS 4 days ago

    Phenomenal! But I have one correction for (25:33): Full rank isn't restricted to square [invertible] matrices, it just means rank = min(m,n) rather than rank = k < min(m,n).

  • @u2b83
    @u2b83 15 days ago

    I tripped across the Integer relation algorithm at 15, when I wrote a calculator program to calculate how much change you put on the scale just based on the total weight. Thanks to this video (top 10 problems list), I finally know what that's called. Until now I called this the "primeness of unique coin weights" lol

  • @MariusKavaliauskas
    @MariusKavaliauskas 22 days ago

    Very informative video, and I will be waiting for more. I've been hooked on randomized linear algebra since Ewin Tang's "dequantization" papers. I wonder if randomized algos will have a huge impact on ML training performance (not just inference). I also wonder how they will compare in performance and accuracy: low-rank approximations of ML models vs randomized inference on full models.

  • @ShivaTD420
    @ShivaTD420 11 days ago +1

    If you take white noise and put a filter on it, you can produce every note, because every tone and semitone is in the noise.

  • @HyperDevv
    @HyperDevv 20 days ago +3

    NEW MATH UPDATE JUST DROPPED

  • @janni7439
    @janni7439 10 days ago

    There are also other approaches where you choose an epsilon and reduce the complexity of the problem, as with hierarchical matrices.

  • @cannaroe1213
    @cannaroe1213 18 days ago +1

    Nearly 7 years ago when I was still a practicing geneticist, sequenced DNA would usually only be a few nucleotides long, maybe 50, and it would have to get mapped to a genome with billions of possible locations to test. The fastest algorithms ended up being used in the most published papers, so competition was pretty fierce to be the fastest.
    The gold standard was a deterministic program called BWA/Bowtie, but just before I left the field a new breed of non-deterministic aligners with mapping times orders of magnitude faster were developed, and it really split opinions. Different deterministic programs would give different results (i.e. they had noise/error too, even if they're consistent about it), so in many ways who cared if a program gave different results every time you ran it, particularly if you only intend to run it once...
    But there were other problems. You couldn't create definitive analyses anymore, you couldn't retrace someone else's steps, you couldn't rely on checksums, total nightmare.
    The "hidden structures" aspect of the paper was interesting, the structures are in the data, and how the algorithm interacts with the data, which as the programmer you don't have access to by definition - but you also kinda know all you need to know about it too. It feels very similar to making a good meme.

  • @RealUniquee
    @RealUniquee 14 days ago

    Simply phenomenal if it gets implemented in deep learning frameworks.

  • @baptiste-genest
    @baptiste-genest 23 days ago +4

    Great video! I had a compressive sensing class this semester; it sure is a very interesting and promising topic of research!
    But I'm not sure that video games would benefit a lot from it? If I understood correctly, the gains are mostly in high dimensions, while video game linear algebra is basically only 3D. Do you have examples? Thanks again!

    • @Mutual_Information
      @Mutual_Information  23 days ago +3

      Thank you! My take is that that’s only in a certain representation. E.g. when a dimension refers to a specific pixel, the dimensions are quite high.

  • @Patashu
    @Patashu 22 days ago

    This has been my thought about deep learning for a while now - we build computers to be deterministic, but deep learning would run best on a different kind of computer that is lossy but, as a tradeoff, much more energy efficient. This is a different take though (keep determinism, but deliberately code faster, lossy algorithms) that could also do the job.

  • @ericlaska4748
    @ericlaska4748 23 days ago +3

    At 30 minutes I think you got to the crux of the algorithm: The Law of Large Numbers.

  • @user-qp2ps1bk3b
    @user-qp2ps1bk3b 22 days ago

    very nice!

  • @nonamehere9658
    @nonamehere9658 23 days ago +3

    The trick of multiplying by a random S in argmin ||SAx - Sb||^2 reminds me of the similar trick in Freivalds' algorithm: instead of verifying the matrix multiplication A*B == C, we check A*B*x == C*x for a random vector x.
    Random projections FTW???
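
    For anyone who hasn't seen it, a minimal sketch of Freivalds' check with random 0/1 test vectors (each extra trial at least halves the chance of accepting a wrong product); the sizes and trial count are illustrative.

      import numpy as np

      def freivalds_check(A, B, C, n_trials=20, seed=0):
          # Probabilistically verify A @ B == C without ever forming A @ B.
          rng = np.random.default_rng(seed)
          for _ in range(n_trials):
              x = rng.integers(0, 2, size=C.shape[1]).astype(float)  # random 0/1 vector
              if not np.allclose(A @ (B @ x), C @ x):
                  return False              # definitely not equal
          return True                       # equal with probability >= 1 - 2**(-n_trials)

      rng = np.random.default_rng(1)
      A, B = rng.normal(size=(300, 300)), rng.normal(size=(300, 300))
      C = A @ B
      print(freivalds_check(A, B, C))       # True
      C[0, 0] += 1.0                        # corrupt one entry
      print(freivalds_check(A, B, C))       # almost surely False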

  • @WhiteGandalfs
    @WhiteGandalfs 18 days ago

    Let me try to phrase it for people with an engineering-level rather than math-degree education: you effectively select the best-fitting equations of the linear problem, which is originally highly overdetermined, so that your x vector sufficiently represents the complete system with a small subset of the original equations - correct? That's not directly "inducing random noise" but rather a simplification by omission of probably irrelevant equations.
    This reminds me of how we did such a scheme for a "bundle block adjustment" application: we used the drastic performance boost from simplification to do multiple simple BBAs within each reaction step of the system with different, drastically simplified subsets of the data, then compared the results with the expected outcome (low residual error, good alignment with the continuation of the coordinates of our x vector from the previous step), then performed a final selection based on those outcomes and a final error-minimizing solve with those perfectly selected equations. That gives the best of both worlds: speed-up without sacrificing correctness.
    And there is no magic at all (and no "introduced random noise"). Just a "try simple" first iteration, then a selected final iteration based on it. Basically engineering optimization built on working standard linear algebra systems.

  • @MyWatermelonz
    @MyWatermelonz 20 days ago

    Using that engineering mindset, close enough is good enough! It worked for fast inverse square root, and like you said, floating point is pretty inaccurate anyway but still works; might as well try to guess and see how much you can get away with. AI basically does that with quantization (and in general).
    I should really finish watching vids before commenting, since he mentions ML literally 30 seconds after I made this.