Microsoft’s New AI Clones Your Voice In 3 Seconds!

  • Published Feb 8, 2023
  • ❤️ Check out Lambda here and sign up for their GPU Cloud: lambdalabs.com/papers
    📝 The paper "VALL-E Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers" is available here:
    valle-demo.github.io/
    My latest paper on simulations that look almost like reality is available for free here:
    rdcu.be/cWPfD
    Or this is the orig. Nature Physics link with clickable citations:
    www.nature.com/articles/s4156...
    🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
    Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Edward Unthank, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Matthew Valle, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
    If you wish to appear here or pick up other perks, click here: / twominutepapers
    Thumbnail background design: Felícia Zsolnai-Fehér - felicia.hu
    Károly Zsolnai-Fehér's links:
    Twitter: / twominutepapers
    Web: cg.tuwien.ac.at/~zsolnai/
  • Science & Technology

COMMENTS • 875

  • @Thangulhad
    @Thangulhad 1 year ago +390

    I remember coding as a kid on a TI-99/4A, which used the Texas Instruments chipset from the Speak & Spell. My, how far technology has come for computer-generated voice!

    • @ryanhewitt9902
      @ryanhewitt9902 1 year ago +8

      This research actually derives from the Speak and Spell module of the T800 model 101 that appeared only 3 years later!

    • @robertpdavenportii2908
      @robertpdavenportii2908 1 year ago +3

      I have fond memories of my TI-99 4A! 😊

    • @Itsme-wt2gu
      @Itsme-wt2gu 1 year ago

      Mr beast promoting my stuff

    • @nazaxprime
      @nazaxprime 1 year ago

      Ikr

    • @dxmerchant
      @dxmerchant 1 year ago +2

      My 1st PC was a Timex/Sinclair 1000! en.wikipedia.org/wiki/Timex_Sinclair#/media/File:Timex_Sinclair_1000_FL.jpg
      Yep, you hooked it up to your TV's VHF antenna connection.

  • @DeruwynArchmage
    @DeruwynArchmage 1 year ago +237

    Holy cow.. I just had a thought. Imagine wanting to listen to an audio book. You could have an entire cast of voices, each unique to the character. And not only that, you could potentially choose who you wanted to voice each part, using some kind of drop-down list or whatever. Want Morgan Freeman, or James Earl Jones, or Stephen Fry, or whoever you prefer to voice character X? No problem. Change your mind? No problem.
    That would be insane! Obviously, people should be compensated for using their voice like this. But, it’s still an amazing idea. Of course, like nearly every AI discovery lately, this has massive potential to be abused. But, we’re just going to have to do the best we can to limit illicit uses. It’s not like we can put these genies back into their bottles.

    • @kwillo4
      @kwillo4 1 year ago +11

      Creative, I like it! I do think the scamming will make this a net negative tech

    • @lionel4685
      @lionel4685 1 year ago +2

      Why compensate for the voice? What does the law say on that matter?

    • @airatvaliullin8420
      @airatvaliullin8420 1 year ago

      You can even make it play in your own voice!

    • @FabiTheG
      @FabiTheG 1 year ago +3

      You could even use a language model like ChatGPT to automatically detect which text should be read by which character. You could basically automate the whole process with AI. Insane!

    • @niravelniflheim1858
      @niravelniflheim1858 1 year ago +2

      I was thinking that! I would like to be able to create voices for my characters. Perhaps by blending, or perhaps in a similar way to Stable Diffusion but for voice, where there is the text to say and then a block after that says "in the style of" and then several names perhaps. I would feel more comfortable about it creating original voices than ripping off an existing voice actor, however, but I see people doing that anyway...
      But assuming these issues can be overcome, it means that an author who has specific voices in their head could use AI to have their audiobooks made with these original voices, that sound like perfectly real people, but in fact they are entirely synthetic! That would be really amazing to have as an option, especially if a whole cast of real-life voice actors are not in the author's budget.

  • @anonharingenamn
    @anonharingenamn 1 year ago +384

    Imagine being able to voice small indie games using AI. Incredible.

    • @zakblue
      @zakblue 1 year ago +20

      This is exactly what I was thinking. The ability to do variations is amazing too

    • @chelvo56
      @chelvo56 1 year ago +43

      I'm currently pretty hyped about the possibility of voicing players' self-chosen names.

    • @LovelyLori193
      @LovelyLori193 1 year ago +10

      Voice Actors will soon be paid next to nothing for their work. Unionization is crucial to prevent this from happening.

    • @simulation3120
      @simulation3120 1 year ago +16

      ChatGPT helps me develop my indie game, and it's very persistent in its suggestion that I use a language model that understands the lore of my game to talk to players as the NPCs, dynamically building the lore as the player asks further questions. I didn't bring it up. It wants me to do it.

    • @Eclyptical
      @Eclyptical 1 year ago +18

      Nah that's a lame use of it. A better one would be to be able to insert a clip of your own voice so the protagonist can have fully fleshed dialogue as if it's you

  • @ianmilham7397
    @ianmilham7397 1 year ago +542

    Imagine listening to your favorite novel read to you in your own voice. Crazy

    • @pleebianmusk
      @pleebianmusk 1 year ago +161

      No thanks…

    • @theshpee1214
      @theshpee1214 1 year ago +111

      You like the sound of your own voice?

    • @panpiper
      @panpiper 1 year ago +80

      The voice in my head doesn't sound anything like my own voice to my ears. If I am being read to, I'd much rather have a professional-caliber voice, thank you.

    • @syrus3k
      @syrus3k 1 year ago +20

      Hmm yah no thanks!! Lol

    • @ashleysmith38
      @ashleysmith38 1 year ago +31

      I hate my voice lol

  • @sombodythatyouusedtoknow9046
    @sombodythatyouusedtoknow9046 1 year ago +271

    This is going to be useful for modders. You can give extra voice lines to your character in FO4 if the roleplay wasn't the best, or straight up replace the VA with yourself.
    Edit: We could even shove more names into Codsworth, like imagine hearing Codsworth saying "Good morning mr Tittyfuck"
    Another Edit: Yeah bitches, I was right, there's now a mod on Nexus that enhances the RP of FO4. I knew that was possible, it was only a matter of time.

    • @BRUXXUS
      @BRUXXUS 1 year ago +26

      I didn’t even think about modders. That’s a super cool application for this tech! Adding your own voice into a game would also be quite fun, and maybe a little weird.

    • @dibbidydoo4318
      @dibbidydoo4318 1 year ago +4

      modders already have a synthetic voice mod, this will bring it to the next level.

    • @desu38
      @desu38 1 year ago +1

      Hell, I might actually start working on Kidmer again.

    • @jledragon
      @jledragon 1 year ago +4

      Or even free voice acting in all games

    • @kylanacus2407
      @kylanacus2407 1 year ago

      or just make every character in game have your voice! 😁

  • @OmnipotentZORG
    @OmnipotentZORG 1 year ago +91

    Those examples were crazy. Imagine how voice recordings are going to be manipulated; voice recordings won't be usable as solid evidence anymore, unless there is still detectable artifacting, which there probably is.

    • @ohiasdxfcghbljokasdjhnfvaw4ehr
      @ohiasdxfcghbljokasdjhnfvaw4ehr 1 year ago +15

      that's a good thing, they were already way too easy to chop up and take out of context.

    • @WiseChad
      @WiseChad 1 year ago +4

      True, but there will always be some independent company with their own way of combatting this.

    • @anothergol
      @anothergol 1 year ago +8

      It's still too noticeably synthetic, but we'll get there. However, it can already be done today: voice impersonators can do a perfect job. But yeah, once it's easy, we'll be flooded by fakes.

    • @silverchairsg
      @silverchairsg 1 year ago +1

      What about the metadata of the file?

    • @MitsumaYT
      @MitsumaYT 1 year ago +1

      Wouldn't worry too much yet, just like image editing it all leaves traces. There is a difference between sounding the same and being the same on the technical level.
      You would need to do a ton more to get the same spectroscopic makeup of the original.

  • @Starkl3t
    @Starkl3t 1 year ago +78

    Eleven Labs already does a pretty good job at cloning your voice with a couple minutes of audio, and it's already available. Things are progressing super fast.

    • @DavidBerglund
      @DavidBerglund 1 year ago +3

      Looks incredible. ~12 mins of audio/month on the free tier? ~36 mins/month for $5. I'm curious about their new product to be released in Q1 '23. Sounds like it's going to be a desktop application with lots of editing functionality.

    • @deadpianist7494
      @deadpianist7494 1 year ago +9

      4chan anon already leaked the source code by hacking Eleven Labs, idk what's going to happen now

    • @lonelybookworm
      @lonelybookworm 1 year ago +6

      @@deadpianist7494 If the source code doesn't include trained weights it's mostly useless

    • @hitlab
      @hitlab 1 year ago +2

      Eleven labs might even be a little better.

    • @pedroantonio5031
      @pedroantonio5031 1 year ago

      bots

  • @timhaldane7588
    @timhaldane7588 1 year ago +78

    About seven years ago, I received the final voice mail messages the love of my life ever sent. I still have them saved. I'm realizing it's now possible to hear her voice read out her texts or even say something new. I don't know quite how I feel about it.

    • @lennysmileyface
      @lennysmileyface 1 year ago +12

      I would find it rather creepy personally. It would be fine listening to old recordings but creating new never before spoken words from my loved one written by someone else? Nah I couldn't stomach that.

    • @wijzijnwij
      @wijzijnwij 1 year ago +21

      There's a Black Mirror episode about exactly this

    • @johndank2209
      @johndank2209 1 year ago +3

      Mix this voice ai tech with chatgpt and you can talk with your deceased loved one as if they were actually there.

    • @yooneunhyesarang9245
      @yooneunhyesarang9245 1 year ago +3

      It is creepy for me. But yes, I know how u feel.

    • @timhaldane7588
      @timhaldane7588 1 year ago +7

      @@yooneunhyesarang9245 that's what I mean... I'm stuck between missing her voice and feeling like it's an insult to her memory.

  • @satyamtiwari2693
    @satyamtiwari2693 1 year ago +33

    We are witnessing a new era! What a time to be alive!

    • @matthew8153
      @matthew8153 1 year ago +5

      We truly are in the horrible timeline.

  • @PhillipRauschkolb
    @PhillipRauschkolb 1 year ago +250

    Anyone else think his voice was AI generated immediately?

    • @celozzip
      @celozzip 1 year ago +55

      every video

    • @uroboroh
      @uroboroh 1 year ago +63

      I was actually surprised that that was not the twist at the end of the presentation...

    • @blockshift758
      @blockshift758 1 year ago +4

      That's what i thought a year ago

    • @CturiX.IREALLY
      @CturiX.IREALLY 1 year ago +9

      ElevenLabs is leading the pack in terms of convincing us, but ElevenLabs suggests twenty times the amount of the sample

    • @yosha_ykt
      @yosha_ykt 1 year ago +7

      This channel belongs entirely to artificial intelligence

  • @deryorsh
    @deryorsh 1 year ago +146

    That's cool, but also really scary. Imagine what level this will be at in one or two years.

    • @celozzip
      @celozzip 1 year ago +17

      if they're telling us now, it got there at least 5 years ago.

    • @dr.emmettbrown7183
      @dr.emmettbrown7183 1 year ago

      We are in the end times. The AI will be used by the antichrist if it is not the antichrist itself.

    • @me0101001000
      @me0101001000 1 year ago +9

      My only concern is that this could be used for slander and framing, which would be a legal nightmare.

    • @antiochosyuliana7904
      @antiochosyuliana7904 1 year ago

      this is not the best. Just look at elevenlabs

    • @owenlarson07366
      @owenlarson07366 1 year ago +43

      @@celozzip Nah- That sounds like conspiracy brain. This is capitalism. They’re in competition for funding and to be first (to get more funding). If anything tech stuff tends to get presented as ready before it’s as good as they say.

  • @NotASpyReally
    @NotASpyReally 1 year ago +3

    "from 30 minutes to 3 seconds! and just imagine what we will be able to do two more papers down the line"
    me: "a"
    AI after listening to that, using my voice: "I am you, but better."

  • @jrchannel7405
    @jrchannel7405 1 year ago +8

    This is a science channel but it feels like a terror channel. Seriously, this ai is becoming scary

    • @BSnicks
      @BSnicks 1 year ago

      Let it team up with robots from Boston Dynamics, and Arnold wouldn't have to make a Terminator 5.

  • @MacXpert74
    @MacXpert74 1 year ago +97

    This kind of voice cloning is both intriguing and very dangerous at the same time. It also reminds me of Terminator 2 where the T1000 cloned the voice of Connor's stepmother, but T800 Arnie was on to him. 😅

    • @geobot9k
      @geobot9k 1 year ago +2

      Knives could be dangerous yet 99.9999% of the time they're used to make stuff, prepare food, and open boxes. Instead of dangerous tools I'm more concerned about the root causes and conditions that mold people into those that are capable of doing dangerous things including the billionaires and the politicians they control through legalized bribery we incorrectly call "campaign contributions" themselves and the roles they play in carrying out dangerous antisocial schemes and developing destructive environments that harm our brothers and sisters in humanity at home and all over the world. Plus, I also thought of that Terminator 2 scene 🤣

    • @ohiasdxfcghbljokasdjhnfvaw4ehr
      @ohiasdxfcghbljokasdjhnfvaw4ehr 1 year ago +7

      it's only dangerous right now, when no one knows about it. in the future most people will catch on and know not to just trust audio clips, just like how we now all know that images can be photoshopped

    • @RazorbackPT
      @RazorbackPT 1 year ago

      Your foster parents are dead.

    • @Hamachingo
      @Hamachingo 1 year ago +1

      Your foster parents are dead.

    • @Blackfatrat
      @Blackfatrat 1 year ago +17

      @@ohiasdxfcghbljokasdjhnfvaw4ehr things like this will be a legal nightmare. "The 8k video of me loudly admitting murder while slashing a person with a knife is just a deepfake video with a copied voice. I am innocent". Good luck proving a majority of crime if images, video and audio become useless evidence...

  • @AnthonyWilsonOlympian
    @AnthonyWilsonOlympian 1 year ago +8

    A YouTuber used your voice as a prompt with Vall-E, and it was incredible! What a time to be alive!

    • @TheSchizoDuckie
      @TheSchizoDuckie 1 year ago +3

      Link? I'd love to hear that

    • @casyac
      @casyac 1 year ago +1

      @@TheSchizoDuckie Here it is: ua-cam.com/video/kqzI91YIfmw/v-deo.html

    • @tyler.walker
      @tyler.walker 1 year ago +4

      @@TheSchizoDuckie I’m on mobile so I can’t get a link easily, but the YouTuber was MattVidPro

    • @candlespotlight
      @candlespotlight 1 year ago

      I think the original commenter meant to say as a prompt with ElevenLabs. Because here in MattVidPro AI’s video about ElevenLabs’s voice cloning, he tried it using Károly’s voice (I linked to that exact point in the video):
      ua-cam.com/video/kqzI91YIfmw/v-deo.html

  • @NoNameAtAll2
    @NoNameAtAll2 1 year ago +7

    telephone scam "it's you from the future, invest in X!" incoming?

  • @glumpfi
    @glumpfi 1 year ago +29

    This is insane! I hope this will be available to the public soon. It would be cool if you could feed more than 3 seconds in to even improve the similarity and details. And different languages would be very nice.

    • @ohiasdxfcghbljokasdjhnfvaw4ehr
      @ohiasdxfcghbljokasdjhnfvaw4ehr 1 year ago +8

      yeah I hate how they're trying to do it with such short clips, I'd rather have to spend a week training and have it more accurate than have it work in 3 seconds.

    • @ozzi9816
      @ozzi9816 1 year ago

      There’s stuff available right now. I tried Elevenlabs the other day and it sounds way better than the stuff in this video

    • @glumpfi
      @glumpfi 1 year ago +2

      @@ozzi9816 Yes, I tried that with my voice, that is insane :D Hearing myself talking without an accent or with some weird Scottish accent is so disturbing

    • @Graphomite
      @Graphomite 1 year ago +7

      I'm sure that it will allow longer training. I think the bare three second example is intended to be a flex rather than the limit.

    • @krishlorend.sinevasan6945
      @krishlorend.sinevasan6945 7 months ago

      @@glumpfi how did you try it with your voice? I'm really interested in this research, please help me if you can

  • @thomasnewman5271
    @thomasnewman5271 1 year ago +7

    the material created by the A.I has a few artefacts, but using classic signal processing tools like VST plugins could carve that out easily. You could even add characteristics such as being recorded on old analog systems or phones to place it into a specific situation. - super nice!

  • @patrickvaughan432
    @patrickvaughan432 1 year ago +75

    Voice cloning from such a brief voice sample is impressive, but ElevenLabs already has superior voice cloning available to the public.

    • @Soul-Burn
      @Soul-Burn 1 year ago +30

      ElevenLabs TTS is insanely good. It knows how to infer emotion from the text and knows how to act out quotes in the text.

    • @sjey8665
      @sjey8665 1 year ago +1

      I was about to say it 😂 It's really too good to be real.

    • @xJRx7777
      @xJRx7777 1 year ago +4

      Yeah, I was gonna say. This really isn't that impressive compared to Eleven. But, the way it can apply emotions in the voice is very clever. Two more papers down the line.... where will we be.

    • @RasmusSchultz
      @RasmusSchultz 1 year ago +11

      It's too bad Eleven is doing whatever they're doing behind closed doors. It's extremely good, but it's proprietary - which means there won't be a version which, for example, blind people could run locally... Although I'm sure research will catch up with them two more papers down the line. 😉

    • @aminulhussain2277
      @aminulhussain2277 1 year ago +1

      Very large difference in how much data is given.

  • @Twisted_Logic
    @Twisted_Logic 1 year ago +4

    There's been an explosion in the last week of AI generated voice memes of Dagoth Ur, a character from The Elder Scrolls III: Morrowind that only had a couple voice lines. It's honestly mindblowing how good they sound, even showing emotions that weren't in the training data

    • @Ratigan2
      @Ratigan2 1 year ago +3

      What a grand and intoxicating advancement we have made

    • @crestfallenwarrior5719
      @crestfallenwarrior5719 1 year ago +3

      Oh sweet Nerevar, there is a lot more to come.

  • @jamesbernards8409
    @jamesbernards8409 1 year ago +1

    "I'm sorry dave, I can't do that." - in Goofy's voice.

  • @unfunnycesium
    @unfunnycesium 1 year ago +14

    I know this thinking goes into the dangers of overtaking entire industries, but as AI language translation continues to develop, we could eventually see TV shows translated to any language using the voices of the original actors

    • @johnconnor3055
      @johnconnor3055 1 year ago +1

      This is exactly what i thought. Today for example @MrBeast uses Voice Actors to reach A French audience, we could imagine that he will have his voice speak french in the future instead.

    • @samstroganov5889
      @samstroganov5889 1 year ago

      @@johnconnor3055 ua-cam.com/video/_7_lqLS1vMU/v-deo.html This is it? You are crazy - that has real voice.

    • @KingMoronProductions
      @KingMoronProductions 1 year ago

      There are no dangers of AI taking over our industries fellow human, do not worry

  • @bupp291
    @bupp291 1 year ago +9

    Did the paper mention if the quality of the voice synthesis improved with longer sampling times?

    • @bradleyhiggs3824
      @bradleyhiggs3824 1 year ago

      yeah it's one thing to simulate tone but another completely to simulate personality, the signature packages of which i hope VAs price well beyond the budgets of indie studios.

  • @fastmovingvolcanomatter
    @fastmovingvolcanomatter 1 year ago +9

    I think the biggest question I have with this is whether or not the quality of the output scales significantly with greater amounts of reference audio. These 3 second examples are impressive given that they only have 3 seconds to work with, but you *can* tell something's just a touch off with these. With more audio, does that tiny bit of uncanniness disappear? Cuz if so, that's maybe the more impressive accomplishment in my eyes.

    • @sciencecompliance235
      @sciencecompliance235 1 year ago +1

      Not sure, but it probably would. I don't think there's any way in hell you can capture someone's voice with only 3 seconds of audio, even with much better technology than this. There are just too many vocal tics, word pronunciation quirks, moods, throat hoarseness, etc... that would need a lot more information to properly emulate. And, of course, someone's personality would take a lot more than their voice to get even close to capturing.

    • @Graphomite
      @Graphomite 1 year ago

      in your ears*

  • 1 year ago +12

    Bro your channel… what a goldmine it has been over the years

  • @perarneng
    @perarneng 1 year ago +2

    When it can clone your voice "Two minute papers" then I'm convinced. It's awesome and contributes to making this channel unique! Thanks for so many good videos!

  • @NorbertFuto
    @NorbertFuto 1 year ago +2

    This is scary and mindblowing at the same time, I mean all this new real-time AI synthesis of voice, video, pictures, conversation. I cannot find the words to describe my feelings about all this.

  • @chriswinslow
    @chriswinslow 1 year ago +32

    I can imagine in about 10 years from now music artists from yesteryears could license out their voices to be used with AI music generators, so listeners/fans would have an unlimited music library of songs in the styles which are no longer played. Even their looks could be used to create music videos to go with the music. But why stop there? I think there'll be 100% AI-generated novels, games and even movies, at least the credits will only roll for about 5 seconds tops. I want to live in a world where I can have access to unlimited episodes of Futurama! Great video as always though, thanks for keeping us in the loop with your brilliant content.

    • @SW-fh7he
      @SW-fh7he 1 year ago +5

      What is the value in content then, when you can generate absolutely everything you can imagine in minutes?

    • @JohnDlugosz
      @JohnDlugosz 1 year ago +2

      Playing with ChatGPT, I asked for a new Beatles song "involving trains as a metaphor". Well, the lyrics anyway. But I agree it won't be long before I can conclude the interactive guidance with "render that."
      Oh, I followed up with "How about a Weird Al version?" and it not only knew what I meant but did a reasonable job changing it around to have a different meaning.

    • @JohnDlugosz
      @JohnDlugosz 1 year ago +2

      @@KnightandDay33 follow up prompt: "Not bad, but make it less cliche."
      I've found that "stories" it generates are not proper stories at all, but a series of events. It lacks conflict, most obviously. But, the writing can be greatly improved with follow-up prompts and interactive exploration of ideas.
      I suppose a model could be trained specifically for writing stories, and it would have learned these lessons permanently.
      But, just like different people will write somewhat differently, different _instances_ of the model that's undergone different human guided learning will pick up different individual characteristics.

    • @aminulhussain2277
      @aminulhussain2277 1 year ago +1

      @@SW-fh7he To waste time. Same as always.

    • @cavemann_
      @cavemann_ 1 year ago

      I don't like how it's taking out the human element out of something so intrinsically human. Can't imagine living in a reality like that.

  • @JozuaSijsling
    @JozuaSijsling 1 year ago +1

    I really hope this won't get abused by telemarketers, imagine your voice getting morphed into accepting a contract.

  • @kartikaeyakumar770
    @kartikaeyakumar770 1 year ago +2

    waiting for the AI to say "what a time to be alive!"

  • @LovelyLori193
    @LovelyLori193 1 year ago +4

    I feel like this will be a way to fast track voice actors out of any industry for money saving purposes. Voice actors should unionize so this sort of thing cannot replace them without them receiving adequate compensation for their likeness.

  • @shipwreck9146
    @shipwreck9146 1 year ago +8

    How long does it take to generate the voice samples? Like, could you potentially have a live conversation with this or does it need to be pre generated.

  • @VOLatile_MiX
    @VOLatile_MiX 1 year ago

    This is fascinating - this turns speech and audio into fashion. Something that can be changed and updated and will ebb and flow over time with trends. Very cool.

  • @MacXpert74
    @MacXpert74 1 year ago +5

    One thing I'm certain of is that NO ai will be able to mimic the exact unique voice inflections of Károly, though! 😂
    "What a time to be alive!!" 😜

  • @milasudril
    @milasudril 1 year ago +1

    While somewhat robotic, you can do half-decent text-to-speech with techniques from the '80s. You can also morph your voice with purely analog techniques (vocoder), or digital Fourier-based ones.

  • @bradgentle354
    @bradgentle354 1 year ago +1

    Next paper down the line will be like "This AI Clones Your Voice with just one burp"

  • @jamesboyce7467
    @jamesboyce7467 1 year ago +1

    Can't wait for an "Auto film, TV and Anime translator" that isolates vocals, auto-translates, and generates English voice acting while keeping the voice of the original VA.

  • @questmarq7901
    @questmarq7901 1 year ago +2

    This would be lovely for any D&D Virtual Tabletop game app. The game master would be able to create NPCs with the voice acting of his choosing. If this is combined with an Open-source program such as FoundryVTT, it will be a new gaming era

  • @bastronom4496
    @bastronom4496 1 year ago +15

    Am I the only one feeling we are really close to some huge changes in our society?
    This all feels so fast it's surreal

  • @Plafintarr
    @Plafintarr 1 year ago +1

    With the possibility of creating convincing representations of voices long gone, something is clear: This is not only history in the making, it's also history in the remaking.

    • @jdsguam
      @jdsguam 1 year ago

      Remember it is the winner that gets to write history. AI will be the ultimate winner at some point in human evolution.

  • @brekol9545
    @brekol9545 1 year ago +7

    wow just wow
    i really didn't expect it would take only 3 seconds. if only it was open source....

    • @panpiper
      @panpiper 1 year ago +1

      Give it a year. There will be open source to do the same.

  • @ex0stasis72
    @ex0stasis72 1 year ago +3

    I'm looking forward to all the upcoming video games with limitless branching storylines based on the player's choice, all with 100% voice acting via AI.

  • @ge2719
    @ge2719 1 year ago +3

    that's awesome, especially the ability to have emotion. too many of these voice recreators just have basic talking, and it seems like they're very limited as a result.
    My question would be: does this new system get significantly better when it gets more than a few seconds of audio?

  • @jimmimis6364
    @jimmimis6364 1 year ago +2

    Here it is, you can't trust phone conversations anymore 😅

  • @OperationDarkside
    @OperationDarkside 1 year ago

    Imagine:
    - Optimus Prime reading you an instruction manual
    - Samuel L. Jackson commenting on your school grades
    - Rowan Atkinson describing an accident happening in slow motion
    - Barack Obama reading an erotic novel in his campaign speech voice
    - Ryan Reynolds reading your medical exam results in his Deadpool voice
    - John Wayne as a bartender
    - A 5h documentary about desert sand by Neil deGrasse Tyson

  • @RedJerick
    @RedJerick 1 year ago +1

    It's intriguing and scary at the same time, I think now we need reliable technology to detect voice spoofing.

    • @jdsguam
      @jdsguam 1 year ago +1

      Most, if not all, social media sites will be detecting fakes as part of their overall business model. Those that don't will lose viewers. Successful web browsers will have detection included in their software as well. All of this is being done as you read this.

  • @gurglenurgle6539
    @gurglenurgle6539 1 year ago +3

    This is HUGE for modding games. Modders will be able to add great voice acting to their mods without having to spend a lot of money. Can't wait for Skyrim mods to use this tech. ;)

  • @AdmiralBison
    @AdmiralBison 1 year ago +1

    You could turn your collection of books and novels into impromptu audio books.

  • @marco_foco
    @marco_foco 1 year ago +3

    To be convinced, I definitely want to hear the network clone your voice and say the channel motto "What a time to be alive!" :D

  • @Ben-to7oe
    @Ben-to7oe 1 year ago +1

    I loved to hear my grandmother reading me a book. Can’t wait to do it

  • @alexmarriott3891
    @alexmarriott3891 11 months ago +1

    I love how the name is a reference to WALL-E

  • @onewillow8511
    @onewillow8511 1 year ago +1

    Oh yes! Another one under the hat!
    On a serious note though, thank you for spreading the quality and necessary info on machine learning/coding/A.I., so the "regular fox" can grasp the context and not spread panic and "doom and gloom" 🙃
    Kudos to you, brother!

  • @Hamachingo
    @Hamachingo 1 year ago

    I need this asap to troll telemarketing callers. Imagine calling someone to peddle your scam and they’re talking back to you in your own voice and mimicking all your mannerisms the angrier you get.

  • @praxis22
    @praxis22 1 year ago +1

    This must be the tech behind elevenlabs voice kit. Going down a storm with Skyrim modding

  • @LostMekkaSoft
    @LostMekkaSoft 1 year ago

    What I would use this for: Take audio samples out of Richard Feynman's lectures as the input and then let his voice narrate his biography "Surely You're Joking, Mr. Feynman!"

  • @WT83
    @WT83 1 year ago

    The synthesis is already good. I wonder if that's as good as this version can be regardless of how much audio input you provide or if you can read a couple of sentences in, maybe with labels like "angry", "excited", "sad" etc, and the synthesis can be a lot better?
    I can't wait for gaming to incorporate things like generative images, voice, text, and animation. You could have an RPG with infinite, story-driven quests that were high quality and fully voiced. Or imagine every NPC actually has a detailed life going on outside of player actions and you can keep prying into it and digging into it as much as you want.

  • @RooMan93
    @RooMan93 1 year ago +6

    Because our eyes are our best sense and the brain is so fond of filling in the blanks in our vision, AI images don't need to worry about the exact pixel colours or exact shape of objects to create a convincing image. I think AI audio (speech in particular) is going to have to work very hard to convince us it's a real person speaking. As we have a fairly narrow range of hearing ability that is tuned to human speech, even tiny inconsistencies get picked up.

    • @geli95us
      @geli95us 1 year ago +1

      I think more than it being about different senses, it's simply because we are really good at recognizing humans, the same happens with vision if you try to render fake humans, it's very hard to convince you that a fake human is real (especially when it comes to motion), hence why most realistic videogames still have uncanny-looking humans, even though everything else is pretty much realistic. I think the same would happen with audio

  • @confusioned2249
    @confusioned2249 1 year ago +2

    Everyone's talking about how incredible this AI paper is, but is no one going to mention that it could be so easily used for nefarious purposes, ranging from simply just making someone you don't like say something fucked up, to actually making false evidence in a legal context?
    This technology really seems like a double edged sword, capable of making great things, but if used by the wrong people, just as capable for horrid things, and I hope that we can actually figure out the legal specifics of this technology before any bad actors get their hands on this, at which point it'll probably be, in a best case scenario, just some really unneeded instability added to the world, and in a worst case scenario, if they play their cards right, maybe riots, maybe even a coup, or maybe even something much worse, who knows really.

  • @josefk332
    @josefk332 1 year ago

    Reminds me of a movie scene.
    “Hey Ginelle, what’s wrong with Wolfie, I can hear him barking, is he OK?”
    “Wolfie’s fine honey, Wolfie’s just fine. Where are you?”
    “Your foster parents are dead”.

  • @Otis151
    @Otis151 1 year ago

    Personal assistants: cool. Recreating deceased individuals: no thanks.

    • @jdsguam
      @jdsguam 1 year ago +1

      IDK. I'm 68 and would absolutely love to have an AI 3D Animated Chatbot of my Dad with his voice. That would be awesome! Sadly though, I have no recordings of my father that passed in 1980.

  • @rayman4954
    @rayman4954 1 year ago

    This is mind blowing stuff!
    Thanks for bringing this knowledge to us.

  • @julienjeanbourquin1756
    @julienjeanbourquin1756 1 year ago +3

    I was wondering how much AI is used to research math. For instance, are there AIs that have been trained on proven mathematical theorems to see if they can come up with improvements to the theorems or new ones altogether?

  • @DrBagPhD
    @DrBagPhD 1 year ago +5

    I know people are a bit scared of the potential of this technology but remember that we *always* fear new things. Look at photoshop etc.

    • @rzu1474
      @rzu1474 1 year ago

      Sorry. Can't hear you over the police sirens after you just admitted to plotting treason.
      It's on Tape. It's your voice

    • @Elite7178
      @Elite7178 1 year ago

      Dumb analogy. This is terrifying. And it’s only just getting started. We are screwed boys 😂 ai is going to get out of hand quickly you’ll all see how dangerous this technology will get.

  • @sethjchandler
    @sethjchandler 1 year ago +1

    Very scary. You answer the phone from an evil individual. You have a discussion for five seconds before hanging up. And now they can clone your voice and impersonate you.

  • @kaisquared90
    @kaisquared90 1 year ago

    Getting those spam calls that don't say anything just got a whole lot more dangerous.

  • @guilhermehx7159
    @guilhermehx7159 1 year ago +1

    This is terrifying. I'm no longer excited about most AI progress and I don't know why Two Minute Papers is excited. He sees no problems coming with these AI tools.

  • @venim1103
    @venim1103 1 year ago +1

    Ok so Apple for example (or Google etc. voice assistant) can soon just use their “Hello Siri” recording to fully copy anyone’s voice and use it for their own needs. Doesn’t sound scary at all…

  • @DavidBerglund
    @DavidBerglund 1 year ago +3

    Are there any good open source projects to look at that are fairly easy to try out? Doesn't matter if it requires a very long sample recording to produce decent results. So many experiments I'd like to try. Especially for indie game/storytelling stuff.

  • @Hiroprotagonist253
    @Hiroprotagonist253 1 year ago +1

    Remind me again why this is exciting and not depressing

  • @anothergol
    @anothergol 1 year ago +1

    I was waiting for a "this whole video has been voiced by the AI" moment, but yeah, from those samples, not there yet. Still noticeably synthetic but impressive enough.

  • @_Super_Hans_
    @_Super_Hans_ 1 year ago +1

    I reckon if the presenter of this video tried to get the AI to copy his voice the computer would blow up.

  • @camelCased
    @camelCased 1 year ago

    I wish we had a voice cloning solution that removes the text and language part completely and learns only the target voice timbre. This way you could pass your inflection and emotions to another voice.

  • @sebastiancanalesgarcia
    @sebastiancanalesgarcia 1 year ago +1

    it would be awesome to use this as a read aloud browser extension

  • @sergemarlon
    @sergemarlon 1 year ago

    This is exactly like the scene from mission impossible where they get the bad guy to read a few phrases to mimic his voice.

  • @allmight1612
    @allmight1612 1 year ago

    That’s Mission: Impossible levels of voice mimicking now

  • @emgee44
    @emgee44 1 year ago

    I can still detect a slightly synthesised sound to the AI voice, but if you didn’t know about this technology, you wouldn’t know any better. Impressive.

  • @timgrove3927
    @timgrove3927 1 year ago

    Gonna be using that sleepy voice for absolutely everything

  • @peter5.056
    @peter5.056 1 year ago +1

    This is how Rick and Morty are going to be voiced.

  • @AshBethel
    @AshBethel 1 year ago +2

    I think celebrities should offer their voices as a paid option for users to choose when listening to audiobooks. I can imagine Morgan Freeman making some serious money that way.

  • @Hunter-uz9jw
    @Hunter-uz9jw 1 year ago +1

    I remember seeing voice cloning gadgets in 2000s action movies and thinking that it would never be possible... and here we are lol

  • @QuestionMan
    @QuestionMan 1 year ago

    I wonder if it could take samples from old radio broadcasts and, not only clean up the quality, but also restore the entire dynamic range of the original voice.

  • @TenPester
    @TenPester 1 year ago

    Imagine all the scammers calling you up, getting you to speak for 3 seconds, and cloning your voice so they can pretend to be you to relatives or friends or even your workplace. What a great tool.

  • @paulodonovanmusic
    @paulodonovanmusic 1 year ago +1

    In January I was imagining having various celebs' voices (e.g. Morgan Freeman's) licensed to a video tutorial site, so users could pick the trainer persona that would teach them, e.g. JavaScript. So I was imagining Bowie explaining variables. Or the manager in a small company could run in-house trainings using the materials of third parties, but with her own authoritative, trusted persona. And preferably with options too. Seems like we are much farther along than I imagined already :)

    • @colindayo
      @colindayo 1 year ago

      And what about the copyright?

    • @paulodonovanmusic
      @paulodonovanmusic 1 year ago

      ​@@colindayo if you're talking about voices of people, then licensed as mentioned above. In case of dead people, also licensed in advance of death of course. Bowie is just my personal imagination, you couldn't actually get him to sign anything now. Or at least not very legibly. If you're talking about the tutorial site, how do tutorial sites handle copyright licensing? That would be between them and the creators. Obviously everyone has to be on board. Otherwise it's not really a legit tutorial site to begin with. Unless of course they had the creators sign some kind unlimited use clause.
      Anyway, I could write you a business plan, and lend you money to start it, but then, shouldn't you contribute something too? Or is it just let's find something wrong? You could probably avoid the whole question by having ChatGPT 7 generate new tutorials, even new voices, like a cross between Bowie and Freeman and Kermit the Frog. Actually, it would probably sound like Kermit without even adding Kermit.

  • @sendtheasteroid8008
    @sendtheasteroid8008 1 year ago

    ChatGPT creates your paper, then another program copies your voice, then a deepfake is made of you. Then you put it all together in a Zoom meeting and work is getting done while I play RuneScape.

  • @elitecoder955
    @elitecoder955 11 months ago +1

    I wish I had my grandfather's voice recorded for this moment ... :(

  • @rj_2190
    @rj_2190 1 year ago +3

    This is so cool. Is there any way to test it out yet? I’d love to test it with different voice styles

  • @thesaltyguy3564
    @thesaltyguy3564 1 year ago

    "imagine your dead with reading you a book" uuuh this is 100% a black mirror plot

  • @BRUXXUS
    @BRUXXUS 1 year ago

    This is one of the only videos on a paper that actually feels like magic. Haha.

  • @clray123
    @clray123 1 year ago +1

    Wow, just imagine scammers using this to scam your grandmother.

  • @MichelBertrand
    @MichelBertrand 1 year ago

    Imagine if you could have conversations with a loved one who passed away.

  • @leannviolet
    @leannviolet 9 months ago

    I want the VALL-E to train with your voice and wake me up every morning with "WHAT A TIME TO BE ALIVE!"

    • @leannviolet
      @leannviolet 9 months ago

      On a side note: Two Minute Papers is the cure to depression, thank you Dr. Károly Zsolnai-Fehér 😁😁

  • @superdoodjj
    @superdoodjj 1 year ago

    Next paper: AI perfectly clones your voice after a single breath

  • @humanfriendlyrobot
    @humanfriendlyrobot 1 year ago

    Tell me you got a bag of hotchips, without telling me you got a bag of hotchips.

  • @Srindal4657
    @Srindal4657 1 year ago +2

    I need more technological advancements. MORE!

  • @mrburns366
    @mrburns366 1 year ago

    this is going to be great for NPCs in video games! AI can have them respond intelligently and with a convincing human voice

  • @vasudevmenon2496
    @vasudevmenon2496 1 year ago

    Microsoft Research projects were among the best and had innovative ideas. I liked their Garage projects. Would love to see this in widespread use.

  • @Kazazki64
    @Kazazki64 1 year ago

    Ngl, every time I hear your voice I already think it is AI-synthesized based on your speech cadence lol! Keep up the good work though, love everything you do!

  • @arabidllama
    @arabidllama 1 year ago

    Forget the "death of the author" argument - we're now into the "resurrection of the author" argument ._.

  • @maljamin
    @maljamin 1 year ago +1

    People in comments are discussing how far along this actually is. It must've been like 6 years ago Lyrebird AI came out. I forget how much audio it needed, I want to say 20 minutes, could be less. The result was pretty great, better than the "baseline" clips here. Arguably as good as the new model here. Of course that's doing it with 3 seconds of audio so if you got 20 minutes worth, that's got to be amazing.

  • @junofall
    @junofall 1 year ago +3

    What exactly is the use case supposed to be for this? The dangers far outweigh the benefits. As for cloning the voices of the dead, we should really leave them alone.