Microsoft’s New AI Clones Your Voice In 3 Seconds!
- Published 8 Feb 2023
- ❤️ Check out Lambda here and sign up for their GPU Cloud: lambdalabs.com/papers
📝 The paper "VALL-E Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers" is available here:
valle-demo.github.io/
My latest paper on simulations that look almost like reality is available for free here:
rdcu.be/cWPfD
Or this is the orig. Nature Physics link with clickable citations:
www.nature.com/articles/s4156...
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Edward Unthank, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Matthew Valle, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: / twominutepapers
Thumbnail background design: Felícia Zsolnai-Fehér - felicia.hu
Károly Zsolnai-Fehér's links:
Twitter: / twominutepapers
Web: cg.tuwien.ac.at/~zsolnai/ - Science and Technology
I remember as a kid coding with a TI-99 4A that used the Texas Instruments chip set from Speak and Spell. My, how far technology has come for computer-generated voice!
This research actually derives from the Speak and Spell module of the T800 model 101 that appeared only 3 years later!
I have fond memories of my TI-99 4A! 😊
Mr beast promoting my stuff
Ikr
My 1st PC was a Timex/Sinclair 1000! en.wikipedia.org/wiki/Timex_Sinclair#/media/File:Timex_Sinclair_1000_FL.jpg
Yep, you hooked it up to your TV's VHF antenna connection.
Holy cow.. I just had a thought. Imagine wanting to listen to an audio book. You could have an entire cast of voices, each unique to the character. And not only that, you could potentially choose who you wanted to voice each part, using some kind of drop-down list or whatever. Want Morgan Freeman, or James Earl Jones, or Stephen Fry, or whoever you prefer to voice character X? No problem. Change your mind? No problem.
That would be insane! Obviously, people should be compensated for using their voice like this. But, it’s still an amazing idea. Of course, like nearly every AI discovery lately, this has massive potential to be abused. But, we’re just going to have to do the best we can to limit illicit uses. It’s not like we can put these genies back into their bottles.
Creative, I like it! I do think the scamming will make this a net negative tech
Why compensate for the voice? What does the law say on that matter?
You can even make it play in your own voice!
You could even use a language model like ChatGPT to automatically detect which text should be read by which character. You could basically automate the whole process with AI. Insane!
I was thinking that! I would like to be able to create voices for my characters. Perhaps by blending, or perhaps in a similar way to Stable Diffusion but for voice, where there is the text to say and then a block after that says "in the style of" and then several names perhaps. I would feel more comfortable about it creating original voices than ripping off an existing voice actor, however, but I see people doing that anyway...
But assuming these issues can be overcome, it means that an author who has specific voices in their head could use AI to have their audiobooks made with these original voices, that sound like perfectly real people, but in fact they are entirely synthetic! That would be really amazing to have as an option, especially if a whole cast of real-life voice actors are not in the author's budget.
Imagine being able to voice small indie games using AI. Incredible.
This is exactly what I was thinking. The ability to do variations is amazing too
I'm currently pretty hyped about the possibility of voicing player's self chosen names.
Voice Actors will soon be paid next to nothing for their work. Unionization is crucial to prevent this from happening.
Chat GPT helps me develop my indie game and it's very persistent in its suggestion that I use a language model that understands the lore of my game to talk to players as the NPCs. Dynamically building the lore as the player asks further questions. I didn't bring it up. It wants me to do it.
Nah, that's a lame use of it. A better one would be to be able to insert a clip of your own voice so the protagonist can have fully fleshed-out dialogue as if it's you.
Imagine listening to your favorite novel read to you in your own voice. Crazy
No thanks…
You like the sound of your own voice?
My voice in my head doesn't sound anything like my own voice to my ears. If I am being read to, I'd much rather a professional caliber voice thank you.
Hmm yah no thanks!! Lol
I hate my voice lol
This is going to be useful for modders. You can give extra voice lines to your character in FO4 if the roleplay wasn't the best, or straight up replace the VA with you.
Edit: We could even shove more names into Codsworth, like imagine hearing Codsworth saying "Good morning mr Tittyfuck"
Another Edit: Yeah bitches, I was right, there's now a mod on Nexus that enhances the RP of FO4. I knew that was possible, it was only a matter of time
I didn’t even think about modders. That’s a super cool application for this tech! Adding your own voice into a game would also be quite fun, and maybe a little weird.
modders already have a synthetic voice mod, this will bring it to the next level.
Hell, I might actually start working on Kidmer again.
Or even free voice acting in all games
or just make every character in game have your voice! 😁
Those examples were crazy. Imagine how voice recordings are going to be manipulated. Voice recordings won't be usable as solid evidence anymore, unless there is still detectable artifacting, which there probably is.
That's a good thing, they were already way too easy to chop up and take out of context.
True, but there will always be some independent company with their own way of combatting this.
It's still too noticeably synthetic, but we'll get there. However, it can already be done today: voice impersonators can do a perfect job. But yeah, when it becomes easy, we'll be flooded by fakes.
What about the metadata of the file?
Wouldn't worry too much yet, just like image editing it all leaves traces. There is a difference between sounding the same and being the same on the technical level.
You would need to do a ton more to get the same spectroscopic makeup of the original.
Eleven Labs already does a pretty good job at cloning your voice with a couple minutes of audio, and it's already available. Things are progressing super fast.
Looks incredible. ~12 mins of audio/month on the free tier? ~36 mins/month for $5. I'm curious about their new product to be released in Q1 -23. Sounds like it's going to be a desktop application with lots of editing functionality.
4chan anon already leaked the source code by hacking Eleven Labs, idk what's going to happen now
@@deadpianist7494 If the source code doesn't include trained weights it's mostly useless
Eleven labs might even be a little better.
bots
About seven years ago, I received the final voice mail messages the love of my life ever sent. I still have them saved. I'm realizing it's now possible to hear her voice read out her texts or even say something new. I don't know quite how I feel about it.
I would find it rather creepy personally. It would be fine listening to old recordings but creating new never before spoken words from my loved one written by someone else? Nah I couldn't stomach that.
There's a Black Mirror episode about exactly this
Mix this voice ai tech with chatgpt and you can talk with your deceased loved one as if they were actually there.
It is creepy for me. But yes, I know how u feel.
@@yooneunhyesarang9245 that's what I mean... I'm stuck between missing her voice and feeling like it's an insult to her memory.
We are witnessing a new era! What a time to be alive!
We truly are in the horrible timeline.
Anyone else think his voice was AI generated immediately?
every video
I was actually surprised that that was not the twist at the end of the presentation...
That's what i thought a year ago
ElevenLabs is leading the pack in terms of convincing us, but ElevenLabs suggests twenty times the amount of the sample
This channel belongs entirely to artificial intelligence
That's cool, but also really scary. Imagine on which level this is in one or two years.
if they're telling us now, it got there at least 5 years ago.
We are in the end times. The AI will be used by the antichrist if it is not the antichrist itself.
My only concern is that this could be used for slander and framing, which would be a legal nightmare.
this is not the best. Just look at elevenlabs
@@celozzip Nah- That sounds like conspiracy brain. This is capitalism. They’re in competition for funding and to be first (to get more funding). If anything tech stuff tends to get presented as ready before it’s as good as they say.
"from 30 minutes to 3 seconds! and just imagine what we will be able to do two more papers down the line"
me: "a"
AI after listening to that, using my voice: "I am you, but better."
This is a science channel but it feels like a terror channel. Seriously, this ai is becoming scary
Let it team up with robots from Boston Dynamics, and Arnold wouldn't have to make a Terminator 5.
This kind of voice cloning is both intriguing and very dangerous at the same time. It also reminds me of Terminator 2 where the T1000 cloned the voice of Connor's foster mother, but T800 Arnie was on to him. 😅
Knives could be dangerous yet 99.9999% of the time they're used to make stuff, prepare food, and open boxes. Instead of dangerous tools I'm more concerned about the root causes and conditions that mold people into those that are capable of doing dangerous things including the billionaires and the politicians they control through legalized bribery we incorrectly call "campaign contributions" themselves and the roles they play in carrying out dangerous antisocial schemes and developing destructive environments that harm our brothers and sisters in humanity at home and all over the world. Plus, I also thought of that Terminator 2 scene 🤣
it's only dangerous right now, when no one knows about it. in the future most people will catch on and know not to just trust audio clips, just like how we now all know that images can be photoshopped
Your foster parents are dead.
@@ohiasdxfcghbljokasdjhnfvaw4ehr things like this will be a legal nightmare. "The 8k video of me loudly admitting murder while slashing a person with a knife is just a deepfake video with a copied voice. I am innocent". Good luck proving a majority of crime if images, video and audio become useless evidence...
A YouTuber used your voice as a prompt with Vall-E, and it was incredible! What a time to be alive!
Link? I'd love to hear that
@@TheSchizoDuckie Here it is: ua-cam.com/video/kqzI91YIfmw/v-deo.html
@@TheSchizoDuckie I’m on mobile so I can’t get a link easily, but the YouTuber was MattVidPro
I think the original commenter meant to say as a prompt with ElevenLabs. Because here in MattVidPro AI’s video about ElevenLabs’s voice cloning, he tried it using Károly’s voice (I linked to that exact point in the video):
ua-cam.com/video/kqzI91YIfmw/v-deo.html
telephone scam "it's you from the future, invest in X!" incoming?
This is insane! I hope this will be available to the public soon. It would be cool if you could feed more than 3 seconds in to even improve the similarity and details. And different languages would be very nice.
yeah I hate how they're trying to do it with such short clips, I'd rather have to spend a week training and have it more accurate than have it work in 3 seconds.
There’s stuff available right now. I tried Elevenlabs the other day and it sounds way better than the stuff in this video
@@ozzi9816 Yes, I tried that with my voice, that is insane :D Hearing myself talking without an accent or with some weird Scottish accent is so disturbing
I'm sure that it will allow longer training. I think the bare three second example is intended to be a flex rather than the limit.
@@glumpfi How did you try it with your voice? I'm really interested in this research, so please help me if you can.
the material created by the A.I has a few artefacts, but using classic signal processing tools like VST plugins could carve that out easily. You could even add characteristics such as being recorded on old analog systems or phones to place it into a specific situation. - super nice!
Voice cloning from such a brief voice sample is impressive, but ElevenLabs already has superior voice cloning available to the public.
ElevenLabs TTS is insanely good. It knows how to infer emotion from the text and knows how to act out quotes in the text.
I was about to say it 😂 It's really too good to be real.
Yeah, I was gonna say. This really isn't that impressive compared to Eleven. But, the way it can apply emotions in the voice is very clever. Two more papers down the line.... where will we be.
It's too bad Eleven is doing whatever they're doing behind closed doors. It's extremely good, but it's proprietary - which means there won't be a version which, for example, blind people could run locally... Although I'm sure research will catch up with them two more papers down the line. 😉
Very large difference in how much data is given.
There's been an explosion in the last week of AI generated voice memes of Dagoth Ur, a character from The Elder Scrolls III: Morrowind that only had a couple voice lines. It's honestly mindblowing how good they sound, even showing emotions that weren't in the training data
What a grand and intoxicating advancement we have made
Oh sweet Nerevar, there is a lot more to come.
"I'm sorry dave, I can't do that." - in Goofy's voice.
I know this thinking goes into the dangers of overtaking entire industries, but as AI language translation continues to develop, we could eventually see TV shows translated to any language using the voices of the original actors
This is exactly what I thought. Today, for example, @MrBeast uses voice actors to reach a French audience; we could imagine that he will have his own voice speak French in the future instead.
@@johnconnor3055 ua-cam.com/video/_7_lqLS1vMU/v-deo.html This is it? You are crazy - that has real voice.
There are no dangers of AI taking over our industries fellow human, do not worry
Did the paper mention if the quality of the voice synthesis improved with longer sampling times?
Yeah, it's one thing to simulate tone but another completely to simulate personality, the signature packages of which I hope VAs price well beyond the budgets of indie studios.
I think the biggest question I have with this is whether or not the quality of the output scales significantly with greater amounts of reference audio. These 3 second examples are impressive given that they only have 3 seconds to work with, but you *can* tell something's just a touch off with these. With more audio, does that tiny bit of uncanniness disappear? Cuz if so, that's maybe the more impressive accomplishment in my eyes.
Not sure, but it probably would. I don't think there's any way in hell you can capture someone's voice with only 3 seconds of audio, even with much better technology than this. There are just too many vocal tics, word pronunciation quirks, moods, throat hoarseness, etc... that would need a lot more information to properly emulate. And, of course, someone's personality would take a lot more than their voice to get even close to capturing.
in your ears*
Bro your channel… what a goldmine it has been over the years
When it can clone your voice "Two minute papers" then I'm convinced. Its awesome and contributes to making this channel unique! Thanks for so many good videos!
This is scary and mindblowing at the same time. I mean all this new real-time AI synthesis of voice, video, pictures, conversation. I cannot find the words to describe my feelings about all this.
I can imagine in about 10 years from now music artists from yesteryears could license out their voices to be used with AI music generators, so listeners/fans would have an unlimited music library of songs in the styles which are no longer played. Even their looks could be used to create music videos to go with the music. But why stop there? I think there'll be 100% AI-generated novels, games and even movies, at least the credits will only roll for about 5 seconds tops. I want to live in a world where I can have access to unlimited episodes of Futurama! Great video as always though, thanks for keeping us in the loop with your brilliant content.
What is the value in content then, when you can generate absolutely everything you can imagine in minutes?
Playing with ChatGPT, I asked for a new Beatles song "involving trains as a metaphor". Well, the lyrics anyway. But I agree it won't be long before I can conclude the interactive guidance with "render that."
Oh, I followed up with "How about a Weird Al version?" and it not only knew what I meant but did a reasonable job changing it around to have a different meaning.
@@KnightandDay33 follow up prompt: "Not bad, but make it less cliche."
I've found that "stories" it generates are not proper stories at all, but a series of events. It lacks conflict, most obviously. But, the writing can be greatly improved with follow-up prompts and interactive exploration of ideas.
I suppose a model could be trained specifically for writing stories, and it would have learned these lessons permanently.
But, just like different people will write somewhat differently, different _instances_ of the model that's undergone different human guided learning will pick up different individual characteristics.
@@SW-fh7he To waste time. Same as always.
I don't like how it's taking out the human element out of something so intrinsically human. Can't imagine living in a reality like that.
I really hope this won't get abused by telemarketers, imagine your voice getting morphed into accepting a contract.
waiting for the AI to say "what a time to be alive!"
I feel like this will be a way to fast track voice actors out of any industry for money saving purposes. Voice actors should unionize so this sort of thing cannot replace them without them receiving adequate compensation for their likeness.
How long does it take to generate the voice samples? Like, could you potentially have a live conversation with this or does it need to be pre generated.
This is fascinating - this turns speech and audio into fashion. Something that can be changed and updated and will ebb and flow over time with trends. Very cool.
One thing I'm certain of is that NO ai will be able to mimic the exact unique voice inflections of Károly, though! 😂
"What a time to be alive!!" 😜
While somewhat robotic, you can do half-decent text-to-speech with techniques from the '80s. You can also morph your voice with purely analog techniques (vocoder), or digital Fourier-based ones.
Next paper down the line will be like "This AI Clones Your Voice with just one burp"
Can't wait for an "Auto film, TV and Anime translator" that isolates vocals, auto-translates and generates English voice acting, keeping the voice of the original VA.
This would be lovely for any D&D Virtual Tabletop game app. The game master would be able to create NPCs with the voice acting of his choosing. If this is combined with an Open-source program such as FoundryVTT, it will be a new gaming era
Am i the only one feeling we are really close to some huge changes in our society?
This all feels so fast it's surreal
With the possibility of creating convincing representations of voices long gone, something is clear: This is not only history in the making, it's also history in the remaking.
Remember it is the winner that gets to write history. AI will be the ultimate winner at some point in human evolution.
wow just wow
I really didn't expect it to take only 3 seconds. If only it were open source....
Give it a year. There will be open source to do the same.
I'm looking forward to all the upcoming video games with limitless branching storylines based on the player's choice, all with 100% voice acting via AI.
That's awesome, especially the ability to have emotion. Too many of these voice recreators just have basic talking, and it seems like they're very limited as a result.
My question would be: does this new system get significantly better when it gets more than a few seconds of audio?
Here it is, you can't trust phone conversations anymore 😅
Imagine:
- Optimus Prime reading you an instruction manual
- Samuel L. Jackson commenting on your school grades
- Rowan Atkinson describing an accident happening in slow motion
- Barack Obama reading an erotic novel in his campaign speech voice
- Ryan Reynolds reading your medical exam results in his Deadpool voice
- John Wayne as a bartender
- A 5h documentary about desert sand by Neil deGrasse Tyson
It's intriguing and scary at the same time, I think now we need reliable technology to detect voice spoofing.
Most, if not all, social media sites will be detecting fakes as part of their overall business model. Those that don't will lose viewers. Successful web browsers will have detection included in their software as well. All of this is being done as you read this.
This is HUGE for modding games. Modders will be able to add great voice acting to their mods without having to spend a lot of money. Can't wait for Skyrim mods to use this tech. ;)
You could turn your collection of books and novels into impromptu audio books.
To be convinced, I definitely want to hear the network clone your voice and say the channel motto "What a time to be alive!" :D
I loved to hear my grandmother reading me a book. Can’t wait to do it
I love how the name is a reference to WALL-E
Oh yes! Another one under the hat!
On a serious note though, thank you for spreading the quality and necessary info on machine learning/coding/A.I., so the "regular fox" can grasp the context and not spread panic and "doom and gloom" 🙃
Kudos to you, brother!
I need this asap to troll telemarketing callers. Imagine calling someone to peddle your scam and they’re talking back to you in your own voice and mimicking all your mannerisms the angrier you get.
This must be the tech behind elevenlabs voice kit. Going down a storm with Skyrim modding
What I would use this for: Take audio samples out of Richard Feynman's lectures as the input and then let his voice narrate his biography "Surely You're Joking, Mr. Feynman!"
The synthesis is already good. I wonder if that's as good as this version can be regardless of how much audio input you provide or if you can read a couple of sentences in, maybe with labels like "angry", "excited", "sad" etc, and the synthesis can be a lot better?
I can't wait for gaming to incorporate things like generative images, voice, text, and animation. You could have an RPG with infinite, story-driven quests that were high quality and fully voiced. Or imagine every NPC actually has a detailed life going on outside of player actions and you can keep prying into it and digging into it as much as you want.
Because our eyes are our best sense and the brain is so fond of filling in the blanks in our vision, AI images don't need to worry about the exact pixel colours or exact shape of objects to create a convincing image. I think AI audio (speech in particular) is going to have to work very hard to convince us it's a real person speaking. As we have a fairly narrow range of hearing ability that is tuned to human speech, even tiny inconsistencies get picked up.
I think more than it being about different senses, it's simply because we are really good at recognizing humans, the same happens with vision if you try to render fake humans, it's very hard to convince you that a fake human is real (especially when it comes to motion), hence why most realistic videogames still have uncanny-looking humans, even though everything else is pretty much realistic. I think the same would happen with audio
Everyone's talking about how incredible this AI paper is, but is no one going to mention that it could be so easily used for nefarious purposes, ranging from simply just making someone you don't like say something fucked up, to actually making false evidence in a legal context?
This technology really seems like a double edged sword, capable of making great things, but if used by the wrong people, just as capable for horrid things, and I hope that we can actually figure out the legal specifics of this technology before any bad actors get their hands on this, at which point it'll probably be, in a best case scenario, just some really unneeded instability added to the world, and in a worst case scenario, if they play their cards right, maybe riots, maybe even a coup, or maybe even something much worse, who knows really.
Reminds me of a movie scene.
“Hey Janelle, what’s wrong with Wolfie, I can hear him barking, is he OK?”
“Wolfie’s fine honey, Wolfie’s just fine. Where are you?”
“Your foster parents are dead”.
Personal assistants: cool. Recreating deceased individuals: no thanks.
IDK. I'm 68 and would absolutely love to have an AI 3D Animated Chatbot of my Dad with his voice. That would be awesome! Sadly though, I have no recordings of my father that passed in 1980.
This is mind blowing stuff!
Thanks for bringing this knowledge to us.
I was wondering how much AI is used to research math. For instance, are there AIs that have been trained on proven mathematical theorems to see if they can come up with improvements to the theorems or new ones altogether?
I know people are a bit scared of the potential of this technology but remember that we *always* fear new things. Look at photoshop etc.
Sorry. Can't hear you over the police sirens after you just admitted to plotting treason.
It's on Tape. It's your voice
Dumb analogy. This is terrifying. And it’s only just getting started. We are screwed boys 😂 ai is going to get out of hand quickly you’ll all see how dangerous this technology will get.
Very scary. You answer the phone from an evil individual. You have a discussion for five seconds before hanging up. And now they can clone your voice and impersonate you.
Getting those spam calls that don't say anything just got a whole lot more dangerous.
This is terrifying. I'm no longer excited about most AI progresses and I don't know why Two Minute Papers is excited. He sees no problems coming with these AI tools
Ok so Apple for example (or Google etc. voice assistant) can soon just use their “Hello Siri” recording to fully copy anyone’s voice and use it for their own needs. Doesn’t sound scary at all…
Are there any good open source projects to look at that are fairly easy to try out? Doesn't matter if it requires a very long sample recording to produce decent results. So many experiments I'd like to try. Especially for indie game/storytelling stuff.
Remind me again why this is exciting and not depressing
I was waiting for a "this whole video has been voiced by the AI" moment, but yeah, from those samples, not there yet. Still noticeably synthetic but impressive enough.
I reckon if the presenter of this video tried to get the AI to copy his voice the computer would blow up.
I wish we had a voice cloning solution that removes the text and language part completely and learns only the target voice timbre. This way you could pass your inflection and emotions to another voice.
it would be awesome to use this as a read aloud browser extension
This is exactly like the scene from mission impossible where they get the bad guy to read a few phrases to mimic his voice.
That’s Mission: Impossible levels of voice mimicking now
I can still detect a slight synthesised sound to the AI voice, but if you didn’t know about this technology, you wouldn’t know any better. Impressive.
Gonna be using that sleepy voice for absolutely everything
This is how Rick and Morty are going to be voiced.
I think celebrities should offer their voices as a paid option for users to choose when listening to audiobooks. I can imagine Morgan Freeman making some serious money that way.
I remember seeing voice cloning gadgets in 2000s action movies and thinking that it would never be possible.. and here we are lol
I wonder if it could take samples from old radio broadcasts and, not only clean up the quality, but also restore the entire dynamic range of the original voice.
Imagine all the scammers calling you up, getting you to speak for 3 seconds, and cloning your voice so they can pretend to be you to relatives or friends or even your workplace. What a great tool.
In January I was imagining having various celebs' voices, e.g. Morgan Freeman's, licenced to a video tutorial site, then users could pick the trainer persona that would teach them, e.g. JavaScript. So I was imagining Bowie explaining variables. Or the manager in a small company could have in-house trainings using the materials of third parties, but with her own authoritative, trusted persona. And preferably with options too. Seems like we are much farther than I imagined already :)
And what about the copyright?
@@colindayo if you're talking about voices of people, then licensed as mentioned above. In case of dead people, also licensed in advance of death of course. Bowie is just my personal imagination, you couldn't actually get him to sign anything now. Or at least not very legibly. If you're talking about the tutorial site, how do tutorial sites handle copyright licensing? That would be between them and the creators. Obviously everyone has to be on board. Otherwise it's not really a legit tutorial site to begin with. Unless of course they had the creators sign some kind of unlimited-use clause.
Anyway, I could write you a business plan, and lend you money to start it, but then, shouldn't you contribute something too? Or is it just let's find something wrong? You could probably avoid the whole question by having ChatGPT 7 generate new tutorials, even new voices, like a cross between Bowie and Freeman and Kermit the Frog. Actually, it would probably sound like Kermit without even adding Kermit.
Chat gpt creates your paper, then another program copies your voice, then a deep fake is made of you. Then you put it all together in a zoom meeting and work is getting done while I play RuneScape
I wish I had my grandfather's voice recorded for this moment ... :(
This is so cool. Is there any way to test it out yet? I’d love to test it with different voice styles
"imagine your dead with reading you a book" uuuh this is 100% a black mirror plot
This is one of the only videos on a paper that actually feels like magic. Haha.
Wow, just imagine scammers using this to scam your grandmother.
Imagine if you could have conversations with a loved one who passed away.
I want the VALL-E to train with your voice and wake me up every morning with "WHAT A TIME TO BE ALIVE!"
On a side note: Two Minute Papers is the cure to depression, thank you Dr. Károly Zsolnai-Fehér 😁😁
Next paper: AI perfectly clones your voice after a single breath
Tell me you got a bag of hotchips, without telling me you got a bag of hotchips.
I need more technological advancements. MORE!
this is going to be great for NPCs in video games! AI can have them respond intelligently and with a convincing human voice
Microsoft research projects were one of the best and had innovative ideas. I liked their garage projects. Would love to see that in widespread use.
Ngl every time I hear your voice I already think it is AI synthesized based on your speech cadence lol! Keep up the good work though, love everything you do!
Forget the "death of the author" argument - we're now into the "resurrection of the author" argument ._.
People in comments are discussing how far along this actually is. It must've been like 6 years ago Lyrebird AI came out. I forget how much audio it needed, I want to say 20 minutes, could be less. The result was pretty great, better than the "baseline" clips here. Arguably as good as the new model here. Of course that's doing it with 3 seconds of audio so if you got 20 minutes worth, that's got to be amazing.
What exactly is the use case supposed to be for this? The dangers far outweigh the benefits, as for cloning the voices of the dead, we should really leave them alone.