Deepfake AI voice clone: 30min vs 8hrs of training (Descript Overdub demo)

  • Published 1 Oct 2024

COMMENTS • 119

  • @lachlanmoore2345
    @lachlanmoore2345 2 years ago +83

    I am planning on using this tool to get my friends arrested by confessing to crimes they didn't do!

    • @sun-eye
      @sun-eye 1 year ago +3

      Hey, that's a cool little trick.

    • @billagond9209
      @billagond9209 1 year ago

      Something like this happened on Discord; never believe anything you see online.

    • @generationzee13
      @generationzee13 1 year ago

      Hell Nah

    • @thisguyIoI
      @thisguyIoI 1 year ago

      💀

    • @Ayplus
      @Ayplus 1 year ago

      That's what friends are for

  • @OwlofDating
    @OwlofDating 1 year ago +20

    Short but to the point, just what I wanted. Thank you

  • @dougjohnson1517
    @dougjohnson1517 2 years ago +5

    Sounds like I could replace audio book readers I hate with my favorite narrators. And if it's a little flat, good! Because what I hate is overacting.

  • @ricebeansrockroll882
    @ricebeansrockroll882 3 years ago +7

    It still sounds a bit "crunchy".
    But I'm not sure I would have thought you were a computer so much as using a bad microphone.

  • @FreshMootz
    @FreshMootz 1 year ago +3

    I've been trying to improve the robotic voice that it gave me to begin with, but I can't seem to find the settings for uploading my podcasts, where I have an abundance of my natural voice. Can you do a video on how to do this?

  • @FusionThunder
    @FusionThunder 3 years ago +24

    It's pretty good! I wonder how it will sound after 100 hours

  • @allenhuffman
    @allenhuffman 2 years ago +20

    That’s a really good demo of the technology. Back when I had a bunch of podcast shows, this would have greatly reduced the time I spent editing them each week. I can only imagine where it will be in ten years.

    • @IntenseInvestor
      @IntenseInvestor 2 years ago +2

      Can probably just think about the words and they will appear....

  • @tomdchi12
    @tomdchi12 3 years ago +10

    Yikes! I didn't notice on first viewing (or hearing)! Second time through, I picked up where the system mushes some sounds, and that the intonation is flat, but overall... scary good.

  • @gonzcasa
    @gonzcasa 3 years ago +10

    mind blown, this is a game changer for a lot of businesses

  • @VaibhavShewale
    @VaibhavShewale 2 years ago +2

    i tried to train but it never worked for me

  • @ArkyonVeil
    @ArkyonVeil 3 years ago +15

    Great improvement over the last test. Though it's still quite noticeable when someone has heard your voice before and is wearing headphones. The artifacts, like a slight electronic tinge and the unnatural inflection, kind of reveal the whole charade.
    HOWEVER: If someone isn't all that used to your voice or expecting the fakery, I imagine this could go unnoticed in the vein of a lackluster take and a spotty microphone.
    Technology is improving fast; it is likely that by 2022 most of these issues will have been ironed out.

    • @RealJamesArcher
      @RealJamesArcher 3 years ago +4

      Yeah, the artifacts are one of the biggest giveaways, but I don't think it'll take them long to sort that out. It's amazing to me how realistic they're getting the intonation. It's only a matter of time.

    • @mnomadvfx
      @mnomadvfx 2 years ago

      "If someone isn't all that used to your voice or expecting the fakery, I imagine this could go unnoticed in the vein of a lackluster take and a spotty microphone"
      ^^ This 100%.
      Not just the microphone, but also the audio compression which is far from perfect in every single encoding - especially if you are experiencing frame drops causing audio glitches at the same time.

    • @onliemovie8994
      @onliemovie8994 10 months ago

      It was obvious that this wasn’t your voice but still impressive honestly

  • @HowTechTo
    @HowTechTo 3 years ago +8

    That's pretty good! Did you use your own audio for the 8 hours or did you use their script? Not 100% clear if you have to use their script or not. I submitted 10 minutes and... it's not enough

    • @FreshMootz
      @FreshMootz 1 year ago

      I submitted about 45 minutes. It's not enough.

  • @TechieSewing
    @TechieSewing 2 years ago +2

    To be fair, it doesn't sound much like your voice or your own sound arrangement, all that combination of mic proximity, echo and so on. But it does sound like a human voice :)

  • @dobishs
    @dobishs 1 year ago +1

    Would this work when using someone else's voice if I need to input words from another language?

  • @randyrektor
    @randyrektor 3 years ago +4

    I love this, but I hate this, but I love this.. you know?

    • @RealJamesArcher
      @RealJamesArcher 3 years ago +3

      I feel the same! It's very clever, but it's a disturbing road to be on.

  • @matthewfuller9760
    @matthewfuller9760 3 years ago +3

    From the owner: "While you can edit Projects offline, you still need an internet connection to transcribe audio." Does this mean I cannot record output from my own voice without an internet connection once the project files have been generated? Transcription refers to the conversion of speech to text. I want to be able to type and have it output my voice without being online.
    In other words, they don't have a general model of my voice? Every new word is novel and must be computed using their servers, otherwise it would take forever?

    • @RealJamesArcher
      @RealJamesArcher 3 years ago

      I'm not positive, but I would assume that both the text-to-speech and speech-to-text require round trips to the server because they're both pretty processor intensive.

    • @matthewfuller9760
      @matthewfuller9760 3 years ago

      @RealJamesArcher yep. I am thinking the same way.

  • @murrayr100
    @murrayr100 2 years ago +1

    It sounds distorted and emotionally flat. Like a real person who is having equipment problems. It's really not good enough for podcasting. It might be alright for some short edits though.

  • @takiaschannel
    @takiaschannel 1 year ago +1

    Sounds Awesome, I've added about 12mins. Is it best to make a new dub with new longer audio or add new audio by editing the existing one?

  • @NightDocs
    @NightDocs 2 years ago +20

    It was pretty immediately apparent it was AI from the beginning. I’m actually kind of surprised because my personal voice clone on Descript sounds a little better while in yours I’m still hearing some artifacts.
    Also, the 8 hours of voice training was probably a little wasteful because the voice trainer only accepts about an hour of training, as far as I remember which means it probably took the first hour and discarded the rest.
    The improvement was almost entirely because their algorithm was recently updated and improved, not necessarily because you gave it more training
    Edit: saw your other video was only a few months ago so the extra training did probably help. It’s also been awhile since I read about the max training time so they may have changed that shrug 🤷🏼‍♂️
    Great video either way!

    • @Revontuletband
      @Revontuletband 2 years ago +1

      I think that different voices will get different results, well, just because they are different. For instance, I feel like a lot of quirks in James's clone are coming from the fact that his real voice has a lot of vocal fry. It's kind of a natural distortion, so it's probably not good for the AI. But anyway, it's very interesting to watch how things improve!

    • @mnomadvfx
      @mnomadvfx 2 years ago +3

      I feel like using compressed audio as training data isn't the best start however much more time it takes to upload uncompressed audio.

  • @BlackPrimeMinister
    @BlackPrimeMinister 1 year ago +1

    This video deserves the follow just because it is so clever.

  • @patrickstavros7429
    @patrickstavros7429 2 years ago +2

    Congrats, you just gave away your voice for free. You cannot stop them from using your voice in a podcast, commercial, product or service that you do not agree with or align with. The best part is you get no money in return, hence the ROYALTY-FREE term used in the agreement. You are a podcaster, you are making money off your YouTube channel, why on earth would you let your voice go for free?
    8.2 License to User Content. We claim no ownership rights in your User Content. You hereby grant to us a nonexclusive, royalty-free, sublicensable, worldwide license to access, reproduce, distribute, process, publish, display, perform, adapt, modify, analyze, and otherwise use the User Content to provide, maintain, and improve Descript and the Descript technology, without compensation to you, provided that our use of any Projects you create is subject to the usage limitations and confidentiality obligations set forth in Section 9 below.

    • @enriquemontero74
      @enriquemontero74 2 years ago +1

      is a simple voice lol

    • @percythefisherman
      @percythefisherman 2 years ago +1

      This is worrying. You have highlighted a very legitimate problem that for paid voice over artists and actors is a minefield. I feel that the producer of this video should at least let us know his opinion on this.

    • @mnomadvfx
      @mnomadvfx 2 years ago

      @percythefisherman It's exactly why the initial enthusiasm for deepfake tech waned so quickly in academia.
      Just like with the stem cell problem some time ago.
      The ethics problem reared its ugly head, people started pointing fingers, legislation started restricting what they could do and funding dried up - the academics are too afraid to push the tech forward for fear of losing funding they need to do research.

  • @aligatortree
    @aligatortree 2 years ago +2

    that was a good one James... Question, are you able to download the audio and put it elsewhere, let's say i wanna edit on Davinci, instead of on Descript's video editor?

  • @LiveWellUkraine
    @LiveWellUkraine 2 years ago +1

    Creepy... and cool. (Which is how good tech starts.) The question is James... will you use this power for good or evil?

  • @Madbeef878
    @Madbeef878 2 years ago +1

    Wow! As great as this tool would be for content creators, I can see it 100% being a 'must have', for the criminally minded. "Hello Mr Archer, How are you doing today? I'm just calling you about your bank account......"

  • @RootoonsEkim
    @RootoonsEkim 1 year ago +1

    Something I plan to do with Overdub is to clone my high voice for a character in my show I have, so when I get older and cannot do that anymore I will be able to use overdub to keep the voice in store.

  • @formulavicio4273
    @formulavicio4273 2 years ago +1

    Lol, everything was text to speech, man, this is awesome. I have problems creating videos since my house is very noisy, with a small space and multiple people around, so this is awesome!

  • @TheAaronalden
    @TheAaronalden 3 years ago +18

    I could tell immediately because I was listening for it. Still very impressive! I think within a year they could have it perfected. I'm not sure what makes it sound off, it just has kind of a digital glitch like when you set spoken words to autotune.

    • @Daniel_WR_Hart
      @Daniel_WR_Hart 2 years ago +2

      Same, but only because I was partially expecting to get punk'd. If this was anyone else's video on a different topic, I would have assumed they had a sore throat.

  • @moneymentor_channel
    @moneymentor_channel 2 months ago

    not good anyway. The quality of the sound is bad, the voice cloning was decent.

  • @georged822
    @georged822 1 year ago +1

    I noticed it right away, it reminds me of listening to low bit rate audio. But the 8hrs def made your voice sound higher rez. Maybe after 400 hours it will sound realistic?

  • @chumleyk
    @chumleyk 1 year ago

    Too many channels using fake audio. Your one included.. It sounded fake from the beginning. Soon everyone will get sick of it once they know the signs and render all the libraries of videos using it as trash.

  • @illiniry
    @illiniry 1 year ago

    I have lots of audio files of my deceased father, can I use descript to clone his voice or can I only train it by repeating their phrases? Please help, I would really like to bring my dad back thanks.

  • @breehimself
    @breehimself 3 years ago +2

    mindblown.gif

  • @TheADCRogue_YT
    @TheADCRogue_YT 1 year ago

    As soon as you played the first version and said you were going to compare, I realized the entire video had been dubbed

  • @DJBumbleBee901
    @DJBumbleBee901 23 days ago

    Nice. Do you have to verify vocals to train and clone? And is it mobile?

  • @jasonwood999
    @jasonwood999 1 year ago +1

    Wow...I hadn't noticed... this is fantastic

  • @roastking860
    @roastking860 1 year ago

    I'm trained as hell, let's go. I could totally tell the difference

  • @TomerGamerTV
    @TomerGamerTV 1 year ago

    1:41 at the second that the video started i already knew there was something wrong going on with your voice

  • @LostInTech3D
    @LostInTech3D 3 years ago

    sounds like you with the flu phoning into work 😂
    I can see this being bad news for us regarding those horrible text+speech videos that exist on youtube.

    • @RealJamesArcher
      @RealJamesArcher 3 years ago

      Or audiobooks. I can't imagine trying to listen to a whole audiobook with an AI voice, because even a great one would still be...awkward.

    • @mnomadvfx
      @mnomadvfx 2 years ago

      "I can see this being bad news for us regarding those horrible text+speech videos that exist on youtube"
      Au contraire - the current ones are terrible.
      This will at least give them a nice upgrade - unfortunately as they improve it will become harder to tell a real one from a fake one.
      Someone could literally just

  • @jartinte
    @jartinte 1 year ago

    Awesome did you have some guide for foundational model to train voices ?

  • @keisaboru1155
    @keisaboru1155 1 year ago

    sounds exactly like your broken mic hahaha xD

  • @CassiusTheShow
    @CassiusTheShow 1 year ago

    Hey James -- great video. What would you recommend today is the best way to train a model that I have 10 hours of interview audio with?
    For a documentary I'm working on -- I want to feed it audio of a professional actor performing a monologue I wrote, and use the model to overdub the documentary subject's voice onto.

  • @FERNANDOPENAS
    @FERNANDOPENAS 10 months ago

    Will Descript imitate my accent as well, or only my voice pitch and tone?

  • @aliaskennedy7897
    @aliaskennedy7897 2 years ago

    I was telling myself something wrong with the audio in this video , haha

  • @founderedmoney
    @founderedmoney 11 months ago

    First time I’ve said wow out loud to an ai tool

  • @annabrenda8694
    @annabrenda8694 2 years ago

    I noticed RIGHT FROM THE BEGINNING that you were using the AI

  • @IntenseInvestor
    @IntenseInvestor 2 years ago

    Wild....going to try this on my channel lol

  • @huynhdanghaiau
    @huynhdanghaiau 2 years ago

    I want to learn, you can open this good knowledge class

  • @Omnikam
    @Omnikam 1 year ago

    This could give back someone's voice lost to cancer

  • @BungieStudios
    @BungieStudios 1 year ago

    Very accurate. I could tell though probably because I am wearing headphones and also expected it based on the subject material. Slight gaps and no breaths in the audio. However, if I didn't know any better it would fool me.

  • @angelicearthling
    @angelicearthling 2 years ago

    It's pretty good, but it still has that robotic tone to it. I could tell it wasn't your actual voice from the beginning.

  • @HansCNelson
    @HansCNelson 1 year ago

    Is there a way to train an overdub voice on a specific speaker once speaker labels have been applied to the video?

  • @sun-eye
    @sun-eye 1 year ago

    Yeah, I could tell from the very beginning that your voice sounded robotic. I would suggest using it for longer. Maybe, a couple of days because the difference between the 30 minute and the 3 hour is very big.

  • @realjgerard
    @realjgerard 1 year ago

    Crazy how this was a year ago… my have we come a long way…

  • @atranimecs
    @atranimecs 2 years ago

    wow it's gotten way better.

  • @oz4549
    @oz4549 1 year ago

    This will be huge in the adult industry

  • @فيصلفيصل-ك8ر3ظ
    @فيصلفيصل-ك8ر3ظ 1 year ago

    Can I create digital audio in Arabic?

  • @Thatsmessedupman
    @Thatsmessedupman 1 year ago

    Yup. I knew instantly your voice was AI, even with the training. There is an underlying gravelly sound in the voice with a hint of warping and an electronic feel.

  • @ericstephenvorm
    @ericstephenvorm 1 year ago

    Impressive results!

  • @Masterhunter325
    @Masterhunter325 2 years ago

    This is sick!)) I couldn't tell you used Overdub for the entire video)))

  • @BAWalks
    @BAWalks 1 year ago

    Mind Blown.

  • @hasan7786
    @hasan7786 2 years ago

    Whooooo! Thanks for these videos. Answered all my questions before I pulled the trigger.

  • @alistair21
    @alistair21 2 years ago

    sounds way better than mine which is still pretty shit after providing an hour of training.

  • @joshualopez9259
    @joshualopez9259 2 years ago

    I could tell it wasn't really you like 5 words in. It still has a tone to it that lets you know, but it's not bad. It doesn't help that it made the last word of every sentence you say sound so low and drop in pitch.

  • @radedjordjevic8638
    @radedjordjevic8638 4 months ago

    not work

  • @augustine_
    @augustine_ 3 years ago

    Does this mean we only need to upload our voice ID statement plus the audio of our podcast or anything else, and not the script given by them for the initial Overdub voice setup? Or do we first upload the voice ID statement plus their 30-minute transcript to get the Overdub voice, and then, for a more accurate Overdub, upload another file with the voice ID statement and our podcast audio? Waiting for your reply.

    • @RealJamesArcher
      @RealJamesArcher 3 years ago +2

      Hey, Augustine, it pretty much means you just need the voice ID statement and then whatever audio you can pull together. I just took all the raw recordings from my past video shoots and stitched them together in a single audio file, and that worked for me!

  • @marcs7847
    @marcs7847 2 years ago

    Sick!

  • @themaskio4804
    @themaskio4804 1 year ago

    Awesome. But how do you train the AI with the voice you want it to learn?

  • @gfreezy619
    @gfreezy619 1 year ago

    This is exciting and scary at the same time

  • @ivirlei
    @ivirlei 2 years ago

    Incredible video!

  • @redferdroyale9510
    @redferdroyale9510 2 years ago

    Yea it's much better

  • @gonzcasa
    @gonzcasa 3 years ago

    soon you'll be able to make your own music with an artist that you want

    • @RealJamesArcher
      @RealJamesArcher 3 years ago

      Yeah, there are definitely some weird ethical concerns here. This particular company requires the training voice to recite a verbal contract, but there are ways to get around that (like having an impersonator do it) and nothing stopping people from downloading and using their own software on any training data they want. Weird times ahead.

  • @andersonsystem2
    @andersonsystem2 2 years ago

    wow! OMG

  • @matthewfuller9760
    @matthewfuller9760 3 years ago

    Once trained by your voice for 8 hours, can you then use the tool offline? I imagine not right.

    • @mnomadvfx
      @mnomadvfx 2 years ago +1

      Indeed.
      I would be wary of doing this in the first place.
      It's one thing for people like celebrities that have tens of thousands of hours of their voice on record due to their public exposure - but for the average joe not wanting their identity stolen this could potentially be dangerous.

  • @ktestable
    @ktestable 2 years ago

    damn..

  • @singlesightart
    @singlesightart 2 years ago

    That is so awesome

  • @shonsavesclaims_1
    @shonsavesclaims_1 2 years ago

    Wow.

  • @GingerBooth
    @GingerBooth 1 year ago

    Great demo!

  • @relaxbro5605
    @relaxbro5605 2 years ago +1

    It was obvious from the beginning BUT I wonder how much better it would get if you let autotune work on it🤔 do you know an audio engineer who could do this? Would love to see/ hear how this turns out. Maybe with autotune it would be even harder to tell the difference.

  • @LaTigerGenesis
    @LaTigerGenesis 3 years ago

    lovin' your channel, hombre!

  • @CP-dl4nc
    @CP-dl4nc 3 years ago +2

    It is really good but if someone already knows your voice they will detect the "machine" quality immediately. The tell is the lack of inflection and pitch that is directly connected to the context of the words. I see it as a tool but no substitute (yet) for an actual human.

    • @RealJamesArcher
      @RealJamesArcher 3 years ago +1

      Absolutely agree. The human touch makes all the difference.

    • @matthewfuller9760
      @matthewfuller9760 3 years ago +1

      @RealJamesArcher It's 90% of the way there to replacing humans some of the time in video games.

  • @davidcovington901
    @davidcovington901 3 years ago +1

    Thanks for all the hard-work investigation and the Buy rating. Hoping my old laptop is up to specs for using it, because I plan on becoming addicted, to rest my voice.
    Will we ever hear your voice live again?

    • @RealJamesArcher
      @RealJamesArcher 3 years ago +2

      Oh yes, I don't expect to actually use this much on a day-to-day basis. There's no substitute for the real human voice and the subtle distinctions it can make. I'll probably use this for occasional patching up or repairing something I said wrong, but not much else. I still plan to shoot my videos the old fashioned way!

  • @Anurania
    @Anurania 3 years ago +1

    Not good enough to replace voice actors yet but maybe within five years. I'm thinking mostly in terms of video games where we want characters to have an endless amount of things to say.

  • @tsechee
    @tsechee 2 years ago

    Does it support Chinese?

    • @strukitru
      @strukitru 1 year ago

      AI doesn't really care about the language you speak. The only thing you should mind is that you want to train the model in the language you want it to speak in. The AI works with the phonetics of your input, not a databank of words of a given language. And since Chinese sounds different than for example English .. u get the idea.

  • @kbuss10
    @kbuss10 2 years ago +1

    what on earth is the point??? just use your own voice if you need to, it is much less work... this is not deepfake! deepfake would be if you TALK into it, and then it converts so it sounds like Dirty Harry, Obama, or whoever you train it on. if you do that from text you'd still have to adjust the voiceover's timing, which is an extremely tedious process, basically unviable

  • @instacoachhim
    @instacoachhim 1 year ago

    Hey, I would like to ask about a small problem with Descript that you may have run into before: wrong pronunciation! Do you know how to fix it? Thank you