I am planning on using this tool to get my friends arrested by confessing to crimes they didn't do!
Hey, that's a cool little trick.
Something like this happened on Discord. Never believe anything you see online.
Hell Nah
💀
That's what friends are for
Short but to the point, just what I wanted. Thank you
Sounds like I could replace audio book readers I hate with my favorite narrators. And if it's a little flat, good! Because what I hate is overacting.
It still sounds a bit "crunchy".
But I'm not sure I would have thought you were a computer so much as someone using a bad microphone.
I've been trying to improve the robotic voice that it gave me to begin with, but I can't seem to find the setting for uploading my podcasts, which contain an abundance of my natural voice. Can you do a video on how to do this?
It's pretty good! I wonder what it will sound like after 100 hours.
That’s a really good demo of the technology. Back when I had a bunch of podcast shows, this would have greatly reduced the time I spent editing them each week. I can only imagine where it will be in ten years.
Can probably just think about the words and they will appear....
Yikes! I didn't notice on first viewing (or hearing)! Second time through, I picked up where the system mushes some sounds, and that the intonation is flat, but overall... scary good.
mind blown, this is a game changer for a lot of businesses
i tried to train but it never worked for me
Great improvement over the last test. Though it's still quite noticeable when someone has heard your voice before and is wearing headphones. The artifacts, like a slight electronic tinge and the unnatural inflection, kind of reveal the whole charade.
HOWEVER: If someone isn't all that used to your voice or expecting the fakery, I imagine this could go unnoticed in the vein of a lackluster take and a spotty microphone.
Technology is improving fast; it's likely that by 2022 most of these issues will have been ironed out.
Yeah, the artifacts are one of the biggest giveaways, but I don't think it'll take them long to sort that out. It's amazing to me how realistic they're getting the intonation. It's only a matter of time.
"If someone isn't all that used to your voice or expecting the fakery, I imagine this could go unnoticed in the vein of a lackluster take and a spotty microphone"
^^ This 100%.
Not just the microphone, but also the audio compression which is far from perfect in every single encoding - especially if you are experiencing frame drops causing audio glitches at the same time.
It was obvious that this wasn’t your voice but still impressive honestly
That's pretty good! Did you use your own audio for the 8 hours or did you use their script? Not 100% clear if you have to use their script or not. I submitted 10 minutes and... it's not enough
I submitted about 45 minutes. It's not enough.
To be fair, it doesn't sound much like your voice or your own sound arrangement, all that combination of mic proximity, echo and so on. But it does sound like a human voice :)
Would this work when using someone else's voice if I need to input words from another language?
I love this, but I hate this, but I love this.. you know?
I feel the same! It's very clever, but it's a disturbing road to be on.
From the owner: "While you can edit Projects offline, you still need an internet connection to transcribe audio." Does this mean I cannot record output from my own voice without an internet connection once the project files have been generated? Transcription refers to the conversion of speech to text. I want to be able to type and have it output my voice without being online.
In other words, they don't have a general model of my voice. Every new word is novel and must be computed using their servers, otherwise it would take forever?
I'm not positive, but I would assume that both the text-to-speech and speech-to-text require round trips to the server because they're both pretty processor intensive.
@RealJamesArcher yep. I am thinking the same way.
It sounds distorted and emotionally flat. Like a real person who is having equipment problems. It's really not good enough for podcasting. It might be alright for some short edits though.
Perfect summation
Sounds Awesome, I've added about 12mins. Is it best to make a new dub with new longer audio or add new audio by editing the existing one?
It was pretty immediately apparent it was AI from the beginning. I’m actually kind of surprised because my personal voice clone on Descript sounds a little better while in yours I’m still hearing some artifacts.
Also, the 8 hours of voice training was probably a little wasteful because the voice trainer only accepts about an hour of training, as far as I remember, which means it probably took the first hour and discarded the rest.
The improvement was almost entirely because their algorithm was recently updated and improved, not necessarily because you gave it more training.
Edit: saw your other video was only a few months ago so the extra training did probably help. It’s also been a while since I read about the max training time so they may have changed that shrug 🤷🏼‍♂️
Great video either way!
I think that different voices will get different results, well, just because they are different. For instance, I feel like a lot of quirks in James's clone are coming from the fact that his real voice has a lot of vocal fry. It's kind of a natural distortion, so it's probably not good for the AI. But anyway, it's very interesting to watch how things improve!
I feel like using compressed audio as training data isn't the best start, however much more time it takes to upload uncompressed audio.
This video deserves the follow just because it is so clever.
Congrats, you just gave away your voice for free. You cannot stop them from using your voice in a podcast, commercial, product or service that you do not agree with or align with. The best part is you get no money in return, hence the ROYALTY-FREE term used in the agreement. You are a podcaster, you are making money off your YouTube channel, why on earth would you let your voice go for free?
8.2 License to User Content. We claim no ownership rights in your User Content. You hereby grant to us a nonexclusive, royalty-free, sublicensable, worldwide license to access, reproduce, distribute, process, publish, display, perform, adapt, modify, analyze, and otherwise use the User Content to provide, maintain, and improve Descript and the Descript technology, without compensation to you, provided that our use of any Projects you create is subject to the usage limitations and confidentiality obligations set forth in Section 9 below.
is a simple voice lol
This is worrying. You have highlighted a very legitimate problem that for paid voice over artists and actors is a minefield. I feel that the producer of this video should at least let us know his opinion on this.
@percythefisherman It's exactly why the initial enthusiasm for deepfake tech waned so quickly in academia.
Just like with the stem cell problem some time ago.
The ethics problem reared its ugly head, people started pointing fingers, legislation started restricting what they could do and funding dried up - the academics are too afraid to push the tech forward for fear of losing funding they need to do research.
That was a good one, James... Question: are you able to download the audio and put it elsewhere? Let's say I wanna edit in DaVinci instead of in Descript's video editor.
Creepy... and cool. (Which is how good tech starts.) The question is James... will you use this power for good or evil?
Wow! As great as this tool would be for content creators, I can see it 100% being a 'must have', for the criminally minded. "Hello Mr Archer, How are you doing today? I'm just calling you about your bank account......"
Something I plan to do with Overdub is to clone my high voice for a character in my show, so when I get older and cannot do that anymore I will be able to use Overdub to keep the voice in store.
Lol, everything was text to speech? Man, this is awesome. I have problems creating videos since my house is small and noisy with multiple people around, so this is awesome!
I could tell immediately because I was listening for it. Still very impressive! I think within a year they could have it perfected. I'm not sure what makes it sound off, it just has kind of a digital glitch like when you set spoken words to autotune.
Same, but only because I was partially expecting to get punk'd. If this was anyone else's video on a different topic, I would have assumed they had a sore throat.
not good anyway. The quality of the sound is bad, the voice cloning was decent.
I noticed it right away, it reminds me of listening to low bit rate audio. But the 8hrs def made your voice sound higher rez. Maybe after 400 hours it will sound realistic?
Too many channels are using fake audio, yours included. It sounded fake from the beginning. Soon everyone will get sick of it once they know the signs, and all the libraries of videos using it will be rendered trash.
I have lots of audio files of my deceased father. Can I use Descript to clone his voice, or can I only train it by repeating their phrases? Please help, I would really like to bring my dad back. Thanks.
mindblown.gif
As soon as you played the first version and said you were going to compare, I realized the entire video had been dubbed.
Nice. Do you have to verify vocals to train and clone? And is it mobile?
Wow...I hadn't noticed... this is fantastic
I'm trained as hell, let's go. I could totally tell the difference.
1:41 The second the video started, I already knew there was something wrong going on with your voice.
sounds like you with the flu phoning into work 😂
I can see this being bad news for us regarding those horrible text+speech videos that exist on youtube.
Or audiobooks. I can't imagine trying to listen to a whole audiobook with an AI voice, because even a great one would still be...awkward.
"I can see this being bad news for us regarding those horrible text+speech videos that exist on youtube"
Au contraire - the current ones are terrible.
This will at least give them a nice upgrade - unfortunately as they improve it will become harder to tell a real one from a fake one.
Someone could literally just
Awesome! Do you have a guide for a foundational model to train voices?
sounds exactly like your broken mic hahaha xD
Hey James -- great video. What would you recommend today as the best way to train a model on 10 hours of interview audio?
For a documentary I'm working on -- I want to feed it audio of a professional actor performing a monologue I wrote, and use the model to overdub the documentary subject's voice onto it.
Will Descript imitate my accent as well, or only my voice pitch and tone?
I was telling myself something was wrong with the audio in this video, haha
First time I’ve said wow out loud to an AI tool
I noticed RIGHT FROM THE BEGINNING that you were using the AI
Wild....going to try this on my channel lol
I want to learn, you can open this good knowledge class
This could give back someone's voice lost to cancer
Very accurate. I could tell though probably because I am wearing headphones and also expected it based on the subject material. Slight gaps and no breaths in the audio. However, if I didn't know any better it would fool me.
It's pretty good, but it still has that robotic tone to it. I could tell it wasn't your actual voice from the beginning.
Is there a way to train an overdub voice on a specific speaker once speaker labels have been applied to the video?
Yeah, I could tell from the very beginning that your voice sounded robotic. I would suggest training it for longer. Maybe a couple of days, because the difference between the 30 minute and the 3 hour is very big.
Crazy how this was a year ago… my, have we come a long way…
Wow, it's gotten way better.
This will be huge in the adult industry
Can I create digital audio in Arabic?
Yup. I knew instantly your voice was AI, even with the training. There is an underlying gravelly sound in the voice, with a hint of warping and an electronic feel.
Impressive results!
This is sick!)) I couldn't tell you used Overdub for the entire video)))
Mind Blown.
Whooooo! Thanks for these videos. Answered all my questions before I pulled the trigger.
sounds way better than mine which is still pretty shit after providing an hour of training.
I could tell it wasn't really you like 5 words in. It still has a tone to it that lets you know, but it's not bad. It doesn't help that it made the last word of every sentence you say sound so low and go down.
not work
Does this mean we only need to upload our voice ID statement plus the audio of our podcast (or anything else), and not the script given by them, for the initial Overdub voice setup? Or do we first upload the voice ID statement plus their 30-minute transcript to get an Overdub voice, and then, for a more accurate Overdub, upload another file with the voice ID statement and our podcast audio? Waiting for your reply.
Hey, Augustine, it pretty much means you just need the voice ID statement and then whatever audio you can pull together. I just took all the raw recordings from my past video shoots and stitched them together in a single audio file, and that worked for me!
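For anyone who wants to try the "stitch everything into a single audio file" step mentioned above, here is a minimal sketch using only Python's standard `wave` module. It assumes the recordings are uncompressed WAV files that all share the same channel count, sample width and sample rate; the function name `stitch_wavs` and the file names are made up for illustration, and this is not Descript's own tooling or necessarily how the file in the video was produced.

```python
import wave
from pathlib import Path


def stitch_wavs(paths, out_path):
    """Concatenate same-format WAV files into one file (hypothetical helper).

    Returns the total number of audio frames written.
    """
    paths = [Path(p) for p in paths]
    if not paths:
        raise ValueError("no input files given")

    # First pass: confirm every file shares one format before writing anything.
    formats = []
    for path in paths:
        with wave.open(str(path), "rb") as src:
            formats.append(
                (src.getnchannels(), src.getsampwidth(), src.getframerate())
            )
    if len(set(formats)) != 1:
        raise ValueError(f"input files have differing formats: {formats}")

    channels, sample_width, frame_rate = formats[0]
    total_frames = 0
    # Second pass: copy the raw frames of each file, in order, into the output.
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(frame_rate)
        for path in paths:
            with wave.open(str(path), "rb") as src:
                n = src.getnframes()
                out.writeframes(src.readframes(n))
                total_frames += n
    return total_frames
```

Something like `stitch_wavs(["shoot1.wav", "shoot2.wav"], "training.wav")` would then produce one long file. Recordings in other formats (MP3, mixed sample rates) would need converting first, which this sketch does not attempt.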
Sick!
Awesome. But how do you train the AI with the voice you want it to learn?
This is exciting and scary at the same time
Incredible video!
Yeah, it's much better
soon you'll be able to make your own music with an artist that you want
Yeah, there are definitely some weird ethical concerns here. This particular company requires the training voice to recite a verbal contract, but there are ways to get around that (like having an impersonator do it) and nothing stopping people from downloading and using their own software on any training data they want. Weird times ahead.
wow! OMG
Once trained on your voice for 8 hours, can you then use the tool offline? I imagine not, right?
Indeed.
I would be wary of doing this in the first place.
It's one thing for people like celebrities that have tens of thousands of hours of their voice on record due to their public exposure - but for the average joe not wanting their identity stolen this could potentially be dangerous.
damn..
That is so awesome
Wow.
Great demo!
It was obvious from the beginning, BUT I wonder how much better it would get if you let autotune work on it 🤔 Do you know an audio engineer who could do this? Would love to see/hear how this turns out. Maybe with autotune it would be even harder to tell the difference.
lovin' your channel, hombre!
Thank you! 🙌
It is really good but if someone already knows your voice they will detect the "machine" quality immediately. The tell is the lack of inflection and pitch that is directly connected to the context of the words. I see it as a tool but no substitute (yet) for an actual human.
Absolutely agree. The human touch makes all the difference.
@RealJamesArcher It's 90% of the way there to replacing humans some of the time in video games.
Thanks for all the hard-work investigation and the Buy rating. Hoping my old laptop is up to specs for using it, because I plan on becoming addicted, to rest my voice.
Will we ever hear your voice live again?
Oh yes, I don't expect to actually use this much on a day-to-day basis. There's no substitute for the real human voice and the subtle distinctions it can make. I'll probably use this for occasional patching up or repairing something I said wrong, but not much else. I still plan to shoot my videos the old fashioned way!
Not good enough to replace voice actors yet but maybe within five years. I'm thinking mostly in terms of video games where we want characters to have an endless amount of things to say.
Does it support Chinese?
AI doesn't really care about the language you speak. The only thing you should mind is that you want to train the model in the language you want it to speak in. The AI works with the phonetics of your input, not a databank of words of a given language. And since Chinese sounds different from, for example, English... you get the idea.
What on earth is the point??? Just use your own voice if you need to, it is much less work... This is not deepfake! Deepfake would be if you TALK into it, and then it converts so it sounds like Dirty Harry, Obama, or whoever you train it on. If you do that from text you'd still have to adjust the voiceover's timing, which is an extremely tedious process, basically unviable.
Hey, I'd like to ask about a small problem with Descript: wrong pronunciation! Have you had it before? Do you know how to fix it? Thank you.