Descript AI Voice Cloning - DaveClone vs RealDave
- Published 27 Aug 2024
- Testing another AI voice cloning tool, Descript. Can this one produce a DaveClone?
Eleven Labs video: • DaveClone - Testing El...
Forum: www.eevblog.co...
If you find my videos useful you may consider supporting the EEVblog on Patreon: / eevblog
Web Site: www.eevblog.com
Main Channel: / eevblog
EEVdiscover: / eevdiscover
AliExpress Affiliate: s.click.aliexpr...
Buy anything through that link and Dave gets a commission at no cost to you.
T-Shirts: teespring.com/s...
#ElectronicsCreators #descript #ai
To me as a non-native speaker living in Australia the 2nd one sounds a lot more "Dave" than the first. It does sound a little bit too excited at the wrong places in the script, but overall it is still a lot more convincing than the super flat AI voice of the first.
I agree. The voice trained on videos sounds more like normal Dave video. Just a bit too excited.
A mixture of both would do the business.
Really? I think it's awful. I get what you mean that it's more like my excited voice, but the voice itself is horrible and grating. It's like words have been cut off or something.
@@EEVblog I don't disagree with it being a bit grating and words getting a bit cut off etc, but it is still a much more accurate you than the first robotic one :) If they could vary the tone and excitement throughout the script to match the context, it would be a perfect reproduction of your real voice!
@@EEVblog Yeah .... Sorry Dave. That second one is pretty convincing. Yes, it's a bit "excited", but the tonal components are rather believable. The monotonic one is rather drone-like. The real Dave is somewhere in between in regards to expressiveness, but both have decent tonal qualities. IMHO
😂 The first sounds like you're being held hostage by terrorists and forced to read about transistors
The second is incredibly good! It's like your normal level of "excitement" has been tripled, but it's way more convincing.
I thought it's pretty good too, but it got one emotion and stuck with it at all times. After a few seconds without a change it sounds robotic and annoying.
@@woox2k As sometimes the ranting Dave is also.
Well... to be honest, that second sample does sound very similar to your voice in your early videos.
I think Dave liked the first one because it made him sound better than he normally does. Yes, I agree there was more of Dave's twang in the second one.
@@BenMitro Yeah I think that's the case too
SOLAR FREAKING ROADWAYS!
When Dave gets excited, his voice pitches up more. I think that's what the second run of Descript was catching a bit of in the sample videos.
The first was read by Dave from script and was read at his normal pitch... maybe because the content was not that exciting ( :) ) or done deliberately by Dave to create a better dataset for testing different cloning tools.
Definitely agree with this. The second is better than the first, but it's like Dave is trying to out-Dave himself 😂
"Some transistors are packaged gibber."
The AI knows something about Chinese counterfeit audiophile power transistors. Respect.
Real Audiophiles don't use transistors.
@@Okurka. Real gibber don't jabba
6:14 Now you know the pain we have to go through.
Even though you listen to yourself more than most people, I think it would be challenging for anyone to objectively analyze their own voice. Great video as usual Dave!!
I think the more emotive one is the most accurate... Almost spot on. He's not going to like hearing that.
Sorry, but nope. I've been trained on thousands of hours of my own voice played back through high quality studio monitor speakers.
@@EEVblog But you have no more objectivity on the matter than anyone else. That last sample is the closest, and it's like 90% accurate.
@@EEVblog Immaterial! You inherently have subjective bias. Face it Dave, all your fans also have thousands of hours listening to your voice and the majority rule that the overemotional sample is YOU!
I didn't watch the first video, but from the sample in this one "British Dave" sounded better, words were flowing one into another. Those two new samples sounded like someone made a soundboard of separate words and stitched them together into sentences.
I think that "British Dave" sounded better and closer to your voice. You may be sensitive to the accent but non-Australians will not notice the lack of accent.
I think the second version was actually a lot more like you, just a bit too "excited" for the text given. You should have it say "sex on a stick" or something.
Sorry Dave, but the one trained on your videos REALLY sounded like you in your videos, just without the context of why they're constantly putting in voice stress. Sounded just like one of the 'solar freaking roadways' vids - and yeah, it really isn't your best side XD.
Russel Brand doing Dave impersonations was a spot on call for the first one
Dave, I don't know what you think you sound like, but that second one sounds much more like the emoting Aussie we're used to. Add a few "Bob's your uncle," "that's terrible Muriel" and "we're in like Flynn" and call it done.
I think the DaveClone voice is better from the point of the emotion in the voice. The problem is it is putting the emphasis in the wrong place because it's not being done in the context of what is being said, which is probably why it grates so much. But the "android" voice is so close, it's just that we're not used to monotone Dave.
To me, the one trained on your videos sounds much more like you than the one trained on the script.
The second one sounded exactly like you, but maybe after a couple pints
1(text train): Battery Drained. 2(video based): Over Voltage . 3(Eleven Labs): Batteries in series with leakage .
I am an Aussie and the second sounds more like you.
If it turned down the excitement a bit, it would be you!
yeah, the 2nd one is definitely more "Dave". maybe a little over-the-top, but much less robotic than the 1st one, especially the ending of the word "individually".
The second one, trained on your natural voice, actually sounds much better to me than the one trained on the recommended script, mainly because the second voice seems to capture some of your emotive expression. I find the normal, emotionless "AI" voices to be impossible to listen to for any length of time, but I could listen to a fair bit of text rendered in the second voice.
The first one sounds like Dave on smack. The second one sounds like Dave on crank.
Davebot: "g r e e t i n g s. I r e q u i r e y o u r b a n k d e t a i l s t o p r o c e s s r e f u n d"
Customer: "Dave? Is that really you? You sound like you have a cold."
Davebot: "... ... a i n t t h a t a b o b b y d a z z l e r"
Customer: "Oh, haha, okay, it is you. Hold on, sending"
Davebot: "B O B I S Y O U R U N C L E"
I don't understand. the second sounds like garbage to me immediately. but the first sounds spot on. is just my opinion. the second sounds like Terminator describing the rise of skynet
Everyone else here in the comments seems to think the 2nd one is way better.
I like the second one....in fact I want the AI to have an emotive multiplier, like those resident evil videos where the facial expressions are multiplied 500%, except applied to your voice.
First one sounds like Dave has lost his will to live, second one is still pretty good but like Dave is stuck on high energy mode.
2nd sounded much better and more like you in my opinion
06:19 is quite how you are speaking in your clips :-)
It's a LOT better than that previous one... but still doesn't sound like there's any emotion or "soul" in the voice. It does a better job at creating the accent.
You got your Strine back! 9/10 if they could remove the aliasing.
That is what Video-Dave and Podcast-Dave sound like after they've been partying together all weekend.
In the second one, I could predict after 30 words where your voice would raise to a high pitch. Always the same sound in a wavy format.
First one is best, but low pitched and flat; like if it had only few tones with nothing in between, like the fixed energy levels from electrons.
It's definitely got the accent down. Maybe a mix between the two would be quite close. It seems the second lacked bass frequencies.
Haha - the second generated voice is spot on 😂
The best one was the one you didn't like ,it was so much like you, over excited, high pitched typical Dave tone and intonation.
I knew this was coming.
lol, the video trained one sounded more accurate to me
every time you try something you are teaching the beast..we have no chance.
You have no chance to survive make your time.
The first one missed the word "individually" but it got it on the 2nd one. The 2nd has some of your inflection, but in all the wrong places. Perhaps if it had a better vocabulary it would improve.
I'm from the UK and I could tell the last one sounded British.
This is The Singularity!
The second one sounds like Dave on weed having the greatest time of his life with transistors xD
1st 8/10 no emotions Dave (tone of voice: actually very similar)
2nd 6/10 sad Dave (aussie accent with 0 human touch)
3rd 2/10 squirrel Dave (no comments)
To a Brit, that isn't a British accent Dave!
The previous attempt was much better. This one literally sounds like a robot with metallic notes in its voice.
You know, getting Russel Dave to read bits of your script would be kinda memeworthy XD
The British Dave does actually sound good and rather pleasant, it's just not me.
You should reach out to Home Assistant saying you want to train Piper with your voice
Better accent but like you say Dave, no characteristic inflections. 7/10
It's a lot better than the last one. If it had some character and personality it would be much harder to tell.
Is the clone called Daiv?
8/10?! Pssh, you're kidding! Sure, yes, it gets the Aussie accent quite well, but some words have this odd raspy glitch at the end -- I dunno, the word that comes to mind would be "sawtooth", but that's just me -- and even ignoring the monotonousness, it sounds too robotic still. I give it a 6/10. Once they figure out a way of avoiding the monotonousness, that's when I'll be impressed.
The last one did sound a little crackly to me. I would've believed it was a microphone problem. I didn't notice it on the first one though.
9/10, they both sound good to me
2nd one is you dave for sure 8 out of 10
I still can't tell the difference, apart from the one trained on your videos was more animated... I must have a low resolution audio decoder in my brain.
First one sounded good, but you can still tell it's AI and weird, but def sounds like you.
I rate this a clear Jabber/10 !
Pretty decent actually. Extremely depressed version but very good lol
Cool! It clearly lacks the emotions, but not much is missing before building a DaveBot to fix electronics for you.
It's hollow, not terrible but not right.
Are you trolling Dave? Screw the accent if the voice sounds like a robot. Eleven Labs is waaay better than the others in the video.
I agree, but the point is trying to match MY voice and accent.
Oh wow, I actually have to fully disagree here.. The Eleven Labs voice maybe not nailing the accent 100% right, but sounds like a real, smooth human voice. The Descript AI ones both sound like some Text-To-Speech technology out of the 90s/00s with real audible noise / unnatural sharpness to the voices. But I still much prefer the second Descript AI voice over the first. So for me it's:
Eleven Labs >>> 2nd Descript AI > 1st Descript AI
I agree that the Eleven Lab one sounds better and smooth, it's way more pleasant sounding. But it's not me.
The real test would be to produce one of your videos with a cloned voice and see how many people notice a difference.
Almost all people would notice! Dave has a difficult voice pattern to follow artificially. Like he said, he talks with emotion. It's clear if he's reading a script or talking about something that excites him. He might just be good at acting and not talk like this in real life but we do not care, we are used to his speech in his videos and that is difficult to replicate.
The AI forgot to say Bob's your uncle...
Ah yeah, sorry Dave, the second one is deffo more "you" on an average video, but like others say, it doesn't match the subject matter. If you put a debunking script into it, I think it would sound on point 🙂
Have to say, that elevenlabs one sounds just like Hugh Jeffries
Hahah 2nd comment for the algo, your energy is cracking me up. Love it!
Second video is more convincing. First sounds more robotic - it's interesting how people don't recognise their own voice.
I don't know Dave... it needs more 'jibber and jabber'.
Second one was spot on, first one not so good. First 5 second one 9
Try Play.ht, although I haven't tried their voice cloning. Their pre-existing voices are quite good
If you were calling me and asking for gift cards, I would expect the excitement of the second one.
It would be a total sell.
Depressed Dave vs overexcited Dave.
That sounds a LOT more like you to me than the British one (I’m British). If only you could mix the two you had here, I think that would work well. The first one here might fool me if I thought you were deliberately reading it as if some calamity had occurred. Think it’s cloned the voice well but needs natural emotion/inflection.
7/10
8/10 it's dry monotone and you can hear the breaks
the 2nd one was the best!!! That is you, like it or not
I reckon it sounds a bit of a younger Dave.. (the one you gave 8 out of 10). About half way between the two would be probably about it.
The missus prefers the first AI generated version of your voice @ 4:30 mark 🤣
What comes from learning with mix of script reading and audio from videos? I think those have only your character and script have no life. Something around 20% between the two could give nice results?
Second one is better. Sounds more like your voice in your videos.
DaveClone is a bit too fast and a bit too high pitched, but overall I'd say it is much better than The Transistor. It's not as monotone, but more importantly: It actually sounds a lot more like you.
It's definitely not fooling anyone, but small snippets might actually be believable.
2nd one also sounded like you but like you're trying not to laugh.
I have to disagree. The "chalkboard" sample sounds more like you. The first sample is too low in frequency.
Can we give "AI Dave Voice" the name 'Dave from the Old Dart'? 🤣🤣🤣
Aussie slang kills me. 😁
Hahah I couldn't hear it. US native here and I tried to pick out certain words and I still couldn't tell
Call him Dawid
The second one is closer to you than the first. Yeah. We all see our own voices as cringe. LOL
Not perfect but wow, so much better and at least it's not a British accent lol
to my yankie ear the old ai one was close to me
The first had a depressed deadpan delivery, like you were being sarcastically uninterested when reading about a new product.
The second sounded a bit like horse racing commentary 🏇
sorry to say but you sounds like the 2rd one you don't like it but it is true !! impressive
Nothing like.
My ranking would be:
1. Eleven Labs - most fluent.
2. Descript AI trained on videos - most natural, yet slightly too excited.
3. Descript AI trained on a script - very artificial sounding, with lots of artifacts, didn't like it at all.
For shits and giggles could you start one of your video with a Russell Brand "Hello Beautiful People!" :P
Pretty soon we won't know what's real or fake, especially after Google rolls out their AI
Portuguese here and i said straight away: British accent 😂😂
I'm British but could tell that last one had a twang of aussie but basically British. In second you could tell wasn't right. On second, much better but lacks good flow, you could tell wasn't human. Pace seemed wrong too. Third was not good. Overall second was probably best for me.
I thought the original one sounded more like you than this one! Maybe it's because I'm English? 🤔 Weird and interesting in equal measure lol.
The Eleven Labs one is "smoother" I think, and more pleasant to listen to, but the accent is completely wrong so it's a fail on cloning my actual voice.
I'm English and I thought the ElevenLabs one was terrible. It has a very slight Aussie twang, like someone from Australia who's lived in England for 30 years.
DaveClone sounds like you 10 years ago😂
You have been assimilated.
i’m sure a lot of us viewers are in the states. we couldn’t tell cause we don’t even question our own election. 😂. kidding but kinda not
This one sounds more like you, but is more artificial. Sounds robotic, with no inflections, no modulation. Just-as-if-you-read-a-word-at-a-time.
Yeah this one is closer but totally lacking emotion/ inflection. Sorry can't replace you yet!
Very strange, to me the training script one sounds the least like you. And even though I'm very used to the Aussie accent (lived there for a while), I even prefer the Eleven Labs one over the training script one from Descript. Yes, the accent is a bit off, but your voice is pretty good.