How to Program Speech Synthesis in an Animatronic Mouth Using Python and Arduino

  • Published 24 May 2020
  • Here's a closer look at the programming behind my animatronic mouth. Using Arduino, Python, and a few open-source libraries, I take a typed sentence and convert it into an animation sequence.
    Support me on Patreon! / nilheimmechatronics
    Contact: enquiries@willcogley.com
    Discord Server: / discord
    Open source animatronic mouth design: www.nilheim.co.uk/latest-proje...
    Instructable: www.instructables.com/id/Simp...
  • Science & Technology

COMMENTS • 47

  • @scottduede8134
    @scottduede8134 4 years ago +21

    As a linguist, I can say that this is awesome sauce.

  • @hypodyne1
    @hypodyne1 4 years ago

    I did the same thing with a talking head program. Used the same dictionary and mapped the visemes to the phonemes. You could have a conversation in real time with my app (called Ayako). Awesome that you took it further with the robotic mouth. Well done.

  • @drudtube
    @drudtube 3 years ago

    That looks great! I'm working on a similar kind of controller, but I am using a stereo track that is analyzed by a Teensy Audio Board. The right track contains the speech and controls the jaw. The left track contains tones that correspond to the mouth positions of the other servos.
    For example, 200 Hz for A, E, I, and 250 Hz for B, M, P.
    So the actor who's going to play with this mouth only has to make an audio track with the right tones in the right positions on the left track.
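
The tone-track scheme above can be sketched in Python: feed a block of left-channel samples through a single-frequency Goertzel filter per control tone and pick the strongest one. The 200 Hz / 250 Hz viseme groups below are just the commenter's examples; a real build would tune its own table.

```python
import math

# Tone-to-viseme table following the commenter's example scheme;
# real frequencies and groupings would be chosen per project.
TONE_TO_VISEME = {200: "A/E/I", 250: "B/M/P"}

def goertzel_power(samples, sample_rate, freq):
    """Power of a single frequency in `samples` (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect_viseme(samples, sample_rate=8000):
    """Return the viseme whose control tone is strongest in this block."""
    powers = {f: goertzel_power(samples, sample_rate, f) for f in TONE_TO_VISEME}
    return TONE_TO_VISEME[max(powers, key=powers.get)]

# Usage: a 50 ms block of pure 200 Hz tone selects the A/E/I mouth shape.
rate = 8000
block = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
print(detect_viseme(block, rate))  # A/E/I
```

A real controller would run this per audio block (say every 20-50 ms) and send the winning pose to the servos.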

  • @tecnicotec1
    @tecnicotec1 4 years ago +1

    Very good video, and even better job.
    It's really amazing and interesting.
    Thanks for your work.

  • @PhG1961
    @PhG1961 4 years ago

    Excellent work ! Awesome !!

  • @cdoebler
    @cdoebler 4 years ago

    Excellent work. My upcycled Teddy Ruxpin uses a much simpler set of visemes.

  • @stevecoxiscool
    @stevecoxiscool 4 years ago +1

    Nice work !!!

  • @Nono-hk3is
    @Nono-hk3is 4 years ago

    Good work!

  • @TheMrJuoji
    @TheMrJuoji 4 years ago

    Instead of mapping each servo position for each viseme in an array, you could try an inverse kinematic model: just map each end position for the mouth to a set of positions. You could even map that to motion capture.
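
The table-driven half of this suggestion might look like the sketch below: one dictionary of viseme poses replaces a chain of conditionals. The channel names and angles are invented for illustration, not the video's actual calibration.

```python
# Hypothetical servo poses (angles in degrees) per viseme; real values
# come from tuning the physical mouth, these are placeholders.
VISEME_POSES = {
    "AA": {"jaw": 70, "lips": 30, "tongue": 10},  # open vowel
    "M":  {"jaw": 5,  "lips": 90, "tongue": 10},  # closed lips
    "L":  {"jaw": 40, "lips": 40, "tongue": 80},  # tongue raised
}

def pose_for(viseme):
    """One table lookup instead of a chain of if/else branches."""
    return VISEME_POSES.get(viseme, VISEME_POSES["M"])  # default: closed

print(pose_for("AA"))  # {'jaw': 70, 'lips': 30, 'tongue': 10}
```

Adding a viseme then means adding one table entry rather than another branch.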

  • @jonathangriffin3486
    @jonathangriffin3486 4 years ago +2

    For emulating audio you would probably need to look at how the frequency domain representation of the signal is changing over time.
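
A minimal illustration of that idea: a short-time Fourier transform slides a window along the signal and takes a magnitude spectrum per window, so you can watch the spectrum evolve over time. This uses a naive pure-Python DFT for self-containment; real audio work would use numpy/scipy FFTs.

```python
import cmath, math

def stft_magnitudes(samples, win=64, hop=32):
    """Magnitude spectrum of each overlapping window (naive O(n^2) DFT)."""
    frames = []
    for start in range(0, len(samples) - win + 1, hop):
        chunk = samples[start:start + win]
        spectrum = [abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / win)
                            for n in range(win)))
                    for k in range(win // 2)]
        frames.append(spectrum)
    return frames

# A 1 kHz tone sampled at 8 kHz should peak in bin 8 (1000 / (8000/64)).
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(256)]
frames = stft_magnitudes(tone)
print(max(range(32), key=lambda k: frames[0][k]))  # 8
```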

  • @satyakidas7144
    @satyakidas7144 4 years ago

    Beautiful video, awesome robot

  • @tankart3645
    @tankart3645 4 years ago

    Looks awesome, I've got to say

  • @Skillseboy1
    @Skillseboy1 3 years ago

    Such a cool video

  • @TheRainHarvester
    @TheRainHarvester 4 years ago +10

    Make your mouth the narrator in the bottom right of all your videos!
    How loud are the servos in real life?

  • @twobob
    @twobob 4 years ago

    I seem to remember using a regular expression to make the UK version of the Arpabet for my speech projects in the past. I think that's right. This looks like a fun project. I used a C# library under Unity the last time I faffed around with this. Good job overall; it's a tricky subject :)

    • @twobob
      @twobob 4 years ago

      Also, the iPhone X does a decent job these days. apps.apple.com/us/app/face-cap/id1373155478 might be worth an eyeball, for example.
      I did cmusphinx.github.io/2013/03/speech-recognition-on-kindle-touch-with-cmusphinx/ donkey's years ago, showing that one can indeed get a decent extraction of words from those tools you mentioned. You need to faff a bit, though. Good luck, this one looks like fun.

  • @robertwesterfield3454
    @robertwesterfield3454 6 months ago

    Wow thanks!

  • @FirstLast-wr9mh
    @FirstLast-wr9mh 4 years ago

    Fantastic

  • @MattHollands
    @MattHollands 4 years ago +13

    Are you planning to put a skin on the mouth? It seems like there are bits around the mouth to deform the lips etc., but it looks a bit odd without a skin.

    • @kiltmaster7041
      @kiltmaster7041 2 years ago +1

      It did strike me as odd that he was setting servo positions for certain expressions when he doesn't even know what those expressions will look like on a completed face. Surely it would make more sense to finish the head before setting something like that? But what do I know?

  • @Skyliner_369
    @Skyliner_369 3 years ago +1

    I'm sure that if I wanted to, I could probably write a blender extension that avoids all this phoneme stuff and instead sends direct pose data from animated frame data. That way, the mouth is animated like how someone might animate a character.

  • @AltMarc
    @AltMarc 4 years ago

    For local ASR try DeepSpeech, on a RPI4 DeepSpeech Lite works in real time.
    Local Speech recognition is still tricky, works better on full sentences than single words.

  • @Robots-and-androids
    @Robots-and-androids 1 year ago

    You might be able to use speech-to-text software first and then convert the text that it gives you into phonemes. Both Microsoft and Bing offer "free" speech recognition for Python. I use both. I am planning on doing something similar with a human figure.
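
The text-to-phoneme step this thread keeps circling back to can be sketched as below. The two-word ARPAbet dictionary and the phoneme-to-viseme grouping are hand-written stand-ins; the video's pipeline uses the full CMU Pronouncing Dictionary via NLTK, and real viseme groups are tuned by eye.

```python
# Tiny hand-written slice of an ARPAbet dictionary (a stand-in for the
# full CMU Pronouncing Dictionary, which NLTK's cmudict corpus provides).
ARPABET = {
    "HELLO": ["HH", "AH", "L", "OW"],
    "WORLD": ["W", "ER", "L", "D"],
}

# Hypothetical phoneme -> viseme grouping; real groupings are tuned by eye.
PHONEME_TO_VISEME = {
    "HH": "open", "AH": "open", "OW": "round",
    "W": "round", "ER": "round", "L": "tongue-up", "D": "tongue-up",
}

def sentence_to_visemes(text):
    """Turn typed text into the viseme sequence that drives the servos."""
    return [PHONEME_TO_VISEME[ph]
            for word in text.upper().split()
            for ph in ARPABET.get(word, [])]

print(sentence_to_visemes("hello world"))
# ['open', 'open', 'tongue-up', 'round', 'round', 'round', 'tongue-up', 'tongue-up']
```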

  • @mariotoys6173
    @mariotoys6173 4 years ago

    Respect

  • @DustinWatts
    @DustinWatts 4 years ago +2

    Great work, Will! I was thinking, what about using an ESP32 instead of an Arduino? The ESP32 can run MicroPython, thereby eliminating the need for two microcontrollers and a serial connection.

    • @KineticWasEpicVideos
      @KineticWasEpicVideos 4 years ago

      NLTK will not run on an ESP32. A Raspberry Pi + Arduino combo would be ideal for a contained system.
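
On the Python-to-Arduino serial link that combo implies, the host side usually just packs servo targets into a small framed message. The format below (a 0xFF start byte, a count byte, then one byte per angle) is a made-up example, not the video's actual protocol.

```python
def encode_frame(angles):
    """Pack 0-180 degree servo angles into a start-byte-framed message."""
    if any(not 0 <= a <= 180 for a in angles):
        raise ValueError("servo angles must be 0-180 degrees")
    return bytes([0xFF, len(angles), *angles])

# On real hardware these bytes would go to pyserial's Serial.write();
# here we just show what lands on the wire.
print(encode_frame([70, 30, 10]).hex())  # ff03461e0a
```

The Arduino sketch then resynchronizes on 0xFF and reads the announced number of angle bytes.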

  • @HeathLedgersChemist
    @HeathLedgersChemist 4 years ago

    Could you approximate the mouth positions for Leeds by just leaving the mouth open all of the time?

  • @Robots-and-androids
    @Robots-and-androids 1 year ago

    where did you get that amazing servo tester????? I NEED one of those! --Thomas

  • @SDRIFTERAbdlmounaim
    @SDRIFTERAbdlmounaim 3 years ago

    Use a loop and a table instead of a bunch of 'if's; those will stack up real quickly lol

  • @REALVIBESTV
    @REALVIBESTV 10 months ago

    Can I buy the code?

  • @SpaceDave-on8uv
    @SpaceDave-on8uv 3 years ago

    2:44
    Does this mean switch statements do not exist in Arduino?

    • @Jimmyfpv_
      @Jimmyfpv_ 3 years ago

      Yes, but bear in mind that they are not very useful when you do logic operations within the 'if'. You would need to map the possible results onto values so that you could use the switch statements.

  • @gone6442
    @gone6442 2 years ago

    OK, I'm making the Mad Hatter and March Hare

  • @saucelessbones5872
    @saucelessbones5872 1 year ago

    gonna make me act up

  • @Allanusmonostat
    @Allanusmonostat 1 year ago

    So if you modded this just a hair it could be a phonetic filter.

  • @CyberSyntek
    @CyberSyntek 4 years ago

    Will, take a look at audioservocontroller dot com. I'm not sure how many servos it is capable of controlling at once as I haven't grabbed one yet; I can inquire, though, as someone from the FB group has one. There are a few different audio servo controllers out there, but they don't seem to be very common. Scary Terry is another one. I saw someone post the hardware layout at some point, so it might be easier to throw one together depending on the components. Might be the only way to get that many servos running in sync.
    Any more thoughts on the potential forum? XD
    *edit* Fernado from the group has a vid up with him testing it on his DARA robot. "DARA robot lip sync" if you're curious. I can ask him if he's played with it any more since. I think he had it just hooked up to the jaw and not his tongue model; he would know better, mind you. :9

    • @ViennaMike
      @ViennaMike 4 years ago

      Scary Terry and similar controllers just work off the volume of the sound source, not visemes. That's adequate for a jaw on a prop or toy (I use them, and I'm developing a similar thing using a Raspberry Pi), but nowhere near what Will is doing with visemes, jaw, face, and tongue movements.

  • @Bigbirddev
    @Bigbirddev 2 years ago

    People who made this
    *it took me 2 years to make*

  • @mr.e.484
    @mr.e.484 4 years ago

    #10

  • @abetusk
    @abetusk 4 years ago

    Unfortunately this isn't "open source". The source is available, as are the STLs, but there is no license on them, and so they cannot legally be used, redistributed, or altered.
    The commonly held definition of "open source" is (from en.wikipedia.org/wiki/Open-source_license):
    "... a type of license for computer software and other products that allows the source code, blueprint or design to be used, modified and/or shared ... Licenses which only permit non-commercial redistribution or modification of the source code for personal use only are generally not considered as open-source licenses."
    From the "terms of service" page at www.nilheim.co.uk/terms-of-service.html:
    " .. not to (or permit anyone else to) do or attempt any of the following:
    * distribute, rent, loan, lease, sell, sublicense, or otherwise transfer or offer the Service for any commercial purpose;
    "
    Which puts it in direct contradiction with the definition of "open source" most widely used.
    Please consider replacing the term "open source" with something more appropriate like "source available", or putting the source code and STL files under a free/libre license.

    • @ViennaMike
      @ViennaMike 4 years ago

      Of course I agree that prohibiting commercial use means it's not "open source" under the common definition. But I can certainly see reasons for doing so. I do think that the developer should consider some standard license rather than the current "terms of service", which has some clear wording errors. More importantly, use of a non-standard license restricts uses the creator intended to allow, as no one is familiar with it or with exactly how the terms may be interpreted. Besides just changing to an open source license, wouldn't other options include: 1) while not intended for software, using the Creative Commons license limiting commercial use; 2) licensing under the very unrestrictive GPL, with options for commercial users to pay for closed licenses (this doesn't actually prohibit commercial use, provided the user abides by the terms of the GPL by opening up their own changes to the same terms, but it may make it more attractive for commercial users to pay for a restrictive license); or 3) while I haven't seen it used, the Commons Clause (commonsclause.com/)?

  • @Mr_Motor
    @Mr_Motor 3 years ago

    On "L" the tongue should touch the top

  • @MrMoka15
    @MrMoka15 4 years ago

    Are you a furry? You could make a lot of money by selling this to them :3

    • @ChrisD__
      @ChrisD__ 3 years ago

      It might be hard to fit all this stuff into a mask, but I remember there being a few people building animatronic fursuit heads like this.