Fun fact: I record in my living room or sitting on my bed/the couch.
I mean this genuinely: if you can't record comfortably in a closet or under a desk, DO NOT RECORD THERE. Your discomfort will come through in your voice and will result in strained, laboured-sounding samples in UTAU. In my case, I actually managed to give myself mild vocal damage and moderate damage to the muscles in my back from recording in my closet. If you are physically in pain or experiencing discomfort while recording, stop immediately and move your setup.
Hard agree. I have a pretty inflexible setup since I don’t have a laptop and my PC is super heavy and cumbersome to move, so when I recorded Yu Fujimura V2, I was in a room notorious for echo with the heater on (I wasn’t going to freeze to record my samples and I wouldn’t have many other opportunities to record) while my guinea pigs were behind me creating even more background noise. There are obviously more ideal setups like I mentioned in the video, but if it’s not possible, practical, or in your case, healthy, don’t feel like you shouldn’t record anything at all.
Ik nobody asked, but my closet is small and filled with a buncha clothes hanging from hangers, which may be annoying, but it's a good choice if you can fit.
@@GarGar_ With all due respect, that sounds like a terrible place to record. You’ll presumably be hunched over a laptop for an extended period of time, putting strain on your back. If you have to squeeze yourself inside, it can’t be an ideal recording place.
I recorded in three different places: my first, incomplete utau in my room, almost without a mic; the second in my living room, with a mic this time; and the third at my cousin's house. My sister and cousin chill together, but my father's side of the family is really, REALLY loud, so i went into my cousin's room because they weren't in there and were outside. i followed none of the steps and my VB sounded super DUPER robotic, but for being so committed to something, I’m proud of myself and don’t mind. just made akuma, butterfly is next, oh god.
It’s been so long since I’ve last seen a new UTAU tutorial that’s actually straightforward: not vague af, not made like a joke, and not padded out with a bunch of unnecessary filler commentary for some reason. I thank you for that.
Wanted to leave a big thank you here. Your tutorial was the resource that motivated me and gave me a proper grasp on how to actually tackle all of this. I thought creating such a voicebank was always a very difficult endeavour, but you made it really understandable and approachable for me. So I finally realised my 10-year-old plans of creating the banks I wanted. Having some up-to-date resources on this was a massive help, so yeah, now I am UTAUing too :D
i have never used utau in my entire life and have zero intention to use it but this was so interesting!!! i love seeing the behind the scenes so ty for making this!!!
This is a great tutorial! I've been making UTAU for 10+ years, and this will definitely be a video I send to friends who are just starting out but understand the basic terms. c:
Hey y'all, this comment is gonna have some quick addendums to the tutorial, either adding things that I neglected to mention or correcting any mistakes. This comment will be updated as I find more corrections, so please read:
- I neglected to mention that this tutorial is written with the assumption that your computer runs on Windows. Some of the programs I’ve mentioned may not work on other operating systems and you’ll have to look for alternatives.
- As of editing this comment, the download link for OREMO has been down for a while, but there's an alternate download link I've added in the description.
- Not necessarily an error, but at 7:03, I expressed my dislike towards CV English, and despite still standing by my preference for VCCV English/ARPAsing, I feel that I should have worded it differently. CV English works better than I gave it credit for, but I still prefer the results of other recording types and I still recommend you use VCCV/ARPAsing.
- The diagram at 24:06 - 24:18 is incorrect: the part that won't be looped should include everything between the left blank and the consonant, and everything between the consonant and the right blank will be looped.
- I've been told that for CV hard consonants, you're better off oto-ing based on 30 overlap (lower for shorter consonants), with the overlap being slightly before the consonant starts, then the preutterance being against the start of the vowel and the consonant area ending shortly after the vowel levels out. The technique I've shown in the video can still work, but if you're unhappy with your otos, you may want to try this out.
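To make the diagram correction concrete, here is a minimal Python sketch (not from the video) of how oto values split a sample into a fixed part, played once, and a looped part, stretched to fit the note. It assumes the common oto.ini conventions: all values in milliseconds, the consonant measured from the offset (left blank), and the cutoff (right blank) measured back from the end of the file when positive or forward from the offset when negative. `sample_regions` is a hypothetical helper name.

```python
def sample_regions(offset, consonant, cutoff, sample_length):
    """Split a sample into its fixed and looped regions, in ms from the file start.

    Assumed oto.ini conventions (all values in milliseconds):
      offset     - left blank, measured from the start of the file
      consonant  - end of the fixed area, measured from the offset
      cutoff     - right blank; positive = measured back from the end of the file,
                   negative = end position measured forward from the offset
    """
    fixed_end = offset + consonant
    if cutoff >= 0:
        playable_end = sample_length - cutoff
    else:
        playable_end = offset - cutoff  # e.g. offset 100, cutoff -300 -> ends at 400 ms
    fixed = (offset, fixed_end)          # left blank -> consonant: played once, not looped
    looped = (fixed_end, playable_end)   # consonant -> right blank: stretched/looped to fit the note
    return fixed, looped


print(sample_regions(offset=100, consonant=80, cutoff=50, sample_length=1000))
# ((100, 180), (180, 950))
print(sample_regions(offset=100, consonant=80, cutoff=-300, sample_length=1000))
# ((100, 180), (180, 400))
```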
I wish I had this knowledge back in 2014 when I first tried to make my own voicebank. It's never too late to start. This will help heal my inner child ❤ Thank you for the video!
YOU HAVE NO IDEA HOW LONG I'VE WAITED FOR A TUTORIAL LIKE THIS!!!! I tried making one several years ago but gave up because I didn't understand some part and straight up just got lost, so thank you so much for making this video aaa!!!!!
36:20 - By the way, if you record all your append sets at the same pitches, they can share the same prefix map, so long as the note suffix is AFTER the append suffix (i.e. "_SD4" instead of "D4_S"). This is how Arachne's English bank and her (in beta) French bank work. You can just type in "_S" and the prefix map will handle the rest. This will NOT work if your appends are not recorded at the same pitch (unless you're willing to make some... adjustments to the oto).
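The suffix-order rule can be illustrated with a minimal Python sketch. It assumes the prefix map simply wraps whatever lyric you type as prefix + lyric + suffix, and that each note is mapped to a pitch suffix such as "D4"; the map contents, aliases, and `resolve_alias` are hypothetical, not Arachne's actual files.

```python
# Hypothetical prefix map: note -> (prefix, suffix). A real prefix.map covers every
# note; here lower notes resolve to the D4 pitch set and higher ones to A4.
PREFIX_MAP = {
    "C4": ("", "D4"),
    "D4": ("", "D4"),
    "E4": ("", "D4"),
    "G4": ("", "A4"),
    "A4": ("", "A4"),
}


def resolve_alias(lyric, note, prefix_map=PREFIX_MAP):
    """Alias looked up for a note, assuming the map just wraps the typed lyric."""
    prefix, suffix = prefix_map.get(note, ("", ""))
    return prefix + lyric + suffix


# Append aliases with the pitch suffix AFTER the append suffix ("_S" + "D4").
append_aliases = {"a_SD4", "a_SA4", "ka_SD4", "ka_SA4"}
# Append aliases with the pitch suffix BEFORE the append suffix ("D4" + "_S").
reversed_aliases = {"aD4_S", "aA4_S", "kaD4_S", "kaA4_S"}

alias = resolve_alias("a_S", "E4")
print(alias)                      # a_SD4
print(alias in append_aliases)    # True
print(alias in reversed_aliases)  # False: the map can only add "D4" at the end
```

Because the map only adds text around the lyric, the "_S" you type always lands directly in front of the pitch suffix, which is why "_SD4" can be found and "D4_S" cannot.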
tbh this was not as hard as i thought it would be! i also decided to jump straight into vcv after watching this and i'm currently oto-ing my vb right now. i'm not having any trouble oto-ing, but with so many lines and my extremely short attention span, i'm gonna be working on this bank for weeks 💀
thank you!! this was confusing at first, but it got me to understand a lot of the terminology and basic terms. coming back to this after watching more UTAU tutorials or utau-making livestreams makes it easier to understand and intensely helpful
i haven't watched through the whole thing yet but thank you so much for making an utau tutorial in 2023 first of all and second of all THANK YOU for including proper subtitles !!! i can never follow without them
Don't spend any more than maybe $50 on a usb mic as the returns are super negligible for the increase in price. If you want something better than the blue snowball or similar you should just save up for a Scarlett Solo and an AT2020. But if you don't plan to use your mic for anything other than maybe making voice banks and some voice chat there's no reason to pay more than $50 for a mic. But, don't let only having a crappy mic stop you from making a voice bank. The important thing is that you actually do THE THING in any capacity. You can always remake your voice bank down the line with your added experience from when you had bad equipment and minimal experience.
Very helpful advice here. Glad to have found this, will be perfect to show to my friends who want to make voicebanks. I feel you on the VCV thing. I dove right in lol. Also, Yu is so cute I love their voice.
This is nice. I'm thinking of making a couple of metal scream/growl voicebanks as there aren't many so far and I can do a couple styles of heavy vocals.
I think there was a mistake? Pre-utt just tells UTAU where the sound starts on the note, while the consonant is the one that covers what not to loop. The area between the pre-utt & consonant will not loop or stretch in UTAU. & looking at the CV otos especially, that's why they sound choppy. There's a ton of space before the sound, so it's great to take advantage of it! Having overlap under 20 doesn't do it for me, while having it around 70 or more can fake a VCV in terms of smoothness fairly OK. VBs like KYE are praised as great CV VBs cuz they're not afraid to make those parameters big. KYE's parameters tend to be closer to a 2:3 ratio between the overlap & preutt. Granted, there's no "one way" to config a VB cuz it can come down to preference & usage, but bigger ratios for CV tend to help a lot. Also??? CVE is a commonly used method that can & does give a good result. A lot of BrE users especially seem more likely to prefer CVE over CVVC for their accent.
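The 2:3 overlap-to-preutterance ratio mentioned above is a rule of thumb from this comment, not an UTAU requirement. As plain arithmetic, a minimal Python sketch: it suggests an overlap for a given preutterance and warns when the result drops under the roughly 20 ms the commenter finds too short. `suggest_overlap` and its defaults are assumptions for illustration.

```python
def suggest_overlap(preutterance_ms, ratio=(2, 3), floor_ms=20):
    """Suggest a CV overlap from its preutterance using an overlap:preutt ratio.

    ratio=(2, 3) is the ~2:3 heuristic quoted for KYE-style CV otos.
    Returns (overlap, warning); warning is None unless overlap < floor_ms.
    """
    overlap_part, preutt_part = ratio
    overlap = round(preutterance_ms * overlap_part / preutt_part)
    warning = None
    if overlap < floor_ms:
        warning = f"overlap {overlap} ms is below the {floor_ms} ms floor"
    return overlap, warning


print(suggest_overlap(120))  # (80, None)
print(suggest_overlap(24))   # (16, 'overlap 16 ms is below the 20 ms floor')
```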
Thanks for catching the pre-utterance/consonant part, I'm adding that part to a pinned comment containing addendums. While I may have reworded my comments regarding CV English if I rewrote this video, I still much prefer VCCV English/ARPAsing because it's easier to use and it's generally more flexible.
@@yupphireutau That's fair. Some others may prefer something else. BrE doesn't work for either of those methods as far as I've been told, but also CVE is a great start into ENG usage if one isn't used to all the sounds the language can have & I think it's more flexible than people give it credit for. Arpasing just makes me panic from how inconsistent it gets with the dupe numbers. Arpabet is interesting tho. It's fairly easy to understand & memorize the aliases faster than XSampa & CZSampa/VCCV. I still like CVE's aliasing tho even if it's not 100% consistent in how people alias the sounds lol
ive wanted to make a voice bank or two. thank u so much. edit: OMGOMGOGMGMOGMOGMOGMOGMO SUBTITLED THANK YOU SOOOOOOOOOOOOO MUCH I HAVE TROUBLE UNDERSTANDING WTF IM TOLD WITHOUT TEXT THANK YOU SO MUCH
Hello there!! Nice video, very well done - thorough and insightful. Question; in the beginning of this video, you mention other programs coming out and Utau being an "older" program at this time. What newer programs were you referring to...?
@@yupphireutau Ahhh, thank you for the reply!! :) Your video has inspired me to attempt to make my own UTAU voicebank!! HOWEVER, I am running into an issue where Oremo is being blocked and is not running on Mac Big Sur. Have you encountered this issue..? If so, have you been able to circumvent this problem so that Oremo can function as normal on the current Mac OS...?? I am vexed :/
@@chrismajor1646 My PC runs on Windows so I unfortunately can't help you there. UTAU stuff on Mac is a bit more difficult from what I've heard from others
fun fact for generating frequencies !!! there's a hidden button in the gap between "edit freq. map" and "initialize freq. map" that lets you generate frequencies in bulk! (i do not know if this works properly !!!)
thank you so much you are the saviour of the modern vocal synth fandom!! 😭😭 just for one thing i'd like to ask if that's alright, this will be my first time creating an utau, except i use mac and can't access oremo and setparam. do you have any resources on how to record an utau without oremo, or even any pointers on what i should do? :0
I'm on Windows so I can't really help much. As far as I'm aware, I don't think there are any oremo alternatives, you might have to record all of your samples in audacity and do everything manually. For oto-ing, I think vLabeler is available for Mac so you could try that. You might want to ask other mac utau users though.
40:43 fyi I don’t think a TOS is legally enforceable. However, you probably still own the rights to the voice in the voice bank at the very least. Do y’all use like, Creative Commons licenses and such too?
oremo is becoming the hardest part of this for me. i wish you had a more detailed explanation on how to use it!! this tutorial is really helpful but im stuck here and cant find anything that helps me with my specific questions :(
Feeling kinda daunted now with my dreams of "maybe i don't want to make a vb with all the bells and whistles quite yet but i do want to take advantage of my natural vocal range…and if i record samples every 6 semitones i'd still have to do 6-7 pitches for that orz" but *that ain't gonna stop me!* Wish I knew where my microphone went tho 😭😭😭
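The "6-7 pitches" estimate is easy to check with a short Python sketch that lists the pitches to record when stepping through a range every 6 semitones. The range C3 to C6 is a hypothetical example, note names follow the MIDI convention where C4 = 60, and the helper names are illustrative.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
FLATS = {"Db": "C#", "Eb": "D#", "Gb": "F#", "Ab": "G#", "Bb": "A#"}


def note_to_midi(name):
    """'C4' -> 60, 'F#3' -> 54 (MIDI numbering with C4 = 60)."""
    pitch = name.rstrip("-0123456789")
    octave = int(name[len(pitch):])
    pitch = FLATS.get(pitch, pitch)
    return NOTE_NAMES.index(pitch) + 12 * (octave + 1)


def midi_to_note(number):
    """60 -> 'C4' (always spelled with sharps)."""
    return f"{NOTE_NAMES[number % 12]}{number // 12 - 1}"


def recording_pitches(low, high, step=6):
    """Pitches to record from `low` up to `high`, one every `step` semitones.

    If the steps don't land exactly on `high`, it is added at the end so the
    top of the range is still covered (a design choice, not an UTAU rule).
    """
    start, end = note_to_midi(low), note_to_midi(high)
    if start > end:
        raise ValueError("low note must not be above high note")
    numbers = list(range(start, end + 1, step))
    if numbers[-1] != end:
        numbers.append(end)
    return [midi_to_note(n) for n in numbers]


pitches = recording_pitches("C3", "C6")
print(pitches)       # ['C3', 'F#3', 'C4', 'F#4', 'C5', 'F#5', 'C6']
print(len(pitches))  # 7
```

A three-octave range lands on exactly 7 pitches at this spacing; narrower ranges need fewer.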
If there is no oto file at all, your voicebank won't produce any sound. Without the oto file, your voicebank has no aliases, which means there's no way for UTAU to know how to use your samples. If you mean having only an unmodified pre-generated base oto, it will produce sound, but the timings will be off.
Hey! Thank you very much for this amazing tutorial! I was wondering, once the voicebank is made, would you happen to know any software similar to Vocalo Changer that you could just open in a DAW as a plugin and import the voicebank that you just created (unlike Vocalo Changer, where you have to stick with their preset voicebanks)? That would make my workflow much quicker cause I could just record a vocal track and put the plugin on it instead of having to write every midi note and type lyrics! Thanks!
@@yupphireutau Thank you very much for the quick answer! I thought that would be the case… I was also wondering if you’d know any other software similar to UTAU where you could import users’ voicebanks and use a piano roll interface? The reason why I’m asking is because I struggle with the Japanese interface; if there was one similar to Vocaloid 6, that would be ideal! What is great about Vocaloid 6 is that you can use it as a plugin; what’s really annoying, though, is that they won’t let you use voicebanks other than the ones they make :/ Thanks again!
okay soo... I've been having trouble downloading OREMO. Whenever I go to the site to download it, it never works. I've searched everywhere for a link that might work. Is it too old? I can't download it and I really want to make an Utau. Can someone help???
I think the Spanish version of the site may be functional. The downloads should still be the same so try and see if this works: es.osdn.net/users/nwp8861/pf/OREMO/files/
I’m assuming you’re talking about CV recordings, but I’d say about 3-7 seconds is good. (I’m not experienced in recording CV banks, so take this with a grain of salt.)
with some time and research I could probably provide some info on how to record more complex voicebanks, such as recording and otoing VCCV Eng (with PaintedCz's reclist), though not so much on tuning VCCV Eng
VCCV English voicebank tutorials already exist though, I have no intention of trying to create a tutorial for something when the most qualified person to make it (the reclist creator) already has in great detail
please help I don't understand VCV at ALL. I dislike using OREMO for its.... not the best OTO features, and how do I find the tempo of my recording? how
hi! I wanted to create one for a specific project and wanted to use it mostly for narrating purposes (for a robot oc), and I'm not even sure I'll ever use it for music as I'm pretty bad and extremely ignorant about musical stuff, so I was wondering if there's anything different or more specific to do for it? thanks a lot if you decide to take the time to answer me 🙏
@@yupphireutau ha, well I've only quickly watched your video once as I was busy.. but as I understood it, your tutorial is mostly about making voicebanks for singing. In my case, though, I would like to make one for narrating purposes (I basically only want to make it talk), so what I'm asking is whether it's any different from this tutorial? Sorry for the long explanation :')
The process will be the same because the process of making the utau sing vs talk is done after making the voicebank. But consider the fact that making an utau talk is way more difficult than making an utau sing.
I'm commenting for help again because I don't know where else to ask and I can't make an account on uta forum for some reason (sorry). So I made my CV voicebank and I love them sm! But now I wanted to make a VCV bank! AND IT'S DRIVING ME TO TEARS. I was having trouble recording with BGM, so I didn't use it. It was too distracting and glitched out my computer. So oto-ing will probably be harder than it would usually be, but that's not the problem I'm concerned about. In voice configurations, it has "alias". What is that? Is that like what you type when you put the note down? And is it supposed to be there automatically? Because I have no clue how VCV notes work. help me please >m
Basically, yes. In VCV voicebanks, you'll have aliases that look like a か, - べ, and o な. If you're making your base oto using SetParam, it should generate all the aliases automatically.
I’ve never experienced this issue before, but try opening up the oto in Notepad and check if the aliasing got messed up. It should look something like the otos I’ve shown in the video (like at 25:21)
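For reference, each oto.ini line has the shape `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`. A minimal Python sketch that parses that shape and reports lines that don't fit it, which can make broken aliasing easier to spot than scrolling in Notepad. It assumes the file is Shift-JIS (Python's `cp932` codec); whether a given malformed line is what triggers "Could not read oto" is not confirmed here, and the function names are hypothetical.

```python
FIELDS = ("offset", "consonant", "cutoff", "preutterance", "overlap")


def parse_oto(text):
    """Parse oto.ini text into entries and report lines that don't fit the format.

    Expected line shape: file.wav=alias,offset,consonant,cutoff,preutterance,overlap
    Returns (entries, problems), where problems is a list of (line_number, message).
    """
    entries, problems = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        wav, sep, rest = line.partition("=")
        if not sep:
            problems.append((number, "missing '=' between file name and alias"))
            continue
        values = rest.split(",")
        if len(values) != 1 + len(FIELDS):
            problems.append((number, f"expected 6 comma-separated values, got {len(values)}"))
            continue
        try:
            timings = [float(v) for v in values[1:]]
        except ValueError:
            problems.append((number, "empty or non-numeric timing value"))
            continue
        entry = {"wav": wav, "alias": values[0]}  # an empty alias is kept as-is
        entry.update(zip(FIELDS, timings))
        entries.append(entry)
    return entries, problems


def check_oto_file(path, encoding="cp932"):
    """Read an oto.ini as Shift-JIS (cp932) and parse it."""
    with open(path, encoding=encoding) as f:
        return parse_oto(f.read())


sample = "_ああいあうえあ.wav=- あ,50,120,-300,80,20\n_ああいあうえあ.wav=a い,500\n"
entries, problems = parse_oto(sample)
print(entries[0]["alias"], entries[0]["preutterance"])  # - あ 80.0
print(problems)  # [(2, 'expected 6 comma-separated values, got 2')]
```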
@@yupphireutau After a couple of tries it still says "Could not read oto". I'm making a CV Romaji voicebank and I don't want to use Moresampler because like you said, it's horrible.
the values don't seem to be showing up when i try to auto-estimate VCV on setparam. i've used Zurui's reclist and bgm. im not sure whats going on with it? i tried CV before and it worked fine. EDIT: found the fix, it was because my file names were in romaji and not japanese. however, i still have a few questions... are we to oto each and every individual sample in setparam? and how do i go about converting a notepad document to shift-jis?
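On the Shift-JIS question: besides re-saving from Notepad with a different encoding, a minimal Python sketch can re-encode a UTF-8 file (such as an oto.ini) as Shift-JIS. It assumes the source is UTF-8, with or without the BOM Notepad may add (handled by Python's `utf-8-sig` codec), and writes with `cp932`, the Windows variant of Shift-JIS. The function names and file paths are hypothetical, so keep a backup of the original.

```python
from pathlib import Path


def utf8_bytes_to_shift_jis(data):
    """Re-encode UTF-8 bytes (with or without a BOM) as Shift-JIS (cp932) bytes.

    Line endings are kept as-is because the file is handled as bytes, with no
    newline translation. Raises UnicodeEncodeError for characters Shift-JIS
    cannot represent.
    """
    return data.decode("utf-8-sig").encode("cp932")


def convert_file_to_shift_jis(src, dst):
    """Write a Shift-JIS copy of a UTF-8 text file, e.g. an oto.ini saved from Notepad."""
    Path(dst).write_bytes(utf8_bytes_to_shift_jis(Path(src).read_bytes()))


# Usage (hypothetical paths; check the copy before replacing the original):
# convert_file_to_shift_jis("oto_utf8.ini", "oto.ini")
print(utf8_bytes_to_shift_jis("あ".encode("utf-8")))  # b'\x82\xa0'
```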
super out of left field question but do you know anyone who has recorded like. distorted fry screams or anything for UTAU? if so, how'd they make that work?
If you're okay with recording in Audacity, doing a LOT of things manually, and making oto-ing more time-consuming because there's no background tempo, then yes.
A couple things:
• CV English is not inherently bad, it's just got a steeper learning curve. Look at Pankune Kinzoku, she's a CV English voice and she's even higher quality than most VCCV or Arpasing voicebanks.
• The CV oto information is incorrect, especially with hard consonants.
I mentioned this in another reply, but while I feel like I should have worded my comments regarding CV English differently, I still prefer VCCV English/ARPAsing banks (plus they're much more common). Also, can you clarify what about the CV oto information is wrong? I want the tutorial to be as correct as possible, so if there are any critical errors, I want to add them to the pinned addendum comment.
@@yupphireutau mainly just hard consonants, you're better off otoing based on 30 overlap (15 on shorter consonants like [b]) with the overlap being slightly before the consonant starts, then the preutterance (at no specific value because consonants aren't fully consistent) being against the start of the vowel and the consonant area ending shortly after the vowel levels out. As for vowels, 100 overlap and 40 preutterance works best ^^
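The hard-consonant recipe above can be expressed as arithmetic on timings read off the waveform. A minimal Python sketch, assuming you have measured (in ms from the start of the file) where the consonant starts, where the vowel starts, and where the vowel levels out. The 5 ms margins standing in for "slightly before" and "shortly after" are assumptions, the cutoff is left out, and `cv_hard_consonant_oto` is a hypothetical name; as in oto.ini, every returned value except the offset is measured from the offset.

```python
def cv_hard_consonant_oto(consonant_start, vowel_start, vowel_steady,
                          overlap=30, lead_margin=5, tail_margin=5):
    """Return offset, consonant, preutterance and overlap (ms) for a CV sample.

    - overlap: 30 ms, or about 15 for short consonants like [b], per the comment
    - the overlap marker sits `lead_margin` ms before the consonant starts
    - the preutterance marker sits at the start of the vowel
    - the consonant (fixed) area ends `tail_margin` ms after the vowel levels out
    lead_margin and tail_margin are assumed values, not part of the advice.
    """
    offset = consonant_start - lead_margin - overlap
    if offset < 0:
        raise ValueError("not enough lead-in before the consonant for this overlap")
    return {
        "offset": offset,
        "consonant": vowel_steady + tail_margin - offset,
        "preutterance": vowel_start - offset,
        "overlap": overlap,
    }


print(cv_hard_consonant_oto(consonant_start=400, vowel_start=470, vowel_steady=520))
# {'offset': 365, 'consonant': 160, 'preutterance': 105, 'overlap': 30}
```

For a short consonant, pass `overlap=15`; the offset moves later so the overlap marker still sits just before the consonant.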
There are additional resources for oto-ing in the description. I have no plans to make an additional oto-ing tutorial because I am not very experienced with it (there's a reason I needed Gen to effectively write the oto section for me lol)
@ I actually figured out how to do it recently. Took 2 days including procrastination but it works out well. I still don’t get the CVVC stuff but if it works then it works. I might make a voice bank but maybe in the future, since it seems like a lot of studying compared to the standard stuff you can get right out of the gate
@@StardustOwO It isn't that hard to understand, it's mostly a bit more time-consuming to make, 'cause there aren't that many English-voicebank-centered tutorials or programs to help you compared to Japanese voicebanks
@@yupphireutau ah okay, sorry, I'm a bit (a lot) of an idiot. Most tutorials I've seen show them putting the recordings directly into utau, so I was just confused, sorry haha 😅
If you try to use Moresampler to oto, it won't be accurate and you'll need to edit the otos manually. For CV / CVVC, choose to oto in CVVC (if you are making a CV voicebank, the VC part, e.g. - a [a k] ka, WON'T work). For VCV, choose to oto in VCV. REMEMBER to change your Region > Language for non-Unicode programs > Japanese (Japan) so you can oto your voicebank with hiragana and not only romaji.
OH, i don't like to make two comments, but i have a question: why are voicebanks like those of nana haruka, namine ritsu, and yokune ruko male so complicated? i just recorded vowels separately and put those together to make words. is this normal, or did something go wrong in the process of making mine?
You can't make a voicebank sing in comprehensive Japanese (or any language) with only vowels, you also need samples that have consonants. Any Japanese CV, CVVC, or VCV reclist worth using will contain more than just your vowels. In the case of those specific voicebanks, they are all VCV voicebanks. As I explained in the video, these voicebanks have more and longer samples in order to create a smoother, less choppy result.
Okay, i know i may sound really stupid because my two brain cells can only comprehend certain words put together, BUT these voice banks are all like - a い A4 and - く A B3, and when i check the voicebanks, they all have thousands of these and it makes it harder for me to use them. i also can’t figure out which version of - they use. the only other voice bank I’ve been able to properly use (kinda) is Teto english, and the first ritsu voice bank. i tried WORLD, but that doesn’t work, and Mel just doesn’t work, defoko seems to also be similar to my voicebank with just
I have a whole section of the video dedicated to explaining the different types of voicebanks but I’m not sure you watched it. The reason Defoko is so simple and easy to use is because her voicebank is a CV (consonant-vowel) voicebank. These types of voicebanks are characterized by having samples consisting of single syllables such as あ(a), り(ri), and と(to). This means you only need to enter single hiragana characters into the notes in order to make them sing. The main drawback to CV as I explained in the video was that the results usually sound choppy and unnatural. VCV voicebanks function, as I also explained in the video, by cross fading vowels. So if you wanted the voicebank to sing the word わたし(watashi), you’ll have to type in the following: (- わ)(a た)(a し) into three separate notes. The wa note needs to start with the hyphen because it’s the starting note and there’s nothing before it. The middle note needs to contain both the vowel from the previous note and then the hiragana character for your current note. So since the previous note is wa, it’s a. So therefore, the second note should be (a た). Same goes for the third note. You use the previous vowel, (in this case, it’s “a” again) and then the hiragana character that you want for that note. If you have more questions regarding voicebanks, please make sure you watch the video first because a lot of this is stuff that I’m repeating from the video that I already made.
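The aliasing rule described above (hyphen on the first note, then the previous note's vowel plus the current kana) can be sketched in Python. The kana-to-vowel table covers only basic kana and their voiced versions and is an assumption for illustration; any other kana raises a KeyError. `vcv_lyrics` is a hypothetical helper.

```python
# Hypothetical kana -> vowel table: basic kana plus voiced versions only.
ROWS = {
    "a": "あかさたなはまやらわがざだばぱ",
    "i": "いきしちにひみりぎじぢびぴ",
    "u": "うくすつぬふむゆるぐずづぶぷ",
    "e": "えけせてねへめれげぜでべぺ",
    "o": "おこそとのほもよろをごぞどぼぽ",
}
KANA_VOWEL = {kana: vowel for vowel, kana_row in ROWS.items() for kana in kana_row}


def vcv_lyrics(kana):
    """Turn one kana per note into VCV lyrics.

    The first note starts with '-' because nothing comes before it; every
    later note starts with the vowel of the previous note's kana.
    """
    lyrics = []
    previous = None
    for k in kana:
        head = "-" if previous is None else KANA_VOWEL[previous]
        lyrics.append(f"{head} {k}")
        previous = k
    return lyrics


print(vcv_lyrics(["わ", "た", "し"]))  # ['- わ', 'a た', 'a し']
print(vcv_lyrics(["さ", "く", "ら"]))  # ['- さ', 'a く', 'u ら']
```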
@@yupphireutau Like "do rei mi fa so la ti do" do you have to record all of those yourself per kana? or when you put your voice into utau it will do it automatically
"Windows version: OREMO ver. 3.0-b140323 (Japanese/English): Download from OSDN (10.2 MB, release date=2014/03/23) blog description" that one i should download correct?
You have to record samples for every single phoneme in a language and oto/configure them in order for utau to play those notes. Utau can’t generate voiced samples for you
Omg please help! you're like the only person to bring up OpenUtau and it's what I mainly use. I recorded a multipitch voicebank, otoed it, and it's showing up in OpenUtau, but when I import USTs it's not auto-adjusting to the pitch like it's supposed to. I set up the subbanks and stuff, so I don't understand why it's not auto-adjusting to its correct pitch. It's not a problem with the UST because other VCV multipitch banks auto-adjust perfectly fine
Ik this was made a while ago but I’m trying to download Utau-synth on my Mac since that’s the version for Mac and it gave a warning saying it can’t open the thingy because it’s from an unknown developer 😬 and then I got scared shetless and deleted the stuff. I went to the official site too, it wasn’t some random page
@@yupphireutau me neither lol. I had the same problem where Utau would just repeatedly crash. It’s probably incompatible with one of the Mac OS updates, and turning off the disk encryption is a dangerous idea. I’ll have to find a different software but I thought it was worth asking if you knew anything before I did, thanks for taking the time to reply
"im a crazy person who jumped headfirst into vcv"
laughs in vcv-e
What’s vcv-e?
Vcv-e? What's that?
@@Wonderhoy-er I think it's VCV English
@@CorpseCrawler maybe…
@@CorpseCrawler I wanna know what vcv english sounds like now
I have seen a lot of CV english but what about vcv english?
It's about time someone made a video about recording a voicebank in its entirety. I like it a lot.
@@yupphireutau eh, im a small person. And the fluffy clothes provide comfort, at least for me :)
☆ Timestamps :3 !! ♡
0:00 intro
3:26 Voicebank types
9:19 Preparations and planning
13:59 Japanese pronunciation
16:56 Recording
22:37 Otoing 😰 Intro and basic theory
24:18 Otoing Base otos
29:23 Otoing Settings
29:40 Cv otoing
30:28 Vcv otoing
31:25 Cvvc otoing
32:54 Files
37:13 character concepts
38:57 Distribution
41:27 Outro
Do I have any intentions to make a UTAU voicebank? No not at all. But I still feel very informed.
Same
this is so complicated, my respect just skyrocketed to anyone who uses vocaloid/utauloid softwares. thank you for the video!!!
38:24 Dislikes: Thinking
If that ain't a vibe lmao 🤣
me watching this even though ive been in the utau community since 2016 and my vb is literally in this video
hearing my own raw files was a jumpscare ngl
update: i watched the whole thing
im ur biggest fan
uwu
it's a lot to digest, but at least now i have this one video i can keep coming back to reference if i need help. thanks ;m;
Hope to actually try using UTAU somewhat soon because I wanna try doing this, it just sounds really fun (other than the oto-ing part lol)
Thank you lots for the tutorial! I've been wanting to create my own vb for a while and the info you give is easy to follow ♥
Very helpful advice here. Glad to have found this, will be perfect to show to my friends who want to make voicebanks. I feel you on the VCV thing. I dove right in lol. Also, Yu is so cute I love their voice.
This is nice. I'm thinking of making a couple of metal scream/growl voicebanks as there aren't many so far and I can do a couple styles of heavy vocals.
I don't care man, i'm gonna jump straight into english! Great tutorial!
I could have used this when making Vinzui V1 and V2, and will be referring to this when making V3
Thanks for such an amazing video!
This was beautifully explained in a way I can understand haha
And I finally get how to use VCV voicebanks!!
I just started learning with the Vocaloid editor last week, and this video popped up on my homepage. Thank you, it was really helpful.
lol we're both thinking of doing the same thing on the same day
I think there was a mistake? Pre-utt just tells UTAU where the sound starts on the note, while the consonant is the one that covers what not to loop. The area between the pre-utt & consonant will not loop or stretch in UTAU. And looking at the CV especially, that's why it's choppy sounding. There's a ton of space before the sound, so it's great to take advantage of it! Having overlap under 20 doesn't do it for me, while having it around 70 or more can fake a VCV in terms of smoothness fairly well. VBs like KYE are praised as great CV VBs cuz they're not afraid to make those parameters big. KYE's parameters tend to be closer to a 2:3 ratio between the overlap & preutt. Granted, there's no "one way" to config a VB cuz it can come down to preference & usage, but having bigger ratios for CV tends to help a lot.
Also??? CVE is a commonly used method that can & does give a good result. A lot of BrE users especially seem more likely to prefer CVE over CVVC for their accent.
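As a rough illustration of the 2:3 overlap-to-preutterance ratio mentioned above, this Python sketch writes one CV oto.ini line in the usual field order (alias, offset, consonant, cutoff, preutterance, overlap; times in milliseconds). The file name, alias, and timing values are hypothetical, and the ratio is one commenter's rule of thumb rather than a fixed standard.

```python
def cv_oto_line(wav: str, alias: str, offset: int, consonant: int,
                cutoff: int, preutterance: int,
                ratio: tuple[int, int] = (2, 3)) -> str:
    """Return an oto.ini line whose overlap is `ratio` of the preutterance."""
    num, den = ratio
    # Overlap and preutterance are both measured from the offset, so 2:3
    # puts the overlap marker two thirds of the way to the preutterance marker.
    overlap = round(preutterance * num / den)
    return f"{wav}={alias},{offset},{consonant},{cutoff},{preutterance},{overlap}"


# Hypothetical values for a "ka" sample: a 120 ms preutterance gives an 80 ms overlap.
print(cv_oto_line("_ka.wav", "か", 100, 200, -400, 120))
```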
Thanks for catching the pre-utterance/consonant part, I'm adding that part to a pinned comment containing addendums. While I may have reworded my comments regarding CV English if I rewrote this video, I still much prefer VCCV English/ARPAsing because it's easier to use and it's generally more flexible.
@@yupphireutau That's fair. Some others may prefer something else. BrE doesn't work for either of those methods as far as I've been told, but also CVE is a great start into ENG usage if one isn't used to all the sounds the language can have & I think it's more flexible than people give it credit for. Arpasing just makes me panic from how inconsistent it gets with the dupe numbers. Arpabet is interesting tho. It's fairly easy to understand & memorize the aliases faster than XSampa & CZSampa/VCCV. I still like CVE's aliasing tho even if it's not 100% consistent in how people alias the sounds lol
I have been looking for smth like that thank you SO much!
Well... I guess I'll stick to pitch shifting or using existing voice banks, lol...
I wish I had this video 2 months ago ahHH. This is so helpful, thank you!
Omg hiii I love this!!!
thanks for your effort!!
Thank you for making this ❤️
ive wanted to make a voice bank or two. thank u so much. edit: OMGOMGOGMGMOGMOGMOGMOGMO SUBTITLED THANK YOU SOOOOOOOOOOOOO MUCH I HAVE TROUBLE UNDERSTANDING WTF IM TOLD WITHOUT TEXT THANK YOU SO MUCH
update: i love your tutorial so much. genuinely tusen tack (a thousand thanks), kiitos kiitos (thank you, thank you), arigato (thank you) for making something this well written for FREE!!!
Hello there!! Nice video, very well done - thorough and insightful.
Question; in the beginning of this video, you mention other programs coming out and Utau being an "older" program at this time. What newer programs were you referring to...?
Mostly the likes of CeVIO, Synthesizer V, and Deep Vocal but there are a bunch of other vocal synth programs out there
@@yupphireutau Ahhh, thank you for the reply!! :) Your video has inspired me to attempt to make my own UTAU voicebank!! HOWEVER; i am running into an issue where Oremo is being blocked and is not running on Mac Big Sur. Have you encountered this issue..? If so, have you been able to circumvent this problem so that Oremo can function as normal on the current Mac OS...?? I am vexed :/
@@chrismajor1646 My PC runs on Windows so I unfortunately can't help you there. UTAU stuff on Mac is a bit more difficult from what I've heard from others
@@chrismajor1646 make sure you have the latest version and for the right computer!
Finally i can make my UTAU FANLOID OC real!!
Hi it's me again 😍 I was stuck because all of the oto tutorials were super convoluted but this saved me!!!
where was this video 2 years agoooo 😭😭😭
fun fact for generating frequencies !!!
there's a hidden button in the gap between "edit freq. map" and "initialize freq. map" that allows you to generate frequencies in bulk!
(i do not know if this works properly !!!)
thank you so much you are the saviour of the modern vocal synth fandom!! 😭😭
just for one thing i'd like to ask if that's alright, this will be my first time creating an utau, except i use mac and can't access oremo and setparam. do you have any resources on how to record an utau without oremo, or even any pointers on what i should do? :0
I'm on Windows so I can't really help much. As far as I'm aware, I don't think there are any oremo alternatives, you might have to record all of your samples in audacity and do everything manually. For oto-ing, I think vLabeler is available for Mac so you could try that. You might want to ask other mac utau users though.
😃-yeah I can tell this is gonna take me a long long time before I’ll finally make a UTAU
17:05 I have my headset mic... So basically, am i screwed?
Depends if it sounds good
So slayful and amazing and useful
40:43 fyi I don’t think a TOS is legally enforceable. However, you probably still own the rights to the voice in the voice bank at the very least. Do y’all use like, Creative Commons licenses and such too?
in love with your voice and utau bank also 😢
This is fantastic!
FINAAALLLY ANOTHER VIDEO
Imma comment and revisit this video =u= see ya later
I've somehow brute forced a voicebank
ON A CHROMEBOOK.
oremo is becoming the hardest part of this for me. i wish you had a more detailed explanation on how to use it!! this tutorial is really helpful but im stuck here and cant find anything that helps me with my specific questions :(
YESS AN ACTUAL TUTORIAL!
Feeling kinda daunted now with my dreams of "maybe i don't want to make a vb with all the bells and whistles quite yet but i do want to take advantage of my natural vocal range…and if i record samples every 6 semitones i'd still have to do 6-7 pitches for that orz" but *that ain't gonna stop me!* Wish I knew where my microphone went tho 😭😭😭
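The arithmetic behind "every 6 semitones still means 6-7 pitches" can be checked with a small Python sketch. The C3 to C6 range is a hypothetical example (the commenter's actual range isn't given), note names use sharps, and only single-digit octaves are handled.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_to_midi(name: str) -> int:
    """Convert a note name like "F#3" to a MIDI number (C4 = 60)."""
    pitch, octave = name[:-1], int(name[-1])
    return 12 * (octave + 1) + NOTE_NAMES.index(pitch)


def midi_to_note(number: int) -> str:
    """Convert a MIDI number back to a note name with sharps."""
    return f"{NOTE_NAMES[number % 12]}{number // 12 - 1}"


def recording_pitches(low: str, high: str, step: int = 6) -> list[str]:
    """List the pitches to record from `low` up to `high`, every `step` semitones."""
    return [midi_to_note(n)
            for n in range(note_to_midi(low), note_to_midi(high) + 1, step)]


print(recording_pitches("C3", "C6"))
```

A range of exactly three octaves gives 7 pitches (C3, F#3, C4, F#4, C5, F#5, C6), and a range just short of that (C3 to B5) gives 6, which matches the 6-7 estimate.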
thank you so much this is so helpful and underrated
everything during the oto section sounds like wizard speak to me
This video changed my life
Making a voicebank just because i had a dream where i made one
Quick question, What happens if you don’t Oto your voicebank? Does it ultimately break or smth?
If there is no oto file at all, your voicebank won't produce any sound. Without the oto file, your voicebank has no aliases, which means there's no way for UTAU to know how to use your samples. If you mean having only an unmodified pre-generated base oto, it will produce sound, but the timings will be off.
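A minimal Python sketch of why a missing oto means silence: the oto is the table that turns an alias into a sample file plus timings, so with no oto there is nothing to look up. The parsing follows the usual oto.ini line layout (file=alias,offset,consonant,cutoff,preutterance,overlap); the blank-alias fallback to the file name and the sample lines are assumptions for illustration, not a reimplementation of UTAU.

```python
def parse_oto(text: str) -> dict[str, tuple[str, list[float]]]:
    """Map each alias to (wav file, [offset, consonant, cutoff, preutterance, overlap])."""
    entries: dict[str, tuple[str, list[float]]] = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and lines without '=' in this sketch
        wav, rest = line.strip().split("=", 1)
        fields = rest.split(",")
        if len(fields) != 6:
            continue
        # Assumption: a blank alias falls back to the file name without ".wav".
        alias = fields[0] or wav.rsplit(".", 1)[0]
        timings = [float(v) for v in fields[1:]]
        entries.setdefault(alias, (wav, timings))  # this sketch keeps the first duplicate
    return entries


def find_sample(oto: dict[str, tuple[str, list[float]]], lyric: str):
    """Return (wav, timings) for a note's lyric, or None if no alias matches."""
    return oto.get(lyric)


oto = parse_oto("_あ.wav=あ,50,100,-300,60,20\n_か.wav=,40,150,-350,90,30\n")
print(find_sample(oto, "あ"))
print(find_sample(oto, "_か"))            # blank alias, so the file name acts as the alias
print(find_sample(parse_oto(""), "あ"))   # no oto at all: nothing to play
```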
@@yupphireutau wElL tHaT wAS fAsT
nyoom
thk a lot!! it really helped me! plus im kinda happy i speak arabic so i dont have a problem with the japanese language TvT
samesies!!
You are very good at speaking 👍
ty! This video really helped^^
Omg ive been searching for a video like this
Do you have any vids explaining how to use utau perhaps?
I've considered making one but since I probably won't get to it soon, if ever, I recommend this guide: joezcafe.wixsite.com/joezutau/guides
Hey! Thank you very much for this amazing tutorial! I was wondering, once the voicebank is made, would you happen to know a software similar to Vocalo Changer that you could just open in a DAW as a plugin and import the voicebank that you just created (unlike Vocalo Changer where you have to stick with their preset voicebanks)? That would make my workflow much quicker because I could just record a vocal track and put the plugin on it instead of having to write every MIDI note and type the lyrics!
Thanks!
I don’t think so. If you want to do something like that then you’re probably better off looking into RVC instead of using UTAU
@@yupphireutau Thank you very much for the quick answer! I thought that would be the case… I was also wondering if you'd know any other software similar to UTAU where you could import users' voicebanks and use a piano roll interface? The reason why I'm asking is because I struggle with the Japanese interface; if there was one similar to Vocaloid 6 that would be ideal! What is great about Vocaloid 6 is that you can use it as a plugin; what's really annoying though is that they won't let you use voicebanks other than the ones they make :/
Thanks again!
You could try OpenUTAU, which has similar functions to UTAU but has a different interface that a lot of people prefer.
@@yupphireutauOk, thanks!
26:28 temporary secretary jumpscare
THAT
IS
OVERLY
AWESOME
okay soo... I've been having trouble.. Downloading OREMO. Whenever I go to the site to download it, it never works. I've searched everywhere for a link that might work. Is it too old? I can't download it and I really want to make an Utau. Can someone help???
I think the Spanish version of the site may be functional. The downloads should still be the same so try and see if this works: es.osdn.net/users/nwp8861/pf/OREMO/files/
@@yupphireutau Hey it didn't work. But thank you for trying! I'll keep searching. :)
@@Ayakodrawingz If you haven't found it yet, try looking for Internet Archive backups of the site. That's what finally worked for me.
What's the perfect time to spend in each note? For example: 2 seconds holding the note before ending
I’m assuming you’re talking about CV recordings but I’d say about 3-7 seconds is good. (I’m not experienced in recording CV banks so take this with a grain of salt)
with some time and research I could probably provide some info on how to record more complex voicebanks such as recording and otoing VCCV Eng (with PaintedCzs reclist), not so much with tuning VCCV eng however
VCCV English voicebank tutorials already exist though, I have no intention of trying to create a tutorial for something when the most qualified person to make it (the reclist creator) already has in great detail
if you know how to, could you make a deepvocal version?
I have no experience with deepvocal
Hi, and since i love tutorials of UTAU, what microphone did you use? So i can better tune it.
As stated in the video, I use a Blue Snowball.
@yupphireutau okay, thank you!
is there any difference between UTAU and openUTAU when it comes to voicebank creation?
No, all voicebanks that work in utau should work in OpenUTAU and vice versa
@@yupphireutau thankyou!
it is time i can make a sans utau
I am going to learn this... I hope
please help
I don't understand VCV at ALL. I dislike using OREMO for its.... not the best OTO features, and how do I find the tempo of my recording? how
cool.
hi! I wanted to create one for a specific project and wanted to use it mostly for narrating purposes (for a robot oc) and I'm not even sure I'll ever use it for music as I'm pretty bad and extremely ignorant about musical stuff, so I was wondering if there's less or more specific stuff to do for it? thanks a lot if you decide to take the time to answer me 🙏
I’m not quite sure what you’re asking here
@@yupphireutau ha, well I've only quickly watched your video once as I was busy.. but as I understood, your tutorial is mostly to make voicebank for songs purposes. Tho, in my case, I would like to make one for narrating purposes, I want to basically only make it talk, so what I'm asking for is if it's any different from this tutorial? Sorry for long explanation :')
The process will be the same because the process of making the utau sing vs talk is done after making the voicebank. But consider the fact that making an utau talk is way more difficult than making an utau sing.
yupphireutau the goat 🎉
I'm commenting for help again because I don't know where else to ask and I can't make an account on uta forum for some reason (sorry)
So I made my CV voicebank and I love them sm! But now I wanted to make a VCV bank! AND IT'S DRIVING ME TO TEARS.
I was having trouble recording with BGM so I didn't use it. It was too distracting and glitched out my computer. So oto-ing will probably be harder than it usually would be, but that's not the problem I'm concerned about. In voice configurations, there's an "alias" field; what is that? Is that like what you type when you put the note down? And is it supposed to be there automatically? Because I have no clue how VCV notes work. help me please >m
Basically, yes. In VCV voicebanks, you'll have aliases that look like a か, - べ, and o な. If you're making your base oto using SetParam, it should generate all the aliases automatically.
my cv banks always have the wrong frequency, and oremo's auto otoing is broken
i dont have a studio mic either
When I download the latest version of Oremo it just shows a bunch of symbols, and no hiragana is shown. How do I fix it? I'm doing a CV voicebank.
Try changing the reclist and ensuring your PC’s locale is set to Japan
@@yupphireutau It worked but now when I run Number Duplicate Aliases it just says "Could not read oto"
I’ve never experienced this issue before but try opening up the oto in notepad and check if the aliasing got messed up. It should look something like the otos I’ve showed in the video (like at 25:21)
@@yupphireutau After a couple of tries it still says "Could not read oto". I'm making a CV Romaji voicebank and I don't want to use Moresampler because like you said, it's horrible.
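For a "Could not read oto" error like the one above, one way to hunt for a damaged line is to check every entry against the usual layout. This Python sketch is a hypothetical checker, not part of SetParam or UTAU; it assumes each non-empty line should be file=alias followed by five numeric timing fields (the standard oto.ini layout), and since it works on already-decoded text, an encoding mix-up has to be checked separately.

```python
def find_bad_oto_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, problem) for every oto line that breaks the usual layout."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        if "=" not in line:
            problems.append((number, "missing '=' after the file name"))
            continue
        fields = line.split("=", 1)[1].split(",")
        if len(fields) != 6:
            problems.append((number, f"expected 6 fields after '=', found {len(fields)}"))
            continue
        for value in fields[1:]:
            try:
                float(value)
            except ValueError:
                problems.append((number, f"timing value {value!r} is not a number"))
                break
    return problems


sample = "\n".join([
    "_あ.wav=- あ,50,100,-300,60,20",   # fine
    "_か.wav - か,40,150,-350,90,30",    # '=' replaced by a space
    "_さ.wav=- さ,40,150,-350,90",       # one field missing
    "_た.wav=- た,40,abc,-350,90,30",    # garbled number
])
for number, problem in find_bad_oto_lines(sample):
    print(number, problem)
```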
the values don't seem to be showing up when i try to auto-estimate VCV on setparam. i've used Zurui's reclist and bgm. im not sure whats going on with it? i tried CV before and it worked fine.
EDIT: found the fix, it was because my file names were in romaji and not japanese.
however, i still have a few questions... are we to oto each and every individual sample in setparam? and how do i go about converting a notepad document to shift-jis?
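On the Shift-JIS question: if you're comfortable running a short Python script, the conversion is a decode followed by a re-encode. This is a sketch, not a UTAU tool; the file names are hypothetical, it assumes the Notepad file was saved as UTF-8 (with or without a BOM), and it raises an error on any character Shift-JIS can't represent.

```python
from pathlib import Path


def to_shift_jis(data: bytes, source_encoding: str = "utf-8-sig") -> bytes:
    """Re-encode text bytes as Shift-JIS; "utf-8-sig" also strips a BOM if present."""
    return data.decode(source_encoding).encode("shift_jis")


def convert_file(src: str, dst: str) -> None:
    """Read src, convert it, and write Shift-JIS bytes to dst (line endings kept as-is)."""
    Path(dst).write_bytes(to_shift_jis(Path(src).read_bytes()))


# Hypothetical usage: convert_file("oto_utf8.txt", "oto.ini")
```

Working on raw bytes rather than text mode keeps the original line endings exactly as Notepad saved them.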
super out of left field question but do you know anyone who has recorded like. distorted fry screams or anything for UTAU? if so, how'd they make that work?
oremo wont open for me. it says something about brackets not being executed? is it like MMD where i have to install other thing for it to open?
It shouldn't need other programs but you do need to have your locale set to Japan and be running Windows.
@@yupphireutau is there anyway i can record an utau voicebank without changing my locale?
If you're okay with recording in Audacity, doing a LOT of things manually, and making oto-ing more time-consuming because there's no background tempo, then yes.
This is too much for my brain to handle
A couple things:
• CV English is not inherently bad, it's just got a steeper learning curve. Look at Pankune Kinzoku, she's a CV English voice and she's even higher quality than most VCCV or Arpasing voicebanks
• The CV oto information is incorrect, especially with hard consonants.
I mentioned this in another reply, but while I feel like I should have worded my comments regarding CV English differently, I still prefer VCCV English/ARPAsing banks (plus they're much more common). Also, can you clarify what about the CV oto information is wrong? I want the tutorial to be as correct as possible so if there are any critical errors, I want to add them to the pinned addendum comment.
@@yupphireutau mainly just hard consonants, you're better off otoing based on 30 overlap (15 on shorter consonants like [b]) with the overlap being slightly before the consonant starts, then the preutterance (at no specific value because consonants aren't fully consistent) being against the start of the vowel and the consonant area ending shortly after the vowel levels out.
As for vowels, 100 overlap and 40 preutterance works best ^^
Other than that, I think SunGuardian covered the rest
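The hard-consonant method above can be turned into arithmetic, because in an oto the consonant, preutterance, and overlap values are all measured from the offset. This Python sketch is a hypothetical helper: you supply timestamps (in ms from the start of the file) that you read off the waveform yourself, and both the 5 ms "slightly before" margin and the example timestamps are assumptions.

```python
def hard_consonant_oto(wav: str, alias: str, consonant_start: int,
                       vowel_start: int, vowel_settled: int, vowel_end: int,
                       overlap: int = 30, margin: int = 5) -> str:
    """Build one CV oto.ini line from waveform timestamps in milliseconds.

    consonant_start: where the consonant becomes audible
    vowel_start:     where the vowel begins (preutterance goes here)
    vowel_settled:   shortly after the vowel levels out (end of the fixed area)
    vowel_end:       where the usable sound ends (cutoff)
    """
    overlap_mark = consonant_start - margin  # overlap sits slightly before the consonant
    offset = overlap_mark - overlap          # overlap is measured from the offset
    if offset < 0:
        raise ValueError("not enough silence before the consonant for this overlap")
    preutterance = vowel_start - offset
    consonant = vowel_settled - offset
    cutoff = -(vowel_end - offset)           # negative cutoff = length measured from offset
    return f"{wav}={alias},{offset},{consonant},{cutoff},{preutterance},{overlap}"


# Hypothetical timestamps for a "ka" sample; pass overlap=15 for short consonants like [b].
print(hard_consonant_oto("_ka.wav", "か", 300, 360, 420, 1500))
```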
I’m at the stage of otoing and it’s realllllly confusing. Do you have/maybe can you make a video on otoing specifically for vcv?
There are additional resources for oto-ing in the description. I have no plans to make an additional oto-ing tutorial because I am not very experienced with it (there's a reason I needed Gen to effectively write the oto section for me lol)
3:05 so you're saying im screwed? cool....
Nah, there are some Easy english voicebank versions like, V - C english
@ I actually figured out how to do it recently. Took 2 days including procrastination but it works out well. I still don’t get the cvvc stuff but if it works then it works. I might make a voice bank but maybe in the future since it seems like a lot of studying compared to the standard stuff you can get right out of the gate
@@StardustOwO It isnt that hard to understand, its mostly a bit more time consuming to make, cause there arent that many english voicebank centered tutorials or programs to help you compared to japanese voicebanks
quick question: if i plan on having a voicebank for openutau, do i make the voicebank in regular utau first and then put it into open utau?
You don’t make voicebanks in UTAU itself
@@yupphireutau ah okay, sorry I'm a bit (a lot) of an idiot
Most tutorials I've seen show them putting the recordings directly into utau so I was just confused sorry haha 😅
If you try to use Moresampler to oto, it won't be accurate and you need to edit them manually.
For CV / CVVC choose to oto in CVVC (If you are making a CV voicebank, the VC part e.g. - a [a k] ka WON'T work.)
For VCV choose to oto in VCV
REMEMBER to change your Region > Language for non-Unicode programs > Japanese(Japan) to oto your voicebank with hiragana but not only romaji.
Thanks for the Guide btw)
OH, i don't like to make two comments, but i have a question, Why are voicebanks like those of nana haruka, namine ritsu, and yokune ruko male so complicated? i just recorded vowels separately and put those together to make words, is this normal, or did something go wrong in the process of making mine?
You can't make a voicebank sing in comprehensive Japanese (or any language) with only vowels, you also need samples that have consonants. Any Japanese CV, CVVC, or VCV reclist worth using will contain more than just your vowels. In the case of those specific voicebanks, they are all VCV voicebanks. As I explained in the video, these voicebanks have more and longer samples in order to create a smoother, less choppy result.
Okay, i know i may sound really stupid because my two brain cells can only comprehend certain words put together, BUT, I know, but these voice banks are all like - a い A4 and - く A B3, and when i check the voicebanks, they all have thousands of these and it makes it harder for me to use them, i also can’t figure out which version of - they use, the only other voice bank I’ve been able to properly use (kinda) is Teto english, and the first ritsu voice bank, i tried WORLD, but that doesn’t work, and Mel just doesn’t work, defoko seems to also be similar to my voicebank with just あくねやぃ etc, but all those other ones are like, beyond my mind
I have a whole section of the video dedicated to explaining the different types of voicebanks but I’m not sure you watched it. The reason Defoko is so simple and easy to use is because her voicebank is a CV (consonant-vowel) voicebank. These types of voicebanks are characterized by having samples consisting of single syllables such as あ(a), り(ri), and と(to). This means you only need to enter single hiragana characters into the notes in order to make them sing. The main drawback to CV as I explained in the video was that the results usually sound choppy and unnatural.
VCV voicebanks function, as I also explained in the video, by cross fading vowels. So if you wanted the voicebank to sing the word わたし(watashi), you’ll have to type in the following: (- わ)(a た)(a し) into three separate notes. The wa note needs to start with the hyphen because it’s the starting note and there’s nothing before it. The middle note needs to contain both the vowel from the previous note and then the hiragana character for your current note. So since the previous note is wa, it’s a. So therefore, the second note should be (a た). Same goes for the third note. You use the previous vowel, (in this case, it’s “a” again) and then the hiragana character that you want for that note.
If you have more questions regarding voicebanks, please make sure you watch the video first because a lot of this is stuff that I’m repeating from the video that I already made.
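The (- わ)(a た)(a し) rule explained above is mechanical enough to sketch in Python: each note's lyric is the previous note's ending vowel, a space, and the current kana, with "-" for the first note of a phrase. Passing a romaji spelling alongside each kana is an assumption of this sketch (it reads the ending sound off the last letter), and CVVC-style aliases aren't covered.

```python
def vcv_aliases(phrase: list[tuple[str, str]]) -> list[str]:
    """Turn (kana, romaji) pairs for one phrase into VCV lyrics like "a た"."""
    aliases = []
    previous = "-"  # the first note of a phrase has nothing before it
    for kana, romaji in phrase:
        aliases.append(f"{previous} {kana}")
        previous = romaji[-1]  # ending sound: a/i/u/e/o, or "n" for ん
    return aliases


print(vcv_aliases([("わ", "wa"), ("た", "ta"), ("し", "shi")]))  # ['- わ', 'a た', 'a し']
```

After a rest, start a new phrase so the next note begins with "-" again.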
OKAY, I looked into it and experimented, and now I FINALLY know how and why they are made this way, thank you for answering!
I was gonna play around with an AI voice. Then i decided to try this instead. Ahhh! Not another rabbit hole!
thank youuuuuuuuuuuuuuuuuuuuuuuuu omggggggggg
PJSEKAI MUSIC 0:00
All the background music in this video is from the project sekai game soundtrack
@@yupphireutauthat is so wonderhoy
wait so are u saying you have to record each character in each pitch? or will it tune itself
also theres like 20 versions of oremo
I’m not sure what you’re trying to ask here. Also, you should download the most recent version of English Oremo
@@yupphireutau Like "do rei mi fa so la ti do" do you have to record all of those yourself per kana? or when you put your voice into utau it will do it automatically
"Windows version: OREMO ver. 3.0-b140323 (Japanese/English): Download from OSDN (10.2 MB, release date=2014/03/23) blog description" that one i should download correct?
You have to record samples for every single phoneme in a language and oto/configure them in order for utau to play those notes. Utau can’t generate voiced samples for you
this is going to be fun
Meanwhile minor me doing this: *wth-*
Omg please help! you're like the only person to bring up OpenUtau and it's what I mainly use. I recorded a multipitch voicebank, otoed it, and it's showing up in OpenUtau but when I import USTs it's not auto adjusting to the pitch like it's supposed to. I set up the subbanks and stuff so I don't understand why it's not auto adjusting to its correct pitch
It's not a problem with the UST because other VCV multipitch banks auto adjust perfectly fine
I’m not familiar with that issue so I’m not of much help
Is there an alternative of Oremo for Mac? The Mac version is 32-bit, which isn't supported anymore.
Not that I know of. You’ll likely have to use something like audacity
@@yupphireutau Are there any guides?
Guides for what?
@@yupphireutau For using audacity to do this.
after i make the VB, can i test it via openUtau? utau voicebank install crashes the software on my linux pc
Yes
Ik this was made a while ago but I’m trying to download Utau-synth on my Mac since that’s the version for Mac and it gave a warning saying it can’t open the thingy because it’s from an unknown developer 😬 and then I got scared shetless and deleted the stuff. I went to the official site too, it wasn’t some random page
I have no experience with utau on Macs so I don’t know what to tell you
@@yupphireutau me neither lol. I had the same problem where Utau would just repeatedly crash. It’s probably incompatible with one of the Mac OS updates, and turning off the disk encryption is a dangerous idea. I’ll have to find a different software but I thought it was worth asking if you knew anything before I did, thanks for taking the time to reply