VOCALOID1 MFPA impl(…) - Off-Topic@Heyuri

File: m1772488142904.png

(444 KB, 2003x1640)

VOCALOID1 MFPA implementation QueueSevenM◆Tnq5UWtkfs 2026/03/04(Wed)18:37:41 No.175057 Yeah x1

So I've been working on a project to re-implement the VOCALOID1 engine.
I'm basing it on the description in Jordi Bonada's PhD thesis "Voice Processing and Synthesis by Performance Sampling and Spectral Models" and not the original papers as the former is more detailed, easier to follow, and also describes the VOCALOID2 engine.

After a lot of trouble with getting TWM f0 estimation to work, I've finally gotten to implementing MFPA. And amazingly, it seems to have worked first try.

Compare my results:
https://i.ibb.co/dsvgv0fd/Screen-Shot-2026-03-02-at-3-54-48-PM.png

To the results in the study:
https://i.ibb.co/C3fjdWVd/Screen-Shot-2026-03-02-at-3-55-09-PM.png

1 QueueSevenM◆Tnq5UWtkfs 2026/03/04(Wed)18:40:59 No.175058 +

File: m1772489166399.png

(282 KB, 1754x1278)

Also here's the graph of the f0 estimate from the TWM. It's still somewhat flawed (see the jump down at frame 37), and it required unusual parameters (a Kaiser-Bessel beta of 2.2 instead of the 1.95 recommended by the study, and only 6 harmonics instead of 11) to avoid instabilities even in relatively trivial scenarios.

Actually, I think I've finally figured out what's been wrong with the TWM this whole time: I never converted the frequency in bins to the frequency in hertz. I think this was originally intentional, and I only realized much later that the error formula is neither linear nor even relative (scale-invariant), so the unit of the frequencies changes the result. I haven't tested the fix yet, but I'd imagine it should finally solve the problems I've had with TWM.
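
A minimal sketch of why the unit matters, assuming the usual Maher and Beauchamp form of the TWM error with its commonly quoted constants (p = 0.5, q = 1.4, r = 0.5, rho = 0.33). This is not the implementation from this project, and the thesis may use a variant. The function name `twm_error`, the 44100 Hz sample rate, the 256-point FFT size, and the peak list are all made up for illustration.

```python
import numpy as np

def twm_error(f0, peak_freqs, peak_mags, n_harmonics=6,
              p=0.5, q=1.4, r=0.5, rho=0.33):
    """Simplified two-way mismatch error for one f0 candidate (lower is better).

    f0 and peak_freqs must share a unit; peak_mags are linear magnitudes.
    """
    peak_freqs = np.asarray(peak_freqs, dtype=float)
    peak_mags = np.asarray(peak_mags, dtype=float)
    a_max = peak_mags.max()
    harmonics = f0 * np.arange(1, n_harmonics + 1)

    # Predicted-to-measured: match each harmonic to its nearest measured peak.
    dist = np.abs(harmonics[:, None] - peak_freqs[None, :])
    nearest = dist.argmin(axis=1)
    df = dist[np.arange(n_harmonics), nearest]
    w = df * harmonics ** -p
    err_pm = np.sum(w + (peak_mags[nearest] / a_max) * (q * w - r))

    # Measured-to-predicted: match each peak to its nearest harmonic of f0
    # (assumption: the harmonic number is not capped at n_harmonics here).
    n = np.maximum(1, np.round(peak_freqs / f0))
    df = np.abs(peak_freqs - n * f0)
    w = df * peak_freqs ** -p
    err_mp = np.sum(w + (peak_mags / a_max) * (q * w - r))

    return err_pm / n_harmonics + rho * err_mp / len(peak_freqs)

# Hypothetical spectral peaks near the harmonics of E4 (about 329.63 Hz).
peaks_hz = np.array([331.0, 658.0, 992.0, 1316.0, 1650.0, 1975.0])
mags = np.array([1.0, 0.6, 0.4, 0.2, 0.1, 0.05])

hz_per_bin = 44100 / 256  # assumed sample rate / assumed FFT size

# Same peaks and same candidate, once in hertz and once in bins.
err_in_hz = twm_error(329.63, peaks_hz, mags)
err_in_bins = twm_error(329.63 / hz_per_bin, peaks_hz / hz_per_bin, mags)
print(err_in_hz, err_in_bins)
```

The two printed values differ because the mismatch term (frequency difference times frequency to the power -p) scales with the unit while the constant r does not, so constants tuned for hertz weigh the terms differently when the input is in bins.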

This graph specifically shows the estimated fundamental frequency for each 256-point frame of an E4 /e/ phoneme.

2 Anonymous 2026/03/04(Wed)18:44:20 No.175062 +

In Japanese, please.

3 QueueSevenM◆Tnq5UWtkfs 2026/03/04(Wed)18:46:26 No.175063 +

>>175062
What do you mean?

4 QueueSevenM◆Tnq5UWtkfs 2026/03/04(Wed)18:49:11 No.175065 +

Actually I meant IPA /i/ not /e/.

5 TEH RAPEMAN 2026/03/04(Wed)19:24:36 No.175070 +

I don't understand it well myself, but I admire your efforts ヽ(´ー｀)ノ

6 Anonymous 2026/03/04(Wed)19:47:55 No.175073 +

>as the former is more detailed, easier to follow, and also describes the VOCALOID2 engine.
you went with the harder to program paper? what made you choose this one?

also, wat are you planning to do with it afterwards? is it just a programming exercise? ヽ(ﾟρﾟ)ノ

7 QueueSevenM◆Tnq5UWtkfs 2026/03/04(Wed)20:10:23 No.175088 +

>>175073
>you went with the harder to program paper? what made you choose this one?
No, "former" means first in the sentence, not chronologically.
>also, wat are you planning to do with it afterwards? is it just a programming exercise? ヽ(ﾟρﾟ)ノ
It was just that originally, but I now plan to eventually release it as an open-source library.

8 QueueSevenM◆Tnq5UWtkfs 2026/03/04(Wed)20:11:49 No.175090 +

>>175070
You'd be surprised. Try just reading the paper from the start. You may find that you can actually understand it.

https://www.tdx.cat/bitstream/handle/10803/7555/tjbs.pdf?sequence=1&isAllowed=y
