About three weeks a(…) - Off-Topic@Heyuri

File: Screen Shot 2026-05-26 at 3.15.12 PM.png

(205 KB, 2867x622)


QueueSevenM◆Tnq5UWtkfs 2026/05/26(Tue)19:16:13 No.183892 +


About three weeks ago, I promised that I would make a post the following day about a big update. I didn't say what it was about at the time, but it was going to be about the Excitation plus Resonance model. Unfortunately, I was not able to make the post then, and I am sorry for that. However, I do have a significant update now.

Back in April, I began to implement the Excitation plus Resonance (EpR) model. EpR is a model of the voice timbre spectrum. It actually consists of several parts, but the two main ones in terms of implementation difficulty are the source curve and the vocal tract resonances.

The source curve approximates the frequency-domain response of the voice source, and the resonances correspond roughly to formants and are modeled using a modified version of the Klatt formant. The problem lies not in implementing this model, which is actually very easy, but in estimating its parameters, and the method for doing so is not given in any of the papers I've read. Back in April, I spent a couple of days trying to create an estimator for the source curve. Ultimately, I came up with something that was quite poor in quality, very slow, and required very specific parameters, but I accepted it. Much later, though, I found that the exact method to estimate the source curve is actually given in the expired EpR patent.
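To make the two parts concrete, here is a minimal sketch of what evaluating them might look like. It uses the standard Klatt (1980) two-pole resonator, not the modified version mentioned above, whose details are not given in this post. The exponential form of the source curve and the parameter names gain_db, slope_depth_db and slope are assumptions based on how the curve is usually written in the EpR literature, and the sample rate is arbitrary.

```python
import cmath
import math


def klatt_resonator_db(freq_hz, center_hz, bandwidth_hz, sample_rate=44100.0):
    """Magnitude in dB of a standard Klatt two-pole resonator at freq_hz."""
    t = 1.0 / sample_rate
    # Difference equation: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * center_hz * t)
    a = 1.0 - b - c  # normalizes the gain to 1 (0 dB) at 0 Hz
    z_inv = cmath.exp(-2j * math.pi * freq_hz * t)
    h = a / (1.0 - b * z_inv - c * z_inv * z_inv)
    return 20.0 * math.log10(abs(h))


def source_curve_db(freq_hz, gain_db, slope_depth_db, slope):
    """Assumed exponential source curve; slope is negative for a decaying curve."""
    return gain_db + slope_depth_db * (math.exp(slope * freq_hz) - 1.0)


# Hypothetical parameter values, chosen only for illustration.
for f in (0.0, 250.0, 500.0, 1000.0, 2000.0):
    total_db = source_curve_db(f, -10.0, 40.0, -0.001) + klatt_resonator_db(f, 500.0, 80.0)
    print(f"{f:7.1f} Hz  {total_db:8.2f} dB")
```

Evaluating the model this way is the easy direction. The hard part described above is the inverse: finding the center frequencies, bandwidths and source parameters from a measured spectrum.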

Not wanting to repeat my previous mistake, when I began to work on EpR, I searched extensively for the method used to estimate the resonances, but came up with nothing. Eventually, I began trying to figure it out myself. I started by recreating the data in the paper: I annotated an image extracted from the paper in Photoshop and then ran a Python script on the annotated image. I would then compare the result of my approach on the recreated data to the paper's.
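The script itself is not shown here, but the core of recreating data from a figure is usually a mapping from pixel coordinates to axis values. The following is only a sketch of that step, assuming linear axes. The calibration pixels, the axis ranges and the function names are all hypothetical and are not taken from the paper or from the actual script.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function that maps a pixel coordinate to a value on a linear axis."""
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale


# Hypothetical calibration: two known tick positions per axis, read off the image.
x_to_hz = make_axis_mapper(100, 0.0, 900, 8000.0)
y_to_db = make_axis_mapper(50, 0.0, 450, -80.0)  # image y grows downward


def pixels_to_curve(points):
    """Convert annotated (x, y) pixel positions to (Hz, dB) pairs sorted by frequency."""
    return sorted((x_to_hz(x), y_to_db(y)) for x, y in points)


# Hypothetical annotated points along a curve.
annotated = [(500, 250), (100, 50), (300, 120)]
for hz, db in pixels_to_curve(annotated):
    print(f"{hz:7.1f} Hz  {db:7.2f} dB")
```

A logarithmic frequency axis would need the mapping applied to the logarithm of the value instead.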

I had actually been thinking about it long before my first attempt, and I already had a method in mind. The main idea of this method was that the reciprocal of the second derivative could be used to approximate the bandwidths of the resonances.
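As a rough illustration of why curvature carries bandwidth information, consider an isolated peak with a Lorentzian power shape. This is a swapped-in simplification, not the method described in this post, and the function names are hypothetical. On a decibel scale such a peak is D(f) = peak_db - 10*log10(1 + ((f - f0) / (B/2))^2), so its second derivative at the peak is -20 / (ln(10) * (B/2)^2). Solving for the 3 dB bandwidth B gives a value that depends only on the reciprocal of the curvature, not on the peak height.

```python
import math


def lorentzian_db(freq_hz, center_hz, bandwidth_hz, peak_db=0.0):
    """dB level of an isolated Lorentzian peak with the given 3 dB bandwidth."""
    x = (freq_hz - center_hz) / (bandwidth_hz / 2.0)
    return peak_db - 10.0 * math.log10(1.0 + x * x)


def bandwidth_from_curvature(envelope_db, peak_index, step_hz):
    """Estimate the 3 dB bandwidth from the second derivative at a peak sample."""
    # Central finite-difference approximation of the second derivative.
    d2 = (envelope_db[peak_index - 1] - 2.0 * envelope_db[peak_index]
          + envelope_db[peak_index + 1]) / (step_hz * step_hz)
    if d2 >= 0.0:
        raise ValueError("sample is not a local maximum")
    return 2.0 * math.sqrt(20.0 / (math.log(10.0) * -d2))


# Synthetic test peak: center 1000 Hz, bandwidth 100 Hz, sampled every 1 Hz.
step_hz = 1.0
envelope = [lorentzian_db(i * step_hz, 1000.0, 100.0) for i in range(2001)]
peak_index = max(range(1, len(envelope) - 1), key=envelope.__getitem__)
estimated_bw = bandwidth_from_curvature(envelope, peak_index, step_hz)
print(f"peak at {peak_index * step_hz:.0f} Hz, estimated bandwidth {estimated_bw:.2f} Hz")
```

On real spectra, neighboring resonances and the source curve overlap with each peak, so the curvature at a maximum is no longer that of a single Lorentzian and an estimate like this can only be approximate.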

Initially, after I made a few adaptations to the approach upon realizing that several things wouldn't work quite how I thought, the results vastly outperformed my expectations. There were several ideas that didn't work out, but overall I felt good at the end of the first day. I debated whether to continue trying or to just go with what I had. I decided that I would go with what I had and make a post about it the following day. That was about three weeks ago, however.

The next day, however, I decided to continue working instead. Initially, I had been so surprised that the second derivative reciprocal method worked at all that I overlooked many inadequacies. For some reason, they were a lot larger than I had thought the previous day, specifically in many of the most critical areas. It is worth noting that the data is on a logarithmic scale (decibels), so apparently small errors in some areas can actually be very large. Furthermore, I discovered there was a mistake in the modified Klatt implementation, and that with the correct formulation, the method performed much worse. On top of that, the next day I decided to recreate another sample from the paper, and the method performed terribly on it. So I continued. I decided that this time, I would not be satisfied until I had actually recreated it properly, no matter how long it took.
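To put the point about the decibel scale in numbers, here is a small sketch that converts a dB error into the corresponding linear amplitude ratio, using the usual 20*log10 amplitude convention. The function name and the error values are only illustrative.

```python
def db_error_to_amplitude_ratio(error_db):
    """Linear amplitude ratio corresponding to an error given in decibels."""
    return 10.0 ** (error_db / 20.0)


# Illustrative error sizes; a gap that looks small on a dB plot grows quickly.
for error_db in (1.0, 3.0, 6.0, 12.0):
    ratio = db_error_to_amplitude_ratio(error_db)
    print(f"{error_db:5.1f} dB error -> amplitude off by a factor of {ratio:.3f}")
```

A 6 dB error, for example, is roughly a factor of two in amplitude, and a 20 dB error is a factor of ten.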

Three weeks and 1,905 attempts later... Initially, I made quite a bit of progress. However, for over a week, I made basically no progress at all despite many hundreds of attempts. Finally, though, mainly in the past two to three days, I have made some major discoveries. Yesterday, I ran a test showing that combining an old idea I had ruled out with a new one yielded significant progress. Just today, I implemented that idea properly and made another significant improvement.

While this new implementation is still flawed, and actually has major issues at the higher frequencies, I think it shows considerable potential. Besides, it is simpler than the previous implementation, and I have some clear ways forward. This new version solves many issues I had before and doesn't require many of the contrived things I had to do before.

One place in particular where it shows promise is this section of the second recreated test sample.
Here is that section from the paper: https://files.catbox.moe/ontyhu.png
Here is the result from the old approach: https://files.catbox.moe/9uplbn.png
And now here is the result from the new approach: https://files.catbox.moe/gzu3q8.png
