


We will play Unreal Tournament 2004 this Saturday at 17:00 UTC [Info] [Countdown]


File: image.png
(1.05 MB, 835x709)[ImgOps]
1106043
drink me

File: meandwhomst.jpeg
(36 KB, 959x720)[ImgOps]
37348
Me and who?
>>
u
>>
>>178864
n mi, n mi n u (´人`)

File: 32.png
(1.45 MB, 2068x780)[ImgOps]
1523244
Hello Heyuri. If you could get a love doll of TWO of the following but NOT all three, which pair do you pick and why?
1) Hatsune Miku (original KEI design)
2) Megurine Luka
3) Kagamine Rin (Future Style)
9 posts omitted. Click Reply to view.
>>
miku & rin coz they're cute :hokke:
>>
I'd get Miku and Luka (takoluka edition).
>>
Miku and Luka, entirely because I don't like the Future Rin design much.

If it was regular Rin, it'd be perfect, and thus I would pick Miku and Rin.
>>
>>178739
agree :onigiri:
>>
File: rin-white.png
(242 KB, 700x700)[ImgOps]
248477
>>178726
Hmm, black clip or white clip?

File: egg.jpg
(131 KB, 1280x720)[ImgOps]
135034
:tehegg:

File: image.png
(2.81 MB, 1357x1920)[ImgOps]
2948948
OHMYFUCKINGGOD EASTER'S SOON I WANT EGGS I WANT EGGS NOWWWW!!!!!!!!!!!!(;´Д`)

File: wamu353.jpg
(39 KB, 600x382)[ImgOps]
40624
O hi yo :waha:
>>
File: x14_004.gif
(59 KB, 167x250)[Animated GIF][ImgOps]
60610
o hai thar

File: nigra.jpg
(1.05 MB, 3272x2082)[ImgOps]
1102683
There's a nigra on the moon! :astonish:
>>
Nigga stole my moonbuggy. :nigra:
>>
we are sendings humans back to teh moon, anyone following this event? I think its pretty cool after 50 years
>>
>>178803
I'm glad they chimped out and ate all the space funds with their Great Society welfare state bullshit so we couldn't just continue going there 50 years ago, thank you :nigra: now you can twerk and nae nae over Neil Armstrong's footprints or something

File: Fi9TRNPaYAAOvGd.jpg
(214 KB, 1200x1200)[ImgOps]
220039
Ducks.
>>
DICKS
>>
DICKS!? (;゚Д゚)
>>
File: duckroll.jpg
(48 KB, 598x477)[ImgOps]
49784
DUCKROLL
>>
>>

File: easter.jpg
(4.84 MB, 4000x2950)[ImgOps]
5076161
happy easter heyuri ( ´ω`)
>>
y-you got taiga making chocolate for you!?!? :angry:

also.. happyy easter! :biggrin:
>>
File: taiga.jpg
(941 KB, 1600x1200)[ImgOps]
963748
!
>>
:tehegg:

File: fate order.jpg
(212 KB, 1920x1080)[ImgOps]
217657
as i finished first route - unlimted blade, i wonder if i should go for teh tony hawk route next? I seen some clips of him doing 360s and it lookz cool. But i still have Sakuras route. I am thinking of just watching teh movies instead of finished the vn. Thoughts?

File: rei.jpg
(420 KB, 1193x843)[ImgOps]
430950
behold!
>>
Plain characters like me are only good enough for cloned manko... Fine, I'll have sex with teh Rei! :cry:
>>
>>178542
she's older than you sick fuck
>>
File: 1390895179602.jpg
(22 KB, 500x508)[ImgOps]
22915
AAAAAAHHHHHHHHHH WTF IS THAT
>>
File: cant get up im gay.jpg
(31 KB, 720x405)[ImgOps]
32055
>>
what is my waifu doing here????

File: bmi.png
(61 KB, 587x335)[ImgOps]
62531
how many fat lards do we have here
2 posts omitted. Click Reply to view.
>>
File: s.png
(82 KB, 768x437)[ImgOps]
84646
okey
>>
File: fatfuck.png
(24 KB, 357x219)[ImgOps]
25492
( ´π` )
>>
File: WINRAR.jpg
(17 KB, 365x232)[ImgOps]
18244
OUTTA THE WAY, SMALL FRIES

mmm... fries... :drool:
>>
File: Screenshot (308).png
(16 KB, 375x225)[ImgOps]
17155
I think I should lose weight soon :sweat:
>>
File: image.png
(42 KB, 389x372)[ImgOps]
43569
fat is evil and shall be destroyed

File: Screenshot 2026-04-03 013921.png
(44 KB, 1255x232)[ImgOps]
46048
>>
I could not ヽ(;´Д`)ノ
>>
File: strawberry panics 2.jpg
(292 KB, 2586x497)[ImgOps]
299836
UPDATE: I could ヽ(´∇`)ノ

YURI POWAR!!!
>>

Hello, I'm back and I have a major update to my VOCALOID project! I have successfully achieved shape-invariant pitch transposition!

Here it is.
First the original audio: https://files.catbox.moe/zmt3rr.wav
Now my version with WBVPM (pitched down by an octave): https://files.catbox.moe/kho97n.wav
And a version using a naive pitch shift: https://files.catbox.moe/xs39bq.wav

Notice that my version, while having more noise, sounds more natural and has less phasiness. This is particularly noticeable if you play both at very low volume. One sounds much more 'human' than the other.

Also note that this is an extreme example with an octave shift (or 1200 cents) - in practice, shifts would typically be far smaller. Also, this doesn't implement several other parts of the system (more on that later).
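For contrast, here's a minimal sketch of what a "naive" pitch shift amounts to: plain resampling via linear interpolation. This is my own toy illustration (the function name and everything in it are mine, not from any VOCALOID paper); the point is that resampling scales the spectral envelope (formants) along with the pitch, which is exactly what a shape-invariant method avoids.

```python
import numpy as np

def naive_pitch_shift(x, semitones):
    # Resample by the pitch ratio via linear interpolation.
    # This shifts EVERYTHING - harmonics and formants alike -
    # which is why it sounds unnatural at large shifts.
    factor = 2.0 ** (semitones / 12.0)
    src = np.arange(int(len(x) / factor)) * factor  # fractional source positions
    src = src[src < len(x) - 1]
    i = src.astype(int)
    frac = src - i
    return (1.0 - frac) * x[i] + frac * x[i + 1]
```

Shifting a 200 Hz sine up an octave this way yields a 400 Hz sine (and, for real speech, would also double every formant frequency).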

I'll explain all of this in a moment, but first, I'd like to correct some major biographical errors. Since this is a long post, I've divided it into sections.

BIOGRAPHICAL CORRECTIONS

In the last post, I claimed that VOCALOID1 used Narrow-Band Voice Pulse Modeling while VOCALOID2 and onwards used Wide-Band Voice Pulse Modeling. This was incorrect, and it was also the source of most of my confusion surrounding the paper.

What actually happened is that the research technology that would later become VOCALOID1 started out, in the late 1990s, as work to improve the existing Spectral Modeling Synthesis (SMS) system that had been developed in the early 1990s. Techniques from that work were then combined with techniques from another system under development, a Phase-Locked Vocoder, and the result was released as VOCALOID1. In the mid-2000s, work began on combining what had been learned from improving SMS and from the Phase-Locked Vocoder system with the much older and well-known TD-PSOLA (Time-Domain Pitch Synchronous OverLap and Add) system. Importantly, TD-PSOLA is a time-domain, pitch-synchronous system (hence the name), while SMS is a frequency-domain system with a constant hop size. The first technique they developed from this was Narrow-Band Voice Pulse Modeling, followed by Wide-Band Voice Pulse Modeling, and the latter ended up being used in VOCALOID2.
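Since TD-PSOLA keeps coming up, here's a toy sketch of its core idea, under big simplifying assumptions (known pitch marks, constant period; all names are mine, not from the paper): grab a two-period windowed grain around each pitch mark, then overlap-add the grains at a different spacing. Each grain keeps its waveform, and hence its spectral envelope, so only the pitch changes.

```python
import numpy as np

def toy_psola(x, marks, period, factor):
    # Extract a two-period Hann grain around each pitch mark, then
    # overlap-add the grains at onsets compressed by `factor`.
    # Each grain keeps its waveform (and so its spectral envelope);
    # only the grain spacing changes, which is what changes the pitch.
    # This toy also shortens the signal; real PSOLA repeats or drops
    # grains to preserve duration.
    glen = 2 * period
    win = np.hanning(glen)
    out = np.zeros(int(len(x) / factor) + glen)
    for m in marks:
        start = m - period
        if start < 0 or start + glen > len(x):
            continue                          # grain would run off the signal
        grain = x[start:start + glen] * win
        s = int(round(m / factor)) - period   # compressed onset
        if 0 <= s and s + glen <= len(out):
            out[s:s + glen] += grain
    return out
```

Feeding it a glottal-like pulse train with a 40-sample period and factor 2 produces output whose dominant periodicity is 20 samples, i.e. an octave up.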

Now that I understand this, I also understand the major mistake I made when reading the paper: I was reading it from the perspective of an implementer, thinking of the sections as steps to implementing it rather than as research. I had thought that section 2.2 described the core processing algorithms, when it was actually about SMS, and specifically about *the improvements they made to SMS*, not a complete description of SMS, since SMS was already an established technique. Hence my confusion about why some things were seemingly only vaguely explained: *the paper wasn't about them*. Much of that section is still very useful, though, because much of that research was also incorporated into the later techniques.

RESULTS

I have successfully implemented Wide-Band Voice Pulse Modeling; synthesis; and the pitch transposition, time stretching, and timbre scaling algorithms. Additionally, I have finished implementing the full version of the pitch estimation module, changed the code to work using overlapping windows, implemented the window adaptation system, and fixed countless bugs.


Comment too long, view post No.178285 to see the full comment.
10 posts omitted. Click Reply to view.
>>
I wish I was as interested in anything as OP is in whatever he's talking about :dizzy:
>>
>>178483
Well actually I'm implementing the techniques that were used for VOCALOID2. But a VOCALOID1-like engine could be an interesting future project.
>so much work was put into our silly vocaloid voices we really should be grateful it even exists huh...
https://www.tdx.cat/bitstream/handle/10803/7555/tjbs.pdf?sequence=1&isAllowed=y
>>
>>178483
Wait what happened to my tripcode??
>>
>>178626
sorry

i ate it
>>
Hello, I'm back with another update to my VOCALOID project. It's not as big an improvement as last time - in fact, there are no new features - but I felt like it was worth posting. I've been trying to rectify the major issues before I move on to implementing the Excitation plus Resonance model.

The first thing I attempted to tackle was all the added noise at high frequencies.
Here's the original spectrum: https://files.catbox.moe/fq55bo.png
And here's the reconstructed spectrum (with no transforms applied): https://files.catbox.moe/gq7jff.png
You can clearly see the high frequency artifacts. The first thing I tried was something mentioned in the paper, specifically in the WBVPM section: it is mentioned that there are two approaches to a non-integer-size discrete Fourier transform. The first is repeating the signal, while the second is upsampling it. I went with the second, as the former is patented and also because upsampling is easier to implement. It is also mentioned that increasing the repetition count of the signal (or, in the case of upsampling, the upsampling factor) and then discarding the higher frequencies can improve the estimation by reducing artifacts. For repetition, it is mentioned that quadratic interpolation can be applied to the resulting spectrum; however, I am not sure whether this can be done for upsampling, so I have not tried to implement it for now.
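As a rough sketch of the upsampling approach (my own toy code, with plain linear interpolation standing in for a proper resampler, and my own function name): resample one pulse onto an integer number of points spanning the non-integer period, take the DFT, and discard the bins above the original band edge.

```python
import numpy as np

def upsampled_spectrum(pulse, period, factor=3):
    # `pulse` holds one voice pulse whose true period is a non-integer
    # number of samples; it must contain at least ceil(period) + 1
    # samples so the interpolation never runs off the end.
    n_up = int(round(period * factor))
    pos = np.arange(n_up) * period / n_up            # fractional source positions
    i = np.minimum(pos.astype(int), len(pulse) - 2)
    frac = pos - i
    up = (1.0 - frac) * pulse[i] + frac * pulse[i + 1]  # linear-interp resample
    spec = np.fft.rfft(up)
    return spec[: len(spec) // factor]               # drop bins above the original band
```

For a pulse that is one cycle of a sine with a period of 40.5 samples, the returned spectrum peaks at bin 1 (the fundamental), as expected.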

Here's the result after applying an upsampling factor of 3: https://files.catbox.moe/qcgnzq.png
Here's the original audio: https://files.catbox.moe/f7g8ta.wav
The original reconstruction: https://files.catbox.moe/da0m1i.wav
And now with the improved reconstruction: https://files.catbox.moe/513ycn.wav
You can see an improvement, especially at lower frequencies; however, the high frequency artifacts largely persist, so they have to be arising elsewhere. I realized the source was the reconstruction of the signal (AKA the "synthesis").

I had previously implemented a synthesis method that was quite different from the one used in the study, because I did not understand the study's method at first. My method worked by taking each voice pulse and, for every output sample to which that voice pulse is the closest, setting the value of that sample to the interpolated value of a spline representing a time-domain version of the upsampled voice pulse, with a step corresponding to the ratio between the regular and upsampled time domains. Now, in some cases, estimation inaccuracies and differences introduced by whatever transformations were applied result in these regions of samples being bigger than the pulse itself. In those cases, we take advantage of the periodic nature of the voice pulse and repeat it (i.e. sampling before the start is equivalent to sampling at that offset from the end, and sampling after the end is the same as sampling at that offset from the start). However, this method results in discontinuities in some cases.
Here is an example of such a discontinuity: https://files.catbox.moe/jnnxfj.png
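Here's a toy version of that nearest-pulse synthesis with periodic continuation (integer onsets, no spline upsampling - both big simplifications of what's described above, and the function name is mine):

```python
import numpy as np

def synth_nearest_pulse(pulses, onsets, out_len):
    # Every output sample is drawn from the pulse whose onset is
    # closest; indices outside that pulse wrap around periodically,
    # which is where the discontinuities can appear at region borders.
    onsets = np.asarray(onsets)
    out = np.empty(out_len)
    for t in range(out_len):
        k = int(np.argmin(np.abs(onsets - t)))
        p = pulses[k]
        out[t] = p[(t - onsets[k]) % len(p)]  # periodic continuation
    return out
```

With identical pulses spaced exactly one period apart the reconstruction is seamless; discontinuities only show up once the pulses or their spacing disagree.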
I began to try to implement an interpolation system. In this system, we would calculate the gap between pulses - or, in the case of inaccuracies in the other direction (i.e. overlapping pulses), the overlapping area - and interpolate between one pulse and the other linearly. However, this approach was complicated significantly by the non-integer (and potentially differing) sizes of the pulses, as well as numerous edge cases. For this reason, I struggled and spent over an hour trying to figure out how to do it correctly. About halfway through, I decided to check the paper again, and this time I understood the actual synthesis method properly, largely because of a diagram I had missed the first time.
In the actual method, each pulse is expanded in a manner similar to the border interpolation technique used in WBVPM analysis, except kind of in reverse. For each voice pulse, we generate extensions on both sides, with each extension having a size equal to the border interpolation ratio times the size of the voice pulse. Then we apply a trapezoidal window that starts at zero at each edge of the extended voice pulse and ramps up to 1 over twice the border interpolation size on each side. Finally, we overlap and add the voice pulses.
This technique fixes the discontinuity issue because it effectively results in each border-interpolation-length side of each voice pulse being linearly interpolated with the corresponding section of the neighboring voice pulse over a span of twice the border interpolation size. However, this only holds perfectly when the fundamental frequency is the same for both voice pulses (and thus they are the same size) and their onsets are spaced exactly one period of the fundamental frequency apart. When this is not the case, some amount of modulation occurs that attenuates some voice pulses while accentuating others. This is especially noticeable when there are large inaccuracies in the fundamental frequency estimation and/or the voice pulse onset sequence.
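A minimal sketch of that trapezoidal-window overlap-add, under the simplifying assumption that each pulse is already extended by a fixed number of samples (`ext`) on both sides; the function names are mine. With equal-size pulses spaced exactly one period apart, the crossfades sum to exactly 1, which is the property described above.

```python
import numpy as np

def trapezoid_window(n, rise):
    # 0 -> 1 over `rise` samples, hold at 1, then 1 -> 0 over `rise`.
    w = np.ones(n)
    ramp = np.arange(rise) / rise
    w[:rise] = ramp
    w[n - rise:] = 1.0 - ramp
    return w

def overlap_add(pulses, onsets, ext, out_len):
    # Each pulse is assumed pre-extended by `ext` samples on both
    # sides; `onsets` refer to the start of the unextended pulse.
    out = np.zeros(out_len)
    for p, onset in zip(pulses, onsets):
        w = trapezoid_window(len(p), 2 * ext)  # ramps span 2x the extension
        s = onset - ext
        out[s:s + len(p)] += p * w
    return out
```

Overlapping all-ones "pulses" of period 40 with a 10-sample extension reconstructs a flat 1.0 everywhere except the very first ramp-up and last ramp-down, confirming the complementary crossfade.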
Here's the same section from before. Notice how now it does not have a discontinuity: https://files.catbox.moe/p26914.png
Now here's a zoomed-out version: https://files.catbox.moe/zacw8w.png
Now here's a section with large inaccuracies in the MFPA estimation that clearly shows large modulation artifacting: https://files.catbox.moe/efk1vx.png
Here's the new spectrum: https://files.catbox.moe/f94zse.png
You can see that while the high frequency artifacts are gone, there are now more low frequency artifacts. In fact, the overall amount of artifacting is actually higher than before.
Here's the reconstructed audio: https://files.catbox.moe/ympfi0.wav
While I ended up solving this issue by fixing large inaccuracies in the MFPA system, it is interesting to note that my approach is more resilient to estimation inaccuracies. Perhaps for a future improved vocal synthesizer, it would be worth exploring a variant of my periodic continuation technique, adapted with an interpolation method that can handle changes in pulse onset and f0.

The first thing I tried was switching the amplitude in the MFPA function from a linear scale to a magnitude-limited logarithmic one. However, this had little to no effect. The next thing I tried was adjusting the size (in periods) of the window used for the peaks that are fed into MFPA, but again this had little to no effect. Next, I tried implementing the harmonic peak selection algorithm I proposed in the previous post, but once again it had little to no effect.

Comment too long, view post No.178806 to see the full comment.

File: chensmug03.jpg
(34 KB, 474x474)[ImgOps]
35240
heh, you are not a coffee~
how lame. :lolico:
>>
File: satorismug03.png
(291 KB, 474x474)[ImgOps]
298809
is what you would like to think, but too bad! I'm a fair share of coffee too, myself!
>>
ehhh~ thats lame coffee! who would wanna drink shit diarrea!?!
>>
File: shikanoko.jpg
(495 KB, 1878x2048)[ImgOps]
507024
キタ━━━(゚∀゚)━━━!!

