File: GnhnsopaAAAbj_R.jpg (238 KB, 1226x745)
fc2web and a whole bunch of other free japanese hosts are to be shut down at the end of june

it has a page on archiveteam but it seems nothing is being scraped right now
https://wiki.archiveteam.org/index.php/Fc2web

a lot of old sites may disappear sad
>>
why are they being shut down?
>>
>>136840
they dont align culturally with the NWO emo
>>
>>136840
>The FC2WEB system has been in operation for over 20 years,
>The system and servers are aging, and it is difficult to maintain them.

if you visit a site you get a popup
http://saomix.fc2web.com/

if theyre abandoned, they will be gone. but if they can migrate, the old visitor counts will prob reset
( ´,_ゝ`)
>>
File: image.png (13 KB, 296x223)
I will try to scrape some stuff later on, but I hope there will be a way to remove this warning from teh archives dark
>>
My guess would be that it's related to the UK's "no fun allowed" Online Safety Act, which goes into full effect this summer dark

It applies to any large-ish "platform" that serves UK users, and they're trying to bully the entire world into kowtowing to it. Perhaps those Japanese services decided to just shut down instead of receiving a fine or geoblocking the UK - especially since their services have likely been mostly dead for the past 15 years

Sage for serious business
>>
File: usada.jpg (19 KB, 297x221)
Let's try saving at least Futaba/2channel/2D etc related fc2web sites to Museum@Heyuri happy

I will attempt to scrape these sites soon:
http://saomix.fc2web.com/
http://convenies.fc2web.com/
http://tuneari.fc2web.com/
http://uguisuhp.fc2web.com/newpage2.htm
http://meganekko2.fc2web.com/
http://nijineta.fc2web.com/
http://mypacekame.fc2web.com/ (there is a notice of moving the site, but the new one is empty - noting down here, will only put up if he gives up)
http://rva139.fc2web.com/
http://ichinoa.fc2web.com/
http://riceballman.fc2web.com/

I found these sites with a google search of "虹裏" site:fc2web.com, u may search with similar keywords and suggest the sites (or just post them ITT for the sake of sharing)
>>
File: 1724642291260865.jpg (69 KB, 700x848)
someone quick save that japanese nudist website kuma6
>>
>>136855
I will save it with my photographic memory and i'll describe it to you whenever you need it biggrin.
>>
>>136847
ganbatte dance2
>>
>archiveteam
These fucking queermoes do jack diddly shit. They let geocities.jp, teacup, etc. die unarchived. Fuck them.
>>
>>136873
why did they do that? :(
>>
>>136844
Why are lawmakers so mad? nyaoo
>>
File: 050516-3.jpg (10 KB, 320x240)
Post some cool things you find from random fc2 pages you find
>>136874
Not trying is easier than trying biggrin
>>
File: 95473.jpg (10 KB, 320x240)
キタ━━━(゚∀゚)━━━!!
>>
File: 20030713_01.jpg (57 KB, 640x480)
キタ━━━(゚∀゚)━━━!!
>>
File: yy.jpg (12 KB, 246x500)
キタ━━━(゚∀゚)━━━!!
>>
Public archives are dead.
I used to maintain a few pages dedicated to archiving the history of certain boards, but these days archive.ph just won't cooperate. I can't even use it to archive the index of a slow imageboard without it spitting a bullshit error at me.
I still feel really bad about not saving anything from geocities.jp beyond a screencap of a page or two, and a few flash files. To be fair, I actually trusted archiveteam would pull through and scrape everything, so I wasn't really worried at the time closed-eyes2
Whenever anyone goes on a "copypaste some kanji before:2015 in the search bar and find weird images" rabbithole hootenanny, the first couple thousand results you're going to receive are hosted on fc2, so it's certainly going to be weird to see what sites will surface once it's gone.
>>
File: f.jpg (5 KB, 259x194)
キタ━━━(゚∀゚)━━━!!
>>
>>136901
>hosted on fc2, so it's certainly going to be weird to see what sites will surface once it's gone
To my understanding sites hosted on the fc2 main site will survive, only fc2web (and others in OP) will b gone
>>

from urllib.parse import urlparse
from googlesearch import search  # from the googlesearch-python package

# free hosts that are shutting down - uncomment one at a time and rerun
urls = [
    # "55street.net",
    # "easter.ne.jp",
    # "finito-web.com",
    # "ojiji.net",
    # "zero-yen.com",
    "fc2web.com",
    # "k-free.net",
    # "gooside.com",
    # "ktplan.net",
    # "kt.fc2.com",
    # "zero-city.com",
    # "k-server.org",
    # "land.to"
]

results = search(f"site:*.{urls[0]}", num_results=10000, unique=True, safe=None, sleep_interval=5, region="ja")

# collapse every hit down to scheme://host/ so we end up with one entry per site
parsed_urls = []
for s in results:
    parsed_url = urlparse(s)
    domain = f"{parsed_url.scheme}://{parsed_url.netloc}/"
    parsed_urls.append(domain)
    print(domain, flush=True)

unique_urls = list(set(parsed_urls))

filename = f"{urls[0]}.txt"
with open(filename, 'w') as file:
    for url in unique_urls:
        file.write(url + '\n')

print(f"saved {filename}")


>>136847
i tried something similar with this py script but google doesnt have many results.
archive.org cdx api wont return just subdomains either.
>>
What do futaba posters, ie japs themselves, think about it?
>>
Here's a host-level list of 293 million websites (5.3 GB download, extracts to a 20 GB txt file):
https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2025-jan-feb-mar/host/cc-main-2025-jan-feb-mar-host-ranks.txt.gz
Some text editors like Notepad++ will need 24GB+ RAM to open the file, but I'm sure there are programs/scripts that can search through it without loading it all into RAM. The domains are backwards like "com.fc2web.nigger" instead of "nigger.fc2web.com".

8347 matches for com.fc2web
180 matches for net.55street
393 matches for com.fc2.kt
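
For searching through it without loading it all into RAM, a line-by-line scan like this should do (just a sketch - double-check the actual column layout of the ranks file and adjust the field handling; gzip.open() in "rt" mode works the same way if you'd rather not extract the 20 GB file first):

# Scan the extracted host-ranks file line by line so nothing big sits in RAM.
# Assumes whitespace-separated columns with the reversed host in one of them -
# check a couple of real lines first and adjust if the layout differs.
targets = ("com.fc2web", "net.55street", "com.fc2.kt")

with open("cc-main-2025-jan-feb-mar-host-ranks.txt", encoding="utf-8") as src, \
     open("matches.txt", "w", encoding="utf-8") as out:
    for line in src:
        for field in line.split():
            if any(field == t or field.startswith(t + ".") for t in targets):
                out.write(field + "\n")
                break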
>>
>8347 matches for com.fc2web
>180 matches for net.55street
>393 matches for com.fc2.kt
There should probably be a list only for these, and fix teh URLs with regex
Though it still takes someone willing with enuf storage/bandwidth to archive thousands of sites sweat2
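
Fixing the URLs is basically just reversing the dot-separated parts - a quick sketch, assuming the entry is a plain reversed host with no path bits mixed in:

# "com.fc2web.toukei135" -> "http://toukei135.fc2web.com/"
def reversed_host_to_url(rev_host: str) -> str:
    parts = rev_host.strip().split(".")
    return "http://" + ".".join(reversed(parts)) + "/"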
>>
>>136919
futaba posters dont talk. its all sentence-long bullshit and (AI) image posting.
>>
>>136952
Are they okay?
>>
In Japan, imageboards are for images 💁‍♀️
>>
File: image.png (99 KB, 711x625)
Futaba is on the chattier side
The img server doesn't even have image replies
They don't currently have a discussion about this topic, but that doesn't mean they never did - they don't have public archives like 4chan's archived.moe etc, so we can't just search for related discussions in the past.
When you visit img, you can always find toshiaki discussing very average ecchi pictures/screenshots. It's a place worth patterning after ( ´ω`)
>>
>fc2web and a whole bunch of other free japanese hosts are to be shut down at the end of june
LMAO
>>
Are web.fc2.com sites staying up?

E.g. http://strangewalker.web.fc2.com/
>>
Yes, nothing indicating those ones are shutting down. It's probably worth having our own Ayashii Walker anyways, because I think that's the only one left up
>>
>>136918
used >>136945 and tried to grab all the domains with fc2web & fc2.kt. first time doing something like this so i hope i didnt fuck it up
https://up.heyuri.net/src/4343.txt

will try to follow up with other domains but i gotta find out all the subdomains that are dying...
>>
>>136844

I promise you, the UK doesn't have that much sway, nor have our incompetent government's expensive attempts to mediate the Internet ever actually done anything. It's all just WORDSWORDSWORDS intended to placate people who spend all day on Mumsnet complaining about how Pornhub needs to be banned because their impressionable shota saw one porn video and turned ghay (´~`)
>>
File: image.png (47 KB, 784x313)
http://futabajinro.fc2web.com/
Futaba's werewolf game's logs
we should salvage teh icons here for Heyuri's usage
>>
>japs themselves think about it?
You could read through Twitter
https://xcancel.com/search?f=tweets&q=fc2web
>>
if there are 10K sites disappearing in total, and we say they are 0.1GB on average (this may even be generous)
10,000 * 0.1GB = 1,000GB = 1TB
I think it wouldn't be too impossible to archive everything? unsure
>>
>>136972
some of these are like
com.fc2web.toukei135.html.comic
com.fc2web.toukei135.txt.robots
which looks like they should be toukei135.fc2web.com/robots.txt and /comic.html
>>
>toukei135
gives 404 unsure
>>
>>136995
you got that right. my mistake, i made an updated one. also has the other domains that are in the script in >>136918
https://up.heyuri.net/src/4344.txt

>>136990
theres about 13600 lines in this file, and if we assume a third are dead, and that every site also takes up 100mb, that would mean roughly 900gb of storage. or uh... 1.36tb i guess. still somewhat in reach for anyone with a spare drive

there is a considerable amount of 404s and 403s, so i think it could be alot less if we figure how to just not download the ones that'll error out
>>
This probably needs a python script (thxfully we're in the age of chatgpt)

It should use the links.txt 137011-san posted: if a page redirects to anywhere on error.fc2.com from the index page, it shouldn't save it and should instead append the link to error.txt on a new line. Successful ones should be written to something like done.txt after completing (that will be used to generate the link index if we ever put it on Heyuri etc). Also it should have a "resuming" mechanism somehow using the two files, checking the last link in both of them, comparing which one is further down in links.txt, and continuing from the one after that.
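
Something like this is roughly what I have in mind (untested sketch using the third-party requests library; instead of comparing last lines it just skips anything already recorded in done.txt/error.txt, which is simpler but resumes the same way):

import requests

def already_checked():
    # crude resume: collect every URL already sorted into either file
    seen = set()
    for name in ("done.txt", "error.txt"):
        try:
            with open(name, encoding="utf-8") as f:
                seen.update(line.strip() for line in f)
        except FileNotFoundError:
            pass
    return seen

seen = already_checked()
with open("links.txt", encoding="utf-8") as links:
    for url in (line.strip() for line in links):
        if not url or url in seen:
            continue
        try:
            r = requests.get(url, timeout=20)  # follows redirects by default
            dead = "error.fc2.com" in r.url or r.status_code >= 400
        except requests.RequestException:
            dead = True
        with open("error.txt" if dead else "done.txt", "a", encoding="utf-8") as out:
            out.write(url + "\n")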

After downloading all pages, we should
1- convert downloaded .htm and .html files from SHIFT_JIS to UTF-8
there is a linux command noted on Museum@Heyuri for this:
find . -type f \( -name '*.htm' -o -name '*.html' \) -exec sh -c 'iconv -f SHIFT_JIS -t UTF-8//TRANSLIT "{}" > "{}.tmp" && mv "{}.tmp" "{}"' \;
or this variant, which maps the ‾ and ¥ characters back to ~ and \:
find . -type f \( -name '*.htm' -o -name '*.html' \) -exec sh -c 'iconv -f SHIFT_JIS -t UTF-8 "{}" | sed -r "s/‾/~/g;s/¥/\\\\/g" > "{}.tmp" && mv "{}.tmp" "{}"' \;
Or use teh powershell script: https://up.heyuri.net/src/3485.ps1 (Convert-Encoding.ps1)
2- convert <html> at the beginning of each file to <html lang="ja"> so browsers can display the intended fonts.
a command for this is also noted on Museum@Heyuri:
find . -type f \( -name "*.html" -o -name "*.htm" \) -exec sed -i 's/<html\( lang="[^"]*"\)\?>/<html lang="ja">/Ig' {} +
3- remove the end of service notification popup thing
It should find the span starting from "<div id="popup-container">" up to the next "</script>" and remove everything in between
I didn't test it, but ChatGPT suggested this command:
find . -type f \( -iname "*.html" -o -iname "*.htm" \) -exec perl -0777 -i -pe 's|<div id="popup-container">.*?</script>\s*||gs' {} +

Then all that's left is sharing with internets somehow biggrin
Depending on how much space it actually takes (I doubt the average gets anywhere close to 100MB), we can host it ourselves.

There will be some sites like http://uguisuhp.fc2web.com whose index doesn't link to anything, so they need to be reached via http://uguisuhp.fc2web.com/newpage2.htm
I will remember to save this one specifically, but there are probably some others with hidden pages that will get lost...
>>
>1- convert downloaded .htm and .html files from SHIFT_JIS to UTF-8
>2- convert <html> at the beginning of each file to <html lang="ja"> so browsers can display intended fonts.
The potential collateral damage that this could cause (corruption, wrong characters, general b0rkage) isn't worth it IMHO - I think it'd be better to keep the pages in their original form sweat2
>>
*Not to mention that some pages may not even be Shift-JIS, but EUC-JP or even UTF-8
>>
>The potential collateral damage that this could cause (corruption, wrong characters, general b0rkage) isn't worth it IMHO
I recall running into issues with serving SJIS files over web servers, or maybe it was Cloudflare. I think <html lang="ja"> was necessary too, but maybe it could use some kind of check to see whether it's actually needed dizzy

If someone cares about just having them not disappear, they could just do Step 3 first and distribute that with torrent too
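
If the conversion does happen, a check like this could at least avoid mangling non-SJIS pages (untested sketch - it only trusts an explicit charset declaration near the top of each file and leaves everything else alone; "mirror" is just a placeholder for wherever the scrape lands):

import re
from pathlib import Path

CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?\s*([-\w]+)', re.IGNORECASE)

def declared_charset(path):
    head = path.read_bytes()[:2048]  # charset declarations sit near the top
    m = CHARSET_RE.search(head)
    return m.group(1).decode("ascii", "ignore").lower() if m else None

for page in Path("mirror").rglob("*.htm*"):  # "mirror" = wherever the scrape lives
    if declared_charset(page) in ("shift_jis", "shift-jis", "x-sjis", "sjis"):
        text = page.read_bytes().decode("cp932", errors="replace")
        page.write_text(text, encoding="utf-8")
        # NOTE: the charset= declaration inside the file should be rewritten to
        # utf-8 as well, or browsers will still try to decode it as Shift_JIS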
>>
File: 77.jpg (16 KB, 320x240)
キタ━━━(゚∀゚)━━━!!
>>
File: Solo Leveling - S02E04.jpg (785 KB, 1920x1080)
>>
Why not use wget or wget2 for archiving teh sites? It has a recursive mode, an option preventing it from blindly following external domains, and an option to convert links.
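
Something like this per site should cover all of that (untested, flags straight from the GNU wget manual - recursive retrieval stays on the starting host by default, so external domains aren't followed):
wget --recursive --level=inf --page-requisites --convert-links --adjust-extension --no-parent --wait=0.3 --random-wait -e robots=off http://saomix.fc2web.com/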
>>
And all thanks to our Pater RMS!
>>
>>136855
What website? (I want to know)
>>
File: y609kjivmpse1.png (131 KB, 1794x1397)
Speaking of storage space... It is going to make you envious (yes, a reddit link dark) https://reddit.com/r/StableDiffusion/comments/1jqej32/vram_is_not_everything_today/
I wonder if they buy it with their parents' money. Or is it not so expensive, relative to the monthly income, to buy computer parts in the US?
>>
File: hyh.jpg (12 KB, 320x240)
キタ━━━(゚∀゚)━━━!!
>>
>>137022
What is he doing with his hand? That's not a peace sign. Looks like he's trying to symbolize a gun
>>
>>137056
Harddrives aren't that expensive. I make 15 an hour and I can afford as much space as I want 🐵
>>
>>137060
I do not even have 100GB of free space currently dark And the number of SATA ports on ur motherboard is limited, what do?
>>
>>137056
computer parts are definitely not that expensive in the US
>>
>>137065
I just delete stuff that I'm probably not going to use again. Anime I've already watched that wasn't great, obsolete AI models, old versions of software, video games I probably won't replay, all of it can go. Someone who isn't a NEET can spend their money on HDDs to archive that stuff.
>>
File: dd.jpg (65 KB, 600x450)
キタ━━━(゚∀゚)━━━!!
>>
>>137043
i tried wget to take all 55street sites since fc2 was too large
55street has 178 sites according to the host lvl record, and the downloaded data adds up to a total of 600mb, giving an average of 3.3mb per site. this COULD mean that the total archive could add up to 52.8gb, assuming the 16k site figure maintains a 3.3mb average. dont know how many sites just 404d, but lets just keep with the 3.3mb average

the compressed size of the 55street archive (450mb, 7z) is too big to put on heyuri uploader, so if anyone has a file uploader or a better compression algorithm do let me know

note that i havent done any of the cleaning mentioned in >>137014, just the raw site data. terminal says it was 23min to download everything, and i think i can scale it up a bit more too so it'll be faster. i only used 8threads and limited it to 512k & had a wait of .3s, so theres probably alot of room for improvement
>>
>>137075
Just put it in parts! ヽ(´ー`)ノ And it's better to make a separate user board.
>>
I am trying to write a script that checks whether links redirect to error.fc2.com (putting the links that don't redirect into a werks.txt file, and those that do into error.txt) but I can't get it to work with 100% certainty yet (;´Д`)
Once I have a clean list of subdomains, using wget or HTTrack or watever should be easy

>this COULD mean that the total archive could add up to 52.8g
That's nothing... I'd put it all on Museum@Heyuri biggrin
>>
>>137076
ooo good point. forgot i can do that. it's below:
https://up.heyuri.net/user/boards/jparc/index.php

i'll probably try put some more site data on there so hopefully it doesnt fill up too quick (´~`) or uh, that the data doesnt just become redundant ┐(゚~゚)┌
>>
>>137077
I think wget can do it too. ヽ(´ー`)ノ

wget --spider --max-redirect=0 -a wget.log --tries=1 --wait=0.4 -i list.txt


then you'll need just a script that parses its log
--2025-04-04 21:32:16--  http://toukei135.fc2web.com/
Resolving toukei135.fc2web.com (toukei135.fc2web.com)... 199.48.208.133
Connecting to toukei135.fc2web.com (toukei135.fc2web.com)|199.48.208.133|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://error.fc2.com/web/ [following]
0 redirections exceeded.
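
A rough sketch of such a parser, assuming the log keeps the shape of the excerpt above:

# split wget.log into live/dead lists based on the spider output
current = None
with open("wget.log", encoding="utf-8", errors="replace") as log, \
     open("werks.txt", "w") as ok, open("error.txt", "w") as bad:
    for line in log:
        if line.startswith("--") and "http" in line:
            current = line.split()[-1]          # the "--<time>--  <url>" line
        elif "awaiting response..." in line and current:
            if line.rsplit("...", 1)[-1].strip().startswith("200"):
                ok.write(current + "\n")        # 200 OK -> alive
                current = None
        elif line.startswith("Location:") and "error.fc2.com" in line and current:
            bad.write(current + "\n")           # redirected to the fc2 error page
            current = None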


>should be easy
Oh, it'll require so much man reading and debugging once it werks dark
>>
File: thxgpt.png (23 KB, 690x506)
THX, I think I already solved it - it was a false assumption of mine that they would all be index.html sweat2
I'm using curl for now but once I have a good list I should probably use wget or HTTrack. I'm not quite l33t enough (´ー`)
>>
>>137011
how complete is this?
I noticed it at least doesn't have http://njtown.fc2web.com/ unsure
>>
looks like the Common Crawl list of 293 million websites still isn't the entire internet
>>
>>137082
Whoa, C code and even having a GUI is pretty cool dizzy I only can into console scripts and just started to experiment with Tkinter closed-eyes2
>>
i hope you guyz can save teh japanese internets from the demonic forces trying to maek the web all boring. ヽ(´∇`)ノ
>>
File: 2025-04-05_09-55.jpg (66 KB, 709x402)
>>137087
sure doesnt (´~`) we might need to find other crawler lists. or maybe the older Common Crawl data sets have domains that arent included in the one sent by >>136945
looks like the 16k site figure might go up Σ(;゚Д゚)
>>
File: image.png (3 KB, 323x155)
I think there is no land.to domain that doesn't return 404 at all anymoar? unsure
The way it returns 404 confuses my script, so I'll just halt it and give teh result list. Less than 10K sites to archive if it werked right biggrin
>>
links that return an error - from 350wuen.fc2web.com onwards, they time out instead of redirecting to error.fc2: https://up.heyuri.net/src/4347.txt
links that are ready to archive https://up.heyuri.net/src/4348.txt

Seems it got them correct but I didn't check thoroughly

>>137113
Would be bettar to include moar of course, but I think 10K sites that made it to that crawl list is still sumthing biggrin
>>
There may be leaked h4ck3d DNS zone lists somewhere on the net. Dunno if google can find it, only real haxxors know where to get them now probably.
https://zonefiles.io/

https://stackoverflow.com/questions/131989/how-do-i-get-a-list-of-all-subdomains-of-a-domain
>>
>>136855
WHAT SITE
I never knew it existed and now I never will cry
>>
File: 55sites.png (2 KB, 266x54)
55 sites, average is 4MB... in theory dark
probably doesn't help that my only available drive is an encrypted one intended for saving big videos (exfat), and it haets small icons
I should find a smaller drive to maek NTFS...
>>
imagine using teh proprietary dark
>>
>>137185
exfat genuinely does suck in general

all my external drives are NTFS, particularly since I do remember a time when exfat support was a lot iffier on Linux
although really, my really lazy bastard with a hammer instead of a drill solution would be a veracrypt file volume of 300MB and a lazy password lolol
>>
>>137120
i tried to use the list here but wget came up with weird errors when i tried to use it (´人`) just used the old one and let it 404 everything that wasn't live. since the dead sites just redirect to fc2's error page, wget seems to pick up on that and doesn't download them accordingly.

the userboard now has ojiji.net scraped: 152 sites attempted, 115 successfully scraped. it took like 2 hours to download the files from this http://shima.ojiji.net/ fucker.. but it was done! ヽ(´ー`)ノ their site alone took up like 100mb... was i archiving lhqhq?

https://up.heyuri.net/user/boards/jparc/

commenting on how fast it was, it seems xargs is really good at downloading lots of domains concurrently, but it handles each domain with a single wget process. turning the wait times down and giving it a random wait, alongside a little bump in dl speed, seems to have made it a little faster, but it's still overall slow. ChatGPT-san will have to answer how to make it faster...! (゚血゚#)
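
for reference, the shape of the command i mean (untested; GNU xargs, -P 8 = eight wget processes in parallel, -n 1 = one site per wget):
xargs -a list.txt -P 8 -n 1 wget --recursive --level=inf --page-requisites --convert-links --adjust-extension --no-parent --wait=0.3 --random-wait --limit-rate=512k -e robots=off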
>>
>>137193
General purpose fs'es do suck when you need to store millions of read-only, often small blobs.

>>137120
Fed it into my piece-of-shit crawler. Maybe it'll manage to save something.
>>
145 KB
86 GiB
2M files
2.2M urls crawled
1.8M urls in queue

why am I doing it
>>
>>136956
They use archive.org

