2019-07-20: We encourage users to help the talented Kyoto Animation staff recover from the recent horrific tragedy.
2019-08-19: New scrapers are in development, with promising results on /wsg/. On behalf of all archivers: unless Asagi is replaced, as 4chan grows we are all in grave danger under the strain of deep software inefficiency and unsustainable costs (like Fireden). More details here.
2019-09-02: The old frontend server had its issues fixed by the provider, but we may still want to move to our backend server to cut costs and reduce reliance on an increasingly overprovisioned host. We may be able to scrape more boards if funding and more efficient scrapers become available for expansion. Donations to our site would help in case of image storage failure.
Please refrain from spamming the ghostposting system or it may not be around for long.

## Developer No.3894
## TL;DR Regarding the Absolute State of 4chan Archival

At Desuarchive we have long struggled with many unsolved mysteries left by the previous admin (peace be upon his wrists), but we have now put the archiver on more stable footing, and there is at least some development going on with the scraper, so things are looking up, as you may have seen this past year.

It is imperative for the survival of all 4chan archivers that Java-based Asagi is replaced (especially given the downfall of Fireden) and significant efficiency improvements are made in both HTTP request volume and RAM usage, while providing the same reliability and accuracy. As 4chan grows, all archivers will be in grave danger of dying under the strain of deep software inefficiency and unsustainable costs if this is not done.

Archives are not going to be sustainable, as seen with Fireden, if one person has to shoulder the weight of thousands of dollars of equipment and bandwidth usage. The next archiver on the deathwatch appears to be Warosu.

On behalf of all 4chan archives, we need your help with the two scrapers being developed. These scrapers are being built Asagi-compatible as future drop-in replacements for all other archivers: no small feat, as Asagi regularly uses 40-60GB of RAM at full load, while these can use as little as 30-150MB.

https://github.com/bibanon/eve - Python-based scraper. We are actively testing it in production, scraping /wsg/.

https://github.com/bbepis/Hayden - C# .NET lightweight scraper; still needs testing for evaluation, but it is performing very well so far.

As such, we hope to build a brand new archival stack on these that dispenses with the inefficiencies of past scrapers, using PostgreSQL JSONB to store threads exactly as they come from the 4chan API (NoSQL style). While we are not frontend developers, we can sidestep this by building middleware that emits a 4chan-compatible API, so that 4chan-X can be used as the JavaScript webapp, and Android apps (Chanu, Clover) and iPhone apps could be modified with a few lines of code to work with the archive.
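As a rough illustration of the JSONB approach (the table and column names here are hypothetical sketches, not a published schema), a thread fetched from the 4chan API could be stored verbatim and served back unchanged by the middleware:

```python
import json

# Hypothetical schema for storing raw 4chan API threads. The names are
# illustrative only; a real deployment would pick its own layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS threads (
    board      TEXT    NOT NULL,
    thread_no  BIGINT  NOT NULL,
    data       JSONB   NOT NULL,   -- thread exactly as served by the API
    PRIMARY KEY (board, thread_no)
);
"""

def upsert_params(board: str, thread: dict):
    """Build an UPSERT for a thread dict in the 4chan API shape
    ({"posts": [{"no": ...}, ...]}); the first post is the OP."""
    sql = (
        "INSERT INTO threads (board, thread_no, data) "
        "VALUES (%s, %s, %s::jsonb) "
        "ON CONFLICT (board, thread_no) DO UPDATE SET data = EXCLUDED.data"
    )
    op_no = thread["posts"][0]["no"]
    return sql, (board, op_no, json.dumps(thread))
```

The appeal of this design is that the middleware can answer an API request with a single `SELECT data FROM threads WHERE board = %s AND thread_no = %s` and return the stored JSON verbatim, with no per-post reassembly.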

In support of research and onboarding for this effort, 4plebs has generously developed partial 4chan API compatibility for the FoolFuuka frontend, which is slowly being rolled out. This will allow Android and iPhone applications to view FoolFuuka archives (but not ghostpost yet). If you are a PHP developer, we need your help here.


They also developed 4plebs X, which uses 4chan-X to function as a webapp frontend, possibly able to utilize this 4chan API to replace the user-facing part of the PHP/HHVM FoolFuuka stack with a familiar alternative. It lacks search and ghostposting, but hopefully developers can step in there.


Demo: https://test.4plebs.org (disable 4chan-X while using it to avoid conflicts).

If you know any third-party 4chan app devs, please refer them to us so we can show them the proper configuration for their app to access FoolFuuka archives. (There was an old FoolFuuka API already, but it predates the 4chan API and is not directly compatible; best to move off it.)
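To illustrate the idea (the archive hostname below and the assumption that a compatible archive mirrors the official API's path scheme are both hypothetical), pointing an app at an archive would largely amount to swapping the API base URL:

```python
# Sketch of redirecting a third-party app's endpoint configuration to an
# archive that implements 4chan API compatibility. Real apps each have
# their own config format; this just shows the URL mapping involved.
API_PATHS = {
    "catalog": "/{board}/catalog.json",
    "thread":  "/{board}/thread/{no}.json",
}

def archive_url(archive_host: str, kind: str, **parts) -> str:
    """Build an API URL against the archive instead of 4chan itself."""
    return "https://" + archive_host + API_PATHS[kind].format(**parts)
```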

We are willing to provide support and troubleshooting to build better understanding of FoolFuuka/Asagi instances, whether for constructing new ones, developing replacement scrapers, or picking up Fireden's boards. We have institutional knowledge and experience from running many major archival websites over 2 years, so don't hesitate to drop by.



Our guide could use some work, but it will get you there with a few hiccups.


## Regarding the Absolute State of Fireden and /v/ and /vg/ archival

Fireden is infamous in the community for never reaching out for help or advice, and for never acting on anything other than abuse emails. I don't think they ever planned to operate this long: they were set up on a whim in 2015 after archive.moe died, so they probably just had enough, since it costs a lot to operate a site that can scrape /v/ and /vg/ images. But if the Fireden admin is reading this, be the prodigal son: we can provide any assistance or backup you need so that your hard work is not in vain.

The next archiver I expect to collapse under the pressure is Warosu. As for us, we are pretty stable after a $500 chassis upgrade and hot spare SSDs, but it really sucks to be one of the few people in the world who put a large amount of capital into 4chan archival.

We refuse to pump more money in to bail out any more archivers for barely any returns. We have already bailed out four of them, paying $7,000 to date out of pocket plus $200 a month. Can't someone else pony up?

The best bet is a large capital investment in arch.b4k.co so it can be significantly upgraded to our standards and match Fireden's coverage, providing /vg/ scraping and full images for both boards. It would actually not cost too much to start: maybe 5x10TB drives for $700, $300 for a new case, perhaps $600 for a new AMD Ryzen build with 160GB of RAM for Asagi and MySQL, and $100 for colocation, roughly $1,700 in total. And since we will probably never see the Fireden images again, that saves a lot of space.

4plebs refuses to take on any more boards as they are barely able to handle the ones they have.

## Basic Details about the Maintenance Done

This weekend we completed a major case upgrade, for $500, on our backend image server so it can host more services such as scrapers and frontend content. All SSDs were moved out of the internal bay into hotswap bays, and a hot spare boot SSD was added; without those, the server was difficult to service and hard to consider for hosting databases safely. It should be possible to attach at least 6 more 3.5" drives, which will be necessary as only 10TB of storage is currently available.

This may make it possible to halve our current cloud server and bandwidth costs by consolidating services onto a single server.

One drive with bad sectors was safely replaced for $150 and a ZFS resilver completed. The other drives do not appear to have issues, but we continue to monitor the situation.

Tests done with the bibanon/eve scraper on /wsg/ have been extremely promising, though development is still ongoing to bring it up to par with the Asagi scraper. Any new deployment of the scraper will likely use either this or Hayden, but proper testing will still be necessary.
46 posts and 1 image omitted

## Admin No.3026
Welcome to /desu/. Use this board to report issues, request features, and for other discussions regarding desuarchive.org & rbt.asia. Other posts will be removed.

When reporting a technical issue, be sure to include the full URL of the page/image.

Do not use this board for removal requests, which must be emailed to [email protected]. Other rule violations can be reported by clicking the "Report" button on the post.

No.3970
Is there any way I could download the entirety of the /mlp/ archives hosted here? I know it would be huge, but I want to know if there is a way.

No.3973
In the /m/ ghost posting there is a pair of shitposter posts that are not deleted, but my comment about how janitors deleting legit recommendations should be called out got deleted, and I am not able to ghost post in said thread. Is there any particular reason for this?

No.3961
What happened to RebeccaBlackTech? It has not loaded for about a week now.

Auto-fetching deleted posts

No.3869
How much of a toll would it take on the server if 4chan X started automatically checking Desuarchive's copy of a thread upon opening the thread to see if there were any deleted/ghost posts? Is there a good way to do this that would minimize the resource usage on Desu's end? Do you think it might be better to try to set up a separate web service that lists deleted posts?
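For concreteness, the check in question could be as simple as diffing post numbers between the archive's copy of a thread and the live one (both in the 4chan API shape); this sketch is purely illustrative and is not an existing Desuarchive endpoint or 4chan-X feature:

```python
def deleted_post_nos(live_thread: dict, archived_thread: dict) -> list:
    """Post numbers present in the archive's copy but missing from the
    live thread, i.e. posts deleted on 4chan. Both arguments are thread
    dicts in the 4chan API shape: {"posts": [{"no": ...}, ...]}."""
    live = {p["no"] for p in live_thread["posts"]}
    return [p["no"] for p in archived_thread["posts"] if p["no"] not in live]
```

The server-side cost of such a scheme would be one thread-JSON read per thread open, which caching could keep cheap; all the comparison work happens client-side.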

No.3944
>2019-08-26: The old frontend server is experiencing abysmal performance due to provider's resource overprovisioning and possibly excess resource consumption (maybe we are getting scraped?), we may have to cancel it and move to our backend server.
I'm concerned that this might be me. I'm scraping a bunch of images sometimes with a scraper I wrote.
I downloaded maybe 10GB of images over the last few months, but it happens in bursts.
I put a delay after each download so that I don't cause too much strain, and try not to download the same image multiple times, though this has happened in the past before when the scraper was shitty.
Can you tell me about the issues you are experiencing, so I know whether they were caused by me and what I can do about it?
You can go technical about it, or better yet, list a few search queries or filenames that caused it.

Recurring donations?

No.3932
I'd much prefer automatically donating $10/month over manually donating larger amounts irregularly. I've heard most nonprofits prefer taking small recurring donations as well, for budgeting reasons. Any plans to support this?

!aNRFNkpkG. No.3933
Is anyone else having issues with 4chan right now? For some reason, messages are taking ages to post.

No.3941
If you search for deleted ghost posts on /a/ you actually find results, how is this possible? I thought a deleted ghost post remained permanently deleted with no way of accessing it.

the server is stable but the software is at its final limits

## Developer No.3366
> TL;DR:
* EDIT: 2018/08/20: Frontend server provider's SSD inodes fucked up, but we restored from backup after weeks of painstaking reinstallation.
* We used the search server to rescrape images and posts in between then and now, still importing. Unfortunately this means the search server cannot provide search for a few weeks until everything moves back.

* (Backend) The hard drives and server are physically fine and did not face any issues.
* But FoolFuuka and Asagi have reached their utter limits. Not to mention Cloudflare on 4chan itself limits the number of requests that can be made, which makes scraping extremely difficult (the same problem as before).
* As such we have had to pause archival of gif and wsg images or face an inability to scrape all images.
* The main admin was incapacitated the past 2 months and could only respond to issues in the last month intermittently. Need a successor.
* Demonstrate your contribution: help us fix Asagi or develop a replacement. As 4chan's volume grows and the amount of posts held reaches Big Data levels, the entire 4chan archival community is reaching the limits of this decade old software.

If you can help, contact us directly in the bridged channels: irc.rizon.net #bibanon, or our Matrix/Riot.im channel https://riot.im/app/#/room/#bibanon-chat:matrix.org

## what is up

Since the primary admin is essentially incapacitated due to his brand new soul crushing job and recently a burst eardrum, there was no consensus on making an announcement, but as the hardware provider I feel it is time for the public to know.

Last year, due to our size, we were the first archiver to face and report issues with Cloudflare's aggressive anti-bot protections on 4chan. You can read a full discussion about that in the thread below. There is a workaround that we have shared with the other living 4chan archivers, but the fact remains that for every 4chan archiver (not just Desuarchive), the situation is not fully resolved, and it limits how much can be archived from one node.


In the past month there were two incidents where the site went down without restarting. Due to the primary admin's brand new soul crushing job, he was unable to respond to notifications for a week. When he returned, we worked out a method to recover the missed images from 4chan's archives.json, significantly mitigating the loss. But without this admin we remain understaffed.

Without his expertise, in order to reduce dropped images on the other boards we have had to give up archiving /gif/ and /wsg/ images a few weeks ago to save enough requests for the other boards. As such Desuarchive slogs on for now.

## know the stakes

We are not alone. Every archiver of this size will either meet this issue with Cloudflare, die under the strain of scaling up, or drop boards to keep running. Desuarchive, by virtue of being the largest, holding threads from archive.moe and Foolz, and hosting the highest-volume boards that keep growing, is the canary in the coal mine for the whole community.

To those who look down at how things pull together here, look at our colleagues and predecessors. Every single time they met a scalability or cost issue like this, they quit and deleted their archiver. RebeccaBlackTech had already abandoned multiple boards and struggled with the unmaintained, unoptimized fuuka engine, and was about to delete the site until we gave them a hand. Archive.moe and Foolz did much worse, actually losing or even deleting previously archived data. Loveisover already died and deleted everything due to its failure to handle expansion. 4plebs pulls together alright, but that's because they choose not to expand to new boards. You can check out the page we wrote that is a literal graveyard for dead archivers.


The fact is, FoolFuuka and Asagi let them down with their poor design and horrendously inefficient resource usage. 4chan grew too much for them. We are one of the last living because we have invested $6,000 of our own money in hardware, countless hours shoring up FoolFuuka and Asagi, and $200 a month maintaining this site. I think we have received maybe $100 in donations to date. So who has a stake in our success? It matters not; one thing you can be sure of is our tenacity. We have done this for 3 years and more already, and we will continue onward, support or no support.

## do it yourself???

If you are not satisfied we challenge you to run an archiver. Let us reiterate:

If you'd like to start a 4chan archiver of your own, just read our guide to set up FoolFuuka along with Asagi: https://wiki.bibanon.org/FoolFuuka

* FoolFuuka/Asagi is very RAM hungry. Unless it is optimized, archival will continue to be as expensive and unsustainable as it is today.
* For a server that supports a publicly viewable thumbs only archival of all boards, it will require 64GB of RAM, a decent CPU after Sandy Bridge, and at least 500GB for thumbs (to hold all thumbs released on the Internet Archive from Archive.moe, 4plebs, or such).
* For Desuarchive, 20-40TB of space is necessary just to hold its full images to date (not even counting those from the Archive.moe dump).

Don't be surprised if you face the same travails that we and our many predecessors have. But don't hesitate to ask us for assistance either, because we have long experience with setting up these systems. We are one of the last, but most dedicated support groups for this crumbling, aging piece of software.

Or maybe we can develop an alternative solution so that no more admins will have to suffer the horrors of Asagi again. Perhaps we will call the successor engine and framework Ayase.

## developers and sysadmins halp

Maybe we as anons are poor in money but rich in time.

This is why I call on all who can to help us improve or replace the Asagi scraper engine. This is not just for our own good: the entire 4chan archiver community is at stake. This is an ingenious way for you to show how much you care and prove your stake.

**Unfortunately, as the primary admin is incapacitated I lack the expertise in the software to fully explain the issues.**

But in short, there is effectively a limit on how many requests can be made to 4chan from one node. Asagi itself also has a resource limit whereby it spirals out of control and crashes during times of high load on 4chan. Finally, the moment Cloudflare itself drops 503 blocks, nothing can be done except to restart the archiver or reduce the number of boards being scraped. From what I understand, Asagi is too inflexible to run more than one instance against one database, so entirely new software will be necessary to support multiple scraping nodes.

A new engine for archiving 4chan at scale must be developed. It will require the ability to asynchronously scrape threads and images without consuming too much idle CPU and RAM. It will need to run on multiple nodes while reporting to one MySQL database.
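The asynchronous, throttled design described above can be sketched roughly as follows. This is a minimal illustration, not any engine's actual code: the HTTP call is stubbed out (a real scraper would use an HTTP client and write results to the shared database), and the names and concurrency cap are made up for the example.

```python
import asyncio

MAX_CONCURRENT = 4  # illustrative per-node request cap

async def fetch_json(path: str) -> dict:
    """Stub for an HTTP GET + JSON decode; returns a placeholder dict."""
    await asyncio.sleep(0)
    return {"path": path}

async def scrape_thread(sem: asyncio.Semaphore, board: str, no: int) -> dict:
    # The semaphore keeps at most MAX_CONCURRENT requests in flight,
    # so one node stays under the per-node request limit.
    async with sem:
        return await fetch_json(f"/{board}/thread/{no}.json")

async def scrape_board(board: str, thread_nos: list) -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    tasks = [scrape_thread(sem, board, no) for no in thread_nos]
    # gather preserves input order; results would go to the shared DB.
    return await asyncio.gather(*tasks)

results = asyncio.run(scrape_board("wsg", [101, 102, 103]))
```

Because all waiting happens inside the event loop, idle threads cost almost nothing in CPU or RAM, unlike a thread-per-board design; multiple such nodes could then each scrape a disjoint set of boards against one database.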

## why no public statement

Know that I have been reading your posts on this board; simply due to lack of consensus while the primary admin was incapacitated, I was not able to respond. Apologies. But let's be real: for most server issues there is really not much that debating with the crowd can solve. I can count on my two hands the number of people in the world who are qualified to operate FoolFuuka and Asagi at scale.

But this moratorium ends now. If you are interested in assisting us, send a pull request or drop into our channels at these communication points.

If you are interested in becoming a volunteer sysadmin for Desuarchive, stop by the channel. You must prove that you 1. have at least 2 hours per day of free time to volunteer and 2. have comfortable experience with most of the following:

* Utilizing command-line Linux
* Experience setting up virtual private servers, particularly LXC containers
* Setting up, partitioning, and recovering a ZFS RAID (without GUIs or wizards)
* Building Nginx webserver configurations
* Experience setting up a functioning instance of FoolFuuka and Asagi **strongly recommended**
* Tuning MySQL databases with Percona, TokuDB, and other optimizations
* Setting up Sphinxsearch at scale with multiple nodes
* Setting up node.js and Java instances and optimizing to keep resource usage low

This post was modified by Desuarchive Administrator on 2018-08-21
225 posts and 9 images omitted