
> TL;DR:

This post was modified by Desuarchive Administrator on 2018-08-21

* EDIT 2018/08/20: Our frontend server provider's SSD inodes got fucked up, but we restored from backup after weeks of painstaking reinstallation. We used the search server to rescrape the images and posts made in the interim, and the import is still in progress. Unfortunately, this means the search server cannot provide search for a few weeks until everything moves back.
* The backend hard drives and server are physically fine and did not face any issues.
* However, FoolFuuka and Asagi have reached their utter limits. On top of that, Cloudflare on 4chan itself caps the number of requests that can be made, which makes scraping extremely difficult (the same problem as before).
* As a result, we have had to pause archival of /gif/ and /wsg/ images or face an inability to scrape all images.
* The main admin was incapacitated for the past 2 months and could only respond to issues intermittently in the last month. We need a successor.
* Demonstrate your contribution: help us fix Asagi or develop a replacement. As 4chan's volume grows and the number of posts held reaches Big Data levels, the entire 4chan archival community is hitting the limits of this decade-old software.

If you can help, contact us directly in the bridged channels: irc.rizon.net #bibanon, or our Matrix/Riot.im channel https://riot.im/app/#/room/#bibanon-chat:matrix.org

## what is up

Since the primary admin is essentially incapacitated due to his brand new soul-crushing job and, recently, a burst eardrum, there was no consensus on making an announcement; but as the hardware provider I feel it is time for the public to know.

Last year, due to our size, we were the first archiver to face and report the issues with Cloudflare's aggressive anti-bot protections on 4chan. You can read a full discussion about that in the thread below.
There is a workaround that we have shared with the other surviving 4chan archivers, but the fact remains that for every 4chan archiver (not just Desuarchive), the situation is not fully resolved, and it caps how much can be archived from one node.

In the past month there were two incidents where the site went down without restarting. Due to the primary admin's brand new soul-crushing job, he was unable to respond to notifications for a week. When he returned, we worked out a method to recover the missed images from 4chan's archive.json, significantly mitigating the loss. But without this admin we remain understaffed.

Without his expertise, in order to reduce dropped images on the other boards, we had to give up archiving /gif/ and /wsg/ images a few weeks ago to save enough requests for the other boards. And so Desuarchive slogs on for now.

## know the stakes

We are not alone. Every archiver of this size has either met this issue with Cloudflare, died under the strain of scaling up, or dropped boards to keep running. Desuarchive, by virtue of being the largest, holding threads from Archive.moe and Foolz, and hosting the highest-volume boards that keep growing, is the canary in the coal mine for the whole community.

To those who look down on how things are pulled together here: look at our colleagues and predecessors. Every single time they met a scalability or cost issue like this, they quit and deleted their archiver. RebeccaBlackTech had already abandoned multiple boards, struggled with the Fuuka engine with no updates or optimization, and was about to delete the site until we gave them a hand. Archive.moe and Foolz did much worse and actually lost, or even deleted, previously archived data. Loveisover already died and deleted everything due to their failure to handle expansion. 4plebs pulls together alright, but that's because they choose not to expand to new boards.
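The archive.json recovery method mentioned above can be sketched roughly as follows: 4chan's read-only JSON API exposes each board's internal archive as a list of thread numbers, and diffing that list against the thread numbers already in the database yields the threads that still need rescraping. This is a minimal illustration, not Desuarchive's actual tooling; the function names `fetch_archive` and `find_missing` are hypothetical.

```python
import json
import urllib.request


def fetch_archive(board: str) -> list[int]:
    """Fetch the thread numbers currently in a board's internal archive.

    4chan's public JSON API serves this at
    https://a.4cdn.org/{board}/archive.json
    """
    url = f"https://a.4cdn.org/{board}/archive.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_missing(archived: list[int], known: set[int]) -> list[int]:
    """Return archived thread numbers we have not yet scraped,
    oldest first, so recovery starts with the threads closest
    to falling off the archive."""
    return sorted(t for t in archived if t not in known)


# Example: threads 102 and 105 were missed during the outage.
missing = find_missing([101, 102, 103, 105], known={101, 103})
print(missing)  # [102, 105]
```

Because archive.json only lists thread numbers, each recovered thread still costs one request to `https://a.4cdn.org/{board}/thread/{no}.json` plus its image fetches, so recovery has to fit inside the same request budget as normal scraping.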
You can check out the page we wrote that is a literal graveyard of dead archivers.

The fact is, FoolFuuka and Asagi let them down with their poor design and horrendously inefficient resource usage. 4chan grew too much for them. We are one of the last still living because we invested $6000 of our own money in hardware, countless hours shoring up FoolFuuka and Asagi, and $200 a month maintaining this site. I think we have received maybe $100 in donations to date. So who has a stake in our success? It matters not; one thing you can be sure of is our tenacity. We have done this for over 3 years already, and we will continue onward, support or no support.

## do it yourself???

If you are not satisfied, we challenge you to run an archiver. Let us reiterate: if you'd like to start a 4chan archiver of your own, just read our guide to setting up FoolFuuka along with Asagi: https://wiki.bibanon.org/FoolFuuka

* FoolFuuka/Asagi is very RAM hungry. Unless it is optimized, archival will continue to be as expensive and unsustainable as it is today.
* A server supporting a publicly viewable, thumbs-only archive of all boards requires 64GB of RAM, a decent CPU (Sandy Bridge or later), and at least 500GB of storage for thumbnails (to hold all thumbs released on the Internet Archive from Archive.moe, 4plebs, and the like).
* For Desuarchive, 20-40TB of space is necessary just to hold its full images to date (not even counting those from the Archive.moe dump).

Don't be surprised if you face the same travails that we and our many predecessors have. But don't hesitate to ask us for assistance either, because we have long experience setting up these systems. We are one of the last, but most dedicated, support groups for this crumbling, aging piece of software.

Or maybe we can develop an alternative solution so that no more admins will have to suffer the horrors of Asagi again.
Perhaps we will call the successor engine and framework Ayase.

## developers and sysadmins halp

Maybe we as anons are poor in money and rich in time. This is why I call on all who can to help us improve or replace the Asagi scraper engine. This is not just for our own good; the entire 4chan archiver community is at stake. This is an ingenious way for you to show how much you care, to prove your stake.

**Unfortunately, as the primary admin is incapacitated, I lack the expertise in the software to fully explain the issues.**

But in short: there is effectively a limit on how many requests can be made to 4chan from one node. Asagi itself also has a resource limit, beyond which it spirals out of control and crashes during times of high load from 4chan. Finally, the moment Cloudflare itself starts issuing 503 blocks, nothing can be done except to restart the archiver or reduce the number of boards being scraped. From what I understand, Asagi is too inflexible to run more than one instance against one database, so entirely new software will be necessary to support multiple scraping nodes.

A new engine for archiving 4chan at scale must be developed. It will need to asynchronously scrape threads and images without consuming too much idle CPU and RAM, and it will need to run on multiple nodes while reporting to one MySQL database.

## why no public statement

Know that I have been reading your posts on this board; simply due to lack of consensus while the primary admin was incapacitated, I was not able to respond. Apologies. But let's be real: for most server issues, there is really not much that debating with the crowd can solve. I can count on my two hands the number of people in the world who are qualified to operate FoolFuuka and Asagi at scale.

But this moratorium ends now.
If you are interested in assisting us, drop in a pull request or drop into our channel at the communication points listed above.

If you are interested in becoming a volunteer sysadmin for Desuarchive, stop by the channel. You must prove that you (1) have at least 2 hours per day of free time to volunteer, and (2) have comfortable experience with most of the following:

* Using command-line Linux
* Setting up virtual private servers, particularly LXC containers
* Setting up, partitioning, and recovering a ZFS RAID (without GUIs or wizards)
* Building Nginx webserver configurations
* Setting up a functioning instance of FoolFuuka and Asagi (**strongly recommended**)
* Tuning MySQL databases with Percona, TokuDB, and other optimizations
* Setting up Sphinxsearch at scale with multiple nodes
* Setting up Node.js and Java instances and optimizing them to keep resource usage low