Decensor 2019-06-20

For a while I was a "data minimalist". In time, I've found that holding data, and its near-synonym, state, can be quite valuable. While I like to cough at the idea of state (oh, do I ever hate state), sometimes... it's necessary.

I'd been wondering about hosting images on this blog and eventually went with an IPFS-based solution, which mostly works but is flaky. That isn't entirely IPFS's fault; it's largely because my servers get replaced every week, at least the two clearnet servers and two hidden servers that host this blog. That's probably not something IPFS was designed around.

IPFS is nice because you can add the same file from two machines on two different planets and get the same hash. That matters because things get censored all the time: videos and images that merely display facts get taken down. Unfortunately, the hash is not that easy to calculate. It's not just the multihash of the file's bytes that you'd expect, because IPFS wraps the content in its own structure (and splits larger files into chunks) before hashing.

I have also wanted to have a "war chest" of memes, facts, screenshots, etc., both because they get lost over time (sli.mg dying, imgtc losing images in the past, etc.) and because I wanted a good way to archive them. I could just link to them, but with storage being so cheap, why not keep copies? And here's the thing about offline data: it's more private than even double Tor. No one knows you're accessing that file, how many times, where your cursor is on it, etc. There's no traffic on the wire to analyze. So it wins on privacy, speed, reliability, and censorship resistance.

One thing I've realized is that naming files by their checksum is a good idea. The names are long and cumbersome, but they tell you exactly what was there. If the file ever turns up again, the name can serve as proof of what was there.

Example

Alice shares http://somedomain.notld/objectionablememe.png on Twitter.

somedomain.notld gets a takedown request. The file can be replaced, with no obvious way to tell that happened, or (more likely) deleted. What if Bob wants to know what was deleted?

Alice could have shared http://somedomain.notld/57d51a0f4656de16a23233698cef58aa98d1e5d0516e8a796e5881c81f43563c instead. Now Bob knows that 57d51a0f4656de16a23233698cef58aa98d1e5d0516e8a796e5881c81f43563c should be the SHA-256 checksum of the object.

Bob can now search for 57d51a0f4656de16a23233698cef58aa98d1e5d0516e8a796e5881c81f43563c on any search engine and may come up with a result. If he does, he should check that the file's SHA-256 checksum matches its filename. If it does, either someone really likes long 64-character hexadecimal filenames or that's the image Alice was talking about.
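Bob's check is easy to script. Here's a minimal sketch, assuming GNU coreutils' sha256sum (on BSD, sha256 -q FILE prints just the hash). The verify_named helper is a hypothetical name for illustration, not something from Decensor.

```shell
# verify_named FILE: succeed only if FILE's name is the SHA-256 of its contents.
# Hypothetical helper; assumes GNU coreutils sha256sum.
verify_named() {
  expected=$(basename "$1")
  # Hash from stdin so the output is "<hash>  -" with no filename escaping.
  actual=$(sha256sum < "$1" | cut -d ' ' -f 1)
  [ "$expected" = "$actual" ]
}

# Usage: name a downloaded file after its own checksum, then check it.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/download"
hash=$(sha256sum < "$dir/download" | cut -d ' ' -f 1)
mv "$dir/download" "$dir/$hash"
if verify_named "$dir/$hash"; then echo "checksum matches filename"; fi
```

A file renamed to something like meme.png, or one whose contents were swapped out, fails the check.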

This helps with replication as well. Anyone can see that they have the same meme. If people store their memes on disk by checksum, or an image host does, there are no collisions (or so I hope) and no duplicates. You can also see how many other sites have the same copy, because not everyone is inventing their own "asdiad.png" or "otioai.png" name for the same image. Heck, one person might have several copies of the same file under different names. Waste of disk space.
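Storing by checksum is what makes duplicates collapse: the same bytes always get the same name. A minimal sketch, assuming GNU coreutils' sha256sum; store_by_hash and the single flat store directory are hypothetical, not Decensor's actual layout.

```shell
# store_by_hash STORE FILE...: copy each FILE into STORE, named by its SHA-256.
# Identical content maps to the same name, so duplicates are skipped.
# Hypothetical helper; assumes GNU coreutils sha256sum.
store_by_hash() {
  store=$1
  shift
  mkdir -p "$store"
  for f in "$@"; do
    h=$(sha256sum < "$f" | cut -d ' ' -f 1)
    if [ -e "$store/$h" ]; then
      echo "duplicate: $f is already $h"
    else
      cp "$f" "$store/$h"
      echo "stored: $f as $h"
    fi
  done
}

# Usage: the same image saved under two different names.
tmp=$(mktemp -d)
printf 'same pixels' > "$tmp/asdiad.png"
cp "$tmp/asdiad.png" "$tmp/otioai.png"
store_by_hash "$tmp/store" "$tmp/asdiad.png" "$tmp/otioai.png"
ls "$tmp/store"
```

Only one copy lands in the store, and the second file is reported as a duplicate.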

Tagging

You can have a bunch of files and have to sift through all of them to find the relevant one. Let's say you're 100 replies into a 4chan thread and you've got the perfect reply. But, wow, is it ever hard to find in a folder with 10,000 images!

So maybe you categorize by folder. That helps a bit, but a meme usually fits several categories. So maybe you do folders and symlinks, and that kind of works. My point is that tagging can be useful.
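The folders-and-symlinks approach can look something like this: keep each file once, named by its checksum, and make every tag a directory of symlinks pointing back at it. This is a minimal sketch; add_tags, with_tags, and the store/tags layout are hypothetical, not how Decensor itself stores tags.

```shell
# add_tags STORE TAGDIR HASH TAG...: link STORE/HASH into TAGDIR/TAG/ for
# each TAG, so one file can sit in many "folders" without being copied.
add_tags() {
  store=$(cd "$1" && pwd)   # absolute path so the symlinks resolve from anywhere
  tagdir=$2
  hash=$3
  shift 3
  for t in "$@"; do
    mkdir -p "$tagdir/$t"
    ln -sf "$store/$hash" "$tagdir/$t/$hash"
  done
}

# with_tags TAGDIR TAG...: print the hashes that carry every listed TAG.
with_tags() {
  tagdir=$1
  first=$2
  shift 2
  for link in "$tagdir/$first"/*; do
    [ -L "$link" ] || continue   # skip the unexpanded glob if the tag is empty
    h=$(basename "$link")
    ok=1
    for t in "$@"; do
      [ -L "$tagdir/$t/$h" ] || ok=0
    done
    if [ "$ok" -eq 1 ]; then echo "$h"; fi
  done
}
```

Listing tags/funny shows everything tagged funny, while with_tags tags funny politics narrows it down to files carrying both tags.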

Solution

Now this is my solution, for now. It's probably going to evolve and change a lot. Formats may change, and URL formats as well. I don't know. It's pretty fresh off the press. It's written in Go, which I'm not nearly as good at as Python, so there are definitely some issues.

I decided to write a tool that you can use on the command line and in a browser (read-only). This lets me share my "war chest" with anyone else. In the future I'd like things to be easy to export. I may open up rsyncd so anyone can sync the whole thing easily and merge it with theirs. Of course, different people will have different tag naming conventions, etc. And tagging in general can be tricky: you want to be specific, but not too specific.

Intent

I obviously have beliefs and ideas that lean in a certain direction. Not entirely, of course. If I'm angering people on both sides of the aisle with what I'm saying, I'm probably either extremely wrong or extremely correct.

My goal is truth. I don't want to spread things that aren't true. If you can refute something I have, especially if it was entirely fake, please tell me. I don't care if it supports my point; if it's false, it's not the truth. Maybe in time I'll fall over to another side because in the end it makes sense, I don't know. I just want to share reality so you can make the most of it. Now, sometimes more information can make people less happy. Happiness isn't everything, but after enough blackpills you're feeling pretty sour about the world. It's a tricky balance, because you don't want to be in la-la land and you don't want to think things are worse than they are.

In action

You can see Decensor in action on this website. It should work over clearnet or Tor. Yes, it's super ugly and could be polished a lot. It's pretty easy to use and to add stuff to.

Here's the source on GitHub.

As of 2020-08-23 I have switched away from Decensor. It was cool, worked well, and I liked the interface.

But, I wanted to have my website completely static, for various reasons. I looked into making Decensor generate a static set of pages (like Hugo) and did make some progress but ended up not going down that route. I like the simple route.

Basically, I moved my assets into /files/ on this domain. When I reference one, I append #sha256=(hash) to the link. This gives a cryptographic reference in source control and something I can audit. I also find having real filenames, instead of just a hash, quite useful. Sure, I miss the tagging, and maybe I can do that out-of-band.
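Auditing those #sha256= references can be scripted. A minimal sketch, assuming links of the form files/NAME#sha256=HASH (relative prefixes like ../ are fine), GNU coreutils' sha256sum, and asset names without whitespace, '#', quotes, or parentheses. check_links is a hypothetical helper, not something from the actual site repository.

```shell
# check_links SITE_ROOT FILE...: find "files/NAME#sha256=HASH" references in
# each FILE and report any whose target is missing or no longer hashes to HASH.
# Hypothetical helper; assumes GNU coreutils sha256sum and asset names
# without whitespace, '#', quotes, or parentheses.
check_links() {
  root=$1
  shift
  bad=$(grep -ho 'files/[^#[:space:]"()]*#sha256=[0-9a-f]\{64\}' "$@" | sort -u |
    while IFS= read -r ref; do
      name=${ref%%#*}        # files/NAME
      expected=${ref##*=}    # HASH
      if [ ! -f "$root/$name" ]; then
        echo "missing: $ref"
      elif [ "$(sha256sum < "$root/$name" | cut -d ' ' -f 1)" != "$expected" ]; then
        echo "mismatch: $ref"
      fi
    done)
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 1
  fi
}
```

Something like check_links . content/*.md, run from a checkout of the site, prints nothing and exits 0 when every referenced asset still matches its hash.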

This, along with removing Rantbin (which I have mixed feelings about), makes for a completely static site that looks the same off a USB stick as it does on go-beyond.org or potato...whatever.onion. Every path is relative.

The assets are also tracked with rhash, so each file's sha256 is listed in an index (which can be found by search engines and such). The hashes file is PGP signed as well.
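Here's roughly what a signed hash index involves. This sketch uses GNU coreutils' sha256sum as a stand-in for rhash, since the exact rhash invocation and index format aren't shown here; make_index, verify_index, and the SHA256SUMS filename are assumptions for illustration.

```shell
# make_index SITE_ROOT: hash every file under SITE_ROOT/files into
# SITE_ROOT/SHA256SUMS, one "<hash>  files/<name>" line per file.
# Stand-in using GNU coreutils sha256sum instead of rhash.
make_index() {
  (cd "$1" && find files -type f -exec sha256sum {} + | sort -k 2 > SHA256SUMS)
}

# verify_index SITE_ROOT: re-hash every listed file; non-zero exit on any mismatch.
verify_index() {
  (cd "$1" && sha256sum -c --quiet SHA256SUMS)
}

# Signing and checking the index with GnuPG would then be:
#   gpg --clearsign SHA256SUMS        # writes SHA256SUMS.asc
#   gpg --verify SHA256SUMS.asc
```

Because the index lists plain hex hashes next to filenames, a search engine can pick up the hashes, and anyone holding a copy can check it against the signed list.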

I'll have to write more on this later.

See the changes here if you're interested.

Code snippets I used for the conversion

I have to apologize: I added this section a while after I wrote the one-liners, so my memory of them is foggy.

Copy assets according to their filename in metadata.

for asset in $(grep -o -r '/decensor/asset/[0-9a-f]*' . | awk -F/ '{print $NF}' | sort -u); do cp /srv/files/decensor/assets/$asset /srv/files/terancorp/websites/go-beyond.org/files/"$(cat /srv/files/decensor/metadata/$asset/filename | tr -d '\n')"; done

I keep rereading this one and I'm not entirely sure what I did with it. Maybe there was another script I wrote to do it. It's certainly rewriting the links, but I'm not sure whether something clever is getting me the hash or whether I properly tagged it in another step.

find files/ -type f -exec sha256 {} \; | while read -r line; do find /srv/files/terancorp/code/go-beyond.org/content/ -type f -name '*.md' -exec sed "s|$(echo $line | awk '{print "/decensor/asset/"$NF}';)|/$(echo "$line" | grep -o '(.*)' | tr -d '()')|g" {} \; ; done | grep /files/
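As far as I can tell from the one-liner itself, the hash comes straight from BSD sha256, which prints lines like SHA256 (files/name.png) = <hash>. awk's $NF is the hash (which, if Decensor named assets by their SHA-256 as described above, is also the old asset ID), and the part in parentheses is the new path. On a system with GNU coreutils, where sha256sum prints <hash>  <path> instead, a sketch of the same rewrite might look like this. Unlike the original, which only prints matching lines (sed without -i, piped into grep), this edits files in place, so try it on a copy first. rewrite_links is a hypothetical name.

```shell
# rewrite_links CONTENT_DIR: run from the site root. For each file under
# files/, replace "/decensor/asset/<its sha256>" with "/files/<name>" in every
# Markdown file under CONTENT_DIR, editing in place.
# Assumes GNU sha256sum and GNU sed (-i with no backup suffix), and asset
# names without '|', '&', backslashes, or newlines, which would confuse sed.
rewrite_links() {
  content=$1
  find files -type f -exec sha256sum {} + | while read -r hash path; do
    find "$content" -type f -name '*.md' -exec sed -i "s|/decensor/asset/$hash|/$path|g" {} +
  done
}
```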