Organizing Data Through the Lens of Deduplication
03 Aug 2020 · 5 min read · shared on Lobsters, Reddit, and Twitter

Our home file server has been running since 2008, and over the last 12 years, it has accumulated more than 4 TB of data. The storage is shared between four people, and it tends to get disorganized over time. We also had a problem with duplicated data (over 500 GB of wasted space), an issue that is intertwined with disorganization. I wanted to solve both of these problems at once, and without losing any of our data. Existing tools didn’t work the way I wanted, so I wrote Periscope to help me clean up our file server.

Periscope works differently from most other duplicate file finders. It’s designed to be used interactively to explore the filesystem, understand which files are duplicated and where duplicates live, and safely delete duplicates, all without losing any data. Periscope enables exploring the filesystem with standard tools (the shell, and commands like cd, ls, tree, and so on) while providing additional duplicate-aware commands that mirror core filesystem utilities. For example, psc ls gives a directory listing that highlights duplicates, and psc rm deletes files only if a duplicate copy exists.

Here is Periscope in action on a demo dataset:

Continue reading →
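The "delete only if a duplicate copy exists" rule described for psc rm can be illustrated with a minimal sketch. This is not Periscope's implementation or interface: the function names (file_digest, find_duplicate, safe_rm), the use of SHA-256, and the directory walk are all assumptions made for illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate(path, search_root):
    """Return another file under search_root with identical content, or None.

    Hypothetical helper: compares sizes first, then SHA-256 digests.
    """
    size = os.path.getsize(path)
    digest = None  # computed lazily, only if a same-size candidate shows up
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            other = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(other) or not os.path.isfile(other):
                continue
            # Skip the file itself (and hard links to it, to stay conservative).
            if os.path.samefile(path, other):
                continue
            if os.path.getsize(other) != size:
                continue
            if digest is None:
                digest = file_digest(path)
            if file_digest(other) == digest:
                return other
    return None


def safe_rm(path, search_root):
    """Delete path only if a duplicate exists elsewhere under search_root.

    Returns True if the file was deleted, False if it was kept.
    """
    if find_duplicate(path, search_root) is None:
        return False
    os.remove(path)
    return True
```

With two identical files a.txt and b.txt in a directory, safe_rm on a.txt would delete it, while a subsequent safe_rm on b.txt would refuse, because b.txt is then the last remaining copy.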

Experiments in Constraint-based Graphic Design
12 Dec 2019 · 16 min read · shared on Hacker News, Lobsters, Reddit, and Twitter

Standard GUI-based graphic design tools only support a limited “snap to guides” style of positioning, have a basic object grouping system, and implement primitive functionality for aligning or distributing objects. They don’t have a way of remembering constraints and relationships between objects, and they don’t have ways of defining and reusing abstractions. I’ve been dissatisfied with existing tools for design, in particular for creating figures and diagrams, so I’ve been working on a new system called Basalt that matches the way I think: in terms of relationships and abstractions.

Basalt is implemented as a domain-specific language (DSL), and it’s quite different from GUI-based design tools like Illustrator and Keynote. It’s also pretty different from libraries/languages like D3.js, TikZ, and diagrams. At its core, Basalt is based on constraints: the designer specifies figures in terms of relationships, which compile down to constraints that are solved automatically using an SMT solver to produce the final output. This allows the designer to specify drawings in terms of relationships like “these objects are distributed horizontally, with a 1:2:3 ratio of space between them.” Constraints are also a key aspect of how Basalt supports abstraction, because constraints compose nicely.

I’ve been experimenting with this concept, off and on, for the last couple of years. Basalt is far from complete, but the exploration has yielded some interesting results already. The prototype is usable enough that I made all the figures in my latest research paper and presentation with it.

Continue reading →
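To make the "distributed horizontally, with a 1:2:3 ratio of space between them" relationship concrete, here is a small worked sketch. It is not Basalt's DSL and it does not use an SMT solver: it solves this one constraint directly with arithmetic, and the function name distribute_horizontally and its parameters are hypothetical. The constraint is that the object widths plus the gaps fill the container exactly, with each gap equal to a common unit times its ratio.

```python
def distribute_horizontally(left, right, widths, gap_ratios):
    """Return the left x-coordinate of each object.

    Objects are placed edge to edge from `left` to `right`, and the gaps
    between consecutive objects follow `gap_ratios` (e.g. [1, 2, 3]).
    Hypothetical helper for illustration only.
    """
    if len(widths) < 2 or len(gap_ratios) != len(widths) - 1:
        raise ValueError("need n >= 2 objects and exactly n - 1 gap ratios")
    if any(r <= 0 for r in gap_ratios):
        raise ValueError("gap ratios must be positive")

    # Constraint: sum(widths) + unit * sum(gap_ratios) == right - left
    free_space = (right - left) - sum(widths)
    if free_space < 0:
        raise ValueError("objects do not fit in the container")
    unit = free_space / sum(gap_ratios)

    positions = []
    x = left
    for i, width in enumerate(widths):
        positions.append(x)
        x += width
        if i < len(gap_ratios):
            x += unit * gap_ratios[i]
    return positions


# Four objects of width 10 in a container spanning 0..100:
# free space is 60, so the gaps are 10, 20, and 30.
print(distribute_horizontally(0, 100, [10, 10, 10, 10], [1, 2, 3]))
# prints [0, 20.0, 50.0, 90.0]
```

A general constraint solver becomes useful once several such relationships interact (alignment, containment, relative sizing), since they can no longer be solved one at a time in a fixed order.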