writing is migrating to plain-text with light-markup -- such as markdown, restructured-text, and asciidoc -- most especially among developers and tech people, but also authors, and soon everyone, i guarantee it.

in addition to its biggest benefit, the great flexibility, and the appeal of static blogs, a plain-text workflow also makes a promise for convenient version-control. which makes most techies think of github, am i right?

now, anything more than convenient diffs is overkill for most writers, but since when have techies been able to resist overkill when it involves something complicated? some of them prefer to use a screwdriver to pound nails.

ergo, there are constant calls to use github for prose. but this drumbeat ignores one very fundamental fact. actually, it ignores a dozen; but let’s focus now on one: github diff routines just don’t work well for prose.

and even when the algorithms do work acceptably well, the display of your diffs is often “less than ideal”…

indeed, it was only in december of 2013 that paragraphs on github prose diffs started to be wrapped; before then, i guess you needed to do horizontal scrolling to see them. (and i think everyone agrees the horizontal-scroll is evil.)

> https://github.com/blog/1707-soft-wrapping-on-prose-diffs

it was another 9 months before side-by-side diffs and, more importantly, word-highlighted diffs were introduced.

> https://github.com/blog/1884-introducing-split-diffs

> https://github.com/blog/1885-better-word-highlighting-in-diffs

these huge advances came right after github put its t-shirts on sale, which might tell us about the priority of prose.

> https://github.com/blog/1883-new-in-the-shop-github-flow-shirts

but even now, with those improvements, there are still flaws with the display of prose diffs in github. if you add a few words inside a paragraph, it can often cause the words which follow to misalign, relatively, in the “before” and “after” displays. (you see a small example of this even in the graphic at the top.) and if you add a lot of stuff, it might all go way out of whack.

***

it is fairly well-recognized that things would work better if each sentence in the text was placed on a separate line.

not only might the change-algorithms work more smoothly, but the display-of-diffs would also be significantly improved.

it’s simple: one sentence per line.
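to make the point concrete, here is a rough sketch. it is not part of "breakerbreaker", and it is not github's actual diff algorithm; the helper name `changedLines` and the sample sentences are made up for illustration. it lists the lines a naive line-by-line comparison would flag after a one-word edit, first with the paragraph on one long line, then with the same paragraph broken at phrases.

```javascript
// hypothetical helper: return the lines of `after` that do not appear in `before`.
// a crude stand-in for a line-based diff, just enough to show the effect.
function changedLines(before, after) {
  const seen = new Set(before.split("\n"));
  return after.split("\n").filter((line) => !seen.has(line));
}

// the same one-word edit ("jumped" -> "leaped"), in two layouts
const longBefore = "the quick fox jumped over the dog, and then it ran away.";
const longAfter = "the quick fox leaped over the dog, and then it ran away.";

const brokenBefore = "the quick fox jumped\nover the dog,\nand then it ran away.";
const brokenAfter = "the quick fox leaped\nover the dog,\nand then it ran away.";

// one long line: the whole paragraph comes back as "changed"
console.log(changedLines(longBefore, longAfter));

// broken at phrases: only the short phrase containing the edit comes back
console.log(changedLines(brokenBefore, brokenAfter));
```

in both layouts exactly one line is flagged, but in the second one that line is a short phrase rather than the entire paragraph, which is what makes the diff easy to read.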

some people — dating to the illustrious brian kernighan — argue that this makes it easier to rewrite and edit as well, as “most people change documents by rewriting phrases and adding, deleting and rearranging sentences.” astute. (but he could benefit from some judicious oxford commas.)

in accordance with his observation, kernighan also advised that we “make lines short, and break lines at natural places, such as after commas and semicolons, rather than randomly.”

for pointers to this, google’s top return gives “semantic linefeeds”:

> http://rhodesmill.org/brandon/2012/one-sentence-per-line/

brandon rhodes notes that we should remember that we can “add linefeeds anywhere that there is a break between ideas.”

the best summary is: use linebreaks to split on phrases.

***

as someone who has used this practice for many decades now, i can assure you it’s a useful one during the course of re-writing.

but whether or not you choose to write in this particular way -- maybe you have reasons not to, or can’t get in the habit, or you probably just think it looks weird and it freaks you out -- there is no dispute that it makes version-tracking diffs better.

so you might not want to write that way, but you might like to have your text be in that format for better version-tracking.

good news: you can have it both ways! just call “breakerbreaker”.

use this little javascript routine to break your text into phrases…

    //
    // breakerbreaker -- a routine to break text into phrases
    //
    var s=$("#theinput").val()
    //
    // #1 -- regularize line-endings and delete trailing spaces
    //
    while (s.indexOf("\r\n") > -1) {s=s.replace(/\r\n/g,"\n")}
    while (s.indexOf("\r") > -1) {s=s.replace(/\r/g,"\n")}
    //
    while (s.indexOf(" \n") > -1) {s=s.replace(/ \n/g,"\n")}
    //
    // #2 -- introduce space/linebreak combination at phrases
    //
    s=s.replace(/\. /g,". \n")
    s=s.replace(/, /g,", \n")
    s=s.replace(/; /g,"; \n")
    s=s.replace(/: /g,": \n")
    s=s.replace(/\? /g,"? \n")
    s=s.replace(/! /g,"! \n")
    s=s.replace(/\" /g,'" \n')
    s=s.replace(/\) /g,") \n")
    //
    s=s.replace(/ \"/g,' \n"')
    s=s.replace(/ \(/g," \n(")
    //
    s=s.replace(/ -- /g," \n-- \n")
    //
    s=s.replace(/ about /g," \nabout ")
    s=s.replace(/ also /g," \nalso ")
    s=s.replace(/ and /g," \nand ")
    s=s.replace(/ as /g," \nas ")
    s=s.replace(/ because /g," \nbecause ")
    s=s.replace(/ between /g," \nbetween ")
    s=s.replace(/ both /g," \nboth ")
    s=s.replace(/ but /g," \nbut ")
    s=s.replace(/ by /g," \nby ")
    s=s.replace(/ could /g," \ncould ")
    s=s.replace(/ ever /g," \never ")
    s=s.replace(/ for /g," \nfor ")
    s=s.replace(/ from /g," \nfrom ")
    s=s.replace(/ have /g," \nhave ")
    s=s.replace(/ how /g," \nhow ")
    s=s.replace(/ if /g," \nif ")
    s=s.replace(/ in /g," \nin ")
    s=s.replace(/ inside /g," \ninside ")
    s=s.replace(/ into /g," \ninto ")
    s=s.replace(/ is /g," \nis ")
    s=s.replace(/ may /g," \nmay ")
    s=s.replace(/ might /g," \nmight ")
    s=s.replace(/ minus /g," \nminus ")
    s=s.replace(/ must /g," \nmust ")
    s=s.replace(/ never /g," \nnever ")
    s=s.replace(/ of /g," \nof ")
    s=s.replace(/ on /g," \non ")
    s=s.replace(/ only /g," \nonly ")
    s=s.replace(/ or /g," \nor ")
    s=s.replace(/ outside /g," \noutside ")
    s=s.replace(/ plus /g," \nplus ")
    s=s.replace(/ should /g," \nshould ")
    s=s.replace(/ that /g," \nthat ")
    s=s.replace(/ their /g," \ntheir ")
    s=s.replace(/ to /g," \nto ")
    s=s.replace(/ was /g," \nwas ")
    s=s.replace(/ what /g," \nwhat ")
    s=s.replace(/ when /g," \nwhen ")
    s=s.replace(/ where /g," \nwhere ")
    s=s.replace(/ whether /g," \nwhether ")
    s=s.replace(/ which /g," \nwhich ")
    s=s.replace(/ who /g," \nwho ")
    s=s.replace(/ why /g," \nwhy ")
    s=s.replace(/ will /g," \nwill ")
    s=s.replace(/ with /g," \nwith ")
    s=s.replace(/ without /g," \nwithout ")
    s=s.replace(/ would /g," \nwould ")
    //
    $("#theoutput").val(s)

***

after regularizing all your line-endings to the standard “\n” (which might be unnecessary, but let’s make sure anyway), “breakerbreaker” then removes any errant “trailing spaces” -- i.e., each space directly preceding a linebreak -- which is something that you shouldn’t have in your text-file anyway. (um, yes, if you are using that stupid markdown convention where 2 spaces at the end of the line force a hard-linebreak, do yourself a favor and change ’em to “<br>” forevermore.)
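for anyone who wants that first step outside of a webpage, here is a minimal sketch of it as a plain function that takes a string and returns a string. the name `normalizeLines` is made up here, and the two regular expressions are a compact restatement of the loops in the routine, not the routine itself.

```javascript
// hypothetical standalone version of step #1 (no jquery, no page needed)
function normalizeLines(text) {
  return text
    .replace(/\r\n?/g, "\n") // "\r\n" and bare "\r" endings both become "\n"
    .replace(/ +\n/g, "\n"); // drop any run of spaces sitting right before a linebreak
}

// mixed line-endings plus trailing spaces, shown via JSON.stringify
// so that the invisible characters are visible in the output
console.log(JSON.stringify(normalizeLines("one  \r\ntwo \rthree\n")));
```

note that spaces at the very end of the text, with no linebreak after them, are left alone, same as in the original loops.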

then “breakerbreaker” places a space/linebreak combination next to various “strings” which will typically set off phrases -- punctuation, conjunctions/disjunctions, prepositions, etc. the best way to think of this space/linebreak combination is that it represents the equivalent of a word-processor soft-return. (except, of course, github still sees a “\n” as a hard-return, so the diff-display will show a sequence of fairly short lines.)
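the same idea can be written in a table-driven way, which makes it easier to add or drop words. what follows is only a sketch under assumptions: `breakAfterPunctuation`, `breakBeforeWords`, and the shortened `BREAK_WORDS` list are names invented here, and the regular expressions are an alternative to the long run of replace calls, not a copy of it.

```javascript
// hypothetical, shortened word list; the full routine uses a much longer one
const BREAK_WORDS = ["and", "but", "or", "to", "with", "which"];

// put a space/linebreak after each punctuation mark that is followed by a space
function breakAfterPunctuation(text) {
  return text.replace(/([.,;:?!")]) /g, "$1 \n");
}

// put a space/linebreak before each listed word that has a space on both sides.
// the lookahead leaves the word itself unconsumed, so two listed words
// in a row (" and to ") each get their own break.
function breakBeforeWords(text, words) {
  const pattern = new RegExp(" (?=(?:" + words.join("|") + ") )", "g");
  return text.replace(pattern, " \n");
}

const sample = "use the tool, and then commit to git.";
console.log(breakBeforeWords(breakAfterPunctuation(sample), BREAK_WORDS));
```

every inserted break is a space followed by a linebreak, never a bare linebreak, so the breaks stay distinguishable from the linebreaks that were already in the text.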

as a little example, here’s what the first four paragraphs from this article look like after they have been through this routine:



    writing
    is migrating
    to plain-text
    with light-markup
    --
    such
    as markdown,
    restructured-text,
    and asciidoc
    --
    most especially among developers
    and tech people,
    but also authors,
    and soon everyone,
    i guarantee it. in addition
    to its biggest benefit,
    the great flexibility,
    and the appeal
    of static blogs,
    a plain-text workflow
    also makes a promise
    for convenient version-control. ergo,
    there are constant calls
    to use github
    for prose. but this drumbeat ignores one very fundamental fact.



if you want to use “breakerbreaker” on some of your own text, you can fire up the demo program, which you’ll find here:

> http://zenmagiclove.com/simple/breakerbreaker.html

that’s also a good place to grab the script — just “view source”…

***

by the way, in addition to here, this essay is also located here:

http://zenmagiclove.com/simple/breaker.html

***

after “breakerbreaker”, text is ready for its github diff closeup.

“most people change documents by rewriting phrases, and adding, deleting, and rearranging sentences.”

when a “breakerbreaker” text is diffed, results stand out clearly.

but there’s more beauty to come.

because you’re probably thinking that even if the diffs are fine, you don’t really wanna work with a text-file that looks like this, what with all your paragraphs being chopped into short lines. the thing is, you don’t have to, because this conversion is easily reversed with one global change: simply change all of the occurrences of space/linebreak to a space. and, voila!, your text is back to its original state, ready for you. but meanwhile, you have received a diff which is extremely clear. before check-in, run “breakerbreaker”; after check-out, re-join.
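that one global change is short enough to sketch in full. the helper name `rejoin` and the sample text are made up for illustration; the only real content is the single replace.

```javascript
// hypothetical helper: undo the phrase-breaking by turning every
// space/linebreak pair back into a single space.
// linebreaks with no space in front of them (the ones that were already
// in the text) are left untouched, which is why trailing spaces have to
// be stripped before the breaks are put in.
function rejoin(text) {
  return text.replace(/ \n/g, " ");
}

// two paragraphs that have been broken into phrases
const broken =
  "first phrase, \nand a second one.\n\nnext paragraph, \nbut still text.";

// the paragraph break (a bare double linebreak) survives; the phrase breaks do not
console.log(rejoin(broken));
```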

***

of course, github could just install this script in its diff workflow, and create better diffs for you without you doing all this “work”. maybe you can convince them of that; no, i can’t help you there.

but you’re still able to go through the process by yourself, so i hope this simple idea helps you get better diffs in the future.

just to be “meta,” i will put “breakerbreaker” up on github. you can find it here, and “fork it,” or whatever you kids do.

https://github.com/bbirdiman

try it out, and if you have any suggestions for improvement,

get in the game!

-bowerbird

p.s. i’ve done much work on this, if you are interested…

some of the work originated with project gutenberg e-texts, text-lines obtained by o.c.r. done on paper-book page-scans, analyzing diffs resulting between various rounds of proofing.

you can see an example on these webpages:

> http://zenmagiclove.com/misc/weball.html

> http://zenmagiclove.com/misc/webone.html

and i wrote this later for version-control change-tracking:

> http://zenmagiclove.com/phrase-change-display.html

> http://zenmagiclove.com/phrase-change-sample.html

and here’s some version-tracking on the gettysburg address:

> http://zenmagiclove.com/misc/gabal/gabal.html

that’s all! thanks for reading…

-bowerbird