Text Justification

Given a paragraph in which no word has more characters than a single line can hold, how can we break the words into lines so that all the lines end up with similar lengths?

The paragraph below is one I picked at random:

In computer science, mathematics, management science, economics and bioinformatics, dynamic programming (also known as dynamic optimization) is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions. The next time the same subproblem occurs, instead of recomputing its solution, one simply looks up the previously computed solution, thereby saving computation time at the expense of (it is hoped) a modest expenditure in storage space. (Each of the subproblem solutions is indexed in some way, typically based on the values of its input parameters, so as to facilitate its lookup.) The technique of storing solutions to subproblems instead of recomputing them is called “memoization”.

Let’s say a line can hold at most 90 characters (including white space).

If we simply fill each line with as many words as possible and then repeat the same process for the following lines, the image below is the result:

The function below calculates the “badness” of a single line, given that each line’s capacity is 90:

const calcBadness = (line) => line.length <= 90 ? Math.pow(90 - line.length, 2) : Number.MAX_VALUE;

Why diff²? Because squaring punishes “an empty line next to a full line” more than “two half-filled lines.” With a capacity of 90, the first case scores 0 + 90² = 8100, while the second scores 45² + 45² = 4050.

Also, if a line overflows, we treat it as infinitely bad.

The total badness score for the previous greedy solution is 5022. Let’s use dynamic programming to get a better result!
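The "fill each line with as many words as possible" strategy and its total score can be sketched as follows. This is only a minimal illustration, not the code that produced the number above: the names `greedyJustify`, `totalBadness`, and `LINE_WIDTH` are hypothetical, and the sketch assumes that words are joined with single spaces and that the last line is scored like any other.

```javascript
const LINE_WIDTH = 90; // line capacity used throughout the article

const calcBadness = (line) =>
  line.length <= LINE_WIDTH ? Math.pow(LINE_WIDTH - line.length, 2) : Number.MAX_VALUE;

// Greedy packing: keep appending words to the current line while they fit.
// Assumes no single word is longer than `width`.
function greedyJustify(words, width = LINE_WIDTH) {
  const lines = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length <= width) {
      current = candidate; // the word still fits on this line
    } else {
      lines.push(current); // close the line and start a new one
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Sum of the per-line badness scores (hypothetical helper).
const totalBadness = (lines) =>
  lines.reduce((sum, line) => sum + calcBadness(line), 0);
```

Splitting the paragraph with `paragraph.split(" ")` and passing the words to `greedyJustify` should give the greedy layout; `totalBadness` then scores it.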

What are the subproblems?

For every index i from 0 to words.length - 1, if we treat words[i] as the first word of a new line, what is the minimal total badness score for words[i] and all the words after it?

How to solve the subproblems?

The total badness score for the words whose index is greater than or equal to i is calcBadness(the-line-starting-at-words[i]) + the-total-badness-score-of-the-following-lines. We can make different choices about how many words the line contains, and we pick the choice with the lowest total as the solution to the subproblem.
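The recurrence described above can be written as a memoized recursion. The sketch below is one possible top-down formulation, not the bottom-up implementation the article refers to; `minBadness`, `solve`, and `LINE_WIDTH` are hypothetical names. It also stops extending a line as soon as it overflows, which is a small shortcut: every longer line starting at the same word would overflow as well.

```javascript
const LINE_WIDTH = 90; // line capacity used throughout the article

const calcBadness = (line) =>
  line.length <= LINE_WIDTH ? Math.pow(LINE_WIDTH - line.length, 2) : Number.MAX_VALUE;

// Returns the minimal total badness for laying out all of `words`.
function minBadness(words) {
  const memo = new Map(); // i -> minimal badness for words[i], words[i + 1], ...

  const solve = (i) => {
    if (i === words.length) return 0; // no words left, nothing to penalize
    if (memo.has(i)) return memo.get(i);

    let best = Number.MAX_VALUE;
    // Try every possible end j (exclusive) for the line that starts at words[i].
    for (let j = i + 1; j <= words.length; j++) {
      const line = words.slice(i, j).join(" ");
      if (line.length > LINE_WIDTH) break; // longer lines would overflow too
      best = Math.min(best, calcBadness(line) + solve(j));
    }
    memo.set(i, best);
    return best;
  };

  return solve(0);
}
```

Each index is solved once and then looked up in `memo`, which is exactly the "solve each subproblem just once" idea from the quoted paragraph.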

Let’s take a look at an example with three words of lengths 80, 40, and 30.

Let S[i] be the score of the best justification for the words whose index is greater than or equal to i.

What’s S[2]? We can make one choice:

Put the word of length 30 on a line by itself -> score: (90 - 30)² = 3600, so S[2] = 3600.

What’s S[1]? We can make two choices:

1. Putting the last two words on the same line (40 + 1 space + 30 = 71 characters) -> score: (90 - 71)² = 361.

2. Putting the last two words on different lines -> score: 2500 + S[2] = 6100.

Choice 1 is better, so S[1] = 361.

What’s S[0]? We can make three choices:

1. Putting the three words on the same line -> score: MAX_VALUE.

2. Putting the first word on line 1, and relying on S[1] -> score: 100 + S[1] = 461.

3. Putting the first two words on line 1, and relying on S[2] -> score: MAX_VALUE + S[2].

Choice 2 is the best, so S[0] = 461.
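The arithmetic of this three-word example can be checked with a few lines of code. This is just a sanity check under the same assumptions as the example (capacity 90, one space between words); `badnessOfLength` and `lineLength` are hypothetical helpers that work on word lengths instead of actual strings.

```javascript
const LINE_WIDTH = 90; // line capacity used throughout the article

// Badness computed from a line's length alone.
const badnessOfLength = (len) =>
  len <= LINE_WIDTH ? (LINE_WIDTH - len) ** 2 : Number.MAX_VALUE;

// Length of a line holding words of the given lengths, separated by single spaces.
const lineLength = (lengths) =>
  lengths.reduce((sum, len) => sum + len, 0) + (lengths.length - 1);

// S[2]: the last word alone.
const S2 = badnessOfLength(lineLength([30]));

// S[1]: either both remaining words on one line, or one word per line.
const S1 = Math.min(
  badnessOfLength(lineLength([40, 30])),   // 361
  badnessOfLength(lineLength([40])) + S2   // 2500 + 3600
);

// S[0]: all three words, the first word alone, or the first two words together.
const S0 = Math.min(
  badnessOfLength(lineLength([80, 40, 30])),   // overflows
  badnessOfLength(lineLength([80])) + S1,      // 100 + 361
  badnessOfLength(lineLength([80, 40])) + S2   // overflows
);

console.log(S2, S1, S0); // 3600 361 461
```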

We can draw a dependency graph similar to the one for the Fibonacci numbers:

How to get the final result?

Once we have solved all the subproblems, the final result is obtained the same way as any subproblem: it is S[0], the subproblem that starts at the first word and therefore covers the whole paragraph.

The DEMO below is my implementation; it uses the bottom-up approach.

The memo table stores two numbers in each slot: one is the total badness score, and the other is the index of the word that starts the next line, so that we can reconstruct the justified paragraph once the table is filled.
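A bottom-up version with such a memo table might look like the sketch below. It is not the demo’s actual code: the function name `justify`, the `{ score, next }` slot shape, and the extra slot for the empty suffix are assumptions made for illustration.

```javascript
const LINE_WIDTH = 90; // line capacity used throughout the article

const calcBadness = (line) =>
  line.length <= LINE_WIDTH ? Math.pow(LINE_WIDTH - line.length, 2) : Number.MAX_VALUE;

function justify(words) {
  const n = words.length;
  // memo[i] = { score, next }: minimal badness for the words from index i on,
  // and the index of the word that starts the following line.
  // memo[n] stands for the empty suffix and costs nothing.
  const memo = new Array(n + 1);
  memo[n] = { score: 0, next: n };

  // Fill the table from the last word back to the first.
  for (let i = n - 1; i >= 0; i--) {
    let best = { score: Number.MAX_VALUE, next: n };
    for (let j = i + 1; j <= n; j++) {
      const line = words.slice(i, j).join(" ");
      if (line.length > LINE_WIDTH) break; // longer lines would overflow too
      const score = calcBadness(line) + memo[j].score;
      if (score < best.score) best = { score, next: j };
    }
    memo[i] = best;
  }

  // Follow the `next` indices to rebuild the lines.
  const lines = [];
  for (let i = 0; i < n; i = memo[i].next) {
    lines.push(words.slice(i, memo[i].next).join(" "));
  }
  return { score: memo[0].score, lines };
}
```

For the three-word example (lengths 80, 40, and 30), `justify` returns a score of 461 and two lines: the first word alone, then the last two words together.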