I’ve been a Reddit member for the past 3 years now, but only recently did I discover the /r/StarTrek subreddit. I came across a comment about how “bearing 310 mark 215” came up unusually often throughout the different Star Trek series.

“Oh, really?” I wondered.

Lately I’ve been trying to learn to write Android apps but I quickly found that I needed a refresher in Java. I learned C++ and Java in college, like many CS students, but aside from a short stint with C# at work, my knowledge of the language has waned over the years. When I revisited that post on Reddit a few days later, I came up with an idea to bring everything together:

Find a set of scripts from the different Star Trek series.

Write a program in Java to read the scripts and pull out phrases that mention course, heading, etc.

Analyze the results.

Finding Scripts

This was pretty easy. A quick Google search led me to TWIZ TV, where I found scripts for every episode of every season of Star Trek: The Next Generation and Star Trek: Deep Space Nine conveniently bundled into ZIPs. Unfortunately, their collections for The Original Series, Voyager, and Enterprise were nowhere near as thorough. Still, it was a good start.

Writing the Java Program

This took more time than I expected, but it was well worth the trouble. Thanks to a good Java book, and a bit of googling, I wrote my first arguably useful Java program in years.

From a given starting folder, the code iterates through all of its sub-folders and analyzes every text file within. Each time one of the keywords (course, heading, mark, range, or distance) shows up immediately followed by a number, the program outputs a line containing the filename (indicating the series, season, and episode), the line number (so the hit can be verified in Notepad++), which keyword was detected, and the numbers (spelled out in the scripts) that immediately follow the keyword. Output is written to the console, which can easily be redirected to a CSV file. From there, the CSV can be imported into a database or spreadsheet for analysis.
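This isn't the actual source, just a stripped-down sketch of the approach described above. The class name `ScriptScanner`, the method names, the shortened number-word list, the `.txt` filter, the column order, and the ISO-8859-1 encoding are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ScriptScanner {
    // Keywords listed in the post.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "course", "heading", "mark", "range", "distance"));

    // Shortened list of spelled-out numbers (assumption: the real list is longer).
    static final Set<String> NUMBER_WORDS = new HashSet<>(Arrays.asList(
            "zero", "oh", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "twenty", "thirty",
            "hundred", "thousand", "million"));

    /** Returns one CSV row (file,line,keyword,value) per keyword directly followed by numbers. */
    static List<String> scanLines(String fileName, List<String> lines) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // Split on anything that is not a letter, so "three-one-zero" becomes three words.
            String[] words = lines.get(i).toLowerCase(Locale.ROOT).split("[^a-z]+");
            for (int w = 0; w < words.length; w++) {
                if (!KEYWORDS.contains(words[w])) {
                    continue;
                }
                // Collect the run of number words that immediately follows the keyword.
                List<String> numbers = new ArrayList<>();
                for (int n = w + 1; n < words.length && NUMBER_WORDS.contains(words[n]); n++) {
                    numbers.add(words[n]);
                }
                if (!numbers.isEmpty()) {
                    rows.add(fileName + "," + (i + 1) + "," + words[w] + ","
                            + String.join("-", numbers));
                }
            }
        }
        return rows;
    }

    /** Walks every sub-folder of root and scans each .txt file found. */
    static List<String> scanFolder(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile)
                    .filter(p -> p.toString().toLowerCase(Locale.ROOT).endsWith(".txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> rows = new ArrayList<>();
        for (Path file : files) {
            // ISO-8859-1 accepts any byte sequence (assumption: the encoding is unknown).
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            rows.addAll(scanLines(root.relativize(file).toString(), lines));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("file,line,keyword,value");
        scanFolder(root).forEach(System.out::println);
    }
}
```

Running something like `java ScriptScanner scripts > hits.csv` would send the rows to a CSV file. Note that this sketch does not escape commas in filenames, so it assumes the script files are named without them.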

Some caveats:

If there is any text between the keyword and the number sequence, it does not register as a “hit” in my code. So if the script reads “Heading of three-two-one” it will be missed because of the word “of” in the middle.

The “Is the next word after the keyword a number?” function tries to cover all types of numeric text (one, two, …, nine, zero, oh, ten, eleven, … twenty, thirty, … hundred, thousand, million), but I’m sure something slipped past detection.

I got lazy near the end, so I just hard-coded parameters like the path to the files and the keywords to search for.

Results

The following results came from importing the CSV output into OpenOffice Base, where I could run a few simple SQL queries:

First, let’s look at the phrase that started this mad journey. “Bearing three-one-zero” appears once (DS9 Rocks and Shoals). For what it’s worth, “Mark three-one-zero” appears twice (TNG All Good Things and DS9 Valiant). “Mark two-one-five” is mentioned eleven times, but it is not the most frequent “mark.” “Bearing two-one-five” appears twice (TNG All Good Things and DS9 Destiny).

The most common bearing is one-eight-seven, with four instances in four different episodes: two from TNG and two from DS9. Tied for second place are a half dozen values with two instances each.

The most popular sector is zero-zero-one with seven instances — six of those were from the TNG Best of Both Worlds two-parter, and the other instance was from DS9’s first episode whose opening scene takes place at the same time as Best of Both Worlds. The second most popular “sector” was twenty-one-five-oh-five — five hits, all from TNG episode The Wounded.

The top five “marks” were: four (12), two-one-five (11), three (8), zero (7), and seven (6).

There were no “heading” or “range” values that stood out statistically.
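The counts above boil down to grouping the CSV rows by keyword and value and sorting by frequency. I ran that as SQL in OpenOffice Base, but for anyone who would rather stay in Java, here is a rough equivalent. The class name `HitTally`, the method names, and the four-column layout (file, line, keyword, value) are assumptions for illustration, not part of my original program.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class HitTally {
    /** Counts how often each value follows the given keyword in rows of file,line,keyword,value. */
    static Map<String, Integer> countValues(List<String> csvRows, String keyword) {
        Map<String, Integer> counts = new HashMap<>();
        for (String row : csvRows) {
            String[] cols = row.split(",");
            // Skip the header row and anything that does not have exactly four columns.
            if (cols.length == 4 && cols[2].equals(keyword)) {
                counts.merge(cols[3], 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns up to n entries, highest count first; ties are ordered alphabetically by value. */
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // Reads the CSV from standard input; the keyword defaults to "mark".
        String keyword = args.length > 0 ? args[0] : "mark";
        List<String> rows = new BufferedReader(new InputStreamReader(System.in))
                .lines().collect(Collectors.toList());
        for (Map.Entry<String, Integer> e : top(countValues(rows, keyword), 5)) {
            System.out.println(e.getKey() + " (" + e.getValue() + ")");
        }
    }
}
```

With the CSV saved as `hits.csv` (a hypothetical filename), `java HitTally mark < hits.csv` would list the five most frequent “mark” values.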

I’ve provided the CSV below, which you can open in your favorite spreadsheet or database software to make your own observations. I’ve also provided the source code, so others can pick up where I left off.

Files