May 23 2019

Introduction

Since one of its earliest releases, I have been a rabid fan of an indie computer game called Out of the Park Baseball (OOTP).

OOTP is a video game that allows a player to manage his favorite baseball team from:

The front office, as the team’s General Manager

The dugout, as the team’s manager

or both at the same time!

It’s not like your average console-based baseball video game, like MLB: The Show, where the player controls the pitcher/fielder/batter and “plays” each game. In OOTP, if you choose to play as the “manager” of your team, you have control over some strategy for each at-bat and control of substitutions, but the rest is simulated by the game. Where OOTP really shines, however, is at the General Manager level.

Taking the reins

As a GM, you have complete control of the rosters of every team in your organization, from the MLB club all the way down to its Rookie Ball affiliates. This includes:

Setting lineups

Signing/releasing players

Managing minor league prospects and their promotions

Drafting amateurs

Handling your budget

Navigating the waiver wire

Making trades

just to name some of the abilities you’re entrusted (burdened?) with. Instead of playing out one game at a time (although I’m sure some people do), most people playing the game this way simulate seven calendar “days” at a time. The multi-season arc that the game takes as a result is absolutely addictive. The functionality is so complete and the simulation engine so realistic that several current and former MLB players have acknowledged they are huge fans and play regularly, including Curt Schilling, who has been quoted as saying of OOTP:

There is no comparison to any sports sim I’ve ever played

Red Sox owner John Henry seems to agree as he has been quoted as saying:

OOTP is an Astonishing Accomplishment

OOTP’s potential is fully realized

The game’s AI, considering the monstrous task being asked of it, does a pretty good job of producing a realistic and enjoyable challenge. When complaints about the realism of the simulation AI inevitably arise on their forums, the team behind OOTP is quick to defend the AI, pointing out that by simulating thousands of seasons inside their game using MLB’s opening day rosters each year, the AI has correctly picked both World Series participants AND the eventual World Series champion three years running.

That being said, OOTP’s true magic is on full display with its ability to support multi-player online leagues. Here, every team is managed by an actual human and any weaknesses of the AI, realized or not, are removed from the equation. What results is the most challenging, rewarding and insanely addictive competition I’ve ever been exposed to while playing a video game.

Since I started playing OOTP, immediately after I got out of college in 2005, I have played in a number of online leagues, but most of that time has been spent playing in just two of them. The first, which was my first online league experience, remained active for 10 years, simulating seasons up until the year 2075 in-game. The second is my current league, the If Baseball Were Different League (IBWDL). Made up of some of the most passionate and shrewd baseball minds I’ve come across in all my time playing OOTP online, the league features:

Several players who assume a second, “journalist” persona on our Slack boards. For example, we have our own version of a “Buster Olney”, who publishes articles ranging from Opening Day previews and forecasts for every division in MLB; to team needs, Buyer/Seller analysis and Hot Rumors as the Trade Deadline approaches; to playoff series previews, like this one which incorrectly predicted that my Philadelphia Phillies would fall to the Tampa Bay Rays in last year’s World Series (nice try, guys); to Free Agent market rundowns.

A group of GMs who publish a bi-weekly podcast about our baseball universe, including interviews with other GMs in the league and a breakdown of all the important action in the last simulation.

A dedicated commissioner and former Minor League Baseball play-by-play announcer who, twice a week, does a live broadcast of the most important games of the simulation on our Twitch channel.

I cannot even begin to quantify the number of hours I have spent agonizing over the decisions presented to me in this league. From free agent signings, to trade offers, to cutting my final roster down to 25 before Opening Day, each is approached knowing that everyone else in the league is doing their level best to make sure I never get to hoist the World Series trophy over my head.

Looking for an edge

So, with all that being said, I’m always looking for an edge that will allow me to beat the pants off of the rest of the guys in my league. Before I get to that edge, let’s take a quick detour and talk about where I think the game falls short of the high standard it has set for itself in so many other areas.

That shortcoming lies in the UI and its limits with respect to displaying statistics for players/teams/leagues, limits that, it seems to me, the game could easily remedy. OOTP’s simulation engine, maybe obviously, generates outcomes for each and every pitch of each and every game that it simulates. This data is extremely rich, and OOTP could allow a player to run all sorts of exotic queries against that data to derive insights about their team’s performance. However, the tools presented to you by the game to leverage this data are often lacking. While they have some nice options for splits (vs. LHP, or Last Week), even trying to see a double split for a player (say, Last Week vs. LHP) is difficult if not impossible. There is no ability to compare two players’ statistics across any kind of split either. That makes for a lot of clicking back and forth between the players you’re considering while trying to figure out what you’d like to do.
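To make the double-split idea concrete, here is a minimal sketch of what such a query could look like once the data lives outside the game. Everything in it is an assumption for illustration: the `PlateAppearance` record, its field names, and the sample rows are made up and are not OOTP’s export schema. The point is only that splits become composable filters, so “Last Week vs. LHP” for two players is one loop rather than a lot of clicking.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical, simplified record; field names are assumptions,
# not OOTP's actual export schema.
@dataclass(frozen=True)
class PlateAppearance:
    batter: str
    pitcher_hand: str  # "L" or "R"
    game_date: date
    is_at_bat: bool    # False for walks, sacrifices, etc.
    is_hit: bool

def batting_average(pas, *filters):
    """Batting average over the plate appearances that pass every filter."""
    selected = [pa for pa in pas if all(f(pa) for f in filters)]
    at_bats = sum(pa.is_at_bat for pa in selected)
    hits = sum(pa.is_hit for pa in selected)
    return hits / at_bats if at_bats else None

# Each split is a small predicate, so splits can be stacked freely.
def by_batter(name):
    return lambda pa: pa.batter == name

def vs_hand(hand):
    return lambda pa: pa.pitcher_hand == hand

def since(start):
    return lambda pa: pa.game_date >= start

today = date(2025, 6, 15)
last_week = since(today - timedelta(days=7))

# Made-up sample data.
sample = [
    PlateAppearance("Player A", "L", date(2025, 6, 10), True, True),
    PlateAppearance("Player A", "L", date(2025, 6, 11), True, False),
    PlateAppearance("Player A", "R", date(2025, 6, 11), True, True),
    PlateAppearance("Player A", "L", date(2025, 5, 1), True, True),
    PlateAppearance("Player B", "L", date(2025, 6, 12), True, False),
    PlateAppearance("Player B", "L", date(2025, 6, 12), False, False),
]

# Compare two players on the same double split: Last Week vs. LHP.
for name in ("Player A", "Player B"):
    print(name, batting_average(sample, by_batter(name), vs_hand("L"), last_week))
```

Adding a third split (home/away, by inning, by count) would just be one more predicate passed to `batting_average`.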

On the plus side, OOTP does allow a player to export most of their data in either CSV or SQL formats, and while I can’t directly export a table that contains all of the pitch-by-pitch data, OOTP does allow me to export some data which they call “Game Logs”. The rows in this table are text strings that describe each pitch and the subsequent events/activity that resulted from that pitch. To give you an idea of what I’m talking about, here is an example of a game log rendered in HTML:

So, I have access to the outcome of every pitch in the history of our league, albeit in a format that will require some massaging, presented to me in CSV format. There are approximately 23 million individual events in the “Game Logs” CSV dump that I just generated for my league, and combined with some other available tables like player/team/league info, I think I’ve found my edge, and potentially one heck of an interesting computer engineering challenge. :)

What Are You Trying To Do?

I’m going to build a web-based UI for analyzing all of this amazing data I now have my hands on. I’d like to set up a tool that will:

Parse the text strings that make up the “Game Logs” CSV file I mentioned earlier into a format that I can query against.

Take that parsed data and create a database record called “GameEvent” to store it.

Allow me to define new “Statistics”, backed by the new GameEvent table, that return information from the database that OOTP itself does not produce.

Provide a UI to view those statistics for different groups of players for various time ranges inside the game.
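As a rough sketch of the first three steps, the snippet below parses a few log lines into `GameEvent` records and defines one “Statistic” as a plain function over them. The line shape it expects (`<balls>-<strikes>: <description>`), the sample lines, and the `GameEvent` fields are all assumptions made for illustration; the real “Game Logs” rows will need their own patterns.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass(frozen=True)
class GameEvent:
    """Hypothetical parsed record: the count before the pitch and what happened."""
    balls: int
    strikes: int
    outcome: str

# Assumed line shape: "<balls>-<strikes>: <description>", e.g. "1-2: Foul Ball".
PITCH_LINE = re.compile(
    r"^\s*(?P<balls>[0-3])-(?P<strikes>[0-2]):\s*(?P<outcome>.+?)\s*$"
)

def parse_line(line: str) -> Optional[GameEvent]:
    """Return a GameEvent for a pitch line, or None for any other kind of line."""
    match = PITCH_LINE.match(line)
    if match is None:
        return None
    return GameEvent(int(match["balls"]), int(match["strikes"]), match["outcome"])

def parse_log(lines: Iterable[str]) -> List[GameEvent]:
    """Keep only the lines that parse as pitches."""
    return [event for event in map(parse_line, lines) if event is not None]

def swinging_strike_rate(events: List[GameEvent]) -> Optional[float]:
    """An example 'Statistic': share of pitches that ended in a swinging strike."""
    if not events:
        return None
    whiffs = sum(event.outcome == "Swinging Strike" for event in events)
    return whiffs / len(events)

# Made-up sample lines in the assumed format.
raw_lines = [
    "Pitching: RHP Example Pitcher",
    "0-0: Ball",
    "1-0: Swinging Strike",
    "1-1: Foul Ball",
    "1-2: Swinging Strike",
]

events = parse_log(raw_lines)
print(len(events), swinging_strike_rate(events))
```

In a real pipeline the parsed events would be written to the database rather than kept in a list, and each Statistic would become a query over that table.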

On top of that, I’d also like to add some functionality that I think the game is missing, like:

A screen that allows me to easily compare my top prospects as they ascend the minor leagues

A trade analysis tool that will let me play with my lineups/pitching staff while I move players in and out of a deal I’m considering

A view into the Free Agent market that allows me to more easily measure the value of the players on the market while providing a UI to visualize how they would fit into my team if signed.

Why Is This Going To Be Interesting?

Besides being a baseball nut, I’m a software engineer who could always stand to brush up on the latest and greatest technologies, so along the way I’m interested in learning some new things. I plan to cover the following topics as I work towards my ultimate end-goal, many of which I hope you, dear reader, will find interesting:

Working with a Graph DB, Neo4j

I’m going to load the data into both a PostgreSQL database and a Neo4j database. I’ve never worked with a Graph DB and I can see some places where its structure would be beneficial to the app, so I’m excited to take it for a spin. Additionally, I think the modeling of this problem is challenging enough that it should really help me gain some competence with this tech.

Studying the performance and application of different database technologies

I always hear advice from “the internet” about the performance limitations of different database technologies and how those limitations really show up when the data set gets very large. Well, luckily I have the ability to generate as much data as I’d like to. Sure, I have ~23M records from my online league, but I could easily generate 10x more by creating and running a single-player game well into the future. I’m interested to find out how/when the two disparate technologies struggle during the building of the app.

Implementing a GraphQL API

GraphQL is the new hotness in the web development world, and as a result I’ve been itching to give it a try personally. I could envision having some entities in the database that have quite a large breadth, objects like a player’s “hitting statistics”. An endpoint for those could return a very large number of fields, as a quick look at a player on Baseball Reference shows at least 50 categories just for hitting. So, maybe an API that would allow us to pick and choose the fields we need returned would work better than a more traditional REST API that would either a) require more API calls to get the data or b) be much less efficient, as all fields would be transmitted during the call instead of just the desired fields.
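The payload argument can be illustrated without any GraphQL library at all. The sketch below is plain Python, not GraphQL: `select_fields` is a hypothetical helper that returns only the requested fields of a wide record, and the stat line is made-up data. It simply compares the serialized size of “everything” with the size of “just what the client asked for”.

```python
import json
from typing import Any, Dict, Sequence

def select_fields(record: Dict[str, Any], requested: Sequence[str]) -> Dict[str, Any]:
    """Return only the requested fields, rejecting names the record does not have."""
    unknown = [name for name in requested if name not in record]
    if unknown:
        raise KeyError(f"unknown fields: {unknown}")
    return {name: record[name] for name in requested}

# Made-up hitting line; a real one would carry dozens more categories.
hitting_stats = {
    "player": "Player A",
    "games": 150,
    "at_bats": 560,
    "hits": 168,
    "doubles": 35,
    "triples": 4,
    "home_runs": 22,
    "walks": 61,
    "strikeouts": 110,
    "stolen_bases": 12,
}

full_payload = json.dumps(hitting_stats)
slim_payload = json.dumps(select_fields(hitting_stats, ["player", "hits", "at_bats"]))

print(len(full_payload), len(slim_payload))
```

A GraphQL server does this selection (and much more, such as nested types and validation against a schema) on the client’s behalf; the sketch only shows why transmitting a subset is cheaper.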

Advanced Caching Work

This application will be working with large amounts of data and be tasked with performing near real-time calculations over that dataset. Working in its favor, however, is the fact that the data is read-only and new data will not be “streamed” in, but rather “dumped” after every simulation, which, for my current league, occurs on Monday, Wednesday and Friday. Taking these two factors together, it seems there is an opportunity here to experiment with some advanced caching techniques to make this real-time feel a reality.
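One simple way to exploit those two properties is to tie every cached result to the dump it was computed from and throw the whole cache away when a new dump arrives. The sketch below is an assumption-laden illustration, not a design decision: `DumpCache`, its method names, and the dump identifiers are hypothetical, and a real app would more likely lean on an external cache than an in-process dictionary.

```python
from typing import Any, Callable, Dict, Hashable, Optional

class DumpCache:
    """Caches computed results until the next simulation dump is loaded."""

    def __init__(self) -> None:
        self._dump_id: Optional[str] = None
        self._values: Dict[Hashable, Any] = {}
        self.misses = 0  # how many times a value actually had to be computed

    def load_dump(self, dump_id: str) -> None:
        """Call once per dump; results from any previous dump are dropped."""
        if dump_id != self._dump_id:
            self._dump_id = dump_id
            self._values.clear()

    def get(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, computing it only on a miss."""
        if key not in self._values:
            self.misses += 1
            self._values[key] = compute()
        return self._values[key]

def expensive_stat() -> float:
    # Stand-in for a slow query over millions of events.
    return sum(range(1_000)) / 1_000

cache = DumpCache()
cache.load_dump("monday-sim")
first = cache.get(("avg", "Player A"), expensive_stat)
second = cache.get(("avg", "Player A"), expensive_stat)  # served from cache
cache.load_dump("wednesday-sim")                          # new data invalidates it
third = cache.get(("avg", "Player A"), expensive_stat)
print(first, cache.misses)
```

Because the data never changes between dumps, there is no need for time-based expiry: a cached value is either valid for the current dump or gone.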

JavaScript Visualizations

Since the application will be digesting pitch-by-pitch data about each and every individual game in the league, I think it would be fun to try to visualize the game logs in the app using some kind of JS animation library like anime.js. I could imagine a replay feature that would play back a game on the web using these game logs as a source.

Building a high performance calculation engine using a functional language

As I’ve stated earlier, I anticipate having to make some pretty complex calculations on a very large dataset while building this app. That on its own seems like a very interesting problem that I look forward to tackling, and one I think a functional paradigm would be well suited to handle, considering the immutability of the data it would be working with. Here at MojoTech we work with Elixir/Phoenix on various projects and have been pleased with its performance and ease of use, so Elixir will be my weapon of choice, since:

It’s built on top of Erlang, whose reputation for speed precedes it

Its syntax is very similar to my strongest language, Ruby

Many consultancies have begun to use it for their daily work and I’m very interested to understand its strengths and weaknesses.

React and Server-Side Rendering

I’m still learning the ins and outs of React and its ecosystem, so being able to work with it on an app that really scratches my own personal itch should help me take my understanding to a new level. Since this app is so heavily tilted towards being read-only in so many aspects, I suspect some interesting things can be done with respect to server-side rendering, so I’ll most likely give that a try. I’ve also wanted to evaluate and get experience working with TypeScript, so I’ll include that as well.

Stay tuned…

I’ll be adding posts as I tackle the various aspects of bringing this application to life. Many of the topics I’ve mentioned above are entirely or mostly new to me, so I’m interested and excited to hear feedback from you, dear reader, as we embark on this journey together, at least figuratively.

And the first step in that journey together will be covered in my next post, which will discuss the process of tokenizing the “Game Logs” and getting them into the database(s) in a format that I can work with. Tentatively titled:

/(Working|Struggling) with Regexps and tokenization!/

Thank you and please stay tuned!

P.S. I know, I know: you mean I get to spend some part of my work week thinking about and playing with my favorite video game? Well, you can too. All you have to do is come work with us at MojoTech!