College Baseball Top Players

Overview:

The following documents the process of gathering and analyzing NCAA baseball statistics using R and Excel. I briefly discuss the coding side of this project, but the primary purpose is to analyze the top offensive players in college baseball.

Gathering the Data:

The data for this analysis came from the NCAA statistics website, which contains team and player stats as far back as the 2012 season. To obtain this data, I wrote R code to scrape the NCAA website and used the sqldf package to transform the data with SQL queries. Each school has a unique ID associated with it. Without going into too much detail, the code looped through these school IDs and stored the results in two large tables: one for each player’s career statistics and the other for each player’s 2017 statistics (prior to May 1st). These output tables were then exported to Excel using the xlsx package.
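The scraping code isn’t reproduced here, but its overall shape can be sketched: loop over school IDs and append each school’s rows to a career table and a 2017 table. This is a Python illustration, not the original R code. `fetch_school_stats` is a hypothetical stand-in for the actual scraping step, which depends on the NCAA site’s page structure.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Fetcher = Callable[[int], Tuple[List[Row], List[Row]]]


def collect_player_tables(school_ids: List[int], fetch_school_stats: Fetcher) -> Tuple[List[Row], List[Row]]:
    """Loop over school IDs and stack each school's player rows into two tables.

    fetch_school_stats is a hypothetical scraper: given one school ID it returns
    (career_rows, season_rows) for that school's players.
    """
    career_table: List[Row] = []
    season_table: List[Row] = []
    for school_id in school_ids:
        career_rows, season_rows = fetch_school_stats(school_id)
        # Copy each row and tag it with its school so players stay traceable.
        career_table.extend({**row, "school_id": school_id} for row in career_rows)
        season_table.extend({**row, "school_id": school_id} for row in season_rows)
    return career_table, season_table


# Usage with a fake fetcher standing in for the real scraper.
def fake_fetch(school_id: int) -> Tuple[List[Row], List[Row]]:
    career = [{"player": f"Player {school_id}", "PA": 150}]
    season = [{"player": f"Player {school_id}", "PA": 60}]
    return career, season


career, season = collect_player_tables([101, 202], fake_fetch)
print(len(career), len(season))  # 2 2
```

Passing the fetcher in as a parameter keeps the loop testable without hitting the website. In R, the same stacking step would typically be done with `rbind` on data frames.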

The Metrics:

Unfortunately, the NCAA website only contains traditional statistics, so the advanced metrics used for this project had to be generated in R. While batting average and slugging percentage can be useful, they are not the best gauges of a player’s skill level and were not considered in my analysis. The metrics that are more telling of a player’s offensive ability, and the focal point of my analysis, are ISO, K%, BABIP and wOBA. These stats are detailed below.

Isolated Power (ISO): ISO is used to assess a player’s raw power. It measures how many extra bases a player gains per at-bat, separating extra-base hits from singles. For players with at least 100 PA, the average is .130 this season and .110 for a career, while the elite power hitters will have an ISO above .250.
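ISO is slugging percentage minus batting average, which simplifies to extra bases per at-bat. Here is a minimal Python sketch of that standard definition; the stat line in the example is made up.

```python
def isolated_power(at_bats: int, doubles: int, triples: int, home_runs: int) -> float:
    """ISO = SLG - AVG, which simplifies to extra bases per at-bat.

    A double adds 1 extra base, a triple 2, and a home run 3.
    """
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return (doubles + 2 * triples + 3 * home_runs) / at_bats


# Hypothetical line: 200 AB with 12 doubles, 2 triples and 8 home runs.
print(round(isolated_power(200, 12, 2, 8), 3))  # 0.2
```

The example’s .200 is well above the .130 season average, though still short of the .250 mark for elite power.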

Strikeout Rate (K%): K% is simply how often a player strikes out per plate appearance. The average is around 17%, and the players who are best at putting the ball in play will be around 10%.
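The scraped tables may report the components rather than PA itself. If so, plate appearances can be rebuilt with the standard definition (AB + BB + HBP + SF + SH) and K% computed from that total. Below is a short sketch under that assumption, with a hypothetical stat line. For K%, lower is better.

```python
def plate_appearances(at_bats: int, walks: int, hit_by_pitch: int, sac_flies: int, sac_bunts: int) -> int:
    """Standard PA total (catcher's interference ignored)."""
    return at_bats + walks + hit_by_pitch + sac_flies + sac_bunts


def strikeout_rate(strikeouts: int, pa: int) -> float:
    """K% as a fraction of plate appearances (0.17 means 17%)."""
    if pa <= 0:
        raise ValueError("pa must be positive")
    return strikeouts / pa


# Hypothetical line: 170 AB, 20 BB, 5 HBP, 3 SF, 2 SH -> 200 PA, with 34 strikeouts.
pa = plate_appearances(170, 20, 5, 3, 2)
print(pa, strikeout_rate(34, pa))  # 200 0.17
```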

Batting Average on Balls in Play (BABIP): BABIP measures how often a ball in play falls for a hit. A batter does not have complete control over the outcome of a batted ball, and this metric can help to evaluate factors such as defense, luck and talent. It is most useful when a player’s current BABIP is compared with his career BABIP. For example, if a player’s BABIP differs significantly from his career mark, luck is probably playing a major role. The NCAA average is .330 this season and .320 for a career.
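The sketch below uses the standard BABIP formula, (H - HR) / (AB - K - HR + SF), and compares a season mark against a career mark. The `threshold` for flagging a suspicious gap is a hypothetical value chosen for illustration, not a cutoff used in this analysis.

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """BABIP = (H - HR) / (AB - K - HR + SF): hit rate on balls kept in the park."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def babip_shift(season: float, career: float, threshold: float = 0.050) -> tuple:
    """Return (season - career, flag); the flag marks a gap large enough to suspect luck.

    The .050 threshold is an arbitrary illustration, not a cutoff used in the analysis.
    """
    diff = season - career
    return diff, abs(diff) > threshold


season = babip(hits=60, home_runs=8, at_bats=200, strikeouts=40, sac_flies=4)
diff, suspicious = babip_shift(season, career=0.320)
print(round(season, 3), round(diff, 3), suspicious)  # 0.333 0.013 False
```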

Weighted On-Base Average (wOBA): wOBA is one of the best measures of a player’s offensive skill level and combines all of the different aspects of hitting into one metric. It is calculated by assigning weights to each offensive event based on the event’s relative run value. These weights change slightly each year; the weights used in this analysis were from the 2016 MLB season. The events and their weights are: BB (.691), HBP (.721), 1B (.878), 2B (1.242), 3B (1.569), HR (2.015).
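The sketch below applies the 2016 weights listed above. The denominator follows one common convention, the FanGraphs form AB + BB - IBB + SF + HBP; it is assumed here and not confirmed as the exact formula used in this analysis. NCAA stats may not separate intentional walks, so `intentional_walks` defaults to 0.

```python
# 2016 MLB weights listed in the text.
WOBA_WEIGHTS_2016 = {"BB": 0.691, "HBP": 0.721, "1B": 0.878, "2B": 1.242, "3B": 1.569, "HR": 2.015}


def woba(at_bats: int, walks: int, hit_by_pitch: int, singles: int, doubles: int,
         triples: int, home_runs: int, sac_flies: int, intentional_walks: int = 0,
         weights: dict = WOBA_WEIGHTS_2016) -> float:
    """Weighted on-base average.

    Numerator: each event count times its run-value weight (unintentional walks only).
    Denominator: AB + BB - IBB + SF + HBP (assumed FanGraphs-style form).
    """
    unintentional_walks = walks - intentional_walks
    numerator = (weights["BB"] * unintentional_walks
                 + weights["HBP"] * hit_by_pitch
                 + weights["1B"] * singles
                 + weights["2B"] * doubles
                 + weights["3B"] * triples
                 + weights["HR"] * home_runs)
    denominator = at_bats + unintentional_walks + sac_flies + hit_by_pitch
    if denominator <= 0:
        raise ValueError("no qualifying plate appearances")
    return numerator / denominator


# Hypothetical line: 200 AB, 25 BB, 5 HBP, 30 1B, 12 2B, 2 3B, 8 HR, 3 SF.
print(round(woba(200, 25, 5, 30, 12, 2, 8, 3), 3))  # 0.349
```

This hypothetical line lands just above the .345 NCAA season average.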

Because it captures a player’s overall offensive value, wOBA was the statistic I weighed most heavily in my analysis. The NCAA average is .345 this season and .323 for a career. The most complete hitters will have a wOBA above .400.

The Analysis:

The goal of this analysis was to find the top 10 hitters in select conferences based on the stats mentioned above. Keep in mind that park factors and quality of opponents were not considered, though career numbers should help to mitigate those factors. To select the top hitters, I used wOBA as a baseline and then analyzed ISO, K% and BABIP for each player. The objective was to value complete hitters, those who were above average in each category. However, players who were near the top of the conference in one category could afford to be slightly below average in another, depending on the quality of players in the conference. Additionally, only players with a minimum of 100 career plate appearances were included in the leaderboards. Based on the previous year’s RPI and draft picks, only the top eight conferences were evaluated: the American Athletic Conference (AAC), Atlantic Coast Conference (ACC), Big 10, Big 12, Big West, Conference USA, Pac 12, and SEC.
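The final selection involved judgment calls that don’t reduce to a formula. The mechanical part can still be sketched in Python: the 100 career PA minimum, wOBA as the ranking baseline, and a count of the categories in which a player beats the average. The player records below are hypothetical. The averages are the career figures from the metric descriptions, with K% at roughly 17%.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Career averages quoted in the text for players with at least 100 PA (K% around 17%).
CAREER_AVG = {"woba": 0.323, "iso": 0.110, "k_rate": 0.17, "babip": 0.320}
MIN_CAREER_PA = 100


@dataclass
class Player:
    name: str
    conference: str
    career_pa: int
    woba: float
    iso: float
    k_rate: float
    babip: float


def above_average_count(p: Player) -> int:
    """Number of categories (out of 4) where the player beats the average; lower K% is better."""
    return sum([
        p.woba > CAREER_AVG["woba"],
        p.iso > CAREER_AVG["iso"],
        p.k_rate < CAREER_AVG["k_rate"],
        p.babip > CAREER_AVG["babip"],
    ])


def conference_leaderboard(players: List[Player], conference: str, top_n: int = 10) -> List[Tuple[str, float, int]]:
    """Apply the PA minimum, rank by wOBA, and report how many categories each player beats."""
    eligible = [p for p in players if p.conference == conference and p.career_pa >= MIN_CAREER_PA]
    ranked = sorted(eligible, key=lambda p: p.woba, reverse=True)
    return [(p.name, p.woba, above_average_count(p)) for p in ranked[:top_n]]


# Hypothetical players, not real data.
players = [
    Player("A", "SEC", 150, 0.450, 0.250, 0.12, 0.360),
    Player("B", "SEC", 90, 0.500, 0.300, 0.10, 0.400),  # under 100 career PA, excluded
    Player("C", "SEC", 300, 0.380, 0.100, 0.20, 0.330),
    Player("D", "ACC", 200, 0.420, 0.200, 0.15, 0.340),
]
for row in conference_leaderboard(players, "SEC"):
    print(row)
# ('A', 0.45, 4)
# ('C', 0.38, 2)
```

The count is only a starting point. The trade-off described in the selection criteria still needs a human look, since a conference leader in one category can offset a slightly below-average mark in another.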

Before getting into the conferences, I first wanted to look at previous draft years to get a sense of what the numbers look like for some of the best players. The NCAA website only has stats beyond 2013, so I wrote separate code to get stats for players selected in the first three rounds of the previous three MLB drafts. Listed in alphabetical order, here is that list displaying their career stats and the stats for their last college season:

The list below shows the stats of the top 20 picks from the previous three drafts.

Notes:

Only four players had a wOBA below .400 in the season before they were drafted, and only six players had a career wOBA below .400. The two players (Zach Collins & Ian Happ) who had a worse-than-average (higher) K% during their final season made up for it with exceptional ISO and wOBA.

Many of these players are still developing and have not had a chance to advance through the minors yet, but this next list shows the players who have excelled at the next level and are now everyday MLB players.

Notes:

It’s easy to see why these players have had so much success at the professional level. Besides Trea Turner, who makes up for his lack of power with speed, all of these players had an ISO above .200 in their final season before being drafted. The only player with a worse-than-average (higher) K% was Aaron Judge, but his .408 career BABIP indicates how hard he can hit the ball, and his ISO reflects that power.

AAC

Notes:

Jake Scheiner and Kevin Merrell are two of the top hitters in the country, ranking 23rd and 26th, respectively, in wOBA this season. Dwayna Williams-Sutton, drafted in the 21st round out of high school, needs to improve his strikeout rate, but owns the third-highest ISO during his injury-plagued season. Merrell, along with Houston’s Connor Wong, projects to be selected within the first three rounds of this year’s draft.

ACC

Notes:

With 25 players over a .400 wOBA, this was the toughest conference to narrow down to just ten players. A number of players will be taken in this year’s draft, with Louisville’s Brendan McKay possibly going 1st overall. Though he’s not draft-eligible this year, Clemson’s Seth Beer has been impressive during his first two seasons and should not have to wait long to be drafted in 2018.

Conference USA

Notes:

Matt Wallner of Southern Mississippi leads all freshmen in wOBA and ISO, while Aaron Aucker of Middle Tennessee ranks 4th in the nation in wOBA. Brewer Hicklen, who also plays football for UAB, is draft-eligible as a redshirt sophomore, but will likely get more attention in next year’s draft. With 12 stolen bases and seven home runs this year, he has above-average speed and great power potential.

Big 10

Notes:

The Big 10 does not have a ton of talent this year, but two of the conference’s most intriguing prospects are 5’9 sophomores. Marty Costes of Maryland already has 18 homers in his career and holds the 3rd-highest career ISO in the conference. Jawuan Harris of Rutgers is another dual-sport athlete, also playing football for the Scarlet Knights. He has 57 career stolen bases, tied for 12th nationally, and seven long balls this season. If Harris can cut down on his K% (26% for his career), he could garner some attention in next year’s draft.

Big 12

Notes:

At 6’4, Luken Baker is one of the top prospects in the country and leads the conference in career wOBA by a wide margin. Only a sophomore, he is not eligible for the draft this year, but was taken in the 37th round out of high school.

Big West

Notes:

Twenty-seven Big West players were selected in last season’s MLB draft, and Keston Hiura figures to be the first Big West player off the board in this year’s draft. Hiura is a top 20 prospect, and it is easy to see why: the UC Irvine junior owns a top-five wOBA nationally and has walked more often than he has struck out this season.

Pac 12

Notes:

According to Baseball America, Arizona’s JJ Matijevic is the conference’s top prospect, ranking 63rd overall. One of the top freshmen in the nation, Cal’s Andrew Vaughn could become the conference’s top prospect in a couple of years; he leads the Pac 12 in home runs and ranks 4th in wOBA.

SEC

Notes:

Led by Jeren Kendall and Brent Rooker, the SEC has the second-most talent this year, behind only the ACC. Kendall is projected to be a top 15 pick this year, but it is tough to argue that Rooker has not been the more impressive college player. The Mississippi State junior leads the conference in wOBA (2nd in the nation), ISO (1st in the nation), and stolen bases.

The list below combines all of the players from the conference leaderboards for comparison purposes.

It was tough to create a list of players from all other conferences due to differences in the level of competition and quality of pitching, but here is my best attempt.

Notes:

The best player on this list is Jake Burger, who is expected to be a top 20 pick and ranks in the top 10 nationally in ISO and just outside the top 10 in wOBA. Hartford’s Erik Ostberg has only 109 plate appearances, but he leads the nation in wOBA and BABIP and walks 11% more often than he strikes out. Another notable player is New Mexico’s Louis Gonzalez of the Mountain West Conference, whom Baseball America ranks as the 83rd-best prospect.

If you want to view other conferences and players, I have also created a Shiny app where you can access the data:

https://rs868412.shinyapps.io/ncaa_player_stats_career__2017_season/