Wednesday, November 3, 2010

A Rundown of Baseball Statistics and the Sabermetrics Movement

One of baseball's most special qualities is how effectively statistics can be used to evaluate performance. Unlike other sports, baseball can basically be boiled down to numerous one-on-one match-ups throughout the course of a game, and that is why it is considered such a statistics-driven sport. A batter hitting a home run relies on nothing but himself to score a run, and you won't find that ability to score without any help from your teammates in any other sport.

Baseball has always been a sport where analysis relies heavily on statistics, but recently the effectiveness of the old school statistics has come into question as new metrics are created and studied. Many analysts and writers are making the transition to the newer metrics, but there are still plenty of the old guard who refuse to make the change. With this in mind, I decided to delve into the debate between old school and new school and explain it as simply as possible.


WARNING: This may be a struggle to fight through at points. Feel free to take your time.

Old School Pitching Stats

BABIP – Batting Average on Balls in Play

This is not an old school stat, but you need to understand it because I use it in my analysis of the stats in this section. BABIP is the batting average on balls hit into the field of play. The historical average for pitchers has always settled around .300. A large deviation from this number suggests that the pitcher has been either lucky (a lower BABIP, meaning hard-hit balls going right at fielders, etc.) or unlucky (a higher BABIP, meaning more balls sneaking through the infield, etc.). Here is a quick example of why this is important. A pitcher gives up 5 hits on 5 shattered-bat bloopers. Has the pitcher pitched badly? No. He has pitched well but gotten unlucky. His BABIP over that stretch is 1.000, and comparing that to the .300 norm shows how unlucky he was. You can look at a pitcher’s BABIP during the season and tell whether he is due for a regression or an improvement as his BABIP comes back toward the mean of .300.
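The BABIP arithmetic can be sketched in a few lines of Python. This uses the commonly cited formula (H - HR) / (AB - K - HR + SF). The function name and the season line are made up for illustration, and .300 is just the rough norm mentioned above.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play

LEAGUE_BABIP = 0.300  # rough historical norm, not an exact league figure

# The five-blooper example: 5 at-bats, 5 hits, nothing else.
bloop_babip = babip(hits=5, home_runs=0, at_bats=5, strikeouts=0)

# A hypothetical season line allowed by a pitcher (invented numbers).
season_babip = babip(hits=180, home_runs=20, at_bats=750,
                     strikeouts=150, sac_flies=5)

for label, value in [("bloopers", bloop_babip), ("season", season_babip)]:
    verdict = "unlucky so far" if value > LEAGUE_BABIP else "lucky so far"
    print(f"{label}: BABIP {value:.3f} ({verdict})")
```

A real analysis would want a bigger sample and some tolerance band around .300 before calling anyone lucky or unlucky.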


Pitcher Win-Loss Record

The biggest battles in the baseball community have been over Win-Loss record. Personally, I don't know how this can even be a discussion because Win-Loss record is the most overrated/overused stat in baseball. Wins measure a team’s success, not an individual’s performance.

I won’t go too deeply because there is a fantastic article written by Keith Law of ESPN’s Scouts Inc that covers this exact issue.

Basically, Law examines how much effect/responsibility a starting pitcher has on a team win in any particular game. The answer is less than 30%. This quote from the article sums it up perfectly.

If you tell me a starting pitcher went 15-10, the only thing I can tell you is that he appeared in 25 games that year. I don't know if he pitched well, if he pitched really well but got lousy run support, or if he pitched poorly but played for an offensive powerhouse with good defenders.
Pitcher wins apply a team outcome to an individual player in a sport where no one individual player can win a game.

So why on earth would you use Win-Loss record to evaluate a pitcher's performance?


ERA – Number of earned runs per 9 innings

There are two problems with ERA. Both involve an inability to separate what the pitcher actually controls from what is out of his control.
1. ERA has no way of accounting for the luck involved. The pitcher could be benefiting from a low BABIP and good defense, or suffering from the opposite. The effect BABIP has on ERA is by far the bigger issue because BABIP can completely skew the resulting ERA.
2. Using errors to decide whether a run is earned or unearned is no longer the best way to determine the defensive value behind the pitcher (explained later). It is based on an official scorer’s decision, which can often vary depending on whether the hitter is on the home team or not. Official scorers are more likely to give home batters credit for hits, while the opposite is true for visiting batters.
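To see how much a single scoring decision can move ERA, here is a rough sketch. It assumes the standard formula of 9 times earned runs divided by innings pitched. The outing is invented, and it simply assumes that two of the runs become unearned if a borderline play is ruled an error.

```python
def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per 9 innings."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched

# Hypothetical outing: 6 innings, 4 runs allowed. Assume 2 of those runs
# hinge on one borderline play the scorer could call a hit or an error.
innings = 6
era_if_scored_hit = era(4, innings)    # all 4 runs count as earned
era_if_scored_error = era(2, innings)  # the 2 later runs become unearned

print(f"Ruled a hit:    {era_if_scored_hit:.2f} ERA")
print(f"Ruled an error: {era_if_scored_error:.2f} ERA")
```

The pitcher threw exactly the same pitches in both cases; only the scorer's ruling changed.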


WHIP – (Walks + Hits)/Innings Pitched

A better determinant than ERA, but still flawed. WHIP still doesn’t account for the luck of balls hit into play. Worse, it should be measured per batter faced. An inning has no fixed number of batters. A pitcher could give up a hit and get 3 outs, which is one hit per 4 batters. Or he could give up a hit, have 2 errors committed behind him, and get 3 outs, which is one hit per 6 batters. WHIP treats both innings the same.
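The two innings described above can be compared directly. This is only a sketch: the per-batter rate is a hypothetical alternative for illustration, not an established stat, and the function names are my own.

```python
def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings

def baserunners_per_batter(walks, hits, batters_faced):
    """Hypothetical per-batter version of the same idea."""
    return (walks + hits) / batters_faced

# Inning A: 1 hit and 3 outs, so 4 batters faced.
whip_a = whip(walks=0, hits=1, innings=1)
rate_a = baserunners_per_batter(walks=0, hits=1, batters_faced=4)

# Inning B: 1 hit, 2 errors, and 3 outs, so 6 batters faced.
whip_b = whip(walks=0, hits=1, innings=1)
rate_b = baserunners_per_batter(walks=0, hits=1, batters_faced=6)

print(f"Inning A: WHIP {whip_a:.2f}, per batter {rate_a:.3f}")
print(f"Inning B: WHIP {whip_b:.2f}, per batter {rate_b:.3f}")
```

WHIP comes out identical for both innings, while the per-batter rate shows the pitcher in inning B retired a larger share of the hitters he faced.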


QS – Quality Starts 
At least 6 innings pitched with 3 earned runs or fewer.

The issues with this stat are as follows.
1. The minimum QS, 6 innings and 3 earned runs, works out to a 4.50 ERA, which is hardly a quality mark.
2. 4 earned runs in 8 innings is not a QS, but it results in the same 4.50 ERA as 6 innings and 3 earned runs and saves the bullpen two extra innings.
3. Worst of all, a complete game with 4 earned runs is not a QS even though it results in a 4.00 ERA and saves the bullpen three more innings.
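The three starts in the list above can be checked with a quick sketch. It assumes the usual definition of a quality start (at least 6 innings, no more than 3 earned runs) and the standard ERA formula; the function names are my own.

```python
def is_quality_start(innings, earned_runs):
    """Quality start: at least 6 innings and no more than 3 earned runs."""
    return innings >= 6 and earned_runs <= 3

def era(earned_runs, innings):
    """Earned runs allowed per 9 innings."""
    return 9 * earned_runs / innings

# (innings, earned runs) for the three starts discussed above.
starts = [(6, 3), (8, 4), (9, 4)]

results = [(ip, er, is_quality_start(ip, er), era(er, ip)) for ip, er in starts]
for ip, er, qs, start_era in results:
    print(f"{ip} IP, {er} ER: QS={qs}, ERA={start_era:.2f}")
```

Only the shortest outing earns the QS, even though the other two are as good or better by ERA and take more innings off the bullpen.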


New School Pitching Metrics


FIP - Fielding Independent Pitching

FIP is basically a metric designed to measure only the factors that the pitcher can control. The key elements of FIP are walks, strikeouts, and home runs. The result is then scaled to look similar to ERA, which in effect assumes a roughly league-average BABIP of around .300 on everything else. As with most stats, FIP is not perfect. My problem with FIP is that it doesn’t account for how hard balls are hit. For example, a double off the wall is not considered any differently than an infield hit. Regardless, FIP is still one of the best ways to measure a pitcher’s performance.
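For the curious, here is a minimal sketch of the commonly published FIP formula: (13*HR + 3*(BB + HBP) - 2*K) / IP plus a league constant. The constant changes every season; the 3.10 used here is an assumed placeholder, and the stat line is invented.

```python
ASSUMED_FIP_CONSTANT = 3.10  # placeholder; the real constant varies by season

def fip(home_runs, walks, hit_by_pitch, strikeouts, innings,
        constant=ASSUMED_FIP_CONSTANT):
    """Fielding Independent Pitching, scaled to look like ERA."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    raw = (13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts) / innings
    return raw + constant

# Hypothetical season: 200 IP, 20 HR, 50 BB, 5 HBP, 200 K.
example_fip = fip(home_runs=20, walks=50, hit_by_pitch=5,
                  strikeouts=200, innings=200)
print(f"FIP: {example_fip:.2f}")
```

Notice that hits on balls in play never enter the function. That is the whole point of the metric, and it is also why a double off the wall and an infield single look the same to FIP.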


WAR – Wins Above Replacement

The best all-around tool for evaluating pitchers. Using innings pitched, FIP, and adjustments for ballpark, WAR measures a pitcher's overall value to his team. It is still flawed because it doesn't take innings into account as much as I would like and because it is based on FIP, but it is the best we have so far.

Go here (scroll to the bottom) to read the seven part explanation of pitcher WAR. 

Old School Hitting Stats

RBI

This one will hurt some people’s feelings, but RBI is another useless stat. RBIs are too reliant on the performance of other players on the team to be an adequate measure of individual performance.
Extreme Example: A batter could come up to the plate 100 times in a row with a man on third. He hits 100 weak ground balls to second base and gets 100 RBI. Another batter could come up 100 times with the bases empty and hit 100 triples. He would have no RBIs. Which hitter would have performed better? Obviously, the answer is the batter who hit 100 triples. 


Runs

Same premise as RBI. The batter who gets on base has no control over whether the other hitters in the lineup drive him in. 


On Base Percentage – (Walks+Hits)/Plate appearances

Not to be overly simplistic, but OBP is the most important stat in hitting because the basic goal of hitting is to avoid making outs. Statistical analysis has borne this out by measuring how much different hitting outcomes increase the likelihood that a team will score a run. OBP has a greater correlation with team run scoring than the other "simple" rate stats (Batting Average and Slugging Percentage).
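The formula in the heading is a simplification. The official version, as commonly published, also counts hit-by-pitches and uses at-bats, walks, hit-by-pitches, and sacrifice flies in the denominator. Here is a small sketch with an invented stat line; the function name is my own.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    opportunities = at_bats + walks + hit_by_pitch + sac_flies
    if opportunities <= 0:
        raise ValueError("no qualifying plate appearances")
    return (hits + walks + hit_by_pitch) / opportunities

# Hypothetical season: 150 H, 70 BB, 5 HBP, 500 AB, 5 SF.
example_obp = on_base_percentage(hits=150, walks=70, hit_by_pitch=5,
                                 at_bats=500, sac_flies=5)
print(f"OBP: {example_obp:.3f}")
```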


New Age Hitting Metrics

OPS -- On Base Percentage + Slugging Percentage

Easiest of the “new” stats because it only requires you to add On Base Percentage and Slugging Percentage. The downside of OPS is that it treats a point of OBP and a point of Slugging Percentage as equal, which undervalues OBP and overvalues Slugging Percentage.
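A quick sketch shows the weighting problem. The two hitters below are hypothetical, and slugging percentage is assumed to be total bases divided by at-bats, which is the standard definition.

```python
def slugging(singles, doubles, triples, home_runs, at_bats):
    """SLG = total bases / at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

def ops(obp, slg):
    """OPS is simply OBP plus SLG, with no weighting."""
    return obp + slg

# Hitter A gets on base a lot; hitter B trades on-base skill for power.
hitter_a = {"obp": 0.400, "slg": slugging(120, 20, 0, 10, 500)}  # SLG .400
hitter_b = {"obp": 0.300, "slg": slugging(70, 30, 0, 30, 500)}   # SLG .500

ops_a = ops(hitter_a["obp"], hitter_a["slg"])
ops_b = ops(hitter_b["obp"], hitter_b["slg"])
print(f"A: OPS {ops_a:.3f}   B: OPS {ops_b:.3f}")
```

Both hitters land on the same OPS even though hitter A makes far fewer outs, which is the sense in which OPS shortchanges OBP.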

Unless you are a statistician or math whiz, you'll just have to trust the math that goes into these next stats. They are mainstream enough for there not to be any issues with incorrect equations, etc.

wOBA – Weighted On-Base Average

My favorite pure batting metric. Basically, it is a metric designed to measure and weight every aspect of a hitter’s performance. It is scaled to look like OBP. This article explains wOBA better than I ever could.
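If you want a feel for the shape of the calculation, here is a loose sketch. The real weights are recalculated every season, so the numbers in ILLUSTRATIVE_WEIGHTS are placeholders I picked to be in a plausible range, not official values, and the stat line is invented.

```python
# Placeholder weights for illustration only; real weights change yearly.
ILLUSTRATIVE_WEIGHTS = {
    "walk": 0.70, "hit_by_pitch": 0.73, "single": 0.89,
    "double": 1.27, "triple": 1.61, "home_run": 2.07,
}

def woba(events, at_bats, sac_flies, weights=ILLUSTRATIVE_WEIGHTS):
    """Weighted sum of offensive events per plate-appearance opportunity.

    `events` maps event names to counts; "walk" means unintentional walks.
    """
    numerator = sum(weights[name] * count for name, count in events.items())
    denominator = at_bats + events["walk"] + events["hit_by_pitch"] + sac_flies
    return numerator / denominator

# Hypothetical season line.
season = {"walk": 60, "hit_by_pitch": 5, "single": 100,
          "double": 30, "triple": 5, "home_run": 25}
example_woba = woba(season, at_bats=550, sac_flies=5)
print(f"wOBA: {example_woba:.3f}")
```

The key idea is that each outcome is worth a different amount, unlike OBP (where everything counts as 1) or Slugging Percentage (where a home run counts as exactly 4 singles).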


Hitter WAR -- Wins Above Replacement

The most complete indicator of a player’s worth. WAR calculates how many wins a player is worth to a team over a generic replacement-level player. WAR uses wOBA for the hitter’s offensive value, adds UZR for defensive value, adds a replacement level calculation, and then adds or subtracts for positional adjustments. The best example of how positional adjustments work is that a catcher gets credit for playing a position where good hitters are rare, while a first baseman is not given much credit for playing a position loaded with great hitters.
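The pieces described above can be strung together in a rough sketch. Everything numeric here is an assumption for illustration: the wOBA scale of 1.15, the league wOBA of .330, the rule of thumb of about 10 runs per win, and the hypothetical player's inputs. Real WAR implementations differ in their details.

```python
ASSUMED_WOBA_SCALE = 1.15    # assumption: converts wOBA points to runs
ASSUMED_RUNS_PER_WIN = 10.0  # assumption: common rule of thumb

def batting_runs(woba, league_woba, plate_appearances,
                 scale=ASSUMED_WOBA_SCALE):
    """Runs above an average hitter, derived from wOBA."""
    return (woba - league_woba) / scale * plate_appearances

def hitter_war(bat_runs, uzr_runs, positional_runs, replacement_runs,
               runs_per_win=ASSUMED_RUNS_PER_WIN):
    """Sum the run components and convert runs to wins."""
    total_runs = bat_runs + uzr_runs + positional_runs + replacement_runs
    return total_runs / runs_per_win

# Hypothetical player: .375 wOBA in 650 PA, +5 UZR, +2.5 positional
# adjustment, +20 runs for the gap between average and replacement level.
bat = batting_runs(woba=0.375, league_woba=0.330, plate_appearances=650)
example_war = hitter_war(bat, uzr_runs=5.0, positional_runs=2.5,
                         replacement_runs=20.0)
print(f"Batting runs: {bat:.1f}, WAR: {example_war:.1f}")
```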

Links to the explanations


Defensive Stats

Errors and Fielding %

Fielding percentage has a major flaw in that it doesn’t measure how many balls a fielder gets to. This is how Manny Ramirez and Pat Burrell end up looking like good fielders using Fielding %. Fielding percentage doesn’t account for big slow outfielders or middle infielders with very little range. On the other hand, a player like Troy Tulowitzki will get to tons of balls because of his range, but he may make more errors since he puts himself into position to make tough plays more often.

Errors are also based on the subjective viewpoint of the official scorer for the game. Many times the decision of hit or error depends on whether you are the home team or the visitors.
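The range problem is easy to show with numbers. This sketch assumes the standard formula of (putouts + assists) / (putouts + assists + errors); the two shortstops and their totals are invented.

```python
def fielding_percentage(putouts, assists, errors):
    """Share of fielding chances handled without an error."""
    chances = putouts + assists + errors
    if chances <= 0:
        raise ValueError("no fielding chances")
    return (putouts + assists) / chances

# Hypothetical shortstops over the same number of innings.
# Limited range: reaches fewer balls but rarely boots one.
limited = {"putouts": 150, "assists": 242, "errors": 8}
# Great range: reaches 100 more balls and makes a few more errors.
rangy = {"putouts": 180, "assists": 305, "errors": 15}

fpct_limited = fielding_percentage(**limited)
fpct_rangy = fielding_percentage(**rangy)
plays_limited = limited["putouts"] + limited["assists"]
plays_rangy = rangy["putouts"] + rangy["assists"]

print(f"Limited range: {fpct_limited:.3f}, {plays_limited} plays made")
print(f"Great range:   {fpct_rangy:.3f}, {plays_rangy} plays made")
```

The limited-range fielder wins on Fielding % while the rangy one turns far more batted balls into outs.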


UZR -- Ultimate Zone Rating

A very complicated stat, but I’ll try to give a simple summary. Basically, UZR divides the field into zones. Using those zones, the fielder is evaluated not only on whether he made the play but also on how far he had to go to get to the ball. Simply put, it evaluates range as well as sure-handedness.



In summary, the biggest issue with many of the old school statistics is that they allow too much mixing of team factors into stats that are supposed to measure individual performance. In order to properly evaluate players, there has to be a separation between team and individual factors. While I have given you what I believe are the best stats to use for measuring individual performance, the fact remains that none of them are perfect. This is most true of pitching stats, since there really is no great metric for pitcher evaluation. Just make sure wins and losses never play any role.
