I've tweaked this article a bit, and it's always a must read before the baseball season gets going.
One of baseball's most special qualities is how effectively statistics can be used to evaluate performance. Unlike other sports, baseball can basically be boiled down to numerous one-on-one matchups over the course of a game, and that is why baseball is considered such a statistics-driven sport. A batter hitting a home run relies on nothing but himself to score a run, and you won't find that ability to score without any help from your teammates in any other sport.
Baseball has always been a sport where analysis relies heavily on statistics, but recently the effectiveness of the old school statistics has come into question as new metrics are created and studied. Many analysts and writers are making the transition to the newer metrics and stats, but there are still plenty of the old guard who refuse to change their ways. With this in mind, I decided to delve into the debate between the old school and new school and explain it as simply as possible.
WARNING: This may be a struggle to fight through at points. Feel free to take your time.
Old School Pitching Stats
BABIP – Batting Average on Balls in Play
This is not an old school stat, but you need to understand it because I use it to analyze several statistics in this section. BABIP is the batting average on balls hit into the field of play. The historical average for pitchers has always balanced out around .300. Any variance from this number suggests that the pitcher has either been lucky (a lower BABIP, meaning hard-hit balls going right at fielders, etc.) or unlucky (a higher BABIP, meaning more balls sneaking through the infield, etc.). Here is a quick example of why this is important. A pitcher gives up 5 hits on 5 shattered-bat bloopers. Has the pitcher pitched badly? No. The pitcher has pitched well, but just gotten unlucky. Obviously his BABIP is going to be 1.000, and comparing that to the .300 norm shows that he was unlucky. You can look at a pitcher’s BABIP during the season and tell whether he is due for a regression or an improvement as his BABIP returns to the mean of .300.
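For anyone who wants to see the arithmetic, here is a minimal Python sketch. It assumes the commonly cited formula, (H - HR) / (AB - K - HR + SF), which removes home runs and strikeouts because those never reach a fielder. The function name and the season line are made up for illustration.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# The five-blooper example: 5 at-bats, 5 hits, nothing else.
blooper_babip = babip(hits=5, home_runs=0, at_bats=5, strikeouts=0)

# A hypothetical season line allowed by a pitcher (made-up numbers).
season_babip = babip(hits=180, home_runs=20, at_bats=700,
                     strikeouts=150, sac_flies=5)

print(f"Blooper outing BABIP: {blooper_babip:.3f}")
print(f"Season BABIP:         {season_babip:.3f}")
print(f"Gap from .300 norm:   {season_babip - 0.300:+.3f}")
```

A gap well above or below zero in the last line is the signal to expect the pitcher's results to drift back toward the norm.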
Pitcher Win-Loss Record
The biggest battles in the baseball community have been over Win-Loss record. It defies logic, but Win-Loss record is the most overused stat in baseball. It doubles as the most overrated. Wins measure a team’s success, not an individual’s performance.
I won’t go too deeply because there is a fantastic article written by Keith Law of ESPN’s Scouts Inc that covers this exact issue.
Basically, Law examines how much effect a starting pitcher has on, and how much responsibility he bears for, a team win in any particular game. The answer is less than 30%. This quote from the article sums it up perfectly.
If you tell me a starting pitcher went 15-10, the only thing I can tell you is that he appeared in 25 games that year. I don't know if he pitched well, if he pitched really well but got lousy run support, or if he pitched poorly but played for an offensive powerhouse with good defenders.
Pitcher wins apply a team outcome to an individual player in a sport where no one individual player can win a game.
So why on Earth would you use Win-Loss record to evaluate a pitcher's performance?
ERA – Number of earned runs per 9 innings
It's not a bad stat by itself, but there are two problems with ERA. Both involve an inability to separate what the pitcher actually controls from what is out of his control.
1. ERA has no way of accounting for luck or the contributions of the defenders in the field. The pitcher could be benefiting from a low BABIP and good defense or vice versa. The effect BABIP has on ERA is by far the biggest issue because BABIP can completely skew the resulting ERA.
2. Using errors to decide whether a run is unearned or earned is no longer the best way to determine the defensive value behind the pitcher (explained later). It is based upon an official scorer’s decision that can often vary depending on whether the hitter is on the home team or not. Official scorers are more likely to give home batters credit for hits while the opposite is true for visiting batters.
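To show how much one scoring decision can move ERA, here is a rough Python sketch. The start, the run totals, and the variable names are hypothetical, and it makes the simplifying assumption that every run scoring after the borderline play becomes unearned if the play is ruled an error.

```python
def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per nine innings."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


# Hypothetical start: 7 innings, 3 runs cross the plate.
# Two of them score after a borderline play the official scorer must rule on.
innings = 7
runs_allowed = 3
runs_after_borderline_play = 2

# Ruled a hit: all three runs count as earned.
era_if_ruled_hit = era(runs_allowed, innings)

# Ruled an error: assume the two later runs become unearned.
era_if_ruled_error = era(runs_allowed - runs_after_borderline_play, innings)

print(f"ERA if ruled a hit:    {era_if_ruled_hit:.2f}")
print(f"ERA if ruled an error: {era_if_ruled_error:.2f}")
```

The pitcher threw exactly the same pitches in both cases; only the scorer's call changed.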
WHIP – (Walks + Hits) / Innings Pitched
A better determinant than ERA, but still flawed. WHIP still doesn’t account for the luck of balls hit into play. Also, the denominator should be at-bats and not innings, because an inning has no fixed length in batters. A pitcher could give up one hit while facing four batters (a hit and three outs), or he could face six batters (a hit, two errors, and three outs). The pitcher's WHIP in each scenario would be the same, but the pitcher was more effective in the second scenario (one hit per six batters as opposed to one hit per four).
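The two innings described above can be put into numbers with a short Python sketch. The per-batter rate here is my own illustrative helper (not an official stat), and it uses batters faced as the denominator to match the "one hit per six batters" framing.

```python
def whip(walks, hits, innings_pitched):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings_pitched


def baserunners_per_batter(walks, hits, batters_faced):
    """Illustrative alternative: walks plus hits per batter faced."""
    return (walks + hits) / batters_faced


# Scenario 1: one hit and three outs (4 batters in 1 inning).
whip_1 = whip(walks=0, hits=1, innings_pitched=1)
rate_1 = baserunners_per_batter(walks=0, hits=1, batters_faced=4)

# Scenario 2: one hit, two errors, and three outs (6 batters in 1 inning).
whip_2 = whip(walks=0, hits=1, innings_pitched=1)
rate_2 = baserunners_per_batter(walks=0, hits=1, batters_faced=6)

print(f"WHIP:       {whip_1:.2f} vs {whip_2:.2f}")
print(f"Per batter: {rate_1:.3f} vs {rate_2:.3f}")
```

WHIP cannot tell the two innings apart, while the per-batter rate shows the second one was the better performance.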
QS – Quality Starts
At least six innings with three earned runs or fewer.
The issues with this stat are as follows.
1. Six innings and three earned runs measures out to a 4.50 ERA.
2. Four earned runs in eight innings is not a QS, but it results in the same ERA (4.50 ERA) as six innings with three earned runs and saves the bullpen two extra innings.
3. Worst of all, a complete game four earned run performance is not a QS even though it results in a 4.00 ERA and saves the bullpen three more innings.
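The three points above are easy to check with a little Python. This is just a sketch of the rule as I've described it (at least six innings, no more than three earned runs); the function names are mine.

```python
def era(earned_runs, innings_pitched):
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched


def is_quality_start(innings_pitched, earned_runs):
    """Quality start: at least 6 innings and no more than 3 earned runs."""
    return innings_pitched >= 6 and earned_runs <= 3


# (innings pitched, earned runs) for the three starts discussed above.
starts = [(6, 3), (8, 4), (9, 4)]

for innings, earned in starts:
    label = "QS" if is_quality_start(innings, earned) else "not a QS"
    print(f"{innings} IP, {earned} ER -> ERA {era(earned, innings):.2f}, {label}")
```

The only start that earns the label is the one that asked the most of the bullpen.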
New School Pitching Metrics
FIP - Fielding Independent Pitching
FIP is basically a metric designed to measure the factors that the pitcher can control. The key elements of FIP are walks, strikeouts, and home runs. The result is then scaled to look similar to ERA, in effect assuming league-average luck (about a .300 BABIP) on balls in play. As with most stats, FIP is not perfect. My problem with FIP is that it doesn’t account for how hard balls are hit. For example, a double off the wall is not considered any differently than an infield hit. Regardless, FIP is still one of the best ways to measure a pitcher’s performance.
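Here is a rough Python sketch of the commonly published FIP formula, (13*HR + 3*(BB + HBP) - 2*K) / IP plus a constant. The constant is recalculated for each league and season to put FIP on the ERA scale; the 3.10 default below is only an assumed ballpark value, and the pitcher's line is made up.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings_pitched,
        constant=3.10):
    """Fielding Independent Pitching.

    `constant` rescales the result to look like ERA. The real value changes
    by league and season; 3.10 is an assumed placeholder.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    core = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return core / innings_pitched + constant


# Hypothetical season: 200 IP, 20 HR, 50 BB, 5 HBP, 200 K.
example_fip = fip(home_runs=20, walks=50, hit_by_pitch=5,
                  strikeouts=200, innings_pitched=200)
print(f"FIP: {example_fip:.2f}")
```

Notice that hits allowed on balls in play never appear as an input, which is exactly why a double off the wall and an infield single look the same to FIP.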
WAR – Wins Above Replacement
The best all-around tool for evaluating pitchers. Using innings pitched, FIP, and adjustments for ballpark, WAR measures a pitcher's overall value to his team. It is still flawed because it does not take innings into account as much as I would like and because it is based on FIP, but it is still the best we have so far.
Go here (scroll to the bottom) to read the seven part explanation of pitcher WAR.
Old School Hitting Stats
RBI
This one will hurt some people’s feelings, but RBI is another useless stat. It is too reliant on the performance of other players on the team to be an adequate measure of individual performance. In order to pile on the RBIs, a hitter has to hope his teammates have success in front of him. Without that, he will struggle in the RBI category. The hitter himself has no control over his teammates.
Extreme Example: A batter could come up to the plate 100 times in a row with a man on third. He hits 100 weak ground balls to second base and gets 100 RBI. Another batter could come up 100 times with the bases empty and hit 100 triples. He would have no RBIs. Which hitter would have performed better? Obviously, the answer is the batter who hit 100 triples.
Runs
Same premise as RBI. The batter who gets on base has no control over whether the other hitters in the lineup drive him in.
On Base Percentage – (Walks+Hits)/Plate appearances
Not to be overly simplistic, but OBP is the most important stat in hitting because the basic goal of hitting is to avoid making outs. Statistical analysis has measured the impact of each of the simple statistics on a team's likelihood of scoring runs. OBP had a greater correlation with a team scoring runs than the other "simple" rate stats (Batting Average and Slugging Percentage).
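To see why OBP tracks outs better than batting average does, here is a small Python sketch using the simplified formula from the heading, (Walks + Hits) / Plate Appearances. It assumes plate appearances are just at-bats plus walks (ignoring hit-by-pitches and sacrifices), and both hitters are hypothetical.

```python
def batting_average(hits, at_bats):
    """Hits per at-bat."""
    return hits / at_bats


def on_base_percentage(hits, walks, plate_appearances):
    """Simplified OBP: (walks + hits) / plate appearances."""
    return (walks + hits) / plate_appearances


# Two hypothetical hitters with 580 plate appearances each.
# Simplifying assumption: plate appearances = at-bats + walks.
hitter_a = {"hits": 150, "at_bats": 550, "walks": 30}
hitter_b = {"hits": 135, "at_bats": 495, "walks": 85}

for name, h in (("A", hitter_a), ("B", hitter_b)):
    pa = h["at_bats"] + h["walks"]
    avg = batting_average(h["hits"], h["at_bats"])
    obp = on_base_percentage(h["hits"], h["walks"], pa)
    outs = pa - h["hits"] - h["walks"]
    print(f"Hitter {name}: AVG {avg:.3f}, OBP {obp:.3f}, outs made {outs}")
```

Both hitters have the same batting average, but Hitter B makes 40 fewer outs in the same number of trips to the plate.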
New Age Hitting Metrics
OPS -- On Base Percentage + Slugging Percentage
Easiest of the “new” stats because it only requires you to add On Base Percentage and Slugging Percentage. The downside of OPS is that it does not give appropriate weight to OBP and also overvalues Slugging Percentage.
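A quick Python sketch makes the weighting problem concrete. Slugging percentage is total bases divided by at-bats, and OPS just adds it to OBP; the two stat lines below are hypothetical.

```python
def slugging(singles, doubles, triples, home_runs, at_bats):
    """Slugging percentage: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp, slg):
    """On-base plus slugging: a straight, unweighted sum."""
    return obp + slg


# Hypothetical on-base specialist: 120 singles, 20 doubles, 10 HR in 500 AB.
slg_a = slugging(singles=120, doubles=20, triples=0, home_runs=10, at_bats=500)
ops_a = ops(obp=0.400, slg=slg_a)

# Hypothetical slugger: 70 singles, 30 doubles, 30 HR in 500 AB.
slg_b = slugging(singles=70, doubles=30, triples=0, home_runs=30, at_bats=500)
ops_b = ops(obp=0.300, slg=slg_b)

print(f"Hitter A: OBP .400, SLG {slg_a:.3f}, OPS {ops_a:.3f}")
print(f"Hitter B: OBP .300, SLG {slg_b:.3f}, OPS {ops_b:.3f}")
```

OPS rates these two hitters as equals even though one reaches base 100 points more often, which is the sense in which it shortchanges OBP.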
Unless you are a statistician or math whiz, you'll just have to trust the math that goes into these next stats. They are mainstream enough for there not to be any issues with incorrect equations, etc.
wOBA – Weighted On-Base Average
My favorite pure batting metric. Basically, it is a metric designed to measure and weight a hitter’s performance in all aspects and is scaled to look like OBP. This article explains wOBA better than I ever could.
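If you want a feel for the idea without the full derivation, here is a loose Python sketch. The real weights are recalculated every season; the ones below are round illustrative numbers I picked to show the shape of the calculation, not official values for any year. The sketch also ignores intentional walks, and the stat line is made up.

```python
# Illustrative weights only (assumed, not official for any season).
WEIGHTS = {
    "walks": 0.70,
    "hit_by_pitch": 0.70,
    "singles": 0.90,
    "doubles": 1.25,
    "triples": 1.60,
    "home_runs": 2.00,
}


def woba(events, at_bats, sac_flies=0, weights=WEIGHTS):
    """Weighted on-base average: weighted events per opportunity.

    Denominator here is AB + BB + SF + HBP (intentional walks ignored).
    """
    numerator = sum(weights[name] * count for name, count in events.items())
    denominator = (at_bats + events.get("walks", 0) + sac_flies
                   + events.get("hit_by_pitch", 0))
    return numerator / denominator


# Hypothetical season line.
season = {"walks": 60, "hit_by_pitch": 5, "singles": 100,
          "doubles": 30, "triples": 3, "home_runs": 25}

example_woba = woba(season, at_bats=550, sac_flies=5)
print(f"wOBA: {example_woba:.3f}")
```

The point is that each way of reaching base gets its own value, instead of every hit counting the same (batting average) or every base counting the same (slugging).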
Hitter WAR -- Wins Above Replacement
The most complete indicator of a player’s worth. WAR calculates how many wins a player is worth to a team over the generic replacement-level player. WAR uses wOBA for the hitter’s offensive value, adds UZR for defensive value, adds a replacement-level calculation, and then adds or subtracts for positional adjustments. The best example of how positional adjustments work is that a catcher gets credit for playing a position where good hitters are rare, while a first baseman is not given much credit for playing a position loaded with great hitters.
Links to the explanations
Defensive Stats
Errors and Fielding %
Fielding percentage has a major flaw in that it doesn’t measure how many balls a fielder's range allows him to reach. This is how Manny Ramirez and Pat Burrell end up looking like good fielders by fielding percentage. Fielding percentage doesn’t penalize big, slow outfielders or middle infielders with very little range. On the other hand, a player like Troy Tulowitzki will get to tons of balls because of his range, but he may make more errors since he puts himself into position to make tough plays more often.
Errors are also based on the subjective viewpoint of the official scorer for the game. Many times the decision of hit or error depends on whether you are the home team or the visiting team.
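Here is a small Python sketch of the range problem. Fielding percentage is plays made divided by chances, where a chance only counts if the fielder actually got to the ball. The two fielders and all of their numbers are hypothetical.

```python
def fielding_percentage(plays_made, errors):
    """(Putouts + assists) / (putouts + assists + errors)."""
    return plays_made / (plays_made + errors)


# Suppose 500 balls are hit toward each fielder's area (made-up numbers).
balls_hit_nearby = 500

# Limited-range fielder: reaches 400 balls, botches 4 of them.
slow_plays, slow_errors = 396, 4

# Rangy fielder: reaches 460 balls, botches 12 of them.
rangy_plays, rangy_errors = 448, 12

slow_fpct = fielding_percentage(slow_plays, slow_errors)
rangy_fpct = fielding_percentage(rangy_plays, rangy_errors)

print(f"Limited range: {slow_fpct:.3f} fielding pct, {slow_plays} plays made")
print(f"Rangy fielder: {rangy_fpct:.3f} fielding pct, {rangy_plays} plays made")
print(f"Balls never reached: {balls_hit_nearby - 400} vs {balls_hit_nearby - 460}")
```

The limited-range fielder wins on fielding percentage while turning 52 fewer batted balls into outs, because the balls he never reached don't count against him at all.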
UZR -- Ultimate Zone Rating
A very complicated stat, but I’ll try to go for a simple summation. Basically, UZR divides the field into zones. Using those zones, the fielder is evaluated not only on whether he made the play but also on how far he had to go to get to the ball. Simply put, it evaluates range as well as sure-handedness.
In summary, the biggest issue with many of the old school statistics is that they allow too much mixing of team factors into stats that are supposed to measure individual performance. In order to properly evaluate players, there has to be a separation between team and individual factors. While I have given you what I believe are the best stats to use for measuring individual performance, the fact remains that none of them are perfect. The best way to evaluate players will always be by using multiple statistics and taking a totality of the circumstances approach.
Where the hell did BABIP come from? Never heard of such. While I agree W/L stats can be misleading, I believe ERA is the ultimate stat for a pitcher. I am old school and don't give a shit what these ridiculous new stats stand for. One of YOUR negatives on ERA is luck? Come on, Doe, luck is a part of the game. It's part of every game.
John, while I agree that most of the stats we use today are outdated, I think runs scored should stay the same. Yes, you rely heavily on the guys behind you to drive you in, but what if you are a guy that doesn't rely on others to do that? Guys like Ellsbury and Crawford create their own runs. They hit doubles/triples with less than 2 outs, they steal bases and they can hit the ball out of the park. So they do rely on others, but they put themselves in situations to score better than anyone else. I don't know how to separate those guys from guys like Big Papi who are only going base to base no matter where the ball is hit.
BABIP isn't very new. It's pretty simple.
Yes, luck is part of the game, but it is something that good individual measures do their best to remove. There are certain things specifically under the pitcher's control. Thus, those should be the most important measures for that pitcher's performance. ERA isn't terrible like some of the stats, but FIP and WAR are better measures.
There are better measures that weigh on an individual player though. Sure, those things you listed help, but they aren't enough to have an effect that would justify the drawbacks (relying on teammates).
When you take steals and caught stealing into account, steal attempts are actually more likely to decrease a team's chance of scoring than help.
There are actually measures for baserunning and speed as a whole. However, the fact of the matter is that OBP is much more directly related to a team's scoring chances.