One responsibility NFL scouts have is to take notes on prospects. Here are some plausible snippets:
Meaning the quarterback feels pressure when there is none.
This cornerback is a catch tackler.
This means the player waits for the ball carrier to initiate contact.
This player helped his backup scheme after he was injured.
Observing a player who has been taken out of a game can still tell you something about his character.
We hypothesize that these scouting reports contain untapped information. Scouts may also assign grades to players based on their observations; a grade can be overall or for a specific attribute such as throwing accuracy. We aim to show that the reports contain information that isn’t captured in the grades. We test this by checking whether the text has predictive power for the prospect’s success.
Many scouts have years of experience and can detect minor differences between players. Analytical methods, by contrast, have focused on predicting player success from statistics, like college stats and combine measurements. Significant advancements happen when different disciplines collaborate; in this case, sports, statistics, and natural language processing.
We combine statistics and scouting reports for two ends:
Our goal is NOT to replace scouts! That would waste information.
We collected data from 2010 to 2015. 2010 was the first year both grades and reports were available for players. 2015 was the last year allowing most players enough time to receive a second contract or be cut from the league.
We used data from the following sources:
The input is the scouting report, which contains the scout’s grade and write-up for each player.
The label the model predicts is the average annual salary of a player’s second contract. If a player does not sign a second contract, the salary label is zero. To get more samples, we treat a 5th-year option as the second contract if the player hasn’t signed anything else. If a player is cut and hasn’t been signed again, we also assign a salary of zero.
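The labeling rules above can be written as a small function. This is only a sketch of the rules as described, not our actual pipeline code; the function name and its parameters are hypothetical, and `None` is assumed to mean "no such contract".

```python
from typing import Optional


def second_contract_label(
    second_contract_aav: Optional[float],
    fifth_year_option_aav: Optional[float] = None,
) -> float:
    """Return the salary label (average annual value) for one player.

    Hypothetical helper: None means the player has no such contract.
    """
    # A signed second contract takes priority.
    if second_contract_aav is not None:
        return second_contract_aav
    # Otherwise fall back to a 5th-year option, if one exists.
    if fifth_year_option_aav is not None:
        return fifth_year_option_aav
    # No second contract at all (for example, cut and never re-signed).
    return 0.0
```

For example, `second_contract_label(None, 3_000_000.0)` falls back to the option value, while `second_contract_label(None)` yields a label of zero.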
From this we extract:
We retrieved salary data from Spotrac.
Note: we correct for inflation by normalizing the salary.
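One way such a normalization could look is to divide each salary by a per-year reference amount, so that contracts signed in different years become comparable. The sketch below assumes the reference is something like the league salary cap for the signing year; that choice, the function name, and the numbers are illustrative assumptions, not a description of the exact method we used.

```python
from typing import Dict


def normalize_salary(
    aav: float, signing_year: int, reference_by_year: Dict[int, float]
) -> float:
    """Express a salary as a fraction of that year's reference amount."""
    return aav / reference_by_year[signing_year]


# Placeholder reference amounts in $ millions (NOT real salary cap figures).
reference_by_year = {2014: 100.0, 2018: 125.0}

# The nominal salaries differ, but both are the same share of their
# year's reference amount, so they receive the same normalized value.
early = normalize_salary(10.0, 2014, reference_by_year)
late = normalize_salary(12.5, 2018, reference_by_year)
```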
We fit a machine learning model and used its feature importances, which measure how useful each feature was in predicting the contract. We then built a word cloud to visualize the terms given to the model according to their feature importance. Each term is colored as follows:
Now let’s take a closer look at the term pocket strength.
For the quarterbacks labeled in blue, the scout wrote something related to the pocket as a strength, usually about how the quarterback is able to escape the pocket or shows poise in it. For example, one scout wrote about Sam Bradford:
[Bradford] Has quick feet to sidestep the rush and the toughness to stand in the pocket and take a hit.
If we only look at quarterbacks earning more than $7 million per year, about twice as many have a mention of the pocket in their strengths as do not. This indicates that this scout is undervaluing good pocket presence.
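The comparison above boils down to flagging reports whose strengths mention the pocket and counting them among the high earners. Here is a minimal sketch of that count; the rows are made-up examples rather than real reports or salaries, and the helper `mentions` is a hypothetical name.

```python
import re


def mentions(term: str, text: str) -> bool:
    """True if `term` appears as a whole word in `text`, ignoring case."""
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Hypothetical rows: (strengths text, second-contract salary in $ millions/year)
quarterbacks = [
    ("Shows poise in the pocket under pressure.", 12.0),
    ("Can escape the pocket and extend plays.", 9.5),
    ("Strong arm with a quick release.", 8.0),
    ("Stands tall in the pocket.", 1.0),
    ("Good athlete with a live arm.", 0.0),
]

THRESHOLD = 7.0  # $7 million per year, as in the text

# Keep only the high earners, then split them by the "pocket" flag.
high_earners = [(text, pay) for text, pay in quarterbacks if pay > THRESHOLD]
with_pocket = sum(mentions("pocket", text) for text, _ in high_earners)
without_pocket = len(high_earners) - with_pocket

print(with_pocket, without_pocket)
```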
For wide receivers we’ll take a closer look at the size weakness.
[Figure: Wide Receiver size]
Not surprisingly, when scouts mention size as a weakness in wide receivers, they say things like “only average size” or “size is a concern in the NFL.” For DeAndre Hopkins they wrote:
Only average size for a starting outside receiver…
The chart shows 28 receivers with size mentioned as a weakness. However, about half of those receivers earned more than $2 million per year on their second contract. This suggests that this scout is discounting undersized receivers too much. In fact, receivers with size listed as a weakness had a better success rate than receivers without.
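The success-rate comparison can be sketched as the share of players in each group whose second-contract salary clears a threshold. The salaries below are invented for illustration and do not reproduce our data; `success_rate` is a hypothetical helper name.

```python
from typing import Sequence


def success_rate(salaries: Sequence[float], threshold: float) -> float:
    """Fraction of salaries strictly above `threshold` (0.0 for an empty group)."""
    if not salaries:
        return 0.0
    return sum(s > threshold for s in salaries) / len(salaries)


THRESHOLD = 2.0  # $2 million per year, as in the text

# Hypothetical second-contract salaries in $ millions/year.
size_weakness = [3.0, 0.0, 5.5, 1.0]        # size listed as a weakness
no_size_weakness = [0.0, 0.0, 2.5, 1.5, 0.0]  # size not listed as a weakness

rate_with = success_rate(size_weakness, THRESHOLD)
rate_without = success_rate(no_size_weakness, THRESHOLD)
```

With these made-up numbers, `rate_with` is 0.5 and `rate_without` is 0.2, which mirrors the kind of gap described above.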
Find the slides we presented at CASSIS.