We investigate 120 years of data on all Olympic athletes, from Athens 1896 to Rio 2016. Our primary aim is to explore whether the Olympics are an “even playing field,” or whether athletes from some countries benefit from systematic biases in their favor.
We begin by reading in the Olympic data. We glimpse() the first few rows to get a sense of what the data contains. We also generate a new column, iso_a3, which contains country codes in the ISO 3166-1 alpha-3 format. These country codes will be used to precisely join additional data on development indicators later in our analysis.
## Rows: 271,116
## Columns: 18
## $ id <int> 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, ...
## $ name <chr> "A Dijiang", "A Lamusi", "Gunnar Nielsen Aaby", "Edgar...
## $ sex <chr> "M", "M", "M", "M", "F", "F", "F", "F", "F", "F", "M",...
## $ age <int> 24, 23, 24, 34, 21, 21, 25, 25, 27, 27, 31, 31, 31, 31...
## $ height <dbl> 180, 170, NA, NA, 185, 185, 185, 185, 185, 185, 188, 1...
## $ weight <dbl> 80, 60, NA, NA, 82, 82, 82, 82, 82, 82, 75, 75, 75, 75...
## $ team <chr> "China", "China", "Denmark", "Denmark/Sweden", "Nether...
## $ noc <chr> "CHN", "CHN", "DEN", "DEN", "NED", "NED", "NED", "NED"...
## $ games <chr> "1992 Summer", "2012 Summer", "1920 Summer", "1900 Sum...
## $ year <dbl> 1992, 2012, 1920, 1900, 1988, 1988, 1992, 1992, 1994, ...
## $ season <chr> "summer", "summer", "summer", "summer", "winter", "win...
## $ city <chr> "Barcelona", "London", "Antwerpen", "Paris", "Calgary"...
## $ sport <chr> "Basketball", "Judo", "Football", "Tug-Of-War", "Speed...
## $ event <chr> "Basketball Men's Basketball", "Judo Men's Extra-Light...
## $ medal <chr> NA, NA, NA, "Gold", NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ region <chr> "China", "China", "Denmark", "Denmark", "Netherlands",...
## $ host_country <chr> "Spain", "UK", "Belgium", "France", "Canada", "Canada"...
## $ iso_a3 <chr> "chn", "chn", "dnk", "dnk", "nld", "nld", "nld", "nld"...
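The code that produced this output is not shown, so the following is only a minimal sketch of how a column like iso_a3 could be derived from noc. It assumes a small hand-written lookup (the hypothetical noc_to_iso vector) holding just the three code pairs visible in the output above, and it uses base R so that it runs without extra packages. The original analysis may well have used a different approach.

```r
# Toy stand-in for the Olympic data: three rows copied from the output above.
olympics <- data.frame(
  name = c("A Dijiang", "A Lamusi", "Gunnar Nielsen Aaby"),
  noc  = c("CHN", "CHN", "DEN"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from IOC country codes (noc) to ISO 3166-1 alpha-3 codes.
# Only the pairs visible in the output above are included (assumption).
noc_to_iso <- c(CHN = "chn", DEN = "dnk", NED = "nld")

# Index the named vector by noc; codes missing from the lookup become NA.
olympics$iso_a3 <- unname(noc_to_iso[olympics$noc])

# Inspect the result, similar in spirit to glimpse().
str(olympics)
```

IOC codes and ISO codes differ for many countries (DEN versus dnk, NED versus nld), which is why a lookup is needed rather than simply lowercasing noc. A full analysis would need a complete mapping, for instance from a dedicated country-code package.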
We begin our analysis by looking for evidence that some countries are systematically favored to perform well at the Olympics.
Perhaps the most basic indication that certain countries may be favored is a simple visualization of the all-time count of medals won by each country.
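As a rough illustration of the all-time count behind such a visualization, the sketch below tallies medal rows per region in base R. The medals data frame is a hypothetical toy table with invented rows, not real medal tallies; only the column names region and medal are taken from the data shown above.

```r
# Hypothetical toy rows (invented for illustration, not real results).
medals <- data.frame(
  region = c("China", "Denmark", "Denmark",
             "Netherlands", "Netherlands", "Netherlands"),
  medal  = c(NA, "Gold", NA, "Silver", "Bronze", NA),
  stringsAsFactors = FALSE
)

# Keep only rows where a medal was won (medal is NA otherwise).
won <- medals[!is.na(medals$medal), ]

# Count medal rows per region and name the count column n_medals.
medal_counts <- as.data.frame(
  table(region = won$region),
  responseName = "n_medals",
  stringsAsFactors = FALSE
)

# Sort from most to fewest medals.
medal_counts <- medal_counts[order(-medal_counts$n_medals), ]
print(medal_counts)
```

Because the data appears to hold one row per athlete and event, a count like this would tally a team medal once for every team member. Whether to deduplicate by event first is a design choice the analysis would need to make explicit.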
Let’s visualize where these countries are.