Reflection 1

The dataset I selected was a listing of the top 200 all-time swims in a variety of Olympic events. This is the link to access the dataset

After downloading the dataset and performing some initial data exploration, I determined the following:

  • The dataset has 13 columns:
    • Event Name
    • Swim time
    • Swim date
    • Event description
    • Team Code
    • Team Name
    • Athlete Full Name
    • Gender
    • Athlete birth date
    • Rank_Order
    • City
    • Country Code
    • Duration (hh:mm:ss:ff)
    • All of these are strings except for Rank_Order, which is an int
  • The following columns hold redundant data:
    • Swim time & Duration (hh:mm:ss:ff)
    • Team Code & Team Name (Both store an identity of the country the athlete competes for)
    • Event description & Gender (The Event description states whether a race is M or F)
  • The data contains records of 26 Olympic events (13 men’s and 13 women’s):
    • 100m, 200m, 400m, 800m, 1500m freestyle
    • 100m, 200m backstroke
    • 100m, 200m breaststroke
    • 100m, 200m butterfly
    • 200m, 400m individual medley
    • With 26 events of 200 entries each, the dataset contains 5,200 records
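
The counts above can be checked with a few lines of Python. This is only a minimal sketch: the column name `Event description` is taken from the list above, but the exact header spelling in the file, the sample rows, and the helper name `summarize` are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def summarize(csv_text, event_column="Event description"):
    """Return (column names, rows per event) for a CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    per_event = Counter(row[event_column] for row in rows)
    return reader.fieldnames, per_event


# Tiny made-up sample, not rows from the real dataset.
sample = """Event description,Rank_Order,Team Code
Men 100 Freestyle,1,AAA
Men 100 Freestyle,2,BBB
Women 100 Freestyle,1,AAA
"""

columns, per_event = summarize(sample)
print(len(columns), "columns")
print(len(per_event), "events,", sum(per_event.values()), "records")

# For the full dataset, 26 events with 200 entries each are expected.
expected_total = 26 * 200
print("expected records in the full dataset:", expected_total)
```

On the real file, the same function could be given the file contents instead of `sample`, and every value in `per_event` should then be 200.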

After this initial exploration, I analyzed the contents of the dataset in more detail and determined:

  • There are 50 different countries with at least 1 swimmer in the top 200 all-time rankings. Each country averages 104 entries on the list (5,200 entries across 50 countries), even though half of the countries have 8 or fewer
    • The US has the highest number of swimmers at 710
    • 9 countries are tied at 1 swimmer
  • 50 Countries have had a top 200 time set on their soil
    • Italy leads all countries with 6 #1 times
    • The US leads in total with 710 top 200 times
  • There are 716 different athletes across all of the top 200 lists
    • Katie Ledecky leads with 218 appearances (4.2% of all and 8.4% of women’s appearances)
    • Michael Phelps leads the men with 86 appearances (1.7% of all and 3.3% of men’s appearances)
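
The gap between the average and the "half of the countries" figure is the usual mean versus median difference in a skewed distribution. Here is a rough sketch of that comparison and of the percentage arithmetic. The team codes are made up, the helper names `country_stats` and `share` are hypothetical, and the per-gender total assumes 13 events of 200 entries each.

```python
from collections import Counter
from statistics import mean, median


def country_stats(team_codes):
    """Count entries per country and return (counts, mean, median)."""
    counts = Counter(team_codes)
    values = list(counts.values())
    return counts, mean(values), median(values)


def share(appearances, total):
    """Percentage of `total` made up by `appearances`, to one decimal."""
    return round(100 * appearances / total, 1)


# Made-up codes: one dominant country pulls the mean above the median.
codes = ["AAA"] * 9 + ["BBB"] * 2 + ["CCC"]
counts, avg, mid = country_stats(codes)
print(counts.most_common(1), "mean:", avg, "median:", mid)

# Shares computed from the appearance counts reported in the text.
total = 26 * 200       # all entries
per_gender = 13 * 200  # entries for one gender (assumed 13 events each)
print("Ledecky:", share(218, total), share(218, per_gender))
print("Phelps:", share(86, total), share(86, per_gender))
```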

After this, I created a heatmap of the countries with swimmers in the top 20 (200 was too much data), showing how often each country appears at each place.

Because the US has so many swimmers in the top 10, the color scale makes it hard to tell the difference between 0 and 1 swimmers.
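
One possible way around this, which I have not applied to the heatmap above, is to color the cells on a logarithmic scale instead of a linear one. The sketch below only illustrates the idea: the rows are made up and the helper names `rank_grid`, `linear_scale`, and `log_scale` are hypothetical.

```python
import math
from collections import Counter


def rank_grid(rows, max_rank=20):
    """Count (team, rank) pairs, keeping only ranks 1..max_rank."""
    grid = Counter()
    for team, rank in rows:
        if rank <= max_rank:
            grid[(team, rank)] += 1
    return grid


def linear_scale(count, max_count):
    """Color intensity in [0, 1], proportional to the count."""
    return count / max_count


def log_scale(count, max_count):
    """Color intensity in [0, 1] using log(1 + count); 0 stays at 0."""
    return math.log1p(count) / math.log1p(max_count)


# Made-up rows: one dominant team, one team with a single top-20 entry.
rows = [("AAA", 1)] * 15 + [("BBB", 1)] + [("BBB", 25)]
grid = rank_grid(rows)
top = max(grid.values())

for team in ("AAA", "BBB", "CCC"):
    count = grid[(team, 1)]
    print(team, count,
          round(linear_scale(count, top), 2),
          round(log_scale(count, top), 2))
```

On the linear scale, a count of 1 sits very close to 0 when the maximum is large, while the log scale pushes it visibly away from 0. The trade-off is that differences between large counts are compressed.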





