# Standard libraries

# plyr is attached before dplyr so that dplyr's summarise(), mutate() and arrange() are not masked
library(plyr)
library(tidyr)
library(dplyr)
library(ggplot2)
library(nycflights13)

# Convert the flights, weather and planes datasets to tibbles for more compact printing
# (as_tibble() replaces the deprecated tbl_df())
flights <- as_tibble(flights)
weather <- as_tibble(weather)
planes <- as_tibble(planes)

Problem: Exploring the NYC Flights Data

In this problem set we will continue to use the data on all flights that departed NYC (i.e. JFK, LGA or EWR) in 2013. You can find this data as part of the nycflights13 R package. Data includes not only information about flights, but also data about planes, airports, weather, and airlines. Load the data and use it to answer the following questions.

(a) Flights are often delayed. Perform an exploratory data analysis to address each of the following questions:
  • What was the worst day to fly out of NYC in 2013 if you dislike delayed flights?

If I disliked delayed flights, the worst day for me to fly out of NYC in 2013 would be the one with the highest average departure delay per flight (only flights with positive departure delays are included).

As we can see in the scatterplot above, the worst day to fly out of NYC is the X-axis value with the highest Y-axis value, which in this case appears to fall somewhere around March. To obtain the exact date, we can sort the dataset in descending order of mean departure delay.

##         date mean_delay
## 1 2013-03-08   102.9035

Thus, our reading of the scatterplot was correct: by this measure, the worst day to fly out of NYC in 2013 is indeed in March, namely 2013-03-08 (8 March 2013).
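The lookup described above could be sketched roughly as follows. This is a minimal illustration, not necessarily the code that produced the output shown: the names `daily_delay`, `daily_plot` and `worst_day` are hypothetical, and the `dplyr::` prefixes are only a precaution in case plyr is attached after dplyr and masks those verbs.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Mean departure delay per calendar day, counting only flights that left late.
# filter() also drops rows where dep_delay is NA (cancelled flights).
daily_delay <- flights %>%
  filter(dep_delay > 0) %>%
  dplyr::mutate(date = as.Date(sprintf("%d-%02d-%02d", year, month, day))) %>%
  group_by(date) %>%
  dplyr::summarise(mean_delay = mean(dep_delay))

# Scatterplot of the daily means (one point per day)
daily_plot <- ggplot(daily_delay, aes(x = date, y = mean_delay)) +
  geom_point() +
  labs(x = "Date", y = "Mean departure delay (minutes)")

# Sort in descending order of mean delay and keep the top row
worst_day <- daily_delay %>%
  dplyr::arrange(dplyr::desc(mean_delay)) %>%
  head(1)

print(worst_day)
```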

  • Are there any seasonal patterns in departure delays for flights from NYC?

Treating each season as a three-month period, as the seasons occur in NYC:

  1. Winter (December - February)

  2. Spring (March - May)

  3. Summer (June - August)

  4. Fall (September - November)

Thus, we can see in the bar graph that the average delay is highest during summer (June - August) and lowest during fall (September - November).
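A seasonal comparison of this kind could be sketched as below. It assumes the mean is taken over all flights with a recorded departure delay; the original analysis may have filtered differently. The names `season_delay` and `season_plot` are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Map each month to its season, then average the departure delay per season
season_delay <- flights %>%
  filter(!is.na(dep_delay)) %>%
  dplyr::mutate(season = case_when(
    month %in% c(12, 1, 2) ~ "Winter",
    month %in% 3:5         ~ "Spring",
    month %in% 6:8         ~ "Summer",
    TRUE                   ~ "Fall"
  )) %>%
  group_by(season) %>%
  dplyr::summarise(mean_delay = mean(dep_delay))

# Bar graph with one bar per season
season_plot <- ggplot(season_delay, aes(x = season, y = mean_delay)) +
  geom_col() +
  labs(x = "Season", y = "Mean departure delay (minutes)")

print(season_delay)
```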

  • On average, how do departure delays vary over the course of a day?

As we can see from the smoothed curve, the mean delay tends to fall as the hour increases (except for a few outliers that can be seen along the path). Thus, we can infer that the mean departure delay decreases as the day progresses from 12:00.
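One way to build such a curve is sketched below, as an illustration under stated assumptions rather than the original code. It groups by the `hour` column of `flights`, which in current versions of nycflights13 is the scheduled departure hour; grouping by the actual departure time instead can change the shape of the curve considerably. `hourly_delay` and `hourly_plot` are hypothetical names.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Mean departure delay for each hour of the day
hourly_delay <- flights %>%
  filter(!is.na(dep_delay)) %>%
  group_by(hour) %>%
  dplyr::summarise(mean_delay = mean(dep_delay), n_flights = n())

# Points for the hourly means plus a smoothed trend line
hourly_plot <- ggplot(hourly_delay, aes(x = hour, y = mean_delay)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(x = "Hour of day", y = "Mean departure delay (minutes)")

print(hourly_delay)
```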

(b) Flight delays are often linked to weather conditions. How does weather impact flights from NYC? Utilize both the flights and weather datasets to explore this question. Include at least one visualization to aid in communicating what you find.

As we can see in the diagram above, the curve of departure delay versus pressure is roughly hump-shaped: the departure delay peaks when the pressure is around 1150 (pressure in the weather dataset is recorded in millibars), and the delay falls as the pressure moves above or below that value (except for a few outliers).

Similarly, the curve of a flight's air time versus the wind speed at the origin is hump-shaped, with the air time reaching its maximum when the wind speed is around 7 mph. As the wind speed moves above or below 7 mph, the air time declines.

The graph of a flight's air time versus wind direction resembles a sinusoidal curve, with the air time reaching local maxima when the wind direction is 0 or 180 degrees (plus or minus 360) and local minima when the wind direction is 90 or 270 degrees (plus or minus 360).
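The comparisons above depend on matching each flight to the weather reading at its origin airport for the same hour. A minimal sketch of that join, shown for the pressure relationship, follows. The join keys, the rounding of pressure to whole units and the names `pressure_delay` and `pressure_plot` are assumptions made for illustration; the same pattern would apply to `wind_speed` or `wind_dir` plotted against `air_time`.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Attach the hourly weather observation at the origin airport to each flight
flights_weather <- flights %>%
  inner_join(weather, by = c("year", "month", "day", "hour", "origin"))

# Average departure delay at each (rounded) pressure value
pressure_delay <- flights_weather %>%
  filter(!is.na(dep_delay), !is.na(pressure)) %>%
  dplyr::mutate(pressure_bin = round(pressure)) %>%
  group_by(pressure_bin) %>%
  dplyr::summarise(mean_delay = mean(dep_delay), n_flights = n())

# Mean delay against pressure, with a smoothed trend line
pressure_plot <- ggplot(pressure_delay, aes(x = pressure_bin, y = mean_delay)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(x = "Pressure", y = "Mean departure delay (minutes)")

print(range(pressure_delay$pressure_bin))
```

Printing the range of observed pressure values is a quick check that the axis scale and units in the plot match the dataset.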

(c) Flight performance may also be impacted by the aircraft used. Do aircraft with certain characteristics (e.g. manufacturer) demonstrate better performance? Utilize both the flights and planes datasets to explore this question. Include at least one visualization to aid in communicating what you find.

Thus, we can see that the departure delay depends strongly on the manufacturer: AIRBUS INDUSTRIE has one of the lowest departure delays, whereas EMBRAER has the highest. This suggests that EMBRAER airplanes need more time to be readied before takeoff, whereas AIRBUS INDUSTRIE aircraft are among the more efficient in this respect.

Similarly, we can see that the departure delay depends strongly on the model of the plane as well: the 767-223 has one of the lowest departure delays, whereas the A320-232 has the highest. This suggests that A320-232 airplanes need more time to be readied before takeoff, whereas the 767-223 is among the more efficient in this respect.
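A per-manufacturer comparison like the one above could be sketched as follows. The minimum of 1000 flights per manufacturer is a hypothetical threshold, added because means computed from a handful of flights are noisy; the original analysis may not have applied one. `manufacturer_delay` and `manufacturer_plot` are hypothetical names, and grouping by `model` instead of `manufacturer` would give the per-model version.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Attach plane metadata to each flight via the tail number,
# then average the departure delay per manufacturer
manufacturer_delay <- flights %>%
  filter(!is.na(dep_delay)) %>%
  inner_join(planes, by = "tailnum") %>%
  group_by(manufacturer) %>%
  dplyr::summarise(mean_delay = mean(dep_delay), n_flights = n()) %>%
  filter(n_flights >= 1000) %>%  # hypothetical cutoff for a stable mean
  dplyr::arrange(dplyr::desc(mean_delay))

# Horizontal bar graph, ordered by mean delay
manufacturer_plot <- ggplot(
  manufacturer_delay,
  aes(x = reorder(manufacturer, mean_delay), y = mean_delay)
) +
  geom_col() +
  coord_flip() +
  labs(x = "Manufacturer", y = "Mean departure delay (minutes)")

print(manufacturer_delay)
```

Keeping `n_flights` in the summary makes it easier to judge whether a difference between two manufacturers rests on comparable numbers of flights.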

Conclusion - As one might expect, both the aircraft and the weather at the origin have a strong impact on flight delays. R helps us identify which of these factors have the strongest effect.


In case you need to contact me, please feel free to shoot me an email at rohan27@uw.edu

Thank you for viewing!