Overview of Data Exploration

In the explore section, we walk you through our data exploration process and use several types of visualizations to show how we drafted our models and selected variables.

Destination Airports

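library(dplyr)    # provides the %>% pipe
library(leaflet)  # interactive maps

# Popup content for the origin airport (JFK)
content <- paste(sep = "<br/>",
  "<b><a href='https://www.jfkairport.com/'>John F. Kennedy International Airport</a></b>",
  "Jamaica, Queens",
  "New York, NY 11430"
)

# `all` has one row per destination airport, with columns long, lat, municipality, and count
all %>% 
  leaflet() %>% 
  addTiles() %>%
  setView(-95, 39, zoom = 4) %>%  # center the view on the continental US
  # Marker radius is rescaled to 1-10 according to the airport's count
  addCircleMarkers(~ long, ~ lat, radius = ~ scales::rescale(count, c(1, 10)), color = "rgb(255, 65, 54)", 
                   label = ~ paste("Destination: ", municipality, " Count: ", count)) %>% 
  # Popup placed at JFK's longitude and latitude
  addPopups(-73.77932, 40.63945, content, options = popupOptions(closeButton = TRUE))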

This interactive map shows the 66 destination airports of flights that departed from JFK and had records of delay and/or cancellation from November 2021 to January 2022.
To keep our models parsimonious, we did not include destination airport as a predictor. However, it is still an interesting factor to explore. We investigated the distribution of delay and cancellation counts across destination airports and checked whether that distribution differs by airline. We also found that LAX and SFO have notably high delay and cancellation counts, so we took a closer look at the factors behind them.
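
As a rough illustration of the counting described above, the sketch below tallies delays and cancellations by destination and airline with dplyr. The data frame `flights` and its columns (`dest`, `airline`, `status`) are hypothetical stand-ins with made-up rows, not our actual dataset.

```r
library(dplyr)

# Hypothetical flight records; column names and values are assumptions.
flights <- data.frame(
  dest    = c("LAX", "LAX", "LAX", "SFO", "BOS"),
  airline = c("AA", "DL", "AA", "DL", "B6"),
  status  = c("delayed", "cancelled", "delayed", "delayed", "cancelled")
)

# Count delays and cancellations for each destination and airline.
dest_counts <- flights %>%
  count(dest, airline, status, name = "count") %>%
  arrange(desc(count))

# Total per destination, to spot airports with unusually high counts.
dest_totals <- flights %>%
  count(dest, name = "count") %>%
  arrange(desc(count))
```

Sorting by `count` puts the busiest destinations first, which is one simple way to notice airports that stand out.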

Cancellation and Delay

We then created a Shiny app so that readers can take part in our data exploration. You can select the airline and month you are interested in and view the corresponding outputs. In the Cancellation tab, you can see the number of cancellations and the number of COVID cases on each day of the month. In the Delay tab, you can see the number of delays and the average delay time (in minutes) on each day of the month.
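
The per-day summaries behind such tabs could be computed along the following lines. This is only a sketch of the filtering and aggregation step, not the app's actual code: the function `daily_summary`, the data frame `records`, and its columns are hypothetical, and a row is treated as a delay whenever `delay_minutes` is not missing.

```r
library(dplyr)

# Hypothetical daily records; names and values are for illustration only.
records <- data.frame(
  date          = as.Date(c("2021-11-01", "2021-11-01", "2021-11-02", "2021-12-01")),
  airline       = c("AA", "AA", "AA", "DL"),
  cancelled     = c(TRUE, FALSE, TRUE, TRUE),
  delay_minutes = c(NA, 30, NA, NA)
)

# Summarise one airline and one month (1-12) by day.
daily_summary <- function(data, selected_airline, selected_month) {
  data %>%
    filter(airline == selected_airline,
           as.integer(format(date, "%m")) == selected_month) %>%
    group_by(date) %>%
    summarise(
      cancellations = sum(cancelled),
      delays        = sum(!is.na(delay_minutes)),
      avg_delay     = mean(delay_minutes, na.rm = TRUE),  # NaN if no delays that day
      .groups = "drop"
    )
}

nov_aa <- daily_summary(records, "AA", 11)
```

In a Shiny app, a function like this would typically be called inside a reactive expression, with the airline and month taken from the user's inputs.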

Categorical Predictors

Delay time is one of our outcomes of interest, and we decided to model it with linear regression. Besides the main effects, we wanted to check whether there are any significant effect modifiers in our model. In this part, we investigated the interactions between the categorical predictors:

Times of the Day
Months
Airlines
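
One common way to check for effect modification between categorical predictors is to compare a main-effects model with a model that adds an interaction term. The sketch below does this on simulated data; the variable names, factor levels, and the choice of a `time_of_day * month` interaction are assumptions for illustration and do not reproduce our fitted model.

```r
set.seed(1)
n <- 200

# Simulated data; variable names and levels are assumptions.
sim <- data.frame(
  time_of_day = factor(sample(c("morning", "afternoon", "night"), n, replace = TRUE)),
  month       = factor(sample(c("Nov", "Dec", "Jan"), n, replace = TRUE)),
  airline     = factor(sample(c("AA", "DL", "B6"), n, replace = TRUE))
)
sim$delay <- rnorm(n, mean = 30, sd = 10)  # delay time in minutes

# Main effects only.
main_fit <- lm(delay ~ time_of_day + month + airline, data = sim)

# Main effects plus the time-of-day by month interaction.
int_fit <- lm(delay ~ time_of_day * month + airline, data = sim)

# Partial F-test: does the interaction improve the fit?
comparison <- anova(main_fit, int_fit)
```

A small p-value in `comparison` would suggest that the effect of time of day on delay differs by month; the same pattern applies to the other pairs of predictors.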

Continuous Predictors

Our draft linear regression model includes several continuous predictors:

Carrier Delay
Extreme Weather Delay
Late Arrival Delay
National Aviation System (NAS) Delay
Security Delay
Temperature
Humidity
Visibility
Wind Speed

We were also interested in whether the effects of our continuous predictors differ across times of the day, months, or airlines. Plots stratified by these variables can reveal such trends.
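
A stratified plot of this kind could be drawn as in the sketch below, which uses ggplot2 to show delay against one continuous predictor with a separate panel and fitted line per airline. The data are simulated, and the column names (`temperature`, `airline`, `delay`) and value ranges are assumptions chosen only to make the example run.

```r
library(ggplot2)

set.seed(2)

# Simulated data; names, ranges, and the linear trend are assumptions.
sim <- data.frame(
  temperature = runif(150, min = 20, max = 60),
  airline     = rep(c("AA", "DL", "B6"), each = 50)
)
sim$delay <- 40 - 0.3 * sim$temperature + rnorm(150, sd = 8)

# One panel per airline, each with its own least-squares line.
p <- ggplot(sim, aes(x = temperature, y = delay)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ airline) +
  labs(x = "Temperature", y = "Delay time (minutes)")
```

If the slopes look clearly different between panels, that hints at an interaction between the continuous predictor and the stratifying variable that may be worth testing in the model.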