We plan to fit a linear regression model with delay time (in minutes) as the outcome. Before doing so, we need to check whether the interactions between each continuous predictor and airline, month, and time of day appear to be associated with delay time. To explore this, we plotted delay against each continuous predictor, colored by each of these three factors. For readability, we restricted the axis ranges of the plots.
The continuous predictors are:
Types of Delay (in minutes; five categories in total, defined by the U.S. Department of Transportation and reported by the Bureau of Transportation Statistics)
Carrier Delay: The cancellation or delay was due to circumstances within the airline's control (e.g., maintenance or crew problems, aircraft cleaning, baggage loading, fueling)
Extreme Weather Delay: Significant meteorological conditions (actual or forecasted) that, in the judgment of the carrier, delay or prevent the operation of a flight, such as a tornado, blizzard, or hurricane
Late Arrival Delay: A previous flight with the same aircraft arrived late, causing the present flight to depart late
National Aviation System (NAS) Delay: Delays and cancellations attributable to the national aviation system that refer to a broad set of conditions, such as non-extreme weather conditions, airport operations, heavy traffic volume, and air traffic control
Security Delay: Delays or cancellations caused by evacuation of a terminal or concourse, re-boarding of aircraft because of a security breach, inoperative screening equipment, and/or lines longer than 29 minutes at screening areas
Weather Specific
Temperature: Hourly dry bulb temperature (°F)
Humidity: Hourly relative humidity (%)
Visibility: Hourly visibility
Wind Speed: Hourly wind speed (mph)
library(tidyverse)
library(plotly)

# Each helper draws delay against one continuous predictor.
# `cont` is assumed to be a vector of predictor values, one per row of raw_df.
# The plot is returned directly (not assigned) so that it prints when called.

# Points colored by airline.
cont_airline = function(cont){
  raw_df %>%
    mutate(
      text_label = str_c("Airline: ", airline)
    ) %>%
    plot_ly(x = ~cont, y = ~delay, color = ~airline,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}

# Points colored by month, with months ordered by date.
cont_month = function(cont){
  raw_df %>%
    mutate(
      text_label = str_c("Month: ", month),
      month = fct_reorder(month, date)
    ) %>%
    plot_ly(x = ~cont, y = ~delay, color = ~month,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}

# Points colored by time of day.
cont_hour = function(cont){
  raw_df %>%
    mutate(
      text_label = str_c("Time: ", hour_c)
    ) %>%
    plot_ly(x = ~cont, y = ~delay, color = ~hour_c,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}
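The axis-range adjustment mentioned earlier can be done with plotly's `layout()`. The following is a minimal sketch on a toy data frame; `toy_df`, its column names, and the chosen ranges are illustrative assumptions, not the values used in our plots.

```r
library(dplyr)
library(plotly)

# Toy stand-in for raw_df; column names and values are illustrative only.
set.seed(1)
toy_df = tibble(
  airline = rep(c("A", "B"), each = 50),
  carrier_delay = rexp(100, rate = 1 / 30),
  delay = carrier_delay + rnorm(100, sd = 10)
)

# Same scatter layout as the helpers above, with both axes restricted
# to a fixed range so that a few extreme delays do not dominate the view.
p = toy_df %>%
  plot_ly(x = ~carrier_delay, y = ~delay, color = ~airline,
          type = "scatter", mode = "markers", alpha = .5) %>%
  layout(
    xaxis = list(range = c(0, 120), title = "Carrier delay (min)"),
    yaxis = list(range = c(0, 150), title = "Delay (min)")
  )
p
```

Restricting the range only changes the initial view; points outside it remain in the data and can still be reached by zooming out.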
Based on the plots, we identified two additional interactions that may be significant:
Carrier Delay * Airline
Temperature * Month
In the statistical analysis that follows, we focus on these two interaction terms and assess whether they should be included in our linear regression model.
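One common way to assess whether an interaction term is needed is a partial F-test between nested models. The sketch below uses simulated data; `sim_df`, the airline labels, and the effect sizes are hypothetical and are not results from our data.

```r
# Simulated stand-in for the flight data; names and effect sizes are hypothetical.
set.seed(2020)
n = 300
sim_df = data.frame(
  airline = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  carrier_delay = rexp(n, rate = 1 / 20)
)

# The slope of carrier_delay differs by airline, so an interaction
# is present by construction.
slope = unname(c(A = 0.5, B = 1.0, C = 1.5)[as.character(sim_df$airline)])
sim_df$delay = 5 + slope * sim_df$carrier_delay + rnorm(n, sd = 5)

# Nested models: main effects only vs. main effects plus interaction.
fit_main = lm(delay ~ carrier_delay + airline, data = sim_df)
fit_int = lm(delay ~ carrier_delay * airline, data = sim_df)

# Partial F-test: a small p-value suggests the interaction improves the fit.
f_test = anova(fit_main, fit_int)
print(f_test)
```

The same comparison could be repeated with `temperature * month` in place of `carrier_delay * airline`, assuming those are the column names in the real data.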