We plan to fit a linear regression with delay time (in minutes) as the outcome. Before doing so, we need to check whether the interactions of the continuous predictors with airline, month, and time of day appear to be associated with delay time. To explore this, we plotted delay against each continuous predictor, colored in turn by each of these three categorical variables. To keep the plots readable, we restricted the axis ranges.


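The continuous predictors are the five delay-type variables (carrier, extreme weather, late arrival, NAS, and security delay) and the four weather variables (temperature, humidity, visibility, and wind speed).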


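library(tidyverse)
library(plotly)

# Each helper plots delay against one continuous predictor, colored by a
# grouping variable. As written, `~cont` resolves to the function argument,
# so `cont` should be a vector with one value per row of raw_df.
# The plot is returned directly (not assigned to a local variable) so that
# calling the helper displays it.

# Delay vs. predictor, colored by airline
cont_airline = function(cont){
  raw_df %>% 
    mutate(
      text_label = str_c("Airline: ", airline)
    ) %>% 
    plot_ly(x = ~cont, y = ~delay, color = ~airline,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}

# Delay vs. predictor, colored by month (months ordered by date)
cont_month = function(cont){
  raw_df %>%
    mutate(
      text_label = str_c("Month: ", month),
      month = fct_reorder(month, date)
    ) %>% 
    plot_ly(x = ~cont, y = ~delay, color = ~month,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}

# Delay vs. predictor, colored by time of day
cont_hour = function(cont){
  raw_df %>% 
    mutate(
      text_label = str_c("Time: ", hour_c)
    ) %>% 
    plot_ly(x = ~cont, y = ~delay, color = ~hour_c,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}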
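
The helpers above do not show how the axis ranges were restricted. One possible way, sketched here on made-up data, is plotly's `layout()` with a `range` for each axis. The data frame `toy`, its column names, the helper `scatter_by_airline`, and the chosen limits are all hypothetical and only illustrate the pattern; they are not taken from the original analysis.

```r
library(plotly)

# Toy stand-in for raw_df; all column names and values are hypothetical.
set.seed(1)
toy <- data.frame(
  airline = sample(c("A", "B", "C"), 200, replace = TRUE),
  temperature = runif(200, 20, 90),
  delay = rexp(200, rate = 1 / 30)
)

# Same pattern as the helpers above: `cont` is a numeric vector
# with one value per row of the data frame.
scatter_by_airline <- function(cont) {
  plot_ly(toy, x = ~cont, y = ~delay, color = ~airline,
          type = "scatter", mode = "markers", alpha = .5)
}

# Restrict the displayed axis ranges (assumed limits). Points outside
# the range are only hidden from the initial view, not removed.
p <- layout(
  scatter_by_airline(toy$temperature),
  xaxis = list(title = "Temperature", range = c(20, 90)),
  yaxis = list(title = "Delay (minutes)", range = c(0, 120))
)
p
```

Setting the range in `layout()` rather than filtering the data keeps every observation in the plot object, so zooming out interactively still reveals the extreme delays.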

Interaction for Continuous Predictors

Types of Delay

Carrier Delay

Extreme Weather Delay

Late Arrival Delay

NAS Delay

Security Delay

Weather Specific

Temperature

Humidity

Visibility

Wind Speed


Interpretation

Based on the graphs, two additional interactions may be significant:

  • Carrier Delay * Airline

  • Temperature * Month

In the statistical analysis that follows, we focus on these interaction terms and test whether they should be included in our linear regression model.
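
One common way to check whether an interaction term is needed is a partial F-test comparing nested models with and without it. The sketch below uses simulated data, not the flight data: the data frame `toy`, its column names, and the built-in slopes are assumptions made purely for illustration, and this is not necessarily the test used in the analysis that follows.

```r
# Simulated stand-in data; names and effect sizes are hypothetical.
set.seed(1)
n <- 300
toy <- data.frame(
  airline = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  carrier_delay = runif(n, 0, 60)
)

# Give each airline a different slope so that a true interaction exists.
slope <- unname(c(A = 0.5, B = 1.0, C = 1.5)[as.character(toy$airline)])
toy$delay <- 5 + slope * toy$carrier_delay + rnorm(n, sd = 5)

# Main-effects model versus model that adds the interaction term.
fit_main <- lm(delay ~ carrier_delay + airline, data = toy)
fit_int  <- lm(delay ~ carrier_delay * airline, data = toy)

# Partial F-test: does the interaction explain variation beyond
# what the main effects already capture?
cmp <- anova(fit_main, fit_int)
print(cmp)
```

A small p-value in the second row of the comparison table suggests that the slope of delay on the continuous predictor differs across groups, which supports keeping the interaction term.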