We want to fit a linear regression model for delay time in minutes. Before doing so, we need to check whether interactions among some of the categorical predictors, including time of day, month, and airline, are associated with total delay time in minutes. To explore this, we drew box plots of delay time, both before and after stratifying by each predictor. To keep the plots readable, we restricted the delay axis to the range 0 to 180 minutes.


The categorical predictors examined here are time of day (hour_c), month (month), and airline (airline).


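library(tidyverse)
library(plotly)

# Box plots of delay by time of day for a single airline
hour_air = function(air){
  raw_df %>%
    filter(airline == air) %>%
    mutate(
      # order time-of-day levels by median delay (fct_reorder default)
      hour_c = fct_reorder(hour_c, delay)
    ) %>%
    # `mode` is not a box-trace attribute, so it is omitted
    plot_ly(x = ~hour_c, y = ~delay, color = ~hour_c,
            type = "box", alpha = .5) %>%
    layout(
      xaxis = list(title = "Time of the Day"),
      # the range only limits the view; it does not drop observations
      yaxis = list(title = "Delay Time (minutes)", range = c(0, 180)))
}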

# Box plots of delay by month for a single airline
month_air = function(air){
  raw_df %>%
    filter(airline == air) %>%
    mutate(
      # order months by median delay
      month = fct_reorder(month, delay)
    ) %>%
    plot_ly(x = ~month, y = ~delay, color = ~month,
            type = "box", alpha = .5) %>%
    layout(
      xaxis = list(title = "Month"),
      yaxis = list(title = "Delay Time (minutes)", range = c(0, 180)))
}

# Box plots of delay by time of day for a single month
hour_month = function(mon){
  raw_df %>%
    filter(month == mon) %>%
    mutate(
      # order time-of-day levels by median delay
      hour_c = fct_reorder(hour_c, delay)
    ) %>%
    plot_ly(x = ~hour_c, y = ~delay, color = ~hour_c,
            type = "box", alpha = .5) %>%
    layout(
      xaxis = list(title = "Time of the Day"),
      yaxis = list(title = "Delay Time (minutes)", range = c(0, 180)))
}

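# Box plots of delay by time of day, grouped by month, for a single airline
hour_month_air = function(air){
  raw_df %>%
    filter(airline == air) %>%
    mutate(
      # order months by the median of `date`, i.e. chronologically
      month = fct_reorder(month, date)
    ) %>%
    plot_ly(x = ~hour_c, y = ~delay, color = ~month,
            type = "box", alpha = .5) %>%
    layout(
      boxmode = "group",  # place the month boxes side by side within each time of day
      xaxis = list(title = "Time of the Day"),
      yaxis = list(title = "Delay Time (minutes)", range = c(0, 180)))
}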
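
As a numeric companion to the box plots, the same comparison can be tabulated as the median delay in each cell. The sketch below is illustrative only: it uses a small synthetic data frame, `toy_df` (an assumption, not the project's `raw_df`), with made-up time-of-day levels and randomly generated delays, and it borrows the column names `hour_c`, `airline`, and `delay` from the functions above.

```r
# Synthetic stand-in for raw_df (hypothetical values, for illustration only)
set.seed(1)
toy_df <- data.frame(
  hour_c  = sample(c("morning", "afternoon", "night"), 300, replace = TRUE),
  airline = sample(c("Delta", "JetBlue", "United"), 300, replace = TRUE),
  delay   = rexp(300, rate = 1 / 30)  # simulated delays in minutes
)

# Median delay for every time-of-day x airline cell
cell_medians <- with(toy_df, tapply(delay, list(hour_c, airline), median))
round(cell_medians, 1)
```

If the gaps between airlines change noticeably from one time of day to another, that pattern hints at an interaction; if the gaps stay roughly constant, main effects alone may be enough.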

Interaction between Categorical Predictors

Time*Airline

Before Stratification by Airline

Stratification by Airline

[Box plots of delay time by time of day, one panel per airline: Alaska, American, Delta, Endeavor, JetBlue, Republic, United]

Month*Airline

Before Stratification by Airline

Stratification by Airline

[Box plots of delay time by month, one panel per airline: Alaska, American, Delta, Endeavor, JetBlue, Republic, United]

Month*Time

Before Stratification by Month

Stratification by Month

[Box plots of delay time by time of day, one panel per month: Nov, Dec, Jan]

Three-Way Interaction

Before Stratification by Airline

Stratification by Airline

[Box plots of delay time by time of day, grouped by month, one panel per airline: Alaska, American, Delta, Endeavor, JetBlue, Republic, United]


Interpretation

Based on the box plots, delay time differs between groups, and the pattern of these differences appears to change across strata. Adding interaction terms between the categorical predictors is therefore one option to consider when building the linear regression model.
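
One way to follow up on the visual impression is to compare a main-effects model with a model that adds two-way interactions, using a partial F-test. The sketch below is a minimal illustration under stated assumptions, not the model fitted in this report: `toy_df` is synthetic, the time-of-day levels are made up, and the simulated delays contain no real interaction, so the test here is only a demonstration of the mechanics.

```r
# Synthetic stand-in for raw_df (hypothetical values, for illustration only)
set.seed(2)
n <- 600
toy_df <- data.frame(
  hour_c  = factor(sample(c("morning", "afternoon", "night"), n, replace = TRUE)),
  airline = factor(sample(c("Delta", "JetBlue", "United"), n, replace = TRUE)),
  month   = factor(sample(c("Nov", "Dec", "Jan"), n, replace = TRUE))
)
toy_df$delay <- rexp(n, rate = 1 / 30)  # simulated delays in minutes

# Main effects only
fit_main <- lm(delay ~ hour_c + month + airline, data = toy_df)

# Main effects plus all two-way interactions
fit_int <- lm(delay ~ (hour_c + month + airline)^2, data = toy_df)

# Partial F-test: do the interaction terms improve the fit?
cmp <- anova(fit_main, fit_int)
cmp
```

A small p-value in the second row of `cmp` would support keeping the interaction terms. The three-way term could be examined the same way by comparing against `delay ~ hour_c * month * airline`.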

We found that there could be a significant interaction between: