We would like to conduct a linear regression on delay time in minutes. In this case, we would need to check if the interactions between some categorical predictors including times of the day, months, and airlines, are significantly associated with the total delay time in minutes. To achieve it, we did some visualizations. For the tidiness of visualizations, we have adjusted the range of axis.
Categorical predictors are:
Times of the Day (no scheduled hour for flight falls within 0:00 - 4:59 in the raw dataset)
Morning: if scheduled hour for flight falls within 5:00 - 8:59
Noon: if scheduled hour for flight falls within 9:00 - 13:59
Afternoon: if scheduled hour for flight falls within 14:00 - 17:59
Night: if scheduled hour for flight falls within 18:00 - 23:59
Months
Nov: November, 2021
Dec: December, 2021
Jan: January, 2022
Airlines
Alaska Airlines
American Airlines
Delta Air Lines
Endeavor Air
JetBlue Airways
Republic Airways
United Air Lines
hour_air = function(air){
raw_df %>%
filter(
airline == air) %>%
mutate(
mean = mean(delay),
hour_c = fct_reorder(hour_c, delay)
) %>%
plot_ly(x = ~hour_c, y = ~delay, color = ~hour_c,
type = "box", mode = "markers", alpha = .5) %>%
layout(
xaxis = list(title = "Time of the Day"),
yaxis = list(title = "Delay Time (minutes)", range = c(0, 180)))
}
month_air = function(air){
raw_df %>%
filter(
airline == air) %>%
mutate(
month = fct_reorder(month, delay)
) %>%
plot_ly(x = ~month, y = ~delay, color = ~month,
type = "box", mode = "markers", alpha = .5) %>%
layout(
xaxis = list(title = "Month"),
yaxis = list(title = "Delay Time (minutes)", range = c(0, 180)))
}
hour_month = function(mon){
raw_df %>%
filter(
month == mon) %>%
mutate(
hour_c = fct_reorder(hour_c, delay)
) %>%
plot_ly(x = ~hour_c, y = ~delay, color = ~hour_c,
type = "box", mode = "markers", alpha = .5) %>%
layout(
xaxis = list(title = "Time of the Day"),
yaxis = list(title = "Delay Time (minutes)", range = c(0, 180)))
}
hour_month_air = function(air){
raw_df %>%
filter(
airline == air) %>%
mutate(
month = fct_reorder(month, date)) %>%
plot_ly(x = ~hour_c, y = ~delay, color = ~month,
type = "box", mode = "markers", alpha = .5) %>%
layout(
boxmode = "group",
xaxis = list(title = "Time of the Day"),
yaxis = list(title = "Delay Time (minutes)", range = c(0, 180)))
}
Based on the graphs, we observed that between-group differences existed, and adding interaction terms between the categorical predictors could be one of the options for building the linear regression model.
We found that there could be a significant interaction between: