The delay and cancellation datasets contain 66 and 65 destination
airports, respectively. To keep the statistical analysis efficient,
we do not include destination airport as a predictor in our models.
However, it is still an interesting factor to explore.
First, we check whether delay and cancellation counts differ across destination airports.
Flights from JFK to LAX have the most delays (2293), and flights to BGR have the fewest (6).
Flights from JFK to SFO have the most cancellations (76), and flights to BZN have the fewest (1).
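The counts above can be reproduced with a per-destination tally. The following is a minimal sketch on made-up data, not the code used for this report: `delay_toy` is a hypothetical stand-in for the `delay` table, and it assumes one row per delayed flight with a `destination_airport` column.

```r
library(dplyr)

# Toy stand-in for the delay data (hypothetical values, one row per delayed flight).
delay_toy <- tibble(
  destination_airport = c("LAX", "LAX", "LAX", "SFO", "SFO", "BGR")
)

# Count delayed flights per destination, largest count first.
dest_counts <- delay_toy %>%
  count(destination_airport, name = "count", sort = TRUE)

# Destinations with the most and the fewest delays.
most_delayed  <- slice_max(dest_counts, count, n = 1)
least_delayed <- slice_min(dest_counts, count, n = 1)
```

The same pattern would apply to the cancellation data by swapping in the cancellation table.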
# Bar chart of delay counts by destination airport for one airline.
# Requires dplyr, forcats, stringr, and plotly to be loaded.
# Note: despite its name, `dest` is an airline name (matched against airline_name).
delay_dest = function(dest){
  delay %>%
    filter(
      airline_name == dest
    ) %>%
    group_by(destination_airport) %>%
    summarize(
      count = n()
    ) %>%
    mutate(
      # Order airports by count so the bars are sorted
      destination_airport = fct_reorder(destination_airport, count),
      text_label = str_c("Airport: ", destination_airport, "\nCount: ", count)
    ) %>%
    plot_ly(x = ~destination_airport, y = ~count, text = ~text_label, hoverinfo = "text",
            color = ~destination_airport, type = "bar", alpha = .5) %>%
    layout(
      xaxis = list(title = "Destination Airport"),
      yaxis = list(title = "Count"),
      title = "Distribution of Delay by Destination Airport")
}
# Bar chart of cancellation counts by destination airport for one airline.
# Requires dplyr, forcats, stringr, and plotly to be loaded.
# Note: despite its name, `dest` is an airline name (matched against airline_name).
cancel_dest = function(dest){
  cancel %>%
    filter(
      airline_name == dest
    ) %>%
    group_by(destination_airport) %>%
    summarize(
      count = n()
    ) %>%
    mutate(
      # Order airports by count so the bars are sorted
      destination_airport = fct_reorder(destination_airport, count),
      text_label = str_c("Airport: ", destination_airport, "\nCount: ", count)
    ) %>%
    plot_ly(x = ~destination_airport, y = ~count, text = ~text_label, hoverinfo = "text",
            color = ~destination_airport, type = "bar", alpha = .5) %>%
    layout(
      xaxis = list(title = "Destination Airport"),
      yaxis = list(title = "Count"),
      title = "Distribution of Cancellation by Destination Airport")
}
We can also examine whether airlines show different patterns in delay and cancellation counts across the destination airports.
LAX and SFO stand out with notably high delay and cancellation counts, so we take a closer look at the factors behind those delays and cancellations.
The delay times are clearly clustered below 180 minutes. For the following explorations, we restrict the data to delays between 0 and 180 minutes.
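The restriction to the two airports and to delays of at most 180 minutes could be sketched as follows. This is an illustration on made-up data: `delay_toy` and the column name `delay_minutes` are assumptions, and treating both endpoints as inclusive is a choice made here, not something stated in the analysis.

```r
library(dplyr)

# Toy stand-in for the delay data (hypothetical values and column names).
delay_toy <- tibble(
  destination_airport = c("LAX", "LAX", "SFO", "SFO", "SFO", "BGR"),
  delay_minutes       = c(12, 240, 45, 180, 605, 30)
)

# Keep only LAX and SFO flights with delays in [0, 180] minutes.
# between() is inclusive on both ends.
delay_trimmed <- delay_toy %>%
  filter(
    destination_airport %in% c("LAX", "SFO"),
    between(delay_minutes, 0, 180)
  )
```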
Different airlines fly from JFK to the two airports, and delay times show no distinct trend across airlines for either airport.
For both airports, delay times increase from November to January. The monthly pattern of delay times does not differ noticeably between the two airports.
Cancellation counts by scheduled hour, in contrast, differ distinctly between the two airports.
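One way to put the hourly cancellation counts for the two airports side by side is sketched below. The data are made up, and `cancel_toy` and the column `scheduled_hour` are hypothetical names chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the cancellation data (hypothetical values and column names).
cancel_toy <- tibble(
  destination_airport = c("LAX", "LAX", "SFO", "SFO", "SFO"),
  scheduled_hour      = c(8, 8, 8, 17, 17)
)

# Count cancellations per airport and scheduled hour, then spread the
# airports into columns; hours with no cancellations are filled with 0.
by_hour <- cancel_toy %>%
  count(destination_airport, scheduled_hour, name = "count") %>%
  pivot_wider(
    names_from  = destination_airport,
    values_from = count,
    values_fill = 0
  )
```

Each row of `by_hour` is one scheduled hour, so the two airport columns can be compared directly or passed on to a plot.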