Airport load modelling

Passenger load factor is a measure of efficiency that describes how well an airline performs [2]. A high load factor indicates that an airline has sold most of its seats, while a low load factor is a cause for concern. Airlines provide the passenger load only 1-3 days before a flight, while the airport plans duty rosters months in advance. The literature says little about predicting future visitor numbers. It was therefore necessary to develop a method to predict the total number of visitors on a given day, so that the owners can plan employee allocation and optimize operations to maximize revenue [1].
Airports have been collecting data for a long time, but the collected data is not yet used appropriately. At present, the passenger load factor is estimated by guesswork. The airport wants to predict the passenger load accurately because airlines hand over the passenger list only 1-3 days before the flight, which is far too late: the rosters must be drawn up months beforehand.
This research investigates how machine learning can be used to predict the future passenger load factor of flights, so that airport resources such as work rosters and security can be assigned correctly. Its goal is an algorithm for passenger load factor prediction.
To provide the client with such an algorithm, machine learning is used. Machine learning allows computers to learn from data and underpins many applications. It is mainly used for regression and classification, and more generally for prediction. This paper examines how machine learning can predict the future passenger load factor of flights [2].
Business owners want to improve their quality of service and reduce costs using big data techniques. To achieve this, they use predictive models and machine learning [1]. For example, regression algorithms can find relationships between input variables and outcomes in order to predict future load numbers [1].
This paper focuses on providing an algorithm for predicting future visitors. In this case, the visitors are flight passengers and the quantity to predict is the flight load. Prediction is seen as the foundation of decision-making [3]. The passenger load factor measures how much of an airline's available seat capacity is filled by passengers. Studies have shown that many factors can affect the load factor of flights. Knowing these factors could help airlines make more effective decisions and plans [4].
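To make the quantity concrete, the following is a minimal sketch of a per-flight load factor, assuming it is computed simply as passengers divided by available seats. The function name and the example numbers are illustrative assumptions, not taken from the study; `max_seats` mirrors the feature name reported in the results.

```python
def load_factor(passengers: int, max_seats: int) -> float:
    """Share of available seats that are occupied, between 0.0 and 1.0.

    Assumption: a simple seat-based ratio for a single flight.
    """
    if max_seats <= 0:
        raise ValueError("max_seats must be positive")
    return passengers / max_seats


# Hypothetical flight: 150 passengers on a 180-seat aircraft.
print(round(load_factor(150, 180), 3))  # 0.833
```

Airline-level load factors are usually weighted by distance flown, so this ratio should be read as a per-flight simplification.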


Airlines provide airports with the number of passengers three days before a flight, but duty rosters and staff planning must be made months beforehand. Currently, airports have no means of predicting the number of passengers. To tackle this problem, two machine learning algorithms, CatBoost and K-Nearest Neighbor, were used to create a model for passenger prediction. The performance of each regressor is measured as Root-Mean-Squared Error (RMSE). The K-Nearest Neighbor model scored a test RMSE of 21.80, while the CatBoost model scored a lower test RMSE of 16.94. Beyond the RMSE scores, the training time and feature importance were measured. For both models, preprocessing took only 8 seconds. Training the K-Nearest Neighbor model took about 24 minutes (1443.7 seconds), whereas CatBoost trained much faster, building the model in under 5 minutes (274.49 seconds). The strongest indicators of future passenger load were identified by evaluating how heavily each independent variable was used across all the decision trees during training. The strongest indicator is max_seats (16.84), followed by the month (15.38).
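As an illustration of how a regressor can be scored with RMSE, here is a minimal plain-Python sketch of a K-Nearest Neighbor regressor. It is not the implementation used in the study: the helper names, the choice of k, and the toy data (max_seats, month, passenger count) are all invented for this example.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def knn_predict(train_X, train_y, query, k=2):
    """Average the targets of the k training rows closest to `query`.

    Closeness is plain Euclidean distance on unscaled features, so the
    feature with the largest range dominates; a real pipeline would
    normally scale the features first.
    """
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Invented toy data: (max_seats, month) -> passengers on the flight.
train_X = [(180, 1), (180, 7), (120, 1), (120, 7), (220, 7), (220, 1)]
train_y = [120, 170, 80, 110, 210, 150]
test_X = [(180, 6), (120, 2)]
test_y = [165, 85]

predictions = [knn_predict(train_X, train_y, x, k=2) for x in test_X]
print(predictions)
print(rmse(test_y, predictions))
```

Because RMSE is expressed in the same unit as the target, a score such as 16.94 can be read as a typical prediction error measured in passengers, with larger errors weighted more heavily.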

[1] Ma, X., Tian, Y., Luo, C., & Zhang, Y. (2018). Predicting Future Visitors of Restaurants Using Big Data. Proceedings – International Conference on Machine Learning and Cybernetics, 1, 269–274.

[2] Hao, J., & Ho, T. K. (2019). Machine Learning Made Easy: A Review of Scikit-learn Package in Python Programming Language. In Journal of Educational and Behavioral Statistics (Vol. 44, Issue 3, pp. 348–361). SAGE Publications Inc.

[3] Huang, X., Kazantsev, N. S., Zhang, M., Ge, X., Lucas, A., Debnath, J., Liu, H., Chen, C., McGowan, D., Lu, Y., Kuo, K., Hikmat Fouad AL-Hadeeth, R., Ghaffar Ebadi, A., Guo, L., Mamun Habib, M., Zhao, W., Zhao, X., Mi Jun, A., Mariana, M., … Chen, Z. (2016). Prediction of Visitors Quantity Based on A Combined Method.

[4] Salarzadeh Jenatabadi, H., & Azina Ismail, N. (2007). The Determination of Load Factors in the Airline Industry. In International Review of Business Research Papers (Vol. 3, Issue 4).

Dr. Mathis Mourey
I am a Lecturer in Finance/Statistics at THUAS and hold a PhD in Finance from the University Grenoble Alpes (UGA). My research mainly focuses on Systemic Risk measurement. I also have research interests in Data Science and Cryptocurrencies.