
Evaluating airline passengers’ satisfaction during the COVID-19 pandemic: a text mining approach

Published on Jul 20, 2021

Abstract. In a market crisis, customer satisfaction is important for keeping customers loyal. This study aims to identify the key service quality factors that drive customer satisfaction in the airline industry. A feature selection approach was applied to identify these factors, and a support vector machine (SVM) was employed to evaluate the performance of the feature selection algorithm. The findings revealed that responsiveness was the most important factor in airline customers’ satisfaction. This research gives airline managers guidance on how to deliver services that leave customers satisfied.

*Corresponding author: [email protected]


Coronavirus disease 2019 (COVID-19) has caused serious problems for airlines. Major airlines such as Scandinavian Airlines and Virgin have already requested tens of billions of dollars in government assistance [1]. Throughout history, the airline industry has faced numerous challenges, but none have been as swift and serious as the spread of COVID-19 [2]. Customer satisfaction management is critical for the airline industry during the pandemic [3]. Customer satisfaction, according to some studies, plays an important role in motivating consumers' behavioral loyalty, which translates into favorable feedback, repeat purchases, or recommending the product or service to others [4]. Unsatisfied passengers, on the other hand, may decide not to fly with the same airline again [5], or they may spread negative word of mouth (possibly electronically) that harms the company's credibility and image [6].

Customers' satisfaction with airline services is described as the customer's response after receiving the service [7]. The SERVQUAL model is one of the most reliable tools for determining how airline services affect customer satisfaction [8]. In empirical research, [9] proposed SERVQUAL, a service quality measurement instrument built on five dimensions: tangibles, reliability, responsiveness, assurance, and empathy. The instrument was designed to assess consumer preferences and attitudes in order to determine customer satisfaction [10]. Against this background, this study asks which SERVQUAL factors are the most important for airline customers during the COVID-19 pandemic.

Previous researchers [8], [10] used questionnaires to determine which SERVQUAL dimensions are the most significant attributes of airline customers’ satisfaction. In surveys, however, respondents may not pay attention to every item or may answer questions at random, resulting in incomplete data [11]. Online customer reviews, on the other hand, are written by consumers of their own accord rather than in response to set questions, so they can be franker than conventional surveys (e.g., questionnaire surveys, focus groups, or a combination of the two) [12]. To address this gap, this study uses online customer reviews as its data source in order to capture more natural voices of customers.

Literature Review

Interrelationship among SERVQUAL and customer satisfaction

Customer satisfaction results from the interaction between pre-purchase expectations and post-purchase evaluation [13]. According to the SERVQUAL instrument, service quality is measured by the gap between customers' expectations of the service and their perceptions of the provider's actual performance [9]. Customers' satisfaction is likewise shaped by their expectations and the actual service performance, and SERVQUAL is grounded in expectancy-disconfirmation theory [14]. It can therefore be inferred that customers will be satisfied if airlines deliver well on the SERVQUAL dimensions. SERVQUAL is also a generic model that offers a framework for assessing perceived service quality [7]. The model has been hotly debated, but it has been applied in many studies [7], [10], [15]. Consequently, this study uses SERVQUAL to measure customer satisfaction.

Text Mining

Text mining combines data mining and natural language processing techniques. To cope with information overload, text mining methods have been applied to online review research, mostly to detect general trends or specific patterns in online customer reviews [16]. Text mining is a quick and easy way to summarize and extract key details from a large number of customer reviews, making the customer's point of view clearer [17].

Text mining is a powerful technique for extracting business value from the large amount of accessible social media data [18]. [19] used a text mining tool to examine Airbnb customer feedback and found that, interestingly, ‘price’ was not a key influencer. [12] used text mining in an exploratory study of what restaurant customers want. Accordingly, this research employs a text mining technique to determine what drives airline passengers' satisfaction.

Least Absolute Shrinkage and Selection Operator (LASSO)

LASSO is a linear model that performs shrinkage and variable selection simultaneously, so that regression and feature selection happen at the same time, and it offers further benefits such as sparsity and interpretability [20]. [21] used text analysis to detect a manager's fraud risk and found that LASSO is substantially more accurate than Convex Optimization (CVX). LASSO was also used by [12] to investigate online customer feedback. This study employs LASSO to extract the most relevant factors from online customer feedback in order to assist managers in making trade-offs while providing service to their customers.

Support vector machine (SVM)

SVM is a supervised learning algorithm based on statistical learning theory and the structural risk minimization principle [22]. [23] examined customer satisfaction from airline tweets and found SVM to be superior among machine learning classifiers. [24] used SVM and found it effective for evaluating the performance of feature selection algorithms with movie reviews as the data source. [25] used SVM on medical datasets and demonstrated that it is a good candidate for assessing the classification accuracy of feature selection algorithms. In this study, the performance of LASSO is evaluated using SVM.

Research Methods

Data Collection

This study collected online customer reviews of two of the biggest airlines in Europe [26] from Skytrax. The website lets passengers post review texts together with a rating score from 0 to 10. Skytrax receives about 1.26 million monthly visitors, with 87.47 percent of them searching with words like airline(s), air, and review(s) [27]. Because a series of verification and security checks is applied to the comments on Skytrax, they are considered highly reliable [28]. [29] also collected customer feedback from Skytrax to assess airline passengers' emotions. In total, 100 reviews posted between 22 March 2020 and 26 March 2021 were collected.

Defining Factors

This study defined its factors based on the literature review and measured satisfaction factors using the SERVQUAL dimensions. Table 3.1 lists the factors.

Table 3.1 Defining Factors




Factor | Definition | Example Words | Adapted from
Tangibles | Appearance of physical facilities, equipment and personnel. | employees, appearance, clean, cabin, modern, etc. |
Reliability | Capability to execute the promised service credibly and dependably. | timeliness, accurate, security, safety, cancelation, etc. |
Responsiveness | Enthusiasm to help customers and provide precise service. | responsiveness, prompt, willing, etc. |
Assurance | Propriety of crew and their capability to deliver confidence and trust. | reputation, confidence, skill, courteousness, knowledge, etc. |
Customer satisfaction | Satisfied (rating score of 7 to 10) or dissatisfied (rating score of 0 to 6). | |


Building Lexicons

This study built lexicons by computing the frequencies of words related to SERVQUAL. Then, single words related to each SERVQUAL factor were gathered from the literature review and extended with their synonyms and antonyms to build the complete lexicon for each factor. Adverbs, verbs and adjectives are parts of speech (POS) that can express customers' opinions, ideas, reactions and emotions [33]. A technique commonly used to detect service or product attributes is the POS tagger, which annotates nouns and noun phrases in customer reviews [34].
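As a rough illustration of how a lexicon can turn a review into factor features, the sketch below counts lexicon hits per factor and derives the satisfaction label from the rating. The lexicons here are tiny hypothetical samples seeded with example words from Table 3.1 and keyed by the standard SERVQUAL dimension names (an assumption); the study's full lexicons and its POS tagging step are not reproduced, and the function names are invented for this example.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons (assumption): a few example words per factor.
LEXICONS = {
    "tangibles": {"clean", "cabin", "modern", "appearance"},
    "reliability": {"timeliness", "accurate", "safety", "security", "cancelation"},
    "responsiveness": {"prompt", "willing", "responsiveness"},
    "assurance": {"reputation", "confidence", "skill", "knowledge"},
}


def factor_counts(review):
    """Count how many tokens of the review fall into each factor's lexicon."""
    tokens = Counter(re.findall(r"[a-z]+", review.lower()))
    return {
        factor: sum(tokens[word] for word in words)
        for factor, words in LEXICONS.items()
    }


def satisfaction_label(rating):
    """Map a 0-10 rating to the class label: 1 = satisfied (7-10), 0 = dissatisfied (0-6)."""
    return 1 if rating >= 7 else 0


if __name__ == "__main__":
    review = "The cabin was clean and modern, but staff were not prompt."
    print(factor_counts(review), satisfaction_label(4))
```

Each review then becomes one row of factor counts plus a binary label, which is the shape of input that the normalization and feature selection steps expect.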

Normalizing Data

Before the feature selection method was run, the data had to be reformatted in order to obtain valid results. Equation (1) was used to normalize the data into the interval [0, 1], where min_a and max_a denote the minimum and maximum values of attribute a.

v' = \frac{v - \min_{a}}{\max_{a} - \min_{a}}    (1)

In addition to data normalization, a five-fold cross-validation experiment was carried out. The data set was divided into five equal parts; in each round, four parts were used as the training set and the remaining part was used as the test set.
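The two preparation steps can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the study's Matlab code: the function names are hypothetical, the normalization follows equation (1) column by column, and the folds are formed without shuffling (in practice the rows would usually be shuffled first).

```python
def min_max_normalize(column):
    """Apply equation (1): the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0 (assumption).
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def five_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]  # k interleaved, near-equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    print(min_max_normalize([2, 4, 6]))
    for train, test in five_fold_splits(100):
        print(len(train), len(test))
```

With the 100 reviews collected for this study, each round would train on 80 reviews and test on the remaining 20.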


Matlab R2017a was used to run the LASSO algorithm, which performs feature selection to identify the factors in airline customers’ reviews that matter most to the model.

\min_{\beta} \sum_{t = 1}^{T} (y_{t} - \beta_{0} - \beta_{1}x_{1,t} - \ldots - \beta_{k}x_{k,t})^{2}, \quad \text{s.t.} \quad \sum_{j = 1}^{k} \left| \beta_{j} \right| \leq \lambda    (2)

In the above equation, the regression coefficients β_j are constrained by the penalty bound λ. Given k explanatory variables, the parameter estimates β̂ determine which features are selected, and they depend on the value of λ. In one special case, as λ approaches infinity, the estimates β̂ are unconstrained and equal those obtained by the least-squares method. In the opposite case, when λ is set to 0, all parameter estimates are 0. Accordingly, features whose coefficients are shrunk to 0 are discarded as not essential, and the features with nonzero coefficients form the selected feature subset.
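One way to see how the constraint in equation (2) drives some coefficients to exactly zero is a small coordinate-descent sketch. It is not the Matlab routine used in the study: it solves the equivalent penalized form, with a hypothetical penalty weight `alpha` standing in for the bound λ (a larger `alpha` corresponds to a tighter bound), and the function names and toy data are assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/(2n)) * sum of squared residuals + alpha * sum(|beta_j|)."""
    n, k = len(X), len(X[0])
    # Center the data so the intercept beta_0 can be recovered at the end.
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(k)] for row in X]
    yc = [v - y_mean for v in y]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(k)]
    beta = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            if col_sq[j] == 0.0:
                continue  # constant column: leave its coefficient at 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                Xc[i][j]
                * (yc[i] - sum(Xc[i][m] * beta[m] for m in range(k) if m != j))
                for i in range(n)
            ) / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
    beta0 = y_mean - sum(beta[j] * x_mean[j] for j in range(k))
    return beta0, beta


if __name__ == "__main__":
    # Toy data (assumption): y depends only on the first feature.
    X = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
    y = [1, 3, 5, 7, 9, 11]
    for alpha in (0.0, 0.5, 10.0):
        print(alpha, lasso_coordinate_descent(X, y, alpha))
```

On the toy data, no penalty reproduces the least-squares fit, a moderate penalty keeps only the first feature, and a heavy penalty zeroes every coefficient, mirroring the two special cases described above.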


SVM was employed to evaluate the experimental results from LASSO. LIBSVM, an open-source library that supports multi-class classification, was used to build the SVM classifiers [35], which were trained on both the original feature set and the feature subset chosen by LASSO. To obtain the best parameter settings, the parameter selection tool for C-SVM classification with the radial basis function (RBF) kernel was used [36]. The stages are as follows:

Stage 3.5.1: Normalize the data.

Stage 3.5.2: Transform data format.

Stage 3.5.3: Apply the RBF kernel function given in equation (3).

K(x, y) = e^{-\gamma \left\| x - y \right\|^{2}}    (3)

Stage 3.5.4: Apply cross-validation to select the best parameters C and γ.

Stage 3.5.5: Get the best parameters C and γ, and train SVM.

Stage 3.5.6: Test using the constructed model.
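Stages 3.5.3 and 3.5.4 can be sketched as follows. The kernel function is a direct transcription of equation (3); the grid search is generic and takes a hypothetical callable `cv_score(C, gamma)` that would wrap LIBSVM training and cross-validation, which is not reproduced here. The exponentially spaced candidate grids are an assumption, since the paper does not state the ranges it searched.

```python
import math


def rbf_kernel(x, y, gamma):
    """Equation (3): K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Candidate values (assumption): powers of two on a coarse grid.
C_GRID = [2.0 ** e for e in range(-5, 16, 2)]
GAMMA_GRID = [2.0 ** e for e in range(-15, 4, 2)]


def grid_search(cv_score, c_grid=C_GRID, gamma_grid=GAMMA_GRID):
    """Return the (C, gamma) pair with the highest cross-validation score."""
    candidates = [(c, g) for c in c_grid for g in gamma_grid]
    return max(candidates, key=lambda pair: cv_score(*pair))


if __name__ == "__main__":
    # Stand-in scorer (assumption) that peaks at C = 8 and gamma = 0.125.
    def toy_score(c, gamma):
        return -((math.log2(c) - 3) ** 2) - ((math.log2(gamma) + 3) ** 2)

    print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0))
    print(grid_search(toy_score))
```

In a real run, `cv_score` would train a C-SVM with the RBF kernel on four folds and return the accuracy on the held-out fold, averaged over the five rounds.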


The confusion matrix has traditionally been regarded as the best way to assess classification performance. A confusion matrix summarizes the outcome of a classification task's predictions: its entries are the numbers of correct and incorrect predictions, given as counts and broken down by class. Overall accuracy (OA) is used in this study to determine which features are most relevant to the model [37]. A high OA for the proposed method indicates that the selected features and their representation capture valuable information; the higher the OA, the better the model's features are. OA is computed as in equation (4), where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives.

Overall\ Accuracy = \frac{TP + TN}{TP + FP + TN + FN}    (4)
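As a small worked instance of equation (4), the sketch below tallies the four confusion-matrix counts for a binary satisfied/dissatisfied task and computes OA from them. The function names and the toy labels are hypothetical and are not taken from the study's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def overall_accuracy(tp, fp, tn, fn):
    """Equation (4): share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1]  # toy labels: 1 = satisfied, 0 = dissatisfied
    y_pred = [1, 0, 0, 1, 1]
    counts = confusion_counts(y_true, y_pred)
    print(counts, overall_accuracy(*counts))
```

In the toy example, three of the five predictions match the true labels, so OA is 0.6.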

Experimental Results

LASSO Results

Table 4.1. LASSO Results















LASSO performed feature selection and ranked the features by their frequency of occurrence across the five-fold cross-validation experiment (a feature can appear up to five times). The results are shown in Table 4.1. They show that responsiveness (RS) was the most important attribute for airline passengers’ satisfaction.

SVM Evaluation

Table 4.2. SVM Evaluation

LASSO (1 factor)

Original (4 factors)

Overall Accuracy



Once LASSO had chosen the most important attributes of airline customers’ satisfaction, its performance was evaluated by SVM, as shown in Table 4.2. The results showed that the feature subset selected by LASSO obtained higher overall accuracy than the original feature set.


LASSO has shown how powerful it is: it yields fewer features, and when evaluated by SVM it also achieves higher overall accuracy than the original data set. This confirms that feature selection methods are important for obtaining a smaller set of factors that is more accurate than the original set. The results showed that responsiveness was the most important factor, since it obtained higher overall accuracy under SVM evaluation than the original data set with four factors. This is consistent with [38], who found that, due to COVID-19, the aviation industry has had to become even more nimble and responsive. Responsiveness is one of the most important factors for airline customers because, during the COVID-19 pandemic, many customers had to cancel their flights due to flight restrictions and therefore asked for refunds to be processed quickly.


This study aims to offer managerial implications that help managers provide better services and products and meet customer expectations. The first contribution is identifying which attributes are the most important factors of customer satisfaction for airlines. Managers have to handle those factors carefully, since they influence customer satisfaction, and satisfied customers become loyal. The second contribution is helping managers make trade-offs in providing services, because airlines faced many problems during the COVID-19 crisis.

The third contribution, for managers and researchers, concerns data collection: the study uses online customer reviews to avoid the potential bias and pitfalls of traditional surveys such as questionnaires, which respondents may fill in at random, and to reduce time and cost. Online reviews can overcome some of the problems that questionnaires face, such as sample bias and high cost in terms of human and financial resources [12], [39].


[1] S. Gössling, D. Scott, and C. M. Hall, “Pandemics, tourism and global change: a rapid assessment of COVID-19,” J. Sustain. Tour., vol. 29, no. 1, pp. 1–20, 2020, doi: 10.1080/09669582.2020.1758708.

[2] J. B. Sobieralski, “COVID-19 and airline employment: Insights from historical uncertainty shocks to the industry,” Transp. Res. Interdiscip. Perspect., vol. 5, p. 100123, 2020, doi: 10.1016/j.trip.2020.100123.

[3] P. Monmousseau, A. Marzuoli, E. Feron, and D. Delahaye, “Impact of Covid-19 on passengers and airlines from passenger measurements: Managing customer satisfaction while putting the US Air Transportation System to sleep,” Transp. Res. Interdiscip. Perspect., p. Epub ahead of print 4 September 2020, 2020, doi: 10.1016/j.trip.2020.100179.

[4] F. R. Lucini, L. M. Tonetto, F. S. Fogliatto, and M. J. Anzanello, “Text mining approach to explore dimensions of airline customer satisfaction using online customer reviews,” J. Air Transp. Manag., p. Epub ahead of print 27 December 2019, 2020, doi: 10.1016/j.jairtraman.2019.101760.

[5] J. Namukasa, “The influence of airline service quality on passenger satisfaction and loyalty the case of Uganda airline industry,” TQM J., vol. 25, no. 5, pp. 520–532, 2013, doi: 10.1108/TQM-11-2012-0092.

[6] J. Blodgett and H. Li, “Assessing the Effects of Post-Purchase Dissatisfaction and Complaining Behavior on Profitability: A Monte Carlo Simulation,” J. Consum. Satisf. Dissatisfaction Complain. Behav., vol. 20, pp. 1–14, 2007.

[7] J. Rezaei, O. Kothadiya, L. Tavasszy, and M. Kroesen, “Quality assessment of airline baggage handling systems using SERVQUAL and BWM,” Tour. Manag., vol. 66, pp. 85–93, 2018, doi: 10.1016/j.tourman.2017.11.009.

[8] C. Basfirinci and A. Mitra, “A cross cultural investigation of airlines service quality through integration of Servqual and the Kano model,” J. Air Transp. Manag., vol. 42, pp. 239–248, 2015, doi: 10.1016/j.jairtraman.2014.11.005.

[9] A. Parasuraman, V. A. Zeithaml, and L. L. Berry, “SERVQUAL: A Multiple-Item Scale for Measuring Consumer Perceptions of Service Quality,” J. Retail., vol. 64, pp. 12–40, 1988, doi: 10.1016/S0148-2963(99)00084-3.

[10] F. T. Shah, Z. Syed, A. Imam, and A. Raza, “The impact of airline service quality on passengers’ behavioral intentions using passenger satisfaction as a mediator,” J. Air Transp. Manag., vol. 85, p. 101815, 2020, doi: 10.1016/j.jairtraman.2020.101815.

[11] J. R. Evans and A. Mathur, “The value of online surveys: a look back and a look ahead,” Internet Res., vol. 28, no. 4, pp. 854–887, 2018, doi: 10.1108/IntR-03-2018-0089.

[12] W.-K. Chen, D. Riantama, and L.-S. Chen, “Using a Text Mining Approach to Hear Voices of Customers from Social Media toward the Fast-Food Restaurant Industry,” Sustainability, vol. 13, no. 1, p. 268, 2021, doi: 10.3390/su13010268.

[13] J. F. Engel, R. D. Blackwell, and P. W. Miniard, Consumer Behavior:Post-Purchase Processes: Consumption and Evaluation. Hinsdale: Dryden Press, 1982.

[14] L. Y. Leong, T. S. Hew, V. H. Lee, and K. B. Ooi, “An SEM-artificial-neural-network analysis of the relationships between SERVPERF, customer satisfaction and loyalty among low-cost and full-service airline,” Expert Syst. Appl., vol. 42, no. 19, pp. 6620–6634, 2015, doi: 10.1016/j.eswa.2015.04.043.

[15] G. Philip and S. A. Hazlett, “The measurement of service quality: A new P-C-P attributes model,” Int. J. Qual. Reliab. Manag., vol. 14, no. 3, pp. 260–286, 1997, doi: 10.1108/02656719710165482.

[16] Z. Yan, M. Xing, D. Zhang, and B. Ma, “EXPRS: An extended pagerank method for product feature extraction from online consumer reviews,” Inf. Manag., vol. 52, pp. 850–858, 2015, doi: 10.1016/

[17] X. Xu and Y. Li, “Examining Key Drivers of Traveler Dissatisfaction with Airline Service Failures: A Text Mining Approach,” J. Supply Chain Oper. Manag., vol. 14, no. 1, pp. 30–50, 2016.

[18] W. He, S. Zha, and L. Li, “Social media competitive analysis and text mining: A case study in the pizza industry,” Int. J. Inf. Manage., vol. 33, no. 3, pp. 464–472, 2013, doi: 10.1016/j.ijinfomgt.2013.01.001.

[19] M. Cheng and X. Jin, “What do Airbnb users care about? An analysis of online review comments,” Int. J. Hosp. Manag., vol. 76, no. A, pp. 58–70, 2019, doi: 10.1016/j.ijhm.2018.04.004.

[20] R. Tibshirani, “Regression Shrinkage and Selection Via the Lasso,” J. R. Stat. Soc. Ser. B, vol. 58, no. 1, pp. 267–288, 1996, doi: 10.1111/j.2517-6161.1996.tb02080.x.

[21] A. R. Dastjerdi, D. Foroghi, and G. H. Kiani, “Detecting manager’s fraud risk using text analysis: evidence from Iran,” J. Appl. Account. Res., vol. 20, no. 2, pp. 154–171, 2019, doi: 10.1108/JAAR-01-2018-0016.

[22] C. Cortes and V. Vapnik, “Support-Vector Networks,” Mach. Learn., vol. 20, pp. 273–297, 1995, doi: 10.1023/A:1022627411411.

[23] S. Kumar and M. Zymbler, “A machine learning approach to analyze customer satisfaction from airline tweets,” J. Big Data, vol. 6, pp. 1–16, 2019, doi: 10.1186/s40537-019-0224-1.

[24] S. K. Trivedi, S. Dey, and A. Kumar, “Capturing user sentiments for online Indian movie reviews,” Electron. Libr., vol. 36, no. 4, pp. 677–695, 2018, doi: 10.1108/el-04-2017-0075.

[25] M. Alirezanejad, R. Enayatifar, H. Motameni, and H. Nematzadeh, “Heuristic filter feature selection methods for medical datasets,” Genomics, vol. 112, no. 2, pp. 1173–1181, 2020, doi: 10.1016/j.ygeno.2019.07.002.

[26] C. Buyck, “Wizz Air Is Now Europe’s Largest Airline, Southwest World’s Biggest Amid Coronavirus Disruption,” Forbes, 2020.

[27] W. Messner, “The impact of language proficiency on airline service satisfaction,” J. Travel Tour. Mark., vol. 37, no. 2, pp. 169–184, 2020, doi: 10.1080/10548408.2020.1740139.

[28] V. Bogicevic, W. Yang, M. Bujisic, and A. Bilgihan, “Visual Data Mining: Analysis of Airline Service Quality Attributes,” J. Qual. Assur. Hosp. Tour., vol. 18, no. 4, pp. 509–530, 2017, doi: 10.1080/1528008X.2017.1314799.

[29] C. Song, J. Guo, and J. Zhuang, “Analyzing passengers’ emotions following flight delays- a 2011–2019 case study on SKYTRAX comments,” J. Air Transp. Manag., 2020, doi: 10.1016/j.jairtraman.2020.101903.

[30] R. Rajaguru, “Role of value for money and service quality on behavioural intention: A study of full service and low cost airlines,” J. Air Transp. Manag., vol. 53, pp. 114–122, 2016, doi: 10.1016/j.jairtraman.2016.02.008.

[31] T. Jeeradist, N. Thawesaengskulthai, and T. Sangsuwan, “Using TRIZ to enhance passengers’ perceptions of an airline’s image through service quality and safety,” J. Air Transp. Manag., vol. 53, pp. 131–139, 2016, doi: 10.1016/j.jairtraman.2016.02.011.

[32] N. Raassens and H. Haans, “NPS and Online WOM: Investigating the Relationship Between Customers’ Promoter Scores and eWOM Behavior,” J. Serv. Res., vol. 20, no. 3, pp. 322–334, 2017, doi: 10.1177/1094670517696965.

[33] B. Gao, X. Li, S. Liu, and D. Fang, “How power distance affects online hotel ratings: The positive moderating roles of hotel chain and reviewers’ travel experience,” Tour. Manag., vol. 5, pp. 176–186, 2018, doi: 10.1016/j.tourman.2017.10.007.

[34] M. Hu and B. Liu, “Mining and summarizing customer reviews,” in KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, pp. 168–177, doi: 10.1145/1014052.1014073.

[35] R. Wang, W. Li, R. Li, and L. Zhang, “Automatic blur type classification via ensemble SVM,” Signal Process. Image Commun., vol. 71, pp. 24–35, 2019, doi: 10.1016/j.image.2018.08.003.

[36] D. Zhao, H. Liu, Y. Zheng, Y. He, D. Lu, and C. Lyu, “A reliable method for colorectal cancer prediction based on feature selection and support vector machine,” Med. Biol. Eng. Comput., vol. 57, no. 4, pp. 901–912, 2019, doi: 10.1007/s11517-018-1930-0.

[37] S. Zhang and X. Duan, “Prediction of protein subcellular localization with oversampling approach and Chou’s general PseAAC,” J. Theor. Biol., vol. 437, pp. 239–250, 2018, doi: 10.1016/j.jtbi.2017.10.030.

[38] J. R. Bartle, R. K. Lutte, and D. Z. Leuenberger, “Sustainability and air freight transportation: Lessons from the global pandemic,” Sustainability, vol. 13, no. 7, p. 3738, 2021, doi: 10.3390/su13073738.

[39] M. Schuckert, X. Liu, and R. Law, “Hospitality and Tourism Online Reviews: Recent Trends and Future Directions,” J. Travel Tour. Mark., vol. 32, no. 5, pp. 608–621, 2015, doi: 10.1080/10548408.2014.933154.
