Evaluating airline passengers’ satisfaction during the COVID-19 pandemic: a text mining approach

In crisis market situations, customer satisfaction is important for keeping customers loyal. This study aims to identify the key service quality factors that drive customer satisfaction in the airline industry. A feature selection approach was applied to identify these factors, and support vector machines (SVM) were employed to evaluate the performance of the feature selection algorithms. The findings revealed that responsiveness was the most important factor in airline customers’ satisfaction. This research gives airline managers guidance on how to deliver services that leave customers satisfied.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic has caused problems for airlines. Major airlines such as Scandinavian Airlines and Virgin have already demanded tens of billions of dollars in government assistance [1]. Throughout history, the airline industry has faced numerous challenges, but none have been as swift and serious as the spread of COVID-19 [2]. Customer satisfaction management is critical for the airline industry during the pandemic [3]. According to some studies, customer satisfaction plays an important role in motivating consumers' behavioral loyalty, which translates into favorable feedback, repeat purchases, or recommending the product or service to others [4]. Unsatisfied passengers, on the other hand, may decide not to fly with the same airline again [5], or they may spread negative word of mouth (possibly electronic) that harms the company's credibility and image [6].
Customers' satisfaction with airline services is described as the customer's response after receiving the service [7]. The SERVQUAL model is one of the most reliable tools for determining how airline services affect customer satisfaction [8]. In empirical research, [9] proposed SERVQUAL, a service quality measurement instrument built on dimensions such as tangibles, reliability, responsiveness, assurance, and empathy. This method was designed

Literature Reviews

Interrelationship among SERVQUAL and customer satisfaction
The interaction between pre-purchase expectations and post-purchase evaluation results in customer satisfaction [13]. According to the SERVQUAL instrument, service quality is measured by the gap between customers' expectations of the service and their perceptions of the provider's actual performance [9]. Customer satisfaction is likewise shaped by expectations and actual service performance, and SERVQUAL is grounded in expectancy-disconfirmation theory [14]. It can therefore be inferred that customers will be satisfied if airlines deliver well on the SERVQUAL dimensions. SERVQUAL is also a generic model that offers a framework for assessing perceived service quality [7]. The model has been hotly debated, but it has been applied in many studies [7], [10], [15]. Consequently, this study uses SERVQUAL to measure customer satisfaction.

Text Mining
Text mining is a method that combines data mining and natural language processing techniques. To cope with information overload, text mining methods have been applied to online review research, mostly to detect general trends or specific patterns in online customer reviews [16]. Text mining is a quick and easy way to summarize and gather key details from a large number of customer reviews, making the customer's point of view clearer [17].
Text mining is a powerful technique for extracting business value from the large amount of accessible social media data [18]. [19] used a text mining tool to examine Airbnb customer feedback and found that, interestingly, 'price' is not a key influencer. [12] used text mining in an exploratory study to learn what restaurant customers want. Accordingly, this research employs a text mining technique to determine what airline passengers want in terms of satisfaction.

Least Absolute Shrinkage and Selection Operator (LASSO)
LASSO is a linear model that performs shrinkage and variable selection simultaneously, so regression and feature selection happen at the same time, and it offers further benefits such as sparsity and interpretability [20]. [21] used text analysis to detect managers' fraud risk and found that LASSO is substantially more accurate than Convex Optimization (CVX). [12] used LASSO to investigate online customer feedback. This study employs LASSO to extract the most relevant factors from online customer feedback, in order to assist managers in making trade-offs when providing service to their customers.

International Conference of Information Communication Technologies
enhanced Social Sciences and Humanities 2021 -ICTeSSH 2021

Support vector machine (SVM)
SVM is a supervised learning algorithm based on statistical learning theory and the structural risk minimization principle [22]. [23] examined customer satisfaction from airline tweets and found SVM to be superior among machine learning classifiers. [24] used SVM and found it effective for evaluating the performance of feature selection algorithms, with movie reviews as the data source. [25] used SVM and demonstrated that it is a good candidate for assessing the classification accuracy of feature selection algorithms on medical datasets. In this study, the performance of LASSO is evaluated using SVM.

Data Collection
This study collected online customer reviews of two of the biggest airlines in Europe [26] from Skytrax (https://www.airlinequality.com/). The website lets passengers post comment texts together with a rating score from 0 to 10. Skytrax receives about 1.26 million monthly visitors, 87.47 percent of whom arrive by searching with words such as airline(s), air, and review(s) [27]. Because a series of verification and security features is applied to comments on Skytrax, they are considered highly reliable [28]. [29] collected customer feedback from Skytrax to assess airline passengers' emotions. In total, 100 reviews were collected, covering 22 March 2020 to 26 March 2021.

Defining Factors
This study defined its factors based on the literature review and measured the satisfaction factors using the SERVQUAL dimensions.

Building Lexicons
This study built lexicons by computing the frequencies of words related to SERVQUAL. Single words related to SERVQUAL, identified from the literature review, were then expanded with synonyms and antonyms from thesaurus.com to build the complete lexicon for each factor. Adverbs, verbs, and adjectives are parts of speech (POS) that can express customers' opinions, ideas, reactions, and emotions [33]. One technique commonly used to detect service or product attributes is a part-of-speech (POS) tagger that annotates nouns and noun phrases in customer reviews [34].
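As a rough illustration of how a lexicon can turn a review into per-factor counts, the sketch below matches review tokens against small word sets. The lexicon entries, the sample review, and the function names are hypothetical and are not the study's actual lexicons; POS tagging is left out to keep the example self-contained.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons (assumption); the study's real lexicons
# were built from the literature and thesaurus.com and are not shown here.
LEXICONS = {
    "tangibles": {"seat", "cabin", "clean", "dirty"},
    "reliability": {"punctual", "delayed", "cancelled"},
    "responsiveness": {"quick", "slow", "refund", "prompt"},
    "assurance": {"safe", "professional", "trust"},
    "empathy": {"friendly", "rude", "caring"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def factor_counts(review, lexicons=LEXICONS):
    """Count how many tokens of a review fall into each factor's lexicon."""
    token_freq = Counter(tokenize(review))
    return {
        factor: sum(token_freq[word] for word in words)
        for factor, words in lexicons.items()
    }


# Hypothetical review text used only for illustration.
review = "Quick refund, but the seat was dirty and the crew was rude."
counts = factor_counts(review)
```

Each review then becomes one row of factor counts, which is the kind of numeric input a feature selection method can work with.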

Normalizing Data
Before the feature selection method was run, the data had to be reformatted in order to obtain valid results. The researcher used equation (1) to normalize the data into the interval (-1, 1).
Besides data normalization, a five-fold cross-validation experiment was implemented.
The data set was divided into five equal parts. Four parts were used as the training data set, and the remaining part was used as the test data set.
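The two preparation steps can be sketched as follows. The sketch assumes min-max scaling, which is one common way to map features onto [-1, 1]; the study's own normalization equation may differ. The function names, the seed, and the sample values are illustrative.

```python
import random


def scale_column(values, low=-1.0, high=1.0):
    """Min-max scale one feature column onto [low, high] (assumed method)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:  # constant column: map everything to the midpoint
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds equally sized groups.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


scaled = scale_column([0, 2, 5, 10])   # illustrative feature values
splits = list(five_fold_indices(100))  # 100 reviews, as in the data set
```

With 100 reviews, every split trains on 80 reviews and tests on the remaining 20, and each review is used for testing exactly once.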

LASSO
Matlab R2017a was used to run the LASSO algorithm, which performs feature selection while fitting the model, in order to obtain the most important factors from airline customers' reviews.
Based on the above equation, the regression parameter value βi is limited by a specific penalty benchmark. For a given bound k, the parameter estimate β̂ selects the essential features. The parameter estimate is influenced by the value of λ. In one special case, when k approaches infinity, the parameter estimate β̂ is not limited, and the estimate equals the value determined by the least-squares method. In the opposite case, when k is set to 0, all parameter estimates are 0.
Accordingly, LASSO yields a feature subset: features whose coefficients are 0 are not the features we are looking for.
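The selection behavior described above can be illustrated with a short sketch. It uses scikit-learn's `Lasso` rather than the Matlab implementation used in the study, and it runs on synthetic data, so it is an illustration under stated assumptions and not a reproduction of the experiment. scikit-learn minimizes (1/(2n))‖y − Xβ‖² + α‖β‖₁, so `alpha` plays the role of λ: a large `alpha` corresponds to a small bound k, and a small `alpha` to a large k.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-in data (assumption): 200 "reviews" x 5 factor scores,
# where only the first two factors actually drive the satisfaction score.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)


def selected_features(X, y, alpha):
    """Fit LASSO and return the indices of features with nonzero coefficients."""
    model = Lasso(alpha=alpha).fit(X, y)
    return [j for j, coef in enumerate(model.coef_) if coef != 0.0]


kept = selected_features(X, y, alpha=0.1)          # moderate penalty
kept_heavy = selected_features(X, y, alpha=100.0)  # very strong penalty
```

Under the moderate penalty the irrelevant features are shrunk to exactly zero and drop out of the subset, while the very strong penalty forces every coefficient to zero, which mirrors the k = 0 case in the text.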

Evaluation
SVM was employed to evaluate the experimental results from LASSO. LIBSVM, an open-source library that supports multi-class classification, was used to create the SVM classifier [35], training classifiers on both the original feature set and the feature subsets chosen by LASSO. To obtain the best parameter settings, grid.py, the parameter selection tool for C-SVM classification with the Radial Basis Function (RBF) kernel, was used [36]. The evaluation proceeds in six stages, Stage 3.5.1 to Stage 3.5.6.

Metrics
Traditionally, the confusion matrix has been regarded as the best way to assess classification performance. A confusion matrix summarizes the outcome of a classification task's predictions. Its key content is the number of correct and incorrect predictions, given as counts and broken down by class. Overall accuracy (OA) is used in this study to determine which features are most relevant to the model [37]. A high OA for the proposed method indicates that the feature selection and representation capture valuable information. The higher the OA, the better the model's features are.
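A minimal sketch of the metric for a binary task is given below: overall accuracy is the share of correct predictions, (TP + TN) divided by all predictions. The labels are hypothetical, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


def overall_accuracy(y_true, y_pred, positive=1):
    """Overall accuracy: correct predictions divided by all predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical labels: 1 = satisfied, 0 = not satisfied.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
oa = overall_accuracy(y_true, y_pred)
```

In this example four of the six predictions are correct, so the overall accuracy is 4/6.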
$$\text{Overall Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{4}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.
[38] found that, due to COVID-19, the aviation industry has had to become even more nimble and responsive. Responsiveness is one of the most important factors for airline customers because, during the COVID-19 pandemic, many customers had to cancel their flights due to flight restrictions and therefore asked for a fast refund process.

Conclusions
This study aims to offer managerial implications that help managers provide better services and products to their customers and meet customer expectations. The first contribution is identifying which attributes are the most important factors of customer satisfaction for airlines.
Managers have to handle these factors carefully because they influence customer satisfaction, and satisfied customers become loyal. The second contribution of this study is to help managers make trade-offs in providing services, because airlines have faced many problems during the COVID-19 crisis. The third contribution, for managers and researchers, concerns data collection: the study uses online customer reviews to avoid the potential bias and pitfalls of traditional surveys such as questionnaires, where respondents may answer randomly, and to reduce time and cost. Online reviews can solve some problems that questionnaires face, such as sample bias and high cost in terms of human and financial resources [12], [39].

Stage 3.5.1: Normalize the data.
Stage 3.5.2: Transform data format.
Stage 3.5.3: Apply the RBF kernel function listed in equation (3):
$$K(x_i, x_j) = e^{-\gamma \lVert x_i - x_j \rVert^2} \tag{3}$$
Stage 3.5.4: Apply cross-validation to select the best parameters C and γ.
Stage 3.5.5: Get the best parameters C and γ, and train SVM.
Stage 3.5.6: Test using the constructed model.
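The stages above can be sketched in a few lines. The sketch uses scikit-learn, whose `SVC` is built on LIBSVM, in place of the LIBSVM command-line tools and grid.py used in the study; the data are synthetic, and the grid values are illustrative rather than the study's settings. The data format transformation stage is specific to the LIBSVM tools and has no counterpart here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): two well-separated classes, 4 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(3.0, 1.0, (100, 4))])
y = np.array([0] * 100 + [1] * 100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Normalize to [-1, 1], then classify with an RBF-kernel C-SVM.
pipeline = Pipeline([
    ("scale", MinMaxScaler(feature_range=(-1, 1))),
    ("svm", SVC(kernel="rbf")),
])

# Cross-validated grid search over C and gamma (illustrative exponential grid).
param_grid = {
    "svm__C": [2.0 ** k for k in (-1, 1, 3, 5)],
    "svm__gamma": [2.0 ** k for k in (-7, -5, -3, -1)],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # refits on all training data with the best pair

# Test using the constructed model.
test_accuracy = search.score(X_test, y_test)
best_params = search.best_params_
```

Placing the scaler inside the pipeline means the scaling range is learned from the training folds only, so the held-out fold does not leak into the normalization.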

Table 3.1 lists the factors.

Table 4 .
get fewer factors but more accurate than the original factors. The results showed that responsiveness was the most important factor, since it achieved a higher overall accuracy when evaluated by SVM than the original data set of four factors. This is consistent with the strength of LASSO: it retains fewer features, and after evaluation by SVM the LASSO-selected subset also achieves a higher overall accuracy than the original data set. It has been shown that feature selection methods are very important to