Customer Satisfaction: Data Driven Analysis

Parodi, Valerio

doi:10.15167/parodi-valerio_phd2022-05-20

The project is based on a data analysis carried out for a particular company, with the aim of improving how its clients perceive the brand. To delimit the domain, we start with some highlights on customer experience and customer satisfaction. The object of the study is customer satisfaction as measured by the Net Promoter Score (NPS). The goal of the project is to understand which segments of the company have an impact on brand perception, and in particular on the NPS. The available data are comment forms filled in by clients: some questions are multiple-choice, while others are free-text. With this in mind, our work devotes an entire chapter to the theory and techniques useful for our purposes; in particular, we want to give a solid background in free-text analysis. We move from intuitive processes to sophisticated algorithms. The starting point is classification from plain text: text preprocessing, feature extraction from text, and linear models for sentiment analysis. Before moving on to language modelling, we introduce a simple deep learning model for text classification. We then describe N-gram language models and neural language models. After sentiment analysis, the main goal of free-text analysis is topic extraction. To achieve this goal we need to model the meaning of words, which means studying word and sentence embeddings. Distributional semantics, explicit and implicit matrix factorization, Word2vec, and word analogies are just some of the techniques detailed. Topic modelling means navigating through a text collection and finding the meaning of a particular sentence, paragraph, or entire document. Topic modelling comprises an ensemble of different approaches and algorithms; we chose to analyse PLSA and, in particular, LDA (Latent Dirichlet Allocation). Another important part of the knowledge needed to tackle the problem concerns predictive models and feature engineering.
An entire chapter is devoted to feature importance and regression trees. The last part of the document presents the project in detail, starting from real data. We divide the project into two parts. The first analysis is based on the multiple-choice questions. Our objective is to understand which question has the most impact on the NPS question. We then want to estimate the variation in the NPS, supposing that a business action could change the result of a particular question (for a particular outlet). For this purpose we create an NPS simulator: starting from a shift in some customers' opinions at a particular outlet, caused by a business action, we estimate the resulting NPS variation. The second analysis is based on the free-text questions. Using the methods already described, we want to extract customers' sentiment and the topics not covered by the multiple-choice questions. What we want to achieve is a consolidation, through data-driven techniques, of what the company is already studying (multiple-choice questions), but above all we want to identify the strengths to focus on and the weaknesses to work on that have not been known until now (free-text questions).
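To make the NPS and the simulator idea more concrete, here is a minimal Python sketch. The `nps` function uses the standard NPS definition: the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6) on the 0-10 likelihood-to-recommend question. The simulator part is a hypothetical simplification, not the method developed in the thesis: the function name `simulate_nps_shift`, the response record layout, and the rule that a customer who changes answer adopts the rounded mean score of customers at the same outlet who already gave the target answer are all assumptions made for illustration. The thesis itself relies on predictive models (e.g. regression trees) for this step.

```python
from statistics import mean


def nps(scores):
    """Standard Net Promoter Score on the 0-10 scale, in [-100, 100]."""
    scores = list(scores)
    if not scores:
        raise ValueError("NPS needs at least one score")
    promoters = sum(1 for s in scores if s >= 9)   # 9-10
    detractors = sum(1 for s in scores if s <= 6)  # 0-6
    return 100.0 * (promoters - detractors) / len(scores)


def simulate_nps_shift(responses, outlet, question, from_answer, to_answer, fraction):
    """HYPOTHETICAL simulator: estimate the company-level NPS change if
    `fraction` of the customers of `outlet` who answered `from_answer` to
    `question` moved to `to_answer` after a business action.

    ASSUMPTION: a moved customer adopts the rounded mean NPS score of the
    customers at the same outlet who already answered `to_answer`.
    Each response is assumed to be {"outlet": str, "answers": {q: a}, "nps": int}.
    """
    reference = [r["nps"] for r in responses
                 if r["outlet"] == outlet and r["answers"].get(question) == to_answer]
    if not reference:
        raise ValueError("no reference customers gave the target answer")
    new_score = round(mean(reference))

    # Indices of customers eligible to change their answer.
    movers = [i for i, r in enumerate(responses)
              if r["outlet"] == outlet and r["answers"].get(question) == from_answer]
    k = round(fraction * len(movers))  # deterministic: move the first k eligible

    shifted = [r["nps"] for r in responses]
    for i in movers[:k]:
        shifted[i] = new_score

    before = nps(r["nps"] for r in responses)
    return before, nps(shifted) - before


# Toy usage with made-up data (not from the thesis).
responses = [
    {"outlet": "A", "answers": {"staff": "good"}, "nps": 10},
    {"outlet": "A", "answers": {"staff": "good"}, "nps": 9},
    {"outlet": "A", "answers": {"staff": "poor"}, "nps": 4},
    {"outlet": "A", "answers": {"staff": "poor"}, "nps": 6},
    {"outlet": "B", "answers": {"staff": "poor"}, "nps": 8},
]
print(simulate_nps_shift(responses, "A", "staff", "poor", "good", 0.5))  # (0.0, 40.0)
```

Moving half of outlet A's "poor" answers to "good" turns one detractor into a promoter, which raises the company-level NPS from 0 to 40 on this toy sample. A model-based simulator would replace the conditional-mean rule with predictions from a fitted model.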