Abstract


The paper focuses on research into customer satisfaction based on a detailed analysis of consumer reviews written in natural language, using Artificial Intelligence methods such as Text Mining, Aspect Sentiment Analysis, Data Mining and Machine Learning. The results corroborate the efficacy of the devised method as decision support for product quality management and give reasons to prefer it over classical research methods.



Historically, quality assurance has been based on the process approach of a quality management system. The process focuses on communication between the company and the customer during production and consumption of the product. To improve product quality, the model incorporates feedback, which takes the form of customer reviews of product quality.

To measure customer satisfaction, the International Quality Standard uses several measures: personal interviews, phone interviews, discussion groups, mail surveys, and online research and surveys. These have extensive drawbacks. They generate a huge amount of manual work, which increases research costs and hinders continuous monitoring of customer satisfaction. This in turn affects managerial decision making, which depends on how quickly up-to-date information about customer opinions arrives. These drawbacks have led organizations to focus on approaches such as AI.

Big data for Analytics

Big data incorporates approaches, tools and methods for dealing with structured and unstructured data that is huge, diverse and continuously growing. The analysis is performed on a large amount of generated data and is characterised by four Vs: Volume, Variety, Velocity and Value. An organization's readiness to move to new technologies for working with large volumes of data is expressed by an indicator referred to as Bigd. If Bigd exceeds 50%, Big Data analysis technologies should be implemented.

Volume is the parameter that refers to the amount of accumulated data. Velocity is computed from two values: the first describes the capture and processing of data in near-real-time, and the second is the rate of data accumulation in the organization. Variety refers to the collection of data from multiple sources, which may be in multiple formats. Value is determined by experts and ranges from 0 to 1, to prioritise the sources of data. The key drawback is that Big Data technologies are inefficient for evaluating the quality of services. Various organizations have proposed a methodology to evaluate reviews based on Artificial Intelligence (AI).

AI based approach to Quality Management

The approach is based on four steps. First, reviews are collected, cleansed and loaded. Second, the stored reviews are analysed and processed by evaluating their emotional response. Third, qualitative and quantitative research is carried out based on decision trees, with sentiment as the dependent variable. Finally, management decisions are made on the basis of this research.

Figure: Steps in the AI-based approach


Applied AI Based Techniques

3.1 Data Collection

The reviews are stored in an XML structure. It includes separate blocks for the product name, the company and the review, with further blocks for additional information. Storing each review as a review object simplifies data collection. There are two key methods of collecting reviews: using APIs, which are ready-to-use tools, and web parsing, which refers to automated analysis of web pages using scripts.
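As an illustration of the XML storage described above, the following minimal sketch loads reviews into review objects. The tag names (`reviews`, `review`, `product`, `company`, `text`), the `Review` class and the sample data are assumptions made for this example; the source only states that product name, company and review are held in separate blocks.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical layout: the tag names and sample values below are assumptions,
# not the schema used in the original study.
SAMPLE_XML = """
<reviews>
  <review>
    <product>Hotel A</product>
    <company>Example Resorts</company>
    <text>The room was clean but the food was poor.</text>
  </review>
  <review>
    <product>Hotel B</product>
    <company>Example Resorts</company>
    <text>Friendly staff and great service.</text>
  </review>
</reviews>
"""


@dataclass
class Review:
    """One review object with the three blocks named in the text."""
    product: str
    company: str
    text: str


def load_reviews(xml_text: str) -> list[Review]:
    """Parse an XML string and return one Review per <review> element."""
    root = ET.fromstring(xml_text.strip())
    reviews = []
    for node in root.iter("review"):
        reviews.append(Review(
            product=node.findtext("product", default="").strip(),
            company=node.findtext("company", default="").strip(),
            text=node.findtext("text", default="").strip(),
        ))
    return reviews


reviews = load_reviews(SAMPLE_XML)
for r in reviews:
    print(r.product, "|", r.text)
```

Whether the XML comes from an API or from a web parsing script, the same loader could be reused as long as both sources are mapped to one agreed layout.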

3.2 Sentiment Analysis

After data collection, the reviews are processed using text mining tools. Sentiment analysis evaluates the emotional value of the author’s opinion about the object discussed in the text, here satisfaction with the product. There are three approaches to the analysis: linguistic, statistical and combined. The linguistic approach relies on rules and a sentiment vocabulary, with the drawback that it gives no quantitative evaluation of the sentiment. The statistical approach covers supervised and unsupervised machine learning. The combined approach merges the first two. Currently, supervised machine learning based on Bayesian classification and Support Vector Machines is used, together with lemmatization, which reduces words to their base form.

3.3 Aspect Sentiment Analysis

Aspect sentiment analysis allows customer opinion to be evaluated for individual aspects of a product. An aspect refers to a characteristic, attribute, quality or property of the product. The analysis has two stages: aspect identification and determination of the sentiment of the comment. The following algorithm was devised for this purpose:

First stage:

1. Extract all nouns S from the set of reviews D.

2. Count the frequency of the words in the set of reviews D.

3. Compute the difference between the counted frequencies.

4. Sort the set of nouns S in descending order and divide the nouns into aspect groups.

Second stage:

1. Divide the set of reviews into sets of sentences.

2. Classify the sentiment of each sentence.

3. Check the sentence condition: if a sentence carries negative or positive sentiment and contains at least one noun from an aspect group, it is labelled as an opinion.
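The second stage can be sketched as follows. This is only an illustration under stated assumptions: the aspect groups are taken as given (as if produced by the first stage), and a tiny hand-made word list stands in for the trained sentiment classifier. The names `ASPECT_GROUPS`, `POSITIVE`, `NEGATIVE` and `extract_opinions` are hypothetical.

```python
import re

# Assumed inputs: aspect groups as the first stage might produce them, and a
# toy sentiment lexicon standing in for a trained classifier.
ASPECT_GROUPS = {
    "food": {"food", "breakfast", "dinner"},
    "room": {"room", "bed", "bathroom"},
    "service": {"service", "staff", "reception"},
}
POSITIVE = {"good", "great", "clean", "friendly", "tasty"}
NEGATIVE = {"bad", "poor", "dirty", "rude", "cold"}


def split_sentences(review):
    """Step 1: divide a review into sentences."""
    return [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def classify(sentence):
    """Step 2: classify sentiment by counting lexicon hits (toy stand-in)."""
    words = tokenize(sentence)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def extract_opinions(review):
    """Step 3: keep polar sentences that mention a noun from an aspect group."""
    opinions = []
    for sentence in split_sentences(review):
        sentiment = classify(sentence)
        if sentiment == "neutral":
            continue
        words = set(tokenize(sentence))
        for group, nouns in ASPECT_GROUPS.items():
            if words & nouns:
                opinions.append((group, sentiment, sentence))
    return opinions


opinions = extract_opinions(
    "The room was clean. The food was cold and poor! We arrived on Monday."
)
for group, sentiment, sentence in opinions:
    print(group, sentiment, sentence)
```

Sentences without polarity, such as the last one in the sample review, are dropped, so only sentences that satisfy both conditions are labelled as opinions.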

The results of Sentiment Analysis and Aspect Sentiment Analysis are represented in textual form.

3.4 Decision Trees

A decision tree is an algorithm for processing the data obtained from Sentiment Analysis and Aspect Sentiment Analysis. Its key role is data mining in support of decisions in product quality management. It is implemented with an intelligent data analysis tool whose results are easy to comprehend because they are expressed in Boolean logic. The algorithm explains which product aspects influence customer satisfaction and in what way. A decision tree makes it possible to measure how individual sentiment comments on aspects, and their joint presence or absence, influence customer satisfaction. It thereby reveals the product aspects that matter most to the customer. It also makes it possible to evaluate experimentally how customer satisfaction depends on satisfaction with different product attributes, which helps the company distribute its budget effectively to maintain high product quality.

The importance of an aspect group describes how much review sentiment depends on the sentiment of that aspect group. If the number of aspect groups is g / 2, then the number of independent variables is g. The customer satisfaction score S is determined from $N_{pos}$, the number of positive reviews, and $N_{neg}$, the number of negative reviews.

Implementation of the approach

The developed approach was tested on data from 635,824 hotel reviews collected from various popular internet sources and covering the period 2003-2013. The collected data had the following structure: hotel name; country name; resort name; date of visit; opinion of the hotel; author evaluation of food. A training sample of positive and negative judgements was constructed from the authors’ evaluations of accommodation, food and service. Classification accuracy was measured as the ratio of the number of correctly classified examples to the total number of examples.

To label the reviews and perform the Sentiment Analysis, a classifier was built using the Naive Bayes (NB) method, with frequency vectors as the attribute space, lemmatization, and tagging of negative particles.
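A classifier of this kind might look like the minimal sketch below: a multinomial Naive Bayes model over word-frequency vectors with add-one smoothing, where a negative particle is fused onto the following word. It is not the classifier from the study. Lemmatization is omitted for brevity, the negation list and the training sentences are invented for illustration, and the `accuracy` helper implements the ratio of correctly classified examples to their total number.

```python
import math
import re
from collections import Counter

# Assumed list of negative particles; the study's own list is not given.
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase tokens; a negative particle is fused onto the next word."""
    tokens, negate = [], False
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in NEGATIONS:
            negate = True
            continue
        tokens.append("not_" + word if negate else word)
        negate = False
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes over word frequencies with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))  # frequency vector
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # unseen words are ignored
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Ratio of correctly classified examples to their total number."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Invented toy training sample, for illustration only.
train_texts = [
    "great food and friendly staff",
    "clean room and good service",
    "dirty room and rude staff",
    "food was not good",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)

test_texts = ["the room was not good", "great service"]
test_labels = ["negative", "positive"]
print(accuracy(model, test_texts, test_labels))
```

Fusing the particle into a token such as `not_good` lets the frequency vector distinguish "good" from "not good", which is the purpose of tagging negative particles.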

The algorithm was used to extract all the key words, which were divided into seven basic aspect groups. Sentences containing words from the aspect groups were then extracted and labelled by sentiment to support quantitative research into the dynamics of consumer satisfaction. The factors were compared with the average satisfaction across the whole resort to detect negative trends and identify problems in the quality of hotel services. Next, to find the reasons for the underperformance of Hotel A, qualitative research was carried out on the Sentiment Analysis results. Decision trees built with the C4.5 algorithm identified the main factors of consumer dissatisfaction: low service level, problems with food, and complaints about the hotel room. This led to improved managerial decisions, based on information about existing problems that the automated algorithm extracted from negative reviews.
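C4.5 chooses each split by gain ratio, that is, information gain divided by split information. The sketch below computes only that criterion for Boolean aspect variables and picks the aspect that best separates satisfied from dissatisfied reviews. It is not a full C4.5 implementation, and the feature names and rows are invented for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, feature, target="satisfied"):
    """Information gain of `feature` divided by its split information."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    split_info = entropy([r[feature] for r in rows])
    return (base - remainder) / split_info if split_info > 0 else 0.0


# Invented toy data: each row is one review, with Boolean flags saying whether
# it contains a negative opinion about an aspect group.
rows = [
    {"service_neg": True, "food_neg": False, "satisfied": False},
    {"service_neg": True, "food_neg": True, "satisfied": False},
    {"service_neg": False, "food_neg": True, "satisfied": True},
    {"service_neg": False, "food_neg": False, "satisfied": True},
]
features = ["service_neg", "food_neg"]
best = max(features, key=lambda f: gain_ratio(rows, f))
print(best, gain_ratio(rows, best))
```

In this toy sample negative service comments fully determine dissatisfaction, so that variable would be placed at the root of the tree; a complete tree learner would repeat the selection recursively on each branch.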


1) The suggested approach is based on text data processing and analysis. It makes it possible to carry out quantitative and qualitative research of customer satisfaction using computer-aided procedures, and supports effective managerial decisions in product quality management. It substantially reduces the labour intensity of customer satisfaction research and is available to a wide range of companies.

2) The experiment proved effective in solving real quality management problems, as the analysis of customer reviews produced highly accurate results.

Future work in this field could be devoted to the automatic annotation of text data: summarising the huge amount of text found in reviews in order to extract valuable information.