ABSTRACT – Early disease prediction is
one of the core concerns of the biomedical
and healthcare communities, since it
improves the quality of early diagnosis
for fatal diseases such as Congenital
Heart Disease and cancer. Advanced data
mining techniques can support remedial
action. Applying data mining concepts
such as classifiers and Association Rule
Mining (ARM) to structured medical data
helps detect the occurrence of a
particular disease. A medical data set
obtained from an open data source in the
United Kingdom is processed and analysed
for heart disease prediction, and the
system then suggests a hospital for
further treatment. An accuracy comparison
of the classifier algorithms used is
generated in R Studio. These prediction
results pave the way for proper diagnosis
and early treatment of chronic diseases.
They can help reduce the deaths that
occur when fatal diseases are detected
only at a critical stage.
Key Words – Data mining, Congenital
Heart Disease, ARM.
I. INTRODUCTION
The healthcare industry collects huge
amounts of reliable healthcare data
which, unfortunately, are not “mined” to
discover hidden information for effective
decision making. Clinical decisions are
often based on doctors’ intuition and
experience rather than on the
knowledge-rich data hidden in the
database.
I.1 Classification
Two forms of data analysis can be used to
extract models that describe important
classes or to predict future data trends:
• Classification
• Prediction
Classification models predict categorical
class labels, whereas prediction models
predict continuous-valued functions. The
two main classifiers implemented here are
the Decision Tree and Naïve Bayes
classification algorithms.
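As an illustration of how a classifier
assigns a categorical class label, the
following is a minimal sketch of a Naïve
Bayes classifier over categorical
attributes. It is not the implementation
used in this work: the function names,
the add-one (Laplace) smoothing and the
toy patient records are all assumptions
made for the example.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> count
    value_counts = defaultdict(Counter)
    # values_seen[attribute index] -> distinct values of that attribute
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row):
    """Return the class with the highest smoothed posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class) with add-one smoothing (an assumption)
            seen = value_counts[(i, label)][value]
            score *= (seen + 1) / (n + len(values_seen[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records: (age group, blood pressure) -> heart disease?
rows = [("old", "high"), ("old", "high"), ("old", "normal"),
        ("young", "normal"), ("young", "normal"), ("young", "high")]
labels = ["yes", "yes", "yes", "no", "no", "no"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("old", "high")))
```

The classifier multiplies the class prior
by the conditional probability of each
attribute value and returns the class
with the largest product, which is the
"naïve" independence assumption at work.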
I.2 Association Rule Mining
Association means finding relationships
between different data items in the same
transaction in order to discover hidden
patterns. For instance, suppose that a
customer who buys a desktop (A) also
purchases a speaker (B) in 55% of cases,
and that the two items appear together in
8.2% of all transactions. The association
rule in this case is A ⇒ B, where 55% is
the CF (confidence factor) and 8.2% is
the SF (support factor). Apriori, Pincer
Search and AprioriDP are efficient ARM
algorithms in data mining.
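The two factors can be computed directly
from a list of transactions. The sketch
below follows the usual convention, in
which support is the share of all
transactions containing both items and
confidence is the share of transactions
containing A that also contain B. The
baskets and function names are
hypothetical and chosen only to keep the
arithmetic easy to follow.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions with the antecedent, the share that also has the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical baskets (not data from this paper)
transactions = [
    {"desktop", "speaker"},
    {"desktop", "speaker", "mouse"},
    {"desktop", "mouse"},
    {"desktop"},
    {"mouse"},
    {"speaker"},
    {"keyboard"},
    {"mouse", "keyboard"},
]

sf = support(transactions, {"desktop", "speaker"})       # 2 of 8 baskets
cf = confidence(transactions, {"desktop"}, {"speaker"})  # 2 of 4 desktop baskets
print(f"SF = {sf:.1%}, CF = {cf:.1%}")
```

Here the rule desktop ⇒ speaker has a
support of 25% and a confidence of 50%.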
II. LITERATURE SURVEY
This section reviews the techniques
researchers have used to predict disease.
Many data mining techniques are involved
in the design of disease prediction
models.
M.C.S. Geetha et al. (2017), in
“Analyzing the Suitability of Relevant
Classification Techniques on Medical Data
Set for Better Prediction”, emphasize
that it is difficult for medical
practitioners to predict a heart attack,
as this requires experience and
knowledge. The health sector today holds
concealed yet significant information for
making decisions. The authors therefore
applied and analysed commonly used
classification algorithms on a medical
data set to help predict heart disease,
the primary cause of death worldwide. The
results do not show a remarkable
difference in prediction between the
different classification algorithms. The
experiment can serve as a significant
tool for physicians to predict dangerous
cases in practice and counsel patients
accordingly. The model presented in the
paper can answer more complex queries in
forecasting heart attacks. The predictive
accuracy of the REPTree, J48 and BayesNet
algorithms suggests that the parameters
used are reliable indicators for
predicting heart disease. In the future,
more parameters can be considered for
better prediction [1].

Sarath Babu et al. (2017) designed a
system for “Heart Disease Diagnosis Using
Data Mining Technique”. Medical data
mining has great potential for exploring
the hidden patterns in medical data sets.
These patterns can be used for clinical
diagnosis, but the data need to be
collected in a standardized form.
Fourteen attributes extracted from the
medical profiles, such as age, sex, blood
pressure and blood sugar, can predict the
likelihood of a patient developing heart
disease. These attributes are fed into
the K-means algorithm, the MAFIA
algorithm and Decision Tree
classification for heart disease
prediction, thereby applying data mining
to heart disease treatment. The Decision
Tree performed very efficiently with the
fourteen attributes after a genetic
algorithm was applied to reduce the data
to the optimal subset of attributes for
heart disease prediction [2].
Ilham Kadi et al. (2015) developed “A
decision tree-based approach for
cardiovascular dysautonomias diagnosis”.
In this paper, a case study was performed
to construct a cardiovascular
dysautonomias prediction system using
data mining techniques and a dataset
collected from the ANS (Autonomic Nervous
System) unit of the Moroccan university
hospital Avicenne. The prediction system
is a decision tree-based classifier
developed using the C4.5 decision tree
algorithm. To assess the performance of
the system, its accuracy was compared
with that of K-NN and Naïve Bayes (NB)
classifiers. These classifiers achieved
good performance and promising accuracy
rates, but still lower than those of the
C4.5 algorithm. C4.5 is one of the
best-known decision tree algorithms
because of its efficiency and
comprehensive features. The results were
analysed against three main goals:
accuracy, interpretability and usability.
The prototype was found to be highly
accurate, interpretable, time-saving and
easy to use. The prediction system can
automate the analysis of the ANS test
results and make it easier for
specialists. It can also provide decision
support for cardiologists, helping them
make better clinical decisions or at
least giving them a second opinion [3].
Sandeep Kaur et al. (2016) proposed
“Disease Prediction using Hybrid K-means
and Support Vector Machine”. Predictive
data mining in the field of medical
diagnosis is an emerging research area. A
hybrid of the K-means algorithm and the
Support Vector Machine (SVM) algorithm is
proposed in this paper to improve the
efficiency and accuracy of disease
prediction. The hybrid K-means algorithm
is applied for dimensionality reduction,
removing outliers and noisy data. SVMs
help minimize errors and examine the
medical data in a shorter time. The
reduced dataset is given as input to the
Support Vector Machine classifier. The
hybrid algorithm is developed by
analysing various enhanced K-means
algorithms and then selecting the two
best on the basis of their performance.
The proposed method selects the initial
centroids by partitioning the data into k
equal parts. The simulation is performed
on a diabetes dataset in MATLAB. The
results show that the proposed algorithm
achieves better efficiency and accuracy
than the simple K-means algorithm:
K-means achieved an accuracy of 82% and
the hybrid algorithm an accuracy of 92%
on the same dataset. The proposed model
can be applied to any dataset, including
the Breast Cancer, Pima Diabetes, Surgery
and Iris datasets [4].
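The idea of choosing initial centroids by
partitioning the data into k equal parts
can be sketched as follows. The summary
above does not say how the data are
ordered before partitioning, so this
sketch assumes the points are sorted and
each part is averaged; that ordering, the
function names and the sample points are
assumptions, not the authors' method. The
SVM stage is not shown.

```python
def initial_centroids(points, k):
    """Sort the points, cut them into k nearly equal parts, average each part.

    Assumes len(points) >= k so that no part is empty.
    """
    ordered = sorted(points)
    n = len(ordered)
    centroids = []
    for j in range(k):
        part = ordered[j * n // k:(j + 1) * n // k]
        centroids.append(tuple(sum(c) / len(part) for c in zip(*part)))
    return centroids

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Standard K-means iterations started from the partition-based centroids."""
    centroids = initial_centroids(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical two-dimensional sample with two obvious groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, 2)
print(centroids)
```

Because the starting centroids are
computed from the data rather than drawn
at random, repeated runs on the same
dataset give the same clustering.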
Swaroopa Shastri et al. (2017) developed
“Data Mining Techniques to Predict
Diabetes Influenced Kidney Disease”. The
objective is to provide a service that
lets users have a check-up from wherever
they are: they enter their details into
the application and receive a prediction
of the occurrence of diabetic disease. In
this system the datasets are analysed
using the Apriori algorithm to calculate
the probability and generate the
prediction. The application addresses in
detail the correlation between diabetes
and kidney disease when reaching its
verdict. It helps doctors suggest the
best medication and makes users aware in
advance of their chances of developing
diabetes-related kidney disease [5].
Jagdeep Singh et al. (2016) developed
“Prediction of Heart Diseases Using
Associative Classification”. In this
paper various association and
classification methods are applied to
heart datasets to predict heart disease.
Association algorithms such as Apriori
and FP-Growth are used to find
association rules among the attributes of
the heart dataset. Classification
algorithms such as J48, ZeroR,
NaiveBayes, OneR and k-nearest neighbour
are run on the training dataset, and the
output of each algorithm is evaluated on
the basis of correctly classified
instances. The main contribution of the
study is to attain high prediction
accuracy for early diagnosis of heart
disease. The proposed hybrid associative
classification is implemented in the Weka
environment. The comparative results show
that IBk (k-nearest neighbour) combined
with the Apriori association algorithm
produces better results than the others.
The experimental results show that a
large number of rules supports better
detection of heart disease and can even
support heart specialists in their
diagnostic judgements. Finally, an expert
system is developed for the end user. On
the basis of the performance and
correctly classified instances of the
implemented algorithms, the IHDPS
(Intelligent Heart Disease Prediction
System) is proposed for prediction of
heart disease [6].
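For readers unfamiliar with Apriori, the
level-wise search it performs can be
sketched as below. This is a simplified
illustration and not the Weka
implementation used in the paper; the
patient findings, the attribute names and
the 50% support threshold are
hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    size = 1
    while current:
        # Count every candidate of this size in one pass over the data
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: unions of frequent sets that are exactly one item larger
        size += 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: keep a candidate only if all its smaller subsets are frequent
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent

# Hypothetical patient findings, one set per record
records = [
    {"high_bp", "high_chol", "disease"},
    {"high_bp", "high_chol", "disease"},
    {"high_bp", "disease"},
    {"high_chol"},
    {"high_bp", "high_chol", "disease"},
    {"normal"},
]

frequent = apriori(records, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 2))
```

Association rules are then read off the
frequent itemsets by checking the
confidence of each candidate rule.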
Dao-I Lin et al. (2006) designed
“Pincer-Search: An Efficient Algorithm
for Discovering the Maximum Frequent
Set”. In this paper, the authors present
a novel algorithm that can efficiently
discover the maximum frequent set. The
Pincer-Search algorithm can reduce both
the number of times the database is read
and the number of candidates considered.
A very important characteristic of the
algorithm is that it does not require
explicit examination of every frequent
itemset. The authors evaluated the
performance of the algorithm using
well-known synthetic benchmark databases
as well as real-life census and stock
market databases. The improvement in
performance can be up to several orders
of magnitude compared with the best
previous algorithms. The structural
properties and basic discovery approaches
covered are the maximum frequent set,
closure properties, the discovery of
frequent itemsets and the Apriori
algorithm [7].
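The notion of a maximum frequent set can
be made concrete with a small example.
The sketch below is not Pincer-Search: it
enumerates frequent itemsets by brute
force and then keeps only those with no
frequent proper superset, purely to show
what the algorithm is looking for. The
transactions and the minimum count are
hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Brute force over all itemsets; suitable for tiny examples only."""
    items = sorted({i for t in transactions for i in t})
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if sum(candidate <= t for t in transactions) >= min_count:
                result.append(candidate)
    return result

def maximal(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical transactions
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"d", "e"},
    {"d", "e"},
    {"c"},
]

frequent = frequent_itemsets(transactions, min_count=2)
mfs = maximal(frequent)
print(len(frequent), [sorted(s) for s in mfs])
```

In this example ten itemsets are
frequent, but the two maximal ones,
{a, b, c} and {d, e}, already determine
all the others, since every subset of a
frequent itemset is itself frequent. This
is why an algorithm that targets the
maximum frequent set need not examine
every frequent itemset explicitly.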
III. HOW DATA MINING IS USEFUL
IN THE MEDICAL FIELD
Because computers are so widely used in
hospitals and by practising doctors, a
large amount of information is gathered.
Medical institutions today hold ample
patient data, although the useful
information in it is subtle. This huge
set of data consists of relevant
information about the patient along with
a lot of other information that is noise.
Practitioners may use the entire set of
data, but data miners have to extract
only the specific information of
interest, known as knowledge. Emerging
research demands that the available
technology be put to use for society
globally. With the available mining tools
it is possible to design a model that
helps the healthcare industry. The tools
can provide the accurate and timely
reports practitioners need, so that the
patient benefits.
IV. CONCLUSION
In this paper, a survey covering 2002 to
2016 presents the different models
available and the different data mining
techniques used. With the growth of data
mining in the biomedical and healthcare
communities, accurate analysis of medical
data benefits early disease detection,
patient care and community services.
However, analysis accuracy is reduced
when the medical data is incomplete.
Moreover, different regions exhibit unique
characteristics of certain regional diseases,
which may weaken the prediction of
disease outbreaks. Thus, handling data
with proper parameters and constraints
using efficient algorithms will give
promising results.
V. FUTURE ENHANCEMENT
The main objective is to identify
patterns and features in patients’
medical data by combining classifiers and
Association Rule Mining techniques for
disease prediction.
A further objective is to suggest
hospitals suited to the predicted disease
for further diagnosis and surgical
treatment.
