Breast cancer is one of the major health problems faced by women today, and
identifying it at an early stage is still challenging.
The diagnosis and treatment of breast cancer have become urgent. Breast
cancer is a widely seen tumor in Indian women. Early treatment of breast cancer
is extremely important: it not only helps to cure the cancer but
also helps to prevent its recurrence.
Today, many methods, data mining techniques, and processes such as knowledge
discovery have been developed for predicting breast cancer.
In this study, we compare different classification and clustering algorithms.
The results indicate that the classification
algorithms are better predictors than the clustering algorithms.
Breast cancer is common in women. Predicting breast cancer is as important as
treating it. Breast cancer is one of the most common causes of cancer death
among women. If breast cancer is predicted at an early stage, better treatment
can be provided, which improves the patient's chance of survival. The diagnosis
and treatment of breast cancer have therefore become urgent tasks. Different
data mining methods are used to retrieve valuable information from large
databases in order to make decisions that provide better health services.
Breast cancer begins with
the abnormal growth of some breast cells. These cells divide more rapidly than
healthy cells do and continue to accumulate, forming a lump or mass. The cells
may spread through the breast to the lymph nodes or to other parts of the body.
Breast cancer varies with age group: it is less common at a young age (i.e., in
women in their thirties), but younger women tend to have more aggressive breast
cancers than older women.
In this paper we compare
different classification and clustering algorithms for
predicting breast cancer. A number of attributes are used in the comparison.
These attributes are compared to find the best classification algorithm.
In paper 1, three different data mining
classification methods are used for the prediction of breast cancer. It
considers different parameters for the prediction of cancer, but for superior
prediction the focus is on accuracy and lowest computing time. The study ranked
all algorithms by computing time and accuracy and concluded that Naïve Bayes
is superior to decision tree and k-nearest neighbor
because it takes the lowest time, i.e. 0.02 seconds, while also
providing the highest accuracy.
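The two criteria used in paper 1, accuracy and computing time, can be measured with a small harness such as the one below. This is a minimal sketch and not the experiment from that paper: the toy data, the `evaluate` helper, and the hand-written k-nearest-neighbor and Gaussian Naive Bayes routines are all assumptions introduced for illustration.

```python
import math
import time
from collections import Counter, defaultdict


def knn_predict(train_x, train_y, test_x, k=3):
    """Classify each test row by majority vote among its k nearest training rows."""
    predictions = []
    for row in test_x:
        # Sort training points by Euclidean distance to the test row.
        neighbors = sorted(
            zip(train_x, train_y),
            key=lambda pair: math.dist(pair[0], row),
        )[:k]
        votes = Counter(label for _, label in neighbors)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def naive_bayes_predict(train_x, train_y, test_x):
    """Gaussian Naive Bayes using per-class, per-feature mean and variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(train_x, train_y):
        rows_by_class[label].append(row)

    stats = {}
    for label, rows in rows_by_class.items():
        prior = len(rows) / len(train_x)
        columns = list(zip(*rows))
        means = [sum(c) / len(c) for c in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in c) / len(c) + 1e-9
            for c, m in zip(columns, means)
        ]
        stats[label] = (prior, means, variances)

    predictions = []
    for row in test_x:
        best_label, best_score = None, -math.inf
        for label, (prior, means, variances) in stats.items():
            # Log prior plus the sum of per-feature Gaussian log likelihoods.
            score = math.log(prior)
            for v, m, var in zip(row, means, variances):
                score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        predictions.append(best_label)
    return predictions


def evaluate(predict_fn, train_x, train_y, test_x, test_y):
    """Return (accuracy, elapsed seconds) for one classifier."""
    start = time.perf_counter()
    predicted = predict_fn(train_x, train_y, test_x)
    elapsed = time.perf_counter() - start
    correct = sum(p == t for p, t in zip(predicted, test_y))
    return correct / len(test_y), elapsed


# Made-up two-feature data: class "N" lies near (1, 1), class "R" near (5, 5).
train_x = [(1.0, 1.2), (0.8, 1.0), (1.3, 0.9), (5.0, 5.1), (4.8, 5.3), (5.2, 4.9)]
train_y = ["N", "N", "N", "R", "R", "R"]
test_x = [(1.1, 1.0), (4.9, 5.0)]
test_y = ["N", "R"]

for name, fn in [("k-NN", knn_predict), ("Naive Bayes", naive_bayes_predict)]:
    accuracy, seconds = evaluate(fn, train_x, train_y, test_x, test_y)
    print(f"{name}: accuracy={accuracy:.2f}, time={seconds:.6f}s")
```

Timings on such a tiny sample are not meaningful in themselves; the point is only that both criteria come from the same run, so algorithms can be ranked on either one.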
In paper 2, the
WPBC dataset is used to find an efficient predictor algorithm for predicting
the recurring or non-recurring nature of the disease. This helps oncologists to
differentiate a good prognosis (non-recurrent) from a bad one (recurrent) and
to treat patients more effectively. Eight popular data mining methods are
used: four clustering algorithms (K-means, EM, PAM and Fuzzy c-means)
and four classification algorithms (SVM, C5.0, KNN and Naive Bayes). The
results of these algorithms are clearly outlined in that paper.
The classification algorithms C5.0 and SVM showed 81% accuracy in
classifying the recurrence of the disease, which was the best among
all methods. On the other hand, EM was found to be the most promising clustering
algorithm, with an accuracy of 68%. The research shows that classification
algorithms are better predictors than clustering algorithms. The impact factors
of the various parameters responsible for predicting the occurrence or non-occurrence
of the disease can be verified clinically. Further, the identified critical
parameters should be verified by applying them to a larger medical dataset to predict
the recurrence of the disease in future.
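Comparing a clustering algorithm with a classifier on accuracy requires some way of scoring clusters against the known class labels. Paper 2 is not quoted here on how this was done, so the sketch below assumes one common convention: each cluster is mapped to the majority class of its members. The function name `cluster_accuracy` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_accuracy(cluster_ids, true_labels):
    """Score a clustering against known labels.

    Each cluster is assigned the majority class of its members; accuracy is
    the fraction of samples whose class matches their cluster's majority class.
    """
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)

    correct = 0
    for labels in members.values():
        # Size of the most frequent class within this cluster.
        correct += Counter(labels).most_common(1)[0][1]
    return correct / len(true_labels)


# Made-up example: two clusters over ten samples labeled
# N (non-recurrent) or R (recurrent).
clusters = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
labels = ["N", "N", "N", "N", "N", "R", "R", "R", "R", "N"]
print(cluster_accuracy(clusters, labels))  # 0.8
```

Because the mapping is chosen after seeing the labels, this score is optimistic for the clustering method, which is worth keeping in mind when it is set beside a classifier's held-out accuracy.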
In paper 3, the authors build a diagnostic
model for breast cancer that searches for the relationship between breast
cancer and its symptoms. A feature selection method, INTERACT, is applied to
select relevant and important features in order to improve
the accuracy of the diagnostic model, and SVM is applied to build the
classification model. Two diagnostic models are built, one with and one without
feature selection, to demonstrate the significance of feature selection.
In the experiments, the accuracy of the diagnostic model with feature
selection is clearly higher than that of the model without feature
selection. In addition, nine features are selected as the relevant factors for
building the diagnostic model. The information found in this study can be
supplementary information that helps practitioners in better diagnosing heart
Paper 4
focuses on the importance of feature selection in breast cancer prognosis. With a
proper attribute selection technique, any classification algorithm can be
improved significantly. Attributes that contribute little to the dataset often
mislead the classifier and result in poor prediction. In that work, the authors
found that Support Vector Machine gave much better output both before and after
attribute selection. Area under the ROC curve analysis supported this,
with Naïve Bayes and Decision Tree showing much greater improvement after
feature selection. That paper focused only on whether breast
cancer is recurrent or not. As an extension of the work, the authors aim to predict the
time of recurrence for cancers classified as recurrent.
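The area under the ROC curve used in paper 4 can be read as the probability that a randomly chosen recurrent case receives a higher score than a randomly chosen non-recurrent one. The sketch below computes it by direct pair counting. The function name `roc_auc`, the label "R" as the positive class, and the scores are assumptions for illustration and are not data from that paper.

```python
def roc_auc(scores, labels, positive="R"):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, l in zip(scores, labels) if l == positive]
    negatives = [s for s, l in zip(scores, labels) if l != positive]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Made-up classifier scores (higher means "more likely recurrent").
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = ["R", "R", "R", "N", "N", "N"]
print(roc_auc(scores, labels))  # 8 of the 9 pairs are ranked correctly
```

Pair counting is quadratic in the number of samples, which is acceptable for datasets of a few hundred records; a rank-based formulation would be the usual choice for larger data.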
Paper 5 presented a survey of
classification simulations that can be used for breast cancer detection with the WEKA
tool. It discusses a variety of existing classification techniques
and lists their performance accuracy. From this,
one can decide which algorithm is best for breast cancer
detection in WEKA. It compares different algorithms and finds that SVM performs
best, with high accuracy, while expectation maximization has the lowest accuracy.
Paper 6
presented a survey of classification simulations that can be used for breast
cancer detection with the WEKA tool. A variety of existing classification
techniques are discussed. From this, one can decide which
algorithm is best for breast cancer detection in WEKA.
N 47 0
R 11 0
R 23 23
N 47 0
R 11 0
R 31 15
N 47 0
R 11 0
N 47 0
R 11 0
Table: Comparison of clustering and classification algorithms
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative
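The four counts defined above are enough to derive the usual performance measures. As a minimal sketch, the helper below (a hypothetical name, `confusion_metrics`) computes accuracy, sensitivity and specificity; the example counts are invented and are not read from the table.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Derive standard measures from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / total,
        # Fraction of actual positives that were detected (recall).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = confusion_metrics(tp=30, tn=50, fp=10, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since a model that always predicts the majority class can still reach a high accuracy.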
From the above
comparisons we conclude that classification algorithms
work better than clustering algorithms in predicting breast cancer. Among
the classification algorithms, SVM and C5.0 gave the best performance.
The choice of the best algorithm for predicting breast cancer is based purely
on the accuracy of the algorithm.
1 Shah; Anjali G. Jivani, “Comparison of data mining classification algorithms for breast cancer prediction”
2 Uma Ojha; Savita Goel, “A study on prediction of breast cancer recurrence using data mining techniques”, 2017 7th International Conference on Cloud Computing, Data Science & Engineering – Confluence
3 Runjie Shen; Yuanyuan Yang; Fengfeng Shao, “Intelligent Breast Cancer Prediction Model Using Data Mining
4 Ahmed Iqbal Pritom; Md. Ahadur Rahman Munshi; Shahed Anzarus Sabab; Shihabuzzaman Shihab, “Predicting breast cancer recurrence using effective classification and feature selection technique”
5 S.B.Dheebikaa.Vinodhini, “Survey on Breast Cancer Detection Using Weka Tool”
6 Jahanvi Joshi; Rinal Doshi; Jigar Patel, Ph.D, “Diagnosis of Breast Cancer using Clustering Data Mining Approach”