Postgraduate Diploma in Information Technology

Program Name: Research in Information Technology
Course Code: IT8*01
Title of the paper: Data Mining in Healthcare
Mentor: Dr. Zawar Shah
Prepared By: Abhishek Pravin Manjrekar
Date of Submission:
Student Signature: Abhishek

Abstract

Some of the important applications of data mining in health care include predicting the future outcomes of diseases from historical data on similar cases, diagnosing disease from patient data, analysing treatment costs and resource demand, handling missing data, and minimizing the waiting time for a diagnosis. The volume of electronic health records collected by healthcare facilities continues to grow.
Data mining in healthcare is a crucial and difficult task that needs to be executed accurately. It attempts to solve real-world health problems in the diagnosis and treatment of diseases. This literature review examines the results that different techniques have obtained from patient data. It also describes the algorithms and methods used in data mining for the healthcare sector, and reviews the diagnosis and treatment of patients as well as the reduction of treatment costs.

Keywords: Data Mining; Healthcare; Diagnosis; Treatments; Algorithms; Techniques

Introduction

Data mining is a process used by businesses, the banking sector and the health sector to turn data into useful information. It is the area of research concerned with extracting useful information from previously collected data, as shown in Fig 1.1. The figure shows that data is collected and then analysed using various techniques, producing useful results that can be acted on. Different data mining tools and techniques are used to predict behaviour and trends in the data, which allows experts to make proactive and more accurate decisions based on that knowledge. There are seven main techniques in data mining: tracking patterns, classification, association, detection, clustering, regression and prediction. Each of these techniques is effective and helps to evaluate data to give the best results.
Fig 1.1 Data Mining

The healthcare industry produces a large amount of data about diseases, but it is largely wasted and does not contribute to effective decision making. Healthcare experts rely on their own experience to predict a patient's disease, which may sometimes lead to false results. So there is a need to apply data mining to patients' historical data, using different techniques to find hidden patterns and similarities that can support decisions. Each data mining technique has a different purpose depending on the need and use. Data mining in healthcare is an important yet complicated task that needs to be executed carefully.
Information technologies in healthcare have created electronic patient records obtained from the monitoring of patients. This information includes records of treatment progress, prescribed drugs, lab results, examination details, previous medical history, etc. Health institutions can use data mining techniques and tools in a variety of areas, such as customer satisfaction, economic indicators, quality indicators, physician performance, cost efficiency, evidence-based decision making, optimizing health care and identifying high-risk patients. Data mining in healthcare has many advantages: it can group patients with similar diseases so that they can be given effective treatment, provide medical treatment at low cost, detect the causes of disease so that appropriate treatment can be given, minimize treatment time, help in developing efficient healthcare insurance policies, etc.

Research Questions

1. What are the best techniques or algorithms used for mining of such huge data?
2. Which is the best tool for Data mining?

Literature Review

Data Mining in Healthcare for Heart Disease
(Shafique, U., Majeed, F., Qaiser, H., & Ul Mustafa, I. (2015))

Summary

Heart disease is currently a prominent topic in the healthcare sector. According to the World Health Organisation (WHO), deaths from heart disease are projected to reach more than twenty-three million annually. As the data in the health sector is enormous, the authors present solutions and techniques that can be used for data mining. The article not only discusses the challenges but also proposes solutions and tools for mining data on this critical disease.

Strength

The authors used the Waikato Environment for Knowledge Analysis (WEKA), a tool developed at the University of Waikato, New Zealand.
It is open source and freely accessible. WEKA performs different data mining tasks such as classification, clustering and regression.

The major strength of this article is that the authors used three algorithms: Decision Tree, Naïve Bayes and Artificial Neural Networks. They performed four experiments to compare which algorithm is best. Naïve Bayes achieved the highest accuracy, 82.914%, and the fastest execution time.

Weakness

The authors used only five hundred and ninety-seven patient records, so more data is needed for greater accuracy. Only three algorithms are used in the article, and the authors conclude that one of them is the best.
The authors should have taken more algorithms into consideration. The patient records should be of higher quality, with no missing values. The authors should also have used more data mining techniques, such as tracking patterns, detection and clustering.

Research Methodology

Quantitative method. The authors gathered five hundred and ninety-seven patient records and ran experiments to identify the best of the three algorithms.

Application of Data Mining: Diabetes healthcare in young and old patients
(Aljumah, A. A., Ahamad, M. G., & Siddiqui, M. K. (2013))

Summary

Diabetes is one of the most common diseases across the globe. It affects all ages, from young people of fifteen years to the elderly. The authors collected data from the World Health Organisation (WHO) and used it for the experiment.
In this article the authors used various data mining techniques and tools to discover patterns and then identify the best treatment for the patient.

Strength

The authors used Oracle Data Miner (ODM) for the data mining. It helps to analyse data and find hidden patterns and information. The authors also used predictive and regression techniques to predict the effective treatment for diabetes. They used the Support Vector Machine (SVM) algorithm to classify the data, and took into consideration patient attributes such as age, weight, height, blood pressure, blood sugar, alcohol consumption and smoking.
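To make the SVM classification step more concrete, the following is a minimal sketch of a linear soft-margin SVM trained by sub-gradient descent on the hinge loss. It is not the authors' ODM configuration: the two features (standing in for standardized blood sugar and blood pressure), the labels and the hyperparameters `epochs`, `lr` and `lam` are all hypothetical values chosen for illustration.

```python
# Hypothetical, already-standardized patient features: (blood_sugar, blood_pressure).
# Labels are +1 / -1 for two illustrative classes; this is not data from the article.
TRAIN = [
    ((2.0, 2.0), 1), ((3.0, 2.0), 1), ((2.0, 3.0), 1), ((3.0, 3.0), 1),
    ((-2.0, -2.0), -1), ((-3.0, -2.0), -1), ((-2.0, -3.0), -1), ((-3.0, -3.0), -1),
]


def train_linear_svm(samples, epochs=20, lr=0.1, lam=0.01):
    """Fit weights w and bias b by sub-gradient descent on the L2-regularized hinge loss."""
    n_features = len(samples[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Classified with enough margin: only the regularizer acts.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


w, b = train_linear_svm(TRAIN)
train_accuracy = sum(predict(w, b, x) == y for x, y in TRAIN) / len(TRAIN)
print("weights:", w, "bias:", b, "training accuracy:", train_accuracy)
```

On real patient records the attributes would first need scaling, and a kernel SVM from an established library would normally be preferred over a hand-written linear one.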
Weakness

One drawback of this article is that the treatment identified by data mining is more effective for older patients than for younger ones. The authors only experimented on the age group of 15 to 45 years. They used only one algorithm and two data mining techniques for the treatment of the patients. The authors chose ODM for the data mining, but using it requires knowledge of a programming language, so it is not accessible to users who cannot code.
Research Methodology

Quantitative method. The authors gathered the 2005 dataset available from the WHO, analysed it with predictive and regression techniques, and presented the results in numerical and statistical form.

Clinical decision support systems for heart disease using data mining approach
(Singh, H., & Kaswan, K. S. (2016))

Summary

Heart disease is the leading cause of death globally, so a large number of patients are affected by it. There are large volumes of patient records all over the world. Data mining is used to examine these records for the effective diagnosis and treatment of patients.
In this article, the authors experimented on patient records and their attributes, using several data mining techniques and algorithms to find the best results.

Strength

The authors used four algorithms for the data mining: Multilayer Perceptron (MLP), Random Forest, J48 and Alternating Decision Tree (AD Tree). In the experiment MLP reached an accuracy of 81.5%, higher than the other three algorithms.

Weakness

The authors only used records of patients aged 29 to 77 years. Another drawback is that they considered no more than 14 attributes. The MLP algorithm also takes longer to build the model than the other algorithms.

Research Methodology

Quantitative method.
The authors collected patient records comprising 303 instances and 14 attributes for the data mining, and reported the results in numerical and statistical form.

Integrating Decision Tree and K-Means Clustering with Different Initial Centroid Selection Methods in the Diagnosis of Heart Disease Patients
(Shouman, M., Turner, T., & Stocker, R. (2012))

Summary

Several researchers have been using data mining techniques for the diagnosis of medical patients. Heart disease has caused more than ten million deaths over the past ten years. In this article the authors explain the data mining techniques and methods that help improve the accuracy of patient treatment. The article shows that k-means clustering combined with a decision tree enhances the diagnosis of heart disease patients.
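The clustering step and the role of the initial centroids can be sketched as follows. This is only an illustration of k-means (Lloyd's algorithm) with a pluggable initialization function; the two initializers `range_init` and `first_rows_init` and the sample points are assumptions made for this sketch and are not necessarily the selection methods or data the authors evaluated. The decision tree stage is not shown.

```python
def range_init(points, k):
    """Spread k starting centroids evenly between the per-feature min and max."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    return [
        tuple(lows[d] + (i + 0.5) / k * (highs[d] - lows[d]) for d in dims)
        for i in range(k)
    ]


def first_rows_init(points, k):
    """Use the first k records as starting centroids."""
    return [tuple(p) for p in points[:k]]


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, init, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical two-feature records forming two well-separated groups.
POINTS = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

centroids, labels = kmeans(POINTS, k=2, init=range_init)
print("range init:", labels)
print("first-rows init:", kmeans(POINTS, k=2, init=first_rows_init)[1])
```

Passing the initializer as a function keeps the clustering loop unchanged while different starting strategies are compared; on less clearly separated data the choice of starting centroids can change the final clusters.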
Strength

The authors used k-means clustering, a decision tree and initial centroid selection on the patient records. They also describe the 10-fold cross-validation method used to test the data. With k-means clustering, the authors achieved 83.9% accuracy.

Weakness

The authors did not use a larger dataset to confirm whether these techniques are useful. Only 303 records are used, and 6 of them have missing values. The authors tested only thirteen of the seventy-six attributes; the attributes include age, height, weight, sex, etc. They did not use any technique other than decision tree and k-means clustering, and did not mention the tool used.

Research Methodology

Quantitative method. The data was collected from the Cleveland Clinic Foundation in numerical form, and the results are presented in numerical and statistical form.

Data Mining Applications In Healthcare Sector
(Durairaj, M., Ranjani, V. (2013))

Summary

Data mining applications are used in commercial and scientific sectors.
Applied to the healthcare sector, data mining plays a significant role in supporting accurate diagnosis and treatment. This article mainly compares data mining tools and techniques across healthcare problems. It reviews infertility issues and the techniques used to predict treatment outcomes, together with their accuracy levels.

Strength

This article thoroughly describes each data mining method. When the Rough Set Theory (RST) approach and an Artificial Neural Network (ANN) are used together, the authors obtain the best results, with 90% accuracy. This hybrid approach gave the best results for predicting the success rate of in-vitro fertilization (IVF). The article shows that combining two or more techniques may lead to higher accuracy in healthcare.

Weakness

The authors should have run more experiments with more data mining techniques and algorithms. They mention 97.77% accuracy for cancer treatment using the WEKA tool with one technique and one algorithm, but do not state which attributes were considered.

Research Methodology

Quantitative method. The authors used three data mining techniques to predict the IVF treatment success rate. The medical data and results are in numerical and statistical form.
A Data-Mining Framework for Transnational Healthcare System
(Shen, C., Jigjidsuren, C., Dorjgochoo, S., Chen, C., Chen, W., Hsu, C., Lai, F. (2012))

Summary

Countries vary in conditions such as climate, location and culture, so disease risks also vary across countries. The major problems are improving medical resources and giving effective treatment to patients. Despite advances in technology, the world still struggles to obtain the specific data needed for diagnosis.
To overcome this problem, the authors used a data-mining framework for a transnational healthcare system that includes pre-diagnosis tests, alternative methods and the evaluation of liver diseases. They demonstrated it on patient records from the US, Taiwan and Mongolia.

Strength

The Data-mining Framework for Transnational Healthcare System (DFTHS) is a promising framework: the error rate decreased considerably, by 26%. It is helpful for physicians because it incorporates clinical values. Using clustering, the framework reduces the time doctors need to diagnose a patient and choose an appropriate treatment.

Weakness

One of the main drawbacks is that the framework depends on patients having early checks so that diagnosis and treatment are beneficial; if the patient's condition is already critical, the framework cannot properly assist the doctor with the diagnosis. The other drawback is that the framework is not fully automated, as it does not recalculate when results change.

Research Methodology

Quantitative method. The authors collected twenty-two thousand patient records and ran them through the Data-mining Framework for Transnational Healthcare System (DFTHS).
The results are reported in numerical form.

Naive Bayes Classification of Public Health Data with Greedy Feature Selection
(Hickey, S. J. (2013))

Summary

Data mining is the work of discovering patterns and behaviours in data. It has grown because of the huge volumes of data produced by sectors such as business and banking. In health, data mining is widely used to track patient records, to diagnose disease and find an appropriate treatment, and to select appropriate tests.

In this article, the author used Naïve Bayes classifiers with greedy feature selection in the WEKA tool to identify the best attributes to use. The article predicts the cost and outcome of a treatment, which can be useful for future patients.

Strength

By selecting a combination of attributes and applying data mining to them, the author was able to predict both the outcome and the cost of treatment.
This type of data mining is also useful for stakeholders. It benefits people in low-income groups, as the cost of treatment will decrease.

Weakness

The author only considered the patient's length of hospital stay, discharge status and number of diagnoses. More attributes, such as sex, age, blood pressure and allergies, should have been included to make the predictions more effective and accurate. The author expresses limited confidence in the conclusions because the data was insufficient and more attributes should have been added.

Research Methodology

Quantitative method. The author collected 135,418 medical records from the Hospital Discharge Survey of the National Centre for Health Statistics (NCHS).

A Study on Data Mining Classification Algorithms for Medical Data
(Rani, P. R. S. (2014))

Summary

The process of analysing data and summarising it into useful information is called data mining. Various techniques, methods and algorithms exist for mining data.

In this article the author describes various data mining algorithms that are useful for medical data. The article also shows the medical problems addressed by different algorithms. The author describes Artificial Neural Networks (ANN), Bayesian classifiers, Decision Trees (DT) and Support Vector Machines (SVM).

Strength

The author thoroughly describes how different algorithms can be applied to different diseases.
The author also gives the formulae for each of the algorithms, which can be used for medical analysis.

Weakness

The author does not describe other algorithms such as J48 and k-means clustering; the article covers only ANN, Bayesian classifiers, DT and SVM. The author does not apply any data mining techniques and does not mention any data mining tools. Only a limited number of datasets are presented and experimented on.
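For reference, the Bayesian classifier named in this article is commonly written in its naive form as shown below. This is the standard textbook formulation rather than a transcription of the article's own notation; the symbols $C_k$ (a candidate class, for example a diagnosis) and $x_1, \dots, x_n$ (the patient's attribute values) are chosen here for illustration.

```latex
% Bayes' theorem for a class C_k given attribute values x_1, ..., x_n
P(C_k \mid x_1, \dots, x_n) = \frac{P(C_k)\, P(x_1, \dots, x_n \mid C_k)}{P(x_1, \dots, x_n)}

% Naive assumption: attributes are conditionally independent given the class
P(x_1, \dots, x_n \mid C_k) = \prod_{i=1}^{n} P(x_i \mid C_k)

% Decision rule: the denominator is the same for every class, so it can be dropped
\hat{y} = \arg\max_{k} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)
```

The class priors $P(C_k)$ and the per-attribute likelihoods $P(x_i \mid C_k)$ are estimated from the training records.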
Research Methodology

Qualitative method. The author bases the discussion on other articles.

Discussion

Research Questions

What are the best techniques or algorithms used for mining of such huge data?

Various techniques and algorithms are used for data mining. All the authors discussed the techniques or algorithms used in their experiments and argued, in one way or another, that theirs was the best. Some suggested Decision Tree, SVM, Bayesian classifiers, etc. However, one of the best approaches found was the hybrid approach (Durairaj, M., Ranjani, V. (2013) Data Mining Applications in Healthcare Sector). Using an Artificial Neural Network (ANN) and Rough Set Theory (RST) together gave higher accuracy than the approaches in the other articles. The result reached 90% accuracy and can be used for effective diagnosis and treatment.

Which is the best tool for Data mining?

Techniques and algorithms can only be applied through a tool or application. The articles mention tools such as ODM and DFTHS. One of the best tools used in these articles is WEKA, which includes a number of techniques and algorithms. When the classification technique is used together with a rule decision table, the accuracy reaches 97.77% in the medical treatment of a patient.
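The accuracy figures quoted in this review can be collected and ranked in a few lines. The sketch below only restates the numbers reported above; because the articles use different datasets and target different diseases, the ordering is a summary of reported results and not a like-for-like benchmark. The short approach labels are paraphrases introduced here.

```python
# Accuracy figures (percent) as quoted in the reviewed articles.
REPORTED = [
    ("Shafique et al. (2015)", "Naive Bayes", 82.914),
    ("Singh & Kaswan (2016)", "Multilayer Perceptron", 81.5),
    ("Shouman et al. (2012)", "k-means clustering with decision tree", 83.9),
    ("Durairaj & Ranjani (2013)", "RST + ANN hybrid", 90.0),
    ("Durairaj & Ranjani (2013)", "WEKA classification with rule decision table", 97.77),
]

# Sort from highest to lowest reported accuracy.
ranked = sorted(REPORTED, key=lambda row: row[2], reverse=True)

for article, approach, accuracy in ranked:
    print(f"{accuracy:7.3f}%  {approach}  [{article}]")
```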
Literature Map

Concept Matrix

Articles  Issues  Tools  Techniques
3.1  X  X
3.2  X  X
3.3  X  X
3.4  X  X
3.5  X  X
3.6  X  X
3.7  X  X
3.8

Conclusion

Data mining is now widely accepted.
It brings many advantages, such as ease of use, accuracy, scalability and flexibility. Certain aspects, such as the security and privacy of the data, need improvement. From the study of the articles, it is clear that more than a single technique is needed for accurate results.

This study reviewed eight articles on mining patient records. Each article described various methods, tools and techniques used to improve the accuracy of diagnosis and treatment. All the authors presented the importance of data mining for medical data from their own perspective. The use of data mining improved decision making for diagnostic problems.
These papers also leave room for future research on diseases and technologies to provide better solutions for both doctors and patients.

References

Aljumah, A. A., Ahamad, M. G., & Siddiqui, M. K. (2013). Application of data mining: Diabetes health care in young and old patients. Journal of King Saud University – Computer and Information Sciences, 25(2), 127-136.

Durairaj, M., & Ranjani, V. (2013). Data mining applications in healthcare sector. International Journal of Scientific & Technology Research.

Hickey, S. J. (2013). Naive Bayes classification of public health data with greedy feature selection. Communications of the IIMA, 13(2), 87-97.

Rani, P. R. S. (2014). A study on data mining classification algorithms for medical data. International Journal of Advanced Research in Computer Science, 5(2).

Shafique, U., Majeed, F., Qaiser, H., & Ul Mustafa, I. (2015). Data mining in healthcare for heart diseases. International Journal of Innovation and Applied Studies, 10(4), 1312-1322.

Shen, C., Jigjidsuren, C., Dorjgochoo, S., Chen, C., Chen, W., Hsu, C., & Lai, F. (2012). A data-mining framework for transnational healthcare system. Journal of Medical Systems, 36(4), 2565-75.

Shouman, M., Turner, T., & Stocker, R. (2012). Integrating decision tree and K-means clustering with different initial centroid selection methods in the diagnosis of heart disease patients. Paper presented at the 1-7.

Singh, H., & Kaswan, K. S. (2016). Clinical decision support systems for heart disease using data mining approach. International Journal of Computer Science and Software Engineering, 5(2), 19-23.