Heart Stroke Prediction Dataset

This dataset is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status.

Age (patient age): the histogram and boxplot show that this column is approximately normally distributed. The stroke attribute, which is the prediction target, is stored in the y variable.

The Framingham Heart Study dataset is also available for download. One study reports predictive analytical techniques for stroke using a deep learning model applied to a heart disease dataset. As part of the central nervous system, the brain is the organ that controls vision, memory, touch, thought, emotion, breathing, motor skills, hunger, and all other functions that govern the body. In the proposed model, heart stroke prediction is performed on a dataset collected from Kaggle (Healthcare Dataset Stroke Data). As a limitation, more advanced initial-centroid selection methods could in future be incorporated directly into the K-means clustering algorithm. Stroke prediction is a difficult task that requires a large amount of data pre-processing, and the process needs to be automated so that stroke symptoms are identified early and the stroke can be prevented.
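Separating the prediction target from the input features is the first step before any model is trained. The sketch below does this with plain Python, assuming each record is a dict keyed by column names used in this document (gender, age, hypertension, stroke). The two rows are made-up illustrations, not records from the dataset, and split_features_target is a hypothetical helper name.

```python
def split_features_target(rows, target="stroke"):
    """Split a list of record dicts into feature dicts (X) and integer labels (y)."""
    X = [{key: value for key, value in row.items() if key != target} for row in rows]
    y = [int(row[target]) for row in rows]
    return X, y


# Made-up records using column names from the stroke dataset.
rows = [
    {"gender": "Male", "age": 67.0, "hypertension": 0, "stroke": 1},
    {"gender": "Female", "age": 49.0, "hypertension": 0, "stroke": 0},
]
X, y = split_features_target(rows)
print(y)  # [1, 0]
print(X[0])  # {'gender': 'Male', 'age': 67.0, 'hypertension': 0}
```

With pandas the same split is usually written as dropping the target column for X and selecting it for y.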
Accurate prediction of stroke is highly valuable for early intervention. Stroke risk datasets play a pivotal role in machine learning (ML) for predicting the likelihood of a stroke. One RMarkdown report documents the data analysis done for a project on building and deploying a stroke prediction model in R. For incomplete data, a missing-value imputation method based on an iterative mechanism has shown acceptable prediction accuracy [14], [15]. Stroke prediction is a complex task that requires a large amount of data pre-processing, and the prediction process must be automated so that stroke symptoms are detected early and strokes can be prevented at an early stage. In this paper, we attempt to bridge this gap by providing a systematic analysis of various patient records for the purpose of stroke prediction.

SMOTE for imbalanced datasets: SMOTE improves the model's ability to identify the minority class, which is often the class of interest in medical datasets such as stroke prediction. In the target column, 0 indicates that no stroke risk was identified, while 1 indicates that a stroke risk was detected.

This data science project aims to predict the likelihood of a patient experiencing a stroke based on input parameters such as gender, age, presence of diseases, and smoking status. To improve the accuracy of the stroke prediction model, the dataset is analyzed and processed using various data science methods and algorithms.

Check for missing values: df.isnull().sum() counts the null values in each column. The approximate normality of the age column is also supported by its skewness value (-0.…).
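The missing-value check can be sketched with the standard library alone, similar in spirit to df.isnull().sum(). This assumes missing cells appear either as empty strings or as tokens such as "N/A" (pandas' read_csv treats both as missing by default). The CSV text is a made-up sample, and count_missing is a hypothetical helper.

```python
import csv
import io

# Assumption: missing entries appear as empty strings or tokens like "N/A".
MISSING_TOKENS = {"", "N/A", "NA"}


def count_missing(csv_text):
    """Count missing cells per column, similar in spirit to df.isnull().sum()."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            # A short row yields None for absent fields; treat it as missing too.
            if (row[name] or "").strip() in MISSING_TOKENS:
                counts[name] += 1
    return counts


sample = """id,gender,age,bmi,stroke
1,Male,67,36.6,1
2,Female,61,N/A,1
3,Male,80,32.5,1
4,Female,49,,0
"""
print(count_missing(sample))  # {'id': 0, 'gender': 0, 'age': 0, 'bmi': 2, 'stroke': 0}
```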
The experimental data were divided into training and testing datasets for further analysis and comparison. To predict heart stroke, an effective heart stroke prediction system (EHSPS) is developed using machine learning algorithms. This comparative study offers a detailed evaluation of the algorithmic methods and outcomes of three recent, prominent studies on stroke prediction. These include prediction algorithms that use the "Healthcare stroke dataset" to predict the occurrence of ischaemic heart disease.

A stroke is a medical condition in which poor blood flow to the brain causes cell death. Without the blood supply, brain cells gradually die, and disability occurs depending on the area of the brain affected (see also the repository akshit113/Heart-Stroke-Prediction and the Brain Stroke Prediction dataset).

In the following subsections, we explain each stage in detail. This paper makes use of a heart stroke dataset. Section 4 presents the results and outcomes of the various machine learning algorithms, and Section 5 presents a comparative evaluation. The proposed PCA-FA method is contrasted in Table 4 with earlier research on stroke prediction that uses a stroke prediction dataset. The proposed model achieves an accuracy of at least 95%. Heart stroke is one of the most severe health hazards, so early heart stroke prediction helps society save human lives. Additionally, we excluded studies that developed models using open data from sharing platforms or repositories, such as the heart disease dataset from the UCI (University of California, Irvine) ML Repository [45] and the CVD, Framingham, stroke, and HF datasets from Kaggle.
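Because stroke cases are rare, a random train/test split can leave very few positives in the test part. A minimal sketch of a stratified split, which keeps each class's share roughly equal in both parts, is shown below using only the standard library; stratified_split is a hypothetical helper, and the labels are toy data. scikit-learn's train_test_split offers the same behavior through its stratify argument.

```python
import random
from collections import defaultdict


def stratified_split(y, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels with the same imbalance direction as the stroke target (mostly 0s).
y = [0] * 95 + [1] * 5
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx))  # 80 20
print(sum(y[i] for i in test_idx))  # 1
```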
These datasets typically include demographic information, medical histories, lifestyle factors, and biomarker data from individuals, allowing ML algorithms to uncover complex patterns and interactions among risk factors. The "Stroke Prediction Dataset" includes health and lifestyle data on patients, together with whether each patient has had a stroke.

There are two main types of stroke [1]: ischemic, caused by a lack of blood flow, and hemorrhagic, caused by bleeding. A stroke occurs when a blood vessel that carries oxygen and nutrients to the brain is either blocked by a clot or ruptures.

Records with missing BMI values were not eliminated, because the dataset is highly skewed on the target attribute (stroke) and a good portion of the missing BMI values belong to positive stroke cases; the skew arises because only a few records have a positive value for the stroke target attribute. The heart_stroke_prediction_python project uses healthcare data to predict stroke: it reads the dataset and then pre-processes it, handling missing values and outliers (see also the repository ebbeberge/stroke-prediction). Another machine learning project uses the Kaggle Stroke Dataset to perform exploratory data analysis, data pre-processing, classification model training (Logistic Regression, Random Forest, SVM, XGBoost, KNN), hyperparameter tuning, stroke prediction, and model evaluation. In the first step, we clean the data; the next step is exploratory data analysis.

Most existing research on stroke prediction is concerned with complete, class-balanced datasets, but few medical datasets strictly meet such requirements. The data are loaded with data = pd.read_csv('healthcare-dataset-stroke-data.csv').
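Keeping records that have missing BMI means the gaps have to be filled somehow. Median imputation is one simple option (it is not an iterative imputation method); the sketch below assumes missing readings are represented as None, uses made-up BMI values, and names a hypothetical helper impute_median. scikit-learn's SimpleImputer(strategy="median") performs the same operation on arrays.

```python
import statistics


def impute_median(values):
    """Replace None with the median of the observed values; every record is kept."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values], fill


# Made-up BMI readings; None marks a missing value.
bmi = [36.6, None, 32.5, 34.5, None, 24.0]
filled, fill_value = impute_median(bmi)
print(fill_value)  # 33.5
print(filled)  # [36.6, 33.5, 32.5, 34.5, 33.5, 24.0]
```

In a real pipeline the median would be computed on the training split only and then reused for the test split, so that no information leaks from the test data.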
With this in mind, various machine learning models are built to predict the likelihood of a brain stroke. Analyzing and comparing large amounts of data is essential for the prediction, prevention, and management of cardiovascular illnesses, including heart attacks. After importing the necessary libraries, univariate and bivariate analyses were performed to draw key insights. "A Comprehensive Dataset for Machine Learning-Based Heart Disease Prediction" is another dataset available on Kaggle. The model is built using sklearn's KNN module with the default settings.

Domain conception: in this stage, the stroke prediction problem is studied. Attempts have been made to identify predictors of recurrent stroke using Cox regression without developing a prediction model. The dataset consisted of 10 metrics for a total of 43,400 patients. Another project uses NumPy and pandas for data manipulation and sklearn for dataset splitting to build a logistic regression model for predicting heart disease. Of the two, heart stroke is considered the more dangerous disease because it is directly connected to the brain. Section 3 describes the experimental setup and dataset and explains the methodology. Stroke prediction remains a critical area of research in healthcare, aiming to improve early intervention and patient care strategies. The use of machine learning algorithms in heart stroke prediction has the potential to significantly improve patient outcomes and reduce healthcare costs.
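To show what a KNN model does with its default settings, here is a minimal standard-library sketch of k-nearest-neighbors classification with k = 5, Euclidean distance, and a plain majority vote. scikit-learn's KNeighborsClassifier defaults to n_neighbors=5 with uniform weights and Minkowski distance with p=2 (Euclidean). The feature vectors are made up and assumed to be pre-scaled, because KNN is sensitive to feature scale; knn_predict is a hypothetical helper.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Predict by majority vote among the k training points nearest to query (Euclidean)."""
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up, pre-scaled features: (age / 100, avg_glucose_level / 300).
train_X = [
    (0.67, 0.76), (0.80, 0.35), (0.49, 0.57), (0.79, 0.58), (0.81, 0.62),
    (0.25, 0.28), (0.30, 0.30), (0.20, 0.25), (0.35, 0.33), (0.28, 0.27),
]
train_y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(knn_predict(train_X, train_y, (0.75, 0.60)))  # 1
print(knn_predict(train_X, train_y, (0.27, 0.29)))  # 0
```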
Output of the missing-value check (truncated): id 0, gender 0, age 0, hypertension 0, heart_disease 0, ever_married 0, work_type 0, Residence…

Studies using the first dataset (the Heart Attack Analysis and Prediction Dataset) show that Yuan (2021) developed a framework that extracts features using principal component analysis (PCA) and then computes a mathematical model to choose relevant attributes under suitable restrictions. This dataset describes the signs and symptoms of heart disease in patients who have recently been diagnosed or who are at risk of developing the condition.

Framingham Heart Disease Prediction Dataset. The dataset is obtained from Kaggle and is available for download. The Framingham heart disease dataset has 15 attributes and over 4,000 records. Each row in the data provides relevant information about one patient; data.head(10) displays the top 10 rows. A subset of the original training data is taken using a filtering method for machine learning and data visualization purposes. The stroke dataset provides 11 clinical features for predicting stroke events.

The heart is one of the most vital organs in the body and is crucial for proper bodily function. An unfit heart can seriously affect fitness and lifestyle and severely reduce an individual's life expectancy, so a healthy heart is necessary for survival. Atrial fibrillation symptoms in heart patients are a major risk factor for stroke and share common variables for predicting stroke. Other work tackled imbalanced datasets and algorithmic bias using deep learning techniques, achieving notable results (98%). The heart disease and brain stroke prediction models were found to be 100% and 97.1% accurate in predicting heart disease and brain stroke, respectively, based on clinical and patient information, while the MRI image-based deep learning stroke prediction model was 96.67% accurate. This multifaceted approach holds the potential to significantly impact healthcare by offering a reliable and understandable tool for heart stroke prediction.
The value of the output column stroke is either 1 or 0. In this research article, machine learning models are applied to a well-known heart stroke classification dataset. The heart stroke prediction procedure must be automated, because reducing risks and warning patients well in advance is a hard task. One technique has proven efficient in the decision-making processes of the prediction system and has been successfully applied both to stroke prediction [1-2] and to imbalanced medical datasets [3].

Feature selection: machine learning algorithms can automatically identify the relevant features (predictors) in the dataset that contribute most to predicting stroke and heart disease risk.

A stroke prediction K-nearest neighbors model is built using this dataset. The datasets used are described by 12 parameters such as hypertension, heart disease, BMI, and smoking status. One such dataset is Kaggle's Stroke Prediction Dataset, which provides essential data that can be used to predict stroke risk, improve healthcare outcomes, and foster research in cardiovascular health. In addition, the effect of pre-processing the data is summarized. Stroke is a destructive illness that typically affects individuals over the age of 65. From 2007 to 2019, there were roughly 18 studies on stroke diagnosis within the subject of stroke prediction using machine learning in the ScienceDirect database [4].

LITERATURE SURVEY

In [4], stroke prediction was performed on the Cardiovascular Health Study (CHS) dataset using five machine learning techniques. "Heart attack" is a catch-all term for a variety of conditions affecting the heart. The results in Table 4 indicate that the proposed method outperforms existing work, achieving the highest accuracy, at least 92%.
No records were removed, because only a small subset of the dataset had missing values or records logged as unknown. The metrics included patients' demographic data (gender, age, marital status, type of work, and residence type) and health records (hypertension, heart disease, average glucose level measured after a meal, body mass index (BMI), smoking status, and history of stroke). Specifically, this report presents county (or county-equivalent) estimates related to heart stroke prediction, and the paper's contribution lies in preparing the dataset using machine learning algorithms.

Summary. Stroke is also an attribute in the dataset: for each medical record, it indicates whether the patient suffered a stroke. The dataset has 5110 rows and 12 columns. The quality of the Framingham cardiovascular study dataset makes it one of the most used datasets for identifying risk factors and predicting stroke, after the Cardiovascular Health Study (CHS) dataset. Recall is especially useful when missing a positive case (a patient who will have a stroke) is costly. The data are read from CSV files. Medical experts can rely on prediction models such as those developed in this research to obtain much better results in predicting heart stroke severity at an early stage.

Fig. 1: Digital twin data.

This project uses Kaggle's Stroke Prediction dataset to predict heart stroke, where the classes are not balanced. An overview of ML-based automated algorithms for stroke outcome prediction is provided in Table 1 (Section B).
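Why recall matters here can be shown with a worked example. The sketch below assumes the class counts reported for the Kaggle stroke data (4861 rows without stroke, 249 with stroke) and scores a trivial "model" that always predicts no stroke: accuracy looks high, but recall is zero because every stroke case is missed. confusion_metrics is a hypothetical helper; precision is reported as 0.0 when there are no positive predictions, which is a convention rather than a defined value.

```python
def confusion_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels where 1 = stroke."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Precision is undefined with no positive predictions; report 0.0 by convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Class counts reported for the Kaggle stroke data: 4861 without stroke, 249 with stroke.
y_true = [0] * 4861 + [1] * 249
always_no_stroke = [0] * len(y_true)
accuracy, precision, recall = confusion_metrics(y_true, always_no_stroke)
print(round(accuracy, 4), precision, recall)  # 0.9513 0.0 0.0
```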
This research work uses machine learning (ML). Stroke is a disease that affects the arteries leading to and within the brain. The cardiac stroke dataset is used in this work. We analyze a stroke dataset and formulate advanced statistical models for predicting whether a person has had a stroke based on measurable predictors. One report discusses existing heart disease diagnosis techniques, identifies the problem and requirements, and outlines the proposed algorithm and methodology using supervised learning classification algorithms. Brain stroke prediction serves as a case study to demonstrate the application's capabilities, which can be extended to a variety of pathologies, including heart attacks, cancers, osteoporosis, and epilepsy.

The stroke prediction dataset was used to perform the study. The dataset was created by McKinsey & Company, and Kaggle is its source. Analysis of the imbalanced dataset highlighted hypertension and heart disease as the 4th and 5th most… This retrospective observational study aimed to analyze stroke prediction in patients. The following table provides an extract of the dataset used in this article. This study aims to improve stroke prediction by addressing imbalanced datasets and algorithmic bias. The dataset can be found in the repository or downloaded from Kaggle. The accuracy of existing stroke predictions that used a downsampling technique to balance the data was 75%. In this project, we attempt to classify stroke patients using a dataset provided on Kaggle (Kaggle Stroke Dataset). Using a publicly available dataset of 29072 patients' records, we identify the key factors that are necessary for stroke prediction.
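Downsampling balances the classes by discarding majority-class rows. A minimal sketch of random undersampling is shown below, assuming the minority class is labeled 1 and is the smaller class, and reusing the class counts reported for the Kaggle stroke data; random_undersample is a hypothetical helper.

```python
import random


def random_undersample(y, seed=0):
    """Keep all label-1 indices and an equal-size random sample of label-0 indices."""
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(y) if label == 1]
    negatives = [i for i, label in enumerate(y) if label == 0]
    kept_negatives = rng.sample(negatives, len(positives))
    return sorted(positives + kept_negatives)


y = [0] * 4861 + [1] * 249  # class counts reported for the Kaggle stroke data
balanced_idx = random_undersample(y)
print(len(balanced_idx))  # 498
```

The trade-off is visible in the numbers: the balanced set keeps only 249 of the 4861 no-stroke rows, so most of the majority-class information is thrown away.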
As an optimal solution, the authors used a combination of the Decision Tree with the C4.5 algorithm, Principal Component Analysis, Artificial Neural Networks, and Support Vector Machine. Our research focuses on accurately and precisely detecting the possibility of stroke to aid prevention. Section 2 briefly introduces related work on machine learning-based heart stroke detection and prediction. data.info() shows information about the dataset. This study investigates the efficacy of machine learning techniques, particularly principal component analysis (PCA) and a stacking ensemble method, for predicting stroke occurrences based on demographic, clinical, and lifestyle factors. This process helps select the most informative variables for the model.

Context (Stroke Prediction Dataset): According to the World Health Organization (WHO), stroke is the 2nd leading cause of death globally, responsible for approximately 11% of total deaths. Fig 2 shows the dataset. In this column, the kurtosis value is negative (-0.…).

This study evaluates three different classification models for heart stroke prediction. In this paper, we consider using a stroke prediction dataset to build a model for stroke prediction. One study reports an accuracy score of at least 92%. However, a systematic analysis of the risk factors is missing.

Data pre-processing: the dataset obtained contains 201 null values in the BMI attribute, which need to be removed. The dataset included 401 cases of healthy individuals and 262 cases of stroke patients admitted to hospital. Therefore, stroke must be predicted precisely so that treatment can begin as soon as possible.
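The skewness and kurtosis values used to judge a column's shape can be computed directly from central moments. The sketch below uses the population (biased) formulas and reports excess kurtosis, which is 0 for a normal distribution and negative for a platykurtic (flatter than normal) column. pandas' Series.skew() and Series.kurt() apply small-sample bias corrections, so their values differ slightly. The toy data are evenly spread integers, not a column from the dataset.

```python
def central_moment(xs, order):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** order for x in xs) / len(xs)


def skewness(xs):
    """Population (biased) skewness: m3 / m2**1.5."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


values = list(range(1, 11))  # evenly spread, symmetric toy data
print(skewness(values))  # 0.0
print(round(excess_kurtosis(values), 4))  # -1.2242
```

Symmetric data give zero skewness, and evenly spread data give negative excess kurtosis, i.e. a platykurtic shape.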
Whether you are working on machine learning models or health risk analysis, this dataset offers a rich set of features for developing innovative solutions (see, for example, the repository ajspurr/stroke_prediction). The datasets have many features that can be used for heart disease prediction, including age, gender, blood pressure, cholesterol levels, electrocardiogram (ECG) readings, chest pain, and exercise… By contrast, hemorrhagic stroke occurs when a weakened blood vessel bursts or leaks blood; 15% of strokes are hemorrhagic [5].

Fig 2.

With the help of this CSV file, we try to understand the patterns in the data and create our prediction model. Several machine learning algorithms have also been proposed that use these risk factors to predict stroke occurrence [9], [10]. Given the alarming rise in the number of heart attacks, an early detection system for signs of a heart attack must be implemented.

Stroke Prediction Using Clinical Features (Chia Qin Feng, Kelvin Ting Yi Hao, Sam Tey, Lim Kai Ling, Bingyan Li; 6/11/2022).

An intelligent stroke prediction framework has been proposed based on the data analytics lifecycle [10]. Several studies have been conducted using the Stroke Prediction Dataset in recent years. By detecting high-risk individuals early, appropriate preventive measures can be taken to reduce the incidence and impact of stroke. In addition, the authors of [20] investigated the use of predictive analytics techniques for stroke prediction using deep learning models applied to heart disease datasets.
Key observations: age is correlated with bmi, hypertension, heart_disease, avg_glucose_level, and stroke; all categories are positively correlated with each other (there are no negative correlations); the data are highly unbalanced; and the chance of stroke increases with age, although, according to this data, most people do not have strokes.

Value counts:
heart_disease: 0 = 4834, 1 = 276
stroke: 0 = 4861, 1 = 249

A stroke occurs when blood flow to part of the brain is stopped abruptly. The pattern of the attributes in the provided dataset was monitored for accurate prediction of heart stroke in patients. The dataset used for stroke prediction is very imbalanced. We use principal component analysis (PCA) to transform the higher-dimensional feature space into a lower-dimensional subspace and to understand the relative importance of each input attribute. Binary columns include heart_disease, ever_married, and stroke.

One document describes a student project that aims to develop a machine learning model for heart disease identification and prediction. This is a demonstration of a machine learning model that gives the probability of having a stroke. By identifying individuals who are at high risk of having a heart stroke, healthcare providers can intervene early to prevent the onset of the condition or minimize its effects [6, 10]. In this project, we address a classification problem on the stroke dataset, using a variety of models to determine whether a person is likely to have a stroke based on input parameters such as gender, age, and various test results. We have carried out a detailed exploratory data analysis. Even now, the global incidence of heart disease and stroke is rising steadily.
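Correlation statements such as "age is correlated with stroke" usually refer to the Pearson coefficient, which is what pandas' DataFrame.corr() computes by default. A minimal standard-library sketch follows; the age and stroke values are made up, and with a 0/1 variable the Pearson coefficient is the point-biserial correlation. pearson is a hypothetical helper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up values: age and a 0/1 stroke flag.
age = [20, 35, 50, 65, 80]
stroke = [0, 0, 0, 1, 1]
print(round(pearson(age, stroke), 4))  # 0.866
```

A positive coefficient means stroke cases tend to sit at higher ages in this toy sample; it says nothing about causation.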
The target of the dataset is to predict the 10-year risk of coronary heart disease (CHD). Early recognition of symptoms carries valuable information for predicting stroke and promoting a healthy life. Stroke is rapidly increasing in developing countries such as China, which has the highest stroke burden [6], and the United States is experiencing widespread chronic disability due to stroke; the total number of people who died of strokes is ten times greater in…

In this project, I use the Heart Stroke Prediction dataset from WHO to predict heart stroke. In our research, we used the Stroke Prediction Dataset, a valuable resource containing 11 distinct attributes. The models are a Random Forest, a K-Nearest Neighbor, and a Logistic Regression model. In recent years, some deep learning (DL) algorithms have approached human-level performance in object recognition. In the Heart Stroke dataset, the two classes are highly imbalanced, and the stroke data points are easily ignored compared with the no-stroke data points. A negative kurtosis value indicates that a column is platykurtic.

Stroke prediction is a vital research area because of its significant implications for public health. Logistic regression was used with only clinical and imaging variables (AUROC, 0.71), only retinal characteristics (AUROC, 0.65), and both (AUROC, 0.74), where performance was measured on the same data used for model development (no separate test data) [13, 14]. Stroke is the 2nd leading cause of death globally and affects millions of people every year (Wikipedia, "Stroke").
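Logistic regression, one of the three models named above, maps a weighted sum of features to a probability with the sigmoid function. The sketch below fits it by plain batch gradient descent on the mean log-loss, using made-up standardized age values and a 0/1 stroke label; it is only an illustration of the technique. scikit-learn's LogisticRegression instead uses solvers such as lbfgs with L2 regularization by default. fit_logistic and predict_proba are hypothetical helper names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up standardized age values with a 0/1 stroke label.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [1.5]) > 0.5, predict_proba(w, b, [-1.5]) < 0.5)  # True True
```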
As heart stroke prediction is a complex task, the prediction process needs to be automated to avoid the associated risks and alert the patient well in advance. Studies identify blood pressure, diabetes, and heart disease as major risk factors for stroke in an individual. Our study focuses on heart stroke prediction, which is performed using a dataset. Stroke is the third leading cause of death and the principal cause of serious long-term disability in the United States. Much work has been carried out on the prediction of heart stroke, but very few works show the risk of brain stroke. In R, the data can be loaded with data = read.csv("stroke_data.csv") and inspected with str(data).

We are predicting the stroke probability using clinical measurements for a number of patients. The source code for how the model was trained and constructed can be found here. This objective can be achieved using machine learning techniques. The authors of [12] tested various models on the dataset provided by Kaggle for stroke prediction. The dataset contains eleven clinical traits that can be used for prediction. AI holds significant potential in heart stroke prediction and diagnosis; however, it must confront parallel challenges to ensure precision and interpretability when used by healthcare professionals. The data consist of 5110 observations with 12 characteristics. Another dataset documents rates and trends in heart disease and stroke mortality. data.describe() shows the data's statistical features.
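The statistics reported by data.describe() for a numeric column can be reproduced with Python's statistics module. The sketch below assumes pandas' conventions as far as known: sample standard deviation (n - 1) and quartiles by linear interpolation, which matches statistics.quantiles with method="inclusive". describe is a hypothetical helper, and the input list is toy data.

```python
import statistics


def describe(values):
    """Count, mean, sample std, min, quartiles, and max, similar to pandas' describe()."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


summary = describe([1, 2, 3, 4])
print(summary["25%"], summary["50%"], summary["75%"])  # 1.75 2.5 3.25
```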
Training a machine learning model on such imbalanced data may yield high accuracy, but other measures such as precision and recall remain inadequate. This is because, firstly, such datasets do not have a clear definition of the CVD outcome [46]. Stroke is a major public health issue with significant economic consequences. The presence of these values can degrade the accuracy of the model. See also Ivanov et al. The dataset has a total of 5110 rows: 249 rows indicate the possibility of a stroke and 4861 rows confirm the absence of a stroke.

Project thesis: this project applies machine learning principles to extensive existing datasets to predict stroke risk. Another project uses Kaggle's Stroke Prediction dataset, whose classes are not balanced, to predict heart stroke; it observed that the Instance Hardness Threshold re-sampling technique, together with the Exhaustive feature selection method and a Random Forest classifier, yields better accuracy. We tackle the overlooked aspect of imbalanced datasets in the healthcare literature. This project analyzes the Heart Disease dataset from the UCI Machine Learning Repository using Python and Jupyter Notebook.

One-hot encoding for categorical variables: ensures that categorical variables are properly incorporated into the model.

Stages of the proposed intelligent stroke prediction framework.

Fig. 2: Summary of the dataset.

The data pre-processing techniques incorporated in the proposed model include replacement of the missing values. Many such stroke prediction models have emerged in recent years.
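One-hot encoding turns a categorical column into one 0/1 indicator column per category, so models that expect numbers can use it. The sketch below works on a list of record dicts; the smoking_status values and ages are illustrative, one_hot is a hypothetical helper, and pandas.get_dummies performs the equivalent transformation on a DataFrame.

```python
def one_hot(rows, column):
    """Replace a categorical column with 0/1 indicator columns named '<column>_<value>'."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


rows = [
    {"age": 67.0, "smoking_status": "formerly smoked"},
    {"age": 61.0, "smoking_status": "never smoked"},
    {"age": 80.0, "smoking_status": "never smoked"},
    {"age": 49.0, "smoking_status": "smokes"},
]
for row in one_hot(rows, "smoking_status"):
    print(row)
```

Each encoded row has exactly one indicator set to 1. Linear models often drop one indicator per column to avoid perfectly collinear features.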
Brain stroke prediction dataset. Their objectives encompassed creating ML prediction models for stroke disease and tackling the severe class imbalance presented by stroke patients while also examining the model's decision-making process, but they achieved low accuracy (73.52%) and a high false-positive (FP) rate (26.57%) using logistic regression on the Kaggle dataset.
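SMOTE addresses severe class imbalance by creating synthetic minority samples instead of copying existing ones. The sketch below is a simplified, standard-library version of the idea: pick a random minority point, choose one of its k nearest minority neighbors, and place a new point at a random position on the segment between them. The minority vectors are made up and assumed to be pre-scaled, and smote is a hypothetical helper; the imbalanced-learn library provides a full SMOTE implementation with k_neighbors=5 by default.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating toward random minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbour)))
    return synthetic


# Made-up, pre-scaled minority-class (stroke = 1) feature vectors.
minority = [(0.70, 0.80), (0.80, 0.90), (0.75, 0.70), (0.90, 0.85), (0.65, 0.75), (0.85, 0.95)]
new_points = smote(minority, n_synthetic=10, k=3)
print(len(new_points))  # 10
```

Oversampling of this kind should be applied to the training split only; generating synthetic points before the train/test split lets information from test rows leak into training.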