본문 바로가기
Learning-driven Methodology/ML (Machine Learning)

[XGBoost] 심장 질환 예측

by goatlab 2022. 10. 4.
728x90
반응형
SMALL

데이터셋 로드

 

Heart_Prediction_Dataset.csv
0.03MB

import pandas as pd

# Load the heart-disease dataset; expects Heart_Prediction_Dataset.csv
# in the working directory. `head()` previews the first rows to
# sanity-check column names and values (only displayed in a notebook).
df = pd.read_csv('Heart_Prediction_Dataset.csv')
df.head()

 

원-핫 인코딩

 

# One-hot encode every categorical (object-dtype) column; numeric columns
# pass through unchanged. Categorical columns become indicator columns
# named e.g. 'Sex_F', 'ChestPainType_ASY' (seen in the feature list below).
df = pd.get_dummies(df)
df.head()
# Class balance of the binary target (presumably 1 = disease — confirm
# against the dataset's documentation).
df['HeartDisease'].value_counts()

 

데이터 전처리

 

# Split into train / validation / test sets, stratified on the target so
# every split keeps the same class ratio.
# `drop(columns=...)` is the idiomatic way to take "everything but the
# target" (clearer than df[df.keys().drop([...])]).
features = df.drop(columns=['HeartDisease']).values
# Keep the target 1-D: a (n, 1) column vector makes scikit-learn emit
# DataConversionWarning and is never needed for a single-label target.
outcome = df['HeartDisease'].values

from sklearn.model_selection import train_test_split

# 70/30 train+val vs test, then 70/30 train vs val of the remainder.
# random_state pins the shuffle so the printed metrics are reproducible
# from run to run (the original had no seed).
train_features, test_features, train_target, test_target = train_test_split(
    features, outcome, stratify=outcome, test_size=0.3, random_state=42)
train_features, val_features, train_target, val_target = train_test_split(
    train_features, train_target, stratify=train_target, test_size=0.3, random_state=42)

 

데이터 스케일링

 

from sklearn.preprocessing import MinMaxScaler

# Learn the min/max range on the training split only, then apply the
# same mapping to validation and test so no information leaks from the
# held-out data into the scaler.
feature_scaler = MinMaxScaler().fit(train_features)
train_features_scaled = feature_scaler.transform(train_features)
val_features_scaled = feature_scaler.transform(val_features)
test_features_scaled = feature_scaler.transform(test_features)

 

모델 학습

 

from xgboost import XGBClassifier
from xgboost.callback import EarlyStopping

# Stop training once validation error has not improved for 20 rounds and
# roll the model back to the best iteration.
early_stop = EarlyStopping(rounds=20, metric_name='error',
                           data_name='validation_0', save_best=True)

# NOTE: since xgboost 2.0, `eval_metric` and `callbacks` are constructor
# arguments of XGBClassifier, not fit() keywords — passing them to fit()
# raises TypeError on current versions.
xgb = XGBClassifier(eval_metric='error', callbacks=[early_stop])

xgb.fit(train_features_scaled, train_target,
        eval_set=[(val_features_scaled, val_target)])
result = xgb.predict(test_features_scaled)
# Evaluate the trained classifier on the held-out test set.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Report the four standard binary-classification metrics, one per line.
for label, metric in (('Accuracy', accuracy_score),
                      ('Precision', precision_score),
                      ('Recall', recall_score),
                      ('F1 score', f1_score)):
    print(f'{label} :', metric(test_target, result))
Accuracy : 0.855072463768116
Precision : 0.8598726114649682
Recall : 0.8823529411764706
F1 score : 0.8709677419354838
# Inspect which features the trained model relied on most.
from xgboost import plot_importance
import matplotlib.pyplot as plt
# '%matplotlib inline' is IPython notebook magic — a SyntaxError in a
# plain .py file — so it is dropped; plt.show() renders the figure in a
# script context instead.

fig, ax = plt.subplots(figsize=(10, 12))
plot_importance(xgb, ax=ax)
plt.show()

 

중요한 성능 특징만으로 특징 설정

 

df.keys()
# Rebuild the dataset using only the features that ranked highest in the
# importance plot above, then repeat the full split / scale / train /
# evaluate pipeline for comparison with the all-features model.
selected = ['MaxHR', 'Cholesterol', 'RestingBP', 'Oldpeak', 'Age',
            'ChestPainType_ASY', 'ExerciseAngina_N', 'Sex_F', 'FastingBS',
            'ST_Slope_Up', 'ChestPainType_NAP']
features = df[selected].values
# 1-D target avoids scikit-learn's DataConversionWarning (the original
# reshape(-1, 1) column vector is never needed for a single label).
outcome = df['HeartDisease'].values

from sklearn.model_selection import train_test_split

# Same stratified 70/30 then 70/30 split as before; random_state makes
# the comparison against the all-features model reproducible.
train_features, test_features, train_target, test_target = train_test_split(
    features, outcome, stratify=outcome, test_size=0.3, random_state=42)
train_features, val_features, train_target, val_target = train_test_split(
    train_features, train_target, stratify=train_target, test_size=0.3, random_state=42)

from sklearn.preprocessing import MinMaxScaler

# Fit scaling parameters on the training split only to avoid leakage.
feature_scaler = MinMaxScaler()
train_features_scaled = feature_scaler.fit_transform(train_features)
val_features_scaled = feature_scaler.transform(val_features)
test_features_scaled = feature_scaler.transform(test_features)

from xgboost import XGBClassifier
from xgboost.callback import EarlyStopping

early_stop = EarlyStopping(rounds=20, metric_name='error',
                           data_name='validation_0', save_best=True)
# xgboost >= 2.0: eval_metric/callbacks belong on the estimator, not on
# fit() — the original fit(..., eval_metric=..., callbacks=...) raises
# TypeError on current versions.
xgb = XGBClassifier(eval_metric='error', callbacks=[early_stop])

xgb.fit(train_features_scaled, train_target,
        eval_set=[(val_features_scaled, val_target)])
result = xgb.predict(test_features_scaled)

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print('Accuracy :', accuracy_score(test_target, result))
print('Precision :', precision_score(test_target, result))
print('Recall :', recall_score(test_target, result))
print('F1 score :', f1_score(test_target, result))
Accuracy : 0.8623188405797102
Precision : 0.891156462585034
Recall : 0.8562091503267973
F1 score : 0.8733333333333333
# Re-plot feature importance for the model trained on the reduced
# feature set.
from xgboost import plot_importance
import matplotlib.pyplot as plt
# '%matplotlib inline' is IPython notebook magic — a SyntaxError in a
# plain .py file — so it is dropped; plt.show() renders the figure in a
# script context instead.

fig, ax = plt.subplots(figsize=(10, 12))
plot_importance(xgb, ax=ax)
plt.show()

728x90
반응형
LIST