Loading the dataset
import pandas as pd
df = pd.read_csv('Cardiovascular_Disease_dataset.csv')
df.head()
df['Presence or absence of cardiovascular disease'].value_counts()
0 35021
1 34979
Name: Presence or absence of cardiovascular disease, dtype: int64
Data preprocessing
# Split the data into training, validation, and test sets
features = df[df.keys().drop(['id','Presence or absence of cardiovascular disease'])].values
outcome = df['Presence or absence of cardiovascular disease'].values.reshape(-1,1)
from sklearn.model_selection import train_test_split
train_features, test_features, train_target, test_target = train_test_split(features, outcome, stratify=outcome, test_size=0.3)
train_features, val_features, train_target, val_target = train_test_split(train_features, train_target, stratify=train_target, test_size=0.3)
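Because the second split takes 30% of what remains after the first, the final proportions are roughly 49% training, 21% validation, and 30% test. The sketch below checks this on synthetic data; the array shapes, the `random_state` value, and the names `X`, `y`, `split_sizes`, and `class_ratios` are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 1000 rows, 3 features, balanced binary labels.
rng = np.random.default_rng(0)
X = rng.random((1000, 3))
y = np.array([0, 1] * 500)

# First split: hold out 30% of all rows as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)

# Second split: hold out 30% of the remaining 70% as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, stratify=y_rest, test_size=0.3, random_state=42)

split_sizes = {"train": len(X_train), "val": len(X_val), "test": len(X_test)}
print(split_sizes)

# stratify keeps the share of positive labels the same in every subset.
class_ratios = {name: float(labels.mean())
                for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]}
print(class_ratios)
```

Passing a fixed `random_state`, as done here, makes the split repeatable between runs; the original calls omit it, so their exact scores will vary slightly from run to run.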
Data scaling
from sklearn.preprocessing import MinMaxScaler
feature_scaler = MinMaxScaler()
train_features_scaled = feature_scaler.fit_transform(train_features)
val_features_scaled = feature_scaler.transform(val_features)
test_features_scaled = feature_scaler.transform(test_features)
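The scaler is fitted on the training features only and then reused for the validation and test features, so no information from the held-out data leaks into preprocessing. A side effect is that held-out values outside the training range map outside [0, 1]. The toy numbers below are made up to show this; they do not come from the dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-feature data (hypothetical values).
train = np.array([[50.0], [60.0], [70.0]])
val = np.array([[45.0], [80.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # learns min=50 and max=70 from train only
val_scaled = scaler.transform(val)          # reuses the training min and max

print(train_scaled.ravel())  # training values span 0 to 1
print(val_scaled.ravel())    # 45 maps below 0 and 80 maps above 1
```

This is expected behaviour rather than a bug. Tree-based models such as XGBoost split on thresholds, so they are largely insensitive to this kind of monotonic rescaling anyway.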
Model training
from xgboost import XGBClassifier
from xgboost.callback import EarlyStopping
# Since XGBoost 1.6, eval_metric and callbacks are set on the estimator; passing them to fit() was removed in 2.0
early_stop = EarlyStopping(rounds=20, metric_name='error', data_name="validation_0", save_best=True)
xgb = XGBClassifier(eval_metric='error', callbacks=[early_stop])
xgb.fit(train_features_scaled, train_target, eval_set=[(val_features_scaled, val_target)])
result = xgb.predict(test_features_scaled)
# Model performance evaluation results
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('Accuracy :',accuracy_score(test_target, result))
print('Precision :',precision_score(test_target, result))
print('Recall :',recall_score(test_target, result))
print('F1 score :',f1_score(test_target, result))
Accuracy : 0.7382857142857143
Precision : 0.7552083333333334
Recall : 0.7046883933676387
F1 score : 0.7290742383910087
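To make the four scores easier to read, the sketch below derives them from confusion-matrix counts on a small made-up label vector, then checks that the F1 score reported above is the harmonic mean of the reported precision and recall. The toy labels and variable names are assumptions for illustration only.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical, not taken from the dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)

# Consistency check on the scores printed by the notebook.
reported_precision = 0.7552083333333334
reported_recall = 0.7046883933676387
f1_from_reported = (2 * reported_precision * reported_recall
                    / (reported_precision + reported_recall))
print(f1_from_reported)
```

In the reported results, precision is higher than recall, which means the model misses more true cases of disease than it raises false alarms.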
from xgboost import plot_importance
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(figsize=(10, 12))
plot_importance(xgb,ax=ax)