
[XGBoost] Wisconsin Breast Cancer Data (2)

by goatlab 2022. 10. 4.

wdbc dataset

 

 

The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29)

 


import pandas as pd

df = pd.read_csv('wdbc_data.csv', header=None)
df.head()

# Recode the outcome column: B (benign) -> 0, M (malignant) -> 1
df.loc[df[1]=='B', 1] = 0
df.loc[df[1]=='M', 1] = 1

df[1] = df[1].astype('int32')

df.describe()

df.isnull().sum()
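
Because the file is read with header=None, the columns are labeled 0 to 31. As a minimal sketch, the names below follow the UCI description of the dataset (an ID, a diagnosis, then ten nucleus measurements each reported as mean, standard error, and worst value). The exact name strings and the helper wdbc_column_names are illustrative assumptions, not part of the original post.

```python
# Ten nucleus measurements listed in the UCI dataset description.
MEASUREMENTS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
# Each measurement appears three times: mean, standard error, worst (largest).
STATISTICS = ["mean", "se", "worst"]


def wdbc_column_names():
    """Build 32 column labels (hypothetical naming scheme) for the wdbc file."""
    names = ["id", "diagnosis"]
    for stat in STATISTICS:
        for measurement in MEASUREMENTS:
            names.append(f"{measurement}_{stat}")
    return names


column_names = wdbc_column_names()
print(len(column_names))
print(column_names[:4])
```

Assigning these names with df.columns = column_names would mean replacing df[1] with df['diagnosis'] in the code that follows, so the rest of this post keeps the integer labels.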

 

Data preprocessing

 

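# Column 0 is the sample ID and column 1 is the diagnosis; the remaining 30 columns are the features.
features = df[df.keys().drop([0,1])].values
outcome = df[1].values.reshape(-1,1)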

from sklearn.model_selection import train_test_split

train_features, test_features, train_target, test_target = train_test_split(features, outcome, stratify=outcome, test_size=0.3)
train_features, val_features, train_target, val_target = train_test_split(train_features, train_target, stratify=train_target, test_size=0.3)
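
The two calls above first hold out 30% of the data for testing and then take 30% of what remains for validation. The sketch below works out the resulting sizes for the 569 samples in this dataset. It assumes that scikit-learn rounds the held-out count up when test_size is a fraction; split_sizes is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size=0.3, val_size=0.3):
    """Return (n_train, n_val, n_test) for two successive fractional splits.

    Assumption: the held-out part of each split is rounded up.
    """
    n_test = math.ceil(test_size * n_samples)
    n_remaining = n_samples - n_test
    n_val = math.ceil(val_size * n_remaining)
    n_train = n_remaining - n_val
    return n_train, n_val, n_test


n_train, n_val, n_test = split_sizes(569)
print(n_train, n_val, n_test)  # 278 120 171
```

So roughly 49% of the rows train the model, 21% are used for validation, and 30% for testing. Since no random_state is passed to train_test_split, the rows that land in each part change from run to run.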

 

Data scaling

 

from sklearn.preprocessing import MinMaxScaler

feature_scaler = MinMaxScaler()
train_features_scaled = feature_scaler.fit_transform(train_features)
val_features_scaled = feature_scaler.transform(val_features)
test_features_scaled = feature_scaler.transform(test_features)
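
The scaler is fitted on the training features only and then reused for the validation and test features. As a rough illustration in plain Python (not the scikit-learn implementation), min-max scaling of a single feature column looks like this. The function names and the sample values are made up for the example.

```python
def fit_min_max(train_values):
    """Learn the minimum and the range of one feature from training data only."""
    lowest = min(train_values)
    span = max(train_values) - lowest
    if span == 0:
        span = 1.0  # guard against division by zero for a constant feature
    return lowest, span


def transform_min_max(values, lowest, span):
    """Apply the training minimum and range to any split."""
    return [(v - lowest) / span for v in values]


train_column = [10.0, 15.0, 20.0]  # made-up values
test_column = [5.0, 12.5, 30.0]    # made-up values

lowest, span = fit_min_max(train_column)
print(transform_min_max(train_column, lowest, span))  # [0.0, 0.5, 1.0]
print(transform_min_max(test_column, lowest, span))   # [-0.5, 0.25, 2.0]
```

Values in the validation or test split can fall outside [0, 1] when they lie outside the training range. That is expected: fitting the scaler on the training split alone keeps information about the other splits out of training.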

 

Applying early stopping with validation data

 

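from xgboost import XGBClassifier
from xgboost.callback import EarlyStopping

# Stop when the validation error has not improved for 20 rounds, and keep the best model.
early_stop = EarlyStopping(rounds=20, metric_name='error', data_name='validation_0', save_best=True)

# XGBoost 1.6+ takes eval_metric and callbacks in the constructor;
# passing them to fit() was deprecated in 1.6 and removed in later releases.
xgb = XGBClassifier(eval_metric='error', callbacks=[early_stop])
xgb.fit(train_features_scaled, train_target, eval_set=[(val_features_scaled, val_target)])

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score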

result = xgb.predict(test_features_scaled)

print('Accuracy :',accuracy_score(test_target, result))
print('Precision :',precision_score(test_target, result))
print('Recall :',recall_score(test_target, result))
print('F1 score :',f1_score(test_target, result))

Accuracy : 0.9590643274853801
Precision : 0.9830508474576272
Recall : 0.90625
F1 score : 0.943089430894309

from xgboost import plot_importance
import matplotlib.pyplot as plt
%matplotlib inline

fig, ax = plt.subplots(figsize=(10, 12))
plot_importance(xgb,ax=ax)
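
The early stopping used above watches the validation error after every boosting round and stops once it has gone 20 rounds without improving. The sketch below shows that general idea in plain Python. It is not XGBoost's implementation, and details such as how ties are handled may differ. The function name and the error values are made up.

```python
def early_stopping_round(val_errors, rounds=20):
    """Return (stop_index, best_index) for per-round validation errors.

    Training stops at the first round that is `rounds` or more rounds
    past the best error seen so far.
    """
    best_index = 0
    best_error = float("inf")
    for i, error in enumerate(val_errors):
        if error < best_error:
            best_error = error
            best_index = i
        elif i - best_index >= rounds:
            return i, best_index
    # Patience never ran out: training used every round.
    return len(val_errors) - 1, best_index


errors = [0.10, 0.08, 0.07, 0.07, 0.08, 0.07, 0.09]  # made-up values
print(early_stopping_round(errors, rounds=3))  # (5, 2)
```

With save_best=True, as in the post, the model from the best round (index 2 in this toy run) is the one kept, rather than the model from the round where training stopped.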

 

Plain training without early stopping, for comparison

 

from xgboost import XGBClassifier

# Baseline: default XGBClassifier trained with no validation set and no early stopping.
xgb = XGBClassifier()
xgb.fit(train_features_scaled, train_target)

result = xgb.predict(test_features_scaled)
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print('Accuracy :',accuracy_score(test_target, result))
print('Precision :',precision_score(test_target, result))
print('Recall :',recall_score(test_target, result))
print('F1 score :',f1_score(test_target, result))

Accuracy : 0.9532163742690059
Precision : 0.9666666666666667
Recall : 0.90625
F1 score : 0.9354838709677419

from xgboost import plot_importance
import matplotlib.pyplot as plt
%matplotlib inline

fig, ax = plt.subplots(figsize=(10, 12))
plot_importance(xgb, ax=ax)
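
To compare the two runs, it helps to see how the four scores follow from confusion-matrix counts. The counts below are not printed in the post. They are inferred from the reported scores under the assumption that the test split holds 171 samples (30% of 569), so treat them as a consistency check rather than as reported results.

```python
def classification_scores(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Inferred counts (assumption): the positive class is malignant (label 1).
with_early_stopping = classification_scores(tp=58, fp=1, fn=6, tn=106)
without_early_stopping = classification_scores(tp=58, fp=2, fn=6, tn=105)

for name, scores in [("early stopping", with_early_stopping),
                     ("plain training", without_early_stopping)]:
    print(name, [round(s, 4) for s in scores])
```

Under these inferred counts the two models differ by one false positive on the test set. Because the splits are drawn without a fixed random_state, a gap this small may not reproduce on another run.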
