WDBC dataset
The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29)
import pandas as pd
df = pd.read_csv('wdbc_data.csv', header=None)
df.head()
# Recode the outcome column: B (benign) -> 0, M (malignant) -> 1
df.loc[df[1]=='B', 1] = 0
df.loc[df[1]=='M', 1] = 1
df[1] = df[1].astype('int32')
df.describe()
df.isnull().sum()
Data preprocessing
features = df[df.keys().drop([0,1])].values
outcome = df[1].values.reshape(-1,1)
from sklearn.model_selection import train_test_split
train_features, test_features, train_target, test_target = train_test_split(features, outcome, stratify=outcome, test_size=0.3)
train_features, val_features, train_target, val_target = train_test_split(train_features, train_target, stratify=train_target, test_size=0.3)
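The two consecutive 30% splits above leave roughly 49% of the rows for training, 21% for validation, and 30% for testing. The arithmetic can be sketched as follows, assuming the CSV holds the full 569-row WDBC dataset (the row count is not stated in this post) and that a fractional `test_size` is rounded up, which is how scikit-learn handles it. `split_sizes` is a hypothetical helper written for this illustration, not a library function.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    The held-out count is rounded up and the remainder goes to training.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


N_SAMPLES = 569  # assumption: the full WDBC file

# First split: hold out the test set.
n_rest, n_test = split_sizes(N_SAMPLES, 0.3)
# Second split: carve the validation set out of what is left.
n_train, n_val = split_sizes(n_rest, 0.3)

print(n_train, n_val, n_test)  # 278 120 171
```

Because neither call sets `random_state`, the exact rows in each subset change from run to run, although the subset sizes stay the same.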
Data scaling
from sklearn.preprocessing import MinMaxScaler
feature_scaler = MinMaxScaler()
train_features_scaled = feature_scaler.fit_transform(train_features)
val_features_scaled = feature_scaler.transform(val_features)
test_features_scaled = feature_scaler.transform(test_features)
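The scaler is fitted on the training features only and then reused for the validation and test features, so no statistics from the held-out data leak into training. A minimal pure-Python sketch of that idea for a single column is shown below. The numbers are made up for illustration, and `fit_min_max` and `transform` are hypothetical helpers, not part of scikit-learn.

```python
def fit_min_max(column):
    """Learn the minimum and maximum from the training column only."""
    return min(column), max(column)


def transform(column, lo, hi):
    """Apply (x - lo) / (hi - lo) using the limits learned from training."""
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


# Made-up values for one feature, for illustration only.
train_column = [10.0, 14.0, 20.0]
val_column = [9.0, 15.0, 22.0]

lo, hi = fit_min_max(train_column)
train_scaled = transform(train_column, lo, hi)
val_scaled = transform(val_column, lo, hi)

print(train_scaled)  # training values fall inside [0, 1]
print(val_scaled)    # held-out values can fall outside [0, 1]
```

Values outside the [0, 1] range in the validation or test set are expected whenever a held-out sample is smaller or larger than anything seen during training.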
Applying early stopping using the validation data
from xgboost import XGBClassifier
from xgboost.callback import EarlyStopping
# XGBoost 1.6+ takes eval_metric and callbacks on the estimator; passing them to fit() was removed in 2.0
early_stop = EarlyStopping(rounds=20, metric_name='error', data_name='validation_0', save_best=True)
xgb = XGBClassifier(eval_metric='error', callbacks=[early_stop])
xgb.fit(train_features_scaled, train_target, eval_set=[(val_features_scaled, val_target)])
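The callback above stops boosting once the validation error has gone 20 rounds without improving, and `save_best=True` keeps the model from the best round. The stopping rule can be sketched in plain Python as below. This is a simplified illustration, not XGBoost's implementation: `early_stopping_round` is a hypothetical helper, the error values are made up, and a patience of 3 is used instead of 20 to keep the trace short.

```python
def early_stopping_round(errors, rounds):
    """Return (stop_round, best_round) for per-round validation errors.

    Training stops at the first round where the error has failed to
    improve on the best value for `rounds` consecutive rounds.
    """
    best_error = float("inf")
    best_round = 0
    for i, err in enumerate(errors):
        if err < best_error:
            # Strict improvement: remember this round as the best so far.
            best_error, best_round = err, i
        elif i - best_round >= rounds:
            # Patience exhausted: stop here, keep the best round.
            return i, best_round
    # Ran out of rounds without triggering the rule.
    return len(errors) - 1, best_round


# Made-up validation errors, for illustration only.
val_errors = [0.10, 0.08, 0.07, 0.07, 0.08, 0.07, 0.09]
stop_round, best_round = early_stopping_round(val_errors, rounds=3)
print(stop_round, best_round)  # 5 2
```

In this trace the error last improves at round 2, the next three rounds bring no improvement, and training halts at round 5 with the round-2 model kept.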
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
result = xgb.predict(test_features_scaled)
print('Accuracy :',accuracy_score(test_target, result))
print('Precision :',precision_score(test_target, result))
print('Recall :',recall_score(test_target, result))
print('F1 score :',f1_score(test_target, result))
Accuracy : 0.9590643274853801
Precision : 0.9830508474576272
Recall : 0.90625
F1 score : 0.943089430894309
from xgboost import plot_importance
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(figsize=(10, 12))
plot_importance(xgb,ax=ax)
Single training run without early stopping, for comparison
from xgboost import XGBClassifier
xgb = XGBClassifier()
xgb.fit(train_features_scaled, train_target)
result = xgb.predict(test_features_scaled)
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
print('Accuracy :',accuracy_score(test_target, result))
print('Precision :',precision_score(test_target, result))
print('Recall :',recall_score(test_target, result))
print('F1 score :',f1_score(test_target, result))
Accuracy : 0.9532163742690059
Precision : 0.9666666666666667
Recall : 0.90625
F1 score : 0.9354838709677419
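To see what separates the two sets of scores, the four metrics can be recomputed from confusion-matrix counts. The counts below do not appear in this post. They are inferred as the values consistent with the reported scores, assuming a 171-sample test set (30% of 569 rows), and `scores` is a hypothetical helper.

```python
def scores(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Inferred counts (assumption): malignant is the positive class.
with_early_stopping = scores(tp=58, fp=1, fn=6, tn=106)
single_fit = scores(tp=58, fp=2, fn=6, tn=105)

for name, result in [("early stopping", with_early_stopping),
                     ("single fit", single_fit)]:
    print(name, [round(value, 4) for value in result])
```

Under these inferred counts, the two runs differ by a single false positive on this particular split, so the gap between them is small and, with no `random_state` fixed, may not reproduce on another split.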
from xgboost import plot_importance
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(figsize=(10, 12))
plot_importance(xgb, ax=ax)