A space for recording my data studies

[pca] iris

BOTTLE6 2021. 3. 21. 16:03
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

np.random.seed(0)

iris = datasets.load_iris()
features = iris.data
target = iris.target

1. PCA with FeatureUnion, Pipeline, GridSearchCV

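# Preprocessing: standardized features + PCA components
## FeatureUnion fits both transformers on the same input and concatenates their outputs
preprocess = FeatureUnion([("std", StandardScaler()),
                           ("pca", PCA())])

# Pipeline
## Preprocessing followed by the classifier.
## The default lbfgs solver does not support the l1 penalty searched below, so saga is used
## (max_iter is raised to give saga more room to converge).
pipe = Pipeline([("preprocess", preprocess),
                 ("classifier", LogisticRegression(solver="saga", max_iter=5000))])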

# Candidate values
## Hyperparameter grid to search
search_space = [{"preprocess__pca__n_components" : [1,2,3],
                "classifier__penalty":["l1","l2"],
                "classifier__C" : np.logspace(0,4,10)}]

# GridSearchCV: 5-fold cross-validation over every candidate combination
clf = GridSearchCV(pipe, search_space, cv=5, verbose=2, n_jobs=-1)
clf

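best_model = clf.fit(features, target)
print(best_model.best_score_)                     # best mean cross-validated accuracy
print(best_model.best_estimator_['preprocess'])   # fitted preprocessing step
print(best_model.best_estimator_['classifier'])   # fitted classifier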
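
The FeatureUnion in section 1 does not chain its steps; it applies each transformer to the same input and stacks the results column-wise. A minimal sketch of that behavior follows, assuming a fixed `n_components=2` purely for illustration (the post itself searches over 1, 2, and 3):

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

features = datasets.load_iris().data

# FeatureUnion fits each transformer on the same raw input and
# stacks their outputs side by side (column-wise).
# n_components=2 is an illustrative choice, not a value from the post.
union = FeatureUnion([("std", StandardScaler()),
                      ("pca", PCA(n_components=2))])
combined = union.fit_transform(features)

print(features.shape)   # (150, 4)
print(combined.shape)   # (150, 6): 4 standardized columns + 2 PCA columns
```

Note that inside the union, PCA sees the unstandardized features. Section 2 chains the steps instead, so PCA runs on the standardized data.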

 

2. PCA without FeatureUnion

pipe = Pipeline([("std", StandardScaler()),
                 ("pca", PCA()),
                 # saga supports both the l1 and l2 penalties searched below
                 ("classifier", LogisticRegression(solver="saga", max_iter=5000))])

search_space = [{"pca__n_components" : [1,2,3],
                "classifier__penalty":["l1","l2"],
                "classifier__C" : np.logspace(0,4,10)}]

clf = GridSearchCV(pipe, search_space, cv=5, verbose=1, n_jobs=-1)
best_model = clf.fit(features, target)
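
Once a search like this has been fitted, the winning combination and the score of every candidate can be read from the fitted object. The sketch below is a reduced version of the search, not the post's exact grid: it assumes the default l2 penalty only, a smaller set of `C` values, and `max_iter=1000`, so that it runs quickly with the default solver.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
features, target = iris.data, iris.target

# Sequential pipeline: scale, then PCA, then classify
pipe = Pipeline([("std", StandardScaler()),
                 ("pca", PCA()),
                 ("classifier", LogisticRegression(max_iter=1000))])

# Reduced, illustrative grid (assumption): default l2 penalty only
search_space = {"pca__n_components": [1, 2, 3],
                "classifier__C": np.logspace(0, 4, 5)}

clf = GridSearchCV(pipe, search_space, cv=5)
clf.fit(features, target)

print(clf.best_params_)            # best hyperparameter combination
print(round(clf.best_score_, 3))   # its mean cross-validated accuracy

# Mean cross-validated accuracy for each candidate
for params, score in zip(clf.cv_results_["params"],
                         clf.cv_results_["mean_test_score"]):
    print(params, round(score, 3))
```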

 

3. pca - whitening

 

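With whitening, each principal component is rescaled to have mean 0 and variance 1.

Without whitening, the components are only centered at mean 0; their variances differ.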

features_scaled = StandardScaler().fit_transform(features)
pca = PCA(n_components=0.99, whiten=True)
features_pca = pca.fit_transform(features_scaled)

print(features.shape[1])
print(features_pca.shape[1])

  ▷  4, 3 (the 4 original features are reduced to 3 principal components)
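
Passing a float such as `n_components=0.99` asks PCA to keep the smallest number of components whose cumulative explained variance ratio reaches that fraction. A small sketch of where the 3 comes from, using `explained_variance_ratio_` on the same standardized data (the variable names here are illustrative):

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features_scaled = StandardScaler().fit_transform(datasets.load_iris().data)

# Fit all 4 components to see how much variance each one explains
pca_full = PCA().fit(features_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print(cumulative.round(4))

# Smallest number of components whose cumulative ratio reaches 0.99
n_needed = int(np.argmax(cumulative >= 0.99) + 1)
print(n_needed)

# PCA picks the same count when given a float n_components
n_selected = PCA(n_components=0.99).fit(features_scaled).n_components_
print(n_selected)
```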

 

import matplotlib.pyplot as plt

# Separate figures so the two scatter plots are not drawn on the same axes
plt.figure()
plt.scatter(features_pca[:,0], features_pca[:,1])
plt.title("pca with whitening")

pca_nowhiten = PCA(n_components=0.99)
features_nowhiten = pca_nowhiten.fit_transform(features_scaled)
plt.figure()
plt.scatter(features_nowhiten[:,0],features_nowhiten[:,1])
plt.title("pca without whitening")
plt.show()
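
The difference between the two scatter plots can also be checked numerically. The following sketch compares the per-component mean and variance with and without whitening; the names `white` and `plain` are illustrative, and `ddof=1` is used on the assumption that the sample variance is what should be compared, since that is how scikit-learn computes `explained_variance_`.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features_scaled = StandardScaler().fit_transform(datasets.load_iris().data)

white = PCA(n_components=0.99, whiten=True).fit_transform(features_scaled)
plain = PCA(n_components=0.99).fit_transform(features_scaled)

# Both versions are centered: the mean of every component is ~0
print(white.mean(axis=0).round(6))
print(plain.mean(axis=0).round(6))

# With whitening every component has unit sample variance;
# without it, the variance shrinks from the first component onward
print(white.var(axis=0, ddof=1).round(4))
print(plain.var(axis=0, ddof=1).round(4))
```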
