데이터 공부를 기록하는 공간
[pca] iris
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
np.random.seed(0)
iris = datasets.load_iris()
features = iris.data
target = iris.target
1. PCA with FeatureUnion, Pipeline, GridSearchCV
# StandardScaler + PCA
## FeatureUnion for preprocessing
preprocess = FeatureUnion([("std", StandardScaler()),
                           ("pca", PCA())])
# Pipeline: preprocessing + classifier
## liblinear solver, since the default lbfgs does not support the l1 penalty searched below
pipe = Pipeline([("preprocess", preprocess),
                 ("classifier", LogisticRegression(solver="liblinear"))])
# candidate hyperparameter values
search_space = [{"preprocess__pca__n_components": [1, 2, 3],
                 "classifier__penalty": ["l1", "l2"],
                 "classifier__C": np.logspace(0, 4, 10)}]
# GridSearchCV for cross-validation
clf = GridSearchCV(pipe, search_space, cv=5, verbose=2, n_jobs=-1)


best_model = clf.fit(features, target)
best_model.best_score_                      # best mean cross-validation accuracy
best_model.best_estimator_['preprocess']    # fitted FeatureUnion step
best_model.best_estimator_['classifier']    # fitted LogisticRegression step
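A quick sanity check (a minimal sketch, not from the original post) of what FeatureUnion actually produces: it concatenates its transformers' outputs side by side, so the preprocessed matrix is the 4 standardized columns plus the PCA columns. Note that inside this union the PCA step runs on the raw features, not the standardized ones; section 2's plain Pipeline scales first.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import FeatureUnion
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = datasets.load_iris().data          # shape (150, 4)
preprocess = FeatureUnion([("std", StandardScaler()),
                           ("pca", PCA(n_components=2))])
combined = preprocess.fit_transform(features)
# 4 standardized columns + 2 principal components = 6 columns
print(combined.shape)  # → (150, 6)
```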

2. PCA without FeatureUnion
pipe = Pipeline([("std", StandardScaler()),
                 ("pca", PCA()),
                 ("classifier", LogisticRegression(solver="liblinear"))]
                 )
search_space = [{"pca__n_components": [1, 2, 3],
                 "classifier__penalty": ["l1", "l2"],
                 "classifier__C": np.logspace(0, 4, 10)}]
clf = GridSearchCV(pipe, search_space, cv=5, verbose=1, n_jobs=-1)
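Fitting this simpler pipeline works the same way; a minimal self-contained sketch (liblinear solver assumed so the l1 penalty is valid) that prints the winning hyperparameters:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
pipe = Pipeline([("std", StandardScaler()),
                 ("pca", PCA()),
                 ("classifier", LogisticRegression(solver="liblinear"))])
search_space = [{"pca__n_components": [1, 2, 3],
                 "classifier__penalty": ["l1", "l2"],
                 "classifier__C": np.logspace(0, 4, 10)}]
clf = GridSearchCV(pipe, search_space, cv=5, n_jobs=-1)
best_model = clf.fit(iris.data, iris.target)
# best_params_ holds the winning value for every searched parameter
print(best_model.best_params_)
print(round(best_model.best_score_, 3))
```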

3. PCA - whitening
whitening > rescales each principal component to mean 0 and variance 1
without whitening, the components are only centered (mean 0); their variances differ
features_scaled = StandardScaler().fit_transform(features)
# n_components=0.99: keep as many components as needed to explain 99% of the variance
pca = PCA(n_components=0.99, whiten=True)
features_pca = pca.fit_transform(features_scaled)
print(features.shape[1])      # original number of features
print(features_pca.shape[1])  # number of retained components
▷ 4, 3
import matplotlib.pyplot as plt
plt.scatter(features_pca[:, 0], features_pca[:, 1])
plt.title("pca with whitening")
plt.show()

pca_nowhiten = PCA(n_components=0.99)
features_nowhiten = pca_nowhiten.fit_transform(features_scaled)
plt.scatter(features_nowhiten[:, 0], features_nowhiten[:, 1])
plt.title("pca without whitening")
plt.show()
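The difference also shows up numerically (a small check, not from the original post): with whiten=True every retained component has unit sample variance, while without it each component's variance equals its explained variance and shrinks from the first component to the last.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features_scaled = StandardScaler().fit_transform(datasets.load_iris().data)
whitened = PCA(n_components=0.99, whiten=True).fit_transform(features_scaled)
plain = PCA(n_components=0.99).fit_transform(features_scaled)

# whitened components all have sample variance 1
print(np.round(whitened.var(axis=0, ddof=1), 3))  # → [1. 1. 1.]
# unwhitened variances differ per component, in decreasing order
print(np.round(plain.var(axis=0, ddof=1), 3))
```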
