A space for recording data studies
[arima] smp2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.seasonal import seasonal_decompose
# statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13; use the replacement class
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from matplotlib.pyplot import rcParams
rcParams['figure.figsize'] = 10, 6
import itertools
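path = './smp/smp.xlsx'
df = pd.read_excel(path, header=1)
df.head(3)
# Rename the date column to 'ym' and the mainland ('육지') price column to 'smp'
df = df.rename(columns={'Unnamed: 0': 'ym', '육지': 'smp'})
# Drop the Jeju ('제주') and combined ('통합') columns and the unnamed fifth column
df = df.drop(['제주', '통합', 'Unnamed: 4'], axis=1)
df['ym'] = pd.to_datetime(df.ym, format='%Y-%m')
df = df.set_index('ym')
# Keep only rows with a positive price (the log transform used later needs smp > 0)
df = df[df.smp > 0]
# Sort chronologically by the 'ym' index
df = df.sort_index()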
# raw data fig
fig = df.plot()
plt.title("SMP")
■ Checking stationarity
# determine rolling statistics
rolmean = df.rolling(window=12).mean()
rolstd = df.rolling(window=12).std()
print(rolmean, rolstd)
# plot rolling statistics
orig = plt.plot(df, color='blue', label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label='Rolling Std')
plt.legend(loc='best')
plt.title('Rolling Mean & Standard Deviation')
plt.show(block=False)
The plot shows that the rolling mean has a trend component, even though the rolling standard deviation is fairly constant over time. For the series to be stationary, both rolling statistics (mean and standard deviation) must stay constant over time, so both curves should run parallel to the x-axis. That is not the case here.
To support the hypothesis that the series is not stationary, let us run the Augmented Dickey-Fuller (ADF) test.
# Perform Augmented Dickey-Fuller test:
print("Result of Dickey Fuller Test:")
dftest = adfuller(df['smp'], autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value', '#Lags Used', 'Number of Observations Used'])
for key, value in dftest[4].items():
    dfoutput['Critical Value (%s)' % key] = value
print(dfoutput)
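# Log transform to dampen changes in variance
df['smp_log'] = np.log(df.smp)
# 12-month rolling mean and standard deviation of the log series
df['smp_log_ma'] = df['smp_log'].rolling(window=12).mean()
df['smp_log_std'] = df['smp_log'].rolling(window=12).std()
# Remove the trend by subtracting the rolling mean from the log series
df['smp_log_ma_diff'] = df['smp_log'] - df['smp_log_ma']
# First difference of the raw series
df['smp_diff'] = df['smp'] - df['smp'].shift(1)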
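As a small illustration of why first differencing helps, the sketch below applies it to a toy series with a pure linear trend. The toy series and variable names are assumptions, not part of the SMP data. Subtracting the shifted series is the same operation as pandas' `diff()`, and on a linear trend it leaves a constant (here 2.0, the slope).

```python
import numpy as np
import pandas as pd

# Toy series (an assumption for illustration): constant 5 plus a linear trend 2*t
t = np.arange(10)
s = pd.Series(5.0 + 2.0 * t)

diff_manual = s - s.shift(1)   # same form as df['smp'] - df['smp'].shift(1)
diff_builtin = s.diff()        # pandas shortcut for the same operation

# The first value is NaN because there is no previous observation to subtract
print(diff_manual.dropna().unique())
```

This is also why `dropna()` is applied before the differenced columns are passed to a test or a model.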
def test_stationarity(timeseries):
    # Determine rolling statistics
    movingAverage = timeseries.rolling(window=12).mean()
    movingSTD = timeseries.rolling(window=12).std()
    # Plot rolling statistics
    orig = plt.plot(timeseries, color='blue', label='Original')
    mean = plt.plot(movingAverage, color='red', label='Rolling Mean')
    std = plt.plot(movingSTD, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)
    # Perform Dickey-Fuller test:
    print('Results of Dickey Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)
test_stationarity(df.smp_log)
▶ smp_log: p-value > 0.05 => non-stationary
test_stationarity(df.smp_log_ma_diff.dropna())
▶ smp_log_ma_diff: p-value < 0.05 => stationary
test_stationarity(df.smp_diff.dropna())
▶ smp_diff: p-value < 0.05 => stationary
2. Plotting the ACF and PACF
# Differenced data plot
plt.figure(figsize=(12,10))
plt.subplot(411)
plt.plot(df.smp)
plt.legend(["Raw SMP (Non Stationary)"])
plt.subplot(412)
plt.plot(df.smp_log,'orange')
plt.legend(["log transformed SMP (Non Stationary)"])
plt.subplot(413)
plt.plot(df.smp_log_ma_diff,'pink')
plt.legend(["log transformed - rolling mean SMP (Stationary)"], loc='upper right')
plt.subplot(414)
plt.plot(df.smp_diff,'green')
plt.legend(["differenced SMP (Stationary)"], loc='upper right')
## Interpreting smp_diff
### acf : cuts off after lag 0
### pacf : cuts off after lag 0
### ARIMA(0,1,0)
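# order=(0,1,0) already applies one round of differencing (d=1),
# so fit on the raw smp series; passing smp_diff would difference twice
model2 = ARIMA(df.smp.values, order=(0, 1, 0))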
model2_fit = model2.fit()
model2_fit.summary()
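ARIMA(0,1,0) without a constant is a random walk: each value equals the previous value plus a random shock. The numpy-only sketch below, on synthetic data rather than the SMP series, shows what that implies for forecasting: every point forecast equals the last observation, and the approximate 95% interval widens with the square root of the horizon. The variance estimate here is the mean squared first difference; the value statsmodels reports may differ slightly depending on estimation details.

```python
import numpy as np

rng = np.random.default_rng(2)
y = 100 + np.cumsum(rng.normal(scale=3.0, size=120))  # synthetic random walk

d = np.diff(y)                 # first differences, i.e. the estimated shocks
sigma2 = np.mean(d ** 2)       # shock variance estimate (no constant term)

h = np.arange(1, 7)                        # forecast horizons 1..6
point = np.full(h.shape, y[-1])            # every point forecast is the last value
half_width = 1.96 * np.sqrt(sigma2 * h)    # ~95% interval grows with sqrt(h)

for step, lo, mid, hi in zip(h, point - half_width, point, point + half_width):
    print(f"h={step}: {mid:.2f}  [{lo:.2f}, {hi:.2f}]")
```

A flat forecast line is therefore expected from this model and is not a sign that the fit failed.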