[crawling] requests - Fetching KRX stock quotes

BOTTLE6 2022. 6. 6. 00:00

Reference

https://blog.naver.com/ellijahbyeon/222213048898

✔ Libraries

### https://blog.naver.com/ellijahbyeon/222213048898
import requests
import pandas as pd
from io import BytesIO

✔ Defining the crawling function

path = 'C:/#####/####/test/'  # folder where the stock data CSV files are saved
def krx_basic(tdate):
    # Step 1: request a one-time download code (OTP) from generate.cmd
    gen_req_url = 'http://data.krx.co.kr/comm/fileDn/GenerateOTP/generate.cmd'
    query_str_parms = {
        'mktId': 'ALL',
        'trdDd': str(tdate),
        'share': '1',
        'money': '1',
        'csvxls_isNo': 'false',
        'name': 'fileDown',
        'url': 'dbms/MDC/STAT/standard/MDCSTAT01501'
    }
    headers = {
        'Referer': 'http://data.krx.co.kr/contents/MDC/MDI/mdiLoader',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': '#####'  # copy the value shown in the generate.cmd request headers
    }
    r = requests.get(gen_req_url, params=query_str_parms, headers=headers)
    # Step 2: post the code to download.cmd to receive the CSV
    down_req_url = 'http://data.krx.co.kr/comm/fileDn/download_csv/download.cmd'
    form_data = {
        'code': r.text
    }
    r = requests.post(down_req_url, data=form_data, headers=headers)
    df = pd.read_csv(BytesIO(r.content), encoding='cp949')
    df['일자'] = tdate  # add a date column ('일자' means 'date')
    file_name = 'basic_' + str(tdate) + '.csv'
    df.to_csv(path + file_name, index=False, encoding='cp949')
    print('KRX crawling completed :', tdate)

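Open http://data.krx.co.kr/contents/MDC/MDI/mdiLoader/index.cmd?menuId=MDC0201,

open the developer tools (F12), and refresh the page (F5); the generate.cmd and download.cmd requests can then be inspected in the Network tab. If they do not show up, triggering the page's CSV download should produce them.

First, check the Form Data of generate.cmd and enter the same keys and values into "query_str_parms", as in the function above.

The form data of download.cmd has the same format as the response of generate.cmd: the code returned by generate.cmd is sent back as the 'code' field.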
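To make the two-request flow described above easier to see on its own, here is a minimal sketch that only performs the generate step and the download step and returns the raw bytes. The helper name fetch_krx_csv_bytes and the session argument are assumptions made for illustration, not part of the original post; session is any object with requests-style get and post methods, for example requests.Session().

```python
GENERATE_URL = 'http://data.krx.co.kr/comm/fileDn/GenerateOTP/generate.cmd'
DOWNLOAD_URL = 'http://data.krx.co.kr/comm/fileDn/download_csv/download.cmd'


def fetch_krx_csv_bytes(session, tdate, headers):
    """Return the raw CSV bytes for one date (hypothetical helper).

    session is assumed to behave like requests.Session: get() and post()
    return objects exposing .text and .content.
    """
    params = {
        'mktId': 'ALL',
        'trdDd': str(tdate),
        'share': '1',
        'money': '1',
        'csvxls_isNo': 'false',
        'name': 'fileDown',
        'url': 'dbms/MDC/STAT/standard/MDCSTAT01501',
    }
    # Step 1: generate.cmd answers with a one-time code as plain text.
    otp = session.get(GENERATE_URL, params=params, headers=headers).text
    # Step 2: download.cmd expects that code in the 'code' form field.
    response = session.post(DOWNLOAD_URL, data={'code': otp}, headers=headers)
    return response.content
```

The returned bytes can then be passed to pd.read_csv(BytesIO(...), encoding='cp949') exactly as krx_basic does. Keeping the network part separate from the saving part also makes it possible to try the logic with a fake session object before sending real requests.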

✔ Looping over dates

# Save one file per date (as written, the ranges cover January 2021 only)
for year in range(2021,2022):
    for month in range(1, 2):
        for day in range(1, 32):
            tdate = year * 10000 + month * 100 + day * 1
            if tdate <= 20211231:
                krx_basic(tdate)
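The nested loops above build the date as an integer, which works for January but would also produce values such as 20210231 once the month range is widened. As a hedged alternative, the sketch below walks real calendar dates with the standard datetime module and skips Saturdays and Sundays. The name iter_weekdays is an assumption for illustration; public holidays are not filtered out, so some dates will still have no trading data.

```python
from datetime import date, timedelta


def iter_weekdays(start, end):
    """Yield YYYYMMDD integers for Monday-Friday dates from start to end, inclusive."""
    current = start
    while current <= end:
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            yield int(current.strftime('%Y%m%d'))
        current += timedelta(days=1)


# Example: the first trading-week candidates of January 2021.
for tdate in iter_weekdays(date(2021, 1, 1), date(2021, 1, 8)):
    print(tdate)
```

Each yielded value has the same integer form as tdate in the loop above, so it could be passed to krx_basic(tdate) without other changes.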

✔ Merging the daily files

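# Merge
WIP = pd.read_csv(path + 'basic_20210101.csv', encoding='cp949') # must be the file name for the start date; it is read again in the loop, and drop_duplicates() below removes the repeated rows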
for year in range(2021,2022):
    for month in range(1, 2):
        for day in range(1, 32):
            tdate = year * 10000 + month * 100 + day * 1
            if tdate <= 20211231:
                yesterday = pd.read_csv(path + 'basic_' + str(tdate) + '.csv', encoding='cp949')
                WIP = pd.concat([WIP, yesterday], sort=False)
                print("concatenate completed :", tdate)
WIP = WIP.drop_duplicates()
WIP
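The merge loop above assumes that a file exists for every generated date and that the start file name is typed in by hand. One possible way around both, sketched here under the assumption that the files follow the basic_YYYYMMDD.csv naming used by krx_basic, is to list whatever files are actually in the folder. The helper name list_daily_files is hypothetical.

```python
from pathlib import Path


def list_daily_files(folder):
    """Return the basic_YYYYMMDD.csv files found in folder, sorted by file name.

    Because the date part has a fixed width, sorting by name is also
    sorting by date.
    """
    pattern = 'basic_' + '[0-9]' * 8 + '.csv'
    return sorted(Path(folder).glob(pattern))
```

The resulting list could be read with one pd.read_csv(p, encoding='cp949') call per path and combined with a single pd.concat, which avoids reading the first file twice.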