Regular Expressions (RE)

A space for recording my data studies

BOTTLE6 2022. 10. 25. 12:00

https://docs.python.org/ko/3/library/re.html

re: Regular expression operations (Python 3.10.8 documentation)

❍ match, search, findall, finditer

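import re

# match: succeeds only if the pattern matches at the very start of the string
p = re.compile('[a-z]+')  # [a-z]: a lowercase letter, +: repeated one or more times
m = p.match('3 python')
print(m)  # None, because the string starts with '3'

# search: scans the whole string and returns the first match,
# even when the start of the string does not match
p = re.compile('[a-z]+')
m = p.search('3 python')
print(m)  # match object for 'python' at span (2, 8)

# finditer: returns the matches as an iterator of match objects
p = re.compile('[a-z]+')
m = p.finditer('life is too short')
for i in m:
    print(i)  # one match object per word

# findall: returns every matching substring in a list
p = re.compile('[a-z]+')
m = p.findall('life is too short')
print(m)  # ['life', 'is', 'too', 'short']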

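❍ Match object methods

- group(), start(), end(), span()

p = re.compile('[a-z]+')  # [a-z]: a lowercase letter, +: repeated one or more times
m = p.match('python')
print(m.group())  # the matched string: python
print(m.start())  # start index of the match: 0
print(m.end())    # end index of the match (exclusive): 6
print(m.span())   # (start, end) tuple: (0, 6)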

❍ Compile options

# DOTALL, S
p = re.compile('a.b')  # by default . matches any character except a newline
m = p.match('a\nb')
print(m)  # None
p = re.compile('a.b', re.DOTALL)  # with re.DOTALL, . also matches a newline
# p = re.compile('a.b', re.S) is equivalent
m = p.match('a\nb')
print(m)  # match object for 'a\nb'

# IGNORECASE, I
p = re.compile('[a-z]')
print(p.match('python'))  # matches 'p'
print(p.match('Python'))  # None: the uppercase 'P' is not in [a-z]
print(p.match('PYTHON'))  # None

p = re.compile('[a-z]', re.I)  # re.I makes matching case-insensitive
print(p.match('python'))  # matches 'p'
print(p.match('Python'))  # matches 'P'
print(p.match('PYTHON'))  # matches 'P'

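# MULTILINE, M
# ^: start of the string, \s: a whitespace character, \w+: one or more word characters
p = re.compile(r"^python\s\w+")  # without re.M, ^ matches only at the start of the whole string
data = """python one
life is too short
python two
you need python
python three"""
print(p.findall(data))  # ['python one']

p = re.compile(r"^python\s\w+", re.M)  # with re.M, ^ also matches at the start of every line
data = """python one
life is too short
python two
you need python
python three"""
print(p.findall(data))  # ['python one', 'python two', 'python three']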

# VERBOSE, X: lets a long regex be split across lines and commented
charref = re.compile(r'&[#](0[0-7]+|[0-9]+|x[0-9a-fA-F]+);')

# The same pattern written with re.VERBOSE
charref = re.compile(r"""
&[#]                # start of a numeric entity reference
(
    0[0-7]+         # octal form
    | [0-9]+        # decimal form
    | x[0-9a-fA-F]+ # hexadecimal form
)
;                   # trailing semicolon
""", re.VERBOSE)

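As a quick check, the sketch below compares the one-line and verbose forms of the numeric character reference pattern on a few strings. The sample inputs and the names compact, verbose, samples and results are made up for illustration and are not part of the original notes.

```python
import re

# One-line form of the pattern.
compact = re.compile(r'&[#](0[0-7]+|[0-9]+|x[0-9a-fA-F]+);')

# Verbose form: whitespace and # comments in the pattern are ignored,
# except inside a character class such as [#].
verbose = re.compile(r"""
&[#]                # start of a numeric entity reference
(
    0[0-7]+         # octal form
    | [0-9]+        # decimal form
    | x[0-9a-fA-F]+ # hexadecimal form
)
;                   # trailing semicolon
""", re.VERBOSE)

# Hypothetical sample inputs, chosen only for illustration.
samples = ['&#65;', '&#x41;', '&#0101;', '&amp;']
results = [(bool(compact.search(s)), bool(verbose.search(s))) for s in samples]
for s, r in zip(samples, results):
    print(s, r)
```

Both forms should agree on every input: the three numeric references match and '&amp;' does not.
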
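❍ The backslash problem

Suppose we want to match the literal text \section.

In a regular expression \s means a whitespace character, so the pattern \section is read as "whitespace followed by ection". To match a literal backslash, the regex engine has to receive \\.

Inside an ordinary Python string literal each backslash must itself be escaped again, so the pattern ends up written with four backslashes.

Raw string notation (the r'...' prefix) removes that second layer of escaping, so two backslashes are enough.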

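p = re.compile('\\section')      # regex engine sees \section: \s (whitespace) + "ection", not what we want
p = re.compile('\\\\section')    # regex engine sees \\section: a literal backslash + "section"
p = re.compile(r'\\section')     # same pattern as the line above, written as a raw string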
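To see the difference between the three spellings, here is a small sketch that runs each pattern against a sample string. The sample text and the names wrong, escaped and raw are assumptions made for this illustration.

```python
import re

text = r'\section{intro}'  # the actual characters are: \section{intro}

# Reaches the regex engine as \section, i.e. \s followed by "ection".
wrong = re.compile('\\section')
# Both of these reach the regex engine as \\section (literal backslash + "section").
escaped = re.compile('\\\\section')
raw = re.compile(r'\\section')

print(wrong.search(text))              # None
print(wrong.search(' ection').group()) # ' ection' (whitespace + "ection")
print(escaped.search(text).group())    # \section
print(raw.pattern == escaped.pattern)  # True
```
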

❍ Metacharacter |

Alternation: A|B matches either A or B.

import re
p = re.compile("Crow|Servo")
m = p.match("CrowHello")
print(m)  # match object for 'Crow'

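❍ Metacharacter ^

Matches at the start of the string.

import re
print(re.search('^Life', 'Life is too short'))  # match object for 'Life'
print(re.search('^Life', 'My Life'))            # None: 'Life' is not at the start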

❍ Metacharacter $

Matches at the end of the string.

import re
print(re.search('short$', "Life is too short"))       # match object for 'short'
print(re.search('short$', "Life is too short, you"))  # None: 'short' is not at the end

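❍ Metacharacter \b

Word boundary: matches the empty position between a word character (\w) and a non-word character, or the start or end of the string. It does not consume any whitespace itself.

import re
p = re.compile(r'\bclass\b')
print(p.search('no class at all'))             # match object for 'class' at span (3, 8)
print(p.search('the declassified algorithm'))  # None: 'class' is inside a longer word
print(p.search('one subclass is'))             # None: no boundary before 'class'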
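One pitfall worth a sketch: the pattern above only works because it is a raw string. In an ordinary Python string literal, '\b' is the backspace character rather than the regex word boundary. The names backspace and boundary below are made up for this illustration.

```python
import re

# Without the r prefix, '\b' is the backspace character (\x08).
backspace = re.compile('\bclass\b')
# With the r prefix, the regex engine receives \b and treats it as a word boundary.
boundary = re.compile(r'\bclass\b')

print(backspace.search('no class at all'))  # None
print(boundary.search('no class at all'))   # match object at span (3, 8)
print(len('\b'), len(r'\b'))                # 1 2
```
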

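❍ Grouping ()

Parentheses bundle part of an expression into a group.

import re
p = re.compile('(ABC)+')  # the string ABC repeated one or more times
m = p.search('ABCABCABC OK?')
print(m)          # match object at span (0, 9)
print(m.group())  # ABCABCABC

import re
p = re.compile(r"(\w+)\s+\d+[-]\d+[-]\d+")  # (\w+) captures the name before the phone number
m = p.search("park 010-1234-5678")
print(m)
print(m.group(1))  # first group: park
# print(m.group(2)) would raise IndexError because the pattern has only one group

import re
p = re.compile(r'(\b\w+)\s+\1')  # \1 must match the same text that group 1 (\b\w+) matched
print(p.search("Paris in the the spring").group())  # the the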

# Naming a group: (?P<name>...)

import re
p = re.compile(r"(?P<name>\w+)\s+((\d+)[-]\d+[-]\d+)")
m = p.search("park 010-1234-566")
print(m.group("name"))  # look up the group called name: park
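Named groups are still numbered, and nested groups are numbered by the position of their opening parenthesis. The sketch below reuses the pattern and input from the snippet above to show how the numbered groups, groups() and groupdict() line up.

```python
import re

p = re.compile(r"(?P<name>\w+)\s+((\d+)[-]\d+[-]\d+)")
m = p.search("park 010-1234-566")

print(m.group(1))     # park (the named group is also group 1)
print(m.group(2))     # 010-1234-566 (outer group around the whole number)
print(m.group(3))     # 010 (inner group, opened after group 2)
print(m.groups())     # ('park', '010-1234-566', '010')
print(m.groupdict())  # {'name': 'park'}
```
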

# Referring back to a named group: (?P=name)

import re
p = re.compile(r"(?P<word>\w+)\s+(?P=word)")
print(p.search("Paris in the the spring").group())  # the the

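❍ Lookahead: positive (?=)

import re
p = re.compile(".+:")  # .+ (one or more characters) followed by :
m = p.search("http://google.com")
print(m.group())  # http:

# The : is used only as a search condition and is not included in the result
p = re.compile(".+(?=:)")  # .+ only where a : comes next; the : itself is not consumed
m = p.search("http://google.com")
print(m.group())  # http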

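❍ Lookahead: negative (?!)

import re
# .*[.] matches the file name up to the dot; (?!bat$) rejects lines whose extension is exactly bat
p = re.compile(".*[.](?!bat$).*$", re.M)
files = p.findall("""
autoexec.exe
autoexec.bat
autoexec.jpg
""")
print(files)  # ['autoexec.exe', 'autoexec.jpg']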

❍ Replacing strings with sub

import re
p = re.compile('(blue|white|red)')
print(p.sub('colour', 'blue socks and red shoes'))  # colour socks and colour shoes
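Two further uses of sub, sketched here as an aside: the count argument limits how many replacements are made, and \g<name> in the replacement string inserts the text of a named group. The swap pattern, its group names and its sample input are made up for this illustration.

```python
import re

p = re.compile('(blue|white|red)')
once = p.sub('colour', 'blue socks and red shoes', count=1)
print(once)  # colour socks and red shoes

# Hypothetical pattern: swap a name and a phone number using named groups.
swap = re.compile(r"(?P<name>\w+)\s+(?P<phone>\d+[-]\d+[-]\d+)")
swapped = swap.sub(r"\g<phone> \g<name>", "park 010-1234-5678")
print(swapped)  # 010-1234-5678 park
```
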

❍ Greedy vs Non-Greedy

import re
s = '<html><head><title>TiTle</title>'
print(re.match("<.*>", s).group())   # greedy: .* takes as much as possible, so the whole string is printed
print(re.match("<.*?>", s).group())  # non-greedy: *? repeats as few times as possible, so only <html> is printed
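The same contrast shows up with findall. This short sketch reuses the string s from above; the names greedy_tags and lazy_tags are illustrative only.

```python
import re

s = '<html><head><title>TiTle</title>'

# Greedy: a single match that runs from the first < to the last >.
greedy_tags = re.findall('<.*>', s)
# Non-greedy: each match stops at the nearest >, so every tag is found separately.
lazy_tags = re.findall('<.*?>', s)

print(greedy_tags)  # ['<html><head><title>TiTle</title>']
print(lazy_tags)    # ['<html>', '<head>', '<title>', '</title>']
```
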
