Deep Dive Grep

A must-learn command for SEs!!

by alinos

Apr 14. 2016

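Of all my system administration work, what I enjoy most is analyzing and monitoring *NIX systems.

To analyze and monitor a system, you have to scoop up just the parts you need from the system logs and the VFS.

So I run the grep command dozens of times a day.

Along the way I have picked up a fair amount of know-how, and I want to write down the things my hands now do on instinct.

I will also draw on other sites so that this goes deeper than what I already know.

The sample document for the examples is the text in the box below. (The file is named deep_grep.)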

I'm Lil System Engineer aka.alinos
I Like Grep
ex) hi hello grep
1. grep
2. Case insensitive -i
3. recursively -r/-R
4. word -w
5. count -c
6. line number -n
7. invert match -v
8. Egrep -E -e
9. After/Before/aroound -A -B -C
10. Grep color --color=auto
11. grep word in file name -l -L
12. only output -o
13. position byte -b
14. Regular expression

1. Basic grep usage

# grep grep deep_grep

Output :

ex) hi hello grep
1. grep
8. Egrep -E -e
11. grep word in file name -l -L

As everyone knows, the syntax is grep {pattern to find} {file name}.

2. Ignoring case completely (Case insensitive)

# grep -i grep deep_grep

Output :

I Like Grep
ex) hi hello grep
1. grep
8. Egrep -E -e
10. Grep color --color=auto
11. grep word in file name -l -L

With the -i option, both grep and Grep are caught. As a regular expression, it can be read as [Gg][Rr][Ee][Pp].

3. Searching subdirectories too

# grep -r grep *     (or: grep -R grep *)

Output :

deep_grep:ex) hi hello grep

deep_grep:1. grep

deep_grep:8. Egrep -E -e

deep_grep:11. grep word in file name -l -L

stream/aka:grep

Both -r and -R search recursively, so they also pick up matches from files in subdirectories.

Strictly speaking, -r and -R are slightly different, which I only just learned myself. (I just grab everything with -r -_-)

-r, --recursive like --directories=recurse
-R, --dereference-recursive
--include=FILE_PATTERN search only files that match FILE_PATTERN
--exclude=FILE_PATTERN skip files and directories matching FILE_PATTERN
--exclude-from=FILE skip files matching any file pattern from FILE
--exclude-dir=PATTERN directories that match PATTERN will be skipped.

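As the help text shows, the two differ slightly: -r is the usual recursive search and follows symbolic links only when they are given on the command line, while -R (--dereference-recursive) follows every symbolic link it meets. The --include/--exclude options work with either one and let you narrow the search further. The options are all listed above, so I will give just one example.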

# grep -R --exclude-dir=stream grep *

Output :

deep_grep:ex) hi hello grep

deep_grep:1. grep

deep_grep:8. Egrep -E -e

deep_grep:11. grep word in file name -l -L

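In the earlier example grep also pulled the word grep out of the file aka in the stream folder; here the --exclude-dir option leaves that directory out.

When you grep through a directory littered with files and folders, using the exclude/include options well can cut I/O and load considerably!!

exclude/include accept glob patterns, so suppose there is one more folder named stream1.

With grep -R --exclude-dir='stream*' grep * no folder whose name starts with stream is searched. (Quote the pattern so that the shell does not expand it first.)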
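The exclude pattern can be tried out safely in a scratch directory. This is only a minimal sketch that assumes GNU grep; the temporary directory and the file contents are made up for illustration and are not the author's real files.

```shell
# Scratch tree: one top-level file plus two stream* directories (contents are illustrative).
tmp=$(mktemp -d)
mkdir "$tmp/stream" "$tmp/stream1"
printf '1. grep\n'  > "$tmp/deep_grep"
printf 'grep\n'     > "$tmp/stream/aka"
printf 'grep too\n' > "$tmp/stream1/trst"

# Plain recursive search: every file that contains "grep" is listed.
all=$(cd "$tmp" && grep -rl grep . | sort)

# Quoted glob: grep itself skips every directory whose name starts with "stream".
filtered=$(cd "$tmp" && grep -rl --exclude-dir='stream*' grep . | sort)

printf 'all:\n%s\nfiltered:\n%s\n' "$all" "$filtered"
rm -rf "$tmp"
```

Only ./deep_grep should remain in the filtered list, since both stream and stream1 match the glob.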

4. Matching whole words only

# grep -w grep deep_grep

Output:

ex) hi hello grep

1. grep

11. grep word in file name -l -L

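The difference between -w and plain grep is that -w picks up only standalone words.

For example, plain grep also matches Egrep, grep, 1grep and the like, but with the word option you get only the lines where grep appears as a word of its own. (Compare with section 1.)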
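The difference is easy to see on three tiny lines. A small sketch, using the Egrep and 1grep cases mentioned above as made-up input rather than the real deep_grep file:

```shell
# Three lines: "grep" as a standalone word, inside "Egrep", and glued to a digit.
sample=$(printf '%s\n' '1. grep' '8. Egrep -E -e' '1grep')

plain=$(printf '%s\n' "$sample" | grep grep)     # substring match: all three lines
word=$(printf '%s\n' "$sample" | grep -w grep)   # whole-word match: only "1. grep"

printf 'plain:\n%s\nword:\n%s\n' "$plain" "$word"
```

Letters, digits, and the underscore count as word characters, which is why 1grep is rejected by -w just like Egrep.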

5. Counting matches

# grep -c grep deep_grep

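Output:

4

The -c option counts the lines in the document that contain grep (matching lines, not individual occurrences). In deep_grep, 4 lines contain grep.

As a quick review of whole-word search, add the -w option for a small variation: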

# grep -w -c grep deep_grep

Output:

3

The result is 3: three lines contain grep as a standalone word. deep_grep also contains the word Egrep, but that one is ignored.
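Since -c counts lines, a line that contains the pattern twice still counts once. If you want the number of occurrences instead, one common approach is to combine -o with wc -l. A small sketch on made-up input:

```shell
# One line with two occurrences, one line with a single occurrence, one without any.
sample=$(printf '%s\n' 'grep and egrep' '1. grep' 'no match here')

lines=$(printf '%s\n' "$sample" | grep -c grep)                     # matching lines
hits=$(printf '%s\n' "$sample" | grep -o grep | wc -l | tr -d ' ')  # individual matches

echo "lines=$lines hits=$hits"
```

Here the first line contributes two hits (grep, and the grep inside egrep) but only one matching line.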

6. Showing line numbers

# grep -n line deep_grep

Output:

9:6. line number -n

With -n, grep tells you which line each match is on; it resembles the numbering that cat -n deep_grep gives you.

7. Excluding matches

# grep -v invert deep_grep

Output:

I'm Lil System Engineer aka.alinos
I Like Grep
ex) hi hello grep
1. grep
2. Case insensitive -i
3. recursively -r/-R
4. word -w
5. count -c
6. line number -n
8. Egrep -E -e
9. After/Before/aroound -A -B -C
10. Grep color --color=auto
11. grep word in file name -l -L
12. only output -o
13. position byte -b
14. Regular expression

Grepping with -v drops just item 7, the line that contains invert. I use this one a lot.

8. Multi-pattern search (aka egrep!!)

# grep -E 'count|position|output' deep_grep

Output:

5. count -c

12. only output -o

13. position byte -b

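-E is the flower of grep. Surprisingly many people do not know it and so never use it, but it is a killer tip.

Because it is an OR search, not an AND search!!!

If you do not know it, your only option is to run grep again and again, once per word.

Those who use grep well already knew this, but I did it the hard way for about four years.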
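The same OR search can also be written with repeated -e options (the -e listed in item 8 of the sample file), which does not need -E at all. A short sketch on four lines copied from deep_grep:

```shell
sample=$(printf '%s\n' '5. count -c' '6. line number -n' '12. only output -o' '13. position byte -b')

# Extended-regex alternation: a line matches if any one alternative matches.
with_E=$(printf '%s\n' "$sample" | grep -E 'count|position|output')

# The same OR search written as three separate -e patterns.
with_e=$(printf '%s\n' "$sample" | grep -e count -e position -e output)

printf '%s\n' "$with_E"
```

Both forms select the same three lines; which one reads better is mostly a matter of taste.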

9. Printing lines before/after a match

# grep -A2 word deep_grep

Output:

4. word -w

5. count -c

6. line number -n

11. grep word in file name -l -L

12. only output -o

13. position byte -b

To see the lines below the line that contains your word, pass -A followed by the number of lines you want!

# grep -B2 word deep_grep

Output:

2. Case insensitive -i

3. recursively -r/-R

4. word -w

9. After/Before/aroound -A -B -C

10. Grep color --color=auto

11. grep word in file name -l -L

To see the lines above the line that contains your word, pass -B followed by the number of lines you want!

# grep -C2 word deep_grep

Output:

2. Case insensitive -i

3. recursively -r/-R

4. word -w

5. count -c

6. line number -n

9. After/Before/aroound -A -B -C

10. Grep color --color=auto

11. grep word in file name -l -L

12. only output -o

13. position byte -b

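To see the lines both above and below the line that contains your word, pass -C followed by the number of lines you want!

This is another killer tip.

Related content often sits right above or below the word you have in mind; in those cases the -A/-B/-C options are extremely convenient.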
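One detail worth knowing when reading context output: GNU grep prints a -- line between groups that are not adjacent in the file. A small sketch, assuming GNU grep and using a trimmed, non-contiguous excerpt of deep_grep:

```shell
sample=$(printf '%s\n' \
  '3. recursively -r/-R' \
  '4. word -w' \
  '5. count -c' \
  '6. line number -n' \
  '7. invert match -v' \
  '11. grep word in file name -l -L' \
  '12. only output -o')

after=$(printf '%s\n' "$sample" | grep -A1 word)    # match plus 1 line after
before=$(printf '%s\n' "$sample" | grep -B1 word)   # match plus 1 line before
around=$(printf '%s\n' "$sample" | grep -C1 word)   # 1 line on each side

# The two matches are far apart, so each result contains a "--" separator line.
printf '%s\n' "$after"
```

If a script post-processes such output, remember to filter out the separator lines.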

10. Highlighting matches in color!!

# grep --color=auto grep deep_grep

Output:

ex) hi hello grep

1. grep

8. Egrep -E -e

11. grep word in file name -l -L

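--color accepts auto, never, or always. The default is never, so matches are not highlighted when you simply run grep.

I have been coloring every occurrence of grep in this post partly to make it easier to read, but also because I alias grep with the color option by default.

Like this: alias grep='grep --color=auto'

The always value is a tip I learned only recently: with always, the color survives even through a pipeline.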

# grep 1 deep_grep --color=auto | grep grep --color=auto

Output:

1. grep

11. grep word in file name -l -L

With auto as above, only the word found by the last grep is highlighted.

But with the always option,

# grep 1 deep_grep --color=always |GREP_COLORS="mt=01;32" grep grep --color=auto -n

1. grep

11. grep word in file name -l -L

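This way the highlighting is kept even across the pipeline.

The GREP_COLORS environment variable lets you customize the highlight color.

Its settings are well documented in the grep man page, so I will not go into detail here. What matters is that the color shows up.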
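What "surviving the pipeline" means in practice is that always writes the ANSI escape sequences even when stdout is not a terminal, while auto drops them. A rough sketch, assuming GNU grep, that just checks for the ESC byte in captured output:

```shell
esc=$(printf '\033')   # the ESC byte that starts every ANSI color sequence

# stdout is captured here (not a terminal), so auto behaves like never.
auto_out=$(printf '1. grep\n' | grep --color=auto grep)

# always emits the escape codes even though stdout is not a terminal.
always_out=$(printf '1. grep\n' | grep --color=always grep)

case "$auto_out"   in *"$esc"*) auto_colored=yes ;;   *) auto_colored=no ;;   esac
case "$always_out" in *"$esc"*) always_colored=yes ;; *) always_colored=no ;; esac
echo "auto: $auto_colored, always: $always_colored"
```

The flip side is that those escape bytes become part of the data, so a later grep, cut, or awk in the pipeline sees them too. That is a reason to keep auto in the alias and use always only where you really want it.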

11. Finding the files that contain the string

# grep -r -l grep * # --files-with-matches

Output:

deep_grep

gfreij

stream/aka

stream1/trst

-l (lowercase L) prints the names of the files that contain the word grep instead of the matching lines.

Use it when you do not need all the output and only want to know which files contain the string.

# grep -r -L grep * # --files-without-match

Output:

arp.py

arprequest.pyc

cdn_gslb

ciphertext

fff

for

list

spdtest.sh

stream/stream.f

stream/mysecond.c

stream/stream_c

stream/stream.c

stream/mysecond.o

stream/Makefile

test

test.pl

test.py

-L is the opposite: it prints the names of the files that do not contain the string I specified.
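The two options split a set of files into "has a match" and "has no match". A minimal sketch in a scratch directory; the file contents are invented for the example:

```shell
tmp=$(mktemp -d)
printf '1. grep\n'     > "$tmp/deep_grep"
printf 'print("hi")\n' > "$tmp/test.py"

with=$(cd "$tmp" && grep -l grep deep_grep test.py)      # files with at least one match
without=$(cd "$tmp" && grep -L grep deep_grep test.py)   # files without any match

echo "with: $with / without: $without"
rm -rf "$tmp"
```

A typical use is feeding the -l list into another command, for example to open or edit only the files that mention a setting.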

12. Printing only the matched text (only)

# grep -o only deep_grep

only

It prints only the text that matched.

I often combine it with other commands to count specific strings.

# grep -o '[Gg]rep' deep_grep | sort | uniq -c

Output:

4 grep

2 Grep

Like this.

13. Showing the byte offset of a match

# grep -b '[Gg]rep' deep_grep

35:I Like Grep

47:ex) hi hello grep

66:1. grep

178:8. Egrep -E -e

226:10. Grep color --color=auto

254:11. grep word in file name -l -L

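To be honest I rarely use it, but it could serve as a measuring tool when writing a script that manipulates files or when cutting up a large file with split. Note that without -o the number is the byte offset of the start of the matching line, not of the match itself.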
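Adding -o changes what the offset refers to. A small sketch, assuming GNU grep, on two lines in the style of deep_grep:

```shell
sample=$(printf '%s\n' 'I Like Grep' '1. grep')

# -b alone: offset of the first byte of each matching line.
line_off=$(printf '%s\n' "$sample" | grep -b '[Gg]rep')

# -b with -o: offset of the matched text itself.
match_off=$(printf '%s\n' "$sample" | grep -bo '[Gg]rep')

printf 'line offsets:\n%s\nmatch offsets:\n%s\n' "$line_off" "$match_off"
```

The first line is 11 characters plus a newline, so the second line starts at byte 12, and the word grep inside it starts at byte 15.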

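14. Regular Expression !!

As with anything, grep is the classic command for scraping strings, so you can only claim to use it properly once you use regular expressions properly. I would like to get better at it myself.

First, it helps to know the POSIX character classes well.

POSIX Character Classes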
[:alnum:] Alphanumeric characters ex) [a-zA-Z0-9]
[:alpha:] Alphabetic characters ex) [a-zA-Z]
[:ascii:] ASCII characters ex) [\x00-\x7F] (not a POSIX class; GNU grep does not accept it)
[:blank:] Space and tab ex) [ \t]
[:cntrl:] Control characters ex) [\x00-\x1F\x7F]
[:digit:] Digits ex) [0-9]
[:graph:] Visible characters ex) [\x21-\x7E]
[:lower:] Lowercase letters ex) [a-z]
[:print:] Visible characters and spaces ex) [\x20-\x7E]
[:punct:] Punctuation and symbols. ex) [!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]
[:space:] All whitespace characters, including line breaks ex) [ \t\r\n\v\f]
[:upper:] Uppercase letters ex) [A-Z]
[:word:] Word characters ex) [A-Za-z0-9_] (not a POSIX class; in GNU grep use \w instead)
[:xdigit:] Hexadecimal digits ex) [A-Fa-f0-9]

Reference: http://www.regular-expressions.info/posixbrackets.html

Then combine them with operators.

I am no Bob Ross, so I simply copied the obvious ones from elsewhere.

. Matches any single character.
? The preceding item is optional and will be matched, at most, once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{N} The preceding item is matched exactly N times.
{N,} The preceding item is matched N or more times.
{N,M} The preceding item is matched at least N times, but not more than M times.
- Represents the range if it’s not first or last in a list or the ending point of a range in a list.
^ Matches the empty string at the beginning of a line; also represents the characters not in the range of a list.
$ Matches the empty string at the end of a line.
\b Matches the empty string at the edge of a word.
\B Matches the empty string provided it’s not at the edge of a word.
\< Match the empty string at the beginning of word.
\> Match the empty string at the end of word.

Reference: http://www.cyberciti.biz/faq/grep-regular-expressions/

Finally, here are some real-world examples.

1. Extracting IPs from a document

This is a regular expression I use literally every day.

# grep -Eo '[[:digit:]]{1,3}[.][[:digit:]]{1,3}[.][[:digit:]]{1,3}[.][[:digit:]]{1,3}' dnsmasq.log

Output :

8.8.4.4

8.8.8.8

112.171.134.114

8.8.8.8

8.8.4.4

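This pulls out just the IPs. It is easy once you can read [[:digit:]]{1,3}: [[:digit:]] is a digit from 0 to 9, and {1,3} says the digit repeats one to three times.

So [[:digit:]]{1,3}[.] repeated four times (without the trailing dot) gives exactly the shape of an IP. You could write the dot bare, but in a regular expression I find the bracketed [.] more explicit, because a bare . can stand for any single character.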

# grep dns.a.q -o dnsmasq.log

Output :

dnsmasq

Like this.

A neater alternative is grep -Eo '([[:digit:]]{1,3}[.]){3}[[:digit:]]{1,3}' dnsmasq.log.

It looks harder, but the repeated [[:digit:]]{1,3}[.] part is folded into a group repeated with {3}, which makes the expression shorter.
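The two spellings can be compared side by side. In this sketch the first two log lines are taken from the dnsmasq.log output shown later in this post; the third line is hypothetical and is there to show a limitation: the pattern checks only the shape, so an impossible address such as 999.1.1.1 is extracted as well.

```shell
log=$(printf '%s\n' \
  'Mar 25 03:26:26 dnsmasq[27337]: reply d2oh4tlt9mrke9.cloudfront.net is 54.192.37.117' \
  'Mar 25 03:26:26 dnsmasq[27337]: reply d2oh4tlt9mrke9.cloudfront.net is 54.192.37.67' \
  'hypothetical line with an impossible address 999.1.1.1')

# Long form: the digit group written out four times.
long=$(printf '%s\n' "$log" | grep -Eo '[[:digit:]]{1,3}[.][[:digit:]]{1,3}[.][[:digit:]]{1,3}[.][[:digit:]]{1,3}')

# Short form: "digits plus dot" grouped and repeated three times, then the last digits.
short=$(printf '%s\n' "$log" | grep -Eo '([[:digit:]]{1,3}[.]){3}[[:digit:]]{1,3}')

printf '%s\n' "$short"
```

For log scraping this looseness is usually fine; validating that each octet is at most 255 needs a longer pattern or a separate check.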

2. Finding system accounts that start with a given string

# grep "\<s.*\>:x" /etc/passwd

Output :

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

saslauth:x:499:76:"Saslauthd user":/var/empty/saslauth:/sbin/nologin

sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

smmsp:x:51:51::/var/spool/mqueue:/sbin/nologin

sensu:x:494:495:Sensu Monitoring Framework:/opt/sensu:/bin/false

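This one is easy. \<s declares a word that starts with s, and the .* operator stands for all the characters that follow.

To narrow it down among similar-looking text in the file, :x (the password field that follows the user name) is specified as well, so only the accounts starting with s are found.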
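One caveat: \<s matches at any word start, not only at the beginning of the line, so an account name with a hyphen before an s can slip through. Anchoring with ^ is a stricter variant. The sketch below assumes GNU grep (for \< and \>); the sync and sshd lines come from the output above, and web-srv is a hypothetical account added to show the difference.

```shell
passwd=$(printf '%s\n' \
  'sync:x:5:0:sync:/sbin:/bin/sync' \
  'sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin' \
  'web-srv:x:1001:1001::/home/web-srv:/bin/bash')   # hypothetical account

# Pattern from the post: any word starting with "s" that ends right before ":x".
loose=$(printf '%s\n' "$passwd" | grep '\<s.*\>:x' | cut -d: -f1)

# Anchored variant: the first field itself must start with "s".
strict=$(printf '%s\n' "$passwd" | grep '^s[^:]*:' | cut -d: -f1)

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

On a typical /etc/passwd both patterns probably give the same list; the anchored one just removes the dependence on how account names happen to be spelled.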

3. Checking nginx process status

# grep -rh nginx -A 50 /proc/*/status | grep -Ei "name|pid|threads|ctxt|vmrss|cpu|mem"

Name: nginx

Pid: 32576

PPid: 1

TracerPid: 0

Threads: 1

Cpus_allowed: 1

Cpus_allowed_list: 0

Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001

Mems_allowed_list: 0

voluntary_ctxt_switches: 22

nonvoluntary_ctxt_switches: 1

Name: nginx

Pid: 32579

PPid: 32576

TracerPid: 0

Threads: 1

Cpus_allowed: 1

Cpus_allowed_list: 0

Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001

Mems_allowed_list: 0

voluntary_ctxt_switches: 64763

nonvoluntary_ctxt_switches: 367

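Using grep as above, you can dig through the VFS directory and see the CPU, memory, and context-switch figures of the running nginx processes.

-rh searches subdirectories (-r) and suppresses the file name prefix (-h); -A 50 prints roughly 50 lines after the line where the name nginx is found; the result is then piped into grep -Ei so that only the fields I want to see are printed.

4. Searching with several combined conditions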

# grep -E "(reply|cached)[[:space:]].*cloudfront.net is 54[.](192|230)[.]37[.].*7$" dnsmasq.log

Output:

Mar 25 03:26:26 dnsmasq[27337]: reply d2oh4tlt9mrke9.cloudfront.net is 54.192.37.117

Mar 25 03:26:26 dnsmasq[27337]: reply d2oh4tlt9mrke9.cloudfront.net is 54.192.37.67

Mar 25 03:26:27 dnsmasq[27337]: reply d3ewslr5655zon.cloudfront.net is 54.192.37.187

Mar 25 03:30:38 dnsmasq[27337]: reply dy6tztwnb325w.cloudfront.net is 54.192.37.87

Mar 25 03:30:39 dnsmasq[27337]: reply d1l2s6adam5wqx.cloudfront.net is 54.192.37.17

Mar 30 20:05:47 dnsmasq[31960]: reply d3cajjoh3a1m9n.cloudfront.net is 54.192.37.127

Mar 31 01:36:25 dnsmasq[31960]: reply d3ouyz166y09uz.cloudfront.net is 54.192.37.167

I simply forced a combination together.

grep -E "(reply|cached) : match lines that contain reply or cached,

grep -E "(reply|cached)[[:space:]] : followed by a whitespace character.

[[:space:]].*cloudfront.net : after the whitespace any characters may follow, up to cloudfront.net.

cloudfront.net is 54[.](192|230)[.]37[.].*7$" : after "cloudfront.net is", match the 54.192.37 or 54.230.37 range, with an IP that ends in 7.
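The pieces can be checked against a few lines. This sketch uses a slightly tightened variant of the pattern (the dot in cloudfront[.]net is bracketed and the last octet is restricted to digits), which is my adjustment and not the command from the post. The first two log lines are from the output above; the last three are hypothetical lines added to exercise the cached, non-7, and non-reply cases.

```shell
log=$(printf '%s\n' \
  'Mar 25 03:26:26 dnsmasq[27337]: reply d2oh4tlt9mrke9.cloudfront.net is 54.192.37.117' \
  'Mar 25 03:26:27 dnsmasq[27337]: reply d3ewslr5655zon.cloudfront.net is 54.192.37.187' \
  'Mar 25 03:26:28 dnsmasq[27337]: cached example.cloudfront.net is 54.230.37.7' \
  'Mar 25 03:26:29 dnsmasq[27337]: reply example.cloudfront.net is 54.192.37.118' \
  'Mar 25 03:26:30 dnsmasq[27337]: query[A] example.cloudfront.net from 10.0.0.1')

# reply or cached, then anything up to "cloudfront.net is", then 54.(192|230).37.<digits ending in 7>
pattern='(reply|cached)[[:space:]].*cloudfront[.]net is 54[.](192|230)[.]37[.][[:digit:]]*7$'

matched=$(printf '%s\n' "$log" | grep -E "$pattern")
count=$(printf '%s\n' "$log" | grep -Ec "$pattern")

printf '%s\n' "$matched"
echo "matching lines: $count"
```

Keeping the pattern in a variable makes it easier to grow such an expression one condition at a time and rerun it after each change.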

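If I come across more tips for using grep or things that need fixing, I will update this post.

If you point out mistakes or tell me about things you would like to catch with grep, I think this can become a better post.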

References

Regular Expressions In grep
http://www.cyberciti.biz/faq/grep-regular-expressions/

Regular Expressions in Grep Command with 10 Examples – Part I
http://www.thegeekstuff.com/2011/01/regular-expressions-in-grep-command/

grep regular expression syntax - Finding Files
https://www.gnu.org/software/findutils/manual/html_node/find_html/grep-regular-expression-syntax.html#grep-regular-expression-syntax

Examples using grep
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_04_02.html#sect_04_02_02
