Elasticsearch: Mapping

Elasticsearch data mapping, which feels a bit like hardcoding...

by 유윤식

May 9, 2019

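Best known as part of ELK: Elasticsearch, Logstash, Kibana.

Around my fourth year of university, I first came across Elasticsearch (the ELK Stack was not complete back then) and learned about the Apache Lucene project.

A fair amount of time has passed since then, and the Beats family has added a variety of components to the stack.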

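Suricata network traffic data has roughly 490 fields.

Writing a mapping for every one of those fields (variables?) by hand is not practical.

So automatic (dynamic) mapping is the usual choice.

However, the most important piece, the IP address information, has to be declared explicitly as an IP.

Otherwise it is recognized as a string.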
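A minimal sketch of what such an explicit mapping could look like, written as a Python dict so it is easy to inspect. The helper name `build_ip_mapping` is hypothetical; `src_ip` and `dest_ip` are the field names used later in this post, and every other field is assumed to be left to dynamic mapping.

```python
import json


def build_ip_mapping(ip_fields):
    """Return a mapping body that types only the given fields as `ip`.

    Fields that are not listed are left to Elasticsearch's dynamic mapping.
    """
    return {"properties": {name: {"type": "ip"} for name in ip_fields}}


mapping = build_ip_mapping(["src_ip", "dest_ip"])
print(json.dumps(mapping, indent=2))
```

The point is that only the handful of fields that matter get an explicit type, and the other ~490 do not need to be written out.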

The documentation comes first, so:

Ref : https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#_explicit_mappings

Mapping | Elasticsearch Reference [7.0] | Elastic


Mappings can also be added dynamically.

But!

In practice, data usually goes into Elasticsearch through Logstash.

In that case, the setup lives in the Logstash conf.

1. Define the input

input {
  redis {
    type => "redis"
    host => "127.0.0.1"
    data_type => "channel"
    key => "obtraffic"
    batch_count => 10
    threads => 1
  }
}

2. Define the filter

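filter {
  if [type] == "redis" {
    geoip {
      source => "src_ip"      # key in the original data that holds the IP
      target => "src_geoip"   # key to store the GeoIP result under
    }
    geoip {
      source => "dest_ip"     # key in the original data that holds the IP
      target => "dest_geoip"  # key to store the GeoIP result under
    }
  }
}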
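To make the source/target idea concrete, here is a rough Python imitation of what the two geoip blocks do to an event. This is only a sketch: `FAKE_GEO_DB`, `apply_geoip`, the sample address and its coordinates are all made up for illustration, and the real filter looks addresses up in a GeoIP database and adds more fields than shown here.

```python
# Hypothetical lookup table standing in for a real GeoIP database.
# The address is from a documentation range and the coordinates are invented.
FAKE_GEO_DB = {
    "203.0.113.7": {"country_name": "Exampleland", "lat": 37.5, "lon": 127.0},
}


def apply_geoip(event, source, target, db=FAKE_GEO_DB):
    """Return a copy of the event with geo data under `target` if `source` is known."""
    enriched = dict(event)
    hit = db.get(event.get(source))
    if hit is not None:
        enriched[target] = {
            "ip": event[source],
            "country_name": hit["country_name"],
            # Object form ({"lat": ..., "lon": ...}) that a geo_point field accepts.
            "location": {"lat": hit["lat"], "lon": hit["lon"]},
        }
    return enriched


event = {"src_ip": "203.0.113.7", "dest_ip": "10.0.0.5"}
event = apply_geoip(event, "src_ip", "src_geoip")
event = apply_geoip(event, "dest_ip", "dest_geoip")
print(event)
```

In this sketch an address that is not in the table (the private `10.0.0.5` here) simply gets no target field, so only `src_geoip.location` ends up on the event.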

3. Configure the output

output {
  if [type] == "redis" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "suricata-%{+YYYY.MM.dd}"
      template => "/etc/logstash/template/suri_template.json"
      template_name => "suricata-1"
      template_overwrite => true
    }
    kafka {
      codec => "json"
      topic_id => "suricata"
      acks => "1"
      bootstrap_servers => "192.168.2.12:9092,192.168.2.12:9093,192.168.2.12:9094"
      id => "suricata_log"
    }
  }
}
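The `index` setting produces one index per day, and the template only applies because those daily names match its `suricata-*` pattern. A small sketch of that relationship, assuming the date part is taken from the event timestamp in UTC; `daily_index_name` is a hypothetical helper, and `fnmatch` is only an approximation of how Elasticsearch matches index patterns.

```python
from datetime import datetime, timezone
from fnmatch import fnmatch


def daily_index_name(prefix, timestamp):
    """Mimic the pattern prefix-%{+YYYY.MM.dd} for a timezone-aware timestamp."""
    utc = timestamp.astimezone(timezone.utc)
    return f"{prefix}-{utc:%Y.%m.%d}"


name = daily_index_name("suricata", datetime(2019, 5, 9, 12, 0, tzinfo=timezone.utc))
print(name)                         # suricata-2019.05.09
print(fnmatch(name, "suricata-*"))  # True
```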

Looking at the suri_template.json file:

{
  "template": "suricata-*",
  "settings": {
    "number_of_shards": 5,
    "index.refresh_interval": "10s"
  },
  "mappings": {
    "doc": {
      "properties": {
        "src_geoip": {
          "properties": {
            "location": { "type": "geo_point" }
          }
        },
        "dest_geoip": {
          "properties": {
            "location": { "type": "geo_point" }
          }
        }
      }
    }
  }
}
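Since one missing brace is enough to break a template like this, a quick check of which fields it actually types as geo_point can help. A sketch, assuming the template has been loaded into a Python dict; `geo_point_paths` is a hypothetical helper, and the dict below repeats only the mappings part of the template.

```python
def geo_point_paths(properties, prefix=""):
    """Return dotted paths of every field typed geo_point in a `properties` tree."""
    paths = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "geo_point":
            paths.append(path)
        # Object fields nest their children under another "properties" key.
        paths.extend(geo_point_paths(spec.get("properties", {}), prefix=f"{path}."))
    return paths


template = {
    "mappings": {
        "doc": {
            "properties": {
                "src_geoip": {"properties": {"location": {"type": "geo_point"}}},
                "dest_geoip": {"properties": {"location": {"type": "geo_point"}}},
            }
        }
    }
}

paths = geo_point_paths(template["mappings"]["doc"]["properties"])
print(paths)  # ['src_geoip.location', 'dest_geoip.location']
```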

To be honest, I am not fond of designs where the configuration is just jammed in (?) like this.

Fluentd is worth a look as a reference.

Ref : https://docs.fluentd.org/v0.12/articles/in_others

Other Input Plugins | Fluentd


It plays the data shipper role, much like Logstash.

AWS backs it.

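The main point, and the conclusion:

First look through the keys in the Suricata data source. Of those, the intent is to get a geo_point mapping for src_ip and dest_ip, via the src_geoip.location and dest_geoip.location fields that the geoip filter derives from them.

Then a map visualization shows how the IP sources are distributed and where.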
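For a rough idea of the kind of query behind such a map, here is a sketch of a search body that buckets documents by map cell using the `geohash_grid` aggregation. The helper name `geo_distribution_query`, the aggregation name `by_cell` and the precision value are assumptions for illustration; the field name comes from the template in this post.

```python
import json


def geo_distribution_query(field, precision=3):
    """Build a search body that counts documents per geohash cell of `field`."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "by_cell": {
                "geohash_grid": {"field": field, "precision": precision}
            }
        },
    }


query = geo_distribution_query("src_geoip.location")
print(json.dumps(query, indent=2))
```

This only works if the field is mapped as geo_point, which is exactly why the template is needed.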

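One more thing:

As for how to plot this on a map in Kibana (the K in ELK), I did similar work before with a public data API on subway boarding and alighting counts.

It is not much different from that!

The end.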
