Magazine: Amazon EKS Study

by Master Seo Nov 04. 2023

EKS Part 9-8. EKS DB - PostgreSQL - 8/18

Let's run failure tests against PostgreSQL.

<1> Preparing for the failure tests

<2> [Failure 1] Force-delete one primary pod (instance) and verify behavior

<3> [Failure 2] Drain the node hosting the primary pod (instance) and verify behavior

<4> CloudNativePG scale & rolling update

<1> Preparing for the failure tests

# Set pod IP variables

POD1=$(kubectl get pod mycluster-1 -o jsonpath='{.status.podIP}')

POD2=$(kubectl get pod mycluster-2 -o jsonpath='{.status.podIP}')

POD3=$(kubectl get pod mycluster-3 -o jsonpath='{.status.podIP}')
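
The three assignments above assume the instances are still named mycluster-1 to mycluster-3, which stops being true once a failure test replaces a pod. A minimal sketch that lists whatever instances currently exist, assuming the cnpg.io/cluster label that the monitoring commands later in this post rely on. The function name pod_ips is made up for illustration.

```shell
# pod_ips is a hypothetical helper (not part of kubectl or the cnpg plugin).
# It prints one "<pod-name> <pod-ip>" line per instance of the given cluster.
# The cluster name defaults to "mycluster" when no argument is given.
pod_ips() {
  kubectl get pod -l "cnpg.io/cluster=${1:-mycluster}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.podIP}{"\n"}{end}'
}
```

Running pod_ips mycluster before and after a failure test makes it easy to see which instance names and IPs changed.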

# query.sql

curl -s -O https://raw.githubusercontent.com/gasida/DOIK/main/5/query.sql

cat query.sql ;echo

CREATE DATABASE test;

\c test;

CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 TEXT NOT NULL);

INSERT INTO t1 VALUES (1, 'Luis');

The file contains statements like the ones above.

# Run the SQL file

psql -U postgres -h psql.$MyDomain -f query.sql

Or

kubectl cp query.sql myclient1:/tmp

kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -f /tmp/query.sql

# [Terminal 2] Monitoring

while true; do kubectl exec -it myclient2 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT COUNT(*) FROM t1"; date;sleep 1; done

# Verify

kubectl exec -it myclient1 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT COUNT(*) FROM t1"

count

-------

(1 row)

kubectl exec -it myclient1 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT * FROM t1"

c1 | c2

----+------

1 | Luis

(1 row)

# INSERT

psql -U postgres -h psql.$MyDomain -d test -c "INSERT INTO t1 VALUES (2, 'Luis2');"

The count shown in the monitoring terminal goes up.

psql -U postgres -h psql.$MyDomain -d test -c "SELECT * FROM t1"

c1 | c2

----+-------

1 | Luis

2 | Luis2

(2 rows)

Or

kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "INSERT INTO t1 VALUES (2, 'Luis2');"

kubectl exec -it myclient1 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT * FROM t1"

# INSERT 98 rows (c1 = 3 through 100) into the test database

#for ((i=3; i<=100; i++)); do psql -U postgres -h $POD1 -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done

for ((i=3; i<=100; i++)); do kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done

kubectl exec -it myclient1 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT COUNT(*) FROM t1"
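
The loop above starts a new kubectl exec and a new psql session for every row. For this preparation step, where only the resulting data matters, a sketch like the following generates the same statements and lets a single psql session read them from standard input. gen_inserts is a hypothetical helper name; the table and value pattern are copied from the loop above.

```shell
# gen_inserts FIRST LAST
# Prints one INSERT statement per integer from FIRST to LAST (inclusive),
# using the same (id, 'Luis<id>') pattern as the loop in this post.
gen_inserts() {
  i="$1"
  while [ "$i" -le "$2" ]; do
    printf "INSERT INTO t1 VALUES (%d, 'Luis%d');\n" "$i" "$i"
    i=$((i + 1))
  done
}

# Example (rows 3..100, i.e. 98 statements), fed to one psql session:
#   gen_inserts 3 100 | kubectl exec -i myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -d test
```

The failure tests that follow keep one connection per row on purpose, because every new connection goes back through the rw service and therefore shows when the primary has moved.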

<2> [Failure 1] Force-delete one primary pod (instance) and verify behavior

Pod deletion test

# Check which pod is the primary

kubectl cnpg status mycluster

# [Terminal 1] Monitoring

watch kubectl get pod -l cnpg.io/cluster=mycluster

NAME READY STATUS RESTARTS AGE

mycluster-1 1/1 Running 0 92m

mycluster-2 1/1 Running 0 91m

mycluster-3 1/1 Running 0 90m

# [Terminal 2] Monitoring

while true; do kubectl exec -it myclient2 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT COUNT(*) FROM t1"; date;sleep 1; done

# [Terminal 3] INSERT a large amount of data into the test database

Because the connection goes through the rw service, the writes land on the primary.

for ((i=301; i<=10000; i++)); do kubectl exec -it myclient2 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done

The row count goes up.

for ((i=10001; i<=20000; i++)); do kubectl exec -it myclient2 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done

Or

for ((i=301; i<=10000; i++)); do psql -U postgres -h psql.$MyDomain -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done

for ((i=10001; i<=20000; i++)); do psql -U postgres -h psql.$MyDomain -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done

# [Terminal 4] Delete the pod >> Do the INSERTs get interrupted partway through?

kubectl get pod -l cnpg.io/cluster=mycluster -owide

kubectl delete pvc/mycluster-1 pod/mycluster-1

kubectl cnpg status mycluster
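
To answer the question of whether the INSERTs get interrupted with a number rather than by eye, the insert loop can tally successes and failures. This is only a sketch: run_sql and insert_and_count are hypothetical helper names, and it assumes that psql (and therefore kubectl exec) exits non-zero when a statement or the connection fails. The -it flags are left out because the output is redirected.

```shell
# run_sql SQL
# Hypothetical wrapper around the psql call used in this post (rw service).
run_sql() {
  kubectl exec myclient2 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "$1"
}

# insert_and_count FIRST LAST
# Inserts rows FIRST..LAST one connection at a time and prints how many
# statements succeeded and how many failed.
insert_and_count() {
  ok=0
  failed=0
  i="$1"
  while [ "$i" -le "$2" ]; do
    if run_sql "INSERT INTO t1 VALUES ($i, 'Luis$i');" >/dev/null 2>&1; then
      ok=$((ok + 1))
    else
      failed=$((failed + 1))
    fi
    i=$((i + 1))
  done
  echo "ok=$ok failed=$failed"
}
```

Starting insert_and_count 301 10000 in terminal 3 and then deleting the primary in terminal 4 gives a rough count of how many writes were lost during the failover.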

# Check pod information: compare pod names and node placement with the earlier output

kubectl get pod -l cnpg.io/cluster=mycluster -owide

kubectl get pod -l cnpg.io/cluster=mycluster

NAME READY STATUS RESTARTS AGE

mycluster-2 1/1 Running 0 125m

mycluster-3 1/1 Running 0 125m

mycluster-4 1/1 Running 0 2m18s

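<3> [Failure 2] Drain the node hosting the primary pod (instance) and verify behavior

Let's look at a node failure.

# (Optional) Check the operator logs - the events describing the problem show up here.

Follow the logs from another terminal.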

kubectl get pod -n operators -l app.kubernetes.io/name=cloudnative-pg

kubectl logs -n operators -l app.kubernetes.io/name=cloudnative-pg -f

# Drain a worker node

# kubectl drain <node> --ignore-daemonsets --delete-emptydir-data

kubectl get node
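
Since this test targets the node that hosts the primary, the node name can also be looked up instead of typed by hand. A minimal sketch, assuming the Cluster resource exposes the current primary under .status.currentPrimary; primary_node is a hypothetical helper name.

```shell
# primary_node [CLUSTER]
# Prints the name of the node on which the current primary instance runs.
# Assumption: the Cluster status carries the primary pod name in
# .status.currentPrimary. The cluster name defaults to "mycluster".
primary_node() {
  primary=$(kubectl get cluster "${1:-mycluster}" -o jsonpath='{.status.currentPrimary}')
  kubectl get pod "$primary" -o jsonpath='{.spec.nodeName}'
}

# Example:
#   NODE=$(primary_node mycluster)
```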

NODE=<name of your own EC2 node>

For example

NODE=ip-192-168-3-231.ap-northeast-2.compute.internal

or

NODE=ip-192-168-2-233.ap-northeast-2.compute.internal

kubectl drain $NODE --delete-emptydir-data --force --ignore-daemonsets && kubectl get node -w

# Check cluster information: what is the main reason the pod stays Pending? >> For example, what would happen if there were a healthy worker node in the same AZ?

kubectl get pod -owide

kubectl cnpg status mycluster

// Only 2 instances are currently running, so the status reports that it is waiting for 1 more.

Namespace: default

System ID: 7301969990465462290

PostgreSQL Image: ghcr.io/cloudnative-pg/postgresql:15.3

Primary instance: mycluster-3

Primary start time: 2023-11-16 09:50:04 +0000 UTC (uptime 52s)

Status: Creating a new replica Creating replica mycluster-4-join

Instances: 3

Ready instances: 2

Current Write LSN: 0/F006B20 (Timeline: 2 - WAL File: 00000002000000000000000F)

# After verifying the behavior, uncordon the node

kubectl uncordon $NODE

After recovery

kubectl cnpg status mycluster

Status: Cluster in healthy state

Instances: 3

Ready instances: 3

Current Write LSN: 0/11007C70 (Timeline: 2 - WAL File: 000000020000000000000011)
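
The "Instances: 3 / Ready instances: 3" comparison can also be scripted, which is handy when waiting for the cluster to recover after uncordon. A sketch under the assumption that the Cluster resource exposes the desired count in .spec.instances and the ready count in .status.readyInstances; cluster_ready is a hypothetical helper name.

```shell
# cluster_ready CLUSTER
# Succeeds (exit status 0) only when the number of ready instances equals
# the number of instances requested in the Cluster spec.
# Assumption: the ready count is published in .status.readyInstances.
cluster_ready() {
  want=$(kubectl get cluster "$1" -o jsonpath='{.spec.instances}')
  ready=$(kubectl get cluster "$1" -o jsonpath='{.status.readyInstances}')
  [ -n "$want" ] && [ "$want" = "$ready" ]
}

# Example: block until the cluster is back to full strength
#   until cluster_ready mycluster; do sleep 5; done
```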

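Expanding pod volumes (online)

Storage can be expanded while the cluster stays online.

# Monitoring

watch kubectl get pod,pvc

# Grow the PVCs from 3G to 5G: a volume cannot be shrunk after it has been grown > confirm that the AWS EBS volumes grew

Check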

kubectl df-pv

kubectl patch cluster mycluster --type=merge -p '{"spec":{"storage":{"size":"5Gi"}}}'

kubectl describe cluster mycluster
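
While the resize is in progress, the requested size and the size actually provisioned can differ for a short time. A small sketch that prints both side by side for every PVC in the current namespace, using kubectl's custom-columns output; pvc_sizes is a hypothetical helper name.

```shell
# pvc_sizes
# Lists each PVC in the current namespace with the size requested in its spec
# and the capacity currently reported in its status.
pvc_sizes() {
  kubectl get pvc \
    -o custom-columns='NAME:.metadata.name,REQUESTED:.spec.resources.requests.storage,CAPACITY:.status.capacity.storage'
}
```

Once REQUESTED and CAPACITY both show 5Gi for every instance, the expansion has finished on the Kubernetes side.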

The primary role can be moved to a different instance.

Change the primary pod: kubectl cnpg promote mycluster <Name>

Check the current state

kubectl cnpg status mycluster

# Monitoring

watch -d kubectl cnpg status mycluster

# Change the primary pod

kubectl cnpg promote mycluster mycluster-4

# Verify: connect through the rw service and check which server answers

kubectl cnpg status mycluster

kubectl exec -it myclient3 -- psql -U postgres -h mycluster-rw -p 5432 -c "select inet_server_addr();"

------------------

192.168.1.212

(1 row)

https://www.enterprisedb.com/blog/how-cloudnativepg-manages-replication-slots

How CloudNativePG manages physical replication slots for PostgreSQL in Kubernetes


<4> CloudNativePG scale & rolling update

Scale test

# Check the current state

kubectl cnpg status mycluster

kubectl get cluster mycluster

NAME AGE INSTANCES READY STATUS PRIMARY

mycluster 167m 3 3 Cluster in healthy state mycluster-6

# Monitoring

watch kubectl get pod

# Scale up to 5 instances: once the new instances have joined, run the connection check below

kubectl patch cluster mycluster --type=merge -p '{"spec":{"instances":5}}' && kubectl get pod -l postgresql=mycluster -w

kubectl get cluster mycluster

kubectl cnpg status mycluster

# Check connections through the "any" service (mycluster-r)

for i in {1..30}; do kubectl exec -it myclient2 -- psql -U postgres -h mycluster-r -p 5432 -c "select inet_server_addr();"; done | sort | uniq -c | sort -nr | grep 192
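
The trailing grep 192 in the pipeline above only works because the pod IPs in this lab start with 192. A slightly more general sketch extracts anything that looks like an IPv4 address before counting; tally_addrs is a hypothetical helper name.

```shell
# tally_addrs
# Reads psql output on standard input, keeps only IPv4-looking strings,
# and prints "<count> <address>" lines sorted by count, highest first.
tally_addrs() {
  grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort | uniq -c | sort -nr
}

# Example: 30 connections through the "any" service
#   for i in $(seq 1 30); do
#     kubectl exec myclient2 -- psql -U postgres -h mycluster-r -p 5432 -c "select inet_server_addr();"
#   done | tally_addrs
```

With 5 instances, up to five different addresses should appear, since the -r service is meant to reach any instance.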

# Scale back down to 3 instances

kubectl patch cluster mycluster --type=merge -p '{"spec":{"instances":3}}' && kubectl get pod -l postgresql=mycluster -w

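Rolling update

The update starts with the standbys → a switchover is performed before the primary is updated (which option controls this still needs to be checked), which minimizes downtime - see Upgrades / Rolling Updates in the CloudNativePG documentation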

kubectl cnpg status mycluster |grep Image

(11-15-access@myeks:default) [root@myeks-bastion-EC2 ~]# kubectl cnpg status mycluster |grep Image

PostgreSQL Image: ghcr.io/cloudnative-pg/postgresql:15.3

Let's change 15.3 to 15.4.

# [Terminal 1] Monitoring

watch kubectl get pod -l cnpg.io/cluster=mycluster

# [Terminal 2] Monitoring

while true; do kubectl exec -it myclient2 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT COUNT(*) FROM t1"; date;sleep 1; done

# [Terminal 3] INSERT a large amount of data into the test database

for ((i=10000; i<=20000; i++)); do kubectl exec -it myclient3 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done

# [Terminal 4] Update postgresql:15.3 → postgresql:15.4 >> observe the order and the procedure

kubectl cnpg status mycluster # Check the primary pod and the image version

kubectl patch cluster mycluster --type=merge -p '{"spec":{"imageName":"ghcr.io/cloudnative-pg/postgresql:15.4"}}' && kubectl get pod -l postgresql=mycluster -w
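
To observe the order in which the instances pick up the new image, it helps to print the image of each pod rather than just its status. A sketch that assumes the PostgreSQL container is the first container in each instance pod and that the pods carry the cnpg.io/cluster label; pod_images is a hypothetical helper name.

```shell
# pod_images [CLUSTER]
# Prints "<pod-name> <image>" for each instance of the cluster.
# Assumption: the PostgreSQL container is containers[0] in the pod spec.
pod_images() {
  kubectl get pod -l "cnpg.io/cluster=${1:-mycluster}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[0].image}{"\n"}{end}'
}

# Example: poll while the rolling update runs
#   while true; do pod_images mycluster; echo; sleep 2; done
```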

# Verify

kubectl get cluster mycluster

kubectl cnpg status mycluster | grep Image

(11-15-access@myeks:default) [root@myeks-bastion-EC2 ~]# kubectl cnpg status mycluster | grep Image

PostgreSQL Image: ghcr.io/cloudnative-pg/postgresql:15.4

Cleanup

kubectl delete cluster mycluster && kubectl delete pod --all

https://www.enterprisedb.com/blog/current-state-major-postgresql-upgrades-cloudnativepg-kubernetes

The Current State of Major PostgreSQL Upgrades with CloudNativePG


This write-up is based on material from the weekend CloudNet study group.

https://gasidaseo.notion.site/gasidaseo/CloudNet-Blog-c9dfa44a27ff431dafdd2edacc8a1863

Next:

https://brunch.co.kr/@topasvga/3509

9. EKS DB -PgBouncer 3/3
