
 

 

Karpenter is an open-source project led by AWS that gives Kubernetes a level of node scaling well beyond what AWS Auto Scaling Groups (ASG) offer. If you are not sure what Karpenter is, check here!

 

With Karpenter you can provision not only CPU instances but also GPU instances very flexibly. The more I use Karpenter, the more I catch myself thinking "wow, this is really great." It seems like a very useful project.

 

For CPU instances, installing Karpenter is all it takes for both provisioning and deprovisioning to work. For GPU instances, however, provisioning works fine but deprovisioning does not.

 

At first I assumed the GPU-related features simply had not been built yet, since Karpenter has not reached a major (1.0) release... but I was mistaken.

 

So let's find out how to get Karpenter to deprovision GPU instances!

 

Let's do the code~!

 

 

 

Checking the issues on GitHub!


There must be at least one other person in the world in the same situation as me, and they have probably already opened an issue on the Karpenter repo. So let's look for it!

(The Karpenter repo is here!)

 

At the time of writing, v0.28.0 is the latest version.

 

I searched with the keyword gpu, and luckily there are quite a few related issues!

 

 

Among them, let's look at the issue "Karpenter not deprovisioning and deleting customized gpu node #3862".

 

Aha! It says to install a device plugin in Kubernetes.

 

Come to think of it, I remember seeing something about this in the Kubernetes documentation.

Kubernetes includes stable support for managing AMD and NVIDIA GPUs (graphical processing units) across different nodes in your cluster, using device plugins.

 

The Karpenter documentation also contains the following:

If you are provisioning GPU nodes, you need to deploy an appropriate GPU device plugin daemonset for those nodes.
Without the daemonset running, Karpenter will not see those nodes as initialized.

 

Now we know the cause. I was using NVIDIA GPUs without having installed the NVIDIA device plugin, so Karpenter never recognized the GPU nodes as initialized, and that is why they were not being deprovisioned.
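To see this state on a live cluster, something like the following may help. It is only a sketch: the two function names are made up for this post, and it assumes Karpenter marks initialized nodes with the `karpenter.sh/initialized` label and that the device plugin advertises GPUs as the `nvidia.com/gpu` resource.

```shell
# Hypothetical helper names; both only wrap standard kubectl queries.

# List nodes with Karpenter's "initialized" label as an extra column.
# A GPU node with no device plugin is expected to show an empty value here.
karpenter_initialized() {
  kubectl get nodes -L karpenter.sh/initialized
}

# List how many GPUs each node reports as allocatable.
# The column shows <none> until a device plugin registers nvidia.com/gpu.
gpu_allocatable() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

Paste the functions into a shell that already has access to the cluster, then run `karpenter_initialized` and `gpu_allocatable` before and after installing the device plugin to compare.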

 

So let's install the NVIDIA device plugin and test it!

 

 

Installing NVIDIA / k8s-device-plugin


The installation steps are well documented here.

 

Add the Helm repo and update it.

$ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
$ helm repo update

 

Install the plugin with the Helm CLI.

helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --create-namespace \
  --version 0.14.0

 

 

Because the device plugin is installed as a DaemonSet, it also tries to run on CPU instances that have no GPU, and of course every one of those pods ends up in Error. So use NodeAffinity together with Taints and Tolerations to make sure it is deployed only to the GPU nodes you want.

 

Download values.yaml, customize it, and pass the customized file to the Helm install command (for example with -f values.yaml).

 

I added the following options.

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions: # all expressions within one term must match (AND)
          - key: "node.kubernetes.io/instance-type"
            operator: In
            values: [ "g4dn.xlarge" ]
          - key: "karpenter.k8s.aws/instance-family"
            operator: In
            values: ["g4dn"]
          - key: "topology.kubernetes.io/zone"
            operator: In
            values: [ "ap-northeast-2a", "ap-northeast-2b", "ap-northeast-2c", "ap-northeast-2d" ]
          - key: "karpenter.sh/capacity-type"
            operator: In
            values: [ "on-demand"]
          - key: "kubernetes.io/arch"
            operator: In
            values: [ "amd64" ]
tolerations:
  - key: kubernetes.io/arch
    value: "amd64"
    effect: NoSchedule
  - key: nvidia.com/gpu
    value: "true"
    effect: NoSchedule

 

 

Next, deploy karpenter-provisioner.yaml as well, and create a GPU-based pod.

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: gpu-instance-aws-node-template
spec:
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 40Gi
        volumeType: gp2
        encrypted: false
        
  securityGroupSelector:
    Name: "eks-cluster-sg"
    kubernetes.io/cluster/csg-sd-dev-eks: owned
    
  subnetSelector:
    karpenter.sh/discovery: eks
    
  tags:
    karpenter.sh/discovery: eks
---
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: gpu-instance-karpenter-provisioner
spec:
  providerRef:
    name: gpu-instance-aws-node-template

  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: [ "g4dn.xlarge" ]
    - key: "karpenter.k8s.aws/instance-family"
      operator: In
      values: ["g4dn"]
    - key: "topology.kubernetes.io/zone"
      operator: In
      values: [ "ap-northeast-2a", "ap-northeast-2b", "ap-northeast-2c", "ap-northeast-2d" ]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: [ "on-demand"]
    - key: "kubernetes.io/arch"
      operator: In
      values: [ "amd64" ]

  limits:
    resources:
      cpu: "100"
      memory: 100Gi
      nvidia.com/gpu: 100

  consolidation:
    enabled: true

  # labels applied to every node launched by this Provisioner
  labels:
    provision: karpenter

  # taints applied to every node launched by this Provisioner
  taints:
    - key: kubernetes.io/arch
      value: "amd64"
      effect: NoSchedule
    - key: nvidia.com/gpu
      value: "true"
      effect: NoSchedule
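The GPU pod itself is not shown in this post, so here is a minimal sketch of what one could look like. The file name, pod name and container image are my own assumptions; the parts that matter are the `nvidia.com/gpu` limit and the two tolerations, which mirror the taints in the Provisioner above.

```shell
# Write a throwaway GPU pod manifest (file name, pod name and image are illustrative).
cat > gpu-test-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                 # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # assumed tag; any CUDA image should do
      command: ["sh", "-c", "nvidia-smi; sleep 3600"]
      resources:
        limits:
          nvidia.com/gpu: 1      # requesting a GPU is what calls for a GPU node
  tolerations:                   # one toleration per taint set by the Provisioner
    - key: kubernetes.io/arch
      operator: Equal
      value: "amd64"
      effect: NoSchedule
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
EOF
```

Apply it with `kubectl apply -f gpu-test-pod.yaml`. Deleting it later with `kubectl delete -f gpu-test-pod.yaml` leaves the GPU node empty, and `kubectl get nodes -l provision=karpenter --watch` lets you follow the node going away.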

 

After that, you can see the newly created GPU instance in EC2.

 

When you delete the Pod, Karpenter determines that nothing is using the GPU instance any more and deprovisions it!

 

Today we looked at why GPU instances provisioned by Karpenter were not being deprovisioned.

 

That's all for today!
