mojozoox

Finding the right mojo for everything.

Kubernetes Resource Requests and Limits

  • When the scheduler tries to schedule a pod, Kubernetes checks the pod's resource requirements and places it on a node that has sufficient resources
  • By default a container requests 0.5 CPU and 256 Mi of RAM to get scheduled (these defaults apply only when a LimitRange in the namespace provides them); this can be modified by adding a resources section under the container spec in the pod YAML definition
...
spec:
  containers:
  - name: app          # placeholder container name
    ...
    resources:
      requests:
        memory: "1Gi"
        cpu: 1
  • 1 CPU = 1000m (millicores) = 1 vCPU = 1 AWS vCPU = 1 GCP core = 1 Azure core = 1 hyperthread
  • It can be as low as 0.1, which is 100m
  • For memory: 1 K (kilobyte) = 1,000 bytes; 1 M (megabyte) = 1,000,000 bytes; 1 G (gigabyte) = 1,000,000,000 bytes; 1 Ki (kibibyte) = 1,024 bytes; 1 Mi (mebibyte) = 1,048,576 bytes; ...
  • While a container is running its resource usage can grow, so by default Kubernetes caps containers at 1 vCPU and 512 Mi (again, only when a LimitRange supplies these defaults); this can also be changed by adding a limits section under the resources section
...
spec:
  containers:
  - name: app
    ...
    resources:
      ...
      limits:
        memory: "2Gi"
        cpu: 2
  • If a container tries to use more CPU than its limit, it is throttled; if it exceeds its memory limit, the container is terminated (OOMKilled). A complete example manifest combining requests and limits is shown below
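
Putting the two together, here is a minimal example pod manifest (the pod name, container name, and image are placeholders) with both requests and limits set on the container:

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo              # placeholder pod name
spec:
  containers:
  - name: app                      # placeholder container name
    image: nginx                   # placeholder image
    resources:
      requests:
        memory: "1Gi"
        cpu: 1
      limits:
        memory: "2Gi"
        cpu: 2

Apply it with kubectl apply -f pod.yaml and check the assigned values with kubectl describe pod resource-demo.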

Terraform Template to Create SQS Private Endpoint

data "aws_vpc_endpoint_service" "sqs" {
  service      = "sqs"

  filter {
    name   = "service-type"
    values = ["Interface"]
  }
}

data "aws_vpc" "selected" {
  id = "vpc-change-me"
}

resource "aws_security_group" "sqs_ep" {
  name                   = "sqs-ep"
  vpc_id                 = "vpc-change-me"
  revoke_rules_on_delete = true

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # I use the cloudposse/terraform-null-label module (https://github.com/cloudposse/terraform-null-label) to create my tags
  tags = module.label.tags
}

resource "aws_security_group_rule" "sqs_ep" {
  description              = "Allow HTTPS traffic from the VPC"
  from_port                = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.sqs_ep.id
  cidr_blocks              = [data.aws_vpc.selected.cidr_block]
  to_port                  = 443
  type                     = "ingress"
}

resource "aws_vpc_endpoint" "sqs_ep" {
  vpc_id            = "vpc-change-me"
  service_name      = data.aws_vpc_endpoint_service.sqs.service_name
  vpc_endpoint_type = "Interface"
  auto_accept       = null

  security_group_ids  = [aws_security_group.sqs_ep.id]
  subnet_ids          = module.dynamic_subnets.private_subnet_ids
  policy              = null
  private_dns_enabled = true

  tags = module.label.tags
}
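
To sanity-check the endpoint after terraform apply, from an instance inside the VPC you can verify that the SQS hostname resolves to private IPs (courtesy of private_dns_enabled) and that API calls still work. The region us-east-1 below is an assumption; substitute your own:

# Should return private IPs of the endpoint ENIs (assumes region us-east-1)
$ dig +short sqs.us-east-1.amazonaws.com

# API call now goes over the interface endpoint (assumes region us-east-1)
$ aws sqs list-queues --region us-east-1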

Quarantine an EKS Node with Docker Containers

While performing these steps, please exercise caution and have the right dashboards in place to surface any issues.

Step 0. Annotate the node in question, ip-10-102-11-188.ec2.internal, so that the cluster autoscaler does not scale it down.

$ kubectl annotate node ip-10-102-11-188.ec2.internal cluster-autoscaler.kubernetes.io/scale-down-disabled=true
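
To confirm the annotation took effect, a quick check:

$ kubectl describe node ip-10-102-11-188.ec2.internal | grep scale-down-disabled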

Step 1. Cordon the affected node.

$ kubectl cordon ip-10-102-11-188.ec2.internal
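
The node should now report SchedulingDisabled in its STATUS column:

$ kubectl get node ip-10-102-11-188.ec2.internal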

Step 2. Get the list of app pods running on that instance

$ kubectl get pods -n podinfo --field-selector spec.nodeName=ip-10-102-11-188.ec2.internal --show-labels
NAME                      READY   STATUS    RESTARTS   AGE   LABELS
podinfo-ffb8d6b8d-bbs5r   1/1     Running   0          48m   app=podinfo,pod-template-hash=ffb8d6b8d
podinfo-ffb8d6b8d-jfs9b   1/1     Running   0          48m   app=podinfo,pod-template-hash=ffb8d6b8d
podinfo-ffb8d6b8d-mpskj   1/1     Running   0          48m   app=podinfo,pod-template-hash=ffb8d6b8d

Step 3. Label the pods for quarantine

$ kubectl label pod podinfo-ffb8d6b8d-bbs5r -n podinfo app=quarantine --overwrite
pod/podinfo-ffb8d6b8d-bbs5r labeled

After you’ve changed the label, you will notice that the ReplicaSet creates a new replacement pod, while the pod named podinfo-ffb8d6b8d-bbs5r stays around because it no longer matches the ReplicaSet's selector.

NOTE: Do this for every pod you found in Step 2.
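
If there are many pods, a bulk version of the same labeling (a sketch, assuming the same namespace and node name as above) saves some typing:

$ kubectl get pods -n podinfo --field-selector spec.nodeName=ip-10-102-11-188.ec2.internal -o name \
    | xargs -n1 -I{} kubectl label {} -n podinfo app=quarantine --overwrite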

Step 4. Seek out any remaining Deployment-managed pods on that node that need to be evicted.

$ kubectl get pods --all-namespaces --field-selector spec.nodeName=ip-10-102-11-188.ec2.internal
NAMESPACE           NAME                                              READY   STATUS             RESTARTS   AGE
kube-system         aws-node-b5qb8                                    1/1     Running            0          82d
kube-system         coredns-59dfd6b59f-hn5sk                          1/1     Running            0          56m
kube-system         debug-agent-z2r2x                                 1/1     Running            0          82d
kube-system         kube-proxy-g4sr7                                  1/1     Running            0          82d
podinfo             podinfo-ffb8d6b8d-bbs5r                           1/1     Running            0          50m
podinfo             podinfo-ffb8d6b8d-jfs9b                           1/1     Running            0          50m
podinfo             podinfo-ffb8d6b8d-mpskj                           1/1     Running            0          50m

For example, in the above list the coredns-59dfd6b59f-hn5sk pod belongs to the coredns Deployment; evict it.

$ kubectl delete pod -n kube-system coredns-59dfd6b59f-hn5sk
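
You can then confirm that the replacement pod landed on a different node; the k8s-app=kube-dns label used below is the one EKS applies to the coredns Deployment, so adjust it if yours differs:

$ kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide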

Step 5. SSH into the node ip-10-102-11-188.ec2.internal to stop and disable kubelet

$ ssh ec2-user@ip-10-102-11-188.ec2.internal
Last login: Tue Dec 21 09:27:28 2021 from ip-10-102-11-188.ec2.internal

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
14 package(s) needed for security, out of 44 available
Run "sudo yum update" to apply all updates.
[ec2-user@ip-10-102-11-188 ~]$ sudo systemctl stop kubelet
[ec2-user@ip-10-102-11-188 ~]$ sudo systemctl disable kubelet
Removed symlink /etc/systemd/system/multi-user.target.wants/kubelet.service.

The above operation should send the node into a NotReady state.
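
If you want to watch the transition (it can take up to the node monitor grace period, typically well under a minute with default settings):

$ kubectl get node ip-10-102-11-188.ec2.internal -w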

Step 6. Once the node ip-10-102-11-188.ec2.internal goes into the NotReady state, delete the node.

$ kubectl get nodes
NAME                               STATUS                        ROLES    AGE    VERSION
ip-10-102-11-188.ec2.internal      NotReady,SchedulingDisabled   <none>   154m   v1.17.17-eks-ac51f2
$ kubectl delete node ip-10-102-11-188.ec2.internal
node "ip-10-102-11-188.ec2.internal" deleted

This ensures the node is removed and is no longer part of the K8s cluster. The above steps leave the instance running with all its Docker containers intact, but no longer attached to the K8s cluster. This ends our kubectl work on the terminal; the remaining steps are performed on the node itself and on the AWS Management Console.

Step 7. At this point, back on the node, you should pause all the Docker containers. But before that, you might want to check for any spurious connections from the running containers.

# docker ps -q | xargs -n1 -I{} -P0 bash -c "docker inspect -f '{{.State.Pid}}' {}; exit 0;" | xargs -n1 -I{} -P0 bash -c 'nsenter -t {} -n ss -p -a -t4 state established; echo; exit 0'

And then pause the running containers.

$ docker ps -q | xargs -n1 -I{} -P0 bash -c 'docker pause {}; exit 0'
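
And verify that every container on the node is now paused:

$ docker ps --filter status=paused --format '{{.Names}}: {{.Status}}'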

Step 8. Head over to the EC2 > ASG dashboard on the AWS Management Console, and detach the instance from the ASG.
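
If you prefer the CLI over the console, the detach looks roughly like this (the instance ID and ASG name are placeholders); use --no-should-decrement-desired-capacity instead if you want the ASG to launch a replacement node:

$ aws autoscaling detach-instances \
    --instance-ids i-0123456789abcdef0 \
    --auto-scaling-group-name my-asg-name \
    --should-decrement-desired-capacity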

Step 9. Go to the instance details dashboard, remove the existing security groups, and attach the QUARANTINE security group.
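
The CLI equivalent (the instance ID and quarantine security group ID are placeholders; note that --groups replaces the entire set of security groups on the instance's primary network interface):

$ aws ec2 modify-instance-attribute \
    --instance-id i-0123456789abcdef0 \
    --groups sg-0123456789abcdef0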