# DevOps/Kubernetes Cheatsheet

DevOps is the combination of cultural philosophies, practices, and tools in software engineering that encourages collaboration between traditionally siloed development and IT operations teams. This is my personal cheat sheet for various DevOps-related tools. If some context is missing here or there, don’t hesitate to reach out to me on Twitter.
## Kubernetes

If you’re working with kubectl, the first thing you want to do is set up autocompletion.
### Context & Multiple Clusters

```bash
# Get the kubeconfig from the cloud provider (Exoscale SKS in this example)
exo sks kubeconfig $CLUSTER_NAME kube-admin --zone ch-dk-2 --group system:masters >> ~/.kube/$CONFIG_NAME

# Create a context in that kubeconfig
kubectl config --kubeconfig=~/.kube/$CONFIG_NAME set-context $CONTEXT_NAME

# View the new cluster and user entries
kubectl config view
# Edit the names to differentiate between clusters
nano ~/.kube/$CONFIG_NAME

# Create a new context
kubectl config set-context $CONTEXT_NAME --cluster=$CLUSTER_NAME --user=$USER_NAME --namespace=default

# Use the context
kubectl config use-context $CONTEXT_NAME
```
Select pods by label in all namespaces:

```bash
kubectl get pods -l app=longhorn-manager -A
```
### Secrets

Decode a secret:

```bash
# use base64 -D on older macOS versions
kubectl get secrets/argo-postgres-config -n argo --template='{{.data.password}}' | base64 -d
```
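Secret data is base64-encoded, not encrypted, so the decode step is plain base64. A quick local round trip (with a made-up value) shows what is happening:

```bash
# Hypothetical plaintext, just for illustration
encoded=$(printf 'hunter2' | base64)
echo "$encoded"                      # → aHVudGVyMg== (the form stored in the Secret's .data)
printf '%s' "$encoded" | base64 -d   # → hunter2
```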
### Pods

Delete all pods whose name starts with $STRING. Avoid this pattern if you can, and select by label instead.

```bash
for pod in $(kubectl get po -n $NAMESPACE | grep "$STRING" | awk '{print $1}'); do kubectl delete pod/$pod -n $NAMESPACE; done;
```
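The `grep | awk` part of that loop just extracts the first column (the pod name) from kubectl's table output. Applied to a fabricated sample of `kubectl get po` output:

```bash
# Fabricated `kubectl get po` output, for illustration only
sample='NAME                READY   STATUS    RESTARTS   AGE
argo-worker-abc12   1/1     Running   0          2d
argo-worker-def34   1/1     Running   0          2d
postgres-0          1/1     Running   0          9d'

# Same pipeline as in the loop: filter rows, keep the first column
printf '%s\n' "$sample" | grep "argo-worker" | awk '{print $1}'
# → argo-worker-abc12
#   argo-worker-def34
```

The label-based alternative needs no text munging at all, e.g. `kubectl delete pods -l app=argo-worker -n $NAMESPACE` (the label name here is a made-up example).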
### Deployments

Restart all deployments in $NAMESPACE:

```bash
for deployment in $(kubectl get deployment --namespace $NAMESPACE -o jsonpath='{.items[*].metadata.name}'); do
  kubectl rollout restart deployment $deployment --namespace $NAMESPACE
done
```
Restart a single deployment:

```bash
kubectl rollout restart deployment $DEPLOYMENT --namespace $NAMESPACE
```
### DevSpace ImagePullBackOff

I recently had the problem that my DevSpace Helm deployments could not pull images from a private registry. Since DevSpace uses the default ServiceAccount, it is possible to patch that account and set a default imagePullSecret:

```bash
kubectl patch sa default -n $NAMESPACE -p "{\"imagePullSecrets\": [{\"name\": \"$SECRET_NAME\"}]}"
```
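One gotcha with the patch: inside single quotes `$SECRET_NAME` is not expanded by the shell, and the payload must be a complete JSON object. A quick way to sanity-check the expanded payload before handing it to kubectl (the secret name here is a made-up example):

```bash
SECRET_NAME=regcred   # hypothetical secret name
payload="{\"imagePullSecrets\": [{\"name\": \"$SECRET_NAME\"}]}"
# python's json.tool exits non-zero on parse errors
printf '%s' "$payload" | python3 -m json.tool
```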
### Longhorn PVC/PV Debugging

Error:

```
Warning  FailedScheduling  26m (x2371 over 40h)  default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
```

Get the PVC name from the pod description:

```bash
kubectl describe pod $POD -n $NAMESPACE | grep ClaimName
```
More information about the PVC:

```bash
kubectl get pvc $PVC -n $NAMESPACE -o yaml
kubectl get pvc -n postgres
kubectl describe pvc postgres-db-postgres-cluster-1-0 -n postgres
```
PV stuck in Terminating:

```bash
kubectl get pv
kubectl patch pv pvc-f87cd151-8497-4a6e-a991-a9d1a99fd761 -p '{"metadata":{"finalizers":null}}'
kubectl describe pvc postgres-db-postgres-cluster-1-0 -n postgres
```
Error:

```
invalid json format of recurringJobSelector: invalid character '}' looking for beginning of object key string
```

The trailing comma inside the JSON object is the problem. Changed:

```yaml
recurringJobSelector: '[{"name":"backup-s3","isGroup":true,}]'
```

to

```yaml
recurringJobSelector: '[{"name":"backup-s3","isGroup":true}]'
```
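JSON, unlike JavaScript, forbids trailing commas, and any JSON parser can confirm the before/after difference:

```bash
# The original value: trailing comma → parse error
printf '%s' '[{"name":"backup-s3","isGroup":true,}]' \
  | python3 -m json.tool >/dev/null 2>&1 && echo valid || echo invalid
# → invalid

# The fixed value parses fine
printf '%s' '[{"name":"backup-s3","isGroup":true}]' \
  | python3 -m json.tool >/dev/null 2>&1 && echo valid || echo invalid
# → valid
```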
Validate with:

```bash
kubectl get cm longhorn-storageclass -o yaml -n longhorn-system
kubectl describe kubegres postgres-cluster -n postgres
```
## Linux General

### Encrypt/Decrypt String

```bash
echo 12345678901 | openssl enc -e -base64 -aes-128-ctr -nopad -nosalt -k secret_password
echo cSTzU8+UPQQwpRAq | openssl enc -d -base64 -aes-128-ctr -nopad -nosalt -k secret_password
```
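Because `-nosalt` makes the key derivation deterministic, encrypting and then decrypting with the same password round-trips cleanly (assuming the same openssl defaults on both ends; note that `-nosalt` also weakens the encryption):

```bash
# Encrypt, then feed the ciphertext straight back into decryption
cipher=$(echo 12345678901 | openssl enc -e -base64 -aes-128-ctr -nopad -nosalt -k secret_password)
echo "$cipher" | openssl enc -d -base64 -aes-128-ctr -nopad -nosalt -k secret_password
# → 12345678901
```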
## Prometheus / Grafana

### Monitoring Argo Workflows with Prometheus and Grafana
1. Deploy the Helm chart `prometheus-community/kube-prometheus-stack` into the namespace `monitoring`.
2. Add the label `monitoring` with the value `prometheus` to the Service `workflow-controller-metrics`:

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: workflow-controller
    monitoring: prometheus
  name: workflow-controller-metrics
  namespace: argo
spec:
  ports:
    - name: metrics
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    app: workflow-controller
```
3. Create a `ServiceMonitor` for the `workflow-controller-metrics` service. After applying this config, the Prometheus server should auto-discover the service. The label `release: prometheus` is mandatory.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: workflow-controller-metrics
  labels:
    release: prometheus
spec:
  endpoints:
    - path: /metrics
      port: metrics
      scheme: http
      scrapeTimeout: 30s
  jobLabel: argo-workflows
  namespaceSelector:
    matchNames:
      - argo
  selector:
    matchLabels:
      monitoring: prometheus
```
4. Add annotations to the Service to enable Prometheus data scraping. The annotation `prometheus.io/scrape: "true"` is mandatory!
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "9090"
  labels:
    app: workflow-controller
    monitoring: prometheus
  name: workflow-controller-metrics
spec:
  ports:
    - name: metrics
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    app: workflow-controller
```
5. Reload the Prometheus configuration.
```bash
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9091:9090 -n monitoring
curl -X POST http://localhost:9091/-/reload
```
6. Add the Grafana dashboard as a ConfigMap to the namespace `monitoring`. Download the dashboard [here](https://grafana.com/grafana/dashboards/13927).
Change all occurrences of `${DS_THANOS-MASTER}` to `prometheus`. This value would normally be set while importing the dashboard through the UI, but since we import the dashboard automatically, we have to change it ourselves. The label `grafana_dashboard: "1"` is crucial; without it the dashboard will not be imported.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: argo-workflows-dashboard
labels:
grafana_dashboard: "1"
data:
argo-workflows-dashboard.json: |
...
```
7. Delete the Grafana Pod. After a new Pod is created, it should automatically import the dashboard.
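The placeholder substitution from step 6 can be scripted with GNU sed. A sketch, with a throwaway one-line file standing in for the downloaded dashboard JSON (filename and file contents are made up for illustration):

```bash
# Stand-in for the downloaded dashboard JSON
printf '{"datasource": "${DS_THANOS-MASTER}"}\n' > argo-workflows-dashboard.json

# Replace every occurrence of the import-time placeholder
sed -i 's/${DS_THANOS-MASTER}/prometheus/g' argo-workflows-dashboard.json

cat argo-workflows-dashboard.json
# → {"datasource": "prometheus"}
```

On macOS/BSD sed, use `sed -i ''` instead of `sed -i`.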
### Debug Prometheus ServiceMonitor not added to configuration
Example ServiceMonitor:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: workflow-controller-metrics
  labels:
    release: prometheus
spec:
  endpoints:
    - path: /metrics
      port: metrics
      scheme: http
      scrapeTimeout: 30s
  jobLabel: argo-workflows
  namespaceSelector:
    matchNames:
      - argo
  selector:
    matchLabels:
      monitoring: prometheus
```
- Are the label selectors correct for the ServiceMonitor? Try selecting the services and pods by the labels specified in the ServiceMonitor configuration.
`kubectl get servicemonitor -l release=prometheus -A`
- Do the default ServiceMonitors have any labels you haven't added to the custom one? `kubectl get servicemonitor prometheus-kube-prometheus-apiserver -o yaml -n monitoring`
- Check the prometheus configuration in the UI. `kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9091:9090 -n monitoring`. A scrape configuration with the jobName should be generated.
- Reload the Prometheus configuration with `curl -X POST http://localhost:9091/-/reload`.
---
## HELM
---
### Using Helm in a CI/CD pipeline
Sometimes you want to upgrade an existing release and other times you want to install a new release. This can be achieved by setting the `--install` flag on the `upgrade` command. The `--install` flag will install the release if it does not exist. The `--atomic` flag will roll back the release if after 3 minutes (`--wait --timeout 3m0s`) the release is not successful. If there’s no revision to revert to, the chart deployment will be deleted.
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade --install --wait --timeout 3m0s --atomic -f values.yaml prometheus prometheus-community/kube-prometheus-stack
```
---
## GitLab
---
### GitLab CI/CD
#### Kubectl `apply` Template:
```yaml
.kubectl_deploy_template: &kubectl_template
image: google/cloud-sdk
before_script:
- kubectl config set-cluster k8s --server="$K8S_SERVER_URL"
- kubectl config set clusters.k8s.certificate-authority-data $CERTIFICATE_AUTHORITY_DATA
- kubectl config set-credentials gitlab --token="$K8S_USER_TOKEN"
- kubectl config set-context default --cluster=k8s --user=gitlab
- kubectl config use-context default
- kubectl create namespace $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -
- kubectl config set-context default --namespace=$NAMESPACE
script:
- ls -lah $DEPLOYMENT_FOLDER
- kubectl apply --recursive -f $DEPLOYMENT_FOLDER
```
Template usage:
```yaml
deploy_application:
<<: *kubectl_template
stage: deploy_application
variables:
NAMESPACE: my-namespace
DEPLOYMENT_FOLDER: dev/application/
rules:
- if: "$CI_COMMIT_BRANCH == 'dev'"
```
#### Helm `upgrade` template:
```yaml
.helm_install_template: &helm_install_template
image: dtzar/helm-kubectl
before_script:
- echo "Before script"
script:
- kubectl config set-cluster k8s --server="$K8S_SERVER_URL"
- kubectl config set clusters.k8s.certificate-authority-data $CERTIFICATE_AUTHORITY_DATA
- kubectl config set-credentials gitlab --token="$K8S_USER_TOKEN"
- kubectl config set-context default --cluster=k8s --user=gitlab
- kubectl config use-context default
- kubectl create namespace $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -
- kubectl config set-context default --namespace=$NAMESPACE
- helm repo add $CHART_NAME $HELM_REPO_URL
- helm upgrade --install --wait --timeout 3m0s --atomic $HELM_ARGS $RELEASE_NAME $HELM_REPO
- |
for deployment in $(kubectl get deployment --namespace $NAMESPACE -o jsonpath='{.items[*].metadata.name}'); do
kubectl rollout status deployment/$deployment --namespace $NAMESPACE
done
```
Template usage:
```yaml
deploy_prometheus_base:
<<: *helm_install_template
stage: deploy_prometheus_base
variables:
NAMESPACE: monitoring
RELEASE_NAME: prometheus
HELM_REPO: prometheus-community/kube-prometheus-stack
HELM_REPO_URL: https://prometheus-community.github.io/helm-charts
CHART_NAME: prometheus-community
HELM_ARGS: "-f values.yaml"
rules:
- if: "$CI_COMMIT_BRANCH == 'dev'"
```