Skip to content

DevOps & Infrastructure

Comprehensive reference covering Docker, Kubernetes, CI/CD, Terraform, cloud platforms, monitoring, SRE practices, and data center networking.

Containers & Docker

  • [[docker-fundamentals]] - containers, images, volumes, networking, lifecycle
  • [[dockerfile-and-image-building]] - Dockerfile syntax, multi-stage builds, optimization
  • [[docker-compose]] - multi-container orchestration, service discovery
  • [[docker-for-ml]] - ML-specific Docker patterns, JupyterLab, MLflow, Model Runner

Kubernetes

  • [[kubernetes-architecture]] - control plane, worker nodes, CNI, cluster deployment
  • [[kubernetes-workloads]] - Pods, Deployments, StatefulSets, DaemonSets, ConfigMaps, Secrets
  • [[kubernetes-services-and-networking]] - Services, Ingress, TLS, cert-manager, DNS
  • [[kubernetes-storage]] - PV, PVC, StorageClass, CSI drivers, cloud storage
  • [[kubernetes-resource-management]] - requests, limits, QoS, namespaces, quotas, HPA, autoscaling
  • [[kubernetes-on-aks]] - Azure AKS, ACR, Azure AD, monitoring, virtual nodes
  • [[kubernetes-on-eks]] - AWS EKS, ECR, EBS/EFS, ALB Ingress Controller

Package Management & Templating

  • [[helm-package-manager]] - charts, templates, hooks, releases, repositories, secrets

CI/CD & Automation

  • [[cicd-pipelines]] - GitHub Actions, Azure DevOps, pipeline stages, multi-environment
  • [[jenkins-automation]] - Jenkinsfile, shared libraries, Docker agents, credentials
  • [[gitops-and-argocd]] - ArgoCD, sync waves, app-of-apps, Sealed Secrets, Crossplane

Infrastructure as Code

  • [[terraform-iac]] - HCL, state, modules, workspaces, cloud providers
  • [[ansible-configuration-management]] - playbooks, roles, inventory, idempotency

Cloud Platforms

  • [[aws-cloud-fundamentals]] - IAM, VPC, EC2, S3, CloudWatch, ECR, App Runner
  • [[container-registries]] - Docker Hub, ECR, ACR, GAR, Nexus

Monitoring & Observability

  • [[monitoring-and-observability]] - Golden Signals, Prometheus, Grafana, Loki, Tempo, SLI/SLO/SLA

SRE Practices

  • [[sre-principles]] - culture, error budgets, toil, team models
  • [[sre-incident-management]] - on-call, postmortems, escalation, diagnostics
  • [[sre-automation-and-toil]] - automation maturity, tools, workflow automation
  • [[chaos-engineering-and-testing]] - chaos engineering, load testing, game days, resilience patterns

Deployment & Release

  • [[deployment-strategies]] - rolling update, blue-green, canary, feature flags, 15-factor methodology
  • [[service-mesh-istio]] - Istio, Envoy, traffic management, mTLS, observability

Architecture Patterns

  • [[microservices-patterns]] - Spring Cloud, API Gateway, service discovery, event-driven, OAuth2
  • [[datacenter-network-design]] - CLOS/leaf-spine, VXLAN, EVPN, BGP, multi-site

Foundations

  • [[devops-culture-and-sdlc]] - DevOps principles, Agile/Scrum, SDLC, tools landscape
  • [[git-version-control]] - Git workflow, branching strategies, monorepo vs multirepo
  • [[linux-server-administration]] - filesystem, processes, networking, SSH, systemd, scripting