GitOps Is Simple. GitOps at Scale Is Not.
The pitch is elegant: declare your desired state in Git, and a controller reconciles reality to match. No more kubectl apply from laptops. No more "it works on my cluster." Full audit trail, easy rollbacks, and infrastructure-as-code for everything.
We've implemented GitOps across 50+ Kubernetes clusters for clients ranging from Series A startups to Fortune 500 enterprises. Here's what we've learned about making it work at scale.
Lesson 1: Mono-Repo vs. Multi-Repo — It Depends
The first decision every team agonizes over. Here's our guidance:
Mono-Repo (Recommended for < 20 services)
gitops-config/
├── base/
│   ├── namespaces/
│   ├── network-policies/
│   └── rbac/
├── apps/
│   ├── payment-service/
│   ├── auth-service/
│   └── notification-service/
└── clusters/
    ├── prod-us-east/
    ├── prod-eu-west/
    └── staging/
Pros: Single PR for cross-cutting changes, easy to see the full picture, simpler CI/CD
Cons: Gets unwieldy past 20-30 services, blast radius of bad merges is larger
Multi-Repo (Recommended for > 20 services)
Each team owns their own config repo. A central "cluster config" repo references them.
Pros: Team autonomy, smaller blast radius, independent release cycles
Cons: Harder to make global changes, more repos to manage, need tooling for cross-repo visibility
Our Recommendation
Start mono-repo. Split when it becomes painful (usually around 20 services or 5 teams). The migration is straightforward with Kustomize or Helm.
Lesson 2: Environment Promotion Done Right
The most common GitOps anti-pattern: manually updating image tags in config files across environments.
The Right Way: Automated Image Promotion
# Use Kustomize overlays for environment-specific config

# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - hpa.yaml

# overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: deployment-patch.yaml  # strategic merge patch (patchesStrategicMerge is deprecated)
images:
  - name: payment-service
    newTag: v2.3.1-rc.1  # Automatically updated by CI

# overlays/production/kustomization.yaml
# Same structure, but with the production image tag
images:
  - name: payment-service
    newTag: v2.3.0  # Promoted from staging after validation
Promotion Pipeline
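Each arrow in the promotion pipeline below is automation rather than a person editing YAML. The first one, updating the staging config, usually comes down to rewriting a single newTag value in the staging overlay and committing the result; kustomize edit set image, run inside the overlay directory, does exactly that. The Python sketch below illustrates the same step with a hypothetical set_image_tag helper. It assumes the images: layout shown above (a "- name:" line followed by its "newTag:" line) and is not a general YAML editor.

```python
import re

# Hypothetical CI helper: rewrite the newTag of one image in a kustomization file.
# Assumes the layout used above ("- name: <image>" followed by "newTag: <tag>");
# a real pipeline would call "kustomize edit set image" or use a YAML library.
NAME_LINE = re.compile(r"\s*-\s*name:\s*(\S+)")
TAG_LINE = re.compile(r"(\s*newTag:\s*)(\S+)(.*)")


def set_image_tag(kustomization: str, image: str, new_tag: str) -> str:
    lines = kustomization.splitlines()
    in_target = False
    for i, line in enumerate(lines):
        name_match = NAME_LINE.match(line)
        if name_match:
            # A new images[] entry starts here; remember whether it is ours.
            in_target = name_match.group(1) == image
            continue
        tag_match = TAG_LINE.match(line)
        if in_target and tag_match:
            # Keep indentation and any trailing comment; swap only the tag value.
            lines[i] = f"{tag_match.group(1)}{new_tag}{tag_match.group(3)}"
            in_target = False
    return "\n".join(lines) + "\n"


staging_overlay = """\
images:
  - name: payment-service
    newTag: v2.3.1-rc.1  # Automatically updated by CI
  - name: auth-service
    newTag: v1.8.0
"""

print(set_image_tag(staging_overlay, "payment-service", "v2.3.1-rc.2"))
```

Touching only the tag value keeps the CI commit to a one-line diff, which also keeps the auto-created production PR easy to review.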
CI builds image → Updates staging config → ArgoCD syncs staging
↓
Integration tests pass
↓
PR auto-created for prod config
↓
Team approves → ArgoCD syncs prod
Lesson 3: Drift Detection Is Non-Negotiable
Someone will run kubectl edit in production. It's not a matter of if, but when.
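Drift detection boils down to comparing the desired state rendered from Git with the live object in the cluster. The Python sketch below shows the core idea with a hypothetical find_drift helper and plain dicts standing in for manifests: only fields declared in Git are compared, so values the cluster fills in itself (such as status) are not flagged. ArgoCD's actual diffing is more involved (it normalizes defaults and honors ignoreDifferences, for example), so treat this as an illustration of the concept rather than its algorithm.

```python
from typing import Any


def find_drift(desired: Any, live: Any, path: str = "") -> list[str]:
    """Return the paths where the live object differs from the desired state.

    Only fields declared in `desired` (the manifest from Git) are compared, so
    fields the cluster fills in itself, such as status, are not reported.
    """
    if isinstance(desired, dict):
        if not isinstance(live, dict):
            return [path or "/"]
        drift: list[str] = []
        for key, value in desired.items():
            child = f"{path}/{key}"
            if key not in live:
                drift.append(child)  # declared in Git but missing from the live object
            else:
                drift.extend(find_drift(value, live[key], child))
        return drift
    if isinstance(desired, list):
        if not isinstance(live, list) or len(desired) != len(live):
            return [path or "/"]
        drift = []
        for index, (want, have) in enumerate(zip(desired, live)):
            drift.extend(find_drift(want, have, f"{path}/{index}"))
        return drift
    # Scalars: any difference in value counts as drift.
    return [] if desired == live else [path or "/"]


# Desired state from Git vs. what someone left behind after "kubectl edit".
container = {"name": "payment-service", "image": "payment-service:v2.3.0"}
desired = {"spec": {"replicas": 3, "template": {"spec": {"containers": [container]}}}}
live = {
    "spec": {"replicas": 5, "template": {"spec": {"containers": [dict(container)]}}},
    "status": {"readyReplicas": 5},
}

print(find_drift(desired, live))  # ['/spec/replicas']
```

Conceptually, with selfHeal enabled, a non-empty result like this one is what prompts the controller to re-apply the manifest from Git.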
Configure ArgoCD for Self-Healing
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
spec:
  # project, source, and destination omitted for brevity
  syncPolicy:
    automated:
      prune: true     # Remove resources not in Git
      selfHeal: true  # Revert manual changes
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
Alert on Drift
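Even with self-healing, you want to know when drift happens. Self-heal usually reverts a manual change within seconds, so with the for: 5m clause this alert fires on drift that persists, which typically points to a failing or blocked sync: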
# Prometheus alert for ArgoCD drift
- alert: ArgoCDApplicationOutOfSync
  expr: argocd_app_info{sync_status="OutOfSync"} == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Application {{ $labels.name }} is out of sync"
Lesson 4: Secrets in GitOps
The hardest problem in GitOps: you can't put plaintext secrets in Git, but GitOps says everything should be in Git.
Our Recommended Approach: External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payment-db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: payment-db-credentials
  data:
    - secretKey: username
      remoteRef:
        key: secret/data/payment-service/db
        property: username
    - secretKey: password
      remoteRef:
        key: secret/data/payment-service/db
        property: password
Secrets live in Vault. ExternalSecret manifests live in Git. The operator syncs the values into Kubernetes Secrets. Best of both worlds.
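Conceptually, on each refreshInterval the operator resolves every remoteRef (key plus property) against the configured secret store and writes the values into the Secret named by target.name. The Python sketch below mimics that mapping. The FAKE_VAULT dict and the render_secret helper are hypothetical stand-ins; the real operator reaches Vault through the vault-backend ClusterSecretStore and handles authentication, ownership, and templating that this sketch ignores.

```python
import base64

# Hypothetical in-memory stand-in for Vault, keyed like the remoteRef.key values above.
FAKE_VAULT = {
    "secret/data/payment-service/db": {"username": "payments", "password": "example-only"},
}

# The ExternalSecret manifest from above, reduced to the fields used here.
external_secret = {
    "spec": {
        "target": {"name": "payment-db-credentials"},
        "data": [
            {"secretKey": "username",
             "remoteRef": {"key": "secret/data/payment-service/db", "property": "username"}},
            {"secretKey": "password",
             "remoteRef": {"key": "secret/data/payment-service/db", "property": "password"}},
        ],
    }
}


def render_secret(es: dict, store: dict) -> dict:
    """Build the Kubernetes Secret the operator would write for `es`."""
    data = {}
    for item in es["spec"]["data"]:
        ref = item["remoteRef"]
        value = store[ref["key"]][ref["property"]]
        # Kubernetes Secret .data values are base64-encoded strings.
        data[item["secretKey"]] = base64.b64encode(value.encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": es["spec"]["target"]["name"]},
        "type": "Opaque",
        "data": data,
    }


secret = render_secret(external_secret, FAKE_VAULT)
print(sorted(secret["data"]))  # ['password', 'username']
```

The sketch makes the boundary visible: only references (store keys and property names) ever appear in Git, while the values exist only in the store and in the cluster.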
Lesson 5: Multi-Cluster Management
With 50 clusters, you need a management layer. Here's our architecture:
Hub-and-Spoke with ArgoCD
Management Cluster (Hub)
├── ArgoCD (manages all clusters)
├── ApplicationSets (generates apps per cluster)
└── Cluster Secrets (credentials for spoke clusters)
Spoke Clusters
├── prod-us-east-1
├── prod-us-west-2
├── prod-eu-west-1
├── staging
└── dev
ApplicationSets for DRY Config
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-services
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            environment: production
  template:
    metadata:
      name: '{{name}}-platform'
    spec:
      project: platform
      source:
        repoURL: https://github.com/org/gitops-config
        path: 'clusters/{{name}}/platform'
      destination:
        server: '{{server}}'
        namespace: platform
One ApplicationSet definition, applied automatically to every production cluster, that is, every ArgoCD cluster Secret labeled environment: production.
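Under the hood, the cluster generator emits one parameter set (name, server, and a few others) per matching cluster, and the template is rendered once per set. The Python sketch below mimics that expansion for illustration only: the cluster list, the example.com API server URLs, and the helper functions are hypothetical, and the real controller also supports Go templating and additional parameters such as cluster labels.

```python
# Hypothetical cluster registry, standing in for ArgoCD's cluster Secrets.
CLUSTERS = [
    {"name": "prod-us-east-1", "server": "https://prod-us-east-1.example.com",
     "labels": {"environment": "production"}},
    {"name": "prod-eu-west-1", "server": "https://prod-eu-west-1.example.com",
     "labels": {"environment": "production"}},
    {"name": "staging", "server": "https://staging.example.com",
     "labels": {"environment": "staging"}},
]

# The fields of the ApplicationSet template above that use parameters.
TEMPLATE = {
    "name": "{{name}}-platform",
    "path": "clusters/{{name}}/platform",
    "server": "{{server}}",
    "namespace": "platform",
}


def matches(labels: dict[str, str], match_labels: dict[str, str]) -> bool:
    """matchLabels semantics: every listed label must be present with that value."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def render(template: dict[str, str], params: dict[str, str]) -> dict[str, str]:
    """Replace {{param}} placeholders in every template field."""
    rendered = {}
    for field, text in template.items():
        for key, value in params.items():
            text = text.replace("{{" + key + "}}", value)
        rendered[field] = text
    return rendered


def generate_applications(clusters, match_labels, template):
    """Return one rendered Application per cluster that matches the selector."""
    return [
        render(template, {"name": c["name"], "server": c["server"]})
        for c in clusters
        if matches(c["labels"], match_labels)
    ]


apps = generate_applications(CLUSTERS, {"environment": "production"}, TEMPLATE)
print([app["name"] for app in apps])  # ['prod-us-east-1-platform', 'prod-eu-west-1-platform']
```

Adding another production cluster then means registering it with the right label; no Application manifest is written by hand.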
Lesson 6: Breaking Changes and Rollbacks
Instant Rollbacks
This is GitOps' superpower. Rolling back is just reverting a Git commit:
git revert HEAD
git push origin main
# With automated sync enabled, ArgoCD detects the new commit and rolls back
Canary Deployments with GitOps
Use Argo Rollouts for progressive delivery:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-service
spec:
  # selector and pod template omitted for brevity
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 30
        - pause: { duration: 5m }
        - setWeight: 60
        - pause: { duration: 10m }
      analysis:
        templates:
          - templateName: success-rate
        args:
          - name: service-name
            value: payment-service
Key Takeaways
1. Start mono-repo, split when it hurts
2. Automate environment promotion: never manually edit image tags
3. Enable self-healing and alert on drift
4. Use External Secrets Operator for secrets management
5. Use ApplicationSets for multi-cluster management
6. Git revert is your rollback strategy
GitOps at scale requires investment in tooling and process, but the payoff — full audit trail, instant rollbacks, and declarative everything — is worth it.
Want to implement GitOps across your clusters? Schedule a free assessment.