Operations Guide: Agent Framework (Kind Cluster)
Quick reference for deploying updates, checking status, reading logs, and debugging the local Kind cluster.
Prerequisites
| Tool | Purpose |
|---|---|
| docker | Build images |
| kind | Local k8s cluster |
| kubectl | Cluster management |
| uv | Python package manager |
| pnpm | Frontend package manager |
Full Redeploy (from scratch)
Use this when you want to rebuild both images and re-apply everything from scratch.
```shell
# Works on Windows (PowerShell), Linux, macOS, and Git Bash
uv run python deploy.py
# Optional flags:
uv run python deploy.py --cluster-name dev --backend-tag agent-microservices-kind:local --frontend-tag chatbot-frontend-kind:local
```
The script automatically:
1. Reads secrets from `.env`
2. Builds the backend Docker image (`agent-microservices-kind:local`)
3. Builds the frontend Docker image (`chatbot-frontend-kind:local`) from `../ai-chatbot-ui/`
4. Loads both images into the Kind cluster (`dev` by default)
5. Deploys namespaces and infra (Postgres, Redis)
6. Creates secrets in all namespaces
7. Applies all k8s manifests via `kubectl apply -k deployment/k8s/overlays/kind`
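Here is a minimal Python sketch of the build-and-load part of that sequence. It is not the real deploy.py. It covers only steps 2, 3, 4 and 7 and reuses commands shown elsewhere in this guide. It also assumes the frontend is built with the same empty `NEXT_PUBLIC_API_URL` build argument as the frontend-only redeploy. `build_plan` and `run_plan` are hypothetical names.

```python
# Hypothetical sketch of steps 2-4 and 7 of deploy.py; NOT the real script.
# Secrets (steps 1 and 6) and infra (step 5) are deliberately omitted.
import shlex
import subprocess


def build_plan(cluster: str = "dev",
               backend_tag: str = "agent-microservices-kind:local",
               frontend_tag: str = "chatbot-frontend-kind:local") -> list[list[str]]:
    """Return the commands for steps 2, 3, 4 and 7, in order."""
    return [
        # Step 2: backend image, built from the agent-framework/ root
        ["docker", "build", "-f", "deployment/docker/backend.Dockerfile",
         "-t", backend_tag, "."],
        # Step 3: frontend image from the sibling checkout; the empty
        # NEXT_PUBLIC_API_URL build arg is an ASSUMPTION borrowed from the
        # frontend-only redeploy (relative paths via ingress)
        ["docker", "build", "--build-arg", "NEXT_PUBLIC_API_URL=",
         "-t", frontend_tag, "../ai-chatbot-ui"],
        # Step 4: copy both images into the Kind cluster's node(s)
        ["kind", "load", "docker-image", backend_tag, "--name", cluster],
        ["kind", "load", "docker-image", frontend_tag, "--name", cluster],
        # Step 7: apply the kustomize overlay for Kind
        ["kubectl", "apply", "-k", "deployment/k8s/overlays/kind"],
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, and also execute it unless dry_run is True."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises on the first failing step


if __name__ == "__main__":
    run_plan(build_plan(), dry_run=True)  # prints only; nothing is executed
```

With `dry_run=True` the sketch only prints the commands, which makes it a cheap way to review what would run for a different `--cluster-name`.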
Partial Redeploy: Backend Only
When you only change Python code in agent-framework/:
```shell
# 1. Rebuild backend image (run from agent-framework/)
docker build -f deployment/docker/backend.Dockerfile -t agent-microservices-kind:local .
# 2. Load into Kind cluster
kind load docker-image agent-microservices-kind:local --name dev
# 3. Restart every deployment in the backend namespaces (af-edge also restarts frontend)
kubectl rollout restart deployment -n af-edge
kubectl rollout restart deployment -n af-platform
kubectl rollout restart deployment -n af-runtime
# 4. Watch a rollout complete (gateway-bff shown; repeat for other deployments as needed)
kubectl rollout status deployment/gateway-bff -n af-edge --timeout=120s
```
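The steps above wait only on gateway-bff. The Python sketch below restarts the three backend namespaces and then waits on every deployment in them. It assumes kubectl is on PATH and pointed at the Kind cluster. The helper names are hypothetical.

```python
# Hypothetical helpers: restart the backend namespaces, then wait on every
# deployment in them (not just gateway-bff). Assumes kubectl targets Kind.
import subprocess

BACKEND_NAMESPACES = ("af-edge", "af-platform", "af-runtime")


def deployment_names(namespace: str) -> list[str]:
    """Deployment names in `namespace`, read via kubectl jsonpath output."""
    out = subprocess.run(
        ["kubectl", "get", "deployments", "-n", namespace,
         "-o", "jsonpath={.items[*].metadata.name}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()  # jsonpath prints the names space-separated


def rollout_status_cmds(namespace: str, names: list[str],
                        timeout: str = "120s") -> list[list[str]]:
    """One `kubectl rollout status` command per deployment."""
    return [["kubectl", "rollout", "status", f"deployment/{name}",
             "-n", namespace, f"--timeout={timeout}"] for name in names]


def restart_and_wait(namespaces: tuple[str, ...] = BACKEND_NAMESPACES) -> None:
    """Restart all deployments per namespace, then block until each is rolled out."""
    for ns in namespaces:
        subprocess.run(["kubectl", "rollout", "restart", "deployment", "-n", ns],
                       check=True)
    for ns in namespaces:
        for cmd in rollout_status_cmds(ns, deployment_names(ns)):
            subprocess.run(cmd, check=True)  # a timeout exits non-zero and raises
```

Call `restart_and_wait()` after the image has been loaded. It raises `CalledProcessError` on the first deployment that does not become ready within the timeout.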
Partial Redeploy: Frontend Only
When you only change code in ai-chatbot-ui/:
```shell
# Switch from agent-framework/ to the sibling ai-chatbot-ui/ directory
cd ../ai-chatbot-ui
# 1. Rebuild frontend image (NEXT_PUBLIC_API_URL="" → uses relative paths via ingress)
docker build --build-arg NEXT_PUBLIC_API_URL="" -t localhost/ai-chatbot-ui:latest .
# 2. Load into Kind cluster
kind load docker-image localhost/ai-chatbot-ui:latest --name dev
# 3. Point the frontend deployment at the new image
kubectl set image deployment/frontend frontend=localhost/ai-chatbot-ui:latest -n af-edge
# 4. Restart so pods pick up the reloaded image (set image is a no-op when the tag is unchanged)
kubectl rollout restart deployment/frontend -n af-edge
# 5. Watch it come up
kubectl rollout status deployment/frontend -n af-edge --timeout=120s
```
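Apply k8s Manifest Changes Only
When you edit YAML files in deployment/k8s/ but don't need to rebuild images, re-apply the Kind overlay, which is the same command the deploy script runs as its last step: `kubectl apply -k deployment/k8s/overlays/kind`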
Status & Health
Quick overview: all pods
Run `kubectl get pods -A`.
Per-namespace pods
```shell
kubectl get pods -n af-edge            # frontend, gateway-bff
kubectl get pods -n af-platform        # identity-auth, policy-authorization
kubectl get pods -n af-runtime         # agent-runtime, conversation, job-controller, etc.
kubectl get pods -n af-data            # postgres, redis
kubectl get pods -n af-observability   # grafana, loki, tempo, prometheus
```
Only show problem pods (PowerShell)
```powershell
kubectl get pods -A --field-selector=status.phase!=Running | Where-Object { $_ -notmatch "Completed|code-interpreter" }
```
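The filter above relies on PowerShell's `Where-Object`. The Python sketch below applies the same rule to the JSON from `kubectl get pods -A -o json`, so it works in any shell. `fetch_pods` and `problem_pods` are hypothetical names.

```python
# Sketch: shell-independent version of the PowerShell problem-pod filter.
import json
import subprocess


def fetch_pods() -> dict:
    """Run kubectl and return the parsed list of pods in all namespaces."""
    raw = subprocess.run(["kubectl", "get", "pods", "-A", "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(raw)


def problem_pods(pod_list: dict,
                 ignore: tuple[str, ...] = ("code-interpreter",)) -> list[tuple[str, str, str]]:
    """(namespace, name, phase) for pods that are neither Running nor Succeeded.

    Succeeded is the phase kubectl displays as "Completed". Pods whose name
    contains any string in `ignore` are skipped, like the -notmatch filter.
    """
    found = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        name = meta.get("name", "")
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase in ("Running", "Succeeded"):
            continue
        if any(token in name for token in ignore):
            continue
        found.append((meta.get("namespace", ""), name, phase))
    return found
```

To use it, call `problem_pods(fetch_pods())`. Like the field selector it mirrors, the check is phase-based. A pod in CrashLoopBackOff usually still reports phase Running, so it is not listed.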
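Deployment health
Run `kubectl get deployments -A` and compare the READY column with the desired replica count.
HPA status (autoscaler)
Run `kubectl get hpa -A` to see current vs. target metrics and the min/max replica bounds.
Ingress rules
Run `kubectl get ingress -A`; use `kubectl describe ingress <name> -n <namespace>` for the path-to-service mapping.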
Endpoint connectivity
```shell
# Health check
curl http://localhost/health
# Should return the threads list (an empty array is fine)
curl http://localhost/threads
# Full smoke test (PowerShell script; on Linux/macOS run it with pwsh)
./deployment/k8s/overlays/kind/smoke-test.ps1
```
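The two curl checks can also be scripted without PowerShell. This Python sketch assumes `/health` answers 200 and `/threads` answers 200 with a top-level JSON array. The "empty array is fine" note suggests that shape, but it is not confirmed. `smoke_check` and `urllib_fetch` are hypothetical names.

```python
# Sketch: minimal smoke check of the ingress endpoints using only the stdlib.
import json
import urllib.error
import urllib.request
from typing import Callable, Tuple

Fetch = Callable[[str], Tuple[int, bytes]]


def urllib_fetch(url: str, timeout: float = 5.0) -> Tuple[int, bytes]:
    """GET `url`; return (status, body). Status 0 means no HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a body
        return err.code, err.read()
    except OSError:  # refused, DNS failure, timeouts (URLError is an OSError)
        return 0, b""


def smoke_check(base_url: str = "http://localhost",
                fetch: Fetch = urllib_fetch) -> dict:
    """Return {"health": bool, "threads": bool} for the two endpoints."""
    results = {}
    status, _ = fetch(f"{base_url}/health")
    results["health"] = status == 200
    status, body = fetch(f"{base_url}/threads")
    try:
        # ASSUMPTION: /threads returns a top-level JSON array
        results["threads"] = status == 200 and isinstance(json.loads(body), list)
    except ValueError:  # body was not valid JSON
        results["threads"] = False
    return results
```

Run `print(smoke_check())` against the Kind ingress. Because `fetch` is injectable, the logic can be tested without a cluster. If `/threads` wraps its list in an object, adjust the `isinstance` check.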
Logs
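Frontend (Next.js)
`kubectl logs -n af-edge deployment/frontend --tail=100 -f`
Gateway BFF
`kubectl logs -n af-edge deployment/gateway-bff --tail=100 -f`
Agent Runtime (where the ReAct loop runs)
`kubectl logs -n af-runtime deployment/agent-runtime --tail=100 -f`
Job Controller
`kubectl logs -n af-runtime deployment/job-controller --tail=100 -f`
Identity / Auth
`kubectl logs -n af-platform deployment/identity-auth --tail=100 -f`
(The af-runtime and af-platform deployment names above are assumed to match the pod names listed under Per-namespace pods.)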
All logs from a namespace (last 50 lines per pod)
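Follow logs from multiple pods matching a label
`kubectl logs -n <namespace> -l app=<name> -f --prefix --max-log-requests=10` (`--prefix` tags each line with its pod; `--max-log-requests` raises the default limit of 5 concurrent streams)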
Previous crashed container logs
`kubectl logs <pod-name> -n <namespace> --previous`
Debugging
Describe a failing pod
```shell
kubectl describe pod <pod-name> -n <namespace>
# e.g.
kubectl describe pod -n af-edge -l app=frontend
```
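Exec into a running container
`kubectl exec -it <pod-name> -n <namespace> -- sh` (swap `sh` for `bash` if the image ships it; add `-c <container>` for multi-container pods)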
Check events (shows scheduling failures, OOM kills, etc.)
```powershell
kubectl get events -n af-edge --sort-by='.lastTimestamp' | Select-Object -Last 20
kubectl get events -A --sort-by='.lastTimestamp' | Select-Object -Last 30
# bash/zsh: pipe to `tail -n 20` instead of Select-Object
```
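A shell-independent sketch of the same "newest events" view reads `kubectl get events -o json`. It sorts on `lastTimestamp` like `--sort-by` does and can optionally keep only Warning events. `fetch_events` and `recent_events` are hypothetical names.

```python
# Sketch: newest events from `kubectl get events -o json`, sorted like
# --sort-by='.lastTimestamp'. Events with no lastTimestamp sort first.
import json
import subprocess


def fetch_events(namespace: str = "") -> dict:
    """Parsed event list for one namespace, or all namespaces when empty."""
    scope = ["-n", namespace] if namespace else ["-A"]
    raw = subprocess.run(["kubectl", "get", "events", *scope, "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(raw)


def recent_events(event_list: dict, last: int = 20,
                  warnings_only: bool = False) -> list[str]:
    """Format the `last` newest events as one line each."""
    if last <= 0:
        return []
    items = event_list.get("items", [])
    if warnings_only:
        items = [e for e in items if e.get("type") == "Warning"]
    # RFC 3339 timestamps in the same format sort correctly as strings
    items = sorted(items, key=lambda e: e.get("lastTimestamp") or "")
    lines = []
    for e in items[-last:]:
        obj = e.get("involvedObject", {})
        lines.append(f'{e.get("lastTimestamp") or "-"} {e.get("type", "")} '
                     f'{obj.get("kind", "")}/{obj.get("name", "")}: '
                     f'{e.get("reason", "")} {e.get("message", "")}')
    return lines
```

For example, `print("\n".join(recent_events(fetch_events("af-edge"), warnings_only=True)))`.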
Memory pressure: kill stuck pods
```shell
# Delete all Pending pods (controller-owned pods are recreated and schedule once resources free up)
kubectl get pods -A --field-selector=status.phase=Pending -o json |
  kubectl delete -f -
```
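Force delete a stuck pod
`kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force` (skips graceful termination, so use it only when a normal delete hangs)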
Secrets
Recreate secrets (after .env change)
```shell
# Re-run the deploy script; it generates secrets with --dry-run=client and pipes them to apply, so re-running is idempotent
uv run python deploy.py
```
View current secret keys (not values)
```powershell
kubectl get secret shared-secrets -n af-edge -o jsonpath='{.data}' | ConvertFrom-Json | Get-Member -MemberType NoteProperty | Select-Object Name
```
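The PowerShell pipeline above has no portable bash equivalent unless jq is installed. A Python sketch that lists the key names from `kubectl get secret ... -o json` without decoding any values is below. The function names are hypothetical, and the defaults copy the `shared-secrets` / `af-edge` example.

```python
# Sketch: list the key names of a Secret without decoding any values.
import json
import subprocess


def fetch_secret(name: str = "shared-secrets", namespace: str = "af-edge") -> dict:
    """Parsed Secret object from kubectl."""
    raw = subprocess.run(["kubectl", "get", "secret", name, "-n", namespace, "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(raw)


def secret_keys(secret: dict) -> list[str]:
    """Sorted key names under .data; the base64-encoded values are ignored."""
    return sorted((secret.get("data") or {}).keys())
```

For example, `print(secret_keys(fetch_secret()))`.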
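Scaling
Scale a deployment to 1 replica (memory-constrained single-node Kind)
```shell
kubectl scale deployment <name> -n <namespace> --replicas=1
# e.g.
kubectl scale deployment gateway-bff -n af-edge --replicas=1
```
If an HPA targets the deployment, it scales back up to the HPA's minReplicas; lower that minimum as shown below.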
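Scale an HPA minimum
```powershell
# Writing the patch to a file sidesteps PowerShell's quoting of JSON arguments to native commands
$patch = '{"spec":{"minReplicas":1}}'
Set-Content "$env:TEMP\hpa.json" $patch
kubectl patch hpa <name> -n <namespace> --type=merge --patch-file "$env:TEMP\hpa.json"
```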
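`--type=merge` tells kubectl to send a JSON Merge Patch (RFC 7386). Objects merge key by key, `null` deletes a key, and lists or scalars replace the old value outright. The Python sketch below implements the RFC's algorithm to show why the one-line patch changes only `minReplicas` and leaves the rest of the HPA spec alone. It illustrates the semantics and is not kubectl's own code. `merge_patch` is a hypothetical name.

```python
# Sketch of the RFC 7386 JSON Merge Patch algorithm (the semantics behind
# `kubectl patch --type=merge`); not kubectl's implementation.
import copy
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Return a new document with `patch` merged into `target`."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)          # scalars and lists replace wholesale
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)            # JSON null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result
```

Because lists replace rather than merge, patching a field such as `metrics` this way overwrites the whole list. A `--type=json` patch is the tool for editing a single list element.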
Observability
Open Grafana dashboards
Browse to http://localhost/grafana/.
Login: admin / admin (anonymous read is also enabled). Pre-built dashboards:
- Service RED Metrics: requests, errors, duration per service
- Infrastructure: CPU, memory, pod restarts
- Log Analytics: error rates, log search
- Distributed Tracing: trace explorer (Tempo)
- Alerts Overview: firing alerts
Query logs directly (Loki)
In Grafana → Explore → Loki:
```
{namespace=~"af-.*"}                            # all agent-framework logs
{namespace="af-edge", app="gateway-bff"}        # gateway logs only
{namespace=~"af-.*"} |~ "(?i)error|exception"   # errors across all services
```
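If scripts generate these queries, a small builder keeps the label quoting consistent. The Python sketch below reproduces the three queries above. `logql` is a hypothetical function. It quotes values with `json.dumps`, on the assumption that LogQL's Go-style double-quoted strings accept the same escapes for typical label values and regexes.

```python
# Sketch: build a LogQL stream selector plus an optional line-regex filter.
import json
from typing import Optional


def logql(namespace: str, app: Optional[str] = None,
          line_regex: Optional[str] = None, ns_regex: bool = False) -> str:
    """Compose {namespace=..., app=...} |~ "regex" in LogQL syntax."""
    op = "=~" if ns_regex else "="               # =~ makes namespace a regex match
    matchers = [f"namespace{op}{json.dumps(namespace)}"]
    if app is not None:
        matchers.append(f"app={json.dumps(app)}")
    query = "{" + ", ".join(matchers) + "}"
    if line_regex is not None:
        query += f" |~ {json.dumps(line_regex)}"  # keep only lines matching the regex
    return query
```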
Query traces (Tempo)
In Grafana → Explore → Tempo → Search
Local Dev (no cluster)
Start backend (monolith mode)
```shell
cd agent-framework
docker compose -f deployment/docker/docker-compose.yml up -d postgres redis
uv run uvicorn ravi.server.app:app --port 8000 --reload
```
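Before starting uvicorn, it can help to confirm that the compose containers are reachable on the local ports listed in the Port Reference (5432 for Postgres, 6379 for Redis). Here is a minimal Python sketch. `port_open` is a hypothetical helper, and a successful TCP connect only shows that something is listening, not that the service is healthy.

```python
# Sketch: check that the compose services accept TCP connections locally.
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for name, port in (("postgres", 5432), ("redis", 6379)):
        state = "up" if port_open("localhost", port) else "DOWN"
        print(f"{name:8} localhost:{port} {state}")
```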
Start frontend
Run tests
Lint & format
Port Reference
| Service | Local Dev Port | k8s (Kind) |
|---|---|---|
| Frontend (Next.js) | 3000 | http://localhost/ |
| Backend API | 8000 | http://localhost/chat, /threads, etc. |
| PostgreSQL | 5432 | internal cluster only |
| Redis | 6379 | internal cluster only |
| Grafana | — | http://localhost/grafana/ |
| MCP demo server | 9000 | start with `docker compose -f deployment/docker/docker-compose.yml --profile mcp up -d` |