Configuration
For a comprehensive field-by-field reference of all config schemas, the config overlay system, and the complete environment variables master table, see configuration-reference.md.
Cordum uses a mix of config files (mounted into containers) and environment variables.
Config files
Compose mounts these files from config/:
- config/pools.yaml - topic -> pool routing
- config/timeouts.yaml - per-topic and per-workflow timeouts
- config/safety.yaml - safety kernel policy
- config/nats.conf - NATS server config (JetStream sync_interval)
config/system.yaml is a sample payload for the config service (budgets, rate limits, observability, alerting). It is not mounted by default; use POST /api/v1/config to store it.
The control plane validates pool/timeout/safety files against embedded JSON
schemas (see core/infra/config/schema/). Invalid configs return errors and,
for timeouts, fall back to defaults.
Core environment variables
Shared across services:
- CORDUM_ENV (production enables strict security defaults)
- CORDUM_PRODUCTION (true enables strict security defaults)
- CORDUM_TLS_MIN_VERSION (1.2 or 1.3, default 1.3 in production)
- CORDUM_LOG_FORMAT (json or text, default text)
- CORDUM_GRPC_REFLECTION (set to 1 to enable gRPC reflection, dev only)
- NATS_URL (default nats://nats:4222)
- REDIS_URL (compose default redis://:${REDIS_PASSWORD}@redis:6379; REDIS_PASSWORD is required)
- NATS_USE_JETSTREAM (0|1)
- POOL_CONFIG_PATH, TIMEOUT_CONFIG_PATH
- SAFETY_KERNEL_ADDR
- NATS TLS: NATS_TLS_CA, NATS_TLS_CERT, NATS_TLS_KEY, NATS_TLS_INSECURE, NATS_TLS_SERVER_NAME
- Redis TLS: REDIS_TLS_CA, REDIS_TLS_CERT, REDIS_TLS_KEY, REDIS_TLS_INSECURE, REDIS_TLS_SERVER_NAME
- Redis clustering: REDIS_CLUSTER_ADDRESSES (comma-separated host:port seeds)
Typed environment variable helpers
Several services use typed env var helpers from core/infra/env/ that parse
values with safe fallback behavior:
- env.IntOr(key, default) - Parses an integer from the named env var. Falls back to the compiled default if the variable is missing, empty, or not a valid positive integer.
- env.Int64Or(key, default) - Same as IntOr but for int64 values.
- env.DurationOr(key, default) - Parses a Go duration string (e.g. 30s, 5m). Falls back to the default if the variable is missing, empty, or not a valid positive duration.
- env.Bool(key) - Returns true for 1, true, yes, y, on (case-insensitive). Returns false for anything else, including unset.

All helpers fall back silently; they never panic or return errors. This means a misconfigured value reverts to the compiled default rather than crashing the service.
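The fallback rules above can be illustrated with a small Python sketch. This is not the code in core/infra/env/ (those helpers are Go functions); the names int_or, duration_or, and env_bool are hypothetical, the duration parser handles only plain unit suffixes, and trimming surrounding whitespace is an assumption.

```python
import os
import re

# Go-style duration units mapped to seconds.
_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
# Longer suffixes come first so "ms" is not read as "m" followed by "s".
_PART = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|ms|s|m|h)")


def int_or(key: str, default: int) -> int:
    """Return the env var as a positive int, or the default on any problem."""
    raw = os.environ.get(key, "").strip()  # assumption: whitespace is ignored
    try:
        value = int(raw)
    except ValueError:  # missing, empty, or not an integer
        return default
    return value if value > 0 else default


def duration_or(key: str, default: float) -> float:
    """Return the env var in seconds, parsed from a string such as 30s or 1m30s."""
    raw = os.environ.get(key, "").strip()
    pos, total = 0, 0.0
    while pos < len(raw):
        match = _PART.match(raw, pos)
        if match is None:  # unparseable remainder: fall back
            return default
        total += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    # An empty string leaves total at 0, which also falls back.
    return total if total > 0 else default


def env_bool(key: str) -> bool:
    """True only for the accepted truthy spellings; unset counts as false."""
    return os.environ.get(key, "").strip().lower() in {"1", "true", "yes", "y", "on"}
```

A practical consequence of this design is that a typo such as SCHEDULER_CONFIG_RELOAD_INTERVAL=30sec does not stop the service; the compiled default applies without any error, so it is worth double-checking values that appear to have no effect.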
Gateway
- GATEWAY_GRPC_ADDR, GATEWAY_HTTP_ADDR, GATEWAY_METRICS_ADDR
- GATEWAY_METRICS_PUBLIC (set to 1 to allow non-loopback metrics bind in production)
- API_RATE_LIMIT_RPS, API_RATE_LIMIT_BURST (applied per tenant; falls back to client IP when tenant is missing)
- TENANT_ID (single-tenant default)
- API keys: CORDUM_API_KEY, API_KEY, or CORDUM_API_KEYS (comma-separated or JSON)
- API key file: CORDUM_API_KEYS_PATH (same format as CORDUM_API_KEYS, reloads on change)
- Allow anonymous auth (local/dev only): CORDUM_ALLOW_INSECURE_NO_AUTH=1
- Header principal: CORDUM_ALLOW_HEADER_PRINCIPAL=true (disabled by default in production)
- CORS: CORDUM_ALLOWED_ORIGINS, CORDUM_CORS_ALLOW_ORIGINS, CORS_ALLOW_ORIGINS
- HTTP TLS: GATEWAY_HTTP_TLS_CERT, GATEWAY_HTTP_TLS_KEY
- gRPC TLS: GRPC_TLS_CERT, GRPC_TLS_KEY
- Artifacts: ARTIFACT_MAX_BYTES (max upload/download size)
- JWT auth: CORDUM_JWT_HMAC_SECRET, CORDUM_JWT_PUBLIC_KEY, CORDUM_JWT_PUBLIC_KEY_PATH, CORDUM_JWT_ISSUER, CORDUM_JWT_AUDIENCE, CORDUM_JWT_DEFAULT_ROLE, CORDUM_JWT_CLOCK_SKEW, CORDUM_JWT_REQUIRED
- Pack catalog defaults: CORDUM_PACK_CATALOG_URL, CORDUM_PACK_CATALOG_ID, CORDUM_PACK_CATALOG_TITLE, CORDUM_PACK_CATALOG_DEFAULT_DISABLED=1
- Marketplace fetch: CORDUM_MARKETPLACE_ALLOW_HTTP=1, CORDUM_MARKETPLACE_HTTP_TIMEOUT (e.g. 15s)
- GATEWAY_MAX_JOB_PAYLOAD_BYTES (max job submission payload size, default 2097152 / 2 MB)
- GATEWAY_MAX_BODY_BYTES (max HTTP request body size, default 1048576 / 1 MB)
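To make the per-tenant rate limit with its client-IP fallback concrete, here is a minimal Python sketch. It assumes a token-bucket limiter, which the settings names suggest but this page does not confirm; TokenBucket, RateLimiter, and limiter_key are hypothetical names, not gateway code.

```python
import time
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float     # tokens added per second (cf. API_RATE_LIMIT_RPS)
    burst: float    # bucket capacity (cf. API_RATE_LIMIT_BURST)
    tokens: float   # tokens currently available
    updated: float  # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def limiter_key(tenant, client_ip: str) -> str:
    """Limit per tenant; fall back to the client IP when no tenant is known."""
    return f"tenant:{tenant}" if tenant else f"ip:{client_ip}"


class RateLimiter:
    def __init__(self, rps: float, burst: float):
        self.rps, self.burst = rps, burst
        self.buckets = {}

    def allow(self, tenant, client_ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        key = limiter_key(tenant, client_ip)
        bucket = self.buckets.get(key)
        if bucket is None:
            # A new key starts with a full bucket so an initial burst is allowed.
            bucket = TokenBucket(self.rps, self.burst, tokens=self.burst, updated=now)
            self.buckets[key] = bucket
        return bucket.allow(now)
```

Under this model, requests from different tenants never share a budget, while unauthenticated requests from one address share a single bucket keyed by that address.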
User authentication
The gateway supports user/password authentication in addition to API key authentication:
- CORDUM_USER_AUTH_ENABLED=true - Enable user/password authentication (stores users in Redis)
- CORDUM_ADMIN_USERNAME - Default admin username (default: admin)
- CORDUM_ADMIN_PASSWORD - Default admin password (creates admin user on first startup if set)
- CORDUM_ADMIN_EMAIL - Optional admin email
When user auth is enabled, the /api/v1/auth/login endpoint accepts both:
- User credentials (username/email + password)
- API keys (for programmatic access via scripts/CI)
User management endpoints (admin only):
- POST /api/v1/users - Create a new user
- POST /api/v1/auth/password - Change password (authenticated)
Context engine
- CONTEXT_ENGINE_ADDR
- TLS server: CONTEXT_ENGINE_TLS_CERT, CONTEXT_ENGINE_TLS_KEY
- TLS client: CONTEXT_ENGINE_TLS_CA, CONTEXT_ENGINE_TLS_REQUIRED, CONTEXT_ENGINE_INSECURE
Scheduler
- JOB_META_TTL / JOB_META_TTL_SECONDS
- WORKER_SNAPSHOT_INTERVAL
- SCHEDULER_CONFIG_RELOAD_INTERVAL (interval for config overlay reload, e.g. 30s)
- OUTPUT_POLICY_ENABLED (0|1|true|false, default disabled)
- NATS_JS_ACK_WAIT, NATS_JS_MAX_AGE
- NATS_JS_REPLICAS (JetStream stream replication factor)
- SCHEDULER_METRICS_ADDR (default :9090)
- SCHEDULER_METRICS_PUBLIC (set to 1 to allow non-loopback metrics bind in production)
Workflow engine
- WORKFLOW_ENGINE_HTTP_ADDR
- WORKFLOW_ENGINE_SCAN_INTERVAL
- WORKFLOW_ENGINE_RUN_SCAN_LIMIT
Safety kernel
- SAFETY_KERNEL_ADDR, SAFETY_POLICY_PATH (or SAFETY_POLICY_URL)
- Policy URL allowlist: SAFETY_POLICY_URL_ALLOWLIST (comma-separated hostnames)
- Allow private/loopback policy URLs (not recommended): SAFETY_POLICY_URL_ALLOW_PRIVATE=1
- TLS server: SAFETY_KERNEL_TLS_CERT, SAFETY_KERNEL_TLS_KEY
- TLS client: SAFETY_KERNEL_TLS_CA, SAFETY_KERNEL_TLS_REQUIRED, SAFETY_KERNEL_INSECURE
- Decision cache: SAFETY_DECISION_CACHE_TTL (e.g. 5s, 250ms)
- Policy signature verification: SAFETY_POLICY_PUBLIC_KEY, SAFETY_POLICY_SIGNATURE, SAFETY_POLICY_SIGNATURE_PATH, SAFETY_POLICY_SIGNATURE_REQUIRED
- Policy reload/overlays: SAFETY_POLICY_RELOAD_INTERVAL, SAFETY_POLICY_CONFIG_SCOPE, SAFETY_POLICY_CONFIG_ID, SAFETY_POLICY_CONFIG_KEY, SAFETY_POLICY_CONFIG_DISABLE
- Safety kernel reads policy bundle fragments from the config service in Redis; ensure REDIS_URL is set when using pack policy overlays.
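As an illustration of how a hostname allowlist and the private/loopback switch might combine, here is a hedged Python sketch. The exact rules the safety kernel applies are not stated on this page; the function names, the treatment of an empty allowlist as "deny", and the absence of DNS resolution are all assumptions.

```python
import ipaddress
from urllib.parse import urlsplit


def parse_allowlist(raw: str) -> set:
    """Split a comma-separated hostname list (cf. SAFETY_POLICY_URL_ALLOWLIST)."""
    return {host.strip().lower() for host in raw.split(",") if host.strip()}


def policy_url_allowed(url: str, allowlist: set, allow_private: bool = False) -> bool:
    """Assumed rule: the host must be allowlisted, and private/loopback
    targets additionally require the explicit opt-in."""
    host = (urlsplit(url).hostname or "").lower()
    if not host or host not in allowlist:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. This sketch does not resolve DNS, so only the
        # well-known loopback name is treated as private.
        return allow_private if host == "localhost" else True
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        return allow_private
    return True
```

For example, with the allowlist "policies.example.com, 127.0.0.1" the sketch accepts https://policies.example.com/safety.yaml but rejects http://127.0.0.1:8080/p.yaml unless allow_private is set.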
Config overlay hot-reload
Pool routing, timeout, and fail-mode configuration stored in Redis via the
config service (PUT /api/v1/config) is reloaded at runtime without restarting
the scheduler. Two mechanisms work together:
- NATS notification: When the API gateway writes config to Redis, it publishes to sys.config.changed. All scheduler replicas subscribe and reload immediately.
- Polling fallback: Each replica polls Redis on a configurable interval (default 30 s) to catch any missed notifications.
Set SCHEDULER_CONFIG_RELOAD_INTERVAL to adjust the polling interval (e.g.
10s for faster convergence, 60s for lower overhead). On each reload the
scheduler compares content hashes and only applies changes when pool routing,
timeouts, or fail modes have actually changed.
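The hash comparison described above can be sketched as follows. This is an illustration in Python rather than the scheduler's implementation; the section names in WATCHED_KEYS, the use of SHA-256 over canonical JSON, and the ReloadTracker class are assumptions.

```python
import hashlib
import json

# Hypothetical section names for the three watched areas.
WATCHED_KEYS = ("pools", "timeouts", "fail_modes")


def content_hash(section) -> str:
    """Hash a config section independently of key order."""
    canonical = json.dumps(section, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReloadTracker:
    """Remembers the last applied hash per section."""

    def __init__(self):
        self.hashes = {}

    def changed_sections(self, config: dict) -> list:
        """Return the sections whose content differs from the last reload."""
        changed = []
        for key in WATCHED_KEYS:
            digest = content_hash(config.get(key))
            if self.hashes.get(key) != digest:
                self.hashes[key] = digest
                changed.append(key)
        return changed
```

Because both the NATS notification and the poll can trigger a reload for the same write, a check of this kind keeps the second trigger from rebuilding anything: it finds identical hashes and returns an empty list.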
For the full reload flow and reset instructions, see the Config Reload section in the reference.
Dynamic pool lifecycle
Worker pools can be created, drained, and deleted at runtime without restarting any services. The scheduler picks up changes via config hot-reload (NATS notification or 30-second poll).
Lifecycle states
create → ACTIVE → drain → DRAINING → (auto) → INACTIVE → delete
- Active: Pool receives new job routing. Default state.
- Draining: Pool is removed from the routing table. In-flight jobs on workers complete normally. A background goroutine checks every 10 seconds and transitions to inactive when all jobs finish or the drain timeout expires.
- Inactive: Pool is fully drained. Can be deleted or reactivated via update.
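The states above form a small state machine, sketched here in Python. The event names ("drain", "drained", "reactivate") and the drain_finished signature are hypothetical; deletion is left out because it removes the pool rather than moving it to another state.

```python
from enum import Enum


class PoolState(Enum):
    ACTIVE = "active"
    DRAINING = "draining"
    INACTIVE = "inactive"


# Allowed transitions, keyed by (current state, event).
TRANSITIONS = {
    (PoolState.ACTIVE, "drain"): PoolState.DRAINING,
    (PoolState.DRAINING, "drained"): PoolState.INACTIVE,   # automatic
    (PoolState.INACTIVE, "reactivate"): PoolState.ACTIVE,  # via update
}


def next_state(state: PoolState, event: str) -> PoolState:
    """Apply an event, rejecting transitions the lifecycle does not define."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"cannot apply {event!r} to a pool in state {state.value}")


def drain_finished(in_flight_jobs: int, drain_started_at: float,
                   drain_timeout: float, now: float) -> bool:
    """The periodic check: done when no jobs remain or the timeout has expired."""
    return in_flight_jobs == 0 or now - drain_started_at >= drain_timeout
```

A checker loop would call drain_finished for each draining pool on every tick and apply the "drained" event when it returns true.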
How it works
- API mutation: PUT /api/v1/pools/{name} (or cordumctl, dashboard) writes to cfg:system:default.data.pools via SetWithRetry (optimistic locking).
- NATS broadcast: Gateway publishes sys.config.changed so all replicas reload immediately.
- Scheduler reload: watchConfigChanges detects the change, buildRouting() rebuilds the routing table, filtering out draining and inactive pools.
- Drain checker: Gateway background goroutine monitors draining pools, reads worker snapshot, and auto-transitions to inactive.
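The filtering step in the scheduler reload can be pictured with a short Python sketch. It is not the actual buildRouting() code; the pool record shape (a "status" field and a "topics" list) is assumed for illustration.

```python
def build_routing(pools: dict) -> dict:
    """Map each topic to the pools that may receive new jobs for it.

    Pools without an explicit status are treated as active, the default state.
    """
    routing = {}
    for name, pool in sorted(pools.items()):
        if pool.get("status", "active") != "active":
            continue  # draining and inactive pools receive no new jobs
        for topic in pool.get("topics", []):
            routing.setdefault(topic, []).append(name)
    return routing


pools = {
    "default": {"topics": ["job.default"]},
    "gpu": {"status": "draining", "topics": ["job.gpu", "job.default"]},
    "batch": {"status": "active", "topics": ["job.batch"]},
}
print(build_routing(pools))
```

In this example the draining gpu pool disappears from the table entirely, so job.gpu has no route until another active pool claims that topic.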
Pack overlays
Packs register pools via overlays/pools.patch.yaml in their bundle.
During pack install, the overlay is merged into the system config via
json_merge_patch. Pack uninstall removes the overlay. The scheduler
picks up changes on the next reload cycle.
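For readers unfamiliar with merge-patch semantics, the following Python sketch implements the algorithm from RFC 7396 (JSON Merge Patch). It assumes Cordum's json_merge_patch follows that RFC; the sample documents and pool names are invented for illustration.

```python
def json_merge_patch(target, patch):
    """Apply an RFC 7396 merge patch and return the merged document."""
    if not isinstance(patch, dict):
        # Scalars and arrays replace the target wholesale.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this member"
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


system = {"pools": {"default": {"topics": ["job.default"]}}}
overlay = {"pools": {"gpu": {"topics": ["job.gpu"]}}}

merged = json_merge_patch(system, overlay)
print(merged)
```

Two properties are worth keeping in mind when writing an overlay under these semantics: objects merge key by key, so the overlay only needs to list the pools it adds, while arrays such as a topic list are replaced as a whole rather than appended to.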
Management surfaces
| Surface | Commands |
|---|---|
| REST API | PUT/PATCH/DELETE /api/v1/pools/{name}, drain, topic management |
| Dashboard | Pools page (/pools) — create, edit, drain, delete, topic assignment |
| CLI | cordumctl pool list/get/create/update/delete/drain/topic |
NATS server durability (JetStream)
JetStream fsync cadence is controlled by sync_interval in the NATS server
config. Lower values improve crash durability at the cost of throughput.
- Compose: edit config/nats.conf.
- K8s base: edit the cordum-nats-config ConfigMap in deploy/k8s/base.yaml.
- Production overlay: edit the cordum-nats-config ConfigMap in deploy/k8s/production/nats.yaml.
- Helm: set nats.jetstream.syncInterval in cordum-helm/values.yaml (or --set nats.jetstream.syncInterval=1s).
For full details, see docs/DOCKER.md.