
Configuration Reference


Table of Contents

  1. Overview
  2. system.yaml — System Configuration
  3. Config Overlay System
  4. pools.yaml — Worker Pool Routing
  5. timeouts.yaml — Timeout Configuration
  6. safety.yaml — Safety Policy
  7. output_scanners.yaml — Output Scanner Patterns
  8. Environment Variables Master Table
  9. Cross-References


system.yaml — System Configuration

config/system.yaml is not mounted by default in Docker Compose. It is a payload for the config service — store it via POST /api/v1/config or let packs write fragments.

safety

Controls system-wide safety defaults. These supplement the rule-based policy in safety.yaml.

Field | Type | Default | Description
pii_detection_enabled | bool | true | Enable PII detection in inputs
pii_action | string | "block" | Action on PII detection: block, redact, warn
pii_types_to_detect | string[] | ["email","phone"] | PII categories to scan for
injection_detection | bool | true | Enable prompt injection detection
injection_sensitivity | string | "high" | Sensitivity level: low, medium, high
content_filter_enabled | bool | true | Enable content category filtering
blocked_categories | string[] | ["hate_speech","sexual_content"] | Blocked content categories
anomaly_detection | bool | false | Enable anomaly detection
allowed_topics | string[] | [] | Allowlisted topics (empty = all allowed)
denied_topics | string[] | [] | Denylisted topics

budget

Cost control and attribution settings.

Field | Type | Default | Description
daily_limit_usd | float | 1000.0 | Daily spend limit in USD
monthly_limit_usd | float | 10000.0 | Monthly spend limit in USD
per_job_max_usd | float | 5.0 | Maximum cost per single job
per_workflow_max_usd | float | 50.0 | Maximum cost per workflow run
alert_at_percent | int[] | [50,75,90,100] | Alert at these percentages of the limit
action_at_limit | string | "throttle" | Action when limit hit: throttle, deny, alert
cost_attribution_enabled | bool | true | Enable per-tenant cost tracking
cost_centers | string[] | [] | Cost center tags for attribution
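
To make alert_at_percent concrete, the sketch below reports which thresholds a spend update crosses. It is an illustration only, not Cordum's implementation: the function name crossed_alerts and the assumption that a threshold fires once, when spend first reaches it, are hypothetical.

```python
def crossed_alerts(previous_spend, current_spend, limit, thresholds=(50, 75, 90, 100)):
    """Return the alert thresholds (in %) crossed between two spend readings."""
    # Compare spend * 100 against threshold * limit to avoid float division artifacts.
    return [
        t for t in thresholds
        if previous_spend * 100 < t * limit <= current_spend * 100
    ]


# With the default daily_limit_usd of 1000.0, going from $400 to $760
# crosses the 50% and 75% thresholds.
alerts = crossed_alerts(400.0, 760.0, 1000.0)
```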

rate_limits

System-level budget rate limits enforced by the scheduler. These are independent from gateway-level API rate limiting (API_RATE_LIMIT_RPS env var), which is enforced by the api-gateway middleware before requests reach the scheduler.

Field | Type | Default | Description
requests_per_minute | int | 120000 | Sustained throughput limit (2000 req/sec)
requests_per_hour | int | 7200000 | Hourly throughput limit
burst_size | int | 4000 | Token bucket burst: peak spike capacity before throttling
concurrent_jobs | int | 10000 | Max concurrent jobs across all tenants
concurrent_workflows | int | 5 | Max concurrent workflows
queue_size | int | 5000 | Max pending queue depth
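
The relationship between requests_per_minute and burst_size can be illustrated with a generic token bucket. This is a textbook sketch, not the scheduler's code; the TokenBucket class and its method names are hypothetical, and only the two default values come from the table above.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec, holds at most `burst` tokens."""
    rate: float                        # sustained refill rate (tokens per second)
    burst: float                       # bucket capacity (peak spike size)
    tokens: float = field(init=False)  # current token count
    last: float = 0.0                  # time of the last refill, in seconds

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start full so an initial spike is allowed

    def allow(self, now: float) -> bool:
        """Consume one token and return True if a request may pass at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Defaults from the table: 120000 requests/minute = 2000 requests/sec, burst 4000.
bucket = TokenBucket(rate=120000 / 60, burst=4000)
# 5000 requests arriving at the same instant: only the burst capacity passes.
accepted = sum(bucket.allow(now=0.0) for _ in range(5000))
```

In this sketch, half a second later the bucket has refilled by 1000 tokens (0.5 s × 2000/sec), so traffic resumes at the sustained rate.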

retry

Default retry policy for jobs (overridable per-topic in timeouts.yaml).

Field | Type | Default | Description
max_retries | int | 3 | Maximum retry attempts
initial_backoff | duration | 1s | Initial backoff delay
max_backoff | duration | 30s | Maximum backoff delay
backoff_multiplier | float | 2.0 | Exponential backoff multiplier
retryable_errors | string[] | ["network_error","timeout"] | Error types that trigger retry
non_retryable_errors | string[] | ["bad_request"] | Error types that skip retry
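
As a rough illustration of how these defaults interact, the sketch below computes the delay before each retry. It assumes plain exponential backoff without jitter and treats error types that appear in neither list as non-retryable; both are assumptions, and the function names are hypothetical.

```python
def backoff_delays(max_retries=3, initial=1.0, maximum=30.0, multiplier=2.0):
    """Return the delay in seconds before each retry attempt, capped at `maximum`."""
    delays = []
    delay = initial
    for _ in range(max_retries):
        delays.append(min(delay, maximum))
        delay *= multiplier
    return delays


def should_retry(error_type,
                 retryable=("network_error", "timeout"),
                 non_retryable=("bad_request",)):
    """Decide whether an error type triggers a retry (non-retryable list checked first)."""
    if error_type in non_retryable:
        return False
    return error_type in retryable


default_delays = backoff_delays()              # 1s, 2s, 4s with the defaults
long_delays = backoff_delays(max_retries=7)    # the cap of 30s applies from the 6th retry
```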

resources

Resource allocation defaults.

Field | Type | Default | Description
default_priority | string | "interactive" | Default job priority
max_timeout_seconds | int | 300 | Maximum allowed timeout
default_timeout_seconds | int | 60 | Default job timeout
max_parallel_steps | int | 10 | Max parallel workflow steps
preemption_enabled | bool | true | Allow job preemption
preemption_grace_period | int | 30 | Seconds before preemption

models

Allowed LLM model configuration.

Field | Type | Default | Description
allowed_models | string[] | ["gpt-4","llama-3","claude-3"] | Permitted model identifiers
default_model | string | "gpt-4" | Default model for jobs
fallback_models | string[] | ["llama-3"] | Models to try if the primary is unavailable

context

Context engine retrieval settings.

Field | Type | Default | Description
allowed_memory_ids | string[] | ["repo:*","kb:*"] | Allowed memory ID patterns
denied_memory_ids | string[] | [] | Denied memory ID patterns
max_context_tokens | int | 4000 | Max tokens to retrieve
max_retrieved_chunks | int | 10 | Max chunks per retrieval
cross_tenant_access | bool | false | Allow cross-tenant context access
allowed_connectors | string[] | ["github","slack"] | Permitted connector types
redaction_policies | object | {} | Config field defined but not yet consumed at runtime

slo

Service-level objective configuration.

Field | Type | Default | Description
target_p95_latency_ms | int | 1000 | Target p95 latency in milliseconds
error_rate_budget | float | 0.01 | Error rate budget (1%)
timeout_seconds | int | 60 | SLO evaluation window timeout
critical | bool | false | Mark as critical service

experiment (NOT YET IMPLEMENTED)

Struct exists in code but no runtime code reads these fields.

experiment:
  enabled: false
  name: ""
  buckets: []

integrations (NOT YET IMPLEMENTED)

Struct exists in code but no runtime code reads these fields.

integrations:
  github:
    enabled: false
    connection_id: ""
    allowed_teams: []
    allowed_scopes: []
  gitlab: # same structure
  slack: # same structure
  jira: # same structure

observability (NOT YET IMPLEMENTED)

No backing code or struct exists.

observability:
  otel:
    enabled: false
    endpoint: ""
    protocol: "grpc" # grpc | http
    headers: {}
    resource_attributes: {}
  grafana:
    base_url: ""
    dashboards:
      system_overview: ""
      workflow_performance: ""

alerting (NOT YET IMPLEMENTED)

No backing code or struct exists.

alerting:
  pagerduty:
    enabled: false
    integration_key: ""
    severity: "critical"
  slack:
    enabled: false
    webhook_url: ""
    severity: "error"


pools.yaml — Worker Pool Routing

Defines how job topics are routed to worker pools.

Example

topics:
  "job.default": ["general"]
  "job.hello-pack.echo": ["hello-pack"]
  "job.code-review": ["code-review", "general"] # fallback order
  "job.compliance.*": ["compliance"]

pools:
  general:
    requires: []
  hello-pack:
    requires: []
  code-review:
    requires: ["code.read", "code.write"]
  compliance:
    requires: ["compliance.review", "data.access"]

Topics Section

Maps NATS subject patterns to ordered lists of pool names.

Field | Type | Description
topics | map[string]string[] | Topic pattern → ordered list of eligible pool names

  • Topics use exact match or NATS wildcard patterns
  • The list ordering defines fallback priority: the first pool with capacity wins
  • The pool name must match the pool a worker heartbeats as

Pools Section

Defines pool profiles and capability requirements.

Field | Type | Description
pools | map[string]PoolDef | Pool name → pool definition
pools.*.requires | string[] | Capabilities a worker must declare to join this pool

Routing Algorithm

  1. Scheduler receives a job with topic (e.g., job.code-review)
  2. Looks up topic in topics map → gets pool list ["code-review", "general"]
  3. For each pool in order:
       a. Checks whether the pool has workers with the required capabilities (requires list)
       b. Checks whether the pool has capacity (workers available)
       c. First match wins: the job is dispatched to that pool
  4. If no pool matches → job stays in pending state for reconciler
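
The steps above can be sketched in a few lines. This is a simplified model, not the scheduler's implementation: the worker dictionaries, the route and topic_matches helpers, and the choice to prefer an exact topic entry over a wildcard entry are all assumptions made for illustration. Wildcards follow the usual NATS convention (* matches one token, > matches the remaining tokens).

```python
def topic_matches(pattern, topic):
    """NATS-style match: `*` matches exactly one token, `>` one or more trailing tokens."""
    p_tokens, t_tokens = pattern.split("."), topic.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return i < len(t_tokens)
        if i >= len(t_tokens) or (p != "*" and p != t_tokens[i]):
            return False
    return len(p_tokens) == len(t_tokens)


def route(topic, topics, pools, workers):
    """Return the first pool with an available, capable worker, or None (job stays pending)."""
    # Assumption: an exact topic entry takes priority over a wildcard entry.
    pool_names = topics.get(topic)
    if pool_names is None:
        pool_names = next(
            (names for pattern, names in topics.items() if topic_matches(pattern, topic)),
            [],
        )
    for name in pool_names:
        required = set(pools.get(name, {}).get("requires", []))
        for worker in workers:
            if (worker["pool"] == name and worker["available"]
                    and required <= set(worker["capabilities"])):
                return name
    return None


topics = {
    "job.default": ["general"],
    "job.code-review": ["code-review", "general"],
    "job.compliance.*": ["compliance"],
}
pools = {
    "general": {"requires": []},
    "code-review": {"requires": ["code.read", "code.write"]},
    "compliance": {"requires": ["compliance.review", "data.access"]},
}
workers = [
    # The code-review worker is busy, so the job falls back to the general pool.
    {"pool": "code-review", "capabilities": ["code.read", "code.write"], "available": False},
    {"pool": "general", "capabilities": [], "available": True},
]
chosen = route("job.code-review", topics, pools, workers)
```

With no compliance worker registered, route("job.compliance.audit", ...) returns None in this model, which corresponds to the job staying pending for the reconciler.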

Schema

Validated against core/infra/config/schema/pools.schema.json.



safety.yaml — Safety Policy

Defines safety kernel input rules, output rules, and MCP (Model Context Protocol) configuration.

For full details on the safety kernel, see safety-kernel.md. For output policy, see output-policy.md.

Example

version: "1"
rules:
  - id: fraud-review
    match:
      capabilities: ["bank.transfer"]
      risk_tags: ["financial", "high_value"]
    decision: require_approval
    reason: "Financial transactions require human approval"

  - id: auto-allow-validators
    match:
      capabilities: ["validate.*"]
    decision: allow
    reason: "Read-only validation is always safe"

output_policy:
  enabled: false
  fail_mode: open # open = allow on scanner error, closed = deny

output_rules:
  - id: secret_leak
    match:
      detectors: ["secret_leak"]
    decision: quarantine
    reason: "Potential secret in output"

  - id: pii
    match:
      detectors: ["pii"]
    decision: redact
    reason: "PII detected — redacting"

tenants:
  acme-corp:
    mcp:
      allow: ["github", "slack"]
      deny: ["*"]
  default:
    mcp:
      allow: ["*"]
      deny: []

Rules Section (Input Policy)

Field | Type | Description
rules[].id | string | Unique rule identifier
rules[].match.capabilities | string[] | Capability patterns to match (supports * wildcard)
rules[].match.risk_tags | string[] | Risk tag patterns to match
rules[].match.metadata | map | Key-value metadata conditions
rules[].decision | string | allow, deny, require_approval, throttle
rules[].reason | string | Human-readable reason
rules[].throttle_duration | duration | Required if decision is throttle

Rules are evaluated top-to-bottom; first match wins.

Velocity Rule Fragments

Velocity rules are regular rules[] entries stored as dedicated policy bundle fragments at cfg:system:policy -> bundles -> velocity/{id}. They do not change the safety-kernel evaluator; they only add managed rule fragments that use the existing velocity block on input rules.

Example fragment:

version: "1"
rules:
  - id: login-burst
    match:
      topics: ["job.auth.login"]
      tenants: ["default"]
      risk_tags: ["auth"]
    velocity:
      max_requests: 3
      window_seconds: 60
      key: tenant
    decision: require_approval
    reason: "Repeated login attempts require review"

Field | Type | Description
rules[].velocity.max_requests | int | Requests allowed inside the sliding window before the rule fires
rules[].velocity.window_seconds | int | Sliding-window size in seconds (1 to 86400)
rules[].velocity.key | string | Bucket key expression (tenant, topic, actor_id, actor_type, capability, pack_id, or labels.<key>; compound keys use :)
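
A sliding-window counter of the kind these fields describe might look like the sketch below. It is not the safety kernel's code: the VelocityRule class is hypothetical, state is kept in process memory for simplicity, and it assumes the request that exceeds max_requests is the one on which the rule fires.

```python
from collections import defaultdict, deque


class VelocityRule:
    """Fires once more than `max_requests` arrive for one bucket within `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # bucket value -> timestamps inside the window

    def fires(self, bucket, now):
        """Record a request for `bucket` at time `now`; return True if the rule fires."""
        hits = self._hits[bucket]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) > self.max_requests


# Values from the login-burst fragment: 3 requests per 60 seconds, keyed by tenant.
rule = VelocityRule(max_requests=3, window_seconds=60)
results = [rule.fires("default", t) for t in (0, 10, 20, 30)]
```

In this model the first three requests pass and the fourth, arriving within the same 60-second window, triggers the rule's decision.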

Default Decision

The default_decision field at the top of safety.yaml controls what happens when no input rule matches a job. The production default is deny (fail-closed), meaning unmatched jobs are rejected. To allow specific topics or capabilities, add rules with decision: allow.

# Fail-closed: unmatched jobs are denied
default_decision: deny
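
Top-to-bottom evaluation combined with default_decision can be modeled as follows. This is a sketch under stated assumptions, not the safety kernel's evaluator: it assumes every field listed under match must match, that one matching pattern per field is enough, and that * behaves like a shell-style wildcard. The evaluate helper is hypothetical.

```python
from fnmatch import fnmatchcase


def _any_match(patterns, values):
    """True if any job value matches any pattern (shell-style `*` wildcard)."""
    return any(fnmatchcase(value, pattern) for pattern in patterns for value in values)


def evaluate(job, rules, default_decision="deny"):
    """Return (decision, rule_id) of the first matching rule, else the default decision."""
    for rule in rules:
        match = rule.get("match", {})
        # Assumption: all fields listed under `match` must match the job.
        if all(_any_match(patterns, job.get(field, [])) for field, patterns in match.items()):
            return rule["decision"], rule["id"]
    return default_decision, None


rules = [
    {"id": "fraud-review",
     "match": {"capabilities": ["bank.transfer"], "risk_tags": ["financial", "high_value"]},
     "decision": "require_approval"},
    {"id": "auto-allow-validators",
     "match": {"capabilities": ["validate.*"]},
     "decision": "allow"},
]

transfer = evaluate({"capabilities": ["bank.transfer"], "risk_tags": ["financial"]}, rules)
validator = evaluate({"capabilities": ["validate.schema"]}, rules)
unmatched = evaluate({"capabilities": ["email.send"]}, rules)
```

Under these assumptions the transfer job requires approval, the validator job is allowed, and the unmatched job falls through to the fail-closed default.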

Output Policy Section

Field | Type | Default | Description
output_policy.enabled | bool | false | Enable output scanning
output_policy.fail_mode | string | "closed" | open = allow on scanner error, closed = quarantine on scanner error (recommended for production)

Output Rules Section

Field | Type | Description
output_rules[].id | string | Unique rule identifier
output_rules[].match.topics | string[] | Topic patterns
output_rules[].match.capabilities | string[] | Capability patterns
output_rules[].match.risk_tags | string[] | Risk tag patterns
output_rules[].match.content_patterns | string[] | Regex patterns for content matching
output_rules[].match.detectors | string[] | Scanner detector names (secret_leak, pii, injection)
output_rules[].match.max_output_bytes | int | Maximum output size in bytes
output_rules[].decision | string | allow, deny, quarantine, redact
output_rules[].reason | string | Human-readable reason

Tenants Section

Per-tenant MCP tool access control.

Field | Type | Description
tenants.*.mcp.allow | string[] | Allowed MCP tool/resource patterns
tenants.*.mcp.deny | string[] | Denied MCP tool/resource patterns

Schema

Validated against core/infra/config/schema/safety_policy.schema.json.



Environment Variables Master Table

Global / Shared

Variable | Default | Required | Description
CORDUM_ENV | - | No | Set to production or prod for strict security defaults
CORDUM_PRODUCTION | false | No | Alternative: set to true for production mode
CORDUM_TLS_MIN_VERSION | 1.2 (dev), 1.3 (prod) | No | Minimum TLS version: 1.2 or 1.3
CORDUM_LOG_FORMAT | text | No | Log format: json or text
CORDUM_GRPC_REFLECTION | - | No | Set to 1 to enable gRPC reflection (dev only)
NATS_URL | nats://localhost:4222 | Yes | NATS server URL
REDIS_URL | redis://localhost:6379 | Yes | Redis URL (Compose: redis://:${REDIS_PASSWORD}@redis:6379; password required)
NATS_USE_JETSTREAM | 0 | No | Enable NATS JetStream: 0 or 1
POOL_CONFIG_PATH | config/pools.yaml | No | Path to pools config
TIMEOUT_CONFIG_PATH | config/timeouts.yaml | No | Path to timeouts config. Production mode: if explicitly set and the file cannot be loaded or parsed, the scheduler exits with an error. In dev mode, falls back to built-in defaults with a warning.
SAFETY_POLICY_PATH | config/safety.yaml | No | Path to safety policy
SAFETY_KERNEL_ADDR | localhost:50051 | No | Safety kernel gRPC address
CONTEXT_ENGINE_ADDR | :50070 | No | Context engine gRPC address
OUTPUT_POLICY_ENABLED | false | No | Enable output policy scanning: true, 1
CORDUM_TENANT_ID | - | No | Default tenant ID for SDK/MCP clients
CORDUM_INSTANCE_ID | os.Hostname() | No | Override pod name used in Prometheus pod label. Defaults to hostname; falls back to "unknown"

Prometheus pod label: All Cordum metrics include a pod const label (os.Hostname() or CORDUM_INSTANCE_ID) so Prometheus can distinguish replicas in HA deployments. Use sum by (pod) (cordum_scheduler_jobs_received_total) for per-replica breakdown.
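
The resolution order for the pod label (explicit override, then hostname, then "unknown") can be expressed as a small helper. Cordum itself calls os.Hostname() in Go; the Python function below is only an illustrative equivalent, and its name and parameters are hypothetical.

```python
import os
import socket


def instance_id(env=None, hostname=socket.gethostname):
    """Resolve the pod label: CORDUM_INSTANCE_ID, else the hostname, else "unknown"."""
    env = os.environ if env is None else env
    override = env.get("CORDUM_INSTANCE_ID", "").strip()
    if override:
        return override
    try:
        name = hostname()
    except OSError:
        name = ""  # hostname lookup failed; fall through to the last resort
    return name or "unknown"


explicit = instance_id({"CORDUM_INSTANCE_ID": "scheduler-2"})
from_host = instance_id({}, hostname=lambda: "node-a")
fallback = instance_id({}, hostname=lambda: "")
```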

Licensing

Variable | Default | Description
CORDUM_LICENSE_FILE | - | Path to license JSON file. If not set, checks ~/.cordum/license.json and /etc/cordum/license.json
CORDUM_LICENSE_TOKEN | - | License token (base64-encoded or raw JSON). Alternative to file-based licensing
CORDUM_LICENSE_PUBLIC_KEY | embedded | Base64-encoded Ed25519 public key for signature verification
CORDUM_LICENSE_PUBLIC_KEY_PATH | - | Path to public key file (alternative to inline)

No license = Community tier (3 workers, 3 concurrent jobs, 500 RPS, 7-day audit retention). Invalid or expired licenses degrade to Community — Cordum never crashes or blocks startup due to licensing.

Telemetry

Variable | Default | Description
CORDUM_TELEMETRY_MODE | anonymous | Telemetry mode: off (no collection), local_only (collect but don't report), anonymous (collect and report aggregate stats)
CORDUM_TELEMETRY_ENDPOINT | https://telemetry.cordum.io/v1/report | HTTPS endpoint for anonymous telemetry reports

Telemetry is independent from licensing. It never collects PII, prompts, secrets, or job content. Operators can opt out at any time via CORDUM_TELEMETRY_MODE=off or POST /api/v1/telemetry/consent.

NATS TLS

Variable | Default | Description
NATS_TLS_CA | - | CA certificate path for NATS TLS
NATS_TLS_CERT | - | Client certificate path
NATS_TLS_KEY | - | Client private key path
NATS_TLS_INSECURE | - | Skip TLS verification
NATS_TLS_SERVER_NAME | - | TLS server name override

NATS JetStream

Variable | Default | Description
NATS_JS_ACK_WAIT | 10m | JetStream ack wait duration
NATS_JS_MAX_AGE | 7d | JetStream message max age
NATS_JS_REPLICAS | 1 | JetStream stream replication factor

Redis TLS

Variable | Default | Description
REDIS_TLS_CA | - | CA certificate path for Redis TLS
REDIS_TLS_CERT | - | Client certificate path
REDIS_TLS_KEY | - | Client private key path
REDIS_TLS_INSECURE | - | Skip TLS verification
REDIS_TLS_SERVER_NAME | - | TLS server name override
REDIS_CLUSTER_ADDRESSES | - | Comma-separated cluster seeds (host:port)

Redis Data TTL

Variable | Default | Description
REDIS_DATA_TTL_SECONDS | - | Data TTL in seconds (takes precedence)
REDIS_DATA_TTL | - | Data TTL as Go duration (e.g., 24h)
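
The precedence between the two TTL variables can be sketched as below. The helper names are hypothetical, and the duration parser handles only the h, m, s, and ms units, a small subset of what Go's time.ParseDuration accepts.

```python
import re

_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_go_duration(text):
    """Parse a simple Go-style duration such as "24h" or "1h30m" into seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|h|m|s)", text)
    # Reject input with leftover characters the pattern did not consume.
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def resolve_data_ttl(env):
    """Return the data TTL in seconds; REDIS_DATA_TTL_SECONDS wins when both are set."""
    if env.get("REDIS_DATA_TTL_SECONDS"):
        return float(int(env["REDIS_DATA_TTL_SECONDS"]))
    if env.get("REDIS_DATA_TTL"):
        return parse_go_duration(env["REDIS_DATA_TTL"])
    return None  # neither variable set; no default is documented here


both = resolve_data_ttl({"REDIS_DATA_TTL_SECONDS": "3600", "REDIS_DATA_TTL": "24h"})
duration_only = resolve_data_ttl({"REDIS_DATA_TTL": "24h"})
```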

Redis Connection Pool

Variable | Default | Description
REDIS_POOL_SIZE | 20 | Max connections per Redis node. Each service replica opens up to this many connections.
REDIS_MIN_IDLE_CONNS | 5 | Minimum idle connections kept warm per Redis node. Reduces cold-start latency for bursty traffic.

Sizing guidance: With N service replicas × P pool size × M Redis nodes, total connections ≈ N×P×M. For example, 3 scheduler replicas × 50 pool × 1 Redis = 150 connections. Redis default maxclients is 10000, so pool sizes up to 100 are safe for typical deployments. The scheduler benefits from higher pool sizes (recommend 50) due to concurrent job dispatch; other services can use the default 20.
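
The sizing formula is simple enough to check in code before changing REDIS_POOL_SIZE. The helper below is a hypothetical convenience for that arithmetic; the headroom check compares connections per Redis node against the default maxclients of 10000 mentioned above.

```python
def total_redis_connections(replicas, pool_size, redis_nodes=1):
    """Upper bound on connections across all nodes: N replicas x P pool size x M nodes."""
    return replicas * pool_size * redis_nodes


def fits_maxclients(replicas, pool_size, maxclients=10000):
    """True if the connections opened against one Redis node stay below maxclients."""
    return replicas * pool_size < maxclients


# The example from the guidance: 3 scheduler replicas, pool size 50, 1 Redis node.
scheduler_total = total_redis_connections(replicas=3, pool_size=50, redis_nodes=1)
```

Remember to sum the result over every service that connects to the same Redis node, since each service keeps its own pool.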

Invalid values (non-numeric, zero, negative) are replaced with the defaults, and a warning is logged.

Gateway

Variable | Default | Description
GATEWAY_GRPC_ADDR | :50051 | gRPC listen address
GATEWAY_HTTP_ADDR | :8080 | HTTP listen address
GATEWAY_METRICS_ADDR | :9090 | Metrics listen address
GATEWAY_METRICS_PUBLIC | - | Set to 1 for non-loopback metrics in production
GATEWAY_HTTP_TLS_CERT | - | HTTP TLS certificate path
GATEWAY_HTTP_TLS_KEY | - | HTTP TLS private key path
GRPC_TLS_CERT | - | gRPC TLS certificate path
GRPC_TLS_KEY | - | gRPC TLS private key path
GATEWAY_MAX_JOB_PAYLOAD_BYTES | 2097152 (2 MB) | Max job submission payload size in bytes
GATEWAY_MAX_BODY_BYTES | 1048576 (1 MB) | Max HTTP request body size in bytes
GATEWAY_MAX_JSON_BODY_BYTES | - | Max JSON request body size
TENANT_ID | - | Single-tenant default ID
ARTIFACT_MAX_BYTES | - | Max artifact upload/download size
WORKFLOW_FOREACH_MAX_ITEMS | - | Max items in workflow for-each expansion
POLICY_CHECK_FAIL_MODE | closed | Behavior when Safety Kernel is unreachable during policy evaluation (both gateway submit-time and scheduler dispatch-time). closed (default): reject the job. open: allow with warning log.

Gateway — API Keys

Variable | Default | Description
CORDUM_API_KEY | - | Single API key
API_KEY | - | Fallback if CORDUM_API_KEY not set
CORDUM_API_KEYS | - | Multiple keys: comma-separated or JSON array
CORDUM_API_KEYS_PATH | - | Path to keys file (reloads on change)
CORDUM_ALLOW_INSECURE_NO_AUTH | - | Set to 1 for no-auth mode (dev only)
CORDUM_ALLOW_HEADER_PRINCIPAL | - | Set to true for header-based principal (disabled in production)
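
CORDUM_API_KEYS accepts either a comma-separated string or a JSON array. A parser for both shapes could look like this sketch; the function name is hypothetical, it assumes the JSON form is a flat array of strings, and the gateway's real parser may accept richer entries.

```python
import json


def parse_api_keys(raw):
    """Parse CORDUM_API_KEYS given as a JSON array or a comma-separated list."""
    raw = raw.strip()
    if raw.startswith("["):
        # Assumption: the JSON form is a flat array of key strings.
        return [str(key) for key in json.loads(raw)]
    return [key.strip() for key in raw.split(",") if key.strip()]


csv_keys = parse_api_keys("key-a, key-b,key-c")
json_keys = parse_api_keys('["key-a", "key-b"]')
```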

Gateway — Rate Limiting

Variable | Default | Description
API_RATE_LIMIT_RPS | 2000 | Per-tenant rate limit (requests/sec)
API_RATE_LIMIT_BURST | 4000 | Per-tenant burst size
API_PUBLIC_RATE_LIMIT_RPS | 20 | Public (unauthenticated) rate limit
API_PUBLIC_RATE_LIMIT_BURST | 40 | Public burst size
REDIS_RATE_LIMIT | true | Enable Redis-backed distributed rate limiting. When true, rate limits are enforced globally across all gateway replicas via Redis sliding-window counters (key format: cordum:rl:{key}:{unix_second}). When false or Redis unavailable, falls back to per-process in-memory token buckets (effective limit = N × configured limit with N replicas).

Horizontal scaling note: With multiple gateway replicas, Redis-backed rate limiting (REDIS_RATE_LIMIT=true) is strongly recommended. Without it, each replica maintains its own in-memory token bucket, so the effective rate limit is multiplied by the number of replicas.
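
The two behaviors described above reduce to a key format and a multiplication. The helpers below are hypothetical and only restate the documented key layout (cordum:rl:{key}:{unix_second}) and the N × limit effect of per-process buckets.

```python
def rate_limit_key(key, now):
    """Build the per-second Redis counter key in the documented format."""
    return f"cordum:rl:{key}:{int(now)}"


def effective_limit(configured_rps, replicas, redis_backed):
    """Cluster-wide limit: shared via Redis, or multiplied per replica when in-memory."""
    return configured_rps if redis_backed else configured_rps * replicas


counter_key = rate_limit_key("tenant-a", 1700000000.9)
shared = effective_limit(2000, replicas=3, redis_backed=True)
per_process = effective_limit(2000, replicas=3, redis_backed=False)
```

With three replicas and the default API_RATE_LIMIT_RPS of 2000, the in-memory fallback lets a tenant reach up to 6000 requests/sec across the cluster.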

Gateway — CORS

Variable | Default | Description
CORDUM_ALLOWED_ORIGINS | - | Allowed CORS origins
CORDUM_CORS_ALLOW_ORIGINS | - | Alias for allowed origins
CORS_ALLOW_ORIGINS | - | Alias for allowed origins

Gateway — JWT Authentication

Variable | Default | Description
CORDUM_JWT_HMAC_SECRET | - | HMAC secret for JWT signing
CORDUM_JWT_PUBLIC_KEY | - | RSA/EC public key (PEM) for JWT verification
CORDUM_JWT_PUBLIC_KEY_PATH | - | Path to public key file
CORDUM_JWT_ISSUER | - | Expected JWT issuer
CORDUM_JWT_AUDIENCE | - | Expected JWT audience
CORDUM_JWT_DEFAULT_ROLE | - | Default role for JWT tokens without role claim
CORDUM_JWT_CLOCK_SKEW | - | Allowed clock skew (e.g., 30s)
CORDUM_JWT_REQUIRED | - | Set to true to require JWT for all requests

Gateway — OIDC Authentication

Variable | Default | Description
CORDUM_OIDC_ISSUER | - | OIDC issuer URL
CORDUM_OIDC_AUDIENCE | - | Expected OIDC audience
CORDUM_OIDC_CLAIM_TENANT | - | JWT claim for tenant ID
CORDUM_OIDC_CLAIM_ROLE | - | JWT claim for user role
CORDUM_OIDC_ALLOWED_ALGS | - | Comma-separated allowed algorithms
CORDUM_OIDC_JWKS_REFRESH_INTERVAL | - | JWKS refresh interval (e.g., 1h)
CORDUM_OIDC_ISSUER_ALLOWLIST | - | Comma-separated allowed issuers
CORDUM_OIDC_ALLOW_PRIVATE | - | Allow private/loopback issuer URLs
CORDUM_OIDC_ALLOW_HTTP | - | Allow HTTP (non-TLS) issuer URLs

HA note — JWKS coordination: When running multiple gateway replicas, the OIDC provider automatically coordinates JWKS fetches via Redis. The first replica to refresh fetches from the IdP and writes the JWKS to cordum:auth:jwks:<issuerHash> (TTL 1h). Other replicas read from this cache, reducing IdP load from N requests to 1 per refresh cycle. Each replica also applies random jitter (0–30s initial, 0–15s per tick) to prevent thundering-herd requests. If Redis is unavailable, replicas fall back to direct IdP fetches (same behavior as single-replica).

Gateway — OIDC Authentication

Variable | Default | Description
CORDUM_OIDC_ENABLED | false | Enable OIDC JWT validation for bearer tokens
CORDUM_OIDC_ISSUER | - | OpenID Connect issuer URL used for discovery
CORDUM_OIDC_AUDIENCE | - | Expected audience for bearer-token validation; browser callback validation uses CORDUM_OIDC_CLIENT_ID
CORDUM_OIDC_CLAIM_TENANT | org_id | Claim name used to resolve the Cordum tenant
CORDUM_OIDC_CLAIM_ROLE | cordum_role | Claim name used to resolve the Cordum role
CORDUM_OIDC_CLIENT_ID | - | Enable browser OIDC SSO with this client ID
CORDUM_OIDC_CLIENT_SECRET | - | Client secret used during the authorization-code exchange
CORDUM_OIDC_REDIRECT_URI | - | Absolute callback URL registered with the IdP (typically https://<gateway>/api/v1/auth/sso/oidc/callback)
CORDUM_OIDC_SCOPES | openid,profile,email | Comma-separated scopes requested during login
CORDUM_OIDC_STATE_TTL | 10m | TTL for OIDC state / nonce tracking entries stored in Redis
CORDUM_OIDC_ALLOWED_ALGS | RS256,RS384,RS512,ES256,ES384,ES512 | Restrict accepted signing algorithms
CORDUM_OIDC_JWKS_REFRESH_INTERVAL | 6h | Background refresh interval for the issuer JWKS cache
OIDC_JWKS_REFRESH_COOLDOWN | 1m | Minimum time between on-demand unknown-kid refresh attempts
CORDUM_OIDC_ISSUER_ALLOWLIST | - | Optional comma-separated allowlist of issuer hosts/domains
CORDUM_OIDC_ALLOW_PRIVATE | false in production | Allow private-network issuer hosts in production
CORDUM_OIDC_ALLOW_HTTP | false in production | Allow plain HTTP issuer / redirect URLs in production
CORDUM_AUTH_REDIRECT_URL | <ui-origin>/login | Post-auth redirect target used after OIDC or SAML completes
CORDUM_AUTH_SESSION_TTL | 24h | Browser/session token TTL for password, OIDC, and SAML sign-ins

Helm / Compose note: The Helm chart exposes these under auth.oidc.*.

Gateway — SAML Authentication

Variable | Default | Description
CORDUM_SAML_ENABLED | false | Enable the SAML service-provider endpoints on the gateway
CORDUM_SAML_IDP_METADATA_URL | - | Remote IdP metadata URL the gateway should fetch on startup
CORDUM_SAML_IDP_METADATA | - | Inline IdP metadata XML (use instead of the URL for air-gapped installs)
CORDUM_SAML_BASE_URL | http://localhost:8081 | External gateway base URL used to publish metadata, ACS, and login endpoints
CORDUM_SAML_CERT_PATH | - | PEM certificate path for the service-provider signing / TLS cert
CORDUM_SAML_KEY_PATH | - | PEM private-key path paired with CORDUM_SAML_CERT_PATH
CORDUM_SAML_ENTITY_ID | metadata URL | Explicit SAML entity ID override for the service provider
CORDUM_SAML_BINDING | redirect | SP-initiated binding used for the login request (redirect or post)
CORDUM_SAML_RESPONSE_BINDING | post | Expected ACS response binding (post or redirect)
CORDUM_SAML_ALLOW_IDP_INITIATED | false | Allow IdP-initiated SSO responses with no stored RelayState
CORDUM_SAML_STATE_TTL | 10m | TTL for SAML RelayState/request tracking entries stored in Redis
CORDUM_AUTH_REDIRECT_URL | <ui-origin>/login | Post-auth redirect target used after the ACS callback completes
CORDUM_AUTH_SESSION_TTL | 24h | Browser/session token TTL for password, OIDC, and SAML sign-ins

Helm / Compose note: The Helm chart exposes these under auth.saml.*, and docker-compose.yml includes the same gateway variables as commented examples for local development.

Gateway — SCIM Provisioning

Variable | Default | Description
CORDUM_SCIM_BEARER_TOKEN | - | Shared bearer token required by all SCIM 2.0 provisioning endpoints under /api/v1/scim/v2/*

SCIM provisioning is additionally gated by the SCIM license entitlement. When the entitlement is disabled, discovery, user, and group routes return 403 tier_limit_exceeded even if a bearer token is configured.

If CORDUM_SCIM_BEARER_TOKEN is unset, Cordum can generate and store a Redis-backed SCIM token through the admin settings API (POST /api/v1/scim/settings/token) and the dashboard page at /settings/scim. If the env var is set, that value is used unless an operator later creates a Redis-managed override.

SCIM response locations and the dashboard-published endpoint URL are derived from the external gateway base URL (CORDUM_API_BASE_URL, CORDUM_API_BASE, or CORDUM_SAML_BASE_URL).

Helm note: The Helm chart exposes these under auth.scim.*, including auth.scim.existingSecret for referencing an existing Kubernetes secret instead of placing the bearer token inline.

Gateway — Advanced RBAC

Advanced RBAC provides role hierarchy with permission-based access control, gated by the RBAC license entitlement (Enterprise plan). When the entitlement is disabled, the gateway falls back to basic role string matching (admin/operator/viewer).

RBAC roles are stored in Redis (key prefix rbac:role:). Default roles (admin, operator, viewer) are bootstrapped on startup if not present.

Variable | Default | Description
CORDUM_RBAC_ROLE_DEFS | - | JSON array of custom role definitions to seed on startup (optional)

Dashboard: The roles management tab at /settings/users shows built-in and custom roles. Custom role creation/editing requires the RBAC entitlement.

API: Role management endpoints at /api/v1/auth/roles (see API Reference).

Gateway — User Authentication

Variable | Default | Description
CORDUM_USER_AUTH_ENABLED | false | Enable user/password auth (Redis-backed)
CORDUM_ADMIN_USERNAME | admin | Default admin username
CORDUM_ADMIN_PASSWORD | - | Admin password (creates user on first startup)
CORDUM_ADMIN_EMAIL | - | Optional admin email

Gateway — Pack Marketplace

Variable | Default | Description
CORDUM_PACK_CATALOG_URL | (built-in) | Official catalog URL
CORDUM_PACK_CATALOG_ID | (auto) | Catalog ID
CORDUM_PACK_CATALOG_TITLE | (auto) | Catalog display title
CORDUM_PACK_CATALOG_DEFAULT_DISABLED | - | Set to 1 to disable default catalog
CORDUM_MARKETPLACE_ALLOW_HTTP | - | Set to 1 for HTTP marketplace URLs
CORDUM_MARKETPLACE_HTTP_TIMEOUT | - | Fetch timeout (e.g., 15s)

Scheduler

Variable | Default | Description
SCHEDULER_METRICS_ADDR | :9090 | Metrics listen address
SCHEDULER_METRICS_PUBLIC | - | Set to 1 for non-loopback metrics in production
SCHEDULER_CONFIG_RELOAD_INTERVAL | 30s | Config overlay reload interval
JOB_META_TTL | - | Job metadata TTL (Go duration, e.g., 48h)
JOB_META_TTL_SECONDS | - | Job metadata TTL in seconds (takes precedence)
WORKER_SNAPSHOT_INTERVAL | - | Worker state snapshot interval
OUTPUT_POLICY_ENABLED | false | Enable output policy: true, 1
POLICY_CHECK_FAIL_MODE | closed | Behavior when safety kernel is unreachable during pre-dispatch input policy checks. closed (default): requeue with backoff. open: allow through with warning log and metric. See safety-kernel.md for risk implications.

Gateway + Scheduler — Boundary Hardening

These flags control the canonical topic registry, schema enforcement, worker attestation, and readiness gating described in ADR 009.

Variable | Default | Type | Service | Description
SCHEMA_ENFORCEMENT | warn | string (off, warn, enforce) | gateway + scheduler | Controls how registered topic schemas are enforced. The gateway uses it at submit time for POST /api/v1/jobs; the scheduler uses the same mode before dispatch. warn logs violations and continues, enforce rejects or fails the job on schema mismatch, off skips schema validation.
WORKER_ATTESTATION | off | string (off, warn, enforce) | scheduler | Controls whether scheduler heartbeat processing requires a valid worker credential token. warn accepts the heartbeat but logs attestation failures; enforce rejects unattested heartbeats; off skips attestation checks.
WORKER_READINESS_REQUIRED | false | bool | scheduler | When true, scheduling only considers workers that have recently advertised matching ready_topics in their handshake. When false, workers without readiness data remain eligible for backward compatibility.
WORKER_READINESS_TTL | 60s | duration | scheduler | Freshness window for handshake readiness state. After this TTL expires, the worker heartbeat may still be present, but readiness gating treats the worker as not ready until it handshakes again. Invalid or non-positive values fall back to 60s with a warning log.

Workflow Engine

Variable | Default | Description
WORKFLOW_ENGINE_HTTP_ADDR | - | HTTP listen address
WORKFLOW_ENGINE_SCAN_INTERVAL | - | Run scan interval
WORKFLOW_ENGINE_RUN_SCAN_LIMIT | - | Max runs to scan per tick
WORKFLOW_FOREACH_MAX_ITEMS | - | Max items in for-each expansion

Safety Kernel

Variable | Default | Description
SAFETY_KERNEL_ADDR | localhost:50051 | gRPC listen address
SAFETY_POLICY_PATH | config/safety.yaml | Path to safety policy file
SAFETY_POLICY_URL | - | Load policy from URL instead of file
SAFETY_POLICY_URL_ALLOWLIST | - | Comma-separated allowed hostnames for URL loading
SAFETY_POLICY_URL_ALLOW_PRIVATE | - | Allow private/loopback policy URLs (not recommended)
SAFETY_POLICY_MAX_BYTES | - | Max policy file size
SAFETY_KERNEL_TLS_CERT | - | TLS server certificate
SAFETY_KERNEL_TLS_KEY | - | TLS server private key
SAFETY_KERNEL_TLS_CA | - | Client TLS CA (for mTLS)
SAFETY_KERNEL_TLS_REQUIRED | - | Require TLS for kernel connections
SAFETY_KERNEL_INSECURE | - | Skip TLS verification (dev only)
SAFETY_DECISION_CACHE_TTL | - | Decision cache TTL (e.g., 5s, 250ms)
OUTPUT_SCANNERS_PATH | config/output_scanners.yaml | Path to scanner patterns file

Safety Kernel — Policy Signature Verification

Variable | Default | Description
SAFETY_POLICY_PUBLIC_KEY | - | Public key for signature verification (PEM)
SAFETY_POLICY_SIGNATURE | - | Inline signature
SAFETY_POLICY_SIGNATURE_PATH | - | Path to signature file
SAFETY_POLICY_SIGNATURE_REQUIRED | - | Require valid signature

Safety Kernel — Policy Reload / Overlays

Variable | Default | Description
SAFETY_POLICY_RELOAD_INTERVAL | - | Policy file reload interval
SAFETY_POLICY_CONFIG_SCOPE | - | Config service scope for overlay
SAFETY_POLICY_CONFIG_ID | - | Config service scope ID
SAFETY_POLICY_CONFIG_KEY | - | Config service data key
SAFETY_POLICY_CONFIG_DISABLE | - | Disable config service overlay

Context Engine

Variable | Default | Description
CONTEXT_ENGINE_ADDR | :50070 | gRPC listen address
CONTEXT_ENGINE_TLS_CERT | - | TLS server certificate
CONTEXT_ENGINE_TLS_KEY | - | TLS server private key
CONTEXT_ENGINE_TLS_CA | - | Client TLS CA (for connections to engine)
CONTEXT_ENGINE_TLS_REQUIRED | - | Require TLS connections
CONTEXT_ENGINE_INSECURE | - | Skip TLS verification
CONTEXT_ENGINE_MAX_ENTRY_BYTES | - | Max size per context entry
CONTEXT_ENGINE_MAX_CHUNK_SCAN | - | Max chunks to scan per retrieval

MCP Server

Variable | Default | Description
CORDUM_API_KEY | - | API key for gateway-backed MCP handlers
CORDUM_TENANT_ID | - | Tenant ID for MCP bridge/resource operations
MCP_TRANSPORT | stdio | Transport mode: stdio (default) or http
MCP_HTTP_ADDR | :8090 | HTTP listen address (only used when MCP_TRANSPORT=http)

HA note — HTTP transport: Set MCP_TRANSPORT=http to enable HTTP mode, which exposes /sse (SSE stream) and /message (POST JSON-RPC) endpoints. This allows running multiple MCP server replicas behind a load balancer. The default stdio mode supports only a single instance and is intended for local CLI integrations.

Audit Export

Variable | Default | Description
CORDUM_AUDIT_EXPORT_TYPE | - | Export type: webhook, syslog, datadog, cloudwatch
CORDUM_AUDIT_EXPORT_WEBHOOK_URL | - | Webhook endpoint URL
CORDUM_AUDIT_EXPORT_WEBHOOK_SECRET | - | Webhook HMAC signing secret
CORDUM_AUDIT_EXPORT_SYSLOG_ADDR | - | Syslog server address
CORDUM_AUDIT_EXPORT_DD_API_KEY | - | Datadog API key
CORDUM_AUDIT_EXPORT_DD_SITE | - | Datadog site (e.g., datadoghq.com)
CORDUM_AUDIT_EXPORT_DD_TAGS | - | Datadog tags (comma-separated)
CORDUM_AUDIT_EXPORT_CW_LOG_GROUP | - | CloudWatch log group
CORDUM_AUDIT_EXPORT_CW_LOG_STREAM | - | CloudWatch log stream
AWS_REGION | - | AWS region for CloudWatch
AWS_ACCESS_KEY_ID | - | AWS credentials
AWS_SECRET_ACCESS_KEY | - | AWS credentials
AUDIT_TRANSPORT | buffer | Audit transport: buffer (in-memory) or nats (NATS-backed, recommended for multi-replica)
CORDUM_AUDIT_BUFFER_SIZE | 1000 | In-memory audit buffer size (events)
CORDUM_AUDIT_EXPORT_MAX_RETRIES | 3 | Max retries before dropping a batch

NATS-Backed Audit Pipeline

When AUDIT_TRANSPORT=nats, audit events are published to the NATS subject sys.audit.export instead of being buffered in per-process memory. A consumer subscribes with queue group audit-exporters so exactly one replica handles each event. This provides:

  • Crash resilience — events survive process restarts when JetStream is enabled (at-least-once delivery)
  • Stateless replicas — audit events are no longer tied to the process that generated them
  • Automatic fallback — if NATS publish fails, events fall back to the local in-memory buffer

The consumer calls the configured SIEM exporter (CORDUM_AUDIT_EXPORT_TYPE) for each event. Failed exports trigger NATS redelivery (nak). Malformed messages are acked to prevent poison pill loops.

Note: For production HA deployments, enable JetStream on the sys stream to get durable audit delivery. Without JetStream, audit events use core NATS (at-most-once).

DLQ

Variable | Default | Description
CORDUM_DLQ_ENTRY_TTL_DAYS | - | DLQ entry TTL in days

Worker SDK

Variable | Default | Description
NATS_URL | nats://localhost:4222 | NATS URL for worker connections
WORKER_ID | - | Explicit worker ID (auto-generated if not set)

CLI TLS

Variable | Default | Description
CORDUM_TLS_CA | - | CA certificate path for CLI TLS verification
CORDUM_TLS_INSECURE | - | Set to 1 to skip TLS verification (dev/debug only)

Dashboard

Variable | Default | Description
CORDUM_API_UPSTREAM_SCHEME | http | Set to https when gateway serves TLS
CORDUM_DASHBOARD_EMBED_API_KEY | - | Embed API key in dashboard (dev only)

Docker Compose Helpers

| Variable | Default | Description |
| --- | --- | --- |
| COMPOSE_HTTP_TIMEOUT | | Docker Compose HTTP timeout |
| DOCKER_CLIENT_TIMEOUT | | Docker client timeout |

Configuration Reference

Complete reference for all Cordum configuration files, environment variables, and the config overlay system.

For a quick-start overview, see configuration.md.


Overview

Cordum uses three configuration layers:

  1. YAML config files — mounted into containers from config/
  2. Environment variables — per-service settings, secrets, addresses
  3. Config overlay system — runtime config stored in Redis, merged by scope hierarchy

Config Files

| File | Purpose | Validated |
| --- | --- | --- |
| config/pools.yaml | Topic-to-pool routing, pool capability requirements | Yes (JSON Schema) |
| config/timeouts.yaml | Per-topic and per-workflow timeouts, reconciler settings | Yes (JSON Schema) |
| config/safety.yaml | Safety kernel input/output rules, MCP allow/deny lists | Yes (JSON Schema) |
| config/output_scanners.yaml | Output content scanner regex patterns (secret, PII, injection) | No |
| config/system.yaml | System-wide config (budgets, rate limits, models, SLOs); stored via config service | No |
| config/nats.conf | NATS server config (JetStream sync_interval) | N/A |

The control plane validates pools, timeouts, and safety files against embedded JSON schemas in core/infra/config/schema/. Invalid configs return errors; for timeouts, the system falls back to defaults.

Config Loading Order

  1. YAML files loaded from paths specified by env vars (or defaults)
  2. On startup, bootstrapConfig() writes file-based pools/timeouts into the Redis config service
  3. Runtime overlay from Redis config service takes precedence over files
  4. Env vars override specific settings (e.g., OUTPUT_POLICY_ENABLED overrides safety.yaml)

system.yaml — System Configuration

config/system.yaml is not mounted by default in Docker Compose. It is a payload for the config service — store it via POST /api/v1/config or let packs write fragments.

safety

Controls system-wide safety defaults. These supplement the rule-based policy in safety.yaml.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| pii_detection_enabled | bool | true | Enable PII detection in inputs |
| pii_action | string | "block" | Action on PII detection: block, redact, warn |
| pii_types_to_detect | string[] | ["email","phone"] | PII categories to scan for |
| injection_detection | bool | true | Enable prompt injection detection |
| injection_sensitivity | string | "high" | Sensitivity level: low, medium, high |
| content_filter_enabled | bool | true | Enable content category filtering |
| blocked_categories | string[] | ["hate_speech","sexual_content"] | Blocked content categories |
| anomaly_detection | bool | false | Enable anomaly detection |
| allowed_topics | string[] | [] | Allowlisted topics (empty = all allowed) |
| denied_topics | string[] | [] | Denylisted topics |

budget

Cost control and attribution settings.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| daily_limit_usd | float | 1000.0 | Daily spend limit in USD |
| monthly_limit_usd | float | 10000.0 | Monthly spend limit in USD |
| per_job_max_usd | float | 5.0 | Maximum cost per single job |
| per_workflow_max_usd | float | 50.0 | Maximum cost per workflow run |
| alert_at_percent | int[] | [50,75,90,100] | Alert at these % of limit |
| action_at_limit | string | "throttle" | Action when limit hit: throttle, deny, alert |
| cost_attribution_enabled | bool | true | Enable per-tenant cost tracking |
| cost_centers | string[] | [] | Cost center tags for attribution |

rate_limits

System-level budget rate limits enforced by the scheduler. These are independent from gateway-level API rate limiting (API_RATE_LIMIT_RPS env var), which is enforced by the api-gateway middleware before requests reach the scheduler.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| requests_per_minute | int | 120000 | Sustained throughput limit (2000 req/sec) |
| requests_per_hour | int | 7200000 | Hourly throughput limit |
| burst_size | int | 4000 | Token bucket burst: peak spike capacity before throttling |
| concurrent_jobs | int | 10000 | Max concurrent jobs across all tenants |
| concurrent_workflows | int | 5 | Max concurrent workflows |
| queue_size | int | 5000 | Max pending queue depth |

retry

Default retry policy for jobs (overridable per-topic in timeouts.yaml).

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_retries | int | 3 | Maximum retry attempts |
| initial_backoff | duration | 1s | Initial backoff delay |
| max_backoff | duration | 30s | Maximum backoff delay |
| backoff_multiplier | float | 2.0 | Exponential backoff multiplier |
| retryable_errors | string[] | ["network_error","timeout"] | Error types that trigger retry |
| non_retryable_errors | string[] | ["bad_request"] | Error types that skip retry |
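To make the interaction of these retry fields concrete, the sketch below computes a capped exponential backoff schedule from the defaults. It assumes the plain formula initial_backoff × multiplier^attempt with no jitter; whether Cordum adds jitter or rounds differently is not stated here, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(max_retries=3, initial_backoff=1.0,
                   max_backoff=30.0, backoff_multiplier=2.0):
    """Return the delay in seconds before each retry attempt."""
    delays = []
    delay = initial_backoff
    for _ in range(max_retries):
        delays.append(min(delay, max_backoff))  # never exceed max_backoff
        delay *= backoff_multiplier
    return delays


print(backoff_delays())               # [1.0, 2.0, 4.0]
print(backoff_delays(max_retries=7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With the defaults, max_backoff only takes effect from the sixth retry onward, so it matters mainly for topics that raise max_retries.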

resources

Resource allocation defaults.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| default_priority | string | "interactive" | Default job priority |
| max_timeout_seconds | int | 300 | Maximum allowed timeout |
| default_timeout_seconds | int | 60 | Default job timeout |
| max_parallel_steps | int | 10 | Max parallel workflow steps |
| preemption_enabled | bool | true | Allow job preemption |
| preemption_grace_period | int | 30 | Seconds before preemption |

models

Allowed LLM model configuration.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| allowed_models | string[] | ["gpt-4","llama-3","claude-3"] | Permitted model identifiers |
| default_model | string | "gpt-4" | Default model for jobs |
| fallback_models | string[] | ["llama-3"] | Models to try if primary unavailable |

context

Context engine retrieval settings.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| allowed_memory_ids | string[] | ["repo:*","kb:*"] | Allowed memory ID patterns |
| denied_memory_ids | string[] | [] | Denied memory ID patterns |
| max_context_tokens | int | 4000 | Max tokens to retrieve |
| max_retrieved_chunks | int | 10 | Max chunks per retrieval |
| cross_tenant_access | bool | false | Allow cross-tenant context access |
| allowed_connectors | string[] | ["github","slack"] | Permitted connector types |
| redaction_policies | object | {} | Config field defined but not yet consumed at runtime |

slo

Service-level objective configuration.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| target_p95_latency_ms | int | 1000 | Target p95 latency in milliseconds |
| error_rate_budget | float | 0.01 | Error rate budget (1%) |
| timeout_seconds | int | 60 | SLO evaluation window timeout |
| critical | bool | false | Mark as critical service |

experiment (NOT YET IMPLEMENTED)

Struct exists in code but no runtime code reads these fields.

experiment:
  enabled: false
  name: ""
  buckets: []

integrations (NOT YET IMPLEMENTED)

Struct exists in code but no runtime code reads these fields.

integrations:
  github:
    enabled: false
    connection_id: ""
    allowed_teams: []
    allowed_scopes: []
  gitlab: # same structure
  slack: # same structure
  jira: # same structure

observability (NOT YET IMPLEMENTED)

No backing code or struct exists.

observability:
  otel:
    enabled: false
    endpoint: ""
    protocol: "grpc" # grpc | http
    headers: {}
    resource_attributes: {}
  grafana:
    base_url: ""
    dashboards:
      system_overview: ""
      workflow_performance: ""

alerting (NOT YET IMPLEMENTED)

No backing code or struct exists.

alerting:
  pagerduty:
    enabled: false
    integration_key: ""
    severity: "critical"
  slack:
    enabled: false
    webhook_url: ""
    severity: "error"

Config Overlay System

The config service stores configuration fragments in Redis, organized by scope hierarchy. Lower scopes override higher ones.

Scope Hierarchy

system (global defaults)
└── org (organization overrides)
    └── team (team overrides)
        └── workflow (workflow-specific)
            └── step (step-specific)

Redis Key Format

cfg:{scope}:{scope_id}

Examples:

  • cfg:system:default — system-wide config (pools, timeouts, pack catalogs)
  • cfg:system:policy — policy bundle fragments from packs
  • cfg:system:packs — installed pack registry
  • cfg:system:pack_catalogs — marketplace catalog definitions
  • cfg:org:acme-corp — organization-level overrides
  • cfg:team:platform — team-level overrides
  • cfg:workflow:my-workflow — workflow-specific config

Document Structure

Each config document in Redis is a JSON object:

{
  "scope": "system",
  "scope_id": "default",
  "data": {
    "pools": { ... },
    "timeouts": { ... },
    "_poolsFileHash": "sha256...",
    "_timeoutsFileHash": "sha256..."
  },
  "revision": 3,
  "updated_at": "2026-01-15T10:30:00Z",
  "meta": {}
}

bootstrapConfig() Behavior

On scheduler startup, bootstrapConfig() syncs file-based config into Redis:

  1. Reads cfg:system:default from Redis
  2. For pools and timeouts:
    • If the key does not exist in Redis, writes the file-based config (creates key)
    • If the key exists, compares SHA-256 hashes of the file content
    • If hashes differ, updates Redis with new file content (file wins)
    • If hashes match, no-op
  3. This means dashboard/API changes to pools/timeouts persist until the file changes
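The hash comparison above can be sketched as follows. This is an illustration under assumptions, not the actual bootstrapConfig() code: a plain dict stands in for the cfg:system:default document, the raw file text stands in for the parsed YAML, and `bootstrap_section` is a hypothetical name. The hash field names follow the document structure shown earlier.

```python
import hashlib


def bootstrap_section(doc, name, file_content):
    """Sync one file-backed section ("pools" or "timeouts") into the document.

    Returns "created", "updated", or "unchanged".
    """
    hash_key = f"_{name}FileHash"
    file_hash = hashlib.sha256(file_content.encode("utf-8")).hexdigest()
    data = doc.setdefault("data", {})
    if name not in data:
        outcome = "created"        # nothing stored yet: write the file config
    elif data.get(hash_key) != file_hash:
        outcome = "updated"        # file changed since last sync: file wins
    else:
        return "unchanged"         # same file as last sync: keep stored value
    data[name] = file_content      # real code would store the parsed YAML
    data[hash_key] = file_hash
    return outcome


doc = {}
print(bootstrap_section(doc, "pools", "topics: {}"))   # created
doc["data"]["pools"] = "edited via dashboard"
print(bootstrap_section(doc, "pools", "topics: {}"))   # unchanged, edit kept
print(bootstrap_section(doc, "pools", "topics: {a: [b]}"))  # updated
```

Because the stored hash tracks the file rather than the stored value, a dashboard edit survives restarts until someone changes the file itself.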

Config Reload

Config changes propagate to all replicas through two mechanisms:

  1. NATS notification (immediate) — When PUT /api/v1/config writes to Redis, the API gateway publishes a lightweight notification to sys.config.changed (broadcast, empty queue group). All scheduler replicas subscribe and reload config from Redis immediately on receipt.

  2. Polling fallback (30s) — Each scheduler replica polls Redis for config changes on a configurable interval. This catches any notifications missed due to transient NATS issues.

  • Env var: SCHEDULER_CONFIG_RELOAD_INTERVAL (default 30s)
  • NATS subject: sys.config.changed — broadcast to all replicas
  • On each reload (notification or poll), the scheduler reads cfg:system:default and compares hashes
  • If pools changed: updates routing table live
  • If timeouts changed: updates reconciler timeouts live

Note: The NATS message is a notification only — it does not contain the config data itself. Replicas always reload from Redis to ensure consistency.

Resetting Cached Config

To force a config reload from files:

# Delete the Redis config key
redis-cli DEL cfg:system:default

# The next scheduler tick (or restart) will re-bootstrap from files

Effective Config Resolution

The config service merges scopes top-down. For a given request context:

effective = merge(system, org, team, workflow, step)

Each scope's data map shallow-merges into the result. Keys in lower scopes override higher scopes.
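A short sketch of what shallow merging implies in practice. `effective_config` is a hypothetical helper and the sample keys are invented for illustration; the point is that a lower scope replaces a top-level key wholesale rather than merging inside it.

```python
def effective_config(*scope_data):
    """Shallow-merge scope data maps, ordered from system down to step."""
    effective = {}
    for data in scope_data:
        effective.update(data)  # later (lower) scopes replace whole keys
    return effective


system = {"timeouts": {"dispatch": 300, "running": 900}, "pools": {"general": {}}}
org = {"timeouts": {"running": 1200}}

merged = effective_config(system, org)
print(merged["timeouts"])  # {'running': 1200}; 'dispatch' is gone
print(merged["pools"])     # {'general': {}}; inherited from system
```

Under this reading, an org-level override of a nested value needs to restate every sibling it wants to keep.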

API Endpoints

  • GET /api/v1/config?scope={scope}&scope_id={id} — read a config document
  • PUT /api/v1/config — write/update a config document
  • GET /api/v1/config/effective?scope={scope}&scope_id={id} — get merged effective config

Fresh Install Behavior

On fresh installs, no cfg:system:default key exists in Redis. When the dashboard requests GET /api/v1/config (which defaults to scope=system&scope_id=default), the gateway returns 200 {} — an empty JSON object. The dashboard renders its built-in defaults (safety stance, rate limits, retention days, etc.) until an admin saves settings via the Settings page or POST /api/v1/config.

No manual config seeding is required. Non-default scope queries (e.g., ?scope=org&scope_id=acme) still return 404 if the config document does not exist.


pools.yaml — Worker Pool Routing

Defines how job topics are routed to worker pools.

Example

topics:
  "job.default": ["general"]
  "job.hello-pack.echo": ["hello-pack"]
  "job.code-review": ["code-review", "general"] # fallback order
  "job.compliance.*": ["compliance"]

pools:
  general:
    requires: []
  hello-pack:
    requires: []
  code-review:
    requires: ["code.read", "code.write"]
  compliance:
    requires: ["compliance.review", "data.access"]

Topics Section

Maps NATS subject patterns to ordered lists of pool names.

| Field | Type | Description |
| --- | --- | --- |
| topics | map[string]string[] | Topic pattern → ordered list of eligible pool names |

  • Topics use exact match or NATS wildcard patterns
  • The list ordering defines fallback priority: the first pool with capacity wins
  • Worker pool name must match the pool a worker heartbeats as

Pools Section

Defines pool profiles and capability requirements.

| Field | Type | Description |
| --- | --- | --- |
| pools | map[string]PoolDef | Pool name → pool definition |
| pools.*.requires | string[] | Capabilities a worker must declare to join this pool |

Routing Algorithm

  1. Scheduler receives a job with topic (e.g., job.code-review)
  2. Looks up topic in topics map → gets pool list ["code-review", "general"]
  3. For each pool in order:
    a. Checks if pool has workers with required capabilities (requires list)
    b. Checks if pool has capacity (workers available)
    c. First match wins: job dispatched to that pool
  4. If no pool matches → job stays in pending state for reconciler
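The routing steps above can be sketched roughly as below. The worker records (`pool`, `capabilities`, `free_slots`) and the function names are hypothetical, and the wildcard matcher implements standard NATS subject semantics (`*` for one token, `>` for one or more trailing tokens); how Cordum orders competing wildcard patterns is not specified here, so this sketch simply takes the first pattern that matches.

```python
def subject_matches(pattern, subject):
    """NATS-style match: '*' is one token, '>' is one or more trailing tokens."""
    p_tokens, s_tokens = pattern.split("."), subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i
        if i >= len(s_tokens) or (p != "*" and p != s_tokens[i]):
            return False
    return len(p_tokens) == len(s_tokens)


def route(topic, topics, pools, workers):
    """Return the first eligible pool for a topic, or None (job stays pending)."""
    pool_names = topics.get(topic)  # exact match first
    if pool_names is None:
        pool_names = next((names for pat, names in topics.items()
                           if subject_matches(pat, topic)), [])
    for name in pool_names:
        required = set(pools.get(name, {}).get("requires", []))
        for w in workers:
            if (w["pool"] == name and required <= set(w["capabilities"])
                    and w["free_slots"] > 0):
                return name
    return None


topics = {"job.code-review": ["code-review", "general"],
          "job.compliance.*": ["compliance"]}
pools = {"general": {"requires": []},
         "code-review": {"requires": ["code.read", "code.write"]},
         "compliance": {"requires": ["compliance.review", "data.access"]}}
workers = [{"pool": "code-review", "capabilities": ["code.read"], "free_slots": 1},
           {"pool": "general", "capabilities": [], "free_slots": 2}]

print(route("job.code-review", topics, pools, workers))  # general
```

Here the code-review worker lacks code.write, so the job falls through to the second pool in the list.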

Schema

Validated against core/infra/config/schema/pools.schema.json.


timeouts.yaml — Timeout Configuration

Controls per-topic timeouts, per-workflow timeouts, and reconciler settings.

Example

reconciler:
  dispatch_timeout_seconds: 300 # 5 min for pending→dispatched
  running_timeout_seconds: 900 # 15 min default for running jobs
  scan_interval_seconds: 30 # check every 30s

topics:
  "job.compliance.review":
    timeout_seconds: 600 # 10 min timeout
    max_retries: 5
  "job.quick-check":
    timeout_seconds: 30
    max_retries: 1

workflows:
  "long-pipeline":
    child_timeout_seconds: 1800 # 30 min per step
    total_timeout_seconds: 7200 # 2 hr total
    max_retries: 2

Reconciler Section

Controls how the scheduler detects and handles stalled jobs.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| dispatch_timeout_seconds | int | 300 (5m) | Max time for pending → dispatched transition |
| running_timeout_seconds | int | 900 (15m) | Max time for dispatched → completed transition. Per-topic overrides available via topics.<topic>.running_timeout_seconds. |
| scan_interval_seconds | int | 30 | How often reconciler scans for stale jobs |

Topics Section

Per-topic timeout overrides.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| topics.*.timeout_seconds | int | (reconciler default) | Job execution timeout for this topic |
| topics.*.max_retries | int | 0 | Max retries for this topic |

Workflows Section

Per-workflow timeout overrides.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| workflows.*.child_timeout_seconds | int | (reconciler default) | Timeout per child step |
| workflows.*.total_timeout_seconds | int | (none) | Total workflow run timeout |
| workflows.*.max_retries | int | 0 | Max retries per step |

Schema

Validated against core/infra/config/schema/timeouts.schema.json.


safety.yaml — Safety Policy

Defines safety kernel input rules, output rules, and MCP (Model Context Protocol) configuration.

For full details on the safety kernel, see safety-kernel.md. For output policy, see output-policy.md.

Example

version: "1"
rules:
  - id: fraud-review
    match:
      capabilities: ["bank.transfer"]
      risk_tags: ["financial", "high_value"]
    decision: require_approval
    reason: "Financial transactions require human approval"

  - id: auto-allow-validators
    match:
      capabilities: ["validate.*"]
    decision: allow
    reason: "Read-only validation is always safe"

output_policy:
  enabled: false
  fail_mode: open # open = allow on scanner error, closed = deny

output_rules:
  - id: secret_leak
    match:
      detectors: ["secret_leak"]
    decision: quarantine
    reason: "Potential secret in output"

  - id: pii
    match:
      detectors: ["pii"]
    decision: redact
    reason: "PII detected, redacting"

tenants:
  acme-corp:
    mcp:
      allow: ["github", "slack"]
      deny: ["*"]
  default:
    mcp:
      allow: ["*"]
      deny: []

Rules Section (Input Policy)

| Field | Type | Description |
| --- | --- | --- |
| rules[].id | string | Unique rule identifier |
| rules[].match.capabilities | string[] | Capability patterns to match (supports * wildcard) |
| rules[].match.risk_tags | string[] | Risk tag patterns to match |
| rules[].match.metadata | map | Key-value metadata conditions |
| rules[].decision | string | allow, deny, require_approval, throttle |
| rules[].reason | string | Human-readable reason |
| rules[].throttle_duration | duration | Required if decision is throttle |

Rules are evaluated top-to-bottom; first match wins.

Velocity Rule Fragments

Velocity rules are regular rules[] entries stored as dedicated policy bundle fragments at cfg:system:policy -> bundles -> velocity/{id}. They do not change the safety-kernel evaluator; they only add managed rule fragments that use the existing velocity block on input rules.

Example fragment:

version: "1"
rules:
  - id: login-burst
    match:
      topics: ["job.auth.login"]
      tenants: ["default"]
      risk_tags: ["auth"]
    velocity:
      max_requests: 3
      window_seconds: 60
      key: tenant
    decision: require_approval
    reason: "Repeated login attempts require review"

| Field | Type | Description |
| --- | --- | --- |
| rules[].velocity.max_requests | int | Requests allowed inside the sliding window before the rule fires |
| rules[].velocity.window_seconds | int | Sliding-window size in seconds (1 to 86400) |
| rules[].velocity.key | string | Bucket key expression (tenant, topic, actor_id, actor_type, capability, pack_id, or labels.<key>; compound keys use :) |
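As a rough illustration of how max_requests and window_seconds interact, the sketch below keeps per-key timestamps in memory. `VelocityRule` is a hypothetical class; the real evaluator's storage and its exact boundary behaviour (for example whether the request that reaches max_requests or the one after it triggers the rule) are assumptions here.

```python
from collections import defaultdict, deque


class VelocityRule:
    """In-memory sliding-window counter keyed by a bucket key such as a tenant."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def fires(self, key, now):
        """Record a request at time `now` and report whether the rule fires."""
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) > self.max_requests


rule = VelocityRule(max_requests=3, window_seconds=60)
print([rule.fires("default", t) for t in (0, 10, 20, 30)])  # [False, False, False, True]
print(rule.fires("default", 100))                           # False, window has moved on
```

With the fragment's values, the fourth login inside 60 seconds for the same tenant would be sent for approval, while other tenants keep their own counters.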

Default Decision

The default_decision field at the top of safety.yaml controls what happens when no input rule matches a job. The production default is deny (fail-closed), meaning unmatched jobs are rejected. To whitelist specific topics, add decision: allow rules.

# Fail-closed: unmatched jobs are denied
default_decision: deny
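First-match evaluation with a fail-closed default can be sketched as below. The matching semantics are an assumption made for illustration (a field matches when any job value matches any pattern, and every field present in the rule must match); `decide` and `field_matches` are hypothetical names, and Python's fnmatch stands in for the kernel's wildcard matching.

```python
from fnmatch import fnmatchcase


def field_matches(patterns, values):
    """An absent field matches anything; otherwise any value must match any pattern."""
    return not patterns or any(fnmatchcase(v, p) for v in values for p in patterns)


def decide(job, rules, default_decision="deny"):
    """Return the decision of the first matching rule, else the default."""
    for rule in rules:
        match = rule.get("match", {})
        if (field_matches(match.get("capabilities"), job.get("capabilities", []))
                and field_matches(match.get("risk_tags"), job.get("risk_tags", []))):
            return rule["decision"]
    return default_decision


rules = [
    {"id": "fraud-review",
     "match": {"capabilities": ["bank.transfer"], "risk_tags": ["financial", "high_value"]},
     "decision": "require_approval"},
    {"id": "auto-allow-validators",
     "match": {"capabilities": ["validate.*"]},
     "decision": "allow"},
]

print(decide({"capabilities": ["validate.schema"]}, rules))  # allow
print(decide({"capabilities": ["email.send"]}, rules))       # deny
```

Because the first match wins, narrower rules belong above broader ones; anything no rule covers falls to default_decision.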

Output Policy Section

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| output_policy.enabled | bool | false | Enable output scanning |
| output_policy.fail_mode | string | "closed" | open = allow on scanner error, closed = quarantine on scanner error (recommended for production) |

Output Rules Section

| Field | Type | Description |
| --- | --- | --- |
| output_rules[].id | string | Unique rule identifier |
| output_rules[].match.topics | string[] | Topic patterns |
| output_rules[].match.capabilities | string[] | Capability patterns |
| output_rules[].match.risk_tags | string[] | Risk tag patterns |
| output_rules[].match.content_patterns | string[] | Regex patterns for content matching |
| output_rules[].match.detectors | string[] | Scanner detector names (secret_leak, pii, injection) |
| output_rules[].match.max_output_bytes | int | Maximum output size in bytes |
| output_rules[].decision | string | allow, deny, quarantine, redact |
| output_rules[].reason | string | Human-readable reason |

Tenants Section

Per-tenant MCP tool access control.

| Field | Type | Description |
| --- | --- | --- |
| tenants.*.mcp.allow | string[] | Allowed MCP tool/resource patterns |
| tenants.*.mcp.deny | string[] | Denied MCP tool/resource patterns |

Schema

Validated against core/infra/config/schema/safety_policy.schema.json.


output_scanners.yaml — Output Scanner Patterns

Defines regex-based content scanners for output policy enforcement. Loaded by the safety kernel when OUTPUT_POLICY_ENABLED=true.

Example

scanners:
  secret:
    patterns:
      - name: aws_access_key
        regex: "AKIA[0-9A-Z]{16}"
        severity: critical
        confidence: high
      - name: github_token
        regex: "gh[ps]_[A-Za-z0-9_]{36,}"
        severity: critical
        confidence: high
      - name: generic_api_key
        regex: "(?i)(api[_-]?key|apikey|secret[_-]?key)\\s*[:=]\\s*['\"]?[A-Za-z0-9/+=]{20,}"
        severity: high
        confidence: medium
  pii:
    patterns:
      - name: email_address
        regex: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
        severity: medium
        confidence: high
      - name: ssn
        regex: "\\b\\d{3}-\\d{2}-\\d{4}\\b"
        severity: critical
        confidence: high
  injection:
    patterns:
      - name: prompt_injection
        regex: "(?i)(ignore previous|disregard|forget all|system prompt)"
        severity: high
        confidence: medium

Scanner Definition

| Field | Type | Description |
| --- | --- | --- |
| scanners | map[string]Scanner | Scanner name → scanner definition |
| scanners.*.patterns | Pattern[] | List of regex patterns |
| scanners.*.patterns[].name | string | Pattern identifier |
| scanners.*.patterns[].regex | string | Go-compatible regex pattern |
| scanners.*.patterns[].severity | string | critical, high, medium, low |
| scanners.*.patterns[].confidence | string | high, medium, low |
| scanners.*.patterns[].context_required | bool | Whether surrounding context needed for match |
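For a quick way to try patterns before adding them to the file, the sketch below applies two of the example patterns to a string. It is not the safety kernel's scanner: `scan` and the `SCANNERS` layout are hypothetical, and it uses Python's `re`, which agrees with Go's regexp on these simple patterns but differs on some syntax, so patterns should still be validated against Go.

```python
import re

# (name, regex, severity) tuples grouped by scanner, taken from the example file.
SCANNERS = {
    "secret": [("aws_access_key", r"AKIA[0-9A-Z]{16}", "critical")],
    "pii": [("ssn", r"\b\d{3}-\d{2}-\d{4}\b", "critical")],
}


def scan(text, scanners=SCANNERS):
    """Return (scanner, pattern name, severity) for every pattern found in text."""
    findings = []
    for scanner, patterns in scanners.items():
        for name, regex, severity in patterns:
            if re.search(regex, text):
                findings.append((scanner, name, severity))
    return findings


print(scan("key=AKIAABCDEFGHIJKLMNOP"))  # [('secret', 'aws_access_key', 'critical')]
print(scan("nothing sensitive here"))    # []
```

The scanner names returned here would correspond to the detector names referenced by output_rules[].match.detectors, though that mapping is defined by the kernel rather than by this sketch.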

Env Var

| Variable | Default | Description |
| --- | --- | --- |
| OUTPUT_SCANNERS_PATH | config/output_scanners.yaml | Path to scanner definitions file |

Environment Variables Master Table

Global / Shared

| Variable | Default | Required | Description |
| --- | --- | --- | --- |
| CORDUM_ENV | | No | Set to production or prod for strict security defaults |
| CORDUM_PRODUCTION | false | No | Alternative: set to true for production mode |
| CORDUM_TLS_MIN_VERSION | 1.2 (dev), 1.3 (prod) | No | Minimum TLS version: 1.2 or 1.3 |
| CORDUM_LOG_FORMAT | text | No | Log format: json or text |
| CORDUM_GRPC_REFLECTION | | No | Set to 1 to enable gRPC reflection (dev only) |
| NATS_URL | nats://localhost:4222 | Yes | NATS server URL |
| REDIS_URL | redis://localhost:6379 | Yes | Redis URL (Compose: redis://:${REDIS_PASSWORD}@redis:6379; password required) |
| NATS_USE_JETSTREAM | 0 | No | Enable NATS JetStream: 0 or 1 |
| POOL_CONFIG_PATH | config/pools.yaml | No | Path to pools config |
| TIMEOUT_CONFIG_PATH | config/timeouts.yaml | No | Path to timeouts config. Production mode: if explicitly set and the file cannot be loaded or parsed, the scheduler exits with an error. In dev mode, falls back to built-in defaults with a warning. |
| SAFETY_POLICY_PATH | config/safety.yaml | No | Path to safety policy |
| SAFETY_KERNEL_ADDR | localhost:50051 | No | Safety kernel gRPC address |
| CONTEXT_ENGINE_ADDR | :50070 | No | Context engine gRPC address |
| OUTPUT_POLICY_ENABLED | false | No | Enable output policy scanning: true, 1 |
| CORDUM_TENANT_ID | | No | Default tenant ID for SDK/MCP clients |
| CORDUM_INSTANCE_ID | os.Hostname() | No | Override pod name used in Prometheus pod label. Defaults to hostname; falls back to "unknown" |

Prometheus pod label: All Cordum metrics include a pod const label (os.Hostname() or CORDUM_INSTANCE_ID) so Prometheus can distinguish replicas in HA deployments. Use sum by (pod) (cordum_scheduler_jobs_received_total) for per-replica breakdown.

Licensing

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_LICENSE_FILE | | Path to license JSON file. If not set, checks ~/.cordum/license.json and /etc/cordum/license.json |
| CORDUM_LICENSE_TOKEN | | License token (base64-encoded or raw JSON). Alternative to file-based licensing |
| CORDUM_LICENSE_PUBLIC_KEY | embedded | Base64-encoded Ed25519 public key for signature verification |
| CORDUM_LICENSE_PUBLIC_KEY_PATH | | Path to public key file (alternative to inline) |

No license = Community tier (3 workers, 3 concurrent jobs, 500 RPS, 7-day audit retention). Invalid or expired licenses degrade to Community — Cordum never crashes or blocks startup due to licensing.

Telemetry

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_TELEMETRY_MODE | anonymous | Telemetry mode: off (no collection), local_only (collect but don't report), anonymous (collect and report aggregate stats) |
| CORDUM_TELEMETRY_ENDPOINT | https://telemetry.cordum.io/v1/report | HTTPS endpoint for anonymous telemetry reports |

Telemetry is independent from licensing. It never collects PII, prompts, secrets, or job content. Operators can opt out at any time via CORDUM_TELEMETRY_MODE=off or POST /api/v1/telemetry/consent.

NATS TLS

| Variable | Default | Description |
| --- | --- | --- |
| NATS_TLS_CA | | CA certificate path for NATS TLS |
| NATS_TLS_CERT | | Client certificate path |
| NATS_TLS_KEY | | Client private key path |
| NATS_TLS_INSECURE | | Skip TLS verification |
| NATS_TLS_SERVER_NAME | | TLS server name override |

NATS JetStream

| Variable | Default | Description |
| --- | --- | --- |
| NATS_JS_ACK_WAIT | 10m | JetStream ack wait duration |
| NATS_JS_MAX_AGE | 7d | JetStream message max age |
| NATS_JS_REPLICAS | 1 | JetStream stream replication factor |

Redis TLS

| Variable | Default | Description |
| --- | --- | --- |
| REDIS_TLS_CA | | CA certificate path for Redis TLS |
| REDIS_TLS_CERT | | Client certificate path |
| REDIS_TLS_KEY | | Client private key path |
| REDIS_TLS_INSECURE | | Skip TLS verification |
| REDIS_TLS_SERVER_NAME | | TLS server name override |
| REDIS_CLUSTER_ADDRESSES | | Comma-separated cluster seeds (host:port) |

Redis Data TTL

| Variable | Default | Description |
| --- | --- | --- |
| REDIS_DATA_TTL_SECONDS | | Data TTL in seconds (takes precedence) |
| REDIS_DATA_TTL | | Data TTL as Go duration (e.g., 24h) |

Redis Connection Pool

| Variable | Default | Description |
| --- | --- | --- |
| REDIS_POOL_SIZE | 20 | Max connections per Redis node. Each service replica opens up to this many connections. |
| REDIS_MIN_IDLE_CONNS | 5 | Minimum idle connections kept warm per Redis node. Reduces cold-start latency for bursty traffic. |

Sizing guidance: With N service replicas × P pool size × M Redis nodes, total connections ≈ N×P×M. For example, 3 scheduler replicas × 50 pool × 1 Redis = 150 connections. Redis default maxclients is 10000, so pool sizes up to 100 are safe for typical deployments. The scheduler benefits from higher pool sizes (recommend 50) due to concurrent job dispatch; other services can use the default 20.
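The sizing arithmetic can be checked with a few lines. `redis_connection_budget` is a hypothetical helper, and the gateway replica count in the second call is an invented example; the 10000 default mirrors the Redis maxclients default mentioned above.

```python
def redis_connection_budget(services, redis_nodes=1, maxclients=10000):
    """Return (total connections, fits under maxclients).

    `services` maps a service name to (replicas, pool_size).
    """
    total = sum(replicas * pool for replicas, pool in services.values()) * redis_nodes
    return total, total <= maxclients


print(redis_connection_budget({"scheduler": (3, 50)}))                      # (150, True)
print(redis_connection_budget({"scheduler": (3, 50), "gateway": (2, 20)}))  # (190, True)
```

This is an upper bound: pools open connections on demand, so steady-state usage is usually lower than replicas × pool size.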

Invalid values (non-numeric, zero, negative) are replaced with the defaults, and a warning is logged.

Gateway

| Variable | Default | Description |
| --- | --- | --- |
| GATEWAY_GRPC_ADDR | :50051 | gRPC listen address |
| GATEWAY_HTTP_ADDR | :8080 | HTTP listen address |
| GATEWAY_METRICS_ADDR | :9090 | Metrics listen address |
| GATEWAY_METRICS_PUBLIC | | Set to 1 for non-loopback metrics in production |
| GATEWAY_HTTP_TLS_CERT | | HTTP TLS certificate path |
| GATEWAY_HTTP_TLS_KEY | | HTTP TLS private key path |
| GRPC_TLS_CERT | | gRPC TLS certificate path |
| GRPC_TLS_KEY | | gRPC TLS private key path |
| GATEWAY_MAX_JOB_PAYLOAD_BYTES | 2097152 (2 MB) | Max job submission payload size in bytes |
| GATEWAY_MAX_BODY_BYTES | 1048576 (1 MB) | Max HTTP request body size in bytes |
| GATEWAY_MAX_JSON_BODY_BYTES | | Max JSON request body size |
| TENANT_ID | | Single-tenant default ID |
| ARTIFACT_MAX_BYTES | | Max artifact upload/download size |
| WORKFLOW_FOREACH_MAX_ITEMS | | Max items in workflow for-each expansion |
| POLICY_CHECK_FAIL_MODE | closed | Behavior when Safety Kernel is unreachable during policy evaluation (both gateway submit-time and scheduler dispatch-time). closed (default): reject the job. open: allow with warning log. |

Gateway — API Keys

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_API_KEY | | Single API key |
| API_KEY | | Fallback if CORDUM_API_KEY not set |
| CORDUM_API_KEYS | | Multiple keys: comma-separated or JSON array |
| CORDUM_API_KEYS_PATH | | Path to keys file (reloads on change) |
| CORDUM_ALLOW_INSECURE_NO_AUTH | | Set to 1 for no-auth mode (dev only) |
| CORDUM_ALLOW_HEADER_PRINCIPAL | | Set to true for header-based principal (disabled in production) |

Gateway — Rate Limiting

| Variable | Default | Description |
| --- | --- | --- |
| API_RATE_LIMIT_RPS | 2000 | Per-tenant rate limit (requests/sec) |
| API_RATE_LIMIT_BURST | 4000 | Per-tenant burst size |
| API_PUBLIC_RATE_LIMIT_RPS | 20 | Public (unauthenticated) rate limit |
| API_PUBLIC_RATE_LIMIT_BURST | 40 | Public burst size |
| REDIS_RATE_LIMIT | true | Enable Redis-backed distributed rate limiting. When true, rate limits are enforced globally across all gateway replicas via Redis sliding-window counters (key format: cordum:rl:{key}:{unix_second}). When false or Redis unavailable, falls back to per-process in-memory token buckets (effective limit = N × configured limit with N replicas). |

Horizontal scaling note: With multiple gateway replicas, Redis-backed rate limiting (REDIS_RATE_LIMIT=true) is strongly recommended. Without it, each replica maintains its own in-memory token bucket, so the effective rate limit is multiplied by the number of replicas.
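The multiplication effect is easy to see with a toy token bucket. This is a generic token-bucket sketch with small numbers, not the gateway's limiter; `TokenBucket` and its explicit `now` argument are illustrative choices.

```python
class TokenBucket:
    """Minimal token bucket: `rate` tokens per second, capacity `burst`."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        """Refill based on elapsed time, then try to spend one token."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One shared limiter admits `burst` requests in an instantaneous spike...
shared = TokenBucket(rate=2, burst=4)
print(sum(shared.allow(0.0) for _ in range(10)))  # 4

# ...but two replicas with private buckets admit twice as many in total.
replicas = [TokenBucket(rate=2, burst=4) for _ in range(2)]
print(sum(b.allow(0.0) for b in replicas for _ in range(10)))  # 8
```

The same doubling applies to the sustained rate, which is why a shared Redis-backed counter is needed for the configured limit to hold across replicas.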

Gateway — CORS

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_ALLOWED_ORIGINS | | Allowed CORS origins |
| CORDUM_CORS_ALLOW_ORIGINS | | Alias for allowed origins |
| CORS_ALLOW_ORIGINS | | Alias for allowed origins |

Gateway — JWT Authentication

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_JWT_HMAC_SECRET | | HMAC secret for JWT signing |
| CORDUM_JWT_PUBLIC_KEY | | RSA/EC public key (PEM) for JWT verification |
| CORDUM_JWT_PUBLIC_KEY_PATH | | Path to public key file |
| CORDUM_JWT_ISSUER | | Expected JWT issuer |
| CORDUM_JWT_AUDIENCE | | Expected JWT audience |
| CORDUM_JWT_DEFAULT_ROLE | | Default role for JWT tokens without role claim |
| CORDUM_JWT_CLOCK_SKEW | | Allowed clock skew (e.g., 30s) |
| CORDUM_JWT_REQUIRED | | Set to true to require JWT for all requests |

Gateway — OIDC Authentication

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_OIDC_ISSUER | | OIDC issuer URL |
| CORDUM_OIDC_AUDIENCE | | Expected OIDC audience |
| CORDUM_OIDC_CLAIM_TENANT | | JWT claim for tenant ID |
| CORDUM_OIDC_CLAIM_ROLE | | JWT claim for user role |
| CORDUM_OIDC_ALLOWED_ALGS | | Comma-separated allowed algorithms |
| CORDUM_OIDC_JWKS_REFRESH_INTERVAL | | JWKS refresh interval (e.g., 1h) |
| CORDUM_OIDC_ISSUER_ALLOWLIST | | Comma-separated allowed issuers |
| CORDUM_OIDC_ALLOW_PRIVATE | | Allow private/loopback issuer URLs |
| CORDUM_OIDC_ALLOW_HTTP | | Allow HTTP (non-TLS) issuer URLs |

HA note — JWKS coordination: When running multiple gateway replicas, the OIDC provider automatically coordinates JWKS fetches via Redis. The first replica to refresh fetches from the IdP and writes the JWKS to cordum:auth:jwks:<issuerHash> (TTL 1h). Other replicas read from this cache, reducing IdP load from N requests to 1 per refresh cycle. Each replica also applies random jitter (0–30s initial, 0–15s per tick) to prevent thundering-herd requests. If Redis is unavailable, replicas fall back to direct IdP fetches (same behavior as single-replica).

Gateway — OIDC Authentication

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_OIDC_ENABLED | false | Enable OIDC JWT validation for bearer tokens |
| CORDUM_OIDC_ISSUER | | OpenID Connect issuer URL used for discovery |
| CORDUM_OIDC_AUDIENCE | | Expected audience for bearer-token validation; browser callback validation uses CORDUM_OIDC_CLIENT_ID |
| CORDUM_OIDC_CLAIM_TENANT | org_id | Claim name used to resolve the Cordum tenant |
| CORDUM_OIDC_CLAIM_ROLE | cordum_role | Claim name used to resolve the Cordum role |
| CORDUM_OIDC_CLIENT_ID | | Enable browser OIDC SSO with this client ID |
| CORDUM_OIDC_CLIENT_SECRET | | Client secret used during the authorization-code exchange |
| CORDUM_OIDC_REDIRECT_URI | | Absolute callback URL registered with the IdP (typically https://<gateway>/api/v1/auth/sso/oidc/callback) |
| CORDUM_OIDC_SCOPES | openid,profile,email | Comma-separated scopes requested during login |
| CORDUM_OIDC_STATE_TTL | 10m | TTL for OIDC state / nonce tracking entries stored in Redis |
| CORDUM_OIDC_ALLOWED_ALGS | RS256,RS384,RS512,ES256,ES384,ES512 | Restrict accepted signing algorithms |
| CORDUM_OIDC_JWKS_REFRESH_INTERVAL | 6h | Background refresh interval for the issuer JWKS cache |
| OIDC_JWKS_REFRESH_COOLDOWN | 1m | Minimum time between on-demand unknown-kid refresh attempts |
| CORDUM_OIDC_ISSUER_ALLOWLIST | | Optional comma-separated allowlist of issuer hosts/domains |
| CORDUM_OIDC_ALLOW_PRIVATE | false in production | Allow private-network issuer hosts in production |
| CORDUM_OIDC_ALLOW_HTTP | false in production | Allow plain HTTP issuer / redirect URLs in production |
| CORDUM_AUTH_REDIRECT_URL | <ui-origin>/login | Post-auth redirect target used after OIDC or SAML completes |
| CORDUM_AUTH_SESSION_TTL | 24h | Browser/session token TTL for password, OIDC, and SAML sign-ins |

Helm / Compose note: The Helm chart exposes these under auth.oidc.*.

Gateway — SAML Authentication

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_SAML_ENABLED | false | Enable the SAML service-provider endpoints on the gateway |
| CORDUM_SAML_IDP_METADATA_URL | | Remote IdP metadata URL the gateway should fetch on startup |
| CORDUM_SAML_IDP_METADATA | | Inline IdP metadata XML (use instead of the URL for air-gapped installs) |
| CORDUM_SAML_BASE_URL | http://localhost:8081 | External gateway base URL used to publish metadata, ACS, and login endpoints |
| CORDUM_SAML_CERT_PATH | | PEM certificate path for the service-provider signing / TLS cert |
| CORDUM_SAML_KEY_PATH | | PEM private-key path paired with CORDUM_SAML_CERT_PATH |
| CORDUM_SAML_ENTITY_ID | metadata URL | Explicit SAML entity ID override for the service provider |
| CORDUM_SAML_BINDING | redirect | SP-initiated binding used for the login request (redirect or post) |
| CORDUM_SAML_RESPONSE_BINDING | post | Expected ACS response binding (post or redirect) |
| CORDUM_SAML_ALLOW_IDP_INITIATED | false | Allow IdP-initiated SSO responses with no stored RelayState |
| CORDUM_SAML_STATE_TTL | 10m | TTL for SAML RelayState/request tracking entries stored in Redis |
| CORDUM_AUTH_REDIRECT_URL | <ui-origin>/login | Post-auth redirect target used after the ACS callback completes |
| CORDUM_AUTH_SESSION_TTL | 24h | Browser/session token TTL for password, OIDC, and SAML sign-ins |

Helm / Compose note: The Helm chart exposes these under auth.saml.*, and docker-compose.yml includes the same gateway variables as commented examples for local development.

Gateway — SCIM Provisioning

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_SCIM_BEARER_TOKEN | | Shared bearer token required by all SCIM 2.0 provisioning endpoints under /api/v1/scim/v2/* |

SCIM provisioning is additionally gated by the SCIM license entitlement. When the entitlement is disabled, discovery, user, and group routes return 403 tier_limit_exceeded even if a bearer token is configured.

If CORDUM_SCIM_BEARER_TOKEN is unset, Cordum can generate and store a Redis-backed SCIM token through the admin settings API (POST /api/v1/scim/settings/token) and the dashboard page at /settings/scim. If the env var is set, that value is used unless an operator later creates a Redis-managed override.

SCIM response locations and the dashboard-published endpoint URL are derived from the external gateway base URL (CORDUM_API_BASE_URL, CORDUM_API_BASE, or CORDUM_SAML_BASE_URL).

Helm note: The Helm chart exposes these under auth.scim.*, including auth.scim.existingSecret for referencing an existing Kubernetes secret instead of placing the bearer token inline.

Gateway — Advanced RBAC

Advanced RBAC provides role hierarchy with permission-based access control, gated by the RBAC license entitlement (Enterprise plan). When the entitlement is disabled, the gateway falls back to basic role string matching (admin/operator/viewer).

RBAC roles are stored in Redis (key prefix rbac:role:). Default roles (admin, operator, viewer) are bootstrapped on startup if not present.

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_RBAC_ROLE_DEFS |  | JSON array of custom role definitions to seed on startup (optional) |

Dashboard: The roles management tab at /settings/users shows built-in and custom roles. Custom role creation/editing requires the RBAC entitlement.

API: Role management endpoints at /api/v1/auth/roles (see API Reference).

Gateway — User Authentication

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_USER_AUTH_ENABLED | false | Enable user/password auth (Redis-backed) |
| CORDUM_ADMIN_USERNAME | admin | Default admin username |
| CORDUM_ADMIN_PASSWORD |  | Admin password (creates user on first startup) |
| CORDUM_ADMIN_EMAIL |  | Optional admin email |

Gateway — Pack Marketplace

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_PACK_CATALOG_URL | (built-in) | Official catalog URL |
| CORDUM_PACK_CATALOG_ID | (auto) | Catalog ID |
| CORDUM_PACK_CATALOG_TITLE | (auto) | Catalog display title |
| CORDUM_PACK_CATALOG_DEFAULT_DISABLED |  | Set to 1 to disable default catalog |
| CORDUM_MARKETPLACE_ALLOW_HTTP |  | Set to 1 for HTTP marketplace URLs |
| CORDUM_MARKETPLACE_HTTP_TIMEOUT |  | Fetch timeout (e.g., 15s) |

Scheduler

| Variable | Default | Description |
| --- | --- | --- |
| SCHEDULER_METRICS_ADDR | :9090 | Metrics listen address |
| SCHEDULER_METRICS_PUBLIC |  | Set to 1 for non-loopback metrics in production |
| SCHEDULER_CONFIG_RELOAD_INTERVAL | 30s | Config overlay reload interval |
| JOB_META_TTL |  | Job metadata TTL (Go duration, e.g., 48h) |
| JOB_META_TTL_SECONDS |  | Job metadata TTL in seconds (takes precedence over JOB_META_TTL) |
| WORKER_SNAPSHOT_INTERVAL |  | Worker state snapshot interval |
| OUTPUT_POLICY_ENABLED | false | Enable output policy (accepted values: true, 1) |
| POLICY_CHECK_FAIL_MODE | closed | Behavior when the safety kernel is unreachable during pre-dispatch input policy checks. closed (default): requeue with backoff. open: allow the job through with a warning log and metric. See safety-kernel.md for risk implications. |

Gateway + Scheduler — Boundary Hardening

These flags control the canonical topic registry, schema enforcement, worker attestation, and readiness gating described in ADR 009.

| Variable | Default | Type | Service | Description |
| --- | --- | --- | --- | --- |
| SCHEMA_ENFORCEMENT | warn | string (off, warn, enforce) | gateway + scheduler | Controls how registered topic schemas are enforced. The gateway applies it at submit time for POST /api/v1/jobs; the scheduler applies the same mode before dispatch. warn logs violations and continues, enforce rejects the request or fails the job on schema mismatch, and off skips schema validation. |
| WORKER_ATTESTATION | off | string (off, warn, enforce) | scheduler | Controls whether scheduler heartbeat processing requires a valid worker credential token. warn accepts the heartbeat but logs attestation failures; enforce rejects unattested heartbeats; off skips attestation checks. |
| WORKER_READINESS_REQUIRED | false | bool | scheduler | When true, scheduling only considers workers that have recently advertised matching ready_topics in their handshake. When false, workers without readiness data remain eligible for backward compatibility. |
| WORKER_READINESS_TTL | 60s | duration | scheduler | Freshness window for handshake readiness state. After this TTL expires, the worker heartbeat may still be present, but readiness gating treats the worker as not ready until it handshakes again. Invalid or non-positive values fall back to 60s with a warning log. |

Workflow Engine

| Variable | Default | Description |
| --- | --- | --- |
| WORKFLOW_ENGINE_HTTP_ADDR |  | HTTP listen address |
| WORKFLOW_ENGINE_SCAN_INTERVAL |  | Run scan interval |
| WORKFLOW_ENGINE_RUN_SCAN_LIMIT |  | Max runs to scan per tick |
| WORKFLOW_FOREACH_MAX_ITEMS |  | Max items in for-each expansion |

Safety Kernel

| Variable | Default | Description |
| --- | --- | --- |
| SAFETY_KERNEL_ADDR | localhost:50051 | gRPC listen address |
| SAFETY_POLICY_PATH | config/safety.yaml | Path to safety policy file |
| SAFETY_POLICY_URL |  | Load policy from URL instead of file |
| SAFETY_POLICY_URL_ALLOWLIST |  | Comma-separated allowed hostnames for URL loading |
| SAFETY_POLICY_URL_ALLOW_PRIVATE |  | Allow private/loopback policy URLs (not recommended) |
| SAFETY_POLICY_MAX_BYTES |  | Max policy file size |
| SAFETY_KERNEL_TLS_CERT |  | TLS server certificate |
| SAFETY_KERNEL_TLS_KEY |  | TLS server private key |
| SAFETY_KERNEL_TLS_CA |  | Client TLS CA (for mTLS) |
| SAFETY_KERNEL_TLS_REQUIRED |  | Require TLS for kernel connections |
| SAFETY_KERNEL_INSECURE |  | Skip TLS verification (dev only) |
| SAFETY_DECISION_CACHE_TTL |  | Decision cache TTL (e.g., 5s, 250ms) |
| OUTPUT_SCANNERS_PATH | config/output_scanners.yaml | Path to scanner patterns file |

Safety Kernel — Policy Signature Verification

| Variable | Default | Description |
| --- | --- | --- |
| SAFETY_POLICY_PUBLIC_KEY |  | Public key for signature verification (PEM) |
| SAFETY_POLICY_SIGNATURE |  | Inline signature |
| SAFETY_POLICY_SIGNATURE_PATH |  | Path to signature file |
| SAFETY_POLICY_SIGNATURE_REQUIRED |  | Require valid signature |

Safety Kernel — Policy Reload / Overlays

| Variable | Default | Description |
| --- | --- | --- |
| SAFETY_POLICY_RELOAD_INTERVAL |  | Policy file reload interval |
| SAFETY_POLICY_CONFIG_SCOPE |  | Config service scope for overlay |
| SAFETY_POLICY_CONFIG_ID |  | Config service scope ID |
| SAFETY_POLICY_CONFIG_KEY |  | Config service data key |
| SAFETY_POLICY_CONFIG_DISABLE |  | Disable config service overlay |

Context Engine

| Variable | Default | Description |
| --- | --- | --- |
| CONTEXT_ENGINE_ADDR | :50070 | gRPC listen address |
| CONTEXT_ENGINE_TLS_CERT |  | TLS server certificate |
| CONTEXT_ENGINE_TLS_KEY |  | TLS server private key |
| CONTEXT_ENGINE_TLS_CA |  | Client TLS CA (for connections to engine) |
| CONTEXT_ENGINE_TLS_REQUIRED |  | Require TLS connections |
| CONTEXT_ENGINE_INSECURE |  | Skip TLS verification |
| CONTEXT_ENGINE_MAX_ENTRY_BYTES |  | Max size per context entry |
| CONTEXT_ENGINE_MAX_CHUNK_SCAN |  | Max chunks to scan per retrieval |

MCP Server

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_API_KEY |  | API key for gateway-backed MCP handlers |
| CORDUM_TENANT_ID |  | Tenant ID for MCP bridge/resource operations |
| MCP_TRANSPORT | stdio | Transport mode: stdio (default) or http |
| MCP_HTTP_ADDR | :8090 | HTTP listen address (only used when MCP_TRANSPORT=http) |

HA note — HTTP transport: Set MCP_TRANSPORT=http to enable HTTP mode, which exposes /sse (SSE stream) and /message (POST JSON-RPC) endpoints. This allows running multiple MCP server replicas behind a load balancer. The default stdio mode supports only a single instance and is intended for local CLI integrations.

Audit Export

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_AUDIT_EXPORT_TYPE |  | Export type: webhook, syslog, datadog, cloudwatch |
| CORDUM_AUDIT_EXPORT_WEBHOOK_URL |  | Webhook endpoint URL |
| CORDUM_AUDIT_EXPORT_WEBHOOK_SECRET |  | Webhook HMAC signing secret |
| CORDUM_AUDIT_EXPORT_SYSLOG_ADDR |  | Syslog server address |
| CORDUM_AUDIT_EXPORT_DD_API_KEY |  | Datadog API key |
| CORDUM_AUDIT_EXPORT_DD_SITE |  | Datadog site (e.g., datadoghq.com) |
| CORDUM_AUDIT_EXPORT_DD_TAGS |  | Datadog tags (comma-separated) |
| CORDUM_AUDIT_EXPORT_CW_LOG_GROUP |  | CloudWatch log group |
| CORDUM_AUDIT_EXPORT_CW_LOG_STREAM |  | CloudWatch log stream |
| AWS_REGION |  | AWS region for CloudWatch |
| AWS_ACCESS_KEY_ID |  | AWS credentials |
| AWS_SECRET_ACCESS_KEY |  | AWS credentials |
| AUDIT_TRANSPORT | buffer | Audit transport: buffer (in-memory) or nats (NATS-backed, recommended for multi-replica) |
| CORDUM_AUDIT_BUFFER_SIZE | 1000 | In-memory audit buffer size (events) |
| CORDUM_AUDIT_EXPORT_MAX_RETRIES | 3 | Max retries before dropping a batch |

NATS-Backed Audit Pipeline

When AUDIT_TRANSPORT=nats, audit events are published to the NATS subject sys.audit.export instead of being buffered in per-process memory. A consumer subscribes with queue group audit-exporters so exactly one replica handles each event. This provides:

  • Crash resilience — events survive process restarts when JetStream is enabled (at-least-once delivery)
  • Stateless replicas — audit events are no longer tied to the process that generated them
  • Automatic fallback — if NATS publish fails, events fall back to the local in-memory buffer

The consumer calls the configured SIEM exporter (CORDUM_AUDIT_EXPORT_TYPE) for each event. Failed exports trigger NATS redelivery (nak). Malformed messages are acked to prevent poison pill loops.

Note: For production HA deployments, enable JetStream on the sys stream to get durable audit delivery. Without JetStream, audit events use core NATS (at-most-once).

DLQ

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_DLQ_ENTRY_TTL_DAYS |  | DLQ entry TTL in days |

Worker SDK

| Variable | Default | Description |
| --- | --- | --- |
| NATS_URL | nats://localhost:4222 | NATS URL for worker connections |
| WORKER_ID |  | Explicit worker ID (auto-generated if not set) |

CLI TLS

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_TLS_CA |  | CA certificate path for CLI TLS verification |
| CORDUM_TLS_INSECURE |  | Set to 1 to skip TLS verification (dev/debug only) |

Dashboard

| Variable | Default | Description |
| --- | --- | --- |
| CORDUM_API_UPSTREAM_SCHEME | http | Set to https when gateway serves TLS |
| CORDUM_DASHBOARD_EMBED_API_KEY |  | Embed API key in dashboard (dev only) |

Docker Compose Helpers

| Variable | Default | Description |
| --- | --- | --- |
| COMPOSE_HTTP_TIMEOUT |  | Docker Compose HTTP timeout |
| DOCKER_CLIENT_TIMEOUT |  | Docker client timeout |

Cross-References