Circuit breakers
Per-tool circuit breakers that open automatically when a tool starts failing — preventing cascading errors and giving failing dependencies time to recover.
Why circuit breakers
Without circuit breakers, a failing dependency (a database that went down, an API that's rate-limiting you) causes every tool call to hang until timeout, then return an error. With 10 retries and 30s timeouts, that's 5 minutes of the AI spinning in place.
Circuit breakers short-circuit this: after N failures, the circuit opens and all subsequent calls fail immediately with a clear message and a retry-after hint. The AI can respond intelligently — try an alternative, wait, or inform the user — rather than hammering a dead service.
States
┌──────────────────────────────────────────┐
│                  CLOSED                  │
│ Normal operation. Requests pass through. │
└──────────────────┬───────────────────────┘
                   │ failure threshold reached
                   │ (default: 5 failures in 60s)
                   ▼
┌──────────────────────────────────────────┐
│                   OPEN                   │
│ All requests fail immediately.           │
│ No handlers called.                      │
└──────────────────┬───────────────────────┘
                   │ reset timeout expires
                   │ (default: 30s)
                   ▼
┌──────────────────────────────────────────┐
│                HALF-OPEN                 │
│ Test requests allowed through.           │
│ Successes → CLOSED. Failure → OPEN.      │
└──────────────────────────────────────────┘

CLOSED: Normal operation. All requests pass through. The failure counter increments on each error and resets to 0 on each success.
OPEN: Tripped. All requests fail immediately without calling the handler. After the reset timeout expires, the circuit transitions to HALF-OPEN.
HALF-OPEN: Testing. Test requests are allowed through. After successThreshold consecutive successes the circuit closes; a single failure re-opens it.
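The state machine above can be sketched as a small class. This is an illustrative sketch, not conductor's actual implementation; the names and the rolling 60-second failure window follow the defaults described above.

```typescript
// Minimal circuit-breaker state machine (illustrative sketch).
type State = "CLOSED" | "OPEN" | "HALF_OPEN";

interface BreakerOptions {
  failureThreshold: number; // open after N failures in the window
  successThreshold: number; // close after N successes in HALF_OPEN
  timeout: number;          // ms before OPEN → HALF_OPEN
  windowMs: number;         // rolling window for counting failures
}

class CircuitBreaker {
  private state: State = "CLOSED";
  private failures: number[] = []; // timestamps of recent failures
  private successes = 0;           // consecutive successes while HALF_OPEN
  private openedAt = 0;
  private readonly opts: BreakerOptions;

  constructor(opts: BreakerOptions) {
    this.opts = opts;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    const now = Date.now();
    if (this.state === "OPEN") {
      const elapsed = now - this.openedAt;
      if (elapsed < this.opts.timeout) {
        // Fail fast with a retry-after hint instead of calling the handler.
        throw Object.assign(new Error("CIRCUIT_OPEN"), {
          retryAfterMs: this.opts.timeout - elapsed,
        });
      }
      this.state = "HALF_OPEN"; // reset timeout expired: allow test requests
      this.successes = 0;
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure(now);
      throw err;
    }
  }

  private onSuccess(): void {
    if (this.state === "HALF_OPEN" && ++this.successes >= this.opts.successThreshold) {
      this.state = "CLOSED";
    }
    if (this.state === "CLOSED") this.failures = []; // success resets the counter
  }

  private onFailure(now: number): void {
    if (this.state === "HALF_OPEN") return this.trip(now); // one failure re-opens
    this.failures = this.failures.filter((t) => now - t < this.opts.windowMs);
    this.failures.push(now);
    if (this.failures.length >= this.opts.failureThreshold) this.trip(now);
  }

  private trip(now: number): void {
    this.state = "OPEN";
    this.openedAt = now;
    this.failures = [];
  }
}
```

Note that failures are counted in a rolling window rather than cumulatively, so occasional errors under normal operation never trip the circuit.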
Configuration
Set global defaults and per-tool overrides. Tools that are naturally flaky (external APIs) get higher thresholds; critical local tools (shell, filesystem) get lower ones.
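Resolution is a shallow merge of a tool's override entry over the global defaults. A sketch of that lookup (the resolveSettings helper and type names are hypothetical, not conductor's API):

```typescript
// Hypothetical sketch: a tool's override entry, when present, wins key by key
// over the global defaults.
interface BreakerSettings {
  failureThreshold: number;
  successThreshold: number;
  timeout: number; // ms before OPEN → HALF_OPEN
}

interface CircuitBreakerConfig extends BreakerSettings {
  overrides?: Record<string, Partial<BreakerSettings>>;
}

function resolveSettings(config: CircuitBreakerConfig, tool: string): BreakerSettings {
  const { overrides, ...defaults } = config;
  return { ...defaults, ...overrides?.[tool] };
}

const config: CircuitBreakerConfig = {
  failureThreshold: 5,
  successThreshold: 2,
  timeout: 30000,
  overrides: { "shell.exec": { failureThreshold: 3, timeout: 60000 } },
};

resolveSettings(config, "shell.exec"); // { failureThreshold: 3, successThreshold: 2, timeout: 60000 }
resolveSettings(config, "db.query");   // falls back to the global defaults
```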
// ~/.conductor/config.json
{
  "circuitBreaker": {
    // Global defaults (apply to all tools unless overridden)
    "failureThreshold": 5,  // Open after N failures
    "successThreshold": 2,  // Close after N successes in HALF-OPEN
    "timeout": 30000,       // ms before OPEN → HALF-OPEN

    // Per-tool overrides
    "overrides": {
      "shell.exec": {
        "failureThreshold": 3,
        "timeout": 60000
      },
      "db.query": {
        "failureThreshold": 10,
        "timeout": 15000
      },
      "web.fetch": {
        "failureThreshold": 5,
        "timeout": 10000
      }
    }
  }
}

Monitoring and resetting
# Check all circuit breakers
conductor circuit list

# Output:
# TOOL               STATE      FAILURES   LAST-FAILURE
# filesystem.read    CLOSED     0          —
# filesystem.write   CLOSED     1          2m ago
# shell.exec         OPEN       5          12s ago
# git.commit         CLOSED     0          —
# db.query           HALF-OPEN  3          35s ago

# Reset a specific circuit (forces CLOSED)
conductor circuit reset shell.exec

# Reset all circuits
conductor circuit reset --all

What the AI sees
When a circuit is OPEN, the error response includes a retryAfterMs hint. Well-implemented AI clients can use this to schedule a retry or pick an alternative.
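For instance, a client wrapper might honor the hint by backing off before retrying. A sketch assuming the CIRCUIT_OPEN error shape documented in this section; callWithBackoff and the single-retry policy are hypothetical, not part of conductor:

```typescript
// Hypothetical client-side handling of a CIRCUIT_OPEN error.
interface CircuitOpenError {
  code: "CIRCUIT_OPEN";
  tool: string;
  message: string;
  retryAfterMs: number;
}

const isCircuitOpen = (e: unknown): e is CircuitOpenError =>
  typeof e === "object" && e !== null && (e as any).code === "CIRCUIT_OPEN";

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Retry once after the server-suggested delay; otherwise surface the error.
async function callWithBackoff<T>(call: () => Promise<T>): Promise<T> {
  try {
    return await call();
  } catch (err) {
    if (!isCircuitOpen(err)) throw err;
    await sleep(err.retryAfterMs);
    return call(); // one retry; a real client might pick an alternative tool instead
  }
}
```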
// What the AI sees when a circuit is OPEN:
{
  "error": {
    "code": "CIRCUIT_OPEN",
    "tool": "shell.exec",
    "message": "Circuit breaker is OPEN for shell.exec. Too many recent failures. Try again in 18s.",
    "retryAfterMs": 18000
  }
}

// The AI can use this to:
// 1. Try an alternative tool
// 2. Wait and retry
// 3. Report the issue to the user

Metrics
Circuit breaker state is exposed in Prometheus metrics (HTTP transport only) at GET /metrics.
conductor_circuit_state            Current state (0=CLOSED, 1=HALF_OPEN, 2=OPEN) per tool
conductor_circuit_failures_total   Cumulative failure count per tool
conductor_circuit_trips_total      Number of times each circuit has tripped
conductor_tool_duration_ms         Tool call duration histogram per tool
conductor_tool_calls_total         Total calls per tool, labeled by result