NEXUS
A production-grade AI operating system combining persistent memory, intelligent LLM routing, hierarchical goal management, and real-time risk governance into a unified cognitive loop.
Live Dashboard
Interactive real-time view of the NEXUS cognitive loop — memory state, active goals, LLM routing decisions, and risk assessments.
Architecture Diagram
End-to-end signal flow from user input through the control plane, cognition layer, and into persistent memory stores. Animated data pulses show live information flow paths.
Technical Breakdown
Semantic-aware routing dispatches each request to the most suitable LLM backend based on task type, cost constraints, and real-time model health. Supported backends include GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro, with automatic fallback chains and load-balanced inference.
- Latency P99: <340ms end-to-end
- 9 model backends with health monitoring
- Cost-optimized routing saves ~43% vs. a single-model setup
- Streaming support across all backends
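The routing behavior described above (filter by task type, cost, and health, then fall back down a chain) can be sketched roughly as follows. This is a minimal illustration, not the NEXUS router: `Backend`, `build_chain`, `route`, the model names, and the per-token prices are all hypothetical, and the backend call is passed in as a plain function so that no real provider API is assumed.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    cost_per_1k_tokens: float          # hypothetical price, for illustration only
    healthy: bool = True               # would come from health monitoring
    task_types: frozenset = frozenset()


class AllBackendsFailed(RuntimeError):
    """Raised when every backend in the fallback chain has failed."""


def build_chain(backends, task_type, max_cost):
    """Return eligible backends, cheapest first. This list is the fallback chain."""
    eligible = [
        b for b in backends
        if b.healthy and task_type in b.task_types and b.cost_per_1k_tokens <= max_cost
    ]
    return sorted(eligible, key=lambda b: b.cost_per_1k_tokens)


def route(backends, task_type, max_cost, call):
    """Try each backend in the chain until one succeeds."""
    errors = []
    for backend in build_chain(backends, task_type, max_cost):
        try:
            return backend.name, call(backend)
        except Exception as exc:       # any failure moves on to the next backend
            errors.append((backend.name, exc))
    raise AllBackendsFailed(errors)


# Usage with made-up backends; "model-c" is marked unhealthy and is skipped.
backends = [
    Backend("model-a", 5.0, task_types=frozenset({"code", "chat"})),
    Backend("model-b", 1.0, task_types=frozenset({"chat"})),
    Backend("model-c", 3.0, healthy=False, task_types=frozenset({"chat"})),
]


def fake_call(backend):
    """Stand-in for a real inference call; model-b simulates an outage."""
    if backend.name == "model-b":
        raise TimeoutError("simulated outage")
    return f"response from {backend.name}"


result = route(backends, "chat", max_cost=10.0, call=fake_call)
print(result)  # ('model-a', 'response from model-a')
```

Sorting by cost is only one possible policy. A production router would likely also weigh latency and quality per task type, which this sketch leaves out.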
Three-tier memory architecture: episodic (session events), semantic (knowledge graph embeddings), and procedural (action patterns). Pinecone handles vector similarity search, with a Redis L1 cache in front for sub-10 ms recall. Memory consolidation runs asynchronously to avoid blocking.
- 2.4M+ indexed vectors across memory types
- Semantic search recall: 94.2% top-5 accuracy
- Redis L1 cache: <8ms average retrieval
- Nightly consolidation + memory pruning
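The cache-in-front-of-vector-store pattern described above can be illustrated with a small read-through cache. This is a sketch under stated assumptions: `VectorStore` is an in-memory stand-in rather than the Pinecone client, a plain dict stands in for Redis, and the class names, two-dimensional vectors, and memory texts are invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory stand-in for the vector index (not a real vector database client)."""

    def __init__(self):
        self.items = {}  # item_id -> (vector, memory_type, text)

    def upsert(self, item_id, vector, memory_type, text):
        self.items[item_id] = (vector, memory_type, text)

    def query(self, vector, top_k=5):
        """Brute-force nearest neighbours by cosine similarity."""
        scored = sorted(
            self.items.items(),
            key=lambda kv: cosine(vector, kv[1][0]),
            reverse=True,
        )
        return [(item_id, text) for item_id, (_, _, text) in scored[:top_k]]


class CachedMemory:
    """Read-through cache: a dict plays the role of the L1 cache layer."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def recall(self, vector, top_k=5):
        # Round the query so near-identical vectors share a cache key.
        key = (tuple(round(x, 3) for x in vector), top_k)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        result = self.store.query(vector, top_k)
        self.cache[key] = result
        return result


store = VectorStore()
store.upsert("e1", [1.0, 0.0], "episodic", "user opened a ticket")
store.upsert("s1", [0.0, 1.0], "semantic", "tickets belong to projects")
store.upsert("p1", [0.7, 0.7], "procedural", "triage then assign")

memory = CachedMemory(store)
first = memory.recall([1.0, 0.1], top_k=2)   # miss: goes to the vector store
second = memory.recall([1.0, 0.1], top_k=2)  # hit: served from the cache
print(first, memory.hits, memory.misses)
```

The sketch never invalidates cached results when new vectors are upserted. A real deployment would need expiry or invalidation, which is presumably where the nightly consolidation and pruning fit in.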
A planner inspired by hierarchical task networks (HTN) decomposes high-level goals into executable subtask trees. It maintains a priority queue with dependency resolution, runs tasks in parallel where safe, and re-plans automatically on task failure or environmental change.
- HTN planning depth up to 7 levels
- Parallel task execution with DAG scheduling
- Failure recovery with backtracking
- Goal persistence across sessions
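To make the decomposition and DAG scheduling above concrete, here is a minimal sketch using Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The method table, task names, and dependency map are hypothetical, and the depth check assumes that "depth" counts decomposition steps below the root goal. Re-planning and backtracking are not shown.

```python
from graphlib import TopologicalSorter

# Hypothetical method table: compound task -> ordered subtasks.
METHODS = {
    "ship_release": ["build", "verify", "publish"],
    "verify": ["unit_tests", "lint"],
}

MAX_DEPTH = 7  # mirrors the depth limit quoted above


def decompose(task, depth=0):
    """Expand a task into its primitive (leaf) tasks, depth-first."""
    if depth > MAX_DEPTH:
        raise RecursionError(f"planning depth exceeded at {task!r}")
    if task not in METHODS:          # no method means the task is primitive
        return [task]
    leaves = []
    for sub in METHODS[task]:
        leaves.extend(decompose(sub, depth + 1))
    return leaves


def parallel_batches(dependencies):
    """Group tasks into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                 # raises graphlib.CycleError on cyclic input
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)        # everything in `ready` could run concurrently
        sorter.done(*ready)
    return batches


leaves = decompose("ship_release")
print(leaves)   # ['build', 'unit_tests', 'lint', 'publish']

# Each task maps to the set of tasks it depends on.
deps = {
    "unit_tests": {"build"},
    "lint": {"build"},
    "publish": {"unit_tests", "lint"},
}
batches = parallel_batches(deps)
print(batches)  # [['build'], ['lint', 'unit_tests'], ['publish']]
```

Here `unit_tests` and `lint` land in the same batch because both depend only on `build`, which is the "parallel execution where safe" case.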
Celery-backed distributed task queue with priority lanes, rate limiting per tool/API, dead-letter handling, and full observability via OpenTelemetry. Supports both async fire-and-forget and synchronous blocking patterns.
- Priority queues: CRITICAL / HIGH / NORMAL / BATCH
- Rate limiting per external API
- OpenTelemetry spans for full trace coverage
- Dead-letter queue with automatic retry backoff
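The priority lanes, retry backoff, and dead-letter handling listed above can be sketched in plain Python. This deliberately does not use Celery's API: `TaskQueue`, `Lane`, and the retry parameters are hypothetical names chosen for illustration, and the queue returns the backoff delay instead of actually sleeping so that the example stays deterministic.

```python
import heapq
import itertools
from enum import IntEnum


class Lane(IntEnum):
    """Lower value means higher priority, matching the lanes listed above."""
    CRITICAL = 0
    HIGH = 1
    NORMAL = 2
    BATCH = 3


class TaskQueue:
    def __init__(self, max_retries=3, base_delay=1.0):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO order within a lane
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.dead_letters = []

    def submit(self, lane, name, func, attempt=0):
        heapq.heappush(self._heap, (lane, next(self._counter), name, func, attempt))

    def backoff(self, attempt):
        """Exponential backoff: base_delay * 2**attempt seconds."""
        return self.base_delay * (2 ** attempt)

    def run_next(self):
        """Run the highest-priority task; return (name, outcome, retry_delay)."""
        lane, _, name, func, attempt = heapq.heappop(self._heap)
        try:
            func()
            return name, "ok", None
        except Exception:
            if attempt + 1 >= self.max_retries:
                self.dead_letters.append(name)   # give up: park it for inspection
                return name, "dead-lettered", None
            # Re-queue; a real worker would wait `backoff(attempt)` before retrying.
            self.submit(lane, name, func, attempt + 1)
            return name, "retry", self.backoff(attempt)


def flaky():
    raise ConnectionError("simulated API failure")


queue = TaskQueue(max_retries=2, base_delay=0.5)
queue.submit(Lane.BATCH, "reindex", lambda: None)
queue.submit(Lane.CRITICAL, "page_oncall", flaky)

results = [queue.run_next() for _ in range(3)]
print(results)
print(queue.dead_letters)  # ['page_oncall']
```

Although `reindex` was submitted first, the CRITICAL task runs (and exhausts its retries) before the BATCH task is touched.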
A multi-stage safety layer evaluates every action before execution: content policy screening, resource budget enforcement, reversibility checks, and human-escalation triggers. It runs in a separate process to prevent bypasses.
- Action-level risk scoring: 0–100 scale
- Irreversible actions require explicit confirmation
- Budget enforcement: time, tokens, money, API calls
- Audit trail: immutable append-only log
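A rough sketch of how the checks above might fit together: score an action, enforce the budget, hold irreversible actions for confirmation, and record every decision in an append-only log. Everything here is an assumption made for illustration: the scoring rule, the escalation threshold of 70, the class names, and the use of a SHA-256 hash chain to make tampering with the log detectable. It also runs in a single process, unlike the separate-process design described above, and tracks only two of the four budget dimensions.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    reversible: bool
    tokens: int
    cost_usd: float
    confirmed: bool = False


@dataclass
class Budget:
    tokens: int
    cost_usd: float


@dataclass
class Governor:
    budget: Budget
    escalation_threshold: int = 70   # hypothetical cut-off on the 0-100 scale
    audit_log: list = field(default_factory=list)

    def score(self, action):
        """Toy scoring rule: irreversibility and relative spend both raise risk."""
        risk = 0 if action.reversible else 60
        risk += min(40, int(40 * action.cost_usd / max(self.budget.cost_usd, 0.01)))
        return min(risk, 100)

    def evaluate(self, action):
        risk = self.score(action)
        if action.tokens > self.budget.tokens or action.cost_usd > self.budget.cost_usd:
            decision = "deny:budget"
        elif not action.reversible and not action.confirmed:
            decision = "hold:needs_confirmation"
        elif risk >= self.escalation_threshold:
            decision = "escalate:human"
        else:
            decision = "allow"
            self.budget.tokens -= action.tokens      # only allowed actions spend budget
            self.budget.cost_usd -= action.cost_usd
        self._append(action.name, risk, decision)
        return decision

    def _append(self, name, risk, decision):
        """Append an entry whose hash covers its content and the previous hash."""
        prev = self.audit_log[-1]["hash"] if self.audit_log else "0" * 64
        entry = {"action": name, "risk": risk, "decision": decision, "prev": prev}
        entry["hash"] = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self.audit_log.append(entry)

    def verify_log(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = "0" * 64
        for entry in self.audit_log:
            body = {k: v for k, v in entry.items() if k != "hash"}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


governor = Governor(Budget(tokens=10_000, cost_usd=10.0))
decisions = [
    governor.evaluate(Action("summarize", True, 1_000, 1.0)),
    governor.evaluate(Action("delete_records", False, 100, 0.5)),
    governor.evaluate(Action("bulk_embed", True, 50_000, 2.0)),
]
print(decisions)              # ['allow', 'hold:needs_confirmation', 'deny:budget']
print(governor.verify_log())  # True
```

A hash chain only makes tampering detectable. Actual immutability would still depend on where the log is stored.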
The central nervous system of NEXUS. Runs the perception → planning → action → reflection loop. Manages inter-module communication via an internal event bus, maintains agent state machines, and coordinates all subsystem lifecycles.
- Perception-action loop: ~80ms cycle time
- Event-driven architecture (internal bus)
- Agent state machine with 12 states
- Graceful degradation under partial failures
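The loop and event bus described above can be illustrated with a toy state machine. This is a simplified sketch, not the NEXUS orchestrator: it models only the four loop phases instead of the 12 states mentioned above, and `EventBus`, `Agent`, and the topic names are hypothetical. The bus isolates handler failures, which is one simple way to read "graceful degradation".

```python
from collections import defaultdict
from enum import Enum, auto


class State(Enum):
    PERCEIVING = auto()
    PLANNING = auto()
    ACTING = auto()
    REFLECTING = auto()


# perception -> planning -> action -> reflection, then around again
NEXT = {
    State.PERCEIVING: State.PLANNING,
    State.PLANNING: State.ACTING,
    State.ACTING: State.REFLECTING,
    State.REFLECTING: State.PERCEIVING,
}


class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        """Deliver to every handler; one failing handler does not stop the rest."""
        failures = 0
        for handler in self._subscribers[topic]:
            try:
                handler(payload)
            except Exception:
                failures += 1        # a real system would log and alert here
        return failures


class Agent:
    def __init__(self, bus):
        self.bus = bus
        self.state = State.PERCEIVING

    def step(self):
        """Announce the current phase on the bus, then advance the state machine."""
        self.bus.publish(self.state.name.lower(), {"state": self.state.name})
        self.state = NEXT[self.state]
        return self.state


bus = EventBus()
seen = []
bus.subscribe("planning", lambda event: 1 / 0)                     # simulated faulty module
bus.subscribe("planning", lambda event: seen.append(event["state"]))

agent = Agent(bus)
states = [agent.step().name for _ in range(4)]
print(states)  # ['PLANNING', 'ACTING', 'REFLECTING', 'PERCEIVING']
print(seen)    # ['PLANNING']
```

The healthy subscriber still receives the planning event even though the handler registered before it raised an exception.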