Orchestrator

Individual Project
Prototype

Stack

Node.js
TypeScript
Angular
Voice AI
AI Runtime
Voice Streaming
Realtime & Pipeline
MCP
3CX Call Control
RAG
SQLite-Vec
ONNX Runtime
Docker

Overview

Voice Agent Orchestrator is a standalone on-premises platform for building, evaluating, and operating AI-powered voice agents for telephony. It supports realtime speech-to-speech models, modular STT→LLM→TTS pipelines, knowledge retrieval, and AI-driven call automation using any combination of local and cloud services.

The platform is intentionally decoupled from the PBX and deployed as an independent runtime. This allows AI infrastructure — including local models, vector databases, embedding services, and evaluation tooling — to evolve and scale separately from the communication system, while keeping telephony integrations simple and vendor-independent.

Problem

Voice AI integrations are often built around specific providers, runtimes, and deployment assumptions. As organizations introduce new models, local inference, knowledge systems, or hybrid architectures, business logic becomes increasingly coupled to implementation details.

The result is slower experimentation, duplicated integrations, and growing maintenance costs whenever the underlying AI stack changes. The challenge is not supporting a single provider or model, but enabling conversational systems to evolve without forcing the surrounding platform to evolve with them.

Approach

The platform is built around two interchangeable voice runtime architectures.

Realtime mode delegates the entire conversation loop to a single speech-to-speech provider such as OpenAI Realtime, xAI Grok, Google Gemini, Alibaba Qwen, or Amazon Nova Sonic.

Pipeline mode decomposes the conversation into independent STT → LLM → TTS stages. Each stage is configured separately and can run on a cloud provider or a self-hosted service. Cloud deployments support OpenAI-compatible providers such as OpenRouter, Together AI, Hugging Face, Grok, and others, while local deployments can be powered by Ollama, custom Python services, or any endpoint exposing an OpenAI-compatible interface. Hybrid architectures are also supported, for example local speech processing combined with a cloud-hosted reasoning model.
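The two runtime modes can be pictured as a discriminated union, so that surrounding logic never depends on which mode is active. The sketch below is illustrative only: the type names, field names, and endpoint URLs are assumptions, not the platform's actual configuration schema.

```typescript
// Hypothetical configuration types; none of these names come from the platform itself.
interface StageConfig {
  baseUrl: string; // any OpenAI-compatible endpoint
  model: string;
  hosting: "local" | "cloud";
}

interface RealtimeConfig {
  mode: "realtime";
  provider: string; // a single speech-to-speech provider
  model: string;
}

interface PipelineConfig {
  mode: "pipeline";
  stt: StageConfig;
  llm: StageConfig;
  tts: StageConfig;
}

type RuntimeConfig = RealtimeConfig | PipelineConfig;

// A pipeline is "hybrid" when its stages are not all hosted in the same place.
function deploymentKind(config: PipelineConfig): "local" | "cloud" | "hybrid" {
  const hostings = new Set([config.stt.hosting, config.llm.hosting, config.tts.hosting]);
  return hostings.size > 1 ? "hybrid" : config.stt.hosting;
}

// The mode field is the only thing callers need to branch on.
function describeRuntime(config: RuntimeConfig): string {
  switch (config.mode) {
    case "realtime":
      return `realtime via ${config.provider}`;
    case "pipeline":
      return `pipeline (${deploymentKind(config)})`;
  }
}

// Example: local speech processing combined with a cloud-hosted reasoning model.
const hybridPipeline: PipelineConfig = {
  mode: "pipeline",
  stt: { baseUrl: "http://localhost:8001/v1", model: "example-stt", hosting: "local" },
  llm: { baseUrl: "https://cloud.example.com/v1", model: "example-llm", hosting: "cloud" },
  tts: { baseUrl: "http://localhost:8002/v1", model: "example-tts", hosting: "local" },
};
```

Under these assumptions, switching a stage from local to cloud changes one `StageConfig` entry and nothing else.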

To maximize provider compatibility, the pipeline runtime is built around OpenAI-compatible APIs and implements a pseudo-streaming layer on top of traditionally non-streaming speech services. This allows a broad range of cloud and self-hosted STT/TTS solutions to participate in near real-time conversations while maintaining a unified integration model.
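How the pseudo-streaming layer works internally is not described here. One common way to approximate streaming over a request/response TTS service is to cut the LLM's text output into sentences as it arrives and synthesize each sentence as soon as it is complete. The sketch below shows only that chunking step; the class name and the naive punctuation rule are assumptions for illustration.

```typescript
// Hypothetical helper: buffers incoming text deltas and releases whole sentences.
class SentenceChunker {
  private buffer = "";

  // Feed one text delta; returns the sentences that this delta completed.
  push(delta: string): string[] {
    this.buffer += delta;
    const sentences: string[] = [];
    // Naive boundary: terminal punctuation followed by whitespace.
    // Abbreviations such as "Dr. Smith" would be split incorrectly.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.buffer.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next delta.
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  // Call when the LLM response ends to release any remaining text.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Each sentence returned by `push` could be sent to a non-streaming TTS endpoint immediately, so playback of the first sentence can start while later ones are still being generated.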

The voice stack can be enhanced through optional processing modules. Currently, the platform supports neural Voice Activity Detection using Silero VAD (ONNX Runtime) as an alternative to traditional amplitude-based detection, improving speech segmentation, interruption handling, and transcription quality. Because audio processing is implemented as an independent layer, additional services can be introduced without affecting the surrounding runtime architecture.
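To make the contrast concrete, the sketch below shows what an amplitude-based detector might look like behind a shared interface that a neural detector could also implement. The interface, class name, threshold, and frame size are assumptions; no Silero or ONNX Runtime calls are shown.

```typescript
// Hypothetical common interface for any voice activity detector.
interface VoiceActivityDetector {
  isSpeech(frame: Int16Array): boolean;
}

// Amplitude-based detection: a frame counts as speech when its RMS level
// (normalized to the 0..1 range of 16-bit PCM) reaches a fixed threshold.
class AmplitudeVad implements VoiceActivityDetector {
  private readonly threshold: number;

  constructor(threshold: number = 0.02) {
    this.threshold = threshold;
  }

  isSpeech(frame: Int16Array): boolean {
    if (frame.length === 0) return false;
    let sumSquares = 0;
    for (let i = 0; i < frame.length; i++) {
      const normalized = frame[i] / 32768;
      sumSquares += normalized * normalized;
    }
    return Math.sqrt(sumSquares / frame.length) >= this.threshold;
  }
}

// Example frames: 160 samples, which is 20 ms at an assumed 8 kHz sample rate.
const silentFrame = new Int16Array(160);
const loudFrame = new Int16Array(160).fill(8000);
```

A fixed threshold reacts to any loud sound, including line noise, which is one reason a model-based detector can segment speech more reliably.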

The platform integrates with 3CX through the Call Control API for real-time call handling and uses the built-in 3CX MCP server as its primary tool execution layer. Core capabilities such as contact lookup, extension discovery, call transfers, and routing are exposed to agents through MCP tools. Phonebook searches benefit from 3CX's fuzzy linguistic matching capabilities, helping compensate for speech recognition inaccuracies during voice interactions.

Native streaming support is planned for a future release as an alternative execution path alongside the current OpenAI-compatible pipeline. This will enable direct integration with streaming-first providers and self-hosted services while preserving the same runtime abstraction and agent configuration model.

Capabilities

Onboarding Wizard

Guided onboarding from installation to first deployed agent.
Configure Realtime or Pipeline voice architectures.
Connect cloud or self-hosted AI providers.
Optional local Knowledge Base deployment (sqlite-vec + embeddings).
Agent templates, routing, prompts, and voice configuration.
All settings remain independently configurable after onboarding.

Runtime Configuration

Modify individual runtime components without affecting the rest of the stack.
Switch between Realtime and Pipeline architectures on demand.
Reconfigure providers, models, tools, voices, and agent settings independently.
Support local, cloud, and hybrid AI deployments.
Live logs, warnings, and service health visibility.
Built-in access to Evaluation, Billing, and Performance analytics.

Knowledge Base

One-click local RAG deployment via Docker.
Built on SQLite, sqlite-vec, and Nomic embeddings.
Shared across all providers and runtime modes.
No migration required when switching models.
Compatible with any model that supports tool calling.
Upload and manage documents directly from the dashboard.
Supports PDF, Markdown, and plain text sources.
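Because retrieval is exposed through tool calling, the same Knowledge Base can serve any model. The sketch below pairs a tool definition in the OpenAI function-calling format with an in-memory nearest-neighbor search. The tool name, the `Chunk` type, and the in-memory search are assumptions for illustration; in the actual stack the vector search would be handled by sqlite-vec rather than by this code.

```typescript
// Hypothetical tool definition, written in the OpenAI function-calling format.
const searchKnowledgeBaseTool = {
  type: "function",
  function: {
    name: "search_knowledge_base",
    description: "Look up passages relevant to the caller's question.",
    parameters: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
} as const;

// Hypothetical stored unit: a passage of text plus its embedding vector.
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// In-memory stand-in for a vector index: returns the k most similar chunks.
function topK(queryEmbedding: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

Since the index depends only on the embedding model and not on the conversational model, switching the LLM or the runtime mode leaves the stored vectors untouched.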

MCP Integration

Built-in 3CX MCP server for telephony operations.
Contact lookup, extension discovery, routing, and call transfers.
Support for custom MCP servers through API-based integrations.
Shared across Realtime and Pipeline architectures.
Easily extend agent behavior without modifying core logic.
Per-agent tool filtering, configurable in agent settings once an MCP server is connected.
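Per-agent tool filtering can be pictured as an allow-list applied to the tools that connected MCP servers advertise. The sketch below is an assumption about how such a filter might look; the types, the `server/tool` key format, and the tool names are hypothetical.

```typescript
// Hypothetical description of a tool advertised by a connected MCP server.
interface McpTool {
  server: string;
  name: string;
  description: string;
}

// Hypothetical per-agent policy: tools are allowed by "server/tool" key.
interface AgentToolPolicy {
  allowedTools: string[];
}

// Return only the tools this agent is allowed to see and call.
function toolsForAgent(available: McpTool[], policy: AgentToolPolicy): McpTool[] {
  const allowed = new Set(policy.allowedTools);
  return available.filter((tool) => allowed.has(`${tool.server}/${tool.name}`));
}

// Example tool catalog; the names are illustrative, not the real 3CX MCP tool names.
const availableTools: McpTool[] = [
  { server: "3cx", name: "contact_lookup", description: "Find a contact by name" },
  { server: "3cx", name: "transfer_call", description: "Transfer the active call" },
  { server: "crm", name: "create_ticket", description: "Open a support ticket" },
];

const receptionistPolicy: AgentToolPolicy = {
  allowedTools: ["3cx/contact_lookup", "3cx/transfer_call"],
};
```

Filtering before the tool list reaches the model keeps the policy identical across Realtime and Pipeline modes, since both would receive the same reduced list.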

Model Quality Evaluation

Available for Pipeline architectures.
Agent-vs-Agent evaluation framework.
Dynamic test scenarios based on agent configuration.
Knowledge Base, routing, and tool usage scenarios included automatically.
Detect regressions before deploying configuration changes.
Compare models under the same agent setup.
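One way to read "dynamic test scenarios based on agent configuration" is that the scenario list is derived from what the agent has enabled. The sketch below illustrates that idea only; the types, field names, and scenario wording are assumptions and do not describe the platform's actual evaluation framework.

```typescript
// Hypothetical summary of the parts of an agent setup that affect testing.
interface AgentSetup {
  hasKnowledgeBase: boolean;
  routingTargets: string[];
  enabledTools: string[];
}

// Hypothetical scenario handed to the evaluating agent that plays the caller.
interface Scenario {
  kind: "baseline" | "knowledge" | "routing" | "tool";
  goal: string;
}

// Derive scenarios from the setup: every agent gets a baseline scenario,
// and each enabled capability adds scenarios of its own.
function buildScenarios(agent: AgentSetup): Scenario[] {
  const scenarios: Scenario[] = [
    { kind: "baseline", goal: "Greet the agent and make a simple request." },
  ];
  if (agent.hasKnowledgeBase) {
    scenarios.push({ kind: "knowledge", goal: "Ask a question answered by an uploaded document." });
  }
  for (const target of agent.routingTargets) {
    scenarios.push({ kind: "routing", goal: `Ask to be transferred to ${target}.` });
  }
  for (const tool of agent.enabledTools) {
    scenarios.push({ kind: "tool", goal: `Make a request that requires the ${tool} tool.` });
  }
  return scenarios;
}
```

Running the same generated list against two models under one agent setup gives a like-for-like comparison, which is what makes regressions visible before a configuration change is deployed.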

Stack Cost Analysis

Separate cost tracking for Realtime and Pipeline architectures.
Active sessions automatically archived after stack changes.
Compare historical configurations and provider combinations.
Automatic pricing synchronization where supported (e.g. OpenRouter).
Manual price overrides with update tracking for unsupported providers.
Optional infrastructure cost accounting for self-hosted components.
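The arithmetic behind a Pipeline session's cost can be sketched as usage multiplied by unit prices, with manual overrides taking precedence over synchronized prices. The billing units, field names, and figures below are assumptions chosen for illustration; real providers bill in different units.

```typescript
// Hypothetical usage counters collected for one Pipeline session.
interface SessionUsage {
  sttSeconds: number;
  llmInputTokens: number;
  llmOutputTokens: number;
  ttsCharacters: number;
  sessionSeconds: number;
}

// Hypothetical unit prices, all in the same currency.
interface PriceTable {
  sttPerMinute: number;
  llmInputPerMillionTokens: number;
  llmOutputPerMillionTokens: number;
  ttsPerMillionChars: number;
  infraPerHour: number; // optional accounting for self-hosted components; 0 disables it
}

// Manual overrides win over automatically synchronized prices.
function resolvePrices(synced: PriceTable, overrides: Partial<PriceTable>): PriceTable {
  return { ...synced, ...overrides };
}

function sessionCost(usage: SessionUsage, prices: PriceTable): number {
  const stt = (usage.sttSeconds / 60) * prices.sttPerMinute;
  const llm =
    (usage.llmInputTokens / 1_000_000) * prices.llmInputPerMillionTokens +
    (usage.llmOutputTokens / 1_000_000) * prices.llmOutputPerMillionTokens;
  const tts = (usage.ttsCharacters / 1_000_000) * prices.ttsPerMillionChars;
  const infra = (usage.sessionSeconds / 3600) * prices.infraPerHour;
  return stt + llm + tts + infra;
}

// Illustrative numbers only.
const exampleUsage: SessionUsage = {
  sttSeconds: 120,
  llmInputTokens: 10_000,
  llmOutputTokens: 2_000,
  ttsCharacters: 5_000,
  sessionSeconds: 180,
};

const syncedPrices: PriceTable = {
  sttPerMinute: 0.006,
  llmInputPerMillionTokens: 1,
  llmOutputPerMillionTokens: 2,
  ttsPerMillionChars: 15,
  infraPerHour: 0,
};
```

Keeping usage separate from prices is what allows historical sessions to be re-costed or compared across provider combinations.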

Performance Monitoring

Realtime response latency monitoring.
Pipeline latency breakdown across runtime components.
Turn-level performance tracking during conversations.
Quickly identify bottlenecks within the active stack.
Compare performance between different configurations.
Built-in diagnostics for optimization and troubleshooting.
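A per-stage latency breakdown reduces to averaging turn-level timings and picking the slowest stage. The sketch below assumes each turn records one duration per Pipeline stage; the type and function names are hypothetical.

```typescript
type Stage = "stt" | "llm" | "tts";

// Hypothetical turn-level record: milliseconds spent in each Pipeline stage.
type TurnTiming = Record<Stage, number>;

const STAGES: Stage[] = ["stt", "llm", "tts"];

// Average each stage's latency across all recorded turns.
function averageByStage(turns: TurnTiming[]): TurnTiming {
  const totals: TurnTiming = { stt: 0, llm: 0, tts: 0 };
  for (const turn of turns) {
    for (const stage of STAGES) {
      totals[stage] += turn[stage];
    }
  }
  const count = Math.max(turns.length, 1); // avoid dividing by zero when there are no turns
  return { stt: totals.stt / count, llm: totals.llm / count, tts: totals.tts / count };
}

// The bottleneck is the stage with the highest average latency.
function slowestStage(averages: TurnTiming): Stage {
  return STAGES.reduce((worst, stage) => (averages[stage] > averages[worst] ? stage : worst));
}

// Illustrative timings only.
const recordedTurns: TurnTiming[] = [
  { stt: 200, llm: 900, tts: 300 },
  { stt: 400, llm: 700, tts: 300 },
];
```

Comparing the output of `averageByStage` for two configurations shows which stage a provider change actually affected.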
