DAVID — PIPELINE INTELLIGENCE

DAVID

Every interaction makes you smarter than your Goliath competitors.

DAVID is a 5-layer AI pipeline stack: Sesh (gateway) → 626 (orchestration) → Stitch (intelligence) → Tinker (fine-tuning) → Observe (observability). Named after Disney's Lilo & Stitch. Every layer is a character. Together, they form a self-improving intelligence engine that compounds with every interaction.

5 pipeline layers
14 models in catalog
6 unified AI providers
50+ prompt templates

What it does

Intelligent model routing

ModelRouter scores 14 models by task type (chat/code/eval), complexity, and cost ceiling. Returns best pick + 3 fallbacks with reasoning. Claude for reasoning, GPT for generation, Qwen for bulk — automatically.
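A rough sketch of how a router like this might work. The catalog entries, scoring weights, and the `route` signature below are assumptions for illustration, not DAVID's actual API:

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    tasks: set            # task types the model handles well: chat/code/eval
    quality: int          # 1-10 capability score
    cost_per_mtok: float  # blended $ per million tokens

# Tiny stand-in for the 14-model catalog.
CATALOG = [
    Model("claude-reasoning", {"chat", "eval"}, 9, 9.0),
    Model("gpt-generation", {"chat", "code"}, 8, 5.0),
    Model("qwen-bulk", {"chat", "code"}, 6, 0.5),
]

def route(task: str, complexity: int, cost_ceiling: float):
    """Return (best, fallbacks) among models under the cost ceiling."""
    def score(m: Model) -> float:
        fit = 2.0 if task in m.tasks else 0.0
        headroom = m.quality - complexity         # penalize under-powered picks
        return fit + headroom - 0.1 * m.cost_per_mtok
    eligible = [m for m in CATALOG if m.cost_per_mtok <= cost_ceiling]
    ranked = sorted(eligible, key=score, reverse=True)
    return ranked[0], ranked[1:4]

best, fallbacks = route("code", complexity=5, cost_ceiling=10.0)
```

Lower the cost ceiling and the same call degrades gracefully to the bulk model; the real router also returns reasoning explaining each pick.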

Prompt registry with versioning

PromptRegistry manages 50+ Handlebars templates. Register, resolve with variables, fork, A/B test. Version control every prompt change. Roll back bad prompts in one call.
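A minimal sketch of the register / resolve / roll-back cycle. The class and method names are illustrative, and the regex substitution here is a stand-in for full Handlebars rendering:

```python
import re

class PromptRegistry:
    """Minimal versioned template store with {{var}} substitution."""

    def __init__(self):
        self._store = {}                  # name -> list of template versions

    def register(self, name, template):
        self._store.setdefault(name, []).append(template)
        return len(self._store[name])     # 1-based version number

    def resolve(self, name, variables, version=None):
        versions = self._store[name]
        template = versions[-1] if version is None else versions[version - 1]
        return re.sub(r"\{\{(\w+)\}\}",
                      lambda m: str(variables[m.group(1)]), template)

    def rollback(self, name):
        self._store[name].pop()           # drop the latest (bad) version
        return len(self._store[name])

reg = PromptRegistry()
reg.register("summarize", "Summarize {{doc}} in one sentence.")
reg.register("summarize", "Summarize {{doc}} briefly.")        # v2
reg.rollback("summarize")                                      # back to v1
prompt = reg.resolve("summarize", {"doc": "the Q3 report"})
```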

Quality-gated generation

QualityGate runs an LLM judge panel scoring accuracy, relevance, completeness, clarity, and safety. SelfRefineLoop auto-retries on failure — feeding refinement prompts back until the quality threshold is met.
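The control flow can be sketched as a generate → judge → refine loop. `generate` and `judge` below are stubs standing in for real LLM calls; the loop itself is the point:

```python
def self_refine(prompt, generate, judge, threshold=0.8, max_retries=3):
    """Retry generation with judge feedback until the score clears threshold."""
    feedback = ""
    for attempt in range(max_retries + 1):
        draft = generate(prompt + feedback)
        score, critique = judge(draft)     # panel score in [0, 1] plus notes
        if score >= threshold:
            return draft, score
        feedback = f"\n\nRefine using this feedback: {critique}"
    return draft, score                    # best effort after retries

# Toy stubs: the "model" improves once it sees feedback.
drafts = iter(["vague answer", "precise answer"])
gen = lambda p: next(drafts)
jud = lambda d: (0.9, "") if d == "precise answer" else (0.4, "be specific")
answer, score = self_refine("Explain X.", gen, jud)
```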

Token-aware context management

ContextManager fits conversations into any model's token budget. Extractive summarization preserves key topics. Truncates tool results proportionally. Never loses critical context.
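A simplified sketch of budget fitting: walk backwards from the newest message and keep what fits. Token counting here is a crude whitespace approximation, and the real ContextManager summarizes old turns rather than dropping them outright:

```python
def count_tokens(text: str) -> int:
    return max(1, len(text.split()))      # crude stand-in for a tokenizer

def fit_context(messages, budget: int):
    """Keep as many recent messages as fit, newest first."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break                          # real system: summarize, don't drop
        kept.append(msg)
        used += cost
    return list(reversed(kept)), used

history = [
    {"role": "user", "content": "long old question about setup details"},
    {"role": "assistant", "content": "long old answer"},
    {"role": "user", "content": "what is the current status"},
]
fitted, used = fit_context(history, budget=8)
```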

Durable workflow orchestration (626)

Temporal-powered workflows that survive server crashes. ChatWorkflow and IntelligenceWorkflow run multi-step agent loops with retries, timeouts, and distributed task queues.
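The real workflows run on Temporal; as a plain-Python sketch of the same step-by-step, retry-with-backoff idea (steps and names below are illustrative — in Temporal, state lives server-side, which is what lets a workflow survive a process crash):

```python
import time

def run_step(step, max_attempts=4, base_delay=0.01):
    """Retry a step with exponential backoff before giving up."""
    for attempt in range(max_attempts):
        try:
            return step()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)   # 10ms, 20ms, 40ms...

def chat_workflow(steps):
    """Run the lifecycle step by step: route → fit → resolve → generate."""
    results = {}
    for name, step in steps:
        results[name] = run_step(step)
    return results

# Toy steps; "generate" fails twice before succeeding.
attempts = {"n": 0}
def flaky_generate():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("provider timeout")
    return "response"

out = chat_workflow([("route", lambda: "gpt"), ("generate", flaky_generate)])
```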

Closed-loop fine-tuning (Tinker)

Every interaction → Langfuse trace → training data → Dagster pipeline → TML fine-tuned model. Cost trajectory: $50/mo → $5 → $3 via progressive self-hosted models.
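The trace-to-training-data step might look like the following. Field names are assumptions for illustration, not the actual Langfuse or Dagster schema:

```python
import json

def traces_to_training_data(traces, min_feedback=4):
    """Keep only well-rated interactions as prompt/completion pairs."""
    rows = []
    for t in traces:
        if t.get("feedback", 0) >= min_feedback:    # 1-5 user rating
            rows.append({"prompt": t["input"], "completion": t["output"]})
    return "\n".join(json.dumps(r) for r in rows)   # JSONL for the trainer

traces = [
    {"input": "Q1", "output": "A1", "feedback": 5},
    {"input": "Q2", "output": "bad", "feedback": 2},
]
jsonl = traces_to_training_data(traces)
```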

Centralized API gateway (Sesh)

SHA-256 hashed API keys. Sliding window rate limiting. Per-key cost tracking with monthly budgets. Multi-provider proxy with automatic failover. SDKs for JS, Python, Go, Rust.
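Two of those mechanisms, sketched in miniature. Storage is in-memory here; a real gateway would back both with a database or Redis:

```python
import hashlib
import time
from collections import deque

def hash_key(api_key: str) -> str:
    """Store only the SHA-256 digest; raw keys never touch disk."""
    return hashlib.sha256(api_key.encode()).hexdigest()

class SlidingWindowLimiter:
    def __init__(self, limit, window_s):
        self.limit, self.window_s = limit, window_s
        self.hits = {}                     # key digest -> deque of timestamps

    def allow(self, key_digest, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(key_digest, deque())
        while q and now - q[0] > self.window_s:    # evict hits outside window
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

digest = hash_key("sk-test")
limiter = SlidingWindowLimiter(limit=2, window_s=60.0)
decisions = [limiter.allow(digest, now=t) for t in (0.0, 1.0, 2.0, 61.5)]
```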

Full observability

Every LLM call traced via Langfuse: prompt version, model, provider, input/output tokens, cost, latency, tool calls, errors. Debug any response back to its source in seconds.
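One way to picture the trace record, modeled on the fields listed above — an assumed shape, not the actual Langfuse schema, with purely illustrative per-token rates:

```python
import time
import uuid

def traced_call(model, provider, prompt_version, prompt, llm_fn):
    """Wrap an LLM call and record everything needed to debug it later."""
    start = time.perf_counter()
    trace = {
        "trace_id": str(uuid.uuid4()),
        "model": model,
        "provider": provider,
        "prompt_version": prompt_version,
        "error": None,
    }
    try:
        output, in_tok, out_tok = llm_fn(prompt)
        # Example rates: $3/M input, $15/M output tokens (illustrative only).
        trace.update(input_tokens=in_tok, output_tokens=out_tok,
                     cost_usd=round(in_tok * 3e-6 + out_tok * 15e-6, 6),
                     output=output)
    except Exception as e:
        trace["error"] = repr(e)
        raise
    finally:
        trace["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
    return output, trace

fake_llm = lambda p: ("hello", 100, 20)    # stub returning (text, in, out)
out, trace = traced_call("claude", "anthropic", "v3", "hi", fake_llm)
```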

How it works

01

Request enters through Sesh

API gateway authenticates, rate-limits, and routes the request. Cost is pre-estimated and budget-checked before any tokens are spent.
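The pre-flight budget check might look like this: estimate cost from prompt length and the model's rate before sending anything. Rates and the chars-per-token heuristic are assumptions for illustration:

```python
RATES = {"claude": 3.0, "gpt": 2.5, "qwen": 0.2}   # $ per million tokens, illustrative

def estimate_cost(prompt, model, max_output_tokens=1024):
    input_tokens = max(1, len(prompt) // 4)         # ~4 chars per token heuristic
    total_tokens = input_tokens + max_output_tokens
    return total_tokens * RATES[model] / 1_000_000

def budget_check(prompt, model, spent_this_month, monthly_budget):
    """Reject the request before any tokens are spent if it would bust the budget."""
    projected = spent_this_month + estimate_cost(prompt, model)
    if projected > monthly_budget:
        raise PermissionError("monthly budget exceeded; request rejected")
    return projected

ok = budget_check("hello " * 100, "qwen", spent_this_month=0.0, monthly_budget=1.0)
```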

02

Stitch selects the optimal model

ModelRouter analyzes the task (chat? code? SQL?), scores 14 models by complexity and cost, and picks the best candidate with 3 fallbacks.

03

626 orchestrates the workflow

Temporal workflows manage the full lifecycle: context fitting → prompt resolution → generation → tool execution → quality evaluation. Durable and retryable.

04

Quality gate validates the response

QualityGate scores the output. If it fails: SelfRefineLoop retries with feedback. Only responses that pass quality thresholds reach the user.

05

Tinker closes the learning loop

Langfuse traces + user feedback → training data. Dagster orchestrates batch fine-tuning via TML. The next response is better because of this one.

Why it's different

Five-layer architecture (Sesh → 626 → Stitch → Tinker → Observe) — not a monolith, each layer independently upgradable

Self-improving: every interaction feeds the fine-tuning pipeline, compounding quality over time

Provider-agnostic: swap between Anthropic, OpenAI, Moonshot, Ollama, or custom fine-tuned models without code changes

Cost trajectory from $50/month to $3/month via progressive self-hosting and fine-tuning

Temporal-based workflows guarantee completion even through crashes and network failures

Named after David vs. Goliath — small teams beat enterprise competitors through compounding intelligence

Ready to try DAVID?

Start Building →

The stack, layer by layer

Named after Lilo & Stitch. Each layer is a character.

Sesh

API Gateway

Centralized API key management, request routing, rate limiting, and cost tracking. Every AI call flows through Sesh — authenticated, metered, and cached.

SHA-256 hashed API keys
Sliding window rate limiting
Per-key monthly cost limits
Multi-provider proxy (Anthropic, OpenAI, Ollama, Moonshot)
Prometheus metrics + webhook delivery
SDKs for JavaScript, Python, Go, Rust

626

Workflow Orchestration

Temporal-powered durable workflows that survive crashes, retries, and network failures. Named after Experiment 626 — raw power, properly contained.

Chat workflow: route → fit context → resolve prompt → generate → evaluate
Intelligence workflow: ReAct agent loop with SQL execution
Configurable timeouts: 30s (fast) / 60s (balanced) / 120s (frontier)
Distributed task queues with heartbeat monitoring
Automatic retry with exponential backoff

Stitch

AI Pipeline Intelligence

The brain. Prompt versioning, model routing, context fitting, quality gating, and observability — unified into one orchestration layer.

PromptRegistry: 50+ versioned templates with Handlebars rendering
ModelRouter: 14-model catalog scored by task + complexity + cost
ContextManager: token-aware fitting with extractive summarization
QualityGate: LLM judge panel scoring accuracy, relevance, completeness
SelfRefineLoop: generate → evaluate → refine → retry
ProviderPool: 6 unified providers with health checking

Tinker

Fine-Tuning Engine

Closed-loop training pipeline. Every interaction → training data → fine-tuned model → better responses. The system that makes DAVID get smarter every day.

Training data from Langfuse traces + user feedback
Dagster orchestration for batch processing
Weights & Biases experiment tracking
TML (Thinking Machines Lab) API for model training
Progressive cost reduction: $50/mo → $5 → $3 via self-hosted fine-tuned models