Local Vs Cloud
Most agentic platforms ask the user one question at setup: “which API key do you want to use?” The platform then routes every task to whatever provider that key buys access to. The user has chosen a lock-in. Stallari refuses this shape. Provider selection is a routing decision the platform makes per task, against a policy the user controls, with explicit reporting of what crossed which boundary.
Why Routing Is Policy
A single workflow rarely benefits from running every step on the same provider. A short classification step belongs on a fast on-device model. A long-context summarisation step might justify a cloud call. A sensitive thread might require a local-only path or an explicit user gate. A failed cloud call should be able to retry on a different cloud provider, not block the entire job.
If the user pre-committed at setup, none of this works. Either every step pays the cloud cost or every step accepts the on-device latency. Either every step risks the same provider outage or the user runs in circles trying to swap API keys mid-week.
Stallari decouples the question. Providers are configured once each: Anthropic, OpenAI, Google, xAI, Apple Foundation Models, the shared on-device inference service. Each provider carries metadata: cost, latency, context window, available models, allowed scopes. Skills declare what they need. Routing matches need against available providers under policy. The user controls the policy; the platform executes it.
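
As a rough illustration of "routing matches need against available providers under policy", the sketch below filters providers by scope, context window, and latency, then prefers on-device routes. Every name here (`Provider`, `Need`, `Policy`, `route`) and every number is a hypothetical placeholder, not Stallari's actual API or real provider metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    name: str
    on_device: bool
    cost_per_1k_tokens: float  # placeholder units
    latency_ms: int
    context_window: int  # tokens
    allowed_scopes: frozenset


@dataclass(frozen=True)
class Need:
    scope: str
    context_tokens: int
    max_latency_ms: Optional[int] = None


@dataclass(frozen=True)
class Policy:
    cloud_scopes: frozenset  # scopes the user has allowed to leave the device
    prefer_on_device: bool = True


def route(need, providers, policy):
    """Return the best provider for this need, or None if nothing qualifies."""
    candidates = []
    for p in providers:
        if need.scope not in p.allowed_scopes:
            continue
        if need.context_tokens > p.context_window:
            continue
        if need.max_latency_ms is not None and p.latency_ms > need.max_latency_ms:
            continue
        if not p.on_device and need.scope not in policy.cloud_scopes:
            continue  # policy forbids this scope from crossing the boundary
        candidates.append(p)
    if not candidates:
        return None
    # On-device first (when preferred), then cheapest, then fastest.
    return min(
        candidates,
        key=lambda p: (
            policy.prefer_on_device and not p.on_device,
            p.cost_per_1k_tokens,
            p.latency_ms,
        ),
    )


# Illustrative configuration only; the figures are made up.
PROVIDERS = [
    Provider("on-device", True, 0.0, 150, 4_000, frozenset({"mail", "work"})),
    Provider("cloud-a", False, 3.0, 1_200, 200_000, frozenset({"work"})),
]
POLICY = Policy(cloud_scopes=frozenset({"work"}))

short_step = route(Need("mail", 500), PROVIDERS, POLICY)
long_step = route(Need("work", 50_000), PROVIDERS, POLICY)
blocked_step = route(Need("mail", 50_000), PROVIDERS, POLICY)
```

In this sketch a step that no provider can legally serve returns `None` rather than silently falling back to a cloud route, which is one way to keep the policy authoritative.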
On-Device Inference
Apple Foundation Models is the on-device LLM that ships with recent macOS releases. It runs without a network call. It is free at the margin. Its context window is short but adequate for a broad class of work: classification, extraction, language tagging, decision routing, short summarisation. Stallari treats it as a first-class provider for any task its window can hold.
For heavier on-device work, the Stallari Inference Service runs MLX-Swift models (embeddings, reranking, larger summarisations) on the Mac’s neural engine and GPU. The service is shared across instances on the same machine: if a user runs three Stallari instances on one Mac, they share resident model weights and ANE scheduling, each within its own per-instance quota. This is the substrate that makes “stay on-device when you can” practical at scale, not just aspirational.
The Lens search and ranking layer also runs on-device under MLX. Vault queries, knowledge bundle assembly, memory recall — all of this stays on the user’s hardware. The user can run Stallari with the Wi-Fi off and most operational surfaces continue to function.
Cloud Providers
Some work genuinely needs a cloud provider. Long-context reasoning, current-knowledge research, and multi-step planning beyond what a small on-device model can hold all justify the cost and the boundary crossing. Stallari supports Anthropic (Opus / Sonnet), OpenAI (ChatGPT), Google (Gemini), and xAI (Grok) as first-class cloud providers. Each is configured independently, each is rotatable, none is the default.
When a cloud call runs, three things happen:
- The platform constructs a typed context packet from only those sources that the scope ACL permits to leave the device.
- The platform records what was sent — the prompt, the tool surface, the resolved provider — in the audit trail.
- The result comes back and lands in the Activity record with the cost, latency, and any tool calls the cloud model issued.
The user can read the record after the fact. Nothing is hidden. If a workflow sent data to a cloud provider, the dispatch record names the provider, the packet contents, and the scope decision that authorised the call.
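
The three steps above can be sketched as a single dispatch function. This is a minimal illustration under assumed names (`Source`, `DispatchRecord`, `build_packet`, `dispatch`) and an assumed provider callable that returns `(result, cost)`; the real packet types and audit schema are not described in this section.

```python
import time
from dataclasses import dataclass


@dataclass
class Source:
    scope: str
    text: str


@dataclass
class DispatchRecord:
    provider: str
    scope: str
    packet: list  # exactly what was sent
    tools: list   # the tool surface exposed to the model
    result: str = ""
    latency_ms: float = 0.0
    cost: float = 0.0


def build_packet(sources, scope, cloud_eligible_scopes):
    """Step 1: include only sources the scope ACL lets leave the device."""
    if scope not in cloud_eligible_scopes:
        raise PermissionError(f"scope {scope!r} is not cloud-eligible")
    return [s.text for s in sources if s.scope == scope]


def dispatch(provider, call, sources, scope, cloud_eligible_scopes, tools, audit):
    packet = build_packet(sources, scope, cloud_eligible_scopes)
    # Step 2: record what is about to be sent before the call is made.
    record = DispatchRecord(provider, scope, packet, list(tools))
    audit.append(record)
    # Step 3: run the call, then attach result, latency, and cost.
    started = time.perf_counter()
    result, cost = call(packet, tools)
    record.latency_ms = (time.perf_counter() - started) * 1000
    record.result, record.cost = result, cost
    return record


def fake_cloud(packet, tools):
    # Stand-in for a real provider client; returns (result, cost).
    return f"summary of {len(packet)} item(s)", 0.02


audit_trail = []
sources = [Source("work", "Q3 planning notes"), Source("private", "journal entry")]
record = dispatch("cloud-a", fake_cloud, sources, "work", {"work"}, ["search"], audit_trail)
```

Appending the record before the call means a crashed or timed-out request still leaves a trace of what was sent, which seems closest to the "nothing is hidden" intent.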
Boundary Discipline
The substrate enforces what crosses the boundary; it does not merely declare a policy.
| Surface | Cloud-eligible? | Notes |
|---|---|---|
| Vault notes | Only in scopes the user has explicitly marked cloud-eligible. | Scope ACL gates packet construction. |
| Encrypted credentials | Never. | Credentials live in the keychain-backed store; the runtime reads them without surfacing values to inference. |
| Memory store | Never (the store itself). | Memory recall happens locally; recalled content may be packed for cloud inference only if its source scope permits. |
| Local-corpus indexes (mail / notes / reminders) | Only when an explicit workflow routes them to cloud, and only the projected fields the workflow declared. | Indexes themselves stay on disk. |
| Audit trail | Never. | Dispatch records are local; the user owns them. |
| Telemetry / usage metadata | Never. | The platform does not phone home; outbound calls fall into three explicit trigger classes, each user-initiated or user-opted-in. |
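
One way to read the table is as a default-deny check applied before any record is packed for a cloud call. The sketch below is an assumption-laden illustration: the surface names, `NEVER_CLOUD`, and `project_for_cloud` are hypothetical, and the real enforcement lives in the substrate rather than in a single function.

```python
# Surfaces the table marks as "Never" cloud-eligible (hypothetical identifiers).
NEVER_CLOUD = frozenset({"credentials", "memory_store", "audit_trail", "telemetry"})


def project_for_cloud(surface, record, *, scope_cloud_eligible=False, declared_fields=()):
    """Return the part of `record` allowed to leave the device, or raise."""
    if surface in NEVER_CLOUD:
        raise PermissionError(f"{surface} never crosses the boundary")
    if surface == "vault_note":
        # Vault notes follow the scope ACL of the scope they live in.
        if not scope_cloud_eligible:
            raise PermissionError("scope is not cloud-eligible")
        return dict(record)
    if surface == "local_index":
        # Only the fields the workflow declared are projected; the index stays on disk.
        if not declared_fields:
            raise PermissionError("workflow declared no fields for cloud routing")
        return {k: record[k] for k in declared_fields if k in record}
    # Unknown surfaces are denied by default.
    raise PermissionError(f"unknown surface {surface!r}")


mail_row = {"subject": "Invoice", "sender": "a@example.com", "body": "full text"}
projected = project_for_cloud(
    "local_index", mail_row, declared_fields=("subject", "sender")
)
```

Here the message body stays local because the hypothetical workflow declared only `subject` and `sender`.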
The SQLite-backed substrates (memory, tracks, traces, credentials, runtime stores) are encrypted on disk under a key the user owns. The encryption is not transport security with a vendor key — it is local AES with a key derived per device that never leaves the user’s keychain. A cloud provider that received an exfiltrated database file would still face the encryption layer.
Multi-Provider Reality
Practical use cases blend providers within a single workflow:
- A morning briefing classifies overnight mail on-device (Apple Foundation Models), assembles a vault knowledge bundle locally (Lens), and routes the long-context summary to a cloud provider (Anthropic) for the user’s daily digest.
- A code-review skill runs lightweight diff classification on-device, then routes the actual review to Claude Code or OpenAI Codex (a separate MCP client wiring entirely, with its own session lifecycle).
- A research workflow uses xAI for current-knowledge retrieval and Anthropic for synthesis, with on-device reranking between them.
Each step records its provider. The user can ask “what did the briefing job cost me last week on Anthropic?” and the audit trail answers from disk.
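
A question like the one above reduces to an aggregate query over local dispatch records. The following sketch uses an in-memory SQLite database with a hypothetical `dispatch` table; the actual schema, column names, and encryption layer are not specified here and are assumptions for illustration.

```python
import sqlite3


def cost_by_provider(conn, job, since, until):
    """Sum recorded cost per provider for one job in [since, until)."""
    rows = conn.execute(
        "SELECT provider, SUM(cost) FROM dispatch "
        "WHERE job = ? AND started_at >= ? AND started_at < ? "
        "GROUP BY provider ORDER BY provider",
        (job, since, until),
    ).fetchall()
    return dict(rows)


# Hypothetical schema and sample rows; ISO-8601 dates compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dispatch (job TEXT, step TEXT, provider TEXT, cost REAL, started_at TEXT)"
)
conn.executemany(
    "INSERT INTO dispatch VALUES (?, ?, ?, ?, ?)",
    [
        ("morning-briefing", "classify", "apple-foundation-models", 0.0, "2024-01-02"),
        ("morning-briefing", "summarise", "anthropic", 0.25, "2024-01-02"),
        ("morning-briefing", "summarise", "anthropic", 0.5, "2024-01-03"),
        ("morning-briefing", "summarise", "anthropic", 1.0, "2024-01-09"),  # outside range
        ("code-review", "review", "anthropic", 2.0, "2024-01-03"),          # other job
    ],
)

weekly = cost_by_provider(conn, "morning-briefing", "2024-01-01", "2024-01-08")
```

Because the query runs against a local file, it needs no network access and no provider-side billing API.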
No Phone-Home
A load-bearing differentiation point: Stallari does not snitch on the user. The substrate makes exactly three classes of outbound call:
- User-initiated work — a dispatch the user explicitly triggered.
- User-opted-in daily — a recurring job the user enabled, with a visible schedule.
- User-opted-in on-event — a discrete fault class the user asked to be notified about.
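
The three classes above can be read as an allow-list that every outbound call must pass. The sketch below is one hedged way to express that; `Trigger`, `OptIns`, and `may_call_out` are hypothetical names, and anything outside the three classes is refused by default.

```python
from dataclasses import dataclass, field
from enum import Enum


class Trigger(Enum):
    USER_INITIATED = "user-initiated"
    OPTED_IN_DAILY = "opted-in-daily"
    OPTED_IN_ON_EVENT = "opted-in-on-event"


@dataclass
class OptIns:
    daily_jobs: set = field(default_factory=set)     # recurring jobs the user enabled
    event_classes: set = field(default_factory=set)  # fault classes the user asked about


def may_call_out(trigger, subject, opt_ins):
    """Allow an outbound call only for one of the three trigger classes."""
    if trigger is Trigger.USER_INITIATED:
        return True
    if trigger is Trigger.OPTED_IN_DAILY:
        return subject in opt_ins.daily_jobs
    if trigger is Trigger.OPTED_IN_ON_EVENT:
        return subject in opt_ins.event_classes
    return False  # no telemetry beacon, analytics, or fleet ping


opt_ins = OptIns(daily_jobs={"morning-briefing"}, event_classes={"sync-failure"})
briefing_allowed = may_call_out(Trigger.OPTED_IN_DAILY, "morning-briefing", opt_ins)
beacon_allowed = may_call_out("telemetry", "usage-stats", opt_ins)
```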
There is no telemetry beacon, no usage analytics, no “anonymous diagnostics”, no fleet ping. Routing decisions, model selections, run outcomes — all stay on the user’s hardware. The platform’s business model is not observability of the user.
This matters for the local-vs-cloud framing because vendor platforms commonly conflate “cloud-routed inference” with “we observe everything you do”. Stallari separates these. A cloud inference call sends a bounded packet to the provider for the duration of the call. The platform itself sees no telemetry from that call beyond what the user can read in their own audit trail.
What This Buys
The decoupling has practical consequences:
- The user can stay on-device by default and opt cloud in per workflow.
- A provider outage is a routing event, not a platform outage.
- A new provider — local or cloud — slots in as a new route, not a fork of the runtime.
- The user can run Stallari on a plane.
- The user can audit. “Show every cloud call last week, by provider, with the scope tag that authorised it.”
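
To illustrate "a provider outage is a routing event, not a platform outage", the sketch below tries an ordered list of routes and logs each reroute instead of failing the job. The names `ProviderUnavailable` and `run_with_failover`, and the shape of the event log, are assumptions made for this example.

```python
class ProviderUnavailable(Exception):
    """Raised by a provider client when the provider cannot serve the call."""


def run_with_failover(routes, task, events):
    """Try each (name, call) route in order; log reroutes as routing events."""
    for name, call in routes:
        try:
            result = call(task)
        except ProviderUnavailable as exc:
            events.append(("rerouted", name, str(exc)))
            continue
        events.append(("completed", name, ""))
        return name, result
    raise RuntimeError("no route available for task")


def cloud_a(task):
    # Stand-in for a provider that is currently down.
    raise ProviderUnavailable("503 from cloud-a")


def cloud_b(task):
    # Stand-in for a healthy provider.
    return task.upper()


events = []
served_by, output = run_with_failover(
    [("cloud-a", cloud_a), ("cloud-b", cloud_b)], "summarise thread", events
)
```

In a policy-driven setup the ordered `routes` list would itself come from the routing step, so a failover never reaches a provider the scope does not permit.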
Provider routing as policy is what makes the platform durable against the cloud market’s churn. The user’s relationship with their own data is independent of which vendor is in business this quarter.
Related concepts
- Agency model: how typed primitives interact with provider routing.
- Context and memory: how context assembly constructs the packet that crosses any boundary.
- Scope and ACL: how scope tags gate which data classes are cloud-eligible.
- Legibility and continuity: how dispatch records make boundary crossings inspectable.