Guardrails
Stallari assumes model output can be wrong, incomplete, or influenced by untrusted content. Guardrails limit what a model can see, which plugins it can use, and which actions require human review.
This page describes the public safety posture. It does not document internal rule names, scanner signatures, bypass details, or deployment topology.
Core Principles
| Principle | Meaning |
|---|---|
| Least capability | A job should see only the tools and context it needs. |
| Visible authority | Plugin use, data classes, and action classes should be clear to the user. |
| Review before risk | Sensitive or irreversible actions should pause for confirmation. |
| Secrets stay out of prompts | Credentials are resolved at the tool boundary, not sent through model text. |
| Auditability | Work should be inspectable in Activity and related review surfaces. |
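As an illustration of the "Secrets stay out of prompts" principle, here is a minimal sketch in which the model can emit only an opaque credential reference and the secret is looked up inside the tool call. The `SECRETS` store, `ToolCall`, and `execute` are hypothetical names for this example, not Stallari's API.

```python
from dataclasses import dataclass

# Hypothetical in-memory secret store; a real one would be a keychain or vault.
SECRETS = {"mail-service": "s3cr3t-token"}


@dataclass(frozen=True)
class ToolCall:
    """What the model may emit: a tool name, a credential reference, and arguments."""
    tool: str
    credential_ref: str  # an opaque name, never the secret itself
    args: dict


def execute(call: ToolCall) -> dict:
    """Resolve the credential only here, at the tool boundary."""
    token = SECRETS[call.credential_ref]
    # A real tool would use `token` to authenticate its outbound request.
    authenticated = bool(token)
    # Only non-secret results flow back toward the model.
    return {"tool": call.tool, "authenticated": authenticated, "args": call.args}


call = ToolCall(tool="list_messages", credential_ref="mail-service", args={"limit": 5})
result = execute(call)
```

Neither `call` nor `result` contains the token, so nothing the model reads or writes carries the credential.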
Model Context
A provider key does not grant blanket vault access. A model receives only selected context, permitted search results, and plugin outputs needed for the job.
Privacy exclusions and review policy further constrain what should be indexed, recalled, or sent to a model route.
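One way to picture this filtering is a selection step that runs before anything is sent to a model. The sketch below is an assumption for illustration only: `ContextItem`, its `excluded` flag, and `select_context` are hypothetical, and the source names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    source: str             # hypothetical data class, e.g. "notes" or "mail"
    text: str
    excluded: bool = False  # set when a privacy exclusion applies


def select_context(items, allowed_sources):
    """Keep only items from permitted sources that are not privacy-excluded."""
    return [i for i in items if i.source in allowed_sources and not i.excluded]


items = [
    ContextItem("notes", "Project plan draft"),
    ContextItem("mail", "Invoice from supplier"),
    ContextItem("notes", "Private journal entry", excluded=True),
]
selected = select_context(items, allowed_sources={"notes"})
```

Here only the project plan is selected: the mail item comes from a source the job was not granted, and the journal entry is excluded even though its source is allowed.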
Plugin Consent
Plugin consent is the main user-facing permission model for service access. A consent surface should name:
- publisher,
- requested data classes,
- action classes,
- whether network egress is possible,
- whether cloud model context may include returned data,
- and how to revoke access.
First-party local corpus consent is a special case for local data classes such as Mail indexing. Even in that case, the product explains what will be enabled before triggering broad macOS prompts.
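The items a consent surface should name map naturally onto a small record. The following is a sketch under assumptions: `ConsentRequest`, its field names, and `missing_fields` are hypothetical and only mirror the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRequest:
    publisher: str
    data_classes: tuple
    action_classes: tuple
    network_egress: bool
    cloud_context_may_include_data: bool
    revoke_instructions: str


def missing_fields(req: ConsentRequest) -> list:
    """Return the names of fields a consent surface could not display meaningfully."""
    missing = []
    if not req.publisher.strip():
        missing.append("publisher")
    if not req.data_classes:
        missing.append("data_classes")
    if not req.action_classes:
        missing.append("action_classes")
    if not req.revoke_instructions.strip():
        missing.append("revoke_instructions")
    return missing


complete = ConsentRequest(
    publisher="Example Publisher",
    data_classes=("calendar events",),
    action_classes=("read",),
    network_egress=True,
    cloud_context_may_include_data=False,
    revoke_instructions="Remove the plugin in settings.",
)
incomplete = ConsentRequest("", ("calendar events",), ("read",), True, False, " ")
```

A check like `missing_fields(incomplete)` could block a consent prompt that fails to name its publisher or its revocation path. The two boolean fields are always present, so they need no check.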
Action Classes
| Action | Expected Posture |
|---|---|
| Read local context | Requires relevant data-class consent and privacy boundary checks. |
| Write local notes | Should be auditable and correctable. |
| External side effect | Should name the target service and require policy approval. |
| Destructive or high-risk action | Should require explicit human review. |
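The table above can be read as a small decision function. This is a sketch of one possible encoding, not a documented policy engine: the enum members, the keyword flags, and the choice to deny reads that lack consent are all assumptions made for the example.

```python
from enum import Enum


class ActionClass(Enum):
    READ_LOCAL = "read local context"
    WRITE_LOCAL = "write local notes"
    EXTERNAL_SIDE_EFFECT = "external side effect"
    DESTRUCTIVE = "destructive or high-risk action"


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_POLICY_APPROVAL = "needs policy approval"
    NEEDS_HUMAN_REVIEW = "needs human review"


def decide(action, *, has_data_consent=False, policy_approved=False, human_approved=False):
    """Map an action class and its current approvals to a decision."""
    if action is ActionClass.READ_LOCAL:
        return Decision.ALLOW if has_data_consent else Decision.DENY
    if action is ActionClass.WRITE_LOCAL:
        # The caller is expected to record the write so it stays auditable.
        return Decision.ALLOW
    if action is ActionClass.EXTERNAL_SIDE_EFFECT:
        return Decision.ALLOW if policy_approved else Decision.NEEDS_POLICY_APPROVAL
    # Destructive or high-risk: never proceeds without an explicit human decision.
    return Decision.ALLOW if human_approved else Decision.NEEDS_HUMAN_REVIEW
```

With these defaults, `decide(ActionClass.DESTRUCTIVE)` returns `Decision.NEEDS_HUMAN_REVIEW`, and the action proceeds only when `human_approved=True` is passed by whatever records the reviewer's choice.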
macOS Permissions
Stallari treats macOS grants separately:
- Login Items keeps the helper available.
- Full Disk Access enables specific user-approved local corpus features.
- App Management is recovery-oriented for update or replacement flows when macOS blocks them.
- Plugin/MCP consent controls service authority.
The product should explain before prompting. It should not trigger broad prompts on entry to a screen.
Prompt Injection
Untrusted text can contain instructions. Stallari’s stance is to constrain authority outside the model:
- tools are scoped,
- permissions are checked at call time,
- risky actions route to review,
- and Activity records what happened.
The right answer to suspicious content is not simply a smarter prompt. It is a narrower capability set and an explicit review path.
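The four constraints listed above can be sketched as a single gate that every tool call passes through, regardless of what the model's text says. Everything here is hypothetical: the `TOOLS` registry, the permission strings, and the `Job` fields are stand-ins chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical tool registry: name -> (required permission, is_risky)
TOOLS = {
    "search_notes": ("notes.read", False),
    "send_email": ("mail.send", True),
}


@dataclass
class Job:
    allowed_tools: set
    granted_permissions: set
    activity: list = field(default_factory=list)      # stand-in for an audit log
    review_queue: list = field(default_factory=list)  # stand-in for a review surface


def call_tool(job, tool, args):
    """Gate a tool call: scope, call-time permission, review routing, then logging."""
    if tool not in job.allowed_tools or tool not in TOOLS:
        outcome = "denied: tool not in scope"
    else:
        permission, risky = TOOLS[tool]
        if permission not in job.granted_permissions:
            outcome = f"denied: missing permission {permission}"
        elif risky:
            job.review_queue.append((tool, args))
            outcome = "pending review"
        else:
            outcome = "executed"
    job.activity.append((tool, outcome))  # every attempt is recorded, including denials
    return outcome


job = Job(
    allowed_tools={"search_notes", "send_email"},
    granted_permissions={"notes.read", "mail.send"},
)
first = call_tool(job, "search_notes", {"q": "budget"})
second = call_tool(job, "send_email", {"to": "someone@example.com"})
third = call_tool(job, "delete_vault", {})
```

An injected instruction that asks for `delete_vault` fails because the tool is outside the job's scope, and an injected `send_email` stops in the review queue. Neither outcome depends on the model recognising the injection.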
Auditing And Recovery
Use Activity to inspect work that ran. Use Status to find degraded subsystems or permission recovery actions. Use Review for human judgement and correction.
When something is denied, the denial should name the missing permission, missing plugin, unhealthy provider, or required review path.
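A denial that names its cause can be modelled as a reason plus the specific thing that was missing. The sketch below is illustrative only; `DenialReason`, `Denial`, and the message format are assumptions, not a documented interface.

```python
from dataclasses import dataclass
from enum import Enum


class DenialReason(Enum):
    MISSING_PERMISSION = "missing permission"
    MISSING_PLUGIN = "missing plugin"
    UNHEALTHY_PROVIDER = "unhealthy provider"
    REVIEW_REQUIRED = "review required"


@dataclass(frozen=True)
class Denial:
    reason: DenialReason
    subject: str  # the specific permission, plugin, provider, or review path

    def message(self) -> str:
        """Render a denial that tells the user exactly what to fix."""
        return f"Denied ({self.reason.value}): {self.subject}"


denial = Denial(DenialReason.MISSING_PLUGIN, "calendar")
text = denial.message()  # "Denied (missing plugin): calendar"
```

Keeping the reason as an enum rather than free text means a status or recovery screen could branch on it, for example offering a permission prompt for one reason and a review link for another.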
Limits
Guardrails reduce risk; they do not remove the need for user judgement. Be especially conservative with workflows that touch money, legal obligations, health, employment, family safety, or external communications.
Related concepts
- Scope And ACL: scope tags are the substrate that makes “what can this agent see” answerable per task
- Agency Model: least capability, visible authority, and review-before-risk are how typed primitives compose
- Legibility And Continuity: Activity and review surfaces make denied or sensitive work inspectable