Guardrails

Stallari assumes model output can be wrong, incomplete, or influenced by untrusted content. Guardrails limit what a model can see, which plugins it can use, and which actions require human review.

This page describes the public safety posture. It does not document internal rule names, scanner signatures, bypass details, or deployment topology.

Core Principles

Principle	Meaning
Least capability	A job should see only the tools and context it needs.
Visible authority	Plugin use, data classes, and action classes should be clear to the user.
Review before risk	Sensitive or irreversible actions should pause for confirmation.
Secrets stay out of prompts	Credentials are resolved at the tool boundary, not sent through model text.
Auditability	Work should be inspectable in Activity and related review surfaces.
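
The "Secrets stay out of prompts" row can be illustrated with a minimal sketch. Everything named here (`SECRETS`, `ToolCall`, `resolve_and_run`, the `mail.search` tool) is a hypothetical stand-in, not a Stallari API: the model-authored call carries only an opaque credential reference, and the secret value is looked up at the tool boundary.

```python
from dataclasses import dataclass

# Hypothetical in-memory secret store; a real one would be a keychain or vault.
SECRETS = {"mail_api_token": "s3cr3t-value"}


@dataclass(frozen=True)
class ToolCall:
    tool: str            # hypothetical tool name
    args: dict           # model-authored arguments; never hold secret values
    credential_ref: str  # opaque name that is safe to appear in model text


def resolve_and_run(call: ToolCall) -> dict:
    """Resolve the credential at the tool boundary, then run the tool."""
    token = SECRETS[call.credential_ref]
    # The token is used only here and is left out of the returned result,
    # so it cannot flow back into model context.
    return {"tool": call.tool, "authorized": bool(token), "args": call.args}


call = ToolCall(tool="mail.search", args={"query": "invoices"},
                credential_ref="mail_api_token")
result = resolve_and_run(call)
```

Neither the call the model produced nor the result it receives contains the secret value; only the reference name crosses the model boundary.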

Model Context

A provider key does not grant blanket vault access. A model receives only selected context, permitted search results, and plugin outputs needed for the job.

Privacy exclusions and review policy further constrain what should be indexed, recalled, or sent to a model route.

Plugin consent is the main user-facing permission model for service access. A consent surface should name:

publisher,
requested data classes,
action classes,
whether network egress is possible,
whether cloud model context may include returned data,
and how to revoke access.
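
As a rough sketch, the fields a consent surface names could be carried in a single record. The `ConsentRequest` type, its field names, and the rule that only the publisher and revocation path must be non-empty are assumptions made for illustration, not the product's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRequest:
    publisher: str
    data_classes: tuple      # requested data classes
    action_classes: tuple    # requested action classes
    network_egress: bool     # whether network egress is possible
    cloud_context: bool      # whether cloud model context may include returned data
    revoke_hint: str         # how to revoke access


def missing_fields(req: ConsentRequest) -> list:
    """Return the names of required text fields that are empty (assumed rule)."""
    missing = []
    if not req.publisher.strip():
        missing.append("publisher")
    if not req.revoke_hint.strip():
        missing.append("revoke_hint")
    return missing


def render(req: ConsentRequest) -> str:
    """Render the request as plain text, one line per named item."""
    lines = [
        f"Publisher: {req.publisher}",
        f"Data classes: {', '.join(req.data_classes) or 'none'}",
        f"Action classes: {', '.join(req.action_classes) or 'none'}",
        f"Network egress possible: {'yes' if req.network_egress else 'no'}",
        f"Cloud model context may include returned data: "
        f"{'yes' if req.cloud_context else 'no'}",
        f"To revoke: {req.revoke_hint}",
    ]
    return "\n".join(lines)
```

Keeping every item in one record makes it harder for a consent screen to omit the egress or revocation line by accident.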

First-party local corpus consent is a special case for local data classes such as Mail indexing. Its consent surface still explains what is being enabled before any broad macOS prompt is triggered.


Action Classes

Action	Expected Posture
Read local context	Requires relevant data-class consent and privacy boundary checks.
Write local notes	Should be auditable and correctable.
External side effect	Should name the target service and require policy approval.
Destructive or high-risk action	Should require explicit human review.
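
The table above can be read as a lookup from action class to the checks that must pass before the action runs. The sketch below assumes hypothetical requirement names (`data_class_consent`, `policy_approval`, and so on); it restates the table and is not Stallari's policy engine.

```python
from enum import Enum


class ActionClass(Enum):
    READ_LOCAL_CONTEXT = "read_local_context"
    WRITE_LOCAL_NOTES = "write_local_notes"
    EXTERNAL_SIDE_EFFECT = "external_side_effect"
    DESTRUCTIVE_OR_HIGH_RISK = "destructive_or_high_risk"


# Hypothetical requirement names, one tuple per row of the table.
REQUIREMENTS = {
    ActionClass.READ_LOCAL_CONTEXT: ("data_class_consent", "privacy_boundary_check"),
    ActionClass.WRITE_LOCAL_NOTES: ("audit_record",),
    ActionClass.EXTERNAL_SIDE_EFFECT: ("named_target_service", "policy_approval"),
    ActionClass.DESTRUCTIVE_OR_HIGH_RISK: ("human_review",),
}


def unmet(action: ActionClass, satisfied: set) -> list:
    """Return the requirements for this action class that are not yet satisfied."""
    return [req for req in REQUIREMENTS[action] if req not in satisfied]
```

An action would proceed only when `unmet(...)` returns an empty list; otherwise the remaining names say what is still needed.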

macOS Permissions

Stallari treats each macOS grant, and plugin consent, as a separate permission:

Login Items keeps the helper available.
Full Disk Access enables specific user-approved local corpus features.
App Management is a recovery grant, used for update or replacement flows when macOS blocks them.
Plugin/MCP consent controls service authority.

The product should explain before prompting. It should not trigger broad prompts on entry to a screen.

Prompt Injection

Untrusted text can contain instructions. Stallari’s stance is to constrain authority outside the model:

tools are scoped,
permissions are checked at call time,
risky actions route to review,
and Activity records what happened.

The right answer to suspicious content is not simply a smarter prompt. It is a narrower capability set and an explicit review path.
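
The four constraints listed above can be sketched as one gate that runs outside the model on every tool call. The `Job` type, the tool names, and the permission names are hypothetical; the point is only the order of checks and that every decision is recorded.

```python
from dataclasses import dataclass, field

# Hypothetical tool metadata, assumed for illustration.
RISKY_TOOLS = frozenset({"mail.send", "files.delete"})
REQUIRED_PERMISSION = {
    "notes.search": "notes.read",
    "mail.send": "mail.write",
    "files.delete": "files.write",
}


@dataclass
class Job:
    allowed_tools: frozenset                      # tools scoped to this job
    granted_permissions: set                      # may change while the job runs
    activity: list = field(default_factory=list)  # record of every decision


def gate(job: Job, tool: str) -> str:
    """Decide at call time whether a tool call is allowed, reviewed, or denied."""
    if tool not in job.allowed_tools:
        decision = "denied: tool out of scope"
    elif REQUIRED_PERMISSION.get(tool) not in job.granted_permissions:
        # Unknown tools have no mapped permission and therefore fail closed.
        decision = "denied: missing permission"
    elif tool in RISKY_TOOLS:
        decision = "review"
    else:
        decision = "allowed"
    job.activity.append((tool, decision))
    return decision
```

Because the gate never reads the model's text, instructions embedded in untrusted content cannot widen `allowed_tools` or skip the review branch.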

Auditing And Recovery

Use Activity to inspect work that ran. Use Status to find degraded subsystems or permission recovery actions. Use Review for human judgement and correction.

When something is denied, the denial should name the missing permission, missing plugin, unhealthy provider, or required review path.
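
One way to keep denials specific is to make the cause a required part of the denial value. The `DenialKind` and `Denial` names and the message format below are illustrative assumptions; the four kinds mirror the causes listed above.

```python
from dataclasses import dataclass
from enum import Enum


class DenialKind(Enum):
    MISSING_PERMISSION = "missing permission"
    MISSING_PLUGIN = "missing plugin"
    UNHEALTHY_PROVIDER = "unhealthy provider"
    REVIEW_REQUIRED = "required review path"


@dataclass(frozen=True)
class Denial:
    kind: DenialKind
    subject: str  # the permission, plugin, provider, or review path by name

    def message(self) -> str:
        """Return a user-facing message that names the cause and its subject."""
        return f"Denied ({self.kind.value}): {self.subject}"
```

For example, `Denial(DenialKind.MISSING_PLUGIN, "calendar").message()` yields `Denied (missing plugin): calendar`, which tells the user what to fix rather than only that something failed.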

Limits

Guardrails reduce risk; they do not remove the need for user judgement. Be especially conservative with workflows that touch money, legal obligations, health, employment, family safety, or external communications.

Scope And ACL: scope tags are the substrate that makes “what can this agent see” answerable per task.
Agency Model: least capability, visible authority, and review before risk are how typed primitives compose.
Legibility And Continuity: Activity and review surfaces make denied or sensitive work inspectable.