Safety & Validation
ARISE generates and executes Python code at runtime. This page covers the built-in safety mechanisms and production recommendations.
Generated code is untrusted
All synthesized skills are untrusted third-party code until they pass the full validation pipeline. Apply the same security discipline you would to any user-submitted code.
See SECURITY.md for the full threat model.
Validation Pipeline
Every synthesized skill passes through multiple layers before promotion:
| Layer | What it does |
|---|---|
| Sandbox | Runs tests in an isolated process or Docker container with a timeout |
| Test suite | LLM writes tests alongside the tool; all must pass |
| Adversarial testing | Separate LLM call tries to break the tool (edge cases, type boundaries, security) |
| Import restrictions | allowed_imports whitelist blocks dangerous modules |
| Promotion gate | Only skills passing all layers become ACTIVE |
| Version control | SQLite checkpoint before every promotion; rollback anytime |
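The layers in the table behave like a chain of gates: a skill becomes ACTIVE only if every check passes. A minimal sketch of that control flow follows. The `run_pipeline` function and the stand-in checks are hypothetical and are not part of the ARISE API.

```python
from typing import Callable, List, Tuple

# Hypothetical: each layer is a (name, check) pair; check returns True on pass.
Layer = Tuple[str, Callable[[str], bool]]


def run_pipeline(code: str, layers: List[Layer]) -> Tuple[str, List[str]]:
    """Run every layer in order and stop at the first failure.

    Returns ("ACTIVE", passed_layers) if all layers pass,
    otherwise ("TESTING", passed_layers).
    """
    passed: List[str] = []
    for name, check in layers:
        if not check(code):
            return "TESTING", passed
        passed.append(name)
    return "ACTIVE", passed


# Illustrative stand-ins for the real layers (assumptions, not ARISE internals).
layers: List[Layer] = [
    ("import_restrictions", lambda code: "import socket" not in code),
    ("test_suite", lambda code: "def " in code),
]

status_ok, passed_ok = run_pipeline("def add(a, b):\n    return a + b\n", layers)
status_bad, passed_bad = run_pipeline("import socket\n", layers)
```

Stopping at the first failure keeps later, more expensive layers (such as an extra LLM call) from running on code that has already been rejected.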
Sandbox
Generated code runs in an isolated environment. Configure it in ARISEConfig:
from arise import ARISEConfig

config = ARISEConfig(
    sandbox_backend="docker",  # "subprocess" (default) or "docker"
    sandbox_timeout=30,        # seconds before the sandbox kills the process
)

subprocess (default)
Runs generated code in a separate Python process. Provides process isolation and timeout enforcement, but no network or filesystem isolation. Suitable for development and trusted environments.
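To make the trade-off concrete, here is a rough sketch of process isolation with a timeout using only the standard library. It is not ARISE's actual sandbox code, and `run_in_subprocess` is a hypothetical helper. Note that the child process can still reach the network and the filesystem, which is the gap described above.

```python
import subprocess
import sys


def run_in_subprocess(code: str, timeout: float = 30.0) -> dict:
    """Run `code` in a fresh Python interpreter with a wall-clock timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "reason": "timeout", "stdout": "", "stderr": ""}
    return {
        "ok": result.returncode == 0,
        "reason": "exit",
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


fast = run_in_subprocess("print(1 + 1)", timeout=10)
slow = run_in_subprocess("import time; time.sleep(5)", timeout=0.5)
```

Here `fast` completes normally, while `slow` is killed and reported with `reason` set to `"timeout"`.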
Docker (recommended for production)
Runs generated code in an isolated container:
- No network access
- Read-only filesystem
- Resource limits (CPU, memory)
- Hard process timeout
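For orientation, the restrictions above map onto standard `docker run` flags. The sketch below only builds the command line. The image name, mount path, and limit values are illustrative assumptions, and ARISE's own container setup may differ. The timeout is not shown here because it would be enforced by whatever launches the command.

```python
from typing import List


def docker_run_command(image: str, script_path: str,
                       memory: str = "256m", cpus: str = "0.5") -> List[str]:
    """Build a `docker run` argv that applies the restrictions listed above."""
    return [
        "docker", "run", "--rm",
        "--network", "none",   # no network access
        "--read-only",         # read-only root filesystem
        "--memory", memory,    # memory limit
        "--cpus", cpus,        # CPU limit
        # Mount the skill file read-only (hypothetical container path).
        "--volume", f"{script_path}:/sandbox/skill.py:ro",
        image, "python", "/sandbox/skill.py",
    ]


cmd = docker_run_command("python:3.12-slim", "/tmp/skill.py")
```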
pip install arise-ai[docker]

config = ARISEConfig(
    sandbox_backend="docker",
    sandbox_timeout=30,
)

Import Restrictions
Use allowed_imports to whitelist which modules synthesized skills can use. When set, ARISE performs both static and dynamic analysis:
- Static import / from ... import statements
- Dynamic __import__("module") calls
- importlib.import_module("module") calls
- exec() / eval() containing import statements
config = ARISEConfig(
    allowed_imports=[
        "json", "re", "hashlib", "csv", "math", "base64",
        "datetime", "collections", "itertools",
    ],
)

Skills with disallowed imports are rejected and refined. If allowed_imports is None (the default), no restriction is applied.
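One way such a check could work is by walking the syntax tree with the standard `ast` module. This is a simplified sketch rather than ARISE's implementation: `find_disallowed_imports` is a hypothetical name, and it only catches string-literal arguments to `__import__` and `import_module`, not imports hidden inside `exec()` or `eval()` strings.

```python
import ast
from typing import Iterable, Set


def find_disallowed_imports(source: str, allowed: Iterable[str]) -> Set[str]:
    """Return top-level module names used in `source` that are not allowed."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.module:  # None for relative imports like `from . import x`
                found.add(node.module.split(".")[0])
        elif isinstance(node, ast.Call) and node.args:
            func = node.func
            is_dunder = isinstance(func, ast.Name) and func.id == "__import__"
            is_importlib = (isinstance(func, ast.Attribute)
                            and func.attr == "import_module")
            first = node.args[0]
            if ((is_dunder or is_importlib)
                    and isinstance(first, ast.Constant)
                    and isinstance(first.value, str)):
                found.add(first.value.split(".")[0])
    return found - set(allowed)


source = 'import json\nimport socket\nm = __import__("subprocess")\n'
bad = find_disallowed_imports(source, ["json", "re"])

dynamic = 'import importlib\nimportlib.import_module("os.path")\n'
bad_dynamic = find_disallowed_imports(dynamic, ["importlib"])
```

Comparing only the top-level package name means that allowing `collections` also allows `collections.abc`; a stricter policy could compare full dotted names instead.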
Always set allowed_imports in production
Start with standard library modules only. Add third-party packages only as needed and after reviewing the risk. Do not include subprocess, socket, os (which exposes os.system), or requests unless your use case specifically requires them.
Adversarial Testing
After the sandbox test suite passes, ARISE runs a second LLM call specifically designed to find weaknesses. The adversarial model generates inputs that target:
- Edge cases (empty inputs, extreme values, boundary conditions)
- Type boundary violations (passing strings where ints are expected)
- Security-probing inputs (path traversal attempts, injection strings)
- Unexpected data shapes
If adversarial tests find a problem, ARISE refines the skill and re-tests before promotion. Skills that still fail after max_refinement_attempts are kept in TESTING status rather than promoted.
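The probe-and-refine loop described above can be sketched as follows. Everything here is illustrative: `probe`, `refine_until_clean`, and the `refine` callback (a stand-in for the LLM refinement call) are hypothetical names, and treating ValueError and TypeError as graceful rejections is an assumption of this sketch.

```python
from typing import Any, Callable, List, Tuple

Skill = Callable[[Any], Any]
Failure = Tuple[Any, str]


def probe(skill: Skill, inputs: List[Any]) -> List[Failure]:
    """Call `skill` on each input and collect unexpected crashes."""
    failures: List[Failure] = []
    for value in inputs:
        try:
            skill(value)
        except (ValueError, TypeError):
            continue  # assumed to be a deliberate, graceful rejection
        except Exception as exc:
            failures.append((value, type(exc).__name__))
    return failures


def refine_until_clean(skill: Skill,
                       refine: Callable[[Skill, List[Failure]], Skill],
                       inputs: List[Any],
                       max_refinement_attempts: int = 3) -> Tuple[Skill, str]:
    """Refine and re-test until the probe is clean or attempts run out."""
    failures = probe(skill, inputs)
    attempts = 0
    while failures and attempts < max_refinement_attempts:
        skill = refine(skill, failures)
        attempts += 1
        failures = probe(skill, inputs)
    return skill, ("TESTING" if failures else "ACTIVE")


def first_char(text):
    return text[0]  # raises IndexError on an empty string


def fixed_first_char(text):
    if not isinstance(text, str) or not text:
        raise ValueError("expected a non-empty string")
    return text[0]


adversarial_inputs = ["", "a", "../../etc/passwd", 42, None]
initial_failures = probe(first_char, adversarial_inputs)
_, status_fixed = refine_until_clean(
    first_char, lambda skill, failures: fixed_first_char, adversarial_inputs)
_, status_stuck = refine_until_clean(
    first_char, lambda skill, failures: skill, adversarial_inputs)
```

A refinement that fixes the crash yields ACTIVE, while one that never changes the skill exhausts its attempts and leaves the skill in TESTING.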
Version Control & Rollback
Every skill promotion is checkpointed with an integer version number. You can inspect and roll back at any time:
# Check current library state
arise status ./arise_skills

# List skills with their origins
arise skills ./arise_skills

# View a specific skill's implementation and tests
arise inspect ./arise_skills <skill_id>

# Roll back to a previous version
arise rollback ./arise_skills 3

From Python:

arise.rollback(version=3)

Rolling back restores the exact set of active skills from that checkpoint. The rolled-back versions are not deleted, so you can roll forward again.
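The non-destructive rollback behavior can be illustrated with a small SQLite sketch. The `CheckpointStore` class and its table layout are assumptions made for illustration and do not reflect ARISE's actual schema.

```python
import json
import sqlite3
from typing import List


class CheckpointStore:
    """Hypothetical store: one row per checkpoint, holding active skill ids."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "version INTEGER PRIMARY KEY AUTOINCREMENT, active TEXT NOT NULL)"
        )
        self.current = 0  # version currently in effect; 0 means none yet

    def checkpoint(self, active_skills: List[str]) -> int:
        """Record the active skill set and return its new version number."""
        cur = self.db.execute(
            "INSERT INTO checkpoints (active) VALUES (?)",
            (json.dumps(sorted(active_skills)),),
        )
        self.db.commit()
        self.current = cur.lastrowid
        return self.current

    def rollback(self, version: int) -> List[str]:
        """Make `version` current again without deleting later checkpoints."""
        row = self.db.execute(
            "SELECT active FROM checkpoints WHERE version = ?", (version,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no checkpoint with version {version}")
        self.current = version
        return json.loads(row[0])


store = CheckpointStore()
v1 = store.checkpoint(["parse_csv"])
v2 = store.checkpoint(["parse_csv", "hash_file"])
restored = store.rollback(v1)      # back to the first checkpoint
rolled_forward = store.rollback(v2)  # the later checkpoint still exists
```

Because rollback only moves a pointer, the same call can be used to roll forward to a later version.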
Skill Registry Security
The SkillRegistry distributes executable Python code via S3. Treat registry entries with the same care as any code distribution system.
When pulling from a registry:
from arise import SkillRegistry
from arise.skills.sandbox import Sandbox

registry = SkillRegistry(bucket="my-registry")
sandbox = Sandbox(backend="docker")

# Always validate pulled skills
skill = registry.pull("parse_csv", validate=True, sandbox=sandbox)

# Pin a specific version instead of always pulling the latest
skill = registry.pull("parse_csv", version=3)

IAM permissions:
- Agent processes should have read-only S3 access (s3:GetObject, s3:ListBucket)
- Only the worker process (or a dedicated publisher role) should have write access (s3:PutObject)
- Enable S3 versioning on the registry bucket for rollback capability
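As an illustration of the read-only side, the sketch below builds an IAM policy document for agent processes. The bucket name comes from the example above; the helper name is hypothetical, and a real deployment may want to scope the object resource to a narrower prefix.

```python
import json


def read_only_registry_policy(bucket: str) -> dict:
    """Build an IAM policy that allows listing and reading, but not writing."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # s3:GetObject applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy = read_only_registry_policy("my-registry")
policy_json = json.dumps(policy, indent=2)
```

The publisher role would use a separate policy that additionally grants s3:PutObject on the object resource.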
Rate Limiting
Cap LLM spend for evolution with max_evolutions_per_hour:

config = ARISEConfig(
    max_evolutions_per_hour=3,  # default
    max_library_size=50,        # cap total active skills
)

When the rate limit is hit, ARISE skips the evolution cycle and logs a message. Failures continue to accumulate and evolution resumes in the next hour window.
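The skip-and-resume behavior resembles a fixed-window rate limiter. The sketch below assumes clock-hour windows, which the text does not confirm; ARISE may use a different windowing scheme. `HourlyLimiter` is a hypothetical class, and the injectable clock exists only to make the example deterministic.

```python
import time
from typing import Callable, Optional


class HourlyLimiter:
    """Allow at most `max_per_hour` acquisitions per fixed one-hour window."""

    def __init__(self, max_per_hour: int = 3,
                 clock: Callable[[], float] = time.time) -> None:
        self.max_per_hour = max_per_hour
        self.clock = clock
        self.window: Optional[int] = None  # index of the current hour window
        self.count = 0

    def try_acquire(self) -> bool:
        window = int(self.clock() // 3600)
        if window != self.window:
            # A new hour window has started: reset the counter.
            self.window = window
            self.count = 0
        if self.count >= self.max_per_hour:
            return False  # caller skips this evolution cycle
        self.count += 1
        return True


now = [0.0]  # fake clock, in seconds
limiter = HourlyLimiter(max_per_hour=3, clock=lambda: now[0])
results = [limiter.try_acquire() for _ in range(4)]
now[0] = 3600.0  # move into the next hour window
after_reset = limiter.try_acquire()
```

With a limit of 3, the fourth call in the same window is refused, and the first call in the next window succeeds again.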
Production Recommendations
- Set allowed_imports: start with the standard library only and add packages explicitly.
- Use the Docker sandbox for any workload that processes untrusted input.
- Review promoted skills before deploying: use arise inspect <id> to read the implementation.
- Restrict IAM permissions: read-only S3 for agent processes; write access only for the worker.
- Monitor evolution costs: set max_evolutions_per_hour and watch cost_tracker output.
- Set max_library_size to prevent unbounded skill accumulation.
- Enable OTel tracing with arise-ai[otel] to observe evolution steps in your existing observability stack.