
feat(security): #495 — behavioral SBOM at compile time + perry audit --sbom#953

Merged: proggeramlug merged 1 commit into main from feat/495-perry-audit on May 18, 2026

@proggeramlug
Contributor

Closes #495.

Summary

Every Perry compile now writes a per-module behavioral manifest to <project>/.perry-cache/audit.json — a JSON document capturing the stdlib symbols each source module actually calls. The manifest is the foundation for the rest of the supply-chain hardening series (#501, #496, #502) and gives reviewers a way to see exactly what surface a dependency touches without rebuilding the binary.

Zero runtime cost — the walk runs at compile time over the lowered HIR.

Cross-platform — runs in the platform-agnostic compile_command driver, so every backend (LLVM / WASM / ArkTS / HarmonyOS / Glance / SwiftUI / JS) inherits SBOM emission from one choke point.

Example output

main.ts:

```ts
import * as fs from "fs";
import * as path from "path";
const data = fs.readFileSync("/etc/hostname", "utf8");
const p = path.join("/tmp", "x");
```

.perry-cache/audit.json:

```json
{
  "version": 1,
  "modules": [{
    "source": "/repo/main.ts",
    "package": null,
    "stdlib": {
      "fs": ["readFileSync"],
      "path": ["join"]
    }
  }]
}
```

perry audit --sbom:

```text
Behavioral SBOM (perry audit --sbom)
  manifest version: 1
  modules: 1

== <host source> ==
  /repo/main.ts
    fs: readFileSync
    path: join
```

JSON shape is versioned (version: 1) and byte-deterministic across builds (BTreeMap keys + deduplicated, sorted method lists). That enables a supply-chain review workflow: perry audit --sbom > before.txt, change package.json, rebuild, perry audit --sbom > after.txt, then diff before.txt after.txt.
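
The determinism comes from ordered containers. Below is a minimal std-only sketch of the idea. The `StdlibCalls` alias, `record`, and the hand-rolled `to_json` are hypothetical; this is not Perry's implementation, which presumably serializes through its own types. The sketch skips string escaping on the assumption that namespaces and method names are plain identifiers.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical per-module record: namespace -> methods called.
// BTreeMap and BTreeSet both iterate in sorted order; the set also dedupes.
type StdlibCalls = BTreeMap<String, BTreeSet<String>>;

fn record(calls: &mut StdlibCalls, namespace: &str, method: &str) {
    calls
        .entry(namespace.to_string())
        .or_default()
        .insert(method.to_string());
}

// Compact JSON whose byte order depends only on the contents, never on the
// order in which calls were discovered. No escaping: assumes identifier-like names.
fn to_json(calls: &StdlibCalls) -> String {
    let entries: Vec<String> = calls
        .iter()
        .map(|(ns, methods)| {
            let list: Vec<String> = methods.iter().map(|m| format!("\"{}\"", m)).collect();
            format!("\"{}\":[{}]", ns, list.join(","))
        })
        .collect();
    format!("{{{}}}", entries.join(","))
}

fn main() {
    // Walk order A: path first, plus a duplicate fs call.
    let mut a = StdlibCalls::new();
    record(&mut a, "path", "join");
    record(&mut a, "fs", "readFileSync");
    record(&mut a, "fs", "readFileSync");

    // Walk order B: fs first, no duplicate.
    let mut b = StdlibCalls::new();
    record(&mut b, "fs", "readFileSync");
    record(&mut b, "path", "join");

    // Same contents, same bytes, regardless of discovery order.
    assert_eq!(to_json(&a), to_json(&b));
    println!("{}", to_json(&a));
}
```

Because the output bytes depend only on the set of recorded calls, a plain text diff between two builds shows only real changes in stdlib surface.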

Walker

perry-hir::audit::audit_module(&Module, source) walks init + every function body + every class method body, capturing two HIR channels:

  • NativeMethodCall { module, method, .. } — the general-shape variant for stdlib calls after alias resolution.
  • Specialized HIR variants for hot paths (FsReadFileSync, FsWriteFileSync, FsExistsSync, …, PathJoin, PathDirname, PathResolve, …, ProcessEnv, ProcessCwd, ProcessArgv, Process{Stdin,Stdout,Stderr}IsTTY, TtyIsAtty, FileURLToPath) mapped through specialized_stdlib_call() to their equivalent (namespace, method) pair. Without this, a host that only calls fs.readFileSync would appear to make zero stdlib calls, defeating the SBOM — the smoke test in development surfaced this exact gap.
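
Here is a toy version of the two-channel walk, showing why the specialized variants have to be mapped back. The `Expr`/`Stmt` enums, their fields, and `collect` are hypothetical simplifications of Perry's HIR. Only the variant names `NativeMethodCall`, `FsReadFileSync`, `PathJoin` and the helper name `specialized_stdlib_call` come from the description above. The real mapping covers many more variants.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Calls = BTreeMap<String, BTreeSet<String>>;

// Toy HIR: only the shapes the walker cares about (hypothetical fields).
enum Expr {
    NativeMethodCall { module: String, method: String, args: Vec<Expr> },
    FsReadFileSync(Box<Expr>),
    PathJoin(Vec<Expr>),
    Str(String),
}

enum Stmt {
    Expr(Expr),
    If { cond: Expr, then_branch: Vec<Stmt>, else_branch: Vec<Stmt> },
}

// Maps specialized hot-path variants back to their (namespace, method) pair.
fn specialized_stdlib_call(expr: &Expr) -> Option<(&'static str, &'static str)> {
    match expr {
        Expr::FsReadFileSync(_) => Some(("fs", "readFileSync")),
        Expr::PathJoin(_) => Some(("path", "join")),
        _ => None,
    }
}

fn walk_expr(expr: &Expr, out: &mut Calls) {
    // Channel 1: the general NativeMethodCall shape.
    if let Expr::NativeMethodCall { module, method, .. } = expr {
        out.entry(module.clone()).or_default().insert(method.clone());
    }
    // Channel 2: specialized variants that would otherwise be invisible.
    if let Some((ns, method)) = specialized_stdlib_call(expr) {
        out.entry(ns.to_string()).or_default().insert(method.to_string());
    }
    // Recurse so calls nested inside arguments are recorded too.
    match expr {
        Expr::NativeMethodCall { args, .. } | Expr::PathJoin(args) => {
            for arg in args {
                walk_expr(arg, out);
            }
        }
        Expr::FsReadFileSync(inner) => walk_expr(inner, out),
        Expr::Str(_) => {}
    }
}

fn walk_stmts(stmts: &[Stmt], out: &mut Calls) {
    for stmt in stmts {
        match stmt {
            Stmt::Expr(e) => walk_expr(e, out),
            Stmt::If { cond, then_branch, else_branch } => {
                walk_expr(cond, out);
                walk_stmts(then_branch, out);
                walk_stmts(else_branch, out);
            }
        }
    }
}

fn collect(body: &[Stmt]) -> Calls {
    let mut calls = Calls::new();
    walk_stmts(body, &mut calls);
    calls
}

fn main() {
    // Roughly main.ts from the example, with path.join moved inside an `if`.
    let body = vec![
        Stmt::Expr(Expr::FsReadFileSync(Box::new(Expr::Str("/etc/hostname".into())))),
        Stmt::If {
            cond: Expr::Str("flag".into()),
            then_branch: vec![Stmt::Expr(Expr::PathJoin(vec![
                Expr::Str("/tmp".into()),
                Expr::Str("x".into()),
            ]))],
            else_branch: vec![],
        },
    ];
    println!("{:?}", collect(&body));
}
```

If you drop the `specialized_stdlib_call` step, this toy body records nothing at all. That is the same gap the smoke test surfaced.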

CLI

Adds a --sbom flag to the existing perry audit subcommand. When passed, the flag short-circuits before the remote security-scan call and prints the local manifest instead. The viewer walks up the directory tree to find .perry-cache/audit.json, the same way perry compile walks up to find package.json.
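
The upward search maps directly onto `Path::ancestors` from the Rust standard library. The sketch below shows the behavior described; `find_audit_manifest` is a hypothetical name, and this is not the CLI's actual code.

```rust
use std::path::{Path, PathBuf};

// Returns the first `.perry-cache/audit.json` found in `start` or any of its
// ancestors, checking the closest directory first.
fn find_audit_manifest(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .map(|dir| dir.join(".perry-cache").join("audit.json"))
        .find(|candidate| candidate.is_file())
}

fn main() {
    let cwd = std::env::current_dir().expect("current directory is not accessible");
    match find_audit_manifest(&cwd) {
        Some(path) => println!("reading {}", path.display()),
        None => eprintln!("no .perry-cache/audit.json in this directory or its parents"),
    }
}
```

The closest match wins. Running the viewer from a subdirectory of a project therefore finds that project's manifest, not one further up the tree.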

The pre-existing perry audit remote-scan behavior is preserved — only the new --sbom flag activates the SBOM viewer.

Test coverage

9 unit tests in perry-hir::audit::tests cover the walker:

  • empty module → no records
  • top-level + nested-inside-Stmt::If calls recorded
  • duplicate calls deduped, sorted within namespace
  • package-name extraction for unscoped / scoped / nested node_modules/ / user-source paths
  • byte-deterministic JSON serialization
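
The package-name cases listed above are unscoped, scoped, nested node_modules/, and user source. They can be sketched as follows. `package_name` is a hypothetical helper, not necessarily how perry-hir::audit implements it. It assumes the innermost `node_modules` segment decides ownership, as the test name `nested_node_modules_returns_innermost` in the commit message suggests.

```rust
use std::path::{Component, Path};

// Returns the npm package owning `source` ("@scope/pkg" or "pkg"),
// or None for user source outside node_modules/.
fn package_name(source: &Path) -> Option<String> {
    let parts: Vec<String> = source
        .components()
        .filter_map(|c| match c {
            Component::Normal(s) => Some(s.to_string_lossy().into_owned()),
            _ => None,
        })
        .collect();
    // Innermost node_modules wins, so nested installs map to the inner package.
    let idx = parts.iter().rposition(|p| p.as_str() == "node_modules")?;
    let first = parts.get(idx + 1)?;
    if first.starts_with('@') {
        // Scoped packages span two path segments: @scope/name.
        let name = parts.get(idx + 2)?;
        Some(format!("{}/{}", first, name))
    } else {
        Some(first.clone())
    }
}

fn main() {
    for p in [
        "/repo/src/main.ts",
        "/repo/node_modules/left-pad/index.js",
        "/repo/node_modules/@scope/pkg/lib/a.js",
        "/repo/node_modules/a/node_modules/b/index.js",
    ] {
        println!("{} -> {:?}", p, package_name(Path::new(p)));
    }
}
```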

An end-to-end smoke test confirms the produced audit.json round-trips through perry audit --sbom.

Out of scope (#495 follow-ups documented in code + docs)

The manifest shape's version: 1 field exists so consumers can detect when new top-level keys land.

Acceptance

Notes

No Cargo.toml version bump, no CLAUDE.md version line touch, no CHANGELOG.md entry — maintainer folds those in at merge time.


Every Perry compile now writes a per-module behavioral manifest to
<project>/.perry-cache/audit.json. The manifest captures, for each
source module, the stdlib symbols actually called by the lowered HIR:

  {
    "version": 1,
    "modules": [{
      "source": "/repo/main.ts",
      "package": null,        // or "@scope/pkg" if under node_modules/
      "stdlib": {
        "fs": ["readFileSync"],
        "path": ["join"]
      }
    }]
  }

JSON shape is versioned (`version: 1`); keys are sorted (BTreeMap),
method lists are deduplicated + sorted, so the bytes are
deterministic across builds. Capturing `perry audit --sbom > before.txt`,
changing package.json, rebuilding, capturing
`perry audit --sbom > after.txt`, and diffing the two files makes a
meaningful supply-chain review tool.

This manifest is the foundation for the follow-ups:

- #501 will consume the SBOM to enforce host-controlled per-package
  capabilities ("this dep must not call child_process.*").
- #496 (--lockdown) will flag violations from the same data.
- #502 (URL/host egress allowlist) will graft `literal_hosts` onto
  the same shape.
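
#501's design is not part of this change. Still, a rule like "this dep must not call child_process.*" only needs the per-package stdlib map. The sketch below is hypothetical throughout: the `ModuleRecord` struct, the deny-list shape (package name mapped to denied namespaces), and `violations` are assumptions for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical slice of one manifest entry.
struct ModuleRecord {
    package: Option<String>,
    stdlib: BTreeMap<String, BTreeSet<String>>,
}

// Lists "pkg: namespace.method" for every call a package makes into a
// namespace it is denied (e.g. child_process.* for "evil-dep").
fn violations(records: &[ModuleRecord], denied: &BTreeMap<&str, Vec<&str>>) -> Vec<String> {
    let mut out = Vec::new();
    for rec in records {
        // Host source (package: null) is not subject to per-package rules here.
        let pkg = match &rec.package {
            Some(p) => p,
            None => continue,
        };
        if let Some(namespaces) = denied.get(pkg.as_str()) {
            for ns in namespaces {
                if let Some(methods) = rec.stdlib.get(*ns) {
                    for m in methods {
                        out.push(format!("{}: {}.{}", pkg, ns, m));
                    }
                }
            }
        }
    }
    out
}

fn main() {
    let calls: BTreeSet<String> = vec!["exec".to_string()].into_iter().collect();
    let mut stdlib = BTreeMap::new();
    stdlib.insert("child_process".to_string(), calls);
    let records = vec![ModuleRecord { package: Some("evil-dep".to_string()), stdlib }];

    let mut denied = BTreeMap::new();
    denied.insert("evil-dep", vec!["child_process"]);
    for v in violations(&records, &denied) {
        println!("violation: {}", v);
    }
}
```

Skipping host source is an assumption of this sketch: it treats capability rules as applying to dependencies only.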

The audit walk runs in the platform-agnostic
`compile_command` driver, walking each `Module` in `ctx.native_modules`
*after* `collect_modules` finalizes the dep graph. Every backend
(LLVM / WASM / ArkTS / HarmonyOS / Glance / SwiftUI / JS) inherits
the SBOM emission from one choke point.
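
Here is a sketch of that single emission point. It renders every module's audit into the versioned envelope and writes it once under `.perry-cache/`, before any backend-specific work. `ModuleAudit`'s fields, `render_manifest`, and `write_manifest` are hypothetical stand-ins: the real `ModuleAudit` lives in `perry-hir::audit`, and its fields are not shown here. The sketch emits compact JSON rather than the indented form used in the examples, and it does only minimal escaping.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical per-module audit result, mirroring the manifest's fields.
struct ModuleAudit {
    source: String,
    package: Option<String>,
    stdlib: BTreeMap<String, BTreeSet<String>>,
}

// Minimal JSON string quoting: escapes backslashes and double quotes only.
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

fn render_manifest(modules: &[ModuleAudit]) -> String {
    let mods: Vec<String> = modules
        .iter()
        .map(|m| {
            let pkg = m.package.as_deref().map(quote).unwrap_or_else(|| "null".to_string());
            let stdlib: Vec<String> = m
                .stdlib
                .iter()
                .map(|(ns, methods)| {
                    let list: Vec<String> = methods.iter().map(|x| quote(x)).collect();
                    format!("{}:[{}]", quote(ns), list.join(","))
                })
                .collect();
            format!(
                "{{\"source\":{},\"package\":{},\"stdlib\":{{{}}}}}",
                quote(&m.source),
                pkg,
                stdlib.join(",")
            )
        })
        .collect();
    format!("{{\"version\":1,\"modules\":[{}]}}", mods.join(","))
}

// Called once per compile, after the module graph is final and before any
// backend runs, so every backend gets the manifest without its own code.
fn write_manifest(project_root: &Path, modules: &[ModuleAudit]) -> io::Result<PathBuf> {
    let dir = project_root.join(".perry-cache");
    fs::create_dir_all(&dir)?;
    let path = dir.join("audit.json");
    fs::write(&path, render_manifest(modules))?;
    Ok(path)
}

fn main() -> io::Result<()> {
    let mut stdlib: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    stdlib.entry("fs".to_string()).or_default().insert("readFileSync".to_string());
    let modules = vec![ModuleAudit { source: "/repo/main.ts".to_string(), package: None, stdlib }];
    let root = std::env::temp_dir().join("perry-sbom-sketch");
    let path = write_manifest(&root, &modules)?;
    println!("{}", fs::read_to_string(path)?);
    Ok(())
}
```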

`perry-hir::audit::audit_module(&Module, source)` returns a
`ModuleAudit`. Traversal visits `init`, every function body, and every
class method body, collecting calls from two source channels:

- `NativeMethodCall { module, method, .. }` — the general-shape
  variant for stdlib calls after alias resolution.
- Specialized HIR variants for hot paths
  (`FsReadFileSync`, `FsWriteFileSync`, `FsExistsSync`, …,
  `PathJoin`, `PathDirname`, `PathResolve`, …,
  `ProcessEnv`, `ProcessCwd`, `ProcessArgv`,
  `Process{Stdin,Stdout,Stderr}IsTTY`, `ProcessStdout{Columns,Rows}`,
  `TtyIsAtty`, `FileURLToPath`) mapped through
  `specialized_stdlib_call()` to their equivalent (namespace, method)
  pair. Without this, a host that only calls `fs.readFileSync` would
  appear to make zero stdlib calls, defeating the SBOM.

Reuses the existing `perry audit` subcommand (which talks to the
remote security scanner) with a new `--sbom` flag that short-circuits
before the remote call. It walks up the directory tree to find
`.perry-cache/audit.json` (the same way `perry compile` walks up to
find package.json), reads the manifest, then groups modules by owning
npm package in text mode or dumps the raw JSON in `--format json` mode.
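
Text mode groups modules by owning package. The hypothetical renderer below reproduces the sample `perry audit --sbom` output shown for main.ts. The header wording is copied from that sample. Two details are assumptions, not confirmed behavior: methods are joined with `, `, and host source is listed before packages.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::fmt::Write;

// Hypothetical in-memory form of one manifest entry.
struct ModuleRecord {
    source: String,
    package: Option<String>,
    stdlib: BTreeMap<String, BTreeSet<String>>,
}

fn render_text(version: u32, records: &[ModuleRecord]) -> String {
    // Group by owning package. `None` (host source) sorts before any
    // `Some(..)` key in a BTreeMap, so host code is listed first.
    let mut groups: BTreeMap<Option<&str>, Vec<&ModuleRecord>> = BTreeMap::new();
    for rec in records {
        groups.entry(rec.package.as_deref()).or_default().push(rec);
    }

    let mut out = String::new();
    writeln!(out, "Behavioral SBOM (perry audit --sbom)").unwrap();
    writeln!(out, "  manifest version: {}", version).unwrap();
    writeln!(out, "  modules: {}", records.len()).unwrap();
    for (pkg, mods) in &groups {
        out.push('\n');
        writeln!(out, "== {} ==", pkg.unwrap_or("<host source>")).unwrap();
        for m in mods {
            writeln!(out, "  {}", m.source).unwrap();
            for (ns, methods) in &m.stdlib {
                let list: Vec<&str> = methods.iter().map(String::as_str).collect();
                writeln!(out, "    {}: {}", ns, list.join(", ")).unwrap();
            }
        }
    }
    out
}

fn main() {
    let mut stdlib: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    stdlib.entry("fs".to_string()).or_default().insert("readFileSync".to_string());
    stdlib.entry("path".to_string()).or_default().insert("join".to_string());
    let records = vec![ModuleRecord {
        source: "/repo/main.ts".to_string(),
        package: None,
        stdlib,
    }];
    print!("{}", render_text(1, &records));
}
```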

9 unit tests in `perry-hir::audit::tests`:

- `empty_module_has_no_records`
- `top_level_native_call_recorded`
- `duplicate_calls_dedupe`
- `nested_call_recorded` (inside `Stmt::If`)
- `package_name_extracted_from_node_modules_path`
- `scoped_package_name_extracted`
- `nested_node_modules_returns_innermost`
- `user_source_has_no_package`
- `serializes_to_stable_json` — pins the byte-deterministic shape

End-to-end smoke test (artifacts cleaned up afterwards):

- `import fs, path; fs.readFileSync(...); path.join(...)` →
  audit.json correctly captures `"fs": ["readFileSync"]` +
  `"path": ["join"]`.
- `perry audit --sbom` pretty-prints the same data grouped by
  package.

Out of scope (follow-ups):

- Literal `fetch`/`http.get` URLs: #502 territory; will graft on as
  a `literal_hosts` key under the same versioned shape.
- Native-library symbol references (FFI registry).
- `perry audit --sbom --diff` — the deterministic JSON shape
  already enables the workflow via plain `diff`; built-in `--diff`
  is a follow-up.

Acceptance:

- [x] `.perry-cache/audit.json` written every build
- [partial] Per-module breakdown: stdlib symbols ✓; literal hosts/URLs + native lib symbols deferred (#502, FFI registry follow-up)
- [x] `perry audit --sbom` prints human-readable summary
- [partial] `perry audit --diff` — deterministic JSON enables the workflow today; built-in `--diff` deferred
- [x] Foundation for issues that consume this manifest (#501, #496)
proggeramlug force-pushed the feat/495-perry-audit branch from 3ed7771 to 92c7ad0 on May 18, 2026 11:10
proggeramlug merged commit b64a157 into main on May 18, 2026
proggeramlug deleted the feat/495-perry-audit branch on May 18, 2026 11:10

Development

Successfully merging this pull request may close these issues.

security: perry audit — emit behavioral SBOM at compile time
