Research
Methodology2026-05-17

SPECTRE vs Competitor Gap Analysis (2026-05-17)

Author: spectre-solana-max engineering Supersedes: spectre-vs-competitors-gap-analysis-2026-05-16.md Scope: Static analysis, formal verification, audit-firm tooling, and real-time monitoring on Solana, benchmarked against SPECTRE at branch feat/spectre-solana-max tip ebad85d2 (76 commits ahead of main). Method: Web research on competitor feature surfaces refreshed 2026-05-17, cross-referenced with SPECTRE's post-replay-closure state (24/24 exact-rule + 24/24 class-level on the historical- incident replay benchmark; 77 total Solana rules; 7 cross-program rules including CROSS-007 ship today).

What changed since 2026-05-16

The May 16 doc framed SPECTRE as having "the strongest rule pack" but sitting at 20/28 (71%) exact-rule on the replay benchmark. One day of focused rule-shipping closed every remaining ≈ row:

Stage Exact-rule Class-level ≈ remaining
2026-05-16 baseline 21/24 (87%) 24/24 (100%) 3
AUTH-100 body-level ext 22/24 (91%) 24/24 (100%) 2
ACC-021 init-write exception 23/24 (96%) 24/24 (100%) 1
ACC-015 ship (Cashio class) 24/24 (100%) 24/24 (100%) 0

Plus the 2026-05-16 day-of work: CROSS-007 (UXD delegate-risk), ACC-014 (Wormhole sysvar), ITER-001 (Jet $25M whitehat), Bucket-A duplicate-id replay-script merge.

Net rule-pack additions over the 48-hour push: 4 new detectors (ITER-001, CROSS-007, ACC-014, ACC-015), 2 rule-precision extensions (AUTH-100 body-level + ACC-021 init-write exception), 3 corpus snapshots (Raydium AMM v4 architectural reference, plus ITER/CROSS-007 ± fixtures).

Executive summary

SPECTRE now ships the strongest historical-incident replay detection rate in the published Solana static-analysis market — 24 of 24 mapped Solana exploits (Wormhole, Cashio ×2, Mango v3 ×3, Solend ×2, Cypher ×3, Drift v2, Jet v1 ×2, Metaplex CMv1/v2, Metaplex Auction House, Raydium ×2, UXD, SPL Token Lending, …) caught at both exact-rule and architectural-class level. No competitor publishes a comparable benchmark.

The cross-program analysis (8 rules: CROSS-001/002/003/004/005/ 007/010/020 + CROSS-CDF data-flow tracker) remains genuinely unique substrate; no public competitor reasons across multi-program workspaces. CROSS-007 (delegation-of-economic-backing to weakly- gated venue) ships today — the only static detector for the UXD / Tulip / yield-aggregator class.

Plus a multi-stage agentic remediation chain (see "Behind" item 2 below for details) that takes static-analysis findings through investigation → testgen → sandboxed pre-fix verification → LLM remediation → sandboxed post-fix verification → report. Comparable to or stronger than Sec3 Premium's auto-auditor / L3X / Octane in the AI-augmented-static-analysis dimension.

Where SPECTRE is behind (unchanged from May 16 with one major revision — see item 2):

  1. Distribution — no public install path. Single highest-leverage gap separating "research preview" from "Solana devs use it." Sec3 X-Ray, L3X, Solana Fender all have cargo install + GitHub Action.
  2. AI / LLM augmentation: actually a STRENGTH, not a gap (corrected from prior drafts). SPECTRE ships a multi-stage agentic remediation chain that consumes static findings end-to-end: spectre.investigatespectre.testgenspectre.verify_prespectre.remediatespectre.verify_postspectre.report. The chain is implemented in code/agents/ as a Rust workspace with five Redis-Streams-driven agent binaries, each running on Qwen3.6-35b-a3b (currently); tests and remediation patches are executed in docker-sandboxed workspaces and the verify agents confirm the bug both pre- and post-fix before any patch is accepted. This is meaningfully more than what Sec3 Premium's "auto-auditor" (single-pass LLM triage on findings), L3X (multi-LLM cross-validation of patterns), or Octane (per-PR AI reasoning) ship — SPECTRE actually executes the proposed fix and verifies via sandboxed reproduce-then-fix. See code/agents/README.md for the chain architecture.
  3. Formal verification — Certora Solana Prover remains the only production FV option; SPECTRE is rule-pack + agent-chain only, no formal proofs.
  4. Triage workflow — no SARIF output, no suppression markers, no baseline file.

Note on Sec3 OwLLM. Sec3 separately ships OwLLM — an open-source LLM trained on millions of historical on-chain transactions (94% MEV detection accuracy). OwLLM is a run-time / TX-monitoring product (alongside Hexagate / Forta), NOT the AI inside X-Ray and NOT a SPECTRE competitor.

The single highest-leverage gap is still distribution. The rule pack and methodology are now demonstrably best-in-market by the only published measurable benchmark; the wrapper around them isn't yet dev-grade.

Competitor inventory (refreshed 2026-05-17)

Static analysis (direct competitors)

Tool OSS Layer Approach Distribution Rule count AI Published bench
Sec3 X-Ray (github) yes LLVM-IR data-flow + symbolic cargo install + GH Action + hosted pro.sec3.dev 50+ no no
Sec3 X-Ray Premium (blog) no LLVM-IR + AI auto-auditor data-flow + AI triage / explanation hosted 50+ + auto-auditor yes no
L3X (VulnPlanet/l3x) yes Rust AST + LLM pattern + AI cross-validation CLI ~20 + LLM-augmented yes no
Solana Fender (honey-guard) yes Rust AST pattern CLI small / Anchor-only no no
Sol-azy (fuzzinglabs) partial sBPF disassembly static + RE CLI RE-oriented no no
Eloizer (Inversive-Labs) yes Rust AST pattern CLI research no no
Octane no Rust AST + AI per-PR semantic hosted unknown yes no
CodeQL / Semgrep partial generic pattern GH Action + cloud minimal Solana no no
SPECTRE (this) not yet public Rust AST + symbol table + cross-program linker + tree-sitter file-walks + 5-stage LLM agent chain pattern + cross-program trust-posture comparator + TS↔Anchor cross-language + sandboxed reproduce-then-fix none yet 77 (55 Solana + 22 generic) yes (Qwen agent chain: investigate / testgen / verify_pre / remediate / verify_post) yes (100% / 100% on 24-incident replay)

Formal verification

  • Certora Solana Prover (SCP) (CertoraProver) — decompiles SBF to Certora IR, full formal proofs. Open-sourced. Secures $75B+ in DeFi. Per-property harness authorship required. Different layer from SPECTRE: per-function logic against written spec vs codebase-wide architectural patterns.

Audit firms

  • OtterSec — 120+ audits, $36B TVL. Formal verification + differential fuzzing + incident response. Internal tooling.
  • Zellic, Halborn, Trail of Bits, Neodyme — manual + proprietary.

Real-time monitoring (adjacent, not direct)

  • Hexagate (Chainalysis) — tx simulation + ML, custom Gatelang. Run-time, 75+ chains.
  • Forta — decentralized monitoring network.

Feature matrix (refreshed)

Dimension SPECTRE (today) Sec3 X-Ray open Sec3 Premium L3X Solana Fender Certora SCP OtterSec
Solana rules 55 50+ 50+ ~20 small n/a n/a
Native Solana yes yes yes yes no yes yes
Anchor yes yes yes yes yes yes yes
Cross-program analysis 8 dedicated rules no partial no no per-protocol harness manual
TS-client ↔ Anchor handler yes no no no no no manual
AI / LLM (static layer) 5-stage agent chain (investigate / testgen / verify_pre / remediate / verify_post) no auto-auditor (single-pass) multi-LLM cross-validation no no partial
Sandboxed reproduce-then-fix yes (docker sandbox per stage) no partial (auto-auditor doesn't execute) no no n/a manual
AI / LLM (tx-monitoring layer) no (out of scope) no OwLLM (separate Sec3 product) no no no no
FV no no no no no yes yes
Differential fuzzing no no no no no no yes
Real-time monitoring no no no no no no no
Historical-incident replay 24/24 exact-rule + class-level (100% / 100%) none published none published none published none published none published none published
Per-rule F1 published yes no no no no no no
Open source not yet yes no yes yes yes no
Cargo install no yes n/a yes yes yes n/a
GitHub Action no yes yes yes yes yes n/a
SARIF no unclear unclear unclear no no n/a
Baseline / suppress no yes yes yes no yes n/a
Hosted UI no no yes no no no n/a

SPECTRE's genuine differentiators (sharpened)

Five pieces of substrate no public competitor matches:

  1. Cross-program analysis with trust-posture comparator. 8 dedicated rules now: CROSS-001 (trust downgrade) + CROSS-002 (missing program-id verification) + CROSS-003 (Token-2022 extension propagation) + CROSS-004 (account-binding drift across CPI) + CROSS-005 (signer-privilege forwarding) + CROSS-007 (delegation to weakly-gated venue, shipped 2026-05-16) + CROSS-010 (multi-hop chain reasoning) + CROSS-020 (2-hop reentrancy cycle) + CROSS-CDF (write-then-cross-program-read data-flow). Every other public tool reasons program-by-program.

  2. Historical-incident architectural-fingerprint replay at 100% / 100%. Each of 24 distinct mapped Solana incidents (after duplicate-id merge) ships with an architectural_fingerprint of SPECTRE rule IDs that should fire on pre-hack source. The replay benchmark scans the mapped corpus snapshots and reports exact-rule

    • class-level detection. Every mapped incident now fires at both exact-rule and class-level. No comparable methodology published by any competitor.
  3. TypeScript-to-Anchor cross-language analysis (META-001 + META-002). Traces from a TS client's program.methods.xxx() call into the Anchor handler it invokes; cross-language rules check whether the client's call-site assumptions match the handler's constraints. Unique to SPECTRE.

  4. Framework-agnostic file-walking rule layer. ITER-001 (Jet sentinel-iteration), ACC-014 (Wormhole sysvar), ACC-015 (Cashio untied typed account) all walk .rs files with their own tree-sitter pass and don't depend on the Anchor / native extractor's symbol-table coverage. Catches bugs in pre-Anchor-0.30 programs (Solitaire / anchor_comp), nested-struct shapes the graph-based rules miss, and forks of all of the above. No competitor has a comparable file-walking second-tier.

  5. Multi-stage agentic remediation chain. code/agents/ is a Rust workspace implementing a five-binary chain consuming Redis Streams: pinpoint-investigation-agent (LLM analyzes finding + graph neighborhood + blast radius), pinpoint-testgen- agent (LLM emits a failing test for the finding), pinpoint-verify-agent (sandboxed test execution: confirms the bug reproduces pre-fix), pinpoint-remediation-agent (LLM generates patch), pinpoint-verify-agent (sandboxed test execution post-fix: confirms the bug is gone). Uses Qwen3.6-35b-a3b currently; the pinpoint-agent-runtime crate abstracts the LLM client so model swapping is a single config change. Docker sandbox per stage. Idempotency store on Postgres. S3 artifact uploads per stage. Prometheus metrics on ports 9091-9094. No competitor public tool reproduces this end-to-end: Sec3 Premium's auto-auditor is single-pass LLM triage on existing findings (no test generation, no sandboxed verification, no patch execution). L3X uses LLMs to cross-validate pattern matches (no remediation). Octane does AI reasoning per PR (no automated verify-pre / verify-post loop). Only OtterSec's internal tooling reportedly does anything comparable, and it isn't public.

SPECTRE's actual gaps (ranked)

Tier 1: blocks adoption today (unchanged)

  1. Distribution / install path. Same as 2026-05-16. ~1 week for cargo install pinpoint-spectre + prebuilt binary release + GitHub Action.
  2. SARIF output. ~1 day; PR annotations.
  3. Suppression + baseline. ~1 week; inline // spectre-allow: markers + .spectre-baseline.json for PR-diff scanning.

Tier 2: feature parity for serious bake-offs

  1. AI / LLM marketing surface. Not an engineering gap — the agent chain already exists and is materially deeper than what Sec3 Premium / L3X / Octane ship. The gap is visibility: no public docs front the agent chain, the model swap is undecided, no public demo of reproduce-then-fix. ~1-2 weeks of doc + demo work, no new engineering.
  2. Hosted scan UI. Sec3's pro.sec3.dev. ~3-4 weeks.
  3. Bug-bounty marketing channel. Gated on T1.1.

Tier 3: orthogonal capabilities

  1. Formal verification → Certora's layer (open-sourced; "buy not build"). FV integration sketch shipped in spectre-certora-fv-integration-2026-05-16.md.
  2. Differential fuzzing → OtterSec's internal layer. Integration sketch shipped in spectre-differential-fuzzing-integration-2026-05-16.md.
  3. Real-time monitoring → Hexagate's layer. Integration sketch shipped in spectre-hexagate-gatelang-exporter-2026-05-16.md.

Tier 4: substrate followups

  1. Nested #[derive(Accounts)] struct extraction. The Anchor extractor doesn't recurse into nested structs (Cashio's arrow is 3 levels deep). ACC-015 works around this with file-walking, but ACC-013 / ACC-020 / ACC-021 / ACC-030 still have the same blind spot. ~1 week of extractor work would unblock all four.

What's actually defensible today

A protocol team's path to value with SPECTRE today:

  • For continuous architectural-pattern detection in CI. The 100% / 100% replay coverage on the only published Solana benchmark is the strongest reproducible number in the market. No tool catches more of the historical Solana exploit corpus.

  • For multi-program workspaces (Kamino, Drift v2, Jet v1, Cypher, Cardinal, MarginFi v2, Squads, Metaplex). 8 cross-program rules catch composition bugs structurally invisible to single-program tools.

  • For Anchor + TypeScript client codebases. META-001 / META-002 cross-language analysis is unique.

  • For pre-Anchor-0.30 / Solitaire / anchor_comp programs. File- walking rules (ITER-001, ACC-014, ACC-015) reach programs that the Anchor symbol-table extractor doesn't fully model.

Honest read (revised)

The 2026-05-16 read said "rule pack is at or above market; distribution is the only thing separating research preview from Solana devs use it." With the 24-hour push that closed the replay benchmark to 100% / 100%, the rule pack is now demonstrably best-in-market by the only published measurable benchmark. Distribution is still the only gating item.

Major revision (this session): SPECTRE already has AI augmentation, and it's deeper than the visible competitors. The 5-stage agent chain (code/agents/) takes static findings through investigation, testgen, sandboxed pre-fix verification, LLM remediation, and sandboxed post-fix verification. This is more than Sec3 Premium's auto-auditor (single-pass LLM triage), L3X's multi-LLM cross-validation (no remediation), or Octane's per-PR reasoning (no automated verify loop). The agent chain runs Qwen3.6-35b-a3b today; the runtime abstraction makes model swapping a config change. Sec3's OwLLM is a separate transaction-pattern LLM (MEV detection, TX monitoring) that lives one layer over in the run-time space and isn't a SPECTRE competitor.

What's actually missing on the AI front:

  • The agent chain isn't part of the published competitive narrative. Internal product, no marketing surface yet. The competitor docs all front their AI features; SPECTRE doesn't.
  • Public model choices. Qwen3.6-35b-a3b is a defensible workhorse but the marketing-friendly choice (Claude 4.x, GPT-5, Gemini 2.5) might shift perception. Runtime abstraction makes this a config-change, not engineering work.

The historical-incident replay methodology asset is now load- bearing for SPECTRE's positioning. Publishing the methodology (under documents/audits/methodology/, with reproducible runner/replay_incidents.py and corpus manifest schema) — and ideally challenging competitors to score against the same corpus — would convert the metric into a market standard. Today no other tool can claim a single-digit detection rate on the SPECTRE corpus, because no other tool has been measured against it.

References (refreshed 2026-05-17)