Benchmarks
Black-box claims score lower. TryAnneal ships a reproducible benchmark: anyone can run it and get the same precision and recall.
Results
| Contract | Exploit analog | Losses | Detected |
|---|---|---|---|
| MinterestVuln.sol | Minterest Jul 2024 (Mantle) | $1.4M | ✅ HIGH |
| EulerDonation.sol | Euler Mar 2023 | $197M | ✅ HIGH |
| NomadInit.sol | Nomad Aug 2022 | $190M | ✅ HIGH |
| LayerZeroDVN.sol | KelpDAO Apr 2026 | $292M | ✅ HIGH |
| Clean1.sol | — | — | ✅ CLEAN |
| Clean2.sol | — | — | ✅ CLEAN |
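The precision and recall implied by the table above can be checked by hand. The sketch below is illustrative only and is not part of the TryAnneal engine: the `FixtureResult` type and `score` function are hypothetical names, and it assumes that each exploit-analog fixture is expected to be flagged HIGH and each clean fixture is expected to come back CLEAN.

```typescript
// Hypothetical types for illustration; not the engine's own data model.
type Verdict = "HIGH" | "CLEAN";

interface FixtureResult {
  contract: string;
  expected: Verdict; // assumed ground-truth label for the fixture
  reported: Verdict; // verdict copied from the results table
}

// Rows transcribed from the results table.
const results: FixtureResult[] = [
  { contract: "MinterestVuln.sol", expected: "HIGH", reported: "HIGH" },
  { contract: "EulerDonation.sol", expected: "HIGH", reported: "HIGH" },
  { contract: "NomadInit.sol", expected: "HIGH", reported: "HIGH" },
  { contract: "LayerZeroDVN.sol", expected: "HIGH", reported: "HIGH" },
  { contract: "Clean1.sol", expected: "CLEAN", reported: "CLEAN" },
  { contract: "Clean2.sol", expected: "CLEAN", reported: "CLEAN" },
];

function score(rows: FixtureResult[]): { precision: number; recall: number } {
  let tp = 0; // vulnerable fixture flagged HIGH
  let fp = 0; // clean fixture flagged HIGH
  let fn = 0; // vulnerable fixture reported CLEAN
  for (const r of rows) {
    if (r.reported === "HIGH" && r.expected === "HIGH") tp++;
    else if (r.reported === "HIGH" && r.expected === "CLEAN") fp++;
    else if (r.reported === "CLEAN" && r.expected === "HIGH") fn++;
  }
  // Guard against division by zero when nothing is flagged or nothing is vulnerable.
  const precision = tp + fp === 0 ? 1 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 1 : tp / (tp + fn);
  return { precision, recall };
}

const { precision, recall } = score(results);
console.log(`precision=${precision.toFixed(2)} recall=${recall.toFixed(2)}`);
// prints "precision=1.00 recall=1.00"
```

With six fixtures, one missed exploit or one false alarm on a clean contract would move these figures noticeably, so they are best read alongside the per-row verdicts.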
Gas optimization — measured before/after
The gas profiler’s saving estimates are no longer hand-waved. Each Mantle technique it advertises has a real naive/optimized contract pair that we compile with solc 0.8.24 and run through the engine’s own computeFee Arsia model. The saving is measured on the L1-data fee — the FastLZ-driven, size-dependent component these techniques actually move on Mantle. L2 execution and operator fees are held fixed across both sides so the comparison isolates the size win.
| Technique | L1 before (MNT) | L1 after (MNT) | Measured saving |
|---|---|---|---|
| calldata_packing | 0.000000000540 | 0.000000000506 | 6.3% |
| batch_operations | 0.000000002052 | 0.000000000205 | 90% |
| storage_layout | 0.000000000971 | 0.000000000901 | 7.2% |
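The Measured saving column can be reproduced from the two fee columns as a relative reduction, (before − after) / before. The snippet below is a minimal sketch of that arithmetic only. It does not call the engine's computeFee model, and the `GasRow` type and `savingPercent` helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical row type; fee values are copied from the table, in MNT.
interface GasRow {
  technique: string;
  before: number; // L1-data fee of the naive contract
  after: number; // L1-data fee of the optimized contract
}

const rows: GasRow[] = [
  { technique: "calldata_packing", before: 0.00000000054, after: 0.000000000506 },
  { technique: "batch_operations", before: 0.000000002052, after: 0.000000000205 },
  { technique: "storage_layout", before: 0.000000000971, after: 0.000000000901 },
];

// Relative reduction in the L1-data fee, expressed as a percentage.
function savingPercent(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

for (const r of rows) {
  console.log(`${r.technique}: ${savingPercent(r.before, r.after).toFixed(1)}%`);
}
// prints:
// calldata_packing: 6.3%
// batch_operations: 90.0%
// storage_layout: 7.2%
```

Because the L2 execution and operator fees are held fixed on both sides, the same percentage computed on the total fee would be smaller than these L1-only figures.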
constants to drop the constructor SSTOREs from the deploy init code (608→568 bytes). Numbers are the verbatim output of pnpm --filter @tryanneal/engine benchmark:gas.
Reproduce it
Every fixture runs runAudit({ noLlm: true }) — Slither + Aderyn + corpus only, no API keys, deterministic across runs. That’s the point: the verdict isn’t a black box.
Methodology + the committed results live in packages/engine/benchmarks.