Cost Arbitrage
This document captures the cost evidence required for DoD D.
What is measured
- Tier distribution across the offline harness sample set.
- B6 comparison between MoA and direct execution.
- Canonical usage and cost in integer micro-RUB.
Source of truth
- Cost math: web/lib/billing.ts and crates/plyrum-billing-client/src/cost.rs.
- MoA activation logic: web/lib/moa.ts.
- Report output: target/cost_report.json.
Offline harness assumptions
The report uses explicit offline assumptions because live provider credentials are not available in this workspace:
- provider calls are modeled from deterministic fixture traces, not live API responses;
- token counts are fixed per scenario;
- direct and MoA runs share the same pricing table and FX snapshot;
- the B6 metric is a relative comparison of modeled micro-RUB cost and modeled output quality on the same harness inputs.
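The assumptions above (fixed token counts, one shared pricing table, one FX snapshot) can be illustrated with a minimal Rust sketch of an integer micro-RUB cost model. This is not the code in cost.rs: the type names, the price unit (micro-USD per million tokens), the FX unit (micro-RUB per USD), and the round-up policy are all assumptions made for illustration.

```rust
/// Hypothetical pricing entry: price per one million tokens, in micro-USD.
#[derive(Clone, Copy)]
struct Price {
    input_micro_usd_per_mtok: u64,
    output_micro_usd_per_mtok: u64,
}

/// Hypothetical FX snapshot: micro-RUB per one USD.
#[derive(Clone, Copy)]
struct FxSnapshot {
    micro_rub_per_usd: u64,
}

/// Fixed token counts for one modeled provider call.
#[derive(Clone, Copy)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
}

/// 1e6 (tokens per pricing unit) times 1e6 (micro-USD per USD).
const DENOM: u128 = 1_000_000_000_000;

/// Modeled cost of one call in integer micro-RUB.
/// Rounding up is an assumed policy, chosen so the model never under-reports.
fn cost_micro_rub(usage: Usage, price: Price, fx: FxSnapshot) -> u64 {
    // Token counts times per-million-token prices, still scaled by 1e6.
    let scaled_micro_usd = u128::from(usage.input_tokens)
        * u128::from(price.input_micro_usd_per_mtok)
        + u128::from(usage.output_tokens) * u128::from(price.output_micro_usd_per_mtok);
    // Convert to micro-RUB; u128 keeps the intermediate product from overflowing.
    let numer = scaled_micro_usd * u128::from(fx.micro_rub_per_usd);
    let rounded_up = (numer + DENOM - 1) / DENOM;
    u64::try_from(rounded_up).expect("modeled cost exceeds u64 range")
}
```

Because direct and MoA runs would pass the same Price table and FxSnapshot into a function like this, any difference in the result comes only from the token counts and model choice, which is what makes the comparison relative rather than absolute.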
B6 metric
- Scenario: напиши змейку 40x20 на Rust ("write a 40x20 Snake game in Rust").
- Direct baseline: openai/gpt-5.5.
- MoA plan: 3x openai/gpt-5.4-mini proposer lanes, 1x openai/gpt-5.4 aggregator, deterministic finalizer.
- Acceptance gate: moa_cost_micro_rub <= direct_cost_micro_rub * 0.60 and judge_score >= 0.85.
- Current offline harness result: MoA is 60.00% of the direct modeled cost with judge_score = 0.87.
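The acceptance gate can be sketched in Rust as follows. The function names are hypothetical, and treating the deterministic finalizer as free of provider cost is an assumption, not something the report states. The cost comparison is written in integer form (moa * 100 <= direct * 60) because the reported ratio sits at or very near the 0.60 threshold, where a floating-point multiplication could round either way.

```rust
/// Modeled MoA plan cost: the proposer lanes plus the aggregator.
/// Assumption: the deterministic finalizer makes no provider call and costs 0.
fn moa_cost_micro_rub(proposer_costs: &[u64], aggregator_cost: u64) -> u64 {
    proposer_costs.iter().sum::<u64>() + aggregator_cost
}

/// B6 gate: MoA cost is at most 60% of the direct cost
/// and the judge score is at least 0.85.
fn passes_b6_gate(moa: u64, direct: u64, judge_score: f64) -> bool {
    // moa <= direct * 0.60 is equivalent to moa * 100 <= direct * 60,
    // which stays exact in integer arithmetic.
    let cost_ok = u128::from(moa) * 100 <= u128::from(direct) * 60;
    cost_ok && judge_score >= 0.85
}
```

With illustrative numbers, passes_b6_gate(600_000, 1_000_000, 0.87) holds, while a MoA cost of 600_001 against the same baseline fails the gate.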
Validation commands
jq -e '.offline_harness.assumptions[]' target/cost_report.json
jq -e '.tier_distribution and .b6_moa_vs_direct' target/cost_report.json
jq -e '.b6_moa_vs_direct.moa_cost_micro_rub <= (.b6_moa_vs_direct.direct_cost_micro_rub * 0.6) and .b6_moa_vs_direct.judge_score >= 0.85' target/cost_report.json
cargo test -p plyrum-billing-client
pnpm -C web test
Notes
- If live provider credentials are added later, regenerate this report from live traces and set the offline-harness flag to false.