Cost Arbitrage

This document captures the cost evidence required for DoD D.

What is measured

  • Tier distribution across the offline harness sample set.
  • B6 comparison between MoA (mixture-of-agents) and direct execution.
  • Canonical usage and cost in integer micro-RUB.

Source of truth

  • Cost math: web/lib/billing.ts and crates/plyrum-billing-client/src/cost.rs.
  • MoA activation logic: web/lib/moa.ts.
  • Report output: target/cost_report.json.

Offline harness assumptions

The report uses explicit offline assumptions because live provider credentials are not available in this workspace:

  • provider calls are modeled from deterministic fixture traces, not live API responses;
  • token counts are fixed per scenario;
  • direct and MoA runs share the same pricing table and FX snapshot;
  • the B6 metric is a relative comparison of modeled micro-RUB cost and modeled output quality on the same harness inputs.
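
As a rough illustration of how a modeled cost could be derived under these assumptions, the sketch below prices a fixture trace in integer micro-RUB. Everything in it is hypothetical: the FixtureCall and PriceEntry shapes, the model name model-a, the prices, and the round-up rule are invented for illustration and are not taken from web/lib/billing.ts or cost.rs, whose actual units and rounding may differ. Prices are assumed to be already converted to micro-RUB at the shared FX snapshot.

```typescript
// Hypothetical shapes for illustration only; the real types in
// web/lib/billing.ts may differ.
interface FixtureCall {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

interface PriceEntry {
  // Assumed unit: micro-RUB per 1,000,000 tokens, already converted
  // at the shared FX snapshot.
  inputMicroRubPerMTok: number;
  outputMicroRubPerMTok: number;
}

type PricingTable = Record<string, PriceEntry>;

const TOKENS_PER_MTOK = 1_000_000;
const MAX_SAFE_INT = 9007199254740991; // 2^53 - 1

// Integer division rounded up, using remainder arithmetic so that no
// fractional intermediate value is produced.
function ceilDiv(numerator: number, denominator: number): number {
  const remainder = numerator % denominator;
  return (numerator - remainder) / denominator + (remainder > 0 ? 1 : 0);
}

function modeledCostMicroRub(calls: FixtureCall[], prices: PricingTable): number {
  let tokenPriceProduct = 0;
  for (const call of calls) {
    const price = prices[call.model];
    if (price === undefined) {
      throw new Error(`no price for model ${call.model}`);
    }
    tokenPriceProduct +=
      call.inputTokens * price.inputMicroRubPerMTok +
      call.outputTokens * price.outputMicroRubPerMTok;
  }
  // Guard against leaving the range where JS numbers are exact integers.
  if (tokenPriceProduct > MAX_SAFE_INT) {
    throw new Error("cost numerator exceeds the safe integer range");
  }
  // Round once, at the end (assumed rule: round up).
  return ceilDiv(tokenPriceProduct, TOKENS_PER_MTOK);
}

// Made-up prices and token counts, not values from the harness.
const examplePrices: PricingTable = {
  "model-a": { inputMicroRubPerMTok: 90_000_000, outputMicroRubPerMTok: 180_000_000 },
};
const exampleTrace: FixtureCall[] = [
  { model: "model-a", inputTokens: 1_000, outputTokens: 500 },
];
const exampleCost = modeledCostMicroRub(exampleTrace, examplePrices); // 180000 micro-RUB
```

Because token counts and prices are fixed, the same trace always yields the same integer, which is what makes a direct-versus-MoA comparison on shared inputs reproducible.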

B6 metric

  • Scenario: напиши змейку 40x20 на Rust ("write a 40x20 Snake game in Rust").
  • Direct baseline: openai/gpt-5.5.
  • MoA plan: 3x openai/gpt-5.4-mini proposer lanes, 1x openai/gpt-5.4 aggregator, deterministic finalizer.
  • Acceptance gate: moa_cost_micro_rub <= direct_cost_micro_rub * 0.60 and judge_score >= 0.85.
  • Current offline harness result: MoA is 60.00% of the direct modeled cost with judge_score = 0.87.
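
The acceptance gate can be evaluated without floating-point cost math by rewriting moa <= direct * 0.60 as moa * 100 <= direct * 60. The sketch below is a minimal illustration of that idea; passesB6Gate and its parameter names are hypothetical, and the sample costs are made-up values chosen only to land on a 60.00% ratio, not figures from target/cost_report.json.

```typescript
// Hypothetical input shape; field names are illustrative.
interface B6Inputs {
  moaCostMicroRub: number;
  directCostMicroRub: number;
  judgeScore: number;
}

const MAX_COST_PERCENT = 60; // MoA may cost at most 60% of direct
const MIN_JUDGE_SCORE = 0.85;

function passesB6Gate(input: B6Inputs): boolean {
  // Cross-multiplied form of moa <= direct * 0.60, integers only.
  const costOk =
    input.moaCostMicroRub * 100 <= input.directCostMicroRub * MAX_COST_PERCENT;
  const qualityOk = input.judgeScore >= MIN_JUDGE_SCORE;
  return costOk && qualityOk;
}

// Made-up costs: exactly at the 60% boundary, which still passes.
const atBoundary = passesB6Gate({
  moaCostMicroRub: 600_000,
  directCostMicroRub: 1_000_000,
  judgeScore: 0.87,
});

// One micro-RUB over the boundary fails the cost half of the gate.
const overBudget = passesB6Gate({
  moaCostMicroRub: 600_001,
  directCostMicroRub: 1_000_000,
  judgeScore: 0.87,
});

// A cheap run with a judge score below 0.85 fails the quality half.
const lowQuality = passesB6Gate({
  moaCostMicroRub: 500_000,
  directCostMicroRub: 1_000_000,
  judgeScore: 0.84,
});
```

Since the gate is inclusive, a result of exactly 60.00% passes; comparing integers keeps that boundary case independent of how 0.60 is represented in floating point.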

Validation commands

  • jq -e '.offline_harness.assumptions[]' target/cost_report.json
  • jq -e '.tier_distribution and .b6_moa_vs_direct' target/cost_report.json
  • jq -e '.b6_moa_vs_direct.moa_cost_micro_rub <= (.b6_moa_vs_direct.direct_cost_micro_rub * 0.6) and .b6_moa_vs_direct.judge_score >= 0.85' target/cost_report.json
  • cargo test -p plyrum-billing-client
  • pnpm -C web test
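
The first two jq checks could also be mirrored in TypeScript, for example inside the web test suite. The sketch below assumes only the field names that appear in the jq filters above; hasRequiredSections, jqTruthy, and the sample objects are hypothetical, and the real report may carry more fields. It is slightly stricter than jq in that it requires assumptions to be an array.

```typescript
// jq -e treats false and null as failure; any other value is success.
function jqTruthy(value: unknown): boolean {
  return value !== null && value !== undefined && value !== false;
}

function hasRequiredSections(report: unknown): boolean {
  if (typeof report !== "object" || report === null) return false;
  const root = report as Record<string, unknown>;

  const harness = root["offline_harness"];
  if (typeof harness !== "object" || harness === null) return false;
  const assumptions = (harness as Record<string, unknown>)["assumptions"];

  // Mirrors `.offline_harness.assumptions[]`: jq -e needs at least one
  // output, and the last output must be neither false nor null.
  if (!Array.isArray(assumptions) || assumptions.length === 0) return false;
  if (!jqTruthy(assumptions[assumptions.length - 1])) return false;

  // Mirrors `.tier_distribution and .b6_moa_vs_direct`.
  return jqTruthy(root["tier_distribution"]) && jqTruthy(root["b6_moa_vs_direct"]);
}

// Made-up sample objects, not the contents of target/cost_report.json.
const sampleReport = {
  offline_harness: { assumptions: ["token counts are fixed per scenario"] },
  tier_distribution: {},
  b6_moa_vs_direct: {},
};
const sampleOk = hasRequiredSections(sampleReport);

const emptyAssumptions = hasRequiredSections({
  offline_harness: { assumptions: [] },
  tier_distribution: {},
  b6_moa_vs_direct: {},
});

const missingTiers = hasRequiredSections({
  offline_harness: { assumptions: ["token counts are fixed per scenario"] },
  b6_moa_vs_direct: {},
});
```

In practice the argument would be the parsed contents of target/cost_report.json rather than an inline object.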

Notes

  • If live provider credentials are added later, regenerate this report from live traces and set the offline-harness flag to false.