Automatic Prompt Token Budgeting

Sidecar automatically reduces prompt size during task prompt compilation to lower token usage while preserving prompt quality.

Beta: Prompt token budgeting is currently in beta and may evolve as we tune trimming behavior and metadata.


What it does

When Sidecar compiles an execution brief (through sidecar prompt compile or the task execution flow), it:

  • Estimates prompt size.
  • Keeps required execution-critical sections.
  • Deduplicates repeated items.
  • Trims optional high-volume context sections only when needed.
  • Adds concise overflow references, for example: "+ 12 more ... (see task packet for full list)".
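The deduplication and overflow-reference steps above can be sketched as follows. This is a minimal illustration, not Sidecar's actual implementation: the function names dedupe and with_overflow_reference and the keep parameter are hypothetical, and only the overflow text format is taken from the example above.

```python
def dedupe(items):
    """Drop repeated items while keeping first-seen order."""
    return list(dict.fromkeys(items))


def with_overflow_reference(items, keep):
    """Keep the first `keep` items and summarize the rest in one line.

    Hypothetical helper: the overflow wording mirrors the documented example.
    """
    if len(items) <= keep:
        return list(items)
    hidden = len(items) - keep
    overflow = f"+ {hidden} more ... (see task packet for full list)"
    return list(items[:keep]) + [overflow]


files = ["a.py", "b.py", "a.py", "c.py", "d.py"]
unique = dedupe(files)  # the repeated "a.py" is dropped
print(with_overflow_reference(unique, keep=2))
```

Deduplicating before truncating means the overflow count reflects distinct items only.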

Required sections (always preserved)

  • Task
  • Objective
  • Why this matters
  • Constraints
  • Validation
  • Definition of done
  • Runner guidance
  • Final response format

Optional sections that may be trimmed

  • In scope
  • Out of scope
  • Read these first (files_to_read)
  • Avoid changing (files_to_avoid)
  • Linked decisions (related_decisions)
  • Linked notes (related_notes)

Budget behavior (current defaults)

  • Target prompt budget: 1200 estimated tokens
  • Safety ceiling: 1500 estimated tokens

If the baseline prompt is under budget, no trimming happens.
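One way to picture the budget behavior is the sketch below. It assumes a rough estimator of about 4 characters per token, a trim order chosen for illustration, and a placeholder string; none of these are documented Sidecar internals, and the role of the safety ceiling is only reported here, not enforced. Only the 1200 and 1500 defaults come from this page.

```python
TARGET_TOKENS = 1200   # target prompt budget (documented default)
CEILING_TOKENS = 1500  # safety ceiling (documented default)

# Assumption: this trim order is hypothetical; the docs do not specify one.
OPTIONAL_TRIM_ORDER = [
    "related_notes", "related_decisions", "files_to_avoid",
    "files_to_read", "out_of_scope", "in_scope",
]
PLACEHOLDER = "(trimmed; see task packet for full list)"  # hypothetical text


def estimate_tokens(text):
    """Hypothetical estimator: about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def total_tokens(sections):
    """Sum the estimated tokens of every section body."""
    return sum(estimate_tokens(body) for body in sections.values())


def apply_budget(sections, target=TARGET_TOKENS, ceiling=CEILING_TOKENS):
    """Trim optional sections one at a time until the estimate fits the target.

    Sections not named in OPTIONAL_TRIM_ORDER (the required ones) are never touched.
    """
    before = total_tokens(sections)
    result = dict(sections)
    trimmed = []
    for name in OPTIONAL_TRIM_ORDER:
        if total_tokens(result) <= target:
            break  # already within budget: stop trimming
        if name in result and len(result[name]) > len(PLACEHOLDER):
            result[name] = PLACEHOLDER
            trimmed.append(name)
    after = total_tokens(result)
    report = {
        "before": before,
        "after": after,
        "trimmed_sections": trimmed,
        "within_ceiling": after <= ceiling,
    }
    return result, report


sections = {
    "task": "Fix the login bug. " * 20,
    "in_scope": "src/auth/session.py\n" * 400,
    "related_notes": "note about sessions\n" * 200,
}
trimmed_sections, report = apply_budget(sections)
print(report)
```

A prompt that already fits the target passes through unchanged, which matches the no-trimming rule above.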

Where to see optimization results

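Prompt optimization metadata is attached to run records and to prompt compile output:

  • prompt_tokens_estimated_before
  • prompt_tokens_estimated_after
  • prompt_budget_target
  • prompt_trimmed_sections

In the JSON example below, the same values appear under a prompt_optimization object with shorter keys (estimated_tokens_before, estimated_tokens_after, budget_target, trimmed_sections), together with budget_max for the safety ceiling.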

Example output (JSON)

{
  "prompt_optimization": {
    "estimated_tokens_before": 5881,
    "estimated_tokens_after": 774,
    "budget_target": 1200,
    "budget_max": 1500,
    "trimmed_sections": [
      "in_scope",
      "out_of_scope",
      "files_to_read",
      "related_decisions",
      "related_notes"
    ]
  }
}
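
The example payload above could be consumed with a short script like the one below. It is a sketch that assumes the key names shown in the JSON example; the summarize function and the fields it returns are hypothetical and not part of Sidecar.

```python
import json

# The example payload from this page, embedded as a string.
payload = """
{
  "prompt_optimization": {
    "estimated_tokens_before": 5881,
    "estimated_tokens_after": 774,
    "budget_target": 1200,
    "budget_max": 1500,
    "trimmed_sections": [
      "in_scope",
      "out_of_scope",
      "files_to_read",
      "related_decisions",
      "related_notes"
    ]
  }
}
"""


def summarize(raw):
    """Hypothetical helper: derive savings figures from the metadata."""
    meta = json.loads(raw)["prompt_optimization"]
    before = meta["estimated_tokens_before"]
    after = meta["estimated_tokens_after"]
    saved = before - after
    return {
        "saved_tokens": saved,
        # Guard against a zero baseline to avoid dividing by zero.
        "reduction_pct": round(100 * saved / before, 1) if before else 0.0,
        "within_target": after <= meta["budget_target"],
        "within_max": after <= meta["budget_max"],
        "trimmed_sections": meta["trimmed_sections"],
    }


summary = summarize(payload)
print(summary)
```

For this payload the prompt shrinks by 5107 estimated tokens and ends up within the 1200-token target.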

Related docs