Decks AI Assisted Engineering · Backpressure

Backpressure · Part 1

Manufacturing backpressure in coding agents.

How to slow a fast agent down until its work is proven correct, using sensors, gates, and friction you add on purpose.

The problem

Fast output is not the hard part.

Coding agents can produce a lot of code quickly. That is useful, but it creates a simple problem: how do you know the work is correct before the agent moves on?

Backpressure is how the harness says "not yet, and here is why."

Where you have met it

In a server, backpressure is friction you remove.

A fast producer, a slower consumer, and a bounded buffer between them. When the buffer fills, the system sends a "slow down" signal upstream.

  • It shows up in TCP flow control, rate limits, and bounded queues.
  • Every request is value a user wanted served, so blocking or dropping it has a cost.
  • The ideal server has enough capacity that backpressure is rare.
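A minimal sketch of the server-side pattern, assuming Python's standard queue module as the bounded buffer. The buffer size, the request names, and the offer helper are hypothetical, chosen only to show where the "slow down" signal comes from.

```python
import queue

# Hypothetical buffer size, chosen only for illustration.
BUFFER_SIZE = 3


def offer(buffer: queue.Queue, item: str) -> bool:
    """Try to enqueue without blocking; False is the "slow down" signal."""
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False


buffer = queue.Queue(maxsize=BUFFER_SIZE)

# A fast producer offers five requests while nothing is consuming.
accepted = [offer(buffer, f"request-{i}") for i in range(5)]
print(accepted)  # [True, True, True, False, False]

# Once the consumer drains one item, the producer may continue.
buffer.get_nowait()
resumed = offer(buffer, "request-5")
print(resumed)  # True
```

Each False here is a request a user wanted served, which is why a server treats this friction as a cost.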

But in a coding agent harness, the value of that same friction flips: it is something you add on purpose.

The inversion

The first output is provisional.

A harness is the tools, context, and checks around a model that writes code. The model's raw output may be right, or it may be plausible code heading the wrong way.

  • Build on a wrong change and the cost of fixing it grows.
  • Backpressure makes the agent stop at useful points and check before continuing.
  • It aligns progress to the rate at which work can be proven.

Same shape, inverted value

Verification is the slow consumer.

flowchart LR
  A[Agent: fast producer] -->|generates code| B[(Unverified work: bounded buffer)]
  B --> C[Verification and trust: slow consumer]
  C -.->|not yet, here is why| A
  C -->|proven| D[Done]

The agent produces fast; verification drains slowly and paces it.

Make it concrete

Say the agent changes an API endpoint.

  • A build tells us whether the code still compiles.
  • A test tells us whether the endpoint still behaves.
  • A lint check tells us whether it follows local rules.
  • A reviewer prompt tells us whether it fits the architecture.

Those are readings. The useful part is what happens next: a red build means the agent cannot call the work done.
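The endpoint example can be sketched as a handful of readings plus one rule about what "done" means. This is an illustration only: the reading values are made up, and may_call_done and red_readings are hypothetical helper names.

```python
# Hypothetical readings for one endpoint change; the values are invented.
readings = {
    "build": False,   # does the code still compile?
    "test": True,     # does the endpoint still behave?
    "lint": True,     # does it follow local rules?
    "review": True,   # does it fit the architecture?
}


def may_call_done(readings: dict) -> bool:
    """The work counts as done only when every reading is green."""
    return all(readings.values())


def red_readings(readings: dict) -> list:
    """Names of the readings that are currently red, sorted for stable output."""
    return sorted(name for name, ok in readings.items() if not ok)


print(may_call_done(readings))  # False
print(red_readings(readings))   # ['build']
```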

Two actors

The human's leverage is the checks.

The harness reads, writes, and runs code. It can see the code's text with ordinary tools, but text alone does not reveal whether the work is correct.

Visible

What text shows

Read a file, grep for a symbol, list a directory. The agent perceives this directly.

Hidden

What needs a reading

Does it build? Do tests pass? Does it hold to the architecture? These need an instrument.

So the team's job is to give the harness the checks that reveal the hidden properties.

Part two · perception

Sensors take a reading.

A sensor is a tool that returns an assessment of a property of the work. All sensors are tools, but not all tools are sensors.

What a tool does

An assessment, not data and not a change.

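| Tool kind         | Purpose                                         | Example                                         | Sensor |
|-------------------|-------------------------------------------------|-------------------------------------------------|--------|
| Effector          | change the world                                | write, edit, run a migration                    | no     |
| Native perception | read directly-visible state                     | read file, grep, list dir                       | no     |
| Sensor            | measure a hidden property, return an assessment | build, test, lint, type check, reviewer subagent | yes    |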

The same tool can land in different rows: bash running rm is an effector, while bash running npm test is a sensor, because its output is an assessment, not a change.
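One way to picture the three tool kinds is a lookup from command to kind. This is a sketch under assumptions: the COMMAND_KINDS table and the classify function are hypothetical, and a real harness would need far more than prefix matching to decide what a shell command does.

```python
from enum import Enum
from typing import Optional


class ToolKind(Enum):
    EFFECTOR = "effector"
    NATIVE_PERCEPTION = "native perception"
    SENSOR = "sensor"


# Hypothetical table; the entries only mirror the examples in the deck.
COMMAND_KINDS = {
    "rm": ToolKind.EFFECTOR,
    "cat": ToolKind.NATIVE_PERCEPTION,
    "grep": ToolKind.NATIVE_PERCEPTION,
    "ls": ToolKind.NATIVE_PERCEPTION,
    "npm test": ToolKind.SENSOR,
}


def classify(command: str) -> Optional[ToolKind]:
    """Return the kind of the longest known command prefix, or None if unknown."""
    matches = [
        prefix for prefix in COMMAND_KINDS
        if command == prefix or command.startswith(prefix + " ")
    ]
    if not matches:
        return None
    return COMMAND_KINDS[max(matches, key=len)]


print(classify("rm -rf build"))  # ToolKind.EFFECTOR
print(classify("npm test"))      # ToolKind.SENSOR
print(classify("npm install"))   # None
```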

Two classes of sensor

Proof, or a fallible opinion.

Deterministic

Runs a repeatable tool

Build, test, lint, and type check sit here. Objective and repeatable. Within what the check covers, it yields proof.

Inferential

Judges against a rubric

A reviewer subagent checking an architecture pattern. It yields a useful but fallible opinion.

Assessment plus evidence

A reading is only as good as the correction it provokes.

The assessment

The pass-or-fail answer to a question the agent could not answer by reading the code.

The evidence

What makes the assessment useful. A failed build prints file, line, and message, enough to act on.

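| Aspect  | Deterministic              | Inferential                              |
|---------|----------------------------|------------------------------------------|
| Trust   | proof                      | advisory opinion                         |
| Catches | will not compile, test red | wrong abstraction, misread intent, taste |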
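A reading can be modelled as an assessment paired with its evidence. The sketch below assumes a deterministic sensor is just a command whose exit code gives the assessment and whose output is the evidence. The Reading type, run_sensor, and the stand-in "build" command are all hypothetical, not how any particular harness defines them.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor: str
    passed: bool    # the assessment
    evidence: str   # what the agent can act on


def run_sensor(name: str, argv: list) -> Reading:
    """Run a command; exit code 0 is green, captured output is the evidence."""
    result = subprocess.run(argv, capture_output=True, text=True)
    evidence = (result.stdout + result.stderr).strip()
    return Reading(name, result.returncode == 0, evidence)


# Stand-in for a failing build: a Python one-liner that prints a
# file, line, and message, then exits non-zero. Not a real build tool.
fake_build = [
    sys.executable, "-c",
    "import sys; print('api.py:12: error: name undefined', file=sys.stderr); sys.exit(1)",
]

reading = run_sensor("build", fake_build)
print(reading.passed)    # False
print(reading.evidence)  # api.py:12: error: name undefined
```

The assessment alone would only say "red"; the evidence is what lets the agent make the next correction.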

Part three · policy

A gate turns a reading into backpressure.

A reading on its own is only information. Backpressure appears when a gate refuses to let work advance on a red reading.

Keep them separate

Perception is not policy.

S

Sensor

Only reads. It is perception. One sensor can feed several gates.

G

Gate

Consumes a reading and resists progress. It is policy. You can change it without touching the sensor.

B

Backpressure

What the agent feels when a gate refuses: "not yet, and here is why."

Note: a gate is only as strong as its enforcement; the paved path must beat the shortcut.
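The split between perception and policy can be sketched by keeping the reading as plain data and putting the decision in a separate function. The Reading and Decision types and the enforced flag are hypothetical names for illustration; the point is only that policy can change without touching the sensor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor: str
    passed: bool
    evidence: str


@dataclass(frozen=True)
class Decision:
    proceed: bool
    message: str


def gate(reading: Reading, enforced: bool) -> Decision:
    """Policy only: decide whether work may advance on this reading."""
    if reading.passed:
        return Decision(True, f"{reading.sensor}: green")
    if not enforced:
        # Advisory: the reading is surfaced but does not block.
        return Decision(True, f"{reading.sensor}: red (advisory): {reading.evidence}")
    return Decision(False, f"not yet, and here is why: {reading.evidence}")


# One reading feeds two gates with different strictness.
red = Reading("test", False, "test_get_user failed: expected 200, got 500")
advisory = gate(red, enforced=False)
blocking = gate(red, enforced=True)
print(advisory.proceed)  # True
print(blocking.proceed)  # False
print(blocking.message)
```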

How strong is the push

Coverage, strictness, trust.

The strength of the backpressure depends on three things working together.

  • Coverage — how much of what matters is sensed at all.
  • Strictness — how much is enforced rather than suggested.
  • Trust — how much each reading is worth, where proof outweighs opinion.

The stronger these are, the harder it is for wrong work to appear finished while proven work keeps moving.

Two composition rules

Cheap checks first, no ungated path to done.

flowchart LR
  Human[Human or team] -->|builds sensors and gates| Kit[Sensors and gates]
  Agent[Agent harness] -->|runs sensors| Kit
  Kit -->|reading: assessment and evidence| Agent
  Kit --> Gate{Gate: may it proceed}
  Gate -->|no: not yet, here is why| Agent
  Gate -->|yes| Done[Proven and done]

Humans build the checks; the agent runs them and is paced by the gate.

  • Run the cheap deterministic sensors first; the expensive review later.
  • Leave no ungated path to done, or the agent routes around every sensor.
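Both composition rules fit in one small function: sort the sensors by cost, stop at the first red reading, and make that function the only route to "done". The Sensor tuple, the relative cost numbers, and try_finish are assumptions made for this sketch.

```python
from typing import Callable, List, NamedTuple, Tuple


class Sensor(NamedTuple):
    name: str
    cost: int                  # hypothetical relative cost, lower runs first
    run: Callable[[], bool]    # returns True for a green reading


def try_finish(sensors: List[Sensor]) -> Tuple[bool, List[str]]:
    """Run sensors cheapest first and stop at the first red reading.

    In this sketch, returning True from here is the only path to "done".
    """
    ran = []
    for sensor in sorted(sensors, key=lambda s: s.cost):
        ran.append(sensor.name)
        if not sensor.run():
            return False, ran
    return True, ran


# Made-up sensors: the test is red, so the expensive review never runs.
sensors = [
    Sensor("review", 100, lambda: True),
    Sensor("lint", 1, lambda: True),
    Sensor("test", 10, lambda: False),
    Sensor("build", 5, lambda: True),
]

done, ran = try_finish(sensors)
print(done)  # False
print(ran)   # ['lint', 'build', 'test']
```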

Part four · what prompts them

Different gaps prompt different objects.

A sensor answers "did we know?" A gate answers "did knowing change anything?" Most confusion comes from conflating the two.

Signals for a sensor

When you are the sensor, again.

Every sensor removes a human from a feedback loop they should not be standing in. Five signals to watch for.

  • An escaped defect reached review or production. The strongest signal.
  • The same correction by hand, repeated. You are the sensor.
  • A near-miss caught late that could easily have slipped.
  • The agent flails, burning tokens because no reading tells it that it is off track.
  • A high-stakes invariant: auth, tenant isolation, money, data integrity.

Diagnostic: a verification gap wants a sensor; a knowledge gap wants a guide instead.

Signals for a gate

A reading that exists but does not bite.

  • The reading is there and gets ignored: move it from advisory to enforced.
  • The cost of escape is high or irreversible: strictness tracks blast radius.
  • Trust has risen enough to block on it.
  • There is an ungated path to done.

The inverse matters too: do not over-gate. A flaky sensor that hard-blocks trains everyone to bypass it, which is worse than having no gate.
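One possible check before hardening a gate is to rerun the sensor on unchanged work and see whether its readings agree. This heuristic is an assumption of the sketch, not something the deck prescribes, and is_flaky, the repeat count, and the example sensors are hypothetical.

```python
import itertools
from typing import Callable


def is_flaky(run: Callable[[], bool], repeats: int = 5) -> bool:
    """Treat a sensor as flaky if repeated runs on unchanged work disagree."""
    results = {run() for _ in range(repeats)}
    return len(results) > 1


def stable_sensor() -> bool:
    # Always returns the same reading for the same work.
    return True


# Stand-in for a flaky sensor: alternates green and red on identical input.
_flip = itertools.cycle([True, False])


def flaky_sensor() -> bool:
    return next(_flip)


print(is_flaky(stable_sensor))  # False
print(is_flaky(flaky_sensor))   # True
```

A sensor that fails this kind of check is a candidate to stay advisory until its reading can be trusted.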

The sequence

Sense first. Gate later.

Cheap & safe

Adding a sensor

A new sensor can run in advisory mode and teach you something at low risk. One engineer, in the moment.

Friction for all

Hardening a gate

Blocking friction slows everyone, so it is a team act: harden once you trust the reading and the cost warrants it.

"We need a gate" usually means promote a reading we already have, not build something new.

The basic rule

Add a sensor when you need a reading. Add a gate when failures still get through.

Start by sensing. Gate only when the signal is trusted and the cost of letting the issue through is high enough.

Read the full post

This deck is the concepts. The follow-up covers how sensors and gates are defined, where they live, and how enforcement is wired.