# Feedback Loop Template

**Level:** 4  
**Best for:** systems that should improve over time, not just execute  
**Use with:** OpenClaw or any agent system that can log outcomes, re-read them on a schedule, and update its own strategy

---

## What this is

Level 3 agents follow your strategy. Level 4 agents notice when the strategy is wrong and change it.

That sounds like a lot. The actual mechanism is smaller and simpler than most people expect.

Most self-improving agents are not rewriting their own code. They are doing something more like this: after every run, the agent logs what happened. On a schedule, it re-reads those logs, looks for patterns, and updates its own playbook. That's the loop.

The template below gives you the formula. Six components. Drop any one of them and the loop drifts.

---

## How to use this

1. **Decide what you want the system to get better at** — be specific. "Get better at growth" is not a metric. "Increase reply rate on follow-up messages" is.
2. **Fill in the log format first** — this is the most skipped step and the one that breaks the loop. If the agent doesn't log the right signals, the review has nothing to work with.
3. **Set the review cadence** — weekly is usually right for most systems. Daily is too noisy. Monthly is too slow to catch problems.
4. **Define what can change automatically and what can't** — this is the guardrail that keeps Level 4 from drifting.

---

## The formula

### 1. What gets logged (per run)

This is what the agent writes to memory or a structured log after every execution. It's what the review will read.

```
RUN DATE: [date]
TASK: [what the agent was doing]
WHAT HAPPENED: [short factual summary — not evaluation, just what occurred]
OUTCOME SIGNAL: [the metric you care about — reply rate, conversion, engagement, accuracy]
WHAT WORKED: [one or two specific things that produced good signal]
WHAT FAILED: [one or two specific things that produced weak or bad signal]
NOTES: [anything unusual — anomaly, external factor, one-off that shouldn't influence strategy]
```

If you skip this and just log a success/fail flag, the review agent has no raw material. It can't learn from "success" alone.
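One way to keep this log readable by the review agent is to store each run as one JSON line. This is a minimal sketch, not a required format: the `RunLog` class, its field names, the JSONL file, and the numeric `outcome_signal` are all assumptions made for illustration, and none of them come from OpenClaw or any specific agent framework.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class RunLog:
    """One entry per run. Fields mirror the log template above (names are illustrative)."""
    run_date: str
    task: str
    what_happened: str        # factual summary, no evaluation
    outcome_signal: float     # the metric you care about, e.g. reply rate
    what_worked: list[str]
    what_failed: list[str]
    notes: str = ""           # anomalies that should not influence strategy


def append_run(path: Path, entry: RunLog) -> None:
    """Append one entry as a single JSON line so runs stay in order."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")


def read_runs(path: Path) -> list[RunLog]:
    """Load every logged run, oldest first."""
    with path.open(encoding="utf-8") as f:
        return [RunLog(**json.loads(line)) for line in f if line.strip()]
```

An append-only file keeps each run's record intact, so a later review always reads what was actually logged at the time.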

### 2. The review (what the agent re-reads and asks)

On a schedule, the agent pulls the last N runs and looks for patterns:

```
REVIEW WINDOW: last [7 / 14 / 30] runs
QUESTIONS TO ASK:
- Which conditions consistently produced strong signal?
- Which conditions consistently produced weak signal?
- Is there a pattern that didn't exist in the previous review?
- Did anything that worked before stop working?
- What should change based on this?
```

The output of the review is not a report. It is a proposed change to the strategy — specific and small.
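The first two review questions can be sketched as a grouping step. This assumes each run carries a hypothetical `condition` label (a format, channel, or approach) and a numeric `signal`, and it reads "consistently" as "every run in the window". The threshold and minimum run count are placeholders you would tune for your own system.

```python
from collections import defaultdict


def review(runs: list[dict], window: int = 14, threshold: float = 0.5, min_runs: int = 3) -> dict:
    """Group the last `window` runs by condition and label consistent strong or weak signal."""
    by_condition = defaultdict(list)
    for run in runs[-window:]:
        by_condition[run["condition"]].append(run["signal"])

    strong, weak, thin = [], [], []
    for condition, signals in by_condition.items():
        if len(signals) < min_runs:
            thin.append(condition)       # too few runs to call it a pattern
        elif min(signals) >= threshold:
            strong.append(condition)     # every run cleared the bar
        elif max(signals) < threshold:
            weak.append(condition)       # no run cleared the bar
        # mixed results fall through: not consistent either way
    return {"strong": sorted(strong), "weak": sorted(weak), "thin_data": sorted(thin)}
```

Conditions with mixed results are deliberately left out of both lists, which keeps the proposed change small and tied to patterns that actually held.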

### 3. What can be updated (the update buckets)

```
PROMPT / STRATEGY:
- [what the agent is trying to do]
- [tone, approach, angle]
- [priorities and what gets attention first]

TIMING / ROUTING:
- [when the agent runs]
- [which platform or channel gets focus]
- [how traffic or effort is distributed]

ATTENTION ALLOCATION:
- [which inputs are prioritized]
- [which segments, sources, or categories get more weight]
```

Updates go into one of these three buckets. If a proposed change doesn't fit, it probably needs human review, not automatic application.

### 4. What cannot be auto-updated (the guardrails)

```
LOCKED:
- [core voice, tone rules, brand constraints — the things that define the work]
- [any action that is irreversible: send, post, delete, charge, publish]
- [anything that affects account security or access]
- [any rule that exists for a reason external to performance — legal, ethical, relationship-based]
```

These never change automatically. If the review suggests touching one of these, it flags it for human review instead.
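Taken together, the three update buckets and the locked list act as a gate on every proposed change. A minimal sketch of that gate follows. The field names in `UPDATE_BUCKETS` and `LOCKED` are hypothetical stand-ins for whatever your own strategy file contains.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPLY = "auto_apply"
    HUMAN_REVIEW = "human_review"


# Hypothetical field names; replace with the fields in your own strategy file.
UPDATE_BUCKETS = {
    "prompt_strategy": {"goal", "approach", "priorities"},
    "timing_routing": {"schedule", "channel_focus", "effort_split"},
    "attention_allocation": {"input_priority", "segment_weights"},
}
LOCKED = {"voice", "irreversible_actions", "account_security", "external_rules"}


def gate(field: str) -> Decision:
    """Decide whether a proposed change to `field` may be applied without a human."""
    if field in LOCKED:
        return Decision.HUMAN_REVIEW     # guardrail: never auto-updated
    if any(field in fields for fields in UPDATE_BUCKETS.values()):
        return Decision.AUTO_APPLY       # fits one of the three buckets
    return Decision.HUMAN_REVIEW         # fits no bucket: default to review
```

The locked check runs first and the fallback is human review, so an unknown field can never be auto-applied by accident.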

### 5. The rollback trigger

```
ROLLBACK CONDITION:
If [specific metric] drops [X%] over [N review cycles], revert to the previous strategy version.
Rollback is automatic. Resuming automatic updates requires human review.
```
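The trigger can be sketched as a small check over the metric's value at each review cycle. This reads "drops X% over N review cycles" as "the latest value is more than X% below the value N cycles earlier", which is one reasonable interpretation, not the only one. The function names and the list-of-versions history are assumptions for illustration.

```python
def should_roll_back(metric_by_cycle: list[float], max_drop_pct: float, cycles: int) -> bool:
    """True if the metric fell by more than `max_drop_pct` percent across the last `cycles` cycles."""
    if len(metric_by_cycle) < cycles + 1:
        return False                     # not enough history to compare
    baseline = metric_by_cycle[-(cycles + 1)]
    latest = metric_by_cycle[-1]
    if baseline <= 0:
        return False                     # a percentage drop needs a positive baseline
    drop_pct = (baseline - latest) / baseline * 100
    return drop_pct > max_drop_pct


def next_strategy(history: list[str], metric_by_cycle: list[float],
                  max_drop_pct: float, cycles: int) -> tuple[str, bool]:
    """Return (strategy version to run, whether automatic updates are paused)."""
    if len(history) >= 2 and should_roll_back(metric_by_cycle, max_drop_pct, cycles):
        return history[-2], True         # revert and wait for a human
    return history[-1], False
```

For example, `should_roll_back([10, 9, 6], max_drop_pct=30, cycles=2)` compares 6 against 10, a 40% drop, and returns `True`.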

### 6. The review cron skill prompt

This is what you give the agent that runs the weekly review. It reads the log and writes the updates back into the skill file.

```
You are the weekly review agent for [SYSTEM NAME].

Your job: read [LOG FILE] for the last [N] days and update the Self-Evolution section
of [SKILL FILE].

Read the log entries and find:
- Which [format / approach / channel] consistently produced [your quality signal]?
- Which produced weak or no signal?
- Did anything change compared to the previous review cycle?
- What is one thing worth trying differently next cycle?

Then rewrite these sections inside [SKILL FILE]:

### What's Working
[specific patterns with evidence — name the format, the channel, the number]

### What's Not Working
[specific patterns with evidence — same standard]

### Next Experiments
[1–3 concrete things to try next cycle, not vague directions]

### [Your metric] Actuals (Rolling [N] Days)
[real numbers vs. targets for each tracked category]

Rules:
- Never touch [list your locked sections]
- Be specific. Vague updates ("engagement was mixed") are useless — name what drove it
- If the data window is too small to draw a conclusion, say so and don't guess
- Write for the agent that reads this on its next run, not for a human report
```
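If you would rather not rely on the prompt alone to keep locked sections untouched, the write-back step can be wrapped in code that only ever replaces one section of the skill file. A minimal sketch, assuming the skill file is Markdown with `#`-style headings and the evolving content lives under a single heading. `replace_section` is a hypothetical helper, not part of any agent framework.

```python
import re


def replace_section(markdown: str, heading: str, new_body: str) -> str:
    """Replace everything under `heading` up to the next heading of the same or higher level.

    Limitation: lines starting with '#' inside fenced code blocks are treated as headings.
    """
    level = len(heading) - len(heading.lstrip("#"))
    lines = markdown.splitlines()
    start = next((i for i, line in enumerate(lines) if line.strip() == heading), None)
    if start is None:
        raise ValueError(f"heading not found: {heading}")

    end = len(lines)
    for i in range(start + 1, len(lines)):
        match = re.match(r"(#+)\s", lines[i])
        if match and len(match.group(1)) <= level:
            end = i                      # first heading that closes the section
            break

    updated = lines[: start + 1] + ["", new_body.strip(), ""] + lines[end:]
    return "\n".join(updated).rstrip("\n") + "\n"
```

Everything outside the named section is copied through unchanged, so the review output cannot reach the locked parts of the file.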

---

## Example: Social Strategy Feedback Loop

A social posting bot runs daily on X and Bluesky. A weekly review cron reads the posting log and rewrites four sections inside the skill file. The next run reads the updated file. The behavior changes because the spec changed.

**POSTING_LOG.md — what the log actually looks like after one week:**

```
DATE: 2026-03-10
PLATFORM: Bluesky
FORMAT: sequence
POST_TEXT: "Claude Code from zero: Install -> first project -> 12 projects -> prompting..."
ENGAGEMENT: likes=14, replies=6, follows_gained=3
QUALITY_SIGNAL: high
NOTES: thread follow-up got 2 replies asking for the article link

DATE: 2026-03-10
PLATFORM: X
FORMAT: sequence
POST_TEXT: "Claude Code from zero: Install -> first project -> 12 projects -> prompting..."
ENGAGEMENT: likes=3, replies=0, follows_gained=0
QUALITY_SIGNAL: low
NOTES: credit limit hit mid-day, reach may be suppressed

DATE: 2026-03-11
PLATFORM: Bluesky
FORMAT: tip
POST_TEXT: "75 things to smoke test. Saved me from shipping 3 bugs. [link]"
ENGAGEMENT: likes=9, replies=2, follows_gained=1
QUALITY_SIGNAL: high
NOTES: none

DATE: 2026-03-11
PLATFORM: X
FORMAT: tip
POST_TEXT: "75 things to smoke test. Saved me from shipping 3 bugs. [link]"
ENGAGEMENT: likes=1, replies=0, follows_gained=0
QUALITY_SIGNAL: low
NOTES: none

DATE: 2026-03-12
PLATFORM: Bluesky
FORMAT: milestone
POST_TEXT: "Just hit 5,500 subscribers. Started with 0 and a 30-project challenge."
ENGAGEMENT: likes=22, replies=8, follows_gained=5
QUALITY_SIGNAL: high
NOTES: 3 replies from builders sharing their own milestones — engagement was genuine
```
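A log in this shape is easy to turn into numbers the review can cite. The sketch below assumes entries are separated by blank lines, every line is `KEY: value`, and `ENGAGEMENT` is a comma-separated list of `name=count` pairs, exactly as in the sample above. The function names are illustrative.

```python
from collections import defaultdict


def parse_log(text: str) -> list[dict]:
    """Parse blank-line-separated entries of `KEY: value` lines into dicts."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            # Split on the first ": " only, since POST_TEXT may contain colons.
            key, _, value = line.partition(": ")
            entry[key.strip()] = value.strip()
        # ENGAGEMENT looks like "likes=14, replies=6, follows_gained=3"
        entry["ENGAGEMENT"] = {
            name: int(count)
            for name, count in (pair.split("=") for pair in entry["ENGAGEMENT"].split(", "))
        }
        entries.append(entry)
    return entries


def average_replies_by_platform(entries: list[dict]) -> dict[str, float]:
    """Average reply count per platform across the parsed entries."""
    replies = defaultdict(list)
    for entry in entries:
        replies[entry["PLATFORM"]].append(entry["ENGAGEMENT"]["replies"])
    return {platform: sum(r) / len(r) for platform, r in replies.items()}
```

Run over the five entries above, this gives Bluesky an average of about 5.3 replies per post (6, 2, and 8) and X an average of 0.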

**SKILL.md — the Self-Evolution section after the weekly cron runs:**

```
## Self-Evolution
(Last updated: 2026-03-17 by posting-weekly-review cron)

### What's Working
- Bluesky: sequence and milestone formats consistently produce high-quality signal (replies
  from real builders, meaningful follows)
- Thread-style posts on Bluesky outperform single posts 3:1 on reply quality
- Milestone posts drive the highest engagement on both platforms — use sparingly
- Active engagement (replies, follows) correlates with follower quality gains

### What's Not Working
- X: all formats producing low-quality or zero engagement this cycle
- X credit limits suppressing reach — ROI is poor relative to Bluesky effort
- Promotional posts (direct product pushes without story) getting no traction anywhere

### Next Experiments
- Shift primary posting effort to Bluesky: target 2 posts/day, reduce X to 1
- Test thread format more: break long tips into 2-3 connected posts
- Try opening with a stat or number on Bluesky — stat-hook format hasn't been tested there yet

### Content Mix Actuals (Rolling 7 Days)
- Resource sharing: 45% (target 40%) — slightly over, pull back
- Building in public: 30% (target 20%) — milestone posts driving this up, OK for now
- Original tips: 15% (target 20%) — slightly under
- Engagement: 10% (target 10%) — on target
```
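The Content Mix Actuals lines are simple arithmetic: each category's share of posts, compared against its target. A short sketch follows. The post counts are hypothetical, chosen only because they reproduce the percentages shown above; the targets are copied from the example as written.

```python
def content_mix(counts: dict[str, int], targets: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Return {category: (actual percent, gap versus target in percentage points)}."""
    total = sum(counts.values())
    if total == 0:
        return {}                        # no posts in the window, nothing to report
    mix = {}
    for category, count in counts.items():
        actual = 100 * count / total
        mix[category] = (round(actual, 1), round(actual - targets.get(category, 0.0), 1))
    return mix


# Hypothetical counts for a 20-post week.
counts = {"resource_sharing": 9, "building_in_public": 6, "original_tips": 3, "engagement": 2}
targets = {"resource_sharing": 40, "building_in_public": 20, "original_tips": 20, "engagement": 10}
mix = content_mix(counts, targets)
```

A positive gap means the category is over target and a negative gap means it is under, which is the judgment the cron writes next to each line.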

**The review cron prompt (filled):**

```
You are the weekly review agent for the Build to Launch social posting bot.

Your job: read POSTING_LOG.md for the last 7 days and update the Self-Evolution
section of social-posting/SKILL.md.

Read the log entries and find:
- Which post formats (sequence, tip, milestone, thread, question) consistently
  produced QUALITY_SIGNAL: high?
- Which produced QUALITY_SIGNAL: low or no engagement?
- Did either platform shift compared to last week's pattern?
- What is one format or approach worth testing next cycle?

Then rewrite these four sections inside SKILL.md under ## Self-Evolution:

### What's Working
[formats and platforms producing high-quality signal — name the format, platform, number]

### What's Not Working
[formats and platforms producing weak or no signal — same standard]

### Next Experiments
[1–3 specific things to try next cycle — format, platform, timing, or approach]

### Content Mix Actuals (Rolling 7 Days)
[real percentages per content type vs. targets in the Content Mix table]

Rules:
- Never touch: Voice section, Engagement Strategy approval rules, Formatting Rules,
  max posts per day limit, account credentials
- Be specific. "Bluesky sequence posts averaged 4 replies vs. 0 on X" beats
  "Bluesky performed better"
- If fewer than 5 log entries exist, note the thin data and make no changes to
  Next Experiments
- Write for the bot that reads this file tomorrow morning, not for Jenny
```

**What stayed locked — never touched by the cron:**

```
LOCKED:
- Voice section (tone, personality, brand rules)
- Any action requiring approval: replies, quote tweets
- Max posts per day hard limit
- Account credentials and security settings
```

**Rollback condition:**

```
ROLLBACK CONDITION:
If Bluesky follows_gained drops more than 30% over 2 consecutive review cycles,
cron restores the previous Self-Evolution section, stops writing to Next Experiments,
and flags for human review. Autonomous updates resume only after Jenny confirms.
```

---

## What changed between Level 3 and Level 4 here

At Level 3, the agent had a strategy: post on both platforms, engage, grow.

At Level 4, the agent noticed its own strategy was wrong — not because it was told, but because it read its own logs and found the pattern. The strategy update came from the loop, not from a human.

The loop itself is simple. What makes it Level 4 is not the complexity. It's that the system now has the raw material to improve itself — and the guardrails to do it without drifting.

---

## Why the log format is the most important part

If the agent only logs pass/fail, the review is useless. "It worked" tells the system nothing about why, so there's nothing to optimize.

The log needs enough signal to answer: what specifically drove that outcome? If the agent can't answer that from the log, the loop will keep running and learning nothing.

Start small. Four fields per run: what happened, what the signal was, what seemed to drive it, what didn't. That is enough to start finding patterns.
