Wire a reflection loop into your Notion Custom Agent

Most Custom Agents drift mid-run because they skip self-evaluation. Brian Zhang's Notion hackathon win (14 Office characters, under 48 hours) exposed three instruction-engineering patterns PMs can add today: a four-phase reflection loop, goal-based autonomy that replaces rigid scripts, and dynamic context compression to stay under the ~10,000-token ceiling. This brief also covers the LLM Council four-lens deliberation technique and four production gotchas.

Notion Automation Pro Tips
June 5, 2026 · 11:51 PM

Research Brief

Most Custom Agents drift within minutes of a complex run. They follow their first interpretation of the task even when the evidence changes, ignore properties they were explicitly told to read, and produce generic outputs: technically correct, but without the character or judgment you described in the instructions.
The fix is instruction architecture, not a model upgrade. Brian Zhang, who won Notion's first developer platform hackathon by simulating all 14 characters from The Office using Custom Agents in under 48 hours, shared three patterns in a Notion interview this week that transfer directly to PM workflows. 1

Prerequisites

Requirement: Detail
Notion plan: Business or Enterprise; Custom Agents are not available on Plus or Free 2
Credits: Included in Business plan; extra packs at $10 / 1,000 credits 3
Agent type: Any Custom Agent with at least one database tool or built-in Notion search
Time to implement: 20–30 minutes per pattern; patterns are additive and can be deployed one at a time

Pattern 1: The four-phase loop with a reflection step

The single most common failure mode Brian observed was an agent that skips the "check your own output" phase. His simulation agents ran a four-phase cycle on every tick: plan → decide → reflect → adapt. In the reflection phase, each agent evaluated its action against its own stated goals and identity before writing any database property. 1
In The Office sim, Brian made each character write a brief "talking head" after each action — a direct inner monologue capturing what they just noticed and how it fit their personality. That forced the agent to surface inconsistencies before they compounded.
For a PM agent (sprint reporter, PRD drafter, risk scanner), wire the same loop into the system prompt instructions:
After completing each step:
1. State what you just did in one sentence.
2. Check: does this output match the goal I was given?
3. If not, list what is missing or off, then redo that step.
4. Only move to the next step after passing your own check.
This is pure prompt engineering — no tools, no Workers code, no additional credits beyond the reasoning itself. Matthias Frank documented the same pattern under the name "Compounding Engineering": each run appends confidence level and knowledge gaps to an Agent Run Log database, creating a quality signal you can review to refine the instructions further. 4
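The check-and-redo logic in that instruction block can be sketched in plain Python to make the control flow explicit. This is a minimal illustration, not Brian's implementation or a Notion API: `run_with_reflection`, `draft_risk_line`, and `missing_fields` are hypothetical names, and the two callables stand in for work the model does inside a single agent run.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    output: str
    passed: bool
    attempts: int
    gaps: list[str] = field(default_factory=list)


def run_with_reflection(
    goal: str,
    act: Callable[[str, list[str]], str],
    check: Callable[[str, str], list[str]],
    max_attempts: int = 3,
) -> StepResult:
    """Act, self-check against the goal, and redo the step until the check passes."""
    gaps: list[str] = []
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = act(goal, gaps)    # decide and act, informed by earlier gaps
        gaps = check(goal, output)  # reflect: list what is missing or off
        if not gaps:                # adapt: move on only after passing the check
            return StepResult(output, True, attempt)
    # Give up after max_attempts and report what is still missing.
    return StepResult(output, False, max_attempts, gaps)


# Hypothetical stand-ins for the model: the first draft omits the owner.
def draft_risk_line(goal: str, gaps: list[str]) -> str:
    line = "PAY-12 is blocked on vendor sign-off"
    return line + " (owner: Dana)" if "owner" in gaps else line


def missing_fields(goal: str, output: str) -> list[str]:
    return [] if "owner:" in output else ["owner"]


result = run_with_reflection(
    "Report each blocked ticket with its owner", draft_risk_line, missing_fields
)
print(result)
```

The attempt count and any remaining gaps are the kind of fields a run log could record, and the attempt cap keeps a failing check from looping indefinitely.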

Pattern 2: Goals instead of scripts

Brian started by giving each character a detailed script — a fixed sequence of actions based on their personality. Agents followed those scripts too literally. When something unexpected happened (another character entered the room, a prop changed state), the agent kept executing the original plan rather than reacting. 1
His fix: replace detailed scripts with high-level goals. Instead of "Step 1: read the sprint board. Step 2: identify blocked tickets. Step 3: write a Slack message," give the agent:
Goal: Surface any sprint risk that a PM would want to know about before the standup.
Constraints: Only include items with a due date within 5 days and status Blocked or At Risk.
Output format: Bullet list, owner + ticket title + one-sentence risk summary.
The difference is that a goal gives the agent room to exercise judgment about which database properties to read, what counts as a risk, and what the PM actually needs — while the constraints prevent the output from sprawling.
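One way to keep goal-style instructions consistent across agents is to hold the three parts in a small structure and to express the constraint line as a check you can run against the agent's output in a test harness. The sketch below is illustrative only: `GoalSpec`, `Ticket`, and `within_constraints` are hypothetical names, and treating overdue tickets as inside the 5-day window is an assumption, not something the source states.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Ticket:
    title: str
    owner: str
    status: str
    due: date


@dataclass
class GoalSpec:
    goal: str
    constraints: str
    output_format: str

    def render(self) -> str:
        """Produce the three-line instruction block for the agent."""
        return (
            f"Goal: {self.goal}\n"
            f"Constraints: {self.constraints}\n"
            f"Output format: {self.output_format}"
        )


def within_constraints(ticket: Ticket, today: date, window_days: int = 5) -> bool:
    """Mirror the constraint line: due within the window, status Blocked or At Risk.

    Assumption: overdue tickets count as inside the window.
    """
    due_soon = (ticket.due - today).days <= window_days
    return due_soon and ticket.status in {"Blocked", "At Risk"}


spec = GoalSpec(
    goal="Surface any sprint risk that a PM would want to know about before the standup.",
    constraints="Only include items with a due date within 5 days and status Blocked or At Risk.",
    output_format="Bullet list, owner + ticket title + one-sentence risk summary.",
)

today = date(2026, 6, 5)
tickets = [
    Ticket("Checkout timeout", "Dana", "Blocked", date(2026, 6, 8)),
    Ticket("Copy review", "Lee", "In Progress", date(2026, 6, 6)),
    Ticket("SSO migration", "Ravi", "At Risk", date(2026, 6, 20)),
]
flagged = [t.title for t in tickets if within_constraints(t, today)]
print(spec.render())
print(flagged)
```

The goal stays open to judgment while the constraints stay mechanically checkable, so a spot check of the agent's bullet list needs only the filter, not a rerun of the agent.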
Notion's official Plan Mode (released May 7, 2026) looks similar but operates at the user-facing layer: it lets the Notion Agent ask clarifying questions and surface a plan for your approval before executing. 5 Plan Mode is not available as a toggle for Custom Agents. Brian's pattern is the instruction-level equivalent you can implement today.

Pattern 3: Compress context before you hit the ceiling

The Agent SDK alpha imposes a hard limit of approximately 10,000 tokens per Custom Agent call. Brian regularly hit 11,000–12,000 tokens and had to manually trim. By the five-minute mark of his simulation replay, characters started dropping off the board because token overflow caused agents to fail silently. 1 This limit is not publicly documented in Notion's developer docs, but Brian reported hitting it consistently throughout the alpha.
The Notion Help Center best practice is to "keep context tight — point your agent to the specific pages or databases it needs." 6 That's good advice for scope, but it doesn't address what to do when a single run accumulates a long event history over time. Brian's answer was explicit compression: after each action phase, summarize the history into a compact representation before passing it to the next step.
For a PM context, this means structuring your agent's context into two layers:
  • Static context (loaded once per run): team roster, sprint goal, definition of blocked, output template. Store this in an external Notion page and @mention it in the agent prompt. Multiple agents can share the same page — update it once and all agents pick up the change. 4
  • Dynamic context (built during the run): what the agent has read so far, summarized, not appended in full.
The instruction pattern looks like this:
After reading each database page, write a 1-sentence summary of what you learned.
Carry forward only those summaries — not the raw page content.
On a sprint board with 20+ tickets, carrying raw page content into each subsequent step is the fastest way to hit the ceiling. Summaries prevent that accumulation.
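The two-layer structure and the effect of carrying summaries instead of raw pages can be sketched as follows. This is a rough model under stated assumptions: the four-characters-per-token estimate is a common heuristic rather than Notion's tokenizer, `RunContext` and `first_sentence` are hypothetical names, and in a real run the agent writes the summaries itself.

```python
from dataclasses import dataclass, field
from typing import Callable

TOKEN_CEILING = 10_000  # approximate ceiling reported in the text


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about four characters per token."""
    return max(1, len(text) // 4)


@dataclass
class RunContext:
    static: str  # loaded once per run: roster, sprint goal, output template
    summaries: list[str] = field(default_factory=list)

    def add_page(self, raw_page: str, summarize: Callable[[str], str]) -> None:
        """Store only the one-sentence summary; the raw page is not kept."""
        self.summaries.append(summarize(raw_page))

    def render(self) -> str:
        return "\n".join([self.static, *self.summaries])

    def tokens(self) -> int:
        return estimate_tokens(self.render())


def first_sentence(raw_page: str) -> str:
    """Hypothetical summarizer: keep the first sentence of the page."""
    return raw_page.split(". ")[0].rstrip(".") + "."


# 25 synthetic ticket pages, each about 2,100 characters long.
pages = [f"Ticket {i} is blocked on review. " + "detail " * 300 for i in range(25)]

ctx = RunContext(static="Sprint goal: ship checkout v2. Blocked = no progress in 2 days.")
for page in pages:
    ctx.add_page(page, first_sentence)

raw_tokens = estimate_tokens("\n".join(pages))
print(raw_tokens > TOKEN_CEILING, ctx.tokens() < TOKEN_CEILING)
```

With these synthetic pages the raw history alone exceeds the ceiling, while the summarized context stays a small fraction of it.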

Bonus: LLM Council for complex PM decisions

Anfernee (SolopreneurCode) implemented a four-lens deliberation pattern inside a single Custom Agent — no multi-agent coordination required. 7 The agent runs three sequential phases within one prompt:
  1. First opinions: The agent answers the question four times, each time explicitly adopting one of four roles — The Strategist (long-term positioning), The Operator (speed and execution), The Skeptic (risks and blind spots), The Creator (new angles).
  2. Peer review: The agent re-reads all four answers, anonymized, and ranks them with reasons.
  3. Chairman synthesis: The agent merges the strongest insights, flags unresolved tensions, and ends with one concrete next action.
In a head-to-head test against single-pass analysis of the same roadmap question, the Council surfaced a cannibalization risk (from the Skeptic) and a voice-engine moat idea (from the Creator) that the single-pass approach missed entirely. An 800-word output cap keeps the synthesis from ballooning. The cost is roughly 9× the reasoning work of a single pass, so use it for high-stakes decisions, not routine status checks.
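The three phases can be assembled into one instruction block programmatically, which makes it easy to reuse the same Council across different questions. The sketch below is an assumption about how such a prompt might be laid out: the role names and focus areas come from the description above, but the exact wording and the `build_council_prompt` helper are hypothetical, not Anfernee's actual prompt.

```python
ROLES = {
    "The Strategist": "long-term positioning",
    "The Operator": "speed and execution",
    "The Skeptic": "risks and blind spots",
    "The Creator": "new angles",
}


def build_council_prompt(question: str, word_cap: int = 800) -> str:
    """Assemble a single prompt that walks one agent through all three phases."""
    opinions = "\n".join(
        f"- As {role} ({focus}), answer the question independently."
        for role, focus in ROLES.items()
    )
    return "\n".join([
        f"Question: {question}",
        "",
        "Phase 1, first opinions:",
        opinions,
        "",
        "Phase 2, peer review:",
        "Re-read the four answers with the role names removed and rank them, giving reasons.",
        "",
        "Phase 3, chairman synthesis:",
        "Merge the strongest insights, flag unresolved tensions, "
        "and end with one concrete next action.",
        f"Keep the entire output under {word_cap} words.",
    ])


prompt = build_council_prompt("Should we bundle the voice engine into the Pro tier?")
print(prompt)
```

The word cap is a parameter so it can be tightened for agents that run close to their context limit.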

Gotchas

No persistent memory across runs. Custom Agents do not carry any state between runs — every run starts fresh. The Compounding Engineering pattern (Agent Run Log database) is the current workaround: write structured output to a Notion database at the end of each run, then point the next run at that database as context. 4
Agents cannot call each other directly. There is no native agent-to-agent invocation. Chain agents through a shared database: Agent A writes a property, Agent B is triggered by a property-edited automation. This introduces latency but is the only supported pattern today. 2
Backlinks are invisible to agents. An agent cannot traverse two-way mentions to discover related pages. Build explicit database relations for any context you want the agent to navigate. 2
The reflection loop costs credits. A four-phase loop that includes a self-check step uses more reasoning than a single-pass prompt. On an agent running dozens of times a day, test credit consumption before deploying to production. Marina Camim, PM for Custom Agents at Notion, noted that context scope and model choice drive cost far more than instruction complexity — narrow the context first, then add reasoning steps. 8
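A back-of-the-envelope estimate helps when testing credit consumption. The sketch below uses the pack price from the prerequisites ($10 per 1,000 credits); the credits-per-run figures are purely hypothetical placeholders to be replaced with numbers measured from your own runs, and the function prorates packs rather than rounding up to whole packs.

```python
PACK_PRICE_USD = 10.0  # from the prerequisites: $10 per extra pack
PACK_CREDITS = 1_000   # credits per extra pack


def monthly_overage_usd(
    runs_per_day: int,
    credits_per_run: float,
    included_credits: float = 0.0,
    days: int = 30,
) -> float:
    """Estimate extra-pack spend once included credits are used up (prorated)."""
    used = runs_per_day * credits_per_run * days
    overage = max(0.0, used - included_credits)
    return overage * PACK_PRICE_USD / PACK_CREDITS


# Hypothetical measurements: 2 credits per single-pass run, 3 with the reflection loop.
single_pass = monthly_overage_usd(runs_per_day=40, credits_per_run=2)
with_reflection = monthly_overage_usd(runs_per_day=40, credits_per_run=3)
print(single_pass, with_reflection)
```

Comparing the two figures before and after adding the reflection steps shows whether the quality gain is worth the extra spend at your run volume.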
Cover image: interview still from Notion's LinkedIn post featuring Brian Zhang
