You’ve got an AI coding agent. You’ve given it access to your codebase. It’s productive for about 20 minutes — then it console.logs in production code, installs a dependency with npm instead of bun, queries your database with raw psql, and commits without running the linter.
The problem isn’t the model. It’s that you haven’t told it how your project works.
I maintain a set of rules for AI agents across my codebase. Not prompts — persistent, versioned, synced rule files that every agent session loads automatically. Here’s what I’ve learned about writing rules that agents actually follow.
The Setup: One Source of Truth
All my rules live in `.rulesync/rules/` at the repo root. Each file covers one concern. A sync tool, rulesync, distributes them to wherever each agent expects them — `CLAUDE.md` for Claude Code, `.cursor/rules/` for Cursor, `.github/copilot-instructions.md` for Copilot.
```
.rulesync/rules/
├── 03-code-style.md              # TypeScript, React, import conventions
├── 05-patterns.md                # Canonical code examples
├── 07-donts.md                   # Hard prohibitions
├── 09-ai-guidelines.md           # Behavioral rules for thinking
├── 11-code-quality.md            # Pre-commit ritual
├── 13-agent-cli-tools.md         # Purpose-built CLI tools
├── 15-dependency-management.md
├── 16-logging.md                 # Structured event shapes
└── ...
```

You edit one place. Every agent gets the update. Rules are versioned in git alongside the code they govern.
And there's no lock-in: you can switch agents whenever you like.
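The mechanics are easy to picture: gather the rule files and write them wherever each agent reads its instructions. Here's a toy sketch of that idea; it is not rulesync's implementation, and the target list is an assumption:

```typescript
// sync-rules.ts: toy illustration only; rulesync handles this (plus per-agent formats) for real
import { readdir, readFile, writeFile } from 'node:fs/promises'
import { join } from 'node:path'

const SOURCE = '.rulesync/rules'
// Single-file targets for two agents; Cursor expects a directory of rule files instead
const TARGETS = ['CLAUDE.md', '.github/copilot-instructions.md']

const files = (await readdir(SOURCE)).filter((f) => f.endsWith('.md')).sort()
const merged = (
  await Promise.all(files.map((f) => readFile(join(SOURCE, f), 'utf8')))
).join('\n\n')

for (const target of TARGETS) {
  await writeFile(target, merged)
}
```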
What Makes a Good Rule
After months of iteration, I’ve found that effective agent rules share a few properties.
1. Be Specific and Absolute
Agents don’t do well with “prefer X over Y.” They do great with “NEVER do X. ALWAYS do Y.”
Bad:
```
Try to use bun instead of npm when possible.
```

Good:

```
NEVER use npm, yarn, or pnpm. ALWAYS use bun. This is non-negotiable.
```

My “Don’ts” file is a list of 7 absolute prohibitions. No wiggle room, no judgment calls:
- `bun` only — no npm/pnpm/yarn
- TypeScript only — no `.js` files
- No `any` types
- Structured logging only — no `console.log`
- Secrets in `.env` only — never hardcoded
- No skipping the linter
- No hand-editing autogenerated files
Each one prevents a whole class of bugs. Agents follow absolutes far more reliably than preferences.
2. Show, Don’t Just Tell
A rule that says “use Drizzle ORM” is less useful than a rule that shows your actual query patterns:
```typescript
// How we query in this project
import { eq, desc } from 'drizzle-orm' // db and schema come from the project's Drizzle setup

const events = await db
  .select()
  .from(schema.events)
  .where(eq(schema.events.cityId, cityId))
  .orderBy(desc(schema.events.startDate))
  .limit(20)
```

I keep a “Patterns” file with canonical examples for every common operation: route definitions, controller shapes, service patterns, template syntax. The agent pattern-matches against these instead of inventing its own style.
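A Patterns entry doesn't have to be long. As a purely hypothetical companion to the query above, an insert pattern in the same file might look like this (the column names are invented):

```typescript
// How we insert in this project (hypothetical companion example)
const [created] = await db
  .insert(schema.events)
  .values({ cityId, name, startDate })
  .returning()
```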
3. Provide the Right Tools (and Block the Wrong Ones)
This is the single highest-impact rule category. Instead of letting agents reach for general-purpose tools like `psql` or `curl`, I provide purpose-built CLI wrappers and rules that make them mandatory:
| Task | Use This | NEVER This |
|------------------|-----------------------|-------------------------------|
| Query database | `bun run db --toon` | psql, raw SQL clients, pg_dump|
| Check logs | `bun run logs --toon` | grep, tail, cat on log files |
| Test endpoints    | `bun run api --toon`  | curl commands                 |

The `--toon` flag formats output for token efficiency — ~40% fewer tokens than JSON tables. The CLI tools are read-only by design, so agents can’t accidentally mutate data.
Check out: Token-Oriented Object Notation (TOON)
This isn’t just about safety. It’s about token efficiency. An agent that runs psql and gets back a raw table dump burns through context fast. A purpose-built tool returns exactly what’s needed in a compact format.
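The wrappers themselves don't have to be elaborate. Here's a minimal sketch of a read-only query tool; the module path is an assumption, and it prints compact JSON instead of TOON to keep the example short:

```typescript
// scripts/db.ts: sketch only, invoked as `bun run db "SELECT ..."` in this example
import { sql } from 'drizzle-orm'
import { db } from '../src/db' // assumed: the project's shared Drizzle instance

const query = process.argv[2] ?? ''

// Read-only by design: refuse anything that isn't a plain SELECT
// (a real tool would also connect with a read-only database role)
if (!/^\s*select\b/i.test(query)) {
  console.error('db: read-only tool, SELECT queries only')
  process.exit(1)
}

const result = await db.execute(sql.raw(query))
// Compact, single-line output keeps the agent's context window small
process.stdout.write(JSON.stringify(result) + '\n')
```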
4. Rules for Thinking, Not Just Syntax
The most underrated category. Beyond code style, I have rules for how agents should approach problems:
- Surface assumptions explicitly — don’t hide confusion, ask
- Simplicity first — no speculative features, no over-engineering
- Surgical changes — touch only what’s needed, match surrounding style
- Goal-driven — define what success looks like before writing code
These sound obvious, but without them agents tend to refactor neighboring code “while they’re in there,” add error handling for impossible scenarios, and create abstractions for one-time operations.
5. Verification Is a Rule, Not a Suggestion
For UI changes, I have a hard rule: you must visually verify with Playwright before claiming completion. Not “it would be nice to check” — it’s a required workflow step.
Before marking a visual task complete:
- [ ] Code change is saved
- [ ] Dev server is running
- [ ] Used Playwright to view the page
- [ ] Visually confirmed the change looks correct
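The Playwright step can be as simple as loading the page and capturing what it actually renders. A minimal sketch (the URL and output path are placeholders):

```typescript
// verify-ui.ts: sketch of a visual spot-check, run once the dev server is up
import { chromium } from 'playwright'

const url = process.argv[2] ?? 'http://localhost:3000'

const browser = await chromium.launch()
const page = await browser.newPage()
await page.goto(url, { waitUntil: 'networkidle' })
// The screenshot is what gets visually confirmed before the task is marked complete
await page.screenshot({ path: 'ui-check.png', fullPage: true })
await browser.close()
```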
For non-visual changes, the pre-commit ritual is codified:

```sh
bun lint && bun typecheck   # Before every commit
bun run test                # Before every PR
```

Agents that know they’ll be checked do better work.
6. Pin Your Dependencies
This one bit me early. An agent decided to “update” a dependency while fixing a bug. Cascading breakage.
The rule is simple: all dependencies use exact versions. No ^, no ~. Shared versions go in a root-level catalog to prevent drift:
```json
{
  "catalog": {
    "react": "19.1.0",
    "drizzle-orm": "0.44.2"
  }
}
```

Agents can add dependencies. They cannot change versions of existing ones without explicit permission.
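What makes the catalog stick is that individual packages reference it instead of repeating a version. With a catalog-aware package manager (pnpm's catalog: protocol is the common example), a workspace package's manifest looks roughly like this:

```json
{
  "dependencies": {
    "react": "catalog:"
  }
}
```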
7. Structured Logging as Law
Without a logging rule, agents will console.log their way through debugging and leave the mess behind.
The rule mandates structured events with a canonical shape:
```typescript
// Required shape
logger.info({ event: 'search_completed', outcome: 'success', results: 42 }, 'Search returned results')

// Violations
console.log('search done')   // ❌ No console.log
logger.info('search done')   // ❌ No string-only messages
```

This isn’t pedantic — it’s what makes logs queryable in production.
Related, and a great read: https://loggingsucks.com
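If you want the required shape to be hard to skip, a thin typed wrapper helps. A sketch, assuming pino as the logger; the wrapper name and any fields beyond event and outcome are made up:

```typescript
// log.ts: hypothetical wrapper that forces the canonical event shape at compile time
import pino from 'pino'

type LogEvent = {
  event: string // e.g. 'search_completed'
  outcome: 'success' | 'failure'
  [key: string]: unknown // any extra structured fields
}

const logger = pino()

// pino's signature is (mergeObject, message); the wrapper just narrows the object
export function logEvent(fields: LogEvent, message: string): void {
  logger.info(fields, message)
}

// logEvent({ event: 'search_completed', outcome: 'success', results: 42 }, 'Search returned results')
```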
Rules That Don’t Work
Some things I’ve tried that agents consistently ignore or misapply:
- Vague preferences: “Try to keep functions small” — meaningless without a threshold
- Context-dependent rules: “Use pattern A for new code, pattern B for legacy” — agents can’t reliably judge which is which
- Rules about rules: “Read all rules before starting” — they read them, but meta-instructions about process get lost
- Negative examples without alternatives: “Don’t do X” without “do Y instead” leaves agents guessing
The Maintenance Loop
Rules aren’t write-once. My process:
- Agent does something wrong
- I fix it and ask: “Would a rule have prevented this?”
- If yes, I write the rule in `.rulesync/rules/`
- `rulesync` propagates it to all agent configs
- Next session, the agent knows
The rule set grows over time but stays curated. I remove rules that prove unnecessary and sharpen ones that agents misinterpret. After a few months, the rules stabilize — most new sessions “just work” because the common mistakes are all covered.
Getting Started
You don’t need 18 rule files on day one. Start with three:
- Don’ts — 5-7 absolute prohibitions specific to your project
- Patterns — 3-5 canonical code examples showing “how we do things here”
- Tools — Which CLI commands to use (and which to never touch)
Put them in a .rulesync/rules/ directory, set up rulesync to distribute them, and iterate from there. Every time an agent makes a mistake that a rule would have caught — write the rule.
The goal isn’t to constrain the agent. It’s to encode your project’s conventions so the agent can focus on the actual problem instead of reinventing your style guide every session.
