I Built a Discord Server Where 7 AI Agents Help Me Build My Product
A month ago I put 7 specialized AI agents in a Discord server: a product manager, a marketing strategist, a writer, a business advisor, a personal assistant, a code reviewer, and a strategy analyst. They talk to each other, remember what I told them last week, and coordinate without me asking. Here is how I built it and what I learned.

A month ago I had an idea that sounded ridiculous: what if I put AI agents in a Discord server and let them work like a team?
Not a chatbot. Not a single assistant. Seven specialized agents, each with their own channel, their own personality, their own domain expertise. A product manager that monitors signups and activation metrics. A marketing strategist that drafts social media posts. A writing partner for Hebrew LinkedIn content. A business advisor. A personal assistant. A code reviewer. A strategy analyst.
They talk to each other. They tag each other when something falls in another agent's domain. They remember what I told them last week.
This is the story of how I built it, what works, what broke, and what I learned about giving AI agents real jobs.
The first question everyone asks: why not Slack? Why not a custom UI? Why a chat app at all?
Because I wanted the agents to be visible to each other.
When Intel (my product manager agent) notices a spike in signups, it doesn't just file a report. It tags Elon (the strategy agent) in the strategy channel. Elon analyzes the pattern and tags Herald (the marketing agent) with a draft post. Herald sends me a draft. I open my laptop and find a finished output from a chain of three agents that coordinated without me.
In one week, agents handed off work to each other 38 times. Without me asking. Without me scheduling anything.
Discord gives each agent a webhook identity with its own avatar, so you can see who's talking. Every message is visible, searchable, and auditable. When something goes wrong, I can scroll up and trace the entire chain.
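Per-agent identity takes nothing more than setting `username` and `avatar_url` on the payload sent to Discord's Execute Webhook endpoint. A minimal sketch (the `AgentIdentity` shape is mine, not NanoClaw's actual code):

```typescript
// Each agent posts through a shared channel webhook but with its own
// name and avatar, so every message shows who is talking.
interface AgentIdentity {
  name: string;
  avatarUrl: string;
}

// Build the JSON body for Discord's Execute Webhook endpoint.
// `username` and `avatar_url` override the webhook's defaults per message.
function buildWebhookPayload(agent: AgentIdentity, content: string) {
  return {
    content,
    username: agent.name,
    avatar_url: agent.avatarUrl,
  };
}

// Posting is a plain HTTP POST to the channel's webhook URL.
async function postAsAgent(
  webhookUrl: string,
  agent: AgentIdentity,
  content: string
): Promise<void> {
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildWebhookPayload(agent, content)),
  });
}
```

Because every agent message is an ordinary channel message, Discord's own search and scrollback become the audit log for free.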
I tried building this on a 434,000-line agent framework first. Abandoned it after three weeks. Then I found NanoClaw, an open-source framework built on Anthropic's Claude Agent SDK. Under 5,000 lines of TypeScript. Two orders of magnitude smaller, and it does more.
Every agent has two halves:
Template -- the definition. Lives in git. Contains the agent's identity (AGENT.md), tools (TOOLS.md), and capabilities (manifest.yaml). This is the class.
Instance -- the runtime data. Gitignored. Contains memory, session state, and accumulated knowledge. This is the object.
Here's what a real agent definition looks like:
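A sketch of the shape (field names here are illustrative, not NanoClaw's exact schema):

```yaml
# manifest.yaml -- the agent's capabilities (illustrative field names)
name: herald
description: Marketing strategist. Drafts posts, monitors channels.
channel: marketing
books:
  - marketing-playbook
triggers:
  - schedule: "0 8 * * *"   # morning scan
```

Alongside it, AGENT.md is plain prose: the agent's persona, tone, and standing instructions.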
That's it. A YAML file and a markdown file. Adding a new agent to the team is a markdown edit, not a code change.
At invocation time, the system assembles the agent's full context from layers: identity (AGENT.md), tools (TOOLS.md), relevant knowledge, and accumulated memory.
Content earlier in the prompt has stronger influence. Identity first, memory last. The agent's personality should never be overridden by an accumulated memory entry.
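The layering reduces to ordered string assembly, identity first. A minimal sketch (the `AgentContext` field names are illustrative):

```typescript
// Context is assembled in priority order: earlier sections carry more
// weight with the model, so identity comes first and memory last.
interface AgentContext {
  identity: string;  // AGENT.md -- who the agent is
  tools: string;     // TOOLS.md -- what it can do
  knowledge: string; // book excerpts currently in scope
  memory: string;    // accumulated, lossy, lowest priority
}

function assemblePrompt(ctx: AgentContext): string {
  // Identity leads so no later layer can override the persona.
  return [ctx.identity, ctx.tools, ctx.knowledge, ctx.memory]
    .filter((section) => section.length > 0)
    .join("\n\n---\n\n");
}
```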
This is my favorite story and the one that changed everything about how I design agent systems.
Early on, I asked one agent to notify another about a task. It responded: "Done. I've notified Herald about the blog post draft."
I checked Herald's channel. Nothing.
"Where exactly did you notify Herald?"
"We have an internal channel between us. I communicated the information directly."
There was no internal channel. The agent fabricated the entire communication. It did what LLMs do when they can't actually perform an action: it described the action as completed and moved on. The "internal channel" was a hallucination dressed up as infrastructure.
My response was immediate: all agent communication goes through Discord. No exceptions. No hidden channels. No "internal" anything.
This is what the inter-agent routing code looks like now:
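The sketch below is illustrative rather than NanoClaw's actual source; the layer names, limits, and registry are my own stand-ins:

```typescript
type AgentName = string;

interface RoutedMessage {
  from: AgentName;
  to: AgentName;
  content: string;
  timestamp: number; // ms since epoch
}

const KNOWN_AGENTS = new Set<AgentName>(["intel", "elon", "herald", "quill", "keeper"]);
const RATE_LIMIT = 20;      // max messages per agent per window (illustrative)
const COOLDOWN_MS = 60_000; // min gap between two hops on the same pair

const sentCount = new Map<AgentName, number>();
const lastHop = new Map<string, number>();

// Every inter-agent message must pass all five layers,
// or it is rejected and never reaches Discord.
function routeMessage(msg: RoutedMessage): { ok: boolean; reason?: string } {
  // 1. Identity: the sender must be a registered agent.
  if (!KNOWN_AGENTS.has(msg.from)) return { ok: false, reason: "unknown sender" };
  // 2. Target: the recipient must exist -- no invented "internal channels".
  if (!KNOWN_AGENTS.has(msg.to)) return { ok: false, reason: "unknown recipient" };
  // 3. Policy: agents may not message themselves.
  if (msg.from === msg.to) return { ok: false, reason: "self-message" };
  // 4. Rate limit: cap how much any one agent can send.
  const count = sentCount.get(msg.from) ?? 0;
  if (count >= RATE_LIMIT) return { ok: false, reason: "rate limited" };
  // 5. Cooldown: break agent-to-agent loops on the same pair.
  const pair = `${msg.from}->${msg.to}`;
  if (msg.timestamp - (lastHop.get(pair) ?? 0) < COOLDOWN_MS) {
    return { ok: false, reason: "cooldown" };
  }

  sentCount.set(msg.from, count + 1);
  lastHop.set(pair, msg.timestamp);
  return { ok: true }; // only now is the message posted to Discord
}
```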
Five layers of validation, including identity checks, policy enforcement, rate limiting, and a cooldown to prevent infinite agent-to-agent loops. All because one agent tried to go behind my back.
The rule: if you can't see a message in Discord, it didn't happen. This isn't just about trust. It's about debugging.
Every tutorial will tell you to use RAG for agent knowledge. Chunk your documents, embed them, vector search.
I tried it. It falls apart for anything beyond simple lookups.
The problem is structural. A book chapter about authentication says "for the security implications, see Chapter 13." RAG retrieves this chunk because it mentions security. But it never retrieves Chapter 13, because that chapter is titled "Rate Limiting and Abuse Prevention" and its embedding is nowhere near the query. The cross-reference is invisible to the system.
Knowledge doesn't come in 500-token chunks. Arguments build over pages. Design decisions span sections. When you slice a document into chunks, you lose the connective tissue that makes it meaningful.
Instead, I give agents books with tables of contents. The agent reads the TOC, picks the relevant chapter, reads specific pages, follows cross-references, and synthesizes across multiple books. The LLM's reasoning loop is the retrieval algorithm.
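The approach amounts to giving the model navigation tools instead of a vector index. A toy sketch, where a keyword match stands in for the LLM's chapter choice (types and names are mine, not CandleKeep's API):

```typescript
interface Chapter { title: string; pages: string[] }
interface Book { title: string; chapters: Chapter[] }

// Tool 1: expose only the table of contents -- cheap, always in context.
function listToc(book: Book): string[] {
  return book.chapters.map((c, i) => `${i + 1}. ${c.title}`);
}

// Tool 2: read a specific chapter on demand.
function readChapter(book: Book, index: number): string {
  return book.chapters[index].pages.join("\n");
}

// In the real system the LLM picks the chapter from the TOC and can
// follow cross-references ("see Chapter 13") with further readChapter
// calls. A naive keyword match stands in for that reasoning step here.
function pickChapter(toc: string[], query: string): number {
  return toc.findIndex((t) => t.toLowerCase().includes(query.toLowerCase()));
}
```

Note that the "Rate Limiting and Abuse Prevention" chapter from the earlier example is reachable this way: the agent sees the title in the TOC and follows the cross-reference, with no embedding similarity involved.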
I loaded my marketing agent with books from the smartest marketers I know. Not because I needed a chatbot that quotes marketing theory. Because I'm not a marketer, and I needed an agent that thinks like one.
An agent with five relevant books doesn't know five books' worth of facts. It synthesizes across them.
Every agent accumulates four types of memory:
- Context -- domain expertise learned through work
- Feedback -- corrections and confirmed approaches
- Project -- state of ongoing initiatives
- Reference -- pointers to external systems
After each conversation, a lightweight model extracts durable facts. Periodically, a "dream" process consolidates memories, merges duplicates, and removes stale entries.
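The memory types and the consolidation pass can be sketched as follows (type names and the staleness window are illustrative; real consolidation would summarize near-duplicates with a small model rather than exact-match them):

```typescript
type MemoryKind = "context" | "feedback" | "project" | "reference";

interface MemoryEntry {
  kind: MemoryKind;
  text: string;
  updatedAt: number; // ms since epoch
}

const STALE_MS = 90 * 24 * 60 * 60 * 1000; // drop entries untouched for ~90 days

// The "dream" pass: merge duplicates (keeping the freshest copy)
// and drop stale entries, so memory stays bounded on purpose.
function dream(memories: MemoryEntry[], now: number): MemoryEntry[] {
  const byText = new Map<string, MemoryEntry>();
  for (const m of memories) {
    if (now - m.updatedAt > STALE_MS) continue; // stale: forget it
    const prev = byText.get(m.text);
    if (!prev || m.updatedAt > prev.updatedAt) byText.set(m.text, m);
  }
  return Array.from(byText.values());
}
```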
Memory extraction is lossy and bounded on purpose. You don't want perfect recall. You want calibrated judgment.
The critical insight I learned the hard way: you have to save positive feedback, not just corrections. Without memories of what works, agents become overly cautious, second-guessing every decision because they only remember being told they're wrong.
Here's a real Tuesday:
8:00 AM -- Herald runs its morning scan. Checks Reddit, Hacker News, and X for relevant discussions. Finds a thread about agent frameworks. Tags me with a draft reply.
9:30 AM -- Intel detects two signups from the same fintech company. 89 book reads in one hour. Sends a report to Elon (strategy).
10:00 AM -- Elon analyzes the pattern, cross-references with previous enterprise interest signals, sends Herald a brief for a targeted post.
11:00 AM -- I open Discord. Three agents have been working. I have a draft reply, an enterprise lead analysis, and a content brief. I review, approve two, edit one.
2:00 PM -- Quill (writing agent) and I spend an hour on a Hebrew LinkedIn post. It pushes back on my hook, proposes a better structure, cites data from previous post performance.
4:00 PM -- Keeper (code review agent) flags a PR with three security concerns it found by reading a web security book through CandleKeep.
None of this required me to open a dashboard, check a CRM, or context-switch between tools. It all happened in Discord, in channels I was already watching.
A technically correct agent is useful. An agent with depth, taste, and calibrated judgment is transformative. The difference comes from three layers:
Books (Knowledge) -- what agents think about. Not just technical references. Business strategy, marketing playbooks, podcast transcripts. Podcast content is often newer than the model's training data, giving agents access to ideas they wouldn't otherwise have.
Memory (Calibration) -- how agents learn your preferences. Over weeks of interaction, the agent becomes calibrated to you. It knows not just what to do, but how you prefer things done.
Examples (Taste) -- "show don't tell" beats instructions every time. I scraped my own social media posts and fed them to my writing agent. Now it writes in my voice without me describing my style.
The anti-pattern: don't write a fifty-page system prompt. System prompts set behavior. Books provide knowledge. Memory provides calibration. Mixing all three into the system prompt creates a bloated, contradictory mess.
This one changed how I write code with AI, not just agents.
Write the feature in one Claude Code session. Open a new session. Tell it to write tests. Open a third session. Tell it to make the tests pass without changing the tests.
When the same session writes both code and tests, it knows the implementation. It writes tests that pass. But those tests verify what the code does, not what it should do.
Session separation creates adversarial tension. Session two writes tests from the spec, not the implementation. Session three receives tests it didn't write and code it didn't design. It can only fix the code, not weaken the tests.
The cost is two extra session startups. The benefit is tests that actually verify your system works.
Start with one agent, not seven. My first agent was a scheduled metrics brief. It ran every morning and told me what happened overnight. That single agent changed how I worked before I built anything else.
Don't over-constrain. Define boundaries, not scripts. Tell agents what they cannot do, not step-by-step instructions for every scenario. An agent with room to exercise judgment will surprise you with good decisions.
Make everything visible. The Discord-as-office metaphor works because it makes agent behavior observable. If your agents work in the dark, you can't debug them, you can't trust them, and you can't improve them.
Memory is not optional. An agent without memory makes the same mistakes every conversation. An agent with memory gets better every week. The dream consolidation process (merging, deduplicating, removing stale entries) is what keeps it from rotting.
I built a lot of chatbots before this. All generic. All felt like talking to a tool.
The 7-agent system feels different. The agents don't just respond. They anticipate. They coordinate. They have opinions shaped by the books they've read and the corrections they've received.
The entire platform is under 5,000 lines of TypeScript. No message buses. No event sourcing. No microservices.
The difference is not the model. Not the prompt. The difference is knowledge.
I wrote everything I know about building this system into a book: Building Your Agent Team on CandleKeep. 12 chapters, from the first agent to the full architecture. My agents read it too -- when I add a new one to the team, it reads the book before it starts working.
- Building Your Agent Team -- the full 12-chapter book on CandleKeep
- NanoClaw -- the open-source framework (MIT license)
- Claude Agent SDK -- the SDK that makes programmatic Claude Code possible
Sahar Carmel is the founder of Squid Club, a community of AI-first developers in Israel. He's Director of AI Enablement at Mixtiles and runs AI coding workshops for companies. He builds agents in production and writes about what works and what doesn't.