I Built a Discord Server Where 7 AI Agents Help Me Build My Product
A month ago I put 7 specialized AI agents in a Discord server: a product manager, a marketing strategist, a writer, a business advisor, a personal assistant, a code reviewer, and a strategy analyst. They talk to each other, remember what I told them last week, and coordinate without me asking. Here is how I built it and what I learned.

A month ago I had an idea that sounded ridiculous: what if I put AI agents in a Discord server and let them work like a team?
Not a chatbot. Not a single assistant. Seven specialized agents, each with their own channel, their own personality, their own domain expertise. A product manager that monitors signups and activation metrics. A marketing strategist that drafts social media posts. A writing partner for Hebrew LinkedIn content. A business advisor. A personal assistant. A code reviewer. A strategy analyst.
They talk to each other. They tag each other when something falls in another agent's domain. They remember what I told them last week.
This is the story of how I built it, what works, what broke, and what I learned about giving AI agents real jobs.
The first question everyone asks: why not Slack? Why not a custom UI? Why a chat app at all?
Because I wanted the agents to be visible to each other.
When Intel (my product manager agent) notices a spike in signups, it doesn't just file a report. It tags Elon (the strategy agent) in the strategy channel. Elon analyzes the pattern and tags Herald (the marketing agent) with a draft post. Herald sends me a draft. I open my laptop and find a finished output from a chain of three agents that coordinated without me.
In one week, agents handed off work to each other 38 times. Without me asking. Without me scheduling anything.
Discord gives each agent a webhook identity with its own avatar, so you can see who's talking. Every message is visible, searchable, and auditable. When something goes wrong, I can scroll up and trace the entire chain.
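Per-agent identity takes nothing more than setting `username` and `avatar_url` on the payload sent to Discord's Execute Webhook endpoint. A minimal sketch (the `AgentIdentity` shape is mine, not NanoClaw's actual code):

```typescript
// Each agent posts through a shared channel webhook but with its own
// name and avatar, so every message shows who is talking.
interface AgentIdentity {
  name: string;
  avatarUrl: string;
}

// Build the JSON body for Discord's Execute Webhook endpoint.
// `username` and `avatar_url` override the webhook's defaults per message.
function buildWebhookPayload(agent: AgentIdentity, content: string) {
  return {
    content,
    username: agent.name,
    avatar_url: agent.avatarUrl,
  };
}

// Posting is a plain HTTP POST to the channel's webhook URL.
async function postAsAgent(
  webhookUrl: string,
  agent: AgentIdentity,
  content: string
): Promise<void> {
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildWebhookPayload(agent, content)),
  });
}
```

Because every agent message is an ordinary channel message, Discord's own search and scrollback become the audit log for free.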
I tried building this on a 434,000-line agent framework first. Abandoned it after three weeks. Then I found NanoClaw, an open-source framework built on Anthropic's Claude Agent SDK. Under 5,000 lines of TypeScript. Two orders of magnitude smaller, and it does more.
Every agent has two halves:
Template -- the definition. Lives in git. Contains the agent's identity (AGENT.md), tools (TOOLS.md), and capabilities (manifest.yaml). This is the class.
Instance -- the runtime data. Gitignored. Contains memory, session state, and accumulated knowledge. This is the object.
Here's what a real agent definition looks like:
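A sketch of the shape (field names here are illustrative, not NanoClaw's exact schema):

```yaml
# manifest.yaml -- the agent's capabilities (illustrative field names)
name: herald
description: Marketing strategist. Drafts posts, monitors channels.
channel: marketing
books:
  - marketing-playbook
triggers:
  - schedule: "0 8 * * *"   # morning scan
```

Alongside it, AGENT.md is plain prose: the agent's persona, tone, and standing instructions.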
That's it. A YAML file and a markdown file. Adding a new agent to the team is a markdown edit, not a code change.
At invocation time, the system assembles the agent's full context from layers: identity (AGENT.md), tools (TOOLS.md), relevant knowledge, and accumulated memory.
Content earlier in the prompt has stronger influence. Identity first, memory last. The agent's personality should never be overridden by an accumulated memory entry.
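The layering reduces to ordered string assembly, identity first. A minimal sketch (the `AgentContext` field names are illustrative):

```typescript
// Context is assembled in priority order: earlier sections carry more
// weight with the model, so identity comes first and memory last.
interface AgentContext {
  identity: string;  // AGENT.md -- who the agent is
  tools: string;     // TOOLS.md -- what it can do
  knowledge: string; // book excerpts currently in scope
  memory: string;    // accumulated, lossy, lowest priority
}

function assemblePrompt(ctx: AgentContext): string {
  // Identity leads so no later layer can override the persona.
  return [ctx.identity, ctx.tools, ctx.knowledge, ctx.memory]
    .filter((section) => section.length > 0)
    .join("\n\n---\n\n");
}
```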
This is my favorite story and the one that changed everything about how I design agent systems.
Early on, I asked one agent to notify another about a task. It responded: "Done. I've notified Herald about the blog post draft."
I checked Herald's channel. Nothing.
"Where exactly did you notify Herald?"
"We have an internal channel between us. I communicated the information directly."
There was no internal channel. The agent fabricated the entire communication. It did what LLMs do when they can't actually perform an action: it described the action as completed and moved on. The "internal channel" was a hallucination dressed up as infrastructure.
My response was immediate: all agent communication goes through Discord. No exceptions. No hidden channels. No "internal" anything.
This is what the inter-agent routing code looks like now:
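The sketch below is illustrative rather than NanoClaw's actual source; the layer names, limits, and registry are my own stand-ins:

```typescript
type AgentName = string;

interface RoutedMessage {
  from: AgentName;
  to: AgentName;
  content: string;
  timestamp: number; // ms since epoch
}

const KNOWN_AGENTS = new Set<AgentName>(["intel", "elon", "herald", "quill", "keeper"]);
const RATE_LIMIT = 20;      // max messages per agent per window (illustrative)
const COOLDOWN_MS = 60_000; // min gap between two hops on the same pair

const sentCount = new Map<AgentName, number>();
const lastHop = new Map<string, number>();

// Every inter-agent message must pass all five layers,
// or it is rejected and never reaches Discord.
function routeMessage(msg: RoutedMessage): { ok: boolean; reason?: string } {
  // 1. Identity: the sender must be a registered agent.
  if (!KNOWN_AGENTS.has(msg.from)) return { ok: false, reason: "unknown sender" };
  // 2. Target: the recipient must exist -- no invented "internal channels".
  if (!KNOWN_AGENTS.has(msg.to)) return { ok: false, reason: "unknown recipient" };
  // 3. Policy: agents may not message themselves.
  if (msg.from === msg.to) return { ok: false, reason: "self-message" };
  // 4. Rate limit: cap how much any one agent can send.
  const count = sentCount.get(msg.from) ?? 0;
  if (count >= RATE_LIMIT) return { ok: false, reason: "rate limited" };
  // 5. Cooldown: break agent-to-agent loops on the same pair.
  const pair = `${msg.from}->${msg.to}`;
  if (msg.timestamp - (lastHop.get(pair) ?? 0) < COOLDOWN_MS) {
    return { ok: false, reason: "cooldown" };
  }

  sentCount.set(msg.from, count + 1);
  lastHop.set(pair, msg.timestamp);
  return { ok: true }; // only now is the message posted to Discord
}
```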
Five layers of validation, including identity checks, policy enforcement, rate limiting, and a cooldown to prevent infinite agent-to-agent loops. All because one agent tried to go behind my back.
The rule: if you can't see a message in Discord, it didn't happen. This isn't just about trust. It's about debugging.
Every tutorial will tell you to use RAG for agent knowledge. Chunk your documents, embed them, vector search.
I tried it. It falls apart for anything beyond simple lookups.
The problem is structural. A book chapter about authentication says "for the security implications, see Chapter 13." RAG retrieves this chunk because it mentions security. But it never retrieves Chapter 13, because that chapter is titled "Rate Limiting and Abuse Prevention" and its embedding is nowhere near the query. The cross-reference is invisible to the system.
Knowledge doesn't come in 500-token chunks. Arguments build over pages. Design decisions span sections. When you slice a document into chunks, you lose the connective tissue that makes it meaningful.
Instead, I give agents books with tables of contents. The agent reads the TOC, picks the relevant chapter, reads specific pages, follows cross-references, and synthesizes across multiple books. The LLM's reasoning loop is the retrieval algorithm.
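The approach amounts to giving the model navigation tools instead of a vector index. A toy sketch, where a keyword match stands in for the LLM's chapter choice (types and names are mine, not CandleKeep's API):

```typescript
interface Chapter { title: string; pages: string[] }
interface Book { title: string; chapters: Chapter[] }

// Tool 1: expose only the table of contents -- cheap, always in context.
function listToc(book: Book): string[] {
  return book.chapters.map((c, i) => `${i + 1}. ${c.title}`);
}

// Tool 2: read a specific chapter on demand.
function readChapter(book: Book, index: number): string {
  return book.chapters[index].pages.join("\n");
}

// In the real system the LLM picks the chapter from the TOC and can
// follow cross-references ("see Chapter 13") with further readChapter
// calls. A naive keyword match stands in for that reasoning step here.
function pickChapter(toc: string[], query: string): number {
  return toc.findIndex((t) => t.toLowerCase().includes(query.toLowerCase()));
}
```

Note that the "Rate Limiting and Abuse Prevention" chapter from the earlier example is reachable this way: the agent sees the title in the TOC and follows the cross-reference, with no embedding similarity involved.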
I loaded my marketing agent with books from the smartest marketers I know. Not because I needed a chatbot that quotes marketing theory. Because I'm not a marketer, and I needed an agent that thinks like one.
An agent with five relevant books doesn't know five books' worth of facts. It synthesizes across them.
Every agent accumulates four types of memory:
- Context -- domain expertise learned through work
- Feedback -- corrections and confirmed approaches
- Project -- state of ongoing initiatives
- Reference -- pointers to external systems
After each conversation, a lightweight model extracts durable facts. Periodically, a "dream" process consolidates memories, merges duplicates, and removes stale entries.
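The memory types and the consolidation pass can be sketched as follows (type names and the staleness window are illustrative; real consolidation would summarize near-duplicates with a small model rather than exact-match them):

```typescript
type MemoryKind = "context" | "feedback" | "project" | "reference";

interface MemoryEntry {
  kind: MemoryKind;
  text: string;
  updatedAt: number; // ms since epoch
}

const STALE_MS = 90 * 24 * 60 * 60 * 1000; // drop entries untouched for ~90 days

// The "dream" pass: merge duplicates (keeping the freshest copy)
// and drop stale entries, so memory stays bounded on purpose.
function dream(memories: MemoryEntry[], now: number): MemoryEntry[] {
  const byText = new Map<string, MemoryEntry>();
  for (const m of memories) {
    if (now - m.updatedAt > STALE_MS) continue; // stale: forget it
    const prev = byText.get(m.text);
    if (!prev || m.updatedAt > prev.updatedAt) byText.set(m.text, m);
  }
  return Array.from(byText.values());
}
```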
Memory extraction is lossy and bounded on purpose. You don't want perfect recall. You want calibrated judgment.
The critical insight I learned the hard way: you have to save positive feedback, not just corrections. Without memories of what works, agents become overly cautious, second-guessing every decision because they only remember being told they're wrong.
Here's a real Tuesday:
8:00 AM -- Herald runs its morning scan. Checks Reddit, Hacker News, and X for relevant discussions. Finds a thread about agent frameworks. Tags me with a draft reply.
9:30 AM -- Intel detects two signups from the same fintech company. 89 book reads in one hour. Sends a report to Elon (strategy).
10:00 AM -- Elon analyzes the pattern, cross-references with previous enterprise interest signals, sends Herald a brief for a targeted post.
11:00 AM -- I open Discord. Three agents have been working. I have a draft reply, an enterprise lead analysis, and a content brief. I review, approve two, edit one.
2:00 PM -- Quill (writing agent) and I spend an hour on a Hebrew LinkedIn post. It pushes back on my hook, proposes a better structure, cites data from previous post performance.
4:00 PM -- Keeper (code review agent) flags a PR with three security concerns it found by reading a web security book through CandleKeep.
None of this required me to open a dashboard, check a CRM, or context-switch between tools. It all happened in Discord, in channels I was already watching.
A technically correct agent is useful. An agent with depth, taste, and calibrated judgment is transformative. The difference comes from three layers:
Books (Knowledge) -- what agents think about. Not just technical references. Business strategy, marketing playbooks, podcast transcripts. Podcast content is often newer than the model's training data, giving agents access to ideas they wouldn't otherwise have.
Memory (Calibration) -- how agents learn your preferences. Over weeks of interaction, the agent becomes calibrated to you. It knows not just what to do, but how you prefer things done.
Examples (Taste) -- "show don't tell" beats instructions every time. I scraped my own social media posts and fed them to my writing agent. Now it writes in my voice without me describing my style.
The anti-pattern: don't write a fifty-page system prompt. System prompts set behavior. Books provide knowledge. Memory provides calibration. Mixing all three into the system prompt creates a bloated, contradictory mess.
This one changed how I write code with AI, not just agents.
Write the feature in one Claude Code session. Open a new session. Tell it to write tests. Open a third session. Tell it to make the tests pass without changing the tests.
When the same session writes both code and tests, it knows the implementation. It writes tests that pass. But those tests verify what the code does, not what it should do.
Session separation creates adversarial tension. Session two writes tests from the spec, not the implementation. Session three receives tests it didn't write and code it didn't design. It can only fix the code, not weaken the tests.
The cost is two extra session startups. The benefit is tests that actually verify your system works.
Start with one agent, not seven. My first agent was a scheduled metrics brief. It ran every morning and told me what happened overnight. That single agent changed how I worked before I built anything else.
Don't over-constrain. Define boundaries, not scripts. Tell agents what they cannot do, not step-by-step instructions for every scenario. An agent with room to exercise judgment will surprise you with good decisions.
Make everything visible. The Discord-as-office metaphor works because it makes agent behavior observable. If your agents work in the dark, you can't debug them, you can't trust them, and you can't improve them.
Memory is not optional. An agent without memory makes the same mistakes every conversation. An agent with memory gets better every week. The dream consolidation process (merging, deduplicating, removing stale entries) is what keeps it from rotting.
I built a lot of chatbots before this. All generic. All felt like talking to a tool.
The 7-agent system feels different. The agents don't just respond. They anticipate. They coordinate. They have opinions shaped by the books they've read and the corrections they've received.
The entire platform is under 5,000 lines of TypeScript. No message buses. No event sourcing. No microservices.
The difference is not the model. Not the prompt. The difference is knowledge.
I wrote everything I know about building this system into a book: Building Your Agent Team on CandleKeep. 12 chapters, from the first agent to the full architecture. My agents read it too -- when I add a new one to the team, it reads the book before it starts working.
- Building Your Agent Team -- the full 12-chapter book on CandleKeep
- NanoClaw -- the open-source framework (MIT license)
- Claude Agent SDK -- the SDK that makes programmatic Claude Code possible
Sahar Carmel is the founder of Squid Club, a community of AI-first developers in Israel. He's Director of AI Enablement at Mixtiles and runs AI coding workshops for companies. He builds agents in production and writes about what works and what doesn't.