Above the Code: Why I Wrote a Cybersecurity Book for AI Agents
A security researcher found vulnerabilities in my website. I used it as an experiment: plain AI agent vs. agent armed with 3,910 pages of security knowledge. The book-equipped agent found 8x more critical issues.

Someone messaged me on LinkedIn saying my website had a security vulnerability. I asked him not to tell me what it was.
A security researcher sent me a message after I published a post about the new Squid Club website. "I took a look out of curiosity, and I noticed you have a security vulnerability in your backend. The Swagger API is exposed, and the first endpoint lets anyone register as an admin."
My first instinct was to drop everything and fix it. But I stopped myself. I told him: don't tell me more. Let me diagnose it on my own, and then we'll compare notes.
Why? Because I'd been sitting on an idea for weeks, and this was the perfect test.
I'm not a security expert. I've been building AI products for ten years, and recently I built CandleKeep -- a library that gives AI agents access to real professional books. A few weeks earlier, I'd written the first book designed exclusively for AI agents -- a UI/UX guide with 170 rules synthesized from six leading design books. That book changed how my agents build interfaces. But security? A completely different domain.
So I ran a simple experiment. I pointed Claude Code at my website and asked it to run a security audit. Without any books, it found a few basic issues. Then I connected it to 10 professional security books through CandleKeep -- The Web Application Hacker's Handbook, the OWASP Testing Guide, Hacking APIs, and seven others -- and ran the same audit again.
The difference was staggering.
Without the books, the agent surfaced only a handful of basic issues. With the books, it found 15 security issues and remediated all of them. The fixes included HTTP-only cookies to prevent XSS token theft, proper role validation and IDOR protection, CSRF double-submit patterns, rate limiting against brute-force attacks, and security headers against common web attacks.
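To make the first of those fixes concrete, here is a minimal sketch of issuing a session token as an HTTP-only cookie so client-side JavaScript (and therefore an XSS payload) can never read it. The function name and cookie name are illustrative, not the actual project code:

```typescript
// Build a Set-Cookie header value for a session token.
// HttpOnly keeps it out of document.cookie; Secure restricts it to HTTPS;
// SameSite=Strict adds basic CSRF hardening.
function sessionCookie(token: string, maxAgeSeconds = 3600): string {
  const attrs = [
    `session=${encodeURIComponent(token)}`,
    `Max-Age=${maxAgeSeconds}`,
    "Path=/",
    "HttpOnly",        // invisible to client-side scripts
    "Secure",          // only sent over HTTPS
    "SameSite=Strict", // not sent on cross-site requests
  ];
  return attrs.join("; ");
}
```

In a real Next.js handler this string would be passed to `res.setHeader("Set-Cookie", ...)` or produced by a cookie library; the point is the three attributes, not the plumbing.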
The researcher's response when I shared the results: "I really love the books approach you bring. There are truly hidden treasures of knowledge."
I sat with this and asked myself: why was the difference so large?
The answer turned out to be simple, and it changed how I think about AI and security. Security knowledge doesn't live in the code. It lives above the code.
An agent can read every line of a project and still not see that the Swagger API is exposed, that there's no rate limiting on authentication endpoints, that password reset tokens are stored in plaintext. These aren't bugs you find by reading code line by line. They're patterns you recognize from experience -- from having seen thousands of vulnerable applications and knowing what to look for.
This is exactly the kind of knowledge that sits in professional books but not in training data. A language model has a vague, averaged impression of security best practices from its training corpus. A focused, rule-based reference distilled from the leading security texts in the field gives it something entirely different: specific, opinionated, actionable rules with clear severity ratings and code examples.
The distinction matters. Training data is like a friend who says "yeah yeah, I know about security" but gets vague when you ask a specific question. A curated book of rules is like having a senior penetration tester sitting next to you, pointing at your code and saying: "this line, right here, is a CRITICAL vulnerability, and here's exactly why."
I started looking for historical parallels to this problem -- experts failing not because they lack skill, but because there's too much to remember.
On October 30, 1935, the Boeing Model 299 -- the most advanced bomber ever built -- crashed on takeoff at Wright Field in Dayton, Ohio. The pilot, Major Ployer "Pete" Hill, the Army Air Corps' chief test pilot, forgot to release the elevator gust lock before takeoff. The plane climbed to 200 feet, stalled, and crashed. Hill and another crew member died from their injuries.
The plane wasn't too complex to fly. It was too complex to remember. The Army Air Corps' solution wasn't to find better pilots. It was to create the pilot's checklist -- a systematic protocol that ensured every critical step was verified before takeoff. The B-17, as the Model 299 became known, went on to become one of the most important aircraft of World War II, with over 12,500 manufactured.
NASA research later showed that checklists reduced hull-loss accidents by 76.3% during takeoff, approach, and landing -- the phases comprising only 27% of flight time but accounting for the vast majority of fatal accidents. Surgeon Atul Gawande adapted the concept for medicine, achieving a 36% reduction in surgical complications across eight hospitals.
The pattern is universal. When a domain becomes too complex for human memory alone, the answer isn't better humans. It's better checklists.
Cybersecurity is exactly this kind of domain. The numbers tell the story:
- 4.8 million cybersecurity positions are unfilled globally -- a 19% year-over-year increase
- 90% of cybersecurity teams report skills gaps beyond just staffing shortages
- AI is now the #1 most-needed skill in cybersecurity, cited by 41% of respondents -- surpassing cloud security for the first time
- Organizations with significant security staff shortages face breach costs that are, on average, $1.76 million higher
The problem isn't that there aren't good security experts. The problem is that there are too many things to remember. OWASP alone catalogs hundreds of vulnerability types. Each one has variants, edge cases, framework-specific manifestations, and evolving attack vectors. No single human -- and no AI agent operating from general training data alone -- can hold all of this in active working memory.
So I did what I did with the UI/UX book. I took 3,910 pages from 10 of the leading web security books in the field and wrote a new book -- this time, a comprehensive security reference designed exclusively for AI agents.
The result: 3,910 source pages compressed into 39. 238 rules across 32 chapters. Each rule includes a threat summary, a severity rating (CRITICAL/HIGH/MEDIUM/LOW), vulnerable code examples alongside secure alternatives, verification methods, and common mistakes.
The book covers everything from injection attacks and XSS to authentication flows, API security, cryptography, security headers, business logic vulnerabilities, supply chain risks, and compliance testing methodology. Every chapter is self-contained so an agent can read only what's relevant to its current task.
I wanted to test the impact in a measurable way. I took BoxyHQ's SaaS Starter Kit -- a popular open-source Next.js application with authentication, team management, API endpoints, and a Prisma-managed database -- and ran three independent security audits on the exact same codebase.
Audit 1: Claude Code's built-in security review
- 25 findings total
- 1 CRITICAL, 6 HIGH, 13 MEDIUM, 5 LOW
Audit 2: AI agent without the book
- 36 findings total
- 1 CRITICAL, 5 HIGH, 14 MEDIUM, 8 LOW
Audit 3: AI agent armed with the security book
- 34 findings total
- 8 CRITICAL, 9 HIGH, 10 MEDIUM, 7 LOW
8x more critical findings. Same codebase. Same model. The only difference: the book.
The book-equipped agent found several entire categories of vulnerabilities that the other two audits missed completely:
Password reset tokens stored in plaintext. The reset token was saved directly to the database without hashing. If an attacker gains read access to the database through SQL injection, a backup leak, or an insider threat, they can use any active reset token to take over any account. The book flagged this as a violation of Chapter 4, Rule 4: "Store hash, not token" and Chapter 29: "Reset tokens must be stored hashed."
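A minimal sketch of the "store hash, not token" fix, with a `Map` standing in for the database table (names are illustrative): the raw token goes only to the user's email, while the database holds its SHA-256 digest, so a leaked backup yields nothing directly usable.

```typescript
import { createHash, randomBytes } from "node:crypto";

// digest -> userId; stand-in for a password_reset_tokens table
const resetTokens = new Map<string, string>();

function sha256(value: string): string {
  return createHash("sha256").update(value).digest("hex");
}

// Generate a token, persist only its hash, return the raw value
// (which is emailed to the user and never stored).
function issueResetToken(userId: string): string {
  const raw = randomBytes(32).toString("hex");
  resetTokens.set(sha256(raw), userId);
  return raw;
}

// Look up by hashing the presented token, mirroring the write path.
function lookupResetToken(raw: string): string | undefined {
  return resetTokens.get(sha256(raw));
}
```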
Password reset race condition (TOCTOU). The token is validated, then the password is changed, and only then is the token deleted. This creates a time-of-check-to-time-of-use window: multiple concurrent requests can all use the same token before it's consumed. The book caught this under Chapter 18, Rule 3: "Mark single-use tokens as used BEFORE performing the associated action."
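The fix the rule prescribes can be sketched like this, again with a `Map` standing in for the token table. The token is consumed before the privileged action runs; in a real database this would be an atomic `DELETE ... RETURNING` or a guarded `UPDATE` inside a transaction, so concurrent requests cannot all pass the validity check:

```typescript
const activeTokens = new Map<string, string>(); // token -> userId

function resetPassword(token: string, newPassword: string): boolean {
  const userId = activeTokens.get(token);
  if (userId === undefined) return false; // unknown or already used
  activeTokens.delete(token);             // consume FIRST: single-use
  setPassword(userId, newPassword);       // only now do the privileged action
  return true;
}

// Stand-in for updating the user record.
function setPassword(userId: string, pw: string): void {
  /* elided */
}
```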
Feature flags that don't actually stop execution. When SSO and SCIM features are disabled via configuration, the handler calls res.status(404).json(...) but doesn't return. Execution continues to process the request anyway. A disabled feature that still processes requests. The book flagged this under Chapter 1, Rule 5: "Fail closed."
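The bug pattern is easy to reproduce in miniature. The `Res` type below is a tiny stand-in for the real Next.js response object; the one-line fix is the `return` after sending the 404:

```typescript
type Res = {
  statusCode: number;
  body?: unknown;
  status(code: number): Res;
  json(body: unknown): Res;
};

function handler(featureEnabled: boolean, res: Res): string {
  if (!featureEnabled) {
    res.status(404).json({ error: "Not Found" });
    return "skipped"; // the missing line: without it, execution falls through
  }
  return processRequest();
}

// Stand-in for the real request processing.
function processRequest(): string {
  return "processed";
}
```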
Link-based invitations that never expire. After a user accepts a link-based invitation, the invitation stays active. Unlike email invitations which are deleted after acceptance, link invitations can be reused indefinitely by anyone who has the URL.
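A hypothetical fix sketch: give link invitations both an expiry and single-use semantics, mirroring how the email invitations already behave. Field names and the TTL are illustrative:

```typescript
interface Invitation { token: string; teamId: string; createdAt: number }

const invitations = new Map<string, Invitation>();
const INVITE_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days, illustrative

function acceptInvitation(token: string, now = Date.now()): string | null {
  const inv = invitations.get(token);
  if (!inv) return null;                 // unknown or already used
  if (now - inv.createdAt > INVITE_TTL_MS) {
    invitations.delete(token);           // expired: purge it
    return null;
  }
  invitations.delete(token);             // consume on acceptance
  return inv.teamId;
}
```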
The plain agent and Claude Code's built-in review also miscalibrated severity on several findings. User enumeration -- where authentication endpoints reveal whether an email address is registered -- was rated MEDIUM by both. The book-equipped agent rated it CRITICAL, citing Chapter 4, Rule 2: "Return identical responses for all authentication failures."
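The rule the book cites boils down to one branch: whatever the actual cause of failure, the caller sees the same message (and, in a real handler, the same status code and a comparable timing profile). A minimal sketch, with plaintext passwords used only to keep the example short:

```typescript
const users = new Map<string, string>(); // email -> password (plaintext for the sketch only)
users.set("a@example.com", "hunter2");

function login(email: string, password: string): { ok: boolean; message: string } {
  const stored = users.get(email);
  const valid = stored !== undefined && stored === password;
  return valid
    ? { ok: true, message: "ok" }
    : { ok: false, message: "Invalid email or password" }; // never reveal which one
}
```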
These aren't obscure, theoretical vulnerabilities. They're the kind of issues that experienced penetration testers catch on day one of an engagement. They require pattern recognition from experience -- exactly what books provide and training data doesn't.
Major Pete Hill wasn't a bad pilot. He was one of the best in the world. But no pilot can remember everything. No AI agent can either.
The lesson from aviation, from medicine, and now from AI-assisted security auditing is the same: when complexity exceeds memory, you don't need more intelligence. You need systematic coverage. A checklist. A book of rules that ensures nothing gets missed.
The cybersecurity book for AI agents is available for free on CandleKeep.
Sources and further reading:
- A Brief History of the Checklist -- The 1935 Boeing Model 299 crash and the birth of the pilot's checklist
- Checklists: A Review of Their Origins, Benefits, and Current Uses as a Cognitive Aid in Medicine -- NASA research on checklist effectiveness in aviation
- Back to Basics: Checklists in Aviation and Healthcare -- How checklists reduced surgical complications by 36%
- 2025 ISC2 Cybersecurity Workforce Study -- The 4.8 million unfilled cybersecurity positions globally
- 2025 Cybersecurity Skills Gap Report (Fortinet) -- AI as the #1 most-needed skill in cybersecurity
- The First Book No Human Should Read -- The UI/UX book for agents that preceded this one
- BoxyHQ SaaS Starter Kit -- The open-source project used in the three-way audit
- CandleKeep -- The library for AI agents