S01E03: How to Clone Your Brain — 3 Second-Brain Paradigms Tested Head-to-Head
Same corpus, same 7 questions, three architectures. Production RAG hallucinated. Gemini's 1M-token context aced the hardest question and ran out of budget on others. The /opt + Claude Code setup I already had came out ahead on faithfulness. Closes Season 1: Building My AI Twin.
Show Notes
The experiment
Same corpus, same seven hard synthesis questions, three competing architectures — a head-to-head test of how to actually clone a knowledge base. The punchline: a plain folder of markdown files navigated by Claude Code beat a production-grade RAG pipeline, 5 wins to 2. Total cost of the brain experiment: $4.30.
The three contenders
- Production RAG (Ask CTAIO) — OpenAI text-embedding-3-small → sqlite-vec → gpt-4.1-mini. The enterprise playbook. Won 2 of 7.
- Gemini 2.5 Pro long-context dump — 705,000 tokens pasted raw, no retrieval. Won 1 of 7 — the single hardest question every other system failed.
- File-based + Claude Code — markdown files in /opt plus an agent with read and grep (Karpathy's “LLM wiki”). Won 5 of 7.
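For readers who have not built one, the retrieval step of the first contender can be sketched in a few lines. This is a toy illustration, not the Ask CTAIO code: the three-dimensional vectors, the chunk texts, and the function names are all made up, and plain Python stands in for text-embedding-3-small and sqlite-vec. The only point is that top-k retrieval ranks by vector similarity, so the generator receives the nearest chunks whether or not they actually contain the answer.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:k]]

# Hypothetical toy corpus: real embeddings have far more dimensions.
CHUNKS = [
    {"text": "Voice layer notes", "vec": [0.9, 0.1, 0.0]},
    {"text": "Vendor pricing notes", "vec": [0.7, 0.3, 0.1]},
    {"text": "Grocery list", "vec": [0.0, 0.1, 0.9]},
]

query = [0.8, 0.2, 0.0]           # stand-in for an embedded question
context = top_k(query, CHUNKS)    # what would be handed to the generator
print(context)
```

Both returned chunks are close to the query in vector space, but nothing in this step checks that they are about the same facts.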
Three failure modes
- RAG confabulates: it invented an ElevenLabs shutdown that appears nowhere in the corpus. Retrieval returned chunks that were semantically adjacent but factually unrelated, and the model bridged the gaps with its pretraining knowledge.
- Long-context exhausts its budget: Gemini used up its entire output-token allocation while working through the 705k-token prompt and timed out before answering several of the questions.
- File-based is brittle but honest: a case-sensitive grep missed a heading because of its capitalization, and the agent then admitted it could not find the answer instead of hallucinating one.
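The third failure mode is easy to reproduce. The sketch below is an assumption-laden stand-in, not Claude Code's actual grep tool: the note text, the heading, and the helper names are hypothetical. It shows a case-sensitive heading search missing on capitalization alone, and the "honest" fallback of reporting no match rather than inventing an answer.

```python
import re

NOT_FOUND = "I could not find that in the corpus."

def grep_headings(markdown_text, pattern, ignore_case=False):
    """Return markdown heading lines that contain the literal pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(re.escape(pattern), flags)
    return [
        line
        for line in markdown_text.splitlines()
        if line.startswith("#") and regex.search(line)
    ]

def answer(markdown_text, pattern, ignore_case=False):
    """Return the first matching heading, or admit that nothing was found."""
    hits = grep_headings(markdown_text, pattern, ignore_case)
    return hits[0] if hits else NOT_FOUND

# Hypothetical note: the heading is title-cased, the query is lower-case.
NOTE = "# Voice Cloning Costs\n\nSome body text.\n"

print(answer(NOTE, "voice cloning costs"))                    # case-sensitive search
print(answer(NOTE, "voice cloning costs", ignore_case=True))  # case-insensitive search
```

The first call misses and says so; the second finds the heading. The brittleness is a one-flag fix, while the refusal to guess is the behavior worth keeping.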
Faithfulness vs fluency
The crux: basic read/grep tools mechanically enforce faithfulness — stick to the corpus, flag your limits — while a RAG pipeline's generative step optimizes for fluency at the cost of truth. For a knowledge system, a tool that can say “I don't know” beats one that sounds confident and is wrong.
The working-memory trap
A five-turn probe: turn 1 said “never include dollar figures,” and turn 5 returned them anyway. The cause is a six-message rolling history cap: by turn 5, turn 1 had already been dropped from the window. This is Sfeir's “working-memory gap,” demonstrated reproducibly.
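The mechanics of the trap can be sketched with a bounded queue. This is an illustration under stated assumptions, not the production code: it assumes each turn adds one user message and one assistant message, that the cap counts both, and that the oldest message is dropped first. The turn texts other than the first and the function name are hypothetical.

```python
from collections import deque

HISTORY_CAP = 6  # the six-message rolling cap described in the episode

def visible_contexts(user_turns, cap=HISTORY_CAP):
    """For each turn, record the messages the model would see when replying."""
    history = deque(maxlen=cap)  # appending to a full deque drops the oldest item
    snapshots = []
    for text in user_turns:
        history.append(("user", text))
        snapshots.append(list(history))
        history.append(("assistant", "reply to: " + text))
    return snapshots

TURNS = [
    "never include dollar figures",  # the standing instruction from turn 1
    "question 2",
    "question 3",
    "question 4",
    "question 5",
]

snapshots = visible_contexts(TURNS)
rule = ("user", TURNS[0])

for turn_number, snapshot in enumerate(snapshots, start=1):
    print(turn_number, rule in snapshot)
```

Under these assumptions the instruction is still visible at turn 3 and gone from turn 4 onward, so by turn 5 the model has no record that it was ever told to omit dollar figures.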
The economics
The full second-brain experiment — the seven-question battery plus the working-memory probe across all three systems — cost exactly $4.30 in API calls. That is the brain experiment only; the voice (EP01) and video-avatar (EP02) layers carried their own separate, larger costs.
Timestamps
- 00:00 — Intro
- 01:25 — The “digital Ferrari” trap
- 04:30 — The test: one corpus, seven hard questions
- 06:00 — Contender 1: production RAG (Ask CTAIO)
- 08:48 — Contender 2: Gemini 2.5 Pro long-context dump
- 10:59 — Contender 3: file-based + Claude Code (Karpathy)
- 14:24 — The scoreboard: markdown files win 5 of 7
- 18:53 — Failure 1: RAG confabulates an ElevenLabs shutdown
- 21:14 — Failure 2: Gemini exhausts its compute budget
- 23:10 — Failure 3: a case-sensitive grep misses a heading
- 28:03 — The working-memory trap: the 6-message window & Sfeir's gap
- 33:10 — Faithfulness vs fluency: why “I don't know” wins
- 35:48 — The real cost: $4.30 for the brain experiment only
- 38:57 — Outro & Season 2 preview
Links
Transcript coming soon. Until then, the companion article covers everything we talked about in this episode.
About the podcast
Where can I subscribe to the CTAIO Labs Podcast?
Apple Podcasts, Spotify, YouTube, and direct RSS. Links are at the bottom of every episode page. The RSS feed lives at https://ctaio.dev/en/podcast/feed.xml — drop it into any podcast app that supports custom feeds.
How often do new episodes drop?
Roughly one episode per topic, paced to the underlying lab work. A lab series typically takes four to eight weeks; the podcast episode usually lands the same week as the lab writeup. Subscribe in your podcast app of choice to get new episodes automatically.
Is the podcast a literal reading of the lab article?
No. The article is the reference document with tables, screenshots, and citations. The podcast is a conversation about the same material — what surprised us, what we got wrong on the first run, what we would test next. Many episodes include details that did not make the article.
Can I read the transcript instead of listening?
Yes. Each episode page includes the full transcript below the player. The transcript is searchable, so the podcast content is reachable through site search and external search engines.
Is there a paid tier or Patreon?
No paid podcast tier. The podcast and the labs are free. Income comes from the CTAIO newsletter and consulting work — not from podcast sponsorships and not from Patreon. If a future episode is sponsored, that will be disclosed at the top of the show notes.