Season 1 Episode 3 June 16, 2026 39:52

S01E03: How to Clone Your Brain — 3 Second-Brain Paradigms Tested Head-to-Head

Same corpus, same 7 questions, three architectures. Production RAG hallucinated. Gemini 1M-context aced the hardest question and ran out of budget on others. The /opt + Claude Code setup I already had won on faithfulness. Closes Season 1: Building My AI Twin.

Download MP3

Show Notes

The experiment

Same corpus, same seven hard synthesis questions, three competing architectures — a head-to-head test of how to actually clone a knowledge base. The punchline: a plain folder of markdown files navigated by Claude Code beat a production-grade RAG pipeline, 5 wins to 2. Total cost of the brain experiment: $4.30.

The three contenders

Production RAG (Ask CTAIO) — OpenAI text-embedding-3-small → sqlite-vec → gpt-4.1-mini. The enterprise playbook. Won 2 of 7.
Gemini 2.5 Pro long-context dump — 705,000 tokens pasted raw, no retrieval. Won 1 of 7 — the single hardest question every other system failed.
File-based + Claude Code — markdown files in /opt plus an agent with read and grep (Karpathy's “LLM wiki”). Won 5 of 7.

Three failure modes

RAG confabulates — it invented an ElevenLabs shutdown that exists nowhere, because semantically adjacent but factually disconnected chunks let the model bridge gaps from its pretraining.
Long-context exhausts its budget — Gemini burned its entire output-token allocation computing attention over 705k tokens and timed out before answering the hardest questions.
File-based is brittle but honest — a case-sensitive grep missed a heading on capitalization — then admitted it could not find the answer instead of hallucinating one.

Faithfulness vs fluency

The crux: basic read/grep tools mechanically enforce faithfulness — stick to the corpus, flag your limits — while a RAG pipeline's generative step optimizes for fluency at the cost of truth. For a knowledge system, a tool that can say “I don't know” beats one that sounds confident and is wrong.

The working-memory trap

A five-turn probe: turn 1 said “never include dollar figures,” turn 5 returned them anyway. The cause is a six-message rolling history cap — turn 1 was popped off the stack. Sfeir's “working-memory gap,” demonstrated reproducibly.

The economics

The full second-brain experiment — the seven-question battery plus the working-memory probe across all three systems — cost exactly $4.30 in API calls. That is the brain experiment only; the voice (EP01) and video-avatar (EP02) layers carried their own separate, larger costs.

Timestamps

00:00 — Intro
01:25 — The “digital Ferrari” trap
04:30 — The test: one corpus, seven hard questions
06:00 — Contender 1: production RAG (Ask CTAIO)
08:48 — Contender 2: Gemini 2.5 Pro long-context dump
10:59 — Contender 3: file-based + Claude Code (Karpathy)
14:24 — The scoreboard: markdown files win 5 of 7
18:53 — Failure 1: RAG confabulates an ElevenLabs shutdown
21:14 — Failure 2: Gemini exhausts its compute budget
23:10 — Failure 3: a case-sensitive grep misses a heading
28:03 — The working-memory trap: the 6-message window & Sfeir's gap
33:10 — Faithfulness vs fluency: why “I don't know” wins
35:48 — The real cost: $4.30 for the brain experiment only
38:57 — Outro & Season 2 preview

Links

Read the full article with data and comparison tables

About the podcast

Where can I subscribe to the CTAIO Labs Podcast?

Apple Podcasts, Spotify, YouTube, and direct RSS. Links are at the bottom of every episode page. The RSS feed lives at https://ctaio.dev/en/podcast/feed.xml — drop it into any podcast app that supports custom feeds.

How often do new episodes drop?

Roughly one episode per topic, paced to the underlying lab work. A lab series typically takes four to eight weeks; the podcast episode usually lands the same week as the written writeup. Subscribe in your podcast app of choice to get new episodes automatically.

Is the podcast a literal reading of the lab article?

No. The article is the reference document with tables, screenshots, and citations. The podcast is a conversation about the same material — what surprised us, what we got wrong on the first run, what we would test next. Many episodes include details that did not make the article.

Can I read the transcript instead of listening?

Yes. Each episode page includes the full transcript below the player. The transcript is searchable, so the podcast content is reachable through site search and external search engines.

Is there a paid tier or Patreon?

No paid podcast tier. The podcast and the labs are free. Income comes from the CTAIO newsletter and consulting work — not from podcast sponsorships and not from Patreon. If a future episode is sponsored, that will be disclosed at the top of the show notes.