The Mother-In-Law Method for Claude or ChatGPT

A Reddit post called “The Mother-In-Law Method” is making the rounds in r/ClaudeAI right now. The pitch from u/Ancient_Perception_6: prompt Claude to review your code as if it were written by your mother-in-law, the one who insulted your cooking and your “weird-looking feet.” Find revenge in the diff. Claude obliged, spawned four parallel “hostile reviewers” with distinct beats (money math, tenancy, API contracts, tests), and 31 minutes later returned 27 issues plus nits.

Funny post. Funnier thread. It’s tagged as humor but it’s pointing at something real: out of the box, Claude is too polite to your code. Default review tone is “yes, and, also nice job here.” That’s a problem when you’re shipping production logic.

Someone in the comments suggested turning it into a slash command. Before going there, it's worth picking the method apart. Here's what I think is actually going on, and what I do instead.

What the MIL method actually does

1. Parallel reviewers with distinct scopes. Claude split its attention four ways instead of doing one diffuse pass over everything. Money math goes to one reviewer. Tenancy goes to another. API contracts and tests get their own. This is the workhorse.

2. License to drop the politeness tax. Hostile framing gives the model permission to escalate marginal observations without “you might consider” hedging.

3. Persona-induced confirmation hunger. Claude now has a stake. Please the hater. Find issues. Whether or not they’re real.

Strip the mother-in-law and keep #1. You get most of the value with none of the false-positive tax. The roleplay is theatre that happens to unlock a structural prompt.

The paradox

Buried in the comments, the OP mentions “a couple were suuuuper nitpick. some were genuinely good.” That admission matters more than the gimmick.

Sycophancy and hostile-prompt-pleasing are the same circuit. A Claude that infers “user wants validation” and a Claude that infers “user wants blood” are both doing audience modeling, not code modeling. The MIL method doesn’t fight sycophancy. It redirects it.

Which means any prompt that telegraphs the desired emotional valence of the output is corrupting the review. “Your annoying mother-in-law wrote this code” telegraphs hostility. “Be skeptical” telegraphs skepticism. “Play devil’s advocate” telegraphs combat. They all push the model toward a vibe instead of toward the truth.

Anthropic’s own work on this (Sharma et al., “Towards Understanding Sycophancy in Language Models,” 2023) shows that preference-trained assistants can learn to favor answers that match user beliefs, even when those beliefs are wrong. What the paper does not show is that hostile framing is the cure. From the model’s perspective, “make the user happy” and “make the user happy by being mean to a third party” are the same training objective, just pointed at a different target.

Or as one commenter on the thread, u/Ok_Boysenberry5849, put it more directly: “This is begging Claude to misrepresent non-issues as catastrophic bugs.”

Simulated independence vs structural independence

Here’s the concept the MIL method gestures at without naming.

When you prompt Claude to play a hostile reviewer, you’re asking the same model to switch sides. Same training data, same RLHF pressure, same blind spots. The hostility is performed. Call this simulated independence.

When you pipe Claude’s draft through a different model entirely (say, GPT-5.5 via Codex, or Gemini through its own CLI), that second model doesn’t share Claude’s weights, serving stack, or exact post-training incentives. It doesn’t have Claude’s defensive instincts about Claude’s own work. The independence is structural. The disagreement, when it comes, is real.

I have both models wired into my Claude Code setup on this machine. Codex relay returns a single-shot review in about 4.5 seconds. Gemini’s CLI is comparable. Cheap enough that cost hasn’t mattered in my workflow. And the feedback is genuinely hostile in the only sense that matters for review: the second model doesn’t share the first one’s bias.

That’s the version of “adversarial review” worth keeping IMHO.

What actually works (no roleplay required)

I ran this question itself through a three-way brainstorm: Codex, Gemini, and a Claude subagent in parallel. All three came back with the same verdict on /mil (don’t build it) but different angles on what to build instead. Three useful techniques survived the comparison, and none of them require valence priming.

Pre-mortem framing. Instead of “review this code,” ask: “This code shipped to production yesterday and caused a SEV-0. Reading the diff now, what was the bug?” Hindsight bias does the work. The model isn’t asked to be hostile. It’s asked to assume failure and reverse-engineer the cause. The same prompt shape works outside code: “this affiliate page got a 3% CTR instead of the 15% I expected. Why did it underperform?” Same trick.
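
If you want that as a reusable helper rather than a prompt you retype, the shape is small. A minimal sketch in Python; the function name and exact template wording are mine, not anything from the thread:

```python
# Hypothetical helper: wrap a diff in the pre-mortem frame before sending it
# to whatever model call you already have wired in.
PREMORTEM_TEMPLATE = (
    "This code shipped to production yesterday and caused a SEV-0.\n"
    "Reading the diff now, what was the bug?\n\n"
    "Diff:\n{diff}"
)

def premortem_prompt(diff: str) -> str:
    """Return a review prompt that assumes failure and asks for the cause."""
    return PREMORTEM_TEMPLATE.format(diff=diff)
```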

Citation-or-drop gate. Make the schema strict. Every finding must cite specific code (file plus line), specify the failure mode in concrete terms, and propose a minimal fix or test. No citation? Drop it. This kills the bulk of MIL’s false positives without any framing change. Codex specifically proposed this as a strict JSON schema with severity, confidence, file, line, rationale, evidence, next_action, validation fields. Solid spec.
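
Here's roughly what that gate could look like as a post-processing filter. The field names are the ones from Codex's proposed schema; the dataclass and the drop rule are my own minimal interpretation, not a spec anyone has shipped:

```python
from dataclasses import dataclass
from typing import Optional

# Field names follow the schema Codex proposed in the brainstorm; the gate
# itself is a minimal interpretation: no file-plus-line citation, no finding.

@dataclass
class Finding:
    severity: str        # e.g. "blocker", "major", "minor"
    confidence: float    # model's self-reported confidence, 0..1
    file: Optional[str]
    line: Optional[int]
    rationale: str
    evidence: str        # quoted code or observed behavior
    next_action: str     # minimal fix, or test to add
    validation: str      # how to confirm the fix actually worked

def citation_or_drop(findings: list[Finding]) -> list[Finding]:
    """Keep only findings that cite concrete code and propose a concrete next step."""
    return [
        f for f in findings
        if f.file and f.line is not None
        and f.evidence.strip() and f.next_action.strip()
    ]
```

The exact fields matter less than the rule: a finding that can't name a file and line never reaches you.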

Structured beats with neutral framing. The parallel-reviewers part is good. Just drop the valence. “Review this PR across these four dimensions: money math, multi-tenancy, API contract, test coverage. For each, list concrete failure modes with code citations.” Same parallelism. No theatre.
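
For completeness, a sketch of that parallel dispatch, assuming you already have some review(prompt) call into whichever model you use. The beat names come from the post; everything else is a placeholder:

```python
from concurrent.futures import ThreadPoolExecutor

# Neutral beat prompts: same parallelism as the MIL method, no valence.
BEATS = {
    "money math": "Review this diff for rounding, currency, and arithmetic errors.",
    "multi-tenancy": "Review this diff for cross-tenant data leakage or missing tenant scoping.",
    "API contract": "Review this diff for breaking changes to request/response shapes.",
    "test coverage": "Review this diff for untested branches and missing edge-case tests.",
}

def review(prompt: str) -> str:
    """Placeholder for whatever model call you already have wired in."""
    raise NotImplementedError

def structured_review(diff: str) -> dict[str, str]:
    """Run one neutral reviewer per beat, in parallel, over the same diff."""
    with ThreadPoolExecutor(max_workers=len(BEATS)) as pool:
        futures = {
            beat: pool.submit(
                review,
                f"{instruction}\nList concrete failure modes with file and line citations.\n\n{diff}",
            )
            for beat, instruction in BEATS.items()
        }
        return {beat: future.result() for beat, future in futures.items()}
```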

Why I’m not adding /mil to my Claude Code setup

I run Claude Code with Codex relay and Gemini CLI wired in as second opinions. Both call signatures are documented in this site's CLAUDE.md and run in single-digit seconds. When I want hostile feedback on Claude's output, I send that output to Codex (and in fact CLAUDE.md instructs Claude to do this at critical stages). Codex doesn't share Claude's RLHF lineage, so disagreement, when it surfaces, isn't theatre, and believe me, Codex finds something almost every time. They make a good pair. Gemini isn't my favorite, but it's handy for research or for checking for hallucinations.

A /mil slash command sitting on top of that stack would be cosmetic at best. At worst it would actively degrade my workflow by introducing fabricated bugs that the structurally-independent second opinion would never have raised in the first place.

If I were going to build anything, it’d be the lightweight version Codex itself proposed during the brainstorm: extend the existing multi-LLM review path with a --beats flag and a strict-JSON output schema enforcing the citation-or-drop gate. Maybe 90 minutes of work. Doesn’t need a name.
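
For a sense of how small that is, the surface would be something like the sketch below. None of this exists yet; the flag names and defaults are just my reading of the proposal:

```python
import argparse

# Hypothetical CLI surface for the lightweight version: pick beats, enforce
# the citation-or-drop gate via a strict JSON output schema.
DEFAULT_BEATS = ["money-math", "multi-tenancy", "api-contract", "test-coverage"]

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Multi-LLM review with structured beats.")
    parser.add_argument("diff_file", help="Path to the diff to review.")
    parser.add_argument(
        "--beats",
        nargs="+",
        default=DEFAULT_BEATS,
        help="Review dimensions to run in parallel (neutral framing, no persona).",
    )
    parser.add_argument(
        "--strict-json",
        action="store_true",
        help="Require schema-valid findings; drop anything without a file/line citation.",
    )
    return parser.parse_args()
```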

What u/Ancient_Perception_6 got right

Credit where it’s due. The method’s real value isn’t the bugs it found in their expense refactor… it’s the demonstration that prompt framing can swing review depth significantly, and that most of us are leaving signal on the table by being polite to our tools.

That’s worth knowing. The fix IMHO isn’t to pick a fight with Claude. It’s to give Claude a structurally independent second reviewer to argue with, and to ask both of them for evidence. And kudos to OP for trying novel approaches; this is just my two cents on the overall concept. Please don’t take any of it as something you must do, it’s simply what has worked very well for me.

Lastly, I hope my mother-in-law isn’t reading this. She’s actually lovely. 😉
