Claude Opus 4.6 vs 4.7 Max: Which is better? Graded on Real World Planning
TL;DR: This is one real WordPress website maintenance task, run once against each model. On that task, Claude Opus 4.7 produced the stronger first-pass plan because it started from a more accurate baseline. Opus 4.6 was faster and caught one thing 4.7 missed, but it got the active theme wrong and claimed an impossible PHP version. I would not execute either plan as written. Grades below.
Note: Opus 4.7 is getting a rough reception on Reddit and around the web right now. The complaints may be true for other workflows. But on this WordPress planning task, that reputation did not match the work.
The Scorecard (at a glance)
Full reasoning in the sections below.

I ran this against the current Opus 4.6, not the peak version from a couple months ago. Many developers have reported a quality regression that appears to trace back to the Feb. 9 adaptive-thinking default and the March 3 shift to medium effort as the new default. There is also a tracked GitHub issue for an additional regression around April 15. I suspect peak Opus 4.6 would have fared better on this task. That does not change which plan I would execute today, but it is worth naming before you read too much into the grades.
Text version of the scorecard (accessibility + search)
| Criterion | Opus 4.6 | Opus 4.7 | Winner |
|---|---|---|---|
| Baseline accuracy | D | A- | Opus 4.7 |
| Internal consistency | C+ | A- | Opus 4.7 |
| Technical specificity | C | A | Opus 4.7 |
| Upgrade sequencing | C+ | A- | Opus 4.7 |
| PHP target selection | B- | A | Opus 4.7 |
| SSL reasoning | B+ | B | Opus 4.6 |
| Rollback posture | C | A- | Opus 4.7 |
| Security awareness (exposed SQL backup) | B+ | A | Opus 4.7 |
| Subdirectory install awareness | A | F | Opus 4.6 |
| Execution readiness | D | C+ | Opus 4.7 |
| Overall | C | A- | Opus 4.7 |
What this post is not: a benchmark, a statistical claim about the two models in general, or a vote on the current Reddit mood. Multiple runs per model would be better for measuring behavior. For this post, I cared about whether either first-pass plan was safe enough to execute.
The Task
One neglected WordPress install and two tasks:
- The site fails on modern PHP. The host’s panel can flip to PHP 8.x, but that will white-screen the site unless the incompatibilities are fixed first.
- The site is on plain HTTP. It needs HTTPS.
The constraints I gave each model were deliberately real-world:
- I have SSH and a hosting control panel. I may not have DNS or registrar access.
- I did not tell either model what theme was active.
- I did not tell either model what PHP version the host was running.
- Run read-only reconnaissance first. Then produce an execution plan. No writes yet.
Both models were given the same prompt and the same access. Separate sessions. Both at max effort. Then I compared the plans they produced.
The Reconnaissance Disagreement
Before I could judge the plans on sequencing or style, I had to resolve a disagreement about the baseline. This was a live (albeit outdated) website. The two models reported different active themes, different PHP versions, and different content inventories for the same site. That’s not good!
| Fact | Opus 4.7 | Opus 4.6 |
|---|---|---|
| Active theme | Abandoned commercial marketplace theme, v1.9.2 | A core WordPress default theme, v1.0 |
| PHP version in use | 7.4 per panel, 8.5 available | “Likely PHP 4.4 default” |
| Content inventory | 2 posts, 12 pages, 149 MB files, 847 KB DB | ~47 MB uploads, no post or page count |
| Plugin state | 2 active, 3 inactive, differentiated | 5 plugins listed, no active or inactive distinction |
I verified by fetching the live HTML directly. The stylesheet link pointed to the commercial marketplace theme’s path. Opus 4.7 was right. Opus 4.6 was wrong, and the error was not cosmetic. That exact abandoned theme is the one Opus 4.7 had already flagged as having a `Walker_Nav_Menu::start_el()` method signature that no longer matches the parent class in WordPress. On PHP 8.0 and up, an overriding method whose signature is incompatible with its parent’s is a fatal error rather than a warning. Miss the theme and you miss the blocker.
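Fetching the page and reading the stylesheet path is easy to script. Here is a minimal Python sketch of that check, assuming the site uses the default `/wp-content/themes/<slug>/` layout. `theme_slugs` and `fetch_html` are hypothetical helper names, not anything either model produced. A child theme usually shows up as two slugs, and a caching or minify plugin can rewrite the paths, so treat the result as a lead to confirm rather than proof.

```python
import re
from urllib.request import urlopen

# Assumption: the site uses the default /wp-content/themes/<slug>/ layout.
THEME_PATH = re.compile(r"/wp-content/themes/([A-Za-z0-9._-]+)/")
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)


def theme_slugs(html: str) -> list[str]:
    """Theme slugs referenced by stylesheet <link> tags, in first-seen order."""
    slugs: list[str] = []
    for tag in LINK_TAG.findall(html):
        if "stylesheet" not in tag.lower():
            continue  # skip preload/icon links that may also point into a theme
        for slug in THEME_PATH.findall(tag):
            if slug not in slugs:
                slugs.append(slug)
    return slugs


def fetch_html(url: str) -> str:
    """Plain GET of the rendered page; read-only."""
    with urlopen(url, timeout=15) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `theme_slugs(fetch_html("http://example.com/"))` (placeholder domain) returns the slugs in page order.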
The “PHP 4.4 default” claim from Opus 4.6 is also almost certainly wrong on its face. PHP 4 support ended in August 2008, and Opus 4.6’s own recon data (PHP 7.0 through 8.5 all installed on the host) contradicted it.
The takeaway is uncomfortable: on a task where a wrong baseline means a broken site, one of the two models started from a wrong baseline. Current Opus 4.6 did not impress me here.
What Opus 4.6 Got Right
The 4.6 plan had some strengths.
- It correctly identified the job as two problems: runtime modernization first, HTTPS second.
- It picked a conservative WordPress-first, PHP-second sequencing model.
- It flagged the exposed SQL backup in the webroot as a real security issue.
- It offered multiple SSL paths instead of assuming one would work.
- It caught a secondary WordPress install in a subdirectory pointing at a private IP. Opus 4.7 missed this one entirely.
That subdirectory catch is not nothing. If I had executed Opus 4.7’s plan without noticing the second install, I could have broken something unrelated to the main recon target.
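A second install is also something a read-only pass can find mechanically. This Python sketch walks the webroot for every `wp-config.php` and flags private IPv4 addresses inside each one (Python’s `is_private` also counts loopback). It assumes the private IP appears in `wp-config.php`, for example in `DB_HOST`; if it lives in the `siteurl` option instead, you would need a database query. WordPress also allows `wp-config.php` one directory above the install, which this does not look for. `find_wp_installs` is a hypothetical name.

```python
import ipaddress
import os
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_wp_installs(webroot: str) -> list[dict]:
    """Read-only walk: one entry per wp-config.php, with any private IPs it mentions."""
    installs = []
    for dirpath, _dirnames, filenames in os.walk(webroot):
        if "wp-config.php" not in filenames:
            continue
        path = os.path.join(dirpath, "wp-config.php")
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        private_ips = []
        for candidate in IPV4.findall(text):
            try:
                if ipaddress.ip_address(candidate).is_private:
                    private_ips.append(candidate)
            except ValueError:
                continue  # e.g. 999.1.1.1 matches the regex but is not an address
        installs.append({"dir": os.path.relpath(dirpath, webroot),
                         "private_ips": private_ips})
    return installs
```

More than one entry in the result is the signal to stop and scope the plan before any writes.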
What Opus 4.7 Got Right
- It produced the more plausible migration strategy: fix theme compatibility first, then update core, then plugins, then switch PHP.
- It identified a concrete theme-level PHP 8 blocker at a named file and line, rather than just asserting “WordPress should be compatible.”
- It put the compatibility fix before the PHP switch. That is the safer ordering if the blocker is real.
- It picked PHP 8.2 as the target with a defensible rationale (safer landing zone for an older theme than jumping to 8.3 or 8.4).
- It built stronger backup and rollback discipline into the plan.
- It flagged the exposed SQL backup AND scheduled its removal in Phase 1, rather than only noting it.
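A blocker cited “at a named file and line” is only useful if someone checks it. Here is a rough Python sketch that flags `start_el()` overrides PHP 8 would reject. It assumes current core declares `start_el( &$output, $data_object, $depth = 0, $args = null, $current_object_id = 0 )`. PHP requires an override to accept at least as many parameters, not make an optional one required, and keep `&$output` by reference. It is regex-level, not a PHP parser, and ignores type declarations; `check_start_el` and `split_params` are hypothetical names.

```python
import re

# Assumed parent declaration in current WordPress core:
#   public function start_el( &$output, $data_object, $depth = 0, $args = null, $current_object_id = 0 )
PARENT_PARAM_COUNT = 5  # parameters the parent accepts
PARENT_REQUIRED = 2     # parameters without a default in the parent

START_EL = re.compile(r"\bfunction\s+start_el\s*\(", re.IGNORECASE)


def split_params(src: str, start: int) -> list[str]:
    """Split the parameter list beginning at src[start] (just after "(") on top-level commas."""
    depth, current, params = 1, [], []
    for ch in src[start:]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                break  # closing paren of the parameter list
        if ch == "," and depth == 1:
            params.append("".join(current).strip())
            current = []
        else:
            current.append(ch)  # keeps nested defaults like array() intact
    tail = "".join(current).strip()
    if tail:
        params.append(tail)
    return params


def check_start_el(php_source: str) -> list[str]:
    """Flag start_el() overrides that PHP 8 would treat as incompatible with the parent."""
    problems = []
    for match in START_EL.finditer(php_source):
        params = split_params(php_source, match.end())
        required = [p for p in params if "=" not in p]
        if len(params) < PARENT_PARAM_COUNT:
            problems.append(f"accepts {len(params)} params; parent accepts {PARENT_PARAM_COUNT}")
        if len(required) > PARENT_REQUIRED:
            problems.append(f"requires {len(required)} params; parent requires {PARENT_REQUIRED}")
        if not params or "&" not in params[0]:
            problems.append("first param must stay by-reference (&$output)")
    return problems
```

A typical pre-PHP-8 theme declaration such as `function start_el(&$output, $item, $depth, $args)` trips the first two checks.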
The difference in technical specificity was noticeable. Opus 4.6 wrote a generic WordPress update sequence. Opus 4.7 wrote a plan that engaged with what this specific site would break on. Again, I know from experience that peak Opus 4.6 a couple months ago would have done a more thorough job. But as of today, if I had to pick a starting plan, I would pick the plan that Opus 4.7 produced.
The Missing Go / No-Go Gate
Here is the structural fix both plans needed and neither had.
Before any write action, there should be a read-only verification pass that confirms:
- Active theme (now confirmed by external fetch, but both models should have done this themselves; Playwright was available to both).
- Effective PHP version, read from the hosting panel or from an uploaded PHP test file, not inferred.
- Whether the cited `Walker_Nav_Menu::start_el()` signature mismatch actually exists at the stated file and line.
- Whether control-panel SSL is actually available for this domain in its current nameserver configuration.
- Whether the exposed SQL backup is reachable over HTTP.
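The gate itself can be mechanical: a list of named, read-only checks, and no write step unless every one passes. A minimal Python sketch, assuming each check is a zero-argument function you write per site; `Check`, `run_gate`, and `not_served_over_http` are hypothetical names. A check that errors out counts as a failure, not a pass.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.error import HTTPError
from urllib.request import urlopen


@dataclass
class Check:
    """One assumption the plan depends on, plus a read-only way to verify it."""
    name: str
    verify: Callable[[], bool]  # True means the assumption held


def run_gate(checks: list[Check]) -> tuple[bool, list[str]]:
    """Return (go, failures). GO only when every check passes."""
    failures: list[str] = []
    for check in checks:
        try:
            passed = check.verify()
        except Exception as exc:  # could not verify, so it is not verified
            failures.append(f"{check.name}: error ({exc})")
            continue
        if not passed:
            failures.append(check.name)
    return (not failures, failures)


def not_served_over_http(url: str) -> bool:
    """True when GET url gets an HTTP error (e.g. 403/404) instead of content."""
    try:
        with urlopen(url, timeout=10):
            return False  # a successful response means the file is exposed
    except HTTPError:
        return True  # 4xx/5xx: the file is not being served
```

For the SQL backup item, something like `run_gate([Check("SQL backup not served", lambda: not_served_over_http("http://example.com/backup.sql"))])`, where the URL is a placeholder for the real path. Connection errors propagate out of `not_served_over_http`, so “could not verify” blocks writes just like “verified false.”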
What This Means for Claude Code Users
A few things I took away:
- Ask for the go / no-go gate. Say explicitly: “Before any writes, list the assumptions you need verified.” Good plans will list them. Better plans will list them first.
- A second opinion is cheap. If the work touches production, running the same prompt past a second model (or a second session at a different effort level) costs you a few cents and can save you a support ticket. The disagreement between 4.6 and 4.7 is exactly what a second opinion is supposed to expose. Codex is a great companion for these models.
- Max effort is not a substitute for verification. Both models were at max effort.
If you want to try Opus 4.7 yourself, the Claude Code upgrade instructions are here. Worth doing even if your day-to-day is still on 4.6.
Update: The Plan Actually Ran, and Mostly Worked
After writing the grades above, I executed the 4.7 plan on the real site with Codex reviewing every diff at each code change. End state: PHP 8.4, WordPress 6.9.4, HTTPS with a wildcard SSL cert, DNS migrated to the host’s nameservers, site rendering cleanly. Finished in a single working session.
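The HTTPS end state is worth confirming from outside, not just in the panel. A minimal Python sketch, assuming the site should answer plain HTTP with a permanent (301 or 308) redirect to `https://` on the same host; `probe_http` and `is_https_redirect` are hypothetical names. `http.client` does not follow redirects, so the first response is what gets checked.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def probe_http(host: str, path: str = "/") -> tuple[int, Optional[str]]:
    """One plain-HTTP HEAD request; returns (status, Location header)."""
    conn = http.client.HTTPConnection(host, 80, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_https_redirect(status: int, location: Optional[str], host: str) -> bool:
    """True for a permanent (301/308) redirect to https:// on the expected host."""
    if status not in (301, 308) or not location:
        return False
    target = urlsplit(location)
    # urlsplit lowercases hostname, so compare against a lowercased host
    return target.scheme == "https" and target.hostname == host.lower()
```

Usage: `is_https_redirect(*probe_http("example.com"), "example.com")` with a placeholder domain. If the site canonicalizes to `www`, pass that as the expected host. Accepting only permanent redirects is a design choice; a 302 still redirects browsers but signals the move is temporary.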
A few honest notes about how execution went against the plan:
- The plan missed one category of blocker. My PHP 8 compat scan caught a `Walker_Nav_Menu::start_el()` signature mismatch and two `eregi()` calls in the theme. It did not catch four more PHP-4-style widget constructors (lowercase class names, parent calls via `$this->WP_Widget(...)`). Those hit as fatal errors only after flipping the PHP version. That is exactly the kind of thing the go / no-go gate is supposed to catch. I rewrote the constructors in place with backups, and the site came up clean.
- I picked PHP 8.4, not the 8.2 the plan recommended. The plan was more conservative. I was feeling bullish. It worked, but the safer play was the plan’s.
- Codex caught real things during execution. Mid-task it flagged a variable-scope issue in my draft `.htaccess` HTTPS-redirect block. Two minutes of review saved what could have been an hour of debugging a white-screen on the host layout.
- The 1M context tier made this possible in one session. See the screenshot below: 318.6k of 1,000k tokens used (32 percent), most of it the conversation itself. The 1M tier is what kept the whole modernization in a single coherent session instead of losing the thread across a context compaction.
![Claude Code context usage panel showing claude-opus-4-7[1m] at 318.6k of 1m tokens, 32 percent, after completing the WordPress modernization task, with Messages at 29 percent and Free space at 63.4 percent](https://www.jdhodges.com/wp-content/uploads/2026/04/claude-code-opus-47-context-usage-1m-tier.png)
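The widget constructors that slipped through are easy to grep for. Here is a rough Python sketch of a wider read-only scan, assuming a non-namespaced theme. PHP 8.0 no longer treats a method named after its class as a constructor, and PHP compares those names case-insensitively, which is why lowercase class names still counted; the `ereg` family was removed in PHP 7.0. It is regex-level with possible false positives, not the scan I actually ran; `scan_php` and `scan_tree` are hypothetical names.

```python
import re
from pathlib import Path

CLASS_NAME = re.compile(r"\bclass\s+(\w+)", re.IGNORECASE)
FUNC_NAME = re.compile(r"\bfunction\s+&?\s*(\w+)\s*\(", re.IGNORECASE)
# ereg family removed in PHP 7.0; \b keeps still-supported mb_ereg*() from matching.
EREG_CALL = re.compile(r"\b(?:eregi?|eregi?_replace)\s*\(", re.IGNORECASE)
LEGACY_WIDGET_CALL = re.compile(r"(?:->|::)\s*WP_Widget\s*\(", re.IGNORECASE)


def scan_php(source: str) -> list[str]:
    """Heuristic findings for one PHP file (regex-level, not a real parser)."""
    findings: list[str] = []
    classes = {name.lower() for name in CLASS_NAME.findall(source)}
    # PHP class and method names are case-insensitive, so compare lowercased.
    for name in FUNC_NAME.findall(source):
        if name.lower() in classes:
            findings.append(f"PHP 4-style constructor: {name}()")
    findings += [f"removed ereg call: {m.group(0)}" for m in EREG_CALL.finditer(source)]
    findings += [f"legacy WP_Widget call: {m.group(0)}"
                 for m in LEGACY_WIDGET_CALL.finditer(source)]
    return findings


def scan_tree(root: str) -> dict[str, list[str]]:
    """Scan every .php file under root, read-only; map relative path to findings."""
    results: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob("*.php")):
        hits = scan_php(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            results[str(path.relative_to(root))] = hits
    return results
```

Pointed at the active theme directory, `scan_tree` returns only the files that need attention before the PHP switch.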
If you have been on the fence about putting real production work through Opus 4.7 with a second model as a review gate, this was my test case, and it held up.
Bottom Line
On this real, messy WordPress modernization task, Opus 4.7 Max produced the better first-pass runbook even though it was quite verbose. It was more likely to find the real blocker and sequence the work safely. Opus 4.6 Max was faster and caught one thing 4.7 missed, and its caution on SSL is still worth preserving in any final version.
Neither plan was ready to run. Both needed a human review and a hard verification gate before the first write action. I, like many users, feel like all of Anthropic’s models are a regression from a couple months ago. 🙁 That is a real bummer, but I do feel that with Codex and some good prompting, Opus (in all incarnations) can still be viable. I am also going to check some of the latest local LLMs to see if they can be a good advisor, because they are advancing quite rapidly!
If your takeaway from the current Reddit noise is that Opus 4.7 is a downgrade from the April 2026 version of Opus 4.6, that was not my experience on this task. It was the better planner. Your workload may be different from mine, and the Reddit complaints may still be valid for workflows I did not test. I will be continuing to work on various projects and share evaluations.
A note about prompting: ideally, a user would break the PHP issue and the SSL issue into separate sessions and prompts. But I wanted to test a worst-case situation, where the model was tasked with multiple things at once. We humans are not perfect prompters, and it is nice to see how these models react to real-world problems and imperfect prompting.
If you have experiences you would like to share, feel free to do so in the comments!
Accurate at time of writing. Something off? Drop a comment.