ThinkPad P14s AI 9 HX PRO 370 w/ 96 GB RAM & LLM benchmarks
Running Big LLMs on a Little Workstation: My Adventures with the ThinkPad P14s Gen 6 AMD
I’ve been experimenting with large language models (LLMs) lately, and I wanted to see how far I could push things using a (relatively) inexpensive laptop. Enter the ThinkPad P14s Gen 6 AMD—a slim mobile workstation that set me back about $1,600. On paper it’s not exactly a “supercomputer,” but with the right configuration (and my go-to tool, LM Studio), it turns out this little black box can handle some pretty big AI workloads. 🚀
🔧 The Specs That Matter
- CPU: AMD Ryzen™ AI 9 HX PRO 370 (12 cores / 24 threads, up to 5.10 GHz)
- GPU: Integrated AMD Radeon™ 890M
- RAM: 96 GB DDR5-5600 (2 × 48 GB SODIMMs)
- Storage: 2 TB PCIe Gen4 SSD
- Display: 14″ 1920×1200 IPS, 500 nits, 100% sRGB
- Networking: Wi-Fi 7 (MediaTek MT7925)
- Battery: 57 Wh (decent, but let’s just say AI workloads = keep the charger handy 🔌)
The real star here is that 96 GB of RAM in a 3 lb laptop. That’s unusual in this size/price class and was the reason I gambled on this configuration.
🧪 What I Tried Running
I’ve been testing various GGUF quantized LLMs (Qwen, LLaMA derivatives, and even some 100B+ parameter experiments) through LM Studio. Performance depends heavily on quantization level, but here’s the gist:
- 7B – 14B models: Run smooth as butter 🧈, even in FP16. Easily push 15–25 tokens/s.
- 30B class models: Manageable in quantized form, usually around 7–12 tokens/s.
- 70B class models: Surprisingly possible with aggressive quantization (Q4_K_M or Q5_0). Throughput drops to ~4–7 tokens/s, but still interactive enough for hobbyist use.
- 100B+ experiments (like GPT-style 120B OSS builds): Yes, it loads, and yes, it works (barely). My best so far is around 6–7 tokens/s with a Q4 quant. Borderline usable, but hey—it runs! 😅
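A quick back-of-the-envelope calculation shows why these sizes fit in 96 GB. The bits-per-weight figures below are rough approximations (real GGUF files vary with architecture, vocab size, and per-tensor quant mixes), but they're close enough for planning:

```python
# Rough GGUF memory-footprint estimator. Bits-per-weight values are
# approximate effective averages, not exact per-format numbers.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_0": 5.5,
    "Q4_K_M": 4.8,
}

def model_gb(params_billion: float, quant: str) -> float:
    """Estimated weight memory in GB for a given size and quant level."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

if __name__ == "__main__":
    for size, quant in [(14, "FP16"), (32, "Q5_0"), (70, "Q4_K_M"), (120, "Q4_K_M")]:
        print(f"{size}B @ {quant}: ~{model_gb(size, quant):.0f} GB")
```

By this math a 70B model at Q4_K_M is roughly 42 GB of weights and a 120B one is around 72 GB, which is exactly why the 96 GB config (minus OS and KV-cache overhead) can load them at all.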
For reference, that’s performance in the same ballpark as some desktops with 24–32 GB GPUs… but done here with just CPU + a heap of RAM.
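If you want to reproduce tokens/s numbers like these on your own machine, LM Studio can run a local server that speaks an OpenAI-compatible API (by default on port 1234; adjust the URL if yours differs). A minimal timing sketch, stdlib only:

```python
import json
import time
import urllib.request

# LM Studio's local server exposes an OpenAI-compatible endpoint;
# the default port is 1234 (change URL if you configured it differently).
URL = "http://localhost:1234/v1/chat/completions"

def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput in tokens/s for one completion."""
    return completion_tokens / elapsed_s

def benchmark(prompt: str, max_tokens: int = 256) -> float:
    """Time one completion against the local server and report tokens/s."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)

if __name__ == "__main__":
    print(f"{benchmark('Explain GGUF quantization in one paragraph.'):.1f} tok/s")
```

Note this measures end-to-end time including prompt processing, so it will read slightly lower than a pure generation rate; it's fine for comparing quants on the same machine.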
💻 What It Feels Like in Practice
- Heat & Noise: The laptop holds up well; fan noise is noticeable but not jet-engine level. The Ryzen chip + Lenovo cooling is surprisingly competent.
- Power: You’ll want it plugged in. Sustained AI workloads = battery drain city. 🔋
- Portability: With 96 GB RAM in a 14″ chassis, it feels like carrying around a mini AI dev lab in my backpack.
🌟 Why This Setup Works
Most laptops max out at 32 GB or 64 GB of RAM, which rules out big LLM experiments. The P14s Gen 6 AMD is a rare bird: affordable, upgradeable, and able to take 48 GB SODIMMs in both of its slots. That’s what makes the magic possible. ✨
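Before loading a big quant, it's worth confirming the machine actually has headroom. A quick check (the `sysconf` names below are POSIX/Linux-specific, and the 8 GB headroom figure is my guess for OS plus KV cache; tune it to taste):

```python
import os

def total_ram_gb() -> float:
    """Total physical RAM in GB (POSIX systems exposing these sysconf names)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1e9

def fits_in_ram(model_gb: float, headroom_gb: float = 8.0) -> bool:
    """True if a model of the given size should fit, leaving headroom
    for the OS and KV cache (headroom default is a rough guess)."""
    return model_gb + headroom_gb <= total_ram_gb()
```

On this 96 GB config, a ~42 GB 70B Q4_K_M quant passes comfortably; on a typical 32 GB laptop it doesn't, which is the whole point of the configuration.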
⚡ Final Thoughts
Is this the fastest way to run big models? Nope. But as a balance of price, portability, and capability, the ThinkPad P14s Gen 6 AMD is shockingly good. For under $2k, I now have a mobile AI lab that can handle everything from snappy assistants to lumbering giants like 70B+ models (if I’m patient).
If you’ve ever wanted to tinker with LLMs locally without lugging around a desktop tower or dropping $4k on a workstation, this might be one of the best bang-for-buck setups out there. 💰
Now the real question: how far can we push this little ThinkPad before it taps out? I’ll keep testing and posting updates. Stay tuned. 😉