ThinkPad P14s AI 9 HX PRO 370 w/ 96 GB RAM & LLM benchmarks
Running Big LLMs on a Little Workstation: My Adventures with the ThinkPad P14s Gen 6 AMD
I’ve been experimenting with large language models (LLMs) lately, and I wanted to see how far I could push things using a (relatively) inexpensive laptop. Enter the ThinkPad P14s Gen 6 AMD—a slim mobile workstation that set me back about $1,600. On paper it’s not exactly a “supercomputer,” but with the right configuration (and my go-to tool, LM Studio), it turns out this little black box can handle some pretty big AI workloads. 🚀
🔧 The Specs That Matter
- CPU: AMD Ryzen™ AI 9 HX PRO 370 (12 cores / 24 threads, up to 5.10 GHz)
- GPU: Integrated AMD Radeon™ 890M
- RAM: 96 GB DDR5-5600 (2 × 48 GB SODIMMs)
- Storage: 2 TB PCIe Gen4 SSD
- Display: 14″ 1920×1200 IPS, 500 nits, 100% sRGB
- Networking: Wi-Fi 7 (MediaTek MT7925)
- Battery: 57 Wh (decent, but let’s just say AI workloads = keep the charger handy 🔌)
The real star here is that 96 GB of RAM in a 3 lb laptop. That’s unusual in this size/price class and was the reason I gambled on this configuration.
🧪 What I Tried Running
I’ve been testing various GGUF quantized LLMs (Qwen, LLaMA derivatives, and even some 100B+ parameter experiments) through LM Studio. Performance depends heavily on quantization level, but here’s the gist:
- 7B – 14B models: Run smooth as butter 🧈, even in FP16. Easily push 15–25 tokens/s.
- 30B class models: Manageable in quantized form, usually around 7–12 tokens/s.
- 70B class models: Surprisingly possible with aggressive quantization (Q4_K_M or Q5_0). Throughput drops to ~4–7 tokens/s, but still interactive enough for hobbyist use.
- 100B+ experiments (like GPT-style 120B OSS builds): Yes, it loads, and yes, it works (barely). My best so far is around 6–7 tokens/s with a Q4 quant. Borderline usable, but hey—it runs! 😅
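A quick back-of-the-envelope calculation shows why these sizes fit in 96 GB. The bits-per-weight figures below are rough approximations (real GGUF files vary with architecture, vocab size, and per-tensor quant mixes), but they're close enough for planning:

```python
# Rough GGUF memory-footprint estimator. Bits-per-weight values are
# approximate effective averages, not exact per-format numbers.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_0": 5.5,
    "Q4_K_M": 4.8,
}

def model_gb(params_billion: float, quant: str) -> float:
    """Estimated weight memory in GB for a given size and quant level."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

if __name__ == "__main__":
    for size, quant in [(14, "FP16"), (32, "Q5_0"), (70, "Q4_K_M"), (120, "Q4_K_M")]:
        print(f"{size}B @ {quant}: ~{model_gb(size, quant):.0f} GB")
```

By this math a 70B model at Q4_K_M is roughly 42 GB of weights and a 120B one is around 72 GB, which is exactly why the 96 GB config (minus OS and KV-cache overhead) can load them at all.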
For reference, that’s performance in the same ballpark as some desktops with 24–32 GB GPUs… but done here with just CPU + a heap of RAM.
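If you want to reproduce tokens/s numbers like these on your own machine, LM Studio can run a local server that speaks an OpenAI-compatible API (by default on port 1234; adjust the URL if yours differs). A minimal timing sketch, stdlib only:

```python
import json
import time
import urllib.request

# LM Studio's local server exposes an OpenAI-compatible endpoint;
# the default port is 1234 (change URL if you configured it differently).
URL = "http://localhost:1234/v1/chat/completions"

def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput in tokens/s for one completion."""
    return completion_tokens / elapsed_s

def benchmark(prompt: str, max_tokens: int = 256) -> float:
    """Time one completion against the local server and report tokens/s."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)

if __name__ == "__main__":
    print(f"{benchmark('Explain GGUF quantization in one paragraph.'):.1f} tok/s")
```

Note this measures end-to-end time including prompt processing, so it will read slightly lower than a pure generation rate; it's fine for comparing quants on the same machine.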
💻 What It Feels Like in Practice
- Heat & Noise: The laptop holds up well; fan noise is noticeable but not jet-engine level. The Ryzen chip + Lenovo cooling is surprisingly competent.
- Power: You’ll want it plugged in. Sustained AI workloads = battery drain city. 🔋
- Portability: With 96 GB RAM in a 14″ chassis, it feels like carrying around a mini AI dev lab in my backpack.
🌟 Why This Setup Works
Most laptops max out at 32 GB or 64 GB of RAM, which rules out big LLM experiments. The P14s Gen 6 AMD is a rare bird: affordable, upgradeable, and able to take 48 GB SODIMMs in both of its slots. That’s what makes the magic possible. ✨
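Before loading a big quant, it's worth confirming the machine actually has headroom. A quick check (the `sysconf` names below are POSIX/Linux-specific, and the 8 GB headroom figure is my guess for OS plus KV cache; tune it to taste):

```python
import os

def total_ram_gb() -> float:
    """Total physical RAM in GB (POSIX systems exposing these sysconf names)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1e9

def fits_in_ram(model_gb: float, headroom_gb: float = 8.0) -> bool:
    """True if a model of the given size should fit, leaving headroom
    for the OS and KV cache (headroom default is a rough guess)."""
    return model_gb + headroom_gb <= total_ram_gb()
```

On this 96 GB config, a ~42 GB 70B Q4_K_M quant passes comfortably; on a typical 32 GB laptop it doesn't, which is the whole point of the configuration.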
⚡ Final Thoughts
Is this the fastest way to run big models? Nope. But as a balance of price, portability, and capability, the ThinkPad P14s Gen 6 AMD is shockingly good. For under $2k, I now have a mobile AI lab that can handle everything from snappy assistants to lumbering giants like 70B+ models (if I’m patient).
If you’ve ever wanted to tinker with LLMs locally without lugging around a desktop tower or dropping $4k on a workstation, this might be one of the best bang-for-buck setups out there. 💰
Now the real question: how far can we push this little ThinkPad before it taps out? I’ll keep testing and posting updates. Stay tuned. 😉