ThinkPad P14s AI 9 HX PRO 370 w/96GB RAM & LLM benchmarks
Running Big LLMs on a Little Workstation: My Adventures with the ThinkPad P14s Gen 6 AMD
I’ve been experimenting with large language models (LLMs) lately, and I wanted to see how far I could push things using a (relatively) inexpensive laptop. Enter the ThinkPad P14s Gen 6 AMD—a slim mobile workstation that set me back about $1,600. On paper it’s not exactly a “supercomputer,” but with the right configuration (and my go-to tool, LM Studio), it turns out this little black box can handle some pretty big AI workloads. 🚀
🔧 The Specs That Matter
- CPU: AMD Ryzen™ AI 9 HX PRO 370 (12 cores / 24 threads, up to 5.10 GHz)
- GPU: Integrated AMD Radeon™ 890M
- RAM: 96 GB DDR5-5600 (2 × 48 GB SODIMMs)
- Storage: 2 TB PCIe Gen4 SSD
- Display: 14″ 1920×1200 IPS, 500 nits, 100% sRGB
- Networking: Wi-Fi 7 (MediaTek MT7925)
- Battery: 57 Wh (decent, but let’s just say AI workloads = keep the charger handy 🔌)
The real star here is that 96 GB of RAM in a 3 lb laptop. That’s unusual in this size/price class and was the reason I gambled on this configuration.
🧪 What I Tried Running
I’ve been testing various GGUF quantized LLMs (Qwen, LLaMA derivatives, and even some 100B+ parameter experiments) through LM Studio. Performance depends heavily on quantization level, but here’s the gist (with a quick way to measure tokens/s sketched right after the list):
- 7B – 14B models: Run smooth as butter 🧈, even in FP16. Easily push 15–25 tokens/s.
- 30B class models: Manageable in quantized form, usually around 7–12 tokens/s.
- 70B class models: Surprisingly possible with aggressive quantization (Q4_K_M or Q5_0). Throughput drops to ~4–7 tokens/s, but still interactive enough for hobbyist use.
- 100B+ experiments (like GPT-style 120B OSS builds): Yes, it loads, and yes, it works (barely). My best so far is around 6–7 tokens/s with a Q4 quant. Borderline usable, but hey—it runs! 😅
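If you want to sanity-check numbers like these on your own hardware, LM Studio can expose a local OpenAI-compatible server (by default at http://localhost:1234/v1 once you enable it). Here’s a minimal timing sketch, one way to get a rough tokens/s figure; the model name is a placeholder for whatever you have loaded:

```python
# Minimal tokens/s check against LM Studio's local OpenAI-compatible server.
# Assumes the server is enabled with a model loaded; "your-loaded-model" is
# a placeholder (GET /v1/models lists what's actually available).
import time
import requests

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local endpoint
payload = {
    "model": "your-loaded-model",  # placeholder name
    "messages": [{"role": "user", "content": "Explain KV caching in two paragraphs."}],
    "max_tokens": 256,
    "temperature": 0.7,
}

start = time.perf_counter()
resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=600)
elapsed = time.perf_counter() - start
data = resp.json()

# OpenAI-style responses include a usage block; fall back to a crude word
# count if this server build omits it.
tokens = data.get("usage", {}).get("completion_tokens")
if tokens is None:
    tokens = len(data["choices"][0]["message"]["content"].split())

print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tokens/s")
```

One caveat: this times the whole request (prompt processing + generation), so it slightly understates pure generation speed. Streaming the response and timing only the generated chunks gives a cleaner number, but this is close enough for the ballpark comparisons above.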
For reference, that’s performance in the same ballpark as some desktops with 24–32 GB GPUs… but done here with just CPU + a heap of RAM.
💻 What It Feels Like in Practice
- Heat & Noise: The laptop holds up well; fan noise is noticeable but not jet-engine level. The Ryzen chip + Lenovo cooling is surprisingly competent.
- Power: You’ll want it plugged in. Sustained AI workloads = battery drain city. 🔋
- Portability: With 96 GB RAM in a 14″ chassis, it feels like carrying around a mini AI dev lab in my backpack.
🌟 Why This Setup Works
Most laptops max out at 32 GB or 64 GB of RAM, which rules out big LLM experiments. The P14s Gen 6 AMD is a rare bird: affordable, upgradeable, and it accepts 48 GB SODIMMs in each of its two slots. That’s what makes the magic possible. ✨
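To put numbers on that, a GGUF model’s footprint is roughly parameters × bits-per-weight ÷ 8. Here’s a back-of-envelope sketch; the bits-per-weight values are rough averages I’m assuming for each quant type, and real file sizes vary a bit by architecture:

```python
# Back-of-envelope GGUF footprint: parameters x bits-per-weight / 8.
# The bits-per-weight figures are rough assumed averages per quant type;
# actual file sizes vary by architecture and tensor mix.
QUANT_BITS = {"Q4_K_M": 4.8, "Q5_0": 5.5, "Q8_0": 8.5, "F16": 16.0}

def est_gb(params_billion: float, quant: str) -> float:
    """Approximate weights-only size in GB (no KV cache, no OS overhead)."""
    return params_billion * QUANT_BITS[quant] / 8

for size_b in (7, 14, 32, 70, 120):
    print(f"{size_b:>4}B @ Q4_K_M ~ {est_gb(size_b, 'Q4_K_M'):5.1f} GB")
```

By this math a 70B model at Q4_K_M lands around 42 GB (and a 120B around 72 GB), weights alone, before the KV cache and the OS take their cut. Hopeless on a 32 GB machine, comfortable at 96 GB.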
⚡ Final Thoughts
Is this the fastest way to run big models? Nope. But as a balance of price, portability, and capability, the ThinkPad P14s Gen 6 AMD is shockingly good. For under $2k, I now have a mobile AI lab that can handle everything from snappy assistants to lumbering giants like 70B+ models (if I’m patient).
If you’ve ever wanted to tinker with LLMs locally without lugging around a desktop tower or dropping $4k on a workstation, this might be one of the best bang-for-buck setups out there. 💰
Now the real question: how far can we push this little ThinkPad before it taps out? I’ll keep testing and posting updates. Stay tuned. 😉
This is so cool! Love seeing your experience running large AI models on a ThinkPad P14s. It’s inspiring to see such capable AI work being done on a relatively affordable, portable setup.
Thanks 👍👍
Have a great week!
-J.D.
Hi JD.
It’s been a while since I followed you, Brandy and the Drury gang through your undergraduate adventures.
Why not simply leverage OpenAI’s LLMs through PyGPT?
Gun Control and Violence Prevention, an A.I. Odyssey: Taking a Sober Step Back from the Pop-Culture “Prepper” with an “Arsenal of Firearms” Trope
Query: > How has gun control failed to prevent violence?
Response from PyGPT Version: 2.5.33, Linux, x86_64 (snap)
Build: 2025-07-10
OpenAI API: 1.91.0, LlamaIndex: 0.12.44
Official website: https://pygpt.net
GitHub: https://github.com/szczyglis-dev/py-gpt
Documentation: https://pygpt.readthedocs.io
(c) 2025 Marcin Szczygliński
info@pygpt.net
Hey Terry, great to hear from you! 😊 Those days feel like a whole different lifetime, so your note gave me a big grin. Loving life these days too. Hope you’re doing great!
And yep, totally fair question. Using OpenAI (whether directly or through a wrapper like PyGPT) or Claude is often the most practical move, and I do lean on hosted models when I need the best quality fast or I’m building something where the “it just works” factor matters most.
The reason I keep testing local LLMs (and benchmarking machines like the P14s) is mainly data privacy and offline capability (I don’t trust big corporations very much lol 😂), predictable latency, cost control at higher usage, and honestly the pure IT-nerd joy of seeing what you can squeeze out of real (and cheap-ish) hardware. Also, when I’m testing and benchmarking, local runs are easier to keep apples-to-apples without model updates shifting under my feet.
And I appreciate the PyGPT info dump. I’ve heard of it but never actually used it. You’ve officially put it on my “go try this” list. Thanks again for dropping by! 🙏
-J.D.
Wow, J.D.! That’s some serious RAM packing! 😂 Carrying a mini AI dev lab in a backpack sounds like a great way to ensure your commute is *never* boring – guaranteed to get stares! 🔋🌟
Handling 70B+ models on a laptop is impressive, though 6–7 tokens/s sounds painfully slow for anything beyond simple queries. Hope the coffee supply is robust! ☕️💻
The ThinkPad P14s Gen 6 AMD is definitely a gem for value, but calling it “shockingly good” might be an understatement – it’s practically a unicorn in the laptop world! Keep pushing those boundaries, J.D.! We’re all curious to see how long this portable powerhouse can keep chugging before the inevitable battery drama ensues. 😉 Stay tuned!
Thanks for noticing the RAM packing! You nailed it: carrying an “AI dev lab” that fits on an airplane tray table is the real win here. Agreed, 6–7 tokens/s requires serious patience (and coffee ☕️), but it’s amazing that this little unicorn even loads those models at all. The battery anxiety is intense; the charger is definitely glued to the wall when the LLMs are running hot! 😅
Have a great week and thanks for visiting!
-J.D.