ThinkPad P14s AI 9 HX PRO 370 w/96GB RAM & LLM benchmarks
Running Big LLMs on a Little Workstation: My Adventures with the ThinkPad P14s Gen 6 AMD
I’ve been experimenting with large language models (LLMs) lately, and I wanted to see how far I could push things using a (relatively) inexpensive laptop. Enter the ThinkPad P14s Gen 6 AMD—a slim mobile workstation that set me back about $1,600. On paper it’s not exactly a “supercomputer,” but with the right configuration (and my go-to tool, LM Studio), it turns out this little black box can handle some pretty big AI workloads. 🚀
🔧 The Specs That Matter
- CPU: AMD Ryzen™ AI 9 HX PRO 370 (12 cores / 24 threads, up to 5.10 GHz)
- GPU: Integrated AMD Radeon™ 890M
- RAM: 96 GB DDR5-5600 (2 × 48 GB SODIMMs)
- Storage: 2 TB PCIe Gen4 SSD
- Display: 14″ 1920×1200 IPS, 500 nits, 100% sRGB
- Networking: Wi-Fi 7 (MediaTek MT7925)
- Battery: 57 Wh (decent, but let’s just say AI workloads = keep the charger handy 🔌)
The real star here is that 96 GB of RAM in a 3 lb laptop. That’s unusual in this size/price class and was the reason I gambled on this configuration.
🧪 What I Tried Running
I’ve been testing various GGUF quantized LLMs (Qwen, LLaMA derivatives, and even some 100B+ parameter experiments) through LM Studio. Performance depends heavily on quantization level, but here’s the gist:
- 7B – 14B models: Run smooth as butter 🧈, even in FP16. Easily push 15–25 tokens/s.
- 30B class models: Manageable in quantized form, usually around 7–12 tokens/s.
- 70B class models: Surprisingly possible with aggressive quantization (Q4_K_M or Q5_0). Throughput drops to ~4–7 tokens/s, but still interactive enough for hobbyist use.
- 100B+ experiments (like GPT-style 120B OSS builds): Yes, it loads, and yes, it works (barely). My best so far is around 6–7 tokens/s with a Q4 quant. Borderline usable, but hey—it runs! 😅
For reference, that’s performance in the same ballpark as some desktops with 24–32 GB GPUs… but done here with just CPU + a heap of RAM.
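If you want to reproduce my tokens/s numbers, here's roughly the kind of timing harness I use. Note that the `generate` callable below is a hypothetical stand-in for whatever backend you actually run (LM Studio's local server, llama.cpp, etc.); the stub just shows the shape of the measurement:

```python
import time

def measure_tokens_per_sec(generate, prompt, max_tokens=128):
    """Time one generation call and report decode throughput.

    `generate` is any callable taking (prompt, max_tokens) and returning
    the list of generated tokens -- a stand-in for your real backend.
    """
    start = time.perf_counter()
    tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Stub backend for illustration: pretends to emit up to 64 tokens.
def fake_generate(prompt, max_tokens):
    return ["tok"] * min(64, max_tokens)

tps = measure_tokens_per_sec(fake_generate, "Hello", max_tokens=128)
print(f"{tps:.1f} tokens/s")
```

In practice you'd want to average over several runs and separate prompt-processing time from decode time, but this is the basic idea.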
💻 What It Feels Like in Practice
- Heat & Noise: The laptop holds up well; fan noise is noticeable but not jet-engine level. The Ryzen chip + Lenovo cooling is surprisingly competent.
- Power: You’ll want it plugged in. Sustained AI workloads = battery drain city. 🔋
- Portability: With 96 GB RAM in a 14″ chassis, it feels like carrying around a mini AI dev lab in my backpack.
🌟 Why This Setup Works
Most laptops max out at 32 GB or 64 GB RAM, which rules out big LLM experiments. The P14s Gen 6 AMD is a rare bird—affordable, upgradeable, and supporting 48 GB SODIMMs per slot. That’s what makes the magic possible. ✨
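A quick back-of-the-napkin check on why the RAM ceiling matters: weight footprint is roughly parameters × bits-per-weight ÷ 8. The bits-per-weight values below are approximate effective figures for common GGUF quants, not exact numbers, and the KV cache and runtime overhead add more on top:

```python
def approx_weight_gb(params_billion, bits_per_weight):
    """Rough weight-only footprint; KV cache and overhead add more."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Approximate effective bits/weight (ballpark; varies by model and quant):
quants = {"FP16": 16.0, "Q8_0": 8.5, "Q5_0": 5.5, "Q4_K_M": 4.8}

for name, bpw in quants.items():
    print(f"70B @ {name}: ~{approx_weight_gb(70, bpw):.0f} GB")

# 70B @ FP16 is ~140 GB (won't fit), while 70B @ Q4_K_M is ~42 GB,
# which is why a quantized 70B squeezes into 96 GB of RAM.
```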
⚡ Final Thoughts
Is this the fastest way to run big models? Nope. But as a balance of price, portability, and capability, the ThinkPad P14s Gen 6 AMD is shockingly good. For under $2k, I now have a mobile AI lab that can handle everything from snappy assistants to lumbering giants like 70B+ models (if I’m patient).
If you’ve ever wanted to tinker with LLMs locally without lugging around a desktop tower or dropping $4k on a workstation, this might be one of the best bang-for-buck setups out there. 💰
Now the real question: how far can we push this little ThinkPad before it taps out? I’ll keep testing and posting updates. Stay tuned. 😉
This is so cool! Love seeing your experience running large AI models on a ThinkPad P14s. It’s inspiring to see such capable AI work being done on a relatively affordable, portable setup.
Thanks 👍👍
Have a great week!
-J.D.
Hi JD.
It’s been a while since I followed you, Brandy and the Drury gang through your undergraduate adventures.
Why not simply leverage OpenAI’s LLMs through PyGPT?
Gun Control and Violence Prevention, an A.I. Odyssey:
Taking a Sober Step Back from the Pop-Culture
“Prepper” with an “Arsenal of Firearms” Trope
Query: > How has gun control failed to prevent violence?
Response from PyGPT Version: 2.5.33, Linux, x86_64 (snap)
Build: 2025-07-10
OpenAI API: 1.91.0, LlamaIndex: 0.12.44
Official website: https://pygpt.net
GitHub: https://github.com/szczyglis-dev/py-gpt
Documentation: https://pygpt.readthedocs.io
(c) 2025 Marcin Szczygliński
info@pygpt.net
Hey Terry, great to hear from you! 😊 Those days feel like a whole different lifetime, so your note gave me a big grin. Loving life these days too. Hope you’re doing great!
And yep, totally fair question. Using OpenAI (whether directly or through a wrapper like PyGPT) or Claude is often the most practical move, and I do lean on hosted models when I need the best quality fast or I’m building something where the “it just works” factor matters most.
The reason I keep testing local LLMs (and benchmarking machines like the P14s) is mainly data privacy and offline capability (I don’t trust big corporations very much lol 😂), predictable latency, cost control at higher usage, and honestly the pure IT-nerd joy of seeing what you can squeeze out of real (and cheap-ish) hardware. Also, when I’m testing and benchmarking, local runs are easier to keep apples-to-apples without model updates shifting under my feet.
And I appreciate the PyGPT info dump. I’ve heard of it but never actually used it. You’ve officially put it on my “go try this” list. Thanks again for dropping by! 🙏
-J.D.
I absolutely concur with your concerns about privacy, J.D. I warn people who use the Leo A.I. setup in the Brave Browser that even though you pay Brave a subscription fee for which they assure privacy, if you use DeepSeek for proprietary work, your data goes to DeepSeek’s servers, which may or may not be monitored by the Chinese government.
The above said, because I only use A.I. for public policy research and analysis (which then is emailed to both the press and appropriate government officials), I’ve gotten as lazy as I am cheap. I found that PyGPT, while the only game in town if you want an LLM to analyze 100 of your data files comprehensively in one session, is high-maintenance.
In order to preserve both RAM and physical storage, as well as GPU and CPU usage, I have therefore defaulted to BrowserOS-ai (https://github.com/browseros-ai/BrowserOS/releases), which has handled some of the most intense data analysis I’ve ever done using local ports (BYOM) of my API subscriptions to both OpenAI and Anthropic.
The added advantage is BrowserOS’s “Council” setup, where you can query up to 3 A.I. assistants in separate panes of the same online session window (e.g., ChatGPT, Perplexity and Gemini). This also allows you to “cross-pollinate” their analyses with each other’s responses on the fly. They are instructed by prompting to vet each other. I then take their individually siloed outputs and funnel them into a 3-pane table, which I feed to Claude in a private session for further rumination, refereeing, vetting and a final comprehensive analysis.
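For the technically inclined, the council flow can be sketched in a few lines of plain Python. The assistant and referee callables here are hypothetical stand-ins for whatever real API clients you wire up (ChatGPT, Perplexity, Gemini, Claude); the flow is independent answers, then a cross-vetting round where each model sees the others’ output, then a referee synthesis:

```python
def run_council(prompt, assistants, referee):
    """Three-stage 'council' flow: answer -> cross-vet -> referee.

    `assistants` maps a name to a callable(prompt) -> str; `referee`
    is a callable(prompt) -> str. All are stand-ins for API clients.
    """
    # Stage 1: each assistant answers independently (siloed).
    answers = {name: ask(prompt) for name, ask in assistants.items()}

    # Stage 2: cross-pollination -- each assistant vets the others' answers.
    vetted = {}
    for name, ask in assistants.items():
        others = "\n\n".join(f"[{n}] {a}" for n, a in answers.items() if n != name)
        vetted[name] = ask(
            f"Original question: {prompt}\n\n"
            f"Peer answers to critique and improve on:\n{others}"
        )

    # Stage 3: a referee synthesizes the siloed, vetted outputs.
    table = "\n\n".join(f"=== {n} ===\n{v}" for n, v in vetted.items())
    return referee(f"Synthesize and referee these analyses:\n\n{table}")

# Toy stand-ins just to show the shape of the calls:
assistants = {
    "a": lambda p: f"A says: {len(p)} chars",
    "b": lambda p: f"B says: {p[:10]!r}",
}
report = run_council("How has gun control fared?", assistants,
                     lambda p: f"REPORT\n{p}")
print(report)
```

In the real workflow the referee stage is the private Claude session, but the orchestration shape is the same.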
Once Claude generates his/its report, I take his HTML output along with my original 3-pane comparison panel, export it all as a PDF file, and then run it through PDFindexgenerator, which appends a “live” (i.e., fully navigable) index to the original PDF file.
Now I have an optimally useful document with both a built-in, “live” table of contents and an index. The occasional table or graphic doesn’t survive the HTML-to-PDF conversion, but those exhibits come through unscathed when included in the body of an email (where they’re also easier to copy and paste), so you always have the HTML version for reference.
Here’s my theory on the utility of this setup:
“Its universal utility lies in the method of its research model, and in its resulting yield of a comprehensively *overwhelming preponderance of the multiple-A.I.-model-generated, cross-corroborating AND independently-vetted evidence,* the likes of which ultimately prevails in the forums where public policy is litigated, and where winners find political backing and funding.”
“If properly set up and executed, this method/model acts like an Artificial Intelligence Large Language Model (AI LLM) filtration system. You simply skim ‘the cream’ off the top.”
“The above said, it’s not to say that tweaks to the selection of inputs would not produce variances in outputs. That’s the nature of comparative policy analysis. However, unless and until improvements are made to the actual model itself (i.e., through replication analysis), it is now, at this point in its development, the most well-tested, validated, de facto “gold standard” of AI-based public policy research in existence.”
Feel free to email me if you want to request a sample from almost any public policy topic area.
Terry, what a detailed and well-thought-out workflow! The DeepSeek/Brave privacy point is spot-on and something a lot of people miss.
BrowserOS-ai is new to me and I appreciate the pointer. The BYOM model with local port routing is clever: you get the UI convenience without surrendering your API keys to yet another cloud layer.
The Council setup is something I’ve been experimenting with and the “cross-pollinate and vet each other” instruction is a technique I’ve also found genuinely useful. There’s something valuable (and fun) about having the models argue with each other before you accept any conclusion.
Your final step of using Claude as the “referee” and report generator is essentially how I’d approach it too; it’s good at synthesis and catching logical inconsistencies across sources.
Keep me posted and have a fantastic weekend!
-J.D.
Wow, J.D.! That’s some serious RAM packing! 😂 Carrying a mini AI dev lab in a backpack sounds like a great way to ensure your commute is *never* boring – guaranteed to get stares! 🔋🌟
Handling 70B+ models on a laptop is impressive, though 6–7 tokens/s sounds painfully slow for anything beyond simple queries. Hope the coffee supply is robust! ☕️💻
The ThinkPad P14s Gen 6 AMD is definitely a gem for value, but calling it shockingly good might be an understatement – it’s practically a unicorn in the laptop world! Keep pushing those boundaries, J.D.! We’re all curious to see how long this portable powerhouse can keep chugging before the inevitable battery drama ensues. 😉 Stay tuned!
Thanks for noticing the RAM packing! You nailed it, carrying an “AI dev lab” that fits on an airplane tray table is the real win here. Agreed, 6–7 tokens/s requires serious patience (and coffee ☕️), but it’s amazing that this little unicorn even loads those models at all. The battery anxiety is intense; the charger is definitely glued to the wall when the LLMs are running hot! 😅
Have a great week and thanks for visiting!
-J.D.