ThinkPad P14s AI 9 HX PRO 370 w/96GB RAM & LLM benchmarks

Running Big LLMs on a Little Workstation: My Adventures with the ThinkPad P14s Gen 6 AMD

I’ve been experimenting with large language models (LLMs) lately, and I wanted to see how far I could push things using a (relatively) inexpensive laptop. Enter the ThinkPad P14s Gen 6 AMD—a slim mobile workstation that set me back about $1,600. On paper it’s not exactly a “supercomputer,” but with the right configuration (and my go-to tool, LM Studio), it turns out this little black box can handle some pretty big AI workloads. 🚀


🔧 The Specs That Matter

  • CPU: AMD Ryzen™ AI 9 HX PRO 370 (12 cores / 24 threads, up to 5.10 GHz)
  • GPU: Integrated AMD Radeon™ 890M
  • RAM: 96 GB DDR5-5600 (2 × 48 GB SODIMMs)
  • Storage: 2 TB PCIe Gen4 SSD
  • Display: 14″ 1920×1200 IPS, 500 nits, 100% sRGB
  • Networking: Wi-Fi 7 (MediaTek MT7925)
  • Battery: 57 Wh (decent, but let’s just say AI workloads = keep the charger handy 🔌)

The real star here is that 96 GB of RAM in a 3 lb laptop. That’s unusual in this size/price class and was the reason I gambled on this configuration.
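The back-of-the-envelope math behind that gamble: a GGUF model's footprint is roughly parameter count × bits per weight ÷ 8, plus headroom for the KV cache and runtime buffers. Here's a minimal sketch of that estimate — the effective bits-per-weight figures for each quant are ballpark values, and the 20% overhead factor is my own guess, not anything official from LM Studio or llama.cpp:

```python
def quant_footprint_gb(params_billions: float, bits_per_weight: float,
                       overhead: float = 1.2) -> float:
    """Rough RAM needed for a GGUF model: weights x bits/8, plus ~20%
    assumed headroom for KV cache and runtime buffers."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9  # decimal GB

# ~4.8 effective bits/weight is a common ballpark for Q4_K_M quants
print(round(quant_footprint_gb(70, 4.8), 1))   # 70B @ Q4_K_M: ~50 GB
print(round(quant_footprint_gb(120, 4.5), 1))  # 120B @ Q4: ~81 GB
print(round(quant_footprint_gb(14, 16.0), 1))  # 14B @ FP16: ~34 GB
```

By that estimate a Q4 70B (~50 GB) and even a Q4 120B (~81 GB) squeak under 96 GB with room left for the OS, while on a typical 32 GB laptop the 70B quant wouldn't even load.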


🧪 What I Tried Running

I’ve been testing various GGUF quantized LLMs (Qwen, LLaMA derivatives, and even some 100B+ parameter experiments) through LM Studio. Performance depends heavily on quantization level, but here’s the gist:

  • 7B – 14B models: Run smooth as butter 🧈, even in FP16. Easily push 15–25 tokens/s.
  • 30B class models: Manageable in quantized form, usually around 7–12 tokens/s.
  • 70B class models: Surprisingly possible with aggressive quantization (Q4_K_M or Q5_0). Throughput drops to ~4–7 tokens/s, but still interactive enough for hobbyist use.
  • 100B+ experiments (like GPT-style 120B OSS builds): Yes, it loads, and yes, it works (barely). My best so far is around 6–7 tokens/s with a Q4 quant. Borderline usable, but hey—it runs! 😅
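For anyone curious how I get those tokens/s numbers: since LM Studio can serve models over a local OpenAI-compatible endpoint, throughput is just completion tokens over wall-clock time. A minimal sketch of the timing logic — the `generate` callable here is a hypothetical stand-in for whatever client call you use against your local server:

```python
import time
from typing import Callable

def measure_tps(generate: Callable[[int], int], n_tokens: int = 128) -> float:
    """Wall-clock tokens/s for one generation call. `generate` takes a
    max-token budget and returns how many completion tokens were actually
    produced (e.g. the usage field of an OpenAI-style response)."""
    start = time.perf_counter()
    produced = generate(n_tokens)
    elapsed = time.perf_counter() - start
    return produced / elapsed
```

For perspective: using the common rule of thumb of ~0.75 English words per token, 5 tokens/s works out to roughly 0.75 × 5 × 60 ≈ 225 words per minute — close to adult reading speed — which is why even the 70B numbers still feel "interactive enough."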

For reference, that’s performance in the same ballpark as some desktops with 24–32 GB GPUs… but done here with just CPU + a heap of RAM.


💻 What It Feels Like in Practice

  • Heat & Noise: The laptop holds up well; fan noise is noticeable but not jet-engine level. The Ryzen chip + Lenovo cooling is surprisingly competent.
  • Power: You’ll want it plugged in. Sustained AI workloads = battery drain city. 🔋
  • Portability: With 96 GB RAM in a 14″ chassis, it feels like carrying around a mini AI dev lab in my backpack.

🌟 Why This Setup Works

Most laptops max out at 32 GB or 64 GB of RAM, which rules out big LLM experiments. The P14s Gen 6 AMD is a rare bird: affordable, upgradeable, and able to take a 48 GB SODIMM in each slot. That's what makes the magic possible. ✨


⚡ Final Thoughts

Is this the fastest way to run big models? Nope. But as a balance of price, portability, and capability, the ThinkPad P14s Gen 6 AMD is shockingly good. For under $2k, I now have a mobile AI lab that can handle everything from snappy assistants to lumbering giants like 70B+ models (if I’m patient).

If you’ve ever wanted to tinker with LLMs locally without lugging around a desktop tower or dropping $4k on a workstation, this might be one of the best bang-for-buck setups out there. 💰

Now the real question: how far can we push this little ThinkPad before it taps out? I’ll keep testing and posting updates. Stay tuned. 😉

6 comments

  • W.P.

    This is so cool! Love seeing your experience running large AI models on a ThinkPad P14s. It's inspiring to see such capable AI work being done on a relatively affordable, portable setup.

  • Terry Gruzebeck

    Hi JD.

    It’s been a while since I followed you, Brandy and the Drury gang through your undergraduate adventures.

    Why not simply leverage OpenAI's LLMs through PyGPT?

    Gun Control and Violence Prevention, an A.I. Odyssey:
    Taking a Sober Step Back from the Pop-Culture
    “Prepper” with an “Arsenal of Firearms” Trope

    Query: > How has gun control failed to prevent violence?

    Response from PyGPT Version: 2.5.33, Linux, x86_64 (snap)
    Build: 2025-07-10
    OpenAI API: 1.91.0, LlamaIndex: 0.12.44

    Official website: https://pygpt.net
    GitHub: https://github.com/szczyglis-dev/py-gpt
    Documentation: https://pygpt.readthedocs.io

    (c) 2025 Marcin Szczygliński
    info@pygpt.net

    • Hey Terry, great to hear from you! 😊 Those days feel like a whole different lifetime, so your note gave me a big grin. Loving life these days too. Hope you’re doing great!

      And yep, totally fair question. Using OpenAI (whether directly or through a wrapper like PyGPT) or Claude is often the most practical move, and I do lean on hosted models when I need the best quality fast or I’m building something where the “it just works” factor matters most.

      The reason I keep testing local LLMs (and benchmarking machines like the P14s) is mainly data privacy and offline capability (I don’t trust big corporations very much lol 😂), predictable latency, cost control at higher usage, and honestly the pure IT-nerd joy of seeing what you can squeeze out of real (and cheap-ish) hardware. Also, when I’m testing and benchmarking, local runs are easier to keep apples-to-apples without model updates shifting under my feet.

      And I appreciate the PyGPT info dump. I’ve heard of it but never actually used it. You’ve officially put it on my “go try this” list. Thanks again for dropping by! 🙏
      -J.D.

  • Fred Posner

    Wow, J.D.! That's some serious RAM packing! 😂 Carrying a mini AI dev lab in a backpack sounds like a great way to ensure your commute is *never* boring – guaranteed to get stares! 🔋🌟

    Handling 70B+ models on a laptop is impressive, though 6–7 tokens/s sounds painfully slow for anything beyond simple queries. Hope the coffee supply is robust! ☕️💻

    The ThinkPad P14s Gen 6 AMD is definitely a gem for value, but calling it shockingly good might be an understatement – it's practically a unicorn in the laptop world! Keep pushing those boundaries, J.D.! We're all curious to see how long this portable powerhouse can keep chugging before the inevitable battery drama ensues. 😉 Stay tuned!

    • Thanks for noticing the RAM packing! You nailed it, carrying an “AI dev lab” that fits on an airplane tray table is the real win here. Agreed, 6–7 tokens/s requires serious patience (and coffee ☕️), but it’s amazing that this little unicorn even loads those models at all. The battery anxiety is intense; the charger is definitely glued to the wall when the LLMs are running hot! 😅

      Have a great week and thanks for visiting!
      -J.D.
