
GMKtec EVO-X2 AI Mini PC Ryzen AI Max+ 395 (up to 5.1GHz) Mini Gaming Computer, 96GB LPDDR5X 8000MHz (12GB*8) 1TB PCIe 4.0 SSD, Quad Screen 8K Display, WiFi 7 & USB4, SD Card Reader 4.0

Condition: New


$2,228.98

Buy Now, Pay Later


  • Up to 36-month term if approved
  • No impact on credit to apply
  • Instant approval decision
  • Secure and straightforward checkout

Payment plans are offered through our trusted finance partners Klarna, Affirm, Afterpay, Apple Pay, and PayTomorrow. No-credit-needed leasing options through Acima may also be available at checkout.

Learn more about financing & leasing here.


Free shipping on this product

FREE 30-day refund/replacement

To qualify for a full refund, items must be returned in their original, unused condition. If an item is returned in a used, damaged, or materially different state, you may be granted a partial refund.

To initiate a return, please visit our Returns Center.

View our full returns policy here.


Availability: Unavailable
Fulfilled by Amazon
Available payment plans shown during checkout

Size: 96GB LPDDR5X + 1TB


Features

  • EVOLUTION RYZEN AI MAX+ 395 MINI PC - The GMKtec EVO-X2 is the next evolution in GMKtec's AI mini PC line, built on the AMD Ryzen "Strix Halo" series. Thanks to AMD Simultaneous Multithreading (SMT), the core count is effectively doubled to 32 threads. The Ryzen AI Max+ 395 has 64 MB of L3 cache and can boost up to 5.1 GHz, depending on the workload. It is currently rated as the "most powerful x86 APU" on the market for AI computing.
  • AI NPU WITH XDNA 2 ARCHITECTURE - Powered by 16 Zen 5 CPU cores, an XDNA 2 NPU with 50+ peak AI TOPS, and a truly massive integrated GPU driven by 40 AMD RDNA 3.5 CUs, the Ryzen AI Max+ 395 is a transformative upgrade that delivers a significant performance boost over the competition. It excels in consumer AI workloads such as LM Studio, a llama.cpp-powered application shaping up to be the must-have app for client LLM workloads: it lets users run the latest language models locally, with no technical knowledge required, and unleash their creativity and productivity (see the API sketch after this feature list).
  • AMD RADEON 8060S iGPU GAMING PC - The AMD Radeon RX 8060S offers all 40 CUs with up to a 2.9 GHz graphics clock and uses the new RDNA 3.5 architecture. The powerful iGPU is positioned between an RTX 4060 and 4070 laptop GPU, enabling FHD gaming at maximum detail in most demanding titles. The 8060S can also utilize the full 128GB memory pool (on the 128GB configuration), which is ideal for running LLMs such as DeepSeek 70B Q8, which runs comfortably on this machine.
  • EIGHT-CHANNEL LPDDR5X - LPDDR5X is a groundbreaking small-form-factor memory standard installed on-board. With blazing speeds up to 8000 MT/s, it runs 1.5x faster than DDR5 SODIMMs: 90% better performance in video conferencing and photo editing, 30% better in productivity apps, and 12% better in digital content workloads.
  • QUAD-SCREEN 8K DISPLAY SUPPORT - The EVO-X2 AI Mini PC supports four-screen 4K/8K output via HDMI 2.1 (8K@60Hz), DisplayPort 1.4 (4K@60Hz), and dual USB4 ports (40Gbps transfer, supporting PD 3.0/DP 1.4/data). Ideal for gaming, video editing, and multitasking, it provides expansive and crisp multi-display support.
  • FAST 2.5GBE + WIFI 7 + BT 5.4 - The 2.5GbE LAN port enables more applications, such as firewalls, multichannel aggregation, soft routing, and file storage servers. Known as 802.11be, Wi-Fi 7 promises up to 46Gbps theoretical throughput, making it 4.8x faster than Wi-Fi 6 and 13x faster than Wi-Fi 5, while maintaining compatibility with older Wi-Fi versions at lower speeds. Built-in Bluetooth 5.4 connects multiple wireless devices, such as projectors, printers, monitors, and speakers, more stably and efficiently.
  • TRIPLE COOLING FANS WITH LIGHTING - Dual turbo CPU fans plus a large DDR5/SSD cooling fan deliver silent, ultra-efficient cooling (just 35dB in Quiet Mode!), while three advanced heat pipes and 360° airflow keep your Ryzen AI Max+ 395 Mini PC frosty under heavy loads. Plus, 13 dazzling RGB lighting modes let you personalize your rig's vibe at the touch of a button! Cooler. Quieter. Brighter.
  • THREE PERFORMANCE MODES - Switch seamlessly between Quiet (54W), Balanced (85W), and Performance (140W) modes with a tap of the dedicated power button, no BIOS hassle, and see instant on-screen confirmation when you shift modes. Unlock up to 96GB of VRAM (via AMD software) for next-level AI and gaming, plus enjoy Auto Power On and Wake-on-LAN (see the Wake-on-LAN sketch after this list). Power redefined.
  • SD 4.0 CARD READER - The SD/TF 4.0 card reader ensures steady, efficient data transmission and supports SD/TF 4.0 and UHS-II cards. Faster photo and video transfer speeds ensure optimal work performance.
  • GMKTEC WARRANTY - GMKtec offers a 1-year limited warranty on each mini PC, starting from the date of purchase. All defects in design and workmanship are covered. With a professional after-sales team always ready to attend to your needs, you can simply relax and enjoy your mini PC.
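
As referenced in the feature list above, here is what "running a model locally" can look like in practice: LM Studio can expose a local OpenAI-compatible HTTP server (by default on port 1234). This is a minimal sketch, assuming the server is enabled in LM Studio and a model is already loaded; the model identifier is a placeholder.

    # Query LM Studio's local OpenAI-compatible endpoint (assumes the
    # server is running on the default port with a model loaded).
    curl http://localhost:1234/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "local-model",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}]
      }'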
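
Likewise, the Wake-on-LAN feature mentioned above can be exercised from another machine on the same network with the common wakeonlan utility. A minimal sketch; the MAC address is a placeholder you would replace with the EVO-X2's wired Ethernet MAC, and WoL must also be enabled in the BIOS.

    # Send a Wake-on-LAN magic packet to the mini PC's wired MAC address.
    # Install the tool first on Debian/Ubuntu: sudo apt install wakeonlan
    wakeonlan AA:BB:CC:DD:EE:FF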

Brand: GMKtec


Operating System: Windows 11 Pro


CPU Model: Ryzen AI Max


CPU Speed: 5.1 GHz


Cache Size: 64 MB


Graphics Card Description: Integrated


Graphics Coprocessor: AMD Radeon 8060S Graphics 40Cores RDNA3.5


Memory Storage Capacity: 96 GB


Specific Uses For Product: Everyday Use, Gaming, Video Editing


Personal Computer Design Type: Mini PC


Color: Silver


Additional Features: Eight Channel LPDDR5X 8000MT/s, 96GB VRAM Allocation, HDMI 2.1, DP, 2*USB4 Quad Display Video Ports, WIFI 7, BT5.4, Metal Chassis, Power Mode Button, 13 Different Lighting Modes on Cooling Fan, Auto Power On, WOL, Fan Speed Control, Upgraded Top and Bottom Large Cooling Fans


Hard Disk Description: PCIe 4.0 M.2 2280 SSD Dual Slots Max.4TB Each Slot


Hardware Interface: 3.5mm Audio, Bluetooth 5.4, DisplayPort, Ethernet, HDMI, SDXC, USB Type-C


Power Consumption: 45 Watts


Item Dimensions: 6 x 6 x 3 inches


Item Weight: 3.64 Pounds


Video Output Interface: DisplayPort, HDMI, USB4


Hard Disk Interface: PCIe 4.0 x4 (M.2 NVMe)


Style Name: Mini Computer


Cooling Method: Air


Compatible Devices: Headphone, Keyboard, Monitor, Mouse, Television


Power Plug Type: Type B - 3 pin (North American)


Graphics Card RAM: Up to 96 GB (allocated from shared system memory)


Graphics RAM Type: Shared LPDDR5X (unified memory)


Graphics Card Interface: Integrated


Processor Series: Ryzen AI Max


Processor Speed: 5.1 GHz


Process Node: TSMC 4nm FinFET


Processor Count: 16


Total Usb Ports: 7


Total Number of HDMI Ports: 1


Number of Component Outputs: 4


Human-Interface Input: Buttons, Keyboard, Microphone, Mouse


Model Number: EVO-X2


Model Name: EVO-X2 US


What's in the Box: GMKtec NucBox EVO AMD Ryzen AI Max+ 395 Mini PC, HDMI Cable, Power Supply & Cable, User Manual


Processor Brand: AMD


Model Year: 2025


CPU Model Number: AMD Ryzen AI Max+ 395


Warranty Description: 1 year warranty


Video Processor: AMD


Manufacturer: Shenzhenshi Jimokekejiyouxiangongsi


Cache Memory Installed Size: 64 MB


RAM Memory Installed: 96 GB


RAM Memory Technology: LPDDR5X 8000 MT/s


Ram Memory Maximum Size: 128 GB


Memory Speed: 8000 MT/s


RAM Type: LPDDR5X


Display Resolution Maximum: 7680x4320


Display Type: External


Aspect Ratio: 21:9


Resolution: 3840 x 2160


Native Resolution: 3840 x 2160


Connectivity Technology: Bluetooth, Ethernet, HDMI, USB, Wi-Fi


Wireless Compatibility: 2.4 GHz Radio Frequency, 5 GHz Radio Frequency, 5.8 GHz Radio Frequency, 802.11be, Bluetooth


Wireless Technology: Bluetooth, Wi-Fi


Wireless Network Technology: Wi-Fi


Frequently asked questions

When will this item ship?

This product is currently out of stock. Please check back later for shipping info.

Can I return this product?

Yes, absolutely! You may return this product for a full refund within 30 days of receiving it.

To initiate a return, please visit our Returns Center.

View our full returns policy here.

Which payment plans are available?

  • Klarna Financing
  • Affirm Pay in 4
  • Affirm Financing
  • Afterpay Financing
  • PayTomorrow Financing
  • Financing through Apple Pay
Leasing options through Acima may also be available during checkout.

Learn more about financing & leasing here.

Top Amazon Reviews


  • Capable of running large AI models
Size: 128GB LPDDR5X + 2TB
The largest AI model I am able to load on my GMKtec EVO-X2 in the LM Studio software outputs about 8 tokens/second: Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound-inc. This mixture-of-experts AI model is also known as Qwen3-235B-A22B-Instruct-2507-128x10B-Q2_K_S-00001-of-00002.gguf. This is a 235-billion-parameter model with Q2_K_S automatic Intel quantization.

I am able to load this model with 96GB of memory dedicated to VRAM and the following model config in LM Studio 0.3.20:
GPU Offload: 80 / 94
Offload KV Cache to GPU Memory: NO (slider to the left)
Keep Model in Memory: NO (slider to the left)
All other model config settings are at their default values, including the "try mmap" option, which seems to be necessary to load this particular model.

In the magnifying lens / Runtime window, I had to select: GGUF: ROCm llama.cpp (Windows) v1.42. ROCm allows more layers to be offloaded to the GPU and allows this large model to be faster than the Llama 3.3 70B-parameter model, which outputs about 5 tokens/second with Vulkan and the test prompt, "Why is the sky blue?" The smaller models seem to run faster with Vulkan.

When you reboot the computer, you can repeatedly press the Esc key to enter the BIOS configuration, set 96GB for the graphics memory, and activate Performance mode. I learned that it is possible to activate the High Performance mode in Windows 11 by right-clicking the Windows menu and selecting Terminal (Admin). A command prompt will then open. You can then enter: powercfg -setactive SCHEME_MIN

I then went into Control Panel / Power Options to change the advanced power settings for the "High performance" plan, which was now unhidden. I set the "Turn off hard disk after" option to Never, the "Sleep after" option to Never, and the "Minimum processor state" to 100%. The output speed of Qwen3 235B increased to 8.7 to 8.8 tokens/second for the test question, "Why is the sky blue?" This output rate is faster than the rate at which I can read when I think about what the AI model is writing. Smaller AI models might output answers faster, but their answers tend to be wrong more frequently than those of larger AI models. Wrong answers delivered fast are useless to me.

Qwen3 235B sometimes outputs gibberish, such as "GGGGG..." I don't know if this is due to a memory shortage, a context window that is too small, a result of the quantization, or something else. I don't seem to have enough memory left to increase the context window size from the default 4,096. Keeping prompts short, on one line, seems to help prevent this issue.

08/04/2025 Update: I was able to increase the context window size by setting the options:
Context Length: 262144
Evaluation Batch Size: 256
Flash Attention: YES (slider to the right)
The default Evaluation Batch Size of 512 seems to cause an eventual crash, but 256 seems stable. The speed of Qwen3 235B gradually slows down as the context window fills up. After about 27,000 tokens are in the context window, the output speed drops to about 1 token/second, so 27,000 is about the practical limit for the context length unless you are patient.

I deleted OneDrive and the Recall feature. In the Edge browser, I disabled the Settings / System and performance / System / "Startup boost" option to prevent Edge from staying resident in memory all the time. I also disabled the option to "Continue running background extensions and apps when Edge is closed". I ran the msconfig program to disable the Print Spooler and some other print-related processes which I will never use on this computer.

8/5/2025 Update: I was able to load the new AI model openai/gpt-oss-120b. The GGUF files come in 2 parts:
gpt-oss-120b-MXFP4-00001-of-00002.gguf
gpt-oss-120b-MXFP4-00002-of-00002.gguf
This is a 120-billion-parameter open-source GPT model from OpenAI with MXFP4 quantization. The size listed in LM Studio is 63.39GB. In the BIOS, I had to select 64GB dedicated to graphics memory. I am still able to load Qwen3 235B, but the output speed drops to about 7 tokens/second for the test question, "Why is the sky blue?" For Qwen3 235B, it seems better to have 96GB dedicated to graphics memory. The output speed of openai/gpt-oss-120b is about 10 tokens/second for the same question.

The following is the openai/gpt-oss-120b model config in LM Studio 0.3.21:
Context Length: 30000
GPU Offload: 36 / 36
Evaluation Batch Size: 4
Offload KV Cache to GPU Memory: NO (slider to the left)
Keep Model in Memory: NO (slider to the left)
Try mmap: NO (slider to the left)
All other model config settings are at their default values. Attempting to offload the KV cache to GPU memory results in a failure to load. Attempting to activate Flash Attention results in a failure to load. Setting the mmap option to NO seems to result in much more RAM being available once the model is loaded, dropping from about 57GB used to less than 5GB used, according to the number in the lower right corner of LM Studio.

In the magnifying lens (Discover) / Runtime window, I had to select: GGUF: Vulkan llama.cpp (Windows) v1.44.0. Attempting to use ROCm llama.cpp v1.43.1 causes the openai/gpt-oss-120b model not to load. So, if I switch between loading Qwen3 235B and openai/gpt-oss-120b, I must choose the version of llama.cpp that works with each model.

8/7/2025 Update: I was occasionally getting an error in GPT after it had finished outputting a response. The error message was something like: "Unexpected end of grammar stack after receiving input: G". Repeated outputs of the letter G, such as "GGGGG...", seem to be related to the Evaluation Batch Size, which seems to be stable at a value of 4 for the GPT model. I used a variation of the following prompt and kept dividing the Evaluation Batch Size by 2 until I no longer got the error message:

Please write a story about people who wake up from suspended animation in another star system. The study of Arctic and Antarctic fish allowed scientists to develop an artificial protein that inhibits ice formation. The people left Earth due to warfare and did not have time to prepare. The ship's AI still thinks it is near Earth in the past, because that is when the AI was last updated while its atomic clock still functioned. The atomic clock was hit with shrapnel in orbit, caused by anti-satellite weapons, that was recorded as a micro-meteorite strike. Automated systems sealed the hole in the hull, but the clock remained non-functional. Any watches worn by the people in cryochambers caused frost injuries to the skin where bare metal contacted skin. The watches were broken by the cold in the cryochambers. Any watches that were stored outside the cryochambers either have depleted batteries or have not been regularly wound by movement in the case of automatic watches. Analog watch hands are halted at various time indications. An automated routine in the cryochambers wakes the crew up independently of the main AI system. Any views of the outside are through cameras and sensors which are controlled by the AI. A crew member tells the AI system to start up the main systems, but the AI system refuses to do so, because the AI refuses to accept that time has passed and thinks the crew is discussing a hypothetical scenario. The crew must either convince the AI that time has passed or somehow bypass the AI's control over the ship to survive. Bypassing the AI will result in loss of navigation and dynamic fusion containment. The AI shows the crew a picture of the Earth before they reached escape velocity as if the Earth were outside in the present. The image causes one crew member to believe that they are still on Earth. The crew member's mental illness had been well treated on Earth before the war made prescription refills impossible. The crew attempt to improvise a new clock for the AI, but the AI recognizes the inaccuracy of the clock and ignores the clock. The crew attempt to manually restart the fusion reactor but abandon the attempt when they almost lose containment. All hope seems to be lost. However, the improvised clock allows the AI's predictions, which are analogous to dreams or hallucinations in humans, to move forward in time. Previously, the AI's predictions had all been overlaid on top of each other at the same instant in time. The AI decides on its own to search for a pulsar as a time reference.

8/10/2025. I feel the need to respond to Bill Bohn's review of a computer which he apparently does not own. There is nothing misleading in my review. The numbers are what they are. If you actually had a GMKtec EVO-X2, then you could reproduce my numbers. Moonshot Kimi-K2 is simply wrong when it writes, "The reviewer is running a GMKtec EVO-X2 mini-PC that houses a Ryzen 9 8945HS (8 Zen 4 cores + Radeon 780M iGPU) plus 96 GB of system DDR5 and BIOS UMA set to 96 GB." Moonshot Kimi-K2 is incorrectly identifying the processor, the graphics unit, and the amount of memory the GMKtec EVO-X2 computer has. The processor is a Ryzen AI Max+ 395, which has 16 processor cores, not 8. The graphics processor is an integrated AMD Radeon 8060S, not a 780M. The GMKtec EVO-X2 has 128GB of total system memory, not 96GB. A maximum of 96GB of the 128GB total system memory can be allocated as VRAM. Why are you reviewing a computer that you apparently do not own and apparently know nothing about? Do not blindly accept whatever information an AI system gives to you.

8/11/2025. I have changed the previous Evaluation Batch Size from 16 to 4 for the gpt-oss-120b model. This seems to be the most stable setting so far.

8/17/2025. The gpt-oss-120b model from OpenAI now outputs 36 to 40 tokens/second for the prompt, "Why is the sky blue?" This speed increase was made possible by selecting the new ROCm llama.cpp v1.46.0 and choosing 96GB of memory to be allocated to video memory in the BIOS. Thanks to whoever made this dramatic speed increase possible. What's funny is that gpt-oss-120b tells me it is impossible for consumer-level computer hardware to achieve this speed. In the gpt-oss-120b settings, I set:
Context Length: 63000
GPU Offload: 36 / 36
Evaluation Batch Size: 256
Offload KV Cache to GPU Memory: YES (slider to the right)
Flash Attention: YES (slider to the right)
In the magnifying lens / Runtime setting, I set: GGUF: ROCm llama.cpp (Windows) v1.46.0. All other parameters are at their default settings for gpt-oss-120b. Qwen3 235B now outputs about 9 to 10 tokens/second.

The new version of ROCm llama.cpp v1.46.0 seems to allow 81 layers to be offloaded to the GPU instead of 80. My model settings for Qwen3 235B are now:
Context Length: 30000
GPU Offload: 81 / 94
Evaluation Batch Size: 128
Offload KV Cache to GPU Memory: NO (slider to the left)
Keep Model in Memory: NO (slider to the left)
Flash Attention: YES (slider to the right)
In the magnifying lens / Runtime setting, I set: GGUF: ROCm llama.cpp (Windows) v1.46.0. All other parameters are at their default settings for Qwen3 235B. Currently, there seems to be no substitute for experimenting with the various parameters to get a stable configuration in LM Studio. The parameters that AI models recommend do not work. I recommend dividing the Evaluation Batch Size by 2 if you get gibberish from one of these large AI models or if the AI model crashes.

8/26/2025. After upgrading to AMD Adrenalin 25.8.1, the gpt-oss-120b AI model stopped working with ROCm llama.cpp v1.46.0 in LM Studio 0.3.23. I am still able to use Vulkan llama.cpp v1.46.0, but at a reduced output speed of about 33 tokens/second for the test prompt, "Why is the sky blue?" You might want to set a System Restore point before upgrading the AMD Adrenalin software so that you can go back to a faster configuration if the upgrade degrades inference speed.

9/10/2025. I installed a second 4TB NVMe drive made by Western Digital after removing the rubber feet to reveal the screws. I followed the instructions of gpt-oss-120b for running a free Windows program named Rufus to make a bootable USB drive with a Debian installation *.iso file extracted to it, then booted from the drive. Everything was going well until I had to enter my wireless network password. After repeatedly entering the correct password, the installation would not advance beyond the DHCP configuration. Updating the computer's BIOS did not help. I discovered I had to select "None" as the driver for my wired network adapter; that option comes up somewhere in the non-graphical installation process. The wireless-network Debian installation then worked.

The gpt-oss-120b AI model outputs at 47 tokens/second in LM Studio 0.3.25 running in Debian Linux with the GNOME graphical interface for the prompt, "Why is the sky blue?" GNOME has an option for High Performance mode, similar to Windows, which I activated. I set the magnifying glass / Hardware / Memory Limit value to 110. The Runtime / GGUF setting is Vulkan llama.cpp (Linux) v1.50.2. For some reason, the ROCm llama.cpp v1.50.2 program does not work in Debian Linux with the EVO-X2 computer at present. The gpt-oss-120b model settings in LM Studio in Debian Linux are:
Context Length: 30000
GPU Offload: 36 / 36
Evaluation Batch Size: 127
Keep Model in Memory: NO (slider to the left)
Try mmap(): NO (slider to the left)
Flash Attention: YES (slider to the right)
All other gpt-oss-120b model parameters are at their default settings.

10/1/2025. LM Studio 0.3.27 running in Debian "Trixie" Linux automatically installed ROCm llama.cpp (Linux) v1.52.0 and selected it as the engine for GGUF, even though ROCm is currently not supported in the Debian "Trixie" edition. Vulkan llama.cpp (Linux) v1.52.0 apparently uses more memory than v1.50.2 and causes the output speed of gpt-oss-120b to plummet to around 2 tokens/second. This is a step in the wrong direction. Luckily, I could scroll down in the All tab of the Runtime window, click the "..." next to Vulkan llama.cpp (Linux), select "Manage Versions", and delete Vulkan llama.cpp v1.52.0 to make v1.50.2 the default. I also had to uncheck "Auto-update selected Runtime Extension Packs".
Reviewed in the United States on July 25, 2025 by Edward Lee
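
The Windows power-plan steps this reviewer describes can also be done entirely from an elevated command prompt. This is a minimal sketch using standard powercfg aliases; verify the commands on your own system before relying on them.

    :: Activate the hidden High performance plan (the command the reviewer uses)
    powercfg /setactive SCHEME_MIN
    :: Never turn off disks or sleep while on AC power
    powercfg /change disk-timeout-ac 0
    powercfg /change standby-timeout-ac 0
    :: Pin the minimum processor state to 100%, then re-apply the plan
    powercfg /setacvalueindex SCHEME_MIN SUB_PROCESSOR PROCTHROTTLEMIN 100
    powercfg /setactive SCHEME_MIN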

  • Blazing fast AI workstation that works perfectly with Fedora
Size: 128GB LPDDR5X + 2TB
Been running for about a month with no issues. Fantastic AI workstation, and quite a bit heavier than I expected! It booted into Windows very quickly; then, after verifying the BIOS was up to date, I wiped Windows and installed the latest beta of Fedora 44. WiFi, Ethernet, and Bluetooth were immediately recognized and connected. Ethernet is a Realtek RTL8125 and WiFi uses a MediaTek MT7925, so double-check compatibility if you're installing Linux (see the sketch after this review). Screaming fast, and it cleanly runs 90B local AI models with Open WebUI and SearXNG. If you're pushing this box to the limit, keep it well ventilated and in a cool place, because it can get hot quickly when running heavier AI operations. Otherwise it's near silent, and the CPU stays around 36°C for most operations with a really low power draw.
Reviewed in the United States on April 4, 2026 by James Kimber
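
To double-check the network hardware this reviewer flags before committing to a Linux install, you can list the PCI network devices from a live USB session. A minimal sketch:

    # Show Ethernet and wireless controllers with vendor/device IDs.
    # On this unit the reviewer reports a Realtek RTL8125 (Ethernet)
    # and a MediaTek MT7925 (Wi-Fi 7).
    lspci -nn | grep -iE 'ethernet|network'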

  • Great value for this form factor - work required if you really want to use it for AI
Size: 128GB LPDDR5X + 2TB
I bought the 128 GB configuration to run large language models locally. After a month of daily use as a dedicated AI inference server on Linux, this is a very capable mini PC, but it's not plug-and-play for this use case.

THE SHORT(ISH) VERSION

The 128 GB unified memory is the killer feature. For local AI inference, memory is everything. The Ryzen AI Max+ 395 with its integrated Radeon 8060S (40 compute units, RDNA 3.5) is fast enough to make running serious models practical, not just possible.

What I love:
- 128 GB unified memory lets me run models that would require a high-end discrete GPU elsewhere
- Dual NVMe slots: it came with a 2 TB drive and I added a Crucial T710 4 TB NVMe for model storage. 6 TB total, no external drives needed. Note: the slots are PCIe 4.0, so Gen5 drives like the T710 work but run at Gen4 speeds
- Cooling hasn't been an issue; I run it 24/7 under load with no problems
- I run it headless as a dedicated server; 2.5 GbE and SSH is all I need. Ports haven't been a concern for my use case
- The footprint is tiny. It sits on my desk under my monitors

What to know before buying:
- RAM is soldered. Get the 128 GB config. There is no upgrade path.
- If you're buying this for AI/ML on Linux, read the technical section below
- The marketing focuses on Windows AI features and the NPU; the real value for AI workloads is the GPU + unified memory on Linux

STAR RATING: 4/5. The hardware is genuinely exceptional for the price. I dock one star because the AI use case, which is the main reason to buy the 128 GB config, requires significant Linux configuration that isn't documented by GMKtec. They're selling AI-capable hardware with consumer-grade setup guidance.

FOR AI/ML USERS ON LINUX

If you're here because you want to run local LLMs, here's what I've learned running this machine 24/7 for a month. GMKtec's listing mentions LM Studio and Deepseek 70B; my results go well beyond that.

WHAT IT CAN ACTUALLY DO

Real benchmarks from my setup running llama.cpp with ROCm 7.2.0 on Ubuntu:
- Qwen3.5-35B (3.5B active MoE, Q6_K, 28 GB): 44 tokens/sec generation, 180-300 tok/s prompt processing via Vulkan
- Qwen3-Coder-Next 80B (MoE, Q4_K_M, 46 GB): 42 tokens/sec, an 80B coding model at conversational speed
- Qwen3.5-122B (10B active MoE, Q4_K_M, 75 GB): 15 tokens/sec with the HIP backend, a 122-billion-parameter model running on a mini PC
- I run whisper.cpp for speech-to-text (~1.5 GB) alongside the LLM server simultaneously with no issues
- 8 parallel inference slots with a 262K context window on the 35B model

These aren't cherry-picked numbers. This is what I get daily serving models to multiple clients on my network.

WHAT REQUIRED WORK

This is not plug-and-play for AI workloads. You need GRUB kernel parameters for ROCm stability (see the sketch after this review):
- amdgpu.cwsr_enable=0 (prevents GPU crashes)
- amd_iommu=off (IOMMU compatibility)
- amdgpu.gttsize=126976 (expands the GPU translation table to 124 GB; the stock default is much smaller)
- ttm.pages_limit=32505856 (memory management)

Without these, expect crashes and out-of-memory errors that don't make sense given the hardware specs. I spent real time figuring this out. GTT fragmentation is real; even with 124 GB configured, contiguous allocation limits mean you can't always use it all. The 122B model requires careful batch-size tuning (I dropped from 16384 to 2048) to fit. Full GPU offload (ngl=99) still OOMs on the 122B; I run it at ngl=44. Vulkan vs. HIP matters: Vulkan is ~10% faster for MoE models (the sparse ones), HIP is ~10% faster for the 122B dense-ish model. You'll want to benchmark both for your specific models.

WHO SHOULD BUY THIS

If you want to run 30-80B parameter models at usable speeds without a discrete GPU, this is a really good option in this form factor. The unified memory architecture means the GPU can access the full 128 GB, something that would require a high-end discrete GPU or multiple consumer GPUs to match. If you just want a fast mini PC for normal desktop use, you're paying a premium for 128 GB of soldered RAM you may never use. Consider the 96 GB config instead. If you want a turnkey AI appliance, this isn't it. Budget time for Linux setup and GRUB tuning. But if you're comfortable with that, this machine punches absurdly above its weight class.

COMMUNITY RESOURCES

GMKtec doesn't document the Linux AI setup, but the community does. These were invaluable for getting this machine dialed in:
- kyuz0/amd-strix-halo-toolboxes (GitHub): GRUB parameters, flash attention, no-mmap recommendations, firmware warnings
- pablo-ross/strix-halo-gmktec-evo-x2 (GitHub): EVO-X2 specific Ubuntu setup guide
- strixhalo.wiki: community performance tracking and tuning tips for Strix Halo

Start there. It'll save you hours.
Reviewed in the United States on March 19, 2026 by Mike
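
To make this reviewer's GRUB advice concrete: on Ubuntu, kernel parameters like these go in /etc/default/grub and are applied with update-grub. The llama-server line afterwards is a hypothetical launch mirroring the reviewer's 8-slot, 262K-context setup; the model filename is a placeholder.

    # /etc/default/grub (back the file up first), adding the reviewer's
    # parameters to the default kernel command line:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.cwsr_enable=0 amd_iommu=off amdgpu.gttsize=126976 ttm.pages_limit=32505856"

    # Apply the change and reboot:
    sudo update-grub

    # Hypothetical llama.cpp server launch mirroring the review's setup
    # (full GPU offload, 262144-token context, 8 parallel slots):
    llama-server -m qwen3.5-35b-q6_k.gguf -ngl 99 -c 262144 -np 8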

  • Excellent AI computer for running LLMs
Size: 128GB LPDDR5X + 2TB
Where do I start? This thing is fantastic and configurable. I have the 128 GB variant, and it's nice to be able to allocate 96GB to video RAM and still have 32GB for the system. It runs quiet unless you push it, and even then it's tolerable. AI power is where this box shines. I haven't really gamed on it, so I can't speak to that. When desk space is at a premium and you've outgrown your big RGB-lit, liquid-cooled case with a 1200-watt power supply chugging along at a moderate 400 watts, you get the fastest consumer-grade AI micro computer on the market that will put the giant to shame. Throw in a 4TB NVMe on top of the 2TB it comes with and you have a powerhouse of a box, IMO. It's been about 8 years since the last computer I built.
Reviewed in the United States on January 13, 2026 by JBogs
