Two traits define the NVIDIA DGX Spark: 128GB of unified memory in a $4,000 desktop unit, and built-in 200Gb datacenter-grade networking. The high-speed fabric sets it apart from ordinary workstations, enabling the kind of multi-node clustering once exclusive to rack-mounted servers. This review benchmarks distributed inference on two-node 200GbE clusters built from Dell, GIGABYTE, and HP Spark variants, across a range of models and workloads. It also examines pipeline parallelism (PP), an alternative model-splitting strategy that outperforms NVIDIA's default tensor parallelism (TP) in most batched scenarios.
200Gb Network Fabric
Each Spark carries two QSFP56 cages backed by an integrated ConnectX-7 SmartNIC. Because the NIC is limited by PCIe Gen5 x4 bandwidth, usable network speed caps at 200Gb, and a single port is enough to reach full bandwidth; the second port adds topology flexibility. Three configurations are common: a direct Spark-to-Spark 200Gb link, a switch-free ring topology via the dual 100Gb ports, and hybrid clustering with NVMe-oF for high-speed storage access. NVIDIA sells single-unit desktops, validated two-node clusters, and newly released four-node setups. The dual-Spark configuration is the most practical for production-style inference and the focus of this test.
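Whatever the topology, the practical question is how much of the 200Gb fabric a workload can actually use. Before benchmarking, a plain TCP throughput probe between the two nodes is a quick sanity check; below is a minimal sketch with hypothetical fabric-side addresses. A single TCP stream will fall well short of line rate, so treat the result as a floor; RDMA-level tooling is needed to approach the full 200Gb figure.

```python
# Minimal point-to-point TCP throughput probe between two Spark nodes.
# Hypothetical addresses; run with --serve on one node, plain on the other.
# One TCP stream will not saturate a 200Gb link -- this is a sanity check,
# not a substitute for RDMA-level measurement.
import socket
import sys
import time

HOST, PORT = "192.168.100.2", 5201   # hypothetical fabric-side IP of node 2
CHUNK = 4 * 1024 * 1024              # 4 MiB send buffer
SECONDS = 10

def serve() -> None:
    # Receiver side: count bytes until the sender closes the connection.
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            total = 0
            while data := conn.recv(CHUNK):
                total += len(data)
            print(f"received {total / 1e9:.2f} GB")

def send() -> None:
    # Sender side: push zero-filled buffers for a fixed wall-clock window.
    buf = b"\x00" * CHUNK
    with socket.create_connection((HOST, PORT)) as s:
        sent, start = 0, time.time()
        while time.time() - start < SECONDS:
            s.sendall(buf)
            sent += CHUNK
        elapsed = time.time() - start
    print(f"{sent * 8 / elapsed / 1e9:.1f} Gb/s over one TCP stream")

if __name__ == "__main__":
    serve() if "--serve" in sys.argv else send()
```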
Rationale for Spark Clustering
The primary benefit is expanded model capacity: two linked Sparks can run 120B-parameter models that exceed a single unit's memory. Just as importantly, the platform works as an affordable educational tool. NVIDIA positions Spark as a way for newcomers to learn AI workflows, with official guides covering model deployment, fine-tuning, and PyTorch/JAX development; a dual-node cluster extends that curriculum to multi-node parallelism and network bottleneck analysis without costly datacenter hardware. Notably, Spark is not optimized for production inference. It is constrained by memory bandwidth and inter-node latency, and its 200GbE link is far slower than internal PCIe connections. Larger clusters suffer severe performance degradation and low token throughput, limiting them to educational use rather than commercial serving.
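The capacity argument is easy to check with back-of-envelope arithmetic: weight storage alone is parameter count times bytes per parameter, before any KV cache or activation overhead. A quick sketch, with illustrative precisions only; actual footprints vary with quantization format and runtime overhead:

```python
# Back-of-envelope weight footprint for a 120B-parameter model versus
# one Spark's 128GB unified memory. Illustrative only: real deployments
# add KV cache, activations, and runtime overhead on top of weights.
PARAMS = 120e9
CAPACITY_GB = 128

for name, bytes_per_param in [("FP16", 2.0), ("FP8", 1.0), ("FP4", 0.5)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    nodes = -(-weights_gb // CAPACITY_GB)  # ceiling division
    print(f"{name}: {weights_gb:.0f} GB of weights -> >= {nodes:.0f} node(s)")

# FP16: 240 GB -> two nodes for the weights alone. Lower precisions fit one
# node on paper, but KV cache at long context can still push past 128GB.
```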
Performance Testing: PP vs TP
Parallelism Strategy Selection
NVIDIA defaults to TP, which splits every transformer layer across the two GPUs and requires frequent all-reduce exchanges. PP instead divides the model by layer, so activations cross between nodes only once per forward pass. On a 200GbE link, PP therefore minimizes cross-node communication. For large models at high batch sizes, PP vastly outperforms TP; TP excels only in single-request, low-latency chat scenarios.
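The gap has a simple communication-volume explanation: with TP=2, cross-node traffic scales with layer count, while with PP=2 only the activations at the single stage boundary cross the wire. A rough per-token estimate, using illustrative model dimensions (hypothetical, not the exact GPT-OSS-120B geometry):

```python
# Rough per-token cross-node traffic: TP=2 vs PP=2.
# Dimensions are illustrative, not the exact GPT-OSS-120B geometry.
HIDDEN = 8192              # hypothetical hidden size
LAYERS = 80                # hypothetical layer count
BYTES = 2                  # BF16 activations
ALLREDUCES_PER_LAYER = 2   # one after attention, one after the MLP block

# TP=2: each all-reduce over a [1, HIDDEN] activation moves roughly the
# tensor's size across the link per rank (reduce-scatter + all-gather).
tp_bytes = LAYERS * ALLREDUCES_PER_LAYER * HIDDEN * BYTES

# PP=2: one point-to-point transfer of the boundary activation per token.
pp_bytes = HIDDEN * BYTES

print(f"TP=2: ~{tp_bytes / 1e6:.1f} MB/token   PP=2: ~{pp_bytes / 1e3:.1f} KB/token")
print(f"ratio: ~{tp_bytes / pp_bytes:.0f}x more cross-node traffic under TP")
```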
Tests on GPT-OSS-120B confirm this gap. At batch size 128, PP reaches 554.69 tok/s in balanced workloads, 2.20× faster than TP, and 310.63 tok/s versus 164.99 tok/s in prefill-heavy tasks; TP leads only at batch size 1. For small models like Llama-3.1-8B, the per-layer computation is light enough that TP dominates most batch sizes, with PP overtaking it only at high concurrency.
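The review does not name its serving stack, but the trade-off is easy to reproduce in any engine that exposes both knobs. As a hedged sketch, vLLM's offline API accepts tensor_parallel_size and pipeline_parallel_size; multi-node runs additionally require a Ray cluster spanning both Sparks, and the setup details vary by version:

```python
# Hedged sketch: comparing TP and PP placements with vLLM's offline API.
# Assumes a Ray cluster already spans both Spark nodes; that setup is
# version-dependent and omitted here. Run one engine at a time.
from vllm import LLM, SamplingParams

prompts = ["Explain pipeline parallelism in one paragraph."] * 64  # batch of 64

# Tensor parallelism: every layer split across the two nodes,
# with all-reduce traffic on each layer.
llm = LLM(model="openai/gpt-oss-120b", tensor_parallel_size=2)

# Pipeline parallelism: the first half of the layers on node 1, the rest
# on node 2, with a single activation hand-off per microbatch.
# llm = LLM(model="openai/gpt-oss-120b", pipeline_parallel_size=2)

outputs = llm.generate(prompts, SamplingParams(max_tokens=256))
print(f"generated {sum(len(o.outputs[0].token_ids) for o in outputs)} tokens")
```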
Multi-Model Benchmark Results (PP=2)
GPT-OSS Series
For GPT-OSS-120B, HP topped peak throughput in balanced (504.88 tok/s) and prefill-heavy (441.63 tok/s) workloads; GIGABYTE led decode-heavy tests (494.37 tok/s). For GPT-OSS-20B, Dell dominated balanced (976.77 tok/s) and prefill-heavy (852.39 tok/s) scenarios, while GIGABYTE led decode tasks (945.55 tok/s).
Llama 3.1 8B Variants
In BF16 precision, Dell led balanced (689.53 tok/s) and decode-heavy (581.43 tok/s) workloads; GIGABYTE won prefill-heavy tests (539.27 tok/s). FP4 optimization boosted throughput sharply: GIGABYTE led balanced (1458.86 tok/s) and prefill-heavy (954.23 tok/s) tasks. For FP8, Dell maintained narrow leads in balanced (1105.42 tok/s) and decode-heavy (862.33 tok/s) scenarios.
Mistral & Qwen Models
Mistral Small 3.1 24B showed minimal gaps between vendors: GIGABYTE peaked at 255.09 tok/s in balanced workloads. For Qwen3 Coder 30B (A3B Base), GIGABYTE led prefill-heavy tasks (1862.40 tok/s) while Dell excelled in decode scenarios; under FP8 quantization, GIGABYTE topped prefill-heavy throughput (3088.62 tok/s) and Dell led decode tasks (705.77 tok/s).
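The three workload labels used above and in the table below map to input/output sequence-length (ISL/OSL) mixes. The exact lengths used in these tests are not stated, so the values in this sketch are hypothetical; it shows how such scenarios are typically parameterized and how a tok/s figure is derived:

```python
# Hypothetical ISL/OSL mixes for the three scenario labels; the review
# does not state its exact lengths, so these values are illustrative.
SCENARIOS = {
    "Equal ISL/OSL": (1024, 1024),  # balanced prefill and decode
    "Prefill Heavy": (4096, 256),   # long prompts, short completions
    "Decode Heavy":  (256, 4096),   # short prompts, long completions
}
BATCH = 64  # matching the table's BS = 64

def throughput(osl_tokens: int, batch: int, wall_seconds: float) -> float:
    """Output tokens per second across the whole batch."""
    return batch * osl_tokens / wall_seconds

for name, (isl, osl) in SCENARIOS.items():
    # wall_seconds would come from timing a real engine run at this mix
    print(f"{name}: ISL={isl}, OSL={osl}, "
          f"e.g. 60s wall -> {throughput(osl, BATCH, 60.0):.0f} tok/s")
```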
Dual Spark Systems Peak Output Summary
| Model | Scenario (BS = 64) | Dell Peak Output | GIGABYTE Peak Output | HP Peak Output |
|---|---|---|---|---|
| GPT-OSS-120B | Equal ISL/OSL | 463.97 tok/s | 497.26 tok/s | 504.88 tok/s |
| GPT-OSS-120B | Prefill Heavy | 419.56 tok/s | 417.34 tok/s | 441.63 tok/s |
| GPT-OSS-120B | Decode Heavy | 451.18 tok/s | 494.37 tok/s | 474.85 tok/s |
| GPT-OSS-20B | Equal ISL/OSL | 976.77 tok/s | 952.31 tok/s | 915.72 tok/s |
| GPT-OSS-20B | Prefill Heavy | 852.39 tok/s | 802.37 tok/s | 757.05 tok/s |
| GPT-OSS-20B | Decode Heavy | 938.65 tok/s | 945.55 tok/s | 865.78 tok/s |
| Llama-3.1-8B-Instruct | Equal ISL/OSL | 689.53 tok/s | 687.48 tok/s | 618.87 tok/s |
| Llama-3.1-8B-Instruct | Prefill Heavy | 515.45 tok/s | 539.27 tok/s | 463.39 tok/s |
| Llama-3.1-8B-Instruct | Decode Heavy | 581.43 tok/s | 576.91 tok/s | 531.07 tok/s |
| Llama-3.1-8B-FP4 | Equal ISL/OSL | 1427.39 tok/s | 1458.86 tok/s | 1413.51 tok/s |
| Llama-3.1-8B-FP4 | Prefill Heavy | 884.22 tok/s | 954.23 tok/s | 843.57 tok/s |
| Llama-3.1-8B-FP4 | Decode Heavy | 1008.98 tok/s | 1007.23 tok/s | 943.73 tok/s |
| Llama-3.1-8B-FP8 | Equal ISL/OSL | 1105.42 tok/s | 1089.85 tok/s | 1076.68 tok/s |
| Llama-3.1-8B-FP8 | Prefill Heavy | 759.50 tok/s | 827.40 tok/s | 725.51 tok/s |
| Llama-3.1-8B-FP8 | Decode Heavy | 862.33 tok/s | 855.81 tok/s | 800.78 tok/s |
| Mistral-Small-3.1-24B | Equal ISL/OSL | 249.77 tok/s | 255.09 tok/s | 239.09 tok/s |
| Mistral-Small-3.1-24B | Prefill Heavy | 216.01 tok/s | 214.38 tok/s | 197.92 tok/s |
| Mistral-Small-3.1-24B | Decode Heavy | 238.44 tok/s | 237.97 tok/s | 221.41 tok/s |
Conclusion
Dell, GIGABYTE, and HP Spark units deliver negligible performance gaps, with only minor workload-specific leads. Purchase decisions should therefore prioritize chassis design, thermal performance, warranty, and after-sales support over trivial benchmark differences. Parallelism strategy matters far more than OEM choice: PP outperforms TP for batched inference, while TP suits single-stream, low-latency interaction, so NVIDIA's TP default aligns with Spark's positioning as an interactive learning device rather than production infrastructure. A dual-node Spark cluster remains an affordable teaching platform for distributed AI. Future tests will cover larger clusters and end-to-end small-model training once the lab's 800Gb switch is deployed.
Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang / Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com / www.storagesserver.com
Business Focus: ICT Product Distribution / System Integration & Services / Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World.” Your Trusted ICT Product Service Provider!