Company News

Big Blue’s Redbook on Storage Scale KV Cache management

The IBM Storage Scale parallel file system supports distributed KV cache management when paired with NVIDIA Dynamo, targeting large-scale AI inference workloads with very long contexts.

IBM has released an official Redbook titled "Context Without Limits: A High-Performance KV Cache Platform for Large-Scale AI Inference", which presents a complete, validated reference architecture for this joint solution. The integrated stack combines Supermicro Petascale Storage Servers, NVIDIA Spectrum-X Ethernet networking, and IBM Storage Scale Erasure Coding Edition (ECE) to build a high-performance shared storage tier for AI inference. IBM Redbooks are technical documents published by IBM ITSO (International Technical Support Organization) that offer hands-on, in-depth deployment guidance for enterprise-grade IBM infrastructure products.

Co-authored by engineering teams from IBM, Supermicro and NVIDIA, the Redbook addresses a core pain point of long-context AI workloads. Use cases such as multi-turn dialogue assistants, RAG retrieval applications and autonomous agent pipelines generate large volumes of KV cache data in GPU HBM. Once cached data is evicted from the limited HBM, it has to be recomputed the next time it is needed, which sharply increases latency. Persisting the KV cache across requests avoids that recomputation.
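To illustrate why persisting the KV cache across requests pays off, here is a minimal sketch. It assumes a hypothetical scheme in which a prompt is split into fixed-size token blocks and each block is keyed by a hash chained over everything before it. The names `BLOCK_SIZE`, `block_keys` and `reusable_prefix_tokens` are illustrative only and are not taken from the Redbook or from NVIDIA Dynamo.

```python
import hashlib
from typing import List, Set

BLOCK_SIZE = 4  # hypothetical block size, kept tiny so the example is easy to follow


def block_keys(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one key per full block; each key depends on the whole prefix."""
    keys: List[str] = []
    running = hashlib.sha256()
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        running.update(",".join(map(str, block)).encode() + b";")
        keys.append(running.hexdigest())
    return keys


def reusable_prefix_tokens(tokens: List[int], stored: Set[str],
                           block_size: int = BLOCK_SIZE) -> int:
    """Count the leading tokens whose KV blocks are already in the store."""
    hit = 0
    for key in block_keys(tokens, block_size):
        if key not in stored:
            break  # the first miss ends the reusable prefix
        hit += block_size
    return hit


# First turn of a conversation: 8 tokens, so 2 full blocks are stored.
turn1 = [1, 2, 3, 4, 5, 6, 7, 8]
store = set(block_keys(turn1))

# Second turn extends the same history with 5 new tokens.
turn2 = turn1 + [9, 10, 11, 12, 13]
reused = reusable_prefix_tokens(turn2, store)
to_compute = len(turn2) - reused
print(reused, to_compute)
```

In this toy run the second turn reuses the blocks covering the first turn and only needs to compute the new tokens. If the stored blocks had been evicted with nowhere to persist, the whole history would have to be recomputed.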

The solution adopts a five-tier hierarchical KV cache architecture covering different latency and capacity demands:

G1 Layer: GPU node local HBM
G2 Layer: CPU node system DRAM
G3 Layer: Direct-attached local SSD
G3.5 Layer: Pod-level shared flash storage, fronted by NVIDIA BlueField DPUs with direct interconnection to GPU server DPUs
G4 Layer: External cross-Ethernet shared storage pool connected to all GPU compute servers

Spanning the full memory and storage hierarchy, this multi-tier design offers a graded trade-off between latency and capacity. It lets NVIDIA Dynamo place, evict and reload cache data across the whole storage stack, adapting to different workload access patterns and infrastructure cost budgets.
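The placement, eviction and reload behavior described above can be sketched as a small tiered cache. This is a toy model under stated assumptions, not NVIDIA Dynamo's actual policy: the tier names follow the list above, but the capacities, the least-recently-used demotion rule and the `TieredKVCache` class are all hypothetical.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

# Tier names follow the article; capacities (in blocks) are made-up values.
TIERS: List[Tuple[str, int]] = [
    ("G1", 2), ("G2", 2), ("G3", 2), ("G3.5", 4), ("G4", 1000),
]


class TieredKVCache:
    """Toy cache: new and reloaded blocks go to G1, LRU blocks are demoted."""

    def __init__(self, tiers: List[Tuple[str, int]] = TIERS) -> None:
        self.names = [name for name, _ in tiers]
        self.caps = [cap for _, cap in tiers]
        self.levels = [OrderedDict() for _ in tiers]

    def _insert(self, level: int, key: str, value: bytes) -> None:
        """Insert at `level`; if the tier overflows, demote its LRU entry."""
        tier = self.levels[level]
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > self.caps[level]:
            old_key, old_value = tier.popitem(last=False)
            if level + 1 < len(self.levels):
                self._insert(level + 1, old_key, old_value)
            # Past the last tier the block is dropped and must be recomputed.

    def _pop(self, key: str) -> Optional[Tuple[bytes, str]]:
        """Remove `key` from whichever tier holds it."""
        for level, tier in enumerate(self.levels):
            if key in tier:
                return tier.pop(key), self.names[level]
        return None

    def where(self, key: str) -> Optional[str]:
        """Return the name of the tier currently holding `key`, if any."""
        for level, tier in enumerate(self.levels):
            if key in tier:
                return self.names[level]
        return None

    def put(self, key: str, value: bytes) -> None:
        self._pop(key)  # avoid duplicate copies in lower tiers
        self._insert(0, key, value)

    def get(self, key: str) -> Optional[Tuple[bytes, str]]:
        """Return (value, tier it was found in) and promote the block to G1."""
        found = self._pop(key)
        if found is None:
            return None
        value, name = found
        self._insert(0, key, value)
        return value, name


cache = TieredKVCache()
for i in range(5):
    cache.put(f"blk{i}", b"kv")

before = cache.where("blk0")        # oldest block has been demoted
value, hit_tier = cache.get("blk0")  # reload it on access
after = cache.where("blk0")
print(before, hit_tier, after)
```

A real system would weigh transfer cost against recomputation cost when deciding where to place a block. The sketch only shows the basic mechanics of demotion down the tiers and promotion on access.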

Deployed on Supermicro Petascale Storage Servers, Storage Scale ECE serves as the G4 cold cache tier. It is optimized for non-latency-sensitive KV cache data, including inactive multi-turn conversation states, shared agent context data and historical query records that do not require instant response.

According to test results recorded in the Redbook, this production-ready reference architecture accelerates generative AI and agentic AI inference services. In single-request TTFT (Time To First Token) tests against standalone GPU servers without an external Storage Scale KV cache, the integrated system keeps TTFT stable as prompt length grows. It achieves a 56x speedup on 130k-token input sequences and, per the Redbook, eliminates the latency fluctuations caused by longer prompts.

Under concurrent multi-user inference load, request throughput rises from 0.19 RPS to 4.26 RPS, a 22x improvement. The total processing time for 200 inference requests drops by 95%, which improves GPU utilization and the scalability of the inference cluster.

The stack also holds up under noisy-neighbor stress tests. With four clients generating a sustained 200 GB/s of competing network I/O traffic, the integrated system still runs at 3.6 RPS and finishes all 200 inference requests within 55.56 seconds. Its throughput remains 18x higher than the baseline GPU-only recomputation architecture.
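The reported figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above and assumes that total processing time equals the request count divided by sustained throughput, which is a simplification and not a method stated in the Redbook.

```python
REQUESTS = 200
BASELINE_RPS = 0.19  # GPU-only recomputation baseline
CACHED_RPS = 4.26    # with the external Storage Scale KV cache
NOISY_RPS = 3.6      # same stack under competing network I/O traffic


def total_seconds(requests: int, rps: float) -> float:
    """Time to finish `requests` at a steady `rps` (simplifying assumption)."""
    return requests / rps


baseline_s = total_seconds(REQUESTS, BASELINE_RPS)
cached_s = total_seconds(REQUESTS, CACHED_RPS)
noisy_s = total_seconds(REQUESTS, NOISY_RPS)

speedup = CACHED_RPS / BASELINE_RPS
noisy_speedup = NOISY_RPS / BASELINE_RPS
time_cut = 1 - cached_s / baseline_s

print(f"baseline: {baseline_s:.1f} s, cached: {cached_s:.1f} s, noisy: {noisy_s:.2f} s")
print(f"speedup: {speedup:.1f}x, noisy speedup: {noisy_speedup:.1f}x")
print(f"time reduction: {time_cut:.1%}")
```

Under that assumption the numbers are mutually consistent: 200 requests at 3.6 RPS take about 55.56 seconds, the throughput ratio is a little over 22x, and the time reduction is a little over 95%. The noisy-neighbor ratio works out to just under 19x, which the article reports as 18x.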

The research team concluded in the Redbook: “For enterprises aiming to maximize ROI on expensive GPU hardware investments, this verified integrated architecture provides a straightforward, production-ready approach to boosting inference throughput, cutting end-to-end latency, supporting higher service concurrency, and building more cost-effective large-scale AI inference infrastructure.”

Keywords: SUPERMICRO, IBM Storage Scale, NVIDIA Dynamo

Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang/Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com/www.storagesserver.com
Business Focus:
ICT Product Distribution/System Integration & Services/Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World”
Your Trusted ICT Product Service Provider!

Pub Time: 2026-06-12 11:09:46
