MinIO adds petabyte-scale MemKV cache for Nvidia GPU inference

MinIO has developed a petabyte-scale MemKV caching system tailored for Nvidia GPUs, deployed on top of its AIStor object storage platform.

GPU clusters running inference require high-bandwidth memory (HBM) to store context, vectorized tokens and intermediate key-value (KV) pairs. Once GPU HBM is saturated, data cascades down to CPU DRAM and NVMe SSDs, managed by Nvidia BlueField-4 (BF4) DPUs. When these tiers reach capacity, MinIO AIStor acts as the final storage tier. Nvidia’s STX architecture governs this multi-layer cache hierarchy, and MemKV complies with that standard to deliver persistent, shared context across GPU clusters at scale.
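Conceptually, the tier cascade described above behaves like a waterfall lookup with demotion on eviction and promotion on lower-tier hits. The sketch below is illustrative only; the tier names, LRU policy and capacities are assumptions for exposition, not MinIO's or Nvidia's actual implementation:

```python
from collections import OrderedDict

class Tier:
    """One level of the cache hierarchy (e.g. HBM, DRAM, NVMe, object store)."""
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity          # max number of KV blocks held
        self.blocks = OrderedDict()       # LRU order: oldest entry first

    def get(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as recently used
            return self.blocks[key]
        return None

    def put(self, key, value):
        """Insert a block; return any evicted (key, value) pair."""
        self.blocks[key] = value
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            return self.blocks.popitem(last=False)  # evict LRU block
        return None

class TieredKVCache:
    """Waterfall lookup across tiers, ordered fastest to slowest."""
    def __init__(self, tiers):
        self.tiers = tiers

    def get(self, key):
        for i, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                if i > 0:
                    tier.blocks.pop(key)   # remove from the lower tier
                    self.put(key, value)   # promote hit back to the top
                return value
        return None                        # full miss: caller must recompute

    def put(self, key, value):
        evicted = self.tiers[0].put(key, value)
        for tier in self.tiers[1:]:        # cascade evictions downward
            if evicted is None:
                break
            evicted = tier.put(*evicted)
```

A full miss at the bottom of the waterfall is what forces the context recomputation the article discusses next.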



AB Periasamy, MinIO co-founder and co-CEO, commented: “The industry has been papering over context loss for years because, at small scale, you can absorb the recompute tax. At today’s GPU densities in hyperscalers and neoclouds, this is no longer viable. Recomputing generated context wastes power; for clusters with thousands of GPUs, it creates a fundamental structural inefficiency. Large-scale inference requires purpose-built infrastructure, and MemKV is designed specifically for this data path.”

For the first time, MinIO enables shared context pools for entire GPU clusters at microsecond-level latency that matches inference workflows, avoiding the millisecond delays of conventional external storage. Without sufficient cache tiers, GPUs waste cycles repeatedly recalculating context.

In a 128-GPU deployment with 128K-token context length, MemKV improved time-to-first-token under production loads and boosted GPU utilization from 50% to over 90%, generating an estimated $2 million annual compute cost saving.
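As a back-of-envelope check of the figures above: if the same workload runs at 90% utilization instead of 50%, it needs roughly 50/90 of the cluster-hours. The GPU hourly rate below is an illustrative assumption, not a number from MinIO:

```python
gpus = 128
cost_per_gpu_hour = 4.0            # assumed blended $/GPU-hour, illustrative
hours_per_year = 24 * 365

annual_spend = gpus * cost_per_gpu_hour * hours_per_year
util_before, util_after = 0.50, 0.90
# Same useful work at higher utilization needs util_before/util_after
# of the cluster-hours, so the rest of the spend is saved.
effective_spend = annual_spend * (util_before / util_after)
saving = annual_spend - effective_spend
print(f"annual spend:   ${annual_spend:,.0f}")
print(f"implied saving: ${saving:,.0f}")
```

Under these assumptions the implied saving lands near $2 million per year, consistent with the article's estimate.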

Purpose-built for Nvidia STX architecture, MemKV supports Nvidia Dynamo and NIXL caching tools. It delivers petabytes of shared context memory at SSD-level costs, decoupling cache scaling from GPU compute resources. Its core features are listed below:
  • Native BF4 STX support: Runs as an ARM64 binary within STX infrastructure, embedded in storage rather than separate x86 storage servers.
  • End-to-end RDMA transport: Transfers KV cache between GPU memory and NVMe via RDMA, bypassing conventional file and object storage protocols.
  • GPU-optimized block size: Uses 2–16 MB blocks for GPU throughput demands, instead of legacy 4 KB storage blocks.
  • Wire-speed performance: Optimized for Nvidia Spectrum-X Ethernet and PCIe Gen6 to maximize physical fabric throughput.
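The block-size point above can be made concrete: at a fixed line rate, the number of I/O operations needed scales inversely with block size, which is why legacy 4 KB blocks are a poor fit for GPU-scale throughput. The 400 Gb/s figure is an illustrative Spectrum-X-class link speed, not a published MemKV specification:

```python
def iops_needed(bandwidth_gbps: float, block_bytes: int) -> float:
    """I/O operations per second required to sustain a given line rate."""
    bytes_per_sec = bandwidth_gbps * 1e9 / 8   # bits/s -> bytes/s
    return bytes_per_sec / block_bytes

line_rate = 400  # Gb/s, illustrative Spectrum-X-class link
for size in (4 * 1024, 2 * 1024**2, 16 * 1024**2):
    print(f"{size:>10} B blocks -> {iops_needed(line_rate, size):>12,.0f} IOPS")
```

With 4 KB blocks a single such link needs over ten million IOPS to stay full; with 2–16 MB blocks the same bandwidth takes only thousands of operations per second.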


MemKV directly transfers data from NVMe SSDs to AI pipelines over RDMA, eliminating HTTP overhead, file system translation and intermediate storage servers.



MinIO categorizes rival context memory solutions into two types: non-sharable local NVMe (G3) and general-purpose shared storage (G4). It positions MemKV as a purpose-built G3.5 tier, distinguishing it from generic storage products.

The firm emphasizes that legacy vendors’ G3.5 offerings still retain redundant protocol nodes, metadata services and file translation layers. These layers ensure durability and consistency for training data and model weights, yet they are unnecessary for ephemeral, recomputable KV cache optimized for 2–16 MB data blocks.
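The distinction drawn above is that KV cache is ephemeral and recomputable: a lost entry is a cost, never an error, so durability machinery is dead weight. A minimal sketch of that access pattern (the class and names are hypothetical, for illustration only):

```python
class EphemeralKVCache:
    """Best-effort store for recomputable KV blocks: no replication,
    no fsync, no consistency protocol — a dropped entry only costs time."""
    def __init__(self):
        self.store = {}
        self.recomputes = 0

    def get_or_recompute(self, key, recompute):
        if key not in self.store:
            self.recomputes += 1
            self.store[key] = recompute(key)  # pay the prefill cost once
        return self.store[key]

cache = EphemeralKVCache()
prefill = lambda k: k.upper()           # stand-in for expensive context recompute
cache.get_or_recompute("ctx-1", prefill)  # miss: recomputes
cache.get_or_recompute("ctx-1", prefill)  # hit: served from cache
print(cache.recomputes)  # 1
```

Training data and model weights cannot use this pattern, because losing them is unrecoverable; that asymmetry is the argument for stripping the durability layers.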

Hardware RAID vendor GRAID and storage firm WEKA also provide STX-compatible KV cache solutions. A broad range of storage vendors support Nvidia STX, including Cloudian, Dell, DDN, Everpure, Hammerspace, Hitachi Vantara, HPE, Lightbits/ScaleFlux, NetApp, Nutanix, Peak:AIO, Pliops and VAST Data.

Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang/Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com / www.storagesserver.com
Business Focus:
ICT Product Distribution/System Integration & Services/Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World.” Your Trusted ICT Product Service Provider!
Pub Time: 2026-05-14 13:46:14