
MinIO Introduces MemKV for Petabyte-Scale AI Inference Memory



May 15, 2026
MinIO has released MemKV, a dedicated context memory store built to resolve a critical bottleneck within large-scale AI inference pipelines. Serving as MinIO’s second flagship solution alongside AIStor, MemKV expands the firm’s data infrastructure into the memory tier. It is engineered to deliver persistent, shared contextual data for agentic AI workloads running on distributed GPU clusters.


MinIO AIStor


As AI systems advance from one-off replies to multi-turn reasoning and automated task execution, sustaining context across inference cycles becomes essential. Under existing architectures, context data is often discarded because GPU-adjacent memory tiers such as HBM and DRAM have limited capacity. This forces GPUs to recompute previously generated context over and over, driving up latency, compute usage, and power draw. MinIO calls this redundant workload the "recompute tax", an inefficiency that compounds rapidly as clusters grow in hyperscale cloud environments.
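The mechanism behind avoiding the recompute tax can be sketched with a toy in-process cache. This is not MemKV's API; the class and function names below are hypothetical, and a string stands in for the KV-cache blocks a real system would persist:

```python
import hashlib

class ContextStore:
    """Toy stand-in for a shared context store.

    Keys are hashes of the token prefix; values stand in for the
    serialized KV-cache blocks a real system would persist.
    """

    def __init__(self):
        self._blocks = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prefix_tokens):
        # Hash the prefix so identical contexts map to the same entry.
        raw = ",".join(map(str, prefix_tokens)).encode()
        return hashlib.sha256(raw).hexdigest()

    def get(self, prefix_tokens):
        block = self._blocks.get(self._key(prefix_tokens))
        if block is None:
            self.misses += 1
        else:
            self.hits += 1
        return block

    def put(self, prefix_tokens, kv_block):
        self._blocks[self._key(prefix_tokens)] = kv_block


def run_inference(store, prefix_tokens):
    """Reuse cached context when available; otherwise 'recompute' it."""
    cached = store.get(prefix_tokens)
    if cached is not None:
        return cached  # pay only the lookup cost, not GPU recompute
    kv_block = f"kv-for-{len(prefix_tokens)}-tokens"  # stands in for GPU work
    store.put(prefix_tokens, kv_block)
    return kv_block


store = ContextStore()
run_inference(store, [101, 7, 42])  # first turn: recompute and persist
run_inference(store, [101, 7, 42])  # second turn: served from the store
print(store.hits, store.misses)     # → 1 1
```

Without the store, the second call would redo the GPU work for the same prefix; with it, only the first call pays. MemKV's claim is essentially this pattern at petabyte scale with microsecond lookups.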

MemKV addresses this pain point with a shared, persistent memory layer that offers petabyte-scale capacity at microsecond access latency. By retaining contextual data across inference workflows, the platform reduces redundant computation and improves overall infrastructure efficiency. Benchmarks published by MinIO show improved time-to-first-token latency under production-grade concurrency: in a representative deployment with 128 GPUs and 128K-token context windows, GPU utilization rose from roughly 50% to over 90%, which MinIO says translates into substantial annual compute savings.
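A back-of-envelope calculation shows what that utilization jump implies. Only the 128-GPU, 50%-to-90% figures come from MinIO's benchmark; the framing in terms of "effective GPUs" is an illustration:

```python
# How the quoted utilization gain affects the GPU count needed
# for the same useful work. Illustrative arithmetic only.
gpus = 128
util_before, util_after = 0.50, 0.90

useful_before = gpus * util_before        # 64 "effective" GPUs of work
# GPUs needed to deliver the same useful work at 90% utilization:
gpus_needed_after = useful_before / util_after
print(round(gpus_needed_after, 1))        # → 71.1
# i.e. roughly 57 of the 128 GPUs (about 44%) are freed for other work.
print(round(gpus - gpus_needed_after))    # → 57
```

At cluster scale, freeing ~44% of GPU capacity is where the "substantial annual compute cost reductions" claim comes from.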

MinIO’s executives stated that recompute overhead remains unnoticeable in small-scale deployments yet turns into a fundamental structural flaw at enterprise scale. As GPU clusters expand, repeated context regeneration incurs higher power consumption and infrastructure expenses, making specialized memory systems indispensable for sustainable AI operation.

Addressing the Memory-Scale Tradeoff


Legacy AI infrastructure forces developers to trade off access speed against capacity. High-performance memory tiers such as HBM and DRAM deliver nanosecond-scale latency but have tight capacity limits and high cost per gigabyte. Conventional storage systems, in contrast, scale to massive capacity but incur millisecond-level latency, which is too slow for real-time inference and long-context reasoning tasks.


Micron HBM4


MemKV bridges this gap with an intermediate shared memory tier that balances ultra-low latency with large-scale capacity. Running natively on NVIDIA BlueField-4 STX and integrating with NVIDIA Dynamo and the NIXL transfer library, it lets entire GPU clusters access unified pools of contextual data at inference-aligned speeds. This design avoids frequent migration of context data between isolated memory and storage layers, lowering latency and raising system throughput.

NVIDIA BlueField-4


Architecture Optimized for Inference Workloads


Tailored exclusively for inference data pipelines, MemKV fits into the G3.5 layer of MinIO’s GPU memory hierarchy framework. Built on NVMe storage infrastructure, it achieves petabyte-level capacity while retaining microsecond access latency, successfully decoupling memory scalability from GPU compute resources.

The system abandons traditional storage abstractions, transferring data straight from NVMe drives to AI data pipelines over end-to-end RDMA. This eliminates the overhead introduced by HTTP protocols, file-system translation, and intermediate storage servers, which are common bottlenecks in object- and file-based storage architectures.


Key architectural optimizations include native ARM64 binary execution on NVIDIA BlueField-4 STX, embedded directly within the storage layer to reduce dependence on external x86 storage nodes. All data transfers between GPU memory and NVMe storage adopt RDMA transmission, bypassing redundant conventional storage stacks. Additionally, MemKV utilizes enlarged block sizes ranging from 2 MB to 16 MB, which are optimized for GPU throughput characteristics instead of the legacy 4 KB storage blocks. It supports cutting-edge high-speed interconnection fabrics such as NVIDIA Spectrum-X Ethernet and PCIe Gen6, facilitating near wire-speed data transmission across clusters.
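The effect of the larger block size is easy to quantify. The 4 KB and 2 MB–16 MB figures come from the article; the 100 GB/s bandwidth target is an assumed, illustrative number:

```python
# Operations per second needed to sustain a given bandwidth at
# two block sizes: the legacy 4 KB unit versus a 2 MB MemKV block.
target_bw = 100 * 10**9      # 100 GB/s per node (assumed, illustrative)
legacy = 4 * 1024            # 4 KB blocks
big = 2 * 1024**2            # 2 MB blocks (MemKV uses 2 MB to 16 MB)

print(target_bw // legacy)   # ~24.4 million operations/second at 4 KB
print(target_bw // big)      # ~48 thousand operations/second at 2 MB
print(big // legacy)         # → 512: each large block replaces 512 small ones
```

At 16 MB blocks the ratio grows to 4096×. With so few operations per second, per-operation overheads such as protocol parsing and metadata lookups stop dominating, and raw link speed becomes the limit, which is the point of pairing large blocks with Spectrum-X and PCIe Gen6 fabrics.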

Availability


MinIO MemKV is now commercially available for enterprise deployment.


Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang/Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com / www.storagesserver.com
Business Focus:
ICT Product Distribution/System Integration & Services/Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World”
Your Trusted ICT Product Service Provider!