Running on Oracle Cloud Infrastructure (OCI), WEKA NeuralMesh and Augmented Memory Grid software delivers 10x higher token throughput, 10x more concurrent users and 7x more tokens per GPU, compared with standard OCI environments relying solely on local DRAM.
NeuralMesh is WEKA’s high-performance AI file system. Augmented Memory Grid builds on it to extend GPU server memory for AI inference, turning external storage into a high-performance KV cache. It delivers microsecond latency and multi-GB/s bandwidth, offers up to petabytes of extra memory address space, and is fully compatible with NVIDIA’s SX KV caching architecture. All benchmarks were validated on a 9-node OCI bare-metal H100 cluster with 100,000-token context windows.
Pablo Salem, Senior Director of Software Development at OCI, commented: “Enterprise AI workloads keep expanding context windows and raising GPU utilization to new limits. These benchmarks prove WEKA’s solution eliminates GPU memory bottlenecks on OCI, enabling larger, more demanding inference workloads without extra GPU hardware investments.”
WEKA notes that growing inference demand amplifies inefficiencies in AI infrastructure. Frequent KV cache evictions create hidden overhead: evicted context must be recomputed, which wastes GPU cycles, increases latency, hurts user experience and raises per-token operational costs. For long-context and agentic AI workloads with inputs of 100,000 tokens or more, this overhead severely damages the unit economics of production AI deployments.
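The effect of cache capacity on recomputation can be illustrated with a toy simulation. The sketch below is not WEKA's implementation and does not model NeuralMesh or any real KV cache: it assumes a simple LRU cache keyed by session, uniformly random session activity, and one cached entry per session. The function name `simulate` and all parameter values are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict
import random


def simulate(capacity, num_sessions, num_requests, seed=0):
    """Return the fraction of requests that miss a toy LRU cache.

    A miss stands in for a KV cache eviction: the session's context
    is no longer cached and would have to be recomputed on the GPU.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # session id -> placeholder for cached KV state
    misses = 0
    for _ in range(num_requests):
        session = rng.randrange(num_sessions)
        if session in cache:
            cache.move_to_end(session)  # hit: mark as most recently used
        else:
            misses += 1  # miss: context must be recomputed
            cache[session] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used session
    return misses / num_requests


# Hypothetical sizes: a small tier that holds 60 sessions versus a
# large tier that holds all 500 active sessions.
small_miss_rate = simulate(capacity=60, num_sessions=500, num_requests=20000)
large_miss_rate = simulate(capacity=500, num_sessions=500, num_requests=20000)
print(f"small cache miss rate: {small_miss_rate:.3f}")
print(f"large cache miss rate: {large_miss_rate:.3f}")
```

In this toy model the small cache misses on most requests, while the large cache misses only on the first request from each session. Real inference traffic is not uniformly random, so the numbers carry no weight beyond showing the direction of the effect.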
The benchmark ran on 9 nodes with 72 H100 GPUs, 100,000-token context windows and thousands of concurrent users. The results show clear performance gaps:
- Concurrent user capacity: WEKA supported over 5,000 concurrent users, versus just 600 on DRAM-only setups. It avoids cache saturation failures by expanding the active cache from 8.64 TiB of DRAM to 287 TiB of NVMe flash storage, improving ROI on existing GPU hardware without additional GPU purchases.
- Token throughput: The WEKA stack reached around 2 million tokens per second, roughly 10 times the DRAM-only baseline of under 200,000 tokens per second.
- Total token processing volume: In a one-hour test with 2,400 concurrent users, WEKA processed 5 billion tokens, while the DRAM-only setup handled only 700 million tokens.
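The ratios behind these figures can be checked with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted in the article. Because the article gives "over 5,000" users and "under 200,000" tokens per second, the user and throughput ratios computed here are lower bounds rather than exact measurements. Variable names are illustrative.

```python
# Rounded figures as quoted in the article.
weka_users, dram_users = 5_000, 600          # concurrent users ("over 5,000")
weka_tps, dram_tps = 2_000_000, 200_000      # tokens/sec ("under 200,000" baseline)
weka_tokens, dram_tokens = 5e9, 700e6        # tokens processed in the one-hour test
nvme_tib, dram_tib = 287, 8.64               # active cache capacity in TiB
gpus = 72                                    # H100 GPUs in the 9-node cluster

ratios = {
    "concurrent users": weka_users / dram_users,
    "token throughput": weka_tps / dram_tps,
    "tokens in one hour": weka_tokens / dram_tokens,
    "cache capacity": nvme_tib / dram_tib,
}

for name, value in ratios.items():
    print(f"{name}: {value:.1f}x")

# Tokens per GPU over the one-hour test, for each configuration.
weka_per_gpu = weka_tokens / gpus
dram_per_gpu = dram_tokens / gpus
print(f"tokens per GPU per hour: {weka_per_gpu:,.0f} vs {dram_per_gpu:,.0f}")
```

Since both configurations use the same 72 GPUs, the per-GPU ratio equals the one-hour token ratio, which is where the roughly 7x tokens-per-GPU figure comes from.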
For agentic AI workflows, insufficient DRAM forces the GPU to recompute context repeatedly once the cache saturates, raising per-token costs and lowering ROI. By processing 7x more tokens per GPU, WEKA substantially cuts overall token costs for production AI services.
For real-time AI services including search, summarization, code assistance and multi-turn agents, token throughput sets the limits on user capacity, response speed and infrastructure revenue potential. The 10x throughput improvement lets the OCI cluster make far fuller use of its GPU compute.
In short, WEKA’s memory expansion software helps cloud platforms serve more users, process more tokens and cut operational costs effectively.
Liran Zvibel, CEO of WEKA, said: “Inference performance is bottlenecked by available GPU effective memory. These results prove hardware upgrades alone cannot fix AI token economic issues. The real limitation is the long-standing memory wall restraining GPU performance. WEKA’s solution on OCI boosts token processing capacity drastically with optimized total cost of ownership.”
OCI has published full benchmark methodology, system configurations and complete test results on its official AI & Data Science blog.
NeuralMesh with Augmented Memory Grid is now generally available for WEKA customers and listed on Oracle Marketplace, with OCI acting as its exclusive cloud launch partner. Enterprises running long-context inference on OCI can deploy this production-ready, fully validated architecture right away.