IBM Introduces Content-Aware Storage for RAG Workloads

April 24, 2026
IBM has unveiled a content-aware storage (CAS) architecture that embeds AI data processing directly within the storage layer. This approach is tailored for retrieval-augmented generation (RAG) workflows, as it integrates document vectorization into the storage system itself—cutting down on the need for external preprocessing pipelines.

CAS transfers a key RAG function—document embedding via large language model (LLM)-based methods—into the storage infrastructure. This allows enterprises to process and index data in its existing location, aligning storage systems with AI-driven workloads and minimizing data movement across different infrastructure tiers. IBM positions this as a means to simplify deployment while boosting performance and enhancing data locality for AI applications.

Vector Database at Scale


At the heart of IBM’s CAS implementation lies a vector database optimized for semantic search. Vector databases support approximate nearest-neighbor (ANN) search, enabling AI systems to retrieve relevant data chunks based on similarity metrics like cosine similarity or L2 distance. This capability is fundamental to RAG, where user queries are converted into vectors and matched against indexed enterprise data to deliver context-aware responses.
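The similarity metrics named above are simple to state in code. The following NumPy sketch (illustrative only, not IBM's implementation) shows brute-force cosine-similarity retrieval — the exact search that ANN indexes approximate at scale:

```python
import numpy as np

def cosine_top_k(query, vectors, k=5):
    """Return indices of the k rows of `vectors` most similar to `query`
    by cosine similarity: normalize all vectors, then rank by inner product."""
    q = query / np.linalg.norm(query)
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = V @ q                      # cosine similarity of query vs. every row
    return np.argsort(-sims)[:k], sims

rng = np.random.default_rng(0)
index = rng.normal(size=(10_000, 64))   # 10k indexed chunk embeddings (toy data)
query = rng.normal(size=64)             # an embedded user query

top, sims = cosine_top_k(query, index)  # indices of the 5 most similar chunks
```

An ANN index trades this exhaustive scan for a sublinear approximate search, which is what makes 100-billion-vector datasets tractable.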


[Figure: IBM CAS chart. Source: IBM]

IBM Research, in collaboration with Samsung and NVIDIA, showcased a prototype system capable of scaling to 100 billion vectors on a single server. The system achieved over 90 percent recall and precision, with an average query latency of under 700 milliseconds. This scale caters to enterprise environments where datasets can span billions of files and, once fully indexed, grow to hundreds of billions of vectors.

RAG Pipeline Integration


RAG is becoming a favored approach for enterprise AI, as it enhances output accuracy without the need for model retraining. It works by supplementing prompts with enterprise-specific data retrieved from a vector database.

The pipeline starts with data ingestion, where documents such as PDFs and presentations are parsed, split into chunks, and converted into embeddings. These embeddings are stored in a vector database that organizes data for efficient similarity search. During querying, user input is embedded and matched against stored vectors, with relevant content passed to the language model as context. This grounding mechanism reduces hallucinations and increases trust in AI-generated outputs.
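The ingestion-and-query flow described above can be sketched end to end. This is a minimal toy pipeline, not IBM's code: the `embed` function is a deterministic stand-in for a real LLM embedding model, and chunking is by characters rather than tokens:

```python
import numpy as np

def chunk(text, size=40):
    """Split a document into fixed-size character chunks
    (a real pipeline would chunk by tokens with overlap)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts, dim=64):
    """Stand-in embedding: hash-seeded random unit vectors.
    A production system would call an LLM embedding model here."""
    out = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % (2**32))
        v = rng.normal(size=dim)
        out.append(v / np.linalg.norm(v))
    return np.array(out)

# Ingestion: parse documents -> chunk -> embed -> store in the vector index
docs = [
    "Enterprise storage systems already operate at petabyte scale and hold the data RAG needs.",
    "Vector databases support approximate nearest-neighbor search over embedded document chunks.",
]
chunks = [c for d in docs for c in chunk(d)]
index = embed(chunks)

# Query: embed user input, match against stored vectors, pass top chunks as context
query_vec = embed(["How do vector databases search?"])[0]
top = np.argsort(-(index @ query_vec))[:3]
context = [chunks[i] for i in top]   # grounding context handed to the language model
```

CAS runs this same sequence, but inside the storage layer, so documents are chunked and embedded where they already reside.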

IBM’s CAS integrates this entire pipeline directly into storage, consolidating ingestion, indexing, and retrieval in close proximity to the data.

Addressing Scale and Cost Challenges


Enterprise storage systems already operate at petabyte scale. When extended to CAS, each file can generate hundreds of vectors, quickly expanding the dataset size. Traditional vector databases typically scale out across multiple servers, introducing additional costs and operational complexity. Indexing and reindexing large datasets also become time-consuming tasks.

IBM’s approach focuses on improving vector density and reducing indexing overhead to limit infrastructure sprawl. The architecture separates vector and index storage from query compute, enabling independent scaling of storage and compute resources. This is made possible by IBM Storage Scale and its high-performance parallel file system.

Storage and Hardware Architecture


The CAS implementation leverages the IBM Storage Scale System 6000 (ESS 6000), an all-flash platform designed for AI and high-performance workloads. The system supports up to 48 NVMe drives per 4U enclosure, with individual drive capacities ranging from 7 TB to 60 TB. It integrates PCIe Gen5, 400 Gb InfiniBand, or 200 Gb Ethernet connectivity, delivering up to 340 GB/s read and 175 GB/s write throughput per node, along with up to 7 million IOPS.

The platform also supports NVIDIA GPUDirect Storage, facilitating direct data paths between storage and GPUs, as well as BlueField-3 DPUs to offload network and data processing tasks.

Samsung PM9D3a PCIe Gen5 NVMe SSDs provide high-throughput, high-density storage. Based on eighth-generation TLC V-NAND, these drives offer up to 30.72 TB per device, with sequential read speeds of up to 12 GB/s and write speeds of up to 6.8 GB/s. The use of commercially available enterprise SSDs allows the architecture to scale using standard components.

Hierarchical Indexing and GPU Acceleration


To tackle indexing at scale, IBM developed a hierarchical indexing model consisting of multiple sub-indexes that can be optimized independently. This structure enables incremental updates and localized reindexing without disrupting the entire dataset, improving both availability and operational efficiency.
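The idea of independent sub-indexes can be illustrated with a small routing sketch. This is a hypothetical simplification (IBM has not published its index internals): vectors are assigned to shards by nearest coarse centroid, so adding data rebuilds only the affected shard, and queries probe only the closest shards:

```python
import numpy as np

class SubIndex:
    """One shard: a flat vector array that can be updated independently."""
    def __init__(self, dim=64):
        self.vectors = np.empty((0, dim))

    def add(self, vecs):
        # Localized update: only this shard's data changes.
        self.vectors = np.vstack([self.vectors, vecs])

    def search(self, q, k):
        if len(self.vectors) == 0:
            return []
        sims = self.vectors @ q
        order = np.argsort(-sims)[:k]
        # A real system would carry global IDs; local indices suffice here.
        return [(float(sims[i]), int(i)) for i in order]

class HierarchicalIndex:
    """Routes vectors to sub-indexes by coarse centroid; queries probe
    only the `nprobe` nearest shards instead of the whole dataset."""
    def __init__(self, centroids):
        self.centroids = centroids
        self.shards = [SubIndex(centroids.shape[1]) for _ in centroids]

    def add(self, vecs):
        assign = np.argmax(vecs @ self.centroids.T, axis=1)
        for s in range(len(self.shards)):
            if np.any(assign == s):
                self.shards[s].add(vecs[assign == s])

    def search(self, q, k=5, nprobe=2):
        nearest = np.argsort(-(self.centroids @ q))[:nprobe]
        hits = []
        for s in nearest:
            hits.extend(self.shards[s].search(q, k))
        return sorted(hits, reverse=True)[:k]

rng = np.random.default_rng(0)
idx = HierarchicalIndex(rng.normal(size=(4, 64)))
idx.add(rng.normal(size=(200, 64)))
results = idx.search(rng.normal(size=64), k=5, nprobe=2)
```

The availability benefit follows directly: a reindex of one shard leaves the other shards serving queries.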

GPU acceleration drastically reduces indexing time compared to CPU-only approaches. Tasks that would take hours on CPUs can be completed in minutes using NVIDIA GPUs. In testing, building indexes for 100 billion vectors took 4 days with 6 NVIDIA H200 GPUs, compared to an estimated 120 days on a dual-socket CPU system.

The full dataset, including vectors and indexes, consumed approximately 153 TiB of storage. Initial data loading and partitioning took nine days. The resulting system delivered an average query latency of 694 ms at 90 percent recall, validated against brute-force ground-truth calculations.
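Recall against brute-force ground truth, the validation method mentioned above, is straightforward to compute. A small sketch (illustrative; the "approximate" search here is a deliberately crude random-subset scan standing in for a real ANN index):

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

rng = np.random.default_rng(1)
X = rng.normal(size=(5_000, 32))
q = rng.normal(size=32)

# Brute-force ground truth: exact top-10 by inner product over all vectors
exact = np.argsort(-(X @ q))[:10]

# Crude "approximate" search: scan only a random half of the data,
# then map subset positions back to global IDs
subset = rng.choice(len(X), size=len(X) // 2, replace=False)
approx = subset[np.argsort(-(X[subset] @ q))[:10]]

recall = recall_at_k(approx, exact)
```

At 100 billion vectors the ground truth itself is expensive to compute, which is why such validations are typically run on sampled queries.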

Roadmap


IBM and NVIDIA are continuing to optimize the platform, focusing on reducing indexing and query latency. Current targets include indexing 100 billion or more vectors within a single day, cutting data ingestion time from nine days to one day, and lowering query latency to the 50-100 millisecond range while maintaining 90 percent recall.

Integrating vector indexing into standard file systems aims to simplify deployment and lower barriers to enterprise AI adoption. By embedding RAG capabilities directly into storage, IBM is positioning CAS as a foundational layer for AI-enabled infrastructure.
