IBM has unveiled a content-aware storage (CAS) architecture that embeds AI data processing directly within the storage layer. This approach is tailored for retrieval-augmented generation (RAG) workflows, as it integrates document vectorization into the storage system itself—cutting down on the need for external preprocessing pipelines.
CAS transfers a key RAG function—document embedding via large language model (LLM)-based methods—into the storage infrastructure. This allows enterprises to process and index data in its existing location, aligning storage systems with AI-driven workloads and minimizing data movement across different infrastructure tiers. IBM positions this as a means to simplify deployment while boosting performance and enhancing data locality for AI applications.
Vector Database at Scale
At the heart of IBM’s CAS implementation lies a vector database optimized for semantic search. Vector databases support approximate nearest-neighbor (ANN) search, enabling AI systems to retrieve relevant data chunks based on similarity metrics like cosine similarity or L2 distance. This capability is fundamental to RAG, where user queries are converted into vectors and matched against indexed enterprise data to deliver context-aware responses.
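The retrieval step described above can be sketched in a few lines. The example below is a brute-force toy, not IBM's implementation: it ranks stored chunk embeddings by cosine similarity against a query vector, which is exactly the computation that ANN indexes approximate at scale to avoid scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Brute-force nearest-neighbor search; real ANN indexes (e.g. IVF,
    HNSW) approximate this ranking without scanning every vector."""
    scored = sorted(index.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 3-dimensional "embeddings" keyed by chunk ID (illustrative only).
index = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], index, k=2))  # → ['chunk-a', 'chunk-b']
```

At 100 billion vectors, the sort above is infeasible, which is why ANN structures trade a few points of recall for orders-of-magnitude less work per query.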
[Figure: IBM CAS architecture chart. Source: IBM]
IBM Research, in collaboration with Samsung and NVIDIA, showcased a prototype system capable of scaling to 100 billion vectors on a single server. The system achieved over 90 percent recall and precision, with an average query latency of under 700 milliseconds. This scale caters to enterprise environments where datasets can span billions of files and, once fully indexed, grow to hundreds of billions of vectors.
RAG Pipeline Integration
RAG is becoming a favored approach for enterprise AI, as it enhances output accuracy without the need for model retraining. It works by supplementing prompts with enterprise-specific data retrieved from a vector database.
The pipeline starts with data ingestion, where documents such as PDFs and presentations are parsed, split into chunks, and converted into embeddings. These embeddings are stored in a vector database that organizes data for efficient similarity search. During querying, user input is embedded and matched against stored vectors, with relevant content passed to the language model as context. This grounding mechanism reduces hallucinations and increases trust in AI-generated outputs.
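The four stages just described (chunk, embed, index, retrieve) can be sketched end to end. Everything below is a stand-in: the `embed` function is a toy character-count hash, where a production pipeline would call an LLM-based embedding model, and chunking would split on tokens or sentences rather than fixed character windows.

```python
def chunk(text, size=40):
    """Split a document into fixed-size character chunks (a real
    pipeline splits on tokens or sentence boundaries)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text, dim=8):
    """Toy deterministic embedding: character counts folded into `dim`
    buckets. A real pipeline calls an LLM-based embedding model."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    return vec

def ingest(docs):
    """Ingestion stage: parse, chunk, embed, and index each document."""
    store = []
    for doc_id, text in docs.items():
        for i, c in enumerate(chunk(text)):
            store.append((f"{doc_id}#{i}", c, embed(c)))
    return store

def retrieve(store, query, k=1):
    """Query stage: embed the input and return the k most similar chunks."""
    q = embed(query)
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    ranked = sorted(store, key=lambda rec: dot(q, rec[2]), reverse=True)
    return [text for _, text, _ in ranked[:k]]

store = ingest({"policy": "Refunds are issued within 30 days of purchase."})
context = retrieve(store, "refund window", k=1)
prompt = f"Context: {context[0]}\nQuestion: What is the refund window?"
```

The assembled `prompt` is what grounds the language model: the retrieved chunk rides along with the user's question, so the answer can cite enterprise data rather than model memory.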
IBM’s CAS integrates this entire pipeline directly into storage, consolidating ingestion, indexing, and retrieval in close proximity to the data.
Addressing Scale and Cost Challenges
Enterprise storage systems already operate at petabyte scale. When extended to CAS, each file can generate hundreds of vectors, quickly expanding the dataset size. Traditional vector databases typically scale out across multiple servers, introducing additional costs and operational complexity. Indexing and reindexing large datasets also become time-consuming tasks.
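A back-of-envelope calculation shows why vector counts balloon. All figures below are illustrative assumptions, not IBM-published numbers: a billion-file archive, a couple hundred chunks per file, and a 1024-dimension float32 embedding per chunk.

```python
# Illustrative assumptions only -- not IBM-published figures.
files = 1_000_000_000        # a billion-file enterprise archive
vectors_per_file = 200       # "hundreds of vectors" per file
bytes_per_vector = 4 * 1024  # e.g. a 1024-dim float32 embedding

total_vectors = files * vectors_per_file
raw_bytes = total_vectors * bytes_per_vector
print(f"{total_vectors:,} vectors, {raw_bytes / 2**40:.0f} TiB raw")
# → 200,000,000,000 vectors, 745 TiB raw
```

Even before index overhead, that is well beyond what a conventional single-server vector database holds, which is what pushes traditional deployments into multi-server scale-out.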
IBM’s approach focuses on improving vector density and reducing indexing overhead to limit infrastructure sprawl. The architecture separates vector and index storage from query compute, enabling independent scaling of storage and compute resources. This is made possible by IBM Storage Scale and its high-performance parallel file system.
Storage and Hardware Architecture
The CAS implementation leverages the IBM Storage Scale System 6000 (ESS 6000), an all-flash platform designed for AI and high-performance workloads. The system supports up to 48 NVMe drives per 4U enclosure, with individual drive capacities ranging from 7 TB to 60 TB. It integrates PCIe Gen5, 400 Gb InfiniBand, or 200 Gb Ethernet connectivity, delivering up to 340 GB/s read and 175 GB/s write throughput per node, along with up to 7 million IOPS.
The platform also supports NVIDIA GPUDirect Storage, facilitating direct data paths between storage and GPUs, as well as BlueField-3 DPUs to offload network and data processing tasks.
Samsung PM9D3a PCIe Gen5 NVMe SSDs provide high-throughput, high-density storage. Based on eighth-generation TLC V-NAND, these drives offer up to 30.72 TB per device, with sequential read speeds of up to 12 GB/s and write speeds of up to 6.8 GB/s. The use of commercially available enterprise SSDs allows the architecture to scale using standard components.
Hierarchical Indexing and GPU Acceleration
To tackle indexing at scale, IBM developed a hierarchical indexing model consisting of multiple sub-indexes that can be optimized independently. This structure enables incremental updates and localized reindexing without disrupting the entire dataset, improving both availability and operational efficiency.
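The sub-index idea can be illustrated with a minimal sketch. This is a hypothetical structure, not IBM's implementation: vectors are routed to partitions by a stable hash of their key, so an update triggers a localized rebuild of one partition while every other sub-index stays online.

```python
import zlib

class SubIndex:
    """One independently optimizable partition of the global index."""
    def __init__(self):
        self.vectors = {}
        self.rebuilds = 0  # how many times this partition was reindexed

    def add(self, key, vec):
        self.vectors[key] = vec

    def rebuild(self):
        # Reindex only this partition; the rest of the dataset stays online.
        self.rebuilds += 1

class HierarchicalIndex:
    """Top level routes each vector to a sub-index by a stable hash of
    its key, so updates touch one partition, not the whole dataset."""
    def __init__(self, partitions=4):
        self.subs = [SubIndex() for _ in range(partitions)]

    def _route(self, key):
        return self.subs[zlib.crc32(key.encode()) % len(self.subs)]

    def add(self, key, vec):
        sub = self._route(key)
        sub.add(key, vec)
        sub.rebuild()  # incremental, localized reindex

idx = HierarchicalIndex(partitions=4)
idx.add("doc-1#0", [0.1, 0.9])
idx.add("doc-1#1", [0.2, 0.8])
```

Queries would fan out across sub-indexes and merge results, which is also what allows each partition's index parameters to be tuned independently.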
GPU acceleration drastically reduces indexing time compared to CPU-only approaches. Tasks that would take hours on CPUs can be completed in minutes using NVIDIA GPUs. In testing, building indexes for 100 billion vectors took 4 days with 6 NVIDIA H200 GPUs, compared to an estimated 120 days on a dual-socket CPU system.
The full dataset, including vectors and indexes, consumed approximately 153 TiB of storage. Initial data loading and partitioning took nine days. The resulting system delivered an average query latency of 694 ms at 90 percent recall, validated against brute-force ground-truth calculations.
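A quick sanity check on the reported numbers: 153 TiB spread over 100 billion vectors works out to roughly 1.7 KB per vector, which plausibly covers a compressed embedding plus its share of index overhead.

```python
# Sanity check on the reported footprint (figures from the prototype
# results; the per-vector interpretation is our arithmetic, not IBM's).
TIB = 2**40
footprint = 153 * TIB
vectors = 100_000_000_000
per_vector = footprint / vectors
print(f"{per_vector:.0f} bytes per vector")  # → 1682 bytes per vector
```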
Roadmap
IBM and NVIDIA are continuing to optimize the platform, focusing on reducing indexing and query latency. Current targets include indexing 100 billion or more vectors within a single day, cutting data ingestion time from nine days to one day, and lowering query latency to the 50-100 millisecond range while maintaining 90 percent recall.
Integrating vector indexing into standard file systems aims to simplify deployment and lower barriers to enterprise AI adoption. By embedding RAG capabilities directly into storage, IBM is positioning CAS as a foundational layer for AI-enabled infrastructure.
Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang/Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com / www.storagesserver.com
Business Focus:
ICT Product Distribution/System Integration & Services/Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World.” Your Trusted ICT Product Service Provider!