Spotlight: How the DataStax Astra DB Vector Database Fits the Vector Datastore Evaluation Criteria
I am launching a new blog series called The Spotlight, where I get to apply my market research to innovative products and concepts. For this first episode, I had the great privilege of learning from DataStax’s AI team and have applied my vector data store evaluation criteria to the DataStax Astra DB Vector Database.
Jonathan Ellis, a key committer and former project chair of Apache Cassandra and a co-founder of DataStax, took it upon himself to spearhead the initiative to build vector embedding support, including ingest, storage, indexing, and retrieval capabilities, into the company’s Astra DBaaS. Another DataStax expert I had the honor of interacting with is Alan Ho, the company’s head of AI strategy, who brings his vast experience as an AI researcher at Google Research to the development of DataStax’s vector database offering.
You can find the original evaluation criteria on Medium, or you can download the document (no login required) from DataStax.
DataStax has been a major developer and maintainer of the open source Apache Cassandra database since 2010. In 2011, it released a commercial version called DataStax Enterprise (DSE), and in 2020, a fully managed database-as-a-service called Astra DB.
In mid-2023, DataStax added support for generative AI use cases that rely on vector embeddings and vector search by offering Astra DB and DSE as a highly scalable vector database. DataStax positions its multi-cloud Astra DB as the only vector database for building production-level AI applications designed for real-world, mixed workloads with vector, tabular and streaming data.
Cassandra is known for its unique architecture, which supports massive numbers of concurrent writes and reads across hundreds or even thousands of globally distributed nodes with near-limitless scalability and very high throughput. Companies like Apple and Netflix have used Cassandra at extreme scale for AI applications, not only because of its performance and scalability but also because it is fault tolerant: it can recover from failed nodes with no downtime.
DataStax designed Astra DB as a unique vector database by implementing state-of-the-art vector search on top of Cassandra’s storage-attached indexing (SAI) technology and marrying that search capability to Cassandra’s massively scalable, AI-proven core.
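To make the SAI-plus-vector-search combination concrete, the sketch below shows what ingest, indexing, and approximate-nearest-neighbor retrieval can look like through CQL, using the DataStax Python driver against Astra DB. The keyspace, table, bundle path, token, and toy three-dimensional embeddings are placeholders of my own, not DataStax examples; a real application would bind model-generated embeddings through prepared statements.

```python
# Minimal sketch: vector ingest and ANN search on Astra DB via CQL,
# using the DataStax Python driver (cassandra-driver).
# Keyspace, table, bundle path, and token are hypothetical placeholders.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

cluster = Cluster(
    cloud={"secure_connect_bundle": "/path/to/secure-connect-bundle.zip"},
    auth_provider=PlainTextAuthProvider("token", "AstraCS:your-application-token"),
)
session = cluster.connect("demo_keyspace")  # keyspace created in the Astra console

# A table with a vector column; the custom index below is the SAI index
# that powers approximate-nearest-neighbor (ANN) search over it.
session.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id int PRIMARY KEY,
        content text,
        embedding vector<float, 3>
    )
""")
session.execute("""
    CREATE CUSTOM INDEX IF NOT EXISTS documents_embedding_idx
    ON documents (embedding) USING 'StorageAttachedIndex'
""")

# Ingest a row with its embedding (in practice produced by an embedding model
# and bound through a prepared statement rather than inlined as a literal).
session.execute(
    "INSERT INTO documents (id, content, embedding) "
    "VALUES (1, 'hello vector search', [0.10, 0.15, 0.30])"
)

# Retrieval: CQL's ORDER BY ... ANN OF returns the closest vectors first.
rows = session.execute(
    "SELECT content FROM documents "
    "ORDER BY embedding ANN OF [0.09, 0.16, 0.29] LIMIT 5"
)
for row in rows:
    print(row.content)

cluster.shutdown()
```

The point of the sketch is that vector search arrives as ordinary CQL on an ordinary Cassandra table, so the same data model, drivers, and operational story carry over from existing Cassandra and Astra DB deployments.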
DataStax has continued to enhance the underlying platform by adding capabilities that support additional use cases, such as multiprotocol streaming support for AMQP, JMS, and Apache Kafka (built on Apache Pulsar) and DevOps features based on Kubernetes.
This spotlight uses the Vector Data Store Evaluation Criteria to highlight DataStax Astra DB’s new generative AI capabilities.
Vector Data Store Evaluation Criteria with DataStax Astra DB Capabilities
Select Customer Case Studies
- SkyPoint Cloud: GenAI-Driven Operational Efficiency Revolution in Healthcare with DataStax Vector Search
- Priceline: Martin Brodbeck Delivers the Best Travel Products to Priceline Customers by Leveraging Real-Time AI
- ACI Worldwide: John Madden’s Journey to Effective Fraud Management at ACI Worldwide
- Alpha Ori: Alpha Ori Fuels Dynamic Growth with Serverless Cassandra Managed by DataStax
- SupPlant: SupPlant Grows Crop Yields Through AI-Powered Irrigation, Backed by Real-Time Data