How To Build a Modern Database Management System: Reflections from SingleStore Now 2024

Sanjeev Mohan
11 min read · Nov 1, 2024


On October 3, 2024, SingleStore hosted its second annual conference, SingleStore NOW, at the iconic Chase Center in San Francisco. The event’s name reflects the urgency and innovation that define today’s rapidly evolving data and AI landscape, which is driven by breakthrough technologies.

I had a front-row seat to hear from industry leaders, engage with customers and partners, and witness firsthand SingleStore’s incredible momentum, culminating in its best quarter of growth in history.

SingleStore Now 2024 at Chase Center, San Francisco on Oct 3, 2024

The theme for SingleStore NOW 2024 revolved around the intersection of AI and data at enterprise scale, with a focus on building intelligent applications. Key discussions centered on how to simplify complex data environments in order to accelerate innovation, scale, and integrate AI seamlessly into real-time applications.

With the rising demand for real-time data ingestion, integration, and analysis, SingleStore demonstrated its capabilities in addressing these needs. In addition, customers are seeking native support for vector search, retrieval-augmented generation (RAG), large language models (LLMs), and the adoption of open standards like Apache Iceberg, all of which were prominent topics at the event.

While SingleStore may not yet have the widespread name recognition of some database giants, its origins provide an important context for understanding the unique value this platform brings to modern data management.

This blog shares valuable insights and lessons learned from the event.

What is SingleStore?

SingleStore began its journey in 2011 as an in-memory OLTP (Online Transaction Processing) database under the name MemSQL, delivering millisecond query responses with a MySQL interface. Even today, it remains significantly faster than many popular OLTP databases. However, with the introduction of columnstore to address OLAP (Online Analytical Processing) needs, the name MemSQL no longer reflected its full capabilities. The platform now supports both OLTP and OLAP workloads within a unified database, prompting its rebranding to SingleStore.

Over its thirteen-year journey, SingleStore has raised over $410 million in funding and attracted over 350 clients. Notably, in 2024, after OpenAI acquired Rockset, a significant number of Rockset’s clients migrated to SingleStore, further bolstering its growth. It has achieved Centaur status, a significant milestone marked by surpassing $100 million in annual recurring revenue (ARR). Impressively, 60% of this revenue is reinvested into research and development, demonstrating its commitment to innovation.

SingleStore has long been a pioneer in database technology. In 2018, during the Gartner Catalyst conference, the company showcased its advanced vector search capabilities — far ahead of its time. However, this innovation flew under the radar until recent years when the demand for running AI workloads skyrocketed. Before expanding into AI support, SingleStore had already enhanced its database in 2017 by integrating full-text search with SIMD hardware acceleration.

SingleStore’s distributed architecture is engineered for high performance with no single point of failure, scaling dynamically through its aggregator and leaf structure. It automatically distributes data across leaf nodes in partitions, adding nodes as needed to optimize load handling. An aggregator routes queries to the leaf nodes, consolidates intermediate results, and returns the output to the client. Additionally, it monitors the cluster and manages failover as required, ensuring resilience and reliability.
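
The aggregator and leaf flow described above can be pictured as a toy scatter-gather in Python. This is only a sketch under assumptions: the `Leaf` and `Aggregator` classes and the row-id modulo placement rule are hypothetical names chosen for illustration, not SingleStore's implementation.

```python
from collections import defaultdict


class Leaf:
    """Hypothetical leaf node: stores rows and computes partial results."""

    def __init__(self):
        self.rows = []  # (key, amount) pairs held by this leaf

    def partial_sum_by_key(self):
        # Each leaf aggregates only the rows it owns.
        totals = defaultdict(int)
        for key, amount in self.rows:
            totals[key] += amount
        return dict(totals)


class Aggregator:
    """Hypothetical aggregator: routes writes and merges leaf results."""

    def __init__(self, num_leaves):
        self.leaves = [Leaf() for _ in range(num_leaves)]

    def insert(self, row_id, key, amount):
        # Assumed placement rule: row id modulo the number of leaves.
        leaf = self.leaves[row_id % len(self.leaves)]
        leaf.rows.append((key, amount))

    def sum_by_key(self):
        # Scatter the query to every leaf, then merge intermediate results.
        merged = defaultdict(int)
        for leaf in self.leaves:
            for key, subtotal in leaf.partial_sum_by_key().items():
                merged[key] += subtotal
        return dict(merged)


agg = Aggregator(num_leaves=3)
rows = [("us", 10), ("eu", 5), ("us", 7), ("apac", 2)]
for row_id, (key, amount) in enumerate(rows):
    agg.insert(row_id, key, amount)

result = agg.sum_by_key()
print(result)
```

The two "us" rows land on different leaves here, so the final total only appears once the aggregator merges the partial sums.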

The following sections dive into the platform’s technological advancements, which set a new standard for how to create a developer-centric GenAI platform.

Platform

A single datastore significantly reduces the overhead of moving data from transaction-focused OLTP databases to analytical query-focused OLAP databases. ETL and ELT processes add significant time and cost. Businesses want fast access to data without incurring heavy complexity and costs.

SingleStore provides that simplicity, but it is also carving out an interesting niche for itself as the “contextual speed layer” that consolidates traditional data stacks comprising data warehouses, streaming, and time-series transactional databases with vector data, and that puts LLMs and agents at the forefront. We see this strategy coming to fruition via its expanding capabilities and partnerships such as the one with Snowflake.

To achieve this goal, here are the highlights of its architecture:

Data types

Not resting on its hybrid transaction analytical processing (HTAP) laurels, SingleStore has added a native JSON data type, allowing semi-structured data to be ingested, stored, and queried along with structured data in the same SQL statement. JSON-specific indexing, vectorized JSON array aggregation, and JSONPath expressions extract and filter specific parts of JSON documents efficiently.

As mentioned earlier, SingleStore has supported vectors for several years, but those vectors were stored in BLOB columns. Now native vector data types efficiently store and query high-dimensional vectors, which represent complex data like text, images, and audio. This capability allows for similarity search, recommendation systems, and other advanced analytics use cases. Please see the Hybrid Search section below for more details on vectors.

Along with all the usual numeric, string, date, boolean, and blob data types, SingleStore also has robust support for geospatial data types. All these data types benefit from the distributed and resilient underlying platform.

Three-tiered storage architecture

SingleStore handles data across three layers:

  • Memory rowstore is used for millisecond performance and high concurrent workloads.
  • Disk, also called persistent cache, stores the working set data in columnstore format for balancing performance with cost-efficiency.
  • Object store, aka bottomless storage, stores a copy of all the data asynchronously for durability and to perform point-in-time recovery, if needed.

Data automatically moves between memory, SSD/NVMe cache, and cloud object stores based on usage patterns. SingleStore’s “universal storage” provides a single table type that can be optimized for both transactional and analytical workloads by combining the strengths of the rowstore and columnstore formats.
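
To make the three layers concrete, here is a minimal Python sketch of a tiered read path. Everything in it is an assumption for illustration: the `TieredStore` class, the "demote the oldest key" policy, and the synchronous copy to the object store (the real platform copies asynchronously) are hypothetical simplifications, not SingleStore internals.

```python
class TieredStore:
    """Toy three-tier store: memory, local disk cache, object store."""

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        self.memory = {}        # hottest data
        self.disk_cache = {}    # working set
        self.object_store = {}  # durable copy of all data

    def write(self, key, value):
        self.memory[key] = value
        # Simplification: copied synchronously here, not asynchronously.
        self.object_store[key] = value
        self._demote_if_needed()

    def _demote_if_needed(self):
        # Assumed policy: move the oldest-inserted key down to the disk cache.
        while len(self.memory) > self.memory_capacity:
            oldest = next(iter(self.memory))
            self.disk_cache[oldest] = self.memory.pop(oldest)

    def read(self, key):
        # Check the fastest tier first and fall back to slower ones.
        tiers = (
            ("memory", self.memory),
            ("disk", self.disk_cache),
            ("object_store", self.object_store),
        )
        for tier_name, tier in tiers:
            if key in tier:
                return tier[key], tier_name
        raise KeyError(key)


store = TieredStore(memory_capacity=2)
for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    store.write(key, value)

hot = store.read("c")       # still in memory
warm = store.read("a")      # demoted to the disk cache
store.disk_cache.clear()    # simulate losing the local cache
cold = store.read("a")      # served from the object store
print(hot, warm, cold)
```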

Lakehouse integration

SingleStore’s zero-ETL integration from Apache Iceberg supports automated schema inference. This public preview launch marks SingleStore’s first foray into open table formats that enable organizations to bring any compute engine to analyze data. Today, Iceberg Tables are loaded into memory as “projections” to permit real-time analysis. Soon, SingleStore will add support for external tables and schema evolution.

SingleStore offers bi-directional lakehouse support: it can export its data as Iceberg Tables and write metadata to the AWS Glue and Snowflake REST API catalogs (for more information, please see the Partners section below).

Real-time

AI is forcing databases to become more real-time and multimodal. For example, OpenAI has introduced Real-time APIs for speech. With the cost of tokens going to almost zero, businesses expect their databases to also offer real-time support.

All these enhancements further strengthen the foundation of the HTAP distributed and auto-scalable architecture with a single goal to rapidly build and deploy real-time applications.

Connectors

While SingleStore is on a quest to enable storage of multi-modal data and simplify data management via a single database for most workloads, organizations have several disparate systems of record that need to be queried in order to make comprehensive decisions. SingleStore has relied on built-in streaming technologies like Kafka connectors and change data capture (CDC) to feed its real-time data ingestion needs.

At Now 2024, SingleStore announced its acquisition of BryteFlow to further expand its connectors to a rich set of ERP and CRM sources like SAP, Oracle, and Salesforce. This Sydney-based company uses CDC and built-in data integrity checks to enable data integration using a no-code user interface. The SingleConnect service incorporates all the data ingestion options — real-time streaming and batch.

SingleStore Kai is a MongoDB-compatible API that enables seamless ingestion of MongoDB data directly into SingleStore using the zero-ETL paradigm. That means as soon as a new document is inserted into MongoDB, it is visible in SingleStore near instantly and can be used to perform millisecond analytics. In addition, MongoDB applications operate on SingleStore without requiring any code modifications (only the database endpoint needs to change).

Hybrid Search

Since pioneering vector search in a DBMS, SingleStore has been on a rampage in this space: it has held what seems like one AI-led webinar every working day of this year! The topics range from using the latest LLMs to deploying retrieval-augmented generation (RAG) pipelines and developing agents.

In 2024, SingleStore added Approximate Nearest Neighbor (ANN) indexing to K-Nearest Neighbor Search. There is an accuracy vs. speed trade-off between exact kNN and ANN search, and SingleStore addresses it with a pluggable and tunable vector index algorithm approach. Developers can bring open-source libraries:

  • FLAT: Keeps all vectors in RAM. Instead of using an index, a full scan is done to find the nearest vector matches.
  • IVF_FLAT: Inverted File Index (IVF) clusters vectors and the search algorithm only looks at the nearest clusters.
  • IVF_PQ: Product quantization (PQ) is used to encode vectors to speed up the IVF process.
  • IVF_PQFS: Further performance boost is possible by using 4-bit PQ fast scans.
  • HNSW_FLAT: Hierarchical Navigable Small World (HNSW) builds a hierarchical proximity graph and search is done in a layered fashion.
  • HNSW_PQ: Applies product quantization to the foundational HNSW_FLAT.
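
The accuracy vs. speed trade-off between a full scan and a clustered index can be seen in a small pure-Python sketch. It is a toy under stated assumptions: the centroids are hand-picked rather than trained, the `nprobe` parameter name is borrowed for illustration, and none of the function names correspond to SingleStore APIs.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(vectors, query, k):
    """Exact kNN: scan every vector (the FLAT approach)."""
    ranked = sorted(range(len(vectors)), key=lambda i: euclidean(vectors[i], query))
    return ranked[:k]


def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid, one list per cluster."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: euclidean(v, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Approximate kNN: scan only the nprobe clusters closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: euclidean(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    candidates.sort(key=lambda i: euclidean(vectors[i], query))
    return candidates[:k]


vectors = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (1.0, 0.9), (0.45, 0.5), (5.0, 5.0)]
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]  # hand-picked, not trained
lists = build_ivf(vectors, centroids)
query = (0.52, 0.5)

exact = flat_search(vectors, query, k=2)
fast = ivf_search(vectors, centroids, lists, query, k=2, nprobe=1)
wider = ivf_search(vectors, centroids, lists, query, k=2, nprobe=2)
print(exact, fast, wider)
```

With a single probed cluster the search misses the true neighbors that sit just across a cluster boundary; probing two clusters recovers the exact answer at the cost of scanning more vectors.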

For vector search, SingleStore supports Dot Product and Euclidean Distance. Using the power of SQL, developers can create multiple different vector indexes on the same table and evaluate recall and precision, performance, and cost for each to select the optimal choice. A future release will add support for GPUs to build vector indexes faster.
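
The two metrics are closely related. For unit-length vectors, the squared Euclidean distance equals 2 minus twice the dot product, so ranking by highest dot product and ranking by smallest distance give the same order. The short Python check below illustrates this with made-up two-dimensional vectors; the helper names are illustrative only.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


# Made-up example vectors, normalized to unit length.
docs = {
    "a": normalize((1.0, 0.0)),
    "b": normalize((1.0, 1.0)),
    "c": normalize((0.0, 1.0)),
}
query = normalize((2.0, 1.0))

by_dot = sorted(docs, key=lambda name: dot(docs[name], query), reverse=True)
by_distance = sorted(docs, key=lambda name: euclidean_distance(docs[name], query))

# For unit vectors: distance^2 == 2 - 2 * dot product.
gaps = [
    abs(euclidean_distance(v, query) ** 2 - (2 - 2 * dot(v, query)))
    for v in docs.values()
]
print(by_dot, by_distance, max(gaps))
```

If the vectors are not normalized, the two metrics can rank results differently, which is one reason the choice of metric should match how the embeddings were produced.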

The engine for full text search is migrating to the Java version of Apache Lucene. Hybrid search lets users combine real-time, vector, and full text searches using ANSI SQL and JSON-based interfaces on HTAP data. This is what makes SingleStore stand out compared to specialized databases for each of the use cases. In fact, SingleStore Spaces contains a gallery of blueprints to help developers quickly start customizing their applications.
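
One common way to combine a vector ranking with a full text ranking is reciprocal rank fusion (RRF). The sketch below uses RRF purely as an illustration of the hybrid idea; it is a swapped-in, generic technique, not a description of how SingleStore scores hybrid queries, and the document ids and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Items near the top of any list contribute more to the score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two separate searches over the same table.
vector_hits = ["doc3", "doc1", "doc2"]
text_hits = ["doc1", "doc4", "doc3"]

fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)
```

Documents that appear in both result lists rise to the top, which is the behavior a hybrid query is usually after.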

Deployment

SingleStore Helios, a Database-as-a-Service (DBaaS), is a fully managed cloud platform that simplifies deployment, scaling, and maintenance of its databases. It handles elastic scaling, HA/DR, database upgrades, patching, and monitoring and includes end-to-end encryption, role-based access control, and compliance certifications.

Helios offers a free-for-life tier, called Shared Edition, to help developers get started with their cloud-native applications quickly. This edition allows users to create a “starter workspace” on shared compute resources that does not consume any credits. It is meant for prototyping non-production applications. It takes under a second to have a full working copy of the SingleStore database.

A “workspace” is a compute instance similar to Snowflake’s virtual warehouse, as it gives customers separation of compute and storage. Users can have separate workspaces for their different workloads while accessing the same storage. This way, read-intensive workloads do not impact write-heavy workloads, even on the same data.

Standard Edition uses dedicated compute resources, providing more configurability, more control, and larger resource limits, and it consumes credits. SingleStore provides some free credits initially for trial users to experience the platform.

Helios adds a SaaS offering on the public cloud to SingleStore’s existing self-managed options on-premises and in the cloud. In addition, SingleStore now supports the emerging trend known as BYOC (Bring Your Own Cloud). In this approach, the client’s data plane is still in the public cloud, but in their own account / VPC. This ensures that their data obeys all the internal security rules and policies. Finally, SingleStore doesn’t yet have a serverless option, but it is on its roadmap for 2025.

SingleStore Pro Max combines many of the capabilities we have discussed thus far, like real-time data ingestion, processing, and querying of analytics and transactional workloads with data stored in universal storage. It supports all the data types like JSON and vector, and scales compute resources based on demand, optimizing cost and performance.

SingleStore Smart DR maintains business continuity by automatically replicating data, configurations, users, roles and permissions, firewall policies, Pipelines (for ingest), and all other metadata to a secondary region. Disaster recovery (DR) can be performed across multiple regions in a single click. The secondary copy does not use any compute resources. Imagine having a spare copy of your database in a different region that you don’t have to pay for unless you fail over to it.

One of Snowflake’s most popular features has been “zero copy clone” that allows a copy of the database to be spawned instantly. SingleStore has introduced a similar concept called Database branching.

Partners

SingleStore has several partners. In this section, I only mention the ones that were at the Now ’24 event.

SingleStore now runs as a Snowflake Native App inside Snowpark Container Services. Think of this as a reincarnation of Snowflake’s Unistore idea of providing OLTP capabilities on its data, but via SingleStore. In other words, SingleStore can now deliver millisecond query responses while running as a container inside Snowflake. Hence, users can extend all their Snowflake processes, like data access policy execution, natively to SingleStore data.

The bi-directional integration allows real-time data pipelines to ingest data directly into SingleStore and share it with Snowflake. The single interface underscores SingleStore’s simplicity story as users no longer have to move data across data silos and can develop apps using Snowflake APIs. In addition, SingleStore users can leverage Snowflake’s Polaris catalog to process Iceberg Tables. Ingesting data directly using Kafka from Iceberg Tables doesn’t incur any Snowflake credits.

SAS is a venerable analytics company with over 17K customers. Its cloud SaaS product, Viya, embeds SingleStore. Viya not only uses the database’s capabilities today; its roadmap also calls for future in-database model execution.

AWS is one of SingleStore’s strategic AI partners. At the conference, AWS demonstrated a reference architecture for building a real-time customer support co-pilot solution using Amazon Bedrock agents with SingleStore and Anthropic models.

LlamaIndex has been at the forefront of enabling AI workloads such as parsing, chunking, and extracting entities from unstructured documents like PDFs, and of enabling RAG pipelines. It has since advanced into supporting agentic architectures.

IBM demonstrated how its lakehouse (watsonx.data) and governance (watsonx.governance) utilize SingleStore’s real-time capabilities.

Groq builds a language processing unit (LPU) chip that runs multi-modal LLMs, supporting text, speech, and vision, to provide very fast inference. It can process 1200 tokens a second using its 14nm chip. At Now, SingleStore demonstrated running analytics on real-time video, using Groq to accelerate inference and embedding generation with vision foundation models.

Customers

Several customers presented their use cases at Now ’24. Here are the highlights:

LiveRamp connects massive customer data from different sources to drive marketing analytics and personalization. It runs 6K pipelines daily. The company evaluated multiple databases across nearly twenty criteria and has published its results publicly. Based on the results, it chose SingleStore as its future state database.

Adobe demonstrated how it consolidated PostgreSQL, Snowflake, and Elastic into SingleStore Pro Max.

6Sense is a Martech company that ingests high volumes of clickstream data and then enriches and analyzes it for three personas: marketers, sellers, and revenue operations. Its use case involves curating data and enriching it by, say, adding emails to the clickstream data.

Konnect 3D replaced MongoDB with a 5-line migration script and zero-ETL. They have reduced their costs and increased performance significantly.

Conclusion

SingleStore is delivering on the promise of democratizing data and AI via its three tenets: speed, scale, and simplicity. All its announcements in 2024 have been aimed at meeting these goals.

SingleStore’s commitment to innovation, demonstrated through its rapid development and integration of advanced features, sets a new standard for modern database management systems. The company’s strategic vision and technological prowess position it well to lead in the evolving landscape of real-time data and AI integration. With its robust capabilities and continuous investment in research and development, SingleStore is poised to drive significant advancements in the industry, making real-time, multimodal data processing more accessible and efficient for enterprises worldwide.


Written by Sanjeev Mohan

Sanjeev researches the space of data and analytics. Most recently he was a research vice president at Gartner. He is now a principal with SanjMo.
