Oracle Shines Through Multi-Clouds: A CloudWorld 2024 Retrospective

Sanjeev Mohan
Sep 28, 2024

How many cloud companies can claim they colocate their hardware directly inside other cloud providers’ data centers?

The biggest news from this year’s Oracle CloudWorld was that Oracle Cloud Infrastructure (OCI) will soon be hosted inside all three hyperscalers — AWS, Google Cloud, and Microsoft Azure.

While this news may have dominated headlines, the event was packed with other exciting developments that highlight Oracle’s rapid growth and innovation in the cloud space. It is fair to say that Oracle was a late bloomer in the cloud, but has innovated quickly.

Oracle is also making its mark in AI, positioning OCI as the place to go for all kinds of AI workloads, including extreme-scale model training, by offering a wide range of GPU instances. It announced what it calls the world’s first zettascale AI supercomputer, with 131,072 NVIDIA GPUs in OCI.

This post highlights many of Oracle’s new developments in the data space, as shown in Figure 1.

Figure 1. Key Data and AI announcements at Oracle CloudWorld 2024

If you are interested in a podcast version of this blog, please click here.

Public Multi-Cloud

Oracle has had a fast interconnect with Microsoft Azure for a few years now. This arrangement allows Oracle’s databases to be accessed by Azure services, like Power BI, with very low latency. Both companies achieved this milestone by locating their hardware close to each other and connecting it over a 100 Gb/s network link. The OCI-Azure relationship served as an excellent learning experience and test lab for Oracle’s multi-cloud blueprint. It is interesting how it matured over the years: by the time Oracle and Google signed their deal, both companies knew how to ramp up quickly.

Oracle has extended the Oracle Database@Azure model, in which OCI hardware sits physically inside the partner’s data centers, to the other two hyperscalers. Oracle Database@Google Cloud is now generally available in four Google Cloud regions across the United States and Europe. Oracle Database@AWS will have one region operational by December 2024.

The integration with the hyperscalers spans not just data and networking but also identity, metrics, events, logging, and billing. For example, AWS customers will be able to order Oracle Database services running on Exadata and OCI from the AWS Management Console or AWS Marketplace and have the spend drawn down against their existing AWS commitment.

But, you may ask, why is this needed when AWS already offers Amazon RDS for Oracle? I am glad you asked.

Amazon RDS for Oracle is a good starting point, but its license-included option covers only Standard Edition 2, and it lacks the newest Oracle RDBMS enhancements, like vector and graph support. Oracle Database@AWS is meant for offerings that exist only in OCI, like Exadata, Exascale (more on it later), Autonomous Database, and Oracle Database 23ai.

Another difference is license cost. While licensing through Amazon RDS for Oracle costs more than licensing the RDBMS directly from Oracle, Oracle Database 23ai will cost exactly the same in OCI as in the other hyperscalers. This partnership also removes egress charges between Oracle and the partner cloud providers.

The partnership is not bidirectional at the services level. In other words, data in Oracle databases is available to, say, Amazon SageMaker for ML use cases, but you cannot access, say, Google BigQuery from inside OCI.

Before we move on to other announcements, it is interesting to note that Oracle has not stopped at partnering with these hyperscalers; it has similar arrangements with other cloud providers, like Fujitsu, NTT Data, and NRI in Japan.

Finally, OCI is the cheapest cloud provider, according to a study by RedMonk.

Private Cloud

Oracle has robust offerings for customers who rely on private clouds but want OCI’s public cloud services, fully managed, on their own premises. This allows organizations to maintain control over data residency, security, and compliance. Oracle’s private cloud offerings include:

  • Oracle Dedicated Region@Customer: Oracle deploys and operates all OCI public cloud services directly in a customer’s own data center. In 2024, Oracle shrank the minimum footprint of a dedicated region to just 3 racks, from 25 previously.
  • Oracle Alloy: Allows organizations, such as independent software vendors (ISVs) and financial institutions, to offer cloud services under their own brand. They can customize and extend OCI to meet the specific needs of their customers while still leveraging Oracle’s core cloud technologies. Oracle Alloy can be managed by the organizations themselves or by Oracle.
  • Sovereign Cloud: A specialized cloud offering designed to meet the stringent data sovereignty, security, and compliance requirements of governments, highly regulated industries, and enterprises with sensitive data. Oracle ensures that all customer data and metadata remain within the jurisdiction of the country or region where the cloud operates, and the sovereign cloud is physically and logically separated from public OCI.

Customers get the same operational model, APIs, and service-level agreements (SLAs) as they would in Oracle’s public cloud, meaning workloads can be easily migrated between the customer’s region and Oracle’s public cloud.

Exascale

Exadata has long been Oracle’s mainstay scale-out database platform, designed to handle extremely large datasets and complex workloads with extreme performance, reliability, availability, and security. Exadata customers tend to be very large organizations in which each cloud tenant has dedicated compute and storage servers. This option is highly optimized for each customer and provides workload isolation. However, it requires a large investment.

Exadata Exascale reimagines Exadata for smaller workloads with a multi-tenant approach, in which a common pool of compute and storage supports thousands of tenants and millions of databases. It is like a “virtualized Exadata,” with pay-per-use pricing. Exascale infrastructure inherits all of Exadata’s capabilities, like remote direct memory access (RDMA) and its RDMA over Converged Ethernet (RoCE) switching infrastructure. The entry price for Exascale is only $300, compared with a $10K starting point for Exadata.

While Exadata supports high-performance transactional and analytical workloads, Exascale is designed for workloads that need extreme scale, like AI model training, scientific modeling, and real-time data processing. Exadata Exascale can be deployed in Oracle Cloud or in customers’ private cloud infrastructure.

Exadata is Oracle’s integrated, pre-configured, optimized, and engineered hardware and software platform, while Exascale is its elastic, cloud-native architecture built on Exadata. Oracle Real Application Clusters (RAC) is the database clustering technology that allows multiple instances to access a single database simultaneously, providing high availability. RAC can be deployed on Exadata or on commodity hardware. Both RAC and Exadata are licensed separately from the Oracle Database license. Exascale includes RAC but also introduces additional features and optimizations for the entire Exadata platform. Oracle Autonomous Database is a fully managed cloud database service that runs on Exadata infrastructure, incorporates RAC, and uses machine learning to automate database administration tasks.

To summarize, Oracle’s offerings span many options that are related to one another and are not mutually exclusive:

Figure 2. Oracle’s multifaceted capabilities

Converged, unlike the other items in Figure 2, is not a product but Oracle’s philosophy of delivering a comprehensive database that serves multiple workloads and use cases. Let’s look at Oracle’s Converged Database next.

Converged Data

Most databases are optimized for specific types of data (e.g., relational, JSON, spatial, or vector), but Oracle Database supports multiple data models, workloads, and services natively within a single database engine. This eliminates the need for organizations to manage multiple specialized databases for different use cases and to move data between them with ETL, which improves operational efficiency and reduces complexity.

JSON Relational Duality

Handling JSON documents in relational databases has long been a conundrum. In the first iteration, relational databases simply stored JSON documents in a BLOB column, flattening the JSON hierarchy into a single column. However, that approach gave up JSON’s semi-structured, hierarchical benefits. In the next iteration, databases added a native JSON data type that stored documents in their raw format; Oracle Autonomous JSON Database supports this. Today, with JSON-Relational Duality Views, Oracle in effect stores the hierarchies in normalized relational tables so they can be “reinflated” into documents on demand. This lets Oracle address the impedance mismatch between application developers, who think in documents, and data modelers, who think in tables, while still benefiting from all of its database optimizations and the security model of its RDBMS. This approach reduces the need for, and the overhead of, a traditional object-relational mapping (ORM) tool.

Oracle stores the data behind these JSON documents in its relational storage engine, yet still enables REST API access (along with SQL and property graph access, which is covered next). If an attribute value shared across documents changes, the change does not have to be made in every affected JSON document; because the value is stored once and referenced through foreign keys, a single atomic update is instantly reflected everywhere. Oracle has also added MongoDB API compatibility to its JSON APIs.
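To make the duality idea concrete, here is a minimal sketch in plain Python, assuming hypothetical `customers` and `orders` tables. It mimics only the read path of a duality view: documents are assembled from normalized rows on demand, so renaming a customer is one update rather than a rewrite of every order document. It is not Oracle’s API; real duality views are defined in SQL and also accept writes made through the documents.

```python
from copy import deepcopy

# Hypothetical normalized tables: each customer attribute is stored exactly once.
customers = {1: {"name": "Acme Corp", "city": "Austin"}}
orders = [
    {"order_id": 101, "customer_id": 1, "total": 250.0},
    {"order_id": 102, "customer_id": 1, "total": 75.5},
]

def order_documents():
    """Reinflate each order row into a nested, JSON-style document."""
    return [
        {
            "_id": row["order_id"],
            "total": row["total"],
            # Join through the foreign key instead of copying customer data into each order.
            "customer": {"id": row["customer_id"], **deepcopy(customers[row["customer_id"]])},
        }
        for row in orders
    ]

def rename_customer(customer_id, new_name):
    """A single update to the normalized row; no per-document rewrites."""
    customers[customer_id]["name"] = new_name

rename_customer(1, "Acme Inc")
for doc in order_documents():
    print(doc["_id"], doc["customer"]["name"])  # both orders show "Acme Inc"
```

The design point is that the document is a projection, not a copy: the customer’s name lives in one row, and every document that embeds it picks up the change the next time it is generated.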

Property Graphs

Oracle has offered a standalone graph database for a long time, but it has now built a graph data model into the database itself. This allows Oracle to support operational, real-time use cases. More interesting is its support for SQL/PGQ (SQL Property Graph Queries), part of the ISO SQL:2023 standard, which lets users define property graphs over existing tables and query them in SQL. Because it is a standard, SQL/PGQ improves code portability and reduces the risk of application lock-in.

SQL/PGQ queries can also be executed on graphs represented as a JSON document, in which case the document stores the vertices and edges.
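As a rough illustration, the sketch below stores a tiny hypothetical graph as one JSON document with `vertices` and `edges` arrays and evaluates a single-hop pattern match in Python. The commented query shows approximately how the same pattern is written with SQL/PGQ’s `GRAPH_TABLE` and `MATCH` syntax over a hypothetical graph named `g`; the Python function only illustrates what pattern matching computes, not how Oracle executes it.

```python
import json

# Roughly corresponds to this SQL/PGQ query over a hypothetical graph "g":
#   SELECT friend FROM GRAPH_TABLE (g
#     MATCH (a IS person WHERE a.name = 'Alice') -[e IS knows]-> (b IS person)
#     COLUMNS (b.name AS friend))

graph_doc = json.loads("""
{
  "vertices": [
    {"id": 1, "label": "person", "name": "Alice"},
    {"id": 2, "label": "person", "name": "Bob"},
    {"id": 3, "label": "person", "name": "Carol"}
  ],
  "edges": [
    {"src": 1, "dst": 2, "label": "knows"},
    {"src": 2, "dst": 3, "label": "knows"}
  ]
}
""")

def match_one_hop(doc, edge_label, src_name):
    """Names of vertices b such that (a {name: src_name}) -[edge_label]-> (b)."""
    by_id = {v["id"]: v for v in doc["vertices"]}
    return [
        by_id[e["dst"]]["name"]
        for e in doc["edges"]
        if e["label"] == edge_label and by_id[e["src"]]["name"] == src_name
    ]

print(match_one_hop(graph_doc, "knows", "Alice"))  # ['Bob']
```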

AI Vector Search

Oracle has brought vector search, hybrid search that combines text and vector search, and retrieval-augmented generation (RAG) capabilities right inside its database. Oracle Database 23ai adds a new VECTOR data type and two types of vector indexes: an in-memory neighbor graph index based on Hierarchical Navigable Small World (HNSW) and a disk-based neighbor partition index based on Inverted File (IVF). Oracle has an ambitious roadmap that will add other popular vector indexing techniques, like ScaNN and Product Quantization (PQ).
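The core operation behind vector search is ranking stored vectors by their distance to a query vector. The brute-force sketch below, using hypothetical document ids and three-dimensional vectors, computes cosine distance against every row and keeps the k closest. An HNSW or IVF index exists precisely to avoid this full scan, returning approximate nearest neighbors much faster.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def exact_top_k(query, rows, k):
    """Exact k-nearest-neighbor search over (id, vector) pairs by full scan."""
    scored = sorted((cosine_distance(query, vec), row_id) for row_id, vec in rows)
    return [row_id for _, row_id in scored[:k]]

rows = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
]
print(exact_top_k([1.0, 0.05, 0.0], rows, k=2))  # ['doc-a', 'doc-b']
```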

Oracle provides multiple options to read, chunk, and create embeddings for unstructured data:

  1. Using SQL functions and PL/SQL stored procedures to perform parallel chunking and embedding operations.
  2. Using a REST API to access third-party vector embedding models. One supported option is Ollama, a free and open-source command-line tool that runs LLMs (such as Llama 3, Phi 3, Mistral, and Gemma 2) locally and privately and exposes them through a REST endpoint on the local host.
  3. Converting an embedding model into the Open Neural Network Exchange (ONNX) format and loading it into the database.
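Whichever option produces the embeddings, unstructured text is usually split into overlapping chunks first. The sketch below shows one simple, word-based way to do that; `chunk_words` and its parameters are hypothetical and are not Oracle’s chunking functions, which offer their own settings. Overlap means that context near a chunk boundary appears in both neighboring chunks.

```python
def chunk_words(text, max_words=50, overlap=10):
    """Split text into windows of max_words words; neighbors share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks

sample = "Oracle Database 23ai adds a VECTOR data type and vector indexes for similarity search"
for chunk in chunk_words(sample, max_words=6, overlap=2):
    print(chunk)
```

Each chunk would then be passed to an embedding model and stored alongside its vector.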

In addition, Oracle AI Vector Search is integrated with the open-source orchestration framework LangChain. Also, GoldenGate 23ai replicates data from various sources into Oracle, where the data is vectorized to enable AI vector search.

Oracle has optimized vector search by transparently offloading it to Exadata storage servers, which process it in parallel. In addition, HNSW indexes can be replicated on every RAC node for even faster similarity searches.
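Offloading works because a top-k search splits cleanly across partitions: each storage server can return its own k best candidates, and the global top k is guaranteed to be among them. The sketch below illustrates that scatter-gather pattern, assuming hypothetical in-memory partitions, using threads as stand-ins for storage servers and Euclidean distance for simplicity. It is not Exadata’s actual offload mechanism.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor

def local_top_k(partition, query, k):
    """Work done on one partition: its k closest (distance, id) pairs."""
    return heapq.nsmallest(
        k, ((math.dist(query, vec), row_id) for row_id, vec in partition)
    )

def parallel_top_k(partitions, query, k):
    """Scatter the query to every partition, then merge the local winners."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        local_results = list(pool.map(lambda p: local_top_k(p, query, k), partitions))
    candidates = [pair for result in local_results for pair in result]
    return [row_id for _, row_id in heapq.nsmallest(k, candidates)]

partitions = [
    [("a", [0.0, 0.0]), ("b", [5.0, 5.0])],
    [("c", [1.0, 0.0]), ("d", [9.0, 9.0])],
    [("e", [0.0, 2.0])],
]
print(parallel_top_k(partitions, [0.0, 0.0], k=3))  # ['a', 'c', 'e']
```

Only k candidates per partition travel back to the coordinator, which is what makes pushing the scan down to the storage tier worthwhile.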

Oracle Database 23ai advancements

One common theme is that Oracle is adding new native data types and bringing back into the database capabilities that had moved to other layers, such as the security or application layer. Hence, Oracle has added a firewall and a new cache to the database.

Oracle releases a long-term support (LTS) database version every four years, and its latest is 23ai. In this release, JavaScript joins PL/SQL and Java as a first-class language for writing stored procedures.

Oracle Database 23ai is a treasure trove of new enhancements:

  • Lock-free long-running transactions: Lock-free updates let multiple apps read and update the same data without holding locks and without compromising data integrity, resolving complex transaction management problems.
  • Raft-based replication for Globally Distributed Database: This quorum-based replication protocol provides automatic failover to a replica in under 3 seconds. Oracle Globally Distributed Database splits data into horizontal shards, and each shard acts as the primary for a subset of the data. This new approach provides an active-active, symmetric configuration with zero data loss.
  • True Cache: A lightweight, diskless Oracle instance that caches data so that reads avoid querying the back-end database. On a cache miss, True Cache automatically fetches the data from the underlying database.
  • SQL Firewall: Monitors and blocks unauthorized SQL and SQL injection attacks. Security is a big enough topic to warrant its own blog post. According to Oracle, its database has been hardened by organizations like the CIA over the last 20 years.
  • Auto Real-time SQL Plan Management: Removes the need for a DBA to resolve performance issues, as it automatically detects and repairs SQL performance regressions.
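The zero-data-loss claim for Raft-based replication rests on quorum arithmetic: a write counts as committed only once a majority of replicas holds it, and any two majorities overlap. Below is a minimal sketch of that idea, assuming a hypothetical three-replica group. It deliberately ignores Raft’s terms, leader elections, and log-consistency checks, and it is not Oracle’s implementation.

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class Replica:
    name: str
    up: bool = True
    log: list = field(default_factory=list)

def majority(n):
    """Smallest number of replicas that forms a quorum out of n."""
    return n // 2 + 1

def replicate(leader, followers, entry):
    """Append on the leader and every reachable follower.

    Returns True only if a majority of all replicas now hold the entry,
    i.e. the write may be acknowledged to the client as committed.
    """
    leader.log.append(entry)
    acks = 1  # the leader's own copy
    for follower in followers:
        if follower.up:
            follower.log.append(entry)
            acks += 1
    return acks >= majority(len(followers) + 1)

def every_quorum_has(replicas, entry):
    """Check that every possible majority contains at least one copy of entry."""
    q = majority(len(replicas))
    return all(any(entry in r.log for r in group) for group in combinations(replicas, q))

a, b, c = Replica("A"), Replica("B"), Replica("C", up=False)
print(replicate(a, [b, c], "tx1"), every_quorum_has([a, b, c], "tx1"))  # True True
```

If leader A then fails, any replacement must win votes from a quorum, and every quorum includes a replica (here B) that holds tx1, which is why a committed write survives the failover.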

Other enhancements help with supporting microservices, priority transactions, and rolling patches.

HeatWave

Formerly known as MySQL HeatWave, HeatWave started as a fully managed MySQL offering in OCI with in-memory analytics. It has since grown in every direction into a versatile HTAP database with these built-in capabilities:

  • HeatWave MySQL: Fully automated, managed MySQL Enterprise Edition with an integrated in-memory query accelerator. Queries can join data in MySQL with data in object stores without any data movement.
  • HeatWave Lakehouse: Uses OCI Object Storage or Amazon S3 with HeatWave’s query engine to run analytics across structured, semi-structured, and unstructured documents.
  • HeatWave AutoML: Includes an automated pipeline to train and build ML models.
  • HeatWave GenAI: Released in 2024, it bundles in-database LLMs (Mistral and Meta Llama 3) and a vector store with scale-out vector processing on up to 512 nodes. One unique feature of HeatWave GenAI is that it automates vector store creation and updates embeddings when the source data changes. It can also use external LLMs, such as those from Cohere. Another unique addition is HeatWave Chat, which allows users to interact with data in natural language.
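The post does not describe how HeatWave GenAI keeps embeddings current. One common technique for keeping any vector store in sync with changing source data is to re-embed only documents whose content hash has changed, and the sketch below illustrates that technique. `refresh_vector_store` and `fake_embed` are hypothetical names, the store is an in-memory dict, and this is not HeatWave’s documented mechanism.

```python
import hashlib

def content_hash(text):
    """Fingerprint a document so changes can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def refresh_vector_store(documents, store, embed):
    """Re-embed only new or changed documents and drop deleted ones.

    documents: {doc_id: text}; store: {doc_id: (hash, vector)}.
    Returns the ids that were (re)embedded on this pass.
    """
    updated = []
    for doc_id, text in documents.items():
        digest = content_hash(text)
        if doc_id not in store or store[doc_id][0] != digest:
            store[doc_id] = (digest, embed(text))
            updated.append(doc_id)
    for doc_id in list(store):
        if doc_id not in documents:
            del store[doc_id]  # source row deleted, so remove its vector
    return updated

def fake_embed(text):
    """Hypothetical stand-in for an embedding model."""
    return [float(len(text))]

docs = {"a": "hello", "b": "world"}
store = {}
print(refresh_vector_store(docs, store, fake_embed))  # ['a', 'b']
docs["b"] = "world!"
print(refresh_vector_store(docs, store, fake_embed))  # ['b']
```

Because embedding calls dominate the cost, skipping unchanged documents is what makes frequent refreshes practical.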

At the conference, Oracle announced a slew of enhancements, ranging from optimizer performance to reliability. Besides OCI, HeatWave is also available natively on AWS.

Intelligent Data Lake

Oracle Intelligent Data Lake is a new component of the Oracle Data Intelligence Platform that will be available in 2025. It is Oracle’s version of a lakehouse that integrates with Oracle Autonomous Data Warehouse and HeatWave. It will comprise a data catalog, Apache Spark and Apache Flink processing engines, and Jupyter notebooks.

Lakehouses with open table formats are one of the hottest topics in the data space, but somehow they did not get as much airtime as the other announcements. However, this is a critical initiative for Oracle because it covers not only its data estate but also its vast array of Fusion ERP and CRM applications.

Oracle is expanding the reach of its Fusion Data Intelligence beyond its own ERP and CRM applications and into Salesforce. While we are on the topic of Fusion apps, Oracle released 50 AI agents to automate key business processes with an eye towards improving productivity.

Figure 3 shows Oracle’s schematic of its Intelligent Data Lake.

Figure 3. Oracle Intelligent Data Lake Marketecture

I look forward to more lakehouse announcements at Oracle CloudWorld 2025.

Sanjeev Mohan

Sanjeev researches the space of data and analytics. Most recently he was a research vice president at Gartner. He is now a principal with SanjMo.