Google Cloud Next ’24 Infuses AI into a Unified Data Stack

Sanjeev Mohan
Apr 19, 2024


The Google Cloud Next 2024 edition took place April 9–11 in Las Vegas. As expected, it was packed with new announcements and developments, even though the last conference was held only eight months ago. This was the biggest Google Cloud Next to date, with over 30,000 attendees.

In this paper, I distill some of the key announcements pertaining to data, analytics, and AI, and their impact on the big picture. Last year, I created a framework mimicking the 7-layer OSI model to cover the new developments. This year, I am using the sections of Google’s strategy framework, shown below, to bucket and analyze key announcements.

Figure 1: Google Cloud’s strategy pillars with AI infused in each category.

The breadth of announcements was so wide that, out of almost 600 new developments, Google Cloud could cover only 218 at the event. In the same spirit, this report focuses on a subset of the announcements.

Many of the announcements are in public or private preview. Please refer to the Google Cloud documentation to get the latest release status.

Keynote Highlights

The main keynote set the tempo for where Google Cloud is heading.

Figure 2: Google Cloud Next keynote at the Michelob ULTRA Arena, Las Vegas

Key points from the keynote that stood out are:

  • It comprised several live demos by Google Cloud executives and product team members, with no external speakers, demonstrating Google Cloud’s confidence in its unified, full-stack AI cloud story.
  • The focus was on AI, specifically the benefits of AI agents to businesses and customers, rather than on the picks and shovels of cloud and data infrastructure.
  • AI is a big reset for IT players, and Google Cloud appears to be at an inflection point to accelerate its growth and attain leadership in the space.

Infra Cloud

The hallmark of Google Cloud is that it is a full-stack offering, mainly proprietary but steadily adding open standards. The bottommost layer of the stack comprises its data centers, with chips, storage, networking, and security infrastructure. To serve billions of daily users across YouTube, Ads, Search, and Workspace, this infrastructure is already planet-scale. Google has long used AI and ML to optimize hardware resources based on the shape of workloads.

A new custom Arm-based CPU called Axion was announced; it will be available to customers later in 2024. Google’s own TPUs, optimized for operations on multi-dimensional tensors, continue to be the workhorse for training, fine-tuning, and inference. Although Nvidia H100-powered A3 Mega instances are coming soon and the Blackwell platform will be supported in Google’s AI Hypercomputer, Google’s focus remains on TPUs.

Google Cloud has also been embracing both hybrid and cross-cloud while ensuring end-to-end security. More on this when we look at BigQuery Omni.

A pleasant surprise was the longevity and rising influence of Kubernetes, which turns 10 on June 6, 2024! Google Cloud Run is increasingly being used to deploy containerized applications, both by end users and by Google itself. Hugging Face demonstrated training and deploying its models on either Vertex AI or Google Kubernetes Engine (GKE). TPUs can be used to train not just native Google models but also those on Hugging Face.

GenAI Models & Platforms

The shining star of the conference was the Gemini multi-modal large language model (LLM). A refreshed version of Gemini 1.0 Pro with lower latency and higher quality is now GA, while Gemini 1.5 Pro, with its 1-million-token context window, is in public preview.

Why do we need such a large context window?

We got that answer on day 2, during the developer keynote, when the presenters fed the ongoing keynote video to the model and were able to ask it questions. Large context windows may soon become table stakes. For example, organizations looking for security vulnerabilities may want to send their entire code base to the LLM without having to chunk it first, which yields a more accurate and holistic outcome.
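To make this concrete, here is a minimal sketch (not an official sample) of sending a whole code base to Gemini 1.5 Pro through the Vertex AI Python SDK. The project ID, repo path, and prompt are hypothetical, and the preview model name reflects its label at the time of Next.

```python
# Hypothetical sketch: long-context code review with Gemini 1.5 Pro on Vertex AI.
from pathlib import Path

import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # hypothetical project
model = GenerativeModel("gemini-1.5-pro-preview-0409")  # preview name at Next '24

# Concatenate the repo into a single prompt instead of chunking it for RAG.
code = "\n\n".join(p.read_text() for p in Path("my_repo").rglob("*.py"))
response = model.generate_content(
    ["Review this code base for security vulnerabilities:", code]
)
print(response.text)
```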

But, you may ask, at what cost?

Google’s new attention technique, called Infini-attention, gives LLMs infinite context while keeping memory and compute usage bounded.

Despite these efficiency techniques, the fact remains that LLMs require huge infrastructure for training and inference; OpenAI’s GPT-4 training cost is estimated at $100M. Hence the need for lightweight, open models, like Google’s Gemma 2B and 7B, that can also run on edge devices not equipped with high-performance chips like GPUs. Although most organizations are not yet training their own LLMs, Google Cloud is making the process easier. See the BigQuery section below for more details.
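For illustration, a minimal sketch of running the open Gemma weights locally with the Hugging Face transformers library; the model is gated, so you must accept the license on the Hub first, and the prompt is purely illustrative.

```python
# Hypothetical sketch: local inference with the open Gemma 2B instruction-tuned model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")

inputs = tokenizer("Explain vector embeddings in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```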

The Gemini and Gemma models are part of Google’s AI/ML development and deployment environment called Vertex AI. The main persona for Vertex AI used to be a data scientist, but it is now an application developer! Tuning a model can now take just a few hours, at a reasonable cost.

Some other announcements include:

  1. Prompt management: maintains a history of prompts, suggests prompts, and allows tagging.
  2. Model evaluation: Auto SxS enables side-by-side comparison of responses from two models.
  3. Experimentation: CI/CD on prompts, plus observability through partners like Honeycomb.io.

Model Garden consists of over 130 curated models: almost 100 open-source models, two Anthropic Claude models, and the rest Google’s own. One of the native models is Imagen 2.0, a text-to-image model from Google DeepMind that generates photorealistic images and short videos from natural language prompts.

In summary, I feel what we are witnessing is Google finally bringing its research into production faster. Although its research seeded seminal technologies like Hadoop (via the MapReduce and GFS papers), the Transformer architecture, and Kubernetes, it let others execute on them faster. Now, it looks like Google Cloud is wresting back control.

Developer Cloud

The main keynote was delivered by Thomas Kurian, affectionately known as TK, but the stars were a bevy of AI agents. We saw demos of agents performing all kinds of tasks, from customer-facing shopping to coding to security. Figure 3 shows a list of agents showcased in the keynote.

Figure 3: AI-driven autonomous agents are so critical that they were mentioned 46 times during the keynote

It is quite fascinating to see how fast Google has bought into the agents concept. To make it easier to build and deploy agents, it launched Vertex AI Agent Builder with a UI and an SDK. It integrates with LangChain, a framework for developing LLM-powered applications.
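As a flavor of that integration, here is a minimal, hypothetical sketch that wires a Gemini model into a LangChain chain using the langchain-google-vertexai package; the model name and prompt are illustrative, and a production agent would add tools and grounding on top.

```python
# Hypothetical sketch: a Gemini-backed LangChain chain for a shopping assistant.
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model_name="gemini-1.0-pro")  # assumes Vertex AI credentials
prompt = ChatPromptTemplate.from_messages(
    [("system", "You are a retail shopping assistant."), ("human", "{question}")]
)
chain = prompt | llm  # LCEL: pipe the prompt template into the chat model
print(chain.invoke({"question": "Suggest a gift under $50 for a hiker."}).content)
```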

Gemini is powering assistive experiences in every layer including:

  1. Data management and security: for example, the Database Migration Service (DMS) uses Gemini to recommend code changes during migration. More on this later.
  2. Workspace: Google Vids, a video creation app, is headed our way in June.
  3. Development productivity: code completion and recommendations.

One assistant is Gemini Code Assist, an evolution of last year’s Duet AI for Developers. It supports writing code in over 20 languages (Java, SQL, Python, C++, etc.) in IDEs like Visual Studio Code and JetBrains, and has a natural language chat interface.

Among the many pre-built agents is the data agent, which uses a natural language interface to build data pipelines on data in various Google Cloud databases like AlloyDB and BigQuery. Using Looker semantics, developers can ask questions and collaborate on results in Google Workspace. Gemini models are used to generate the outcomes.

Data Cloud — Databases

Data is the kingmaker in the AI story. Period.

At the highest level, key development themes across databases and data analytics include:

  • Vector Search: Every Google Cloud database gets vector support, Firestore being the latest.
  • AI back-end: AI is used behind the scenes in tasks like workload optimization.
  • AI front-end: Natural language query using Gemini-based studios and the use of data agents.
  • Ecosystem: LangChain has been integrated across all databases to streamline development of agents and perform retrieval augmented generation (RAG).

AlloyDB introduced pgvector-based vector search last year, built on the very popular HNSW indexing algorithm. This year, AlloyDB has added its own approximate nearest neighbor technique, called ScaNN, which reduces latency for indexing and queries and has a lower memory footprint. AlloyDB also introduces model endpoints to work with models from Gemini, OpenAI, Hugging Face, Anthropic, and others.
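Because AlloyDB speaks the standard PostgreSQL wire protocol, the pgvector workflow looks like ordinary SQL. Below is a minimal sketch with hypothetical connection details and table, using the pgvector HNSW index; AlloyDB’s new ScaNN index is created with a similar CREATE INDEX statement, so consult the docs for its exact options.

```python
# Hypothetical sketch: pgvector similarity search on an AlloyDB instance.
import psycopg2

conn = psycopg2.connect(
    host="10.0.0.5", dbname="demo", user="postgres", password="..."  # hypothetical
)
with conn, conn.cursor() as cur:
    # Build an HNSW index over an assumed `products(embedding vector)` table.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS product_emb_idx "
        "ON products USING hnsw (embedding vector_cosine_ops)"
    )
    # Nearest-neighbor search: <=> is pgvector's cosine-distance operator.
    cur.execute(
        "SELECT name FROM products ORDER BY embedding <=> %s::vector LIMIT 5",
        ("[0.1, 0.2, 0.3]",),  # toy query embedding
    )
    print(cur.fetchall())
```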

Last year, Google Cloud announced Spanner’s Data Boost capability, which allows real-time operational data to be queried from the analytical database BigQuery and other analytical tools. This year, Google expanded Data Boost to Bigtable, Google’s key-value, wide-column database, allowing customers to execute analytical queries and ETL jobs, and to train machine learning models, directly on transactional data without disrupting operational workloads.

As mentioned in the previous section, Gemini also enables AI-assisted experiences in Google’s databases for developers and administrators. It is used to generate SQL code and to ease migrations to Google Cloud databases. A new use of the assistant is fleet management: customers with large deployments of Cloud SQL (MySQL and PostgreSQL) and AlloyDB databases get a single pane of glass that shows not only the availability status of the databases but also security, compliance, performance, and cost aspects. The natural language interface is used not only to manage databases but also to provide recommendations and assistance for migrating other databases into AlloyDB.

New ‘parameterized secure views’ were also introduced to secure data in AlloyDB and help guard against prompt injection attacks.

Google Cloud doesn’t always have to run in a data center connected to the public internet. Google Distributed Cloud (GDC) is designed to run on air-gapped appliances, on-premises, and at edge locations. This fully managed offering, formerly known as Anthos, runs various services like AlloyDB Omni using GKE. In the expo hall, McDonald’s demonstrated running Vertex AI models on appliances that are going into its 40,000 outlets.

Data Cloud — Data Analytics

There were so many announcements in the data space that I have separated data analytics from operational databases; this section covers the former.

Google’s unified data and AI platform is very close to the Intelligent Data Platform that we proposed in our 2024 Trends document. BigQuery is Google Cloud’s center of gravity for delivering that unified platform: it works on multi-structured data across any cloud and supports a multi-engine paradigm of SQL and Python/Spark on a shared metadata plane. BigQuery is becoming a complete ecosystem. Google Cloud VP & GM of data & analytics, Gerrit Kazmaier, says, “BigQuery is the ERP of data.”

Also in the 2024 Trends document, we identified one of the top trends to be “personalizing AI” by enriching it with corporate data, via three approaches: training, fine-tuning, and RAG. Just four months ago, we didn’t see training as a feasible option due to its high cost and complexity, but the integration of the Vertex AI Model Garden and Gemini in BigQuery may make it more practical. This option will lead to a significant rise in domain- or task-specific small language models.

The Gemini LLM is multi-modal: it can intelligently parse, store, and analyze both structured and unstructured data in a single place. While Hadoop opened the doors to multi-structured data storage in an object store, we were limited in what we could do because we lacked context. BigQuery ingests real-time streaming text, documents, audio, and video, and transcribes the content so that the output can be converted into vector embeddings and used for similarity search. The transcribed data can also be used to train models, fine-tune them, create LoRA adapters, and build RAG pipelines.

Unstructured data, like documents, images, and audio and video files, is stored on Google Cloud Storage-backed BigLake, and BigQuery stores the URIs to the files. Using the Document, Speech, or Vision APIs respectively, it stores the extracted entities, transcriptions, or extracted features.

Next, the Gemini model helps create vector embeddings, which are also stored in a column. Developers can then write SQL or Python queries to perform vector search, sentiment analysis, and recommendations, or even look at the raw data. BigQuery Studio, the unified user-experience front-end, went GA at Next.
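A hedged sketch of that flow, run from Python: generate an embedding for a natural-language query with ML.GENERATE_EMBEDDING and pass it to VECTOR_SEARCH. The dataset, table, and remote model names are hypothetical, and both functions were still in preview at the time of writing.

```python
# Hypothetical sketch: semantic search over document embeddings in BigQuery.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project
sql = """
SELECT base.uri, distance
FROM VECTOR_SEARCH(
  TABLE demo.doc_embeddings,   -- assumed table: one embedding per document
  'embedding',
  (
    SELECT ml_generate_embedding_result AS embedding
    FROM ML.GENERATE_EMBEDDING(
      MODEL demo.embedding_model,  -- remote model over a Vertex AI embedding endpoint
      (SELECT 'contract termination clauses' AS content))
  ),
  top_k => 5)
"""
for row in client.query(sql).result():
    print(row.uri, row.distance)
```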

BigQuery has witnessed a sharp increase in its Serverless Spark usage, and Google announced a preview of its serverless engine for Apache Spark integrated within BigQuery Studio. We expect that by Next 2025, the team will support other compute engines like DuckDB and Ray.

Finally, for all data ingested into BigQuery, metadata is also generated and stored in Dataplex. This serves two purposes: first, the metadata knowledge graph helps improve the accuracy of output; second, tags and classifications enable attribute-based access control (ABAC) for data access governance. Data observability features, like data quality rules and column-level lineage, also made their debut and are not yet GA.

BigLake is the managed lakehouse offering. It supports DDL and DML operations on Parquet files stored in GCS using all three major open table formats: Apache Iceberg, Apache Hudi, and Delta Lake. BigQuery also announced the ability to access Spanner tables as external tables.
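As an illustration, exposing existing Iceberg data in GCS to BigQuery via a read-oriented BigLake table is a single DDL statement; the connection, bucket, and metadata path below are hypothetical, and the managed read-write tables have their own setup described in the docs.

```python
# Hypothetical sketch: a BigLake external table over Iceberg data in GCS.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project
client.query("""
CREATE EXTERNAL TABLE demo.iceberg_sales
WITH CONNECTION `my-project.us.biglake-conn`
OPTIONS (
  format = 'ICEBERG',
  uris = ['gs://my-bucket/sales/metadata/v1.metadata.json']
)
""").result()
```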

BigQuery Omni is the capability that allows BigQuery to run on-premises and in other cloud hyperscalers like AWS and Azure. New in 2024 are cross-cloud materialized views and bi-directional zero-copy data sharing with Salesforce Data Cloud.

BigQuery is battle-hardened and feature-rich. Amid all the focus on AI, its GA announcement for Data Clean Rooms got little attention. This data-sharing option applies privacy-enhancing techniques (PETs), like k-anonymity and differential privacy, to shared data, and can work cross-cloud using BigQuery Omni. Interestingly, Snowflake’s Data Clean Rooms (an integration of its LeapYear acquisition) also went GA recently.

Another unsung announcement is the new FinOps Hub, which reduces the time to detect cost anomalies by 30%, provides root cause analysis, and helps align costs with top-line KPIs. When the cloud space was new, cost was a big concern; the focus should now shift to value and topics like sustainability. Just as security became table stakes, we expect FinOps to be baked in rather than remain a standalone topic.

Customers, Partners

There was plenty of customer activity, with over 300 customers showcasing their work. I got a chance to talk to a few customers and partners in one-on-one settings to learn how they use Google Cloud products and the benefits they are deriving.

One of the customers I met is Sabre, which provides technology to the travel and hospitality industry globally and at scale. They have migrated 17 data centers, comprising 40K servers and 50PB of data, to Google Cloud, and have integrated over a dozen analytics platforms into Google BigQuery. Their passenger name record (PNR) database has been migrated from a mainframe to Cloud Spanner. They also run several hundred Oracle databases on GCE but are moving them to GKE. When I asked what attracted them to Google Cloud, they pointed to two areas: exceptional data center engineering and responsiveness. One of their use cases is building revenue optimization models using Vertex AI.

I also got an opportunity to do a panel discussion with UPS which has built a digital twin of its entire distribution network and is using models to reduce losses due to ‘porch pirates’.

Several banks, retailers and healthcare providers showcased their advancements from improving contact center operations, to optimizing inventory, to developing AI caregiver assistants.

I realized the power of LLMs when I spoke to Reltio, a leader in the master data management space. They have used ML-powered inference for many years, but as the incoming data changes shape frequently, the ML model needs constant retraining. Now, they are finding a much higher level of accuracy using an LLM for entity resolution, and it doesn’t require constant retraining. In addition, it provides out-of-the-box capabilities like matching similar names across foreign languages. They are now making a complete switch and going into production with their clients.

If you have made it this far in the document, I am truly honored. Thank you for your support. If you are interested in an animated and verbal discourse on key announcements from three independent analysts, please check out this podcast.


Sanjeev Mohan

Sanjeev researches the space of data and analytics. Most recently he was a research vice president at Gartner. He is now a principal with SanjMo.