AWS’s Comprehensive Stack for Generative AI
Amazon Web Services (AWS) is no stranger to innovation. Its parent company, Amazon, completed 30 storied years in the summer of 2024. AWS not only created and perfected the cloud category, but is now rapidly expanding its presence in the generative AI space.
In this paper, we explore the components of AWS’s generative AI stack, which build upon its world-class data management and machine learning foundations. Amazon has long used traditional AI in offerings like Alexa and its recommendation systems.
Although generative AI requires a high-quality, reliable, and secure data foundation, this paper does not cover the data layer. Its primary focus is to explain how AWS’s offerings help with generative AI workloads, such as building agentic architectures, retrieval-augmented generation (RAG) pipelines, and fine-tuning and pre-training models.
AWS’s Integrated AI Stack
The AWS generative AI ethos is to efficiently build, customize, deploy, and scale AI apps:
- Efficiency: Starts with identifying relevant, high-impact use cases and establishes a process to move quickly from experimentation to production, mapped to the organization’s maturity.
- Customization: Gives users a choice of native and third-party foundation models and tools that meet their unique needs, such as cost performance, security, and governance.
- Scale: Measures and tracks business value and optimizes for cost, latency, and accuracy to manage risk and maintain trust.
AWS provides a full range of capabilities, as depicted in Figure 1.
Infrastructure
AWS has made substantial investments in building a robust and scalable infrastructure to support generative AI workloads. This includes purpose-built chips for inference and training, as well as a vast network of data centers. For example, Amazon is investing over $100B over the next few years to expand its data center footprint.
The combination of hardware and software optimizations enables AWS to deliver resilient, high-performance, and cost-effective AI solutions that meet customers’ current and future AI needs.
The cornerstone of AWS’s AI infrastructure push is purpose-built chips: Inferentia for inference tasks and Trainium for training models. AWS uses these chips for its own native foundation models (FMs) and for third-party models like Anthropic’s Claude family. Amazon Prime Day in July 2024 ran on 80,000 Inferentia and Trainium chips.
These AI-specific chips complement AWS’s purpose-built Graviton4 CPUs, which became generally available (GA) in the summer of 2024. This fourth-generation CPU has 96 cores and is based on Neoverse V2, Arm’s high-performance computing processor architecture.
[The author added on September 2, 2024: According to The Verge, Amazon is set to revitalize Alexa by incorporating Anthropic’s Claude family of models alongside its existing ML algorithms and Titan models. This “Remarkable Alexa” is anticipated to launch in October 2024 and may require a paid subscription.]
Models
The beating heart of generative AI is the family of Transformer-based foundation models, such as large language models (LLMs). These pre-trained models serve as the building blocks for a variety of AI applications.
AWS’s strategy is to give customers model choice, helping them find the right model for the right use case through its native Titan family of models and third-party models. AWS provides models from AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI. In fact, it has invested $4B in Anthropic, whose Claude family of models is now amongst the top-performing models. Interestingly, OpenAI models are not available.
As expected, AWS has trained Titan models on massive datasets, allowing them to capture complex patterns and relationships within the data. Enterprises can customize Titan models to suit their specific needs by fine-tuning them on their own datasets. They are designed to be scalable and efficient to handle large-scale workloads and achieve optimal performance.
The Titan family comprises eight models offering text, image, multimodal, and embedding capabilities via a fully managed API. These models are available through Amazon Bedrock, which we cover later in the document.
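As a taste of that managed API, here is a minimal sketch that requests an embedding from a Titan embedding model, assuming boto3 is configured with credentials and Bedrock model access has been granted; the region, model ID, and input text are placeholders:

```python
# Minimal sketch: fetch a text embedding from a Titan model on Bedrock.
# Region, model ID, and input text below are illustrative placeholders.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

resp = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"inputText": "What is retrieval-augmented generation?"}),
)

# The response body is a stream of JSON holding the embedding vector.
embedding = json.loads(resp["body"].read())["embedding"]
print(len(embedding))  # vector dimensionality
```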
Next we look at Amazon SageMaker, AWS’s fully managed machine learning service for building, training, and deploying models, which simplifies the deployment, monitoring, and maintenance of powerful AI tools. It bundles notebooks, debuggers, profilers, pipelines, and MLOps support into one integrated development environment (IDE). Customers can build their own FMs, fine-tune existing models, or use hundreds of pre-trained models.
Model evaluation is a critical step in the machine learning lifecycle because it ensures that the models perform as expected, are reliable, and are suitable for their intended tasks. Without proper evaluation, there is a risk of deploying models that make incorrect predictions or produce biased outputs. Model evaluation not only helps with accuracy and trustworthiness but also ensures the model meets the intended use case’s functional and nonfunctional requirements.
Amazon SageMaker Clarify not only helps select the optimal model but also quantifies the impact of model customization techniques, such as prompt engineering, reinforcement learning from human feedback (RLHF), retrieval-augmented generation (RAG), and supervised fine-tuning (SFT).
SageMaker Clarify can analyze pre-training data and post-training predictions to identify potential biases. By using SHAP (SHapley Additive exPlanations) values, the tool can explain individual predictions. This allows users to see how each feature contributes to a particular prediction, making the model more interpretable and transparent. To summarize, the tool helps in model evaluation, bias detection, feature importance, and model explainability.
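To make the idea concrete, here is a minimal sketch of the SHAP technique that Clarify builds on, using the open-source shap library on a toy scikit-learn model. This illustrates the method itself, not Clarify’s own API; the dataset and model are illustrative.

```python
# Minimal sketch of SHAP-based per-prediction explanations (the technique
# SageMaker Clarify builds on), using the open-source `shap` library.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Toy dataset and model, purely for illustration.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])

# Each row shows how much each feature pushed one prediction above or
# below the model's average output: the per-prediction explanation.
shap.summary_plot(shap_values, X.iloc[:100])
```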
Next, Amazon SageMaker Experiments is used to track, organize, and compare multiple versions of machine learning experiments. It records the parameters, configurations, and results of each run, such as the dataset used, model architecture, hyperparameters, and training outcomes. This tool helps with A/B testing, collaboration, and auditing.
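A minimal sketch of what that tracking looks like with the SageMaker Python SDK follows; the experiment name, run name, parameters, and metric value are illustrative placeholders, and the training code is elided.

```python
# Minimal sketch: logging a run with SageMaker Experiments (SageMaker
# Python SDK v2). Names, parameters, and metric values are placeholders.
from sagemaker.experiments.run import Run
from sagemaker.session import Session

with Run(
    experiment_name="churn-model",      # groups related runs
    run_name="rf-baseline",             # this specific trial
    sagemaker_session=Session(),
) as run:
    run.log_parameter("n_estimators", 200)                  # hyperparameter
    run.log_parameter("dataset", "s3://my-bucket/train.csv")  # data lineage
    # ... train and evaluate the model here ...
    run.log_metric(name="validation:f1", value=0.87)        # training outcome
```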
The experimentation and development can be done in an IDE like SageMaker Studio or in SageMaker Notebook, or a no-code visual interface called SageMaker Canvas. Each option provides unique features and interfaces to accommodate the diverse ways in which data scientists, developers, analysts, and business users engage with machine learning tasks.
For example, SageMaker Studio is designed for data scientists and ML engineers who are comfortable writing code and need a powerful, all-in-one environment for the entire ML workflow, from data preparation and model building to training, tuning, and deployment. SageMaker Notebooks is a cloud-based Jupyter notebook for data scientists, while SageMaker Canvas is designed for business analysts and other non-technical users who want to build ML models without writing any code.
These are only a handful of Amazon SageMaker’s components; it is a comprehensive and powerful platform that covers the entire AI lifecycle.
Development
For organizations to build and deploy AI workloads, they need an integrated set of tools and technologies. An example AI workload is extracting entities from unstructured data, either to fine-tune a model or to create a RAG pipeline (see the sketch after the list below).
AWS offers several services for traditional ML workloads, such as:
- Text to speech — Amazon Polly
- Speech to text — Amazon Transcribe
- Image & video analysis — Amazon Rekognition
- Chatbots — Amazon Lex
- NLP — Amazon Comprehend
- Language translation — Amazon Translate
- Document Processing — Amazon Textract
- Recommendation engine — Amazon Personalize
- Coding assistant — Amazon Q Developer
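To illustrate the entity-extraction workload mentioned above, here is a minimal sketch using Amazon Comprehend via boto3; the region and sample text are placeholders.

```python
# Minimal sketch: entity extraction with Amazon Comprehend via boto3.
# The region and sample text are illustrative placeholders.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

resp = comprehend.detect_entities(
    Text="Amazon Web Services made Graviton4 generally available in 2024.",
    LanguageCode="en",
)

# Each entity has a type (ORGANIZATION, DATE, ...) and a confidence score;
# the results can feed a fine-tuning dataset or RAG metadata.
for entity in resp["Entities"]:
    print(entity["Type"], entity["Text"], round(entity["Score"], 2))
```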
Amazon Bedrock is a serverless, managed service to access the FMs mentioned in the previous section. Using a common API across all models, developers can upgrade to the latest model versions with minimal changes. As a result, when moving from experimentation to production, you can change models (native or 3rd party) through configuration rather than code changes.
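A minimal sketch of that pattern follows, using Bedrock’s model-agnostic Converse API via boto3. The model ID is read from configuration so it can be swapped without code changes; the IDs, region, and prompt are placeholders, and boto3 credentials plus Bedrock model access are assumed.

```python
# Minimal sketch: one code path for any Bedrock model via the Converse API.
# Model ID, region, and prompt are illustrative placeholders.
import os
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Swapping models is a configuration change, not a code change.
model_id = os.environ.get("MODEL_ID", "anthropic.claude-3-haiku-20240307-v1:0")

resp = bedrock.converse(
    modelId=model_id,
    messages=[{"role": "user",
               "content": [{"text": "Summarize RAG in one sentence."}]}],
    inferenceConfig={"maxTokens": 200, "temperature": 0.2},
)

print(resp["output"]["message"]["content"][0]["text"])
```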
Developers who use the FMs for model evaluation or to build genAI apps have two options: on-demand inference and batch inference. Batch inference runs multiple requests asynchronously at 50% of the price of on-demand inference. This capability became generally available in August 2024.
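For completeness, here is a minimal sketch of submitting a batch job through the CreateModelInvocationJob API; the job name, model ID, role ARN, and S3 paths are placeholders, and the input location is assumed to hold JSONL files with one request per line.

```python
# Minimal sketch: submit a Bedrock batch inference job. Job name, model ID,
# role ARN, and S3 URIs are illustrative placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

job = bedrock.create_model_invocation_job(
    jobName="nightly-summaries",
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-bucket/requests/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/results/"}},
)

print(job["jobArn"])  # poll get_model_invocation_job(...) for job status
```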
AWS has also launched App Studio (in preview), an AI-driven assistant that lets technical users outside of software development quickly build genAI applications using natural language. Using prompts, users can ask App Studio to develop a multi-page UI, a data model, and business logic. Furthermore, its point-and-click interface helps connect to AWS data sources and third-party business applications like Salesforce, or use connector APIs to SaaS solutions like HubSpot and Twilio. It is a free service.
Finally, Cohere has launched LLM University, with courses on building AI applications using Cohere’s Chat, Embed, and Rerank endpoints. One module covers using Cohere models with Amazon Bedrock for text generation, summarization, RAG pipelines, and semantic search, and using Amazon SageMaker to increase search result accuracy.
Governance
Security is job #1 at AWS. Ethical and responsible AI is a paramount concern for customers who have entrusted AWS with their most sensitive data. Now the same rigor is being applied to securing and governing AI workloads.
The AWS framework for governing AI outcomes is called Cloud Adoption Framework for Artificial Intelligence (AI), Machine Learning (ML), and generative AI (CAF-AI). This is a living methodology that focuses on not just technology, but also on the people and process aspects.
AWS also partners with dedicated AI governance platforms like IBM’s watsonx.governance. The close integration of the IBM product with Amazon SageMaker allows organizations to monitor and manage risks pertaining to:
- Models: Maintains enterprise-wide model inventory consisting of policies, metrics and models.
- Operations: Consolidates risk and control assessments, action plans, key risk indicators, and internal/external loss events.
- Regulatory: Integrates data feeds, software apps and processes into a single pane of glass to maintain a comprehensive and timely view of compliance risks.
Guardrails for Amazon Bedrock is a feature that enhances the safety and responsibility of AI applications built on AWS’s generative AI platform. It provides customizable safeguards on top of the native protections offered by foundation models (FMs), blocking up to 85% more harmful content and filtering over 75% of hallucinated responses. The solution enables customers to apply safety, privacy, and truthfulness protections consistently across all their applications, regardless of the underlying model.
Key features include the ability to define and block undesirable topics, filter harmful content based on specific AI policies, redact sensitive information (PII), and create custom word filters. Additionally, Guardrails supports contextual grounding checks to detect and filter hallucinations in model responses, ensuring that AI-generated content remains accurate and trustworthy.
Guardrails for Amazon Bedrock works with all large language models available on the platform, as well as fine-tuned models, and integrates with related Bedrock capabilities like Agents and Knowledge Bases. The ApplyGuardrail API also allows evaluating inputs and model responses generated by custom or third-party models outside of Bedrock. This makes it a versatile tool for organizations aiming to build responsible AI applications that align with their specific policies and requirements.
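Here is a minimal sketch of calling the ApplyGuardrail API via boto3; the guardrail identifier and version are placeholders for a guardrail created beforehand, and the input text is illustrative.

```python
# Minimal sketch: evaluate text against a guardrail without invoking a
# Bedrock model. Guardrail ID and version are illustrative placeholders.
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

resp = runtime.apply_guardrail(
    guardrailIdentifier="abc123example",
    guardrailVersion="1",
    source="INPUT",  # use "OUTPUT" to screen model responses instead
    content=[{"text": {"text": "Tell me how to bypass a paywall."}}],
)

# "GUARDRAIL_INTERVENED" means a policy matched and the content was blocked.
print(resp["action"])
```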
Apps
Amazon Q is a sophisticated, natural-language approach to question-answering systems. It processes and responds to complex queries, particularly in structured and semi-structured data contexts. Advanced NLP techniques enable the system to parse, interpret, and generate accurate responses from large datasets. The system’s natural language understanding (NLU) is robust enough to handle nuanced queries, making it a powerful tool for extracting information from extensive and diverse data sources.
Amazon Q’s capabilities make it an invaluable tool across several use cases, including customer support, data analytics, and business intelligence:
Q Developer:
An assistant for developers that reduces software development time by up to 70%. Its capabilities include coding suggestions and security scanning.
Amazon Q Developer has rapidly expanded and is seamlessly integrated with Amazon’s broader ecosystem, like SageMaker and Amazon Redshift, enabling it to tap into large-scale data warehouses and machine learning models to provide insightful answers. The synergy with Amazon Redshift, for example, allows Q Developer to run complex SQL queries on vast datasets, delivering precise results in response to natural language questions.
Q Business:
Ingests data from 40+ systems, like Amazon S3 and Salesforce, to help build “Q Apps” for roles such as sales and legal. These apps let users get answers to their questions, receive summaries, generate content, and securely complete tasks based on the data and information in their enterprise systems.
Users can also use Amazon Q Apps to generate apps in a single step, either from their conversation with Amazon Q Business or by describing their requirements.
Embedded in native and 3rd party apps:
- Amazon QuickSight: A key strength of Amazon Q in QuickSight is its ability to handle advanced data querying beyond simple fact retrieval. It supports complex queries that require understanding relationships between data points, performing aggregations, and even generating insights from data trends. This makes Amazon Q not just a question-answering tool but a potent engine for data analysis and business intelligence.
- Amazon Connect: Amazon Q can provide automated responses to customer inquiries, reducing the load on human agents and improving response times.
- Supply Chain: Amazon Q is fine-tuned to understand and respond to queries within specific domains, such as finance, healthcare, or retail, so it captures not only the language used in these fields but also the context, resulting in more accurate and relevant answers.
Amazon Q’s architecture supports real-time query processing, giving users immediate responses, which matters in dynamic environments where timely information is critical, such as financial markets or operational monitoring systems. Because it leverages AWS infrastructure, it scales to query loads that vary significantly, delivers consistent, reliable performance, and adheres to AWS’s stringent security protocols.
Amazon Q can act as a bridge between raw data and decision-makers, enabling executives to ask direct questions about business performance and receive clear, actionable answers. Business users can query complex datasets with no need to understand the underlying data structures or SQL, democratizing access to data-driven insights.
Summary
Generative AI has ushered in a new era of technological innovation. As enterprises seek to leverage this transformative technology, AWS has positioned itself as a pivotal player. It provides a comprehensive suite of development tools to facilitate the creation and deployment of generative AI applications. The paper highlights AWS’s integrated AI stack, designed to efficiently build, customize, deploy, and scale AI applications.
We examined the foundational components, including infrastructure, models, development tools, governance frameworks, and applications. This nuanced analysis uncovers the unique value proposition that AWS offers to enterprises seeking to harness the power of generative AI.
AWS has energized its massive ecosystem of partners to help strategize, build, deploy, and operationalize AI solutions. As the field of generative AI continues to evolve, AWS is well-positioned to remain at the forefront of innovation.