IBM Cloud Pak for Data - Hemanth Manda - E-Book

IBM Cloud Pak for Data E-Book

Hemanth Manda

Description

Cloud Pak for Data is IBM's modern data and AI platform that includes strategic offerings from its data and AI portfolio delivered in a cloud-native fashion with the flexibility of deployment on any cloud. The platform offers a unique approach to addressing modern challenges with an integrated mix of proprietary, open-source, and third-party services.
You'll begin by getting to grips with key concepts in modern data management and artificial intelligence (AI), reviewing real-life use cases, and developing an appreciation of the AI Ladder principle. Once you've gotten to grips with the basics, you will explore how Cloud Pak for Data helps in the elegant implementation of the AI Ladder practice to collect, organize, analyze, and infuse data and trustworthy AI across your business. As you advance, you'll discover the capabilities of the platform and extension services, including how they are packaged and priced. With the help of examples present throughout the book, you will gain a deep understanding of the platform, from its rich capabilities and technical architecture to its ecosystem and key go-to-market aspects.
By the end of this IBM book, you'll be able to apply IBM Cloud Pak for Data's prescriptive practices and leverage its capabilities to build a trusted data foundation and accelerate AI adoption in your enterprise.

Page count: 366

Year of publication: 2021




IBM Cloud Pak for Data

An enterprise platform to operationalize data, analytics, and AI

Hemanth Manda

Sriram Srinivasan

Deepak Rangarao

BIRMINGHAM—MUMBAI

IBM Cloud Pak for Data

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Ali Abidi

Senior Editor: Roshan Kumar

Content Development Editors: Athikho Sapuni Rishana and Priyanka Soam

Technical Editor: Manikandan Kurup

Copy Editor: Safis Editing

Project Coordinator: Aparna Ravikumar Nair

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Aparna Bhagat

First published: October 2021

Production reference: 2221021

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN: 978-1-80056-212-7

www.packt.com

Contributors

About the authors

Hemanth Manda heads product management at IBM and is responsible for the Cloud Pak for Data platform. He has broad experience in the technology and software industry spanning a number of strategy and execution roles over the past 20 years. In his current role, Hemanth leads a team of over 20 product managers responsible for simplifying and modernizing IBM's data and AI portfolio to support cloud-native architectures through the new platform offering that is Cloud Pak for Data. Among other things, he is responsible for rationalizing and streamlining the data and AI portfolio at IBM, a $6 billion business, and delivering new platform-wide capabilities through Cloud Pak for Data.

Sriram Srinivasan is an IBM Distinguished Engineer leading the architecture and development of Cloud Pak for Data. His interests lie in cloud-native technologies such as Kubernetes and their practical application for both client-managed environments and Software as a Service. Prior to this role, Sriram led the development of IBM Data Science Experience Local and the dashDB Warehouse as a Service for IBM Cloud. Early on in his career at IBM, Sriram led the development of various web and Eclipse tooling platforms, such as IBM Data Server Manager and the SQL Warehousing tool. He started his career at Informix, where he worked on application servers, database tools, e-commerce products, and Red Brick data warehouse.

Deepak Rangarao leads WW Technical Sales at IBM and is responsible for the Cloud Pak for Data platform. He has broad cross-industry experience in the data warehousing and analytics space, building analytic applications at large organizations and technical presales, both with start-ups and large enterprise software vendors. Deepak has co-authored several books on topics such as OLAP analytics, change data capture, data warehousing, and object storage and is a regular speaker at technical conferences. He is a certified technical specialist in Red Hat OpenShift, Apache Spark, Microsoft SQL Server, and web development technologies.

About the reviewers

Sumeet S Kapoor is a technology leader, seasoned data and AI professional, inventor, and public speaker with over 18 years of experience in the IT industry. He currently works for the IBM India software group as a solutions architect leader and enables global partners and enterprise customers on the journey of adopting data and AI platforms. He has solved complex real-world problems across industry domains and has also filed a patent in the area of AI data virtualization and governance automation. Prior to IBM, he worked as a senior technology specialist and development lead in Fortune 500 global product and consulting organizations. Sumeet enjoys running as a hobby and has successfully completed eight marathons and counting.

Campbell Robertson is the worldwide data and AI practice leader for IBM's Customer Success Group. In his role, Campbell is responsible for providing strategy and subject matter expertise to IBM Customer Success Managers, organizations, and IBM business partners. His primary focus is to help clients make informed decisions on how they can successfully align people, processes, and policies with AI- and data-centric technology for improved outcomes and innovation. He has over 25 years of experience of working with public sector organizations worldwide to deploy best-of-breed technology solutions. Campbell has an extensive background in architecture, data and AI technologies, expert labs services, IT sales, marketing, and business development.

Table of Contents

Preface

Section 1: The Basics

Chapter 1: The AI Ladder – IBM's Prescriptive Approach

Market dynamics and IBM's Data and AI portfolio

Introduction to the AI ladder

The rungs of the AI ladder

Collect – making data simple and accessible

Organize – creating a trusted analytics foundation

People – empowering your data citizens

Analyze – building and scaling models with trust and transparency

Infuse – operationalizing AI throughout the business

Customer service

Risk and compliance

IT operations

Financial operations

Business operations

The case for a data and AI platform

Summary

Chapter 2: Cloud Pak for Data: A Brief Introduction

The case for a data and AI platform – recap

Overview of Cloud Pak for Data

Exploring unique differentiators, key use cases, and customer adoption

Key use cases

Customer use case: AI claim processing

Customer use case: data and AI platform

Cloud Pak for Data: additional details

An open ecosystem

Premium IBM cartridges and third-party services

Industry accelerators

Packaging and deployment options

Red Hat OpenShift

Summary

Section 2: Product Capabilities

Chapter 3: Collect – Making Data Simple and Accessible

Data – the world's most valuable asset

Data-centric enterprises

Challenges with data-centric delivery

Enterprise data architecture

NoSQL data stores – key categories

Data virtualization – accessing data anywhere

Data virtualization versus ETL – when to use what?

Platform connections – streamlining data connectivity

Data estate modernization using Cloud Pak for Data

Summary

Chapter 4: Organize – Creating a Trusted Analytics Foundation

Introducing Data Operations (DataOps)

Organizing enterprise information assets

Establishing metadata and stewardship

Business metadata components

Technical metadata components

Profiling to get a better understanding of your data

Classifying data for completeness

Automating data discovery and business term assignment

Enabling trust with data quality

Steps to assess data quality

DataOps in action

Automation rules around data quality

Data privacy and activity monitoring

Data integration at scale

Considerations for selecting a data integration tool

The extract, transform, and load (ETL) service in Cloud Pak for Data

Advantages of leveraging a cloud-native platform for ETL

Master data management

Extending MDM toward a Digital Twin

Summary

Chapter 5: Analyzing: Building, Deploying, and Scaling Models with Trust and Transparency

Self-service analytics of governed data

BI and reporting

Predictive versus prescriptive analytics

Understanding AI

AI life cycle – Transforming insights into action

AI governance: Trust and transparency

Automating the AI life cycle using Cloud Pak for Data

Data science tools for a diverse data science team

Distributed AI

Establishing a collaborative environment and building AI models

Choosing the right tools to use

ModelOps – Deployment phase

ModelOps – Monitoring phase

Streaming data/analytics

Distributed processing

Summary

Chapter 6: Multi-Cloud Strategy and Cloud Satellite

IBM's multi-cloud strategy

Supported deployment options

Managed OpenShift

AWS Quick Start

Azure Marketplace and QuickStart templates

Cloud Pak for Data as a Service

Packaging and pricing

IBM Cloud Satellite

A data fabric for a multi-cloud future

Summary

Chapter 7: IBM and Partner Extension Services

IBM and third-party extension services

Collect extension services

Db2 Advanced

Informix

Virtual Data Pipeline

EDB Postgres Advanced Server

MongoDB Enterprise Advanced

Organize extension services

DataStage

Information Server

Master Data Management

Analyze cartridges – IBM Palantir

Infuse cartridges

Cognos Analytics

Planning Analytics

Watson Assistant

Watson Discovery

Watson API Kit

Modernization upgrades to Cloud Pak for Data cartridges

Extension services

Summary

Chapter 8: Customer Use Cases

Improving health advocacy program efficiency

Voice-enabled chatbots

Risk and control automation

Enhanced border security

Unified Data Fabric

Financial planning and analytics

Summary

Section 3: Technical Details

Chapter 9: Technical Overview, Management, and Administration

Technical requirements

Architecture overview

Characteristics of the platform

Technical underpinnings

The operator pattern

The platform technical stack

Infrastructure requirements, storage, and networking

Understanding how storage is used

Networking

Foundational services and the control plane

Cloud Pak foundational services

Cloud Pak for Data control plane

Management and monitoring

Multi-tenancy, resource management, and security

Isolation using namespaces

Resource management and quotas

Enabling tenant self-management

Day 2 operations

Upgrades

Scale-out

Backup and restore

Summary

References

Chapter 10: Security and Compliance

Technical requirements

Security and Privacy by Design

Development practices

Vulnerability detection

Delivering security assured container images

Secure operations in a shared environment

Securing Kubernetes hosts

Security in OpenShift Container Platform

Namespace scoping and service account privileges

RBAC and the least privilege principle

Workload notification and reliability assurance

Additional considerations

Encryption in motion and securing entry points

Encryption at rest

Anti-virus software

User access and authorizations

Authentication

Authorization

User management and groups

Securing credentials

Meeting compliance requirements

Configuring the operating environment for compliance

Auditing

Integration with IBM Security Guardium

Summary

References

Chapter 11: Storage

Understanding the concept of persistent volumes

Kubernetes storage introduction

Types of persistent volumes

In-cluster storage

Optimized hyperconverged storage and compute

Separated compute and storage Nodes

Provisioning procedure summary

Off-cluster storage

NFS-based persistent volumes

Operational considerations

Continuous availability with in-cluster storage

Data protection – snapshots, backups, and active-passive disaster recovery

Quiescing Cloud Pak for Data services

Db2 database backups and HADR

Kubernetes cluster backup and restore

Summary

Further reading

Chapter 12: Multi-Tenancy

Tenancy considerations

Designating tenants

Organizational and operational implications

Architecting for multi-tenancy

Achieving tenancy with namespace scoping

Ensuring separation of duties with Kubernetes RBAC and separation of duties with operators

Securing access to a tenant instance

Choosing dedicated versus shared compute nodes

Reviewing the tenancy requirements

Isolating tenants

Tenant security and compliance

Self-service and management

A summary of the assessment

In-namespace sub-tenancy with looser isolation

Approach

Assessing the limitations of this approach

Summary

Other Books You May Enjoy

Preface

Cloud Pak for Data is IBM's modern Data and AI platform that includes strategic offerings from its data and AI portfolio delivered in a cloud-native fashion with the flexibility of deployment on any cloud. The platform offers a unique approach to address modern challenges with an integrated mix of proprietary, open source, and third-party services.

You will start with key concepts in modern data management and AI, review real-life use cases, and develop an appreciation of the AI Ladder principle. With this foundation, you will explore how Cloud Pak for Data helps in the elegant implementation of the AI Ladder practice to collect, organize, analyze, and infuse data and trustworthy AI across your business. As you advance, you will also discover the capabilities of the platform and extension services, including how they are packaged and priced. With examples throughout the book, you will gain a deep understanding of the platform, from its rich capabilities and technical architecture to its ecosystem and key go-to-market aspects.

By the end of this IBM book, you will be well-versed in the concepts of IBM Cloud Pak for Data and be able to apply its prescriptive practices and leverage its capabilities to build a trusted data foundation and accelerate AI adoption in your enterprise.

Note

The content in this book is comprehensive and covers multiple versions supported as of October 2021, including versions 3.5 and 4.0. Some of the services, capabilities, and features highlighted in the book might not be relevant to all versions, and we expect further changes as the product evolves.

However, the overarching message, value proposition, and underlying architecture will remain more or less consistent. Given the rapid progress and product evolution, we decided to be exhaustive while focusing on the core concepts.

We sincerely hope that you will find this book helpful and overlook any inconsistencies attributed to product evolution.

Who this book is for

This book is for business executives, CIOs, CDOs, data scientists, data stewards, data engineers, and developers interested in learning about IBM's Cloud Pak for Data. Knowledge of technical concepts and familiarity with data, analytics, and AI initiatives at various levels of maturity are required to make the most of this book.

What this book covers

Chapter 1, The AI Ladder: IBM's Prescriptive Approach, explores market dynamics, IBM's data and AI portfolio, and a detailed overview of the AI Ladder, what it entails, and how IBM offerings map to the different rungs of the ladder.

Chapter 2, Cloud Pak for Data: A Brief Introduction, covers IBM's modern data and AI platform in detail, along with some of its key differentiators. We will discuss Red Hat OpenShift, the implied cloud benefits it confers, and the platform foundational services that form the basis of Cloud Pak for Data.

Chapter 3, Collect – Making Data Simple and Accessible, shows how Cloud Pak for Data offers a flexible approach to the modern challenges of data-centric delivery, given the proliferation of data in both volume and variety, through a mix of proprietary, open source, and third-party services.

Chapter 4, Organize – Creating a Trusted Analytics Foundation, explains how Cloud Pak for Data enables DataOps (data operations): the orchestration of people, processes, and technology to quickly deliver trusted, business-ready data to data citizens, operations, applications, and artificial intelligence (AI).

Chapter 5, Analyzing: Building, Deploying, and Scaling Models with Trust and Transparency, explains how to analyze your data in smarter ways and benefit from visualization and AI models that empower your organization to gain new insights and make better and smarter decisions.

Chapter 6, Multi-Cloud Strategy and Cloud Satellite, covers IBM's multi-cloud strategy, the supported deployment options, packaging and pricing, IBM Cloud Satellite, and the role of a data fabric in a multi-cloud future.

Chapter 7, IBM and Partner Extension Services, drills down into the concepts of extension services, how they are packaged and priced, and the various IBM extension services available on Cloud Pak for Data across the Collect, Organize, Analyze, and Infuse rungs of the AI ladder. It also addresses the importance of a partner ecosystem, the different tiers of business partners, and how clients can benefit from an open ecosystem on Cloud Pak for Data.

Chapter 8, Customer Use Cases, focuses on the importance of business outcomes and key customer use case patterns of Cloud Pak for Data while highlighting the top three use case patterns: data modernization, DataOps, and an automated AI life cycle.

Chapter 9, Technical Overview, Management, and Administration, covers the technical concepts underpinning Cloud Pak for Data, including, but not limited to, an architecture overview, common services, Day-2 operations, infrastructure and storage support, and other advanced concepts.

Chapter 10, Security and Compliance, looks at how the two critical prerequisites for enterprise adoption, security and governance, are addressed in Cloud Pak for Data.

Chapter 11, Storage, covers the different storage options supported by Cloud Pak for Data and how to configure it for high availability and disaster recovery.

Chapter 12, Multi-Tenancy, covers tenancy considerations, architecting for multi-tenancy with namespace scoping, and in-namespace sub-tenancy with looser isolation.

To get the most out of this book

Knowledge of technical concepts and familiarity with data, analytics, and AI initiatives at various levels of maturity are required to make the most of this book.

If you are using the digital version of this book, we advise you to type the code yourself. Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here:

https://static.packt-cdn.com/downloads/9781800562127_ColorImages.pdf

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The Cloud Pak for Data control plane introduces a special persistent volume claim called user-home-pvc."

A block of code is set as follows:

kubectl get pvc user-home-pvc

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE

user-home-pvc Bound pvc-44e5a492-9921-41e1-bc42-b96a9a4dd3dc 10Gi RWX nfs-client 33d

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

Port: zencoreapi-tls 4444/TCP

TargetPort: 4444/TCP

Endpoints: 10.254.16.52:4444,10.254.20.23:4444

Bold: Indicates a new term, an important word, or words that you see on screen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "There are essentially two types of host nodes (as presented in the screenshot) – the Master and Compute (worker) nodes."

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read IBM Cloud Pak for Data, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

Section 1: The Basics

In this section, we will learn about market trends, data and AI, IBM's offering portfolio, its prescriptive approach to AI adoption, and an overview of Cloud Pak for Data.

This section comprises the following chapters:

Chapter 1, The AI Ladder: IBM's Prescriptive Approach
Chapter 2, Cloud Pak for Data – A Brief Introduction

Chapter 1: The AI Ladder – IBM's Prescriptive Approach

Digital transformation is impacting every industry and business, with data and artificial intelligence (AI) playing a prominent role. For example, some of the largest companies in the world, such as Amazon, Facebook, Uber, and Google, leverage data and AI as a key differentiator. However, not every enterprise is successful in embracing AI and monetizing their data. The AI ladder is IBM's response to this market need – it's a prescriptive approach to AI adoption and entails four simple steps or rungs of the ladder.

In this chapter, you will learn about market dynamics, IBM's Data and AI portfolio, and a detailed overview of the AI ladder. We are also going to cover what it entails and how IBM offerings map to the different rungs of the ladder.

In this chapter, we will be covering the following main topics:

Market dynamics and IBM's Data and AI portfolio
Introduction to the AI ladder
Collect – making data simple and accessible
Organize – creating a trusted analytics foundation
Analyze – building and scaling AI with trust and transparency
Infuse – operationalizing AI throughout the business

Market dynamics and IBM's Data and AI portfolio

Every company in the world today is a data company. As The Economist rightly pointed out in 2017, data is the world's most valuable resource, and unless you are leveraging your data as a strategic differentiator, you are likely missing out on opportunities.

Simply put, data is the fuel, the cloud is the vehicle, and AI is the destination. The intersection of these three pillars of IT is the driving force behind digital transformation disrupting every company and industry. To be successful, companies need to quickly modernize their portfolio and embrace an intentional strategy to re-tool their data, AI, and application workloads by leveraging a cloud-native architecture. So, cloud platforms act as a great enabler by infusing agility, while AI is the ultimate destination, the so-called nirvana that every enterprise seeks to master.

While the benefits of the cloud are becoming more obvious by the day, several enterprises are still reluctant to embrace the public cloud right away. These enterprises are, in some cases, constrained by regulatory concerns, which make it a challenge to operate on public clouds. However, this doesn't mean that they don't see the value of the cloud and the benefits derived from embracing the cloud architecture. Everyone understands that the cloud is the ultimate destination, and that taking the necessary steps to prepare and modernize workloads is not optional but a necessity for survival:

Figure 1.1 – What's reshaping how businesses operate? The driving forces behind digital transformation

IBM enjoys a strong Data and AI portfolio, with more than 100 products developed and acquired over the past 40 years, including marquee offerings such as Db2, Informix, DataStage, Cognos Analytics, SPSS Modeler, and Planning Analytics. The depth and breadth of IBM's portfolio is what makes it stand out in the market. With Cloud Pak for Data, IBM is doubling down on this differentiation, further simplifying and modernizing its portfolio as customers look to a hybrid, multi-cloud future.

Introduction to the AI ladder

We all know data is the foundation for businesses to drive smarter decisions. Data is what fuels digital transformation. But it is AI that unlocks the value of that data, which is why AI is poised to transform businesses with the potential to add almost 16 trillion dollars to the global economy by 2030. You can find the relevant source here: https://www.pwc.com/gx/en/issues/data-and-analytics/publications/artificial-intelligence-study.html.

However, the adoption of AI has been slower than anticipated. This is because many enterprises do not make a conscious effort to lay the necessary data foundation and invest in nurturing talent and business processes that are critical for success. For example, the vast majority of AI failures are due to data preparation and organization, not the AI models themselves. Success with AI models is dependent on achieving success in terms of how you collect and organize data. Business leaders not only need to understand the power of AI but also how they can fully unleash its potential and operate in a hybrid, multi-cloud world.

This section aims to demystify AI, examine common AI challenges and failures, and provide a unified, prescriptive approach (which we call "the AI ladder") to help organizations unlock the value of their data and accelerate their journey to AI.

As companies look to harness the potential of AI and identify the best ways to leverage data for business insights, they need to ensure that they start with a clearly defined business problem. In addition, they need to use data from diverse sources, support best-in-class tools and frameworks, and run models across a variety of environments.

According to a study by MIT Sloan Management Review (http://marketing.mitsmr.com/offers/AI2017/59181-MITSMR-BCG-Report-2017.pdf), 81% of business leaders do not understand the data and infrastructure required for AI. As the study puts it, "No amount of AI algorithmic sophistication will overcome a lack of data [architecture] – bad data is simply paralyzing."

Put simply: There is no AI without IA (information architecture).

IBM recognizes the challenge its clients are facing. As a result, IBM built a prescriptive approach (known as the AI ladder) to help clients address these challenges and accelerate their journey to AI, no matter where they are on that journey. It allows them to simplify and automate how they turn data into insights by unifying the collection, organization, and analysis of data, regardless of where it lives. By climbing the AI ladder, enterprises can build a governed, efficient, agile, and future-proof approach to AI. Furthermore, it is also an organizing construct that underpins IBM's Data and AI product portfolio.

It is critical to remember that AI is not magic and requires a thoughtful and well-architected approach. Every step of the ladder is critical to being successful with AI.

The rungs of the AI ladder

The following diagram illustrates IBM's prescriptive approach, also known as the AI ladder:


Figure 1.2 – The AI ladder – a prescriptive approach to the journey of AI

The AI ladder has four steps (often referred to as the rungs of the ladder). They are as follows:

Collect: Make data simple and accessible. Collect data of every type regardless of where it lives, enabling flexibility in the face of ever-changing data sources.
Organize: Create a business-ready analytics foundation. Organize all the client's data into a trusted, business-ready foundation with built-in governance, quality, protection, and compliance.
Analyze: Build and scale AI with trust and explainability. Analyze the client's data in smarter ways and benefit from AI models that empower the client's team to gain new insights and make better, smarter decisions.
Infuse: Operationalize AI throughout the business. You should do this across multiple departments and within various processes by drawing on predictions, automation, and optimization. Craft an effective AI strategy to realize your AI business objectives. Apply AI to automate and optimize existing workflows in your business, allowing your employees to focus on higher-value work.
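To make the ordering of the four rungs concrete, here is a toy sketch in Python. It is only an illustration of how each rung feeds the next and has nothing to do with how Cloud Pak for Data is implemented; the function names, the sample records, and the "model" (a simple mean) are all assumptions made up for this example.

```python
from statistics import mean


def collect(sources):
    """Collect: gather records from every source, regardless of where they live."""
    return [record for source in sources for record in source]


def organize(records):
    """Organize: keep only records that pass a basic quality rule."""
    return [r for r in records if r.get("amount") is not None and r["amount"] >= 0]


def analyze(records):
    """Analyze: derive a stand-in 'model' (here, just the mean amount)."""
    return mean(r["amount"] for r in records)


def infuse(records, threshold):
    """Infuse: apply the result in a business process (flag large amounts)."""
    return [r["id"] for r in records if r["amount"] > threshold]


# Hypothetical sample data standing in for two separate systems.
crm = [{"id": 1, "amount": 120}, {"id": 2, "amount": None}]
erp = [{"id": 3, "amount": 40}, {"id": 4, "amount": -5}, {"id": 5, "amount": 200}]

trusted = organize(collect([crm, erp]))  # records 2 and 4 fail the quality rule
baseline = analyze(trusted)
flagged = infuse(trusted, baseline)
print(baseline, flagged)
```

The point of the sketch is the dependency chain: the analysis is only as good as the organized data it receives, which is why the lower rungs cannot be skipped.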

Spanning the four steps of the AI ladder is the concept of Modernize from IBM, which allows clients to simplify and automate how they turn data into insights. It unifies collecting, organizing, and analyzing data within a multi-cloud data platform known as Cloud Pak for Data.

IBM's approach starts with a simple idea: run anywhere. The platform can be deployed on the customer's infrastructure of choice. IBM supports Cloud Pak for Data deployments on every major cloud platform, including Google, Azure, AWS, and IBM Cloud. You can also deploy Cloud Pak for Data on-premises in your data center, which is extremely relevant for customers who are focused on a hybrid cloud strategy.

IBM supports Cloud Pak for Data on all these infrastructures by layering Red Hat OpenShift at its core. This is one of the key reasons behind IBM's acquisition of Red Hat in 2019. The intention is to offer customers the flexibility to scale across any infrastructure using the world's leading open source steward: Red Hat. OpenShift is a Kubernetes-based platform that also allows IBM to deploy all its products through a modern container-based model. In essence, all the capabilities are rearchitected as microservices so that they can be provisioned as needed based on your enterprise needs.

Now that we have introduced the concept of the AI ladder and IBM's Cloud Pak for Data platform, let's spend some time focusing on the individual rungs of the AI ladder and IBM's capabilities that make it stand out.

Collect – making data simple and accessible

The Collect layer is about putting your data in the appropriate persistence store so that you can efficiently collect and access all your data assets. A well-architected "Collect" rung allows an organization to leverage the appropriate data store based on the use case and user persona. Whether it's Hadoop for data exploration by data scientists, OLAP for delivering operational reports through business intelligence or other enterprise visualization tools, NoSQL databases such as MongoDB for rapid application development, or some mixture of them all, you have the flexibility to deliver this in a single, integrated manner with the Common SQL Engine.

IBM offers some of the best database technology in the world for addressing every type of data workload, from Online Transactional Processing (OLTP) to Online Analytical Processing (OLAP) to Hadoop to fast data. This allows customers to adapt quickly as their business and application needs change. Furthermore, IBM layers a Common SQL Engine across all its persistence stores so that you can write SQL once and run it against your persistence store of choice, regardless of whether it is IBM Db2 or an open source persistence store such as MongoDB or Hadoop. This allows for portable applications and saves enterprises significant time and money that would typically be spent on rewriting queries for different flavors of persistence. It also enables a better experience for end users and a faster time to value.
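The write-once idea can be illustrated with a small sketch. It does not use the Common SQL Engine itself: two in-memory SQLite databases stand in for two different persistence stores, and the `sales` table, its columns, and the `run_everywhere` helper are hypothetical names chosen for illustration. The only point is that one query string runs unchanged against each store.

```python
import sqlite3

# One query, written once. Table and column names are hypothetical.
QUERY = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"


def make_store(rows):
    """Create an in-memory SQLite database standing in for one persistence store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def run_everywhere(query, stores):
    """Run the same SQL text against every store and collect the results by name."""
    return {name: conn.execute(query).fetchall() for name, conn in stores.items()}


# Two stand-in stores with different contents but the same schema.
stores = {
    "operational": make_store([("EU", 10.0), ("US", 5.0), ("EU", 2.5)]),
    "warehouse": make_store([("APAC", 7.0), ("US", 1.0)]),
}
results = run_everywhere(QUERY, stores)
for name, rows in results.items():
    print(name, rows)
```

In a real deployment the stores would differ in engine as well as content; the sketch only shows why a shared SQL dialect removes the need to rewrite the query for each one.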

IBM's Db2 technology is enabled for natural language queries, which allows non-SQL users to search through their OLTP store using natural language. Also, Db2 supports Augmented Data Exploration (ADE), which allows users to access the database and visualize their datasets through automation (as opposed to querying data using SQL).

To summarize, Collect is all about capturing newly created data of all types and then bringing it together across various silos and locations to make it accessible for further use (up the AI ladder). At IBM, the Collect rung of the AI ladder is characterized by three key attributes:

Empower: IT architects and developers in enterprises are empowered as they are offered a complete set of fit-for-purpose data capabilities that can handle all types of workloads in a self-service manner. This covers all workloads and data types, be it structured or unstructured, open source or proprietary, on-premises or in the cloud. It's a single portfolio that covers all your data needs.

Simplify: One of the key tenets of simplicity is enabling self-service, and this is realized rather quickly in a containerized platform built using cloud-native principles. For one, provisioning new data stores involves a simple click of a button. In-place upgrades equate to zero downtime, and scaling up and down is a breeze, ensuring that enterprises can quickly react to business needs in a matter of minutes as opposed to waiting for weeks or months. Last but not least, IBM is infusing AI into its data stores to enable augmented data exploration and other automation processes.

Integrate: Focuses on the need to make data accessible and integrate well with the other rungs of the AI ladder. Data virtualization, in conjunction with data governance, enables customers to access a multitude of datasets in a single view, with a consistent glossary of business terms and associated lineage, all at your fingertips. This enables the democratization of enterprise data accelerating AI initiatives and driving automation to your business.

The following diagram summarizes the key facets of the Collect rung of the AI ladder:

Figure 1.3 – Collect – making data simple and accessible

Our portfolio of capabilities, all of which support the Collect rung, can be categorized into four workload domains in the market:

First, there's the traditional operational database. This is your system of record, your point of sale, and your transactional database.

Analytics databases are in high demand as the amount of data is exploding. Everyone is looking for new ways to analyze data at scale quickly, all the way from traditional reporting to preparing data for training and scoring AI models.

Big data. The traditional data lake built on Hadoop at petabyte scale is now slowly giving way to the separation of storage and compute, with Cloud Object Storage and Spark playing key roles. The market demand for data lakes is clearly on an upward trajectory.

Finally, IoT is quickly transforming several industries, and fast data is becoming an area of interest. This is the market of the future, and IBM is addressing requirements in this space through real-time data analysis.

Next, we will explore the importance of organizing data and what it entails.

Organize – creating a trusted analytics foundation

Given that data sits at the heart of AI, organizations will need to focus on the quality and governance of their data, ensuring it's accurate, consistent, and trusted. However, many organizations struggle to streamline their operating model when it comes to developing data pipelines and flows.

Some of the most common data challenges include the following:

Lack of data quality, governance, and lineage
Trustworthiness of structured and unstructured data
Searchability and discovery of relevant data
Siloed data across the organization
Slower time-to-insight for issues that should be real time-based
Compliance, privacy, and regulatory pressures
Providing self-service access to data

To address these many data challenges, organizations are transforming their approach to data: they are undergoing application modernization and refining their data strategies to stay compliant while still fueling innovation.

Delivering trusted data throughout your organization requires the adoption of new methodologies and automation technologies to drive operational excellence in your data operations. This is known as DataOps. This is also referred to as "enterprise data fabric" by many and plays a critical role in ensuring that enterprises are gaining value from their data.

DataOps corresponds to the Organize rung of IBM's AI ladder; it helps answer questions such as the following:

What data does your enterprise have, and who owns it?
Where is that data located?
What systems are using the data in question and for what purposes?
Does the data meet all regulatory and compliance requirements?
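One way to picture how a data catalog answers these questions is the minimal sketch below. It is not the Cloud Pak for Data catalog API: the `CatalogEntry` class, its fields, the asset names, and the location strings are all hypothetical and exist only to show that each question maps to a piece of recorded metadata.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One governed data asset. All field names here are hypothetical."""
    name: str
    owner: str
    location: str
    consumers: dict = field(default_factory=dict)  # system -> purpose
    compliant: bool = False


def describe(entry):
    """Answer the four DataOps questions for a single asset."""
    return {
        "what_and_who": f"{entry.name} (owner: {entry.owner})",
        "where": entry.location,
        "used_by": sorted(entry.consumers.items()),
        "meets_requirements": entry.compliant,
    }


# Hypothetical assets and locations, for illustration only.
catalog = [
    CatalogEntry("customer_profiles", "CDO office", "db2://prod/crm",
                 {"churn_model": "training", "bi_dashboard": "reporting"}, True),
    CatalogEntry("clickstream_raw", "web team", "s3://lake/clicks"),
]

# Assets that still need a compliance review before wider use.
needs_review = [e.name for e in catalog if not e.compliant]
print(describe(catalog[0]))
print(needs_review)
```

Keeping ownership, location, usage, and compliance status on the same record is what lets the questions be answered by lookup rather than by asking around.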

DataOps also introduces agile development processes into data analytics so that data citizens and business users can work together more efficiently and effectively, resulting in a collaborative data management practice. And by using the power of automation, DataOps helps solve the issues associated with inefficiencies in data management, such as accessing, onboarding, preparing, integrating, and making data available.

DataOps is defined as the orchestration of people, processes, and technology to deliver trusted, high-quality data to whoever needs it.

People – empowering your data citizens

A modern enterprise consists of many different "data citizens" – from the chief data officer; to data scientists, analysts, architects, and engineers; to the individual line of business users who need insights from their data. The Organize rung is about creating and sustaining a data-driven culture that enables collaboration across an organization to drive agility and scale.

Each organization has unique requirements where stakeholders in IT, data science, and the business lines need to add value to drive a successful business. Also, because governance is one of the driving forces needed to support DataOps, organizations can leverage existing data governance committees and lessons from tenured data governance programs to help establish this culture and commitment.

Businesses function more efficiently once they implement the right technology and develop self-service data capabilities that make high-quality, trusted data available to the right people and processes as quickly as possible. The following diagram shows what a DataOps workflow might look like: architects, engineers, and analysts collaborate on infrastructure and raw data profiling; analysts, engineers, and scientists collaborate on building analytics models (whether or not those models use AI); and architects work with business users to operationalize those models, govern the data, and deliver insights to the points where they're needed.

Individuals within each role are designated as data stewards for a particular subset of data. The point of the DataOps methodology is that each of these different roles can rely on seeing data that is accurate, comprehensive, secure, and governed:

Figure 1.4 – DataOps workflow by roles

IBM has a rich portfolio of offerings (now available as services within Cloud Pak for Data) that address all the different requirements of DataOps, including data governance, automated data discovery, centralized data catalogs, ETL, governed data virtualization, data privacy/masking, master data management, and reference data management.

Analyze – building and scaling models with trust and transparency

Enterprises are either building AI or buying AI solutions to address specific requirements. In the case of a build scenario, companies would benefit significantly from commercially available data science tools such as Watson Studio. IBM's Watson Studio not only allows you to make significant productivity gains but also ensures collaboration among the different data scientists and user personas.

Investing in building AI and retraining employees can have a significant payoff. Pioneers across multiple industries are building AI and separating themselves from laggards:

In construction, they're using AI to optimize infrastructure design and customization.
In healthcare, companies are using AI to predict health problems and disease symptoms.
In life science, organizations are advancing image analysis to research drug effects.
In financial services, companies are using AI to assist in fraud analysis and investigation.
Finally, autonomous vehicles are using AI to adapt to changing conditions in vehicles, while call centers are using AI for automating customer service.

However, several hurdles remain, and enterprises face significant challenges in operationalizing AI value.

There are three areas that we need to tackle:

Data: 80% of time is spent preparing data versus building AI models.
Talent: 65% find it difficult to fund or acquire AI skills.
Trust: 44% say it's very challenging to build trust in AI outcomes.

Source: 2019 Forrester, Challenges That Hold Firms Back From Achieving AI Aspirations.

Also, it's worth pointing out that building AI models is the easy part. The real challenge lies in deploying those AI models into production, monitoring them for accuracy and drift detection, and ensuring that this becomes the norm.
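To make the monitoring step concrete, here is a minimal sketch of two checks a deployed model might be subjected to. It is not how Cloud Pak for Data implements monitoring: the function names, the mean-shift measure, and both thresholds are assumptions chosen for illustration, and production systems typically use richer drift statistics.

```python
from statistics import mean, pstdev


def accuracy(predictions, labels):
    """Fraction of predictions that match the observed labels."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


def mean_shift(baseline, recent):
    """Shift of the recent mean, measured in baseline standard deviations."""
    spread = pstdev(baseline)
    if spread == 0:
        return 0.0 if mean(recent) == mean(baseline) else float("inf")
    return abs(mean(recent) - mean(baseline)) / spread


def check_model(baseline, recent, predictions, labels,
                max_shift=2.0, min_accuracy=0.8):
    """Return the alerts raised by the two checks; thresholds are illustrative."""
    alerts = []
    if mean_shift(baseline, recent) > max_shift:
        alerts.append("input drift")
    if accuracy(predictions, labels) < min_accuracy:
        alerts.append("accuracy drop")
    return alerts


# Hypothetical feature values seen at training time and in production.
training_ages = [30, 32, 34, 36, 38]
production_ages = [50, 52, 54, 56, 58]
alerts = check_model(training_ages, production_ages,
                     predictions=[1, 0, 1, 1], labels=[1, 0, 0, 0])
print(alerts)
```

Running such checks on a schedule, rather than once at deployment, is what turns monitoring into the norm the text describes.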

IBM's AI tools and runtimes on Cloud Pak for Data present a differentiated and extremely strong set of capabilities. Supported by the Red Hat OpenShift and Cloud Pak for Data strategy, IBM is in a position to set and lead the market for AI tools. There are plenty of point AI solutions from niche vendors in the market, as evidenced by the numerous analyst reports; however, none of them are solving the problem of putting AI into production in a satisfactory manner. The differentiation that IBM brings to the market is the full end-to-end AI life cycle:

Figure 1.5 – AI life cycle

Customers are looking for an integrated platform for a few reasons. Before we get to these reasons, the following teams care about the integrated platform:

Data science teams are looking for integrated systems to manage assets across the AI life cycle and across project team members.

Chief Data Officers (CDOs) are looking to govern AI models and the data associated with them. Chief Risk Officers (CROs) are looking to control the risks that these models expose by being integrated with business processes.

Extended AI application teams need integration so that they can build, deploy, and run seamlessly. In some situations, Chief Information Officers (CIOs)/business technology teams who want to de-risk and reduce the costs of taking an AI application to production are responsible for delivering a platform.

Customer Use Case

A Fortune 500 US bank is looking for a solution to rapidly deploy machine learning projects to production. The first step in this effort is to put in place a mechanism that allows project teams to deliver pilots without having to go through full risk management processes (from corporate risk/model risk management (MRM) teams). They call this a soft launch, which will work with some production data. The timeline to roll out projects is 6-9 months from conceptualization to pilot completion. This requirement is being championed (and will need to be delivered across the bank) by the business technology team (who are responsible for the AI operations portal). The idea is that this will take the load away from the MRM teams, who have too much on their plate, while still giving them a clear view of how and what risk was evaluated. The line of business (LOB) will use the solution every week to retrain models. However, before that, they will upload a CSV file, check any real-time responses, and pump data to verify that the model is meeting strategy goals. All this must be auto-documented.

One of the key differentiators for IBM's AI life cycle is AutoAI, which allows data scientists to create multiple AI models and score them for accuracy. Some of these tests are not supposed to be black and white.
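The general idea of creating several candidate models and scoring them for accuracy can be sketched as follows. This is not AutoAI or its API: the threshold "models", the held-out data, and the `rank_candidates` helper are hypothetical stand-ins meant only to show the generate, score, and rank loop in miniature.

```python
def make_threshold_model(threshold):
    """Build a trivial classifier: predict 1 when the feature exceeds the threshold."""
    return lambda x: 1 if x > threshold else 0


def score(model, data):
    """Accuracy of a model on (feature, label) pairs."""
    return sum(model(x) == y for x, y in data) / len(data)


def rank_candidates(candidates, holdout):
    """Score every candidate on held-out data and sort best first."""
    scored = [(name, score(model, holdout)) for name, model in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical held-out data: (feature value, true label).
holdout = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)]

# Generate several candidates automatically instead of hand-picking one.
candidates = {f"threshold_{t}": make_threshold_model(t) for t in (1.5, 3.5, 5.5)}

leaderboard = rank_candidates(candidates, holdout)
for name, acc in leaderboard:
    print(f"{name}: {acc:.2f}")
```

A real automated pipeline would also vary the algorithm, features, and hyperparameters, but the output has the same shape: a ranked list from which a data scientist picks a model to promote.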

Several customers are beginning to automate AI development. Due to this, the following question arises: why automate model development? Because if you can automate the AI life cycle, you can enhance your success rate.

An automated AI life cycle allows you to do the following:

Expand your talent pool: This lowers the skills required to build and operationalize AI models.
Speed up time to delivery: This is done by minimizing mundane tasks.
Increase the readiness of AI-powered apps: This is done by optimizing model accuracy and KPIs.
Deliver real-time governance: This improves trust and transparency by ensuring model management, governance, explainability, and versioning.

Next, we will explore how AI is operationalized in enterprises to address specific use cases and drive business value.

Infuse – operationalizing AI throughout the business

Building insights and AI models is a great first step, but unless you infuse them into your business processes and optimize outcomes, AI is just another fancy technology. Companies that have automated their business processes based on data-driven insights have disrupted the ones that haven't. A case in point is Amazon in retail, which has upended many traditional retailers by leveraging data, analytics, and AI to streamline operations and gain a leg up on the competition. The key here is to marry technology with culture and ensure that employees are embracing AI and infusing it into their daily decision making:

Figure 1.6 – Infuse – AI is transforming how businesses operate

The following are some diverse examples of companies infusing AI into their business processes. These are organized along five key themes:

Customer service (business owner: CCO): Customer care automation, Customer 360, customer data platform.
Risk and compliance (business owner: CRO): Governance risk and compliance.
IT operations (business owner: CIO): Automate and optimize IT operations.
Financial operations (business owner: CFO): Budget and optimize across multiple dimensions.
Business operations (business owner: COO): Supply chain, human resources management.

Customer service

Customer service is changing by the day, with automation driven by chatbots and a 360-degree view of the customer becoming more critical. While there is active ongoing investment on multiple fronts within IBM, the one that stands out is IBM's Watson Anywhere campaign, which allows customers to buy Cloud Pak for Data Watson services (Assistant, Discovery, and API Kit) at a discount and have them deployed.

Customer Use Case

A technology company that offers mobile, telecom, and CRM solutions is seeing significant demand for intelligent call centers and is investing in an AI voice assistant on IBM Cloud Pak for Data. The objective is to address customers' queries automatically, reducing the need for human agents. Human interaction happens only when detailed consultation is required. This frees up call center employees to focus on more complex queries as opposed to handling repetitive tasks, thus improving the overall operational efficiency and quality of customer service, not to mention reducing overhead costs. This makes intelligent call centers simpler and faster to build and more cost-effective to operate. Among other technologies, the proposed solution uses Watson Speech to Text, which converts voice into text to help understand the context of the question. This allows AI voice agents to quickly provide the best answer in the context of a customer inquiry.

Risk and compliance

Risk and compliance is a broad topic and companies are struggling to ensure compliance across their processes. In addition to governance risk and compliance, you also need to be concerned about the financial risks posed to big banks. IBM offers a broad set of out-of-the-box solutions such as OpenPages, Watson Financial Crimes Insight, and more, which, when combined with AI governance, deliver significant value, not just in addressing regulatory challenges, but also in accelerating AI adoption.

IT operations

IT infrastructure continues to grow exponentially, and there is no reason to believe that this growth will slow any time soon. Operating such increasingly complex infrastructure is not a simple task and requires the use of AI to automate operations and proactively identify potential risks. Mining data to predict and optimize operations is one of the key use cases of AI. IBM has a solution called Watson AIOps on the Cloud Pak for Data platform, which is purpose-built to address this specific use case.

Financial operations

Budgeting and forecasting typically involves several stakeholders collaborating across the enterprise to arrive at a steady answer. However, this requires more than hand waving. IBM's Planning Analytics solution on Cloud Pak for Data is a planning, budgeting, forecasting, and analysis solution that helps organizations automate manual, spreadsheet-based processes and link financial plans to operational tactics.