This page describes key aspects of hybrid multi-cloud data architectures for SAP Cloud and S/4HANA intelligent enterprises.
Hybrid multi-cloud data management and analytics platforms enable organizations to implement data-driven processes based on data analytics and AI/machine learning (ML). Cloud data environments have to address data governance and security requirements together with self-service options for data democratization to empower business users with insights.
Integrated cloud data platforms store, process and orchestrate large distributed data volumes to reduce data silos and avoid monolithic big data lakes.
Metadata-driven data governance across heterogeneous systems enables unified enterprise views on data. Data catalogs provide metadata-based data access and make data traceable from origin to consumption, visualized with data lineage methods.
Modern data warehouses, cloud analytics services and data-driven SaaS applications can be built on data lakes to pre-process loaded data and limit further processing to subsets of relevant data.
Data Fabric, Data Mesh and Data Lakehouse are modern hybrid multi-cloud data architecture patterns, which can be used in combination to implement customer-specific requirements. Data Fabric and Data Lakehouse patterns are mainly driven by vendor products or services, in contrast to the Data Mesh, which is primarily an organizational approach.
Data Fabric architectures enable centrally governed data management platforms for distributed data with enterprise wide real-time views on integrated data sources. Data fabric architecture designs are technology-centric and metadata-driven to allow streamlined central and local data processing in hybrid multi-cloud environments.
AI/machine learning (ML) and low-code technologies play a major role in simplifying and automating the data delivery processes of Data Fabrics. Business users benefit from data democratization through the offered self-services.
Data Lakehouses combine elements of data warehouses and data lakes. They offer ACID transaction support (e.g. with Delta Lake), a curated data layer for structured and unstructured data, schema support and data governance, and end-to-end streaming for real-time scenarios.
Data Meshes focus on organizational changes to shift responsibilities from the central data engineering teams of monolithic data warehouses to domain-driven teams. This decentralized approach eliminates the bottleneck of centralized data teams (engineers and analysts) and embeds data engineering in business units with domain expertise, which perform local processing and governance of distributed data.
Cloud Data Ingestion Layers connect with various sources of structured, semi-structured and unstructured data. They support protocols like ODBC/JDBC to connect with databases and enable the integration of batch or streaming data sources.
Analytics or AI/ML data ingestion processes are typically performed in three steps: Extract (E), Load (L) and Transform (T), which can be executed in different sequences (ELT or ETL).
ELT processing reduces resource contention on source systems and enhances architectural flexibility, since multiple transformations can be applied on the target platform.
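As a rough illustration of the ELT sequence (extract from a source, land it unchanged, transform in the target platform), here is a minimal PySpark-flavored sketch. All connection details, table names and paths are assumptions for illustration only.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("elt-sketch").getOrCreate()

# Extract: read from a source system via JDBC (connection details assumed)
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://source-host:5432/erp")
          .option("dbtable", "public.orders")
          .option("user", "reader").option("password", "secret")
          .load())

# Load: land the data unchanged in the cloud storage layer
orders.write.mode("append").parquet("/lake/raw/orders")

# Transform: shape the data in the target platform, not on the source system
raw = spark.read.parquet("/lake/raw/orders")
daily = (raw.groupBy(F.to_date("order_ts").alias("order_date"))
            .agg(F.sum("amount").alias("daily_revenue")))
daily.write.mode("overwrite").parquet("/lake/curated/daily_revenue")
```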
Storing all kinds of data (structured, semi-structured, unstructured) from hybrid multi-cloud sources in one Cloud Data storage layer optimizes data management and processing. The Cloud Data storage layer shall offer curated data which is organized, consumption-ready and conforms to organizational standards and data models.
The Cloud Data Lakehouse concept is one approach to implement this storage layer and can be extended with a Delta Lake curated zone. Data separated into folders (bronze, silver, gold) reflects the enrichment status. Delta Lakes offer database capabilities such as data versioning, ACID transactions and schema support.
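The following is a minimal sketch of such a bronze/silver/gold zone layout with Delta Lake and PySpark, assuming the delta-spark package is installed; all paths and column names are placeholders.

```python
# pip install delta-spark pyspark
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession, functions as F

builder = (SparkSession.builder.appName("medallion-sketch")
           .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Bronze: ingest raw data as-is
raw = spark.read.json("/lake/landing/orders/")
raw.write.format("delta").mode("append").save("/lake/bronze/orders")

# Silver: cleaned and de-duplicated data
bronze = spark.read.format("delta").load("/lake/bronze/orders")
silver = bronze.dropDuplicates(["order_id"]).filter(F.col("amount") > 0)
silver.write.format("delta").mode("overwrite").save("/lake/silver/orders")

# Gold: aggregated, consumption-ready data
gold = silver.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))
gold.write.format("delta").mode("overwrite").save("/lake/gold/orders_by_customer")

# Data versioning: time travel back to an earlier version of the bronze table
first_version = spark.read.format("delta").option("versionAsOf", 0).load("/lake/bronze/orders")
```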
Cloud metadata governance layers are built with metadata catalogs. Metadata stores enable unified processing of structured, semi-structured and unstructured data.
Cloud Data compute layers shall be separated from the storage layer and scale independently. Massively Parallel Processing (MPP) capabilities enable high-performance architectures. High availability is implemented by distributing Cloud Data compute layers over multiple availability zones.
Analytic runtimes of Cloud Data processing layers offer access to data with SQL or Spark. SQL queries are unified across structured, semi-structured and unstructured data.
The Cloud Data consumption layer shall be compatible with the preferred Business Intelligence services.
Data-driven organizations enhance agility and performance with data virtualization, which abstracts data access across hybrid multi-cloud data sources. Data virtualization separates storage from consumption and eliminates the need to replicate data. Data Federation is a technology where different virtualized cloud data stores are combined and act as one.
Virtualization enables query processing in the source cloud systems, which facilitates data management with the available source semantic layer and business context (e.g. organization structure). Virtual live data does not have to be moved or duplicated, is always up to date and can facilitate compliance with company policies such as data location restrictions.
Data Replication stores data redundantly, and queries have to be processed in the target system. Extract Load Transform (ELT) pipelines are typically used to ingest data from an on-premises database into the cloud, which again reduces resource contention on source systems.
Modern Azure data warehouse platforms can be built on Azure Data Lake with high-performance data ingestion to enable analytics/BI reporting and data-driven serverless SaaS applications. Efficient data access with hierarchical namespaces on Blob storage enables enterprise big data analytics based on Azure Data Lake Storage Gen2. Data lakes can store huge amounts of frequently used data for SaaS applications or BI reports as presentation layers.
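As a small illustration of working with the hierarchical namespace of Azure Data Lake Storage Gen2 from Python, the sketch below uses the azure-storage-file-datalake SDK; the storage account, file system and directory names are assumptions.

```python
# pip install azure-identity azure-storage-file-datalake
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Storage account, file system (container) and paths are placeholders
service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_system = service.get_file_system_client("datalake")

# Hierarchical namespace: directories behave like real folders, not just key prefixes
directory = file_system.get_directory_client("bronze/sales/2024")
directory.create_directory()

# Upload a local file into the directory
file_client = directory.get_file_client("orders.json")
with open("orders.json", "rb") as data:
    file_client.upload_data(data, overwrite=True)

# List what landed under the bronze zone
for path in file_system.get_paths(path="bronze"):
    print(path.name)
```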
Azure Data Factory (ADF) data ingestion implements ETL data integration and transformations with scale-out capabilities. The ADF management interface allows the orchestration of multiple activities (processing steps) within data pipelines. Transformations with mapping data flows run on serverless infrastructure and can be visually designed and monitored.
ADF pipelines perform actions with integration logic, such as running SQL Server Integration Services (SSIS) packages, which are conceptually similar to ADF pipelines. Linked services define the information required to link disparate data sources such as SAP HANA, S/4HANA, SAP ECC or BW to the data factory via the available connectors. Automated uploads can be implemented with copy activities, e.g. from Azure Blob storage to Azure SQL DB for reporting.
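For programmatic orchestration, the hedged sketch below triggers and monitors a run of an ADF pipeline that is assumed to already contain such a Blob-to-Azure-SQL copy activity, using the azure-mgmt-datafactory SDK; subscription, resource group, factory and pipeline names are placeholders.

```python
# pip install azure-identity azure-mgmt-datafactory
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"          # placeholder values
RESOURCE_GROUP = "rg-analytics"
FACTORY_NAME = "adf-reporting"
PIPELINE_NAME = "CopyBlobToAzureSqlDb"         # pipeline with a copy activity, defined in ADF

adf = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Trigger a pipeline run; the pipeline itself orchestrates its activities
run = adf.pipelines.create_run(RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME, parameters={})

# Poll the run status until the pipeline finishes
while True:
    pipeline_run = adf.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(10)

print("Pipeline run finished with status:", pipeline_run.status)
```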
Azure offers solutions to generate value from data: data lakes store unstructured, raw data that is cleaned and structured by data warehouses for analytics or data-driven business processes.
Azure Databricks Lakehouse architectures combine the governance of data warehouses with Spark-based big data processing for analytics and machine learning. Databricks supports Delta Lake tables for ACID transactions and the Unity Catalog for data governance and discovery. Azure Data Lake Storage credential passthrough is a premium feature that allows storage access without configured service principals.
Azure Synapse Analytics combines data integration, enterprise data warehousing and big data analytics with massively parallel processing (MPP) to run high-performance analytics. The data integration capabilities (pipelines, data flows) are based on a subset of Azure Data Factory features.
Synapse Analytics uses the PolyBase virtualization technology to access external data stored in Azure Blob storage or Azure Data Lake Store via the T-SQL language. Data from disparate sources (such as on-premises systems or IoT) can be aggregated with the data warehouse features.
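To make the external-table idea concrete, here is a hedged sketch that registers data lake files as an external table in a Synapse SQL pool and queries it with T-SQL via pyodbc. Server, credentials, storage location and object names are assumptions, and the exact external data source options differ between dedicated and serverless pools.

```python
# pip install pyodbc
import pyodbc

# Connection details are placeholders for a Synapse SQL endpoint
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=sqlpool01;"
    "UID=sqladmin;PWD=********;Encrypt=yes",
    autocommit=True,
)
cursor = conn.cursor()

# Register the data lake location and file format, then expose Parquet files
# in the lake as an external table (names and options are illustrative).
cursor.execute("""
    CREATE EXTERNAL DATA SOURCE lake_source
    WITH (LOCATION = 'abfss://datalake@<account>.dfs.core.windows.net')
""")
cursor.execute("""
    CREATE EXTERNAL FILE FORMAT parquet_format
    WITH (FORMAT_TYPE = PARQUET)
""")
cursor.execute("""
    CREATE EXTERNAL TABLE dbo.orders_external (
        order_id    INT,
        customer_id INT,
        amount      DECIMAL(15, 2)
    )
    WITH (LOCATION = '/silver/orders/',
          DATA_SOURCE = lake_source,
          FILE_FORMAT = parquet_format)
""")

# Query the virtualized data with plain T-SQL
cursor.execute("SELECT customer_id, SUM(amount) FROM dbo.orders_external GROUP BY customer_id")
for customer_id, total in cursor.fetchall():
    print(customer_id, total)
```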
Azure Synapse Link integrates SQL or Cosmos DB databases with Azure Synapse Analytics to provide near real-time hybrid transactional/analytical processing (HTAP) on operational data.
Azure Analysis Services enables companies to implement tabular semantic data models that business users can query interactively with Power BI, which offers integrated Analysis Services premium features. Tabular models of connected databases can run in in-memory mode (the default, e.g. for SAP HANA, PostgreSQL) or in DirectQuery mode (to overcome memory limitations, e.g. for Azure SQL).
Azure Data Explorer is a fully managed, scalable, high-performance big data analytics platform, optimized for interactive, ad-hoc queries over various data types (structured, semi-structured, unstructured) ingested into a single data collection.
Some Azure Data Explorer use cases are log analytics (clickstream data, product logs), time series analytics (e.g. IoT) with built-in capabilities, general-purpose exploratory analytics (e.g. on streaming data), and scoring of high-quality machine learning models created and managed with Azure ML.
Power BI provides visualization and reporting insights using live queries (without Power BI datasets) or DirectQuery against Power BI datasets of connected data sources. Data imports enable fast processing, with advanced and premium features for large datasets or large-scale support.
Some sample connection options for Power BI are DirectQuery to SAP HANA, Amazon Redshift or Athena and LiveQuery to Azure Analysis Services.
Layer / Service | Short Description |
---|---|
Ingestion Layer | |
AWS Glue | Serverless data integration service. Glue offers built-in connectors to common sources like S3, Kinesis, Redshift and Postgres. Furthermore, AWS Glue allows the development of custom Spark, JDBC or Athena based connectors. Glue crawlers populate the AWS Glue Data Catalog with tables from source data stores. |
Amazon AppFlow | Ingests SaaS application data |
AWS Database Migration Service (DMS) | Ingests data from several operational RDBMS and NoSQL databases |
AWS DataSync | Ingests files from NFS and SMB shares |
Amazon Kinesis Data Firehose | Ingests streaming data. Kinesis Data Streams enables building real-time analytics pipelines. |
Storage Layer | |
Amazon S3 | Stores structured, semi-structured and unstructured data, typically using open file formats. |
Amazon Redshift | Provides petabyte-scale data warehouse storage for highly structured data. |
Catalog Layer | Discovery and governance. This layer stores the schemas of structured or semi-structured datasets in Amazon S3. |
AWS Glue Data Catalog | Central metadata catalog for the entire data landscape |
AWS Lake Formation | Offers a Data Catalog based on Glue and enables row-level security |
Processing Layer | |
SQL-based ELT | Redshift Spectrum |
Big data processing | Apache Spark jobs on AWS EMR or as AWS Glue Spark jobs |
Near real-time ETL | Spark Streaming with Kinesis Data Analytics, AWS EMR or AWS Glue Spark Streaming |
Consumption Layer | |
Amazon QuickSight | Machine-learning powered BI service; generated data models automatically understand the meaning and relationships of business data. |
Data Virtualization | Athena and Redshift Spectrum are serverless services offering virtualization features like Federated Query and user-defined functions. Both services have native integration with the AWS Glue Data Catalog as well as the Hive Metastore and use SQL to query virtual tables of data stored on S3. Both provide JDBC drivers. Athena is a standalone interactive service, whereas Spectrum is part of the Redshift stack (see the Athena sketch after this table). |
 | Amazon Redshift is built on an MPP architecture. AWS Glue virtualizes Redshift Spectrum data access to S3. External tables define the schemas and locations of S3 data files. Virtualized tables can be joined with tables in Redshift. Federated Query allows you to run a Redshift query across additional databases and data lakes like S3, Amazon RDS or Aurora. |
 | Athena query results stored on S3 can be loaded into Redshift. |
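To make the virtualization row above concrete, here is a minimal sketch that runs an Athena query over an S3-backed Glue Data Catalog table using boto3; the database, table, region and result bucket names are assumptions.

```python
# pip install boto3
import time

import boto3

athena = boto3.client("athena", region_name="eu-central-1")

# Start a query against an S3-backed table registered in the Glue Data Catalog
response = athena.start_query_execution(
    QueryString="SELECT customer_id, SUM(amount) AS total "
                "FROM sales_lake.orders GROUP BY customer_id",
    QueryExecutionContext={"Database": "sales_lake", "Catalog": "AwsDataCatalog"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Read the result rows (also persisted on S3, where they could be loaded into Redshift)
if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```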
SAP Business Technology Platform provides cloud data and analytics services.
Calculation models are translated into a relational SQL representation whenever possible to enable SQL engine optimizations. This unfolding transformation is applied globally and automatically.
The default optimization mode of calculation views is the calculation engine. To avoid a two-step optimization of calculation views within SQL statements, the "Execute in SQL Engine" option overrides the default optimization, which typically leads to better and more efficient optimization results.
The SQL plan marks partly successful transformations of the Calculation Engine as "ceQoPop".
SAP HANA Cloud is a fully managed, in-memory cloud database as a service (DBaaS). HANA Cloud Service (HCS) instances are deployed as single-tenant databases.
HANA Cloud scales compute and data storage (memory + disk) independently, vertically (up/down) and horizontally (out/in).
HANA Cloud does not support auto-scaling: decreasing the memory size requires a service request, and HANA Cloud can neither scale based on the current workload nor dynamically speed up a currently running query.
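For illustration, a minimal connection to a HANA Cloud instance from Python could look like the sketch below, using the hdbcli driver; the endpoint, user and password are placeholders.

```python
# pip install hdbcli
from hdbcli import dbapi

# Endpoint and credentials are placeholders for an actual HANA Cloud instance
conn = dbapi.connect(
    address="<instance>.hanacloud.ondemand.com",
    port=443,                          # HANA Cloud SQL endpoint listens on port 443
    user="ANALYTICS_USER",
    password="********",
    encrypt="true",                    # TLS is mandatory for HANA Cloud
    sslValidateCertificate="true",
)

cursor = conn.cursor()
cursor.execute("SELECT CURRENT_UTCTIMESTAMP FROM DUMMY")
print(cursor.fetchone())
cursor.close()
conn.close()
```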
Main characteristics:
Some HANA Cloud use cases:
Virtualization, Federation and Replication are supported by HANA Smart Data Access (SDA) and Smart Data Integration (SDI).
Features of natively available HANA SDA adapters:
SDI features:
Depending on the location, the SDI DP agent can use TCP/IP or HTTP (secured with SSL).
Some SDI adapters:
Main features:
Calculation views are the main HANA graphical design-time objects, which are converted to SQL-processable column views at run-time. They can be enhanced with read-only SQLScript procedures and functions. These procedures or functions return tables or scalar values to the view, or calculate view input parameters for dynamic placeholders.
Pushing down input parameters as filters, pruning (join, union, column) and parallelization are optimization concepts.
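As an illustration of input parameter pushdown, the sketch below queries a calculation view and passes a placeholder value that the view can use for filtering and pruning; the schema, view and parameter names are assumptions.

```python
from hdbcli import dbapi

conn = dbapi.connect(address="<instance>.hanacloud.ondemand.com", port=443,
                     user="ANALYTICS_USER", password="********", encrypt="true")
cursor = conn.cursor()

# Query a calculation view and push an input parameter down as a filter.
# Schema, view and parameter names are illustrative only.
cursor.execute("""
    SELECT "REGION", SUM("AMOUNT") AS "TOTAL"
    FROM "SALES"."CV_SALES_OVERVIEW" (PLACEHOLDER."$$IP_YEAR$$" => '2024')
    GROUP BY "REGION"
""")
for region, total in cursor.fetchall():
    print(region, total)
```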
There are three types of calculation views:
Dataflow scenarios can be composed in the Calculation View Modeler with a range of node operations (e.g. join, aggregation) on data sources (like tables, views, functions).
Other modeling features, which can be used standalone or within calculation views:
SAP HANA's holistic security framework comprises user management, authentication, authorization, anonymization, masking, encryption and auditing:
HANA's column-based storage and processing is optimized for on-line analytical processing (OLAP). Column tables enable multi-core parallel processing and fast aggregations.
Row storage is regarded as the optimal storage design for on-line transactional processing (OLTP).
HANA Cloud column store tables are automatically compressed to reduce size in memory and CPU cache. Compression is applied during the delta merge operation.
Dictionary encoding compression removes the repetition of values in a column. Each distinct value is stored only once in the dictionary store. Integers representing unique dictionary values are addressable with indices. Some further compression techniques are run-length, cluster and prefix encoding.
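As a conceptual illustration (not HANA's actual implementation), dictionary encoding can be sketched in a few lines of Python:

```python
def dictionary_encode(column):
    """Conceptual sketch of dictionary encoding for one column of values."""
    dictionary = sorted(set(column))                   # each distinct value stored only once
    index = {value: i for i, value in enumerate(dictionary)}
    encoded = [index[value] for value in column]       # integers referencing dictionary positions
    return dictionary, encoded


dictionary, encoded = dictionary_encode(["DE", "US", "DE", "DE", "FR", "US"])
print(dictionary)   # ['DE', 'FR', 'US']
print(encoded)      # [0, 2, 0, 0, 1, 2]
```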
Column tables can be partitioned to improve performance, and only the columns needed for processing are actually loaded into memory.
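For illustration, a partitioned column table could be declared with plain HANA SQL as in the following sketch (schema, table and column names are assumptions), executed here through hdbcli:

```python
from hdbcli import dbapi

conn = dbapi.connect(address="<instance>.hanacloud.ondemand.com", port=443,
                     user="DDL_USER", password="********", encrypt="true")
cursor = conn.cursor()

# Column store is the default in HANA Cloud; the keyword makes the intent explicit.
# HASH partitioning spreads the rows over four partitions by ORDER_ID.
cursor.execute("""
    CREATE COLUMN TABLE "SALES"."ORDERS" (
        "ORDER_ID"  INTEGER        NOT NULL,
        "REGION"    NVARCHAR(2),
        "AMOUNT"    DECIMAL(15, 2),
        PRIMARY KEY ("ORDER_ID")
    ) PARTITION BY HASH ("ORDER_ID") PARTITIONS 4
""")
conn.commit()
```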
The Integration Solution Advisory Methodology (ISA-M) describes data and analytics integrations. These cloud integrations are part of data-to-value scenarios in hybrid multi-cloud environments.
SAP Data Intelligence is a unified data management solution for data orchestration, with data catalog and integration capabilities. Comprehensive metadata management rules optimize governance and minimize compliance risk.
Some features:
SAP Data Warehouse Cloud is a modeling and data governance environment, built on the HANA Cloud database.
DWC spaces provide isolated virtual environments for:
Spaces can represent business domains in Data Mesh architectures. They can also be used to access HANA HDI containers with created SQL users.
Some characteristics of the limited Data Flow modeler embedded in SAP Data Warehouse Cloud:
Various DWC integration requirements can be implemented with HANA SDI (Smart Data Integration) and SDA. The HANA SDI DP agent offers system connection options. To connect to S/4HANA Cloud, the DP agent has to be installed on a separate server. Additionally, DWC supports Open Connectors.
S-API extractors are mostly regarded as a means of data acquisition based on S/4HANA compatibility views or function modules. They are not recommended for virtualization and are only suitable to a limited extent for replication with DWC.
Some DWC connectivity options to Hyperscalers:
SAP Analytics Cloud is a cloud service for:
These SAC features are powered by artificial intelligence and machine learning technologies.
Datasets are data collections suitable for ad-hoc analysis, as a basis for stories (embedded or public) and as Smart Predict data sources.
Models allow multiple data sources and planning scenarios.
Data modeling entails e.g.
Data can be accessed in SAC with live connections or after data import. Live connections are supported for cloud (e.g. BTP, S/4HANA Cloud, DWC) and on-premise (e.g. BW, HANA, S/4HANA) data sources. In combination with DWC, various other connection options are possible.
Data connections can be created for live and acquired data:
All live data connections, except those to DWC, require a proxy live data model. This proxy model tells SAP Analytics Cloud how to query data from the source model (HANA calculation view, BW query).
Live connections from SAP Analytics Cloud to Data Warehouse Cloud, in combination with DWC remote connection options, enable further live connectivity data sources for SAC.
SAP Analytics Cloud stories visualize information in data views with visualizations such as charts, tables and geo maps. Business users can explore data interactively with story views to find insights and make decisions. Both types of views are based on models that define dimensions (categories) and measures (quantities) on data.
SAC offers templates (provided by SAP or customized) and predefined story page types (responsive for mobile, canvas, grid, smart discovery form) as starting points for stories.
Smart Predict is an SAP Analytics Cloud feature that helps analysts answer business questions about future trends, e.g. a target sales campaign limited to customers who are most likely to buy a specific product.
The coupled date and target value information is called the signal. Signals are analyzed by Smart Predict Time Series Forecasting.
Search to Insight is a natural language query interface used to query data.
Major use cases for predictive planning are expense & costs, revenue & sales and workforce planning.
Predictive Planning answers questions about "what should happen" based on Smart Prediction ("what can we learn from historical data") and Time Series Forecasting ("what could happen").
The SAP Analytics Cloud Designer SDK enables the implementation of:
To check whether a query is unfolded, you can use the Explain Plan functionality. If all tables used in your query appear in the plan, unfolding has taken place successfully.
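A minimal sketch of such a check with hdbcli is shown below; the connection details, view name and statement name are assumptions.

```python
from hdbcli import dbapi

conn = dbapi.connect(address="<instance>.hanacloud.ondemand.com", port=443,
                     user="ANALYTICS_USER", password="********", encrypt="true")
cursor = conn.cursor()

# Generate an explain plan for a statement that selects from a calculation view
cursor.execute("""
    EXPLAIN PLAN SET STATEMENT_NAME = 'unfolding_check' FOR
    SELECT "REGION", SUM("AMOUNT") FROM "SALES"."CV_SALES_OVERVIEW" GROUP BY "REGION"
""")

# If the base tables of the view show up as operators, the query was unfolded
cursor.execute("""
    SELECT OPERATOR_NAME, TABLE_NAME
    FROM EXPLAIN_PLAN_TABLE
    WHERE STATEMENT_NAME = 'unfolding_check'
""")
for operator, table in cursor.fetchall():
    print(operator, table)
```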
Some features of nested calculation views can block unfolding for the whole calculation view. The "Execute in SQL Engine" option allows you to override the default execution behavior of calculation views.
Performance: the plan is translated into an SQL representation, referred to as a "QO" (query optimization). A box in the PlanViz output represents a plan operator (POP); its prefix defines the execution engine, e.g. CePOP for the Calculation Engine and BwPOP for the OLAP Engine.