Cloud AI platforms like SAP BTP Generative AI Hub, AWS Bedrock, Azure AI Foundry, or Google Vertex AI offer customization and orchestration capabilities with model catalogs to realize GenAI business scenarios.
Pre-trained foundation models and LLMs provide broad general knowledge but often cannot process specialized business tasks with high accuracy.
In-context customization strategies like Prompt Engineering or Retrieval Augmented Generation (RAG) add business domain knowledge to general purpose models without retraining.
GenAI orchestration with SAP Generative AI Hub or Azure Machine Learning Prompt Flow integrates customized GenAI models effectively into business apps and scenarios through workflows and features such as templating, grounding, content filters, and data masking.
GenAI customization adapts AI models at different levels, ranging from in-context learning to fine-tuning.
Prompt engineering and Retrieval Augmented Generation (RAG) augment the context of GenAI models to improve the quality of generated output. Because in-context learning doesn't change model weights, it empowers GenAI models without the need for expensive training runs.
Prompt engineering and advanced search capabilities are typical options to optimize generated outputs with custom business context without model retraining. Vector search over embeddings can be implemented with different data sources for RAG, similarity search, or recommendation scenarios.
Based on vector search techniques, Retrieval-Augmented Generation (RAG) augments prompts for generative AI solutions with information retrieved from custom data sources.
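The retrieve-then-augment flow of RAG can be sketched in a few lines of plain Python. The bag-of-words similarity below is a stand-in for a real embedding model and vector database, and the documents and helper names are illustrative only:

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real RAG pipeline would call an
    # embedding model and store the vectors in a vector database.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "Invoices are approved by the finance department within five days.",
    "Travel expenses require a manager signature before reimbursement.",
]

def build_prompt(query, docs):
    # Retrieval step: similarity search for the closest document.
    context = max(docs, key=lambda d: cosine(embed(query), embed(d)))
    # Augmentation step: ground the prompt with the retrieved text.
    return f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context."

prompt = build_prompt("Who approves invoices?", documents)
```

The model then answers from the retrieved business context instead of relying only on its pre-trained knowledge.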
In-context learning with few-shot prompting adds examples to the prompt context to guide AI models and improve the accuracy of the output.
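A few-shot prompt simply interleaves example inputs and labels before the actual query. The classification task and example pairs below are hypothetical:

```python
# Hypothetical few-shot examples for classifying customer messages;
# the pairs are illustrative, not from a real dataset.
few_shot_examples = [
    ("Payment overdue since March", "dunning"),
    ("Please send the Q2 price list", "inquiry"),
]

def few_shot_prompt(task, examples, query):
    # Build the prompt: task instruction, labeled examples, open query.
    lines = [task]
    for text, label in examples:
        lines.append(f"Text: {text}\nLabel: {label}")
    lines.append(f"Text: {query}\nLabel:")
    return "\n\n".join(lines)

prompt = few_shot_prompt(
    "Classify each customer message as 'dunning' or 'inquiry'.",
    few_shot_examples,
    "Reminder: invoice 4711 is still unpaid",
)
```

The model completes the final `Label:` line by imitating the pattern of the in-context examples.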
GenAI model fine-tuning adapts the weighted connections between model parameters on different layers of pre-trained base models to learn new capabilities. The training result depends on hyperparameter settings such as learning rate, epochs, or batch size and on the quality of the prepared domain-specific datasets.
In contrast to in-context learning with few-shot examples, fine-tuning training datasets can exceed the context window length of GenAI models, but resource costs increase significantly when fine-tuning generative AI models.
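How these hyperparameters interact can be shown with simple arithmetic; the values below are assumptions for illustration, not recommendations:

```python
# Illustrative fine-tuning hyperparameters (assumed values).
dataset_size = 10_000   # prepared domain-specific training examples
batch_size = 16
epochs = 3
learning_rate = 2e-5    # typical order of magnitude for LLM fine-tuning

# Each optimizer step updates the weights once per batch.
steps_per_epoch = dataset_size // batch_size   # 625 steps
total_steps = steps_per_epoch * epochs         # 1875 weight updates
```

Larger batches reduce the number of weight updates per epoch, while more epochs or a higher learning rate change how strongly the base model is adapted to the domain data.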
Transformer architectures have high memory requirements: because self-attention scales quadratically with sequence length, memory use roughly quadruples when the training sequence length doubles. The maximum amount of data a transformer model can process at once is limited by the training sequence length, which determines the context window of the trained GenAI model. The context window length covers the sum of input prompt and generated output tokens.
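The quadratic scaling follows directly from the attention score matrix, which has one entry per token pair:

```python
# Self-attention builds an (n x n) score matrix per head, so its memory
# footprint grows quadratically with the sequence length n.
def attention_matrix_entries(seq_len):
    return seq_len * seq_len

base = attention_matrix_entries(2048)
doubled = attention_matrix_entries(4096)
ratio = doubled / base  # doubling the sequence length quadruples the matrix
```

This is why extending the context window of a transformer model is so much more expensive than it might first appear.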
Parameter-Efficient Fine-Tuning (PEFT) identifies the parameters most relevant for a specific task and focuses training on them, reducing the required compute power and memory. By avoiding updates to all parameters, PEFT reduces the demand for infrastructure resources and training data. Low-Rank Adaptation (LoRA) reduces the dimensional space of the trainable parameters by factoring weight updates into low-rank matrices.
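The LoRA saving can be quantified with a parameter count; the layer dimensions and rank below are illustrative assumptions:

```python
# LoRA replaces the update to a full (d_out x d_in) weight matrix with
# two low-rank factors B (d_out x r) and A (r x d_in), where r << d_in.
d_in, d_out, r = 4096, 4096, 8   # assumed transformer layer sizes and rank

full_update_params = d_out * d_in             # 16,777,216 trainable values
lora_params = d_out * r + r * d_in            # 65,536 trainable values
reduction = full_update_params / lora_params  # 256x fewer trainable parameters
```

Only the small A and B matrices are trained, while the frozen base weights are reused unchanged, which is what cuts the compute and memory demand.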
Generative AI transformer models often use 32-bit weights, which results in high memory requirements. Quantization reduces the precision of parameters to 16-bit or lower to optimize training and inference efficiency at lower cost.
Quantization can help to overcome GPU RAM limitations on SAP AI Core, e.g. a model whose 32-bit weights occupy 60 GB can be reduced to roughly 30 GB at 16-bit precision, so it fits into 60 GB of available GPU RAM with headroom to spare.
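The memory saving follows from simple byte arithmetic over the weight count; the 15-billion-parameter model size is an assumed example:

```python
def weight_memory_gb(params_billion, bits):
    # Memory for the model weights only; activations and the KV cache
    # need additional GPU RAM on top of this.
    return params_billion * 1e9 * bits / 8 / 1e9

fp32 = weight_memory_gb(15, 32)  # 60.0 GB at 32-bit precision
fp16 = weight_memory_gb(15, 16)  # 30.0 GB at 16-bit precision
```

Halving the bit width halves the weight memory, and 8-bit or 4-bit quantization reduces it further at some cost in precision.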
Orchestration features are available on all major generative AI platforms like SAP AI Launchpad, Azure AI Foundry, AWS Bedrock, or Google Vertex AI and can be implemented as part of Business AI applications with the help of the SAP Generative AI Hub SDK or frameworks like LangChain or LangGraph.
GenAI orchestration features help to manage prompt templates with placeholders, ground models with domain-specific data, and implement advanced security with content filters.
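Prompt templating with placeholders can be illustrated with Python's standard-library `string.Template`; orchestration platforms such as SAP Generative AI Hub provide their own templating syntax, and the product and question values below are hypothetical:

```python
from string import Template

# Illustrative orchestration-style prompt template with placeholders.
template = Template(
    "You are a support assistant for $product.\n"
    "Answer the customer question: $question"
)

# At request time the orchestration layer fills in the placeholders.
prompt = template.substitute(
    product="SAP S/4HANA",
    question="How do I reverse a posted invoice?",
)
```

Keeping the template separate from the runtime values lets the same workflow serve many products and questions, and gives grounding, masking, and filtering steps a defined place to plug in.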
Grounding with techniques like RAG or fine-tuning empowers GenAI models to process use-case-specific Business AI tasks with specific and relevant information beyond the knowledge of the trained base model.
Retrieval Augmented Generation (RAG) augments input prompt instructions with domain-specific knowledge to improve the outputs of generative AI solutions for Business AI use-cases.