Cloud AI platforms like SAP BTP Generative AI Hub, AWS Bedrock, Azure AI Foundry, or Google Vertex AI offer customization and orchestration capabilities with model catalogs to realize GenAI business scenarios.
Pre-trained foundation models and LLMs provide broad general knowledge but often cannot process specialized business tasks with high accuracy.
In-context customization strategies like Prompt Engineering or Retrieval Augmented Generation (RAG) add business domain knowledge to general purpose models without retraining.
GenAI orchestration with SAP Generative AI Hub or Azure Machine Learning Prompt Flow integrates customized GenAI models effectively into business apps and scenarios through workflows and features such as templating, grounding, content filters, and data masking.
GenAI customization adapts AI models at different levels, ranging from in-context learning to fine-tuning.
Prompt engineering and Retrieval Augmented Generation (RAG) augment the context of GenAI models to improve the quality of generated output. Because in-context learning doesn't change model weights, it empowers GenAI models without the need for expensive training runs.
Prompt engineering and advanced search capabilities are typical options to optimize generated outputs with custom business context without model retraining. Vector search over embeddings can be implemented with different data sources for RAG, similarity search, or recommendation scenarios.
Based on vector search techniques, Retrieval-Augmented Generation (RAG) augments prompts for generative AI solutions with information retrieved from custom data sources.
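The retrieve-then-augment flow of RAG can be sketched in a few lines of plain Python. The bag-of-words similarity below is a stand-in for a real embedding model and vector database, and the documents and helper names are illustrative only:

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real RAG pipeline would call an
    # embedding model and store the vectors in a vector database.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "Invoices are approved by the finance department within five days.",
    "Travel expenses require a manager signature before reimbursement.",
]

def build_prompt(query, docs):
    # Retrieval step: similarity search for the closest document.
    context = max(docs, key=lambda d: cosine(embed(query), embed(d)))
    # Augmentation step: ground the prompt with the retrieved text.
    return f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context."

prompt = build_prompt("Who approves invoices?", documents)
```

The model then answers from the retrieved business context instead of relying only on its pre-trained knowledge.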
In-context learning with few-shot prompting adds examples to the prompt context to guide AI models and improve the accuracy of the output.
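A few-shot prompt simply interleaves example inputs and labels before the actual query. The classification task and example pairs below are hypothetical:

```python
# Hypothetical few-shot examples for classifying customer messages;
# the pairs are illustrative, not from a real dataset.
few_shot_examples = [
    ("Payment overdue since March", "dunning"),
    ("Please send the Q2 price list", "inquiry"),
]

def few_shot_prompt(task, examples, query):
    # Build the prompt: task instruction, labeled examples, open query.
    lines = [task]
    for text, label in examples:
        lines.append(f"Text: {text}\nLabel: {label}")
    lines.append(f"Text: {query}\nLabel:")
    return "\n\n".join(lines)

prompt = few_shot_prompt(
    "Classify each customer message as 'dunning' or 'inquiry'.",
    few_shot_examples,
    "Reminder: invoice 4711 is still unpaid",
)
```

The model completes the final `Label:` line by imitating the pattern of the in-context examples.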
GenAI model fine-tuning adapts the weighted connections between model parameters on different layers of pre-trained base models to learn new capabilities. The training result depends on hyperparameter settings such as learning rate, epochs, or batch size and on the quality of the prepared domain-specific datasets.
In contrast to in-context learning with few-shot examples, fine-tuning training datasets can exceed the context window length of GenAI models, but resource costs increase significantly when fine-tuning generative AI models.
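How these hyperparameters interact can be shown with simple arithmetic; the values below are assumptions for illustration, not recommendations:

```python
# Illustrative fine-tuning hyperparameters (assumed values).
dataset_size = 10_000   # prepared domain-specific training examples
batch_size = 16
epochs = 3
learning_rate = 2e-5    # typical order of magnitude for LLM fine-tuning

# Each optimizer step updates the weights once per batch.
steps_per_epoch = dataset_size // batch_size   # 625 steps
total_steps = steps_per_epoch * epochs         # 1875 weight updates
```

Larger batches reduce the number of weight updates per epoch, while more epochs or a higher learning rate change how strongly the base model is adapted to the domain data.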
Transformer architectures have high memory requirements: because self-attention scales quadratically with sequence length, memory use roughly quadruples when the training sequence length doubles. The maximum amount of data a transformer model can process at once is limited by the training sequence length, which determines the context window of the trained GenAI model. The context window length covers the sum of input prompt and generated output tokens.
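The quadratic scaling follows directly from the attention score matrix, which has one entry per token pair:

```python
# Self-attention builds an (n x n) score matrix per head, so its memory
# footprint grows quadratically with the sequence length n.
def attention_matrix_entries(seq_len):
    return seq_len * seq_len

base = attention_matrix_entries(2048)
doubled = attention_matrix_entries(4096)
ratio = doubled / base  # doubling the sequence length quadruples the matrix
```

This is why extending the context window of a transformer model is so much more expensive than it might first appear.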
Parameter-Efficient Fine-Tuning (PEFT) identifies the parameters most relevant for a specific task and focuses training on them, reducing the required compute power and memory. By avoiding updates to all parameters, PEFT reduces the demand for infrastructure resources and training data. Low-Rank Adaptation (LoRA) reduces the dimensional space of the trainable parameters by factoring weight updates into low-rank matrices.
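The LoRA saving can be quantified with a parameter count; the layer dimensions and rank below are illustrative assumptions:

```python
# LoRA replaces the update to a full (d_out x d_in) weight matrix with
# two low-rank factors B (d_out x r) and A (r x d_in), where r << d_in.
d_in, d_out, r = 4096, 4096, 8   # assumed transformer layer sizes and rank

full_update_params = d_out * d_in             # 16,777,216 trainable values
lora_params = d_out * r + r * d_in            # 65,536 trainable values
reduction = full_update_params / lora_params  # 256x fewer trainable parameters
```

Only the small A and B matrices are trained, while the frozen base weights are reused unchanged, which is what cuts the compute and memory demand.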
Generative AI transformer models often use 32-bit weights, which results in high memory requirements. Quantization reduces the precision of parameters to 16-bit or lower to optimize training and inference efficiency at lower cost.
Quantization can help to overcome GPU RAM limitations on SAP AI Core, e.g. a model whose 32-bit weights occupy 60 GB can be reduced to roughly 30 GB at 16-bit precision, so it fits into 60 GB of available GPU RAM with headroom to spare.
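The memory saving follows from simple byte arithmetic over the weight count; the 15-billion-parameter model size is an assumed example:

```python
def weight_memory_gb(params_billion, bits):
    # Memory for the model weights only; activations and the KV cache
    # need additional GPU RAM on top of this.
    return params_billion * 1e9 * bits / 8 / 1e9

fp32 = weight_memory_gb(15, 32)  # 60.0 GB at 32-bit precision
fp16 = weight_memory_gb(15, 16)  # 30.0 GB at 16-bit precision
```

Halving the bit width halves the weight memory, and 8-bit or 4-bit quantization reduces it further at some cost in precision.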
Orchestration features are available on all major generative AI platforms like SAP AI Launchpad, Azure AI Foundry, AWS Bedrock, or Google Vertex AI and can be implemented as part of Business AI applications with the help of the SAP Generative AI Hub SDK or frameworks like LangChain or LangGraph.
GenAI orchestration features help to manage prompt templates with placeholders, ground models with domain-specific data, and implement advanced security with content filters.
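Prompt templating with placeholders can be illustrated with Python's standard-library `string.Template`; orchestration platforms such as SAP Generative AI Hub provide their own templating syntax, and the product and question values below are hypothetical:

```python
from string import Template

# Illustrative orchestration-style prompt template with placeholders.
template = Template(
    "You are a support assistant for $product.\n"
    "Answer the customer question: $question"
)

# At request time the orchestration layer fills in the placeholders.
prompt = template.substitute(
    product="SAP S/4HANA",
    question="How do I reverse a posted invoice?",
)
```

Keeping the template separate from the runtime values lets the same workflow serve many products and questions, and gives grounding, masking, and filtering steps a defined place to plug in.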
Grounding with techniques like RAG or fine-tuning empowers GenAI models to process use-case-specific Business AI tasks with specific and relevant information beyond the knowledge of the trained base model.
Retrieval Augmented Generation (RAG) augments input prompt instructions with domain-specific knowledge to improve the outputs of generative AI solutions for Business AI use-cases.