This is Chapter 1 of the AWS AI and ML series.
These are the concepts and service distinctions I kept returning to while working with AWS AI services. Writing them down in one place made the concepts stick.
Different types of ML algorithms
Three broad families, each suited to a different class of problem.
Supervised learning - trains on labelled input-output pairs. Two sub-types:
- Classification - predicts a category (spam/not spam, image labels). Algorithms: logistic regression, decision trees, random forests, SVM, neural networks.
- Regression - predicts a continuous value (house price, demand forecast). Algorithms: linear regression, gradient boosting, neural networks.
Unsupervised learning - finds structure in unlabelled data:
- Clustering - groups similar items (K-means, hierarchical clustering)
- Dimensionality reduction - compresses features while preserving signal (PCA)
- Anomaly detection - identifies outliers (Isolation Forest, autoencoders)
Reinforcement learning - an agent learns by trial and error, optimising for a reward signal. Well suited to sequential decision problems: game playing, robotics, recommendation systems.
Deep learning - neural networks with many layers. Not a separate family, but a technique applicable across all three: deep CNNs for image classification (supervised), autoencoders for anomaly detection (unsupervised), deep RL for complex control tasks.
Algorithm selection guide
| Problem type | Data | Algorithm family |
|---|---|---|
| Predict a category | Labelled | Supervised - classification |
| Predict a number | Labelled | Supervised - regression |
| Find natural groups | Unlabelled | Unsupervised - clustering |
| Reduce feature dimensions | Unlabelled | Unsupervised - dimensionality reduction |
| Detect outliers | Unlabelled | Unsupervised - anomaly detection |
| Learn from sequential decisions | Reward signals | Reinforcement learning |
| Image, audio, or text at scale | Labelled or unlabelled | Deep learning |
| Use a general-purpose pre-trained model | None (prompting) | Foundation models / generative AI |
ML performance metrics
Classification metrics
| Metric | What it measures | When to use it |
|---|---|---|
| Accuracy | Correct predictions / total predictions | Balanced classes only |
| Precision | True positives / (true positives + false positives) | When false positives are costly (spam filter) |
| Recall | True positives / (true positives + false negatives) | When false negatives are costly (cancer detection) |
| F1 | Harmonic mean of precision and recall | Imbalanced classes |
| AUC-ROC | Area under the ROC curve | Ranking models by discrimination ability |
For imbalanced datasets (e.g. fraud detection where 99% of transactions are legitimate), accuracy is misleading - a model predicting “not fraud” every time scores 99%. F1 or AUC-ROC gives a truer picture.
Regression metrics
| Metric | What it measures |
|---|---|
| MAE | Average absolute error - interpretable in the same units as the target |
| MSE | Penalises large errors more than MAE |
| RMSE | Square root of MSE - back in the target’s units, sensitive to outliers |
| R² | Proportion of variance explained by the model (1.0 = perfect) |
Amazon AI services and their usage
| Service | What it does |
|---|---|
| Amazon SageMaker | Full ML lifecycle - build, train, evaluate, deploy, and monitor custom models |
| Amazon Bedrock | Managed API access to foundation models from AWS and third-party providers |
| Amazon Rekognition | Image and video analysis - object detection, facial analysis, content moderation |
| Amazon Comprehend | NLP - entity recognition, sentiment analysis, key phrase extraction, PII detection |
| Amazon Textract | Extracts text and structured data from documents and forms |
| Amazon Transcribe | Speech-to-text; Transcribe Medical for clinical audio |
| Amazon Polly | Text-to-speech with neural voices |
| Amazon Lex | Conversational AI for building chatbots (same engine as Alexa) |
| Amazon Personalize | Real-time personalisation and recommendations |
| Amazon Forecast | Time series forecasting |
| Amazon Kendra | Intelligent enterprise search with ML-powered relevance |
| Amazon Translate | Neural machine translation |
| Amazon Augmented AI (A2I) | Human review workflows for ML predictions |
Different types of inference
| Inference type | What it is | Use it for | AWS |
|---|---|---|---|
| Real-time | Synchronous, low-latency - request goes in, response comes back immediately | User-facing features, APIs | SageMaker real-time endpoints, Bedrock on-demand |
| Batch | Process a large dataset offline, results written to storage, no latency requirement | Overnight scoring jobs, bulk document processing | SageMaker Batch Transform |
| Async | Client submits a request, gets a job ID, polls or receives a callback when done | Inference takes minutes (large inputs, complex models) | SageMaker Async Inference |
| Serverless | No always-on endpoint; infrastructure scales from zero, cold start on first request after idle | Intermittent or unpredictable traffic where you don’t want to pay for idle capacity | SageMaker Serverless Inference |
| Edge | Model runs on-device, no round trip to the cloud | Latency, connectivity, or data residency rules out cloud inference | SageMaker Edge Manager, AWS Greengrass |
Bedrock vs SageMaker
| Bedrock | SageMaker | |
|---|---|---|
| What you bring | Prompts and data | Training data and model code |
| Model | Pre-trained foundation model from a provider | Custom model you train or bring your own |
| Training | None - model weights are fixed (unless fine-tuning via Bedrock) | Full training and retraining control |
| Deployment | Managed by AWS, no infrastructure to configure | You configure and manage endpoints |
| Use case | Consume a general-purpose model with prompting, RAG, or agents | Build, train, and host a custom model for a specific task |
| Skill requirement | Prompt engineering, RAG patterns | ML engineering, MLOps |
The key distinction: Bedrock is for consuming foundation models; SageMaker is for building and operating your own.
Bedrock vs SageMaker inference mapping
| Inference type | Bedrock | SageMaker |
|---|---|---|
| Real-time (on-demand) | On-demand throughput - pay per token, no reservation | Real-time endpoints - always-on, pay per hour |
| Provisioned / reserved | Provisioned Throughput - reserve model units for guaranteed capacity | Provisioned endpoints with auto scaling |
| Batch | Batch inference via S3 | Batch Transform |
| Async | Not natively - wrap with Lambda or Step Functions | Async Inference endpoints |
| Serverless | On-demand behaves serverless for most Bedrock use cases | Serverless Inference endpoints |
| Edge | Not applicable | SageMaker Edge Manager / Greengrass |
Foundation model customisation approaches
In order of increasing cost, complexity, and control:
| Approach | What it does | Cost/complexity | AWS |
|---|---|---|---|
| Prompt engineering | No model changes, just better instructions. Worth exhausting before reaching for anything else | Zero, beyond inference | - |
| RAG (Retrieval-Augmented Generation) | Inject relevant context at inference time from an external knowledge base. Keeps knowledge fresh without retraining; best for factual, updatable knowledge | Low | Bedrock Knowledge Bases |
| Continued pre-training | Train a base model further on domain-specific unlabelled text. Adapts the model’s language to a specialised domain (medical, legal, financial) without supervised examples | High - significant data and compute | - |
| Fine-tuning | Train the model on labelled input-output examples to adjust its behaviour or output style. Better than RAG for consistent tone, format, or domain-specific task performance | Medium-high | Bedrock fine-tuning, SageMaker |
| RLHF (Reinforcement Learning from Human Feedback) | Refine model behaviour using human preference signals - the technique used to align foundation models | Provider-side - not typically something you run yourself | - |
| Pre-train from scratch | Build a foundation model on your own data from the ground up | Highest - only viable with the data and compute budget of a foundation model provider | - |
Responsible AI tools
| Tool | What it does |
|---|---|
| Amazon SageMaker Clarify | Detects bias in training data and model predictions; explains model decisions via feature importance |
| Amazon Bedrock Guardrails | Filters content, denies specified topics, redacts PII from model inputs and outputs |
| Amazon Augmented AI (A2I) | Routes low-confidence predictions to human reviewers for validation |
| SageMaker Model Monitor | Detects data drift and model quality degradation in production endpoints |
| SageMaker Model Cards | Documents model purpose, training data, evaluation results, and intended use |
Prompt engineering techniques
| Technique | How it works |
|---|---|
| Zero-shot | Instruction only, no examples. Works for well-understood tasks the model has seen in training. |
| Few-shot | Include two to five examples of the desired input-output format before the actual query. Useful for consistent formatting or domain-specific phrasing. |
| Chain-of-thought | Ask the model to reason step by step before giving a final answer. Improves accuracy on multi-step reasoning and maths problems. |
| ReAct | Interleave reasoning and tool use: the model reasons, takes an action (e.g. search), observes the result, then reasons again. Basis for agentic workflows. |
| System prompt | Set the model’s persona, role, and constraints at the top of the conversation. Applied before the user message and influences all subsequent responses. |
Notes
- This chapter doubles as the concept reference behind my AIF-C01 study notes - the same algorithm types, metrics, and service distinctions come up across that exam.
- AWS AI services move fast, particularly Bedrock model availability and features - treat the service-specific sections as a snapshot, not a permanent reference.