Mastering Prompt Optimization with Amazon Bedrock: A Step-by-Step Guide

Introduction

Amazon Bedrock’s new Advanced Prompt Optimization tool streamlines prompt tuning across multiple models, helping you migrate between models or boost performance on your current one. It compares your original prompts with optimized versions on up to five models simultaneously, using a metric-driven feedback loop that incorporates ground truth, evaluation metrics, and optional LLM-as-a-judge or AWS Lambda functions. This guide walks you through the process step by step.

What You Need

  • An AWS account with permissions to access Amazon Bedrock
  • Your prompt template (plain text or with variable placeholders)
  • Example user inputs for each variable
  • Ground truth answers for evaluation
  • A clear evaluation metric (e.g., accuracy, relevance) – can be provided as a natural language description, an LLM-as-a-judge rubric, or an AWS Lambda function ARN
  • Optional: multimodal inputs in PNG, JPG, or PDF format for document/image analysis tasks
  • A JSONL file prepared according to the required schema (see Step 3)

Step 1: Access the Advanced Prompt Optimization Page

Log in to your AWS Management Console and navigate to Amazon Bedrock. On the left sidebar, find Advanced Prompt Optimization and click Create prompt optimization. This opens the configuration wizard.

Step 2: Select Up to Five Models

You can optimize your prompt for up to five inference models simultaneously. This is particularly useful when:

  • You are migrating from one model to another (select your current model as a baseline plus up to four target models).
  • You simply want to see how much a model’s own performance improves after optimization (select only your current model).

Each model will receive the original and optimized prompts, allowing you to compare results directly.

Step 3: Prepare Your Prompt Template in JSONL Format

The tool expects your prompt templates and evaluation data in JSONL format (one JSON object per line). Each record follows the structure below, shown pretty-printed here for readability:

{
    "version": "bedrock-2026-05-14",
    "templateId": "string",
    "promptTemplate": "string",
    "steeringCriteria": ["string"],
    "customEvaluationMetricLabel": "string",
    "customLLMJConfig": {
        "customLLMJPrompt": "string",
        "customLLMJModelId": "string"
    },
    "evaluationMetricLambdaArn": "string",
    "evaluationSamples": [
        {
            "inputVariables": [
                {
                    "variableName1": "value1",
                    "variableName2": "value2"
                }
            ],
            "referenceResponse": "ground truth"
        }
    ]
}

Key fields:

  • version: must be bedrock-2026-05-14 (fixed).
  • promptTemplate: Your instruction with placeholders such as {{variableName1}}.
  • steeringCriteria: Optional hints for the optimizer (e.g., “be concise”).
  • evaluationSamples: Array of objects containing variable values and the ground truth answer.
  • One of customLLMJConfig, evaluationMetricLambdaArn, or a natural language description must be provided for evaluation.
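
To make the schema concrete, here is a minimal sketch of how you might assemble a single record and write it out as a JSONL line in Python. The template ID, placeholder name, and sample values below are purely illustrative; only the version string and field names come from the schema above.

import json

# Illustrative record following the schema above; every value here is a
# made-up placeholder, not output from a real optimization job.
record = {
    "version": "bedrock-2026-05-14",
    "templateId": "support-summary-v1",
    "promptTemplate": "Summarize the customer ticket below in two sentences.\n\nTicket: {{ticket_text}}",
    "steeringCriteria": ["be concise", "preserve product names exactly"],
    "customEvaluationMetricLabel": "summary-accuracy",
    "evaluationSamples": [
        {
            "inputVariables": [
                {"ticket_text": "My order arrived with a cracked screen and I need a replacement."}
            ],
            "referenceResponse": "The customer's order arrived damaged (cracked screen) and they are requesting a replacement."
        }
    ],
}

# JSONL means one JSON object per line: dump without indentation and end
# each record with a newline.
with open("prompt_optimization_input.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")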

Step 4: Define Your Evaluation Criteria

You can guide the optimization using one of three methods:

  • Natural language description: e.g., “Prefer answers that are factually correct and include citations.”
  • LLM-as-a-judge rubric: Provide a custom prompt and model ID in the customLLMJConfig field.
  • AWS Lambda function: Supply its ARN in evaluationMetricLambdaArn. This function should take the model’s response and the ground truth, then return a score.
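
If you go the Lambda route, note that this guide does not spell out the exact request and response contract, so the sketch below is built on assumptions: it assumes the event carries the model output and the ground truth under the keys response and referenceResponse, and that returning a numeric score is enough. The token-overlap F1 metric is just a stand-in for whatever scoring logic fits your task.

def lambda_handler(event, context):
    # Assumed event shape: "response" and "referenceResponse" are hypothetical
    # key names, since the exact payload is not documented in this guide.
    response = event.get("response", "")
    reference = event.get("referenceResponse", "")

    # Toy metric: token-level F1 overlap between the response and the ground truth.
    resp_tokens = set(response.lower().split())
    ref_tokens = set(reference.lower().split())
    overlap = len(resp_tokens & ref_tokens)
    if overlap == 0:
        score = 0.0
    else:
        precision = overlap / len(resp_tokens)
        recall = overlap / len(ref_tokens)
        score = 2 * precision * recall / (precision + recall)

    # The expected return contract is likewise an assumption.
    return {"score": score}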

The optimizer uses your chosen metric in a feedback loop, iteratively refining the prompt to maximize the score across your evaluation samples.

Step 5: Run the Optimization

Upload your JSONL file through the console or via the AWS CLI/API. Click Start optimization. The process may take several minutes depending on the number of models and sample size. The tool will output:

  • Original and final prompt templates for each model
  • Evaluation scores
  • Cost estimates (tokens consumed)
  • Latency metrics
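
A malformed file wastes a run, so it can be worth sanity-checking the JSONL locally before you upload it. The helper below is an unofficial check that each line parses as JSON and carries the fields used in this guide; which fields the service strictly requires is not stated here, so treat that list as an assumption rather than a specification.

import json

# Fields used in this guide's schema; treat this list as an assumption.
EXPECTED_FIELDS = {"version", "templateId", "promptTemplate", "evaluationSamples"}

def check_jsonl(path):
    """Report lines that are not valid JSON objects or are missing expected fields."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: not valid JSON ({exc})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {lineno}: expected a JSON object")
                continue
            missing = EXPECTED_FIELDS - record.keys()
            if missing:
                problems.append(f"line {lineno}: missing fields {sorted(missing)}")
    return problems

for issue in check_jsonl("prompt_optimization_input.jsonl"):
    print(issue)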

Step 6: Analyze Results

Compare the performance of your original prompt versus the optimized version across all selected models. Look for:

  • Score improvements – did the metric increase?
  • Regression checks – ensure known use cases still work well.
  • Cost and latency trade‑offs – optimizations that increase length may raise cost.
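
One way to keep the comparison honest is to tabulate the deltas yourself. The snippet below uses made-up numbers purely to show the arithmetic: score change and token growth per model, with a flag for any regression.

# Hypothetical report figures, for illustration only; the real numbers come
# from the optimization results in the console.
results = {
    "model-a": {"orig_score": 0.71, "opt_score": 0.83, "orig_tokens": 420, "opt_tokens": 510},
    "model-b": {"orig_score": 0.64, "opt_score": 0.62, "orig_tokens": 380, "opt_tokens": 365},
}

for model, r in results.items():
    score_delta = r["opt_score"] - r["orig_score"]
    token_delta = r["opt_tokens"] - r["orig_tokens"]
    flag = "REGRESSION" if score_delta < 0 else "improved"
    print(f"{model}: score {score_delta:+.2f} ({flag}), tokens {token_delta:+d} per request")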

If you’re migrating, you can now confidently switch to a new model using the optimized prompt.

Tips for Success

  • Start small: Use 3–5 evaluation samples initially to test the workflow, then scale up.
  • Be specific with criteria: Vague steering instructions like “make it better” yield weaker results. Use concrete guidance.
  • Leverage multimodal inputs: If your task involves documents or images, include PNG, JPG, or PDF files as part of the prompt template.
  • Iterate: Run optimization multiple times with different evaluation metrics or steering criteria to discover the best prompt.
  • Monitor costs: Since the optimizer calls models repeatedly, set budget alerts to avoid surprise charges.
  • Version your prompts: Save each optimized version with notes on the model and metric used for future reference.
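
For that last tip, a lightweight local log is usually enough. The helper below is a minimal sketch (the file name, model ID, and metric label are all made up) that appends each optimized prompt along with the model and metric it was tuned against.

import json
from datetime import datetime, timezone

def save_prompt_version(path, prompt_text, model_id, metric, notes=""):
    """Append an optimized prompt and its context to a local JSONL log."""
    entry = {
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "metric": metric,
        "notes": notes,
        "prompt": prompt_text,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

# Hypothetical usage; substitute your own identifiers.
save_prompt_version(
    "prompt_versions.jsonl",
    prompt_text="Summarize the customer ticket below in two sentences...",
    model_id="example-model-id",
    metric="summary-accuracy",
    notes="First optimized version for the migration test",
)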