Amazon Bedrock has announced the general availability of Intelligent Prompt Routing, a serverless endpoint that efficiently routes requests between different foundation models within the same model family. The system dynamically predicts the response quality each candidate model would deliver for a given request and routes the request to the model it determines is most appropriate based on cost and response quality. It incorporates state-of-the-art methods for training routers over different sets of models, tasks, and prompts. Users can rely on the default prompt routers provided by Amazon Bedrock or configure their own, adjusting performance linearly between that of the two candidate LLMs. Amazon has reduced the overhead of the added routing components by over 20%, to approximately 85 ms (P90), resulting in an overall latency and cost benefit compared with always invoking the larger, more expensive model. The Bedrock team has conducted internal tests with proprietary and public data to evaluate the system's performance.
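To illustrate how a prompt router is consumed, the following is a minimal sketch using the Converse API from the AWS SDK for Python (boto3): a router is addressed like any other model by passing its ARN as the modelId. The router ARN, account ID, and region shown are placeholders for illustration; actual ARNs for default or custom routers come from the Bedrock console or the ListPromptRouters API.

```python
# Sketch: sending a request through an Amazon Bedrock prompt router via the Converse API.
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# A prompt router is addressed like a model: pass its ARN as the modelId.
# Placeholder ARN; look up the real one in the Bedrock console or via ListPromptRouters.
router_arn = "arn:aws:bedrock:us-east-1:123456789012:default-prompt-router/anthropic.claude:1"

response = bedrock_runtime.converse(
    modelId=router_arn,
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the benefits of prompt routing in two sentences."}],
        }
    ],
)

# The answer from whichever model the router selected comes back in the usual Converse shape.
print(response["output"]["message"]["content"][0]["text"])
```

The custom routers mentioned above, which expose the quality/cost trade-off between two candidate models, can be created through the Bedrock console or the CreatePromptRouter API by specifying the two candidate models, a fallback model, and a response quality difference threshold.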