Launch an eval
Launch an evaluation. This endpoint is the API equivalent of the Eval function built into the Braintrust SDK: you provide pointers to a dataset, a task function, and scoring functions, and the API runs the evaluation, creates an experiment, and returns the results along with a link to the experiment. To learn more about evals, see the Evals guide.
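For example, a minimal set of launch parameters pairs a project with pointers to a dataset, a task function, and one or more scoring functions. The sketch below is illustrative only: the dataset_id and function_id reference shapes are assumptions, so check the request body schema below for the exact forms that data, task, and scores accept.

```typescript
// Sketch of a minimal eval launch payload. The dataset_id / function_id
// reference shapes are assumptions; see the request body schema below.
interface EvalLaunchParams {
  project_id: string;                // project to run the eval in
  data: { dataset_id: string };      // pointer to the dataset to use
  task: { function_id: string };     // the function to evaluate
  scores: { function_id: string }[]; // the functions to score the eval on
}

const params: EvalLaunchParams = {
  project_id: "<project id>",
  data: { dataset_id: "<dataset id>" },
  task: { function_id: "<task function id>" },
  scores: [{ function_id: "<scorer function id>" }],
};
```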
Authorization
Most Braintrust endpoints are authenticated by providing your API key as a header, Authorization: Bearer [api_key], in your HTTP request. You can create an API key on the Braintrust organization settings page.
In: header
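As a sketch, the header is attached like any other HTTP header when calling the endpoint, reusing the params payload from the sketch above. The endpoint URL (https://api.braintrust.dev/v1/eval) and the BRAINTRUST_API_KEY environment variable are illustrative assumptions.

```typescript
// Minimal sketch: POST the launch parameters with a Bearer token.
// The endpoint URL and environment variable name are assumptions.
const apiKey = process.env.BRAINTRUST_API_KEY;

const response = await fetch("https://api.braintrust.dev/v1/eval", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify(params), // the minimal launch parameters sketched above
});

if (!response.ok) {
  throw new Error(`Eval launch failed with status ${response.status}`);
}
const summary = await response.json(); // results plus a link to the experiment
```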
Request Body
application/json (required). Eval launch parameters; a combined example using several of the optional parameters appears after the list below.
project_id: Unique identifier for the project to run the eval in
data: The dataset to use
task: The function to evaluate
scores: The functions to score the eval on
experiment_name (string): An optional name for the experiment created by this eval. If it conflicts with an existing experiment, it will be suffixed with a unique identifier.
metadata (object): Optional experiment-level metadata to store about the evaluation. You can later use this to slice & dice across experiments.
parent (any properties in span_parent_struct, or string): Options for tracing the evaluation
stream (boolean): Whether to stream the results of the eval. If true, the request will return two events: one to indicate the experiment has started, and another upon completion. If false, the request will return the evaluation's summary upon completion.
trial_count (number): The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
is_public (boolean): Whether the experiment should be public. Defaults to false.
timeout (number): The maximum duration, in milliseconds, to run the evaluation. Defaults to undefined, in which case there is no timeout.
max_concurrency (number): The maximum number of tasks/scorers that will be run concurrently. Defaults to undefined, in which case there is no max concurrency.
base_experiment_name (string): An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
base_experiment_id (string): An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
git_metadata_settings (object): Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
repo_info (object): Optionally specify the experiment's git metadata explicitly instead of having it collected automatically.
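Combining several of the optional parameters above, a fuller set of launch parameters might look like the following sketch. As before, the dataset_id and function_id reference shapes are assumptions, and all values are placeholders; every other field name comes from the parameter list above.

```typescript
// Sketch of launch parameters that use several optional fields.
// data/task/scores reference shapes (dataset_id, function_id) are assumptions.
const launchParams = {
  project_id: "<project id>",
  data: { dataset_id: "<dataset id>" },
  task: { function_id: "<task function id>" },
  scores: [{ function_id: "<scorer function id>" }],

  // Optional parameters documented above
  experiment_name: "nightly-regression",           // suffixed if the name already exists
  metadata: { model: "gpt-4o", suite: "nightly" }, // slice & dice across experiments later
  trial_count: 3,             // run each input three times to see variance
  stream: false,              // return the summary on completion rather than streaming events
  timeout: 10 * 60 * 1000,    // stop the evaluation after 10 minutes (milliseconds)
  max_concurrency: 5,         // at most 5 tasks/scorers running concurrently
  base_experiment_name: "production-baseline", // summarize and compare against this experiment
  is_public: false,           // keep the experiment private (the default)
};
```

Eval launch response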