Evaluation how-to guides

These guides answer “How do I...?” questions. They are goal-oriented and concrete, and are meant to help you complete a specific task. For conceptual explanations, see the Conceptual guide. For end-to-end walkthroughs, see Tutorials. For comprehensive descriptions of every class and function, see the API reference.

Offline evaluation

Evaluate and improve your application before deploying it.

Run an evaluation

Define an evaluator

Configure the evaluation data

Configure an evaluation job
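
The four guides above correspond to the moving parts of an offline evaluation: a dataset, a target application, one or more evaluators, and a job that ties them together. As a rough illustration only, here is a minimal sketch in plain Python. It does not use the LangSmith SDK; every name in it (`DATASET`, `target`, `exact_match`, `run_evaluation`) is hypothetical, and the dictionary shapes are an assumption made for this example.

```python
from statistics import mean

# Hypothetical dataset: each example pairs inputs with a reference output.
DATASET = [
    {"inputs": {"question": "2 + 2"}, "outputs": {"answer": "4"}},
    {"inputs": {"question": "3 * 3"}, "outputs": {"answer": "9"}},
    {"inputs": {"question": "10 / 4"}, "outputs": {"answer": "2.5"}},
]


def target(inputs: dict) -> dict:
    """Stand-in for the application under test (hypothetical)."""
    left, op, right = inputs["question"].split()
    a, b = int(left), int(right)
    # Integer division is a deliberate bug so the evaluation has something to catch.
    result = {"+": a + b, "*": a * b, "/": a // b}[op]
    return {"answer": str(result)}


def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Evaluator: score 1.0 when the answer matches the reference exactly."""
    score = float(outputs["answer"] == reference_outputs["answer"])
    return {"key": "exact_match", "score": score}


def run_evaluation(target_fn, dataset, evaluators):
    """Evaluation job: run the target on every example and apply each evaluator."""
    rows = []
    for example in dataset:
        outputs = target_fn(example["inputs"])
        scores = {}
        for evaluator in evaluators:
            feedback = evaluator(outputs, example["outputs"])
            scores[feedback["key"]] = feedback["score"]
        rows.append({"inputs": example["inputs"], "outputs": outputs, "scores": scores})
    return rows


results = run_evaluation(target, DATASET, [exact_match])
summary = mean(row["scores"]["exact_match"] for row in results)

for row in results:
    print(row["inputs"]["question"], "->", row["outputs"]["answer"], row["scores"])
print("mean exact_match:", round(summary, 3))
```

Keeping the evaluator separate from the target means the same dataset can be reused to compare application versions before deployment. The individual guides describe how to do each step with the actual tooling.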

Unit testing

Unit test your system to identify bugs and regressions.

Unit test applications (Python only)
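
To make the idea concrete, the sketch below shows a unit test for a small piece of application logic using only Python's standard `unittest` module. It is an illustration under stated assumptions, not the approach the linked guide prescribes: `normalize_answer` and `NormalizeAnswerTest` are hypothetical names invented for this example.

```python
import unittest


def normalize_answer(text: str) -> str:
    """Hypothetical application helper: collapse whitespace and lowercase an answer."""
    return " ".join(text.split()).lower()


class NormalizeAnswerTest(unittest.TestCase):
    def test_collapses_whitespace(self):
        self.assertEqual(normalize_answer("  Hello   World "), "hello world")

    def test_is_idempotent(self):
        # Normalizing an already-normalized answer should change nothing.
        once = normalize_answer("Paris\n")
        self.assertEqual(normalize_answer(once), once)


def run_tests() -> unittest.TestResult:
    """Load and run the test case programmatically, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeAnswerTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
```

Tests like these check deterministic parts of a system and catch regressions early; scoring the quality of model outputs is the job of the evaluators described under Offline evaluation.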

Online evaluation

Evaluate and monitor your system's live performance on production data.

Automatic evaluation

Set up evaluators that automatically run for all experiments against a dataset.

Analyzing experiment results

Use the UI and API to understand your experiment results.

Dataset management

Manage datasets in LangSmith used by your evaluations.

Annotation queues and human feedback

Collect feedback from subject matter experts and users to improve your applications.
