vitals

The vitals package provides a framework for evaluating large language model (LLM) applications built with ellmer in R. It helps developers measure and compare the performance, cost, and latency of LLM products like custom chat apps.

The package lets you assess whether a prompt change or a new tool improves your LLM application, compare how different models affect performance metrics, and identify problematic behaviors. It is an R port of the Python Inspect framework and writes evaluation logs compatible with the Inspect log viewer, making it straightforward to move between the two tools if needed. Evaluations are built from three components: datasets of input/target pairs, solvers that generate responses to inputs, and scorers that measure how well solver outputs match targets.
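To make the three components concrete, here is a minimal conceptual sketch in base R. It deliberately does not use the vitals or ellmer APIs: `toy_solver()` and `toy_scorer()` are hypothetical stand-ins invented for illustration, and the "solver" does the arithmetic locally instead of calling an LLM. It only shows how a dataset, a solver, and a scorer fit together.

```r
# Dataset: one row per sample, with an input prompt and the expected target.
dataset <- data.frame(
  input  = c("What is 2 + 2?", "What is 2 + 3?", "What is 2 + 4?"),
  target = c("4", "5", "6"),
  stringsAsFactors = FALSE
)

# Solver (hypothetical stand-in): maps each input to a response.
# A real solver would send the input to an LLM; this one extracts the
# numbers from the prompt and adds them up.
toy_solver <- function(inputs) {
  vapply(inputs, function(x) {
    nums <- as.numeric(regmatches(x, gregexpr("[0-9]+", x))[[1]])
    paste("The answer is", sum(nums))
  }, character(1), USE.NAMES = FALSE)
}

# Scorer (hypothetical): TRUE when the response contains the target string.
toy_scorer <- function(responses, targets) {
  mapply(
    function(r, t) grepl(t, r, fixed = TRUE),
    responses, targets,
    USE.NAMES = FALSE
  )
}

# Run the evaluation: solve every input, score every response, summarize.
responses <- toy_solver(dataset$input)
scores    <- toy_scorer(responses, dataset$target)
accuracy  <- mean(scores)
```

In a real evaluation the solver would wrap a chat model and the scorer might be another model or a stricter matching rule, but the flow is the same: inputs go to the solver, and its outputs are compared against targets by the scorer.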

Contributors

Simon Couch

Hadley Wickham

Jeroen Janssens

Mine Çetinkaya-Rundel

Tomasz Kalinowski