Generative AI made easy for engineering teams.
LastMile AI AutoEval is an enterprise-grade generative AI evaluation platform that helps developers test, evaluate, and benchmark AI applications. It ships with batteries-included evaluation metrics for RAG and multi-agent applications, along with fine-tuning capabilities for building custom evaluators. The platform emphasizes reproducibility, private deployment, and real-time evaluation at scale.
Python example (pip install lastmile):
import pandas as pd
from lastmile.lib.auto_eval import AutoEval, BuiltinMetrics

client = AutoEval()
# Score the output against the ground truth with the built-in Faithfulness metric
result = client.evaluate_data(
    data=pd.DataFrame({
        "input": ["Where did the author grow up?"],
        "output": ["France"],
        "ground_truth": ["England"],
    }),
    metrics=[BuiltinMetrics.FAITHFULNESS],
)
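Here the output contradicts the ground truth, so the faithfulness score returned for this row should be low.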
TypeScript example (npm install lastmile):
const { AutoEval, BuiltinMetrics } = require("lastmile/lib/auto_eval");

const client = await AutoEval.create();
// Score the output against the ground truth with the built-in Faithfulness metric
const result = await client.evaluateData({
  data: [
    { input: "Where did the author grow up?", output: "France", ground_truth: "England" },
  ],
  metrics: [BuiltinMetrics.FAITHFULNESS],
});
Custom metrics: upload your application data, define your own evaluator models, and fine-tune them as needed, as sketched below.
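A minimal Python sketch of that workflow follows. Only evaluate_data is confirmed by the examples above; the method names upload_dataset, fine_tune_model, and wait_for_fine_tune_job, the Metric class, and their parameters are assumptions about the SDK's fine-tuning interface, shown for illustration only.

import pandas as pd
from lastmile.lib.auto_eval import AutoEval, Metric

client = AutoEval()

# Assumed API: upload labeled application data as train/test datasets, receiving dataset ids
train_id = client.upload_dataset(file_path="train.csv", name="my-app-train")
test_id = client.upload_dataset(file_path="test.csv", name="my-app-test")

# Assumed API: start a fine-tuning job that trains a custom evaluator on those datasets
job_id = client.fine_tune_model(
    train_dataset_id=train_id,
    test_dataset_id=test_id,
    model_name="my-custom-evaluator",
)
client.wait_for_fine_tune_job(job_id)  # assumed: blocks until training completes

# Assumed API: reference the fine-tuned evaluator by name, alongside any built-in metric
scores = client.evaluate_data(
    data=pd.DataFrame({"input": ["..."], "output": ["..."]}),
    metrics=[Metric(name="my-custom-evaluator")],
)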