Getting Started
To get started, simply call weave.init() at the beginning of your script, passing a project name. Use team-name/project-name to log to a specific W&B team, or just project-name to log to your default team/entity.
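For example (the project name below is a placeholder):

```python
import weave

# Log to a project under your default team/entity:
weave.init("verdict-demo")

# Or log to a project under a specific W&B team:
# weave.init("my-team/verdict-demo")
```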
Tracking Call Metadata
To track metadata from your Verdict pipeline calls, you can use the weave.attributes context manager, which lets you set custom metadata for a specific block of code, such as a pipeline run or an evaluation batch.
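A minimal sketch, assuming a placeholder project name and a hypothetical helper that executes your Verdict pipeline:

```python
import weave

weave.init("verdict-demo")  # placeholder project name

# Every call traced inside this block is annotated with the given metadata,
# e.g. a batch identifier and environment tag for one evaluation run.
with weave.attributes({"batch_id": "2024-06-01", "env": "staging"}):
    results = run_verdict_evaluation_batch()  # hypothetical helper that runs your Verdict pipeline
```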
Traces
Storing traces of AI evaluation pipelines in a central database is crucial during both development and production, because these traces provide a valuable dataset for debugging and improving your evaluation workflows. Weave automatically captures traces for your Verdict applications, tracking and logging all calls made through the Verdict library, including:
- Pipeline execution steps
- Judge unit evaluations
- Layer transformations
- Pooling operations
- Custom units and transformations
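For example, a single-judge pipeline only needs weave.init() to be called first. The sketch below is based on Verdict’s documented Pipeline/JudgeUnit/Schema API; exact signatures, prompt templating, and the model-selection call may differ slightly across Verdict versions:

```python
import weave
from verdict import Pipeline
from verdict.common.judge import JudgeUnit
from verdict.schema import Schema

weave.init("verdict-demo")  # placeholder project name

# One judge rating an answer; prompt text and template fields are illustrative.
pipeline = Pipeline() >> JudgeUnit().prompt(
    "Does the answer address the question?\n"
    "Question: {source.question}\n"
    "Answer: {source.answer}"
).via("gpt-4o-mini")  # selects the underlying judge model

# Running the pipeline is captured automatically: the Pipeline execution and
# the JudgeUnit call both show up as a trace in the Weave UI.
result = pipeline.run(Schema.of(
    question="What does Weave capture?",
    answer="It traces all Verdict pipeline calls.",
))
```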
Pipeline Tracing Example
Here’s a more complex example showing how Weave traces nested pipeline operations (see the sketch after this list). The trace captures:
- The main Pipeline execution
- Each JudgeUnit evaluation within the Layer
- The MeanPoolUnit aggregation step
- Timing information for each operation
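A sketch of such a pipeline, assuming the Verdict module paths and the >> composition operator shown below; Layer’s copy-count argument and MeanPoolUnit’s pooled field name are assumptions, so check Verdict’s docs for the exact signatures:

```python
import weave
from verdict import Layer, Pipeline
from verdict.common.judge import JudgeUnit
from verdict.scale import DiscreteScale
from verdict.schema import Schema
from verdict.transform import MeanPoolUnit

weave.init("verdict-demo")  # placeholder project name

# Three copies of the same 1-5 judge run inside a Layer, and MeanPoolUnit
# averages their scores. Weave records the Pipeline run, each JudgeUnit call,
# the pooling step, and the timing of every operation.
judge = JudgeUnit(DiscreteScale((1, 5))).prompt(
    "Rate how relevant the answer is to the question (1-5):\n"
    "Question: {source.question}\n"
    "Answer: {source.answer}"
)

pipeline = (
    Pipeline()
    >> Layer(judge, 3)          # 3 parallel judge copies (count argument assumed)
    >> MeanPoolUnit("score")    # average the judges' "score" field (field name assumed)
)

result = pipeline.run(Schema.of(
    question="What does Weave trace?",
    answer="Every step of a Verdict pipeline.",
))
```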
Configuration
Upon calling weave.init(), tracing is automatically enabled for Verdict pipelines. The integration works by patching the Pipeline.__init__ method to inject a VerdictTracer that forwards all trace data to Weave.
No additional configuration is needed; Weave will automatically:
- Capture all pipeline operations
- Track execution timing
- Log inputs and outputs
- Maintain trace hierarchy
- Handle concurrent pipeline execution
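Since the tracer is injected in Pipeline.__init__, a safe pattern is simply to call weave.init() before constructing any pipelines; a minimal sketch:

```python
import weave
from verdict import Pipeline
from verdict.common.judge import JudgeUnit

# Initialize Weave first, so Pipeline.__init__ is already patched when the
# pipeline object below is created.
weave.init("verdict-demo")  # placeholder project name

pipeline = Pipeline() >> JudgeUnit().prompt("Is this answer helpful? {source.answer}")
# From here on, every pipeline.run(...) call is traced with no extra setup.
```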
Custom Tracers and Weave
If you’re using custom Verdict tracers in your application, Weave’s VerdictTracer can work alongside them:
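A sketch under two assumptions: your application already defines a Verdict-compatible tracer (imported here from a hypothetical module), and Pipeline accepts user-supplied tracers via a tracer argument. Per the integration described above, Weave’s patched Pipeline.__init__ adds its own VerdictTracer alongside yours, so both receive the trace events:

```python
import weave
from verdict import Pipeline
from verdict.common.judge import JudgeUnit

from my_app.tracing import ConsoleTracer  # hypothetical custom Verdict tracer

weave.init("verdict-demo")  # placeholder project name

# Assumption: Pipeline takes user-supplied tracers via `tracer=`; Weave's
# injected VerdictTracer runs alongside them rather than replacing them.
pipeline = Pipeline(tracer=[ConsoleTracer()]) >> JudgeUnit().prompt(
    "Is the answer grounded in the source document?\n"
    "Document: {source.document}\n"
    "Answer: {source.answer}"
)
```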
Models and Evaluations
Organizing and evaluating AI systems with multiple pipeline components can be challenging. Using the weave.Model class, you can capture and organize experimental details such as prompts, pipeline configurations, and evaluation parameters, making it easier to compare different iterations.
The following example demonstrates wrapping a Verdict pipeline in a Weave Model:
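A sketch of such a wrapper; the Weave side (subclassing weave.Model with typed attributes and a @weave.op predict method) is standard, while the Verdict pipeline details inside predict reuse the assumptions from the tracing examples above:

```python
import weave
from verdict import Layer, Pipeline
from verdict.common.judge import JudgeUnit
from verdict.scale import DiscreteScale
from verdict.schema import Schema
from verdict.transform import MeanPoolUnit

weave.init("verdict-demo")  # placeholder project name

class RelevanceJudge(weave.Model):
    # Experimental details captured as attributes, so different prompt and
    # judge-count combinations can be compared side by side in Weave.
    judge_prompt: str
    num_judges: int = 3

    @weave.op()
    def predict(self, question: str, answer: str) -> dict:
        # Build and run the Verdict pipeline for one (question, answer) pair.
        pipeline = (
            Pipeline()
            >> Layer(JudgeUnit(DiscreteScale((1, 5))).prompt(self.judge_prompt), self.num_judges)
            >> MeanPoolUnit("score")  # pooled field name assumed
        )
        result = pipeline.run(Schema.of(question=question, answer=answer))
        return {"result": result}

model = RelevanceJudge(
    judge_prompt=(
        "Rate how relevant the answer is to the question (1-5):\n"
        "Question: {source.question}\n"
        "Answer: {source.answer}"
    )
)
```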
Evaluations
Evaluations help you measure the performance of your evaluation pipelines themselves. By using the weave.Evaluation class, you can capture how well your Verdict pipelines perform on specific tasks or datasets:
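A minimal sketch using a trivial @weave.op stand-in for a Verdict-backed judge (substitute your own weave.Model, such as the RelevanceJudge above); in recent Weave versions the scorer receives the model’s return value through the output parameter, and other parameters are matched to dataset columns by name:

```python
import asyncio
import weave

weave.init("verdict-demo")  # placeholder project name

# Stand-in for a Verdict-backed judge; any weave.Model or @weave.op function
# can be evaluated. It labels an answer "relevant" if it mentions the
# question's first word.
@weave.op()
def judge_relevance(question: str, answer: str) -> dict:
    keyword = question.split()[0].lower()
    return {"label": "relevant" if keyword in answer.lower() else "irrelevant"}

# Small illustrative dataset with expected labels.
dataset = [
    {"question": "Weave tracing", "answer": "Weave traces Verdict pipelines.", "expected": "relevant"},
    {"question": "Weave tracing", "answer": "Bananas are yellow.", "expected": "irrelevant"},
]

# Scorer: compares the model's output label against the expected label.
@weave.op()
def exact_match(expected: str, output: dict) -> dict:
    return {"correct": output["label"] == expected}

evaluation = weave.Evaluation(dataset=dataset, scorers=[exact_match])
asyncio.run(evaluation.evaluate(judge_relevance))
```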