Context
When working with LangSmith, users may need to export their annotated traces in bulk for analysis, data portability, or further processing. While this functionality is not available directly through the user interface, it can be accomplished programmatically using the LangSmith SDK.
Answer
To export annotated traces in bulk from LangSmith, you'll need to use the Python SDK to retrieve runs from an annotation queue and export them to a CSV file. Here's how to do it:
First, ensure you have the LangSmith client installed and properly configured.
Retrieve the annotation queue ID from the URL in your address bar and plug it into the script below in place of QUEUE_ID_HERE. E.g.:
https://smith.langchain.com/o/WORKSPACE_ID/annotation-queues/ANNOTATION_QUEUE_ID
Use the following Python script to export your annotation queue data:

import pandas as pd
from langsmith import Client


def export_annotation_queue_data(queue_id):
    client = Client()

    # Get all runs from the annotation queue, one index at a time
    annotated_runs = []
    index = 0
    print(f"Retrieving runs from annotation queue: {queue_id}")
    while True:
        try:
            run = client.get_run_from_annotation_queue(
                queue_id=queue_id,
                index=index
            )
            annotated_runs.append(run)
            index += 1
        except Exception:
            # Any error is treated as the end of the queue, so an
            # authentication or network failure also ends the loop silently
            break
    print(f"Found {len(annotated_runs)} runs in the queue")

    # Extract annotations for each run
    all_annotations = []
    for run in annotated_runs:
        # Get feedback/annotations for this run
        feedback_list = client.list_feedback(run_ids=[run.id])
        for feedback in feedback_list:
            annotation_data = {
                'run_id': run.id,
                'run_name': run.name,
                'feedback_key': feedback.key,
                'score': feedback.score,
                'value': feedback.value,
                'comment': feedback.comment,
                'created_at': feedback.created_at,
                'inputs': run.inputs,
                'outputs': run.outputs
            }
            all_annotations.append(annotation_data)
    return all_annotations


# Usage
queue_id = "QUEUE_ID_HERE"
annotations = export_annotation_queue_data(queue_id)

# Convert to a DataFrame and write a CSV for easier analysis
df = pd.DataFrame(annotations)
df.to_csv('annotation_queue_data.csv', index=False)

The exported CSV file will contain one row per feedback entry for the runs in the specified queue, including run IDs, feedback keys, scores, comments, and the associated input/output data. Runs that have no feedback produce no rows. If the script reports 0 runs, check your API key and queue ID first, because the retrieval loop stops on any error.
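
One detail worth handling before the CSV is written: `inputs` and `outputs` are nested dictionaries, and pandas writes such values using their Python string representation, which is awkward to parse back later. The sketch below is one possible way to serialize them as JSON strings first. The helper name `flatten_annotation` and the sample row are hypothetical, not part of the LangSmith SDK; the row keys match the ones used in the export script.

```python
import json
from datetime import datetime, timezone
from uuid import UUID


def flatten_annotation(row):
    """Return a copy of one annotation row with CSV-friendly values.

    Hypothetical helper: assumes the row keys used in the export script.
    """
    flat = dict(row)
    # Nested payloads become JSON strings; default=str covers values that
    # json cannot encode natively (for example UUIDs or datetimes).
    for key in ("inputs", "outputs"):
        flat[key] = json.dumps(row.get(key), default=str, sort_keys=True)
    # IDs are written as plain strings.
    flat["run_id"] = str(row["run_id"])
    # Datetimes are written in ISO 8601; anything else is left unchanged.
    created = row.get("created_at")
    if isinstance(created, datetime):
        flat["created_at"] = created.isoformat()
    return flat


# Made-up sample row, for illustration only
sample_row = {
    "run_id": UUID("12345678-1234-5678-1234-567812345678"),
    "feedback_key": "correctness",
    "score": 1,
    "created_at": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "inputs": {"question": "hi"},
    "outputs": None,
}
flat_row = flatten_annotation(sample_row)
```

To use it with the script, build the DataFrame from the flattened rows, e.g. `pd.DataFrame([flatten_annotation(a) for a in annotations])`, and call `json.loads` on the `inputs` and `outputs` columns when reading the CSV back.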
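
If copying the queue ID out of the address bar by hand is error-prone, it can also be pulled from the full URL in code. This is a minimal sketch that assumes the URL follows the `/annotation-queues/ANNOTATION_QUEUE_ID` pattern shown earlier and that queue IDs are UUIDs; `queue_id_from_url` is a hypothetical helper, not an SDK function.

```python
from urllib.parse import urlparse
from uuid import UUID


def queue_id_from_url(url):
    """Extract the annotation queue ID from an annotation queue URL.

    Hypothetical helper: assumes the path contains
    .../annotation-queues/<queue-id> and that the ID is a UUID.
    """
    # Split the path into non-empty segments; the query string is ignored.
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        raw_id = segments[segments.index("annotation-queues") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"No annotation queue ID found in URL: {url}") from None
    # UUID() raises ValueError if the segment is not a valid UUID.
    return str(UUID(raw_id))
```

The returned string can then be passed as `queue_id` to `export_annotation_queue_data` instead of pasting the ID manually.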