Use this procedure to drop run data older than X days.
Warnings
1. Please run during off-peak hours (nights, weekends). These can be expensive operations.
2. These queries perform DELETE operations. It is recommended to take a backup of your persistent volume before performing these operations.
3. Run these queries at your own risk. LangChain is not responsible for any lost data.
Step 1: Find parts to target
Run the following query, replacing X with your retention window in days:
```sql
SELECT
    r._part,
    min(r.inserted_at) AS min_inserted_at,
    max(r.inserted_at) AS max_inserted_at,
    countIf(r.inserted_at < now() - INTERVAL X DAYS) AS rows_before_cutoff
FROM runs AS r
JOIN system.parts AS p
    ON p.database = currentDatabase()
    AND p.table = 'runs'
    AND p.name = r._part
    AND p.active
WHERE r.trace_first_received_at IS NOT NULL
    AND r.ttl_seconds IS NOT NULL
GROUP BY r._part
ORDER BY rows_before_cutoff DESC;
```
This returns output like:
┌─_part───────────────────────────┬────────────min_inserted_at─┬────────────max_inserted_at─┬─rows_before_cutoff─┐
1. │ all_0_5533517_534_9235849 │ 2025-07-13 19:43:30.100585 │ 2025-09-21 14:53:27.728860 │ 1562791 │
2. │ all_8190823_8937443_419_9235849 │ 2025-10-02 15:12:43.225511 │ 2025-10-18 02:38:53.053977 │ 0 │
3. │ all_9001802_9082188_152_9235849 │ 2025-10-20 14:09:13.299046 │ 2025-10-23 17:12:03.223946 │ 0 │
4. │ all_8937444_9001801_104_9235849 │ 2025-10-18 02:38:59.031352 │ 2025-10-20 14:09:12.324759 │ 0 │
5. │ all_9230975_9235606_324_9235849 │ 2025-10-30 13:37:21.609983 │ 2025-10-30 18:48:03.960415 │ 0 │
6. │ all_9212826_9226335_174_9235849 │ 2025-10-29 21:54:33.545948 │ 2025-10-30 09:04:11.649532 │ 0 │
7. │ all_9226336_9229183_27_9235849 │ 2025-10-30 09:04:13.245519 │ 2025-10-30 11:38:03.496015 │ 0 │
8. │ all_5533518_8190822_519_9235849 │ 2025-09-21 14:53:29.635032 │ 2025-10-02 15:12:43.026718 │ 0 │
9. │ all_9185532_9197540_164_9235849 │ 2025-10-28 20:02:34.214949 │ 2025-10-29 09:01:58.561627 │ 0 │
10. │ all_9230038_9230974_23_9235849 │ 2025-10-30 12:41:07.511320 │ 2025-10-30 13:37:18.976109 │ 0 │
11. │ all_9167893_9185531_112_9235849 │ 2025-10-27 23:57:18.997924 │ 2025-10-28 20:02:07.192944 │ 0 │
12. │ all_9236339_9236339_0 │ 2025-10-30 19:17:13.213276 │ 2025-10-30 19:17:13.213276 │ 0 │
13. │ all_9236338_9236338_0 │ 2025-10-30 19:17:13.141552 │ 2025-10-30 19:17:13.141552 │ 0 │
14. │ all_9235994_9236282_158 │ 2025-10-30 19:07:11.710410 │ 2025-10-30 19:16:08.836892 │ 0 │
15. │ all_9082189_9167892_245_9235849 │ 2025-10-23 17:12:04.777948 │ 2025-10-27 23:57:18.634470 │ 0 │
16. │ all_9229184_9230037_22_9235849 │ 2025-10-30 11:38:13.896638 │ 2025-10-30 12:41:05.571393 │ 0 │
17. │ all_9197541_9212825_226_9235849 │ 2025-10-29 09:01:58.915913 │ 2025-10-29 21:54:29.526225 │ 0 │
18. │ all_9236283_9236337_14 │ 2025-10-30 19:16:09.678017 │ 2025-10-30 19:17:11.889849 │ 0 │
19. │ all_9235607_9235993_35_9235849 │ 2025-10-30 18:48:15.001440 │ 2025-10-30 19:06:43.890718 │ 0 │
└─────────────────────────────────┴────────────────────────────┴────────────────────────────┴────────────────────┘
Target parts with the largest number of rows to delete first.
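If you script this step, picking the next target is just a max over `rows_before_cutoff`. A minimal sketch — the `pick_target_part` helper and its input shape, `(part_name, rows_before_cutoff)` tuples, are illustrative assumptions about how you collect the query results, not part of LangSmith:

```python
def pick_target_part(rows):
    """Given (part_name, rows_before_cutoff) tuples from the Step 1 query,
    return the part with the most deletable rows, or None if nothing qualifies."""
    candidates = [(part, n) for part, n in rows if n > 0]
    if not candidates:
        return None
    # Largest rows_before_cutoff first, per the guidance above.
    return max(candidates, key=lambda item: item[1])[0]
```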
Step 2: Delete
Pick a part name and run the following query to trigger a DELETE mutation:
```sql
ALTER TABLE default.runs
DELETE WHERE _part = '<your_part_name_here>'
AND inserted_at < now() - INTERVAL X DAYS;
```
Mutations run asynchronously, so the client returns immediately.
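If you are scripting the loop, the delete statement can be templated per table and part. A hypothetical helper (the function name and parameters are illustrative):

```python
def build_delete_query(table, part_name, days, ts_column="inserted_at"):
    """Render the ALTER TABLE ... DELETE mutation for one part.

    ts_column should be 'trace_first_received_at' for the related
    tables listed in Step 4."""
    return (
        f"ALTER TABLE default.{table} "
        f"DELETE WHERE _part = '{part_name}' "
        f"AND {ts_column} < now() - INTERVAL {int(days)} DAYS;"
    )
```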
Step 3: Track the mutation
Track the status of the mutation with:
```sql
SELECT * FROM system.mutations WHERE is_done = 0 AND is_killed = 0;
```
You should see one row (occasionally two). Once the mutation completes, its is_done field changes to 1 and this query returns no rows.
WARNING: Please wait for each mutation to complete before proceeding, otherwise your query times may be significantly affected.
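One way to enforce this wait from a script is to poll `system.mutations` until it drains. A sketch with query execution left abstract — `count_active_mutations` is a stand-in for however you run queries against ClickHouse in your deployment:

```python
import time


def wait_for_mutations(count_active_mutations, poll_seconds=30, sleep=time.sleep):
    """Block until the active-mutation count reaches zero.

    count_active_mutations: callable returning the result of
        SELECT count() FROM system.mutations WHERE is_done = 0 AND is_killed = 0
    """
    while count_active_mutations() > 0:
        sleep(poll_seconds)
```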
If you see significant CPU/memory spikes to the point where queries are impacted, you can stop the active mutation with:
```sql
KILL MUTATION WHERE mutation_id = '<mutation_id_from_select_query>';
```
Step 4: Repeat
After a mutation, parts may be renamed, so re-run Step 1 to fetch updated part names, identify the next part to delete, delete it, and track the mutation.
Repeat until you have reclaimed the desired disk space.
You can repeat the process for the following tables; for these, filter on `trace_first_received_at` instead of `inserted_at`:
- `runs_metadata_kv`
- `runs_inputs_kv`
- `runs_outputs_kv`
- `runs_reference_example_id`
- `runs_run_id_v2`
- `runs_tags`
- `runs_token_counts`
- `runs_trace_id`
- `runs_lite`
- `root_runs_lite`
- `runs_feedbacks_rmt_wide`
- `runs_token_counts_sources_wide`
- `runs_token_counts_rolled_up_wide`
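Putting the loop together, the per-table timestamp column can be captured in a small mapping. A sketch; the table names come from the list above, and the structure is an illustrative assumption:

```python
# Tables to clean up and the timestamp column each one filters on.
RUNS_TABLES = {"runs": "inserted_at"}
RELATED_TABLES = {
    name: "trace_first_received_at"
    for name in [
        "runs_metadata_kv", "runs_inputs_kv", "runs_outputs_kv",
        "runs_reference_example_id", "runs_run_id_v2", "runs_tags",
        "runs_token_counts", "runs_trace_id", "runs_lite",
        "root_runs_lite", "runs_feedbacks_rmt_wide",
        "runs_token_counts_sources_wide", "runs_token_counts_rolled_up_wide",
    ]
}


def ts_column(table):
    """Return the timestamp column to use in the Step 1/Step 2 queries."""
    return {**RUNS_TABLES, **RELATED_TABLES}[table]
```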