Overview
This article provides four SQL queries for understanding storage usage per workspace in a self-hosted LangSmith deployment backed by ClickHouse. Each query offers different tradeoffs between execution speed, detail, and the level of information returned.
Which query should I use?
- Query 1 (Postgres) - Fast, resolves workspace and org names, based on trace count. Best starting point.
- Query 2 (ClickHouse) - Slow, uses ingest metadata, does not resolve names.
- Query 3 (ClickHouse) - Slowest, measures actual data length, does not resolve names.
- Query 4 (Postgres + ClickHouse) - Slowest, measures actual data length, resolves names. Most complete.
Query 1: Trace Count from Postgres (Fast, Resolves Names)
This query retrieves trace counts from Postgres. It executes very quickly and resolves workspace and organization IDs to their display names. Since it is based only on trace count rather than actual byte size, the results are approximate but should correlate with storage usage in ClickHouse. This is the recommended starting point.
SELECT
o.id AS organization_id,
o.display_name AS organization_name,
t.id AS workspace_id,
t.display_name AS workspace_name,
tct.transaction_type,
coalesce(sum(tct.trace_count), 0) AS trace_count
FROM tenants t
JOIN organizations o ON o.id = t.organization_id
LEFT JOIN trace_count_transactions tct ON tct.tenant_id = t.id
GROUP BY 1, 2, 3, 4, 5
ORDER BY trace_count DESCQuery 2: Ingest-Based Storage from ClickHouse (Slow)
This query runs against ClickHouse and examines the metadata associated with traces at ingest time. It reads all traces, which can take some time to execute. The storage estimate is more directionally accurate than Query 1, but does not account for ClickHouse compression or traces offloaded to object storage (if that feature is enabled in your deployment). It does not resolve workspace IDs to names.
SELECT
r.tenant_id AS workspace_id,
count(*) AS run_count,
sum(r.input_size + r.output_size) AS total_bytes,
formatReadableSize(sum(r.input_size + r.output_size)) AS storage_used
FROM default.runs AS r
GROUP BY r.tenant_id
ORDER BY total_bytes DESC
LIMIT 20Query 3: Data-Length-Based Storage from ClickHouse (Slowest)
This query runs against ClickHouse and examines the actual data stored for each trace. It executes slowly but provides the best approximation of storage usage among the first three queries. Note that results are still directionally correct rather than exact, as they do not account for ClickHouse compression or storage-level overhead. It does not resolve workspace IDs to names.
SELECT
r.tenant_id AS workspace_id,
count(*) AS run_count,
sum(length(r.inputs) + length(r.outputs)) AS total_bytes,
formatReadableSize(sum(length(r.inputs) + length(r.outputs))) AS storage_used
FROM default.runs r
GROUP BY r.tenant_id
ORDER BY total_bytes DESC
LIMIT 20Query 4: Data-Length-Based Storage with Name Resolution (Slowest, Resolves Names)
This query combines Queries 1 and 3 by first retrieving the workspace-to-organization mapping from Postgres, then embedding those results into a ClickHouse query. This provides both the best available storage approximation and resolved workspace and organization names.
Step 1: Run in Postgres - Get workspace/org mapping
Execute the following query in Postgres to retrieve the workspace-to-organization mapping.
SELECT t.id AS workspace_id, t.display_name AS workspace_name,
o.id AS organization_id, o.display_name AS org_name
FROM tenants t
JOIN organizations o ON t.organization_id = o.idStep 2: Run in ClickHouse - Embed the Postgres results and compute storage
Replace the sample row in the workspaces CTE with the actual output from Step 1.
WITH workspaces AS (
SELECT workspace_id, workspace_name, org_name
FROM (
SELECT arrayJoin([
-- Replace with rows from the Postgres query above:
('d6684905-655c-429b-bd14-ba302d94c03b', 'Workspace 1', 'Personal')
]) AS t
)
ARRAY JOIN
[toUUID(t.1)] AS workspace_id,
[t.2] AS workspace_name,
[t.3] AS org_name
)
SELECT
ws.org_name,
ws.workspace_name,
r.tenant_id AS workspace_id,
count(*) AS run_count,
sum(length(r.inputs) + length(r.outputs)) AS total_bytes,
formatReadableSize(sum(length(r.inputs) + length(r.outputs))) AS storage_used
FROM default.runs r
JOIN workspaces ws ON ws.workspace_id = r.tenant_id
GROUP BY ws.org_name, ws.workspace_name, r.tenant_id
ORDER BY total_bytes DESC
LIMIT 20Important Notes
1) Storage estimates are approximate. None of these queries account for ClickHouse compression ratios or storage-level overhead. The values returned represent uncompressed logical data sizes.
2) Object storage offloading. If your deployment has blob/object storage enabled, traces offloaded to object storage will not be reflected in the ClickHouse-based Query 2.
3) Query performance. Queries 2, 3, and 4 scan the entire runs table in ClickHouse and can take a long time on large deployments. Start with Query 1 for a quick overview before running the slower queries.
4) LIMIT clause. Queries 2, 3, and 4 include LIMIT 20 by default. Adjust or remove this limit as needed to see all workspaces.