ScopeDB related information

concepts

  • A partition maps one-to-one to an xxxx.celty file on S3: each .celty file is one partition.

  • Push down means delegating the compute work of DQL functions to ScopeDB. For example, the sum operator in DQL is pushed down to ScopeDB as AGGREGATE SUM.

components

scopedb-job

Basically a cron job for ScopeDB. Each job just executes a ScopeQL statement at a scheduled time.

  • Most jobs are created by guance-select when a workspace is created.

  • Two common jobs: (1) deletion of data that falls outside the retention period, (2) OPTIMIZE TABLE.

  • OPTIMIZE TABLE basically tries to merge small partitions into partitions of an optimal size for querying. By default this job runs once every 15 minutes, but for some tables a single run takes almost 1 hour. In those cases, some job instances are skipped.
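The skipping behaviour described above can be illustrated with a small simulation. This is only a sketch, not scopedb-job's actual scheduler: it assumes a tick is dropped whenever the previous instance is still running, and the 55-minute run time is a stand-in for "almost 1 hour". The function name `simulate_schedule` is hypothetical.

```python
from datetime import datetime, timedelta


def simulate_schedule(start, interval, run_duration, ticks):
    """Return (ran, skipped) lists of scheduled tick times.

    Assumed policy (not confirmed for scopedb-job): a tick is skipped
    if the previous job instance has not finished yet.
    """
    ran, skipped = [], []
    busy_until = start  # time at which the running instance finishes
    for i in range(ticks):
        tick = start + i * interval
        if tick >= busy_until:
            ran.append(tick)
            busy_until = tick + run_duration
        else:
            skipped.append(tick)
    return ran, skipped


start = datetime(2025, 8, 16, 0, 0)
ran, skipped = simulate_schedule(
    start,
    interval=timedelta(minutes=15),      # default OPTIMIZE TABLE interval
    run_duration=timedelta(minutes=55),  # assumed: "almost 1 hour"
    ticks=8,                             # two hours of scheduled ticks
)
print("ran:    ", [t.strftime("%H:%M") for t in ran])
print("skipped:", [t.strftime("%H:%M") for t in skipped])
```

Under these assumptions only one tick per hour actually runs, so a slow table is effectively optimized hourly rather than every 15 minutes.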

scopedb-query-meta

  • Used by guance-select to get table metadata during queries. Usually 4c8g (4 CPU cores, 8 GB memory) is enough.

scopedb-ingest-meta

  • Used by guance-select to get table metadata during ingest. Usually 4c8g (4 CPU cores, 8 GB memory) is enough.

scopedb-query

  • Does the heavy lifting of querying data.

  • scopedb-query itself doesn't route any queries internally. Which query nodegroup a query goes to is decided solely by guance-select.

  • Sometimes it gets OOM-killed. We don't need to worry about it.
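As a toy model of why routing across separate query nodegroups limits the impact of one large query, consider the sketch below. Everything in it is an assumption made for illustration: the `NodeGroup` class, the `route` function, the 16-core capacities, and the "most free CPU" selection policy are hypothetical, and guance-select's real selection logic is not documented here.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    name: str
    cpu_total: int      # hypothetical capacity in cores
    cpu_used: int = 0

    def free(self) -> int:
        return self.cpu_total - self.cpu_used


def route(groups, cpu_wanted):
    """Send a query to one nodegroup and grant it CPU there.

    Assumed policy: pick the group with the most free CPU. A query can
    never consume more than the free capacity of the group it lands on.
    """
    target = max(groups, key=lambda g: g.free())
    granted = min(cpu_wanted, target.free())
    target.cpu_used += granted
    return target.name, granted


groups = [NodeGroup("query0", 16), NodeGroup("query1", 16)]

big = route(groups, cpu_wanted=64)   # large query saturates one group
small = route(groups, cpu_wanted=2)  # later query still finds capacity
print(big, small)
```

In this model the large query is capped at the capacity of the single nodegroup it was routed to, so the other nodegroup stays available for subsequent queries.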

FAQ

  • Q: Why do we need multiple nodegroups for the query workload? What's the difference between query0 and query1?

  • A: According to a GuanceDB dev, the separation between query0 and query1 makes sure a single query can't take up all the available resources. I'm guessing he meant this sort of scenario: when a large query reaches guance-select, guance-select picks one query nodegroup, and the query may use up all the resources there. For example, it can pick query0 and use all of its CPU; scopedb-query1 then makes sure there are still resources available for other queries.

  • Q: How do I find the number of partitions scanned by a query?

  • A: Use this DQL on the tracing page: service:~scopedb-query resource:driver.execute_statement. The partition count is shown in the span attributes.

  • Q: How do I find out how many partitions need to be scanned for a certain piece of data in a table?

  • A:

sql
EXPLAIN EXEC FROM t15.l
WHERE time = '2025-08-16T00:00:00Z'::timestamp;