ScopeDB related information
concepts
partition is a one-to-one relation mapped to a xxxx.celty on S3. a .celty file is a partition.
push down means pushing the compute responsibility of DQL functions down to scopedb. for example, the sum operator in DQL will be pushed down to scopedb as AGGREGATE SUM.
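To make the push-down idea concrete, here is a minimal Python sketch. It is not ScopeDB or DQL code: the toy partition data and the function names `sum_without_pushdown` and `sum_with_pushdown` are made up for illustration, and ScopeDB's real execution may differ. It only contrasts how many values reach the caller when the caller computes the sum itself versus when each partition returns one partial sum.

```python
# Hypothetical illustration of aggregate push-down; not ScopeDB or DQL code.
from typing import Dict, List, Tuple

# Assumed toy data: partition name -> values of one numeric column.
PARTITIONS: Dict[str, List[int]] = {
    "p0": [1, 2, 3],
    "p1": [10, 20],
    "p2": [100],
}


def sum_without_pushdown(partitions: Dict[str, List[int]]) -> Tuple[int, int]:
    """Fetch every value to the caller, then sum there.

    Returns (total, number of values transferred to the caller).
    """
    fetched = [v for rows in partitions.values() for v in rows]
    return sum(fetched), len(fetched)


def sum_with_pushdown(partitions: Dict[str, List[int]]) -> Tuple[int, int]:
    """Sum each partition on the storage side and return one partial sum each.

    Returns (total, number of values transferred to the caller).
    """
    partials = [sum(rows) for rows in partitions.values()]
    return sum(partials), len(partials)


if __name__ == "__main__":
    print(sum_without_pushdown(PARTITIONS))  # all raw values cross the boundary
    print(sum_with_pushdown(PARTITIONS))     # only one partial sum per partition
```

Both functions return the same total; the second one moves far less data, which is the point of pushing the aggregate down.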
components
scopedb-job
basically a cron job for scopedb. each job just executes a ScopeQL statement at a scheduled time.
most jobs are created by guance-select when a workspace is created
two common jobs:
1. deletion of data outside of retention
2. OPTIMIZE TABLE
OPTIMIZE TABLE basically tries to merge small partitions into partitions of an optimal size for querying. by default this job runs once every 15 minutes, but for some tables it takes almost 1 hour to execute. in those scenarios, some job instances will be skipped.
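The skipping behaviour can be sketched with a small Python simulation. This is not scopedb-job code: the function name `simulate` is hypothetical, and the model assumes that a scheduled instance is skipped whenever the previous instance is still running at its scheduled time. The real scheduler's policy may differ in detail.

```python
# Hypothetical model of a fixed-interval scheduler that skips overlapping runs.
from typing import List, Tuple


def simulate(interval_min: int, duration_min: int, window_min: int) -> Tuple[List[int], List[int]]:
    """Return (started, skipped) scheduled times in minutes within [0, window_min).

    Assumption: an instance is skipped if the previous one has not finished
    by the time the new one is scheduled.
    """
    started: List[int] = []
    skipped: List[int] = []
    busy_until = 0  # minute at which the currently running instance finishes
    for t in range(0, window_min, interval_min):
        if t >= busy_until:
            started.append(t)
            busy_until = t + duration_min
        else:
            skipped.append(t)
    return started, skipped


if __name__ == "__main__":
    # A 15-minute schedule with a run that takes 55 minutes, over 4 hours.
    started, skipped = simulate(interval_min=15, duration_min=55, window_min=240)
    print("started:", started)
    print("skipped:", skipped)
```

Under this model, a run of almost 1 hour on a 15-minute schedule means only one in four scheduled instances actually starts, so the table is effectively optimized about once an hour.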
scopedb-query-meta
- for guance-select to get table metadata during query. usually 4c8g is enough
scopedb-ingest-meta
- for guance-select to get table metadata during ingest. usually 4c8g is enough
scopedb-query
does the heavy lifting of querying data
scopedb-query itself doesn't route any queries internally. which query nodegroup a query goes to is decided solely by guance-select
sometimes it gets OOM-killed. we don't need to worry about it.
FAQ
Q: why do we have to have multiple nodegroups for query workload? What's the difference between query0 and query1?
A: According to a guancedb dev, the separation between query0 and query1 makes sure a single query can't take up all the available resources. I'm guessing he was implying this sort of scenario: when a large query comes to guance-select, guance-select picks one query nodegroup, and the query may then use up all the resources there. for example, it can pick query0 and use all the available CPU; scopedb-query1 then makes sure there are still resources available for other queries.
Q: how to find the number of partitions scanned by a query?
A: use this DQL in the tracing page:
service:~scopedb-query resource:driver.execute_statement
you should see it in the span attributes.
Q: how to find out the number of partitions that need to be scanned for a certain piece of data in a table?
A:
EXPLAIN EXEC FROM t15.l
WHERE time = '2025-08-16T00:00:00Z'::timestamp;