Small. Fast. Reliable.
Choose any three.
The DBSTAT Virtual Table

1. Overview

The DBSTAT virtual tables is a read-only eponymous virtual table that returns information about which pages of the database files are used by which tables and indexes in the schema. The DBSTAT virtual table is used to implement sqlite3_analyzer.exe utility program, and to help compute the table size pie-chart in the Fossil-implemented version control system for SQLite.

The dbstat virtual table is available on all database connections when SQLite is built using the SQLITE_ENABLE_DBSTAT_VTAB compile-time option. The dbstat virtual table provides low-level information about btree and overflow pages in a database file.

The dbstat virtual table is an eponymous virtual table, meaning that is not necessary to run CREATE VIRTUAL TABLE to create an instance of the dbstat virtual table before using it. The "dbstat" module name can be used as if it were a table name to query the dbstat virtual table directly. For example:

SELECT * FROM dbstat;

If a named virtual table that uses the dbstat module is desired, then the recommended way to create an instance of the dbstat virtual table is as follows:

CREATE VIRTUAL TABLE temp.stat USING dbstat(main);

Note the "temp." qualifier before the virtual table name ("stat"). This qualifier causes the virtual table to be temporary - to only exist for the duration of the current database connection. This is the recommended approach.

The "main" argument to dbstat is default schema for which information is to be provided. The default is "main", and so the use of "main" in the example above is redundant. For any particular query, the schema can be changed by specifying the alternative schema as a function argument to the virtual table name in the FROM clause of the query. (See further discussion of table-valued functions in the FROM clause for more details.)

The schema for the dbstat virtual table is this:

CREATE TABLE dbstat(
  name       TEXT,        -- Name of table or index
  path       TEXT,        -- Path to page from root
  pageno     INTEGER,     -- Page number
  pagetype   TEXT,        -- 'internal', 'leaf' or 'overflow'
  ncell      INTEGER,     -- Cells on page (0 for overflow)
  payload    INTEGER,     -- Bytes of payload on this page
  unused     INTEGER,     -- Bytes of unused space on this page
  mx_payload INTEGER,     -- Largest payload size of all cells on this page
  pgoffset   INTEGER,     -- Offset of page in file
  pgsize     INTEGER,     -- Size of the page
  schema     TEXT HIDDEN  -- Database schema being analyzed
);

There is a single row of the dbstat table for each page in the database file. Freelist pages, the lock page, and pointer-map pages of the database file do not appear in the dbstat virtual table.

2. The "path" column of the dbstat virtual table

The "path" column describes the path taken from the root node of the btree structure to each page. The "path" of the root node itself is '/'. The "path" for the left-most child page of the root of a btree page is '/000/'. (Btrees store content ordered from left to right so the pages to the left have smaller keys than the pages to the right.) The next to left-most child of the root page is '/001', and so on, each sibling page identified by a 3-digit hex value. The children of the 451st left-most sibling have paths such as '/1c2/000/, '/1c2/001/' etc. Overflow pages are specified by appending a '+' character and a six-digit hexadecimal value to the path to the cell they are linked from. For example, the three overflow pages in a chain linked from the left-most cell of the 450th child of the root page are identified by the paths:

'/1c2/000+000000'         // First page in overflow chain
'/1c2/000+000001'         // Second page in overflow chain
'/1c2/000+000002'         // Third page in overflow chain

If the paths are sorted using the BINARY collation sequence, then the overflow pages associated with a cell will appear earlier in the sort-order than its child page:

'/1c2/000/'               // Left-most child of 451st child of root

3. Example uses of the dbstat virtual table

To find the total number of pages used to store table "xyz" in schema "aux1", use:

SELECT count(*) FROM dbstat('aux1') WHERE name='xyz';

To see how efficiently the content of a table is stored on disk, compute the amount of space used to hold actual content divided by the total amount of disk space used. The closer this number is to 100%, the more efficient the packing. (In this example, the 'xyz' table is assumed to be in the 'main' schema.)

SELECT sum(pgsize-unused)/sum(pgsize) FROM dbstat WHERE name='xyz';

To find the average fan-out for a table, run:

SELECT avg(ncell) FROM dbstat WHERE name='xyz' AND pagetype='internal';

Modern filesystems operate faster when disk accesses are sequential. Hence, SQLite will run faster if the content of the database file is on sequential pages. To find out what fraction of the pages in a database are sequential (and thus obtain a measurement that might be useful in determining when to VACUUM), run a query like the following:

CREATE TEMP TABLE s(rowid INTEGER PRIMARY KEY, pageno INT);
INSERT INTO s(pageno) SELECT pageno FROM dbstat ORDER BY path;
SELECT sum(s1.pageno+1==s2.pageno)*1.0/count(*)
  FROM s AS s1, s AS s2
 WHERE s1.rowid+1=s2.rowid;
DROP TABLE s;