Continued from:
Command implementations
The core scanning logic is in a helper function that takes a block's height and a memory view of its contents.
Referential integrity between blocks is ensured by scanning sequentially by height; that is, all relevant tx and output records from prior blocks will be known by the time we see the inputs that spend them. However, as far as I know this topological ordering is not guaranteed for the transaction sequence within a block (e.g. tx 1 could spend outputs of tx 2, or vice versa), so we do separate passes over the transaction list for outputs and inputs.
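As a toy illustration of why the two passes help (not the wallet's actual code): suppose a block lists a spending transaction before the transaction whose output it spends. A single interleaved pass would miss the spend; indexing all outputs first, then resolving all inputs, handles either ordering.

```python
# Toy block: tx 'b' appears first but spends an output of tx 'a'.
# Each entry is (txid, inputs as (prev_txid, prev_n), output scripts).
txs = [
    ('b', [('a', 0)], ['out_b0']),
    ('a', [], ['out_a0']),
]

known_outputs = {}   # (txid, n) -> script
spent = []

# Pass 1: index every output in the block.
for txid, txins, txouts in txs:
    for n, script in enumerate(txouts):
        known_outputs[(txid, n)] = script

# Pass 2: every intra-block prevout can now be resolved.
for txid, txins, txouts in txs:
    for prev_txid, prev_n in txins:
        if (prev_txid, prev_n) in known_outputs:
            spent.append((prev_txid, prev_n))

print(spent)  # [('a', 0)]
```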
def scan_block(height, v):
    stdout.write('block %s' % height)
    # [perf] computing every tx hash
    (blkhash, prev, time, target, txs), size = load_block(v)
The performance comment above was just to note some not-strictly-necessary work being done, in case the scan ended up horribly slow.(i)
An output is relevant if its script is standard and pays a known address. At least with foreign key constraints enabled, we can't insert an output until the tx record it references exists, but we don't know whether to insert the tx until we see if any of its outputs are relevant, so we again use a two-pass approach.
    count_out = 0
    n_tx = 0
    for (hash, size, txins, txouts) in txs:
        matched_outs = []
        for n, txout in enumerate(txouts):
            val, script = txout
            a = out_script_address(script)
            if a is not None:
                #print format_address(a)
                addr_id = get_address_id(a)
                if addr_id is not None:
                    matched_outs.append((n, addr_id, val))
        if len(matched_outs) > 0:
            tx_id = insert_or_get_tx_id(hash, blkhash, height, n_tx, size)
            for n, addr_id, val in matched_outs:
                insert_output(tx_id, n, addr_id, val)
            count_out += len(matched_outs)
        n_tx += 1
    stdout.write(' new-outs %s' % count_out)
An input is relevant if it spends a known output. Recall that insert_input
updates the corresponding output to create the back-reference, indicating it has been spent.
    # Inputs scanned second in case an output from the same block is spent.
    # Coinbase (input of first tx in block) doesn't reference anything.
    count_in = 0
    n_tx = 1
    for (hash, size, txins, txouts) in txs[1:]:
        matched_ins = []
        for n, txin in enumerate(txins):
            prevout_hash, prevout_n, scriptsig = txin
            prevout_tx_id = get_tx_id(prevout_hash)
            if prevout_tx_id is not None:
                prevout_id = get_output_id(prevout_tx_id, prevout_n)
                if prevout_id is not None:
                    matched_ins.append((n, prevout_id))
        if len(matched_ins) > 0:
            tx_id = insert_or_get_tx_id(hash, blkhash, height, n_tx, size)
            for n, prevout_id in matched_ins:
                insert_input(tx_id, n, prevout_id)
            count_in += len(matched_ins)
        n_tx += 1
    stdout.write(' spent-outs %s\n' % count_in)
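The back-reference created by insert_input can be sketched in isolation with the standard sqlite3 module. Table and column names here are illustrative assumptions, not the wallet's actual schema: the point is only that the input row is inserted first, then the spent output is updated to point back at it.

```python
# Hypothetical sketch of the insert_input back-reference idea; the
# 'output'/'input' tables and 'spent' column are assumed for illustration.
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript('''
    CREATE TABLE output (output_id INTEGER PRIMARY KEY, value INTEGER, spent INTEGER);
    CREATE TABLE input  (input_id  INTEGER PRIMARY KEY, tx_id INTEGER, n INTEGER);
''')
db.execute('INSERT INTO output (value, spent) VALUES (50, NULL)')

def insert_input(tx_id, n, prevout_id):
    cur = db.execute('INSERT INTO input (tx_id, n) VALUES (?, ?)', (tx_id, n))
    # Back-reference: mark the prior output as spent by this new input row.
    db.execute('UPDATE output SET spent = ? WHERE output_id = ?',
               (cur.lastrowid, prevout_id))

insert_input(tx_id=1, n=0, prevout_id=1)
print(db.execute('SELECT spent FROM output WHERE output_id = 1').fetchone()[0])  # 1
```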
Assorted helpers: handling usage errors; looking up a tag ID that must exist.
def die(msg, help=False):
stderr.write('gbw-node: %s\n' % msg)
if help:
cmd_help([])
exit(-1)
def require_tag(name):
i = get_tag_id(name)
if i is None:
die('tag not found: %r' % name)
return i
The entry point for any user command "X" is the function "cmd_X", having help text in its docstring and taking a list of any supplied CLI arguments past the command name.
First, the sync commands. The scan process commits one database transaction per block.
def cmd_scan(argv):
    '''
    scan

    Iterate blocks from bitcoind, indexing transaction inputs and outputs
    affecting watched addresses. May be safely interrupted and resumed.

    NOT PRESENTLY SAFE TO RUN CONCURRENT INSTANCES due to the dumpblock to
    named pipe kludge.
    '''
    db.execute('PRAGMA synchronous=NORMAL')
    height = db.execute('SELECT scan_height FROM state').fetchone()[0]
    blockcount = max(-1, rpc('getblockcount') - CONFIRMS)
    while height < blockcount:
        height += 1
        scan_block(height, memoryview(getblock(height)))
        db.execute('UPDATE state SET scan_height = ?', (height,))
        db.commit()

def cmd_reset(argv):
    '''
    reset

    Reset the scan pointer so the next scan will proceed from the genesis
    block, to find transactions associated with newly watched addresses.
    '''
    db.execute('UPDATE state SET scan_height = -1')
    db.commit()
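The commit-per-block pattern is what makes interruption safe: the scan pointer only advances in the same transaction that lands the block's rows, so a resumed scan replays from the last committed block. A minimal self-contained sketch of just that pattern (the scan_block stub and table names are assumptions for illustration):

```python
# Sketch of the resumable scan-pointer pattern: pointer and indexed rows
# commit together, one transaction per block.
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE state (scan_height INTEGER)')
db.execute('INSERT INTO state VALUES (-1)')
db.commit()

def scan_block(height):
    pass  # a real implementation would index the block's outputs and inputs

def scan(blockcount):
    height = db.execute('SELECT scan_height FROM state').fetchone()[0]
    while height < blockcount:
        height += 1
        scan_block(height)
        db.execute('UPDATE state SET scan_height = ?', (height,))
        db.commit()  # block fully indexed; pointer and rows land atomically

scan(2)
print(db.execute('SELECT scan_height FROM state').fetchone()[0])  # 2
scan(5)  # resuming picks up at height 3, no blocks reprocessed
print(db.execute('SELECT scan_height FROM state').fetchone()[0])  # 5
```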
Next, commands to query the watched address sets (not in the original spec but trivial and clearly useful).
def cmd_tags(argv):
    '''
    tags

    List all tag names.
    '''
    for name, in db.execute('SELECT name FROM tag'):
        stdout.write(name + '\n')

def cmd_addresses(argv):
    '''
    addresses [TAG]

    List addresses with the given TAG (or all watched addresses).
    '''
    if len(argv) > 0:
        tag_id = require_tag(argv.pop(0))
        r = db.execute('SELECT address FROM address \
                JOIN address_tag ON address.address_id=address_tag.address_id \
                WHERE tag_id=?', (tag_id,))
    else:
        r = db.execute('SELECT address FROM address')
    for a, in r:
        stdout.write(format_address(str(a)) + '\n')
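The tag filter is a plain many-to-many join through the address_tag link table. A runnable sketch of that query shape, with illustrative schema and data (not necessarily the wallet's exact DDL):

```python
# Demonstrates the address/tag join behind "addresses TAG" on a toy schema.
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript('''
    CREATE TABLE address (address_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address_tag (address_id INTEGER, tag_id INTEGER);
''')
db.executemany('INSERT INTO address VALUES (?, ?)',
               [(1, 'addr-cold'), (2, 'addr-hot')])
db.execute('INSERT INTO tag VALUES (?, ?)', (1, 'cold'))
db.execute('INSERT INTO address_tag VALUES (?, ?)', (1, 1))

# Only addresses linked to the given tag come back.
rows = db.execute('SELECT address FROM address '
                  'JOIN address_tag ON address.address_id=address_tag.address_id '
                  'WHERE tag_id=?', (1,)).fetchall()
print([a for a, in rows])  # ['addr-cold']
```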
To be continued.
i. I've found the Python profiler quite useful so far compared to such guesswork; still, optimization is something of a balance between experimentally-driven efforts and not doing obviously wasteful things from the start.