Dropping BDB locking, bitcoind finally follows the Bitcoin protocol

2024-03-22

Dropping BDB locking, bitcoind finally follows the Bitcoin protocol

Filed under: Bitcoin, JWRD, Networks, Software — Jacob Welsh @ 05:30

After diagnosing and measuring the problem, considering options, and mapping the minefield, all that remains is to Make It So. Given the state of the codebase involved, this too explodes into more work than one might like⁽ⁱ⁾ - a whole five patches in this case, of which three got a second iteration after the writing and re-reading.

For the results with regards to the original problem, see the new debug log output and then let these thousand words say the rest:

That's right: a successful reorg 503 blocks deep, during which memory usage stayed almost completely flat and at no point exceeded 600 megabytes (base-2 megabytes, that is; a hair over 600 in base-10), where the "resting state" was previously somewhere around 2 gigabytes. Bring on the Pentium 3 nodes, baby!

Much of the reduction is simply from dropping the millions of lock/locker/object slots reserved in the database environment for BDB's locking system. The rest is likely from dropping the vResurrect and vDelete vectors where transactions were "queued" during the reorg for later mempool update. From the timings, we see that even with the "wasteful" second pass of rereading the block files instead of memorizing them, the mempool update as a whole took merely 13 seconds out of 2,777 or 0.5% of the total time.

Some lesser but still significant wins come along in this batch, too. First, it is now possible to "cleanly" shut down the node, i.e. using the "stop" command, terminal interrupt or termination signal, without it taking absolutely forever. FINALLY! I now rescind my prior advice to always use "kill -9", though it should still be safe to do so as far as database integrity is concerned (but please do remember to include the ~/.bitcoin/database/ directory - where the writeahead log goes - with any backups you take of blkindex.dat, and take them all from a point-in-time consistent state i.e. with the daemon stopped).

Second, the build process now completes without spewing a massive torrent of compiler warnings. FINALLY! This was done by a combination of suppressing the most prolific and less-than-helpful warnings, with code improvements to satisfy others, in some cases actually fixing a potential problem. For instance, the worst case of reckless stack buffer allocation is replaced with safe heap allocation.⁽ⁱⁱ⁾

I validated a subset of the warning-driven changes by checking that the build produced identical binaries before and after. It's hard to do this in general because any slight change in line numbering will throw it off due to the embedded debug messages; but at minimum, I confirmed the addition of braces with BOOST_FOREACH works as expected, i.e. the macro behaves syntactically the same as a "for" loop with or without the braces, which was quite difficult to tell from looking at its lengthy definition in the Boost headers.

The patch does not completely eliminate the warnings, but they're looking far more approachable for sure and it achieves the main goal which was that any new ones produced by a change don't go unnoticed.

Patch listing

Patch	Seals	Tree
bitcoin_reorg_tracing.vpatch⁽ⁱⁱⁱ⁾	jfw	Browse
bitcoin_db_shutdown_checkpoint_calming.vpatch^(iv)	jfw	Browse
bitcoin_drop_bdb_locking.vpatch^(v)	jfw	Browse
bitcoin_easy_warning_fixes.vpatch^(vi)	jfw	Browse
bitcoin_reorg_bounded_space.vpatch^(vii)	jfw	Browse

A sixth fetch-bitcoind release that wraps these up along with the recent keksum work is up in the canonical place. Besides the new patches, it updates the base URLs for my change of domain name although I've still kept the old ones working.

Then again, that would depend on the "one" in question, wouldn't it ? For one not allergic to work itself, perhaps it's exactly the right amount. How would you even know ? [^]
The C/C++ execution environment provides no guarantee that stack overflow is detected, and in particular, single stack frames much larger than the OS page size can sometimes be exploitable, especially in multi-threaded programs. I'm simplifying of course; look into "stack clashing" for the sad details. [^]
Add more useful log messages for tracking reorganize progress (as well as initial block connection). [^]
Remove various frantic, overdone database-related activities, mostly in the shutdown path where they were doing more harm than good. The goal is to make shutdown fast and somewhat easier to analyze. Checkpointing comes up because it's no longer necessarily done at shutdown. The conditions for periodic checkpoints are simplified; in particular, it's no longer done ~every time a db handle object goes out of scope. Given the use of DB_TXN_WRITE_NOSYNC this means update durability with respect to system crash is somewhat relaxed, but this seems fine given the nature of the application. The size limit on db transaction log files is raised 10x, restoring its earlier value. [^]
Turn off BDB's page-level locking subsystem, with some tweaks to firm up the already largely single-threaded pattern of database use. This allows removing the magic-number limits on locks, lockers and lock objects; the numbers were reported by past authority to be sufficient for any 1MB block, but have proven insufficient for large reorganizations, because BDB accumulates locks for the duration of a transaction. It also saves nearly 2GB of preallocations in the database environment (__db.00[1-5] files) to support the large limits. [^]
Quiet the vast majority of compiler warning spew so any actual trouble stands a chance of being spotted. Continue incremental conversion from CRITICAL_BLOCK to SCOPED_LOCK, as that and BOOST_FOREACH are the main sources of spurious -Wparentheses due to the macros' internal if-statements. Besides the more cosmetic changes, this cures a reckless stack buffer allocation (200k, when truncating the debug log) and a potential integer overflow (in string formatting code). Related string handling routines still need attention though. [^]
Remove linear heap allocation by Reorganize, at the "cost" of rereading the affected blocks from disk after database commit to update the mempool. As usual, even the smallest of fixes uncovers other landmines: resurrected transactions were not checked for minimum relay fee; TxnAbort was doubly invoked in the case of ConnectBlock failure; invalid blocks were not distinguished from db failure - and still aren't, but the warning text is finessed (again) so as not to lie outright. The mempool itself remains unbounded, though at least now the minimum fee upholds a cost for entering it. [^]

3 Comments »

Well, that was an incredible turn around. I’ll update my nodes soon.

Comment by whaack — 2024-03-23 @ 00:10
Cheers. It was maybe two weeks of work all told with some interruptions in between. Things can be fixed!

Comment by Jacob Welsh — 2024-03-23 @ 19:14
[...] even 4G RAM is likely overkill after my last round of work but yeah the storage is what it is jfw: http://jfxpt.com/2024/jwrd-logs-for-Jun-2024/#11816 - [...]

Pingback by Blockchain kindergarten: why proof of work? « Fixpoint — 2024-06-21 @ 02:25

RSS feed for comments on this post. TrackBack URL

Fixpoint

2024-03-22