Fixpoint

2024-03-20

Auditing bitcoind for concurrent database objects: the call graph from hell

Filed under: Bitcoin, Data, JWRD, Networks, Software — Jacob Welsh @ 20:49

Where we left off, a stuck bitcoind instance had provided a fortuitous test case of the code's behavior under a deep reorganization, i.e. when one fork of the block chain overtakes another in the proof of work contest, and their divergence is not especially recent so there are many blocks and transactions to unwind and just as many new ones to process all at once. Basically, it's when there's been a successful "51% attack"; the network has been violently raped - whether deliberately or by incompetence doesn't especially matter - but still faces the choice to lie down and die or else to go on living.

This is no idle consideration, either, because such attacks have happened, repeatedly, with ongoing effects in censorship of legitimate transactions. The happenstance that very deep reorganizations have not manifested in the process is merely proof that the largest miners have conspired, acting as a cartel against the interests of Bitcoin. If sufficiently powerful interests defect from the soft consensus to claim the massive bounty, rough weather on the blockchain is almost a certainty while the battle plays out. So I, for one, wish my vessel to be seaworthy.

Various possible approaches came up for consideration, with the least invasive looking to be a turn away from BerkeleyDB's ever-troublesome fine-grained locking system in favor of the coarse-grained locks already in use for protecting memory structures throughout the codebase. This would also have the effect of reduced reliance on "black box" dependency code, with more responsibility moving to the C++ code of bitcoind itself. This would be a net win from my perspective, lacking in-depth knowledge of BDB internals and not quite seeing it worth the cost of acquisition at this point either.

To validate this change I proposed a simple locking discipline that was largely satisfied already and fit existing usage of C++ features. To restate: any thread doing something with a database object (any of class CDB, CTxDB, CAddrDB or CWalletDB) is to be mutually excluded from others doing the same for the lifetime of those objects, as well as any calls to the global dbenv object. So we have our roots: the concrete search terms by which to scour the whole codebase.

Further, by picking (overloading, perhaps) cs_main as the specific lock to use, we can disregard anything within the dynamic extent(i) of ProcessMessage, ProcessBlock, SendMessages, any RPC command, or AppInit2.

So let's walk the tree.

  • CTxDB users:
    • LoadBlockIndex is called only by AppInit2.
    • CheckDiskBlocks is called only by CTxDB::LoadBlockIndex, which is called by AppInit2 and LoadBlockIndex (above).
    • CWallet::ReacceptWalletTransactions is called by AppInit2 and RPC.
    • CWalletTx::RelayWalletTransaction() (i.e. the no-argument form) is called by CWallet::CommitTransaction which holds cs_main.
    • CWallet::ResendWalletTransactions is called only by ResendWalletTransactions, which is called only by SendMessages.
    • CWallet::CreateTransaction holds cs_main.
    • CTransaction::ReadFromDisk(COutPoint) is unused: remove it.
    • CMerkleTx::SetMerkleBranch(NULL) is called only by CWalletTx::AddSupportingTransactions, which is called only by CWallet::CreateTransaction. Worth noting that this means we have one CTxDB created within the lifetime of another, though both are read-only.
    • CTransaction::AcceptToMemoryPool(bool, bool*) is called by RPC and ProcessMessage. Not to be confused with...
    • CMerkleTx::AcceptToMemoryPool() is called only by CWallet::CommitTransaction (above).
    • CWalletTx::AcceptWalletTransaction() is unused: remove it.
    • InvalidChainFound is called only by CBlock::SetBestChain, which is called by CBlock::AddToBlockIndex (below) and CheckDiskBlocks, which is called only by CTxDB::LoadBlockIndex (above). This time, we have a read-write CTxDB created within the lifetime of another.
    • CBlock::AddToBlockIndex is called by LoadBlockIndex (above) and CBlock::AcceptBlock, which is called by ProcessBlock.
    • CreateNewBlock holds cs_main.
  • CAddrDB users:
    • AddAddress is called by AppInit2 and ProcessMessage.
    • AddressCurrentlyConnected is called only by ProcessMessage.
    • LoadAddresses is called only by AppInit2.
  • CWalletDB users:
    • WriteSetting is called by RPC and GenerateBitcoins, which is called by RPC and StartNode, which is a secondary thread: possible trouble.
    • GetAccountAddress is called only by RPC.
    • GetAccountBalance is called only by RPC.
    • CWallet::SetBestChain is called only by SetBestChain, which is called only by CBlock::SetBestChain (above).
    • CWallet::AddKey is called by RPC only.
    • CWallet::AddCryptedKey is called through virtual dispatch(ii) by CCryptoKeyStore::AddKey and CCryptoKeyStore::EncryptKeys.
      • CCryptoKeyStore::AddKey is called only by CWallet::LoadKey (which would seem to contradict the associated comment "Adds a key to the store, without saving it to disk"), which is called only by CWalletDB::LoadWallet, which is called only by CWallet::LoadWallet (below). What a tangle! And here we've got a read-write CWalletDB used within the lifetime of another.
      • CCryptoKeyStore::EncryptKeys is called only by CWallet::EncryptWallet (below).
    • CWallet::ChangeWalletPassphrase is called only by RPC.
    • CWallet::EncryptWallet is called only by RPC.
    • CWallet::EraseFromWallet is called only by EraseFromWallets, which is called only by CTransaction::AcceptToMemoryPool(CTxDB&, bool, bool*). That explodes into a larger pile but rather than expanding on it, simply observe that since it receives a CTxDB argument it must fit within what we've already covered on that branch.
    • CWalletTx::WriteToDisk is called by CWallet::WalletUpdateSpent, CWallet::AddToWallet, CWallet::ReacceptWalletTransactions (above), and CWallet::CommitTransaction (below) - which means a read-write CWalletDB inside a read-only CWalletDB.
      • CWallet::WalletUpdateSpent is called by CWallet::AddToWallet (below) and CWallet::AddToWalletIfInvolvingMe, which is called by CWallet::ScanForWalletTransactions and SyncWithWallets.
        • CWallet::ScanForWalletTransactions is called by RPC, CWallet::ReacceptWalletTransactions (above), and AppInit2.
        • SyncWithWallets is called by RPC, ProcessMessage, and CBlock::ConnectBlock, which is covered because it receives a CTxDB.
      • CWallet::AddToWallet is called by CWallet::AddToWalletIfInvolvingMe (above) and CWallet::CommitTransaction (below).
    • CWallet::CommitTransaction holds cs_main.
    • CWallet::LoadWallet is called only by AppInit2.
    • CWallet::SetAddressBookName is called by CWallet::AddToWallet (above), CWallet::LoadWallet (above), RPC, and GetAccountAddress, which is called only by RPC.
    • CWallet::DelAddressBookName is unused: remove it.
    • CWallet::SetDefaultKey is called by CWallet::AddToWallet (above) and CWallet::LoadWallet (above).
    • CWallet::NewKeyPool is called only by CWallet::EncryptWallet (above).
    • CWallet::TopUpKeyPool is called by RPC, CWallet::ReserveKeyFromKeyPool (below), and ThreadTopUpKeyPool: possible trouble.
    • CWallet::ReserveKeyFromKeyPool is called by CWallet::GetKeyFromPool, CWallet::GetOldestKeyPoolTime, and CReserveKey::GetReservedKey.
      • CWallet::GetKeyFromPool is called by RPC, GetAccountAddress, CWallet::AddToWallet, and CWallet::LoadWallet (all above).
      • CWallet::GetOldestKeyPoolTime is called only by RPC.
      • CReserveKey::GetReservedKey is called by CWallet::CreateTransaction (above) and CreateNewBlock, which is called by RPC and BitcoinMiner and holds cs_main but not at the point of the GetReservedKey call: possible trouble.
    • CWallet::KeepKey is called by CWallet::GetKeyFromPool (above) and CReserveKey::KeepKey, which is called by CWallet::CommitTransaction (above) and CheckWork, which holds cs_main.
  • CDB or dbenv users:
    • EnvShutdown is called by DBFlush(true) (below) and CDBInit::~CDBInit, which is presumably called by the runtime machinery on exit from the main thread or some such: possible trouble.
    • The static CDB::Rewrite is called by CWallet::EncryptWallet and CWallet::LoadWallet (both above).
    • DBFlush is called by Shutdown: possible trouble.
    • ThreadFlushWalletDB: possible trouble.
    • BackupWallet is called only by RPC.

I'd say it's safe to conclude that 1) the wallet code/labyrinth is becoming much more of a liability than an asset and 2) a number of non-obvious trouble spots came to light through the exhaustive approach. Indeed there are a few of them that the patches I wrote and tested so far don't cover. So I'm delayed again in putting out the fix, but barring any major surprises, it's still coming soon.

  1. This term comes at least from the Scheme literature; it's a bit like the static "scope" but extended to cover also anything called by a given function, i.e. everywhere that execution proceeds between entry to that function and exit from it, whether by normal return, exception or other non-local exit. [^]
  2. So much for the possibility of automated static call graph analysis. [^]

1 Comment »

  1. [...] diagnosing and measuring the problem, considering options, and mapping the minefield, all that remains is to Make It So. Given the state of the codebase involved, this too explodes [...]

    Pingback by Dropping BDB locking, bitcoind finally follows the Bitcoin protocol « Fixpoint — 2024-03-22 @ 05:34

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by MP-WP. Copyright Jacob Welsh.