Berkeley DB includes support for building highly available applications based on replication. Berkeley DB replication groups consist of some number of independently configured database environments. There is a single master database environment and one or more client database environments. Master environments support both database reads and writes; client environments support only database reads. If the master environment fails, applications may upgrade a client to be the new master. The database environments might be on separate computers, on separate hardware partitions in a non-uniform memory access (NUMA) system, or on separate disks in a single server. As always with Berkeley DB environments, any number of concurrent processes or threads may access a database environment. In the case of a master environment, any number of threads of control may read and write the environment, and in the case of a client environment, any number of threads of control may read the environment.
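The role rules described above (one master that accepts reads and writes, read-only clients, and upgrading a client after the master fails) can be sketched as a small self-contained model. This is only an illustration: `struct rep_group`, `site_can_write()`, `site_upgrade()`, and the other names are hypothetical and are not part of the Berkeley DB API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical role model; none of these names come from Berkeley DB. */
enum site_role { SITE_CLIENT, SITE_MASTER, SITE_DOWN };

#define MAX_SITES 8

struct rep_group {
    enum site_role role[MAX_SITES];
    size_t nsites;
};

/* Start with site 0 as the master and every other site as a client. */
void group_init(struct rep_group *g, size_t nsites)
{
    assert(nsites >= 1 && nsites <= MAX_SITES);
    g->nsites = nsites;
    for (size_t i = 0; i < nsites; i++)
        g->role[i] = (i == 0) ? SITE_MASTER : SITE_CLIENT;
}

/* Any live site, master or client, may serve reads. */
int site_can_read(const struct rep_group *g, size_t site)
{
    return site < g->nsites && g->role[site] != SITE_DOWN;
}

/* Only the master may accept writes. */
int site_can_write(const struct rep_group *g, size_t site)
{
    return site < g->nsites && g->role[site] == SITE_MASTER;
}

/* Return the index of the current master, or -1 if there is none. */
int group_master(const struct rep_group *g)
{
    for (size_t i = 0; i < g->nsites; i++)
        if (g->role[i] == SITE_MASTER)
            return (int)i;
    return -1;
}

/* Mark a site as failed. */
void site_fail(struct rep_group *g, size_t site)
{
    assert(site < g->nsites);
    g->role[site] = SITE_DOWN;
}

/*
 * Upgrade a live client to master. Returns 0 on success, or -1 if the
 * site is not a live client or the group already has a master.
 */
int site_upgrade(struct rep_group *g, size_t site)
{
    if (site >= g->nsites || g->role[site] != SITE_CLIENT)
        return -1;
    if (group_master(g) != -1)
        return -1;
    g->role[site] = SITE_MASTER;
    return 0;
}
```

In this model, an upgrade is refused while a master still exists, which mirrors the single-master rule; how a real application decides which client to upgrade is outside the scope of the sketch.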
Applications may be written to provide various degrees of consistency between the master and clients. The system can be run synchronously such that replicas are guaranteed to be up-to-date with all committed transactions, but doing so may incur a significant performance penalty. Higher performance solutions sacrifice total consistency, allowing the clients to be out of date for an application-controlled amount of time.
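One way to picture this trade-off is as the number of client acknowledgements the master waits for before a commit returns. The sketch below is a simplified model, not the library's implementation: the `ack_policy` enum and both functions are hypothetical names, and the quorum rule (enough clients that, together with the master, they form a majority of the group) is an assumption made for illustration.

```c
#include <assert.h>

/* Hypothetical policy names for illustration; not Berkeley DB constants. */
enum ack_policy { ACK_NONE, ACK_ONE, ACK_QUORUM, ACK_ALL };

/*
 * Number of client acknowledgements the master waits for before a
 * transaction commit returns to the application.
 */
unsigned acks_required(enum ack_policy policy, unsigned nclients)
{
    switch (policy) {
    case ACK_NONE:
        return 0;                   /* fastest; clients may lag */
    case ACK_ONE:
        return nclients > 0 ? 1 : 0;
    case ACK_QUORUM:
        /* Assumed rule: master plus acking clients form a majority. */
        return (nclients + 1) / 2;
    case ACK_ALL:
        return nclients;            /* fully synchronous; slowest */
    }
    assert(0 && "unknown ack policy");
    return nclients;
}

/* True once enough clients have acknowledged the commit. */
int commit_is_acknowledged(enum ack_policy policy, unsigned nclients,
    unsigned acks_received)
{
    return acks_received >= acks_required(policy, nclients);
}
```

Waiting for every client corresponds to the synchronous configuration described above, while the weaker policies trade consistency for lower commit latency.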
There are two ways to build replicated applications. The simpler way is to use the Berkeley DB Replication Manager. The Replication Manager provides a standard communications infrastructure, and it creates and manages the background threads needed for processing replication messages. (Note that in Replication Manager applications, all updates to databases at the master environment must be done through a single DB_ENV environment handle, though they may occur in multiple threads. This of course means that only a single process may update data.)
The Replication Manager implementation is based on TCP/IP sockets, and uses POSIX 1003.1-style networking and thread support. (On Windows systems, it uses standard Windows thread support.) As a result, it is not as portable as the rest of the Berkeley DB library.
The alternative is to use the lower-level replication "Base APIs". This approach affords more flexibility, but requires the application to provide critical components that the Replication Manager would otherwise supply, including the communications infrastructure and the threads that process replication messages.
(Note that Replication Manager does not provide wire security for replication messages.)
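To make the Base API division of labor more concrete, the following sketch models an application-supplied transport: the application registers a send callback, and the replication side calls it whenever it has a message for another site. Everything here is a hypothetical stand-in. The `send_fn` type, `struct rep_msg`, and the in-memory queue are simplified illustrations and do not match the callback signature that `DB_ENV->rep_set_transport()` expects; a real application would send the bytes over its own network layer and hand received messages to `DB_ENV->rep_process_message()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 16
#define MSG_MAX   64

/* Hypothetical message format; the real Base API uses its own types. */
struct rep_msg {
    int to_site;
    size_t len;
    unsigned char data[MSG_MAX];
};

/* Hypothetical callback type supplied by the application. */
typedef int (*send_fn)(void *app_ctx, int to_site,
    const void *data, size_t len);

/* In-memory FIFO standing in for the application's network layer. */
struct mem_transport {
    struct rep_msg queue[QUEUE_CAP];
    size_t head, count;
};

/* Application-side send: enqueue a copy of the message. */
int mem_send(void *app_ctx, int to_site, const void *data, size_t len)
{
    struct mem_transport *t = app_ctx;
    struct rep_msg *m;

    if (len > MSG_MAX || t->count == QUEUE_CAP)
        return -1;
    m = &t->queue[(t->head + t->count) % QUEUE_CAP];
    m->to_site = to_site;
    m->len = len;
    memcpy(m->data, data, len);
    t->count++;
    return 0;
}

/* Application-side receive: dequeue the oldest message, -1 if empty. */
int mem_recv(struct mem_transport *t, struct rep_msg *out)
{
    if (t->count == 0)
        return -1;
    *out = t->queue[t->head];
    t->head = (t->head + 1) % QUEUE_CAP;
    t->count--;
    return 0;
}

/* Replication side: remembers the callback the application registered. */
struct rep_site {
    int site_id;
    send_fn send;
    void *app_ctx;
};

void site_set_transport(struct rep_site *s, int site_id,
    send_fn send, void *app_ctx)
{
    assert(send != NULL);
    s->site_id = site_id;
    s->send = send;
    s->app_ctx = app_ctx;
}

/* Publish a record to another site through the registered callback. */
int site_publish(const struct rep_site *s, int to_site,
    const void *rec, size_t len)
{
    return s->send(s->app_ctx, to_site, rec, len);
}
```

Keeping the transport behind a callback is what lets a Base API application choose its own wire protocol; the replication logic never needs to know how the bytes travel. The `struct mem_transport` instance must be zero-initialized before use.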
The following pages present various programming considerations, many of which are directly relevant only for Base API applications. However, even when using Replication Manager it is important to understand the concepts.
Finally, the Berkeley DB replication implementation has one additional feature that increases application reliability. Replication in Berkeley DB performs database updates through a code path different from the standard one. This means that an operation that crashes the replication master because of a software bug will not necessarily crash the replication clients as well.
Replication Manager Methods | Description |
---|---|
DB_ENV->repmgr_add_remote_site() | Specify the Replication Manager's remote sites |
DB_ENV->repmgr_set_ack_policy() | Specify the Replication Manager's client acknowledgement policy |
DB_ENV->repmgr_set_local_site() | Specify the Replication Manager's local site |
DB_ENV->repmgr_site_list() | List the sites and their status |
DB_ENV->repmgr_start() | Start the Replication Manager |
DB_ENV->repmgr_stat() | Replication Manager statistics |
DB_ENV->repmgr_stat_print() | Print Replication Manager statistics |
Base API Methods | |
DB_ENV->rep_elect() | Hold a replication election |
DB_ENV->rep_process_message() | Process a replication message |
DB_ENV->rep_set_transport() | Configure replication transport callback |
DB_ENV->rep_start() | Start replication |
Additional Replication Methods | |
DB_ENV->rep_stat() | Replication statistics |
DB_ENV->rep_stat_print() | Print replication statistics |
DB_ENV->rep_sync() | Replication synchronization |
Replication Configuration | |
DB_ENV->rep_set_clockskew() | Configure master lease clock adjustment |
DB_ENV->rep_set_config() | Configure the replication subsystem |
DB_ENV->rep_set_limit() | Limit data sent in response to a single message |
DB_ENV->rep_set_nsites() | Configure replication group site count |
DB_ENV->rep_set_priority() | Configure replication site priority |
DB_ENV->rep_set_request() | Configure replication client retransmission requests |
DB_ENV->rep_set_timeout() | Configure replication timeouts |