.. toctree::
    :maxdepth: 1
    :hidden:
    :titlesonly:

    moving-a-cluster

This document covers the administration of an existing FoundationDB cluster. We recommend you read this document before setting up a cluster for performance testing or production use.
.. note:: In FoundationDB, a "cluster" refers to one or more FoundationDB processes spread across one or more physical machines that together host a FoundationDB database.
To administer an externally accessible cluster, you need to understand basic system tasks. You should begin with how to :ref:`start and stop the database <administration-running-foundationdb>`. Next, you should review management of a cluster, including :ref:`adding <adding-machines-to-a-cluster>` and :ref:`removing <removing-machines-from-a-cluster>` machines, and monitoring :ref:`cluster status <administration-monitoring-cluster-status>` and the basic :ref:`server processes <administration_fdbmonitor>`. You should be familiar with :ref:`managing trace files <administration-managing-trace-files>` and :ref:`other administrative concerns <administration-other-administrative-concerns>`. Finally, you should know how to :ref:`uninstall <administration-removing>` or :ref:`upgrade <upgrading-foundationdb>` the database.
After installation, FoundationDB is set to start automatically. You can manually start and stop the database with the commands shown below.
These commands start and stop the master ``fdbmonitor`` process, which in turn starts ``fdbserver`` and ``backup-agent`` processes. See :ref:`administration_fdbmonitor` for details.
On Linux, FoundationDB is started and stopped using the ``service`` command as follows::

    user@host$ sudo service foundationdb start
    user@host$ sudo service foundationdb stop

On Ubuntu, it can be prevented from starting at boot as follows (without stopping the service)::

    user@host$ sudo update-rc.d foundationdb disable

On RHEL/CentOS, it can be prevented from starting at boot as follows (without stopping the service)::

    user@host$ sudo chkconfig foundationdb off

On macOS, FoundationDB is started and stopped using ``launchctl`` as follows::

    host:~ user$ sudo launchctl load /Library/LaunchDaemons/com.foundationdb.fdbmonitor.plist
    host:~ user$ sudo launchctl unload /Library/LaunchDaemons/com.foundationdb.fdbmonitor.plist

It can be stopped and prevented from starting at boot as follows::

    host:~ user$ sudo launchctl unload -w /Library/LaunchDaemons/com.foundationdb.fdbmonitor.plist

FoundationDB servers and clients use a cluster file (usually named ``fdb.cluster``) to connect to a cluster. The contents of the cluster file are the same for all processes that connect to the cluster. An ``fdb.cluster`` file is created automatically when you install a FoundationDB server and updated automatically when you :ref:`change coordination servers <configuration-choosing-coordination-servers>`. To connect to a cluster from a client machine, you will need access to a copy of the cluster file used by the servers in the cluster. Typically, you will copy the ``fdb.cluster`` file from the :ref:`default location <default-cluster-file>` on a FoundationDB server to the default location on each client.
.. warning:: This file should not normally be modified manually. To change coordination servers, see :ref:`configuration-choosing-coordination-servers`.
When you initially install FoundationDB, a default ``fdb.cluster`` file will be placed at a system-dependent location:

- Linux: ``/etc/foundationdb/fdb.cluster``
- macOS: ``/usr/local/etc/foundationdb/fdb.cluster``

All FoundationDB components can be configured to use a specified cluster file:

- The ``fdbcli`` tool allows a cluster file to be passed on the command line using the ``-C`` option.
- The :doc:`client APIs <api-reference>` allow a cluster file to be passed when connecting to a cluster, usually via ``open()`` or ``create_cluster()``.
- A FoundationDB server or ``backup-agent`` allows a cluster file to be specified in :ref:`foundationdb.conf <foundationdb-conf>`.

In addition, FoundationDB allows you to use the environment variable ``FDB_CLUSTER_FILE`` to specify a cluster file. This approach is helpful if you operate or access more than one cluster.
All FoundationDB components will determine a cluster file in the following order:

1. An explicitly provided file, whether a command line argument using ``-C`` or an argument to an API function, if one is given;
2. The value of the ``FDB_CLUSTER_FILE`` environment variable, if it has been set;
3. An ``fdb.cluster`` file in the current working directory, if one is present;
4. The :ref:`default file <default-cluster-file>` at its system-dependent location.

This automatic determination of a cluster file makes it easy to write code using FoundationDB without knowing exactly where it will be installed or what database it will need to connect to.
.. warning:: A cluster file must have the :ref:`required permissions <cluster_file_permissions>` in order to be used.

.. warning:: If ``FDB_CLUSTER_FILE`` is read and has been set to an invalid value (such as an empty value, a file that does not exist, or a file that is not a valid cluster file), an error will result. FoundationDB will not fall back to another file.

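The lookup order and the no-fallback rule described above can be sketched in a few lines of Python. This is only an illustration of the documented rules, not FoundationDB's own implementation: the function name ``resolve_cluster_file`` and its parameters are hypothetical, the default path is the Linux location listed earlier, and the sketch checks only that the file exists, not that its contents are valid.

```python
import os
from typing import Callable, Mapping, Optional

# Default location on Linux, as listed in this document.
DEFAULT_CLUSTER_FILE = "/etc/foundationdb/fdb.cluster"


def resolve_cluster_file(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[str] = None,
    default: str = DEFAULT_CLUSTER_FILE,
    exists: Callable[[str], bool] = os.path.isfile,
) -> str:
    """Return the cluster file path chosen by the documented lookup order.

    Hypothetical helper for illustration; `exists` is injectable so the
    logic can be exercised without touching the real filesystem.
    """
    env = os.environ if env is None else env
    cwd = os.getcwd() if cwd is None else cwd

    # 1. An explicitly provided file (``-C`` or an API argument) always wins.
    if explicit is not None:
        return explicit

    # 2. FDB_CLUSTER_FILE, if set. An invalid value is an error: there is
    #    no fallback to the later steps.
    if "FDB_CLUSTER_FILE" in env:
        path = env["FDB_CLUSTER_FILE"]
        if not path or not exists(path):
            raise ValueError(f"FDB_CLUSTER_FILE is set but unusable: {path!r}")
        return path

    # 3. An fdb.cluster file in the current working directory, if present.
    local = os.path.join(cwd, "fdb.cluster")
    if exists(local):
        return local

    # 4. The default file at its system-dependent location.
    return default
```

Passing ``env`` and ``exists`` explicitly keeps the sketch easy to check in isolation; in real use both would default to the process environment and the filesystem.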
FoundationDB servers and clients require read and write access to the cluster file and its parent directory. This is because certain administrative changes to the cluster configuration (see :ref:`configuration-choosing-coordination-servers`) can cause this file to be automatically modified by all servers and clients using the cluster. If a FoundationDB process cannot update the cluster file, it may eventually become unable to connect to the cluster.
The cluster file contains a connection string consisting of a cluster identifier and a comma-separated list of IP addresses (not hostnames) specifying the coordination servers. The format for the file is::

    description:ID@IP:PORT,IP:PORT,...

Together the ``description`` and the ``ID`` should uniquely identify a FoundationDB cluster.

A cluster file may contain comments, marked by the ``#`` character. All characters on a line after the first occurrence of a ``#`` will be ignored.
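As an illustration of the format and comment rule above, the following Python sketch splits a cluster file into its parts. It is a read-only helper for inspection, not an official parser: the names ``ConnectionString`` and ``parse_cluster_file`` are hypothetical, and the sketch assumes exactly one non-comment line and plain ``IP:PORT`` addresses as described here.

```python
from typing import List, NamedTuple, Tuple


class ConnectionString(NamedTuple):
    description: str
    cluster_id: str
    coordinators: List[Tuple[str, int]]


def parse_cluster_file(text: str) -> ConnectionString:
    """Parse ``description:ID@IP:PORT,IP:PORT,...`` (hypothetical helper)."""
    # Everything after the first '#' on a line is a comment.
    lines = [line.split("#", 1)[0].strip() for line in text.splitlines()]
    lines = [line for line in lines if line]
    # Assumption: the connection string occupies a single line.
    if len(lines) != 1:
        raise ValueError("expected exactly one connection string")

    prefix, at, addresses = lines[0].partition("@")
    description, colon, cluster_id = prefix.partition(":")
    if not (at and colon and description and cluster_id):
        raise ValueError("expected description:ID@IP:PORT,...")

    coordinators = []
    for address in addresses.split(","):
        parts = address.split(":")
        if len(parts) != 2 or not parts[0] or not parts[1].isdigit():
            raise ValueError(f"expected IP:PORT, got {address!r}")
        coordinators.append((parts[0], int(parts[1])))
    return ConnectionString(description, cluster_id, coordinators)
```

A helper like this should only ever read the file; as noted below, the file itself should not normally be edited by hand.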
Generally, a cluster file should not be modified manually. Incorrect modifications after a cluster is created could result in data loss. To change the set of coordination servers used by a cluster, see :ref:`configuration-choosing-coordination-servers`. To change the cluster ``description``, see :ref:`configuration-setting-cluster-description`.
It is very important that each cluster use a unique random ID. If multiple processes use the same database description and ID but different sets of coordination servers, data corruption could result.
Any client connected to FoundationDB can access information about its cluster file directly from the database:

- To get the path to the cluster file, read the key ``\xFF\xFF/cluster_file_path``.
- To get the contents of the cluster file, read the key ``\xFF\xFF/connection_string``.

You can add new machines to a cluster at any time:
1. :doc:`Install FoundationDB <getting-started-linux>` on the new machine.

2. Copy an :ref:`existing cluster file <specifying-a-cluster-file>` from a server in your cluster to the new machine, overwriting the existing ``fdb.cluster`` file.

3. Restart FoundationDB on the new machine so that it uses the new cluster file::

       user@host2$ sudo service foundationdb restart

4. If you have previously :ref:`excluded <removing-machines-from-a-cluster>` a machine from the cluster, you will need to take it off the exclusion list using the ``include <ip>`` command of ``fdbcli`` before it can be a full participant in the cluster.

.. note:: Addresses have the form ``IP``:``PORT``. This form is used even if TLS is enabled.

To temporarily or permanently remove one or more machines from a FoundationDB cluster without compromising fault tolerance or availability, perform the following steps:
1. Make sure that your current redundancy mode will still make sense after removing the machines you want to remove. For example, if you are currently using ``triple`` redundancy and are reducing the number of servers to fewer than five, you should probably switch to a lower redundancy mode first. See :ref:`configuration-choosing-redundancy-mode`.

2. If any of the machines that you would like to remove is a coordinator, you should :ref:`change coordination servers <configuration-changing-coordination-servers>` to a set of servers that you will not be removing. Remember that even after changing coordinators, the old coordinators need to remain available until all servers and clients of the cluster have automatically updated their cluster files.

3. Use the ``exclude`` command in ``fdbcli`` on the machines you plan to remove::

       user@host1$ fdbcli
       Using cluster file `/etc/foundationdb/fdb.cluster'.

       The database is available.

       Welcome to the fdbcli. For help, type `help'.
       fdb> exclude 1.2.3.4 1.2.3.5 1.2.3.6
       Waiting for state to be removed from all excluded servers. This may take a while.
       It is now safe to remove these machines or processes from the cluster.

   ``exclude`` can be used to exclude either machines (by specifying an IP address) or individual processes (by specifying an ``IP``:``PORT`` pair).

   .. note:: Addresses have the form ``IP``:``PORT``. This form is used even if TLS is enabled.

   Excluding a server doesn't shut it down immediately; data on the machine is first moved away. When the ``exclude`` command completes successfully (by returning control to the command prompt), the machines that you specified are no longer required to maintain the configured redundancy mode. A large amount of data might need to be transferred first, so be patient. When the process is complete, the excluded machine or process can be shut down without fault tolerance or availability consequences.

   If you interrupt the exclude command with Ctrl-C after seeing the "waiting for state to be removed" message, the exclusion work will continue in the background. Repeating the command will continue waiting for the exclusion to complete. To reverse the effect of the ``exclude`` command, use the ``include`` command.

4. On each removed machine, stop the FoundationDB server and prevent it from starting at the next boot. Follow the :ref:`instructions for your platform <administration-running-foundationdb>`. For example, on Ubuntu::

       user@host3$ sudo service foundationdb stop
       user@host3$ sudo update-rc.d foundationdb disable

5. :ref:`test-the-database` to double check that everything went smoothly, paying particular attention to the replication health.

6. You can optionally :ref:`uninstall <administration-removing>` the FoundationDB server package entirely and/or delete database files on removed servers.

If you ever want to add a removed machine back to the cluster, you will have to take it off the excluded servers list to which it was added in step 3. This can be done using the ``include`` command of ``fdbcli``. Typing ``exclude`` with no parameters will tell you the current list of excluded machines.

The procedures for adding and removing machines can be combined into a recipe for :doc:`moving an existing cluster to new machines <moving-a-cluster>`.
A FoundationDB cluster has the option of supporting :doc:`Transport Layer Security (TLS) <tls>`. To enable TLS on an existing, non-TLS cluster, see :ref:`Converting a running cluster <converting-existing-cluster>`.
Use the ``status`` command of ``fdbcli`` to determine if the cluster is up and running::

    user@host$ fdbcli
    Using cluster file `/etc/foundationdb/fdb.cluster'.

    The database is available.

    Welcome to the fdbcli. For help, type `help'.
    fdb> status

    Configuration:
      Redundancy mode        - triple
      Storage engine         - ssd-2
      Coordinators           - 5
      Desired Proxies        - 5
      Desired Logs           - 8

    Cluster:
      FoundationDB processes - 272
      Machines               - 16
      Memory availability    - 14.5 GB per process on machine with least available
      Retransmissions rate   - 20 Hz
      Fault Tolerance        - 2 machines
      Server time            - 03/19/18 08:51:52

    Data:
      Replication health     - Healthy
      Moving data            - 0.000 GB
      Sum of key-value sizes - 3.298 TB
      Disk space used        - 15.243 TB

    Operating space:
      Storage server         - 1656.2 GB free on most full server
      Log server             - 1794.7 GB free on most full server

    Workload:
      Read rate              - 55990 Hz
      Write rate             - 14946 Hz
      Transactions started   - 6321 Hz
      Transactions committed - 1132 Hz
      Conflict rate          - 0 Hz

    Backup and DR:
      Running backups        - 1
      Running DRs            - 1 as primary

    Client time: 03/19/18 08:51:51

The summary fields are interpreted as follows:

====================== ==========================================================================================================
Redundancy mode        The currently configured redundancy mode (see the section :ref:`configuration-choosing-redundancy-mode`)
Storage engine         The currently configured storage engine (see the section :ref:`configuration-configuring-storage-subsystem`)
Coordinators           The number of FoundationDB coordination servers
Desired Proxies        Number of proxies desired. If replication mode is 3 then default number of proxies is 3
Desired Logs           Number of logs desired. If replication mode is 3 then default number of logs is 3
FoundationDB processes Number of FoundationDB processes participating in the cluster
Machines               Number of physical machines running at least one FoundationDB process that is participating in the cluster
Memory availability    RAM per process on machine with least available (see details below)
Retransmissions rate   Ratio of retransmitted packets to the total number of packets.
Fault Tolerance        Maximum number of machines that can fail without losing data or availability (number for losing data will be reported separately if lower)
Server time            Timestamp from the server
Replication health     A qualitative estimate of the health of data replication
Moving data            Amount of data currently in movement between machines
Sum of key-value sizes Estimated total size of keys and values stored (not including any overhead or replication)
Disk space used        Overall disk space used by the cluster
Storage server         Free space for storage on the server with least available. For ``ssd`` storage engine, includes only disk; for ``memory`` storage engine, includes both RAM and disk.
Log server             Free space for log server on the server with least available.
Read rate              The current number of reads per second
Write rate             The current number of writes per second
Transactions started   The current number of transactions started per second
Transactions committed The current number of transactions committed per second
Conflict rate          The current number of conflicts per second
Running backups        Number of backups currently running. Different backups could be backing up to different prefixes and/or to different targets.
Running DRs            Number of DRs currently running. Different DRs could be streaming different prefixes and/or to different DR clusters.
====================== ==========================================================================================================

The "Memory availability" is a conservative estimate of the minimal RAM available to any ``fdbserver`` process across all machines in the cluster. This value is calculated in two steps. Memory available per process is first calculated for each machine by taking::

    availability = ((total - committed) + sum(processSize)) / processes

where:

=============== ======================================================
``total``       total RAM on the machine
``committed``   committed RAM on the machine
``processSize`` total physical memory used by a given fdbserver process
``processes``   number of fdbserver processes on the machine
=============== ======================================================

The reported value is then the minimum of memory available per process over all machines in the cluster. If this value is below 4.0 GB, a warning message is added to the status report.
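The two-step calculation above can be worked through with a short Python sketch. The ``Machine`` record, the function names, and the sample figures are all hypothetical and chosen for illustration only; the formula and the 4.0 GB warning threshold are the ones stated above.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Threshold below which the status report adds a warning (from the text).
WARNING_THRESHOLD_GB = 4.0


@dataclass
class Machine:
    total_gb: float                # total RAM on the machine
    committed_gb: float            # committed RAM on the machine
    process_sizes_gb: List[float]  # physical memory used by each fdbserver


def per_process_availability(machine: Machine) -> float:
    """Step 1: availability = ((total - committed) + sum(processSize)) / processes."""
    processes = len(machine.process_sizes_gb)
    if processes == 0:
        raise ValueError("machine runs no fdbserver processes")
    free = machine.total_gb - machine.committed_gb
    return (free + sum(machine.process_sizes_gb)) / processes


def memory_availability(machines: Iterable[Machine]) -> float:
    """Step 2: the minimum per-process availability over all machines."""
    return min(per_process_availability(m) for m in machines)


# Made-up sample cluster with two machines.
cluster = [
    Machine(total_gb=64.0, committed_gb=40.0, process_sizes_gb=[4.0] * 4),
    Machine(total_gb=32.0, committed_gb=29.0, process_sizes_gb=[1.5, 1.5]),
]
value = memory_availability(cluster)
if value < WARNING_THRESHOLD_GB:
    print(f"warning: only {value:.1f} GB available per process")
```

Because the reported value is a minimum, a single memory-constrained machine is enough to trigger the warning for the whole cluster.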
The ``status`` command can provide detailed statistics about the cluster and the database by giving it the ``details`` argument::

    user@host$ fdbcli
    Using cluster file `/etc/foundationdb/fdb.cluster'.

    The database is available.

    Welcome to the fdbcli. For help, type `help'.
    fdb> status details

    Configuration:
      Redundancy mode        - triple
      Storage engine         - ssd-2
      Coordinators           - 5

    Cluster:
      FoundationDB processes - 85
      Machines               - 5
      Memory availability    - 7.4 GB per process on machine with least available
      Retransmissions rate   - 5 Hz
      Fault Tolerance        - 2 machines
      Server time            - 03/19/18 08:59:37

    Data:
      Replication health     - Healthy
      Moving data            - 0.000 GB
      Sum of key-value sizes - 87.068 GB
      Disk space used        - 327.819 GB

    Operating space:
      Storage server         - 888.2 GB free on most full server
      Log server             - 897.3 GB free on most full server

    Workload:
      Read rate              - 117 Hz
      Write rate             - 0 Hz
      Transactions started   - 43 Hz
      Transactions committed - 1 Hz
      Conflict rate          - 0 Hz

    Process performance details:
      10.0.4.1:4500 ( 2% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 3.2 GB / 7.4 GB RAM )
      10.0.4.1:4501 ( 1% cpu; 2% machine; 0.010 Gbps; 3% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.1:4502 ( 2% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.1:4503 ( 0% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.1:4504 ( 0% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.1:4505 ( 2% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.1:4506 ( 2% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.1:4507 ( 2% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.1:4508 ( 2% cpu; 2% machine; 0.010 Gbps; 1% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.1:4509 ( 2% cpu; 2% machine; 0.010 Gbps; 1% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.1:4510 ( 1% cpu; 2% machine; 0.010 Gbps; 1% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.1:4511 ( 0% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.1:4512 ( 0% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.1:4513 ( 0% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.1:4514 ( 0% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 0.2 GB / 7.4 GB RAM )
      10.0.4.1:4515 ( 12% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 0.2 GB / 7.4 GB RAM )
      10.0.4.1:4516 ( 0% cpu; 2% machine; 0.010 Gbps; 0% disk IO; 0.3 GB / 7.4 GB RAM )
      10.0.4.2:4500 ( 2% cpu; 3% machine; 0.124 Gbps; 0% disk IO; 3.2 GB / 7.4 GB RAM )
      10.0.4.2:4501 ( 15% cpu; 3% machine; 0.124 Gbps; 19% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4502 ( 2% cpu; 3% machine; 0.124 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4503 ( 2% cpu; 3% machine; 0.124 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4504 ( 2% cpu; 3% machine; 0.124 Gbps; 1% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4505 ( 18% cpu; 3% machine; 0.124 Gbps; 18% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4506 ( 2% cpu; 3% machine; 0.124 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4507 ( 2% cpu; 3% machine; 0.124 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4508 ( 2% cpu; 3% machine; 0.124 Gbps; 19% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4509 ( 0% cpu; 3% machine; 0.124 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4510 ( 0% cpu; 3% machine; 0.124 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4511 ( 2% cpu; 3% machine; 0.124 Gbps; 1% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4512 ( 2% cpu; 3% machine; 0.124 Gbps; 19% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.2:4513 ( 0% cpu; 3% machine; 0.124 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.2:4514 ( 0% cpu; 3% machine; 0.124 Gbps; 0% disk IO; 0.2 GB / 7.4 GB RAM )
      10.0.4.2:4515 ( 11% cpu; 3% machine; 0.124 Gbps; 0% disk IO; 0.2 GB / 7.4 GB RAM )
      10.0.4.2:4516 ( 0% cpu; 3% machine; 0.124 Gbps; 0% disk IO; 0.6 GB / 7.4 GB RAM )
      10.0.4.3:4500 ( 14% cpu; 3% machine; 0.284 Gbps; 26% disk IO; 3.0 GB / 7.4 GB RAM )
      10.0.4.3:4501 ( 2% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 2.8 GB / 7.4 GB RAM )
      10.0.4.3:4502 ( 2% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 2.8 GB / 7.4 GB RAM )
      10.0.4.3:4503 ( 2% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.3:4504 ( 7% cpu; 3% machine; 0.284 Gbps; 12% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.3:4505 ( 2% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.3:4506 ( 2% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.3:4507 ( 2% cpu; 3% machine; 0.284 Gbps; 26% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.3:4508 ( 2% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.3:4509 ( 2% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.3:4510 ( 2% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.3:4511 ( 2% cpu; 3% machine; 0.284 Gbps; 12% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.3:4512 ( 2% cpu; 3% machine; 0.284 Gbps; 3% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.3:4513 ( 2% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.3:4514 ( 0% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 0.1 GB / 7.4 GB RAM )
      10.0.4.3:4515 ( 0% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 0.1 GB / 7.4 GB RAM )
      10.0.4.3:4516 ( 0% cpu; 3% machine; 0.284 Gbps; 0% disk IO; 0.1 GB / 7.4 GB RAM )
      10.0.4.4:4500 ( 2% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 3.2 GB / 7.4 GB RAM )
      10.0.4.4:4501 ( 2% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.4:4502 ( 0% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.4:4503 ( 2% cpu; 4% machine; 0.065 Gbps; 16% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.4:4504 ( 2% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.4:4505 ( 0% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.4:4506 ( 0% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.4:4507 ( 2% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.4:4508 ( 0% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.4:4509 ( 2% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.4:4510 ( 24% cpu; 4% machine; 0.065 Gbps; 15% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.4:4511 ( 2% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 2.8 GB / 7.4 GB RAM )
      10.0.4.4:4512 ( 2% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.4:4513 ( 0% cpu; 4% machine; 0.065 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.4:4514 ( 0% cpu; 4% machine; 0.065 Gbps; 1% disk IO; 0.2 GB / 7.4 GB RAM )
      10.0.4.4:4515 ( 0% cpu; 4% machine; 0.065 Gbps; 1% disk IO; 0.2 GB / 7.4 GB RAM )
      10.0.4.4:4516 ( 0% cpu; 4% machine; 0.065 Gbps; 1% disk IO; 0.6 GB / 7.4 GB RAM )
      10.0.4.5:4500 ( 6% cpu; 2% machine; 0.076 Gbps; 7% disk IO; 3.2 GB / 7.4 GB RAM )
      10.0.4.5:4501 ( 2% cpu; 2% machine; 0.076 Gbps; 19% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.5:4502 ( 1% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.5:4503 ( 0% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.5:4504 ( 2% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.5:4505 ( 2% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.5:4506 ( 0% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.5:4507 ( 2% cpu; 2% machine; 0.076 Gbps; 6% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.5:4508 ( 31% cpu; 2% machine; 0.076 Gbps; 8% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.5:4509 ( 0% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.5:4510 ( 2% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 2.7 GB / 7.4 GB RAM )
      10.0.4.5:4511 ( 2% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.5:4512 ( 2% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.5:4513 ( 0% cpu; 2% machine; 0.076 Gbps; 3% disk IO; 2.6 GB / 7.4 GB RAM )
      10.0.4.5:4514 ( 0% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 0.2 GB / 7.4 GB RAM )
      10.0.4.5:4515 ( 0% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 0.2 GB / 7.4 GB RAM )
      10.0.4.5:4516 ( 0% cpu; 2% machine; 0.076 Gbps; 0% disk IO; 0.6 GB / 7.4 GB RAM )

    Coordination servers:
      10.0.4.1:4500  (reachable)
      10.0.4.2:4500  (reachable)
      10.0.4.3:4500  (reachable)
      10.0.4.4:4500  (reachable)
      10.0.4.5:4500  (reachable)

    Client time: 03/19/18 08:59:37

Several details about individual FoundationDB processes are displayed in a list format in parentheses after the IP address and port:

======= =========================================================================================
cpu     CPU utilization of the individual process
machine CPU utilization of the machine the process is running on (over all cores)
Gbps    Total input + output network traffic, in Gbps
disk IO Percentage busy time of the disk subsystem on which the data resides
REXMIT! Displayed only if there have been more than 10 TCP segments retransmitted in last 5s
RAM     Total physical memory used by process / memory available per process
======= =========================================================================================

In certain cases, FoundationDB's overall performance can be negatively impacted by an individual slow or degraded computer or subsystem. If you suspect this is the case, this detailed list is helpful to find the culprit.
If a process has had more than 10 TCP segments retransmitted in the last 5 seconds, the warning message ``REXMIT!`` is displayed between its disk and RAM values, leading to an output under ``Process performance details`` of the form::

    10.0.4.1:4500 ( 3% cpu; 2% machine; 0.004 Gbps; 0% disk; REXMIT! 2.5 GB / 4.1 GB RAM )

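When a cluster has many processes, scanning this list by eye is tedious. The sketch below parses entries of the form shown above and flags candidates for a closer look. It is a minimal illustration that assumes the text layout in this document's examples; the names ``ProcessStats``, ``parse_process_details``, and ``suspects``, and the 25% thresholds, are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple


class ProcessStats(NamedTuple):
    address: str
    cpu_pct: int
    machine_pct: int
    gbps: float
    disk_pct: int
    rexmit: bool
    ram_used_gb: float
    ram_available_gb: float


# Matches one "IP:PORT ( ... )" entry; accepts both "disk" and "disk IO".
_ENTRY = re.compile(
    r"(?P<address>\d+(?:\.\d+){3}:\d+)\s*\(\s*"
    r"(?P<cpu>\d+)%\s+cpu;\s*"
    r"(?P<machine>\d+)%\s+machine;\s*"
    r"(?P<gbps>\d+(?:\.\d+)?)\s+Gbps;\s*"
    r"(?P<disk>\d+)%\s+disk(?:\s+IO)?;\s*"
    r"(?P<rexmit>REXMIT!\s*)?"
    r"(?P<used>\d+(?:\.\d+)?)\s+GB\s*/\s*"
    r"(?P<available>\d+(?:\.\d+)?)\s+GB\s+RAM\s*\)"
)


def parse_process_details(text: str) -> List[ProcessStats]:
    """Extract every process entry found in `text`, in order of appearance."""
    return [
        ProcessStats(
            address=m.group("address"),
            cpu_pct=int(m.group("cpu")),
            machine_pct=int(m.group("machine")),
            gbps=float(m.group("gbps")),
            disk_pct=int(m.group("disk")),
            rexmit=m.group("rexmit") is not None,
            ram_used_gb=float(m.group("used")),
            ram_available_gb=float(m.group("available")),
        )
        for m in _ENTRY.finditer(text)
    ]


def suspects(
    stats: Iterable[ProcessStats],
    cpu_threshold: int = 25,   # hypothetical cut-off, tune for your cluster
    disk_threshold: int = 25,  # hypothetical cut-off, tune for your cluster
) -> List[ProcessStats]:
    """Return processes that are busy on CPU or disk, or show REXMIT!."""
    return [
        s for s in stats
        if s.cpu_pct >= cpu_threshold or s.disk_pct >= disk_threshold or s.rexmit
    ]
```

For scripted monitoring, the ``status json`` output mentioned in the upgrade section is likely a more robust source than scraping this human-readable text.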
The core FoundationDB server process is ``fdbserver``. Each ``fdbserver`` process uses up to one full CPU core, so a production FoundationDB cluster will usually run N such processes on an N-core system.

To make configuring, starting, stopping, and restarting ``fdbserver`` processes easy, FoundationDB also comes with a singleton daemon process, ``fdbmonitor``, which is started automatically on boot. ``fdbmonitor`` reads the :ref:`foundationdb.conf <foundationdb-conf>` file and starts the configured set of ``fdbserver`` processes. It is also responsible for starting ``backup-agent``.

During normal operation, ``fdbmonitor`` is transparent, and you interact with it only by modifying the configuration in :ref:`foundationdb.conf <foundationdb-conf>` and perhaps occasionally by :ref:`starting and stopping <administration-running-foundationdb>` it manually. If some problem prevents an ``fdbserver`` or ``backup-agent`` process from starting or causes it to stop unexpectedly, ``fdbmonitor`` will log errors to the system log.

If the ``kill_on_configuration_change`` parameter is unset or set to ``true`` in ``foundationdb.conf``, ``fdbmonitor`` will restart automatically when the configuration changes. If this parameter is set to ``false``, it will not restart on changes.
By default, trace files are output to:

- ``/var/log/foundationdb/`` on Linux
- ``/usr/local/foundationdb/logs/`` on macOS

Trace files are rolled every 10MB. These files are valuable to the FoundationDB development team for diagnostic purposes, and should be retained in case you need support from FoundationDB. Old trace files are automatically deleted so that there are no more than 100 MB worth of trace files per process. Both the log size and the maximum total size of the log files are configurable on a per process basis in the :ref:`configuration file <foundationdb-conf>`.
In the present version of FoundationDB, disaster recovery (DR) is implemented via asynchronous replication of a source cluster to a destination cluster residing in another datacenter. The asynchronous replication updates the destination cluster using transactions consistent with those that have been committed in the source cluster. In this way, the replication process guarantees that the destination cluster is always in a consistent state that matches a present or earlier state of the source cluster.
Recovery takes place by reversing the asynchronous replication, so the data in the destination cluster is streamed back to a source cluster. For further information, see the :ref:`overview of backups <backup-introduction>` and the :ref:`fdbdr tool <fdbdr-intro>` that performs asynchronous replication.
FoundationDB's storage space requirements depend on which storage engine is used.
Using the ``ssd`` storage engine, data is stored in B-trees that add some overhead.

- For key-value pairs larger than about 100 bytes, overhead should usually be less than 2x per replica. In a triple-replicated configuration, the raw capacity required might be 5x the size of the data. However, SSDs often require over-provisioning (e.g. keeping the drive less than 75% full) for best performance, so 7x would be a reasonable number. For example, 100GB of raw key-values would require 700GB of raw capacity.
- For very small key-value pairs, the overhead can be a large factor but not usually more than about 40 bytes per replica. Therefore, with triple replication and SSD over-provisioning, allowing 200 bytes of raw storage capacity for each very small key-value pair would be a reasonable guess. For example, 1 billion very small key-value pairs would require 200GB of raw storage.
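The sizing arithmetic in the two cases above can be reproduced with a small Python sketch. It simply restates the rules of thumb from this section (5x raw capacity for triple replication, drives kept under 75% full, 200 bytes per very small pair); the function names are hypothetical, and the results are planning estimates rather than guarantees.

```python
import math

TRIPLE_REPLICATED_FACTOR = 5.0  # raw capacity vs. data size, from the text
MAX_FILL = 0.75                 # keep SSDs less than 75% full, from the text
SMALL_PAIR_RAW_BYTES = 200      # allowance per very small pair, from the text


def ssd_capacity_factor(
    replicated_factor: float = TRIPLE_REPLICATED_FACTOR,
    max_fill: float = MAX_FILL,
) -> int:
    """Round 5x / 0.75 (about 6.7x) up to the 7x figure quoted above."""
    return math.ceil(replicated_factor / max_fill)


def large_kv_raw_capacity_gb(data_gb: float) -> float:
    """Raw capacity for pairs larger than about 100 bytes."""
    return data_gb * ssd_capacity_factor()


def small_kv_raw_capacity_gb(pair_count: int) -> float:
    """Raw capacity for very small pairs, in decimal gigabytes."""
    return pair_count * SMALL_PAIR_RAW_BYTES / 1e9


print(large_kv_raw_capacity_gb(100))            # the 100GB example
print(small_kv_raw_capacity_gb(1_000_000_000))  # the 1 billion pairs example
```

Both calls reproduce the worked examples in the text: 700GB for 100GB of larger key-value pairs, and 200GB for 1 billion very small pairs.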
Using the ``memory`` storage engine, both memory and disk space need to be considered.

- There is a fixed overhead of 72 bytes of memory for each key-value pair. Furthermore, memory is allocated in chunks whose sizes are powers of 2, leading to a variable padding overhead for each key-value pair. Finally, there is some overhead within memory chunks. For example, a 32 byte chunk has 6 bytes of overhead and therefore can only contain 26 bytes. As a result, a 27-byte key-value pair will be stored in a 64 byte chunk. The absolute amount of overhead within a chunk increases for larger chunks.
- Disk space usage is about 8x the original data size. The memory storage engine interleaves a snapshot on disk with a transaction log, with the resulting snapshot 2x the data size. A snapshot can't be dropped from its log until the next snapshot is completely written, so 2 snapshots must be kept at 4x the data size. The two-file durable queue can't overwrite data in one file until all the data in the other file has been dropped, resulting in 8x the data size. Finally, it should be noted that disk space is not reclaimed when key-value pairs are cleared.
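The chunk rounding and the 8x disk figure above can be illustrated as follows. This is a rough sketch under stated assumptions: it treats 32 bytes as the smallest chunk and applies the 6-byte overhead of a 32-byte chunk to every chunk size, although the text notes that the overhead grows for larger chunks, so results for large payloads are optimistic. The function names are hypothetical.

```python
def chunk_size_for(payload_bytes: int, overhead_bytes: int = 6,
                   smallest_chunk: int = 32) -> int:
    """Smallest power-of-2 chunk whose usable space fits the payload.

    Assumption: constant per-chunk overhead (only the 32-byte case,
    6 bytes, is given in the text).
    """
    chunk = smallest_chunk
    while chunk - overhead_bytes < payload_bytes:
        chunk *= 2
    return chunk


def memory_engine_disk_factor() -> int:
    """Multiply out the factors that lead to the 8x disk figure."""
    snapshot_with_log = 2  # snapshot interleaved with the log: 2x data size
    snapshots_kept = 2     # the previous snapshot stays until the next is done
    queue_files = 2        # two-file durable queue
    return snapshot_with_log * snapshots_kept * queue_files


print(chunk_size_for(26))           # fits the 26 usable bytes of a 32-byte chunk
print(chunk_size_for(27))           # one byte more spills into a 64-byte chunk
print(memory_engine_disk_factor())
```

The jump from 32 to 64 bytes for a single extra byte shows why padding overhead varies so much between key-value pairs of similar size.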
For either storage engine, there is possible additional overhead when running backup or DR. In usual operation, the overhead is negligible but if backup is unable to write or a secondary cluster is unavailable, mutation logs will build up until copying can resume, occupying space in your cluster.
FoundationDB is aware of the free storage space on each node. It attempts to distribute data equally on all the nodes so that no node runs out of space before the others. The database attempts to gracefully stop writes as storage space decreases to 100 MB, refusing to start new transactions with priorities other than ``SYSTEM_IMMEDIATE``. This lower bound on free space leaves space to allow you to use ``SYSTEM_IMMEDIATE`` transactions to remove data.

The measure of free space depends on the storage engine. For the memory storage engine, which is the default after installation, total space is limited to the lesser of the ``storage_memory`` configuration parameter (1 GB in the default configuration) or a fraction of the free disk space.

If the disk is rapidly filled by other programs, trace files, etc., FoundationDB may be forced to stop with significant amounts of queued writes. The only way to restore the availability of the database at this point is to manually free storage space by deleting files.
Processes running in different VMs on a single machine will appear to FoundationDB as being hardware isolated. FoundationDB takes pains to ensure that data replication is protected from hardware-correlated failures. If FoundationDB is run in multiple VMs on a single machine, this protection will be subverted. An administrator can inform FoundationDB of this hardware sharing, however, by specifying a machine ID using the ``locality_machineid`` parameter in :ref:`foundationdb.conf <foundationdb-conf>`. All processes on VMs that share hardware should specify the same ``locality_machineid``.

FoundationDB is datacenter aware and supports operation across datacenters. In a multiple-datacenter configuration, it is recommended that you set the :ref:`redundancy mode <configuration-choosing-redundancy-mode>` to ``three_datacenter`` and that you set the ``locality_dcid`` parameter for all FoundationDB processes in :ref:`foundationdb.conf <foundationdb-conf>`.

If you specify the ``--datacenter_id`` option to any FoundationDB process in your cluster, you should specify it to all such processes. Processes which do not have a specified datacenter ID on the command line are considered part of a default "unset" datacenter. FoundationDB will incorrectly believe that these processes are failure-isolated from other datacenters, which can reduce performance and fault tolerance.

To uninstall FoundationDB from a cluster of one or more machines:
1. Uninstall the packages on each machine in the cluster.

   - On Ubuntu use::

         user@host$ sudo dpkg -P foundationdb-clients foundationdb-server

   - On RHEL/CentOS use::

         user@host$ sudo rpm -e foundationdb-clients foundationdb-server

   - On macOS use::

         host:~ user$ sudo /usr/local/foundationdb/uninstall-FoundationDB.sh

2. Delete all the data and configuration files stored by FoundationDB.

   - On Linux these will be in ``/var/lib/foundationdb/``, ``/var/log/foundationdb/``, and ``/etc/foundationdb/`` by default.
   - On macOS these will be in ``/usr/local/foundationdb/`` and ``/usr/local/etc/foundationdb/`` by default.

When a FoundationDB package is installed on a machine that already has a previous version, the package will upgrade FoundationDB to the newer version. For recent versions, the upgrade will preserve all previous data and configuration settings. (See the :ref:`notes on specific versions <version-specific-upgrading>` for exceptions.)
To upgrade a FoundationDB cluster, you must install the updated version of FoundationDB on each machine in the cluster. As the installations are taking place, the cluster will become unavailable until a sufficient number of machines have been upgraded. By following the steps below, you can perform a production upgrade with minimal downtime (seconds to minutes) and maintain all database guarantees. The instructions below assume that Linux packages are being used.
.. warning:: Apart from patch version upgrades, you should install the new client binary on all your clients and restart them to ensure they can reconnect after the upgrade. See :ref:`multi-version-client-api` for more information. Running ``status json`` will show you which versions clients are connecting with so you can verify before upgrading that clients are correctly configured.

Go to :doc:`downloads` and select Ubuntu or RHEL/CentOS, as appropriate for your system. Download both the client and server packages and copy them to each machine in your cluster.
For Ubuntu, perform the upgrade using the dpkg command:

.. parsed-literal::

    user@host$ sudo dpkg -i |package-deb-clients| \\
    |package-deb-server|

For RHEL/CentOS, perform the upgrade using the rpm command:

.. parsed-literal::

    user@host$ sudo rpm -Uvh |package-rpm-clients| \\
    |package-rpm-server|

The ``foundationdb-clients`` package also installs the :doc:`Python <api-python>` and :doc:`C <api-c>` APIs. If your clients use :doc:`Ruby <api-ruby>`, Java, or Go, follow the instructions in the corresponding language documentation to install the APIs.

Test the database to verify that it is operating normally by running ``fdbcli`` and :ref:`reviewing the cluster status <administration-monitoring-cluster-status>`.

You can now remove old client library versions from your clients. This is only to stop creating unnecessary connections.
Upgrades from 5.0.x will keep all your old data and configuration settings. 5.1 has a new backup format so backups will need to be restarted after upgrading.
Upgrades from 5.0.x will keep all your old data and configuration settings.
Upgrades from 4.6.x will keep all your old data and configuration settings.
Upgrades from 4.5.x will keep all your old data and configuration settings.
Upgrades from 4.4.x will keep all your old data and configuration settings.
Backup and DR must be stopped before upgrading. Upgrades from 4.3.x will keep all your old data and configuration settings.
Backup and DR must be stopped before upgrading. Upgrades from 4.2.x will keep all your old data and configuration settings.
Backup and DR must be stopped before upgrading. Upgrades from 4.1.x will keep all your old data and configuration settings.
Backup and DR must be stopped before upgrading. Upgrades from 4.0.x will keep all your old data and configuration settings.
To upgrade from versions prior to 4.0, you should first upgrade to 4.0 and then to the current version.
Upgrades from versions older than 3.0.0 are no longer supported. To upgrade from an older version, first upgrade to 4.0.x, then upgrade to the desired version.