The Cluster version provides four deployment methods, which can be selected according to your actual situation:
- Install Cluster Step By Step Chinese guide
- Install AllinOne Chinese guide
- Install Exchange Step By Step Chinese guide
thirdparty:
- Hadoop+Spark Deployment Chinese guide
In a party, FATE (Federated AI Technology Enabler) has the following modules. Specific module information is as follows:
Module Name | Port of module | Module function |
---|---|---|
fate_flow | 9360;9380 | Federated learning pipeline management module, there is only one service for each party |
fateboard | 8080 | Federated learning process visualization module, only one service needs to be deployed per party |
clustermanager | 4670 | The cluster manager manages the cluster, only one instance needs to be deployed per party |
nodemanger | 4671 | Node manager manages the resources of each machine, each party can have multiple of this service, but a server can only have one |
rollsite | 9370 | Cross-site or cross-party communication components, equivalent to proxy + federation, each party has only one service |
mysql | 3306 | Data storage, clustermanager and fateflow dependency, each party only needs one service |
Example deployment in two parties
The following configuration information is for one-sided server configuration. If there are multiple parties, please refer to this configuration to replicate this environment:
Server | |
---|---|
Quantity | 1 or more than 1 (according to the actual server allocation module provided) |
Configuration | 8 core / 16G memory / 500G hard disk / 10M bandwidth |
Operating System | Version: CentOS Linux release 7.2 |
Dependency Package | yum source gcc gcc-c++ make autoconfig openssl-devel supervisor gmp-devel mpfr-devel libmpc-devel libaio numactl autoconf automake libtool libffi-dev |
Users | User: app owner:apps (app user can sudo su root without password) |
File System | 1. The 500G hard disk is mounted to the /data directory. 2. Create /data/projects directory, projects directory belongs to app:apps. |
party | partyid | hostname | IP | os | software | services |
---|---|---|---|---|---|---|
PartyA | 10000 | VM_0_1_centos | 192.168.0.1 | CentOS 7.2/Ubuntu 16.04 | fate, eggroll, mysql | fate_flow, fateboard, clustermanager, nodemanger, mysql |
PartyA | 10000 | VM_0_2_centos | 192.168.0.2 | CentOS 7.2/Ubuntu 16.04 | fate, eggroll | nodemanger, rollsite |
PartyB | 9999 | VM_0_3_centos | 192.168.0.3 | CentOS 7.2/Ubuntu 16.04 | fate, eggroll, mysql | all |
1) Modify the host name
Run under the 192.168.0.1 root user:
hostnamectl set-hostname VM_0_1_centos
Run under the 192.168.0.2 root user::
hostnamectl set-hostname VM_0_2_centos
Run under the 192.168.0.3 root user:
hostnamectl set-hostname VM_0_3_centos
2) Add Host Mapping
Execute under the root user of the target server (192.168.0.1 192.168.0.2 192.168.0.3):
vim /etc/hosts
192.168.0.1 VM_0_1_centos
192.168.0.2 VM_0_2_centos
192.168.0.3 VM_0_3_centos
Execute under the root user of the target server (192.168.0.1 192.168.0.2 192.168.0.3):
Confirm whether selinux is installed
Centos system executes: rpm -qa | grep selinux
Ubuntu system executes: apt list --installed | grep selinux
If selinux is already installed, execute: setenforce 0
Execute under the root user of the target server (192.168.0.1 192.168.0.2 192.168.0.3):
- vim /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
- vim /etc/security/limits.d/20-nproc.conf
* soft nproc unlimited
Execute under the root user of the target server (192.168.0.1 192.168.0.2 192.168.0.3):
If it is a Centos system:
systemctl disable firewalld.service
systemctl stop firewalld.service
systemctl status firewalld.service
If it is an Ubuntu system:
ufw disable
ufw status
Execute under the root user of the target server (192.168.0.1 192.168.0.2 192.168.0.3)
1) create user
groupadd -g 6000 apps
useradd -s /bin/bash -g apps -d /home/app app
passwd app
2) Create a directory
mkdir -p /data/projects/fate
mkdir -p /data/projects/install
chown -R app:apps /data/projects
3) Install dependencies
#centos
yum -y install gcc gcc-c++ make openssl-devel gmp-devel mpfr-devel libmpc-devel libaio numactl autoconf automake libtool libffi-devel snappy snappy-devel zlib zlib-devel bzip2 bzip2-devel lz4-devel libasan lsof sysstat telnet psmisc
#ubuntu
apt-get install -y gcc g++ make openssl supervisor libgmp-dev libmpfr-dev libmpc-dev libaio1 libaio-dev numactl autoconf automake libtool libffi-dev libssl1.0.0 libssl-dev liblz4-1 liblz4-dev liblz4-1-dbg liblz4-tool zlib1g zlib1g-dbg zlib1g-dev
cd /usr/lib/x86_64-linux-gnu
if [ ! -f "libssl.so.10" ];then
ln -s libssl.so.1.0.0 libssl.so.10
ln -s libcrypto.so.1.0.0 libcrypto.so.10
fi
Execute under the root user of the target server (192.168.0.1 192.168.0.2 192.168.0.3)
When used in a production environment, 128G virtual memory needs to be added due to memory calculation. Refer to:
cd /data
dd if=/dev/zero of=/data/swapfile128G bs=1024 count=134217728
mkswap /data/swapfile128G
swapon /data/swapfile128G
cat /proc/swaps
echo '/data/swapfile128G swap swap defaults 0 0' >> /etc/fstab
Note: The installation directory of this guide is /data/projects/install by default, the user is the app, and it should be modified according to the actual situation during installation.
Execute under the app user of the target server (192.168.0.1 has an external network environment):
mkdir -p /data/projects/install
cd /data/projects/install
wget https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/python-env-1.4.0-release.tar.gz
wget https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/jdk-8u192-linux-x64.tar.gz
wget https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/mysql-1.4.0-release.tar.gz
wget https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/FATE_install_1.4.0-release.tar.gz
#Send to 192.168.0.2和192.168.0.3
scp *.tar.gz [email protected]:/data/projects/install
scp *.tar.gz [email protected]:/data/projects/install
Execute under the app user of the target server (192.168.0.1 192.168.0.3)
1) MySQL installation:
#Create mysql root directory
mkdir -p /data/projects/fate/common/mysql
mkdir -p /data/projects/fate/data/mysql
#Unzip the package
cd /data/projects/install
tar xzvf mysql-1.4.0-release.tar.gz
cd mysql
tar xf mysql-8.0.13.tar.gz -C /data/projects/fate/common/mysql
#Configuration settings
mkdir -p /data/projects/fate/common/mysql/mysql-8.0.13/{conf,run,logs}
cp service.sh /data/projects/fate/common/mysql/mysql-8.0.13/
cp my.cnf /data/projects/fate/common/mysql/mysql-8.0.13/conf
#initialization
cd /data/projects/fate/common/mysql/mysql-8.0.13/
./bin/mysqld --initialize --user=app --basedir=/data/projects/fate/common/mysql/mysql-8.0.13 --datadir=/data/projects/fate/data/mysql > logs/init.log 2>&1
cat logs/init.log |grep root@localhost
#Note that the root @ localhost: in the output information is the initial password of the mysql user root, which should be recorded for later changing password
#Start service
cd /data/projects/fate/common/mysql/mysql-8.0.13/
nohup ./bin/mysqld_safe --defaults-file=./conf/my.cnf --user=app >>logs/mysqld.log 2>&1 &
#Change mysql root user password
cd /data/projects/fate/common/mysql/mysql-8.0.13/
./bin/mysqladmin -h 127.0.0.1 -P 3306 -S ./run/mysql.sock -u root -p password "fate_dev"
Enter Password:【Enter the root initial password】
#Verify login
cd /data/projects/fate/common/mysql/mysql-8.0.13/
./bin/mysql -u root -p -S ./run/mysql.sock
Enter Password:【Enter the modified password of root: fate_dev】
2)Database creation, authorization and business configuration
cd /data/projects/fate/common/mysql/mysql-8.0.13/
./bin/mysql -u root -p -S ./run/mysql.sock
Enter Password:【fate_dev】
#Create eggroll database and tables
mysql>source /data/projects/install/mysql/create-eggroll-meta-tables.sql;
#Create fate_flow database
mysql>CREATE DATABASE IF NOT EXISTS fate_flow;
#Create remote users and authorizations
1) 192.168.0.1 execute
mysql>CREATE USER 'fate'@'192.168.0.1' IDENTIFIED BY 'fate_dev';
mysql>GRANT ALL ON *.* TO 'fate'@'192.168.0.1';
mysql>CREATE USER 'fate'@'192.168.0.2' IDENTIFIED BY 'fate_dev';
mysql>GRANT ALL ON *.* TO 'fate'@'192.168.0.2';
mysql>flush privileges;
2) 192.168.0.3 execute
mysql>CREATE USER 'fate'@'192.168.0.3' IDENTIFIED BY 'fate_dev';
mysql>GRANT ALL ON *.* TO 'fate'@'192.168.0.3';
mysql>flush privileges;
#insert configuration data
1) 192.168.0.1 execute
mysql>INSERT INTO server_node (host, port, node_type, status) values ('192.168.0.1', '4670', 'CLUSTER_MANAGER', 'HEALTHY');
mysql>INSERT INTO server_node (host, port, node_type, status) values ('192.168.0.1', '4671', 'NODE_MANAGER', 'HEALTHY');
mysql>INSERT INTO server_node (host, port, node_type, status) values ('192.168.0.2', '4671', 'NODE_MANAGER', 'HEALTHY');
2) 192.168.0.3 execute
mysql>INSERT INTO server_node (host, port, node_type, status) values ('192.168.0.3', '4670', 'CLUSTER_MANAGER', 'HEALTHY');
mysql>INSERT INTO server_node (host, port, node_type, status) values ('192.168.0.3', '4671', 'NODE_MANAGER', 'HEALTHY');
#check
mysql>select User,Host from mysql.user;
mysql>show databases;
mysql>use eggroll_meta;
mysql>show tables;
mysql>select * from server_node;
Execute under the app user of the target server (192.168.0.1 192.168.0.2 192.168.0.3)
#Create jdk installation directory
mkdir -p /data/projects/fate/common/jdk
#Unzip the package
cd /data/projects/install
tar xzf jdk-8u192-linux-x64.tar.gz -C /data/projects/fate/common/jdk
cd /data/projects/fate/common/jdk
mv jdk1.8.0_192 jdk-8u192
Execute under the app user of the target server (192.168.0.1 192.168.0.2 192.168.0.3)
#Create python virtual installation directory
mkdir -p /data/projects/fate/common/python
#Install miniconda3
cd /data/projects/install
tar xvf python-env-1.4.0-release.tar.gz
cd python-env
sh Miniconda3-4.5.4-Linux-x86_64.sh -b -p /data/projects/fate/common/miniconda3
#Install virtualenv and create virtual environment
/data/projects/fate/common/miniconda3/bin/pip install virtualenv-20.0.18-py2.py3-none-any.whl -f . --no-index
/data/projects/fate/common/miniconda3/bin/virtualenv -p /data/projects/fate/common/miniconda3/bin/python3.6 --no-wheel --no-setuptools --no-download /data/projects/fate/common/python/venv
#Install dependencies
tar xvf pip-packages-fate-*.tar.gz
source /data/projects/fate/common/python/venv/bin/activate
pip install setuptools-42.0.2-py2.py3-none-any.whl
pip install -r pip-packages-fate-1.4.0/requirements.txt -f ./pip-packages-fate-1.4.0 --no-index
pip list | wc -l
#The result should be 158
#Software deployment
#Execute under the app user of the target server (192.168.0.1 192.168.0.2 192.168.0.3)
cd /data/projects/install
tar xf FATE_install_1.4.0-release.tar.gz
cd FATE_install_1.4*
tar xvf python.tar.gz -C /data/projects/fate/
tar xvf eggroll.tar.gz -C /data/projects/fate
#Execute under the app user of the target server (192.168.0.1 192.168.0.3)
tar xvf fateboard.tar.gz -C /data/projects/fate
#Set the environment variable file
#Execute under the app user of the target server (192.168.0.1 192.168.0.2 192.168.0.3)
cat >/data/projects/fate/init_env.sh <<EOF
export PYTHONPATH=/data/projects/fate/python:/data/projects/fate/eggroll/python
export EGGROLL_HOME=/data/projects/fate/eggroll/
venv=/data/projects/fate/common/python/venv
source \${venv}/bin/activate
export JAVA_HOME=/data/projects/fate/common/jdk/jdk-8u192
export PATH=\$PATH:\$JAVA_HOME/bin
EOF
This configuration file are shared among rollsite, clustermanager, and nodemanager, and configuration across multiple hosts on each party should be consistent. Content needs to be modified:
-
Database driver, the database corresponds to the connection IP, port, user name and password used by the party. Usually the default value for the port should suffice.
eggroll.resourcemanager.clustermanager.jdbc.driver.class.name
eggroll.resourcemanager.clustermanager.jdbc.username
eggroll.resourcemanager.clustermanager.jdbc.password
-
Corresponding to the IP, port, nodemanager port, process tag, and port of the party clustermanager. Usually the default value for the port should suffice.
eggroll.resourcemanager.clustermanager.host
eggroll.resourcemanager.clustermanager.port
eggroll.resourcemanager.nodemanager.port
eggroll.resourcemanager.process.tag
-
The Python virtual environment path, business code pythonpath, and JAVA Home path are modified. If there is no change in the related path, keep the default.
eggroll.resourcemanager.bootstrap.egg_pair.venv
eggroll.resourcemanager.bootstrap.egg_pair.pythonpath
eggroll.resourcemanager.bootstrap.roll_pair_master.javahome
-
Modify IP and port corresponding to the party rollsite and the party's Party Id. Default value for rollsite's port generally should suffice.
eggroll.rollsite.host eggroll.rollsite.port eggroll.rollsite.party.id
The above parameter adjustment can be manually configured by referring to the following example, or can be completed using the following command:
Configuration file: /data/projects/fate/eggroll/conf/eggroll.properties
#Execute under the app user of the target server (192.168.0.1 192.168.0.2)
cat > /data/projects/fate/eggroll/conf/eggroll.properties <<EOF
[eggroll]
#db connect inf
eggroll.resourcemanager.clustermanager.jdbc.driver.class.name=com.mysql.cj.jdbc.Driver
eggroll.resourcemanager.clustermanager.jdbc.url=jdbc:mysql://192.168.0.1:3306/eggroll_meta?useSSL=false&serverTimezone=UTC&characterEncoding=utf8&allowPublicKeyRetrieval=true
eggroll.resourcemanager.clustermanager.jdbc.username=fate
eggroll.resourcemanager.clustermanager.jdbc.password=fate_dev
eggroll.data.dir=data/
eggroll.logs.dir=logs/
#clustermanager & nodemanager
eggroll.resourcemanager.clustermanager.host=192.168.0.1
eggroll.resourcemanager.clustermanager.port=4670
eggroll.resourcemanager.nodemanager.port=4671
eggroll.resourcemanager.process.tag=fate-host
eggroll.bootstrap.root.script=bin/eggroll_boot.sh
eggroll.resourcemanager.bootstrap.egg_pair.exepath=bin/roll_pair/egg_pair_bootstrap.sh
#python env
eggroll.resourcemanager.bootstrap.egg_pair.venv=/data/projects/fate/common/python/venv
#pythonpath, very import, do not modify.
eggroll.resourcemanager.bootstrap.egg_pair.pythonpath=/data/projects/fate/python:/data/projects/fate/eggroll/python
eggroll.resourcemanager.bootstrap.egg_pair.filepath=python/eggroll/roll_pair/egg_pair.py
eggroll.resourcemanager.bootstrap.roll_pair_master.exepath=bin/roll_pair/roll_pair_master_bootstrap.sh
#javahome
eggroll.resourcemanager.bootstrap.roll_pair_master.javahome=/data/projects/fate/common/jdk/jdk-8u192
eggroll.resourcemanager.bootstrap.roll_pair_master.classpath=conf/:lib/*
eggroll.resourcemanager.bootstrap.roll_pair_master.mainclass=com.webank.eggroll.rollpair.RollPairMasterBootstrap
eggroll.resourcemanager.bootstrap.roll_pair_master.jvm.options=
# for roll site. rename in the next round
eggroll.rollsite.coordinator=webank
eggroll.rollsite.host=192.168.0.2
eggroll.rollsite.port=9370
eggroll.rollsite.party.id=10000
eggroll.rollsite.route.table.path=conf/route_table.json
eggroll.session.processors.per.node=4
eggroll.session.start.timeout.ms=180000
eggroll.rollsite.adapter.sendbuf.size=1048576
eggroll.rollpair.transferpair.sendbuf.size=4150000
EOF
#Execute under the app user of the target server (192.168.0.3)
cat > /data/projects/fate/eggroll/conf/eggroll.properties <<EOF
[eggroll]
#db connect inf
eggroll.resourcemanager.clustermanager.jdbc.driver.class.name=com.mysql.cj.jdbc.Driver
eggroll.resourcemanager.clustermanager.jdbc.url=jdbc:mysql://192.168.0.3:3306/eggroll_meta?useSSL=false&serverTimezone=UTC&characterEncoding=utf8&allowPublicKeyRetrieval=true
eggroll.resourcemanager.clustermanager.jdbc.username=fate
eggroll.resourcemanager.clustermanager.jdbc.password=fate_dev
eggroll.data.dir=data/
eggroll.logs.dir=logs/
#clustermanager & nodemanager
eggroll.resourcemanager.clustermanager.host=192.168.0.3
eggroll.resourcemanager.clustermanager.port=4670
eggroll.resourcemanager.nodemanager.port=4671
eggroll.resourcemanager.process.tag=fate-guest
eggroll.bootstrap.root.script=bin/eggroll_boot.sh
eggroll.resourcemanager.bootstrap.egg_pair.exepath=bin/roll_pair/egg_pair_bootstrap.sh
#python env
eggroll.resourcemanager.bootstrap.egg_pair.venv=/data/projects/fate/common/python/venv
#pythonpath, very import, do not modify.
eggroll.resourcemanager.bootstrap.egg_pair.pythonpath=/data/projects/fate/python:/data/projects/fate/eggroll/python
eggroll.resourcemanager.bootstrap.egg_pair.filepath=python/eggroll/roll_pair/egg_pair.py
eggroll.resourcemanager.bootstrap.roll_pair_master.exepath=bin/roll_pair/roll_pair_master_bootstrap.sh
#javahome
eggroll.resourcemanager.bootstrap.roll_pair_master.javahome=/data/projects/fate/common/jdk/jdk-8u192
eggroll.resourcemanager.bootstrap.roll_pair_master.classpath=conf/:lib/*
eggroll.resourcemanager.bootstrap.roll_pair_master.mainclass=com.webank.eggroll.rollpair.RollPairMasterBootstrap
eggroll.resourcemanager.bootstrap.roll_pair_master.jvm.options=
# for roll site. rename in the next round
eggroll.rollsite.coordinator=webank
eggroll.rollsite.host=192.168.0.3
eggroll.rollsite.port=9370
eggroll.rollsite.party.id=9999
eggroll.rollsite.route.table.path=conf/route_table.json
eggroll.session.processors.per.node=4
eggroll.session.start.timeout.ms=180000
eggroll.rollsite.adapter.sendbuf.size=1048576
eggroll.rollpair.transferpair.sendbuf.size=4150000
EOF
This configuration file rollsite is used to configure routing information. You can manually configure it by referring to the following example, or you can use the following command:
Configuration file: /data/projects/fate/eggroll/conf/route_table.json
#Execute under the app user of the target server (192.168.0.2)
cat > /data/projects/fate/eggroll/conf/route_table.json << EOF
{
"route_table":
{
"10000":
{
"default":[
{
"port": 9370,
"ip": "192.168.0.2"
}
],
"fateflow":[
{
"port": 9360,
"ip": "192.168.0.1"
}
]
},
"9999":
{
"default":[
{
"port": 9370,
"ip": "192.168.0.3"
}
]
}
},
"permission":
{
"default_allow": true
}
}
EOF
#Execute under the app user of the target server (192.168.0.3)
cat > /data/projects/fate/eggroll/conf/route_table.json << EOF
{
"route_table":
{
"9999":
{
"default":[
{
"port": 9370,
"ip": "192.168.0.3"
}
],
"fateflow":[
{
"port": 9360,
"ip": "192.168.0.3"
}
]
},
"10000":
{
"default":[
{
"port": 9370,
"ip": "192.168.0.2"
}
]
}
},
"permission":
{
"default_allow": true
}
}
EOF
-
fateflow
fateflow IP , host: 192.168.0.1,guest: 192.168.0.3
grpc port: 9360
http port: 9380
- fateboard
fateboard IP, host: 192.168.0.1, guest: 192.168.0.3
fateboard port: 8080
-
proxy
proxy IP, host: 192.168.0.2, guest: 192.168.0.3---Rollsite component corresponds to IP
proxy port:9370
This file should be configured in json format, otherwise an error will be reported, you can refer to the following example to manually configure, you can also use the following instructions to complete.
Configuration file: /data/projects/fate/python/arch/conf/server_conf.json
#Execute under the app user of the target server (192.168.0.1 192.168.0.2)
cat > /data/projects/fate/python/arch/conf/server_conf.json << EOF
{
"servers": {
"fateflow": {
"host": "192.168.0.1",
"grpc.port": 9360,
"http.port": 9380
},
"fateboard": {
"host": "192.168.0.1",
"port": 8080
},
"proxy": {
"host": "192.168.0.2",
"port": 9370
},
"servings": [
"127.0.0.1:8000"
]
}
}
EOF
#Execute under the app user of the target server (192.168.0.3)
cat > /data/projects/fate/python/arch/conf/server_conf.json << EOF
{
"servers": {
"fateflow": {
"host": "192.168.0.3",
"grpc.port": 9360,
"http.port": 9380
},
"fateboard": {
"host": "192.168.0.3",
"port": 8080
},
"proxy": {
"host": "192.168.0.3",
"port": 9370
},
"servings": [
"127.0.0.1:8000"
]
}
}
EOF
-
work_mode(1 means cluster mode, default)
-
db connection IP, port, account and password
-
Redis IP, port, password (no configuration required for temporary use of redis)
This configuration file should be in yaml format, otherwise an error will be raised during parsing, you can refer to the following example to manually configure, or you can use the following command.
Configuration file: /data/projects/fate/python/arch/conf/base_conf.yaml
#Execute under the app user of the target server (192.168.0.1)
cat > /data/projects/fate/python/arch/conf/base_conf.yaml <<EOF
work_mode: 1
fate_flow:
host: 0.0.0.0
http_port: 9380
grpc_port: 9360
database:
name: fate_flow
user: fate
passwd: fate_dev
host: 192.168.0.1
port: 3306
max_connections: 100
stale_timeout: 30
redis:
host: 127.0.0.1
port: 6379
password: WEBANK_2014_fate_dev
max_connections: 500
db: 0
default_model_store_address:
storage: redis
host: 127.0.0.1
port: 6379
password: fate_dev
db: 0
EOF
#Execute under the app user of the target server (192.168.0.3)
cat > /data/projects/fate/python/arch/conf/base_conf.yaml <<EOF
work_mode: 1
fate_flow:
host: 0.0.0.0
http_port: 9380
grpc_port: 9360
database:
name: fate_flow
user: fate
passwd: fate_dev
host: 192.168.0.3
port: 3306
max_connections: 100
stale_timeout: 30
redis:
host: 127.0.0.1
port: 6379
password: WEBANK_2014_fate_dev
max_connections: 500
db: 0
default_model_store_address:
storage: redis
host: 127.0.0.1
port: 6379
password: fate_dev
db: 0
EOF
1)application.properties
-
Service port
server.port---default
-
fateflow access url
fateflow.url, host: http://192.168.0.1:9380, guest: http://192.168.0.3:9380
-
Database connection string, account number and password
fateboard.datasource.jdbc-url, host: mysql://192.168.0.1:3306, guest: mysql://192.168.0.3:3306
fateboard.datasource.username: fate
fateboard.datasource.password: fate_dev
The above parameter adjustment can be manually configured by referring to the following example, or can be completed using the following command:
Configuration file: /data/projects/fate/fateboard/conf/application.properties
#Execute under the app user of the target server (192.168.0.1)
cat > /data/projects/fate/fateboard/conf/application.properties <<EOF
server.port=8080
fateflow.url=http://192.168.0.1:9380
spring.datasource.driver-Class-Name=com.mysql.cj.jdbc.Driver
spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
server.tomcat.uri-encoding=UTF-8
fateboard.datasource.jdbc-url=jdbc:mysql://192.168.0.1:3306/fate_flow?characterEncoding=utf8&characterSetResults=utf8&autoReconnect=true&failOverReadOnly=false&serverTimezone=GMT%2B8
fateboard.datasource.username=fate
fateboard.datasource.password=fate_dev
server.tomcat.max-threads=1000
server.tomcat.max-connections=20000
EOF
#Execute under the app user of the target server (192.168.0.3)
cat > /data/projects/fate/fateboard/conf/application.properties <<EOF
server.port=8080
fateflow.url=http://192.168.0.3:9380
spring.datasource.driver-Class-Name=com.mysql.cj.jdbc.Driver
spring.http.encoding.charset=UTF-8
spring.http.encoding.enabled=true
server.tomcat.uri-encoding=UTF-8
fateboard.datasource.jdbc-url=jdbc:mysql://192.168.0.3:3306/fate_flow?characterEncoding=utf8&characterSetResults=utf8&autoReconnect=true&failOverReadOnly=false&serverTimezone=GMT%2B8
fateboard.datasource.username=fate
fateboard.datasource.password=fate_dev
server.tomcat.max-threads=1000
server.tomcat.max-connections=20000
EOF
2)service.sh
#Execute under the app user of the target server (192.168.0.1 192.168.0.3)
cd /data/projects/fate/fateboard
vi service.sh
export JAVA_HOME=/data/projects/fate/common/jdk/jdk-8u192
Execute under the app user of the target server (192.168.0.2)
#Start eggroll service
source /data/projects/fate/init_env.sh
cd /data/projects/fate/eggroll
sh ./bin/eggroll.sh rollsite start
sh ./bin/eggroll.sh nodemanager start
Execute under the app user of the target server (192.168.0.1)
#Start eggroll service
source /data/projects/fate/init_env.sh
cd /data/projects/fate/eggroll
sh ./bin/eggroll.sh clustermanager start
sh ./bin/eggroll.sh nodemanager start
#Start the fate service, fateflow depends on the start of rollsite and mysql. Make sure to start fateflow after eggroll of all nodes have been started. Otherwise, you will get stuck, and an error will be raised.
source /data/projects/fate/init_env.sh
cd /data/projects/fate/python/fate_flow
sh service.sh start
cd /data/projects/fate/fateboard
sh service.sh start
Execute under the app user of the target server (192.168.0.3)
#Start eggroll service
source /data/projects/fate/init_env.sh
cd /data/projects/fate/eggroll
sh ./bin/eggroll.sh all start
#Start fate service
source /data/projects/fate/init_env.sh
cd /data/projects/fate/python/fate_flow
sh service.sh start
cd /data/projects/fate/fateboard
sh service.sh start
- eggroll log
/data/projects/fate/eggroll/logs/eggroll/bootstrap.clustermanager.err
/data/projects/fate/eggroll/logs/eggroll/clustermanager.jvm.err.log
/data/projects/fate/eggroll/logs/eggroll/nodemanager.jvm.err.log
/data/projects/fate/eggroll/logs/eggroll/bootstrap.nodemanager.err
/data/projects/fate/eggroll/logs/eggroll/bootstrap.rollsite.err
/data/projects/fate/eggroll/logs/eggroll/rollsite.jvm.err.log
- fateflow log
/data/projects/fate/python/logs/fate_flow/
- fateboard log
/data/projects/fate/fateboard/logs
You need to set 3 parameters for this test: guest_partyid,host_partyid,work_mode.
- Executed on 192.168.0.1, guest_partyid and host_partyid are set to 10000:
source /data/projects/fate/init_env.sh
cd /data/projects/fate/python/examples/toy_example/
python run_toy_example.py 10000 10000 1
A result similar to the following indicates success:
"2020-04-28 18:26:20,789 - secure_add_guest.py[line:126] - INFO: success to calculate secure_sum, it is 1999.9999999999998"
- Executed on 192.168.0.3, guest_partyid and host_partyid are set to 9999:
source /data/projects/fate/init_env.sh
cd /data/projects/fate/python/examples/toy_example/
python run_toy_example.py 9999 9999 1
A result similar to the following indicates success:
"2020-04-28 18:26:20,789 - secure_add_guest.py[line:126] - INFO: success to calculate secure_sum, it is 1999.9999999999998"
Select 9999 as the guest and execute on 192.168.0.3:
source /data/projects/fate/init_env.sh
cd /data/projects/fate/python/examples/toy_example/
python run_toy_example.py 9999 10000 1
A result similar to the following indicates success::
"2020-04-28 18:26:20,789 - secure_add_guest.py[line:126] - INFO: success to calculate secure_sum, it is 1999.9999999999998"
Start the virtual environment in host and guest respectively.
In the node of guest and host parties, set the fields: guest_id, host_id, arbiter_id in run_task.py according to your actual setting. This file is located in / data / projects / fate / python / examples / min_test_task/.
In the node of host party, run:
source /data/projects/fate/init_env.sh
cd /data/projects/fate/python/examples/min_test_task/
sh run.sh host fast
Get the values of "host_table" and "host_namespace" from test results, and pass them to following command.
In the node of guest party, run:
source /data/projects/fate/init_env.sh
cd /data/projects/fate/python/examples/min_test_task/
sh run.sh guest fast ${host_table} ${host_namespace}
Wait a few minutes, a result showing "success" indicates that the operation is successful. In other cases, if FAILED or stuck, it means failure.
Just replace the word "fast" with "normal" in all the commands, the rest is the same with fast mode.
Fateboard is a web service. Get the ip of fateboard. If fateboard service is launched successfully, you can see the task information by visiting http://${fateboard-ip}:8080. Firewall may need to be opened. When fateboard and fatefow are deployed to separate servers, you need to specify server information of fateflow service on Fateboard page: click the gear icon on the top right corner of Board homepage -> click "add" -> fill in ip, os user, ssh, and password for fateflow service.
Execute under the app user of the target server (192.168.0.1 192.168.0.2 192.168.0.3)
source /data/projects/fate/init_env.sh
cd /data/projects/fate/eggroll
Start / stop / status / restart all:
sh ./bin/eggroll.sh all start/stop/status/restart
Start / stop / status / restart a single module (optional: clustermanager, nodemanager, rollsite):
sh ./bin/eggroll.sh clustermanager start/stop/status/restart
- Start / stop / status / restart fate_flow service
source /data/projects/fate/init_env.sh
cd /data/projects/fate/python/fate_flow
sh service.sh start|stop|status|restart
If you start module by module, you need to start eggroll first and then start fateflow. Fateflow depends on the start of eggroll.
- Start / stop / status / restart fateboard service
cd /data/projects/fate/fateboard
sh service.sh start|stop|status|restart
Start / stop / status / restart mysql service
cd /data/projects/fate/common/mysql/mysql-8.0.13
sh ./service.sh start|stop|status|restart
Execute under the app user of the target server (192.168.0.1 192.168.0.2 192.168.0.3)
#See if the process starts according to the deployment plan
ps -ef | grep -i clustermanager
ps -ef | grep -i nodemanager
ps -ef | grep -i rollsite
ps -ef | grep -i fate_flow_server.py
ps -ef | grep -i fateboard
#Check whether the process port exists according to the deployment plan
#clustermanager
netstat -tlnp | grep 4670
#nodemanager
netstat -tlnp | grep 4671
#rollsite
netstat -tlnp | grep 9370
#fate_flow_server
netstat -tlnp | grep 9360
#fateboard
netstat -tlnp | grep 8080
Service | Log path |
---|---|
eggroll | /data/projects/fate/eggroll/logs |
fate_flow&Task log | /data/projects/fate/python/logs |
fateboard | /data/projects/fate/fateboard/logs |
mysql | /data/projects/fate/common/mysql/mysql-8.0.13/logs |
refer to build guide
Configuration file path: /data/projects/fate/eggroll/conf/eggroll.properties
Configuration file path: eggroll.session.processors.per.node
Assume that the CPU cores (cpu cores) are: c, The number of Nodemanager is: n, The number of tasks to be run simultaneously is p, then:
egg_num=eggroll.session.processors.per.node = c * 0.8 / p
partitions (Number of roll pair partitions) = egg_num * n