| SUCCESS | Type | Versions | OS | DB | KDC | TLS | Comment |
|---|---|---|---|---|---|---|---|
| ✓ | basic | CDP 7.3.1.0 & CM 7.13.1.0 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| ✓ | full | CDP 7.3.1.0 & CM 7.13.1.0 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| ✓ | all-services | CDP 7.3.1.0 & CM 7.13.1.0 & CFM 2.2.9.0 & CSA 1.14.0.0 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| ✓ | streaming | CDP 7.3.1.0 & CM 7.13.1.0 & CFM 2.2.9.0 & CSA 1.14.0.0 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| ✓ | streaming-with-efm | CDP 7.3.1.0 & CM 7.13.1.0 & CFM 2.2.9.0 & EFM 2.2.99.0 & CSA 1.14.0.0 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| ✓ | observability | CDP 7.3.1.0 & CM 7.13.1.0 & Observability 3.5.3 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| | pvc | CDP 7.3.1.0 & CM 7.13.1.0 & PvC 1.5.4 | RHEL 8.8 | Postgres 14 | Free IPA | True | |
| | all-services-pvc | CDP 7.3.1.0 & CM 7.13.1.0 & PvC 1.5.4 | RHEL 8.8 | Postgres 14 | Free IPA | True | |
| | pvc-oc | CDP 7.3.1.0 & CM 7.13.1.0 & PvC 1.5.4 | CENTOS 7.9 | Postgres 14 | Free IPA | True | |
| | all-services-pvc-oc | CDP 7.3.1.0 & CM 7.13.1.0 & PvC 1.5.4 | CENTOS 7.9 | Postgres 14 | Free IPA | True | |
| ✓ | basic | CDP 7.1.9.14 & CM 7.11.3.9 | CENTOS 8.8 | Postgres 14 | MIT KDC | True | On AWS with setup-cluster-on-cloud |
| ✓ | full | CDP 7.1.9.14 & CM 7.11.3.9 | CENTOS 8.8 | Postgres 14 | MIT KDC | True | On AWS with setup-cluster-on-cloud |
| ✓ | streaming | CDP 7.1.9.14 & CM 7.11.3.9 & CFM 2.1.7 | CENTOS 8.8 | Postgres 14 | MIT KDC | True | On AWS with setup-cluster-on-cloud |
| | pvc | CDP 7.1.9.14 & CM 7.11.3.9 & PvC 1.5.4 | CENTOS 8.8 | Postgres 14 | Free IPA | True | On AWS with setup-cluster-on-cloud |
| ✓ | cdp-719 | CDP 7.1.9.1015 & CM 7.11.3.26 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| ✓ | cdp-719-basic | CDP 7.1.9.1015 & CM 7.11.3.26 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| ✓ | cdp-719-all-services | CDP 7.1.9.1015 & CM 7.11.3.26 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| ✓ | cdp-719-streaming | CDP 7.1.9.1015 & CM 7.11.3.26 & CFM 2.1.7.1000 | RHEL 8.10 | Postgres 14 | MIT KDC | True | |
| ✓ | cdp-719-pvc | CDP 7.1.9.1015 & CM 7.11.3.26 & PvC 1.5.4 | RHEL 8.8 | Postgres 14 | Free IPA | True | Use --os-version=8.8 |
| ✓ | cdp-717 | CDH 7.1.7.2026 & CM 7.6.7 | CENTOS 7.9 | Postgres 14 | MIT KDC | True | |
| ✓ | cdh-5 | CDH 5.16 & CM 5.16.2 | CENTOS 7.6 | Postgres 12 | MIT KDC | False | |
| ✓ | cdh-6 | CDH 6.3.4 & CM 6.3.4 | CENTOS 7.6 | Postgres 12 | MIT KDC | False | |
| ✓ | hdp-3 | HDP 3.1.5.6091 & Ambari 2.7.5.0 | CENTOS 7.6 | MySQL | MIT KDC | False | |
| ✓ | hdp-2 | HDP 2.6.5.0 & Ambari 2.6.2.2 | Postgres 12 | MySQL | MIT KDC | False | |
Successful installations are marked with a ✓ in the SUCCESS column.
These scripts require Ansible, with minimum version 2.10 and maximum version 2.16.
Launch the script requirements.sh to set up all requirements before launching the full script.
You need to pass one parameter: the type of machine you are running it on, which can be rhel, debian, suse or mac. For example, on macOS, run:
./requirements.sh mac
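The Ansible version range stated above (2.10 to 2.16) can be checked before launching anything. The following is a minimal sketch and not part of the project's scripts: the function names are hypothetical, and it assumes the first line of `ansible --version` contains the version number.

```shell
# Succeeds when the given Ansible version lies within 2.10 .. 2.16 (inclusive).
ansible_version_supported() {
    version="$1"
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # second version component
    case "$major$minor" in
        *[!0-9]*|"") return 1 ;;   # reject empty or non-numeric components
    esac
    [ "$major" -eq 2 ] && [ "$minor" -ge 10 ] && [ "$minor" -le 16 ]
}

# Prints the version reported by the local "ansible" command
# (assumption: the first output line carries the version, e.g. "ansible [core 2.15.3]").
installed_ansible_version() {
    ansible --version | head -n 1 | sed 's/[^0-9.]//g'
}
```

A possible use is `ansible_version_supported "$(installed_ansible_version)" || echo "Unsupported Ansible version"`.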
To install a cluster (the default is a 10-node CDP 7 cluster with Kerberos and TLS enabled):
export PAYWALL_USER=      # Your paywall user from Cloudera to access archive.cloudera.com
export PAYWALL_PASSWORD=  # Your paywall password from Cloudera to access archive.cloudera.com
export LICENSE_FILE=      # Your license file from Cloudera
export CLUSTER_NAME=      # A name of your choice (ex: cloudera-test)
export NODES=             # Space-separated list of nodes (ex: "node1 node2 node3")
# You must provide as many nodes as are needed for the type of installation you are launching, the default being 10.
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --nodes="${NODES}"
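Since the node list must hold as many nodes as the chosen installation type needs (10 by default), a quick count before launching can avoid a failed run. This is only a sketch with hypothetical helper names; the required count per cluster type has to be supplied by you.

```shell
# Prints the number of entries in a space-separated node list.
count_nodes() {
    # Intentionally unquoted: lets the shell split the list on whitespace.
    set -- $1
    echo "$#"
}

# Succeeds when the list (first argument) holds at least the expected
# number of nodes (second argument).
has_enough_nodes() {
    [ "$(count_nodes "$1")" -ge "$2" ]
}
```

For example, `has_enough_nodes "${NODES}" 10 || echo "Not enough nodes for the default cluster"`.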
N.B.: This assumes that a passwordless connection exists from this machine to all your cluster nodes; otherwise, provide a password with --node-password or a private key file with --node-key
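To find out whether passwordless access is already in place, each node can be probed with a non-interactive SSH login. The sketch below is an illustration only: the function name and the `SSH_CMD` override variable are hypothetical, and it relies on the standard OpenSSH options `BatchMode` and `ConnectTimeout`.

```shell
# Command used to reach the nodes; overridable, defaults to the system ssh.
SSH_CMD="${SSH_CMD:-ssh}"

# Prints every node of a space-separated list that refuses a
# non-interactive (passwordless) SSH login.
nodes_needing_credentials() {
    for node in $1; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ! $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$node" true </dev/null >/dev/null 2>&1; then
            echo "$node"
        fi
    done
}
```

If the function prints any node, pass --node-password or --node-key to setup-cluster.sh as described above.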
Cloud setup (setup-cluster-on-cloud.sh) currently works only with AWS.
Note: This requires Terraform and the AWS CLI to be installed on the machine where you launch this script.
You must also create a key pair in AWS in advance, have its private key available locally, and get an AWS access key and secret access key:
export ACCESS_KEY=""         # Your AWS access key
export SECRET_ACCESS_KEY=""  # Your AWS secret access key
export KEY_PAIR_NAME=""      # Name of a key pair in AWS that will be set to access your machines
export PRIVATE_KEY_PATH=""   # Local private key path to use to access your machines
export WHITELIST_IP=""       # Your IP, so only this IP will be able to access your machines
export PAYWALL_USER=         # Your paywall user from Cloudera to access archive.cloudera.com
export PAYWALL_PASSWORD=     # Your paywall password from Cloudera to access archive.cloudera.com
export LICENSE_FILE=         # Your license file from Cloudera
export CLUSTER_NAME=         # A name of your choice (ex: cloudera-test)
./setup-cluster-on-cloud.sh \
    --cloud-provider="AWS" \
    --aws-access-key=${ACCESS_KEY} \
    --aws-secret-access-key=${SECRET_ACCESS_KEY} \
    --aws-key-pair-name=${KEY_PAIR_NAME} \
    --private-key-path=${PRIVATE_KEY_PATH} \
    --whitelist-ip=${WHITELIST_IP} \
    --os-version=8.7 \
    --setup-etc-hosts=false \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD}
All parameters above must be left as they are, since they are appropriate for AWS machines. After these parameters, you can add any other parameter that works with the script setup-cluster.sh.
The script uses Terraform to provision your machines, sets up connectivity, and then launches setup-cluster.sh with pre-configured parameters to create the desired cluster.
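Because the cloud setup depends on Terraform and the AWS CLI being installed locally, it can help to confirm both are on the PATH first. A minimal sketch with a hypothetical helper name, assuming the executables are called `terraform` and `aws`:

```shell
# Prints each of the given commands that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Running `missing_tools terraform aws` prints nothing when both tools are available.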
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --cluster-type=basic \
    --nodes-base="${NODES}"

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --nodes-base="${NODES}"

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=basic \
    --nodes-base="${NODES}"

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=basic-enc \
    --nodes-kts=<Dedicated Node(s) for KTS> \
    --nodes-base="${NODES}"
CDP 7 - Basic 6 nodes with Free IPA on a dedicated node (all CDP clusters can have Free IPA just by adding --free-ipa=true and providing a node with --node-ipa=) (Kerberos / TLS)
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=basic \
    --free-ipa=true \
    --node-ipa=<One node dedicated to IPA> \
    --nodes-base="${NODES}"
CDP 7 - Streaming cluster (6 nodes basic with Spark 3 and Flink + a VPC of 3 nodes of Kafka/NiFi) (Kerberos / TLS)
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=streaming \
    --nodes-base="${NODES}"
CDP 7 - All Services (6 nodes basic with Spark 3 and Flink + 3 NiFi/Kafka nodes + 1 node for KTS) (Kerberos / TLS)
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=all-services \
    --nodes-kts=<Dedicated Node for KTS> \
    --nodes-base="${NODES}"
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=pvc \
    --nodes-ecs=<Space separated list of 3 nodes> \
    --node-ipa=<One node dedicated to IPA> \
    --nodes-base="${NODES}"
CDP 7 - 6 nodes basic for PVC with OpenShift (Experiences installed on a provided OCP cluster) (Kerberos / TLS / FreeIPA)
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=pvc-oc \
    --kubeconfig-path=<Path to your kubeconfig file> \
    --oc-tar-file-path=<Path to your oc.tar file downloaded from RedHat> \
    --node-ipa=<One node dedicated to IPA> \
    --nodes-base="${NODES}"
CDP 7 - All Services (6 nodes basic with Spark 3 and Flink + 3 NiFi/Kafka nodes + 1 node for KTS + associated with a PvC) (Kerberos / TLS / FreeIPA)
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=all-services-pvc \
    --nodes-kts=<Dedicated Node for KTS> \
    --node-ipa=<Dedicated Node for IPA> \
    --nodes-ecs=<Space separated list of 3 nodes> \
    --nodes-base="${NODES}"
CDP 7 - Observability cluster (requires an existing cluster to be plugged into; it creates a cluster of 6 nodes) (Kerberos / TLS)
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=observability \
    --altus-key-id=<ALTUS key ID provided by Cloudera> \
    --altus-private-key=<path to ALTUS private key provided by Cloudera> \
    --cm-base-url=http://<CM host to connect to OBSERVABILITY>:<Port> \
    --tp-host=<Host in base cluster that will have Telemetry Publisher installed> \
    --nodes-base="${NODES}"
./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cdh-version='7.1.8.1' \
    --cm-version='7.7.3-33365545' \
    --nodes-base="${NODES}"

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --kerberos=false \
    --tls=false \
    --nodes-base="${NODES}"

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=cdh6 \
    --nodes-base="${NODES}"

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=cdh5 \
    --nodes-base="${NODES}"

./setup-cluster.sh \
    --cluster-name=${CLUSTER_NAME} \
    --license-file=${LICENSE_FILE} \
    --paywall-username=${PAYWALL_USER} \
    --paywall-password=${PAYWALL_PASSWORD} \
    --cluster-type=hdp3 \
    --nodes-base="${NODES}"
At the end, CM or Ambari (depending on your installation) should be available at the first node's URL, over http or https and on the matching port (this depends on the TLS parameter, which is false by default for HDP and true by default for CDP).
During the installation, you can also follow its progress by connecting to CM or Ambari.
N.B.: It is recommended not to interfere with the cluster while the Ansible installation is running.
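As an illustration of where to look once the installation finishes, the sketch below builds the Cloudera Manager URL from the node list. It is an assumption-laden example: the function name is hypothetical, and the ports are the usual Cloudera Manager defaults (7180 for http, 7183 for https), which this document does not state and which your installation may override.

```shell
# Prints the Cloudera Manager URL for the first node of a space-separated list.
# Arguments: node list, then "true" or "false" for TLS.
cm_url() {
    tls="$2"
    # Intentionally unquoted: split the list so that $1 becomes the first node.
    set -- $1
    first_node="${1:-}"
    if [ "$tls" = "true" ]; then
        echo "https://${first_node}:7183"   # assumed default TLS port
    else
        echo "http://${first_node}:7180"    # assumed default non-TLS port
    fi
}
```

For a default CDP installation this could be called as `cm_url "${NODES}" true`.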
At the end of the installation, if it completed successfully, users are created on the machines along with their keytabs, which are retrieved to your local computer under /tmp/; krb5.conf is also retrieved.
Moreover, it is also possible to launch some random data generation into various systems.
All default passwords are Cloudera1234
This describes in detail the steps performed during the installation, in order; each one can be skipped and hence launched separately.
Once you have gathered all the previous requirements, you can launch the installation, which mainly consists of 5 steps:
- Prepare your machines
- Launch the installation from the first node of your cluster using the appropriate Ansible playbook and files
- Do post-install configuration (mainly for CDP)
- Create users on your cluster
- Load some data into your cluster
Each step can be skipped (see command line help).
This group of scripts, coordinated by the main script setup-cluster.sh, aims to configure the provided machines and launch a CDP (or HDP, CDH) installation with Ansible.
Finally, some extra configuration steps can be applied and random data generated into different services.
All of this is driven from your machine only.
This script relies on Ansible scripts that must be accessible from your machine (if they are not, please set up an internal web server and provide its URL through the command line).
The Ansible scripts also rely on the Cloudera repository to access CDP, CM, HDP, Ambari, etc. (if it is not accessible, please set up an internal web server and provide its URL through the command line).
This script also relies on a GitHub repository to load data (if it is not accessible, please set up an internal web server and provide its URL through the command line).
This step uses Playbook hosts_setup.
If you did not set the parameter --setup to false, it prepares all machines by setting up passwordless SSH and pushing the required files to them.
N.B.: This step only needs to be done once and can then be bypassed if you reuse the same machines.
This step uses Playbook ansible_install_preparation and then runs commands directly on the host to launch the Ansible installation there.
The first playbook can be skipped by setting the parameter --install to false (it is true by default).
It cleans up the first node, creates a directory ~/deployment/ansible-repo/, fetches the Ansible repository as a zip into it, and adds the files for your installation to it.
Then, the Ansible command corresponding to the installation is launched directly on the first node.
This step uses Playbook post_install.
If you install a CDP cluster and leave the parameter --post-install set to true, it performs some extra steps, such as disabling automatic logout on CM and fixing various potential bugs.
This step uses Playbook user_creation.
If you did not explicitly set the parameter --user-creation to false and the installation completed successfully, the users defined in the extra_vars of user_creation are created.
They are present on all nodes, with their /home directory containing their keytabs.
Their keytabs are also fetched into your /tmp directory along with the krb5.conf, allowing you to kinit directly from your computer.
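The retrieved files can then be used for a local kinit along these lines. This is a sketch only: the function names are hypothetical, and it assumes the keytabs are stored as `<user>.keytab` next to `krb5.conf`, which may not match the actual file names. It uses the standard MIT Kerberos `KRB5_CONFIG` variable and `kinit -kt`.

```shell
# Prints the user names for which a keytab exists in the given directory
# (assumption: keytabs are named <user>.keytab).
list_keytab_users() {
    keytab_dir="${1:-/tmp}"
    for keytab in "$keytab_dir"/*.keytab; do
        if [ -e "$keytab" ]; then
            basename "$keytab" .keytab
        fi
    done
}

# Obtains a Kerberos ticket for a user from the retrieved keytab and krb5.conf.
kinit_as() {
    user="$1"
    keytab_dir="${2:-/tmp}"
    KRB5_CONFIG="$keytab_dir/krb5.conf" kinit -kt "$keytab_dir/$user.keytab" "$user"
}
```

For instance, `list_keytab_users /tmp` shows the candidates, and `kinit_as <user>` requests a ticket for one of them.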
This step uses Playbook data_load.
If you leave the parameter --data-load set to true, a data loading step starts (currently only on CDP, HDP 2 and CDH 5) to generate data into the existing services of the platform: HDFS, HBase, Hive, etc.
It is based on the random-datagen project.
Note that this step is fully extensible: to specify how data should be generated, add new files in the folder playbooks/data_load/generate_data/models
N.B.: This step also creates the required Ranger policies, which are likewise extensible by adding policies in playbooks/data_load/ranger_policies/push_policies/policies
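Since each step has its own true/false parameter (--setup, --install, --post-install, --user-creation, --data-load), running a single step amounts to disabling the others. The helper below is a hypothetical sketch that only builds the flag list; it does not launch anything.

```shell
# Prints the step flags needed to run exactly one step and skip the others.
only_step_args() {
    wanted="$1"
    args=""
    for step in setup install post-install user-creation data-load; do
        if [ "$step" = "$wanted" ]; then value=true; else value=false; fi
        # Append the flag, separated by a space once args is non-empty.
        args="${args}${args:+ }--${step}=${value}"
    done
    echo "$args"
}
```

The output of `only_step_args data-load` can be appended to a setup-cluster.sh command line to rerun only the data load on an existing cluster.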
Once you are familiar with these scripts, you can easily tune them using command-line parameters to provide your own cluster files and repositories.
To provide a quick new definition of a cluster:
- Copy-paste the directory ansible-cdp and name it, for example, ansible-cdp-configured
- Make all your modifications in the files of your copied directory
- Launch the script with the argument --cluster-type=ansible-cdp-configured (it will automatically take the files under the ansible-cdp-configured/ directory)
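The copy step above can be sketched as follows. The function name is hypothetical, and the example assumes you run it from the directory that contains ansible-cdp; adjust the paths to your checkout.

```shell
# Copies an existing cluster definition directory to a new name,
# refusing to overwrite anything that already exists.
clone_cluster_definition() {
    source_dir="$1"
    target_dir="$2"
    [ -d "$source_dir" ] || { echo "Missing directory: $source_dir" >&2; return 1; }
    [ ! -e "$target_dir" ] || { echo "Already exists: $target_dir" >&2; return 1; }
    cp -R "$source_dir" "$target_dir"
}
```

After `clone_cluster_definition ansible-cdp ansible-cdp-configured` and your edits, pass --cluster-type=ansible-cdp-configured to setup-cluster.sh.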
Those steps can be launched independently, and you can configure them to create more users or load more and different data.
Look at extra_vars.yml inside the playbooks folder to learn more about the possibilities.
Private Cloud setup (on ECS or OC) can also be launched independently on a running cluster.
Configuration of a private cloud cluster can also be launched independently (use --install-pvc=false but --pvc=true to configure, but not re-install, your PvC).
In extra_vars.yml you can provide the CDWs, CDEs and CMLs that will be provisioned for you, as well as the rights you expect for your users.
- TLS is not set for HDP & CDH clusters
- Data loading is not made for HDP 3 & CDH 6 clusters
- Free IPA is only available for CDP clusters
Please feel free to contribute and help solve and implement TODOs listed in TODOs.adoc