VM hadoop setup instructions
Request VM from openstack.cern.ch
Install the admin, frontend, das and mongodb packages in the usual way we install software on a cmsweb VM [1]
I created the /data/wma area and copied the contents of my lxplus.cern.ch:~valya/workspace/wma/ area over there
Create install area
mkdir -p /data/wma/usr/lib/python2.7/site-packages
Create setup.sh file
#!/bin/bash
source /data/srv/current/apps/das/etc/profile.d/init.sh
export JAVA_HOME=/usr/lib/jvm/java
#export PATH=$PATH:$PWD/mongodb/bin
export PYTHONPATH=$PYTHONPATH:/data/wma/usr/lib/python2.7/site-packages
Set up your environment:
source setup.sh
This will set up MongoDB, Python 2.7 and pymongo.
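To verify the environment, a quick check like this should work (a minimal sketch, assuming a MongoDB server is already running locally on its default port 27017):
#!/usr/bin/env python
# sanity check: connect to the local MongoDB and print its version
# (assumes the default port 27017)
import pymongo
client = pymongo.MongoClient('localhost', 27017)
print(client.server_info()['version'])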
Install pip [2] (optional)
curl https://bootstrap.pypa.io/get-pip.py > get-pip.py
python get-pip.py
Install java on the VM
sudo yum install java-1.8.0-openjdk-devel.x86_64
Create /etc/yum.repos.d/cloudera.repo with the following content:
[cloudera]
gpgcheck=0
name=Cloudera
enabled=1
priority=15
baseurl=https://cern.ch/it-service-hadoop/yum/cloudera-cdh542
Install hadoop
sudo yum install hadoop-hdfs.x86_64 hadoop.x86_64 hive.noarch hadoop-libhdfs.x86_64
sudo yum install hadoop-hdfs-namenode.x86_64 hadoop-hdfs-datanode.x86_64
Configure hadoop [3, 4]
sudo cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
sudo alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
ls /etc/hadoop/conf.my_cluster/
sudo vim /etc/hadoop/conf.my_cluster/core-site.xml
Here is the relevant part you should have in core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Adjust hdfs-site.xml
sudo vim /etc/hadoop/conf.my_cluster/hdfs-site.xml
Add the relevant part to hdfs-site.xml:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
</property>
Format local HDFS
sudo -u hdfs hdfs namenode -format
Start HDFS
cd /etc/init.d/
sudo service hadoop-hdfs-datanode start
sudo service hadoop-hdfs-namenode start
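At this point you can check that the namenode came up (a quick sketch, assuming the namenode web UI sits on its default port 50070):
#!/usr/bin/env python
# poke the namenode web UI; expect HTTP 200 if HDFS is up
# (50070 is the stock dfs.namenode.http-address port, an assumption here)
import urllib2
print(urllib2.urlopen('http://localhost:50070/').getcode())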
Create some areas on HDFS
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir /test
sudo -u hdfs hadoop fs -chmod -R 1777 /test
hadoop fs -ls /tmp
# now we are ready to put anything into hadoop, e.g.
hadoop fs -put local_file /tmp
hadoop fs -ls /tmp
Install pydoop
cd /data/wma/soft/pydoop
python setup.py install --prefix=/data/wma/usr
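Once installed, pydoop can talk to the local HDFS directly (a minimal sketch, assuming HDFS is up at the fs.defaultFS configured above, hdfs://localhost:9000, and that local_file exists in the current directory):
#!/usr/bin/env python
# list the /tmp area created earlier and copy a local file into it
import pydoop.hdfs as hdfs
print(hdfs.ls('/tmp'))            # equivalent to: hadoop fs -ls /tmp
hdfs.put('local_file', '/tmp')    # equivalent to: hadoop fs -put local_file /tmp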
Install avro
cd /data/wma/soft/avro-1.7.7
python setup.py install --prefix=/data/wma/usr
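A quick round-trip test of the avro installation (a minimal sketch using the avro-1.7.7 Python 2 API; the Test schema below is just an illustration):
#!/usr/bin/env python
# write one record to an avro file and read it back
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

schema = avro.schema.parse(
    '{"type": "record", "name": "Test",'
    ' "fields": [{"name": "name", "type": "int"}]}')

writer = DataFileWriter(open('test.avro', 'wb'), DatumWriter(), schema)
writer.append({'name': 1})
writer.close()

reader = DataFileReader(open('test.avro', 'rb'), DatumReader())
for record in reader:
    print(record)
reader.close()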
Fetch WMCore framework
git clone [email protected]:dmwm/WMCore.git
Get WMArchive framework
git clone [email protected]:dmwm/WMArchive.git
Remove DAS from the deploy area, otherwise it will be started
rm /data/srv/enabled/das
Adjust the wmarch_config.py file to use your favorite storage, e.g.
mkdir /data/wma/storage
and set ROOTDIR to /data/wma in wmarch_config.py
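For reference, the relevant setting looks roughly like this (an illustrative sketch; check the actual layout of wmarch_config.py):
# wmarch_config.py (illustrative; the surrounding settings may differ)
ROOTDIR = '/data/wma'  # base area for the fileio storage created above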
Start WMArchive service
cd /data/wma
./run_wma.sh
Run a simple test with the fileio storage: post some data and retrieve it back. Here are the commands to use:
# single document injection
curl -D /dev/stdout -X POST -H "Content-type: application/json" -d "{\"data\":{\"name\":1}}" http://localhost:8246/wmarchive/data/
# single document retrieval
curl -D /dev/stdout -H "Content-type: application/json" http://localhost:8246/wmarchive/data/eed35faf3b73d58157aa53d097899e8d
# multiple documents injection
curl -D /dev/stdout -X POST -H "Content-type: application/json" -d "{\"data\":[{\"name\":1}, {\"name\":2}]}" http://localhost:8246/wmarchive/data/
# multiple documents retrieval
curl -D /dev/stdout -X POST -H "Content-type: application/json" -d "{\"query\":[\"eed35faf3b73d58157aa53d097899e8d\", \"bcee13403f554bc14f644ffdeaa93372\"]}" http://localhost:8246/wmarchive/data/
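The same injection and retrieval can be done from Python 2.7 with the standard library alone (a sketch; the uid is the one returned by the single-document injection above):
#!/usr/bin/env python
import json
import urllib2

url = 'http://localhost:8246/wmarchive/data/'
headers = {'Content-type': 'application/json'}

# single document injection
req = urllib2.Request(url, json.dumps({'data': {'name': 1}}), headers)
print(urllib2.urlopen(req).read())

# single document retrieval
uid = 'eed35faf3b73d58157aa53d097899e8d'
print(urllib2.urlopen(urllib2.Request(url + uid, None, headers)).read())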
References:
[1] https://cms-http-group.web.cern.ch/cms-http-group/tutorials/environ/vm-setup.html
[2] https://pip.pypa.io/en/stable/installing/
[3] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation
[4] http://www.cloudera.com/content/www/en-us/documentation/cdh/5-0-x/CDH5-Installation-Guide/cdh5ig_hdfs_cluster_deploy.html?scroll=topic_11_2