Running Hermes
One of the easiest ways to run the Hermes code is to run Spark as a standalone instance. The instructions below are for a Linux box in particular, but only the Anaconda download differs for other operating systems.
There are currently both Python 2.7 and Python 3.5 compatible versions of the Hermes project. Both versions work with Python 2.7, but the Python 2.7 version will not run on Python 3.5. Both versions work best with Spark 2.0.
Install Anaconda
wget https://repo.continuum.io/archive/Anaconda2-4.2.0-Linux-x86_64.sh
chmod +x Anaconda2-4.2.0-Linux-x86_64.sh
./Anaconda2-4.2.0-Linux-x86_64.sh
Install Hermes dependencies
conda install networkx xlrd beautifulsoup4
pip install rdflib pyshp
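After installing, a quick sanity check can confirm that the dependencies resolve on the Python interpreter you intend to use. This is a minimal sketch; the helper name `check_missing` is ours, not part of Hermes:

```python
import importlib

def check_missing(modules):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# Module names as imported, which can differ from the package names above
# (beautifulsoup4 is imported as bs4, pyshp as shapefile).
deps = ["networkx", "xlrd", "bs4", "rdflib", "shapefile"]
print(check_missing(deps))  # an empty list means everything is installed
```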
Install Spark
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz
tar xzf spark-2.0.2-bin-hadoop2.7.tgz
Clone the Hermes repository. While you are at it, create a zip file of the Hermes code, which is what we use to run scripts. We have found that zipping the code yourself generally yields better results than using a zip file we push onto GitHub.
git clone https://github.com/Lab41/hermes.git
cd hermes/
zip -r hermes.zip src __init__.py
cd ..
Convert the data into JSON files. The exact format depends on which dataset you are working with. For example, to convert the Kaggle dataset, you would do the following (adjusting the paths to wherever your files are located):
python hermes/src/utils/kaggle_etl/scripts_to_json.py /path/to/kaggle/files/ -o /output/directory/
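Spark's JSON reader expects newline-delimited JSON, one object per line; the converters target that layout, but check the generated files to confirm. The record below is illustrative only, not the actual Kaggle schema:

```python
import json

# Hypothetical record in newline-delimited JSON; the real field names
# depend on the dataset and converter.
line = '{"user_id": 1, "item_id": 42, "rating": 4.0}'

record = json.loads(line)
print(record["rating"])
```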
Run Spark
cd spark-2.0.2-bin-hadoop2.7
PYSPARK_PYTHON=python3 bin/pyspark --master local[30] --driver-memory 8g
While you are in the Spark shell, execute a Hermes Python script
# Python 2 Version
execfile('hermes_script.py')
# Python 3 Version
exec(open('hermes_script.py').read())