
Production installation of DataPusher for CKAN 2.5.2 on CentOS 6.8

Lili Yin edited this page Jul 25, 2016 · 4 revisions

Preconditions

This guide assumes that:

  • Your system runs CentOS, not Ubuntu. On Ubuntu, follow the DataPusher documentation at http://docs.ckan.org/projects/datapusher/en/latest/ instead.
  • You installed CKAN from source.
  • You have already installed CKAN on this server in the default location described in the CKAN install documentation (/usr/lib/ckan/default). If so, you should be able to run the following commands as written; if not, adapt the paths to your setup.

1. Production installation and Setup

These instructions set up the DataPusher webservice on Apache running on port 8800.

(1) install requirements for the DataPusher

yum install python-devel python-virtualenv libxslt-devel libxml2-devel git
yum groupinstall "Development Tools"

(2) create a virtualenv for datapusher

virtualenv /usr/lib/ckan/datapusher

(3) create a source directory and switch to it

mkdir /usr/lib/ckan/datapusher/src
cd /usr/lib/ckan/datapusher/src

(4) clone the source (always target the stable branch)

git clone -b stable https://github.com/ckan/datapusher.git

(5) install the DataPusher and its requirements

cd datapusher
/usr/lib/ckan/datapusher/bin/pip install -r requirements.txt
/usr/lib/ckan/datapusher/bin/python setup.py develop

Tips: When I ran pip install -r requirements.txt, it printed "…… please update setuptools before installing ……". I therefore upgraded setuptools before running pip install -r requirements.txt again:

/usr/lib/ckan/datapusher/bin/pip install --upgrade setuptools

(6) copy the standard Apache config file

Tips: use deployment/datapusher.apache2-4.conf if you are running under Apache 2.4. You can use httpd -v to see your Apache version.

cp deployment/datapusher.conf /etc/httpd/conf.d/datapusher.conf

(7) edit the Apache config file

Edit /etc/httpd/conf.d/datapusher.conf and change the two log directives so that they read:

ErrorLog /var/log/httpd/datapusher.error.log
CustomLog /var/log/httpd/datapusher.custom.log combined

(8) copy the standard DataPusher wsgi file

Tips: see note below if you are not using the default CKAN install location.

cp deployment/datapusher.wsgi /etc/ckan/

(9) copy the standard DataPusher settings.

cp deployment/datapusher_settings.py /etc/ckan/

(10) open up port 8800 on Apache where the DataPusher accepts connections.

Tips: run these two commands only once. Otherwise you will need to edit /etc/httpd/conf/httpd.conf manually to remove the duplicated lines.

sh -c 'echo "NameVirtualHost *:8800" >> /etc/httpd/conf/httpd.conf'
sh -c 'echo "Listen 8800" >> /etc/httpd/conf/httpd.conf'
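Because running the two commands twice leaves duplicate directives in the file, an idempotent variant can be sketched in Python. This is only a minimal sketch and not part of the DataPusher distribution: the helper name append_line_once is hypothetical, and it assumes a Python 3 interpreter is available on the server.

```python
import os


def append_line_once(path, line):
    """Append `line` to the file at `path` unless an identical line exists.

    Returns True if the line was appended, False if it was already present.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    # Compare against whole lines so that e.g. "Listen 88000" does not match.
    if line in (row.strip() for row in content.splitlines()):
        return False
    with open(path, "a") as handle:
        # Make sure the new directive starts on its own line.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True
```

Run as root, calling append_line_once("/etc/httpd/conf/httpd.conf", "NameVirtualHost *:8800") and then append_line_once("/etc/httpd/conf/httpd.conf", "Listen 8800") would have the same effect as the two commands above, but could be repeated safely.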

(11) add port 8800 to the http_port_t type in SELinux

semanage port -a -t http_port_t -p tcp 8800

(12) add port 8800 to iptables

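Edit the file /etc/sysconfig/iptables and insert the following line above any REJECT rule for the INPUT chain (rules are evaluated in order, so a line placed after a REJECT rule never takes effect):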

-A INPUT -m state --state NEW -m tcp -p tcp --dport 8800 -j ACCEPT

Restart iptables

service iptables restart
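The placement of the rule matters because iptables evaluates rules in order. As a rough illustration, the edit can be sketched in Python; this assumes the rules file has an INPUT chain ending in a REJECT rule followed by a COMMIT line, and the function name insert_before_reject is hypothetical.

```python
RULE = "-A INPUT -m state --state NEW -m tcp -p tcp --dport 8800 -j ACCEPT"


def insert_before_reject(rules_text, rule=RULE):
    """Return rules_text with `rule` placed before the first INPUT REJECT line.

    If the rule is already present, the text is returned unchanged. If no
    INPUT REJECT line exists, the rule is placed before COMMIT instead.
    """
    lines = rules_text.splitlines()
    if rule in (line.strip() for line in lines):
        return rules_text
    target = None
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("-A INPUT") and "-j REJECT" in stripped:
            target = index
            break
    if target is None:
        # Fall back to the end of the table, just before COMMIT.
        for index, line in enumerate(lines):
            if line.strip() == "COMMIT":
                target = index
                break
    if target is None:
        raise ValueError("no INPUT REJECT rule or COMMIT line found")
    lines.insert(target, rule)
    return "\n".join(lines) + "\n"
```

The function only transforms text; reading /etc/sysconfig/iptables, writing the result back, and restarting iptables are left to you, and keeping a backup copy of the file first is advisable.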

(13) restart Apache

service httpd restart

Note:

If you are installing the DataPusher in a location other than the default one, adapt the following line in the datapusher.wsgi file so that it points to the virtualenv you are using:

activate_this = os.path.join('/usr/lib/ckan/datapusher/bin/activate_this.py')
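For instance, if the virtualenv had been created in a different directory, the line might be adapted as follows. The path /opt/ckan/datapusher and the VENV_DIR name are purely hypothetical; substitute the directory you passed to the virtualenv command.

```python
import os

# Hypothetical virtualenv location; replace it with the directory you passed
# to the `virtualenv` command in step (2).
VENV_DIR = '/opt/ckan/datapusher'

# Same line as in datapusher.wsgi, but built from the custom location.
activate_this = os.path.join(VENV_DIR, 'bin', 'activate_this.py')
```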

2. CKAN Configuration

To tell CKAN where this web service is located, add the following to the [app:main] section of your CKAN configuration file (generally located at /etc/ckan/default/development.ini):

ckan.datapusher.url = http://0.0.0.0:8800/

The DataPusher also requires the ckan.site_url configuration option to be set in your configuration file:

ckan.site_url = http://your.ckan.instance.com
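A quick way to confirm that both options ended up in the right section is to parse the file. The following is a minimal sketch, assuming Python 3 and an ini file that the standard configparser module can read; the function name missing_options is hypothetical.

```python
import configparser

REQUIRED_OPTIONS = ("ckan.datapusher.url", "ckan.site_url")


def missing_options(ini_text):
    """Return the required options that are absent or empty in [app:main]."""
    # Interpolation is disabled because CKAN ini files contain values such as
    # %(here)s that configparser would otherwise try to expand.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("app:main"):
        return list(REQUIRED_OPTIONS)
    return [
        name
        for name in REQUIRED_OPTIONS
        if not parser.get("app:main", name, fallback="").strip()
    ]
```

Passing the contents of your configuration file to missing_options should return an empty list once both settings are in place.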

(1) CKAN 2.2 and above

If you are using at least CKAN 2.2, you just need to add datapusher to the plugins in your CKAN configuration file:

ckan.plugins = <other plugins> datapusher

Restart Apache:

service httpd restart

(2) CKAN 2.1

If you are using CKAN 2.1, the logic for interacting with the DataPusher is located in a separate extension, ckanext-datapusherext.

To install it, follow these steps:

1) go to the CKAN source directory

cd /usr/lib/ckan/default/src

2) clone the DataPusher CKAN extension

git clone https://github.com/ckan/ckanext-datapusherext.git

3) install datapusherext

cd ckanext-datapusherext
/usr/lib/ckan/default/bin/python setup.py develop

4) add datapusherext to the plugins line in /etc/ckan/default/production.ini:

ckan.plugins = <other plugins> datapusherext

5) restart Apache:

service httpd restart

3. Using the DataPusher

The DataPusher will work without any further configuration as long as the datapusher (or datapusherext for version 2.1) plugin is installed and added to the CKAN config file.

The DataPusher will attempt to load any resource whose format is csv or xls into the DataStore.

(1) CKAN 2.2 and above

When editing a resource in CKAN (clicking the Manage button on a resource page), a new tab will appear named DataStore. This will contain a log of the last attempted upload and a button named Upload to DataStore to upload the data.

(2) CKAN 2.1

If you want to retry an upload, go to the resource edit form in CKAN and click the “Update” button to resubmit the resource metadata. This will retrigger an upload.

Configuring the maximum upload size

By default the DataPusher will only attempt to process files smaller than 10 MB. To change this value, specify the MAX_CONTENT_LENGTH setting in datapusher_settings.py:

MAX_CONTENT_LENGTH = 1024 # 1 KB maximum size
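The value is given in bytes, as the 1024-byte example indicates. A minimal sketch of a larger limit, assuming you want to allow files of up to 50 MB (the figure is only an illustration):

```python
# datapusher_settings.py (excerpt)
# The limit is expressed in bytes, so spelling out the arithmetic keeps the
# intended size readable. 50 MB is an illustrative value, not a recommendation.
MAX_CONTENT_LENGTH = 50 * 1024 * 1024  # 52428800 bytes
```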

Configuring the guessing of types

The datapusher uses Messytables in order to infer data types. A default configuration is provided which is sufficient in many cases. Depending on your data however, you may need to implement your own Messytables types.

You can specify the types to use with the following settings in your datapusher_settings.py:

TYPES = [messytables.StringType, messytables.DecimalType, YourCustomType...]
TYPE_MAPPING = {'String': 'text', 'Decimal': 'numeric', 'YourCustom': 'timestamp'... }

4. Debugging

Test the configuration

To test whether the DataPusher service is working, run:

curl 0.0.0.0:8800

The result should look something like:

{
"help": "\n        Get help at:\n        http://ckan-service-provider.readthedocs.org/."
}
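The same check can be scripted, for example as part of a monitoring job. The sketch below assumes Python 3 and treats any JSON object containing a "help" key as a healthy reply, which is an assumption based on the sample output above; both function names are hypothetical.

```python
import json
import urllib.request


def is_datapusher_response(body):
    """Return True if `body` is a JSON object containing a "help" key."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. an Apache error page.
        return False
    return isinstance(payload, dict) and "help" in payload


def check_datapusher(url="http://0.0.0.0:8800", timeout=5):
    """Fetch `url` and report whether the reply looks like the sample above."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return is_datapusher_response(response.read().decode("utf-8"))
```

Calling check_datapusher() on the server should return True when the service is up; a connection error raised by urlopen points to Apache, SELinux, or the firewall rather than to the DataPusher itself.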