Production installation of DataPusher for CKAN 2.5.2 on CentOS 6.8
Precondition
- This guide assumes that:
- Your system runs CentOS, not Ubuntu. If it runs Ubuntu, follow http://docs.ckan.org/projects/datapusher/en/latest/ instead.
- You installed CKAN from source.
- You have already installed CKAN on this server in the default location described in the CKAN install documentation (/usr/lib/ckan/default). If so, you should be able to run the following commands directly; if not, you will need to adapt the paths to your setup.
These instructions set up the DataPusher webservice on Apache running on port 8800.
(1) install requirements for the DataPusher
yum install python-devel python-virtualenv libxslt-devel libxml2-devel git
yum groupinstall "Development Tools"
(2) create a virtualenv for datapusher
virtualenv /usr/lib/ckan/datapusher
(3) create a source directory and switch to it
mkdir /usr/lib/ckan/datapusher/src
cd /usr/lib/ckan/datapusher/src
(4) clone the source (always target the stable branch)
git clone -b stable https://github.com/ckan/datapusher.git
(5) install the DataPusher and its requirements
cd datapusher
/usr/lib/ckan/datapusher/bin/pip install -r requirements.txt
/usr/lib/ckan/datapusher/bin/python setup.py develop
Tips: When I installed requirements.txt, pip printed "…… please update setuptools before installing ……". So I upgraded setuptools before running ……pip install -r requirements.txt:
/usr/lib/ckan/datapusher/bin/pip install --upgrade setuptools
(6) copy the standard Apache config file
Tips: use deployment/datapusher.apache2-4.conf if you are running under Apache 2.4. You can use httpd -v to see your Apache version.
cp deployment/datapusher.conf /etc/httpd/conf.d/datapusher.conf
(7) edit the Apache config file
Edit /etc/httpd/conf.d/datapusher.conf. Change the following lines so that they read:
ErrorLog /var/log/httpd/datapusher.error.log
CustomLog /var/log/httpd/datapusher.custom.log combined
(8) copy the standard DataPusher wsgi file
Tips: see note below if you are not using the default CKAN install location.
cp deployment/datapusher.wsgi /etc/ckan/
(9) copy the standard DataPusher settings.
cp deployment/datapusher_settings.py /etc/ckan/
(10) open up port 8800 on Apache where the DataPusher accepts connections.
Tips: make sure you run these 2 commands only once; otherwise you will need to manually edit /etc/httpd/conf/httpd.conf to remove the duplicate lines.
sh -c 'echo "NameVirtualHost *:8800" >> /etc/httpd/conf/httpd.conf'
sh -c 'echo "Listen 8800" >> /etc/httpd/conf/httpd.conf'
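Because the two echo commands above append unconditionally, running them twice leaves duplicate directives in httpd.conf. One way to make this step safe to repeat is sketched below in Python. The helper name append_line_once is hypothetical, and the sketch assumes a separately installed Python 3 (the stock CentOS 6 interpreter is Python 2.6).

```python
from pathlib import Path


def append_line_once(path, line):
    """Append `line` to the file at `path` unless an identical line is already there.

    Returns True if the file was changed, False if the line was already present.
    """
    config = Path(path)
    text = config.read_text() if config.exists() else ""
    # Compare against whole lines so that a commented-out "#Listen 8800" does not count.
    if any(entry.strip() == line for entry in text.splitlines()):
        return False
    # Start on a fresh line in case the file does not end with a newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with config.open("a") as handle:
        handle.write(separator + line + "\n")
    return True
```

Run as root, append_line_once("/etc/httpd/conf/httpd.conf", "Listen 8800") would add the directive on the first call and do nothing on later calls.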
(11) add port 8800 to the http_port_t type in SELinux
semanage port -a -t http_port_t -p tcp 8800
(12) add port 8800 to iptables
Edit the file /etc/sysconfig/iptables by inserting the following line in the INPUT chain, before any REJECT rule (iptables evaluates rules in order, so a line placed after a REJECT never matches):
-A INPUT -m state --state NEW -m tcp -p tcp --dport 8800 -j ACCEPT
Restart iptables
service iptables restart
(13) restart Apache
service httpd restart
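Once Apache has restarted, it can be useful to confirm that something is actually accepting TCP connections on port 8800. The following is a minimal sketch, assuming Python 3; the function name port_is_open is hypothetical. It only checks TCP reachability, so run it from another machine if you also want to exercise the iptables rule.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, port_is_open("127.0.0.1", 8800) on the server itself should return True when the Listen directive has taken effect.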
Note:
If you are installing the DataPusher in a location other than the default one, you need to adapt the following line in the datapusher.wsgi file to point to the virtualenv you are using:
activate_this = os.path.join('/usr/lib/ckan/datapusher/bin/activate_this.py')
In order to tell CKAN where this webservice is located, the following must be added to the [app:main] section of your CKAN configuration file (generally located at /etc/ckan/default/production.ini):
ckan.datapusher.url = http://0.0.0.0:8800/
The DataPusher also requires the ckan.site_url configuration option to be set in your configuration file:
ckan.site_url = http://your.ckan.instance.com
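A quick way to confirm that both options made it into the [app:main] section is to parse the configuration file. The sketch below assumes Python 3 and treats the file as a plain INI file; the function name missing_datapusher_options is hypothetical.

```python
import configparser

REQUIRED_OPTIONS = ("ckan.datapusher.url", "ckan.site_url")


def missing_datapusher_options(ini_text):
    """Return the required options that are absent or empty in [app:main]."""
    # Interpolation is disabled because CKAN config files contain %(...)s
    # placeholders; only "=" is treated as a delimiter so URLs parse cleanly.
    parser = configparser.ConfigParser(
        interpolation=None, delimiters=("=",), strict=False
    )
    parser.read_string(ini_text)
    if not parser.has_section("app:main"):
        return list(REQUIRED_OPTIONS)
    return [
        option
        for option in REQUIRED_OPTIONS
        if not parser.get("app:main", option, fallback="").strip()
    ]
```

Passing the contents of your CKAN configuration file to missing_datapusher_options should return an empty list when both settings are present.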
(1)CKAN 2.2 and above
If you are using at least CKAN 2.2, you just need to add datapusher to the plugins in your CKAN configuration file:
ckan.plugins = <other plugins> datapusher
Restart apache:
service httpd restart
(2)CKAN 2.1
If you are using CKAN 2.1, the logic for interacting with the DataPusher is located in a separate extension, ckanext-datapusherext.
To install it, follow these steps:
1)go to the ckan source directory
cd /usr/lib/ckan/default/src
2)clone the DataPusher CKAN extension
git clone https://github.com/ckan/ckanext-datapusherext.git
3)install datapusherext
cd ckanext-datapusherext
/usr/lib/ckan/default/bin/python setup.py develop
4)Add datapusherext to the plugins line in /etc/ckan/default/production.ini:
ckan.plugins = <other plugins> datapusherext
5)Restart apache:
service httpd restart
The DataPusher will work without any more configuration as long as the datapusher (or datapusherext for version 2.1) plugin is installed and added to the ckan config file.
The DataPusher will attempt to load any file with a format of csv or xls into the DataStore.
(1)CKAN 2.2 and above
When editing a resource in CKAN (clicking the Manage button on a resource page), a new tab will appear named DataStore. This will contain a log of the last attempted upload and a button named Upload to DataStore to upload the data.
(2)CKAN 2.1
If you want to retry an upload go into the resource edit form in CKAN and just click the “Update” button to resubmit the resource metadata. This will retrigger an upload.
Configuring the maximum upload size
By default the datapusher will only attempt to process files less than 10 MB in size. To change this value you can specify the MAX_CONTENT_LENGTH setting in datapusher_settings.py:
MAX_CONTENT_LENGTH = 1024 # 1Kb maximum size
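To raise the limit rather than lower it, the value can be written as a product so the size is easy to read. This sketch assumes the setting is a byte count, as the 1024 / 1Kb sample above suggests; the 20 MiB figure is only an illustration.

```python
# Hypothetical example value for datapusher_settings.py:
# 20 MiB expressed in bytes (20 * 1024 * 1024 = 20971520).
MAX_CONTENT_LENGTH = 20 * 1024 * 1024
```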
Configuring the guessing of types
The datapusher uses Messytables in order to infer data types. A default configuration is provided which is sufficient in many cases. Depending on your data however, you may need to implement your own Messytables types.
You can specify the types to use with the following settings in your datapusher_settings.py:
TYPES = [messytables.StringType, messytables.DecimalType, YourCustomType...]
TYPE_MAPPING = {'String': 'text', 'Decimal': 'numeric', 'YourCustom': 'timestamp'... }
Test the configuration
To test whether the DataPusher service is working, run:
curl 0.0.0.0:8800
The result should look something like:
{
"help": "\n Get help at:\n http://ckan-service-provider.readthedocs.org/."
}
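The same check can be scripted, for example as part of a deployment test. The sketch below assumes Python 3 and only verifies that the response is a JSON object with a "help" key, as in the sample output above; the function names are hypothetical.

```python
import json
import urllib.request


def looks_like_datapusher(body):
    """Return True if `body` is a JSON object with a "help" key."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. an Apache error page.
        return False
    return isinstance(payload, dict) and "help" in payload


def check_datapusher(url="http://0.0.0.0:8800/", timeout=5):
    """Fetch `url` and report whether the response resembles the help message."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return looks_like_datapusher(response.read())
    except OSError:
        # Connection refused, timeout, or an HTTP error status.
        return False
```

A False result from check_datapusher() usually means Apache is not listening on port 8800 or the WSGI application failed to start; the error log configured in step (7) is the first place to look.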