Setup datastore and datapusher #27

Closed
2 tasks
etj opened this issue Apr 15, 2021 · 20 comments · Fixed by #35
Assignees
Labels
C195-CREA-AA-2020-AGRIDIGIT DevOps This label marks this as a DevOps activity

Comments

@etj
Member

etj commented Apr 15, 2021

  • add dockerized datapusher app
  • setup datastore plugin
@randomorder randomorder added the "estimate needed" label (we need to do an estimate for this ticket) Apr 21, 2021
@randomorder
Member

@lpasquali please make sure you have all the info you need to move on with the task and assign an estimate to the issue so we can schedule it. Make changes to the checklist as needed

@etj
Member Author

etj commented Apr 21, 2021

Official doc here:
https://docs.ckan.org/en/2.9/maintaining/datastore.html

This is the config of the datapusher image in the official docker-compose:
https://github.com/ckan/ckan/blob/ckan-2.9.2/contrib/docker/docker-compose.yml#L45-L49
It's OK to run it in the CKAN VM.

In the ckan.ini file we need to add datastore to the ckan.plugins property list.
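
As a rough illustration of the ckan.ini change described above, the sketch below appends plugin names to a space-separated ckan.plugins value without duplicating entries. The helper name `add_plugins` and the sample plugin list are hypothetical; only `datastore` and the `ckan.plugins` property come from this thread.

```python
def add_plugins(current_value, new_plugins):
    """Return a ckan.plugins value with new_plugins appended, skipping duplicates."""
    plugins = current_value.split()
    for name in new_plugins:
        if name not in plugins:
            plugins.append(name)
    return " ".join(plugins)


# Hypothetical existing value of ckan.plugins in ckan.ini.
existing = "stats text_view image_view"
updated = add_plugins(existing, ["datastore"])
print("ckan.plugins = " + updated)
```

The resulting line can then be pasted into ckan.ini; running the helper twice leaves the value unchanged.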

The datastore DB should already be set up. In any case, make sure it can be accessed properly both by the datapusher app and from inside CKAN (tabular data should be displayed in a grid; the current deployment does not parse ";" correctly as the field delimiter).
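
To make the delimiter problem concrete, here is a minimal sketch using only Python's standard csv module. It is not how datapusher detects delimiters (that is not described in this thread); the sample data and the `parse_rows` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample: a CSV export that uses ";" as the field delimiter.
SAMPLE = "name;area_ha;year\nfarm A;12,5;2020\nfarm B;7,25;2021\n"


def parse_rows(text, delimiter):
    """Parse CSV text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# With a comma, the decimal commas split the data rows in the wrong place.
wrong = parse_rows(SAMPLE, ",")
# With ";" each row has the three expected columns.
right = parse_rows(SAMPLE, ";")
print(len(wrong[1]), len(right[1]))
```

A grid view built from the comma-parsed rows would show misaligned columns, which is the symptom to look for when checking the deployment.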

@etj etj added the "DevOps" label (This label marks this as a DevOps activity) Apr 28, 2021
@randomorder
Member

"@lpasquali please make sure you have all the info you need to move on with the task and assign an estimate to the issue so we can schedule it. Make changes to the checklist as needed"

@lpasquali ?

@lpasquali lpasquali removed the "estimate needed" label (we need to do an estimate for this ticket) May 3, 2021
@lpasquali
Contributor

@randomorder I think I can work on it; I have added an estimate.

@randomorder randomorder assigned lpasquali and unassigned randomorder May 7, 2021
@lpasquali
Contributor

@etj I think I implemented the CKAN datapusher/datastore plugins correctly. If I move data into the datastore, the CSV resources do get formatted correctly, as described above in #27 (comment).
The only problem is that for resources larger than 10 MB the import feature in the GUI does not work. I do not know whether pushing the original JSON files to the datastore API instead of the default one will work; I have not yet found how to import data that way.
The 10 MB limit is hardcoded here

Also, the "official" datapusher image is 4 years old:
image: clementmouchet/datapusher
as can be seen from its repository:
https://github.com/clementmouchet/datapusher

The current upstream datapusher repository has no Dockerfile, but its code supports setting MAX_CONTENT_LENGTH through an environment variable:

https://github.com/ckan/datapusher

I suggest adding another submodule for https://github.com/ckan/datapusher, reusing the Dockerfile from https://github.com/clementmouchet/datapusher, and building our own datapusher image on top of the better-maintained datapusher code.
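
The idea of replacing a hardcoded size limit with an environment variable can be sketched as follows. This is not datapusher's actual code: the helper names are hypothetical, the variable name MAX_CONTENT_LENGTH is taken from the comment above, and the default of 10 * 1024 * 1024 bytes is an assumption, since the exact byte value behind the "10 MB" limit is not stated in this thread.

```python
import os

# Assumed default; the thread only says "10 MB".
DEFAULT_MAX_CONTENT_LENGTH = 10 * 1024 * 1024


def max_content_length(environ=None):
    """Return the upload size limit in bytes, overridable via MAX_CONTENT_LENGTH."""
    env = os.environ if environ is None else environ
    raw = env.get("MAX_CONTENT_LENGTH")
    if raw is None or raw.strip() == "":
        return DEFAULT_MAX_CONTENT_LENGTH
    return int(raw)


def is_too_large(size_in_bytes, environ=None):
    """Return True when a resource exceeds the configured limit."""
    return size_in_bytes > max_content_length(environ)


# Example: raise the limit to 50 MB through the (simulated) environment.
print(is_too_large(20 * 1024 * 1024, {"MAX_CONTENT_LENGTH": str(50 * 1024 * 1024)}))
```

In a container setup, the variable would typically be set in the compose file for the datapusher service, so the image itself does not need rebuilding to change the limit.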

@lpasquali
Contributor

PR with work up to now: #35

@etj
Member Author

etj commented May 10, 2021

It seems that the official ckan docker file references an old fork for the datapusher, which has not been updated in 6 years.

Issue opened about this in the official repo: ckan/datapusher#228.

Currently working on the dockerization of the master branch of the official repo: https://github.com/geosolutions-it/datapusher/tree/228_docker

@lpasquali
Contributor

"Currently working on the dockerization of the master branch of the official repo: https://github.com/geosolutions-it/datapusher/tree/228_docker"
Dockerization is done and the PR is updated. We can move on to testing things on Azure; I think I will do it tomorrow, @etj.

@lpasquali
Contributor

lpasquali commented May 11, 2021

@etj unfortunately the code that uses the datastore write and read-only users is not working, for reasons similar to those we found in the past:

ckan          | [SQL: SELECT has_table_privilege(%s, '_foo', %s)]
ckan          | [parameters: ('datastore_ro@testpostgres01aaaa', 'INSERT')]
ckan          | (Background on this error at: http://sqlalche.me/e/f405)
ckan          | Setting var and venv...
ckan          | Initting DB...
ckan          | 2021-05-11 16:18:04,718 INFO  [ckan.cli] Using configuration file /etc/ckan/production.ini
ckan          | 2021-05-11 16:18:04,719 INFO  [ckan.config.environment] Loading static files from public
ckan          | 2021-05-11 16:18:04,721 DEBUG [ckan.lib.webassets_tools] Base path /usr/lib/ckan/venv/src/ckan/ckan/public/base
ckan          | 2021-05-11 16:18:04,751 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/venv/src/ckan/ckan/templates
ckan          | 2021-05-11 16:18:04,959 DEBUG [ckan.logic] check access OK - get_site_user user=None
ckan          | 2021-05-11 16:18:05,004 INFO sqlalchemy.pool.impl.QueuePool Pool disposed. Pool size: 10  Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
ckan          | 2021-05-11 16:18:05,004 INFO  [sqlalchemy.pool.impl.QueuePool] Pool disposed. Pool size: 10  Connections in pool: 0 Current Overflow: -10 Current Checked out connections: 0
ckan          | 2021-05-11 16:18:05,006 INFO sqlalchemy.pool.impl.QueuePool Pool recreating
ckan          | 2021-05-11 16:18:05,006 INFO  [sqlalchemy.pool.impl.QueuePool] Pool recreating
ckan          | 2021-05-11 16:18:05,017 INFO  [rdflib] RDFLib Version: 4.2.1
ckan          | 2021-05-11 16:18:05,113 DEBUG [ckan.lib.webassets_tools] Base path /usr/lib/ckan/venv/src/ckan/ckan/public/base
ckan          | 2021-05-11 16:18:05,279 DEBUG [ckanext.azure_auth.auth_config] Loading ADFS ID Provider configuration.
ckan          | 2021-05-11 16:18:05,280 INFO  [ckanext.azure_auth.auth_config] Trying to get OpenID Connect config from https://login.microsoftonline.com/00000000-0000-0000-0000-000000000000/.well-known/openid-configuration?appid=00000000-0000-0000-0000-000000000000
ckan          | 2021-05-11 16:18:05,363 INFO  [ckanext.azure_auth.auth_config] Trying to get ADFS Metadata file https://login.microsoftonline.com/00000000-0000-0000-0000-000000000000/FederationMetadata/2007-06/FederationMetadata.xml
ckan          | 2021-05-11 16:18:05,439 CRITI [ckanext.azure_auth.auth_config] Could not load any data from ADFS server. Authentication against ADFS is not possible. 
ckan          | 2021-05-11 16:18:05,440 CRITI [ckanext.azure_auth.plugin] Could not load any data from ADFS server. Authentication against ADFS is not possible. 
ckan          | 2021-05-11 16:18:05,442 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/venv/src/ckan/ckan/templates
ckan          | Traceback (most recent call last):
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
ckan          |     cursor, statement, parameters, context
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 550, in do_execute
ckan          |     cursor.execute(statement, parameters)
ckan          | psycopg2.errors.UndefinedObject: role "datastore_ro@testpostgres01aaaa" does not exist
ckan          | 
ckan          | 
ckan          | The above exception was the direct cause of the following exception:
ckan          | 
ckan          | Traceback (most recent call last):
ckan          |   File "/usr/lib/ckan/venv/bin/ckan", line 33, in <module>
ckan          |     sys.exit(load_entry_point('ckan', 'console_scripts', 'ckan')())
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
ckan          |     return self.main(*args, **kwargs)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 781, in main
ckan          |     with self.make_context(prog_name, args, **extra) as ctx:
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 700, in make_context
ckan          |     self.parse_args(ctx, args)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 1212, in parse_args
ckan          |     rest = Command.parse_args(self, ctx, args)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 1048, in parse_args
ckan          |     value, args = param.handle_parse_result(ctx, opts, args)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 1630, in handle_parse_result
ckan          |     value = invoke_param_callback(self.callback, ctx, self, value)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/click/core.py", line 123, in invoke_param_callback
ckan          |     return callback(ctx, param, value)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/cli/cli.py", line 100, in _init_ckan_config
ckan          |     ctx.obj = CkanCommand(value)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/cli/cli.py", line 50, in __init__
ckan          |     self.app = make_app(self.config)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/config/middleware/__init__.py", line 24, in make_app
ckan          |     load_environment(conf)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/config/environment.py", line 122, in load_environment
ckan          |     p.load_all()
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 165, in load_all
ckan          |     load(*plugins)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 193, in load
ckan          |     plugins_update()
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/plugins/core.py", line 153, in plugins_update
ckan          |     environment.update_config()
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckan/config/environment.py", line 296, in update_config
ckan          |     plugin.configure(config)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckanext/datastore/plugin.py", line 81, in configure
ckan          |     self.backend.configure(config)
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckanext/datastore/backend/postgres.py", line 1777, in configure
ckan          |     self._check_urls_and_permissions()
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckanext/datastore/backend/postgres.py", line 1661, in _check_urls_and_permissions
ckan          |     if not self._read_connection_has_correct_privileges():
ckan          |   File "/usr/lib/ckan/venv/src/ckan/ckanext/datastore/backend/postgres.py", line 1708, in _read_connection_has_correct_privileges
ckan          |     (read_connection_user, privilege)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 982, in execute
ckan          |     return self._execute_text(object_, multiparams, params)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1155, in _execute_text
ckan          |     parameters,
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
ckan          |     e, statement, parameters, cursor, context
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1466, in _handle_dbapi_exception
ckan          |     util.raise_from_cause(sqlalchemy_exception, exc_info)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 399, in raise_from_cause
ckan          |     reraise(type(exception), exception, tb=exc_tb, cause=cause)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
ckan          |     raise value.with_traceback(tb)
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
ckan          |     cursor, statement, parameters, context
ckan          |   File "/usr/lib/ckan/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 550, in do_execute
ckan          |     cursor.execute(statement, parameters)
ckan          | sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedObject) role "datastore_ro@testpostgres01aaaa" does not exist
ckan          | 
ckan          | [SQL: SELECT has_table_privilege(%s, '_foo', %s)]
ckan          | [parameters: ('datastore_ro@testpostgres01aaaa', 'INSERT')]
ckan          | (Background on this error at: http://sqlalche.me/e/f405)

The CKAN configuration of the datastore database is correct, but the app itself cannot resolve the user, even when escaping the @ as %40 as we did in the past. Unfortunately, the datastore code does not use the CKAN models for DB access.
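
The %40 escaping mentioned here can be illustrated with the standard library alone. This is only a sketch: CKAN's own URL parsing may differ, and the host name, password and helper names below are hypothetical. It shows that percent-encoding makes the URL parseable, but the decoded username still carries the @server suffix, which matches the role name seen in the log above.

```python
from urllib.parse import quote, unquote, urlsplit


def build_db_url(user, password, host, database):
    """Build a PostgreSQL URL, percent-encoding reserved characters in credentials."""
    return "postgresql://{}:{}@{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, database
    )


def username_from_url(url):
    """Return the decoded username carried by a connection URL."""
    return unquote(urlsplit(url).username)


# Hypothetical host and password; the user name comes from the log above.
url = build_db_url(
    "datastore_ro@testpostgres01aaaa",
    "secret",
    "testpostgres01aaaa.example.invalid",
    "datastore",
)
print(url)
print(username_from_url(url))
```

Any privilege check that takes the username from the URL will therefore look up the role including the suffix, so a role with exactly that name has to exist.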

@etj
Member Author

etj commented May 12, 2021

Found some other issues in the datastore:

@etj etj linked a pull request May 13, 2021 that will close this issue
@etj
Member Author

etj commented May 13, 2021

I'm going to check the issue in the datastore code

@randomorder
Member

updates @lpasquali ?

@lpasquali
Contributor

Hello @etj did you get further on the datastore database (azure related) issues?

@randomorder
Member

please let us know @etj

@etj
Member Author

etj commented May 26, 2021

The datastore does not need any fix.
The PostgreSQL role has to be created with the server name in the role name (the inner double quotes are escaped so that they reach psql):

psql -U $arg1@$arg3 -h $arg4 postgres -c "CREATE ROLE \"datastore_ro@$arg3\" NOCREATEDB NOCREATEROLE LOGIN PASSWORD '${arg5}';"

instead of

psql -U $arg1@$arg3 -h $arg4 postgres -c "CREATE ROLE \"datastore_ro\" NOCREATEDB NOCREATEROLE LOGIN PASSWORD '${arg5}';"

In other words, you have to add the @PGHOST part to the role name yourself.
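
Because the role name contains an @, it has to reach PostgreSQL as a double-quoted identifier. A minimal sketch of building that statement with explicit quoting follows; the helper names, server name and password are hypothetical, and the literal quoting is simplified (it only doubles single quotes).

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a PostgreSQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_role_sql(role, server, password):
    """Build the CREATE ROLE statement with the @server suffix in the role name."""
    full_name = "{}@{}".format(role, server)
    return "CREATE ROLE {} NOCREATEDB NOCREATEROLE LOGIN PASSWORD {};".format(
        quote_ident(full_name), quote_literal(password)
    )


# Hypothetical server name and password.
print(create_role_sql("datastore_ro", "testpostgres01aaaa", "s3cret"))
```

Generating the statement in one place keeps the shell quoting concerns separate from the SQL quoting concerns.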

etj added a commit that referenced this issue May 26, 2021
@etj
Member Author

etj commented May 26, 2021

The set-permission.sql step fails because the SQL commands that reference the role ckan@etj-pg3 fail: that role does not exist, even though the psql command is run with that very username.
It seems that Azure PostgreSQL handles the default user created at startup differently from roles created by hand.
I am adding a sed -e "s/ckan@etj-pg3/ckan/g" to the pipe that runs set-permission.sql.
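
The effect of that sed substitution can be sketched in a parametrized form. The helper name and the sample SQL line are hypothetical; the line is written in the style of a permissions script and is not copied from set-permission.sql.

```python
def strip_server_suffix(sql, user, server):
    """Replace every 'user@server' occurrence with the bare user name."""
    return sql.replace("{}@{}".format(user, server), user)


# Hypothetical input line, not taken from the real script.
line = 'GRANT CONNECT ON DATABASE "datastore" TO "ckan@etj-pg3";'
print(strip_server_suffix(line, "ckan", "etj-pg3"))
```

Only the exact user@server pair is rewritten, so other roles that legitimately keep their suffix, such as the hand-created datastore_ro role, are left untouched.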

etj added a commit that referenced this issue May 26, 2021
@etj
Member Author

etj commented May 26, 2021

I added a few commits to the datapusher-datastore-ckan branch that fix the configuration (there was no problem in the datastore per se; there were some configuration issues on our side).

The deploy procedure now completes successfully and ckan is properly launched.

@lpasquali I guess this is unblocked now.

etj added a commit that referenced this issue May 26, 2021
@randomorder
Member

please go ahead @lpasquali

We need this before COB Friday

@lpasquali
Contributor

I was finally able to get datastore, datapusher and their interactions working.
I cleaned up the branch conflicts.
I am doing one last clean deployment before making a final modification that is needed before the PR can be merged.

#35

@lpasquali
Contributor

Screenshot from 2021-05-28 17-01-05
making PR #35 ready

@etj etj closed this as completed in #35 Jun 4, 2021
etj added a commit that referenced this issue Jun 4, 2021
* implementation of datapusher/datastore plugins

* implementation of datapusher container in azure compose, fixed readme

* added new datapusher docker image, configured datastore, datapusher ckan plugins, initted datastore db

* aligned datapusher submodule

* aligned datapusher submodule

* aligned datapusher in azure compose

* removed unused module datapusher

* fixed building compose in ckan-docker/docker-compose.yml

* fixed building compose in ckan-docker/docker-compose.yml

* reverted parameters

* datastore db provision

* changed psql command

* changed psql command

* changed psql command

* changed psql command

* changed psql command

* changed psql command

* changed psql command

* changed psql command

* changed psql command

* changed psql command

* changed psql command

* pg for datastore

* #39 make ckan config persistent

* #39 make ckan config persistent

* #39 make ckan config persistent

* #39 make ckan config persistent

* #27 fix datastore role creation

* #27 fix datastore set-permission

* #27 fix datastore role creation

* parametrized sed

* updated wrong image for datapusher on azure

* datastore setup

* updated wrong image for datapusher on azure

* updated wrong image for datapusher on azure

* updated wrong image for datapusher on azure

* fixing datastore_ro privileges

* fixing datastore_ro privileges

* fixing datastore_ro privileges

* fixing datastore_ro privileges

* Provide APIKEY in command line

* fixing datastore_ro privileges

* fixing datastore_ro privileges

* fixed after merge

* last fixes

* fixing datastore_ro privileges

* wrong branch

* revert to master branch

* Use datapusher-datastore-ckan for testing

* Reinstate grace-period plugin

* Fix custom plugin order

* Switch back to master branch

Co-authored-by: etj <[email protected]>
Co-authored-by: Emanuele Tajariol <[email protected]>
3 participants