[Feature Request]: Re-use database connection pool in production #4751

Closed
LambertW opened this issue Jan 2, 2025 · 2 comments

LambertW commented Jan 2, 2025

What would you like to happen?

I'm currently using Apache Hop for ETL integration, but I have some doubts about how to deploy and run it on a production server. How can I manage the database connection pool to avoid the overhead of repeatedly creating connections?

I understand that hop-server should be the appropriate solution for long-term running and could re-use a connection pool when running scheduled jobs, but I could not find any documentation about how to deploy Hop together with its projects to a server and use hop-server to run a project with specific workflow/pipeline files.

Issue Priority

Priority: 3

Issue Component

Component: Documentation, Component: Hop Server

bamaer (Contributor) commented Jan 7, 2025

Connection pooling was removed in Apache Hop after the fork from PDI.
Connections in a pipeline/workflow are created at the start and released (at the latest) at the end of the execution. Connection pooling adds little to no value in these scenarios (as was already the case in PDI/Kettle).
We could consider documenting the absence of connection pooling, though.
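
As a purely illustrative sketch, not Hop's own code, the contrast described above can be shown with plain JDBC versus a pooled DataSource. HikariCP, the JDBC URL, and the credentials here are stand-ins chosen for the example, not anything Hop ships with or uses internally:

```java
// Illustration only: contrasts a per-execution connection with a pooled one.
// HikariCP, the URL and credentials are placeholders, not part of Apache Hop.
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PoolingSketch {

    // What Hop effectively does today: one physical connection per execution,
    // opened at the start and closed (at the latest) at the end.
    static void perExecutionConnection(String url, String user, String pass) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, pass);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        } // physical connection is torn down here
    }

    // What a pool would add: close() returns the connection to the pool instead of
    // closing it, which only pays off when many short executions run back-to-back.
    static void pooledConnections(String url, String user, String pass) throws SQLException {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(url);
        config.setUsername(user);
        config.setPassword(pass);
        config.setMaximumPoolSize(5);

        try (HikariDataSource pool = new HikariDataSource(config)) {
            for (int i = 0; i < 3; i++) {
                try (Connection conn = pool.getConnection();
                     Statement stmt = conn.createStatement();
                     ResultSet rs = stmt.executeQuery("SELECT 1")) {
                    while (rs.next()) {
                        System.out.println(rs.getInt(1));
                    }
                } // connection goes back to the pool, not to the database
            }
        }
    }
}
```

For a single long-running pipeline the two variants behave the same, which is the point of the comment above: the pool only changes things when executions are numerous and short-lived.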

bamaer closed this as completed Jan 7, 2025
dave-csc (Contributor) commented

Joining this discussion, since I had a similar issue.

In my case the absence of connection pooling caused a sort of temporary ban from a remote database, because I was opening too many connections in a relatively short time (i.e. the time needed to get the results of a SELECT query...): after 2-3 queries my workflow deadlocked with no information in the logs.

To work around this, I grouped the query-related pipelines into a new parent pipeline and set that pipeline's run configuration to transactional. It worked for my needs since I only had to run SELECTs.

The case above can indeed be considered added value for connection pooling. Also, it seems it's currently not possible to close a database transaction while keeping the connection alive for further queries...
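
For context, and strictly outside Hop's own APIs, the effect of that workaround can be sketched in plain JDBC: several SELECTs share one physical connection and one transaction, so the remote server sees a single login instead of one connect/disconnect per query. The JDBC URL, credentials, and table names below are placeholders:

```java
// Plain-JDBC sketch of what the workaround achieves, not Hop code.
// URL, credentials and table names are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SharedConnectionSketch {

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://remote-host/db"; // placeholder
        String[] queries = {
                "SELECT count(*) FROM table_a",
                "SELECT count(*) FROM table_b",
                "SELECT count(*) FROM table_c"
        };

        // One physical connection for the whole batch, mirroring a transactional
        // run configuration: the remote server sees a single login instead of
        // one connect/disconnect per query.
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            conn.setAutoCommit(false); // everything below runs in one transaction
            for (String sql : queries) {
                try (PreparedStatement ps = conn.prepareStatement(sql);
                     ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(sql + " -> " + rs.getLong(1));
                    }
                }
            }
            conn.commit(); // read-only work, but the transaction still has to end
        }
    }
}
```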
