Import is slow - how to debug #196

Open
andrisi opened this issue Mar 6, 2022 · 3 comments

Comments

@andrisi

andrisi commented Mar 6, 2022

Is there a way to debug an import job? Perhaps run it from the command line and add a flag to see timings. I'm importing a CSV with 40K lines and associated PDF files (a few pages each), using Extract Text. In 24 hours it imported about 4K items. It does run: I see the worker process (/scripts/perform-job.php) and it does import items. It would be nice to know where it spends that time. It doesn't seem to use much memory or CPU time either. Thanks!

@andrisi andrisi changed the title import is very slow - how to debug Import is slow - how to debug Mar 7, 2022
@zerocrates
Contributor

A variety of things can slow down imports. One common culprit is generating the thumbnails for each media file you're importing. If the files are quite large or require some complex processing, ImageMagick can sometimes take a while to make a thumbnail for them. There's also just the pure download time of getting the files themselves from their source URLs. Some simple, unobtrusive options, like not mapping the media column or disabling thumbnail generation, could narrow things down.
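
A rough way to check those two suspects outside of Omeka is to time them by hand for one sample file. The sketch below is an illustration under assumptions, not part of Omeka or the import module: the `time_step` helper, the sample URL and the file names are placeholders, and it presumes `curl`, ImageMagick's `convert` and a `date` that supports `+%s` are available.

```shell
#!/bin/sh
# time_step LABEL CMD...: run CMD and print how many whole seconds it took.
# Hypothetical helper for illustration only; not part of Omeka.
time_step() {
    label=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit $status)"
    return $status
}

# Placeholder value: substitute the media URL from one real CSV row.
SAMPLE_URL="${SAMPLE_URL:-https://example.com/files/sample.pdf}"

# Nothing is downloaded unless RUN_SAMPLE=1 is set explicitly.
if [ "${RUN_SAMPLE:-0}" = 1 ]; then
    # Download time on its own.
    time_step download curl -sS -o sample.pdf "$SAMPLE_URL"
    # Thumbnail time on its own; [0] selects the first page of the PDF.
    time_step thumbnail convert 'sample.pdf[0]' -thumbnail 200x200 sample-thumb.jpg
fi
```

Saved as, say, `time-sample.sh`, it could be run as `RUN_SAMPLE=1 SAMPLE_URL=<one media URL> sh time-sample.sh`. If both steps finish in a second or two, the slowness is probably somewhere else in the import.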

@andrisi
Author

andrisi commented Aug 20, 2022

@zerocrates thanks! One question remains: is there a way to debug the import job, e.g. run it from the command line?

@zerocrates
Contributor

zerocrates commented Feb 8, 2023

Sorry for the lack of response on this... you can run a job from the command line using the perform-job.php script, but it's not totally trivial to do. You have to actually create the job first by starting the import, then run it by its job ID. Setting the PHP CLI path in the config to something like /bin/true so it doesn't actually run the job, and then running it manually yourself with that job ID might be the best choice.

You run a job manually as:

php application/data/scripts/perform-job.php --job-id <job ID> --base-path <Omeka base URL path on server, e.g. / or /omeka> --server-url <address of server, e.g. https://example.com>
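
For the timing question from the original post, one option is to wrap that command so the elapsed wall-clock time is printed when the job finishes. This is a minimal sketch, assuming a POSIX shell: the install path, job ID, base path and server URL are placeholders, `OMEKA_DIR` is assumed to be the directory that contains `application/`, and `perform_job_cmd` is a hypothetical helper, not something Omeka ships.

```shell
#!/bin/sh
# Placeholder values; replace with the settings of your own installation.
OMEKA_DIR="${OMEKA_DIR:-/var/www/omeka}"
JOB_ID="${JOB_ID:-123}"
BASE_PATH="${BASE_PATH:-/}"
SERVER_URL="${SERVER_URL:-https://example.com}"

# Print the perform-job.php command line filled in with the values above.
perform_job_cmd() {
    echo "php $OMEKA_DIR/application/data/scripts/perform-job.php" \
         "--job-id $JOB_ID --base-path $BASE_PATH --server-url $SERVER_URL"
}

# Dry run by default: only show the command. Set RUN_JOB=1 to execute it.
if [ "${RUN_JOB:-0}" = 1 ]; then
    start=$(date +%s)
    # Word splitting is intentional here; it breaks on paths with spaces.
    $(perform_job_cmd)
    echo "job $JOB_ID finished in $(( $(date +%s) - start ))s"
else
    perform_job_cmd
fi
```

Running it once without `RUN_JOB=1` shows the exact command that would be executed, which makes it easy to check the job ID and paths before starting a long import.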
