This repository has been archived by the owner on Sep 25, 2022. It is now read-only.

Configuration

Peter Monks edited this page Jun 15, 2015 · 6 revisions

For the most part the Bulk Import Tool is self-configuring, and shouldn't need much, if any, configuration or tuning to perform well. In the 8 years the tool has existed, the vast majority of performance bottlenecks have been unrelated to the tool or Alfresco. Far more common causes are poorly tuned or under-capacity database servers, a saturated network (especially when Alfresco's contentstore and the source content directory are on remote devices), a failing hard drive in a RAID array, and so on.

That said, the tool does provide a small number of tunables, all of which can be added to alfresco-global.properties to override their default values. They are:

# The maximum "weight" of a batch.  Each file in a node (whether content,
# metadata or version) counts towards this total, as does (a fraction of)
# content file size.
alfresco-bulk-import.batch.weight=100

# The size of the thread pool during the folder import phase (must be > 0)
alfresco-bulk-import.folder.threadpool.size=2
# The size of the thread pool during the file import phase (<= 0 means autosize
# based on the number of CPU cores in the server)
alfresco-bulk-import.file.threadpool.size=-1

# The maximum size (number of batches) allowed in the queue, before scanning
# receives back-pressure (i.e. gets blocked)
alfresco-bulk-import.batch.queue.size=100

# How long to keep inactive threads alive
alfresco-bulk-import.threadpool.keepAlive.time=30
alfresco-bulk-import.threadpool.keepAlive.units=SECONDS
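
To make the thread pool and queue tunables above more concrete, here is a minimal Java sketch of how settings like these could map onto the standard java.util.concurrent classes. It is not the tool's actual implementation: the class BulkImportPoolSketch and its methods are hypothetical, and the autosizing rule (one thread per CPU core) is an assumption, since the text above only says the size is "based on the number of CPU cores".

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from the tool itself.
public class BulkImportPoolSketch {

    // Mirrors file.threadpool.size: a value <= 0 means "autosize".
    // ASSUMPTION: autosizing uses exactly one thread per core.
    public static int resolveFilePoolSize(int configuredSize, int availableCores) {
        return configuredSize > 0 ? configuredSize : Math.max(1, availableCores);
    }

    // Mirrors threadpool.keepAlive.time / threadpool.keepAlive.units.
    // The units value (e.g. SECONDS) matches a java.util.concurrent.TimeUnit constant.
    public static ThreadPoolExecutor buildFilePool(int configuredSize,
                                                   long keepAliveTime,
                                                   TimeUnit keepAliveUnits) {
        int size = resolveFilePoolSize(configuredSize,
                                       Runtime.getRuntime().availableProcessors());
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                size, size, keepAliveTime, keepAliveUnits,
                new LinkedBlockingQueue<Runnable>());
        // Let idle threads expire after the keep-alive period
        // (requires a keep-alive time greater than zero).
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = buildFilePool(-1, 30, TimeUnit.SECONDS);
        System.out.println("File pool threads: " + pool.getCorePoolSize());
        pool.shutdown();

        // Mirrors batch.queue.size, shrunk to 2 so the effect is easy to see.
        BlockingQueue<String> batchQueue = new ArrayBlockingQueue<>(2);
        batchQueue.put("batch-1");
        batchQueue.put("batch-2");

        // The queue is full, so the "scanner" cannot enqueue: this is back-pressure.
        boolean acceptedWhileFull = batchQueue.offer("batch-3", 10, TimeUnit.MILLISECONDS);
        System.out.println("Accepted while full: " + acceptedWhileFull);

        // Once an importer thread takes a batch, scanning can proceed again.
        batchQueue.take();
        boolean acceptedAfterTake = batchQueue.offer("batch-3", 10, TimeUnit.MILLISECONDS);
        System.out.println("Accepted after take: " + acceptedAfterTake);
    }
}
```

In this sketch a bounded queue is what produces the back-pressure: a blocking put() on a full ArrayBlockingQueue simply waits until a consumer frees a slot, so a fast scanner cannot run arbitrarily far ahead of slower importer threads.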

Tuning Alfresco itself is also worthwhile, although in general I recommend focusing on the database as the first priority.


Back to wiki home.
