Configuration

For the most part the Bulk Import Tool is self-configuring and should need little, if any, configuration or tuning to perform well. In the 8 years the tool has existed, the vast majority of performance bottlenecks have been unrelated to the tool or to Alfresco. Far more common causes include poorly tuned or under-capacity database servers, a saturated network (especially when Alfresco's contentstore and the source content directory are on remote devices), and a failing hard drive in a RAID array.

That said, the tool does provide a small number of tunables, all of which can be added to alfresco-global.properties to override their default values. They are:

# The maximum "weight" of a batch.  Each file in a node (whether content,
# metadata or version) counts towards this total, as does (a fraction of)
# content file size.
alfresco-bulk-import.batch.weight=100

# The size of the thread pool during the folder import phase (must be > 0)
alfresco-bulk-import.folder.threadpool.size=2
# The size of the thread pool during the file import phase (<= 0 means autosize
# based on the number of CPU cores in the server)
alfresco-bulk-import.file.threadpool.size=-1

# The maximum size (number of batches) allowed in the queue, before scanning
# receives back-pressure (i.e. gets blocked)
alfresco-bulk-import.batch.queue.size=100

# How long to keep inactive threads alive
alfresco-bulk-import.threadpool.keepAlive.time=30
alfresco-bulk-import.threadpool.keepAlive.units=SECONDS
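
To make the thread pool and queue settings more concrete, here is a minimal Java sketch of how properties like these could map onto a standard java.util.concurrent.ThreadPoolExecutor. It is not the tool's actual source code. The class name BulkImportTuningSketch, the method names, the "one thread per CPU core" autosize rule, and the blocking rejection handler are all assumptions made for illustration; only the property names and the values shown above come from this page.

```java
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: NOT the Bulk Import Tool's implementation.
public class BulkImportTuningSketch {

    // A configured size <= 0 means "autosize". ASSUMPTION: autosize is taken
    // here to mean one thread per CPU core; the tool may use a different rule.
    public static int resolvePoolSize(int configured, int cpuCores) {
        return configured > 0 ? configured : Math.max(1, cpuCores);
    }

    // Builds a file-import style pool from the properties listed above,
    // falling back to the values shown on this page when a key is absent.
    public static ThreadPoolExecutor buildFilePool(Properties p) {
        int size = resolvePoolSize(
            Integer.parseInt(p.getProperty("alfresco-bulk-import.file.threadpool.size", "-1")),
            Runtime.getRuntime().availableProcessors());
        long keepAlive = Long.parseLong(
            p.getProperty("alfresco-bulk-import.threadpool.keepAlive.time", "30"));
        TimeUnit units = TimeUnit.valueOf(
            p.getProperty("alfresco-bulk-import.threadpool.keepAlive.units", "SECONDS"));
        int queueSize = Integer.parseInt(
            p.getProperty("alfresco-bulk-import.batch.queue.size", "100"));

        // Bounded queue: once it is full, new work cannot simply pile up.
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(queueSize);

        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            size, size, keepAlive, units, queue,
            // ASSUMPTION: back-pressure is modelled by blocking the submitting
            // (scanning) thread until the queue has room again.
            (task, executor) -> {
                try {
                    executor.getQueue().put(task);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RejectedExecutionException(e);
                }
            });
        // Let idle threads exit after the keep-alive period.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = buildFilePool(new Properties());
        System.out.println("Resolved file pool size: " + pool.getCorePoolSize());
        pool.shutdown();
    }
}
```

Under these assumptions, leaving alfresco-bulk-import.file.threadpool.size at -1 yields a pool sized to the server's core count, while any positive value is used as-is. The bounded queue is what produces the back-pressure described in the comment on alfresco-bulk-import.batch.queue.size.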

Tuning Alfresco itself is also worthwhile, although in general I recommend focusing on the database as the first priority.

