Please Note: This repository contains known security vulnerabilities. Use at your own risk!
The PigSolrDataStore is a library for indexing directly into Solr from Pig using a single STORE function. It is configurable and works with both Solr4 and Solr5. See the examples below for how to use the PigSolrDataStore.
The PigSolrDataStore can be configured using the following parameters:
- solrVersion (Required) - the version of Solr being used. It must be either "SOLR4" or "SOLR5".
- collection (Required) - the collection being updated.
- idField (Required) - the name of the document's unique key field, if one exists.
- cloud (Optional) - whether documents are being stored on a Solr cloud server. (Not needed for SOLR4)
- partialUpdate (Optional) - whether to perform partial updates or complete document replacements. (Requires a unique key; not needed for SOLR4)
- ttlParamName (Optional) - the TTL property to use, for example _ttl_, which can be coupled with expirationFieldName, for example _expire_at_, to set a document expiration datetime. (Not needed for SOLR4)
- ttlExpression (Optional) - a date math expression, defaulting to "+10080MINUTES", which is an offset from the request date/time. (Not needed for SOLR4)
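As a worked example of the default ttlExpression: "+10080MINUTES" is 10080 / 60 / 24 = 7 days after the request time. The Java sketch below is hypothetical (TtlSketch and expiresAt are not part of PigSolrDataStore) and only resolves a single "+N<UNIT>" or "-N<UNIT>" term for a few plural units; real Solr date math also supports rounding with "/", chained terms and more units.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: resolves a single-term Solr-style date math offset
// such as "+10080MINUTES" relative to a request time.
final class TtlSketch {
    private static final Pattern OFFSET =
            Pattern.compile("([+-])(\\d+)(SECONDS|MINUTES|HOURS|DAYS)");

    static Instant expiresAt(Instant requestTime, String ttlExpression) {
        Matcher m = OFFSET.matcher(ttlExpression);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported expression: " + ttlExpression);
        }
        long amount = Long.parseLong(m.group(2));
        if (m.group(1).equals("-")) {
            amount = -amount;
        }
        Duration offset;
        switch (m.group(3)) {
            case "SECONDS": offset = Duration.ofSeconds(amount); break;
            case "MINUTES": offset = Duration.ofMinutes(amount); break;
            case "HOURS":   offset = Duration.ofHours(amount);   break;
            default:        offset = Duration.ofDays(amount);    break;
        }
        return requestTime.plus(offset);
    }

    public static void main(String[] args) {
        Instant request = Instant.parse("2024-01-01T00:00:00Z");
        // The default "+10080MINUTES" lands exactly 7 days later.
        System.out.println(expiresAt(request, "+10080MINUTES")); // 2024-01-08T00:00:00Z
    }
}
```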
These parameters can be supplied to the PigSolrDataStore in any order.
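Because every parameter is passed as a self-describing 'key=value' string, order does not matter. The Java sketch below illustrates that idea only; StoreArgsSketch and parse are hypothetical names, and how PigSolrDataStore actually parses its arguments is an assumption here. It splits on the first "=", applies the documented ttlExpression default, and rejects missing required parameters.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turns 'key=value' STORE arguments into a config map.
final class StoreArgsSketch {
    private static final List<String> REQUIRED =
            Arrays.asList("solrVersion", "collection", "idField");

    static Map<String, String> parse(String... args) {
        Map<String, String> config = new HashMap<>();
        // Default taken from the ttlExpression description; overridden if supplied.
        config.put("ttlExpression", "+10080MINUTES");
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + arg);
            }
            // Split on the first '=' only, so values may themselves contain '='.
            config.put(arg.substring(0, eq).trim(), arg.substring(eq + 1).trim());
        }
        for (String key : REQUIRED) {
            if (!config.containsKey(key)) {
                throw new IllegalArgumentException("Missing required parameter: " + key);
            }
        }
        String version = config.get("solrVersion");
        if (!version.equals("SOLR4") && !version.equals("SOLR5")) {
            throw new IllegalArgumentException("solrVersion must be SOLR4 or SOLR5, got: " + version);
        }
        return config;
    }

    public static void main(String[] args) {
        // The same parameters in two different orders yield the same configuration.
        Map<String, String> a = parse("solrVersion=SOLR5", "collection=collection1", "idField=someId");
        Map<String, String> b = parse("idField=someId", "collection=collection1", "solrVersion=SOLR5");
        System.out.println(a.equals(b)); // true
    }
}
```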
Run mvn clean install, then find PigSolrDataStore-x.x.x-jar-with-dependencies.jar inside the target directory.
A Solr4 collection can be indexed using the following STORE function:
STORE adlogs INTO '' USING PigSolrDataStore('solrVersion=SOLR4', 'collection=collection1', 'idField=someId');
a) When submitting to a standalone Solr server, we connect to it directly. The URL would be something like:
and the STORE function looks like
STORE adlogs INTO '' USING PigSolrDataStore('solrVersion=SOLR5', 'collection=collection1', 'idField=someId');
b) When submitting to a Solr cloud server, we access ZooKeeper. The URL would be something like:
This can also be a comma-separated list of hosts to access a ZooKeeper ensemble:
and the STORE function looks like
STORE adlogs INTO 'zookeeper://' USING PigSolrDataStore('solrVersion=SOLR5',
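For the ensemble case, the comma-separated list of hosts follows the standard ZooKeeper connect-string shape "host1:port1,host2:port2/optional/chroot". The Java sketch below is purely illustrative: the class ZkEnsembleSketch, the hosts "zk1..zk3", and the "/solr" chroot are hypothetical, and whether PigSolrDataStore accepts a chroot in its zookeeper:// URL is an assumption. It strips the scheme, splits the hosts, and rebuilds the connect string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits a comma-separated ZooKeeper ensemble location.
final class ZkEnsembleSketch {
    private static final String SCHEME = "zookeeper://";

    // Removes the "zookeeper://" prefix if present.
    private static String stripScheme(String location) {
        return location.startsWith(SCHEME) ? location.substring(SCHEME.length()) : location;
    }

    // Returns the individual host:port entries of the ensemble.
    static List<String> hosts(String location) {
        String rest = stripScheme(location);
        int slash = rest.indexOf('/');
        String hostPart = slash >= 0 ? rest.substring(0, slash) : rest;
        List<String> result = new ArrayList<>();
        for (String host : hostPart.split(",")) {
            String trimmed = host.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Rebuilds a ZooKeeper connect string: "h1:p1,h2:p2" plus any chroot path.
    static String connectString(String location) {
        String rest = stripScheme(location);
        int slash = rest.indexOf('/');
        String chroot = slash >= 0 ? rest.substring(slash) : "";
        return String.join(",", hosts(location)) + chroot;
    }

    public static void main(String[] args) {
        String location = "zookeeper://zk1:2181, zk2:2181,zk3:2181/solr";
        System.out.println(hosts(location));         // [zk1:2181, zk2:2181, zk3:2181]
        System.out.println(connectString(location)); // zk1:2181,zk2:2181,zk3:2181/solr
    }
}
```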
If partialUpdate is set to true, a unique key is required, both in these settings and in your Solr schema. A new document is then created if the unique key does not exist; if it does exist, only the provided fields are updated.
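In Solr, such partial (atomic) updates are commonly expressed by sending the unique key as-is and wrapping every other field in a modifier such as "set". Whether PigSolrDataStore builds its requests this way is an assumption; the Java sketch below (PartialUpdateSketch is a hypothetical name, and the field "clicks" is illustrative) only shows the resulting document shape. It writes JSON by hand to stay dependency-free and sends all values as strings; a real client would more likely use SolrJ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: builds a Solr atomic-update JSON document.
final class PartialUpdateSketch {
    // The unique key is sent unchanged; every other field becomes {"set": value},
    // so only the provided fields change on an existing document.
    static String toAtomicUpdateJson(String idField, Map<String, String> fields) {
        if (!fields.containsKey(idField)) {
            throw new IllegalArgumentException("Partial updates require the unique key field: " + idField);
        }
        StringBuilder sb = new StringBuilder("{");
        sb.append(quote(idField)).append(':').append(quote(fields.get(idField)));
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (e.getKey().equals(idField)) {
                continue;
            }
            sb.append(',').append(quote(e.getKey()))
              .append(":{\"set\":").append(quote(e.getValue())).append('}');
        }
        return sb.append('}').toString();
    }

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("someId", "doc1");
        doc.put("clicks", "42");
        System.out.println(toAtomicUpdateJson("someId", doc)); // {"someId":"doc1","clicks":{"set":"42"}}
    }
}
```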