Vector Scoring Plugin for Elasticsearch

This plugin allows you to score documents based on arbitrary raw vectors, using dot product or cosine similarity.

Releases

Master branch targets Elasticsearch 5.4. Note that versions 5.5+ are not supported, as Elasticsearch changed its plugin mechanism. An update for 5.5+ will be developed soon (PRs welcome).

Branch es-2.4 targets Elasticsearch 2.4.x.

Overview

The aim of this plugin is to enable real-time scoring of vector-based models, in particular factor-based recommendation models.

In this case, user and item factor vectors are indexed using the Delimited Payload Token Filter, e.g. the vector [1.2, 0.1, 0.4, -0.2, 0.3] is indexed as a string: 0|1.2 1|0.1 2|0.4 3|-0.2 4|0.3.

This stores the vector indices as "terms" and the vector values as "payloads".
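A minimal sketch of this encoding in Python, for preparing documents on the client side. The helper names encode_vector and decode_vector are hypothetical and are not part of the plugin:

```python
def encode_vector(vector):
    """Encode a dense vector as space-separated 'index|value' tokens."""
    return " ".join(f"{i}|{v}" for i, v in enumerate(vector))


def decode_vector(text):
    """Invert encode_vector: parse 'index|value' tokens into a list of floats."""
    pairs = (token.split("|") for token in text.split())
    by_index = {int(i): float(v) for i, v in pairs}
    return [by_index[i] for i in range(len(by_index))]


print(encode_vector([1.2, 0.1, 0.4, -0.2, 0.3]))
# prints: 0|1.2 1|0.1 2|0.4 3|-0.2 4|0.3
```

The whitespace between tokens is what the whitespace tokenizer splits on, and the | character separates each term from its payload.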

Scoring

This plugin provides a native script, payload_vector_score, for use in function_score queries.

The script computes the dot product between the query vector and the document vector or, if the cosine parameter is set to true, their cosine similarity. The dot product, in pseudo-code:

score = 0
for (i : vector_indices_terms) {
    payload = indexTermField(i).getPayload()   // stored vector value at index i
    score += payload * queryVector(i)
}
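The same computation can be sketched in plain Python to check expected scores offline. This is an illustration only, not the plugin's implementation: the function names are hypothetical, it works on the indexed string rather than on Lucene payloads, and it uses double precision, so results may differ from the plugin's in the last digits.

```python
import math


def parse_payloads(text):
    """Parse 'index|value' tokens into a {index: value} dict."""
    return {int(i): float(v) for i, v in (t.split("|") for t in text.split())}


def payload_vector_score(doc_field, query_vector, cosine=False):
    """Dot product of document and query vectors; cosine similarity if cosine=True."""
    payloads = parse_payloads(doc_field)
    dot = sum(value * query_vector[i] for i, value in payloads.items())
    if not cosine:
        return dot
    # Normalise by both vector lengths (assumes neither vector is all zeros).
    doc_norm = math.sqrt(sum(v * v for v in payloads.values()))
    query_norm = math.sqrt(sum(q * q for q in query_vector))
    return dot / (doc_norm * query_norm)


query = [0.1, 2.3, -1.6, 0.7, -1.3]
print(payload_vector_score("0|-0.5 1|1.6 2|1.1 3|0.9 4|0.7", query, cosine=True))
```

For this input the result is approximately 0.2176, in line with the score reported for document 3 in the scoring example further down.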

Plugin installation

Targets Elasticsearch 5.4.0 and Java 1.8.

Simple installation

ELASTIC_HOME/bin/elasticsearch-plugin install https://github.com/MLnick/elasticsearch-vector-scoring/releases/download/v5.4.0/elasticsearch-vector-scoring-5.4.0.zip

Build from source

Build: mvn package
Install the plugin in Elasticsearch (stop Elasticsearch first): ELASTIC_HOME/bin/elasticsearch-plugin install file:///PROJECT_HOME/target/releases/elasticsearch-vector-scoring-5.4.0.zip

Start Elasticsearch: ELASTIC_HOME/bin/elasticsearch. You should see the plugin registered at Elasticsearch startup:

...
[2017-03-29T13:46:57,804][INFO ][o.e.p.PluginsService     ] [2Zs8kW3] loaded plugin [elasticsearch-vector-scoring]
...

Example usage

Below are examples illustrating basic usage. For a more complete usage example, including training a recommender model with Apache Spark, see the Elasticsearch Spark Recommender on IBM Code.

Index setup

curl -s -XPUT 'http://localhost:9200/test?pretty' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "payload_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": "delimited_payload_filter"
                }
            }
        }
    }
}'

curl -s -XPUT 'http://localhost:9200/test/_mapping/movies?pretty' -d '
{
    "movies": {
        "properties": {
            "@model_factor": {
                "type": "text",
                "term_vector": "with_positions_offsets_payloads",
                "analyzer": "payload_analyzer"
            }
        }
    }
}'

curl -s -XPUT 'http://localhost:9200/test/movies/1?pretty' -d '
{
    "@model_factor":"0|1.2 1|0.1 2|0.4 3|-0.2 4|0.3",
    "name": "Test 1"
}'

curl -s -XPUT 'http://localhost:9200/test/movies/2?pretty' -d '
{
    "@model_factor":"0|0.1 1|2.3 2|-1.6 3|0.7 4|-1.3",
    "name": "Test 2"
}'

curl -s -XPUT 'http://localhost:9200/test/movies/3?pretty' -d '
{
    "@model_factor":"0|-0.5 1|1.6 2|1.1 3|0.9 4|0.7",
    "name": "Test 3"
}'

curl -s -XGET 'http://localhost:9200/test/movies/1/_termvector?pretty' -d '
{
  "fields" : ["@model_factor"],
  "payloads" : true,
  "positions" : true
}'
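The term vector request above can be used to check that the payloads were stored. A hedged sketch for reading them back in Python follows. It assumes that the term vectors response returns each payload as a base64 string and that the delimited payload filter stored each value with its default float encoding (4 bytes, big-endian); neither detail is stated in this README. The helper name decode_float_payload is hypothetical.

```python
import base64
import struct


def decode_float_payload(payload_b64):
    """Decode a base64 payload assumed to hold one 4-byte big-endian float."""
    raw = base64.b64decode(payload_b64)
    (value,) = struct.unpack(">f", raw)
    return value


# Round trip for illustration: encode 1.2 under the same assumptions, then decode it.
sample = base64.b64encode(struct.pack(">f", 1.2)).decode("ascii")
print(sample, decode_float_payload(sample))
```

Because the value is stored in single precision, the decoded number is close to, but not exactly, 1.2.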

Scoring example

curl -s -XPOST 'http://localhost:9200/test/movies/_search?pretty' -d '
{
    "query": {
        "function_score": {
            "query" : {
                "query_string": {
                    "query": "*"
                }
            },
            "script_score": {
                "script": {
                    "inline": "payload_vector_score",
                    "lang": "native",
                    "params": {
                        "field": "@model_factor",
                        "vector": [0.1, 2.3, -1.6, 0.7, -1.3],
                        "cosine": true
                    }
                }
            },
            "boost_mode": "replace"
        }
    }
}'

This query returns results sorted by cosine similarity to the query vector. Because the query vector here is the factor vector of document 2, that document itself is returned first. For "similar item" style recommendations, you can filter the query item out of the returned results.

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.99999994,
    "hits" : [ {
      "_index" : "test",
      "_type" : "movies",
      "_id" : "2",
      "_score" : 0.99999994,
      "_source" : {
        "@model_factor" : "0|0.1 1|2.3 2|-1.6 3|0.7 4|-1.3",
        "name" : "Test 2"
      }
    }, {
      "_index" : "test",
      "_type" : "movies",
      "_id" : "3",
      "_score" : 0.2175577,
      "_source" : {
        "@model_factor" : "0|-0.5 1|1.6 2|1.1 3|0.9 4|0.7",
        "name" : "Test 3"
      }
    }, {
      "_index" : "test",
      "_type" : "movies",
      "_id" : "1",
      "_score" : -0.19618797,
      "_source" : {
        "@model_factor" : "0|1.2 1|0.1 2|0.4 3|-0.2 4|0.3",
        "name" : "Test 1"
      }
    } ]
  }
}
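One way to drop the query item from "similar item" results is to exclude its ID in the inner query. The sketch below builds such a request body in Python. The must_not / ids clause is a suggestion using standard Elasticsearch query DSL rather than something this README prescribes, and the helper name similar_items_query is hypothetical; the script parameters are copied from the scoring example above.

```python
import json


def similar_items_query(item_id, vector, field="@model_factor"):
    """Build a function_score search body that excludes the query item by ID."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": {"match_all": {}},
                        # Assumption: exclude the query item with an ids filter.
                        "must_not": {"ids": {"values": [item_id]}},
                    }
                },
                "script_score": {
                    "script": {
                        "inline": "payload_vector_score",
                        "lang": "native",
                        "params": {"field": field, "vector": vector, "cosine": True},
                    }
                },
                "boost_mode": "replace",
            }
        }
    }


body = similar_items_query("2", [0.1, 2.3, -1.6, 0.7, -1.3])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same _search request used in the scoring example.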

TODO

Tests
