The Drupal Connector fetches all the content that is available on the URL provided from Drupal Module. Using an algorithm that takes all the links from every page, all the content from those pages is going to be parsed and normalised in a Fusion 5 Server instance, using the SDK connector.
This is a community supported Fusion connector.
The System Diagram presents the 3 components and how are connected. Fusion is the component where the Java Connector will be uploaded as a plugin. This connector plugin will then be used as a datasource in Index Workbench. One of the properties that this connector expects is a URL coming from Drupal Module, from where all the content will be taken recursively and get indexed.
git clone https://github.com/lucidworks/drupal-connector.git
cd drupal8
./gradlew clean build assemblePlugin
This produces the zip file, named drupal8.zip
, located in the build
directory.
This artifact is now ready to be uploaded directly into Fusion Server as a connector plugin.
This connector is using the connector-plugin-sdk
version 2.0.1
which is compatible with Fusion Server 5.
- Drupal URL - the link from where this connector takes all the content.
- Username - the username used to login into drupal to be able to fetch a specific type of content. There are different roles for users defined in that module.
- Password
- Login Path - the path used to the login request -
defaultValue = "/user/login"
- Logout Path - the path used to the logout request -
defaultValue = "/user/logout"
- Entry Path - this entry indicates the page from where fetching the content begins -
defaultValue = "/en/fusion"
Plugin-Id: com.lucidworks.fusion.connector
Plugin-Type: connector
Plugin-Provider: Lucidworks
Plugin-Version: 1.0-SNAPSHOT
Plugin-Connectors-SDK-Version: 2.0.1
Plugin-Class: com.lucidworks.fusion.connector.ConnectorPlugin
The JsonContentFetcher class provides methods that define how data is fetched and indexed for Fusion Server. The data is fetched from Drupal using HttpUrlConnection to call the request. But before the actual request is done to get all the content a login request is needed. There are different types of users that can see the entire content or just a particular part from it. From the login response the header Set-Cookie is taken and used as a header for the next requests.
The DrupalContentCrawler is the class where the data for indexing is resolved. The startCrawling function will do a GET request to all URLs saved in drupalUrls and prepare the next step of execution. All the content is saved in a map topLevelJsonapiMap. This process is running until drupalUrls has values. If you need to check if the process is done you can check the value of processFinished flag.
The content from Drupal URL has a JSON:API format.
JSON:API is a specification for how a client should request the resources to be fetched or modified, and how a server should respond to those requests. JSON:API is designed to minimize both the number of requests and the amount of data transmitted between clients and servers. This efficiency is achieved without compromising readability, flexibility, or discoverability.
JSON:API requires use of the JSON:API media type application/vnd.api+json
for exchanging data.
The most important dependency is the SDK connector. The SDK connector used in this project can be found here.
The HttpUrlConnection class allows us to perform basic HTTP requests without the use of any additional libraries. All the classes that are needed are contained in the java.net package.
Lombok is a java library that automatically plugs into your editor and build tools and replaces using annotations most of the code regarding getters, setters and even constructors.