Graph neural network algorithms applied to FitLayout artifacts

FitLayout/graphlearn

FitLayout - GNN Learning Algorithms

(c) 2024 Radek Burget ([email protected])

Experimental code for the recognition of specific document parts using Graph Neural Networks.

This project demonstrates the use of FitLayout as a data source for training Graph Neural Networks in Python. It provides a sample implementation of the following components:

The creategraph.py script shows a sample usage of the implemented classes with PyTorch Geometric.
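PyTorch Geometric describes graph connectivity as an edge index in COO format, that is, two parallel rows of source and target node indices. As a rough illustration of what such an input looks like, the sketch below converts a toy parent/child tree into that layout using plain Python lists. The function name tree_to_edge_index and the parent_of mapping are hypothetical and are not taken from this repository; the actual classes may build their graphs differently.

```python
def tree_to_edge_index(parent_of):
    """Convert a tree given as {child_index: parent_index} into COO edge lists.

    Hypothetical helper for illustration only; it does not use FitLayout data.
    """
    sources, targets = [], []
    for child, parent in sorted(parent_of.items()):
        # Add both directions so that information can flow up and down the tree.
        sources += [parent, child]
        targets += [child, parent]
    return [sources, targets]


# Toy tree: node 0 is the root, nodes 1 and 2 are its children.
edge_index = tree_to_edge_index({1: 0, 2: 0})
```

The resulting nested list has two rows and one column per directed edge, so it can be converted to an integer tensor and used as the edge_index of a PyTorch Geometric Data object.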

Preparing the Source FitLayout artifact repository

This code assumes an existing FitLayout artifact repository that contains the rendered pages (Page artifacts) and the AreaTree artifacts derived from them. The repository must be accessible through the REST API provided by an instance of the FitLayoutWeb server.

Requirements

The server and CLI scripts below all require Docker to be installed on the system.

Artifact preparation

First, set the FL_STORAGE environment variable to the path where the RDF artifacts will be stored, e.g.:

export FL_STORAGE="$HOME/.fitlayout/storage-demo"

The folder will be created automatically if it does not already exist.
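If a Python script needs to locate the same storage folder, the lookup could be sketched as follows. This is only an illustration: the function name resolve_storage_dir and the fallback default (copied from the example above) are assumptions, and the repository's own scripts may not read FL_STORAGE at all.

```python
import os
from pathlib import Path


def resolve_storage_dir(env=None, default="~/.fitlayout/storage-demo"):
    """Return the folder named by FL_STORAGE, creating it if it is missing.

    Hypothetical helper: `default` is an assumed fallback, not a FitLayout setting.
    """
    env = os.environ if env is None else env
    raw = env.get("FL_STORAGE") or default
    path = Path(raw).expanduser()
    # Mirror the behaviour described above: create the folder when absent.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Passing a dictionary as env makes the helper easy to try out without changing the real environment.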

The artifacts in the RDF storage can be prepared either through the command-line interface (CLI) or through an interactive GUI. The CLI is provided by the cli/fitlayout.sh script. For example, the following command renders a page, stores it, creates the corresponding AreaTree, and stores that as well:

./fitlayout.sh \
    USE local \
    RENDER -b puppeteer https://cssbox.sourceforge.net \
    STORE \
    SEGMENT -m simple \
    STORE

See the FitLayout Wiki for the CLI usage instructions.

Alternatively, the GUI browser may be used to create the artifacts interactively. See the server/local folder for instructions on how to run the local server with a web GUI and open the GUI in your browser. Then use the Render tab of the GUI to render the pages, followed by the Segmentation tab to create the AreaTree using the Simple area tree construction service.

Note: To avoid conflicts, the local server instance and the CLI tool should not be used simultaneously on the same storage folder.

Running the server

The server is used as the data source for the Python scripts. Again, the FL_STORAGE environment variable should be set to the path where the RDF artifacts are stored, e.g.:

export FL_STORAGE="$HOME/.fitlayout/storage-demo"

For accessing the repository on a local machine, the local GUI browser may be used as mentioned above. See the server/local folder for instructions on how to run the local server. It is not necessary to open the GUI in a browser, since the Python scripts communicate with the server directly.

Alternatively, a standalone server with no GUI may be used. See the server/standalone folder for details.

The URL of the running server must be configured in the src/config.py configuration script. In both cases, the servers can run on a local machine (use localhost as the hostname) or a remote server (use the hostname of the server).
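The difference between the local and remote setups comes down to the hostname part of the URL. The sketch below shows one way such a URL could be composed. It is not the content of src/config.py: the function name make_server_url and its parameters are hypothetical, and the port and path must be taken from the server instructions in the server folder.

```python
from urllib.parse import urlunsplit


def make_server_url(hostname, port, path="", scheme="http"):
    """Compose a server base URL from its parts.

    Hypothetical helper; port and path depend on how the server was started.
    """
    netloc = f"{hostname}:{port}"
    # urlunsplit takes (scheme, netloc, path, query, fragment).
    return urlunsplit((scheme, netloc, path, "", ""))
```

For a server on the same machine, hostname would be "localhost"; for a remote one, it would be the hostname of that server.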

The src/list_artifacts.py script may be used for checking the connection and listing all the AreaTree artifacts available in the repository.
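Before running that script, it can help to confirm that something is listening at the configured address at all. The following sketch performs a plain TCP connection test using only the Python standard library. It does not call the FitLayout REST API and is not a replacement for src/list_artifacts.py; the function name is_server_reachable is hypothetical.

```python
import socket
from urllib.parse import urlsplit


def is_server_reachable(server_url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper; it checks reachability only, not the REST API itself.
    """
    parts = urlsplit(server_url)
    if not parts.hostname:
        raise ValueError(f"No hostname in URL: {server_url!r}")
    # Fall back to the scheme's standard port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result usually means that the server is not running, that the hostname or port in the configuration is wrong, or that a firewall is blocking the connection.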
