Zachary Hills edited this page Aug 31, 2020 · 43 revisions

Citation Galaxies Wiki Home

Summary

This wiki is for developers contributing to this project. It breaks down the project internals: the classes used, the file structure, and similar information.

Built With

Debugging

Pre-configured debugging configurations for VSCode come with the project.

Key Features

Concept

Citation Galaxy's objective is to provide a text analysis tool geared specifically toward the work of bibliometricians, who study how, where, and when citations occur in academic texts. Traditionally they rely on Python and various text-parsing scripts to accomplish their goals. Citation Galaxy is instead a visual application: the user interacts with the UI to build complex "rules". A rule lets the user create logical gates that the backend uses to find occurrences of the rule in the Pubmed corpus. For example, a simple rule can consist of a term, such as heart, and a range that denotes how far the term heart may be from a citation. The range is measured in sentences: with the rule 0<- heart ->0, the term heart must occur within a sentence that also contains a citation.
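As a rough illustration of the range semantics described above, the sketch below checks whether a term falls within a sentence window that also contains a citation. It is not the project's backend code: the function name `matchesRule`, the sentence shape `{ text, hasCitation }`, and the reading of the two numbers as "sentences to the left / right of the term" are assumptions made for this example.

```javascript
// Hypothetical sketch: each sentence is { text: string, hasCitation: boolean }.
// rangeLeft / rangeRight are the numbers on either side of the term in a rule
// such as 0<- heart ->0 (this interpretation is an assumption).
function matchesRule(sentences, term, rangeLeft, rangeRight) {
  const needle = term.toLowerCase();
  return sentences.some((sentence, i) => {
    // Only sentences that contain the term can anchor a match.
    if (!sentence.text.toLowerCase().includes(needle)) return false;
    // Look for a citation inside the allowed sentence window.
    const start = Math.max(0, i - rangeLeft);
    const end = Math.min(sentences.length - 1, i + rangeRight);
    for (let j = start; j <= end; j++) {
      if (sentences[j].hasCitation) return true;
    }
    return false;
  });
}

const paragraph = [
  { text: "Heart disease is common.", hasCitation: false },
  { text: "It was studied previously [1].", hasCitation: true },
];

console.log(matchesRule(paragraph, "heart", 0, 0)); // false
console.log(matchesRule(paragraph, "heart", 0, 1)); // true
```

With a range of 0 on both sides, the term and the citation must share a sentence; widening the right-hand range to 1 lets the citation sit in the following sentence.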

<Headers related to features>


Project Structure

node

The node folder is your working directory for the project.

  • backend The backend folder is your Node.js server code
    index.js Main File

    This file sets up all the routing, dotEnv, middleware (express-session), and Socket.IO.

    Notable Info:

    • SocketManager is used for two-way communication to update progress bars.
    • Express-Session data is stored in the Postgres DB, not in memory. Do not change this; it is necessary for horizontal scaling.
    • The POST body limit is set to 50MB because we ran into issues with maxing out the GET parameter limit. Should you run into that issue, convert your GET request to a POST.
    userRoutes.js

    This file contains all of the user routes

    Notable Info:

    • User creation uses email verification.
    • Unverified users have their own table. If a bug occurs in the verification process, you will have to fix the unverified table.
    • We recommend using pool.query instead of creating specific clients.
    api.js

    This file contains all of the api routes

    Notable Info:

    • Heavy dependency on dataLayer.js
    • The typical pattern for determining the selected database is to have the frontend send a boolean (isPubmed) stating which database it wants to query.
    • DATA_LAYER is the singleton class used to access the database.
    • If this project is pushed toward more horizontal scaling, decoupling the API routes and adding a business layer will make it extremely scalable.
  • frontend
    js
    front.js

    This file is the legacy UI. It is still relevant in the current application.

    Notable Info:

    • The home page's code resides in this file
    • This file is global
    • The grid, Erudit timeline, and search code reside in this file
    paper.js

    This file contains all of the code for the paper view page.

    Notable Info:

    • The page can be prone to locking up because it needs a large amount of memory
    • The filter only applies to the in-focus tab
    newfront.js

    This file is the new UI code.

    Notable Info:

    • The page can be prone to locking up because it needs a large amount of memory
    • The filter only applies to the in-focus tab
    manager.js
    dbManager.js
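The userRoutes.js note above recommends pool.query over creating specific clients. A minimal sketch of that pattern follows. It assumes a node-postgres (`pg`) style pool; the `findUserByEmail` helper, the `users` table, and its `email` column are hypothetical names used only for illustration. The pool is passed in as an argument so the example runs without a database.

```javascript
// Hypothetical helper: `pool` is expected to behave like a node-postgres Pool,
// whose query(text, values) resolves to an object with a `rows` array.
async function findUserByEmail(pool, email) {
  // pool.query acquires a client, runs the query, and releases the client,
  // so there is no connect()/release() pair to forget.
  const result = await pool.query(
    "SELECT * FROM users WHERE email = $1", // table and column are assumptions
    [email]
  );
  return result.rows.length > 0 ? result.rows[0] : null;
}

// Stand-in pool so the sketch can run without Postgres.
const fakePool = {
  calls: [],
  query(text, values) {
    this.calls.push({ text, values });
    return Promise.resolve({ rows: [{ id: 1, email: values[0] }] });
  },
};

findUserByEmail(fakePool, "dev@example.com").then((user) => {
  console.log(user.email);
});
```

Passing the value through the `$1` placeholder rather than concatenating it into the SQL string keeps the query parameterized.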
tools

Code Supporting Key Features

Main View

Grid

Frontend

//front.js
prepContainers(); // this function is responsible for clearing the grid and prepping it for new data
drawAllYears(data); // this function is responsible for drawing the new data in the grid

Backend

const getGridVisualization = async (req, res); //This function can be used to get the grid view if you already have data in the user table (very useful for making new features) (expected input: const requiredInfo = { bins: {}, years: [], isPubmed: false };)
const ruleSearch = async (req, res); //This is the rule search api, which changes the grid and updates the user table (expected input: const requireInfo = { bins: {}, years: [], isPubmed: false };)
const search = async function (req, res); //This is the search api, which changes the grid and updates the user table (expected input: const requireInfo = { rule: {}, bins: {}, years: [], term: "", isPubmed: false };)
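Each handler above documents the request shape it expects as a template object. How the backend actually checks a request against that template is not shown on this page, so the following is only a sketch of one way such a check could work; `missingFields` is a hypothetical helper, not a function from the project.

```javascript
// Template copied from the getGridVisualization description above.
const requiredInfo = { bins: {}, years: [], isPubmed: false };

// Hypothetical helper: returns the names of keys that are missing from `body`
// or whose basic type (array, object, boolean, ...) differs from the template's.
function missingFields(template, body) {
  const kind = (value) => (Array.isArray(value) ? "array" : typeof value);
  return Object.keys(template).filter(
    (key) => !(key in body) || kind(body[key]) !== kind(template[key])
  );
}

console.log(missingFields(requiredInfo, { bins: {}, years: [2001], isPubmed: true })); // []
console.log(missingFields(requiredInfo, { bins: {}, years: "2001" })); // [ 'years', 'isPubmed' ]
```

An empty result means the body carries every field of the template, including the isPubmed flag that selects the database.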

Rules

Frontend

//manager.js
loadData(
  table_name,
  params,
  draw_table = true,
  callback = undefined,
  new_load = true
); //This function is called after the rule query is returned from the server. It calls populateTable.

populateTable(
  signals,
  name,
  table,
  links,
  actions,
  schema,
  external_data,
  aliases
); //This function is responsible for drawing the rule tables

showAddRow(); //this is the function that appends another set of rules to the rule table

Backend

const deleteRuleSet = async (req, res);//deletes rulesets
const updateRuleSet = async (req, res);//updates rulesets 
const addRuleSet = async (req, res);//add rule sets to user table
const loadRuleSets = async (req, res);//initial load of the rule sets
const updateRule = async (req, res);//update specific rules
const addRule = async (req, res);//add rules
const loadRules = async (req, res);//load rules initially

Tool Bar

Frontend

//front.js
searchForQuery(); //this is called when the basic search is clicked on the UI
ruleSearch(); //this is called when the rule search is clicked on the UI
//newfront.js
spinIconDb(); //this is called when the db icon is clicked. This sets the current database.
snapshot(); //this is called when the camera icon is clicked.
//dbManager.js
//this contains the current database state. It is where all of the front end differences for pubmed and erudit reside. For example, the citation icon in the range is used for pubmed while in Erudit a T icon is used.
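To make the idea of per-database frontend state concrete, here is a small sketch. It is not the contents of dbManager.js: the `DB_SETTINGS` object, the `rangeIcon` field, and the `toggleDatabase` function are hypothetical, and only the Pubmed/Erudit icon difference and the isPubmed flag come from the notes above.

```javascript
// Hypothetical per-database settings; only the icon difference and the
// isPubmed flag are taken from the wiki, the structure is an assumption.
const DB_SETTINGS = {
  pubmed: { isPubmed: true, rangeIcon: "citation" },
  erudit: { isPubmed: false, rangeIcon: "T" },
};

let currentDb = "pubmed";

// Switches to the other database and returns its settings, roughly what a
// click on the db icon would need to do.
function toggleDatabase() {
  currentDb = currentDb === "pubmed" ? "erudit" : "pubmed";
  return DB_SETTINGS[currentDb];
}

const settings = toggleDatabase();
console.log(currentDb, settings.rangeIcon, settings.isPubmed);
```

Keeping the differences in one lookup object means the rest of the frontend can read `settings.isPubmed` when building API requests instead of branching on the database name.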

Backend

const ruleSearch = async (req, res);//searches the current database for the supplied rules (expected input: const requireInfo = { bins: {},years: [], isPubmed: false,};)
const search = async function (req, res);//searches the current database for the term and range that is specified (expected input: const requireInfo = { rule: {}, bins: {}, years: [], term: "", isPubmed: false,};)
const submitSnapshot = async (req, res); //takes the snapshot and puts it in the snapshot_log table (expected input: const requiredInfo = { selection: [], filters: [], img: "", info: {} }; )

Paper View

Paper Overlay

Frontend

drawPaperList(
  papers,
  all_max,
  sentenceHits,
  ruleHits,
  local_norm = false
);

drawPapersByIndex(results, local_norm = false);

Backend

const getPapers = async (req, res); //gets the papers given the current selection in the grid view (expected input: const requiredInfo = { selections: [], rangeLeft: 0, rangeRight: 0, isPubmed: false,};)
const getPaper = async (req, res); //gets a specific paper, this is used when a user clicks a paper and the overlay pops up with all the information (expected input: let requiredInfo = { paper_id: 0, isPubmed: false };)
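The expected-input objects above map naturally onto a JSON POST body, which also follows the index.js advice to switch to POST once GET parameters become too large. The sketch below only builds the options object for the standard `fetch` API; `buildPostOptions` is a hypothetical helper, the payload values are placeholders, and the route URL is deliberately left out because this page does not list it.

```javascript
// Hypothetical helper: wraps a payload in fetch() options for a JSON POST.
function buildPostOptions(payload) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Payload shaped like the getPapers requiredInfo object; values are placeholders.
const options = buildPostOptions({
  selections: [],
  rangeLeft: 0,
  rangeRight: 0,
  isPubmed: true,
});

console.log(options.method, options.body);
```

The result can be passed as the second argument to `fetch`, with the first argument being whichever route the backend registers for getPapers.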

Paper Filters

Frontend

//newfront.js
submitFilters(); //called on submit, invokes getFilteredPapers()
addRowToFilterForm(); //called when filter is added
getFilterSuggestions(currentValue, filter, element); //called whenever a filter field's value changes. It is essentially the auto-complete
getFilteredPapers();//gets the data from the server

Backend

const getFilterNames = async (req, res);//This function is used for auto-complete in the filter section of the paper view page (expected input: const requiredInfo = { filter: "", currentValue: "", ids: [], isPubmed: false, year: 0, }; )
const getFilteredIDs = async (req, res);//This function will give back the ids that meet the fields that are supplied (expected input: const requiredInfo = { fields: [], ids: [], isPubmed: false, year: 0 };) 

Loading Data

The Pubmed data can be loaded using pubmed_data_loader.py. The data loader requires a path argument when it is called.

Example

[zhills@gpu04 Citation-Galaxies]$ nohup python -u pubmed_data_loader.py /Pubmed/non-commercial/ &

Make sure that the path points to either the commercial or the non-commercial folder in the Pubmed dataset. These folders exist in the bulk download of Pubmed. nohup ensures the script keeps running after you close the terminal, and the & at the end runs it in the background. The -u flag disables Python's output buffering so that print output and errors are written to the nohup.out file promptly. You can track the progress of the script in the nohup.out file.

Database Breakdown

User Tables

Pubmed Tables

Erudit Tables

Common Developer Questions