
cloudmesh-ansible/example-project-nist-fingerprint-matching


This example shows how to deploy the NIST Fingerprint dataset (Special Database 4) and tools (NBIS) to a cluster. Additionally, it demonstrates how to use Apache Spark to perform fingerprint matching with the NBIS tools, store the results in HBase, and query the results with Apache Drill.

Work-in-progress

Fingerprint recognition is the automated method of verifying a match between two fingerprints; it is used to identify individuals and verify their identity. Fingerprints (Figure 1) are the most widely used biometric for identifying individuals.

Figure 1: Example fingerprints

Automated fingerprint matching generally requires detecting fingerprint features (aggregate characteristics of ridges, and minutia points) and then applying a fingerprint matching algorithm, which can perform both one-to-one and one-to-many matching operations. Based on the number of matches, a proximity score (distance or similarity) can be calculated.
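As a purely conceptual illustration (not code from this repository), one-to-many matching can be thought of as scoring a probe print against every gallery print and ranking the gallery by score. In the Scala sketch below, extractMinutiae and matchScore are hypothetical placeholders standing in for the feature-detection and matching steps that the NBIS tools provide.

// Conceptual sketch only; extractMinutiae and matchScore are hypothetical
// placeholders for the NBIS feature-detection and matching programs.
object MatchingSketch {
  case class Fingerprint(id: String, image: Array[Byte])
  case class Minutiae(points: Seq[(Int, Int, Double)]) // x, y, ridge direction

  // Placeholders: in this project the real work is done by the NBIS binaries.
  def extractMinutiae(image: Array[Byte]): Minutiae = Minutiae(Seq.empty)
  def matchScore(a: Minutiae, b: Minutiae): Int = 0

  // One-to-many matching: score the probe against every gallery print and
  // rank gallery prints by similarity (a higher score means a closer match).
  def identify(probe: Fingerprint, gallery: Seq[Fingerprint]): Seq[(String, Int)] = {
    val p = extractMinutiae(probe.image)
    gallery
      .map(g => (g.id, matchScore(p, extractMinutiae(g.image))))
      .sortBy { case (_, score) => -score }
  }
}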

For this work we will use the following NBIS algorithms:

  • MINDTCT for ridge detection
  • BOZORTH3 for comparing probe and gallery images

We use the following NIST dataset for the study:

  • Special Database 4: NIST 8-bit Gray Scale Images of Fingerprint Image Groups (FIGS)

The example addresses two questions:

  1. Can fingerprint images from a probe set be matched against a gallery set, reporting match scores?
  2. What is the most efficient, high-throughput way to match fingerprint images from a probe set against a large fingerprint gallery set?
The solution uses the following technologies:

  • Apache Hadoop (with YARN)
  • Apache Spark
  • Apache HBase
  • Apache Drill
  • Scala
The overall workflow is:

  1. Launch a virtual cluster
  2. Deploy the stack
    1. Big Data Stack
    2. deploy.yml
  3. Prepare the dataset
    1. Add the dataset to HBase (Scala, Spark, HBase)
    2. Partition the dataset into "probe" and "gallery" sets
  4. Run the analysis
    1. Load the probe set
    2. Load the gallery set
    3. Compare each image in "probe" to "gallery"
    4. Store results in HBase
  5. Use Drill to query

In order to run this example, you need to have the following on your system:

  1. Python 2.7
  2. Pip
  3. Virtualenv
  4. Git

Additionally, you should have access to a cloud provider (such as OpenStack or Amazon Web Services). Any instances need to be accessible via SSH and have Python 2.7 installed.


Note: Your controller node needs to be able to run Ansible.

If you want to get started quickly, here is what you need to do.

Note: We assume the login user is ubuntu; you may need to adjust the ansible commands to accommodate a different user name.

  1. Have an account on github.com

  2. Upload your public SSH key to github.com

  3. Clone this repository:

    $ git clone --recursive git@github.com:cloudmesh/example-project-nist-fingerprint-matching
    

    IMPORTANT: make sure to include the --recursive flag, or you will encounter errors during deployment.

  4. Create a virtual environment and install the dependencies:

    $ virtualenv venv
    $ source venv/bin/activate
    $ pip install -r big-data-stack/requirements.txt
    
  5. Start a virtual cluster (Ubuntu 14.04) with at least three nodes and obtain the IP addresses. We assume that the cluster is homogeneous.

  6. In the big-data-stack directory, generate the ansible files using mk-inventory:

    $ python mk-inventory -n mycluster 192.168.1.100  192.168.1.101 192.168.1.102
    
  7. Make sure each node is accessible by ansible:

    $ ansible all -o -m ping
    
  8. Deploy the stack (~ 20 minutes):

    $ ansible-playbook play-hadoop.yml addons/{zookeeper,spark,hbase,drill}.yml -e drill_with_hbase=True
    
  9. Deploy the dataset and NBIS software (~ 10 minutes):

    $ ansible-playbook ../{software,dataset}.yml
    
  10. Login to the first node and switch to the hadoop user:

    $ ssh ubuntu@192.168.1.100
    $ sudo su - hadoop
    
  11. Load the images data into HBase (~ 2 minutes):

    $ time spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --driver-class-path $(hbase classpath) \
        --class LoadData \
        target/scala-2.10/NBIS-assembly-1.0.jar \
        /tmp/nist/NISTSpecialDatabase4GrayScaleImagesofFIGS/sd04/sd04_md5.lst
    
  12. Run MINDTCT for ridge detection (~ 20 minutes):

    $ time spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --driver-class-path $(hbase classpath) \
        --class RunMindtct \
        target/scala-2.10/NBIS-assembly-1.0.jar
    
  13. Sample the images to select subsets as the probe and gallery images. In this case the probe set is 0.1% and the gallery set is 1% (~ 2 minutes):

    $ time spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --driver-class-path $(hbase classpath) \
        --class RunGroup \
        target/scala-2.10/NBIS-assembly-1.0.jar \
        probe 0.001 \
        gallery 0.01
    
  14. Match the probe set to the gallery set (~ 2 minutes):

    $ time spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --driver-class-path $(hbase classpath) \
        --class RunBOZORTH3 \
        target/scala-2.10/NBIS-assembly-1.0.jar \
        probe gallery
    
  15. Use Drill to query:

    $ sqlline -u "jdbc:drill:zk=mycluster0,mycluster1,mycluster2;schema=hbase"
    
    > use hbase;
    > SELECT
        CONVERT_FROM(Bozorth3.Bozorth3.probeId, 'UTF8') probe,
        CONVERT_FROM(Bozorth3.Bozorth3.galleryId, 'UTF8') gallery,
        CONVERT_FROM(Bozorth3.Bozorth3.score, 'INT_BE') score
      FROM Bozorth3
      ORDER BY score DESC
      LIMIT 10;
    
  1. Clone this repository:

    git clone --recursive git@github.com:cloudmesh/example-project-nist-fingerprint-matching
    
  2. Create a virtualenv and activate it:

    virtualenv venv
    source venv/bin/activate
    
  3. Install the requirements:

    pip install -r big-data-stack/requirements.txt
    

These instructions are for manually building and bundling the source code for loading the images into HBase and running the analysis:

$ sbt package
$ sbt assembly

After building, the target jarfile to submit is located at:

target/scala-2.10/NBIS-assembly-1.0.jar

When submitting, you need to tell Spark to provide HBase in the execution classpath using:

--driver-class-path $(hbase classpath)

There are four components that are run with Spark:

  1. Loading the image data into HBase
  2. Running MINDTCT for ridge detection
  3. Partitioning into probe and gallery sets
  4. Running BOZORTH3 for comparing probe and gallery sets

In the command below, $MAIN_CLASS and $MAIN_CLASS_ARGS select which component to run and the arguments passed to it. The possible configurations are:

  • MAIN_CLASS=LoadData

    This runs the component that loads the data from local filesystem into HBase.

    Arguments: path (required): the path to the checksum file. This is the file from which the list of images and their metadata files is extracted. For example:

    MAIN_CLASS_ARGS=/tmp/nist/NISTSpecialDatabase4GrayScaleImagesofFIGS/sd04/sd04_md5.lst
    
  • MAIN_CLASS=RunMindtct

    This runs the component that finds the ridges in the images by forking the MINDTCT program for each image and storing the results in HBase; a rough sketch of this pattern appears after the submission commands below.

    Arguments: None

  • MAIN_CLASS=RunGroup

    This subsamples the image database into "probe" and "gallery" sets. You must specify how much of the full dataset to use for each set. For example: 0.1% for "probe" and 1% for "gallery" below:

    MAIN_CLASS_ARGS="probe 0.001 gallery 0.01"
    

    The number of images to be included in a set is given by:

    count = multiplier * total_images

    Special Database 4 contains 4,000 images, for example, so a multiplier of 0.001 selects roughly 4 probe images and 0.01 selects roughly 40 gallery images.
    

    Arguments:

    1. nameProbe (required): the name of the probe set, eg: probe
    2. multiplierProbe (required): subsample the full dataset by this value, eg: 0.001
    3. nameGallery (required): the name of the gallery set, eg: gallery
    4. multiplierGallery (required): subsample the full dataset by this value, eg: 0.01
  • MAIN_CLASS=RunBOZORTH3

    This applies the BOZORTH3 program to the images chosen by the grouping step. You must specify the names of the probe and gallery sets. For example:

    MAIN_CLASS_ARGS="probe gallery"
    

    Arguments:

    1. nameProbe (required): the name of the probe set, eg: probe
    2. nameGallery (required): the name of the gallery set, eg: gallery
To run a component locally, submit the assembled jar with:

spark-submit \
  --driver-class-path $(hbase classpath) \
  --class $MAIN_CLASS \
  target/scala-2.10/NBIS-assembly-1.0.jar \
  $MAIN_CLASS_ARGS

To submit to the YARN cluster, use the same command as the local submission, but add:

--master yarn --deploy-mode cluster
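The repository's Scala sources are the actual implementation of these components. As a rough, stand-alone sketch of the fork-an-external-program pattern that the RunMindtct description above refers to, the following reads image paths from a plain text file rather than from HBase; the input and output paths, the use of command-line arguments, and the text-file input are illustrative assumptions only.

// Illustrative sketch, not the repository's RunMindtct: the real component reads
// images from HBase and writes minutiae back to HBase; here a text file of local
// image paths (readable on every node, e.g. under /tmp/nist) stands in.
import scala.sys.process._
import org.apache.spark.{SparkConf, SparkContext}

object MindtctSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mindtct-sketch"))

    // One image path per line; the paths must exist on every worker node.
    val imagePaths = sc.textFile(args(0))

    val results = imagePaths.map { path =>
      val outPrefix = path.stripSuffix(".png")
      // Fork the NBIS MINDTCT binary: mindtct <image> <output root>.
      // Among its outputs is <output root>.xyt, the minutiae file that
      // BOZORTH3 consumes in the matching step.
      val exitCode = Seq("mindtct", path, outPrefix).!
      (path, exitCode)
    }

    // The real component stores the MINDTCT output in HBase; this sketch only
    // records which images were processed and whether mindtct succeeded.
    results.saveAsTextFile(args(1))
    sc.stop()
  }
}

A RunBOZORTH3-style step would follow the same pattern, forking bozorth3 with a probe .xyt file and a gallery .xyt file and recording the returned match score.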
[1] This overview section was adapted from the NIST Big Data Public Working Group draft "Possible Big Data Use Cases Implementation using NBDRA", authored by Afzal Godil and Wo Chang.
