
Refactor FIPS & unify CLI actions #275

Merged
83 commits merged on Dec 8, 2022
Changes from 1 commit
a900b90
unify CLI actions
adamjanovsky Oct 25, 2022
0e12ba4
merge dataset constructors, from_web_latest code
adamjanovsky Oct 25, 2022
7aff8bf
unify _get_certs_by_name methods
adamjanovsky Oct 25, 2022
d0d9f91
unify get_keywords_df method
adamjanovsky Oct 25, 2022
1b150aa
unify and generalize dataset method get_keywords_df()
adamjanovsky Oct 27, 2022
5110b52
root_dir setter for FIPSDataset
adamjanovsky Oct 27, 2022
39c89c1
WiP: refactor FIPS get_certs_from_web()
adamjanovsky Oct 27, 2022
9433658
implement artifact download FIPS
adamjanovsky Oct 27, 2022
933b469
refactor tests unittest -> pytest
adamjanovsky Nov 4, 2022
80af3a2
add type hint for json serialization
adamjanovsky Nov 9, 2022
6134479
new object to hold auxillary datasets
adamjanovsky Nov 9, 2022
fad4fbf
use temp folders for cc analysis test data
adamjanovsky Nov 9, 2022
e26fb0c
mark further download tests with xfail
adamjanovsky Nov 9, 2022
14c369b
fix xfail marker on cpe_dset_from_web test
adamjanovsky Nov 9, 2022
8482d82
pandas tests, cve_dset, cpe_dset unify json_path approach
adamjanovsky Nov 10, 2022
c6913dc
merge main
adamjanovsky Nov 10, 2022
24f11fb
test maintenance updates
adamjanovsky Nov 10, 2022
9052138
fix paths handling in CPEDataset, CVEDataset
adamjanovsky Nov 11, 2022
67c6295
cleanup path issues
adamjanovsky Nov 11, 2022
1bc1c7d
auxillary dataset processing CC
adamjanovsky Nov 11, 2022
ca5d4c8
fix mypy error in cli.py
adamjanovsky Nov 11, 2022
b41fa12
fix error in mu dset tests
adamjanovsky Nov 11, 2022
d51c65c
FIPS policy pdf convert refactoring
adamjanovsky Nov 11, 2022
f7c5915
cleanup in fips code structure
adamjanovsky Nov 11, 2022
9fbba9e
common interface for Dataset.analyze_certificates()
adamjanovsky Nov 16, 2022
c0ca076
merge main
adamjanovsky Nov 16, 2022
2a694c4
delete plot_graph() of FIPSDataset
adamjanovsky Nov 16, 2022
9462624
analyce_certificate() interface, delete dead code
adamjanovsky Nov 16, 2022
01e4156
FIPSDataset new parsing of html modules
adamjanovsky Nov 17, 2022
315270a
fix tests
adamjanovsky Nov 23, 2022
2871384
refactor algorithm extraction from policy tables
adamjanovsky Nov 23, 2022
048f3f6
delete InternalState.errors of cert objects
adamjanovsky Nov 23, 2022
4e582e1
deduplicate FIPSAlgorithm data structures
adamjanovsky Nov 23, 2022
0568ca4
remove graphviz requirement
adamjanovsky Nov 23, 2022
d7603e1
move AlgorithmDataset to AuxillaryDatasets class
adamjanovsky Nov 23, 2022
35c5734
Refactor FIPSAlgorithm objects
adamjanovsky Nov 25, 2022
25d42fc
update flake8 CI workflow
adamjanovsky Nov 25, 2022
67fc667
update flake8 config
adamjanovsky Nov 25, 2022
97dce48
cleanup
adamjanovsky Nov 25, 2022
4d0ae40
clean-up, update docs, cli
adamjanovsky Nov 25, 2022
cb879f3
fix json objects for fips test
adamjanovsky Nov 29, 2022
5895c85
rename dependency -> references of transitive vulns
adamjanovsky Nov 29, 2022
c6d826c
fips refactor reference computation
adamjanovsky Nov 30, 2022
fc49b6a
implement transitive vuln. search for FIPS
adamjanovsky Nov 30, 2022
cae2dc2
restrict usage of fresh bool param
adamjanovsky Nov 30, 2022
e062f3e
improve dataset processing logging
adamjanovsky Nov 30, 2022
5b0a7cb
fix table extraction from fips policies
adamjanovsky Dec 2, 2022
f681244
fix reference computation fips
adamjanovsky Dec 2, 2022
08ff031
update readme
adamjanovsky Dec 2, 2022
3fbf5f0
random fixes for cc pipeline
adamjanovsky Dec 2, 2022
6953dfb
fix CC notebooks
adamjanovsky Dec 2, 2022
6d7a907
random fixes in FIPS notebooks
adamjanovsky Dec 2, 2022
2f21854
move label studio interface layout file
adamjanovsky Dec 2, 2022
91b0973
update readme
adamjanovsky Dec 2, 2022
4ddae8a
introduce pyupgrade
adamjanovsky Dec 2, 2022
6d66552
bump scipy, dependabot errors on it
adamjanovsky Dec 2, 2022
40206cd
bump pillow lib
adamjanovsky Dec 2, 2022
25dcec9
bump Github action versions
adamjanovsky Dec 2, 2022
ca0c4e2
convert examples to notebooks
adamjanovsky Dec 2, 2022
86a62cb
fips normalize embodiment string
adamjanovsky Dec 2, 2022
6ce7007
unify from __future__ import annotations
adamjanovsky Dec 5, 2022
4e62ae1
Update sec_certs/dataset/common_criteria.py
adamjanovsky Dec 5, 2022
1a3502a
Update sec_certs/dataset/fips.py
adamjanovsky Dec 5, 2022
8f7a14b
entry guard
adamjanovsky Dec 5, 2022
4085c61
revive tests settings
adamjanovsky Dec 5, 2022
b37eaaf
fix here, fix there
adamjanovsky Dec 5, 2022
6c02383
rename dataset of maintenance updates
adamjanovsky Dec 5, 2022
8ac389a
Update sec_certs/dataset/common_criteria.py
adamjanovsky Dec 5, 2022
bc5a532
Update sec_certs/model/cpe_matching.py
adamjanovsky Dec 5, 2022
4712279
chain.from_iterable() now working with generator expessions
adamjanovsky Dec 5, 2022
f14dfe3
fix getitem on fips dataset
adamjanovsky Dec 5, 2022
a1ec986
test config global fixture
adamjanovsky Dec 6, 2022
577300e
add pyupgrade into linter pipeline
adamjanovsky Dec 6, 2022
ed8813e
reimplement dataset serialization constraints
adamjanovsky Dec 7, 2022
29dd48c
delete pp dataset json
adamjanovsky Dec 7, 2022
52bddce
update docs
adamjanovsky Dec 7, 2022
30ef160
attempt to fix pipelines
adamjanovsky Dec 8, 2022
0bcda6b
don't download spacy model test pipeline
adamjanovsky Dec 8, 2022
7710ef8
test pipeline ubuntu 20.04
adamjanovsky Dec 8, 2022
7af69ca
disable CPE from web test
adamjanovsky Dec 8, 2022
be7f6d7
try ubuntu 22.04 test runner
adamjanovsky Dec 8, 2022
7d59063
cli print -> click.echo()
adamjanovsky Dec 8, 2022
4574a3d
FIPSCertificate no longer hashable
adamjanovsky Dec 8, 2022
update readme
adamjanovsky committed Dec 2, 2022
commit 08ff031712b92f59f3dd7cc482fe89562f03cfd2
199 changes: 32 additions & 167 deletions README.md
@@ -2,9 +2,7 @@

![](docs/_static/logo.svg)

Tool for analysis of security certificates and their security targets (Common Criteria, NIST FIPS140-2...).

This project is developed by the [Centre for Research On Cryptography and Security](https://crocs.fi.muni.cz) at Faculty of Informatics, Masaryk University.
A tool for data scraping and analysis of security certificates from Common Criteria and FIPS 140-2/3 frameworks. This project is developed by the [Centre for Research On Cryptography and Security](https://crocs.fi.muni.cz) at Faculty of Informatics, Masaryk University.

[![Website](https://img.shields.io/website?down_color=red&down_message=offline&style=flat-square&up_color=SpringGreen&up_message=online&url=https%3A%2F%2Fseccerts.org)](https://seccerts.org)
[![PyPI](https://img.shields.io/pypi/v/sec-certs?style=flat-square)](https://pypi.org/project/sec-certs/)
@@ -15,192 +13,59 @@ This project is developed by the [Centre for Research On Cryptography and Securi

## Installation

Use Docker with `docker pull seccerts/sec-certs` or just `pip install -U sec-certs`. For more elaborate description, see [docs](https://seccerts.org/docs/installation.html)
Use Docker with `docker pull seccerts/sec-certs` or just `pip install -U sec-certs`. For more elaborate description, see [docs](https://seccerts.org/docs/installation.html).

## Usage (CC)
## Usage

There are two main steps in exploring the world of Common Criteria certificates:
There are two main steps in exploring the world of security certificates:

1. Processing all the certificates
2. Data exploration
1. Data scraping and processing of all the certificates
2. Exploring and analysing the processed data

For the first step, we currently provide a CLI and an already processed fresh snapshot. For the second step, we provide a simple API that can be used directly inside our Jupyter notebook or locally, on your machine.

### Explore data with MyBinder Jupyter notebook
More elaborate usage is described in [docs/quickstart](https://seccerts.org/docs/quickstart.html). Also, see [example notebooks](https://github.com/crocs-muni/sec-certs/tree/main/notebooks/examples) either on GitHub or in the docs. From the docs, you can also run our notebooks in Binder.

Most probably, you don't want to process a fresh snapshot of Common Criteria certificates by yourself. Instead, you can use our results and explore them using the [online Jupyter notebook](https://mybinder.org/v2/gh/crocs-muni/sec-certs/dev?filepath=notebooks%2Fcpe_cve.ipynb).
## Data scraping

### Explore the latest snapshot locally
Run `sec-certs cc all` for Common Criteria processing, `sec-certs fips all` for FIPS 140 processing.

In Python, run
## Data analysis

```python
from sec_certs.dataset.common_criteria import CCDataset
import pandas as pd

dset = CCDataset.from_web_latest() # now you can inspect the object, certificates are held in dset.certs
df = dset.to_pandas() # Or you can transform the object into Pandas dataframe
dset.to_json('./latest_cc_snapshot.json')  # You may want to store the snapshot as json, so that you don't have to download it again
dset = CCDataset.from_json('./latest_cc_snapshot.json')  # you can now load your stored dataset again
```
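The download-once-then-cache pattern from the snippet above can be sketched independently of the library. The following is a minimal illustration using only the standard library; `load_or_fetch` and `fake_fetch` are hypothetical helpers, not part of sec-certs:

```python
import json
from pathlib import Path


def load_or_fetch(cache_path: Path, fetch):
    """Return cached JSON if present, otherwise call fetch() and cache the result.

    Mirrors the snapshot-caching idea above: from_web_latest() is the
    expensive call, to_json()/from_json() are the cache round-trip.
    """
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    data = fetch()  # expensive step, e.g. downloading the snapshot
    cache_path.write_text(json.dumps(data))
    return data


calls = []


def fake_fetch():
    # Stand-in for the expensive download; counts how often it runs
    calls.append(1)
    return {"certs": ["cert-a", "cert-b"]}


cache = Path("./snapshot_cache.json")
first = load_or_fetch(cache, fake_fetch)   # triggers the fetch
second = load_or_fetch(cache, fake_fetch)  # served from disk, no second fetch
cache.unlink()  # clean up the cache file
```

The second call never touches the network, which is exactly why storing the snapshot as JSON is worthwhile for repeated analysis sessions.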

### Process CC data with Python

If you wish to fully process the Common Criteria (CC) data by yourself, you can do that as follows. Running
Without needing to run the whole processing pipeline, you can fetch a fresh snapshot of the dataset and explore it yourself.

```python
cc-certs all --output ./cc_dataset
```
dset = CCDataset.from_web_latest()

will fully process the Common Criteria dataset, which can take up to 6 hours to finish. You can also select only some tasks to run. Calling `cc-cli --help` yields
# Get certificates with some CVE
vulnerable_certs = [x for x in dset if x.heuristics.related_cves]
df_vulnerable = df.loc[~df.related_cves.isna()]

```
Usage: cc_cli.py [OPTIONS]
[all|build|download|convert|analyze|maintenances]...
# Show CVE ids of some vulnerable certificate
print(f"{vulnerable_certs[0].heuristics.related_cves=}")

Specify actions, sequence of one or more strings from the following list:
[all, build, download, convert, analyze] If 'all' is specified, all
actions run against the dataset. Otherwise, only selected actions will run
in the correct order.
# Get certificates from 2015 and newer
df_2015_and_newer = df.loc[df.year_from > 2014]

Options:
-o, --output DIRECTORY Path where the output of the experiment will be
stored. May overwrite existing content.

-c, --config FILE Path to your own config yaml file that will override
the default one.

-i, --input FILE If set, the actions will be performed on a CC
dataset loaded from JSON from the input path.

-s, --silent If set, will not print to stdout
--help Show this message and exit.
# Plot distribution of years of certification
df.year_from.value_counts().sort_index().plot.line()
```
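The filters used above (certificates with related CVEs, certificates from 2015 and newer, a per-year distribution) can be demonstrated on toy records without the library. The `related_cves` and `year_from` field names follow the dataframe columns shown above; the records themselves are made up for illustration:

```python
from collections import Counter

# Toy records mimicking the dataframe columns used above
certs = [
    {"name": "cert-a", "year_from": 2013, "related_cves": ["CVE-2014-0160"]},
    {"name": "cert-b", "year_from": 2016, "related_cves": []},
    {"name": "cert-c", "year_from": 2018, "related_cves": ["CVE-2017-5754"]},
]

# Certificates with at least one related CVE
vulnerable = [c for c in certs if c["related_cves"]]

# Certificates from 2015 and newer
recent = [c for c in certs if c["year_from"] > 2014]

# Distribution of certification years (value_counts-style)
per_year = Counter(c["year_from"] for c in certs)
```

With a real dataset, the same selections are done on the pandas dataframe returned by `dset.to_pandas()`, as in the snippet above.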

### Process CC data with Docker

1. pull the image from the DockerHub repository : `docker pull seccerts/sec-certs`
2. run `docker run --volume ./processed_data:/home/user/sec-certs/examples/debug_dataset -it seccerts/sec-certs`
3. All processed data will be in the `~/processed_data` directory

## Usage (FIPS)

Currently, the main goal of the FIPS module is to find dependencies between the certified products.

### MyBinder Jupyter Notebook

Without the need to process the data locally, you can use the online MyBinder Jupyter notebook:

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/crocs-muni/sec-certs/fips?filepath=.%2Fnotebooks%2Ffips_data.ipynb)
### Explore data with MyBinder Jupyter notebook

Most probably, you don't want to process a fresh snapshot of the certificates by yourself. Instead, you can use our results and explore them using the [online Jupyter notebook](https://mybinder.org/v2/gh/crocs-muni/sec-certs/dev?filepath=notebooks%2Fcpe_cve.ipynb).

### Explore the latest snapshot locally

You can also explore the latest snapshot locally using Python:
```py
from sec_certs.dataset.fips import FIPSDataset

dset: FIPSDataset = FIPSDataset.from_web_latest() # to get the latest snapshot
dset.to_json('./fips_dataset.json') # to save the dataset
new_dset = FIPSDataset.from_json('./fips_dataset.json') # to load it from disk

```
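Once loaded, the FIPS dataset supports the same iterate-and-filter style shown for Common Criteria. A minimal sketch with a stand-in certificate type (the field names here are illustrative, not the real sec-certs schema):

```python
from dataclasses import dataclass, field


@dataclass
class ToyFIPSCert:
    """Stand-in for a FIPS certificate record, for illustration only."""
    cert_id: int
    module_name: str
    related_cves: list = field(default_factory=list)


# A toy "dataset" of two modules
dset = [
    ToyFIPSCert(1, "Module A"),
    ToyFIPSCert(2, "Module B", ["CVE-2020-0001"]),
]

# Iterate over the dataset and keep modules with some related CVE,
# mirroring the list-comprehension filtering style used earlier
vulnerable = [c.module_name for c in dset if c.related_cves]
```

With a real `FIPSDataset`, you would iterate over the dataset object itself and inspect each certificate's attributes in the same way.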

### Process FIPS data manually with Python

You can also process FIPS data manually using `fips-certs` in terminal after installation.
You can also use the `fips_cli.py` script.

Calling `fips-certs --help` outputs the following:
```
Usage: fips-certs [OPTIONS] [new-run|all|build|convert|update|web-scan|pdf-
scan|table-search|analysis|graphs]...

Specify actions, sequence of one or more strings from the following list:

["new-run", "all", "build", "convert", "update", "pdf-scan",
"table-search", "analysis", "graphs"]

If 'new-run' is specified, a new dataset will be created and all the
actions will be run. If 'all' is specified, dataset will be updated and
all actions run against the dataset. Otherwise, only selected actions will
run in the correct order.

Dataset loading:

'build' Create a skeleton of a new dataset from NIST pages.

'update' Load a previously used dataset (created by 'build')
and update it with unprocessed entries from NIST pages.

Both options download the files needed for analysis.

Analysis preparation:

'convert' Convert all downloaded PDFs.

'pdf-scan' Perform a scan of downloaded CMVP security policy
documents - Keyword extraction.

'table-search' Analyze algorithm implementation entries in tables in
security policy documents.

Analysis preparation actions are by default done only for
certificates where the corresponding action previously failed.
This behaviour can be changed using '--redo-*' options. These
actions are also independent of each other.

Analysis:

'analysis' Merge results from analysis preparation and find
dependencies between certificates.

'graphs' Plot dependency graphs.

Options:
-o, --output DIRECTORY Path where the output of the experiment will be
stored. May overwrite existing content.

-c, --config FILE Path to your own config yaml file that will
override the default one.

-i, --input FILE If set, the actions will be performed on a CC
dataset loaded from JSON from the input path.

-n, --name TEXT Name of the json object to be created in the
<<output>> directory. Defaults to
timestamp.json.

--no-download-algs Don't fetch new algorithm implementations
--redo-web-scan Redo HTML webpage scan from scratch
--redo-keyword-scan Redo PDF keyword scan from scratch
--higher-precision-results Redo table search for certificates with high
error rate. Behaviour undefined if used on a
newly instantiated dataset.

-s, --silent If set, will not print to stdout
--help Show this message and exit.
```

The *Analysis* part is designed to find dependencies between certificates.

#### First run
The first time you are using the FIPS module, use the following command:
```
fips-certs new-run --output <directory name> --name <dataset name>
```
where `<directory name>` is the name of the working directory of the FIPS module
(i.e. where all the metadata will be stored), and `<dataset name>` is the name of the resulting dataset.

This will download a large amount of data (4-5 GB) and can take up to 4 hours to finish.
In Python, run

#### Next runs
```python
from sec_certs.dataset.common_criteria import CCDataset
import pandas as pd

When a dataset is successfully created using `new-run`, you can use the command `all` to update the dataset
(download latest files, redo scans for failed certificates, etc.). It is also **strongly advised** to use the `--higher-precision-results`
switch on the **second run**. The following command should be used to update the dataset:
```
fips-certs all --input <path to the dataset>
dset = CCDataset.from_web_latest() # now you can inspect the object, certificates are held in dset.certs
df = dset.to_pandas() # Or you can transform the object into Pandas dataframe
dset.to_json('./latest_cc_snapshot.json')  # You may want to store the snapshot as json, so that you don't have to download it again
dset = CCDataset.from_json('./latest_cc_snapshot.json')  # you can now load your stored dataset again
```
where `<path to the dataset>` is the **path to the dataset file**, i.e. `<directory name>/<dataset name>.json` from the first run.