Refactor FIPS & unify CLI actions #275

Merged
merged 83 commits into from
Dec 8, 2022
Changes from 4 commits
Commits
a900b90
unify CLI actions
adamjanovsky Oct 25, 2022
0e12ba4
merge dataset constructors, from_web_latest code
adamjanovsky Oct 25, 2022
7aff8bf
unify _get_certs_by_name methods
adamjanovsky Oct 25, 2022
d0d9f91
unify get_keywords_df method
adamjanovsky Oct 25, 2022
1b150aa
unify and generalize dataset method get_keywords_df()
adamjanovsky Oct 27, 2022
5110b52
root_dir setter for FIPSDataset
adamjanovsky Oct 27, 2022
39c89c1
WiP: refactor FIPS get_certs_from_web()
adamjanovsky Oct 27, 2022
9433658
implement artifact download FIPS
adamjanovsky Oct 27, 2022
933b469
refactor tests unittest -> pytest
adamjanovsky Nov 4, 2022
80af3a2
add type hint for json serialization
adamjanovsky Nov 9, 2022
6134479
new object to hold auxillary datasets
adamjanovsky Nov 9, 2022
fad4fbf
use temp folders for cc analysis test data
adamjanovsky Nov 9, 2022
e26fb0c
mark further download tests with xfail
adamjanovsky Nov 9, 2022
14c369b
fix xfail marker on cpe_dset_from_web test
adamjanovsky Nov 9, 2022
8482d82
pandas tests, cve_dset, cpe_dset unify json_path approach
adamjanovsky Nov 10, 2022
c6913dc
merge main
adamjanovsky Nov 10, 2022
24f11fb
test maintenance updates
adamjanovsky Nov 10, 2022
9052138
fix paths handling in CPEDataset, CVEDataset
adamjanovsky Nov 11, 2022
67c6295
cleanup path issues
adamjanovsky Nov 11, 2022
1bc1c7d
auxillary dataset processing CC
adamjanovsky Nov 11, 2022
ca5d4c8
fix mypy error in cli.py
adamjanovsky Nov 11, 2022
b41fa12
fix error in mu dset tests
adamjanovsky Nov 11, 2022
d51c65c
FIPS policy pdf convert refactoring
adamjanovsky Nov 11, 2022
f7c5915
cleanup in fips code structure
adamjanovsky Nov 11, 2022
9fbba9e
common interface for Dataset.analyze_certificates()
adamjanovsky Nov 16, 2022
c0ca076
merge main
adamjanovsky Nov 16, 2022
2a694c4
delete plot_graph() of FIPSDataset
adamjanovsky Nov 16, 2022
9462624
analyze_certificates() interface, delete dead code
adamjanovsky Nov 16, 2022
01e4156
FIPSDataset new parsing of html modules
adamjanovsky Nov 17, 2022
315270a
fix tests
adamjanovsky Nov 23, 2022
2871384
refactor algorithm extraction from policy tables
adamjanovsky Nov 23, 2022
048f3f6
delete InternalState.errors of cert objects
adamjanovsky Nov 23, 2022
4e582e1
deduplicate FIPSAlgorithm data structures
adamjanovsky Nov 23, 2022
0568ca4
remove graphviz requirement
adamjanovsky Nov 23, 2022
d7603e1
move AlgorithmDataset to AuxillaryDatasets class
adamjanovsky Nov 23, 2022
35c5734
Refactor FIPSAlgorithm objects
adamjanovsky Nov 25, 2022
25d42fc
update flake8 CI workflow
adamjanovsky Nov 25, 2022
67fc667
update flake8 config
adamjanovsky Nov 25, 2022
97dce48
cleanup
adamjanovsky Nov 25, 2022
4d0ae40
clean-up, update docs, cli
adamjanovsky Nov 25, 2022
cb879f3
fix json objects for fips test
adamjanovsky Nov 29, 2022
5895c85
rename dependency -> references of transitive vulns
adamjanovsky Nov 29, 2022
c6d826c
fips refactor reference computation
adamjanovsky Nov 30, 2022
fc49b6a
implement transitive vuln. search for FIPS
adamjanovsky Nov 30, 2022
cae2dc2
restrict usage of fresh bool param
adamjanovsky Nov 30, 2022
e062f3e
improve dataset processing logging
adamjanovsky Nov 30, 2022
5b0a7cb
fix table extraction from fips policies
adamjanovsky Dec 2, 2022
f681244
fix reference computation fips
adamjanovsky Dec 2, 2022
08ff031
update readme
adamjanovsky Dec 2, 2022
3fbf5f0
random fixes for cc pipeline
adamjanovsky Dec 2, 2022
6953dfb
fix CC notebooks
adamjanovsky Dec 2, 2022
6d7a907
random fixes in FIPS notebooks
adamjanovsky Dec 2, 2022
2f21854
move label studio interface layout file
adamjanovsky Dec 2, 2022
91b0973
update readme
adamjanovsky Dec 2, 2022
4ddae8a
introduce pyupgrade
adamjanovsky Dec 2, 2022
6d66552
bump scipy, dependabot errors on it
adamjanovsky Dec 2, 2022
40206cd
bump pillow lib
adamjanovsky Dec 2, 2022
25dcec9
bump Github action versions
adamjanovsky Dec 2, 2022
ca0c4e2
convert examples to notebooks
adamjanovsky Dec 2, 2022
86a62cb
fips normalize embodiment string
adamjanovsky Dec 2, 2022
6ce7007
unify from __future__ import annotations
adamjanovsky Dec 5, 2022
4e62ae1
Update sec_certs/dataset/common_criteria.py
adamjanovsky Dec 5, 2022
1a3502a
Update sec_certs/dataset/fips.py
adamjanovsky Dec 5, 2022
8f7a14b
entry guard
adamjanovsky Dec 5, 2022
4085c61
revive tests settings
adamjanovsky Dec 5, 2022
b37eaaf
fix here, fix there
adamjanovsky Dec 5, 2022
6c02383
rename dataset of maintenance updates
adamjanovsky Dec 5, 2022
8ac389a
Update sec_certs/dataset/common_criteria.py
adamjanovsky Dec 5, 2022
bc5a532
Update sec_certs/model/cpe_matching.py
adamjanovsky Dec 5, 2022
4712279
chain.from_iterable() now working with generator expressions
adamjanovsky Dec 5, 2022
f14dfe3
fix getitem on fips dataset
adamjanovsky Dec 5, 2022
a1ec986
test config global fixture
adamjanovsky Dec 6, 2022
577300e
add pyupgrade into linter pipeline
adamjanovsky Dec 6, 2022
ed8813e
reimplement dataset serialization constraints
adamjanovsky Dec 7, 2022
29dd48c
delete pp dataset json
adamjanovsky Dec 7, 2022
52bddce
update docs
adamjanovsky Dec 7, 2022
30ef160
attempt to fix pipelines
adamjanovsky Dec 8, 2022
0bcda6b
don't download spacy model test pipeline
adamjanovsky Dec 8, 2022
7710ef8
test pipeline ubuntu 20.04
adamjanovsky Dec 8, 2022
7af69ca
disable CPE from web test
adamjanovsky Dec 8, 2022
be7f6d7
try ubuntu 22.04 test runner
adamjanovsky Dec 8, 2022
7d59063
cli print -> click.echo()
adamjanovsky Dec 8, 2022
4574a3d
FIPSCertificate no longer hashable
adamjanovsky Dec 8, 2022
155 changes: 0 additions & 155 deletions cc_cli.py

This file was deleted.

194 changes: 194 additions & 0 deletions cli.py
@@ -0,0 +1,194 @@
#!/usr/bin/env python3
import logging
import sys
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Callable, List, Optional, Type, Union

import click

from sec_certs.config.configuration import config
from sec_certs.dataset import CCDataset, FIPSDataset
from sec_certs.utils.helpers import warn_if_missing_graphviz, warn_if_missing_poppler, warn_if_missing_tesseract

logger = logging.getLogger(__name__)


@dataclass
class ProcessingStep:
    name: str
    processing_function_name: str
    precondition: Optional[str] = field(default=None)
    precondition_error_msg: Optional[str] = field(default=None)
    pre_callback_func: Optional[Callable] = field(default=None)

    def run(self, dset: Union[CCDataset, FIPSDataset]) -> None:
        if self.precondition and not getattr(dset.state, self.precondition):
            err_msg = (
                self.precondition_error_msg
                if self.precondition_error_msg
                else f"Error, precondition to run {self.name} not met, exiting."
            )
            print(err_msg)
            sys.exit(1)
        if self.pre_callback_func:
            self.pre_callback_func()

        getattr(dset, self.processing_function_name)()


def warn_missing_libs():
    warn_if_missing_poppler()
    warn_if_missing_tesseract()


def build_or_load_dataset(
    framework: str, inputpath: Optional[Path], outputpath: Optional[Path], to_build: bool
) -> Union[CCDataset, FIPSDataset]:
    constructor: Union[Type[CCDataset], Type[FIPSDataset]] = CCDataset if framework == "cc" else FIPSDataset
    dset: Union[CCDataset, FIPSDataset]

    if to_build:
        if inputpath:
            print(
                f"Warning: you wanted to build a dataset but you provided one in JSON -- that will be ignored. New one will be constructed at: {outputpath}"
            )
        dset = constructor(
            certs={},
            root_dir=outputpath,
            name=framework + "_dataset",
            description=f"Full {framework} dataset snapshot {datetime.now().date()}",
        )
        dset.get_certs_from_web()
    else:
        if inputpath:
            dset = constructor.from_json(inputpath)
            if outputpath and dset.root_dir != outputpath:
                print(
                    "Warning: you provided both input and output paths. The dataset from input path will get copied to output path."
                )
                dset.root_dir = outputpath
        else:
            print(
                "Error: If you do not use 'build' action, you must provide --input parameter to point to an existing dataset."
            )
            sys.exit(1)

    return dset


@click.command()
@click.argument(
    "framework",
    required=True,
    nargs=1,
    type=click.Choice(["cc", "fips"], case_sensitive=False),
)
@click.argument(
    "actions",
    required=True,
    nargs=-1,
    type=click.Choice(["all", "build", "process-aux-dsets", "download", "convert", "analyze"], case_sensitive=False),
)
@click.option(
    "-o",
    "--output",
    # Explicit parameter name so that click passes the value as `outputpath`, matching main()'s signature.
    "outputpath",
    type=click.Path(file_okay=False, dir_okay=True, writable=True, readable=True, resolve_path=True),
    help="Path where the output of the experiment will be stored. May overwrite existing content.",
    default=Path("./cc_dset/"),
    show_default=True,
)
@click.option(
    "-c",
    "--config",
    "configpath",
    default=None,
    type=click.Path(file_okay=True, dir_okay=False, writable=True, readable=True),
    help="Path to your own config yaml file that will override the default one.",
)
@click.option(
    "-i",
    "--input",
    "inputpath",
    type=click.Path(file_okay=True, dir_okay=False, writable=True, readable=True),
    help="If set, the actions will be performed on a dataset loaded from JSON from the input path.",
)
@click.option("-s", "--silent", is_flag=True, help="If set, will not print to stdout")
def main(
    framework: str,
    actions: List[str],
    outputpath: Path,
    configpath: Optional[str],
    inputpath: Optional[Path],
    silent: bool,
):
    file_handler = logging.FileHandler(config.log_filepath)
    stream_handler = logging.StreamHandler(sys.stderr)
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    file_handler.setFormatter(formatter)
    stream_handler.setFormatter(formatter)
    handlers: List[logging.StreamHandler] = [file_handler] if silent else [file_handler, stream_handler]
    logging.basicConfig(level=logging.INFO, handlers=handlers)
    start = datetime.now()

    if configpath:
        try:
            config.load(Path(configpath))
        except FileNotFoundError:
            print("Error: Bad path to configuration file")
            sys.exit(1)
        except ValueError as e:
            print(f"Error: Bad format of configuration file: {e}")
            # Exit here as well, so that processing does not continue with a malformed config.
            sys.exit(1)

    actions_set = (
        {"build", "process-aux-dsets", "download", "convert", "analyze", "maintenances"}
        if "all" in actions
        else set(actions)
    )

    dset = build_or_load_dataset(framework, inputpath, outputpath, "build" in actions_set)
    aux_dsets_to_handle = "PP, Maintenance updates" if framework == "cc" else "Algorithms"
    analysis_pre_callback = None if framework == "cc" else warn_if_missing_graphviz

    steps = [
        ProcessingStep(
            "process-aux-dsets",
            "process_auxillary_datasets",
            precondition="meta_sources_parsed",
            precondition_error_msg=f"Error: You want to process the auxillary datasets: {aux_dsets_to_handle} , but the data from cert. framework website was not parsed. You must use 'build' action first.",
            pre_callback_func=None,
        ),
        ProcessingStep(
            "download",
            "download_all_pdfs",
            precondition="meta_sources_parsed",
            precondition_error_msg="Error: You want to download all pdfs, but the data from the cert. framework website was not parsed. You must use 'build' action first.",
            pre_callback_func=None,
        ),
        ProcessingStep(
            "convert",
            "convert_all_pdfs",
            precondition="pdfs_downloaded",
            precondition_error_msg="Error: You want to convert pdfs -> txt, but the pdfs were not downloaded. You must use 'download' action first.",
            pre_callback_func=warn_missing_libs,
        ),
        ProcessingStep(
            "analyze",
            "analyze_certificates",
            precondition="pdfs_converted",
            precondition_error_msg="Error: You want to process txt documents of certificates, but pdfs were not converted. You must use 'convert' action first.",
            pre_callback_func=analysis_pre_callback,
        ),
    ]

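    processing_step: ProcessingStep
    # Match on the step name: actions_set holds action names (strings), not ProcessingStep objects.
    for processing_step in [x for x in steps if x.name in actions_set]:
        processing_step.run(dset)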

    end = datetime.now()
    logger.info(f"The computation took {(end - start).total_seconds()} seconds.")


if __name__ == "__main__":
    main()
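The step-with-precondition pattern used in `cli.py` can be tried in isolation. The following is a minimal sketch under stated assumptions, not the project's implementation: `DummyState`, `DummyDataset`, `Step`, `STEPS`, and `run_actions` are hypothetical stand-ins invented for illustration, and the sketch raises `RuntimeError` where the CLI prints a message and exits. It shows steps being selected by name and executed in pipeline order, each guarded by a state flag set by an earlier step.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class DummyState:
    # Hypothetical stand-in for the dataset's internal state flags.
    meta_sources_parsed: bool = False
    pdfs_downloaded: bool = False


class DummyDataset:
    """Hypothetical stand-in for CCDataset / FIPSDataset."""

    def __init__(self) -> None:
        self.state = DummyState()
        self.log: List[str] = []

    def build(self) -> None:
        self.state.meta_sources_parsed = True
        self.log.append("build")

    def download_all_pdfs(self) -> None:
        self.state.pdfs_downloaded = True
        self.log.append("download")

    def convert_all_pdfs(self) -> None:
        self.log.append("convert")


@dataclass
class Step:
    name: str
    method_name: str
    precondition: Optional[str] = None

    def run(self, dset: DummyDataset) -> None:
        # Refuse to run when the state flag named by `precondition` is not set.
        if self.precondition and not getattr(dset.state, self.precondition):
            raise RuntimeError(f"precondition '{self.precondition}' for '{self.name}' not met")
        getattr(dset, self.method_name)()


# Steps are declared once, in pipeline order.
STEPS = [
    Step("download", "download_all_pdfs", precondition="meta_sources_parsed"),
    Step("convert", "convert_all_pdfs", precondition="pdfs_downloaded"),
]


def run_actions(dset: DummyDataset, actions: Set[str]) -> None:
    # Select steps by name; execution order follows STEPS, not the order of `actions`.
    for step in (s for s in STEPS if s.name in actions):
        step.run(dset)


if __name__ == "__main__":
    dset = DummyDataset()
    dset.build()
    run_actions(dset, {"convert", "download"})
    print(dset.log)
```

Declaring the steps in a fixed list keeps the execution order independent of how the user orders the actions on the command line, and the state flags make a skipped prerequisite fail early rather than midway through a later step.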
4 changes: 2 additions & 2 deletions docs/quickstart.md
@@ -38,13 +38,13 @@ If you insist on processing the whole certificates pipeline, make sure that you
::::{tab-set}
:::{tab-item} Common Criteria
```bash
-$ cc-certs all
+$ sec-certs cc all
```
:::

:::{tab-item} FIPS 140
```bash
-$ fips-certs new-run
+$ sec-certs fips all
```
:::
::::
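For readers mapping the old per-framework commands onto the unified one, here is a rough sketch of how an `all` action could expand into the individual actions accepted by the new CLI. The names `PIPELINE` and `expand_actions` are hypothetical and exist only for this illustration; the action names themselves are taken from the `click.Choice` list in `cli.py`.

```python
from typing import Iterable, List

# Action names as accepted by the unified CLI (excluding the "all" shortcut), in pipeline order.
PIPELINE = ["build", "process-aux-dsets", "download", "convert", "analyze"]


def expand_actions(actions: Iterable[str]) -> List[str]:
    """Return the requested actions in pipeline order, expanding 'all' (hypothetical helper)."""
    requested = set(actions)
    if "all" in requested:
        return list(PIPELINE)
    return [action for action in PIPELINE if action in requested]


if __name__ == "__main__":
    print(expand_actions(["all"]))
    print(expand_actions(["convert", "download"]))
```

Under this reading, `sec-certs cc all` and `sec-certs fips all` both request the full pipeline, differing only in the framework argument.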