add eol-eos
zgeorg committed Oct 6, 2024
1 parent 03b929f commit 18ac656
Showing 13 changed files with 211 additions and 133 deletions.
48 changes: 44 additions & 4 deletions README.md
@@ -1,3 +1,5 @@
# Cisco Data Automation Project

### The Problem This Project Solves
Cisco provides Operational Insights (OI) through Network Consulting Engineers (NCE) or a customer portal, but this requires an onboarding process and an active subscription. This works well for existing customers, but what if you don’t want to go through all the formalities just to access important information like End of Life (EOL) dates, Common Vulnerabilities and Exposures (CVE), or best practices?

@@ -17,15 +19,53 @@ This project offers a standalone REST API that provides support data for Cisco d
- Easily extend this tool to fetch data from other vendors (like F5, Palo Alto, Arista, etc.).

### Why an Open API Matters
This solution could help potential new Cisco customers.

By making support data accessible, it shows the value Cisco offers, potentially encouraging investment in new devices or support contracts.

However, the purpose here isn't marketing — it’s about helping people find useful, well-organized data or build their own tools on top of this one.

### No Reinventing the Wheel — Just Fetching, Organizing, and Serving Data
It’s important to understand that this project isn’t about stealing proprietary information. All the data being fetched is already publicly available on Cisco's website. This project just automates the process, providing the same information in a more efficient way.

The tools themselves don't violate Cisco’s policies, but how you use them might. So, make sure you're following Cisco's Terms of Service.

#### Note: This repository does not contain any scraped Cisco data — only tools that can gather it... :)

### Job Descriptions

1. **CiscoEOLJob**:

- ![Activity Diagram](images/CiscoEOLJob_Activity_Diagram.png)
- **What It Does**: CiscoEOLJob is responsible for scraping and processing Cisco's End-of-Life (EOL), End-of-Sale (EOS), and Field Notices (FN) data.
- **How It Works**: It uses asynchronous HTTP requests to retrieve HTML pages from Cisco's website. It then parses these pages using BeautifulSoup to extract relevant information like product part numbers, milestone dates, and field notice details. The extracted data is saved to JSON files.

2. **GetCiscoProductsJob**:
- ![Activity Diagram](images/GetCiscoProductsJob_Activity_Diagram.png)
- **What It Does**: GetCiscoProductsJob is used to scrape product links from Cisco's support website and extract details about supported Cisco products.
- **How It Works**: It uses an asynchronous HTTP client (`httpx`) to make requests to Cisco's product pages. The job parses the pages to extract product links, and then extracts supported products from each product page. The extracted data is saved to a JSON file.

3. **GetFeaturesJob**:
- ![Activity Diagram](images/GetFeaturesJob_Activity_Diagram.png)
- **What It Does**: GetFeaturesJob is responsible for fetching data related to Cisco platforms, software releases, and features. It collects information on supported features for different Cisco hardware platforms and software releases.
- **How It Works**: This job first fetches platform data, then collects release information for each platform. Finally, it fetches features for each release. The data is fetched through Cisco APIs, and unique feature hashes are generated to track feature details. The collected data is saved to JSON files and archived as a `.tar.gz` file.

### Configuration Classes

1. **Config**:
- **Description**: The base configuration class that defines the general setup for the project, such as the data directory and some Cisco URLs. This configuration is inherited by other job-specific configurations.
- **Purpose**: It provides basic configuration properties and handles the setup of the project data directory, ensuring that paths are created when the configuration is instantiated.

2. **GetFeaturesConfig**:
- **Description**: This class extends `Config` and provides configuration specific to the `GetFeaturesJob`. It includes options to control whether to fetch data online, the number of concurrent requests, and the delay between requests.
- **Key Configurations**:
- `HASHING_DIGEST` controls the length of the hash used for unique feature identification.
- `CONCURRENT_REQUESTS_LIMIT` and `REQUEST_DELAY` determine request throttling to avoid getting blocked by Cisco.
- Defines URLs for different Cisco APIs to fetch platforms, releases, and features.

3. **GetEOLConfig**:
- **Description**: This class extends `Config` and provides configuration specific to the `CiscoEOLJob`. It defines various URLs and paths related to Cisco's EOL and EOS notices.
- **Key Configurations**:
- `CONCURRENT_REQUESTS` controls the number of requests that can be executed in parallel.
- `DATA_REFRESH_INTERVAL` defines the interval (in seconds) after which the data is refreshed.
- Configures archiving and paths for storing extracted EOL data.
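The inheritance between these configuration classes can be sketched roughly as follows. Attribute names such as `HASHING_DIGEST`, `CONCURRENT_REQUESTS`, and `DATA_REFRESH_INTERVAL` come from the descriptions above; all values, URLs, and the `feature_hash` helper are placeholders, not the project's actual defaults:

```python
import hashlib
from pathlib import Path


class Config:
    """Base configuration: data directory plus shared Cisco URLs."""

    DATA_DIR = Path("data")
    CISCO_BASE_URL = "https://www.cisco.com"  # placeholder

    def __init__(self) -> None:
        # Create the data directory as soon as a configuration is instantiated.
        self.DATA_DIR.mkdir(parents=True, exist_ok=True)


class GetFeaturesConfig(Config):
    """Settings specific to GetFeaturesJob (values are illustrative)."""

    FETCH_FEATURES_ONLINE = True
    CONCURRENT_REQUESTS_LIMIT = 5
    REQUEST_DELAY = 1.0   # seconds between requests, to stay under the radar
    HASHING_DIGEST = 8    # length of the unique feature identifier

    def feature_hash(self, feature_name: str) -> str:
        """Short, stable identifier derived from the feature name."""
        digest = hashlib.sha256(feature_name.encode()).hexdigest()
        return digest[: self.HASHING_DIGEST]


class GetEOLConfig(Config):
    """Settings specific to CiscoEOLJob (values are illustrative)."""

    CONCURRENT_REQUESTS = 10
    DATA_REFRESH_INTERVAL = 86_400  # refresh once a day, in seconds
```

Keeping shared setup (the data directory, base URLs) in `Config` means each job only declares the knobs it actually needs, and instantiating any subclass guarantees the output paths exist.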
3 changes: 1 addition & 2 deletions app/jobs/get_cisco_products.py
@@ -5,8 +5,7 @@

import httpx
from bs4 import BeautifulSoup

from app.config import Config, logging
from config import Config, logging

logger = logging.Logger("GetCiscoProductsJob")

7 changes: 3 additions & 4 deletions app/jobs/get_eol_fn.py
@@ -15,10 +15,9 @@

import httpx
from bs4 import BeautifulSoup

from app.config import GetEOLConfig, logging
from app.jobs.get_cisco_products import scrape_cisco_products
from app.utils import normalize_date_format, normalize_to_camel_case, save_to_json
from config import GetEOLConfig, logging
from jobs.get_cisco_products import scrape_cisco_products
from utils import normalize_date_format, normalize_to_camel_case, save_to_json


class CiscoEOLJob:
7 changes: 3 additions & 4 deletions app/jobs/get_features.py
@@ -8,10 +8,9 @@

import aiofiles
import httpx

from app.config import GetFeaturesConfig, logging
from app.models import FeaturesRequestModel
from app.utils import save_to_json
from config import GetFeaturesConfig, logging
from models import FeaturesRequestModel
from utils import save_to_json


class GetFeaturesJob:
5 changes: 2 additions & 3 deletions app/models.py
@@ -1,8 +1,7 @@
from dataclasses import field
from enum import Enum
from typing import List, Optional, Union
from typing import Optional

from pydantic import BaseModel, HttpUrl
from pydantic import BaseModel


class ProductAlerts(BaseModel):
File renamed without changes.
@@ -1,105 +1,21 @@
import json
import os
import sys
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from app.jobs.get_features import GetFeaturesJob, RequestModel
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))


@pytest.mark.asyncio
async def test_fetch_platforms_failure():
job = GetFeaturesJob()
mock_client = AsyncMock()
mock_response = AsyncMock()
mock_response.status_code = 404
mock_client.post.return_value = mock_response

result = await job._fetch_platforms(mock_client, "Switches")

assert result == {}
mock_client.post.assert_called_once_with(
job.config.REQUEST_1,
headers=job.config.HEADERS,
json=RequestModel(mdf_product_type="Switches").model_dump(),
timeout=900,
)


@pytest.mark.asyncio
async def test_fetch_releases_failure():
job = GetFeaturesJob()
mock_client = AsyncMock()
mock_response = AsyncMock()
mock_response.status_code = 404
mock_client.post.return_value = mock_response

platform_data = {"platform_id": 1}
result = await job._fetch_releases(mock_client, platform_data, "Switches")

assert result == []
mock_client.post.assert_called_once_with(
job.config.REQUEST_2,
headers=job.config.HEADERS,
json=RequestModel(platform_id=1, mdf_product_type="Switches").model_dump(),
timeout=900,
)


@pytest.mark.asyncio
async def test_fetch_features_success():
job = GetFeaturesJob()
mock_client = AsyncMock()
mock_response = AsyncMock()
mock_response.status_code = 200
mock_response.aread.return_value = json.dumps([]).encode()
mock_client.post.return_value = mock_response
mock_tar = MagicMock()

release_data = {"platform_id": 1, "release_id": 1}
await job._fetch_features(mock_client, release_data, "Switches", mock_tar)

mock_tar.add.assert_called_once()
mock_client.post.assert_called_once_with(
job.config.REQUEST_3,
headers=job.config.HEADERS,
json=RequestModel(
platform_id=1, mdf_product_type="Switches", release_id=1
).model_dump(),
timeout=900,
)


@pytest.mark.asyncio
async def test_fetch_features_failure():
job = GetFeaturesJob()
mock_client = AsyncMock()
mock_response = AsyncMock()
mock_response.status_code = 404
mock_client.post.return_value = mock_response
mock_tar = MagicMock()

release_data = {"platform_id": 1, "release_id": 1}
await job._fetch_features(mock_client, release_data, "Switches", mock_tar)

mock_tar.add.assert_not_called()
mock_client.post.assert_called_once_with(
job.config.REQUEST_3,
headers=job.config.HEADERS,
json=RequestModel(
platform_id=1, mdf_product_type="Switches", release_id=1
).model_dump(),
timeout=900,
)
from app.jobs.get_features import GetFeaturesJob


@pytest.mark.asyncio
async def test_fetch_all_features():
job = GetFeaturesJob()
mock_tar = MagicMock()
releases = {"Switches": [{"platform_id": 1, "release_id": 1}]}

with patch.object(job, "_fetch_features", new_callable=AsyncMock) as mock_fetch:
await job._fetch_all_features(releases, mock_tar)
await job._fetch_all_features(releases)

assert mock_fetch.await_count == 1
mock_fetch.assert_awaited_once()
@@ -110,7 +26,6 @@ async def test_fetch_data():
job = GetFeaturesJob()

# Ensure that FETCH_FEATURES_ONLINE is True during the test
job.config = AsyncMock()
job.config.FETCH_FEATURES_ONLINE = True

with patch.object(
@@ -171,7 +86,7 @@ async def test_fetch_and_archive_features():

await job._fetch_and_archive_features(releases)

mock_fetch.assert_awaited_once_with(releases, mock_tar)
mock_fetch.assert_awaited_once_with(releases)


@pytest.mark.asyncio
145 changes: 145 additions & 0 deletions app/tests/test_products.py
@@ -0,0 +1,145 @@
import os
import sys
from unittest.mock import AsyncMock, mock_open, patch

import pytest
from bs4 import BeautifulSoup

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from app.jobs.get_cisco_products import (
BASE_URL,
extract_product_links,
extract_supported_products,
get_page_soup,
scrape_cisco_products,
)

BASE_HTML = """
<div data-config-metrics-title="Products by Category">
<a href="/product1.html">Product 1</a>
<a href="/product2.html">Product 2</a>
</div>
"""

SUPPORTED_PRODUCTS_HTML = """
<div id="allSupportedProducts">
<a href="/supported1.html">Supported Product 1</a>
<a href="/supported2.html">Supported Product 2</a>
</div>
"""


@pytest.mark.asyncio
async def test_get_page_soup_success():
# Mock the client and response
mock_client = AsyncMock()
mock_response = AsyncMock()
mock_response.status_code = 200
mock_response.content = BASE_HTML.encode()
mock_client.get.return_value = mock_response

# Call the function
soup = await get_page_soup(mock_client, "https://example.com")

# Assertions
assert isinstance(soup, BeautifulSoup)
assert "Products by Category" in str(soup)
mock_client.get.assert_awaited_once_with("https://example.com")


@pytest.mark.asyncio
async def test_get_page_soup_failure():
# Mock the client and response
mock_client = AsyncMock()
mock_response = AsyncMock()
mock_response.status_code = 404
mock_client.get.return_value = mock_response

# Call the function
soup = await get_page_soup(mock_client, "https://example.com")

# Assertions
assert soup is None
mock_client.get.assert_awaited_once_with("https://example.com")


@pytest.mark.asyncio
async def test_extract_product_links():
# Create a BeautifulSoup object from the base HTML
soup = BeautifulSoup(BASE_HTML, "html.parser")

# Call the function
product_links = await extract_product_links(soup)

# Assertions
assert len(product_links) == 2
assert product_links[0] == {
"product": "Product 1",
"url": "https://www.cisco.com/product1.html",
}
assert product_links[1] == {
"product": "Product 2",
"url": "https://www.cisco.com/product2.html",
}


@pytest.mark.asyncio
async def test_extract_supported_products():
# Mock the client and response
mock_client = AsyncMock()
mock_response = AsyncMock()
mock_response.status_code = 200
mock_response.content = SUPPORTED_PRODUCTS_HTML.encode()
mock_client.get.return_value = mock_response

# Call the function
supported_products = await extract_supported_products(
mock_client, "https://example.com/product1.html"
)

# Assertions
assert len(supported_products) == 2
assert supported_products[0] == {
"name": "Supported Product 1",
"url": "https://www.cisco.com/supported1.html",
}
assert supported_products[1] == {
"name": "Supported Product 2",
"url": "https://www.cisco.com/supported2.html",
}
mock_client.get.assert_awaited_once_with("https://example.com/product1.html")


@pytest.mark.asyncio
async def test_scrape_cisco_products():
# Mock the client, response, and open function
mock_client = AsyncMock()
mock_response = AsyncMock()
mock_response.status_code = 200
mock_response.content = BASE_HTML.encode()
mock_client.get.return_value = mock_response

# Mock extract_supported_products to return sample data
mock_extract_supported_products = AsyncMock()
mock_extract_supported_products.return_value = [
{"name": "Supported Product 1", "url": "https://www.cisco.com/supported1.html"}
]

with patch(
"app.jobs.get_cisco_products.get_page_soup",
return_value=BeautifulSoup(BASE_HTML, "html.parser"),
), patch(
"app.jobs.get_cisco_products.extract_supported_products",
new=mock_extract_supported_products,
), patch(
"httpx.AsyncClient", return_value=mock_client
), patch(
"builtins.open", mock_open()
):

# Run the scraping function
await scrape_cisco_products("output.json")

# Assertions
mock_extract_supported_products.assert_called()
mock_extract_supported_products.assert_awaited()
