This project keeps the list of sites to scan in a database, runs scans on a schedule, and writes the results to S3 while logging audit history to the database.
In addition to the scheduled (cron) audits, this function also returns:
- All domains that could be audited
- All domains that were offline when last checked
- All domains that now redirect to a different domain
- All log data for a specific domain
Currently this is running audits for the following
Web performance data is retrieved using this process:
- Does CrUX respond with data?
- If yes, write to the db log and put the data on S3
- If not, is the site not responding at all, or does it redirect to another domain?
- If so, log this status to the db
- If the site is responding but has no CrUX data, it is probably a relatively low-traffic site, so request a lab-run Lighthouse audit
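The decision steps above can be sketched as a small pure function. This is only an illustration: the function name, the input fields (`hasCruxData`, `isResponding`, `redirectDomain`) and the returned `action` labels are hypothetical and not taken from the project code. The status strings follow the ones used elsewhere in this document.

```javascript
// Hypothetical sketch of the audit decision flow; names are illustrative only.
// Input describes what was observed for one domain:
//   hasCruxData    - true if CrUX returned data for the site
//   isResponding   - true if the site answered a request at all
//   redirectDomain - the other domain it redirects to, or null
function decideAuditAction({ hasCruxData, isResponding, redirectDomain }) {
  if (hasCruxData) {
    // CrUX has field data: log to the db and store the data on S3.
    return { action: "STORE_CRUX", status: "ONLINE" };
  }
  if (!isResponding) {
    // Site does not respond at all: only log the status.
    return { action: "LOG_STATUS", status: "OFFLINE" };
  }
  if (redirectDomain) {
    // Site redirects to a different domain: only log the status.
    return { action: "LOG_STATUS", status: "REDIRECTS", redirectDomain };
  }
  // Site is up but has no CrUX data: fall back to a lab Lighthouse run.
  return { action: "RUN_LIGHTHOUSE", status: "ONLINE" };
}
```

Keeping the decision separate from the network calls would make each branch easy to check without contacting CrUX or the site itself.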
{
PK: "DOMAIN#example.com", // Partition Key
SK: "METADATA#latest", // Sort Key
domain: "example.com", // Actual domain string
status: "ONLINE", // Current status: ['REDIRECTS','ONLINE','OFFLINE']
lastCheckedAt: "2025-01-25", // ISO date string
historyLog: { // Dates checks ran and links to files
// ... data log
}
}
GSI1PK: "STATUS#ONLINE", // GSI Partition Key
GSI1SK: "2025-01-25", // GSI Sort Key (lastCheckedAt)
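As a rough sketch of how the key attributes above relate to each other, the helper below derives them from a domain, a status, and a check date. The function name `buildDomainItem` is hypothetical, and it assumes the GSI1 attributes are stored on the same item as the primary keys, which this document does not state explicitly.

```javascript
// Hypothetical helper; assumes GSI1PK/GSI1SK live on the same item as PK/SK.
function buildDomainItem(domain, status, lastCheckedAt) {
  return {
    PK: `DOMAIN#${domain}`,     // Partition Key
    SK: "METADATA#latest",      // Sort Key
    domain,                     // Actual domain string
    status,                     // e.g. "ONLINE"
    lastCheckedAt,              // ISO date string, e.g. "2025-01-25"
    GSI1PK: `STATUS#${status}`, // GSI Partition Key
    GSI1SK: lastCheckedAt,      // GSI Sort Key (lastCheckedAt)
  };
}

const item = buildDomainItem("example.com", "ONLINE", "2025-01-25");
console.log(item.PK, item.GSI1PK, item.GSI1SK);
```

Deriving the index keys in one place keeps `GSI1PK` and `GSI1SK` in step with `status` and `lastCheckedAt` whenever an item is rewritten.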
- Query all domains with specific status:
QueryInput = {
IndexName: "GSI1",
KeyConditionExpression: "GSI1PK = :status",
ExpressionAttributeValues: {
":status": "STATUS#OFFLINE"
}
}
- Get information for a single domain:
data.domains.query({
KeyConditionExpression: 'PK = :PK AND SK = :SK',
ExpressionAttributeValues: {
':PK': `DOMAIN#${domain}`,
':SK': `METADATA#latest`
}
})
- Get 5 records with a specific status that were last checked before a given date:
QueryInput = {
IndexName: "GSI1",
KeyConditionExpression: "GSI1PK = :status AND GSI1SK < :date",
Limit: 5,
ExpressionAttributeValues: {
":status": "STATUS#ONLINE",
":date": "2025-01-25"
}
}
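The query above can be parameterized so the status, cutoff date, and page size are not hard-coded. This is a minimal sketch: the function name `buildStaleDomainsQuery` is hypothetical, and it only builds the input object shown above without sending it.

```javascript
// Hypothetical helper that builds the GSI1 query input for domains with a
// given status whose last check is older than `beforeDate` (ISO date string).
function buildStaleDomainsQuery(status, beforeDate, limit = 5) {
  return {
    IndexName: "GSI1",
    KeyConditionExpression: "GSI1PK = :status AND GSI1SK < :date",
    Limit: limit,
    ExpressionAttributeValues: {
      ":status": `STATUS#${status}`,
      ":date": beforeDate,
    },
  };
}

const queryInput = buildStaleDomainsQuery("ONLINE", "2025-01-25");
console.log(JSON.stringify(queryInput, null, 2));
```

Because ISO date strings sort lexicographically in date order and a DynamoDB query returns items in ascending sort-key order by default, the records returned should be the ones checked longest ago.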
This Lambda function reviews the sites in the database on a schedule and writes the audit results to a publicly accessible S3 bucket: audits.scangov.org
Example: https://s3.us-east-1.amazonaws.com/audits.scangov.org/performance/copyright.gov.json
This uses the OpenJS Foundation-backed Architect library, which provides a convenient wrapper for writing Node.js code that runs on AWS Lambda.
The AWS credentials are defined in the app.arc file in the @aws section:
@aws
profile scangov
The profile line references the name of a local AWS credentials profile.
To deploy, run:
npm run deploy:staging
or
npm run deploy:production