Add synthetic seed data version 2025.1.15
Generated from recent crawls:

- several recent daily crawls: ./misc/merge_results.sh f3c7cc7 66b894a
- five 40K, one 35K and six 30K distributed crawl runs

Filtered to ignore five potentially malware-infected sites with --load-data-ignore-sites=asianetnews.com,gumtree.co.za,malaysiakini.com,minna.cc,planetsuzy.org
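
As a rough illustration of the site filtering described above, the sketch below drops crawl entries that belong to an ignored site. It is not the crawler's actual code: the `visits` mapping, the helper names, and the choice to also match subdomains are all assumptions made for the example, and the real --load-data-ignore-sites flag may behave differently.

```python
from urllib.parse import urlsplit

# Site list copied from the commit message.
IGNORED_SITES = "asianetnews.com,gumtree.co.za,malaysiakini.com,minna.cc,planetsuzy.org"


def parse_ignore_list(value):
    """Split a comma-separated value into a set of lowercase hostnames."""
    return {site.strip().lower() for site in value.split(",") if site.strip()}


def is_ignored(host, ignored):
    """Return True if host is an ignored site or a subdomain of one.

    Subdomain matching is an assumption of this sketch.
    """
    host = host.lower().rstrip(".")
    return any(host == site or host.endswith("." + site) for site in ignored)


def filter_visits(visits, ignored):
    """Drop entries whose visited URL belongs to an ignored site.

    `visits` is a hypothetical mapping of visited URL to the list of
    third-party domains observed on that page.
    """
    kept = {}
    for url, domains in visits.items():
        host = urlsplit(url).hostname or ""
        if not is_ignored(host, ignored):
            kept[url] = domains
    return kept
```

For example, `filter_visits(visits, parse_ignore_list(IGNORED_SITES))` would keep an entry for `https://example.org/` and drop one for `https://www.gumtree.co.za/a`.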

Filtered out some invalid beacon destination domains.
ghostwords committed Jan 16, 2025
1 parent 66b894a commit eb5747c
Showing 1 changed file with 166,324 additions and 83,407 deletions.