WMArchive aggregation

Valentin Kuznetsov edited this page Mar 28, 2016 · 4 revisions

Summary statistics

Based on the current schema, we present possible aggregation metrics to collect and visualize.

  • For all agents (meta_data.agent_ver, host)
    • total number of jobs running by agent
    • total number of success/running/failed jobs
    • list of all acquisitionEra, acquisitionVersion
    • performance metrics for each step
      • total CPU and RAM usage, average time
  • Performance metrics for individual sites
    • get the list of sites from SiteDB and, for each site, the total number of success/running/failed jobs and total CPU and RAM usage
  • Total number of processed runs/lumis
  • Total size of produced datasets
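As a rough sketch, the per-agent job counts above could be collected with a simple map-reduce style pass over FWJR-like records. The fields `meta_data.agent_ver` and `host` come from the list above; the `status` field and the record shape are illustrative assumptions, not the final schema.

```python
from collections import Counter, defaultdict

def aggregate_jobs(records):
    """Count jobs per (agent version, host), broken down by status.

    `meta_data.agent_ver` and `host` follow the metrics list above;
    `status` is a hypothetical stand-in for the job outcome field.
    """
    counts = defaultdict(Counter)
    for rec in records:
        agent = (rec["meta_data"]["agent_ver"], rec["host"])
        counts[agent][rec["status"]] += 1
    return counts

# Toy records with an assumed layout, for illustration only
records = [
    {"meta_data": {"agent_ver": "1.0.14"}, "host": "vocms1", "status": "success"},
    {"meta_data": {"agent_ver": "1.0.14"}, "host": "vocms1", "status": "failed"},
    {"meta_data": {"agent_ver": "1.0.15"}, "host": "vocms2", "status": "success"},
]
stats = aggregate_jobs(records)
print(dict(stats[("1.0.14", "vocms1")]))  # → {'success': 1, 'failed': 1}
```

The same grouping logic translates directly into a Spark `reduceByKey` over `((agent_ver, host), status)` pairs when run on the full archive.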

Task list

We need to perform the following tasks to create the WMArchive aggregation framework.

  • write code (MapReduce or Spark) to collect aggregation statistics
  • integrate the code into production machinery (write and organize crontabs, schedule them on the analytics cluster, etc.)
  • write code for a web frontend to visualize the data; the data should be presented in JSON format
    • evaluate different JavaScript plotting libraries; a potential list of plotting libraries can be found here, or we may use Kibana
  • estimate collection time, i.e. how long it will take to get N months of statistics
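Since the frontend task above requires the aggregated data in JSON format, the snippet below sketches what one aggregated record might look like. All field names here are hypothetical placeholders, not an agreed-upon schema.

```python
import json

# Hypothetical shape of one aggregated record served to the web frontend;
# field names are illustrative and would need to match the real aggregation output.
record = {
    "agent": {"version": "1.0.14", "host": "vocms1"},
    "jobs": {"success": 120, "running": 15, "failed": 3},
    "performance": {"total_cpu_sec": 86400.0, "total_ram_gb": 512.0, "avg_time_sec": 720.0},
}
print(json.dumps(record, indent=2))
```

A plotting library or Kibana could then consume a list of such records directly, e.g. one record per agent per aggregation interval.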