Create MonitorDaemon docker container for ol-www0 to monitor HTTP status codes #10267

mekarpeles · 2025-01-03T22:15:54Z

Proposal

A general purpose container called something like MonitorDaemon that can be added to any VM and configured with a list of monitoring operations that run on that host.

First, for ol-www0 this entails the IP and status aggregation scripts defined in #8795:

check-node ol-www0 && scripts/nginx_http_status_monitor.py
check-node ol-home0 && scripts/monitoring/solr_updater_lag.py
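The per-host gating above could be expressed in Python roughly as follows. This is only a sketch: the `JOBS` table and `jobs_for_host` are hypothetical names, and it assumes `check-node` amounts to a comparison against the machine's short hostname, which the issue does not confirm.

```python
import socket

# Hypothetical job table: short hostname -> scripts to run on that host.
# The script paths are the ones listed in the proposal.
JOBS = {
    "ol-www0": ["scripts/nginx_http_status_monitor.py"],
    "ol-home0": ["scripts/monitoring/solr_updater_lag.py"],
}


def jobs_for_host(hostname=None):
    """Return the jobs configured for a host (empty list if none)."""
    if hostname is None:
        hostname = socket.gethostname()
    # Compare on the short name so a fully qualified name still matches.
    short_name = hostname.split(".")[0]
    return JOBS.get(short_name, [])
```

A container deployed to every host would call `jobs_for_host()` at startup and simply have nothing to run on hosts that are not in the table.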

Justification

Problem

What problem does this proposal address & for what audience(s)?

Currently, stats collection is easily interrupted: the #8795 scripts run on ol-www0 inside tmux, and stats stop whenever those tmux sessions are interrupted.

Breakdown

Can be closed once #8795 is evolved into a docker container approach that can go into our deploy and, initially, run on ol-www0

monitoring:
    profiles: ["ol-www0", "ol-home0"]

Related files

Stakeholders

Instructions for Contributors

Please run these commands to ensure your repository is up to date before creating a new branch to work on this issue, and again each time after pushing code to GitHub, because the pre-commit bot may add commits to your PRs upstream.


itsBaivab · 2025-01-04T20:44:03Z

I would love to work on this. Could you please assign this to me?

mekarpeles · 2025-01-05T20:01:02Z

@itsBaivab I think this one should go to @cdrini on staff for now as he's already built most of the infrastructure

mekarpeles · 2025-01-06T21:14:55Z

Any new monitoring should use python instead of bash for writing to graphite
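For illustration, writing to graphite from Python can be done with the standard library alone, using Graphite's plaintext protocol (one `path value timestamp` line per metric). This is a minimal sketch, not the project's actual code: the function names, the metric path in the test, and the host `graphite.example.org` are placeholders, and port 2003 is only carbon's default plaintext port.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, timestamp, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n".encode("utf-8")


def send_metric(path, value, host="graphite.example.org", port=2003, timeout=5.0):
    """Send one metric over TCP.

    The host is a placeholder; 2003 is carbon's default plaintext port.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(format_metric(path, value))
```

Keeping the formatting separate from the socket call makes the line format easy to unit test without a running graphite instance.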

mekarpeles · 2025-01-06T21:18:13Z

This issue requires adding a new container that runs only in production (compose.production.yml) and is deployed to every host. However, the container will only run the jobs relating to the host it is on.

For this issue, the only host whose container has jobs should be ol-www0, and those jobs should be the ones defined by:

This issue can be closed once this new prod-only docker container is running these two scripts on ol-www0

We should explore an alternative to using watch as the container's command, so the container doesn't die prematurely.
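One possible alternative to watch is a small Python loop that runs each job on an interval and survives individual job failures. This is a sketch under assumptions: `run_once`, `run_forever`, the 60 second default, and the `max_iterations` parameter (present only so the loop can be tested) are all hypothetical and not taken from #8795.

```python
import subprocess
import time


def run_once(commands, runner=subprocess.run):
    """Run each command once and return how many of them failed."""
    failures = 0
    for command in commands:
        try:
            result = runner(command, check=False)
            if result.returncode != 0:
                failures += 1
        except OSError:
            # For example, the script is missing or not executable.
            # Count it as a failure but keep the loop alive.
            failures += 1
    return failures


def run_forever(commands, interval_seconds=60, max_iterations=None,
                runner=subprocess.run, sleep=time.sleep):
    """Repeat run_once on a fixed interval; loops forever by default."""
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        run_once(commands, runner=runner)
        iteration += 1
        sleep(interval_seconds)
    return iteration
```

Used as the container's command, `run_forever([["python", "scripts/nginx_http_status_monitor.py"]])` would keep the process alive even if a single run of the script exits non-zero.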

@cdrini also needs to stop the legacy tmux flow that's currently on prod for the old approach

mekarpeles · 2025-01-06T21:20:48Z

@itsBaivab if this is enough to go on, feel free to give it a try and ask questions

mekarpeles changed the title ~~Create ServiceMonitorDaemon docker container for ol-www0 to monitor HTTP status codes~~ Create MonitorDaemon docker container for ol-www0 to monitor HTTP status codes Jan 3, 2025

mekarpeles added Priority: 2 Important, as time permits. [managed] Lead: @mekarpeles Issues overseen by Mek (Staff: Program Lead) [managed] and removed Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] Needs: Lead labels Jan 4, 2025

mekarpeles added this to the Sprint 2025-01 milestone Jan 4, 2025

github-actions bot added the Needs: Response Issues which require feedback from lead label Jan 5, 2025

mekarpeles assigned cdrini Jan 5, 2025

mekarpeles added Priority: 1 Do this week, receiving emails, time sensitive, . [managed] and removed Priority: 2 Important, as time permits. [managed] labels Jan 6, 2025

mekarpeles mentioned this issue in Dockerized Solr Performance Monitoring #10290 (Open, 4 tasks) Jan 7, 2025

mekarpeles removed the Needs: Response Issues which require feedback from lead label Jan 12, 2025

mekarpeles added Priority: 2 Important, as time permits. [managed] and removed Priority: 1 Do this week, receiving emails, time sensitive, . [managed] labels Jan 21, 2025

mekarpeles modified the milestones: Sprint 2025-01, Sprint 2025-02 Jan 30, 2025
