Ever had that moment when your production services decided to take an unannounced vacation? Yeah, me too. That's why I built this automated health monitoring system that keeps tabs on your services and sends Slack notifications when things go sideways. Think of it as your infrastructure's personal health assistant.
Before diving in, make sure you have all the necessary components set up. Check out PREREQUISITES.md for a detailed setup guide.
Before proceeding, verify:
- AWS CLI configured (`aws sts get-caller-identity`)
- Terraform installed (`terraform -v`)
- Python 3.9 available (`python3.9 --version`)
- Slack webhook URL obtained
- Virtual environment activated
- `dist/` directory with all necessary files
If any of these are missing, see PREREQUISITES.md for the detailed setup steps. Trust me, it's worth getting these right from the start!
- Async Health Checks: Because waiting is so 2010
- Slack Integration: Get notifications that actually look good (and are useful!)
- AWS Lambda Ready: Serverless, because who wants to manage servers for monitoring servers?
- Infrastructure as Code: Everything in Terraform, because we're professionals here
- Configurable Monitoring: Customize everything from timeouts to headers
- Multi-Service Support: Monitor both frontend and backend services in one go
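To make the "Async Health Checks" bullet concrete, here is a minimal sketch of how concurrent checks with per-service timeouts could look using only `asyncio`. It is not the project's actual code: `check_service`, `check_all`, and the injectable `fetch` coroutine are hypothetical names, and `fake_fetch` stands in for whatever HTTP client the real Lambda uses.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical signature: a coroutine that takes a URL and returns an HTTP status code.
Fetcher = Callable[[str], Awaitable[int]]


async def check_service(service: Dict, fetch: Fetcher) -> Dict:
    """Check one service and report whether it returned the expected status."""
    timeout = service.get("timeout", 30)
    expected = service.get("expected_status", 200)
    name = service["name"]
    try:
        # wait_for cancels the request if it exceeds the per-service timeout.
        status = await asyncio.wait_for(fetch(service["url"]), timeout=timeout)
        return {"name": name, "healthy": status == expected, "detail": f"HTTP {status}"}
    except asyncio.TimeoutError:
        return {"name": name, "healthy": False, "detail": f"timed out after {timeout}s"}
    except Exception as exc:  # connection errors, DNS failures, and so on
        return {"name": name, "healthy": False, "detail": f"error: {exc}"}


async def check_all(services: List[Dict], fetch: Fetcher) -> List[Dict]:
    """Run every check concurrently; total time is roughly the slowest check."""
    return await asyncio.gather(*(check_service(s, fetch) for s in services))


async def fake_fetch(url: str) -> int:
    """Stand-in for a real HTTP call (assumption: the real client returns a status)."""
    await asyncio.sleep(0.01)
    return 200 if "api" in url else 503


if __name__ == "__main__":
    demo = [
        {"name": "Backend API", "url": "https://api.example.com/health"},
        {"name": "Frontend App", "url": "https://app.example.com"},
    ]
    for result in asyncio.run(check_all(demo, fake_fetch)):
        print(result)
```

Because the checks run through `asyncio.gather`, one slow service does not delay the others; it only delays the final report by at most its own timeout.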
- Clone this repo:
git clone https://github.com/poacosta/service-health-monitor
cd service-health-monitor
- Set up your Python environment:
python -m venv .venv
source .venv/bin/activate # or `.venv\Scripts\activate` on Windows
pip install -r requirements.txt
- Create your `terraform.tfvars`:
project_name = "my-awesome-project"
environment = "production"
slack_webhook_url = "https://hooks.slack.com/services/your/webhook/url"
services_config = [
  {
    name = "Backend API"
    url = "https://api.example.com/health"
    type = "backend"
    timeout = 30
    expected_status = 200
    custom_headers = {
      "Authorization" = "Bearer your-token-if-needed"
    }
  },
  {
    name = "Frontend App"
    url = "https://app.example.com"
    type = "frontend"
    timeout = 30
    expected_status = 200
  }
]
- Deploy to AWS:
cd terraform
terraform init -upgrade
terraform plan
terraform apply
- Microservices Monitoring: Keep track of your distributed services
- Frontend Health: Monitor your user-facing applications
- API Availability: Ensure your APIs are responding correctly
- Custom Health Checks: Add custom headers for authenticated endpoints
Each service in your `terraform.tfvars` can have:
- `name`: Service identifier
- `url`: Health check endpoint
- `type`: "backend" or "frontend"
- `timeout`: Request timeout in seconds (default: 30)
- `expected_status`: Expected HTTP status (default: 200)
- `custom_headers`: Additional HTTP headers
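The field list above maps naturally onto a small typed structure. The sketch below shows one way the defaults (30 seconds, status 200) could be applied on the Python side. It assumes the service list reaches the function as a list of dicts; how the real Lambda receives its config is not shown here, and `ServiceConfig` and `parse_services` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

VALID_TYPES = {"backend", "frontend"}


@dataclass
class ServiceConfig:
    """One monitored service, mirroring the fields documented above."""
    name: str
    url: str
    type: str
    timeout: int = 30           # seconds, documented default
    expected_status: int = 200  # documented default
    custom_headers: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Fail fast on typos instead of discovering them at check time.
        if self.type not in VALID_TYPES:
            raise ValueError(f"type must be one of {sorted(VALID_TYPES)}, got {self.type!r}")
        if self.timeout <= 0:
            raise ValueError("timeout must be positive")


def parse_services(raw: List[Dict]) -> List[ServiceConfig]:
    """Turn raw dict entries into validated ServiceConfig objects."""
    return [ServiceConfig(**entry) for entry in raw]


if __name__ == "__main__":
    services = parse_services([
        {"name": "Frontend App", "url": "https://app.example.com", "type": "frontend"},
    ])
    print(services[0])
```

Validating at load time means a misspelled `type` shows up as a clear error on the first invocation rather than as a confusing alert later.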
Modify the check frequency in `terraform.tfvars`:
schedule_expression = "rate(5 minutes)" # Default
# OR
schedule_expression = "cron(0/15 * * * ? *)" # Every 15 minutes
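If you want to sanity-check a `rate(...)` expression before running `terraform apply`, a small helper like the one below can convert it to seconds. This is an illustrative sketch, not part of the project: `rate_to_seconds` is a hypothetical name, and it only handles `rate` expressions, not `cron`. It does enforce the EventBridge rule that a value of 1 takes the singular unit and any other value takes the plural.

```python
import re

_UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}
_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def rate_to_seconds(expression: str) -> int:
    """Convert an EventBridge rate expression to an interval in seconds."""
    match = _RATE.match(expression)
    if not match:
        raise ValueError(f"not a rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    is_plural = unit.endswith("s")
    # rate(1 minute) and rate(5 minutes) are valid; rate(1 minutes) is not.
    if (value == 1) == is_plural:
        raise ValueError(f"unit does not match value in {expression!r}")
    return value * _UNIT_SECONDS[unit.rstrip("s")]


if __name__ == "__main__":
    print(rate_to_seconds("rate(5 minutes)"))
```

The interval is also a rough upper bound on how long an outage can go unnoticed between two checks, which is worth keeping in mind when picking a schedule.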
┌─────────────┐     ┌──────────┐     ┌────────────┐
│ EventBridge │ ──▶ │  Lambda  │ ──▶ │  Services  │
└─────────────┘     └──────────┘     └────────────┘
                         │
                         ▼
                     ┌───────┐
                     │ Slack │
                     └───────┘
- Add metrics export to CloudWatch
- Implement retry mechanisms with exponential backoff
- Add support for custom health check logic
- Create a dashboard for historical uptime data
- Add support for multiple notification channels
Feel free to dive in! Open an issue or submit PRs.
- Fork the Repository
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License; see the LICENSE file for details.
- The async Python community for making non-blocking requests a breeze
- Terraform for making infrastructure manageable
- Coffee for making everything possible
Please ensure you never commit sensitive information like tokens or webhook URLs. Use environment variables or AWS Secrets Manager for production deployments.
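As a small illustration of the environment-variable approach, the sketch below reads the webhook URL at runtime and redacts it before it can end up in logs. The variable name `SLACK_WEBHOOK_URL` and both function names are assumptions made for this example; the project may use different ones.

```python
import os
from urllib.parse import urlsplit


def get_webhook_url(env_var: str = "SLACK_WEBHOOK_URL") -> str:
    """Read the webhook URL from the environment instead of from source control."""
    url = os.environ.get(env_var, "").strip()
    if not url:
        raise RuntimeError(f"{env_var} is not set")
    return url


def redact(url: str) -> str:
    """Keep the scheme and host visible but hide the secret path for log output."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/***"


if __name__ == "__main__":
    os.environ.setdefault("SLACK_WEBHOOK_URL", "https://hooks.slack.com/services/your/webhook/url")
    print("Using webhook:", redact(get_webhook_url()))
```

Failing loudly when the variable is missing is deliberate: a monitor that silently cannot notify anyone is worse than one that refuses to start.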
Built with love for DevOps engineers who want to sleep better at night. Because your services should notify you before your users do.