Inconsistent and Prolonged Pod Restart Times in Kubernetes Cluster #82

Open
1 task
JulieHoaglandSorensen opened this issue Jan 3, 2025 · 0 comments
Labels
bug Something isn't working

Description

Certain pods in Kubernetes clusters managed by Bottlenetes take much longer than expected to restart. The delays do not affect every pod, but they occur frequently and appear during volume mounting or readiness probe execution. The long waits disrupt the user experience and may affect high-availability use cases.

Expected Behavior:
Pods should restart within a predictable and minimal time frame (e.g., ~15 seconds), ensuring seamless operation.

Actual Behavior:

  • Some pods restart within the expected time frame (~15 seconds).
  • Others experience extended delays, ranging from 40 seconds to over 2 minutes.
  • Logs indicate delays during volume mounting and readiness probe execution.
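Since readiness probes are one of the suspected sources of delay, it may help to check how much waiting the probe settings alone can introduce. The sketch below is a rough lower bound, assuming every probe succeeds and ignoring kubelet scheduling jitter. `minSecondsToReady` is a hypothetical helper, not part of Bottlenetes; the field names and defaults follow the Kubernetes probe spec.

```typescript
// Hypothetical helper: field names mirror the Kubernetes Probe spec.
interface ReadinessProbeTiming {
  initialDelaySeconds?: number; // Kubernetes default: 0
  periodSeconds?: number; // Kubernetes default: 10
  successThreshold?: number; // Kubernetes default: 1
}

// Earliest time, in seconds after container start, at which the readiness
// probe could mark the container ready, assuming every probe succeeds.
function minSecondsToReady(probe: ReadinessProbeTiming): number {
  const initialDelay = probe.initialDelaySeconds ?? 0;
  const period = probe.periodSeconds ?? 10;
  const successes = probe.successThreshold ?? 1;
  // The first probe cannot run before the initial delay; each additional
  // required success costs one more probe period.
  return initialDelay + (successes - 1) * period;
}
```

If the bound for a slow pod is already close to the observed restart time, the probe configuration is a more likely cause than volume mounting.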

Reproduction

Pod Restart Issue

Fork the Bottlenetes repository to your own GitHub account.

Clone your forked repository and follow the Quickstart instructions in the README to set up the application.

Deploy the Application: Deploy the Bottlenetes application on a Kubernetes cluster (e.g., Minikube or AWS EKS):
kubectl apply -f your-k8s-deployment.yaml

Restart Pods: From the Bottlenetes WebUI, select one pod (excluding the pods in the control plane) from the heatmap, and click the "restart pod" button.

Measure Restart Times: Observe and record the restart times for all pods in the deployment.

  • Identify pods that restart within the expected time frame (~15 seconds).
  • Note any pods that take significantly longer to restart (from about 40 seconds up to ~2-5 minutes).
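One way to record the measurements consistently is to note when each restart was requested and when the pod became ready again, then compute the durations in one place. The following is a minimal sketch, assuming timestamps are collected by hand or from logs as ISO 8601 strings. `RestartSample`, `classifyRestarts`, and the 40-second threshold (taken from the lower end of the delays reported under Actual Behavior) are illustrative, not part of Bottlenetes.

```typescript
// Hypothetical record of one observed restart (ISO 8601 timestamps).
interface RestartSample {
  pod: string;
  requestedAt: string; // when "restart pod" was clicked
  readyAt: string; // when the pod reported Ready again
}

interface RestartResult {
  pod: string;
  seconds: number;
  slow: boolean;
}

// Assumed cutoff: the shortest extended delay reported in this issue.
const SLOW_THRESHOLD_SECONDS = 40;

// Compute each restart duration and flag the ones at or above the threshold.
function classifyRestarts(
  samples: RestartSample[],
  slowThreshold: number = SLOW_THRESHOLD_SECONDS
): RestartResult[] {
  return samples.map(({ pod, requestedAt, readyAt }) => {
    const seconds = (Date.parse(readyAt) - Date.parse(requestedAt)) / 1000;
    return { pod, seconds, slow: seconds >= slowThreshold };
  });
}
```

Running this over every pod in the deployment gives a comparable list of fast and slow restarts to attach to the report.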

Review Logs
Check pod logs for any delays related to volume mounting, readiness probe execution, or other potential bottlenecks.
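Alongside the logs, the pod's status conditions can help narrow down where the time goes. The sketch below assumes the conditions were copied from `.status.conditions` in the output of `kubectl get pod <name> -o json`, and it measures the gaps between the standard `PodScheduled`, `Initialized`, `ContainersReady`, and `Ready` transitions. `conditionGaps` is a hypothetical helper. The timestamps have one-second resolution, and the gap before `ContainersReady` also includes image pulls and container start, so this only points toward a bottleneck rather than isolating volume mounting.

```typescript
// Shape of one entry in .status.conditions (timestamps as ISO 8601 strings).
interface PodCondition {
  type: string;
  status: string;
  lastTransitionTime?: string;
}

interface PhaseGap {
  from: string;
  to: string;
  seconds: number;
}

// Standard pod condition types, in the order they normally become True.
const CONDITION_ORDER = ["PodScheduled", "Initialized", "ContainersReady", "Ready"];

// Return the time between consecutive conditions that are currently True.
function conditionGaps(conditions: PodCondition[]): PhaseGap[] {
  const times = new Map<string, number>();
  for (const c of conditions) {
    if (c.status === "True" && c.lastTransitionTime) {
      times.set(c.type, Date.parse(c.lastTransitionTime));
    }
  }
  const present = CONDITION_ORDER.filter((type) => times.has(type));
  const gaps: PhaseGap[] = [];
  for (let i = 1; i < present.length; i++) {
    const from = present[i - 1];
    const to = present[i];
    gaps.push({ from, to, seconds: (times.get(to)! - times.get(from)!) / 1000 });
  }
  return gaps;
}
```

A large gap ending at `ContainersReady` suggests time spent on mounts, image pulls, or readiness probes; comparing that gap between a fast pod and a slow pod shows which stage differs.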

System information

Environment: Kubernetes clusters on Minikube and AWS EKS.
OS: macOS.
Node.js Version: v18.17.0.
Cluster Tools: Kubernetes, Helm, Prometheus.

Dependencies

Primary Dependencies:
@emotion/react: v11.14.0
@emotion/styled: v11.14.0
@kubernetes/client-node: v0.22.3
@mui/material: v6.3.0
axios: v1.7.9
bcrypt: v5.1.1
body-parser: v1.20.3
chart.js: v4.4.7
concurrently: v9.1.0
cookie-parser: v1.4.7
cors: v2.8.5
dotenv: v16.4.7
express: v4.21.2
express-session: v1.18.1
express-validator: v7.2.0
jsonwebtoken: v9.0.2
lucide-react: v0.462.0
moment: v2.30.1
openai: v4.74.0
passport: v0.7.0
passport-github2: v0.1.12
passport-google-oauth20: v2.0.0
pg: v8.13.1
pg-format: v1.0.4
pg-hstore: v2.3.4
react: v18.3.1
react-chartjs-2: v5.2.0
react-dom: v18.3.1
react-draggable: v4.4.6
react-icons: v5.4.0
react-markdown: v9.0.1
react-router-dom: v7.0.2
sequelize: v6.37.5
wait-on: v8.0.1
zustand: v5.0.2

DevDependencies:
@eslint/js: v9.15.0
@tailwindcss/typography: v0.5.15
@types/bcrypt: v5.0.2
@types/chart.js: v2.9.41
@types/cookie-parser: v1.4.8
@types/cors: v2.8.17
@types/express: v5.0.0
@types/jsonwebtoken: v9.0.7
@types/passport: v1.0.17
@types/passport-github2: v1.2.9
@types/passport-google-oauth20: v2.0.16
@types/react: v18.3.12
@types/react-dom: v18.3.1
@vitejs/plugin-react: v4.3.4
autoprefixer: v10.4.20
eslint: v9.15.0
eslint-plugin-react: v7.37.2
eslint-plugin-react-hooks: v5.0.0
eslint-plugin-react-refresh: v0.4.14
globals: v15.12.0
nodemon: v3.1.9
postcss: v8.4.49
prettier: v3.4.1
prettier-plugin-tailwindcss: v0.6.9
tailwindcss: v3.4.17
ts-node: v10.9.2
typescript: v5.7.2
typescript-eslint: v8.19.0
vite: v6.0.6

Additional information

Priority: Medium-High
Extended pod restart times impact system reliability for high-availability use cases.

👨‍👧‍👦 Contributing

  • 🙋‍♂️ Yes, I'd love to make a PR to fix this bug!
@JulieHoaglandSorensen JulieHoaglandSorensen added the bug Something isn't working label Jan 3, 2025