[TEST-ONLY] [WIP] debug kourier tls runtime failure #15695

Closed
wants to merge 45 commits into from


skonto
Contributor

@skonto skonto commented Jan 14, 2025

Fixes #

Proposed Changes

Release Note


@skonto skonto changed the title from "[TEST] [WIP] debug kourier tls runtime failure" to "[TEST-ONLY] [WIP] debug kourier tls runtime failure" Jan 14, 2025
@knative-prow knative-prow bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 14, 2025

knative-prow bot commented Jan 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: skonto

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 14, 2025
@skonto skonto added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 14, 2025

codecov bot commented Jan 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.82%. Comparing base (19b9a09) to head (8df7745).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15695      +/-   ##
==========================================
+ Coverage   80.78%   80.82%   +0.04%     
==========================================
  Files         222      222              
  Lines       18025    18027       +2     
==========================================
+ Hits        14561    14570       +9     
+ Misses       3092     3086       -6     
+ Partials      372      371       -1     


@knative-prow knative-prow bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 14, 2025
@skonto
Contributor Author

skonto commented Jan 15, 2025

/retest

@skonto
Contributor Author

skonto commented Jan 15, 2025

So far:

Activator:

2025-01-14T18:00:44.6007786Z     stream.go:304: E 17:59:59.628 activator-874694df9-mphrq [activator] [serving-tests/probe-runtime-http-get-yfsvnmvr-00001] error reverse proxying request; sockstat: sockets: used 260
2025-01-14T18:00:44.6008001Z         TCP: inuse 69 orphan 0 tw 28 alloc 1840 mem 193
2025-01-14T18:00:44.6008128Z         UDP: inuse 0 mem 0
2025-01-14T18:00:44.6008250Z         UDPLITE: inuse 0
2025-01-14T18:00:44.6008353Z         RAW: inuse 0
2025-01-14T18:00:44.6008516Z         FRAG: inuse 0 memory 0
2025-01-14T18:00:44.6008875Z          err=timed out dialing 10.96.177.28:8112 after 21.11s

Kourier gateway:

2025-01-14T17:59:41.733634585Z stdout F [2025-01-14T17:59:37.733Z] "GET /healthz HTTP/2" 200 - 0 0 1 1 "-" "Knative-Ingress-Probe" "c4d56a77-9b6f-4de7-ab89-b02c617301c2" "probe-runtime-http-get-yfsvnmvr.serving-tests" "10.244.3.8:8112"
..."internalkourier" "127.0.0.1:9901"
2025-01-14T18:00:21.738139469Z stdout F [2025-01-14T17:59:59.662Z] "GET /healthz/readiness HTTP/1.1" 502 - 0 49 20998 20997 "-" "Go-http-client/1.1" "77897c29-7698-454e-8a65-1f2bda9ad7b7" "probe-runtime-http-get-yfsvnmvr.serving-tests.example.com" "10.244.3.8:8112"

It seems the certs are loaded and everything is up, including the targeted pod. Ingress probing works, but when the activator sends the request to the ksvc private service (10.96.177.28:8112), it fails. Something is off with the TLS port or the TLS handshake, because I don't see any logs on the QP side for the step where the server presents its certificate. So the connection proxied from the activator times out before that point, and the request fails with a 502. Btw, it never recovers: I ran a second request and it failed as well.

@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 16, 2025
@knative-prow-robot knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 16, 2025
@knative-prow knative-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 16, 2025
pkg/http/proxy.go
@@ -50,6 +50,7 @@
san := certificates.DataPlaneUserSAN(revID.Namespace)

tlsConf.VerifyConnection = verifySAN(san)
tlsConf.InsecureSkipVerify = true

Check failure

Code scanning / CodeQL

Disabled TLS certificate check High

InsecureSkipVerify should not be used in production code.
@skonto
Contributor Author

skonto commented Jan 21, 2025

Not reproducible any more 🤔

@skonto
Contributor Author

skonto commented Jan 21, 2025

I am seeing this in the cert-manager webhook log:

2025-01-21T12:32:24.301181686Z stderr F I0121 12:32:24.301076       1 logs.go:59] http: TLS handshake error from 172.18.0.6:44136: EOF
2025-01-21T12:39:05.127018Z stderr F I0121 12:39:05.126912       1 logs.go:59] http: TLS handshake error from 172.18.0.6:4220: EOF
2025-01-21T12:40:36.752184067Z stderr F I0121 12:40:36.750348       1 logs.go:59] http: TLS handshake error from 172.18.0.6:37034: EOF
2025-01-21T12:41:13.682699565Z stderr F I0121 12:41:13.681779       1 logs.go:59] http: TLS handshake error from 172.18.0.6:5044: EOF

cert-manager/cert-manager#4594

@skonto skonto closed this Jan 21, 2025
@skonto skonto reopened this Jan 21, 2025

knative-prow bot commented Jan 21, 2025

@skonto: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| unit-tests_serving_main | 76c7838 | link | true | /test unit-tests |
| build-tests_serving_main | 76c7838 | link | true | /test build-tests |


@skonto skonto closed this Jan 21, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.