Connecting to IoT fails with TLS negotiation timeout #119

Closed
jonyt opened this issue Dec 9, 2020 · 11 comments
Labels
feature-request A feature should be added or improved.

Comments

jonyt commented Dec 9, 2020

Is your feature request related to a problem? Please describe.
I'm trying to connect to IoT via a proxy to a broker in a region that's far from me. This fails with:

[INFO ] [2020-12-09T11:10:44Z] [00000eb4] [http-connection] - 00000000009082E0: Client shutdown completed with error 1067 (AWS_IO_TLS_NEGOTIATION_TIMEOUT).
[ERROR] [2020-12-09T11:10:44Z] [00000eb4] [http-connection] - (00000000009082E0) Error 1067 while connecting to "xxxxxxx.credentials.iot.xxxxxx.amazonaws.com" via proxy.
[WARN ] [2020-12-09T11:10:44Z] [00000eb4] [connection-manager] - id=000000000088DB10: Failed to obtain new connection from http layer, error 1067(Channel shutdown due to tls negotiation timeout)

In other words, if the network is slow or latency is high, I can't connect at all.

Describe the solution you'd like
The TLS negotiation timeout is currently hard-coded to 4 seconds in the native layer and cannot be configured. I'd like to be able to set it.

Describe alternatives you've considered
I could add a retry mechanism, but that won't help in a high-latency setup.

Additional context
The same feature request already exists in the Python CRT repo.

jonyt added the feature-request and needs-triage labels on Dec 9, 2020.
bretambrose (Contributor) commented

Thanks for bringing this up. You're not the first person to run into this problem, and finding and exposing a better solution for timeouts is definitely on our radar.

bretambrose removed the needs-triage label on Dec 9, 2020.

jonyt (Author) commented Dec 10, 2020

Would it be very difficult to make the timeout configurable?

Also, even if that isn't possible, it would be nice to have the CRT error surfaced in the Java SDK. As it stands, this failure makes connect fail with no description of the cause, so I had to enable the CRT log and look there. That would be much harder if the device were at a client's site.

bretambrose (Contributor) commented Dec 10, 2020

It's not difficult to expose a particular kind of timeout, but there is some internal debate over how we should expose them. I'm currently a fan of "one timeout across the full connection establishment regardless of what's going on underneath", which, API-wise, doesn't mesh with exposing individual timeouts for small pieces of the full circuit construction.

For example, right now in the SDK you can configure a connection establishment request to:

  1. Establish a TLS-protected connection to a proxy, followed by
  2. Establish a TLS-protected end-to-end tunnel (completely independent of the TLS context in 1) through the proxy to an arbitrary endpoint via a CONNECT request, in order to
  3. Make an HTTP request to perform a WebSocket upgrade on (2) so that MQTT can be used over the connection

Only when (3) completes successfully (or an error anywhere short-circuits the process) will you, the user, get a callback. In that timeframe there are two separate TLS timeouts, a socket timeout, and two HTTP request timeouts (the CONNECT request and the WebSocket handshake), and that's only counting the things we can control (not the second half of the proxy CONNECT request).

In the future, we may have even more complex patterns (primarily revolving around proxy authentication).

From a user-experience perspective, I'd much rather have a single knob that means "if my connection isn't fully established by this time, give up" than 5+ knobs that each control a small sub-piece of the timeout failures (and together still don't give complete coverage).
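The "single knob" idea can be illustrated in plain Java. This is a minimal sketch, assuming the connection attempt is exposed as a `CompletableFuture` and Java 9+ (for `CompletableFuture.orTimeout`); `ConnectDeadline`, `withDeadline`, and the simulated stages are hypothetical names for illustration only, not aws-crt-java or SDK APIs.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public final class ConnectDeadline {

    private ConnectDeadline() {}

    /**
     * Hypothetical helper (not part of aws-crt-java): applies ONE overall deadline
     * to an asynchronous, possibly multi-stage connection attempt. If the attempt has
     * not completed when the deadline expires, the returned future completes
     * exceptionally with a TimeoutException. The underlying work is NOT cancelled.
     */
    public static <T> CompletableFuture<T> withDeadline(
            Supplier<CompletableFuture<T>> connectAttempt, Duration deadline) {
        return connectAttempt.get().orTimeout(deadline.toMillis(), TimeUnit.MILLISECONDS);
    }

    // Simulates one connection stage that takes `millis` milliseconds to finish.
    private static String stage(long millis, String label) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new CompletionException(e);
        }
        return label;
    }

    public static void main(String[] args) {
        // Three sequential stages (proxy TLS, tunnel TLS, WebSocket upgrade), 300 ms each.
        // No single stage is slow, but the whole chain needs roughly 900 ms.
        Supplier<CompletableFuture<String>> simulatedConnect = () ->
                CompletableFuture.supplyAsync(() -> stage(300, "proxy-tls"))
                        .thenApply(s -> stage(300, s + " -> tunnel-tls"))
                        .thenApply(s -> stage(300, s + " -> ws-upgrade"));

        try {
            String path = withDeadline(simulatedConnect, Duration.ofMillis(500)).join();
            System.out.println("connected via " + path);
        } catch (CompletionException e) {
            // join() wraps the TimeoutException raised by orTimeout in a CompletionException.
            if (e.getCause() instanceof TimeoutException) {
                System.out.println("gave up: connection not fully established within 500 ms");
            } else {
                throw e;
            }
        }
    }
}
```

The point of the design is that the caller states one end-to-end budget; per-stage limits could each be satisfied (every simulated stage finishes in 300 ms) while the full establishment still takes longer than the user is willing to wait.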

jonyt (Author) commented Dec 16, 2020 via email

bretambrose (Contributor) commented

I just did a quick check, and the default was recently raised from 4s to 10s. The latest version of the v2 SDK should include that default change. It's still not a configurable value, but updating to the latest version may temporarily ameliorate your issue until we get better timeout configuration implemented.

jonyt (Author) commented Dec 17, 2020

Got it, thanks. Much appreciated.

jonyt (Author) commented Dec 22, 2020

I upgraded to SDK 1.2.10 and CRT 0.9.2, and now the process crashes. I'm getting the following:

INFO   | jvm 1    | 2020/12/22 10:37:41 | WARNING: Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
INFO   | jvm 1    | 2020/12/22 10:39:20 | Dec 22, 2020 10:39:20 AM org.apache.commons.httpclient.HttpMethodBase getResponseBody
INFO   | jvm 1    | 2020/12/22 10:39:20 | WARNING: Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
INFO   | jvm 1    | 2020/12/22 10:44:10 | Dec 22, 2020 10:44:10 AM org.apache.commons.httpclient.HttpMethodBase getResponseBody
INFO   | jvm 1    | 2020/12/22 10:44:10 | WARNING: Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
INFO   | jvm 1    | 2020/12/22 10:44:11 | Fatal error condition occurred in C:\Program Files (x86)\Jenkins\workspace\aws-crt-java-build-dll-win64\aws-crt-java\crt\aws-c-mqtt\source\client.c:54: aws_mutex_try_lock(&(connection)->synced_data.lock) == (-1)
INFO   | jvm 1    | 2020/12/22 10:44:11 | Exiting Application
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x00000000011C8C67: Failed to lookup symbol: error 126
ERROR  | wrapper  | 2020/12/22 10:44:11 | JVM exited unexpectedly.

bretambrose (Contributor) commented

Are you on windows?

jonyt (Author) commented Dec 22, 2020 via email

bretambrose (Contributor) commented

This should be fixed in v1.2.11.

github-actions bot commented Jan 5, 2021

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

jonyt mentioned this issue on Feb 15, 2021.

2 participants