Connecting to IoT fails with TLS negotiation timeout #119

jonyt · 2020-12-09T13:30:34Z

Is your feature request related to a problem? Please describe.
I'm trying to connect to IoT via a proxy to a broker in a region that's far from me. This fails with:

[INFO ] [2020-12-09T11:10:44Z] [00000eb4] [http-connection] - 00000000009082E0: Client shutdown completed with error 1067 (AWS_IO_TLS_NEGOTIATION_TIMEOUT).
[ERROR] [2020-12-09T11:10:44Z] [00000eb4] [http-connection] - (00000000009082E0) Error 1067 while connecting to "xxxxxxx.credentials.iot.xxxxxx.amazonaws.com" via proxy.
[WARN ] [2020-12-09T11:10:44Z] [00000eb4] [connection-manager] - id=000000000088DB10: Failed to obtain new connection from http layer, error 1067(Channel shutdown due to tls negotiation timeout)

So basically if the network is slow or the latency high I won't be able to connect.

Describe the solution you'd like
TLS negotiation timeout is currently set to 4 seconds in the native layer. This is non-configurable. I'd like to be able to set it.

Describe alternatives you've considered
I can have a retry mechanism, but that won't work in a high latency setup.

Additional context
This feature request already exists in the python crt repo.

bretambrose · 2020-12-09T16:18:04Z

Thanks for bringing this up. You're not the first person to run into this problem and timeouts are definitely on our radar to find/expose a better solution to.

jonyt · 2020-12-10T11:01:45Z

Is it very difficult to make the timeout configurable?

Also, even if the above is not possible, it would be nice to have the CRT error available to the Java SDK. As it is, this failure causes connect to fail with no description of the failure, so I had to enable the CRT log and look there. This would be a lot more difficult if the device were on the client's side.
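To illustrate the request, here is a minimal sketch of a connect failure that carries the native error code and name up to Java code. The exception type `NativeConnectException` and the helper `describeFailure` are hypothetical names invented for this example and are not part of the SDK or the CRT. The code 1067 and the name AWS_IO_TLS_NEGOTIATION_TIMEOUT are taken from the log above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class ConnectErrorReport {

    /** Hypothetical exception type carrying a native error code and name (not an SDK class). */
    public static class NativeConnectException extends RuntimeException {
        public final int errorCode;
        public final String errorName;

        public NativeConnectException(int errorCode, String errorName) {
            super("Error " + errorCode + " (" + errorName + ")");
            this.errorCode = errorCode;
            this.errorName = errorName;
        }
    }

    /** Walks the cause chain and describes the first native error it finds. */
    public static String describeFailure(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof NativeConnectException) {
                NativeConnectException n = (NativeConnectException) cur;
                return "connect failed: " + n.errorCode + " " + n.errorName;
            }
        }
        // No native error attached: fall back to whatever the throwable says.
        return "connect failed: " + t;
    }

    public static void main(String[] args) {
        // Stand-in for a connect future that failed in the native layer.
        CompletableFuture<Boolean> connect = new CompletableFuture<>();
        connect.completeExceptionally(
                new NativeConnectException(1067, "AWS_IO_TLS_NEGOTIATION_TIMEOUT"));
        try {
            connect.join();
        } catch (CompletionException e) {
            System.out.println(describeFailure(e));
        }
    }
}
```

With something along these lines, the application could log the failure reason itself, without needing the CRT log enabled on the device.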

bretambrose · 2020-12-10T16:43:40Z

It's not difficult to expose a particular kind of timeout, but there is some internal debate over how we should be exposing them. I'm currently a fan of "one timeout across the full connection establishment regardless of what's going on underneath" which API-wise doesn't mesh with exposing individual timeouts for small pieces of the full circuit construction.

For example, right now in the SDK you can configure a connection establishment request to:

1. Establish a tls-protected connection to a proxy, followed by
2. Establish a tls-protected end-to-end tunnel (completely independent of the tls context in 1) through the proxy to an arbitrary endpoint via a CONNECT request, in order to
3. Make an http request to perform a websocket upgrade on (2) in order to use mqtt over the connection

Only when (3) completes successfully (or an error anywhere short circuits the process) will you, the user, get a callback. In that timeframe there are two separate tls timeouts, a socket timeout, two http request timeouts (CONNECT and websocket handshake), and that's only counting things we can control (not the second half of the proxy CONNECT request).

In the future, we may have even more complex patterns (primarily revolving around proxy authentication).

From a user experience perspective, I'd much rather a single knob that meant "if my connection isn't fully established by this time, then give up" over 5+ knobs that all control little sub-pieces (that don't give complete cover) of timeout failures.
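The "single knob" idea can be sketched on the caller side as one deadline wrapped around the whole chain of stages. This is only an illustration under stated assumptions: `stage`, `connectAll`, and `withDeadline` are hypothetical helpers, the stages are simulated with sleeps, and the connect call is assumed to return a `CompletableFuture`. A wrapper like this can only give up earlier than the native timeouts; it cannot extend the TLS negotiation timeout in the native layer. It requires Java 9 or later for `orTimeout`.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class ConnectDeadline {

    /**
     * Starts a multi-stage asynchronous connect and fails the returned future
     * with a TimeoutException if the whole chain is not done within
     * overallTimeout. Stages that are already running are not cancelled.
     */
    public static <T> CompletableFuture<T> withDeadline(
            Supplier<CompletableFuture<T>> connect, Duration overallTimeout) {
        return connect.get().orTimeout(overallTimeout.toMillis(), TimeUnit.MILLISECONDS);
    }

    /** Hypothetical stand-in for one connection stage that takes `millis` to finish. */
    public static CompletableFuture<String> stage(String name, long millis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new CompletionException(e);
            }
            return name;
        });
    }

    /** Chains the three stages from the list above: proxy TLS, tunnel TLS, websocket upgrade. */
    public static CompletableFuture<String> connectAll(long perStageMillis) {
        return stage("proxy-tls", perStageMillis)
                .thenCompose(done -> stage("tunnel-tls", perStageMillis))
                .thenCompose(done -> stage("websocket-upgrade", perStageMillis));
    }

    public static void main(String[] args) {
        try {
            // Three 300 ms stages cannot finish inside a 100 ms overall deadline.
            withDeadline(() -> connectAll(300), Duration.ofMillis(100)).join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                System.out.println("gave up: connection not fully established in time");
            }
        }
    }
}
```

The caller sees one success or one timeout for the whole establishment, regardless of which stage was slow.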

jonyt · 2020-12-16T09:04:00Z

I agree. As a user I want the least amount of complexity exposed to me, as long as I can do my job. With that said, at this point I'd like the fastest solution possible, since this SDK is already deployed on client sites and, if the network has high latency, this particular feature, which is an important one, won't work.

bretambrose · 2020-12-16T15:47:44Z

Just did a quick check and the default was recently raised to 10s from 4s. The latest version of the v2 sdk should have that default change. It's still not a configurable value, but updating to the latest version may temporarily ameliorate your issue until we get better timeout config implemented.

jonyt · 2020-12-17T15:54:38Z

Got it, thanks. Much appreciated.

jonyt · 2020-12-22T12:50:27Z

I upgraded to 1.2.10 and crt 0.9.2 and now the process crashes. I'm getting the following:

INFO   | jvm 1    | 2020/12/22 10:37:41 | WARNING: Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
INFO   | jvm 1    | 2020/12/22 10:39:20 | Dec 22, 2020 10:39:20 AM org.apache.commons.httpclient.HttpMethodBase getResponseBody
INFO   | jvm 1    | 2020/12/22 10:39:20 | WARNING: Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
INFO   | jvm 1    | 2020/12/22 10:44:10 | Dec 22, 2020 10:44:10 AM org.apache.commons.httpclient.HttpMethodBase getResponseBody
INFO   | jvm 1    | 2020/12/22 10:44:10 | WARNING: Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
INFO   | jvm 1    | 2020/12/22 10:44:11 | Fatal error condition occurred in C:\Program Files (x86)\Jenkins\workspace\aws-crt-java-build-dll-win64\aws-crt-java\crt\aws-c-mqtt\source\client.c:54: aws_mutex_try_lock(&(connection)->synced_data.lock) == (-1)
INFO   | jvm 1    | 2020/12/22 10:44:11 | Exiting Application
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x7FF9073E2B62: Java_software_amazon_awssdk_crt_auth_signing_AwsSigner_awsSignerSignChunk
INFO   | jvm 1    | 2020/12/22 10:44:11 | at 0x00000000011C8C67: Failed to lookup symbol: error 126
ERROR  | wrapper  | 2020/12/22 10:44:11 | JVM exited unexpectedly.

bretambrose · 2020-12-22T15:24:50Z

Are you on windows?

jonyt · 2020-12-22T16:26:53Z

In this case yes.

bretambrose · 2021-01-05T18:09:20Z

This should be fixed in v1.2.11

github-actions · 2021-01-05T18:09:48Z

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

jonyt added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Dec 9, 2020

bretambrose removed the needs-triage This issue or PR still needs to be triaged. label Dec 9, 2020

bretambrose closed this as completed Jan 5, 2021

jonyt mentioned this issue Feb 15, 2021

TLS negotiation timeout #131

Closed

2 tasks
