We had an issue where a Task Runner had an established socket (checked with `netstat -tpn`) and was waiting forever for traffic on that socket to occur. However, the other side had no corresponding socket and therefore couldn't send anything. My guess is that the router had dropped the socket from its NAT tables.
After killing the socket using `ss -K sport = <port>`, the Task Runner resumed normal operations. So the Task Runner was still operational, just waiting forever for a reply that wouldn't come. To make the Task Runner robust against situations like this, we should put a timeout on socket operations, so the operation fails if it doesn't make progress for a long time and a new socket can be opened on the next try.
This issue is very rare: I've had three Task Runners on the same machine with the same router for over half a year and it happened only once. So we can set the timeout value relatively high, for example a minute.
Note that the Task Runner doesn't do low-level socket operations directly: it uses `java.net.HttpURLConnection` instead. That class has `setConnectTimeout()` and `setReadTimeout()` methods that we can use. But those were added in Java 1.5, while the Task Runner was originally written in Java 1.4, so using them would mean requiring Java 1.5 or later.
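A minimal sketch of how the two timeouts could be applied, assuming Java 1.5 or later. The class name `TimeoutSketch`, the method `openWithTimeouts()` and the constant `TIMEOUT_MILLIS` are made up for illustration and are not taken from the Task Runner code; only `setConnectTimeout()` and `setReadTimeout()` are the real `java.net.URLConnection` methods mentioned above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: names are illustrative, not from the Task Runner code base.
final class TimeoutSketch {

    // One minute, the value suggested above, expressed in milliseconds.
    static final int TIMEOUT_MILLIS = 60 * 1000;

    // Assumes an http:// URL, for which openConnection() returns an HttpURLConnection.
    static HttpURLConnection openWithTimeouts(URL url) throws IOException {
        // openConnection() only creates the connection object; no network I/O happens yet.
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        // Give up if the TCP connection cannot be established within the timeout.
        connection.setConnectTimeout(TIMEOUT_MILLIS);
        // Give up if a read blocks for the full timeout without any data arriving.
        connection.setReadTimeout(TIMEOUT_MILLIS);
        return connection;
    }
}
```

With these set, a stalled connect or read throws `java.net.SocketTimeoutException` (a subclass of `IOException`) instead of blocking forever, so existing I/O error handling should pick it up and a fresh socket can be opened on the next try.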