Fix #25292 Ephemeral ports are exhausted if stopping the DAS takes a long time #25293
Closed
Maybe sleep(10) would be enough?
It is always just a magic number. The problem is much deeper: the OS kernel holds a closed port for some time before it is finally released. Every time we check again we need a new local port, and that is the cause of the issue; the more frequently we repeat the check, the closer we get to exhaustion.
The correct solution really seems to be to open the connection just once and wait (!!!) until it is closed from the server side, or to return immediately if the server is not listening at all.
`SO_LINGER` is not recommended. It would really resolve the issue, but it is an "evil" way to disconnect; it is like pulling the wire out of the socket, and the server must then handle it as an error state. On the other hand, we had no intention to send any data through this connection, so it can remain on the table. I would try the first recommended option first; however, it will probably mean more refactoring, so maybe I will finally accept this and play for a while on my new laptop, because it detected another issue, see #25295. I tried to count the local ports leading to 4848: the maximum was 78.
... even worse - around 200 local ports were allocated when `mvn clean install -pl :admin-tests` finished (and failed). They are cleaned up automatically after around 60 seconds; the maximum seen in the build was 315, even after I cherry-picked these two commits to my branch.
There are 2 things:
`ProcessUtils.isListening(addr)` only connects to the port and then returns true, without any waiting. I agree with what you suggest, @dmatej, that this check should wait until the server disconnects instead of repeatedly trying to connect, for both local and remote servers. For example, check the value of `server.isClosed()` in a loop, or use NIO to wait for the close event.
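A minimal sketch of the NIO variant mentioned above, for illustration only. The class and method names (`NioCloseWaiter`, `awaitClose`) are hypothetical and not part of GlassFish, and the sketch assumes the server keeps an idle connection open until it shuts down, which this thread does not confirm. It connects once, so only one local port is used, and then waits on a `Selector` until the channel reports EOF or the deadline passes.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.time.Duration;

/** Hypothetical helper: opens ONE connection and waits until the server side closes it. */
public final class NioCloseWaiter {

    private NioCloseWaiter() {
    }

    /**
     * @return true if nothing listens on the address or the server closed the connection
     *         before the timeout, false if the server was still connected at the deadline.
     */
    public static boolean awaitClose(InetSocketAddress addr, Duration timeout) throws IOException {
        long deadline = System.nanoTime() + timeout.toNanos();
        try (Selector selector = Selector.open(); SocketChannel channel = SocketChannel.open()) {
            try {
                // Blocking connect; note that it is not bounded by the timeout in this sketch.
                channel.connect(addr);
            } catch (ConnectException e) {
                return true; // connection refused: nothing is listening
            }
            channel.configureBlocking(false);
            channel.register(selector, SelectionKey.OP_READ);
            ByteBuffer buffer = ByteBuffer.allocate(256);
            while (true) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000;
                if (remainingMillis <= 0) {
                    return false; // still connected when the deadline passed
                }
                // Sleeps in the kernel until the channel is readable (data or EOF) or time is up.
                selector.select(remainingMillis);
                selector.selectedKeys().clear();
                buffer.clear();
                try {
                    if (channel.read(buffer) < 0) {
                        return true; // EOF: the server closed its side
                    }
                } catch (IOException e) {
                    return true; // e.g. connection reset: the server went away abnormally
                }
            }
        }
    }
}
```

Unlike a loop of repeated connects, this wakes up as soon as the close event arrives, and the only port that ends up in a wait state afterwards is the one used by this single connection.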
@OndroMih, yield doesn't usually behave that way. On most systems it just abandons its "window" and yields control to the scheduler instead of hogging the CPU.
With ports it is much more complicated: if the client closes, it doesn't really close anything, it just declares that the application wants to close. That is where my assumption was wrong; I expected that after close the local port is free, which is not true.
See
And for both of us this is interesting: `Thread.onSpinWait` seems better than `yield`, however I am still not sure if they don't end up doing the same thing. `Thread.sleep` also means unloading and loading the context, which means some load, so `Thread.sleep(10)` might be less effective than `yield` or `onSpinWait`.
What we want here is not to iterate with sleeps, but to continue immediately after the condition is true.
What about this: `socket.getInputStream().read()`
`read` returns -1 (EOF) if the server socket was closed gracefully, and throws `SocketException` (with cause Connection reset) if the server socket was closed abnormally (e.g. the server was killed).
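The idea above could be sketched roughly like this. `ServerCloseWaiter`, `Result` and `awaitServerClose` are hypothetical names used only for illustration, not existing GlassFish code, and the sketch assumes the server does not drop an idle connection on its own before it shuts down. A positive `timeoutMillis` is assumed; `0` would mean waiting forever for both the connect and the read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketException;
import java.net.SocketTimeoutException;

/** Hypothetical helper: connects once and blocks in read() until the server goes away. */
public final class ServerCloseWaiter {

    public enum Result { NOT_LISTENING, CLOSED_GRACEFULLY, CLOSED_ABNORMALLY, TIMED_OUT }

    private ServerCloseWaiter() {
    }

    public static Result awaitServerClose(InetSocketAddress addr, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            try {
                socket.connect(addr, timeoutMillis);
            } catch (ConnectException e) {
                return Result.NOT_LISTENING; // connection refused
            }
            // The timeout applies to each read() call, not to the loop as a whole.
            socket.setSoTimeout(timeoutMillis);
            InputStream in = socket.getInputStream();
            try {
                // Discard anything the server sends; -1 means the server closed gracefully.
                while (in.read() != -1) {
                    // keep waiting for EOF
                }
                return Result.CLOSED_GRACEFULLY;
            } catch (SocketTimeoutException e) {
                return Result.TIMED_OUT; // server still connected
            } catch (SocketException e) {
                return Result.CLOSED_ABNORMALLY; // e.g. "Connection reset" after a kill
            }
        }
    }
}
```

Because the thread is parked inside `read()`, there is no sleep interval to tune, and only a single local port is consumed per wait.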
May we use `ProcessHandle.onExit().get()` instead of any busy-waiting?
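For the local case, a rough sketch of what that could look like. `ProcessExitWaiter` and `awaitExit` are hypothetical names, and where the PID comes from (for example the pid file) is left out. It uses only the standard `ProcessHandle.of(long)` and `onExit()` calls, with a timeout on the returned future. Note that `onExit()` cannot be used on the current process itself.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: waits for a local process to exit without polling ports. */
public final class ProcessExitWaiter {

    private ProcessExitWaiter() {
    }

    /**
     * @return true if the process is gone (or was never found), false if it is still
     *         alive when the timeout expires.
     */
    public static boolean awaitExit(long pid, Duration timeout) throws InterruptedException {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        if (handle.isEmpty()) {
            return true; // no such process, nothing to wait for
        }
        try {
            // The future completes when the process terminates; no sleep loop is needed.
            handle.get().onExit().get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Unexpected failure while waiting for process " + pid, e);
        }
    }
}
```

This only answers whether a given PID has exited; it says nothing about which process is listening on the admin port.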
Well, after experiments with SO_LINGER, closing streams (recommended by AI) before closing the socket, sleeps, etc., I have also found more places with a similar issue and came to some conclusions, but I want to ask you both, @tnagao7 and @OndroMih - do we need to check ports in the "local" case?
I think we don't, except maybe when things are broken somehow (like if somebody deleted the domain directory); then it could be used as a backup. But even then it would maybe be better to find the process using the same instance directory, check if it is running, and ignore ports.
Another issue which came my way on the new hardware is #25295 - the highest number of open local ports leading to localhost:4848 in admin-tests was 427, however those are not caused by restarts/stops/starts. I am not sure if it is OK or if there is something we should fix.
For now we can merge this and I can follow up in my own PR. The count of open local ports really is an issue which must be monitored and resolved regardless of the source of exhaustion.
@avpinchuk The problem with ports is that when the port refuses a connection, it doesn't provide any info about the state of the process. And the opposite is true too: when it is listening, it doesn't mean it is "our" GlassFish instance. If we want to monitor a process, we should not monitor its ports at all.
In our case the instance is defined by its directory and by the process id.
With just one exception - asadmin stop-domain is able to stop ANY instance of GF listening on the admin port, even if it has a different PID than the one in the pid file (which may no longer exist, along with the instance directory, if it was deleted by mistake).
The onExit handle might be useful, however I am not sure if it is reliable, e.g. when we use the `--kill` parameter? However this is a good idea.
The JavaDoc for `ProcessHandle`'s `destroy()` and `destroyForcibly()` says: