Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

socket should be closed when socket error happened during abort #98

Merged
merged 1 commit into from
Jan 7, 2025

Conversation

leslie-qiwa
Copy link

FD leak issue happened from time to time at our production environment. Using strace, I found many EAGAIN error on read/write when issue occured. This PR should fix some cases, will watch after new fix is deployed

Example strace logs:

read(1727, 0xc002a2e000, 517)           = -1 EAGAIN (Resource temporarily unavailable)
read(1727, 0xc0001e3c00, 1772)          = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\223\303tH\t\246\5\261p\224\0376\272\302J\27\205\217\2412\206\220\267\5\22CO\315\213\331\313P"..., 5132) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\340,M\273\324K\27V\17\222\232-\332j\204\3618\313R\221\35\16[\267\316\304,\360r\232?m"..., 12899) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "_\20173\200\334W]aT\227\233\362LO^56\270\21\325\267%\361^\264O@\376\315\365Z"..., 5279) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "J\261\236\nK\325\354\231\312\rF'M\6\222\"v\247\317L\346tE\354Z*\251/?\254\357\214"..., 15977) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\371\345\221\352\10\232Pa\204\253\306\\lm\271\231Y\305\314\216U\244\314\247\36\204\300\256\231\217\2727"..., 7331) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "=aFp\207\367 lJ\245\351\37\313\357\232f\333\232\344D\361\266D\223\220\312\345r\321\10J}"..., 13193) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\315\10`\370\352\243K\314\265\270\3\260A6n\221\230Y\212V-Xp\325\231\236\350\32\207=A\25"..., 4547) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\300\206\365\313\325{:\0052hn:\270\205\30TC\23\203\237\360\273\216\242\353\373Y)\233\211\305B"..., 8504) = -1 EAGAIN (Resource temporarily unavailable)
read(1727, 0xc0029e18c0, 517)           = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\354\243\206\270'X\\\344\323\205\247gXa\254\335a\345|\\g\232Z\307\375r\314\33\207\31l\237"..., 1910) = -1 EAGAIN (Resource temporarily unavailable)
read(1727, 0xc001f8ad80, 517)           = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\246|\230}S\24\201\202\253\242\37\216\2121t\3\366=\345#\253\342^\335?\355\243Q\254\nI\245"..., 9677) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "z^f\226\372\353\25\24\214\302\331\253\222\310|\330l\364\201n\231\275Oi$[\30\273b\244\376&"..., 12608) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\263\326{h\204V\345\254\223\320\n\214\301\325;~\276\367\351\202\10\212\241\343\33\262\343\263\272\230\312\361"..., 6014) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "Qe\315\302\316\t\204\375\212RV\35u_\267\347N\361Xo\3\254cP{)\n\276\357.J\351"..., 11876) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\250E\244\276\35\204\n\34\376\0-\251/\22\10\327\310\332Y\6\204P\6\307NF\3K\363\316\351\311"..., 4256) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x220fb50, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x220fb50, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)

FD leak issue happened from time to time at our production environment.
Using strace, I found many EAGAIN error on read/write when issue occured.
This PR should fix some cases, will watch after new fix is deployed

Example `strace` logs:
```
read(1727, 0xc002a2e000, 517)           = -1 EAGAIN (Resource temporarily unavailable)
read(1727, 0xc0001e3c00, 1772)          = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\223\303tH\t\246\5\261p\224\0376\272\302J\27\205\217\2412\206\220\267\5\22CO\315\213\331\313P"..., 5132) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\340,M\273\324K\27V\17\222\232-\332j\204\3618\313R\221\35\16[\267\316\304,\360r\232?m"..., 12899) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "_\20173\200\334W]aT\227\233\362LO^56\270\21\325\267%\361^\264O@\376\315\365Z"..., 5279) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "J\261\236\nK\325\354\231\312\rF'M\6\222\"v\247\317L\346tE\354Z*\251/?\254\357\214"..., 15977) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\371\345\221\352\10\232Pa\204\253\306\\lm\271\231Y\305\314\216U\244\314\247\36\204\300\256\231\217\2727"..., 7331) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "=aFp\207\367 lJ\245\351\37\313\357\232f\333\232\344D\361\266D\223\220\312\345r\321\10J}"..., 13193) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\315\10`\370\352\243K\314\265\270\3\260A6n\221\230Y\212V-Xp\325\231\236\350\32\207=A\25"..., 4547) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\300\206\365\313\325{:\0052hn:\270\205\30TC\23\203\237\360\273\216\242\353\373Y)\233\211\305B"..., 8504) = -1 EAGAIN (Resource temporarily unavailable)
read(1727, 0xc0029e18c0, 517)           = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\354\243\206\270'X\\\344\323\205\247gXa\254\335a\345|\\g\232Z\307\375r\314\33\207\31l\237"..., 1910) = -1 EAGAIN (Resource temporarily unavailable)
read(1727, 0xc001f8ad80, 517)           = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\246|\230}S\24\201\202\253\242\37\216\2121t\3\366=\345#\253\342^\335?\355\243Q\254\nI\245"..., 9677) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "z^f\226\372\353\25\24\214\302\331\253\222\310|\330l\364\201n\231\275Oi$[\30\273b\244\376&"..., 12608) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\263\326{h\204V\345\254\223\320\n\214\301\325;~\276\367\351\202\10\212\241\343\33\262\343\263\272\230\312\361"..., 6014) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "Qe\315\302\316\t\204\375\212RV\35u_\267\347N\361Xo\3\254cP{)\n\276\357.J\351"..., 11876) = -1 EAGAIN (Resource temporarily unavailable)
write(1770, "\250E\244\276\35\204\n\34\376\0-\251/\22\10\327\310\332Y\6\204P\6\307NF\3K\363\316\351\311"..., 4256) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x220fb50, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x220fb50, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
```

Signed-off-by: Leslie Qi Wang <[email protected]>
@pin
Copy link
Owner

pin commented Jan 7, 2025

@leslie-qiwa, I think it make sense and good catch :)

The conn is supposed to be closed on abort but if sendTo of the error packet fails for some reason, we drop conn on the floor without closing it...

Merging, apology for the slow response, and thank you very much for good fix!

@pin pin merged commit f723ab0 into pin:master Jan 7, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants