-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENT-12033: Made multiple changes to increase robustness of remote file copying #5620
Conversation
a998ba0
to
b66d61d
Compare
05005c9
to
2c9ee5c
Compare
@cf-bottom Jenkins please :) |
Alright, I triggered a build: Jenkins: https://ci.cfengine.com/job/pr-pipeline/11318/ Packages: http://buildcache.cfengine.com/packages/testing-pr/jenkins-pr-pipeline-11318/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me otherwise. Nice stuff! THANKS!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
libcfnet/client_code.c
Outdated
Log(LOG_LEVEL_INFO, "Source '%s:%s' changed while copying", | ||
conn->this_server, source); | ||
close(dd); | ||
free(buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that you need some more cleanup. Before this and the other failure catch above this one we have
bool w_ok = FileSparseWrite(dd, buf, n_read,
&last_write_made_hole);
and so it would seem to me you need more than close/free but rather the same as the other failure case catch above, though I am not sure about conn->error = true but seems also reasonable.
free(buf);
unlink(dest);
close(dd);
FlushFileStream(conn->conn_info->sd, size - n_wrote_total - n_read);
conn->error = true;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the conn->error = true
either. As far as I can see, it's not really checked after executing this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be quite easy to merge the two cleanup blocks into a single one acting on two possible error/failure cases.
9fb6026
to
50ddf63
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me otherwise. Thanks!
libcfnet/client_code.c
Outdated
Log(LOG_LEVEL_INFO, "Source '%s:%s' changed while copying", | ||
conn->this_server, source); | ||
close(dd); | ||
free(buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be quite easy to merge the two cleanup blocks into a single one acting on two possible error/failure cases.
Some of the parameters were not that obvious. Hence, I added a doc string so that future hackers can save some time trying to understand this madness. Ticket: None Changelog: None Signed-off-by: Lars Erik Wik <[email protected]>
Added assert to check return code of snprintf in `[Encrypt]CopyRegularFileNet`. Ticket: None Changelog: None Signed-off-by: Lars Erik Wik <[email protected]>
Ticket: None Changelog: None Signed-off-by: Lars Erik Wik <[email protected]>
There was a check to see if the destination filename was larger than `CF_BUFSIZE - 20` when copying files over the network. This makes no sense to me. At first glans, one might think that it has to do with the network protocol. I.e., `GET $SIZE $FILENAME` has to fit in `CF_BUFSIZE`. However, this is the source filename, and it is never sent over the protocol, as far as I can see. Tried to use git blame to see why it was introduced, but that was not of much help. It appears to date back to the very first implmentation of the files promise in 2008. I'll try removing it. Ticket: None Changelog: None Signed-off-by: Lars Erik Wik <[email protected]>
Removed unnecessary assignment of terminating NULL-byte to buffer. `snprintf` will terminate the buffer. Ticket: None Changelog: None Signed-off-by: Lars Erik Wik <[email protected]>
We should not rely on `sscanf` not mutating the pointer arguments on a matching failure as this is not specified in the man page. Instead we should check the expected amount of items are successfully matched. Ticket: None Changelog: None Signed-off-by: Lars Erik Wik <[email protected]>
Added a sanity check to make sure we don't write more than the original filesize aquired by SYNCH ... STAT source. This might happen if the file was changed while copying. Ticket: None Changelog: None Signed-off-by: Lars Erik Wik <[email protected]>
Changed to code in `[Encrypt]CopyRegularFileNet` to allocate the receive buffer on the stack instead of the heap. There is no reason to have it dynamically allocated since the size is constant. Furthermore, the buffer is allocated and destroyed in the same scope. Ticket: None Changelog: None Signed-off-by: Lars Erik Wik <[email protected]>
Previously the file was `stat`'ed every N read to detect file changes during transmission. The reason for limiting the calls to stat is probably to make the code more efficient. However, there are reasons to believe that the bug experienced in ENT-12033 is attributed to filechanges during transmission. Hence, I'm changing the code to do this for every read, as it better to be safe and happy rather than fast but sorry. Yes, `stat()` is a system call. However, it has pretty good cashing mechanisms on modern systems. So it should not be too expensive. Ticket: None Changelog: None Signed-off-by: Lars Erik Wik <[email protected]>
By flushing the filestream, we make sure that there is no junk data after error. Ticket: None Changelog: None Signed-off-by: Lars Erik Wik <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully the last nitpick.
unlink(dest); | ||
close(dd); | ||
FlushFileStream(conn->conn_info->sd, size - n_wrote_total - n_read); | ||
FlushFileStream(conn->conn_info->sd, size - n_wrote_total); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this depend on which error case we are handling? I'd probably just check n_wrote_total + n_read
in the condition instead of doing the addition first.
Initially I was trying to reproduce a bug reported in ENT-12033. However, I was not able to reproduce it. While statically analyzing the code, I made some changes to make it more robust.