Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENT-12033: Made multiple changes to increase robustness of remote file copying #5620

Closed
wants to merge 10 commits into from

Conversation

larsewi
Copy link
Contributor

@larsewi larsewi commented Oct 21, 2024

Initially I was trying to reproduce a bug reported in ENT-12033. However, I was not able to reproduce it. While statically analyzing the code, I made some changes to make it more robust.

  • Added doc string to CopyRegularFileNet
  • Added assert to check return code of snprintf
  • Fixed typo in comment for CopyRegularFileNet
  • Removed check for filename too long
  • Removed unnecessary assignment of NULL-byte to buffer
  • Check return code of sscanf
  • Added sanity check in CopyRegularFileNet
  • Allocate receive buffer on stack instead of heap
  • cf-serverd now stats file every read on network transition
  • Added filestream flushing on error in EncryptCopyRegularFileNet

Build Status

@larsewi larsewi force-pushed the flat branch 2 times, most recently from a998ba0 to b66d61d Compare October 21, 2024 15:03
libcfnet/client_code.c Fixed Show fixed Hide fixed
libcfnet/client_code.c Fixed Show fixed Hide fixed
libcfnet/client_code.c Fixed Show fixed Hide fixed
libcfnet/client_code.c Fixed Show fixed Hide fixed
libcfnet/net.c Fixed Show fixed Hide fixed
libcfnet/net.c Fixed Show fixed Hide fixed
@larsewi larsewi force-pushed the flat branch 5 times, most recently from 05005c9 to 2c9ee5c Compare October 24, 2024 14:55
@larsewi
Copy link
Contributor Author

larsewi commented Oct 24, 2024

@cf-bottom Jenkins please :)

@cf-bottom
Copy link

@larsewi larsewi marked this pull request as ready for review October 25, 2024 06:58
Copy link
Contributor

@vpodzime vpodzime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me otherwise. Nice stuff! THANKS!

libcfnet/client_code.h Outdated Show resolved Hide resolved
libcfnet/client_code.c Show resolved Hide resolved
cf-serverd/server_common.c Outdated Show resolved Hide resolved
cf-serverd/server_common.c Show resolved Hide resolved
Copy link
Contributor

@craigcomstock craigcomstock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

libcfnet/client_code.h Outdated Show resolved Hide resolved
Log(LOG_LEVEL_INFO, "Source '%s:%s' changed while copying",
conn->this_server, source);
close(dd);
free(buf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that you need some more cleanup. Before this and the other failure catch above this one we have

        bool w_ok = FileSparseWrite(dd, buf, n_read,
                                    &last_write_made_hole);

and so it would seem to me you need more than close/free but rather the same as the other failure case catch above, though I am not sure about conn->error = true but seems also reasonable.

            free(buf);
            unlink(dest);
            close(dd);
            FlushFileStream(conn->conn_info->sd, size - n_wrote_total - n_read);
            conn->error = true;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch 🚀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the conn->error = true either. As far as I can see, it's not really checked after executing this code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be quite easy to merge the two cleanup blocks into a single one acting on two possible error/failure cases.

@larsewi larsewi force-pushed the flat branch 3 times, most recently from 9fb6026 to 50ddf63 Compare October 28, 2024 09:56
vpodzime
vpodzime previously approved these changes Oct 29, 2024
Copy link
Contributor

@vpodzime vpodzime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me otherwise. Thanks!

libcfnet/client_code.h Outdated Show resolved Hide resolved
Log(LOG_LEVEL_INFO, "Source '%s:%s' changed while copying",
conn->this_server, source);
close(dd);
free(buf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be quite easy to merge the two cleanup blocks into a single one acting on two possible error/failure cases.

libcfnet/client_code.c Show resolved Hide resolved
Some of the parameters were not that obvious. Hence, I added a doc
string so that future hackers can save some time trying to understand
this madness.

Ticket: None
Changelog: None
Signed-off-by: Lars Erik Wik <[email protected]>
Added assert to check return code of snprintf in
`[Encrypt]CopyRegularFileNet`.

Ticket: None
Changelog: None
Signed-off-by: Lars Erik Wik <[email protected]>
Ticket: None
Changelog: None
Signed-off-by: Lars Erik Wik <[email protected]>
There was a check to see if the destination filename was larger than
`CF_BUFSIZE - 20` when copying files over the network. This makes no
sense to me.

At first glans, one might think that it has to do with the network
protocol. I.e., `GET $SIZE $FILENAME` has to fit in `CF_BUFSIZE`.
However, this is the source filename, and it is never sent over the
protocol, as far as I can see.

Tried to use git blame to see why it was introduced, but that was not of
much help. It appears to date back to the very first implmentation of
the files promise in 2008. I'll try removing it.

Ticket: None
Changelog: None
Signed-off-by: Lars Erik Wik <[email protected]>
Removed unnecessary assignment of terminating NULL-byte to buffer.
`snprintf` will terminate the buffer.

Ticket: None
Changelog: None
Signed-off-by: Lars Erik Wik <[email protected]>
We should not rely on `sscanf` not mutating the pointer arguments on a
matching failure as this is not specified in the man page. Instead we
should check the expected amount of items are successfully matched.

Ticket: None
Changelog: None
Signed-off-by: Lars Erik Wik <[email protected]>
Added a sanity check to make sure we don't write more than the original
filesize aquired by SYNCH ... STAT source. This might happen if the file
was changed while copying.

Ticket: None
Changelog: None
Signed-off-by: Lars Erik Wik <[email protected]>
Changed to code in `[Encrypt]CopyRegularFileNet` to allocate the receive
buffer on the stack instead of the heap. There is no reason to have it
dynamically allocated since the size is constant. Furthermore, the
buffer is allocated and destroyed in the same scope.

Ticket: None
Changelog: None
Signed-off-by: Lars Erik Wik <[email protected]>
Previously the file was `stat`'ed every N read to detect file changes
during transmission. The reason for limiting the calls to stat is
probably to make the code more efficient. However, there are reasons to
believe that the bug experienced in ENT-12033 is attributed to
filechanges during transmission. Hence, I'm changing the code to do this
for every read, as it better to be safe and happy rather than fast but
sorry.

Yes, `stat()` is a system call. However, it has pretty good cashing
mechanisms on modern systems. So it should not be too expensive.

Ticket: None
Changelog: None
Signed-off-by: Lars Erik Wik <[email protected]>
By flushing the filestream, we make sure that there is no junk data
after error.

Ticket: None
Changelog: None
Signed-off-by: Lars Erik Wik <[email protected]>
@vpodzime vpodzime changed the title ENT-12033: Made multiple changed to increase robustness of remote file copying ENT-12033: Made multiple changes to increase robustness of remote file copying Oct 30, 2024
Copy link
Contributor

@vpodzime vpodzime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully the last nitpick.

unlink(dest);
close(dd);
FlushFileStream(conn->conn_info->sd, size - n_wrote_total - n_read);
FlushFileStream(conn->conn_info->sd, size - n_wrote_total);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this depend on which error case we are handling? I'd probably just check n_wrote_total + n_read in the condition instead of doing the addition first.

@larsewi larsewi closed this Dec 3, 2024
@larsewi larsewi deleted the flat branch December 3, 2024 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants