Fix processing of PCAP files with trimmed packets #657
Network traffic datasets often omit the actual packet contents to reduce dataset volume and, presumably, for the sake of privacy. This is done with tcpdump's --snapshot-length option or with scripts that preserve only the headers up to the transport layer. One example is the MAWI lab datasets, where packets are trimmed to lengths in the range 34-96 bytes depending on the packet type.
We see that TCP reports contents of length 1420, but the actual contents printed in ASCII do not exceed 100 bytes. This indicates that the packet is trimmed. Running a very simple dpkt script (see below) that copies all packets from singlepacket.pcap to copied.pcap gives the following result, with tcpdump reporting an error:
Wireshark exhibits similar behavior, such as failing to associate packets that belong to the same flow. This happens because each packet in the pcap format has two fields in its header, 'len' and 'caplen', which tell tcpdump whether the packet was trimmed: 'caplen' is the number of bytes stored in the file, 'len' is the packet's original length on the wire, and a packet is trimmed when 'caplen' is smaller than 'len'. Currently dpkt ignores the 'len' field and only uses 'caplen'.
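To illustrate the two header fields, here is a minimal standard-library sketch of the classic per-packet pcap record header. It is not dpkt code: the helper names (`pack_record`, `unpack_record`, `is_trimmed`) and the 1474/96 byte sizes are made up for the example, and it assumes the little-endian, microsecond-resolution variant of the format.

```python
import struct

# Classic pcap per-packet record header, little-endian variant:
# ts_sec, ts_usec, caplen (bytes stored in the file), len (bytes on the wire).
RECORD_HDR = struct.Struct("<IIII")


def pack_record(data, ts_sec, ts_usec, orig_len=None):
    """Serialize one packet record; orig_len defaults to the captured length."""
    caplen = len(data)
    if orig_len is None:
        # Hypothetical default: writing caplen into 'len' loses the
        # information that the packet was trimmed.
        orig_len = caplen
    return RECORD_HDR.pack(ts_sec, ts_usec, caplen, orig_len) + data


def unpack_record(buf):
    """Return (ts_sec, ts_usec, caplen, orig_len, data) for the first record."""
    ts_sec, ts_usec, caplen, orig_len = RECORD_HDR.unpack_from(buf)
    data = buf[RECORD_HDR.size:RECORD_HDR.size + caplen]
    return ts_sec, ts_usec, caplen, orig_len, data


def is_trimmed(caplen, orig_len):
    """A packet is trimmed when fewer bytes were stored than were on the wire."""
    return caplen < orig_len


# Illustrative packet: 1474 bytes on the wire, only 96 bytes kept in the file.
trimmed = bytes(96)
good = unpack_record(pack_record(trimmed, 1, 0, orig_len=1474))
bad = unpack_record(pack_record(trimmed, 1, 0))  # 'len' replaced by 'caplen'

print(is_trimmed(good[2], good[3]))  # True: original length preserved
print(is_trimmed(bad[2], bad[3]))    # False: trimming is no longer visible
```

In the second case the copied record claims the packet was only 96 bytes long on the wire, so a reader of the file can no longer tell that the payload was cut, even though the inner protocol headers still describe a longer packet.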
To fix this, I provide two commits: one for the Writer side and another for the Reader side. The former allows a 'len' value to be supplied and written to the pcap packet header, and the latter exposes this value from the pcap file to the user.
The code changes needed to preserve the 'len' field with the proposed API would be minimal: