-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docstrings + Process XOR/ROL optimisations #10
base: master
Are you sure you want to change the base?
Conversation
/// It's based off a <code>BinaryReader</code>, which is a little-endian reader. | ||
/// </summary> | ||
public partial class KaitaiStream : BinaryReader | ||
{ | ||
#region Constructors | ||
|
||
/// <summary> | ||
/// Create a KaitaiStream backed by abstract stream. It could be in-memory buffer or open file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an abstract stream
an in-memory buffer
an open file
KaitaiStream.cs
Outdated
for (int i = 0, j = 0; i < value.Length; i++, j = (j + 1) % keyLen) | ||
if (key.Length == 1) | ||
return ProcessXor(data, key[0]); | ||
if (key.Length <= 64 && ByteArrayZero(key)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know what I think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimisations are based on conjecture/assumption that keys are much shorter than data. That makes this check cheap in comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also ByteArrayZero is short evaluation, it breaks loop on first non-zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimisations are based on conjecture/assumption that keys are much shorter than data. That makes this check cheap in comparison.
We don't need the assumption you are using ( 1 < key.Length <= 64 <=> Pr( key === zero bit string ) > key.Length/value.Length
) to be an inherent part of the runtime.
KaitaiStream.cs
Outdated
///</summary> | ||
public KaitaiStream(string file) : base(File.Open(file, FileMode.Open, FileAccess.Read, FileShare.Read)) | ||
/// <summary> | ||
/// Create a KaitaiStream backed by a closed file (in read-only binary-mode). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it's "closed file"? It actually remains open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a less known Python terminology. In general, functions that work on files are divided into those that operate on (currently) closed files and take filenames and those that operate on (currently) open files and take integer file descriptors/handles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
KaitaiStream.cs
Outdated
/// <summary> | ||
/// Used internally. | ||
/// Caches the current plaftorm endianness and allows emited bytecode to be optimised. Thanks to @Arlorean. | ||
/// </summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably warrants a <remarks>...</remarks>
section with a longer explanation of how this works, or at least a pointer to that discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added link, thanks, fixed.
KaitaiStream.cs
Outdated
private int BitsLeft = 0; | ||
|
||
/// <summary> | ||
/// Used internally. | ||
/// Caches the current plaftorm endianness and allows emited bytecode to be optimised. Thanks to @Arlorean. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plaftorm
=> platform
, emited
=> emitted
; arguably, optimised
=> optimized
, but that's UK vs US...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'm from the UK but I try to make all my spelling US so it's consistent with the rest of .NET and other libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also speak US English. Sloppy typing, thanks. Fixed.
KaitaiStream.cs
Outdated
/// <summary> | ||
/// Used internally. | ||
/// </summary> | ||
private static void compute_single_rotations() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lower_underscore
is totally against any .NET coding standards, I'm afraid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree again. Best to not use underscores in C# code precomputedSingleRotations woud be better, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just copied the name from Python. It was a literal translation. Fixed.
KaitaiStream.cs
Outdated
if (amount > 7 || amount < -7) throw new ArgumentException("Rotation of more than 7 cannot be performed.", "amount"); | ||
if (amount < 0) amount += 8; // Rotation of -2 is the same as rotation of +6 | ||
if (groupSize < 1) | ||
throw new Exception("group size must be at least 1 to be valid"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw new ArgumentException(...)
OK now I'm being picky!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @GreyCat on his comments about naming.
I've verified the results of the optimizations on Windows with Visual C# Compiler (2.6.0.62329) and they're all good.
KaitaiStream.cs
Outdated
#endregion | ||
|
||
#region Stream positioning | ||
|
||
/// <summary> | ||
/// Check if the stream position is at the end of the stream | ||
/// Check if the stream position is at the end of the stream (at EOF). | ||
/// WARNING: This requires a seekable and tellable stream. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.NET does not use "tell" and thus you can expect that nobody would knows what "tellable" is. Microsoft actually calls both "seekable" and "tellable" => "supports seeking", and that actually makes sense to use that wording at least here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our own documentation (KS docs PR regarding seeking) we do use these terms, seekable and tellable, so there is no reason why we cant make it consistent with our own terminology, instead of Microsoft's. And I do keep Microsoft .NET in high regard. Its also consistent with docstrings in Python and in future, other runtimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already told you in these PRs that it's a bad idea to extrapolate C-style or Python-style terminology to end users who don't know anything about that in the first place, and I'm repeating it here. How many times do I need to repeat that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, dont yell at me. If you are adamant that it should be changed, then be explicit. You only old me your opinion on this, and I told you mine. I will fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I certainly don't want to yell at anyone or anything like that. It just looked like we had no discussion there and you just kept pushing that C/Python jargonism where it does not belong (and that would have raised lots of questions in future). I've stated my position in kaitai-io/kaitai_struct_doc#14, and you've just ignored my comment, continuing to use it here and there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didnt make the connection. That and just got overall sloppy... 😪
KaitaiStream.cs
Outdated
/// <summary> | ||
/// Check if byte array is all zeroes. | ||
/// </summary> | ||
public static bool ByteArrayZero(byte[] a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably IsByteArrayZero
would be a better name, as otherwise it kind of clashes with Microsoft's naming of functions like ZeroMemory
, which actually sets all members of array to zero. Also, I'm not sure that it should be public
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed name to IsByteArrayZero. Fixed. The other helper methods are already public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other static methods like Mod
, ByteArrayCompare
, etc, are public because they are actually invoked in class sources generated by ksc. But then again, IsByteArrayZero
might be useful to somebody per se...
KaitaiStream.cs
Outdated
private ulong Bits = 0; | ||
/// <summary> | ||
/// Used internally. | ||
/// </summary> | ||
private int BitsLeft = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like "Number of bits left in Bits
buffer" + a cross-reference to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is cross reference?
f912c61
to
dc6488a
Compare
KaitaiStream.cs
Outdated
public KaitaiStream(string file) : base(File.Open(file, FileMode.Open, FileAccess.Read, FileShare.Read)) | ||
/// <summary> | ||
/// Create a KaitaiStream backed by a currently closed file (in read-only binary-mode). | ||
/// </summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "currently closed" is still very confusing. I've definitely not a Python person, but for C# guys that would be just as confusing. From OS point of view, "currently closed" file just does not exist as a resource: it takes no memory buffers, no handles, no nothing. Why don't we rephrase it in a somewhat more straightforward voice, i.e. something
Create a KaitaiStream by opening a file in read-only binary mode. This KaitaiStream will be backed by FileStream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
dc6488a
to
d86d555
Compare
d86d555
to
8adf176
Compare
8adf176
to
1fc9143
Compare
d7ca541
to
52308c2
Compare
It seems that @GreyCat was once again right and I was half wrong. You were right, that benchmarking C# code is more tricky than I thought, although I would like to point out in my defense that its for reasons that nobody pointed out explicitly and possibly nobody even understood. It is my current belief that warmup has very little impact as far as (1) CPU cache is concerned, since test data is 2GB in size and CPU L2 cache is ~1MB and (2) JIT cost is also smallish compared to overall run time. However, it seems that there is one other reason to warmup the tested method: When you allocate using System;
using System.Diagnostics;
using System.Threading;
using System.Runtime;
namespace demo
{
class MainClass
{
static void ByModulo (byte[] data, byte[] key)
{
int len = data.Length;
int keylen = key.Length;
for (int i = 0; i < len; i++)
data[i] = (byte) (data[i] ^ key[i % keylen]);
}
static void ByModulo2 (byte[] data, byte[] key)
{
for (int i = 0; i < data.Length; i++)
data[i] = (byte) (data[i] ^ key[i % key.Length]);
}
static void ByCheckWrap (byte[] data, byte[] key)
{
int len = data.Length;
int keylen = key.Length;
int k = 0;
for (int i = 0; i < len; i++)
data[i] = (byte) (data[i] ^ key[k]);
k++;
if (k == keylen)
k = 0;
}
static void ByCheckWrap2 (byte[] data, byte[] key)
{
int k = 0;
for (int i = 0; i < data.Length; i++)
data[i] = (byte) (data[i] ^ key[k]);
k++;
if (k == key.Length)
k = 0;
}
static void Measure (Action work, string description)
{
work.Invoke ();
var watch = Stopwatch.StartNew ();
work.Invoke ();
var elapsed = watch.Elapsed.TotalMilliseconds;
Console.WriteLine ("{0,-40}: {1} ms", description, elapsed);
}
public static void Main (string[] args)
{
byte[] data = new byte[int.MaxValue];
byte[] key = new byte[10];
Measure (() => ByModulo (data, key), "Loop with data[i % keylen]");
Measure (() => ByModulo2 (data, key), "Loop with data[i % key.Length]");
Measure (() => ByCheckWrap (data, key), "Loop with data[k]; if(k==keylen)k=0");
Measure (() => ByCheckWrap2 (data, key), "Loop with data[k]; if(k==key.Length)k=0");
}
}
}
|
7e2f45c
to
a6bbce0
Compare
All commits were amended to reflect last benchmarks. Previous changset was 52308c2. |
@arekbulski, I would keep insisting that you're missing the point with benchmarks here. The platform is wrong (given that 95% of .NET installations use Microsoft's .NET, it's only natural to benchmark these), the methodology is all wrong both from point of view of general statistics, and from .NET's environment point of view, etc. This alone is a total disaster, for example, which makes all tests effectively useless: int k = 0;
for (int i = 0; i < data.Length; i++)
data[i] = (byte) (data[i] ^ key[k]);
k++;
if (k == key.Length)
k = 0; If you don't see it, please take a longer look at this particular piece of code. I'm by no means a .NET expert, but even what I know is enough to see that it is not the way to do it. I can teach you how to do it, I can do these benchmarks myself, but that would take a considerable chunk of my time that I'd rather spend on some real-life tasks. This is, of course, interesting from educational perspective, both for me, you and anyone who might be interested in microbenchmarking, but there are like 400 other pressing issues in the main repo, and I do believe they're more important than this task that nobody even ever requested. |
My experience in real-world .NET applications on Windows is that the amount of memory you turn over, which puts pressure on the Garbage Collector, often outweighs any micro-optimizations at the loop level like this. We use tools like dotTrace and dotMemory to pin down performance problems, but more often than not, it's a change in algorithm, rather than a change in loop structuring that provides the most performance gains. The Roslyn team have some guidelines which may help. The key one being don't use LINQ as it allocates memory which causes performance issues. This brings me to an overall concern about the KaitaiStruct C# API. As much as I love it, I don't love the amount of memory that gets allocated when parsing a file. A solution along the lines of FlatBuffers would be preferable since it performs zero memory allocations but still gives easy access to the data structures stored in essentially a byte array. |
@Arlorean Unfortunately, it's not that easy. Projects like Cap'n Proto or FlatBuffers use carefully tuned representation of certain data laid out in memory in a manner that would be fast to access on a reasonably modern x86_64 box. If you'll try to apply the same principle to arbitrary binary data, chances are that you'll end up with very slow code due to every Also, that's not a single answer for everything anyway. You can't escape creating a string object in C# if you indeed want a string, not a byte array. Actually, in C# you can't escape allocating new byte array and copying slice of the data there every time you want a very basic byte array in the middle of memory. That said, I think that it's still a very good idea to try implementing such "zero-object creation", even if it would be of limited use and not a good fit for every one. I don't think that ksy format per se has any blockers for implementing that right now. |
@GreyCat Please note that only 2 items are (should be) in question: XOR(bytearray) and ROL(groupsize=1). Other changes are either easily justifiable (like returning entire array is cheaper than copying it) or add instead of replacing features (like ROL >1 groups). @Arlorean @GreyCat |
Ok, the catch is that you've used Python int k = 0;
for (int i = 0; i < data.Length; i++)
{
data[i] = (byte) (data[i] ^ key[k]);
}
k++;
if (k == key.Length)
{
k = 0;
} That |
Benchmarks of fixed code and fixed methodology, although still ran on Mono. using System;
using System.Diagnostics;
using System.Runtime;
namespace demo
{
class MainClass
{
static void ByBitwiseXor (byte[] data)
{
int len = data.Length;
byte x = 1;
for (int i = 0; i < len; i++) {
data[i] = (byte) (data[i] ^ x);
}
}
static void ByShifts (byte[] data)
{
int len = data.Length;
int amount = 1;
int anti = 7;
for (int i = 0; i < len; i++) {
byte x = data[i];
data[i] = (byte)((x << amount) | (x >> anti));
}
}
static void ByLookup (byte[] data)
{
int len = data.Length;
byte[] translate = new byte[256];
for (int i = 0; i < 256; i++)
translate[i] = (byte) (i ^ 1);
for (int i = 0; i < len; i++) {
data[i] = translate[data[i]];
}
}
static void ByModulo (byte[] data, byte[] key)
{
int len = data.Length;
int keylen = key.Length;
for (int i = 0; i < len; i++) {
data[i] = (byte) (data[i] ^ key[i % keylen]);
}
}
static void ByModulo2 (byte[] data, byte[] key)
{
for (int i = 0; i < data.Length; i++) {
data[i] = (byte) (data[i] ^ key[i % key.Length]);
}
}
static void ByCheckWrap (byte[] data, byte[] key)
{
int len = data.Length;
int keylen = key.Length;
int k = 0;
for (int i = 0; i < len; i++) {
data[i] = (byte) (data[i] ^ key[k]);
k++;
if (k == keylen)
k = 0;
}
}
static void ByCheckWrap2 (byte[] data, byte[] key)
{
int k = 0;
for (int i = 0; i < data.Length; i++) {
data[i] = (byte) (data[i] ^ key[k]);
k++;
if (k == key.Length)
k = 0;
}
}
static void Measure (Action work, string description)
{
GCSettings.LatencyMode = GCLatencyMode.LowLatency;
work.Invoke ();
var watch = Stopwatch.StartNew ();
work.Invoke ();
var elapsed = watch.Elapsed.TotalMilliseconds;
Console.WriteLine ("{0,-40}: {1:F0} ms", description, elapsed);
}
public static void Main (string[] args)
{
byte[] data = new byte[int.MaxValue];
byte[] key = new byte[10];
Measure (() => ByBitwiseXor (data), "Loop with ^ key");
Measure (() => ByShifts (data), "Loop with << >> |");
Measure (() => ByLookup (data), "Loop with translate[...]");
Measure (() => ByModulo (data, key), "Loop with data[i % keylen]");
Measure (() => ByModulo2 (data, key), "Loop with data[i % key.Length]");
Measure (() => ByCheckWrap (data, key), "Loop with data[k]; if(k==keylen)k=0");
Measure (() => ByCheckWrap2 (data, key), "Loop with data[k]; if(k==key.Length)k=0");
}
}
}
The benchmark shows that array lookup is faster than shifts (in ROL) but not faster than simple xoring (in XOR). It also shows that local variable and instance field .Length have identical performance, but only in loop variables, so |
a6bbce0
to
a531fee
Compare
The code was updated according to the benchmarks. READY TO MERGE (AGAIN! :). |
…-ends-with-exception' into 'master' Resolve "Forward PipeReader.SeekAsync on completed ReadResult ends with exception" Closes kaitai-io#10 See merge request marta/kaitai_struct_csharp_runtime!20
Analog implementation to Python PR.
@GreyCat There are 2 methods that remain without docstrings (marked ???). If you would please tell me what to write there, I would include it.
Benchmarks for loops:
Benchmarks for rotations:
Requesting review from @koczkatamas @LogicAndTrick @Arlorean @KOLANICH .