Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes/standardizes the behavior of a repetition '_index' within a 'repeat-until' expression #245

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

transverberate
Copy link

@transverberate transverberate commented Apr 4, 2022

Defines/Standardizes the Behavior of a Repetition _index within a repeat-until Expression

NOTE: This PR standardizes the previously-undefined behavior of _index within repeat_until.
Fixes issue #958. Because this standardizes something that was previously undefined, this PR may
warrant further discussion beyond the scope of a typical code review.

Standardizing this Behavior

Need

Issue #958 presents a basic scenario which warrants using (and therefore defining the behavior of)
_index within a repeat-until expression. Generally speaking, any data structure which contains
a list of entries of a variable-size N that is capable of terminating early (before N items have been processed),
benefits from this functionality.

Possible Behaviors

Inclusive _index (Implemented in this PR)

'Inclusive' means that _index gives the total number of elements processed plus the current entry.
Thus, for the first item processed in a repeat-until block, _index will yield a value of 1.
This is in contrast to the typical evaluation of _index outside a repeat-until expression.
However, I believe this treatment of _index within a repeat-until expression is justified (and, in-general, the expected behavior) - see the Reasoning for Inclusive Evaluation section.

Exclusive _index

'Exclusive' means that _index gives the total number of elements processed without counting the current entry.
Thus, for the first item processed in a repeat-until block, _index will yield a value of 0.
This is the typical behavior of _index in expressions outside repeat-until blocks.

Current Behavior

Currently, _index within repeat-until blocks is evaluated differently depending on the language target, see issue #958.

  • C++: _index is inclusive

  • C#: _index is inclusive

  • Go: _index is inclusive both within and outside repeat-until expressions.
    This is unlike every other language target where expressions (outside repeat-until)
    treat _index as exclusive (as they should).

  • Java: _index is inclusive

  • Javascript: _index is inclusive

  • Lua: _index is exclusive

  • Nim: _index is exclusive

  • Perl: Use of _index within repeat-until expressions results in code that does not compile

  • PHP: _index is inclusive

  • Python: _index is exclusive

  • Ruby: _index is inclusive

  • Rust: Use of _index within repeat-until expressions results in code that does not compile

Reasoning for Inclusive Evaluation

NOTE: This PR now implements exclusive evaluation. See the discussion in the comments.

Typically, _index is evaluated exclusively. This way, _index gives the current index of the
entry being processed. My argument, however, is that by the time the condition within a
repeat-until expression is being evaluated, the current item is already considered to be
processed.

This is supported by the fact that the contents of the current item
can be referred to by the _ reference within a repeat-until expression - further reinforcing
that the current item is already be considered to have been 'processed' when evaluating
repeat-until expression.

Inclusive evaluation is also consistent with most implementations of a 'do while' loop, a
structure to which a repeat-until block is roughly analogous.

Finally, inclusive evaluation will provide coverage for the most common use case:
where the maximum number of repetitions (N) is known, but has the potential to terminate early. Exclusive evaluation would require that the condition _index >= N - 1 is tested, rather than the cleaner _index >= N.

@dgelessus
Copy link
Contributor

Hey, thanks for looking into this and writing a detailed analysis and fix! Officially supporting _index inside repeat-until makes sense, especially because in kaitai_struct_formats we already have two specs that use this feature. In fact, these two specs apparently already expect two different meanings for _index, so there's definitely a need to standardize this.

I don't quite follow the argument for the "inclusive" _index behavior though. You are right that

by the time the condition within a
repeat-until expression is being evaluated, the current item is already considered to be
processed

But the repeat-until condition is still evaluated in the context of that already processed item. In particular, the "current" variable _ refers to the item that was just parsed, and not the upcoming item that has not been parsed yet (obviously that would be quite useless). So I'd argue that _index should match that and should be the index of the just parsed item, not the next one. IMO it would be counterintuitive to have _index be something different than the index of _ in the array.

I do see the advantage that the "inclusive" behavior matches what you would get in a C-style loop, where the index is incremented before the loop condition is evaluated. This indeed also makes it easier to write a maximum array size in a repeat-until condition. But I'm not sure if it's worth making _ and _index inconsistent to achieve this.

@transverberate
Copy link
Author

Great find on the two specs which employ this already!

I've been thinking about this some more...
The repeat functionality is quite limited currently. If this functionality is improved in the future such that its possible to specify whether a repeat block is inclusive/exclusive and/or when the loop condition is evaluated (perhaps some sort of repeat-while?) there may be new, cleaner ways which achieve the desired behavior (that being, terminating a variable-length list early).

If this is the case, perhaps _index should remain exclusive to keep behavior consistent across expressions. This way, current developers have a (rather hacky) means of accomplishing this behavior, while leaving room for more elegant solutions to be delivered (perhaps ones that allow the user to specify exclusive/inclusive evaluation) in the future without breaking backward compatibility.

TLDR:
Maybe _index should be exclusive.
Whatever the result, the behavior of _index within repeat-until should be documented in the user guide (possibly alongside an example).

@KOLANICH
Copy link

KOLANICH commented Apr 5, 2022

I think that _index should be exclusive, and if one needs _index + 1 as a count of already processed elements, he should be able to use _count, and if that _count is used, KSC should emit the code computing it.

@transverberate
Copy link
Author

I think that _index should be exclusive, and if one needs _index + 1 as a count of already processed elements, he should be able to use _count, and if that _count is used, KSC should emit the code computing it.

Introduction of a further parameter _count is beyond the scope of this PR (at least, in my opinion). In general, I'm skeptical of introducing another parameter whose definition is simply _index + 1. However, this is just my initial impression.

My suspicion is that many of the problems of the current representation could be addressed by introducing needed functionality into a repeat block (perhaps something as you have mentioned here).

I would love to see a discussion between the senior maintainers and the user-base addressing the limitations of the repeat functionality sometime in the future.

@generalmimon
Copy link
Member

generalmimon commented Apr 7, 2022

@dgelessus:

But the repeat-until condition is still evaluated in the context of that already processed item. In particular, the "current" variable _ refers to the item that was just parsed, and not the upcoming item that has not been parsed yet (obviously that would be quite useless). So I'd argue that _index should match that and should be the index of the just parsed item, not the next one. IMO it would be counterintuitive to have _index be something different than the index of _ in the array.

100% agree. I also think that _index should be exclusive in repeat-until - in other words, _ should be the same as items[_index]. If the repeat-until condition also depends on a previous element for example, that will be items[_index - 1] (of course this can be used only for _index >= 1, because there is no previous element for _index == 0). (Needing previous item in repeat-until appeared on Gitter and I came up with the solution where items[_index - 2] is the previous element and items[_index - 1] the current one because that's how the JavaScript code backing the Web IDE behaved, but it's certainly confusing and counter-intuitive.)

With the "inclusive" _index, items[_index] is undefined, because it would reference the future element - if we choose this behavior, there's a question what _index actually stands for when it can't be used as a subscript of the array. Oh, it makes emulating repeat-expr more "elegant" just by using repeat-until: _index == num_items (instead of repeat-until: _index == num_items - 1 for exclusive _index)? Once again, this looks weird to me too, because intuitively _index is an index to the array (if _index isn't an index, I don't know what it is) - it should be always in range 0 <= _index <= num_items - 1, so _index == num_items doesn't make sense and should never happen.

If you want to emulate repeat-expr using repeat-until (which may be useful if the repetition can be terminated before reaching the desired count when a special condition is fulfilled, as you describe in kaitai-io/kaitai_struct#958), you'll want to finish on the last element, which has the index num_items - 1, so repeat-until: _index >= num_items - 1 (or _index == num_items - 1 if you're sure that you won't somehow miss the specific index, which can be safely assumed here, because _index is always increased by 1 and the end condition is checked in each iteration).

Looking at your list in #245 (comment), this decision may break existing specs relying on the inclusive behavior, but I don't think there are that many specs using _index in repeat-until at all, so it won't matter so much. And we inevitably break some compatibility just by standardizing previously various/undefined behavior, so why not choose the one that makes more sense and will be less confusing to users (in my opinion).

@transverberate
Copy link
Author

@generalmimon

100% agree. I also think that _index should be exclusive in repeat-until - in other words, _ should be the same as items[_index]. If the repeat-until condition also depends on a previous element for example, that will be items[_index - 1] (of course this can be used only for _index >= 1, because there is no previous element for _index == 0). (Needing previous item in repeat-until appeared on Gitter and I came up with the solution where items[_index - 2] is the previous element and items[_index - 1] the current one because that's how the JavaScript code backing the Web IDE behaved, but it's certainly confusing and counter-intuitive.)

I failed to consider the more common use of _index within a repeat-until block (use as an actual index e.g., items[_index - 1]) in my initial reasoning.

I now agree exclusive evaluation makes more sense and modified the PR to implement this.
More considerable alterations were needed to achieve this result so I'm in the process of setting up the test suite and confirming proper operation.

@Mingun
Copy link
Contributor

Mingun commented Apr 16, 2022

#787 is related documenting issue

@@ -333,7 +333,8 @@ class GoCompiler(typeProvider: ClassTypeProvider, config: RuntimeConfig)
handleAssignmentRepeatEos(id, r)

override def condRepeatUntilHeader(id: Identifier, io: String, dataType: DataType, untilExpr: Ast.expr): Unit = {
out.puts(s"for i := 1;; i++ {")
out.puts("i := 0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I anticipate a problem with multiple declarations of the variable i in the same function/scope here (IIRC, the := operator in Go must introduce at least one new variable). This should be solved by wrapping the i declaration along with the for loop into a block scope. Note that this is already done in C# and Java.

The test RepeatUntilComplex (repeat_until_complex.ksy) should test this case.

Technically, there are already common methods blockScope{Header,Footer} in each language compiler. The reason the condRepeatUntil{Header,Footer} methods of CSharpCompiler and JavaCompiler don't use it is probably because blockScope{Header,Footer} were only added later (in 0.9 I think) for another feature and nobody went through the existing codebase to see whether it can't be used somewhere else (but it's no big deal, just two lines here and there).

This redeclaration issue typically only occurs in statically-typed languages, because the dynamically-typed are usually OK with redeclaring arbitrary variables.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to let my PR go a bit stale.
Thanks for your continued interested in this as I believe this usecase is very important.
I'll check on this later this week and add the appropriate scoping.
Thanks for your patience.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to let my PR go a bit stale.

You don't have to apologize. I'm sorry, because I was the one who let this PR go stale :) Unfortunately, there is always something with a higher priority, and it's hard to allocate time for less "urgent" things.

I appreciate your quick response and hopefully we get this merged soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent compiler generated code when 'repition index' is used in 'repeat-until' expression
5 participants