-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make set of result XML files always consistent and provide parameter for selection #983
Comments
Having (1) is on purpose and really important for lots of users. Without (1) one would always have to use a table-definition file with a
Yes, because the pendant of (1) for (4) would be exactly the same file as (4), just with a different name. So this would indeed be fully redundant.
I am aware of the inconsistency, but do not see a way to resolve it without breaking the standard way of using |
Consistency has a price. The current situation is "cluttered". It is a fundamental design principle to avoid data duplication whenever possible. (And we are generating duplicated data even without being asked for it.) There are many possible solutions, let me name a few:
If the most common use case is to use XMLs for run definitions, then we could simply suppress the files for task sets, that is, have This would avoid duplication and support both use cases. To make the change non-breaking, we could go until 4.0 with the current behavior and print a deprecated warning, and support the two options from an arbitrary time onwards. |
Yes, but I do not see that it is worth the price in this instance. Avoiding "clutter" is a relatively small advantage.
I would say that this principle is useful when for example duplication can cause problems such as inconsistencies etc. But I do not see this here, and I don't think it is a more general principle to avoid this whenever possible. After all, we also produce duplicate data by for example writing out the results in different file formats (
As I mentioned, using the results for the run definitions is a common case (I am pretty sure it is the most common use case for table-generator - it definitively is for my own personal usage). Requiring a tedious manual XML editing step like this is an absolute no-go.
I have considered this already, and there exists #977. But another (likely the second most) common use case is to create a table comparing two tools or two tool configs or two tool revisions. And as soon as we want to have two or more run sets next to each other in the table, any solution that requires to pass more than one file per run set on the command line would not be practicable anymore. Basically we have the following use cases:
This is possible.
I agree that it is likely not so common that users want both for a specific benchmark execution. However, I assume it will certainly happen that people will simply forget to pass So there is a real cost for 3. as well (in addition to the maintenance cost of more code paths and users having to learn to use yet another option). Besides for some slight potential for confusion, I do not see any actual problem (such as likely danger of using wrong sets of results or inconsistencies) caused by the duplication, so the current behavior seems to have less severe disadvantages than the alternatives. |
Depends on how valuable you rate my work time. I have wasted significant amounts of time over the years,
Exactly, it caused extreme problems, for example, a two days delay in getting SV-COMP results reproducible.
I hope now.
Text-book knowledge, I would say.
I also do not like this! But this did not yet cause significant problems to me. (In table-generator, I have a command-line switch to get rid of that clutter. I want only HTML, not both HTML and CSV, so I can choose, and beginners get both to see what BenchExec can offer.) But I would prefer that users would tell want they want, instead of what they do not want.
Is it not! Here you talk about different formats of the same data. There you talk about the same data in the same format.
Nothing compared to loss of time due to confusion regarding data duplication, missing files, etc.
Why do you not see this? |
Let me add a different perspective, because you mentioned several times that you do not see the problem. The use cases that you mention are those of light-weight users. I am not against supporting those ---BenchExec should not only be for power users---, and I made a constructive proposal how to achieve to support those use cases as well. But the BenchExec project was mainly motivated and developed to support "power users":
I hope this illustration helps to understand that there is no way to find it acceptable to have extra unnecessary files with duplicated data hanging around. The minimal (and perhaps perfectly fine) solution would be a command-line switch to suppress the output of one kind of files, just like I can tell table-generator to omit the CSV output. |
Thinking about power users and normal users is a good idea. Our goal should be to make simple things easy for standard users, and complex things possible for power users. This implies that defaults should suite standard users well, because power users are supposed to know what they need and be able to specify that.
I do not see one or the user group as more important than the other.
Side node: I find this far from "unacceptable", this word is much too strong here, and using such exaggerations does not help. I want to put this into perspective: "unacceptable" would be stuff that makes it likely to get wrong results, or makes BenchExec unusable for users. Here we are talking about fine-tuning some details of the output to make some use cases more convenient (it is not even a feature request that would enable more use cases than possible right now). And you seem to agree, after all you labeled this "low priority" yourself. So this can't be "unacceptable" practically by definition. And of course there is a workaround that power users can apply: delete the unnecessary files right after the benchmark execution, so that the space is wasted only for a short time.
This is a new suggestion, not brought up so far. Adding something like |
I felt the need to emphasize the one user group because you were so strongly arguing in favor of the other, not acknowledging sufficiently the needs of the first. I agree with everything until the side note. I disagree with the side note: I did not yet succeed to convince you that this does lead to wrong results. If a user looks at thousands of such files (there are non-expert BenchExec users looking at competition artifacts), The rundefinition files are missing in some situations (when there is only one task set in a particular rundefinition). I am against calling my problem report an exaggeration. I used priority "low" because the priority is a combination of urgency and importance. It is not urgent, because for SV-COMP 2024 I have to live with the And no, deleting files that BenchExec produced is not an option for me, for several reasons. (And the space is wasted on copy-on-write file systems, but disk space is not my concern.) Also, this would lead to wrong results: If the user removes all the taskset files and keeps only the rundefinition files, then the user has effectively lost results: all those that were only in taskset results, but had the rundefinition files missing. Regarding the naming of the command-line parameters: I suggest to use the notions that describe the result precisely: There is no notion of merging, and no notion of basic. But there is a notion of run definition and task set. Those should be used. And note that Last not least, I do not like the change of the title: And the parameter is not for the consistent naming, but for consistent contents. This issue is about "Inconsistent/duplicated generation of XML result files", which is not yet addressed, but discussed. |
Regarding deleting: I would have to delete 2373 result XML files for rundefinitions. I am tempted to remove them, in order to provide a clean artifact to users. |
For the route using the I think this is fairly simple to implement. I can create a pull request if necessary. |
Orthogonally to everything said so far: The use case for the (light-weight) user that you mentioned Philipp "benchexec produces one results file, and this can be given to table-generator without further action by the user" is currently not supported. The user gets several XML results files (one for each run definition, one for each task set), but not for all results. So there should be three levels of results files for such users:
Just an observation. I have no strong opinion on this. |
Not sure if I understand it correctly, but the files that you refer to as "missing files" should not be added in default mode, because their content would be exactly the same as the already existing files, so pure redundancy. The switch would (a) enable all task-definition result files and (b) disable all run-definition result files.
This is of course always welcome!
Yes, but
If you have more than one run definition in a benchmark definition, then the primary use case for this is to benchmark different configurations of the same tool. So when creating tables, these results per run definition should be put next to each other, and for that we exactly have the required file output. And the above mentioned command-line suggestion from The fact that SV-COMP uses different run definitions for different sets of tasks instead of different configuration is unfortunately currently necessary for technical reasons (#8), but not the common case. Btw., we also do not have a common output file with all results for the case where you give more than one benchmark definition to |
Proposal:
|
I have explained in #983 (comment) already that I do not see this as a good solution: Users will inevitably forget to pass |
Result of offline discussion:
A use case for the latter would for example also be CPAchecker's CI, where we only want result files on the run-definition level. The name of the parameter(s) is still open, but as usual we want a positive term (no negation) and prefer to avoid abbreviations. |
I have increased the priority. I used BenchExec again for a competition and ended up again with an inconsistent mess of files. In my opinion, #983 (comment) is the best solution, And if a user really wants the data twice, we could allow to give both parameters, If I cannot convince you of this solution, I am also fine with #983 (comment) But I find the current mess of inconsistent results unacceptable for a tool that emphasized reliability and accuracy. |
@schroeding Heads-up: this would be your next task after the current one. |
Given a benchmark definition with one run definition A with two task sets x and y and one run-definition B with one task-set z, then BenchExec produces ---surprizingly--- four XML result files:
(1) just duplicates the information in (2) and (3) and can/should be omitted.
In the other direction, the pendant of (1) for (4) is missing, there is no duplicating XML results file for run definition B.
This causes (repeatedly) confusion on the users side, because implicitly and automatically one assumes a certain consistency.
The text was updated successfully, but these errors were encountered: