Regression: inconsistent results between LicenseCompareHelper.isTextStandardLicense(), LicenseCompareHelper.isStandardLicenseWithinText(), and LicenseCompareHelper.matchingStandardLicenseIdsWithinText() #182
Update: after re-testing with v1.1.5 and v1.1.6, this appears to be caused by a change to the license list XML for … [edit] This may be a furphy, given that the same inconsistency seems to be happening with other licenses (see next comment).
I'm also seeing the same inconsistencies between the outputs of these methods when testing with the official …
@pmonks Thanks for tracking this down. I agree it is an issue. Just FYI: I'm going to be working on upgrading the library to the 3.0 spec over the next couple of weeks, so it may be a while before I can work on this issue.
Yep, no worries - no hurry on my end. Just reporting things as I see them!
Here's an …
I've done some analysis and isolated the issue to the … Below is the regex generated from the license template:
And below is the normalized text that should match the above regex:
The next step would be to analyze why the regex doesn't match.
@goneall thanks for pulling those out! A couple of things I found:
I can't see any other obvious problems, though manually fixing the regex for each of these issues still didn't match the Atlassian BSD-2-Clause text, so I'm not certain that I've found everything. My suggestion would be to fix these identified issues, then start troubleshooting again with the improved regex if it still doesn't match.
@pmonks - thanks for the analysis.
There is some logic to change from greedy to non-greedy regexes - I think that's where the error may be. I'll do a bit more research.
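For readers less familiar with the distinction: in Java regexes a greedy wildcard (`.*`) consumes as much as it can and then backtracks, while a reluctant (non-greedy) one (`.*?`) stops at the earliest point where the rest of the pattern can match. The minimal illustration below uses made-up patterns and input, not the regexes the library actually generates.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: how a greedy vs. a reluctant (non-greedy) wildcard changes
// where a match ends. The patterns and input text are made up.
class GreedyVsReluctant {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Redistribution ... conditions ... conditions are met";
        // Greedy: ".*" runs to the end, then backtracks to the LAST "conditions".
        System.out.println(firstMatch("Redistribution.*conditions", text));
        // Reluctant: ".*?" stops at the FIRST "conditions".
        System.out.println(firstMatch("Redistribution.*?conditions", text));
    }
}
```

With repeated phrases (common in license text), the two forms can select different spans, which is why a switch between them can change whether later parts of a template line up.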
I did a bit more research and found where we're adding the … This is probably an issue whenever we have optional or variable text at the beginning of the license template.
Narrowed down the issue. The template … It looks like the code only allows optional text to be 5 characters long. What we should do is capture the optional text, tokenize it, and make it an optional character group. An easier solution would be to use the actual length of the optional text rather than the hard-coded length of 5. Both of these are a bit of a design change. A short-term fix would be to just increase the allowed length of the optional text to something more reasonable.
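To make the length limit concrete, here is a hypothetical sketch (the `OptionalTextLimit` class and `build` helper are illustrative, not library code) that renders an optional section at the start of a template as a bounded wildcard `.{0,N}` followed by the required text. With N = 5, any optional preamble longer than five characters, such as a title line, prevents a match.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: optional leading template text rendered as a bounded
// wildcard ".{0,N}" in front of the (quoted) required text.
class OptionalTextLimit {
    // Builds "(?s).{0,maxOptional}\Q<requiredText>\E".
    static Pattern build(int maxOptional, String requiredText) {
        return Pattern.compile("(?s).{0," + maxOptional + "}" + Pattern.quote(requiredText));
    }

    public static void main(String[] args) {
        String required = "Redistribution and use in source and binary forms";
        String optionalText = "BSD 2-Clause License\n\n"; // 22 characters of optional preamble
        String input = optionalText + required;

        // Hard-coded limit of 5: the required text starts too late, so no match.
        System.out.println(build(5, required).matcher(input).lookingAt());
        // Raised limit (the short-term fix): matches.
        System.out.println(build(50, required).matcher(input).lookingAt());
        // Limit sized from the optional text itself: also matches.
        System.out.println(build(optionalText.length(), required).matcher(input).lookingAt());
    }
}
```

Sizing the wildcard from the template's own optional text only helps when the license being checked uses optional text of similar length, which is one reason capturing and tokenizing the optional text is the more robust option.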
After changing the character limit to 50, I ran into a second part of the match that failed:
isn't matching:
which has me a bit baffled.
Figured it out - needed to capture the newlines - changing …
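The newline problem follows from a standard Java regex rule: by default `.` does not match line terminators, so a wildcard segment cannot extend past the end of a line. The `Pattern.DOTALL` flag (inline form `(?s)`) lifts that restriction. The sample text and patterns below are illustrative only; the exact change made in the library is elided above.

```java
import java.util.regex.Pattern;

// Illustration: "." stops at line terminators unless DOTALL / (?s) is enabled.
class DotAllDemo {
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String text = "Copyright (c) 2023\nAll rights reserved.";
        // Without DOTALL the wildcard cannot cross the "\n", so this fails.
        System.out.println(matches("Copyright.{0,50}reserved\\.", 0, text));
        // With DOTALL the same pattern matches across the line break.
        System.out.println(matches("Copyright.{0,50}reserved\\.", Pattern.DOTALL, text));
    }
}
```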
The above change caused a stack overflow in the regex engine when comparing the entire text, so it isn't a solution. Since we're using … Not quite sure where to go from here, other than replacing the regex with a hand-built parser.
What if you used the regex only to find the beginning of the non-optional license text, and ran LicenseCompareHelper.isTextStandardLicense on the rest of the text starting from there? If there is no difference, or DifferenceDescription.getDifferenceMessage() starts with "Additional text found after the end of the expected license text", then the license matches.
Thanks @sdheh - sounds like a good idea. I'll try it and see if it works.
Checking the difference message didn't work - I was getting inconsistent messages. However, I figured out a different approach that did work. I created 2 different patterns, one for the start and one for the end, and just ran the 2 regexes to get the beginning and end of the license text. I'll work on a pull request after testing other scenarios.
Could you please tell me about some cases where it didn't work? I tried doing this myself and didn't run into any problems.
I didn't keep track of all the details, but it was one of the unit tests I wrote for the recent matching issues. The error message that came back wasn't the expected "Additional text found ..." but a different message, possibly related to a variable or optional section. Since the second approach worked, I didn't investigate further.
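The two-pattern idea can be sketched as follows, under assumptions: one regex locates the start of the license text, a second locates its end, searching only from where the start match finished, and the license is taken to span from the start of the first match to the end of the second. The `StartEndLocator` class and both patterns are illustrative, not the code from the eventual pull request.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a start-pattern / end-pattern locator.
class StartEndLocator {
    // Illustrative BSD-style anchors, NOT generated from an SPDX template.
    static final Pattern START = Pattern.compile("(?i)redistribution\\s+and\\s+use");
    static final Pattern END = Pattern.compile("(?i)possibility\\s+of\\s+such\\s+damage\\.");

    // Returns {startOffset, endOffset} of the license text, or null if either part is missing.
    static int[] locate(Pattern startPattern, Pattern endPattern, String text) {
        Matcher s = startPattern.matcher(text);
        if (!s.find()) return null;
        Matcher e = endPattern.matcher(text);
        // find(int) resets the matcher and searches from the given index.
        if (!e.find(s.end())) return null;
        return new int[] {s.start(), e.end()};
    }

    public static void main(String[] args) {
        String text = "Preamble. Redistribution and use ... POSSIBILITY OF SUCH DAMAGE. Trailer.";
        int[] span = locate(START, END, text);
        if (span != null) {
            System.out.println(text.substring(span[0], span[1]));
        }
    }
}
```

One property of this shape is that neither regex has to cover the whole license body in a single pass.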
@goneall will this approach (having two separate "before" and "after" regexes) work if a template has many …

BTW I'm also stumped as to why … Here's how I tested (note: Clojure code, but it boils down to JVM bytecode / Java regexes under the covers):

```clojure
user=> (def re #"(?i)(?s)\QANY\E\s*EXPRESS(ED)?.{0,36000}\QPARTICULAR\E\s*")
#'user/re
user=> (def s "ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF merchantability AND FITNESS FOR A PARTICULAR")
#'user/s
user=> (re-matches re s)
["ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF merchantability AND FITNESS FOR A PARTICULAR" nil]
user=> (re-matches re "This does not match, just to show what that would look like.")
nil
```
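For anyone reproducing this directly in Java, the equivalent check is sketched below. Clojure's `re-matches` uses `Matcher.matches()`, so the whole input must match the pattern; the trailing `nil` in the Clojure result is group 1, the optional `(ED)?` group, which did not participate. The class name `ReMatchesCheck` is just for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Java equivalent of the Clojure REPL test: whole-input match, case-insensitive, DOTALL.
class ReMatchesCheck {
    static final Pattern RE = Pattern.compile(
        "(?i)(?s)\\QANY\\E\\s*EXPRESS(ED)?.{0,36000}\\QPARTICULAR\\E\\s*");

    static boolean wholeMatch(String s) {
        return RE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String s = "ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, "
                 + "THE IMPLIED WARRANTIES OF merchantability AND FITNESS FOR A PARTICULAR";
        Matcher m = RE.matcher(s);
        System.out.println(m.matches()); // true
        System.out.println(m.group(1));  // null: the optional "(ED)?" group did not participate
        System.out.println(wholeMatch("This does not match, just to show what that would look like.")); // false
    }
}
```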
The optional text is replaced by a match-all regex … If you have a license that may not work, I can add it to the unit tests just to be sure.
Me too. Here's the code that sets the … It doesn't look like there is any other code path that would not set … Let me know if you see anything wrong. What is also strange is that the …
Bizarre. I did a search for regex compilation throughout the code, just to see if there might be an alternative code path that's being used instead of that one, but I don't see anything obvious (regexes are compiled elsewhere, but they don't seem to be used in template matching).
This is now fixed with PR #221.
When tested with this BSD-2-Clause text, and the BSD-2-Clause listed license object where appropriate, these methods in `org.spdx.utility.compare.LicenseCompareHelper` return inconsistent results (emoji indicate whether each result is expected or not):

- `!isTextStandardLicense().isDifferenceFound()` returns `true` ✅
- `isStandardLicenseWithinText()` returns `false` ❌
- `matchingStandardLicenseIdsWithinText()` returns an empty collection ❌
- `matchingStandardLicenseIds()` returns `["BSD-2-Clause"]` ✅

This was observed with v1.1.7 of the library, on JVM `OpenJDK 64-Bit Server VM Temurin-17.0.7+7 (build 17.0.7+7, mixed mode, sharing)`. This worked correctly in previous versions (at least in v1.1.5; not sure about v1.1.6).