-
Notifications
You must be signed in to change notification settings - Fork 308
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix URL parsing for the extended tag view
- Loading branch information
1 parent
c860f34
commit b659c0d
Showing
1 changed file
with
2 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
b659c0d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you intending to interpolate
$,
(i.e.$OFS
) into the character class? If not, then the RE isn't correct (it won't catch,
) and it is much slower than it needs to be. And when$,
warnings are enabled:If you are not intending interpolation:
Also note that
.
doesn't need escapement in a character class. Escapement of-
can be avoided too, if it is the first char in the class, but I left it with the other punct at the end.Finally, the
g
flag was curious to me given the two substitutions, in that the code can't catch the two types of URIs for a given string.b659c0d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I edited my reply above.
b659c0d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you tell me what would make that big performance difference? It's not an easy piece of code to read... And how did you test?
And while we're at it: wouldn't we want to have the hash tag in there, too? It can define an anger on a linked resource.
BTW: with the
i
modifier we'd not even have to define both upper and lower casea-z
in the class.b659c0d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore my mention of performance. I'd used single quote substitution delimiters to avoid interpolation. That was the big speed win. But of course that disables the replacement side too! It was a last minute change and when I ran my Benchmark::cmpthese() tests, I didn't notice the error in initial output of my lengthy test URLs.
Yes, of course the
i
modifier, so it would be good to remove either A-Z or a-z in the char class.Technically, that RE for parsing a URI isn't correct. The host part cannot contain
#
,?
or/
. And the path part cannot contain?
or#
. But I surmised you just wanted something quick and dirty that mostly worked. Your call on how well this RE needs to work.See: Parsing a URI Reference with a Regular Expression in RFC 3986.
https://www.rfc-editor.org/rfc/rfc3986#appendix-B
Given that the value may have multiple URIs, anywhere, with no well-defined delimiters, you can't use that RE since once you remove the beginning of line anchor it will degenerate and greedily also match the end of line empty string, so the global substitution would have to be replaced with a looping match so that you could ignore an empty string match. I didn't test or benchmark that option (but did test a global var qr/ / version of both REs; that turned out to be surprisingly much slower that the /o flag in the string version).
(Odd aside: GitHub seems to have enabled some new Commit feature preview for me since a few hours ago when I posted the comment above. With it enabled, I see comments when I'm logged out, but once I log in, they stop loading, both Safari and Firefox.).
b659c0d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, this really was a q'n'd hack to get something working, now tweaking. But I don't expect it to cover multiple URLs etc. I'd consider those corner cases. It's been good enough for the first several months 😁. So... what do you think should be fixed?
BTW: I wouldn't consider performance an issue really, as this regex would only be used once or twice per page rendering.
And yes about the latest Github changes. I've had similar remarks from other users, too. I myself only saw that kind of in-line/floating comment once. Ever since I've had old school comments only again.
b659c0d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a list context global match RE that I want to test more carefully tomorrow. It should allow eliminating the second RE, replaced by testing for absence of protocol and presence of 'www' for the emusic-type.
I didn't feel I understood the range of value possibilities to know how loose or tight the RE had to be; your reply helps.
b659c0d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, perhaps you can try this out:
This isn't strict, and it can't be given the freedom of the data input. Here's a test string I used:
and the output:
It probably isn't desirable to create a link from the sample.com/path value, but again, the data input is too undefined. And the existing code would do likewise when the data didn't contain any other URLs starting with
https?
.After I increased the 256 byte limit in Slim::Menu::TrackInfo::tagDump(), the appropriate links were clickable in the tag dump:
b659c0d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's even less human readable 😂. I first read it on the phone...
What about the
o
modifier?perlre
says:Sounds promising 😁. We can live without that, can't we? Could you please submit a PR against 9.1 (let's be cautious...)?
Would you know whether that regex was working in JavaScript, too? Material is parsing on the client.
b659c0d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ugly, I get it. The ambiguity of the data presents the challenge, the RE trying to do what the original code purported to do. But its effort was later thwarted by the 256 char limit (does it need to be so low?).
If the need is to only match a single... the first... URL in a value, the original code with
$
properly escaped and#
added is good enough; it can't be perfect. Let me know if I should still create the 9.1 PR (I fear your plate hath runneth over).The
/o
modifer is safe and useful if there's no interpolation in the pattern. In much older Perl versions, it could have some issues, and there were bugs in the Perl code optimizer. It was also added to far too many REs without thought.In the original code, if
warnings
were enabled, Perl would have emitted one warning (as it interpolated$,
during RE compilation). Remove the\o
, and Perl emits one warning per pattern match attempt... because it has to interpolate$,
on every pass, thus recompile the RE. The happenstance of enabling warnings and$,
evaluating to undef revealed three hidden problems in the RE.The RE itself should work in JavaScript so long as it's doing greedy capturing. Obviously the
<a>
formatting will require different code.Off topic: Here's where I was hoping to go with all this... my files have various ID tags for amg, discogs, mb, etc.. I was hoping to get time to see if I could extend Advanced Tag View to construct appropriate clickable URLs based on those IDs, presented in a list if the tag had multiple IDs (e.g. semicolon-separated). My taggers create these and push them to media software I use for library management (and it supports doing this via its expression language and UI). But it would be much more efficient during listening via Lyrion to jump into track details and click the link constructed from the ID(s).