Inconsistency? PAIR_HTML_COMMENT token not available in BLOCK_HTML #221
Comments
I would need to dig into some tests to address specific examples, but the main difference is that MultiMarkdown does not attempt to parse inside HTML blocks, as there is no point. HTML is copied verbatim and is not validated beyond that. HTML spans within Markdown blocks, which include HTML comments inside text paragraphs that would be matched as an

Using MultiMarkdown as an HTML syntax highlighter will not be particularly useful. Basically you get HTML or HTML comment, and I would recommend both be styled the same to avoid confusion related to the issue you bring up.

(NOTE: HTML Blocks that include blank lines are treated differently, where the HTML is treated as separate blocks, and it is possible to have plain Markdown blocks inside the "wrapping" HTML. Such as:
) |
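The recommendation above, styling HTML and HTML comments the same, could be sketched on the highlighter side roughly as follows. The token-type names come from this thread; the style names and the `style_for` helper are hypothetical and not part of MultiMarkdown.

```python
# Hypothetical style table: token-type names follow this thread,
# style names are invented for illustration only.
STYLE_FOR_TOKEN = {
    "BLOCK_HTML": "html",
    "PAIR_HTML_COMMENT": "html",  # deliberately the same style as plain HTML
    "BLOCK_PARA": "text",
}


def style_for(token_type: str) -> str:
    """Return the style name for a token type, falling back to plain text."""
    return STYLE_FOR_TOKEN.get(token_type, "text")
```

With a table like this, it no longer matters for coloring whether a comment surfaces as its own pair token or is swallowed by an HTML block.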
In the end, in a syntax highlighting context, it does make sense to react to
I'm a bit surprised that the tokenizer treats
as BLOCK_PARA, and
as BLOCK_HTML, but that's mostly irrelevant anyway. I don't have any particular request to push; it just looked a bit inconsistent at times, and I didn't know whether this is intentional or accidental. |
The first example requires parsing multiple lines to determine whether this is an HTML comment, and whether there is additional content inside the paragraph. The second does not. |
@fletcher Another data point:
... the fencing inside the comment "block" affects all but the first line of text, like any unclosed code fence does. I would expect the encompassing HTML comment to win at the token level, though. What do you think? Is this instead a problem of the code that "applies" the token information? |
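If the fix were made in the code that applies the token information rather than in the parser, one possible workaround is to let comment ranges win over anything nested inside them. This is a minimal sketch under assumptions: the `(location, length)` ranges mirror the debug output mentioned in this thread, while the `Span` shape and both function names are hypothetical. It only handles spans that lie entirely inside a comment, so a fence that runs past the end of the comment would still need separate handling.

```python
from typing import List, Tuple

Range = Tuple[int, int]      # (location, length)
Span = Tuple[int, int, str]  # (location, length, style name); hypothetical shape


def contains(outer: Range, loc: int, length: int) -> bool:
    """True if [loc, loc + length) lies entirely within the outer range."""
    start, size = outer
    return start <= loc and loc + length <= start + size


def drop_spans_inside_comments(spans: List[Span], comments: List[Range]) -> List[Span]:
    """Keep only the styled spans that are not fully covered by a comment range."""
    return [
        span
        for span in spans
        if not any(contains(comment, span[0], span[1]) for comment in comments)
    ]
```

For example, with a comment covering `(0, 30)`, a code-fence span at `(5, 10)` would be dropped, while an emphasis span at `(40, 5)` would be kept.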
I'll have to dig in to see what it would take to change the parsing here. I agree that I would prefer this be a plain HTML comment, without a fenced block. Just need to see what the effort to change this would be. |
@fletcher After all this time, I can now imagine trying my hand at that and opening a PR if you want to change things. Maybe it'd be useful if you first added some tests to prevent regressions on other complex cases, though? Your knowledge of edge cases in (M)MD parsing is unparalleled, so that'd be helpful :) |
That's what the test suite is for. I'm all for you taking a stab at this if you like, but this may be one of the more complicated parts of the parsing to try to modify if you're not familiar with it (MultiMarkdown, lemon, etc.). I'm working on other MultiMarkdown software projects right now, so this is a bit further down the priority list for me. |
I noticed that when one uses HTML comments inside a regular paragraph, the token tree contains PAIR_HTML_COMMENT → HTML_COMMENT_START and eventually PAIR_HTML_COMMENT → HTML_COMMENT_STOP (the → denotes is-a-child).

This makes applying highlight colors to HTML comments a bit weird since the comment range is sometimes available as PAIR_HTML_COMMENT, sometimes not. Maybe something odd is underlying all this that you might want to know about, @fletcher.

Observations
When you put the HTML comment on its own line in an empty document or at the end of a document without a trailing newline, it's the same, e.g.:
But when the HTML comment spans the whole line and ends in a newline character, the token tree is:
Note that this seems to be a special case, because the following in a Markdown document ...
... produces BLOCK_HTML with multiple LINE_HTML tokens, except for the line with the HTML comment; the tokens from that line stand alone. Here's some debug output with token types and the ranges (location + length):

Exceptions from the rule?
Oddly enough, when you have a multi-line HTML comment:
This is not recognized as BLOCK_HTML at all, but as BLOCK_PARA instead, and it includes a PAIR_HTML_COMMENT.

Plus, if you add empty lines around "foo" here, the whole PAIR_HTML_COMMENT is dissolved again and you have multiple paragraph blocks, one including the start token and one the stop token.