
Links in my Markdown file are not hidden because of "redundant spaces" #344

Closed
Golddouble opened this issue Aug 7, 2023 · 8 comments

@Golddouble

It looks like my Markdown file created with "maoxian-web-clipper" has too many spaces after "[" and before "]" in links.

I have clipped this page:
[Screenshot: the clipped page]
Source: https://www.ebay.de/sch/i.html?_nkw=dummies+statistik&_sacat=0&_sop=15

This gave me the following file:
2023-08-07 22-09-15.zip

It looks like this in reading view:
[Screenshot: reading view of the clipped file]

Only after deleting the spaces after "[" and before "]" in the links do I get the expected rendering:
[Screenshot: reading view after removing the spaces]

Question:
Shouldn't the maoxian-web-clipper delete these spaces automatically?
Can I do anything?

I would appreciate an answer. Thank you.

@mika-cn
Owner

mika-cn commented Aug 9, 2023

Thanks for the feedback :)

I can reproduce this problem. It happens because Turndown (the JS library MaoXian uses to convert HTML to Markdown) converts block elements into \n\nXXX\n\n, and the given page wraps its images in a <div> (a block element) inside link tags (<a>), like this:

<a href="https://example.org/index">
  <div class="a-image-wrapper">
    <img src="a-image.png">
  </div>
</a>

So it'll be converted to markdown like this:

[

![](a-image.png)

](https://example.org/index)
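To make the mechanism above concrete, here is a deliberately simplified TypeScript model. The helpers blockToMarkdown, imageToMarkdown and linkToMarkdown are hypothetical stand-ins for the behaviour described (a block becomes "\n\n" + content + "\n\n", a link becomes "[" + text + "](" + href + ")"); they are not Turndown's actual rules. Composing them for the HTML above gives exactly the output shown. Because a blank line ends a paragraph in Markdown, the "[" and the "](...)" land in different paragraphs, so no link is recognized and the raw brackets stay visible.

```typescript
// Simplified, hypothetical model of the conversion described above.
// These helpers are NOT Turndown's real rules; they only mimic the idea that
// block elements are padded with blank lines and links wrap their text verbatim.

function blockToMarkdown(content: string): string {
  // Block elements (such as <div>) are surrounded by blank lines.
  return "\n\n" + content + "\n\n";
}

function imageToMarkdown(src: string, alt = ""): string {
  return "![" + alt + "](" + src + ")";
}

function linkToMarkdown(text: string, href: string): string {
  // The link text is inserted as-is, including any surrounding blank lines.
  return "[" + text + "](" + href + ")";
}

// Models: <a href="https://example.org/index"><div><img src="a-image.png"></div></a>
const markdown = linkToMarkdown(
  blockToMarkdown(imageToMarkdown("a-image.png")),
  "https://example.org/index",
);

console.log(JSON.stringify(markdown));
// prints "[\n\n![](a-image.png)\n\n](https://example.org/index)"
```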

Shouldn't the maoxian-web-clipper delete these spaces automatically?

Yes, it should delete these spaces.
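One cautious way to delete these spaces is a post-processing pass that trims the padding only when the entire link text is a single Markdown image. The sketch below assumes the Markdown has already been produced as a string; the function name trimImageLinkText and its regular expression are illustrative assumptions, not MaoXian's actual fix.

```typescript
// Hypothetical post-processing step (not MaoXian's actual implementation).
// Matches "[" + optional whitespace + one Markdown image + optional whitespace + "](".
// Only links whose whole text is a single image are touched.
const IMAGE_LINK_WITH_PADDING = /\[\s*(!\[[^\]]*\]\([^)]*\))\s*\]\(/g;

function trimImageLinkText(markdown: string): string {
  // "[\n\n![alt](src)\n\n](href)" -> "[![alt](src)](href)"
  return markdown.replace(IMAGE_LINK_WITH_PADDING, "[$1](");
}

const broken = "[\n\n![](a-image.png)\n\n](https://example.org/index)";
console.log(trimImageLinkText(broken));
// prints [![](a-image.png)](https://example.org/index)
```

Links whose text is anything other than a single image are deliberately left untouched, so the pass cannot change how other links convert.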

Maybe we can unwrap the image link and put the link below the image, as in the following Markdown. What do you think?

![](i-am-a-image.png)

[image link](https://example.org/index)
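The unwrap idea could be sketched as another string post-processing pass. This is a hypothetical TypeScript sketch, not MaoXian's code: unwrapImageLinks and its regular expression are assumptions, and the default label "image link" is taken from the example above. It accepts an image link with or without padding whitespace inside the brackets and emits the image followed by a separate text link.

```typescript
// Hypothetical sketch of the "unwrap" proposal (not MaoXian's implementation).
// Captures: (1) a single Markdown image used as link text, (2) the link target.
const IMAGE_LINK = /\[\s*(!\[[^\]]*\]\([^)]*\))\s*\]\(([^)\s]*)\)/g;

function unwrapImageLinks(markdown: string, label = "image link"): string {
  // Emit the image, a blank line, then a plain text link to the same target.
  return markdown.replace(
    IMAGE_LINK,
    (_match: string, image: string, href: string) => `${image}\n\n[${label}](${href})`,
  );
}

console.log(unwrapImageLinks("[\n\n![](i-am-a-image.png)\n\n](https://example.org/index)"));
```

Using a replacement callback rather than a "$1" replacement string keeps any "$" characters in a custom label from being interpreted as group references.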

@Golddouble
Author

I'm not sure I understand 100% correctly.

Do you mean this:
![Statistik für Wirtschafts- und Sozialwissenschaftler für Dummies A2 Thomas ...](asset/s-l300.webp)

[](https://www.ebay.de/itm/195524360773?epid=3042165208&hash=item2d8628fa45:g:Nr0AAOSwIf5joDmX&amdata=enc%3AAQAIAAAAwHFxi1KBPwUIcCk8tKLXL1LAN4KLWdr%2FP12IZNSb8zy66kfAwvL2Dz4x1MJgQ7IrKfQkHSPvU2m%2FBVmuG770YL0y5%2F4k%2FRFXli9ZbPAojdLW1Znou66D3v%2BkyoMFK3CahaAkhduhAfzMeTOZlphoShl0VSmXM%2Fdz3Mc2KJcr7GPnzSpcgveGrF5w5T83CRHH4bbie4ftZuQkOeadnNNv3ypR%2Fot5eOAczhzNyksLI4xLo6rt9xcZjwdsWrm0cD6prg%3D%3D%7Ctkp%3ABk9SR7Ko_dW1Yg)

or this:
![](asset/s-l300.webp)

[Statistik für Wirtschafts- und Sozialwissenschaftler für Dummies A2 Thomas ...](https://www.ebay.de/itm/195524360773?epid=3042165208&hash=item2d8628fa45:g:Nr0AAOSwIf5joDmX&amdata=enc%3AAQAIAAAAwHFxi1KBPwUIcCk8tKLXL1LAN4KLWdr%2FP12IZNSb8zy66kfAwvL2Dz4x1MJgQ7IrKfQkHSPvU2m%2FBVmuG770YL0y5%2F4k%2FRFXli9ZbPAojdLW1Znou66D3v%2BkyoMFK3CahaAkhduhAfzMeTOZlphoShl0VSmXM%2Fdz3Mc2KJcr7GPnzSpcgveGrF5w5T83CRHH4bbie4ftZuQkOeadnNNv3ypR%2Fot5eOAczhzNyksLI4xLo6rt9xcZjwdsWrm0cD6prg%3D%3D%7Ctkp%3ABk9SR7Ko_dW1Yg)

@Golddouble
Author

Golddouble commented Aug 9, 2023

But maybe this is an issue that would be better solved by the Turndown developers.
Or am I missing something?

@mika-cn
Owner

mika-cn commented Aug 11, 2023

Sorry for the delayed reply.

I was trying to fix it (remove these unneeded spaces), but I kept thinking of other cases where a link <a> wraps other block elements, so I haven't fixed it yet.


But maybe this is an issue that would be better solved by the Turndown developers.

I searched Turndown's issues, and there is indeed an issue about this: https://github.com/mixmark-io/turndown/issues/332.

The author of Turndown said this:

It's a little more complicated. We need to introduce Markdown contexts to do it both universally and efficiently. This isolated case does have a simpler solution, but I don't like the idea of providing case-by-case fixes for these, especially when there is no universally correct solution. With contexts, users would be able to choose what to do with block elements nested in inline elements. It's simple for a div without semantics. But for, e.g., a table inside a link, producing something valid means either discarding the user's data or just keeping it as HTML (which defeats the purpose of Turndown itself). So in these cases, users have to choose. For now, they can only choose what to do with these cases through HTML preprocessing, unfortunately.

Because that issue was opened on 2020-06-14, I don't expect Turndown to fix it in the near future. But this problem needs to be solved, even if we can't come up with a good solution that handles every case of block elements inside anchor elements.


And we will consider that unwrap solution in the future.

@Golddouble
Author

Thank you for taking care of the problem.

I don't know much about Markdown and HTML tags. But it is important that a fix does not create new conversion problems that did not exist before. So I think it is important to delete the spaces only in a specific context.

@mika-cn
Owner

mika-cn commented Aug 12, 2023

But it is important that a fix does not create new conversion problems that did not exist before. So I think it is important to delete the spaces only in a specific context.

I agree with you. We can handle these common cases first. I've fixed this specific case in the new version and published it. Please update and send feedback if the problem still exists.

@Golddouble
Author

Thank you.

I have tested it.
It works great for the "picture" case.

But in the example above, there is a second case. It's case 2 in the following picture:
[Screenshot: the clipped page with case 2 marked]

Of course, this is another case.

Is there no safe way to solve this second case as well?

@mika-cn
Owner

mika-cn commented Aug 13, 2023

Thanks for the feedback.

This new case is not easy to solve. The problem is that Markdown has no corresponding format for block links (in this case, links containing multiple lines of text). So how should we convert these block links to Markdown?

As this issue is specifically about image links, I've created a new issue to discuss general block links.

@mika-cn mika-cn closed this as completed Aug 13, 2023