Support: Markdown to Markdown (round trip) #164

Closed
firasm opened this issue May 7, 2021 · 5 comments

firasm commented May 7, 2021

I've been looking for a markdown parser and I think I've found what I'm looking for in markdown-it-py. Being a big fan of Jupyterbook, I figure it's best to stay in the family. Thanks for creating this!

I'm having some difficulty trying to do something very specific - I have a feeling this is related to an open issue (#10).

Here is my source markdown file (with a YAML header):

---
title: Distance travelled
type: mcq
author: Firas Moosvi
source: original
tags:
- kinematics
- test
---
# File Title

Introduction to the problem

## Question 1 Text

Some text here

### Answer 1 Section

Some answers here

## Question 2 Text

Some other text here

### Answer 2 Section

Some other answers here

I would like to parse this markdown file and create four different strings:

YAML Header

yaml = """---
title: Distance travelled
type: mcq
author: Firas Moosvi
source: original
tags:
- kinematics
- test
---"""

Heading

heading = """# File Title

Introduction to the problem"""

Block 1 and 2

block1 = """## Question 1 Text

Some text here

### Answer 1 Section

Some answers here"""

and

block2 = """## Question 2 Text

Some other text here

### Answer 2 Section

Some other answers here"""

Is this possible with markdown-it-py? I don't want to render it to HTML and then convert that back to markdown, which seems like an unnecessary extra step.

I've created a minimal working example in a Jupyter Notebook here: https://github.com/firasm/mwe_md-it-py/blob/main/example.ipynb.

The crux of my issue is that I have the tokens, and I can create a SyntaxTreeNode, but I eventually need to turn those into objects that I could export. Even if I can't read in the YAML header, parsing the rest of the markdown would still be useful.

welcome bot commented May 7, 2021

Thanks for opening your first issue here! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.

If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).

Welcome to the EBP community! 🎉

hukkin commented May 7, 2021

I think this does what you want. Note that I never ran this so expect some debugging. This also doesn't include support for extra syntax like the frontmatter extension.

from markdown_it import MarkdownIt
from mdformat.renderer import MDRenderer

your_markdown = "# Heading\nParagraph\n"

mdit = MarkdownIt()
env = {}
tokens = mdit.parse(your_markdown, env)

tokens_parts = ...  # Split the token stream here

for part in tokens_parts:
    rendered_part = MDRenderer().render(part, mdit.options, env)
    print(rendered_part)
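
Since the snippet above has no front matter support, one option is to peel the YAML header off the raw text before parsing. The following is a minimal sketch using only the standard library; `split_front_matter` is a hypothetical helper written for illustration and is not part of markdown-it-py or mdformat. It assumes the header starts on the very first line and is delimited by bare `---` lines.

```python
def split_front_matter(text):
    """Return (front_matter, body); front_matter is "" if none is found."""
    lines = text.splitlines(keepends=True)
    # Front matter must start on the very first line with a bare '---'.
    if not lines or lines[0].strip() != "---":
        return "", text
    # Look for the closing '---' line.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "".join(lines[: i + 1]).rstrip("\n")
            body = "".join(lines[i + 1 :])
            return front, body
    # No closing delimiter: treat everything as body.
    return "", text


source = (
    "---\n"
    "title: Distance travelled\n"
    "type: mcq\n"
    "---\n"
    "# File Title\n"
    "\n"
    "Introduction to the problem\n"
)
yaml_header, body = split_front_matter(source)
print(yaml_header)
print(body)
```

Only `body` would then be passed to `mdit.parse`, so the header lines are not tokenized as a thematic break, paragraph, and list.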

firasm commented May 8, 2021

Thanks for the assist!

This is what ended up working:

tokens_parts = [tokens[0:17],tokens[17:23],tokens[23:35],tokens[35:]]

md_files = []

for part in tokens_parts:
    rendered_part = MDRenderer().render(part, mdit.options, env)
    md_files.append(rendered_part)
    #print(rendered_part)

I had to split the tokens manually, but there must be a way to split them algorithmically (something like tokens.headings(level=2)[0] for everything nested within the first level 2 heading, and tokens.headings(level=2)[1] for everything nested within the second level 2 heading).

Will keep looking through the docs and code to find how to do this.

P.S. If this is a useful example, I'm happy to also include this as part of the documentation.
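
One way the manual slicing above could be automated is to walk the flat token list and start a new part at every opening heading of a chosen tag. This is only a sketch: `split_at_headings` is a hypothetical helper, not an existing markdown-it-py API, and the demo uses stand-in objects that mimic just the `type` and `tag` attributes of real tokens so that it runs without the library installed.

```python
from types import SimpleNamespace


def split_at_headings(tokens, tags=("h1", "h2")):
    """Split a flat token list into parts, starting a new part at each
    opening heading token whose tag is in `tags`."""
    parts = [[]]
    for token in tokens:
        if token.type == "heading_open" and token.tag in tags:
            parts.append([])
        parts[-1].append(token)
    # Drop the empty leading part if the stream starts with a heading.
    return [part for part in parts if part]


# Stand-in tokens (assumption: only `type` and `tag` are needed here).
def tok(type_, tag=""):
    return SimpleNamespace(type=type_, tag=tag)


def heading(tag):
    return [tok("heading_open", tag), tok("inline"), tok("heading_close", tag)]


def paragraph():
    return [tok("paragraph_open", "p"), tok("inline"), tok("paragraph_close", "p")]


tokens = (
    heading("h1") + paragraph()
    + heading("h2") + paragraph() + heading("h3") + paragraph()
    + heading("h2") + paragraph() + heading("h3") + paragraph()
)
parts = split_at_headings(tokens)
print([len(part) for part in parts])  # → [6, 12, 12]
```

With real tokens from `mdit.parse(...)`, each resulting part could be passed to `MDRenderer().render(part, mdit.options, env)` as in the loop above. Headings nested inside block quotes or list items would also trigger a split; additionally checking `token.level == 0` would restrict the split to top-level headings.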

firasm commented May 8, 2021

Managed to brute-force it like this:

parts = {'header': [],
         'h1':[],
         'h2':[]}

for x,t in enumerate(tokens):
    
    if t.tag == 'hr':
        parts['header'].append(x)
    elif t.tag == 'h1' and t.nesting == 1:
        parts['h1'].append(x)
    elif t.tag == 'h2': # and t.nesting == 1:
        parts['h2'].append(x)
        
    print(t,'\n')

Output:

{'header': [0, 16], 'h1': [17], 'h2': [23, 25, 35, 37]}
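
The indices in that output can be turned back into the slices used for the manual split. A minimal sketch, assuming the dictionary layout shown above; `slices_from_indices` is a hypothetical helper name. Because the nesting check on `h2` was commented out, that list holds both the opening and closing heading indices, so only every other entry is used.

```python
def slices_from_indices(parts):
    """Turn the index dictionary above into one slice per document part."""
    # 'h2' alternates heading_open / heading_close indices; keep the opens.
    h2_opens = parts["h2"][::2]
    starts = [parts["header"][0]] + parts["h1"] + h2_opens
    ends = starts[1:] + [None]  # None means "to the end of the list"
    return [slice(start, end) for start, end in zip(starts, ends)]


parts = {"header": [0, 16], "h1": [17], "h2": [23, 25, 35, 37]}
slices = slices_from_indices(parts)
print(slices)
```

Applying these with `tokens_parts = [tokens[s] for s in slices]` should reproduce the manual split `tokens[0:17]`, `tokens[17:23]`, `tokens[23:35]`, `tokens[35:]` from the earlier comment.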

@raymondsryang

I have the same issue. It would be good if there were a document describing how to do this.
