Support: Markdown to Markdown (round trip) #164

firasm · 2021-05-07T04:38:15Z

I've been looking for a markdown parser and I think I've found what I'm looking for in markdown-it-py. Being a big fan of Jupyterbook, I figure it's best to stay in the family. Thanks for creating this!

I'm having some difficulty trying to do something very specific - I have a feeling this is related to an open issue (#10).

Here is my source markdown file (with a YAML header):

---
title: Distance travelled
type: mcq
author: Firas Moosvi
source: original
tags:
- kinematics
- test
---
# File Title

Introduction to the problem

## Question 1 Text

Some text here

### Answer 1 Section

Some answers here

## Question 2 Text

Some other text here

### Answer 2 Section

Some other answers here

I would like to parse this markdown file and create four different strings:

YAML Header

yaml = """---
title: Distance travelled
type: mcq
author: Firas Moosvi
source: original
tags:
- kinematics
- test
---"""

Heading

heading = """# File Title

Introduction to the problem"""

Block 1 and 2

block1 = """## Question 1 Text

Some text here

### Answer 1 Section

Some answers here"""

and

block2 = """## Question 2 Text

Some other text here

### Answer 2 Section

Some other answers here"""

Is this possible with markdown-it-py? I don't want to render it to HTML and then convert that back to markdown, which seems like an unnecessary extra step.

I've created a minimal working example in a Jupyter Notebook here: https://github.com/firasm/mwe_md-it-py/blob/main/example.ipynb.

The crux of my issue is that I have the tokens, and I can create a SyntaxTreeNode, but I eventually need to turn those into objects that I could export. Even if I can't read in the YAML header, parsing the rest of the markdown would still be useful.

welcome · 2021-05-07T04:38:16Z

Thanks for opening your first issue here! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.

If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).

Welcome to the EBP community! 🎉

hukkin · 2021-05-07T23:25:49Z

I think this does what you want. Note that I never ran this so expect some debugging. This also doesn't include support for extra syntax like the frontmatter extension.

from markdown_it import MarkdownIt
from mdformat.renderer import MDRenderer

your_markdown = "# Heading\nParagraph\n"

mdit = MarkdownIt()
env = {}
tokens = mdit.parse(your_markdown, env)

tokens_parts = ...  # Split the token stream here

for part in tokens_parts:
    rendered_part = MDRenderer().render(part, mdit.options, env)
    print(rendered_part)
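
Since the snippet above does not handle front matter, one option is to peel the YAML header off as plain text before parsing. This is a minimal sketch, assuming the header is delimited by `---` lines at the very top of the file. `split_front_matter` is a hypothetical helper written for illustration, not part of markdown-it-py or mdformat.

```python
def split_front_matter(text):
    """Return (front_matter, body); front_matter is "" when no header is found."""
    lines = text.splitlines(keepends=True)
    # Assumption: a header only counts if the very first line is '---'.
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Keep both delimiters in the header, drop the trailing newline.
            front_matter = "".join(lines[: i + 1]).rstrip("\n")
            body = "".join(lines[i + 1 :])
            return front_matter, body
    # Opening delimiter without a closing one: treat everything as body.
    return "", text


source = """---
title: Distance travelled
type: mcq
---
# File Title

Introduction to the problem
"""
yaml_header, body = split_front_matter(source)
print(yaml_header)
print(body)
```

The remaining `body` string can then go through `mdit.parse` as above, so the `---` lines never reach the parser.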

firasm · 2021-05-08T01:51:06Z

Thanks for the assist!

This is what ended up working:

tokens_parts = [tokens[0:17],tokens[17:23],tokens[23:35],tokens[35:]]

md_files = []

for part in tokens_parts:
    rendered_part = MDRenderer().render(part, mdit.options, env)
    md_files.append(rendered_part)
    #print(rendered_part)

I had to split the tokens manually, but there must be a way to split them algorithmically (something like tokens.headings(level=2)[0] for everything nested within the first level 2 heading, and tokens.headings(level=2)[1] for everything nested within the second level 2 heading).

Will keep looking through the docs and code to find how to do this.

P.S. If this is a useful example, I'm happy to also include this as part of the documentation.
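
One way the manual split could be automated is to start a new chunk at every top-level heading of a given level. This is a rough sketch, not an existing markdown-it-py API: `split_at_headings` is a hypothetical helper, and it only assumes that tokens expose `type`, `tag`, and `level` the way markdown-it-py's `Token` does. The demo uses stand-in objects so it runs without any third-party package.

```python
from types import SimpleNamespace


def split_at_headings(tokens, level=2):
    """Split a flat token list into chunks that each start at a heading of `level`.

    Tokens before the first such heading form the first chunk.
    """
    tag = f"h{level}"
    chunks, current = [], []
    for token in tokens:
        # Only headings at the document's top level (level 0) start a new chunk.
        starts_section = (
            token.type == "heading_open" and token.tag == tag and token.level == 0
        )
        if starts_section and current:
            chunks.append(current)
            current = []
        current.append(token)
    if current:
        chunks.append(current)
    return chunks


# Stand-in tokens (assumption: only these three attributes are needed).
def tok(type_, tag="", level=0):
    return SimpleNamespace(type=type_, tag=tag, level=level)


def heading(tag):
    return [tok("heading_open", tag), tok("inline", level=1), tok("heading_close", tag)]


def paragraph():
    return [tok("paragraph_open", "p"), tok("inline", level=1), tok("paragraph_close", "p")]


demo_tokens = (
    heading("h1") + paragraph()
    + heading("h2") + paragraph() + heading("h3") + paragraph()
    + heading("h2") + paragraph() + heading("h3") + paragraph()
)
chunks = split_at_headings(demo_tokens, level=2)
print([len(chunk) for chunk in chunks])  # [6, 12, 12]
```

With real input, the list returned by `mdit.parse(your_markdown, env)` can be passed in directly, and each chunk handed to `MDRenderer().render(chunk, mdit.options, env)` as in the loop above.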

firasm · 2021-05-08T05:38:33Z

Managed to brute-force it like this:

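# Record the token indices that mark the boundaries of each part.
parts = {'header': [],
         'h1': [],
         'h2': []}

for x, t in enumerate(tokens):
    if t.tag == 'hr':
        # Without a front matter plugin, the '---' delimiters parse as thematic breaks.
        parts['header'].append(x)
    elif t.tag == 'h1' and t.nesting == 1:
        parts['h1'].append(x)
    elif t.tag == 'h2':  # and t.nesting == 1:
        # Nesting check disabled, so both heading_open and heading_close are recorded.
        parts['h2'].append(x)

    print(t, '\n')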

Output:

{'header': [0, 16], 'h1': [17], 'h2': [23, 25, 35, 37]}
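
The recorded indices can be turned back into the slices that were written by hand earlier. This is a small sketch under two assumptions: the YAML header runs from the first `hr` up to the `h1`, and the `h2` list alternates opening and closing indices, as in the output above. `parts_to_slices` is a hypothetical helper name.

```python
parts = {'header': [0, 16], 'h1': [17], 'h2': [23, 25, 35, 37]}


def parts_to_slices(parts):
    """Build one slice per part: header, h1 section, then each h2 section."""
    # Every other h2 entry is a heading_open index; the rest are heading_close.
    h2_opens = parts['h2'][::2]
    starts = [parts['header'][0], parts['h1'][0], *h2_opens]
    # Each part ends where the next begins; the last runs to the end of the list.
    ends = [*starts[1:], None]
    return [slice(start, end) for start, end in zip(starts, ends)]


slices = parts_to_slices(parts)
print(slices)
```

Each slice can then be applied as `tokens[s]` to get the same four token lists as the manual `tokens_parts` split.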

raymondsryang · 2022-11-02T08:30:01Z

I have the same issue. It would be good to have a document that describes how to do this.

firasm closed this as completed May 11, 2021

hukkin mentioned this issue Jun 18, 2021

♻️ REFACTOR: Move to markdown-it + mdformat renderer executablebooks/rst-to-myst#18

Merged

inktrap mentioned this issue Jun 28, 2022

Markdown Renderer #10

Open
