-
I'm researching ways of writing parsers for languages such as AsciiDoc or Markdown. Even though I loved Prolog back in the day, I never used it for parsing, and until recently, I had forgotten that one of its original purposes was to write parsers. I was reminded of this, and doing some research, I thought Prolog and DCGs would let me write parsers for say, AsciiDoc in a mostly declarative way (neither AsciiDoc nor Markdown are CFG, and AsciiDoc has particularly complex structure1). The README for Scryer seems to indicate that string handling is particularly efficient, so I'm thinking about starting my experiments on Prolog parsing with it. My idea would be to create a framework that allows me to write a grammar in Prolog, then write some scaffolding that applies that grammar to a file, then outputs a JSON AST of the parsed document. Probably I would add a post-processing step to annotate the AST with line/column information2. Ideally, I think I'd like to write a tool that lets you do something like:
Which loads up the rules in ... Is this a bad/crazy idea? Would this be reasonably implementable with Scryer as it exists today? Footnotes
|
Beta Was this translation helpful? Give feedback.
Replies: 4 comments 19 replies
-
¡Hola Alex! I think you're the same person who asked on Reddit a similar question. I believe that it was already shown that it is possible to write parsers for more-than-CFG grammars. Also, remember that DCGs are just an abstraction that gets translated to standard Prolog code. Normal Prolog code is Turing-complete so any parser can be written in Prolog, even though some of them will be trickier. As for myself, I've written several parsers in Prolog (some of them in Scryer) and for me, it's way better than the Lex/Yacc workflow. It's very easy to compose rules, add easy checks and more advanced checks (require calls to Prolog code). However, it's not perfect. A problem I have found in some of my grammars is that while indeed the first parsing solution is what I want, there are some other solutions left. Be careful with leaving choicepoints as in larger samples, the parser might seem like it succeeded but in reality, it doesn't. Other problem you might have is that while theoretically, the order of some stuff doesn't matter, in practice it does matter and it can make the parsing much faster/slower. Two of my most complex grammars are Teruel, which is a Jinja like template language: https://github.com/aarroyoc/teruel/ and MIPSie, a MIPS-assembly emulator: https://github.com/aarroyoc/mipsie/. Use them as inspiration if you want but the code quality is still not there yet. Finally, why Scryer Prolog? Scryer treats strings very efficiently and makes writing DCGs for them very easily. Other systems also treat strings somehow efficiently but they're an opaque type and they loose the properties of reasoning about them as lists. Scryer is one of the few systems that use an internal representation of strings that is compact and efficient and at the same time, exposes to the program the strings as mere lists of characters (atoms). Just my two cents |
Beta Was this translation helpful? Give feedback.
-
For the moment, I think the answer is "yes". I got Scryer Prolog to parse a simple grammar and spit a JSON, and I think https://github.com/rla/prolog-markdown is proof enough. I'll have to conduct more experiments, though. |
Beta Was this translation helpful? Give feedback.
-
An update. https://github.com/alexpdp7/prolog-parsing/blob/main/asciidoc_poc.pro This has the "minor" issues that the plunit tests don't work under Scryer, and that Scryer doesn't have I've moved to SWI because the tooling is a bit more complete, but if I spend more time on this I'll definitely fix the small compat issues and do some benchmarking! |
Beta Was this translation helpful? Give feedback.
-
If we're talking about general parsing tools, I think tree-sitter would be a good way to interface with a wide variety of pre-built grammars. Ideally you'd have a prolog program that takes an evaluated tree sitter grammar (a JSON document) and produces a prolog DCG that you can use with phrase/3 that can convert text into an AST according to the grammar. I'm not sure where to start with this but I suspect that it would be a great way to interact with an existing large community. |
Beta Was this translation helpful? Give feedback.
An update.
https://github.com/alexpdp7/prolog-parsing/blob/main/asciidoc_poc.pro
This has the "minor" issues that the plunit tests don't work under Scryer, and that Scryer doesn't have
flatten/2
(but then, I shouldn't be using it- I'm lazy), but that's a parser for a minimal "hard" subset of AsciiDoc- and my only experience with Prolog until recently was a semester in University (not writing parsers) like 20 years ago.I've moved to SWI because the tooling is a bit more complete, but if I spend more time on this I'll definitely fix the small compat issues and do some benchmarking!