Skip to content
This repository has been archived by the owner on Oct 5, 2024. It is now read-only.

implement char escapes #20

Merged
merged 9 commits into from
Jul 4, 2019
Merged

Conversation

aaronjanse
Copy link
Contributor

fix #11

This is based on the Elm Parser example code (which has a bug btw). I wish I found it earlier.

I wrote the PR such that we can later reuse the escape-handling code for literalString.

8. <span id="f8"></span> Comprehensive tests missing; not tracked yet
9. <span id="f9"></span> Multiline strings (and maybe more) missing; not tracked yet
6. <span id="f6"></span> Comprehensive tests missing; not tracked yet
7. <span id="f7"></span> Multiline strings (and maybe more) missing; not tracked yet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next time please don't bother shifting the numbers, life's too short for that :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phew 👍

src="https://avatars0.githubusercontent.com/u/16829510">
</br>
<a href="https://github.com/aaronjanse">Aaron Janse</a>
</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

TODO Unicode escapes
-}
-- for literalChar and, in the future, literalString
stringHelp =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably rather name this character or insideQuotes or something. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

character is great because it parses one character, even if that character is represented by multiple bytes (e.g. \u{123}) 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by 483bf8a

-- for literalChar and, in the future, literalString
stringHelp =
P.oneOf
[ P.succeed (identity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably unneeded parens for the identity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by 657fa28

, P.map (\_ -> '\'') (P.token (P.Token "'" ExpectingEscapeCharacter))
, P.map (\_ -> '\n') (P.token (P.Token "n" ExpectingEscapeCharacter))
, P.map (\_ -> '\t') (P.token (P.Token "t" ExpectingEscapeCharacter))
, P.map (\_ -> '\r') (P.token (P.Token "r" ExpectingEscapeCharacter))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still thinking about these ParserProblems. The problem with this current solution is that the error messages will only be able to show five times the same thing ("I expected escape character, escape character, ...")

Please parameterize this one (| ExpectingEscapeCharacter Char) and give these usages of it the chars they're really about. That should be enough and not bloat the ParserProblem type so much. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like below?

The above behavior would interpret \x as a backslash followed by a separate x.
The below would throw an error if it sees \x.

 |= P.oneOf
-    [ P.map (\_ -> '\"') (P.token (P.Token "\"" ExpectingEscapeCharacter))
-    , P.map (\_ -> '\'') (P.token (P.Token "'"  ExpectingEscapeCharacter))
-    , P.map (\_ -> '\n') (P.token (P.Token "n"  ExpectingEscapeCharacter))
-    , P.map (\_ -> '\t') (P.token (P.Token "t"  ExpectingEscapeCharacter))
-    , P.map (\_ -> '\r') (P.token (P.Token "r"  ExpectingEscapeCharacter))
+    [ P.map (\_ -> '\"') (Parser.token "\""))
+    , P.map (\_ -> '\'') (Parser.token "'"))
+    , P.map (\_ -> '\n') (Parser.token "n"))
+    , P.map (\_ -> '\t') (Parser.token "t"))
+    , P.map (\_ -> '\r') (Parser.token "r"))
     , P.succeed identity
         |. P.token (P.Token "u" ExpectingEscapeCharacter)
         |. P.token (P.Token "{" ExpectingUnicodeEscapeLeftBrace)
         |= unicode
         |. P.token (P.Token "}" ExpectingUnicodeEscapeRightBrace)
+    , P.problem ExpectingValidEscapeChar
     ]

Please parameterize this one (| ExpectingEscapeCharacter Char) and give these usages of it the chars they're really about. That should be enough and not bloat the ParserProblem type so much. WDYT?

The problem is that I don't think in either solutions, both the current one in the PR and the diff above, the compiler would print out one of those token errors. Rather, it would either skip the oneOf (current PR code) or print ExpectingValidEscapeChar (diff above).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current PR state: https://ellie-app.com/5Z9tkd2hGSQa1
my suggestion: https://ellie-app.com/5Z9vStSJv72a1
The changes are on lines 31-37, and 77
I think that's enough! The ParserProblems will tell you what characters did it expect ➡️ we can write a nice error message.

If you think this still overlooks some corner case, please shout! :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, my bad!
Fixed by 7c495f7

string
|> String.uncons
|> Maybe.map (Tuple.first >> P.succeed)
|> Maybe.withDefault (P.problem (CompilerBug "Multiple characters chomped in `literalChar`"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message is not entirely correct, we're not inside literalChar. If we want to reuse this in Strings too. it would mislead the user seeing the message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by 483bf8a

|> Maybe.map (Tuple.first >> Char >> P.succeed)
|> Maybe.withDefault (P.problem (CompilerBug "Multiple characters chomped in `literalChar`"))
)
|> P.map (\n -> Char n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|> P.map Char

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by b7a3cf4

Thank you, @Janiczek!



addHex : Char -> Int -> Int
addHex char total =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use Hex.fromString instead of this. Also, it is buggy wrt. characters outside the hex range I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by 09fa46c

codeToChar str =
let
length = String.length str
code = String.foldl addHex 0 str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's check for the length, that's great, but let's use Hex.fromString for the actual "is this a hex string" and "what int value does it represent" functionality

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by 09fa46c

, "'\\\"'"
, Ok (Literal (Char '"')) -- "
) -- ^ workaround for official elm
-- vscode syntax highlighter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙈 OK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, I'm happy to remove it.
Without it, however, the coloring is wrong for the next ~17 lines and a "parser error" prevents vscode from showing more helpful errors :-/

https://github.com/Krzysztof-Cieslak/vscode-elm/issues/244

@Janiczek
Copy link
Contributor

Janiczek commented Jul 3, 2019

Sorry for so many comments! The parsers are great, I'm just nitpicking a lot. Great job nevertheless! 🎉
Let's work through the comments and I can merge this.

(BTW we added elm-format on the master branch so at the end run it with make format please, otherwise Travis will yell)

@aaronjanse
Copy link
Contributor Author

Thank you for the comments. It's a nice codebase, and, like you, I want to keep it that way :-)

I'm busy at the moment but I'll try to go through the comments soon (maybe tonight).

@aaronjanse
Copy link
Contributor Author

(whoa the new GitHub merge conflict UI is awesome!)

@aaronjanse
Copy link
Contributor Author

I think this is ready for another review @Janiczek 😉

@Janiczek
Copy link
Contributor

Janiczek commented Jul 4, 2019

Just one comment re ParserProblems, otherwise this is ready for merge! :)

@aaronjanse
Copy link
Contributor Author

Just one comment re ParserProblems, otherwise this is ready for merge! :)

Great. Fixed by 7c495f7

@Janiczek Janiczek merged commit 016b286 into elm-in-elm:master Jul 4, 2019
@Janiczek
Copy link
Contributor

Janiczek commented Jul 4, 2019

Thanks @aaronjanse!

@aaronjanse aaronjanse deleted the char-escapes branch July 4, 2019 20:30
Warry pushed a commit to Warry/compiler that referenced this pull request Jul 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write comprehensive parser tests for characters
2 participants