UTF-8 .ly files? #58

Closed
riccardove opened this issue Mar 28, 2015 · 6 comments

@riccardove

There may be a problem with .ly files encoded in UTF-8. Processing the file below fails unless I replace "Là" with "La". The file and the error message follow.

\score {
  \new Voice {
    \relative {
      \key cis \major
      \time 3/4
      gis4 cis8. cis16 eis8. eis16 ais4 gis4. eis8 ais4 gis4. eis8 fis8 eis8 dis2
    }
    \addlyrics {
      Là su per le mon -- ta -- gne fra bo -- schi e val -- li d'or
    }
  }
  \layout { }
  \midi { }
}
\version "2.18.2"

MIDI: Parsing MIDI file has ended.

Current .ly source file: /Users/riccardo/Music/Cubase LE AI Elements Projects/LaMontanara-01/ly2video.tmp/sanitised.ly
Traceback (most recent call last):
File "/Users/riccardo/Desktop/Vocaloid/ly2video-0.4.1/ly2video.py", line 1867, in <module>
status = main()
File "/Users/riccardo/Desktop/Vocaloid/ly2video-0.4.1/ly2video.py", line 1826, in main
pitchBends)
File "/Users/riccardo/Desktop/Vocaloid/ly2video-0.4.1/ly2video.py", line 691, in getNoteIndices
grobPitchValue, grobPitchToken = lySrcLocation.getAbsolutePitch()
File "/Users/riccardo/Desktop/Vocaloid/ly2video-0.4.1/ly2video.py", line 148, in getAbsolutePitch
return LySrc.get(self.filename).getAbsolutePitch(self)
File "/Users/riccardo/Desktop/Vocaloid/ly2video-0.4.1/ly2video.py", line 80, in get
cls.cache[filename] = LySrc(filename)
File "/Users/riccardo/Desktop/Vocaloid/ly2video-0.4.1/ly2video.py", line 87, in __init__
self.initParser(document)
File "/Users/riccardo/Desktop/Vocaloid/ly2video-0.4.1/ly2video.py", line 95, in initParser
language, keyPitch = ly.tools.languageAndKey(document)
File "/Users/riccardo/Desktop/Vocaloid/ly2video-0.4.1/ly/tools.py", line 322, in languageAndKey
for token in tokens:
File "/Users/riccardo/Desktop/Vocaloid/ly2video-0.4.1/ly/tokenize.py", line 693, in tokens
for token in super(LineColumnMixin, self).tokens(text, pos):
File "/Users/riccardo/Desktop/Vocaloid/ly2video-0.4.1/ly/tokenize.py", line 218, in tokens
yield self.Unparsed(text[pos:m.start()], pos)
File "/Users/riccardo/Desktop/Vocaloid/ly2video-0.4.1/ly/tokenize.py", line 296, in __new__
obj = unicode.__new__(cls, value)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
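
The last line of the traceback can be reproduced in isolation. This is only a minimal sketch written for Python 3, where the decode has to be spelled out; in the Python 2 traceback above it happens implicitly inside `unicode.__new__`. The variable names are illustrative, and the assumption that the failing slice began with the first byte of "à" is inferred from "position 0" in the message, not confirmed from the tokenizer source.

```python
# "à" is U+00E0; UTF-8 encodes it as the two bytes 0xC3 0xA0.
raw = "L\u00e0".encode("utf-8")
print(raw)  # b'L\xc3\xa0'

# Assumption: the slice handed to the decoder starts right at the "à" bytes.
# The ASCII codec only accepts bytes below 0x80, so it stops at 0xC3.
try:
    raw[1:].decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```

The message names the byte value (0xc3) and its offset within the slice being decoded, not its position in the .ly file, which is why it is hard to map back to the source.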

@aspiers
Owner

aspiers commented Mar 28, 2015

Sounds like you're right :-/

@jonarnoldmusic

Is there a way to tell which character is the offending one?

I don't have any special characters like the above poster mentioned, and I'm still getting this error. File attached as txt.
ly2vidAlto.txt
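
One way to look for the offending character outside ly2video is to scan the file for anything above code point 127. The sketch below is a standalone helper, not part of ly2video; `find_non_ascii` is a hypothetical name. Characters that are hard to spot by eye, such as a byte order mark, a non-breaking space, or typographic quotes, are also non-ASCII and would show up here.

```python
def find_non_ascii(text):
    """Yield (line, column, char) for every non-ASCII character, 1-based."""
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, char in enumerate(line, start=1):
            if ord(char) > 127:
                yield line_no, col_no, char


sample = "\\addlyrics {\n  L\u00e0 su per le mon -- ta -- gne\n}\n"
hits = list(find_non_ascii(sample))
for line_no, col_no, char in hits:
    print("line %d, column %d: %r (U+%04X)" % (line_no, col_no, char, ord(char)))
```

To check a real file, read it with `io.open(path, encoding="utf-8")` and pass the resulting text to the helper.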

@aspiers
Owner

aspiers commented Feb 27, 2016

Without looking closer, I guess it would be necessary to improve the tokenizer so that it correctly reports useful line/column numbers locating the error.
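
As a rough illustration of what such reporting could look like, a `UnicodeDecodeError` carries the byte offset of the failure in its `start` attribute, and that offset can be converted to a line and column. This is a standalone sketch, not the ly2video tokenizer; `locate_decode_error` is a hypothetical helper and the column it returns counts bytes, not characters.

```python
def locate_decode_error(data, encoding="ascii"):
    """Return (line, column, byte) of the first undecodable byte, or None.

    Line and column are 1-based; the column is a byte offset within the line.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        before = data[:exc.start]
        line = before.count(b"\n") + 1
        # rfind returns -1 when there is no earlier newline, so line 1 works too.
        column = exc.start - (before.rfind(b"\n") + 1) + 1
        return line, column, data[exc.start:exc.start + 1]
    return None


source = b"\\key cis \\major\nL\xc3\xa0 su per le\n"
location = locate_decode_error(source)
print(location)
```

Calling the helper with `encoding="utf-8"` on the same bytes returns `None`, since the input is valid UTF-8.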

@aspiers aspiers added the bug label Feb 27, 2016
@aspiers
Owner

aspiers commented Sep 3, 2017

But actually as per #19 we want to ditch the current tokenizer and use the latest python-ly instead.

@klirichek

I've run into the same problem.

Quick workaround:
at the top of tokenize.py, right after the other imports, add:

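# Python 2 only. site.py removes sys.setdefaultencoding() at startup;
# reload(sys) brings it back so the default codec can be changed from ASCII.
import sys
reload(sys)
sys.setdefaultencoding('utf_8')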

It solved the problem for me.
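
The workaround above changes the default codec for the whole process and relies on a Python 2 specific trick. A narrower alternative, sketched here under the assumption that the source file is read in one place, is to decode explicitly as UTF-8 when the file is opened. `read_ly_source` is a hypothetical helper, not a function from ly2video, and this is not necessarily how the issue was fixed in the project.

```python
import io
import os
import tempfile


def read_ly_source(path):
    """Read a LilyPond source file as text, decoding it explicitly as UTF-8."""
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()


# Round-trip demo with a throwaway file.
fd, path = tempfile.mkstemp(suffix=".ly")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(u"L\u00e0 su per le mon -- ta -- gne\n")
    text = read_ly_source(path)
finally:
    os.remove(path)

print(repr(text))
```

`io.open` accepts the `encoding` argument on both Python 2 and Python 3, so the same call works while a codebase is being ported.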

@thawk
Contributor

thawk commented Jan 18, 2020

It seems that using UTF-8 as the default encoding is right.

In LilyPond — Notation Reference v2.18.2 (stable-branch) -> 3.3.3 Special characters -> Text encoding, it says that:

LilyPond uses the character repertoire defined by the Unicode consortium and ISO/IEC 10646. This defines a unique name and code point for the character sets used in virtually all modern languages and many others too. Unicode can be implemented using several different encodings. LilyPond uses the UTF-8 encoding (UTF stands for Unicode Transformation Format) which represents all common Latin characters in one byte, and represents other characters using a variable length format of up to four bytes.
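
The variable-length behaviour described in that passage is easy to check. A small sketch with sample characters chosen here for illustration (they do not come from the LilyPond manual):

```python
# One sample per UTF-8 length: "a", "à" (U+00E0), "€" (U+20AC),
# and the musical G clef symbol (U+1D11E).
samples = ["a", "\u00e0", "\u20ac", "\U0001d11e"]
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    print("U+%04X -> %d byte(s)" % (ord(ch), lengths[ch]))
```

Plain ASCII letters take one byte, while the "à" from the lyrics in this issue takes two, which is where the 0xc3 byte in the error message comes from.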

thawk added a commit to thawk/ly2video that referenced this issue Jan 18, 2020
@aspiers aspiers closed this as completed in 2e905d2 Feb 2, 2020