It's currently quite difficult to implement visualization for a given const.str entry, because the partition contains several possible character encodings and does not specify which encoding is used for a particular element. Instead, the encoding is implied by the context in which the string is referenced: for example, src.word can reference a const.str entry as a wide character literal, while TextOffset contextually means a UTF-8 null-terminated identifier.
I think it would be a significant improvement if there were either a partition per encoding (e.g., const.utf-8-str, const.utf-16-str, etc.) or an encoding specified per line. This would also make it harder for folks -- such as myself :) -- to misinterpret the encoding of the contents of the partition.
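To make the ambiguity concrete, here is a minimal sketch of how the same bytes read differently depending on the assumed encoding. The byte layout (a little-endian UTF-16 string with a two-byte terminator) and the names `kPool`, `narrow_length`, and `utf16le_length` are invented for illustration; they are not taken from the IFC specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical pool contents: "hi" as UTF-16LE followed by a 0x0000 terminator.
// This layout is an assumption for illustration, not the documented const.str format.
inline const unsigned char kPool[] = {'h', 0, 'i', 0, 0, 0};

// Length in bytes when the entry is read as a null-terminated narrow (UTF-8) string.
inline std::size_t narrow_length(const unsigned char* pool, std::size_t offset) {
    return std::strlen(reinterpret_cast<const char*>(pool + offset));
}

// Length in 16-bit code units when the entry is read as null-terminated UTF-16LE.
inline std::size_t utf16le_length(const unsigned char* pool, std::size_t offset) {
    std::size_t n = 0;
    while (pool[offset + 2 * n] != 0 || pool[offset + 2 * n + 1] != 0) {
        ++n;
    }
    return n;
}
```

Read as a narrow string, the entry at offset 0 ends at the first zero byte and has length 1; read as UTF-16LE it has 2 code units. A visualizer that only sees the offset cannot tell which reading is intended.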
I agree that the partition const.str has morphed over time to serve more generally as backing store for all character strings with the encoding moving into the entry of the character string descriptor. It makes sense to reconsider that and introduce dedicated partitions:
- const.str.utf-8
- const.str.utf-16
- const.str.utf-32
That would probably mean replacing the uses of TextOffset for string literals with a StrIndex with sorts:
StrSort::Utf8
StrSort::Utf16
StrSort::Utf32
TextOffset would be an index into const.str.utf-8 since all names are internalized as UTF-8 identifiers.
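As a rough sketch of what such a sorted index could look like, the following packs a sort tag and a per-partition index into one 32-bit value. The names StrIndex and StrSort come from the proposal above; the bit layout (2 tag bits, 30 index bits) and the helper functions `make_str_index`, `sort_of`, `index_of`, and `partition_name` are assumptions made for illustration, not part of the specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// One sort per proposed partition.
enum class StrSort : std::uint8_t { Utf8 = 0, Utf16 = 1, Utf32 = 2 };

// Hypothetical packed representation: low 2 bits hold the sort,
// the remaining 30 bits hold the index into that sort's partition.
struct StrIndex {
    std::uint32_t bits;
};

constexpr unsigned kSortBits = 2;
constexpr std::uint32_t kSortMask = (1u << kSortBits) - 1;

constexpr StrIndex make_str_index(StrSort sort, std::uint32_t index) {
    assert(index < (1u << (32 - kSortBits)));  // index must fit in 30 bits
    return StrIndex{(index << kSortBits) | static_cast<std::uint32_t>(sort)};
}

constexpr StrSort sort_of(StrIndex i) {
    return static_cast<StrSort>(i.bits & kSortMask);
}

constexpr std::uint32_t index_of(StrIndex i) {
    return i.bits >> kSortBits;
}

// Maps a sort to the partition name suggested in this thread.
constexpr std::string_view partition_name(StrSort sort) {
    switch (sort) {
        case StrSort::Utf8:  return "const.str.utf-8";
        case StrSort::Utf16: return "const.str.utf-16";
        case StrSort::Utf32: return "const.str.utf-32";
    }
    return "";
}
```

With something along these lines, a reader of the file can recover the encoding from the index itself (`partition_name(sort_of(i))`) rather than from the context in which the string is referenced.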