What is your question?
Data read from files is often not UTF-8 encoded. In such cases, pandas raises a clear UTF-8 decoding error unless the encoding is specified (via the `encoding` argument to `read_csv`).
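The error pandas reports comes from strict UTF-8 decoding. The stdlib-only sketch below reproduces the same failure with Python's built-in codec, and shows that naming the real encoding fixes it. The sample bytes are made up for illustration: "café" encoded as ISO-8859-1, where "é" is the single byte `0xE9`.

```python
# Hypothetical sample: "café" encoded as ISO-8859-1 (Latin-1); "é" becomes the single byte 0xE9.
latin1_bytes = "café".encode("iso-8859-1")

def decode_strict_utf8(data: bytes) -> str:
    """Decode as UTF-8, raising UnicodeDecodeError on any invalid byte sequence."""
    return data.decode("utf-8")  # errors="strict" is the default

try:
    decode_strict_utf8(latin1_bytes)
except UnicodeDecodeError as exc:
    # 0xE9 starts a 3-byte UTF-8 sequence, but no continuation bytes follow it.
    print(f"UTF-8 decoding failed: {exc}")

# Decoding with the encoding the data was actually written in succeeds.
print(latin1_bytes.decode("iso-8859-1"))  # café
```

In pandas the equivalent fix is passing the encoding to the reader, e.g. `pd.read_csv(path, encoding="iso-8859-1")`.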
cuDF's `read_csv`, on the other hand, reads the same data without raising any errors. I can run operations such as `groupby` on the resulting DataFrame, but printing it or converting it to pandas raises the UTF-8 errors.
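Since cuDF's reader does not take an encoding, one possible workaround is to transcode the file to UTF-8 on the host before reading it. This is a minimal sketch that assumes the source encoding is known to be ISO-8859-1; the helper name `transcode_to_utf8` and the file names are hypothetical, and the call to `cudf.read_csv` is only indicated in a comment, not run.

```python
import os
import shutil
import tempfile

def transcode_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Hypothetical helper: re-encode a text file from src_encoding to UTF-8."""
    # newline="" leaves the original line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, so large files are fine

# Usage with a throwaway ISO-8859-1 file.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "latin1.csv")
    dst_file = os.path.join(tmp, "utf8.csv")
    with open(src_file, "wb") as f:
        f.write("name,count\ncafé,1\n".encode("iso-8859-1"))
    transcode_to_utf8(src_file, dst_file)
    with open(dst_file, "rb") as f:
        utf8_bytes = f.read()
    # At this point dst_file could be passed to cudf.read_csv (not run here).

print(utf8_bytes.decode("utf-8"))
```

The trade-off is an extra pass over the data on the CPU, which may matter for large files, and it only works when the source encoding is actually known.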
Would it make sense to raise an error when we encounter non-UTF-8 character sets (such as ISO-8859-1)?
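Raising an error amounts to validating the raw bytes before, or while, parsing them. The host-side sketch below uses a hypothetical `ensure_utf8` helper that reports the offset of the first invalid byte. It is not cuDF's API, and a real implementation in cuDF would presumably run this check on the GPU.

```python
def ensure_utf8(data: bytes, source: str = "<input>") -> None:
    """Hypothetical check: raise ValueError if `data` is not valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first invalid sequence.
        raise ValueError(
            f"{source} is not valid UTF-8 (invalid byte 0x{data[exc.start]:02x} "
            f"at offset {exc.start}); re-encode it to UTF-8 before reading"
        ) from exc

ensure_utf8("café".encode("utf-8"))  # valid UTF-8: passes silently

try:
    ensure_utf8("café".encode("iso-8859-1"), source="latin1.csv")
except ValueError as err:
    print(err)
```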
Alternatively, since cuDF does not allow specifying an encoding, would it make more sense to keep the current behavior: let users operate on DataFrames and raise errors only when printing, converting to strings, converting to pandas, etc.?
Another option could be to raise a warning that the data contains non-UTF-8 characters (if detecting that is even possible) and let everything else run as usual.
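The warning variant could look like the host-side sketch below: check the bytes, emit a `UnicodeWarning` if they are not valid UTF-8, and leave the data untouched so processing continues. The function name `warn_if_not_utf8` is hypothetical, and the choice of `UnicodeWarning` as the category is an assumption.

```python
import warnings

def warn_if_not_utf8(data: bytes, source: str = "<input>") -> bool:
    """Hypothetical check: warn (instead of raising) if `data` is not valid UTF-8.

    Returns True when the data is valid UTF-8; the bytes are left unchanged either way.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError as exc:
        warnings.warn(
            f"{source} contains non-UTF-8 bytes (first at offset {exc.start}); "
            "printing or converting to pandas may fail later",
            UnicodeWarning,
            stacklevel=2,
        )
        return False

# Record the warning instead of printing it to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = warn_if_not_utf8("café".encode("iso-8859-1"), source="latin1.csv")

print(ok, [str(w.message) for w in caught])
```

Compared with raising, this keeps existing workflows running, but the failure still surfaces later at print or conversion time, which is the behavior the issue describes.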
kkraus14 changed the title from "[QST] Handling non utf-8 charsets during io operations" to "[FEA] Throw exception on non utf-8 charsets during io operations" on Jul 29, 2019.