[FEA] Throw exception on non utf-8 charsets during io operations #2407

ayushdg · 2019-07-26T19:49:11Z

What is your question?
Often when reading from files, the data is not utf-8 encoded. In such situations pandas throws a clear utf-8 decoding error unless the encoding is specified.

cuDF's read_csv, on the other hand, reads the data without throwing any errors. I can run operations like groupby on my DataFrame, but trying to print it or convert it to pandas throws the utf-8 decoding errors.

Would it make sense to throw an error if we encounter non utf-8 charsets (like iso-8859-1 encoding)?
Or, since cuDF does not allow specifying an encoding, would it make more sense to keep the current behavior: let users run operations on DataFrames, and throw errors only when printing, converting to strings, converting to pandas, etc.?
Another option could be to raise some sort of warning mentioning that the data contains non utf-8 chars (if that's even possible) and let everything run as usual.
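
To make the first and third options concrete, here is a minimal sketch of a pre-read check done on the host with the Python standard library only. It is not cuDF's API or implementation: `check_utf8`, its `on_error` parameter, and `chunk_size` are hypothetical names made up for illustration. It streams the file through an incremental UTF-8 decoder, so multi-byte characters split across chunk boundaries are handled, and either raises or warns when invalid bytes are found.

```python
import codecs
import warnings


def check_utf8(path, on_error="raise", chunk_size=1 << 20):
    """Return True if the file at `path` decodes cleanly as UTF-8.

    Hypothetical helper for illustration; not part of cuDF or pandas.
    on_error="raise": propagate the UnicodeDecodeError (the "throw" option).
    on_error="warn":  emit a UserWarning and return False (the "warn" option).
    """
    if on_error not in ("raise", "warn"):
        raise ValueError("on_error must be 'raise' or 'warn'")

    # The incremental decoder buffers incomplete multi-byte sequences
    # between calls, so chunk boundaries do not cause false errors.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                decoder.decode(chunk)
        # final=True flags a truncated multi-byte sequence at end of file.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError as exc:
        if on_error == "raise":
            raise
        warnings.warn(f"{path} contains non-UTF-8 bytes: {exc}")
        return False
    return True
```

A caller could run `check_utf8(path)` before handing the same path to read_csv. The second option (keep the current behavior and fail only on print or conversion) needs no extra check at all, which is its main appeal; the cost of the sketch above is one additional pass over the file on the CPU.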


ayushdg added the Needs Triage and question labels Jul 26, 2019

kkraus14 added the cuIO label and removed the Needs Triage label Jul 29, 2019

kkraus14 added the feature request label and removed the question label Jul 29, 2019

kkraus14 changed the title ~~[QST] Handling non utf-8 charsets during io operations~~ [FEA] Throw exception on non utf-8 charsets during io operations Jul 29, 2019

GregoryKimball mentioned this issue Jun 27, 2022

[FEA] CuDF additional encoding support #2957

Closed

davidwendt mentioned this issue Jan 9, 2023

[BUG] read_csv() got an unexpected keyword argument 'encoding' #12412

Open

GregoryKimball added the libcudf label Apr 2, 2023
