[FEA] Throw exception on non utf-8 charsets during io operations #2407

Open
ayushdg opened this issue Jul 26, 2019 · 0 comments
Labels
cuIO cuIO issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments

@ayushdg
Member

ayushdg commented Jul 26, 2019

What is your question?
Files being read are often not UTF-8 encoded. In that situation pandas raises a clear UTF-8 decoding error unless the encoding is specified.
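The kind of decoding failure described here can be reproduced with the Python standard library alone. This is a minimal sketch, assuming an illustrative sample string encoded as ISO-8859-1; the sample data and variable names are not from the original report.

```python
# Illustrative sample: "café" encoded as ISO-8859-1 (Latin-1) -> b'caf\xe9'.
raw = "caf\u00e9".encode("iso-8859-1")

# Decoding as UTF-8 fails: 0xE9 opens a multi-byte sequence that is
# never completed.
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    error_position = exc.start  # byte offset of the offending byte

# Decoding with the correct encoding succeeds.
text = raw.decode("iso-8859-1")
```

Naming the right encoding is what makes the read succeed, which is why a reader with no encoding option has to either reject such input or pass the bytes through.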

cudf's read_csv, on the other hand, reads the data without raising any error. I can run operations such as groupby on the resulting DataFrame, but trying to print it or convert it to pandas raises the UTF-8 errors.
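One way such a symptom could arise is that grouping only compares raw bytes, while printing has to decode them. The following is a plain-Python analogy under that assumption, not a description of cuDF internals; the rows and the `render` helper are hypothetical.

```python
from collections import Counter

# Illustrative column values held as raw ISO-8859-1 bytes ("Zürich", "Oslo").
rows = [b"Z\xfcrich", b"Z\xfcrich", b"Oslo"]

# Counting by key only compares bytes, so it works on non-UTF-8 data.
counts = Counter(rows)


def render(keys):
    """Hypothetical display step: decode every key as UTF-8."""
    return [k.decode("utf-8") for k in keys]


# Rendering the keys as text requires decoding, and that is where it fails.
try:
    render(counts)
    render_failed = False
except UnicodeDecodeError:
    render_failed = True
```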

Would it make sense to throw an error if we encounter non-UTF-8 charsets (such as ISO-8859-1)?
Alternatively, since cudf does not allow specifying an encoding, would it make more sense to keep the current behavior, letting users run operations on DataFrames and raising errors only when printing or converting to strings, pandas, etc.?
Another option could be to raise some sort of warning that the data contains non-UTF-8 characters (if that's even possible) and let everything run as usual.
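The "raise" and "warn" options above can be sketched as a host-side pre-check that runs before the file is handed to a reader. The helper `check_utf8` and its `on_error` and `chunk_size` parameters are hypothetical names introduced for illustration; this is not a cudf API, only a sketch of the two behaviours using the standard library.

```python
import codecs
import warnings


def check_utf8(path, on_error="raise", chunk_size=1 << 20):
    """Hypothetical pre-check: return True if the file is valid UTF-8.

    on_error="raise"  -> raise ValueError on the first invalid sequence
    on_error="warn"   -> emit a UnicodeWarning and return False
    on_error="ignore" -> return False silently
    """
    if on_error not in ("raise", "warn", "ignore"):
        raise ValueError(f"unknown on_error value: {on_error!r}")

    # An incremental decoder copes with multi-byte sequences that are
    # split across chunk boundaries.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        with open(path, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                decoder.decode(chunk)
        # Flush: an incomplete trailing sequence is also an error.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError as exc:
        msg = f"{path} contains non-UTF-8 bytes: {exc.reason}"
        if on_error == "raise":
            raise ValueError(msg) from exc
        if on_error == "warn":
            warnings.warn(msg, UnicodeWarning)
        return False
    return True
```

A caller could run `check_utf8("data.csv", on_error="warn")` before reading and decide whether to continue. The trade-off is an extra pass over the file on the host, which may or may not be acceptable for large inputs.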

@ayushdg ayushdg added Needs Triage Need team to review and classify question Further information is requested labels Jul 26, 2019
@kkraus14 kkraus14 added cuIO cuIO issue and removed Needs Triage Need team to review and classify labels Jul 29, 2019
@kkraus14 kkraus14 added feature request New feature or request and removed question Further information is requested labels Jul 29, 2019
@kkraus14 kkraus14 changed the title [QST] Handling non utf-8 charsets during io operations [FEA] Throw exception on non utf-8 charsets during io operations Jul 29, 2019
@GregoryKimball GregoryKimball added the libcudf Affects libcudf (C++/CUDA) code. label Apr 2, 2023
3 participants