[BUG] Cannot query dataframes with categorical columns #3961

Open
taureandyernv opened this issue Jan 28, 2020 · 0 comments
Labels: feature request, Python

taureandyernv commented Jan 28, 2020

Describe the bug
If I run a simple query on a categorical column, I get a Numba TypingError that ends with: "This error is usually caused by passing an argument of a type that is unsupported by the named function."

Steps/Code to reproduce bug

import cudf
import pandas as pd

fn = 'test.csv'
lines = """id1,id2
1,45
2,3
3, 7
1, 25
"""
with open(fn, 'w') as fp:
    fp.write(lines)
pdf = pd.read_csv(fn, header=0, dtype={"id1":"category", "id2":"int32"})
cdf = cudf.read_csv(fn, header=0, dtype={"id1":"int32", "id2":"int32"})  # read id1 as int32 and cast below; see #3960 for why
cdf['id1'] = cdf['id1'].astype("category")
pdf.query("id1 == ['1'] and id2 == 45")
cdf.query("id1 == ['1'] and id2 == 45")

The cdf query fails with a long traceback:

---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
<ipython-input-27-28a794912e6e> in <module>
----> 1 cdf2.query("id1 == ['1'] and id2 == 45")

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py in query(self, expr, local_dict)
   2893         }
   2894         # Run query
-> 2895         boolmask = queryutils.query_execute(self, expr, callenv)
   2896 
   2897         selected = Series(boolmask)

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/utils/queryutils.py in query_execute(df, expr, callenv)
    223     # run kernel
    224     args = [out] + colarrays + envargs
--> 225     kernel.forall(nrows)(*args)
    226     out_mask = applyutils.make_aggregate_nullmask(df, columns=columns)
    227     if out_mask is not None:

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/compiler.py in __call__(self, *args)
    264     def __call__(self, *args):
    265         if isinstance(self.kernel, AutoJitCUDAKernel):
--> 266             kernel = self.kernel.specialize(*args)
    267         else:
    268             kernel = self.kernel

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/compiler.py in specialize(self, *args)
    808         argtypes = tuple(
    809             [self.typingctx.resolve_argument_type(a) for a in args])
--> 810         kernel = self.compile(argtypes)
    811         return kernel
    812 

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/compiler.py in compile(self, sig)
    824                 self.targetoptions['link'] = ()
    825             kernel = compile_kernel(self.py_func, argtypes,
--> 826                                     **self.targetoptions)
    827             self.definitions[(cc, argtypes)] = kernel
    828             if self.bind:

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     30         def _acquire_compile_lock(*args, **kwargs):
     31             with self:
---> 32                 return func(*args, **kwargs)
     33         return _acquire_compile_lock
     34 

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/compiler.py in compile_kernel(pyfunc, args, link, debug, inline, fastmath, extensions, max_registers)
     60 def compile_kernel(pyfunc, args, link, debug=False, inline=False,
     61                    fastmath=False, extensions=[], max_registers=None):
---> 62     cres = compile_cuda(pyfunc, types.void, args, debug=debug, inline=inline)
     63     fname = cres.fndesc.llvm_func_name
     64     lib, kernel = cres.target_context.prepare_cuda_kernel(cres.library, fname,

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     30         def _acquire_compile_lock(*args, **kwargs):
     31             with self:
---> 32                 return func(*args, **kwargs)
     33         return _acquire_compile_lock
     34 

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/compiler.py in compile_cuda(pyfunc, return_type, args, debug, inline)
     49                                   return_type=return_type,
     50                                   flags=flags,
---> 51                                   locals={})
     52 
     53     library = cres.library

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler.py in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library, pipeline_class)
    526     pipeline = pipeline_class(typingctx, targetctx, library,
    527                               args, return_type, flags, locals)
--> 528     return pipeline.compile_extra(func)
    529 
    530 

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler.py in compile_extra(self, func)
    324         self.state.lifted = ()
    325         self.state.lifted_from = None
--> 326         return self._compile_bytecode()
    327 
    328     def compile_ir(self, func_ir, lifted=(), lifted_from=None):

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler.py in _compile_bytecode(self)
    383         """
    384         assert self.state.func_ir is None
--> 385         return self._compile_core()
    386 
    387     def _compile_ir(self):

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler.py in _compile_core(self)
    363                 self.state.status.fail_reason = e
    364                 if is_final_pipeline:
--> 365                     raise e
    366         else:
    367             raise CompilerError("All available pipelines exhausted")

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler.py in _compile_core(self)
    354             res = None
    355             try:
--> 356                 pm.run(self.state)
    357                 if self.state.cr is not None:
    358                     break

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler_machinery.py in run(self, state)
    326                     (self.pipeline_name, pass_desc)
    327                 patched_exception = self._patch_error(msg, e)
--> 328                 raise patched_exception
    329 
    330     def dependency_analysis(self):

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler_machinery.py in run(self, state)
    317                 pass_inst = _pass_registry.get(pss).pass_inst
    318                 if isinstance(pass_inst, CompilerPass):
--> 319                     self._runPass(idx, pass_inst, state)
    320                 else:
    321                     raise BaseException("Legacy pass in use")

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     30         def _acquire_compile_lock(*args, **kwargs):
     31             with self:
---> 32                 return func(*args, **kwargs)
     33         return _acquire_compile_lock
     34 

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler_machinery.py in _runPass(self, index, pss, internal_state)
    279             mutated |= check(pss.run_initialization, internal_state)
    280         with SimpleTimer() as pass_time:
--> 281             mutated |= check(pss.run_pass, internal_state)
    282         with SimpleTimer() as finalize_time:
    283             mutated |= check(pss.run_finalizer, internal_state)

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/compiler_machinery.py in check(func, compiler_state)
    266 
    267         def check(func, compiler_state):
--> 268             mangled = func(compiler_state)
    269             if mangled not in (True, False):
    270                 msg = ("CompilerPass implementations should return True/False. "

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/typed_passes.py in run_pass(self, state)
     92                 state.args,
     93                 state.return_type,
---> 94                 state.locals)
     95             state.typemap = typemap
     96             state.return_type = return_type

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/typed_passes.py in type_inference_stage(typingctx, interp, args, return_type, locals)
     64 
     65         infer.build_constraint()
---> 66         infer.propagate()
     67         typemap, restype, calltypes = infer.unify()
     68 

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/typeinfer.py in propagate(self, raise_errors)
    949                                   if isinstance(e, ForceLiteralArg)]
    950                 if not force_lit_args:
--> 951                     raise errors[0]
    952                 else:
    953                     raise reduce(operator.or_, force_lit_args)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7f914147a320>) with argument(s) of type(s): (int32, int32)
 * parameterized
In definition 0:
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<built-in function eq>) with argument(s) of type(s): (int32, list(unicode_type))
Known signatures:
 * (bool, bool) -> bool
 * (int8, int8) -> bool
 * (int16, int16) -> bool
 * (int32, int32) -> bool
 * (int64, int64) -> bool
 * (uint8, uint8) -> bool
 * (uint16, uint16) -> bool
 * (uint32, uint32) -> bool
 * (uint64, uint64) -> bool
 * (float32, float32) -> bool
 * (float64, float64) -> bool
 * (complex64, complex64) -> bool
 * (complex128, complex128) -> bool
 * parameterized
In definition 0:
    All templates rejected with literals.
In definition 1:
    All templates rejected without literals.
In definition 2:
    All templates rejected with literals.
In definition 3:
    All templates rejected without literals.
In definition 4:
    All templates rejected with literals.
In definition 5:
    All templates rejected without literals.
In definition 6:
    All templates rejected with literals.
In definition 7:
    All templates rejected without literals.
In definition 8:
    All templates rejected with literals.
In definition 9:
    All templates rejected without literals.
In definition 10:
    All templates rejected with literals.
In definition 11:
    All templates rejected without literals.
In definition 12:
    All templates rejected with literals.
In definition 13:
    All templates rejected without literals.
In definition 14:
    All templates rejected with literals.
In definition 15:
    All templates rejected without literals.
In definition 16:
    All templates rejected with literals.
In definition 17:
    All templates rejected without literals.
In definition 18:
    All templates rejected with literals.
In definition 19:
    All templates rejected without literals.
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: typing of intrinsic-call at <string> (2)

File "<string>", line 2:
<source missing, REPL/exec in use?>

    raised from /opt/conda/envs/rapids/lib/python3.7/site-packages/numba/typeinfer.py:951
In definition 1:
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<built-in function eq>) with argument(s) of type(s): (int32, list(unicode_type))
Known signatures:
 * (bool, bool) -> bool
 * (int8, int8) -> bool
 * (int16, int16) -> bool
 * (int32, int32) -> bool
 * (int64, int64) -> bool
 * (uint8, uint8) -> bool
 * (uint16, uint16) -> bool
 * (uint32, uint32) -> bool
 * (uint64, uint64) -> bool
 * (float32, float32) -> bool
 * (float64, float64) -> bool
 * (complex64, complex64) -> bool
 * (complex128, complex128) -> bool
 * parameterized
In definition 0:
    All templates rejected with literals.
In definition 1:
    All templates rejected without literals.
In definition 2:
    All templates rejected with literals.
In definition 3:
    All templates rejected without literals.
In definition 4:
    All templates rejected with literals.
In definition 5:
    All templates rejected without literals.
In definition 6:
    All templates rejected with literals.
In definition 7:
    All templates rejected without literals.
In definition 8:
    All templates rejected with literals.
In definition 9:
    All templates rejected without literals.
In definition 10:
    All templates rejected with literals.
In definition 11:
    All templates rejected without literals.
In definition 12:
    All templates rejected with literals.
In definition 13:
    All templates rejected without literals.
In definition 14:
    All templates rejected with literals.
In definition 15:
    All templates rejected without literals.
In definition 16:
    All templates rejected with literals.
In definition 17:
    All templates rejected without literals.
In definition 18:
    All templates rejected with literals.
In definition 19:
    All templates rejected without literals.
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: typing of intrinsic-call at <string> (2)

File "<string>", line 2:
<source missing, REPL/exec in use?>

    raised from /opt/conda/envs/rapids/lib/python3.7/site-packages/numba/typeinfer.py:951
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7f914147a320>)
[2] During: typing of call at <string> (6)


File "<string>", line 6:
<source missing, REPL/exec in use?>

Expected behavior
I expect output similar to that of pdf.query("id1 == ['1'] and id2 == 45"):

id1 id2
1 45
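
One way to picture what supporting this might involve: a categorical column is stored as integer codes plus a table of categories, so a kernel that only accepts numeric types could still evaluate the filter if the literals in the query were mapped to codes first. The sketch below is plain Python and purely illustrative; `encode`, `literals_to_codes`, and `mask_isin_and_eq` are hypothetical names, and nothing here is taken from cuDF or Numba.

```python
# Hypothetical, plain-Python sketch; none of these helpers exist in cuDF.

def encode(values):
    """Dictionary-encode values into (categories, integer codes)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return categories, [index[v] for v in values]


def literals_to_codes(categories, literals):
    """Map query literals to category codes, ignoring literals that are not categories."""
    index = {cat: i for i, cat in enumerate(categories)}
    return {index[lit] for lit in literals if lit in index}


def mask_isin_and_eq(cat_codes, wanted_codes, other, other_value):
    """Row mask for `cat_col in wanted and other == other_value`, using integer comparisons only."""
    return [c in wanted_codes and o == other_value for c, o in zip(cat_codes, other)]


# Same data as the reproducer, with id1 held as string categories.
id1 = ["1", "2", "3", "1"]
id2 = [45, 3, 7, 25]

categories, codes = encode(id1)
wanted = literals_to_codes(categories, ["1"])   # mirrors id1 == ['1']
mask = mask_isin_and_eq(codes, wanted, id2, 45)  # mirrors ... and id2 == 45
selected = [(a, b) for a, b, keep in zip(id1, id2, mask) if keep]
print(selected)
```

The string literal is resolved against the category table once, outside the per-row comparison, so the per-row work only ever compares integers.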

Environment overview (please complete the following information)

  • Environment location: [Docker]
  • Method of cuDF install: [Docker]

Additional context
Converting from cuDF to pandas to run the query also inexplicably fails:

tdf = cdf.to_pandas()
tdf['id1']

will output correctly with

0    1
1    2
2    3
3    1
Name: id1, dtype: category
Categories (3, int64): [1, 2, 3]

but when you run the query...

tdf.query("id1 == ['1'] and id2 == 45")

Outputs an empty table

  id1 id2
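
A possible explanation for the empty result, offered as an assumption rather than a confirmed diagnosis: after the int32 to category cast, the categories are integers (the output above shows `Categories (3, int64): [1, 2, 3]`), while the query compares against the string '1'. The plain-Python analogy below shows the same mismatch; `filter_rows` is a hypothetical helper and does not reproduce pandas internals.

```python
# Plain-Python analogy of the type mismatch; not how pandas evaluates query().

rows = [
    {"id1": 1, "id2": 45},
    {"id1": 2, "id2": 3},
    {"id1": 3, "id2": 7},
    {"id1": 1, "id2": 25},
]


def filter_rows(rows, id1_values, id2_value):
    """Keep rows whose id1 is in id1_values and whose id2 equals id2_value."""
    return [r for r in rows if r["id1"] in id1_values and r["id2"] == id2_value]


with_strings = filter_rows(rows, ["1"], 45)  # mirrors id1 == ['1']: the int 1 never equals the str '1'
with_ints = filter_rows(rows, [1], 45)       # integer literal matches the integer categories

print(with_strings)
print(with_ints)
```

If this is what is happening, querying the converted frame with an integer literal, for example `id1 == [1]`, would be the thing to try.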
taureandyernv added the "Needs Triage" and "bug" labels on Jan 28, 2020
kkraus14 added the "Python" and "feature request" labels and removed the "Needs Triage" and "bug" labels on Jan 28, 2020
galipremsagar self-assigned this on Jun 21, 2020
vyasr added this to cuDF Python on Nov 5, 2024