Can't do certain queries on a DataFrame prior to LSDB import #552

Open
2 of 3 tasks
gitosaurus opened this issue Jan 28, 2025 · 0 comments
Labels
bug Something isn't working

@gitosaurus · Contributor

Bug report

When importing a pandas DataFrame into LSDB with lsdb.from_dataframe, some queries must be deferred until after the import; applying them to the DataFrame beforehand produces an error.

import json
from pathlib import Path
import pandas as pd
import lsdb

tns_path = Path('./tns/tns-all.json')
with tns_path.open('rb') as f_in:
    tns_objs = json.load(f_in)
tns_df = pd.DataFrame(tns_objs)
tns_f = tns_df.query("name_prefix == 'AT' and type.isna()")
# Applying this filter before the LSDB import triggers the error below
tns_f = tns_f.query("source_group == 'ZTF'")
tns_db = lsdb.from_dataframe(tns_f, ra_column='ra', dec_column='declination')
tns_db

Error produced:

ValueError: Metadata mismatch found in `from_delayed`.

Partition type: `pandas.core.frame.DataFrame`
+----------------------+---------------+-----------------+
| Column               | Found         | Expected        |
+----------------------+---------------+-----------------+
| 'Class_ADS_bibcodes' | null[pyarrow] | string[pyarrow] |
+----------------------+---------------+-----------------+

If the last query, source_group == 'ZTF', is instead applied after the import into LSDB, it works fine:

# Same imports as in the first snippet (json, Path, pandas, lsdb)
tns_path = Path('./tns/tns-all.json')
with tns_path.open('rb') as f_in:
    tns_objs = json.load(f_in)
tns_df = pd.DataFrame(tns_objs)
tns_df.shape
tns_f = tns_df.query("name_prefix == 'AT' and type.isna()")
tns_db = lsdb.from_dataframe(tns_f, ra_column='ra', dec_column='declination')
# Applying the same filter after the import works
tns_db = tns_db.query("source_group == 'ZTF'")

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, information about my environment, and any applicable data others will need to reproduce the problem.
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.
@gitosaurus gitosaurus added the bug Something isn't working label Jan 28, 2025