TypeError when converting ZTF Lightcurves from hipscat -> hats #458

Closed
troyraen opened this issue Dec 14, 2024 · 3 comments · Fixed by astronomy-commons/hats#447
Labels: bug (Something isn't working)
@troyraen (Collaborator):

Bug report

I ran into a TypeError when following the "Converting from HiPSCat" directions with the ZTF DR20 Lightcurves dataset. The error occurs when hats calls healpix.radec2pix, which in turn calls cdshealpix.lonlat_to_healpix.
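
A minimal sketch that I'd expect to reproduce the failure outside the pipeline (assuming, as the traceback suggests, that the float32 dtype of the coordinate columns survives the Longitude/Latitude wrapping; the coordinate values here are made up):

import numpy as np
from astropy.coordinates import Latitude, Longitude
from cdshealpix.nested import lonlat_to_healpix

# float32 coordinates, matching the objra/objdec columns in this dataset
ra = np.array([10.0, 20.0], dtype=np.float32)
dec = np.array([-5.0, 5.0], dtype=np.float32)

# Raises TypeError: argument 'lon': 'ndarray' object cannot be
# converted to 'PyArray<T, D>' -- the Rust binding expects float64
lonlat_to_healpix(Longitude(ra, unit="deg"), Latitude(dec, unit="deg"), 29)

# Casting to float64 first appears to avoid the error
pix = lonlat_to_healpix(
    Longitude(ra.astype(np.float64), unit="deg"),
    Latitude(dec.astype(np.float64), unit="deg"),
    29,
)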

Environment:

$ python --version
Python 3.12.7
$
$ pip freeze | grep hats
hats==0.4.5
hats-import==0.4.3

Full traceback:

2024-12-08 05:41:51,160 - distributed.worker.memory - WARNING - Worker is at 80% memory usage. Pausing worker.  Process memory: 2.81 GiB -- Worker memory limit: 3.50 GiB
2024-12-08 05:41:51,667 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 2.48 GiB -- Worker memory limit: 3.50 GiB
  worker address: tcp://127.0.0.1:44061
argument 'lon': 'ndarray' object cannot be converted to 'PyArray<T, D>'
2024-12-08 05:41:51,897 - distributed.worker.state_machine - WARNING - Async instruction for <Task cancelled name="execute('_convert_partition_file-dc2498719e37576e2a6124882e2cc83d')" coro=<Worker.execute() done, defined at /home/traen/anaconda3/envs/hats312/lib/python3.12/site-packages/distributed/worker_state_machine.py:3609>> ended with CancelledError
2024-12-08 05:41:51,898 - distributed.worker.state_machine - WARNING - Async instruction for <Task cancelled name="execute('_convert_partition_file-3056bf0c326d73055de8e25dc6075e1c')" coro=<Worker.execute() done, defined at /home/traen/anaconda3/envs/hats312/lib/python3.12/site-packages/distributed/worker_state_machine.py:3609>> ended with CancelledError
2024-12-08 05:41:51,899 - distributed.worker.state_machine - WARNING - Async instruction for <Task cancelled name="execute('_convert_partition_file-a2b7e055171791bb77ded708948824ab')" coro=<Worker.execute() done, defined at /home/traen/anaconda3/envs/hats312/lib/python3.12/site-packages/distributed/worker_state_machine.py:3609>> ended with CancelledError
2024-12-08 05:41:51,899 - distributed.worker.state_machine - WARNING - Async instruction for <Task cancelled name="execute('_convert_partition_file-416f54321eff974bc675b988f0d45ec5')" coro=<Worker.execute() done, defined at /home/traen/anaconda3/envs/hats312/lib/python3.12/site-packages/distributed/worker_state_machine.py:3609>> ended with CancelledError
2024-12-08 05:41:51,903 - distributed.worker.state_machine - WARNING - Async instruction for <Task cancelled name="execute('_convert_partition_file-4857d6f4d63a01eea8ccd166c3a79b7d')" coro=<Worker.execute() done, defined at /home/traen/anaconda3/envs/hats312/lib/python3.12/site-packages/distributed/worker_state_machine.py:3609>> ended with CancelledError
2024-12-08 05:41:51,904 - distributed.worker.state_machine - WARNING - Async instruction for <Task cancelled name="execute('_convert_partition_file-9b6a54e3566fe60102edf05cbc7769b5')" coro=<Worker.execute() done, defined at /home/traen/anaconda3/envs/hats312/lib/python3.12/site-packages/distributed/worker_state_machine.py:3609>> ended with CancelledError
2024-12-08 05:41:51,906 - distributed.worker.state_machine - WARNING - Async instruction for <Task cancelled name="execute('_convert_partition_file-306185ba71118e9d75bcd74c9cd17569')" coro=<Worker.execute() done, defined at /home/traen/anaconda3/envs/hats312/lib/python3.12/site-packages/distributed/worker_state_machine.py:3609>> ended with CancelledError
  worker address: tcp://127.0.0.1:44217
argument 'lon': 'ndarray' object cannot be converted to 'PyArray<T, D>'
  worker address: tcp://127.0.0.1:36887
argument 'lon': 'ndarray' object cannot be converted to 'PyArray<T, D>'
  worker address: tcp://127.0.0.1:46057
argument 'lon': 'ndarray' object cannot be converted to 'PyArray<T, D>'
  worker address: tcp://127.0.0.1:35783
argument 'lon': 'ndarray' object cannot be converted to 'PyArray<T, D>'
  worker address: tcp://127.0.0.1:43521
argument 'lon': 'ndarray' object cannot be converted to 'PyArray<T, D>'
2024-12-08 05:41:55,895 - distributed.nanny - WARNING - Worker process still alive after 4.0 seconds, killing
2024-12-08 05:41:55,896 - distributed.nanny - WARNING - Worker process still alive after 4.0 seconds, killing
2024-12-08 05:41:55,896 - distributed.nanny - WARNING - Worker process still alive after 4.0 seconds, killing
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File <timed eval>:1

File ~/anaconda3/envs/hats312/lib/python3.12/site-packages/hats_import/pipeline.py:32, in pipeline(args)
     26 """Pipeline that creates its own client from the provided runtime arguments"""
     27 with Client(
     28     local_directory=args.dask_tmp,
     29     n_workers=args.dask_n_workers,
     30     threads_per_worker=args.dask_threads_per_worker,
     31 ) as client:
---> 32     pipeline_with_client(args, client)

File ~/anaconda3/envs/hats312/lib/python3.12/site-packages/hats_import/pipeline.py:64, in pipeline_with_client(args, client)
     62 if args.completion_email_address:
     63     _send_failure_email(args, exception)
---> 64 raise exception

File ~/anaconda3/envs/hats312/lib/python3.12/site-packages/hats_import/pipeline.py:56, in pipeline_with_client(args, client)
     54     verification_runner.run(args)
     55 elif isinstance(args, ConversionArguments):
---> 56     conversion_runner.run(args, client)
     57 else:
     58     raise ValueError("unknown args type")

File ~/anaconda3/envs/hats312/lib/python3.12/site-packages/hats_import/hipscat_conversion/run_conversion.py:88, in run(args, client)
     80 for future in print_progress(
     81     as_completed(futures),
     82     stage_name="Converting Parquet",
   (...)
     85     simple_progress_bar=args.simple_progress_bar,
     86 ):
     87     if future.status == "error":
---> 88         raise future.exception()
     90 with print_progress(
     91     total=4,
     92     stage_name="Finishing",
     93     use_progress_bar=args.progress_bar,
     94     simple_progress_bar=args.simple_progress_bar,
     95 ) as step_progress:
     96     total_rows = parquet_metadata.write_parquet_metadata(args.catalog_path)

File ~/anaconda3/envs/hats312/lib/python3.12/site-packages/distributed/worker.py:3005, in apply_function_simple()
   3000 with (
   3001     context_meter.meter("thread-noncpu", func=time) as m,
   3002     context_meter.meter("thread-cpu", func=thread_time),
   3003 ):
   3004     try:
-> 3005         result = function(*args, **kwargs)
   3006     except (SystemExit, KeyboardInterrupt):
   3007         # Special-case these, just like asyncio does all over the place. They will
   3008         # pass through `fail_hard` and `_handle_stimulus_from_task`, and eventually
   (...)
   3011         # Any other `BaseException` types would ultimately be ignored by asyncio if
   3012         # raised here, after messing up the worker state machine along their way.
   3013         raise

File ~/anaconda3/envs/hats312/lib/python3.12/site-packages/hats_import/hipscat_conversion/run_conversion.py:155, in _convert_partition_file()
    153     pass
    154 dask_print(exception)
--> 155 raise exception

File ~/anaconda3/envs/hats312/lib/python3.12/site-packages/hats_import/hipscat_conversion/run_conversion.py:133, in _convert_partition_file()
    124 table = pq.read_table(input_file, schema=schema)
    125 num_rows = len(table)
    127 table = (
    128     table.drop_columns(["_hipscat_index", "Norder", "Dir", "Npix"])
    129     .add_column(
    130         0,
    131         "_healpix_29",
    132         [
--> 133             healpix.radec2pix(
    134                 29,
    135                 table[ra_column].to_numpy(),
    136                 table[dec_column].to_numpy(),
    137             )
    138         ],
    139     )
    140     .append_column("Norder", [np.full(num_rows, fill_value=pixel.order, dtype=np.int8)])
    141     .append_column("Dir", [np.full(num_rows, fill_value=pixel.dir, dtype=np.int64)])
    142     .append_column("Npix", [np.full(num_rows, fill_value=pixel.pixel, dtype=np.int64)])
    143 )
    144 table = table.replace_schema_metadata()
    146 destination_file = paths.pixel_catalog_file(args.catalog_path, pixel)

File ~/anaconda3/envs/hats312/lib/python3.12/site-packages/hats/pixel_math/healpix_shim.py:68, in radec2pix()
     65 ra = Longitude(ra, unit="deg")
     66 dec = Latitude(dec, unit="deg")
---> 68 return cdshealpix.lonlat_to_healpix(ra, dec, order).astype(np.int64)

File ~/anaconda3/envs/hats312/lib/python3.12/site-packages/cdshealpix/nested/healpix.py:134, in lonlat_to_healpix()
    132 depth = depth.astype(np.uint8)
    133 num_threads = np.uint16(num_threads)
--> 134 cdshealpix.lonlat_to_healpix(depth, lon, lat, ipix, dx, dy, num_threads)
    136 if return_offsets:
    137     return ipix, dx, dy

TypeError: argument 'lon': 'ndarray' object cannot be converted to 'PyArray<T, D>'

This does not seem related to the dataset schema, but FWIW, the schema is:

_hipscat_index: uint64
objectid: int64
filterid: int8
fieldid: int16
rcid: int8
objra: float
objdec: float
nepochs: int64
hmjd: double
mag: float
magerr: float
clrcoeff: float
catflags: int32
Norder: uint8
Dir: uint64
Npix: uint64
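
For clarity, in Parquet/Arrow terms float is the 32-bit type (double is 64-bit), so objra and objdec come back from to_numpy() as float32. A quick illustrative check with pyarrow:

import pyarrow as pa

# Parquet/Arrow "float" maps to numpy float32
col = pa.array([10.0, 20.0], type=pa.float32())
print(col.to_numpy().dtype)  # float32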

Have you seen this error before and/or do you have any suggestions?

It would be nice to convert this 10 TB dataset to HATS format to test #428 at scale, but it is not crucial -- we're not planning to release this version of ZTF Lightcurves, and I have smaller datasets I can use for testing.

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, information about my environment, and any applicable data others will need to reproduce the problem.
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.
troyraen added the bug label Dec 14, 2024
nevencaplar moved this to Todo in HATS / LSDB Dec 15, 2024
@hombit (Contributor) commented Dec 16, 2024:

I agree that it's an issue, but I find it a bit strange that float32 was chosen for RA/Dec in the first place. For RA ~ 360, the resolution would be approximately 6'', which is significantly coarser than both the ZTF astrometry precision (<0.2'') and the image FWHM (~2'').

@troyraen (Collaborator, Author):

Hmm, that's interesting. Those are the same types as in the mission-produced Parquet files (unchanged when converted to hipscat).

@hombit (Contributor) commented Dec 19, 2024:

@troyraen Sorry, my math was wrong; it is actually ~0.1'' resolution, which is totally fine.
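
For reference, that figure can be checked with numpy's ULP spacing (a quick sketch, taking the spacing between adjacent float32 values near RA = 360 deg as the resolution):

import numpy as np

# spacing between adjacent float32 values near 360 degrees
ulp_deg = np.spacing(np.float32(360.0))  # ~3.05e-05 deg
print(ulp_deg * 3600)  # ~0.11 arcsec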
