Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to tiff crate #17

Merged
merged 2 commits into from
Sep 7, 2024

Conversation

gschulze
Copy link
Collaborator

@gschulze gschulze commented Sep 3, 2024

This PR migrates everything to the tiff crate, doing significant simplification and cleanup along the way.

  • The TIFF struct has been renamed to GeoTIFF. The functionality in reader.rs and lowlevel.rs has been removed; reading TIFF data is accomplished through the read function in the struct implementation, which directly calls the functionality provided by the tiff crate.
  • Obtaining sample values is provided by the get_value_at<T>(x: usize, y: usize, sample: usize) function. The desired output type is specified by the generic parameter T. The sample parameter has been added to the function signature to allow retrieving data from multi-channel TIFFs. The function panics if the value is not representable as the desired output type or the requested sample is out of range.

This refactoring aims to reduce the code to the bare minimum needed to provide the functionality covered by the integration tests. The next step is to apply the correct coordinate transformation defined by the corresponding GeoTIFF tags.

@gschulze gschulze force-pushed the feature/migrate-to-tiff-crate branch 2 times, most recently from 580fbd7 to 10676f9 Compare September 3, 2024 12:26
@kylebarron
Copy link
Member

kylebarron commented Sep 3, 2024

This refactoring aims to reduce the code to the bare minimum needed to provide the functionality covered by the integration tests. The next step is to apply the correct coordinate transformation defined by the corresponding GeoTIFF tags.

In case you want a reference for parsing the geokey tags, I have an implementation here: https://github.com/developmentseed/aiocogeo-rs/blob/main/src/geo_key_directory.rs (it doesn't use the tiff crate for parsing the raw tags)

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gschulze for pulling this together! Just some minor comments for now, mostly related to variable names (yes, naming is hard), and a panic! that could probably be removed.

I'll be doing a Rust pair programming session with a colleague later, and might try to convince him to go though this PR so there might be a few more comments coming!

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated
Comment on lines 76 to 83
if sample >= *num_samples {
panic!(
"sample out of bounds: the number of samples is {} but the sample is {}",
num_samples, sample
)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we raise an error (e.g. TiffError::LimitsExceeded) here instead of panicking?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was similar to accessing an array element, which also panics if the index is out of bounds. Also, returning a result might make it more cumbersome to use as a library consumer. What should be checked in addition is whether x and y are less than width and height, respectively. However, this will change again if we integrate coordinate transformations.

If we decide on returning a result from this function, I'd vote for having a separate GeoTiffError type. As far as I understand, the TiffError deals with errors related to encoding/decoding, whereas here, we are handling a potential programming error.

Copy link
Member

@weiji14 weiji14 Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was similar to accessing an array element, which also panics if the index is out of bounds.

A panic can happen if you access using data[index], but not if you use data.get(index) which returns an Option (see https://doc.rust-lang.org/std/vec/struct.Vec.html#indexing and https://doc.rust-lang.org/std/primitive.slice.html#method.get.

What should be checked in addition is whether x and y are less than width and height, respectively. However, this will change again if we integrate coordinate transformations.

Yes, ok with leaving x > width and y > height unchecked for now, and do it after coordinate transforms are implemented.

If we decide on returning a result from this function, I'd vote for having a separate GeoTiffError type. As far as I understand, the TiffError deals with errors related to encoding/decoding, whereas here, we are handling a potential programming error.

Don't want us to get sidetracked with implementing GeoTIFFError and GeoTIFFResult (though I'm sure that will come at some point). For now, I'm happy to compromise on keeping the panic here since it matches previous behaviour, and we can refactor later to return a Result instead of panic.

Edit: there's actually an issue at #6 that mentions panicking, which we could use to track this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just got distracted by the looks of GeoTIFFError and GeoTIFFResult. I think it should be written GeoTiffError and GeoTiffResult, according to the Rust naming conventions. Consequently, I believe the type in the crate should also be called GeoTiff.

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
src/lib.rs Outdated
Comment on lines 31 to 36
pub struct GeoTIFF {
pub dimensions: (usize, usize),
pub num_samples: usize,
image_data: ImageData,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought I had was whether we should just expose the tiff Decoder, and have the attributes like dimensions, num_channels, etc be accessible via methods only. Thoughts on this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that Decoder<R> consumes an underlying stream of type R, I think this would prevent us from implementing write support in the future.

For instance, I have a use case where I need to create a new, empty GeoTIFF, which is then populated with data. Finally, I want to save it to a file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I've been thinking in read-only mode for now, but you're correct that we can't put in Decoder<R> if implementing write support later.

For instance, I have a use case where I need to create a new, empty GeoTIFF, which is then populated with data. Finally, I want to save it to a file.

Secretly, I would love to see this implemented soon, mostly just so we can write some roundtrip read/write tests and not have to store sample GeoTIFF files in this repo!

@gschulze gschulze force-pushed the feature/migrate-to-tiff-crate branch from 10676f9 to 1d3b66d Compare September 6, 2024 07:47
@gschulze gschulze force-pushed the feature/migrate-to-tiff-crate branch from 1d3b66d to 9d74e3d Compare September 6, 2024 07:54
@gschulze
Copy link
Collaborator Author

gschulze commented Sep 6, 2024

Hi @weiji14, I've incorporated your feedback and force-pushed the branch. I've renamed the GeoTIFFtype to GeoTiff, to follow the Rust naming conventions, if you don't mind.

I wonder whether we should rename ImageData to RasterData to align more closely with raster_width and raster_height. What do you think?

Besides that, I think the PR is ready for merging.

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @weiji14, I've incorporated your feedback and force-pushed the branch. I've renamed the GeoTIFFtype to GeoTiff, to follow the Rust naming conventions, if you don't mind.

Awesome, thanks @gschulze! I do need to check https://rust-lang.github.io/api-guidelines/naming.html more often 😄 Will merge this in after you do the ImageData -> RasterData rename (probably after the weekend).

On the force-pushing, I'm a little against it as a reviewer because it undos all the Viewed checkboxes I tick in the 'Files Changed' tab, and somewhat forces me to have to carefully look over everything again if I'm checking things locally on my diff viewer or on GitHub mobile (it's a little better on the GitHub webpage where I can click on 'Compare'). Also, I tend to squash merge so many little commits don't matter, but if you prefer regular merges or rebase+merge, I can go with that since you've only got one commit here.

Comment on lines +18 to +20
assert_eq!(geotiff.get_value_at::<u8>(761, 599, 0), 147);
assert_eq!(geotiff.get_value_at::<u8>(761, 599, 1), 128);
assert_eq!(geotiff.get_value_at::<u8>(761, 599, 2), 165);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went and double-checked this in QGIS, values are correct!

image

use std::fmt;
use std::fmt::{Debug, Formatter};

pub(super) enum ImageData {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we should rename ImageData to RasterData to align more closely with raster_width and raster_height. What do you think?

Yep, ok with renaming to RasterData.

I had a half-thought about marking this enum as non_exhaustive in case we need to support complex numbers later (CInt16, CInt32, CFloat32 and CFloat64 are listed at https://gdal.org/en/latest/drivers/raster/gtiff.html#gtiff-geotiff-file-format), but https://docs.rs/tiff/0.9.1/tiff/decoder/enum.DecodingResult.html doesn't list those yet so nevermind for now.

@gschulze
Copy link
Collaborator Author

gschulze commented Sep 7, 2024

On the force-pushing, I'm a little against it as a reviewer because it undos all the Viewed checkboxes I tick in the 'Files Changed' tab, and somewhat forces me to have to carefully look over everything again if I'm checking things locally on my diff viewer or on GitHub mobile (it's a little better on the GitHub webpage where I can click on 'Compare'). Also, I tend to squash merge so many little commits don't matter, but if you prefer regular merges or rebase+merge, I can go with that since you've only got one commit here.

Understandable. Just tell me the workflow you prefer. I usually try to do commits that are somehow semantically complete. That's why I try to avoid "Typo"-level commits. I'm okay with you squashing them together later.

@gschulze
Copy link
Collaborator Author

gschulze commented Sep 7, 2024

@weiji14 Renamed 'ImageData' to 'RasterData'

@weiji14 weiji14 merged commit 03b2b6a into georust:main Sep 7, 2024
1 check passed
@weiji14 weiji14 added this to the v0.1.0 milestone Sep 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants