-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Projections and read velocity #40
Comments
Hi Pablo! Reading buffer Projections and re-projections |
I am sending here links to the 'paged' reader.... it reads blocks instead of bytes and allows for random access:
About projections... I've implemented this code to reproject your array structures: https://github.com/poblaciones/poblaciones/blob/db66311c00877a1399c190e8427d8f597df095b3/services/src/classes/Projections.php, It could serve as an example of converting outside you library. However, for 50k polygons, it takes a while to process... that's why I would have liked to have a way to inject the "ProjectPoint" function at some point in the loading process, to avoid rebuilding the whole object tree just to make the coordinates mapping work (i.e. to have a hook/callback somewhere you would still have your library standalone, which is a great feature... maybe the reader could have an optional 'translate' function? I guess all points that enters the loading process pass through "public function addPoint(Point $Point)" in multipoint... but i am not 100% sure about that). |
I've just give a very quick look at it (tomorrow and the day after I'll be away, so I won't be able to properly dig into it until the weekend, sorry). First thought are that some kind of ShapefileFileReaderInterface is needed on my side. Your About projections, I haven't looked at it too much, but why have you gone for this complex route? Wouldn't it be much easier and clean to rely on public PS: A quick thought out of the box. If ultimate performance is what you're after, probably a quick pass through SpatiaLite is your best bet. Read the shapefile with my library, feed geometries as WKT as provided by the library to an in-memory SQLite database ( |
Alright, I've been thinking about the first point and I believe the best and most elegant approach is indeed adding a You can write your own implementation and pass the initalized objects to About the second point: as I wrote before, projections and re-projections are definitely out of the scope of this library. Trying to integrate anything within the library would open a huge can of worms, starting with proper PRJ file parsing (different shapefile generators are producing some wild files out there...). PS: In case you absolutely wanted to stick with your implementation of |
Hi! Sorry for the delay. A FileInterface seems like a nice/low-impact solution. It could be great to make a few tests about what is a good default buffer size for paged reading. About the ways to avoid an overhead for projections (and get into a more concise user code), may be to add to all get* methods (getArray, getJson, etc) an optional parameter for translations/projections representing a callback (a "callable", such as in usort (https://www.php.net/manual/es/function.uasort.php).... such as: (changes in bold)
where projectionFn would be a callback that receives a point (or x, y, z) and returms the point transformed (projected, with an offset, or wathever transformation it could require... Another similar strategy would be to add a ->Transform method to all geometries, that may receive a similar callback and could iterate through internal point info applying the call to the callback for all points..... |
No problem, I had time to code and deploy that $ShapefileReader = new ShapefileReader([
Shapefile::FILE_SHP => new VirtualFileReader('path/to/file.shp', false),
Shapefile::FILE_SHX => new VirtualFileReader('path/to/file.shx', false),
Shapefile::FILE_DBF => new VirtualFileReader('path/to/file.dbf', false),
Shapefile::FILE_PRJ => new VirtualFileReader('path/to/file.prj', false),
Shapefile::FILE_CPG => new VirtualFileReader('path/to/file.cpg', false),
], [
/* Other options here */
]); You can use master branch to perform some tests, I'll release it with v.3.5.0 as soon as we're finished with everything here ;) By the way, I've given a look at your Now, onto the projection topic (maybe 2 separate issues would have helped keeping things cleaner? No big deal though). Forget about the first comment, it turns out it was exactly what you were doing already ( Your latest suggestions about adding a parameter to output methods or even some kind of Have you already thought about the pros and cons of both scenarios? I mean:
|
Thanks for the isopen fix. True about better with two tickets =) I did not try the sqlite strategy mainly because even if it would be performant, I am afraid it would add yet more apis to the code, and i think that may reduce maintainability to future developers... and even when i care about performance, it's not the main goal of the app to import or transform files... so I try to have same balance there. I believe "Adding a transform method to Geometry abstract class" may lead into mover flexible user code... if someone wants to apply two or more transformations, he could make them in different calls. With the getArray/getWkt thing, everything should be done at once... and (second problem) it mixes the export abilities to the transformation ones (e.g. if someone would want to open a shapefile, reproject and save to a shapefile again [a reasonable user case] it me be in trouble with this get* methods, while Transform method would be more than ok to so....). If you would add interfaces to a ->Transform(ITransformer transformer) method, it may be ok to distinguish between IPointTransformers and IGeometryTransformers... (projections would be IPointTransforms, as one don't care about the whole shapes or rings... just points).... If your are interested in examples other than projections, all css-transform cases may apply... (ej. scale the geometry, move the geometry)... Although I thinks projections would be the most common need in a shapefile library.... |
Hi there! So, I've been thinking about it, and probably the simplest approach is the best one, meaning a trivial namespace Shapefile\Geometry;
interface PointTranslator
{
/**
* Translates a Point.
*
* @param \Shapefile\Geometry\Point $Point
*
* @return \Shapefile\Geometry\Point
*/
public function translate(Point $Point);
} Final usage would be something like: The idea is that any geometry translation would come down to a simple point coordinates translation, thus let the libraray handle everything under the hoods while everything external translators need to know is how to translate a point. Now, onto the less obvious things: By the way, what about the |
The PointTranslator seems great. About the PagedFileReader, it would be nice if it could load by default for reading files. I made benchmarks, selecting different shapefiles. Below is the table showing how large the dbf and shp files were, the # of rows and how many lines per second the library fetched. In all tests the file was most probably in memory by the OS (windows), because I triggered a full read before the test to have that effect "controlled". I think that is realistic for web scenarios (i.e. when uploading a zip with shapefile and expand, or upload a file as attachment, php writes to disk and the OS may keep that in filesystem's caches). Additionally, I would have had to restart the computer before each benchmark to test otherwise, and I did not wanted to =) The main result is that there is no benefit on my code. When I first implemented the reader, I measured improvements while importing the file in the my local (dev) webserver. However, running the tests on a sample of different files, from a shell of my windows desktop pc (16gb ram) and in a shared linux hosting, both lead to better results in the raw fread() implementation. After such results, I would abandon the IFile interface, unless you would want to run further tests before discarding it. |
Pero, no habíamos quedado en que con ese buffer se podía leer 2 veces más rápido? Si fuera malo, ahora sería un buen momento para soltarte un mira que te lo dije... jejejeje Honestly, these results don't suprise me too much: when I added the writing buffer feature to But worry not, I like this Now, onto the |
Jaja... sí. Apparently the previous improvement was due to specific features of the file I was using. After all, that's why benchmarking with many different type of files makes sense. It seems ok to think carefully about mutating or not the points. Hope you get into something elegant for the code inside and outside the library =) |
Hello guys! To add my 2 cents in this topic - |
Hi there. I let these features hanging around (actually, I committed the changes for I will try to finalise it by the end of the month or so. @DarkSide666 Using the |
Yeah, that's exactly what I need - legit ability to do that myself by defining my own translator (with your defined interface) and passing it to your library. |
Hi, it's me again. I have two pieces of code to share with you... it's not a merge request, as I prefer to discuss first if you are interested:
first one is about performance. we have exchanged messages about write performance a while ago. I am now implementing uploading shapefiles in my open source geo app (https://github.com/poblaciones/poblaciones). I've added 'virtualization' to freads by implementing a class that exposes fread, fseek and fopen but internally reads data by blocks of 40kb. It's only for reading.
projections: it takes little code to re-project the entities using proj4php after reading a shapefile file, and projections can be "the problem" of importing shapefiles. I know it's not in the basics of your library, but as a code example, or as a method (geopandas has a method ->tocrs() for it's shape collections) i think it would be nice to have it crafted in your code. I have implemented a way to do it after reading the shapefile... i could send it to you too.
Regards, Pablo.
The text was updated successfully, but these errors were encountered: