apacheGH-33854: [MATLAB] Add basic libmexclass integration code to MATLAB interface (apache#34563)

### Rationale for this change

This pull request is a follow-up to [this mailing list discussion](https://lists.apache.org/thread/2kgxbs54dw4wvcwrthzvb1ljqcvnrv7h) about integrating [`mathworks/libmexclass`](https://github.com/mathworks/libmexclass/) with the MATLAB Interface to Arrow code base.

We've spent the last few months building `libmexclass` from scratch to ease development of the MATLAB Interface to Arrow.

`libmexclass` essentially provides a way to connect MATLAB classes with corresponding C++ classes, using an approach inspired by the [Proxy Design Pattern](https://en.wikipedia.org/wiki/Proxy_pattern).

We hope that `libmexclass` will make it easier to build out an object-oriented MATLAB Interface to Arrow memory by wrapping the corresponding Arrow C++ classes and "proxying" method calls on these MATLAB objects to the underlying Arrow C++ objects.

### What changes are included in this PR?

1. Modifications were made to the CMake build system for the MATLAB interface to use `libmexclass` under the hood. This includes a new build flag, `-D MATLAB_ARROW_INTERFACE=ON` (or `OFF`), which toggles building the new code that uses `libmexclass`.
2. To illustrate the basic usage of `libmexclass`, we have added one new MATLAB class, `arrow.array.Float64Array`. This class allows users to construct an Arrow array with logical type `Float64` from a MATLAB `double` array with zero data copies. Under the hood, a `Proxy` wraps the underlying Arrow C++ `Float64Array` object and bounds its lifetime. This `Proxy` is also responsible for delegating method calls on an `arrow.array.Float64Array` to the corresponding Arrow C++ `Float64Array`.

### Are these changes tested?

Yes, these changes have been tested on Linux, macOS, and Windows.

1.
   We've modified the MATLAB CI GitHub Actions workflow (`.github/workflows/matlab.yml`) to build the new `arrow.array.Float64Array` code using `libmexclass`. This includes passing `-D MATLAB_ARROW_INTERFACE=ON` to the `cmake` call in `ci/scripts/matlab_build.sh`.
2. We've added a new basic MATLAB test, `test/arrow/array/tFloat64Array.m`, which tests for successful construction of an `arrow.array.Float64Array`. This [test is passing successfully in the MATLAB CI workflow](https://github.com/mathworks/arrow/actions/runs/4419365852/jobs/7747694543#step:6:50).
3. We've confirmed that the [`Dev` CI workflow linting checks are all passing](https://github.com/mathworks/arrow/actions/runs/4419365845) and that appropriate Apache license headers have been added.
4. We've manually tested creation, deletion, and assignment of multiple `arrow.array.Float64Array` instances on Linux, macOS, and Windows with a variety of different MATLAB `double` arrays.

### Are there any user-facing changes?

Yes, there is now a public class named `arrow.array.Float64Array`, which is added to the MATLAB path. The following example creates two different `arrow.array.Float64Array` objects in MATLAB:

```matlab
>> A = arrow.array.Float64Array([1, 2, 3])

A = 

[
  1,
  2,
  3
]

>> random = arrow.array.Float64Array(rand(1, 10, 100))

random = 

[
  0.6311887342690112,
  0.355073651878849,
  0.9970032716066477,
  0.22417149898312716,
  0.6524510729686149,
  0.6049906419082594,
  0.38724543148313495,
  0.14218715929050407,
  0.025134985710203117,
  0.4211122537652413,
  ...
  0.6228027906591304,
  0.7966246853083961,
  0.74587490154065,
  0.12553623135481973,
  0.8223940067590204,
  0.02515050142850217,
  0.41442888092403163,
  0.7314074679729372,
  0.7813740002759628,
  0.367285915131369
]
```

**Note**: This is an early-stage PR, so the naming scheme `arrow.array.<Type>Array` might change in the future.

### Future Directions

1.
   Currently, the "old" `featherread`/`featherwrite` code is still being built by CMake and installed to the specified `CMAKE_INSTALL_PREFIX`. This slows down the build process and complicates the build system logic. In addition, these Feather functions only support reading and writing a subset of Feather V1 files. We should consider disabling building of this legacy code by default or removing it entirely. In the long term, when we have more Arrow types in MATLAB (e.g. `arrow.Table`, `arrow.Schema`, `arrow.RecordBatch`, etc.), we should consider re-implementing this functionality in terms of the new APIs.
2. We would like to start adding more numeric array classes, like `arrow.array.UInt8Array`, `arrow.array.Int64Array`, etc.
3. We only added one very basic test for `arrow.array.Float64Array` in this pull request. We should add many more tests as the APIs develop, covering things like indexing, copying, slicing, etc.
4. We don't have any documentation for `arrow.array.Float64Array` right now. In general, we should start adding detailed documentation for the new APIs as we implement them.
5. Lots more! This is just the beginning of building out the MATLAB Interface to Arrow APIs. We plan on creating GitHub issues for tracking work as we go.

### Notes

1. Creating `libmexclass` and integrating it with the Arrow code base was a team effort! Thank you to @ sreeharihegden, @ lafiona, @ sgilmore10, @ jhughes-mw, and others at @ MathWorks for their help with this pull request!
2. Closes: apache#33854

Lead-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Fiona La <[email protected]>
Co-authored-by: shegden <[email protected]>
Co-authored-by: Sreehari Hegden <[email protected]>
Co-authored-by: Fiona la <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
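The Proxy approach described under "Rationale for this change" and in item 2 of "What changes are included in this PR?" can be illustrated with a minimal C++ sketch. It does not use `libmexclass` or Arrow C++. `Float64View`, `Float64ArrayProxy`, and the string-keyed `call` dispatch are hypothetical names chosen for illustration, and the real `libmexclass` API may look quite different. The sketch shows the two responsibilities the PR assigns to the `Proxy`. First, it shares ownership of the source data, so wrapping is zero-copy and the wrapped array can never dangle. Second, it forwards method calls, looked up by name, to the wrapped object.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a C++ array class (not Arrow's API). It does
// not own its values: it only points into a buffer owned elsewhere, which
// is what makes wrapping zero-copy.
class Float64View {
 public:
  Float64View(const double* data, std::size_t length)
      : data_(data), length_(length) {}

  std::size_t length() const { return length_; }

  double value(std::size_t i) const {
    if (i >= length_) throw std::out_of_range("index out of range");
    return data_[i];
  }

 private:
  const double* data_;
  std::size_t length_;
};

// Hypothetical proxy (not libmexclass's API). It shares ownership of the
// source buffer, so the view it wraps cannot outlive the data it points
// into, and it forwards named method calls to that view.
class Float64ArrayProxy {
 public:
  using Method = std::function<double(const std::vector<double>&)>;

  explicit Float64ArrayProxy(std::shared_ptr<const std::vector<double>> source)
      : source_(std::move(source)),                 // declared first, so initialized first
        array_(source_->data(), source_->size()) {  // view over the same memory: no copy
    methods_["length"] = [this](const std::vector<double>&) {
      return static_cast<double>(array_.length());
    };
    methods_["value"] = [this](const std::vector<double>& args) {
      if (args.size() != 1 || args[0] < 0) {
        throw std::invalid_argument("value expects one non-negative index");
      }
      return array_.value(static_cast<std::size_t>(args[0]));
    };
  }

  // The lambdas capture `this`, so a copy would call into the wrong object.
  // Declaring the copy operations deleted also suppresses the implicit moves.
  Float64ArrayProxy(const Float64ArrayProxy&) = delete;
  Float64ArrayProxy& operator=(const Float64ArrayProxy&) = delete;

  // Delegate a named method call to the wrapped array.
  double call(const std::string& name, const std::vector<double>& args = {}) const {
    auto it = methods_.find(name);
    if (it == methods_.end()) throw std::invalid_argument("unknown method: " + name);
    return it->second(args);
  }

  // Exposed only so callers can check that no copy was made.
  const double* data() const { return source_->data(); }

 private:
  std::shared_ptr<const std::vector<double>> source_;
  Float64View array_;
  std::map<std::string, Method> methods_;
};
```

For example, you could build a `Float64ArrayProxy` from a `std::shared_ptr` to a `std::vector<double>` holding `1, 2, 3`, then release the caller's own reference. After that, `call("value", {2.0})` still returns `3.0`, and `data()` points at the original buffer. Routing calls through a string-keyed table is one plausible way to dispatch calls that arrive from MATLAB by method name. It is an assumption of this sketch, not something the PR specifies.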
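Item 2 under "Future Directions" plans more numeric array classes, such as `arrow.array.UInt8Array` and `arrow.array.Int64Array`. One way to avoid hand-writing a separate C++ proxy per type is a class template parameterized on the element type. Arrow C++ takes this approach for its own numeric arrays, through the `arrow::NumericArray<T>` template. The sketch below applies the same idea to a proxy. `NumericArrayProxy`, `ArrowTypeName`, and the aliases are hypothetical names, not part of `libmexclass` or Arrow C++. To stay short, the sketch omits name-based method dispatch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait mapping a C++ element type to an Arrow type name.
template <typename T>
struct ArrowTypeName;
template <>
struct ArrowTypeName<std::uint8_t> {
  static const char* get() { return "UInt8"; }
};
template <>
struct ArrowTypeName<std::int64_t> {
  static const char* get() { return "Int64"; }
};
template <>
struct ArrowTypeName<double> {
  static const char* get() { return "Float64"; }
};

// Hypothetical proxy template: one implementation shared by every numeric
// array type, instead of one hand-written proxy per type.
template <typename T>
class NumericArrayProxy {
  static_assert(std::is_arithmetic<T>::value, "numeric element types only");

 public:
  explicit NumericArrayProxy(std::shared_ptr<const std::vector<T>> source)
      : source_(std::move(source)) {}

  std::size_t length() const { return source_->size(); }
  T value(std::size_t i) const { return source_->at(i); }  // throws std::out_of_range
  std::string type() const { return ArrowTypeName<T>::get(); }
  const T* data() const { return source_->data(); }  // shared, never copied

 private:
  std::shared_ptr<const std::vector<T>> source_;  // keeps the values alive
};

// Adding an array type then takes one alias plus one type-name trait.
using UInt8ArrayProxy = NumericArrayProxy<std::uint8_t>;
using Int64ArrayProxy = NumericArrayProxy<std::int64_t>;
using Float64ArrayProxy = NumericArrayProxy<double>;
```

The `static_assert` rejects non-numeric element types at compile time. A type without an `ArrowTypeName` specialization fails to compile as soon as `type()` is used. Both checks catch a forgotten type mapping early rather than at run time.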