Sparse Arrays #2

Open
stefanv opened this issue Feb 27, 2023 · 13 comments
@stefanv
Member

stefanv commented Feb 27, 2023

No description provided.

@stefanv stefanv converted this from a draft issue Feb 27, 2023
@ivirshup

Brought up on the first meeting's hackmd, but I would love to work on getting major packages like scikit-learn, networkx, etc. to support array-API sparse arrays from scipy.

@perimosocordiae perimosocordiae self-assigned this Apr 20, 2023
@dschult dschult self-assigned this Apr 20, 2023
@jjerphan
Member

jjerphan commented Apr 20, 2023

Hi all,

Thank you for sharing the notes, @ivirshup.

I think there are different needs to address, and constraints to take into account, when improving the usability of sparse arrays in the ecosystem without breaking the behaviors and workflows of existing implementations.

I am thinking of focusing all my efforts on this issue during the Developer Summit.

What do you think?

@jjerphan jjerphan self-assigned this Apr 20, 2023
@ivirshup

I think that would be great, and would be very interested in doing work along these lines as well. With the caveat that I'm not particularly familiar with Cython.

@jjerphan
Member

With the caveat that I'm not particularly familiar with Cython.

It's fine. That's something people can help you with, I think.

@ivirshup

I would also be quite keen on making PRs to downstream packages (especially dask, xarray, scikit-learn, and my own packages) to make sure these types are supported. I think this could be quite useful for finding pain points around usability in the ecosystem.

Is there a centralized place where I can look at planned work and known issues around this in scipy?

I would also be interested in a call with those interested to figure out specific goals for the hackathon.

@ivirshup ivirshup self-assigned this Apr 20, 2023
@dschult

dschult commented Apr 20, 2023

I also plan to focus on this issue for the Developer Summit. And it'll be important to get some sort of consensus about which aspects of a sparse array revamp we can work to implement -- and which might need more information or discussion before design decisions can be made (working prototypes are part of this process of course).

I think a focus on downstream packages is important for our success -- let's make it easy to switch code from dense array syntax to sparse array syntax, and also easy for current users of sparse packages to figure out how to switch to scipy sparse arrays. Having some folks writing PRs for downstream packages while others are writing PRs to convert the sparse matrix to sparse array interface and having those two groups talking during the process may be an effective approach.

Some more specific goals (but not really very specific actually) I'd like to see:

  • rewriting scipy.sparse with sparse_array as the primary data structure while sparse_matrix is built on top (i.e. switching which classes are built on the other in preparation for eventual deprecation of the matrix interface)
  • resolving how to best mimic the dense/numpy array interface for the sparse arrays (I think this is pretty good already)
  • developing a 1-d sparse API (which may be needed for parts of the previous 2 bullets)

Is this the kind of thing people are thinking of? What else?
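To make the "dense array syntax vs. matrix syntax" distinction concrete, here is a minimal sketch. It assumes a SciPy version that provides `csr_array` (1.8 or later); the variable names are only illustrative.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 0], [2, 3]])

A = sparse.csr_array(dense)    # array interface, modelled on numpy.ndarray
M = sparse.csr_matrix(dense)   # matrix interface, modelled on numpy.matrix

# With sparse arrays, `*` is element-wise and `@` is matrix multiplication,
# the same as for the dense array.
elementwise = (A * A).toarray()
matmul = (A @ A).toarray()

# With sparse matrices, `*` means matrix multiplication instead.
legacy_product = (M * M).toarray()

print(elementwise)
print(matmul)
print(legacy_product)
```

Code that relies on `*` is one place where switching a downstream package from sparse matrices to sparse arrays can silently change results, so it is worth auditing first.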

@ivirshup

ivirshup commented Apr 20, 2023

developing a 1-d sparse API (which may be needed for parts of the previous 2 bullets)

This is a really good point. It could definitely be difficult to integrate with new downstream libraries without this. Do we expect any major hurdles here? I would imagine the indexing code would essentially be factored out from the existing functions.
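As a rough illustration of what 1-d indexing could involve, here is a pure-Python sketch. `SparseVector` is a hypothetical class made up for this example, not a scipy API; it assumes sorted index storage and supports only integer indexing and unit-step slices.

```python
from bisect import bisect_left


class SparseVector:
    """Hypothetical 1-d sparse container: sorted indices plus matching values."""

    def __init__(self, length, indices, values):
        pairs = sorted(zip(indices, values))
        self.length = length
        self.indices = [i for i, _ in pairs]
        self.values = [v for _, v in pairs]

    def __len__(self):
        return self.length

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(self.length)
            if step != 1:
                raise NotImplementedError("only unit-step slices in this sketch")
            # Binary search for the stored entries that fall inside the slice.
            lo = bisect_left(self.indices, start)
            hi = bisect_left(self.indices, stop)
            return SparseVector(
                max(stop - start, 0),
                [i - start for i in self.indices[lo:hi]],
                self.values[lo:hi],
            )
        if key < 0:
            key += self.length
        if not 0 <= key < self.length:
            raise IndexError("index out of range")
        pos = bisect_left(self.indices, key)
        if pos < len(self.indices) and self.indices[pos] == key:
            return self.values[pos]
        return 0  # implicit zero for entries that are not stored

    def todense(self):
        out = [0] * self.length
        for i, v in zip(self.indices, self.values):
            out[i] = v
        return out


v = SparseVector(6, indices=[4, 1], values=[7, 5])
print(v[1], v[-2], v[0])
print(v[1:5].todense())
```

The point of the sketch is that slicing returns another 1-d object rather than a 1×N matrix, which is the behavior dense array users would expect.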

downstream packages

I'm not actually completely sure which packages are represented at this event. Was there a list for this somewhere?


other potential topics

  • Array API? It would be nice if sparse arrays followed this as much as possible (e.g. not sure dlpack is reasonable here). Though, I'm unclear on status/ why np.ndarray.__array_namespace__ does not seem to exist.
  • I would like to at least have a conversation about interoperability/ conversion with other sparse array implementations (cusparse, sparse, and graphblas).
  • General performance work. E.g. sparse.random, matrix multiplication.
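For context on the Array API point: the standard has array objects expose an `__array_namespace__` method, which consuming libraries call to find the matching functions. Below is a minimal pure-Python sketch of that dispatch pattern. `ToySparse`, `toy_xp`, `fallback_xp`, `get_namespace`, and `total` are all hypothetical names invented for illustration, not part of scipy or numpy.

```python
from types import SimpleNamespace

# Hypothetical fallback namespace for objects that do not implement the protocol.
fallback_xp = SimpleNamespace(name="fallback", sum=lambda x: sum(x))


class ToySparse:
    """Hypothetical container that stores only the nonzero values of a 1-d array."""

    def __init__(self, values_by_index):
        self.data = dict(values_by_index)

    def __array_namespace__(self, *, api_version=None):
        # The protocol method returns the namespace holding the array functions.
        return toy_xp


# Hypothetical namespace whose `sum` only visits the stored entries.
toy_xp = SimpleNamespace(name="toy_sparse", sum=lambda x: sum(x.data.values()))


def get_namespace(x, default=fallback_xp):
    """Return the namespace advertised by x, or `default` if there is none."""
    method = getattr(x, "__array_namespace__", None)
    return method() if method is not None else default


def total(x):
    """Library-style function written once against whatever namespace x provides."""
    xp = get_namespace(x)
    return xp.sum(x)


print(total(ToySparse({1: 5, 4: 7})))
print(total([1, 2, 3]))
```

A downstream library written this way would not need to special-case sparse inputs, which is why following the standard as far as possible could matter for dask or xarray support.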

@dschult

dschult commented Apr 26, 2023

When you refer to "Array API", are you talking about NEP 47? Any other places to look?

Interoperability and performance are important too -- they can sometimes be hard if the other libraries don't make it easy. But this is a big part of making it easy for people currently using other packages to figure out how to use sparse arrays.
:)

@ivirshup

ivirshup commented May 3, 2023

Sorry about the delayed response! Had a hackathon last week, so also missed the second sparse summit. Were there notes for that floating around somewhere?

@dschult yep, I do mean that NEP / https://data-apis.org/array-api/latest/. I'm wondering if this is going to be required by downstream libraries like dask/ xarray

@perimosocordiae

I just opened scipy/scipy#18440 to "invert" the hierarchy between spmatrix and _sparray. It was mostly a mechanical change, so hopefully we can get that merged before the summit and have a clean starting point to build from.

@ivirshup
Copy link

ivirshup commented May 8, 2023

Saw some recent activity on sparse array support over on xarray:

@perimosocordiae

As of just now, sparse arrays are the base type for scipy.sparse, with spmatrix defined as a thin wrapper around scipy.sparse._sparray. There are plenty of cleanups and improvements to be made still, but we're moving in the right direction.
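The shape of that inverted hierarchy can be sketched in a few lines of pure Python. This is not the scipy implementation: `SparseArraySketch` and `SparseMatrixSketch` are hypothetical classes using dict-of-keys storage, meant only to show a base class with array semantics and a subclass that overrides the matrix-specific behavior.

```python
class SparseArraySketch:
    """Hypothetical base class: 2-d dict-of-keys storage with array semantics."""

    def __init__(self, shape, entries):
        self.shape = shape
        # Keep only the nonzero entries, keyed by (row, column).
        self.entries = {k: v for k, v in entries.items() if v != 0}

    def __mul__(self, other):
        # Array semantics: element-wise product, so only shared keys survive.
        shared = self.entries.keys() & other.entries.keys()
        product = {k: self.entries[k] * other.entries[k] for k in shared}
        return type(self)(self.shape, product)

    def __matmul__(self, other):
        # Naive matrix product over the stored entries.
        out = {}
        for (i, k), a in self.entries.items():
            for (k2, j), b in other.entries.items():
                if k == k2:
                    out[(i, j)] = out.get((i, j), 0) + a * b
        return type(self)((self.shape[0], other.shape[1]), out)

    def todense(self):
        rows, cols = self.shape
        return [[self.entries.get((i, j), 0) for j in range(cols)]
                for i in range(rows)]


class SparseMatrixSketch(SparseArraySketch):
    """Hypothetical thin wrapper: only matrix-specific behavior is overridden."""

    def __mul__(self, other):
        # Matrix semantics: `*` is matrix multiplication.
        return self @ other


entries = {(0, 0): 1, (1, 0): 2, (1, 1): 3}
A = SparseArraySketch((2, 2), entries)
M = SparseMatrixSketch((2, 2), entries)

print((A * A).todense())
print((A @ A).todense())
print((M * M).todense())
```

With the array class as the base, everything shared lives in one place and the matrix class shrinks to the handful of behaviors that differ.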

@jjerphan
Member

jjerphan commented Jun 10, 2023

Hi all,

How would you like to organise the rest of the work? Should we distribute remaining tasks we have identified among ourselves?

5 participants