How should models with conflicting dependencies be handled? #159

edwardchalstrey1 · 2022-02-09T15:20:54Z

edwardchalstrey1
Feb 9, 2022
Maintainer

Imagine you are a scivision user who wants to load more than one model from the scivision catalog to try on your dataset. How should the differing environments required to run each model be handled? Should we use a particular package/environment manager e.g. Conda?

If someone finds a model from the catalog they want to use, they are already in a Python environment, but that environment would need to refresh/change to be the one that the model runs in. Is there some way in Python we can get the load_pretrained_model function to find the specified model environment (e.g. build it from the environment.yml or requirements.txt in the model's repo) and then refresh the Python kernel so that it runs?

Right now, I think our examples are all rigged so that the python environment that the example notebook is being run in has the requirements of the model installed - this is fine for our example scivision use-cases gallery, however it means that nobody can currently load a model from the catalog unless their python environment happens to support it.
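As a rough illustration of the "build it from the environment.yml or requirements.txt" idea, the sketch below only works out which commands would create a per-model environment and runs none of them. The function name env_setup_commands and the choice of conda for environment.yml and venv plus pip for requirements.txt are assumptions made for this example, not existing scivision behaviour. It also leaves open the harder half of the question: the environment is built next to the running interpreter, and nothing here switches the current kernel over to it.

```python
import sys
import tempfile
from pathlib import Path
from typing import List


def env_setup_commands(model_repo: Path, env_dir: Path) -> List[List[str]]:
    """Return the commands that would build an isolated environment for one model.

    Hypothetical helper: nothing is executed here, the caller decides what to run.
    """
    conda_file = model_repo / "environment.yml"
    pip_file = model_repo / "requirements.txt"
    if conda_file.exists():
        # One conda command creates the environment at the given prefix.
        return [["conda", "env", "create", "--file", str(conda_file), "--prefix", str(env_dir)]]
    if pip_file.exists():
        # venv puts the interpreter in Scripts/ on Windows and bin/ elsewhere.
        bin_dir = "Scripts" if sys.platform == "win32" else "bin"
        env_python = env_dir / bin_dir / "python"
        return [
            [sys.executable, "-m", "venv", str(env_dir)],
            [str(env_python), "-m", "pip", "install", "-r", str(pip_file)],
        ]
    raise FileNotFoundError(f"no environment.yml or requirements.txt in {model_repo}")


# Demo with a throwaway "model repo" that only contains a requirements.txt.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "model_repo"
    repo.mkdir()
    (repo / "requirements.txt").write_text("numpy\n")
    commands = env_setup_commands(repo, Path(tmp) / "env")

for command in commands:
    print(" ".join(command))
```

Each returned command could be passed to subprocess.run by a caller that has decided installation is allowed.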

ghost · 2022-02-16T21:02:29Z

ghost
Feb 16, 2022

Dependency hell indeed!

Two approaches:

1. Wash your hands of it. Ensure that Scivision (the library) runs under a wide range of Python versions. Delegate the selection of Python version to the user, who must also be aware of the version constraints of their models. This is effectively the Python way.
2. Make an adapter that spins up the model in a separate process with a possibly different Python environment and uses inter-process communication to interact with it.

The answer is probably to do (1) but not do anything that precludes (2).
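To make the second approach more concrete, here is a minimal sketch of such an adapter, assuming JSON lines over stdin/stdout as the inter-process protocol. The ModelProcess class and the worker script are hypothetical and not part of scivision. The worker is a placeholder that doubles its inputs, and sys.executable stands in for the interpreter of the model's own environment so that the example runs anywhere.

```python
import json
import subprocess
import sys

# Source of the worker process. In a real adapter this would import and run the
# model; here a placeholder "model" simply doubles every input value.
WORKER_SOURCE = r'''
import json
import sys

for line in iter(sys.stdin.readline, ""):
    request = json.loads(line)
    result = [2 * x for x in request["X"]]
    sys.stdout.write(json.dumps({"y": result}) + "\n")
    sys.stdout.flush()
'''


class ModelProcess:
    """Hypothetical adapter: runs a model in another interpreter and talks JSON lines."""

    def __init__(self, python_executable):
        # python_executable would be the interpreter of the model's environment.
        self.proc = subprocess.Popen(
            [python_executable, "-u", "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def predict(self, X):
        # Send one request line, then block until the worker replies with one line.
        self.proc.stdin.write(json.dumps({"X": X}) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())["y"]

    def close(self):
        # Closing stdin ends the worker's read loop, so it exits cleanly.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


model = ModelProcess(sys.executable)
prediction = model.predict([1, 2, 3])
model.close()
print(prediction)  # [2, 4, 6]
```

Because the parent only ever exchanges JSON with the worker, the two interpreters never need to share an installed package. Several such processes could in principle be alive at once, one per model.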

3 replies

edwardchalstrey1 Feb 17, 2022
Maintainer Author

So the context here is when people are wishing to run models that they've discovered via the scivision catalog. In theory, if they've queried the catalog and it's returned the github url of a model of interest, they can then simply run model = scivision.load_pretrained_model('github.com/path/to/repo') and then go on to run model.predict(X).

The problem is that the model could require a different Python environment than the one the user was working in when they imported scivision, so perhaps your second suggestion is an option for that. Any thoughts @ots22 @kasra-hosseini ?

miquelmassot Mar 16, 2022

Is it worth informing users, via the catalog, which versions of Python each model has been tested against?

edwardchalstrey1 Mar 17, 2022
Maintainer Author

Could be a good idea @miquelmassot - the temporary solution is that we say in documentation that all models in the catalog must have installation instructions in a README, which one would hope includes which Python version if that is a factor: https://scivision.readthedocs.io/en/latest/model_repository_template.html#catalog

ots22 · 2022-02-17T10:50:40Z

ots22
Feb 17, 2022
Maintainer

I think this issue is closely related to the issue of handling system/environmental dependencies of model packages, in that python doesn't offer a complete solution, and it's very hard for an individual library (such as scivision) to solve it from within a particular installation. This is what package managers are for.

It could be tempting to, effectively, create our own ad-hoc python package manager, which I think we should avoid!

My feeling is we should try to use existing solutions and tools where we can, and stick to the norms of the python ecosystem - but we could certainly make suggestions or recommendations as to what to do (say, recommend pyenv or conda for managing a particular python version, and document a workflow that uses each of these).

3 replies

ots22 Feb 17, 2022
Maintainer

I think this is why I'm doubtful that we can solve these problems with the current approach of an 'allow_install' argument to load_pretrained_model. We could instead use the catalog to let a user identify interesting models, but leave it up to them to perform the installation - still valuable I think.

I think this might be a slightly controversial view, since one goal was to prevent installation headaches for new users. @quantumjot?

edwardchalstrey1 Feb 17, 2022
Maintainer Author

I'm tempted to agree, perhaps we should focus on the catalog as being the main USP of scivision - that being said, perhaps we can have some clever way that when you run load_pretrained_model('path/to/modelX') it also creates model X's environment and activates it? Then when you run load_pretrained_model('path/to/modelY') it does the same and there is a warning printed saying "Model Y is now the loaded model" or "Model Y environment activated".

edwardchalstrey1 Feb 17, 2022
Maintainer Author

But this would mean only one model can be loaded at any given time, I don't know whether this would be a problem?

acocac · 2022-02-17T11:17:20Z

acocac
Feb 17, 2022
Maintainer

I agree it seems to be very tricky to have multiple Python environments working in a single plain script. For Jupyter Lab or notebooks, there's a workaround using nb_conda_kernels, https://github.com/Anaconda-Platform/nb_conda_kernels.

To alert users about the right configuration, what about adding a check of the Python version to allow_install? This means model providers should indicate the Python version when registering their models. Something like the example below could be added somewhere in load_pretrained_model. Also, when returning the matches of models and data, we could have an additional (optional) argument to filter for those with common Python versions, so that the user can install multiple models into the same environment.

import sys

# Compare the major version number rather than slicing the version string.
if sys.version_info.major != 3:
    print('You need python 3.x to run the model. Please follow the instructions here to set up a virtual environment for serving the model.')
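The filtering idea could look roughly like the sketch below. It assumes that each catalog entry carries a hypothetical python_versions field listing the (major, minor) versions the provider has tested. Neither that field nor the helper names exist in scivision today.

```python
import sys

# Hypothetical catalog entries; "python_versions" is an assumed field listing
# the (major, minor) versions each provider says the model was tested with.
catalog = [
    {"name": "model_a", "python_versions": [(3, 7), (3, 8)]},
    {"name": "model_b", "python_versions": [(3, 8), (3, 9)]},
]


def supports(entry, version=sys.version_info[:2]):
    """True if the entry lists the given (major, minor) Python version."""
    return tuple(version) in entry["python_versions"]


def common_python_versions(entries):
    """Python versions that every given entry claims to support."""
    version_sets = [set(entry["python_versions"]) for entry in entries]
    return sorted(set.intersection(*version_sets)) if version_sets else []


shared = common_python_versions(catalog)
usable_on_39 = [entry["name"] for entry in catalog if supports(entry, (3, 9))]
print(shared)        # [(3, 8)]
print(usable_on_39)  # ['model_b']
```

A shared Python version is only a necessary condition for installing two models side by side. It says nothing about whether their package requirements agree.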

2 replies

edwardchalstrey1 Feb 17, 2022
Maintainer Author

I think it's unlikely that different models in the catalog would have matching python environments - sure the python version could be the same, however two models could rely on different versions of the same package, which would be a conflict.
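The kind of conflict described here can be illustrated with a deliberately simplified check that only understands exact name==version pins. The helper names and the example requirements are made up for illustration, and a real resolver such as pip's also handles version ranges, extras and transitive dependencies.

```python
def parse_pins(requirements_text):
    """Collect exact pins (name==version) from requirements.txt-style text."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def conflicting_pins(requirements_a, requirements_b):
    """Packages pinned to different versions by the two requirement sets."""
    pins_a = parse_pins(requirements_a)
    pins_b = parse_pins(requirements_b)
    return {
        name: (pins_a[name], pins_b[name])
        for name in sorted(pins_a.keys() & pins_b.keys())
        if pins_a[name] != pins_b[name]
    }


# Made-up requirements for two hypothetical models on the same Python version.
model_x = "numpy==1.19.5\ntorch==1.8.0\n"
model_y = "numpy==1.22.0  # newer numpy\ntorch==1.8.0\npillow==9.0.0\n"

conflicts = conflicting_pins(model_x, model_y)
print(conflicts)  # {'numpy': ('1.19.5', '1.22.0')}
```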

edwardchalstrey1 Feb 17, 2022
Maintainer Author

And whilst it's nice that the functionality to switch kernels exists in notebooks, we would want scivision to be able to load models in any Python

edwardchalstrey1 · 2022-02-23T16:00:24Z

edwardchalstrey1
Feb 23, 2022
Maintainer Author

From discussion today online:

Perhaps one of the requirements in the contribution to scivision catalog guidelines should be that there must be quality documentation on installation/usage for submitted models. So much so, that when people PR an addition to the catalog, it will only be accepted if the model is installable by the reviewer (community member) and various other checks have passed.

This would be in a scenario where the answer to #149 is no, most models are not auto-magically installable by scivision, and instead the user who finds the model via a catalog is directed/instructed to install via link to the installation instructions (e.g. the model repo README).

The purpose of this would be to essentially bypass the problem being discussed in this discussion - the scivision user would use scivision for querying the catalog, but once they find a model they want to use, they would likely set up a bespoke python environment to use it, as per the model's instructions (e.g. repo README).

This does mean that there's no nice way to run models in the same environment, however that may not be realistic anyway.

@acocac suggests part of the minimum requirements for a model submission would also include a license (See #169)

2 replies

edwardchalstrey1 Feb 23, 2022
Maintainer Author

Essentially, when querying the catalog for models, it would be nice to see attributes of the returned models such as a) is_installable boolean flag which means you can use load_pretrained_model for this model, but most models will be False and b) licence with a set number of possible answers (GPL, MIT)
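A minimal sketch of how those two attributes might sit on a catalog entry, assuming a dataclass per model and an enum for the fixed set of licence values. The class names, field names and URLs below are placeholders rather than the actual catalog schema.

```python
from dataclasses import dataclass
from enum import Enum


class Licence(Enum):
    """Fixed set of allowed licence values (only the two mentioned above)."""
    GPL = "GPL"
    MIT = "MIT"


@dataclass
class ModelEntry:
    """Hypothetical catalog entry; field names are assumptions."""
    name: str
    url: str
    licence: Licence
    # Most models would keep the default: install by following the README.
    is_installable: bool = False


# Placeholder entries, not real catalog content.
entries = [
    ModelEntry("model_a", "github.com/example/model-a", Licence.MIT, is_installable=True),
    ModelEntry("model_b", "github.com/example/model-b", Licence.GPL),
]

loadable = [entry.name for entry in entries if entry.is_installable]
print(loadable)  # ['model_a']
```

Using an enum means an unknown licence string fails when the entry is created, for example Licence("Apache") raises ValueError, which is one way of enforcing "a set number of possible answers".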

edwardchalstrey1 Feb 23, 2022
Maintainer Author

@kasra-hosseini also points out this paper: https://arxiv.org/pdf/1810.03993.pdf - which has a "Model card" we could take inspiration from, for additional catalog fields

edwardchalstrey1 · 2022-04-20T10:45:43Z

edwardchalstrey1
Apr 20, 2022
Maintainer Author

Using BeakerX or Wrattler for multiple Python environments in a scivision notebook

Wrattler by Turing REG - not actively developed
Both Wrattler and BeakerX allow for "polyglot" notebooks: different languages in each cell, with data objects passed between them
BeakerX allows you to use any preferred kernel in a new cell.
- Can we create new kernels for each scivision model and then make them available for different cells in a notebook?
- Downside could be that with many differently named kernels being created, the UI could become messy - would need to think carefully about naming conventions!
BeakerX kernel magic:

4 replies

miquelmassot Aug 4, 2022

My two cents: I think this is very demanding - both logistically, in terms of time, and in terms of the design of the entire data pipeline. If a model python package is well written (e.g. has its good list of install requirements and dependencies) then it shouldn't be a big deal to use with scivision.
If a user needs two models with conflicting dependencies, that's not scivision's job to solve, I believe. I believe you are targeting an issue that scivision is not intended for.

In other words - the issue would persist with or without scivision. Users would need to have two environments of their choice (conda, pyenv, env, docker,...) and I would suggest this is out of scope for the package.

edwardchalstrey1 Aug 12, 2022
Maintainer Author

Just had a chat with Nick Barlow (Turing REG) who developed Wrattler and he was quite keen on the idea of it being useful in another project. I'm still somewhat keen on this idea as it would allow scivision users to do the following

1. Load a dataset in the usual way
2. Find matching models in the catalog
3. Run each model's predictions on the same data in different Jupyter cells, which have different environments/kernels when needed

Currently we don't offer a nice way to do part 3 other than running 2 separate notebooks/scripts in differing environments manually

@ots22 do you have any thoughts on this?

ots22 Aug 15, 2022
Maintainer

On the general problem:

I agree with Miquel that we should question whether and how much of the dependency problem should be in scope for us.
If we decide that it is out of scope, we should still make sure that the situation is clear to a user
- For example, one approach might be something like:
Models in the catalog must be installable and loadable with load_model in a fresh environment. Models may conflict with one another (their dependencies may conflict). In that case we recommend separate environments for each (that is, dealing with this problem is considered out of scope for Scivision). Scivision itself has some dependencies for its core functionality, and a minimum Python version. We do not support models that conflict with these core dependencies. (A consequence of the last point is that we should aim for a small set of requirements, and that a model may not always be supported - if we gain a dependency, or start requiring a more recent Python version, for example).
- The above isn't the only possibility of course - my suggestion is really that we should have a clear and consistent recommendation, even if we don't do anything especially to handle conflicts.
The most convincing reason to attempt to solve it might be as part of dealing with the wider problem of system dependencies (perhaps related to some points here: Investigate usefulness of creating a scivision docker image #99 (comment))

On Wrattler as such:

This does seem to be the sort of issue Wrattler was developed to solve
I don't think we should require Wrattler in order to give Scivision a nice user experience (recognising that users may have other tools/workflows that they prefer - Scivision should be usable as 'just' a regular Python library). So we still have the multiple-kernel issue for the (I expect) majority of Jupyter users.
Wrattler isn't maintained anymore, and I don't think we want to take on supporting it ourselves, so I'd hesitate about recommending it in our official channels.
It sounds like it could be an interesting experiment to try, and should 'just work' in Wrattler (and may be of interest to the former Wrattler team) - we could just try it to see what it's like and whether we learn anything.
Are there any ideas we could borrow from it, without adopting it wholesale?

edwardchalstrey1 Aug 15, 2022
Maintainer Author

I like the idea of simply borrowing Wrattler, especially if it does "just work" - would have another scivision gallery notebook which is like "If you wanted to view the results of 2 models from the catalog side by side, you could use this tool that allows for multiple environments in the same notebook"

^ I think that would be something nice to return to once we have a situation where there are two or more suitable models for a particular dataset in the catalog, but right now I don't think we have good examples of that anyway
