syntax for private/multiple registries #48

Open
dominictarr opened this issue Dec 3, 2013 · 34 comments

@dominictarr
Owner

People have talked about multiple npm registries for ages,
but there is not any good support for this.

I tried to tackle this problem with a (too) clever proxy based technique
https://github.com/substack/shadow-npm

(that doesn't work anymore, because you cannot get the npm users anymore)

The problem is that the npm client needs to be aware that it is communicating with a different registry, because it's gotta send different auth for that registry.

At a minimum, this requires making npm(d) aware of auth for multiple registries. It will be pretty easy to add that to npmd... hmm, npm-registry-client is a separate module now too, so maybe it would be easy to add there too?

so, what does it look like in the package.json?

"dependencies": {
  "my-module": "npm:myregistry.com#~1"
}

I think this could be enough. It needs to be npm: because it's a protocol.
npm(d) won't just request that module; it will make a bunch of requests to various couchdb things. So what it needs to know is which registry to use,
and you can set up auth, etc. for that registry in a config file somewhere.

@dominictarr
Owner Author

/cc @jden

@junosuarez

I would propose that the value of the dependency be a URI identifying the actual module by name and semver range, eg

"dependencies": {
  "my-module": "npm:myregistry.com/my-module#~1"
}

This is analogous to how npm currently handles git urls and has the benefit of keeping all of the information needed to identify the dependency in one string. Sticking with URIs also has the advantage of working with existing parsers, like node's built-in url module.

Auth for multiple registries should be based on hostname. current plain npm dependencies should be considered as using npmjs.org as the hostname. Stored credentials could then be retrieved based on hostname.

See also how the git client manages credentials internally: https://www.kernel.org/pub/software/scm/git/docs/v1.7.9/technical/api-credentials.html
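A minimal sketch of how the two forms proposed so far could be split into a registry, a name, and a version range, with credentials then looked up by hostname. The helper names (`parseDependency`, `credentialsFor`), the in-memory credential store, and the example token are all assumptions for illustration; nothing like this exists in npm or npmd as far as this thread shows.

```javascript
// Hypothetical helper: splits a dependency value into { registry, name, range }.
// Accepts "npm:host#range", "npm:host/name#range" and "npm://host/name#range".
// Other schemes (git urls, tarball urls) are deliberately not handled here.
const DEFAULT_REGISTRY = 'registry.npmjs.org';

function parseDependency(key, value) {
  const match = /^npm:(?:\/\/)?([^\/#]+)(?:\/([^#]*))?(?:#(.*))?$/.exec(value);
  if (!match) {
    // A plain range such as "~1.0.0" implies the default public registry.
    return { registry: DEFAULT_REGISTRY, name: key, range: value };
  }
  return {
    registry: match[1],
    name: match[2] || key, // the name in the URI is optional; fall back to the key
    range: match[3] || '*',
  };
}

// Assumed credential store, keyed by registry hostname.
const credentials = new Map([
  ['myregistry.com', { token: 'example-token' }],
]);

function credentialsFor(dep) {
  return credentials.get(dep.registry) || null; // null means anonymous access
}

const dep = parseDependency('my-module', 'npm:myregistry.com/my-module#~1');
console.log(dep, credentialsFor(dep));
```

Keying the store by hostname means a plain dependency such as `"~1.0.0"` resolves to the `registry.npmjs.org` entry without any change to existing package.json files.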

/cc @Raynos (I know you were interested in npm using URLs in our previous conversation)

@Raynos

Raynos commented Dec 3, 2013

i like the idea of a module being a uri. npmjs://myregistry.com/my-module/0.2.3.

@Gozala talked to me a while ago about the idea that a module should be a uri.

Obviously this suffers from the problem that the internet is mutable, so we need an immutable protocol / understanding. A version meaning only one thing is very important.

@dominictarr
Owner Author

I like this, but the module name is redundant. What if it differs from the module name in the key?

If you want to express a module at a particular version as a single string, there is already a way to do this:

module-name@version

This already works for all resource types,
module-name@https://whatever.com/module-name.tgz already works.

@jden totally agree i want it to work by just assuming that all current modules are in registry.npmjs.org

@Raynos agree. this is the benefit of a central registry. it's easier to have a protocol that enables you to version modules in a disciplined way. if everything is just a url, then that is like script tags: arguably the most flexible possible method, but too flexible and without enough guarantees.

@dominictarr
Owner Author

I guess you could make the module name in the url optional
npm:myregistry.com#~1.0.0

are the slashes really necessary?

@defunctzombie

I don't think private registries should be something that is exposed to the file in the general case. I personally like the apt-get approach of stacking and maybe some .npmrc file could contain a list of registries in the order to check them.

Beyond that, I find the following syntax much more intuitive

schema://path@version
"dependencies": {
  "foo": "github://user/repo@tag",
  "bar": "github://user/repo@commit",
  "baz": "1.2.3",
  "fiz": "[email protected]",
  "foz": "file://local/file.js",
  "cat": "npm://[email protected]"
}

When you don't specify a schema it will assume an npm registry and use the list of configured registries (so the last line for cat could just be 3.4.5), but think of it like doing apt-get install -t unstable foobar to specifically identify that we want foobar from the "unstable" registry.

I am a fan of the "name":"where" style which also allows you to install it under whatever name you want (like the fiz-whatever) example. But that might be too extreme or lead to other confusion. I generally consult package.json when I want to know what is the dependency anyway.
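The stacked, apt-get style lookup described above can be sketched as an ordered search. The registry hostnames, their contents, and the `findRegistry` helper are made up for illustration; a real client would query each registry over HTTP rather than consult in-memory maps.

```javascript
// Each registry is modeled as a Map from package name to published versions.
// Order matters: earlier entries are checked first, as in a stacked config.
const registries = [
  { host: 'npm.internal.example', packages: new Map([['bah', ['1.0.0']]]) },
  {
    host: 'registry.npmjs.org',
    packages: new Map([['bah', ['2.0.0']], ['request', ['2.27.0']]]),
  },
];

// Return the host of the first registry in the stack that knows the package.
function findRegistry(name, stack) {
  for (const registry of stack) {
    if (registry.packages.has(name)) return registry.host;
  }
  return null;
}

console.log(findRegistry('request', registries)); // "registry.npmjs.org"
console.log(findRegistry('bah', registries));     // "npm.internal.example"
```

The second call shows the trade-off of this design: a package published under the same name in an earlier registry hides the later one, and package.json alone does not say which registry was used.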

@mbrevoort

The URI semantic is more understandable. I like the rationale that npm is a protocol and should be represented as such.

The resources will be mutable with range/wildcard version specification. I can't see a way around that (e.g. "~1.0.1" or something more involved like ">= 0.7.3 < 1")

One point against treating it as a URI is spaces and some characters in the version conditionals (like above) need escaping.
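To make the escaping concern concrete, here is a small sketch using the standard `encodeURIComponent` function. The `npm://myregistry.com/my-module#...` form is just one of the shapes proposed in this thread, used here as an assumption.

```javascript
// A tilde range survives unchanged, but a compound range with spaces and
// comparison operators has to be percent-encoded to sit inside a URI.
const ranges = ['~1.0.1', '>= 0.7.3 < 1'];

for (const range of ranges) {
  const uri = 'npm://myregistry.com/my-module#' + encodeURIComponent(range);
  console.log(uri);
}
// npm://myregistry.com/my-module#~1.0.1
// npm://myregistry.com/my-module#%3E%3D%200.7.3%20%3C%201
```

The encoded form is unambiguous for a parser, but it is noticeably harder to read in a hand-edited package.json.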

However, I'd be happy with anything in this ballpark at this point! :)

@mbrevoort

I like the apt-get style approach but I prefer predictability of which registry I'm resolving which deps from.

@defunctzombie

-50000 x 50000 for anything that is not an EXACT pin of a version. Immediate failure when you do not install what you expected to install. You cannot predict the future of your dependencies so version ranges are not the answer. Semver is a hint and not some absolute that says just cause only the patch or minor changed that there won't be bugs or feature changes.

@defunctzombie

Let me put it this way, without some sort of cryptographic hash on content you expect, it would be like doing a git clone of your project and getting a random version of the code for one of the folders.

If the reaction is that this makes deeply nested dependency updates a pain then maybe the entire approach of modules or tooling needs to be made better.

@mbrevoort

@defunctzombie but when you specify a dependency with a range or wildcard like ~1.0.1 you are saying up front that you don't expect the same package every time. It's not random; it's a feature. If you want the same version, specify it explicitly (1.0.1). I realize loosely defined dependencies can change underneath but that's what npm shrinkwrap is intended to address. I do think there should be a native way to ensure bit-wise equality though.

I assume the goal here is to identify an approach that is compatible with the current npm registry implementation or at the very least non breaking.
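One way to get the bit-wise guarantee both sides are asking for is to record a hash of the tarball when a dependency is first resolved and compare it on every later install. The sketch below uses Node's built-in `crypto` module with SHA-1; the helper names and the stand-in tarball bytes are assumptions for illustration only.

```javascript
const crypto = require('crypto');

// Hex-encoded SHA-1 digest of a buffer.
function shasum(buffer) {
  return crypto.createHash('sha1').update(buffer).digest('hex');
}

// True only when the downloaded bytes match the recorded digest exactly.
function verifyTarball(buffer, expectedShasum) {
  return shasum(buffer) === expectedShasum;
}

const tarball = Buffer.from('pretend these are tarball bytes');
const pinned = shasum(tarball); // recorded when the dependency was first resolved

console.log(verifyTarball(tarball, pinned));                       // true
console.log(verifyTarball(Buffer.from('tampered bytes'), pinned)); // false
```

A check like this is independent of how the version was chosen, so it works the same for exact pins and for ranges that were resolved once and then locked.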

@junosuarez

I don't think @dominictarr was proposing to change the way dependency version ranges are resolved to specific dependency versions. This issue relates to how best to specify a dependency that originates from an npm registry other than the public `registry.npmjs.

The way I see it, there are two issues:

  • Managing user credentials for multiple registries
  • Specifying the dependency in the dependent program's package.json

IMHO it would make sense to create separate issues for these two requirements. Since this thread has mostly discussed the second, I will continue with that.

The default npm method for specifying a dependency is
"name": "~>versionRange"
This assumes the protocol (npm) and the registry (registry.npmjs.org). The actual location of the bits, and how to resolve the version range, is not specified in this declaration.

Git urls and local filesystem paths to tarballs look like
"name":"/path/to/repo/or/tarball"
This assumes a specific version and delivery mechanism (i.e., "get the bits located at this address"). It does not allow for version range dependency resolution. (And as stated above in this post, changing the way that works is beyond the scope of this issue).

Explicit registry modules need a way to encode:

  • module name
  • version range
  • registry

It does not need to encode a specific url as an address to download a specific tarball. The proposal for URIs would be used just as identifiers and as a means of encoding this dependency <name, version, registry> triple in a way that most programmers are familiar with.

@dominictarr the slashes in URIs are not required, but they are 1) familiar and 2) well specified in rfc 3986 and well supported in parsers like node's url
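As an illustration of what the slashes buy you with an off-the-shelf parser, here is a sketch using the WHATWG `URL` class that ships as a global in current Node.js releases. That class is an assumption relative to this thread, which predates it; the legacy `url.parse` function may behave differently.

```javascript
// With "//" the parser recognizes an authority, so the registry shows up
// as the hostname. Without it, everything after "npm:" is an opaque path.
const withSlashes = new URL('npm://myregistry.com/my-module#~1');
const withoutSlashes = new URL('npm:myregistry.com/my-module#~1');

console.log(withSlashes.hostname);    // "myregistry.com"
console.log(withSlashes.pathname);    // "/my-module"
console.log(withSlashes.hash);        // "#~1"

console.log(withoutSlashes.hostname); // ""
console.log(withoutSlashes.pathname); // "myregistry.com/my-module"
console.log(withoutSlashes.hash);     // "#~1"
```

So the slash-free form is still a valid URI, but a generic parser hands back the registry and module name as one string that the client would have to split itself.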

@chrisdickinson

Out of curiosity, from whence comes npmjs:// / npm://? If I'm not mistaken npm's protocol is just HTTP -- shouldn't we faithfully expose that (vs. promoting the host portion of the url to the protocol?)

@max-mapper

http://wzrd.in/ (browserify-cdn) implements some of these ideas -- might be useful as a way to test these ideas out immediately, patches welcome!

@terinjokes

Why does defining an npm package as a dependency need to include registry information? I like the approach of a hierarchy of registries, and it works pretty well in the package management world of Linux.

Think of this scenario: request has a small bug in the current version 2.27.0. I make a fix and send a pull request, but publish it to my private npm server as [email protected]+cloudflare1. Why shouldn't this version be picked up automatically whenever request is requested?

I don't want to have to edit the dependency tree of my entire app to now say npm://[email protected] then edit it again when 2.27.1 comes out with my PR merged.

(No offense to mikeal here, of course)

@Gozala

Gozala commented Dec 3, 2013

i like the idea of a module being a uri. npmjs://myregistry.com/my-module/0.2.3.

@Gozala talked to me a while ago about the idea that a module should be a uri.

Obviously this suffers from the problem that the internet is mutable, so we need an immutable protocol / understanding. A version meaning only one thing is very important.

To be clear, for a while I just wanted to have require("raw.github.com/Gozala/method/v2.0.0/core"), no package.json or any of that jazz. npm could be just a registry that enforces versioning, and a package (or rather module) manager would just look at modules and install dependencies.

I don't think this is very related to what's discussed here though.

@defunctzombie

Doesn't GO do something like this?

@terinjokes

@defunctzombie it does, but it also has no registry and I don't think it even attempts semver resolution.

@Gozala

Gozala commented Dec 3, 2013

@defunctzombie @terinjokes Yes it's very similar, but the biggest flaw is that they are git urls and go get only installs master:
http://talks.golang.org/2012/splash.article#TOC_9.

I suspect that in the future support for tags will be added to fix that issue, which a lot of people have been asking for.

@dominictarr
Owner Author

@jden okay, I am persuaded by the URI argument.

@Gozala that is out of scope.

@defunctzombie
git urls currently look like this: git://github.com/user/project.git#commit-ish (from the npm docs).
I believe # was chosen here because what follows it is not a part of the url proper. It is extra information pertaining to the url that is evaluated on the client and not sent with the request, which is pretty much how git urls are handled.

changing how npm currently works (breaking change) or making a frivolous change even if it's better to npm is completely out of the question.

@Gozala wow, does go install dependencies recursively? really surprised it only installs master!

Okay: explicitly named private registries vs. a configured chain of registries.

@terinjokes this has the considerable advantage that it is not necessary to change anything in the format of the package.json. However, I see some disadvantages coming with that too. For example, it's no longer possible to know what will be installed given a package.json. Also, suppose someone writes a bah module in a private registry, but then someone else writes a bah module in the public registry. Now there is no way to use the one from the public registry, and it's also not obvious which registry you are getting any particular module from.

hmm, what would it allow you to do better? hmm, you could move the registry or rename it, and you'd only have to update the config, but you could also configure an alias for that repo too and use an explicit registry in the uri.

@terinjokes so, with explicit registries you could still publish your own fork to your own registry.
the only difference is that it would look like:

"request": "npm://registry.cloudflair.com#~2.27.0"

@chrisdickinson I'll argue that npm isn't just http, it's a layer on top of http. It has an http url, but it doesn't just make requests to that one url, but possibly to multiple. Also, an http url already has a meaning with npm: if a url returns a tarball, then npm will install that tarball.

@terinjokes

@dominictarr I'm not against namespaces, and I think they would be a good addition, but forcing the registry to be the namespace isn't that great. request@npm://registry.cloudflare.com#~2.27.0 is a different package than request@~2.27.0 and one would not ever be used to satisfy the other, and thus won't be deduped when they are in the same tree.

If I fix a bug in foo I want my entire codebase to use the fixed code while waiting for the PR to be merged and a new version to be packaged, not just the small sliver that's my application-specific logic.

@dominictarr
Owner Author

@terinjokes the target usecase of multiple registries isn't really to install your own forks.
that is already satisfied well enough with giturls, instead, it's to make it easy for organizations to use npm to install their own private code that isn't open source.

I think we are after different things here. do you maintain many forked modules?
can you tell me more about your practices here?

@terinjokes

We fork public modules to fix bugs a lot, and have multiple people working on codebases at once. It's easiest to ensure everyone's on the same page if things are published to our internal registry, alongside our private code.

Currently, out of the 52 modules on our private npm, we have 11 forks of public code bases. Most, if not all, have upstream PRs—whether or not the maintainer is responsive is another thing.

@dominictarr
Owner Author

what are you using for your private npm currently?
is it an entire mirror of registry.npmjs.org?

@terinjokes

I'm using terinjokes/docker-npmjs which is using kappa to delegate to registry.npmjs.org if it doesn't exist locally.

@junosuarez

@terinjokes @dominictarr I'd like to mention that the proposals in this thread don't really introduce naming, per se, but rather address assigning an "origin of record", so to speak. Module resolution would still be handled by the module's name, eg the string in package.json#name, the key part of the item in the dependencies object, and the folder name in node_modules/. If you update the source of a dependency from the public npm registry to your own private registry containing your patched fork, as long as the version (eg, with a label) still satisfies the version range in all of the other modules in your dependency tree that depend on the patched module, it is still deduped.

Example:

 jden:test  $ tree
.
├── node_modules
│   ├── a
│   │   ├── node_modules
│   │   │   └── colors
│   │   │       ├── MIT-LICENSE.txt
│   │   │       ├── ReadMe.md
│   │   │       ├── colors.js
│   │   │       ├── example.html
│   │   │       ├── example.js
│   │   │       ├── package.json
│   │   │       ├── test.js
│   │   │       └── themes
│   │   │           ├── winston-dark.js
│   │   │           └── winston-light.js
│   │   └── package.json
│   └── colors
│       ├── MIT-LICENSE.txt
│       ├── ReadMe.md
│       ├── colors.js
│       ├── example.html
│       ├── example.js
│       ├── package.json
│       ├── test.js
│       └── themes
│           ├── winston-dark.js
│           └── winston-light.js
└── package.json

7 directories, 20 files

 jden:test  $ npm dedupe
[email protected] node_modules/colors

 jden:test  $ tree
.
├── node_modules
│   ├── a
│   │   ├── node_modules
│   │   └── package.json
│   └── colors
│       ├── MIT-LICENSE.txt
│       ├── ReadMe.md
│       ├── colors.js
│       ├── example.html
│       ├── example.js
│       ├── package.json
│       ├── test.js
│       └── themes
│           ├── winston-dark.js
│           └── winston-light.js
└── package.json

5 directories, 11 files

 jden:test  $ cat ./package.json | grep colors
    "colors": "git://github.com/jden/colors.js.git"

 jden:test  $ cat ./node_modules/a/package.json | grep colors
    "colors": "~0.6.2"

@terinjokes

@jden What about this scenario, quoted from above?

For example, it's no longer possible to know what will be installed given a package.json. Also, suppose someone writes a bah module in a private registry, but then someone else writes a bah module in the public registry. Now there is no way to use the one from the public registry, and it's also not obvious which registry you are getting any particular module from.

I understood the npm://npmjs.internal/#~1.2.3 syntax to rectify this by namespacing the public bah and the private bah separately.


If I am indeed wrong, it certainly removes most of my objections. Only one remains.

If I fork a module to fix a bug, and this module is only used via public modules I haven't forked, I have no choice but to make my application dependent on the fork until such time as the PR is merged. One of the nice things about npm is that I don't have to micromanage the dependencies of my dependencies, and this certainly breaks that separation.

@junosuarez

@terinjokes

I understood the npm://npmjs.internal/#~1.2.3 syntax to rectify this by namespacing the public bah and the private bah separately.

What I demonstrated above is the result of the current implementation of npm dedupe. The resolution at runtime only cares about a module's location on disk, ie, in node_modules. It would be possible to write an npm dedupe-strict for example that would require modules to originate from the same registry to be eligible for deduping. The only thing that remains then would be that two distinct modules with the same name but from different registries could not be used at the same level in a dependency tree. In practice, it's probably a good idea to have "unique enough" module names to lower the likelihood of this happening.

One of the nice things about npm is that I don't have to micromanage the dependencies of my dependencies, and this certainly breaks that separation.

How so? Simply update the dependency on the module that requires your fork, and everything else that depends on that module will be just happy.
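The difference between the current dedupe behaviour and the hypothetical dedupe-strict mentioned above can be sketched as a single predicate. Everything here is an assumption for illustration: the `origin` field, the `canDedupe` helper, and the version number are invented, and the sketch compares exact versions where real npm dedupe works from semver ranges.

```javascript
// Decide whether two installed copies of a module can be collapsed into one.
// In strict mode they must also come from the same origin (registry or url).
function canDedupe(a, b, { strict = false } = {}) {
  if (a.name !== b.name || a.version !== b.version) return false;
  return strict ? a.origin === b.origin : true;
}

// Illustrative records; the version is assumed, not taken from the thread.
const topLevel = {
  name: 'colors',
  version: '0.6.2',
  origin: 'git://github.com/jden/colors.js.git',
};
const nested = { name: 'colors', version: '0.6.2', origin: 'registry.npmjs.org' };

console.log(canDedupe(topLevel, nested));                   // true
console.log(canDedupe(topLevel, nested, { strict: true })); // false
```

In the non-strict case the fork satisfies every dependent in the tree, which is the behaviour shown in the dedupe transcript earlier in the thread.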

@defunctzombie

If you fix something for a dep you have to fork and update the dep. This is how local module dependencies work. Sometimes it is a pain and other times it isn't.

@terinjokes

So you're saying that if I fix a package (bah) used by something popular (popA), I would have to fork popA to update the dependency array, then fork every module that uses popA to update the dependency array to point at my fork, ad nauseam.

I don't see how this is a good solution and certainly doesn't sound like a productive use of my time.

Right now I publish the fixed version to my private repository. It shadows the package from the public repository, and I can move on with my job. When it's updated on the public repository, I unpublish from my private one (though I'd actually prefer not to have to do this last step, it should shadow only that version).

Again, in the general case, I'm not against the multiple repository format suggested here. I just think there should be another way for me to say "I have patched versions at 'x'. If anyone requests a package satisfied at 'x' use it."

@defunctzombie

If you want to use a shadow package with the same version, sure. But you are not using versioning then, since maybe tomorrow you need to change something else.

This is what it means to have local deps versus global deps.

@terinjokes

Indeed, but I don't have many options that seamlessly work for an entire team of developers, and it was previously asked in this thread that I explain how we currently do things.

@dominictarr
Owner Author

@terinjokes thank you for explaining.
So, my understanding is that you have not mirrored the entire registry, and your private npm only contains your forks and private modules?

The thing about this method that makes me the most uncomfortable is that it's very implicit.
It's just like how $PATH works, and I consider node's system of installing locally, and not really using $PATH, to be one of node's very best innovations. However, clearly, this already works, and we are not suggesting breaking that in any way.

Oh, by the way, it's worth noting that in npmd, the resolve step and the install step are decoupled.
npmd resolve module figures out what the entire dep tree will be, but its output is just json, in the same format as npm shrinkwrap (except it also contains the shasums of the tarballs, if possible).
It would be easy to add a custom dedupe step, or something that replaces a dep, by just dropping a filter into the pipeline.

npmd-resolve whatever | my-custom-dedupe | npmd-install

you could also say, replace modules at any depth

npmd-resolve whatever | npmd-swapdep [email protected] request@npm:cloudflare.com#2.27.0 | npmd-install

(not to say that this is the way you SHOULD do this, just that you COULD do this)

most importantly, you could also store that file,

npmd-resolve whatever | tee deps.json | npmd-install

And then anyone can check exactly what deps were in use when whatever was installed,
and it's still totally explicit. Of course, this is still not to say that @terinjokes's current method wouldn't still work, just pointing out that npmd is very flexible, and it would be easy to experiment with a variety of strategies for module resolution.

I haven't gone out of my way to make npmd work like this,
it's just it was easier to make it work this way.
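A filter of the kind described in this comment could look roughly like the sketch below. The tree shape is an assumption loosely modeled on npm shrinkwrap output, and `swapDep` is a hypothetical function, not the actual behaviour of any npmd command; a real filter would read the JSON from stdin and write the result to stdout.

```javascript
// Assumed shape: { name, version, dependencies: { depName: { version, from, dependencies } } }
// Returns a new tree in which every dependency called `name`, at any depth,
// has the fields in `replacement` merged over it. The input is not mutated.
function swapDep(node, name, replacement) {
  const deps = node.dependencies || {};
  const swapped = {};
  for (const [depName, dep] of Object.entries(deps)) {
    const next = depName === name ? { ...dep, ...replacement } : dep;
    swapped[depName] = swapDep(next, name, replacement);
  }
  return Object.keys(swapped).length
    ? { ...node, dependencies: swapped }
    : { ...node };
}

// Illustrative tree; module names and versions are made up.
const tree = {
  name: 'whatever',
  version: '1.0.0',
  dependencies: {
    popA: { version: '3.1.0', dependencies: { request: { version: '2.27.0' } } },
    request: { version: '2.27.0' },
  },
};

const patched = swapDep(tree, 'request', { from: 'npm:cloudflare.com#2.27.0' });
console.log(JSON.stringify(patched, null, 2));
```

Because the swap happens on the resolved tree rather than in each package.json, the nested copy under popA is redirected without forking popA, and the resulting JSON can still be saved for inspection.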

@junosuarez

This issue is still showing up on my personal "open issues mentioning you" list, limiting its utility. Could you please consider closing it?
