Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

erepo: --rev BRANCH: shallow clone #4246

Merged
merged 7 commits into from
Jul 22, 2020
Merged

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Jul 20, 2020

  • ❗ I have followed the Contributing to DVC checklist.

  • πŸ“– If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.

  • ❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Follow up to #3585.
Should fix #3473.

  • erepo now clones with --no-single-branch --depth=1 --branch=<branch> when initialized with a valid branch or tag name (i.e. for get/import --rev)
    • if erepo is init'd with for_write=True (i.e. dvcx) we use the old full clone + writeable copy behavior
    • If shallow clone fails, we fallback to default full clone behavior
  • If we reuse an already cloned repo and don't have the necessary revision, we expand the clone with git fetch --unshallow and then treat it like normal full clone

This does not address these other potential changes to erepo clone behavior:

@pmrowla pmrowla self-assigned this Jul 20, 2020
@@ -31,7 +32,7 @@ def external_repo(url, rev=None, for_write=False):
logger.debug("Creating external repo %s@%s", url, rev)
path = _cached_clone(url, rev, for_write=for_write)
if not rev:
rev = "HEAD"
rev = "refs/remotes/origin/HEAD"
Copy link
Contributor Author

@pmrowla pmrowla Jul 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since rev=None is supposed to mean tip on the default branch (master), we want to use origin/HEAD, not HEAD from our local clone working copy. Local HEAD points to the tip of whatever branch we first cloned from (which may not be the default branch), but here we want the tip of the default branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be happy to see such things commented about in dvc code.

Comment on lines 116 to +128
with TqdmGit(desc="Cloning", unit="obj") as pbar:
tmp_repo = git.Repo.clone_from(
clone_from = partial(
git.Repo.clone_from,
url,
to_path,
env=env, # needed before we can fix it in __init__
no_single_branch=True,
progress=pbar.update_git,
)
if shallow_branch is None:
tmp_repo = clone_from()
else:
tmp_repo = clone_from(branch=shallow_branch, depth=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems essentially the same method as #3585 - just checking if @Suor has the same concern here as with https://github.com/iterative/dvc/pull/3585/files#r402728411

Copy link
Contributor Author

@pmrowla pmrowla Jul 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the caller has to explicitly specify that they want a shallow clone now, rather than us making a choice in Git.clone based on the rev parameter (which we don't use when cloning from erepo).

dvc/scm/git.py Outdated Show resolved Hide resolved
@pmrowla pmrowla marked this pull request as ready for review July 20, 2020 08:20
@pmrowla pmrowla requested review from pared, efiop and skshetry July 20, 2020 08:20
Copy link
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a couple of cosmetics.

@@ -31,7 +32,7 @@ def external_repo(url, rev=None, for_write=False):
logger.debug("Creating external repo %s@%s", url, rev)
path = _cached_clone(url, rev, for_write=for_write)
if not rev:
rev = "HEAD"
rev = "refs/remotes/origin/HEAD"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be happy to see such things commented about in dvc code.

Comment on lines 273 to 283
def _unshallow(git):
git.repo.git.fetch(unshallow=True)
if git.repo.head.is_detached:
# If this is a detached head (i.e. we shallow cloned a tag) switch to
# the default branch
origin_refs = git.repo.remotes["origin"].refs
ref = origin_refs["HEAD"].reference
branch_name = ref.name.split("/")[-1]
branch = git.repo.create_head(branch_name, ref)
branch.set_tracking_branch(ref)
branch.checkout()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in this file goes from high level to lower, so this one should be placed after its use.

_unshallow(git)
shallow = False
CLONES[url] = (clone_path, shallow)
logger.debug("erepo: git pull '%s'", url)
git.pull()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need git.pull when we already called .fetch() via _unshallow()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need pull fast-forward behavior in certain situations, but this has been updated to just use a single pull(unshallow=True) call in _unshallow() instead of fetch(unshallow=True) followed by the extra pull()

@Suor
Copy link
Contributor

Suor commented Jul 20, 2020

Also, as I see if we clone the same url more than once then we use full clone. Is this intended? Should we explain our justification in a comment?

* comment shallow clone behavior in erepo
* don't duplicate unnecessary fetch/pull call
@pmrowla
Copy link
Contributor Author

pmrowla commented Jul 20, 2020

Also, as I see if we clone the same url more than once then we use full clone. Is this intended? Should we explain our justification in a comment?

This is commented now before we call _unshallow()

@pmrowla pmrowla merged commit a4b3c6d into iterative:master Jul 22, 2020
@pmrowla pmrowla deleted the shallow-clone branch July 22, 2020 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'dvc get' is cloning the full bitbucket repo when only one branch is needed
4 participants