-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
erepo: --rev BRANCH: shallow clone #4246
Conversation
@@ -31,7 +32,7 @@ def external_repo(url, rev=None, for_write=False): | |||
logger.debug("Creating external repo %s@%s", url, rev) | |||
path = _cached_clone(url, rev, for_write=for_write) | |||
if not rev: | |||
rev = "HEAD" | |||
rev = "refs/remotes/origin/HEAD" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since rev=None
is supposed to mean tip on the default branch (master), we want to use origin/HEAD, not HEAD from our local clone working copy. Local HEAD points to the tip of whatever branch we first cloned from (which may not be the default branch), but here we want the tip of the default branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be happy to see such things commented about in dvc code.
with TqdmGit(desc="Cloning", unit="obj") as pbar: | ||
tmp_repo = git.Repo.clone_from( | ||
clone_from = partial( | ||
git.Repo.clone_from, | ||
url, | ||
to_path, | ||
env=env, # needed before we can fix it in __init__ | ||
no_single_branch=True, | ||
progress=pbar.update_git, | ||
) | ||
if shallow_branch is None: | ||
tmp_repo = clone_from() | ||
else: | ||
tmp_repo = clone_from(branch=shallow_branch, depth=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems essentially the same method as #3585 - just checking if @Suor has the same concern here as with https://github.com/iterative/dvc/pull/3585/files#r402728411
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case the caller has to explicitly specify that they want a shallow clone now, rather than us making a choice in Git.clone
based on the rev
parameter (which we don't use when cloning from erepo).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a couple of cosmetics.
@@ -31,7 +32,7 @@ def external_repo(url, rev=None, for_write=False): | |||
logger.debug("Creating external repo %s@%s", url, rev) | |||
path = _cached_clone(url, rev, for_write=for_write) | |||
if not rev: | |||
rev = "HEAD" | |||
rev = "refs/remotes/origin/HEAD" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be happy to see such things commented about in dvc code.
dvc/external_repo.py
Outdated
def _unshallow(git): | ||
git.repo.git.fetch(unshallow=True) | ||
if git.repo.head.is_detached: | ||
# If this is a detached head (i.e. we shallow cloned a tag) switch to | ||
# the default branch | ||
origin_refs = git.repo.remotes["origin"].refs | ||
ref = origin_refs["HEAD"].reference | ||
branch_name = ref.name.split("/")[-1] | ||
branch = git.repo.create_head(branch_name, ref) | ||
branch.set_tracking_branch(ref) | ||
branch.checkout() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in this file goes from high level to lower, so this one should be placed after its use.
dvc/external_repo.py
Outdated
_unshallow(git) | ||
shallow = False | ||
CLONES[url] = (clone_path, shallow) | ||
logger.debug("erepo: git pull '%s'", url) | ||
git.pull() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need git.pull
when we already called .fetch()
via _unshallow()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need pull fast-forward behavior in certain situations, but this has been updated to just use a single pull(unshallow=True)
call in _unshallow()
instead of fetch(unshallow=True)
followed by the extra pull()
Also, as I see if we clone the same url more than once then we use full clone. Is this intended? Should we explain our justification in a comment? |
* comment shallow clone behavior in erepo * don't duplicate unnecessary fetch/pull call
This is commented now before we call |
β I have followed the Contributing to DVC checklist.
π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
β I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
Thank you for the contribution - we'll try to review it as soon as possible. π
Follow up to #3585.
Should fix #3473.
--no-single-branch --depth=1 --branch=<branch>
when initialized with a valid branch or tag name (i.e. forget/import --rev
)for_write=True
(i.e. dvcx) we use the old full clone + writeable copy behaviorgit fetch --unshallow
and then treat it like normal full cloneThis does not address these other potential changes to erepo clone behavior: