Apparently imports are re-downloaded by fetch/pull (BTW this is undocumented, see iterative/dvc.org/issues/1792). I tested this and it seems even if you have the latest data in cache, the repo is cloned and the files are downloaded, overwriting the cached version. This seems unnecessary and could be an issue for large files/dirs. Shouldn't DVC be able to tell you already have the latest version without downloading the file? We have the commit hash, md5, etag, and checksum fields for that.
I'm not sure that this is the case though. I tested with both Git-tracked and DVC-tracked files, and the verbose output seems to indicate that the data file is downloaded, at least for Git imports.
Git imports
λ dvc fetch -v
2020-09-22 14:10:43,975 DEBUG: Check for update is enabled.
2020-09-22 14:10:44,064 DEBUG: Trying to spawn '['daemon', '-q', 'updater']'
2020-09-22 14:10:47,941 DEBUG: Spawned '['daemon', '-q', 'updater']'
2020-09-22 14:10:47,951 DEBUG: fetched: [(3,)]
2020-09-22 14:10:48,020 DEBUG: Creating external repo ../test@35117e1c2c941edf8e50511b9f69b3f848497846
2020-09-22 14:10:48,026 DEBUG: erepo: git clone '../test' to a temporary dir
2020-09-22 14:10:49,164 DEBUG: Saving '..\..\AppData\Local\Temp\tmpv7lxhlb9dvc-clone\code' to '.dvc\cache\78\44a93ad4b97169834dade975b5beff'.
2020-09-22 14:10:49,169 DEBUG: Assuming 'C:\Users\poj12\DVC-repos\test2\.dvc\cache\78\44a93ad4b97169834dade975b5beff' is unchanged since it is read-only
2020-09-22 14:10:49,246 DEBUG: fetched: [(6,)]
Everything is up to date.
The DEBUG: erepo: git clone and DEBUG: Saving messages seem to indicate that the source repo was cloned and the Git-tracked file was overwritten in the cache.
However, the DEBUG: Assuming message only appears if the file is already in cache, and I can't tell exactly what it means. If I remove the file from cache and workspace first, I get DEBUG: cache '...5beff' expected 'HashInfo(name='md5', value='...beff', dir_info=None)' actual 'None' instead.
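To make the question concrete, here is a minimal sketch of the kind of check I would expect to happen before any clone or download. It is not DVC's actual code: the helper names (`cache_path`, `file_md5`, `needs_download`) are hypothetical, and the cache layout (first two hex characters of the md5 as a directory, the rest as the file name) is only inferred from the paths in the log above.

```python
import hashlib
from pathlib import Path


def cache_path(cache_dir: Path, md5: str) -> Path:
    """Map an md5 hex digest to a path in a content-addressed cache.

    Assumed layout (inferred from the log): <cache>/<first 2 chars>/<rest>.
    """
    return cache_dir / md5[:2] / md5[2:]


def file_md5(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the md5 of a file in chunks, so large files are not loaded at once."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(cache_dir: Path, expected_md5: str, verify: bool = False) -> bool:
    """Return True if the file with this md5 is missing from the cache.

    With verify=True, an existing entry is also re-hashed and treated as
    missing if its content no longer matches the expected md5.
    """
    cached = cache_path(cache_dir, expected_md5)
    if not cached.is_file():
        return True
    if verify:
        return file_md5(cached) != expected_md5
    return False
```

With the md5 already recorded for the import, something like `needs_download(Path(".dvc/cache"), "7844a93ad4b97169834dade975b5beff")` could in principle decide whether the clone and save steps are needed at all.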
DVC imports
λ dvc fetch -v
2020-09-22 14:39:51,369 DEBUG: Check for update is enabled.
2020-09-22 14:39:51,443 DEBUG: Trying to spawn '['daemon', '-q', 'updater']'
2020-09-22 14:39:54,944 DEBUG: Spawned '['daemon', '-q', 'updater']'
2020-09-22 14:39:54,963 DEBUG: fetched: [(3,)]
2020-09-22 14:39:54,993 DEBUG: Creating external repo ../test2@7c029062ba91f9f8d30dbd30dd21ae3c762cc5b1
2020-09-22 14:39:55,000 DEBUG: erepo: git clone '../test2' to a temporary dir
2020-09-22 14:39:56,930 DEBUG: Saving '..\..\AppData\Local\Temp\tmp41v5wmkldvc-clone\data' to '.dvc\cache\61\37cde4893c59f76f005a8123d8e8e6'.
2020-09-22 14:39:56,938 DEBUG: Assuming 'C:\Users\poj12\DVC-repos\test\.dvc\cache\61\37cde4893c59f76f005a8123d8e8e6' is unchanged since it is read-only
2020-09-22 14:39:57,038 DEBUG: fetched: [(2,)]
Again, it seems to indicate that the file is saved from the tmp repo clone — which seems weird since it's not tracked by Git... The -v output is quite different if I remove the data from workspace and cache first, and actually mentions downloading from the remote, so I'm not sure what's happening here.
Maybe it's just a confusion on my part, not understanding the debug output. Just wanted to double-check. Sorry for extra long question!
Please provide information about your setup
DVC 1.7.2 (exe) on Windows