
Change in feed tries to overwrite every file in spool directory #45

Open
tonywhitmore opened this issue Oct 19, 2020 · 4 comments

@tonywhitmore

tonywhitmore commented Oct 19, 2020

A few times now, something has changed in a feed that was otherwise up to date. (I don't know what that "something" is, but it has happened to me on multiple feeds.) Running castget against that feed reports that a file already exists. Deleting that file and re-running castget then downloads the file I just deleted and errors on the next file in the feed. I've pasted an example below. I don't know what causes castget to try to download the files again; perhaps there is a modification date in the feed?

One workaround is to move all the files out of the spool directory, run castget and then move all the files back in. This works until the "something" happens again.

I'm not sure what the best behaviour here would be. Ideally there would be a way for castget to determine whether a file has already been downloaded and not try to re-download it, even if something in the feed relating to that file has changed. Alternatively, a "force-overwrite" option would do it.

tony@azal:~$ castget BabbleOn
Updating channel BabbleOn...
Enclosure file /home/tony/podcasts/BabbleOn/906915376-hollywoodbabbleon-371-caped-commentaries-9.mp3 already exists.
tony@azal:~$ rm /home/tony/podcasts/BabbleOn/906915376-hollywoodbabbleon-371-caped-commentaries-9.mp3
tony@azal:~$ castget BabbleOn
Updating channel BabbleOn...
Enclosure file /home/tony/podcasts/BabbleOn/893438968-hollywoodbabbleon-370-caped-commentaries-8.mp3 already exists.
tony@azal:~$ rm /home/tony/podcasts/BabbleOn/893438968-hollywoodbabbleon-370-caped-commentaries-8.mp3
tony@azal:~$ castget BabbleOn
Updating channel BabbleOn...
Enclosure file /home/tony/podcasts/BabbleOn/890189104-hollywoodbabbleon-369-caped-commentaries-7.mp3 already exists.
tony@azal:~$ castget BabbleOn
Updating channel BabbleOn...
Enclosure file /home/tony/podcasts/BabbleOn/886485790-hollywoodbabbleon-368-caped-commentaries-6.mp3 already exists.

etc. etc.

@mlj
Owner

mlj commented Oct 25, 2020

Hard to say why this is happening, but castget's strategy for determining whether it has already seen a file is a bit naive, maybe too naive.

It just looks at the URL, makes a record of it and ignores that URL if it ever sees it again. Perhaps some feeds change the URL regularly. If this is the case, I am not sure how to detect duplicates (except by redownloading the file and, for example, comparing MD5 sums).

Is the URL of the feed in your example http://feeds.feedburner.com/HollywoodBabbleOnPod? I will subscribe to it myself and see if I can spot what goes wrong :)
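The URL-only strategy described above can be sketched as follows. This is a minimal, hypothetical Python illustration, not castget's actual code: it assumes the local file name is taken from the last path segment of the enclosure URL, and the names `enclosure_filename` and `plan_downloads` as well as the example.com URLs are made up for the example. It shows how a URL rewritten through a redirect service is treated as a new episode while still mapping to a file name that already exists in the spool directory.

```python
import os
from urllib.parse import urlparse


def enclosure_filename(url: str) -> str:
    # Hypothetical naming rule: use the last path segment of the URL.
    return os.path.basename(urlparse(url).path)


def plan_downloads(feed_urls, seen_urls, spool_files):
    """Split feed enclosure URLs into (to_download, clashes).

    Duplicate detection looks only at the exact URL string, so a changed
    URL counts as a new episode even if it names the same file.
    """
    to_download, clashes = [], []
    for url in feed_urls:
        if url in seen_urls:
            continue  # Exact URL recorded before: skip it.
        name = enclosure_filename(url)
        if name in spool_files:
            clashes.append(name)  # "New" URL, but the file already exists.
        else:
            to_download.append(url)
    return to_download, clashes


# Hypothetical example: the feed now routes an old episode through a redirect host.
seen = {"https://media.example.com/ep371.mp3"}
feed = [
    "https://redirect.example.net/media.example.com/ep371.mp3",
    "https://media.example.com/ep372.mp3",
]
print(plan_downloads(feed, seen, spool_files={"ep371.mp3"}))
# (['https://media.example.com/ep372.mp3'], ['ep371.mp3'])
```

Unlike castget, which (going by the report below) seems to stop at the first clash, this sketch collects every clash so the whole feed can be inspected at once.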

@mlj mlj self-assigned this Oct 25, 2020
@tonywhitmore
Author

Hi, yes, that's the feed URL although I've had it happen on a few others too. From memory:

http://feeds.feedburner.com/RichardHerringLSTPodcast
https://audioboom.com/channels/4929797.rss (WonkHE)

I did wonder about the URL changing but the filename staying the same. That seems plausible: perhaps they edit the episode, upload a new file, and the platform they use generates a new URL, which replaces the old one in the feed. I am not using castget's filename rewriting on BabbleOn or WonkHE, but I am on RichardHerringLSTPodcast.

The next time it happens I will try to figure out what has changed in the feed too. Thanks!

@hisaac

hisaac commented Dec 14, 2020

For what it's worth, I've started seeing this issue as well, with multiple feeds.

A couple of examples:

@tonywhitmore
Author

This has just happened again with the Serial feed. It seems that they have updated the URLs in their feed, routing them through an additional redirect service. Because the URL has changed, castget tries to download the file again, but the output filename is the same (whether or not castget's rewriting is used), so castget returns an error. I'm guessing that it stops processing at the first error it finds in a feed?

So a more accurate description of this bug might be that castget doesn't cope well when URLs change in a feed. I am not sure what the best behaviour here would be, though. It could be a command-line switch that lets the user force-overwrite duplicate files, or one that adds URLs producing a filename clash to the XML log file (either silently or with a warning).
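The two behaviours suggested above, plus the MD5 comparison mentioned earlier in the thread, could be expressed as a "clash policy". The Python sketch below is hypothetical: castget has no such option as far as this thread shows, and `handle_clash`, the policy names and the `fetch` callback are illustrative only. `seen_urls` stands in for the record of already-downloaded URLs (the XML log file), and `fetch` stands in for the actual HTTP download.

```python
import hashlib
from pathlib import Path


def handle_clash(url, dest: Path, fetch, seen_urls, policy="mark_seen"):
    """Resolve a new enclosure URL whose destination file already exists.

    fetch(url) stands in for the HTTP download and must return bytes.
    Returns a short status string; seen_urls is updated in place.
    """
    if policy == "mark_seen":
        # Record the URL without downloading, assuming the file is the same episode.
        seen_urls.add(url)
        return "marked"
    if policy == "overwrite":
        # Force-overwrite: replace the existing file unconditionally.
        dest.write_bytes(fetch(url))
        seen_urls.add(url)
        return "overwritten"
    if policy == "compare":
        # Download, then compare MD5 digests with the file already on disk.
        new_digest = hashlib.md5(fetch(url)).hexdigest()
        old_digest = hashlib.md5(dest.read_bytes()).hexdigest()
        if new_digest == old_digest:
            seen_urls.add(url)
            return "identical"
        # Leave both the file and the record untouched so the user can decide.
        return "different"
    raise ValueError(f"unknown policy: {policy!r}")
```

`mark_seen` is the cheapest option but trusts the file name alone. `compare` costs a full download per clash, but if an episode really was re-edited and re-uploaded, it reports `different` instead of silently keeping the old file.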
