wayback_machine_downloader get lots Connection refused #264
Comments
+1 |
It looks like they're using rate limiting on requests. I cracked open the gem and put a random 3-10s delay in |
Thanks, I just did this too and it's working again. |
How can I go about doing this? |
Find the location of your gem files using.
Then cd into that directory and go into /gems/wayback_machine_downloader-2.3.1. Once you are in that directory, open the gem in VSCode or any code editor. Then, I added this code to the method.
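The snippet originally posted here is not preserved in this thread. A minimal sketch of the kind of change described above (a random 3 to 10 second pause before each request) might look like the following. `random_delay` and its `sleeper:` parameter are hypothetical names used for illustration and are not part of the gem.

```ruby
# Hypothetical helper, NOT part of wayback_machine_downloader.
# Pauses for a random duration between min_s and max_s seconds
# and returns the number of seconds it asked to sleep.
# `sleeper` is injectable so the pause can be replaced in tests.
def random_delay(min_s = 3.0, max_s = 10.0, sleeper: Kernel.method(:sleep))
  seconds = rand(min_s..max_s) # random Float within the given range
  sleeper.call(seconds)
  seconds
end
```

Calling `random_delay` at the start of the gem's per-file download method, before the HTTP request is made, would be one way to apply it. Exactly where that call belongs depends on the gem version you have installed.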
|
Sorry to be a hassle, but there is a lib and a bin folder. Which one would I put it under, and where in the file would I put it? https://cdn2.noschool.work/u/bbxwZLFV6O8arNJPmarl.png
|
You want to edit the file here |
It still isn't working. This is what it's showing: https://cdn2.noschool.work/u/zxPMiLtqgBV3oWvFrJFP.png |
Thanks, but I don't have this problem with wget. Is there no rate limiting on wget requests? |
I'm facing the same issue. Is there any solution? |
Since 2019 they have been limiting requests to 15 per minute: https://archive.org/details/toomanyrequests_20191110 Therefore, adding a static 4-second delay (60 seconds / 15 requests) works to avoid any connection refused errors. |
Currently a bit of a hack - there should probably be a configurable delay parameter.
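The 4-second figure follows from 60 s / 15 requests = 4 s per request. A minimal sketch of what a configurable delay could look like is below. The `Throttle` class, its `requests_per_minute:` option, and the injectable `clock:` and `sleeper:` parameters are assumptions made for illustration; none of them exist in the gem.

```ruby
# Hypothetical throttle, NOT part of wayback_machine_downloader.
# Spaces successive calls to #wait at least `interval` seconds apart.
class Throttle
  attr_reader :interval

  def initialize(requests_per_minute: 15,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: Kernel.method(:sleep))
    @interval = 60.0 / requests_per_minute # 15 per minute -> 4.0 seconds
    @clock = clock
    @sleeper = sleeper
    @last = nil
    @lock = Mutex.new # serializes callers when several threads download at once
  end

  # Blocks until at least `interval` seconds have passed since the
  # previous call, then records the time this request was released.
  def wait
    @lock.synchronize do
      now = @clock.call
      if @last
        remaining = @interval - (now - @last)
        if remaining > 0
          @sleeper.call(remaining)
          now += remaining
        end
      end
      @last = now
    end
  end
end
```

One `Throttle` instance would be shared by all download threads, with `wait` called before each request. Because the lock is held while sleeping, the limit applies to the whole process rather than per thread, which matters when running with `--concurrency`.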
Last month, wayback_machine_downloader ran without problems, but starting yesterday, every domain name I tried returned Connection refused.
The command was: wayback_machine_downloader http://huzhan.com --concurrency 3 -t 20220525005404 -a
The corresponding output looked like this:
https://www.huzhan.com/code/goods377071.html -> websites/huzhan.com/code/goods377071.html (280/112619)
https://www.huzhan.com/serve/goods14529.html -> websites/huzhan.com/serve/goods14529.html (281/112619)
https://www.huzhan.com/serve/goods12899.html # Connection refused - connect(2)
https://www.huzhan.com/serve/goods12899.html -> websites/huzhan.com/serve/goods12899.html (282/112619)
https://www.huzhan.com/ishop42980/ # Connection refused - connect(2)
https://www.huzhan.com/ishop42980/ -> websites/huzhan.com/ishop42980/index.html (283/112619)
https://www.huzhan.com/code/goods421671.html # Connection refused - connect(2)
https://www.huzhan.com/code/goods421671.html -> websites/huzhan.com/code/goods421671.html (284/112619)
https://www.huzhan.com/serve/goods15588.html # Connection refused - connect(2)
https://www.huzhan.com/serve/goods15588.html -> websites/huzhan.com/serve/goods15588.html (285/112619)
https://www.huzhan.com/serve/goods15287.html # Connection refused - connect(2)
https://www.huzhan.com/serve/goods15287.html -> websites/huzhan.com/serve/goods15287.html (286/112619)
https://www.huzhan.com/code/goods420832.html # Connection refused - connect(2)
https://www.huzhan.com/code/goods420832.html -> websites/huzhan.com/code/goods420832.html (287/112619)
https://www.huzhan.com/ishop37725/ # Connection refused - connect(2)
https://www.huzhan.com/ishop37725/ -> websites/huzhan.com/ishop37725/index.html (288/112619)
https://www.huzhan.com/code/goods372252.html # Connection refused - connect(2)
https://www.huzhan.com/code/goods372252.html -> websites/huzhan.com/code/goods372252.html (289/112619)
https://www.huzhan.com/code/goods418192.html # Connection refused - connect(2)
https://www.huzhan.com/ishop21789/ # Connection refused - connect(2)
https://www.huzhan.com/code/goods418192.html -> websites/huzhan.com/code/goods418192.html (290/112619)
https://www.huzhan.com/ishop21789/ -> websites/huzhan.com/ishop21789/index.html (291/112619)
https://www.huzhan.com/code/goods354759.html # Connection refused - connect(2)
https://www.huzhan.com/code/goods354759.html -> websites/huzhan.com/code/goods354759.html (292/112619)
https://www.huzhan.com/code/goods421676.html # Connection refused - connect(2)
https://www.huzhan.com/code/goods421676.html -> websites/huzhan.com/code/goods421676.html (293/112619)
https://www.huzhan.com/code/goods412576.html # Connection refused - connect(2)
https://www.huzhan.com/ishop40294/ # Connection refused - connect(2)
https://www.huzhan.com/code/goods412576.html -> websites/huzhan.com/code/goods412576.html (294/112619)
https://www.huzhan.com/ishop40294/ -> websites/huzhan.com/ishop40294/index.html (295/112619)
https://www.huzhan.com/ishop40283/ # Connection refused - connect(2)
https://www.huzhan.com/ishop40283/ -> websites/huzhan.com/ishop40283/index.html (296/112619)
https://www.huzhan.com/serve/goods15226.html # Connection refused - connect(2)
https://www.huzhan.com/serve/goods15226.html -> websites/huzhan.com/serve/goods15226.html (297/112619)
https://www.huzhan.com/ishop44505/ # Connection refused - connect(2)
https://www.huzhan.com/ishop44505/ -> websites/huzhan.com/ishop44505/index.html (298/112619)
https://www.huzhan.com/code/goods410194.html # Connection refused - connect(2)
https://www.huzhan.com/code/goods410194.html -> websites/huzhan.com/code/goods410194.html (299/112619)
https://www.huzhan.com/ishop41272/ # Connection refused - connect(2)
https://www.huzhan.com/serve/goods15735.html # Connection refused - connect(2)
https://www.huzhan.com/ishop41272/ -> websites/huzhan.com/ishop41272/index.html (300/112619)
https://www.huzhan.com/serve/goods15735.html -> websites/huzhan.com/serve/goods15735.html (301/112619)
https://www.huzhan.com/code/goods420725.html # Connection refused - connect(2)
https://www.huzhan.com/code/goods420725.html -> websites/huzhan.com/code/goods420725.html (302/112619)
https://www.huzhan.com/ishop43261/ # Connection refused - connect(2)
https://www.huzhan.com/ishop43261/ -> websites/huzhan.com/ishop43261/index.html (303/112619)
https://www.huzhan.com/serve/goods15565.html # Connection refused - connect(2)
https://www.huzhan.com/serve/goods15565.html -> websites/huzhan.com/serve/goods15565.html (304/112619)
https://www.huzhan.com/ishop44358/ # Connection refused - connect(2)
https://www.huzhan.com/ishop44358/ -> websites/huzhan.com/ishop44358/index.html (305/112619)
https://www.huzhan.com/code/page/4 # Connection refused - connect(2)
https://www.huzhan.com/ishop7456/ # Connection refused - connect(2)
As a result, many of the downloaded files are empty. Did the archive website implement controls to prevent crawling? I can access it normally using a browser, and I can also obtain the files with wget. Thank you for following this issue!