Success rate of Amazon HD cover search has dropped sharply #292

Open
anyingxiuluo opened this issue Nov 8, 2024 · 3 comments

Comments

@anyingxiuluo
Contributor

anyingxiuluo commented Nov 8, 2024

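Recently the success rate of searching Amazon for HD images has dropped sharply. I suspect the region change in the code below is failing, but I don't know how to fix it. I hope a maintainer who sees this can take a look.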

mdcx/src/models/base/web.py

Lines 482 to 559 in 1cd7fad

def get_amazon_data(req_url):
    """
    Fetch Amazon data, switching the delivery region to postal code 540-0002.
    """
    headers = {"accept-encoding": "gzip, deflate, br", 'Host': 'www.amazon.co.jp', 'User-Agent': get_user_agent(), }
    try:
        result, html_info = curl_html(req_url)
    except:
        result, html_info = curl_html(req_url, headers=headers)
        session_id = ''
        ubid_acbjp = ''
        if x := re.findall(r'sessionId: "([^"]+)', html_info):
            session_id = x[0]
        if x := re.findall(r'ubid-acbjp=([^ ]+)', html_info):
            ubid_acbjp = x[0]
        headers_o = {'cookie': f'session-id={session_id}; ubid_acbjp={ubid_acbjp}', }
        headers.update(headers_o)
        result, html_info = curl_html(req_url, headers=headers)
    if not result:
        if '503 http' in html_info:
            headers = {'Host': 'www.amazon.co.jp', 'User-Agent': get_user_agent(), }
            result, html_info = get_html(req_url, headers=headers, keep=False, back_cookie=True)
        if not result:
            return False, html_info
    if '540-0002' not in html_info:
        try:
            # Get anti_csrftoken_a2z
            anti_csrftoken_a2z = re.findall(r'anti-csrftoken-a2z([^}]+)', html_info)[0].replace('"', '').strip(':')
            session_id = re.findall(r'sessionId: "([^"]+)', html_info)[0]
            ubid_acbjp = ''
            if 'ubid-acbjp' in str(result):
                try:
                    ubid_acbjp = result['set-cookie']
                except:
                    try:
                        ubid_acbjp = re.findall(r'ubid-acbjp=([^ ]+)', str(result))[0]
                    except:
                        pass
            headers_o = {'Anti-csrftoken-a2z': anti_csrftoken_a2z, 'cookie': f'session-id={session_id}; ubid_acbjp={ubid_acbjp}', }
            headers.update(headers_o)
            mid_url = 'https://www.amazon.co.jp/portal-migration/hz/glow/get-rendered-toaster' \
                      '?pageType=Search&aisTransitionState=in&rancorLocationSource=REALM_DEFAULT&_='
            result, html = curl_html(mid_url, headers=headers)
            try:
                anti_csrftoken_a2z = re.findall(r'csrfToken="([^"]+)', html)[0]
                ubid_acbjp = re.findall(r'ubid-acbjp=([^ ]+)', str(result))[0]
            except:
                pass
            # Change the delivery address to Japan so that more results are returned
            headers_o = {
                'Anti-csrftoken-a2z': anti_csrftoken_a2z,
                'Content-length': '140',
                'Content-Type': 'application/json',
                'cookie': f'session-id={session_id}; ubid_acbjp={ubid_acbjp}',
            }
            headers.update(headers_o)
            post_url = 'https://www.amazon.co.jp/portal-migration/hz/glow/address-change?actionSource=glow'
            data = {"locationType": "LOCATION_INPUT", "zipCode": "540-0002", "storeContext": "generic", "deviceType": "web", "pageType": "Search", "actionSource": "glow"}
            result, html = post_html(post_url, json=data, headers=headers)
            if result:
                if '540-0002' in str(html):
                    headers = {'Host': 'www.amazon.co.jp', 'User-Agent': get_user_agent(), }
                    result, html_info = curl_html(req_url, headers=headers)
                else:
                    # Log message: region change failed
                    print('Amazon 修改地区失败: ', req_url, str(result), str(html))
            else:
                # Log message: region change returned an abnormal result
                print('Amazon 修改地区异常: ', req_url, str(result), str(html))
        except Exception as e:
            # Log message: region change raised an error
            print('Amazon 修改地区出错: ', req_url, str(e))
            print(traceback.format_exc())
    return result, html_info
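
One detail in the snippet above may be worth checking: the value is scraped with the pattern `ubid-acbjp=` (hyphen) but sent back in the cookie header as `ubid_acbjp=` (underscore). Whether this mismatch contributes to the failure is not confirmed anywhere in this thread. The following is only a minimal sketch of extracting the two session values and building the cookie header with the hyphenated name. The helper names `extract_session_values` and `build_cookie_header` and the sample strings are hypothetical, and the capture group is tightened to `[^; ]+` (an assumption) so that a trailing `;` is not included in the value.

```python
import re


def extract_session_values(html_info: str, set_cookie: str = "") -> dict:
    """Pull session-id and ubid-acbjp out of a page body and a Set-Cookie string."""
    values = {"session_id": "", "ubid_acbjp": ""}
    if m := re.search(r'sessionId: "([^"]+)', html_info):
        values["session_id"] = m.group(1)
    # Look in the Set-Cookie string first, then fall back to the page body.
    for text in (set_cookie, html_info):
        if m := re.search(r'ubid-acbjp=([^; ]+)', text):
            values["ubid_acbjp"] = m.group(1)
            break
    return values


def build_cookie_header(values: dict) -> str:
    """Build the cookie header using the hyphenated cookie name (assumption)."""
    return f'session-id={values["session_id"]}; ubid-acbjp={values["ubid_acbjp"]}'


# Made-up sample data, not a real Amazon response.
sample_html = 'var cfg = {sessionId: "355-1234567-7654321"};'
sample_set_cookie = "ubid-acbjp=356-0000000-1111111; Domain=.amazon.co.jp; Path=/"

values = extract_session_values(sample_html, sample_set_cookie)
cookie = build_cookie_header(values)
print(cookie)
```

If the resulting header is substituted into `headers_o`, the `'540-0002'` check in `get_amazon_data` would show whether the region change then succeeds.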

@anyingxiuluo anyingxiuluo changed the title from "Success rate of Amazon HD search has dropped sharply" to "Success rate of Amazon HD cover search has dropped sharply" Nov 9, 2024
@newday-life

Changing the address to Japan no longer works. You now need a Japanese proxy.
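
For readers who want to test this outside the project, a request can be routed through a proxy with the standard library alone. This is a sketch under assumptions: the proxy address is a placeholder, `build_proxy_opener` is a hypothetical helper, and how mdcx itself passes proxy settings to `curl_html` or `get_html` is not shown in this thread.

```python
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder address: replace with the local port of a Japan-based proxy.
PROXY_URL = "http://127.0.0.1:7890"
opener = build_proxy_opener(PROXY_URL)

# Example call (performs a network request, so it is left commented out):
# html = opener.open("https://www.amazon.co.jp/", timeout=10).read().decode("utf-8", "replace")
```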

@anyingxiuluo
Contributor Author

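> Changing the address to Japan no longer works. You now need a Japanese proxy.

I am already using a Japanese proxy, and it still doesn't work.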

@anyingxiuluo
Contributor Author

Amazon HD image scraping no longer works at all, even through a Japanese node.
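
To narrow down which stage fails, the checks that `get_amazon_data` already performs (an empty `result`, the `'503 http'` marker, and the presence of `540-0002` in the page) could be separated into a small classifier. This is a sketch only: `FetchState` and `classify_response` are hypothetical names, and the markers are taken from the quoted code rather than from any documented Amazon behaviour.

```python
from enum import Enum


class FetchState(Enum):
    BLOCKED = "blocked"                        # request failed with the '503 http' marker
    FAILED = "failed"                          # request failed for another reason
    REGION_NOT_APPLIED = "region_not_applied"  # page loaded, postal code absent
    OK = "ok"                                  # page loaded and shows the postal code


def classify_response(result, html_info, zip_code: str = "540-0002") -> FetchState:
    """Classify a (result, html_info) pair the way get_amazon_data inspects it."""
    if not result:
        if "503 http" in str(html_info):
            return FetchState.BLOCKED
        return FetchState.FAILED
    if zip_code not in str(html_info):
        return FetchState.REGION_NOT_APPLIED
    return FetchState.OK


# Made-up inputs illustrating each branch.
print(classify_response(False, "503 http error"))
print(classify_response(True, "<html>deliver to 540-0002</html>"))
print(classify_response(True, "<html>no address set</html>"))
```

Logging this state for each request would show whether the failures reported here come from blocked requests or from the region change not being applied.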
