Wednesday, May 23, 2018, sunny
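Today I wanted to grab a batch of apps to preinstall on a phone for automated testing, and thought of the store I normally use, Tencent's Yingyongbao (应用宝). Looking at the web page, the download is triggered by JavaScript rather than a plain URL, and the generated URL seems to have some anti-scraping protection. That reminded me of selenium, so I tried it, and it worked well.
The code, written in Python 3, is as follows: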
[code]
import re
import time
import urllib.request


def get_url_data(url):
    """Fetch url and return the decoded body, or None after 5 failed attempts."""
    nFail = 0
    while nFail < 5:
        try:
            with urllib.request.urlopen(url) as file:
                return file.read().decode('utf-8')
        except Exception:
            nFail += 1
            print("get url fail:{url} count={nFail}".format(url=url, nFail=nFail))
    print("get url fail:{url}".format(url=url))
    return None


def getTop100():
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    url = 'http://mapp.qzone.qq.com/cgi-bin/mapp/mapp_applist?apptype=soft_top&pageNo=1&pageSize=120&platform=touch&network_type=undefined&resolution=1920x1080'
    # XPath of the download link on the app detail page
    xpath = '//*[@id="detail"]/section/div/div[2]/a[1]'
    # Raw string so that \d is not treated as a string escape
    idstr = r'"id":(\d+)'
    idobj = re.compile(idstr, re.M | re.DOTALL)
    rsp = get_url_data(url)
    if rsp:
        ids = idobj.findall(rsp)
        for app_id in ids:
            again = 1
            url = 'http://app.qq.com/#id=detail&appid=%s' % str(app_id)
            # Keep retrying this app until the click succeeds
            while again:
                try:
                    driver.get(url)
                    WebDriverWait(driver, 30).until(
                        EC.presence_of_element_located((By.ID, 'detail')))
                    # find_element_by_xpath was removed in Selenium 4;
                    # find_element(By.XPATH, ...) works in Selenium 3 and 4
                    download = driver.find_element(By.XPATH, xpath)
                    print(download)
                    download.click()
                    again = 0
                except Exception as e:
                    print(e)
                    time.sleep(10)  # wait a bit before retrying


if __name__ == "__main__":
    getTop100()
[/code]
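To make the ID extraction step easier to follow, here is a small sketch that applies the same regular expression to a made-up response fragment and builds the detail-page URLs. The sample JSON and the helper name `build_detail_urls` are assumptions for illustration only, not actual server output or part of the original script.

```python
import re

# Same pattern as in the script above: capture the digits that follow "id":
ID_PATTERN = re.compile(r'"id":(\d+)')


def build_detail_urls(response_text):
    """Return a detail-page URL for every app id found in response_text."""
    ids = ID_PATTERN.findall(response_text)
    return ['http://app.qq.com/#id=detail&appid=%s' % app_id for app_id in ids]


# Hypothetical fragment, for illustration only; the real response has more fields.
sample = '{"app":[{"id":12345,"name":"a"},{"id":678,"name":"b"}]}'

for url in build_detail_urls(sample):
    print(url)
```

Note that the pattern matches any `"id":` key in the text, so if the response contains ids that are not app ids, they would be picked up as well.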
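Later, when I looked more closely at the response returned by the TOP100 list request, it turned out that it already contains the apk URLs directly...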
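If the list response does carry the apk links, they could probably be pulled out without selenium at all. The following is only a rough sketch under assumptions: I have not recorded the actual field name, so it simply searches the text for anything that looks like an `.apk` URL, and it assumes the response may be JSON with slashes escaped as `\/`. The sample string, the `apkUrl` key, and the function name `extract_apk_urls` are all hypothetical.

```python
import re

# Matches http(s) URLs that contain ".apk", plus an optional query string.
# The match stops at double quotes, whitespace and backslashes.
APK_URL_PATTERN = re.compile(r'https?://[^"\s\\]+?\.apk(?:\?[^"\s\\]*)?')


def extract_apk_urls(response_text):
    """Return the distinct .apk URLs found in response_text, in order."""
    # Assumption: JSON output may escape "/" as "\/", so undo that first.
    text = response_text.replace('\\/', '/')
    seen = set()
    urls = []
    for url in APK_URL_PATTERN.findall(text):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical response fragment; "apkUrl" is a made-up key, not the real one.
sample = (r'{"app":[{"id":1,"apkUrl":"http:\/\/example.com\/a\/demo.apk?fsname=demo"},'
          r'{"id":2,"apkUrl":"http://example.com/b.apk"}]}')

for apk_url in extract_apk_urls(sample):
    print(apk_url)
```

Each URL found this way could then be passed to `urllib.request.urlretrieve` to save the file, provided the server allows direct downloads.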
...