Wednesday, May 23, 2018. Sunny.

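Today I wanted to grab some apps to preinstall on a phone for automated testing, and thought of Tencent's Yingyongbao (应用宝), the app store I usually use. Looking at the web page, the download is triggered by JavaScript rather than being a plain URL, and the generated URL seems to have some anti-scraping protection. So I thought of selenium, gave it a try, and it worked well.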

The code, written in Python 3, is as follows:

[code]
import re
import time
import urllib.request

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def get_url_data(url):
    """Fetch url and return the decoded body, retrying up to 5 times."""
    nFail = 0
    while nFail < 5:
        try:
            with urllib.request.urlopen(url) as file:
                return file.read().decode('utf-8')
        except Exception:
            nFail += 1
            print("get url fail:{url} count={nFail}".format(url=url, nFail=nFail))
    print("get url fail:{url}".format(url=url))
    return None


def getTop100():
    driver = webdriver.Chrome()
    url = 'http://mapp.qzone.qq.com/cgi-bin/mapp/mapp_applist?apptype=soft_top&pageNo=1&pageSize=120&platform=touch&network_type=undefined&resolution=1920x1080'
    xpath = '//*[@id="detail"]/section/div/div[2]/a[1]'

    # App ids appear in the list response as "id":<digits>
    idobj = re.compile(r'"id":(\d+)')

    rsp = get_url_data(url)
    if rsp:
        ids = idobj.findall(rsp)
        for app_id in ids:
            again = 1
            url = 'http://app.qq.com/#id=detail&appid=%s' % str(app_id)
            # Keep retrying this app until the download button has been clicked
            while again:
                try:
                    driver.get(url)
                    WebDriverWait(driver, 30).until(
                        EC.presence_of_element_located((By.ID, 'detail')))
                    download = driver.find_element(By.XPATH, xpath)
                    print(download)
                    download.click()
                    again = 0
                except Exception as e:
                    print(e)
    # Pause before returning (presumably so the last download can start)
    time.sleep(10)


if __name__ == "__main__":
    getTop100()
[/code]

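Later, looking more closely at the response returned when fetching the TOP 100 app list, I found that it directly provides the apk URLs...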
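
That simpler route could look roughly like the sketch below. It assumes only that the apk links show up in the response text as ordinary http(s) URLs ending in `.apk`. The actual field names in the response are not shown in this post, so the helper name `extract_apk_urls` and the sample response are hypothetical.

```python
import re

# Matches http(s) URLs whose path ends in .apk, with an optional query string.
APK_URL_RE = re.compile(r'https?://[^\s"\'<>]+?\.apk(?:\?[^\s"\'<>]*)?')


def extract_apk_urls(rsp):
    """Return the unique .apk URLs found in rsp, in order of first appearance."""
    # JSON bodies often escape slashes as "\/"; undo that before matching.
    text = rsp.replace('\\/', '/')
    seen = set()
    urls = []
    for url in APK_URL_RE.findall(text):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical response fragment, for illustration only (not the real format).
sample = ('{"id":123,"url":"http:\\/\\/example.com\\/a\\/demo.apk?fsname=demo"},'
          '{"id":456,"url":"http://example.com/b/other.apk"}')
print(extract_apk_urls(sample))
```

Each URL could then be handed to `urllib.request.urlretrieve` to save the file, without needing a browser at all. Whether the server accepts such a plain request is untested here.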