Python Web Scraping: Fetching Data from a Mobile App

Knowledge 08-29

Published on the WeChat official account Python開發.

Summary

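Most apps return their data as JSON, or as a blob of encrypted data. This article uses the 超級課程表 app (a course-timetable app) as an example and scrapes the topics its users post.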

Capturing the app's packets

The capture method is described in detail in this blog post:

http://my.oschina.net/jhao104/blog/605963

Capturing the login request gives the 超級課程表 login URL:

http://120.55.151.61/V2/StudentSkip/loginCheckV4.action

The form:

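The form contains the username and password, both already encrypted by the app, plus some device information. Nothing needs to be decrypted: simply POST the captured values as they are.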
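To see what the captured form actually contains, the body can be split into its fields. This is a small Python 3 sketch using only the standard library's `urllib.parse`; the body string is the same captured login form used in this article's login code. Re-encoding the fields is just a convenience for editing values, not something the app requires.

```python
from urllib.parse import parse_qsl, urlencode

# Login form body as captured from the app (account and password are
# already encrypted by the app; they are sent as-is, never decrypted).
captured = (
    "phoneBrand=Meizu&platform=1&deviceCode=868033014919494"
    "&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16"
    "&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket"
    "&phoneModel=M040&versionNumber=7.2.1&"
)

# Split into name/value pairs; the empty segment after the trailing '&' is skipped.
fields = dict(parse_qsl(captured))
for name, value in fields.items():
    print(f"{name:<14} {value}")

# Re-encoding the same pairs reproduces the body (without the trailing '&'),
# so a field such as phoneModel can be changed without hand-editing the string.
rebuilt = urlencode(fields)
```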

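Headers are also required. My first attempt sent no headers and the server answered with a login error, so the request has to carry the same header information as the app.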
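For readers on Python 3, here is a minimal sketch of attaching the captured headers, assuming Python 3's `urllib.request` (the article's own code uses Python 2's `urllib2`). It only builds the request object and sends nothing; the shortened body is a hypothetical placeholder.

```python
from urllib.request import Request

LOGIN_URL = "http://120.55.151.61/V2/StudentSkip/loginCheckV4.action"

# Headers copied from the captured app request (subset shown).
HEADERS = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "User-Agent": "Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)",
}

# Hypothetical, shortened body for this demo; a real login sends the full captured form.
body = b"phoneBrand=Meizu&platform=1"

# Supplying data makes urllib send a POST; nothing is sent until the request is opened.
req = Request(LOGIN_URL, data=body, headers=HEADERS)

print(req.get_method())  # POST
# urllib stores header names with str.capitalize(), e.g. "User-agent".
for name, value in req.header_items():
    print(name, "=", value)
```

There is no `Content-length` header on the object yet. urllib adds it from `len(data)` when the request is actually sent, so copying the captured value by hand is unnecessary.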

Logging in

Login code (Python 2.7):

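import urllib2
from cookielib import CookieJar

loginUrl = "http://120.55.151.61/V2/StudentSkip/loginCheckV4.action"

# Headers copied from the captured request. Two captured headers are left out:
# Content-Length, which urllib2 computes from the body (a fixed value is wrong
# as soon as the body changes), and Accept-Encoding: gzip, because urllib2
# does not decompress gzipped responses.
headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "User-Agent": "Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)",
    "Host": "120.55.151.61",
    "Connection": "Keep-Alive",
}

# Captured form body: account and password are already encrypted by the app
loginData = "phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&"

# The CookieJar keeps any cookies the login response sets (such as a session
# cookie) and sends them on later requests made through the same opener
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
print loginResult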
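The captured headers include `Accept-Encoding: gzip`. Neither Python 2's `urllib2` nor Python 3's `urllib.request` decompresses responses automatically, so if that header is kept and the server honors it, `json.loads` would receive compressed bytes. A minimal Python 3 sketch of handling that case (`decode_json_body` is a hypothetical helper name; `gzip.decompress` is Python 3 only):

```python
import gzip
import json
from typing import Any, Optional


def decode_json_body(body: bytes, content_encoding: Optional[str]) -> Any:
    """Decode a response body as JSON, gunzipping it first if the server says so."""
    if content_encoding and content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


# Offline demo: a gzip-compressed JSON body like a server might return.
compressed = gzip.compress(json.dumps({"status": 1}).encode("utf-8"))
print(decode_json_body(compressed, "gzip"))  # {'status': 1}
```

With a live Python 3 response object `resp`, the call would be `decode_json_body(resp.read(), resp.headers.get("Content-Encoding"))`.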

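A successful login returns a JSON string containing the account information.

It matches the response seen during packet capture, which confirms that the login worked.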
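Comparing by eye works once, but a script needs a programmatic check. This Python 3 sketch assumes that a successful response is a JSON object with a non-empty `"data"` field, the same field the final script in this article tests. The sample responses are invented for illustration.

```python
import json


def login_succeeded(raw: bytes) -> bool:
    """Heuristic success check for a login response.

    Assumption: success means a JSON object whose "data" field holds the
    account information; anything else counts as a failure.
    """
    try:
        payload = json.loads(raw)
    except ValueError:  # not JSON at all, e.g. an HTML error page
        return False
    return isinstance(payload, dict) and bool(payload.get("data"))


# Hypothetical responses for illustration only
print(login_succeeded(b'{"data": {"nickName": "demo"}}'))  # True
print(login_succeeded(b'{"data": null}'))                  # False
print(login_succeeded(b"<html>error</html>"))              # False
```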

Fetching the data

The topic URL and its POST parameters are obtained the same way, by capturing the app's traffic.
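The topic request body differs between pages only in its `timestamp` field, so it can be generated instead of being copied by hand for each page. This is a Python 3 sketch; the fixed field values are the ones from the captured topic request in this article, and `build_topic_body` is a hypothetical helper. The article's first request uses `timestamp=0`.

```python
from urllib.parse import urlencode

# Fixed fields copied from the captured topic request.
TOPIC_FIELDS = {
    "phoneBrand": "Meizu",
    "platform": "1",
    "genderType": "-1",
    "topicId": "19",
    "phoneVersion": "16",
    "selectType": "3",
    "channel": "MXMarket",
    "phoneModel": "M040",
    "versionNumber": "7.2.1",
}


def build_topic_body(timestamp: int) -> str:
    """Form body for one page of topics at the given timestamp cursor."""
    # timestamp goes first to mirror the field order of the captured request
    return urlencode({"timestamp": timestamp, **TOPIC_FIELDS})


print(build_topic_body(0))
```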

After that, the process is the same as simulating a login to a website. For details, see:

http://my.oschina.net/jhao104/blog/547311

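The final code follows. It fetches the first page of topics and then emulates the app's pull-to-load-more, requesting older topics page after page, so it can keep loading topic content.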
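Loading more pages is cursor-based pagination: each response carries the timestamp to send with the next request. Here is a Python 3 sketch of following that cursor with a loop. It assumes that an empty page, or a cursor that stops moving, means there is nothing older left. `fetch_page` is a hypothetical stand-in for the HTTP call, and the fake pages exist only for the offline demo.

```python
from typing import Callable, Dict, List, Tuple

# A page is (next_timestamp, topics).
Page = Tuple[int, List[dict]]


def load_all(fetch_page: Callable[[int], Page], start: int = 0,
             max_pages: int = 100) -> List[dict]:
    """Follow the timestamp cursor page by page, with a hard cap on pages."""
    topics: List[dict] = []
    timestamp = start
    for _ in range(max_pages):
        next_ts, page = fetch_page(timestamp)
        if not page:               # empty page: assumed end of data
            break
        topics.extend(page)
        if next_ts == timestamp:   # cursor did not move: stop instead of looping forever
            break
        timestamp = next_ts
    return topics


# Offline demo with three fake pages keyed by timestamp
FAKE: Dict[int, Page] = {
    0: (300, [{"messageId": 1}, {"messageId": 2}]),
    300: (200, [{"messageId": 3}]),
    200: (100, []),
}
result = load_all(lambda ts: FAKE[ts])
print([t["messageId"] for t in result])  # [1, 2, 3]
```

A loop is used instead of recursion because each recursive call adds a stack frame, and Python stops at its recursion limit (1000 by default) long before "infinite" loading.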

#!/usr/local/bin/python2.7
# -*- coding: utf8 -*-

""" 超級課程表 topic scraper """

import urllib2
from cookielib import CookieJar
import json


def fetch_data(json_data):
    """ Read the JSON of one page of topics.

    Returns the timestamp for requesting the next (older) page and a list
    of topic dicts.
    """
    data = json_data["data"]
    timestampLong = data["timestampLong"]
    messageBO = data["messageBOs"]
    topicList = []
    for each in messageBO:
        # Skip messages that have no text content
        if each.get("content", False):
            topicDict = {}
            topicDict["content"] = each["content"]
            topicDict["schoolName"] = each["schoolName"]
            topicDict["messageId"] = each["messageId"]
            topicDict["gender"] = each["studentBO"]["gender"]
            topicDict["time"] = each["issueTime"]
            print each["schoolName"], each["content"]
            topicList.append(topicDict)
    return timestampLong, topicList


def load(timestamp, headers, url):
    """ Load more: emulate the app's pull-to-load-more.

    A loop rather than recursion, so long runs do not hit Python's recursion
    limit. Stops when a request fails, when a page comes back empty, or when
    the timestamp stops changing (the last two are assumed end-of-data signals).
    """
    allTopics = []
    while True:
        loadData = "timestamp=%s&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&" % timestamp
        req = urllib2.Request(url, loadData, headers)
        loadResult = opener.open(req).read()
        loadJson = json.loads(loadResult)
        if loadJson.get("status", False) != 1:
            print "load fail"
            print loadResult
            return allTopics
        print "load successful!"
        if not loadJson["data"]["messageBOs"]:
            return allTopics
        newTimestamp, topicList = fetch_data(loadJson)
        allTopics.extend(topicList)
        if newTimestamp == timestamp:
            return allTopics
        timestamp = newTimestamp


loginUrl = "http://120.55.151.61/V2/StudentSkip/loginCheckV4.action"
topicUrl = "http://120.55.151.61/V2/Treehole/Message/getMessageByTopicIdV3.action"

# Content-Length is left for urllib2 to compute for each request body, so it
# no longer has to be changed by hand before every request. Accept-Encoding:
# gzip is dropped because urllib2 does not decompress gzipped responses.
headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "User-Agent": "Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)",
    "Host": "120.55.151.61",
    "Connection": "Keep-Alive",
}

# --- Login ---
loginData = "phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&"

cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
loginStatus = json.loads(loginResult).get("data", False)
# Test the parsed "data" field: the raw response string is non-empty even
# when the login fails
if loginStatus:
    print "login successful!"
else:
    print "login fail"
    print loginResult
    raise SystemExit(1)

# --- Fetch topics ---
topicData = "timestamp=0&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&"

topicRequest = urllib2.Request(topicUrl, topicData, headers)
topicHtml = opener.open(topicRequest).read()
topicJson = json.loads(topicHtml)
topicStatus = topicJson.get("status", False)
print topicJson
if topicStatus == 1:
    print "fetch topic success!"
    timestamp, topicList = fetch_data(topicJson)
    topicList.extend(load(timestamp, headers, topicUrl))
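`fetch_data` indexes every key directly, so one message without `schoolName` or `studentBO` raises `KeyError` and stops the run. Below is a more defensive Python 3 sketch that skips missing fields instead of crashing. The field names come from `fetch_data`, but the sample payload is invented for illustration, and the real response may contain more or different fields. `extract_topics` is a hypothetical name.

```python
import json
from typing import List, Optional, Tuple


def extract_topics(payload: dict) -> Tuple[Optional[int], List[dict]]:
    """Return (next timestamp, topics), tolerating missing fields."""
    data = payload.get("data") or {}
    topics = []
    for msg in data.get("messageBOs") or []:
        if not msg.get("content"):  # skip messages without text, as fetch_data does
            continue
        topics.append({
            "content": msg["content"],
            "schoolName": msg.get("schoolName"),
            "messageId": msg.get("messageId"),
            "gender": (msg.get("studentBO") or {}).get("gender"),
            "time": msg.get("issueTime"),
        })
    return data.get("timestampLong"), topics


# Invented sample payload, shaped like the fields fetch_data reads
SAMPLE = json.loads("""{"status": 1, "data": {"timestampLong": 1440000000000,
  "messageBOs": [
    {"content": "hello", "schoolName": "School A", "messageId": 42,
     "studentBO": {"gender": 1}, "issueTime": 1440000000000},
    {"content": "", "messageId": 43}
  ]}}""")

ts, topics = extract_topics(SAMPLE)
# ensure_ascii=False keeps Chinese text readable when saving the topics
print(json.dumps(topics, ensure_ascii=False))
```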