Simulating a login in Python: help needed with a cookie problem

mckelvin

2013-04-29 08:57:33 +08:00

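I haven't run the OP's code. If it was copied from the documentation and is otherwise correct, the server may have detected a crawler and left Set-Cookie out of the response headers. Try adding User-Agent, Referer, and similar fields to the request headers.
I recommend the Session object from the requests module, which is very convenient to use. It handles cookies for you, though there are still a few pitfalls; only in rare cases do you need to manage cookies by hand.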
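To make the suggestion above concrete, here is a rough sketch of a requests Session that sends browser-like headers. The URL, the header values, and the form field names are placeholders I made up, not anything from the OP's site, so adjust them to whatever the packet capture shows.

```python
import requests

# Hypothetical endpoint, for illustration only.
LOGIN_URL = "https://example.com/login"


def build_session():
    """Create a Session whose default headers look like a browser's."""
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "Referer": "https://example.com/",
    })
    return session


def prepare_login(session, username, password):
    """Build the login POST without sending it, so it can be inspected."""
    # Field names "username" and "password" are assumptions.
    request = requests.Request(
        "POST", LOGIN_URL,
        data={"username": username, "password": password},
    )
    return session.prepare_request(request)


def login(session, username, password):
    """Send the login request; cookies from the response stay in session.cookies."""
    prepared = prepare_login(session, username, password)
    return session.send(prepared, timeout=10)
```

After a successful `login(...)`, later calls such as `session.get(...)` on the same Session should send the stored cookies back automatically, which is the part that usually goes wrong when cookies are handled by hand.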

lfhong

2013-04-29 10:16:14 +08:00

Here's a browser class I wrote; hope you find it useful.

import gzip
import socket
import urllib2
import cookielib
from StringIO import StringIO


class Browser(object):
    """Minimal urllib2 wrapper (Python 2) that keeps cookies across requests."""

    def __init__(self, filecookie=None, PROXY=None):
        VERSION = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) AppleWebKit/534.35 (KHTML, like Gecko) Chrome/13.0.761.0 Safari/534.35'
        self.version = VERSION
        # Browser-like default headers sent with every request.
        self.headers = []
        self.headers.append(('User-agent', self.version))
        self.headers.append(('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'))
        self.headers.append(('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'))
        self.headers.append(('Accept-Encoding', 'gzip'))
        self.headers.append(('Accept-Language', 'en-US,en;q=0.8'))
        self.headers.append(('Connection', 'keep-alive'))
        if filecookie:
            # File-backed jar; it is not loaded or saved automatically,
            # so call self.cj.load() / self.cj.save() yourself.
            self.cj = cookielib.MozillaCookieJar(filecookie)
        else:
            self.cj = cookielib.CookieJar()
        # PROXY is a dict such as {'http': 'host:port'}.
        if PROXY and 'http' in PROXY:
            proxy_handler = urllib2.ProxyHandler(PROXY)
            self.opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(self.cj), proxy_handler)
        else:
            self.opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(self.cj))
        self.opener.addheaders = self.headers

    def addheaders(self, headers):
        # headers is a list of (name, value) tuples added to the defaults.
        self.opener.addheaders = self.headers + headers

    def open(self, url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
        # With data (an already url-encoded string) this is a POST, otherwise a GET.
        if data:
            pg = self.opener.open(url, data, timeout=timeout)
        else:
            pg = self.opener.open(url, timeout=timeout)
        # We asked for gzip above, so decompress the body when the server used it.
        if pg.info().get('Content-Encoding') == 'gzip':
            buf = StringIO(pg.read())
            f = gzip.GzipFile(fileobj=buf)
            return f.read()
        else:
            return pg.read()

lfhong

2013-04-29 10:18:12 +08:00

Usage:

import urllib

browser = Browser()

pg_content = browser.open(url)  # GET
# POST: urllib2 expects url-encoded form data, not a dict
data = urllib.urlencode({'username': '1234', 'password': '12345'})
pg_content = browser.open(url, data=data)
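For anyone reading this on Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar, here is a small sketch of the same cookie-jar idea. To keep it runnable offline, a toy local server stands in for the real site: the `/login` and `/profile` paths, the `sessionid` cookie, and the credentials are all made up for the demo.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Toy stand-in for a real site: POST sets a cookie, GET checks for it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        ok = form.get("username") == ["1234"] and form.get("password") == ["12345"]
        self.send_response(200 if ok else 403)
        if ok:
            self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        logged_in = "sessionid=abc123" in self.headers.get("Cookie", "")
        body = b"welcome" if logged_in else b"please log in"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    """Fetch a page, log in, fetch again; return both bodies and the cookie names."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port

    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    try:
        before = opener.open(base + "/profile", timeout=5).read()
        # POST bodies must be url-encoded bytes, not a dict.
        data = urllib.parse.urlencode({"username": "1234", "password": "12345"}).encode()
        opener.open(base + "/login", data, timeout=5).read()
        after = opener.open(base + "/profile", timeout=5).read()
    finally:
        server.shutdown()
        server.server_close()
    return before, after, [cookie.name for cookie in jar]
```

The point of the demo is that all three requests go through the same opener, so the cookie set by the login response is stored in the jar and sent back on the next request. If each request is made with a fresh opener or plain `urlopen`, the cookie is lost even though Set-Cookie shows up in the response headers.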

for4

2013-04-29 10:58:00 +08:00

I'd suggest using requests.

exoticknight

2013-04-29 13:56:00 +08:00

@alexrezit Still doesn't work...

exoticknight

2013-04-29 13:56:53 +08:00

@mckelvin The output of info() does show a Set-Cookie header... which is why I started fiddling with cookies in the first place.

exoticknight

2013-04-29 13:57:44 +08:00

@lfhong Thanks for posting the code! I'll go try it first.

scola

2013-04-29 15:12:25 +08:00

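A while ago I wrote a download helper for a coworker. It quickly downloads files from an internal site that requires login.
OP, you can use it as a reference:
https://gist.github.com/325862401/5403766
It uses two libraries, ClientCookie and ClientForm; see this recipe:
http://code.activestate.com/recipes/391929-access-password-protected-web-applications-for-scr/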

exoticknight

2013-04-29 15:14:24 +08:00

@for4 It seems to be a keep-alive issue. I'll go try requests.

exoticknight

2013-04-29 15:19:40 +08:00

@scola Thanks, I'll look into it.

qdcanyun

2013-04-29 19:25:45 +08:00

May I recommend Session from requests? It fully meets your needs.

exoticknight

2013-04-29 19:29:19 +08:00

@qdcanyun I took a look and that seems to be the case. Capturing packets and experimenting now ^_^

https://www.v2ex.com/t/67292