The English word "cookie" originally means a small treat: when a client visits a web server, the server stores a small piece of information on the client machine, like a treat handed out by the server. The server can use this cookie to track client state, which is especially useful wherever individual clients must be told apart (e.g. e-commerce).
When a client requests the server for the first time, the server stores a cookie containing information about that client. On every subsequent request, the client includes the cookie in the HTTP request data, and the server parses the cookie out of the request to recover that information.
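This exchange can be sketched with Python's standard library: `http.cookies.SimpleCookie` parses a `Set-Cookie` response header such as a server might send, and the stored name/value pair is exactly what the client replays in the `Cookie` request header later (the cookie name and value here are illustrative, not from a real site):

```python
from http.cookies import SimpleCookie

# A Set-Cookie header as a server might send it on the first response
raw = 'sessionid=abc123; Path=/; HttpOnly'

cookie = SimpleCookie()
cookie.load(raw)

# The client stores the name/value pair...
print(cookie['sessionid'].value)  # abc123

# ...and replays only name=value (no attributes) in later requests
print(cookie.output(attrs=[], header='Cookie:'))  # Cookie: sessionid=abc123
```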
Below are the ways a Python 3 crawler can carry a cookie:
1. Put the cookie directly in the request headers
```python
# coding:utf-8
import requests
from bs4 import BeautifulSoup

cookie = ('cisession=19dfd70a27ec0eecf1fe3fc2e48b7f91c7c83c60;'
          'CNZZDATA1000201968=1815846425-1478580135-https%253A%252F%252Fwww.baidu.com%252F%7C1483922031;'
          'Hm_lvt_f805f7762a9a237a0deac37015e9f6d9=1482722012,1483926313;'
          'Hm_lpvt_f805f7762a9a237a0deac37015e9f6d9=1483926368')
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36',
    'Connection': 'keep-alive',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cookie': cookie,
}

url = 'http://www.zzvips.com/news/89636.html'
wbdata = requests.get(url, headers=header).text
soup = BeautifulSoup(wbdata, 'lxml')
print(soup)
```
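If you have copied the raw cookie string from the browser, you can also split it into a dict once instead of pasting the whole string into the header. This small helper is a sketch (the string below is shortened from the one above for illustration):

```python
def cookie_str_to_dict(raw: str) -> dict:
    """Split a browser-copied 'name=value; name=value' string into a dict."""
    pairs = (item.split('=', 1) for item in raw.split(';') if '=' in item)
    return {name.strip(): value.strip() for name, value in pairs}

raw = ('cisession=19dfd70a27ec0eecf1fe3fc2e48b7f91c7c83c60; '
       'Hm_lpvt_f805f7762a9a237a0deac37015e9f6d9=1483926368')
cookies = cookie_str_to_dict(raw)
print(cookies['cisession'])  # 19dfd70a27ec0eecf1fe3fc2e48b7f91c7c83c60
```

The resulting dict can be passed straight to the `cookies=` parameter shown in the next method.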
2. Pass the cookie through the requests `cookies` parameter
```python
# coding:utf-8
import requests
from bs4 import BeautifulSoup

cookie = {
    "cisession": "19dfd70a27ec0eecf1fe3fc2e48b7f91c7c83c60",
    "CNZZDATA100020196": "1815846425-1478580135-https%253A%252F%252Fwww.baidu.com%252F%7C1483922031",
    "Hm_lvt_f805f7762a9a237a0deac37015e9f6d9": "1482722012,1483926313",
    "Hm_lpvt_f805f7762a9a237a0deac37015e9f6d9": "1483926368",
}

url = 'http://www.zzvips.com/news/89636.html'  # url was not defined in the original snippet; reused from method 1
wbdata = requests.get(url, cookies=cookie).text
soup = BeautifulSoup(wbdata, 'lxml')
print(soup)
```
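A step beyond passing cookies by hand: `requests.Session` keeps its own cookie jar and replays cookies automatically across requests, so `Set-Cookie` headers from a login response carry over to later calls on the same session. A minimal sketch (no network request is made here; we seed the jar directly to show where the cookies live):

```python
import requests

session = requests.Session()
# Cookies set here, or by any response the session receives, are sent
# automatically on every subsequent request made through this session.
session.cookies.set('cisession', '19dfd70a27ec0eecf1fe3fc2e48b7f91c7c83c60')

# Later, session.get(url) would include "Cookie: cisession=..." by itself.
print(dict(session.cookies))
```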
Extended example:
Logging in to the HIT ACM site with cookies
Site login URL:
http://acm.hit.edu.cn/hoj/system/login
POST data to send:
user and password
Code:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
__author__ = 'pi'
__email__ = 'pipisorry@126.com'
"""
import urllib.request, urllib.parse, urllib.error
import http.cookiejar

LOGIN_URL = 'http://acm.hit.edu.cn/hoj/system/login'
# get_url was not defined in the original snippet; set it to whatever
# logged-in page you want to fetch after authenticating.
get_url = 'http://acm.hit.edu.cn/hoj'

values = {'user': '******', 'password': '******'}  # , 'submit': 'Login'
postdata = urllib.parse.urlencode(values).encode()
user_agent = r'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
headers = {'User-Agent': user_agent, 'Connection': 'keep-alive'}

cookie_filename = 'cookie.txt'
cookie = http.cookiejar.MozillaCookieJar(cookie_filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)

request = urllib.request.Request(LOGIN_URL, postdata, headers)
try:
    response = opener.open(request)
    page = response.read().decode()
    # print(page)
except urllib.error.URLError as e:
    print(e.code, ':', e.reason)

cookie.save(ignore_discard=True, ignore_expires=True)  # save the cookies to cookie.txt
print(cookie)
for item in cookie:
    print('Name = ' + item.name)
    print('Value = ' + item.value)

get_request = urllib.request.Request(get_url, headers=headers)
get_response = opener.open(get_request)
print(get_response.read().decode())
# print('You have not solved this problem' in get_response.read().decode())
```
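Once `cookie.txt` has been saved as above, a later run can skip the login POST entirely by reloading the jar and attaching it to a fresh opener. A self-contained sketch (a cookie is inserted by hand here, standing in for what the login response would set, and the file path is a temporary one):

```python
import http.cookiejar
import os
import tempfile
import urllib.request

path = os.path.join(tempfile.mkdtemp(), 'cookie.txt')

# First run: save the jar after login.
jar = http.cookiejar.MozillaCookieJar(path)
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name='cisession', value='abc123',
    port=None, port_specified=False,
    domain='acm.hit.edu.cn', domain_specified=True, domain_initial_dot=False,
    path='/', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={}, rfc2109=False))
jar.save(ignore_discard=True, ignore_expires=True)

# Later run: reload the saved cookies and build a new opener around them.
jar2 = http.cookiejar.MozillaCookieJar(path)
jar2.load(ignore_discard=True, ignore_expires=True)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar2))

print([c.name for c in jar2])  # ['cisession']
```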
That concludes this article's example code for carrying cookies in a Python 3 crawler. For more on the topic, search 服务器之家 for earlier articles or browse the related articles below, and we hope you will keep supporting 服务器之家!
原文链接:https://www.py.cn/spider/guide/18504.html