Scraping the Static Images Linked in a Web Page with Python

2021-01-09 00:50 zoujm-hust12 Python

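This article shows how to use Python to download the static images linked from a web page. It may be a useful reference for readers working on similar tasks.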

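The example shares the complete code for reference. It fetches a Baidu Tieba thread, finds the img tags whose class is BDE_Image, and saves each image to disk. A second function simply prints the src of every img tag on a page. The code is as follows: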

    # -*- coding:utf-8 -*-
    # http://tieba.baidu.com/p/2460150866
    # Scrape image URLs from the page and save the images
    from time import sleep
    import urllib.request

    from bs4 import BeautifulSoup

    html_doc = "http://tieba.baidu.com/p/2460150866"


    def get_image(url):
        """Download every img tag with class BDE_Image found at url."""
        req = urllib.request.Request(url)
        webpage = urllib.request.urlopen(req)
        html = webpage.read()
        soup = BeautifulSoup(html, 'html.parser')
        # Select all img tags whose class is BDE_Image
        img_src = soup.find_all("img", {'class': 'BDE_Image'})
        for i, img in enumerate(img_src, start=1):
            img_url = img.get('src')  # read the src attribute
            if not img_url:
                continue  # skip tags that have no src
            req = urllib.request.Request(img_url)
            u = urllib.request.urlopen(req)
            data = u.read()
            with open("AutoCodePng20180119-" + str(i) + ".jpg", 'wb') as f:
                f.write(data)
            sleep(2)  # pause between downloads


    def getImg(url):
        """Print the src of every img tag found at url."""
        html = urllib.request.urlopen(url)
        page = html.read()
        soup = BeautifulSoup(page, "html.parser")
        # find_all returns every img tag in the HTML as a list
        imglist = soup.find_all('img')
        for img in imglist:
            # Print the src attribute, e.g. <img src="123456" ...> prints 123456
            print(img.get('src'))


    if __name__ == "__main__":
        get_image(html_doc)
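
The script assumes that every src value is an absolute URL and saves every file with a .jpg extension. As a minimal sketch of one way to relax both assumptions, the helpers below use only the standard library. The names resolve_image_url and build_filename, the "image-" prefix, and the example.com URLs are hypothetical and are not part of the original script.

```python
import os
from urllib.parse import urljoin, urlparse


def resolve_image_url(page_url, src):
    """Turn a possibly relative src value into an absolute URL."""
    return urljoin(page_url, src)


def build_filename(image_url, index, prefix="image-", default_ext=".jpg"):
    """Build a local file name, reusing the extension found in the URL path."""
    path = urlparse(image_url).path
    ext = os.path.splitext(path)[1].lower()
    if not ext:
        ext = default_ext  # fall back when the URL path has no extension
    return f"{prefix}{index}{ext}"


if __name__ == "__main__":
    page = "http://tieba.baidu.com/p/2460150866"
    # Illustrative src values: root-relative, scheme-relative, and absolute
    samples = ["/static/a.png", "//example.com/img/b.gif", "http://example.com/c"]
    for n, src in enumerate(samples, start=1):
        absolute = resolve_image_url(page, src)
        print(absolute, "->", build_filename(absolute, n))
```

Inside get_image, the value returned by img.get('src') could be passed through resolve_image_url before the request is made, and build_filename could replace the hard-coded file name. Whether this is needed depends on how the target page writes its img tags.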