How to Fetch the Full Source of a Web Page with Python

2020-08-03 11:06 Ly Python

This article collects ways to fetch the full source code of a web page with Python. Readers who need them can use it as a reference.

1. Fetching the full source of a page in Python:

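    import requests

    # Download the page; requests.get returns a Response object.
    res = requests.get('https://blog.csdn.net/yirexiao/article/details/79092355')
    # Decode the body as UTF-8 when reading res.text.
    res.encoding = 'utf-8'
    # Print the full HTML source of the page.
    print(res.text)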

2. Result: running the script prints the page's complete HTML source to the console.
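The example above hard-codes UTF-8, which can garble pages served in another encoding such as GBK. As a rough sketch of one alternative, the snippet below reads the charset from the `Content-Type` header and falls back to UTF-8. It uses only the standard library rather than `requests`, and the helper names `charset_from_content_type` and `fetch_html` are hypothetical, introduced here for illustration.

```python
import urllib.request
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset() or default


def fetch_html(url, timeout=10):
    """Download url and decode the body with the charset the server declares.

    Hypothetical helper: pages that declare their charset only in a <meta>
    tag are not handled here and fall back to UTF-8.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = charset_from_content_type(resp.headers.get("Content-Type", ""))
        return resp.read().decode(charset, errors="replace")


print(charset_from_content_type("text/html; charset=GBK"))
print(charset_from_content_type("text/html"))
```

Calling `fetch_html('https://blog.csdn.net/yirexiao/article/details/79092355')` would then return the decoded page source as a string, assuming the site is reachable.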

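Extended example (collect the links on a page that contain the site URL, then print each link's HTTP status code and response time):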

    # Python 3 port of the original Python 2 script (urllib2, print statements).
    import time
    import urllib.request

    from bs4 import BeautifulSoup

    websiteurls = {}  # URLs to skip; left empty in this example


    def scanpage(url):
        websiteurl = url
        t = time.time()
        n = 0
        html = urllib.request.urlopen(websiteurl).read()
        soup = BeautifulSoup(html, "html.parser")
        Upageurls = {}  # unique link -> HTTP status code
        pageurls = soup.find_all("a", href=True)
        for links in pageurls:
            href = links.get("href")
            # Keep only links that contain the site URL and are not yet recorded.
            if websiteurl in href and href not in Upageurls and href not in websiteurls:
                Upageurls[href] = 0
        for links in Upageurls:
            t2 = time.time()
            try:
                # Request each link once and reuse its status code.
                code = urllib.request.urlopen(links).getcode()
            except Exception:
                print("connect failed")
            else:
                Upageurls[links] = code
                print(n, links, code)
                # Time taken by this request, in seconds.
                print(time.time() - t2)
            n += 1
        print("total is " + repr(n) + " links")
        # Total running time, in seconds.
        print(time.time() - t)


    scanpage("http://news.163.com/")
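The script above keeps a link only when its `href` contains the site URL as a substring, so relative links such as `/sports/` are skipped. The sketch below shows one possible way to handle them: resolve every `href` against the page URL and compare host names. It is not the article's method. It swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages, and `LinkCollector`, `same_site_links`, and the sample HTML are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, base_url):
    """Return unique absolute links that share base_url's host, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    result = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolves relative links
        if urlparse(absolute).netloc == host and absolute not in result:
            result.append(absolute)
    return result


# Made-up sample page: two same-site links (one repeated) and one external link.
sample = (
    '<a href="/sports/">Sports</a> '
    '<a href="http://news.163.com/tech/">Tech</a> '
    '<a href="http://example.com/">Other</a> '
    '<a href="/sports/">Again</a>'
)
print(same_site_links(sample, "http://news.163.com/"))
# → ['http://news.163.com/sports/', 'http://news.163.com/tech/']
```

Comparing hosts instead of substrings also avoids matching an external URL that merely contains the site address somewhere in its query string.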