Project repository:
https://github.com/chen0495/pythoncrawlerforjsu
Environment
- Python 3.5 or later
- requests, BeautifulSoup, NumPy, pandas
- Install BeautifulSoup with `pip install beautifulsoup4`
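As a quick sanity check, the requirements above can be verified from Python before running the crawler (a minimal sketch; the import-name-to-pip-package mapping follows the list above):

```python
import importlib.util
import sys

# The crawler targets Python 3.5+.
assert sys.version_info >= (3, 5), "Python 3.5+ is required"

# Map import names to the pip package names used for installation.
required = {"requests": "requests", "bs4": "beautifulsoup4",
            "numpy": "numpy", "pandas": "pandas"}

# find_spec returns None for packages that are not installed.
missing = [pkg for mod, pkg in required.items()
           if importlib.util.find_spec(mod) is None]
print("missing packages:", missing if missing else "none")
```

Any package reported as missing can be installed with `pip install <name>`.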
Configuration and usage
- Log in to the university's transcript query site, then replace the cookie in the script with your own.
- Press F12 to open the browser's developer tools, then press Ctrl+R to refresh; copy the cookie from the request headers (the original post illustrates this with a screenshot, not reproduced here).
- Change the crawler URL to the address of your own transcript page.
- Run src/main.py; the resulting CSV file is written to /result.
Results
(The original post shows a screenshot of the generated CSV here.)
Full code
```python
# -*- coding: utf-8 -*-
# @Time    : 5/29/2021 2:13 PM
# @Author  : chen0495
# @Email   : 1346565673@qq.com | chenweiin612@gmail.com
# @File    : main.py
# @Software: PyCharm

import requests as rq
from bs4 import BeautifulSoup as bs
import numpy as np
import pandas as pd

rq.adapters.DEFAULT_RETRIES = 5
s = rq.session()
s.keep_alive = False  # avoid keeping idle connections open

header = {  # replace the cookie with your own
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4501.0 Safari/537.36 Edg/92.0.891.1',
    'Cookie': 'wengine_vpn_ticketwebvpn_jsu_edu_cn=xxxxxxxxxx; show_vpn=1; refresh=1'
}

# replace the URL with your own transcript page
r = rq.get('https://webvpn.jsu.edu.cn/https/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/jsxsd/kscj/cjcx_list',
           headers=header, verify=False)

soup = bs(r.text, 'html.parser')

# collect the table headers, dropping empty cells and the 序号 (row number) column
head = []
for th in soup.find_all("th"):
    head.append(th.text)
while '' in head:
    head.remove('')
head.remove('序号')

context = np.array(head)

# walk every <td>, grouping cells into rows of 11 and skipping the 序号 column
x = []
flag = 0
for td in soup.find_all("td"):
    if flag != 0 and flag % 11 != 1:
        x.append(td.text)
    if flag % 11 == 0 and flag != 0:
        context = np.row_stack((context, np.array(x)))
        x.clear()
    flag += 1

context = np.delete(context, 0, axis=0)  # drop the duplicated header row
data = pd.DataFrame(context, columns=head)
print(data)

# write the output file; change the file name if you like
data.to_csv('../result/result.csv', encoding='utf-8-sig')
```
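The core of the script is the loop that flattens every `<td>` on the page into rows of 11 cells while dropping the 序号 (row number) column. That grouping can be sketched without BeautifulSoup, using hypothetical cell texts and slightly simplified index arithmetic (each row is exactly cells `11k .. 11k+10`, with the first cell of each group playing the role of 序号):

```python
# Hypothetical cell texts for a 2-row, 11-column grade table;
# column 0 of each row stands in for the 序号 column the script drops.
cells = [str(i) for i in range(22)]

rows, current = [], []
for flag, text in enumerate(cells):
    if flag % 11 != 0:        # skip the first cell of each group of 11 (序号)
        current.append(text)
    if flag % 11 == 10:       # the 11th cell completes a row
        rows.append(current)
        current = []

print(rows[0])  # ['1', '2', ..., '10'] -- 10 data cells, 序号 removed
print(rows[1])
```

The real script stacks each completed row onto a NumPy array with `np.row_stack` instead of a Python list, but the modular-arithmetic grouping is the same idea.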
That concludes this walkthrough of scraping JSU transcripts with Python. For more material on scraping transcripts with Python, see the other related articles on 服务器之家.
Original link: https://github.com/chen0495/pythonCrawlerForJSU