
Scraping Game Skins with Python (for Learning Purposes Only)

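This example uses requests to download pages and two parsing tools, XPath (through lxml) and re, with the time and os modules in supporting roles. When parsing with XPath, what you see is not always what you get: the structure shown in the browser can differ from the HTML that requests actually downloads, and for this page re turned out to be more convenient.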

1. The requests module:

import requests

url = "https://pvp.qq.com/web201605/herolist.shtml"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/86.0.4240.198 Safari/537.36"
}
resp = requests.get(url=url, headers=headers)
# Let requests guess the encoding from the response body
resp.encoding = resp.apparent_encoding
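The request above can be wrapped in a small helper. What follows is a minimal sketch rather than the author's code: the name `fetch_html`, the 10-second timeout, and the call to `raise_for_status()` are assumptions added for illustration. Nothing is sent over the network until `fetch_html` is called; the last lines only build a request object to show which headers would go out.

```python
import requests

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
    )
}


def fetch_html(url, timeout=10):
    """Hypothetical helper: download a page and return its decoded text."""
    resp = requests.get(url, headers=HEADERS, timeout=timeout)
    resp.raise_for_status()  # raise requests.HTTPError on 4xx/5xx responses
    resp.encoding = resp.apparent_encoding  # guess the charset from the body
    return resp.text


# Build (but do not send) a request to inspect what would be transmitted.
prepared = requests.Request(
    "GET", "https://pvp.qq.com/web201605/herolist.shtml", headers=HEADERS
).prepare()
print(prepared.headers["User-Agent"])
```

A timeout keeps the script from hanging on a stalled connection, and checking the status early avoids parsing an error page as if it were the hero list.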

2. XPath (via lxml):

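from lxml import etree

# resp is the response obtained in the requests step
e = etree.HTML(resp.text)
# Relative links to each hero's detail page
href = e.xpath("//div[@class='herolist-box']/div/ul/li/a/@href")
# Hero names, taken from the alt text of each thumbnail
names = e.xpath("//div[@class='herolist-box']/div/ul/li/a/img/@alt")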
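To try the two XPath expressions without touching the network, they can be run against a small inline fragment. This is only a sketch: the HTML in `SAMPLE`, the hero names, and the links are made up to mimic the structure the expressions expect, and the real page may differ.

```python
from lxml import etree

# Hypothetical fragment imitating the hero list markup (not copied from the site).
SAMPLE = """
<div class="herolist-box">
  <div>
    <ul>
      <li><a href="herodetail/105.shtml"><img alt="Hero A" src="a.jpg"></a></li>
      <li><a href="herodetail/106.shtml"><img alt="Hero B" src="b.jpg"></a></li>
    </ul>
  </div>
</div>
"""

tree = etree.HTML(SAMPLE)
hrefs = tree.xpath("//div[@class='herolist-box']/div/ul/li/a/@href")
alts = tree.xpath("//div[@class='herolist-box']/div/ul/li/a/img/@alt")

# Pair each hero name with its detail-page link.
heroes = dict(zip(alts, hrefs))
print(heroes)
```

Pairing with `zip` assumes every list item carries both a link and an image; if either list comes back empty against the live page, the downloaded HTML probably does not match what the browser shows.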

3. The re module:

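import re

# resp_1 is the response for a single hero's detail page.
# The parentheses are literal characters in the page's inline style, so they must be escaped.
reg = r"background:url\('.*?'\)"
src = re.findall(reg, resp_1.text, re.S)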
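The pattern can be checked on a sample string before it is pointed at a real page. The sketch below is an illustration under assumptions: `SAMPLE` is a made-up tag on example.com, and a capture group is added (a variation on the pattern above, not the author's version) so that only the URL is returned instead of the whole `background:url(...)` text.

```python
import re

# Hypothetical snippet imitating an inline background style.
SAMPLE = (
    '<div class="zk-con1 zk-con" '
    "style=\"background:url('//example.com/skin/105/105-bigskin-1.jpg') center 0\">"
)

# \( and \) match literal parentheses; (.*?) lazily captures the quoted URL.
pattern = re.compile(r"background:url\('(.*?)'\)")
match = pattern.search(SAMPLE)
first_skin = match.group(1) if match else None
print(first_skin)
```

Checking for `None` matters here: `re.findall(...)[0]` raises `IndexError` when the page contains no such style, while `search` lets the caller skip that page.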

 

4. The os and time modules:

import time
import os

# Create the output folder on the first run
if not os.path.exists("03-herodetail"):
    os.makedirs("03-herodetail")

# Pause for 2 seconds between downloads
time.sleep(2)
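The same two steps can be written as small reusable functions. This is a sketch with hypothetical names (`ensure_dir`, `polite_sleep`); it uses `os.makedirs(..., exist_ok=True)`, which makes the separate `os.path.exists` check unnecessary, and runs inside a temporary directory so nothing is left on disk.

```python
import os
import tempfile
import time


def ensure_dir(path):
    """Create the directory (and any parents) if missing; otherwise do nothing."""
    os.makedirs(path, exist_ok=True)
    return path


def polite_sleep(seconds):
    """Pause between requests and return how long the pause actually lasted."""
    start = time.monotonic()
    time.sleep(seconds)
    return time.monotonic() - start


# Demonstrate inside a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_dir(os.path.join(tmp, "03-herodetail"))
    ensure_dir(target)  # calling it a second time is harmless
    created = os.path.isdir(target)

elapsed = polite_sleep(0.05)  # short delay for the demo; the article uses 2 seconds
```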

5. Saving the image files:

# i is the skin name; resp is the response holding the image bytes
with open(f"03-herodetail/{i}.jpg", "wb") as f:
    f.write(resp.content)
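Because the file name comes from text on the page, it may contain characters that are not valid in a path. The sketch below shows one possible way to guard against that; `safe_filename` and `save_image` are hypothetical helpers, and the three bytes written are a placeholder rather than a real image.

```python
import re
import tempfile
from pathlib import Path


def safe_filename(name):
    """Replace characters that are invalid on common file systems with '_'."""
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip() or "unnamed"


def save_image(folder, name, data):
    """Write raw bytes to <folder>/<safe name>.jpg and return the path."""
    path = Path(folder) / f"{safe_filename(name)}.jpg"
    path.write_bytes(data)
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(tmp, "skin/one", b"\xff\xd8\xff")  # placeholder bytes
    saved_name = saved.name
    saved_size = saved.stat().st_size
```

In the scraper, `data` would be `resp.content` and `name` the skin name taken from the page.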

 

6. Complete example:


import os
import re
import time

import requests
from lxml import etree

# Create the output folder on the first run
if not os.path.exists("03-herodetail"):
    os.makedirs("03-herodetail")

url = "https://pvp.qq.com/web201605/herolist.shtml"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/86.0.4240.198 Safari/537.36"
}
resp = requests.get(url=url, headers=headers)
resp.encoding = resp.apparent_encoding
e = etree.HTML(resp.text)
# Relative links to each hero's detail page, and the hero names
href = e.xpath("//div[@class='herolist-box']/div/ul/li/a/@href")
names = e.xpath("//div[@class='herolist-box']/div/ul/li/a/img/@alt")

# Turn the relative links into absolute URLs
lst_link = []
for link in href:
    lst_link.append("https://pvp.qq.com/web201605/" + link)

for item in lst_link:
    print(item)
    resp_1 = requests.get(item, headers=headers)
    resp_1.encoding = resp_1.apparent_encoding
    e_1 = etree.HTML(resp_1.text)
    # data-imgname holds the skin names, separated by "|"
    data_title = e_1.xpath("//div[@class='zk-con1 zk-con']/div/div/div/ul/@data-imgname")
    # The first skin image is referenced in an inline style: background:url('//...1.jpg')
    reg = r"background:url\('.*?'\)"
    src = re.findall(reg, resp_1.text, re.S)
    if not data_title or not src:
        continue  # skip pages that do not match the expected layout
    # Keep the part after "//" and drop the trailing "1.jpg')" (7 characters),
    # which leaves the prefix shared by all of this hero's skin images
    name = src[0].split("//")[1][:-7]
    skin_names = data_title[0].split("|")
    for count, i in enumerate(skin_names, start=1):
        img_url = "http://" + name + str(count) + ".jpg"
        img_resp = requests.get(url=img_url, headers=headers)
        with open(f"03-herodetail/{i}.jpg", "wb") as f:
            f.write(img_resp.content)
        # Pause between downloads
        time.sleep(2)
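The URL handling in the complete example relies on string concatenation and slicing. As an alternative sketch, the two helpers below do the same job with `urllib.parse.urljoin` and an explicit suffix check. The names `detail_url` and `skin_urls` are hypothetical, and the assumption that every skin image is numbered `1.jpg`, `2.jpg`, and so on after a shared prefix is inferred from the code above, not confirmed by the site.

```python
from urllib.parse import urljoin

BASE = "https://pvp.qq.com/web201605/herolist.shtml"


def detail_url(relative_link):
    """Resolve a link from the hero list against the list page's URL."""
    return urljoin(BASE, relative_link)


def skin_urls(first_skin, count, scheme="http"):
    """Derive numbered skin URLs from the first one (assumes a '1.jpg' suffix)."""
    if first_skin.startswith("//"):
        first_skin = f"{scheme}:{first_skin}"  # add a scheme to '//host/path'
    if not first_skin.endswith("1.jpg"):
        raise ValueError("unexpected skin URL format: " + first_skin)
    prefix = first_skin[: -len("1.jpg")]
    return [f"{prefix}{n}.jpg" for n in range(1, count + 1)]


print(detail_url("herodetail/105.shtml"))
print(skin_urls("//example.com/skin/105/105-bigskin-1.jpg", 3))
```

Raising an error on an unexpected suffix makes a layout change on the site visible immediately, instead of silently producing wrong URLs.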

 


Author: Wang Peng, programmer at 华企网通
