Scraping Qiushibaike Videos with a Python Crawler


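Note:

You need a working Python environment before you start. If you are not sure how to set one up, search online (for example on Baidu) or see my other post: "Python Installation Tutorial for Windows (Absolutely Detailed)".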

1. Installing the third-party libraries

Installing requests

Open cmd and run the command pip install requests.

Installing lxml

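Open cmd and run the command pip install lxml.
Note: for these commands to work, Python and pip must be on your PATH environment variable, and pip should be configured to use a mirror inside mainland China.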
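
Before moving on, it can be worth confirming that both packages are visible to the interpreter that will run the script. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and is not part of the original script.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["requests", "lxml"])
    if missing:
        # At least one package still needs `pip install <name>`
        print("Not installed:", ", ".join(missing))
    else:
        print("requests and lxml are both installed.")
```

If a package is reported as not installed even though pip succeeded, pip and the interpreter are probably from different Python installations.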

2. Code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
1. File name      : QiuShiSpider.py
2. Created        : 2021/03/07 16:15:08
3. Author         : ZAY
4. Python version : 3.7.0
'''


import os
import requests
from lxml import etree
from multiprocessing import Pool


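class Spider:
    def __init__(self):
        # Video links collected by get_url()
        self.videos_list = []
        # Listing page URL template; {} is replaced by the page number
        self.geturl = "https://www.qiushibaike.com/video/page/{}/"
        # Send a browser-like User-Agent with every request
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36"}
        self.make_folder()

    def make_folder(self):
        # Create the download folder; do nothing if it already exists
        os.makedirs("D:\\videos", exist_ok=True)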

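    def get_url(self):
        # Walk listing pages 1 to 13 and collect every src attribute under a <video> tag
        for num in range(1, 14):
            url = self.geturl.format(num)
            try:
                response = requests.get(url, headers=self.headers, timeout=10)
                response.raise_for_status()
                response.encoding = "utf-8"
                pyhtml = etree.HTML(response.text)
                for video_url in pyhtml.xpath("//video//@src"):
                    self.videos_list.append(video_url)
                print("Collected video links from {}".format(url))
            except Exception as exc:
                # Unlike a bare except, this still lets Ctrl+C interrupt the crawl
                print("Failed to get video links from {}: {}".format(url, exc))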

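    def download_videos(self, url):
        # The src values are assumed to be protocol-relative ("//host/name.mp4"), so add the scheme
        url = "https:" + url
        # Use whatever follows the video host as the local file name
        filename = url.replace("https://qiubai-video.qiushibaike.com/", "")
        path = os.path.join("D:\\videos", filename)
        if os.path.exists(path):
            print("%s already exists, skipping." % filename)
            return
        try:
            response = requests.get(url, headers=self.headers, timeout=30)
            response.raise_for_status()
            # Write to a full path instead of calling os.chdir(), which changes process-wide state
            with open(path, "wb") as file:
                file.write(response.content)
            print("%s downloaded successfully." % filename)
        except Exception as exc:
            print("Failed to download %s: %s" % (filename, exc))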

    def run(self):
        self.get_url()
        # Download in parallel with a pool of worker processes (one per CPU core by default)
        with Pool() as pool:
            pool.map(self.download_videos, self.videos_list)


if __name__ == "__main__":
    spider = Spider()
    spider.run()
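
The download_videos method above gets the file name by stripping one hard-coded host prefix, so a link served from any other host would keep its slashes and fail to open as a file. As a hedged alternative (this is not what the original script does), the file name can be taken from the URL path with the standard library. The helper name video_filename and the example links are made up for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def video_filename(src):
    """Return the last path segment of a video link, or "" if there is none.

    Accepts both protocol-relative ("//host/a.mp4") and absolute links.
    """
    if src.startswith("//"):
        # Protocol-relative link: add a scheme so urlsplit sees a full URL
        src = "https:" + src
    # urlsplit separates the path from the host and the query string
    return posixpath.basename(urlsplit(src).path)


if __name__ == "__main__":
    print(video_filename("//qiubai-video.qiushibaike.com/EXAMPLE.mp4"))
    print(video_filename("https://example.com/media/clip.mp4?token=abc"))
```

An empty result means the link has no usable file name, and the caller would need to skip it or generate a name.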

3. Result

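When run, the script downloads the videos in parallel using multiple worker processes (multiprocessing.Pool, not threads), and the videos end up in D:\videos on your computer.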
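
The script's run() method uses multiprocessing.Pool, that is, worker processes. Since downloading is mostly waiting on the network, a thread pool is a plausible alternative. The sketch below is not the script's approach; it uses concurrent.futures.ThreadPoolExecutor from the standard library, and download_all and the fake_download stand-in are hypothetical names chosen so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, download_one, max_workers=8):
    """Call download_one(url) for every URL on a pool of threads.

    Returns the results in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(download_one, urls))


def fake_download(url):
    """Stand-in for a real download; just reports what it would fetch."""
    return "fetched " + url


if __name__ == "__main__":
    for line in download_all(["//host/a.mp4", "//host/b.mp4"], fake_download):
        print(line)
```

With the Spider class from section 2, download_one would be spider.download_videos and urls would be spider.videos_list.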

If you have any questions, feel free to ask in the comments.