Python: Visualizing and Analyzing Epidemic Data (Part 1)

Visualizing epidemic data

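This series scrapes the data behind an epidemic map and visualizes it as a word cloud. The data comes from Baidu's epidemic map page: https://voice.baidu.com/act/newpneumonia/newpneumonia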

Part 1: Fetching the data

The following code requests the URL and prints the raw page content:

import requests
import json
from lxml import etree
import openpyxl

url = "https://voice.baidu.com/act/newpneumonia/newpneumonia"
response = requests.get(url)
print(response.text)

Open the page source for the URL and use Ctrl+F to search it.
The data sits in a script block whose type is application/json.

Within that block, the epidemic data itself starts at caseList inside the component object.

Parse the embedded JSON from the page, then pull caseList out of the component object:

html = etree.HTML(response.text)
result = html.xpath('//script[@type="application/json"]/text()')
result = result[0]
result = json.loads(result)
# print(result['component'][0]['globalList'])
result1 = result['component'][0]['caseList']
for each in result1:
    print(each)
    print('*' * 50 + '\n')
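
To make the page structure easier to see without hitting the network, here is a minimal sketch that runs the same extraction on a tiny made-up HTML string. It uses the standard library's html.parser instead of lxml, purely so it runs with no extra installs; the class name JsonScriptParser, the function extract_case_list, and the sample HTML are all illustrative assumptions, not part of the original code.

```python
import json
from html.parser import HTMLParser


class JsonScriptParser(HTMLParser):
    """Collect the text of every <script type="application/json"> element."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self.blocks.append("")

    def handle_data(self, data):
        # Script content may arrive in several chunks, so concatenate them.
        if self._in_json_script:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_script = False


def extract_case_list(html_text):
    """Return component[0]['caseList'] from the first embedded JSON block."""
    parser = JsonScriptParser()
    parser.feed(html_text)
    parser.close()
    if not parser.blocks:
        raise ValueError("no application/json script block found")
    payload = json.loads(parser.blocks[0])
    return payload["component"][0]["caseList"]


# Made-up page: only the nesting mirrors what the article describes.
SAMPLE_HTML = """
<html><body>
<script type="application/json">
{"component": [{"caseList": [{"area": "A", "confirmed": "10"}]}]}
</script>
</body></html>
"""

case_list = extract_case_list(SAMPLE_HTML)
print(case_list)
```

Raising an error when no matching script block exists gives a clearer failure than an IndexError on result[0] if the page layout ever changes.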

Save the data to Excel

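# Create a workbook
wb = openpyxl.Workbook()
# Use the default active worksheet
ws = wb.active
ws.title = "Domestic"
ws.append(['Province', 'Total confirmed', 'Deaths', 'Recovered', 'Active cases', 'New confirmed', 'New deaths', 'New recovered', 'Change in active cases'])
for each in result1:
    # Column order matches the header row above; 'crued' is the key spelling used in the data itself
    temp_list = [each['area'], each['confirmed'], each['died'], each['crued'], each['curConfirm'], each['confirmedRelative'], each['diedRelative'], each['curedRelative'], each['curConfirmRelative']]
    # Replace empty values with '0'
    for i in range(len(temp_list)):
        if temp_list[i] == '':
            temp_list[i] = '0'
    ws.append(temp_list)

wb.save('./data.xlsx')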
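
Because the header row and the list of keys are written out separately, it is easy for the two to drift out of step. One way to guard against that, sketched here under assumptions, is to define each column once as a (header, key) pair and derive both the header row and the data rows from it. The names COLUMNS, build_header, and build_row are hypothetical helpers, the English headers are illustrative, and the sample record is made up.

```python
# Each pair is (column header, key in the record). The keys are ones used in
# the article's code; the English headers are illustrative.
COLUMNS = [
    ("Province", "area"),
    ("Total confirmed", "confirmed"),
    ("Deaths", "died"),
    ("Recovered", "crued"),
]


def build_header(columns):
    """Return the header row in column order."""
    return [header for header, _ in columns]


def build_row(record, columns, default="0"):
    """Return one data row, using `default` for empty or missing values."""
    row = []
    for _, key in columns:
        value = record.get(key, "")
        row.append(default if value == "" else value)
    return row


# Made-up record, only to exercise the helpers.
sample = {"area": "Sample", "confirmed": "12", "died": ""}
header = build_header(COLUMNS)
row = build_row(sample, COLUMNS)
print(header)
print(row)
```

In the script above, ws.append(build_header(COLUMNS)) and ws.append(build_row(each, COLUMNS)) could then stand in for the two hand-written lists.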

The result is written to data.xlsx.
To get the overseas data instead, change the code to read globalList:

result2 = result['component'][0]['globalList']
for each in result2:
    print(each)
    print('*' * 50 + '\n')

# Create a workbook
wb = openpyxl.Workbook()
# Use the default active worksheet
ws = wb.active
ws.title = "Overseas"
ws.append(['Region', 'Total confirmed', 'Deaths', 'Recovered', 'New confirmed', 'Active cases'])
for each in result2:
    # Column order matches the header row above
    temp_list = [each['area'], each['confirmed'], each['died'], each['crued'], each['confirmedRelative'], each['curConfirm']]
    # Replace empty values with '0'
    for i in range(len(temp_list)):
        if temp_list[i] == '':
            temp_list[i] = '0'
    ws.append(temp_list)

wb.save('./data1.xlsx')

Next, split the data by continent. Each entry in globalList carries a subList holding the per-country records.

For example, the entry for Europe ('欧洲') begins: {'area': '欧洲', 'subList': [{'died': '52', 'confirmed': '2629', 'crued': '1535',
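
To make the nesting concrete, the sketch below walks a small made-up globalList and groups the country rows under their continent name, which is the shape needed for one worksheet per continent. The function rows_by_continent, the KEYS list, and the sample data are illustrative assumptions; only the area, subList, country, confirmed, died, and crued keys come from the article.

```python
def rows_by_continent(global_list, keys):
    """Map each continent name to a list of per-country rows."""
    sheets = {}
    for continent in global_list:
        rows = []
        for country in continent.get("subList", []):
            # Empty or missing values become '0', as in the article's code.
            rows.append([country.get(key, "") or "0" for key in keys])
        sheets[continent["area"]] = rows
    return sheets


# Made-up data that only mirrors the nesting described above.
sample_global = [
    {
        "area": "Continent A",
        "subList": [
            {"country": "Country 1", "confirmed": "100", "died": "2", "crued": ""},
            {"country": "Country 2", "confirmed": "5", "died": "", "crued": "1"},
        ],
    },
    {"area": "Continent B", "subList": []},
]

KEYS = ["country", "confirmed", "died", "crued"]
sheets = rows_by_continent(sample_global, KEYS)
for title, rows in sheets.items():
    print(title, rows)
```

Each key of the returned dictionary corresponds to one sheet title, and each value to the rows appended to that sheet.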

result2 = result['component'][0]['globalList']
for each in result2:
    print(each)
    print('*' * 50 + '\n')

# Create a workbook
wb = openpyxl.Workbook()
# Use the default active worksheet for the per-continent summary
ws = wb.active
ws.title = "Overseas"
ws.append(['Region', 'Total confirmed', 'Deaths', 'Recovered', 'New confirmed', 'Active cases'])
for each in result2:
    # Column order matches the header row above
    temp_list = [each['area'], each['confirmed'], each['died'], each['crued'], each['confirmedRelative'], each['curConfirm']]
    # Replace empty values with '0'
    for i in range(len(temp_list)):
        if temp_list[i] == '':
            temp_list[i] = '0'
    ws.append(temp_list)
for each in result2:
    sheet_title = each['area']
    # Create a new worksheet named after the continent
    ws_out = wb.create_sheet(sheet_title)
    ws_out.append(['Country', 'Total confirmed', 'Deaths', 'Recovered', 'New confirmed', 'Active cases'])
    # One row per country in this continent's subList
    for country in each['subList']:
        temp_list = [country['country'], country['confirmed'], country['died'], country['crued'], country['confirmedRelative'], country['curConfirm']]
        ws_out.append(temp_list)


wb.save('./data1.xlsx')

The resulting workbook, data1.xlsx, contains the summary sheet plus one sheet per continent.
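That wraps up the data collection and cleaning. The word-cloud analysis of the epidemic data continues in the next post:
Visualizing and Analyzing Epidemic Data (Part 2)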