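Python: Visualizing and Analyzing Epidemic Data (Part 1)
Visualizing epidemic data
This series scrapes the data behind an epidemic map and later visualizes it as a word cloud. The data comes from Baidu's epidemic page: https://voice.baidu.com/act/newpneumonia/newpneumonia
Step 1: Getting the data
Start by fetching the page and printing its raw HTML: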
import requests
import json
from lxml import etree
import openpyxl
url = "https://voice.baidu.com/act/newpneumonia/newpneumonia"
response = requests.get(url)
print(response.text)
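A bare requests.get call has no timeout by default, and response.text will return the body of an error page without complaint. The sketch below is a slightly more defensive variant. The helper name fetch_page, the 10-second timeout, and the User-Agent string are illustrative assumptions, not something the page is known to require. The get parameter exists only so the function can be exercised without network access.

```python
def fetch_page(url, get=None, timeout=10):
    """Return the body of `url` as text, raising on HTTP error statuses."""
    if get is None:
        # Imported lazily so the helper can be tested with a stand-in `get`.
        import requests
        get = requests.get
    # A browser-like User-Agent is an assumption; some sites reject the default one.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    return response.text
```

Usage would be a single call such as fetch_page("https://voice.baidu.com/act/newpneumonia/newpneumonia"), with the returned text passed on to the parsing step.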
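View the page source for the URL and use Ctrl+F to search it.
The data sits in a script tag whose type is application/json,
and the epidemic figures begin at caseList inside component.
Extract the text of that script tag, parse it as JSON, and read caseList from the component object: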
html = etree.HTML(response.text)
result = html.xpath('//script[@type="application/json"]/text()')
result = result[0]
result = json.loads(result)
# print(result['component'][0]['globalList'])
result1 = result['component'][0]['caseList']
for each in result1:
    print(each)
    print('*' * 50 + '\n')
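The extraction step can be tried offline on a tiny hand-written page. The sketch below swaps lxml for the standard library's html.parser purely so that it runs without third-party packages; it is not the method used in this post. SAMPLE_HTML, its values, and the names JsonScriptParser and extract_case_list are all made up for illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical miniature of the page; the real page is far larger.
SAMPLE_HTML = """
<html><body>
<script type="text/javascript">var x = 1;</script>
<script type="application/json">
{"component": [{"caseList": [{"area": "A", "confirmed": "10"}]}]}
</script>
</body></html>
"""


class JsonScriptParser(HTMLParser):
    """Collect the text of every <script type="application/json"> tag."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._inside = True
            self.payloads.append("")

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self._inside:
            self.payloads[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def extract_case_list(html_text):
    """Parse the first JSON script tag and return component[0]['caseList']."""
    parser = JsonScriptParser()
    parser.feed(html_text)
    parser.close()
    if not parser.payloads:
        raise ValueError("no application/json script tag found")
    data = json.loads(parser.payloads[0])
    return data["component"][0]["caseList"]


case_list = extract_case_list(SAMPLE_HTML)
print(case_list)
```

Raising an explicit error when no matching tag is found gives a clearer message than the IndexError that result[0] would produce if the page layout ever changes.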
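Save the data to Excel:
# Create the workbook
wb = openpyxl.Workbook()
# Use the default worksheet
ws = wb.active
ws.title = "Domestic"
ws.append(['Province', 'Confirmed', 'Deaths', 'Recovered', 'Active', 'New confirmed', 'New deaths', 'New recovered', 'Change in active'])
for each in result1:
    # 'crued' (sic) is the key as it appears in the page data.
    # curConfirm holds the active case count; it is assumed to be present on
    # caseList entries, as it is on globalList entries.
    temp_list = [each['area'], each['confirmed'], each['died'], each['crued'], each['curConfirm'], each['confirmedRelative'], each['diedRelative'], each['curedRelative'], each['curConfirmRelative']]
    # Replace empty values with '0'
    for i in range(len(temp_list)):
        if temp_list[i] == '':
            temp_list[i] = '0'
    ws.append(temp_list)
wb.save('./data.xlsx')
The result is the file data.xlsx.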
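The index-based loop that replaces empty strings can also be written as a small helper. The following is one possible sketch; to_row, FIELDS, and the sample record are hypothetical. Unlike the loop in this post, it also tolerates a key that is missing altogether.

```python
FIELDS = ["area", "confirmed", "died", "crued", "curConfirm"]


def to_row(record, fields=FIELDS, default="0"):
    """Build one spreadsheet row, substituting `default` for empty or missing values."""
    row = []
    for name in fields:
        value = record.get(name, "")
        row.append(default if value in ("", None) else value)
    return row


# Made-up record: 'died' is empty and 'curConfirm' is absent.
sample = {"area": "X", "confirmed": "12", "died": "", "crued": "3"}
row = to_row(sample)
print(row)
```

A row built this way can be passed straight to ws.append, and the list of field names doubles as documentation of the column order.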
To fetch the data for regions outside China, change the code to read globalList instead:
result2 = result['component'][0]['globalList']
for each in result2:
    print(each)
    print('*' * 50 + '\n')
# Create the workbook
wb = openpyxl.Workbook()
# Use the default worksheet
ws = wb.active
ws.title = "Global"
# The header order matches the fields read below
ws.append(['Region', 'Confirmed', 'Deaths', 'Recovered', 'New confirmed', 'Active'])
for each in result2:
    temp_list = [each['area'], each['confirmed'], each['died'], each['crued'], each['confirmedRelative'], each['curConfirm']]
    # Replace empty values with '0'
    for i in range(len(temp_list)):
        if temp_list[i] == '':
            temp_list[i] = '0'
    ws.append(temp_list)
wb.save('./data1.xlsx')
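The counts in the page data are strings (for example '2629'), so openpyxl stores them as text cells, and sorting them as strings gives misleading results. If numeric values are wanted, they can be converted first. A minimal sketch, in which to_int and the sample records are hypothetical:

```python
def to_int(value, default=0):
    """Convert a count such as '2629' to int; fall back to `default` for '' or junk."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


# Made-up records shaped like the entries in the page data.
records = [
    {"area": "A", "confirmed": "2629"},
    {"area": "B", "confirmed": ""},
    {"area": "C", "confirmed": "10350"},
]

# Sort by the numeric value, largest first.
ranked = sorted(records, key=lambda r: to_int(r["confirmed"]), reverse=True)
areas = [r["area"] for r in ranked]
print(areas)
```

Compared as plain strings, '2629' would sort above '10350', which is why the conversion matters before any ranking.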
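Next, split the data by continent. Each entry in globalList carries a subList of countries,
for example this entry for Europe ('欧洲'): {'area': '欧洲', 'subList': [{'died': '52', 'confirmed': '2629', 'crued': '1535',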
result2 = result['component'][0]['globalList']
for each in result2:
    print(each)
    print('*' * 50 + '\n')
# Create the workbook
wb = openpyxl.Workbook()
# Use the default worksheet for the per-continent overview
ws = wb.active
ws.title = "Global"
ws.append(['Region', 'Confirmed', 'Deaths', 'Recovered', 'New confirmed', 'Active'])
for each in result2:
    temp_list = [each['area'], each['confirmed'], each['died'], each['crued'], each['confirmedRelative'], each['curConfirm']]
    # Replace empty values with '0'
    for i in range(len(temp_list)):
        if temp_list[i] == '':
            temp_list[i] = '0'
    ws.append(temp_list)
for each in result2:
    sheet_title = each['area']
    # Create a new worksheet named after the continent
    ws_out = wb.create_sheet(sheet_title)
    ws_out.append(['Country', 'Confirmed', 'Deaths', 'Recovered', 'New confirmed', 'Active'])
    for country in each['subList']:
        temp_list = [country['country'], country['confirmed'], country['died'], country['crued'], country['confirmedRelative'], country['curConfirm']]
        # Replace empty values with '0', as in the overview sheet
        temp_list = ['0' if value == '' else value for value in temp_list]
        ws_out.append(temp_list)
wb.save('./data1.xlsx')
The result is shown in the figure: data1.xlsx contains the overview sheet plus one sheet per continent.
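An alternative to one sheet per continent is a single flat table in which every row names both the continent and the country. The sketch below shows the traversal of the nested subList structure on made-up data shaped like globalList; flatten_global and all sample values are hypothetical.

```python
def flatten_global(global_list):
    """Return one dict per country, tagged with the continent it belongs to."""
    rows = []
    for continent in global_list:
        # A continent without a subList simply contributes no rows.
        for country in continent.get("subList", []):
            rows.append({
                "continent": continent["area"],
                "country": country["country"],
                # Empty or missing counts become '0', as elsewhere in this post.
                "confirmed": country.get("confirmed") or "0",
                "died": country.get("died") or "0",
                "crued": country.get("crued") or "0",
            })
    return rows


# Made-up data with the same nesting as globalList.
sample_global = [
    {"area": "Continent A", "subList": [
        {"country": "Country 1", "confirmed": "100", "died": "2", "crued": "50"},
        {"country": "Country 2", "confirmed": "7", "died": "", "crued": "1"},
    ]},
    {"area": "Continent B", "subList": []},
]

flat = flatten_global(sample_global)
for row in flat:
    print(row)
```

A flat table like this is often easier to filter and sort later, at the cost of repeating the continent name on every row.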
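That wraps up the data cleaning. For the word-cloud analysis of the epidemic data, see the next post,
Visualizing and Analyzing Epidemic Data (Part 2).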