溪流图:世界各国新冠肺炎发展趋势

溪流图:世界各国新冠肺炎发展趋势

大家还记得我昨天使用 Stata 绘制的堆积面积图么?就是下面这个:

这幅图展示的是全世界的疫情状况,看到这幅图我意识到展示世界各国疫情状况的最好方案可能是溪流图,不过已有的绘制溪流图的 R 包我觉得都不怎么好看,于是乎,我今天上午快速地写了个绘制溪流图的 R 包(基于 htmlwidgets 的),经过测试还挺好用,不过帮助文档写的很粗糙,以后有时间再进一步完善,我们来一起使用这个 R 包可视化世界各国的疫情状况吧!

首先安装这个包:

1
devtools::install_github('czxa/streamgrapher')

如果你的电脑连接 GitHub 有问题,可以使用我构建的源码包安装:

1
install.packages("streamgrapher_0.1.0.tar.gz", repos = NULL, type = "source")

安装好之后就可以使用了,在使用之前我们先把数据整理下,还是之前使用的 time_series_19-covid-Confirmed.csv

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
library(readr)
library(lubridate)
library(streamgrapher)
library(tidyr)
read_csv('time_series_19-covid-Confirmed.csv') %>%
gather(5:50, key = "date", value = "confirmed") %>%
`colnames<-`(c("prov", "country", "lat", "lon", "date", "confirmed")) %>%
mutate(
country = case_when(
country %in% c("Mainland China", "Taiwan", "Hong Kong", "Macau") ~ "China",
T ~ country
)
) %>%
group_by(country, date) %>%
summarise(confirmed = sum(confirmed)) %>%
dplyr::filter(country != "China") %>%
mutate(date = mdy(date)) %>%
arrange(country, date) -> df

然后就可以使用 streamgrapher 包绘制溪流图了,这里为了数据的均衡,就先把中国删去了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
library(streamgrapher)
streamgrapher(
date = df$date,
name = df$country,
value = df$confirmed,
title = "世界各国新冠疫情发展趋势",
subtitle = "数据来源: CSSEGISandData/COVID-19: Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE<br>https://github.com/CSSEGISandData/COVID-19<br>绘制:TidyFriday",
titleAlign = "left",
subtitleAlign = "left",
marginBottom = 40,
xAxisMargin = 50,
height = "400px",
width = "100%"
)

streamgrapher 包是基于 htmlwidgets 开发的,所以你可以在 RMarkdown 和 Shiny 应用中使用。这个函数有很多参数,难以一一解释,必须的参数有三个,date、name 和 value,使用的时候你需要给三个参数传递相同长度的向量,建议先把三个向量组装成数据框。还有一个有用的参数是 color,默认使用的是 Color Brewer 的 Set2 调色板,可以自定义,但是需要注意长度应该大于 length(unique(name))。例如我们换一种调色方案:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
streamgrapher(
date = df$date,
name = df$country,
value = df$confirmed,
color = rep(RColorBrewer::brewer.pal(8, "Paired"), 13),
title = "世界各国新冠疫情发展趋势",
subtitle = "数据来源: CSSEGISandData/COVID-19: Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE<br>https://github.com/CSSEGISandData/COVID-19<br>绘制:TidyFriday",
titleAlign = "left",
subtitleAlign = "left",
marginBottom = 40,
xAxisMargin = 50,
height = "400px",
width = "100%"
)

其他的参数基本都是字面意思,很容易理解,就不再一一解释了。

似乎 2020 年 2 月 15 号之前的数据都很小,所以我们只绘制 2020 年 2 月 15 号之后的看看:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
df %>% 
dplyr::filter(date >= ymd("2020-02-15")) -> df

streamgrapher(
date = df$date,
name = df$country,
value = df$confirmed,
color = rep(RColorBrewer::brewer.pal(8, "Set3"), 13),
title = "世界各国新冠疫情发展趋势",
subtitle = "数据来源: CSSEGISandData/COVID-19: Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE<br>https://github.com/CSSEGISandData/COVID-19<br>绘制:TidyFriday",
titleAlign = "left",
subtitleAlign = "left",
marginBottom = 40,
xAxisMargin = 50,
height = "400px",
width = "100%"
)

最后如果我们加上中国的:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
read_csv('time_series_19-covid-Confirmed.csv') %>%
gather(5:50, key = "date", value = "confirmed") %>%
`colnames<-`(c("prov", "country", "lat", "lon", "date", "confirmed")) %>%
mutate(
country = case_when(
country %in% c("Mainland China", "Taiwan", "Hong Kong", "Macau") ~ "China",
T ~ country
)
) %>%
group_by(country, date) %>%
summarise(confirmed = sum(confirmed)) %>%
# dplyr::filter(country != "China") %>%
mutate(date = mdy(date)) %>%
arrange(country, date) -> df

streamgrapher(
date = df$date,
name = df$country,
value = df$confirmed,
title = "世界各国新冠疫情发展趋势",
subtitle = "数据来源: CSSEGISandData/COVID-19: Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE<br>https://github.com/CSSEGISandData/COVID-19<br>绘制:TidyFriday",
titleAlign = "left",
subtitleAlign = "left",
marginBottom = 40,
xAxisMargin = 50,
height = "400px",
width = "100%"
)

当然用过 ggplot2 也是可以把这幅图绘制出来的(其实用 Stata 也可以),可以自己试试。

知识星球附件链接:https://t.zsxq.com/v7meYzV

#

评论

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×