How to note the values across year?


Problem description


I'm working with time-series data like the following:

     cname year govstruct
6091 China 1960         3
6092 China 1961         3
6093 China 1962         3
6094 China 1963         3
6095 China 1964         3
6096 China 1965         3
6097 China 1966         3
6098 China 1967         3
6099 China 1968         3
6100 China 1969         3
6101 China 1970         3
6102 China 1971         3
6103 China 1972         3
6104 China 1973         3
6105 China 1974         3
6106 China 1975         3
6107 China 1976         3
6108 China 1977         3
6109 China 1978         3
6110 China 1979         3
6111 China 1980         3
6112 China 1981         3
6113 China 1982         3
6114 China 1983         1
6115 China 1984         1
6116 China 1985         1
6117 China 1986         1
6118 China 1987         1
6119 China 1988         1
6120 China 1989         1
6121 China 1990         1
6122 China 1991         1
6123 China 1992         1
6124 China 1993         1
6125 China 1994         1
6126 China 1995         1
6127 China 1996         1
6128 China 1997         1
6129 China 1998         1
6130 China 1999         1
6131 China 2000         1
6132 China 2001         1
6133 China 2002         1
6134 China 2003         1
6135 China 2004         1
6136 China 2005         1
6137 China 2006         3
6138 China 2007         3
6139 China 2008         3
6140 China 2009         3
6141 China 2010         3
6142 China 2011         3
6143 China 2012         3


I want to build a dataset that records the range of years covered by each govstruct value.


What I want is a dataset with the country name, the year range, and the govstruct value, so that the final dataset looks like this:

cname  years      govstruct
China  1960-1982  3
China  1983-2005  1
China  2006-2012  3

Note that I will be looping over countries, so any code that can handle that would be greatly appreciated.

Thanks a lot for your help.


Reference solutions

Method 1:

Here is one option with dplyr/data.table: group by 'cname' and the run-length id of 'govstruct' (from rleid()), then summarise by pasting together the range of 'year'.

library(dplyr)
library(stringr)
library(data.table)  # for rleid()

df1 %>% 
    group_by(cname, grp = rleid(govstruct)) %>%
    summarise(govstructure = first(govstruct), 
         years = str_c(range(year), collapse = "-"))  %>%
    ungroup() %>%
    select(-grp)
# A tibble: 3 x 3
#  cname govstructure years    
#  <chr>        <int> <chr>    
#1 China            3 1960-1982
#2 China            1 1983-2005
#3 China            3 2006-2012

Or we can construct grp by comparing adjacent elements of 'govstruct': a new group starts wherever the value changes.

df1 %>%
   group_by(cname, grp = cumsum(c(TRUE, diff(govstruct) != 0))) %>%
   summarise(govstructure = first(govstruct), 
         years = str_c(range(year), collapse = "-")) %>%
   ungroup() %>%
   select(-grp)
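To see what that grouping expression produces, here is a small base R check on the 'govstruct' values from the question, rebuilt compactly with rep() (23 + 23 + 7 years). The helper run_id() is hypothetical and not part of any package; it compares shifted copies instead of calling diff(), so it also works for character codes (the labels "unitary" and "federal" are made up for illustration).

```r
# govstruct values from the question, rebuilt compactly (23 + 23 + 7 years)
govstruct <- rep(c(3L, 1L, 3L), times = c(23, 23, 7))

# New group at row 1 and wherever a value differs from the one before it
grp_diff <- cumsum(c(TRUE, diff(govstruct) != 0))
table(grp_diff)   # three groups of 23, 23 and 7 rows

# Hypothetical helper: same idea without diff(), so character codes work too
run_id <- function(x) {
  n <- length(x)
  if (n == 0) return(integer(0))
  cumsum(c(TRUE, x[-1] != x[-n]))
}
run_id(c("unitary", "unitary", "federal", "unitary"))   # 1 1 2 3
```

One caveat: if 'govstruct' contains NA, the comparison returns NA at that position and cumsum() makes every later id NA as well, so missing values need to be handled first.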

Or, with data.table, use the same method as in dplyr, i.e. group by 'cname' and the rleid of 'govstruct', and paste the range of 'year':

library(data.table)
setDT(df1)[ , .(govstructure = first(govstruct),
      year = paste(range(year), collapse = "-")), 
      .(cname, grp = rleid(govstruct))][, grp := NULL][]
#  cname govstructure      year
#1: China            3 1960-1982
#2: China            1 1983-2005
#3: China            3 2006-2012
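A point worth checking when looping over countries: rleid() on 'govstruct' alone knows nothing about country boundaries. The toy panel below (countries "A" and "B" and their values are made up for illustration) shows that, and shows that either passing both columns to rleid() or grouping by 'cname' as well, as the answer does, keeps the spells apart.

```r
library(data.table)

# Hypothetical toy panel: A ends and B starts with the same govstruct value
toy <- data.table(cname = c("A", "A", "B", "B"),
                  year = c(2000, 2001, 2000, 2001),
                  govstruct = c(1L, 1L, 1L, 2L))

# Run id on govstruct alone: rows 1 to 3 share one id across the country border
id_value_only <- rleid(toy$govstruct)          # 1 1 1 2

# rleid() takes several vectors and starts a new id when any of them changes
id_both <- rleid(toy$cname, toy$govstruct)     # 1 1 2 3

# Grouping by cname as well, as in the answer, also keeps the countries apart
res <- toy[, .(years = paste(range(year), collapse = "-"),
               govstruct = first(govstruct)),
           by = .(cname, grp = rleid(govstruct))]
res
```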

Or another option with base R, which returns the year range for each run (grp):

grp <- with(rle(df1$govstruct), rep(seq_along(values), lengths))
aggregate(year ~ cname + grp, data = df1, 
      FUN = function(x) paste(range(x), collapse = "-"))
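The rle() call above stores the value of each run and its length. For a single country whose rows are already sorted by year, those two pieces are enough to build the table directly, without aggregate(). A sketch using the China data from the question, rebuilt compactly with rep():

```r
# China data from the question, rebuilt compactly (23 + 23 + 7 years)
df1 <- data.frame(cname = "China", year = 1960:2012,
                  govstruct = rep(c(3L, 1L, 3L), times = c(23, 23, 7)))

r <- rle(df1$govstruct)   # r$lengths: 23 23 7, r$values: 3 1 3
ends <- cumsum(r$lengths)          # row where each run ends
starts <- ends - r$lengths + 1     # row where each run starts

spells <- data.frame(cname = df1$cname[starts],
                     years = paste(df1$year[starts], df1$year[ends], sep = "-"),
                     govstruct = r$values)
spells
```

Unlike the aggregate() version, this keeps the govstruct value of each run as a column.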

Data:

df1 <- structure(list(cname = c("China", "China", "China", "China", 
"China", "China", "China", "China", "China", "China", "China", 
"China", "China", "China", "China", "China", "China", "China", 
"China", "China", "China", "China", "China", "China", "China", 
"China", "China", "China", "China", "China", "China", "China", 
"China", "China", "China", "China", "China", "China", "China", 
"China", "China", "China", "China", "China", "China", "China", 
"China", "China", "China", "China", "China", "China", "China"
), year = 1960:2012, govstruct = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), 
class = "data.frame", row.names = c(NA, 
-53L))
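Since the question mentions looping over countries, here is a dependency-free base R sketch that handles all countries in one pass instead of a loop. It starts a new spell whenever the country or the value changes; the two-country panel (countries "A" and "B") is hypothetical.

```r
# Hypothetical two-country panel, invented for illustration only
panel <- data.frame(
  cname = rep(c("A", "B"), times = c(6, 4)),
  year = c(1960:1965, 1960:1963),
  govstruct = c(3, 3, 1, 1, 3, 3, 2, 2, 2, 1)
)

# Sort so that each country's years are contiguous and in order
panel <- panel[order(panel$cname, panel$year), ]

# A new spell starts at row 1, or whenever the country or the value changes
n <- nrow(panel)
new_spell <- c(TRUE, panel$cname[-1] != panel$cname[-n] |
                     panel$govstruct[-1] != panel$govstruct[-n])
panel$spell <- cumsum(new_spell)

# One row per spell: country, first-last year, and the value
spells <- do.call(rbind, lapply(split(panel, panel$spell), function(d) {
  data.frame(cname = d$cname[1],
             years = paste(min(d$year), max(d$year), sep = "-"),
             govstruct = d$govstruct[1])
}))
rownames(spells) <- NULL
spells
```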

Method 2:

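We can use data.table to paste together the first and last year within each group defined by 'cname' and the run-length id of 'govstruct'. Because it takes the first and last rows rather than the range, this assumes the rows are sorted by year within each country.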

library(data.table)

setDT(df1)[ , .(year = paste(first(year), last(year), sep = "-"), 
           govstruct = first(govstruct)), .(cname, rleid(govstruct))]


#   cname rleid      year govstruct
#1: China     1 1960-1982         3
#2: China     2 1983-2005         1
#3: China     3 2006-2012         3
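If the input is not guaranteed to be ordered by year, sorting first with setorder() is cheap and makes first()/last() safe. A sketch using the China data from the question, rebuilt compactly, with the rows shuffled on purpose to simulate unsorted input (the shuffle is only for illustration):

```r
library(data.table)

# China data from the question, rebuilt compactly (23 + 23 + 7 years)
dt <- data.table(cname = "China", year = 1960:2012,
                 govstruct = rep(c(3L, 1L, 3L), times = c(23, 23, 7)))

# Shuffle the rows to simulate input that is not sorted by year
set.seed(42)
dt <- dt[sample(nrow(dt))]

# Restore country/year order in place before computing run ids
setorder(dt, cname, year)

res <- dt[, .(years = paste(first(year), last(year), sep = "-"),
              govstruct = first(govstruct)),
          by = .(cname, spell = rleid(govstruct))]
res[, spell := NULL]
res
```

Without the setorder() call, rleid() would see many short runs in the shuffled rows and the year ranges would be meaningless.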

(by Sharif Amlani, akrun, Ronak Shah)

References

  1. How to note the values across year? (CC BY-SA 2.5/3.0/4.0)

#R #data-manipulation #time-series






Related questions

How to apply mean, sd etc. function to a whole matrix

Make bins of each table row and draw stack bar figure in R

Reading not quite correct .csv file in R

Thickness of lines in Package ‘treemap’

Is preprocessing file with awk needed or it can be done directly in R?

rpivotTable select elements drop down menu

Optimizing Performance - Large File Input in Shiny

Numeric values depending of apply family applied, R

Linear search in R

Dynamically join multiple datasets in a dplyr/purrr workflow

How change Row values to Column names (R)






