r - Multi windows range calculations data.table vs dplyr -


i'm doing range calculations (i.e. max , min) on multiple windows on stocks returns. have version in dplyr, many people publishing benchmarking calculations data.table faster. i've created version data.table syntax, it's slower dplyr one. me find better way use data.table make faster? many thanks.

library(quandl) library(tidyr) library(dplyr) library(data.table) library(microbenchmark)  tickers <- c("goog/nasdaq_aapl", "goog/nasdaq_msft",               "goog/nyse_ibm", "goog/nasdaq_goog")   data <- quandl(tickers,transformation = "rdiff")  returns <- gather(data, stock, value, -date) %>%     separate(stock, c("name", "field"), " - ") %>%     filter(        field == "close"     ) %>%     select(        - field     )  returns_dt <- data.table(returns)  multi_window_range <- function(data) {     result_1y <- data %>%         filter(             date >= sys.date() - 365         ) %>%          group_by(name) %>%         summarise(             max_1y = max(value, na.rm = true),             min_1y = min(value, na.rm = true)         )     result_2y <- data %>%         filter(             date >= sys.date() - 365 * 2         ) %>%         group_by(name) %>%         summarise(             max_2y = max(value, na.rm = true),            min_2y = min(value, na.rm = true)        )     result_5y <- data %>%         filter(             date >= sys.date() - 365 * 5         ) %>%         group_by(name) %>%         summarise(             max_5y = max(value, na.rm = true),             min_5y = min(value, na.rm = true)         )     return(inner_join(inner_join(result_1y, result_2y, = "name"), result_5y, = "name")) }  multi_window_range_dt <- function(data) {     setkey(data, name)     result_1y <- data[date >= sys.date() - 365,                       list(                         max_1y = max(value, na.rm = true),                         min_1y = min(value, na.rm = true)                       ), = "name"]    result_2y <- data[date >= sys.date() - 365 * 2,                      list(                         max_2y = max(value, na.rm = true),                         min_2y = min(value, na.rm = true)                      ), = "name"]    result_5y <- data[date >= sys.date() - 365 * 5,                      list(                         max_5y = max(value, na.rm = true),                         min_5y = min(value, na.rm = true)                      ), = "name"]    return(result_1y[result_2y][result_5y]) }  microbenchmark(     multi_window_range(returns),     multi_window_range_dt(returns_dt) )   unit: milliseconds                               expr      min       lq     mean   median       uq      max neval        multi_window_range(returns) 6.341532 6.522303 6.915266 6.692666 6.922623 10.16709   100  multi_window_range_dt(returns_dt) 7.537073 7.738516 8.066579 7.865968 8.073114 12.68021   100  

try this:

multi_window_range_dt2 <- function(data) {        data[, {         rng1 <- range(value[date > sys.date() - 365], na.rm = true)         rng2 <- range(value[date > sys.date() - 2*365], na.rm = true)         rng5 <- range(value[date > sys.date() - 5*365], na.rm = true)         list(max_1y = rng1[2], min_1y = rng1[1],              max_2y = rng2[2], min_2y = rng2[1],              max_5y = rng5[2], min_5y = rng5[1])        }, = "name"] }  library(rbenchmark) benchmark(multi_window_range(returns), multi_window_range_dt2(returns_dt))[1:4] 

which gives on laptop:

                                test  replications elapsed relative 1        multi_window_range(returns)           100    2.39    1.189 2 multi_window_range_dt2(returns_dt)           100    2.01    1.000 

this indicates multi_window_range takes 18.9% more time multi_window_range_dt2:


Comments

Popular posts from this blog

c++ - QTextObjectInterface with Qml TextEdit (QQuickTextEdit) -

javascript - angular ng-required radio button not toggling required off in firefox 33, OK in chrome -

xcode - Swift Playground - Files are not readable -