Dynamically modify values in data.table in R but getting only single constant value returned -
i trying modify values in data.tables in r. have come command thought job, instead puts same value in every row, while should dependent on information in row. here command:
tablea[,cola := tableb[colbd == unlist(strsplit(unlist(strsplit(colc," +"))[1], "-"))[1]                       & colbc == unlist(strsplit(unlist(strsplit(colc," +"))[1], "-"))[2]                       & colbb == unlist(strsplit(unlist(strsplit(colc," +"))[1], "-"))[3]                       ,colba]] so want cola tablea take value of colba tableb corresponds date colc in tablea. example tablea
   cola colb                colc  1: 600   93 2012-12-14 23:45:00  2: 600   93 2012-12-15 23:45:00  3: 600   93 2012-12-14 23:45:00  4: 600   93 2012-12-15 23:45:00  5: 600   93 2012-12-14 23:45:00  6: 600   93 2012-12-15 23:45:00  7: 600   93 2012-12-14 23:45:00  8: 600   93 2012-12-15 23:45:00  9: 600   93 2012-12-14 23:45:00 10: 600   93 2012-12-15 23:45:00 this relevant part tableb:
   colba colbb colbc colbd 1:   600    14    12  2012 2:   601    15    12  2012 this resulting tablea should like
   cola colb                colc  1: 600   93 2012-12-14 23:45:00  2: 601   93 2012-12-15 23:45:00  3: 600   93 2012-12-14 23:45:00  4: 601   93 2012-12-15 23:45:00  5: 600   93 2012-12-14 23:45:00  6: 601   93 2012-12-15 23:45:00  7: 600   93 2012-12-14 23:45:00  8: 601   93 2012-12-15 23:45:00  9: 600   93 2012-12-14 23:45:00 10: 601   93 2012-12-15 23:45:00 i hope clear enough!
so using information @davidarenburg code more efficient using as.date() , isodate functions, didn't know use on data. command become this:
tablea[,cola := tableb[as.date(colc) == as.date(isodate(colbd,colbc,colbb)), colba]] this works (for testset, not larger sample of data), gives me warning:
warning message: in `==.default`(as.date(colc), as.date(isodate(colbd, colbc,  :   longer object length not multiple of shorter object length 
you doing inefficiently. not how should work data.table.
here's 1 efficient way solve (there more efficient ways)
setkey(tableb[, indx := as.date(isodate(colbd, colbc, colbb))], indx) setkey(tablea[, indx := as.date(colc)], indx) tablea[tableb, cola := colba][, indx := null][] #     cola colb                colc #  1:  600   93 2012-12-14 23:45:00 #  2:  600   93 2012-12-14 23:45:00 #  3:  600   93 2012-12-14 23:45:00 #  4:  600   93 2012-12-14 23:45:00 #  5:  600   93 2012-12-14 23:45:00 #  6:  601   93 2012-12-15 23:45:00 #  7:  601   93 2012-12-15 23:45:00 #  8:  601   93 2012-12-15 23:45:00 #  9:  601   93 2012-12-15 23:45:00 # 10:  601   93 2012-12-15 23:45:00 here creating index column in each data set (only once , assigning reference, instead of calling multiple times) used merging while assigning correct values in colba cola  
if insist on retaining original rows order, can modify code following
setkey(tableb[, indx := as.date(isodate(colbd, colbc, colbb))], indx) setkey(tablea[, `:=`(indx = as.date(colc), order = .i)], indx) setorder(tablea[tableb, cola := colba], order)[, `:=`(indx = null, order = null)][] #     cola colb                colc #  1:  600   93 2012-12-14 23:45:00 #  2:  601   93 2012-12-15 23:45:00 #  3:  600   93 2012-12-14 23:45:00 #  4:  601   93 2012-12-15 23:45:00 #  5:  600   93 2012-12-14 23:45:00 #  6:  601   93 2012-12-15 23:45:00 #  7:  600   93 2012-12-14 23:45:00 #  8:  601   93 2012-12-15 23:45:00 #  9:  600   93 2012-12-14 23:45:00 # 10:  601   93 2012-12-15 23:45:00 
Comments
Post a Comment