How to improve the performance of this linear interpolation in R
For a given column in a data frame, I want to construct a new vector in which each point is the average of the points on either side of it. The last observation should instead take the value of the second-to-last, and the first observation the value of the second. I wrote some R code to solve this, but I am calling it repeatedly and it is extremely slow. Can anyone give me tips on how to do this more efficiently? Thanks.
    x1 <- c(rep('a', 100), rep('b', 100), rep('c', 100))
    x2 <- rnorm(300)
    x <- data.frame(x1, x2)
    names(x) <- c('col1', 'data1')

    a.linear.interpolation <- function(x) {
      require(zoo)
      require(data.table)
      a.dattab <- data.table(x)
      setkey(a.dattab, col1)
      # Replace NA values using LOCF / NOCB
      a.dattab[, data1 := na.locf(data1, na.rm = FALSE), by = list(col1)]
      a.dattab[, data1 := na.locf(data1, na.rm = FALSE, fromLast = TRUE), by = list(col1)]
      # Add a within-group sequence number and a group-size field to
      # facilitate row-by-row processing
      a.dattab[, grpseq := seq_len(.N), by = list(col1)]
      a.dattab[, grpseq_max := .N, by = list(col1)]
      # Convert to data.frame;
      # data.frame seems faster than data.table for row-by-row processing
      a.df <- data.frame(a.dattab)
      new.col <- numeric(nrow(a.df))
      for (i in seq_len(nrow(a.df))) {
        if (a.df[i, "grpseq"] == 1) {
          # First row of a group: take the second value
          new.col[i] <- a.df[i + 1, "data1"]
        } else if (a.df[i, "grpseq"] == a.df[i, "grpseq_max"]) {
          # Last row of a group: take the second-to-last value
          new.col[i] <- a.df[i - 1, "data1"]
        } else {
          # Interior row: average of the two neighbours
          new.col[i] <- (a.df[i - 1, "data1"] + a.df[i + 1, "data1"]) / 2
        }
      }
      return(new.col)
    }
Apart from using rollmeans, the base R filter function can do this sort of thing as well. E.g.:
    linint <- function(vec) {
      c(vec[2],
        filter(vec, c(0.5, 0, 0.5))[-c(1, length(vec))],
        vec[length(vec) - 1])
    }

    x <- c(1, 3, 6, 10, 1)
    linint(x)
    #[1]  3.0  3.5  6.5  3.5 10.0
And it's pretty quick, chewing through 10 million cases in less than a second:
    x <- rnorm(1e7)
    system.time(linint(x))
    #   user  system elapsed
    #   0.57    0.18    0.75
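The question asks for the result within each group of col1, which the examples above do not show. One way this might be done in base R is with ave(), sketched below. This is only an illustration under some assumptions: the column name `smoothed` is made up here, every group is assumed to have at least three rows, and the NA filling from the question is left out (it would need to run before this step). linint() is repeated so the block runs on its own, with stats::filter written out in full in case another attached package masks filter.

```r
# linint() as in the answer, with the namespace spelled out.
linint <- function(vec) {
  n <- length(vec)
  c(vec[2],                                        # first point: copy the second
    stats::filter(vec, c(0.5, 0, 0.5))[-c(1, n)],  # interior: mean of neighbours
    vec[n - 1])                                    # last point: copy second-to-last
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- data.frame(col1  = rep(c("a", "b", "c"), each = 100),
                data1 = rnorm(300))

# ave() splits data1 by col1, applies linint() to each piece, and
# returns the results in the original row order.
x$smoothed <- ave(x$data1, x$col1, FUN = linint)

head(x)
```

Because ave() handles the grouping, the group sequence number and group size columns from the question are not needed in this version.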