regex - How to extract text related to a regular expression (regexpr) index in R
I'm working with a specialised text file: a long list of names, with a random sequence of letters associated with each name. I'm attempting to extract a particular consensus sequence I'm interested in. The sequence is, let's say, "stxdxik", with x being any letter. I read the text file into R and named it "text".
I then used a regular expression to isolate the entries containing the sequence, calling the result "ylist".
ylist <- text[grep("st[a-z]d[a-z]ik", text, value = FALSE, perl = FALSE)]
I then used the regexpr function to locate the position of the sequence I'm interested in, calling the result "r".
r<- regexpr("st[a-z]d[a-z]ik", ylist)
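The object returned by regexpr() is an integer vector giving the start of the first match in each element (-1 where there is no match), with the length of each match stored in its "match.length" attribute. A quick way to inspect it, using the same example entries as the for-loop answer below (a sketch for illustration only):

```r
# Example entries; the third one does not contain the motif
text <- c("tedstxdxiksslker", "janetlkajsdfstxdxikalkse",
          "maggiesdfes", "sdfjkstxdxikryan")
ylist <- text[grep("st[a-z]d[a-z]ik", text)]
r <- regexpr("st[a-z]d[a-z]ik", ylist)

# Start position of the first match in each element of ylist
as.integer(r)              # 4 13 6
# Length of each match, stored as an attribute of r
attr(r, "match.length")    # 7 7 7
```

Start and length together are enough to cut the matched text back out of ylist.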
Now the problem is that r only gives me an index of where the sequence lies, i.e. its starting position and the length of the match. I'm interested in extracting the full sequences, not just their positions in "ylist", since it's important to me what the full-length sequence is. Can anyone help?
I have tried the substr and regmatches functions in R. However, substr has to be applied to each match, which is not practical because I have many, many matches of the sequence. regmatches doesn't seem to work, or I can't make it work, perhaps because I'm entering the wrong command.
Using a for-loop:
text <- c("tedstxdxiksslker", "janetlkajsdfstxdxikalkse", "maggiesdfes", "sdfjkstxdxikryan")
ylist <- grep("st[a-z]d[a-z]ik", text, value = TRUE, perl = FALSE)
r <- regexpr("st[a-z]d[a-z]ik", ylist)
strings <- character()
for (i in seq_along(ylist)) {
  # the motif is always 7 characters long, so each match ends at start + 6
  strings <- c(strings, substr(ylist[i], start = r[i], stop = r[i] + 6))
}
> strings
[1] "stxdxik" "stxdxik" "stxdxik"
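The loop can probably be avoided: substr() is vectorised over its x, start and stop arguments, and regmatches() accepts the regexpr() result directly, as long as it is given the same character vector that regexpr() was run on. A minimal base-R sketch on the same example data; the end positions come from the "match.length" attribute rather than a fixed +6, so it should also cope with variable-length patterns:

```r
text <- c("tedstxdxiksslker", "janetlkajsdfstxdxikalkse",
          "maggiesdfes", "sdfjkstxdxikryan")
ylist <- grep("st[a-z]d[a-z]ik", text, value = TRUE)
r <- regexpr("st[a-z]d[a-z]ik", ylist)

# Vectorised substr(): one start/stop pair per element, no loop needed
start <- as.integer(r)
stop <- start + attr(r, "match.length") - 1L
strings <- substr(ylist, start, stop)

# regmatches() performs the same extraction from the regexpr() result
strings_rm <- regmatches(ylist, r)

strings      # "stxdxik" "stxdxik" "stxdxik"
strings_rm   # "stxdxik" "stxdxik" "stxdxik"
```

For regexpr() results, regmatches() returns only the elements that had a match, so regmatches(text, regexpr("st[a-z]d[a-z]ik", text)) on the unfiltered text should give the same three strings and make the grep step optional.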
Or in one line, using the stringr package:
library(stringr)
> str_extract(string = text, pattern = "st[a-z]d[a-z]ik")
[1] "stxdxik" "stxdxik" NA        "stxdxik"
strings2 <- str_extract(string = text, pattern = "st[a-z]d[a-z]ik")
# entries without the motif come back as NA, so drop them
strings2 <- strings2[!is.na(strings2)]
> strings2
[1] "stxdxik" "stxdxik" "stxdxik"
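Both regexpr() and str_extract() return only the first match in each entry. If an entry can contain the motif more than once, gregexpr() (or stringr's str_extract_all()) records every non-overlapping match. A minimal base-R sketch; text2 is a hypothetical example whose first entry contains the motif twice:

```r
# Hypothetical entries: the first has the motif twice, the second not at all
text2 <- c("abcstadaikxxstbdbik", "maggiesdfes", "sdfjkstxdxikryan")

# gregexpr() records every non-overlapping match, not just the first
m <- gregexpr("st[a-z]d[a-z]ik", text2)

# regmatches() then returns a list with one character vector per entry;
# entries without a match get character(0)
all_hits <- regmatches(text2, m)

# Number of hits per entry, and all hits flattened into one vector
lengths(all_hits)   # 2 0 1
unlist(all_hits)    # "stadaik" "stbdbik" "stxdxik"
```

Keeping the list form preserves which entry each hit came from; unlist() is only appropriate when that mapping is not needed.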