regex - How to extract text related to a regular expression (regexpr) index in R


I'm working with a specialised text file: a long list of names, with a random sequence of letters associated with each name. I'm attempting to extract a particular consensus sequence I'm interested in. The sequence is, let's say, "stxdxik", x being any letter. I read the text file into R and named it "text".

I then used a regular expression to isolate the list of entries containing the sequence, calling it "ylist".

ylist <- text[grep("st[a-z]d[a-z]ik", text, value=FALSE, perl=FALSE)]

I then used the regexpr function to locate the position of the sequence I'm interested in, calling it "r".

r<- regexpr("st[a-z]d[a-z]ik", ylist) 

Now my problem is that "r" is only an index of the locations where the sequence lies: the starting position and the length of each match. I'm interested in extracting the full sequences themselves from "ylist", not the indexes, since it is important to me what the full-length sequence is. Can anyone help?

I have tried the substr and regmatches functions in R, but substr has to be applied to each match, which is not practical for me as I have many, many matches of the sequence, and regmatches doesn't seem to work, or I can't make it work, perhaps because I enter the wrong command.

Using a for-loop:

text <- c("tedstxdxiksslker","janetlkajsdfstxdxikalkse","maggiesdfes","sdfjkstxdxikryan")
ylist <- grep("st[a-z]d[a-z]ik", text, value=TRUE, perl=FALSE)
r <- regexpr("st[a-z]d[a-z]ik", ylist)

# collect the 7-character match from each entry
strings <- character()
for (i in seq_along(ylist)) {
  strings <- c(strings, substr(ylist[i], start=r[i], stop=r[i]+6))
}

> strings
[1] "stxdxik" "stxdxik" "stxdxik"
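The loop above can probably be avoided altogether. The following is a minimal sketch, assuming the same sample data; the names `pattern`, `start`, `len` and `strings_vec` are introduced here for illustration only. It relies on base R's `substring()` being vectorised and on `regexpr()` storing the length of each match in the `match.length` attribute, so the hard-coded `+6` is not needed.

```r
# Same sample data as in the for-loop example.
text <- c("tedstxdxiksslker", "janetlkajsdfstxdxikalkse",
          "maggiesdfes", "sdfjkstxdxikryan")
pattern <- "st[a-z]d[a-z]ik"   # illustrative name, not from the original post

# Keep only the entries that contain the pattern.
ylist <- grep(pattern, text, value = TRUE)

# regexpr() returns the start of the first match in each entry;
# the length of each match is stored in the "match.length" attribute.
r <- regexpr(pattern, ylist)
start <- as.integer(r)
len <- attr(r, "match.length")

# substring() is vectorised, so all matches are extracted in one call.
strings_vec <- substring(ylist, start, start + len - 1)
print(strings_vec)
```

Reading the length from the attribute should also keep this working if the pattern is later changed to one that matches a variable number of characters.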

Or in one line, using the stringr package:

require(stringr)
> str_extract(string=text, pattern="st[a-z]d[a-z]ik")
[1] "stxdxik" "stxdxik" NA        "stxdxik"

# drop the entries that did not match
strings2 <- str_extract(string=text, pattern="st[a-z]d[a-z]ik")
strings2 <- strings2[!is.na(strings2)]
> strings2
[1] "stxdxik" "stxdxik" "stxdxik"
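Since the question mentions that regmatches did not seem to work, here is a hedged sketch of how it is usually called in base R. It assumes the same sample data; `pattern`, `m` and `strings3` are illustrative names. The key point is that `regmatches()` takes the original character vector together with the match data returned by `regexpr()` on that same vector, and it returns only the substrings that matched.

```r
# Same sample data as in the examples above.
text <- c("tedstxdxiksslker", "janetlkajsdfstxdxikalkse",
          "maggiesdfes", "sdfjkstxdxikryan")
pattern <- "st[a-z]d[a-z]ik"   # illustrative name, not from the original post

# Match data for every entry; entries without a match get -1.
m <- regexpr(pattern, text)

# regmatches() pairs the vector with its match data and drops
# the entries that did not match, so no grep() step is required.
strings3 <- regmatches(text, m)
print(strings3)
```

With this form there is no need to build "ylist" first, because non-matching entries such as "maggiesdfes" are dropped from the result.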
