excel - Using R to reformat data from cross-tab to one-datum-per-line format -
i'm using r pull in data through api , merge of single table, write csv file. graph in tableau, however, need prepare data using reformatting tool excel cross-tablulated format format each line contains 1 piece of data. example, taking format:
id,gender,school,math,english,science 1,m,west,90,80,70 2,f,south,50,50,50
to:
id,gender,school,subject,score 1,m,west,math,90 1,m,west,english,80 1,m,west,science,70 2,f,south,math,50 2,f,south,english,50 2,f,south,science,50
are there existing tools in r or in r library allow me this, or provide starting point? trying automate preparation of data tableau need run single script formatted properly, , remove manual excel step if possible.
in r , several other programs, process referred "reshaping" data. in fact, tableau page that linked to speaks of "excel reshaper plugin".
in base r, there few functions reshape data, such (notorious) reshape()
function takes panel data wide form long form, , stack()
creates skinny stacks of data.
the "reshape2" package seems more popular such data transformations, though. here's example of "melting" sample data, i've stored in data.frame
named "mydf":
library(reshape2) melt(mydf, id.vars=c("id", "gender", "school"), value.name="score", variable.name="subject") # id gender school subject score # 1 1 m west math 90 # 2 2 f south math 50 # 3 1 m west english 80 # 4 2 f south english 50 # 5 1 m west science 70 # 6 2 f south science 50
for example, base r's reshape()
isn't appropriate, stack()
is. here, i've stack
ed last 3 columns:
stack(mydf[4:6]) # values ind # 1 90 math # 2 50 math # 3 80 english # 4 50 english # 5 70 science # 6 50 science
to data.frame
looking for, cbind
first 3 columns above output.
for reference, hadley wickham's tidy data paper entry point thinking how structure of data might facilitate further processing , visualization.
Comments
Post a Comment