mapreduce - Hadoop - get results from output files after reduce?


Given a job with map and reduce phases, I can see that the output folder contains files named like "part-r-00000".

If I need to post-process these files at the application level, do I need to iterate over the files in the output folder in natural naming order (part-r-00000, part-r-00001, part-r-00002, ...) in order to get the job results?

Or can I use some Hadoop helper file reader that gives me an "iterator" and handles the file switching for me (when part-r-00000 has been read completely, it continues with part-r-00001)?

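In MapReduce you specify an output folder. The only things it will contain are the part-r files (which are the output of the reduce tasks) and a _SUCCESS file (which is empty). So I think that if you want to do post-processing, you only need to set the output dir of job 1 as the input dir of job 2.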

Now, there might be some requirements of your post-processor that need to be addressed. For example, is it important to process the output files in order?
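If file order does matter and the output folder has already been copied to the local filesystem, the "iterator" asked about in the question can be sketched in a few lines. This is only a minimal sketch, not a Hadoop API: the function name `iter_part_lines` and the `part-r-*` pattern are assumptions made for illustration, and it assumes the part files are plain text.

```python
import glob
import os


def iter_part_lines(output_dir, pattern="part-r-*"):
    """Yield lines from every matching part file, in file-name order.

    Hypothetical helper: assumes the job output was copied to a local
    directory and that the part files are plain text.
    """
    # The zero-padded suffix (00000, 00001, ...) means a plain
    # lexicographic sort visits the files in numeric order.
    # The _SUCCESS marker does not match the pattern, so it is skipped.
    for path in sorted(glob.glob(os.path.join(output_dir, pattern))):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")
```

The caller sees one continuous stream of lines and never has to deal with the switch from one part file to the next.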

Or, if you just want to process the files locally, then it all depends on the OutputFormat of your MapReduce job, which tells you how the part-r files are structured. Then you can simply use standard I/O, I guess.
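As an illustration of the local case, here is a rough sketch that assumes the job used the default text output, where each line is a key and a value separated by a tab. The helper names `parse_record` and `total_by_key`, and the assumption that the values are integers (as in a word count), are made up for this example. A different OutputFormat would need a different parser.

```python
from collections import defaultdict


def parse_record(line, separator="\t"):
    """Split one output line into (key, value) at the first separator."""
    key, _, value = line.partition(separator)
    return key, value


def total_by_key(lines):
    """Sum integer values per key across all given lines.

    Assumes word-count style records such as "hadoop<TAB>3".
    """
    totals = defaultdict(int)
    for line in lines:
        if not line:
            continue  # ignore blank lines
        key, value = parse_record(line)
        totals[key] += int(value)
    return dict(totals)
```

Because the totals are accumulated per key, this particular post-processing step gives the same result whatever order the part files are read in.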

