mapreduce - Hadoop: Getting the input file name in the mapper only once
I am new to Hadoop and have a small question.
I have around 10 files in an input folder that I need to pass to my MapReduce program. I want the file name in my mapper, because the file name contains the time at which the file was created. I have seen people use FileSplit to get the file name in the mapper. But suppose my input files contain millions of lines: every time the mapper code is called, it will get the file name and extract the time from it, which is obviously a repeated, time-consuming operation for the same file. Once I have the time in the mapper, I should not have to extract it from the file name again and again.
How can I achieve this?
You can use the mapper's setup() method to get the file name. setup() is guaranteed to run only once per map task, before any call to map(), so the file name can be initialized there like this:
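    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // CompositeKeyWritableRSJ is this job's own output key class (not shown).
    public class MapperRSJ extends Mapper<LongWritable, Text, CompositeKeyWritableRSJ, Text> {

        String filename;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Runs once per map task, before the first map() call.
            FileSplit fsFileSplit = (FileSplit) context.getInputSplit();
            filename = fsFileSplit.getPath().getName();
        }

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // process each key value pair; filename is already set
        }
    }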
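The time extraction itself can also be done once in setup() rather than in map(). The question does not say how the time is encoded in the file name, so the following is only a minimal sketch in plain Java that assumes a hypothetical naming scheme containing a yyyyMMdd_HHmmss stamp (for example sensor_20140312_153000.log). The class name FileNameTime and the method parse are made up for illustration and are not part of Hadoop.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls a creation time out of a file name.
public class FileNameTime {

    // ASSUMPTION: the file name contains a stamp like 20140312_153000.
    // Adjust the regex and the formatter to the real naming scheme.
    private static final Pattern STAMP = Pattern.compile("(\\d{8}_\\d{6})");
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    // Returns the parsed time, or empty if the name has no valid stamp.
    public static Optional<LocalDateTime> parse(String fileName) {
        Matcher m = STAMP.matcher(fileName);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDateTime.parse(m.group(1), FORMAT));
        } catch (DateTimeParseException e) {
            // Digits matched the pattern but are not a real date/time.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("sensor_20140312_153000.log"));
        System.out.println(parse("part-00000"));
    }
}
```

In the mapper, you would call something like FileNameTime.parse(filename) at the end of setup() and keep the result in a field next to filename, so that map() only reads the stored value for every record.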