mapreduce - Hadoop: Getting the input file name in the mapper only once

- May 15, 2011

I am new to Hadoop and currently working with it. I have a small query.

I have around 10 files in an input folder that I need to pass to my MapReduce program. I want the file name in my mapper, because the file name contains the time at which the file was created. I have seen people use FileSplit to get the file name in the mapper. But suppose my input files contain millions of lines: every time the mapper code is called, it would get the file name and extract the time from it, which is obviously a repeated, time-consuming operation for the same file. Once I have the time in the mapper, I should not have to extract it from the file name again and again.

How can I achieve this?

You can use the mapper's setup method to get the file name. The setup method is guaranteed to run only once per map task, before the first call to map(), so the value can be initialized like this:

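    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class MapperRSJ extends Mapper<LongWritable, Text, CompositeKeyWritableRSJ, Text> {
        // Name of the input file behind this task's split; set once in setup().
        String fileName;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            FileSplit fsFileSplit = (FileSplit) context.getInputSplit();
            fileName = fsFileSplit.getPath().getName();
        }

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // process each key-value pair; fileName is already available here
        }
    }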
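Since the question also asks to extract the creation time from the file name only once, the parsing step can live in a small helper that is called from setup() alongside the file-name lookup. The following is a minimal sketch, not part of the original answer: the class name FileNameTime, the method extractTimestamp, and the file-name layout (for example "events_20110515T1030.log") are all assumptions made for illustration, since the question does not say how the time is encoded.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes file names such as "events_20110515T1030.log".
final class FileNameTime {
    // Assumed layout: 8 date digits, a literal 'T', then 4 time digits.
    private static final Pattern STAMP = Pattern.compile("(\\d{8}T\\d{4})");
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuuMMdd'T'HHmm");

    private FileNameTime() {}

    // Returns the timestamp embedded in the file name,
    // or throws IllegalArgumentException if none is found.
    static LocalDateTime extractTimestamp(String fileName) {
        Matcher m = STAMP.matcher(fileName);
        if (!m.find()) {
            throw new IllegalArgumentException("No timestamp in file name: " + fileName);
        }
        return LocalDateTime.parse(m.group(1), FORMAT);
    }
}
```

In the mapper, the result of FileNameTime.extractTimestamp(fsFileSplit.getPath().getName()) could be stored in a field inside setup(), so that map() only reads the already-parsed value for every record.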
