[Hadoop] How should I write my program if I want to run two MapReduce programs in sequence, one after the other?

Asked by a user on 2022-05-10 23:28

1 answer

Answered by a helpful user on 2023-11-14 04:16

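You set the input and output paths of each job yourself, so it is enough to point the second job's input path at the first job's output directory.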
Example (old org.apache.hadoop.mapred API):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.*;

JobConf conf1 = new JobConf(YourClass.class);
// set configurations
...
// set input and output formats
conf1.setInputFormat(SomeInputFormatExtendsFromInputFormat.class);
conf1.setOutputFormat(SomeOutputFormatExtendsFromOutputFormat.class);
// set input and output paths
FileInputFormat.setInputPaths(conf1, "/your_input_dir");
FileOutputFormat.setOutputPath(conf1, new Path("/your_first_output_dir"));
// runJob blocks until the job completes and throws an IOException if it fails
JobClient.runJob(conf1);
// At this point the first job has finished. To submit asynchronously instead, use new JobClient(conf1).submitJob(conf1).
JobConf conf2 = new JobConf(YourClass.class);
// do the same for conf2, except that its input path is the first job's output directory
FileInputFormat.setInputPaths(conf2, "/your_first_output_dir");
FileOutputFormat.setOutputPath(conf2, new Path("/your_second_output_dir"));
JobClient.runJob(conf2);
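To define how files are split, read, and written, extend InputFormat and OutputFormat yourself. MapReduce ships with some ready-made implementations, such as FileInputFormat and SequenceFileInputFormat. When necessary, read the source code and it becomes clear. The most basic Hadoop MapReduce documentation is at http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html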
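If it helps to see the hand-off in isolation, here is a minimal plain-Java sketch of the same data flow. It does not use any Hadoop classes: ChainSketch, stage1, stage2 and runChain are hypothetical names made up for illustration, and the two stages (a word count, then grouping words by count) are only stand-ins for real jobs. The point it tries to show is that the second stage starts only after the first has returned, and that it reads from the directory the first stage wrote.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java stand-in for two chained jobs; hypothetical names, no Hadoop API.
final class ChainSketch {

    // Reads all lines of every regular file in a directory (the "input path").
    static List<String> readDir(Path dir) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Stream<Path> files = Files.list(dir)) {
            List<Path> sorted = files.filter(Files::isRegularFile)
                                     .sorted()
                                     .collect(Collectors.toList());
            for (Path f : sorted) {
                lines.addAll(Files.readAllLines(f));
            }
        }
        return lines;
    }

    // Writes one part file into a new directory (the "output path").
    // Like a Hadoop job, this fails if the output directory already exists.
    static void writeDir(Path dir, List<String> lines) throws IOException {
        Files.createDirectory(dir);
        Files.write(dir.resolve("part-00000"), lines);
    }

    // Stage 1: counts words and writes "word<TAB>count" lines.
    public static void stage1(Path in, Path out) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : readDir(in)) {
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) {
                    counts.merge(w, 1, Integer::sum);
                }
            }
        }
        List<String> result = new ArrayList<>();
        counts.forEach((word, n) -> result.add(word + "\t" + n));
        writeDir(out, result);
    }

    // Stage 2: reads stage 1 output and writes "count<TAB>word1,word2" lines.
    public static void stage2(Path in, Path out) throws IOException {
        Map<Integer, List<String>> byCount = new TreeMap<>();
        for (String line : readDir(in)) {
            String[] kv = line.split("\t");
            byCount.computeIfAbsent(Integer.parseInt(kv[1]), k -> new ArrayList<>())
                   .add(kv[0]);
        }
        List<String> result = new ArrayList<>();
        byCount.forEach((n, words) -> result.add(n + "\t" + String.join(",", words)));
        writeDir(out, result);
    }

    // Driver: stage 2 starts only after stage 1 has returned without throwing,
    // and its input directory is stage 1's output directory.
    public static void runChain(Path input, Path intermediate, Path finalOut)
            throws IOException {
        stage1(input, intermediate);
        stage2(intermediate, finalOut);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("chain-sketch");
        Path input = Files.createDirectory(base.resolve("input"));
        Files.write(input.resolve("sample.txt"), Arrays.asList("a b a", "c b a"));
        runChain(input, base.resolve("first_output"), base.resolve("second_output"));
        Files.readAllLines(base.resolve("second_output").resolve("part-00000"))
             .forEach(System.out::println);
    }
}
```

In a real Hadoop driver the same shape applies: the blocking call for the first job plays the role of stage1 returning, and the intermediate directory is whatever path was given as the first job's output.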