[Hadoop] How do I write a program that runs two MapReduce jobs one after the other?
Asked by: community member
Posted: 2022-05-10 23:28
1 answer
Helpful community member
Answered: 2023-11-14 04:16
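You set each job's input and output paths yourself, so chaining two jobs is just a matter of pointing the second job's input at the first job's output directory.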
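Example (old org.apache.hadoop.mapred API):
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// First job
JobConf conf1 = new JobConf(YourClass.class);
// set configurations
...
// set input and output formats
conf1.setInputFormat(SomeInputFormatExtendsFromInputFormat.class);
conf1.setOutputFormat(SomeOutputFormatExtendsFromOutputFormat.class);
// set input and output paths
FileInputFormat.setInputPaths(conf1, new Path("/your_input_dir"));
FileOutputFormat.setOutputPath(conf1, new Path("/your_first_output_dir"));
// runJob blocks until the job completes and throws IOException if it fails,
// so once it returns the first job has finished.
// To submit asynchronously instead, use new JobClient(conf1).submitJob(conf1).
JobClient.runJob(conf1);

// Second job: configure it the same way, but read the first job's output
JobConf conf2 = new JobConf(YourClass.class);
FileInputFormat.setInputPaths(conf2, new Path("/your_first_output_dir"));
FileOutputFormat.setOutputPath(conf2, new Path("/your_second_output_dir"));
JobClient.runJob(conf2);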
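If the defaults do not fit your data, implement InputFormat and OutputFormat yourself to define how files are split, read, and written. MapReduce ships with several ready-made implementations, such as FileInputFormat and SequenceFileInputFormat. When in doubt, read the source code and it becomes clear. The most basic Hadoop MapReduce documentation is at http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html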
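To make the directory handoff concrete without a cluster, here is a minimal plain-JDK sketch of the same pattern. None of it is Hadoop API: the class LocalChainSketch, its methods, and the directory names stage1-out and stage2-out are all hypothetical stand-ins. Stage 1 counts words and writes a part file into an intermediate directory; stage 2 starts only after stage 1 has returned and reads that directory as its input. Like Hadoop's file output formats, the sketch refuses to write into an output directory that already exists.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Plain-JDK stand-in for two chained jobs; nothing here is Hadoop API. */
final class LocalChainSketch {

    /** Stage 1: count words in every file under inputDir, write "word<TAB>count" lines. */
    static void runWordCount(Path inputDir, Path outputDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : readAllLines(inputDir)) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<String> out = new ArrayList<>();
        counts.forEach((word, n) -> out.add(word + "\t" + n));
        writePart(outputDir, out);
    }

    /** Stage 2: read stage 1's output and group words by count, write "count<TAB>words". */
    static void runGroupByCount(Path inputDir, Path outputDir) throws IOException {
        Map<Integer, List<String>> byCount = new TreeMap<>();
        for (String line : readAllLines(inputDir)) {
            String[] kv = line.split("\t");
            byCount.computeIfAbsent(Integer.parseInt(kv[1]), k -> new ArrayList<>()).add(kv[0]);
        }
        List<String> out = new ArrayList<>();
        byCount.forEach((n, words) -> out.add(n + "\t" + String.join(",", words)));
        writePart(outputDir, out);
    }

    /** Reads every regular file in dir in name order, the way a job reads all part files. */
    static List<String> readAllLines(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.sorted().collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path file : files) {
            if (Files.isRegularFile(file)) {
                lines.addAll(Files.readAllLines(file));
            }
        }
        return lines;
    }

    /** Creates outputDir (fails if it already exists) and writes a single part file. */
    static void writePart(Path outputDir, List<String> lines) throws IOException {
        Files.createDirectory(outputDir); // throws FileAlreadyExistsException on reuse
        Files.write(outputDir.resolve("part-00000"), lines);
    }

    /** Driver: stage 2 runs only after stage 1 returned without throwing. */
    static Path runChain(Path input, Path workDir) throws IOException {
        Path intermediate = workDir.resolve("stage1-out");
        Path finalOut = workDir.resolve("stage2-out");
        runWordCount(input, intermediate);       // first "job"
        runGroupByCount(intermediate, finalOut); // second "job" reads the first one's output
        return finalOut;
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("chain-sketch");
        Path input = Files.createDirectory(work.resolve("input"));
        Files.write(input.resolve("a.txt"), Arrays.asList("to be or not to be"));
        Path result = runChain(input, work);
        Files.readAllLines(result.resolve("part-00000")).forEach(System.out::println);
    }
}
```

The point to carry over to a real driver is the shape of runChain: one intermediate path is used as the first job's output and the second job's input, and the second job is only started once the first call has returned successfully.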