Problem Description
hadoop -libjars and ClassNotFoundException
Please help, I'm stuck. Here is the command I use to run the job:
hadoop jar mrjob.jar ru.package.Main -files hdfs://0.0.0.0:8020/MyCatalog/jars/metadata.csv -libjars hdfs://0.0.0.0:8020/MyCatalog/jars/opencsv.jar,hdfs://0.0.0.0:8020/MyCatalog/jars/gson.jar,hdfs://0.0.0.0:8020/MyCatalog/jars/my-utils.jar /MyCatalog/http_requests.seq-r-00000 /MyCatalog/output/result_file
I do get these WARNs:
12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/opencsv.jar is not on the local filesystem. Ignoring.
12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/gson.jar is not on the local filesystem. Ignoring.
12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/my-utils.jar is not on the local filesystem. Ignoring.
Then: Exception in thread "main" java.lang.NoClassDefFoundError: on the line in the Main class where I try to instantiate a class from the jar named my-utils.jar.
- All these jars are in HDFS (I can see them through the file browser)
- my-utils.jar does contain the class that triggers the NoClassDefFoundError
What am I doing wrong?
UPD: I'm inspecting the source code of GenericOptionsParser:
/**
 * If libjars are set in the conf, parse the libjars.
 * @param conf
 * @return libjar urls
 * @throws IOException
 */
public static URL[] getLibJars(Configuration conf) throws IOException {
  String jars = conf.get("tmpjars");
  if (jars == null) {
    return null;
  }
  String[] files = jars.split(",");
  List<URL> cp = new ArrayList<URL>();
  for (String file : files) {
    Path tmp = new Path(file);
    if (tmp.getFileSystem(conf).equals(FileSystem.getLocal(conf))) {
      cp.add(FileSystem.getLocal(conf).pathToFile(tmp).toURI().toURL());
    } else {
      LOG.warn("The libjars file " + tmp + " is not on the local " +
          "filesystem. Ignoring.");
    }
  }
  return cp.toArray(new URL[0]);
}
So: 1. there are no spaces around the commas; 2. I still don't get it... I've tried pointing to the local file system and to HDFS; the result is the same. It seems like the class is not added...
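For context on where this method runs: GenericOptionsParser is normally driven by ToolRunner, which strips -files/-libjars from the command line and sets "tmpjars" in the Configuration before the job is submitted. A minimal driver sketch, not from the original post (the class name and the job setup are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Main extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // By the time run() is called, ToolRunner/GenericOptionsParser has
        // already consumed -files and -libjars; args only holds the remaining
        // arguments (input/output paths). Build and submit the Job here
        // using getConf().
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new Main(), args));
    }
}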
-----
Reference Solutions
Solution 1:
Problem is solved. The correct invocation is:
hadoop jar my-job.jar ru.package.Main -files /home/cloudera/uploaded_jars/metadata.csv -libjars /home/cloudera/uploaded_jars/opencsv.jar,/home/cloudera/uploaded_jars/gson.jar,/home/cloudera/uploaded_jars/url-raiting-utils.jar /MyCatalog/http_requests.seq-r-00000 /MyCatalog/output/scoring_result
where
/MyCatalog
is an HDFS path and
/home/cloudera/uploaded_jars/
is a local filesystem path.
The problem was in the job jar. Previously I tried to run the job using a simple jar containing only three classes: the Mapper, the Reducer, and the Main class. Now I provide the other jar generated by Maven (it generates two of them). The second job jar contains all the dependency libs inside it. The structure looks like:
my-job.jar
-lib
--aopalliance-1.0.jar asm-3.2.jar avro-1.5.4.jar ... commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar ... zookeeper-3.4.3-cdh4.0.0.jar
There are 76 jars inside the lib folder.
It works but I don't understand why.
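A quick way to confirm what actually got packaged (assuming the jar name above) is to list the job jar's contents:
jar tf my-job.jar | grep '^lib/'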
Solution 2:
Just because they are on HDFS doesn't mean they are on the classpath of the job you are running.
If you really just want to fix this problem, I would use Maven to build a "fat jar" which contains all your dependencies in a single jar. You can do this using the Shade plugin.
But, looking at your command, it looks wrong. I think you might have better luck using the "job" command with -libjars, described here. I'm not sure that you can specify external jars using the "hadoop jar" command.
Solution 3:
The reason is that your mrjob.jar determines the jars needed by your Hadoop client job. Either provide a fat jar or include all of your jars in HADOOP_CLASSPATH.
On the other hand, -libjars sets the additional jars needed for the Map and Reduce tasks.
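A sketch of how the two fit together (the paths are reused from the question and answer above, purely for illustration): HADOOP_CLASSPATH makes the dependency jars visible to the client JVM that runs the Main class, while -libjars ships them to the map and reduce tasks.
export HADOOP_CLASSPATH=/home/cloudera/uploaded_jars/opencsv.jar:/home/cloudera/uploaded_jars/gson.jar:/home/cloudera/uploaded_jars/my-utils.jar
hadoop jar mrjob.jar ru.package.Main -libjars /home/cloudera/uploaded_jars/opencsv.jar,/home/cloudera/uploaded_jars/gson.jar,/home/cloudera/uploaded_jars/my-utils.jar /MyCatalog/http_requests.seq-r-00000 /MyCatalog/output/result_file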
Read this: http://grepalex.com/2013/02/25/hadoop-libjars/
(by Capacytron, Capacytron, Paul Sanwald, GC001)