I'm trying to attack the problem of analyzing web logs with Hive, and I've seen plenty of examples out there, but I can't seem to find anyone with this specific issue.
Here's where I'm at: I've set up an AWS Elastic MapReduce cluster, I can log in, and I fire up Hive. I make sure to add jar hive-contrib-0.8.1.jar, and it says it's loaded. I create a table called event_log_raw, with a few string columns and a regex. Then I run load data inpath '/user/hadoop/tmp' overwrite into table event_log_raw, and I'm off to the races.
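For readers following along, here is a minimal sketch of that setup in HiveQL. The three columns are the ones used in the query further down, and the SerDe class is the one named in the stack trace. The regex is a placeholder assumption, not the actual pattern.

```sql
-- Register the contrib jar for this Hive session.
ADD JAR hive-contrib-0.8.1.jar;

-- Columns taken from the query in this post; the contrib RegexSerDe
-- only supports STRING columns.
CREATE TABLE event_log_raw (
  view_time   STRING,
  ip          STRING,
  request_url STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- Placeholder regex (assumption): one capture group per column.
  "input.regex" = "([^ ]*) ([^ ]*) (.*)"
)
STORED AS TEXTFILE;

-- Move the sample file(s) from HDFS into the table.
LOAD DATA INPATH '/user/hadoop/tmp' OVERWRITE INTO TABLE event_log_raw;
```

A plain select * is served by a local fetch inside the Hive client, where the jar is already on the classpath. Any query that launches a MapReduce job instantiates the SerDe on the task nodes.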
select * from event_log_raw works (I think locally, as I don't get the map % and reduce % outputs), and I get my 10 records from my sample data, parsed correctly, everything's good.
select count(*) from event_log_raw works as well, this time with a mapreduce job created.
I want to convert my request_url field to a map, so I run:
select elr.view_time as event_time, elr.ip as ip, str_to_map(split(elr.request_url," "),"&","=") as params from event_log_raw elr
MapReduce fires up, waiting, waiting... FAILED.
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
I check the syslogs from the task trackers and see, among other things,
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    <snip>
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.contrib.serde2.RegexSerDe
    at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:406)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:90)
    ... 22 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.contrib.serde2.RegexSerDe
I've Googled this and searched SO, but I guess my google-fu is not up to snuff. Everything I've found points to folks having trouble with this and solving it by running the add jar command. I've tried that, I've tried adding the jar to my hive-site.xml, I've tried having it locally, and I've tried putting the jar in an S3 bucket. I also tried adding a bootstrap step to add it during the bootstrap phase (disaster).
Can anyone help me figure out (a) why my task nodes can't find RegexSerDe, and (b) how to make this work? Links are welcome as well, if they might reveal something more than just running add jar.
Thanks in advance!