Pig JOIN: using 'replicated' to load the trailing table into memory
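One-sentence summary: when Pig joins two tables with USING 'replicated', the table listed last is loaded into the memory of each map task, which speeds up the JOIN. If the data loaded into memory exceeds the JVM heap limit, however, the job fails with an error (see the log below).

Before the Lunar New Year holiday I wrote a Pig script for processing user sessions. It passed all of my tests, the data looked fine, and I happily went home for the holiday.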
2013-02-16 12:40:23,520 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://hd09:50030/jobdetails.jsp?jobid=job_201301221227_72618
2013-02-16 13:47:50,157 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 80% complete
2013-02-16 13:47:52,171 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201301221227_72618 has failed! Stop running all dependent jobs
2013-02-16 13:47:52,171 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-02-16 13:47:52,175 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Task attempt_201301221227_72618_m_000000_1 failed to report status for 1201 seconds. Killing!
2013-02-16 13:47:52,176 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-02-16 13:47:52,178 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
Exception in thread "Thread for syncLogs" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2894)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589)
    at java.lang.StringBuilder.append(StringBuilder.java:220)
    at java.io.UnixFileSystem.resolve(UnixFileSystem.java:108)
    at java.io.File.<init>(File.java:329)
    at org.apache.hadoop.mapred.TaskLog.getAttemptDir(TaskLog.java:267)
    at org.apache.hadoop.mapred.TaskLog.getAttemptDir(TaskLog.java:260)
    at org.apache.hadoop.mapred.TaskLog.getIndexFile(TaskLog.java:237)
    at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:316)
    at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
    at org.apache.hadoop.mapred.Child$3.run(Child.java:141)
Exception in thread "LeaseChecker" java.lang.OutOfMemoryError: Java heap spac
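-- A is a single dt partition; B is listed last in the JOIN, so the replicated join loads it into memory
A = LOAD 'A/dt=2013-02-14' USING PigStorage('\u0001') AS (id:int, name:chararray);
B = LOAD 'B/*' USING PigStorage('\u0001') AS (id:int, gender:chararray);
-- gender is a field of B, so it is referenced as B::gender (A has no gender field)
C = FOREACH (JOIN A BY id, B BY id USING 'replicated') GENERATE A::id, A::name, B::gender;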
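
To make it clearer why the size of the last table matters, here is a minimal Python sketch of the idea behind a replicated (map-side) join. It is only an illustration under stated assumptions, not Pig's actual implementation: the function name `replicated_join` and the sample rows are hypothetical, and only the relation names A, B and C follow the script above. The point is that the trailing relation has to fit into an in-memory lookup table before the other relation can be streamed past it.

```python
from collections import defaultdict


def replicated_join(big_rows, small_rows, key=0):
    """Inner join that keeps small_rows entirely in memory (illustrative only)."""
    # Build an in-memory hash table from the small (replicated) relation.
    # This is the step that can exhaust the heap when the relation is too large.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Stream the big relation and probe the table; no shuffle or reduce is needed.
    for row in big_rows:
        for match in lookup.get(row[key], ()):
            yield row + match


# Hypothetical sample data shaped like the relations in the script: (id, name) and (id, gender).
A = [(1, "alice"), (2, "bob"), (3, "carol")]
B = [(1, "F"), (2, "M")]

# Equivalent of GENERATE A::id, A::name, B::gender
C = [(a_id, name, gender) for a_id, name, _b_id, gender in replicated_join(A, B)]
print(C)
```

In this sketch, memory use grows with the size of `small_rows` only, which mirrors why a replicated join is fast for a genuinely small trailing table and fails with a heap error once that table no longer fits.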