Hadoop notes
Characteristics of MapReduce
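1. The results of Map1 and Map2 overlap. (Traditional distributed computing cannot handle this.)
Approach: Map1 and Map2 pass their data to Reduce unchanged. Problem: the Map tasks end up doing no real work and Reduce is overwhelmed, so divide-and-conquer becomes an empty promise.
How a reduce task works:
Reduce takes the output of map as its input. As soon as one map task finishes, reduce tasks can start, although at that stage they only copy the finished map output; the reduce function itself runs after all map output has been fetched.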
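The overload problem described above can be illustrated with a small word-count sketch in plain Python (not Hadoop code). The function names and the sample splits are hypothetical. The second mapper pre-aggregates its own split, which is one common remedy; the notes above do not say which fix they have in mind, so treat it as an assumption.

```python
from collections import Counter, defaultdict

def map_passthrough(words):
    # Emits one (word, 1) record per input word: no work is done in the map.
    return [(w, 1) for w in words]

def map_with_local_sum(words):
    # Pre-aggregates inside the map task, so each key is emitted once per split.
    return list(Counter(words).items())

def reduce_counts(map_outputs):
    # The reduce side merges the outputs of all map tasks by key.
    totals = defaultdict(int)
    for output in map_outputs:
        for key, value in output:
            totals[key] += value
    return dict(totals)

# Hypothetical input splits whose keys overlap ("a" and "b" occur in both).
split1 = ["a", "b", "a", "a"]
split2 = ["a", "b", "b", "c"]

raw = [map_passthrough(split1), map_passthrough(split2)]
combined = [map_with_local_sum(split1), map_with_local_sum(split2)]

raw_records = sum(len(o) for o in raw)
combined_records = sum(len(o) for o in combined)

print(reduce_counts(raw), raw_records)
print(reduce_counts(combined), combined_records)
```

Both variants produce the same totals, but the pass-through mappers hand every input record to reduce, while the pre-aggregating mappers hand over only one record per key per split.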
㈡ Locating the cause:
INSERT INTO t (col1, col2, col3, col4, col5, col6, col7) VALUES ('3532082239485507011_130_99', '130_99', 130, 99, 3532082239485507011, 2172353000317425008, 29078)
select trx_id,trx_state,trx_started,trx_requested_lock_id,trx_weight,trx_mysql_thread_id from information_schema.innodb_trx where trx_state='RUNNING';
㈣ My question:
Why was this SQL statement, which ran for more than an hour, not recorded in the slow query log?
㈤ The explanation:
Query_time - Lock_time > long_query_time <=== logged
Query_time - Lock_time < long_query_time <=== not logged
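The rule above can be written as a one-line predicate. This is only a sketch of the rule as stated in this post, not MySQL's actual implementation; the function name `is_logged` and all sample numbers (including the 2-second threshold) are hypothetical.

```python
def is_logged(query_time, lock_time, long_query_time):
    # Only the time spent outside lock waits is compared with the threshold.
    return (query_time - lock_time) > long_query_time

# Hypothetical statement that took over an hour, almost all of it lock wait.
print(is_logged(query_time=3700.0, lock_time=3699.5, long_query_time=2.0))

# Hypothetical statement that spent 3 seconds actually executing.
print(is_logged(query_time=3.05, lock_time=0.02, long_query_time=2.0))
```

Under this rule a statement can be blocked for a very long time and still stay out of the slow log, because the lock wait is subtracted before the comparison.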
㈥ Simulated scenarios:
⑴ Query_time - Lock_time > long_query_time
Session_A:
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select emp_no,hire_date from employees where emp_no=10170 for update;
+--------+------------+
| emp_no | hire_date  |
+--------+------------+
|  10170 | 1986-01-02 |
+--------+------------+
1 row in set (0.00 sec)

Session_B:
mysql> select emp_no,hire_date,sleep(3) from employees where emp_no=10170 for update;
After a while, commit in Session_A. Session_B then executes and is recorded in the slow log:
# Time: 140818 22:37:31
# User@Host: root[root] @ localhost [] Id: 1
# Query_time: 3.049016 Lock_time: 0.018891 Rows_sent: 1 Rows_examined: 1
use employees;
SET timestamp=1408372651;
select emp_no,hire_date,sleep(3) from employees where emp_no=10170 for update;
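To read the execution time off an entry like the one above, the header line can be parsed and the two fields subtracted. The following is a minimal sketch that assumes the header layout shown in this entry; the regular expression and the function name `execution_time` are illustrative, not part of any MySQL tool.

```python
import re

# Matches the "# Query_time: ... Lock_time: ..." header line of an entry.
HEADER = re.compile(
    r"# Query_time: (?P<query>\d+\.\d+)\s+Lock_time: (?P<lock>\d+\.\d+)"
)

def execution_time(entry):
    # Returns Query_time minus Lock_time, in seconds, for one log entry.
    match = HEADER.search(entry)
    if match is None:
        raise ValueError("no Query_time/Lock_time header found")
    return float(match.group("query")) - float(match.group("lock"))

entry = """# Time: 140818 22:37:31
# User@Host: root[root] @ localhost [] Id: 1
# Query_time: 3.049016 Lock_time: 0.018891 Rows_sent: 1 Rows_examined: 1
use employees;
SET timestamp=1408372651;
select emp_no,hire_date,sleep(3) from employees where emp_no=10170 for update;
"""

print(round(execution_time(entry), 6))
```

For this entry the difference is about 3.03 seconds, nearly all of it from sleep(3).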
⑵ Query_time - Lock_time < long_query_time
Session_A:
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select emp_no,hire_date from employees where emp_no=10170 for update;
+--------+------------+
| emp_no | hire_date  |
+--------+------------+
|  10170 | 1986-01-02 |
+--------+------------+
1 row in set (0.00 sec)

Session_B:
mysql> select emp_no,hire_date from employees where emp_no=10170 for update;
㈦ What I learned:
Day-to-day performance profiling should cover two aspects:
1) analysis based on execution time
2) analysis based on wait time
By water
Good Luck!