虽然有zabbix等工具监控着pg数据库,但还是有很多信息不能监控到,所以想通过定期分析pg的日志来掌握数据库的运行状态。pgbadger是一款开源的日志分析工具,采用perl编写,项目地址为:
Options:
-a | --average minutes : number of minutes to build the average graphs of
queries and connections. Default 5 minutes.
-A | --histo-avg minutes: number of minutes to build the histogram graphs
of queries. Default 60 minutes.
-b | --begin datetime : start date/time for the data to be parsed in log.
-B | --bar-graph : use bar graph instead of line by default.
-c | --dbclient host : only report on entries for the given client host.
-C | --nocomment : remove comments like /* ... */ from queries.
-d | --dbname database : only report on entries for the given database.
-D | --dns-resolv : client ip adresses are replaced by their DNS name.
Be warned that this can really slow down pgBadger.
-e | --end datetime : end date/time for the data to be parsed in log.
-f | --format logtype : possible values: syslog,stderr,csv. Default: stderr.
-G | --nograph : disable graphs on HTML output. Enabled by default.
-h | --help : show this message and exit.
-i | --ident name : programname used as syslog ident. Default: postgres
-I | --incremental : use incremental mode, reports will be generated by
days in a separate directory, --outdir must be set.
-j | --jobs number : number of jobs to run at same time. Default is 1,
run as single process.
-J | --Jobs number : number of log file to parse in parallel. Default
is 1, run as single process.
-l | --last-parsed file: allow incremental log parsing by registering the
last datetime and line parsed. Useful if you want
to watch errors since last run or if you want one
report per day with a log rotated each week.
-m | --maxlength size : maximum length of a query, it will be restricted to
the given size. Default: no truncate
-M | --no-multiline : do not collect multiline statement to avoid garbage
especially on errors that generate a huge report.
-n | --nohighlight : disable SQL code highlighting.
-N | --appname name : only report on entries for given application name
-o | --outfile filename: define the filename for the output. Default depends
on the output format: out.html, out.txt, out.bin,
out.json or out.tsung.
With module JSON::XS installed, you can output file
in JSON format either.
To dump output to stdout use - as filename.
-O | --outdir path : directory where out file must be saved.
-p | --prefix string : the value of your custom log_line_prefix as
defined in your postgresql.conf. Only use it if you
aren't using one of the standard prefixes specified
in the pgBadger documentation, such as if your prefix
includes additional variables like client ip or
application name. See examples below.
-P | --no-prettify : disable SQL queries prettify formatter.
-q | --quiet : don't print anything to stdout, not even a progress bar.
-r | --remote-host ip : set the host where to execute the cat command on remote
logfile to parse localy the file.
-R | --retention N : number of week to keep in incremental mode. Default 0,
disabled. Used to set the number of weel to keep in
output directory. Older week end day directory are
automatically removed.
-s | --sample number : number of query samples to store/display. Default: 3
-S | --select-only : only report SELECT queries.
-t | --top number : number of queries to store/display. Default: 20
-T | --title string : change title of the HTML page report.
-u | --dbuser username : only report on entries for the given user.
-U | --exclude-user username : exclude entries for the specified user from report.
-v | --verbose : enable verbose or debug mode. Disabled by default.
-V | --version : show pgBadger version and exit.
-w | --watch-mode : only report errors just like logwatch could do.
-x | --extension : output format. Values: text, html, bin, json or
tsung. Default: html
-X | --extra-files : in incremetal mode allow pgbadger to write CSS and JS
files in the output directory as separate files.
-z | --zcat exec_path : set the full path to the zcat program. Use it if
zcat or bzcat or unzip is not in your path.
--pie-limit num : pie data lower than num% will show a sum instead.
--exclude-query regex : any query matching the given regex will be excluded
from the report. For example: "^(VACUUM|COMMIT)"
You can use this option multiple times.
--exclude-file filename: path of the file which contains all the regex to use
to exclude queries from the report. One regex per line.
--include-query regex : any query that does not match the given regex will be
excluded from the report. For example: "(table_1|table_2)"
You can use this option multiple times.
--include-file filename: path of the file which contains all the regex of the
queries to include from the report. One regex per line.
--disable-error : do not generate error report.
--disable-hourly : do not generate hourly report.
--disable-type : do not generate report of queries by type, database...
--disable-query : do not generate query reports (slowest, most
frequent, queries by users, by database, ...).
--disable-session : do not generate session report.
--disable-connection : do not generate connection report.
--disable-lock : do not generate lock report.
--disable-temporary : do not generate temporary report.
--disable-checkpoint : do not generate checkpoint/restartpoint report.
--disable-autovacuum : do not generate autovacuum report.
--charset : used to set the HTML charset to be used. Default: utf-8.
--csv-separator : used to set the CSV field separator, default: ,
--exclude-time regex : any timestamp matching the given regex will be
excluded from the report. Example: "2013-04-12 .*"
You can use this option multiple times.
--exclude-appname name : exclude entries for the specified application name
from report. Example: "pg_dump".
--exclude-line regex : pgbadger will start to exclude any log entry that
will match the given regex. Can be used multiple time.
--anonymize : obscure all literals in queries to hide confidential data.
把此out.html拉到本地用浏览器打开即可查看到精美的画面,其中信息有增删改查统计、事件、vacuum、临时文件、session、connection、checkpoint等信息。
postgres:/var/pg_log@pgdb1>pgbadger -b "2014-10-26 10:00:00" -e "2014-10-27 10:00:00" /var/pg_log/* -o /tmp/tmp_pgbadger.html
[======================> ] Parsed 59206 bytes of 62700 (94.43%), queries: 0, events: 0
[=======================> ] Parsed 94 bytes of 96 (97.92%), queries: 0, events: 0
[======================> ] Parsed 29695217 bytes of 31457483 (94.40%), queries: 0, events: 0
[=======================> ] Parsed 94 bytes of 96 (97.92%), queries: 0, events: 0
...
[======================> ] Parsed 29693436 bytes of 31457418 (94.39%), queries: 2250, events: 3
[======================> ] Parsed 29685718 bytes of 31457312 (94.37%), queries: 4884, events: 23
[======================> ] Parsed 8645318 bytes of 9156095 (94.42%), queries: 4884, events: 24
LOG: Ok, generating html report...
Wide character in print at /usr/local/bin/pgbadger line 7007.
Wide character in print at /usr/local/bin/pgbadger line 7007.
Wide character in print at /usr/local/bin/pgbadger line 7996.
可以看到通过/var/pg_log/*的方式指定日志的话,其实pgbadger要扫描所有的文件来分析这个时间段的日志信息。
2.3 定期增量分析日志
由于日志分析是一个日常性的运维工作,因此建议定期增量分析,定期就是使用crontab等工具定期执行日志分析命令,增量是指利用pgbadger的增量分析功能,分析过的日志是不会再分析的。
要进行增量分析,必须要先建立一个目录,让分析过程中产生的结果文件保存到此目录,并生成指示下次分析开始位置的文件。
日常分析:
[root@pgdb1 ~]# mkdir /tmp/pgbadger_log/
[root@pgdb1 ~]# chown -R postgres:postgres /tmp/pgbadger_log/
定时任务:
0 4 * * * /usr/local/bin/pgbadger --retention 4 -I -q /var/pg_log/* -O /tmp/pgbadger_log/
通过以上命令即可把日志输出到 /tmp/pgbadger_log/目录,并保持4周的循环,过期的分析结果将被删掉(注意:生产环境最好别放到/tmp目录下,否则重启机器后结果文件可能会丢失)。
下面手工先创建执行下试试:
postgres:/var/pg_log@pgdb1>/usr/local/bin/pgbadger --retention 4 -I -q /var/pg_log/* -O /tmp/pgbadger_log/
Wide character in print at /usr/local/bin/pgbadger line 7996.
Wide character in print at /usr/local/bin/pgbadger line 7996.
Wide character in print at /usr/local/bin/pgbadger line 7007.
....
速度比较慢,因为有/var/pg_log/下的日志有20来个,接近500M了
执行完后会在/tmp/pgbadger_log/目录下生成如下文件:
[root@pgdb1 pgbadger_log]# ls
2014 index.html LAST_PARSED
[root@pgdb1 pgbadger_log]# tree 2014/
2014/
├── 10
│ ├── 18
│ │ ├── 2014-10-18-875.bin
│ │ ├── 2014-10-18-876.bin
│ │ ├── 2014-10-18-877.bin
│ │ └── index.html
...
│ ├── 26
│ │ ├── 2014-10-26-875.bin
│ │ ├── 2014-10-26-876.bin
│ │ ├── 2014-10-26-877.bin
│ │ └── index.html
│ └── 27
│ ├── 2014-10-27-875.bin
│ ├── 2014-10-27-876.bin
│ └── index.html
├── week-42
│ └── index.html
├── week-43
│ └── index.html
└── week-44
└── index.html
14 directories, 44 files
-I为增量模式,可以看到,按日期生成了一堆的bin二进制文件,并用index索引每天的结果,如果装了apache可以访问服务器上的这些文件的,由于我的是数据库服务器,没有安装apache,所以要把整个目录拖到本地才能看,打开顶层的index.html就可以选择看哪周、哪天的日志。
再次执行增量分析命令就快很多了:
postgres:/var/pg_log@pgdb1> /usr/local/bin/pgbadger --retention 4 -I -q /var/pg_log/* -O /tmp/pgbadger_log/
Wide character in print at /usr/local/bin/pgbadger line 7007.
Wide character in print at /usr/local/bin/pgbadger line 7007.
Wide character in print at /usr/local/bin/pgbadger line 7007.
Wide character in print at /usr/local/bin/pgbadger line 7167.
2.4 仅分析并输出错误信息
postgres:/var/pg_log@pgdb1> pgbadger -q -w /var/pg_log/postgresql-2014-10-27_060717.csv -o /tmp/pg_errors.html
-w选项只过滤错误相关的信息,这样输出文件很小,便于之查看错误信息。
2.5 并行模式分析
postgres:/tmp/pgbadger_log/2014@pgdb1> /usr/local/bin/pgbadger -j 4 --retention 4 -I -q /var/pg_log/* -O /tmp/pgbadger_log/
-j选项为并行模式,其中还有-J模式,前者为并行N个进程分析一个文件,后者为并行N个进程同时分析N个文件。两者实测速度差不多。
还可以指定其他选项:
-S选项,只分析查询语句
-t选项,列出指定数目的top sql
-u和-U选项,仅列出指定用户或去除指定用户的查询
-x选项,指定输出格式,如果不想用浏览器查看的可以选择txt等格式
--exclude-query选项和--include-query,可以指定正则表达式过滤想要分析或去除的sql
有了这些数据,可以像pt-query-digist工具分析mysql慢查询日志一样分析postgresql的数据库运行状态了。