For servers whose RAID is built on an LSI MegaRAID card, the MegaCli tool provided by LSI can be used to monitor both the RAID card and its disks. Note: DELL PERC5/6 (PowerEdge RAID Controller, PERC) array cards are in fact LSI MegaRAID SAS controllers.
Download the latest MegaCli package from:
http://www.lsi.com/Search/Pages/results.aspx?k=megacli&r=assettype%3D%22AQ1NaXNjZWxsYW5lb3VzCWFzc2V0dHlwZQEBXgEk%22%20os%3D%22AQVMaW51eAJvcwEBXgEk%22
1. Prerequisites
1) Check the server model
# dmidecode -s system-product-name (for newer versions of dmidecode)
or
# dmidecode | grep "Product Name" (for older versions of dmidecode)
Lenovo WQ R520 G7
2) Confirm whether a MegaRAID card is in use
--HP ProLiant series servers mostly use Smart Array controllers, so this guide does not apply to them.
--Lenovo WanQuan (万全) series servers may show the following (some models may not be usable?)
# dmesg | grep RAID
scsi0 : LSI SAS based MegaRAID driver
Vendor: LSI Model: MegaRAID 8300XLP Rev: 2.02
md: Autodetecting RAID arrays.
--IBM x series servers may show the following
# dmesg | grep RAID
scsi0 : LSI SAS based MegaRAID driver
Vendor: IBM Model: ServeRAID M5015 Rev: 2.0.
md: Autodetecting RAID arrays.
--Dell PowerEdge series servers may show the following
# dmesg | grep RAID
scsi0 : LSI Logic SAS based MegaRAID driver
md: Autodetecting RAID arrays.
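The dmesg check above can also be scripted. The following is a minimal sketch, assuming the dmesg text has already been captured into a string; the function name find_megaraid_lines is made up for illustration. It keeps only lines that mention MegaRAID or ServeRAID, so the "md: Autodetecting RAID arrays." line, which comes from the Linux software RAID driver, is ignored.

```python
import re

# Hypothetical helper: filter dmesg text for lines that point at an
# LSI MegaRAID-based controller (including IBM's ServeRAID branding).
def find_megaraid_lines(dmesg_text):
    pattern = re.compile(r"MegaRAID|ServeRAID", re.IGNORECASE)
    return [line.strip() for line in dmesg_text.splitlines() if pattern.search(line)]

# Sample taken from the IBM x series output shown above.
sample = """scsi0 : LSI SAS based MegaRAID driver
Vendor: IBM Model: ServeRAID M5015 Rev: 2.0.
md: Autodetecting RAID arrays."""

for line in find_megaraid_lines(sample):
    print(line)
```

An empty result suggests the server does not use a MegaRAID card, although a driver that logs different text would also produce no match.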
3) Check whether MegaCli is already installed
# rpm -qa | egrep 'Lib_Utils|MegaCli'
2. Install MegaCli
It is recommended to download and install the latest MegaCli, since newer versions support monitoring more types of SAS disks.
# cd /tmp
# unzip 8.01.06_Linux_MegaCLI.zip (unpack the MegaCli package)
Archive: 8.01.06_Linux_MegaCLI.zip
inflating: readme.txt
inflating: 8.01.06_Linux_MegaCLI.txt
extracting: MegaCliLin.zip
# unzip MegaCliLin.zip (unpack the inner MegaCliLin package)
Archive: MegaCliLin.zip
inflating: Lib_Utils-1.00-08.noarch.rpm
replace readme.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
inflating: readme.txt
inflating: MegaCli-8.01.06-1.i386.rpm
The MegaCli-8.01.06-1.i386.rpm package is the one we need (the same package is used on both 32-bit and 64-bit systems). If the operating system is missing MegaCli's dependencies, install Lib_Utils-1.00-08.noarch.rpm first:
# rpm -ivh Lib_Utils-1.00-08.noarch.rpm
# rpm -Uvh MegaCli-8.01.06-1.i386.rpm
# rpm -ql MegaCli (list the files installed by the MegaCli package)
/opt/MegaRAID/MegaCli/MegaCli
/opt/MegaRAID/MegaCli/MegaCli64
On a 32-bit system use MegaCli; on a 64-bit system use MegaCli64.
# /opt/MegaRAID/MegaCli/MegaCli (running the command with no arguments prints the error below)
or
# /opt/MegaRAID/MegaCli/MegaCli64 (running the command with no arguments prints the error below)
Fatal error - Command Tool invoked with wrong parameters
Exit Code: 0x01
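The choice between the two binaries can be automated. This is a rough sketch, assuming the install directory listed by rpm -ql above; the function name megacli_path and the set of machine strings treated as 64-bit are assumptions for illustration.

```python
import platform

MEGACLI_DIR = "/opt/MegaRAID/MegaCli"  # install directory reported by rpm -ql

# Hypothetical helper: return the binary matching the machine architecture.
# Only x86_64/amd64 are treated as 64-bit here; anything else falls back
# to the 32-bit binary.
def megacli_path(machine=None):
    machine = (machine or platform.machine()).lower()
    is_64bit = machine in ("x86_64", "amd64")
    return MEGACLI_DIR + ("/MegaCli64" if is_64bit else "/MegaCli")

print(megacli_path())
```

Passing the architecture explicitly, for example megacli_path("i686"), makes the function easy to check without depending on the host it runs on.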
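3. Test MegaCli
# arch (determine the OS architecture)
x86_64
The original file names mix upper and lower case with digits, and the path is long, so it is worth creating a symlink in /usr/bin:
# ln -sf /opt/MegaRAID/MegaCli/MegaCli /usr/bin/megacli (32-bit system)
or
# ln -sf /opt/MegaRAID/MegaCli/MegaCli64 /usr/bin/megacli (64-bit system)
The symlinked command can now be run directly:
# megacli -help (show command help)
# megacli -adpCount (show the number of adapters)
# megacli -LdGetNum -aALL (show the number of logical drives)
# megacli -LdInfo -LALL -aAll (show information for all logical drives; example from an IBM x3650 server)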
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 1.086 TB
State : Optimal
Strip Size : 128 KB
Number Of Drives per span:4 //every 4 physical disks form one RAID1 group
Span Depth : 2 //2 RAID1 groups in total are combined into a RAID10
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy : Read/Write
Disk Cache Policy : Disabled
Encryption Type : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: Yes
Exit Code: 0x00
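For monitoring, the fields of interest in the -LdInfo output are mainly State and the span layout. The sketch below, which assumes the "Key : Value" layout shown above and uses a made-up function name parse_ldinfo, turns the text into one dictionary per virtual drive and derives the drive count as drives per span times span depth.

```python
# Hypothetical parser for `megacli -LdInfo -LALL -aAll` output.
# Assumes each virtual drive starts with a "Virtual Drive:" line and that
# every other field is a single "Key : Value" line.
def parse_ldinfo(text):
    drives, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Virtual Drive:"):
            number = line.split(":", 1)[1].split("(")[0].strip()
            current = {"Virtual Drive": number}
            drives.append(current)
        elif current is not None and ":" in line and not line.startswith("Exit Code"):
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return drives

sample = """Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 1.086 TB
State : Optimal
Number Of Drives per span:4
Span Depth : 2
Exit Code: 0x00"""

for vd in parse_ldinfo(sample):
    total = int(vd["Number Of Drives per span"]) * int(vd["Span Depth"])
    print(vd["Virtual Drive"], vd["State"], total)
```

A check script could then raise an alert whenever State is anything other than Optimal.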
# megacli -PdList -aAll| more (show information for all physical disks; example from an IBM x3650 server)
Adapter #0
Enclosure Device ID: 252
Slot Number: 0
Enclosure position: 0
Device Id: 8
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.464 GB [0x22cee000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x5000cca015512ae5
SAS Address(1): 0x0
Connected Port Number: 1(path0)
Inquiry Data: IBM-ESXSCBRCA300C3ETS0 NC610PFWEMUBECCXSA610
IBM FRU/CRU: 42D0638
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature :38C (100.40 F)
...
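The -PdList output repeats the same block of fields for every disk, so it can be split into per-disk records. This is a simplified sketch: it assumes each record begins with an "Enclosure Device ID" line as in the example above, the function names are made up, and the second disk in the sample (with a Failed state) is hypothetical data added only to exercise the check.

```python
# Hypothetical parser for `megacli -PdList -aAll` output.
def parse_pdlist(text):
    drives, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line or line.startswith("Exit Code"):
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Enclosure Device ID":  # assumed start of a new disk record
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives

# Simplistic health check: any state that does not start with "Online" is
# reported, so hot spares and unconfigured disks would be listed as well.
def slots_not_online(drives):
    return [d["Slot Number"] for d in drives
            if not d.get("Firmware state", "").startswith("Online")]

sample = """Adapter #0
Enclosure Device ID: 252
Slot Number: 0
Firmware state: Online, Spun Up
Drive Temperature :38C (100.40 F)
Enclosure Device ID: 252
Slot Number: 1
Firmware state: Failed"""

print(slots_not_online(parse_pdlist(sample)))
```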
# megacli -cfgdsply -aALL | more (show the RAID card model, RAID configuration, and disk information)
# megacli -FwTermLog -Dsply -aALL | more (view the RAID card log)
# megacli -AdpAllInfo -aALL | more (view detailed information on the RAID card's features)
4. Install check_megaraid_sas
check_megaraid_sas is a Nagios plugin, written in Perl, that collects monitoring information by running MegaCli commands.
Download it from:
http://www.techno-obscura.com/~delgado/code/check_megaraid_sas
# cd /tmp
# vi check_megaraid_sas
-------------------------------------------------------------------------
# change line 35 as follows
use lib qw(/usr/local/nagios/libexec); # possible pathes to your Nagios plugins and utils.pm
# change lines 52-53 as follows
my $megaclibin = '/usr/bin/megacli'; # the full path to your MegaCli binary
my $megacli = "$megaclibin"; # how we actually call MegaCli
-------------------------------------------------------------------------
# cp check_megaraid_sas /usr/local/nagios/libexec/check_megaraid_sas
# chmod 755 /usr/local/nagios/libexec/check_megaraid_sas
# /usr/local/nagios/libexec/check_megaraid_sas -h (show usage help)
Usage: /usr/local/nagios/libexec/check_megaraid_sas [-s number] [-m number] [-o number]
-s is how many hotspares are attached to the controller
-m is the number of media errors to ignore
-p is the predictive error count to ignore
-o is the number of other disk errors to ignore
5. Test check_megaraid_sas
# /usr/local/nagios/libexec/check_megaraid_sas
WARNING: 0:0:RAID-10:6 drives:1.225TB:Optimal Drives:6 (365 Errors)
If errors are reported, the following command shows which physical disks have them:
# megacli -PdList -aAll| egrep "Slot Number|Error Count|Failure Count"
Slot Number: 0
Media Error Count: 0
Other Error Count: 36
Predictive Failure Count: 0
Slot Number: 1
Media Error Count: 0
Other Error Count: 37
Predictive Failure Count: 0
Slot Number: 2
Media Error Count: 0
Other Error Count: 92
Predictive Failure Count: 0
Slot Number: 3
Media Error Count: 0
Other Error Count: 90
Predictive Failure Count: 0
Slot Number: 4
Media Error Count: 0
Other Error Count: 56
Predictive Failure Count: 0
Slot Number: 5
Media Error Count: 0
Other Error Count: 54
Predictive Failure Count: 0
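The per-slot counters above can be totalled to see where the plugin's error figure comes from. The sketch below is an illustration only (the function name and the way the sample is rebuilt are assumptions): it parses the filtered output into a dictionary keyed by slot and sums each counter. For the values listed above, the Other Error Count column adds up to 365, which agrees with the "(365 Errors)" in the plugin output.

```python
ERROR_FIELDS = ("Media Error Count", "Other Error Count", "Predictive Failure Count")

# Hypothetical parser for the egrep-filtered PdList output:
# returns {slot_number: {field_name: count}}.
def error_counts_by_slot(text):
    counts, slot = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Slot Number":
            slot = int(value)
            counts[slot] = {}
        elif slot is not None and key in ERROR_FIELDS:
            counts[slot][key] = int(value)
    return counts

# Rebuild the listing above from its Other Error Count values.
other_errors = [36, 37, 92, 90, 56, 54]
sample = "\n".join(
    f"Slot Number: {i}\nMedia Error Count: 0\n"
    f"Other Error Count: {n}\nPredictive Failure Count: 0"
    for i, n in enumerate(other_errors)
)

counts = error_counts_by_slot(sample)
totals = {f: sum(c.get(f, 0) for c in counts.values()) for f in ERROR_FIELDS}
print(totals)
```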
If you have confirmed that these errors can be ignored, run the plugin as follows:
# /usr/local/nagios/libexec/check_megaraid_sas -o 365
OK: 0:0:RAID-10:6 drives:1.225TB:Optimal Drives:6 (365 Errors)
Output format:
<status> <controller #>:<volume #>:<RAID level>:<volume drive count>:<volume size>:<volume status> Drives:<total drives attached to controller(s)>
All that remains is to set up the Nagios Command and Service, which is not covered in detail here.
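When wiring the plugin into other tooling, the status line in the output format described above can be split into named fields. The following is a sketch under the assumption that the output contains a single volume and looks exactly like the two sample lines in section 5; the regular expression and function name are illustrative and not part of the plugin.

```python
import re

# Assumed layout (single volume):
# <status>: <ctrl>:<vol>:<RAID level>:<n> drives:<size>:<vol status> Drives:<total> (<e> Errors)
LINE_RE = re.compile(
    r"^(?P<status>\w+): (?P<controller>\d+):(?P<volume>\d+):(?P<raid_level>[^:]+):"
    r"(?P<drive_count>\d+) drives:(?P<size>[^:]+):(?P<volume_status>\S+) "
    r"Drives:(?P<total_drives>\d+)(?: \((?P<errors>\d+) Errors\))?"
)

# Hypothetical helper: return the fields as a dict, or None if the line
# does not match the assumed layout.
def parse_plugin_line(line):
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None

fields = parse_plugin_line(
    "WARNING: 0:0:RAID-10:6 drives:1.225TB:Optimal Drives:6 (365 Errors)"
)
print(fields)
```

Controllers with several volumes would need a more tolerant pattern, so treat a None result as "unrecognized output" rather than as a failure of the array.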
--End--