GentooLinuxな生活/サーバー監視方法で悩む (Gentoo Linux Life/Agonizing over how to monitor the server)
Nest Of Hawk xpwiki
[[Gentoo Linux Life/System administration>Gentoo Linuxな生活/システム管理関連]]
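*What server monitoring has to cover
When it comes to what server monitoring needs to check, it is this:
-Is it running properly?
I think that really sums it up. So, as for what makes a server stop running properly...
-Software faults
-Hardware faults
It comes down to these two. Software faults usually become obvious right after you install something...
*What to keep an eye on
First, which failure hurts the most? The HDD. HDDs are consumables, and...
*Monitor the temperature
Because CPUs give off so much heat these days, recent PCs come with sensor chips...
**lm_sensors
With a tool called lm_sensors, you can read the sensor chip's information...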
# emerge lm_sensors
Installation complete.
**Configuration
***Kernel
First, enable the kernel's I2C driver configuration...
<*> I2C support
<M> I2C device interface
I2C Algorithms --->
I2C Hardware Bus support --->
Hardware Sensors Chip support --->
Other I2C Chip support --->
Then, under I2C Hardware Bus support and Hardware Sensors Chip ...
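To double-check the options above on a running system, here is a minimal Python sketch. The mapping from menu entries to Kconfig symbols (`CONFIG_I2C`, `CONFIG_I2C_CHARDEV`) is an assumption, not something this page lists. The config locations are also assumed: `/proc/config.gz` only exists when the kernel was built with `CONFIG_IKCONFIG_PROC`, and `/usr/src/linux/.config` is the usual Gentoo source tree. The board-specific sensor drivers are not checked, because they depend on the chip.

```python
import gzip
from pathlib import Path

# Assumed mapping from the menu entries above to Kconfig symbols.
# "y" = built in (<*>), "m" = module (<M>); either is accepted here.
REQUIRED = {
    "I2C support": "CONFIG_I2C",
    "I2C device interface": "CONFIG_I2C_CHARDEV",
}


def parse_kconfig(text: str) -> dict[str, str]:
    """Return {symbol: value} for every CONFIG_X=value line of a kernel .config."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # "# CONFIG_X is not set" lines and blank lines are skipped.
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value
    return options


def missing_options(text: str) -> list[str]:
    """List the menu entries whose symbol is neither y nor m."""
    options = parse_kconfig(text)
    return [
        f"{entry} ({symbol})"
        for entry, symbol in REQUIRED.items()
        if options.get(symbol) not in ("y", "m")
    ]


def load_kernel_config() -> str:
    """Read the running kernel's config, falling back to the Gentoo source tree."""
    proc = Path("/proc/config.gz")  # only present with CONFIG_IKCONFIG_PROC
    if proc.exists():
        return gzip.decompress(proc.read_bytes()).decode()
    return Path("/usr/src/linux/.config").read_text()


if __name__ == "__main__":
    try:
        config_text = load_kernel_config()
    except OSError as err:
        print("could not read a kernel config:", err)
    else:
        for item in missing_options(config_text):
            print("not enabled:", item)
```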
***lm_sensors side
# sensors-detect
Run this command. It then walks you through a menu-style dialog for saving the settings...
# rc-update add lm_sensors default
Once lm_sensors is set to start at boot with this command, the i2c devices...
**Usage
# sensors
w83697hf-isa-0290
Adapter: ISA adapter
VCore: +1.46 V (min = +1.71 V, max = +1.89 V)
+3.3V: +3.20 V (min = +3.14 V, max = +3.47 V)
+5V: +4.89 V (min = +4.76 V, max = +5.24 V)
+12V: +11.55 V (min = +10.82 V, max = +13.19 V)
-12V: -11.70 V (min = -13.18 V, max = -10.80 V)
-5V: -7.71 V (min = -5.25 V, max = -4.75 V)
V5SB: +5.54 V (min = +4.76 V, max = +5.24 V)
VBat: +3.06 V (min = +2.40 V, max = +3.60 V)
fan1: 0 RPM (min = 51923 RPM, div = 2)
fan2: 0 RPM (min = 225000 RPM, div = 2)
temp1: +29°C (high = +2°C, hyst = +0°C) s...
temp2: +17.0°C (high = +80°C, hyst = +75°C) s...
alarms: Chassis intrusion detection ...
beep_enable:
Sound alarm disabled
eeprom-i2c-1-51
Adapter: SiS96x SMBus adapter at 0x10c0
Memory type: DDR SDRAM DIMM
Memory size (MB): 512
eeprom-i2c-1-50
Adapter: SiS96x SMBus adapter at 0x10c0
Memory type: DDR SDRAM DIMM
Memory size (MB): 512
eeprom-i2c-0-50
Adapter: ivtv i2c driver #0
Unknown EEPROM type (255).
adm1030-i2c-1-2e
Adapter: SiS96x SMBus adapter at 0x10c0
CPU Fan: 2537 RPM (min = 1323 RPM, div = 2)
SYS Temp: +46.8°C (low = +0°C, high = +60°C)
SYS Crit: +85°C
CPU Temp: +48.2°C (low = +40°C, high = +52°C)
CPU Crit: +85°C
There you go. See, it shows up properly, right? Actually, the C...
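Reading these numbers by eye does not help a server that should raise alerts on its own. This minimal Python sketch scans `sensors` output for temperatures above their printed `high` limit. It assumes the default text layout shown above (`label: +NN.N°C (... high = +NN°C ...)`) and a UTF-8 degree sign. Both can differ between lm_sensors versions and locales. The limits come from the chip defaults or the lm_sensors configuration file, so a bogus limit such as `high = +2°C` in the output above gets reported as well.

```python
import re

# Matches lines such as
#   CPU Temp:    +48.2°C  (low  =  +40°C, high =  +52°C)
# capturing the label, the reading and, if printed, the "high" limit.
TEMP_LINE = re.compile(
    r"^(?P<label>[^:]+):\s*(?P<value>[+-]?\d+(?:\.\d+)?)°C"
    r"(?:.*?high\s*=\s*(?P<high>[+-]?\d+(?:\.\d+)?)°C)?"
)


def over_high(sensors_output: str) -> list[tuple[str, float, float]]:
    """Return (label, reading, high) for every temperature above its high limit.

    Lines without a "high" limit (e.g. "SYS Crit: +85°C") and non-temperature
    lines (voltages, fans, adapter names) are ignored.
    """
    hits = []
    for line in sensors_output.splitlines():
        m = TEMP_LINE.match(line.strip())
        if m and m.group("high") is not None:
            value, high = float(m.group("value")), float(m.group("high"))
            if value > high:
                hits.append((m.group("label").strip(), value, high))
    return hits
```

To check live data, pass `subprocess.run(["sensors"], capture_output=True, text=True).stdout` to `over_high`.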
***Sensor setup on the ASUS P8H77-V motherboard
In February 2013 I swapped the server's motherboard, and with lm_sensors...
sys-apps/lm_sensors
I added the above and ran sensors-detect. It turned up a sensor chip called nct6775...
# git clone https://github.com/groeck/nct6775.git
This checks out the kernel driver. Then compile it yourself...
Then write /etc/conf.d/lm_sensors as follows:
#
# The format of this file is a shell script that simply ...
# HWMON_MODULES for hardware monitoring driver modules, ...
# BUS_MODULES for any required bus driver module (for ex...
# Load modules at startup
LOADMODULES=yes
# Initialize sensors at startup
INITSENSORS=yes
HWMON_MODULES="coretemp nct6775"
# For compatibility reasons, modules are also listed ind...
# MODULE_0, MODULE_1, MODULE_2, etc.
# Please note that the numbers in MODULE_X must start at...
# steps of 1. Any number that is missing will make the i...
# rest of the modules. Use MODULE_X_ARGS for arguments.
#
# You should use BUS_MODULES and HWMON_MODULES instead i...
MODULE_0=coretemp
MODULE_1=nct6775
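The comments in this file describe two ways to list modules. One is `HWMON_MODULES`. The other is the numbered `MODULE_0`, `MODULE_1`, ... variables, whose numbering must start at 0 and go up by 1. The sentence about a missing number is cut off above. This minimal Python sketch assumes it means the modules after a gap are not read, and checks that both lists agree. It only handles plain `NAME=value` lines and does no shell expansion. It illustrates the numbering rule and does not replace the init script.

```python
import shlex


def read_assignments(text: str) -> dict[str, str]:
    """Collect NAME=value lines; comments and blank lines are ignored.

    Simplification: the real file is sourced by a shell script, while this
    only strips shell quoting and does no variable expansion.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        values[name.strip()] = " ".join(shlex.split(raw))
    return values


def indexed_modules(values: dict[str, str]) -> list[str]:
    """Read MODULE_0, MODULE_1, ... in order and stop at the first gap (assumed behavior)."""
    modules = []
    index = 0
    while f"MODULE_{index}" in values:
        modules.append(values[f"MODULE_{index}"])
        index += 1
    return modules


def lists_agree(text: str) -> bool:
    """True if HWMON_MODULES and the MODULE_X entries name the same modules in the same order."""
    values = read_assignments(text)
    return values.get("HWMON_MODULES", "").split() == indexed_modules(values)
```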
Then, when I ran the sensors command...
acpitz-virtual-0
Adapter: Virtual device
temp1: +27.8°C (crit = +106.0°C)
temp2: +29.8°C (crit = +106.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +38.0°C (high = +85.0°C, crit = +105.0°C)
Core 0: +36.0°C (high = +85.0°C, crit = +105.0°C)
Core 1: +36.0°C (high = +85.0°C, crit = +105.0°C)
Core 2: +28.0°C (high = +85.0°C, crit = +105.0°C)
Core 3: +30.0°C (high = +85.0°C, crit = +105.0°C)
nct6779-isa-0290
Adapter: ISA adapter
in0: +0.91 V (min = +0.00 V, max = ...
in1: +1.02 V (min = +0.00 V, max = ...
in2: +3.36 V (min = +0.00 V, max = ...
in3: +3.36 V (min = +0.00 V, max = ...
in4: +1.01 V (min = +0.00 V, max = ...
in5: +2.04 V (min = +0.00 V, max = ...
in6: +0.37 V (min = +0.00 V, max = ...
in7: +3.47 V (min = +0.00 V, max = ...
in8: +3.39 V (min = +0.00 V, max = ...
in9: +1.06 V (min = +0.00 V, max = ...
in10: +0.34 V (min = +0.00 V, max = ...
in11: +0.17 V (min = +0.00 V, max = ...
in12: +1.01 V (min = +0.00 V, max = ...
in13: +1.02 V (min = +0.00 V, max = ...
in14: +0.22 V (min = +0.00 V, max = ...
fan1: 0 RPM (min = 0 RPM)
fan2: 1140 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +24.0°C (high = +0.0°C, hyst =...
CPUTIN: +31.0°C (high = +80.0°C, hyst =...
AUXTIN0: +95.0°C sensor = thermistor
AUXTIN1: +96.0°C sensor = thermistor
AUXTIN2: +104.0°C sensor = thermistor
AUXTIN3: +17.0°C sensor = thermal diode
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
PCH_MCH_TEMP: +0.0°C
cpu0_vid: +0.000 V
intrusion0: ALARM
intrusion1: ALARM
The data came through nicely. Phew...
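Once drivers such as `coretemp` and `nct6775` are loaded, the kernel also exposes the readings under `/sys/class/hwmon`. That is handy for scripts that should not depend on the `sensors` text format. This is a minimal Python sketch that assumes the standard hwmon sysfs layout: a `name` file, `tempN_input` in millidegrees Celsius, and an optional `tempN_label`. Some older drivers put these files under `hwmonN/device/` instead, and this sketch does not look there.

```python
from pathlib import Path

HWMON_ROOT = Path("/sys/class/hwmon")


def read_temperatures(root: Path = HWMON_ROOT) -> dict[str, float]:
    """Return {"<chip>/<label>": degrees_C} from the hwmon sysfs tree.

    temp*_input holds millidegrees Celsius; temp*_label is optional, and the
    bare prefix (e.g. "temp2") is used when it is missing.
    """
    readings = {}
    for chip_dir in sorted(root.glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("temp*_input")):
            prefix = input_file.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip_dir / f"{prefix}_label"
            label = label_file.read_text().strip() if label_file.exists() else prefix
            try:
                millideg = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # unreadable or non-numeric sensor: skip it
            readings[f"{chip}/{label}"] = millideg / 1000
    return readings
```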
**smart daemon
"smart" here means S.M.A.R.T., the HDD's self-diagnostic feature; by installing this...
***Installation
# emerge smartmontools
Done.
***Configuration
Open /etc/smartd.conf. Using the examples already written in it as a reference...
# DEVICESCAN ← comment this out.
/dev/sda -a -d sat -o on -S on -s (S/../.././04|L/../../...
To explain: first, if DEVICESCAN is present, all the settings after it are...
Now all that is left is to start smartd:
# rc-update add smartd default
This starts S.M.A.R.T. daemon monitoring. By the way, with the comma...
# smartctl -a /dev/hda
=== START OF INFORMATION SECTION ===
Device Model: HDS722516VLAT20
Serial Number: VNR4GMC4GJR2KM
Firmware Version: V34OA60A
User Capacity: 164,696,555,520 bytes
Device is: In smartctl database [for details use:...
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 3a
Local Time is: Sun Apr 17 00:10:59 2005 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data col...
was never started.
Auto Offline Data...
Self-test execution status: ( 0) The previous sel...
without error or ...
been run.
Total time to complete Offline
data collection: (3585) seconds.
Offline data collection
capabilities: (0x1b) SMART execute Of...
Auto Offline data...
Suspend Offline c...
command.
Offline surface s...
Self-test support...
No Conveyance Sel...
No Selective Self...
SMART capabilities: (0x0003) Saves SMART data...
power-saving mode.
Supports SMART au...
Error logging capability: (0x01) Error logging su...
General Purpose L...
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 60) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH ...
1 Raw_Read_Error_Rate 0x000b 100 100 060 P...
2 Throughput_Performance 0x0005 100 100 050 P...
3 Spin_Up_Time 0x0007 134 134 024 P...
4 Start_Stop_Count 0x0012 100 100 000 O...
5 Reallocated_Sector_Ct 0x0033 100 100 005 P...
7 Seek_Error_Rate 0x000b 100 100 067 P...
8 Seek_Time_Performance 0x0005 100 100 020 P...
9 Power_On_Hours 0x0012 099 099 000 O...
10 Spin_Retry_Count 0x0013 100 100 060 P...
12 Power_Cycle_Count 0x0032 100 100 000 O...
192 Power-Off_Retract_Count 0x0032 100 100 050 ...
193 Load_Cycle_Count 0x0012 100 100 050 ...
194 Temperature_Celsius 0x0002 125 125 000 ...
196 Reallocated_Event_Count 0x0032 100 100 000 ...
197 Current_Pending_Sector 0x0022 100 100 000 ...
198 Offline_Uncorrectable 0x0008 100 100 000 ...
199 UDMA_CRC_Error_Count 0x000a 200 200 000 ...
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use...
Device does not support Selective Self Tests/Logging
You get information like this. Pretty detailed, isn't it? With this, ...
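The attribute table above is easier to watch with a script. This minimal Python sketch picks out attributes whose normalized VALUE has dropped to or below THRESH, which is the usual failing condition. Attributes with a threshold of 000 are skipped because they are informational and cannot fail that way. The sketch assumes the column order shown above (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, ...). It ignores the raw values, which are vendor specific. For the output above it returns an empty list.

```python
def failing_attributes(smartctl_output: str) -> list[tuple[int, str, int, int]]:
    """Return (id, name, value, thresh) for attributes at or below their threshold.

    Parses the "Vendor Specific SMART Attributes with Thresholds" table
    printed by `smartctl -a`; the table ends at the first non-numeric row.
    """
    failing = []
    in_table = False
    for line in smartctl_output.splitlines():
        if line.lstrip().startswith("ID#"):
            in_table = True  # header row: data rows follow
            continue
        if not in_table:
            continue
        fields = line.split()
        if not fields or not fields[0].isdigit():
            break  # end of the attribute table
        attr_id, name = int(fields[0]), fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        if thresh > 0 and value <= thresh:
            failing.append((attr_id, name, value, thresh))
    return failing
```

To check a live disk, pass `subprocess.run(["smartctl", "-a", "/dev/sda"], capture_output=True, text=True).stdout` to `failing_attributes`.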