1. MCELOG

mcelog is a tool that logs and accounts machine check errors (in particular memory, IO, and CPU hardware errors) on modern x86 Linux systems.

mcelog logs machine check errors on 32-bit x86 Linux kernels (2.6.30 and later) and on 64-bit Linux kernels (2.6 and later). It should run on all Linux systems that need error handling.

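The mcelog daemon accounts memory errors and some other errors in various ways. mcelog --client can be used to query a running daemon. The daemon can also execute triggers when configured error thresholds are exceeded. This is used to implement a range of automatic predictive failure analysis (PFA) algorithms, including bad page offlining and automatic cache error handling. Users can also define their own actions.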

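All errors are logged to /var/log/mcelog, to syslog, or to another log.

For memory errors it supports modern x86 systems with an integrated memory controller. CPU errors are supported on all x86 systems.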

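Traditionally mcelog was run from a cron job, but this usage is now deprecated. The current way to run it is as a daemon that is started at boot and keeps running. In addition, it can decode fatal machine checks on the command line (when it runs automatically at boot, this is not needed on newer operating systems).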

For installation information and how to set up the mcelog package (if you are an administrator), see the README.

2. Download

mcelog uses a rolling release model. Please get the latest version from git:

git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git

The latest changes are shown in gitweb.

If kernel.org is not reachable, there is a backup download location:

git://github.com/andikleen/mcelog.git

3. Installation

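3.1. mcelog installation

Quick start

The standard approach is to use the mcelog package shipped with your distribution, but if that mcelog is very old or misconfigured you can install a new one.

If your distribution still uses the old crontab-based mcelog, you need to avoid conflicts. The simplest way is to remove mcelog's cron job files from /etc/cron.*.

Build either the git version or a tarball. The git version is currently recommended and has more features.

Get the git version: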

git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git

(or, if you already have a clone, cd mcelog ; git pull -u)

Build and install

cd mcelog
make
... become root ...
make install

Installation on Linux systems with init scripts

Install the mcelog.init script and make sure it runs at boot. For example, on openSUSE:

cp mcelog.init /etc/init.d/mcelog
chkconfig mcelog on

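On other distributions you may need to find the equivalent of chkconfig. mcelog.init also has some settings that can be changed, although the defaults are reasonable.

If you do not want to reboot the system, you can also start the daemon with /etc/init.d/mcelog start. If the configuration file has been modified correctly, the process does not need to be restarted.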

Installation on systemd-based systems

mcelog includes a standard systemd unit file, mcelog.service. Install the service file:

cp mcelog.service /usr/lib/systemd/system
systemctl enable mcelog.service

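Installation on upstart-based systems

Use the init script here.

Notes for other installation setups

If you have not configured mcelog to log to syslog or the journal (journald), you may need to set up log rotation for /var/log/mcelog (for example with logrotate). This can usually be done by creating a file in the /etc/logrotate.d directory (or by reusing an existing one).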
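As a minimal sketch of such a file, the function below prints a logrotate stanza for /var/log/mcelog. The function name and the rotation policy (weekly, four rotations, compression) are illustrative assumptions, not settings shipped with mcelog. copytruncate is chosen on the assumption that the daemon keeps the log file open while it runs.

```shell
#!/bin/sh
# Print an example logrotate stanza for the mcelog log file.
# All policy values below are illustrative assumptions.
print_mcelog_logrotate() {
    cat <<'EOF'
/var/log/mcelog {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}

# Example (as root): print_mcelog_logrotate > /etc/logrotate.d/mcelog
```

Adjust the policy to local retention rules before installing the file.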

You can verify that the daemon is running by running:

mcelog --client

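This queries the running daemon for information. If there is no output, the system is healthy (no errors have been recorded).

Dependencies

Make sure /dev/mcelog exists on your system. If it does not, create it with mknod /dev/mcelog c 10 227. On udev-based systems you can also add the following to a udev rules file in /usr/lib/udev/rules.d: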

ACTION=="add", KERNEL=="mcelog", SUBSYSTEM=="misc", TAG+="systemd", ENV{SYSTEMD_WANTS}+="mcelog.service"

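This is needed on many newer releases where /dev is not persistent. Distribution packages usually take care of it.

For bad page offlining you need a 2.6.33+ kernel, or a 2.6.32 kernel with the feature backported (such as RHEL6 or SLES11-SP1).

The kernel must have CONFIG_X86_MCE enabled. 32-bit kernels need 2.6.30 or later.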
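The checks above can be sketched as a few shell helpers. The helper names are hypothetical, and the /boot/config-$(uname -r) location of the kernel configuration is an assumption that holds on many distributions but not all.

```shell
#!/bin/sh
# Return success if the given path is a character device node.
is_char_device() {
    [ -c "$1" ]
}

# Return success if a plain-text kernel config file has the option built in.
# $1: path to the config file, $2: option name such as CONFIG_X86_MCE
config_enabled() {
    grep -q "^$2=y\$" "$1"
}

# Report missing prerequisites on stdout. The config path is an assumption.
check_mcelog_prereqs() {
    is_char_device /dev/mcelog || echo "missing /dev/mcelog"
    cfg="/boot/config-$(uname -r)"
    if [ -r "$cfg" ]; then
        config_enabled "$cfg" CONFIG_X86_MCE || echo "CONFIG_X86_MCE not enabled"
    fi
}
```

Calling check_mcelog_prereqs prints nothing when both checks pass.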

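mcelog run modes

mcelog has three run modes: cronjob, trigger, and daemon.

The recommended mode is daemon, because several newer features (such as page error predictive failure analysis) require a continuously running daemon. If you only want to use daemon mode, you can skip the rest of this section.

In daemon mode mcelog runs continuously as a daemon in the background and waits for errors. Run mcelog --daemon & from an init script. This is the fastest and most feature-rich mode.

cronjob is the old method. mcelog runs from cron every 5 minutes and checks for errors. The disadvantage is that it can delay error reporting significantly (up to 10 minutes) and does not allow mcelog to keep extended state.

trigger is a newer method in which the kernel runs mcelog when an error occurs. Enable it with: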

echo /usr/sbin/mcelog > /sys/devices/system/machinecheck/machinecheck0/trigger

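This is faster, but it still does not allow mcelog to keep state, and it has a relatively high overhead per error because a program has to be initialized from scratch each time.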

4. Configuration

4.1.1. mcelog.conf reference

mcelog is configured by editing the configuration file /etc/mcelog.conf.

  • General format
 optionname = value

Blank values are not allowed, except to remove a setting.

In general, command-line options can also be set here; see man mcelog or mcelog --help for the list of options. For example, to enable the --no-syslog option use:

 no-syslog = yes (or no to disable)

When the option takes an argument:

 logfile = /tmp/logfile

  • The options below cannot be specified on the command line

Set the CPU type for which mcelog decodes events.

 cpu = type

For the valid values of type, see mcelog --help. If this parameter is set incorrectly, the decoded output will also be wrong. By default, when this parameter is not set, mcelog uses the CPU it is running on. On very new kernels the mcelog events reported by the kernel also carry the CPU type, which is used when available and not overridden.

Run in daemon mode:

 daemon = yes

By default mcelog just processes the currently pending events and exits. In daemon mode it will keep running as a daemon in the background and poll the kernel for events and then decode them.

Filter out known broken events by default.

 filter = yes

Don't log memory errors individually. They still get accounted if that is enabled.

 filter-memory-errors = yes

Output in undecoded raw format to be easier machine readable (default is decoded).

 raw = yes

Set the CPU MHz to decode uptime from the time stamp counter (output is unreliable, and not needed on new kernels, which report the event time directly). A lot of systems don't have a linear time stamp clock, and the output is wrong then. Normally mcelog tries to figure out if the TSC is reliable and only uses the current frequency then. Setting a frequency forces timestamp decoding. This setting is obsolete with modern kernels, which report the time directly.

 cpumhz = 1800.00

Log output options

Log decoded machine checks in syslog (default stdout or syslog for daemon)

 syslog = yes

Log decoded machine checks in syslog with error level

 syslog-error = yes

Never log anything to syslog

 no-syslog = yes

Append log output to logfile instead of stdout. Only when no syslog logging is active

 logfile = filename

Use smbios information to decode dimms (needs root). This function is not recommended to use right now and generally not needed. The exception is memdb prepopulation, which is configured separately below.

 dmi = no

When in daemon mode run as this user after set up. Note that the triggers will run as this user too. Setting this to non root will mean that triggers cannot take some corrective action, like offlining objects.

 run-credentials-user = root

Group to run as in daemon mode. Defaults to the group of the run-credentials-user.

 run-credentials-group = nobody

The [server] config section

User allowed to access the client socket. When set to *, any user matches. root is always allowed access. Default: root only.

 client-user = root

Group allowed to access mcelog. When no group is configured, any group matches (but the user is still checked). When set to *, any group matches.

 client-group = root

Path to the unix socket for client<->server communication. When no socket-path is configured the server will not start

 socket-path = /var/run/mcelog-client

When mcelog starts it checks if a server is already running. This configures the timeout for this check.

 initial-ping-timeout = 2

The [dimm] config section

Is the in memory dimm error tracking enabled? Only works on systems with integrated memory controller and which are supported. Only takes effect in daemon mode.

 dimm-tracking-enabled = yes

Use DMI information from the BIOS to prepopulate DIMM database. Note this might not work with all BIOS and requires mcelog to run as root. Alternative is to let mcelog create DIMM objects on demand.

 dmi-prepopulate = yes

Execute these triggers when the rate of corrected or uncorrected Errors per DIMM exceeds the threshold. Note when the hardware does not report DIMMs this might also be per channel. The default of 10/24h is reasonable for server quality DDR3 DIMMs as of 2009/10.

 uc-error-trigger = dimm-error-trigger
 uc-error-threshold = 1 / 24h
 ce-error-trigger = dimm-error-trigger
 ce-error-threshold = 10 / 24h
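To make the count / time-window notation concrete, here is a small sketch that splits a threshold such as 10 / 24h into an event count and a window length in seconds. The function name is hypothetical. Only the h suffix appears in this document; the d, m, and s suffixes handled below are assumptions.

```shell
#!/bin/sh
# Convert a threshold such as "10 / 24h" into "<count> <window-seconds>".
# Only the "h" suffix appears in this document; d, m and s are assumptions.
threshold_to_seconds() {
    spec=$(printf '%s' "$1" | tr -d ' ')    # strip spaces, e.g. "10/24h"
    count=${spec%%/*}                        # part before the slash
    window=${spec#*/}                        # part after the slash
    num=${window%?}                          # window without its last character
    unit=${window#"$num"}                    # the last character (time unit)
    case $unit in
        d) mult=86400 ;;
        h) mult=3600 ;;
        m) mult=60 ;;
        s) mult=1 ;;
        *) echo "unknown time unit in '$1'" >&2; return 1 ;;
    esac
    echo "$count $((num * mult))"
}
```

For example, threshold_to_seconds "10 / 24h" prints 10 86400, that is, ten events within a window of 86400 seconds.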

The [socket] config section

Enable memory error accounting per socket.

 socket-tracking-enabled = yes

Threshold and trigger for uncorrected memory errors on a socket.

mem-uc-error-trigger = socket-memory-error-trigger

 mem-uc-error-threshold = 100 / 24h

Trigger script for corrected memory errors on a socket.

 mem-ce-error-trigger = socket-memory-error-trigger

Threshold on when to trigger a corrected error for the socket.

 mem-ce-error-threshold = 100 / 24h

Log socket error threshold explicitly?

 mem-ce-error-log = yes

Trigger script for uncorrected bus error events

 bus-uc-threshold-trigger = bus-error-trigger

Trigger script for uncorrected IOMCA errors

 iomca-threshold-trigger = iomca-error-trigger

Trigger script for other uncategorized errors

 unknown-threshold-trigger = unknown-error-trigger

The [cache] config section

Processing of cache error thresholds reported by Intel CPUs.

 cache-threshold-trigger = cache-error-trigger

Should cache threshold events be logged explicitly?

 cache-threshold-log = yes

The [page] config section

Memory error accounting per 4k memory page. Threshold for the corrected memory errors trigger script.

 memory-ce-threshold = 10 / 24h

Trigger script for corrected errors.

 memory-ce-trigger = page-error-trigger

Should page threshold events be logged explicitly?

 memory-ce-log = yes

Specify the internal action mcelog takes when a page error threshold is exceeded. This is done in addition to executing the trigger script, if available.

memory-ce-action = off|account|soft|hard|soft-then-hard

 memory-ce-action = soft

  • off: no action

  • account: only account errors

  • soft: try to soft-offline the page without killing any processes. This requires an up-to-date kernel and might not be successful.

  • hard: try to hard-offline the page by killing processes. This requires an up-to-date kernel and might not be successful.

  • soft-then-hard: first try to soft-offline, then try hard-offlining
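The following sketch illustrates the policy behind these values using the kernel's soft_offline_page and hard_offline_page sysfs files. It is not mcelog's internal implementation. The function names and the MEMORY_SYSFS override are hypothetical, and the account value is treated here simply as "do not offline".

```shell
#!/bin/sh
# Illustration of the memory-ce-action values; not mcelog's own code.
# MEMORY_SYSFS can be overridden to try the logic on a scratch directory.
MEMORY_SYSFS=${MEMORY_SYSFS:-/sys/devices/system/memory}

# $1: soft or hard, $2: physical address of the page (e.g. 0x12345000)
offline_page() {
    [ -w "$MEMORY_SYSFS/${1}_offline_page" ] || return 1
    echo "$2" > "$MEMORY_SYSFS/${1}_offline_page" 2>/dev/null
}

# $1: off|account|soft|hard|soft-then-hard, $2: page address
apply_page_action() {
    case $1 in
        off|account)    return 0 ;;                 # nothing to offline
        soft|hard)      offline_page "$1" "$2" ;;
        soft-then-hard) offline_page soft "$2" || offline_page hard "$2" ;;
        *)              echo "unknown action: $1" >&2; return 2 ;;
    esac
}
```

The soft-then-hard branch only falls back to hard offlining when the soft attempt fails, which matches the description above.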

The [trigger] config section

Maximum number of running triggers

children-max = 2

execute triggers in this directory

directory = /etc/mcelog
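The optionname = value format and the [section] headers used throughout this reference can be read with a few lines of awk. The sketch below is an illustration only: conf_get is a hypothetical helper, and the treatment of lines starting with # as comments is an assumption.

```shell
#!/bin/sh
# Print the value of an option from an mcelog.conf-style file.
# $1: file, $2: section name ("" for options before any [section]), $3: option
conf_get() {
    awk -v want="$2" -v opt="$3" '
        /^[ \t]*#/ { next }                    # assumed comment syntax
        /^[ \t]*\[.*\][ \t]*$/ {               # a [section] header line
            sec = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", sec)
            next
        }
        sec == want {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the option name
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim the value
            if (key == opt) { print val; exit }
        }
    ' "$1"
}
```

For example, conf_get /etc/mcelog.conf page memory-ce-action prints the configured page action, or nothing if the option is not set in that section.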

4.2. Glossary

  • Machine Check: a hardware fault that is detected and logged by the system.

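  • Machine Check Architecture (MCA): the x86 hardware programming interface that allows software to detect and handle hardware faults. It is an abstracted interface that allows the operating system to handle errors generically. It is described in detail in the Intel Architecture Software Developer's Manual, Volume 3, Chapter 15.

  • Machine Check Exception (MCE): exception 18, raised by x86 CPUs to signal an uncorrected hardware error. The operating system has a special handler that processes the information in the MCA registers.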

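  • Error Correcting Code (ECC): a special code that can detect and repair errors. Commonly used ECC codes can detect 2-bit errors and correct 1-bit errors (some advanced codes can handle more). See the Wikipedia article on ECC. Server memory generally supports ECC.

  • Corrected error: a hardware error that the hardware repaired automatically (for example, a single-bit fault corrected using ECC). These errors do not need immediate action from software, but they are still reported, accounted, and used for predictive failure analysis.

  • Uncorrected error: a hardware error that the hardware could not correct. Data has been corrupted. This kind of error requires a prompt response from software.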

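  • Predictive Failure Analysis (PFA): using trends in corrected errors to predict future hardware failures, and automatically taking steps to avoid some outages. mcelog implements automatic offlining for memory pages and CPU caches. In addition, users can configure their own actions.

  • IO-MCA: reports uncorrected PCIe link errors on the newest Xeon systems. It is supported by mcelog; see the IO errors reporting section for details.

  • PCI AER (PCI-Express Advanced Error Reporting): used to report errors on PCIe links. It is not handled by mcelog, but it produces kernel log messages. For details see the OLS paper or IO-MCA.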

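  • RAS: Reliability, Availability, Serviceability.

  • DIMM: memory module.

  • DMI (or SMBIOS): a standard way for the BIOS to report the current hardware configuration to the operating system. DMI information is printed by the dmidecode program. mcelog uses this information, where possible, to map DIMM numbers to silk screen labels.

  • APEI: an interface defined by the ACPI 4 standard that allows the BIOS to report errors to the operating system. It was previously known as WHEA (Windows Hardware Error Architecture).

  • EDAC: an alternative memory error reporting framework; see the FAQ for details.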

6. Triggers

In addition to global counters mcelog also maintains thresholds using a leaky-bucket algorithm. When the number of errors in a specific time window exceeds a pre-configured threshold a trigger will be executed. Triggers are usually shell scripts in the /etc/mcelog directory but can be also other internal actions. Thresholds and triggers can be configured in mcelog.conf

Triggers run as the user configured for mcelog in mcelog.conf (by default root). The default trigger action can be overridden by specifying a different trigger script in the configuration file. Actions in addition to the default trigger (like notifying an administrator) can be put into the respective /etc/mcelog/*.local script, which is executed after the default action. This allows updating the default scripts without overwriting local actions. All trigger actions are also logged to syslog.

The error flow gives an overview of the various triggers (note some are missing).
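The leaky-bucket thresholding described at the start of this section can be illustrated with a deliberately simplified sketch. It is a generic leaky bucket, not mcelog's implementation. The function and variable names are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```shell
#!/bin/sh
# Simplified leaky bucket: events fill the bucket, and it drains at roughly
# "capacity" events per window. The trigger fires when the bucket is full.
bucket_init() {      # $1: capacity (events), $2: window (seconds), $3: now
    BUCKET_CAPACITY=$1
    BUCKET_WINDOW=$2
    BUCKET_COUNT=0
    BUCKET_TSTAMP=$3
}

# Account one event at time $1 (seconds). Returns 0 when the bucket reaches
# its capacity (a trigger would run), 1 otherwise.
bucket_account() {
    elapsed=$(( $1 - BUCKET_TSTAMP ))
    leaked=$(( elapsed * BUCKET_CAPACITY / BUCKET_WINDOW ))
    if [ "$leaked" -gt 0 ]; then
        if [ "$BUCKET_COUNT" -gt "$leaked" ]; then
            BUCKET_COUNT=$(( BUCKET_COUNT - leaked ))
        else
            BUCKET_COUNT=0
        fi
        BUCKET_TSTAMP=$1
    fi
    BUCKET_COUNT=$(( BUCKET_COUNT + 1 ))
    if [ "$BUCKET_COUNT" -ge "$BUCKET_CAPACITY" ]; then
        BUCKET_COUNT=0      # start over after firing
        return 0
    fi
    return 1
}
```

With bucket_init 3 100 0, three events at seconds 1, 2, and 3 fire on the third event, while events spaced 50 seconds apart never fill the bucket because it drains between them.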

6.1. The DIMM and socket memory error triggers

The /etc/mcelog/dimm-error-trigger and /etc/mcelog/socket-memory-error-trigger scripts are executed when a DIMM or a CPU socket exceeds a configured corrected or uncorrected memory error threshold. The thresholds are configured in the mcelog.conf [dimm] and [socket] sections. The default triggers log a warning message in the system log. The triggers are only executed when mcelog runs as a daemon. Arguments are passed as environment variables:

| Variable | Description |
| --- | --- |
| THRESHOLD | Human readable threshold status |
| MESSAGE | Human readable consolidated error message |
| TOTALCOUNT | Total corrected or uncorrected count of errors for current DIMM, depending on what triggered the event |
| LOCATION | Consolidated location as a single string |
| DMI_LOCATION | DIMM location from DMI/SMBIOS if available |
| DMI_NAME | DIMM identifier from DMI/SMBIOS if available |
| DIMM | DIMM number reported by hardware |
| CHANNEL | Channel number reported by hardware |
| SOCKETID | Socket ID of CPU that includes the memory controller with the DIMM |
| CECOUNT | Total corrected error count for DIMM |
| UCCOUNT | Total uncorrected error count for DIMM |
| LASTEVENT | Time stamp of event that triggered threshold (in time_t format, seconds) |
| THRESHOLD_COUNT | Total number of events in current threshold time period of specific type |

After the default action local actions in /etc/mcelog/dimm-error-trigger.local or respective /etc/mcelog/socket-memory-error-trigger.local are executed.
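As a sketch of what such a .local script might contain, the function below builds a one-line summary from the environment variables listed above. The function name, the fallback texts, and the output format are hypothetical, and the exact content mcelog puts in each variable is not specified here.

```shell
#!/bin/sh
# Sketch for /etc/mcelog/dimm-error-trigger.local: summarize the event
# using the variables that mcelog passes in the environment.
dimm_alert_line() {
    # Prefer the DMI name when the BIOS provides one, else the raw location.
    where=${DMI_NAME:-${LOCATION:-unknown location}}
    echo "mcelog: $where: ${THRESHOLD:-threshold exceeded} (CE=${CECOUNT:-?} UC=${UCCOUNT:-?})"
}

# In a real .local script the line could be forwarded, for example:
#   dimm_alert_line | logger -t mcelog-local
```

Keeping the formatting in a function makes it easy to reuse the same line for mail or another notification channel.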

6.2. The page error trigger

The /etc/mcelog/page-error-trigger script is executed by mcelog in daemon mode when a page in memory exceeds a pre-configured corrected or uncorrected error threshold. mcelog internally also implements offlining the page through the kernel. This is configured through the [page] section of mcelog.conf.

The environment arguments are the same as for the dimm-error-trigger script

After the default action local actions in /etc/mcelog/page-error-trigger.local are executed.

6.3. The cache error trigger

The /etc/mcelog/cache-error-trigger shell script is called for cache error handling in daemon mode when a CPU reports excessive corrected cache errors. This could be an indication of future uncorrected errors.

This trigger is configured through the cache section in the configuration file. The threshold is defined by the CPU. The default trigger offlines the affected CPU cores, unless it is the last core running. For more details see extended cache error handling.

Arguments are passed as environment variables

| Variable | Description |
| --- | --- |
| MESSAGE | Human readable error message |
| CPU | Linux CPU number that triggered the error |
| LEVEL | Cache level affected by error |
| TYPE | Cache type affected by error (Data, Instruction, Generic) |
| AFFECTED_CPUS | List of CPUs sharing the affected cache |
| SOCKETID | Socket ID of affected CPU |

After the default action local actions in /etc/mcelog/cache-error-trigger.local are executed.
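The default policy described above (offline the affected cores, but never the last one running) can be sketched as follows. This is not the shipped cache-error-trigger script. It assumes AFFECTED_CPUS is a space-separated list of CPU numbers, and CPU_SYSFS is a hypothetical override for trying the logic on a scratch directory.

```shell
#!/bin/sh
# Illustration of "offline affected CPUs unless it is the last core".
CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

# Count CPUs whose "online" file contains 1. A CPU without an "online" file
# (often cpu0 on real systems) is not counted, which errs on the safe side.
count_online_cpus() {
    n=0
    for f in "$CPU_SYSFS"/cpu[0-9]*/online; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "1" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Offline each CPU in AFFECTED_CPUS (assumed space-separated numbers).
offline_affected_cpus() {
    for cpu in $AFFECTED_CPUS; do
        if [ "$(count_online_cpus)" -le 1 ]; then
            echo "keeping cpu$cpu online: last running core" >&2
            continue
        fi
        echo 0 > "$CPU_SYSFS/cpu$cpu/online"
    done
}
```

Re-counting the online CPUs before every write keeps the guard correct even when several CPUs share the affected cache.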

The bus-uc-threshold-trigger runs on uncorrected errors on an IO bus. It is configured through the bus-uc-threshold-trigger and bus-uc-threshold-trigger-threshold options in /etc/mcelog.conf. By default it logs a message with the error location to the system log. After the default action local actions in /etc/mcelog/bus-uc-error-trigger.local are executed.

Arguments are passed as environment variables

| Variable | Description |
| --- | --- |
| MESSAGE | Human readable consolidated error message |
| LOCATION | Consolidated location as a single string |
| SOCKETID | Socket ID of CPU that includes the memory controller with the DIMM |
| LEVEL | Interconnect level |
| PARTICIPATION | Processor Participation (Originator, Responder or Observer) |
| REQUEST | Request type (read, write, prefetch, etc.) |
| ORIGIN | Memory or IO |
| TIMEOUT | The request timed out or not |

The iomca-error-trigger runs when a socket receives bus or interconnect errors. It is configured through the iomca-error-trigger and iomca-error-trigger-threshold options in /etc/mcelog.conf. By default it logs a message with the error location to the system log. After the default action local actions in /etc/mcelog/iomca-error-trigger.local are executed.

Arguments are passed as environment variables

| Variable | Description |
| --- | --- |
| MESSAGE | Human readable consolidated error message |
| LOCATION | Consolidated location as a single string |
| SOCKETID | Socket ID of CPU that includes the memory controller with the DIMM |
| CPU | Linux CPU number that triggered the error |
| SET | PCI segment number |
| BUS | PCI bus number |
| DEVICE | PCI device number |
| FUNCTION | PCI function number |

The unknown-error-trigger runs on any errors not otherwise categorized. It is configured through the unknown-error-trigger and unknown-error-trigger-threshold options in /etc/mcelog.conf. By default it logs a message to the system log. After the default action local actions in /etc/mcelog/unknown-error-trigger.local are executed.

Arguments are passed as environment variables

| Variable | Description |
| --- | --- |
| MESSAGE | Human readable consolidated error message |
| LOCATION | Consolidated location as a single string |
| SOCKETID | Socket ID of CPU that includes the memory controller with the DIMM |
| CPU | Linux CPU number that triggered the error |
| STATUS | IA32_MCi_STATUS register value |
| ADDR | IA32_MCi_ADDR register value |
| MISC | IA32_MCi_MISC register value |
| MCGSTATUS | IA32_MCG_STATUS register value |
| MCGCAP | IA32_MCG_CAP register value |

Copyright © 温玉 2021 | 浙ICP备2020032454号 | All rights reserved | Powered by Gitbook | File last revised: 2022-01-02 08:22:10
