Search the whole station

机房突然断电,zabbix运行日志出现大量的pg_error报错信息解决方法

大家可能都会遇到机房突然断电,当zabbix恢复运行,查看日志发现有大量的PGRES_FATAL_ERROR错误信息,这种情况应该如何解决呢?

1、zabbix报错信息:

 [select clock,ns,value from history_uint where itemid=36570 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]

134751:20220906:092021.356 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  could not read block 619 in file “base/17376/55998”: read only 0 of 32768 bytes

 [select clock,ns,value from history_uint where itemid=36570 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]

134751:20220906:092021.359 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  could not read block 873 in file “base/17376/55991”: read only 0 of 32768 bytes

 [select clock,ns,value from history_uint where itemid=36567 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]

134751:20220906:092021.361 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  could not read block 873 in file “base/17376/55991”: read only 0 of 32768 bytes

[Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 874

2、分析:

发现是因为突然断电,导致pg数据库的表部分索引出现问题了,需要修复,但是由于zabbix数据使用timescaledb时序数据库插件以及超表分区功能,故无法指定单表修复。

3、修复:

在网上搜索后,在以下网站找到了类似的报错以及修复方法:

https://lxadm.com/Repairing_broken_PostgreSQL_databases_/_tables

4、修复损坏的 PostgreSQL 数据库

If your server happened to crash, PostgresSQL database is corrupted, but didn’t contain too precious information, you may try the following fix.

如果你的服务器突然发生崩溃,PostgresSQL 突然被中断,但是没有包含太多之前的信息,你可以尝试安装以下方法修复。

The typical symptoms of a corrupted Postgres database would be like below:

常见的因为数据库运行突然中断的日志结果如下:

2013-03-05 11:29:50 GMT ERROR:  invalid page header in block 608102 of

relation base/16385/16615 2013-03-05 11:29:50 GMT STATEMENT:  COPY

public.history (itemid, clock, value) TO stdout; 2013-03-05 11:29:50

GMT LOG:  could not send data to client: Broken pipe

Or 或者

Query failed: [0] PGRES_FATAL_ERROR:ERROR:  right sibling’s left-link doesn’t match:

block 149266 links to 70823 instead of expected 71357 in index “history_uint_1”

The actual fix is quite easy, and basically sets “zero_damaged_pages = on”, then performs vacuum and reindexing.

实际的修复也简单,在数据库设置sets “zero_damaged_pages = on”,然后执行vacuum and reindexing重新建立索引即可。

DATABASE=yourdatabase

 TABLES=$(echo \\d | psql $DATABASE | grep “^ public” | awk ‘{print $3}’)

 for TABLE in $TABLES; do 

   echo $TABLE

   echo “SET zero_damaged_pages = on; VACUUM FULL $TABLE; REINDEX TABLE $TABLE” | psql $DATABASEdone

在zabbix server或者是pg数据库服务器,创建shell脚本,将以上的复制到脚本,修改为在使用的数据库。

5、修复脚本:

vim pg_repair_index.sh

!# /bin/bash

DATABASE=zabbix   #报错的数据库名称

TABLES=$(echo \\d | psql $DATABASE | grep “^ public” | awk ‘{print $3}’)

for TABLE in $TABLES; do

   echo $TABLE

   echo “SET zero_damaged_pages = on; VACUUM FULL $TABLE; REINDEX TABLE $TABLE” | psql $DATABASE

done 

给脚本执行权限

chmod +x pg_repair_index.sh

执行执行脚本 ./pg_repair_index.sh,会自动重置每个表的索引。

更多zabbix技术资料,请持续关注乐维社区:https://forum.lwops.cn/

The prev: The next:

Related recommendations

Expand more!