欢迎来到飞鸟慕鱼博客,开始您的技术之旅!
当前位置: 首页知识笔记正文

主从节点 staad,can节点a总线错误

墨初 知识笔记 126阅读
1.背景概述

在一次主从复制架构中由于主节点binlog损坏导致从节点无法正常同步数据只能重做从节点因此使用MySQL 8.0.17开始提供的clone技术进行恢复恢复后的2天都发生了主从报错数据冲突。

通过解析binlog发现同一时刻主从节点都在执行同一条语句因此询问业务是否在主从节点都执行了定时任务业务回复定时任务只在主节点执行。

最后排查发现克隆后的从节点的定时任务也会是开启的状态因此同一时刻主从节点同时执行定时任务导致主从报错最终将从节点的定时任务关闭后解决此问题。

2.问题复现

本次测试基于 GreatSQL 8.0.32-24

greatsql> SELECT VERSION();-----------| VERSION() |-----------| 8.0.32-24 |-----------1 row in set (0.00 sec)
1.搭建一套主从架构

2.创建event
greatsql> create database test;greatsql> use test;greatsql> CREATE TABLE test ( id int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT ID, now datetime DEFAULT NULL COMMENT 时间, PRIMARY KEY (id));greatsql> CREATE EVENT event_test ON SCHEDULE EVERY 1 MINUTEON COMPLETION PRESERVE ENABLE COMMENT 每隔1分钟向test表插入记录DO INSERT INTO test VALUES(NULL, now());
3.查看event状态

主节点默认情况下event状态为 ENABLED

greatsql> show events;-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| Db  | Name    | Definer | Time zone | Type    | Execute at | Interval value | Interval field | Starts        | Ends | Status  | Originator | character_set_client | collation_connection | Database Collation |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| test | event_test | root%  | SYSTEM   | RECURRING | NULL    | 1        | MINUTE     | 2023-10-12 17:11:14 | NULL | ENABLED |      1 | utf8mb4        | utf8mb4_unicode_ci  | utf8mb4_unicode_ci |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------1 row in set (0.00 sec)

从节点默认情况下event状态为 SLAVESIDE_DISABLED

greatsql> show events;------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| Db  | Name    | Definer | Time zone | Type    | Execute at | Interval value | Interval field | Starts        | Ends | Status       | Originator | character_set_client | collation_connection | Database Collation |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| test | event_test | root%  | SYSTEM   | RECURRING | NULL    | 1        | MINUTE     | 2023-10-12 17:11:14 | NULL | SLAVESIDE_DISABLED |      1 | utf8mb4        | utf8mb4_unicode_ci  | utf8mb4_unicode_ci |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------1 row in set (0.00 sec)
4.查看数据
greatsql> select * from test.test;-------------------------| id | now                 |-------------------------|  1 | 2023-08-08 16:00:39 ||  2 | 2023-08-08 16:01:39 ||  3 | 2023-08-08 16:02:39 |-------------------------3 rows in set (0.00 sec)
5.从节点进行克隆

# 安装克隆插件主从节点都需要

greatsql> install plugin clone soname mysql_clone.so;

# 从节点进行clone

greatsql> set global clone_valid_donor_list172.17.137.162:6001;greatsql> clone instance from root172.17.137.162:6001 identified by greatsql;
6.重新建立主从复制
greatsql> change master to master_userroot,master_passwordgreatsql,master_host172.17.137.162,master_port6001,master_auto_position1;Query OK, 0 rows affected, 7 warnings (0.04 sec)greatsql> start slave;Query OK, 0 rows affected, 1 warning (0.04 sec)
7.查看主从状态
greatsql> show slave status\G*************************** 1. row ***************************               Slave_IO_State: Waiting for source to send event                  Master_Host: 172.17.137.162                  Master_User: root                  Master_Port: 6001                Connect_Retry: 60              Master_Log_File: binlog.000001          Read_Master_Log_Pos: 2959               Relay_Log_File: relaylog.000002                Relay_Log_Pos: 395        Relay_Master_Log_File: binlog.000001             Slave_IO_Running: Yes            Slave_SQL_Running: No              Replicate_Do_DB:           Replicate_Ignore_DB:            Replicate_Do_Table:        Replicate_Ignore_Table:       Replicate_Wild_Do_Table:   Replicate_Wild_Ignore_Table:                    Last_Errno: 1062                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction e8bf88f9-2acd-11ee-a98a-00163e605c74:8 at master log binlog.000001, end_log_pos 2606. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.                 Skip_Counter: 0          Exec_Master_Log_Pos: 2307              Relay_Log_Space: 1242              Until_Condition: None               Until_Log_File: greatsql> select * from performance_schema.replication_applier_status_by_worker limit 1\G*************************** 1. row ***************************                                           CHANNEL_NAME:                                               WORKER_ID: 1                                              THREAD_ID: NULL                                          SERVICE_STATE: OFF                                      LAST_ERROR_NUMBER: 1062                                     LAST_ERROR_MESSAGE: Worker 1 failed executing transaction e8bf88f9-2acd-11ee-a98a-00163e605c74:8 at master log binlog.000001, end_log_pos 2606; Could not execute Write_rows event on table test.test; Duplicate entry 5 for key test.PRIMARY, Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the events master log FIRST, end_log_pos 2606                                   LAST_ERROR_TIMESTAMP: 2023-08-08 16:03:39.033240                               LAST_APPLIED_TRANSACTION:      LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000    LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000         LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000           LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000                                   APPLYING_TRANSACTION: e8bf88f9-2acd-11ee-a98a-00163e605c74:8         APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2023-08-08 16:02:45.795753        APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2023-08-08 16:02:45.795753             APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2023-08-08 16:03:39.032510                 LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0   LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0  LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000                     APPLYING_TRANSACTION_RETRIES_COUNT: 0       APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0      APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:     APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.0000001 row in set (0.00 sec

可以看到从节点报错发生了主键冲突。

8.查看从节点定时任务状态

当前从节点定时任务状态为 ENABLED

greatsql> show events;-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| Db  | Name    | Definer | Time zone | Type    | Execute at | Interval value | Interval field | Starts        | Ends | Status  | Originator | character_set_client | collation_connection | Database Collation |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| test | event_test | root%  | SYSTEM   | RECURRING | NULL    | 1        | MINUTE     | 2023-08-08 15:58:45 | NULL | ENABLED |      1 | utf8mb4        | utf8mb4_unicode_ci  | utf8mb4_unicode_ci |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------1 row in set (0.00 sec)

可以看到由于从节点的定时任务也执行了从节点写入数据导致主键冲突。

9.故障解决
greatsql> alter event event_test DISABLE;Query OK, 0 rows affected (0.01 sec)

关闭从节点的定时任务event然后跳过主键冲突的报错最后重新启动主从复制。

3.总结

1.如果主库有定时任务通过clone的方式搭建从库在从库恢复之后需要关闭定时任务避免主从同时执行定时任务导致主从故障。

2.克隆时如果捐赠节点有主从复制信息则克隆后的接收节点也会克隆此复制信息并在克隆完成自动重启实例后自动启动复制避免此问题可以在接收节点的配置文件中增加 skip-slave-start避免节点重启后自动启动复制。

Enjoy GreatSQL :)

关于 GreatSQL

GreatSQL是适用于金融级应用的国内自主开源数据库具备高性能、高可靠、高易用性、高安全等多个核心特性可以作为MySQL或Percona Server的可选替换用于线上生产环境且完全免费并兼容MySQL或Percona Server。

相关链接 GreatSQL社区 Gitee GitHub Bilibili

GreatSQL社区

社区有奖建议反馈

社区博客有奖征稿详情

对文章有疑问或者有独到见解都可以去社区官网提出或分享哦~

技术交流群

微信&QQ群

QQ群533341697

微信群添加GreatSQL社区助手微信号wanlidbc 好友待社区助手拉您进群。

标签:
声明:无特别说明,转载请标明本文来源!