commit ac46eb01fd785b793d598b54d201209d2bc7e3b7 Author: dengwendi Date: Wed Nov 15 15:07:34 2023 +0800 INIT diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..4673f58 --- /dev/null +++ b/LICENSE @@ -0,0 +1,202 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. 
Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [2013-2021] [Alibaba Group Holding Limited] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+ You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + diff --git a/README.md b/README.md new file mode 100644 index 0000000..e822db2 --- /dev/null +++ b/README.md @@ -0,0 +1,57 @@ +# PolarDB-X Operator 介绍 + +--- + +PolarDB-X Operator 是一个基于 Kubernetes 的 PolarDB-X 集群管控系统,希望能在 Kubernetes 上提供完整的生命周期管理能力。PolarDB-X Operator 支持运行在私有或者公有的 Kubernetes 集群上安装并部署 PolarDB-X 集群。 + +## 限制与说明 + +### 操作系统和 CPU 架构 + +PolarDB-X Operator 支持在任意环境的 Kubernetes 集群上进行部署,支持异构 Kubernetes 上的组件部署和 PolarDB-X 数据库集群部署。 + +目前 PolarDB-X Operator 和 PolarDB-X 集群支持以下操作系统和架构: + +| 操作系统 | CPU 架构 | 推荐配置 | +| :------: | :-------------: | :-------------------: | +| Linux | x86_64 (amd64) | 32C128G, >= 500G 磁盘 | +| Linux | aarch64 (arm64) | 32C128G, >= 500G 磁盘 | + +注: arm64 架构暂无镜像,需要单独编译。 + +### 磁盘 + +出于磁盘性能考虑,PolarDB-X Operator 使用宿主机上本地盘的某个路径来存放系统脚本和存储节点的数据,默认配置为 `/data`。PolarDB-X Operator 会自动管理其中存放的脚本和数据,请勿随意删除或更改,以免导致系统和 PolarDB-X 集群出现问题。 + +若您需要配置不同的路径,可以在安装 Operator 时参考 [[PolarDB-X 安装部署-Operator部署]](./deployment/README.md) 文档修改配置。 + +## 安装 + +在部署 PolarDB-X 集群前,首先需要在 Kubernetes 上安装 PolarDB-X Operator 的系统。通过借助 Kubernetes 上的包管理工具 helm,你可以快速完成系统的部署,参考文档 [[PolarDB-X 安装部署-快速开始]](./deployment/README.md) 在本地或已有的 Kubernetes 上安装 PolarDB-X Operator 并部署一个 PolarDB-X 测试集群。 + +Helm 包中预定义了许多配置,如果你想更改这些配置,可以参考 [[PolarDB-X 安装部署-Operator部署]](./deployment/README.md) 更改配置项,以使它更好的使用 Kubernetes 的资源。 + +> 注:为了在本地测试,快速开始中的集群使用了较少的资源,如需进行性能测试,请参考运维指南和 PolarDBXCluster API 文档进行更为规范的部署。 + +## API + +为了使 PolarDB-X 能够被 Kubernetes 识别和管理,我们将 PolarDB-X 集群和它的运维操作抽象为多个[定制资源](https://kubernetes.io/zh/docs/concepts/extend-kubernetes/api-extension/custom-resources/): + ++ PolarDBXCluster,定义和描述了 PolarDB-X 集群的拓扑、规格、配置和运维等信息 ++ XStore,定义和描述了 PolarDB-X 集群的数据节点 (DN) 的拓扑、规格、配置和运维等信息 + +您可以使用以下命令来查看 Kubernetes 集群中的这些资源: + +```bash +kubectl get polardbxcluster,xstore +``` + +参考 [[PolarDB-X CRD API](./api/README.md)] 来了解目前支持的所有资源和细节。 + +## 运维 + +同公有云上的 PolarDB-X 集群一样,PolarDB-X Operator 也支持绝大部分的运维操作,包括部署、删除、升级、升配、扩缩容和动态配置等,您可以参考 [[运维指南](./ops/README.md)] 来了解目前支持的所有的运维操作和使用方法。 + +## FAQ + +运维 PolarDB-X 集群时可能会遇到一些问题,[[FAQ]](./faq/README.md) 里整理了常见的问题和处理方法。 \ No newline at end of file diff --git a/api/README.md b/api/README.md new file mode 100644 index 0000000..1a39110 --- /dev/null +++ b/api/README.md @@ -0,0 +1,9 @@ +# PolarDB-X CRD API + +--- + +## polardbx.aliyun.com/v1 + +资源类型: + ++ [PolarDBXCluster](./polardbxcluster.md) \ No newline at end of file diff --git a/api/polardbxcluster.md b/api/polardbxcluster.md new file mode 100644 index 0000000..a4591b6 --- /dev/null +++ b/api/polardbxcluster.md @@ -0,0 +1,290 @@ +# polardbx.aliyun.com/v1 PolarDBXCluster + +使用 PolarDBXCluster 可以自由定义集群的拓扑、规格和配置,可以支持超大规模和不同容灾等级的部署。 + +以下是可配置项及相关的字段的含义: + +```yaml +apiVersion: polardbx.aliyun.com/v1 +kind: PolarDBXCluster +metadata: + name: full +spec: + # **Optional** + # + # 是否使用 DN-0 作为共享 GMS 以节省资源,默认值 false + # + # 不推荐在生产集群使用 + shareGMS: false + + # **Optional** + # + # PolarDB-X 集群所支持的 MySQL 协议版本,默认值 5.7 + # 可选值:5.7, 8.0 + protocolVersion: 5.7 + + # **Optional** + # + # PolarDB-X 集群在 Kubernetes 内对外暴露的服务名,默认为 .metadata.name + serviceName: full + + # **Optional** + # + # PolarDB-X 集群在 Kubernetes 内对外暴露的服务类型,默认为 ClusterIP + # 可选值参考 Service 的类型 
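  # 例如 ClusterIP、NodePort、LoadBalancer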
+ # + # 注:云上的 Kubernetes 集群可使用 LoadBalancer 来绑定 LB + serviceType: LoadBalancer + + # **Optional** + # + # PolarDB-X 集群是否为只读实例,默认为 false + readonly: false + + # **Optional** + # + # PolarDB-X 只读实例所属主实例的名称,默认为空 + # 当本实例不为只读实例时,此字段无效 + primaryCluster: pxc-master + + # **Optional** + # + # PolarDB-X 主实例附属的只读实例,仅在本实例不为只读时生效 + # 当本实例创建时,根据以下信息创建出与本实例规格和参数相同的只读实例 + # 本字段不可修改,且仅在创建时有效 + initReadonly: + - # 只读实例 CN 数 + cnRepilcas: 1 + # **Optional** + # + # 只读实例后缀名,不填则会生成随机后缀 + name: readonly + # **Optional** + # + # 只读实例参数 + extraParams: + AttendHtap: "true" + + # **Optional** + # + # PolarDB-X 集群安全配置 + security: + # **Optional** + # + # TLS 相关配置,暂不生效 + tls: + secretName: tls-secret + # **Optional** + # + # 指定用于编码内部密码的 key,引用指定 Secret 的 key + encodeKey: + name: ek-secret + key: key + + # *Optional** + # + # PolarDB-X 初始账号配置 + privileges: + - username: admin + password: "123456" + type: SUPER + + # PolarDB-X 集群配置 + config: + # CN 相关配置 + cn: + # 静态配置,修改会导致 CN 集群重建 + static: + # 启用协程, OpenJDK 暂不支持,需使用 dragonwell + EnableCoroutine: false + # 启用备库一致读 + EnableReplicaRead: false + # 启用 JVM 的远程调试 + EnableJvmRemoteDebug: false + # 自定义 CN 静态配置,key-value 结构 + ServerProperties: + processors: 8 + # 是否在该(只读)实例 CN 上开启 MPP 能力,主实例 CN 默认开启 + # 当该参数开启时,该实例会参与多机并行(MPP),同时分担主实例的读流量,反之则不参与 + AttendHtap: false + # 动态配置,修改并 apply 会由 operator 自动推送,key-value 结构 + dynamic: + CONN_POOL_IDLE_TIMEOUT: 30 + # DN 相关配置 + dn: + # DN my.cnf 配置,覆盖模板部分 + mycnfOverwrite: |- + loose_binlog_checksum: crc32 + # DN 日志清理间隔 + logPurgeInterval: 5m + # 日志与数据分离存储 + logDataSeparation: false + + # PolarDB-X 集群拓扑 + topology: + # 集群使用的镜像版本 (tag),默认为空(由 operator 指定) + version: v1.0 + + # 集群部署规则 + rules: + # 预定义节点选择器 + selectors: + - name: zone-a + nodeSelector: + nodeSelectorTerms: + - matchExpressions: + - key: topology.kubernetes.io/zone + operator: In + values: + - cn-hangzhou-a + - name: zone-b + nodeSelector: + nodeSelectorTerms: + - matchExpressions: + - key: topology.kubernetes.io/zone + operator: In + values: + - cn-hangzhou-b + - name: zone-c + nodeSelector: + nodeSelectorTerms: + - matchExpressions: + - key: topology.kubernetes.io/zone + operator: In + values: + - cn-hangzhou-c + components: + # **Optional** + # + # GMS 部署规则,默认和 DN 一致 + gms: + # 堆叠部署结构,operator 尝试在节点选择器指定的节点中,堆叠部署 + # 每个存储节点的子节点以达到较高资源利用率的方式,仅供测试使用 + rolling: + replicas: 3 + selector: + reference: zone-a + # 节点组部署结构,可以指定每个 DN 的子节点的节点组和节点选择器, + # 从而达成跨区、跨城等高可用部署结构 + nodeSets: + - name: cand-zone-a + role: Candidate + replicas: 1 + selector: + reference: zone-a + - name: cand-zone-b + role: Candidate + replicas: 1 + selector: + reference: zone-b + - name: log-zone-c + role: Voter + replicas: 1 + selector: + reference: zone-c + + # **Optional** + # + # DN 部署规则,默认为 3 节点,所有节点可部署 + dn: + nodeSets: + - name: cands + role: Candidate + replicas: 2 + - name: log + role: Voter + replicas: 1 + + # **Optional** + # + # CN 部署规则,同样按组划分 CN 节点 + cn: + - name: zone-a + # 合法值:数字、百分比、(0, 1] 分数,不填写为剩余 replica(只能有一个不填写) + # 总和不能超过 .topology.nodes.cn.replicas + replicas: 1 + selector: + reference: zone-a + - name: zone-b + replicas: 1 / 3 + selector: + reference: zone-b + - name: zone-c + replicas: 34% + selector: + reference: zone-c + + # **Optional** + # + # CDC 部署规则,同 CN + cdc: + - name: half + replicas: 50% + selector: + reference: zone-a + - name: half + # 带 + 表示向上取整 + replicas: 50%+ + selector: + reference: zone-b + + nodes: + # **Optional** + # + # GMS 规格配置,默认和 DN 相同 + gms: + template: + # 存储节点引擎,默认 galaxy + engine: galaxy + # 存储节点镜像,默认由 operator 指定 + image: polardbx-engine:latest + # 
存储节点 Service 类型,默认为 ClusterIP + serviceType: ClusterIP + # 存储节点 Pod 是否适用宿主机网络,默认为 true + hostNetwork: true + # 存储节点磁盘空间限制,不填写无限制(软限制) + diskQuota: 10Gi + # 存储节点子节点使用的资源,默认为 4c8g + resources: + limits: + cpu: 4 + memory: 8Gi + + # **Optional** + # + # DN 规格配置 + dn: + # DN 数量配置,默认为 2 + replicas: 2 + template: + resources: + limits: + cpu: 4 + memory: 8Gi + # IO 相关限制,支持 BPS 和 IOPS 限制 + limits.io: + iops: 1000 + bps: 10Mi + + # CN 规格配置,参数解释同 DN + cn: + replicas: 3 + template: + image: polardbx-sql:latest + hostNetwork: false + resources: + limits: + cpu: 4 + memory: 8Gi + + # CDC 规格配置,参数解释同 CN,可不配置代表不启动 CDC 能力 + cdc: + replicas: 2 + template: + image: polardbx-cdc:latest + hostNetwork: false + resources: + limits: + cpu: 4 + memory: 8Gi +``` \ No newline at end of file diff --git a/deployment/0-quickstart.md b/deployment/0-quickstart.md new file mode 100644 index 0000000..cef5002 --- /dev/null +++ b/deployment/0-quickstart.md @@ -0,0 +1,302 @@ +# 快速上手 + +本文介绍了如何创建一个简单的 Kubernetes 集群,部署 PolarDB-X Operator,并使用 operator 部署一个完整的 PolarDB-X 集群。 + +> 注:本文中的部署说明仅用于测试目的,不要直接用于生产环境。 + +本文主要包含以下内容: + +1. [创建 Kubernetes 测试集群](#创建-kubernetes-测试集群) +2. [部署 PolarDB-X Operator](#部署-polardb-x-operator) +3. [部署 PolarDB-X 集群](#部署-polardb-x-集群) +4. [连接 PolarDB-X 集群](#连接-polardb-x-集群) +5. [销毁 PolarDB-X 集群](#销毁-polardb-x-集群) +6. [卸载 PolarDB-X Operator](#卸载-polardb-x-operator) + +# 创建 Kubernetes 测试集群 + +本节主要介绍如何使用 [minikube](https://minikube.sigs.k8s.io/docs/start/) 创建 Kubernetes 测试集群,您也可以使用阿里云的 [容器服务 ACK](https://www.aliyun.com/product/kubernetes) 来创建一个 Kubernetes 集群,并遵循教程部署 PolarDB-X Operator 和 PolarDB-X 集群。 + +## 使用 minikube 创建 Kubernetes 集群 + +[minikube](https://minikube.sigs.k8s.io/docs/start/) 是由社区维护的用于快速创建 Kubernetes 测试集群的工具,适合测试和学习 Kubernetes。使用 minikube 创建的 Kubernetes 集群可以运行在容器或是虚拟机中,本节中以 CentOS 8.2 上创建 Kubernetes 为例。 + +> 注:如在其他操作系统例如 macOS 或 Windows 上部署 minikube,部分步骤可能略有不同。 + +部署前,请确保已经安装 minikube 和 Docker,并符合以下要求: + ++ 机器规格不小于 4c8g ++ minikube >= 1.18.0 ++ docker >= 1.19.3 + +minikube 要求使用非 root 账号进行部署,如果你试用 root 账号访问机器,需要新建一个账号。 + +```bash +$ useradd -ms /bin/bash polardbx +$ usermod -aG docker polardbx +``` + +如果你使用其他账号,请和上面一样将它加入 docker 组中,以确保它能够直接访问 docker。 + +使用 su 切换到账号 `polardbx`, + +```bash +$ su polardbx +``` + +执行下面的命令启动一个 minikube, + +```bash +minikube start --cpus 4 --memory 7960 --image-mirror-country cn --registry-mirror=https://docker.mirrors.sjtug.sjtu.edu.cn +``` + +> 注:这里我们使用了阿里云的 minikube 镜像源以及 SJTU 提供的 docker 镜像源来加速镜像的拉取。 + +如果一切运行正常,你将会看到类似下面的输出。 + +```bash +😄 minikube v1.23.2 on Centos 8.2.2004 (amd64) +✨ Using the docker driver based on existing profile +❗ Your cgroup does not allow setting memory. + ▪ More information: https://docs.docker.com/engine/install/linux-postinstall/#your-kernel-does-not-support-cgroup-swap-limit-capabilities +❗ Your cgroup does not allow setting memory. + ▪ More information: https://docs.docker.com/engine/install/linux-postinstall/#your-kernel-does-not-support-cgroup-swap-limit-capabilities +👍 Starting control plane node minikube in cluster minikube +🚜 Pulling base image ... +🤷 docker "minikube" container is missing, will recreate. +🔥 Creating docker container (CPUs=4, Memory=7960MB) ... + > kubeadm.sha256: 64 B / 64 B [--------------------------] 100.00% ? p/s 0s + > kubelet.sha256: 64 B / 64 B [--------------------------] 100.00% ? p/s 0s + > kubectl.sha256: 64 B / 64 B [--------------------------] 100.00% ? 
p/s 0s + > kubeadm: 43.71 MiB / 43.71 MiB [---------------] 100.00% 1.01 MiB p/s 44s + > kubectl: 44.73 MiB / 44.73 MiB [-------------] 100.00% 910.41 KiB p/s 51s + > kubelet: 146.25 MiB / 146.25 MiB [-------------] 100.00% 2.71 MiB p/s 54s + + ▪ Generating certificates and keys ... + ▪ Booting up control plane ... + ▪ Configuring RBAC rules ... +🔎 Verifying Kubernetes components... + ▪ Using image registry.cn-hangzhou.aliyuncs.com/google_containers/storage-provisioner:v5 (global image repository) +🌟 Enabled addons: storage-provisioner, default-storageclass +💡 kubectl not found. If you need it, try: 'minikube kubectl -- get pods -A' +🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default +``` + +此时 minikube 已经正常运行。minikube 将自动设置 kubectl 的配置文件,如果之前已经安装过 kubectl,现在可以使用 kubectl 来访问集群: + +```bash +$ kubectl cluster-info +kubectl cluster-info +Kubernetes control plane is running at https://192.168.49.2:8443 +CoreDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy + +To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'. +``` + +如果没有安装 kubectl 的,minikube 也提供了子命令来使用 kubectl: + +```bash +$ minikube kubectl -- cluster-info +Kubernetes control plane is running at https://192.168.49.2:8443 +CoreDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy + +To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'. +``` + +> 注意:minikube kubectl 子命令需要在 kubectl 的参数前加 "--",如使用 bash shell 可以用 alias kubectl="minikube kubectl -- " 来设置快捷指令。下文都将使用 kubectl 命令进行操作。 + +现在我们可以开始部署 PolarDB-X Operator 了! + +> 测试完成后,执行 minikube delete 来销毁集群。 + +# 部署 PolarDB-X Operator + +开始之前,请确保满足以下前置要求: + ++ 已经准备了一个运行中的 Kubernetes 集群,并确保 + + 集群版本 >= 1.18.0 + + 至少有 2 个可分配的 CPU + + 至少有 4GB 的可分配内存 + + 至少有 30GB 以上的磁盘空间 ++ 已经安装了 kubectl 可以访问 Kubernetes 集群 ++ 已经安装了 [Helm 3](https://helm.sh/docs/intro/install/) + + +执行以下命令安装 PolarDB-X Operator。 + +```bash +$ helm install --namespace polardbx-operator-system --create-namespace polardbx-operator https://github.com/polardb/polardbx-operator/releases/download/v1.2.1/polardbx-operator-1.2.1.tgz +``` + +您也可以通过 PolarDB-X 的 Helm Chart 仓库安装: +```bash +helm repo add polardbx https://polardbx-charts.oss-cn-beijing.aliyuncs.com +helm install --namespace polardbx-operator-system --create-namespace polardbx-operator polardbx/polardbx-operator +``` + +期望看到如下输出: + +```bash +NAME: polardbx-operator +LAST DEPLOYED: Sun Oct 17 15:17:29 2021 +NAMESPACE: polardbx-operator-system +STATUS: deployed +REVISION: 1 +TEST SUITE: None +NOTES: +polardbx-operator is installed. Please check the status of components: + + kubectl get pods --namespace polardbx-operator-system + +Now have fun with your first PolarDB-X cluster. + +Here's the manifest for quick start: + +```yaml +apiVersion: polardbx.aliyun.com/v1 +kind: PolarDBXCluster +metadata: + name: quick-start + annotations: + polardbx/topology-mode-guide: quick-start +``` + +查看 PolarDB-X Operator 组件的运行情况,等待它们都进入 Running 状态: + +```bash +$ kubectl get pods --namespace polardbx-operator-system +NAME READY STATUS RESTARTS AGE +polardbx-controller-manager-6c858fc5b9-zrhx9 1/1 Running 0 66s +polardbx-hpfs-d44zd 1/1 Running 0 66s +polardbx-tools-updater-459lc 1/1 Running 0 66s +``` + +恭喜!PolarDB-X Operator 已经安装完成,现在可以开始部署 PolarDB-X 集群了! 
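除了检查 Pod 状态，也可以顺便确认 PolarDB-X 的定制资源定义（CRD）已经注册成功，后文卸载 operator 时也会用到同样的命令（输出中的时间戳仅为示例）：

```bash
$ kubectl get crds | grep polardbx.aliyun.com
polardbxclusters.polardbx.aliyun.com   2021-10-17T07:17:27Z
xstores.polardbx.aliyun.com            2021-10-17T07:17:27Z
```
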
+ +# 部署 PolarDB-X 集群 + +现在我们来快速部署一个 PolarDB-X 集群,它包含 1 个 GMS 节点、1 个 CN 节点、1 个 DN 节点和 1 个 CDC 节点。执行以下命令创建一个这样的集群: + +```bash +echo "apiVersion: polardbx.aliyun.com/v1 +kind: PolarDBXCluster +metadata: + name: quick-start + annotations: + polardbx/topology-mode-guide: quick-start" | kubectl apply -f - +``` + +你将看到以下输出: + +```bash +polardbxcluster.polardbx.aliyun.com/quick-start created +``` + +使用如下命令查看创建状态: + +```bash +$ kubectl get polardbxcluster -w +NAME GMS CN DN CDC PHASE DISK AGE +quick-start 0/1 0/1 0/1 0/1 Creating 35s +quick-start 1/1 0/1 1/1 0/1 Creating 93s +quick-start 1/1 0/1 1/1 1/1 Creating 4m43s +quick-start 1/1 1/1 1/1 1/1 Running 2.4 GiB 4m44s +``` + +当 PHASE 显示为 Running 时,PolarDB-X 集群已经部署完成!恭喜你,现在可以开始连接并体验 PolarDB-X 分布式数据库了! + +# 连接 PolarDB-X 集群 + +PolarDB-X 支持 MySQL 传输协议及绝大多数语法,因此你可以使用 mysql 命令行工具连接 PolarDB-X 进行数据库操作。 + +在开始之前,请确保已经安装 mysql 命令行工具。 + +## 转发 PolarDB-X 的访问端口 + +创建 PolarDB-X 集群时,PolarDB-X Operator 同时会为集群创建用于访问的服务,默认是 ClusterIP 类型。使用下面的命令查看用于访问的服务: + +```bash +$ kubectl get svc quick-start +``` + +期望输出: + +```bash +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +quick-start ClusterIP 10.110.214.223 3306/TCP,8081/TCP 5m25s +``` + +我们使用 kubectl 提供的 port-forward 命名将服务的 3306 端口转发到本地,并且保持转发进程存活。 + +```bash +$ kubectl port-forward svc/quick-start 3306 +``` + +## 连接 PolarDB-X 集群 + +Operator 将为 PolarDB-X 集群默认创建一个账号 polardbx_root,并将密码存放在 secret 中。 + +使用以下命令查看 polardbx_root 账号的密码: + +```bash +$ kubectl get secret quick-start -o jsonpath="{.data['polardbx_root']}" | base64 -d - | xargs echo "Password: " +Password: bvp9wjxx +``` + +保持 port-forward 的运行,重新打开一个终端,执行如下命令连接集群: + +```bash +$ mysql -h127.0.0.1 -P3306 -upolardbx_root -pbvp9wjxx +``` + +期望输出: + +```bash +Welcome to the MySQL monitor. Commands end with ; or \g. +Your MySQL connection id is 6 +Server version: 5.6.29 Tddl Server (ALIBABA) + +Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved. + +Oracle is a registered trademark of Oracle Corporation and/or its +affiliates. Other names may be trademarks of their respective +owners. + +Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. + +mysql> +``` + +恭喜!你已经成功地部署并连接到了一个 PolarDB-X 分布式数据库集群,现在你可以开始体验分布式数据库的能力了! 
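在销毁集群之前，不妨先做一个简单的读写验证，确认 CN 到 DN 的链路工作正常。下面是一个最小示例，其中的库名 `demo`、表名 `hello` 均为本文假设的名字，密码请替换为上一步实际查询到的值：

```bash
$ mysql -h127.0.0.1 -P3306 -upolardbx_root -pbvp9wjxx -e "
CREATE DATABASE IF NOT EXISTS demo;
CREATE TABLE IF NOT EXISTS demo.hello (id INT PRIMARY KEY, msg VARCHAR(64));
INSERT INTO demo.hello VALUES (1, 'hello polardbx');
SELECT * FROM demo.hello;"
```

如果能查询到刚插入的数据，说明集群读写正常。
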
+ +# 销毁 PolarDB-X 集群 + +完成测试后,你可以通过以下命令销毁 PolarDB-X 集群。 + +```bash +$ kubectl delete polardbxcluster quick-start +``` + +再次查看以确保删除完成 + +```bash +$ kubectl get polardbxcluster quick-start +``` + +# 卸载 PolarDB-X Operator + +使用如下命令卸载 PolarDB-X Operator。 + +```bash +$ helm uninstall --namespace polardbx-operator-system polardbx-operator +``` + +Helm 卸载并不会删除对应的定制资源 CRD,使用下面的命令查看并删除 PolarDB-X 对应的定制资源: + +```bash +$ kubectl get crds | grep polardbx.aliyun.com +polardbxclusters.polardbx.aliyun.com 2021-10-17T07:17:27Z +xstores.polardbx.aliyun.com 2021-10-17T07:17:27Z + +$ kubectl delete crds polardbxclusters.polardbx.aliyun.com xstores.polardbx.aliyun.com +``` \ No newline at end of file diff --git a/deployment/1-installation-data-dir.md b/deployment/1-installation-data-dir.md new file mode 100644 index 0000000..0fa7af1 --- /dev/null +++ b/deployment/1-installation-data-dir.md @@ -0,0 +1,29 @@ +修改数据目录 +======== +通过以下命令来在安装时指定宿主机: + +- 数据目录 `/polardbx/data` (默认值为 /data) +- 日志目录 `/polardbx/log` (默认值为/data-log) +- 传输目录 `/polardbx/filestream` (默认值为 /filestream) + +```bash +helm install --namespace polardbx-operator-system --set node.volumes.data=/polardbx/data polardbx-operator polardbx/polardbx-operator --create-namespace +``` + +或者你也可以准备一个 values.yaml 文件,然后通过下面的命令来指定: + +```bash +helm install --namespace polardbx-operator-system -f values.yaml polardbx-operator polardbx/polardbx-operator --create-namespace +``` + +其中 values.yaml 包含以下内容: + +```yaml +node: + volumes: + data: /polardbx/data + log: /polardbx/log + filestream: /polardbx/filestream +``` + +> 除了上述目录,容器运行文件目录(通常默认为/var/lib/docker)和k8s根目录(通常默认为/var/lib/kubelet),需要在安装docker或者k8s的时候挂载到合适的目录,防止出现磁盘满的问题。 diff --git a/deployment/1-installation-default-image-repo.md b/deployment/1-installation-default-image-repo.md new file mode 100644 index 0000000..3deacf8 --- /dev/null +++ b/deployment/1-installation-default-image-repo.md @@ -0,0 +1,7 @@ +修改默认镜像仓库 +======== +修改默认镜像仓库为 `registry:5000`: + +```bash +helm install --namespace polardbx-operator-system --set imageRepo=registry:5000 polardbx-operator polardbx/polardbx-operator --create-namespace +``` diff --git a/deployment/1-installation-default-image.md b/deployment/1-installation-default-image.md new file mode 100644 index 0000000..3df81c8 --- /dev/null +++ b/deployment/1-installation-default-image.md @@ -0,0 +1,34 @@ +修改默认镜像 +======== +## 系统组件 +1. 指定系统组件镜像 tag 为 v1.0.1: + +```bash +helm install --namespace polardbx-operator-system --set imageTag=v1.0.1 polardbx-operator polardbx/polardbx-operator --create-namespace +``` + +2. 指定拉取策略为 `Always`: + +```bash +helm install --namespace polardbx-operator-system --set imagePullPolicy=Always polardbx-operator polardbx/polardbx-operator --create-namespace +``` + +## 数据库集群 + +1. 指定所有组件的默认 tag 为 `v1`: + +```bash +helm install --namespace polardbx-operator-system --set clusterDefaults.version=v1 polardbx-operator polardbx/polardbx-operator --create-namespace +``` + +2. 覆盖组件默认 tag,例如指定 CN 镜像的 tag 为 `v2`(其余组件仍然为 `clusterDefaults.version`的配置): + +```bash +helm install --namespace polardbx-operator-system --set clusterDefaults.galaxysql=polardbx-sql:v2 polardbx-operator polardbx/polardbx-operator --create-namespace +``` + +3. 
覆盖组件默认 repo,例如指定 CN 的镜像 repo 为 `registry:5000`(其余组件仍然为 `imageRepo`的配置): + +```bash +helm install --namespace polardbx-operator-system --set clusterDefaults.galaxysql=registry:5000/polardbx-sql polardbx-operator polardbx/polardbx-operator --create-namespace +``` diff --git a/deployment/1-installation.md b/deployment/1-installation.md new file mode 100644 index 0000000..e5fac93 --- /dev/null +++ b/deployment/1-installation.md @@ -0,0 +1,153 @@ +## 准备工作 +开始之前,请确保满足以下前置要求: + ++ 已经准备了一个运行中的 Kubernetes 集群,并确保 + + 集群版本 >= 1.18.0 + + 至少有 2 个可分配的 CPU + + 至少有 4GB 的可分配内存 + + 至少有 30GB 以上的磁盘空间 ++ 已经安装了 kubectl 可以访问 Kubernetes 集群 ++ 已经安装了 [Helm 3](https://helm.sh/docs/intro/install/) + + +执行以下命令安装 PolarDB-X Operator。 + +```bash +$ helm install --namespace polardbx-operator-system --create-namespace polardbx-operator https://github.com/polardb/polardbx-operator/releases/download/v1.4.0/polardbx-operator-1.4.0.tgz +``` + +您也可以通过 PolarDB-X 的 Helm Chart 仓库安装: + +```bash +helm repo add polardbx https://polardbx-charts.oss-cn-beijing.aliyuncs.com +helm install --namespace polardbx-operator-system --create-namespace polardbx-operator polardbx/polardbx-operator +``` + +期望看到如下输出: + +```bash +NAME: polardbx-operator +LAST DEPLOYED: Sun Oct 17 15:17:29 2021 +NAMESPACE: polardbx-operator-system +STATUS: deployed +REVISION: 1 +TEST SUITE: None +NOTES: +polardbx-operator is installed. Please check the status of components: + + kubectl get pods --namespace polardbx-operator-system + +Now have fun with your first PolarDB-X cluster. + +Here's the manifest for quick start: + +```yaml +apiVersion: polardbx.aliyun.com/v1 +kind: PolarDBXCluster +metadata: + name: quick-start + annotations: + polardbx/topology-mode-guide: quick-start +``` + +## 安装选项 +Helm 安装通常可以指定一些配置选项的值,用于覆盖默认的安装选项。这里介绍几个常见的选项和安装模式: + +- node.volume.data,用来指定宿主机上使用的数据目录,参考 [安装-修改数据目录](./1-installation-data-dir.md) ; +- images、imageTag、useLatestImage 和 clusterDefaults,用来修改系统组件和数据库集群默认的镜像集合,参考[安装-修改默认镜像](./1-installation-default-image.md) ; +- imageRepo,用来修改默认的镜像仓库,参考[安装-修改默认镜像仓库](./1-installation-default-image-repo.md) ; + +所有安装选项可通过如下命令获取: + +```shell +helm show values --namespace polardbx-operator-system https://github.com/polardb/polardbx-operator/releases/download/v1.4.0/polardbx-operator-1.4.0.tgz +``` + +## 系统检查 +### 运行情况 + +用以下命令查看系统组件的运行情况: + +```bash +kubectl -n polardbx-operator-system get pods +``` + +通常你将看到以下 Pod: + +```bash +NAME READY STATUS RESTARTS AGE +NAME READY STATUS RESTARTS AGE +polardbx-controller-manager-6c858fc5b9-zrhx9 1/1 Running 0 66s +polardbx-hpfs-d44zd 1/1 Running 0 66s +polardbx-tools-updater-459lc 1/1 Running 0 66s +``` + +其中: + +- polardbx-controller-manager 是 operator 和 webhook 所在的 Pod,由一个 Deployment 控制创建 +- polardbx-hpfs 是宿主机远程文件服务所在的 Pod,由一个 DaemonSet 控制创建,因此每个节点会有一个 +- polardbx-tools-updater 是宿主机上一些公共工具脚本的更新程序,同样由一个 DaemonSet 控制创建 + +### 动态配置 +Operator/Webhook 在运行时加载了一组动态配置,这组配置在 Kubernetes 中存放在 ConfigMap 中。 + +用下面的命令查看配置的内容: + +```bash +kubectl -n polardbx-operator-system get configmap polardbx-controller-manager-config -o yaml +``` + +通常你将看到这样的内容: + +```yaml +apiVersion: v1 +data: + config.yaml: |- + images: + repo: polardbx + common: + prober: probe-proxy:v1.2.0 + exporter: polardbx-exporter:v1.2.0 + compute: + init: polardbx-init:v1.2.0 + engine: registry.cn-zhangjiakou.aliyuncs.com/drds_pre/polardbx-sql:20220330-2 + cdc: + engine: registry.cn-zhangjiakou.aliyuncs.com/drds_pre/polardbx-cdc:20220408 + store: + galaxy: + engine: polardbx-engine@sha256:a1cf4aabf3e0230d6a63dd9afa125e58baa2a925462a59968ac3b918422bf521 + 
exporter: prom/mysqld-exporter:master + scheduler: + enable_master: true + cluster: + enable_exporters: true + enable_aliyun_ack_resource_controller: true + enable_debug_mode_for_compute_nodes: false + enable_privileged_container: false + store: + enable_privileged_container: false + host_paths: + tools: /data/cache/tools/xstore + volume_data: /data/xstore + hpfs_endpoint: polardbx-hpfs:6543 + webhook.yaml: |- + validator: + + default: + protocol_version: 8 + storage_engine: galaxy + service_type: ClusterIP + upgrade_strategy: RollingUpgrade +kind: ConfigMap +metadata: + annotations: + meta.helm.sh/release-name: polardbx-operator + meta.helm.sh/release-namespace: polardbx-operator-system + creationTimestamp: "2022-04-01T08:09:55Z" + labels: + app.kubernetes.io/managed-by: Helm + name: polardbx-controller-manager-config + namespace: polardbx-operator-system + resourceVersion: "2475601453" + uid: 585844ea-4b87-4407-98f2-520d02d8cffd +``` diff --git a/deployment/2-uninstallation.md b/deployment/2-uninstallation.md new file mode 100644 index 0000000..8113783 --- /dev/null +++ b/deployment/2-uninstallation.md @@ -0,0 +1,26 @@ +卸载 PolarDB-X Operator +======== +## 卸载系统组件 +使用下面的命令卸载: + +```bash +helm uninstall --namespace polardbx-operator-system polardbx-operator +``` + +使用下面的命令删除命名空间: + +```bash +kubectl delete namespace polardbx-operator-system +``` + +注意事项: + +- 卸载后,所有的 `PolarDBXCluster`、`XStore`等相关定制资源无法再自动维护 +- 出于资源保护的目的,数据库组件的 Pod 上通常有 finalizer 保护,卸载系统组件意味着删除时无法自动移除 finalizer 和回收资源(例如宿主机磁盘)等,需要手工进行,请谨慎操作 + +## 卸载定制资源定义 CRD +Helm 的卸载不会同时移除集成的 CRD,如果需要彻底卸载,需要手动移除: + +```bash +kubectl get crds | grep -E "polardbx.aliyun.com" | cut -d ' ' -f 1 | xargs kubectl delete crds +``` diff --git a/deployment/3-upgrade.md b/deployment/3-upgrade.md new file mode 100644 index 0000000..2ff46dd --- /dev/null +++ b/deployment/3-upgrade.md @@ -0,0 +1,28 @@ +升级 PolarDB-X Operator +======== + +由于 Helm 不会更新 CRD, 因此 PolarDB-X Operator 的升级分为如下两个步骤: +1. 更新 CRD +2. 升级 Operator + + +### 更新 CRD + +1. 请拉取版本对应的 [CRD 文件](https://github.com/polardb/polardbx-operator/tree/main/charts/polardbx-operator/crds)。CRD 文件的拉取可以直接拉取源码,也可以下载 PolarDB-X Operator 对应版本的 [Release 包](https://github.com/polardb/polardbx-operator/releases),解压后获取。 +2. 执行如下命令更新 CRD: +```shell +kubectl apply -f polardbx-operator/crds +``` + + +### 升级 Operator + +```bash +helm upgrade --namespace polardbx-operator-system polardbx/polardbx-operator +``` + +可以同时指定 values.yaml: + +```bash +helm upgrade --namespace polardbx-operator-system -f values.yaml polardbx/polardbx-operator +``` \ No newline at end of file diff --git a/deployment/README.md b/deployment/README.md new file mode 100644 index 0000000..6b5634a --- /dev/null +++ b/deployment/README.md @@ -0,0 +1,10 @@ +PolarDB-X Operator 安装部署 +========================= + +1. [快速开始](./0-quickstart.md) +2. [安装](./1-installation.md) + 1. [修改数据目录](./1-installation-data-dir.md) + 2. [修改默认镜像](./1-installation-default-image.md) + 3. [修改默认镜像仓库](./1-installation-default-image-repo.md) +3. [卸载](./2-uninstallation.md) +4. 
[升级](./3-upgrade.md) \ No newline at end of file diff --git a/faq/1-log.md b/faq/1-log.md new file mode 100644 index 0000000..2e2c440 --- /dev/null +++ b/faq/1-log.md @@ -0,0 +1,26 @@ +## polardbx-operator + +执行下面的命令查看 polardbx-operator 所在的 Pod + +```bash +kubectl -n polardbx-operator-system get pods -l app.kubernetes.io/component=controller-manager +NAME READY STATUS RESTARTS AGE +polardbx-controller-manager-597685578-kj4rj 1/1 Running 0 10d +``` + +使用 `kubectl logs` 命令来查看日志 + +```bash +kubectl -n polardbx-operator-system logs polardbx-controller-manager-597685578-kj4rj +... +2022-05-24T02:44:52.140Z INFO controller.xstore control/context.go:155 Executing command {"namespace": "default", "xstore": "pxc-demo-q2gq-gms", "engine": "galaxy", "phase": "Running", "stage": "", "trace": "e8ec86c3-f1f8-4baf-b3df-59cf8197ebcb", "action": "ReconcileConsensusRoleLabels", "step": 2, "pod": "default", "container": "engine", "command": ["/tools/xstore/current/venv/bin/python3", "/tools/xstore/current/cli.py", "consensus", "role", "--report-leader"], "timeout": "10s"} +2022-05-24T02:44:52.405Z INFO controller.xstore instance/consensus.go:62 Be aware of pod's role and current leader. {"namespace": "default", "xstore": "pxc-demo-q2gq-gms", "engine": "galaxy", "phase": "Running", "stage": "", "trace": "e8ec86c3-f1f8-4baf-b3df-59cf8197ebcb", "action": "ReconcileConsensusRoleLabels", "step": 2, "pod": "pxc-demo-q2gq-gms-cand-1", "role": "leader", "leader-pod": "pxc-demo-q2gq-gms-cand-1"} +2022-05-24T02:44:52.405Z INFO controller.xstore instance/consensus.go:218 Leader not changed. {"namespace": "default", "xstore": "pxc-demo-q2gq-gms", "engine": "galaxy", "phase": "Running", "stage": "", "trace": "e8ec86c3-f1f8-4baf-b3df-59cf8197ebcb", "action": "ReconcileConsensusRoleLabels", "step": 2, "leader-pod": "pxc-demo-q2gq-gms-cand-1"} +2022-05-24T02:44:52.405Z INFO controller.xstore instance/volumes.go:193 Not time to update sizes, skip. {"namespace": "default", "xstore": "pxc-demo-q2gq-gms", "engine": "galaxy", "phase": "Running", "stage": "", "trace": "e8ec86c3-f1f8-4baf-b3df-59cf8197ebcb", "action": "UpdateHostPathVolumeSizesPer1m0s", "step": 6} +2022-05-24T02:44:52.405Z INFO controller.xstore instance/common.go:117 Update observed generation. {"namespace": "default", "xstore": "pxc-demo-q2gq-gms", "engine": "galaxy", "phase": "Running", "stage": "", "trace": "e8ec86c3-f1f8-4baf-b3df-59cf8197ebcb", "action": "UpdateObservedGeneration", "step": 10, "previous-generation": 2, "current-generation": 2} +2022-05-24T02:44:52.405Z INFO controller.xstore instance/common.go:108 Update observed topology and config. {"namespace": "default", "xstore": "pxc-demo-q2gq-gms", "engine": "galaxy", "phase": "Running", "stage": "", "trace": "e8ec86c3-f1f8-4baf-b3df-59cf8197ebcb", "action": "UpdateObservedTopologyAndConfig", "step": 11, "current-generation": 2} +2022-05-24T02:44:52.405Z INFO controller.xstore control/common.go:62 Loop while running {"namespace": "default", "xstore": "pxc-demo-q2gq-gms", "engine": "galaxy", "phase": "Running", "stage": "", "trace": "e8ec86c3-f1f8-4baf-b3df-59cf8197ebcb", "action": "RetryAfter10s", "step": 12} +2022-05-24T02:44:52.405Z INFO controller.xstore instance/status.go:158 Display status updated! 
{"namespace": "default", "xstore": "pxc-demo-q2gq-gms", "engine": "galaxy", "phase": "Running", "stage": "", "trace": "e8ec86c3-f1f8-4baf-b3df-59cf8197ebcb", "defer_exec": true, "action": "UpdateDisplayStatus", "step": 13}
2022-05-24T02:44:52.405Z INFO controller.xstore instance/status.go:52 Object not changed. {"namespace": "default", "xstore": "pxc-demo-q2gq-gms", "engine": "galaxy", "phase": "Running", "stage": "", "trace": "e8ec86c3-f1f8-4baf-b3df-59cf8197ebcb", "defer_exec": true, "action": "PersistentXStore", "step": 14}
2022-05-24T02:44:52.405Z INFO controller.xstore instance/status.go:41 Status not changed. {"namespace": "default", "xstore": "pxc-demo-q2gq-gms", "engine": "galaxy", "phase": "Running", "stage": "", "trace": "e8ec86c3-f1f8-4baf-b3df-59cf8197ebcb", "defer_exec": true, "action": "PersistentStatus", "step": 15}
```

diff --git a/faq/10-dn-flame-graph.md b/faq/10-dn-flame-graph.md new file mode 100644 index 0000000..0c91cfe --- /dev/null +++ b/faq/10-dn-flame-graph.md @@ -0,0 +1,26 @@

1. 执行 `perf` 命令确认是否已安装，若未安装则执行（CentOS）：`sudo yum install perf`
2. 实时查看热点函数：`perf top --call-graph dwarf -p {PID}`，检查是否能看到 mysqld 的函数栈，类似 AHI 的问题比较容易看出来
3. 绘制火焰图

```shell
# 如果 mysqld 在容器中运行，
# 则拷贝 mysqld 的二进制文件到宿主机的相同运行路径下
docker cp {ContainerId}:{容器内 mysqld 路径} {宿主机相同路径}

# 找到 mysqld 进程号
ps -ef | grep mysqld

# 采样 40s
perf record -F 99 -p {pid} -g --call-graph dwarf -- sleep 40

# 将二进制的 perf.data 转化为文本形式
perf script > out.perf

# 绘制火焰图
# 火焰图工具下载见文末
./FlameGraph-master/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph-master/flamegraph.pl out.folded > mysqld.svg
```

火焰图工具：[FlameGraph-master.zip](./FlameGraph-master.zip)

diff --git a/faq/11-block-in-imagepullbackoff.md b/faq/11-block-in-imagepullbackoff.md new file mode 100644 index 0000000..574e59b --- /dev/null +++ b/faq/11-block-in-imagepullbackoff.md @@ -0,0 +1,10 @@

1. 执行如下命令，查看实际拉取的镜像是否存在：
```shell
kubectl describe pod {报错的 pod}
```
2. 确认是否有镜像仓库的拉取权限。PolarDB-X 默认的镜像仓库是无需权限的，如果使用内部镜像仓库，需要配置鉴权信息，参考文档：[Pull an Image from a Private Registry](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
3. 第 2 步确认完成后，删除报错的 pod，让其重建即可。

diff --git a/faq/12-kill-process-in-pod.md b/faq/12-kill-process-in-pod.md new file mode 100644 index 0000000..6d9cf00 --- /dev/null +++ b/faq/12-kill-process-in-pod.md @@ -0,0 +1,5 @@

容器内 kill 进程导致 Pod 重启，通常是三种情况：

1. kill 了 1 号进程
1. kill 进程后，1 号进程退出了
1. kill 进程后 `liveness` probe 连续失败超过阈值

diff --git a/faq/13-get-logs-from-a-terminated-pod.md b/faq/13-get-logs-from-a-terminated-pod.md new file mode 100644 index 0000000..c2162b6 --- /dev/null +++ b/faq/13-get-logs-from-a-terminated-pod.md @@ -0,0 +1,10 @@

1. K8S 官方尚未提供从 stopped/completed Pod 中拷贝文件的功能，[参考这里](https://github.com/kubernetes/kubectl/issues/454)。不过可以通过如下命令获取上一个 Pod 的日志信息：

```shell
kubectl logs {pod 名} -n {namespace} --previous
```
2. 
如果通过查看日志判断 pod 无法达到Running状态是因为探活(Probe容器)失败,可以通过如下命令关闭 Pod 的探活,让 Pod 达到 Running 状态后,进入 Pod 查看文件或者拷贝文件。 + +```shell +kubectl annotate pod {pod 名} runmode=debug +``` diff --git a/faq/14-docker-image-build.md b/faq/14-docker-image-build.md new file mode 100644 index 0000000..5982ef5 --- /dev/null +++ b/faq/14-docker-image-build.md @@ -0,0 +1,51 @@ +## CN +拉取 PolarDB-X SQL 代码,执行docker_build.sh 即可。 +[https://github.com/polardb/polardbx-sql/blob/main/docker_build.sh](https://github.com/polardb/polardbx-sql/blob/main/docker_build.sh) + +## DN + +```dockerfile +FROM centos:7 + +# Install essential utils +RUN yum update -y && \ + yum install sudo hostname telnet net-tools vim tree less libaio numactl-libs python3 -y && \ + yum clean all && rm -rf /var/cache/yum && rm -rf /var/tmp/yum-* + +# Remove localtime to make mount possible. +RUN rm -f /etc/localtime + +# Create user "mysql" and add it into sudo group +RUN useradd -ms /bin/bash mysql && \ + echo "mysql:mysql" | chpasswd && \ + echo "mysql ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers + +# Install polardbx engine's rpm, use URL to reduce the final image size. +ARG POLARDBX_ENGINE_RPM_URL=.rpm + +RUN yum install -y ${POLARDBX_ENGINE_RPM_URL} && \ + yum clean all && rm -rf /var/cache/yum && rm -rf /var/tmp/yum-* # && \ + # mv /u01/xcluster80_current/* /opt/galaxy_engine/ && rm -rf /u01 + +# Target to polardbx engine home. +WORKDIR /opt/galaxy_engine + +# Setup environment variables. +ENV POLARDBX_ENGINE_HOME=/opt/galaxy_engine +ENV PATH=$POLARDBX_ENGINE_HOME/bin:$PATH + +ENTRYPOINT mysqld +``` + +1. 把上面 Dockerfile 存下来 +2. 打包 rpm(文档待更新) +3. 执行 + +```bash +docker build --build-arg POLARDBX_ENGINE_RPM_URL=${POLARDBX_ENGINE_RPM_URL} -t polardbx-engine . +``` + + +## CDC +拉取仓库代码,执行 build.sh 即可。 +详见:[https://github.com/polardb/polardbx-cdc/blob/main/docker/build.sh](https://github.com/polardb/polardbx-cdc/blob/main/docker/build.sh) diff --git a/faq/15-private-rpc-on-off.md b/faq/15-private-rpc-on-off.md new file mode 100644 index 0000000..b3ac88e --- /dev/null +++ b/faq/15-private-rpc-on-off.md @@ -0,0 +1,39 @@ +数据库参数修改详见:[《创建数据库参数操作对象》](../ops/configuration/1-cn-variable-load-at-runtime-create-db.md) +## 关闭私有协议 +通过 pxcknobs 修改如下参数即可: + +```shell +CONN_POOL_XPROTO_STORAGE_DB_PORT:-1 // DN 的私有协议,-1为关闭,0为自动获取配置 +CONN_POOL_XPROTO_META_DB_PORT: -1 // Meta db 的私有协议开关,-1为关闭,0为自动获取配置 +``` + +```yaml +apiVersion: polardbx.aliyun.com/v1 +kind: PolarDBXClusterKnobs +metadata: + name: polardbx-xcluster + namespace: development +spec: + ## PolarDB-X 的实例名 + clusterName: "polardbx-xcluster" + knobs: + CONN_POOL_XPROTO_STORAGE_DB_PORT: -1 + CONN_POOL_XPROTO_META_DB_PORT: -1 +``` + +## 开启私有协议 +配置如下 pxcknobs 参数即可: + +```shell +apiVersion: polardbx.aliyun.com/v1 +kind: PolarDBXClusterKnobs +metadata: + name: polardbx-xcluster + namespace: development +spec: + ## PolarDB-X 的实例名 + clusterName: "polardbx-xcluster" + knobs: + CONN_POOL_XPROTO_STORAGE_DB_PORT: 0 + CONN_POOL_XPROTO_META_DB_PORT: 0 +``` diff --git a/faq/16-transaction-strategy.md b/faq/16-transaction-strategy.md new file mode 100644 index 0000000..9c1a497 --- /dev/null +++ b/faq/16-transaction-strategy.md @@ -0,0 +1,24 @@ +事务策略是 CN 的动态参数,如何修改可以参考文档:[《创建数据库参数操作对象》](../ops/configuration/1-cn-variable-load-at-runtime-create-db.md) + +## 操作步骤 + +1. 配置knobs.yaml 如下所示: + +```yaml +apiVersion: polardbx.aliyun.com/v1 +kind: PolarDBXClusterKnobs +metadata: + name: kunan-oss +spec: + clusterName: "tunan-oss" + knobs: + TRANSACTION_POLICY: XA +``` + +增加TRANSACTION_POLICY 参数,填写需要的事务策略即可,支持 XA|TSO|TSO_READONLY。 + +2. 
登录CN,执行如下SQL,检查配置是否生效: + +```mysql +begin; show variables like "drds_transaction_policy"; rollback; +``` diff --git a/faq/17-one-replica-cluster.md b/faq/17-one-replica-cluster.md new file mode 100644 index 0000000..5c590e5 --- /dev/null +++ b/faq/17-one-replica-cluster.md @@ -0,0 +1,55 @@ +在yaml 中添加 .spec.topology.rules.components 配置gms 和 dn 的 nodesets即可,如下所示: + +```yaml +apiVersion: polardbx.aliyun.com/v1 +kind: PolarDBXCluster +metadata: + name: pxc-demo +spec: + topology: + rules: + components: + gms: + ## 配置nodeSets即可 + nodeSets: + - name: cands + role: Candidate + replicas: 1 + dn: + ## 配置nodeSets即可 + nodeSets: + - name: cands + role: Candidate + replicas: 1 + nodes: + gms: + template: + resources: + limits: + cpu: 2 + memory: 4Gi + cn: + replicas: 1 + template: + resources: + requests: + cpu: 1 + memory: 4Gi + limits: + cpu: 2 + memory: 4Gi + dn: + replicas: 1 + template: + resources: + limits: + cpu: 2 + memory: 4Gi + cdc: + replicas: 1 + template: + resources: + limits: + cpu: 2 + memory: 4Gi +``` diff --git a/faq/18-host-network-port-conflict.md b/faq/18-host-network-port-conflict.md new file mode 100644 index 0000000..8540c01 --- /dev/null +++ b/faq/18-host-network-port-conflict.md @@ -0,0 +1,4 @@ +PolarDB-X Operator 支持容器网络和Host Network 两种模式。在创建 PolarDB-X 实例的时候,如果采用 Host Network, 此时 Pod 直接使用的主机上的随机端口,可能会与主机上的其它进程端口产生冲突,导致服务起不来。 + +1. 如果宿主机上占用端口的进程能够停止或者更换端口,停止该进程或者更改端口即可,让 Pod 重新拉起即可。 +1. 如果端口无法更改,直接重建 PolarDB-X 实例,此时会重新生成新的随机端口,大概率不会遇到端口冲突。 diff --git a/faq/2-block-in-creating.md b/faq/2-block-in-creating.md new file mode 100644 index 0000000..c0d423c --- /dev/null +++ b/faq/2-block-in-creating.md @@ -0,0 +1,22 @@ +集群创建卡在 Creating 状态有几种可能的原因 + +- 组件的 Pod 始终无法 ready,可能的状态可能有 ImagePullBackOff,Pending,CrashBackLoopOff 等 +- GMS 中 metadb 的元数据无法准备完成 +- 无法从 CN 处获取版本 +- ... + +排查思路主要是两个: + +1. 查看本集群 Pod 状态,看是否有异常状态的 Pod +1. [查看 polardbx-operator 日志](./1-log.md) ,查看是否有对应集群的 ERROR 日志 + +```bash +kubectl get pods -l polardbx/name={集群名} +``` + +| Pod 状态 | 可能的原因 | 排查 & 解决思路 | +|------------------------------------------------------------------------|----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +|
 READY STATUS<br>0/3 ImagePullBackOff | 镜像拉取失败：<br>• 镜像写错了<br>• 私有仓库，没有权限 | 使用 `kubectl describe` 进一步确定：<br>• 镜像写错了，更新 PolarDBXCluster 的 spec<br>• 私有仓库，需要[添加权限](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/) |
| READY STATUS<br>0/3 Pending | 资源不足 | 使用 `kubectl describe` 进一步确定：<br>• 添加节点<br>• 腾挪资源 |
| READY STATUS<br>2/3 CrashLoopBackOff | • 容器反复 crash<br>• CN 进程挂了 | 使用 `kubectl describe` 进一步确定：<br>• 具体问题具体分析<br>• describe 看不到错误信息时，可以通过[关闭探活](../ops/component/cn/2-liveness.md)的方式让 Pod 先起来，进入 Pod 查看相关日志 |

diff --git a/faq/3-block-in-deleting.md b/faq/3-block-in-deleting.md new file mode 100644 index 0000000..0d59963 --- /dev/null +++ b/faq/3-block-in-deleting.md @@ -0,0 +1,23 @@

卡在 Deleting 状态，一定是因为存在未处理的 finalizer。首先查看 PolarDBXCluster 是否有这样的 finalizer：

```bash
kubectl get pxc {PolarDBX 名} -o jsonpath='{.metadata.finalizers}'
["polardbx/finalizer"]
```

通常只有 `polardbx/finalizer` 这一个，应当会由 polardbx-operator 进行处理。如果长时间未处理，需要：

- 确定 operator 是否还存活
- 查看 operator 日志来确定原因

如果存在其他 finalizer，需要确定是否有对应组件会处理：

- 如果有，则由对应组件排查原因
- 否则，使用 `kubectl edit` 手动删除对应的 finalizer

## 批量操作
如果 xstore 或者 cn 的数量较多，可以通过如下命令批量操作（操作前建议阅读命令格式，通过标签过滤的方式筛选出需要删除的对象）：

```shell
for i in $(kubectl get xstore -o jsonpath='{.items[*].metadata.name}'); do echo $i; kubectl get xstore $i -o json | jq '.metadata.finalizers = null' | kubectl apply -f -; done
```

diff --git a/faq/4-dn-no-leader.md b/faq/4-dn-no-leader.md new file mode 100644 index 0000000..252a5e1 --- /dev/null +++ b/faq/4-dn-no-leader.md @@ -0,0 +1,28 @@

使用下面的命令获取存储节点 / 元数据节点的 leader 节点：

```bash
kubectl get xstore wuzhe-test2-shmr-dn-3
NAME                    LEADER                          READY   PHASE     DISK      VERSION                                        AGE
wuzhe-test2-shmr-dn-3   wuzhe-test2-shmr-dn-3-cands-0   1/1     Running   1.1 GiB   5.7.14-AliSQL-X-Cluster-1.6.1.1-20220520-log   3m11s
```

其中 `LEADER` 列的信息就是 leader 节点所在的 Pod。

如果该列为空，则表示未发现 leader 节点，需要进一步判断是哪种情况：

1. [排查是否存储节点对应的 Pod 不在运行中](../ops/component/dn/1-dn-node-state-inspect.md)
1. [排查 Pod 内部的日志](../ops/component/dn/3-dn-log.md)
1. [排查 operator 的日志](./1-log.md)

存储节点的 leader 是通过以下容器内命令来获取的：

```bash
kubectl exec -it wuzhe-test2-shmr-dn-3-cands-0 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "engine" out of: engine, exporter, prober

[root@iZ8vb9igdh4szqgoyfjt03Z /]
# xsl consensus role --report-leader
leader
wuzhe-test2-shmr-dn-3-cands-0
```

diff --git a/faq/5-pod-restart-inccident.md b/faq/5-pod-restart-inccident.md new file mode 100644 index 0000000..f714000 --- /dev/null +++ b/faq/5-pod-restart-inccident.md @@ -0,0 +1,12 @@

Pod 的重启计数在任何一个容器被重启时都会增加，因此首先需要确定是哪个容器重启：使用 `kubectl describe pod {name}` 查看是哪个容器最近在重启。

重启的原因通常需要排查日志来得知，常见的有两种：

1. 容器的 liveness probe 失败超过阈值（通常为 3），需要排查进程是否存活以及相关日志来定位问题
1. 容器 1 号进程意外退出，例如在容器内执行了 `killall`

日志排查合集：

1. [CN 日志](../ops/component/cn/4-cn-log.md)
1. [GMS/DN 日志](../ops/component/dn/3-dn-log.md)
1. 
[CDC 日志](../ops/component/cdc/2-cdc-node-login.md) diff --git a/faq/6-pod-not-in-running-state.md b/faq/6-pod-not-in-running-state.md new file mode 100644 index 0000000..89fac37 --- /dev/null +++ b/faq/6-pod-not-in-running-state.md @@ -0,0 +1 @@ +一些情况的处理:[集群创建卡在 Creating 状态](./2-block-in-creating.md) diff --git a/faq/7-cn-memory-dump.md b/faq/7-cn-memory-dump.md new file mode 100644 index 0000000..4cda52f --- /dev/null +++ b/faq/7-cn-memory-dump.md @@ -0,0 +1,25 @@ +#### 登录对应的计算节点 + +```bash +kubectl exec -it -- bash +``` + +#### dump 内存 + +```bash +# 通过 JPS 获取进程 ID +jps |grep TDDLLauncher + +# dump 内存 +jmap -dump:live,format=b,file=heap.bin +``` + +#### 拷贝文件到本地 + +```bash +# 推迟计算节点 Pod +exit + +# 拷贝内存文件 +kubectl cp : +``` diff --git a/faq/8-dn-core-file.md b/faq/8-dn-core-file.md new file mode 100644 index 0000000..52e1f8e --- /dev/null +++ b/faq/8-dn-core-file.md @@ -0,0 +1 @@ +参考[获取日志文件的文档](../ops/component/dn/3-dn-log.md) ,区别是位置在容器内的 `/data/mysql/data`目录。 diff --git a/faq/9-cn-flame-graph.md b/faq/9-cn-flame-graph.md new file mode 100644 index 0000000..67c8e36 --- /dev/null +++ b/faq/9-cn-flame-graph.md @@ -0,0 +1,42 @@ +当前镜像里没有集成相应的工具,需要手动上传工具包执行(见附件)。以 CN 中的 tddl 进程为例,如下所示 + +```bash +# 上传 profiler tar 包到 pod 的 /tmp 目录 +$ kubectl cp ~/Downloads/async-profiler.tar.gz pxc-yexi-test-cn-c6498459c-hgwn7:/tmp/ -c server + +# 打开 pod 的 shell +$ kubectl exec -it pxc-yexi-test-cn-c6498459c-hgwn7 -c server -- bash + +# 解压 tar 包到 /home/admin/tools 目录 +$ cd /home/admin/tools && tar xzvf /tmp/async-profiler.tar.gz + +# 查看 Tddl 进程 +$ jps +193 TddlLauncher +467 DrdsWorker +499432 Jps + +# 设置内核参数, 两种情况 +# 1. 容器是 privileged,直接设置就好 +# 2. 容器不是 privileged,需要去 Pod 对应的宿主机上设置 +$ echo 1 >/proc/sys/kernel/perf_event_paranoid + +# 查看内核参数是否正常 +$ cat /proc/sys/kernel/perf_event_paranoid +1 + +# 开始 profile +$ ./profiler.sh -d 80 -f /tmp/profiler-drds.svg 193 + +# 打开一个新的本地 shell,从 pod 里拷贝 svg 火焰图出来 +$ kubectl cp pxc-yexi-test-cn-c6498459c-hgwn7:/tmp/profiler-drds.svg /tmp/profiler-drds.svg -c server + +# 打开火焰图 +$ open /tmp/profiler-drds.svg +``` + +![image.png](./cn-flame-graph.png) + +## 附件 + +[async-profiler.tar.gz](./async-profiler.tar.gz) diff --git a/faq/FlameGraph-master.zip b/faq/FlameGraph-master.zip new file mode 100644 index 0000000..a96fb06 Binary files /dev/null and b/faq/FlameGraph-master.zip differ diff --git a/faq/README.md b/faq/README.md new file mode 100644 index 0000000..687a58d --- /dev/null +++ b/faq/README.md @@ -0,0 +1,20 @@ +常见问题 +========== +1. [如何获取系统组件日志](./1-log.md) +2. [集群创建卡在 Creating 状态](./2-block-in-creating.md) +3. [集群删除卡在 Deleting 状态](./3-block-in-deleting.md) +4. [存储节点未发现 Leader 节点](./4-dn-no-leader.md) +5. [Pod 意外重启](./5-pod-restart-inccident.md) +6. [Pod 始终不能 Running](./6-pod-not-in-running-state.md) +7. [计算节点 dump 内存](./7-cn-memory-dump.md) +8. [存储节点拷贝 core 文件](./8-dn-core-file.md) +9. [计算节点获取火焰图](./9-cn-flame-graph.md) +10. [存储节点获取火焰图](./10-dn-flame-graph.md) +11. [Pod 始终处于 ImagePullBackOff](./11-block-in-imagepullbackoff.md) +12. [容器内 Kill 进程后 Pod 重启](./12-kill-process-in-pod.md) +13. [Pod 不在运行如何获取文件](./13-get-logs-from-a-terminated-pod.md) +14. [如何构建镜像](./14-docker-image-build.md) +15. [如何关闭/开启私有协议](./15-private-rpc-on-off.md) +16. [如何调整事务策略](./16-transaction-strategy.md) +17. [如何创建单副本实例](./17-one-replica-cluster.md) +18. 
[宿主机网络端口冲突](./18-host-network-port-conflict.md) \ No newline at end of file diff --git a/faq/async-profiler.tar.gz b/faq/async-profiler.tar.gz new file mode 100644 index 0000000..bfe206d Binary files /dev/null and b/faq/async-profiler.tar.gz differ diff --git a/faq/cn-flame-graph.png b/faq/cn-flame-graph.png new file mode 100644 index 0000000..6b12bfc Binary files /dev/null and b/faq/cn-flame-graph.png differ diff --git a/ops/README.md b/ops/README.md new file mode 100644 index 0000000..8f5dbb3 --- /dev/null +++ b/ops/README.md @@ -0,0 +1,91 @@ +# PolarDB-X 运维指南 + +PolarDB-X 集群有 4 个部分组成:元数据服务(GMS)、计算节点(CN)、存储节点(DN)和日志节点(CDC)。每个部分都包含一个或多个计算资源,在 Kubernetes 中以 Pod 的形式呈现。基于 PolarDB-X Operator,我们可以定制集群每一个部分,比如创建 100 个计算节点,或是将 100 个节点分散在 A 和 B 两个可用区来保证高可用等等。 + +## 标签 (Labels) + +在组成 PolarDB-X 集群时,operator 为每个组件赋予了不同的标签,下表展示了一些常用的标签。 + +| 标签 | 含义 | 可选值 | 示例 | +| :--- | :--- | :--- | :--- | +| polardbx/name | 资源所属的 PolarDBXCluster 资源的名字 | | quick-start | +| polardbx/role | 资源的角色 | gms,cn,dn,cdc | cn | + +组合这些标签可以选择不同的资源,例如列举 quick-start 集群下的所有 Pod: + +```bash +$ kubectl get pods -l polardbx/name=quick-start +NAME READY STATUS RESTARTS AGE +quick-start-ml92-cdc-default-77979c6699-5dfgg 2/2 Running 0 10m +quick-start-ml92-cn-default-6d5956d4f4-jdzr4 3/3 Running 1 (7m9s ago) 10m +quick-start-ml92-dn-0-single-0 3/3 Running 0 10m +quick-start-ml92-gms-single-0 3/3 Running 0 10m +``` + +或是列举所有的 CN: + +```bash +$ kubectl get pods -l polardbx/name=quick-start,polardbx/role=cn +NAME READY STATUS RESTARTS AGE +quick-start-ml92-cn-default-6d5956d4f4-jdzr4 3/3 Running 1 (9m1s ago) 12m +``` + +## 部署 -- 集群拓扑 + +为了方便本机测试,[[快速上手](../deployment/0-quickstart.md)] 中展示的集群预先定义了集群的规格和拓扑,将整体资源压缩在 4c8g 以下。 + +如果想要部署更适合使用的模式,需要自定义集群的拓扑和规格。[[PolarDBXCluster API](../api/polardbxcluster.md)] 中详细解释 PolarDBXCluster 中可配置字段的含义和可选值,你可以参考它进行配置。当然,配置项是比较多且复杂的,这里给出几个简单的例子以供参考: + ++ 经典集群 -- 16c64g (2 CN + 2 DN) + +```yaml +apiVersion: polardbx.aliyun.com/v1 +kind: PolarDBXCluster +metadata: + name: classic +spec: + topology: + nodes: + cn: + replicas: 2 + template: + resources: + limits: + cpu: 16 + memory: 64Gi + dn: + replicas: 2 + template: + resources: + limits: + cpu: 16 + memory: 64Gi +``` + +通常建议不设置 resources 的 requests 以使 Kubernetes 能够使 Pod 独享计算资源,你可以参考 [Kubernetes 的文档](https://kubernetes.io/zh/docs/tasks/configure-pod-container/quality-service-pod/) 来了解 Pod 的服务质量的概念。Operator 默认配置中没有为每个容器都指定资源,如需要确保 Pod 是 Guaranteed 的服务质量,需要打开 EnforceQoSGuaranteed 的门特性,可以参考 [[PolarDB-X 安装部署](../deployment/README.md)] 进行配置。 + +在 Kubernetes 集群资源允许的前提下,可以配置规格更大、节点更多的 PolarDB-X 集群。 + +## 集群生命周期管理 + +参考 [[生命周期管理]](./lifecycle/README.md) 对 PolarDB-X 集群全生命周期进行管理,包括创建、升级、扩缩容、删除等。 + +## 组件管理 + +参考 [[组件管理]](./component/README.md) 对 PolarDB-X CN、DN 和 CDC 组件进行管理。 + +## 访问 + +参考 [[连接 PolarDB-X 数据库]](./connection/README.md) 选择合适的访问方式。 + +## 配置 + +参考 [[数据库参数设置]](./configuration/README.md) 来设置和修改配置。 + +## 监控 + +参考 [[监控]](./monitor/README.md) 为 PolarDB-X 集群开启监控功能。 + +## 日志采集 + +参考 [[日志采集]](./logcollector/README.md) 为 PolarDB-X 集群开启日志采集功能。 \ No newline at end of file diff --git a/ops/backup-restore/1-backup-storage-configure.md b/ops/backup-restore/1-backup-storage-configure.md new file mode 100644 index 0000000..7d9335d --- /dev/null +++ b/ops/backup-restore/1-backup-storage-configure.md @@ -0,0 +1,88 @@ +备份存储方式配置 +========== + +PolarDB-X Operator 从 1.3.0 版本开始支持全量备份恢复功能。在开启集群的备份恢复之前,需要对备份集的存储方式进行配置。 + +您可以通过如下方式完成备份存储方式的配置。 + +## 配置备份存储 + +### 支持的存储方式 + +目前支持的存储方式如下所示: + +* SFTP +* Aliyun OSS + +更多存储方式会在后续支持。 + +### 配置 SFTP 为备份集存储 + +1. 
## Deployment -- Cluster Topology

For convenient local testing, the cluster shown in [[Quick Start](../deployment/0-quickstart.md)] predefines the cluster's specification and topology, squeezing the overall resource usage below 4c8g.

For a deployment closer to production use, you need to customize the cluster's topology and specification. [[PolarDBXCluster API](../api/polardbxcluster.md)] explains in detail the meaning and allowed values of the configurable fields of PolarDBXCluster, and you can use it as a reference. Since the options are numerous and fairly complex, here are a few simple examples:

+ Classic cluster -- 16c64g (2 CN + 2 DN)

```yaml
apiVersion: polardbx.aliyun.com/v1
kind: PolarDBXCluster
metadata:
  name: classic
spec:
  topology:
    nodes:
      cn:
        replicas: 2
        template:
          resources:
            limits:
              cpu: 16
              memory: 64Gi
      dn:
        replicas: 2
        template:
          resources:
            limits:
              cpu: 16
              memory: 64Gi
```

It is usually recommended not to set resource requests, so that Kubernetes gives the Pod exclusive use of its compute resources; see the [Kubernetes documentation](https://kubernetes.io/zh/docs/tasks/configure-pod-container/quality-service-pod/) for the concept of Pod quality of service. The operator's default configuration does not specify resources for every container; if you need to ensure Pods get the Guaranteed QoS class, enable the EnforceQoSGuaranteed feature gate, as described in [[PolarDB-X Installation and Deployment](../deployment/README.md)].

Provided the Kubernetes cluster has enough resources, you can configure a PolarDB-X cluster with larger specifications and more nodes.

## Cluster Lifecycle Management

See [[Lifecycle Management]](./lifecycle/README.md) for managing the full lifecycle of a PolarDB-X cluster, including creation, upgrade, scaling, and deletion.

## Component Management

See [[Component Management]](./component/README.md) for managing the PolarDB-X CN, DN, and CDC components.

## Access

See [[Connecting to a PolarDB-X Database]](./connection/README.md) to choose a suitable access method.

## Configuration

See [[Database Parameter Settings]](./configuration/README.md) to set and modify configuration.

## Monitoring

See [[Monitoring]](./monitor/README.md) to enable monitoring for a PolarDB-X cluster.

## Log Collection

See [[Log Collection]](./logcollector/README.md) to enable log collection for a PolarDB-X cluster.
\ No newline at end of file
diff --git a/ops/backup-restore/1-backup-storage-configure.md b/ops/backup-restore/1-backup-storage-configure.md new file mode 100644 index 0000000..7d9335d --- /dev/null +++ b/ops/backup-restore/1-backup-storage-configure.md @@ -0,0 +1,88 @@
Backup Storage Configuration
==========

PolarDB-X Operator supports full backup and restore starting from version 1.3.0. Before enabling backup and restore for a cluster, you need to configure the storage for backup sets.

You can configure the backup storage as follows.

## Configure Backup Storage

### Supported storage types

The currently supported storage types are:

* SFTP
* Aliyun OSS

More storage types will be supported later.

### Configure SFTP as backup set storage

1. Run the following command to edit the ConfigMap:
```shell
kubectl -n polardbx-operator-system edit configmap polardbx-hpfs-config
```
Add your own sftp configuration to the sinks array, as shown below:
```yaml
data:
  config.yaml: |-
    sinks:
    - name: default
      type: sftp
      host: 127.0.0.1
      port: 22
      user: admin
      password: admin
      rootPath: /backup
```
2. After saving, run the following command to make the configuration take effect:
```shell
kubectl -n polardbx-operator-system rollout restart daemonsets polardbx-hpfs
```

Configuration fields:
- name: name of the entry; multiple sftp entries are distinguished by name
- type: type of the entry (see [supported storage types](#supported-storage-types)); allowed values: sftp, oss
- host: IP of the backup machine
- port: port of the backup machine
- user: account name on the backup machine
- password: password on the backup machine
- rootPath: root directory where backup sets are stored

### Configure Aliyun OSS as backup set storage

1. Run the following command to edit the ConfigMap:
```shell
kubectl -n polardbx-operator-system edit configmap polardbx-hpfs-config
```
Add your own oss configuration to the sinks array:
```yaml
data:
  config.yaml: |-
    sinks:
    - name: default
      type: oss
      endpoint: endpoint
      accessKey: ak
      accessSecret: sk
      bucket: bucket
```
2. After saving, run the following command to make the configuration take effect:
```shell
kubectl -n polardbx-operator-system rollout restart daemonsets polardbx-hpfs
```

Configuration fields:
- name: name of the entry; multiple oss entries are distinguished by name
- type: type of the entry (see [supported storage types](#supported-storage-types)); allowed values: sftp, oss
- endpoint: OSS endpoint
- accessKey: OSS access key ID
- accessSecret: OSS access key secret
- bucket: OSS bucket
> For details, see the [OSS product documentation](https://help.aliyun.com/document_detail/31827.html)


## Notes

- sinks can hold multiple storage types; entries of different types may share a name, and each type supports multiple entries, but names must be unique within a type.
- The operator runs fine with no storage configured, but a matching storage entry must be added before using backup and restore.
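For instance, a single ConfigMap can carry an sftp entry and an oss entry side by side (a sketch assembled from the two examples above; all values are placeholders):

```yaml
data:
  config.yaml: |-
    sinks:
    - name: default
      type: sftp
      host: 127.0.0.1
      port: 22
      user: admin
      password: admin
      rootPath: /backup
    - name: osssink
      type: oss
      endpoint: endpoint
      accessKey: ak
      accessSecret: sk
      bucket: bucket
```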
diff --git a/ops/backup-restore/2-cluster-backup.md b/ops/backup-restore/2-cluster-backup.md new file mode 100644 index 0000000..c36a12f --- /dev/null +++ b/ops/backup-restore/2-cluster-backup.md @@ -0,0 +1,77 @@
Cluster Backup
======

PolarDB-X Operator supports full backup and restore starting from version 1.3.0. This document describes how to take a full backup of PolarDB-X.

## Prerequisites
1. PolarDB-X Operator upgraded to version 1.3.0 or above
2. Backup storage configured, see: [Backup Storage Configuration](./1-backup-storage-configure.md)


## Start a Full Backup

The following shows how to take a full backup of PolarDB-X through a PolarDBXBackup object.

### Create the PolarDBXBackup object

1. Write a pxc-backup.yaml file following this example:
```yaml
apiVersion: polardbx.aliyun.com/v1
kind: PolarDBXBackup
metadata:
  name: pxcbackup-test
spec:
  cluster:
    name: polardbx-test
  retentionTime: 240h
  storageProvider:
    storageName: sftp
    sink: default
  preferredBackupRole: follower
```

Parameters:
* cluster.name: name of the target PolarDB-X cluster to back up
* retentionTime: retention time of the backup set, in hours
* storageProvider.storageName: storage type of the backup set, sftp or oss
* storageProvider.sink: name of the backup storage entry, matching the name field in [Backup Storage Configuration](./1-backup-storage-configure.md)
* preferredBackupRole (1.4.0 and later only): role of the node that performs the backup, `follower` or `leader`, default `follower`; **backing up from the `leader` may affect your workload, configure with care**

2. Create the PolarDBXBackup object with the following command to trigger the full backup:
```bash
kubectl create -f pxc-backup.yaml
```

### Check full backup progress

You can check the progress of the full backup with:
```bash
kubectl get pxb
```

The full backup is complete once `PHASE` shows `Finished`.
```bash
NAME             CLUSTER         START                  END                    RESTORE_TIME           PHASE      AGE
pxcbackup-test   polardbx-test   2022-10-21T04:56:38Z   2022-10-21T04:58:21Z   2022-10-21T04:57:23Z   Finished   4m15s
```
The `RESTORE_TIME` field is the latest point in time the backup set can be restored to.

### Notes

- The metadata.name field of the PolarDBXBackup object is the name of the backup set; change it each time you create a new backup set.

## Locating Backup Sets

After a full backup completes, the backup set is stored under the following path; you can find the files on the SFTP host or in the OSS bucket:

```
{root_path}/polardbx-backup/{pxc_name}/{pxc_backup_name}-{timestamp}
```

- root_path depends on the storage configuration
  - for sftp it is sink.rootPath
  - for oss it is sink.bucket
- polardbx-backup is a fixed segment
- pxc_name is the name of the cluster being backed up
- pxc_backup_name is the name of the backup set
- timestamp is the start time of the backup (UTC+0)
\ No newline at end of file
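As a worked example: with an sftp sink whose rootPath is /backup (a hypothetical value) and the backup object above, the backup set would sit under a path of this shape:

```
/backup/polardbx-backup/polardbx-test/pxcbackup-test-{timestamp}
```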
diff --git a/ops/backup-restore/3-cluster-restore.md b/ops/backup-restore/3-cluster-restore.md new file mode 100644 index 0000000..b4763b5 --- /dev/null +++ b/ops/backup-restore/3-cluster-restore.md @@ -0,0 +1,96 @@
Cluster Restore
======
PolarDB-X Operator supports full backup and restore starting from version 1.3.0. This document describes how to restore a PolarDB-X cluster from an existing backup set.

## Restore a PolarDB-X Cluster

Restoring from a backup set works in two ways:

* restore from a backup set object
* restore from backup set files

### Restore from a backup set object

This way requires that the backup set's `PolarDBXBackup` object still exists in the K8S cluster, and that the backup files are still present in remote storage.

```yaml
apiVersion: polardbx.aliyun.com/v1
kind: PolarDBXCluster
metadata:
  name: pxc-restore
spec:
  topology:
    nodes:
      cn:
        template:
          image: polardbx/polardbx-sql:latest
      dn:
        template:
          image: polardbx/polardbx-engine:latest
  restore:
    backupset: pxcbackup-test
    syncSpecWithOriginalCluster: false
```

Parameters:
* topology: instance specification, see [Creating an Instance](../lifecycle/1-create.md)
* restore.backupset: name of the backup set (backup object)
* restore.syncSpecWithOriginalCluster (1.4.0 and later only): whether to keep the instance specification identical to the original instance, default `false` (not kept identical); **heterogeneous restore is not supported yet, which means the number of data nodes is forced to match the original instance**


### Restore from backup set files

This way only supports backup sets produced by version 1.4.0 and later; it merely requires the backup files to still exist in remote storage.

```yaml
apiVersion: polardbx.aliyun.com/v1
kind: PolarDBXCluster
metadata:
  name: pxc-restore
spec:
  topology:
    nodes:
      cn:
        template:
          image: polardbx/polardbx-sql:latest
      dn:
        template:
          image: polardbx/polardbx-engine:latest
  restore:
    from:
      backupSetPath: /polardbx/backup/pxcbackup-test
    storageProvider:
      storageName: sftp
      sink: default
    syncSpecWithOriginalCluster: false
```

Parameters:
* topology: instance specification, see [Creating an Instance](../lifecycle/1-create.md)
* restore.from.backupSetPath: remote storage path of the backup set
* restore.storageProvider: the storage configuration used for backup, see [Cluster Backup](./2-cluster-backup.md)
* restore.syncSpecWithOriginalCluster (1.4.0 and later only): whether to keep the instance specification identical to the original instance, default `false` (not kept identical); **heterogeneous restore is not supported yet, which means the number of data nodes is forced to match the original instance**

Write the restore yaml following the examples above — note that the creation mode is `restore` — and start the restore with:

```bash
kubectl apply -f pxc-restore.yaml
```

Watch the restore progress with:

```bash
kubectl get pxc
```

The restore is complete once `PHASE` shows `Running`:

```bash
NAME          GMS   CN    DN    CDC   PHASE     DISK       AGE
pxc-restore   1/1   1/1   2/2   1/1   Running   20.3 GiB   22m
```

## Notes

- For a quick restore you only need to specify the images you want in the yaml file, otherwise the default images are used; see [Cluster Creation](../lifecycle/1-create.md) for more specification options
- Restore currently only supports the same topology; changing the number of nodes is not supported yet
\ No newline at end of file
diff --git a/ops/backup-restore/4-backup-schedule.md b/ops/backup-restore/4-backup-schedule.md new file mode 100644 index 0000000..1b8a68f --- /dev/null +++ b/ops/backup-restore/4-backup-schedule.md @@ -0,0 +1,68 @@
Backup Schedules
==========
PolarDB-X Operator supports scheduled full backups starting from version 1.4.0. This document describes how to configure a backup schedule for a cluster.

## Notes

- If another backup of the target PolarDB-X cluster is still in progress when the scheduled time arrives, the scheduled backup waits for it to finish before starting
- If you create multiple schedules, choose the rules sensibly to avoid triggering several backups at the same time

## Create a Backup Schedule

An example PolarDBXBackupSchedule object looks like this:

```yaml
apiVersion: polardbx.aliyun.com/v1
kind: PolarDBXBackupSchedule
metadata:
  name: pxc-schedule
spec:
  schedule: "*/20 * * * *"
  maxBackupCount: 5
  suspend: false
  backupSpec:
    cluster:
      name: polardbx-test
    retentionTime: 240h
    storageProvider:
      storageName: sftp
      sink: default
    preferredBackupRole: follower
```

Parameters:
* schedule: the schedule rule, i.e. the times at which backups are launched, given as a valid cron expression
* maxBackupCount: upper limit on retained backup sets; once the limit is exceeded, cleanup starts from the oldest backup set; default 0, meaning no cleanup
* suspend: whether the schedule is paused, default `false` (not paused)
* backupSpec: the backup specification, see [Cluster Backup](./2-cluster-backup.md)

Write a pxc-schedule.yaml file following the example above and create the backup schedule with:

```bash
kubectl apply -f pxc-schedule.yaml
```

Watch the schedule's status with:

```bash
kubectl get pbs
```

The status carries the following information:

```bash
NAME           SCHEDULE       LAST_BACKUP_TIME       NEXT_BACKUP_TIME       LAST_BACKUP
pxc-schedule   */20 * * * *   2023-03-16T08:00:00Z   2023-03-16T08:20:00Z   polardbx-test-backup-202303160800
```

## Schedule Rule Examples

The `spec.schedule` field of a PolarDBXBackupSchedule object is the schedule rule and follows the standard cron expression format; the table below shows a few examples:

| Schedule rule | Meaning |
| ----- | ------ |
| */20 * * * * | back up every 20 minutes |
| 0 * * * * | back up every hour |
| 0 0 * * 1 | back up at 00:00 every Monday |
| 0 2 * * 1,4 | back up at 02:00 on Mondays and Thursdays |
| 0 2 */2 * * | back up at 02:00 every two days |
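To pause and later resume a schedule without deleting it, you can flip the suspend field described above in place, for example with a merge patch (a sketch using the short name `pbs` from above):

```bash
kubectl patch pbs pxc-schedule --type merge -p '{"spec":{"suspend":true}}'
```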
diff --git a/ops/backup-restore/README.md b/ops/backup-restore/README.md new file mode 100644 index 0000000..8a66115 --- /dev/null +++ b/ops/backup-restore/README.md @@ -0,0 +1,10 @@
Backup and Restore
===

> PolarDB-X Operator supports backup and restore starting from version 1.3.0

1. [Backup Storage Configuration](./1-backup-storage-configure.md)
2. [Cluster Backup](./2-cluster-backup.md)
3. [Incremental Log Backup](./binlog-backup.md)
4. [Cluster Restore](./3-cluster-restore.md)
5. [Point-in-Time Recovery](./pitr.md)
\ No newline at end of file
diff --git a/ops/backup-restore/binlog-backup.md b/ops/backup-restore/binlog-backup.md new file mode 100644 index 0000000..1b05225 --- /dev/null +++ b/ops/backup-restore/binlog-backup.md @@ -0,0 +1,84 @@
Incremental Log Backup
======

PolarDB-X Operator supports incremental log backup starting from version 1.4.0. This document describes how to back up PolarDB-X incremental logs.
> Incremental logs here are the consensus logs generated on the DN nodes (similar to mysql binlog), located by default in the /data/mysql/log directory inside the DN container

## Prerequisites
1. PolarDB-X Operator upgraded to version 1.4.0 or above
2. Backup storage configured, see: [Backup Storage Configuration](./1-backup-storage-configure.md)


## Start Incremental Log Backup

The following shows how to back up incremental logs for PolarDB-X through a PolarDBXBackupBinlog object.

### Create the PolarDBXBackupBinlog object

1. Write a pxc-backup-binlog.yaml file following this example:

```yaml
apiVersion: polardbx.aliyun.com/v1 # API group/version
kind: PolarDBXBackupBinlog # API kind
metadata:
  name: backupbinlogforpolardb-x # name of the incremental log backup task
spec:
  pxcName: polardb-x # name of the target PolarDB-X cluster
  pxcUid: 8f634de1-5a4e-4e1c-b2dc-e8763384d83a # UID of the target PolarDB-X cluster
  remoteExpireLogHours: 168 # retention in hours on the remote side (OSS or SFTP)
  localExpireLogHours: 7 # retention in hours locally on the data node
  maxLocalBinlogCount: 60 # number of incremental log files kept locally on the data node
  pointInTimeRecover: true # whether point-in-time recovery is supported
  binlogChecksum: CRC32 # incremental log checksum
  storageProvider:
    storageName: oss # storage type, sftp or oss
    sink: osssink # name of the storage entry
```

Parameters:
* pxcName: name of the target PolarDB-X cluster, required
* pxcUid: UID of the target PolarDB-X cluster, optional, usually omitted
* remoteExpireLogHours: retention in hours on the remote side (OSS or SFTP), optional, default 168
* localExpireLogHours: retention in hours locally on the data node, optional, default 7
* maxLocalBinlogCount: number of incremental log files kept locally on the data node, optional, default 60
* pointInTimeRecover: whether point-in-time recovery is supported, optional, default true
* binlogChecksum: incremental log checksum, optional, default CRC32
* storageProvider.storageName: storage type of the backup, sftp or oss, required
* storageProvider.sink: name of the storage entry, matching the name field in [Backup Storage Configuration](./1-backup-storage-configure.md), required

2. Create the PolarDBXBackupBinlog object with the following command to enable incremental log backup:
```bash
kubectl create -f pxc-backup-binlog.yaml
```
3. Check that the incremental log backup is in the `running` phase:
```bash
kubectl get pxcblog
```

## Locating Incremental Log Backups

Incremental log backup files are stored under the following paths; you can find them on the SFTP host or in the OSS bucket.

Metadata files of the incremental logs:
```
{root_path}/polardbx-binlogbackup/{namespace}/{pxc_name}/{pxc_uid}/{xstore_name}/{xstore_uid}/{pod_name}/{version}/{batch_name}/binlog-meta/mysql_bin.{number}.txt
```
Incremental log files:
```
{root_path}/polardbx-binlogbackup/{namespace}/{pxc_name}/{pxc_uid}/{xstore_name}/{xstore_uid}/{pod_name}/{version}/{batch_name}/binlog-file/mysql_bin.{number}
```

- root_path depends on the storage configuration
  - for sftp it is sink.rootPath
  - for oss it is sink.bucket
- polardbx-binlogbackup is a fixed segment
- namespace is the namespace of the target PolarDB-X Cluster
- pxc_name is the name of the target PolarDB-X Cluster
- pxc_uid is the UID of the target PolarDB-X Cluster
- xstore_name is the name of the xstore the backup file belongs to
- xstore_uid is the uid of the xstore the backup file belongs to
- pod_name is the name of the pod the backup file belongs to
- version is the version number of that pod
- batch_name is the batch directory name; files are batched 1000 per batch
- binlog-file and binlog-meta are fixed segments
- number is the sequence number of the incremental log file
\ No newline at end of file
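As a worked example: with an oss sink whose bucket is mybucket (hypothetical), the default namespace, and the cluster from the yaml above, an incremental log file path would take roughly this shape (the remaining segments stay as placeholders; the file number format is illustrative):

```
mybucket/polardbx-binlogbackup/default/polardb-x/{pxc_uid}/{xstore_name}/{xstore_uid}/{pod_name}/{version}/{batch_name}/binlog-file/mysql_bin.000001
```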
diff --git a/ops/backup-restore/pitr.md b/ops/backup-restore/pitr.md new file mode 100644 index 0000000..2a0761e --- /dev/null +++ b/ops/backup-restore/pitr.md @@ -0,0 +1,51 @@
Point-in-Time Recovery
======

PolarDB-X Operator supports point-in-time recovery starting from version 1.4.0. This document describes how to restore PolarDB-X to a specified point in time.

## Prerequisites

1. PolarDB-X Operator upgraded to version 1.4.0 or above
2. Incremental log backup configured, with point-in-time recovery enabled (the default)
3. A full backup set exists before the target point in time
4. Continuous incremental log files exist between the full backup set's restore time and the target point in time

## Restore a PolarDB-X Cluster

```yaml
apiVersion: polardbx.aliyun.com/v1
kind: PolarDBXCluster
metadata:
  name: pxc-pitr-restore # name of the restored cluster
spec:
  topology: # cluster specification
    nodes:
      cn:
        template:
          image: polardbx/galaxysql:latest
      dn:
        template:
          image: polardbx/galaxyengine:latest
  restore: # create the cluster in restore mode
    from:
      clusterName: polardb-x-2 # name of the source PolarDB-X cluster
    time: "2023-03-20T02:06:46Z" # point in time to restore to
```

Write the restore yaml following the example above — note that the creation mode is `restore` — and start the restore with:

```bash
kubectl apply -f pxc-pitr-restore.yaml
```

> Notes:
> * If data nodes were added or removed between the full backup set's point in time and the target point in time, the restore will fail;
> * If incremental logs were not produced continuously between the full backup set's point in time and the target point in time (for example because a standby replica was rebuilt), the restore will fail;
> * If there were DDL operations near the target point in time, metadata may be inconsistent

> Recommendations:
> * Take full backups regularly
> * Launch a full backup task after adding or removing data nodes

For the remaining steps, see [Cluster Restore](./3-cluster-restore.md)

diff --git a/ops/component/README.md b/ops/component/README.md new file mode 100644 index 0000000..f9890b1 --- /dev/null +++ b/ops/component/README.md @@ -0,0 +1,25 @@
Component Management
=======

### Compute node operations

1. [Check node status](./cn/1-node-state-inspect.md)
2. [Configure liveness and readiness probes](./cn/2-liveness.md)
3. [Log in to a compute node container](./cn/3-cn-pod-login.md)
4. [Fetch compute node logs](./cn/4-cn-log.md)
5. [Delete/rebuild a compute node](./cn/5-node-delete.md)

### Storage node operations

1. [Check node status](./dn/1-dn-node-state-inspect.md)
2. [Log in to an internal node](./dn/2-dn-node-login.md)
3. [Fetch internal node logs](./dn/3-dn-log.md)
4. [Get storage node connection info](./dn/4-dn-connection.md)
5. [Get storage node task info](./dn/5-dn-task-info.md)
6. [Delete/rebuild an internal node](./dn/6-dn-delete.md)

### Log node operations

1. [Check node status](./cdc/1-cdc-state-inspect.md)
2. [Log in to a log node container](./cdc/2-cdc-node-login.md)
3. [Rebuild a log node](./cdc/3-cdc-delete.md)
\ No newline at end of file
diff --git a/ops/component/cdc/1-cdc-state-inspect.md b/ops/component/cdc/1-cdc-state-inspect.md new file mode 100644 index 0000000..74ec8f4 --- /dev/null +++ b/ops/component/cdc/1-cdc-state-inspect.md @@ -0,0 +1,16 @@
Check Log Node Status
=====
Run the following command to get the list of cdc pods:

```shell
kubectl get pods -l polardbx/role=cdc
```
Expect output like the following; use the READY and STATUS fields to judge whether the cdc pods are healthy.

```shell
NAME                                          READY   STATUS    RESTARTS   AGE
tunan-oss-drsg-cdc-default-57d97f5bc8-8z4mz   2/2     Running   0          14d
tunan-oss-drsg-cdc-default-57d97f5bc8-qhvnq   2/2     Running   0          14d
```

diff --git a/ops/component/cdc/2-cdc-node-login.md b/ops/component/cdc/2-cdc-node-login.md new file mode 100644 index 0000000..521a462 --- /dev/null +++ b/ops/component/cdc/2-cdc-node-login.md @@ -0,0 +1,11 @@
Log In to a Log Node
======
Run the following command to log in to a log node container:

```shell
kubectl exec -it {pod name} -- bash
```

The cdc logs live under /home/admin/logs/.

The CDC container runs three java processes — daemon, dumper, and final — whose logs are in the correspondingly named directories.
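To follow one process's logs in place, something like the following works (a sketch; it assumes the log directories are named after the processes, as described above):

```shell
kubectl exec -it {pod name} -- bash -c 'tail -f /home/admin/logs/dumper/*.log'
```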
diff --git a/ops/component/cdc/3-cdc-delete.md b/ops/component/cdc/3-cdc-delete.md new file mode 100644 index 0000000..0645c1c --- /dev/null +++ b/ops/component/cdc/3-cdc-delete.md @@ -0,0 +1,7 @@
Rebuild a Log Node
======
Run the following command to delete the cdc pod, which triggers a rebuild:

```shell
kubectl delete pod {pod name}
```
diff --git a/ops/component/cn/1-node-state-inspect.md b/ops/component/cn/1-node-state-inspect.md new file mode 100644 index 0000000..a9d602b --- /dev/null +++ b/ops/component/cn/1-node-state-inspect.md @@ -0,0 +1,29 @@
Check Compute Node Status
=======
Run the following command to view the overall CN status of a PolarDB-X cluster:

```shell
kubectl get pxc
```

Expect output like the following, showing the number of CNs and how many are currently ready:

```shell
NAME      GMS   CN    DN    CDC   PHASE     DISK       AGE
classic   1/1   2/2   3/3   1/1   Running   40.3 GiB   4d2h
```


Run the following command to get the list of cn pods:

```shell
kubectl get pods -l polardbx/role=cn
```

Expect output like the following; use the READY and STATUS fields to judge whether the cn pods are healthy.

```shell
NAME                                       READY   STATUS    RESTARTS   AGE
classic-4gss-cn-default-5488d667fd-74lz2   3/3     Running   0          4d2h
classic-4gss-cn-default-5488d667fd-hn7fj   3/3     Running   1          4d2h
```
diff --git a/ops/component/cn/2-liveness.md b/ops/component/cn/2-liveness.md new file mode 100644 index 0000000..64c00b8 --- /dev/null +++ b/ops/component/cn/2-liveness.md @@ -0,0 +1,20 @@
Liveness and Readiness Configuration
=======
To keep the service highly available, K8s automatically restarts a component's pod (CN, DN, GMS, CDC) to restore the service when its probes fail.
In some scenarios, however, you need to restart the cn process yourself (say after changing a parameter) or investigate why the process died, and you don't want K8s to delete the cn pod because probing failed. In that case, disable probing for all CN nodes with:

```bash
kubectl annotate pod -l polardbx/role=cn runmode=debug
```

Or disable probing for a single cn pod only:

```bash
kubectl annotate pod {pod name} runmode=debug
```

Re-enable probing for a cn pod:

```bash
kubectl annotate --overwrite pod {pod name} runmode-
```
\ No newline at end of file
diff --git a/ops/component/cn/3-cn-pod-login.md b/ops/component/cn/3-cn-pod-login.md new file mode 100644 index 0000000..515dc3b --- /dev/null +++ b/ops/component/cn/3-cn-pod-login.md @@ -0,0 +1,14 @@
Log In to a Compute Node
=======
## Log in to the Pod
If the CN is in the ready state, run the following command to log in to the CN Pod:

```shell
kubectl exec -it {pod name} -- bash
```

If the CN pod is stuck crashing because of failed probes, disable probing with the following command; once the pod is ready, log in with the command above.

```shell
kubectl annotate pod {pod name} runmode=debug
```
diff --git a/ops/component/cn/4-cn-log.md b/ops/component/cn/4-cn-log.md new file mode 100644 index 0000000..e24cddb --- /dev/null +++ b/ops/component/cn/4-cn-log.md @@ -0,0 +1,10 @@
View Compute Node Logs
========
1. Enter the CN container as described in [3. Log in to a compute node container](./3-cn-pod-login.md)
2. Look for the logs you need under /home/admin/drds-server/logs
3. To copy a log file to your local machine, use:

```shell
kubectl cp {pod name}:{log file in the pod} {local directory}
```

diff --git a/ops/component/cn/5-node-delete.md b/ops/component/cn/5-node-delete.md new file mode 100644 index 0000000..ac69c0e --- /dev/null +++ b/ops/component/cn/5-node-delete.md @@ -0,0 +1,13 @@
Rebuild Compute Nodes
========
To rebuild all cn nodes, run:

```shell
kubectl delete pod -l polardbx/name={instance name},polardbx/role=cn
```

To rebuild a single cn node, run:

```shell
kubectl delete pod {pod name}
```
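The deleted pods are recreated automatically; to watch them come back to Running you can add kubectl's standard watch flag (a usage sketch):

```shell
kubectl get pods -l polardbx/name={instance name},polardbx/role=cn -w
```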
diff --git a/ops/component/dn/1-dn-node-state-inspect.md b/ops/component/dn/1-dn-node-state-inspect.md new file mode 100644 index 0000000..2b7e52d --- /dev/null +++ b/ops/component/dn/1-dn-node-state-inspect.md @@ -0,0 +1,68 @@
## Query the XStore List
Run the following command to list all DNs:

```shell
kubectl get xstore -l polardbx/name={instance name}
```

Output like:

```shell
NAME                  LEADER                       READY   PHASE     DISK       VERSION   AGE
tunan-oss-drsg-dn-0   tunan-oss-drsg-dn-0-cand-1   3/3     Running   11.7 GiB   8.0.18    20d
tunan-oss-drsg-dn-1   tunan-oss-drsg-dn-1-cand-1   3/3     Running   11.0 GiB   8.0.18    20d
```

PHASE shows each DN's state, and LEADER shows the DN's current Leader pod.

## View DN Pods
To list all DN pods of a PolarDB-X instance, run:

```shell
kubectl get pod -l polardbx/name={instance name},polardbx/role=dn
```

giving:

```shell
NAME                         READY   STATUS    RESTARTS   AGE
tunan-oss-drsg-dn-0-cand-0   3/3     Running   0          20d
tunan-oss-drsg-dn-0-cand-1   3/3     Running   0          20d
tunan-oss-drsg-dn-0-log-0    3/3     Running   0          20d
tunan-oss-drsg-dn-1-cand-0   3/3     Running   0          20d
tunan-oss-drsg-dn-1-cand-1   3/3     Running   0          20d
tunan-oss-drsg-dn-1-log-0    3/3     Running   0          20d
```

To see each dn pod's role, run:

```shell
kubectl get pod -l polardbx/name={instance name},polardbx/role=dn --show-labels
```

In the output below, xstore/role=follower is the pod's role.
> Note: if the xstore/role label has no value, the DN is electing a leader or the election has a problem

```shell
NAME                         READY   STATUS    RESTARTS   AGE   LABELS
tunan-oss-drsg-dn-0-cand-0   3/3     Running   0          20d   polardbx/dn-index=0,polardbx/name=tunan-oss,polardbx/rand=drsg,polardbx/role=dn,xstore/generation=2,xstore/name=tunan-oss-drsg-dn-0,xstore/node-role=candidate,xstore/node-set=cand,xstore/pod=tunan-oss-drsg-dn-0-cand-0,xstore/port-lock=16148,xstore/role=follower
```

## View DN Pods with a Specific Role
View all leader pods:

```shell
kubectl get pod -l polardbx/name={instance name},polardbx/role=dn,xstore/role=leader
```

View all follower pods:

```shell
kubectl get pod -l polardbx/name={instance name},polardbx/role=dn,xstore/role=follower
```

View all logger pods:

```shell
kubectl get pod -l polardbx/name={instance name},polardbx/role=dn,xstore/role=logger
```
diff --git a/ops/component/dn/2-dn-node-login.md b/ops/component/dn/2-dn-node-login.md new file mode 100644 index 0000000..f7914c5 --- /dev/null +++ b/ops/component/dn/2-dn-node-login.md @@ -0,0 +1,15 @@
## Log in to the Pod
Find the DN pod you want to log in to; if the pod is in the 3/3 ready state, run:

```shell
kubectl exec -it {pod name} -- bash
```

If the DN pod keeps being restarted because of failed probes, disable probing first and then log in:

```shell
kubectl annotate pod -l polardbx/role=dn runmode=debug
```

## Enter the MySQL Command Line
Run the `myc` command.
diff --git a/ops/component/dn/3-dn-log.md b/ops/component/dn/3-dn-log.md new file mode 100644 index 0000000..5540c57 --- /dev/null +++ b/ops/component/dn/3-dn-log.md @@ -0,0 +1,41 @@
## If the Container Can Be Logged Into
If the DN pod can be logged into normally (or after disabling probes), go to the /data/mysql/log/ directory; alert.log is the main file to check.

## If the Container Cannot Be Logged Into
View the dn engine's startup log with:

```shell
kubectl logs {pod name} engine
```

If the log shows an error saying mysql initialization failed, check alert.log in the dn pod's directory on the host machine, as follows:

1. Add the -o wide flag to find the machine the dn pod runs on:

```shell
kubectl get pod {pod name} -o wide
```

Output like:

```shell
NAME                         READY   STATUS    RESTARTS   AGE   IP             NODE                          NOMINATED NODE   READINESS GATES
tunan-oss-drsg-dn-0-cand-1   3/3     Running   0          20d   172.16.0.129   cn-zhangjiakou.172.16.0.129   <none>           <none>
```

NODE is the machine the pod is scheduled to.

2. Run the following command to get the dn pod's actual directory on the host:

```shell
kubectl get pod {pod name} -o json | grep "/data/xstore/default"
```

Expect output like:

```shell
"path": "/data/xstore/default/tunan-oss-drsg-dn-0-cand-1"
```

3. Go to that directory on that machine to view the logs.

diff --git a/ops/component/dn/4-dn-connection.md b/ops/component/dn/4-dn-connection.md new file mode 100644 index 0000000..09fec61 --- /dev/null +++ b/ops/component/dn/4-dn-connection.md @@ -0,0 +1,29 @@
## Get the Username and Password
The default username is admin.
Get the password with:

```shell
kubectl get secret {dn name} -o jsonpath={.data.admin} | base64 -d - | xargs echo "Password"
```

## Get the Connection String
Get the clusterIp with:

```shell
kubectl get svc {dn name}
```

Expect output like:

```shell
NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)              AGE
tunan-oss-drsg-dn-0   ClusterIP   192.168.192.73   <none>        3306/TCP,31306/TCP   21d
```

Inside the K8s cluster you can connect directly via the cluster-ip and port 3306.

From outside the K8s cluster, forward the port to your local machine with:

```shell
kubectl port-forward svc/{dn name} 3306:3306
```
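With the port forwarded, any mysql client can connect as admin using the password fetched above (a usage sketch; assumes a mysql client is installed locally):

```shell
mysql -h127.0.0.1 -P3306 -uadmin -p
```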
diff --git a/ops/component/dn/5-dn-task-info.md b/ops/component/dn/5-dn-task-info.md new file mode 100644 index 0000000..5186ded --- /dev/null +++ b/ops/component/dn/5-dn-task-info.md @@ -0,0 +1,39 @@
Note: this is only useful during upgrades.

View the task info with:

```bash
kubectl get cm {xstore}-task -o yaml
```

The structure is:

```go
type ExecutionContext struct {
	// Topologies in use.
	Topologies map[int64]*xstore.Topology `json:"topologies,omitempty"`

	// Generation.
	Generation int64 `json:"generation,omitempty"`

	// Current running nodes.
	Running map[string]model.PaxosNodeStatus `json:"running,omitempty"`

	// Tracking nodes. This is the tracking set of the paxos node configuration.
	Tracking map[string]model.PaxosNodeStatus `json:"tracking,omitempty"`

	// Expected nodes.
	Expected map[string]model.PaxosNode `json:"expected,omitempty"`

	// Current usable volumes.
	Volumes map[string]model.PaxosVolume `json:"volumes,omitempty"`

	// Plan.
	Plan *plan.Plan `json:"plan,omitempty"`

	// StepIndex of the plan.
	StepIndex int `json:"step_index,omitempty"`

	PodFactory factory.ExtraPodFactory `json:"-"`
}
```
diff --git a/ops/component/dn/6-dn-delete.md b/ops/component/dn/6-dn-delete.md new file mode 100644 index 0000000..ff3d9b6 --- /dev/null +++ b/ops/component/dn/6-dn-delete.md @@ -0,0 +1,29 @@
## Delete Pod
Run the following command to delete the dn pod directly; k8s rebuilds the pod automatically:

```shell
kubectl delete pod {dn pod name}
```


## Graceful Shutdown
In some scenarios you need to shut a DN down gracefully and restart it. First disable the dn's probes:

```shell
kubectl annotate pod {dn pod name} runmode=debug
```

Then log in to the dn pod, enter the MySQL command line with myc, and run:

```shell
mysql> shutdown;
```

This restarts the DN process without rebuilding the dn pod.

When you are done, remember to re-enable probing:

```shell
kubectl annotate --overwrite pod {dn pod name} runmode-
```

diff --git a/ops/configuration/1-cn-variable-at-startup.md b/ops/configuration/1-cn-variable-at-startup.md new file mode 100644 index 0000000..b803287 --- /dev/null +++ b/ops/configuration/1-cn-variable-at-startup.md @@ -0,0 +1,22 @@
Edit the PolarDBXCluster yaml directly. These parameters only take effect after a restart, so changing them automatically triggers a rebuild of the cn pods. Modify with:

```shell
kubectl edit pxc {pxc name}
```

Edit .spec.config.cn.static and add the parameters to change, as shown below:

```yaml
# Static configuration; changes cause the CN cluster to be rebuilt
static:
  # Enable coroutines; not supported on OpenJDK yet, requires dragonwell
  EnableCoroutine: false
  # Enable consistent reads on replicas
  EnableReplicaRead: false
  # Enable JVM remote debugging
  EnableJvmRemoteDebug: false
  # Custom CN static configuration, key-value structure
  # values are typed int or string, so booleans must be written as strings, e.g. "true", "false"
  ServerProperties:
    processors: 8
```
diff --git a/ops/configuration/1-cn-variable-load-at-runtime-create-db.md b/ops/configuration/1-cn-variable-load-at-runtime-create-db.md new file mode 100644 index 0000000..e215ad7 --- /dev/null +++ b/ops/configuration/1-cn-variable-load-at-runtime-create-db.md @@ -0,0 +1,43 @@
CN dynamic parameters can be modified directly in the PolarDBXCluster object's yaml, see .spec.config.cn.dynamic. This configuration path has some problems, though:

- with many configuration items, the cluster definition gets so long that it buries other details
- PolarDBXCluster ends up maintaining not only the cluster (containers) but also the configuration items, which is complex and error-prone
- one-way sync invalidates parameters set through other channels (such as set global)

So the open source version supports modifying CN dynamic parameters through a Knobs object.

The yaml definition of a Knobs object:

```yaml
apiVersion: polardbx.aliyun.com/v1
kind: PolarDBXClusterKnobs
metadata:
  name: polardbx-xcluster
  namespace: development
spec:
  ## instance name of the PolarDB-X
  clusterName: "polardbx-xcluster"
  # not required at creation time
  knobs:
    ## parameter list
    CONN_POOL_MAX_POOL_SIZE: 100
    RECORD_SQL: "true"

```

> Note: the full list of CN dynamic parameters is documented at: [https://help.aliyun.com/document_detail/316576.html](https://help.aliyun.com/document_detail/316576.html)
>
> Note: boolean parameter values must be passed as strings.

After writing the yaml file above, apply it:

```shell
kubectl apply -f {knobs yaml file}
```

List the knobs objects with:

```shell
kubectl get pxcknobs
```

diff --git a/ops/configuration/1-cn-variable-load-at-runtime-delete-db.md b/ops/configuration/1-cn-variable-load-at-runtime-delete-db.md new file mode 100644 index 0000000..3aa6eb6 --- /dev/null +++ b/ops/configuration/1-cn-variable-load-at-runtime-delete-db.md @@ -0,0 +1,5 @@
Delete the pxcknobs object with the following command; parameters previously configured through it are not reset.

```shell
kubectl delete pxcknobs {knobs name}
```
diff --git a/ops/configuration/1-cn-variable-load-at-runtime-update-db.md b/ops/configuration/1-cn-variable-load-at-runtime-update-db.md new file mode 100644 index 0000000..eb268aa --- /dev/null +++ b/ops/configuration/1-cn-variable-load-at-runtime-update-db.md @@ -0,0 +1,7 @@
If the PolarDB-X cluster already has a Knobs object, edit it directly and add/modify/remove the corresponding parameters:

```shell
kubectl edit pxcknobs {pxcknobs name}
```

If no knobs object exists, create a new one, see: [Create a database parameter object](./1-cn-variable-load-at-runtime-create-db.md)
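A one-off change can also be applied without opening an editor, for example with a merge patch (a sketch; SLOW_SQL_TIME is one of the CN dynamic parameters):

```shell
kubectl patch pxcknobs {pxcknobs name} --type merge -p '{"spec":{"knobs":{"SLOW_SQL_TIME":2000}}}'
```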
diff --git a/ops/configuration/2-dn-variable.md b/ops/configuration/2-dn-variable.md new file mode 100644 index 0000000..428543b --- /dev/null +++ b/ops/configuration/2-dn-variable.md @@ -0,0 +1,26 @@
## Modify the PXC YAML
Edit the pxc yaml's .spec.config.dn directly and add the relevant mysql parameters, as shown below:

```yaml
# DN configuration
  dn:
    # DN my.cnf settings, overriding the template
    mycnfOverwrite: |-
      loose_binlog_checksum: crc32
    logPurgeInterval: 5m
    logDataSeparation: false
```

Note: if a my.cnf parameter only takes effect after a restart, you need to restart the DN's mysql process manually.

The non-my.cnf settings currently supported are:
- the binlog purge interval: set .spec.config.dn.logPurgeInterval
- whether logs and data are stored separately: set .spec.config.dn.logDataSeparation

## The Set Global Statement
Besides editing the yaml, you can also modify DN parameters through the CN's set global statement. Log in to a CN and run SQL like:

```shell
set ENABLE_SET_GLOBAL = true; -- enable the set global feature
set global {dn parameter};
```
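For example, to turn on deadlock logging on the DNs (a sketch; innodb_print_all_deadlocks is one of the read-write parameters in the DN parameter template below):

```shell
set ENABLE_SET_GLOBAL = true;
set global innodb_print_all_deadlocks = ON;
```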
diff --git a/ops/configuration/3-parameter-template.md b/ops/configuration/3-parameter-template.md new file mode 100644 index 0000000..23e7088 --- /dev/null +++ b/ops/configuration/3-parameter-template.md @@ -0,0 +1,98 @@
## Parameter Templates
PolarDB-X Operator supports parameter templates starting from version 1.3.0.

When an instance is initialized, you can specify a parameter template file that defines a set of template parameters for the CN and DN.

Parameter templates are configured through a yaml file:

```shell
kubectl apply -f {parameter template file name}.yaml
```

### Parameter Template Fields

Note: in the parameter list, each parameter specifies 7 attributes:
- name
  - the parameter's name
- defaultValue
  - the parameter's default value, as a string
- mode
  - the parameter's mode, read-only or read-write
- restart
  - whether the instance must restart after the parameter changes
- unit
  - the parameter's unit, including INT, DOUBLE, STRING, TZ (Time Zone), HOUR_RANGE
- divisibilityFactor
  - parameters with unit INT must set a divisibility factor; other units default to 0
- optional
  - for parameters with unit INT, DOUBLE, or HOUR_RANGE, the allowed values are a range, e.g. "[1000-60000]"
  - for parameters with unit STRING or TZ, the allowed values are a set of options, e.g. "[ON|OFF]"

A sample parameter template:

```yaml
## Parameter template example
apiVersion: polardbx.aliyun.com/v1
kind: PolarDBXParameterTemplate
metadata:
  name: parameterTemplate
spec:
  nodeType:
    cn:
      # parameter list
      paramList:
      - name: CONN_POOL_BLOCK_TIMEOUT
        defaultValue: "5000"
        mode: read-write
        restart: false
        unit: INT
        divisibilityFactor: 1
        optional: "[1000-60000]"
      - ...
    dn:
      name: dnTemplate
      paramList:
      - name: innodb_use_native_aio
        defaultValue: "OFF"
        mode: readonly
        restart: false
        unit: STRING
        divisibilityFactor: 0
        optional: "[ON|OFF]"
      - ...
    gms: ...
```

### Viewing Parameter Templates

By default an instance applies the [8.0 parameter template](./3-parameter-template8.0.yaml) in the default namespace; to create instances in another namespace, create the parameter template object in that namespace.

List all configured parameter templates with:

```shell
kubectl get PolarDBXParameterTemplate
# or use the short name
kubectl get pxpt
```

### PolarDBXCluster Configuration

Once the parameter template is in place, reference it in the yaml that creates the PolarDBXCluster:

```yaml
# add a parameter template
apiVersion: polardbx.aliyun.com/v1
kind: PolarDBXCluster
metadata:
  name: pxc
spec:
  ...
  ...
  # the parameter template to use
  parameterTemplate:
    name: product
```

Note:
- After a parameter template is applied, the instance adjusts the default CN or DN parameters to the template's default values. Parameters in the configmap's my.cnf.overwrite field have higher priority and are not modified by the template.
- Changing the parameter template of an instance that already has one is not supported, nor is adding a template to a running instance. To change parameters of a running instance, use the dynamic parameter feature.
\ No newline at end of file
diff --git a/ops/configuration/3-parameter-template8.0.yaml b/ops/configuration/3-parameter-template8.0.yaml new file mode 100644 index 0000000..e4cd085 --- /dev/null +++ b/ops/configuration/3-parameter-template8.0.yaml @@ -0,0 +1,2884 @@
+apiVersion: polardbx.aliyun.com/v1 +kind: PolarDBXParameterTemplate +metadata: + name: product +spec: + nodeType: + cn: + name: cnTemplate + paramList: + - defaultValue: 05:00 + divisibilityFactor: 0 + mode: readwrite + name: BACKGROUND_STATISTIC_COLLECTION_END_TIME + optional: '[00:00|01:00|02:00|03:00|04:00|05:00|06:00|07:00|08:00|09:00|10:00|11:00|12:00|13:00|14:00|15:00|16:00|17:00|18:00|19:00|20:00|21:00|22:00|23:00]' + restart: false + unit: STRING + - defaultValue: 02:00 + divisibilityFactor: 0 + mode: readwrite + name: BACKGROUND_STATISTIC_COLLECTION_START_TIME + optional: '[00:00|01:00|02:00|03:00|04:00|05:00|06:00|07:00|08:00|09:00|10:00|11:00|12:00|13:00|14:00|15:00|16:00|17:00|18:00|19:00|20:00|21:00|22:00|23:00]' + restart: false + unit: STRING + - defaultValue: '5000' + divisibilityFactor: 1 + mode: readwrite + name: CONN_POOL_BLOCK_TIMEOUT + optional: '[1000-60000]' + restart: false + unit: INT + - defaultValue: '30' + divisibilityFactor: 1 + mode: readwrite + name: CONN_POOL_IDLE_TIMEOUT + optional: '[1-60]' + restart: false + unit: INT + - defaultValue: '60' + divisibilityFactor: 1 + mode: readwrite + name: CONN_POOL_MAX_POOL_SIZE + optional: '[1-1600]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: CONN_POOL_MAX_WAIT_THREAD_COUNT + optional: '[-1-8192]' + restart: false + unit: INT + - defaultValue: '20' + divisibilityFactor: 1 + mode: readwrite + name: CONN_POOL_MIN_POOL_SIZE + optional: '[0-60]' + restart: false + unit: INT + - defaultValue: '512' + divisibilityFactor: 1 + mode: readwrite + name: CONN_POOL_XPROTO_MAX_POOLED_SESSION_PER_INST + optional: '[1-8192]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: CONN_POOL_XPROTO_STORAGE_DB_PORT + optional: '[-1-0]' + restart: false + unit: INT + - defaultValue: 'true' + divisibilityFactor: 0 + mode: readwrite + name: ENABLE_BACKGROUND_STATISTIC_COLLECTION + optional: '[true|false]' + restart: false + unit: STRING + - defaultValue: 'true' + divisibilityFactor: 1 + mode: readwrite + name: ENABLE_COMPLEX_DML_CROSS_DB + optional: '[true|false]' + restart: false + unit: STRING + - defaultValue: 'true' + divisibilityFactor: 0 + mode: readwrite + name: ENABLE_HLL + optional: '[true|false]' + restart: false + unit: STRING + - defaultValue: 'true' + divisibilityFactor: 0 + mode: readwrite + name: ENABLE_LOCAL_MODE + optional: '[true|false]' + restart: false + unit: STRING + - defaultValue: 'true' + divisibilityFactor: 0 + mode: readwrite + name: ENABLE_LOGICALVIEW_COST + optional: '[true|false]' + restart: false + unit: STRING + - defaultValue: 'false' + divisibilityFactor: 1 + mode: readwrite + name: ENABLE_RECYCLEBIN + optional: '[true|false]' + restart: false + unit: STRING + - defaultValue: 'true' + divisibilityFactor: 0 + mode: readwrite + name: ENABLE_SPM + optional: '[true|false]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 1 + mode: readwrite + name: ENABLE_SQL_FLASHBACK_EXACT_MATCH + optional:
'[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'true' + divisibilityFactor: 0 + mode: readwrite + name: ENABLE_STATEMENTS_SUMMARY + optional: '[true|false]' + restart: false + unit: STRING + - defaultValue: 'true' + divisibilityFactor: 0 + mode: readwrite + name: ENABLE_STATISTIC_FEEDBACK + optional: '[true|false]' + restart: false + unit: STRING + - defaultValue: 'true' + divisibilityFactor: 1 + mode: readwrite + name: FORBID_EXECUTE_DML_ALL + optional: '[true|false]' + restart: false + unit: STRING + - defaultValue: '-1' + divisibilityFactor: 1 + mode: readwrite + name: GENERAL_DYNAMIC_SPEED_LIMITATION + optional: '[-1-10000000]' + restart: false + unit: INT + - defaultValue: 'false' + divisibilityFactor: 1 + mode: readwrite + name: INFO_SCHEMA_QUERY_WITH_STAT + optional: '[true|false]' + restart: false + unit: STRING + - defaultValue: '2' + divisibilityFactor: 0 + mode: readwrite + name: IN_SUB_QUERY_THRESHOLD + optional: '[1-65535]' + restart: false + unit: INT + - defaultValue: SYSTEM + divisibilityFactor: 1 + mode: readwrite + name: LOGICAL_DB_TIME_ZONE + optional: '[SYSTEM|±HH:mm]' + restart: false + unit: TZ + - defaultValue: '28800000' + divisibilityFactor: 1 + mode: readwrite + name: LOGIC_IDLE_TIMEOUT + optional: '[3600000-86400000]' + restart: false + unit: INT + - defaultValue: '16777216' + divisibilityFactor: 1 + mode: readwrite + name: MAX_ALLOWED_PACKET + optional: '[4194304-33554432]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: PARALLELISM + optional: '[-1-8]' + restart: false + unit: INT + - defaultValue: '-1' + divisibilityFactor: 1 + mode: readwrite + name: PER_QUERY_MEMORY_LIMIT + optional: '[-1-9223372036854775807]' + restart: false + unit: INT + - defaultValue: 00:00-01:00 + divisibilityFactor: 1 + mode: readwrite + name: PURGE_TRANS_START_TIME + optional: 00:00~23:59 + restart: false + unit: HOUR_RANGE + - defaultValue: '1000' + divisibilityFactor: 1 + mode: readwrite + name: SLOW_SQL_TIME + optional: '[1000-900000]' + restart: false + unit: INT + - defaultValue: '900000' + divisibilityFactor: 1 + mode: readwrite + name: SOCKET_TIMEOUT + optional: '[0-3600000]' + restart: false + unit: INT + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: STATEMENTS_SUMMARY_PERCENT + optional: '[0-100]' + restart: false + unit: INT + - defaultValue: REPEATABLE-READ + divisibilityFactor: 0 + mode: readwrite + name: TRANSACTION_ISOLATION + optional: '[REPEATABLE-READ|READ-COMMITTED|READ-UNCOMMITTED|SERIALIZABLE]' + restart: false + unit: STRING + - defaultValue: '500' + divisibilityFactor: 1 + mode: readwrite + name: XPROTO_MAX_DN_CONCURRENT + optional: '[1-8192]' + restart: false + unit: INT + - defaultValue: '32' + divisibilityFactor: 1 + mode: readwrite + name: XPROTO_MAX_DN_WAIT_CONNECTION + optional: '[1-8192]' + restart: false + unit: INT + - defaultValue: 'false' + divisibilityFactor: 1 + mode: readwrite + name: ENABLE_COROUTINE + optional: '[true|false]' + restart: true + unit: STRING + dn: + name: dnTemplate + paramList: + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: autocommit + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: automatic_sp_privileges + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: auto_increment_increment + optional: '[1-65535]' + restart: false + unit: INT + - defaultValue: '1' + 
divisibilityFactor: 1 + mode: readwrite + name: auto_increment_offset + optional: '[1-65535]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: avoid_temporal_upgrade + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '1048576' + divisibilityFactor: 4096 + mode: readwrite + name: binlog_cache_size + optional: '[4096-16777216]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 1 + mode: readwrite + name: binlog_order_commits + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: binlog_rows_query_log_events + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: full + divisibilityFactor: 0 + mode: readwrite + name: binlog_row_image + optional: '[full|minimal]' + restart: false + unit: STRING + - defaultValue: '32768' + divisibilityFactor: 4096 + mode: readwrite + name: binlog_stmt_cache_size + optional: '[4096-16777216]' + restart: false + unit: INT + - defaultValue: '"aes-128-ecb"' + divisibilityFactor: 1 + mode: readwrite + name: block_encryption_mode + optional: '["aes-128-ecb"|"aes-192-ecb"|"aes-256-ecb"|"aes-128-cbc"|"aes-192-cbc"|"aes-256-cbc"]' + restart: false + unit: STRING + - defaultValue: '4194304' + divisibilityFactor: 1 + mode: readwrite + name: bulk_insert_buffer_size + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: binary + divisibilityFactor: 0 + mode: readwrite + name: character_set_filesystem + optional: '[utf8|latin1|gbk|binary]' + restart: false + unit: STRING + - defaultValue: '1' + divisibilityFactor: 0 + mode: readwrite + name: concurrent_insert + optional: '[0|1|2]' + restart: false + unit: STRING + - defaultValue: '10' + divisibilityFactor: 1 + mode: readwrite + name: connect_timeout + optional: '[1-3600]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: default_week_format + optional: '[0-7]' + restart: false + unit: INT + - defaultValue: '100' + divisibilityFactor: 1 + mode: readwrite + name: delayed_insert_limit + optional: '[1-4294967295]' + restart: false + unit: INT + - defaultValue: '300' + divisibilityFactor: 1 + mode: readwrite + name: delayed_insert_timeout + optional: '[1-3600]' + restart: false + unit: INT + - defaultValue: '1000' + divisibilityFactor: 1 + mode: readwrite + name: delayed_queue_size + optional: '[1-4294967295]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: delay_key_write + optional: '[ON|OFF|ALL]' + restart: false + unit: STRING + - defaultValue: '4' + divisibilityFactor: 1 + mode: readwrite + name: div_precision_increment + optional: '[0-30]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: end_markers_in_json + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '10' + divisibilityFactor: 1 + mode: readwrite + name: eq_range_index_dive_limit + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: event_scheduler + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: explicit_defaults_for_timestamp + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: flush_time + optional: '[0-31536000]' + restart: false + unit: INT + - 
defaultValue: '1024' + divisibilityFactor: 1 + mode: readwrite + name: group_concat_max_len + optional: '[4-1844674407370954752]' + restart: false + unit: INT + - defaultValue: '644' + divisibilityFactor: 1 + mode: readwrite + name: host_cache_size + optional: '[0-65535]' + restart: false + unit: INT + - defaultValue: '''''' + divisibilityFactor: 0 + mode: readwrite + name: init_connect + optional: '[''''|''set names utf8mb4''|''set names utf8''|''set default_collation_for_utf8mb4=utf8mb4_general_ci''|''set + default_collation_for_utf8mb4=utf8mb4_general_ci;set names utf8mb4''|''set + names utf8mb4 collate utf8mb4_general_ci'']' + restart: false + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_adaptive_flushing + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '10' + divisibilityFactor: 1 + mode: readwrite + name: innodb_adaptive_flushing_lwm + optional: '[0-70]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_adaptive_hash_index + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '150000' + divisibilityFactor: 1 + mode: readwrite + name: innodb_adaptive_max_sleep_delay + optional: '[1-1000000]' + restart: false + unit: INT + - defaultValue: '64' + divisibilityFactor: 1 + mode: readwrite + name: innodb_autoextend_increment + optional: '[1-1000]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_buffer_pool_dump_at_shutdown + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '25' + divisibilityFactor: 1 + mode: readwrite + name: innodb_buffer_pool_dump_pct + optional: '[1-100]' + restart: false + unit: INT + - defaultValue: all + divisibilityFactor: 0 + mode: readwrite + name: innodb_change_buffering + optional: '[none|inserts|deletes|changes|purges|all]' + restart: false + unit: STRING + - defaultValue: '25' + divisibilityFactor: 1 + mode: readwrite + name: innodb_change_buffer_max_size + optional: '[0-50]' + restart: false + unit: INT + - defaultValue: crc32 + divisibilityFactor: 0 + mode: readwrite + name: innodb_checksum_algorithm + optional: '[innodb|crc32|none|strict_innodb|strict_crc32|strict_none]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_cmp_per_index_enabled + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '5' + divisibilityFactor: 1 + mode: readwrite + name: innodb_compression_failure_threshold_pct + optional: '[0-100]' + restart: false + unit: INT + - defaultValue: '6' + divisibilityFactor: 1 + mode: readwrite + name: innodb_compression_level + optional: '[0-9]' + restart: false + unit: INT + - defaultValue: '50' + divisibilityFactor: 1 + mode: readwrite + name: innodb_compression_pad_pct_max + optional: '[0-70]' + restart: false + unit: INT + - defaultValue: '5000' + divisibilityFactor: 1 + mode: readwrite + name: innodb_concurrency_tickets + optional: '[1-4294967295]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_data_file_purge + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '100' + divisibilityFactor: 1 + mode: readwrite + name: innodb_data_file_purge_interval + optional: '[0-10000]' + restart: false + unit: INT + - defaultValue: '128' + divisibilityFactor: 1 + mode: readwrite + name: innodb_data_file_purge_max_size + optional: '[16-1073741824]' + restart: false + 
unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_deadlock_detect + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_disable_sort_file_cache + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '1' + divisibilityFactor: 0 + mode: readwrite + name: innodb_flush_log_at_trx_commit + optional: '[0|1|2]' + restart: false + unit: STRING + - defaultValue: '1' + divisibilityFactor: 0 + mode: readwrite + name: innodb_flush_neighbors + optional: '[0|1|2]' + restart: false + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_flush_sync + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_ft_enable_diag_print + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_ft_enable_stopword + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '2000' + divisibilityFactor: 1 + mode: readwrite + name: innodb_ft_num_word_optimize + optional: '[0-10000]' + restart: false + unit: INT + - defaultValue: '2000000000' + divisibilityFactor: 1 + mode: readwrite + name: innodb_ft_result_cache_limit + optional: '[1000000-4294967295]' + restart: false + unit: INT + - defaultValue: '20000' + divisibilityFactor: 1 + mode: readwrite + name: innodb_io_capacity + optional: '[0-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '40000' + divisibilityFactor: 1 + mode: readwrite + name: innodb_io_capacity_max + optional: '[0-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '50' + divisibilityFactor: 1 + mode: readwrite + name: innodb_lock_wait_timeout + optional: '[1-1073741824]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_log_checksums + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_log_compressed_pages + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '{LEAST(DBInstanceClassMemory/1048576/8, 8192)}' + divisibilityFactor: 1 + mode: readwrite + name: innodb_lru_scan_depth + optional: '[100-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '75' + divisibilityFactor: 1 + mode: readwrite + name: innodb_max_dirty_pages_pct + optional: '[0-99]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: innodb_max_dirty_pages_pct_lwm + optional: '[0-99]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: innodb_max_purge_lag + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: innodb_max_purge_lag_delay + optional: '[0-10000000]' + restart: false + unit: INT + - defaultValue: '1073741824' + divisibilityFactor: 1 + mode: readwrite + name: innodb_max_undo_log_size + optional: '[10485760-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '' + divisibilityFactor: 0 + mode: readwrite + name: innodb_monitor_disable + optional: all + restart: false + unit: STRING + - defaultValue: '' + divisibilityFactor: 0 + mode: readwrite + name: innodb_monitor_enable + optional: all + restart: false + unit: STRING + - defaultValue: '37' + divisibilityFactor: 1 + mode: readwrite + 
name: innodb_old_blocks_pct + optional: '[5-95]' + restart: false + unit: INT + - defaultValue: '1000' + divisibilityFactor: 1 + mode: readwrite + name: innodb_old_blocks_time + optional: '[0-1024]' + restart: false + unit: INT + - defaultValue: '134217728' + divisibilityFactor: 1 + mode: readwrite + name: innodb_online_alter_log_max_size + optional: '[134217728-2147483647]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_optimize_fulltext_only + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_print_all_deadlocks + optional: '[OFF|ON]' + restart: false + unit: STRING + - defaultValue: '128' + divisibilityFactor: 1 + mode: readwrite + name: innodb_purge_rseg_truncate_frequency + optional: '[1-128]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_random_read_ahead + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '56' + divisibilityFactor: 1 + mode: readwrite + name: innodb_read_ahead_threshold + optional: '[0-1024]' + restart: false + unit: INT + - defaultValue: '128' + divisibilityFactor: 1 + mode: readwrite + name: innodb_rollback_segments + optional: '[1-128]' + restart: false + unit: INT + - defaultValue: '6' + divisibilityFactor: 1 + mode: readwrite + name: innodb_spin_wait_delay + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_stats_auto_recalc + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: nulls_equal + divisibilityFactor: 0 + mode: readwrite + name: innodb_stats_method + optional: '[nulls_equal|nulls_unequal|nulls_ignored]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_stats_on_metadata + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_stats_persistent + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '20' + divisibilityFactor: 1 + mode: readwrite + name: innodb_stats_persistent_sample_pages + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: '8' + divisibilityFactor: 1 + mode: readwrite + name: innodb_stats_transient_sample_pages + optional: '[1-4294967295]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_status_output + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_status_output_locks + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_strict_mode + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '30' + divisibilityFactor: 1 + mode: readwrite + name: innodb_sync_spin_loops + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_table_locks + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: innodb_thread_concurrency + optional: '[0-1000]' + restart: false + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: innodb_thread_sleep_delay + optional: '[0-1000000]' + restart: false + unit: INT + - defaultValue: 
'7200' + divisibilityFactor: 1 + mode: readwrite + name: interactive_timeout + optional: '[10-86400]' + restart: false + unit: INT + - defaultValue: '{LEAST(DBInstanceClassMemory/1048576*128, 262144)}' + divisibilityFactor: 1 + mode: readwrite + name: join_buffer_size + optional: '[128-4294967295]' + restart: false + unit: INT + - defaultValue: '300' + divisibilityFactor: 100 + mode: readwrite + name: key_cache_age_threshold + optional: '[100-4294967295]' + restart: false + unit: INT + - defaultValue: '1024' + divisibilityFactor: 512 + mode: readwrite + name: key_cache_block_size + optional: '[512-16384]' + restart: false + unit: B + - defaultValue: '100' + divisibilityFactor: 1 + mode: readwrite + name: key_cache_division_limit + optional: '[1-100]' + restart: false + unit: INT + - defaultValue: en_US + divisibilityFactor: 0 + mode: readwrite + name: lc_time_names + optional: '[ja_JP|pt_BR|en_US]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: local_infile + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '31536000' + divisibilityFactor: 1 + mode: readwrite + name: lock_wait_timeout + optional: '[1-1073741824]' + restart: false + unit: INT + - defaultValue: '1' + divisibilityFactor: 0 + mode: readwrite + name: log_bin_use_v1_row_events + optional: '[0|1]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: log_queries_not_using_indexes + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: log_slow_admin_statements + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: log_throttle_queries_not_using_indexes + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: '1' + divisibilityFactor: 6 + mode: readwrite + name: long_query_time + optional: '[0.1-31536000]' + restart: false + unit: DOUBLE + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_force_memory_to_innodb + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_force_myisam_to_innodb + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_ignore_index_hint_error + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: loose_implicit_primary_key + optional: '[0-1]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_innodb_log_compressed_pages + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on + divisibilityFactor: 0 + mode: readwrite + name: loose_optimizer_switch + optional: .* + restart: false + unit: STRING + - defaultValue: enabled=off,one_line=off + divisibilityFactor: 0 + mode: readwrite + name: loose_optimizer_trace + optional: .* + restart: false + unit: STRING + - defaultValue: greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on + 
divisibilityFactor: 0 + mode: readwrite + name: loose_optimizer_trace_features + optional: .* + restart: false + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: loose_performance_agent_enabled + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '100' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_agent_file_size + optional: '[10-1000]' + restart: false + unit: INT + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_agent_interval + optional: '[1-60]' + restart: false + unit: INT + - defaultValue: '100000' + divisibilityFactor: 1 + mode: readwrite + name: loose_rds_audit_log_row_limit + optional: '[0-100000000]' + restart: false + unit: INT + - defaultValue: MYSQL_V1 + divisibilityFactor: 0 + mode: readwrite + name: loose_rds_audit_log_version + optional: '[MYSQL_V1|MYSQL_V3]' + restart: false + unit: STRING + - defaultValue: '2048' + divisibilityFactor: 1 + mode: readwrite + name: loose_rds_audit_max_sql_size + optional: '[0-10000000]' + restart: false + unit: INT + - defaultValue: XA_RECOVER_ADMIN + divisibilityFactor: 0 + mode: readwrite + name: loose_rds_expose_priv_list + optional: .* + restart: false + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: loose_rds_force_myisam_to_innodb + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_rpl_semi_sync_master_enabled + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '1000' + divisibilityFactor: 1 + mode: readwrite + name: loose_rpl_semi_sync_master_timeout + optional: '[0-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: loose_rpl_semi_sync_master_trace_level + optional: '[1|16|32|64]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_rpl_semi_sync_master_wait_no_slave + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: AFTER_SYNC + divisibilityFactor: 0 + mode: readwrite + name: loose_rpl_semi_sync_master_wait_point + optional: '[AFTER_SYNC|AFTER_COMMIT]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 1 + mode: readwrite + name: loose_rpl_semi_sync_slave_enabled + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: loose_rpl_semi_sync_slave_trace_level + optional: '[1|16|32|64]' + restart: false + unit: STRING + - defaultValue: '"*"' + divisibilityFactor: 0 + mode: readwrite + name: loose_session_track_system_variables + optional: .* + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_session_track_transaction_info + optional: '[STATE|CHARACTERISTICS|OFF]' + restart: false + unit: STRING + - defaultValue: '8' + divisibilityFactor: 1 + mode: readwrite + name: loose_slave_parallel_workers + optional: '[0-1024]' + restart: false + unit: INT + - defaultValue: '8' + divisibilityFactor: 1 + mode: readwrite + name: loose_validate_password_length + optional: '[1-12]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 0 + mode: readwrite + name: low_priority_updates + optional: '[0|1]' + restart: false + unit: STRING + - defaultValue: TABLE + divisibilityFactor: 0 + mode: readwrite + name: master_info_repository + optional: '[TABLE|FILE]' + restart: false 
+ unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: master_verify_checksum + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '1073741824' + divisibilityFactor: 1 + mode: readwrite + name: max_allowed_packet + optional: '[16384-1073741824]' + restart: false + unit: INT + - defaultValue: '18446744073709551615' + divisibilityFactor: 1 + mode: readwrite + name: max_binlog_cache_size + optional: '[4096-18446744073709547520]' + restart: false + unit: INT + - defaultValue: '18446744073709551615' + divisibilityFactor: 4096 + mode: readwrite + name: max_binlog_stmt_cache_size + optional: '[4096-18446744073709547520]' + restart: false + unit: INT + - defaultValue: '100' + divisibilityFactor: 1 + mode: readwrite + name: max_connect_errors + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: '64' + divisibilityFactor: 1 + mode: readwrite + name: max_error_count + optional: '[0-65535]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: max_execution_time + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: '67108864' + divisibilityFactor: 1024 + mode: readwrite + name: max_heap_table_size + optional: '[16384-1844674407370954752]' + restart: false + unit: INT + - defaultValue: '18446744073709551615' + divisibilityFactor: 1 + mode: readwrite + name: max_join_size + optional: '[1-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '1024' + divisibilityFactor: 1 + mode: readwrite + name: max_length_for_sort_data + optional: '[0-838860]' + restart: false + unit: INT + - defaultValue: '16382' + divisibilityFactor: 1 + mode: readwrite + name: max_prepared_stmt_count + optional: '[0-1048576]' + restart: false + unit: INT + - defaultValue: '18446744073709551615' + divisibilityFactor: 1 + mode: readwrite + name: max_seeks_for_key + optional: '[1-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '1024' + divisibilityFactor: 1 + mode: readwrite + name: max_sort_length + optional: '[4-8388608]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: max_sp_recursion_depth + optional: '[0-255]' + restart: false + unit: INT + - defaultValue: '102400' + divisibilityFactor: 1 + mode: readwrite + name: max_write_lock_count + optional: '[1-102400]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: min_examined_row_limit + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: '262144' + divisibilityFactor: 1 + mode: readwrite + name: myisam_sort_buffer_size + optional: '[262144-16777216]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: mysql_native_password_proxy_users + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '16384' + divisibilityFactor: 1024 + mode: readwrite + name: net_buffer_length + optional: '[1024-1048576]' + restart: false + unit: INT + - defaultValue: '30' + divisibilityFactor: 1 + mode: readwrite + name: net_read_timeout + optional: '[1-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '10' + divisibilityFactor: 1 + mode: readwrite + name: net_retry_count + optional: '[1-4294967295]' + restart: false + unit: INT + - defaultValue: '60' + divisibilityFactor: 1 + mode: readwrite + name: net_write_timeout + optional: '[1-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '1' + 
divisibilityFactor: 0 + mode: readwrite + name: optimizer_prune_level + optional: '[0|1]' + restart: false + unit: STRING + - defaultValue: '62' + divisibilityFactor: 1 + mode: readwrite + name: optimizer_search_depth + optional: '[0-62]' + restart: false + unit: INT + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: optimizer_trace_limit + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: '16384' + divisibilityFactor: 1 + mode: readwrite + name: optimizer_trace_max_mem_size + optional: '[0-4294967295]' + restart: false + unit: INT + - defaultValue: '-1' + divisibilityFactor: 1 + mode: readwrite + name: optimizer_trace_offset + optional: '[-2147483648-2147483647]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: loose_opt_enable_rds_priv_strategy + optional: 'ON' + restart: false + unit: STRING + - defaultValue: '2' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_point_iostat_interval + optional: '[0-60]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: loose_performance_point_lock_rwlock_enabled + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '32768' + divisibilityFactor: 1 + mode: readwrite + name: preload_buffer_size + optional: '[1024-1073741824]' + restart: false + unit: INT + - defaultValue: '8192' + divisibilityFactor: 1024 + mode: readwrite + name: query_alloc_block_size + optional: '[1024-16384]' + restart: false + unit: INT + - defaultValue: '8192' + divisibilityFactor: 1024 + mode: readwrite + name: query_prealloc_size + optional: '[8192-1048576]' + restart: false + unit: INT + - defaultValue: '4096' + divisibilityFactor: 1 + mode: readwrite + name: range_alloc_block_size + optional: '[4096-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '8388608' + divisibilityFactor: 1 + mode: readwrite + name: range_optimizer_max_mem_size + optional: '[0-18446744073709551615]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: rds_audit_log_enabled + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '{LEAST(DBInstanceClassMemory/1048576*128, 262144)}' + divisibilityFactor: 1 + mode: readwrite + name: read_buffer_size + optional: '[8200-2147479552]' + restart: false + unit: INT + - defaultValue: TABLE + divisibilityFactor: 0 + mode: readwrite + name: relay_log_info_repository + optional: '[TABLE|FILE]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: session_track_gtids + optional: '[OFF|OWN_GTID|ALL_GTIDS]' + restart: false + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: session_track_schema + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: session_track_state_change + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: sha256_password_proxy_users + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: show_old_temporals + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: strict + divisibilityFactor: 0 + mode: readwrite + name: slave_exec_mode + optional: strict + restart: false + unit: STRING + - defaultValue: '60' + divisibilityFactor: 1 + mode: readwrite + name: 
slave_net_timeout + optional: '[15-300]' + restart: false + unit: INT + - defaultValue: '2' + divisibilityFactor: 1 + mode: readwrite + name: slow_launch_time + optional: '[1-1024]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: slow_query_log + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '868352' + divisibilityFactor: 1 + mode: readwrite + name: sort_buffer_size + optional: '[32768-4294967295]' + restart: false + unit: INT + - defaultValue: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION + divisibilityFactor: 0 + mode: readwrite + name: sql_mode + optional: (s*|REAL_AS_FLOAT|PIPES_AS_CONCAT|ANSI_QUOTES|IGNORE_SPACE|ONLY_FULL_GROUP_BY|NO_UNSIGNED_SUBTRACTION|NO_DIR_IN_CREATE|ANSI|NO_AUTO_VALUE_ON_ZERO|NO_BACKSLASH_ESCAPES|STRICT_TRANS_TABLES|STRICT_ALL_TABLES|NO_ZERO_IN_DATE|NO_ZERO_DATE|ALLOW_INVALID_DATES|ERROR_FOR_DIVISION_BY_ZERO|TRADITIONAL|HIGH_NOT_PRECEDENCE|NO_ENGINE_SUBSTITUTION|PAD_CHAR_TO_FULL_LENGTH)(,REAL_AS_FLOAT|,PIPES_AS_CONCAT|,ANSI_QUOTES|,IGNORE_SPACE|,ONLY_FULL_GROUP_BY|,NO_UNSIGNED_SUBTRACTION|,NO_DIR_IN_CREATE|,ANSI|,NO_AUTO_VALUE_ON_ZERO|,NO_BACKSLASH_ESCAPES|,STRICT_TRANS_TABLES|,STRICT_ALL_TABLES|,NO_ZERO_IN_DATE|,NO_ZERO_DATE|,ALLOW_INVALID_DATES|,ERROR_FOR_DIVISION_BY_ZERO|,TRADITIONAL|,HIGH_NOT_PRECEDENCE|,NO_ENGINE_SUBSTITUTION|,PAD_CHAR_TO_FULL_LENGTH)* + restart: false + unit: STRING + - defaultValue: '256' + divisibilityFactor: 1 + mode: readwrite + name: stored_program_cache + optional: '[16-524288]' + restart: false + unit: INT + - defaultValue: '{LEAST(DBInstanceClassMemory/1073741824*512, 2048)}' + divisibilityFactor: 1 + mode: readwrite + name: table_definition_cache + optional: '[400-524288]' + restart: false + unit: INT + - defaultValue: '{LEAST(DBInstanceClassMemory/1073741824*512, 8192)}' + divisibilityFactor: 1 + mode: readwrite + name: table_open_cache + optional: '[1-524288]' + restart: false + unit: INT + - defaultValue: '100' + divisibilityFactor: 1 + mode: readwrite + name: thread_cache_size + optional: '[0-16384]' + restart: false + unit: INT + - defaultValue: '2097152' + divisibilityFactor: 1 + mode: readwrite + name: tmp_table_size + optional: '[262144-134217728]' + restart: false + unit: INT + - defaultValue: '8192' + divisibilityFactor: 1024 + mode: readwrite + name: transaction_alloc_block_size + optional: '[1024-131072]' + restart: false + unit: INT + - defaultValue: READ-COMMITTED + divisibilityFactor: 0 + mode: readwrite + name: transaction_isolation + optional: '[READ-UNCOMMITTED|READ-COMMITTED|REPEATABLE-READ|SERIALIZABLE]' + restart: false + unit: STRING + - defaultValue: '4096' + divisibilityFactor: 1024 + mode: readwrite + name: transaction_prealloc_size + optional: '[1024-131072]' + restart: false + unit: INT + - defaultValue: 'YES' + divisibilityFactor: 0 + mode: readwrite + name: updatable_views_with_limit + optional: '[YES|NO]' + restart: false + unit: STRING + - defaultValue: '86400' + divisibilityFactor: 1 + mode: readwrite + name: wait_timeout + optional: '[1-31536000]' + restart: false + unit: INT + - defaultValue: WRITESET + divisibilityFactor: 1 + mode: readwrite + name: binlog_transaction_dependency_tracking + optional: '[WRITESET|WRITESET_SESSION|COMMIT_ORDER]' + restart: false + unit: STRING + - defaultValue: '2' + divisibilityFactor: 1 + mode: readwrite + name: log_error_verbosity + optional: '[1-3]' + restart: false + unit: INT + - defaultValue: '100' + 
divisibilityFactor: 1 + mode: readwrite + name: loose_binlog_group_delay + optional: '[0-1000000000]' + restart: false + unit: INT + - defaultValue: '100' + divisibilityFactor: 1 + mode: readwrite + name: loose_binlog_group_delay_running_threads + optional: '[0-100000]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_ccl_max_waiting_count + optional: '[0-9223372036854775807]' + restart: false + unit: INT + - defaultValue: '4' + divisibilityFactor: 1 + mode: readwrite + name: loose_ccl_queue_bucket_count + optional: '[1-64]' + restart: false + unit: INT + - defaultValue: '64' + divisibilityFactor: 1 + mode: readwrite + name: loose_ccl_queue_bucket_size + optional: '[1-4096]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_ccl_queue_hot_delete + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_ccl_queue_hot_update + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '86400' + divisibilityFactor: 1 + mode: readwrite + name: loose_ccl_wait_timeout + optional: '[1-31536000]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_clear_log_file_pagecache + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '2048' + divisibilityFactor: 1 + mode: readwrite + name: loose_crash_sql_stmt_max_length + optional: '[1-1000000000]' + restart: false + unit: INT + - defaultValue: '86400' + divisibilityFactor: 1 + mode: readwrite + name: loose_information_schema_stats_expiry + optional: '[0-31536000]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_innodb_buffer_pool_in_core_file + optional: '[OFF|ON]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_innodb_log_optimize_ddl + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '4096' + divisibilityFactor: 1 + mode: readwrite + name: loose_innodb_log_write_ahead_size + optional: '[512-16384]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: loose_innodb_multi_blocks_enabled + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: loose_innodb_parallel_read_threads + optional: '[0-256]' + restart: false + unit: INT + - defaultValue: '100' + divisibilityFactor: 1 + mode: readwrite + name: loose_innodb_rds_chunk_flush_interval + optional: '[0-100000]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 1 + mode: readwrite + name: loose_innodb_rds_faster_ddl + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '30' + divisibilityFactor: 1 + mode: readwrite + name: loose_innodb_rds_flashback_allow_gap + optional: '[0-10080]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: loose_innodb_rds_flashback_enabled + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: loose_innodb_rds_flashback_interval + optional: '[1-10]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: loose_innodb_rds_flashback_print_warning + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + 
divisibilityFactor: 0 + mode: readwrite + name: loose_innodb_rds_flashback_task_enabled + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 1 + mode: readwrite + name: loose_innodb_rds_free_resize + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_innodb_trx_resurrect_table_lock_accelerate + optional: '[OFF|ON]' + restart: false + unit: STRING + - defaultValue: '300' + divisibilityFactor: 1 + mode: readwrite + name: loose_innodb_undo_retention + optional: '[0-172800]' + restart: false + unit: INT + - defaultValue: '1024' + divisibilityFactor: 1 + mode: readwrite + name: loose_innodb_undo_space_reserved_size + optional: '[0-20480]' + restart: false + unit: INT + - defaultValue: '10240' + divisibilityFactor: 1 + mode: readwrite + name: loose_innodb_undo_space_supremum_size + optional: '[0-524288]' + restart: false + unit: INT + - defaultValue: MEMORY + divisibilityFactor: 0 + mode: readwrite + name: loose_internal_tmp_mem_storage_engine + optional: '[TempTable|MEMORY]' + restart: false + unit: STRING + - defaultValue: '10' + divisibilityFactor: 1 + mode: readwrite + name: loose_keyring_rds_command_timeout_sec + optional: '[1-999999]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_multi_blocks_count + optional: '[0-1024]' + restart: false + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_multi_blocks_ddl_count + optional: '[0-1024]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 1 + mode: readwrite + name: loose_persist_binlog_to_redo + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '1048576' + divisibilityFactor: 1 + mode: readwrite + name: loose_persist_binlog_to_redo_size_limit + optional: '[0-10485760]' + restart: false + unit: STRING + - defaultValue: '16777216' + divisibilityFactor: 1 + mode: readwrite + name: loose_rds_audit_log_buffer_size + optional: '[16777216-104857600]' + restart: false + unit: INT + - defaultValue: '8192' + divisibilityFactor: 1 + mode: readwrite + name: loose_rds_audit_log_event_buffer_size + optional: '[0-32768]' + restart: false + unit: INT + - defaultValue: '' + divisibilityFactor: 0 + mode: readwrite + name: loose_rds_data_protect_admin + optional: .* + restart: false + unit: STRING + - defaultValue: '' + divisibilityFactor: 0 + mode: readwrite + name: loose_rds_data_protect_ignore + optional: .* + restart: false + unit: STRING + - defaultValue: NONE + divisibilityFactor: 0 + mode: readwrite + name: loose_rds_data_protect_level + optional: '[ALL|DDL|NONE]' + restart: false + unit: STRING + - defaultValue: aurora,Xtrabak,replicator,eagleye,aliyun_root + divisibilityFactor: 0 + mode: readwrite + name: loose_rds_ignore_password_validation_user_list + optional: .* + restart: false + unit: STRING + - defaultValue: '' + divisibilityFactor: 0 + mode: readwrite + name: loose_rds_kill_user_list + optional: .* + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_recycle_bin + optional: '[OFF|ON]' + restart: false + unit: STRING + - defaultValue: '604800' + divisibilityFactor: 86400 + mode: readwrite + name: loose_recycle_bin_retention + optional: '[86400-1209600]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_recycle_scheduler + optional: '[OFF|ON]' + restart: false + 
unit: STRING + - defaultValue: '30' + divisibilityFactor: 30 + mode: readwrite + name: loose_recycle_scheduler_interval + optional: '[30-120]' + restart: false + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_recycle_scheduler_purge_table_print + optional: '[OFF|ON]' + restart: false + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_sql_safe_updates + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 1 + mode: readwrite + name: loose_thread_pool_enabled + optional: '[ON|OFF]' + restart: false + unit: STRING + - defaultValue: '32' + divisibilityFactor: 1 + mode: readwrite + name: loose_thread_pool_oversubscribe + optional: '[10-64]' + restart: false + unit: INT + - defaultValue: '4' + divisibilityFactor: 1 + mode: readwrite + name: loose_thread_pool_size + optional: '[1-64]' + restart: false + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: loose_wait_binlog_flush + optional: '[OFF|ON]' + restart: false + unit: STRING + - defaultValue: '65536' + divisibilityFactor: 1 + mode: readwrite + name: max_points_in_geometry + optional: '[3-1048576]' + restart: false + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: sync_master_info + optional: '[0-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: sync_relay_log_info + optional: '[0-18446744073709551615]' + restart: false + unit: INT + - defaultValue: '1073741824' + divisibilityFactor: 1 + mode: readwrite + name: temptable_max_ram + optional: '[2097152-107374182400]' + restart: false + unit: INT + - defaultValue: XXHASH64 + divisibilityFactor: 1 + mode: readwrite + name: transaction_write_set_extraction + optional: '[OFF|MURMUR32|XXHASH64]' + restart: false + unit: STRING + - defaultValue: '3000' + divisibilityFactor: 1 + mode: readwrite + name: back_log + optional: '[0-65535]' + restart: true + unit: INT + - defaultValue: CRC32 + divisibilityFactor: 0 + mode: readwrite + name: binlog_checksum + optional: '[CRC32|NONE]' + restart: true + unit: STRING + - defaultValue: utf8 + divisibilityFactor: 0 + mode: readwrite + name: character_set_server + optional: '[utf8|latin1|gbk|gb18030|utf8mb4]' + restart: true + unit: STRING + - defaultValue: InnoDB + divisibilityFactor: 0 + mode: readwrite + name: default_storage_engine + optional: '[InnoDB|innodb]' + restart: true + unit: STRING + - defaultValue: SYSTEM + divisibilityFactor: 0 + mode: readwrite + name: default_time_zone + optional: '[SYSTEM|-12:00|-11:00|-10:00|-9:00|-8:00|-7:00|-6:00|-5:00|-4:00|-3:00|-2:00|-1:00|\+0:00|\+1:00|\+2:00|\+3:00|\+4:00|\+5:00|\+5:30|\+5:45|\+6:00|\+6:30|\+7:00|\+8:00|\+9:00|\+10:00|\+11:00|\+12:00|\+13:00]' + restart: true + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: disconnect_on_expired_password + optional: '[ON|OFF]' + restart: true + unit: STRING + - defaultValue: '84' + divisibilityFactor: 1 + mode: readwrite + name: ft_max_word_len + optional: '[10-4294967295]' + restart: true + unit: INT + - defaultValue: '4' + divisibilityFactor: 1 + mode: readwrite + name: ft_min_word_len + optional: '[1-3600]' + restart: true + unit: INT + - defaultValue: '20' + divisibilityFactor: 1 + mode: readwrite + name: ft_query_expansion_limit + optional: '[0-1000]' + restart: true + unit: INT + - defaultValue: '2' + divisibilityFactor: 0 + mode: readwrite + name: 
innodb_autoinc_lock_mode + optional: '[0|1|2]' + restart: true + unit: STRING + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: innodb_buffer_pool_instances + optional: '[1-64]' + restart: true + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: innodb_buffer_pool_load_at_startup + optional: '[ON|OFF]' + restart: true + unit: STRING + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: innodb_commit_concurrency + optional: '[0-1000]' + restart: true + unit: INT + - defaultValue: O_DIRECT + divisibilityFactor: 0 + mode: readwrite + name: innodb_flush_method + optional: '[fsync|O_DSYNC|littlesync|nosync|O_DIRECT|O_DIRECT_NO_FSYNC]' + restart: true + unit: STRING + - defaultValue: '8000000' + divisibilityFactor: 1 + mode: readwrite + name: innodb_ft_cache_size + optional: '[1600000-80000000]' + restart: true + unit: INT + - defaultValue: '84' + divisibilityFactor: 1 + mode: readwrite + name: innodb_ft_max_token_size + optional: '[10-84]' + restart: true + unit: INT + - defaultValue: '3' + divisibilityFactor: 1 + mode: readwrite + name: innodb_ft_min_token_size + optional: '[0-16]' + restart: true + unit: INT + - defaultValue: '2' + divisibilityFactor: 1 + mode: readwrite + name: innodb_ft_sort_pll_degree + optional: '[1-16]' + restart: true + unit: INT + - defaultValue: '640000000' + divisibilityFactor: 1 + mode: readwrite + name: innodb_ft_total_cache_size + optional: '[32000000-1600000000]' + restart: true + unit: INT + - defaultValue: '104857600' + divisibilityFactor: 1024 + mode: readwrite + name: innodb_log_file_size + optional: '[4194304-107374182400]' + restart: true + unit: INT + - defaultValue: '3000' + divisibilityFactor: 1 + mode: readwrite + name: innodb_open_files + optional: '[10-2147483647]' + restart: true + unit: INT + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: innodb_page_cleaners + optional: '[1-64]' + restart: true + unit: INT + - defaultValue: '300' + divisibilityFactor: 1 + mode: readwrite + name: innodb_purge_batch_size + optional: '[1-5000]' + restart: true + unit: INT + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: innodb_purge_threads + optional: '[1-32]' + restart: true + unit: INT + - defaultValue: '4' + divisibilityFactor: 1 + mode: readwrite + name: innodb_read_io_threads + optional: '[1-64]' + restart: true + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: innodb_rollback_on_timeout + optional: '[OFF|ON]' + restart: true + unit: STRING + - defaultValue: '1048576' + divisibilityFactor: 512 + mode: readwrite + name: innodb_sort_buffer_size + optional: '[65536-67108864]' + restart: true + unit: INT + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: innodb_sync_array_size + optional: '[1-64]' + restart: true + unit: INT + - defaultValue: '4' + divisibilityFactor: 1 + mode: readwrite + name: innodb_write_io_threads + optional: '[1-64]' + restart: true + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_auto_detect_certs + optional: '[ON|OFF]' + restart: true + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: loose_innodb_numa_interleave + optional: '[ON|OFF]' + restart: true + unit: STRING + - defaultValue: /usr/bin/kms_agent + divisibilityFactor: 0 + mode: readwrite + name: loose_keyring_rds_kms_agent_cmd + optional: .* + restart: true + unit: STRING + - defaultValue: '2' + divisibilityFactor: 1 + mode: readwrite 
+ name: ngram_token_size + optional: '[0-20]' + restart: true + unit: int + - defaultValue: '65535' + divisibilityFactor: 1 + mode: readwrite + name: open_files_limit + optional: '[1-2147483647]' + restart: true + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: opt_indexstat + optional: '[ON|OFF]' + restart: true + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: opt_tablestat + optional: '[ON|OFF]' + restart: true + unit: STRING + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_point_iostat_volume_size + optional: '[0-100000]' + restart: true + unit: INT + - defaultValue: '{LEAST(DBInstanceClassMemory/8589934592, 1)}' + divisibilityFactor: 1 + mode: readwrite + name: performance_schema + optional: '[0-1]' + restart: true + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: relay_log_recovery + optional: '[ON|OFF]' + restart: true + unit: STRING + - defaultValue: LOGICAL_CLOCK + divisibilityFactor: 0 + mode: readwrite + name: slave_parallel_type + optional: '[DATABASE|LOGICAL_CLOCK]' + restart: true + unit: STRING + - defaultValue: '16' + divisibilityFactor: 1 + mode: readwrite + name: table_open_cache_instances + optional: '[1-64]' + restart: true + unit: INT + - defaultValue: '262144' + divisibilityFactor: 1024 + mode: readwrite + name: thread_stack + optional: '[131072-2147483647]' + restart: true + unit: INT + - defaultValue: TLSv1,TLSv1.1,TLSv1.2 + divisibilityFactor: 0 + mode: readwrite + name: tls_version + optional: '[TLSv1,TLSv1.1,TLSv1.2|TLSv1,TLSv1.1|TLSv1.2]' + restart: true + unit: STRING + - defaultValue: mysql_native_password + divisibilityFactor: 0 + mode: readwrite + name: default_authentication_plugin + optional: '[mysql_native_password|sha256_password|caching_sha2_password]' + restart: true + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: general_log + optional: 'OFF' + restart: true + unit: STRING + - defaultValue: '33554432' + divisibilityFactor: 1048576 + mode: readwrite + name: innodb_buffer_pool_chunk_size + optional: '[1048576-9223372036854775807]' + restart: true + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: loose_async_binlog_recovery + optional: '[OFF|ON]' + restart: true + unit: STRING + - defaultValue: '20971520' + divisibilityFactor: 1 + mode: readwrite + name: loose_binlog_buffer_size + optional: '[20971520-1073741824]' + restart: true + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: loose_innodb_buffer_pool_init_optimize + optional: '[OFF|ON]' + restart: true + unit: STRING + - defaultValue: '64' + divisibilityFactor: 1 + mode: readwrite + name: loose_innodb_doublewrite_pages + optional: '[0-512]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_accounts_size + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_digests_size + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_error_size + optional: '[0-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_events_stages_history_long_size + optional: '[-1-1048576]' + restart: true + unit: INT 
+ - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_events_stages_history_size + optional: '[-1-1024]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_events_statements_history_long_size + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_events_statements_history_size + optional: '[-1-1024]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_events_transactions_history_long_size + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_events_transactions_history_size + optional: '[-1-1024]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_events_waits_history_long_size + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_events_waits_history_size + optional: '[-1-1024]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_hosts_size + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_cond_classes + optional: '[0-256]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_cond_instances + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_digest_length + optional: '[0-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_digest_sample_age + optional: '[0-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_file_classes + optional: '[0-256]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_file_handles + optional: '[-1-32768]' + restart: true + unit: INT + - defaultValue: '1000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_file_instances + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_index_stat + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_memory_classes + optional: '[0-1024]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_metadata_locks + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_mutex_classes + optional: '[0-256]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_mutex_instances + optional: '[-1-104857600]' + restart: true + unit: INT + - defaultValue: '1000' + divisibilityFactor: 1 + mode: readwrite + 
name: loose_performance_schema_max_prepared_statements_instances + optional: '[-1-4194304]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_program_instances + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_rwlock_classes + optional: '[0-256]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_rwlock_instances + optional: '[-1-104857600]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_socket_classes + optional: '[0-256]' + restart: true + unit: INT + - defaultValue: '1000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_socket_instances + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_sql_text_length + optional: '[0-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_stage_classes + optional: '[0-256]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_statement_classes + optional: '[0-256]' + restart: true + unit: INT + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_statement_stack + optional: '[0-256]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_table_handles + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '1000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_table_instances + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_table_lock_stat + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_thread_classes + optional: '[0-256]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_max_thread_instances + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_session_connect_attrs_size + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_setup_actors_size + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_setup_objects_size + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: '10000' + divisibilityFactor: 1 + mode: readwrite + name: loose_performance_schema_users_size + optional: '[-1-1048576]' + restart: true + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_recovery_apply_binlog + optional: '[ON|OFF]' + restart: true + unit: STRING + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_xengine + optional: '[1|0]' + restart: true + unit: STRING + - defaultValue: '' + 
divisibilityFactor: 0 + mode: readwrite + name: slave_type_conversions + optional: '[s*|ALL_LOSSY|ALL_NON_LOSSY|ALL_SIGNED|ALL_UNSIGNED]' + restart: true + unit: STRING + - defaultValue: '4' + divisibilityFactor: 1 + mode: readwrite + name: consensus_io_thread_cnt + optional: .* + restart: true + unit: INT + - defaultValue: '4' + divisibilityFactor: 1 + mode: readwrite + name: consensus_worker_thread_cnt + optional: .* + restart: true + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: enforce_gtid_consistency + optional: .* + restart: true + unit: STRING + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: expire_logs_days + optional: .* + restart: true + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: gtid_mode + optional: .* + restart: true + unit: STRING + - defaultValue: '8388608' + divisibilityFactor: 1 + mode: readwrite + name: innodb_log_buffer_size + optional: .* + restart: true + unit: INT + - defaultValue: '16777216' + divisibilityFactor: 1 + mode: readwrite + name: key_buffer_size + optional: .* + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: loose_kill_idle_transaction_timeout + optional: .* + restart: true + unit: INT + - defaultValue: '1' + divisibilityFactor: 0 + mode: readwrite + name: log_slave_updates + optional: .* + restart: true + unit: INT + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: loose_binlog_order_commits + optional: .* + restart: true + unit: STRING + - defaultValue: '500' + divisibilityFactor: 1 + mode: readwrite + name: loose_rds_reserved_connections + optional: .* + restart: true + unit: INT + - defaultValue: '"one-thread-per-connection"' + divisibilityFactor: 0 + mode: readwrite + name: loose_thread_handling + optional: .* + restart: true + unit: STRING + - defaultValue: '2520' + divisibilityFactor: 1 + mode: readwrite + name: max_connections + optional: .* + restart: true + unit: INT + - defaultValue: '2000' + divisibilityFactor: 1 + mode: readwrite + name: max_user_connections + optional: .* + restart: true + unit: INT + - defaultValue: '442368' + divisibilityFactor: 1 + mode: readwrite + name: read_rnd_buffer_size + optional: .* + restart: true + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: relay_log_purge + optional: .* + restart: true + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: replicate-same-server-id + optional: .* + restart: true + unit: STRING + - defaultValue: '' + divisibilityFactor: 0 + mode: readwrite + name: rotate_log_table_last_name + optional: .* + restart: true + unit: STRING + - defaultValue: 'OFF' + divisibilityFactor: 0 + mode: readwrite + name: skip_slave_start + optional: .* + restart: true + unit: STRING + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: skip_ssl + optional: .* + restart: true + unit: STRING + - defaultValue: '1073741824' + divisibilityFactor: 1 + mode: readwrite + name: slave_pending_jobs_size_max + optional: .* + restart: true + unit: INT + - defaultValue: 'ON' + divisibilityFactor: 0 + mode: readwrite + name: slave_sql_verify_checksum + optional: .* + restart: true + unit: STRING + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: sync_binlog + optional: .* + restart: true + unit: INT + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: sync_relay_log + optional: .* + restart: true + unit: INT + - 
defaultValue: global + divisibilityFactor: 0 + mode: readwrite + name: innodb_tcn_cache_level + optional: .* + restart: true + unit: STRING + - defaultValue: '1' + divisibilityFactor: 1 + mode: readwrite + name: innodb_snapshot_update_gcn + optional: .* + restart: true + unit: INT + - defaultValue: '0' + divisibilityFactor: 1 + mode: readwrite + name: innodb_equal_gcn_visible + optional: .* + restart: true + unit: INT
\ No newline at end of file
diff --git a/ops/configuration/4-dynamic-parameter.md b/ops/configuration/4-dynamic-parameter.md
new file mode 100644
index 0000000..2093dc9
--- /dev/null
+++ b/ops/configuration/4-dynamic-parameter.md
@@ -0,0 +1,66 @@
+## Dynamic Parameters
+PolarDB-X Operator supports dynamic parameters starting from version 1.3.0.
+
+While an instance is running, you can modify CN and DN parameters by applying a dynamic parameter file.
+
+Dynamic parameters are configured as a YAML file:
+
+```shell
+kubectl apply -f {dynamic parameter file name}.yaml
+```
+
+### About Dynamic Parameters
+
+A dynamic parameter object must reference a base parameter template and an instance name; validation fails if the name does not exist.
+In addition, the parameters must pass the attribute checks defined in the parameter template, otherwise validation fails as well.
+
+Note: since some parameters require an instance restart after modification, a restart type must be specified, either a direct restart (restart) or a rolling restart (rollingRestart). Currently the DN only supports rolling restart.
+
+In the parameter list, each parameter specifies two attributes:
+- name
+  - the parameter name
+- value
+  - the parameter value, as a string
+
+A sample dynamic parameter object looks like this:
+
+```yaml
+# Add dynamic parameters
+apiVersion: polardbx.aliyun.com/v1
+kind: PolarDBXParameter
+metadata:
+  name: test-param
+  labels:
+    parameter: dynamic
+spec:
+  # Instance name
+  clusterName: pxc
+  # Parameter template name
+  templateName: product
+  nodeType:
+    cn:
+      name: cn-parameter
+      # Restart type
+      restartType: rollingRestart
+      # Parameter list
+      paramList:
+      - name: CONN_POOL_MAX_POOL_SIZE
+        value: "1000"
+    dn:
+      name: dn-parameter
+      restartType: rollingRestart
+      paramList:
+      - name: autocommit
+        value: "OFF"
+      - ...
+```
+
+### Viewing Dynamic Parameters
+
+All configured dynamic parameters can be listed with the following commands:
+
+```shell
+kubectl get PolarDBXParameter
+# or use the short name
+kubectl get pxp
+```
\ No newline at end of file
diff --git a/ops/configuration/README.md b/ops/configuration/README.md
new file mode 100644
index 0000000..ec5bff3
--- /dev/null
+++ b/ops/configuration/README.md
@@ -0,0 +1,23 @@
+Parameter Settings
+===========
+
+### CN Static Parameters
+
+[CN Static Parameters](./1-cn-variable-at-startup.md)
+
+### CN Dynamic Parameters
+1. [Create a database parameter object](./1-cn-variable-load-at-runtime-create-db.md)
+
+2. [Modify database parameters](./1-cn-variable-load-at-runtime-update-db.md)
+
+3. [Delete a database parameter object](./1-cn-variable-load-at-runtime-delete-db.md)
+
+### DN Parameters
+
+[DN Parameters](./2-dn-variable.md)
+
+### Parameter Templates
+[Parameter Templates](./3-parameter-template.md)
+
+### Dynamic Parameters
+[Dynamic Parameters](./4-dynamic-parameter.md)
\ No newline at end of file
diff --git a/ops/connection/1-account.md b/ops/connection/1-account.md
new file mode 100644
index 0000000..c24ace5
--- /dev/null
+++ b/ops/connection/1-account.md
@@ -0,0 +1,13 @@
+The default root account of PolarDB-X is polardbx_root. After logging in, you can change the password or create new accounts for your applications with the [account management statements](https://help.aliyun.com/document_detail/313296.html).
+
+The password of the polardbx_root account is randomly generated. Run the following command to retrieve it:
+
+```shell
+kubectl get secret {PolarDB-X cluster name} -o jsonpath="{.data['polardbx_root']}" | base64 -d - | xargs echo "Password: "
+```
+
+Expected output:
+
+```shell
+Password: *******
+```
diff --git a/ops/connection/2-connect-in-cluster.md b/ops/connection/2-connect-in-cluster.md
new file mode 100644
index 0000000..f45d99e
--- /dev/null
+++ b/ops/connection/2-connect-in-cluster.md
@@ -0,0 +1,29 @@
+## Access from Pods other than CN
+If you access PolarDB-X from a pod inside the K8s cluster, you can connect directly through the cluster IP. When a PolarDB-X cluster is created, PolarDB-X Operator also creates a service for access, of type ClusterIP by default. Use the following command to view this service:
+
+```shell
+$ kubectl get svc {PolarDB-X cluster name}
+```
+
+Expected output:
+
+```shell
+NAME          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
+quick-start   ClusterIP   10.110.214.223   <none>        3306/TCP,8081/TCP   5m25s
+```
+
+If you are accessing from within the K8s cluster, you can use the Cluster-IP shown above directly. PolarDB-X services listen on port 3306 by default.
+
+> ClusterIP exposes the service on an internal IP of the K8s cluster; with this access method, the service can only be reached from inside the cluster.
+
+Run the following command and enter the password obtained earlier to connect to PolarDB-X:
+
+```shell
+mysql -h10.110.214.223 -P3306 -upolardbx_root -p
+```
+
+> **Note:**
+> - The **-P** here is uppercase; the default port is 3306.
+> - To keep the password safe, do not put it after **-p**; you will be prompted for it after running the whole command. Type it and press Enter to log in.
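+
+For scripted access, the Cluster-IP can also be looked up with jsonpath instead of being copied by hand. A minimal sketch, assuming the cluster is named polardbx-test (`.spec.clusterIP` is the standard Service field):
+
+```shell
+# Look up the service's ClusterIP and connect to it
+CLUSTER_IP=$(kubectl get svc polardbx-test -o jsonpath='{.spec.clusterIP}')
+mysql -h"$CLUSTER_IP" -P3306 -upolardbx_root -p
+```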
+
+## Access from inside a CN Pod
+Inside a CN pod, simply type the myc command to log in.
diff --git a/ops/connection/3-connect-outside-cluster.md b/ops/connection/3-connect-outside-cluster.md
new file mode 100644
index 0000000..257818b
--- /dev/null
+++ b/ops/connection/3-connect-outside-cluster.md
@@ -0,0 +1,36 @@
+### Access Locally via port-forward
+If you want to access the PolarDB-X database from outside the K8s cluster but have not configured a LoadBalancer, you can forward port 3306 of the service to your local machine with the following command, keeping the forwarding process alive:
+
+```shell
+kubectl port-forward svc/{PolarDB-X cluster name} 3306
+```
+
+> If port 3306 on your machine is occupied, forward the service to another port with: kubectl port-forward svc/{PolarDB-X cluster name} {new port}:3306
+
+Open a new terminal and connect to PolarDB-X with:
+
+```shell
+mysql -h127.0.0.1 -P{forwarded port} -upolardbx_root -p
+```
+
+> **Note:**
+> - The **-P** here is uppercase; the default port is 3306.
+> - To keep the password safe, do not put it after **-p**; you will be prompted for it after running the whole command. Type it and press Enter to log in.
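+
+For a quick test it can be convenient to keep the forward running in the background. A sketch, assuming the cluster is named polardbx-test and local port 3307 is free:
+
+```shell
+# Run the port-forward in the background and keep its PID so it can be stopped later
+kubectl port-forward svc/polardbx-test 3307:3306 >/dev/null 2>&1 &
+PF_PID=$!
+mysql -h127.0.0.1 -P3307 -upolardbx_root -p
+kill "$PF_PID"  # stop forwarding when done
+```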
+
+### Access via NodePort
+If you specified [serviceType: LoadBalancer](../../api/polardbxcluster.md) when creating the PolarDB-X cluster, you can also access it directly through a NodePort.
+
+Get the nodePort with the following command:
+
+```shell
+kubectl get svc -l polardbx/name={cluster name},polardbx/cn-type=rw -o jsonpath="{.items[0].spec.ports[0].nodePort}" | xargs echo "NodePort:"
+```
+
+Get the list of host IPs with the following command:
+
+```shell
+kubectl get pods -l polardbx/name={cluster name},polardbx/role=cn -o jsonpath="{range .items[*]}{.status.hostIP}{'\n'}{end}"
+```
+
+Any IP from the list above combined with the NodePort can be used to access PolarDB-X:
+![image.png](./connect-to-polardb-x.png)
diff --git a/ops/connection/4-connect-from-internet.md b/ops/connection/4-connect-from-internet.md
new file mode 100644
index 0000000..09fac97
--- /dev/null
+++ b/ops/connection/4-connect-from-internet.md
@@ -0,0 +1,25 @@
+### Access via LoadBalancer
+When running in an environment that provides a LoadBalancer, such as the Alibaba Cloud platform, it is recommended to use the platform's LoadBalancer feature. Set `.spec.serviceType` to LoadBalancer when creating the PolarDB-X cluster, and the operator will automatically create a Service of type LoadBalancer; where the cloud platform supports it, Kubernetes then provisions an external IP for the service automatically, as shown below:
+
+```bash
+NAME        TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
+xxxxxxxxx   LoadBalancer   192.168.247.39   8.209.29.16   3306:30612/TCP,8081:30370/TCP   28h
+```
+
+You can then access the cluster through the IP shown under EXTERNAL-IP:
+
+```bash
+mysql -h8.209.29.16 -P3306 -upolardbx_root -p
+```
+### Access via a Machine's Public IP
+If some machines in your K8s cluster have public IP addresses, you can map the instance's access port onto a machine with a public IP via port-forward.
+
+Run the following command on the machine with a public IP to forward the port:
+
+```shell
+kubectl port-forward svc/{PolarDB-X cluster name} 3306 --address=0.0.0.0
+```
+
+Note: --address=0.0.0.0 must be included to allow access from external IPs.
+
+Configure the machine's security group or firewall to allow external machines to access port 3306. Afterwards, access the database through the machine's public IP and port 3306.
diff --git a/ops/connection/README.md b/ops/connection/README.md
new file mode 100644
index 0000000..f2355a4
--- /dev/null
+++ b/ops/connection/README.md
@@ -0,0 +1,10 @@
+Connecting to a PolarDB-X Database
+===================
+
+[Get the username and password](./1-account.md)
+
+[Connect from inside the K8S cluster](./2-connect-in-cluster.md)
+
+[Connect from outside the K8S cluster](./3-connect-outside-cluster.md)
+
+[Connect from the Internet](./4-connect-from-internet.md)
\ No newline at end of file
diff --git a/ops/connection/connect-to-polardb-x.png b/ops/connection/connect-to-polardb-x.png
new file mode 100644
index 0000000..f585f2d
Binary files /dev/null and b/ops/connection/connect-to-polardb-x.png differ
diff --git a/ops/lifecycle/1-create-cdc-node-example.md b/ops/lifecycle/1-create-cdc-node-example.md
new file mode 100644
index 0000000..90883ba
--- /dev/null
+++ b/ops/lifecycle/1-create-cdc-node-example.md
@@ -0,0 +1,109 @@
+# Creating CDC Nodes
+The PolarDB-X CDC component is built into PolarDB-X instances; to try out the PolarDB-X CDC features, you need to bring up a PolarDB-X cluster.
+## Global Binlog
+* Deploy via PXD: see [Deploy a Cluster with PXD](https://doc.polardbx.com/quickstart/topics/quickstart-pxd-cluster.html); you can edit the CDC-related fields in the topology file to specify the CDC cluster's configuration.
+  * `image`: the image of the CDC nodes
+  * `replica`: the number of CDC nodes
+  * `nodes`: the concrete configuration of each CDC node
+  * `resources`: the memory and other resources allocated to the CDC nodes
+* Deploy via K8S: see [Deploy with K8S](https://doc.polardbx.com/quickstart/topics/quickstart-k8s.html); by default one CDC node is created, responsible for generating the global Binlog.
+## Binlog Multi-stream
+Binlog multi-stream currently only supports deployment via K8S. Before deploying, prepare the `minikube` and `PolarDB-X Operator` environments; see [Preparations](https://doc.polardbx.com/operator/deployment/1-installation.html) for how to set them up.
+
+Next, we need to prepare a YAML file describing the PolarDB-X cluster, for example:
+```yaml
+apiVersion: polardbx.aliyun.com/v1
+kind: PolarDBXCluster
+metadata:
+  name: polardbx-test
+spec:
+  config:
+    cdc:
+      envs:
+        binlogx_stream_group_name: "group1"
+        binlogx_stream_count: "3"
+        binlogx_transmit_hash_level: "RECORD"
+  topology:
+    nodes:
+      cdc:
+        replicas: 2
+        xReplicas: 2
+        template:
+          resources:
+            limits:
+              cpu: "1"
+              memory: 1Gi
+            requests:
+              cpu: 500m
+              memory: 500Mi
+          image: polardbx/polardbx-cdc:latest
+      cn:
+        replicas: 1
+        template:
+          resources:
+            limits:
+              cpu: "2"
+              memory: 4Gi
+            requests:
+              cpu: 500m
+              memory: 1Gi
+          image: polardbx/polardbx-sql:latest
+      dn:
+        replicas: 2
+        template:
+          engine: galaxy
+          resources:
+            limits:
+              cpu: "2"
+              memory: 8Gi
+            requests:
+              cpu: 500m
+              memory: 500Mi
+          image: polardbx/polardbx-engine:latest
+      gms:
+        template:
+          engine: galaxy
+          resources:
+            limits:
+              cpu: "1"
+              memory: 1Gi
+            requests:
+              cpu: 500m
+              memory: 500Mi
+          image: polardbx/polardbx-engine:latest
+```
+Note: currently `PolarDB-X Operator` only supports bringing up a single multi-stream group, and the global Binlog must be brought up at the same time.
+
+The multi-stream related settings are:
+* `xReplicas`: the number of multi-stream nodes
+* `binlogx_stream_group_name`: the name of the multi-stream group
+* `binlogx_stream_count`: the number of streams
+* `binlogx_transmit_hash_level`: the hash rule for distributing data across streams; three rules are currently supported:
+  * `RECORD`: hash by row
+  * `TABLE`: hash by table
+  * `DATABASE`: hash by database
+
+Create the PolarDB-X Cluster object with:
+```shell
+kubectl create -f polardbx-test.yaml
+```
+Observe the status of the PolarDB-X Cluster object with:
+```shell
+kubectl get pxc polardbx-test
+```
+```text
+NAME            GMS   CN    DN    CDC   PHASE      DISK   AGE
+polardbx-test   0/1   0/2   0/2   0/3   Creating          5s
+```
+When PHASE shows Running, the PolarDB-X cluster has been created.
+```shell
+kubectl get pxc polardbx-test
+```
+```text
+NAME            GMS   CN    DN    CDC   PHASE     DISK    AGE
+polardbx-test   1/1   2/2   2/2   3/3   Running   6.2Gi   63s
+```
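+In scripts you can block until the cluster is ready instead of polling. A sketch, assuming the object exposes its phase at `.status.phase` (the exact field path is an assumption):
+```shell
+# Wait up to 10 minutes for the cluster to reach the Running phase
+kubectl wait --for=jsonpath='{.status.phase}'=Running pxc/polardbx-test --timeout=10m
+```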
+Get the names of all Binlog multi-stream pods with:
+```shell
+kubectl get pods -l polardbx/group=g-1
+```
diff --git a/ops/lifecycle/1-create-ha-example.md b/ops/lifecycle/1-create-ha-example.md
new file mode 100644
index 0000000..efe7c16
--- /dev/null
+++ b/ops/lifecycle/1-create-ha-example.md
@@ -0,0 +1,205 @@
+## Three Data Centers in One City
+
+```yaml
+spec:
+  topology:
+    rules:
+      selectors:
+      - name: zone-a
+        ...
+      - name: zone-b
+        ...
+      - name: zone-c
+        ...
+      components:
+        cn:
+        - name: zone-a
+          replicas: 1 / 3
+          selector:
+            reference: zone-a
+        - name: zone-b
+          replicas: 1 / 3
+          selector:
+            reference: zone-b
+        - name: zone-c
+          replicas: 1 / 3
+          selector:
+            reference: zone-c
+        cdc:
+        - name: zone-a
+          replicas: 1 / 3
+          selector:
+            reference: zone-a
+        - name: zone-b
+          replicas: 1 / 3
+          selector:
+            reference: zone-b
+        - name: zone-c
+          replicas: 1 / 3
+          selector:
+            reference: zone-c
+        dn:
+          nodeSets:
+          - name: cand-zone-a
+            role: Candidate
+            replicas: 1
+            selector:
+              reference: zone-a
+          - name: cand-zone-b
+            role: Candidate
+            replicas: 1
+            selector:
+              reference: zone-b
+          - name: log-zone-c
+            role: Voter
+            replicas: 1
+            selector:
+              reference: zone-c
+```
+## Three Data Centers across Two Regions
+
+```yaml
+spec:
+  topology:
+    rules:
+      selectors:
+      - name: region-1-zone-a
+        ...
+      - name: region-1-zone-b
+        ...
+      - name: region-2-zone-c
+        ...
+      components:
+        cn:
+        - name: region-1-zone-a
+          replicas: 1 / 3
+          selector:
+            reference: region-1-zone-a
+        - name: region-1-zone-b
+          replicas: 1 / 3
+          selector:
+            reference: region-1-zone-b
+        - name: region-2-zone-c
+          replicas: 1 / 3
+          selector:
+            reference: region-2-zone-c
+        cdc:
+        - name: region-1-zone-a
+          replicas: 1 / 3
+          selector:
+            reference: region-1-zone-a
+        - name: region-1-zone-b
+          replicas: 1 / 3
+          selector:
+            reference: region-1-zone-b
+        - name: region-2-zone-c
+          replicas: 1 / 3
+          selector:
+            reference: region-2-zone-c
+        dn:
+          nodeSets:
+          - name: cand-region-1-zone-a
+            role: Candidate
+            replicas: 1
+            selector:
+              reference: region-1-zone-a
+          - name: cand-region-2-zone-c
+            role: Candidate
+            replicas: 1
+            selector:
+              reference: region-2-zone-c
+          - name: region-1-zone-b
+            role: Voter
+            replicas: 1
+            selector:
+              reference: region-1-zone-b
+```
+
+## Five Data Centers across Three Regions
+
+```yaml
+spec:
+  topology:
+    rules:
+      selectors:
+      - name: region-1-zone-a
+        ...
+      - name: region-1-zone-b
+        ...
+      - name: region-2-zone-c
+        ...
+      - name: region-2-zone-d
+        ...
+      - name: region-3-zone-e
+        ...
+      components:
+        cn:
+        - name: region-1-zone-a
+          replicas: 1 / 5
+          selector:
+            reference: region-1-zone-a
+        - name: region-1-zone-b
+          replicas: 1 / 5
+          selector:
+            reference: region-1-zone-b
+        - name: region-2-zone-c
+          replicas: 1 / 5
+          selector:
+            reference: region-2-zone-c
+        - name: region-2-zone-d
+          replicas: 1 / 5
+          selector:
+            reference: region-2-zone-d
+        - name: region-3-zone-e
+          replicas: 1 / 5
+          selector:
+            reference: region-3-zone-e
+        cdc:
+        - name: region-1-zone-a
+          replicas: 1 / 5
+          selector:
+            reference: region-1-zone-a
+        - name: region-1-zone-b
+          replicas: 1 / 5
+          selector:
+            reference: region-1-zone-b
+        - name: region-2-zone-c
+          replicas: 1 / 5
+          selector:
+            reference: region-2-zone-c
+        - name: region-2-zone-d
+          replicas: 1 / 5
+          selector:
+            reference: region-2-zone-d
+        - name: region-3-zone-e
+          replicas: 1 / 5
+          selector:
+            reference: region-3-zone-e
+        dn:
+          nodeSets:
+          - name: cand-region-1-zone-a
+            role: Candidate
+            replicas: 1
+            selector:
+              reference: region-1-zone-a
+          - name: cand-region-1-zone-b
+            role: Candidate
+            replicas: 1
+            selector:
+              reference: region-1-zone-b
+          - name: cand-region-2-zone-c
+            role: Candidate
+            replicas: 1
+            selector:
+              reference: region-2-zone-c
+          - name: cand-region-2-zone-d
+            role: Candidate
+            replicas: 1
+            selector:
+              reference: region-2-zone-d
+          - name: region-3-zone-e
+            role: Voter
+            replicas: 1
+            selector:
+              reference: region-3-zone-e
+```
diff --git a/ops/lifecycle/1-create-host-network-mode.md b/ops/lifecycle/1-create-host-network-mode.md
new file mode 100644
index 0000000..a52cbce
--- /dev/null
+++ b/ops/lifecycle/1-create-host-network-mode.md
@@ -0,0 +1,14 @@
+
+By default, containers in Kubernetes run inside the Kubernetes container network, which adds some cost and latency to network communication. Kubernetes can instead place containers in the host's network namespace, which appears on the Pod as `hostNetwork: true`.
+
+PolarDBXCluster also supports placing node containers on the host network, with a few limitations:
+
+- The ports a node listens on are randomly generated, and freedom from conflicts is not guaranteed
+- During node upgrades, a port conflict may prevent a node from starting and has to be resolved manually
+
+Each component's `hostNetwork` sits under the corresponding `template` field and can be specified separately (a patch sketch follows the list below):
+
+- `spec.topology.gms.template.hostNetwork`
+- `spec.topology.cn.template.hostNetwork`
+- `spec.topology.dn.template.hostNetwork`
+- `spec.topology.cdc.template.hostNetwork`
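+
+A minimal sketch of turning on the host network for CN nodes only; the patch path (with `nodes`, matching the create examples) and the `--type merge` flag are assumptions:
+
+```shell
+# Switch CN pods onto the host network; other components stay on the container network
+kubectl patch pxc polardbx-test --type merge -p '{"spec":{"topology":{"nodes":{"cn":{"template":{"hostNetwork":true}}}}}}'
+```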
diff --git a/ops/lifecycle/1-create-node-selector.md b/ops/lifecycle/1-create-node-selector.md
new file mode 100644
index 0000000..deaf0a0
--- /dev/null
+++ b/ops/lifecycle/1-create-node-selector.md
@@ -0,0 +1,34 @@
+In `topology.rules.nodeSelectors` you can define a set of predefined node selectors and then reference them later in `topology.rules.components`. For how node selectors are defined and what they mean, see the [official documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/).
+
+```yaml
+spec:
+  topology:
+    rules:
+      selectors:
+      - name: zone-a
+        nodeSelector:
+          nodeSelectorTerms:
+          - matchExpressions:
+            - key: topology.kubernetes.io/zone
+              operator: In
+              values:
+              - cn-hangzhou-a
+      - name: zone-b
+        nodeSelector:
+          nodeSelectorTerms:
+          - matchExpressions:
+            - key: topology.kubernetes.io/zone
+              operator: In
+              values:
+              - cn-hangzhou-b
+      - name: zone-c
+        nodeSelector:
+          nodeSelectorTerms:
+          - matchExpressions:
+            - key: topology.kubernetes.io/zone
+              operator: In
+              values:
+              - cn-hangzhou-c
+```
+
+Node selectors help control the deployment topology of an instance, such as three data centers across two regions or five data centers across three regions; for concrete usage see: [Disaster Recovery Deployment Examples](./1-create-ha-example.md).
diff --git a/ops/lifecycle/1-create-readonly-pxc.md b/ops/lifecycle/1-create-readonly-pxc.md
new file mode 100644
index 0000000..e01ca6e
--- /dev/null
+++ b/ops/lifecycle/1-create-readonly-pxc.md
@@ -0,0 +1,82 @@
+## Creating Read-only Instances
+With PolarDB-X Operator 1.3.0 and above, you can create read-only instances and specify the PolarDB-X primary instance they belong to.
+
+The storage nodes of a read-only instance guarantee physical resource isolation by adding Learner replicas, providing read/write separation while ensuring strongly consistent read-only queries based on the global timestamp.
+
+You can create a read-only instance in either of two ways:
+
+### 1. Add a Read-only Instance to an Existing Primary Instance
+Create a separate PolarDBXCluster yaml file, set `spec.readonly` to `true`, and set `spec.primaryCluster` to the name of the primary instance it belongs to, for example:
+ ``` yaml
+ # readonly.yaml
+ apiVersion: polardbx.aliyun.com/v1
+ kind: PolarDBXCluster
+ metadata:
+   name: pxc-readonly
+ spec:
+   readonly: true
+   primaryCluster: pxc-master # Name of the primary instance
+   topology:
+     nodes:
+       cn:
+         replicas: 1
+         template:
+           resources:
+             limits:
+               cpu: 2
+               memory: 4Gi
+           image: polardbx/polardbx-sql:latest
+           imagePullPolicy: Always
+       dn:
+         # DN replicas is kept in sync with the primary instance's DN replicas automatically; no need to specify it explicitly
+         template:
+           resources:
+             limits:
+               cpu: 2
+               memory: 4Gi
+           image: polardbx/polardbx-engine:latest
+           imagePullPolicy: IfNotPresent
+   config:
+     cn:
+       static:
+         AttendHtap: true # Whether to participate in HTAP
+ ```
+### 2. Create the Primary and Read-only Instances Together
+When creating the primary instance, add the information of the attached read-only instances under the `spec.initReadonly` field of the primary instance's PolarDBXCluster yaml. Read-only instances created this way have the same specification and parameters as the primary instance, for example:
+ ``` yaml
+ # pxc-with-readonly.yaml
+ apiVersion: polardbx.aliyun.com/v1
+ kind: PolarDBXCluster
+ metadata:
+   name: pxc
+ spec:
+   initReadonly:
+   - cnReplicas: 1 # Number of CNs of the read-only instance
+     name: readonly # Name suffix of the read-only instance; this example produces a read-only instance named "pxc-readonly"; if omitted, a random suffix is generated
+     extraParams:
+       AttendHtap: "true" # Whether to participate in HTAP
+   topology:
+     nodes:
+       cn:
+         replicas: 1
+         template:
+           resources:
+             limits:
+               cpu: 2
+               memory: 4Gi
+           image: polardbx/polardbx-sql:latest
+           imagePullPolicy: Always
+       dn:
+         replicas: 1
+         template:
+           resources:
+             limits:
+               cpu: 2
+               memory: 4Gi
+           image: polardbx/polardbx-engine:latest
+           imagePullPolicy: IfNotPresent
+ ```
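+
+After preparing such a file, a sketch of creating the read-only instance and checking it (file and object names follow the first example above):
+
+```shell
+kubectl create -f readonly.yaml
+# A read-only instance shows up as a regular PolarDBXCluster object
+kubectl get pxc pxc-readonly
+```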
+
+### 3. Connecting to a Read-only Instance
+
+You can connect to a read-only instance directly, in the same way as to a primary instance; see [Connecting to a PolarDB-X Database](../connection/README.md) and fill in the read-only instance name as the cluster name.
\ No newline at end of file
diff --git a/ops/lifecycle/1-create-simple-mode.md b/ops/lifecycle/1-create-simple-mode.md
new file mode 100644
index 0000000..659d017
--- /dev/null
+++ b/ops/lifecycle/1-create-simple-mode.md
@@ -0,0 +1,3 @@
+PolarDBXCluster supports merging the GMS functionality into the first DN (DN-0) to reduce the overall resources used, which is suitable for test deployments.
+
+To use this deployment mode, set `spec.shareGMS` to true. Note that the simple mode and the normal mode cannot be switched back and forth.
diff --git a/ops/lifecycle/1-create-state-node-rule.md b/ops/lifecycle/1-create-state-node-rule.md
new file mode 100644
index 0000000..2ced344
--- /dev/null
+++ b/ops/lifecycle/1-create-state-node-rule.md
@@ -0,0 +1,39 @@
+Stateful node rules apply to the internal nodes of the metadata (GMS) and storage nodes, in two forms:
+
+- nodeSet: every GMS and DN deploys its internal nodes according to the nodeSet rules
+- rolling: DN only; internal nodes are stacked across all available nodes in the Kubernetes cluster (for testing) to maximize resource utilization
+
+```yaml
+spec:
+  topology:
+    rules:
+      components:
+        # **Optional**
+        #
+        # GMS deployment rules; defaults to the same as DN
+        gms:
+          # Stacked deployment: the operator tries to stack the child nodes of each
+          # storage node onto the nodes matched by the selector to achieve higher
+          # resource utilization; for testing only
+          rolling:
+            replicas: 3
+            selector:
+              reference: zone-a
+          # Node-group deployment: specify the node group and node selector of each
+          # DN's child nodes to achieve cross-zone, cross-city, and other highly
+          # available deployment structures
+          nodeSets:
+          - name: cand-zone-a
+            role: Candidate
+            replicas: 1
+            selector:
+              reference: zone-a
+          - name: cand-zone-b
+            role: Candidate
+            replicas: 1
+            selector:
+              reference: zone-b
+          - name: log-zone-c
+            role: Voter
+            replicas: 1
+            selector:
+              reference: zone-c
+```
diff --git a/ops/lifecycle/1-create-stateless-node-rule.md b/ops/lifecycle/1-create-stateless-node-rule.md
new file mode 100644
index 0000000..62e8e6c
--- /dev/null
+++ b/ops/lifecycle/1-create-stateless-node-rule.md
@@ -0,0 +1,26 @@
+Example:
+
+```yaml
+spec:
+  topology:
+    rules:
+      components:
+        # **Optional**
+        #
+        # CN deployment rules; CN nodes are likewise divided into groups
+        cn:
+        - name: zone-a
+          # Valid values: a number, a percentage, or a fraction in (0, 1]; if omitted,
+          # the remaining replicas are used (only one entry may omit it)
+          # The sum must not exceed .topology.nodes.cn.replicas
+          replicas: 1
+          selector:
+            reference: zone-a
+        - name: zone-b
+          replicas: 1 / 3
+          selector:
+            reference: zone-b
+        - name: zone-c
+          replicas: 34%
+          selector:
+            reference: zone-c
+```
diff --git a/ops/lifecycle/1-create.md b/ops/lifecycle/1-create.md
new file mode 100644
index 0000000..52fe2b0
--- /dev/null
+++ b/ops/lifecycle/1-create.md
@@ -0,0 +1,62 @@
+Preface: see [here](../../api/polardbxcluster.md) for the complete PolarDBXCluster definition.
+
+First prepare a yaml file describing the PolarDBXCluster:
+
+```yaml
+apiVersion: polardbx.aliyun.com/v1 # API group / version
+kind: PolarDBXCluster # API name
+metadata: # Object metadata
+  name: polardbx-test # Object name
+  namespace: default # Namespace
+  labels: # Object labels
+    kind: test
+spec: # Spec
+  topology: # Topology definition
+    nodes: # Node specification and count
+      cn:
+        replicas: 2
+        template:
+          image: polardbx/polardbx-sql:latest
+          resources:
+            limits:
+              cpu: 4
+              memory: 16Gi
+      dn:
+        replicas: 2
+        template:
+          image: polardbx/polardbx-engine:latest
+          resources:
+            limits:
+              cpu: 4
+              memory: 16Gi
+      cdc:
+        replicas: 2
+        template:
+          image: polardbx/polardbx-cdc:latest
+          resources:
+            limits:
+              cpu: 4
+              memory: 16Gi
+```
+
+Create the PolarDBXCluster object with:
+
+```bash
+kubectl create -f polardbx-test.yaml
+```
+
+Observe the status of the PolarDBXCluster object with:
+
+```bash
+kubectl get pxc polardbx-test
+NAME            GMS   CN    DN    CDC   PHASE      DISK   AGE
+polardbx-test   0/1   0/2   0/2   0/2   Creating          5s
+```
+
+When `PHASE` shows `Running`, the PolarDB-X cluster has been created.
+
+```bash
+kubectl get pxc polardbx-test
+NAME            GMS   CN    DN    CDC   PHASE     DISK    AGE
+polardbx-test   1/1   2/2   2/2   2/2   Running   6.2Gi   63s
+```
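+
+If creation seems stuck, the object's status and events usually explain why. A quick check using standard kubectl (no PolarDB-X specifics assumed):
+
+```shell
+# Watch the phase change in place
+kubectl get pxc polardbx-test -w
+# Inspect detailed status and recent events
+kubectl describe pxc polardbx-test
+```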
diff --git a/ops/lifecycle/2-delete.md b/ops/lifecycle/2-delete.md
new file mode 100644
index 0000000..a8c18ae
--- /dev/null
+++ b/ops/lifecycle/2-delete.md
@@ -0,0 +1,22 @@
+Delete the PolarDBXCluster (object) with the following command, where `polardbx-test` is the PolarDBXCluster object name:
+
+```bash
+kubectl delete pxc polardbx-test
+```
+
+Checking the object's status at this point may show `PHASE` as `Deleting`,
+
+```bash
+kubectl get pxc polardbx-test
+NAME            GMS   CN    DN    CDC   PHASE      DISK    AGE
+polardbx-test   1/1   2/2   2/2   2/2   Deleting   6.2Gi   2m1s
+```
+
+or report that the object no longer exists:
+
+```bash
+kubectl get pxc polardbx-test
+Error from server (NotFound): polardbxclusters.polardbx.aliyun.com "polardbx-test" not found
+```
+
+When a PolarDBXCluster primary instance is deleted, its attached read-only instances are deleted along with it.
diff --git a/ops/lifecycle/3-update.md b/ops/lifecycle/3-update.md
new file mode 100644
index 0000000..23e0ad9
--- /dev/null
+++ b/ops/lifecycle/3-update.md
@@ -0,0 +1,17 @@
+Note: in this document, "upgrade" means changing the image of one or more components; in practice you can upgrade, reconfigure, and scale in or out at the same time.
+
+Taking the yaml from [1. Create](./1-create.md) as an example, suppose we want to update the CN image to `polardbx/polardbx-sql:v2.0`. We can modify the image field under `.spec` with `kubectl edit` or `kubectl patch`; here we demonstrate `kubectl patch`:
+
+```bash
+kubectl patch pxc polardbx-test -p '{"spec": {"topology": {"nodes": {"cn": {"template": {"image": "polardbx/polardbx-sql:v2.0"}}}}}}'
+```
+
+Observing the cluster status shortly after, `PHASE` enters `Upgrading`, indicating that the upgrade is in progress:
+
+```bash
+kubectl get pxc polardbx-test
+NAME            GMS   CN    DN    CDC   PHASE       DISK    AGE
+polardbx-test   1/1   1/2   2/2   2/2   Upgrading   6.2Gi   93s
+```
+
+When `PHASE` returns to `Running`, the upgrade is complete.
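+
+To confirm the new image was recorded on the object, the spec path used by the patch above can be read back with jsonpath (a sketch):
+
+```shell
+# Should print polardbx/polardbx-sql:v2.0 after the patch
+kubectl get pxc polardbx-test -o jsonpath='{.spec.topology.nodes.cn.template.image}'
+```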
diff --git a/ops/lifecycle/3-update.md b/ops/lifecycle/3-update.md
new file mode 100644
index 0000000..23e0ad9
--- /dev/null
+++ b/ops/lifecycle/3-update.md
@@ -0,0 +1,17 @@
Note: in this document, "upgrade" means changing the image of one or more components; in practice you can combine upgrades, specification changes, and scaling in/out in one operation.

Taking the YAML from [1. Create](./1-create.md) as an example, suppose we want to update the CN image to `polardbx/polardbx-sql:v2.0`. We can modify the image field under `.spec` with either `kubectl edit` or `kubectl patch`; here we demonstrate `kubectl patch`:

```bash
kubectl patch pxc polardbx-test -p '{"spec": {"topology": {"nodes": {"cn": {"template": {"image": "polardbx/polardbx-sql:v2.0"}}}}}}'
```

Observing the cluster status shortly afterwards, `PHASE` will enter `Upgrading`, indicating that the upgrade is in progress:

```bash
kubectl get pxc polardbx-test
NAME            GMS   CN    DN    CDC   PHASE       DISK    AGE
polardbx-test   1/1   1/2   2/2   2/2   Upgrading   6.2Gi   93s
```

When `PHASE` returns to `Running`, the upgrade is complete.
diff --git a/ops/lifecycle/4-upgrade.md b/ops/lifecycle/4-upgrade.md
new file mode 100644
index 0000000..49afb2c
--- /dev/null
+++ b/ops/lifecycle/4-upgrade.md
@@ -0,0 +1,7 @@
Apart from changing the resource configuration instead, this works the same as [3. Upgrade](./3-update.md):

```bash
kubectl patch pxc polardbx-test -p '{"spec": {"topology": {"nodes": {"cn": {"template": {"resources": {"limits": {"cpu": 4, "memory": "16Gi"}}}}}}}}'
```

Likewise, `PHASE` goes from `Running` to `Upgrading` and then back to `Running`.
diff --git a/ops/lifecycle/5-scale-out.md b/ops/lifecycle/5-scale-out.md
new file mode 100644
index 0000000..28bc13e
--- /dev/null
+++ b/ops/lifecycle/5-scale-out.md
@@ -0,0 +1,23 @@
Apart from adding nodes instead, this works the same as [3. Upgrade](./3-update.md):

```bash
kubectl patch pxc polardbx-test -p '{"spec": {"topology": {"nodes": {"dn": {"replicas": 3}}}}}'
```

Likewise, `PHASE` goes from `Running` to `Upgrading` and then back to `Running`:

```bash
kubectl get pxc polardbx-test
NAME            GMS   CN    DN    CDC   PHASE       DISK    AGE
polardbx-test   1/1   1/2   2/3   2/2   Upgrading   6.2Gi   93s
```

This time, however, you will see the expected DN count become `2/3` and then `3/3`. Meanwhile, the operator automatically rebalances the data, which may involve data migration. You can check the migration progress as follows:

```bash
kubectl get pxc polardbx-test -o wide
NAME            PROTOCOL   GMS   CN    DN    CDC   PHASE       DISK       STAGE            REBALANCE   VERSION                            AGE
polardbx-test   8.0        1/1   2/2   3/3   2/2   Upgrading   22.6 GiB   RebalanceWatch   50%         8.0.3-PXC-5.4.13-20220418/8.0.18   35d
```
diff --git a/ops/lifecycle/6-scale-in.md b/ops/lifecycle/6-scale-in.md
new file mode 100644
index 0000000..036c5ca
--- /dev/null
+++ b/ops/lifecycle/6-scale-in.md
@@ -0,0 +1,5 @@
Same as [5. Scale Out](./5-scale-out.md), except that nodes are removed. Data is migrated automatically here as well.

```bash
kubectl patch pxc polardbx-test -p '{"spec": {"topology": {"nodes": {"dn": {"replicas": 1}}}}}'
```
diff --git a/ops/lifecycle/7-rollback-exception.md b/ops/lifecycle/7-rollback-exception.md
new file mode 100644
index 0000000..f6bab82
--- /dev/null
+++ b/ops/lifecycle/7-rollback-exception.md
@@ -0,0 +1,5 @@
In the following situations, the operator cannot respond to new operations:

1. `PHASE` is `Deleting`, meaning deletion is in progress
2. `PHASE` is `Locked`, meaning the cluster is locked
3. `PHASE` is `Upgrading` while `STAGE` is `RebalanceStart`, `RebalanceWatch`, or `Clean`; this cannot be interrupted, because data migration is in progress

You can inspect the current `PHASE` and `STAGE` as shown below.
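The wide output introduced in [5. Scale Out](./5-scale-out.md) already shows `PHASE` and `STAGE` as columns. The `jsonpath` variant below additionally assumes the status fields are named `.status.phase` and `.status.stage`; that naming is an inference, so verify it against your CRD before relying on it:

```bash
# PHASE and STAGE appear as columns in the wide output
kubectl get pxc polardbx-test -o wide

# Assumed status field paths; check them first, e.g. with `kubectl get pxc polardbx-test -o yaml`
kubectl get pxc polardbx-test -o jsonpath='{.status.phase}/{.status.stage}{"\n"}'
```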
diff --git a/ops/lifecycle/7-rollback.md b/ops/lifecycle/7-rollback.md
new file mode 100644
index 0000000..5ace474
--- /dev/null
+++ b/ops/lifecycle/7-rollback.md
@@ -0,0 +1,15 @@
Apart from [a few special cases](./7-rollback-exception.md), during any ongoing change you can modify the object's `.spec` again to trigger a new operation, and the operator will respond promptly to reach the desired state.

Therefore, the way to interrupt or roll back the previous operation is to perform another operation that changes `.spec` back to its earlier state. For example:

```bash
kubectl patch pxc polardbx-test -p '{"spec": {"topology": {"nodes": {"dn": {"replicas": 3}}}}}'
```

followed immediately by changing `replicas` back to 2:

```bash
kubectl patch pxc polardbx-test -p '{"spec": {"topology": {"nodes": {"dn": {"replicas": 2}}}}}'
```

The PolarDB-X cluster will then continue to run stably.
diff --git a/ops/lifecycle/8-complex-ops.md b/ops/lifecycle/8-complex-ops.md
new file mode 100644
index 0000000..3259ef1
--- /dev/null
+++ b/ops/lifecycle/8-complex-ops.md
@@ -0,0 +1,50 @@
A complex operation is not a plain upgrade, specification change, or scale in/out, but one that mixes some or all of these intents. Kubernetes' declarative API requires such operations to be handled efficiently, so the operator supports them as well.

As an example, we will simultaneously:

1. change the CN image to polardbx/polardbx-sql:v2.0
2. change the CDC specification to 8C32G
3. increase the number of DN nodes to 3

As before, this can be done with either `kubectl edit` or `kubectl patch`; here we demonstrate `kubectl patch`.

First prepare a patch file:

```yaml
spec:
  topology:
    nodes:
      cn:
        template:
          image: polardbx/polardbx-sql:v2.0
      dn:
        replicas: 3
      cdc:
        template:
          resources:
            limits:
              cpu: 8
              memory: 32Gi
```

Run the following command to perform the operations above:

```bash
kubectl patch pxc polardbx-test --patch-file patch.yaml
```

We can then observe the CN, DN, and CDC changes together with the data migration:

```bash
kubectl get pxc polardbx-test -o wide
NAME            PROTOCOL   GMS   CN    DN    CDC   PHASE       DISK       STAGE   REBALANCE   VERSION                            AGE
polardbx-test   8.0        1/1   1/2   2/3   1/2   Upgrading   22.6 GiB                       8.0.3-PXC-5.4.13-20220418/8.0.18   35d
```

```bash
kubectl get pxc polardbx-test -o wide
NAME            PROTOCOL   GMS   CN    DN    CDC   PHASE       DISK       STAGE            REBALANCE   VERSION                            AGE
polardbx-test   8.0        1/1   2/2   3/3   2/2   Upgrading   22.6 GiB   RebalanceWatch   50%         8.0.3-PXC-5.4.13-20220418/8.0.18   35d
```

Note: as described in [Situations That Cannot Be Interrupted](./7-rollback-exception.md), data migration cannot be interrupted.
diff --git a/ops/lifecycle/README.md b/ops/lifecycle/README.md
new file mode 100644
index 0000000..bbcfaa8
--- /dev/null
+++ b/ops/lifecycle/README.md
@@ -0,0 +1,19 @@
Lifecycle Management
==========

1. [Create](./1-create.md)
   1. [Cluster topology rules - node selectors (NodeSelector)](./1-create-node-selector.md)
   2. [Cluster topology rules - stateless node rules (compute and log nodes)](./1-create-stateless-node-rule.md)
   3. [Cluster topology rules - stateful node rules (GMS and storage nodes)](./1-create-state-node-rule.md)
   4. [Cluster topology rules - disaster recovery deployment example](./1-create-ha-example.md)
   5. [Host network mode](./1-create-host-network-mode.md)
   6. [Simple deployment mode (ShareGMS)](./1-create-simple-mode.md)
   7. [Creating a read-only instance](./1-create-readonly-pxc.md)
2. [Delete](./2-delete.md)
3. [Upgrade](./3-update.md)
4. [Change specifications](./4-upgrade.md)
5. [Scale out](./5-scale-out.md)
6. [Scale in](./6-scale-in.md)
7. [Interrupting/rolling back operations](./7-rollback.md)
   1. [Situations that cannot be interrupted](./7-rollback-exception.md)
8. [Complex operations](./8-complex-ops.md)
diff --git a/ops/logcollector/1-logcollector.md b/ops/logcollector/1-logcollector.md
new file mode 100644
index 0000000..d3eb70b
--- /dev/null
+++ b/ops/logcollector/1-logcollector.md
@@ -0,0 +1,204 @@
# Log Collection

This document describes how to enable log collection for a PolarDB-X database in a K8s cluster.

## What Is Collected
### Compute Node Logs
| Log | Path inside the pod | Parsed |
| --- | --- | --- |
| SQL log | /home/admin/drds-server/logs/*/sql.log | Yes |
| Slow log | /home/admin/drds-server/logs/*/slow.log | Yes |
| Error log | /home/admin/drds-server/logs/*/tddl.log | No |
> The `*` in the container path stands for an arbitrary directory name.

## Installing PolarDB-X LogCollector
PolarDB-X collects logs with Filebeat, which ships the raw logs to Logstash for parsing before they are sent to the final storage backend.
### Prerequisites
1. A running K8s cluster, version >= 1.18.0
2. [Helm 3](https://helm.sh/docs/intro/install/) installed
3. PolarDB-X Operator 1.2.2 or later installed

### Installing the Helm Package
First create a namespace named polardbx-logcollector:
```
kubectl create namespace polardbx-logcollector
```

Install PolarDB-X LogCollector with:
```
helm install --namespace polardbx-logcollector polardbx-logcollector https://github.com/polardb/polardbx-operator/releases/download/v1.3.0/polardbx-logcollector-1.3.0.tgz
```

You can also install it from the PolarDB-X Helm Chart repository:
```bash
helm repo add polardbx https://polardbx-charts.oss-cn-beijing.aliyuncs.com
helm install --namespace polardbx-logcollector polardbx-logcollector polardbx/polardbx-logcollector
```
> Note: with the default installation configuration, Filebeat is installed on the K8s cluster's machines as a DaemonSet; each Filebeat pod takes 500 MB of memory and 1 CPU core by default, and one Logstash pod is deployed by default, taking 1.5 GB of memory and 2 CPU cores. For the defaults, see [values.yaml](https://github.com/polardb/polardbx-operator/blob/main/charts/polardbx-logcollector/values.yaml).

Expect output like:
```
polardbx-operator logcollector plugin is installed. Please check the status of components:

    kubectl get pods --namespace {{ .Release.Namespace }}

Now start to collect logs of your polardbx cluster.
```

## Viewing Logs

### Enabling Log Collection
Log collection is disabled by default for PolarDB-X clusters. You can switch it on and off with the following commands.

Enable CN log collection for a PolarDB-X instance:
```
kubectl patch pxc {pxc name} --patch '{"spec":{"config":{"cn":{"enableAuditLog":true}}}}' --type merge
```
Disable CN log collection for a PolarDB-X instance:
```
kubectl patch pxc {pxc name} --patch '{"spec":{"config":{"cn":{"enableAuditLog":false}}}}' --type merge
```

### Viewing Logs on Logstash Stdout

PolarDB-X uses Logstash as the log parsing and reporting component. By default, logs are written to stdout so that you can verify the collection and parsing pipeline works. View them with:
```shell
kubectl logs -f {logstash pod name} -n polardbx-logcollector
```

## Delivering Logs to Other Systems

Logstash supports many [output plugins](https://www.elastic.co/guide/en/logstash/current/output-plugins.html), and you can also [develop your own output plugin](https://www.elastic.co/guide/en/logstash/current/output-new-plugin.html) to deliver PolarDB-X logs to other systems for further analysis.

The Logstash output plugin configuration is stored in the ConfigMap named logstash-pipeline in the polardbx-logcollector namespace. You can modify the output configuration with:
```shell
kubectl edit configmap logstash-pipeline -n polardbx-logcollector
```

The output section of logstash-pipeline is shown below:

![undefined](images/logstash-pipeline-config.png)

Using Elasticsearch as an example, the rest of this document shows how to configure Logstash to deliver PolarDB-X logs to an Elasticsearch cluster.

### Delivering Logs to Elasticsearch

If your environment already has an ES cluster, skip "Creating Elasticsearch".

#### Creating Elasticsearch

Follow these documents to quickly deploy a test ES cluster in the K8s cluster:

1. [Deploy the Elasticsearch Operator](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-deploy-eck.html#k8s-deploy-eck)
2. [Deploy an Elasticsearch Cluster](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-deploy-elasticsearch.html). In this step you need to obtain the ES cluster's endpoint, username, password, and certificate. The certificate can be fetched with:
```shell
kubectl get secret quickstart-es-http-certs-public -o=jsonpath='{.data.ca\.crt}'
```
3. [Deploy Kibana](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-deploy-kibana.html)
> Note: the ES cluster above is for testing only; for production, configure your own ES cluster.

#### Updating the Certificate Secret

If the ES cluster is accessed over HTTP, skip this step.

If the ES cluster is accessed over HTTPS, a certificate must be configured. The certificate file (/usr/share/logstash/config/certs/ca.crt) is mounted into the Logstash pod through the elastic-certs-public secret in the polardbx-logcollector namespace. Update the secret with:
```shell
kubectl edit secret elastic-certs-public -n polardbx-logcollector
```
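If you prefer a non-interactive update, a standard kubectl pattern (a sketch, not from these docs; it assumes the certificate has been saved locally as ca.crt) is to regenerate the secret from the file:

```bash
# Recreate the secret from a local ca.crt; the key name must remain ca.crt
kubectl create secret generic elastic-certs-public \
  --from-file=ca.crt=./ca.crt \
  -n polardbx-logcollector \
  --dry-run=client -o yaml | kubectl apply -f -
```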
#### Configuring the Logstash Output

Prerequisites:
- an ES cluster address reachable from inside the K8s cluster;
- automatic index creation enabled on the ES cluster;
- an API key or a username/password for the ES cluster;
- if HTTPS is used, the ES cluster certificate: write its content into the certificate secret elastic-certs-public in the polardbx-logcollector namespace, with the file name ca.crt.

Update the Logstash output configuration with:

```shell
kubectl edit configmap logstash-pipeline -n polardbx-logcollector
```

For example, here is a sample ES cluster configuration:
```
output {
  elasticsearch {
    hosts => ["https://quickstart-es-http.default:9200"]
    user => elastic
    password => sTF9B37N0jAF45Kn2Jwt874N
    ssl => true
    cacert => "/usr/share/logstash/config/certs/ca.crt"
    index => "%{[@metadata][target_index]}"
  }
}
```

- For more options, see [Elasticsearch Output Plugin Options](https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-options)

![undefined](images/logstash-pipeline-config-output.png)

After enabling the elasticsearch output plugin, remember to **comment out the stdout output configuration**.

#### Accessing Kibana

Following [Deploy Kibana](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-deploy-kibana.html#k8s-deploy-kibana), log in to Kibana and create three index patterns for querying the logs:

| Log type | Index pattern |
| --- | --- |
| SQL log | cn_sql_log-* |
| Slow log | cn_slow_log-* |
| Error log | cn_tddl_log-* |

Creating an index pattern in Kibana:
![kibana-create-index](images/kibana-create-index.png)
#### Screenshots
SQL log
![undefined](images/sql-log-result.png)

Error log
![undefined](images/error-log-result.png)

Slow log
![undefined](images/slow-log-result.png)

### See Also
- [Existing output plugins](https://www.elastic.co/guide/en/logstash/current/output-plugins.html)
- [Developing a new output plugin](https://www.elastic.co/guide/en/logstash/current/output-new-plugin.html)

# About values.yaml
You can tailor the polardbx-logcollector installation to your needs. The values.yaml path is charts/polardbx-logcollector/values.yaml, and the file annotates each option in detail.

# Log Fields
See the [log field reference](2-logfield.md).

# Resource Sizing and Performance Tuning
## Throughput

| Logstash, per core | Filebeat, per core |
|---------------|----------------|
| 5000 events/s | 12000 events/s |

To use the cores fully without running into OOMs, memory, concurrency, and buffer sizes need to be configured appropriately.

## Parameters Worth Adjusting per Scenario

### Filebeat's filebeat.yml
The ConfigMap is named filebeat-config. Parameters:
- harvester_buffer_size in the SQL log configuration
- the queue.mem settings

Reference: [Filebeat configuration](https://www.elastic.co/guide/en/beats/filebeat/current/configuring-howto-filebeat.html)

### Logstash's jvm.options
The ConfigMap is named logstash-config. Parameters:
- -Xms and -Xmx

### Logstash's logstash.yml
The ConfigMap is named logstash-config. Parameters:
- pipeline.batch.size
- pipeline.workers

Reference: [Logstash configuration](https://www.elastic.co/guide/en/logstash/current/config-setting-files.html)
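As a rough illustration of the last two settings, a logstash.yml fragment might look like the following; the numbers are placeholders to be sized against your own workload, not recommendations from these docs:

```yaml
# Illustrative values only
pipeline.workers: 2       # typically up to the CPU cores allotted to Logstash
pipeline.batch.size: 250  # larger batches raise throughput at the cost of memory
```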
diff --git a/ops/logcollector/2-logfield.md b/ops/logcollector/2-logfield.md
new file mode 100644
index 0000000..af61763
--- /dev/null
+++ b/ops/logcollector/2-logfield.md
@@ -0,0 +1,206 @@
# Log Field Reference
This document describes the fields found in the log content reported by Logstash.
## SQL Log
### Fields
| **Field group** | **Field** | **Description** |
| --- | --- | --- |
| fields | instance_id | Instance name. |
| | node_name | Name of the K8s node hosting the compute node pod. |
| | log_type | Log type. |
| | pod_name | Compute node pod name. |
| message | log_time | Timestamp at which the log line was printed. |
| | physical_affected_rows | Number of physically affected rows. |
| | total_physical_get_connection_time_cost | Time spent acquiring physical connections, in ns. |
| | sql | The executed SQL statement. |
| | fetched_rows | Number of rows fetched from storage. |
| | total_physical_time_cost | Total physical execution time, including physical SQL time and physical result set time, in ns. |
| | total_physical_sql_execution_time_cost | Sum of the physical SQL execution times, in ns. |
| | schema | Database. |
| | logical_time_cost | Logical-layer execution time (CPU time consumed at the DRDS layer), in ns. |
| | affected_rows | For DML, the number of affected rows; for queries, the number of returned rows. |
| | logical_optimizer_time_cost | Time from receiving the SQL to generating the plan, i.e. the total time spent in the optimizer, in ns. |
| | logical_executor_time_cost | Total logical time for executing the whole plan (total physical-layer time already excluded), in ns. |
| | workload_type | Workload type of the SQL execution: TP (transactional workload) or AP (analytical workload). |
| | response_time | Response time, in microseconds. |
| | template_id | Hash of the SQL template. |
| | user | Username that executed the SQL. |
| | trace_id | TRACE ID of the SQL execution. |
| | extra_info | Extra information, including client address (ipport), prepared statement id (stmt_id), transaction policy (trx), workload type (wt), and kernel version (ver). |


### Example
```json
{
  "_index": "cn_sql_log-2022.11.16",
  "_type": "_doc",
  "_id": "oz1rf4QB-sddlgYFeymE",
  "_version": 1,
  "_score": null,
  "_source": {
    "@version": "1",
    "@timestamp": "2022-11-16T07:50:58.507Z",
    "host": {
      "name": "filebeat-vtggm"
    },
    "fields": {
      "pod_name": "busu-pxchostnet-ql8p-cn-default-84cdc67d84-4w8ww",
      "log_type": "cn_sql_log",
      "instance_id": "busu-pxchostnet",
      "node_name": "cn-beijing.192.168.1.250"
    },
    "message": {
      "total_physical_sql_execution_time_cost": 680097,
      "total_physical_get_connection_time_cost": 4544,
      "fetched_rows": 0,
      "trace_id": "153a4ba63d007000",
      "extra_info": "ipport=192.168.0.3:58888 wt=TP ver=5.4.13-20220621",
      "affected_rows": 0,
      "template_id": "3e4e0512",
      "logical_time_cost": -3315586127,
      "user": "polardbx_root",
      "response_time": 1203,
      "physical_affected_rows": 0,
      "schema": "polardbx",
      "logical_optimizer_time_cost": 27655,
      "log_time": "2022-11-16 15:50:58.509",
      "sql": "SELECT engine, external_endpoint, file_uri, access_key_id, access_key_secret FROM metadb.file_storage_info",
      "total_physical_time_cost": 686243,
      "logical_executor_time_cost": -3315613782
    },
    "tags": [
      "beats_input_codec_plain_applied"
    ]
  },
  "fields": {
    "@timestamp": [
      "2022-11-16T07:50:58.507Z"
    ]
  },
  "sort": [
    1668585058507
  ]
}
```

## Slow Log
### Fields
| **Field group** | **Field** | **Description** |
| --- | --- | --- |
| fields | instance_id | Instance name. |
| | log_type | Log type. |
| | node_name | Name of the K8s node hosting the compute node pod. |
| | pod_name | Compute node pod name. |
| message | log_time | Time at which the log line was printed. |
| | time | Execution time, in ms. |
| | host | Client IP. |
| | port | Client port. |
| | sql | The executed SQL statement. |
| | affected_rows | Number of affected rows. |
| | trace_id | Trace ID. |
| | server_version | Kernel version. |
| | user | Username. |
| | schema | Database name. |

### Example
```json
{
  "_index": "cn_slow_log-2022.08.09",
  "_type": "_doc",
  "_id": "CxdugYIB4-sIO7p8dvv-",
  "_version": 1,
  "_score": null,
  "_source": {
    "fields": {
      "instance_id": "busu-pxchostnet",
      "node_name": "cn-beijing.192.168.0.207",
      "log_type": "cn_slow_log",
      "pod_name": "busu-pxchostnet-ldcw-cn-default-c754df994-xqhhj"
    },
    "@version": "1",
    "message": {
      "log_time": "2022-08-09 15:07:55.720",
      "time": "2001",
      "host": "127.0.0.1",
      "port": "35812",
      "sql": "select sleep(2)",
      "affected_rows": "1",
      "trace_id": "14bacc6508402000",
      "server_version": "5.4.13-16534775",
      "user": "polardbx_root",
      "schema": "busudb"
    },
    "host": {
      "name": "filebeat-wg47m"
    },
    "@timestamp": "2022-08-09T07:07:55.720Z",
    "tags": [
      "beats_input_codec_plain_applied"
    ]
  },
  "fields": {
    "@timestamp": [
      "2022-08-09T07:07:55.720Z"
    ]
  },
  "highlight": {
    "message.schema": [
      "@kibana-highlighted-field@busudb@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1660028875720
  ]
}
```

## Error Log
### Fields
| **Field group** | **Field** | **Description** |
| --- | --- | --- |
| fields | instance_id | Instance name. |
| | log_type | Log type. |
| | node_name | Name of the K8s node hosting the compute node pod. |
| | pod_name | Compute node pod name. |
| / | logger | Logger name. |
| | loglevel | Log level. |
| | message | Error content. |
| | thread | Thread name. |

### Example
```json
{
  "_index": "cn_tddl_log-2022.08.09",
  "_type": "_doc",
  "_id": "3oWGgYIBMBS_DyGstwZ2",
  "_version": 1,
  "_score": null,
  "_source": {
    "loglevel": "WARN",
    "thread": "ManagerExecutor-14-thread-160",
    "logger": " com.alibaba.polardbx.manager.ManagerConnection",
    "host": {
      "name": "filebeat-wg47m"
    },
    "message": "[user=polardbx_root,host=127.0.0.1,port=37150,schema=null] Index: 17, Size: 17\njava.lang.IndexOutOfBoundsException: Index: 17, Size: 17\n\tat java.util.ArrayList.rangeCheck(ArrayList.java:659)\n\tat java.util.ArrayList.get(ArrayList.java:435)\n\tat com.alibaba.polardbx.net.packet.RowDataPacket.getPacketLength(RowDataPacket.java:111)\n\tat com.alibaba.polardbx.net.packet.RowDataPacket.write(RowDataPacket.java:85)\n\tat com.alibaba.polardbx.manager.response.ShowHtc.execute(ShowHtc.java:130)\n\tat com.alibaba.polardbx.manager.handler.ShowHandler.handle(ShowHandler.java:93)\n\tat com.alibaba.polardbx.manager.ManagerQueryHandler.query(ManagerQueryHandler.java:68)\n\tat com.alibaba.polardbx.net.handler.QueryHandler.queryRaw(QueryHandler.java:29)\n\tat com.alibaba.polardbx.net.FrontendConnection.query(FrontendConnection.java:474)\n\tat com.alibaba.polardbx.net.handler.FrontendCommandHandler.handle(FrontendCommandHandler.java:65)\n\tat com.alibaba.polardbx.manager.ManagerConnection.lambda$handleData$0(ManagerConnection.java:62)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:855)\n\tat com.alibaba.wisp.engine.WispTask.runOutsideWisp(WispTask.java:299)\n\tat com.alibaba.wisp.engine.WispTask.runCommand(WispTask.java:274)\n\tat com.alibaba.wisp.engine.WispTask.access$100(WispTask.java:53)\n\tat com.alibaba.wisp.engine.WispTask$CacheableCoroutine.run(WispTask.java:241)\n\tat java.dyn.CoroutineBase.startInternal(CoroutineBase.java:62)",
    "fields": {
      "instance_id": "busu-pxchostnet",
      "node_name": "cn-beijing.192.168.0.207",
      "log_type": "cn_tddl_log",
      "pod_name": "busu-pxchostnet-ldcw-cn-default-c754df994-xqhhj"
    },
    "@version": "1",
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "@timestamp": "2022-08-09T07:34:16.157Z"
  },
  "fields": {
    "@timestamp": [
      "2022-08-09T07:34:16.157Z"
    ]
  },
  "sort": [
    1660030456157
  ]
}
```
"@kibana-highlighted-field@busudb@/kibana-highlighted-field@" + ] + }, + "sort": [ + 1660028875720 + ] +} +``` + +## 错误日志 +### 字段说明 +| **字段组** | **字段名称** | **描述** | +| --- | --- | --- | +| fields | instance_id | 实例名称。 | +| | log_type | 日志类型。 | +| | node_name | 计算节点pod所在的node名称。 | +| | pod_name | 计算节点pod名称。 | +| / | logger | 打印者名称。 | +| | loglevel | 日子级别 | +| | message | 错误内容 | +| | thread | 线程名称 | + +### 例子 +```json +{ + "_index": "cn_tddl_log-2022.08.09", + "_type": "_doc", + "_id": "3oWGgYIBMBS_DyGstwZ2", + "_version": 1, + "_score": null, + "_source": { + "loglevel": "WARN", + "thread": "ManagerExecutor-14-thread-160", + "logger": " com.alibaba.polardbx.manager.ManagerConnection", + "host": { + "name": "filebeat-wg47m" + }, + "message": "[user=polardbx_root,host=127.0.0.1,port=37150,schema=null] Index: 17, Size: 17\njava.lang.IndexOutOfBoundsException: Index: 17, Size: 17\n\tat java.util.ArrayList.rangeCheck(ArrayList.java:659)\n\tat java.util.ArrayList.get(ArrayList.java:435)\n\tat com.alibaba.polardbx.net.packet.RowDataPacket.getPacketLength(RowDataPacket.java:111)\n\tat com.alibaba.polardbx.net.packet.RowDataPacket.write(RowDataPacket.java:85)\n\tat com.alibaba.polardbx.manager.response.ShowHtc.execute(ShowHtc.java:130)\n\tat com.alibaba.polardbx.manager.handler.ShowHandler.handle(ShowHandler.java:93)\n\tat com.alibaba.polardbx.manager.ManagerQueryHandler.query(ManagerQueryHandler.java:68)\n\tat com.alibaba.polardbx.net.handler.QueryHandler.queryRaw(QueryHandler.java:29)\n\tat com.alibaba.polardbx.net.FrontendConnection.query(FrontendConnection.java:474)\n\tat com.alibaba.polardbx.net.handler.FrontendCommandHandler.handle(FrontendCommandHandler.java:65)\n\tat com.alibaba.polardbx.manager.ManagerConnection.lambda$handleData$0(ManagerConnection.java:62)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:855)\n\tat com.alibaba.wisp.engine.WispTask.runOutsideWisp(WispTask.java:299)\n\tat com.alibaba.wisp.engine.WispTask.runCommand(WispTask.java:274)\n\tat com.alibaba.wisp.engine.WispTask.access$100(WispTask.java:53)\n\tat com.alibaba.wisp.engine.WispTask$CacheableCoroutine.run(WispTask.java:241)\n\tat java.dyn.CoroutineBase.startInternal(CoroutineBase.java:62)", + "fields": { + "instance_id": "busu-pxchostnet", + "node_name": "cn-beijing.192.168.0.207", + "log_type": "cn_tddl_log", + "pod_name": "busu-pxchostnet-ldcw-cn-default-c754df994-xqhhj" + }, + "@version": "1", + "tags": [ + "beats_input_codec_plain_applied" + ], + "@timestamp": "2022-08-09T07:34:16.157Z" + }, + "fields": { + "@timestamp": [ + "2022-08-09T07:34:16.157Z" + ] + }, + "sort": [ + 1660030456157 + ] +} +``` \ No newline at end of file diff --git a/ops/logcollector/README.md b/ops/logcollector/README.md new file mode 100644 index 0000000..f8fcd6f --- /dev/null +++ b/ops/logcollector/README.md @@ -0,0 +1,6 @@ +日志采集 +==== + +[日志采集](./1-logcollector.md) + +[日志字段](./2-logfield.md) \ No newline at end of file diff --git a/ops/logcollector/images/elastic-cert-public-secret.png b/ops/logcollector/images/elastic-cert-public-secret.png new file mode 100644 index 0000000..1a170b1 Binary files /dev/null and b/ops/logcollector/images/elastic-cert-public-secret.png differ diff --git a/ops/logcollector/images/error-log-result.png b/ops/logcollector/images/error-log-result.png new file mode 100644 index 0000000..58ac9e7 Binary files /dev/null and 
diff --git a/ops/monitor/1-monitor-install.md b/ops/monitor/1-monitor-install.md
new file mode 100644
index 0000000..f393587
--- /dev/null
+++ b/ops/monitor/1-monitor-install.md
@@ -0,0 +1,58 @@
This document describes how to enable monitoring for a PolarDB-X database in a K8s cluster.
## Installing PolarDB-X Monitor
PolarDB-X monitors its clusters with Prometheus and Grafana. PolarDB-X Monitor integrates the [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) stack, so installing PolarDB-X Monitor deploys all monitoring resources and components in one step.
### Prerequisites

1. A running K8s cluster, version >= 1.18.0
2. [Helm 3](https://helm.sh/docs/intro/install/) installed
3. PolarDB-X Operator 1.2.0 or later installed

### Installing the Helm Package
First create a namespace named polardbx-monitor:

```bash
kubectl create namespace polardbx-monitor
```

Install the PolarDBXMonitor CRD:
> Note: if your PolarDB-X Operator 1.2.0 or later was installed directly with helm install, the PolarDBXMonitor CRD is installed by default and you can skip this step. If your PolarDB-X Operator was upgraded via helm upgrade from version 1.1.0 or earlier, install it manually with:

```bash
kubectl apply -f https://raw.githubusercontent.com/polardb/polardbx-operator/v1.4.0/charts/polardbx-operator/crds/polardbx.aliyun.com_polardbxmonitors.yaml
```

Install PolarDB-X Monitor with:

```bash
helm install --namespace polardbx-monitor polardbx-monitor polardbx-monitor-1.4.0.tgz
```

You can also install it from the PolarDB-X Helm Chart repository:

```bash
helm repo add polardbx https://polardbx-charts.oss-cn-beijing.aliyuncs.com
helm install --namespace polardbx-monitor polardbx-monitor polardbx/polardbx-monitor
```

> Note: installed this way, Prometheus and Grafana use default configurations for a quick start. For production clusters, see [Configuring Prometheus + Grafana](./4-prom-config.md).
>
> Note: if you install PolarDB-X Monitor on minikube, components may fail to be created due to insufficient resources; see [Configuring Prometheus + Grafana](./4-prom-config.md).


Expect output like:

```shell
polardbx-operator monitor plugin is installed. Please check the status of components:

    kubectl get pods --namespace {{ .Release.Namespace }}

Now start to monitor your polardbx cluster.
```

After PolarDB-X Monitor is installed, components such as Prometheus and Grafana are created in the polardbx-monitor namespace of your K8s cluster to monitor the PolarDB-X instances inside K8s. Check that the components are healthy and all pods are in the Running state with:

```bash
kubectl get pods -n polardbx-monitor
```
diff --git a/ops/monitor/2-monitor-cluster-exist.md b/ops/monitor/2-monitor-cluster-exist.md
new file mode 100644
index 0000000..0e98371
--- /dev/null
+++ b/ops/monitor/2-monitor-cluster-exist.md
@@ -0,0 +1,25 @@
# Enabling Monitoring for an Existing Cluster
Monitoring collection is disabled by default for PolarDB-X clusters; enable it by creating a PolarDBXMonitor object for the PolarDBXCluster you want to monitor:

```bash
kubectl apply -f polardbx-monitor.yaml
```

where polardbx-monitor.yaml reads as follows:

```yaml
apiVersion: polardbx.aliyun.com/v1
kind: PolarDBXMonitor
metadata:
  name: quick-start-monitor
spec:
  clusterName: quick-start
  monitorInterval: 30s
  scrapeTimeout: 10s
```

- spec.clusterName: name of the PolarDB-X cluster to monitor
- spec.monitorInterval: metrics scrape interval, default 30s
- spec.scrapeTimeout: metrics scrape timeout, default 10s. Note: scrapeTimeout must be smaller than monitorInterval
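To confirm that the monitor object was created, you can list it by its CRD resource name (a simple check; the output columns may vary by operator version):

```bash
kubectl get polardbxmonitor quick-start-monitor
```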
diff --git a/ops/monitor/3-monitoring.md b/ops/monitor/3-monitoring.md
new file mode 100644
index 0000000..9541ca0
--- /dev/null
+++ b/ops/monitor/3-monitoring.md
@@ -0,0 +1,27 @@
## Accessing the Grafana Dashboard
By default, forward the Grafana port to your local machine with:

```bash
kubectl port-forward svc/grafana -n polardbx-monitor 3000
```

Open [http://localhost:3000](http://localhost:3000) in a browser to reach the PolarDB-X dashboard; the default username and password are both admin.
> Note: since the Grafana configuration is stored in a ConfigMap, any password you change or dashboard you add in Grafana is not persisted and will be lost once the Grafana pod is recreated, so save such changes in advance.


![](./polardb-x-dashboard.png)
If your K8s cluster supports LoadBalancer, you can expose the Grafana Service through a LoadBalancer; see [Configuring Prometheus and Grafana](./4-prom-config.md).

If there are multiple PolarDB-X clusters in your K8s cluster, you can switch the Namespace and PolarDB-X Cluster with the drop-down boxes at the top of the Grafana page.

## Accessing Prometheus
By default, forward the Prometheus port to your local machine with:

```bash
kubectl port-forward svc/prometheus-k8s -n polardbx-monitor 9090
```

Open [http://localhost:9090](http://localhost:9090) in a browser to reach the Prometheus page.

If your K8s cluster supports LoadBalancer, you can expose the Prometheus Service through a LoadBalancer; see [Configuring Prometheus and Grafana](./4-prom-config.md).
diff --git a/ops/monitor/4-prom-config.md b/ops/monitor/4-prom-config.md
new file mode 100644
index 0000000..673ff13
--- /dev/null
+++ b/ops/monitor/4-prom-config.md
@@ -0,0 +1,62 @@
# Configuring Prometheus and Grafana
The PolarDB-X Monitor helm chart ships with default Prometheus and Grafana configurations. To change them, install or upgrade PolarDB-X Monitor with the following commands, overriding the defaults through values.yaml:

```shell
helm install --namespace polardbx-monitor polardbx-monitor polardbx-monitor-1.4.0.tgz -f values.yaml
```

or:

```shell
helm upgrade --namespace polardbx-monitor polardbx-monitor polardbx-monitor-1.4.0.tgz -f values.yaml
```

The values.yaml file contains the Prometheus and Grafana options. Example configurations for several common scenarios are given below; for the full list of options, see [values.yaml](https://github.com/polardb/polardbx-operator/blob/main/charts/polardbx-monitor/values.yaml).

### Configuring a LoadBalancer
If your K8s cluster supports LoadBalancer, pass the following configuration through the -f option when installing or upgrading PolarDB-X Monitor:

```yaml
monitors:
  grafana:
    serviceType: LoadBalancer
  prometheus:
    serviceType: LoadBalancer
```

### Persisting Monitoring Data
With the default configuration, the Prometheus cluster does not persist its monitoring data, which risks data loss. You can specify persistent storage through a values.yaml like the following:

```yaml
monitors:
  prometheus:
    persist: true
    # a storage class available in the K8s cluster
    storageClassName: ssd
    # size of the storage
    storageRequest: 100G
```

### Configuring Prometheus and Grafana Specifications
In the default configuration, the Prometheus cluster has 1 node limited to 8C16G, and Grafana has 1 node limited to 4C8G. You can change the specifications and node counts of the Prometheus and Grafana clusters with:

```yaml
monitors:
  grafana:
    resources:
      requests:
        cpu: 1000m
        memory: 2Gi
      limits:
        cpu: 2000m
        memory: 8Gi
  prometheus:
    resources:
      requests:
        cpu: 1000m
        memory: 2Gi
      limits:
        cpu: 2000m
        memory: 8Gi
```
diff --git a/ops/monitor/5-alert-config.md b/ops/monitor/5-alert-config.md
new file mode 100644
index 0000000..be3f1eb
--- /dev/null
+++ b/ops/monitor/5-alert-config.md
@@ -0,0 +1,244 @@
# Alert Configuration
polardbx-operator provides flexible alerting based on Prometheus + AlertManager. This document describes how to configure alerts for clusters created by polardbx-operator.

## Prerequisites

1. The polardbx-operator and polardbx-monitor components are installed, with polardbx-monitor at version 1.4.0 or later.
2. A PolarDB-X cluster has been created with monitoring enabled; see [Enabling Monitoring for an Existing Cluster](https://doc.polardbx.com/operator/ops/monitor/2-monitor-cluster-exist.html).
3. Sending alert messages requires access to the corresponding channels (email, DingTalk), so your K8s cluster needs a machine that can reach the channel endpoints, e.g. the SMTP server or the DingTalk webhook.

## Starting and Accessing AlertManager
PolarDB-X Operator 1.4.0 integrates AlertManager by default with out-of-the-box alerting; only a few steps are needed to enable it.

1. Pick a machine that can reach the alert channel endpoints and label that node for AlertManager deployment:
```shell
kubectl label node {node name} alertmanager=true
```
> Replace {node name} above with a NAME from the output of kubectl get node.

2. Enable AlertManager and schedule it onto the machine carrying the alertmanager label:
```shell
kubectl patch alertmanager main -n polardbx-monitor --type='merge' -p '{"spec": {"replicas" : 1, "nodeSelector": {"alertmanager": "true"}}}'
```

3. Wait for the AlertManager pod to reach the Running state:
```shell
kubectl get pods -n polardbx-monitor alertmanager-main-0
```

4. Forward the AlertManager port to your local machine and open it in a browser:
```shell
kubectl port-forward svc/alertmanager-main --address=0.0.0.0 9093 -n polardbx-monitor
```
Open http://{ip}:9093/ in a browser to access AlertManager.


## Configuring AlertManager Notification Channels

AlertManager supports many notification channels; see [CONFIGURATION](https://prometheus.io/docs/alerting/latest/configuration/). Using email and DingTalk as examples, this document shows how to configure AlertManager to push alerts to them.

### Pushing Alerts to Email
1. Create a file alertmanger-secret.yaml and copy the following into it:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-main
  namespace: polardbx-monitor
  labels:
    app.kubernetes.io/instance: polardbx-monitor
    app.kubernetes.io/managed-by: Helm
stringData:
  alertmanager.yaml: |-
    global:
      smtp_smarthost: smtp.gmail.com:587 ## sender mailbox settings
      smtp_from:
      smtp_auth_username:
      smtp_auth_identity:
      smtp_auth_password:
    receivers:
    - name: email_receiver
      email_configs:
      - to:   ## list of mailboxes to push alerts to
        send_resolved: true
    route:
      group_by:
      - job
      group_interval: 5m
      group_wait: 30s
      receiver: email_receiver
      repeat_interval: 12h
      routes:
      - receiver: email_receiver
        group_wait: 10s
    templates:
    - '/etc/alertmanager/config/*.tmpl'
type: Opaque
```
2. Edit the global options and receivers.email_configs.to in the file above, filling in the details of the mailbox used to send alert emails.
3. Apply the configuration:
```shell
kubectl delete -f alertmanger-secret.yaml
kubectl apply -f alertmanger-secret.yaml
```
### Pushing Alerts to DingTalk

AlertManager does not yet support DingTalk directly. To send messages to DingTalk, deploy the DingTalk alert plugin, which converts AlertManager messages into DingTalk's alert format via a webhook and sends them out.

1. Create a webhook robot in a DingTalk group and record the webhook's url and secret.
2. Create dingtalk-webhook.yaml and copy the following into it:
```yaml
apiVersion: v1
data:
  config.yaml: |
    ##
    # This config is for prometheus-webhook-dingtalk instead of Kubernetes!
    ##

    ## Request timeout
    # timeout: 5s

    ## Customizable templates path
    templates:
      - /config/template.tmpl

    ## You can also override default template using `default_message`
    ## The following example to use the 'legacy' template from v0.3.0
    # default_message:
    #   title: '{{ template "legacy.title" . }}'
    #   text: '{{ template "legacy.content" . }}'
    targets:
      webhook1:
        # just change the url and secret here
        url: https://oapi.dingtalk.com/robot/send?access_token=e2***
        secret: SECc*****
kind: ConfigMap
metadata:
  labels:
    app: alertmanager-webhook-dingtalk
  name: alertmanager-webhook-dingtalk-config
  namespace: polardbx-monitor
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: dingtalk
  name: webhook-dingtalk
  namespace: polardbx-monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      run: dingtalk
  template:
    metadata:
      labels:
        run: dingtalk
    spec:
      containers:
      - args:
        - --web.listen-address=:8060
        - --config.file=/config/config.yaml
        image: timonwong/prometheus-webhook-dingtalk:v1.4.0
        name: alertmanager-webhook-dingtalk
        ports:
        - containerPort: 8060
          name: http
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - mountPath: /config
          name: config
      volumes:
      - configMap:
          name: alertmanager-webhook-dingtalk-config
        name: config
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: dingtalk
  name: webhook-dingtalk
  namespace: polardbx-monitor
spec:
  ports:
  - port: 8060
    protocol: TCP
    targetPort: 8060
  selector:
    run: dingtalk
  sessionAffinity: None
```
> Note: the url and secret fields under targets.webhook1 above are the settings of the DingTalk robot you created. (The Deployment selector must match the pod template labels, so it uses run: dingtalk.)

3. Deploy the DingTalk alert plugin:
```shell
kubectl apply -f dingtalk-webhook.yaml
```

4. Create alertmanger-secret.yaml:
```shell
vim alertmanger-secret.yaml
```
Copy the following into alertmanger-secret.yaml:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-main
  namespace: polardbx-monitor
  labels:
    app.kubernetes.io/instance: polardbx-monitor
    app.kubernetes.io/managed-by: Helm
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
    receivers:
    - name: dingtalk-webhook
      webhook_configs:
      - send_resolved: true
        url: "http://webhook-dingtalk.polardbx-monitor:8060/dingtalk/webhook1/send"
    route:
      group_by:
      - job
      group_interval: 5m
      group_wait: 30s
      receiver: dingtalk-webhook
      repeat_interval: 12h
      routes:
      - receiver: dingtalk-webhook
        group_wait: 10s
    templates:
    - '/etc/alertmanager/config/*.tmpl'
type: Opaque
```
5. Apply the alert configuration:
```shell
kubectl delete -f alertmanger-secret.yaml
kubectl apply -f alertmanger-secret.yaml
```

## Viewing Existing Alert Rules

Method 1: following [Viewing Monitoring](https://doc.polardbx.com/operator/ops/monitor/3-monitoring.html), open the Prometheus console and check the Alerts tab to see the alert rules configured in the system, as shown below:

![](./alert-rules.png)

Method 2:

PolarDB-X alert rules are configured through a PrometheusRule object. Inspect it with:
```shell
kubectl get prometheusrule -n polardbx-monitor polardbx-alert-rules -o yaml
```

You can also modify the alert rules in the PrometheusRule, or add new ones, with:
```shell
kubectl edit prometheusrule -n polardbx-monitor polardbx-alert-rules
```
diff --git a/ops/monitor/README.md b/ops/monitor/README.md
new file mode 100644
index 0000000..3dcfb4e
--- /dev/null
+++ b/ops/monitor/README.md
@@ -0,0 +1,12 @@
Monitoring and Alerting
====

[Installing the Monitoring Components](./1-monitor-install.md)

[Enabling Monitoring for an Existing Cluster](./2-monitor-cluster-exist.md)

[Viewing Monitoring](./3-monitoring.md)

[Configuring Prometheus + Grafana](./4-prom-config.md)

[Alert Configuration](./5-alert-config.md)
diff --git a/ops/monitor/alert-rules.png b/ops/monitor/alert-rules.png
new file mode 100644
index 0000000..d683fa1
Binary files /dev/null and b/ops/monitor/alert-rules.png differ
diff --git a/ops/monitor/polardb-x-dashboard.png b/ops/monitor/polardb-x-dashboard.png
new file mode 100644
index 0000000..a17158a
Binary files /dev/null and b/ops/monitor/polardb-x-dashboard.png differ
diff --git a/ops/rebuild/README.md b/ops/rebuild/README.md
new file mode 100644
index 0000000..32f34bf
--- /dev/null
+++ b/ops/rebuild/README.md
@@ -0,0 +1,9 @@
Replica Rebuild
============
1. [Introduction](./rebuild.md)
2. [Health Check](./health_check.md)
3. Initiating a replica rebuild
   1. [Rebuilding a follower node](./rebuild_follower.md)
   2. [Rebuilding a logger node](./rebuild_logger.md)
   3. [Rebuilding a learner node](./rebuild_learner.md)
4. [Automatic Replica Rebuild](./rebuild_auto.md)
diff --git a/ops/rebuild/health_check.md b/ops/rebuild/health_check.md
new file mode 100644
index 0000000..7837bfc
--- /dev/null
+++ b/ops/rebuild/health_check.md
@@ -0,0 +1,23 @@
Replica Health Check
==============
This document explains how to check the health of a replica to decide whether a replica rebuild should be initiated.

## Pod Check
Look for DN pods that are not in the Running state:
```bash
kubectl get pods -l xstore/name --show-labels | grep -v Running
```
Check whether the xstore that owns the unhealthy pod is performing an expected task such as a specification change or an upgrade/downgrade. If it is not, and the pod never returns to the ready state, consider initiating a replica rebuild task.

## Replication Thread and Lag Check
A software bug may interrupt the replication thread on a replica. Run the following statement on the replica to inspect the replication status:
```sql
show slave status
```
If Slave_SQL_Running is No and Last_Error is not empty, the replica's replication has a problem. First determine what caused the interruption, then initiate a replica rebuild to restore the replica.

If for some reason (the replica was down for a long time, its host machine has problems, and so on) the replication lag has grown so large that catching up with the leader would take a very long time, say dozens of hours, you may also choose to initiate a replica rebuild.

### How to Check Replication Lag
1. Run show slave status on the replica and check the Seconds_Behind_Master attribute.
2. Run `select * from information_schema.alisql_cluster_global` on the leader and compare the APPLIED_INDEX values of the replica and the leader; from the gap and the growth rate of APPLIED_INDEX on both, you can estimate how long the replica will need to catch up with the leader's logs (a sketch follows below).
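As a concrete illustration of step 2 above (a sketch; only the APPLIED_INDEX column is documented here, the rest of the table layout is not):

```sql
-- On the leader: one row per cluster node; compare APPLIED_INDEX across nodes.
-- Sampling it twice, one minute apart, gives the apply rate, from which the
-- catch-up time can be estimated as (leader index - replica index) / rate.
SELECT * FROM information_schema.alisql_cluster_global;
```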
diff --git a/ops/rebuild/image/check_follower.png b/ops/rebuild/image/check_follower.png
new file mode 100644
index 0000000..76dd37e
Binary files /dev/null and b/ops/rebuild/image/check_follower.png differ
diff --git a/ops/rebuild/image/check_learner.png b/ops/rebuild/image/check_learner.png
new file mode 100644
index 0000000..015d22b
Binary files /dev/null and b/ops/rebuild/image/check_learner.png differ
diff --git a/ops/rebuild/image/check_logger.png b/ops/rebuild/image/check_logger.png
new file mode 100644
index 0000000..2553106
Binary files /dev/null and b/ops/rebuild/image/check_logger.png differ
diff --git a/ops/rebuild/image/get_xf_follower.png b/ops/rebuild/image/get_xf_follower.png
new file mode 100644
index 0000000..5001649
Binary files /dev/null and b/ops/rebuild/image/get_xf_follower.png differ
diff --git a/ops/rebuild/image/get_xf_learner.png b/ops/rebuild/image/get_xf_learner.png
new file mode 100644
index 0000000..16ba51f
Binary files /dev/null and b/ops/rebuild/image/get_xf_learner.png differ
diff --git a/ops/rebuild/image/get_xf_logger.png b/ops/rebuild/image/get_xf_logger.png
new file mode 100644
index 0000000..b22f10d
Binary files /dev/null and b/ops/rebuild/image/get_xf_logger.png differ
diff --git a/ops/rebuild/image/open_rebuild_auto.png b/ops/rebuild/image/open_rebuild_auto.png
new file mode 100644
index 0000000..8235c96
Binary files /dev/null and b/ops/rebuild/image/open_rebuild_auto.png differ
diff --git a/ops/rebuild/rebuild.md b/ops/rebuild/rebuild.md
new file mode 100644
index 0000000..0c99093
--- /dev/null
+++ b/ops/rebuild/rebuild.md
@@ -0,0 +1,49 @@
Replica Rebuild
===========================
A PolarDB-X DN consists of three nodes with the roles leader, follower, and logger. When one of the three can no longer serve due to a software or hardware problem, the problematic node needs to be rebuilt; this is called a replica rebuild.
This document describes the replica rebuild feature for storage nodes.

# Rebuild Tasks

A replica rebuild task is defined as a Custom Resource Definition object of kind XStoreFollower. A user initiates a rebuild task by writing a CRD YAML file and deletes the task object after the task has succeeded.

## Rebuild Workflow
A typical rebuild works as follows: start a streaming backup from the leader node, send the backup set directly to the target machine, run the restore process on the target machine, and finally apply the necessary parameter configuration to bring up a replica node.
The concrete steps are listed in the table below:

| Step | Needed for learner/follower | Needed for logger |
|----------------------------------------|----------------------|------------|
| Validate parameters | Yes | Yes |
| Determine the source node | Yes | No |
| Create a temporary pod as the target node | For cross-machine rebuilds | For cross-machine rebuilds |
| Stream a backup from the source node to the target node | Yes | No |
| Restore the backup on the target node | Yes | No |
| Refresh the three-node metadata, including the node IPs and the binlog start position | Yes | Yes |
| Delete the temporary pod and create the final pod on the node where the temporary pod ran | Yes | Yes |
| Clean up data, including the original pod's data and the backup set | Yes | Yes |


## Parameters

| Parameter | Description | Required | Type | Default |
|---------------------|-------------------------------------|---------|------|--------------------------------|
| .metadata.name | Name of the rebuild task | Yes | string | |
| .spec.fromPodName | Source pod name, i.e. the pod the streaming backup is taken from | No | string | A healthy node picked from the xstore instance |
| .spec.targetPodName | Name of the pod to be rebuilt | Yes | string | |
| .spec.local | Whether to rebuild on the same machine | Yes | boolean | Cross-machine rebuild by default |
| .spec.XStoreName | Name of the xstore instance that owns the source pod | Yes | string | |
| .spec.NodeName | Pin the rebuild to a specific node | No | string | For local rebuilds, the target pod's node; for cross-machine rebuilds, decided by the scheduling policy |


## Example
Suppose the pod named xstore-xxx-pod-1 under the xstore instance named xstore-xxx has corrupted data and can no longer serve. We use the following YAML to create a cross-machine rebuild task and wait for it to succeed:
```yaml
apiVersion: polardbx.aliyun.com/v1
kind: XStoreFollower
metadata:
  name: rebuildjob
spec:
  local: false
  targetPodName: xstore-xxx-pod-1
  xStoreName: xstore-xxx
```
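Once the task object is created, you can watch its progress with the `xf` short name that the delete commands later in these docs use (the file name below is an assumption, and the status columns may vary by operator version):

```bash
kubectl apply -f rebuildjob.yaml   # the XStoreFollower manifest above, saved locally
kubectl get xf rebuildjob -w       # watch until the task reports success
```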
diff --git a/ops/rebuild/rebuild_auto.md b/ops/rebuild/rebuild_auto.md
new file mode 100644
index 0000000..7e0da85
--- /dev/null
+++ b/ops/rebuild/rebuild_auto.md
@@ -0,0 +1,32 @@
Automatic Replica Rebuild
===========================

Starting with version 1.4.0, PolarDB-X Operator supports automatic replica rebuild: it monitors the state of data node replicas and automatically initiates rebuild tasks under certain conditions.

## Switch
Disabled by default. To enable it, set the operator startup argument `-feature-gates=EnableAutoRebuildFollower`; separate multiple feature gates with `,`.

### Option 1: Set It When Installing or Upgrading the Operator with Helm
Edit the `featureGates` field in values.yaml as follows:
```yaml
controllerManager:
  name: polardbx-controller-manager
  featureGates: [ EnableAutoRebuildFollower ]
```

### Option 2: Edit the Operator's Deployment Directly
Modify .spec.template.spec.containers[0].args as follows:
```bash
kubectl -n polardbx-operator-system edit deployment polardbx-controller-manager
```
![open_rebuild_auto.png](image/open_rebuild_auto.png)

A non-interactive alternative is sketched below.
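The following JSON-patch sketch could append the flag instead of editing interactively; the args path and the assumption that the flag is not already present are mine, not from these docs:

```bash
kubectl -n polardbx-operator-system patch deployment polardbx-controller-manager \
  --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "-feature-gates=EnableAutoRebuildFollower"}]'
```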
## Trigger Conditions
The result of `show slave status` on the replica must satisfy the following conditions:

- `Slave_SQL_Running` is `No`
- `Last_SQL_Error` is not empty

diff --git a/ops/rebuild/rebuild_follower.md b/ops/rebuild/rebuild_follower.md
new file mode 100644
index 0000000..4226218
--- /dev/null
+++ b/ops/rebuild/rebuild_follower.md
@@ -0,0 +1,37 @@
Rebuilding a Follower Node
================

## Check Whether the Follower Node Is Unrecoverable
![check_follower.png](./image/check_follower.png)

## Initiate a Cross-Machine Rebuild of the Follower
```yaml
apiVersion: polardbx.aliyun.com/v1
kind: XStoreFollower
metadata:
  name: rebuildjob-follower
spec:
  local: false
  targetPodName: rebuild-demo-6jrr-dn-0-cand-1
  xStoreName: rebuild-demo-6jrr-dn-0
```

## Initiate a Local Rebuild of the Follower
```yaml
apiVersion: polardbx.aliyun.com/v1
kind: XStoreFollower
metadata:
  name: rebuildjob-follower
spec:
  local: true
  targetPodName: rebuild-demo-6jrr-dn-0-cand-1
  xStoreName: rebuild-demo-6jrr-dn-0
```

## Check Whether the Rebuild Task Succeeded
![get_xf_follower.png](./image/get_xf_follower.png)

## Delete the Rebuild Task
```bash
kubectl delete xf rebuildjob-follower
```
diff --git a/ops/rebuild/rebuild_learner.md b/ops/rebuild/rebuild_learner.md
new file mode 100644
index 0000000..6b57b28
--- /dev/null
+++ b/ops/rebuild/rebuild_learner.md
@@ -0,0 +1,37 @@
Rebuilding a Learner Node (DN Nodes of Read-Only Instances)
================

## Check Whether the Learner Node Is Unrecoverable
![check_learner.png](./image/check_learner.png)

## Initiate a Cross-Machine Rebuild of the Learner
```yaml
apiVersion: polardbx.aliyun.com/v1
kind: XStoreFollower
metadata:
  name: rebuildjob-learner
spec:
  local: false
  targetPodName: rebuild-demo-ro-pztl-dn-0-learner-0
  xStoreName: rebuild-demo-pqlk-dn-0
```

## Initiate a Local Rebuild of the Learner
```yaml
apiVersion: polardbx.aliyun.com/v1
kind: XStoreFollower
metadata:
  name: rebuildjob-learner
spec:
  local: true
  targetPodName: rebuild-demo-ro-pztl-dn-0-learner-0
  xStoreName: rebuild-demo-pqlk-dn-0
```
> Note: xStoreName must be set to the xstore instance name of the primary node that the read-only node belongs to.

## Check Whether the Rebuild Task Succeeded
![get_xf_learner.png](./image/get_xf_learner.png)

## Delete the Rebuild Task
```bash
kubectl delete xf rebuildjob-learner
```
diff --git a/ops/rebuild/rebuild_logger.md b/ops/rebuild/rebuild_logger.md
new file mode 100644
index 0000000..77cbada
--- /dev/null
+++ b/ops/rebuild/rebuild_logger.md
@@ -0,0 +1,37 @@
Rebuilding a Logger Node
================

## Check Whether the Logger Node Is Unrecoverable
![check_logger.png](./image/check_logger.png)

## Initiate a Cross-Machine Rebuild of the Logger
```yaml
apiVersion: polardbx.aliyun.com/v1
kind: XStoreFollower
metadata:
  name: rebuildjob-logger
spec:
  local: false
  targetPodName: rebuild-demo-6jrr-dn-0-log-0
  xStoreName: rebuild-demo-6jrr-dn-0
```

## Initiate a Local Rebuild of the Logger
```yaml
apiVersion: polardbx.aliyun.com/v1
kind: XStoreFollower
metadata:
  name: rebuildjob-logger
spec:
  local: true
  targetPodName: rebuild-demo-6jrr-dn-0-log-0
  xStoreName: rebuild-demo-6jrr-dn-0
```

## Check Whether the Rebuild Task Succeeded
![get_xf_logger.png](image/get_xf_logger.png)

## Delete the Rebuild Task
```bash
kubectl delete xf rebuildjob-logger
```