A. About NDB Cluster
MySQL NDB Cluster is an in-memory database cluster for MySQL built on a shared-nothing, distributed architecture. NDB Cluster uses the NDB storage engine to store data in memory-optimized tables, with durability, very fast performance, and low latency, so it can handle extremely high transaction volumes while remaining ACID compliant. NDB Cluster distributes data redundantly across several cluster nodes, with no single point of failure, making it possible to meet a 99.999% system availability SLA.
In a nutshell, NDB Cluster nodes (processes) can be divided into three building blocks by type:
1. Data Nodes (ndbd / ndbmtd)
When the storage engine for a table is defined as ndb (or ndbcluster), the table's data is stored in the memory of the data node servers, and that data is directly accessible from all SQL nodes / API nodes, which is where users access it.
2. SQL Nodes (mysqld) / API Nodes
SQL-based operations are performed via SQL nodes running the mysqld server daemon, which includes support for the NDB storage engine. Applications or user sessions performing queries, DML, or even DDL can connect to any of the SQL nodes, which access the backend data node servers where the data is stored.
3. Management Nodes (ndb_mgmd)
Management nodes serve the centralized, cluster-wide configuration parameters for the data nodes and the API / SQL nodes, and they also control cluster membership. A new node that comes up first connects to the management node to obtain the cluster parameters it needs to run. Therefore, the management node has to be running before any other node is started. In the event of network partitioning, the management node acts as the arbitrator for the cluster.
NDB Cluster Architecture
This article does not aim to discuss NDB Cluster in detail; instead, it provides a simple DIY solution for deploying the management node, SQL nodes, and data nodes on Kubernetes. More information about NDB Cluster can be found at https://dev.mysql.com/doc/refman/8.0/en/mysql-cluster-overview.html
B. Docker Image for NDB Cluster
In a microservices context, Kubernetes is an orchestrator for Docker containers. This article uses Minikube for simplicity. If you have not yet installed Minikube, please follow sections A and B of this post, https://mysqlsg.blogspot.com/2020/05/deploying-mysql-innodb-cluster-on.html, before continuing with this article.
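All kubectl commands in this article target the mysql-cluster namespace. If that namespace does not already exist in your Minikube (for example, because you only skimmed the post above), create it first; this is a minimal sketch assuming the default Minikube context:
$ kubectl create namespace mysql-cluster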
Next, you need an NDB Cluster Docker image. If you don't have one, run the command below to pull the image from Docker Hub and store it on your Minikube.
$ minikube ssh -- docker pull mysql/mysql-cluster
Then, run the following command to check whether the NDB Cluster image was downloaded successfully.
$ minikube ssh -- docker images | grep mysql-cluster
The above command should return "mysql/mysql-cluster   latest".
C. Pre-requisites for Management Node
To deploy a management node, you need a config.ini. config.ini is the file that contains all the cluster parameters required for starting the data nodes and SQL nodes. You can use the sample below as a starting point (please name the file config_ini.yaml):
[ndbd default]
NoOfReplicas=2
DataMemory=98M
[ndb_mgmd]
NodeId=1
HostName=mgmt-0
DataDir=/var/lib/mysql
[ndbd]
HostName=dataa-0
NodeId=2
DataDir=/var/lib/mysql
ServerPort=2202
[ndbd]
HostName=datab-0
NodeId=3
DataDir=/var/lib/mysql
ServerPort=2202
[mysqld]
[mysqld]
Run the following command to upload config_ini.yaml onto Kubernetes as a configMap named config-ini.
$ kubectl -n mysql-cluster create configmap config-ini --from-file=config_ini.yaml
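If you want to confirm that the configMap was created with the expected content before moving on, you can inspect it; this check is optional and not part of the original steps:
$ kubectl -n mysql-cluster describe configmap config-ini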
The config-ini configMap has to be mounted as /etc/config.ini on the management node; see section F for the detailed manifest in YAML format. To start the ndb_mgmd daemon, the YAML for the management node in section F contains:
"sleep 30; ndb_mgmd --config-file=/etc/config.ini --config-dir=/home; while true; do sleep 1; done;"
The "sleep 30" in the command argument pauses the pod to ensure the ndb_mgmd daemon on the management node starts only after all the data node pods are available. All data node pods must be reachable before ndb_mgmd is started; otherwise the management node will fail to start.
Although a management node does not need to persist data, I recommend using a StatefulSet with replicas=1 to deploy the management node. We set replicas=1 because the management node does not need to be redundant: its high availability can be provided by Kubernetes. If the management node crashes, Kubernetes will automatically start a new management node pod for the NDB Cluster.
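You can verify this self-healing behavior later, once the cluster from section F is running, by deleting the management pod and watching Kubernetes recreate it; this is an optional check, not one of the original steps:
$ kubectl -n mysql-cluster delete pod mgmt-0
$ kubectl -n mysql-cluster get pod mgmt-0 -w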
D. Pre-requisites for Data Node
As with the management node, to start a data node we need an option file that tells the data node the hostname of the management node. The following is an example of the option file required by a data node (please name the file datanode_ini.yaml):
[mysqld]
ndbcluster
ndb-connectstring=mgmt-0
Run the following command to upload datanode_ini.yaml as a configMap named datanode.
$ kubectl -n mysql-cluster create configmap datanode --from-file=datanode_ini.yaml
This configMap has to be mounted as /etc/datanode.cnf (see section F for details). To start the ndbd daemon, the YAML uses the following command:
"sleep 60; ndbd --defaults-file=/etc/datanode.cnf --ndb-connectstring=mgmt-0; while true; do sleep 1; done;"
The "sleep 60" in the command argument pauses the pod to ensure the ndbd daemon on the data node starts after the management node is up.
Although a data node stores data in memory, the LDM (Local Data Manager) threads write copies (snapshots) of the data to disk at every regular local checkpoint (LCP) to provide durability. An LDM is responsible for tuple storage, hash and T-tree indexes, page buffer and tablespace management, writing and restoring local checkpoints, and Disk Data log management. Thus, I recommend using a StatefulSet with a PV/PVC for physical disk storage. However, in this article I do not use a PV/PVC, for simplicity.
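If you do want durable storage for the checkpoint and log files, a StatefulSet can request it through volumeClaimTemplates. The fragment below is only a sketch of how the dataa StatefulSet from section F might be extended (apply the same to datab); the volume name, storage size, and omission of a storage class are assumptions to adapt to your cluster:
# sketch: additions to the dataa StatefulSet from section F
spec:
  template:
    spec:
      containers:
      - name: mysql
        volumeMounts:
        - name: ndb-data              # new, PVC-backed volume (hypothetical name)
          mountPath: /var/lib/mysql   # matches DataDir in config.ini
  volumeClaimTemplates:               # added at the StatefulSet spec level
  - metadata:
      name: ndb-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi                # assumed size; adjust to your data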
To provide redundancy, each data node within a data node group should have (at least) one replica. So I recommend two StatefulSets, each with replicas=1, for a single data node group. If we want to expand to two or more data node groups, we just need to set replicas equal to the number of data node groups. In theory, expanding the number of data node groups is used to store databases that are larger than the memory capacity of one data node, and table data can then be sharded across data node groups for a performance gain. See the diagram below.
Data Nodes on Kubernetes with single node group vs multi node groups
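As an illustration, with NoOfReplicas=2 each pair of [ndbd] sections in config.ini forms one node group, so a second node group simply means two more [ndbd] entries (plus matching pods and Services). The fragment below is a sketch only; the dataa-1 / datab-1 hostnames and node IDs are assumptions that would correspond to running each data node StatefulSet with replicas=2:
[ndbd]
HostName=dataa-1
NodeId=4
DataDir=/var/lib/mysql
ServerPort=2202
[ndbd]
HostName=datab-1
NodeId=5
DataDir=/var/lib/mysql
ServerPort=2202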
E. Pre-requisites for SQL Node
To deploy an SQL node, you need a proper option file. Use the following sample as a starting point (file name: my_cnf.yaml) and deploy it as a configMap on Kubernetes.
[mysqld]
ndbcluster
ndb-connectstring=mgmt-0
user=mysql
Run this command to deploy:
$ kubectl -n mysql-cluster create configmap my-cnf --from-file=my_cnf.yaml
To spin up mysqld on Kubernetes, we need a Secret to hold the root password. Use the YAML below (file name: secret.yaml) to set the root password to "root".
apiVersion: v1
kind: Secret
metadata:
  name: mysql-root-password
  namespace: mysql-cluster
type: Opaque
data:
  password: cm9vdA==
Run this command to deploy:
$ kubectl apply -f secret.yaml
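The password value in a Secret's data field must be base64-encoded; cm9vdA== is simply "root" encoded that way. If you want a different password, you can generate the value like this (a hint, not part of the original steps):
$ echo -n 'root' | base64
cm9vdA==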
In contrast to InnoDB Cluster, mysqld in an NDB Cluster does not have to store data, because all data is intended to live on the data nodes. Because the SQL nodes can be stateless, I recommend deploying them on Kubernetes as a Deployment.
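A stateless Deployment also makes scaling the SQL tier simple. Keep in mind, though, that every SQL node needs a free [mysqld] slot in config.ini (the sample in section C defines two), so before scaling past two you would add another [mysqld] section and restart the management node. The command below is only a sketch of the scaling step itself:
$ kubectl -n mysql-cluster scale deployment mysql-cluster --replicas=3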
F. NDB Cluster YAML File
At this point, your preparation is complete and you are ready to deploy your first NDB Cluster on Kubernetes! All you need now is a YAML file. Copy the text below, paste it into a text file, and save it as all_cluster.yaml. You can use this file as a baseline YAML and freely modify it to meet your needs.
---
apiVersion: v1
kind: Service
metadata:
  name: mgmt-0
  namespace: mysql-cluster
spec:
  ports:
  - name: mgmtport
    port: 1186
    targetPort: 1186
  - name: pingport
    port: 7
    targetPort: 7
  selector:
    statefulset.kubernetes.io/pod-name: mgmt-0
  clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
  name: dataa-0
  namespace: mysql-cluster
spec:
  ports:
  - name: dataport
    port: 2202
    targetPort: 2202
  - name: pingport
    port: 7
    targetPort: 7
  selector:
    statefulset.kubernetes.io/pod-name: dataa-0
  clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
  name: datab-0
  namespace: mysql-cluster
spec:
  ports:
  - name: dataport
    port: 2202
    targetPort: 2202
  - name: pingport
    port: 7
    targetPort: 7
  selector:
    statefulset.kubernetes.io/pod-name: datab-0
  clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-cluster
  namespace: mysql-cluster
  labels:
    app: mysql-cluster
spec:
  ports:
  - name: tcp-rw
    port: 3306
    targetPort: 3306
  selector:
    app: mysql-cluster
  type: LoadBalancer
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mgmt
  namespace: mysql-cluster
spec:
  serviceName: "mgmt"
  replicas: 1
  selector:
    matchLabels:
      app: mgmt
  template:
    metadata:
      labels:
        app: mgmt
    spec:
      containers:
      - image: mysql/mysql-cluster:latest
        name: mysql
        volumeMounts:
        - name: mysql-configmap-volume
          mountPath: /etc/config.ini
          subPath: config.ini
        command: ["/bin/sh"]   # do not modify
        args: ["-c", "sleep 30; ndb_mgmd --config-file=/etc/config.ini --config-dir=/home; while true; do sleep 1; done;"]   # do not modify
      volumes:
      - name: mysql-configmap-volume
        configMap:
          name: config-ini
          items:
          - key: config_ini.yaml
            path: config.ini
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dataa
  namespace: mysql-cluster
spec:
  serviceName: "dataa"
  replicas: 1
  selector:
    matchLabels:
      app: dataa
  template:
    metadata:
      labels:
        app: dataa
    spec:
      containers:
      - image: mysql/mysql-cluster:latest
        name: mysql
        volumeMounts:
        - name: mysql-configmap-volume
          mountPath: /etc/datanode.cnf
          subPath: datanode.cnf
        command: ["/bin/sh", "-c", "sleep 60; ndbd --defaults-file=/etc/datanode.cnf --ndb-connectstring=mgmt-0; while true; do sleep 1; done;"]
      volumes:
      - name: mysql-configmap-volume
        configMap:
          name: datanode
          items:
          - key: datanode_ini.yaml
            path: datanode.cnf
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: datab
  namespace: mysql-cluster
spec:
  serviceName: "datab"
  replicas: 1
  selector:
    matchLabels:
      app: datab
  template:
    metadata:
      labels:
        app: datab
    spec:
      containers:
      - image: mysql/mysql-cluster:latest
        name: mysql
        volumeMounts:
        - name: mysql-configmap-volume
          mountPath: /etc/datanode.cnf
          subPath: datanode.cnf
        command: ["/bin/sh", "-c", "sleep 60; ndbd --defaults-file=/etc/datanode.cnf --ndb-connectstring=mgmt-0; while true; do sleep 1; done;"]
      volumes:
      - name: mysql-configmap-volume
        configMap:
          name: datanode
          items:
          - key: datanode_ini.yaml
            path: datanode.cnf
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql-cluster
  namespace: mysql-cluster
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mysql-cluster
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql-cluster
    spec:
      hostname: mysql-cluster
      containers:
      - image: mysql/mysql-cluster:latest
        name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-root-password
              key: password
        volumeMounts:
        - name: mysql-configmap-volume
          mountPath: /etc/my.cnf
          subPath: my.cnf
      volumes:
      - name: mysql-configmap-volume
        configMap:
          name: my-cnf
          items:
          - key: my_cnf.yaml
            path: my.cnf
G. Deploying NDB Cluster YAML
Simply run the following command and wait about a minute for the cluster to be ready:
$ kubectl apply -f all_cluster.yaml
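If you prefer not to guess at the timing, you can watch the pods come up, or block until they are all Ready; these checks are optional additions:
$ kubectl -n mysql-cluster get pods -w
$ kubectl -n mysql-cluster wait --for=condition=Ready pod --all --timeout=300s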
H. Testing NDB Cluster
After about a minute, run the following command to see whether all pods are running:
goldfish.local:~/webinar/cge-k8s $ kubectl -n mysql-cluster get pod
NAME                             READY   STATUS    RESTARTS   AGE
dataa-0                          1/1     Running   0          68s
datab-0                          1/1     Running   0          68s
mgmt-0                           1/1     Running   0          69s
mysql-cluster-796d8b4d78-dwn9j   1/1     Running   0          68s
mysql-cluster-796d8b4d78-n6n4l   1/1     Running   0          68s
goldfish.local:~/webinar/cge-k8s $
Check cluster status from management node:
goldfish.local:~/webinar/cge-k8s $ kubectl -n mysql-cluster exec -it mgmt-0 -- ndb_mgm -c localhost -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @172.17.0.12  (mysql-5.7.30 ndb-7.6.14, Nodegroup: 0, *)
id=3    @172.17.0.15  (mysql-5.7.30 ndb-7.6.14, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @172.17.0.16  (mysql-5.7.30 ndb-7.6.14)

[mysqld(API)]   2 node(s)
id=4    @172.17.0.14  (mysql-5.7.30 ndb-7.6.14)
id=5    @172.17.0.13  (mysql-5.7.30 ndb-7.6.14)

goldfish.local:~/webinar/cge-k8s $
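Beyond show, the ndb_mgm client offers other quick health checks that may be handy here (optional, not part of the original walkthrough), for example:
$ kubectl -n mysql-cluster exec -it mgmt-0 -- ndb_mgm -c localhost -e "all status"
$ kubectl -n mysql-cluster exec -it mgmt-0 -- ndb_mgm -c localhost -e "all report memoryusage"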
Now you can play with your cluster. The following is an example.
1. Create an NDB table from SQL node 1 and insert three records
goldfish.local:~/webinar/cge-k8s $ kubectl -n mysql-cluster exec -it mysql-cluster-796d8b4d78-dwn9j -- mysql -uroot -h127.0.0.1 -proot -e "create database test; create table test.test (i int) engine=ndb; insert into test.test values (1); insert into test.test values (2); insert into test.test values (3); select * from test.test;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------+
| i    |
+------+
|    1 |
|    3 |
|    2 |
+------+
goldfish.local:~/webinar/cge-k8s $
2. Query that data from SQL node 2
goldfish.local:~/webinar/cge-k8s $ kubectl -n mysql-cluster exec -it mysql-cluster-796d8b4d78-n6n4l -- mysql -uroot -h127.0.0.1 -proot -e "select * from test.test;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------+
| i    |
+------+
|    1 |
|    3 |
|    2 |
+------+
goldfish.local:~/webinar/cge-k8s $
The NDB Cluster works!
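The manifest in section F also exposes the SQL nodes through a LoadBalancer Service on port 3306, which the examples above bypass by exec-ing into the pods. On Minikube you can reach that Service from your workstation too; an optional way to get a URL for it (assuming Minikube's service tunneling works in your environment) is:
$ minikube service mysql-cluster -n mysql-cluster --url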
I. Future Article on NDB Cluster
I will write a subsequent article about running NDB Cluster on Kubernetes. That article will discuss ways to expand the data node groups online, without shutting down the cluster, and explain how this can be done on Kubernetes. It will also discuss a hybrid deployment, where the data nodes are deployed on VMs / bare metal (outside the Kubernetes cluster).
Disclaimer:
The methods and tricks presented here are experimental only, and it is your own responsibility to test them, implement them, and provide support in case of issues. They are shown as examples and are not meant for a production deployment, as this is not a formally supported configuration. I encourage more testing to be done, including by the development team. An implementation with as few layers as possible is recommended, along with support from Oracle Support for any real-world implementation serving a business application.
Stay tuned, thank you!