Greenplum Database – Setup and Configuration

In this tutorial we will see how to set up a Pivotal Greenplum 6 database cluster on Linux 6 (RHEL/CentOS 6) with the following layout.

How to Install, Create and Configure Greenplum database 6.12 in Linux – Introduction

IP Address      Hostname     Implementation-1     Implementation-2
192.168.1.51    mnode        master
192.168.1.52    segnode01    gpseg0 (primary)     gpseg1 (mirror), master standby
192.168.1.53    segnode02    gpseg1 (primary)     gpseg0 (mirror)

[Figure: Greenplum architecture]

Let us first introduce the terms used above:

Greenplum Database is a massively parallel processing (MPP) database server with an architecture specially designed to manage large-scale analytic data warehouses and business intelligence workloads.

MPP (also known as a shared nothing architecture) refers to systems with two or more processors that cooperate to carry out an operation, each processor with its own memory, operating system and disks.

Greenplum uses this high-performance system architecture to distribute the load of multi-terabyte data warehouses, and can use all of a system’s resources in parallel to process a query.

Greenplum Master

The Greenplum Database master is the entry to the Greenplum Database system, accepting client connections and SQL queries, and distributing work to the segment instances.

Greenplum Database end-users interact with Greenplum Database (through the master) as they would with a typical PostgreSQL database.

Master Mirroring/Standby Master

You can deploy a backup or mirror of the master instance on a separate host machine or on the same host machine. A backup master or standby master serves as a warm standby if the primary master becomes nonoperational.

Greenplum Segments

Greenplum Database segment instances are independent PostgreSQL databases that each store a portion of the data and perform the majority of query processing.

When a user connects to the database via the Greenplum master and issues a query, processes are created in each segment database to handle the work of that query.

Segment Mirroring

When Greenplum Database High Availability is enabled, there are two types of segments: primary and mirror.

Each primary segment has one corresponding mirror segment.

A primary segment receives requests from the master to make changes to the segment’s database and then replicates those changes to the corresponding mirror.

If a primary segment becomes unavailable, database queries fail over to the mirror segment.

Segment mirroring employs a physical file replication scheme—data file I/O at the primary is replicated to the secondary so that the mirror’s files are identical to the primary’s files.

In our example, we are going to have the following setup:

IP Address      Hostname     Implementation-1     Implementation-2
192.168.1.51    mnode        master
192.168.1.52    segnode01    gpseg0 (primary)     gpseg1 (mirror), master standby
192.168.1.53    segnode02    gpseg1 (primary)     gpseg0 (mirror)
  • We use 192.168.1.51 as the master and 192.168.1.52 as the standby master.
  • We use 192.168.1.52 for segment gpseg0 (primary) and 192.168.1.53 for its mirror (replication).
  • We use 192.168.1.53 for segment gpseg1 (primary) and 192.168.1.52 for its mirror (replication).

We will divide the installation into three steps

  1. Pre-requisites installation or preparing the environment
  2. Greenplum software installation
  3. Configuring database setup

How to Install, Create and Configure Greenplum database in Linux : Pre-requisites

Perform the following tasks in order:

1. Make sure your host systems meet the requirements described in Platform Requirements.

The following is the configuration that I used on each of my machines.

CPU        x86_64
Memory     2 GB
Hard Disk  20 GB
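If you want to confirm that a host meets these minimums before proceeding, a quick check with standard Linux tools (assuming a stock RHEL/CentOS 6 install) could look like this:

uname -m    # architecture, should print x86_64
free -m     # total memory in MB
df -h       # available disk space per filesystem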

2. Disable SELinux and firewall software.

If SELinux is not disabled, disable it by editing the /etc/selinux/config file.

SELINUX=disabled

Disable the iptables firewall service (as the root user):

/sbin/chkconfig iptables off

/sbin/chkconfig --list iptables
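Editing /etc/selinux/config only takes effect after a reboot. To check the current SELinux mode and turn enforcement and the firewall off for the running session as well (as root), a sketch using the standard RHEL/CentOS 6 commands:

getenforce              # prints Enforcing, Permissive or Disabled
setenforce 0            # switch to permissive mode until the next reboot
service iptables stop   # also stop the running firewall immediately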

3. Set the required operating system parameters.

Set the parameters in the /etc/sysctl.conf file and reload with sysctl -p:

# kernel.shmall = _PHYS_PAGES / 2 # See Shared Memory Pages
kernel.shmall = 197951838
# kernel.shmmax = kernel.shmall * PAGE_SIZE
kernel.shmmax = 810810728448
kernel.shmmni = 4096
vm.overcommit_memory = 2 # See Segment Host Memory
vm.overcommit_ratio = 95 # See Segment Host Memory
net.ipv4.ip_local_port_range = 10000 65535 # See Port Settings
kernel.sem = 500 2048000 200 4096
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_max_syn_backlog = 4096

net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.swappiness = 10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_background_ratio = 0 # See System Memory
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736
vm.dirty_bytes = 4294967296

fs.file-max = 65536

/sbin/sysctl -p
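The kernel.shmall and kernel.shmmax values above follow the formulas shown in the comments. On your own hosts you can compute suitable values from the physical page count and page size, for example:

# kernel.shmall = half the physical pages
echo $(( $(getconf _PHYS_PAGES) / 2 ))
# kernel.shmmax = kernel.shmall * page size
echo $(( $(getconf _PHYS_PAGES) / 2 * $(getconf PAGE_SIZE) ))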

4. Set the security limits in /etc/security/limits.conf

* soft nofile 524288
* hard nofile 524288
* soft nproc 131072
* hard nproc 131072
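After logging out and back in, you can verify that the new limits are picked up for your session with ulimit:

ulimit -n   # max open files, should report 524288
ulimit -u   # max user processes, should report 131072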

5. Synchronize system clocks.

Ensure that the system clocks are synchronized on all the machines.
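One simple way to keep the clocks in sync on RHEL/CentOS 6 (assuming the ntp package from the standard repositories) is to run ntpd on every host:

yum install -y ntp
service ntpd start
chkconfig ntpd on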

6. Create the gpadmin account.

groupadd gpadmin
useradd gpadmin -r -m -g gpadmin
passwd gpadmin

Download Greenplum database software

wget https://github.com/greenplum-db/gpdb/releases/download/6.12.1/greenplum-db-6.12.1-rhel6-x86_64.rpm

chmod 755 greenplum-db-6.12.1-rhel6-x86_64.rpm

Install the rpm

rpm -ivh greenplum-db-6.12.1-rhel6-x86_64.rpm

In my case, the install reported some failed dependencies, so I installed the missing packages one by one.

yum install -y krb5-devel
yum install -y libyaml
yum install -y libevent2

Then I installed the Greenplum software:

[root@mnode ~]#
[root@mnode ~]# rpm -ivh greenplum-db-6.12.1-rhel6-x86_64.rpm
Preparing… ########################################### [100%]
1:greenplum-db-6 ########################################### [100%]
[root@mnode ~]#
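To confirm that the package was registered and see where the software landed (the 6.x rpm installs under /usr/local by default), you can check with rpm and ls:

rpm -qa | grep greenplum-db
ls -ld /usr/local/greenplum-db*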

Greenplum database software – Post Installation steps

  • Stop the virtual machine and clone it to two more machines.
  • Set IP addresses as mentioned above

Now your setup should look like this:

IP Address      Hostname
192.168.1.51    mnode
192.168.1.52    segnode01
192.168.1.53    segnode02

Edit the /etc/hosts file by adding the three entries below, and ensure the same entries are present on all three machines.

192.168.1.51 mnode mnode
192.168.1.52 segnode01 segnode01
192.168.1.53 segnode02 segnode02

Set up passwordless SSH connectivity between the three machines.

su gpadmin
ssh-keygen -t rsa -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/home/gpadmin/.ssh/id_rsa):
Created directory '/home/gpadmin/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Source the path file in the Greenplum Database installation directory.

echo "source /usr/local/greenplum-db/greenplum_path.sh" >> .bashrc
echo "export MASTER_DATA_DIRECTORY=/data/master/gpseg-1" >> .bashrc
source .bashrc
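A quick way to confirm the environment was sourced correctly is to check that the Greenplum utilities are on the PATH and that the variable is set:

which gpssh                   # should resolve under /usr/local/greenplum-db
echo $MASTER_DATA_DIRECTORY   # should print /data/master/gpseg-1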

Use the ssh-copy-id command to add the gpadmin user's public key to the authorized_keys SSH file on every other host in the cluster.

ssh-copy-id segnode01
ssh-copy-id segnode02

Repeat the steps on all the machines so that you have passwordless connectivity between all three machines.

In the gpadmin home directory, create a file named hostfile_exkeys that contains the configured host names and host addresses (interface names) of every host in your Greenplum system:

[gpadmin@mnode ~]$ cat hostfile_exkeys
mnode
segnode01
segnode02

Run the gpssh-exkeys utility with your hostfile_exkeys file to enable n-n passwordless SSH for the gpadmin user.

gpssh-exkeys -f hostfile_exkeys

Confirm your installation on all three machines:

gpssh -f hostfile_exkeys -e 'ls -l /usr/local/greenplum-db-<version>'
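For the version used in this tutorial, that would be, for example:

gpssh -f hostfile_exkeys -e 'ls -l /usr/local/greenplum-db-6.12.1'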

Creating data directories for the master and segment nodes (as root)

Create the master data directory on the master node and on segnode01 (for the master standby).

(as root)

mkdir -p /data/master
chown -R gpadmin:gpadmin /data

(as gpadmin user)

source /usr/local/greenplum-db/greenplum_path.sh
gpssh -h segnode01 -e 'mkdir -p /data/master'
gpssh -h segnode01 -e 'chown gpadmin:gpadmin /data/master'

Creating Data Storage Areas on Segment Hosts

Create a file named hostfile_gpssh_segonly that lists all the segment nodes:

[gpadmin@mnode ~]$ cat hostfile_gpssh_segonly
segnode01
segnode02
[gpadmin@mnode ~]$

Using gpssh, create the primary and mirror data directory locations on all segment hosts at once using the hostfile_gpssh_segonly file you just created.

source /usr/local/greenplum-db/greenplum_path.sh
gpssh -f hostfile_gpssh_segonly -e 'mkdir -p /data/primary'
gpssh -f hostfile_gpssh_segonly -e 'mkdir -p /data/mirror'
gpssh -f hostfile_gpssh_segonly -e 'chown -R gpadmin /data/*'

Initializing a Greenplum Database System

Create the Greenplum Database configuration file. Create a file named gpinitsystem_config with the following configuration:

ARRAY_NAME="Greenplum"
SEG_PREFIX=gpseg
PORT_BASE=50000
declare -a DATA_DIRECTORY=(/data/primary)
MASTER_HOSTNAME=mnode
MASTER_DIRECTORY=/data/master
MASTER_PORT=5432
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
MACHINE_LIST_FILE=/home/gpadmin/hostfile_gpssh_segonly
MIRROR_PORT_BASE=7000
declare -a MIRROR_DATA_DIRECTORY=(/data/mirror)

Run the initialization utility

gpinitsystem -c ~/gpinitsystem_config -s segnode01

  • Answer yes (Y) when prompted.

Create a database

psql -c "create database gpadmin" template1
psql -c "alter user gpadmin password 'gpadmin'"
echo "host all all 0.0.0.0/0 md5" >> /data/master/gpseg-1/pg_hba.conf
gpstop -u
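After the configuration is reloaded with gpstop -u, you can verify that remote password logins work. This is a sketch assuming psql is available on the client machine and that port 5432 on the master is reachable:

psql -h mnode -p 5432 -U gpadmin -d gpadmin -c 'select version();'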

Check the setup with the following commands.

To see the mirror segments in the system, their status, and the primary-to-mirror segment mappings:
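The gpstate management utility reports this; run it as gpadmin on the master. The sample output below corresponds to the mapping report:

gpstate -m   # list the mirror segments and their status
gpstate -c   # show the primary-to-mirror segment mappings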

[INFO]:-Obtaining Segment details from master…
[INFO]:————————————————————–
[INFO]:–Current GPDB mirror list and status
[INFO]:–Type = Spread
[INFO]:————————————————————–
[INFO]:-   Status                             Data State     Primary     Datadir                Port    Mirror      Datadir               Port
[INFO]:-   Primary Active, Mirror Available   Synchronized   segnode01   /data/primary/gpseg0   50000   segnode02   /data/mirror/gpseg0   7000
[INFO]:-   Primary Active, Mirror Available   Synchronized   segnode02   /data/primary/gpseg1   50000   segnode01   /data/mirror/gpseg1   7000
[INFO]:————————————————————–

To see the status of the standby master mirror:
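The same utility covers this as well:

gpstate -f   # details of the standby master, including its sync status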

 

Thank you for taking the time to read this article. I hope the content served your purpose.
Suggestions for improving the blog are highly appreciated; please contact us with any information, suggestions, or feedback.

If you want to be updated with all our articles, please follow us on Facebook and Twitter, and subscribe to our newsletter.

 
