Exercises 9.3-9.5 Create different types of DNS routing policies using AWS Route 53

The exercises in this chapter are well designed and very straightforward. By completing them hands-on, you will get a clear understanding of routing policies for both public and private hosted zones.

Note: there are a few typos in the exercise descriptions, such as “create a record set with type set to developer,” where “type” should be “name”. Don’t let them confuse you.

9.3 Create an Alias A Record with a Simple Routing Policy

9.4 Create a Latency Routing Policy (Delete one of the Latency-based record sets to change the policy from Latency-based to Weighted)

9.5 Create a Hosted Zone for Amazon Virtual Private Cloud (Amazon VPC)

 

Below is a screenshot of the record set configuration of the hosted zone in AWS Route 53 after finishing Exercises 9.3 and 9.4.

[Screenshot: c9]

Below is a screenshot of the hosted zone configuration in AWS Route 53 after finishing Exercise 9.5.

[Screenshot: c9_2]

Below is a screenshot of the hosted zones after finishing Exercises 9.3/9.4/9.5. Note that there are two hosted zones, one public and one private.

[Screenshot: 2018-01-17_130122]
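For reference, these record sets can also be created with the AWS CLI. Below is a minimal sketch of creating a latency-based alias A record; the hosted zone IDs, domain name, ELB DNS name, and file name are placeholders for illustration, not values from the exercises.

aws route53 change-resource-record-sets --hosted-zone-id Z1EXAMPLEZONE --change-batch file://latency-record.json

latency-record.json:

{
  "Changes": [
    {
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "www.example.com",
        "Type": "A",
        "SetIdentifier": "us-east-1-latency",
        "Region": "us-east-1",
        "AliasTarget": {
          "HostedZoneId": "Z2EXAMPLEELBZONE",
          "DNSName": "my-elb-1234567890.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}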


Exercise 7.5 Launch a Redshift Cluster

Key steps are listed below, along with the official document reference at the end.

    1. Download and install Java 8 JRE.
    2. Download the Amazon Redshift JDBC Driver.
    3. Download SQL Workbench.
    4. Click the SQL Workbench executable and create the database connection. Note: you must configure the Amazon Redshift JDBC driver by choosing the downloaded driver location in the “Manage Drivers” dialog.
    5. Load sample data from Amazon S3 by using the COPY command. Note: the load files and commands can all be found in the official doc “Load Sample Data from S3”.
      a. Create an IAM role. Choose the two permission policies “AmazonS3ReadOnlyAccess” and “AWSGlueConsoleFullAccess”.
      b. Attach the IAM role to the Redshift cluster.
      c. Run the CREATE TABLE and COPY commands, authenticating by referencing the IAM role you created, for example:

      copy users from 's3://awssampledbuswest2/tickit/allusers_pipe.txt' 
      credentials 'aws_iam_role=' 
      delimiter '|' region 'us-west-2';
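For step 5b, the role can also be attached from the CLI; the cluster identifier and role ARN below are placeholders, not values from my environment:

aws redshift modify-cluster-iam-roles --cluster-identifier examplecluster --add-iam-roles arn:aws:iam::123456789012:role/myRedshiftRole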

 

Reference Doc:

Chapter 6 AWS IAM exercises

Some useful commands for completing the exercises in this chapter are noted below:


--List available regions in default JSON format

C:\Users\Betty>aws ec2 describe-regions
{
    "Regions": [
        {
            "Endpoint": "ec2.ap-south-1.amazonaws.com",
            "RegionName": "ap-south-1"
        },
        {
            "Endpoint": "ec2.eu-west-2.amazonaws.com",
            "RegionName": "eu-west-2"
        },
        ...
    ]
}

--List available regions in text format

C:\Users\Betty>aws ec2 describe-regions --output text
REGIONS ec2.ap-south-1.amazonaws.com ap-south-1
REGIONS ec2.eu-west-2.amazonaws.com eu-west-2
REGIONS ec2.eu-west-1.amazonaws.com eu-west-1
REGIONS ec2.ap-northeast-2.amazonaws.com ap-northeast-2
REGIONS ec2.ap-northeast-1.amazonaws.com ap-northeast-1
REGIONS ec2.sa-east-1.amazonaws.com sa-east-1
REGIONS ec2.ca-central-1.amazonaws.com ca-central-1
REGIONS ec2.ap-southeast-1.amazonaws.com ap-southeast-1
REGIONS ec2.ap-southeast-2.amazonaws.com ap-southeast-2
REGIONS ec2.eu-central-1.amazonaws.com eu-central-1
REGIONS ec2.us-east-1.amazonaws.com us-east-1
REGIONS ec2.us-east-2.amazonaws.com us-east-2
REGIONS ec2.us-west-1.amazonaws.com us-west-1
REGIONS ec2.us-west-2.amazonaws.com us-west-2

--Show the version of AWS CLI

C:\Users\Betty>aws --version
aws-cli/1.14.2 Python/2.7.9 Windows/8 botocore/1.8.6

--Configure the CLI to use multiple access keys (profiles). Disable and delete the old access keys at the end.

C:\Users\Betty>aws configure --profile administrator
AWS Access Key ID [None]: AKIAID54XROS2XNB2N4Q
AWS Secret Access Key [None]: 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
Default region name [None]: us-east-1
Default output format [None]: text

C:\Users\Betty>aws s3api list-buckets --profile administrator
BUCKETS 2017-10-03T00:08:45.000Z fairybetty-apsoutheast2
BUCKETS 2017-10-03T00:11:34.000Z fairybetty-euwest4
BUCKETS 2017-10-03T02:53:11.000Z fairybetty-myserverlesswebsite
BUCKETS 2017-10-19T05:02:40.000Z fairybetty-sharedbucket
BUCKETS 2017-10-03T00:05:24.000Z fairybetty-useast1
BUCKETS 2017-10-05T01:46:24.000Z fb-pollyaudiofiles
BUCKETS 2017-10-05T01:45:24.000Z fb-pollywebsite
BUCKETS 2017-11-03T04:24:15.000Z mynewbucket-huang-20171102
OWNER suya.huang 8864937e19554b4efa96371e3ad5c514d186faa25d790221f908fa0c8448930e


C:\Users\Betty>aws configure --profile administrator_new
AWS Access Key ID [None]: AKIAIGM4K5MPYUYWQPRQ
AWS Secret Access Key [None]: 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
Default region name [None]: us-east-1
Default output format [None]: text

C:\Users\Betty>aws s3api list-buckets --profile administrator_new
BUCKETS 2017-10-03T00:08:45.000Z fairybetty-apsoutheast2
BUCKETS 2017-10-03T00:11:34.000Z fairybetty-euwest4
BUCKETS 2017-10-03T02:53:11.000Z fairybetty-myserverlesswebsite
BUCKETS 2017-10-19T05:02:40.000Z fairybetty-sharedbucket
BUCKETS 2017-10-03T00:05:24.000Z fairybetty-useast1
BUCKETS 2017-10-05T01:46:24.000Z fb-pollyaudiofiles
BUCKETS 2017-10-05T01:45:24.000Z fb-pollywebsite
BUCKETS 2017-11-03T04:24:15.000Z mynewbucket-huang-20171102
OWNER suya.huang 8864937e19554b4efa96371e3ad5c514d186faa25d790221f908fa0c8448930e

C:\Users\Betty>aws s3api list-buckets --profile administrator

An error occurred (InvalidAccessKeyId) when calling the ListBuckets operation: The AWS Access Key Id you provided does not exist in our records.
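The last call fails because the old access key has been disabled and deleted by then. A minimal sketch of the IAM commands to do that (assuming the IAM user is named administrator; adjust --user-name and the key ID to your own):

aws iam update-access-key --user-name administrator --access-key-id AKIAID54XROS2XNB2N4Q --status Inactive
aws iam delete-access-key --user-name administrator --access-key-id AKIAID54XROS2XNB2N4Q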

Install and Configure AWS CLI

Before you can use the AWS CLI from your Windows client machine, you need to install and configure it. Refer to the official docs for details:

Install the AWS Command Line Interface on Microsoft Windows

Configuring the AWS CLI

Key steps are listed below:

  • Download the appropriate MSI installer.

Download the AWS CLI MSI installer for Windows (64-bit)

  • Run the downloaded MSI installer.
  • Follow the instructions that appear.

The CLI installs to C:\Program Files\Amazon\AWSCLI (64-bit) by default.

  • To confirm the installation, use the aws --version command at a command prompt.

C:\Users\cat>aws --version
aws-cli/1.14.2 Python/2.7.9 Windows/8 botocore/1.8.6

  • Configure the AWS CLI

List available regions


C:\Users\cat>aws ec2 describe-regions --output text
REGIONS ec2.ap-south-1.amazonaws.com ap-south-1
REGIONS ec2.eu-west-2.amazonaws.com eu-west-2
REGIONS ec2.eu-west-1.amazonaws.com eu-west-1
REGIONS ec2.ap-northeast-2.amazonaws.com ap-northeast-2
REGIONS ec2.ap-northeast-1.amazonaws.com ap-northeast-1
REGIONS ec2.sa-east-1.amazonaws.com sa-east-1
REGIONS ec2.ca-central-1.amazonaws.com ca-central-1
REGIONS ec2.ap-southeast-1.amazonaws.com ap-southeast-1
REGIONS ec2.ap-southeast-2.amazonaws.com ap-southeast-2
REGIONS ec2.eu-central-1.amazonaws.com eu-central-1
REGIONS ec2.us-east-1.amazonaws.com us-east-1
REGIONS ec2.us-east-2.amazonaws.com us-east-2
REGIONS ec2.us-west-1.amazonaws.com us-west-1
REGIONS ec2.us-west-2.amazonaws.com us-west-2

Open the access key file you downloaded earlier when creating the IAM user, and copy and paste the Access Key ID and Secret Access Key into the command prompt as below.


C:\Users\cat>aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: us-east-1
Default output format [None]: text

Now you should be able to use the CLI from your Windows client machine. Note that you need to attach appropriate policies to the IAM user to view or edit resource information.
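For example, a managed read-only policy could be attached from the CLI as below (a sketch; the user name is a placeholder):

aws iam attach-user-policy --user-name my-cli-user --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess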


 

Exercise 5.5 Create a Scaling Policy

  1. Create an Amazon CloudWatch metric and alarm for CPU utilization using the AWS Management Console.
  2. Using the Auto Scaling group from Exercise 5.4, edit the Auto Scaling group to include a policy that uses the CPU utilization alarm.
  3. Drive up CPU utilization on the monitored Amazon EC2 instances to observe Auto Scaling.

Firstly, create an Auto Scaling launch configuration.

[Screenshot: asg_c1]

Secondly, create an Auto Scaling group using the launch configuration created earlier.

[Screenshot: asg1]

[Screenshot: asg2]

Do not configure a scaling policy at this step, as we are going to create one using the CLI.

[Screenshot: asg3]

[Screenshot: asg4]

[Screenshot: asg5]

Thirdly, create a scaling policy for the Auto Scaling group created above.


C:\Users\Betty>aws autoscaling put-scaling-policy --auto-scaling-group-name MyASG55 --policy-name MYASG55_CPULoadScaleOut --scaling-adjustment 1 --adjustment-type ChangeInCapacity --cooldown 30
{
    "Alarms": [],
    "PolicyARN": "arn:aws:autoscaling:us-east-1:921874900115:scalingPolicy:fb30c7c0-a0b8-4a73-b3ce-37458acb40d0:autoScalingGroupName/MyASG55:policyName/MYASG55_CPULoadScaleOut"
}

Fourthly, create an alarm and associate it with the specified metric. Use the scaling policy ARN as the alarm action.


aws cloudwatch put-metric-alarm --alarm-name capacityAdd --metric-name CPUUtilization --namespace AWS/EC2 --statistic Average --period 60 --threshold 50 --comparison-operator GreaterThanOrEqualToThreshold --dimensions "Name=AutoScalingGroupName,Value=MyASG55" --evaluation-periods 1 --alarm-actions "arn:aws:autoscaling:us-east-1:921874900115:scalingPolicy:fb30c7c0-a0b8-4a73-b3ce-37458acb40d0:autoScalingGroupName/MyASG55:policyName/MYASG55_CPULoadScaleOut"
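Optionally, verify that the alarm was created and points at the scaling policy:

aws cloudwatch describe-alarms --alarm-names capacityAdd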

Fifthly, use a stress testing tool to simulate some workload on the Linux hosts and observe the EC2 instances in the EC2 dashboard. You should be able to see new instances being created and started automatically.
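For example, assuming the instances run Amazon Linux and the stress package is available (e.g. from the EPEL repository), something like the following can generate the load; the worker count and duration are arbitrary:

# install a simple CPU load generator (availability depends on your configured repositories)
sudo yum install -y stress
# keep two CPU workers busy for ten minutes so the CPUUtilization alarm fires
stress --cpu 2 --timeout 600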

 

 

Exercise 5.3 Create a custom Amazon CloudWatch Metric for Memory Consumption

  1. Create a custom Amazon CloudWatch metric for memory consumption.
  2. Use the CLI to PUT values into the metric.

The steps for creating a custom metric can be found in the official AWS documentation. Refer to AWS Documentation » Amazon EC2 » User Guide for Linux Instances » Monitoring Amazon EC2 » “Monitoring Memory and Disk Metrics for Amazon EC2 Linux Instances”.

Below is an excerpt of the steps required to complete the task.

  • Create an IAM user with AWS access type “Programmatic access” and the inline policy below. Download the access key file, which will be needed later.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "cloudwatch:PutMetricData",
 "cloudwatch:GetMetricStatistics",
 "cloudwatch:ListMetrics",
 "ec2:DescribeTags"
 ],
 "Resource": "*"
 }
 ]
}
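For reference, the same user and inline policy could also be created from the CLI; the user name, policy name, and file name below are placeholders for illustration:

# create the user, attach the inline policy saved as inline-policy.json, and generate access keys
aws iam create-user --user-name cw-custom-metrics
aws iam put-user-policy --user-name cw-custom-metrics --policy-name CloudWatchCustomMetrics --policy-document file://inline-policy.json
aws iam create-access-key --user-name cw-custom-metrics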

  • Install the required packages
 yum install perl-Switch perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https -y 
  • Download, install, and configure the monitoring scripts
 curl http://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.1.zip -O

unzip CloudWatchMonitoringScripts-1.2.1.zip
rm CloudWatchMonitoringScripts-1.2.1.zip
cd aws-scripts-mon

cp awscreds.template awscreds.conf

# edit awscreds.conf and set:
AWSAccessKeyId=my-access-key-id
AWSSecretKey=my-secret-access-key

Alternatively, you can associate an IAM role (instance profile) with your instance so that you don’t need to add the access key information in a configuration file.

  • Run the script to generate metrics and send them to CloudWatch
    • Perform a simple test run without posting data to CloudWatch
./mon-put-instance-data.pl --mem-util --verify --verbose
    • Collect all available memory metrics and send them to CloudWatch
./mon-put-instance-data.pl --mem-util --mem-used-incl-cache-buff --mem-used --mem-avail
    • Set a cron schedule for metrics reported to CloudWatch. Add the following entry to crontab to report memory and disk space utilization to CloudWatch every five minutes:
*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-used-incl-cache-buff --mem-util --disk-space-util --disk-path=/ --from-cron
  • Use the CLI to put a metric data point to CloudWatch
C:\Users\cat>aws cloudwatch put-metric-data --namespace "System/Linux" --dimensions Name=InstanceId,Value=i-0c5efa4c554f43c45 --metric-name MemoryUtilization --value 99
  • Use the CLI to show metric information

C:\Users\cat>aws cloudwatch list-metrics --namespace "System/Linux" --metric-name MemoryUtilization --dimensions Name=InstanceId,Value=i-0c5efa4c554f43c45
{
    "Metrics": [
        {
            "Namespace": "System/Linux",
            "Dimensions": [
                {
                    "Name": "InstanceId",
                    "Value": "i-0c5efa4c554f43c45"
                }
            ],
            "MetricName": "MemoryUtilization"
        }
    ]
}

C:\Users\cat>aws cloudwatch get-metric-statistics --namespace "System/Linux" --metric-name MemoryUtilization --dimensions Name=InstanceId,Value=i-0c5efa4c554f43c45 --start-time 2017-12-04T12:00:00.000Z --end-time 2017-12-04T12:25:00.000Z --period 60 --statistics "Sum" "Maximum" "Minimum" "Average" "SampleCount"
{
    "Datapoints": [
        {
            "SampleCount": 1.0,
            "Timestamp": "2017-12-04T12:00:00Z",
            "Average": 89.9832103270251,
            "Maximum": 89.9832103270251,
            "Minimum": 89.9832103270251,
            "Sum": 89.9832103270251,
            "Unit": "Percent"
        },
        {
            "SampleCount": 1.0,
            "Timestamp": "2017-12-04T12:05:00Z",
            "Average": 89.9953995509647,
            "Maximum": 89.9953995509647,
            "Minimum": 89.9953995509647,
            "Sum": 89.9953995509647,
            "Unit": "Percent"
        },
        {
            "SampleCount": 1.0,
            "Timestamp": "2017-12-04T12:10:00Z",
            "Average": 90.0075887749043,
            "Maximum": 90.0075887749043,
            "Minimum": 90.0075887749043,
            "Sum": 90.0075887749043,
            "Unit": "Percent"
        },
        {
            "SampleCount": 1.0,
            "Timestamp": "2017-12-04T12:15:00Z",
            "Average": 90.019777998844,
            "Maximum": 90.019777998844,
            "Minimum": 90.019777998844,
            "Sum": 90.019777998844,
            "Unit": "Percent"
        },
        {
            "SampleCount": 1.0,
            "Timestamp": "2017-12-04T12:20:00Z",
            "Average": 90.0504476590792,
            "Maximum": 90.0504476590792,
            "Minimum": 90.0504476590792,
            "Sum": 90.0504476590792,
            "Unit": "Percent"
        }
    ],
    "Label": "MemoryUtilization"
}

gptransfer failed with “ERROR: gpfdist error - line too long in file”

If the table you are copying with gptransfer has very wide rows, you might get an error like the one below:

[gpadmin@mdw-1 ~]$ gptransfer -t preview.au.segment_weekly --dest-database qa1_cloned --drop
20160804:03:42:21:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Building list of source tables to transfer...
20160804:03:42:21:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Number of tables to transfer: 1
20160804:03:42:21:007931 gptransfer:mdw-1:gpadmin-[INFO]:-gptransfer will use "fast" mode for transfer.
20160804:03:42:21:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Validating source host map...
20160804:03:42:21:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Validating transfer table set...
20160804:03:42:22:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Using batch size of 2
20160804:03:42:22:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Using sub-batch size of 24
20160804:03:42:22:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Creating work directory '/home/gpadmin/gptransfer_7931'
20160804:03:42:23:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Creating schema au in database qa1_cloned...
20160804:03:42:24:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Starting transfer of preview.au.segment_weekly to qa1_cloned.au.segment_weekly...
20160804:03:42:24:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Creating target table qa1_cloned.au.segment_weekly...
20160804:03:42:24:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Retrieving schema for table preview.au.segment_weekly...
20160804:03:42:33:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Transfering data preview.au.segment_weekly -> qa1_cloned.au.segment_weekly...
20160804:03:42:52:007931 gptransfer:mdw-1:gpadmin-[ERROR]:-Failed to transfer table preview.au.segment_weekly
20160804:03:42:52:007931 gptransfer:mdw-1:gpadmin-[ERROR]:-error 'ERROR: could not write to external resource: Broken pipe (fileam.c:1774) (seg6 sdw-1:40006 pid=9178) (cdbdisp.c:1326)
' in 'INSERT INTO gptransfer.w_ext_segment_weekly_131f3c5e2361d8cb90a4bd9328ccb0e7 SELECT * FROM "au"."segment_weekly"'
20160804:03:42:52:007931 gptransfer:mdw-1:gpadmin-[ERROR]:-error 'ERROR: gpfdist error - line too long in file /home/gpadmin/gptransfer_7931/preview.au.segment_weekly/preview.au.segment_weekly.pipe.6 near (98286585 (url.c:2030) (seg7 slice1 sdw-1:40007 pid=10382) (cdbdisp.c:1326)
DETAIL: External table ext_segment_weekly_131f3c5e2361d8cb90a4bd9328ccb0e7, line 120726 of file gpfdist://sdw-1:8023/preview.au.segment_weekly.pipe.6
' in 'INSERT INTO "au"."segment_weekly" SELECT * FROM gptransfer.ext_segment_weekly_131f3c5e2361d8cb90a4bd9328ccb0e7'
20160804:03:42:52:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Remaining 1 of 1 tables
20160804:03:42:52:007931 gptransfer:mdw-1:gpadmin-[WARNING]:-1 tables failed to transfer. A list of these tables
20160804:03:42:52:007931 gptransfer:mdw-1:gpadmin-[WARNING]:-has been written to the file failed_transfer_tables_20160804_034213.txt
20160804:03:42:52:007931 gptransfer:mdw-1:gpadmin-[WARNING]:-This file can be used with the -f option to continue
20160804:03:42:52:007931 gptransfer:mdw-1:gpadmin-[WARNING]:-the data transfer.
20160804:03:42:52:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Removing work directories...
20160804:03:42:57:007931 gptransfer:mdw-1:gpadmin-[INFO]:-Finished.

To avoid this error, you need to specify the --max-line-length option, according to the official document (Utility Guide):

--max-line-length=length
Sets the maximum allowed data row length in bytes for the gpfdist utility. If not specified, the default is 10485760. Valid range is 32768 (32K) to 268435456 (256MB).

Should be used when user data includes very wide rows (or when line too long error message occurs). Should not be used otherwise as it increases resource allocation.

[gpadmin@mdw-1 ~]$ gptransfer -t preview.au.segment_weekly --dest-database qa1_cloned --drop  --max-line-length 100485760

Problem resolved!

High memory usage of a SQL statement with ‘IN’


Note: This is a bug in the ORCA optimizer in Greenplum that has not been fixed as of the latest version, 4.3.8.0.

Recently we hit a bug triggered by a badly written query in which a developer put over 1,000 items in the ‘IN’ clause. This single statement took about 35 GB of memory on the master node, and all other user queries were getting out-of-memory errors (“VM Protect failed to allocate %d bytes, %d MB available”).

Below is the problematic session trying to execute the badly written query.

reporting_dev=# select version();
version
------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.2.15 (Greenplum Database 4.3.7.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jan 21 2016 15:51:02
(1 row)


gpadmin=# select * from pg_stat_activity where procpid=12772;
datid | datname | procpid | sess_id | usesysid | usename |
current_query
| waiting | query_start | backend_start | client_addr | client_port | application_name | xact_start | waiting_reason
----------+------------------------+---------+---------+----------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------+---------+-------------------------------+------------------------------+--------------+-------------+-----------------------+-------------------------------+----------------
26762904 | reporting_dev | 12772 | 883202 | 26809501 | ssda_backend_dev | -- TODO Add paging??
| f | 2016-03-16 03:58:50.286015+00 | 2016-03-16 02:30:56.71149+00 | *.*.*.* | 41870 | AvailableDatesService | 2016-03-16 02:56:46.194144+00 |
: SELECT DISTINCT channel FROM cache_aggregate_au.conversion_v2_b9eda986fb79015710205195045de16e_20160110_20160116
: WHERE conversion IN ('IKN_A_BW_001','IKN_A_TA_001','IKN_A_AA_001','IKN_A_CA_001','IKN_A_EA_001','IKN_A_BW_002','IKN_A_IG_001','IKN_A_KA_001','IKN_A_TT
_001','IKN_A_AL_001','INTERNSEARCH','SN_L_FB_001','SN_L_FB_002','SN_L_FB_003','SN_L_FB_004','SN_L_FB_005','SN_L_FB_006','SN_L_FB_007','SN_L_FB_008','SN_L_FB_009','SN_L_FB_010','SN_L_FB_011','BBC_L_YT_001','YSS_L_YT_001','USP_L_FB_003','U
SP_L_FB_005','USP_L_FB_009','USP_L_FB_011','USP_L_FB_014','USP_L_FB_015','USP_L_FB_016','NCA_L_FB_001','SPB_L_FB_001','SPB_L_TW_001','SPD_FB_L_01','SPD_FB_L_02','SPD_FB_L_03','SPD_FB_L_04','BNJ_FB_L_001','BNJ_FB_L_002','BNJ_FB_L_003','BN
J_FB_L_005','AUE_L_FB_003','AUE_L_FB_004','AUE_L_TW_001','AUE_L_TW_002','AUE_L_TW_004','FUK_L_FB_001','FUK_L_FB_002','FUK_L_FB_003','FUK_L_FB_004','PRO_L_YT_001','ASO_L_FB_001','TFP_L_FB_001','TFP_L_FB_002','TFP_L_FB_003','TFP_L_FB_004',
'TFP_L_FB_005','TFP_L_FB_006
(1 row)

Below is the output of the top command which shows the abnormal memory usage of the backend process.

Tasks: 730 total, 3 running, 727 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.9%us, 1.0%sy, 0.0%ni, 86.7%id, 11.3%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 32862724k total, 32694560k used, 168164k free, 468k buffers
Swap: 11149304k total, 5979424k used, 5169880k free, 130348k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12772 gpadmin 20 0 34.6g 29g 6416 D 10.8 93.7 116:56.30 postgres: port 5432, ssda_backend_dev reporting_dev 10.156.145.7(41870) con883202 10.156.145.7(41870) cmd66 BIND
8632 gpadmin 20 0 574m 37m 15m D 7.1 0.1 0:00.30 postgres: port 5432, symphony_backend_qa symphony_reporting_qa 10.156.145.176(52337) con887535 10.156.145.176(52337) cmd3 BIND
8567 gpadmin 20 0 638m 37m 13m D 5.4 0.1 0:00.49 postgres: port 5432, symphony_backend_qa symphony_reporting_qa 10.156.145.176(52336) con887531 10.156.145.176(52336) cmd6 BIND

Simply killing the session took several minutes, as it had to release the memory back to the OS.

Neither the statement_mem parameter nor resource queues could limit the memory consumption at all.

According to Pivotal, it is a known bug in ORCA that surfaces with queries that have a large ‘IN’ list. It has been identified as a bug in plan enumeration and should be fixed in a few weeks.

The workaround is to rewrite the SQL or turn off ORCA.

We chose to turn off ORCA to avoid similar issues happening again. The SQL finished in about one second once ORCA was disabled.

reporting_dev=# set optimizer='off';
SET
Time: 2.996 ms
reporting_dev=# \i ./bad_query.sql
channel
-----------------------------
www.amazon.com.au
www.taste.com.au
www.iga.net.au
www.ebay.com.au
www.youtube.com
www.target.com.au
www.kmart.com.au
www.bigw.com.au
www.chemistwarehouse.com.au
www.facebook.com
www.twitter.com
www.aldi.com.au
(12 rows)

Time: 794.160 ms

Still, we asked the developers to rewrite the query to join against a temporary table (containing the values from the ‘IN’ list) to do the filtering, which is a commonly recommended replacement for a large “IN” statement.
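For illustration only, a sketch of such a rewrite; the temporary table, column, and file names are assumptions, not the developers' actual query:

psql reporting_dev <<'SQL'
-- load the former IN-list values into a temporary table
CREATE TEMP TABLE tmp_conversion (conversion text);
\copy tmp_conversion FROM 'conversion_list.txt'
-- filter by joining instead of a 1000+ item IN clause
SELECT DISTINCT c.channel
FROM cache_aggregate_au.conversion_v2_b9eda986fb79015710205195045de16e_20160110_20160116 c
JOIN tmp_conversion t ON t.conversion = c.conversion;
SQL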

Upgrade Greenplum DB from 4.3.x to 4.3.7

This document details the steps to upgrade Greenplum DB from 4.3.x to 4.3.7.1

Step-by-step guide

  1. Log in to your Greenplum Database master host as the Greenplum administrative user:
    $ su - gpadmin
  2. Perform a smart shutdown of your current Greenplum Database 4.3.x system (there can be no active connections to the database).
    $ gpstop -a
  3. Run the installer for 4.3.7.x on the Greenplum Database master host. By default it installs to:
    /usr/local/greenplum-db-4.3.7.1
  4. Update the symbolic link to point to the newly installed version:
    $ rm /usr/local/greenplum-db
    $ ln -s /usr/local/greenplum-db-4.3.7.1 /usr/local/greenplum-db
  5. Run the gpseginstall utility to install the 4.3.7.x binaries on all the segment hosts specified in the hostfile.
    gpseginstall -f hostfile

    sample hostfile:

                  dm-mdw-2
                  dm-sdw-1
                  dm-sdw-2
                  dm-sdw-3
  6. After all segment hosts have been upgraded, restart your Greenplum Database system:
    $ gpstart
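Optionally, confirm that every instance reports the new binary version after the restart (gpstate -i prints version information for each instance):

    $ source /usr/local/greenplum-db/greenplum_path.sh
    $ gpstate -i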

 

Refer to the official guide for the whole upgrade procedure: http://gpdb.docs.pivotal.io/4370/relnotes/GPDB_4371_README.html#topic17

Resynchronize the data between the primary and standby master hosts

GP version: 4.2.3.2

It is an old GP version that is no longer supported by Pivotal.

Our legacy DW system is built on it; since it receives no more updates and is waiting to be decommissioned, no upgrade is planned.

We found that synchronization between the master and the standby master was broken and needed to fix it. Below is how:

 

Identify the issue using the gpstate command:

[gpadmin@mck-mdw-01 ~]$ gpstate -f
20160108:08:59:25:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:-Starting gpstate with args: -f
20160108:08:59:25:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.2.3.2 build 1'
20160108:08:59:25:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.2.3.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jan 10 2013 18:2737'
20160108:08:59:25:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:-Obtaining Segment details from master...
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:-Standby master details
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:-----------------------
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:-   Standby address          = mck-mdw-02.gdn.hitwise.com
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:-   Standby data directory   = /data/gpdb_master/gp-1
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:-   Standby port             = 5432
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:-   Standby PID              = 14680
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:-   Standby status           = Standby host passive
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:--------------------------------------------------------------
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:--gp_master_mirroring table
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:--------------------------------------------------------------
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:--Summary state: Not Synchronized
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:--Detail state: Standby master too far behind
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:--Log time: 2015-10-15 04:59:15+10
20160108:08:59:26:028898 gpstate:mck-mdw-01:gpadmin-[INFO]:--------------------------------------------------------------

Check the size of the master directory for a rough estimate of how long the resync will take. Deleting unnecessary files under gpdb_master will make it faster.

[gpadmin@mck-mdw-02 /data/vol1]$ du -smh *
1.9G    corefile
21G     gpdb_master

Fix it using the gpinitstandby command:

[gpadmin@mck-mdw-01 ~]$ gpinitstandby -n
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Checking for filespace directory /data/gpdb_master/gp-1 on mck-mdw-02.gdn.hitwise.com
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:------------------------------------------------------
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:------------------------------------------------------
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Greenplum master hostname               = mck-mdw-01
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Greenplum master data directory         = /data/gpdb_master/gp-1
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Greenplum master port                   = 5432
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Greenplum standby master hostname       = mck-mdw-02.gdn.hitwise.com
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Greenplum standby master port           = 5432
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Greenplum standby master data directory = /data/gpdb_master/gp-1
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Greenplum update system catalog         = Off
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Greenplum stop database mode            = smart
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:------------------------------------------------------
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:- Filespace locations
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:------------------------------------------------------
20160108:09:00:10:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-pg_system -> /data/gpdb_master/gp-1
Do you want to continue with standby master initialization? Yy|Nn (default=N):
> Y
20160108:09:00:27:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
20160108:09:00:27:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-The packages on mck-mdw-02.gdn.hitwise.com are consistent.
20160108:09:00:27:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Stopping database...
20160108:09:00:37:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:--n option given, skipping catalog update
20160108:09:00:37:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Forcing changed blocks to disk for filespace /data/gpdb_master/gp-1...
20160108:09:00:37:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Copying filespace directory to mck-mdw-02.gdn.hitwise.com
20160108:09:08:54:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Updating filespace flat files
20160108:09:08:54:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Updating filespace flat files
20160108:09:08:54:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Removing pg_hba.conf backup...
20160108:09:08:54:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Starting database in production mode...
20160108:09:09:16:029173 gpinitstandby:mck-mdw-01:gpadmin-[INFO]:-Successfully syncronized standby master.

Verify that synchronization has been restored:

[gpadmin@mck-mdw-01 ~]$ gpstate -f
20160108:09:19:29:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:-Starting gpstate with args: -f
20160108:09:19:29:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.2.3.2 build 1'
20160108:09:19:29:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.2.3.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Jan 10 2013 18:27:37'
20160108:09:19:29:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:-Obtaining Segment details from master...
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:-Standby master details
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:-----------------------
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:-   Standby address          = mck-mdw-02.gdn.hitwise.com
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:-   Standby data directory   = /data/gpdb_master/gp-1
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:-   Standby port             = 5432
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:-   Standby PID              = 11207
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:-   Standby status           = Standby host passive
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:--------------------------------------------------------------
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:--gp_master_mirroring table
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:--------------------------------------------------------------
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:--Summary state: Synchronized
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:--Detail state:
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:--Log time: 2016-01-08 08:09:13+10
20160108:09:19:30:002145 gpstate:mck-mdw-01:gpadmin-[INFO]:--------------------------------------------------------------