
Loading data from S3 to AWS RDS for PostgreSQL


AWS RDS for PostgreSQL comes with an extension that allows you to fetch data from AWS S3 and to write back data to AWS S3. The use case for this is obvious: Either you use other AWS services that write data to S3 and you want to further process that data in PostgreSQL, or you want other AWS services to consume data from PostgreSQL by providing that data in S3. Let’s have a look at how that works.

The extension AWS is providing for working with S3 from inside PostgreSQL is called “aws_s3”:

postgres=> select * from pg_available_extensions where name like '%aws%';
    name     | default_version | installed_version |                   comment                   
-------------+-----------------+-------------------+---------------------------------------------
 aws_commons | 1.0             |                   | Common data types across AWS services
 aws_s3      | 1.0             |                   | AWS S3 extension for importing data from S3
(2 rows)

If you try to install the extension you’ll notice that there is a dependency on the “aws_commons” extension:

postgres=> create extension aws_s3;
ERROR:  required extension "aws_commons" is not installed
HINT:  Use CREATE EXTENSION ... CASCADE to install required extensions too.

You can install both extensions in one step using the “CASCADE” option:

postgres=> create extension aws_s3 cascade;
NOTICE:  installing required extension "aws_commons"
CREATE EXTENSION

These extensions provide a couple of helper functions (aws_commons) and the function to import a file from S3 (aws_s3):

postgres=> \dx+ aws_commons
             Objects in extension "aws_commons"
                     Object description                      
-------------------------------------------------------------
 function aws_commons.create_aws_credentials(text,text,text)
 function aws_commons.create_s3_uri(text,text,text)
 schema aws_commons
 type aws_commons._aws_credentials_1
 type aws_commons._s3_uri_1
(5 rows)

postgres=> \dx+ aws_s3
                                       Objects in extension "aws_s3"
                                            Object description                                             
-----------------------------------------------------------------------------------------------------------
 function aws_s3.table_import_from_s3(text,text,text,aws_commons._s3_uri_1,aws_commons._aws_credentials_1)
 function aws_s3.table_import_from_s3(text,text,text,text,text,text,text,text,text)
 schema aws_s3
(3 rows)

Having the extension ready, we need a file we can import, so let’s create one (exactly the same file as in the previous post, but with fewer rows):

dwe@dwe:~/Downloads$ cat gen_data.sh 
#!/bin/bash
 
FILE="/home/dwe/Downloads/sample.csv"
rm -rf ${FILE}
 
for i in {1..1000000}; do
    echo "${i},firstname${i},lastname${i},xxx${i}@xxx.com,street${i},country${i},description${i}" >> ${FILE}
done

dwe@dwe:~/Downloads$ chmod +x gen_data.sh 
dwe@dwe:~/Downloads$ ./gen_data.sh 
dwe@dwe:~/Downloads$ head -5 sample.csv 
1,firstname1,lastname1,xxx1@xxx.com,street1,country1,description1
2,firstname2,lastname2,xxx2@xxx.com,street2,country2,description2
3,firstname3,lastname3,xxx3@xxx.com,street3,country3,description3
4,firstname4,lastname4,xxx4@xxx.com,street4,country4,description4
5,firstname5,lastname5,xxx5@xxx.com,street5,country5,description5
dwe@dwe:~/Downloads$ ls -lha sample.csv 
-rw-rw-r-- 1 dwe dwe 96M Nov 10 11:11 sample.csv

We’ll be using a new bucket for this demo, so let’s create one and then upload the file we just generated:

dwe@dwe:~/Downloads$ aws s3 mb s3://s3-rds-demo --region eu-central-1
make_bucket: s3-rds-demo
dwe@dwe:~/Downloads$ aws s3 cp sample.csv s3://s3-rds-demo/
upload: ./sample.csv to s3://s3-rds-demo/sample.csv         

Before we can do anything against S3 from RDS for PostgreSQL we need to set up the required permissions. You can use security credentials for this, but it is recommended to use IAM roles and policies. The first step is to create a policy that allows listing the bucket, as well as reading and writing objects (write is required for writing data back to S3 later on):

dwe@dwe:~$ aws iam create-policy \
>    --policy-name rds-s3-policy \
>    --policy-document '{
>      "Version": "2012-10-17",
>      "Statement": [
>        {
>          "Sid": "s3import",
>          "Action": [
>            "s3:GetObject",
>            "s3:ListBucket",
>            "s3:PutObject"
>          ],
>          "Effect": "Allow",
>          "Resource": [
>            "arn:aws:s3:::s3-rds-demo", 
>            "arn:aws:s3:::s3-rds-demo/*"
>          ] 
>        }
>      ] 
>    }' 
{
    "Policy": {
        "PolicyName": "rds-s3-policy",
        "PolicyId": "ANPA2U57KX3NFH4HU4COG",
        "Arn": "arn:aws:iam::xxxxxxxx:policy/rds-s3-policy",
        "Path": "/",
        "DefaultVersionId": "v1",
        "AttachmentCount": 0,
        "PermissionsBoundaryUsageCount": 0,
        "IsAttachable": true,
        "CreateDate": "2020-11-10T12:04:34+00:00",
        "UpdateDate": "2020-11-10T12:04:34+00:00"
    }
}

Once the policy is in place, we create an IAM role to which the policy will be attached:

dwe@dwe:~$ aws iam create-role \
>    --role-name rds-s3-role \
>    --assume-role-policy-document '{
>      "Version": "2012-10-17",
>      "Statement": [
>        {
>          "Effect": "Allow",
>          "Principal": {
>             "Service": "rds.amazonaws.com"
>           },
>          "Action": "sts:AssumeRole"
>        }
>      ] 
>    }'
{
    "Role": {
        "Path": "/",
        "RoleName": "rds-s3-role",
        "RoleId": "AROA2U57KX3NP2XWVCELI",
        "Arn": "arn:aws:iam::xxxxxxxxxx:role/rds-s3-role",
        "CreateDate": "2020-11-10T12:07:20+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "rds.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
    }
}

Attaching the policy to the role (you will need the ARN of the policy from above):

dwe@dwe:~$ aws iam attach-role-policy \
>    --policy-arn arn:aws:iam::xxxxxxxxxx:policy/rds-s3-policy \
>    --role-name rds-s3-role

Finally you need to attach the IAM role to the RDS instance by providing the ARN of the role and the identifier of your RDS instance:

aws rds add-role-to-db-instance \
   --db-instance-identifier dwe-postgres-helvetia \
   --feature-name s3Import \
   --role-arn arn:aws:iam::xxxxxxxx:role/rds-s3-role   \
   --region eu-central-1

Your RDS instance needs to be running to do that, otherwise you’ll get this:

An error occurred (InvalidDBInstanceState) when calling the AddRoleToDBInstance operation: The status for the dwe-postgres DB instance is stopped. The DB instance is not available for s3Import feature.

Having the IAM role attached to the RDS instance we can load the CSV, but first the S3 URI needs to be defined (we do not want to use access keys and credentials):

postgres=> SELECT aws_commons.create_s3_uri('s3-rds-demo'
postgres(>                                 ,'sample.csv'
postgres(>                                 ,'eu-central-1'
postgres(>                                 ) AS s3_uri \gset
postgres=> select :'s3_uri';
               ?column?                
---------------------------------------
 (s3-rds-demo,sample.csv,eu-central-1)
(1 row)

Now we are ready to load the file:

postgres=> create table sample ( id int primary key
postgres(>                              , firstname varchar(20)
postgres(>                              , lastname varchar(20)
postgres(>                              , email varchar(20)
postgres(>                              , street varchar(20)
postgres(>                              , country varchar(20)
postgres(>                              , description varchar(20)
postgres(>                              );
CREATE TABLE
postgres=> SELECT aws_s3.table_import_from_s3 ( 'sample'
                                   , ''
                                   , '(format csv)'
                                   , :'s3_uri'
                                   );
                                 table_import_from_s3                                 
--------------------------------------------------------------------------------------
 1000000 rows imported into relation "sample" from file sample.csv of 100222272 bytes
(1 row)
postgres=> select * from sample limit 5;
 id |  firstname  |  lastname  |     email     |  street  |  country  |  description  
----+-------------+------------+---------------+----------+-----------+---------------
 77 | firstname77 | lastname77 | xxx77@xxx.com | street77 | country77 | description77
 78 | firstname78 | lastname78 | xxx78@xxx.com | street78 | country78 | description78
 79 | firstname79 | lastname79 | xxx79@xxx.com | street79 | country79 | description79
  1 | firstname1  | lastname1  | xxx1@xxx.com  | street1  | country1  | description1
  2 | firstname2  | lastname2  | xxx2@xxx.com  | street2  | country2  | description2
(5 rows)

And we’re done. The follow-up post will show the opposite: writing back to S3 from RDS for PostgreSQL.
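As a teaser, here is a minimal, hedged sketch of what that export could look like with aws_s3.query_export_to_s3 (check with \dx+ aws_s3 that your extension version actually provides this function; the target file name is just an example):

-- Hedged sketch: export the sample table back to S3 as CSV.
-- Assumes aws_s3.query_export_to_s3 is available in your aws_s3 extension version.
SELECT *
  FROM aws_s3.query_export_to_s3(
         'SELECT * FROM sample',
         aws_commons.create_s3_uri('s3-rds-demo', 'sample_export.csv', 'eu-central-1'),
         options := 'format csv'
       );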

This article, Loading data from S3 to AWS RDS for PostgreSQL, first appeared on the dbi services blog.


Control-M/EM install application pack


Introduction

Hi team!
Today let’s check how to install more Control-M features on our agents. This can be useful, and moreover you can add your own plug-ins!
Below we will see how to give more possibilities to your agent.

Install application pack using CCM

Select Manage -> New Activity -> Application pack -> Upgrade/Install


Add a description of your task, then check “Send application pack to Control-M agents”

Select the Control-M agent name where the pack must be installed



Once installed, you can go to the CCM and check that the agent has its new Control Module features added

Check if the application pack is available on the selected agent

Select refresh in the CCM if the module does not appear (here is another example of an upgraded agent)

Migrate an existing module

If you have customized modules, you can migrate them to other agents.
To add a new Control-M module, add the developed script to the dedicated folder (according to your apps-repo path).
Example
/home/ctmagtfolder/ctm/cm/AP/data/AI/apps-repo/NEWMODULE/
Connect to the Control-M API using the URL from the CCM

Add or copy the script used in the API

Select the agent name where the module must be installed
Select the plug-in, then click on “Deploy”

Once done, you can refresh your Control-M agent in the CCM and check that the custom plug-in is now available

Congratulations, the Control Module migration is effective; jobs can now be defined with this new module in the planning domain

Conclusion

You are now able to install the application pack feature and to migrate your customized plug-ins.
I invite you to check out the other dbi bloggers and stay tuned for the next blogs 🙂
Cheers

This article, Control-M/EM install application pack, first appeared on the dbi services blog.

AWS DynamoDB -> S3 -> OCI Autonomous Database


By Franck Pachot

I contribute to multiple technology communities. I have been an Oracle ACE Director for many years, and I also became an AWS Data Hero recently 🎉. I got asked whether there is no conflict with that, as Amazon and Oracle are two competitors. Actually, there is no conflict at all. Those advocacy programs are not about sales, but about technology. Many database developers and administrators have to interact with more than one cloud provider, and those cloud providers expand their interoperability for a better user experience. Here is an example combining two recent pieces of news from the past months:

Imagine that your application stores some data into DynamoDB because it is one of the easiest serverless datastores that can scale to millions of key-value queries per second with great availability and performance. You may want to export it to the S3 Object Storage, and this new DynamoDB feature can do it without any code (no Lambda, no pipeline, no ETL…). Once in S3, you may want to query it. Of course, there are many possibilities. One is Amazon Athena with the Presto query engine. But that’s for another post. Here I’m using my free Oracle Autonomous Database to query directly from Amazon S3 with the full SQL power of the Oracle Database.

Latest AWS CLI

This is a new feature; my current AWS CLI doesn’t know about it:


[opc@a demo]$ aws --version
aws-cli/2.0.50 Python/3.7.3 Linux/4.14.35-2025.400.9.el7uek.x86_64 exe/x86_64.oracle.7

[opc@a demo]$ aws dynamodb export-table-to-point-in-time
Invalid choice: 'export-table-to-point-in-time', maybe you meant:

Let’s update it to the latest version:


cd /var/tmp
wget -c https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip
unzip awscli-exe-linux-x86_64.zip
sudo ./aws/install --bin-dir /usr/local/bin --install-dir /usr/local/aws-cli --update
aws --version
cd -

If you don’t already have the AWS CLI, the initial install is the same as the upgrade procedure. You will just have to “aws configure” it.


[opc@a demo]$ aws dynamodb export-table-to-point-in-time help | awk '/DESCRIPTION/,/^$/'
DESCRIPTION
       Exports  table  data to an S3 bucket. The table must have point in time
       recovery enabled, and you can export data  from  any  time  within  the
       point in time recovery window.

This is the inline help. The online version is: https://docs.aws.amazon.com/de_de/cli/latest/reference/dynamodb/export-table-to-point-in-time.html

DynamoDB Table

I’m quickly creating a DynamoDB table. I’m doing that in the AWS free tier, and I set the throughput (RCU/WCU) to the maximum I can get for free, as I have no other tables:


[opc@a demo]$ aws dynamodb create-table --attribute-definitions  AttributeName=K,AttributeType=S  --key-schema  AttributeName=K,KeyType=HASH --billing-mode PROVISIONED --provisioned-throughput ReadCapacityUnits=25,WriteCapacityUnits=25 --table-name Demo

TABLEDESCRIPTION        2020-11-15T23:02:08.543000+01:00        0       arn:aws:dynamodb:eu-central-1:802756008554:table/Demo   b634ffcb-1ae3-4de0-8e42-5d8262b04a38    Demo    0       CREATING
ATTRIBUTEDEFINITIONS    K       S
KEYSCHEMA       K       HASH
PROVISIONEDTHROUGHPUT   0       25      25

This creates a DynamoDB table with a hash partition key named “K”.

Enable PIT (Point-In-Time Recovery) for consistent view (MVCC)

When you export a whole table, you want a consistent view of it. If you just scan it, you may see data as of different points in time, because there are concurrent updates and DynamoDB is NoSQL: no ACID (durability is there, but atomicity is limited, consistency has another meaning, and there is no snapshot isolation), no locks (except with a lot of DIY code), no MVCC (Multi-Version Concurrency Control). However, Amazon DynamoDB provides MVCC-like consistent snapshots for recovery purposes when you enable Continuous Backups, which work like copy-on-write on the storage. This can provide a consistent point-in-time snapshot of the whole table, and the “Export to S3” feature is based on that. This means that if you don’t enable Point-In-Time Recovery you can’t use this feature:


[opc@a demo]$ aws dynamodb export-table-to-point-in-time --table-arn arn:aws:dynamodb:eu-central-1:802756008554:table/Demo --s3-bucket franck-pachot-free-tier --export-format DYNAMODB_JSON

An error occurred (PointInTimeRecoveryUnavailableException) when calling the ExportTableToPointInTime operation: Point in time recovery is not enabled for table 'Demo'

Here is how to enable it (which can also be done from the console in table “Backups” tab):


[opc@a demo]$ aws dynamodb update-continuous-backups --table-name Demo --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true

CONTINUOUSBACKUPSDESCRIPTION    ENABLED
POINTINTIMERECOVERYDESCRIPTION  2020-11-15T23:02:15+01:00       2020-11-15T23:02:15+01:00       ENABLED

Put items

I quickly insert a few items just to test this feature:


python3
import boto3, botocore.config, datetime
print(datetime.datetime.utcnow().isoformat())
dynamodb = boto3.resource('dynamodb',config=botocore.config.Config(retries={'mode':'adaptive','total_max_attempts': 10}))
for k in range(1,1000):
 dynamodb.Table('Demo').put_item(Item={'K':f"K-{k:08}",'V':datetime.datetime.utcnow().isoformat()})

quit()

This has created 999 items (range(1,1000) stops before 1000), with keys from K-00000001 to K-00000999 and a single value attribute “V” where I put the current timestamp.

They have been inserted sequentially. I noted the insert timestamp of one item in the middle (K-00000548, inserted at 2020-11-15T22:02:33.011659). The reason is to test the consistent snapshot: only the items inserted before this one should be visible from that point in time.

Export to S3

I pass this timestamp as the “Export Time” for the export to S3:


[opc@a demo]$ aws dynamodb export-table-to-point-in-time --export-time 2020-11-15T22:02:33.011659 --table-arn arn:aws:dynamodb:eu-central-1:802756008554:table/Demo --s3-bucket franck-pachot-free-tier --export-format DYNAMODB_JSON

EXPORTDESCRIPTION       e8839402-f7b0-4732-b2cb-63e74f8a4c7b    arn:aws:dynamodb:eu-central-1:802756008554:table/Demo/export/01605477788673-a117c9b4    DYNAMODB_JSON   IN_PROGRESS     2020-11-15T23:02:33+01:00       franck-pachot-free-tier AES256  2020-11-15T23:03:08.673000+01:00
        arn:aws:dynamodb:eu-central-1:802756008554:table/Demo   b634ffcb-1ae3-4de0-8e42-5d8262b04a38

You can do the same from the console (use the preview of the new DynamoDB console).

And that’s all: I have a new AWSDynamoDB folder in the S3 bucket I mentioned:


[opc@a demo]$ aws s3 ls franck-pachot-free-tier/AWSDynamoDB/                                                                                                             
                          PRE 01605477788673-a117c9b4/

The subfolder 01605477788673-a117c9b4 is created for this export. A new export will create another one.

[opc@a demo]$ aws s3 ls franck-pachot-free-tier/AWSDynamoDB/01605477788673-a117c9b4/

                           PRE data/
2020-11-15 23:04:09          0 _started
2020-11-15 23:06:09        197 manifest-files.json
2020-11-15 23:06:10         24 manifest-files.md5
2020-11-15 23:06:10        601 manifest-summary.json
2020-11-15 23:06:10         24 manifest-summary.md5

There’s a lot of metadata coming with this, and a “data” folder.

[opc@a demo]$ aws s3 ls franck-pachot-free-tier/AWSDynamoDB/01605477788673-a117c9b4/data/
2020-11-15 23:05:35       4817 vovfdgilxiy6xkmzy3isnpzqgu.json.gz

This is where my DynamoDB table items have been exported: a text file, with one record per item, each as a JSON object, and gzipped:


{"Item":{"K":{"S":"K-00000412"},"V":{"S":"2020-11-15T22:02:30.725004"}}}
{"Item":{"K":{"S":"K-00000257"},"V":{"S":"2020-11-15T22:02:28.046544"}}}
{"Item":{"K":{"S":"K-00000179"},"V":{"S":"2020-11-15T22:02:26.715717"}}}
{"Item":{"K":{"S":"K-00000274"},"V":{"S":"2020-11-15T22:02:28.333683"}}}
{"Item":{"K":{"S":"K-00000364"},"V":{"S":"2020-11-15T22:02:29.889403"}}}
{"Item":{"K":{"S":"K-00000169"},"V":{"S":"2020-11-15T22:02:26.546318"}}}
{"Item":{"K":{"S":"K-00000121"},"V":{"S":"2020-11-15T22:02:25.628021"}}}
{"Item":{"K":{"S":"K-00000018"},"V":{"S":"2020-11-15T22:02:23.613737"}}}
{"Item":{"K":{"S":"K-00000459"},"V":{"S":"2020-11-15T22:02:31.513156"}}}
{"Item":{"K":{"S":"K-00000367"},"V":{"S":"2020-11-15T22:02:29.934015"}}}
{"Item":{"K":{"S":"K-00000413"},"V":{"S":"2020-11-15T22:02:30.739685"}}}
{"Item":{"K":{"S":"K-00000118"},"V":{"S":"2020-11-15T22:02:25.580729"}}}
{"Item":{"K":{"S":"K-00000303"},"V":{"S":"2020-11-15T22:02:28.811959"}}}
{"Item":{"K":{"S":"K-00000021"},"V":{"S":"2020-11-15T22:02:23.676213"}}}
{"Item":{"K":{"S":"K-00000167"},"V":{"S":"2020-11-15T22:02:26.510452"}}}
...

As you can see, the items are in the DynamoDB API format, mentioning attribute name (I’ve defined “K” and “V”) and the datatype (“S” for string here).

Credentials to access S3

As my goal is to access it through the internet, I’ve defined a dedicated IAM user for that: the credentials I’ll need are this user’s Access key ID and Secret access key.

As I want to access it from my Oracle Autonomous Database, I connect to the database:


[opc@a demo]$ TNS_ADMIN=/home/opc/wallet ~/sqlcl/bin/sql demo@atp1_tp

SQLcl: Release 20.3 Production on Sun Nov 15 22:34:32 2020

Copyright (c) 1982, 2020, Oracle.  All rights reserved.
Password? (**********?) ************
Last Successful login time: Sun Nov 15 2020 22:34:39 +01:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.5.0.0.0

DEMO@atp1_tp>

I use SQLcl command line here, with the downloaded wallet, but you can do the same from SQL Developer Web of course.

Here is how I declare those credentials from the Oracle database:


DEMO@atp1_tp> exec dbms_cloud.create_credential('AmazonS3FullAccess','AKIA3VZ74TJVKPXKVFRK','RTcJaqRZ+8BTwdatQeUe4AHpJziR5xVrRGl7pmgd');

PL/SQL procedure successfully completed.

Currently, this DBMS_CLOUD package is available only on Oracle Cloud-managed databases, but I’m quite sure that it will come to the on-premises version soon. Data movement through an Object Storage is a common thing today.

Access to S3 from Oracle Autonomous Database

OK, many things are possible now with this DBMS_CLOUD package API, like creating a view on it (also known as an External Table) to query this remote file directly:


DEMO@atp1_tp> 
begin
 dbms_cloud.create_external_table(
  table_name=>'DEMO',
  credential_name=>'AmazonS3FullAccess',
  file_uri_list=>'https://franck-pachot-free-tier.s3.eu-central-1.amazonaws.com/AWSDynamoDB/01605477788673-a117c9b4/data/vovfdgilxiy6xkmzy3isnpzqgu.json.gz',
  format=>json_object( 'compression' value 'gzip' ),column_list=>'ITEM VARCHAR2(32767)'
 );
end;
/

PL/SQL procedure successfully completed.

This is straightforward: the credential name, the file URL, and the format (compressed JSON). As we are in an RDBMS here, I define the structure: the name of the External Table (we query it as a table, but nothing is stored locally) and its definition (here only one ITEM column for the JSON string). Please remember that SQL has evolved from its initial definition: I can do schema-on-read here by just defining an unstructured character string as a VARCHAR2(32767). But I still benefit from RDBMS logical independence: I query as a table something that can be in a different format, in a different region, with the same SQL language.

Here is what I can view from this External Table:


DEMO@atp1_tp> select * from DEMO order by 1;
                                                                       ITEM
___________________________________________________________________________
{"Item":{"K":{"S":"K-00000001"},"V":{"S":"2020-11-15T22:02:23.202348"}}}
{"Item":{"K":{"S":"K-00000002"},"V":{"S":"2020-11-15T22:02:23.273001"}}}
{"Item":{"K":{"S":"K-00000003"},"V":{"S":"2020-11-15T22:02:23.292682"}}}
{"Item":{"K":{"S":"K-00000004"},"V":{"S":"2020-11-15T22:02:23.312492"}}}
{"Item":{"K":{"S":"K-00000005"},"V":{"S":"2020-11-15T22:02:23.332054"}}}
...
{"Item":{"K":{"S":"K-00000543"},"V":{"S":"2020-11-15T22:02:32.920001"}}}
{"Item":{"K":{"S":"K-00000544"},"V":{"S":"2020-11-15T22:02:32.940199"}}}
{"Item":{"K":{"S":"K-00000545"},"V":{"S":"2020-11-15T22:02:32.961124"}}}
{"Item":{"K":{"S":"K-00000546"},"V":{"S":"2020-11-15T22:02:32.976036"}}}
{"Item":{"K":{"S":"K-00000547"},"V":{"S":"2020-11-15T22:02:32.992915"}}}

547 rows selected.

DEMO@atp1_tp>

I validate the precision of the DynamoDB Point-In-Time snapshot: I exported as of 2020-11-15T22:02:33.011659 which was the point in time where I inserted K-00000548. At that time only K-00000001 to K-00000547 were there.
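As an extra sanity check (my addition, not in the original demo), a quick aggregate on the External Table confirms the snapshot boundary:

-- verify the snapshot boundary: expect 547 items and K-00000547 as the last key
select count(*) as items,
       max(json_value(ITEM,'$."Item"."K"."S"')) as last_key
from DEMO;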

With SQL you can easily transform an unstructured JSON collection of items into a structured two-dimensional table:


DEMO@atp1_tp> 
  select json_value(ITEM,'$."Item"."K"."S"'),
         json_value(ITEM,'$."Item"."V"."S"') 
  from DEMO 
  order by 1 fetch first 10 rows only;

   JSON_VALUE(ITEM,'$."ITEM"."K"."S"')    JSON_VALUE(ITEM,'$."ITEM"."V"."S"')
______________________________________ ______________________________________
K-00000001                             2020-11-15T22:02:23.202348
K-00000002                             2020-11-15T22:02:23.273001
K-00000003                             2020-11-15T22:02:23.292682
K-00000004                             2020-11-15T22:02:23.312492
K-00000005                             2020-11-15T22:02:23.332054
K-00000006                             2020-11-15T22:02:23.352470
K-00000007                             2020-11-15T22:02:23.378414
K-00000008                             2020-11-15T22:02:23.395848
K-00000009                             2020-11-15T22:02:23.427374
K-00000010                             2020-11-15T22:02:23.447688

10 rows selected.

I used JSON_VALUE to extract attribute string values through JSON Path, but it will be easier to query with a relational view and proper data types, thanks to the JSON_TABLE function:


DEMO@atp1_tp> 
  select K,V, V-lag(V) over (order by V) "lag"
  from DEMO,
  json_table( ITEM, '$' columns ( 
   K varchar2(10) path '$."Item"."K"."S"' error on error, 
   V timestamp path '$."Item"."V"."S"' error on error
  )) order by K fetch first 10 rows only;

            K                                  V                    lag
_____________ __________________________________ ______________________
K-00000001    15-NOV-20 10.02.23.202348000 PM
K-00000002    15-NOV-20 10.02.23.273001000 PM    +00 00:00:00.070653
K-00000003    15-NOV-20 10.02.23.292682000 PM    +00 00:00:00.019681
K-00000004    15-NOV-20 10.02.23.312492000 PM    +00 00:00:00.019810
K-00000005    15-NOV-20 10.02.23.332054000 PM    +00 00:00:00.019562
K-00000006    15-NOV-20 10.02.23.352470000 PM    +00 00:00:00.020416
K-00000007    15-NOV-20 10.02.23.378414000 PM    +00 00:00:00.025944
K-00000008    15-NOV-20 10.02.23.395848000 PM    +00 00:00:00.017434
K-00000009    15-NOV-20 10.02.23.427374000 PM    +00 00:00:00.031526
K-00000010    15-NOV-20 10.02.23.447688000 PM    +00 00:00:00.020314

10 rows selected.

I’ve declared the timestamp as TIMESTAMP so that further arithmetic can be done, like calculating the time interval between two items, as I did here.
And yes, that’s a shout-out for the power of SQL for analyzing data, as well as a shout-out for the millisecond data ingest I did on a NoSQL free tier 😋 Be careful, I sometimes put hidden messages like this in my demos…
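As a small, hypothetical follow-up in the same spirit, the total load duration can be computed with a simple aggregate (subtracting two timestamps returns an interval):

-- hypothetical follow-up: overall ingest duration of the exported items
select min(V) as first_item,
       max(V) as last_item,
       max(V) - min(V) as total_duration,
       count(*) as items
from DEMO,
     json_table( ITEM, '$' columns (
       K varchar2(10) path '$."Item"."K"."S"',
       V timestamp    path '$."Item"."V"."S"'
     ));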

Copy from S3 to Oracle Autonomous Database

I can use the query above to store it locally with a simple CREATE TABLE … AS SELECT:


DEMO@atp1_tp>
  create table DEMO_COPY(K primary key using index local, V) 
  partition by hash(K) partitions 8 
  as select K,V from DEMO, json_table(ITEM,'$' columns ( K varchar2(10) path '$."Item"."K"."S"' , V timestamp path '$."Item"."V"."S"' ));

Table DEMO_COPY created.

DEMO@atp1_tp> desc DEMO_COPY

   Name       Null?            Type
_______ ___________ _______________
K       NOT NULL    VARCHAR2(10)
V                   TIMESTAMP(6)

In order to mimic the DynamoDB scalability, I’ve created it as a hash-partitioned table with a local primary key index. Note that there is a big difference here: the number of partitions must be specified, and increasing it requires an ALTER TABLE. This will probably evolve in the Oracle Autonomous Database, but for the moment DynamoDB still provides the simplest API for automatic sharding.
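For completeness, here is a hedged sketch (not part of the original demo) of how the partition count can be increased later on a hash-partitioned table:

-- Hypothetical follow-up: each ADD PARTITION splits one existing partition and
-- redistributes its rows; distribution is even again once the total is back to a
-- power of two.
alter table DEMO_COPY add partition;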

Here is the execution plan:


DEMO@atp1_tp> select * from DEMO_COPY where K='K-00000042';

            K                                  V
_____________ __________________________________
K-00000042    15-NOV-20 10.02.24.109384000 PM

DEMO@atp1_tp> xc
DEMO@atp1_tp> select plan_table_output from dbms_xplan.display_cursor(format=>'allstats last')

                                                                                                        PLAN_TABLE_OUTPUT
_________________________________________________________________________________________________________________________
SQL_ID  cn4b2uar7hxm5, child number 0
-------------------------------------
select * from DEMO_COPY where K='K-00000042'

Plan hash value: 3000930534

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |              |      1 |        |      1 |00:00:00.01 |       3 |      1 |
|   1 |  TABLE ACCESS BY GLOBAL INDEX ROWID| DEMO_COPY    |      1 |      1 |      1 |00:00:00.01 |       3 |      1 |
|*  2 |   INDEX UNIQUE SCAN                | SYS_C0037357 |      1 |      1 |      1 |00:00:00.01 |       2 |      1 |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("K"='K-00000042')

19 rows selected.

3 buffer reads are, of course, really fast. And it can stay that way as the table grows, by increasing the number of partitions (always to a power of two, because a linear hashing algorithm is used). But do not stick to the same data model as the source NoSQL one: if you export data to a relational database, you probably have other access paths.
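As a hypothetical illustration (my addition, not in the original post), a simple secondary index on the value column gives a range-scan access path that the key-value source model would need a global secondary index for:

-- hypothetical example of an additional access path on the relational copy
create index DEMO_COPY_V on DEMO_COPY (V);

select K, V
from DEMO_COPY
where V between timestamp '2020-11-15 22:02:24' and timestamp '2020-11-15 22:02:25';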

Remote access

While we are looking at execution plans, here is the one for the External Table:


DEMO@atp1_tp> 
  select K,V from DEMO, 
  json_table(ITEM,'$' columns ( K varchar2(10) path '$."Item"."K"."S"' , V timestamp path '$."Item"."V"."S"' )) 
  where K='K-00000042';

            K                                  V
_____________ __________________________________
K-00000042    15-NOV-20 10.02.24.109384000 PM

DEMO@atp1_tp> select plan_table_output from dbms_xplan.display_cursor(format=>'allstats last')

                                                                                       PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________
SQL_ID  1vfus4x9mpwp5, child number 0
-------------------------------------
select K,V from DEMO, json_table(ITEM,'$' columns ( K varchar2(10) path
'$."Item"."K"."S"' , V timestamp path '$."Item"."V"."S"' )) where
K='K-00000042'

Plan hash value: 1420242269

-----------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |      1 |00:00:00.29 |    3053 |
|   1 |  PX COORDINATOR                |          |      1 |        |      1 |00:00:00.29 |    3053 |
|   2 |   PX SEND QC (RANDOM)          | :TQ10000 |      1 |   1067K|      1 |00:00:00.15 |    2024 |
|   3 |    NESTED LOOPS                |          |      1 |   1067K|      1 |00:00:00.15 |    2024 |
|   4 |     PX BLOCK ITERATOR          |          |      1 |        |    547 |00:00:00.15 |    2024 |
|*  5 |      EXTERNAL TABLE ACCESS FULL| DEMO     |      1 |  13069 |    547 |00:00:00.15 |    2024 |
|   6 |     JSONTABLE EVALUATION       |          |    547 |        |      1 |00:00:00.01 |       0 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter(JSON_EXISTS2("ITEM" FORMAT JSON , '$?(@.Item.K.S=="K-00000042")' FALSE ON
              ERROR)=1)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit

Even if the plan is ready for parallel access, here I have only one file to read and unzip, so it was not parallelized (Starts=1). There are no statistics, and that’s why the row estimation (E-Rows) is wrong. What is important to understand is that the whole file is read and unzipped (A-Rows=547) before the non-matching rows are filtered out. However, we can see that the condition on the relational columns has been pushed down as a JSON Path predicate.

I did everything from the command line here, but you can of course use SQL Developer Web.

Note that rather than creating an External Table and then creating a table from it, you can also load the remote data into an existing table with DBMS_CLOUD.COPY_DATA:


DEMO@atp1_tp> create table DEMO_ITEM("Item" varchar2(32767));
Table DEMO_ITEM created.

DEMO@atp1_tp> exec dbms_cloud.copy_data(table_name=>'DEMO_ITEM',credential_name=>'AmazonS3FullAccess',file_uri_list=>'https://franck-pachot-free-tier.s3.eu-central-1.amazonaws.com/AWSDynamoDB/01605477788673-a117c9b4/data/vovfdgilxiy6xkmzy3isnpzqgu.json.gz',format=>json_object( 'compression' value 'gzip' ));

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.130

Pricing and summary

I used only free tiers here, but be careful: you may have multiple services billed when doing this.
First, I used two options of DynamoDB:

  • I enabled “Continuous backups (PITR)” which is $0.2448 per GB per months in my region,
  • and I used “Data Export to Amazon S3” which is $0.1224 per GB there.

And when taking data out of S3 you should check the egress costs.
On the Oracle Autonomous Database side, I used the always free database (20GB free for life), and ingress transfer is free as well.
Conclusion: better do the export once and copy the data rather than query the External Table multiple times. This is not the right solution if you want a frequently refreshed copy. But it is very satisfying to know that you can move data from one cloud provider to the other, and from one technology to the other. I’ll probably blog about the other direction soon.

This article, AWS DynamoDB -> S3 -> OCI Autonomous Database, first appeared on the dbi services blog.

Recovery in the ☁ with Oracle Autonomous Database


By Franck Pachot

I’ll start this series with the Oracle Autonomous Database, but my goal is to cover Point In Time recovery for many managed databases in the major cloud providers, because I’ve seen a lot of confusion about database backups (see What is a database backup (back to the basics)). On one side, in a managed database, we should not have to care about backups, but only about recovery. The way it is done (backups, redo archiving, snapshots, replication…) is not our responsibility on a managed service. What we want is an easy way to recover the database to a past Point In Time within a recovery window (the recovery point – RPO) and in a predictable time (the recovery time – RTO). However, I’m certain that it is important to know how this “recovery” service is implemented behind the scenes.

Clone from a backup

What I like most about the Oracle Autonomous Database is that there is no “restore” or “recovery” button among the database service actions. There is one when you go to the list of backups, in order to restore the database in-place from a previous backup. But this is not the common case of recovery. When you need recovery, you probably want to create a new database. The reason is that when you recover to a Point In Time in the past, you probably want to connect to it, check it, read or export data from it, without discarding the transactions that happened since then in the original database. And you are in the cloud: it is easy to provision a new clone and keep the original database until you are sure of what you need to do (merging recent transactions into the clone, or fixing the original database from the clone data). In the cloud, a Point-In-Time Recovery (PITR) is actually a “clone from backup” or “clone from timestamp”. And this is exactly how you do it in the Oracle Autonomous Database: you create a new clone, and rather than cloning from the current state you choose to clone from a previous state, by mentioning a timestamp or a backup.

Point In Time (PIT)


Actually, I think that the “clone from backup” option should not even be there. It gives the false impression that you restore a backup like a snapshot of the database. But that’s wrong: the “restore from backup” is just a “restore from timestamp” where the timestamp is the one from the end of the copy of the datafiles to the backup set. It is not a special recovery point; it is just a point where the recovery time is minimal. From a user point of view, you decide to clone from a timestamp. The timestamp is probably the point just before the failure (an erroneous update, for example). Additionally, I hope that one day more utilities will be provided to define this point. Log Miner can search the journal of changes for a specific point. Imagine a recovery service where you mention a time window, the name of a table, an operation (like UPDATE),… and it finds the exact point-in-time to recover to. Oracle has Log Miner and Flashback Transaction, and a DBA can use them. A managed service could provide the same. But currently, you need to know the timestamp you want to recover to. The console shows times per half hour, but you can enter them with finer granularity: to the second in the console, to the millisecond with the CLI.
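As an illustration of what such tooling looks like on a database you manage yourself (a hedged sketch, my addition, and not something the Autonomous console offers today), a Flashback Version Query can pinpoint the moment of an offending change:

-- hypothetical sketch: list recent row versions to find the exact time of an erroneous update
select versions_starttime, versions_operation, ts
from demo versions between timestamp systimestamp - interval '1' hour and systimestamp
order by versions_startscn;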

I have a PL/SQL loop constantly updating a timestamp in a DEMO table:


SQL> exec loop update demo set ts=sys_extract_utc(current_timestamp); commit; end loop;

I just kept that running.

Later, I created a clone with 21:21:21.000 as the recovery point in time:


[opc@a demo]$ oci db autonomous-database create-from-backup-timestamp --clone-type FULL --timestamp 2020-11-14T21:21:21.000Z --db-name clone --cpu-core-count 1 --data-storage-size-in-tbs 1 --admin-password COVID-19-Go-Home --autonomous-database-id ocid1.autonomousdatabase.oc1.iad.abuw...

and once opened, I checked the state of my table:


SQL> select * from demo;

TS                       
------------------------ 
2020-11-14T21:21:19.368Z 

This is less than 2 seconds before the mentioned recovery point-in-time. RPO is minimal.

Recovery window

With the Oracle Autonomous Database, you always have the possibility to recover to any point in time during the past 2 months. Always: without any additional configuration, without additional cost, and even in the always free tier. You can run additional manual backups but you cannot remove those automatic ones, and you cannot reduce this recovery window.
Here is the documentation:
About Backup and Recovery on Autonomous Data Warehouse
Autonomous Data Warehouse automatically backs up your database for you. The retention period for backups is 60 days. You can restore and recover your database to any point-in-time in this retention period.

This is great. You don’t have to worry about anything in advance. And if something happens (it can be a user error dropping a table or updating the wrong rows, an application release that messes up everything, a legal request to look at past data…) you can create a clone of the database at any state within this 60-day timeframe. And these 60 days are the guaranteed minimum. I have a small free-tier database where I can still see the backups from the past 6 months:

I have another database that has been stopped for 2 months:

Only one backup is there, because the database has not been opened in the past two months (my current timestamp is 2020-09-17 15:45:50). This is important to know. Some cloud providers will start your managed database to back it up and apply updates. For example, in AWS RDS a stopped database stays stopped for only seven days and is automatically started afterwards. But for the Oracle Autonomous Database, the database is actually a PDB, and stopped means closed while the instance is still running. And at least one backup remains!

Multiple backups

This is a very important point. If you rely only on restoring a backup, without the possibility to recover to a further Point In Time, you run the risk that any problem in the backup compromises your recovery. Here, even if the last backup has a problem (you should always design for failure, even if its probability is low), I can go back to the previous one and recover. It will take longer (24 hours of activity to recover), but the data will be there, and consistent, to the requested point in time.

When you know how Oracle Database backups work, you know that any backup retention configuration mentions either a number of backups (please, always more than one) or a recovery window (which hopefully will contain more than one), but not both:


RMAN> show all;

using target database control file instead of recovery catalog
RMAN configuration parameters for database with db_unique_name CDB1A_IAD154 are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default

RMAN> configure retention policy to recovery window of 60 days;

new RMAN configuration parameters:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 60 DAYS;
new RMAN configuration parameters are successfully stored

In a managed database you don’t have to configure this. But knowing how it works will give you confidence and trust in the data protection. Here for my closed database I can see only one backup.

Limitation in RTO (Recovery Time Objective)

How long does it take to restore the database? This is proportional to the size. But when you understand that a restore is not sufficient, and that recovery must apply the redo logs to bring all database files to a state consistent with the recovery point-in-time, you know it can take longer. As automatic backups are taken every day, you may have up to 24 hours of activity to recover. And understanding this may help: if you can select a point-in-time not too far after the end of a backup, then you can minimize the RTO. This is where you may choose “select the backup from the list” rather than a “Point In Time clone”. But then, you may increase the RPO.

My main message is: even in a managed service where you are not the DBA, it is important to understand how it works. There is nothing like “restoring a backup”. Recovery always happens, and your goal when you select the recovery point-in-time is to minimize the RTO for the required RPO. Recovery has to restore the archived logs, read them and apply the relevant redo vectors. And with the Oracle Autonomous Database, this redo stream is common to all databases (redo is at CDB level, if you know the multitenant architecture). This can take time for your database even if you didn’t have a lot of updates in it (your database is a PDB). Most of the redo will be discarded, but it has to be restored and read sequentially. In order to verify that, I’ve selected a point-in-time from this morning for the database that was stopped 2 months ago. The last backup is 2 months before the Point In Time to recover to. If you take this as a black box, you may think that this will be very fast: no updates to recover because the database was closed most of the time. But actually, the recovery has to go through 2 months of redo for this database shared by many users. While writing this, the progress bar has been showing “18%” for three hours… This could be avoided if the whole CDB were backed up every day, because a closed autonomous database is just a closed PDB – the files are still accessible for backup. Is that the case? I don’t know.

Finally, this recovery took nearly 5 hours. And I am not sure whether it really restored the datafiles from 2 months ago and then restored this amount of archived logs really fast, or whether it used a more recent backup taken while the database was closed, not visible in the list of backups. Because I see 74 terabytes of archived logs for those 60 days:


select dbms_xplan.format_size(sum(blocks*block_size))
from gv$archived_log where dest_id=1 and first_time>sysdate-60;

DBMS_XPLAN.FORMAT_SIZE(SUM(BLOCKS*BLOCK_SIZE)) 
---------------------------------------------- 
74T                                            

And I would be very surprised if they could restore and recover this amount in less than 5 hours… But who knows? We don’t have the logs in a managed database.

Note that this is a managed database on shared infrastructure. You also have the choice to provision a dedicated autonomous database when you need better control and isolation. What I’m saying is that if you don’t know how it works, you will make mistakes. Like this one: as the database was stopped for 2 months, I should have selected a point-in-time closer to the last backup. It could have taken 1 hour instead of 5 hours (I tested it on the same database). I’ve been working on production databases for 20 years, in many companies. I can tell you that when a problem happens in production and you have to recover a database, you will be surrounded by managers asking, every five minutes, how long the recovery will take. The better you understand the recovery process, the more comfortable you will be.

Just imagine that you are in this situation and the only information you have is this 18% progress bar that has been sitting there for two hours.

When I understand what happens (2 months of shared archived logs to restore), I can take another action, or at least explain why it takes so long. Here, even if I have no access to the recovery logs of this managed database, I can guess that it restores and recovers all archived logs from the past 60 days. It cannot be fast, and it is hard to predict on a shared infrastructure, as it depends on the neighbours’ activity.

Limitation in RPO (Recovery Point Objective)

Managed services always have more limitations, because the automation behind them must standardize everything. When trying to recover to a point in time just 30 minutes in the past, I hit a limitation. I’ve not seen it documented, but the message is clear: Cloning operation failed because the timestamp specified is not at least 2 hours in the past.

I don’t understand exactly the reason. Point-In-Time recovery should be possible even if redo logs are archived every 2 hours, which is not the case anyway:


DEMO@atp1_tp> select * from v$archive_dest where status='VALID';

   DEST_ID             DEST_NAME    STATUS      BINDING    NAME_SPACE     TARGET    ARCHIVER    SCHEDULE                  DESTINATION    LOG_SEQUENCE    REOPEN_SECS    DELAY_MINS    MAX_CONNECTIONS    NET_TIMEOUT    PROCESS    REGISTER    FAIL_DATE    FAIL_SEQUENCE    FAIL_BLOCK    FAILURE_COUNT    MAX_FAILURE    ERROR    ALTERNATE    DEPENDENCY    REMOTE_TEMPLATE    QUOTA_SIZE    QUOTA_USED    MOUNTID    TRANSMIT_MODE    ASYNC_BLOCKS    AFFIRM      TYPE    VALID_NOW      VALID_TYPE    VALID_ROLE    DB_UNIQUE_NAME    VERIFY    COMPRESSION    APPLIED_SCN    CON_ID    ENCRYPTION
__________ _____________________ _________ ____________ _____________ __________ ___________ ___________ ____________________________ _______________ ______________ _____________ __________________ ______________ __________ ___________ ____________ ________________ _____________ ________________ ______________ ________ ____________ _____________ __________________ _____________ _____________ __________ ________________ _______________ _________ _________ ____________ _______________ _____________ _________________ _________ ______________ ______________ _________ _____________
         1 LOG_ARCHIVE_DEST_1    VALID     MANDATORY    SYSTEM        PRIMARY    ARCH        ACTIVE      USE_DB_RECOVERY_FILE_DEST              3,256            300             0                  1              0 ARCH       YES                                     0             0                0              0          NONE         NONE          NONE                           0             0          0 SYNCHRONOUS                    0 NO        PUBLIC    YES          ALL_LOGFILES    ALL_ROLES     NONE              NO        DISABLE                     0         0 DISABLE


DEMO@atp1_tp> select stamp,dest_id,thread#,sequence#,next_time,backup_count,name from v$archived_log where next_time > sysdate-2/24;

           STAMP    DEST_ID    THREAD#    SEQUENCE#              NEXT_TIME    BACKUP_COUNT                                                                     NAME
________________ __________ __________ ____________ ______________________ _______________ ________________________________________________________________________
   1,056,468,560          1          1        3,255 2020-11-14 15:29:14                  1 +RECO/FEIO1POD/ARCHIVELOG/2020_11_14/thread_1_seq_3255.350.1056468555
   1,056,468,562          1          2        3,212 2020-11-14 15:29:16                  1 +RECO/FEIO1POD/ARCHIVELOG/2020_11_14/thread_2_seq_3212.376.1056468557
   1,056,472,195          1          2        3,213 2020-11-14 16:29:51                  1 +RECO/FEIO1POD/ARCHIVELOG/2020_11_14/thread_2_seq_3213.334.1056472191
   1,056,472,199          1          1        3,256 2020-11-14 16:29:52                  1 +RECO/FEIO1POD/ARCHIVELOG/2020_11_14/thread_1_seq_3256.335.1056472193


DEMO@atp1_tp> select * from v$logfile;

   GROUP#    STATUS      TYPE                                             MEMBER    IS_RECOVERY_DEST_FILE    CON_ID
_________ _________ _________ __________________________________________________ ________________________ _________
        2           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_2.276.1041244447    NO                               0
        2           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_2.262.1041244471    YES                              0
        1           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_1.275.1041244447    NO                               0
        1           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_1.261.1041244471    YES                              0
        5           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_5.266.1041245263    NO                               0
        5           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_5.266.1041245277    YES                              0
        6           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_6.268.1041245295    NO                               0
        6           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_6.267.1041245307    YES                              0
        3           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_3.281.1041245045    NO                               0
        3           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_3.263.1041245057    YES                              0
        4           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_4.282.1041245077    NO                               0
        4           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_4.264.1041245089    YES                              0
        7           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_7.267.1041245327    NO                               0
        7           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_7.268.1041245339    YES                              0
        8           ONLINE    +DATA/FEIO1POD/ONLINELOG/group_8.279.1041245359    NO                               0
        8           ONLINE    +RECO/FEIO1POD/ONLINELOG/group_8.269.1041245371    YES                              0

As you can see, the Autonomous Database has two threads (a two-node RAC) with four online redo log groups per thread (the Autonomous Database is not protected by Data Guard, even if you enable Autonomous Data Guard… but that’s another story) and two members each. And the online logs are archived and backed up. Everything is there to be able to recover within the past 2 hours. But those archived logs are probably shipped to a dedicated destination to be recovered by the PITR feature “clone from timestamp”.

Anyway, this is a managed service and you must make do with it. You don’t want the DBA responsibility, and in exchange you lose some control. Again, if you understand it, everything is fine. In case of failure, you can start by creating a clone from 2 hours ago and look at what you can do to repair the initial mistake. And two hours later, you know that you can create another clone recovered to the last second before the failure.

In summary

The Oracle Autonomous Database does not provide all the possibilities available when you manage the database yourself, but it is still at the top of the main cloud-managed database services: the RPO is at the second for any point between 2 hours and 60 days ago, and the RTO is a few hours even when dealing with terabytes. This, without anything to configure or pay in addition to the database service, and it includes the always free database. If the database is stopped during the backup window, the backup is skipped, but at least one backup remains even if it falls out of the retention window.

This article, Recovery in the ☁ with Oracle Autonomous Database, first appeared on the dbi services blog.

Configure Data Guard with odacli on ODA


Introduction

I always thought the integration of Data Guard was a missing feature on the Oracle Database Appliance. For sure, you can configure Data Guard manually as you do on a classic server, but as an appliance, the ODA should have proposed an easier way to configure it. After years of waiting for this feature, it’s now available, and I just tested it. I’m here to give you my feedback.

Disclaimer

You’ll need 2 ODAs to build this configuration. Data Guard on a single ODA is not supported and would make no sense.

I did this test on 2 brand new X8-2M ODAs deployed with the latest version 19.9. It should also work starting from 19.8, as it was the first version to include this feature.

The DEMO database used in this test is also version 19.9. I don’t know whether this works with older database versions; it’s probably OK for 18c and 12.2, but I’m not sure about 12.1 and 11.2.

This configuration through odacli is linked to your ODA system configuration, especially regarding the domain. If you do not want to use the db_domain on your primary database and you removed this parameter, you have to know that the standby will anyway be configured with the db_domain inherited from the system’s DB domain, and the Data Guard configuration will fail at step 7.

Another piece of advice regarding the use of this feature: you’ll need the db_create_online_log_dest_1 parameter to be set. I’m used to removing this parameter to allow multiplexing of the redo logs between db_create_file_dest and db_recovery_file_dest (the default behavior when no db_create_online_log_dest_X is configured), but it’s better to use both db_create_online_log_dest_1 and db_create_online_log_dest_2 when using odacli’s Data Guard feature.
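Here is a hedged sketch of how to check and set those parameters before running the odacli Data Guard configuration (the +REDO and +RECO disk group names are assumptions, adapt them to your ODA storage layout):

-- check what is currently configured
show parameter db_create_online_log_dest

-- hypothetical values, to adapt to your disk groups
alter system set db_create_online_log_dest_1='+REDO' scope=both sid='*';
alter system set db_create_online_log_dest_2='+RECO' scope=both sid='*';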

You should also be aware that already having your own backup configuration, set up without odacli, can be a problem. Make sure you understand that creating a backup configuration for a database will interfere with your existing backup configuration.

Data Guard configuration is related to a database: you’ll need to apply this procedure for each database for which you want a disaster recovery option.

Data Guard is free of charge with Enterprise Edition, but make sure you never open your standby database in read-only mode: opening a standby database read only requires the Active Data Guard option. Standard Edition does not support Data Guard, but you can use Dbvisit Standby as a third-party disaster recovery solution.
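A quick way to verify that a standby has not been accidentally opened read only (a minimal example, to run on the standby):

-- MOUNTED is what you want on a standby without the Active Data Guard option
select database_role, open_mode from v$database;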

Finally, I would recommend testing this feature before applying it to your production databases, the goal being to decide whether it fits your needs or not. You can still configure Data Guard manually as you always did.

Create a backup configuration for your database

If you’re not familiar with backup configurations on ODA, you just need to know that you can manage your backups from odacli. It’s not a new feature and it’s not used very often, but it can simplify your job, and it relies on RMAN, meaning that it’s reliable.

To configure Data Guard on ODA you first need to create a backup configuration for your primary. You can back up to OCI (the Oracle public cloud) or to NFS; most of us will probably use a classic NFS share. Let’s consider an NFS share /Rback/DEMO for our DEMO database, and a 10-day retention:

odacli create-backupconfig -n nfsRbackDEMO -w 10 -d NFS -c /Rback/DEMO

Backupconfig creation is fast:

odacli describe-job -i "15066dd7-485c-4bd9-9cdb-4259252b9ede"

Job details
----------------------------------------------------------------
                     ID:  15066dd7-485c-4bd9-9cdb-4259252b9ede
            Description:  create backup config:nfsRbackDEMO
                 Status:  Success
                Created:  November 18, 2020 9:06:46 AM CET
                Message:  backup config creation

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Backup config metadata persist           November 18, 2020 9:06:47 AM CET    November 18, 2020 9:06:47 AM CET    Success

Associate the backup config to the database

This backup configuration now needs to be linked to your DEMO database:

odacli modify-database -in DEMO -bin nfsRbackDEMO

odacli describe-job -i "b8f70d11-bc2f-49e2-81e3-5072305302ef"

Job details
----------------------------------------------------------------
                     ID:  b8f70d11-bc2f-49e2-81e3-5072305302ef
            Description:  Modify database : DEMO
                 Status:  Success
                Created:  November 18, 2020 9:14:29 AM CET
                Message:  Modify database

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Validate OMF parameter values            November 18, 2020 9:14:31 AM CET    November 18, 2020 9:14:32 AM CET    Success
update db with backupconfig attributes   November 18, 2020 9:14:39 AM CET    November 18, 2020 9:14:42 AM CET    Success
Enable Database Autobackup               November 18, 2020 9:14:42 AM CET    November 18, 2020 9:14:42 AM CET    Success
Enable Archivelog Autobackup             November 18, 2020 9:14:42 AM CET    November 18, 2020 9:14:42 AM CET    Success
Configure Control file Auto Backup Format November 18, 2020 9:14:42 AM CET    November 18, 2020 9:14:44 AM CET    Success
Backup Current Control file              November 18, 2020 9:14:44 AM CET    November 18, 2020 9:14:50 AM CET    Success
Update metadata for database:DEMO        November 18, 2020 9:14:50 AM CET    November 18, 2020 9:14:50 AM CET    Success

As you can see, applying this backup configuration changes some RMAN parameters, for example the path for the controlfile autobackup.
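
If you want to check exactly what was changed, you can still query the persistent RMAN settings directly on the primary; a minimal sketch, outside of the odacli workflow:

rman target /
show controlfile autobackup format;
show all;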

Create a first backup of your primary with odacli

Now you can easily take an incremental level 0 backup of your database (an incremental level 0 is actually a full backup). The -ka option keeps the archivelogs on disk:

odacli create-backup --backupType Regular-L0 -in DEMO -ka

odacli describe-job -i 56a941f7-2f26-4d87-b159-1e02de183db6

Job details
----------------------------------------------------------------
                     ID:  56a941f7-2f26-4d87-b159-1e02de183db6
            Description:  Create Regular-L0 Backup with TAG-auto for Db:DEMO in NFS:/Rback/DEMO/orabackups/oda00770p/database/3800776698/DEMO_770P
                 Status:  Success
                Created:  November 18, 2020 9:18:07 AM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Validate backup config                   November 18, 2020 9:18:09 AM CET    November 18, 2020 9:18:09 AM CET    Success
NFS location existence validation        November 18, 2020 9:18:09 AM CET    November 18, 2020 9:18:09 AM CET    Success
Backup Validations                       November 18, 2020 9:18:09 AM CET    November 18, 2020 9:18:11 AM CET    Success
Recovery Window validation               November 18, 2020 9:18:11 AM CET    November 18, 2020 9:18:13 AM CET    Success
Archivelog deletion policy configuration November 18, 2020 9:18:13 AM CET    November 18, 2020 9:18:14 AM CET    Success
Database backup                          November 18, 2020 9:18:14 AM CET    November 18, 2020 9:19:00 AM CET    Success

The backup can take a few minutes to a few hours depending on your database size.
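
To double-check what odacli actually produced, you can list the backup pieces with plain RMAN; a minimal sketch:

rman target /
list backup summary;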

Export the backup report to a json file

Each backup done with odacli generates a backup report. You’ll need to save the backup report’s description in a json file that will be used on the standby server. In this example, I put the json file on the NFS share to avoid having to copy it from the primary to the standby server.

odacli list-backupreports | tail -n 2 | grep -v ^$ | awk '{print $1}'
1b64bcc5-dec7-4373-bf95-f8bdd18bdf08

odacli describe-backupreport -i 1b64bcc5-dec7-4373-bf95-f8bdd18bdf08 > /Rback/DEMO/br_DEMO_`date +"%Y%m%d_%H%M%S"`.json

Restore the database on standby server

Assuming your standby server also has the same NFS share mounted, you can restore the database on it. The irestore-database feature creates the database with odacli, and if you don’t specify a dbhome, a new dbhome will be created. So don’t forget to pick the dbhome corresponding to your needs if you don’t want a new one. You also need to give odacli a new db_unique_name for your database (having different db_unique_names for primary and standby is a requirement for Data Guard) and specify the database role: STANDBY.

odacli list-dbhomes

odacli irestore-database -dh 82559355-db72-4616-b968-13b8509466fc -r /Rback/DEMO/br_DEMO_20201118_092635.json -u DEMO_769P -ro STANDBY
Password for SYS user:
{
  "jobId" : "775b58d4-8296-41fa-9c89-7d8761c0b69b",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "November 18, 2020 11:35:20 AM CET",
  "resourceList" : [ ],
  "description" : "Database service recovery with db name: DEMO",
  "updatedTime" : "November 18, 2020 11:35:20 AM CET"
}

The restore can take minutes to hours depending on the size of the database.

odacli describe-job -i "775b58d4-8296-41fa-9c89-7d8761c0b69b"

Job details
----------------------------------------------------------------
                     ID:  775b58d4-8296-41fa-9c89-7d8761c0b69b
            Description:  Database service recovery with db name: DEMO
                 Status:  Success
                Created:  November 18, 2020 11:35:20 AM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Check if cluster ware is running         November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Creating DbStorage for DbRestore         November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Validating DiskSpace for DATA            November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Generating SSH key                       November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
SSH key                                  November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
SSH key scan                             November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Create TDE And Audit Dir Locations       November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Create pfile for Auxiliary Instance      November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:21 AM CET   Success
Deleting FRA                             November 18, 2020 11:35:21 AM CET   November 18, 2020 11:35:22 AM CET   Success
Rman duplicate                           November 18, 2020 11:35:22 AM CET   November 18, 2020 11:37:49 AM CET   Success
Delete RECO FileGroup DEMO_769P          November 18, 2020 11:37:49 AM CET   November 18, 2020 11:37:50 AM CET   Success
Create RECO FileGroup DEMO_769P          November 18, 2020 11:37:50 AM CET   November 18, 2020 11:37:50 AM CET   Success
Delete RECO FileGroup DEMO_769P_9999     November 18, 2020 11:37:50 AM CET   November 18, 2020 11:37:50 AM CET   Success
Creating pfile from spfile               November 18, 2020 11:37:50 AM CET   November 18, 2020 11:37:51 AM CET   Success
Set PFile Ownership                      November 18, 2020 11:37:51 AM CET   November 18, 2020 11:37:51 AM CET   Success
Customize Db Parameters                  November 18, 2020 11:37:51 AM CET   November 18, 2020 11:37:52 AM CET   Success
Shutdown And Start database              November 18, 2020 11:37:52 AM CET   November 18, 2020 11:38:33 AM CET   Success
Create spfile for restore db             November 18, 2020 11:38:33 AM CET   November 18, 2020 11:38:34 AM CET   Success
Set PFile Ownership                      November 18, 2020 11:38:34 AM CET   November 18, 2020 11:38:34 AM CET   Success
Shutdown And Mount database              November 18, 2020 11:38:34 AM CET   November 18, 2020 11:39:19 AM CET   Success
Enable block change tracking             November 18, 2020 11:39:19 AM CET   November 18, 2020 11:39:27 AM CET   Success
Register Database taskflow               November 18, 2020 11:39:37 AM CET   November 18, 2020 11:42:11 AM CET   Success
Create SPFile in shared loc              November 18, 2020 11:39:37 AM CET   November 18, 2020 11:39:52 AM CET   Success
Delete Local Spfile                      November 18, 2020 11:39:52 AM CET   November 18, 2020 11:39:52 AM CET   Success
Register DB with clusterware             November 18, 2020 11:39:52 AM CET   November 18, 2020 11:40:36 AM CET   Success
Set SysPassword and Create PwFile        November 18, 2020 11:40:36 AM CET   November 18, 2020 11:40:38 AM CET   Success
Creating pfile                           November 18, 2020 11:40:38 AM CET   November 18, 2020 11:40:38 AM CET   Success
Updating db env                          November 18, 2020 11:40:38 AM CET   November 18, 2020 11:40:39 AM CET   Success
Enable DbSizing Template                 November 18, 2020 11:40:39 AM CET   November 18, 2020 11:41:25 AM CET   Success
Create tns entry                         November 18, 2020 11:41:25 AM CET   November 18, 2020 11:41:26 AM CET   Success
Running datapatch                        November 18, 2020 11:41:26 AM CET   November 18, 2020 11:41:27 AM CET   Success
Set CPU pool                             November 18, 2020 11:41:27 AM CET   November 18, 2020 11:41:27 AM CET   Success
Reset Associated Networks                November 18, 2020 11:42:11 AM CET   November 18, 2020 11:42:13 AM CET   Success
Copy Pwfile to Shared Storage            November 18, 2020 11:42:13 AM CET   November 18, 2020 11:42:17 AM CET   Success

Copy the TrustStore file from standby server to primary

You need to copy a file from the standby server to the primary server. This file will only be used during the initial Data Guard configuration:

scp /opt/oracle/dcs/dcscli/cacerts root@oda00770p.intranet.ch:/opt/oracle/dcs/dcscli/cacerts.oda00769p

Create a json file for Data Guard configuration

You can configure Data Guard in interactive mode, but it’s better to prepare a json file with all the parameters. You can later duplicate this json file for your other databases.

vi /home/oracle/scripts/DEMO_DTG.json

{
  "name": "DEMO_DTG",
  "protectionMode": "MAX_AVAILABILITY",
  "enableFlashback": true,
  "enableActiveDg": false,
  "replicationGroups": [
    {
      "sourceEndPoints": [
        {
          "endpointType": "PRIMARY",
          "hostName": "oda00770p.intranet.ch",
          "listenerPort": 1521,
          "databaseUniqueName": "DEMO_770P",
          "ipAddress": "125.150.226.239"
        }
      ],
      "targetEndPoints": [
        {
          "endpointType": "STANDBY",
          "hostName": "oda00769p.intranet.ch",
          "listenerPort": 1521,
          "databaseUniqueName": "DEMO_769P",
          "ipAddress": "125.150.226.237"
        }
      ],
      "transportType": "SYNC"
    }
  ]
}

Configure Data Guard with odacli

Now you can configure Data Guard with your json file as an input. You will be asked to provide the standby server’s name, the password of the ODA web console, the path of the file you copied from standby to primary (the TrustStore file), the password of this TrustStore, and the primary SYS password. The password for the TrustStore can be found on the standby server here:

cat /opt/oracle/dcs/dcscli/dcscli.conf | grep TrustStorePassword
TrustStorePassword=changeit


odacli configure-dataguard -r /home/oracle/scripts/DEMO_DTG.json
Standby site address: oda00769p.intranet.ch
BUI username for Standby site (default: oda-admin):
BUI password for Standby site:
TrustStore path for Standby DCS server: /opt/oracle/dcs/dcscli/cacerts.oda00769p
TrustStore password for Standby DCS server:
Primary database syspassword:
The TrustStore of the Standby site should be removed unless it is necessary for some other purpose. Do you want to delete it? (Y/N, default:Y): N
As a requirement, Standby database’s SYS password will be set to Primary database’s after Data Guard configuration. Are you sure you want to proceed with Data Guard configuration? (Y/N): Y
*******************************************************************************************
Configure Data Guard DEMO_DTG started
*******************************************************************************************
Step 1: Validate Data Guard configuration request (Primary site)
Description: Validate DG Config Creation
Job ID: b666ee0c-225a-4228-aadf-64eb14ef2dff
Started November 18, 2020 15:29:54 PM CET
Validate create Data Guard configuration request
Finished November 18, 2020 15:29:58 PM CET
*******************************************************************************************
Step 2: Validate Data Guard configuration request (Standby site)
Description: Validate DG Config Creation
Job ID: 5bca9b36-9920-4991-ba23-b45043af639c
Started November 18, 2020 15:29:58 PM CET
Validate create Data Guard configuration request
Finished November 18, 2020 15:30:00 PM CET
*******************************************************************************************
Step 3: Download password file from Primary database (Primary site)
Description: Download orapwd file from Primary database
Started November 18, 2020 15:30:00 PM CET
Prepare orapwd file for Primary database DEMO
Finished November 18, 2020 15:30:04 PM CET
*******************************************************************************************
Step 4: Upload password file to Standby database (Standby site)
Description: Upload orapwd file to Standby database
Started November 18, 2020 15:30:04 PM CET
Write orapwd file to Standby database DEMO
Finished November 18, 2020 15:30:10 PM CET
*******************************************************************************************
Step 5: Configure Primary database (Primary site)
Description: DG Config service creation - ConfigurePrimary
Job ID: de288bc9-579f-4a1c-8db4-0328d3cfcc1e
Started November 18, 2020 15:30:10 PM CET
Configure host DNS on primary env
Configure Data Guard Tns on primary env
Enable Data Guard related Db parameters for primary env
Enable force logging and archivelog mode in primary env
Enable FlashBack
Configure network parameters for local listener on primary env
Restart listener on primary env
Create services for primary db
Finished November 18, 2020 15:30:24 PM CET
*******************************************************************************************
Step 6: Configure Standby database (Standby site)
Description: DG Config service creation - ConfigureStandby
Job ID: 0b3dbf45-5023-4b5f-a978-a8a512e58c5f
Started November 18, 2020 15:30:24 PM CET
Configure Data Guard Tns on standby env
Configure host DNS on standby env
Clear Data Guard related Db parameters for standby env
Enable Data Guard related Db parameters for standby env
Enable force logging and archivelog mode in standby env
Populate standby database metadata
Configure network parameters for local listener on standby env
Reset Db sizing and hidden parameters for ODA best practice
Restart Listener on standby env
Create services for standby db
Finished November 18, 2020 15:31:42 PM CET
*******************************************************************************************
Step 7: Configure and enable Data Guard (Primary site)
Description: DG Config service creation - ConfigureDg
Job ID: c89f51ef-7dc5-410b-8421-f7892353babc
Started November 18, 2020 15:31:42 PM CET
Config and enable Data Guard
Post check Data Guard configuration
Finished November 18, 2020 15:33:09 PM CET
*******************************************************************************************
Step 8: Enable Flashback (Standby site)
Description: DG Config service creation - EnableFlashback
Job ID: cdd1899c-f7be-40e9-8bc1-0396acd62c20
Started November 18, 2020 15:33:09 PM CET
Enable FlashBack
Finished November 18, 2020 15:33:19 PM CET
*******************************************************************************************
Step 9: Re-enable Data Guard (Primary site)
Description: DG Config service creation - ReenableDg
Job ID: 391fe101-b286-4664-85a7-09a9679a5027
Started November 18, 2020 15:33:19 PM CET
Post check Data Guard configuration
Finished November 18, 2020 15:33:21 PM CET
*******************************************************************************************
Step 10: Create Data Guard status (Primary site)
Description: DG Status service creation - NewDgconfig
Job ID: 587f7f95-8a4f-4be6-aaa4-a1b6f4fdffa3
Started November 18, 2020 15:33:21 PM CET
Create Data Guard status
Finished November 18, 2020 15:33:23 PM CET
*******************************************************************************************
Step 11: Create Data Guard status (Standby site)
Description: DG Status service creation - NewDgconfig
Job ID: d1cee908-3333-4f1b-b91d-70ae9f13f7dd
Started November 18, 2020 15:33:23 PM CET
Create Data Guard status
Finished November 18, 2020 15:33:25 PM CET
*******************************************************************************************
Configure Data Guard DEMO_DTG completed
*******************************************************************************************

Now the configuration is done. Note that this configuration is not a single job but an interactive process that spans multiple jobs on your 2 ODAs.

Is it really working?

That was my first question. Let’s check:

oracle@oda00770p:/home/oracle/ [DEMO] dgmgrl /
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Wed Nov 18 15:54:40 2020
Version 19.9.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "DEMO_770P"
Connected as SYSDG.
DGMGRL> show configuration;

Configuration - DEMO_DTG

  Protection Mode: MaxAvailability
  Members:
  DEMO_770P - Primary database
    DEMO_769P - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 19 seconds ago)




DGMGRL> show database 'DEMO_769P';

Database - DEMO_769P

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 0 Byte/s
  Real Time Query:    OFF
  Instance(s):
    DEMO

Database Status:
SUCCESS

Yes it’s working fine.

Switchover test

To make sure it’s working, let’s do a switchover from the primary server. You first need to pick up the dataguardstatus ID:

odacli list-dataguardstatus

ID                                       Name                             Database Name        Role       Protection Mode    Apply Lag       Transport Lag   Apply Rate      Status
---------------------------------------- -------------------------------- -------------------- ---------- ------------------ --------------- --------------- --------------- ----------
d859288f-29fd-45a6-b1b3-c4bb8b52dfd7     DEMO_DTG                         DEMO                 Primary    MaxAvailability    12 seconds      0 seconds       528.00 KByte/s  Configured

odacli describe-dataguardstatus -i d859288f-29fd-45a6-b1b3-c4bb8b52dfd7
Dataguard Status details
----------------------------------------------------------------
                     ID: d859288f-29fd-45a6-b1b3-c4bb8b52dfd7
                   Name: DEMO_DTG
          Database Name: DEMO
                   Role: Primary
        Protection Mode: MaxAvailability
              Apply Lag: 0 seconds
          Transport Lag: 0 seconds
             Apply Rate: 0 Byte/s
                 Status: Configured
           Updated Time: November 18, 2020 3:56:37 PM CET


odacli switchover-dataguard -i d859288f-29fd-45a6-b1b3-c4bb8b52dfd7 -u DEMO_769P


odacli describe-job -i "11d03cf3-d9a1-467b-8226-4363732161d5"

Job details
----------------------------------------------------------------
                     ID:  11d03cf3-d9a1-467b-8226-4363732161d5
            Description:  Dataguard Lifecycle operation service - SwitchoverDg
                 Status:  Success
                Created:  November 18, 2020 3:59:18 PM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Precheck switchover DataGuard            November 18, 2020 3:59:18 PM CET    November 18, 2020 3:59:21 PM CET    Success
Switchover DataGuard                     November 18, 2020 3:59:21 PM CET    November 18, 2020 4:00:15 PM CET    Success
Postcheck switchover DataGuard           November 18, 2020 4:00:15 PM CET    November 18, 2020 4:00:16 PM CET    Success
Check if DataGuard config is updated     November 18, 2020 4:00:16 PM CET    November 18, 2020 4:00:36 PM CET    Success

Switchover is OK. Let’s check with dgmgrl:

oracle@oda00769p:/home/oracle/ [DEMO] dgmgrl /
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Wed Nov 18 16:16:14 2020
Version 19.9.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "DEMO_769P"
Connected as SYSDG.
DGMGRL> show configuration;

Configuration - DEMO_DTG

  Protection Mode: MaxAvailability
  Members:
  DEMO_769P - Primary database
    DEMO_770P - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 37 seconds ago)

DGMGRL> show database 'DEMO_770P';

Database - DEMO_770P

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 14.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    DEMO

Database Status:
SUCCESS

Nice!

Switchback test

Now I’m quite confident about the switchback. Let’s try it from the standby server:

[root@oda00769p ~]# odacli list-dataguardstatus

ID                                       Name                             Database Name        Role       Protection Mode    Apply Lag       Transport Lag   Apply Rate      Status
---------------------------------------- -------------------------------- -------------------- ---------- ------------------ --------------- --------------- --------------- ----------
d859288f-29fd-45a6-b1b3-c4bb8b52dfd7     DEMO_DTG                         DEMO                 Standby    MaxAvailability    0 seconds       0 seconds       2.00 KByte/s    Configured

[root@oda00769p ~]# odacli describe-dataguardstatus -i d859288f-29fd-45a6-b1b3-c4bb8b52dfd7
Dataguard Status details
----------------------------------------------------------------
                     ID: d859288f-29fd-45a6-b1b3-c4bb8b52dfd7
                   Name: DEMO_DTG
          Database Name: DEMO
                   Role: Primary
        Protection Mode: MaxAvailability
              Apply Lag: 0 seconds
          Transport Lag: 0 seconds
             Apply Rate: 14.00 KByte/s
                 Status: Configured
           Updated Time: November 18, 2020 4:17:12 PM CET




[root@oda00769p ~]#  odacli switchover-dataguard -i d859288f-29fd-45a6-b1b3-c4bb8b52dfd7 -u DEMO_770P
Password for target database:
{
  "jobId" : "99bbbaff-4716-4d84-ada6-1da256a93242",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "November 18, 2020 16:19:15 PM CET",
  "resourceList" : [ ],
  "description" : "Dataguard Lifecycle operation service - SwitchoverDg",
  "updatedTime" : "November 18, 2020 16:19:15 PM CET"
}
You have new mail in /var/spool/mail/root
[root@oda00769p ~]# odacli describe-job -i "99bbbaff-4716-4d84-ada6-1da256a93242"


Job details
----------------------------------------------------------------
                     ID:  99bbbaff-4716-4d84-ada6-1da256a93242
            Description:  Dataguard Lifecycle operation service - SwitchoverDg
                 Status:  Success
                Created:  November 18, 2020 4:19:15 PM CET
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Precheck switchover DataGuard            November 18, 2020 4:19:15 PM CET    November 18, 2020 4:19:18 PM CET    Success
Switchover DataGuard                     November 18, 2020 4:19:18 PM CET    November 18, 2020 4:20:06 PM CET    Success
Postcheck switchover DataGuard           November 18, 2020 4:20:06 PM CET    November 18, 2020 4:20:07 PM CET    Success
Check if DataGuard config is updated     November 18, 2020 4:20:07 PM CET    November 18, 2020 4:20:27 PM CET    Success

Yes, it worked. Brilliant. You can also check with dgmgrl or by querying current_scn on the standby database. Everything is fine regarding this feature.
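
For the current_scn check, a minimal sketch would be to run the same query on both sides and verify that the standby follows the primary closely:

sqlplus / as sysdba
select db_unique_name, database_role, current_scn from v$database;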

Conclusion

The Data Guard configuration with odacli is a game changer. Compared to a manual configuration, it’s much easier to do, it’s fast to deploy on all your databases, and it’s reliable because it’s just an interface on top of DGMGRL and RMAN. I highly recommend using this feature! Good job, Oracle ODA team! You just made ODA even better.

The article Configure Data Guard with odacli on ODA appeared first on the dbi services blog.

Control-M/EM Update jobs using Pgadmin tool


Introduction

Hi team,

Today we will work on the Control-M database. We already know how to update it using the CLI and the Control-M GUI, but today we will use a really nice tool named PgAdmin.
It’s a GUI where you can connect to your Control-M PostgreSQL databases and perform administration tasks. It’s freeware and quite complete, so don’t hesitate to test it 🙂
We will see how to update jobs using this tool, and in upcoming blogs we will probably check how to update other Control-M parameters.
So let’s go!

Connect your Control-M databases to PgAdmin

First you must have the PgAdmin tool installed on your PC and your 2 Control-M databases connected (Control-M Server and Control-M/EM, using your database credentials).
Go to this page and enjoy 🙂 (it’s quite easy to install): https://www.pgadmin.org/download/
I will probably write a quick blog post to show you how to connect it and browse the Control-M database.

Once connected, we will select the table named CMR_AJF in the Control-M database (in order to update our job only in the monitoring domain). This table contains the parameters of the active jobs.

Select the job you want to update on the AJF

Update job fields in PgAdmin

Back in PgAdmin, we expand the CMR_AJF table to see the fields inside and select, in the job’s row, the columns we want to update (here we will choose the name and the description).
Save the changes in PgAdmin, then connect to your Control-M server using the controlm user.
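
If you prefer SQL over editing the grid, the same change can also be done from PgAdmin’s Query Tool. This is only a sketch with hypothetical column names (JOBNAME, DESCRIPTION) and a hypothetical ORDERID filter: check the real column names of CMR_AJF in your Control-M version before running anything.

-- hypothetical example, verify the real CMR_AJF column names first
UPDATE cmr_ajf
   SET jobname = 'NEW_JOB_NAME',
       description = 'New description'
 WHERE orderid = 12345;
COMMIT;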

Update job in Control-M database using reset_ecs command

To take the changes into account, we have to run the reset_ecs command as the controlm user (corresponding to a force download).
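
A minimal sketch, assuming the controlm user’s environment is loaded by its profile:

su - controlm
reset_ecs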

Check on workload automation if changes are effective

Back in the monitoring domain of our Control-M GUI, we can see that the job name and description are now updated 🙂

Conclusion

PgAdmin is a user-friendly tool to perform SQL tasks and queries that can ease actions on the Control-M database. It gives you many possibilities to update your databases without needing great knowledge of PostgreSQL syntax, but be careful with what you do on your databases! (You can consult a PostgreSQL admin if you have some doubts.)
Feel free to check out the other dbi bloggers (for example those related to PostgreSQL) and see you next time!

The article Control-M/EM Update jobs using Pgadmin tool appeared first on the dbi services blog.

PostgreSQL 14 will support OUT parameters for procedures


Before PostgreSQL 11 there was no way to work with procedures in PostgreSQL, only functions were supported. Since PostgreSQL 11, procedures are supported, and many waited for that because procedures also brought transaction control (commit/rollback), which is not possible with functions. Next year, when PostgreSQL 14 is released, there will also be support for OUT parameters. Currently only IN, INOUT and VARIADIC are supported. This makes PostgreSQL’s procedures more compatible with Oracle’s implementation, so let’s have a look.

Currently, the only way to return something from a procedure is to use INOUT:

postgres=# select version();
                                                          version                                                          
---------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
(1 row)

postgres=# create table t1 ( a int, b text);
CREATE TABLE
postgres=# insert into t1 values (1,'aa');
INSERT 0 1
postgres=# create or replace procedure p1 ( p_val inout int )
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a
postgres$#     into p_val
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE

This simple procedure has one INOUT parameter and the parameter will contain the result of the select once the procedure is executed:

postgres=# call p1(5);
 p_val 
-------
     1
(1 row)

You can, of course, use multiple INOUT parameters as well:

postgres=# create or replace procedure p1 ( p_val inout int 
postgres(#                                , p_val2 inout text)
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a, b
postgres$#     into p_val, p_val2
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE
postgres=# call p1 (5,'ccc');
 p_val | p_val2 
-------+--------
     1 | aa
(1 row)

But if you try to use an OUT parameter this will not work:

postgres=# create or replace procedure p1 ( p_val out int )
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a
postgres$#     into p_val
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
ERROR:  procedures cannot have OUT arguments
HINT:  INOUT arguments are permitted.
postgres=# 

This will change with PostgreSQL 14:

postgres=# select version();
                                                  version                                                  
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
(1 row)

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'aaa');
INSERT 0 1
postgres=# create or replace procedure p1 ( p_val out int )
postgres-# as $$
postgres$# declare
postgres$# begin
postgres$#   select a
postgres$#     into p_val
postgres$#    from t1;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE

To test this from PL/pgSQL, you need to declare a variable to hold the value that will be returned:

postgres=# do
postgres-# $$
postgres$# declare 
postgres$#   n int;
postgres$# begin
postgres$#   call p1(n);
postgres$#   raise notice '%', n;
postgres$# end;
postgres$# $$;
NOTICE:  1
DO

Nice.
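
Calling the procedure directly from psql should also work; if I’m not mistaken, in this development version you pass NULL as a placeholder for the OUT argument and the returned p_val is then displayed, just like in the INOUT examples above:

call p1(null);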

The article PostgreSQL 14 will support OUT parameters for procedures appeared first on the dbi services blog.

odacli create-database -io doesn’t work for older databases in 19.9


Introduction

On ODA, you create your databases with odacli create-database. You can create a complete database, which takes about 10 minutes, or you can create an instance-only database. The latter simply and quickly creates a record in the ODA registry and starts the instance with a basic pfile, and that’s it. Then you’re supposed to do the next step yourself: populate the database with a manual restore or an RMAN duplicate.

Creating a 12.2 instance-only database is not OK on ODA 19.9

Unfortunately, creating an instance-only database doesn’t work for me with 19.9. I tried to create a 12.2 instance and it failed immediately:

odacli list-dbhomes

ID                                       Name                 DB Version                               Home Location                                 Status
---------------------------------------- -------------------- ---------------------------------------- --------------------------------------------- ----------
1de83958-c38e-4b9b-9bd3-6c510ce2d3c3     OraDB19000_home1     19.9.0.0.201020                          /u01/app/oracle/product/19.0.0.0/dbhome_1     Configured
c191668f-10de-4184-b22e-b953648bf4a7     OraDB12201_home1     12.2.0.1.201020                          /u01/app/oracle/product/12.2.0.1/dbhome_1     Configured


odacli create-database -dh 'c191668f-10de-4184-b22e-b953648bf4a7' -u ADM001BD_765D -n ADM001BD -r asm -io

odacli describe-job -i b86b28be-0d71-49db-a926-59b98d0c0d6c

Job details
----------------------------------------------------------------
                     ID:  b86b28be-0d71-49db-a926-59b98d0c0d6c
            Description:  Database service creation with db name: ADM001BD
                 Status:  Failure
                Created:  November 19, 2020 9:06:16 AM CET
                Message:  DCS-10001:Internal error encountered: Failed to run sql in method : createTemporaryInstance.Unable to startup standby instance in nomount mode as output conatins ora- :

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Database Service creation                November 19, 2020 9:06:19 AM CET    November 19, 2020 9:07:06 AM CET    Failure
Database Service creation                November 19, 2020 9:06:19 AM CET    November 19, 2020 9:07:06 AM CET    Failure
Setting up ssh equivalance               November 19, 2020 9:06:19 AM CET    November 19, 2020 9:06:19 AM CET    Success
Creating volume datADM001BD              November 19, 2020 9:06:19 AM CET    November 19, 2020 9:06:31 AM CET    Success
Creating volume rdoADM001BD              November 19, 2020 9:06:31 AM CET    November 19, 2020 9:06:43 AM CET    Success
Creating ACFS filesystem for DATA        November 19, 2020 9:06:43 AM CET    November 19, 2020 9:06:54 AM CET    Success
Creating ACFS filesystem for RECO        November 19, 2020 9:06:54 AM CET    November 19, 2020 9:07:05 AM CET    Success
Database Service creation                November 19, 2020 9:07:05 AM CET    November 19, 2020 9:07:06 AM CET    Failure
Auxiliary Instance Creation              November 19, 2020 9:07:05 AM CET    November 19, 2020 9:07:06 AM CET    Failure

Looking in the dcscli.log and dcs-agent.log (under /opt/oracle/dcs/log/) didn’t help. But I found a new init file in the database home:

cat /u01/app/oracle/product/12.2.0.1/dbhome_1/dbs/initADM001BD.ora
db_name=ADM001BD
db_unique_name=ADM001BD_765D
wallet_root=+DATA/ADM001BD_765D,tde_configuration='KEYSTORE_CONFIGURATION=FILE'db_domain=intranet.ch

Is an instance really able to start on this kind of pfile? Let’s try.

cp /u01/app/oracle/product/12.2.0.1/dbhome_1/dbs/initADM001BD.ora /tmp/

su - oracle

export ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/dbhome_1
export ORACLE_SID=ADM001BD
$ORACLE_HOME/bin/sqlplus / as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Thu Nov 19 10:01:20 2020

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/tmp/initADM001BD.ora';
LRM-00101: unknown parameter name 'wallet_root'
LRM-00101: unknown parameter name 'tde_configuration'
ORA-01078: failure in processing system parameters
exit

No, for sure it can’t work. The wallet_root and tde_configuration parameters are for 18c and later databases. Don’t expect this to work on 11.2, 12.1 or 12.2.

Let’s also try to start a 19c instance with this pfile:

export ORACLE_HOME=/u01/app/oracle/product/19.0.0.0/dbhome_1
$ORACLE_HOME/bin/sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Nov 19 09:58:59 2020
Version 19.9.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/tmp/initADM001BD.ora';
ORACLE instance started.

Total System Global Area  436204272 bytes
Fixed Size                  8897264 bytes
Variable Size             285212672 bytes
Database Buffers          134217728 bytes
Redo Buffers                7876608 bytes

Yes, a 19c instance can start with it.

I also tested an instance-only creation with odacli for a 19c database and it worked fine, the init file being the same.
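
For reference, that 19c test used the same kind of command as above, just pointing to the 19c dbhome ID listed earlier (the db name here is a hypothetical one):

odacli create-database -dh '1de83958-c38e-4b9b-9bd3-6c510ce2d3c3' -u DEMO19TST_765D -n DEMO19TST -r asm -io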

Workaround

As my database is in Failed status, my registry is not very clean:

odacli list-databases

ID                                       DB Name    DB Type  DB Version           CDB        Class    Shape    Storage    Status        DbHomeID      
---------------------------------------- ---------- -------- -------------------- ---------- -------- -------- ---------- ------------ ----------------------------------------
8ad09a72-fbcf-410f-854b-cd8c945fb3de     DBTEST     Si       19.9.0.0.201020      false      Oltp     Odb1s    Asm        Configured   1de83958-c38e-4b9b-9bd3-6c510ce2d3c3
1d0de144-c93c-4f26-b00a-6cbbfafda6dd     ADM001BD   Si       12.2.0.1.201020      false      Oltp     Odb1     Asm        Failed       c191668f-10de-4184-b22e-b953648bf4a7

What you can do is delete this database (it will be deleted correctly) and create a complete database instead: it takes longer, but it will succeed whatever version you use:

odacli create-database  -cs WE8ISO8859P15 -no-c -dh 'c191668f-10de-4184-b22e-b953648bf4a7' -u ADM001BD_765D -n ADM001BD -r asm -s odb1s -no-co

When the database creation is complete, you need to remove all the files belonging to it. Let’s generate a script for that purpose:

su - oracle
. oraenv <<< ADM001BD
sqlplus -s / as sysdba

set pages 0
set lines 200
set serveroutput off
set feedback off
spool /tmp/delete_files.txt
select 'asmcmd rm -rf '||file_name  from dba_data_files;
select 'asmcmd rm -rf '||file_name  from dba_temp_files;
select 'asmcmd rm -rf '||member from v$logfile;
select 'asmcmd rm -rf '||value from v$spparameter where name='control_files';
select 'asmcmd rm -rf '||value from v$parameter where name='spfile';
spool off
exit

su - grid
cat /tmp/delete_files.txt
asmcmd rm -rf +DATA/ADM001BD_765D/DATAFILE/undotbs1.298.1056884325                                                                                                                                
asmcmd rm -rf +DATA/ADM001BD_765D/DATAFILE/system.306.1056884289                                                                                                                                  
asmcmd rm -rf +DATA/ADM001BD_765D/DATAFILE/sysaux.304.1056884317                                                                                                                                  
asmcmd rm -rf +DATA/ADM001BD_765D/DATAFILE/users.309.1056884715                                                                                                                                   
asmcmd rm -rf +DATA/ADM001BD_765D/TEMPFILE/temp.310.1056884383                                                                                                                                    
asmcmd rm -rf +RECO/ADM001BD_765D/ONLINELOG/group_3.2390.1056884377                                                                                                                               
asmcmd rm -rf +RECO/ADM001BD_765D/ONLINELOG/group_2.2389.1056884377                                                                                                                               
asmcmd rm -rf +RECO/ADM001BD_765D/ONLINELOG/group_1.2386.1056884377                                                                                                                               
asmcmd rm -rf +RECO/ADM001BD_765D/CONTROLFILE/current.2385.1056884373                                                                                                                             
asmcmd rm -rf +DATA/ADM001BD_765D/PARAMETERFILE/spfile.308.1056884469   
sh /tmp/delete_files.txt
exit

su - oracle
. oraenv <<< ADM001BD
startup force nomount;

startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/u01/app/oracle/product/12.2.0.1/dbhome_1/dbs/initADM001BD.ora'

starting Oracle instance without parameter file for retrieval of spfile
Oracle instance started

Total System Global Area    1073741824 bytes

Fixed Size                     8628936 bytes
Variable Size                293602616 bytes
Database Buffers             763363328 bytes
Redo Buffers                   8146944 bytes

Conclusion

This little bug is a bit disappointing, but there is a workaround. I opened an SR in MOS and will keep you informed if a better workaround is proposed by Oracle.

The article odacli create-database -io doesn’t work for older databases in 19.9 appeared first on the dbi services blog.


Solr Sharding – Shard creation


I wrote this blog in conjunction with the one about the Solr Sharding – Concepts & Methods a year ago and it was 95% complete, but then I had other stuff to do and simply forgot to publish it. It might be a little bit late but well, better late than never… It completes the overview I wanted to share around Alfresco Clustering and in particular around Solr Sharding. In this blog, I will talk about the creation of Shards: the distribution of the different Instances, the global parameters that can be used, and some examples of what can be done, with the concrete commands to achieve it.

I. Shard distribution

One of the strong points of using the Solr Sharding is that you can do pretty much what you want with the distribution of the Shards. It means that you can choose the best distribution of Shards based on your requirements, based on the resources available (Number of Solr Servers, RAM, I/O, …) and then Alfresco can just work with that seamlessly. Let’s consider some situations:

  • 1 Solr Server – 2 Shards – No HA (1 Shard Instance per Shard)
  • 2 Solr Servers – 2 Shards – No HA (1 Shard Instance per Shard)
  • 2 Solr Servers – 2 Shards – HA (2+ Shard Instances per Shard)
  • 6 Solr Servers – 3 Shards – HA (2+ Shard Instances per Shard)

For the above examples, there are several possible representations of the Shard repartition across the different Solr Servers. If we consider all Shards in the same way (same number of Shard Instances for each), then you can represent the above examples as follows:

Sharding distribution 1

If the Shard N°1 is the most important one and you want this one to be available on all Solr Servers with a maximum of 2 Shards per Solr Server, then this is another possible solution:

Sharding distribution 2

There are several ways to distribute the Shards and the good thing is that it is not really linked to the Sharding Method used. What I mean is that you can define the distribution that you want and it will be possible whatever the Sharding Method is. However, in some cases, the Method used might have an impact on whether or not the chosen distribution makes sense. If I take the above example again where the Shard N°1 would be considered the most important one, then it means that you should know exactly what is inside this Shard N°1; but if DB_ID is used, then the content of this Shard will be random… Therefore, even if there is no hard link, some choices might not make a lot of sense if both aspects aren’t considered ;).

Alfresco (ACS only, not Community) provides a graphical representation of the Solr Shards in the Alfresco Administration Console (https://alfresco-dns.domain/alfresco/s/enterprise/admin/admin-flocs). This is very useful to quickly see what is set up, how it is organized and what the status of each and every Shard is: whether they are up-to-date, lagging behind or not running at all. Here is an example of 1 Shard Group composed of 5 Shards with 2 Instances each:

Shard Groups

Shard Instances

In addition to that, I always find it very useful to look at the Solr reports (https://solr-dns.domain/solr/admin/cores?action=REPORT and https://solr-dns.domain/solr/admin/cores?action=SUMMARY) since they give an overview of exactly how many ACLs and nodes are indexed. These are well-known reports for Alfresco, but in case you have never seen them, I would advise you to take a look.
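
If you prefer the command line, the same reports can be fetched with curl (a minimal sketch based on the URLs above; the authentication/SSL options depend on how your Solr is secured):

curl -k "https://solr-dns.domain/solr/admin/cores?action=SUMMARY"
curl -k "https://solr-dns.domain/solr/admin/cores?action=REPORT"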

II. Shard creation parameters

As mentioned in the previous blog related to the Sharding Methods, when creating a Shard Instance, you will need to define some parameters. Here, I will only define them in the URL/Command because it is easier to have everything there. In addition to these parameters specific to each Method, there are some that are global. Before diving into the commands, let’s take a look at a few parameters and what they represent:

  • storeRef: Which Alfresco Store should this Shard index
  • numShards: Maximum number of different Shards that can be created
    • For DB_ID_RANGE, even if you need only 2 Shards at the moment, put more here (like 10/20) to be able to create new Shards in the future as the index grows. Once this limit is defined, you cannot exceed it so keep some margin
    • For all other Methods, set that to the number of Shards needed
  • numNodes: Maximum number of Solr Servers that can be used
    • For DB_ID_RANGE, even if you have only 2 Solr Servers at the moment, put more here (like 10/20) to be able to install new Solr Servers and create Shard Instances in the future. Once this limit is defined, you cannot exceed it so keep some margin
    • For all other Methods, even if you have only 2 Solr Servers at the moment, put more here (like 10/20). Maybe in the future you will want to have the same Shards (fix number) on more Solr Servers. Once this limit is defined, you cannot exceed it so keep some margin
  • nodeInstance: The “ID/Number” of the Solr Server on which the command is going to be executed, from 1 to numNodes, included
  • template: Which template should be used as base to create the new Core. The templates can be found under <SOLR_HOME>/solrhome/templates/
  • coreName: The base name of the Solr Core that this Shard should be part of. If you do not specify one, it will be generated from the Alfresco storeRef (E.g.: “workspace-SpacesStore”). Usual names can be “alfresco” or “archive” for example (Name of the default non-sharded cores)
  • shardIds: An ID that controls the generated Core Name, Base URL and Solr Data storage. This is to differentiate instances of Shards on a single Solr Server. You might think that this is the “ID/Number” of the Shard but not necessarily: with the DB_ID_RANGE for example, you can have the same shardIds used on two different Solr Servers for different Shards because the important thing is the range, it’s not the shardIds parameter in this case. So be careful here, this might be confusing. A best practice is however to have this shardIds set to the real “ID/Number” of the Shard so that it avoids confusion in all cases
  • replicationFactor: Technically speaking, the replication of a Shard Instance is another Shard Instance indexing the same set of nodes (so from the same Shard). You can normally mention the “replicationFactor” parameter in Solr but with the way it was implemented for Alfresco, you don’t need to care about that actually. You will see in the example below that to have several Shard Instances for the same Shard, you only need to execute the same commands on different servers
  • property.XXX: All the properties that can be set in the template’s solrcore.properties as well as the Sharding Method specific ones like:
    • property.data.dir.root: The place where the index will be put
    • property.shard.method: The Sharding Method to be used to create this Shard Instance
    • property.shard.instance: This is supposed to be the real “ID/Number” of the Shard for which a Shard Instance should be created (from 0 to numShards-1)

In summary, with the above parameters:

  • The name of the Core will be: ${coreName}-${shardIds}
  • The Solr Data will be put under: ${data.dir.root}/${coreName}-${shardIds}
  • The Solr Home will be put under: ${solr.solr.home}/${template}--${coreName}--shards--${numShards}-x-${replicationFactor}--node--${nodeInstance}-of-${numNodes}/${coreName}-${shardIds}
  • The Base URL will be: ${solr.baseurl}/${coreName}-${shardIds}
  • The Shard that will be created (unless otherwise): ${property.shard.instance}
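
To make this more concrete, here is how I would expect these rules to resolve for the first DB_ID_RANGE command of section III.a below (template=rerank, coreName=alfresco, shardIds=0, numShards=20, numNodes=12, nodeInstance=1, replicationFactor not specified so assumed to be 1):

  • Core Name: alfresco-0
  • Solr Data: ${data.dir.root}/alfresco-0
  • Solr Home: ${solr.solr.home}/rerank--alfresco--shards--20-x-1--node--1-of-12/alfresco-0
  • Base URL: ${solr.baseurl}/alfresco-0
  • Shard created: 0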

One final note about the parameters: you need to be extra careful with these. I will say it again, but in the case of DB_ID_RANGE for example, Solr will not care about the shardIds or the property.shard.instance, it will only care about what it needs to index, and that is based on the range… If you are using the Dynamic Registration, Alfresco will (and this is where the confusion can start) show you wrong information in the Admin Console > Search Services Sharding, because it bases itself on the “property.shard.instance” to know which Shard this instance belongs to (or is at least supposed to…). This is just a graphical representation of the Shard Instances (so normally no impact on searches) but still, if you want an efficient and correct graphical view, keep things consistent!

III. Examples

a. DB_ID_RANGE

I will use the “High Availability – 6 Solr Servers – 3 Shards” case (2nd distribution) as a working example to show which commands can be used to create these: 3 different Shards, with 4 Shard Instances for each, spread across the 6 Solr Servers. I will use the DB_ID_RANGE Method so it is easy to see the differences between each command and to identify for which Shard an Instance will be created. I will use a range of 50 000 000 and assume that we can go up to 20 Shards (so 1 000 000 000 at maximum) on up to 12 Solr Servers.

  • For the Solr Server 1 – Shards 0 & 1:
    • curl -v "http://solr1:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=0&property.shard.method=DB_ID_RANGE&property.shard.range=0-50000000&property.shard.instance=0"
    • curl -v "http://solr1:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=1&property.shard.method=DB_ID_RANGE&property.shard.range=50000000-100000000&property.shard.instance=1"
  • For the Solr Server 2 – Shards 1 & 2:
    • curl -v "http://solr2:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=2&template=rerank&coreName=alfresco&shardIds=1&property.shard.method=DB_ID_RANGE&property.shard.range=50000000-100000000&property.shard.instance=1"
    • curl -v "http://solr2:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=2&template=rerank&coreName=alfresco&shardIds=2&property.shard.method=DB_ID_RANGE&property.shard.range=100000000-150000000&property.shard.instance=2"
  • For the Solr Server 3 – Shards 0 & 2:
    • curl -v "http://solr3:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=3&template=rerank&coreName=alfresco&shardIds=0&property.shard.method=DB_ID_RANGE&property.shard.range=0-50000000&property.shard.instance=0"
    • curl -v "http://solr3:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=3&template=rerank&coreName=alfresco&shardIds=2&property.shard.method=DB_ID_RANGE&property.shard.range=100000000-150000000&property.shard.instance=2"
  • For the Solr Server 4 – Shards 0 & 1:
    • curl -v "http://solr4:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=4&template=rerank&coreName=alfresco&shardIds=0&property.shard.method=DB_ID_RANGE&property.shard.range=0-50000000&property.shard.instance=0"
    • curl -v "http://solr4:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=4&template=rerank&coreName=alfresco&shardIds=1&property.shard.method=DB_ID_RANGE&property.shard.range=50000000-100000000&property.shard.instance=1"
  • For the Solr Server 5 – Shards 1 & 2:
    • curl -v "http://solr5:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=5&template=rerank&coreName=alfresco&shardIds=1&property.shard.method=DB_ID_RANGE&property.shard.range=50000000-100000000&property.shard.instance=1"
    • curl -v "http://solr5:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=5&template=rerank&coreName=alfresco&shardIds=2&property.shard.method=DB_ID_RANGE&property.shard.range=100000000-150000000&property.shard.instance=2"
  • For the Solr Server 6 – Shards 0 & 2:
    • curl -v "http://solr6:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=6&template=rerank&coreName=alfresco&shardIds=0&property.shard.method=DB_ID_RANGE&property.shard.range=0-50000000&property.shard.instance=0"
    • curl -v "http://solr6:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=20&numNodes=12&nodeInstance=6&template=rerank&coreName=alfresco&shardIds=2&property.shard.method=DB_ID_RANGE&property.shard.range=100000000-150000000&property.shard.instance=2"

b. DATE

In this section, I will use the same setup but with a DATE Sharding Method. I will keep the same details as well (3 Shards) and therefore, each Shard Instance will contain 4 of the 12 months. There are several ways to obtain this:

  • No grouping option (default 1): Shard N°1 will contain the months 1,4,7,10 // Shard N°2 will contain the months 2,5,8,11 // Shard N°3 will contain the months 3,6,9,12
  • Grouping of 2: Shard N°1 will contain the months 1,2,7,8 // Shard N°2 will contain the months 3,4,9,10 // Shard N°3 will contain the months 5,6,11,12
  • Grouping of 3: Shard N°1 will contain the months 1,2,3,10 // Shard N°2 will contain the months 4,5,6,11 // Shard N°3 will contain the months 7,8,9,12
  • Grouping of 4: Shard N°1 will contain the months 1,2,3,4 // Shard N°2 will contain the months 5,6,7,8 // Shard N°3 will contain the months 9,10,11,12

Here, I will group the months by 4. Since this is not the DB_ID_RANGE Method, all the Shard Instances for a Solr Server can be created with a single command. However, when doing so, you cannot specify “property.shard.instance” because it would contain several values (0,1,2 for example) and that is, as far as I know, not supported. Therefore, the Shards will be created based on the “shardIds” parameter, which has no problem with a comma-separated list:

  • For the Solr Server 1 – Shards 0 & 1:
    • curl -v “http://solr1:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=0,1&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3″
  • For the Solr Server 2 – Shards 1 & 2:
    • curl -v “http://solr2:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=2&template=rerank&coreName=alfresco&shardIds=1,2&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3″
  • For the Solr Server 3 – Shards 0 & 2:
    • curl -v “http://solr3:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=3&template=rerank&coreName=alfresco&shardIds=0,2&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3″
  • For the Solr Server 4 – Shards 0 & 1:
    • curl -v “http://solr4:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=4&template=rerank&coreName=alfresco&shardIds=0,1&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3″
  • For the Solr Server 5 – Shards 1 & 2:
    • curl -v “http://solr5:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=5&template=rerank&coreName=alfresco&shardIds=1,2&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3″
  • For the Solr Server 6 – Shards 0 & 2:
    • curl -v "http://solr6:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=6&template=rerank&coreName=alfresco&shardIds=0,2&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3"

c. ACL_ID (ACL v2)

In this section, I will use the same setup but with an ACL_ID Sharding Method. I will keep the same details as well (3 Shards). Each of the three Shards will be assigned some of the ACLs of the Alfresco Repository and will index all the Alfresco nodes that have these specific ACLs. It is therefore a random assignment that might or might not be evenly distributed (the more ACLs you have, the more even the distribution is supposed to be, in theory):

  • For the Solr Server 1 – Shards 0 & 1:
    • curl -v "http://solr1:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=0,1&property.shard.method=ACL_ID&property.shard.count=3"
  • For the Solr Server 2 – Shards 1 & 2:
    • curl -v "http://solr2:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=2&template=rerank&coreName=alfresco&shardIds=1,2&property.shard.method=ACL_ID&property.shard.count=3"
  • For the Solr Server 3 – Shards 0 & 2:
    • curl -v "http://solr3:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=3&template=rerank&coreName=alfresco&shardIds=0,2&property.shard.method=ACL_ID&property.shard.count=3"
  • For the Solr Server 4 – Shards 0 & 1:
    • curl -v "http://solr4:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=4&template=rerank&coreName=alfresco&shardIds=0,1&property.shard.method=ACL_ID&property.shard.count=3"
  • For the Solr Server 5 – Shards 1 & 2:
    • curl -v "http://solr5:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=5&template=rerank&coreName=alfresco&shardIds=1,2&property.shard.method=ACL_ID&property.shard.count=3"
  • For the Solr Server 6 – Shards 0 & 2:
    • curl -v "http://solr6:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=6&template=rerank&coreName=alfresco&shardIds=0,2&property.shard.method=ACL_ID&property.shard.count=3"

IV. Conclusion & final remarks:

At first, it might be a little confusing to work with Solr Sharding because a lot of the terms overlap each other a little bit, but once you get it, it is a real pleasure to be able to create this kind of indexing architecture so easily. Some final remarks:

  • As mentioned previously, you can create all the Instances of a Solr Server using a single command, except if you are using DB_ID_RANGE, because that Method contains the one parameter (the range) that needs to change between two different Shards. If the Instances are part of the same Shard (with the same range), then you can still create them all in a single command.
  • If you want to have different Shards on the same Solr Server, then you need to change "property.shard.instance" (because it is a different Shard) and "shardIds" (because the Core & Base URL need to be unique per Solr Server). The Shard Instance will belong to the Shard specified in "property.shard.instance" by default. If that parameter is not present, it will use "shardIds" instead (unless a range is specified for the DB_ID_RANGE Method, because this takes precedence over anything else).
  • If you want to have several Instances of the same Shard on the same Solr Server, then you need to change "shardIds" and keep "property.shard.instance" the same. If you can avoid that, you should, for simplicity and consistency, but if you have a requirement for two Instances of the same Shard on the same Solr Server, it is still possible (see the example right after this list).
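To illustrate that last point, here is a hypothetical example (re-using the DATE setup from above) that would create two Instances of Shard 0 on the Solr Server 1: "property.shard.instance" stays at 0 for both, while "shardIds" differs so that the Core names and Base URLs remain unique:

curl -v "http://solr1:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=0&property.shard.instance=0&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3"
curl -v "http://solr1:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=3&numNodes=12&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=1&property.shard.instance=0&property.shard.method=DATE&property.shard.key=cm:modified&property.shard.date.grouping=4&property.shard.count=3"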

If you are interested in playing with the Solr Sharding but you don’t have an existing infrastructure in place, you might want to take a look at this GitHub project from Angel Borroy which should help you to go further. Have fun!

This article, Solr Sharding – Shard creation, first appeared on the dbi services Blog.

Will PostgreSQL14 finally come with schema variables?


One of the bits you need to solve when you migrate from Oracle to PostgreSQL is this: in the Oracle database there are PL/SQL packages, and some of those have package variables defined. PostgreSQL does not know the concept of a package, but you can use schemas to group your PL/pgSQL functions and procedures. When it comes to package variables, there is currently no easy solution.

You can abuse custom configuration parameters (set_config/current_setting) to emulate that, but it is nasty and not the intended use case. Schema variables have been under discussion for quite some time, and the good news is that the patch is now "Ready for Committer". This still is not a guarantee that PostgreSQL 14 will come with schema variables, but at least it is close to that.
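For reference, here is a minimal sketch of that workaround with custom configuration parameters (the parameter name pkg.counter is an arbitrary example; any name containing a dot works, and everything is stored as text):

psql <<'SQL'
select set_config('pkg.counter', '42', false);   -- "assign" the variable for this session
select current_setting('pkg.counter', true);     -- read it back; returns NULL instead of an error if unset
SQL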

As always, let's do some simple demos to understand what is going on and how the feature might help, once it is committed. The most simple example of a schema variable is this:

postgres=# create variable var1 as int;
CREATE VARIABLE

As the name of the feature (schema variables) implies, a variable is created in a specific schema. As I have not modified the default search_path the variable got created in the public schema. You can easily check this in the new catalog table pg_variable:

postgres=# select varname, varnamespace::regnamespace from pg_variable;
 varname | varnamespace 
---------+--------------
 var1    | public
(1 row)

By default, once a variable is created, it is persistent and available again after the instance is restarted:

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 01:29:48.825 CET - 1 - 80179 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 01:29:48.825 CET - 2 - 80179 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var1;
 var1 
------
     
(1 row)

Until now the variable does not contain any value, as we did not assign anything. To do that you can use “let”:

postgres=# \h let
Command:     LET
Description: change a schema variable's value
Syntax:
LET schema_variable = sql_expression
LET schema_variable = DEFAULT

URL: https://www.postgresql.org/docs/devel/sql-let.html

Assigning a value can be as simple as this:

postgres=# let var1 = 1;
LET
postgres=# select var1;
 var1 
------
    1
(1 row)

.. or you can calculate the new value:

postgres=# let var1 = var1 * 2;
LET
postgres=# select var1;
 var1 
------
    2
(1 row)

The value assigned to a variable is not persistent; it lives only for the duration of the session:

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 01:44:42.837 CET - 1 - 80305 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 01:44:42.837 CET - 2 - 80305 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var1;
 var1 
------
     
(1 row)

If you want the value of a variable to be persistent, you need to make it immutable:

postgres=# create immutable variable var2 as int default 2;
CREATE VARIABLE
postgres=# select var2;
 var2 
------
    2
(1 row)

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 01:58:53.365 CET - 1 - 80414 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 01:58:53.365 CET - 2 - 80414 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var2;
 var2 
------
    2
(1 row)

It is important to understand that variables are not transaction-safe by default:

postgres=# create variable var3 as int default 3;
CREATE VARIABLE 
postgres=# select var3;
 var3 
------
    3
(1 row)

postgres=# begin;
BEGIN
postgres=*# let var3=4;
LET
postgres=*# rollback;
ROLLBACK
postgres=# select var3;
 var3 
------
    4
(1 row)

But you can do it, if you want:

postgres=# create variable var4 as int default 5 on transaction end reset;
CREATE VARIABLE
postgres=# begin;
BEGIN
postgres=*# let var4 = 10;
LET
postgres=*# select var4;
 var4 
------
   10
(1 row)

postgres=*# rollback;
ROLLBACK
postgres=# select var4;
 var4 
------
    5
(1 row)

Like tables, variables can also be temporary:

postgres=# create temporary variable var6 as int default -1;
CREATE VARIABLE
postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-19 02:22:22.308 CET - 1 - 80611 -  - @ LOG:  redirecting log output to logging collector process
2020-11-19 02:22:22.308 CET - 2 - 80611 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select var6;
ERROR:  column "var6" does not exist
LINE 1: select var6;
               ^
postgres=# 

… and you can also specify to drop the variable at commit time:

postgres=# begin;
BEGIN
postgres=*# create temporary variable var7 as int default -1 on commit drop;
CREATE VARIABLE
postgres=*# let var7 = -9;
LET
postgres=*# commit;
COMMIT
postgres=# select var7;
ERROR:  column "var7" does not exist
LINE 1: select var7;
               ^
postgres=# 

Variables can be referenced in procedures, functions and in SQL:

postgres=# create variable var8 as int default 100;
CREATE VARIABLE
postgres=# create function f1() returns int as $$select var8;$$ language SQL;
CREATE FUNCTION
postgres=# select f1();
 f1  
-----
 100
(1 row)

postgres=# create procedure p1() as $$
postgres$# declare
postgres$# begin
postgres$#   let var8 = 101;
postgres$#   raise notice '%', var8;
postgres$# end;
postgres$# $$ language plpgsql;
CREATE PROCEDURE
postgres=# call p1();
NOTICE:  101
CALL

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# select var8;
 var8 
------
  101
(1 row)

postgres=# insert into t1 values (101,'aaa');
INSERT 0 1
postgres=# select * from t1 where a = var8;
  a  |  b  
-----+-----
 101 | aaa
(1 row)

This is really a great feature and I do hope it finally gets committed for PostgreSQL 14.

This article, Will PostgreSQL14 finally come with schema variables?, first appeared on the dbi services Blog.

Documentum DQMan for repositories 16.7 and after


You may have noticed when upgrading your Documentum infrastructure that, after 7.3, the DQMan utility from FME is no longer usable (you get a red cross without any warning on the screen). In this blog I'll show you how to get it working again. I tested it on 16.7, but it should work on the latest versions as well.

Install Java 32bit

DQMan will only work with a 32-bit Java, as it is not compiled (for the moment) for x64.

In my case I installed Java 8u192 in C:\jre-1.8.192_x32.

Install DFC

Install the DFC in the version you need; I have a repository in 16.7, so I installed the DFC in 16.7. It is not mandatory to install the DFC with the 32-bit Java, as it does not prevent DQMan from starting. Fill in the dfc.properties so that it connects to the repository.

Install DQMan

You can install it wherever you want; in my case I've put it in C:\Users\Public\dqman. I have the latest dqman version, which is currently 6.0.1.34 (you can verify it in the About dialog).

Setup DQMan

Everything is now installed and we can set up dqman to work with our repository.

First you need a dmcl.dll from an earlier version; put it into the root folder of dqman, e.g. C:\Users\Public\dqman. Mine is 156KB.

Now copy the dfc.properties from your DFC installation to the config folder of dqman: C:\Users\Public\dqman\config\dfc.properties

If you have a dmcl40.dll in the root folder delete it.

Then create a java.ini file in the root folder with the following inside:

java_library_path="C:\jre-1.8.192_x32\bin\client\jvm.dll"
JAVA_OPTIONS=" -Xcheck:jni -XX:+RestoreMXCSROnJNICalls -Xmx256m"
java_classpath = "C:\Documentum\dctm.jar;C:\Documentum\config;C:\Documentum\Shared\dfc.jar;C:\Users\Public\dqman\config"

Of course, update the paths depending on your installation.

You should now be able to launch it, and you can verify the version of the DFC in use in the About dialog:

This article, Documentum DQMan for repositories 16.7 and after, first appeared on the dbi services Blog.

AWS burstable EC2 and CloudWatch metrics


By Franck Pachot

.
Your application workload is usually not constant and maybe not predictable. If you undersize the capacity, you will save money, but in case of a peak of activity you will have severe performance issues. Thus, you need to size the capacity for peak workloads: you pay for idle CPU when out of those peaks. This is one major reason for virtualization and the cloud: elasticity. Because the hardware is shared by multiple applications, which probably don't have their workload peaks at the same time, you don't have to size the physical capacity for the sum of the peak activity, as you would have to do if each application ran on dedicated hardware. AWS has many features to choose between reserved capacity and auto-scaling ones. Here is a very simple one that is even available in the Free Tier: burstable compute.

I have created an EC2 t2.micro instance and I'm displaying 3 metrics in a CloudWatch dashboard:

I started with 32 minutes of "CPU Credit Balance". This means that I can run at 100% CPU for 32 minutes. This credit stays, and even increases, when I'm not using the CPU. At 16:20 I started a program running fully in CPU (`yes > /dev/null`). I used 100% of the CPU until approximately 17:00, and then the "CPU Utilization (%)" decreased to around 20%, which is the baseline of this t2.micro when credits are exhausted. The "CPU Credit Usage" was at 5 during all this period because it is gathered every 5 minutes: during 5 minutes running at 100% CPU you use 5 minutes of credit. And during that time you see the "CPU Credit Balance" decreasing down to zero. I stopped my `yes` process at 17:45, where the "CPU Utilization" is at 0% as nothing is running, and the "CPU Credit Balance" starts to grow again slowly.
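If you prefer the CLI to the console, the same metrics can be pulled from CloudWatch as well; a quick sketch (the instance id and the time range are placeholders for your own values):

aws cloudwatch get-metric-statistics \
 --namespace AWS/EC2 --metric-name CPUCreditBalance \
 --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
 --start-time 2020-11-20T16:00:00Z --end-time 2020-11-20T18:00:00Z \
 --period 300 --statistics Average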

From this CloudWatch "CPU Utilization" alone you cannot tell whether, between 17:05 and 17:35, the CPU usage is only 20% because the process does not need more (being idle) or because it is throttled by the lack of CPU credits. But you can guess it when you look at the credit balance and usage. Let's have a look at the default SAR statistics gathered every 10 minutes:


[ec2-user@ip-172-31-42-243 ~]$ sar

04:04:59 PM       LINUX RESTART                                                                                                        

04:10:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
04:20:01 PM     all      0.02      0.00      0.02      0.00      0.00     99.95
04:30:01 PM     all     45.88      0.00      0.58      0.09      0.01     53.44
04:40:01 PM     all     99.02      0.00      0.96      0.00      0.02      0.00
04:50:01 PM     all     99.03      0.00      0.95      0.00      0.01      0.00
05:00:01 PM     all     99.04      0.00      0.95      0.00      0.01      0.00
05:10:01 PM     all     28.03      0.00      0.23      0.00     71.74      0.00
05:20:02 PM     all     17.50      0.00      0.12      0.00     82.39      0.00
05:30:01 PM     all     18.90      0.00      0.11      0.00     80.99      0.00
05:40:02 PM     all     18.83      0.02      0.16      0.00     81.00      0.00
05:50:01 PM     all      3.45      1.07      1.10      4.60     33.31     56.47
06:00:01 PM     all      0.02      0.00      0.02      0.00      0.00     99.96
06:10:01 PM     all      0.02      0.00      0.02      0.00      0.02     99.94
06:20:01 PM     all      0.03      0.00      0.01      0.00      0.01     99.95
06:30:01 PM     all      0.01      0.00      0.01      0.00      0.00     99.98

Before starting my program, the CPU was nearly 100% idle, until 16:20. Then, while my program was running, the CPU was used at nearly 100% in user space until 17:00. After that, the CPU usage is nearly 20% until I stopped the process at 17:45. This is exactly the same as what CloudWatch displays. But SAR has an additional metric here: "%steal", and this is how I know that the CPU was not idle but throttled by the hypervisor. My process was 100% in running state, but this shows up as 20% CPU usage and 80% stolen. Then I stopped the process and %user and %steal dropped to zero: all time is idle.

And once the CPU is idle, my CPU credits start to increase again:

24 hours later I have earned 144 minutes of CPU. I will not earn more: this is the maximum. Of course, all of that is documented in the instance type description (https://aws.amazon.com/ec2/instance-types/) and the burstable CPU documentation (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-credits-baseline-concepts.html): an idle t2.micro gets 6 minutes of credits every hour, with a maximum of 144. The only thing that differs from the documentation in my example is the baseline, which should be 10% when out of credits, but I am above that here. No idea why.
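The arithmetic behind those numbers: earning 6 credit minutes per hour is 6/60 = 10% of a vCPU-hour, which is exactly the documented 10% baseline, and 24 hours x 6 minutes = 144 minutes, the credit cap observed above.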

I’ve started again a process that never sleeps:

It can run at 100% for 144 minutes, but I stopped it there; this is the same pattern as above.
Just one thing CloudWatch does not tell you: "CPU Utilization (%)" includes the time running in kernel space (%system). Instead of running `yes > /dev/null` I ran `dd if=/dev/urandom of=/dev/null bs=1M`, which runs mostly in kernel space:

03:20:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
09:00:01 PM     all      0.02      0.00     99.96      0.00      0.02      0.00
Average:        all      7.73      0.00      1.26      0.00      0.01     91.00

Amazon RDS

This can be interesting for a database, and some RDS instance types are burstable (db.t3.*). Databases rarely have a homogeneous activity, but they must have CPU resources available at peak time. Because databases share data, they protect data structures with latches and maybe spinlocks: in case of CPU starvation, a process scheduled out of the CPU may hold an exclusive resource, and another process comes to the running state just to spin on the CPU trying to acquire this resource. This is a waste of CPU cycles, as the resource will not be available while the other process is waiting in the runqueue.
Aurora can run on db.t3.medium and db.t3.large. But you may also look at auto-scaling and maybe serverless. An Oracle Database does not benefit a lot from burstable instances because the software license is paid on the processor metric, which counts full vCPUs rather than the baseline, so you probably want to use 100% of it as long as you need it.

This article, AWS burstable EC2 and CloudWatch metrics, first appeared on the dbi services Blog.

Installing EDB Advanced Server without EDB Repository


Almost all of us who have already installed EnterpriseDB tools on a Linux server know the procedure: you need to add the EDB repository to your server or your Red Hat Satellite configuration, and after that you can easily install the tools you need. But what happens if you are not able to add the repository?
The answer is really simple: Download the rpms and install them directly on the server.

Download the EDB tar ball

On the EDB repository page you can download a tar ball which contains all the packages needed to install the EDB products. The tar ball is about 2 GB in size, so make sure there is enough space for it somewhere.

Just scroll a bit down on the repo page

In case you do not want to download the whole tar ball, it is also possible to download the individual rpms. But as I want to have everything, I go on with the tar ball.

Downloading the tar ball directly. Of course you need your subscription credentials here as well.

 wget https://username:password@yum.enterprisedb.com/edb/redhat/edb_redhat_rhel-7-x86_64.tar.gz
--2020-11-19 09:52:34--  https://username:*password*@yum.enterprisedb.com/edb/redhat/edb_redhat_rhel-7-x86_64.tar.gz
Resolving yum.enterprisedb.com (yum.enterprisedb.com)... 54.165.250.135
Connecting to yum.enterprisedb.com (yum.enterprisedb.com)|54.165.250.135|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Reusing existing connection to yum.enterprisedb.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 2052730909 (1.9G) [application/x-gzip]
Saving to: ‘edb_redhat_rhel-7-x86_64.tar.gz’

10% [==============                                                               ] 215,687,168 2.40MB/s  eta 21m 51s

Extract the needed packages

As we want to install the EDB Advanced Server and EFM, we need to find out which packages are available in the tar ball. As the output is quite long, let's write it into a file for better readability.

tar -tf edb_redhat_rhel-7-x86_64.tar.gz > edb_rhel_packages.txt

After that we can search for the files we need using vi and afterwards extract them. Find below the complete list of packages you need for the installation of EDB Advanced Server.

tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-13.0-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-edbplus-39.0.0-1.rhel7.x86_64.rpm  
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-libicu-66.1-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-pgagent-4.2.0-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-pgpool41-extensions-4.1.2-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-pgsnmpd-1.0-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-client-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-cloneschema-1.14-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-contrib-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-core-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-devel-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-docs-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-edb-modules-1.0-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-indexadvisor-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-libs-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-llvmjit-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-parallel-clone-1.8-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-pldebugger-1.1-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-plperl-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-plpython3-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-pltcl-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-sqlprofiler-4.0-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-sqlprotect-13.0.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-server-sslutils-1.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-slony-replication-2.2.8-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-slony-replication-core-2.2.8-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-slony-replication-docs-2.2.8-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-as13-slony-replication-tools-2.2.8-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-jdbc-42.2.12.3-1.rhel7.x86_64.rpm
tar -xf edb_redhat_rhel-7-x86_64.tar.gz edb-pgpool41-libs-4.1.2-1.rhel7.x86_64.rpm
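If you prefer a single tar call rather than one per package, you can feed the filtered listing back to tar; a sketch, assuming the grep pattern matches exactly the packages you want:

grep -E '^edb-(as13|jdbc|pgpool41)' edb_rhel_packages.txt > edb_needed_packages.txt
tar -xf edb_redhat_rhel-7-x86_64.tar.gz -T edb_needed_packages.txt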

Install EDB Advanced Server

Now that all needed packages are downloaded, go on with the installation. First we have to install the packages needed from the RHEL and EPEL Repository.

sudo yum install -y unzip xorg-x11-xauth screen boost-atomic boost-chrono boost-date-time boost-filesystem boost-system boost-regex boost-thread tcl uuid python3 python3-libs python3-pip python3-setuptools libtirpc libicu libxslt llvm5.0 llvm5.0-libs libtirpc lm_sensors-libs net-snmp-agent-libs net-snmp-libs perl-Compress-Raw-Bzip2 perl-Compress-Raw-Zlib perl-DBD-Pg perl-DBI perl-Data-Dumper perl-IO-Compress perl-Net-Daemon perl-PlRPC perl-version perl-Switch

Once we installed all the dependencies needed, we can finally install the Advanced Server using the rpms.

sudo yum install /tmp/edb-*.rpm      

In case one rpm is missing, you will get a hint that one rpm needs another one as a dependency. You can then just extract this rpm from the tar ball as well and retry the installation command.

Finished. That's all you have to do, quite simple, isn't it?
Now you can go on with the setup of the database. In case you need help at that point, you can find some hints here.

This article, Installing EDB Advanced Server without EDB Repository, first appeared on the dbi services Blog.

DynamoDB PartiQL – part I: INSERT


By Franck Pachot

.
This may sound paradoxical, executing SQL statements on a NoSQL database, but we now have a new API to interact with DynamoDB which looks like SQL. AWS data services are a collection of purpose-built database services that each have their own API. The relational databases in RDS all share a similar API thanks to the SQL standard. However, for non-relational databases there is no standard: the NoSQL databases, even if they have similar get/put APIs, do not share a common language. In order to build a converged API on top of that, Amazon is using the Open Source PartiQL to provide SQL-like access. And this has been recently implemented for DynamoDB. Here are my first tests; this will be a multi-part blog post. I'll mention the most important point immediately: you are still in a NoSQL database. The "SQL" there has very little in common with the SQL used in relational databases. Introducing the INSERT, UPDATE, SELECT, DELETE keywords does not transform a key-value datastore into a relational database: you still put and get items, not sets of rows, you do not join multiple items in the database, you are not ACID, …

AWS CLI

I’ll run from AWS CLI here and for the moment, PartiQL is not available in the Version 2:


[opc@a tmp]$ aws --version
aws-cli/2.1.3 Python/3.7.3 Linux/4.14.35-2025.400.9.el7uek.x86_64 exe/x86_64.oracle.7

[opc@a tmp]$ aws dynamodb execute-statement

usage: aws [options]   [ ...] [parameters]

aws: error: argument operation: Invalid choice, valid choices are:

However, it is available in Version 1 (even though Version 2 is "now stable and recommended for general use"), so I temporarily installed Version 1:


cd /var/tmp
wget -c https://s3.amazonaws.com/aws-cli/awscli-bundle.zip
unzip -o awscli-bundle.zip
sudo ./awscli-bundle/install -i /var/tmp/aws-cli/awscli1
alias aws='/var/tmp/aws-cli/awscli1/bin/aws "$@" --output json | jq --compact-output'

You can see that I like the output as compact json, as this is easier to put in the blog post, but of course you don’t have to. And you can also run the same from the console.

For the moment, this feature is not deployed in all regions, so if you don't see it, check another region. The announcement about it is there: You now can use a SQL-compatible query language to query, insert, update, and delete table data in Amazon DynamoDB. By the time you read my blog post, all of the above may be outdated and you may be able to use it in all regions, and with the latest AWS CLI. While I'm there, the documentation for this feature is in the developer guide: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-reference.html

DDL

There is no DDL like CREATE TABLE, so you need to create your DynamoDB table in the usual way: Console, CLI or API


aws dynamodb create-table --attribute-definitions \
  AttributeName=MyKeyPart,AttributeType=N \
  AttributeName=MyKeySort,AttributeType=N \
 --key-schema \
  AttributeName=MyKeyPart,KeyType=HASH \
  AttributeName=MyKeySort,KeyType=RANGE \
 --billing-mode PROVISIONED \
 --provisioned-throughput ReadCapacityUnits=25,WriteCapacityUnits=25 \
 --table-name Demo

This is the same table as I used in a previous post on GSI and LSI. This table has a composite key, partitioned by hash on the MyKeyPart attribute and by range on MyKeySort, with the MyKeySort attribute locally indexed by default.

INSERT

I can insert new items there with a simple INSERT statement:


[opc@a aws]$ aws dynamodb execute-statement --statement \
"insert into Demo value {'MyKeyPart':1,'MyKeySort':1,'MyUnstructuredData':'here is my first insert :)'}"

{"Items":[]}

You can already see that, besides the INSERT keyword, nothing is similar to RDBMS SQL. In SQL, the INSERT reads from a set of rows, and the VALUES keyword is a table value constructor that provides this set of rows rather than querying it from a table. Here we have a singular VALUE keyword to input a single JSON document.

Mixing the code (SQL keywords, table and attribute identifiers) with data (literal values, like the numbers or character strings I used there) is handy in an interactive query console, but is not the right way to code. As with any API, we must code with variables, and pass them as parameters at execution time. Here is how to do it:


[opc@a aws]$ aws dynamodb execute-statement --statement \
"insert into Demo value {'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}" \
  --parameters '[{"N":"2"},{"N":"2"},{"S":"use parameters when embedding SQL in programs"}]'

{"Items":[]}

This is the correct way to use SQL in programs: the SQL text contains parameters, as placeholders for values, also called bind variables. They are simple question marks here (?) which are referenced in order in the parameters list. Some SQL databases also support named parameters (:variable) but not here.

You should remember that these are bind variables, not substitution text. For strings, you don't put quotes around them in the SQL text. The data type is declared in the parameter list.

When you have data to ingest, you don’t want a network roundtrip for each item. Here is how to send batched inserts:


[opc@a aws]$ aws dynamodb batch-execute-statement --statements \
'[ {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"2"},{"N":"1"},{"S":"a"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"2"},{"N":"2"},{"S":"b"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"2"},{"N":"3"},{"S":"c"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"2"},{"N":"4"},{"S":"d"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"2"},{"N":"5"},{"S":"e"}]} ]'

{"Responses":[{"TableName":"Demo"},{"TableName":"Demo","Error":{"Message":"Duplicate primary key exists in table","Code":"DuplicateItem"}},{"TableName":"Demo"},{"TableName":"Demo"},{"TableName":"Demo"}]}

With batched statements, some statements may fail and the Response holds a list of errors encountered. Here I inserted multiple MyKeySort values for the same MyKeyPart. And one of them (MyKeyPart=2,MyKeySort=2) was already there, and then was rejected as duplicate key. The others are inserted.

Finally, here is what I have in my Demo table:

Note that if there are already some duplicates in the Statements list, the whole call is rejected:


[opc@a aws]$ aws dynamodb batch-execute-statement --statements '[ {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"9"},{"N":"9"},{"S":"a"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"9"},{"N":"9"},{"S":"b"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"9"},{"N":"1"},{"S":"c"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"9"},{"N":"2"},{"S":"d"}]} , {"Statement":"insert into Demo value '"{'MyKeyPart':?,'MyKeySort':?,'MyUnstructuredData':?}"'","Parameters":[{"N":"9"},{"N":"3"},{"S":"e"}]} ]'

An error occurred (ValidationException) when calling the BatchExecuteStatement operation: Provided list of item keys contains duplicates

Because of the duplicate [{"N":"9"},{"N":"9"}] here, no items have been inserted.

Unfortunately, I have to repeat each statement there. I would prefer an API where I prepare one parameterized statement and execute it with many values, like the array interface in SQL databases. But we are in NoSQL, and each item has its own structure that may differ from the others.
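Until something like an array interface exists, you can at least generate the repeated statements instead of typing them. A quick bash sketch (inserting three new items under MyKeyPart=2, only with their key attributes, and keeping in mind that a single batch is limited to 25 statements):

STMTS=$(for i in 6 7 8; do
  printf "{\"Statement\":\"insert into Demo value {'MyKeyPart':?,'MyKeySort':?}\",\"Parameters\":[{\"N\":\"2\"},{\"N\":\"%s\"}]}," "$i"
done)
aws dynamodb batch-execute-statement --statements "[${STMTS%,}]"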

SELECT

Let’s have a look at my items with a SELECT statement:


[opc@a aws]$ aws dynamodb execute-statement --statement "select * from Demo"

{"Items":[
 {"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"a"},"MyKeySort":{"N":"1"}}
,{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"use parameters when embedding SQL in programs"}
,"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"c"}
,"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"d"}
,"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"e"}
,"MyKeySort":{"N":"5"}},{"MyKeyPart":{"N":"1"},"MyUnstructuredData":{"S":"here is my first insert :)"}
,"MyKeySort":{"N":"1"}}]}

Of course I would like to see them sorted:


[opc@a aws]$ aws dynamodb execute-statement --statement "select * from Demo order by MyKeyPart,MyKeySort"     
                                                
An error occurred (ValidationException) when calling the ExecuteStatement operation: Must have WHERE clause in the statement when using ORDER BY clause.

This is a reminder that PartiQL is not a database engine. It cannot provide a sorted result set if the underlying database does not support it. DynamoDB is a distributed database, partitioned by hash, which means that a SCAN (the operation behind this SELECT as I have no where clause to select a partition) does not return the result in order. This is a fundamental concept in DynamoDB: in order to be scalable and predictable, there are no cross-partition operations. And a SQL language over it will not change that.


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart=2 order by MyKeyPart,MyKeySort"

{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}}]}

Here the ORDER BY is accepted because I explicitly restricted the operation to a single partition with my WHERE clause (an equality predicate on the HASH key). And I request a sort on the RANGE key, which is locally indexed in DynamoDB, so the items can be scanned in this order. Again, you must keep in mind that PartiQL is not a database engine. The engine is DynamoDB, and PartiQL maps a SQL-like language on top of the DynamoDB NoSQL API. And of course, you have the choice: all data manipulation can still be done with the usual API. But there is more to say about SELECT, and this will be in Part II…

This article, DynamoDB PartiQL – part I: INSERT, first appeared on the dbi services Blog.

PostgreSQL 14: Add the number of de-allocations to pg_stat_statements?


In the last post we had a look at an interesting patch (schema variables) that is currently waiting to be committed for PostgreSQL 14. Another patch that is currently in the same state is named [PATCH] Add features to pg_stat_statements. As this does not tell much about what the patch does, here is a short summary: when you reach the maximum number of statements that can be tracked, pg_stat_statements will de-allocate entries to free space for other statements. As de-allocations use resources as well, and frequent de-allocations might also mean that pg_stat_statements is not configured the right way for your workload, this patch provides one additional piece of information: how many de-allocations happened.

In the default configuration pg_stat_statements tracks a maximum of 5000 statements:

postgres=# show pg_stat_statements.max;
 pg_stat_statements.max 
------------------------
 5000
(1 row)

The lowest value allowed for pg_stat_statements.max is 100:

postgres=# alter system set pg_stat_statements.max=5;
ERROR:  5 is outside the valid range for parameter "pg_stat_statements.max" (100 .. 2147483647)
postgres=# alter system set pg_stat_statements.max=100;
ALTER SYSTEM

As this change requires a restart it should be set carefully:

postgres=# select context from pg_settings where name = 'pg_stat_statements.max';
  context   
------------
 postmaster
(1 row)

postgres=# \! pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-20 01:00:54.656 CET - 1 - 102610 -  - @ LOG:  redirecting log output to logging collector process
2020-11-20 01:00:54.656 CET - 2 - 102610 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# show pg_stat_statements.max;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# show pg_stat_statements.max;
 pg_stat_statements.max 
------------------------
 100
(1 row)

Let's generate some random statements and then check what additional information we get with this patch:

postgres=# select 'create table t'||i||'( a int, b int, c int)' from generate_series(1,101) i; \gexec
                ?column?                 
-----------------------------------------
 create table t1( a int, b int, c int)
 create table t2( a int, b int, c int)
 create table t3( a int, b int, c int)
 create table t4( a int, b int, c int)
 create table t5( a int, b int, c int)
 create table t6( a int, b int, c int)
 create table t7( a int, b int, c int)
 create table t8( a int, b int, c int)
 create table t9( a int, b int, c int)
...
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
postgres=# 

This created 101 tables and should have forced pg_stat_statements to de-allocate some older statements. There is a new, very simple view which provides this information:

postgres=# select * from pg_stat_statements_info;
 dealloc 
---------
       3
(1 row)

Nice. This will give you an idea whether pg_stat_statements.max is properly configured for your environment. If you manually reset pg_stat_statements, the de-allocation counter will also be reset:

postgres=# select * from pg_stat_statements_info;
 dealloc 
---------
       3
(1 row)

postgres=# select pg_stat_statements_reset();
 pg_stat_statements_reset 
--------------------------
 
(1 row)

postgres=# select * from pg_stat_statements_info;
 dealloc 
---------
       0
(1 row)
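If you want to keep an eye on this counter over time, a trivial approach is a watch on the new view (a sketch, assuming passwordless local access):

watch -n 60 'psql -X -A -t -c "select now(), dealloc from pg_stat_statements_info"'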

This article, PostgreSQL 14: Add the number of de-allocations to pg_stat_statements?, first appeared on the dbi services Blog.


Ansible Custom Interactive Dynamic Inventory


When working with an Ansible dynamic inventory, you may want to update it from your playbook on the fly. For instance, you may want to create a server and then install an application on it without calling a second playbook or developing specific code to create a runtime group (e.g. the "add_host" module). Or you may want to update the inventory on the fly without having to wait for an asynchronous mechanism to be triggered. This is all this project is about: providing you with a simple, fully encrypted, interactive dynamic inventory made to measure for your environment.

Back-end setup

1) Install the database software:

This will serve as a data storage for your Ansible inventory data (More details here)

wget https://siodb.io/packages/InstallSiodb.sh
chmod u+x ./InstallSiodb.sh
sudo ./InstallSiodb.sh

2) Get the last version of the dynamic inventory:

git clone https://github.com/siodb/ansible-dynamic-inventory.git
cd ansible-dynamic-inventory

3) Create the inventory data model and the database user:

sudo -i -u siodb siocli --user root < sio_inv_data_model.sql
sudo -i -u siodb siocli --user root < sio_inv_user.sql

4) Create your groups, groupvars, hosts and hostvars:

In this step, you can add your groups, groupvars, hosts and hostvars by inserting them into the corresponding tables. You can also link them together using the following columns:

  • groups_variables.group_id: to link variables to a group from the group table.
  • hosts.group_id: to link a group from the group table.
  • hosts_variables.host_id: to link variables to a host from the hosts table.

Here is an overview of the data model used for this inventory (the red numbers indicate the order in which to insert data):

So, first connect to the database freshly installed from step 1):

sudo -i -u siodb siocli --user root

Then insert your data. For instance:

use database sioinv ;
insert into groups
values
    ( 'production' ),
    ( 'test' ),
    ( 'development' )
;
insert into groups_variables
values
    ( 1, 'domain_name', 'company.com' ),
    ( 2, 'environment_name', 'production' ),
    ( 2, 'subnet', '10.10.0.0/16' ),
    ( 3, 'environment_name', 'test' ),
    ( 3, 'subnet', '10.20.0.0/16' ),
    ( 4, 'environment_name', 'development' ),
    ( 4, 'subnet', '10.30.0.0/16' )
;

insert into hosts
values
    ( 2, 'server-01', CURRENT_TIMESTAMP ),
    ( 3, 'server-02', CURRENT_TIMESTAMP ),
    ( 4, 'server-03', CURRENT_TIMESTAMP )
;

insert into hosts_variables
values
    ( 1, 'public_ip', '0.1.2.3' ),
    ( 1, 'application', 'app01' ),
    ( 2, 'public_ip', '0.1.2.4' ),
    ( 2, 'application', 'app02' ),
    ( 3, 'public_ip', '0.1.2.4' ),
    ( 3, 'application', 'app02' )
;
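Before wiring the inventory script to it, you can quickly verify the rows with siocli, the same client used during the setup above (a sketch):

sudo -i -u siodb siocli --user root <<'SQL'
use database sioinv ;
select * from hosts ;
select * from groups_variables ;
SQL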

Configure the inventory

You must tell the inventory script where to look up its data. That is done by configuring the file sio_inv.ini:

[sio_inv]
siodb_rest_ip = localhost
siodb_rest_port = 50443
siodb_rest_user = sioinv
siodb_rest_token = [token-generated-from-script-sio_inv_user.sql]
siodb_rest_tls_verify_certificate = no

Validate the inventory

Use the command ansible-inventory to validate your inventory. For instance:

$ ansible-inventory -i ./sio_inv.py  --graph --vars
@all:
  |--@development:
  |  |--server-03
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = development}
  |  |  |--{subnet = 10.30.0.0/16}
  |  |--{environment_name = development}
  |  |--{subnet = 10.30.0.0/16}
  |--@production:
  |  |--hypervisor-01
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = production}
  |  |  |--{subnet = 10.10.0.0/16}
  |  |--hypervisor-02
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = production}
  |  |  |--{subnet = 10.10.0.0/16}
  |  |--server-01
  |  |  |--{application = app01}
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = production}
  |  |  |--{public_ip = 0.1.2.3}
  |  |  |--{subnet = 10.10.0.0/16}
  |  |--server-04
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = production}
  |  |  |--{subnet = 10.10.0.0/16}
  |  |--{environment_name = production}
  |  |--{subnet = 10.10.0.0/16}
  |--@test:
  |  |--server-02
  |  |  |--{application = app02}
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = test}
  |  |  |--{public_ip = 0.1.2.4}
  |  |  |--{subnet = 10.20.0.0/16}
  |  |--server-05
  |  |  |--{domain_name = company.com}
  |  |  |--{environment_name = test}
  |  |  |--{subnet = 10.20.0.0/16}
  |  |--{environment_name = test}
  |  |--{subnet = 10.20.0.0/16}
  |--@ungrouped:
  |--{domain_name = company.com}

Example of playbook

Here is an example of how to interactively deal with the dynamic inventory while the playbook is running
(Set the variable {{ sioinv_user_token }} where it makes sense for you and store the token in Ansible Vault):


---
- name: Full Stack Service creation
  hosts: hypervisor-01
  tasks:

    - name: Create machine
      shell: >
        echo "Module or command to create a new Virtual Machine {{ vm_name }}"

    - name: Update the custom dynamic inventory on the fly with the just-created vm
      uri:
        url: https://localhost:50443/databases/sioinv/tables/hosts/rows
        validate_certs: false
        user: sioinv
        password: {{ sioinv_user_token }}
        method: POST
        body: >
          [{
            "name": "{{ vm_name }}"
          }]
        force_basic_auth: yes
        status_code: 201
        body_format: json
      register: r_vm_post

    - name: Add hostvars on the fly to the just-created vm
      uri:
        url: https://localhost:50443/databases/sioinv/tables/hosts_variables/rows
        validate_certs: false
        user: sioinv
        password: {{ sioinv_user_token }}
        method: POST
        body: >
          [{
            "host_id": {{ r_vm_post.json.trids[0] }},
            "name": "ip_address",
            "value": "10.10.0.0"
          }]
        force_basic_auth: yes
        status_code: 201
        body_format: json

    - name: Refresh the inventory in current run to include the new VM and its variables before moving to the next play
      meta: refresh_inventory

- name: Full Stack Service creation
  hosts: "{{ vm_name }}"
  tasks:

    - name: Create the new database on the new VM
      shell: >
        echo "Module or command to create a new database on {{ vm_name }}"

Last words before you go

The inventory is the foundation of your Ansible strategy. It must be as dynamic as possible and avoid complexity, to ensure smooth maintenance in the future. Modifying the inventory may force you to change all your roles and playbooks, which is something you want to avoid. With the simple inventory described here, you will have a strong foundation for your Ansible automation.

I hope this helps you, and please do contact us or comment below if you need more details.

This article, Ansible Custom Interactive Dynamic Inventory, first appeared on the dbi services Blog.

Documentum CTS reinstallation failing Null Pointer


I ran into an issue at a customer where I had to upgrade the CTS in place. When you have to upgrade a CTS on the same server, you first have to uninstall the old version in order to install the new one. Unfortunately, the uninstallation did not end well and some folders and configurations were not removed properly.

Removing CTS manually

If the uninstallation failed, you will not be able to run the uninstaller again to remove everything, as it thinks the product has already been uninstalled. So you will have to remove the components manually.

  1. Remove the 3 services for CTS: you will find 3 services whose names begin with "Documentum"; remove them.
  2. Remove the installation folder: by default the installation folder is located in C:\Documentum; you can remove it or at least rename it.
  3. There are some keys in the registry; normally you can keep them, as they do not prevent the software from being reinstalled. Note that you can search for "Documentum" in the registry to find all the keys concerned.
  4. Remove the DFC if it is still there (Add or Remove Programs).
  5. In the targeted repository, remove the folders System/Media Server, System/Applications/Transformation and System/Applications/Rich_Media_services. This is where the transformation profiles are located. Note that you may keep them, as the latest installers can dump them in order to replicate your configuration.

Running the installer fails with a null pointer

I followed all the recommendations above, and when I ran the installer it still failed with a null pointer. Even in the GUI I saw it trying to install to the path null/Documentum.

It is even impossible to change it manually, as it is a path generated by the installer itself. I thought it was due to the registry keys, but it was not: I changed a lot of them manually and nothing changed.

The issue came from a file that was not deleted by the uninstallation. This file holds information about the previous installation and is located here: C:\Program Files\Zero G Registry\.com.zerog.registry.xml

You will have to display hidden files and folders to see it. Delete this file and the installation will now go smoothly.

This article, Documentum CTS reinstallation failing Null Pointer, first appeared on the dbi services Blog.

Documentum Administrator plugins installation not possible


In some customer contexts you cannot install whatever you want, for policy or security reasons. And Documentum components really love to install client software to interact with them. In this blog we will discuss the Documentum Administrator tool and the plugins it asks to install in order to gain full access to its functionalities.

Documentum Administrator’s plugins

DA requires at least 3 things to be installed on client machines in order to have the file transfer working:

  • The browser extension (IE, Chrome, Firefox, aso)
  • The NativeSetup.exe
  • You will have to change some settings in the browsers as well, in order to allow the execution of some scripts for these sites, whitelist them, and so on

These two programs allow you to transfer and view files from the Documentum Administrator web application. I think it should be the same for Webtop as it is based on the same framework. Unfortunately, some customers cannot allow users, even administrators, to run specific exe files on their machines. So we tried to install them manually with some policy exceptions and elevated rights, but it did not work very well as our policies were too restrictive.

In addition, we were not allowed to change some parameters in the browsers as they were managed by the security team. So what can we do?

Setup Documentum Administrator in HTTP mode

Depending on what you need to do with DA, you can enable the HTTP mode. This mode uses the browser's default HTTP transfer mechanism, hence it does not require anything to be installed on the end user machine. However, this mode has some drawbacks and limitations.

Here is how to enable the HTTP mode:

  1. Login to the web server and navigate to the exploded war file folder
  2. Locate the file: da/wdk/app.xml
  3. Change <default-mechanism> in <contentxfer> to http
  4. Save the file and restart the web application server
  5. Allow popups in the browser

Now the content transfer has been set to http mode.
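If you prefer to script that change rather than editing the file by hand, here is a sketch (the webapps path is an assumption for your environment, and it assumes the current value is the default ucf):

cd /opt/tomcat/webapps/da/wdk    # hypothetical path to the exploded da war file
cp app.xml app.xml.bak           # keep a backup of the original configuration
sed -i 's#<default-mechanism>ucf</default-mechanism>#<default-mechanism>http</default-mechanism>#' app.xml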

Limitations

HTTP content transfer is supported for XML files but only for a single file used with the Default XML Application. For virtual documents, only the root (parent) file is transferred. The browser handles the launch of viewing and editing applications.

The checkout directory is not configurable. To checkout a document, users must select Edit, then either save the document or open and save it. On checkin, the user must navigate to the location in which the document was saved. User preferences are not supported in this mode.

Feature | UCF | HTTP
Download to client | UCF deployment at client side | No client-side deployment
Drag and drop | Supported on IE Browser | Not supported
Viewing or editing application | Configurable | Controlled by browser
ACS support | Supported | Limited support for export or edit; not supported for view with relative links
Progress display | Supported | Supported only by certain browsers
Preferences | Supported | Not supported
Restart interrupted operation | Supported | Not supported
Checkout | Supported | Limited support. User must save and select checkout location
Edit | Supported | Limited support. User must save and select checkout location
Checkin | Supported | Limited support. User must navigate to saved document
View | Supported; does not transfer content if file is up to date on client | Supported; always transfers content
Export | Supported | Supported
Import | Supported | Limited support. Single file selection at a time, no folder import
Client xfer tracing | Supported | Not supported
Server xfer tracing | Supported | Supported
File compression | Supported, with configurable exceptions | Turn on HTTP compression in web.xml
XML application | Supported | Import single file against Default XML Application
Virtual document | Supported | Root only
Temp working directory for upload/download | temp.working.dir | None
Checked out files | checkout.dir | User must save file, then check in from file
Exported files | export.dir or user choice | Browser-specific UI to select a download location
Viewed files | viewed.dir | Browser temporary internet files location, for example, $java{user.home}\Local Settings\Temporary Internet Files
User location (base for other locations) | user.dir (defined in config file) | None
Registry file location | registry.file | None
Registry mode | registry.mode | None
Log file location | logs.dir | None
Tracing/debug | tracing.enabled | None
File polling | file.poll.interval | None

This article, Documentum Administrator plugins installation not possible, first appeared on the dbi services Blog.

DynamoDB PartiQL – part II: SELECT


By Franck Pachot

.
In the previous post I inserted a few items into a Demo table using the new SQL-like API on DynamoDB. I checked my items with a SELECT, but was limited in the ORDER BY clause. Here is the most important thing to understand: there is no additional data processing engine here. PartiQL (pronounce it like 'particle', which helps to avoid any kind of dyslexia) parses a statement with INSERT/UPDATE/DELETE/SELECT and calls the NoSQL API you already know (Put/Get/Update/Delete Item, Query and Scan). It looks like SQL, but SQL is a language that declares operations on a set of rows, like relational tables or views, which are a logical layer above the physical model. In an RDBMS, you build your SQL queries according to your business needs, not the physical layout. Of course, the physical layout (like indexing and partitioning) is also optimized for this access, but this is done independently. With PartiQL on DynamoDB you must know which operation will happen when you write your statement, because all the simplicity and scalability of DynamoDB relies on the bounded API that matches the physical layout:

  • GetItem does partition pruning + unique hash/index access to retrieve one item
  • Query does partition pruning + index access to retrieve a sorted range of items
  • Scan reads all partitions, possibly in parallel, to retrieve all items

Of course, the cost is different. With the DynamoDB API you know which one you are doing because you call a different operation. With PartiQL you should know what you are doing, but you execute the same statement (SELECT) and the data access will depend on the columns referenced in the WHERE clause. Basically, if you don't have an equality predicate on the partition (HASH) key, you have to read all partitions (Scan). If you have an equality predicate on the partition (HASH) key and an inequality on the sort (RANGE) key, you benefit from partition pruning (Query). This is obvious when you know what a hash function is, but error-prone if you don't know the data model. The DynamoDB API helps you to prevent that, because your fingers should hurt when typing "scan" for a large table.

Scan

So, if what you want is actually to get all items, because you need all of them, or maybe to filter out only a small part of them, you want a scan. Yes, it reads everything, but it is the most efficient access to read a large portion of your table, because with one RCU you can get many items. Doing the same (getting all items) with GetItem would cost one RCU per item (I suppose strong consistency and small items here). To put it simply, for an OLTP workload (many users reading few items) you avoid scans on large tables. DynamoDB is a key-value store: the goal is to access items by their key. And for some reporting, or to export data, you may scan, which is expensive (in time and RCU) but not done frequently.

As seen in the previous post, scanning the whole table to get all items with all attributes is a simple SELECT * FROM:


[opc@a aws]$ aws dynamodb execute-statement --statement "select * from Demo"

{"Items":[
{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"a"},"MyKeySort":{"N":"1"}},
{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"use parameters when embedding SQL in programs"},"MyKeySort":{"N":"2"}},
{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"c"},"MyKeySort":{"N":"3"}},
{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"d"},"MyKeySort":{"N":"4"}},
{"MyKeyPart":{"N":"2"},"MyUnstructuredData":{"S":"e"},"MyKeySort":{"N":"5"}},
{"MyKeyPart":{"N":"1"},"MyUnstructuredData":{"S":"here is my first insert :)"},"MyKeySort":{"N":"1"}}
]}

As long as there is no equality predicate on the primary key (or on the hash part of it in the case of a composite hash/sort key) the SELECT will do a scan. I mentioned "equality"; we will see later what happens when there are many equality predicates, or a list of values to be equal to. We will also see, probably in a further post, what happens with secondary indexes. Anyway, this is not an RDBMS. When you query the table, there is no query planner to optimize the access by reading from an index. If you want to access through a secondary index, the index name must be mentioned in the FROM clause.
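
For reference, the documented way to target an index is to quote and qualify it in the FROM clause. The index name and its key here are hypothetical (this Demo table has none), so take this only as a sketch of the syntax:

# assumption: a secondary index named MyIndex with MyUnstructuredData as its partition key
aws dynamodb execute-statement --statement \
 "select MyKeyPart,MyKeySort from \"Demo\".\"MyIndex\" where MyUnstructuredData = 'a'"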

Another thing we have seen in the previous post is that, as this is a scan, you cannot have the partition key in the ORDER BY, because DynamoDB does not sort the rows when they are retrieved from multiple partitions, and PartiQL does not do any further data processing on the result. So, basically, no ORDER BY is possible without a WHERE clause on the partition key:



[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo order by MyKeyPart"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Must have WHERE clause in the statement when using ORDER BY clause.

[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo order by MyKeySort"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Must have WHERE clause in the statement when using ORDER BY clause.

Query

Then, if we query for one partition only, this is a Query rather than a Scan. Here is an example where I select only the items where MyKeyPart = 2 which, with the HASH function, maps to only one partition:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart = 2"

{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}}]}

The items are ordered by MyKeySort even in the absence of ORDER BY, because this is how they are stored and retrieved physically within each partition. But, as SQL is a declarative language, I prefer not to rely on the order without an ORDER BY clause.

Here is the correct way to do it, with no additional cost:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart = 2 order by MyKeySort"
{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}}]}


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart = 2 order by MyKeySort desc"
{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}}]}

Here, because there is only one value for MyKeyPart, I didn't need to put MyKeyPart in the ORDER BY, but with multiple values you need to:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo where MyKeyPart = 1  or MyKeyPart = 2 order by MyKeySort"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Must have hash key in ORDER BY clause when more than one hash key condition specified in WHERE clause.

[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo where MyKeyPart = 1  or MyKeyPart = 2 order by MyKeyPart,MyKeyPart desc"

{"Items":[{"MyKeyPart":{"N":"1"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}}]}

[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo where MyKeyPart = 1  or MyKeyPart = 2 order by MyKeyPart desc,MyKeyPart"

{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"1"},"MyKeySort":{"N":"1"}}]}

You might be surprised to see this query with multiple values run as a Query rather than a Scan. What if they come from multiple partitions?
This is possible because the values are known in advance ("1" and "2" here): they can be sorted first, and a Query run for each of them. Of course, this multiplies the cost accordingly. For example, because I know that I inserted values 1 to 5, I can get all my items with:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo where MyKeyPart in [1,2,3,4,5] order by MyKeyPart,MyKeySort"

{"Items":[{"MyKeyPart":{"N":"1"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}}]}

So I'm able to get all items sorted now? Yes, but for a higher cost than a scan, because it will query the key values one by one. It would be cheaper to Scan here, but there is no optimizer to estimate the cost of both operations and choose the cheaper one. At least the cost is predictable, as it is proportional to the number of key values in the list.

I cannot use inequalities, or BETWEEN, on the partition key, because they define a range, and this Query access can only be done on known values.


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart between 2 and 2 order by MyKeyPart,MyKeySort"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Must have at least one non-optional hash key condition in WHERE clause when using ORDER BY clause.

Here, even if maths tells me that this is equivalent to an equality ("MyKeyPart between 2 and 2" is the same as "MyKeyPart = 2"), there is no optimizer to do such transformations. The rule is basic: a set of values can be sorted and queried individually, but anything else is considered a range of values that cannot be accessed through a hash function.
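
Ranges on the sort key are a different story: with an equality on the partition key, an inequality or BETWEEN on the sort key can still be served by a Query within that single partition. A sketch of what I would expect to work (not run here, so take it as an assumption):

aws dynamodb execute-statement --statement \
 "select MyKeyPart,MyKeySort from Demo where MyKeyPart = 2 and MyKeySort between 2 and 4 order by MyKeySort"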

How can I be sure about this behaviour? I have a small table where the response time difference is not significant. The best proof is to see what happens when a full table scan is impossible. There is an IAM policy condition to deny scans:


I have created a user with a deny on the "dynamodb:PartiQLSelect" action under the condition "dynamodb:FullTableScan"="True".
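
A minimal sketch of such an inline policy, attached to the user with the AWS CLI (the user name and account ID are the ones appearing in the messages below; the policy name and exact JSON layout are my assumption):

# assumption: attach the deny-on-full-table-scan statement as an inline policy to the ddb-noscan user
aws iam put-user-policy --user-name ddb-noscan --policy-name DenyPartiQLFullTableScan \
 --policy-document '{
   "Version": "2012-10-17",
   "Statement": [
     {"Effect": "Allow",
      "Action": "dynamodb:PartiQLSelect",
      "Resource": "arn:aws:dynamodb:eu-west-1:802756008554:table/Demo"},
     {"Effect": "Deny",
      "Action": "dynamodb:PartiQLSelect",
      "Resource": "arn:aws:dynamodb:eu-west-1:802756008554:table/Demo",
      "Condition": {"Bool": {"dynamodb:FullTableScan": "true"}}}
   ]
 }'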

With this user profile I execute the following:


[opc@a aws]$ aws --profile noscan dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo"

An error occurred (AccessDeniedException) when calling the ExecuteStatement operation: User: arn:aws:iam::802756008554:user/ddb-noscan is not authorized to perform: dynamodb:PartiQLSelect on resource: arn:aws:dynamodb:eu-west-1:802756008554:table/Demo with an explicit deny

[opc@a aws]$ aws --profile noscan dynamodb execute-statement --statement "select MyKeyPart,MyKeySort \
from Demo where MyKeyPart in [1,2,3,4,5]"                

{"Items":[{"MyKeyPart":{"N":"1"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"5"}}]}

It is clear that when a full table scan is denied, the WHERE on a list of 5 values is still possible, because it runs 5 Query calls instead of a Scan.

Pagination

I have additionally inserted many rows with MyKeyPart=10 and large attributes, and I query them:


$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart=10"

{"Items":[{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"1"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"2"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"3"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"4"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"5"}}],"NextToken":"CS4sUIPi4Efg7eSg4sGJZHJ09C/m8JWMwLXB+DF5n54EIBl6yuPZNAHfoRUFg7qgGg872qXswoXSZEI/XAIfvUPNisWSYGrPiquxLFakMecd6aF/ggaexxpKlhPS+ridkOXu8HoWIuWgSXFRBa32QmIXITRhrSMwuT1Q54+6Li6emcxvtpJfmxvxWf/yQkece5nqQIwH/EC3vAr1SZ4Pd537qexKejVHJ+2QrXALwG283UR/obWc53A2HTQ+G3cNeL4xOvVwp9gsOhlKxhsRrS+GqHRF0IHlGrpsdc0LkbMS1hISuagp/KZ0dqP/v7ejB6HsEHhFYZeKYZBoysTYTzhpB02NF3F4MSKp8QF4nO4vcq4="}

I get a few items and a “Next Token” that is quite large.

I can query the next pages with the --next-token option:


$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart=10" \
--next-token "CS4sUIPi4Efg7eSg4sGJZHJ09C/m8JWMwLXB+DF5n54EIBl6yuPZNAHfoRUFg7qgGg872qXswoXSZEI/XAIfvUPNisWSYGrPiquxLFakMecd6aF/ggaexxpKlhPS+ridkOXu8HoWIuWgSXFRBa32QmIXITRhrSMwuT1Q54+6Li6emcxvtpJfmxvxWf/yQkece5nqQIwH/EC3vAr1SZ4Pd537qexKejVHJ+2QrXALwG283UR/obWc53A2HTQ+G3cNeL4xOvVwp9gsOhlKxhsRrS+GqHRF0IHlGrpsdc0LkbMS1hISuagp/KZ0dqP/v7ejB6HsEHhFYZeKYZBoysTYTzhpB02NF3F4MSKp8QF4nO4vcq4="

{"Items":[{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"6"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"7"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"8"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"9"}},{"MyKeyPart":{"N":"10"},"MyKeySort":{"N":"10"}}],"NextToken":"FjHEA2wnIK74SlGaS6TiPSv2fEwfiZhJNHyxvJ+qG750oeKlqSNyx9IDdCUD+m2rSpodPIFJhYYQHXBM9sJed3k6qaA/aUk4s4DUlPvZHl7WAJ4rTY0AmNDUYBPqWyCV8FliSsGPtFTfj1A9T4zD1TU6uuvNIORY/zKHtsAjWzT4Jsg5y32MFcVOmOsDBhyWsQotFqxy1ErMGhJy3cQnEvy1P1KpQak6sflzp3sWLWzUgOXQB/xF1PXRtT8w/E1lPk26LnA/L2bA91nucuohN63hP3MVojPH0GkPCjZsx08wJTn4MEpqDArEREWO2XCkL/GI7vTtYw6GXRenKZoatSG55yKCVDkFRuw7cbK749mEIb6r6Xs="}

Again, this is completely different from SQL databases, where you have cursors, but it is well adapted to the DynamoDB Query model that reads ranges of items in small chunks.
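
To fetch all pages from a script, you can simply loop until no NextToken is returned. A minimal sketch, assuming the jq utility is available to parse the JSON output:

#!/bin/bash
# loop over all pages of a PartiQL SELECT (sketch; assumes jq is installed)
STATEMENT="select MyKeyPart,MyKeySort from Demo where MyKeyPart=10"
TOKEN=""
while true; do
  if [ -z "$TOKEN" ]; then
    PAGE=$(aws dynamodb execute-statement --statement "$STATEMENT")
  else
    PAGE=$(aws dynamodb execute-statement --statement "$STATEMENT" --next-token "$TOKEN")
  fi
  echo "$PAGE" | jq -c '.Items[]'                  # process the items of this page
  TOKEN=$(echo "$PAGE" | jq -r '.NextToken // empty')
  [ -z "$TOKEN" ] && break                         # no NextToken: this was the last page
done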

Projection

I used SELECT with * to get the whole item, key and attributes (like ALL_ATTRIBUTES), and with a list of attributes to do a projection (like SPECIFIC_ATTRIBUTES). There is no aggregation, and I don't think we can do the equivalent of COUNT. Here is how I would write it if it were possible:


[opc@a aws]$ aws dynamodb execute-statement --statement "select count(*) from Demo"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Unexpected path component at 1:8

This is clearly not supported (yet).
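
Note that the classic API can still count items without returning them, which is probably the closest equivalent for now (it is still a Scan, so read capacity is consumed for everything scanned):

aws dynamodb scan --table-name Demo --select COUNT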

According to the documentation, expressions should be allowed, like this:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort,size(MyUnstructuredData) from Demo where MyKeyPart=2 and size(MyUnstructuredData)>10"

An error occurred (ValidationException) when calling the ExecuteStatement operation: Unexpected path component at 1:28

Apparently, this size() function is allowed only in the WHERE clause:


[opc@a aws]$ aws dynamodb execute-statement --statement "select MyKeyPart,MyKeySort from Demo where MyKeyPart=2 and size(MyUnstructuredData)>10"

{"Items":[{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"2"}}]}

In summary, be careful. The only thing you must be aware of is which attributes you filter on in the WHERE clause. An equality on the partition key allows single hash partition access. Without it, you get a scan, which can take time and lots of RCUs.

Cet article DynamoDB PartiQL – part II: SELECT est apparu en premier sur Blog dbi services.

Documentum DM_SYSOBJECT_E_NO_BROWSE_ACCESS No browse access for sysobject with ID


When upgrading, you may get this specific issue if you are trying to edit content or create new documents. We got it when creating a new document in a specific folder: D2 told us that the folder doesn't exist and that we had no browse access.

Foolishness! The folder does exist and all accesses are set properly, so what happened? Here is the full log:

Caused by: com.documentum.fc.client.DfIdNotFoundException: [DM_API_E_EXIST]error:  "Folder specified by 0b010ba18005277d does not exist."
        at com.documentum.fc.client.impl.session.Session.getFolderBySpecification(Session.java:1289)
        at com.documentum.fc.client.impl.session.SessionHandle.getFolderBySpecification(SessionHandle.java:835)
        at com.emc.smartwave.web.servlets.listeners.properties.D2CreationServicePlugin.printParentFolder(D2CreationServicePlugin.java:151)
        at com.emc.smartwave.web.servlets.listeners.properties.D2CreationServicePlugin.formerOnAfter(D2CreationServicePlugin.java:187)
        at com.emc.smartwave.web.servlets.listeners.properties.D2CreationServicePlugin.createProperties(D2CreationServicePlugin.java:133)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.callPlugin(InjectSessionAspect.java:487)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.ajc$inlineAccessMethod$com_emc_d2fs_dctm_aspects_InjectSessionAspect$com_emc_d2fs_dctm_aspects_InjectSessionAspect$callPlugin(InjectSessionAspect.java:1)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.process(InjectSessionAspect.java:130)
        at com.emc.d2fs.dctm.web.services.create.D2CreationService.createProperties(D2CreationService.java:97)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.callPlugin(InjectSessionAspect.java:487)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.ajc$inlineAccessMethod$com_emc_d2fs_dctm_aspects_InjectSessionAspect$com_emc_d2fs_dctm_aspects_InjectSessionAspect$callPlugin(InjectSessionAspect.java:1)
        at com.emc.d2fs.dctm.aspects.InjectSessionAspect.process(InjectSessionAspect.java:236)
        ... 55 common frames omitted
Caused by: com.documentum.fc.common.DfException: [DM_SYSOBJECT_E_NO_BROWSE_ACCESS]error:  "No browse access for sysobject with ID '0b010ba18005277d'."
        at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
        at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:329)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:672)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1382)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1141)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1374)
        at com.documentum.fc.client.impl.docbase.DocbaseApi.parameterizedFetch(DocbaseApi.java:107)
        at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.fetchFromServer(PersistentDataManager.java:191)
        at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.getData(PersistentDataManager.java:82)
        at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObjectFromServer(PersistentObjectManager.java:355)
        at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObject(PersistentObjectManager.java:311)
        at com.documentum.fc.client.impl.session.Session.getObject(Session.java:965)
        at com.documentum.fc.client.impl.session.Session.getFolderBySpecification(Session.java:1281)
        ... 74 common frames omitted

The resolution is quite simple: it's just a cache problem.

You only need to force the clients to clear their caches, and there is a simple way to do this. The content server keeps a reference to the "client version" in the dm_docbase_config object; we only need to increment its value to force all clients to reload and clear their caches. Here is how to do it:

In IAPI (these are API commands, not DQL):

retrieve,c,dm_docbase_config
set,c,l,client_pcaching_change
7
save,c,l
reinit,c

Just read the current client_pcaching_change value and increase it by 1 (7 in our case).

Cet article Documentum DM_SYSOBJECT_E_NO_BROWSE_ACCESS No browse access for sysobject with ID est apparu en premier sur Blog dbi services.
