
PostgreSQL partitioning (7): Indexing and constraints

Yesterday we talked about attaching and detaching partitions. Today we will look at indexing and constraints for partitioned tables. If you missed the previous posts, here they are again:

  1. PostgreSQL partitioning (1): Preparing the data set
  2. PostgreSQL partitioning (2): Range partitioning
  3. PostgreSQL partitioning (3): List partitioning
  4. PostgreSQL partitioning (4): Hash partitioning
  5. PostgreSQL partitioning (5): Partition pruning
  6. PostgreSQL partitioning (6): Attaching and detaching partitions

When declarative partitioning was introduced in PostgreSQL 10 it came with quite a few limitations. For example: if you wanted to create a primary key on a partitioned table, that just failed and PostgreSQL told you it was not supported. Things have improved a lot since then, and today you can do many things with partitioned tables that did not work initially (you can check this for an overview of the improvements that came with PostgreSQL 11).
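
One limitation that is still easy to hit: a primary key or unique constraint on a partitioned table must include all partition key columns. A quick sketch of what happens otherwise on our table (the attempt below is not part of the original session, and the exact error wording differs between PostgreSQL versions):

postgres=# alter table traffic_violations_p_list add primary key (seqid);
ERROR:  insufficient columns in PRIMARY KEY constraint definition
DETAIL:  PRIMARY KEY constraint on table "traffic_violations_p_list" lacks column "violation_type" which is part of the partition key.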

This time we will use the list partitioned table, and this is how it currently looks:

postgres=# \d+ traffic_violations_p_list
                                   Partitioned table "public.traffic_violations_p_list"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: LIST (violation_type)
Partitions: traffic_violations_p_list_citation FOR VALUES IN ('Citation'),
            traffic_violations_p_list_esero FOR VALUES IN ('ESERO'),
            traffic_violations_p_list_sero FOR VALUES IN ('SERO'),
            traffic_violations_p_list_warning FOR VALUES IN ('Warning'),
            traffic_violations_p_list_default DEFAULT

There is not a single constraint or index, and the same is true for the partitions (only showing the first one here, but it is the same for all of them):

postgres=# \d traffic_violations_p_list_citation
                 Table "public.traffic_violations_p_list_citation"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           |          | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 
Partition of: traffic_violations_p_list FOR VALUES IN ('Citation')

As already mentioned in one of the previous posts, we cannot create a primary key or unique index because there are duplicate rows in the partitioned table. We can, however, create a standard btree index:

postgres=# create index i1 on traffic_violations_p_list ( model );
CREATE INDEX
postgres=# \d+ traffic_violations_p_list
                                   Partitioned table "public.traffic_violations_p_list"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: LIST (violation_type)
Indexes:
    "i1" btree (model)
Partitions: traffic_violations_p_list_citation FOR VALUES IN ('Citation'),
            traffic_violations_p_list_esero FOR VALUES IN ('ESERO'),
            traffic_violations_p_list_sero FOR VALUES IN ('SERO'),
            traffic_violations_p_list_warning FOR VALUES IN ('Warning'),
            traffic_violations_p_list_default DEFAULT

This is a so-called partitioned index and you can verify that with pg_partition_tree:

postgres=# select * from pg_partition_tree('i1');
                    relid                     | parentrelid | isleaf | level 
----------------------------------------------+-------------+--------+-------
 i1                                           |             | f      |     0
 traffic_violations_p_list_citation_model_idx | i1          | t      |     1
 traffic_violations_p_list_esero_model_idx    | i1          | t      |     1
 traffic_violations_p_list_sero_model_idx     | i1          | t      |     1
 traffic_violations_p_list_warning_model_idx  | i1          | t      |     1
 traffic_violations_p_list_default_model_idx  | i1          | t      |     1

Indeed, the index cascaded down to all the partitions:

postgres=# \d traffic_violations_p_list_citation
                 Table "public.traffic_violations_p_list_citation"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           |          | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 
Partition of: traffic_violations_p_list FOR VALUES IN ('Citation')
Indexes:
    "traffic_violations_p_list_citation_model_idx" btree (model)

As soon as you add another partition it will be indexed automatically:

postgres=# create table traffic_violations_p_list_demo
postgres-# partition of traffic_violations_p_list
postgres-# for values in ('demo');
CREATE TABLE
postgres=# \d traffic_violations_p_list_demo
                   Table "public.traffic_violations_p_list_demo"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           | not null | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 
Partition of: traffic_violations_p_list FOR VALUES IN ('demo')
Indexes:
    "traffic_violations_p_list_demo_model_idx" btree (model)
Check constraints:
    "chk_make" CHECK (length(seqid) > 1)

You can also create an index on a specific partition only (maybe because you know that the application searches on a specific column of that partition):

postgres=# create index i2 on traffic_violations_p_list_citation (make);
CREATE INDEX
postgres=# \d traffic_violations_p_list_citation
                 Table "public.traffic_violations_p_list_citation"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           |          | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 
Partition of: traffic_violations_p_list FOR VALUES IN ('Citation')
Indexes:
    "i2" btree (make)
    "traffic_violations_p_list_citation_model_idx" btree (model)

What is not working right now is creating a partitioned index concurrently:

postgres=# create index CONCURRENTLY i_con on traffic_violations_p_list (zip_codes);
psql: ERROR:  cannot create index on partitioned table "traffic_violations_p_list" concurrently

This implies that there will be locking when you create a partitioned index, and indeed, if you create the index in one session:

postgres=# create index i_mun on traffic_violations_p_list (municipalities);
CREATE INDEX

… and at the same time insert something in another session, the insert will block until the index has been created in the first session:

postgres=# insert into traffic_violations_p_list ( seqid, date_of_stop ) values ( 'xxxxx', date('01.01.2023'));
-- blocks until index above is created
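
The blocking is expected: a plain CREATE INDEX takes a SHARE lock on the table (and, as the index cascades, on its partitions), which conflicts with the ROW EXCLUSIVE lock the INSERT needs. If you want to watch the waiting session, you can query pg_stat_activity from a third session (a quick sketch, not part of the original demo):

postgres=# select pid, wait_event_type, wait_event, query
postgres-#   from pg_stat_activity
postgres-#  where wait_event_type = 'Lock';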

You can limit the locking time by creating the partitioned index on only the partitioned table, so that it does not cascade down to the partitions:

postgres=# create index i_demo on only traffic_violations_p_list (accident);
CREATE INDEX

This will leave the index in an invalid state:

postgres=# select indisvalid from pg_index where indexrelid = 'i_demo'::regclass;
 indisvalid 
------------
 f
(1 row)

Now you can create the index concurrently on all the partitions:

postgres=# create index concurrently i_demo_citation on traffic_violations_p_list_citation (accident);
CREATE INDEX
postgres=# create index concurrently i_demo_demo on traffic_violations_p_list_demo (accident);
CREATE INDEX
postgres=# create index concurrently i_demo_esero on traffic_violations_p_list_esero(accident);
CREATE INDEX
postgres=# create index concurrently i_demo_sero on traffic_violations_p_list_sero(accident);
CREATE INDEX
postgres=# create index concurrently i_demo_warning on traffic_violations_p_list_warning(accident);
CREATE INDEX
postgres=# create index concurrently i_demo_default on traffic_violations_p_list_default(accident);
CREATE INDEX

Once you have that, you can attach all the partition indexes to the partitioned index:

postgres=# alter index i_demo attach partition i_demo_citation;
ALTER INDEX
postgres=# alter index i_demo attach partition i_demo_demo;
ALTER INDEX
postgres=# alter index i_demo attach partition i_demo_esero;
ALTER INDEX
postgres=# alter index i_demo attach partition i_demo_sero;
ALTER INDEX
postgres=# alter index i_demo attach partition i_demo_warning;
ALTER INDEX
postgres=# alter index i_demo attach partition i_demo_default;
ALTER INDEX

This makes the partitioned index valid automatically:

postgres=# select indisvalid from pg_index where indexrelid = 'i_demo'::regclass;
 indisvalid 
------------
 t
(1 row)
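
If you want to double check that all the partition indexes really got attached, you can use pg_partition_tree again, just as we did for i1 above; the output should list i_demo on level 0 and one leaf index per partition on level 1:

postgres=# select relid, parentrelid, isleaf, level from pg_partition_tree('i_demo');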

The very same is true for constraints: you can create them on the partitioned table or on a specific partition only. Adding a check constraint on the partitioned table cascades it down to all partitions:

postgres=# alter table traffic_violations_p_list add constraint chk_make check (length(seqid)>1);
ALTER TABLE
postgres=# \d+ traffic_violations_p_list
                                   Partitioned table "public.traffic_violations_p_list"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: LIST (violation_type)
Indexes:
    "i1" btree (model)
Check constraints:
    "chk_make" CHECK (length(seqid) > 1)
Partitions: traffic_violations_p_list_citation FOR VALUES IN ('Citation'),
            traffic_violations_p_list_esero FOR VALUES IN ('ESERO'),
            traffic_violations_p_list_sero FOR VALUES IN ('SERO'),
            traffic_violations_p_list_warning FOR VALUES IN ('Warning'),
            traffic_violations_p_list_default DEFAULT

postgres=# \d+ traffic_violations_p_list_citation
                                     Table "public.traffic_violations_p_list_citation"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition of: traffic_violations_p_list FOR VALUES IN ('Citation')
Partition constraint: ((violation_type IS NOT NULL) AND (violation_type = 'Citation'::text))
Indexes:
    "i2" btree (make)
    "traffic_violations_p_list_citation_model_idx" btree (model)
Check constraints:
    "chk_make" CHECK (length(seqid) > 1)
Access method: heap
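
As with any check constraint, a row that violates it gets rejected, no matter whether you insert through the partitioned table or directly into a partition. A quick sketch (not from the original session, the error message is roughly what PostgreSQL prints):

postgres=# insert into traffic_violations_p_list ( seqid, violation_type ) values ( 'x', 'Citation' );
ERROR:  new row for relation "traffic_violations_p_list_citation" violates check constraint "chk_make"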

For a specific partition only:

postgres=# alter table traffic_violations_p_list_citation add constraint chk_state check (state is not null);
ALTER TABLE
postgres=# \d traffic_violations_p_list_citation
                 Table "public.traffic_violations_p_list_citation"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           |          | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           |          | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 
Partition of: traffic_violations_p_list FOR VALUES IN ('Citation')
Indexes:
    "i2" btree (make)
    "traffic_violations_p_list_citation_model_idx" btree (model)
Check constraints:
    "chk_make" CHECK (length(seqid) > 1)
    "chk_state" CHECK (state IS NOT NULL)

Changing the properties of a column works the same way: either at the partitioned table level or for a specific partition only:

postgres=# alter table traffic_violations_p_list alter column seqid set not null;
ALTER TABLE
postgres=# alter table traffic_violations_p_list_citation alter column time_of_stop set not null;
ALTER TABLE
postgres=# \d+ traffic_violations_p_list
                                   Partitioned table "public.traffic_violations_p_list"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           | not null |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: LIST (violation_type)
Indexes:
    "i1" btree (model)
Check constraints:
    "chk_make" CHECK (length(seqid) > 1)
Partitions: traffic_violations_p_list_citation FOR VALUES IN ('Citation'),
            traffic_violations_p_list_esero FOR VALUES IN ('ESERO'),
            traffic_violations_p_list_sero FOR VALUES IN ('SERO'),
            traffic_violations_p_list_warning FOR VALUES IN ('Warning'),
            traffic_violations_p_list_default DEFAULT

postgres=# \d traffic_violations_p_list_citation
                 Table "public.traffic_violations_p_list_citation"
         Column          |          Type          | Collation | Nullable | Default 
-------------------------+------------------------+-----------+----------+---------
 seqid                   | text                   |           | not null | 
 date_of_stop            | date                   |           |          | 
 time_of_stop            | time without time zone |           | not null | 
 agency                  | text                   |           |          | 
 subagency               | text                   |           |          | 
 description             | text                   |           |          | 
 location                | text                   |           |          | 
 latitude                | numeric                |           |          | 
 longitude               | numeric                |           |          | 
 accident                | text                   |           |          | 
 belts                   | boolean                |           |          | 
 personal_injury         | boolean                |           |          | 
 property_damage         | boolean                |           |          | 
 fatal                   | boolean                |           |          | 
 commercial_license      | boolean                |           |          | 
 hazmat                  | boolean                |           |          | 
 commercial_vehicle      | boolean                |           |          | 
 alcohol                 | boolean                |           |          | 
 workzone                | boolean                |           |          | 
 state                   | text                   |           |          | 
 vehicletype             | text                   |           |          | 
 year                    | smallint               |           |          | 
 make                    | text                   |           |          | 
 model                   | text                   |           |          | 
 color                   | text                   |           |          | 
 violation_type          | text                   |           |          | 
 charge                  | text                   |           |          | 
 article                 | text                   |           |          | 
 contributed_to_accident | boolean                |           |          | 
 race                    | text                   |           |          | 
 gender                  | text                   |           |          | 
 driver_city             | text                   |           |          | 
 driver_state            | text                   |           |          | 
 dl_state                | text                   |           |          | 
 arrest_type             | text                   |           |          | 
 geolocation             | point                  |           |          | 
 council_districts       | smallint               |           |          | 
 councils                | smallint               |           |          | 
 communities             | smallint               |           |          | 
 zip_codes               | smallint               |           |          | 
 municipalities          | smallint               |           |          | 
Partition of: traffic_violations_p_list FOR VALUES IN ('Citation')
Indexes:
    "i2" btree (make)
    "traffic_violations_p_list_citation_model_idx" btree (model)
Check constraints:
    "chk_make" CHECK (length(seqid) > 1)
    "chk_state" CHECK (state IS NOT NULL)

This was indexing and constraints with partitioned tables. In the next post we will have a look at sub-partitioning.

The article PostgreSQL partitioning (7): Indexing and constraints first appeared on Blog dbi services.


Why doesn't ODA reimaging work on the first try?


Introduction

Reimaging an ODA is a good practice for a lot of reasons: to clean up your ODA if it has been running for many years and you patch regularly; to simplify patching, because if you are late you might have to apply multiple intermediate patches to reach the target version; or simply because you need to change the configuration (for example the network configuration) and you want to do it cleanly, being sure that future patches will work fine after the changes.

Understand the reimaging process

Actually, reimaging is divided into two operations: the pure reimaging of the nodes with a dedicated ISO file (basically an OS installation), and then the create-appliance part. Reimaging sounds like a complete reinstallation of the ODA from scratch, so you might think that everything will be cleaned up, but it is slightly different: it only means reinstalling the software on the node. If you have 2 nodes (HA ODAs) you will of course have to do the pure reimaging on both nodes. Most of the time the pure reimaging works fine. Just after the reimaging you will need to deploy/create the appliance (for all ODAs using 18.3 or later) from the first node only, and this step can be quite stressful.
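
For reference, the create appliance part is driven by odacli. The file names below are just placeholders (the real clone file names depend on the release you deploy), so take this as a sketch of the flow rather than something to copy and paste:

# configure the initial network on the first node
/opt/oracle/dcs/bin/odacli configure-firstnet

# register the GI and DB clone files you copied to the ODA
odacli update-repository -f /tmp/GI_clone_file.zip
odacli update-repository -f /tmp/DB_clone_file.zip

# create the appliance from a prepared json configuration file
odacli create-appliance -r /tmp/my_oda_config.json

# follow the provisioning job
odacli describe-job -i <job_id>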

Typical problem encountered

This example comes from an ODA X6-2S, but the problem is pretty much the same on all ODAs. You have just successfully reimaged the server and immediately configured the network with odacli configure-firstnet. As soon as your ODA is on the network, you can copy the GI and DB clones needed to create the appliance. Creating the appliance means configuring the system for Oracle, installing all the Oracle software, configuring ASM, creating a first database and so on. You will find these steps in the official documentation. In this case, a few minutes after running the create-appliance, you discover that the appliance creation failed:

 /opt/oracle/dcs/bin/odacli describe-job -i 3f93ad2d-7f0f-4f25-90cb-d3937b1270a9

Job details
----------------------------------------------------------------
                     ID:  3f93ad2d-7f0f-4f25-90cb-d3937b1270a9
            Description:  Provisioning service creation
                 Status:  Failure
                Created:  May 24, 2019 5:08:50 PM CEST
                Message:  DCS-10001:Internal error encountered: Fail to run root scripts : .

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Provisioning service creation            May 24, 2019 5:08:50 PM CEST        May 24, 2019 5:21:47 PM CEST        Failure
Provisioning service creation            May 24, 2019 5:08:50 PM CEST        May 24, 2019 5:21:47 PM CEST        Failure
networks updation                        May 24, 2019 5:08:51 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
updating network                         May 24, 2019 5:08:51 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
Setting up Network                       May 24, 2019 5:08:51 PM CEST        May 24, 2019 5:08:51 PM CEST        Success
OS usergroup 'asmdba'creation            May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS usergroup 'asmoper'creation           May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS usergroup 'asmadmin'creation          May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS usergroup 'dba'creation               May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS usergroup 'dbaoper'creation           May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS usergroup 'oinstall'creation          May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS user 'grid'creation                   May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
OS user 'oracle'creation                 May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
SSH equivalance setup                    May 24, 2019 5:08:59 PM CEST        May 24, 2019 5:08:59 PM CEST        Success
Grid home creation                       May 24, 2019 5:09:06 PM CEST        May 24, 2019 5:11:57 PM CEST        Success
Creating GI home directories             May 24, 2019 5:09:06 PM CEST        May 24, 2019 5:09:06 PM CEST        Success
Cloning Gi home                          May 24, 2019 5:09:06 PM CEST        May 24, 2019 5:11:56 PM CEST        Success
Updating GiHome version                  May 24, 2019 5:11:56 PM CEST        May 24, 2019 5:11:57 PM CEST        Success
Storage discovery                        May 24, 2019 5:11:57 PM CEST        May 24, 2019 5:16:33 PM CEST        Success
Grid stack creation                      May 24, 2019 5:16:33 PM CEST        May 24, 2019 5:21:47 PM CEST        Failure
Configuring GI                           May 24, 2019 5:16:33 PM CEST        May 24, 2019 5:18:13 PM CEST        Success
Running GI root scripts                  May 24, 2019 5:18:13 PM CEST        May 24, 2019 5:21:47 PM CEST        Failure

You were thinking that reimaging was a good idea, and now you're afraid that maybe it wasn't… But don't panic, it's a normal behavior.

As this error is related to GI stack configuration, you need to look into the GI logs, as you would do on another platform.

vi /u01/app/18.0.0.0/grid/install/root_odadbi03_2019-05-24_17-18-13-479398837.log

...
2019/05/24 17:20:47 CLSRSC-594: Executing installation step 17 of 20: 'InitConfig'.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'odadbi'
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'odadbi' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'odadbi'
CRS-2672: Attempting to start 'ora.mdnsd' on 'odadbi'
CRS-2676: Start of 'ora.mdnsd' on 'odadbi' succeeded
CRS-2676: Start of 'ora.evmd' on 'odadbi' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'odadbi'
CRS-2676: Start of 'ora.gpnpd' on 'odadbi' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'odadbi'
CRS-2672: Attempting to start 'ora.gipcd' on 'odadbi'
CRS-2676: Start of 'ora.cssdmonitor' on 'odadbi' succeeded
CRS-2676: Start of 'ora.gipcd' on 'odadbi' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'odadbi'
CRS-2672: Attempting to start 'ora.diskmon' on 'odadbi'
CRS-2676: Start of 'ora.diskmon' on 'odadbi' succeeded
CRS-2676: Start of 'ora.cssd' on 'odadbi' succeeded
Creating SQL script file /tmp/asminit_sql_2019-05-24-17-21-15.sql
cat: /etc/grub.conf: Permission denied

SQL*Plus: Release 18.0.0.0.0 - Production on Fri May 24 17:21:15 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.

Connected to an idle instance.

ASM instance started

Total System Global Area 1136934472 bytes
Fixed Size                  8666696 bytes
Variable Size            1103101952 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
ASM diskgroups volume enabled
create diskgroup DATA NORMAL REDUNDANCY
*
ERROR at line 1:
ORA-15018: diskgroup cannot be created
ORA-15030: diskgroup name "DATA" is in use by another diskgroup
...

The problem is quite obvious: the diskgroup creation failed when creating the DATA diskgroup. As a result, your GI configuration is not complete, and there is no way to go further.

Why is ASM in trouble when reimaging?

Actually, reimaging an ODA will erase all the data on the local disks (the 2 disks for the system) but will not erase anything on the disks dedicated to ASM (the data disks), even if you're using a lite ODA like most of us these days. As a result, the data disks are still configured with the previous ASM headers and data, leading to the failure. You can avoid this error by cleaning up the ODA BEFORE the reimaging (please refer to the dedicated procedure for your version/your ODA). In our case, we completely forgot to clean up beforehand, but after the first deployment failure you'll find a script for purging the ASM tagging and data on the disks:

/opt/oracle/oak/onecmd/cleanup.pl
INFO: *******************************************************************
INFO: ** Starting process to cleanup provisioned host odadbi03           **
INFO: *******************************************************************
INFO: Default mode being used to cleanup a provisioned system.
INFO: It will change all ASM disk status from MEMBER to FORMER
Do you want to continue (yes/no) : yes
INFO:
Running cleanup will delete Grid User - 'grid' and
INFO: DB user - 'oracle' and also the
INFO: groups 'oinstall,dba,asmadmin,asmoper,asmdba'
INFO: nodes will be rebooted
Do you want to continue (yes/no) : yes
…

After this cleanup, the server will reboot, and you will be able to retry the odacli create-appliance:

/opt/oracle/dcs/bin/odacli describe-job -i "3571b291-be91-4cd4-a133-b52ead24ff61"

Job details
----------------------------------------------------------------
                     ID:  3571b291-be91-4cd4-a133-b52ead24ff61
            Description:  Provisioning service creation
                 Status:  Success
                Created:  May 24, 2019 5:53:44 PM CEST
                Message:

Task Name                                Start Time                          End Time                            Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
networks updation                        May 24, 2019 5:53:45 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
updating network                         May 24, 2019 5:53:45 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
Setting up Network                       May 24, 2019 5:53:45 PM CEST        May 24, 2019 5:53:45 PM CEST        Success
OS usergroup 'asmdba'creation            May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS usergroup 'asmoper'creation           May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS usergroup 'asmadmin'creation          May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS usergroup 'dba'creation               May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS usergroup 'dbaoper'creation           May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS usergroup 'oinstall'creation          May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS user 'grid'creation                   May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
OS user 'oracle'creation                 May 24, 2019 5:53:52 PM CEST        May 24, 2019 5:53:52 PM CEST        Success
SSH equivalance setup                    May 24, 2019 5:53:53 PM CEST        May 24, 2019 5:53:53 PM CEST        Success
Grid home creation                       May 24, 2019 5:54:00 PM CEST        May 24, 2019 5:57:08 PM CEST        Success
Creating GI home directories             May 24, 2019 5:54:00 PM CEST        May 24, 2019 5:54:00 PM CEST        Success
Cloning Gi home                          May 24, 2019 5:54:00 PM CEST        May 24, 2019 5:57:06 PM CEST        Success
Updating GiHome version                  May 24, 2019 5:57:06 PM CEST        May 24, 2019 5:57:08 PM CEST        Success
Storage discovery                        May 24, 2019 5:57:08 PM CEST        May 24, 2019 6:01:46 PM CEST        Success
Grid stack creation                      May 24, 2019 6:01:46 PM CEST        May 24, 2019 6:18:04 PM CEST        Success
Configuring GI                           May 24, 2019 6:01:46 PM CEST        May 24, 2019 6:03:27 PM CEST        Success
Running GI root scripts                  May 24, 2019 6:03:27 PM CEST        May 24, 2019 6:12:52 PM CEST        Success
Running GI config assistants             May 24, 2019 6:12:53 PM CEST        May 24, 2019 6:14:45 PM CEST        Success
Setting AUDIT SYSLOG LEVEL               May 24, 2019 6:14:53 PM CEST        May 24, 2019 6:14:53 PM CEST        Success
Post cluster OAKD configuration          May 24, 2019 6:18:04 PM CEST        May 24, 2019 6:21:17 PM CEST        Success
Disk group 'RECO'creation                May 24, 2019 6:21:25 PM CEST        May 24, 2019 6:21:35 PM CEST        Success
Volume 'datDBTEST'creation               May 24, 2019 6:21:35 PM CEST        May 24, 2019 6:22:06 PM CEST        Success
Volume 'reco'creation                    May 24, 2019 6:22:06 PM CEST        May 24, 2019 6:22:25 PM CEST        Success
Volume 'commonstore'creation             May 24, 2019 6:22:25 PM CEST        May 24, 2019 6:22:44 PM CEST        Success
ACFS File system 'DATA'creation          May 24, 2019 6:22:44 PM CEST        May 24, 2019 6:22:59 PM CEST        Success
ACFS File system 'RECO'creation          May 24, 2019 6:22:59 PM CEST        May 24, 2019 6:23:15 PM CEST        Success
ACFS File system 'DATA'creation          May 24, 2019 6:23:15 PM CEST        May 24, 2019 6:23:30 PM CEST        Success
Database home creation                   May 24, 2019 6:23:30 PM CEST        May 24, 2019 6:27:02 PM CEST        Success
Validating dbHome available space        May 24, 2019 6:23:30 PM CEST        May 24, 2019 6:23:30 PM CEST        Success
Creating DbHome Directory                May 24, 2019 6:23:30 PM CEST        May 24, 2019 6:23:30 PM CEST        Success
Extract DB clones                        May 24, 2019 6:23:30 PM CEST        May 24, 2019 6:25:08 PM CEST        Success
Clone Db home                            May 24, 2019 6:25:08 PM CEST        May 24, 2019 6:26:47 PM CEST        Success
Enable DB options                        May 24, 2019 6:26:47 PM CEST        May 24, 2019 6:26:56 PM CEST        Success
Run Root DB scripts                      May 24, 2019 6:26:56 PM CEST        May 24, 2019 6:26:56 PM CEST        Success
Provisioning service creation            May 24, 2019 6:27:02 PM CEST        May 24, 2019 6:35:18 PM CEST        Success
Database Creation                        May 24, 2019 6:27:02 PM CEST        May 24, 2019 6:33:23 PM CEST        Success
Change permission for xdb wallet files   May 24, 2019 6:33:23 PM CEST        May 24, 2019 6:33:23 PM CEST        Success
Place SnapshotCtrlFile in sharedLoc      May 24, 2019 6:33:23 PM CEST        May 24, 2019 6:33:25 PM CEST        Success
SqlPatch upgrade                         May 24, 2019 6:34:46 PM CEST        May 24, 2019 6:35:16 PM CEST        Success
updating the Database version            May 24, 2019 6:35:16 PM CEST        May 24, 2019 6:35:18 PM CEST        Success
users tablespace creation                May 24, 2019 6:35:18 PM CEST        May 24, 2019 6:35:20 PM CEST        Success
Install TFA                              May 24, 2019 6:35:20 PM CEST        May 24, 2019 6:39:53 PM CEST        Success

Everything is OK now.

Conclusion

ODA reimaging does not include data disk formatting, you are now aware of that. The other thing reimaging does not do is patch the BIOS, the firmware and all the microcode in your ODA. So just after the create-appliance, don't forget to apply the patch even if you're already on the target version.
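As a reminder, and only as a sketch (the bundle file name and the version are placeholders, and the exact sequence depends on your release), the post-deployment patching on an 18.x ODA typically looks like:

odacli update-repository -f /tmp/oda_patch_bundle.zip
odacli update-dcsagent -v 18.5.0.0.0
odacli update-server -v 18.5.0.0.0
odacli update-storage -v 18.5.0.0.0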

This article Why ODA reimaging doesn’t work on the first try? appeared first on the dbi services Blog.

SUSE Expert Days 2019 in Lausanne


dbi services is now a SUSE Solution Partner. What better way to celebrate that than to attend an event?
Last week I was at the SUSE Expert Days 2019 which took place at the Mövenpick Hotel in Lausanne.

Keynote

After a welcome coffee, the keynote speech started with Mauro Beglieri (Global and Enterprise Account Manager). The open-source infrastructure and application delivery vendor is growing worldwide, and the SUSE ALPS team (Switzerland and Austria) now counts 10 people. Since 15th March, SUSE has been an independent company, with its own strategy no longer tied to Micro Focus.
Nowadays there are some barriers to Digital Transformation, such as data growth, new security challenges, and new programming and deployment models. To handle these issues, SUSE focuses on real-time insight, user requirements, a new deployment strategy and customer satisfaction. Data management and deployment can be done across 3 different models: the edge, the core and the cloud. How can we make the 3 models live together? SUSE's vision is to have 2 building blocks:
– some flexibility with software-defined infrastructure
– an application delivery platform
Here is the SUSE portfolio:

None of the SUSE products relies on a proprietary solution.

Cloud Application Platform

Bo Jin (Sales Engineer) then presented the SUSE Cloud Application Platform (CAP).
Containers will become the new infrastructure layer. Docker introduced a standard for containers. Using containers requires an orchestration solution to manage that: Kubernetes.
Cloud Foundry helps developers to push code into the production (cf push).
SUSE is bringing Cloud Foundry to the Kubernetes ecosystem in a lightweight way through CAP. CAP is a portable PaaS solution which supports SUSE CaaS Platform, Google GKE, Microsoft AKS and Amazon EKS, so you are not locked into a specific cloud or Kubernetes stack. This is one important advantage compared to OpenShift, which is only available on the Red Hat platform.
SUSE contributes to some important Cloud Foundry projects (as originator and/or project leader):
– Stratos (the management UI for Cloud Foundry)
– CF containerization (to run Cloud Foundry itself in Kubernetes, with a smaller footprint than VM)
– Eirini (to run user applications developed with CF containerization)
A SUSE partner also presented an implementation of CAP at a customer (an international organization in Switzerland), with the following lessons learned at the end:
– separate legacy from Cloud native
– security requirements make things more complex
– backup strategy must be defined in different ways

SLES for SAP and SUSE Manager

As the last session, Christian Pfenninger (Senior Pre-Sales Engineer) presented SUSE Linux Enterprise Server for SAP.

Actually, SUSE is the first operating system for SAP HANA. Different deployment options are available: on-premises, or on a public or private cloud.
In addition, SUSE Linux Enterprise offers support for Intel Optane DC persistent memory, which provides high performance and a faster recovery after a system restart, which means less downtime.
Christian also talked about SUSE Manager, the infrastructure management tool for Linux systems which is useful for configuration management, patching management, monitoring and reporting, security auditing, etc…

Conclusion

The day ended with a lunch with the SUSE team and other participants: it’s always good to share knowledge and have some networking moments! And this is exactly the spirit of dbi services.
The event was very interesting and well prepared; both commercial and technical staff could appreciate it.
With the new trends in the IT/Ops fields such as Open Source, DevOps, Agility, SUSE is becoming an interesting solution. So stay tuned with the chameleon… 😉

This article SUSE Expert Days 2019 in Lausanne appeared first on the dbi services Blog.

An exotic feature in the content server: check_client_version


An exotic feature in the content server: check_client_version

A few months ago, I tripped over a very mysterious problem while attempting to connect to a 7.3 CS docbase from within dqMan.
We had 3 docbases and we could connect using this client to all of them but one ! Moreover, we could connect to all three using a remote Documentum Administrator or the local idql/iapi command-line tools. Since we could connect to at least one of them with dqMan, this utility was not guilty. Also, since all three docbases accepted connections, they were all OK in this respect. Ditto for the account used, dmadmin or nominative ones; local connections were possible hence the accounts were all active and, as they could be used from within the remote DA, their identification method and password were correct too.
We tried connecting from different workstations in order to check the dqMan side, we cleared its caches, we reinstalled it, but to no avail. We checked the content server’s log, as usual nothing relevant. It was just the combination of this particular docbase AND dqMan. How strange !
So what the heck was wrong here ?
As we weren’t the only administrators of those repositories, we more or less suspecting someone else change some setting but which one ? Ok, I sort of gave it away in the title but please bear with me and read on.
I don’t remember exactly how, we were probably working in panic mode, but we eventually decided to compare the docbases’ dm_docbase_config object side by side as shown below (with some obfuscation):

paste <(iapi bad_boy -Udmadmin -Pxxx <<eoq | awk '{print substr($0, 1, 80)}'
retrieve,c,dm_docbase_config
dump,c,l
quit
eoq
) <(iapi good_boy -Udmadmin -Pxxx <<eoq | awk '{print substr($0, 1, 80)}'
retrieve,c,dm_docbase_config
dump,c,l
quit
eoq
) | column -c 30 -s $'\t' -t | tail +11 | head -n 48
USER ATTRIBUTES                                          USER ATTRIBUTES
  object_name                     : bad_boy                object_name                     : good_boy
  title                           : bad_boy Repository     title                           : good_boy Global Repository
  subject                         :                        subject                         :
  authors                       []:                        authors                       []: 
  keywords                      []:                        keywords                      []: 
  resolution_label                :                        resolution_label                :
  owner_name                      : bad_boy                owner_name                      : good_boy
  owner_permit                    : 7                      owner_permit                    : 7
  group_name                      : docu                   group_name                      : docu
  group_permit                    : 5                      group_permit                    : 5
  world_permit                    : 3                      world_permit                    : 3
  log_entry                       :                        log_entry                       :
  acl_domain                      : bad_boy                acl_domain                      : good_boy
  acl_name                        : dm_450xxxxx80000100    acl_name                        : dm_450xxxxx580000100
  language_code                   :                        language_code                   :
  mac_access_protocol             : nt                     mac_access_protocol             : nt
  security_mode                   : acl                    security_mode                   : acl
  auth_protocol                   :                        auth_protocol                   :
  index_store                     : DM_bad_boy_INDEX       index_store                     : DM_good_boy_INDEX
  folder_security                 : T                      folder_security                 : T
  effective_date                  : nulldate               effective_date                  : nulldate
  richmedia_enabled               : T                      richmedia_enabled               : T
  dd_locales                   [0]: en                     dd_locales                   [0]: en
  default_app_permit              : 3                      default_app_permit              : 3
  oldest_client_version           :                        oldest_client_version           :
  max_auth_attempt                : 0                      max_auth_attempt                : 0
  client_pcaching_disabled        : F                      client_pcaching_disabled        : F
  client_pcaching_change          : 1                      client_pcaching_change          : 1
  fulltext_install_locs        [0]: dsearch                fulltext_install_locs        [0]: dsearch
  offline_sync_level              : 0                      offline_sync_level              : 0
  offline_checkin_flag            : 0                      offline_checkin_flag            : 0
  wf_package_control_enabled      : F                      wf_package_control_enabled      : F
  macl_security_disabled          : F                      macl_security_disabled          : F
  trust_by_default                : T                      trust_by_default                : T
  trusted_docbases              []:                        trusted_docbases              []: 
  login_ticket_cutoff             : nulldate               login_ticket_cutoff             : nulldate
  auth_failure_interval           : 0                      auth_failure_interval           : 0
  auth_deactivation_interval      : 0                      auth_deactivation_interval      : 0
  dir_user_sync_on_demand         : F                      dir_user_sync_on_demand         : F
  check_client_version            : T                      check_client_version            : F
  audit_old_values                : T                      audit_old_values                : T
  docbase_roles                 []:                        docbase_roles                [0]: Global Registry
  approved_clients_only           : F                      approved_clients_only           : F
  minimum_owner_permit            : 2                      minimum_owner_permit            : 0
  minimum_owner_xpermit           :                        minimum_owner_xpermit           :
  dormancy_status                 :                        dormancy_status                 :

The only significant differences were the highlighted ones and the most obvious one was the attribute check_client_version: it was turned on in the bad_boy repository. Now that we finally had something to blame, the universe started making sense again! We quickly turned this setting to false and could eventually connect to that recalcitrant docbase. But the question was still open: check against what? What criterion was applied to refuse dqMan access to bad_boy but allow it into good_boy? That was still not clear, even though we could work around it.
As for who turned it on and why, that had to remain a mystery.
While we were at it, we also noticed another attribute which seemed to be related to the previous one: oldest_client_version.
Was there any other client_% attribute in dm_docbase_config ?

paste <(iapi good_boy -Udmadmin -Pdmadmin <<eoq | grep client
retrieve,c,dm_docbase_config
dump,c,l
quit
eoq) <(iapi bad_boy -Udmadmin -Pdmadmin <<eoq | grep client
retrieve,c,dm_docbase_config
dump,c,l
quit
eoq) | column -s $'\t' -t
  oldest_client_version           :      oldest_client_version           : 
  client_pcaching_disabled        : F    client_pcaching_disabled        : F
  client_pcaching_change          : 1    client_pcaching_change          : 1
  check_client_version            : F    check_client_version            : T
  approved_clients_only           : F    approved_clients_only           : F

Yes, but they looked quite harmless in the current context.
Thus, the relevant attributes here are check_client_version and oldest_client_version. Let’s discover a bit more about them.

Digging

As usual, the documentation is a bit sketchy about these attributes:

check_client_version Boolean S T means that the repository
                               servers will not accept connections
                               from clients older than the
                               version level specified in the
                               oldest_client_version property.
                               F means that the servers accept
                               connections from any client version.
                               The default is F.

oldest_client_version string(32) S Version number of the oldest
                                    Documentum client that will access
                                    this repository.
                                    This must be set manually. It is used
                                    by the DFC to determine how to
                                    store chunked XML documents. If
                                    check_client_version is set to T,then
                                    this value is also used to identify the
                                    oldest client version level that may
                                    connect to the repository.

But what is the client version ? Logically, it is the version of its DfCs or, for older clients, the version of the dmcl shared library.
So, if check_client_version is true, the client version is checked and, if it is older than the one defined in oldest_client_version, the client is forbidden to connect. That makes sense, except that in our case oldest_client_version was empty. Maybe in such a case the client has to match the content server's DfC version exactly? As dqMan was using either the dmcl40.dll library or an old DfC version, it was rejected. Let's verify this hypothesis with a 16.4 target repository.
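As a side note, the current settings of a repository can be checked at any time with a quick DQL query, e.g. (a minimal sketch, assuming a working dmadmin session on the dmtest repository):

idql dmtest -Udmadmin -Pdmadmin <<eoq
select check_client_version, oldest_client_version from dm_docbase_config;
go
eoq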
Connecting from an ancient 5.3 client
We exhumed an old 5.3 CS installation to use its client part with the default configuration in the target docbase:

dmadmin@osboxes:~/documentum53$ idql dmtest -Udmadmin -Pdmadmin
 
 
Documentum idql - Interactive document query interface
(c) Copyright Documentum, Inc., 1992 - 2004
All rights reserved.
Client Library Release 5.3.0.115 Linux
 
 
Connecting to Server using docbase dmtest
[DM_SESSION_I_SESSION_START]info: "Session 0100c35080003913 started for user dmadmin."
 
 
Connected to Documentum Server running Release 16.4.0080.0129 Linux64.Oracle

Fine so far.
Let’s activate the dm_docbase_config.check_client_version in the target:

retrieve,c,dm_docbase_config
...
set,c,l,check_client_version
SET> T
...
OK
API> save,c,l
...
[DM_DCNFG_E_CANT_SAVE]error: "Cannot save dmtest docbase_config."
 
[DM_DCNFG_E_SET_OLDEST_CLIENT_VERSION_FIRST]error: "The docbase_config object attribute oldest_client_version has to be set before setting attribute check_client_version to T."

Interesting. At that time, this attribute was empty and yet check_client_version was active. Is this constraint new in 16.4 or did the unknown administrator hack around it? As I don't have a 7.x repository available right now, I cannot test this point.
Let’s play by the rules and set oldest_client_version:

reset,c,l
set,c,l,oldest_client_version
16.4
save,c,l
OK
set,c,l,check_client_version
SET> T
...
OK
API> save,c,l
...
OK

Try connecting from the 5.3 client: still OK.
Maybe a reinit is necessary for the changes to take effect:

reinit,c

Try again:

dmadmin@osboxes:~/documentum53$ idql dmtest -Udmadmin -Pdmadmin
Documentum idql - Interactive document query interface
(c) Copyright Documentum, Inc., 1992 - 2004
All rights reserved.
Client Library Release 5.3.0.115 Linux
 
 
Connecting to Server using docbase dmtest
Could not connect
[DM_SESSION_E_START_FAIL]error: "Server did not start session. Please see your system administrator or check the server log.
Error message from server was:
[DM_SESSION_E_AUTH_FAIL]error: "Authentication failed for user dmadmin with docbase dmtest."
 
"

So a reinit is indeed required.
Note the misleading error: it is not the authentication that is wrong but the client version validation. It is such misleading messages that make diagnosing Documentum problems so hard and time-consuming. Anyway, let's revert check_client_version to F:

set,c,l,check_client_version
F
save,c,l
reinit,c

Try connecting: OK. So, the client version filtering is effective. Let’s try it with a 5.3 client version:

API> set,c,l,oldest_client_version
SET> 5.3
...
OK
API> save,c,l
...
OK
API> set,c,l,check_client_version
SET> T
...
OK
API> save,c,l
...
OK
API> reinit,c
...
OK

Try connecting: OK, that’s expected.
Let’s try it for a minimum 5.2 client version: it still works, which is expected too since the test client’s version is 5.3 and in my books 5.3 > 5.2.
Let’s try it for a miminum a 5.4 client version: the connection fails, so client version checking works as expected here.
Let’s try it for a miminum a 20.0 client version: the connection fails as expected. No check on the version’s value is done, which is quite understandable programmatically speaking, although a bit optimistic in the context of the turmoil Documentum went through lately.
Let’s go back to a more realistic value:

API> set,c,l,oldest_client_version
SET> 7.2
...
[DM_SESSION_E_AUTH_FAIL]error: "Authentication failed for user dmadmin with docbase dmtest."
 
 
API> save,c,l

Oops, interestingly, the last change did not make it: with the current setting so far into the future, the present client's session was rejected and there is no way to reconnect in order to revert it!
Let’s do the rollback from the database level directly:

sqlplus dmtest@orcl
 
SQL*Plus: Release 12.2.0.1.0 Production on Mon Jun 10 16:25:56 2019
 
Copyright (c) 1982, 2016, Oracle. All rights reserved.
 
Enter password:
Last Successful login time: Mon Jun 10 2019 16:25:40 +02:00
 
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
 
SQL> update dm_docbase_config_s set check_client_version = 0;
 
1 row updated.
SQL> commit;
 
Commit complete.
 
quit;

Try to connect:

iapi dmtest@docker:1489
Please enter a user (dmadmin):
Please enter password for dmadmin:
 
 
OpenText Documentum iapi - Interactive API interface
Copyright (c) 2018. OpenText Corporation
All rights reserved.
Client Library Release 16.4.0070.0035
 
 
Connecting to Server using docbase dmtest
[DM_SESSION_E_AUTH_FAIL]error: "Authentication failed for user dmadmin with docbase dmtest."

Still not OK because the reinit is missing, but for this we need to connect, which we still cannot do because of the missing reinit. To break this catch-22 situation, let's cut the Gordian knot and kill the dmtest docbase's processes:

dmadmin@docker:~$ ps ajxf | grep dmtest
1 27843 27843 27843 ? -1 Ss 1001 0:00 ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27849 27843 27843 ? -1 S 1001 0:00 \_ /app/dctm/product/16.4/bin/mthdsvr master 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 27843 dmtest /app/dctm/dba/log
27849 27850 27843 27843 ? -1 Sl 1001 0:03 | \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 0 dmtest /app/dctm/dba/log
27849 27861 27843 27843 ? -1 Sl 1001 0:03 | \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 1 dmtest /app/dctm/dba/log
27849 27874 27843 27843 ? -1 Sl 1001 0:03 | \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 2 dmtest /app/dctm/dba/log
27849 27886 27843 27843 ? -1 Sl 1001 0:03 | \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 3 dmtest /app/dctm/dba/log
27849 27899 27843 27843 ? -1 Sl 1001 0:03 | \_ /app/dctm/product/16.4/bin/mthdsvr worker 0xe901fd2f, 0x7f8a50658000, 0x223000 50000 5 4 dmtest /app/dctm/dba/log
27843 27862 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27863 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27875 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27887 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27901 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini
27843 27944 27843 27843 ? -1 Sl 1001 0:06 \_ ./dm_agent_exec -docbase_name dmtest.dmtest -docbase_owner dmadmin -sleep_duration 0
27843 27962 27843 27843 ? -1 S 1001 0:00 \_ ./documentum -docbase_name dmtest -security acl -init_file /app/dctm/dba/config/dmtest/server.ini

and:

kill -9 -27843

After restarting the docbase, the connectivity was restored.
So, be cautious while experimenting ! Needless to say, avoid doing it in a production docbase or in any heavily used development docbase for that matter, or the wrath of the multiverses and beyond will fall upon you and you will be miserable for ever.
Connecting from a 7.3 client
The same behavior and error messages as with the preceding 5.3 client were observed with a more recent 7.3 client and, inferring from the incident above, later clients behave the same way.

Conclusion

We never stop learning stuff with Documentum! While this client version limiting feature looks quite exotic, it may make sense in order to avoid surprises or even corruptions when using newly implemented extensions or existing but changed areas of the content server. It is possible that new versions of the DfCs behave differently from older ones when dealing with the same functionalities and Documentum had no better choice but to cut the older versions off to prevent any conflict. As usual, the implementation looks a bit hasty, with inapt error messages costing hours of investigation and the risk of cutting oneself off from a repository.

This article An exotic feature in the content server: check_client_version appeared first on the dbi services Blog.

Connecting to a Repository via a Dynamically Edited dfc.properties File (part I)


Connecting to a Repository via a Dynamically Edited dfc.properties File

Now that we have containerized content servers, it is very easy, maybe too easy, to create new repositories. Their creation is still not any faster (whether they are containerized or not is irrelevant here) but, given a configuration file, it just takes one command to instantiate an image into a running container with working repositories in it. Thus, during experimentation and testing, out of laziness or in a hurry, one can quickly end up with several containers holding identically named repositories, e.g. dmtest01, with an identically named docbroker, e.g. docbroker01. Now, suppose one wants to connect to the docbase dmtest01 running on the 3rd such container using the familiar command-line tools idql/iapi/dmawk. How then to select that particular instance of dmtest01 among all the others?
To make the test case precise, let's say that we are using a custom bridge network to link the containers together on the docker host (appropriately named docker), which is a VirtualBox VM running an Ubuntu flavor. The metal also natively runs the same Ubuntu distro. It looks complicated but actually matches the common on-premises infrastructure type where the metal is an ESX or equivalent, its O/S is the hypervisor and the VMs run a RedHat or SUSE distro. As this is a local testing environment, no DNS or network customizations have been introduced save for the custom bridge.
We want to reach a remote repository either from container to container or from container to host or from host to container.
The problem here stems from the lack of flexibility in the docbroker/dfc.properties file mechanism and no network fiddling can work around this.

It’s All in The dfc.properties File

Containers have distinct host names, so it suffices to edit their local dfc.properties file and change this field only. Their files may all look like the one below:

dfc.docbroker.host[0]=container01
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=docker
dfc.docbroker.port[1]=1489
dfc.docbroker.host[2]=container011
dfc.docbroker.port[2]=1489
dfc.docbroker.host[3]=container02
dfc.docbroker.port[3]=1489

In effect, the custom bridge network embeds a DNS for all the attached containers, so their host names are known to each other (but not to the host, so IP addresses must be used from there, or the host's /etc/hosts file must be edited). The docbroker ports are the ones inside the containers and all have the same value, 1489, because the containers were created from the same configuration files. The docker entry has been added to the containers' /etc/hosts file via the --add-host= clause of the docker run command.
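For completeness, this is the kind of docker run command assumed throughout this setup; the network name, the IP addresses, the published port and the image name are illustrative placeholders only:

docker run -d --name container01 --hostname container01 \
       --network dctm_bridge --ip 192.168.33.101 \
       --add-host "docker:192.168.33.1" \
       -p 2489:1489 \
       dctm_cs_image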
For the containers’ host machine, where a Documentum repository has been installed too, the dfc.properties file could look like this one:

dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=docker
dfc.docbroker.port[1]=2489
dfc.docbroker.host[2]=docker
dfc.docbroker.port[2]=3489
dfc.docbroker.host[3]=docker
dfc.docbroker.port[3]=5489

Here, the host name is the one of the VM where the containers sit and is the same for all the containers. The port numbers differ because they are the containers' external ports, which are published on the host VM and mapped to the respective docbroker's internal port, 1489. Since the containers share the same custom network, their host names, IP addresses and external ports must all be different when running the image, or docker won't allow it.
Alternatively, the container’s IP addresses and internal docbroker’s ports could be used directly too if one is too lazy to declare the containers’ host names in the host’s /etc/hosts file, which is generally the case when testing:

dfc.docbroker.host[0]=docker 
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=192.168.33.101
dfc.docbroker.port[1]=1489
dfc.docbroker.host[2]=192.168.33.102
dfc.docbroker.port[2]=1489
dfc.docbroker.host[3]=192.168.33.104
dfc.docbroker.port[3]=1489

The host’s custom network will take care of routing the traffic into the respective containers.
Can you spot the problem now? As all the containers contain identically named repositories (for clarity, let's say that we are looking for the docbase dmtest01), the first docbroker contacted from that file will always reply successfully because there is indeed a dmtest01 docbase in that container, and consequently one will always be directed to the docbase container01.dmtest01. If one wants to contact container03.dmtest01, this configuration won't allow it. One would need to edit it and move the target container03 host to the first position, which is OK until one wants to access container02.dmtest01 or go back to container01.dmtest01.
This situation has existed forever but containers make it more obvious because they make it so much easier to have repository homonyms.
So, is there a simpler way to work around this limitation than editing a configuration file back and forth or giving different names to the containerized repositories?

A Few Reminders

Documentum has made quite a lot of design decisions inspired by the Oracle DBMS but their implementation is far from offering the same level of flexibility and power, and this is often irritating. Let’s consider the connectivity for example. Simply speaking, Oracle’s SQL*Net configuration relies mainly on a tnsnames.ora file for the connectivity (it can also use a centralized ldap server but let’s keep it simple). This file contains entries used to contact listeners and get the information needed to connect to the related database. Minimal data to provide in the entries are the listener’s hostname and port, and the database sid or service name, e.g.:

...
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = db_service)
    )
  )
...

A connection to the database db_service can simply be requested as follows:

sqlplus scott@orcl

orcl is the SQL*Net alias for the database served by db_service. It works like an index in a lookup table, the tnsnames.ora file.
Compare this with a typical dfc.properties file, e.g. /home/dmadmin/documentum/shared/config/dfc.properties:

...
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=dmtest
dfc.docbroker.port[1]=1489
...

Similarly, instead of contacting listeners, we have here docbrokers. A connection to the docbase dmtest can be requested as follows:

idql dmtest

dmtest is the target repository. It is not a lookup key in the dfc.properties file. Unlike the tnsnames.ora file and its aliases, there is an indirection here and the dfc.properties file does not directly tell where to find a certain repository; it just lists the docbrokers to be sequentially queried about it until the first one that knows the repository (or a homonym thereof) answers. If the returned target docbase is the wrong homonym, tough luck, it will not be reachable, unless the order of the entries is changed. Repositories announce themselves to the docbrokers by “projecting” themselves. If two repositories with the same name project to the same docbroker, no error is raised but the docbroker can return unexpected results, e.g. one may end up in the unintended docbase.
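Incidentally, what a given docbroker knows about can be verified directly with the dmqdocbroker utility shipped with the content server, e.g. (host and port taken from the examples above), which lists the docbases that currently project to it:

dmqdocbroker -t docker -p 1489 -c getdocbasemap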
Another major difference is that with Oracle but not with Documentum, it is possible to bypass the tnsnames.ora file by specifying the connection data in-line, e.g. on the command-line:

sqlplus scott@'(DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = db_service)
    )
  )'

This can be very useful when editing the local, official tnsnames.ora file is not allowed, and sometimes faster than setting $TNS_ADMIN to an accessible local directory and editing a private tnsnames.ora file there.
This annoyance is even more frustrating because Documentum’s command-line tools do support a similar syntax but for a different purpose:

idql repository[.service][@machine] [other parameters]

While this syntax is logically useful to access the service (akin to an Oracle’s instance but for a HA Documentum installation), it is used in a distributed repository environment to contact a particular node’s docbroker; however, it still does not work if that docbroker is not first declared in the local dfc.properties file.
Last but not least, one more reason to be frustrated is that the DfCs do allow choosing a specific docbroker when opening a session, as illustrated by the jython snippet below:

import traceback
import com.documentum.fc.client as DFCClient
import com.documentum.fc.common as DFCCommon

docbroker_host = "docker"
docbroker_port = "1489"
docbase = "dmtest"
username = "dmadmin"
password = "dmadmin"
print("attempting to connect to " + docbase + " as " + username + "/" + password + " via docbroker on host " + docbroker_host + ":" + docbroker_port)
try:
  client = DFCClient.DfClient.getLocalClient()

  config = client.getClientConfig()
  config.setString ("primary_host", docbroker_host)
  config.setString ("primary_port", docbroker_port)

  logInfo = DFCCommon.DfLoginInfo()
  logInfo.setUser(username)
  logInfo.setPassword(password)
  docbase_session = client.newSession(docbase, logInfo)

  if docbase_session is not None:
    print("Connected !")
  else:
    print("Couldn't connect !")
except Exception:
  traceback.print_exc()

Content of dfc.properties:

$ cat documentum/shared/config/dfc.properties
dfc.date_format=dd.MM.yyyy HH:mm:ss

Execution:

$ jython ./test.jy
...
attempting to connect to dmtest as dmadmin/dmadmin via docbroker docker
Connected !

Despite a dfc.properties file devoid of any docbroker definition, the connection was successful. Unfortunately, this convenience has not been carried over to the vegetative command-line tools.
While we can dream and hope for those tools to be resurrected and a backport miracle to happen (are you listening, OTX?), the next best thing is to tackle this shortcoming ourselves and implement as unobtrusive a solution as possible. Let's see how.

A few Proposals

Currently, one has to manually edit the local dfc.properties file, but this is tedious to say the least, because changes must sometimes be done twice, forwards and then rolled back if the change is only temporary. To avoid this, we could add at once in our local dfc.properties file all the machines that host repositories of interest, but this file could quickly grow large and it won't solve the case of repository homonyms. The situation would become quite unmanageable, although an environment variable such as the late DMCL_CONFIG (appropriately revamped, e.g. to DFC_PROPERTIES_CONFIG, holding the full path name of the dfc.properties file to use) could help to organize those entries. But there is no such variable any longer for the command-line tools (those tools have stopped evolving since CS v6.x), although there is a property for the DfCs to pass to the JVM at startup, -Ddfc.properties.file, the #include clause in the dfc.properties file, or playing with the $CLASSPATH. There is a better way, though.
What about an on-the-fly, transparent, behind-the-scenes editing of the dfc.properties file to support a connection syntax similar to Oracle's in-line one?
Proposal 1
Let’s specify the address of the docbroker of interest directly on the command-line, as follows:

$ idql dmtest01@container03:3489
or
$ idql dmtest01@192.168.33.104:3489

This is more akin to Oracle in-line connection syntax above.
Proposal 2
An alternative could be to use an Oracle’s tnsnames.ora-like configuration file such as the one below (and (in (keeping (with (the (lisp spirit)))))):

dmtest01 = ((docbroker.host = container01) (docbroker.port = 1489))
dmtest02 = ((docbroker.host = container02) (docbroker.port = 1489))
dmtest03 = ((docbroker.host = container03) (docbroker.port = 1489))

and to use it thusly:

$ idql dmtest01@dmtest03

dmtest03 is looked up in the configuration file and replaced on the command-line by its definition.
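As an illustration only, the lookup itself could be a small gawk sketch like the one below (the file name and the alias are assumptions, this is not part of the final implementation):

# resolve the alias dmtest03 to host:port from the tnsnames.ora-like file;
gawk -v alias=dmtest03 '$0 ~ "^" alias " *=" {
   match($0, /docbroker\.host *= *[^)]+/); split(substr($0, RSTART, RLENGTH), h, / *= */)
   match($0, /docbroker\.port *= *[^)]+/); split(substr($0, RSTART, RLENGTH), p, / *= */)
   print h[2] ":" p[2]
}' ~/repository_connections.ora

It would print container03:1489, ready to be substituted on the command-line.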
Proposal 3
With a more concise configuration file that can also be sourced:

dmtest01=container01:1489
dmtest02=container02:1489
dmtest03=container03:1489

and used as follows:

$ export REPO_ALIAS=~/repository_connections.aliases
$ . $REPO_ALIAS
$ ./widql dmtest01@$dmtest03

$dmtest03 is directly fetched from the environment after the configuration file has been sourced, which is equivalent to a lookup. Since the variable substitution occurs at the shell level, it comes free of charge.
With a bit more generalization, it is possible to merge the three proposals together:

$ idql repository(@host_literal:port_number) | @$target

In other words, one can either provide literally the full connection information or provide a variable which will be resolved by the shell from a configuration file to be sourced preliminarily.
Let’s push the configuration file a bit farther and define complete aliases up to the repository name like this:

dmtest=dmtest@docker:1489
or even so:
dmtest=dmtest:docker:1489

Usage:

$ ./widql $dmtest

The shell will expand the alias with its definition. The good thing is that the definition styles can be mixed and matched to suit one's fancy. Example of a configuration file:

# must be sourced prior so the environment variables can be resolved;
# this is a enhancement over the dfc.properties file syntax used by the dctm_wrapper utility:
# docbroker.host[i]=...
# docbroker.port[i]=...
# it supports several syntaxes:
# docbroker only definition docbroker_host:port;
#    usage: ./widql dmtest@$dmtest
# full definition docbase[@[docbroker_host]:[port]]
#    usage: ./widql $test
# alternate ':' separator docbase:[docbroker_host]:[port];
#    usage: ./widql $dmtestVM
# alias literal;
#    usage: ./widql test
# in order to resolve alias literals, the wrapper will source the configuration file by itself;

# docker.dmtest;
# docbroker only definition;
d_dmtest=docker:1489
# full definition;
f_dmtest=dmtest@docker:1489
# alternate ':' separator;
a_dmtest=dmtest:docker:1489

# container01.dmtest01;
# docbroker only definition;
d_dmtest01=container01:2489
dip_dmtest01=192.168.33.101:1489
# full definition;
f_dmtest01=dmtest01@container01:2489
fip_dmtest01c=dmtest01@192.168.33.101:1489
# alternate ':' separator;
a_dmtest01=dmtest01:container01:2489
aip_dmtest01=dmtest01:192.168.33.101:2489

# container011.dmtest01;
# docbroker only definition;
d_dmtest011=container011:5489
dip_dmtest011=192.168.33.104:1489
# full definition;
f_dmtest011=dmtest01@container011:2489
fip_dmtest011=dmtest01@192.168.33.104:1489
# alternate ':' separator;
a_dmtest011=dmtest01:container011:2489
aip_dmtest011=dmtest01:192.168.33.104:2489

Lines 5 to 14 explains all the supported target syntaxes with a new one presented on lines 12 to 14, which will be explained later in the paragraph entitled Possible Enhancements.
Using lookup variables in a configuration file makes things easier when the host names are hard to remember, because better mnemonic aliases can be defined for them. Also, as they are looked up, the entries can be in any order. They must obviously be unique or they will mask each other. A consistent naming convention may be required to easily find one's own way around this file.
Whenever the enhanced syntax is used, it triggers an automatic editing of the dfc.properties file and the specified connection information is inserted as dfc.docbroker.host and dfc.docbroker.port entries. Then, the corresponding Documentum tool gets invoked and finally the original dfc.properties file is restored when the tool exits. The trigger here is the presence of the @ or : characters in the first command-line parameter.
This would also cover the case when an entry is simply missing from the dfc.properties file. Actually, from the point of view of the command-line tools, all the connection definitions could be handled over to the new configuration file and even removed from dfc.properties as they are dynamically added to and deleted from the latter file as needed.
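To make this concrete, here is the kind of rewrite the wrapper performs for, say, ./widql dmtest01@container03:3489 without --append (the entries below are just examples):

# dfc.properties before the call:
dfc.docbroker.host[0]=container01
dfc.docbroker.port[0]=1489
# dfc.properties while the tool runs (the original file is saved aside and restored on exit, unless --keep is given):
dfc.docbroker.host[0]=container03
dfc.docbroker.port[0]=3489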

The Implementation

The above proposal looks pretty easy and fun to implement, so let’s give it a shot. In this article, I’ll present a little script, dctm_wrapper, that builds upon the above @syntax to first edit the configuration file on demand (that’s the dynamic part of the article’s title) and then invoke the standard idql, iapi or dmawk utilities, with an optional rollback of the change on exiting.
Since it is not possible to bypass the dfc.properties file, we will dynamically modify it whenever the @host syntax is used from a command-line tool. As we do not want to replace the official idql, iapi and dmawk tools, yet, we will create new ones, say widql, wiapi and wdmawk (where w stands for wrapper). Those will be symlinks to the real script, dctm-wrapper.sh, which will be able to invoke either idql, iapi or dmawk according to how it was called (bash's $0 contains the name of the symlink that was invoked, even though its target is always dctm-wrapper.sh, see the script's source in the next paragraph).
The script dctm-wrapper.sh will support the following syntax:

$ ./widql docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
$ ./wiapi docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
$ ./wdmawk [-v] docbase[@[host][:port]] [dmawk parameters] [--verbose] [--append] [--keep]

The custom parameters --verbose, --append and --keep are processed by the script and stripped off before invoking the official tools.
wdmawk is a bit special in that the native tool, dmawk, is invoked differently from iapi/idql but I felt that it too could benefit from this little hack. Therefore, in addition to the non-interactive editing of the dfc.properties file, wdmawk also passes on the target docbase name as a -v docbase=… command-line parameter (the standard way to pass parameters in awk) and removes the extended target parameter docbase[@[host][:port]] unless it is prefixed by the -v option in which case it gets forwarded through the -v repo_target= parameter. The dmawk program is then free to use them the way it likes. The repo_target parameter could have been specified on the command-line independently but the -v option can still be useful in cases such as the one below:

$ ./wdmawk docbase@docker:1489 -v repo_target=docbase@docker:1489 '{....}'

which can be shortened to

$ ./wdmawk -v docbase@docker:1489 '{....}'

If the extended target docbase parameter is present, it must be the first one.
If the ‘@’ or ‘:’ characters are missing, it means the enhanced syntax is not used and the script will not attempt to modify dfc.properties; it will pass on all the remaining parameters to the matching official tools.
When @[host][:port] is present, the dfc.properties file will be edited to accommodate the new docbroker's parameters; all the existing dfc.docbroker.host/dfc.docbroker.port couples will either be removed (if --append is missing) or preserved (if --append is present) and a new couple will be appended with the given values. Obviously, if one wants to avoid the homonym trap, --append should not be used, in order to let the given docbroker be picked up as the sole entry in the property file.
When --append and --keep are present, we end up with a convenient way to add docbroker entries into the property file without manually editing it.
As the host is optional, it can be omitted and the one from the first dfc.docbroker.host[] entry will be used instead. Ditto for the port.
Normally, upon returning from the invocation of the original tools, the former dfc.properties file is restored to its original content. However, if --keep is mentioned, the rollback will not be performed and the modified file will replace the original file. The latter will still be there though, but renamed to $DOCUMENTUM_SHARED/config/dfc.properties_saved_YY-MM-DD_HH:MI:SS, so it will still be possible to manually roll back. --keep is mostly useful in conjunction with --append so that new docbrokers get permanently added to the configuration file.
Finally, when --verbose is specified, the changes to the dfc.properties file will be sent to stdout; a diff of both the original and the new configuration file will also be shown, along with the final command-line used to invoke the selected original tool. This helps troubleshooting possible command-line parsing issues because, as can be seen from the code, no extra effort has been put into this section.

The Code

The script below shows a possible implementation:

#!/bin/bash
# Installation:
# it should not be called directly but through one of the aliases below for the standard tools instead:
# ln -s dctm-wrapper wiapi
# ln -s dctm-wrapper widql
# ln -s dctm-wrapper wdmawk
# where the initial w stands for wrapper;
# and then:
#    ./widql ...
# $DOCUMENTUM_SHARED must obviously exist;
# Since there is no \$DOCUMENTUM_SHARED in eCS ≥ 16.4, set it to $DOCUMENTUM as follows:
#    export DOCUMENTUM_SHARED=$DOCUMENTUM
# See Usage() for details;

Usage() {
   cat - <<EoU
./widql docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
./wiapi docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
./wdmawk [-v] docbase[@[host][:port]] [dmawk -v parameters] [--verbose] [--append] [--keep]
E.g.:
   wiapi dmtest
or:
   widql dmtest@remote_host
or:
   widql dmtest@remote_host:1491 -Udmadmin -Pxxxx
or:
   wiapi dmtest@:1491 --append
or:
   wdmawk -v dmtest01@docker:5489 -f ./twdmawk.awk -v ...
or:
   wdmawk dmtest01@docker:2489 -f ./twdmawk.awk -v ...
or:
   wiapi dmtest@remote_host:1491 --append --keep
etc...
If --verbose is present, the changes applied to \$DOCUMENTUM[_SHARED]/config/dfc.properties are displayed.
If --append is present, a new couple of entries, dfc.docbroker.host and dfc.docbroker.port, is appended to the dfc.properties file and the existing entries are preserved so they are still usable;
If --append is not present, all the existing entries are removed prior to inserting the new one;
If --keep is present, the changed dfc.properties file is not reverted to the original one, i.e. the changes are made permanent;
If a change of configuration has been requested, the original config file is first saved with a timestamp appended and restored on return from the standard tools, unless --keep is present, in which case the modified file stays in place;
the backup file is kept in any case so it is still possible to manually revert to the original configuration;
wdmawk invokes dmawk passing it the -v docbase=$docbase command-line parameter;
In addition, if -v docbase[@[host][:port]] is used, -v repo_target=docbase[@[host][:port]] is also passed to dmawk;
Instead of an in-line target definition, environment variables can also be used, e.g.:
   widql dmtest@$dmtestVM ...
where $dmtestVM resolves to e.g. docker:1489
or even:
   widql $test01c ...
where $test01c resolves to e.g. dmtest01@container01:1489
As the environment variable is resolved by the shell before it invokes the program, make sure it has a definition, e.g. source a configuration file;
EoU
   exit 0
}

if [[ $# -eq 0 ]]; then
   Usage
fi

# save command;
current_cmd="$0 $*"

# which original program shall possibly be called ?
dctm_program=$(basename $0); dctm_program=${dctm_program:1}
if [[ $dctm_program == "dmawk" ]]; then
   bFordmawk=1 
else
   bFordmawk=0 
fi

# look for the --verbose, --append or --keep options;
# remove them from the command-line if found so they are not passed to the standard Documentum's tools;
# the goal is to clean up the command-line from the enhancements options so it can be passed to the official tools;
bVerbose=0
bAppend=0
bKeep=0
posTarget=1
passTarget2awk=0
while true; do
   index=-1
   bChanged=0
   for i in "$@"; do
      (( index += 1 ))
      if [[ "$i" == "--verbose" ]]; then
         bVerbose=1
         bChanged=1
         break
      elif [[ "$i" == "--append" ]]; then
         bAppend=1
         bChanged=1
         break
      elif [[ "$i" == "--keep" ]]; then
         bKeep=1
         bChanged=1
         break
      elif [[ "$i" == "-v" && $bFordmawk -eq 1 && $index -eq 0 ]]; then
	 passTarget2awk=1
         bChanged=1
         break
      fi
   done
   if [[ $bChanged -eq 1 ]]; then
      set -- ${@:1:index} ${@:index+2:$#-index-1}
   else
      break
   fi
done

[[ bVerbose -eq 1 ]] && echo "current_cmd=[$current_cmd]"

target=$1
remote_info=$(echo $1 | gawk '{
   docbase = ""; hostname = ""; port = ""
   if (match($0, /@[^ \t:]*/)) {
      docbase = substr($0, 1, RSTART - 1)
      hostname = substr($0, RSTART + 1, RLENGTH - 1)
      rest = substr($0, RSTART + RLENGTH)
      if (1 == match(rest, /:[0-9]+/))
         port = substr(rest, 2, RLENGTH - 1)
   }
   else docbase = $0
}
END {
   printf("%s:%s:%s", docbase, hostname, port)
}')
docbase=$(echo $remote_info | cut -d: -f1)
hostname=$(echo $remote_info | cut -d: -f2)
port=$(echo $remote_info | cut -d: -f3)

# any modifications to the config file requested ?
if [[ ! -z $hostname || ! -z $port ]]; then
   # the dfc.properties file must be changed for the new target repository;
   dfc_config=$DOCUMENTUM_SHARED/config/dfc.properties
   if [[ ! -f $dfc_config ]]; then
      echo "$dfc_config not found"
      echo "check the \$DOCUMENTUM_SHARED environment variable"
      echo " in ≥ 16.4, set it to \$DOCUMENTUM"
      exit 1
   fi
   
   # save the current config file;
   backup_file=${dfc_config}_saved_$(date +"%Y-%m-%d_%H:%M:%S")
   cp $dfc_config ${backup_file}

   [[ $bVerbose -eq 1 ]] && echo "changing to $hostname:$port..."
   pid=$$; gawk -v hostname="$hostname" -v port="$port" -v bAppend=$bAppend -v bVerbose=$bVerbose -v bKeep=$bKeep -v pid=$$ 'BEGIN {
      bFirst_hostname = 0; first_hostname = ""
      bFirst_port     = 0 ;    first_port = ""
      max_index = -1
   }
   {
      if (match($0, /^dfc.docbroker.host\[[0-9]+\]=/)) {
         if (!hostname && !bFirst_hostname) {
            # save the first host name to be used if command-line hostname was omitted;
            bFirst_hostname = 1
            first_hostname = substr($0, RLENGTH +1)
         }
         match($0, /\[[0-9]+\]/); index_number = substr($0, RSTART + 1, RLENGTH - 2)
         if (bAppend) {
            # leave the entry;
            print $0
            if (index_number > max_index)
               max_index = index_number
         }
         else {
            # do not print it, which will remove the entry;
            if (bVerbose)
               print "# removed:", $0 > ("/tmp/tmp_" pid)
         }
      }
      else if (match($0, /^dfc.docbroker.port\[[0-9]+\]=/)) {
         if (!port && !bFirst_port) {
            # save the first port to be used if command-line port was omitted;
            bFirst_port = 1
            first_port = substr($0, RLENGTH +1)
         }
         if (bAppend)
            # leave the entry;
            print $0
         else {
            # do nothing, which will remove the entry;
            if (bVerbose)
               print "# removed:", $0 > ("/tmp/tmp_" pid)
         }
      }
      else print
   }
   END {
      if (!hostname)
         hostname = first_hostname
      if (!port)
         port = first_port
      if (bAppend)
         index_number = max_index + 1
      else
         index_number = 0
      print "dfc.docbroker.host[" index_number "]=" hostname
      print "dfc.docbroker.port[" index_number "]=" port
      if (bVerbose) {
         print "# added: dfc.docbroker.host[" index_number "]=" hostname > ("/tmp/tmp_" pid)
         print "# added: dfc.docbroker.port[" index_number "]=" port     > ("/tmp/tmp_" pid)
      }
      close("/tmp/tmp_" pid)
   }' $dfc_config > ${dfc_config}_new

   if [[ $bVerbose -eq 1 ]]; then
      echo "requested changes:"
      cat /tmp/tmp_$$
      rm /tmp/tmp_$$
      echo "diffs:"
      diff $dfc_config ${dfc_config}_new
   fi 

   mv ${dfc_config}_new $dfc_config
   shift

   if [[ $bFordmawk -eq 1 ]]; then
      docbase="-v docbase=$docbase"
      [[ $passTarget2awk -eq 1 ]] && docbase="-v repo_target=$target $docbase"
   fi
   [[ $bVerbose -eq 1 ]] && echo "calling original: $DM_HOME/bin/${dctm_program} $docbase $*"
   $DM_HOME/bin/${dctm_program} $docbase $*

   # restore original config file;
   [[ $bKeep -eq 0 ]] && mv ${backup_file} $dfc_config
else
   if [[ $bVerbose -eq 1 ]]; then
      echo "no change to current $dfc_config file"
      echo "calling original: $DM_HOME/bin/${dctm_program} $*"
   fi
   $DM_HOME/bin/${dctm_program} $*
fi

The original configuration file is always saved on entry by appending a timestamp precise to the second, which, unless you're the Flash running the command twice in the background with the option --keep but without --append, should be enough to preserve the original content.
To keep the command-line parsing simple, the script relies on the finally invoked command to catch any syntax error. Feel free to modify it and make it more robust if you need that. As said earlier, the --verbose option can help troubleshoot unexpected results here.
See part II of this article for the tests.

Cet article Connecting to a Repository via a Dynamically Edited dfc.properties File (part I) est apparu en premier sur Blog dbi services.

Connecting to a Repository via a Dynamically Edited dfc.properties File (part II)


This is part II of the two-part article; see part I for the wrapper's description and its code.

Testing

We will test on the host machine named docker, which hosts 2 containers, container01 and container011. All 3 machines run a repository: dmtest on docker (in short, dmtest@docker:1489), dmtest01 on container01 (dmtest01@container01:1489 from inside, dmtest01@docker:2489 externally) and dmtest01 on container011 (dmtest01@container011:1489 from inside, dmtest01@docker:5489 externally). Incidentally, the enhanced syntax is also a good way to uniquely identify the repositories.
The current dfc.properties file on the host docker:

$ grep docbroker /app/dctm/config/dfc.properties
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=1489

This is used for the local docbase dmtest.
Let’s tag all the docbases for an easy identification later:

$ iapi dmtest -Udmadmin -Pdmadmin <<eoq
retrieve,c,dm_docbase_config
set,c,l,title
dmtest on docker host VM
save,c,l
eoq

Idem from within container01 with its default dfc.properties file:

$ iapi dmtest01 -Udmadmin -Pdmadmin <<eoq
retrieve,c,dm_docbase_config
set,c,l,title
dmtest01 created silently on container01
save,c,l
eoq

Idem from within container011:

$ iapi dmtest01 -Udmadmin -Pdmadmin <<eoq
retrieve,c,dm_docbase_config
set,c,l,title
dmtest01 created silently on container011
save,c,l
eoq

First, let's access container01.dmtest01 from the containers' host VM with the current dfc.properties file:

$ idql dmtest01 -Udmadmin -Pdmadmin
 
 
OpenText Documentum idql - Interactive document query interface
Copyright (c) 2018. OpenText Corporation
All rights reserved.
Client Library Release 16.4.0070.0035
 
 
Connecting to Server using docbase dmtest01
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:1489) does not know of a server for the specified docbase (dmtest01)"

As expected, it does not work because container01.dmtest01 does not project to the host’s docbroker. Now, let’s turn to widql:

$ ./widql dmtest01@docker:2489 -Udmadmin -Pdmadmin --keep <<eoq
select title from dm_docbase_config
go
eoq
OpenText Documentum idql - Interactive document query interface
Copyright (c) 2018. OpenText Corporation
All rights reserved.
Client Library Release 16.4.0070.0035
 
 
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c350800011bb started for user dmadmin."
 
 
Connected to OpenText Documentum Server running Release 16.4.0000.0248 Linux64.Oracle
title
------------------------------------------
dmtest01 created silently on container01

It works.
We used --keep, therefore the dfc.properties file has changed:

$ grep docbroker /app/dctm/config/dfc.properties
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=2489

Indeed.
That docbase can also be reached by the container’s IP address and internal port 1489:

$ docker exec -it container01 ifconfig eth0 | head -3
eth0: flags=4163 mtu 1500
inet 192.168.33.101 netmask 255.255.255.0 broadcast 192.168.33.255
ether 02:42:c0:a8:21:65 txqueuelen 0 (Ethernet)
 
$ ./widql dmtest01@192.168.33.101:1489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c350800011b5 started for user dmadmin."
...
title
------------------------------------------
dmtest01 created silently on container01

Is the local dmtest docbase still reachable ?:

$ idql dmtest -Udmadmin -Pdmadmin
...
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:2489) does not know of a server for the specified docbase (dmtest)"

Not with that changed dfc.properties file and the standard tools. But by using our nifty little tool:

$ ./widql dmtest@docker:1489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connected to OpenText Documentum Server running Release 16.4.0080.0129 Linux64.Oracle
title
----------------------
dmtest on host VM

Fine !
Is container011.dmtest01 reachable now ?

$ ./widql dmtest01 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
...
Connected to OpenText Documentum Server running Release 16.4.0000.0248 Linux64.Oracle
title
-------------------------------------------
dmtest01 created silently on container01

This is container01.dmtest01, not the one we want, i.e. the one on container011.
Note that ./widql was called without the extended syntax so it invoked the standard idql directly.
Let's try again:

$ ./widql dmtest01@docker:5489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c3508000059e started for user dmadmin."
...
title
------------------------------------------
dmtest01 created silently on container011

Here we go, it works !
The same using the container’s IP address and its docbroker’s internal port:

$ docker exec -it container011 ifconfig eth0 | head -3
eth0: flags=4163 mtu 1500
inet 192.168.33.104 netmask 255.255.255.0 broadcast 192.168.33.255
ether 02:42:c0:a8:21:68 txqueuelen 0 (Ethernet)
 
$ ./widql dmtest01@192.168.33.104:5489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c35080000598 started for user dmadmin."
...
title
------------------------------------------
dmtest01 created silently on container011

Now let's try the same connection but with --append and --keep:

$ ./widql dmtest01@docker:5489 -Udmadmin -Pdmadmin --append --keep <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
...
Connected to OpenText Documentum Server running Release 16.4.0000.0248 Linux64.Oracle
title
-------------------------------------------
dmtest01 created silently on container011

What is the content of dfc.properties now ?

$ grep docbroker /app/dctm/config/dfc.properties
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=2489
dfc.docbroker.host[1]=docker
dfc.docbroker.port[1]=5489

Both options have been taken into account as expected.
Let’s try to reach the VM host’s repository:

$ ./widql dmtest -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:2489) does not know of a server for the specified docbase (dmtest)"

Specify the docbroker's host and the --verbose option:

$ ./widql dmtest@docker -Udmadmin -Pdmadmin --verbose <<eoq
select title from dm_docbase_config
go
eoq
 
changing to docker:...
requested changes:
# removed: dfc.docbroker.host[0]=docker
# removed: dfc.docbroker.port[0]=2489
# removed: dfc.docbroker.host[1]=docker
# removed: dfc.docbroker.port[1]=5489
# added: dfc.docbroker.host[0]=docker
# added: dfc.docbroker.port[0]=2489
diffs:
12,13d11
< dfc.docbroker.host[1]=docker
< dfc.docbroker.port[1]=5489
calling original: /app/dctm/product/16.4/bin/idql dmtest -Udmadmin -Pdmadmin
...
Connecting to Server using docbase dmtest
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:2489) does not know of a server for the specified docbase (dmtest)"

Since the port was not specified, the wrapper took the first port found in the dfc.properties file to supply the missing value, i.e. 2489, which is incorrect as dmtest only projects to the docbroker on docker:1489.
Use an unambiguous command now:

$ ./widql dmtest@docker:1489 -Udmadmin -Pdmadmin --verbose <<eoq
select title from dm_docbase_config
go
eoq
 
changing to docker:1489...
requested changes:
# removed: dfc.docbroker.host[0]=docker
# removed: dfc.docbroker.port[0]=2489
# removed: dfc.docbroker.host[1]=docker
# removed: dfc.docbroker.port[1]=5489
# added: dfc.docbroker.host[0]=docker
# added: dfc.docbroker.port[0]=1489
diffs:
11,13c11
< dfc.docbroker.port[0]=2489
< dfc.docbroker.host[1]=docker
< dfc.docbroker.port[1]=5489
---
> dfc.docbroker.port[0]=1489
calling original: /app/dctm/product/16.4/bin/idql dmtest -Udmadmin -Pdmadmin
...
Connecting to Server using docbase dmtest
...
Connected to OpenText Documentum Server running Release 16.4.0080.0129 Linux64.Oracle
title
--------------------
dmtest on host VM

Looks OK.
Let’s try wdmawk now. But first, here is the test code twdmawk.awk:

$ cat twdmawk.awk 
BEGIN {
   print "repo_target=" repo_target, "docbase=" docbase
   session = dmAPIGet("connect," docbase ",dmadmin,dmadmin")
   print dmAPIGet("getmessage," session)
   dmAPIGet("retrieve," session ",dm_docbase_config")
   print dmAPIGet("get," session ",l,title")
   dmAPIExec("disconnect," session)
   exit(0)
}

The first print statement displays the two variables automatically passed to dmawk by the wrapper, repo_target and docbase.
The test script connects to the docbase whose name was silently passed as a command-line parameter by wdmawk through the -v option, after it extracted it from the given target parameter docbase[@[host][:port]], as illustrated below with the --verbose option.
Let’s see the invocation for the repository on the host VM:

$ ./wdmawk dmtest@docker:1489 -f ./twdmawk.awk --verbose
changing to docker:1489...
requested changes:
# removed: dfc.docbroker.host[0]=docker
# removed: dfc.docbroker.port[0]=2489
# removed: dfc.docbroker.host[1]=docker
# removed: dfc.docbroker.port[1]=5489
# added: dfc.docbroker.host[0]=docker
# added: dfc.docbroker.port[0]=1489
diffs:
11,13c11
< dfc.docbroker.port[0]=2489
< dfc.docbroker.host[1]=docker
< dfc.docbroker.port[1]=5489
---
> dfc.docbroker.port[0]=1489
calling original: /app/dctm/product/16.4/bin/dmawk -v docbase=dmtest -f ./twdmawk.awk
repo_target= docbase=dmtest
[DM_SESSION_I_SESSION_START]info: "Session 0100c3508000367b started for user dmadmin."
 
 
dmtest on host VM

Let's access container01's repository:

$ ./wdmawk dmtest01@docker:2489 -f ./twdmawk.awk
 
[DM_SESSION_I_SESSION_START]info: "Session 0100c35080001202 started for user dmadmin."
 
 
dmtest01 created silently on container01

A small typo in the port number and …

dmadmin@docker:~$ ./wdmawk dmtest01@docker:3489 -f ./twdmawk.awk
 
[DFC_DOCBROKER_REQUEST_FAILED] Request to Docbroker "docker:3489" failed
 
[DM_SESSION_E_RPC_ERROR]error: "Server communication failure"
 
java.net.ConnectException: Connection refused (Connection refused)

Note the stupid error message "… Connection refused …", very misleading when investigating a problem: it is just that there is nobody listening on that port.
Let’s access the container011’s repository:

dmadmin@docker:~$ ./wdmawk dmtest01@docker:5489 -f ./twdmawk.awk
 
[DM_SESSION_I_SESSION_START]info: "Session 0100c350800005ef started for user dmadmin."
 
 
dmtest01 created silently on container011

Effect of the -v option:

dmadmin@docker:~$ ./wdmawk -v dmtest01@docker:5489 -f ./twdmawk.awk --verbose
...
calling original: /app/dctm/product/16.4/bin/dmawk -v repo_target=dmtest@docker:1489 -v docbase=dmtest -f ./twdmawk.awk
repo_target=dmtest@docker:1489 docbase=dmtest
[DM_SESSION_I_SESSION_START]info: "Session 0100c35080003684 started for user dmadmin."
 
 
dmtest on host VM

A repo_target parameter with the extended syntax has been passed to dmawk.
Let’s now quickly check the wrapper from within the containers.
Container01
The host’s docbase:

[dmadmin@container01 scripts]$ ./wiapi dmtest@docker:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
Connecting to Server using docbase dmtest
...
dmtest on host VM

The container011’s docbase:

[dmadmin@container01 scripts]$ ./wiapi dmtest01@container011:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
Connecting to Server using docbase dmtest01
...
dmtest01 created silently on container011
...

Container011
The host’s docbase:

dmadmin@container011 scripts]$ ./wiapi dmtest@docker:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
Connecting to Server using docbase dmtest
...
Connected to OpenText Documentum Server running Release 16.4.0080.0129 Linux64.Oracle
...
dmtest on host VM
...

The docbase on container01:

dmadmin@container011 scripts]$ ./wiapi dmtest01@container01:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
...
Connecting to Server using docbase dmtest01
...
dmtest01 created silently on container01
...

Let’s briefly test the usage of the sourced configuration file. Here is a snippet of the file shown earlier in this article:

# repository connection configuration file;
# must be sourced prior so the environment variables can be resolved;
# this is an enhancement over the dfc.properties file syntax used by the dctm_wrapper utility:
# docbroker.host[i]=...
# docbroker.port[i]=...
# it supports several syntaxes:
# docbroker only definition [[docbroker_host]:[port]];
#    usage: ./widql dmtest@$dmtest
# full definition docbase[@[docbroker_host]:[port]]
#    usage: ./widql $test
# alternate ':' separator docbase:[[docbroker_host]:[docbroker_port]];
#    usage: ./widql $dmtestVM
# alias literal;
#    usage: ./widql test
# in order to resolve alias literals, the wrapper will source the configuration file by itself;
...
# container011.dmtest01;
# docbroker only definition docbroker_host:port;
d_dmtest011=container011:5489
di_dmtest011=192.168.33.104:1489
# full definition;
f_dmtest011=dmtest01@container011:2489
fip_dmtest011=dmtest01@192.168.33.104:1489

With a good naming convention, the variables can easily be remembered, which saves a lot of typing too.
Note, in the last two lines, how the whole extended target name can be specified, including the repository name.
A few tests:

dmadmin@docker:~$ ./widql dmtest01@$d_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@container011:5489 -Udmadmin -Pdmadmin --verbose] ...
 
dmadmin@docker:~$ ./widql dmtest01@$dip_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@192.168.33.104:1489 -Udmadmin -Pdmadmin --verbose] ...
 
dmadmin@docker:~$ ./widql $f_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@container011:2489 -Udmadmin -Pdmadmin --verbose] ...
 
dmadmin@docker:~$ ./widql $fip_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@192.168.33.104:1489 -Udmadmin -Pdmadmin --verbose] ...

The variables have been expanded by the shell prior to entering the wrapper; no programming effort was needed here, which is always appreciated.

Possible Enhancements

As shown previously, the alternate configuration file lists aliases for the docbroker:port couples and even for repository@docbroker:port targets. In passing, the wrapper also supports the repository:docbroker:port variant.
Now, in order to better match the Documentum syntax, is it possible to be even more transparent by removing the dollar signs, colons and at-signs while still benefiting from the extended syntax? E.g.:

$ ./widql dmtest -Udmadmin ....

Yes it is. The trick here is to first look up the alias in the configuration file, which incidentally becomes mandatory now, and re-execute the program with the alias resolved. As we are all lazy coders, we will not explicitly code the lookup but instead rely on the shell: the wrapper will source the file, resolve the target and re-execute itself.
If the alias has not been defined in the file, then the wrapper considers it as the name of a repository and falls back to the usual command-line tools.
A good thing is that no new format has to be introduced in the file as the target is still the name of an environment variable.
Since the changes are really minimal, let’s do it. Hereafter, the diff output showing the changes from the listing in part I:

> # this variable points to the target repositories alias file and defaults to repository_connections.aliases;
> REPO_ALIAS=${REPO_ALIAS:-~/repository_connections.aliases}
> 
107a111
> [[ bVerbose -eq 1 ]] && echo "current configuration file=[$REPO_ALIAS]"
225,227c229,241
<    if [[ $bVerbose -eq 1 ]]; then
<       echo "no change to current $dfc_config file"
<       echo "calling original: $DM_HOME/bin/${dctm_program} $*"
---
>    [[ -f $REPO_ALIAS ]] && . $REPO_ALIAS
>    definition=${!1}
>    [[ $bVerbose -eq 1 ]] && echo "alias lookup in $REPO_ALIAS: $1 = $definition"
>    if [[ ! -z $definition ]]; then
>       new_cmd=${current_cmd/$1/$definition}
>       [[ $bVerbose -eq 1 ]] && echo "invoking $new_cmd"
>       exec $new_cmd
>    else
>       if [[ $bVerbose -eq 1 ]]; then
>          echo "no change to current $dfc_config file"
>          echo "calling original: $DM_HOME/bin/${dctm_program} $*"
>       fi
>       $DM_HOME/bin/${dctm_program} $*
229d242
<    $DM_HOME/bin/${dctm_program} $*

Note how the target configuration file pointed to by the REPO_ALIAS environment variable gets sourced if it exists; $REPO_ALIAS defaults to ~/repository_connections.aliases but can be changed before calling the wrapper.
Note also how bash can dereference a variable containing the name of another variable to get that other variable's value (indirect expansion, the ${!1} above), nice touch.
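As a stand-alone illustration of indirect expansion, here is a quick sketch with made-up values, outside the wrapper:

$ f_dmtest=dmtest@docker:1489     # alias definition, as it would come from the sourced configuration file
$ target=f_dmtest                 # plays the role of $1 in the wrapper: the alias name received on the command-line
$ echo ${!target}                 # ${!target} expands to the value of the variable whose name is stored in target
dmtest@docker:1489
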
To apply the patch in-place, save the diffs above in diff-file and run the following command:

patch old-file < diff-file

Testing
For conciseness, the tests below only show how the target is resolved. The actual connection has already been tested abundantly earlier.

dmadmin@docker:~$ ./widql f_dmtest -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql f_dmtest -Udmadmin -Pdmadmin --verbose] alias lookup in /home/dmadmin/repository_connections.aliases: f_dmtest = dmtest@docker:1489
invoking ./widql dmtest@docker:1489 -Udmadmin -Pdmadmin --verbose
current_cmd=[/home/dmadmin/widql dmtest@docker:1489 -Udmadmin -Pdmadmin --verbose] ...
dmadmin@docker:~$ ./widql fip_dmtest01 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql fip_dmtest01 -Udmadmin -Pdmadmin --verbose] alias lookup in /home/dmadmin/repository_connections.aliases: fip_dmtest01 = dmtest01@192.168.33.2:1489
invoking ./widql dmtest01@192.168.33.2:1489 -Udmadmin -Pdmadmin --verbose
current_cmd=[/home/dmadmin/widql dmtest01@192.168.33.2:1489 -Udmadmin -Pdmadmin --verbose] ...
dmadmin@docker:~$ ./widql fip_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql fip_dmtest011 -Udmadmin -Pdmadmin --verbose] alias lookup in /home/dmadmin/repository_connections.aliases: fip_dmtest011 = dmtest01@192.168.33.3:1489
invoking ./widql dmtest01@192.168.33.3:1489 -Udmadmin -Pdmadmin --verbose
current_cmd=[/home/dmadmin/widql dmtest01@192.168.33.3:1489 -Udmadmin -Pdmadmin --verbose]

Note how the targets are cleaner now, no curly little fancy shell characters in front.

Conclusion

As I was testing this little utility, I was surprised to realize how comfortable and natural its usage is. It actually feels better to add the docbroker's host and port than to stop at the docbase name, probably because it makes the intended repository absolutely unambiguous. The good thing is that it is almost invisible, except for its invocation, but even this can be smoothed out by using command aliases or renaming the symlinks.
When one has to work with identically named docbases or with clones existing in different environments, dctm-wrapper brings real relief. And it was quick and easy to put together too.
As it modifies an essential configuration file, it is mainly aimed at developers or administrators working on their own machine, but then those constitute the target audience anyway.
As always, if you have any ideas for some utility that could benefit us all, please do not hesitate to suggest them in the comment section. Feedback is welcome too of course.

Cet article Connecting to a Repository via a Dynamically Edited dfc.properties File (part II) est apparu en premier sur Blog dbi services.

PostgreSQL partitioning (8): Sub-partitioning


We are slowly coming to the end of this little series about partitioning in PostgreSQL. In the last post we had a look at indexing and constraints and today we will have a look at sub-partitioning. Sub-partitioning means you go one step further and partition the partitions as well. It is not required to read all the posts of this series to follow this one but, if you want to, here they are:

  1. PostgreSQL partitioning (1): Preparing the data set
  2. PostgreSQL partitioning (2): Range partitioning
  3. PostgreSQL partitioning (3): List partitioning
  4. PostgreSQL partitioning (4) : Hash partitioning
  5. PostgreSQL partitioning (5): Partition pruning
  6. PostgreSQL partitioning (6): Attaching and detaching partitions
  7. PostgreSQL partitioning (7): Indexing and constraints

Coming back to our range partitioned table, this is how it currently looks:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_default DEFAULT

Let's assume you expect traffic violations to grow exponentially in 2022 because more and more cars will be on the road, and more cars mean more traffic violations. To be prepared for that you do not only want to partition by year but also by month. In other words: add a new partition for 2022 but sub-partition it by month. First of all you need a new partition for 2022 that is itself partitioned as well:

create table traffic_violations_p_2022
partition of traffic_violations_p
for values from ('2022-01-01') to ('2023-01-01') partition by range(date_of_stop);

Now we can add partitions to the just created partitioned partition:

create table traffic_violations_p_2022_jan
partition of traffic_violations_p_2022
for values from ('2022-01-01') to ('2022-02-01');

create table traffic_violations_p_2022_feb
partition of traffic_violations_p_2022
for values from ('2022-02-01') to ('2022-03-01');

create table traffic_violations_p_2022_mar
partition of traffic_violations_p_2022
for values from ('2022-03-01') to ('2022-04-01');

create table traffic_violations_p_2022_apr
partition of traffic_violations_p_2022
for values from ('2022-04-01') to ('2022-05-01');

create table traffic_violations_p_2022_may
partition of traffic_violations_p_2022
for values from ('2022-05-01') to ('2022-06-01');

create table traffic_violations_p_2022_jun
partition of traffic_violations_p_2022
for values from ('2022-06-01') to ('2022-07-01');

create table traffic_violations_p_2022_jul
partition of traffic_violations_p_2022
for values from ('2022-07-01') to ('2022-08-01');

create table traffic_violations_p_2022_aug
partition of traffic_violations_p_2022
for values from ('2022-08-01') to ('2022-09-01');

create table traffic_violations_p_2022_sep
partition of traffic_violations_p_2022
for values from ('2022-09-01') to ('2022-10-01');

create table traffic_violations_p_2022_oct
partition of traffic_violations_p_2022
for values from ('2022-10-01') to ('2022-11-01');

create table traffic_violations_p_2022_nov
partition of traffic_violations_p_2022
for values from ('2022-11-01') to ('2022-12-01');

create table traffic_violations_p_2022_dec
partition of traffic_violations_p_2022
for values from ('2022-12-01') to ('2023-01-01');
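
As a side note, the twelve statements above could also be generated instead of being typed one by one. Here is a possible psql sketch (not part of the original workflow) relying on generate_series(), format() and the \gexec meta-command:

select format ( 'create table traffic_violations_p_2022_%s partition of traffic_violations_p_2022 for values from (%L) to (%L)'
              , to_char(d,'mon')                      -- gives jan, feb, ... dec
              , d::date                               -- lower bound of the month
              , (d + interval '1 month')::date )      -- upper (exclusive) bound
  from generate_series ( date '2022-01-01', date '2022-12-01', interval '1 month') as d
\gexec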

Looking at psql's output when we describe the partitioned table, not much has changed: just the keyword "PARTITIONED" shows up beside our new partition for 2022:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_2022 FOR VALUES FROM ('2022-01-01') TO ('2023-01-01'), PARTITIONED,
            traffic_violations_p_default DEFAULT

This is where the new functions in PostgreSQL 12 become very handy:

postgres=# select * from pg_partition_tree('traffic_violations_p');
             relid             |        parentrelid        | isleaf | level 
-------------------------------+---------------------------+--------+-------
 traffic_violations_p          |                           | f      |     0
 traffic_violations_p_default  | traffic_violations_p      | t      |     1
 traffic_violations_p_2013     | traffic_violations_p      | t      |     1
 traffic_violations_p_2014     | traffic_violations_p      | t      |     1
 traffic_violations_p_2015     | traffic_violations_p      | t      |     1
 traffic_violations_p_2016     | traffic_violations_p      | t      |     1
 traffic_violations_p_2017     | traffic_violations_p      | t      |     1
 traffic_violations_p_2018     | traffic_violations_p      | t      |     1
 traffic_violations_p_2019     | traffic_violations_p      | t      |     1
 traffic_violations_p_2020     | traffic_violations_p      | t      |     1
 traffic_violations_p_2021     | traffic_violations_p      | t      |     1
 traffic_violations_p_2022     | traffic_violations_p      | f      |     1
 traffic_violations_p_2022_jan | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_feb | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_mar | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_apr | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_may | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_jun | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_jul | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_aug | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_sep | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_oct | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_nov | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_dec | traffic_violations_p_2022 | t      |     2

To verify that data is routed correctly to the sub-partitions, let's add some data for 2022:

insert into traffic_violations_p (date_of_stop)
       select * from generate_series ( date('01-01-2022')
                                     , date('12-31-2022')
                                     , interval '1 day' );

If we did the partitioning correctly we should see data in the new partitions:

postgres=# select count(*) from traffic_violations_p_2022_nov;
 count 
-------
    30
(1 row)

postgres=# select count(*) from traffic_violations_p_2022_dec;
 count 
-------
    31
(1 row)

postgres=# select count(*) from traffic_violations_p_2022_feb;
 count 
-------
    28
(1 row)

Here we go. Of course you could go even further and sub-partition the monthly partitions by day or week. You can also partition by list and then sub-partition the list partitions by range, as in the example below, or partition by range and then sub-partition by list (a small sketch of that follows right after the example):

postgres=# create table traffic_violations_p_list_dummy partition of traffic_violations_p_list for values in ('dummy') partition by range (date_of_stop);
CREATE TABLE
postgres=# create table traffic_violations_p_list_dummy_2019 partition of traffic_violations_p_list_dummy for values from ('2022-01-01') to ('2023-01-01');
CREATE TABLE
postgres=# insert into traffic_violations_p_list (seqid, violation_type , date_of_stop) values (-1,'dummy',date('2022-12-01'));
INSERT 0 1
postgres=# select date_of_stop,violation_type from traffic_violations_p_list_dummy_2019;
 date_of_stop | violation_type 
--------------+----------------
 2022-12-01   | dummy
(1 row)
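
And the other way around, range first and then list: here is a minimal, self-contained sketch (the table, column values and names are made up for the illustration and are not part of the original data set):

create table traffic_violations_p_demo ( date_of_stop date, violation_type text )
partition by range (date_of_stop);

create table traffic_violations_p_demo_2022 partition of traffic_violations_p_demo
for values from ('2022-01-01') to ('2023-01-01') partition by list (violation_type);

create table traffic_violations_p_demo_2022_warning partition of traffic_violations_p_demo_2022
for values in ('Warning');

create table traffic_violations_p_demo_2022_other partition of traffic_violations_p_demo_2022
default;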

That’s it for sub-partitioning. In the final post we will look at some corner cases with partitioning in PostgreSQL.

Cet article PostgreSQL partitioning (8): Sub-partitioning est apparu en premier sur Blog dbi services.

In-person event (June 11,2019) Ansible Automates – Bern


A couple of weeks ago, I found in my inbox an invitation from Red Hat concerning an event organized in Bern about Ansible.
I had heard a lot about automation but did not really know what can or could be automated.
Infrastructure? Application deployment? Networks? Containers, clouds? All of them?
So I wanted to learn more about Ansible and decided to participate.
This “Ansible Automates” was a full-day event taking place in the KURSAAL hotel, located in the center of Bern, a nice place.

After the registration process and a welcome coffee, I took a seat in the conference room and I was immediately impressed, as more than 140 people were already there.
There was obviously huge interest in this event.

After some welcome words from Eduard Mobalek, Red Hat Head of Sales, the first session started punctually at 09:15.
It was titled "Automation Everywhere, Product Overview and Roadmap".
Belkacem Moussouni, Head of Business Development EMEA, focused on
why we should use Ansible instead of other mature products such as Puppet or Chef,
and the main reason given was its simplicity.
By using YAML, a simple configuration language, you can write your own playbook in less than an hour.

The next session was led by Nicholas Christener, CEO/CTO of ADFINIS SYGROUP AG:
a customer case introducing a handbook to Ansible following their guidelines and best practices.

After the coffee break, Mr Christoph Bernard from Visana Services talked about how to automate security-related tasks
and how to use Ansible for various use cases such as fraud detection, network security and governance.

"E Guete!" on the screen announced that it was lunch time.
Just outside the conference room, a huge buffet was waiting for us.

At 14:15, the session called "Network Automation with Ansible Tower 3.5" started.
The speaker introduced the new features and enhancements that improve network automation, such as:
– Red Hat Enterprise Linux 8
– New Inventory Plugins
– Extended list display
– Enhanced metrics

At 15:30, the last session, called "Manage Windows with Ansible", started.
The speaker, Götz Rieger, Senior Solution Architect, showed step by step how to manage Windows servers using PowerShell scripts and playbooks.

This event ended as usual with an apéro together with the Ansible team and all the other participants.
It's always nice to meet other people, share knowledge and have some networking moments!
Now, let's say, I know much more about automation, and what I learned overall during this event is that
if you need to make your life easier by automating your administration tasks,
Ansible is a good place to start.
All sessions were well prepared, but as I was speaking to one of the Ansible team members,
I told him that the only thing missing from this event was a workshop.

So, let's wait for the next one!

Cet article In-person event (June 11,2019) Ansible Automates – Bern est apparu en premier sur Blog dbi services.


PostgreSQL on the beach, PGIBZ, recap


So, finally, PostgreSQL on the beach is over. During discussions it was mentioned that some companies had issues justifying sending people to a conference in Ibiza. It was not really clear whether they wanted to avoid the impression of being on holiday while attending a conference or if it was just about the location. I am not judging this at all, but what I can tell you is: it was a great, very relaxed and very well organized conference. Of course people went swimming at the beach during the lunch break, but that does not mean there was no content. Just look at the schedule and you will notice that there were plenty of really good talks.

I believe there is no other conference where you can see Bruce like that (no bow tie at all):

Btw: if you want to see Bruce live in October you can do that. We'll be organizing an event on the 24th of October in Zürich: "How open source drives innovation". You cannot register yet, but the event is listed on our website and Bruce has already listed the event as well. Just follow us on the usual channels for updated information.

Very good news: There definitely will be PostgreSQL on the beach next year:

I can only encourage you to join. You will definitely learn a lot, meet great people and do a lot of networking. At what other conference can you have a view like this?:

Some impressions from some of the talks (I really can not paste pictures from all of them. Check Twitter if you want to see more):

A big thank you to Álvaro and the whole team for making that possible:

See you next year in Ibiza.

Cet article PostgreSQL on the beach, PGIBZ, recap est apparu en premier sur Blog dbi services.

Evenement Oracle : Dans la tête d’un Hacker & Comment protéger vos données sensibles


For once, a blog post in French, because it concerns an Oracle event held in French.
Today at Oracle Switzerland in Geneva, there was a presentation about Oracle security.
The idea was to put ourselves in a hacker's shoes in order to better understand how they proceed, and then to see how to protect the data in an Oracle database.
Understanding the intentions and the means used by hackers to reach their goals helps us to fight them better.

The session was led by Hakim Loumi, EMEA DB Security PM.
First, the speaker showed an estimate of the growth of data volumes by 2025. Wow, 175 ZB!

And how sensitive this data can be.

The number of attacks worldwide is impressive.

Hakim also showed why hacking has become one of the most serious scourges. In the slide below, for an investment of less than 10$, a complete identity sells for 240$ on average. What a return on investment!!!!!

Obviously, the database is a prime target for attacks.

The speaker then presented the main tools provided by Oracle, most of them requiring the Advanced Security Option:

Data Masking
Data Redaction
Oracle Firewall
Database Vault

And, to finish, this nice slide.

Conclusion

The event was really interesting. Through simple examples and anecdotes, the speaker managed to capture the audience's attention. The main takeaway of this presentation is that data security is not only the DBA's business. Indeed, it is a whole chain of procedures involving everyone in the company.

Cet article Evenement Oracle : Dans la tête d’un Hacker & Comment protéger vos données sensibles est apparu en premier sur Blog dbi services.

work Agile in a GxP-regulated environment

$
0
0

On 4 June 2019 I followed an invitation to

wega-it’s Know-how & Networking Breakfast 2 2019 on “Agile Validation in GxP Projects”.

So they were to discuss Agility in the context of GxP regulation.

I had some earlier exposure to various kinds of compliance roles and topics, and my current work environment is in the highly regulated Pharma industry. So I was really wondering (and hoping to learn) how you can possibly bring the two points of view of Agility and GxP regulation together. The Literaturhaus Basel was to see some very special kind of literature that day: not a formal presentation but a role play performance between these two viewpoints, represented by Evelyne Daniel, an experienced GxP validation expert, and Mathias Fuchs, an equally experienced Scrum master, both from wega-IT. A very nice idea, very appropriate for the topic!

What is Compliance (GxP) and what is Agile?

Typically in GxP compliance we work along the so-called V-model. In its rigidity and intended plannability it of course corresponds largely to the waterfall model of traditional software development. Opposed to this, the Agile Manifesto (2001) criticizes the very basics of these traditional ways of working. Remember the iconic claims like "Working software over comprehensive Documentation" and "Responding to change over following a Plan". But how would you ever get acceptance in the compliance world without full documentation and planning!?! When I quickly browsed the internet, I found a multitude of statements and proposals which would certainly merit a separate blog post. For this time, I will give a quick summary of the wega breakfast presentation and some touch points with our current working environment in the Pharma industry.

Although in my current work environment we are not actually driving GxP Validation projects, we are still subject to the very tight GxP regulation. In the processes of Change and Release Management, this is reflected in the rigid rules of testing and documentation, to just name the most obvious examples. Background, of course, is the definition of Compliance and its Validation: the goal is to “establish documented evidence” to assure compliance and quality etc. These requirements hold independently of the quality, completeness or even up-to-date status of the pre-defined processes and rules! Call this inflexible and cumbersome! Any adaptation (update!) of the processes and rules is very formal through the complicated administrative processes to be used and hence very slow. Consider this in our fast-moving (not only IT-) world!

What is an MVP?

A nice play of words was interjected in the stage discussion: the acronym MVP has a very clear meaning as a basic concept for both sides, just it is not the same: MVP = Master Validation Plan (in GxP Validation) versus Minimal Viable Product (in Agile or Lean Software Development).

How to bring them together?

Now how to bring the core aspirations of Agile Development like Customer focus, Flexibility and Speed into the Compliance world? A first inevitable step in the V-model world: break up the (dead?) lock between a complete finalization of the User Requirements Specification and the setup of a complete Validation Plan prescribing all Qualification criteria (IQ, OQ, PQ). Definition of Done (DoD) plays a major role when trying to cut the end-to-end Development-Validation elephant into smaller pieces. Inclusion of Validation into the "daily" Development activities is another must, instead of adding Validation only at the end of the Development phases. Yet another core principle from the Agile side is ensuring team Maturity and Mindset. Much-hailed Diversity is opposed to pure compliance-oriented expert teams, striving for innovation and creativity in the team.

WEGA breakfast - Agile Validation in GxP projects

Some basic approaches

The final answer on how to methodically combine, or maybe rather “emulsify”, Agility and Compliance Validation comes as no surprise: there is no one-size-fits-all method. Rather, three obvious basic approaches were presented.

  1. introducing Agility right between the left (Specifications) and the right (Qualifications) arms of the V-model, probably using some kind of piloting or prototyping
  2. including Validation into the Agile Development, almost doing Validation in each Agile sprint
  3. appending V-model Validation at the end of an Agile development.

The above-mentioned end-to-end Development-to-Validation elephant has to be broken into smaller, more manageable units. Each specific project situation will have its own possible and best way to do it.

Think innovative and creative!

Thanks to wega-informatik (www.wega-it.com)  for organizing this creative and informative event.

 

The post work Agile in a GxP-regulated environment appeared first on Blog dbi services.

Windocks and K8s support


I recently got the 4.08 update from the Windocks team and I was very excited to evaluate some of the new features. The first cool one I want to present in this blog concerns the Kubernetes support for deploying Windocks containers, which will definitely make my application deployments easier. Let’s say you want to deploy an application that is tied to a Windocks container for SQL Server. In a previous blog post I explained why we are using Windocks in our context. With previous versions of Windocks, we had to write custom scripts to deploy applications on K8s that are tied to a Windocks container. With the new version 4.08, this process is simplified because both the applications and their related Windocks containers are directly deployable on K8s by using a YAML deployment file.

In fact, the new way consists in deploying a Windocks SQL Server proxy on K8s that works in conjunction with a Windocks server. Once the SQL Server proxy is deployed, a corresponding Windocks container is spun up with its specific parameters, as shown in the picture below:

 

First of all, in order to secure access between K8s and the Windocks server, authentication is required and we need to provide credential information that will be stored in the proxy-secrets secret in K8s. The SA password is also included in this secret and will be used to set up the SA account when the Windocks container spins up.

$ kubectl create secret generic proxy-secrets --from-literal=WINDOCKS_REQUIRED_USERNAME='clustadmin' --from-literal=WINDOCKS_REQUIRED_PASSWORD='StrongPassword' --from-literal=WINDOCKS_REQUIRED_CONTAINER_SAPASSWORD='sa_password'
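
To double-check that the secret exists before moving on, a quick look with kubectl is enough (a minimal sketch; kubectl describe shows the keys but not the values):

$ kubectl get secret proxy-secrets
$ kubectl describe secret proxy-secrets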

 

The next step consists in deploying the Windocks SQL proxy with its specific environment variables, including WINDOCKS_REQUIRED_HOSTNAME (Windocks server name or IP address), WINDOCKS_REQUIRED_IMAGE_NAME (Windocks-based image used for the container) and WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT (optional).

  • The Windocks SQL Proxy YAML file
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: windocks-sql-proxy-secure 
  labels:
    app: sqlproxy-secure 
spec:
  replicas: 1 
  template:
    metadata:
      labels:
        app: sqlproxy-secure 
        tier: frontend
    spec:
      containers:
      - name: sqlproxy-secure-app 
        image: windocks/windocks-sql-server-proxy 
        imagePullPolicy: Always
        ports:
        - name: tcp-proxy
          containerPort: 3087
        - name: tls-proxy
          containerPort: 3088
        envFrom:
          - secretRef:
              name: proxy-secrets
        env:
          - name: PROJECT_ID
            value: project_id_for_GKE_deployment_optional
          - name: WINDOCKS_REQUIRED_HOSTNAME
            value: xx.xxx.xxx.xxx
          - name: WINDOCKS_REQUIRED_IMAGE_NAME
            value: 2012_ci
          - name: WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT
            value: "3087"
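
Assuming the manifest above is saved as sqlproxy-deployment.yaml (the file name is just an example), deploying it and checking the pod comes down to two commands:

$ kubectl apply -f sqlproxy-deployment.yaml
$ kubectl get pods -l app=sqlproxy-secure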

 

If we want to make the SQL proxy pod accessible from outside, a service is needed, although this is not mandatory depending on the context. Note that you may also use a TLS connection to secure the network traffic between K8s and the Windocks server.

  • The Windocks service YAML file
apiVersion: v1
kind: Service
metadata:
  name: windocks-sql-proxy-secure
  labels:
    app: sqlproxy-secure
    tier: frontend
spec:
  sessionAffinity: ClientIP
  type: LoadBalancer
  ports:
  - port: 3087
    name: tcp-proxy-secure-service
    targetPort: 3087
  - port: 3088
    name: tls-proxy-secure-service
    targetPort: 3088
  selector:
    app: sqlproxy-secure
    tier: frontend
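
The service is applied the same way; since it is of type LoadBalancer, the external IP may take a short while to show up (again, the file name is just an example):

$ kubectl apply -f sqlproxy-service.yaml
$ kubectl get service windocks-sql-proxy-secure -w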

 

Let’s give it a try on my Azure infrastructure, including an AKS cluster and a Windocks server installed in an Azure VM. I also took the opportunity to create my own Helm chart from the YAML files provided by the Windocks team. It will make my deployments easier for sure. Here is the command I used to deploy my Windocks Helm chart on my AKS cluster:

$ helm install --name windocks2012 --namespace dmk --set Windocks.Image=2012_ci --set Windocks.Port=3089 --set Windocks.PortSSL=3090 .

 

The deployment is performed in a specific namespace named dmk and the 2012_ci image is used as the base image for my Windocks container. I will be able to connect to my Windocks container on port 3089 through the SQL proxy deployed on K8s. After a few seconds the following resources were deployed within my dmk namespace, including a Windocks SQL proxy pod and the Windocks SQL proxy service.

$ kubectl get all -n dmk
NAME                                                                  READY   STATUS    RESTARTS   AGE
pod/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fb8694m   1/1     Running   0          13m

NAME                                                            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)
                 AGE
service/backend                                                 ClusterIP      10.0.126.154   <none>          80/TCP
                 8d
service/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   LoadBalancer   10.0.252.235   xx.xx.xxx.xxx   3089:30382/TCP,3090:30677/TCP   44m

NAME                                                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   1         1         1            1           44m

NAME                                                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fbdb5c96   1         1         1       44m

 

Once deployed, the SQL proxy redirects all connections from port 3089 to the container port after spinning up the corresponding Windocks container on the Windocks server. We may get some details by taking a look at the SQL proxy logs on K8s. As a reminder, the container port is allocated dynamically by the Windocks server by default and the SQL proxy picks it up automatically for the connection redirection.

…
Valid response for creating Windocks container
Container id is b1201aaaba3b4cd047953b624e541e26500024e42e6381936fc7b526b5596a99
Container port is 10001
Setting up tcp server
redirecting connections from 127.0.0.1:3089 to xx.xxx.xxx.xxx:10001 
…

 

Let’s try to connect by using mssql-cli with the external IP of the SQL proxy service and port 3089. The connection redirection is effective and I can interact with my Windocks container on local port 10001:
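
The connection command itself is not shown above; it looks something like the following sketch with mssql-cli (the IP address stands for the service external IP, and the password is prompted for when omitted):

$ mssql-cli -S xx.xx.xxx.xxx,3089 -U sa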

master> SELECT top 1 c.local_net_address, c.local_tcp_port
....... FROM sys.dm_exec_connections as c; 
+---------------------+------------------+
| local_net_address   | local_tcp_port   |
|---------------------+------------------|
| 172.18.0.5          | 10001            |
+---------------------+------------------+

 

The Windocks container for SQL Server spun up with my 3 test databases as expected:

master> \ld+
+-------------------+-------------------------+-----------------------+------------------------------+
| name              | create_date             | compatibility_level   | collation_name               |
|-------------------+-------------------------+-----------------------+------------------------------|
| master            | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| tempdb            | 2019-06-27 20:04:04.273 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| model             | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| msdb              | 2012-02-10 21:02:17.770 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| AdventureWorksDbi | 2019-06-27 20:04:03.537 | 100                   | Latin1_General_100_CS_AS     |
| ApplixEnterprise  | 2019-06-27 20:04:04.477 | 90                    | SQL_Latin1_General_CP1_CI_AS |
| dbi_tools         | 2019-06-27 20:04:05.153 | 100                   | French_CS_AS                 |
+-------------------+-------------------------+-----------------------+------------------------------+

 

From the Windocks server, I may get a picture of the provisioned containers. The interesting one in our case is the one referenced by the name k8s-windocks2012/xxxx:

PS F:\WINDOCKS\SQL2012> docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
e9dbe5556b2f        2012_ci             ""                  29 minutes ago      Stopped             10002/              dab/Windocks-id:31432367-c744-4ae3-8248-cb3fb3d2792e
b1201aaaba3b        2012_ci             ""                  13 minutes ago      Started             10001/              k8s-windocks2012/Windocks-id:cfa58c38-d168-4c04-b4c8-12b0552b93ad

 

Well, in a nutshell, this is a feature we will certainly consider integrating into our Azure DevOps pipeline. Stay tuned, other blog posts will come later.

See you!

 

 

 

The post Windocks and K8s support appeared first on Blog dbi services.

Modifying pg_hba.conf from inside PostgreSQL


During one of the sessions at the last Swiss PGDay there was a question which could not be answered during the talk: Is it possible to modify pg_hba.conf from inside PostgreSQL without having access to the operating system? What everybody agreed on is that there currently is no built-in function for doing this.

When you are on a recent version of PostgreSQL there is a view you can use to display the rules in pg_hba.conf:

postgres=# select * from pg_hba_file_rules ;
 line_number | type  |   database    | user_name |  address  |                 netmask                 | auth_method | options | error 
-------------+-------+---------------+-----------+-----------+-----------------------------------------+-------------+---------+-------
          84 | local | {all}         | {all}     |           |                                         | trust       |         | 
          86 | host  | {all}         | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          88 | host  | {all}         | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          91 | local | {replication} | {all}     |           |                                         | trust       |         | 
          92 | host  | {replication} | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          93 | host  | {replication} | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          94 | host  | {all}         | {mydb}    | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
(7 rows)

But there is nothing which allows you to directly modify that. When you are lucky and have enough permissions there is a way to do it, though. First, let’s check where pg_hba.conf is located:

postgres=# select setting from pg_settings where name like '%hba%';
           setting           
-----------------------------
 /u02/pgdata/DEV/pg_hba.conf

Having that information, we can load the file into a table:

postgres=# create table hba ( lines text ); 
CREATE TABLE
postgres=# copy hba from '/u02/pgdata/DEV/pg_hba.conf';
COPY 93

Once it is loaded we have the whole content in our table (skipping the comments and empty lines here):

postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
(6 rows)

As this is a normal table we can of course add a row:

postgres=# insert into hba (lines) values ('host  all mydb  ::1/128                 trust');
INSERT 0 1
postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
 host  all mydb  ::1/128                 trust
(7 rows)

And now we can write it back:

postgres=# copy hba to '/u02/pgdata/DEV/pg_hba.conf';
COPY 94

Reading the whole file confirms that our new rule is there:

postgres=# select pg_read_file('pg_hba.conf');
                               pg_read_file                               
--------------------------------------------------------------------------
 # PostgreSQL Client Authentication Configuration File                   +
 # ===================================================                   +
 #                                                                       +
 # Refer to the "Client Authentication" section in the PostgreSQL        +
 # documentation for a complete description of this file.  A short       +
 # synopsis follows.                                                     +
 #                                                                       +
 # This file controls: which hosts are allowed to connect, how clients   +
 # are authenticated, which PostgreSQL user names they can use, which    +
 # databases they can access.  Records take one of these forms:          +
 #                                                                       +
 # local      DATABASE  USER  METHOD  [OPTIONS]                          +
 # host       DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostssl    DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostnossl  DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 #                                                                       +
 # (The uppercase items must be replaced by actual values.)              +
 #                                                                       +
 # The first field is the connection type: "local" is a Unix-domain      +
 # socket, "host" is either a plain or SSL-encrypted TCP/IP socket,      +
 # "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a     +
 # plain TCP/IP socket.                                                  +
 #                                                                       +
 # DATABASE can be "all", "sameuser", "samerole", "replication", a       +
 # database name, or a comma-separated list thereof. The "all"           +
 # keyword does not match "replication". Access to replication           +
 # must be enabled in a separate record (see example below).             +
 #                                                                       +
 # USER can be "all", a user name, a group name prefixed with "+", or a  +
 # comma-separated list thereof.  In both the DATABASE and USER fields   +
 # you can also write a file name prefixed with "@" to include names     +
 # from a separate file.                                                 +
 #                                                                       +
 # ADDRESS specifies the set of hosts the record matches.  It can be a   +
 # host name, or it is made up of an IP address and a CIDR mask that is  +
 # an integer (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that     +
 # specifies the number of significant bits in the mask.  A host name    +
 # that starts with a dot (.) matches a suffix of the actual host name.  +
 # Alternatively, you can write an IP address and netmask in separate    +
 # columns to specify the set of hosts.  Instead of a CIDR-address, you  +
 # can write "samehost" to match any of the server's own IP addresses,   +
 # or "samenet" to match any address in any subnet that the server is    +
 # directly connected to.                                                +
 #                                                                       +
 # METHOD can be "trust", "reject", "md5", "password", "scram-sha-256",  +
 # "gss", "sspi", "ident", "peer", "pam", "ldap", "radius" or "cert".    +
 # Note that "password" sends passwords in clear text; "md5" or          +
 # "scram-sha-256" are preferred since they send encrypted passwords.    +
 #                                                                       +
 # OPTIONS are a set of options for the authentication in the format     +
 # NAME=VALUE.  The available options depend on the different            +
 # authentication methods -- refer to the "Client Authentication"        +
 # section in the documentation for a list of which options are          +
 # available for which authentication methods.                           +
 #                                                                       +
 # Database and user names containing spaces, commas, quotes and other   +
 # special characters must be quoted.  Quoting one of the keywords       +
 # "all", "sameuser", "samerole" or "replication" makes the name lose    +
 # its special character, and just match a database or username with     +
 # that name.                                                            +
 #                                                                       +
 # This file is read on server startup and when the server receives a    +
 # SIGHUP signal.  If you edit the file on a running system, you have to +
 # SIGHUP the server for the changes to take effect, run "pg_ctl reload",+
 # or execute "SELECT pg_reload_conf()".                                 +
 #                                                                       +
 # Put your actual configuration here                                    +
 # ----------------------------------                                    +
 #                                                                       +
 # If you want to allow non-local connections, you need to add more      +
 # "host" records.  In that case you will also need to make PostgreSQL   +
 # listen on a non-local interface via the listen_addresses              +
 # configuration parameter, or via the -i or -h command line switches.   +
                                                                         +
 # CAUTION: Configuring the system for local "trust" authentication      +
 # allows any local user to connect as any PostgreSQL user, including    +
 # the database superuser.  If you do not trust all your local users,    +
 # use another authentication method.                                    +
                                                                         +
                                                                         +
 # TYPE  DATABASE        USER            ADDRESS                 METHOD  +
                                                                         +
 # "local" is for Unix domain socket connections only                    +
 local   all             all                                     trust   +
 # IPv4 local connections:                                               +
 host    all             all             127.0.0.1/32            trust   +
 # IPv6 local connections:                                               +
 host    all             all             ::1/128                 trust   +
 # Allow replication connections from localhost, by a user with the      +
 # replication privilege.                                                +
 local   replication     all                                     trust   +
 host    replication     all             127.0.0.1/32            trust   +
 host    replication     all             ::1/128                 trust   +
 host  all mydb  ::1/128                 trust                           +
(1 row)

All you need to do from now on is to reload the configuration and you’re done:

postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

Of course: Use with caution!
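
A simple precaution that fits the same “no OS access” constraint: before overwriting pg_hba.conf, keep a backup copy of the original file using the very same COPY mechanism (a minimal sketch; the .bak file name is just an example):

postgres=# create table hba_backup ( lines text );
postgres=# copy hba_backup from '/u02/pgdata/DEV/pg_hba.conf';
postgres=# copy hba_backup to '/u02/pgdata/DEV/pg_hba.conf.bak';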

The post Modifying pg_hba.conf from inside PostgreSQL appeared first on Blog dbi services.

Deploying SQL Server on Azure virtual machine with Terraform


We are now entering the infrastructure-as-code world, and provisioning a SQL Server infrastructure is not excluded from the equation. This is especially true when it comes to the cloud, regardless of whether we are using IaaS or PaaS.

One great tool to use in such a scenario is certainly Terraform, and I introduced it during the last PowerSaturday pre-conference named “DBA modern competencies”. Installation paradigms in the cloud differ from what we usually do on-premises, whether we are GUI-oriented or scripting-enthusiastic DBAs. In on-premises scenarios, building and delivering software often requires dealing with a lot of hardware, including servers, racks, network, cooling and so on. So, it makes sense to have one team managing the hardware (Ops) and another one developing software (Devs). But nowadays a shift is taking place: instead of managing their own data centers for some of their system components, many customers are moving to the cloud, taking advantage of services such as Azure, AWS and Google Cloud. This is at least what I have begun to notice with some of my customers for some time now, including for some of their database environments. Instead of investing heavily in hardware, many Ops teams are spending all their time working on software, using tools such as Chef, Puppet, Terraform or Docker. In other words, instead of racking servers and plugging in network cables, many sysadmins are writing code. From a database administrator standpoint, I believe this is good news because it removes a boring part of the work that doesn’t add any real value.

The question that probably arises is: why use Terraform (or an equivalent) rather than a script? I mean ad-hoc scripts here … Personally, I have been using PowerShell-based ad-hoc scripts to install and configure SQL Server for a while now at customer shops, and PowerShell is part of the IaC (Infrastructure as Code) tooling category. But ad-hoc scripts come with some drawbacks. First of all, using a programming language like PowerShell implies you have to write completely custom code for every task, which is not a big deal as long as you only manage a few components in the infrastructure. But what if you’re dealing with hundreds or thousands of servers including databases, network, load balancers and so on? Do you really want to maintain a big and unmaintainable ad-hoc script repository, especially if you work in a collaborative way? In this case you have to rely on a tool designed for such a job. Furthermore, writing an ad-hoc script that works once isn’t often too difficult, but writing one that works correctly even if you run it over and over again is a lot harder, and idempotent code is an important part of modern infrastructure.

In my case, we also investigated different configuration management tools like Ansible, but they are mostly designed to install and manage software on existing servers. In the context of the cloud, in most cases the servers do not exist yet and you have to provision one (or many), including the related infrastructure like virtual network, virtual IP, virtual machine, disks and so forth … So here comes the big interest of using Terraform in such a scenario. Terraform is a HashiCorp product and comes in an open source and an Enterprise version. I’m using the open source one, version 0.12, at the moment of writing this blog post, as shown below:

$ terraform version
Terraform v0.12.0
+ provider.azurerm v1.31.0
+ provider.template v2.1.2

 

In my case, it may be very helpful to provision servers on Azure. Microsoft has already invested a lot to introduce Terraform as a provisioning tool for different Azure services, including Azure SQL databases with the azurerm_sql_database or azurerm_sql_server resources. But the story is not the same when it comes to SQL Server virtual machines on Azure. The only resource available to provision an Azure VM is azurerm_virtual_machine, but it doesn’t include the SQL Server configuration part. In this blog post I want to expose some challenges I faced to make it possible, which you probably want to be aware of 🙂

In fact, we have to rely on the azurerm_template_deployment resource in this case, but at the cost of some complexity. Indeed, you have to introduce the SQL Server template Microsoft.SqlVirtualMachine/SqlVirtualMachines into the Terraform files, with all the difficulties that implies. First, in my opinion, adding a SQL Server template file leads to additional complexity that is the opposite of what we expect from Terraform. We lose some code simplicity and maintainability, and this is by far my main disappointment. In addition, debugging a Terraform deployment with Azure templates can be a time-consuming task, especially if you have to deal with syntax errors; you often have to take a look at the Azure event logs directly to know what is happening. Second, the parameters are all of string type and we need to convert them to the right type with Azure template variables (the disk size, for instance, arrives as a string and is converted with [int(parameters('diskSqlSizeGB'))] in the template variables below). Finally, as far as I know, it is not possible to use arrays as input parameters within the Azure template, and refactoring of the initial server template is required accordingly. This is the case for the SQL Server dataDisks configuration parameter, for instance. I don’t know if Microsoft is planning something on this topic, but my expectation is to get something more “Terraform-integrated” in the future.

Let’s finish this blog post with the Terraform files used in my SQL Server VM Azure provisioning workflow, which is as follows:

Create a resource group => Create a virtual network + subnet => Create SQL Server Azure VM with custom configuration

  • The main configuration file. The resource section is understandable and, generally speaking, the configuration file remains maintainable enough here with the declarative / descriptive way of provisioning resources.
# =============== VARIABLES =============== #
variable "prefix" {
  type    = string
  default = "dbi"
}

variable "resourcegroup" {
  type = string
}

variable "location" {
  type    = string
  default = "westeurope"
}

variable "subscriptionId" {
  type = string
}

variable "virtualmachinename" {
  type = string
}

variable "virtualMachineSize" {
  type    = string
}

variable "adminUsername" {
  type = string
}

variable "adminUserPassword" {
  type = string
}

variable "image_ref_offer" {
  type = string
}

variable "image_ref_sku" {
  type = string
}

variable "image_ref_version" {
  type = string
}

variable "osDiskType" {
  type    = string
  default = "Premium_LRS"
}

variable "sqlVirtualMachineLocation" {
  type    = string
  default = "westeurope"
}

variable "sqlServerLicenseType" {
  type    = string
}

variable "sqlPortNumber" {
  type    = string
  default = "1433"
}

variable "sqlStorageDisksCount" {
  type    = string
  default = "1"
}

variable "diskSqlSizeGB" {
  type    = string
  default = "1024"
}

variable "sqlDisklType" {
  type    = string
  default = "Premium_LRS"
}

variable "sqlStorageWorkloadType" {
  type    = string
  default = "GENERAL"
}

variable "sqlAuthenticationLogin" {
  type = string
}

variable "sqlAuthenticationPassword" {
  type = string
}

variable "sqlConnectivityType" {
  type = string
}

variable "sqlAutopatchingDayOfWeek" {
  type    = string
  default = "Sunday"
}

variable "sqlAutopatchingStartHour" {
  type    = string
  default = "2"
}

variable "sqlAutopatchingWindowDuration" {
  type    = string
  default = "60"
}

variable "diagnosticsStorageAccountName" {
  type = string
}

variable "tag" {
  type = string
}

# =============== TEMPLATES =============== #
data "template_file" "sqlvm" {
  template = file("./Templates/sql_vm_azure_dbi.json")
}

# =============== RESOURCES =============== #
resource "azurerm_resource_group" "sqlvm" {
  name     = var.resourcegroup
  location = "West Europe"
}

resource "azurerm_virtual_network" "sqlvm" {
  name                = "${var.resourcegroup}-vnet"
  address_space       = ["172.20.0.0/24"]
  location            = azurerm_resource_group.sqlvm.location
  resource_group_name = azurerm_resource_group.sqlvm.name
}

resource "azurerm_subnet" "internal" {
  name                 = "default"
  resource_group_name  = var.resourcegroup
  virtual_network_name = azurerm_virtual_network.sqlvm.name
  address_prefix       = "172.20.0.0/24"
}

resource "azurerm_template_deployment" "sqlvm" {
  name                = "${var.prefix}-template"
  resource_group_name = azurerm_resource_group.sqlvm.name

  template_body = data.template_file.sqlvm.rendered

  #DEPLOY

  # =============== PARAMETERS =============== #
  parameters = {
    "location"                         = var.location                      # Location (westeurope by default)
    "networkInterfaceName"             = "${var.prefix}-${var.virtualmachinename}-interface" # Virtual machine interace name
    "enableAcceleratedNetworking"      = "true"                            # Enable Accelerating networking (always YES)
    "networkSecurityGroupName"         = "${var.prefix}-${var.virtualmachinename}-nsg" # NSG name (computed)
    "subnetName"                       = azurerm_subnet.internal.name      # Resource subnet
    "virtualNetworkId"                 = "/subscriptions/${var.subscriptionId}/resourceGroups/${var.resourcegroup}/providers/Microsoft.Network/virtualNetworks/${var.resourcegroup}-vnet"
    "publicIpAddressName"              = "${var.prefix}-${var.virtualmachinename}-ip" # Public IP Address name (computed)
    "publicIpAddressType"              = "Dynamic"                         # Public IP allocation (Dynamic, Static)
    "publicIpAddressSku"               = "Basic"                           # Public IP Address sku (None, Basic, Advanced)
    "virtualMachineName"               = "${var.prefix}-${var.virtualmachinename}" # Virtual machine name (computed)
    "virtualMachineRG"                 = var.resourcegroup                 # Resource group for resources
    "virtualMachineSize"               = var.virtualMachineSize            # Virtual machine size (Standard_DS13_v2)
    "image_ref_offer"                  = var.image_ref_offer               # SQL Server Image Offer (SQL2017-WS2016, ...)
    "image_ref_sku"                    = var.image_ref_sku                 # SQL Server Image SKU (SQLDEV, ...)
    "image_ref_version"                = var.image_ref_version             # SQL Server Image version (latest, <version number>)
    "adminUsername"                    = var.adminUsername                 # Virtual machine user name
    "adminUserPassword"                = var.adminUserPassword             # Virtual machine user password
    "osDiskType"                       = var.osDiskType                    # OS Disk type (Premium_LRS by default)
    "sqlDisklType"                     = var.sqlDisklType                  # SQL Disk type Premium_LRS by default)
    "diskSqlSizeGB"                    = var.diskSqlSizeGB                 # SQL Disk size (GB)
    "diagnosticsStorageAccountName"    = var.diagnosticsStorageAccountName # Diagnostics info - storage account name
    "diagnosticsStorageAccountId"      = "/subscriptions/${var.subscriptionId}/resourceGroups/${var.resourcegroup}/providers/Microsoft.Storage/storageAccounts/${var.diagnosticsStorageAccountName}" # Storage account must exist
    "diagnosticsStorageAccountType"    = "Standard_LRS"                    # Diagnostics info - storage account type
    "diagnosticsStorageAccountKind"    = "Storage"                         # Diagnostics info - storage type
    "sqlVirtualMachineLocation"        = var.sqlVirtualMachineLocation     # Virtual machine location
    "sqlVirtualMachineName"            = "${var.prefix}-${var.virtualmachinename}" # Virtual machine name
    "sqlServerLicenseType"             = var.sqlServerLicenseType          # SQL Server license type. - PAYG or AHUB
    "sqlConnectivityType"              = var.sqlConnectivityType           # LOCAL, PRIVATE, PUBLIC
    "sqlPortNumber"                    = var.sqlPortNumber                 # SQL listen port
    "sqlStorageDisksCount"             = var.sqlStorageDisksCount          # Nb of SQL disks to provision
    "sqlStorageWorkloadType"           = var.sqlStorageWorkloadType        # Workload type GENERAL, OLTP, DW
    "sqlStorageDisksConfigurationType" = "NEW"                             # Configuration type NEW 
    "sqlStorageStartingDeviceId"       = "2"                               # Storage starting device id => Always 2
    "sqlStorageDeploymentToken"        = "8528"                            # Deployment Token
    "sqlAutopatchingDayOfWeek"         = var.sqlAutopatchingDayOfWeek      # Day of week to apply the patch on. - Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
    "sqlAutopatchingStartHour"         = var.sqlAutopatchingStartHour      # Hour of the day when patching is initiated. Local VM time
    "sqlAutopatchingWindowDuration"    = var.sqlAutopatchingWindowDuration # Duration of patching
    "sqlAuthenticationLogin"           = var.sqlAuthenticationLogin        # Login SQL
    "sqlAuthUpdatePassword"            = var.sqlAuthenticationPassword     # Login SQL Password
    "rServicesEnabled"                 = "false"                           # No need to enable R services
    "tag"                              = var.tag                           # Resource tags
  }

  deployment_mode = "Incremental"                                          # Deployment => incremental (complete is too destructive in our case) 
}

  • The SQL Server template, customized from the SqlVirtualMachine/SqlVirtualMachines template. As said previously, this is the hardest part of the deployment. I hope to see it removed in the future!
{
    "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "location": {
            "type": "string"
        },
        "networkInterfaceName": {
            "type": "string"
        },
        "enableAcceleratedNetworking": {
            "type": "string"
        },
        "networkSecurityGroupName": {
            "type": "string"
        },
        "subnetName": {
            "type": "string"
        },
        "virtualNetworkId": {
            "type": "string"
        },
        "publicIpAddressName": {
            "type": "string"
        },
        "publicIpAddressType": {
            "type": "string"
        },
        "publicIpAddressSku": {
            "type": "string"
        },
        "virtualMachineName": {
            "type": "string"
        },
        "virtualMachineRG": {
            "type": "string"
        },
        "osDiskType": {
            "type": "string"
        },
        "virtualMachineSize": {
            "type": "string"
        },
        "image_ref_offer": {
            "type": "string"
        },
        "image_ref_sku": {
            "type": "string"
        },
        "image_ref_version": {
            "type": "string"
        },
        "adminUsername": {
            "type": "string"
        },
        "adminUserPassword": {
            "type": "string"
        },
        "diagnosticsStorageAccountName": {
            "type": "string"
        },
        "diagnosticsStorageAccountId": {
            "type": "string"
        },
        "diagnosticsStorageAccountType": {
            "type": "string"
        },
        "diagnosticsStorageAccountKind": {
            "type": "string"
        },
        "sqlVirtualMachineLocation": {
            "type": "string"
        },
        "sqlVirtualMachineName": {
            "type": "string"
        },
        "sqlServerLicenseType": {
            "type": "string"  
        },
        "sqlConnectivityType": {
            "type": "string"
        },
        "sqlPortNumber": {
            "type": "string"
        },
        "sqlStorageDisksCount": {
            "type": "string"
        },
        "sqlDisklType": {
            "type": "string"
        },
        "diskSqlSizeGB": {
            "type": "string"
        },
        "sqlStorageWorkloadType": {
            "type": "string"
        },
        "sqlStorageDisksConfigurationType": {
            "type": "string"
        },
        "sqlStorageStartingDeviceId": {
            "type": "string"
        },
        "sqlStorageDeploymentToken": {
            "type": "string"
        },
        "sqlAutopatchingDayOfWeek": {
            "type": "string"
        },
        "sqlAutopatchingStartHour": {
            "type": "string"
        },
        "sqlAutopatchingWindowDuration": {
            "type": "string"
        },
        "sqlAuthenticationLogin": {
            "type": "string"
        },
        "sqlAuthUpdatePassword": {
            "type": "string"
        },
        "rServicesEnabled": {
            "type": "string"
        },
        "tag": {
            "type": "string"
        }
    },
    "variables": {
        "nsgId": "[resourceId(resourceGroup().name, 'Microsoft.Network/networkSecurityGroups', parameters('networkSecurityGroupName'))]",
        "vnetId": "[parameters('virtualNetworkId')]",
        "subnetRef": "[concat(variables('vnetId'), '/subnets/', parameters('subnetName'))]",
        "dataDisks": [
            {
                    "lun": "0",
                    "createOption": "empty",
                    "caching": "ReadOnly",
                    "writeAcceleratorEnabled": false,
                    "id": null,
                    "name": null,
                    "storageAccountType": "[parameters('sqlDisklType')]",
                    "diskSizeGB": "[int(parameters('diskSqlSizeGB'))]"
            }
        ]
    },
    "resources": [
        {
            "name": "[parameters('networkInterfaceName')]",
            "type": "Microsoft.Network/networkInterfaces",
            "apiVersion": "2018-10-01",
            "location": "[parameters('location')]",
            "dependsOn": [
                "[concat('Microsoft.Network/networkSecurityGroups/', parameters('networkSecurityGroupName'))]",
                "[concat('Microsoft.Network/publicIpAddresses/', parameters('publicIpAddressName'))]"
            ],
            "properties": {
                "ipConfigurations": [
                    {
                        "name": "ipconfig1",
                        "properties": {
                            "subnet": {
                                "id": "[variables('subnetRef')]"
                            },
                            "privateIPAllocationMethod": "Dynamic",
                            "publicIpAddress": {
                                "id": "[resourceId(resourceGroup().name, 'Microsoft.Network/publicIpAddresses', parameters('publicIpAddressName'))]"
                            }
                        }
                    }
                ],
                "enableAcceleratedNetworking": "[parameters('enableAcceleratedNetworking')]",
                "networkSecurityGroup": {
                    "id": "[variables('nsgId')]"
                }
            },
            "tags": {
                "Environment": "[parameters('tag')]"
            }
        },
        {
            "name": "[parameters('networkSecurityGroupName')]",
            "type": "Microsoft.Network/networkSecurityGroups",
            "apiVersion": "2019-02-01",
            "location": "[parameters('location')]",
            "properties": {
                "securityRules": [
                    {
                        "name": "RDP",
                        "properties": {
                            "priority": 300,
                            "protocol": "TCP",
                            "access": "Allow",
                            "direction": "Inbound",
                            "sourceAddressPrefix": "*",
                            "sourcePortRange": "*",
                            "destinationAddressPrefix": "*",
                            "destinationPortRange": "3389"
                        }
                    }
                ]
            },
            "tags": {
                "Environment": "[parameters('tag')]"
            }
        },
        {
            "name": "[parameters('publicIpAddressName')]",
            "type": "Microsoft.Network/publicIpAddresses",
            "apiVersion": "2019-02-01",
            "location": "[parameters('location')]",
            "properties": {
                "publicIpAllocationMethod": "[parameters('publicIpAddressType')]"
            },
            "sku": {
                "name": "[parameters('publicIpAddressSku')]"
            },
            "tags": {
                "Environment": "[parameters('tag')]"
            }
        },
        {
            "name": "[parameters('virtualMachineName')]",
            "type": "Microsoft.Compute/virtualMachines",
            "apiVersion": "2018-10-01",
            "location": "[parameters('location')]",
            "dependsOn": [
                "[concat('Microsoft.Network/networkInterfaces/', parameters('networkInterfaceName'))]",
                "[concat('Microsoft.Storage/storageAccounts/', parameters('diagnosticsStorageAccountName'))]"
            ],
            "properties": {
                "hardwareProfile": {
                    "vmSize": "[parameters('virtualMachineSize')]"
                },
                "storageProfile": {
                    "osDisk": {
                        "createOption": "fromImage",
                        "managedDisk": {
                            "storageAccountType": "[parameters('osDiskType')]"
                        }
                    },
                    "imageReference": {
                        "publisher": "MicrosoftSQLServer",
                        "offer": "[parameters('image_ref_offer')]",
                        "sku": "[parameters('image_ref_sku')]",
                        "version": "[parameters('image_ref_version')]"
                    },
                    "copy": [
                        {
                            "name": "dataDisks",
                            "count": "[length(variables('dataDisks'))]",
                            "input": {
                                "lun": "[variables('dataDisks')[copyIndex('dataDisks')].lun]",
                                "createOption": "[variables('dataDisks')[copyIndex('dataDisks')].createOption]",
                                "caching": "[variables('dataDisks')[copyIndex('dataDisks')].caching]",
                                "writeAcceleratorEnabled": "[variables('dataDisks')[copyIndex('dataDisks')].writeAcceleratorEnabled]",
                                "diskSizeGB": "[variables('dataDisks')[copyIndex('dataDisks')].diskSizeGB]",
                                "managedDisk": {
                                    "id": "[coalesce(variables('dataDisks')[copyIndex('dataDisks')].id, if(equals(variables('dataDisks')[copyIndex('dataDisks')].name, json('null')), json('null'), resourceId('Microsoft.Compute/disks', variables('dataDisks')[copyIndex('dataDisks')].name)))]",
                                    "storageAccountType": "[variables('dataDisks')[copyIndex('dataDisks')].storageAccountType]"
                                }
                            }
                        }
                    ]
                },
                "networkProfile": {
                    "networkInterfaces": [
                        {
                            "id": "[resourceId('Microsoft.Network/networkInterfaces', parameters('networkInterfaceName'))]"
                        }
                    ]
                },
                "osProfile": {
                    "computerName": "[parameters('virtualMachineName')]",
                    "adminUsername": "[parameters('adminUsername')]",
                    "adminPassword": "[parameters('adminUserPassword')]",
                    "windowsConfiguration": {
                        "enableAutomaticUpdates": true,
                        "provisionVmAgent": true
                    }
                },
                "licenseType": "Windows_Server",
                "diagnosticsProfile": {
                    "bootDiagnostics": {
                        "enabled": true,
                        "storageUri": "[concat('https://', parameters('diagnosticsStorageAccountName'), '.blob.core.windows.net/')]"
                    }
                }
            },
            "tags": {
                "Environment": "[parameters('tag')]"
            }
        },
        {
            "name": "[parameters('diagnosticsStorageAccountName')]",
            "type": "Microsoft.Storage/storageAccounts",
            "apiVersion": "2018-07-01",
            "location": "[parameters('location')]",
            "properties": {},
            "sku": {
                "name": "[parameters('diagnosticsStorageAccountType')]"
            },
            "tags": {
                "Environment": "[parameters('tag')]"
            }
        },
        {
            "name": "[parameters('sqlVirtualMachineName')]",
            "type": "Microsoft.SqlVirtualMachine/SqlVirtualMachines",
            "apiVersion": "2017-03-01-preview",
            "location": "[parameters('sqlVirtualMachineLocation')]",
            "properties": {
                "virtualMachineResourceId": "[resourceId('Microsoft.Compute/virtualMachines', parameters('sqlVirtualMachineName'))]",
                "sqlServerLicenseType": "[parameters('sqlServerLicenseType')]",
                "AutoPatchingSettings": {
                    "Enable": true,
                    "DayOfWeek": "[parameters('sqlAutopatchingDayOfWeek')]",
                    "MaintenanceWindowStartingHour": "[parameters('sqlAutopatchingStartHour')]",
                    "MaintenanceWindowDuration": "[parameters('sqlAutopatchingWindowDuration')]"
                },
                "KeyVaultCredentialSettings": {
                    "Enable": false,
                    "CredentialName": ""
                },
                "ServerConfigurationsManagementSettings": {
                    "SQLConnectivityUpdateSettings": {
                        "ConnectivityType": "[parameters('sqlConnectivityType')]",
                        "Port": "[parameters('sqlPortNumber')]",
                        "SQLAuthUpdateUserName": "[parameters('sqlAuthenticationLogin')]",
                        "SQLAuthUpdatePassword": "[parameters('sqlAuthUpdatePassword')]"
                    },
                    "SQLWorkloadTypeUpdateSettings": {
                        "SQLWorkloadType": "[parameters('sqlStorageWorkloadType')]"
                    },
                    "SQLStorageUpdateSettings": {
                        "DiskCount": "[parameters('sqlStorageDisksCount')]",
                        "DiskConfigurationType": "[parameters('sqlStorageDisksConfigurationType')]",
                        "StartingDeviceID": "[parameters('sqlStorageStartingDeviceId')]"
                    },
                    "AdditionalFeaturesServerConfigurations": {
                        "IsRServicesEnabled": "[parameters('rServicesEnabled')]"
                    }
                }
            },
            "dependsOn": [
                "[resourceId('Microsoft.Compute/virtualMachines', parameters('sqlVirtualMachineName'))]"
            ],
            "tags": {
                "Environment": "[parameters('tag')]"
            }
        }
    ],
    "outputs": {
        "adminUsername": {
            "type": "string",
            "value": "[parameters('adminUsername')]"
        }
    }
}

 

  • The terraform.tfvars file. This file (with sensitive parameters) should obviously never be checked into source control.
# VM configuration
subscriptionId="xxxxxxx"
resourcegroup="external-rg"
virtualMachineSize="Standard_DS13_v2"
virtualmachinename="sql1"
adminUsername="clustadmin"
adminUserPassword="xxxxxx"
diagnosticsStorageAccountName="dab"
tag="SQLDEV"

# Image reference configuration
image_ref_offer="SQL2017-WS2016"
image_ref_sku="SQLDEV"
image_ref_version="latest"

# SQL configuration
sqlServerLicenseType="PAYG"
sqlAuthenticationLogin="sqladmin"
sqlAuthenticationPassword="xxxxx"
sqlConnectivityType="Public"
diskSqlSizeGB="1024"
sqlStorageWorkloadType="OLTP"
sqlPortNumber="5040"
sqlAutopatchingDayOfWeek="Sunday"
sqlAutopatchingStartHour="2"
sqlAutopatchingWindowDuration="60"

 

It is up to you to customize this template for your own purposes. Let’s deploy it:

$ terraform refresh --var-file=vm.tfvars
data.template_file.sqlvm: Refreshing state...

azurerm_template_deployment.sqlvm: Creating...
azurerm_template_deployment.sqlvm: Still creating... [10s elapsed]
azurerm_template_deployment.sqlvm: Still creating... [20s elapsed]
azurerm_template_deployment.sqlvm: Still creating... [30s elapsed]
azurerm_template_deployment.sqlvm: Still creating... [40s elapsed]
...

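For reference, a more typical end-to-end run of this configuration chains the usual Terraform commands (a sketch; vm.tfvars is the variable file shown above):

$ terraform init
$ terraform plan -var-file=vm.tfvars
$ terraform apply -var-file=vm.tfvars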
 

A few minutes later, my SQL Server VM on Azure is provisioned:

$ az resource list --tag Environment=SQLDEV --query "[].{resource:resourceGroup,name:name,location:location,Type:type}" --out table
Resource     Name                                                Location    Type
-----------  --------------------------------------------------  ----------  ----------------------------------------------
EXTERNAL-RG  dbi-sql1_disk2_1cacce801795410fb406844cfb1f9317     westeurope  Microsoft.Compute/disks
EXTERNAL-RG  dbi-sql1_OsDisk_1_78558bdf9a9648f29d78cfbb36d9bed9  westeurope  Microsoft.Compute/disks
external-rg  dbi-sql1                                            westeurope  Microsoft.Compute/virtualMachines
external-rg  dbi-sql1-interface                                  westeurope  Microsoft.Network/networkInterfaces
external-rg  dbi-sql1-nsg                                        westeurope  Microsoft.Network/networkSecurityGroups
external-rg  dbi-sql1-ip                                         westeurope  Microsoft.Network/publicIPAddresses
external-rg  dbi-sql1                                            westeurope  Microsoft.SqlVirtualMachine/SqlVirtualMachines
external-rg  dab                                                 westeurope  Microsoft.Storage/storageAccounts

 

Happy deployment !!

 

The post Deploying SQL Server on Azure virtual machine with Terraform appeared first on Blog dbi services.

SQL Server containers and docker network driver performance considerations


A few months ago I attended Franck Pachot’s session about microservices and databases at SOUG Romandie in Lausanne on May 21st, 2019. He covered some performance challenges that can be introduced by a microservices architecture design, especially when database components come into the game with chatty applications. One year ago, I was in a situation where a customer installed some SQL Server 2017 Linux containers in a Docker infrastructure with user applications located outside of this infrastructure. It is likely an uncommon way to start with containers, but anyway, when you immerse yourself in the Docker world you quickly notice there are a lot of network drivers and considerations you should be aware of. Just for the sake of curiosity, I proposed to my customer to perform some network benchmark tests to get a clear picture of these network drivers and their related overhead, in order to design the Docker infrastructure correctly from a performance standpoint.

The initial customer scenario included a standalone Docker infrastructure and we considered different approaches to the application network configuration from a performance perspective. We did the same for the second scenario, which concerned a Docker Swarm infrastructure we installed in a second step.

The Initial reference – Host network and Docker host network

The first point was to get an initial reference with no network management overhead, measured directly from the network host. We used the iperf3 tool for the tests. This is the kind of tool I also use with virtual environments to ensure the network throughput is what we really expect, and I have sometimes had surprises on this topic. So, back to the container world: each test was performed from a Linux host outside of the Docker infrastructure concerned, in line with the customer scenario.

The link speed of the network card attached to the Docker host is supposed to be 10 GBits/sec …

$ sudo ethtool eth0 | grep "Speed"
        Speed: 10000Mb/s

 

… and it is confirmed by the first iperf3 output below:

Let’s say that we tested the Docker host driver as well and we got similar results.

$ docker run  -it --rm --name=iperf3-server  --net=host networkstatic/iperf3 -s
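
The client side of each test is then a plain iperf3 run from the external Linux host against the published port (5201 by default for the host network scenario, 5204 for the bridge scenario below); a minimal sketch with placeholder values:

$ iperf3 -c <docker_host_ip> -p 5201 -t 30 -P 4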

 

Docker bridge mode

The default modus operandi for a Docker host is to create a virtual ethernet bridge (called docker0), attach each container’s network interface to the bridge, and use network address translation (NAT) when containers need to make themselves visible to the Docker host and beyond. Unless specified otherwise, a Docker container will use it by default, and this is exactly the network driver used by the containers in the context of my customer. In fact, we used a user-defined bridge network, but I would say it doesn’t matter for the tests we performed here.
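
As a side note, a user-defined bridge network is created and used like this (a sketch with a hypothetical network name):

$ docker network create --driver bridge my-app-net
$ docker run -d --name=iperf3-server --net=my-app-net -p 5204:5201 networkstatic/iperf3 -s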

$ ip addr show docker0
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:70:0a:e8:7a brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:70ff:fe0a:e87a/64 scope link
       valid_lft forever preferred_lft forever

 

The iperf3 docker container I ran for my tests is using the default bridge network as shown below. The interface with index 24 corresponds to the veth0bfc2dc peer of the concerned container.

$ docker run  -d --name=iperf3-server -p 5204:5201 networkstatic/iperf3 -s
…
$ docker ps | grep iperf
5c739940e703        networkstatic/iperf3              "iperf3 -s"              38 minutes ago      Up 38 minutes                0.0.0.0:5204->5201/tcp   iperf3-server
$ docker exec -ti 5c7 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
24: eth0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

[clustadmin@docker1 ~]$ ethtool -S veth0bfc2dc
NIC statistics:
     peer_ifindex: 24

 

Here is the output after running the iperf3 benchmark:

It’s worth noting that the bridge network adds some overhead, with an impact of 13% in my tests, but this is in fact an expected outcome, especially if we refer to the Docker documentation:

Compared to the default bridge mode, the host mode gives significantly better networking performance since it uses the host’s native networking stack whereas the bridge has to go through one level of virtualization through the docker daemon.

 

When the docker-proxy comes into play

The next scenario we wanted to test concerned the closest network proximity we may have between the user applications and the SQL Server containers in the Docker infrastructure. In other words, we assumed the application resides on the same host as the SQL Server container, and we got some surprises from the docker-proxy itself.

Before looking at the iperf3 result, I think we have to answer the million-dollar question: what is the docker-proxy? Have you ever paid attention to this process on your Docker host? Let's run a pstree command:

$ pstree
systemd─┬─NetworkManager───2*[{NetworkManager}]
        ├─agetty
        ├─auditd───{auditd}
        ├─containerd─┬─containerd-shim─┬─npm─┬─node───9*[{node}]
        │            │                 │     └─9*[{npm}]
        │            │                 └─12*[{containerd-shim}]
        │            ├─containerd-shim─┬─registry───9*[{registry}]
        │            │                 └─10*[{containerd-shim}]
        │            ├─containerd-shim─┬─iperf3
        │            │                 └─9*[{containerd-shim}]
        │            └─16*[{containerd}]
        ├─crond
        ├─dbus-daemon
        ├─dockerd─┬─docker-proxy───7*[{docker-proxy}]
        │         └─20*[{dockerd}]

 

Well, if I understand the Docker documentation correctly, the purpose of this process is to enable a service consumer to communicate with the container providing the service… but it is only used in particular circumstances. Just bear in mind that controlling access to a container's service is mostly done through the host netfilter framework, in both the NAT and filter tables, and the docker-proxy mechanism is required only when this method of control is not available:

  • When the Docker daemon is started with --iptables=false or --ip-forward=false, or when the Linux host cannot act as a router because the kernel parameter ipv4.ip_forward is set to 0. This is not my case here.
  • When you use localhost in the connection string of your application, which implies using the loopback interface (127.0.0.0/8), and the kernel doesn't allow routing traffic from it. Therefore, it's not possible to apply netfilter NAT rules and instead, netfilter sends packets through the filter table's INPUT chain to a local process listening on the port: the docker-proxy
$ sudo iptables -L -n -t nat | grep 127.0.0.0
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

 

In the picture below you will notice I'm using the localhost keyword in my connection string, so the docker-proxy comes into play.

This is a huge performance impact for sure, about 28%. The performance drop may be explained by the fact that the docker-proxy process is consuming 100% of my CPUs:

The docker-proxy operates in userland and I may simply disable it with the Docker daemon parameter "userland-proxy": false, but I would say this is a case we would not encounter in practice, because applications will never use localhost in their connection strings. By the way, changing the connection string from localhost to the IP address of the container host gives a very different outcome, similar to the Docker bridge network scenario.
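For completeness, disabling the userland proxy is typically done in /etc/docker/daemon.json followed by a daemon restart; a minimal sketch, assuming no other options are already defined in that file:

{
  "userland-proxy": false
}

$ sudo systemctl restart docker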

 

Using an overlay network

Using a single Docker host doesn't fit well with HA or scalability requirements, and in a mission-critical environment I strongly doubt any customer will go this way. I recommended that my customer consider using an orchestrator like Docker Swarm or K8s to anticipate the container workload coming from future projects. The customer picked Docker Swarm for its easier implementation compared to K8s.

 

After implementing a proof of concept for testing purposes (3 nodes: one manager and two worker nodes), we took the opportunity to measure the potential overhead implied by the overlay network, which is the common driver used by containers through stacks and services in such a situation. Referring to the Docker documentation, overlay networks manage communications among the Docker daemons participating in the swarm and are used by the services deployed on it. Here are the Docker nodes in the swarm infrastructure:

$ docker node ls
ID                            HOSTNAME                    STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
vvdofx0fjzcj8elueoxoh2irj *   docker1.dbi-services.test   Ready               Active              Leader              18.09.5
njq5x23dw2ubwylkc7n6x63ly     docker2.dbi-services.test   Ready               Active                                  18.09.5
ruxyptq1b8mdpqgf0zha8zqjl     docker3.dbi-services.test   Ready               Active                                  18.09.5

 

An ingress overlay network is created by default when setting up a swarm cluster. User-defined overlay networks may be created afterwards and extend to the other nodes only when needed by containers.
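For reference, a user-defined overlay network would be created on a manager node as sketched below; the network name is an assumption and we did not create one for these tests, which is why the listing that follows only shows the default ingress network. The --attachable flag is only needed when standalone containers, not swarm services, must join it.

$ docker network create --driver overlay --attachable demo-overlay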

$ docker network ls | grep overlay
NETWORK ID    NAME              DRIVER   SCOPE
ehw16ycy980s  ingress           overlay  swarm

 

Here is the result of the iperf3 benchmark:

Well, this is the same result as the previous test, with roughly a 30% performance drop. Compared to the initial reference this is again an expected outcome, but I didn't imagine how big the impact could be in such a case. The overlay network introduces additional overhead by putting together, behind the scenes, a VXLAN tunnel (a virtual Layer 2 network on top of an existing Layer 3 infrastructure), VTEP endpoints for the encapsulation/de-encapsulation work, and traffic encryption (the swarm management traffic is encrypted by default; the data traffic can optionally be encrypted as well).

Here is a summary of the different scenarios and their performance impact:

Scenario                  Throughput (Gbit/s)   Performance ratio (vs host)
Host network              10.3                  1.00
Docker host network       10.3                  1.00
Docker bridge network     8.93                  0.87
Docker proxy              7.37                  0.71
Docker overlay network    7.04                  0.68

 

In the particular case of my customer, where the SQL Server instances sit on the Docker infrastructure and the applications reside outside of it, it's clear that using the Docker host network directly may be a good option from a performance standpoint, assuming this infrastructure remains simple with only a few SQL Server containers. In this case, however, we have to change the SQL Server default listen port with the MSSQL_TCP_PORT parameter, because Docker host networking doesn't provide port mapping capabilities. According to our tests, we didn't get any evidence of a difference in application response time between the Docker network drivers, probably because those applications are not network bound here, but I can imagine scenarios where they would be. Finally, the scenario encountered here is likely uncommon; I more often see containerized apps with the database components outside of the Docker infrastructure, but that doesn't change the game at all and the same considerations apply. I'm now very curious to test real microservices scenarios where the database and application components all sit on a Docker infrastructure.
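To illustrate the host-network option mentioned above, running a SQL Server container with a custom listen port might look like the sketch below; the image tag, port and password are placeholders/assumptions, not the customer's values:

$ docker run -d --name=mssql-host --net=host \
    -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=Expect0Strong!Pwd" \
    -e "MSSQL_TCP_PORT=1455" \
    mcr.microsoft.com/mssql/server:2017-latest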

See you!

 

The article SQL Server containers and docker network driver performance considerations appeared first on Blog dbi services.


Telling the PostgreSQL optimizer more about your functions


When you reference or call functions in PostgreSQL, the optimizer does not really know much about their cost nor the number of rows they return. This is not really surprising, as it is hard to predict what a function is doing and how many rows it will return for a given set of input parameters. What you might not know is that you can indeed tell the optimizer a bit more about your functions.

As usual let’s start with a little test setup:

postgres=# create table t1 ( a int, b text, c date );
CREATE TABLE
postgres=# insert into t1 select a,a::text,now() from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create unique index i1 on t1(a);
CREATE INDEX
postgres=# analyze t1;
ANALYZE

A simple table containing 1’000’000 rows and one unique index. In addition let’s create a simple function that will return exactly one row from that table:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql;

What is the optimizer doing when you call that function?

postgres=# explain (analyze) select f_tmp (1);
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..5.27 rows=1000 width=32) (actual time=0.654..0.657 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.047 ms
 Execution Time: 0.696 ms
(4 rows)

We know that only one row will be returned, but the optimizer assumes that 1000 rows will be returned. This is the documented default: no matter how many rows are really returned, PostgreSQL will always estimate 1000. But you have some control and can tell the optimizer that the function will return one row only:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1;

Looking at the execution plan again:

postgres=# explain (analyze) select f_tmp (1);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..0.27 rows=1 width=32) (actual time=0.451..0.454 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.068 ms
 Execution Time: 0.503 ms
(4 rows)

Instead of 1000 rows we now see that only 1 row was estimated, which is what we specified when we created the function. Of course this is a very simple example and in reality you often might not be able to tell exactly how many rows will be returned from a function. But at least you can provide a better estimate than the default of 1000. In addition you can also specify a cost for your function (based on cpu_operator_cost):

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1
   cost 1;
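If you want to check what the optimizer currently knows about a function, the estimates are stored in the catalog and can be queried like this:

postgres=# select proname, prorows, procost from pg_proc where proname = 'f_tmp';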

If you use set-returning functions, remember that you can give the optimizer more information and that the default estimate is 1000 rows.

The article Telling the PostgreSQL optimizer more about your functions appeared first on Blog dbi services.

Converting columns from one data type to another in PostgreSQL


Usually you should use the data type that best fits the representation of your data in a relational database. But how many times have you seen applications that store dates or numbers as text, or dates as integers? This is not as uncommon as you might think, and fixing it can be quite a challenge: you need to cast from one data type to another when you want to change the data type used for a specific column. Depending on the current format of the data it might be easy to fix, or it might become more complicated. PostgreSQL has a quite clever way of doing that.

Frequent readers of our blog might know that already: We start with a simple, reproducible test setup:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

What do we have here? A simple table with two columns: column "a" is an integer and column "b" is of type text. For humans it seems obvious that the second column in reality contains a date, but stored as text. What options do we have to fix that? We could try something like this:

postgres=# alter table t1 add column c date default (to_date(b,'YYYYMMDD'));
psql: ERROR:  cannot use column reference in DEFAULT expression

That obviously does not work. Another option would be to add another column with the correct data type, populate that column and then drop the original one:

postgres=# alter table t1 add column c date;
ALTER TABLE
postgres=# update t1 set c = to_date(b,'YYYYMMDD');
UPDATE 3
postgres=# alter table t1 drop column b;
ALTER TABLE

But what is the downside of that? This will probably break the application because the column name changed, and there is no way to avoid that. Is there a better way of doing it? Let's start from scratch:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

The same setup as before. What other options do we have to convert "b" to a real date without changing the name of the column? Let's try the most obvious way and let PostgreSQL decide what to do:

postgres=# alter table t1 alter column b type date;
psql: ERROR:  column "b" cannot be cast automatically to type date
HINT:  You might need to specify "USING b::date".

This does not work as PostgreSQL in this case can not know how to go from one data type to another. But the “HINT” does already tell us what we might need to do:

postgres=# alter table t1 alter column b type date using (b::date);
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# 

For our data in the "b" column that does work. But consider you have data like this:

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'01-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (2,'02-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (3,'03-JAN-2019');
INSERT 0 1
postgres=# select * from t1;
 a |      b      
---+-------------
 1 | 01-JAN-2019
 2 | 02-JAN-2019
 3 | 03-JAN-2019
(3 rows)

Would that still work?

postgres=# alter table t1 alter column b type date using (b::date);;
ALTER TABLE
postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-02
 3 | 2019-01-03
(3 rows)

Yes, but in this case it will not:

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'First--January--19');
INSERT 0 1
postgres=# insert into t1 values (2,'Second--January--19');
INSERT 0 1
postgres=# insert into t1 values (3,'Third--January--19');
INSERT 0 1
postgres=# select * from t1;
 a |          b           
---+---------------------
 1 | First--January--19
 2 | Second--January--19
 3 | Third--January--19
(3 rows)

postgres=# alter table t1 alter column b type date using (b::date);;
psql: ERROR:  invalid input syntax for type date: "First--January--19"
postgres=# 

As PostgreSQL has no idea how to do the conversion, this will fail, no surprise here. But you still have the power to do it by providing a function that does the conversion in exactly the way you want it:

create or replace function f_convert_to_date ( pv_text in text ) returns date
as $$
declare
begin
  return date('20190101');
end;
$$ language plpgsql;

Of course you would add logic to parse the input string so that the function returns the matching date and not a constant as in this example.
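Purely as an illustration, such parsing logic for this particular format could look something like the sketch below. This is a hypothetical helper (it only understands the three ordinal words used above) and is not part of the original example:

create or replace function f_parse_weird_date ( pv_text in text ) returns date
as $$
declare
  lv_day   int;
  lv_month text;
  lv_year  text;
begin
  -- split 'First--January--19' into its three parts
  lv_month := split_part(pv_text,'--',2);
  lv_year  := split_part(pv_text,'--',3);
  lv_day   := case split_part(pv_text,'--',1)
                when 'First'  then 1
                when 'Second' then 2
                when 'Third'  then 3
              end;
  return to_date(lv_day||'-'||lv_month||'-20'||lv_year,'DD-Month-YYYY');
end;
$$ language plpgsql;

For demonstration purposes, however, we will stick with the fake function: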

postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));;
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-01
 3 | 2019-01-01
(3 rows)

… and here we go. The column was converted from text to date and we provided the exact way of doing that by calling a function that contains the logic for it. As long as the output of the function conforms to the data type you want and you did not make any mistakes, you can potentially go from any source data type to any target data type.

There is one remaining question: Will that block other sessions selecting from the table while the conversion is ongoing?

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 select a, '20190101' from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create index i1 on t1(a);
CREATE INDEX

In one session we will do the conversion and in the other session we will do a simple select that goes over the index:

-- first session
postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));

The second one at the same time:

-- second session
postgres=# select * from t1 where a = 1;
-- blocks

Yes, that will block, so you should plan such actions carefully on a busy system. But this is still better than adding a new column.

The article Converting columns from one data type to another in PostgreSQL appeared first on Blog dbi services.

What you should measure on your database storage and why


How can you do good capacity and performance management for your databases when you do not know what your storage system is capable of and what you need?
In this article, we discuss a way to test a storage system from an Oracle database point of view.

Before you read

This is a series of different blog posts:
First, we talk about “What you should measure on your database storage and why” aka this post 😉
The second blog post will talk about “How to do database storage performance benchmark with FIO”.
The third blog post will show “How good is the new HUAWEI Dorado 6000V3 All-Flash System for databases” measured with the methods and tools from posts one and two.

The first two posts give you the theory to understand all the graphics and numbers I will show in the third blog post.

Often when I arrive at a customer, they show me their new storage system and point out how fast and powerful it is. When I ask for performance specs or performance baselines, I often hear that it is very quick because it is all-flash, but no facts & figures. A lot of all-flash systems provide suitable performance, but what is suitable for your business, and how do you know what you can build on when you do not know how solid your foundation is?

Oracle I/O types

The “normal” traffic an Oracle database creates has the same size as the database block size.
Most databases (I would say >95%) use an 8 KByte block size.

All multiblock I/O from Oracle (full table scans, backups, duplicates, etc.) has a block size of 1024 KByte.
This is configurable via db_file_multiblock_read_count: yes, you can configure a multiblock I/O to be smaller than 1024 KByte, but normally this does not make sense on modern storage systems, and RMAN backup/restore and duplicate operations use a 1 MByte block size independently of the value of this parameter.

We should test 8 KByte and 1024 KByte and of course a mix of both, because a normal workload will always be a mixture of these two.
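To check which values apply to your own database, the relevant parameters are quickly visible from SQL*Plus:

SQL> show parameter db_block_size
SQL> show parameter db_file_multiblock_read_count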

But this is just half the truth:
Some other questions a good storage performance test should answer are:
-How many devices should I bundle into a disk group for best performance?
-How many backups/duplicates can I run in parallel to my normal database workload without interfering with it?
-What is the best rebalance power I can use on my system?

What tool to use?

We should do the tests with at least 2 different tools. This ensures the correctness of the results.
Also, the performance view within the storage should show the same values as the test tool.
The last point sounds easy but often is not. E.g. when you are in a virtualized environment with OVM before version 3.3 and you generate 1000 IOPS @ 1 MByte, you would expect to see 1000 IOPS@1MByte on the server and storage side (assuming your storage can handle that), BUT you will see approx. 24k IOPS@42.665 KByte block size. This is because the virtual disk handler breaks the I/O request into this block size.

See here the answer from the Oracle support about this behavior:
OVM blocksize

On OVM 3.4 this value changed to 128 KByte so every 1 MByte IO is split up into 8 IOPS. Better than before, but still room for improvement.

Because (normally) one server can not fully utilize a powerful all-flash storage system, we should run the tests in parallel from multiple servers.
This means we need a tool which can run multiple tests on multiple servers, at best in just one run.
Also, that tool has to record not just the IOPS but also the service time (latency).
Because what good are a few 100k IOPS when the latency is bad?

I worked with iozone for the last few years, but it has two drawbacks:
-It does not measure the service time.
-There is no option to run a test on multiple servers (ok, manually, but that is not really accurate…)

The best tool for these test cases IMHO is FIO.
Some of my colleagues already talked about it in former blog posts:
FIO (Flexible I/O) – a benchmark tool for any operating system by Gregory Steulet

Simulating database-like I/O activity with Flexible I/O by Yann Neuhaus

Also an interesting article from PureStorage: IO Plumbing tests with fio

Test procedure

So we have our storage, we have the server and we know which block sizes we wanna test and with which tool. But how do we test?

Let us start small and get bigger

There are two parameters we wanna increase
-Number of disks
-Number of parallel threads

Find the limits of your storage solution

We start with 1 disk and 1 thread, then we increase both step by step up to 10 disks and 10 threads.
Each of these tests runs for 8 KByte and 1024 KByte and for different load types:
8 KByte: random read, random write, sequential read, sequential write
1024 KByte: sequential read and sequential write

So a complete test set will have 6 tests x 10 Devices x 10 threads x 60 seconds each which sums up to 36’000 seconds or 10 hours for a complete test run.

Initially, we run this test on every server separately to check that all servers deliver the same performance.

Yes, you need time to measure a storage system. There is no shortcut. I often see that system administrators just do a quick test with dd (mostly without oflag=direct) and think that replaces a complete performance test. Surprise: it does not!

Compressed and Uncompressed

When your storage system offers an option to compress the disks, you should do the test once with and once without storage compression.
Even, or rather especially, when your storage vendor tells you that you will not notice the difference…

After the initial tests, we do the same tests in parallel from multiple servers and if we even then should not have reached the maximum possible performance of the storage system, we can start to increase the threads or the number of disks.

But these are "just" max-out tests. These tests should give us a picture of what is possible with a single type of I/O. Normally we will never have just one type of I/O: we will always have a mixture of small blocks with sequential and random read/write and big blocks with mostly sequential read and sometimes sequential write.

To simulate this, we will create a baseline of 8 KByte blocks with random read and random write. The amount of IOPS you need to generate depends on your database load profile. If you do not know your current load profile, start with 40-50k IOPS.
Then add 1000 IOPS with a 1024 KByte block size, read and write, to simulate an Oracle duplicate from an active database over a 10 GBit/s link.

Why 1000 IOPS?
A 10 GBit/s link can transmit at most 1250 MByte/s of raw bandwidth (10'000 MBit/s / 8 ==> 1250 MByte/s). Normally you do not reach that completely: I have seen setups (even without MTU:9000) where we reached 1070-1100 MByte/s.
But to keep the calculation and the analysis of the data simple, we assume that a 10 GBit/s link can transmit 1000 MByte/s.

We keep adding these duplicates until we see a drop in performance.
This test will show us how many backups/duplicates/full table scans we can run in parallel until we have a performance impact on our normal workload.

Short Summary

For Oracle databases, test your storage with 8 KByte and 1024 KByte block size.
Run the tests from 1 disk 1 thread to 10 disks 10 threads
Use asynchronous I/O
Use direct I/O
Test peak and mixed workload
Read my other blog posts 😉

The next blog post will show you how to configure the open source tool FIO to run a test in the described way.

So long

The article What you should measure on your database storage and why appeared first on Blog dbi services.

Storage performance benchmarking with FIO


Learn how to do storage performance benchmarks for your database system with the open source tool FIO.

Before you read

This is a series of different blog posts:
In the first blog post, I talk about “What you should measure on your database storage and why”.
The second blog post will talk about “How to do database storage performance benchmark with FIO” (aka this one here).
The third blog post will show “How good is the new HUAWEI Dorada 6000V3 All-Flash System for databases” measured with the methods and tools from post one and two.

The first two posts give you the theory to understand all the graphics and numbers I will show in the third blog post.

Install FIO

Many distributions have FIO in their repositories. On a Fedora/RHEL system, you can just use
yum install fio
and you are ready to go.

Start a benchmark with FIO

There are mainly two different ways to start a benchmark with FIO

Command line

Starting from the command line is the way to go when you just want to get a quick feeling for the system performance.
I prefer to do more complex setups with job files; they are easier to create and debug.
Here is a small example of how to start a benchmark directly from the command line:
fio --filename=/dev/xvdf --direct=1 --rw=randwrite --refill_buffers --norandommap \
--randrepeat=0 --ioengine=libaio --bs=128k --rate_iops=1280 --iodepth=16 --numjobs=1 \
--time_based --runtime=86400 --group_reporting --name=benchtest

FIO Job files

An FIO job file holds a [GLOBAL] section and one or many [JOBS] sections. The [GLOBAL] section holds the shared parameters which are used for all jobs as long as you do not override them in the job sections.
Here is what a typical GLOBAL section from my files looks like:
[global]
ioengine=libaio    #ASYNCH IO
invalidate=1       #Invalidate buffer-cache for the file prior to starting I/O.
                   #Should not be necessary because of direct IO but just to be sure ;-)
ramp_time=5        #First 5 seconds do not count to the result.
iodepth=1          #Number of I/O units to keep in flight against the file
runtime=60         #Runtime for every test
time_based         #If given, run for the specified runtime duration even if the files are completely read or written.
                   #The same workload will be repeated as many times as runtime allows.
direct=1           #Use non buffered I/O.
group_reporting=1  #If set, display per-group reports instead of per-job when numjobs is specified.
per_job_logs=0     #If set, this generates bw/clat/iops log with per file private filenames.
                   #If not set, jobs with identical names will share the log filename.
bs=8k              #Block size
rw=randread        #I/O Type

Now that we have defined the basics, we can start with the job sections.
Example of a single device test with different parallelity:


#
#Subtest: 1
#Total devices = 1
#Parallelity = 1
#Number of processes = devices*parallelity ==> 1*1 ==> 1
#
[test1-subtest1-blocksize8k-threads1-device1of1]     #Parallelity 1, Number of device: 1/1
stonewall                               #Wait for the preceding jobs to finish before starting this one; jobs between two "stonewall" keywords run in parallel
filename=/dev/mapper/device01           #Device to use
numjobs=1                               #Create the specified number of clones of this job.
                                        #Each clone of job is spawned as an independent thread or process.
                                        #May be used to setup a larger number of threads/processes doing the same thing.
                                        #Each thread is reported separately: to see statistics for all clones as a whole
                                        #use group_reporting in conjunction with new_group.
#
#Subtest: 5
#Total devices = 1
#Parallelity = 5
#Number of processes = devices*parallelity ==> 1*5 ==> 5
#
[test1-subtest5-blocksize8k-threads5-device1of1]     #Parallelity 5, Number of device: 1/1
stonewall
numjobs=5
filename=/dev/mapper/device01

Example of multi device test with different parallelity:

#Subtest: 1
#Total devices = 4
#Parallelity = 1
#Number of processes = devices*parallelity ==> 4
#
[test1-subtest1-blocksize8k-threads1-device1of4]     # Parallelity 1, Number of device 1/4
stonewall
numjobs=1
filename=/dev/mapper/device01
[test1-subtest1-blocksize8k-threads1-device2of4]     # Parallelity 1, Number of device 2/4
numjobs=1
filename=/dev/mapper/device02
[test1-subtest1-blocksize8k-threads1-device3of4]     # Parallelity 1, Number of device 3/4
numjobs=1
filename=/dev/mapper/device03
[test1-subtest1-blocksize8k-threads1-device4of4]     # Parallelity 1, Number of device 4/4
numjobs=1
filename=/dev/mapper/device04
#
#Subtest: 5
#Total devices = 3
#Parallelity = 5
#Number of processes = devices*parallelity ==> 5
#
[test1-subtest5-blocksize8k-threads5-device1of3]     # Parallelity 5, Number of device 1/3
stonewall
numjobs=5
filename=/dev/mapper/device01
[test1-subtest5-blocksize8k-threads5-device2of3]     # Parallelity 5, Number of device 2/3
filename=/dev/mapper/device02
[test1-subtest5-blocksize8k-threads5-device3of3]     # Parallelity 5, Number of device 3/3
filename=/dev/mapper/device03
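
For the mixed workload described in the first post of this series (an 8 KByte random read/write baseline plus 1 MByte streams simulating backups/duplicates), a job file could be sketched like the one below. The device names and the rate_iops values are assumptions and have to be adapted to your environment:

[global]
ioengine=libaio
direct=1
time_based
runtime=600
group_reporting=1

[baseline-8k-randrw]                    #8 KByte random read/write baseline, capped at ~50k IOPS
filename=/dev/mapper/device01
bs=8k
rw=randrw
rwmixread=50
iodepth=16
numjobs=4
rate_iops=6250,6250                     #per job: 4 jobs x (6250 read + 6250 write) = ~50k IOPS total

[duplicate-1m-read]                     #read part of one simulated duplicate (~500 IOPS @ 1 MByte)
filename=/dev/mapper/device02
bs=1024k
rw=read
iodepth=8
rate_iops=500

[duplicate-1m-write]                    #write part of one simulated duplicate (~500 IOPS @ 1 MByte)
filename=/dev/mapper/device03
bs=1024k
rw=write
iodepth=8
rate_iops=500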

You can download a complete set of FIO job files for running the described test case from my github repository.
Job files list

To run a complete test with my job files you have to replace the devices. There is a small shell script to replace the devices called “replaceDevices.sh”

#!/bin/bash
######################################################
# dbi services michael.wirz@dbi-services.com
# Version: 1.0
#
# usage: ./replaceDevices.sh
#
# todo before use: modify newname01-newname10 with
# the name of your devices
######################################################
sed -i -e 's_/dev/mapper/device01_/dev/mapper/newname01_g' *.fio
sed -i -e 's_/dev/mapper/device02_/dev/mapper/newname02_g' *.fio
sed -i -e 's_/dev/mapper/device03_/dev/mapper/newname03_g' *.fio
sed -i -e 's_/dev/mapper/device04_/dev/mapper/newname04_g' *.fio
sed -i -e 's_/dev/mapper/device05_/dev/mapper/newname05_g' *.fio
sed -i -e 's_/dev/mapper/device06_/dev/mapper/newname06_g' *.fio
sed -i -e 's_/dev/mapper/device07_/dev/mapper/newname07_g' *.fio
sed -i -e 's_/dev/mapper/device08_/dev/mapper/newname08_g' *.fio
sed -i -e 's_/dev/mapper/device09_/dev/mapper/newname09_g' *.fio
sed -i -e 's_/dev/mapper/device10_/dev/mapper/newname10_g' *.fio

!!! After you have replaced the filenames you should double-check that you have the correct devices, because when you start the test, all data on these devices is lost !!!

grep filename *.fio|awk -F '=' '{print $2}'|sort -u
/dev/mapper/device01
/dev/mapper/device02
/dev/mapper/device03
/dev/mapper/device04
/dev/mapper/device05
/dev/mapper/device06
/dev/mapper/device07
/dev/mapper/device08
/dev/mapper/device09
/dev/mapper/device10

To start the test run:

for job_file in $(ls *.fio)
do
    fio ${job_file} --output /tmp/bench/${job_file%.fio}.txt
done

Multiple Servers

FIO supports running tests on multiple servers in parallel, which is very nice! Often a single server can not max out a modern all-flash storage system; this can be because of bandwidth problems (e.g. not enough adapters per server) or because one server is just not powerful enough.

You need to start FIO in server mode on all machines you want to test:
fio --server

Then you start the test with
fio --client=serverA,serverB,serverC /path/to/fio_jobs.file

Should you have a lot of servers you can put them in a file and use this as input for your fio command:


cat fio_hosts.list
serverA
serverB
serverC
serverD
...

fio --client=fio_hosts.list /path/to/fio_jobs.file

Results

The output files are not really human readable, so you can use my getResults.sh script, which formats the output so that it is ready to copy/paste into Excel:


cd /home/user/Huawei-Dorado6000V3-Benchmark/TESTRUN5-HOST1_3-COMPR/fio-benchmark-output
bash ../../getResults.sh
###########################################
START :Typerandread-BS8k
FUNCTION: getResults
###########################################
Typerandread-BS8k
LATENCY IN MS
.399 .824 1.664 2.500 3.332 5.022 6.660 8.316 12.464 16.683
.392 .826 1.667 2.495 3.331 4.995 6.680 8.344 12.474 16.637
.397 .828 1.661 2.499 3.330 4.992 6.656 8.329 12.505 16.656
.391 .827 1.663 2.493 3.329 5.002 6.653 8.330 12.482 16.656
.398 .827 1.663 2.497 3.327 5.005 6.660 8.327 12.480 16.683
.403 .828 1.662 2.495 3.326 4.995 6.663 8.330 12.503 16.688
.405 .825 1.662 2.496 3.325 4.997 6.648 8.284 12.369 16.444
.417 .825 1.661 2.497 3.326 4.996 6.640 8.256 12.303 16.441
.401 .826 1.661 2.500 3.327 4.999 6.623 8.273 12.300 16.438
.404 .826 1.661 2.500 3.327 4.993 6.637 8.261 12.383 16.495
IOPS
2469 6009 5989 5986 5991 5966 5998 6006 6012 5989
5004 12000 11000 11000 11000 11000 11000 11000 12000 12000
7407 17000 18000 17000 17000 18000 18000 17000 17000 17000
10000 23000 23000 24000 23000 23000 24000 23000 24000 23000
12300 29000 29000 29000 30000 29900 29000 29000 30000 29900
14600 35900 35000 35000 36000 35000 35000 35000 35000 35900
16000 42100 41000 41000 42000 41000 42100 42200 42400 42500
16500 42100 41000 41900 42000 41000 42100 42400 42600 42500
19600 48000 47000 47900 47000 47900 48300 48300 48700 48600
21900 54000 53000 53900 53000 53000 54200 54400 54400 54400
###########################################
START :Typerandwrite-BS8k
FUNCTION: getResults
###########################################
Typerandwrite-BS8k
LATENCY IN MS
.461 .826 1.662 2.501 3.332 5.022 6.660 8.317 12.467 16.676
.457 .826 1.668 2.495 3.330 5.002 6.681 8.346 12.473 16.635
.449 .826 1.662 2.499 3.327 4.991 6.664 8.326 12.497 16.649
.456 .828 1.661 2.496 3.331 4.997 6.663 8.329 12.477 16.651
.460 .827 1.663 2.495 3.327 5.001 6.660 8.333 12.484 16.676
.463 .830 1.663 2.495 3.325 4.997 6.661 8.330 12.503 16.684
.474 .827 1.661 2.495 3.324 4.999 6.665 8.334 12.451 16.580
.469 .828 1.661 2.497 3.324 5.002 6.668 8.322 12.489 16.594
.471 .827 1.660 2.499 3.327 4.998 6.663 8.335 12.481 16.609
.476 .825 1.675 2.500 3.328 4.992 6.675 8.334 12.480 16.623
IOPS
2137 5997 5990 5985 5991 5966 5998 6005 6010 5992
4306 12000 11900 11000 11000 11000 11000 11000 12000 12000
6571 17000 17000 17000 18000 18000 17000 17000 17000 18000
8635 23900 23000 23000 23000 23000 23000 23000 24000 24000
10700 29000 29000 29000 30000 29900 29000 29000 30000 29000
12800 35900 35000 35000 36000 35000 35000 35000 35000 35900
14500 41000 41000 41000 42000 41000 41000 41000 42100 42200
14700 41000 41000 41900 42000 41900 41900 42000 42000 42100
16700 48000 48000 47900 47000 47000 47000 47900 47000 48100
18600 54100 53500 53900 53000 54000 53900 53900 53000 54100
...

Copy & paste the result into the Excel template and you get an easy overview of the results:
fio summary excel

Troubleshooting

If you get a libaio error, you have to install the libaio libraries:

fio: engine libaio not loadable
fio: failed to load engine
fio: file:ioengines.c:89, func=dlopen, error=libaio: cannot open shared object file: No such file or directory

yum install libaio-devel

The article Storage performance benchmarking with FIO appeared first on Blog dbi services.

Huawei Dorado 6000 V3 benchmark


I had the opportunity to test the new Dorado 6000 V3 All-Flash storage system.
See what the all-new Dorado 6000 V3 All-Flash storage system is capable of as storage for your database system.

Before you read

This is a series of different blog posts:
In the first blog post, I talk about “What you should measure on your database storage and why”.
The second blog post will talk about “How to do database storage performance benchmark with FIO”.
The third blog post will show "How good is the new HUAWEI Dorado 6000V3 All-Flash System for databases" measured with the methods and tools from posts one and two (aka this post here).

The first two posts give you the theory to understand all the graphics and numbers I will show in the third blog post.

So in this post we see what the results are when we test a Huawei Dorado 6000V3 All-Flash storage system with these techniques.

I uploaded all the files to a github repository: Huawei-Dorado6000V3-Benchmark.

Foreword

The setup was provided by Huawei in Shenzhen, China. I had remote access, with a timeout after a certain time. Every test run takes 10 hours and, because of the timeout, I was sometimes not able to capture all performance view pictures. That's why some of the pictures are missing. The storage array and servers were provided free of charge; there was no exercise of influence from Huawei on the results or conclusions in any way.

Setup

4 servers were provided, each with 4x 16 GBit/s FC adapters directly connected to the storage system.
Each server has 256 GByte of memory installed and 2x 14-core 2.6 GHz E5-2690 Intel CPUs.
Hyperthreading is disabled.
The 10 GBit/s network interfaces are irrelevant for this test because all storage traffic runs over FC.

The Dorado 6000 V3 System has 1 TByte of cache and 50x 900 GByte SSD from Huawei.
Deduplication was disabled.
Tests were made with and without compression.

Theoretical max speed

With 4x16GBit/s a maximal throughput of 64 GBit/s or 8 GByte/s is possible.
In IOPS this means we can transmit 8192 IOPS with 1 MByte block size or 1’048’576 IOPS with 8 KByte block size.
As mentioned in the title, this is theoretical or raw bandwidth; the usable bandwidth or payload is, of course, smaller: an FC frame is 2112 bytes with 36 bytes of protocol overhead.
So in a 64 GBit/s FC network we can transfer: 64GBit/s / 8 ==> 8GByte/s * 1024 ==> 8192 MByte/s (raw) * (100-(36/2.112))/100 ==> 6795MByte/s (payload).

So we end up with a maximum of 6795 IOPS@1MByte or 869'841 IOPS@8KByte (payload), not including the effect that we are using multipathing* with 4x16GBit/s, which will also consume some power.

*If somebody out there has a method to calculate the overhead of multipathing in such a setup, please contact me!

Single-Server Results

General

All single server tests were made on devices with data compression enabled. Unfortunately, I no longer have the results of my tests with uncompressed devices for a single server, but you can see the difference in the multi-server section.

8 KByte block size

The 8 KByte block size tests on a single server were very performant.
What we can already tell: the higher the parallelity, the better the storage performs. This is not really a surprise; most storage systems work better the higher the parallel access is.
Especially for 1 thread, we see the difference between having one disk in a diskgroup and being able to use 3967 IOPS, or using e.g. 5 disks and 1 thread and being able to use 16700 IOPS.
The latency for all tests was great, with 0.25ms to 0.4ms for read operations and 0.1 to 0.4ms for write operations.
The 0.1ms for write is not that impressive, because it is mainly the performance of the write cache, but even when we exceeded the write cache we were not higher than 0.4ms.

1 MByte block size

In the 1 MByte tests we see that we already hit the maximum speed with 6 devices (parallelity of 6) to 9 devices (parallelity of 2).

As an example of how to interpret the graphic: when you look at the green line (6 devices), we reach peak performance at a parallelity of 6.
For the dark blue line (7 devices) we hit the peak at parallelity 4, and so on.

If we increase the parallelity beyond this point, the latency will grow or the throughput will even decrease.
For the 1 MByte tests we hit a limitation at around 6280 IOPS. This is around 90% of the calculated maximum speed.

So if we go with Oracle ASM, we should bundle at least 5 devices together into a diskgroup.
We also see that when we run a diskgroup rebalance we should go for a small rebalance power. A value smaller than 4 should be chosen; every value over 8 is counterproductive and will consume all possible I/O on your system and slow down all databases on this server.
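For reference, the rebalance power can be set explicitly per rebalance operation; a minimal sketch (the diskgroup name is an assumption):

SQL> ALTER DISKGROUP DATA REBALANCE POWER 4;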

Monitoring / Verification

To verify the results, I ran Oracle I/O calibration on the very same devices the performance test was running on. The expectation is that we will see more or less the same results.
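I/O calibration is typically run via DBMS_RESOURCE_MANAGER.CALIBRATE_IO; a minimal sketch of such a run (the number of disks and the latency target are assumptions):

SET SERVEROUTPUT ON
DECLARE
  l_max_iops       PLS_INTEGER;
  l_max_mbps       PLS_INTEGER;
  l_actual_latency PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 20,
    max_latency        => 20,
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_actual_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops = ' || l_max_iops);
  DBMS_OUTPUT.PUT_LINE('max_mbps = ' || l_max_mbps);
  DBMS_OUTPUT.PUT_LINE('latency  = ' || l_actual_latency);
END;
/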

For large I/O, the measured 6231 IOPS from I/O calibration is almost the same as measured by FIO (+/- 1%).
I/O calibration measured 604k IOPS for small I/O, which is significantly more than the +/- 340k IOPS measured with FIO. This is explainable because I/O calibration works with the number of disks for the parallelity and I did this test with 20 disks instead of 10. Sadly, when I realized my mistake, I already had no more access to the system.

In the following pictures you see the performance view of the storage system with the data measured by FIO as an overlay. As we can see, the values for the IOPS match perfectly.
The value for latency was lower on the storage side, which is explainable by the different points where we are measuring (once on the storage side, once on the server side).
All print screens of the live performance view of the storage can be found in the git repository. The values for queue depth, throughput, and IOPS matched the measured results perfectly.


Multi-Server Results with compression

General

The tests for compressed and uncompressed devices were made with 3 parallel servers.

8 KByte block size

For random read with 8 KByte blocks, the IOPS increased almost linearly from 1 to 3 nodes and we hit a peak of 655'000 IOPS with 10 devices / 10 threads. The answer time was between 0.3 and 0.45 ms.
For random write, we hit some kind of limitation at around 250k IOPS. We could not get a higher value than that, which was kind of surprising for me. I would have expected better results here.
From the point where we hit the maximum number of IOPS we see the same behavior as with 1 MByte blocks: more threads only increase the answer time but do not get you better performance.
So for random write with 8 KByte blocks, the maximum numbers are around 3 devices and 10 threads, or 10 devices and 3 threads, i.e. a parallelity of 30.
As long as we stay under this limit we see answer times between 0.15 and 0.5ms; over this limit the answer times can increase up to 10ms.

1 MByte block size

The multi-server tests show some interesting behavior with large reads on this storage system.
We hit a limitation at around 7500 to 7800 IOPS. For sequential write, we could achieve almost double that, with up to 14.5k IOPS.

Of course, I discussed all the results with Huawei to get their view on my tests.
The explanation for the much better write performance compared to read was: writes go straight to the 1 TByte cache, while for reads the system had to fetch everything from disk. This beta firmware version did not have any read cache, which is why the results were lower. All firmware versions from the end of February onwards also have a read cache.
I go with this answer and hope to retest it in the future with the newest firmware, still thinking the 7500 IOPS is a little bit low even without a read cache.

Multi-Server Results without compression

Comparing the results for compressed devices to uncompressed devices, we see an increase in IOPS of up to 30% and a corresponding decrease in latency for the 8 KByte block size.
For 1 MByte sequential read, the difference was smaller at around 10%; for 1 MByte sequential write we could gain an increase of around 15-20%.

Multi-Server Results with high parallelity

General

Because the tests with 3 servers did not max out the storage at the 8 KByte block size, I decided to do a max test with 4 parallel servers and with a parallelity from 1-100 instead of 1-10.
The steps were 1, 5, 10, 15, 20, 30, 40, 50, 75 and 100.
These tests were only performed on uncompressed devices.

8 KByte block size

It took 15 threads per server with 10 devices (60 processes in total across the 4 servers) to reach the peak performance of the Dorado 6000V3 system.
At this point we reached 940k IOPS @ 0.637 ms for 8 KByte random read. Remembering the answer that this firmware version does not have any read cache, this performance is achieved completely from the SSDs and could theoretically be even better with a read cache enabled.
If we increase the parallelity further, we see the same effect as with 1 MByte blocks: the answer time increases (dramatically) and the throughput decreases.

Depending on the number of parallel devices, we need between 60 parallel processes (with 10 devices) and 300 parallel processes (with 3 parallel devices).

1 MByte block size

For the large I/Os, we see the same picture as with 1 or 3 servers: a combined parallelity of 20-30 can max out the storage system. So be very careful that your large I/O tasks do not affect the other operations on the storage system.

Mixed Workload

After these tests, we know the upper limit of this storage in single-case tests. In a normal workload, we will never see only one kind of I/O: there will always be a mixture of 8 KByte read & write IOPS side by side with 1 MByte I/O. To simulate this picture, we create two FIO files. One creates approx. 40k-50k IOPS with random read and random write in a 50/50 split.
This will be our baseline; then we add approx. 1000 1 MByte IOPS every 60 seconds and see how the answer time reacts.


As seen in this picture from the performance monitor of the storage system, the 1 MByte I/O had two effects on the smaller IOPS:
-The throughput of the small IOPS decreases.
-The latency increases.
In the middle of the test, we stopped the small IOPS to see the latency of just the 1 MByte IOPS.

Both effects are expected and within the expected parameters: test passed.

So with a base workload of 40k-50k IOPS, we can run e.g. backups in parallel with a bandwidth of up to 5.5 GByte/s without interfering with the database work, or we can do up to 5 active duplicates on the same storage without interfering with the other databases.

Summary

This storage system showed fantastic performance at 8 KByte block size with very low latency. Especially the high number of parallel processes we can run against it before we hit peak performance makes it a good choice for serving a large number of Oracle databases.

The large I/O (1 MByte) performance for write operations was good, but not as outstanding as the excellent 8 KByte performance. The sequential read part badly misses the read cache compared to the performance that is possible for writes. But even that is not top of the line compared to other storage systems: I have seen other storage systems with a comparable configuration which were able to deliver up to 12k IOPS@1MByte.

Remember the questions from the first blog post:
-How many devices should I bundle into a diskgroup for best performance?
As many as possible.

-How many backups/duplicates can I run in parallel to my normal database workload without interfering with it?
You can run 5 parallel backups/duplicates with 1000 IOPS each without interfering with a baseline of 40-50k IOPS@8KByte.

-What is the best rebalance power I can use on my system?
A value of 2-4 is absolutely enough for this system. More will slow down the other operations on your server.

The article Huawei Dorado 6000 V3 benchmark appeared first on Blog dbi services.
