
Getting in touch with shareplex


Tuesday this week I had the opportunity to get in touch with shareplex, Quest’s replication solution. This product does not rely on Oracle licenses, so it can also be used with Standard Edition. It is a competitor of Oracle’s GoldenGate and is used for asynchronous replication, too.
An interesting feature is that not only committed transactions can be replicated, which is an advantage with big transactions: replication latencies within seconds can be achieved. Migrations of big databases across operating system, character set and database release boundaries are also possible with this solution. Quest mentions that its costs are lower than those of comparable Oracle products.

It can also be used for reporting, high availability, distributed processing and load sharing.

This article Getting in touch with shareplex appeared first on Blog dbi services.


Some psql features you are maybe not aware of


It is the time of the 10th Annual PostgreSQL Conference Europe, so this is the perfect time to blog about some tips and tricks around psql you’ll love. psql is such a powerful tool that you really should use it every day. It saves you so much work and is packed with features that make your life so much easier. In this post we’ll look at some features you maybe didn’t know about before.

Let’s start with something very simple: you probably know the “\l” shortcut to display all the databases:

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres

Did you know you can also pass these shortcuts from your shell directly into psql?

postgres@pgbox:/home/postgres/ [PGDEV] psql -c '\l' postgres
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres

But there is even a faster way for retrieving that information:

postgres@pgbox:/home/postgres/ [PGDEV] psql -l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres

Did you know you can log the complete psql session to a logfile?

postgres@pgbox:/home/postgres/ [PGDEV] psql -X -L /var/tmp/log postgres
psql (12devel)
Type "help" for help.

postgres=# select 1;
 ?column? 
----------
        1
(1 row)

postgres=# select 2;
 ?column? 
----------
        2
(1 row)

postgres=# \! cat /var/tmp/log
********* QUERY **********
select 1;
**************************

 ?column? 
----------
        1
(1 row)

********* QUERY **********
select 2;
**************************

postgres=# 

You probably know that copy is the fastest way to get data into and out of PostgreSQL. Did you know you can copy from a program?

postgres=# create table lottery ( draw_date date, winning_numbers text, mega_ball integer, multiplier integer );
CREATE TABLE
postgres=# copy lottery from 
                program 'curl https://data.ny.gov/api/views/5xaw-6ayf/rows.csv?accessType=DOWNLOAD' 
                with (header true, delimiter ',', format csv);
COPY 1713
postgres=# select * from lottery limit 5;
 draw_date  | winning_numbers | mega_ball | multiplier 
------------+-----------------+-----------+------------
 2002-05-17 | 15 18 25 33 47  |        30 |           
 2002-05-21 | 04 28 39 41 44  |         9 |           
 2002-05-24 | 02 04 32 44 52  |        36 |           
 2002-05-28 | 06 21 22 29 32  |        24 |           
 2002-05-31 | 12 28 45 46 52  |        47 |           
(5 rows)

That basically means you can use whatever “program” you like: as long as its output is something COPY understands, you can load it.
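
The same works in the other direction with copy ... to program. A minimal sketch (the gzip target path is just an example of mine, not from the original post):

postgres=# copy lottery to program 'gzip -c > /var/tmp/lottery.csv.gz' with (header true, delimiter ',', format csv);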

How often do you dynamically build SQL statements that you want to execute right afterwards? There is a quite effective solution for that in psql:

postgres=# select 'create table t'||i||'( a int )' from generate_series(1,10) i; \gexec
         ?column?          
---------------------------
 create table t1( a int )
 create table t2( a int )
 create table t3( a int )
 create table t4( a int )
 create table t5( a int )
 create table t6( a int )
 create table t7( a int )
 create table t8( a int )
 create table t9( a int )
 create table t10( a int )
(10 rows)

CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE

Did you know you can store the result of a query into a variable and use that later in other statements?

postgres=# select 3 as var; \gset
 var 
-----
   3
(1 row)

postgres=# \echo :var
3
postgres=# select * from lottery where multiplier = :var;
 draw_date  | winning_numbers | mega_ball | multiplier 
------------+-----------------+-----------+------------
 2011-02-18 | 05 06 07 30 45  |        42 |          3
 2011-03-01 | 01 12 19 20 47  |        25 |          3
 2011-04-01 | 13 14 35 36 53  |        19 |          3
 2011-04-08 | 06 40 45 50 56  |        11 |          3
 2011-04-15 | 22 23 33 39 48  |        29 |          3
 2011-04-22 | 03 18 46 51 53  |        17 |          3
 2011-04-26 | 19 29 32 38 55  |        15 |          3
 2011-05-06 | 06 18 26 37 41  |         9 |          3
 2011-05-24 | 09 12 21 42 43  |        42 |          3
 2011-05-31 | 28 30 31 37 55  |        13 |          3
 2011-06-03 | 20 23 41 49 53  |        31 |          3
 2011-06-10 | 18 21 27 37 38  |         7 |          3
...

The last one for today is one of my favorites: as with the Linux watch command, you can watch a query in psql:

postgres=# select now();
              now              
-------------------------------
 2018-10-23 21:57:17.298083+02
(1 row)

postgres=# \watch
Tue 23 Oct 2018 09:57:19 PM CEST (every 2s)

              now              
-------------------------------
 2018-10-23 21:57:19.277413+02
(1 row)

Tue 23 Oct 2018 09:57:21 PM CEST (every 2s)

              now              
-------------------------------
 2018-10-23 21:57:21.364605+02
(1 row)
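
By default \watch re-executes the query buffer every two seconds (as you can see in the “every 2s” header above); you can also pass your own interval in seconds:

postgres=# \watch 5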

Btw: You can see that the PostgreSQL Conference Europe is a technical conference when you take a look at the exhibition area during the sessions: Almost empty :)

This article Some psql features you are maybe not aware of appeared first on Blog dbi services.

What’s OpenDB Appliance ?


Thanks to the OpenDB Appliance, the “click, hope and pray” approach is a thing of the past. Use the tool developed by dbi services’ specialists to make your work easier.

OpenDB Appliance makes life easier, discover why.

Get further information about the OpenDB Appliance.

This article What’s OpenDB Appliance ? appeared first on Blog dbi services.

Deploy DC/OS using Ansible (Part 1) – Getting Started


To get started with this topic I want to briefly explain some basics. Afterwards I will show you how to prepare the configuration files.

Ansible

Ansible is an Open Source automation utility. It is used for orchestration and configuration as well as for the administration of PCs/servers. You could say: okay, but we have Puppet or SaltStack or another framework, why should I use Ansible? Ansible differs! It has no agent installed on the target systems, it just needs a working SSH connection and a Python installation. To deploy changes you simply write an Ansible playbook, a simple YAML file. For further information about Ansible just visit the Ansible homepage.
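
To give you an idea, a minimal playbook could look like the following sketch (a generic example of mine, not part of the DC/OS setup; the dcos_servers group is the one defined later in /etc/ansible/hosts, and ntp is just a placeholder package):

---
# make sure ntp is installed and the ntpd service is running on all hosts of the group
- name: ensure time synchronization
  hosts: dcos_servers
  become: true
  tasks:
    - name: install ntp
      yum:
        name: ntp
        state: present
    - name: start and enable ntpd
      service:
        name: ntpd
        state: started
        enabled: yes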

DC/OS

Mesosphere’s DC/OS is a distributed operating system based on Apache Mesos (read more). It gives you the possibility to manage multiple machines as if they were one. Resource management, process placement scheduling, simplified installation and management of distributed services can be automated using DC/OS. DC/OS comes with a web interface as well as a command-line interface, which can be used for monitoring and remote management.
DC/OS can be used as a cluster manager, container platform and operating system. A quite mighty tool. Explaining all its functionality would go too far here.

To set up a minimal DC/OS using Ansible you need at least six servers:
– one Ansible
– one Bootstrap
– one Master
– two private Agents
– one public Agent

Bootstrap Node

In general, the bootstrap node is the essential one when you spin up the cluster. It is used as the staging location for the software installation and stores the DC/OS configuration and the bootstrap files for DC/OS.

Master Node

The DC/OS master manages “the rest” of the cluster. It is possible to run one or more master nodes. They contain most of the DC/OS components and the Mesos master process. They also provide the web interface, which gives a nice graphical view of the DC/OS cluster.

Private Agent Node

The private agents do not allow access from outside the cluster. They provide resources to the cluster.

Public Agent Node

The public agent node is a node on the network that allows access from outside of DC/OS. The public agent is primarily used as a kind of load balancer to decrease the surface that could be accessed by attackers.
In a cluster you need fewer public agent nodes than private agent nodes, as they can handle multiple agent services.

I just described the node components used for the installation. If you want more and deeper insights into DC/OS and its architecture, you can find detailed documentation on the Mesosphere homepage.

Architecture

For the minimal installation of this DC/OS you need six servers:
Each of the servers has a public and a private IP, except the Ansible server.


To install DC/OS using Ansible I used the playbooks from GitHub. But as usual there is some specific stuff when you test it in your own environment.

Prepare the ansible server

Install git and get ansible-dcos from GitHub:

[root@dcos-ansible ~]# yum install git -y

[root@dcos-ansible ~]# git clone https://github.com/dcos-labs/ansible-dcos
Cloning into 'ansible-dcos'...
remote: Enumerating objects: 69, done.
remote: Counting objects: 100% (69/69), done.
remote: Compressing objects: 100% (48/48), done.
remote: Total 1957 (delta 25), reused 42 (delta 15), pack-reused 1888
Receiving objects: 100% (1957/1957), 312.95 KiB | 0 bytes/s, done.
Resolving deltas: 100% (982/982), done.

[root@dcos-ansible ~]# cd ansible-dcos/
[root@dcos-ansible ansible-dcos]# git tag
v0.1.0-alpha
v0.2.0-alpha
v0.2.1-alpha
v0.3.0-alpha
v0.4.0-alpha
v0.5.0-dcos-1.10
v0.6.0-dcos-1.11
v0.6.1-dcos-1.11
v0.7.0-dcos-1.11
[root@dcos-ansible ansible-dcos]# git checkout v0.7.0-dcos-1.11
Note: checking out 'v0.7.0-dcos-1.11'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 1f2cf7d... Prepare version v0.7.0-dcos-1.11

Install ansible

[root@dcos-ansible ansible-dcos]# yum install ansible

Prepare the hosts.yaml file

[root@dcos-ansible ansible-dcos]# pwd
/root/ansible-dcos
[root@dcos-ansible ansible-dcos]# cp hosts.example.yaml hosts.yaml

[root@dcos-ansible ansible-dcos]# cat hosts.yaml
---
# Example for an ansible inventory file
all:
  children:
    bootstraps:
      hosts:
        # Public IP Address of the Bootstrap Node
        192.168.22.100:
    masters:
      hosts:
        # Public IP Addresses for the Master Nodes
        192.168.22.101:
    agents:
      hosts:
        # Public IP Addresses for the Agent Nodes
        192.168.22.102:
        192.168.22.103:
    agent_publics:
      hosts:
        # Public IP Addresses for the Public Agent Nodes
        192.168.22.104:
  vars:
    # IaaS target for DC/OS deployment
    # options: aws, gcp, azure or onprem
    dcos_iaas_target: 'onprem'

    # Choose the IP Detect Script
    # options: eth0, eth1, ... (or other device name for existing network interface)
    dcos_ip_detect_interface: 'eth0'

    # (internal/private) IP Address of the Bootstrap Node
    dcos_bootstrap_ip: '10.0.0.1'

    # (internal/private) IP Addresses for the Master Nodes
    dcos_master_list:
      - 10.0.0.2

    # DNS Resolvers
    dcos_resolvers:
      - 8.8.4.4
      - 8.8.8.8

    # DNS Search Domain
    dcos_dns_search: 'None'

    # Internal Loadbalancer DNS for Masters (only needed for exhibitor: aws_s3)
    dcos_exhibitor_address: 'masterlb.internal'

    # External Loadbalancer DNS for Masters or
    # (external/public) Master Node IP Address (only needed for cli setup)
    dcos_master_address: 'masterlb.external'

Create the setup variables for DC/OS

[root@dcos-ansible ansible-dcos]# pwd
/root/ansible-dcos
cp group_vars/all.example group_vars/all

Enable SSH access on nodes with Ansible

In case of authentication problems when using the Ansible playbooks, repeat the steps with “exec …” and “ssh-add …”.

ssh-keygen -t rsa -b 4096 -C "admin@it.dbi-services.com" -f ~/.ssh/ansible-dcos
[root@dcos-ansible ansible-dcos]# exec /usr/bin/ssh-agent $SHELL
[root@dcos-ansible ansible-dcos]# ssh-add ~/.ssh/ansible-dcos
Enter passphrase for /root/.ssh/ansible-dcos:
Identity added: /root/.ssh/ansible-dcos (/root/.ssh/ansible-dcos)

Enter lines for initial SSH access on all nodes with ansible in group_vars/all

All systems must have the same username/password combination, otherwise Ansible runs into failures.
In this step you only have to change the last 4 lines of group_vars/all:

[root@dcos-ansible ansible-dcos]# cat group_vars/all
---
# Install latest operating system updates
os_system_updates: False

# DC/OS cluster version
dcos_version: '1.11.4'

# If planning to upgrade a previous deployed DC/OS Cluster,
# uncomment the following variable
#dcos_upgrade_from_version: '1.11.3'

# Download URL for DC/OS
dcos_download: "https://downloads.dcos.io/dcos/stable/{{ dcos_version }}/dcos_generate_config.sh"

# Name of the DC/OS Cluster
dcos_cluster_name: 'demo'

# Deploy Mesosphere Enterprise DC/OS or DC/OS OSS?
dcos_deploy_ee_package: False

# Optional if dcos_iaas_target := aws
#dcos_exhibitor: 'aws_s3'
#dcos_aws_access_key_id: '******'
#dcos_aws_secret_access_key: '******'
#dcos_aws_region: 'us-west-2'
#dcos_s3_bucket: 'bucket-name'

# Optional if dcos_iaas_target := azure
#dcos_exhibitor: 'azure'
#dcos_exhibitor_azure_account_name: 'name'
#dcos_exhibitor_azure_account_key: '******'

# Only required when deploying Mesosphere Enterprise DC/OS
dcos_ee_security: 'permissive'
dcos_ee_license_key_contents: '******'
dcos_ee_superuser_username: admin
# Default password:= admin
dcos_ee_superuser_password_hash: "$6$rounds=656000$8CXbMqwuglDt3Yai$ZkLEj8zS.GmPGWt.dhwAv0.XsjYXwVHuS9aHh3DMcfGaz45OpGxC5oQPXUUpFLMkqlXCfhXMloIzE0Xh8VwHJ."

# Configure rexray to enable support of external volumes (only for Mesosphere Enterprise DC/OS)
# Note: Set rexray_config_method: file and edit ./roles/bootstrap/templates/rexray.yaml.j2 for a custom rexray configuration
# options: empty, file
dcos_ee_rexray_config_method: empty

#For initial SSH access on nodes with Ansible
ansible_password: "password"
ansible_become_pass: "password"
#initial_remote_user: root

Change ansible configuration file

[defaults]
inventory = hosts.yaml
host_key_checking = False
remote_user = ansible
roles_path = ./roles
[all:vars]
ansible_connection=ssh
ansible_user=ansible
ansible_ssh_pass=password

Insert the servers to Ansible hosts file

As most of this file is commented out, I just post the section I added:

[root@dcos-ansible ansible-dcos]# cat /etc/ansible/hosts
[dcos_servers]
192.168.22.100
192.168.22.101
192.168.22.102
192.168.22.103
192.168.22.104

Finally we can start with the playbooks – read part 2 for this.

This article Deploy DC/OS using Ansible (Part 1) – Getting Started appeared first on Blog dbi services.

Deploy DC/OS using Ansible (Part 2) – Playbooks


Finally, after all the configuration stuff is done, we can run the playbooks.

Create SSH Access

First the SSH access on all nodes needs to be created. For this, the access-onprem.yml playbook is used.
Be careful: I used CentOS on my systems, so I commented out the apt-get and Debian-based parts.
If you want to run the playbook on another operating system, adjust it carefully.

---
# This playbook enable access to all ansible targets via ssh

- name: setup the ansible requirements on all nodes
  hosts: all:!localhost
  #hosts: all
  serial: 20
  remote_user: "{{ initial_remote_user | default('root') }}"
  become: true
  tasks:

#    - name: attempt to update apt's cache
#      raw: test -e /usr/bin/apt-get && apt-get update
#      ignore_errors: yes

#    - name: attempt to install Python on Debian-based systems
#      raw: test -e /usr/bin/apt-get && apt-get -y install python-simplejson python
#      ignore_errors: yes

    - name: attempt to install Python on CentOS-based systems
      raw: test -e /usr/bin/yum && yum -y install python-simplejson python
      ignore_errors: yes

    - name: Create admin user group
      group:
        name: admin
        system: yes
        state: present

    - name: Ensure sudo is installed
      package:
        name: sudo
        state: present

    - name: Remove user centos
      user:
        name: centos
        state: absent
        remove: yes

    - name: Create Ansible user
      user:
        name: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        shell: /bin/bash
        comment: "Ansible management user"
        home: "/home/{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        createhome: yes
        password: "admin123"

    - name: Add Ansible user to admin group
      user:
        name: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        groups: admin
        append: yes

    - name: Add authorized key
      authorized_key:
        user: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        state: present
        key: "{{ lookup('file', lookup('env','HOME') + '/.ssh/ansible-dcos.pub') }}"

    - name: Copy sudoers file
      command: cp -f /etc/sudoers /etc/sudoers.tmp

    - name: Backup sudoers file
      command: cp -f /etc/sudoers /etc/sudoers.bak

    - name: Ensure admin group can sudo
      lineinfile:
        dest: /etc/sudoers.tmp
        state: present
        regexp: '^%admin'
        line: '%admin ALL=(ALL) NOPASSWD: ALL'
      when: ansible_os_family == 'Debian'

    - name: Ensure admin group can sudo
      lineinfile:
        dest: /etc/sudoers.tmp
        state: present
        regexp: '^%admin'
        insertafter: '^root'
        line: '%admin ALL=(ALL) NOPASSWD: ALL'
      when: ansible_os_family == 'RedHat'

    - name: Replace sudoers file
      shell: visudo -q -c -f /etc/sudoers.tmp && cp -f /etc/sudoers.tmp /etc/sudoers

    - name: Test Ansible user's access
      local_action: "shell ssh {{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}@{{ ansible_host }} 'sudo echo success'"
      become: False
      register: ansible_success

    - name: Remove Ansible SSH key from bootstrap user's authorized keys
      lineinfile:
        path: "{{ ansible_env.HOME }}/.ssh/authorized_keys"
        state: absent
        regexp: '^ssh-rsa AAAAB3N'
      when: ansible_success.stdout == "success"

Start the Playbook for the SSH access

[root@dcos-ansible ansible-dcos]# pwd
/root/ansible-dcos

[root@dcos-ansible ansible-dcos]# ansible-playbook plays/access-onprem.yml
PLAY [setup the ansible requirements on all nodes] 
****************************************************************************************
TASK [Gathering Facts]
****************************************************************************************
ok: [192.168.22.103]
ok: [192.168.22.102]
ok: [192.168.22.104]
ok: [192.168.22.101]
ok: [192.168.22.100]

[....]

PLAY RECAP 
**************************************************************************************
192.168.22.100             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.101             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.102             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.103             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.104             : ok=14   changed=6    unreachable=0    failed=0

This is not the whole output of the playbook. Important to know: during the “TASK [Test Ansible user’s access]” I had to enter the Ansible password 5 times. After that the playbook finished successfully.

Ping the servers using Ansible

After the playbook has finished successfully, do a test ping:

[root@dcos-ansible ansible-dcos]# ansible all -m ping
192.168.22.102 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.100 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.104 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.101 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.103 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

In case of trouble it is really helpful to use the “-vvv” option.
It is also possible to ping only one server using

ansible 192.168.22.100 -m ping

Rollout the DC/OS installation

[root@dcos-ansible ansible-dcos]# pwd
/root/ansible-dcos
[root@dcos-ansible ansible-dcos]# cat plays/install.yml
---
- name: setup the system requirements on all nodes
  hosts: all
  serial: 20
  become: true
  roles:
    - common
    - docker

- name: generate the DC/OS configuration
  hosts: bootstraps
  serial: 1
  become: true
  roles:
    - bootstrap

- name: deploy nodes
  hosts: [ masters, agents, agent_publics]
  serial: 20
  become: true
  roles:
    - node-install

[root@dcos-ansible ansible-dcos]# pwd
/root/ansible-dcos
[root@dcos-ansible ansible-dcos]# ansible-playbook plays/install.yml

PLAY [setup the system requirements on all nodes]
*********************************************************************

TASK [Gathering Facts]
*********************************************************************
ok: [192.168.22.102]
ok: [192.168.22.104]
ok: [192.168.22.101]
ok: [192.168.22.100]
[....]

In case some installation steps fail, Ansible will skip that server and give you the opportunity to rerun the playbook on the failed servers only:

ansible-playbook plays/install.yml --limit @/root/ansible-dcos/plays/install.retry

If you cannot connect to your master via browser, check /var/log/messages for error messages. In my case the master was looking for the eth0 interface, which isn’t available on my VM.
Just change the detect_ip script as follows, according to your network interface. The same step is needed on all agent nodes as well.

[root@dcos-master bin]# cat /opt/mesosphere/bin/detect_ip
#!/usr/bin/env bash
set -o nounset -o errexit
export PATH=/usr/sbin:/usr/bin:$PATH
echo $(ip addr show enp0s8 | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | head -1)

Install the CLI

For those of you who prefer a CLI, just install it on your master.

[root@dcos-master ~]#  [ -d /usr/local/bin ] || sudo mkdir -p /usr/local/bin
[root@dcos-master ~]# curl https://downloads.dcos.io/binaries/cli/linux/x86-64/dcos-1.11/dcos -o dcos
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.9M  100 13.9M    0     0  1313k      0  0:00:10  0:00:10 --:--:-- 3920k
[root@dcos-master ~]# sudo mv dcos /usr/local/bin
[root@dcos-master ~]# chmod +x /usr/local/bin/dcos
[root@dcos-master ~]# dcos cluster setup http://192.168.22.101
If your browser didn't open, please go to the following link:

http://192.168.22.101/login?redirect_uri=urn:ietf:wg:oauth:2.0:oob

Enter OpenID Connect ID Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6Ik9UQkVOakZFTWtWQ09VRTRPRVpGTlRNMFJrWXlRa015Tnprd1JrSkVRemRCTWpBM1FqYzVOZyJ9.eyJlbWFpbCI6Imp1bGlhLmd1Z2VsQGdtYWlsLmNvbSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJpc3MiOiJodHRwczovL2Rjb3MuYXV0aDAuY29tLyIsInN1YiI6Imdvb2dsZS1vYXV0aDJ8MTA2NTU2OTI5OTM1NTc2MzQ1OTEyIiwiYXVkIjoiM3lGNVRPU3pkbEk0NVExeHNweHplb0dCZTlmTnhtOW0iLCJpYXQiOjE1NDA0NTA4MTcsImV4cCI6MTU0MDg4MjgxN30.M8d6dT4QNsBmUXbAH8B58K6Q2XvnCKnEd_yziiijBXHdW18P2OnJEYrKa9ewvOfFhyisvLa7XMU3xeBUhoqX5T6mGkQo_XUlxXM82Ohv3zNCdqyNCwPwoniX4vU7R736blcLRx1aB8TJnydNb0H0IzEAVzaYBQ1CRV-4a9KsiMXKBBPlskOSvek4b_FRghA6hsjMA2eO-G5r3B6UgHo6CCwdwVrhsOygvJ5NwDC0xiFrnkW-SjZRZztCN8cRj7b40VH43uY6R2ibxJfE7SaGpbWzLyp7juUJ766WXar3O7ww42bYIqLnAx6YmWG5kFeJnmJGT-Rdmhl2JuvdABoozA

That’s it, now you can configure and use your DC/OS. Always keep in mind: the ntpd service is essential for a working DC/OS node. Also use /var/log/messages, it really helps!
One little thing I have to mention at the end: don’t rely blindly on the official documentation and the troubleshooting guide, it does not help as much as expected…
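
As a quick sanity check I would verify ntpd on every node before digging deeper (standard systemctl commands, not taken from the original walkthrough):

[root@dcos-master ~]# systemctl status ntpd
[root@dcos-master ~]# systemctl enable ntpd
[root@dcos-master ~]# systemctl start ntpd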

This article Deploy DC/OS using Ansible (Part 2) – Playbooks appeared first on Blog dbi services.

Deep dive Postgres at the #pgconfeu conference


Today I followed many good technical sessions at the European Postgres conference. The Postgres conferences are really technically oriented, you will find no marketing sessions there and you learn a lot.
As promised yesterday, today I wrote my first blog about the new Postgres storage engine ZHEAP/UNDO, which is a very interesting feature with very interesting results.

Before you continue reading this blog: if you didn’t read my blog from yesterday, read it first :-) link

First test : table creation

We create 2 tables, one with the default Postgres storage engine HEAP, and one with the new storage engine ZHEAP.

PSQL> create table heap2 as select a.*, md5(a::varchar), now() from generate_series(1,5000000) a;
 
SELECT 5000000
Time: 12819.369 ms (00:12.819)

PSQL> create table zheap2  with (storage_engine='zheap') as select a.*, md5(a::varchar), now() from generate_series(1,5000000) a;
SELECT 5000000
Time: 19155.004 ms (00:19.155)

You noticed that with Postgres you can choose your storage engine at table level :-). The table creation with ZHEAP is slower, but that is expected because now the UNDO segment has to be created as well.

Second test : Size of both tables

Before starting the tests we check the size of the HEAP and ZHEAP tables. As announced yesterday, the ZHEAP table should be smaller because it has less header information.

PSQL>  select pg_size_pretty(pg_relation_size('heap2'));
 pg_size_pretty 
----------------
 365 MB
PSQL> select pg_size_pretty(pg_relation_size('zheap2'));
 pg_size_pretty 
----------------
 289 MB

The ZHEAP table is smaller, which is exactly what Amit explained to us yesterday: the block header with ZHEAP is smaller. If you want to learn more, read his presentation from yesterday; again, the link is in my blog from yesterday.

Third test : Update on the table

To get the bloat effect on the HEAP table, we will now update the full table and see what happens.

PSQL> update heap2 set a=a+12222222;
UPDATE 5000000
Time: 19834.911 ms (00:19.835)

PSQL> update zheap2 set a=a+12222222;
UPDATE 5000000
Time: 26956.043 ms (00:26.956)

PSQL> select pg_size_pretty(pg_relation_size('zheap2'));
 pg_size_pretty 
----------------
 289 MB
PSQL> vacuum heap2;
PSQL> select pg_size_pretty(pg_relation_size('heap2'));
 pg_size_pretty 
----------------
 730 MB

As for the creation, the update takes a bit longer, but the update with ZHEAP writes a lot of information into the log file. We should test this update again with the logging of undo segment creation disabled.
But as you can see, the most important point here is that the ZHEAP table does not bloat like the HEAP table: the HEAP table is now two times bigger even though I executed a VACUUM.

Fourth test: test of the ROLLBACK

To test the ROLLBACK we first have to open a transaction with BEGIN;

PSQL>  begin;
BEGIN
PSQL>* update heap2 set a=a+12222222;
UPDATE 5000000
Time: 22071.462 ms (00:22.071)
PSQL> * rollback;
ROLLBACK
Time: 1.437 ms

PSQL> begin;
BEGIN
PSQL> * update zheap2 set a=a+12222222;
UPDATE 5000000
Time: 28210.845 ms (00:28.211)
PSQL> * rollback;
ROLLBACK
Time: 0.567 ms

This is the part that surprised me the most: the ROLLBACK for ZHEAP is as fast as for HEAP, and I can’t explain that. I will leave it to my colleague Daniel Westermann to make deeper tests :-), because with ZHEAP the undo blocks have to be applied, whereas HEAP tables only mark the transaction as aborted.

Fifth test : Check of the query performance

For this test we first have to flush the filesystem cache and restart the database, to be sure that nothing is cached.

postgres@dbi-pg-tun:/home/postgres/zheap/ [ZHEAP] pgstop 
waiting for server to shut down.... done
server stopped

postgres@dbi-pg-tun:/home/postgres/ [ZHEAP] sudo sync
postgres@dbi-pg-tun:/home/postgres/ [ZHEAP] sudo echo 3 > /proc/sys/vm/drop_caches

postgres@dbi-pg-tun:/home/postgres/zheap/ [ZHEAP] pgstart
waiting for server to start.... done
server started

Now we are ready for the last test

postgres@dbi-pg-tun:/home/postgres/zheap/ [ZHEAP] sqh
PSQL> select count(*) from heap2;
  count  
---------
 5000000
Time: 3444.869 ms (00:03.445)

PSQL> select count(*) from zheap2;
  count  
---------
 5000000
Time: 593.894 ms

As you can see, query performance improves significantly for full table scans :-), because the table did not bloat as the HEAP table did. For your information, I additionally ran the full update two more times before restarting the database, and the HEAP table is now three times bigger.

PSQL> select pg_size_pretty(pg_relation_size('heap2'));
 pg_size_pretty 
----------------
 1095 MB

Time: 0.508 ms
PSQL> select pg_size_pretty(pg_relation_size('zheap2'));
 pg_size_pretty 
----------------
 289 MB

Conclusion of these tests

  • Postgres allows choosing whether or not to use UNDO at the table level
  • We are surprised how fast the ROLLBACK is, but this must be tested again as I don’t yet understand why
  • Select performance is improved significantly for full table scans :-)
  • The storage will not bloat anymore with ZHEAP
  • Finally, only the updates are a little bit slower

It will be interesting to follow the discussions around this feature on the mailing list.

This article Deep dive Postgres at the #pgconfeu conference appeared first on Blog dbi services.

Node Manager not starting after using unpack to install a WebLogic domain on a remote machine.


I created a domain using the config.sh script, spanning several hosts. The pack and unpack commands have been used to install the domain on the remote servers.
The pack command:

cd $FMW_HOME/oracle_common/common/bin
./pack.sh -domain=/u02/config/domains/workshop_domain \
-template=/home/weblogic/workshop/lab_DomainCreation/workshop_template.jar \
-template_name=workshop_template \
-managed=true

The newly created jar file was copied to the remote server and the unpack command run:

cd $FMW_HOME/oracle_common/common/bin
./unpack.sh -domain=/u02/config/domains/workshop_domain \
-template=/home/weblogic/workshop/lab_DomainCreation/workshop_template.jar
Starting the Node Manager failed because the DemoIdentity Java KeyStore file was missing. This file was always generated in previous WebLogic software versions.

Errors from the Node Manager log file:

weblogic.nodemanager.common.ConfigException: Identity key store file not found: /u02/config/domains/workshop_domain/security/DemoIdentity.jks
at weblogic.nodemanager.server.SSLConfig.loadKeyStoreConfig(SSLConfig.java:225)
at weblogic.nodemanager.server.SSLConfig.access$000(SSLConfig.java:33)
at weblogic.nodemanager.server.SSLConfig$1.run(SSLConfig.java:118)
at java.security.AccessController.doPrivileged(Native Method)
at weblogic.nodemanager.server.SSLConfig.(SSLConfig.java:115)
at weblogic.nodemanager.server.NMServer.(NMServer.java:169)
at weblogic.nodemanager.server.NMServer.getInstance(NMServer.java:134)
at weblogic.nodemanager.server.NMServer.main(NMServer.java:589)
at weblogic.NodeManager.main(NodeManager.java:31)

Starting from WebLogic 12.2.1.3, the unpack command does not generate the DemoIdentity.jks keystore anymore. The DemoIdentity JKS file needs to be created manually, or the Node Manager has to be changed to non-SSL. After making sure the java in the PATH is the one used by the WebLogic domain, run:

cd /u02/config/domains/workshop_domain/security
java utils.CertGen -certfile democert -keyfile demokey -keyfilepass DemoIdentityPassPhrase -noskid
java utils.ImportPrivateKey -certfile democert.pem -keyfile demokey.pem -keyfilepass DemoIdentityPassPhrase -keystore DemoIdentity.jks -storepass DemoIdentityKeyStorePassPhrase -alias demoidentity
After this the Node Manager can be started successfully.
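
For completeness, the other option mentioned above, switching the Node Manager to a non-SSL listener, would roughly look like this (a sketch; adjust the domain path to your environment and keep in mind that a plain listener is less secure):

# $DOMAIN_HOME/nodemanager/nodemanager.properties
SecureListener=false

After restarting the Node Manager, the machine’s Node Manager type also has to be set to “Plain” in the WebLogic administration console so that the Administration Server connects without SSL.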

This article Node Manager not starting after using unpack to install a WebLogic domain on a remote machine. appeared first on Blog dbi services.

How to change oam-config.xml to adapt to Oracle Access Manager configuration change


After upgrading Oracle Access Manager from 11.1.2.3 to 12.2.1.3, I extended the WebLogic domain with User Messaging Services to be able to use the Forgot Password feature.
The Oracle Support note ID 2302623.1 gives a good example of how to do this:
Oracle Access Manager 12cps3 (OAM 12.2.1.3.0) Sample Application Demonstrates Forgot Password Flow Using Multi Factor Authentication REST APIs (Doc ID 2302623.1).

But the OAM AdaptiveAuthenticatorPlugin was missing the UmsClientUrl property, and without it there is no way to send mails or SMS with the security token to the user requesting the reset of his password.

I decided to add the missing property to the oam-config.xml file.
During the modification, I also incremented the version of the oam-config.xml to inform OAM about the configuration change.
To my surprise, my modifications were reset after the Administration Server was restarted.

This has changed starting from OAM 12.2.1.3: the oam-config is now stored in the metadata database and needs to be exported, modified and re-imported into the database. The steps are described in the documentation (here).

I then had to follow those steps as shown below.
First set JAVA_HOME and PATH to point to the right Java version:
$ export JAVA_HOME=/u00/app/oracle/product/Java/jdk
$ export PATH=$JAVA_HOME/bin:$PATH

The export requires a properties file defining the connection to the OAM repository:

$ more dbschema.properties
oam.entityStore.ConnectString=jdbc:oracle:thin:@vm02:1522/IDM
oam.entityStore.schemaUser=IAM_OAM
oam.entityStore.schemaPassword=Welcome1
oam.importExportDirPath=/home/oracle/OAM_CONFIG
oam.frontending=params=vm03;14100;http
[oracle@vm03 OAM_CONFIG]$
Export the oam-config.xml file:

$ java -cp /u01/app/fmw_oim_12213/idm/oam/server/tools/config-utility/config-utility.jar:/u01/app/fmw_oim_12213/oracle_common/modules/oracle.jdbc/ojdbc8.jar oracle.security.am.migrate.main.ConfigCommand /u02/app/config/domains/idm_domain/ export dbschema.properties
Oct 15, 2018 6:40:44 PM oracle.security.am.migrate.main.command.CommandFactory getCommand
INFO: executable operation: export
oam.exportDirPath=/home/oracle/OAM_CONFIG
oam.exportedFile=oam-config.xml
oam.operation.time=2654
This exports the oam-config.xml file into the local directory. Modify this file and import it back into the DB:

$ java -cp /u01/app/fmw_oim_12213/idm/oam/server/tools/config-utility/config-utility.jar:/u01/app/fmw_oim_12213/oracle_common/modules/oracle.jdbc/ojdbc8.jar oracle.security.am.migrate.main.ConfigCommand /u02/app/config/domains/idm_domain/ import dbschema.properties
Oct 15, 2018 6:43:25 PM oracle.security.am.migrate.main.command.CommandFactory getCommand
INFO: executable operation: import
Oct 15, 2018 6:43:27 PM oracle.security.am.migrate.util.ConfigFileUtil replaceValue
INFO: 191 will be replaced by 192
Oct 15, 2018 6:43:28 PM oracle.security.am.migrate.operation.ImportConfigOperation invoke
INFO: imported config file version to database:192
oam.importDirPath=/home/oracle/OAM_CONFIG
oam.importedFile=oam-config.xml
oam.importedVersion=192
oam.operation.time=2214
During the import, the version is incremented automatically. Take care not to have typos in the oam-config.xml file you import, as I’m not sure there is any validation before the import and the OAM schema could be corrupted.
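
One thing I would do before the import is at least check that the file is still well-formed XML (xmllint is my own suggestion here, it is not part of the documented procedure):

$ xmllint --noout oam-config.xml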

This article How to change oam-config.xml to adapt to Oracle Access Manager configuration change appeared first on Blog dbi services.


Short summary of my PGCONF.EU 2018 conference


So it’s nearly the end of the conference, where I met lots of people from the Postgres community, customers, our EDB partner and also contributors/developers of Postgres. During the 3 days I also had the chance to follow many technical sessions, and I will give you short feedback on my preferred sessions.
If you want the details of the sessions, go to the conference website https://www.postgresql.eu/events/pgconfeu2018/schedule


Tour de Data Types: VARCHAR2 or CHAR(255)?

A very interesting session from Andreas Scherbaum, where he presented the most interesting data types with use cases and examples.

What’s new in PostgreSQL 11

Magnus is a well-known presenter, and he gave a good and funny presentation of the new features in Postgres 11.

CREATE STATISTICS – what is it for?

For me it was one of the best presentations: it did not deliver a huge amount of information, but the quality of the presentation was just perfect. I directly downloaded the slides and added them to my knowledge book: https://github.com/tvondra/create-statistics-talk

Ace it with ACID: PostgreSQL transactions for fun and profit

The presentation was very good, especially if you want to trust your database.

An answer to PostgreSQL bloat woes

Thanks Amit for the presentation of the new storage engine ZHEAP and for sharing the information. Directly after the session, Daniel and I started to test this new storage engine. If you want to know more about this topic, see my blog about ZHEAP.

AUTO PLAN TUNING USING FEEDBACK LOOP

It’s nice to know which new optimizer tuning features will come out in the next years (FEEDBACK, ADVISOR, AUTONOMOUS, etc.). I am waiting impatiently to get access to the project to be able to test these new features.

See you next year !

This article Short summary of my PGCONF.EU 2018 conference appeared first on Blog dbi services.

Reimaging an old X3-2/X4-2 ODA


Introduction

X3-2 and X4-2 ODAs are still very capable pieces of hardware in 2018. With 256GB of RAM and at least 16 cores per node, and with 18TB raw disk capacity as a standard, these appliances are far from obsolete, even if you probably no longer have hardware support from Oracle.
If you own several ODAs of this kind, hardware support may not really be a problem: if something fails, you can use another ODA for spare parts.

You probably missed some patches on your old ODAs. Why? Maybe because it’s not so easy to patch, and it’s even more difficult if you don’t patch regularly. Or maybe just because you don’t want to add more tasks to your job (applying every single patch feels like never-ending patching).

So if you want to give a second life to your ODA, you’d better reimage it.

Reimaging: how to do?

Reimaging is the best way to clean up your ODA. Current deployment packages are certified for all ODAs except the V1 (the first generation, before the X3-2).

You first have to download all the needed files from MOS. Pay attention to download the deployment packages for OAKCLI stack because ODACLI is limited to lite and newer ODAs.

Assuming you’re using a bare metal configuration and you want to deploy the latest ODA version 12.2.1.4.0, you will need the following files :

  • 12999313 : ISO for reimaging
  • 28216780 : patch for the OAKCLI stack (because reimaging actually does not update BIOSes and firmware)
  • 12978712 : appliance server for the OAKCLI stack
  • 17770873, 19520042 and 27449599 : RDBMS clones for databases 11.2.0.4, 12.1.0.2 and 12.2.0.1

Network configuration and disk configuration didn’t change: you still need to provide all the IPs, VIPs, DNS and so on for the network, and the disk configuration options are still not so clear: “external backup” means that you will get an 85/15 split between DATA and RECO instead of the default 40/60 split. Don’t forget that you can change the redundancy level for each ASM diskgroup: DATA, RECO and REDO can use high redundancy, but normal redundancy will give you 50% more free space (18TB raw is 9TB usable in normal redundancy and 6TB usable in high redundancy).

Step 1 – Connect the ISO as a CDROM through ILOM interface and reimage the servers

I won’t give you the extensive procedure for this part: nothing has changed regarding ODA reimaging during last years.

The first step is to connect to the ILOM and virtually plug the ISO image into the server. Then select the CDROM as the next boot device and do a power cycle of the server. You’ll have to repeat this on the other node too. Reimaging lasts about 1h and is fully automatic. The last step is still the longest one (the post-installation procedure). Once the reimaging is done, each node should have a different default name: oak1 for node 0 and oak2 for node 1 (weird). If both nodes are named oak1, please check the cables connected to the shared storage: they must be connected according to the setup poster.

Step 2 – Configure basic network settings

Reimaging always ends with a reboot and, depending on the appliance, it will ask you which kind of network you plan to use: copper or fiber. Then, through the ILOM, you need to launch the configure firstnet script:

/opt/oracle/oak/bin/oakcli configure firstnet

Repeat this configuration step on the second node. Now your nodes are visible through the network.

Step 3 – Deploy, cleanup, deploy…

Reimaging was so easy… but from now on it will be a little more tricky. You now need to deploy the appliance, meaning: configure the complete network settings, install the whole Oracle stack with Grid Infrastructure, ASM and the latest database engine, and eventually create a first database. You will need a graphical interface to configure all these parameters and launch the deployment. So, from the ILOM session, let’s unpack the necessary files, start a graphical Linux session and launch the deployment GUI.

oakcli unpack -package /opt/dbi/p12978712_122140_Linux-x86-64_1of2.zip
oakcli unpack -package /opt/dbi/p12978712_122140_Linux-x86-64_2of2.zip
oakcli unpack -package /opt/dbi/p27449599_122140_Linux-x86-64.zip
startx
oakcli deploy

The graphical interface will help you configure all the parameters, but don’t deploy straight away. Back up the configuration file and then edit it:

vi /opt/dbi/deploy_oda01

Review all the parameters and adjust them to perfectly match your needs (most of these parameters cannot be changed afterwards).

Now you can launch the real deployment and select your configuration file in the graphical interface:

oakcli deploy

The first try will fail, and this is normal behaviour. The failure is caused by the ASM headers: they are still written on the disks in the storage shelf, since reimaging did nothing on these disks, and already-configured ASM disks will make the deployment process fail. Now you can exit the deployment and do a cleanup of the failed attempt.

/opt/oracle/oak/onecmd/cleanupDeploy.pl

Unfortunately you cannot do the cleanup if nothing is deployed yet, so you need this first failing attempt. Alternatively, you can do the cleanup before reimaging, or manually clean all the disk headers and partitions on the 20 disks before trying to deploy (with dd), but it probably won’t be faster.

When the cleanup is done, the ODA will reboot and you’ll have to configure the firstnet again from the ILOM on both nodes.

/opt/oracle/oak/bin/oakcli configure firstnet

Finally, with a new graphical session you can restart the deployment, and this time, if your parameter file is OK, it will be successful. Yes!

startx
oakcli deploy

Step 4 – Patch the server

It seems weird, but reimaging actually doesn’t update the firmware, BIOS or ILOM of the servers, nor the firmware of the disks in the storage shelf. Understand that reimaging is only a software reimaging of the nodes. This is an example of an ODA X4-2 configuration just after reimaging and deploying the appliance:

oakcli show version -detail
System Version  Component Name            Installed Version         Supported Version
--------------  ---------------           ------------------        -----------------
12.2.1.4.0
Controller_INT            11.05.03.00               Up-to-date
Controller_EXT            11.05.03.00               Up-to-date
Expander                  0018                      Up-to-date
SSD_SHARED                944A                      Up-to-date
HDD_LOCAL                 A720                      A7E0
HDD_SHARED {
[ c2d0,c2d1,c2d2,c2d      A720                      A7E0
3,c2d4,c2d5,c2d6,c2d
7,c2d8,c2d9,c2d11,c2
d12,c2d13,c2d14,c2d1
5,c2d16,c2d17,c2d18,
c2d19 ] [ c2d10 ]                 A7E0                      Up-to-date
}
ILOM                      3.2.4.46.a r101689        4.0.2.27.a r123795
BIOS                      25030100                  25060300
IPMI                      1.8.12.4                  Up-to-date
HMP                       2.4.1.0.11                Up-to-date
OAK                       12.2.1.4.0                Up-to-date
OL                        6.9                       Up-to-date
GI_HOME                   12.2.0.1.180417(2767      Up-to-date
4384,27464465)
DB_HOME                   12.2.0.1.180417(2767      Up-to-date
4384,27464465)

Fortunately, you can apply the patch even if your ODA already reports the same software version as the patch. Well done Oracle.

So let’s register the patch files and do the patching of the servers (server will probably reboot):


oakcli unpack -package /opt/dbi/p282166780_122140_Linux-x86-64_1of3.zip
oakcli unpack -package /opt/dbi/p282166780_122140_Linux-x86-64_2of3.zip
oakcli unpack -package /opt/dbi/p282166780_122140_Linux-x86-64_3of3.zip
oakcli update -patch 12.2.1.4.0 --server
...


oakcli show version -detail

System Version  Component Name            Installed Version         Supported Version
--------------  ---------------           ------------------        -----------------
12.2.1.4.0
Controller_INT            11.05.03.00               Up-to-date
Controller_EXT            11.05.03.00               Up-to-date
Expander                  0018                      Up-to-date
SSD_SHARED                944A                      Up-to-date
HDD_LOCAL                 A7E0                      Up-to-date
HDD_SHARED {
[ c2d0,c2d1,c2d2,c2d      A720                      A7E0
3,c2d4,c2d5,c2d6,c2d
7,c2d8,c2d9,c2d11,c2
d12,c2d13,c2d14,c2d1
5,c2d16,c2d17,c2d18,
c2d19 ] [ c2d10 ]                 A7E0                      Up-to-date
}
ILOM                      4.0.2.27.a r123795        Up-to-date
BIOS                      25060300                  Up-to-date
IPMI                      1.8.12.4                  Up-to-date
HMP                       2.4.1.0.11                Up-to-date
OAK                       12.2.1.4.0                Up-to-date
OL                        6.9                       Up-to-date
GI_HOME                   12.2.0.1.180417(2767      Up-to-date
4384,27464465)
DB_HOME                   12.2.0.1.180417(2767      Up-to-date
4384,27464465)

Great, our servers are now up-to-date. But storage is still not OK.

Step 5 – Patch the storage

Patching the storage is quite easy (server will probably reboot):

oakcli update -patch 12.2.1.4.0 --storage
...

oakcli show version -detail

System Version  Component Name            Installed Version         Supported Version
--------------  ---------------           ------------------        -----------------
12.2.1.4.0
Controller_INT            11.05.03.00               Up-to-date
Controller_EXT            11.05.03.00               Up-to-date
Expander                  0018                      Up-to-date
SSD_SHARED                944A                      Up-to-date
HDD_LOCAL                 A7E0                      Up-to-date
HDD_SHARED                A7E0                      Up-to-date
ILOM                      4.0.2.27.a r123795        Up-to-date
BIOS                      25060300                  Up-to-date
IPMI                      1.8.12.4                  Up-to-date
HMP                       2.4.1.0.11                Up-to-date
OAK                       12.2.1.4.0                Up-to-date
OL                        6.9                       Up-to-date
GI_HOME                   12.2.0.1.180417(2767      Up-to-date
4384,27464465)
DB_HOME                   12.2.0.1.180417(2767      Up-to-date
4384,27464465)

Everything is OK now!

Conclusion – A few more things

  • When redeploying, consider changing the redundancy of the diskgroups and the partitioning of the disks if needed. This can only be configured during deployment. The disk parameters are located in the deployment file (DISKGROUPREDUNDANCYS and DBBackupType)
  • Always check that all the components are up-to-date to keep your ODA in a consistent state. Check on both nodes because local patching is also possible, and it would make no sense if the nodes were running different patch levels
  • Don’t forget to check/apply your licenses on your ODA because using Oracle software is for sure not free
  • You have to know that a freshly redeployed ODA will have 12.2 database compatibility on diskgroups, making the use of acfs mandatory for your old databases. For me it’s a real drawback considering that acfs is adding useless complexity to ASM
  • Don’t forget to deploy the other dbhomes according to your needs

This article Reimaging an old X3-2/X4-2 ODA appeared first on Blog dbi services.

SQL Server 2019 availability groups – Introduction to R/W redirection


This is a second write-up about SQL Server 2019 CTP 2.0 and the new availability group features. This time the interesting one is about read/write redirection capabilities. A couple of months ago, I wrote about SQL Server 2017, its new read-scale capabilities and the listener management challenges depending on the operating system. Indeed, there are some scenarios where implementing a listener will not be as easy as with the common ones on top of the Windows operating system. These scenarios are listed in the BOL.

Without a listener, I would say that read-only connections are not a big deal because they are supposed to work regardless of the replica’s role, either primary or secondary. The game is not the same with read-write connections: the write part of the connection may trigger errors when it attempts to run write queries against a secondary and, guess what, this issue is addressed by the new read/write redirection capabilities of AGs in SQL Server 2019.

Let’s set the context of my lab environment:

[Figure: availability group read/write redirection schema]

 

This is a pretty simple environment that includes 2 replicas in synchronous mode. Automatic failover is obviously not available in a read-scale topology.

My first test was to attempt a R/W client connection against the secondary to see if it gets redirected to the primary. My configuration script includes the new READ_WRITE_ROUTING_URL required for the redirection to the primary.

:CONNECT WIN20161\SQL2019CTP2
CREATE AVAILABILITY GROUP AG2019   
WITH ( CLUSTER_TYPE =  NONE )  
FOR DATABASE  [AdventureWorks2016]   
REPLICA ON 
'WIN20161\SQL2019CTP2' WITH   
(  
	ENDPOINT_URL = 'TCP://WIN20161.dbi-services.test:5026',  
	AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,  
	FAILOVER_MODE = MANUAL,  
	-- Secondary role => we need to allow all connection types 
	-- R/O + R/W to allow R/W connections
	SECONDARY_ROLE (
		ALLOW_CONNECTIONS = ALL,   
		READ_ONLY_ROUTING_URL = 'TCP://WIN20161.dbi-services.test:1459' 
	),
	PRIMARY_ROLE (
		ALLOW_CONNECTIONS = READ_WRITE,   
		READ_ONLY_ROUTING_LIST = ('WIN20162\SQL2019CTP2'),
		READ_WRITE_ROUTING_URL = 'TCP://WIN20161.dbi-services.test:1459' 
	),  
	SESSION_TIMEOUT = 10,
	SEEDING_MODE = AUTOMATIC  
),   
'WIN20162\SQL2019CTP2' WITH   
(  
	ENDPOINT_URL = 'TCP://WIN20162.dbi-services.test:5026',  
	AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,  
	FAILOVER_MODE = MANUAL, 
	SECONDARY_ROLE (
		ALLOW_CONNECTIONS = ALL,   
		READ_ONLY_ROUTING_URL = 'TCP://WIN20162.dbi-services.test:1459' 
	),   
	PRIMARY_ROLE (
		ALLOW_CONNECTIONS = READ_WRITE,   
		READ_ONLY_ROUTING_LIST = ('WIN20161\SQL2019CTP2'),
		READ_WRITE_ROUTING_URL = 'TCP://WIN20162.dbi-services.test:1459' 
	), 
	SESSION_TIMEOUT = 10,
	SEEDING_MODE = AUTOMATIC
);
GO 

ALTER AVAILABILITY GROUP [AG2019] GRANT CREATE ANY DATABASE
GO

:CONNECT WIN20162\SQL2019CTP2
ALTER AVAILABILITY GROUP [AG2019] JOIN WITH (CLUSTER_TYPE = NONE);
GO  

ALTER AVAILABILITY GROUP [AG2019] GRANT CREATE ANY DATABASE
GO

 

The current state of my AG is as follows and meets all R/W redirection prerequisites: the WIN20162\SQL2019CTP2 secondary replica is online and the primary replica includes a READ_WRITE_ROUTING_URL as well.

[Figure: AG configuration]

We may notice the new columns in the sys.availability_replicas view shipped with this new SQL Server version, including the R/W URL settings as well as the primary connection mode.

Let’s try a R/W connection to the secondary replica WIN20162\SQL2019CTP2 with the following connection string. The ApplicationIntent parameter is not specified, meaning I will use R/W intent by default.

$connectionString = "Server=WIN20162\SQL2019CTP2; Integrated Security=False; uid=sa; pwd=xxxx; Initial Catalog=AdventureWorks2016;"
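
For reference, this is roughly the test harness I use around that connection string (my own sketch, the original checks are only visible in the screenshots); it simply asks the instance which server actually serves the session:

# open the connection and check on which replica the session ends up
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
$connection.Open()
$command = $connection.CreateCommand()
$command.CommandText = "SELECT @@SERVERNAME"
$command.ExecuteScalar()   # expected: WIN20161\SQL2019CTP2 after R/W redirection
$connection.Close()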

 

I got the expected result: my connection is transparently redirected to my primary replica WIN20161\SQL2019CTP2, as shown below:

[Figure: R/W connection test redirected to the primary]

Let’s switch my connection intent to read-only (connection parameter ApplicationIntent=ReadOnly):

$connectionString = "Server=WIN20162\SQL2019CTP2; Integrated Security=False; uid=sa; pwd=xxxx; Initial Catalog=AdventureWorks2016; ApplicationIntent=ReadOnly"

 

The connection stays on the read-only secondary, in line with my AG configuration:

[Figure: R/O connection test staying on the secondary]

This new capability did the expected job. Obviously, if you miss some prerequisites, redirection will not occur, but at the time of this write-up I didn’t find any “obvious” extended event to troubleshoot R/W routing events like the ones we already use for R/O routing. Probably in the next CTP…

In a nutshell, this new feature concerns R/W redirection from a secondary to a primary replica. Out of curiosity, I tried to perform the same test for R/O redirection with ApplicationIntent=ReadOnly, but without luck.

$connectionString = "Server=WIN20161\SQL2019CTP2; Integrated Security=False; uid=sa; pwd=xxxx; Initial Catalog=AdventureWorks2016; ApplicationIntent=ReadOnly"

[Figure: R/O redirection test without a listener]

To confirm my R/O topology is well configured, I implemented an AG listener as I did in a previous blog post about AG read-scale scenarios. For my test, I replaced the server value with the listener name in the following connection string. My AG listener is listening on port 1459…

$connectionString = "Server=lst-ag2019,1459; Integrated Security=False; uid=sa; pwd=xxxx; Initial Catalog=AdventureWorks2016; ApplicationIntent=ReadOnly"

 

… and it ran successfully, as shown below. My connection is correctly redirected from the primary – WIN20161\SQL2019CTP2 – to the secondary WIN20162\SQL2019CTP2:

[Figure: R/O redirection test through the listener]

Well, it seems that R/O redirection without using a listener is not implemented yet, but it is probably out of the scope of the SQL Server 2019 AG redirection capabilities.

See you!

Cet article SQL Server 2019 availability groups – Introduction to R/W redirection est apparu en premier sur Blog dbi services.

What I will take away from the PGCONF.EU 2018 Lisbon


You like non-technical blogs? This blog on the pgconf.eu 2018 Lisbon is for you ;-)

On PostgreSQL

Well, you've already heard about PostgreSQL, haven't you?
This is all about Relational Database Management Systems. In other words, the invisible part of IT.

PostgreSQL, or simply postgres, is a powerful, open source RDBMS with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance. Anyway, you have probably met the PostgreSQL blue elephant at least once in your life after a few beers, haven't you? I did!

elephant

On pgconf.eu

PostgreSQL Conference Europe, PGCONF.EU, is the largest PostgreSQL conference in Europe. This year’s PGCONF.EU took place in Lisbon and was the 10th edition of this amusing meeting and cooperation event for PostgreSQL users. All editions took place in major European cities to make it easy for as many people as possible to come to the conference:

2008 ‐ Prato, Italy
2009 ‐ Paris, France
2010 ‐ Stuttgart, Germany
2011 ‐ Amsterdam, Netherlands
2012 ‐ Prague, Czech Republic
2013 ‐ Dublin, Ireland
2014 ‐ Madrid, Spain
2015 ‐ Vienna, Austria
2016 ‐ Tallinn, Estonia
2017 ‐ Warsaw, Poland

It won't be a surprise to you if I tell you that the number of attendees has increased over the 10 years. But interestingly, this year's edition reached around 450 speakers and attendees, and the registration process had already been stopped at the end of September 2018 because the conference was sold out.

On Techies

Now you will ask: "Who are the typical users of Postgres?". "PostgreSQL users come from companies of all sizes all over the world. Postgres is global", says Marc Linster / EnterpriseDB (EDB). More than 200 papers were submitted for the pgconf.eu 2018 event, whereas only around 50 sessions took place, so the selection was very tough this year.

EDB Marc

At this point of the blog, I have to say that pgconf.eu is a real techie event. Non-techies are more or less welcome ;-) Techies went so far to prevent me from attending the event that they sent a bird into the engine of my EasyJet flight, which was first delayed and then canceled on Monday night. But I really wanted to attend this major techie event, so I took a TAP flight the next morning and arrived alive! We'll see what I can say about my journey back…

On dbi services at pgconf.eu

Once arrived at the Marriott Hotel Lisbon, our 3-man team installed the booth and live demo table on level 0 in Room Mediterranean, to welcome people potentially interested in the services of the Swiss PostgreSQL specialist dbi services.

sdr IMG_E6322 table demo

Unfortunately, all sessions took place on level -1, where it was possible to get coffee and catering, lunch and even access to the hotel park. So only 1/3 of the attendees went upstairs and visited the sponsors' booths. This should be better organized next time.

Apart from this issue, the organization was awesome: the welcome of the attendees and partners, the sessions, the location and the catering were great, as were the social events proposed by PostgreSQL Europe. During the event we also had the opportunity to spend some time and have excellent discussions with our partner EnterpriseDB.

IMG_6316 partners

food and so EDB dbi pic

Also Daniel had the opportunity to demonstrate the dbi OpenDB Appliance live to many interested people.

live demo OpenDB Appliance

Not to be forgotten, we enjoyed the very nice atmosphere in the city of the discoverers (Magellan, Vasco da Gama, and many others)…

IMG_6356 stairs porte praca do commerco

… and had another kind of “live demo” + a drink at Museu da Farmacia

demo museu da farmacia

Now it is time to say thank you and goodbye pgconf.eu !

Conclusion

For me, there were many lessons learned at pgconf.eu #10. One of them was how to write a blog. Many thanks to my mentor, Daniel Westermann! Also, many thanks to Hervé Schweitzer, especially for the night walk through Lisbon and the great non-alcoholic cocktail!

For you, well… at least you've got a more precise idea of what a non-technical guy attending a technical event will take away from it. And even if this is not the best blog you've ever read, not writing it would have been… bullshit!

See you ;-)

Cet article What I will take away from the PGCONF.EU 2018 Lisbon est apparu en premier sur Blog dbi services.

Password Verification Policy in MySQL 8.0.13


The new release 8.0.13 of MySQL has been available since last week.
Concerning security, this comes with a new feature already announced: the Password Verification Policy.
Let’s have a look…

The aim of this feature is to secure attempts to change a password by requiring the current one to be specified as the one to be replaced.
It is turned off by default:

mysql> show variables like 'password_require_current';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| password_require_current | OFF   |
+--------------------------+-------+

and we can activate it by several ways (as for some other password features):
1. Globally, at the server level:

mysql> set persist password_require_current='ON';
mysql> show variables like 'password_require_current';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| password_require_current | ON    |
+--------------------------+-------+

2. On a per-account-basis, and if we want to force the verification of the old password:

mysql> create user olivier@localhost identified by 'MySQLisPowerful' PASSWORD REQUIRE CURRENT;

3. On a per-account-basis, and if we want to make the verification of the old password optional:

mysql> create user gregory@localhost identified by 'SecurityIsImportant' PASSWORD REQUIRE CURRENT OPTIONAL;
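For existing accounts, the per-account setting can also be changed afterwards with ALTER USER (a quick sketch; DEFAULT means the account follows the global password_require_current setting again):

mysql> alter user gregory@localhost password require current;
mysql> alter user olivier@localhost password require current optional;
mysql> alter user olivier@localhost password require current default;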

Suppose that we have activated it at the server level, now let’s create one user account:

mysql> create user elisa@localhost identified by 'manager';

If we try to change the password for this user, we can do that without specifying the current one:

mysql> alter user elisa@localhost identified by 'WhatsTheProblem';

Why? Because we are connected as the root account. Actually accounts which have the ‘CREATE USER’ or ‘UPDATE on mysql.*’ privileges are not affected by this policy.

So if we try to connect as our user ‘elisa’ and to change our password:

mysql> select user();
+-----------------+
| user()          |
+-----------------+
| elisa@localhost |
+-----------------+
mysql> alter user elisa@localhost identified by 'GoodVibes';
ERROR 13226 (HY000): Current password needs to be specified in the REPLACE clause in order to change it.

that is not possible. We can only do that if we specify our old password in the ‘ALTER USER’ statement through the ‘REPLACE’ clause:

mysql> alter user elisa@localhost identified by 'GoodVibes' replace 'WhatsTheProblem';
Query OK, 0 rows affected (0.12 sec)

Simple, isn’t it?
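To review which accounts enforce the policy, the mysql.user system table should expose it as a dedicated column (assuming the column name password_require_current in 8.0.13):

mysql> select user, host, password_require_current from mysql.user where user in ('elisa', 'olivier', 'gregory');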
As a best practice in terms of security, I suggest you activate this functionality in your MySQL environment.
For more information about the new security features in MySQL 8.0, check the MySQL Documentation and come to my session MySQL 8.0 Community: Ready for GDPR? at the DOAG.

Cet article Password Verification Policy in MySQL 8.0.13 est apparu en premier sur Blog dbi services.

How to fix OUI-10022 error on an ODA


When manually upgrading Grid Infrastructure on an ODA according to MOS note 2379389.1, the upgrade has to be run as the grid user. This can fail with an OUI-10022 error, which indicates that the Oracle inventory is corrupt.
But when running "opatch lsinventory" as the oracle and as the grid user, both commands succeeded, so the inventory seemed to be fine.
It turned out that the locks subdirectory of the Oracle inventory was not writable for the grid user. After making it writable for the grid user, the upgrade ran fine.
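A minimal sketch of the check and the fix, assuming the default inventory location /u01/app/oraInventory (adjust the path to what /etc/oraInst.loc points to) and that grid belongs to the inventory group:

[root@oda ~]# grep inventory_loc /etc/oraInst.loc
[root@oda ~]# ls -ld /u01/app/oraInventory/locks
[root@oda ~]# chmod g+w /u01/app/oraInventory/locks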

Cet article How to fix OUI-10022 error on an ODA est apparu en premier sur Blog dbi services.

Foglight: Monitoring solution for databases [Part 02]


Foglight is a powerful, all-in-one monitoring solution for various IT infrastructure assets like databases, servers, applications and so on. Whenever you need to monitor a specific asset like an Oracle database, a Windows server, a Tomcat server or any other component, you can do so by adding a Cartridge. A Cartridge is like a plugin for a specific technology. As we have already seen how to install Foglight in a previous article here, in this one we are going to set up the monitoring of an Oracle database.

Configuration

The configuration is done in the web console, so we first need to log in: https://192.168.56.100:8443
My user/password is the default foglight/foglight and here I am:

Capture

Email for notifications

Dashboards > Administration > Setup > Email Configuration

screen2

Users and external authentication

screen3

It is possible to use an external authentication method by configuring one of the following providers:

  • Microsoft® Active Directory®
  • Oracle® Directory Server Enterprise Edition
  • OpenLDAP®
  • Novell® eDirectory™

More information here.

Installing and Upgrading Cartridges

Each Foglight cartridge contains extensions for monitoring a specific environment, such as applications, operating systems, or database management systems. Cartridges are installed on the server. A cartridge can contain one or more agents that are used to collect data from monitored environments. To manage them we need to go to the Cartridges area:

screen1

Many cartridges for databases are available here: https://support.quest.com/foglight-for-databases/5.9.2/download-new-releases

Deploying an agent

  • Agents collect data from monitored environments and send it to the Management Server
  • Each agent type can monitor a specific part of your environment, such as an operating system, application, or server
  • Foglight cartridges that you install on the server include one or more agent types
  • A server installation includes an embedded Foglight Agent Manager and it starts up and stops with the Foglight Management Server
  • Agents can be installed in silent mode (see here)

To deploy an agent, we need to use the Agent Manager in Dashboards > Administration > Agents > Agent Managers:

screen4

After downloading the agent installer, log in to the target server to be monitored and install the agent:


[root@oracle-server ~]# mkdir /opt/foglight
[root@oracle-server ~]# chown foglight. /opt/foglight
[root@oracle-server ~]# mv FglAM-5_9_2-linux-x86_64.bin /opt/foglight/
[root@oracle-server ~]# chown foglight. /opt/foglight/FglAM-5_9_2-linux-x86_64.bin
[root@oracle-server ~]# su - foglight
Last login: Tue Oct 2 00:13:43 CEST 2018 on pts/0
[foglight@oracle-server ~]$ cd /opt/foglight/
[foglight@oracle-server foglight]$ ./FglAM-5_9_2-linux-x86_64.bin ## --allow-unsecured
2018-10-02 15:45:19.000 INFO [native] Extracting to a temporary directory
2018-10-02 15:45:20.000 INFO [native] Foglight Agent Manager version 5.9.2 (build 5.9.2-201712050154-RELEASE-101) starting ...
2018-10-02 15:45:23.000 INFO [native] Extraction complete, configuring
[0/2]: -Dinstaller.executable.name=FglAM-5_9_2-linux-x86_64.bin
[1/2]: -Dinstaller.executable.path=/opt/foglight
2018-10-02 15:45:24.278 INFO Foglight Agent Manager: 5.9.2 (build 5.9.2-201712
050154-RELEASE-101)

License Agreement
===============================================================================
You must accept the following license agreement to install and use Foglight Agent Manager.

Do you accept the terms of the license agreement? [Y/N]: Y

Choose Install Location
===============================================================================
Where would you like to install the Foglight Agent Manager application?

Install directory (default: /opt/quest/foglightagentmanager):
/opt/foglight/agent

The directory "/opt/foglight/agent" does not exist. Would you like to create it? [Y/N] (default: Y):

Host Display Name
===============================================================================
Foglight Agent Manager identifies itself using the detected host name for the
computer it has been installed on. This can be overridden below to provide an
alternate name if the hostname is not stable or is already in use by another
computer.

Detected Host Name: [default: oracle-server]:

Log a warning if the detected host name changes? [Y/N] (default: Y):

Server URLs
===============================================================================
Configure the URLs that the Foglight Agent Manager process will use to communicate with the management server.
For fail-over purposes you can configure multiple URLs.

You have the following options:
1) Add a new management server URL
2) Add a new SSL Certificate CA
3) Test connectivity
4) Search for additional HA servers
5) List configured management server URLs
6) List SSL Certificate CAs
7) Delete a configured management server URL
8) Delete an SSL Certificate CA
0) Continue with the next step

What would you like to do? 1

Enter the URL data to add. The management server URL parameters are specified as comma separated name=value pairs. For example:
url=https://localhost:8443,address=127.0.0.1,proxy=http://proxy.server,ssl-allow-self-signed=false,ssl-cert-common-name=name.com

Available parameters are:
url: Required. The URL that Foglight Agent Manager will connect to.
proxy: Optional. The URL of the proxy to use when connecting.
proxy-user: Optional. The username sent to the proxy.
proxy-pass: Optional. The password sent to the proxy.
proxy-ntlm-domain: Optional. The NTLM domain sent to the proxy.
address: Optional. The local network address from which connections to the management server will be made.
ssl-allow-self-signed: Optional. True to allow self-signed certificates to be accepted; false (default) otherwise.
ssl-cert-common-name: Optional. The common name contained in the management servers certificate.
compressed: Optional. True (default) to use GZIP compression when sending and receiving data.

URL: https://192.168.56.100:8443,ssl-allow-self-signed=true
The URL has been added.

You have the following options:
1) Add a new management server URL
2) Add a new SSL Certificate CA
3) Test connectivity
4) Search for additional HA servers
5) List configured management server URLs
6) List SSL Certificate CAs
7) Delete a configured management server URL
8) Delete an SSL Certificate CA
0) Continue with the next step

What would you like to do? 3
Testing connectivity...
0%... 100% finished

You have the following options:
1) Add a new management server URL
2) Add a new SSL Certificate CA
3) Test connectivity
4) Search for additional HA servers
5) List configured management server URLs
6) List SSL Certificate CAs
7) Delete a configured management server URL
8) Delete an SSL Certificate CA
0) Continue with the next step

What would you like to do? 0

You have some untested/broken management server URLs configured. Are you sure you want to continue? [Y/N] (default: Y):

Downstream Connection Configuration
===============================================================================
Foglight Agent Manager can accept incoming connections and be configured as a concentrator that acts as an intermediary connection that aggregates multiple downstream Foglight Agent Manager clients. A concentrator configuration can provide a single connection through either a firewall or proxy for all downstream clients, or as an aggregated connection directly to the server.

Enabling this install as a concentrator will pre-configure queue and heap sizes to support accepting and transferring data from one or more downstream connections.

Enable Concentrator support? [Y/N] (default: N):

Secure Launcher
===============================================================================
Some agents require elevated permissions in order to gather the required system
metrics. These agents are launched using an external loader to give them the required access.

Please see the Foglight Agent Manager documentation for more information on agent security settings.

Secure launcher (default: /bin/sudo):

UNIX init.d Script
===============================================================================
Foglight Agent Manager will be configured to start when this host is rebooted by adding an init.d-style script.

Would you like to customize the start-up script? [Y/N] (default: N):

Summary
===============================================================================
Foglight Agent Manager has been configured and will be copied into its final location.

Press to continue.

Beginning work ...
===============================================================================
Calculating install size...
0%... 100% finished

Copying files...
0%... 10%... 20%... 30%... 40%... 50%... 60%... 70%... 80%... 90%... 100% finished

Created init.d installer script:

/opt/foglight/agent/state/default/fglam-init-script-installer.sh

Since this installation is not being performed by a user
with root privileges, the init.d script will not be installed.
A copy of the script and the script installer have been saved
in the state directory for later use.

The Foglight Agent Manager process will be started once this configuration process exits.

Then we can see that the agent is running:

[foglight@oracle-server foglight]$ /opt/foglight/agent/bin/fglam --status
0
#0: Process running normally
#1: Process not running, but PID file exists
#3: Process not running
#4: Process status unknown
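Based on that output, a tiny wrapper can be used in monitoring scripts; this is just a sketch that parses the first line printed by --status (0 meaning the process is running normally):

[foglight@oracle-server foglight]$ status=$(/opt/foglight/agent/bin/fglam --status | head -1)
[foglight@oracle-server foglight]$ if [ "$status" = "0" ]; then echo "Foglight agent is running"; else echo "Foglight agent is not running normally (status $status)"; fi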

I hope this helps and please do not hesitate to contact us should you need more details.

Cet article Foglight: Monitoring solution for databases [Part 02] est apparu en premier sur Blog dbi services.


5 mistakes you should avoid with Oracle Database Appliance


Introduction

I have been working on ODA for 5 years now and I have a quite good overview of the pros and cons of such a solution. On one hand, ODA can greatly streamline your Oracle Database environment if you understand the purpose of this appliance. On the other hand, ODA can be your nightmare if you're not using it as it is supposed to be used. Let's discover the common mistakes to avoid from the very beginning.

1st mistake – Consider the ODA as an appliance

Appliances are very popular these days: you need one feature, you buy a box that handles this feature and nothing else, you plug it in and it works straight away. Unfortunately, you cannot do that with an ODA. First of all, the embedded software is not clever enough to bundle all the features needed to simplify the DBA's job. No doubt it will help you with faster deployment compared to home-made environments, but all the tasks the DBA did before still exist. The appliance part is limited to basic operations. With an ODA, you will have to connect in terminal mode, as the root, grid and oracle users, to do a lot of operations. And you will need Linux skills to manage your ODA: it can be a problem if you are a Windows-only user. ODA provides a graphical interface, but it's not something you will use very often. There is still a lot of work for Oracle to do before the ODA earns the appliance moniker.

2nd mistake – Consider the ODA as a normal server

The second mistake is to consider the ODA as a normal server, because it really looks like a normal server.

On the software side, if you ask a DBA to connect to your Oracle environment on ODA, he probably won't see that it's actually an ODA, unless the server name contains the oda word :-) The only tool that differs from a normal server is the oakcli/odacli appliance manager, a command-line tool created to manage several features like database creation/deletion. What can be dangerous is that you will have system-level access on the server, and all the advantages that come with it. But if you make some changes on your system, for example by installing a new package, manually changing the network configuration or tuning some Linux parameters, it can later prevent you from applying the next patch. The DBA should also refrain from patching a database with the classic patches available for Linux systems. Doing that will make your dbhome and the related databases no longer manageable by the appliance manager. Wait for the ODA-dedicated quarterly patch bundle if you want to patch.

On the hardware side, yes, an ODA looks like a normal server, with free disk slots in the front. But you always have to order extensions from the ODA catalog, and you cannot do whatever you want. You need to swap disks for bigger ones? It's not possible. You want to add 2 new disks? Not possible either: disks are sold as a 3-disk pack and are supposed to be installed together. ODAs have limitations. The small one cannot be extended at all. And the other ones support limited extensions (in number and in time).

Please keep your ODA away from Linux gurus and hardware geeks. And stick with the recommended configurations.

3rd mistake – Compare the ODA hardware to other systems

When you consider ODA, it's quite common to compare the hardware to what other brands can offer for the same amount of money. But it's clearly not a good comparison. You'll probably get more for your money from other brands. You should consider the ODA as a hardware and software bundle made to last longer than a normal system. As an example, I deployed my first ODA X4-2 in May 2014, and the current software bundle is still compatible with this ODA. Support for this ODA will end in February 2020: nearly 6 years of updates for the whole stack, for a server that is able to run 11.2.0.4, 12.1.0.2 and 12.2.0.1 databases. Do the other brands offer that? I don't think so.

What you may not immediately realize is how fast the adoption of ODA can be. Most of the ODA projects I did went to full production within 1 year, starting from the initial ODA consideration. On a classic project, choosing the server/storage/software takes longer, deployment lasts longer because multiple people are involved, you sometimes get stuck with hardware/software compatibility problems, and you have no guarantee about the performance even if you choose the best components. ODA reduces the duration and the cost of a project for sure.

4th mistake – Buy only one ODA

If you consider buying just one ODA, you probably need to think twice. Unless you do not plan to patch regularly, this is probably not a good solution. Patching is not a zero-downtime operation, and it's not reversible. Even if ODA patch bundles simplify patching, it's still a complex operation, especially when the patch updates the operating system and the Grid Infrastructure components. Remember that one of the big advantages of an ODA is the availability of a new patch every 3 months to update the whole stack: firmware, BIOS, ILOM, operating system, Grid Infrastructure, Oracle homes, Oracle databases, … So if you want to secure the patching process, you'd better go for 2 ODAs, one for production databases and one for dev/test databases for example. And it makes no sense to move only a part of your databases to ODA, leaving the other databases on classic systems.

Another advantage of 2+ ODAs, if you're lucky enough to use Enterprise Edition, is the free use of Data Guard (without Active mode – the standby database will stay mounted only). Most often, thinking about ODA also means thinking about disaster recovery solutions. And both are better together.

5th mistake – Manage your Enterprise licenses as you always did

One of the key features of the ODA is the ability to scale the Enterprise licenses, starting from 1 PROC license on a single ODA (or 25 named users). 1 license covers only 2 cores on ODA. Does it make sense on this platform to have a limited number of licenses? The answer is yes and yes. Oracle recommends at least one core per database, but it's not a problem to deploy 5 or even 10 (small) databases with just 1 license, there is no limit for that. Apart from the CPU limitation (applying the license will limit the available cores), ODA has quite a big amount of RAM (please tune your SGA accordingly) and fast I/O that makes reads from disk not so expensive. CPU utilization will be optimized.

What I mean is that you probably need fewer licenses on ODA than on a normal system. You can then spread these licenses across more ODAs and/or decrease the number of licenses you need. The ODA hardware is sometimes self-financed by the savings on licenses. Keep in mind that 1 Oracle Enterprise PROC license costs more than the medium-size ODA. And you can always increase the number of licenses if needed (capacity on demand).

Buying ODA hardware can be cheaper than you think.

Conclusion

ODA is a great piece of hardware, and knowing what it is designed for and how it works will help you better manage your Oracle Database environment.

Cet article 5 mistakes you should avoid with Oracle Database Appliance est apparu en premier sur Blog dbi services.

Some more zheap testing


Hervé already did some tests with zheap and documented his results yesterday. After some more discussions with Amit, who did the session about zheap at the conference here in Lisbon (you can find the slides here), I thought it might be a good idea to do some more testing on this probably upcoming feature. Let's go.

If you want to test it on your own, here is a simple script that clones the repository, compiles and installs from source and then starts the PostgreSQL instance:

postgres@pgbox:/home/postgres/ [ZHEAP] cat refresh_zheap.sh 
#!/bin/bash

rm -rf zheap
git clone https://github.com/EnterpriseDB/zheap
cd zheap
PGHOME=/u01/app/postgres/product/zheap/db_1/
SEGSIZE=2
BLOCKSIZE=8
./configure --prefix=${PGHOME} \
            --exec-prefix=${PGHOME} \
            --bindir=${PGHOME}/bin \
            --libdir=${PGHOME}/lib \
            --sysconfdir=${PGHOME}/etc \
            --includedir=${PGHOME}/include \
            --datarootdir=${PGHOME}/share \
            --datadir=${PGHOME}/share \
            --with-pgport=5432 \
            --with-perl \
            --with-python \
            --with-openssl \
            --with-pam \
            --with-ldap \
            --with-libxml \
            --with-libxslt \
            --with-segsize=${SEGSIZE} \
            --with-blocksize=${BLOCKSIZE} \
	    --with-systemd
make all
make install
cd contrib
make install
rm -rf /u02/pgdata/zheap
/u01/app/postgres/product/zheap/db_1/bin/initdb -D /u02/pgdata/zheap
pg_ctl -D /u02/pgdata/zheap start
psql -c "alter system set logging_collector='on'" postgres
psql -c "alter system set log_truncate_on_rotation='on'" postgres
psql -c "alter system set log_filename='postgresql-%a.log'" postgres
psql -c "alter system set log_line_prefix='%m - %l - %p - %h - %u@%d '" postgres
psql -c "alter system set log_directory='pg_log'" postgres
pg_ctl -D /u02/pgdata/zheap restart -m fast

First of all, when you start up PostgreSQL you'll get two new background worker processes:

postgres@pgbox:/home/postgres/ [ZHEAP] ps -ef | egrep "discard|undo"
postgres  1483  1475  0 14:40 ?        00:00:00 postgres: discard worker   
postgres  1484  1475  0 14:40 ?        00:00:01 postgres: undo worker launcher   
postgres  1566  1070  0 14:51 pts/0    00:00:00 grep -E --color=auto discard|undo

The “discard worker” is responsible for getting rid of all the undo segments that are not required anymore and the “undo worker launcher” obviously is responsible for launching undo worker processes for doing the rollbacks.

There is a new parameter which controls the default storage engine (at least the parameter is there as of now, maybe that will change in the future), so let's change that to zheap before we populate a sample database ("heap" is the default value):

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "alter system set storage_engine='zheap'" postgres
ALTER SYSTEM
Time: 12.722 ms
postgres@pgbox:/home/postgres/ [ZHEAP] pg_ctl -D $PGDATA restart -m fast
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "show storage_engine" postgres
 storage_engine 
----------------
 zheap
(1 row)

Let's use pgbench to create the sample data:

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "create database zheap" postgres
CREATE DATABASE
Time: 763.284 ms
postgres@pgbox:/home/postgres/ [ZHEAP] time pgbench -i -s 100 zheap
...
done.

real	0m23.375s
user	0m2.293s
sys	0m0.772s

That should have created the tables using the zheap storage engine:

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "\d+ pgbench_accounts" zheap
                                  Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default | Storage  | Stats target | Description 
----------+---------------+-----------+----------+---------+----------+--------------+-------------
 aid      | integer       |           | not null |         | plain    |              | 
 bid      | integer       |           |          |         | plain    |              | 
 abalance | integer       |           |          |         | plain    |              | 
 filler   | character(84) |           |          |         | extended |              | 
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Options: storage_engine=zheap, fillfactor=100

When we do the same using the "heap" storage format, how long does that take?

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "alter system set storage_engine='heap'" postgres
ALTER SYSTEM
Time: 8.790 ms
postgres@pgbox:/home/postgres/ [ZHEAP] pg_ctl -D $PGDATA restart -m fast
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "create database heap" postgres
CREATE DATABASE
Time: 889.847 ms
postgres@pgbox:/home/postgres/ [ZHEAP] time pgbench -i -s 100 heap
...

real	0m30.471s
user	0m2.355s
sys	0m0.419s
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "\d+ pgbench_accounts" heap
                                  Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default | Storage  | Stats target | Description 
----------+---------------+-----------+----------+---------+----------+--------------+-------------
 aid      | integer       |           | not null |         | plain    |              | 
 bid      | integer       |           |          |         | plain    |              | 
 abalance | integer       |           |          |         | plain    |              | 
 filler   | character(84) |           |          |         | extended |              | 
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Options: fillfactor=100

postgres@pgbox:/home/postgres/ [ZHEAP] 

I ran that test several times and the difference of about 5 to 6 seconds is consistent. zheap is faster here, but that is because of vacuum: when you run the same test again but skip the vacuum at the end (the "-n" option of pgbench), heap is faster:

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "create database heap" postgres
CREATE DATABASE
Time: 562.155 ms
postgres@pgbox:/home/postgres/ [ZHEAP] time pgbench -i -n -s 100 heap
done.

real	0m21.650s
user	0m2.316s
sys	0m0.225s

But anyway: as zheap has to create undo segments, more needs to go to disk initially. heap needs to run vacuum, not immediately, but for sure some time later. When you compare a pure insert-only workload, without vacuum, heap is faster. The great thing is that you can decide what you want to use at the table level (see the sketch below). Some tables might be better created with the zheap storage engine, others may be better created with heap. The important bit is that you have full control.
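A minimal sketch of that per-table choice, assuming the table-level WITH option matches what \d+ displays for the pgbench tables above (the exact syntax may still change before this lands in core); t1 and t2 are just throw-away example tables:

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "create table t1 ( a int, b text ) with (storage_engine='zheap')" zheap
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "create table t2 ( a int, b text ) with (storage_engine='heap')" zheap
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "\d+ t1" zheap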

Hervé already compared the size of his tables in the last post. Do we see the same here when we compare the size of the entire databases?

postgres@pgbox:/home/postgres/ [ZHEAP] vacuumdb heap
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "\l+" postgres
                                                                   List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   |  Size   | Tablespace |                Description         
-----------+----------+----------+------------+------------+-----------------------+---------+------------+------------------------------------
 heap      | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 1503 MB | pg_default | 
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 7867 kB | pg_default | default administrative connection d
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +| 7721 kB | pg_default | unmodifiable empty database
           |          |          |            |            | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +| 7721 kB | pg_default | default template for new databases
           |          |          |            |            | postgres=CTc/postgres |         |            | 
 zheap     | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 1250 MB | pg_default | 
(5 rows)

Yes, the zheap database is 253MB smaller. That difference should even get bigger once we populate the "filler" column of the pgbench_accounts table, which is currently NULL:

postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "update pgbench_accounts set filler = 'aaaaaa'" zheap
UPDATE 10000000
Time: 55768.488 ms (00:55.768)
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "update pgbench_accounts set filler = 'aaaaaa'" heap
UPDATE 10000000
Time: 52598.782 ms (00:52.599)
postgres@pgbox:/home/postgres/ [ZHEAP] vacuumdb heap
vacuumdb: vacuuming database "heap"
postgres@pgbox:/home/postgres/ [ZHEAP] psql -c "\l+" postgres
                                                                   List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   |  Size   | Tablespace |                Description              
-----------+----------+----------+------------+------------+-----------------------+---------+------------+-----------------------------------------
 heap      | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 3213 MB | pg_default | 
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 7867 kB | pg_default | default administrative connection databa
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +| 7721 kB | pg_default | unmodifiable empty database
           |          |          |            |            | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +| 7721 kB | pg_default | default template for new databases
           |          |          |            |            | postgres=CTc/postgres |         |            | 
 zheap     | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                       | 1250 MB | pg_default | 

As expected, and consistent with what Hervé has seen in his tests. The update against the heap table was a bit faster (around 3 seconds) but again: zheap has to create undo segments and that causes additional writes to disk. Three seconds against a 10 million row table is not that much, by the way, and how often do you update the complete table?

Now let's run a standard pgbench workload against these databases and check what we see. For the zheap database, with 1 connection for 60 seconds, this is the best result I got after ten runs:

postgres@pgbox:/home/postgres/ [ZHEAP] pgbench -c 1 -T 60 zheap
starting vacuum...end.
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 29265
latency average = 2.050 ms
tps = 487.726916 (including connections establishing)
tps = 487.786025 (excluding connections establishing)

The same against the heap:

postgres@pgbox:/home/postgres/ [ZHEAP] pgbench -c 1 -T 60 heap
starting vacuum...end.
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 24992
latency average = 2.401 ms
tps = 416.485499 (including connections establishing)
tps = 416.516805 (excluding connections establishing)

The numbers changed a bit for every execution but zheap was always better than heap (be aware that I am on a little VM here), so at least there is no regression in performance but rather an improvement for this workload.

For the select-only workload (the "-S" option), this is the best result for heap:

postgres@pgbox:/home/postgres/ [ZHEAP] for i in {1..10}; do pgbench -c 1 -S -T 60 heap; done
...
starting vacuum...end.
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 64954
latency average = 0.924 ms
tps = 1082.514439 (including connections establishing)
tps = 1082.578288 (excluding connections establishing)
...

And this is the best result for zheap:

postgres@pgbox:/home/postgres/ [ZHEAP] for i in {1..10}; do pgbench -c 1 -S -T 60 zheap; done
...
starting vacuum...end.
transaction type: 
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 109023
latency average = 0.550 ms
tps = 1816.787280 (including connections establishing)
tps = 1817.485717 (excluding connections establishing)
...

With this workload the difference is even more clear: zheap clearly wins.

As noted before: all these tests have been done locally on a little VM, so be careful with these numbers. We should have access to a great storage system with some good servers soon, and once we have that I'll do some more tests and publish the results.

For now it is somewhat clear that zheap is an improvement for several types of workloads while heap is still better for others. In the next post I'll try to do some tests to help the developers, meaning: can we break it?

Cet article Some more zheap testing est apparu en premier sur Blog dbi services.

A Graphical Overview of a Repository


As the saying goes, "A Picture Is Worth A Thousand Words". I'd add "And A Simple Graph Is Worth A Long, Abstruse List of Numbers". And of words too, so let's show off a little bit:
Screenshot from 2018-11-03 19-39-57
Interested ? Then, please read on.
It happens not so infrequently that we wish we could quickly plot a few numbers to have a look at their overall trend. The usual procedure is to output the numbers into a csv file, transfer it to a desktop machine, subsequently import it into a spreadsheet program and interactively set up a chart, a tedious manual procedure at best, especially if it has to be repeated several times.
What if we could query a few relevant values from a system and plot them in one go to visualize the produced graphs from within a browser ? What if we could generalize that procedure to any extracted tabular data ? I’m not talking of a sophisticated interface to some enterprise-class graphing tool but just of a simple way to get a quick visual overall feeling of some variables with as little installed software as possible.
I’ll show here how to do that for a Documentum repository but the target system can be anything, a database, an ldap, an O/S, just adapt the queries and the script as needed.
To simplify, I assume that we generally work on a server machine, likely a Linux headless VM, and that the browser runs remotely on any GUI-based desktop, quite a common configuration for Documentum.

The Data

As an administrator, I often have to connect to docbases I never visited before, and I find it useful to run the following queries to help me make acquaintance with those new beasts. In the examples below, the target docbase is an out of the box one with no activity in it, which explains the low numbers and lack of custom doctypes:

— 1. what distinct document types are there ?

select r_object_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type order by 1
r_object_type count(*) tot_size
-------------------------------- ---------------------- ----------------------
dm_document 1295 6384130
dm_esign_template 1 46255
dm_format_preferences 1 109
dm_menu_system 2 352034
dm_plugin 2 212586
dm_xml_config 1 534
dmc_jar 236 50451386
dmc_preset_package 2 11951
dmc_tcf_activity_template 10 12162
(9 rows affected)

A reminder of some definitions: The result table above is a dataset. The r_object_type is the category, typically shown on the X-axis. The count(*) and tot_size columns are the "variables" to be plotted, typically as bars, lines or pie slices. They are also named "traces" in some graphing tools. Some of the datasets here have 1, 2 or 3 variables to be plotted. Datasets with 1 variable can be plotted as bar, line or pie charts. Datasets with 2 variables can be plotted as grouped or stacked bars, or as 2 distinct graphs of one variable. There are many possibilities and combinations, and the choice depends on which representation offers the best visual clarity. Sometimes, the plotting library even lets one edit the programmatically generated graph and interactively choose the best type of graph with no coding needed!

— 2. how does their population vary over time ?

select r_object_type, datefloor(month, r_creation_date) as "creation_month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type, datefloor(month, r_creation_date) order by 1, 2;
r_object_type creation_month count(*) tot_size
-------------------------------- ------------------------- ------------ ------------
dm_document 11/1/2017 01:00:00 448 3055062
dm_document 12/1/2017 01:00:00 120 323552
dm_document 3/1/2018 01:00:00 38 66288
dm_document 4/1/2018 02:00:00 469 1427865
dm_document 5/1/2018 02:00:00 86 584453
dm_document 6/1/2018 02:00:00 20 150464
dm_document 7/1/2018 02:00:00 40 301341
dm_document 8/1/2018 02:00:00 32 151333
dm_document 9/1/2018 02:00:00 46 356386
dm_esign_template 11/1/2017 01:00:00 1 46255
dm_format_preferences 11/1/2017 01:00:00 1 109
dm_menu_system 11/1/2017 01:00:00 2 352034
dm_plugin 11/1/2017 01:00:00 2 212586
dm_xml_config 11/1/2017 01:00:00 1 534
dmc_jar 11/1/2017 01:00:00 236 50451386
dmc_preset_package 11/1/2017 01:00:00 2 11951
dmc_tcf_activity_template 11/1/2017 01:00:00 10 12162
(17 rows affected)

This query tells how heavily used the repository is. Also, a broad spectrum of custom document types tends to indicate that the repository is used in the context of applications.

— 3. how changing are those documents ?

select r_object_type, datefloor(month, r_modify_date) as "modification_month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) where r_creation_date < r_modify_date group by r_object_type, datefloor(month, r_modify_date) order by 1, 2;
r_object_type modification_month count(*) tot_size
-------------------------------- ------------------------- ------------ ------------
dm_document 11/1/2017 01:00:00 127 485791
dm_document 12/1/2017 01:00:00 33 122863
dm_document 3/1/2018 01:00:00 16 34310
dm_document 4/1/2018 02:00:00 209 749370
dm_document 5/1/2018 02:00:00 42 311211
dm_document 6/1/2018 02:00:00 10 79803
dm_document 7/1/2018 02:00:00 20 160100
dm_document 8/1/2018 02:00:00 12 81982
dm_document 9/1/2018 02:00:00 23 172299
dm_esign_template 8/1/2018 02:00:00 1 46255
dmc_jar 11/1/2017 01:00:00 14 1616218
dmc_preset_package 11/1/2017 01:00:00 2 11951
dmc_tcf_activity_template 11/1/2017 01:00:00 10 12162
(13 rows affected)

This query shows if a repository is used interactively rather than for archiving. If there are lots of edits, the docbase is a lively one; on the contrary, if documents are rarely or never edited, the docbase is mostly used for archiving. The document ownership can tell too: technical accounts vs. real people.

— samething but without distinction of document type;
— 4. new documents;

select datefloor(month, r_creation_date) as "creation_month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by datefloor(month, r_creation_date) order by 1;
creation_month count(*) tot_size
------------------------- ---------------------- ----------------------
11/1/2017 01:00:00 703 54142079
12/1/2017 01:00:00 120 323552
3/1/2018 01:00:00 38 66288
4/1/2018 02:00:00 469 1427865
5/1/2018 02:00:00 86 584453
6/1/2018 02:00:00 20 150464
7/1/2018 02:00:00 40 301341
8/1/2018 02:00:00 32 151333
9/1/2018 02:00:00 42 323772
(9 rows affected)

— 5. modified documents;

select datefloor(month, r_modify_date) as "modification_month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) where r_creation_date < r_modify_date group by datefloor(month, r_modify_date) order by 1;
modification_month count(*) tot_size
------------------------- ---------------------- ----------------------
11/1/2017 01:00:00 153 2126122
12/1/2017 01:00:00 33 122863
3/1/2018 01:00:00 16 34310
4/1/2018 02:00:00 209 749370
5/1/2018 02:00:00 42 311211
6/1/2018 02:00:00 10 79803
7/1/2018 02:00:00 20 160100
8/1/2018 02:00:00 13 128237
9/1/2018 02:00:00 20 140354
(9 rows affected)

— 6. what content types are used in the repository ?

select a_content_type, count(*) as "count_content_type", sum(r_full_content_size) as "tot_content_size" from dm_document(all) group by a_content_type order by 1
col a_content_type 20
a_content_type count_content_type tot_content_size
-------------------- ------------------ ----------------
11 0
amipro 1 4558
crtext 43 389681
dtd 5 163392
excel12bbook 1 8546
excel12book 1 7848
excel12mebook 1 7867
excel12metemplate 1 7871
excel12template 1 7853
excel5book 1 15360
excel8book 1 13824
excel8template 1 13824
ibmshrlib 2 212586
jar 233 48856783
java 3 1594603
maker55 5 117760
mdoc55 9 780288
ms_access7 1 83968
ms_access8 1 59392
ms_access8_mde 1 61440
msw12 1 11280
msw12me 1 10009
msw12metemplate 1 10004
msw12template 1 9993
msw6 1 11776
msw8 1 19456
msw8template 1 27136
pdf 2 50214
powerpoint 1 14848
ppt12 1 29956
ppt12me 1 29944
ppt12meslideshow 1 29943
ppt12metemplate 1 29941
ppt12slideshow 1 29897
ppt12template 1 29914
ppt8 1 7680
ppt8_template 1 9728
text 1175 4236008
ustn 1 10240
vrf 1 158993
wp6 1 941
wp7 1 1362
wp8 1 1362
xml 28 37953
zip 1 255125
(45 rows affected)

The content type may indicate the kind of activity the repository is used for. While pdf files are generally final documents suitable for archiving, a predominance of illustrator/pagemaker/QuarkXPress vs photoshop vs multimedia vs Office documents can give a hint at the docbase's general function.
Since most content types are unused (who uses Word Perfect or Ami Pro any more?), a constraint on the count can be introduced, e.g. "having count(*) > 10" to filter out those obsolete formats, as shown below.
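For example, query 6 restricted to the formats used more than 10 times would simply become:

select a_content_type, count(*) as "count_content_type", sum(r_full_content_size) as "tot_content_size" from dm_document(all) group by a_content_type having count(*) > 10 order by 1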

— 7. and how do they evolve over time ?

select datefloor(month, r_creation_date) as "creation_month", a_content_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by datefloor(month, r_creation_date), a_content_type order by 1, 2
creation_month a_content_type count(*) tot_size
------------------------- -------------------- ------------ ------------
11/1/2017 01:00:00 1 0
11/1/2017 01:00:00 amipro 1 4558
11/1/2017 01:00:00 crtext 43 389681
11/1/2017 01:00:00 dtd 5 163392
11/1/2017 01:00:00 excel12bbook 1 8546
11/1/2017 01:00:00 excel12book 1 7848
11/1/2017 01:00:00 excel12mebook 1 7867
11/1/2017 01:00:00 excel12metemplate 1 7871
11/1/2017 01:00:00 excel12template 1 7853
11/1/2017 01:00:00 excel5book 1 15360
11/1/2017 01:00:00 excel8book 1 13824
11/1/2017 01:00:00 excel8template 1 13824
11/1/2017 01:00:00 ibmshrlib 2 212586
11/1/2017 01:00:00 jar 233 48856783
11/1/2017 01:00:00 java 3 1594603
11/1/2017 01:00:00 maker55 5 117760
11/1/2017 01:00:00 mdoc55 9 780288
11/1/2017 01:00:00 ms_access7 1 83968
11/1/2017 01:00:00 ms_access8 1 59392
11/1/2017 01:00:00 ms_access8_mde 1 61440
11/1/2017 01:00:00 msw12 1 11280
11/1/2017 01:00:00 msw12me 1 10009
11/1/2017 01:00:00 msw12metemplate 1 10004
11/1/2017 01:00:00 msw12template 1 9993
11/1/2017 01:00:00 msw6 1 11776
11/1/2017 01:00:00 msw8 1 19456
11/1/2017 01:00:00 msw8template 1 27136
11/1/2017 01:00:00 pdf 2 50214
11/1/2017 01:00:00 powerpoint 1 14848
11/1/2017 01:00:00 ppt12 1 29956
11/1/2017 01:00:00 ppt12me 1 29944
11/1/2017 01:00:00 ppt12meslideshow 1 29943
11/1/2017 01:00:00 ppt12metemplate 1 29941
11/1/2017 01:00:00 ppt12slideshow 1 29897
11/1/2017 01:00:00 ppt12template 1 29914
11/1/2017 01:00:00 ppt8 1 7680
11/1/2017 01:00:00 ppt8_template 1 9728
11/1/2017 01:00:00 text 338 906940
11/1/2017 01:00:00 ustn 1 10240
11/1/2017 01:00:00 vrf 1 158993
11/1/2017 01:00:00 wp6 1 941
11/1/2017 01:00:00 wp7 1 1362
11/1/2017 01:00:00 wp8 1 1362
11/1/2017 01:00:00 xml 28 37953
11/1/2017 01:00:00 zip 1 255125
12/1/2017 01:00:00 text 120 323552
3/1/2018 01:00:00 text 38 66288
4/1/2018 02:00:00 text 469 1427865
5/1/2018 02:00:00 text 86 584453
6/1/2018 02:00:00 text 20 150464
7/1/2018 02:00:00 text 40 301341
8/1/2018 02:00:00 10 0
8/1/2018 02:00:00 text 22 151333
9/1/2018 02:00:00 text 42 323772
(54 rows affected)

— 8. where are those contents stored ?

select a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_storage_type order by a_storage_type
col a_storage_type 20
a_storage_type count(*) tot_size
-------------------- ------------ ------------
11 0
filestore_01 1539 57471147
(2 rows affected)

Filestores are conceptually quite similar to Oracle RDBMS tablespaces. It is a good practice to separate filestores by application/type of documents and/or even by time if the volume of documents is important. This query tells if this is in place and, if so, what the criteria were.

— 9. dig by document type;

select r_object_type, a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type, a_storage_type order by r_object_type, a_storage_type
r_object_type a_storage_type count(*) tot_size
-------------------------------- -------------------- ------------ ------------
dm_document 11 0
dm_document filestore_01 1284 6384130
dm_esign_template filestore_01 1 46255
dm_format_preferences filestore_01 1 109
dm_menu_system filestore_01 2 352034
dm_plugin filestore_01 2 212586
dm_xml_config filestore_01 1 534
dmc_jar filestore_01 236 50451386
dmc_preset_package filestore_01 2 11951
dmc_tcf_activity_template filestore_01 10 12162
(10 rows affected)

— 10. same but by content format;

select a_content_type, a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_content_type, a_storage_type having count(*) > 10 order by a_content_type, a_storage_type
a_content_type a_storage_type count(*) tot_size
-------------------------------- --------------- ---------------------- ----------------------
crtext filestore_01 43 389681
jar filestore_01 233 48856783
text filestore_01 1204 4432200
xml filestore_01 28 37953
(4 rows affected)

— 11. what ACLs do exist and who created them ?

select owner_name, count(*) from dm_acl group by owner_name order by 2 desc
col owner_name 30
owner_name count(*)
------------------------------ ------------
dmadmin 179
dmtest 44
dm_bof_registry 6
dmc_wdk_preferences_owner 2
dmc_wdk_presets_owner 2
dm_mediaserver 1
dm_audit_user 1
dm_autorender_mac 1
dm_fulltext_index_user 1
dm_autorender_win31 1
dm_report_user 1
(11 rows affected)

Spoiler alert: the attentive readers have probably noticed the command "col" preceding several queries, like in "col name 20"; this is not a standard idql command but part of an extended idql which will be presented in a future blog.
ACLs are part of the security model in place, if any. Usually, the owner is a technical account, sometimes a different one for each application or application’s functionality or business line, so this query can show if the repository is under control of applications or rather simply used as a replacement for a shared drive. Globally, it can tell how the repository’s security is managed, if at all.

— 12. ACLs most in use;

select acl_name, count(*) from dm_document(all) where acl_name not like 'dm_%' group by acl_name having count(*) >= 10 order by 1
acl_name count(*)
-------------------------------- ----------------------
BOF_acl 233
(1 row affected)

Here too, a filter by usage can be introduced so that rarely used ACLs are not reported, e.g. "having count(*) >= 10" as done in the query above.

— 13. queue items;

select name, count(*) from dmi_queue_item group by name order by 2 desc
name count(*)
-------------------- ------------
dm_autorender_win31 115
dmadmin 101
dmtest 54
(3 rows affected)

This query is mainly useful to show if external systems are used for generating renditions, thumbnails or any other document transformation.

— non quantifiable queries;

select object_name from dm_acl where object_name not like 'dm_%' order by 1
object_name
--------------------------------
BOF_acl
BOF_acl2
BPM Process Variable ACL
BatchPromoteAcl
Default Preset Permission Set
Global User Default ACL
WebPublishingAcl
Work Queue User Default ACL
dce_world_write
desktop_client_acl1
replica_acl_default
(11 rows affected)
 
col name 20
col root 30
col file_system_path 70
select f.name, f.root, l.file_system_path from dm_filestore f, dm_location l where f.root = l.object_name
name root file_system_path
-------------------- ------------------------------ ----------------------------------------------------------------------
filestore_01 storage_01 /home/dmadmin/documentum/data/dmtest/content_storage_01
thumbnail_store_01 thumbnail_storage_01 /home/dmadmin/documentum/data/dmtest/thumbnail_storage_01
streaming_store_01 streaming_storage_01 /home/dmadmin/documentum/data/dmtest/streaming_storage_01
replicate_temp_store replicate_location /home/dmadmin/documentum/data/dmtest/replicate_temp_store
replica_filestore_01 replica_storage_01 /home/dmadmin/documentum/data/dmtest/replica_content_storage_01
(5 rows affected)

No numbers here, just plain text information.

There is of course much more information to query (e.g. what about lifecycles and workflows, users and groups ?) depending on what one needs to look at in the repositories, but those are enough examples for our demonstration’s purpose. The readers can always add queries or refine the existing ones to suit their needs.
At this point, we have a lot of hard-to-ingest numbers. Plotting them gives a pleasant 2-dimensional view of the data and allows us to easily compare and analyze them, especially if the generated plot offers some interactivity.

The Graphing Library

To do this, we need a graphing library and since we want to be able to look at the graphs from within a browser for platform-independence and zero-installation on the desktop, we need one that lets us generate an HTML page containing the charts. This is possible in 2 ways:
o   the library is a javascript one; we will programmatically build up json literals, pass them to the library’s plotting function and wrap everything into an html page which is saved on disk for later viewing;
o   the library is usable from several scripting languages and includes a function to generate an HTML page containing the graph; we would choose the python one since this language is ubiquitous and we now have a binding for Documentum (see my blog here);
Now, there are tons of such javascript libraries available on-line (see for example here for a sample) but our choice criteria will be simple:
o   the library must be rich enough but at the same time easy to use;
o   it must do its work entirely locally as servers are generally not allowed to access the Internet;
o   it must be free in order to get rid of the licensing complexity as it can often be overkill in such a simple and confined usage;
As we must pick one, let's choose Plotly since it is very capable and largely covers our needs. Besides, it works either as a javascript or as a python library, but for more generality let's just use it as a javascript library and generate the html pages ourselves. As an additional bonus, Plotly also allows one to interactively edit a plotted graph so it can be tweaked at will. This is very convenient because it lessens the effort needed to optimize the graphs' readability, as this can be done later by the users themselves while viewing the graphs. For example, the users can zoom, unselect variables, move legends around and much more.
To install it, simply right-click here and save the link to the default graph location in the project’s directory, e.g. /home/dmadmin/graph-stats/graphs.

pwd
/home/dmadmin/graph-stats/graphs
ls -l
...
-rw-rw-r-- 1 dmadmin dmadmin 2814564 Oct 11 11:40 plotly-latest.min.js
...

That’s less than 2.7 MiB of powerful, compact js source code. Since it is a text file, there shouldn’t be any security concern in copying it onto a server. Also, as it is executed inside the sandbox of a browser, security is normally not an issue.
The library needs to be in the same directory as the generated html files. If those files are copied to a desktop for direct viewing from a browser with File/Open, don’t forget to copy over the Plotly.js library too, open the html file in an editor, go to line 6 below and set src to the path of the plotly-latest.min.js library:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
   <head>
      <title>Graphical Overview of Repository SubWay</title>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <script src="...../plotly-latest.min.js"></script>
   </head>
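
Rather than editing each generated page by hand, the src attribute can also be patched with a couple of lines of python. This is just a convenience sketch, with the file names taken from the example further down as assumptions:

# convenience sketch: point the <script src=...> of a generated page to a local copy of the library;
# the file names are assumptions, adapt them to your own generated pages;
import re

with open("dmtest-20181012-141811.html") as f:
   page = f.read()
# whatever the original src was, make it relative to the page's own directory;
page = re.sub(r'src="[^"]*plotly-latest\.min\.js"', 'src="plotly-latest.min.js"', page)
with open("dmtest-20181012-141811.html", "w") as f:
   f.write(page)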

Viewing the html pages

Since those pages must be accessed from any machine on the network (typically, a desktop with a browser), even though they may be created directly on the server that hosts the repositories of interest, we need a web server. It may happen that a disk volume is shared between desktops and servers, but this is far from being the general case. Fortunately, python is very helpful here and saves us from the tedious task of setting up a full-fledged web server such as apache. The one-liner below will start a server for visualizing the files in the current directory and its sub-directories:

python3 -m http.server [port] # for python >= v3
python -m SimpleHTTPServer [port] # for python 2.x

It can also be moved to the background to free the command-line. To stop it, bring it back to the foreground and type Ctrl-C.
Its default port is 8000.
This mini-web server is now waiting for requests on the given port and its IP address. In order to determine that IP address, use the ifconfig command and try one address that is on the same network as the desktop with the browser. Then, load the URL, e.g. http://192.168.56.10:8000/, and you’ll be greeted by a familiar point-and-click interface.
The server presents a directory listing from which to select the html files containing the generated graphs.
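
The same mini-web server can also be started programmatically and bound to a specific interface. Here is a minimal sketch; the directory and the IP address are the ones used in this article’s example and are assumptions for your environment:

# minimal sketch of the mini-web server, bound to a given interface and port;
# the directory and the address are assumptions taken from this article's example;
import http.server
import os
import socketserver

os.chdir("/home/dmadmin/graph-stats/graphs")    # the directory containing the html pages and plotly-latest.min.js;
with socketserver.TCPServer(("192.168.56.10", 8000), http.server.SimpleHTTPRequestHandler) as httpd:
   httpd.serve_forever()                        # stop it with Ctrl-C;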

An Example

Here is a complete example, from the query to the graph.
The 4th DQL query and its output:

select datefloor(month, r_creation_date) as "creation_month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by datefloor(month, r_creation_date) order by 1;
creation_month count(*) tot_size
------------------------- ---------------------- ----------------------
11/1/2017 01:00:00 703 54142079
12/1/2017 01:00:00 120 323552
3/1/2018 01:00:00 38 66288
4/1/2018 02:00:00 469 1427865
5/1/2018 02:00:00 86 584453
6/1/2018 02:00:00 20 150464
7/1/2018 02:00:00 40 301341
8/1/2018 02:00:00 32 151333
9/1/2018 02:00:00 42 323772
(9 rows affected)

The command to generate the graph:

pwd
/home/dmadmin/graph-stats
./graph-stats.py --docbase dmtest

A simplified generated html page showing only the count(*) variable:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
      <head>
         <title>Statistical Graphs for Repository dmtest</title>
         <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
         <script src="http://192.168.56.10:8000/plotly-latest.min.js"></script>
      </head>
      <body>
      <div id="TheChart" style="width:400px;height:400px;"></div>
      <script>
         var ctx = document.getElementById("TheChart");
         Plotly.newPlot(ctx, [{
                             type: "bar",
                             name: "new docs",
                             x: ['2017/11', '2017/12', '2018/03', '2018/04', '2018/05', '2018/06', '2018/07', '2018/08', '2018/09'],
                             y: ['703', '120', '38', '469', '86', '20', '40', '32', '42'],
                             marker: {
                                        color: '#4e6bed',
                                        line: {
                                                 width: 2.5
                                              }
                                     }
                            }],
                            {
                               title: "New Documents",
                               font: {size: 18}
                            },
                        {responsive: true});
      </script>
      </body>
</html>

Its resulting file:

ll graphs
total 2756
-rw-rw-r-- 1 dmadmin dmadmin 2814564 Oct 12 14:18 plotly-latest.min.js
-rw-rw-r-- 1 dmadmin dmadmin 1713 Oct 12 14:18 dmtest-20181012-141811.html

Same but exposed by the mini-web server and viewed from a browser:
Screenshot from 2018-11-03 15-02-56
When clicking on one of the html file, a graph such as the one below is displayed:
Screenshot from 2018-11-03 15-08-52

The python script

The following python script runs the 13 queries above and produces one large HTML page containing the graphs of all the datasets. It can also be used to query a list of local docbases and generate one single, huge html report or one report per docbase. It can even directly output the produced html code to stdout for further in-line processing, e.g. to somewhat compact the javascript code:

./graph-stats.py --docbase dmtest -o stdout | gawk '{printf $0}' > graphs/dmtest-gibberish.html
 
wc -l graphs/dmtest-gibberish.html
0 graphs/dmtest-gibberish.html
 
less graphs/dmtest-gibberish.html
Screenshot from 2018-11-03 15-26-11

So, here is the script:

#!/usr/bin/env python

# 10/2018, C. Cervini, dbi-services;
 
import sys
import getopt
from datetime import datetime
import json
import DctmAPI

def Usage():
   print("""
Usage:
Connects as dmadmin/xxxx to a local repository and generates an HTML page containing the plots of several DQL queries result;
Usage:
       ./graph-stats.py -h|--help | -d|--docbase <docbase>{,<docbase>} [-o|--output_file <output_file>]
<docbase> can be one repository or a comma-separated list of repositories;
if <output_file> is omitted, the html page is output to ./graphs/<docbase>-$(date +"%Y%m%d-%H%M%S").html;
<output_file> can be "stdout", which is useful for CGI programming;
Example:
       ./graph-stats.py -d dmtest,mail_archive,doc_engineering
will query the docbases dmtest, mail_archive and doc_engineering and output the graphs to the files <docbase>-$(date +"%Y%m%d-%H%M%S").html, one file per docbase;
       ./graph-stats.py -d mail_archive,doc_engineering -o all_docbase_current_status
will query the docbases mail_archive and doc_engineering and output all the graphs to the unique file all_docbase_current_status;
       ./graph-stats.py -d mail-archive
will query docbase mail-archive and output the graphs to the file mail-archive-$(date +"%Y%m%d-%H%M%S").html;
       ./graph-stats.py --docbase dmtest --output dmtest.html
will query docbase dmtest and output the graphs to the file dmtest.html;
       ./graph-stats.py -d dmtest --output stdout
will query docbase dmtest and output the graphs to stdout;
""")

def Plot2HTML(div, graph_title, data, data_labels, bLineGraph = False, mode = None):
   global html_output
   if None != html_output:
      sys.stdout = open(html_output, "a")

   # start the html page;
   if "b" == mode:
      global server, page_title
      print('''
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
   <head>
      <title>''' + page_title + '''</title>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <script src="http://''' + server + '''/plotly-latest.min.js"></script>
   </head>
   <body>
   <center><h3>''' + page_title + '''</h3></center>
''')

   # append to the body of the html page;
   if len(data_labels) == 4:
      stack_labels = {}
      for point in data:
         stack_labels[point[data_labels[0]]] = 0
      print('''
      <div style="width:1500px;height:600px;">
          <div id="''' + div + '-' + data_labels[2] + '''" style="width:50%; float:left;"></div>
          <div id="''' + div + '-' + data_labels[3] + '''" style="width:50%; float:left;"></div>
      </div>
      <script>
         var ctx = document.getElementById("''' + div + '-' + data_labels[2] + '''");
      ''')
      variables = ""
      for stack in stack_labels:
         x = [point[data_labels[1]] for i, point in enumerate(data) if point[data_labels[0]] == stack]
         y1 = [point[data_labels[2]] for i, point in enumerate(data) if point[data_labels[0]] == stack]
         variables +=  ("" if not variables else ", ") + "data_" + stack 
         if bLineGraph:
            vars()['data_' + stack] = {
                                        'name': stack,
                                        'type': "scatter",
                                        'mode': "lines",
                                        'x': x,
                                        'y': y1,
                                      };
         else:
            vars()['data_' + stack] = {
                                        'name': stack,
                                        'type': "bar",
                                        'x': x,
                                        'y': y1,
                                        'width': 0.25,
                                        'marker': {
                                                     #'color': '#009933',
                                                     'line': {
                                                                'width': 1.0
                                                             }
                                                  }
                                      };
         print('data_' + stack + ' = ' + json.dumps(vars()['data_' + stack]) + ';')
         layout = {
                     'title': '<b>' + graph_title + '<br>' + data_labels[2] + '</b>',
                     'legend': {'x': -.1, 'y': 1.2, 'font': {'size': 8}},
                     'font': {'size': 12},
                     'width': 750,
                     'height': 600,
                     'xaxis': {
                                 'title': '<b>' + data_labels[1] + '</b>',
                                 'titlefont': {'size': 10},
                                 'tickangle': -45,
                                 'tickfont': {'size': 8},
                                 'zeroline': True,
                                 'showline': True,
                                 'categoryorder': "category ascending",
                                 'type': "category"
                              },
                     'yaxis': {
                                 'title': '<b>' + data_labels[2] + '</b>',
                                 'titlefont': {'size': 10},
                                 'zeroline': True,
                                 'showline': True,
                                 'tickfont': {'size': 8},
                                 'showgrid': False
                              }
                  }
         if not bLineGraph:
            layout.update({'barmode': "stack", 'bargap': 0.15, 'bargroupgap': 0.5})
         interaction = {'responsive': True, 'scrollZoom': True, 'editable': True}
      print('''Plotly.newPlot(ctx,
                              [''' + variables + '], ' +
                              json.dumps(layout) + ',' +
                              json.dumps(interaction) + ''');''')
      for stack in stack_labels:
         x = [point[data_labels[1]] for i, point in enumerate(data) if point[data_labels[0]] == stack]
         y2 = [point[data_labels[3]] for i, point in enumerate(data) if point[data_labels[0]] == stack]
         if bLineGraph:
            vars()['data_' + stack] = {
                                         'name': stack,
                                         'type': "scatter",
                                         'mode': "lines",
                                         'x': x,
                                         'y': y2
                                      };
         else:
            vars()['data_' + stack] = {
                                         'name': stack,
                                         'type': "bar",
                                         'x': x,
                                         'y': y2,
                                         'width': 0.25,
                                         'marker': {
                                                       #'color': '#009933',
                                                      'line': {
                                                                 'width': 1.0
                                                              }
                                                   }
                                      };
         print('data_' + stack + ' = ' + json.dumps(vars()['data_' + stack]) + ';')
      layout = {
                  'title': '<b>' + graph_title + '<br>' + data_labels[3] + '</b>',
                  'legend': {'x': -.1, 'y': 1.2, 'font': {'size': 8}},
                  'font': {
                             'size': 12
                          },
                  'width': 750,
                  'height': 600,
                  'xaxis': {
                            'title': '<b>' + data_labels[1] + '</b>',
                            'titlefont': {'size': 10},
                            'tickangle': -45,
                            'tickfont': {'size': 8},
                            'zeroline': True,
                            'showline': True,
                            'categoryorder': "category ascending",
                            'type': "category"
                         },
                  'yaxis': {
                    'title': '<b>' + data_labels[3] + '</b>',
                    'titlefont': {'size': 10},
                    'zeroline': True,
                    'showline': True,
                    'tickfont': {'size': 8},
                    'showgrid': False
                  }
               }
      if not bLineGraph:
         layout.update({'barmode': "stack", 'bargap': 0.15, 'bargroupgap': 0.5})
      interaction = {'responsive': True, 'scrollZoom': True, 'editable': True}
      print('''
      var ctx = document.getElementById("''' + div + '-' + data_labels[3] + '''");
      Plotly.newPlot(ctx,
                     [''' + variables + '],' + 
                     json.dumps(layout) + ''',
                     ''' + json.dumps(interaction) + ''');
      </script>
''')
   elif len(data_labels) == 3:
      print('''
      <div style="width:1200px;height:600px;">
          <div id="''' + div + '''" style="width:75%; float:left;"></div>
          <div id="''' + div + '''-pie" style="width:25%; float:left;"></div>
      </div>
      <script>
         var ctx = document.getElementById("''' + div + '''");
''')
      traces = []
      if not bLineGraph:
         traces = [
                     {
                        'name': data_labels[1],
                        'type': "bar",
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'y': [point[data_labels[1]] for i, point in enumerate(data)],
                        'width': 0.25,
                        'marker': {
                                     'color': '#009933',
                                     'line': {
                                              'width': 1.0
                                             }
                                  },
                     },
                     # work around for bug "Grouped bar charts do not work with multiple Y axes";
                     # see https://github.com/plotly/plotly.js/issues/78;
                     # must be inserted here;
                     # invisible second trace in the first group
                     {
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'barmode': "overlay",
                        'y': [0], 'type': 'bar', 'hoverinfo': 'none', 'showlegend': False
                     },
                     # invisible first trace in the second group
                     {
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'y': [0], 'type': 'bar', 'yaxis': 'y2', 'hoverinfo': 'none', 'showlegend': False
                     },
                     {
                        'name': data_labels[2],
                        'type': "bar",
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'y': [point[data_labels[2]] for i, point in enumerate(data)],
                        'width': 0.25,
                        'yaxis': "y2",
                        'marker': {
                                     'color': '#4e6bed',
                                     'line': {
                                                'width': 1.0
                                             }
                                  }
                     }
                   ]
      else:
         traces = [
                     {
                        'name': data_labels[1],
                        'type': "scatter",
                        'mode': "lines",
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'y': [point[data_labels[1]] for i, point in enumerate(data)],
                        'width': 0.25,
                     },
                     {
                        'name': data_labels[2],
                        'type': "scatter",
                        'mode': "lines",
                        'x': [point[data_labels[0]] for i, point in enumerate(data)],
                        'y': [point[data_labels[2]] for i, point in enumerate(data)],
                        'width': 0.25,
                        'yaxis': "y2",
                     }
                  ]
      layout = {
                  'title': '<b>' + graph_title + '</b>',
                  'legend': {'x': -.1, 'y': 1.2, 'font': {'size': 8}},
                  'font': {'size': 12},
                  'width': 800,
                  'height': 600,
                  'xaxis': {
                              'title': '<b>' + data_labels[0] + '</b>',
                              'titlefont': {'size': 10},
                              'tickangle': -45,
                              'tickfont': {'size': 8},
                              'zeroline': True,
                              'showline': True
                           },
                  'yaxis': {
                              'title': '<b>' + data_labels[1] + '</b>',
                              'titlefont': {'size': 10},
                              'zeroline': True,
                              'showline': True,
                              'tickfont': {'size': 8},
                              'showgrid': False,
                           },
                  'yaxis2': {
                               'title': '<b>' + data_labels[2] + '</b>',
                               'titlefont': {'size': 10},
                               'tickfont': {'size': 8},
                               'zeroline': True,
                               'showline': True,
                               'overlaying': "y",
                               'side': "right",
                               'showgrid': False,
                            }
               }
      if bLineGraph:
         layout['yaxis'].update({'type': "linear"})
         layout['yaxis2'].update({'type': "linear"})
         pass
      else:
         layout.update({'barmode': "group", 'bargap': 0.15, 'bargroupgap': 0.5})
      interaction = {'responsive': True, 'scrollZoom': True, 'editable': True}
      print('''
         Plotly.newPlot(ctx,
                        ''' + json.dumps(traces) + ''',
                        ''' + json.dumps(layout) + ''',
                        ''' + json.dumps(interaction) + ''');
      </script>

''')
      if not bLineGraph:
         traces =  [
                      {
                         'name': data_labels[1],
                         'hole': .4,
                         'type': "pie",
                         'labels': [point[data_labels[0]] for i, point in enumerate(data)],
                         'values': [point[data_labels[1]] for i, point in enumerate(data)],
                         'domain': {
                            'row': 0,
                            'column': 0
                         },
                         'outsidetextfont': {'size': 8},
                         'insidetextfont': {'size': 8},
                         'legend': {'font': {'size': 8}},
                         'textfont': {'size': 8},
                         'font': {'size': 8},
                         'hoverinfo': 'label+percent+name',
                         'hoverlabel': {'font': {'size': 8}},
                         'textinfo': 'none'
                      },
                      {
                         'name': data_labels[2],
                         'hole': .4,
                         'type': "pie",
                         'labels': [point[data_labels[0]] for i, point in enumerate(data)],
                         'values': [point[data_labels[2]] for i, point in enumerate(data)],
                         'domain': {
                            'row': 1,
                            'column': 0
                         },
                         'outsidetextfont': {'size': 8},
                         'insidetextfont': {'size': 8},
                         'legend': {'font': {'size': 8}},
                         'textfont': {'size': 8},
                         'font': {'size': 8},
                         'hoverinfo': 'label+percent+name',
                         'hoverlabel': {'font': {'size': 8}},
                         'textinfo': 'none'
                      },
                   ]
         layout = {
                     'title': '<b>' + graph_title + '</b>',
                     'annotations': [
                                       {
                                          'font': {
                                                    'size': 8
                                                  },
                                          'showarrow': False,
                                          'text': '<b>' + data_labels[1] + '</b>',
                                          'x': 0.5,
                                          'y': 0.8
                                       },
                                       {
                                          'font': {
                                                     'size': 8
                                                  },
                                          'showarrow': False,
                                          'text': '<b>' + data_labels[2] + '</b>',
                                          'x': 0.5,
                                          'y': 0.25
                                       }
                                    ],
                     'height': 600,
                     'width': 600,
                     'grid': {'rows': 2, 'columns': 1},
                     'legend': {'font': {'size': 10}}
                  }

         print('''
      <script>
         var ctx = document.getElementById("''' + div + '''-pie");
         Plotly.newPlot(ctx,
                        ''' + json.dumps(traces) + ''',
                        ''' + json.dumps(layout) + '''
);
      </script>
''')
   elif len(data_labels) == 2:
      trace = [
                 {
                    'name': data_labels[1],
                    'type': "bar",
                    'x': [point[data_labels[0]] for i, point in enumerate(data)],
                    'y': [point[data_labels[1]] for i, point in enumerate(data)],
                    'width': 0.25,
                    'marker': {
                                 'color': '#009933',
                                 'line': {
                                            'width': 1.0
                                         }
                              }
                 }
              ]
      layout = {
                  'title': '<b>' + graph_title + '</b>',
                  'legend': {'x': -.1, 'y': 1.2, 'font': {'size': 8}},
                  'font': {'size': 12},
                  'height': 600,
                  'width': 800,
                  'xaxis': {
                            'title': '<b>' + data_labels[0] + '</b>',
                            'titlefont': {'size': 10},
                            'tickangle': -45,
                            'tickfont': {'size': 8},
                            'zeroline': True,
                          },
                  'yaxis': {
                            'title': '<b>' + data_labels[1] + '</b>',
                            'titlefont': {'size': 10},
                            'zeroline': True,
                            'showline': True,
                            'tickfont': {'size': 8},
                            'showgrid': False
                          },
                  'barmode': "group",
                  'bargap': 0.15,
                  'bargroupgap': 0.5
               }
      interaction = {'responsive': True, 'scrollZoom': True, 'editable': True};
      print('''        
      <div style="width:1200px;height:600px;">
          <div id="''' + div + '''" style="width:75%; float:left;"></div> 
          <div id="''' + div + '''-pie" style="width:25%; float:left;"></div>
      </div>
      <script>
         var ctx = document.getElementById("''' + div + '''");
         Plotly.newPlot(ctx,
                        ''' + json.dumps(trace) + ''',
                        ''' + json.dumps(layout) + ''',
                        ''' + json.dumps(interaction) + ''');
      </script>
''')
      trace =  [
                  {
                     'name': data_labels[1],
                     'hole': .4,
                     'type': "pie",
                     'labels': [point[data_labels[0]] for i, point in enumerate(data)],
                     'values': [point[data_labels[1]] for i, point in enumerate(data)],
                     'domain': {
                                  'row': 0,
                                  'column': 0
                               },
                     'outsidetextfont': {'size': 8},
                     'insidetextfont': {'size': 8},
                     'legend': {'font': {'size': 8}},
                     'textfont': {'size': 8},
                     'font': {'size': 8},
                     'hoverinfo': 'label+percent+name',
                     'hoverlabel': {'font': {'size': 8}},
                     'textinfo': 'none'
                  },
               ]
      layout = {
                  'title': '<b>' + graph_title + '</b>',
                  'annotations': [
                                    {
                                       'font': {
                                                  'size': 8
                                               },
                                       'showarrow': False,
                                       'text': '<b>' + data_labels[1] + '</b>',
                                       'x': 0.5,
                                       'y': 0.5
                                    },
                                 ],
                  'height': 600,
                  'width': 600,
                  'grid': {'rows': 1, 'columns': 1},
                  'legend': {'font': {'size': 10}}
               }
      print('''        
      <script>
         var ctx = document.getElementById("''' + div + '''-pie");
         Plotly.newPlot(ctx,
                        ''' + json.dumps(trace) + ''',
                        ''' + json.dumps(layout) + ''');
      </script>
''')
   else:
      print("illegal data_label value: " + repr(data_label))

   # closes the html page;
   if "e" == mode:
      print('''
   </body>
</html>
''')

   # restores default output stream;
   if None != html_output:
      sys.stdout = sys.__stdout__

def cumulAndExtend2():
   """
   for 2 variables in resultset;
   """
   global rawdata
   sum_count = 0
   sum_size = 0
   new_month = str(datetime.now().year) + "/" + str(datetime.now().month)
   for ind, point in enumerate(rawdata):
      sum_count += int(point['count(*)'])
      point['count(*)'] = sum_count
      sum_size += int(point['tot_size'])
      point['tot_size'] = sum_size
   if rawdata[ind]['month'] < new_month:
      new_point = dict(rawdata[ind])
      new_point['month'] = new_month
      rawdata.append(new_point)
   DctmAPI.show(rawdata)

def cumulAndExtend3(key):
   """
   for 3 variables in resultset;
   """
   global rawdata
   prec_doc = ""
   sum_count = 0
   sum_size = 0
   new_month = str(datetime.now().year) + "/" + str(datetime.now().month)
   for ind, point in enumerate(rawdata):
      if "" == prec_doc:
         prec_doc = point[key]
      if point[key] != prec_doc:
         # duplicate the last point so the line graph shows a flat line and not a simple dot when there is a unique point for the document type;
         if rawdata[ind - 1]['month'] < new_month:
            new_point = dict(rawdata[ind - 1])
            new_point['month'] = new_month
            rawdata.insert(ind, new_point)
            prec_doc = point[key]
            sum_count = 0
            sum_size = 0
      else:
         sum_count += int(point['count(*)'])
         point['count(*)'] = sum_count
         sum_size += int(point['tot_size'])
         point['tot_size'] = sum_size
   if rawdata[ind]['month'] < new_month:
      new_point = dict(rawdata[ind])
      new_point['month'] = new_month
      rawdata.append(new_point)
   DctmAPI.show(rawdata)

# -----------------
# main;
if __name__ == "__main__":
   DctmAPI.logLevel = 0
 
   # parse the command-line parameters;
   # old-style parsing, as I don't need more flexibility here;
   repository = None
   output_file = None
   try:
      (opts, args) = getopt.getopt(sys.argv[1:], "hd:o:", ["help", "docbase=", "output="])
   except getopt.GetoptError:
      print("Illegal option")
      print("./graph-stats.py -h|--help | -d|--docbase <docbase>{,<docbase>} [-o|--output_file <output_file>]")
      sys.exit(1)
   for opt, arg in opts:
      if opt in ("-h", "--help"):
         Usage()
         sys.exit()
      elif opt in ("-d", "--docbase"):
         repository = arg
      elif opt in ("-o", "--output"):
         output_file = arg
   if None == repository:
      print("at least one repository must be specified")
      Usage()
      sys.exit()
   DctmAPI.show("Will connect to docbase(s): " + repository + " and output to " + ("stdout" if "stdout" == output_file else "one single file " + output_file if output_file is not None else "one file per docbase"))
 
   # needed to locally import the js library Plotly;
   server = "192.168.56.10:8000"

   docbase_done = set()
   status = DctmAPI.dmInit()
   for pointer, docbase in enumerate(repository.split(",")):
      # graphe_1;
      if docbase in docbase_done:
         print("Warning: docbase {:s} was already processed and won't be again, skipping ...".format(docbase))
         continue
      docbase_done.add(docbase)
      session = DctmAPI.connect(docbase = docbase, user_name = "dmadmin", password = "dmadmin")
      if session is None:
         print("no session opened, exiting ...")
         exit(1)

      page_title = "Graphical Overview of Repository " + docbase
      if None == output_file:
         html_output = "./graphs/" + docbase + "-" + datetime.today().strftime("%Y%m%d-%H%M%S") + ".html"
      elif "stdout" == output_file:
         html_output = None
      else:
         html_output = output_file

      graph_name = "graphe_1"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select r_object_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type order by 1"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q1. Present Count & Size Per Document Type",
                data = rawdata,
                data_labels = attr_name,
                mode = ("b" if (("stdout" == output_file or output_file is not None) and 0 == pointer) or output_file is None else None))

      # graphe_2;
      graph_name = "graphe_2"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select r_object_type, datetostring(r_creation_date, 'yyyy/mm') as "month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type, datetostring(r_creation_date, 'yyyy/mm') order by 1, 2"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q2. Monthly New Documents per Type",
                data = rawdata,
                data_labels = attr_name)
      cumulAndExtend3('r_object_type')
      Plot2HTML(div = graph_name + "l",
                graph_title = "Q2l. Cumulated Monthly Documents per Type",
                data = rawdata,
                data_labels = attr_name,
                bLineGraph = True)
   
      # graphe_3;
      graph_name = "graphe_3"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select r_object_type, datetostring(r_modify_date, 'yyyy/mm') as "month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) where r_creation_date < r_modify_date group by r_object_type, datetostring(r_modify_date, 'yyyy/mm') order by 1, 2"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q3. Monthly Modified Documents per Type",
                data = rawdata,
                data_labels = attr_name)
      cumulAndExtend3('r_object_type')
      Plot2HTML(div = graph_name + "l",
                graph_title = "Q3l. Cumulated Monthly Modified Documents per Type",
                data = rawdata,
                data_labels = attr_name,
                bLineGraph = True)

      # graphe_4;
      graph_name = "graphe_4"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select datetostring(r_creation_date, 'yyyy/mm') as "month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by datetostring(r_creation_date, 'yyyy/mm') order by 1"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q4. Monthly New Documents",
                data = rawdata,
                data_labels = attr_name )
      cumulAndExtend2()
      Plot2HTML(div = graph_name + "l",
                graph_title = "Q4l. Cumulated Monthly Documents",
                data = rawdata,
                data_labels = attr_name,
                bLineGraph = True)

      # graphe_5;
      graph_name = "graphe_5"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select datetostring(r_modify_date, 'yyyy/mm') as "month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) where r_creation_date != r_modify_date group by datetostring(r_modify_date, 'yyyy/mm') order by 1;
        """
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q5. Monthly Modified Documents",
                data = rawdata,
                data_labels = attr_name)
      cumulAndExtend2()
      Plot2HTML(div = graph_name + "l",
                graph_title = "Q5l. Cumulated Monthly Modified Documents",
                data = rawdata,
                data_labels = attr_name,
                bLineGraph = True,
                mode = 'e')

      # graphe_6;
      graph_name = "graphe_6"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select a_content_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_content_type order by 1"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q6. Count & Size Per Content Format",
                data = rawdata,
                data_labels = attr_name)

      # graphe_7;
      graph_name = "graphe_7"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select a_content_type, datetostring(r_creation_date, 'yyyy-mm') as "month", count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_content_type, datetostring(r_creation_date, 'yyyy-mm') having count(*) > 10 order by 1, 2"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q7. Monthly Created Documents Per Content Format with count(*) > 10",
                data = rawdata,
                data_labels = attr_name)
      cumulAndExtend3('a_content_type')
      DctmAPI.show(rawdata)
      Plot2HTML(div = graph_name + "l",
                graph_title = "Q7l. Cumulated Monthly Created Documents Per Content Format with count(*) > 10",
                data = rawdata,
                data_labels = attr_name,
                bLineGraph = True)

      # graphe_8;
      graph_name = "graphe_8"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_storage_type order by a_storage_type"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q8. Count & Size Per Filestore",
                data = rawdata,
                data_labels = attr_name)

      # graphe_9;
      graph_name = "graphe_9"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select r_object_type, a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type, a_storage_type order by r_object_type, a_storage_type"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q9. Count & Size Per Document Type & Filestore",
                data = rawdata,
                data_labels = attr_name)

      # graphe_10;
      graph_name = "graphe_10"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select a_content_type, a_storage_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by a_content_type, a_storage_type having count(*) > 10 order by a_content_type, a_storage_type"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q10. Count & Size Per Content Format & Filestore with count(*) > 10",
                data = rawdata,
                data_labels = attr_name)

      # graphe_11;
      graph_name = "graphe_11"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select owner_name, count(*) from dm_acl group by owner_name order by 2 desc"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q11. ACLs per owner",
                data = rawdata,
                data_labels = attr_name)

      # graphe_12;
      graph_name = "graphe_12"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select acl_name, count(*) from dm_document(all) where acl_name not like 'dm_%' group by acl_name having count(*) >= 10 order by 1"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q12. External ACLs in Use by >= 10 documents",
                data = rawdata,
                data_labels = attr_name)

      # graphe_13;
      graph_name = "graphe_13"
      DctmAPI.show(graph_name, beg_sep = True)
      stmt = """select name, count(*) from dmi_queue_item group by name order by 2 desc"""
      rawdata = []
      attr_name = []
      status = DctmAPI.select2dict(session, stmt, rawdata, attr_name)
      DctmAPI.show(rawdata)
      if not status:
         print("select [" + stmt + "] was not successful")
         exit(1)
      Plot2HTML(div = graph_name,
                graph_title = "Q13. Queue Items by Name",
                data = rawdata,
                data_labels = attr_name,
                mode = ('e' if (("stdout" == output_file or output_file is not None) and len(repository.split(",")) - 1 == pointer) or output_file is None else None))

      status = DctmAPI.disconnect(session)
      if not status:
         print("error while disconnecting")

A pdf of the script (sorry, uploading python scripts is not allowed on this site) is available here too: graph-stats.
And here is DctmAPI.py as a pdf too: DctmAPI.
The script invokes select2dict() from the module DctmAPI (as said before, it is also presented and accessible here) with the query and receives back the result in an array of dictionaries, i.e. each entry of the array is a row from the query’s resultset. It then goes on creating the HTML page by invoking the function Plot2HTML(). On line 47, the js library is imported. Each graph is plotted in its own DIV section by Plotly’s js function newPlot(). All the required parameters are set in python dictionaries and passed as json literals to newPlot(). The conversion from python to javascript is conveniently done by the function dumps() from the json module. The HTML and js parts are produced by multi-line python print() statements.
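
For clarity, the data structures handed over to Plot2HTML() roughly look like this for query 1; the values below are illustrative only, not actual query results:

# illustrative only: the shape of the data filled in by select2dict() for query 1;
# one dictionary per resultset row, keyed by the column names returned in attr_name;
# the values are made up for the example, they are not actual query results;
rawdata = [
   {'r_object_type': 'dm_document', 'count(*)': '703', 'tot_size': '54142079'},
   {'r_object_type': 'dm_esign_template', 'count(*)': '2', 'tot_size': '46255'},
]
attr_name = ['r_object_type', 'count(*)', 'tot_size']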

Invoking the script

The script is invoked from the command-line through the following syntax:

./graph-stats.py -h|--help | -d|--docbase <docbase>{,<docbase>} [-o|--output_file <output_file>]

A Help option is available, which outputs the text below:

./graph-stats.py --help
Usage:
Connects as dmadmin/xxxx to a repository and generates an HTML page containing the plots of several DQL queries result;
Usage:
./graph-stats.py -h|--help | -d|--docbase <docbase>{,<docbase>} [-o|--output_file <output_file>]
<docbase> can be one repository or a comma-separated list of repositories;
if <output_file> is omitted, the html page is output to ./graphs/<docbase>-$(date +"%Y%m%d-%H%M%S").html;
<output_file> can be "stdout", which is useful for CGI programming;
Example:
./graph-stats.py -d dmtest,mail_archive,doc_engineering
will query the docbases dmtest, mail_archive and doc_engineering and output the graphs to the files <docbase>-$(date +"%Y%m%d-%H%M%S").html, one file per docbase;
./graph-stats.py -d mail_archive,doc_engineering -o all_docbase_current_status
will query the docbases mail_archive and doc_engineering and output all the graphs to the unique file all_docbase_current_status;
./graph-stats.py -d mail-archive
will query docbase mail-archive and output the graphs to the file mail-archive-$(date +"%Y%m%d-%H%M%S").html;
./graph-stats.py --docbase dmtest --output dmtest.html
will query docbase dmtest and output the graphs to the file dmtest.html;
./graph-stats.py -d dmtest --output stdout
will query docbase dmtest and output the graphs to stdout;

As the account used is the trusted, password-less dmadmin one, the script must run locally on the repository server machine, but it is relatively easy to set it up for remote docbases, with a password prompt, or without one if using public key authentication (and still password-less in the remote docbases if using dmadmin).

Examples of charts

Now that we’ve seen the bits and pieces, let’s generate an HTML page with the graphs of all the above queries.
Here is an example where the queries were run against an out-of-the-box repository named dmtest (all the graphs in one pdf file): dmtest.pdf
For more complex examples, just run the script against one of your docbases and see the results for yourself, as I can upload neither html files nor compressed files here.
As shown, some queries are charted using 2 to 4 graphs, the objective being to offer the clearest view. E.g. query 1 computes counts and total sizes. They will be plotted on the same graph as grouped bars of 2 independent variables and on two distinct pie charts, one for each variable. Query 2 shows the count(*) and the total sizes by document type as 2 stacked bar graphs plus 2 line graphs of the cumulated monthly created documents by document type. The line graphs show the total number of created documents at the end of a month, not the total number of existing documents as of the end of each month; to have those numbers (an historical view of query 1), one needs to query the docbase monthly and store the numbers somewhere. Query 1 returns this status but only at the time it was run.
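
As a rough illustration of that monthly snapshot idea, a small cron-able python sketch could append query 1’s results to a CSV file; the CSV path is an assumption, and the DctmAPI calls are the same ones used by graph-stats.py:

# minimal sketch of a monthly snapshot of query 1, to be scheduled e.g. via cron;
# the CSV path is an assumption; the DctmAPI calls are the same as in graph-stats.py;
import csv
from datetime import datetime
import DctmAPI

DctmAPI.dmInit()
session = DctmAPI.connect(docbase = "dmtest", user_name = "dmadmin", password = "dmadmin")
rawdata, attr_name = [], []
DctmAPI.select2dict(session,
                    """select r_object_type, count(*), sum(r_full_content_size) as "tot_size" from dm_document(all) group by r_object_type order by 1""",
                    rawdata, attr_name)
with open("/home/dmadmin/graph-stats/history.csv", "a", newline = "") as f:
   writer = csv.writer(f)
   stamp = datetime.today().strftime("%Y/%m")
   for row in rawdata:
      writer.writerow([stamp] + [row[a] for a in attr_name])
DctmAPI.disconnect(session)
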
Discussing Plotly is out of the scope of this article but suffice it to say that it allows one to work on the charts interactively, e.g. removing variables (aka traces in Plotly’s parlance) so the remaining ones are re-scaled for better readability, zooming into parts of the charts, and scrolling the graphs horizontally or vertically. A nice tooltip on hover is also available to show the exact values of a point, e.g.:
Screenshot from 2018-11-03 19-07-28
Plotly even lets one completely edit the chart from within a js application hosted at Plotly, which may not suit everybody if confidentiality is required; moreover, the graph can then only be saved in a public area on their site, and subsequently downloaded from there, unless a paid yearly upgrade is subscribed to. Nonetheless, the free features are more than enough for most needs.

Leveraging Plot2HTML()

The script’s Plot2HTML() function can be leveraged to plot data from any source, even from a text file, so that the query is separated from its result’s graphical representation, which is useful if direct access to the system to query is not possible. To do this, an easy-to-parse file format is necessary, e.g.:

# this is a commented line;
# line_no   variable1    variable2 .....
0          type of graph                                   # use B for bars, S for stacked bars, L for lines, P for pie or any combination of them (BL, LB, BP, PB, LP, PL, BPL);
                                                           # one graph for each combination will be put on the html page in a nx2 cell grid, i.e. graphs 1 and 2 horizontally and graph 3 below,
                                                           # in the same order as the graph type letters;
1          graph title                                     # can be on several lines separated by html <br> for sub-titles;
2          Xaxis_name Yaxis1_name Yaxis2_name              # names of the axes, one X and up to 2 Y;
3          Xaxis_values1 value1 value2                     # first line of (x,y) pairs or (x,y1,y2) triplets if a second axis was specified;
4          Xaxis_values2 value1 value2                     # second line of data;
...                                                        # the other lines;
n          Xaxis_valuesn valuen valuen                     # last line of data;

And that script could be invoked like this:

Usage:
       ./graph-it.py -h|--help | -f|--input_file <filename>{,<filename>} [-o|--output_file <output_file>]

The script itself just needs to parse and load the data from the input file(s) into an array of dictionaries and pass it to the plotting function Plot2HTML(). Of course, if the data is already in json format, the parsing step will be greatly simplified.
Time and mood permitting, I’ll do it in a future article.
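
In the meantime, and purely as a sketch of the idea (the whitespace-separated layout above, the absence of “#” in titles, and the availability of Plot2HTML() as an importable function are all assumptions), the loading part could look like this:

# minimal sketch of such a loader, under the assumptions stated above;
def load_datafile(filename):
   graph_type, title, labels, data = "", "", [], []
   with open(filename) as f:
      for line in f:
         line = line.split("#")[0].strip()         # drop trailing comments and blank lines;
         if not line:
            continue
         line_no, rest = line.split(None, 1)
         if "0" == line_no:
            graph_type = rest                      # e.g. B, L, P or a combination of them;
         elif "1" == line_no:
            title = rest
         elif "2" == line_no:
            labels = rest.split()                  # the X axis name and up to 2 Y axis names;
         else:
            data.append(dict(zip(labels, rest.split())))
   return graph_type, title, labels, data

# possible usage; note that Plot2HTML() also relies on the globals html_output, page_title and server, which must be set beforehand:
# graph_type, title, labels, data = load_datafile("my_stats.txt")
# Plot2HTML(div = "graphe_1", graph_title = title, data = data, data_labels = labels, mode = "b")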

Useful tools

The color of the many charts’ options (e.g. on line 87) can be specified through an rgb triplet or an HTML color code. There are many color pickers on the web, e.g. this one. Here, I let Plotly choose the colors as it sees fit, hence the not so attractive results.
If you need to paste HTML code into an HTML page without it being interpreted by the browser, e.g. the code generated by graph-stats.py, some transformations are required, e.g. all the angle brackets need to be replaced by their HTML entity representations. If that code does not contain confidential information, an on-line service can be used for that, such as this one, but it shouldn’t be too difficult to write a list of sed expressions that do the substitutions (e.g. this gawk one-liner takes care of the characters < and >:

gawk -c '{gsub(/>/, "\\&gt;"); gsub(/</, "\\&lt;"); print}' myfile.html).

Other replacements can be added as the need arises.
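
Alternatively, the substitution can be done entirely locally with python’s standard library, which also takes care of the & character; a small sketch, with the file names being assumptions:

# small sketch: escape a generated html page so it can be pasted into another page without being interpreted;
# the file names are assumptions;
import html

with open("myfile.html") as f:
   escaped = html.escape(f.read())    # replaces &, <, > (and quotes) by their entity representations;

with open("myfile.escaped.html", "w") as f:
   f.write(escaped)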

Conclusion

All the above confirms yet again that it is possible to create cheap but useful custom tools with almost no installation, except here a textual javascript library. All the rest is already available on any administrator’s decent system: command-line and python interpreters, and a browser, which is taking on more and more importance as a tier because of the wide availability of javascript libraries that execute locally and the ever increasing performance of javascript engines. By the way, chromium is highly recommended here for visualizing those complex charts from real production repositories.
As the source code is provided, it is possible to modify it in order to produce better looking graphs. This is the really arduous and time-consuming part in my humble opinion, as it requires some artistic inclination and many experimentations. If you find a nicer color scheme, please feel free to contribute in the comments.

Cet article A Graphical Overview of a Repository est apparu en premier sur Blog dbi services.

Pass summit – dbi visit day 1


IMG_1022

Designing Modern Data and Analytic Solution in Azure

After having explained the pros and cons of Azure and the decision drivers for moving to an Azure architecture, some interesting messages were delivered, like the decoupling of the storage and compute aspects in Azure, even if some of the services still combine both. Another message that we all know but that is essential to repeat on a regular basis is that cost control is an important aspect: developers and architects can hugely influence the cost of a cloud based platform.

Because Azure is evolving a lot, with new services being offered or their features changing, you have to stay constantly informed about these new or deprecated capabilities. You are sometimes faced with migration work to adapt to these evolutions. Proofs of concept must become a habit, a routine, due to this constantly moving environment.

Organizing and configuring your resources in Azure is also something to consider seriously, so as not to get in trouble and lose track of all the resources you deployed within your organization. Some notions become important to use to organize your Azure platform, like

  • Subscriptions
  • Resource groups
  • Tags
  • Policies
  • Locations of your services
  • Naming convention

Using tags in Azure can help a lot for reporting reasons.
A good naming convention (i.e. purpose + type of service + environment) can help you deploy your different environments more easily; a trick is also not to use special characters, because they cannot be used consistently across Azure services. Using policies will help you control that the services are created the way your governance wants them.

A data analytics architecture was presented, including the following services in order to manage structured data:

  • Azure data factory
  • Azure Blob Storage
  • Azure data bricks
  • Azure SQL Data Warehouse
  • Azure Analysis Services
  • Power BI

Then the following services were suggested additionally to add real-time data analytics:

  • Azure HDInsight Kafka
  • Azure Databricks Spark
  • Azure Cosmos DB

They then presented the different layers of the architecture and the possible Azure services: data acquisition, data storage, data computing and data visualization.
Having done on-premise data warehouse projects for 15 years, I was always mentioning that I prefer to privilege ELT over ETL, so I was glad to hear that ELT is now really the mostly used terminology and that Data Factory was presented that way. The new version now also enables running SSIS packages. The different data storage possibilities, like blob storage, data lake, SQL DWH, SQL DB and Cosmos DB, have been reviewed. Choosing the right data storage is always about finding the right balance between schema-on-read and schema-on-write. But in a modern Azure BI platform you will find a so-called polyglot data storage solution, combining several types of storage services.

A component often emerging in this constellation of tools is Databricks, running Apache Spark for data engineering and data science. The advantage is that it is closer to the open source world, supports several languages (SQL, Python, R, Scala) and can be used for use cases like batch data processing, interactive analytics, machine learning, stream event processing, …

In such an environment, data virtualization is also an interesting capability of the architecture. It is possible using, for instance, PolyBase, which allows querying the disparate stored data while avoiding replicating or duplicating it.

The pre-conference day finished with the presentation of Power BI and Azure Analysis Services and some automation techniques with ARM and PowerShell.

Again an interesting data analytic day…stay tuned…

Cet article Pass summit – dbi visit day 1 est apparu en premier sur Blog dbi services.

Pass summit – dbi visit day 2


 IMG_1046

The Art and Science of Designing Smart Data story

This pre-conference session was moderated by Mico Yuk, CEO and founder of BI Brainz.

Mico drove us through the path of BI data visual storytelling and the importance it has to present the data in an attractive way.
She warned that the session would not be technical at all; I was a bit disappointed about this point, but was curious how she would drive us there.

Starting off, she made us aware of the impact on our brain of what we see, especially the colors and the words.
In order to create good storytelling, the first thing is to raise the right questions, but avoid the pitfalls like asking:
What do you want to see?
How do you want it to look?
What do you want to measure?

She drew a parallel with Hollywood and film scenarios, which are most of the time built on the same template.
Why can’t we do the same with data?
Part 1: Goal
Goals need to be defined clearly; they are the core of your data story, and what is significant is that the goal is quantified.
Part 2: Snapshot or KPI
The KPIs are the figures that measure your goal.
Part 3: Trends
The way the KPIs behave to support your goal.
Part 4: Actions
There are 3 kinds of actions that can be considered to fix the issues:
Reality – what we want to happen
Fallback – if reality does not work, what else do you do
Wish list – what do you wish for to fix the issue

To succeed in creating the different storyboard parts, you have to raise the right questions:
For the goal part: What is your goal for this project? What does success look like?
For the KPI part: What KPIs do you need to see to hit your goal?
For the trends part: If your KPI is not on target, why?

A goal formulation must follow the WHW pattern; a sample goal would be “Hit consumer sales target 1% by 2018”:
What = hit
How much = 1% sales
When = 2018

Finally, Mico showed how to display the information of the story on a dashboard following the visual storyboard tile with the following elements:
0 Goal/KPI name
1 Where are you now
2 How did you get here
3 Where will you end up
4 What is the impact
The trick is always to prototype your storyboard outside a BI tool.

Along the day we also did some gymnastics to relax and awaken our brains.

 

It was a hands-on session with practical exercises, and we worked in groups on a concrete use case to apply the storytelling methodology all along the day.

IMG_1048

Again an interesting data analytic day…stay tuned..

Cet article Pass summit – dbi visit day 2 est apparu en premier sur Blog dbi services.
