
Alfresco Clustering – ActiveMQ


In previous blogs, I covered some basics, presented some possible architectures for Alfresco, and described the Clustering setup for the Alfresco Repository and Alfresco Share. In this one, I will work on the ActiveMQ layer. I recently posted something related to the setup of ActiveMQ and its initial configuration, so I will extend that topic here with what needs to be done to have a simple Cluster for ActiveMQ. I’m not an ActiveMQ expert; I only started using it a few months ago in relation to Alfresco, but I have learned a few things in this timeframe, so this might still be of some use.

ActiveMQ is a Messaging Server, so there are three sides to this component. First, there are Producers, which produce messages. These messages are put into the broker’s queue, which is the second side, and finally there are Consumers, which consume the messages from the queue. Producers and Consumers both revolve around the JMS broker’s queue: they are both clients. In a standalone architecture (one broker), there is no issue because clients will always produce and consume all messages. However, if you start adding more brokers and you aren’t doing it right, you might end up with producers talking to one broker and consumers talking to another. To solve that, a few options are possible:

  • a first solution is to create a Network of Brokers which will allow the different brokers to forward the necessary messages between them. You can see that as an Active/Active Cluster
    • Pros: this allows ActiveMQ to support a huge architecture with potentially hundreds or thousands of brokers
    • Cons: messages are, at any point in time, only owned by one single broker so if this broker goes down, the message is lost (if there is no persistence) or will have to wait for the broker to be restarted (if there is persistence)
  • the second solution that ActiveMQ supports is the Master/Slave one. In this architecture, all messages will be replicated from a Master to all Slave brokers. You can see that as something like an Active/Passive Cluster
    • Pros: messages are always processed and cannot be lost. If the Master broker goes down for any reason, one of the Slaves instantly takes its place as the new Master with all the previous messages
    • Cons: since all messages are replicated, it’s much harder to support a huge architecture

In case of a Network of Brokers, it’s possible to use either the static or dynamic discovery of brokers:

  • Static discovery: Uses the static protocol to provide a list of all URIs to be tested to discover other connections. E.g.: static:(tcp://mq_n1.domain:61616,tcp://mq_n2.domain:61616)?maxReconnectDelay=3000
  • Dynamic discovery: Uses a multicast discovery agent to check for other connections. This is done using the discoveryUri parameter in the XML configuration file
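To illustrate the dynamic discovery, here is a minimal sketch of what it could look like inside the activemq.xml (the multicast group name “default” and the connector name are only assumptions, adapt them to your setup):

<broker xmlns="http://activemq.apache.org/schema/core" brokerName="mq_n1" dataDirectory="${activemq.data}">
  <networkConnectors>
    <!-- Discover and connect to the other brokers announcing themselves on this multicast group -->
    <networkConnector uri="multicast://default"/>
  </networkConnectors>
  <transportConnectors>
    <!-- Announce this broker on the same multicast group so that the others can discover it -->
    <transportConnector name="openwire" uri="tcp://0.0.0.0:61616" discoveryUri="multicast://default"/>
  </transportConnectors>
</broker>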

 

I. Client’s configuration

On the client’s side, using several brokers is very simple since it’s all about using the correct broker URL. To be able to connect to several brokers, you should use the Failover Transport protocol which replaced the Reliable protocol used in ActiveMQ 3. For Alfresco, this broker URL needs to be updated in the alfresco-global.properties file. This is an example for a pretty simple URL with two brokers:

[alfresco@alf_n1 ~]$ cat $CATALINA_HOME/shared/classes/alfresco-global.properties
...
### ActiveMQ
messaging.broker.url=failover:(tcp://mq_n1.domain:61616,tcp://mq_n2.domain:61616)?timeout=3000&randomize=false&nested.daemon=false&nested.dynamicManagement=false
#messaging.username=
#messaging.password=
...
[alfresco@alf_n1 ~]$

 

There are a few things to note. The Failover used above is a transport layer that can be used in combination with any of the other transport methods/protocols. Here it’s used with two TCP protocol URIs. The correct nomenclature is one of the following:

  • failover:uri1,…,uriN
    • E.g.: failover:tcp://mq_n1.domain:61616,tcp://mq_n2.domain:61616 => the simplest broker URL for two brokers with no custom options
  • failover:uri1?URIOptions1,…,uriN?URIOptionsN
    • E.g.: failover:tcp://mq_n1.domain:61616?daemon=false&dynamicManagement=false&trace=false,tcp://mq_n2.domain:61616?daemon=false&dynamicManagement=true&trace=true => a more advanced broker URL with some custom options for each of the TCP protocol URIs
  • failover:(uri1?URIOptions1,…,uriN?URIOptionsN)?FailoverTransportOptions
    • E.g.: failover:(tcp://mq_n1.domain:61616?daemon=false&dynamicManagement=false&trace=false,tcp://mq_n2.domain:61616?daemon=false&dynamicManagement=true&trace=true)?timeout=3000&randomize=false => the same broker URL as above but, in addition, with some Failover Transport options
  • failover:(uri1,…,uriN)?FailoverTransportOptions&NestedURIOptions
    • E.g.: failover:(tcp://mq_n1.domain:61616,tcp://mq_n2.domain:61616)?timeout=3000&randomize=false&nested.daemon=false&nested.dynamicManagement=false&nested.trace=false => since ActiveMQ 5.9, it’s now possible to set the nested URIs options (here the TCP protocol options) at the end of the broker URL, they just need to be preceded by “nested.”. Nested options will apply to all URIs.

There are a lot of interesting parameters; here are some of them:

  • Failover Transport options:
    • backup=true: initialize and keep a second connection to another broker for faster failover
    • randomize=true: will pick a new URI for the reconnect randomly from the list of URIs
    • timeout=3000: time in ms before timeout on the send operations
    • priorityBackup=true: clients will failover to other brokers in case the “primary” broker isn’t available (that happens anyway) but they will consistently try to reconnect to the “primary” one. It is possible to specify several “primary” brokers with the priorityURIs option (comma-separated list). A combined example is shown right after this list
  • TCP Transport options:
    • daemon=false: specify that ActiveMQ isn’t running in a Spring or Web container
    • dynamicManagement=false: disabling the JMX management
    • trace=false: disabling the tracing
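As an illustration, and assuming the same two brokers as earlier (the values are only placeholders), a broker URL combining some of these options could look like this:

failover:(tcp://mq_n1.domain:61616,tcp://mq_n2.domain:61616)?timeout=3000&randomize=false&priorityBackup=true&nested.daemon=false&nested.trace=false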

The full list of Failover Transport options and the full list of TCP Transport options can be found in the official ActiveMQ documentation.

II. Messaging Server’s configuration

I believe the simplest setup for Clustering in ActiveMQ is the Master/Slave one, so that’s what I will talk about here. If you are looking for more information about the Network of Brokers, you can find it in the official ActiveMQ documentation. As mentioned previously, the idea behind Master/Slave is to replicate the messages to the Slave brokers somehow. To do that, there are three possible configurations:

  • Shared File System: use a shared file system
  • JDBC: use a Database Server
  • Replicated LevelDB Store: use a ZooKeeper Server. This has been deprecated in recent versions of ActiveMQ 5 in favour of KahaDB, which is a file-based persistence Database. Therefore, this actually is linked to the first configuration above (Shared File System)

In the scope of Alfresco, you should already have a shared file system as well as a shared Database Server for the Repository Clustering… So, it’s pretty easy to fulfil the prerequisites for ActiveMQ since you already have them. Of course, you can use a dedicated Shared File System or a dedicated Database; that’s up to your requirements.

a. JDBC

For the JDBC configuration, you will need to change the persistenceAdapter to use the dedicated jdbcPersistenceAdapter and create the associated DataSource for your Database. ActiveMQ supports several databases such as Apache Derby, DB2, HSQL, MySQL, Oracle, PostgreSQL, SQL Server or Sybase. You will also need to put the JDBC library in the right location (see the note after the configuration below).

[alfresco@mq_n1 ~]$ cat $ACTIVEMQ_HOME/conf/activemq.xml
<beans
  xmlns="http://www.springframework.org/schema/beans"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
  http://activemq.apache.org/schema/core http://activemq.apache.org/schema/core/activemq-core.xsd">
  ...
  <broker xmlns="http://activemq.apache.org/schema/core" brokerName="mq_n1" dataDirectory="${activemq.data}">
    ...
    <persistenceAdapter>
      <jdbcPersistenceAdapter dataDirectory="activemq-data" dataSource="postgresql-ds"/>
    </persistenceAdapter>
    ...
  </broker>
  ...
  <bean id="postgresql-ds" class="org.postgresql.ds.PGPoolingDataSource">
    <property name="serverName" value="db_vip"/>
    <property name="databaseName" value="alfresco"/>
    <property name="portNumber" value="5432"/>
    <property name="user" value="alfresco"/>
    <property name="password" value="My+P4ssw0rd"/>
    <property name="dataSourceName" value="postgres"/>
    <property name="initialConnections" value="1"/>
    <property name="maxConnections" value="10"/>
  </bean>
  ...
</beans>
[alfresco@mq_n1 ~]$
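Regarding the JDBC library, ActiveMQ loads the jar files present in its lib folder at startup, so the driver simply needs to be dropped there. This is only a sketch: the driver version and its source location below are assumptions, adapt them to your environment:

[alfresco@mq_n1 ~]$ # Hypothetical example: make the PostgreSQL JDBC driver available to ActiveMQ
[alfresco@mq_n1 ~]$ cp /tmp/postgresql-42.2.6.jar $ACTIVEMQ_HOME/lib/
[alfresco@mq_n1 ~]$ ls $ACTIVEMQ_HOME/lib/postgresql*.jar
/opt/activemq/lib/postgresql-42.2.6.jar
[alfresco@mq_n1 ~]$ sudo systemctl restart activemq.service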

 

b. Shared File System

The Shared File System configuration is, from my point of view, the simplest one to set up, but for it to work properly there are some things to note, because you should use a shared file system that supports proper file locking. This means that:

  • you cannot use the Oracle Cluster File System (OCFS/OCFS2) because there is no cluster-aware flock or POSIX locks
  • if you are using NFS v3 or lower, you won’t have automatic failover from Master to Slave because there is no timeout and therefore the lock will never be released. You should therefore use NFS v4 instead

Additionally, you need to share the persistenceAdapter between all brokers, but you cannot share the data folder completely: otherwise the logs would be overwritten by all brokers (annoying but not really an issue) and, more importantly, the PID file would also be overwritten, which would cause issues when starting/stopping the Slave brokers…

Therefore, properly configuring the Shared File System is all about keeping the “$ACTIVEMQ_DATA” environment variable set to the place where you want the logs and PID file to be stored (i.e. locally) while overriding the persistenceAdapter path so that it points to the Shared File System:

[alfresco@mq_n1 ~]$ # Root folder of the ActiveMQ binaries
[alfresco@mq_n1 ~]$ echo $ACTIVEMQ_HOME
/opt/activemq
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ # Location of the logs and PID file
[alfresco@mq_n1 ~]$ echo $ACTIVEMQ_DATA
/opt/activemq/data
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ # Location of the Shared File System
[alfresco@mq_n1 ~]$ echo $ACTIVEMQ_SHARED_DATA
/shared/file/system
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ sudo systemctl stop activemq.service
[alfresco@mq_n1 ~]$ grep -A2 "<persistenceAdapter>" $ACTIVEMQ_HOME/conf/activemq.xml
    <persistenceAdapter>
      <kahaDB directory="${activemq.data}/kahadb"/>
    </persistenceAdapter>
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ # Put the KahaDB into the Shared File System
[alfresco@mq_n1 ~]$ sed -i "s, directory=\"[^\"]*\", directory=\"${ACTIVEMQ_SHARED_DATA}/activemq/kahadb\"," $ACTIVEMQ_HOME/conf/activemq.xml
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ grep -A2 "<persistenceAdapter>" $ACTIVEMQ_HOME/conf/activemq.xml
    <persistenceAdapter>
      <kahaDB directory="/shared/file/system/activemq/kahadb"/>
    </persistenceAdapter>
[alfresco@mq_n1 ~]$
[alfresco@mq_n1 ~]$ sudo systemctl start activemq.service

 

Starting the Master ActiveMQ will display some information in the logs of node1 showing that it has started properly and that it is listening for connections on the different transportConnectors:

[alfresco@mq_n1 ~]$ cat $ACTIVEMQ_DATA/activemq.log
2019-07-28 11:34:37,598 | INFO  | Refreshing org.apache.activemq.xbean.XBeanBrokerFactory$1@9f116cc: startup date [Sun Jul 28 11:34:37 CEST 2019]; root of context hierarchy | org.apache.activemq.xbean.XBeanBrokerFactory$1 | main
2019-07-28 11:34:38,289 | INFO  | Using Persistence Adapter: KahaDBPersistenceAdapter[/shared/file/system/activemq/kahadb] | org.apache.activemq.broker.BrokerService | main
2019-07-28 11:34:38,330 | INFO  | KahaDB is version 6 | org.apache.activemq.store.kahadb.MessageDatabase | main
2019-07-28 11:34:38,351 | INFO  | PListStore:[/opt/activemq/data/mq_n1/tmp_storage] started | org.apache.activemq.store.kahadb.plist.PListStoreImpl | main
2019-07-28 11:34:38,479 | INFO  | Apache ActiveMQ 5.15.6 (mq_n1, ID:mq_n1-36925-1564306478360-0:1) is starting | org.apache.activemq.broker.BrokerService | main
2019-07-28 11:34:38,533 | INFO  | Listening for connections at: tcp://mq_n1:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2019-07-28 11:34:38,542 | INFO  | Connector openwire started | org.apache.activemq.broker.TransportConnector | main
2019-07-28 11:34:38,545 | INFO  | Listening for connections at: amqp://mq_n1:5672?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2019-07-28 11:34:38,546 | INFO  | Connector amqp started | org.apache.activemq.broker.TransportConnector | main
2019-07-28 11:34:38,552 | INFO  | Listening for connections at: stomp://mq_n1:61613?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2019-07-28 11:34:38,553 | INFO  | Connector stomp started | org.apache.activemq.broker.TransportConnector | main
2019-07-28 11:34:38,556 | INFO  | Listening for connections at: mqtt://mq_n1:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2019-07-28 11:34:38,561 | INFO  | Connector mqtt started | org.apache.activemq.broker.TransportConnector | main
2019-07-28 11:34:38,650 | WARN  | ServletContext@o.e.j.s.ServletContextHandler@11841b15{/,null,STARTING} has uncovered http methods for path: / | org.eclipse.jetty.security.SecurityHandler | main
2019-07-28 11:34:38,710 | INFO  | Listening for connections at ws://mq_n1:61614?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.ws.WSTransportServer | main
2019-07-28 11:34:38,712 | INFO  | Connector ws started | org.apache.activemq.broker.TransportConnector | main
2019-07-28 11:34:38,712 | INFO  | Apache ActiveMQ 5.15.6 (mq_n1, ID:mq_n1-36925-1564306478360-0:1) started | org.apache.activemq.broker.BrokerService | main
2019-07-28 11:34:38,714 | INFO  | For help or more information please see: http://activemq.apache.org | org.apache.activemq.broker.BrokerService | main
2019-07-28 11:34:39,118 | INFO  | No Spring WebApplicationInitializer types detected on classpath | /admin | main
2019-07-28 11:34:39,373 | INFO  | ActiveMQ WebConsole available at http://0.0.0.0:8161/ | org.apache.activemq.web.WebConsoleStarter | main
2019-07-28 11:34:39,373 | INFO  | ActiveMQ Jolokia REST API available at http://0.0.0.0:8161/api/jolokia/ | org.apache.activemq.web.WebConsoleStarter | main
2019-07-28 11:34:39,402 | INFO  | Initializing Spring FrameworkServlet 'dispatcher' | /admin | main
2019-07-28 11:34:39,532 | INFO  | No Spring WebApplicationInitializer types detected on classpath | /api | main
2019-07-28 11:34:39,563 | INFO  | jolokia-agent: Using policy access restrictor classpath:/jolokia-access.xml | /api | main
[alfresco@mq_n1 ~]$

 

Then, starting a Slave will only display a message in the node2 logs saying that there is already a Master running; the Slave is therefore just waiting and not listening for connections yet:

[alfresco@mq_n2 ~]$ cat $ACTIVEMQ_DATA/activemq.log
2019-07-28 11:35:53,258 | INFO  | Refreshing org.apache.activemq.xbean.XBeanBrokerFactory$1@9f116cc: startup date [Sun Jul 28 11:35:53 CEST 2019]; root of context hierarchy | org.apache.activemq.xbean.XBeanBrokerFactory$1 | main
2019-07-28 11:35:53,986 | INFO  | Using Persistence Adapter: KahaDBPersistenceAdapter[/shared/file/system/activemq/kahadb] | org.apache.activemq.broker.BrokerService | main
2019-07-28 11:35:53,999 | INFO  | Database /shared/file/system/activemq/kahadb/lock is locked by another server. This broker is now in slave mode waiting a lock to be acquired | org.apache.activemq.store.SharedFileLocker | main
[alfresco@mq_n2 ~]$

 

Finally, stopping the Master will automatically promote the Slave to the new Master, without any human interaction. From the node2 logs:

[alfresco@mq_n2 ~]$ cat $ACTIVEMQ_DATA/activemq.log
2019-07-28 11:35:53,258 | INFO  | Refreshing org.apache.activemq.xbean.XBeanBrokerFactory$1@9f116cc: startup date [Sun Jul 28 11:35:53 CEST 2019]; root of context hierarchy | org.apache.activemq.xbean.XBeanBrokerFactory$1 | main
2019-07-28 11:35:53,986 | INFO  | Using Persistence Adapter: KahaDBPersistenceAdapter[/shared/file/system/activemq/kahadb] | org.apache.activemq.broker.BrokerService | main
2019-07-28 11:35:53,999 | INFO  | Database /shared/file/system/activemq/kahadb/lock is locked by another server. This broker is now in slave mode waiting a lock to be acquired | org.apache.activemq.store.SharedFileLocker | main
  # The ActiveMQ Master on node1 has been stopped here (11:37:10)
2019-07-28 11:37:11,166 | INFO  | KahaDB is version 6 | org.apache.activemq.store.kahadb.MessageDatabase | main
2019-07-28 11:37:11,187 | INFO  | PListStore:[/opt/activemq/data/mq_n2/tmp_storage] started | org.apache.activemq.store.kahadb.plist.PListStoreImpl | main
2019-07-28 11:37:11,316 | INFO  | Apache ActiveMQ 5.15.6 (mq_n2, ID:mq_n2-41827-1564306631196-0:1) is starting | org.apache.activemq.broker.BrokerService | main
2019-07-28 11:37:11,370 | INFO  | Listening for connections at: tcp://mq_n2:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2019-07-28 11:37:11,372 | INFO  | Connector openwire started | org.apache.activemq.broker.TransportConnector | main
2019-07-28 11:37:11,379 | INFO  | Listening for connections at: amqp://mq_n2:5672?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2019-07-28 11:37:11,381 | INFO  | Connector amqp started | org.apache.activemq.broker.TransportConnector | main
2019-07-28 11:37:11,386 | INFO  | Listening for connections at: stomp://mq_n2:61613?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2019-07-28 11:37:11,387 | INFO  | Connector stomp started | org.apache.activemq.broker.TransportConnector | main
2019-07-28 11:37:11,390 | INFO  | Listening for connections at: mqtt://mq_n2:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2019-07-28 11:37:11,391 | INFO  | Connector mqtt started | org.apache.activemq.broker.TransportConnector | main
2019-07-28 11:37:11,485 | WARN  | ServletContext@o.e.j.s.ServletContextHandler@2cfbeac4{/,null,STARTING} has uncovered http methods for path: / | org.eclipse.jetty.security.SecurityHandler | main
2019-07-28 11:37:11,547 | INFO  | Listening for connections at ws://mq_n2:61614?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.ws.WSTransportServer | main
2019-07-28 11:37:11,548 | INFO  | Connector ws started | org.apache.activemq.broker.TransportConnector | main
2019-07-28 11:37:11,556 | INFO  | Apache ActiveMQ 5.15.6 (mq_n2, ID:mq_n2-41827-1564306631196-0:1) started | org.apache.activemq.broker.BrokerService | main
2019-07-28 11:37:11,558 | INFO  | For help or more information please see: http://activemq.apache.org | org.apache.activemq.broker.BrokerService | main
2019-07-28 11:37:11,045 | INFO  | No Spring WebApplicationInitializer types detected on classpath | /admin | main
2019-07-28 11:37:11,448 | INFO  | ActiveMQ WebConsole available at http://0.0.0.0:8161/ | org.apache.activemq.web.WebConsoleStarter | main
2019-07-28 11:37:11,448 | INFO  | ActiveMQ Jolokia REST API available at http://0.0.0.0:8161/api/jolokia/ | org.apache.activemq.web.WebConsoleStarter | main
2019-07-28 11:37:11,478 | INFO  | Initializing Spring FrameworkServlet 'dispatcher' | /admin | main
2019-07-28 11:37:11,627 | INFO  | No Spring WebApplicationInitializer types detected on classpath | /api | main
2019-07-28 11:37:11,664 | INFO  | jolokia-agent: Using policy access restrictor classpath:/jolokia-access.xml | /api | main
[alfresco@mq_n2 ~]$

 

You can of course customize ActiveMQ as per your requirements: remove some connectors, set up SSL, and so on… But that’s not really the purpose of this blog.

 

 



Alfresco Clustering – Apache HTTPD as Load Balancer


In previous blogs, I covered some basics, presented some possible architectures for Alfresco, and described the Clustering setup for the Alfresco Repository, Alfresco Share and ActiveMQ. In this one, I will talk about the Front-end layer, but in a very particular setup because it will also act as a Load Balancer. For an Alfresco solution, you can choose whichever front-end you prefer; it can simply act as a front-end to protect your Alfresco back-end components, to add SSL or whatever. There is no real preference but you will obviously need to know how to configure it. I posted a blog some years ago about Apache HTTPD as a simple front-end, and the Alfresco documentation now includes a section for that as well, but there is no official documentation for a Load Balancer setup.

In an Alfresco architecture that includes HA/Clustering you will, at some point, need a Load Balancer. From time to time, you will come across companies that do not already have a Load Balancer available, and you might therefore have to provide something to fill this gap. Since you will most probably (should?) already have a front-end to protect Alfresco, why not use it as a Load Balancer as well? In this blog, I chose Apache HTTPD because that’s the front-end I usually use and I know it works fine as a LB too.

In the architectures that I described in the first blog of this series, there was always a front-end installed on each node alongside Alfresco Share, with a LB above that. Here, these two boxes are actually combined. There are multiple ways to set that up, but I didn’t want to talk about it in my first blog because it’s not really related to Alfresco: it sits above it, so it would just have multiplied the possible architectures that I wanted to present and the blog would have been way too long. There were also no communications between the different front-end nodes because, technically speaking, we aren’t going to set up Apache HTTPD as a Cluster; we only need to provide a High Availability solution.

Alright so let’s say that you don’t have a Load Balancer available and you want to use Apache HTTPD as a front-end+LB for a two-node Cluster. There are several solutions so here are two possible ways to do that from an inbound communication point of view that will still provide redundancy:

  • Setup a Round Robin DNS that points to both Apache HTTPD node1 and node2. The DNS will redirect connections to either of the two Apache HTTPD (Active/Active); a zone file sketch is shown after this list
  • Setup a Failover DNS with a pretty low TimeToLive (TTL) which will point to a single Apache HTTPD node and redirect all traffic there. If this one isn’t available, it will failover to the second one (Active/Passive)
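To illustrate the first option, a Round Robin DNS is simply the same name resolving to the IP of each front-end. A minimal sketch of what that could look like in a BIND zone file is shown below (the name, TTL and IP addresses are purely hypothetical):

; Both records share the same name so resolvers alternate between the two front-ends
dns.domain.    300    IN    A    10.10.10.11    ; httpd_n1
dns.domain.    300    IN    A    10.10.10.12    ; httpd_n2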

 

In both cases above, the Apache HTTPD configuration can be exactly the same and it will work. From an outbound communication point of view, Apache HTTPD will talk directly with all the Share nodes behind it. To avoid disconnections and loss of sessions in case an Apache HTTPD goes down, the solution needs to support session stickiness across all Apache HTTPD nodes. With that, all communications coming from a single browser will always be redirected to the same back-end server, which ensures that the sessions stay intact even if you lose an Apache HTTPD. I mentioned previously that there won’t be any communications between the different front-ends, so this session stickiness must be based on something present inside the session (header or cookie) or inside the URL.

With Apache HTTPD, you can use the Proxy modules to provide both a front-end configuration and a Load Balancer but, in this blog, I will use the JK module. The JK module is provided by Apache for communications between Apache HTTPD and Apache Tomcat. It has been designed and optimized for this purpose and it also provides/supports a Load Balancer configuration.

 

I. Apache HTTPD setup for a single back-end node

For this example, I will use the package provided by Ubuntu for a simple installation. You can obviously build it from source to customize it, add your best practices, and so on… This has nothing to do with the Clustering setup; it’s a simple front-end configuration for any installation. So let’s install a basic Apache HTTPD:

[alfresco@httpd_n1 ~]$ sudo apt-get install apache2 libapache2-mod-jk
[alfresco@httpd_n1 ~]$ sudo systemctl enable apache2.service
[alfresco@httpd_n1 ~]$ sudo systemctl daemon-reload
[alfresco@httpd_n1 ~]$ sudo a2enmod rewrite
[alfresco@httpd_n1 ~]$ sudo a2enmod ssl

 

Then, to configure it for a single back-end Alfresco node (I’m just showing a minimal configuration again; there is much more to do to add security & restrictions around Alfresco and mod_jk):

[alfresco@httpd_n1 ~]$ cat /etc/apache2/sites-available/alfresco-ssl.conf
...
<VirtualHost *:80>
    RewriteRule ^/?(.*) https://%{HTTP_HOST}/$1 [R,L]
</VirtualHost>

<VirtualHost *:443>
    ServerName            dns.domain
    ServerAlias           dns.domain dns
    ServerAdmin           email@domain
    SSLEngine             on
    SSLProtocol           -all +TLSv1.2
    SSLCipherSuite        EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH:AES2
    SSLHonorCipherOrder   on
    SSLVerifyClient       none
    SSLCertificateFile    /etc/pki/tls/certs/dns.domain.crt
    SSLCertificateKeyFile /etc/pki/tls/private/dns.domain.key

    RewriteRule ^/$ https://%{HTTP_HOST}/share [R,L]

    JkMount /* alfworker
</VirtualHost>
...
[alfresco@httpd_n1 ~]$
[alfresco@httpd_n1 ~]$ cat /etc/libapache2-mod-jk/workers.properties
worker.list=alfworker
worker.alfworker.type=ajp13
worker.alfworker.port=8009
worker.alfworker.host=share_n1.domain
worker.alfworker.lbfactor=1
[alfresco@httpd_n1 ~]$
[alfresco@httpd_n1 ~]$ sudo a2ensite alfresco-ssl
[alfresco@httpd_n1 ~]$ sudo a2dissite 000-default
[alfresco@httpd_n1 ~]$ sudo rm /etc/apache2/sites-enabled/000-default.conf
[alfresco@httpd_n1 ~]$
[alfresco@httpd_n1 ~]$ sudo service apache2 restart

 

That should do it for a single back-end Alfresco node. Again, this was just an example; I wouldn’t recommend using the configuration as is (inside the alfresco-ssl.conf file), as there is much more to do for security reasons.

 

II. Adaptation for a Load Balancer configuration

If you want to configure your Apache HTTPD as a Load Balancer, then on top of the standard setup shown above, you just have to modify two things:

  • Modify the JK module configuration to use a Load Balancer
  • Modify the Apache Tomcat configuration to add an identifier for Apache HTTPD to be able to redirect the communication to the correct back-end node (session stickiness). This ID, put in the Apache Tomcat configuration, will extend the Session ID like this: <session_id>.<tomcat_id>

 

So on all the nodes hosting the Apache HTTPD, you should put the exact same configuration:

[alfresco@httpd_n1 ~]$ cat /etc/libapache2-mod-jk/workers.properties
worker.list=alfworker

worker.alfworker.type=lb
worker.alfworker.balance_workers=node1,node2
worker.alfworker.sticky_session=true
worker.alfworker.method=B

worker.node1.type=ajp13
worker.node1.port=8009
worker.node1.host=share_n1.domain
worker.node1.lbfactor=1

worker.node2.type=ajp13
worker.node2.port=8009
worker.node2.host=share_n2.domain
worker.node2.lbfactor=1
[alfresco@httpd_n1 ~]$
[alfresco@httpd_n1 ~]$ sudo service apache2 reload

 

With the above configuration, we keep the same JK Worker (alfworker) but instead of using an ajp13 type, we use a lb type, which is an encapsulation. The alfworker will use two sub-workers named node1 and node2; these are just generic names. The alfworker also enables stickiness and uses the method B (Busyness), which means that for new sessions, Apache HTTPD will choose the worker with the fewest requests being served, divided by the lbfactor value.

Each sub-worker (node1 and node2) defines its type, which is ajp13 this time, the port and host it should target (where the Share nodes are located), and the lbfactor. As mentioned above, increasing the lbfactor means that more requests are going to be sent to this worker:

  • For the node2 to serve 100% more requests than the node1 (x2), then set worker.node1.lbfactor=1 and worker.node2.lbfactor=2
  • For the node2 to serve 50% more requests than the node1 (x1.5), then set worker.node1.lbfactor=2 and worker.node2.lbfactor=3

 

The second thing to do is to modify the Apache Tomcat configuration to add a specific ID. On the Share node1:

[alfresco@share_n1 ~]$ grep "<Engine" $CATALINA_HOME/conf/server.xml
    <Engine name="Catalina" defaultHost="localhost" jvmRoute="share_n1">
[alfresco@share_n1 ~]$

 

On the Share node2:

[alfresco@share_n2 ~]$ grep "<Engine" $CATALINA_HOME/conf/server.xml
    <Engine name="Catalina" defaultHost="localhost" jvmRoute="share_n2">
[alfresco@share_n2 ~]$

 

The value to put in the jvmRoute parameter is just a string, so it can be anything, but it must be unique across all Share nodes so that the Apache HTTPD JK module can find the correct back-end node to which it should transfer the requests.

It’s that simple to configure Apache HTTPD as a Load Balancer in front of Alfresco… To check which back-end server you are currently using, you can use the browser’s developer tools, in particular the network recording, which will display the Session ID in the headers/cookies section; the Session ID ends with the value that you put in the jvmRoute.
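You can also check it from the command line. This is only a sketch with a made-up Session ID, assuming the front-end URL used earlier in this blog; the suffix after the dot is the jvmRoute of the node serving you:

[alfresco@httpd_n1 ~]$ curl -k -s -I https://dns.domain/share/page/ | grep -i "JSESSIONID"
Set-Cookie: JSESSIONID=A1B2C3D4E5F60718293A4B5C6D7E8F90.share_n1; Path=/share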

 

 


Alfresco Clustering – Solr6


In previous blogs, I covered some basics, presented some possible architectures for Alfresco, and described the Clustering setup for the Alfresco Repository, Alfresco Share and ActiveMQ. I also set up an Apache HTTPD as a Load Balancer. In this one, I will talk about the last layer that I wanted to present, which is Solr and more particularly Solr6 (Alfresco Search Services) Sharding. I planned on writing a blog about Solr Sharding Concepts & Methods to explain what it brings concretely but unfortunately, it’s not ready yet. I will try to post it in the next few weeks, if I find the time.

 

I. Solr configuration modes

So, Solr supports/provides three configuration modes:

  • Master-Slave
  • SolrCloud
  • Standalone


Master-Slave: It’s a specific configuration mode which is pretty old. In this one, the Master node is the only one to index the content and all the Slave nodes replicate the Master’s index. This is a first step towards a Clustering solution with Solr, and Alfresco supports it, but this solution has some important drawbacks. For example, and contrary to an ActiveMQ Master/Slave solution, Solr cannot change the Master. Therefore, if you lose your Master, there is no indexing happening anymore and you need to manually change the configuration file on each of the remaining nodes to specify a new Master and point all the remaining Slave nodes to the new Master. This isn’t what I will be talking about in this blog.

SolrCloud: It’s another specific configuration mode which is a little more recent, introduced in Solr4 I believe. SolrCloud is a true Clustering solution using a ZooKeeper Server. It adds an additional layer on top of a Standalone Solr which slows it down a little bit, especially on infrastructures with a huge demand on indexing. But at some point, when you start having dozens of Solr nodes, you need a central place to organize and configure them, and that’s what SolrCloud is very good at. This solution provides Fault Tolerance as well as High Availability. I’m not sure whether SolrCloud could be used by Alfresco: SolrCloud also has Shards and its behaviour is pretty similar to a Standalone Solr, but it doesn’t work entirely in the same way. Maybe it’s possible, however I have never seen it so far. It might be the subject of some testing later… In any case, using SolrCloud for Alfresco might not be that useful because it’s much easier to set up a Master-Master Solr mixed with Solr Sharding for pretty much the same benefits. So, I won’t talk about SolrCloud here either.

You guessed it: in this blog, I will only talk about Standalone Solr nodes, and only using Shards. Alfresco supports Solr Shards only since version 5.1; before that, it wasn’t possible to use this feature, even though Solr4 already provided it. When using the two default cores (the famous “alfresco” & “archive” cores), with all Alfresco versions supporting Solr (so since Alfresco 4), it is possible to have a Highly Available Solr installation by setting up two Solr Standalone nodes and putting a Load Balancer in front of them, but in this case there is no communication between the Solr nodes, so it’s only a HA solution, nothing more.

 

In the architectures that I presented in the first blog of this series, if you remember the schema N°5 (you probably don’t but no worries, I didn’t either), I put a link between the two Solr nodes and I mentioned the following related to this architecture:
“N°5: […]. Between the two Solr nodes, I put a Clustering link, that’s in case you are using Solr Sharding. If you are using the default cores (alfresco and archive), then there is no communication between distinct Solr nodes. If you are using Solr Sharding and if you want a HA architecture, then you will have the same Shards on both Solr nodes and in this case, there will be communications between the Solr nodes, it’s not really a Clustering so to speak, that’s how Solr Sharding is working but I still used the same representation.”

 

II. Solr Shards creation

As mentioned earlier in this blog, there are real Cluster solutions with Solr, but in the case of Alfresco, because of the features that Alfresco adds like the Shard Registration, there is no real need to set up complex things like that. Having just a simple Master-Master installation of Solr6 with Sharding is already a very good and strong solution to provide Fault Tolerance, High Availability, Automatic Failover, Performance improvements, and so on… So how can that be set up?

First, you will need to install at least two Solr Standalone nodes. You can use exactly the same setup for all nodes, and it’s also exactly the same setup whether you use the default cores or Solr Sharding, so just do what you always do. For the Tracking, you will need to use the Load Balancer URL so it can target all Repository nodes, if there are several (see the sketch below).
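As a reminder, the tracking target is defined in the solrcore.properties of the template used to create the cores. This is only a sketch: the path, host name and ports below are assumptions and need to be adapted to your installation:

[alfresco@solr_n1 ~]$ grep -E "^alfresco\.(host|port|port\.ssl)=" $SOLR_HOME/solrhome/templates/rerank/conf/solrcore.properties
alfresco.host=alf-lb.domain
alfresco.port=80
alfresco.port.ssl=443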

If you created the default cores, you can remove them easily:

[alfresco@solr_n1 ~]$ curl -v "http://localhost:8983/solr/admin/cores?action=removeCore&storeRef=workspace://SpacesStore&coreName=alfresco"
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8983 (#0)
> GET /solr/admin/cores?action=removeCore&storeRef=workspace://SpacesStore&coreName=alfresco HTTP/1.1
> Host: localhost:8983
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/xml; charset=UTF-8
< Content-Length: 150
<
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">524</int></lst>
</response>
* Connection #0 to host localhost left intact
[alfresco@solr_n1 ~]$
[alfresco@solr_n1 ~]$ curl -v "http://localhost:8983/solr/admin/cores?action=removeCore&storeRef=archive://SpacesStore&coreName=archive"
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8983 (#0)
> GET /solr/admin/cores?action=removeCore&storeRef=archive://SpacesStore&coreName=archive HTTP/1.1
> Host: localhost:8983
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/xml; charset=UTF-8
< Content-Length: 150
<
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">485</int></lst>
</response>
* Connection #0 to host localhost left intact
[alfresco@solr_n1 ~]$

 

A status of “0” means that it’s successful.

Once that’s done, you can then simply create the Shards. In this example, I will:

  • use the DB_ID_RANGE method
  • use two Solr nodes
  • for workspace://SpacesStore: create 2 Shards out of a maximum of 10 with a range of 20M
  • for archive://SpacesStore: create 1 Shard out of a maximum of 4 with a range of 50M

Since I will use only two Solr nodes and since I want a High Availability on each of the Shards, I will need to have them all on both nodes. With a simple loop, it’s pretty easy to create all the Shards:

[alfresco@solr_n1 ~]$ solr_host=localhost
[alfresco@solr_n1 ~]$ solr_node_id=1
[alfresco@solr_n1 ~]$ begin_range=0
[alfresco@solr_n1 ~]$ range=19999999
[alfresco@solr_n1 ~]$ total_shards=10
[alfresco@solr_n1 ~]$
[alfresco@solr_n1 ~]$ for shard_id in `seq 0 1`; do
>   end_range=$((${begin_range} + ${range}))
>   curl -v "http://${solr_host}:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=${total_shards}&numNodes=${total_shards}&nodeInstance=${solr_node_id}&template=rerank&coreName=alfresco&shardIds=${shard_id}&property.shard.method=DB_ID_RANGE&property.shard.range=${begin_range}-${end_range}&property.shard.instance=${shard_id}"
>   echo ""
>   echo "  -->  Range N°${shard_id} created with: ${begin_range}-${end_range}"
>   echo ""
>   sleep 2
>   begin_range=$((${end_range} + 1))
> done

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8983 (#0)
> GET /solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=10&numNodes=10&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=0&property.shard.method=DB_ID_RANGE&property.shard.range=0-19999999&property.shard.instance=0 HTTP/1.1
> Host: localhost:8983
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/xml; charset=UTF-8
< Content-Length: 182
<
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">254</int></lst><str name="core">alfresco-0</str>
</response>
* Connection #0 to host localhost left intact

  -->  Range N°0 created with: 0-19999999


*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8983 (#0)
> GET /solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=10&numNodes=10&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=1&property.shard.method=DB_ID_RANGE&property.shard.range=20000000-39999999&property.shard.instance=1 HTTP/1.1
> Host: localhost:8983
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/xml; charset=UTF-8
< Content-Length: 182
<
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">228</int></lst><str name="core">alfresco-1</str>
</response>
* Connection #0 to host localhost left intact

  -->  Range N°1 created with: 20000000-39999999

[alfresco@solr_n1 ~]$
[alfresco@solr_n1 ~]$ begin_range=0
[alfresco@solr_n1 ~]$ range=49999999
[alfresco@solr_n1 ~]$ total_shards=4
[alfresco@solr_n1 ~]$ for shard_id in `seq 0 0`; do
>   end_range=$((${begin_range} + ${range}))
>   curl -v "http://${solr_host}:8983/solr/admin/cores?action=newCore&storeRef=archive://SpacesStore&numShards=${total_shards}&numNodes=${total_shards}&nodeInstance=${solr_node_id}&template=rerank&coreName=archive&shardIds=${shard_id}&property.shard.method=DB_ID_RANGE&property.shard.range=${begin_range}-${end_range}&property.shard.instance=${shard_id}"
>   echo ""
>   echo "  -->  Range N°${shard_id} created with: ${begin_range}-${end_range}"
>   echo ""
>   sleep 2
>   begin_range=$((${end_range} + 1))
> done

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8983 (#0)
> GET /solr/admin/cores?action=newCore&storeRef=archive://SpacesStore&numShards=4&numNodes=4&nodeInstance=1&template=rerank&coreName=archive&shardIds=0&property.shard.method=DB_ID_RANGE&property.shard.range=0-49999999&property.shard.instance=0 HTTP/1.1
> Host: localhost:8983
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/xml; charset=UTF-8
< Content-Length: 181
<
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">231</int></lst><str name="core">archive-0</str>
</response>
* Connection #0 to host localhost left intact

-->  Range N°0 created with: 0-49999999

[alfresco@solr_n1 ~]$
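
If you want to double-check what has been created on a node, the standard Solr Core Admin STATUS action lists all the cores. The output below is only an illustration of what it could look like with the Shards created above:

[alfresco@solr_n1 ~]$ curl -s "http://localhost:8983/solr/admin/cores?action=STATUS" | grep '<str name="name">'
  <str name="name">alfresco-0</str>
  <str name="name">alfresco-1</str>
  <str name="name">archive-0</str>
[alfresco@solr_n1 ~]$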

 

On the Solr node2, to create the same Shards (another Instance of each Shard) and therefore provide the expected setup, just re-execute the same commands but replace solr_node_id=1 with solr_node_id=2. That’s all there is to do on the Solr side; just creating the Shards is sufficient. On the Alfresco side, configure the Shard registration to use the Dynamic mode:

[alfresco@alf_n1 ~]$ cat $CATALINA_HOME/shared/classes/alfresco-global.properties
...
# Solr Sharding
solr.useDynamicShardRegistration=true
search.solrShardRegistry.purgeOnInit=true
search.solrShardRegistry.shardInstanceTimeoutInSeconds=60
search.solrShardRegistry.maxAllowedReplicaTxCountDifference=500
...
[alfresco@alf_n1 ~]$

 

After a quick restart, all the Shard Instances will register themselves with Alfresco and you should see that each Shard has its two Shard Instances. Thanks to the constant Tracking, Alfresco knows which Shard Instances are healthy (up-to-date) and which ones aren’t (either lagging behind or completely silent). When performing searches, Alfresco will send the request to any of the healthy Shard Instances. Solr is aware of the healthy Shard Instances as well and will distribute the search request to all the Shards for the parallel query. This is the communication between the Solr nodes that I mentioned earlier: it’s not really Clustering but rather query distribution between all the healthy Shard Instances.

 

 


Sparse OVM virtual disks on appliances


For some reason, you may need to sparse OVM virtual disks on an Oracle appliance. Even though that feature is present through the OVM Manager, most Oracle appliances don’t have any OVM Manager deployed on them. Therefore, if you un-sparse your virtual disk by mistake, you are on your own.

This is a note on how to sparse virtual disks which have been un-sparsed.

Stop I/Os on the virtual disk

First, ensure the VM using the disk is stopped:
xm shutdown {VM_NAME}

For instance:
xm shutdown exac01db01.domain.local

Sparse disk


dd if={PATH_TO_DISK_TO_BE_SPARSED} of={PATH_TO_NEW_SPARSED_DISK} conv=sparse

For instance:

dd if=/EXAVMIMAGES/GuestImages/exac01db01.domain.local/vdisk_root_01.img \
of=/staging/vdisk_root_01.img \
conv=sparse

Move disk to former location

After the sparsing operation has finished, move the disk back to its former location:

# Retrieve the disks path:
cat /EXAVMIMAGES/GuestImages/{VM_NAME}/vm.cfg | grep disk
# Copy each disk back to its location:
mv /staging/{DISK_NAME}.img /EXAVMIMAGES/GuestImages/{VM_NAME}/{DISK_NAME}.img

For instance:

mv /staging/vdisk_root_01.img /EXAVMIMAGES/GuestImages/exac01db01.domain.local/vdisk_root_01.img
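
To confirm that the disk is sparse again, you can compare the allocated size with the apparent size; the allocated one should be noticeably smaller. These are standard commands and the path is the one from the example above:

# Allocated size (blocks actually used on the file system)
du -sh /EXAVMIMAGES/GuestImages/exac01db01.domain.local/vdisk_root_01.img
# Apparent size (the size seen by the VM)
ls -lh /EXAVMIMAGES/GuestImages/exac01db01.domain.local/vdisk_root_01.img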

Start back the VM

Then you can start the VM back up; it will use the new disk:
xm create /EXAVMIMAGES/GuestImages/{VM_NAME}/vm.cfg

I hope this helps and please contact us or comment below should you need more details.


Feedback on the EMEA Red Hat Partner Conference 2019 in Prague


A few weeks ago I attended the Red Hat EMEA Partner Conference for the first time. The Red Hat Conference took place in Prague from June 25th to 27th. If you are interested in Open Source technologies and in Red Hat, feel free to read this personal feedback on trends at Red Hat and in the IT sector.

Stronger Together!

Representing dbi services at the Partner Conference in Prague was a great opportunity for us as a Red Hat Advanced Partner.

About 850 people attended this amusing event! Interactions with the Red Hat Community were very interesting and relaxed. Is it because of the Open Source atmosphere? The organization, catering, and location were also great! Many thanks to the organizers!

Also a sincere thank you to all Swiss Red Hat and Tech Data contacts at the event for welcoming and assisting Swiss Partners during the 3 days. Everything went extremely professionally thanks to Leonard Bodmer (Country Manager Red Hat Switzerland), Richard Zobrist (Head of Partner & Alliances Red Hat), Maja Zurovec (Account Manager Partner & Alliances Red Hat), Sandra Maria Sigrist (Tech Data), and Daria Stempkowski (Tech Data). Many thanks to all of you, also for the warm and relaxed evening at Villa Richter at the foot of Prague’s castle!

We are Stronger Together!

All about automation, integration, hybrid cloud, and multi-cloud

With this 3-day Partner Conference, Red Hat proposed a broad agenda of Breakouts, Exams, a Hackathon, Keynotes, Labs, and an Open Innovation Lab. I mostly attended sessions where Red Hat partners and customers had the opportunity to give feedback on their experience with Red Hat products. Some of the sessions and keynotes were remarkable.

Red Hat Middleware Roadmap

The “Red Hat Middleware Roadmap” sessions (Part 1 & 2) with Rich Sharpels were a good opportunity to learn more about productivity (automation, integration, runtimes), reliability (security, reduced complexity), and flexibility (containers for scaling, cloud, hybrid-cloud, multi-cloud) with OpenShift. With these 2 presentations you also got informed on the iPaaS, which is a new private integration Platform-as-a-Service offering to provide cloud-based services for application integration and messaging. The goal here is to strengthen collaboration within the business teams (DevOps) thanks to Managed Integration + OpenShift Dedicated. Rich Sharpels summarized the benefits of the iPaaS with: “cloud services and packages where customers don’t have to administrate anything!”

Ansible Partner Enablement Offerings

Günter Herold from Red Hat and Daniel Knözinger from Open Networks Austria held the session “Ansible Partner Enablement Offerings”. This session focused on the advantages of automating tasks with Ansible to reduce mistakes, errors, and complexity, because “Ansible is the universal language for the whole IT team”. With Ansible, “start small and develop”.

Best Practices for Working with Red Hat Support

Those who wanted to get informed on “Best Practices for Working with Red Hat Support” attended the session with Caroline Baillargeon, Leona Meeks, and Peter Jakobs from Red Hat. This presentation gave the opportunity to learn about and discuss:

  • The Customer Portal which is said to be “full of information and best practices that could be useful before opening and escalating issues”. For example, should you search for information on Ansible, have a look at this page
  • The TSAnet Connect where answers for multi IT solutions are centralized
  • The Case Management Tool for sharing the open source spirit and to be part of the community (example)
  • Tips to work efficiently with the Red Hat support:
    1. “Make sure the customer is registered on the correct time zone”
    2. “To get 7×24 support, a Premium Support subscription is needed”
    3. “in case the answer on an issue is not appropriate, use the escalation button”
    4. “contact Red Hat to get access to the trainings that also the Red Hat engineers follow for technical problem solving”

While keeping the end customer’s satisfaction in mind, this session could probably be best summarized with “why not contributing and sharing knowledge within the Red Hat community ?”

Keynotes

Keynotes mainly concentrated on “marketing topics” that aim at boosting partner engagement, but they were still interesting, in particular:

  • “The Killer App in digital transformation is human connection” with Margaret Dawson on collaborating within the community
  • “Open Hybrid Cloud Ecosystems: Bold Goals for Tomorrow” with Lars Herrmann on innovations that have been considered impossible. In short, “if you think it is impossible, just do it”, so develop cloud, hybrid cloud and multi-cloud opportunities…

 

On Red Hat’s and Partner’s booths at the Congress Center

Besides sessions and presentations, lots of interesting technical booths at the Congress Center Prague promoted the work of the Red Hat engineers within the Open Source community. In particular, I spent some time with Romain Pelisse (Red Hat Berlin) and Robert Zahradnícek (Red Hat Brno), who explained to me how they work and what the trends are in their areas. Of course we spoke about automation and integration, and about findings that are developed within the open source community first, before being implemented in Red Hat solutions.

Last but not least, some Red Hat partners were present with a booth to promote their activities and products during the conference, among them Tech Data and Arrow ECS, which are well known at dbi services.

What to take from the Red Hat EMEA Conference?

On Red Hat and the Conference

At the end of the day, the keywords from the Red Hat EMEA Conference were probably not that far from the keywords you would get from other technology conferences. Concepts and products like “automation”, “integration”, “Ansible”, or “OpenShift” are means to get companies into the cloud. But why not? The trend towards the cloud is getting clearer and clearer as it now makes sense for lots of projects, at least for Disaster Recovery, Test and Development in the cloud.

Whether it is private cloud, hybrid cloud, or multi-cloud is not the topic at Red Hat: their solutions are agnostic. And Red Hat’s strategy is clearly based on a strong commitment to open source. It’s all about “products” (not projects), “collaboration”, “community”, and “customer success”.

On Open Source trends and strategy

Now you may ask why you should subscribe to Red Hat’s products and support. Sure, with Ansible and other Open Source products you can easily “start small and develop”, so the community version may fit. But what if you go into production? The more numerous and the bigger the projects become, the more you will need support, and subscribing will probably make sense.

Then don’t forget that Open Source is not free. Whether you go for community or enterprise Open Source makes no difference; in the end you will need to invest at least in time and knowledge. And, depending on the situation, you may subscribe to products and support. If you don’t know where to start, ask dbi services for Open Source expertise.

Looking forward to reading your comments.


SQL Server 2019: Java in SQL Server hard to believe, no?


We have been testing the next version of SQL Server, SQL Server 2019, for a few months already.
I already blogged about a previous version of SQL Server supporting R and Python.
With the new version, SQL Server 2019, Java will also be integrated.
The Java runtime used is Zulu Open JRE and it can be tested from CTP 3.2 of SQL Server 2019.

Step 1: The installation

Like a lot of people, I use the GUI to install SQL Server.
On the Feature Selection page, you can select Java in the Machine Learning and Language Extension.

Why not only Machine Learning like in the previous version? Because Java is not a Machine Learning language, so now you have this new category “Language Extension”.

After checking Java, you have a “Java Install Location”.
As written on the installation page, by default, it’s Zulu Open JRE 11.0.3 which is installed

but you can try to install another one…
I tried to install from a local directory, so I searched for the Zulu package on the web and downloaded it.

I copied the zip and unzipped it

I put the path into the “Java Install Location” field.

Now, just install it.

After the installation finished, I had a look at the ConfigurationFile.ini generated by the installation to see what the setting is to install it by script:

As you can see, for scripting the installation, we need to use the label SQL_INST_JAVA.
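For reference, an unattended installation based on that file could then be launched with something like the command below (the paths are hypothetical and the generated ConfigurationFile.ini may need a few adjustments, e.g. removing the UIMODE parameter when running in quiet mode):

C:\SQL2019_Setup> setup.exe /Q /IAcceptSQLServerLicenseTerms /ConfigurationFile="C:\Temp\ConfigurationFile.ini"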

Step 2: Configure Java

According to the documentation of Microsoft

  • Add the JRE_HOME variable: create an environment variable JRE_HOME with the Path where jvm.dll is located

  • Grant access to non-default JRE folder: Grant access to the SQLRUsergroup and SQL Service accounts with these commands:
    • icacls "<JRE folder path>" /grant "SQLRUsergroup":(OI)(CI)RX /T

      In my case, I used a named instance, so the user group needs to be adapted with the instance name
      icacls "C:\Temp\zulu11.33.15-ca-jdk11.0.4-win_x64\zulu11.33.15-ca-jdk11.0.4-win_x64" /grant "SQLRUsergroupSQL2019_CTP32":(OI)(CI)RX /T
    • icacls "<JRE folder path>" /grant "ALL APPLICATION PACKAGES":(OI)(CI)RX /T
      icacls "C:\Temp\zulu11.33.15-ca-jdk11.0.4-win_x64\zulu11.33.15-ca-jdk11.0.4-win_x64" /grant "ALL APPLICATION PACKAGES":(OI)(CI)RX /T

Step 3: Configure SQL Server Engine

First of all, we enable the usage of “external scripts” through the sp_configure

SELECT * FROM sys.configurations WHERE name= N'external scripts enabled'

If the result is 0, then enable it with:

EXEC sp_configure 'external scripts enabled', 1
RECONFIGURE WITH OVERRIDE

SELECT * FROM sys.configurations WHERE name= N'external scripts enabled'


After that, as you can see, the is_dynamic column is 1, so you don’t need to restart the SQL Server Engine to use it.

As you can see with SELECT * FROM sys.external_language_files, Java does not appear as an external language yet.
We need to register the external language in our Database.

To do this last step, use this command on a database:

CREATE EXTERNAL LANGUAGE Java FROM (CONTENT = N'<path to java-lang-extension.zip>', FILE_NAME = 'javaextension.dll');

But first, we need to find the package containing the javaextension.dll and copy it to another path.

After that, you can register it

At the end, verify again with SELECT * FROM sys.external_language_files

You need to have a new line with the id 65536 and the javaextension.dll

The associated code:

SELECT * FROM sys.external_language_files
USE Test
GO
CREATE EXTERNAL LANGUAGE Java FROM (CONTENT = N'C:\Temp\java-lang-extension.zip', FILE_NAME = 'javaextension.dll');
SELECT * FROM sys.external_language_files

Step 4: Test a java code

I created a little package called test.jar with the class Hello in the package sth printing “Hello World”

But this package will not work… 🙁
To execute it under SQL Server, we need to add the mssql-java-lang-extension library to the Referenced Libraries in Eclipse, import com.microsoft.sqlserver.javalangextension.* and use the execute method.
One important point is also to create the default constructor, which sets:

  • executorExtensionVersion = SQLSERVER_JAVA_LANG_EXTENSION_V1;
  • executorInputDatasetClassName = PrimitiveDataset.class.getName();
  • executorOutputDatasetClassName = PrimitiveDataset.class.getName();


It’s just a little bit more complicated… 😕
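To give an idea, here is a minimal sketch of such a class. The base class name and the exact execute() signature are assumptions on my side (based on the CTP samples), so double-check them against the SDK you downloaded; the fields set in the constructor are the ones listed above:

package sth;

import java.util.LinkedHashMap;
import com.microsoft.sqlserver.javalangextension.AbstractSqlServerExtensionExecutor;
import com.microsoft.sqlserver.javalangextension.PrimitiveDataset;

public class Hello extends AbstractSqlServerExtensionExecutor {

    public Hello() {
        // Default constructor setting the values expected by the Java language extension
        executorExtensionVersion = SQLSERVER_JAVA_LANG_EXTENSION_V1;
        executorInputDatasetClassName = PrimitiveDataset.class.getName();
        executorOutputDatasetClassName = PrimitiveDataset.class.getName();
    }

    // Called by sp_execute_external_script with @script = N'sth.Hello'
    public PrimitiveDataset execute(PrimitiveDataset input, LinkedHashMap<String, Object> params) {
        System.out.println("Hello World");
        return null;  // no output dataset in this simple example
    }
}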
Export the class as Runnable JAR and go back to SQL Server

The next step is to register the javaSDK and my Runnable JAR test.jar with the command:

CREATE EXTERNAL LIBRARY javaSDK FROM (CONTENT = 'C:\Temp\mssql-java-lang-extension.jar') WITH (LANGUAGE = 'Java');
GO
CREATE EXTERNAL LIBRARY test FROM (CONTENT = 'C:\Temp\test.jar') WITH (LANGUAGE = 'Java');
GO

After the registration, you can query the DMV sys.external_libraries to see if the registration succeeded:

SELECT * FROM sys.external_libraries


Now the last step is to execute my library and see “Hello World”:

EXEC sp_execute_external_script  @language = N'Java' , @script = N'sth.Hello'
GO

Et voila, my first Java test with SQL Server. Easy, no? 😎


EDB EPAS 12 comes with interval partitioning


While community PostgreSQL 12 has been in beta for quite some time now (currently beta 3), it usually takes some time until EDB EPAS becomes available on top of the next major PostgreSQL release. Yesterday EDB finally released a beta; you can register for access and find the release notes on the EDB website. One of the new features is interval partitioning, which you might already know from Oracle. Basically, you do not need to create range partitions in advance: the system will create the partitions for you once you add data that does not fit into any of the current partitions. Let’s see how that works.

Without interval partitioning you would need to create a range partitioned table like this (note that this is Oracle syntax which is supported by EPAS but not by community PostgreSQL):

create table my_part_tab ( id int
                         , dummy text
                         , created date
                         )
                         partition by range (created)
                         ( partition my_part_tab_1 values less than (to_date('01.02.2019','DD.MM.YYYY'))
                         );

The issue with that is that, once you want to add data that does not fit into any of the current partitions, you will run into errors like this:

edb=# insert into my_part_tab (id,dummy,created) values (1,'aaa',to_date('05.01.2019','DD.MM.YYYY'));
INSERT 0 1
edb=# select * from my_part_tab;
 id | dummy |      created       
----+-------+--------------------
  1 | aaa   | 05-JAN-19 00:00:00
(1 row)

edb=# insert into my_part_tab (id,dummy,created) values (1,'aaa',to_date('05.02.2019','DD.MM.YYYY'));
psql: ERROR:  no partition of relation "my_part_tab" found for row
DETAIL:  Partition key of the failing row contains (created) = (05-FEB-19 00:00:00).
edb=# 

Only when you manually create the required partition will you be able to store the data (or it goes to a default partition, which comes with its own issues):

edb=# alter table my_part_tab add partition my_part_tab_2 values less than (to_date('01.03.2019','DD.MM.YYYY'));
ALTER TABLE
edb=# insert into my_part_tab (id,dummy,created) values (1,'aaa',to_date('05.02.2019','DD.MM.YYYY'));
INSERT 0 1
edb=# select * from my_part_tab;
 id | dummy |      created       
----+-------+--------------------
  1 | aaa   | 05-JAN-19 00:00:00
  1 | aaa   | 05-FEB-19 00:00:00
(2 rows)
edb=# \d+ my_part_tab
                                     Partitioned table "public.my_part_tab"
 Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Descripti
---------+-----------------------------+-----------+----------+---------+----------+--------------+----------
 id      | integer                     |           |          |         | plain    |              | 
 dummy   | text                        |           |          |         | extended |              | 
 created | timestamp without time zone |           |          |         | plain    |              | 
Partition key: RANGE (created) NULLS LAST
Partitions: my_part_tab_my_part_tab_1 FOR VALUES FROM (MINVALUE) TO ('01-FEB-19 00:00:00'),
            my_part_tab_my_part_tab_2 FOR VALUES FROM ('01-FEB-19 00:00:00') TO ('01-MAR-19 00:00:00')

Of course it is not a big deal to create the partitions for the next 20 years in advance but there is a more elegant way of doing this by using interval partitioning:

drop table my_part_tab;
create table my_part_tab ( id int
                         , dummy text
                         , created date
                         )
                         partition by range (created)
                         interval (numtoyminterval(1,'month'))
                         ( partition my_part_tab_1 values less than (to_date('01.02.2019','DD.MM.YYYY'))
                         );

With the table partitioned like that, new partitions will be created on the fly as required:

edb=# insert into my_part_tab (id,dummy,created) values (1,'aaa',to_date('05.01.2019','DD.MM.YYYY'));
INSERT 0 1
edb=# insert into my_part_tab (id,dummy,created) values (1,'aaa',to_date('05.02.2019','DD.MM.YYYY'));
INSERT 0 1
edb=# insert into my_part_tab (id,dummy,created) values (1,'aaa',to_date('05.03.2019','DD.MM.YYYY'));
INSERT 0 1
edb=# \d+ my_part_tab
                                     Partitioned table "public.my_part_tab"
 Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Descripti
---------+-----------------------------+-----------+----------+---------+----------+--------------+----------
 id      | integer                     |           |          |         | plain    |              | 
 dummy   | text                        |           |          |         | extended |              | 
 created | timestamp without time zone |           |          |         | plain    |              | 
Partition key: RANGE (created) INTERVAL ('1 mon'::interval)
Partitions: my_part_tab_my_part_tab_1 FOR VALUES FROM (MINVALUE) TO ('01-FEB-19 00:00:00'),
            my_part_tab_sys138880102 FOR VALUES FROM ('01-FEB-19 00:00:00') TO ('01-MAR-19 00:00:00'),
            my_part_tab_sys138880103 FOR VALUES FROM ('01-MAR-19 00:00:00') TO ('01-APR-19 00:00:00')

edb=# 

A nice addition which is not (yet) available in community PostgreSQL.
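Until then, if you want something similar in community PostgreSQL, you would typically pre-create the partitions yourself, for example with a small DO block over declarative partitioning (a quick sketch only; the table and partition names are examples and the syntax should be verified against your version):

create table my_part_tab ( id int
                         , dummy text
                         , created date
                         )
                         partition by range (created);

do $$
declare
  d date := date '2019-01-01';
begin
  -- pre-create 12 monthly partitions starting in January 2019
  for i in 1..12 loop
    execute format(
      'create table my_part_tab_%s partition of my_part_tab for values from (%L) to (%L)',
      to_char(d,'YYYYMM'), d, (d + interval '1 month')::date);
    d := (d + interval '1 month')::date;
  end loop;
end $$;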

The article EDB EPAS 12 comes with interval partitioning appeared first on Blog dbi services.

Patching or reimaging your ODA?

Introduction

One of the key features of the ODA (Oracle Database Appliance) is the ability to patch the entire stack every three months, the goal being to keep all the components up-to-date. Most customers won't patch that often, but one patch a year is quite a good average. But when the time for patching comes, so does the anxiety for the DBA. And it's totally justified.

Why ODA patching can eventually be a nightmare?

First of all, patching all the products is a complex operation. ODA is not a real appliance: it's classic hardware composed of parts from various vendors, with nearly standard software, including Linux, Grid Infrastructure, ASM, ACFS and the database engines. And all these products need to be patched together. If you were allowed to patch the ODA components separately, it could last quite a long time. Yes, Oracle provides a single patch for the ODA, but it's just a bundle of dozens of patches. It's easier to apply and the patches are certified together, but it's still a complex operation to bring all the modules to the target patch level. This is why you can encounter multiple problems: for example if you installed your own RPMs onto the system (unable to update the OS), if you lack some free space (unable to complete the patching), if your databases have a specific configuration, or possibly if you discover that there is a bug in the patch related to the version you come from and linked to your ODA model.

Also, some of the patches are not cumulative, meaning that you cannot directly upgrade to the latest version. You sometimes need to apply 4 or 5 patches to upgrade, making the patching even more uncertain.

Starting from these facts, you may think about reimaging, and you’re probably right.

What are the advantages and drawbacks of reimaging?

For sure, reimaging has a lot of advantages:

  • Guarantee of success (you start from scratch)
  • Cleaner ODA (no question about that)
  • Make sure you are able to do the reimage (in case of you really need it)
  • Make sure your documentation is good (and this is the only way to validate it!)
  • Avoid later problems if patching not worked correctly

These are the drawbacks:

  • Longer than a single patch successfully applied on the first try (who knows)
  • Need to erase everything and restart as if it were a new ODA
  • You need to know how your ODA was installed and configured (not so simple if someone did the job for you)
  • You probably need another ODA with Data Guard or DBVisit to limit the downtime

Can reimaging be quicker than patching?

Patching lasts about 3 hours if everything is OK. But that's only for one patch and only if everything is OK. From my patching experience, you should probably plan 1 day for the first ODA you will patch.

Reimaging also lasts about 3 hours (more or less depending on your ODA version). But that's only for reinstalling the software without any database. You will need to restore all your databases and redo everything you did at the first deployment: copy your scripts, set up your additional software, restore your crontabs and your specific configuration, put back the monitoring, and so on.

So, reimaging is probably longer, but you are quite sure to redeploy your ODA in a known amount of time. This is a strong argument: "It will take 8 hours" is always better than "it would take between 3 and 8 hours. Or maybe more. If I succeed".

How to proceed with patches?

If you need to patch regularly, try to apply the patch on an ODA you can live without. If something goes wrong, you can decide to reimage very quickly instead of opening an SR on MOS. Please don't get stuck because a patch is not applying correctly, it's a waste of time.

If you patch every year, consider redeploying instead of patching. It's probably more work but it will take the same amount of time, with a guarantee of success (you will love that point). Also, you will ensure that you are able to reimage completely. Reimaging is sometimes also needed if you move your ODA to another datacenter with a network change, so you may have to reimage for other reasons than patching anyway.

How to make sure that you are able to reimage?

This is the key: be able to reimage

Rules to follow:

  • restrict the access on your ODA to only people concerned about the appliance
  • document every change you make on the server, even a simple chmod
  • never use the GUI to deploy the appliance: deploy your ODA using odacli and save the deployment json file outside of the ODA
  • never use the GUI to create the databases: create the database with odacli and backup the used parameters in the documentation
  • use scripts to configure your databases (avoid one-shot changes)
  • install other products only if necessary: do you really need a backup tool on ODA? NFS backups are great and easy to configure without installing anything
  • install only RPMs manually from Oracle ISOs and only if needed
  • do everything from the command line and avoid using vi. Text editors prevent you from being able to repeat the exact same operation. For example, replace vi /etc/fstab by echo "srv-nfs:/orabackups /backup nfs rw,bg,hard,nolock,nointr" >> /etc/fstab
  • always consider your ODA as not so critical by keeping the possibility to restore your databases elsewhere (understand: on another ODA), or adopt Data Guard or DBVisit for all the databases that cannot afford to be down for hours (even development databases are production for developers!)
  • keep the install zipfiles corresponding to your version somewhere secured to avoid searching for them on MOS the day you need to reimage

Regarding the scripts, I always create a scripts folder in /home/oracle on the ODA, and each database has 3 dedicated scripts to speed up the database recreation if needed: create_SID.sh, configure_SID.sql and tbs_SID.sql. The shell script is for the odacli database creation, the first SQL script is for specific configuration (controlfile multiplexing for example, disabling the recycle bin or enabling the archive_lag_target, etc.) and the second SQL script is for the tablespace creation. The target is to be able to recreate the database even for a datapump-based restore. Make sure to backup these scripts somewhere else.
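As an illustration, a minimal configure_SID.sql could look like the sketch below (the values are examples only, adapt them to your standards; controlfile multiplexing would additionally require updating control_files and restarting the instance):

-- configure_SID.sql: specific settings applied right after the odacli database creation
-- example values only
alter system set archive_lag_target=1800 scope=both;
-- recyclebin is not dynamic, the change takes effect after the next restart
alter system set recyclebin=off scope=spfile;
exit;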

A few words about RPMs: for me the best way to install additional RPMs on an ODA is to download the Oracle Linux ISO corresponding to the version running on your ODA (the ISO you would use if you needed to deploy a normal server), mount the ISO on your ODA and pick up only the RPMs you need from it (you can also put these few RPMs in /home/oracle/RPMs).
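A possible way of doing it (the ISO name, mount point and package are examples only):

# mount the Oracle Linux ISO matching the OS version of your ODA
mount -o loop /tmp/OracleLinux-R7-U6-Server-x86_64-dvd.iso /mnt
# keep a copy of the few RPMs you need, it will help for a later reimage
mkdir -p /home/oracle/RPMs
cp /mnt/Packages/screen-*.rpm /home/oracle/RPMs/
# install them locally
yum localinstall /home/oracle/RPMs/screen-*.rpm
umount /mnt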

Conclusion

Reimaging should always be considered as an alternative to patching. Or even as the best way of patching. Companies that have already integrated this are happier with their ODAs, and they are getting the best out of these appliances.

The article Patching or reimaging your ODA? appeared first on Blog dbi services.


Upgrading from OpenLeap to SLES15

Sometimes business plans change and you may need to move your openSUSE Leap 15 server to the supported SUSE Linux Enterprise Server 15. The upgrade is getting really easy with version 15: it can be performed online, so your server does not need to be offline during the upgrade.

So let’s have a look on the upgrade.

First of all, you need a SUSE Subscription. We will help you with this. Just send us a message.
As soon as you have it, you can go on with the upgrade.

Let’s start with checking the actual version running on the server.

openleap:~ $ cat /etc/os-release
NAME="openSUSE Leap"
VERSION="15.0"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.0"
PRETTY_NAME="openSUSE Leap 15.0"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.0"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"

Now we can install SUSEConnect, so we can register the system in the next step.

openleap:~ $ zypper in SUSEConnect
Retrieving repository 'openSUSE-Leap-15.0-Update' metadata ..............................................................................................................................................................[done]
Building repository 'openSUSE-Leap-15.0-Update' cache ...................................................................................................................................................................[done]
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 3 NEW packages are going to be installed:
  SUSEConnect rollback-helper zypper-migration-plugin

3 new packages to install.
Overall download size: 138.9 KiB. Already cached: 0 B. After the operation, additional 213.9 KiB will be used.
Continue? [y/n/...? shows all options] (y): y
Retrieving package SUSEConnect-0.3.17-lp150.2.14.1.x86_64                                                                                                                                 (1/3), 100.9 KiB (176.3 KiB unpacked)
Retrieving: SUSEConnect-0.3.17-lp150.2.14.1.x86_64.rpm ..................................................................................................................................................................[done]
Retrieving package rollback-helper-1.0+git20181218.5394d6e-lp150.3.3.1.noarch                                                                                                             (2/3),  22.6 KiB ( 19.9 KiB unpacked)
Retrieving: rollback-helper-1.0+git20181218.5394d6e-lp150.3.3.1.noarch.rpm ..................................................................................................................................[done (7.9 KiB/s)]
Retrieving package zypper-migration-plugin-0.11.1520597355.bcf74ad-lp150.1.1.noarch                                                                                                       (3/3),  15.5 KiB ( 17.6 KiB unpacked)
Retrieving: zypper-migration-plugin-0.11.1520597355.bcf74ad-lp150.1.1.noarch.rpm ..............................................................................................................................[done (253 B/s)]
Checking for file conflicts: ............................................................................................................................................................................................[done]
(1/3) Installing: SUSEConnect-0.3.17-lp150.2.14.1.x86_64 ................................................................................................................................................................[done]
(2/3) Installing: rollback-helper-1.0+git20181218.5394d6e-lp150.3.3.1.noarch ............................................................................................................................................[done]
(3/3) Installing: zypper-migration-plugin-0.11.1520597355.bcf74ad-lp150.1.1.noarch ......................................................................................................................................[done]
openleap:~ # 

Register your system at the SUSE Customer Center, so you get full access to the repositories. This step is mandatory, otherwise it's not possible to upgrade.

openleap:~ $ SUSEConnect -r REGISTRATION_CODE -p SLES/15/x86_64
Registering system to SUSE Customer Center

Announcing system to https://scc.suse.com ...

Activating SLES 15 x86_64 ...
-> Adding service to system ...
-> Installing release package ...

Successfully registered system

Now check for the available extensions and the commands to activate them using SUSEConnect:

openleap:~ $ SUSEConnect --list-extensions
AVAILABLE EXTENSIONS AND MODULES

    Basesystem Module 15 x86_64
    Activate with: SUSEConnect -p sle-module-basesystem/15/x86_64

        Containers Module 15 x86_64
        Activate with: SUSEConnect -p sle-module-containers/15/x86_64

        Desktop Applications Module 15 x86_64
        Activate with: SUSEConnect -p sle-module-desktop-applications/15/x86_64

            Development Tools Module 15 x86_64
            Activate with: SUSEConnect -p sle-module-development-tools/15/x86_64

            SUSE Linux Enterprise Workstation Extension 15 x86_64
            Activate with: SUSEConnect -p sle-we/15/x86_64 -r ADDITIONAL REGCODE

        SUSE Cloud Application Platform Tools Module 15 x86_64
        Activate with: SUSEConnect -p sle-module-cap-tools/15/x86_64

        SUSE Linux Enterprise Live Patching 15 x86_64
        Activate with: SUSEConnect -p sle-module-live-patching/15/x86_64 -r ADDITIONAL REGCODE

        SUSE Package Hub 15 x86_64
        Activate with: SUSEConnect -p PackageHub/15/x86_64

        Server Applications Module 15 x86_64
        Activate with: SUSEConnect -p sle-module-server-applications/15/x86_64

            Legacy Module 15 x86_64
            Activate with: SUSEConnect -p sle-module-legacy/15/x86_64

            Public Cloud Module 15 x86_64
            Activate with: SUSEConnect -p sle-module-public-cloud/15/x86_64

            SUSE Linux Enterprise High Availability Extension 15 x86_64
            Activate with: SUSEConnect -p sle-ha/15/x86_64 -r ADDITIONAL REGCODE

            Web and Scripting Module 15 x86_64
            Activate with: SUSEConnect -p sle-module-web-scripting/15/x86_64


REMARKS

(Not available) The module/extension is not enabled on your RMT/SMT
(Activated)     The module/extension is activated on your system

MORE INFORMATION

You can find more information about available modules here:
https://www.suse.com/products/server/features/modules.html

In case you need more modules, you can now add any module you need. Please keep in mind that the extensions require a separate subscription. For my needs the base module is enough.

openleap:~ $ SUSEConnect -p sle-module-basesystem/15/x86_64
Registering system to SUSE Customer Center

Updating system details on https://scc.suse.com ...

Activating sle-module-basesystem 15 x86_64 ...
-> Adding service to system ...
-> Installing release package ...

Successfully registered system

The next step is to do the upgrade itself. As the output is quite huge, I put some [***] as placeholders.

openleap:~ $ zypper dup --force-resolution
Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
Refreshing service 'Basesystem_Module_15_x86_64'.
Refreshing service 'SUSE_Linux_Enterprise_Server_15_x86_64'.
Loading repository data...
Warning: Repository 'openSUSE-Leap-15.0-Update-Non-Oss' appears to be outdated. Consider using a different mirror or server.
Reading installed packages...
Computing distribution upgrade...

The following 11 NEW packages are going to be installed:
  dejavu-fonts glibc-locale-base google-opensans-fonts issue-generator kernel-default-4.12.14-lp150.12.67.1 man-pages man-pages-posix release-notes-sles rpcgen yast2-vm zypper-search-packages-plugin

The following 286 packages are going to be upgraded:
  NetworkManager NetworkManager-lang PackageKit PackageKit-backend-zypp PackageKit-gstreamer-plugin PackageKit-gtk3-module PackageKit-lang aaa_base aaa_base-extras apparmor-abstractions 
[***]
[***]
  yast2-storage-ng yast2-users

The following 288 packages have no support information from their vendor:
  NetworkManager NetworkManager-lang PackageKit PackageKit-backend-zypp PackageKit-gstreamer-plugin PackageKit-gtk3-module PackageKit-lang aaa_base aaa_base-extras apparmor-abstractions apparmor-docs apparmor-parser
[***]
[***]

The following package is not supported by its vendor:
  zypper-search-packages-plugin

286 packages to upgrade, 11 new.
Overall download size: 322.4 MiB. Already cached: 0 B. After the operation, additional 343.9 MiB will be used.
Continue? [y/n/...? shows all options] (y): y
Retrieving package issue-generator-1.6-1.1.noarch                                                                                                                                       (1/297),  28.0 KiB ( 25.6 KiB unpacked)
Retrieving: issue-generator-1.6-1.1.noarch.rpm ..........................................................................................................................................................................[done]
Retrieving package man-pages-4.16-3.3.1.noarch                                                                                                                                          (2/297),   
[***]
[***]

Executing %posttrans scripts ............................................................................................................................................................................................[done]
There are some running programs that might use files deleted by recent upgrade. You may wish to check and restart some of them. Run 'zypper ps -s' to list these programs.

Disable the openSUSE Leap repository to prevent warnings when using zypper (this is optional).

openleap:~ $ zypper lr -d
Repository priorities are without effect. All enabled repositories share the same priority.

#  | Alias                                                                       | Name                                      | Enabled | GPG Check | Refresh | Priority | Type   | URI                                                                                                                                                                                                                                                        | Service       
---+-----------------------------------------------------------------------------+-------------------------------------------+---------+-----------+---------+----------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------
 1 | Basesystem_Module_15_x86_64:SLE-Module-Basesystem15-Debuginfo-Pool          | SLE-Module-Basesystem15-Debuginfo-Pool    | No      | ----      | ----    |   99     | NONE   | https://updates.suse.com/SUSE/Products/SLE-Module-Basesystem/15/x86_64/product_debug?8YsR5pv4h6qQr15qW8KWqXRBK0MDd9EONPOcnYjrQyXxeU4PVhIX5FRdwf5ziU1Oa8rdtuE2W4NyVotHhKeQrdvQMM9OQ3sEllMJno1VxgQEPq-1QyaCv24cSZsg2H21-d3hQqkxXD3iUKRgNTqHGtkRHHCN71yMa28   | Basesystem_Module_15_x86_64
 2 | Basesystem_Module_15_x86_64:SLE-Module-Basesystem15-Debuginfo-Updates       | SLE-Module-Basesystem15-Debuginfo-Updates | No      | ----      | ----    |   99     | NONE   | https://updates.suse.com/SUSE/Updates/SLE-Module-Basesystem/15/x86_64/update_debug?jjKAgTm0AAAAAAAAq_jTGwRAkx4zc8EQV0ANMjmrFjIoJBofX8ETJPW9qS9ojjVsnoDNK1TRGjk5t31J0Y9Bv_KRzpdYdJVmoH_gO-WaIo-dsZHiDXUm9fjYvLJcjsm0TidUzPnNkAqDAQsPZGZUUCXrek3JjRZl        | Basesystem_Module_15_x86_64
 3 | Basesystem_Module_15_x86_64:SLE-Module-Basesystem15-Pool                    | SLE-Module-Basesystem15-Pool              | Yes     | (r ) Yes  | No      |   99     | rpm-md | https://updates.suse.com/SUSE/Products/SLE-Module-Basesystem/15/x86_64/product?MbepfbRQy5WToAHi4xjhC2KOqjwW00ax8Xj23W9iMukhhWz78BKVY5sSDHiT4nurfz1JyHJrqcqpZiJU-PdajPthp3lQx4hyu-5FzifML0ALTTvKY6XEYA7qlwbn0E6fmA_iSbMl2JOWvZDpeQUZtMlCjQI                 | Basesystem_Module_15_x86_64
 4 | Basesystem_Module_15_x86_64:SLE-Module-Basesystem15-Source-Pool             | SLE-Module-Basesystem15-Source-Pool       | No      | ----      | ----    |   99     | NONE   | https://updates.suse.com/SUSE/Products/SLE-Module-Basesystem/15/x86_64/product_source?86sSfrO8KT3dMsapcn4ihtYRbSwy2kunffEZ6oUiH-vBC-0IkEZQPniCPn63-DeOwlX9brw3vR-BqMNjC9KiOAq0JR0aHZUcyHP5sGhjitLFGTx9zUYo3F4u0KNC3rqIq2WGq-kZEhLm1s2U-vVJHpr6x5RWmMjuBDAe | Basesystem_Module_15_x86_64
 5 | Basesystem_Module_15_x86_64:SLE-Module-Basesystem15-Updates                 | SLE-Module-Basesystem15-Updates           | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | https://updates.suse.com/SUSE/Updates/SLE-Module-Basesystem/15/x86_64/update?WzCCey-NrSLfBHonPxWuaTXt1QuGMemPZsFEhtMfDC_jKtn5XUsqbdI8JZ9D6YNveeYrthpKY2uLTOIB_vtbMQsQUblAr2dU4D59yIBIjZv1l91CLeZD2z61oLPc7ad0UkZjl9R_e6bSNAGP8oz94Fp5                      | Basesystem_Module_15_x86_64
 6 | SUSE_Linux_Enterprise_Server_15_x86_64:SLE-Product-SLES15-Debuginfo-Pool    | SLE-Product-SLES15-Debuginfo-Pool         | No      | ----      | ----    |   99     | NONE   | https://updates.suse.com/SUSE/Products/SLE-Product-SLES/15/x86_64/product_debug?xtsT1GSwugZaHDGElBaTczgwJS79hgJDOy_tkzInodgbplBttQlatgP5rI0SnLQqLCw5WsfSqBIyN_tnMVZn4ZLJ3S3ENBDiZsYhg0vGZf7ILMix03bcXoHEKlzAYRntcEIx877RvS7DDHAAR4cj1V5gzcu6               | SUSE_Linux_Enterprise_Server_15_x86_64
 7 | SUSE_Linux_Enterprise_Server_15_x86_64:SLE-Product-SLES15-Debuginfo-Updates | SLE-Product-SLES15-Debuginfo-Updates      | No      | ----      | ----    |   99     | NONE   | https://updates.suse.com/SUSE/Updates/SLE-Product-SLES/15/x86_64/update_debug?tkJ9rVV33hinQtEBnPYH_5D8OCs1ZtB4WEQFAShIaq1yN6Lwg2-W2Zu2AFALp5Jk3Oh1g1XVBqEOSPnSgACvcCIWuXr_cRfirUHEwbNqIcaSwcjxGjJYdhsb97t01_X-LPT0FDiGGezP64HheC_CzdV6xA                   | SUSE_Linux_Enterprise_Server_15_x86_64
 8 | SUSE_Linux_Enterprise_Server_15_x86_64:SLE-Product-SLES15-Pool              | SLE-Product-SLES15-Pool                   | Yes     | (r ) Yes  | No      |   99     | rpm-md | https://updates.suse.com/SUSE/Products/SLE-Product-SLES/15/x86_64/product?887kGBgH3AfONFY1X3wVkuYn_5nm8sTKex06X1JSRI9gXQNqJioSBea5sAECwbVhqs510L3YRdVlVLgsavZ9D8PPplk8S_oEvhWEQdS-jfFH9dTKcukF09RkjliWQkcaNHkFzY4uQWbHzXJYekkn                             | SUSE_Linux_Enterprise_Server_15_x86_64
 9 | SUSE_Linux_Enterprise_Server_15_x86_64:SLE-Product-SLES15-Source-Pool       | SLE-Product-SLES15-Source-Pool            | No      | ----      | ----    |   99     | NONE   | https://updates.suse.com/SUSE/Products/SLE-Product-SLES/15/x86_64/product_source?XhlzrvfoPp1qTZqv1hErqkUwBGOoZMYY4RAS-c78IKoacswAmOXTemuxa8ZiAFfopgedlQfewbcC7_gxUERoKGdlcW7E4WaqpcuSDYh-xlJr2SG9-4OuxPDToPfZ1CgvDDZIAlqIyXDKGcwvl3EjALH9msDNHg            | SUSE_Linux_Enterprise_Server_15_x86_64
10 | SUSE_Linux_Enterprise_Server_15_x86_64:SLE-Product-SLES15-Updates           | SLE-Product-SLES15-Updates                | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | https://updates.suse.com/SUSE/Updates/SLE-Product-SLES/15/x86_64/update?j0Qh2SfH11scgFCBmZI3R9B4GMJWAh5l6C0P7_jtUle_3dAATzJ2wwwo3SR_dOpn4bBYL4wSkD9bMuCRJlzcmWSkeh1W06Rz8Jsq1KysLODXqUtsBgeE5Tju1Pf-XTpNJF1RQMRRRmb_Tj8RPA                                 | SUSE_Linux_Enterprise_Server_15_x86_64
11 | openSUSE-Leap-15.0-1                                                        | openSUSE-Leap-15.0-1                      | No      | ----      | ----    |   99     | rpm-md | cd:///?devices=/dev/disk/by-id/ata-VBOX_CD-ROM_VB0-01f003f6                                                                                                                                                                                                |               
12 | repo-debug                                                                  | openSUSE-Leap-15.0-Debug                  | No      | ----      | ----    |   99     | NONE   | http://download.opensuse.org/debug/distribution/leap/15.0/repo/oss/                                                                                                                                                                                        |               
13 | repo-debug-non-oss                                                          | openSUSE-Leap-15.0-Debug-Non-Oss          | No      | ----      | ----    |   99     | NONE   | http://download.opensuse.org/debug/distribution/leap/15.0/repo/non-oss/                                                                                                                                                                                    |               
14 | repo-debug-update                                                           | openSUSE-Leap-15.0-Update-Debug           | No      | ----      | ----    |   99     | NONE   | http://download.opensuse.org/debug/update/leap/15.0/oss/                                                                                                                                                                                                   |               
15 | repo-debug-update-non-oss                                                   | openSUSE-Leap-15.0-Update-Debug-Non-Oss   | No      | ----      | ----    |   99     | NONE   | http://download.opensuse.org/debug/update/leap/15.0/non-oss/                                                                                                                                                                                               |               
16 | repo-non-oss                                                                | openSUSE-Leap-15.0-Non-Oss                | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/distribution/leap/15.0/repo/non-oss/                                                                                                                                                                                          |               
17 | repo-oss                                                                    | openSUSE-Leap-15.0-Oss                    | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/distribution/leap/15.0/repo/oss/                                                                                                                                                                                              |               
18 | repo-source                                                                 | openSUSE-Leap-15.0-Source                 | No      | ----      | ----    |   99     | NONE   | http://download.opensuse.org/source/distribution/leap/15.0/repo/oss/                                                                                                                                                                                       |               
19 | repo-source-non-oss                                                         | openSUSE-Leap-15.0-Source-Non-Oss         | No      | ----      | ----    |   99     | NONE   | http://download.opensuse.org/source/distribution/leap/15.0/repo/non-oss/                                                                                                                                                                                   |               
20 | repo-update                                                                 | openSUSE-Leap-15.0-Update                 | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/update/leap/15.0/oss/                                                                                                                                                                                                         |               
21 | repo-update-non-oss                                                         | openSUSE-Leap-15.0-Update-Non-Oss         | Yes     | (r ) Yes  | Yes     |   99     | rpm-md | http://download.opensuse.org/update/leap/15.0/non-oss/                                                                                                                                                                                                     |               
openleap:~ # zypper mr -d 21
Repository 'repo-update-non-oss' has been successfully disabled.
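If you also want to disable the other openSUSE Leap repositories that are still enabled, a small loop over the aliases shown in the listing above does the job (the aliases are the ones from this particular system):

for repo in repo-non-oss repo-oss repo-update; do
  zypper mr -d ${repo}
done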

Check for orphaned packages.

openleap:~ $ zypper rm $(zypper --no-refresh packages --orphaned | gawk '{print $5}'  | tail -n +5 )
Too few arguments.
At least one package name is required.
remove (rm) [OPTIONS]  ...

Remove packages with specified capabilities. A capability is NAME[.ARCH][OP], where OP is
one of <, =, >.

  Command options:

-r, --repo     Load only the specified repository.
-t, --type            Type of package (package, patch, pattern, product).
-n, --name                  Select packages by plain name, not by capability.
-C, --capability            Select packages by capability.
-u, --clean-deps            Automatically remove unneeded dependencies.
-U, --no-clean-deps         No automatic removal of unneeded dependencies.
-D, --dry-run               Test the removal, do not actually remove.
    --details               Show the detailed installation summary.
-y, --no-confirm            Don't require user interaction. Alias for the --non-interactive global
                            option.

  Solver options:

    --debug-solver          Create a solver test case for debugging.
    --force-resolution      Force the solver to find a solution (even an aggressive one) rather than
                            asking.
    --no-force-resolution   Do not force the solver to find solution, let it ask.

openleap:~ $ zypper --no-refresh packages --orphaned
Loading repository data...
Reading installed packages...
No packages found.

My whole migration lasted about 30 minutes. But that's really a small server.

And in the end – you have to reboot, anyway.

openleap:~ $ systemctl reboot

Let’s check if we really run a SLES15 server now.

openleap:~ # cat /etc/os-release
NAME="SLES"
VERSION="15"
VERSION_ID="15"
PRETTY_NAME="SUSE Linux Enterprise Server 15"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15"

Looks good! The system is running SLES15 now and you can enjoy the full support and services of SUSE.

The article Upgrading from OpenLeap to SLES15 appeared first on Blog dbi services.

Useful Linux commands for an Oracle DBA

Introduction

Oracle & Linux is a great duet: very powerful, very scriptable. Here are several commands that make my life easier. These tools seem to be widespread on most Linux distributions.

watch with diff

It has been my favorite tool for a long time. watch can repeat a command indefinitely until you stop it with Ctrl+C. And it's even more useful with the --diff parameter: all the differences since the last run are highlighted. For example, if you want to monitor a running backup, try this:

watch -n 60 --diff 'sqlplus -s /nolog @check_backup; echo ; du -hs /backup'

The check_backup.sql being:


conn / as sysdba
set feedback off
set lines 150
set pages 100
col status for a30
alter session set NLS_DATE_FORMAT="DD/MM-HH24:MI:SS";
select start_time "Start", round (input_bytes/1024/1024,1) "Source MB", round(output_bytes/1024/1024,1) "Backup MB", input_type "Type", status "Status", round(elapsed_seconds/60,1) "Min", round(compression_ratio,1) "Ratio" from v$rman_backup_job_details where start_time >= SYSDATE-1 order by 1 desc;
exit;

Every minute (60 seconds), you will check, in the RMAN backup views, the amount of data already backed up, as well as the amount of data in your backup folder.

Very convenient to keep an eye on things without actually repeating the commands.

Truncate a logfile in one simple command

Oracle generates a lot of logfiles, and some of them can reach several GB and fill up your filesystem. How can you quickly empty a big logfile without removing it? Simply use the true command:

true > listener.log
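If you prefer a more explicit tool, truncate from GNU coreutils achieves the same result:

truncate -s 0 listener.log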

Run a SQL script on all the running databases

You need to check something on every database running on your system? Or perhaps make the same change to all these databases? A single line will do the job:

for a in `ps -ef | grep pmon | grep -v grep | awk '{print $8}' | cut -c 10- | sort`; do . oraenv <<< $a; sqlplus -s / as sysdba @my_script.sql >> output.log; done

Don't forget to put an exit at the end of your SQL script my_script.sql. Using this script through Ansible will even increase the scope and save hours of work.

Copy a folder to another server

scp is fine for copying a single file or multiple files inside a folder. But copying a folder recursively to a remote server with scp is more complicated: you would basically need a tar file for that purpose. A clever solution is to use tar without creating any archive on the source server, but with a pipe to the destination server. Very useful and efficient, with just one line:

tar cf - source_folder | ssh oracle@192.168.50.167 "cd destination_folder_for_source_folder; tar xf -"

For sure, you will need +rwx on destination_folder_for_source_folder for oracle user on 192.168.50.167.
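On slower links, it might be worth adding compression to the very same pipe (gzip through the z flag, same source and destination as above):

tar czf - source_folder | ssh oracle@192.168.50.167 "cd destination_folder_for_source_folder; tar xzf -"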

Check the network speed – because you need to check

As an Oracle DBA you probably have to deal with performance: not a problem, it's part of your job. But are you sure your database system is running at full network speed? You probably didn't check that, but a low network speed could be the root cause of some performance issues. This concerns copper-based networks.

Today's servers handle 10Gb/s Ethernet but can also work at 1Gb/s depending on the network behind them. You should be aware that you can still find 100Mb/s network speeds, for example if the network port of the switch attached to your server has been limited for some reason (it was needed for the server connected to this port before yours, for example). While 1Gb/s is probably enough for most databases, 100Mb/s is clearly inadequate, and most recent servers will not even handle 100Mb/s correctly. Your Oracle environment may work, but don't expect a high performance level as your databases will have to wait for the network to send packets. Don't forget that 1Gb/s gives you about 100-120MBytes/s in real conditions, and 100Mb/s only allows 10-12MBytes/s, the "Fast Ethernet" of the 90's…

Checking the network speed is easy, with ethtool.

[root@oda-x6-2 ~]# ethtool btbond1
Settings for btbond1:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Speed: 1000Mb/s <= Network speed is OK
Duplex: Full
Port: Other
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Link detected: yes

In case of a network bonding interface, please also check the real interfaces associated with the bonding; all the network interfaces belonging to the bonding need to have the same network speed:

[root@oda-x6-2 ~]# ethtool em1
Settings for em1:
Supported ports: [ TP ]
Supported link modes: 100baseT/Full <= This network interface is physically supporting 100Mb/s
1000baseT/Full <= also 1Gb/s
10000baseT/Full <= and 10Gb/s
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Advertised link modes: 100baseT/Full
1000baseT/Full
10000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Speed: 1000Mb/s <= Network speed is 1Gb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: external
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes <= This interface is connected to a switch

Conclusion

Hope this helps!

The article Useful Linux commands for an Oracle DBA appeared first on Blog dbi services.

Alfresco – Share Clustering fail with ‘Ignored XML validation warning’

In a recent project on Alfresco, I had to set up a Clustering environment. It all went smoothly but I did face one single issue with the setup of the Clustering on the Alfresco Share layer. That's something I had never faced before and you will understand why below.

Initially, to set up the Alfresco Share Clustering, I used the sample file packaged in the distribution zip (E.g.: alfresco-content-services-distribution-6.1.0.5.zip):

<?xml version='1.0' encoding='UTF-8'?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hz="http://www.hazelcast.com/schema/spring"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
                http://www.hazelcast.com/schema/spring
                https://hazelcast.com/schema/spring/hazelcast-spring-2.4.xsd">

   <!--
        Hazelcast distributed messaging configuration - Share web-tier cluster config
        - see http://www.hazelcast.com/docs.jsp
        - and specifically http://docs.hazelcast.org/docs/2.4/manual/html-single/#SpringIntegration
   -->
   <!-- Configure cluster to use either Multicast or direct TCP-IP messaging - multicast is default -->
   <!-- Optionally specify network interfaces - server machines likely to have more than one interface -->
   <!-- The messaging topic - the "name" is also used by the persister config below -->
   <!--
   <hz:topic id="topic" instance-ref="webframework.cluster.slingshot" name="slingshot-topic"/>
   <hz:hazelcast id="webframework.cluster.slingshot">
      <hz:config>
         <hz:group name="slingshot" password="alfresco"/>
         <hz:network port="5801" port-auto-increment="true">
            <hz:join>
               <hz:multicast enabled="true"
                     multicast-group="224.2.2.5"
                     multicast-port="54327"/>
               <hz:tcp-ip enabled="false">
                  <hz:members></hz:members>
               </hz:tcp-ip>
            </hz:join>
            <hz:interfaces enabled="false">
               <hz:interface>192.168.1.*</hz:interface>
            </hz:interfaces>
         </hz:network>
      </hz:config>
   </hz:hazelcast>

   <bean id="webframework.cluster.clusterservice" class="org.alfresco.web.site.ClusterTopicService" init-method="init">
      <property name="hazelcastInstance" ref="webframework.cluster.slingshot" />
      <property name="hazelcastTopicName"><value>slingshot-topic</value></property>
   </bean>
   -->

</beans>

 

I obviously uncommented the whole section and configured it properly for the Share Clustering. The above content is only the default/sample content, nothing more.

Once configured, I restarted Alfresco but it failed with the following messages:

24-Aug-2019 14:35:12.974 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
24-Aug-2019 14:35:12.974 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet Engine: Apache Tomcat/8.5.34
24-Aug-2019 14:35:12.988 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDescriptor Deploying configuration descriptor [/opt/tomcat/conf/Catalina/localhost/share.xml]
Aug 24, 2019 2:35:15 PM org.apache.jasper.servlet.TldScanner scanJars
INFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
Aug 24, 2019 2:35:15 PM org.apache.catalina.core.ApplicationContext log
INFO: No Spring WebApplicationInitializer types detected on classpath
Aug 24, 2019 2:35:15 PM org.apache.catalina.core.ApplicationContext log
INFO: Initializing Spring root WebApplicationContext
2019-08-23 14:35:16,052  WARN  [factory.xml.XmlBeanDefinitionReader] [localhost-startStop-1] Ignored XML validation warning
 org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 92; schema_reference.4: Failed to read schema document 'https://hazelcast.com/schema/spring/hazelcast-spring-2.4.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
	at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:204)
	at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.warning(ErrorHandlerWrapper.java:100)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:392)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:306)
	at java.xml/com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.reportSchemaErr(XSDHandler.java:4218)
  ... 69 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	... 89 more
...
2019-08-23 14:35:16,067  ERROR [web.context.ContextLoader] [localhost-startStop-1] Context initialization failed
 org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Failed to import bean definitions from relative location [surf-config.xml]
Offending resource: class path resource [web-application-config.xml]; nested exception is org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Failed to import bean definitions from URL location [classpath*:alfresco/web-extension/*-context.xml]
Offending resource: class path resource [surf-config.xml]; nested exception is org.springframework.beans.factory.xml.XmlBeanDefinitionStoreException: Line 18 in XML document from file [/opt/tomcat/shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml] is invalid; nested exception is org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 92; cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element 'hz:topic'.
	at org.springframework.beans.factory.parsing.FailFastProblemReporter.error(FailFastProblemReporter.java:68)
	at org.springframework.beans.factory.parsing.ReaderContext.error(ReaderContext.java:85)
	at org.springframework.beans.factory.parsing.ReaderContext.error(ReaderContext.java:76)
  ... 33 more
Caused by: org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Failed to import bean definitions from URL location [classpath*:alfresco/web-extension/*-context.xml]
Offending resource: class path resource [surf-config.xml]; nested exception is org.springframework.beans.factory.xml.XmlBeanDefinitionStoreException: Line 18 in XML document from file [/opt/tomcat/shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml] is invalid; nested exception is org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 92; cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element 'hz:topic'.
	at org.springframework.beans.factory.parsing.FailFastProblemReporter.error(FailFastProblemReporter.java:68)
	at org.springframework.beans.factory.parsing.ReaderContext.error(ReaderContext.java:85)
	at org.springframework.beans.factory.parsing.ReaderContext.error(ReaderContext.java:76)
	... 42 more
Caused by: org.springframework.beans.factory.xml.XmlBeanDefinitionStoreException: Line 18 in XML document from file [/opt/tomcat/shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml] is invalid; nested exception is org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 92; cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element 'hz:topic'.
	at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.doLoadBeanDefinitions(XmlBeanDefinitionReader.java:397)
	at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.loadBeanDefinitions(XmlBeanDefinitionReader.java:335)
	at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.loadBeanDefinitions(XmlBeanDefinitionReader.java:303)
	... 44 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 92; cvc-complex-type.2.4.c: The matching wildcard is strict, but no declaration can be found for element 'hz:topic'.
	at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:204)
	at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.error(ErrorHandlerWrapper.java:135)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:396)
	... 64 more
...
24-Aug-2019 14:35:16.196 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.startInternal One or more listeners failed to start. Full details will be found in the appropriate container log file
24-Aug-2019 14:35:16.198 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.startInternal Context [/share] startup failed due to previous errors
Aug 24, 2019 2:35:16 PM org.apache.catalina.core.ApplicationContext log
...

 

As you can see above, the message is pretty clear: there is a problem within the file “/opt/tomcat/shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml” which is causing Share to fail to start properly. The first warning message points you directly to the issue: “Failed to read schema document ‘https://hazelcast.com/schema/spring/hazelcast-spring-2.4.xsd’”.

After checking the content of the sample file and comparing it with a working one, I found out what was wrong. To solve this specific issue, you can simply replace “https://hazelcast.com/schema/spring/hazelcast-spring-2.4.xsd” with “http://www.hazelcast.com/schema/spring/hazelcast-spring-2.4.xsd“. Please note the two differences in the URL:

  • Switch from “https” to “http”
  • Switch from “hazelcast.com” to “www.hazelcast.com”

 

The issue was actually caused by the fact that this installation was completely offline, with no access to the internet. Because of that, Spring wasn't able to fetch the XSD file to validate the definitions in the context file. The solution is therefore to switch the URL to http with www.hazelcast.com so that the Spring internal resolution can map it to the local copy of the XSD and perform the validation without looking for it online.
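If you want to apply the fix directly on the deployed file, a simple sed should do it (the path is the one shown in the error messages above):

sed -i 's,https://hazelcast.com/schema/spring/hazelcast-spring-2.4.xsd,http://www.hazelcast.com/schema/spring/hazelcast-spring-2.4.xsd,' /opt/tomcat/shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml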

As mentioned previously, I never faced this issue before for two main reasons:

  • I usually don’t use the sample files provided by Alfresco, I always prefer to build my own
  • I mainly install Alfresco on servers which have internet access (outgoing communications allowed)

 

Once the URL is corrected, Alfresco Share is able to start and the Clustering is configured properly:

24-Aug-2019 14:37:22.558 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
24-Aug-2019 14:37:22.558 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet Engine: Apache Tomcat/8.5.34
24-Aug-2019 14:37:22.573 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDescriptor Deploying configuration descriptor [/opt/tomcat/conf/Catalina/localhost/share.xml]
Aug 24, 2019 2:37:24 PM org.apache.jasper.servlet.TldScanner scanJars
INFO: At least one JAR was scanned for TLDs yet contained no TLDs. Enable debug logging for this logger for a complete list of JARs that were scanned but no TLDs were found in them. Skipping unneeded JARs during scanning can improve startup time and JSP compilation time.
Aug 24, 2019 2:37:25 PM org.apache.catalina.core.ApplicationContext log
INFO: No Spring WebApplicationInitializer types detected on classpath
Aug 24, 2019 2:37:25 PM org.apache.catalina.core.ApplicationContext log
INFO: Initializing Spring root WebApplicationContext
Aug 24, 2019 2:37:28 PM com.hazelcast.impl.AddressPicker
INFO: Resolving domain name 'share_n1.domain' to address(es): [10.10.10.10]
Aug 24, 2019 2:37:28 PM com.hazelcast.impl.AddressPicker
INFO: Resolving domain name 'share_n2.domain' to address(es): [127.0.0.1, 10.10.10.11]
Aug 24, 2019 2:37:28 PM com.hazelcast.impl.AddressPicker
INFO: Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [share_n1.domain/10.10.10.10, share_n2.domain/10.10.10.11, share_n2.domain/127.0.0.1]
Aug 24, 2019 2:37:28 PM com.hazelcast.impl.AddressPicker
INFO: Prefer IPv4 stack is true.
Aug 24, 2019 2:37:28 PM com.hazelcast.impl.AddressPicker
INFO: Picked Address[share_n2.domain]:5801, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5801], bind any local is true
Aug 24, 2019 2:37:28 PM com.hazelcast.system
INFO: [share_n2.domain]:5801 [slingshot] Hazelcast Community Edition 2.4 (20121017) starting at Address[share_n2.domain]:5801
Aug 24, 2019 2:37:28 PM com.hazelcast.system
INFO: [share_n2.domain]:5801 [slingshot] Copyright (C) 2008-2012 Hazelcast.com
Aug 24, 2019 2:37:28 PM com.hazelcast.impl.LifecycleServiceImpl
INFO: [share_n2.domain]:5801 [slingshot] Address[share_n2.domain]:5801 is STARTING
Aug 24, 2019 2:37:28 PM com.hazelcast.impl.TcpIpJoiner
INFO: [share_n2.domain]:5801 [slingshot] Connecting to possible member: Address[share_n1.domain]:5801
Aug 24, 2019 2:37:28 PM com.hazelcast.nio.ConnectionManager
INFO: [share_n2.domain]:5801 [slingshot] 54991 accepted socket connection from share_n1.domain/10.10.10.10:5801
Aug 24, 2019 2:37:29 PM com.hazelcast.impl.Node
INFO: [share_n2.domain]:5801 [slingshot] ** setting master address to Address[share_n1.domain]:5801
Aug 24, 2019 2:37:35 PM com.hazelcast.cluster.ClusterManager
INFO: [share_n2.domain]:5801 [slingshot]

Members [2] {
	Member [share_n1.domain]:5801
	Member [share_n2.domain]:5801 this
}

Aug 24, 2019 2:37:37 PM com.hazelcast.impl.LifecycleServiceImpl
INFO: [share_n2.domain]:5801 [slingshot] Address[share_n2.domain]:5801 is STARTED
2019-08-23 14:37:37,664  INFO  [web.site.ClusterTopicService] [localhost-startStop-1] Init complete for Hazelcast cluster - listening on topic: share_hz_test
...

 

The article Alfresco – Share Clustering fail with ‘Ignored XML validation warning’ appeared first on Blog dbi services.

Documentum – Encryption/Decryption of WebTop 6.8 passwords ‘REJECTED’ with recent JDK

Recently, we had a project to modernize a pretty old Documentum installation a little bit. As part of this project, there was a refresh of the Application Server hosting a WebTop 6.8. In this blog, I will be talking about an issue that we faced with the encryption & decryption of passwords in the refreshed environment. This new environment was using WebLogic 12.1.3 with the latest PSU, in conjunction with the JDK 1.8u192. Since WebTop 6.8 P08, the JDK 1.8u111 is supported, so a newer version of the JDK8 should mostly work without much trouble.

To properly deploy a WebTop application, you will need to encrypt some passwords like the Preferences or Preset passwords. Doing so in the new environment unfortunately failed:

[weblogic@wls_01 ~]$ work_dir=/tmp/work
[weblogic@wls_01 ~]$ cd ${work_dir}/
[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ jar -xf webtop_6.8_P27.war WEB-INF/classes WEB-INF/lib
[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ kc="${work_dir}/WEB-INF/classes/com/documentum/web/formext/session/KeystoreCredentials.properties"
[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ sed -i "s,use_dfc_config_dir=[^$]*,use_dfc_config_dir=false," ${kc}
[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ sed -i "s,keystore.file.location=[^$]*,keystore.file.location=${work_dir}," ${kc}
[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ grep -E "^use_dfc_config_dir|^keystore.file.location" ${kc}
use_dfc_config_dir=false
keystore.file.location=/tmp/work
[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ enc_classpath="${work_dir}/WEB-INF/classes:${work_dir}/WEB-INF/lib/*"
[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ java -classpath "${enc_classpath}" com.documentum.web.formext.session.TrustedAuthenticatorTool "MyP4ssw0rd"
Aug 27, 2019 11:02:23 AM java.io.ObjectInputStream filterCheck
INFO: ObjectInputFilter REJECTED: class com.rsa.cryptoj.o.nc, array length: -1, nRefs: 1, depth: 1, bytes: 72, ex: n/a
java.security.UnrecoverableKeyException: Rejected by the jceks.key.serialFilter or jdk.serialFilter property
        at com.sun.crypto.provider.KeyProtector.unseal(KeyProtector.java:352)
        at com.sun.crypto.provider.JceKeyStore.engineGetKey(JceKeyStore.java:136)
        at java.security.KeyStoreSpi.engineGetEntry(KeyStoreSpi.java:473)
        at java.security.KeyStore.getEntry(KeyStore.java:1521)
        at com.documentum.web.formext.session.TrustedAuthenticatorUtils.getSecretKey(Unknown Source)
        at com.documentum.web.formext.session.TrustedAuthenticatorUtils.decryptByDES(Unknown Source)
        at com.documentum.web.formext.session.TrustedAuthenticatorTool.main(TrustedAuthenticatorTool.java:64)
[weblogic@wls_01 work]$

 

As you can see above, the encryption of the password fails with an error. The issue is that, starting with the JDK 1.8u171, Oracle introduced some new restrictions. From the Oracle release note (JDK-8189997):

New Features
security-libs/javax.crypto
Enhanced KeyStore Mechanisms
A new security property named jceks.key.serialFilter has been introduced. If this filter is configured, the JCEKS KeyStore uses it during the deserialization of the encrypted Key object stored inside a SecretKeyEntry. If it is not configured or if the filter result is UNDECIDED (for example, none of the patterns match), then the filter configured by jdk.serialFilter is consulted.

If the system property jceks.key.serialFilter is also supplied, it supersedes the security property value defined here.

The filter pattern uses the same format as jdk.serialFilter. The default pattern allows java.lang.Enum, java.security.KeyRep, java.security.KeyRep$Type, and javax.crypto.spec.SecretKeySpec but rejects all the others.

Customers storing a SecretKey that does not serialize to the above types must modify the filter to make the key extractable.

 

On recent versions of Documentum Administrator, for example, there is no issue because it complies with this filter, but WebTop 6.8 doesn't. Therefore, to be able to encrypt/decrypt the passwords, you will have to modify the filter. There are several solutions to this problem:

  • Downgrade the JDK: this isn't a good solution since it might introduce security vulnerabilities and it will also prevent you from upgrading it in the future so…
  • Extend the ‘jceks.key.serialFilter‘ definition inside the ‘$JAVA_HOME/jre/lib/security/java.security‘ file: that's a possibility but it means that any process using this Java installation will use the updated filter list. Whether or not that's fine is up to you
  • Override the ‘jceks.key.serialFilter‘ definition using a JVM startup parameter on a per-process basis: this gives better control over which processes are allowed to use the updated filter and which ones aren't

 

So the simplest, and most probably the best, way to solve this issue is to add a command line parameter specifying the additional classes you want to allow. By default, the ‘java.security‘ file provides a list of allowed classes and it ends with ‘!*‘, which means that everything else is rejected.

[weblogic@wls_01 work]$ grep -A2 "^jceks.key.serialFilter" $JAVA_HOME/jre/lib/security/java.security
jceks.key.serialFilter = java.lang.Enum;java.security.KeyRep;\
  java.security.KeyRep$Type;javax.crypto.spec.SecretKeySpec;!*

[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ grep "^security.provider" $JAVA_HOME/jre/lib/security/java.security
security.provider.1=com.rsa.jsafe.provider.JsafeJCE
security.provider.2=com.rsa.jsse.JsseProvider
security.provider.3=sun.security.provider.Sun
security.provider.4=sun.security.rsa.SunRsaSign
security.provider.5=sun.security.ec.SunEC
security.provider.6=com.sun.net.ssl.internal.ssl.Provider
security.provider.7=com.sun.crypto.provider.SunJCE
security.provider.8=sun.security.jgss.SunProvider
security.provider.9=com.sun.security.sasl.Provider
security.provider.10=org.jcp.xml.dsig.internal.dom.XMLDSigRI
security.provider.11=sun.security.smartcardio.SunPCSC
[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ # Using an empty parameter allows everything (not the best idea)
[weblogic@wls_01 work]$ java -Djceks.key.serialFilter='' -classpath "${enc_classpath}" com.documentum.web.formext.session.TrustedAuthenticatorTool "MyP4ssw0rd"
Encrypted: [4Fc6kvmUc9cCSQXUqGkp+A==], Decrypted: [MyP4ssw0rd]
[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ # Using the default value from java.security causes the issue
[weblogic@wls_01 work]$ java -Djceks.key.serialFilter='java.lang.Enum;java.security.KeyRep;java.security.KeyRep$Type;javax.crypto.spec.SecretKeySpec;!*' -classpath "${enc_classpath}" com.documentum.web.formext.session.TrustedAuthenticatorTool "MyP4ssw0rd"
Aug 27, 2019 12:05:08 PM java.io.ObjectInputStream filterCheck
INFO: ObjectInputFilter REJECTED: class com.rsa.cryptoj.o.nc, array length: -1, nRefs: 1, depth: 1, bytes: 72, ex: n/a
java.security.UnrecoverableKeyException: Rejected by the jceks.key.serialFilter or jdk.serialFilter property
        at com.sun.crypto.provider.KeyProtector.unseal(KeyProtector.java:352)
        at com.sun.crypto.provider.JceKeyStore.engineGetKey(JceKeyStore.java:136)
        at java.security.KeyStoreSpi.engineGetEntry(KeyStoreSpi.java:473)
        at java.security.KeyStore.getEntry(KeyStore.java:1521)
        at com.documentum.web.formext.session.TrustedAuthenticatorUtils.getSecretKey(Unknown Source)
        at com.documentum.web.formext.session.TrustedAuthenticatorUtils.encryptByDES(Unknown Source)
        at com.documentum.web.formext.session.TrustedAuthenticatorTool.main(TrustedAuthenticatorTool.java:63)
[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ # Adding com.rsa.cryptoj.o.nc to the allowed list
[weblogic@wls_01 work]$ java -Djceks.key.serialFilter='com.rsa.cryptoj.o.nc;java.lang.Enum;java.security.KeyRep;java.security.KeyRep$Type;javax.crypto.spec.SecretKeySpec;!*' -classpath "${enc_classpath}" com.documentum.web.formext.session.TrustedAuthenticatorTool "MyP4ssw0rd"
Aug 27, 2019 12:06:14 PM java.io.ObjectInputStream filterCheck
INFO: ObjectInputFilter REJECTED: class com.rsa.jcm.f.di, array length: -1, nRefs: 3, depth: 2, bytes: 141, ex: n/a
java.security.UnrecoverableKeyException: Rejected by the jceks.key.serialFilter or jdk.serialFilter property
        at com.sun.crypto.provider.KeyProtector.unseal(KeyProtector.java:352)
        at com.sun.crypto.provider.JceKeyStore.engineGetKey(JceKeyStore.java:136)
        at java.security.KeyStoreSpi.engineGetEntry(KeyStoreSpi.java:473)
        at java.security.KeyStore.getEntry(KeyStore.java:1521)
        at com.documentum.web.formext.session.TrustedAuthenticatorUtils.getSecretKey(Unknown Source)
        at com.documentum.web.formext.session.TrustedAuthenticatorUtils.encryptByDES(Unknown Source)
        at com.documentum.web.formext.session.TrustedAuthenticatorTool.main(TrustedAuthenticatorTool.java:63)
[weblogic@wls_01 work]$
[weblogic@wls_01 work]$ # Adding com.rsa.jcm.f.* + com.rsa.cryptoj.o.nc to the allowed list
[weblogic@wls_01 work]$ java -Djceks.key.serialFilter='com.rsa.jcm.f.*;com.rsa.cryptoj.o.nc;java.lang.Enum;java.security.KeyRep;java.security.KeyRep$Type;javax.crypto.spec.SecretKeySpec;!*' -classpath "${enc_classpath}" com.documentum.web.formext.session.TrustedAuthenticatorTool "MyP4ssw0rd"
Encrypted: [4Fc6kvmUc9cCSQXUqGkp+A==], Decrypted: [MyP4ssw0rd]
[weblogic@wls_01 work]$

 

So as you can see above, to encrypt passwords for WebTop 6.8 using a JDK 8u171+, you will need to add both ‘com.rsa.cryptoj.o.nc‘ and ‘com.rsa.jcm.f.*‘ to the allowed list. There is a wildcard for the JCM package because several classes from it are required.

The above covers the encryption of the passwords. That's fine but obviously, when you deploy WebTop, it will need to decrypt these passwords at some point… So you will also need to set the same JVM parameter for the process of your Application Server (for the Managed Server's process in WebLogic):

-Djceks.key.serialFilter='com.rsa.jcm.f.*;com.rsa.cryptoj.o.nc;java.lang.Enum;java.security.KeyRep;java.security.KeyRep$Type;javax.crypto.spec.SecretKeySpec;!*'
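
One possible way to pass this parameter to the Managed Server is a sketch like the following, assuming a standard WebLogic domain layout where setUserOverrides.sh is picked up by the start scripts (adapt the path and file to your own installation):

[weblogic@wls_01 ~]$ vi $DOMAIN_HOME/bin/setUserOverrides.sh
# content appended to setUserOverrides.sh:
JAVA_OPTIONS="${JAVA_OPTIONS} -Djceks.key.serialFilter=com.rsa.jcm.f.*;com.rsa.cryptoj.o.nc;java.lang.Enum;java.security.KeyRep;java.security.KeyRep\$Type;javax.crypto.spec.SecretKeySpec;!*"
export JAVA_OPTIONS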

 

You can change the order of the classes in the list; they just need to appear before the ‘!*‘ section because everything after that is ignored.
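
Alternatively, if you prefer the second option above (extending the security property globally), the java.security file would end up looking like the following. This is shown for illustration only; we kept the per-process override instead:

[weblogic@wls_01 work]$ grep -A2 "^jceks.key.serialFilter" $JAVA_HOME/jre/lib/security/java.security
jceks.key.serialFilter = com.rsa.jcm.f.*;com.rsa.cryptoj.o.nc;java.lang.Enum;\
  java.security.KeyRep;java.security.KeyRep$Type;javax.crypto.spec.SecretKeySpec;!*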

 

The post Documentum – Encryption/Decryption of WebTop 6.8 passwords ‘REJECTED’ with recent JDK appeared first on Blog dbi services.

Debugging SQL Server containers considerations


When it comes to troubleshooting processes or getting a deeper look at how SQL Server works internally, I have always used debugging tools such as windbg on Windows and, since SQL Server became available on Linux, I switched to the strace tool.

But let's now add containers to the game. I didn't want to touch any application base image, including the SQL Server one, just to add debugging tools. So, my concern was to figure out how to debug a running Docker container from a separate container. Creating a custom image containing only debugging tools such as strace is not a hard task and, after some quick Google searches, it is easy to find blogs that explain the process.

Here is my Docker file:

FROM alpine

ENV CAPTUREMODE "S"
RUN apk update && apk add strace
CMD ["sh", "-c", "
strace -t -f -p $(if [ $CAPTUREMODE == ‘Start’ ]; then echo 1; else pgrep -P 1; fi)”]

 

I picked the alpine base image for its low footprint, and my custom image is designed to trace any running container. Note that I'm using the -p parameter to attach to the target process and begin tracing. In the initial version of the Dockerfile, a PID environment variable was intended to target the process ID of the remote container process I want to look at (it has since been replaced by the CAPTUREMODE variable, see the update below). Obviously, this image may be improved by adding other parameters, to implement specific filter rules for example, but this is not the focus of this write-up.
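
As a quick illustration, here is how the image can be built and attached to a running SQL Server container (sql1, as used later in this post). The strace image tag and the CAPTUREMODE values are my own conventions from the Dockerfile above:

$ docker build -t strace .
$ # default mode: trace the child of PID 1 (the real sqlservr process)
$ docker run --rm -t --pid=container:sql1 --net=container:sql1 \
   --cap-add sys_admin --cap-add sys_ptrace strace | tee trace.txt
$ # 'Start' mode: trace PID 1 itself
$ docker run --rm -t --pid=container:sql1 --net=container:sql1 \
   --cap-add sys_admin --cap-add sys_ptrace -e CAPTUREMODE=Start strace | tee trace.txt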

As an aside, maybe you are wondering why I implemented such a variable, since we usually trace PID 1 within a container (the magic of the Docker entry point and Linux namespaces)?

(Update 06.09.2019: Anthony E. Nocentino brought to my attention a more elegant way to dynamically get the child processes of PID 1. In fact, it works well with a running container but not with a starting container, because pgrep would then return the child processes of the sleep command. I modified my Dockerfile above accordingly to provide a way to choose which mode you want to use.)

Well, referring to Bob Dorr's blog post and applying it to the container world, chances are good that PID 1 is not the SQL Server process itself but a lightweight monitor process for it (WATCHDOG).

If I try to trace the PID 1 of a running SQL Server container, I get the following result with nothing relevant here …

strace: Process 1 attached with 2 threads
[pid     1] 08:11:59 wait4(9,  <unfinished ...>
[pid     8] 08:11:59 ppoll([{fd=15, events=POLLIN}], 1, NULL, NULL, 8

 

… hence the idea of implementing a variable that provides the ability to pick the right process to trace (by default, PID 1). Here is an example of a Docker command to trace the sqlservr process inside my container:

$ docker ps | grep sql
e3aca69d5dff        mcr.microsoft.com/mssql/server:2019-latest   "/opt/mssql/bin/perm…"   About an hour ago   Up About an hour    0.0.0.0:1401->1433/tcp   sql1

$ docker exec -ti sql1 top
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                 
    9 root      20   0 12.337g 770160  58540 S   1.7 12.7   1:11.78 sqlservr
    1 root      20   0  148520  23036  13376 S   0.0  0.4   0:00.62 sqlservr
  325 root      20   0    1888    920    728 S   0.0  0.0   0:00.04 strace
  376 root      20   0   38728   3184   2748 R   0.0  0.1   0:00.04 top

$ docker run --name debug -t --pid=container:sql1 \
 --net=container:sql1 \
 --cap-add sys_admin \
 --cap-add sys_ptrace \
 strace | tee trace.txt

 

PID 9 corresponds to the SQL Server process (not the WATCHDOG). Then I started my debug container, which began to trace a bunch of activity. The last docker command uses special parameters, including --pid, --net and --cap-add, to run the debug container in the same PID and network namespaces. Pretty cool!

Here is a trimmed sample of the trace.txt file during the creation of the test database:

strace: Process 9 attached with 161 threads
[pid   373] 08:21:49 restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid   343] 08:21:49 futex(0x7f9835de2cc4, FUTEX_WAIT_PRIVATE, 5, NULL <unfinished ...>
…
[pid   162] 08:26:05 open("/var/opt/mssql/data/test.mdf", O_RDWR|O_CREAT|O_EXCL|O_DIRECT, 0660 <unfinished ...>
[pid    14] 08:26:05 clock_gettime(CLOCK_REALTIME, {tv_sec=1567671965, tv_nsec=625743500}) = 0
…
[pid   162] 08:26:05 stat("/var/opt/mssql/data/test.mdf", {st_mode=S_IFREG|0640, st_size=0, ...}) = 0
[pid    14] 08:26:05 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
…

 

So, for a running container everything seems to work like a charm, but what if I want to catch events for a starting container? I take this opportunity to thank Anthony E. Nocentino (@nocentino), who indirectly made me want to write this blog post 🙂 Anthony is a Microsoft Data Platform MVP who is doing incredible stuff around SQL Server containers with blogs and trainings; I would simply say: just follow him :). At first, I had the same idea as him, and he already explained very well the context of this tracing problem when SQL Server is starting and the way he found to fix it. Everything is in his blog post, so there is no need to duplicate his work. The interesting point here is that Anthony's final approach consists of creating a custom container with SQL Server and strace installed. That's exactly the opposite of what I want to achieve, but it seemed to be the best alternative so far. To be honest, it remains a good and simple approach in any case.

But I stayed motivated (thanks again Anthony for that!) and, to find another solution, I decided to continue exploring alternatives based on my initial requirement (a separate container), and I believe I found an interesting one. Just to refresh your mind, the problem here is to make sure we catch all events from the startup of the SQL Server process. However, simply running 2 containers in parallel – the SQL Server container and the debug container including the strace tool – is not a viable option, and this is exactly what Anthony explained in the first attempt of tests section of his blog.

If you start the debug container too early, before the sql1 container has even been created …

docker run --rm --name debug -t --pid=container:sql1 \
 --net=container:sql1 \
 --cap-add sys_admin \
 --cap-add sys_ptrace \
 strace > trace.txt & \
 docker run --name sql1 \
 --env 'ACCEPT_EULA=Y' \
 --env 'MSSQL_SA_PASSWORD=Password1' \
 --volume /t/Docker/DMK/BACKUP/test1:/var/opt/mssql/ \
 --publish 1401:1433 \
 --detach \
 mcr.microsoft.com/mssql/server:2019-latest

 

… chances are you will face an error as shown below:

docker: Error response from daemon: No such container: sql1.

 

On the opposite side, if you start the debug container too late (note the introduction of the sleep command) …

$ docker run --name debug -t --pid=container:sql1 \
 --net=container:sql1 \
 --cap-add sys_admin \
 --cap-add sys_ptrace \
 strace > trace.txt & \
 sleep .001 && docker restart sql1

[1] 1171
sql1

[1]  + 1171 exit 137   docker run --name debug -t --pid=container:sql1 --net=container:sql1 --cap-ad

 

… chances are you will miss some events. Here is the beginning of the trace output:

strace: Process 1 attached
13:52:53 pread64(13, "\33\n\6-\16\2rx\6\2p\3(\237\24\0\6\24*~L\16\0\n\7\6oQ\16\0\n\6"..., 131072, 31195148) = 131072
13:52:53 pread64(13, "\1\0\245\20\0\0\0\0\0\0\306\r(i\2\0\340\1\0\0\246\20\0\0\0\0\0\0\306\rhi"..., 131072, 31326220) = 131072
13:52:53 pread64(13, "\301\35\2\0b.\0\0\344\10\362\31\2\0i.\0\0\364\t\260\225\1\0\205\4\0\0l\nIQ"..., 131072, 31457292) = 131072

 

Referring to the above strace output, it is clear we didn't catch all the events from the startup of the sqlservr process; the following entry, for example, is missing:

execve("/opt/mssql/bin/sqlservr", ["/opt/mssql/bin/sqlservr"]

 

In a nutshell, this is not a viable way of achieving what we want. My approach, based on keeping separate containers, consists of overriding the default entry point in order to delay the call of the sqlservr binary at container runtime.

We may easily identify the Docker entry point by running the docker history command against the concerned SQL Server docker image as follows:

$ docker history c5a295efea97 --no-trunc | grep ENTRYPOINT
<missing>                                                                 2 weeks ago         /bin/sh -c #(nop)  ENTRYPOINT ["/opt/mssql/bin/permissions_check.sh"]

 

When the SQL Server container spins up, /opt/mssql/bin/permissions_check.sh is called. Without going into details here, this bash script executes, at its final stage, the CMD from the Dockerfile, which is the following:

$ docker history c5a295efea97 --no-trunc | grep CMD
sha256:c5a295efea970743a71a330a1d314458272811e3524e1fc1245e6d6fe57aab90   2 weeks ago         /bin/sh -c #(nop)  CMD ["/opt/mssql/bin/sqlservr"]

 

It only remains to override the entry point to make a custom call of the sqlservr binary at container runtime, with a delay of 4s for example. Here is the command I used to start both the SQL Server container and the debug container in parallel, and how I managed to trace all the desired events of the sqlservr process:

docker run --name sql1 \
 --env 'ACCEPT_EULA=Y' \
 --env 'MSSQL_SA_PASSWORD=Password1' \
 --volume /t/Docker/DMK/BACKUP/test1:/var/opt/mssql/ \
 --publish 1401:1433 \
 --detach \
 --entrypoint "/bin/sh" \
 mcr.microsoft.com/mssql/server:2019-latest -c 'sleep 4 && /opt/mssql/bin/sqlservr' && \
docker run --rm --name debug -t --pid=container:sql1 \
 --net=container:sql1 \
 --cap-add sys_admin \
 --cap-add sys_ptrace \
 strace > trace.txt

 

I got the following result (pretty similar to Anthony's result in his blog post):

strace: Process 1 attached
09:12:12 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6
09:12:15 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=6, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
09:12:15 rt_sigreturn({mask=[]})        = 6
09:12:15 clone(strace: Process 13 attached
 <unfinished ...>
[pid    13] 09:12:15 execve("/opt/mssql/bin/sqlservr", ["/opt/mssql/bin/sqlservr"], 0x5577ad6ccbc8 /* 6 vars */ <unfinished ...>
[pid     1] 09:12:15 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f941b2499d0) = 13
[pid     1] 09:12:15 wait4(-1,  <unfinished ...>
[pid    13] 09:12:15 <... execve resumed> ) = 0
[pid    13] 09:12:15 brk(NULL)          = 0x558af526b000
[pid    13] 09:12:15 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
[pid    13] 09:12:15 readlink("/proc/self/exe", "/opt/mssql/bin/sqlservr", 4096) = 23
[pid    13] 09:12:15 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
[pid    13] 09:12:15 open("/opt/mssql/bin/tls/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
…

 

Here we go ! Feel free to comment!

Happy container debugging!

 

The post Debugging SQL Server containers considerations appeared first on Blog dbi services.

Scripting is not Just a Series of Commands


I have seen, several times now, scripts that are just a series of commands, sometimes simply created with copy/paste and a few changes.
This is far from best practice in terms of readability, maintainability and debug(ability 🙂).

My Best Practices

Use functions (with types)

Besides the most obvious benefit, re-usability, using functions also helps with code maintainability. For instance, if a bug is found and the fix is in a function used 10 times, you will have to fix it once and not 10 times. It also improves code readability.

Manage errors

If the scripting language you are using supports exceptions, catch them and use them. It is important, for example when calling a WebService, to catch both known and unexpected exceptions. For instance, if an error code 500 (Internal Server Error) is raised while querying a WebService, the script will not be able to proceed any further, so the error code and message can be logged and the script stopped.
Exceptions can also make the code easier to maintain. Consider this piece of code:

    Try {
        If ( $input -like 'ADM_*' ) {
            # call function with parameters
            FUNCTION_A "ADM"
        } ElseIf ( $input -like 'SYS_*' ) {
            FUNCTION_A "SYS"
        } Else {
            return $false
        } # End If ADM user
    } Catch {
        # manage the exception as it failed
    }

Instead of managing errors in each If and ElseIf branch, multiplying the effort, it is easier to throw an exception in FUNCTION_A that will be caught after the If block. With this, you can quickly unwind the execution stack up to the first Catch encountered.

Debugging

For example, in my PowerShell scripts, I usually add this function:

Function Write-Log ([string] $log) {
    $date = Get-Date -Format "dd-MMM-yyyy HH:mm:ss.fff"
    If ($Global:debug) {
        $message = '{0} ({1} ms) - {2}' -f $date, ([math]::floor($measureTime.Elapsed.TotalMilliseconds)), $log
        $measureTime.Restart()
    } Else {
        $message = '{0} - {1}' -f $date, $log
    }
    Write-Host($message)
    $Global:StreamWriter.WriteLine($message)
} # End Write-Log

It nicely formats log messages with date and time and, if running in debug mode, also includes the time difference between log messages. The message is displayed in the console and stored in the script log file. The StreamWriter global variable is set at script initialization; this is the most efficient method I found to write into files.
Visual Studio Code is also not to be neglected. It is developed by Microsoft, customizable via plugins and includes an excellent integrated debugger.

Use classes

In some cases, it can be interesting to use classes to hide the complexity of certain objects.
For example, I needed to create an Excel report based on a WebService result. Instead of managing the complexity of Excel in the main function, I created:

  • A class with a constructor that start Excel and initiate the new sheet
  • A method in this class to set the columns names and width
  • A method to append a line where current line number is maintained in the class

Comment your code

I am sometimes in a hurry and do not comment the code properly. This will probably lead to a waste of time if code changes have to be made several weeks or months later. If you don't do it for others, do it for yourself.

Algorithm

The algorithm is an important part of coding, leading to efficient (or inefficient) scripts. Each case is different and the fastest algorithm is not necessarily the first one you will think of. Just look at sorting algorithms: quicksort is usually among the fastest in practice.

And you?

These are a few rules and tips that I try to follow.
Do you have others to share?

The post Scripting is not Just a Series of Commands appeared first on Blog dbi services.

POUG Conference 2019


POUG (Pint with Oracle Users Group) organized its annual conference on 6-7th September in Wroclaw, in the New Horizons Cinema.

My abstract about “MySQL 8.0 Community: Ready for GDPR?” was accepted, so I had the opportunity to be there.

Sessions

My talk was planned for the first day. The new MySQL 8.0 version introduces several security improvements, and these are the main points I discussed:
– Encryption of Redo/Undo and Binary/Relay log files, which enriches the existing datafile encryption
– Some password features such as:
* Password Reuse Policy, to prevent a user from always reusing the same passwords
* Password Verification Policy, to require the current password before changing it
* the validate_password Component (which replaces the old validate_password Plugin), to define a secure password policy through some system variables and 3 different levels
– The new caching_sha2_password plugin, which lets you manage authentication in a faster and more secure way
– SQL Roles, to simplify user access rights management

Here some interesting sessions that I attended.

Keep them out of the database!

How do we keep unwanted connections from accessing our database? Flora Barrièle and Martin Berger explained some possibilities.
The following methods have limitations:
– Filtering through a firewall, because we have to involve the network team
– Using a dedicated listener for each instance, because it's difficult to manage with a big number of databases and environments
To solve these issues we can use instead:
– Connection Manager (a sort of listener with in addition a set of rules to define the source, service, activity, destination)
– Access Control List (ACL, a new functionality of Oracle 12.2 which is used to protect PDBs and associated services)
– Logon triggers
– Audit and reports
In conclusion, different solutions exist. First of all we have to know our ecosystem and our environments before deciding to put something in place. Then we should make it as simple as possible, test and check what is the best for our specific situation.

The MacGyver approach

Lothar Flatz explained an approach to analyze what’s wrong with a query and how to fix it when we don’t have a lot of time.
The first step is to optimize, and for this point we have to know how the optimizer works. Then we can enforce new plans (inserting hints, changing statements text, …) and look for the outline.
Sometimes it’s not easy. Lothar’s session ended with this quote: “Performance optimization is not magic: it’s based on knowledge and facts”.

From transportable tablespaces to pluggable databases

Franck Pachot showed different ways to transport data in different Oracle versions:
– Simple logical move through export/import -> slow
– Logical move including direct-path with Data Pump export/import -> flexible, but slow
– Physical transport with RMAN duplicate -> fast, but not cross-versions
– Transportable Tablespaces which provides a mix between logical move (for metadata) and physical transport (for application/user data) -> fast and flexible (cross-versions)
– Physical transport through PDB clone -> fast, efficient, ideal in a multi-tenant environment
– Full Transportable Tablespaces to move user tablespaces and other objects such as roles, users, … -> flexible, ideal to export from 11R2 to 12c and then to non-CDB to multi-tenant, no need to run scripts on dictionary

Data Guard new features

The Oracle MAA (Maximum Availability Architectures) describes 4 HA reference architectures in order to align Oracle capabilities with customer Service Level requirements. Oracle Data Guard can match Silver, Gold and Platinum reference architectures.
Pieter Van Puymbroeck (Oracle Product Manager for Data Guard) talked about the following new 19c features:
– Flashback operations are propagated automatically to the standby (requirements: configure standby for flashback database and in MOUNT state first, set DB_FLASHBACK_RETENTION_TARGET)
– Restore points are automatically propagated from the primary to the standby
– On the Active Data Guard standby, the database buffer cache state is preserved during a role change
– Multi-Instance Redo Apply (parallel redo log apply in RAC environments)
– Observe-Only mode to test fast-start failover without having any impact on the production database
– New commands such as “show configuration lag;” to check all members, and to export/import the Broker configuration

Discussion Panel

In the form of a discussion animated by Kamil Stawiarski, and with funny but serious exchanges with the audience, some Oracle Product Managers and other Oracle specialists talked about one of the most topical subjects today: Cloud vs on-prem. Automation, Exadata Cloud at Customer, Oracle documentation and log files and much more…

Networking moments

Lots of networking moments during this conference: a game in the city center, a speakers dinner, lunch time at the conference, the party in the Grey Music Club.

As usual it was a real pleasure to share knowledge and meet old friends and new faces.
Thanks to Luiza, Kamil and the ORA-600 Database Whisperers for their warm welcome and for the perfect organization of the event.

A suggestion? Don’t miss it next year!

The post POUG Conference 2019 appeared first on Blog dbi services.


Introducing Accelerated Database Recovery with SQL Server 2019


SQL Server 2019 RC1 was released a few weeks ago and it is time to start blogging about my favorite core engine features that will ship with the next version of SQL Server. Things should not be completely different in the RTM, so let's introduce accelerated database recovery (aka ADR), which is mainly designed to solve an annoying issue that most SQL Server DBAs have probably already faced at least once: long running transactions that impact the overall recovery time. As a reminder, with current versions of SQL Server, database recovery time is tied to the largest transaction at the moment of the crash. This is even more true in highly critical environments, where it may have a huge impact on service or application availability, and ADR is another feature that may help for sure.

Image from Microsoft documentation

In order to allow a very fast rollback and recovery process, the SQL Server team completely redesigned the database engine recovery process, and the interesting point is that they introduced row-versioning to achieve it. Row-versioning, however, has existed since SQL Server 2005 through the RCSI and SI isolation levels, and in my opinion it is good news to (finally) extend such capabilities to address long recovery times.

Anyway, I performed some testing to get an idea of the benefit of ADR as well as its impact on the workload. First, I performed a recovery test without ADR: after initiating a long running transaction, I simply crashed my SQL Server instance. I used an AdventureWorks database with the dbo.bigTransactionHistory table, which is big enough (I think) to get a relevant result.

ADR is activated per database, meaning that row-versioning is also managed locally per database. This allows better workload isolation compared to using the global tempdb version store of previous SQL Server versions.

USE AdventureWorks_dbi;

ALTER DATABASE AdventureWorks_dbi SET
    ACCELERATED_DATABASE_RECOVERY = OFF; 

ALTER DATABASE AdventureWorks_dbi SET
	COMPATIBILITY_LEVEL = 150;
GO

 

The dbo.bigtransactionHistory table has only one clustered primary key …

EXEC sp_helpindex 'dbo.bigTransactionHistory';
GO

 

… with 158’272’243 rows and about 2GB of data


 

I simulated a long running transaction with the following update query that touches every row of the dbo.bigTransactionHistory table to get a relevant impact on the recovery process duration time.

BEGIN TRAN;

UPDATE dbo.bigTransactionHistory
SET Quantity = Quantity + 1;
GO

 

The related transaction wrote a lot of records into the transaction log, as shown below:

SELECT 
	DB_NAME(database_id) AS [db_name],
	total_log_size_in_bytes / 1024 / 1024 AS size_MB,
	used_log_space_in_percent AS [used_%]
FROM sys.dm_db_log_space_usage;
GO

 

The sys.dm_tran_* and sys.dm_exec_* DMVs may be helpful to dig into the transaction details, including the transaction start time and the log space used in the transaction log:

SELECT 
   GETDATE() AS [Current Time],
   [des].[login_name] AS [Login Name],
   DB_NAME ([dtdt].database_id) AS [Database Name],
   [dtdt].[database_transaction_begin_time] AS [Transaction Begin Time],
   [dtdt].[database_transaction_log_bytes_used] / 1024 / 1024 AS [Log Used MB],
   [dtdt].[database_transaction_log_bytes_reserved] / 1024 / 1024 AS [Log Reserved MB],
   SUBSTRING([dest].text, [der].statement_start_offset/2 + 1,(CASE WHEN [der].statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max),[dest].text)) * 2 ELSE [der].statement_end_offset END - [der].statement_start_offset)/2) as [Query Text]
FROM 
   sys.dm_tran_database_transactions [dtdt]
   INNER JOIN sys.dm_tran_session_transactions [dtst] ON  [dtst].[transaction_id] = [dtdt].[transaction_id]
   INNER JOIN sys.dm_exec_sessions [des] ON  [des].[session_id] = [dtst].[session_id]
   INNER JOIN sys.dm_exec_connections [dec] ON   [dec].[session_id] = [dtst].[session_id]
   LEFT OUTER JOIN sys.dm_exec_requests [der] ON [der].[session_id] = [dtst].[session_id]
   OUTER APPLY sys.dm_exec_sql_text ([der].[sql_handle]) AS [dest]
GO

 

The restart of my SQL Server instance kicked in the AdventureWorks_dbi database recovery process. It took about 6min in my case:

EXEC sp_readerrorlog 0, 1, N'AdventureWorks_dbi'

 

Digging further into the SQL Server error log, I noticed that phase 2 (redo) and phase 3 (undo) of the recovery process took most of the time (as expected).

However, if I performed the same test with ADR enabled for the AdventureWorks_dbi database …

USE AdventureWorks_dbi;

ALTER DATABASE AdventureWorks_dbi SET
    ACCELERATED_DATABASE_RECOVERY = ON;

 

… and I dig again into the SQL Server error log:

Well, the output above is pretty different but clear and irrevocable: there is a tremendous improvement of the recovery time here. The SQL Server error log indicates the redo phase took 0ms and the undo phase 119ms. I also tested different variations in terms of long transactions and log generated in the transaction log (4.5GB, 9.1GB and 21GB), without and with ADR. With the latter, database recovery remained fast irrespective of the transaction log size, as shown below:

But there is no free lunch when enabling ADR, because it is a row-versioning based process which may have an impact on the workload. I was curious to compare the performance of my update queries between scenarios including no row-versioning (default), row-versioning with RCSI only, ADR only and finally both RCSI and ADR enabled. I performed all my tests on a quad-core virtual machine (Intel® Core™ i7-6600U CPU @ 2.6GHz) with 8GB of RAM. SQL Server memory is capped to 6GB. The underlying storage for the SQL Server data files is an SSD (Samsung 850 EVO 1TB).

Here is the first test I performed. This is the same update as previously, which touches every row of the dbo.bigTransactionHistory table:

BEGIN TRAN;

UPDATE dbo.bigTransactionHistory
SET Quantity = Quantity + 1;
GO

 

And here is the result for the different scenarios:

Please don't focus too much on the values here, because they will depend on your context, but the result answers the following questions: does the activation of ADR have an impact on the workload and, if yes, is it in the same order of magnitude as RCSI / SI? The results are self-explanatory.

Then I decided to continue my tests by increasing the impact of the long running transaction with additional updates on the same data, in order to stress the version store a little bit.

BEGIN TRAN;

UPDATE dbo.bigTransactionHistory
SET Quantity = Quantity + 1;
GO
UPDATE dbo.bigTransactionHistory
SET Quantity = Quantity + 1;
GO
UPDATE dbo.bigTransactionHistory
SET Quantity = Quantity + 1;
GO
UPDATE dbo.bigTransactionHistory
SET Quantity = Quantity + 1;
GO

 

Here are the new results:

This time ADR seems to have a bigger impact than RCSI in my case. Regardless of the exact values of this test, the key point is that we have to be aware that enabling ADR will have an impact on the workload.

After performing this bunch of tests, it's time to get the big picture of the ADR design, with several components per database including a persisted version store (PVS), a Logical Revert, a sLog and a cleaner process. In this blog post I would like to focus on the PVS component, which acts as a persistent version store for the concerned database. In other words, with ADR, tempdb is not used to store row versions anymore. The interesting point is that, according to my tests, RCSI / SI row-versioning continues to be handled through the PVS if ADR is enabled.

A new column named is_accelerated_database_recovery_on has been added to the sys.databases system view. In my case, both RCSI and ADR are enabled in the AdventureWorks_dbi database.

SELECT 
	name AS [database_name],
	is_read_committed_snapshot_on,
	is_accelerated_database_recovery_on
FROM sys.databases
WHERE database_id = DB_ID()

 

The sys.dm_tran_version_store_space_usage DMV displays the total space in tempdb used by the version store for each database whereas the new sys.dm_tran_persistent_version_store_stats DMV provides information related to the new PVS created with the ADR activation.

BEGIN TRAN;

UPDATE dbo.bigTransactionHistory
SET Quantity = Quantity + 1;
GO

SELECT 
	DB_NAME(database_id) AS [db_name],
	oldest_active_transaction_id,
	persistent_version_store_size_kb / 1024 AS pvs_MB
FROM sys.dm_tran_persistent_version_store_stats;
GO

SELECT 
	database_id,
	reserved_page_count / 128 AS reserved_MB
FROM sys.dm_tran_version_store_space_usage;
GO

 

After running my update query, I noticed the PVS in the AdventureWorks_dbi database was used rather than the version store in tempdb.

So, getting rid of the version store in tempdb seems to be a good idea, and probably more scalable per database, but according to my tests, and without drawing any conclusions yet, it may lead to performance considerations… let's see what happens in the future…

In addition, from a storage perspective, because SQL Server doesn't use tempdb as the version store anymore, my curiosity led me to see what happens behind the scenes and how the PVS interacts with the data pages where row-versioning comes into play. Let's do some experiments:

Let's create the dbo.bigTransactionHistory_row_version table from the dbo.bigTransactionHistory table, with less data:

USE AdventureWorks_dbi;
GO

DROP TABLE IF EXISTS [dbo].[bigTransactionHistory_row_version];

SELECT TOP 1 *
INTO [dbo].[bigTransactionHistory_row_version]
FROM [dbo].[bigTransactionHistory]

 

Now, let's have a look at the data page that belongs to my dbo.bigTransactionHistory_row_version table, with the page ID 499960 in my case:

DBCC TRACEON (3604, -1);
DBCC PAGE (AdventureWorks_dbi, 1, 499960, 3);

 

Versioning info exists in the header, but obviously the version pointer is set to Null because there is no additional row version to maintain in this case: I just inserted one row.

Let’s update the only row that exists in the table as follows:

BEGIN TRAN;
UPDATE [dbo].[bigTransactionHistory_row_version]
SET Quantity = Quantity + 1

 

The version pointer has been updated (but I'm not sure the information is consistent here, or at least the values displayed are weird). Another interesting point is that there is more information than the initial 14 bytes we may expect to keep track of the pointers: there are also 21 extra bytes at the end of the row, as shown above. On the other hand, the sys.dm_db_index_physical_stats() DMF has been updated to reflect the PVS information with new inrow_*, total_inrow_* and offrow_* columns, and may help to understand some of the PVS internals.

SELECT 
	index_id,
	index_level,
	page_count,
	record_count,
	version_record_count,
	inrow_version_record_count,
	inrow_diff_version_record_count,
	total_inrow_version_payload_size_in_bytes,
	offrow_regular_version_record_count,
	offrow_long_term_version_record_count
FROM sys.dm_db_index_physical_stats(
	DB_ID(), OBJECT_ID('dbo.bigTransactionHistory_row_version'), 
	NULL, 
	NULL, 
	'DETAILED'
)

 

Indeed, referring to the above output and correlating it to the results I found inside the data page, I would assume the 21 extra bytes stored in the row reflect a diff-style value of the previous row (something I still need to dig into; focus on the inrow_diff_version_record_count and total_inrow_version_payload_size_in_bytes columns).

Furthermore, if I perform the update operation on the same data again, the storage strategy seems to switch to an off-row mode, if I refer again to the sys.dm_db_index_physical_stats() DMF output:

Let’s go back to the DBCC PAGE output to confirm this assumption:

Indeed, the extra payload has disappeared, and only the 14-byte pointer remains, updated accordingly.

Finally, if I perform multiple updates of the same row, SQL Server should keep the off-row storage and should create inside it a chain of version pointers and their corresponding values.

BEGIN TRAN;

UPDATE [dbo].[bigTransactionHistory_row_version]
SET Quantity = Quantity + 1
GO 100000

 

My assumption is verified by taking a look at the previous DMVs. The persistent version store size has increased from ~16MB to ~32MB and we still have 1 version record in off-row mode meaning there is still one version pointer that references the off-row mode structure for my record.

Finally, let’s introduce the cleaner component. Like the tempdb version store, cleanup of old row versions is achieved by an asynchronous process that cleans page versions that are not needed. It wakes up periodically, but we can force it by executing the sp_persistent_version_cleanup stored procedure.

Referring to one of my first tests, the PVS size is about 8GB.

BEGIN TRAN;

UPDATE dbo.bigTransactionHistory
SET Quantity = Quantity + 1;
GO


UPDATE dbo.bigTransactionHistory
SET Quantity = Quantity + 1;
GO
UPDATE dbo.bigTransactionHistory
SET Quantity = Quantity + 1;
GO
UPDATE dbo.bigTransactionHistory
SET Quantity = Quantity + 1;
GO
SELECT 
	DB_NAME(database_id) AS [db_name],
	oldest_active_transaction_id,
	persistent_version_store_size_kb / 1024 AS pvs_MB
FROM sys.dm_tran_persistent_version_store_stats;
GO
-- Running PVS cleanup process
EXEC sp_persistent_version_cleanup

 

According to my tests, the cleanup task took around 6min for the entire PVS, but it was not a blocking process at all, as you may see below. As a final test, I executed in parallel an update query that touched every row of the same table, and it was not blocked by the cleaner, as shown below:

This is a process I need to investigate further. Other posts are coming as well, about the other ADR components.

See you!

The post Introducing Accelerated Database Recovery with SQL Server 2019 appeared first on Blog dbi services.

Red Hat Forum Zürich, 2019, some impressions


The Red Hat Forum 2019 in Zürich is currently ongoing, and people have just finished lunch before the more technical sessions start.

As expected, a lot revolves around OpenShift 4 and automation with Ansible. As dbi is a Red Hat advanced business partner, we took the opportunity to be present with a booth to get in touch with our existing customers and to meet new people:

All the partners got their logo on a huge wall at the entrance to the event:

As the event is getting more and more popular, Red Hat moved to a great and huge location, the Stage One in Zürich Oerlikon, so all of the 850 participants found their space.

There is even space for some fun stuff:

Important as well: the catering was excellent:

The merger with IBM was an important topic and Red Hat again stated several times: Red Hat will stay Red Hat. Let's see what happens here; not all people trust this statement. All in all, it is a great atmosphere here in Oerlikon: great people to discuss with, interesting topics, great organization and a lot of "hybrid cloud". Technology is moving fast and Red Hat is trying to stay at the front. From a partner perspective, the Forum is a great chance to meet the right people within Red Hat, no matter what topic you want to discuss: technology, marketing, training, whatever. I am pretty sure we will attend the next forum as well.

The post Red Hat Forum Zürich, 2019, some impressions appeared first on Blog dbi services.

Moving oracle database to new home on ODA


Moving a database to a new ORACLE_HOME is a common DBA task. Performing this task on an ODA needs an additional step, since ODA Lite uses an internal Derby database for the metadata. ODA HA is not a problem here, as there is no Derby database. Through this blog I would like to give you some guidance and a workaround to move a database to a new home (of the same major release). In this example we will move a database named mydb, with db_unique_name set to mydb_site1, from OraDB11204_home1 to OraDB11204_home2.

I would like to highlight that this blog shows the procedure to move a database between ORACLE_HOMEs of the same major release; the new ORACLE_HOME would, for example, run additional patches. An upgrade between Oracle major releases is not possible with this procedure, and you would need to use the appropriate odacli command (odacli upgrade-database) in that case.
Last but not least, I would also like to strongly advise that manually updating the ODA repository should only be performed after getting Oracle Support guidance and agreement to do so. Neither the author (that's me 🙂 ) nor dbi services 😉 would be responsible for any issue or consequence of the commands described in this blog. This would be your own responsibility. 😉

I'm running ODA release 12.2.1.3. The database used in this example is an Oracle 11g one, but this would work exactly the same for any other version, like Oracle 12c databases.

Current database information

Let's first find out which dbhome our mydb database is running on.

List dbhomes :
[root@oda tmp]# odacli list-dbhomes
ID Name DB Version Home Location Status
---------------------------------------- -------------------- ---------------------------------------- --------------------------------------------- ----------
ed0a6667-0d70-4113-8a5e-3afaf1976fc2 OraDB12102_home1 12.1.0.2.171017 (26914423, 26717470) /u01/app/oracle/product/12.1.0.2/dbhome_1 Configured
89f6687e-f575-45fc-91ef-5521374c54c0 OraDB11204_home1 11.2.0.4.171017 (26609929, 26392168) /u01/app/oracle/product/11.2.0.4/dbhome_1 Configured
8c6bc663-b064-445b-8a14-b7c46df9d1da OraDB12102_home3 12.1.0.2.171017 (26914423, 26717470) /u01/app/oracle/product/12.1.0.2/dbhome_3 Configured
9783fd89-f035-4d1a-aaaf-f1cdb09c6ea8 OraDB11204_home2 11.2.0.4.171017 (26609929, 26392168) /u01/app/oracle/product/11.2.0.4/dbhome_2 Configured

List database information :
[root@oda tmp]# odacli list-databases
ID DB Name DB Type DB Version CDB Class Shape Storage Status DbHomeID
---------------------------------------- ---------- -------- -------------------- ---------- -------- -------- ---------- ------------ ----------------------------------------
f38f3a6c-987c-4e11-8cfa-af5cb66ff4e3 mydb Si 11.2.0.4 false OLTP Odb1 ACFS Configured 89f6687e-f575-45fc-91ef-5521374c54c0

Our database is running on OraDB11204_home1 Oracle home.

Moving database to new home

Let’s move mydb database on OraDB11204_home2.

    The process to link the database to a new home is quite simple and is easily done by:

  1. Moving the instance parameter file and password file to the new Oracle home (see the sketch just after this list)
  2. Updating the listener configuration by inserting the new Oracle home in case static registration is used
  3. Changing the Grid cluster information using the srvctl command
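
For the first point, here is a minimal sketch, assuming the password file follows the default orapw<SID> naming and sits in the old home's dbs directory (in this example the spfile is already stored on ACFS under /u02, so only the password file needs to be copied):

oracle@oda:/home/oracle/ [mydb] cp /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/orapwmydb /u01/app/oracle/product/11.2.0.4/dbhome_2/dbs/orapwmydb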

First we need to stop the database :
oracle@oda:/u01/app/oracle/product/11.2.0.4/dbhome_2/dbs/ [mydb] srvctl stop database -d mydb_site1

We can list the current grid configuration :
oracle@oda:/u01/app/oracle/product/11.2.0.4/dbhome_2/dbs/ [mydb] srvctl config database -d mydb_site1
Database unique name: mydb_site1
Database name: mydb
Oracle home: /u01/app/oracle/product/11.2.0.4/dbhome_1
Oracle user: oracle
Spfile: /u02/app/oracle/oradata/mydb_site1/dbs/spfilemydb.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: mydb_site1
Database instance: mydb
Disk Groups:
Mount point paths: /u02/app/oracle/oradata/mydb_site1,/u03/app/oracle/
Services:
Type: SINGLE
Database is administrator managed

As we can see, the Grid cluster configuration is referring to the current dbhome. Let's update it to link it to the new Oracle home :
oracle@oda:/u01/app/oracle/product/11.2.0.4/dbhome_2/dbs/ [mydb] srvctl modify database -d mydb_site1 -o /u01/app/oracle/product/11.2.0.4/dbhome_2

Note that if you are using Oracle 12c, the srvctl command options differ slightly. Use :
-db for the database name
-oraclehome for the database Oracle home
-pwfile for the password file

With an Oracle 12c database, you will also have to specify the change for the password file, in case it is stored in the $ORACLE_HOME/dbs folder. An illustrative example follows.
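
As an illustration only (not something we ran here, since our database is 11g), the equivalent 12c command could look like the following, using one of the 12c homes listed earlier and a hypothetical password file location:

srvctl modify database -db mydb_site1 -oraclehome /u01/app/oracle/product/12.1.0.2/dbhome_3 -pwfile /u01/app/oracle/product/12.1.0.2/dbhome_3/dbs/orapwmydb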

We can check the new grid database configuration :
oracle@oda:/u01/app/oracle/product/11.2.0.4/dbhome_2/dbs/ [mydb] srvctl config database -d mydb_site1
Database unique name: mydb_site1
Database name: mydb
Oracle home: /u01/app/oracle/product/11.2.0.4/dbhome_2
Oracle user: oracle
Spfile: /u02/app/oracle/oradata/mydb_site1/dbs/spfilemydb.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: mydb_site1
Database instance: mydb
Disk Groups:
Mount point paths: /u02/app/oracle/oradata/mydb_site1,/u03/app/oracle/
Services:
Type: SINGLE
Database is administrator managed

And we can start our database again :
oracle@oda:/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/ [mydb] srvctl start database -d mydb_site1

Our database will now successfully be running on OraDB11204_home2.

We can check and see that dcs agent has successfully updated the oratab file :
oracle@oda:/tmp/ [mydb] grep mydb /etc/oratab
mydb:/u01/app/oracle/product/11.2.0.4/dbhome_2:N # line added by Agent

Our ORACLE_HOME env variable will now be :
oracle@oda01-p:/tmp/ [mydb] echo $ORACLE_HOME
/u01/app/oracle/product/11.2.0.4/dbhome_2

Are we done? No, let’s check how the ODA will display the new updated information.

Checking ODA metadata information

List dbhomes :
[root@oda tmp]# odacli list-dbhomes
ID Name DB Version Home Location Status
---------------------------------------- -------------------- ---------------------------------------- --------------------------------------------- ----------
ed0a6667-0d70-4113-8a5e-3afaf1976fc2 OraDB12102_home1 12.1.0.2.171017 (26914423, 26717470) /u01/app/oracle/product/12.1.0.2/dbhome_1 Configured
89f6687e-f575-45fc-91ef-5521374c54c0 OraDB11204_home1 11.2.0.4.171017 (26609929, 26392168) /u01/app/oracle/product/11.2.0.4/dbhome_1 Configured
8c6bc663-b064-445b-8a14-b7c46df9d1da OraDB12102_home3 12.1.0.2.171017 (26914423, 26717470) /u01/app/oracle/product/12.1.0.2/dbhome_3 Configured
9783fd89-f035-4d1a-aaaf-f1cdb09c6ea8 OraDB11204_home2 11.2.0.4.171017 (26609929, 26392168) /u01/app/oracle/product/11.2.0.4/dbhome_2 Configured

List database information :
[root@oda tmp]# odacli list-databases
ID DB Name DB Type DB Version CDB Class Shape Storage Status DbHomeID
---------------------------------------- ---------- -------- -------------------- ---------- -------- -------- ---------- ------------ ----------------------------------------
f38f3a6c-987c-4e11-8cfa-af5cb66ff4e3 mydb Si 11.2.0.4 false OLTP Odb1 ACFS Configured 89f6687e-f575-45fc-91ef-5521374c54c0

As we can see, ODA metadata coming from the derby database will still show mydb database linked to OraDB11204_home1.

Updating ODA metadata

Let’s update derby database to reflect the changes.

You can get your current appliance version by running the command :
odacli describe-component

ODA version 18.3 or higher

If you are running ODA version 18.3 or higher you can use following command to move a database from one database home to another database home of same base version :
odacli modify-database -i <database_id> -dh <destination_dbhome_id>

This command might not be successful if your database was initially created as instance only :

[root@oda tmp]# odacli modify-database -i f38f3a6c-987c-4e11-8cfa-af5cb66ff4e3 -dh 9783fd89-f035-4d1a-aaaf-f1cdb09c6ea8
DCS-10045:Validation error encountered: Changing the database home is not allowed for an instance only database.

ODA version lower than 18.3

If you are running a lower version of ODA, you will need to update the Derby DB manually. I would strongly recommend acting carefully on the Derby database to make sure not to corrupt the ODA. I would also encourage you to get Oracle Support guidance in case you need to act on your production ODA.
The next steps describe how to update the Derby DB manually.

1) Stop the DCS Agent

[root@oda ~]# initctl stop initdcsagent
initdcsagent stop/waiting

[root@oda ~]# ps -ef | grep dcs-agent | grep -v grep
[root@oda ~]#
2) Copy the derby Database

It is important to backup the repository and to apply the changes on the backup in order to keep the original version unchanged in case of trouble.

Go in the derby db repository folder :
[root@oda tmp]# cd /opt/oracle/dcs/repo

List current repository folder :
[root@oda repo]# ls -l
total 24
-rw-r--r-- 1 root root 1149 Aug 27 11:57 derby.log
drwxr-xr-x 4 root root 4096 Aug 27 15:32 node_0
drwxr-xr-x 4 root root 4096 Aug 12 16:18 node_0_orig_12082019_1619
drwxr-xr-x 4 root root 4096 Aug 26 11:31 node_0_orig_26082019_1132
drwxr-xr-x 4 root root 4096 Aug 26 15:27 node_0_orig_26082019_1528
drwxr-xr-x 4 root root 4096 Aug 27 11:57 node_0_orig_27082019_1158

Backup the repository (we will apply the changes on the backup repository to keep the original so far unchanged) :
[root@oda repo]# cp -rp node_0 node_0_backup_27082019_1533

List current repository folder :
[root@oda repo]# ls -l
total 28
-rw-r--r-- 1 root root 1149 Aug 27 11:57 derby.log
drwxr-xr-x 4 root root 4096 Aug 27 15:32 node_0
drwxr-xr-x 4 root root 4096 Aug 27 15:32 node_0_backup_27082019_1533
drwxr-xr-x 4 root root 4096 Aug 12 16:18 node_0_orig_12082019_1619
drwxr-xr-x 4 root root 4096 Aug 26 11:31 node_0_orig_26082019_1132
drwxr-xr-x 4 root root 4096 Aug 26 15:27 node_0_orig_26082019_1528
drwxr-xr-x 4 root root 4096 Aug 27 11:57 node_0_orig_27082019_1158

3)Start DCS Agent

[root@oda repo]# initctl start initdcsagent
initdcsagent start/running, process 45530

[root@oda repo]# ps -ef | grep dcs-agent | grep -v grep
root 45530 1 99 15:33 ? 00:00:10 java -Xms128m -Xmx512m -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=512m -XX:+DisableExplicitGC -XX:ParallelGCThreads=4 -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/opt/oracle/dcs/log/gc-dcs-agent-%t-%p.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Doracle.security.jps.config=/opt/oracle/dcs/agent/jps-config.xml -jar /opt/oracle/dcs/bin/dcs-agent-2.4.12-oda-SNAPSHOT.jar server /opt/oracle/dcs/conf/dcs-agent.json
[root@oda repo]#
4) Update metadata information

We now need to connect to the Derby backup database and make home id changes for the specific database.

Let’s connect to the derby database :
[root@oda repo]# /usr/java/jdk1.8.0_161/db/bin/ij
ij version 10.11
ij> connect 'jdbc:derby:node_0_backup_27082019_1533';

Let’s check current metadata information :
ij> select DBHOMEID from DB where ID='f38f3a6c-987c-4e11-8cfa-af5cb66ff4e3';
DBHOMEID
--------------------------------------------------------------------------------------------------------------------------------
89f6687e-f575-45fc-91ef-5521374c54c0
1 row selected

Let's update the metadata according to the home change :
ij> update DB set DBHOMEID='9783fd89-f035-4d1a-aaaf-f1cdb09c6ea8' where ID='f38f3a6c-987c-4e11-8cfa-af5cb66ff4e3';
1 row inserted/updated/deleted

Let’s check the updated information :
ij> select DBHOMEID from DB where ID='f38f3a6c-987c-4e11-8cfa-af5cb66ff4e3';
DBHOMEID
--------------------------------------------------------------------------------------------------------------------------------
9783fd89-f035-4d1a-aaaf-f1cdb09c6ea8
1 row selected

Let’s commit the changes :
ij> commit;

And finally exit :
ij> exit;

5) Stop the DCS Agent

[root@oda repo]# initctl stop initdcsagent
initdcsagent stop/waiting

[root@oda repo]# ps -ef | grep dcs-agent | grep -v grep
[root@oda repo]#
6) Apply the changes in production

In this step, we will rename the original repository to keep a backup and put our changes in production.

List current repository folder :
[root@oda repo]# ls -ltrh
total 28K
drwxr-xr-x 4 root root 4.0K Aug 12 16:18 node_0_orig_12082019_1619
drwxr-xr-x 4 root root 4.0K Aug 26 11:31 node_0_orig_26082019_1132
drwxr-xr-x 4 root root 4.0K Aug 26 15:27 node_0_orig_26082019_1528
drwxr-xr-x 4 root root 4.0K Aug 27 11:57 node_0_orig_27082019_1158
drwxr-xr-x 4 root root 4.0K Aug 27 15:35 node_0_backup_27082019_1533
-rw-r--r-- 1 root root 1.2K Aug 27 15:35 derby.log
drwxr-xr-x 4 root root 4.0K Aug 27 15:36 node_0

Backup the original database :
[root@oda repo]# mv node_0 node_0_orig_27082019_1536

Put our changes in production :
[root@oda repo]# mv node_0_backup_27082019_1533 node_0

Check the repository folder :
[root@oda repo]# ls -ltrh
total 28K
drwxr-xr-x 4 root root 4.0K Aug 12 16:18 node_0_orig_12082019_1619
drwxr-xr-x 4 root root 4.0K Aug 26 11:31 node_0_orig_26082019_1132
drwxr-xr-x 4 root root 4.0K Aug 26 15:27 node_0_orig_26082019_1528
drwxr-xr-x 4 root root 4.0K Aug 27 11:57 node_0_orig_27082019_1158
drwxr-xr-x 4 root root 4.0K Aug 27 15:35 node_0
-rw-r--r-- 1 root root 1.2K Aug 27 15:35 derby.log
drwxr-xr-x 4 root root 4.0K Aug 27 15:36 node_0_orig_27082019_1536

7) Start the DCS Agent

[root@oda repo]# initctl start initdcsagent
initdcsagent start/running, process 59703

[root@oda repo]# ps -ef | grep dcs-agent | grep -v grep
root 59703 1 99 15:37 ? 00:00:11 java -Xms128m -Xmx512m -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=512m -XX:+DisableExplicitGC -XX:ParallelGCThreads=4 -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/opt/oracle/dcs/log/gc-dcs-agent-%t-%p.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -Doracle.security.jps.config=/opt/oracle/dcs/agent/jps-config.xml -jar /opt/oracle/dcs/bin/dcs-agent-2.4.12-oda-SNAPSHOT.jar server /opt/oracle/dcs/conf/dcs-agent.json
[root@oda repo]#
8) Check ODA metadata

Now we can check and see that the Derby database is showing the correct metadata information.

List the dbhomes :
[root@oda repo]# odacli list-dbhomes
ID Name DB Version Home Location Status
---------------------------------------- -------------------- ---------------------------------------- --------------------------------------------- ----------
ed0a6667-0d70-4113-8a5e-3afaf1976fc2 OraDB12102_home1 12.1.0.2.171017 (26914423, 26717470) /u01/app/oracle/product/12.1.0.2/dbhome_1 Configured
89f6687e-f575-45fc-91ef-5521374c54c0 OraDB11204_home1 11.2.0.4.171017 (26609929, 26392168) /u01/app/oracle/product/11.2.0.4/dbhome_1 Configured
8c6bc663-b064-445b-8a14-b7c46df9d1da OraDB12102_home3 12.1.0.2.171017 (26914423, 26717470) /u01/app/oracle/product/12.1.0.2/dbhome_3 Configured
9783fd89-f035-4d1a-aaaf-f1cdb09c6ea8 OraDB11204_home2 11.2.0.4.171017 (26609929, 26392168) /u01/app/oracle/product/11.2.0.4/dbhome_2 Configured

Check the new database metadata information :
[root@oda repo]# odacli list-databases
ID DB Name DB Type DB Version CDB Class Shape Storage Status DbHomeID
---------------------------------------- ---------- -------- -------------------- ---------- -------- -------- ---------- ------------ ----------------------------------------
f38f3a6c-987c-4e11-8cfa-af5cb66ff4e3 mydb Si 11.2.0.4 false OLTP Odb1 ACFS Configured 9783fd89-f035-4d1a-aaaf-f1cdb09c6ea8

[root@oda repo]# odacli describe-database -i f38f3a6c-987c-4e11-8cfa-af5cb66ff4e3
Database details
----------------------------------------------------------------
ID: f38f3a6c-987c-4e11-8cfa-af5cb66ff4e3
Description: mydb
DB Name: mydb
DB Version: 11.2.0.4
DB Type: Si
DB Edition: EE
DBID:
Instance Only Database: true
CDB: false
PDB Name:
PDB Admin User Name:
Class: OLTP
Shape: Odb1
Storage: ACFS
CharacterSet: AL32UTF8
National CharacterSet: AL16UTF16
Language: AMERICAN
Territory: AMERICA
Home ID: 9783fd89-f035-4d1a-aaaf-f1cdb09c6ea8
Console Enabled: false
Level 0 Backup Day: Sunday
AutoBackup Disabled: false
Created: June 11, 2019 2:46:35 PM CEST
DB Domain Name: in-kon.ch

Conclusion

The database is now running on the new Oracle home and the ODA metadata information is up to date. Keeping the metadata up to date is important for any further database upgrade or deletion performed with odacli commands; otherwise those commands might fail.
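
For illustration only, here is a sketch of the kind of odacli operation that later relies on this metadata (the database ID is the one used throughout this post; the commands are examples, not something to run blindly):

# sketch: later lifecycle operations read the Home ID from the Derby repository
odacli describe-database -i f38f3a6c-987c-4e11-8cfa-af5cb66ff4e3   # verify the metadata first
# a later deletion (or upgrade) would then rely on the corrected Home ID
odacli delete-database -i f38f3a6c-987c-4e11-8cfa-af5cb66ff4e3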

The post Moving oracle database to new home on ODA appeared first on Blog dbi services.

RMAN in a Multitenant Environment


The Oracle Multitenant architecture came with Oracle 12c a few years ago. For people used to working with traditional non-CDB databases, doing backup and recovery with pluggable databases (PDBs) can be confusing at first.
In this post we explain how to use RMAN backup and recovery in a multitenant environment, using an Oracle 19c database with two pluggable databases.
Below is the configuration we are using:

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO
         4 PDB2                           READ WRITE NO

We will not use a recovery catalog, but the use of one is the same as in a non-CDB environment.
Note that starting with Oracle 19c, we can now connect to a recovery catalog when the target database is a PDB.
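
A minimal sketch of such a connection, assuming a hypothetical catalog schema rco in a catalog database catdb:

# connect RMAN with a PDB as target and a recovery catalog (possible as of 19c)
rman target sys@pdb1 catalog rco@catdb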

Whole CDB Backups

Backing up a whole CDB is like backing up a non-CDB database. We have to back up:
- the root container
- all the pluggable databases
- the archived logs
The steps are:
1- Connect to the root container with a common user having SYSBACKUP or SYSDBA privileges

RMAN> connect target /

connected to target database: ORCL (DBID=1546409981)

RMAN>

2- Launch the backup

RMAN> BACKUP DATABASE PLUS ARCHIVELOG;

Starting backup at 11-SEP-19
current log archived
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=279 device type=DISK
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=6 RECID=1 STAMP=1018632767
input archived log thread=1 sequence=7 RECID=2 STAMP=1018690452
input archived log thread=1 sequence=8 RECID=3 STAMP=1018691169
input archived log thread=1 sequence=9 RECID=4 STAMP=1018693343
channel ORA_DISK_1: starting piece 1 at 11-SEP-19
channel ORA_DISK_1: finished piece 1 at 11-SEP-19
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2019_09_11/o1_mf_annnn
...
...
Finished backup at 11-SEP-19

Starting Control File and SPFILE Autobackup at 11-SEP-19
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/autobackup/2019_09_11/o1_mf_s_1018693411_gqkcr46z_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 11-SEP-19

Oracle also recommends backing up the root container from time to time.
Once connected to the root container with a common user, run the backup command:

RMAN> BACKUP DATABASE ROOT;

Starting backup at 11-SEP-19
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/u01/app/oracle/oradata/ORCL/system01.dbf
input datafile file number=00003 name=/u01/app/oracle/oradata/ORCL/sysaux01.dbf
input datafile file number=00004 name=/u01/app/oracle/oradata/ORCL/undotbs01.dbf
input datafile file number=00007 name=/u01/app/oracle/oradata/ORCL/users01.dbf
channel ORA_DISK_1: starting piece 1 at 11-SEP-19
channel ORA_DISK_1: finished piece 1 at 11-SEP-19
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T103019_gqkd4vxb_.bkp tag=TAG20190911T103019 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:16
Finished backup at 11-SEP-19

Starting Control File and SPFILE Autobackup at 11-SEP-19
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/autobackup/2019_09_11/o1_mf_s_1018693836_gqkd5d65_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 11-SEP-19

RMAN>

PDBs Backups

Backing up PDBs is not difficult; there are just a few mechanisms to be aware of. When connected with RMAN to the root container, we can back up one or more PDBs, while when connected directly to a PDB, we can only back up that PDB.
1- Connect to the root container with a common user having SYSBACKUP or SYSDBA privileges

RMAN> connect target /

connected to target database: ORCL (DBID=1546409981)

RMAN>

And back up the individual PDBs

RMAN> BACKUP PLUGGABLE DATABASE PDB1,PDB2;

Starting backup at 11-SEP-19
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
...
...
Starting Control File and SPFILE Autobackup at 11-SEP-19
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/autobackup/2019_09_11/o1_mf_s_1018695111_gqkff85l_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 11-SEP-19

RMAN>

2- Connect to the PDBs with a local user having SYSBACKUP or SYSDBA privileges
PDB1

RMAN> connect target sys/root@pdb1

connected to target database: ORCL:PDB1 (DBID=4178439423)

RMAN> BACKUP DATABASE;

Starting backup at 11-SEP-19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=279 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00010 name=/u01/app/oracle/oradata/ORCL/pdb1/sysaux01.dbf
input datafile file number=00009 name=/u01/app/oracle/oradata/ORCL/pdb1/system01.dbf
input datafile file number=00011 name=/u01/app/oracle/oradata/ORCL/pdb1/undotbs01.dbf
input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL/pdb1/users01.dbf
channel ORA_DISK_1: starting piece 1 at 11-SEP-19
channel ORA_DISK_1: finished piece 1 at 11-SEP-19
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/92359B0BEC8B4545E0531502A8C0F64E/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T110707_gqkg9w5n_.bkp tag=TAG20190911T110707 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
Finished backup at 11-SEP-19

RMAN>

PDB2

RMAN> connect target sys/root@pdb2

connected to target database: ORCL:PDB2 (DBID=3996013191)

RMAN>  BACKUP DATABASE;

Starting backup at 11-SEP-19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=279 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00014 name=/u01/app/oracle/oradata/ORCL/pdb2/sysaux01.dbf
input datafile file number=00013 name=/u01/app/oracle/oradata/ORCL/pdb2/system01.dbf
input datafile file number=00015 name=/u01/app/oracle/oradata/ORCL/pdb2/undotbs01.dbf
input datafile file number=00016 name=/u01/app/oracle/oradata/ORCL/pdb2/users01.dbf
channel ORA_DISK_1: starting piece 1 at 11-SEP-19
channel ORA_DISK_1: finished piece 1 at 11-SEP-19
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/92359E387C754644E0531502A8C02C00/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T110844_gqkgdwmm_.bkp tag=TAG20190911T110844 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
Finished backup at 11-SEP-19

RMAN>

Tablespace Backup in a PDB

Tablespaces in different PDBs can have the same name, so to eliminate ambiguity, always connect to the PDB whose tablespaces you want to back up.
1- Connect to the PDB with a local user having SYSBACKUP or SYSDBA privilege

[oracle@oraadserver ~]$ rman target sys/root@pdb1

Recovery Manager: Release 19.0.0.0.0 - Production on Wed Sep 11 11:35:53 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL:PDB1 (DBID=4178439423)

2- Issue the BACKUP TABLESPACE command

RMAN> BACKUP TABLESPACE USERS;

Starting backup at 11-SEP-19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=290 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL/pdb1/users01.dbf
channel ORA_DISK_1: starting piece 1 at 11-SEP-19
channel ORA_DISK_1: finished piece 1 at 11-SEP-19
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/92359B0BEC8B4545E0531502A8C0F64E/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T113623_gqkj0qxl_.bkp tag=TAG20190911T113623 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 11-SEP-19

RMAN>
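
As a side note, when connected to the root container, RMAN (12c and later) also accepts the pdb_name:tablespace_name notation to remove the ambiguity. A minimal sketch, not used in this post:

# back up the USERS tablespace of PDB1 while connected to the root container (sketch)
rman target / <<'EOF'
BACKUP TABLESPACE PDB1:USERS;
EOF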

Data File Backup in a PDB

Data files are identified by a unique number across the CDB, so for the backup we can connect either to the root container or directly to the PDB.
Note that when directly connected to the PDB, we can only back up files belonging to this PDB.
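
To check which file number belongs to which container before running the backup, a quick query from the root container can help. A minimal sketch:

# map data file numbers to their container (CON_ID) from the root
sqlplus -s / as sysdba <<'EOF'
SELECT con_id, file#, name FROM v$datafile ORDER BY con_id, file#;
EOF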

1- Connect to the root container with a common user having SYSBACKUP or SYSDBA privileges

[oracle@oraadserver admin]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Wed Sep 11 11:54:42 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL (DBID=1546409981)

2- Backup the Data File

RMAN> BACKUP DATAFILE 10;

Starting backup at 11-SEP-19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=274 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00010 name=/u01/app/oracle/oradata/ORCL/pdb1/sysaux01.dbf
channel ORA_DISK_1: starting piece 1 at 11-SEP-19
channel ORA_DISK_1: finished piece 1 at 11-SEP-19
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/92359B0BEC8B4545E0531502A8C0F64E/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T115504_gqkk3s44_.bkp tag=TAG20190911T115504 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 11-SEP-19

Starting Control File and SPFILE Autobackup at 11-SEP-19
piece handle=/u01/app/oracle/fast_recovery_area/ORCL/autobackup/2019_09_11/o1_mf_s_1018698908_gqkk3wwt_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 11-SEP-19

RMAN>

Whole CDB Complete Recovery

Suppose we lose all the data files, control files and redo log files of the whole container database. We can restore with the following steps:
1- Restore Control Files while connecting to the root container

[oracle@oraadserver ORCL]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Wed Sep 11 14:25:25 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database (not started)

RMAN> startup nomount

Oracle instance started

Total System Global Area     872413680 bytes

Fixed Size                     9140720 bytes
Variable Size                297795584 bytes
Database Buffers             557842432 bytes
Redo Buffers                   7634944 bytes

RMAN>

RMAN> restore controlfile FROM AUTOBACKUP;

Starting restore at 11-SEP-19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=21 device type=DISK

recovery area destination: /u01/app/oracle/fast_recovery_area
database name (or database unique name) used for search: ORCL
channel ORA_DISK_1: AUTOBACKUP /u01/app/oracle/fast_recovery_area/ORCL/autobackup/2019_09_11/o1_mf_s_1018698908_gqkk3wwt_.bkp found in the recovery area
AUTOBACKUP search with format "%F" not attempted because DBID was not set
channel ORA_DISK_1: restoring control file from AUTOBACKUP /u01/app/oracle/fast_recovery_area/ORCL/autobackup/2019_09_11/o1_mf_s_1018698908_gqkk3wwt_.bkp
channel ORA_DISK_1: control file restore from AUTOBACKUP complete
output file name=/u01/app/oracle/oradata/ORCL/control01.ctl
output file name=/u01/app/oracle/fast_recovery_area/ORCL/control02.ctl
Finished restore at 11-SEP-19
RMAN>
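
Note that the autobackup was found directly in the fast recovery area here. If RMAN had to search for it with the default %F format (for instance after losing the FRA or when restoring on another host), the DBID would have to be set first. A minimal sketch, using the DBID shown earlier in this post:

# with the instance already started in NOMOUNT, set the DBID before searching for the autobackup
rman target / <<'EOF'
SET DBID 1546409981;
RESTORE CONTROLFILE FROM AUTOBACKUP;
EOF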

2- Mount the CDB

RMAN> alter database mount;

released channel: ORA_DISK_1
Statement processed

3- List the backups of archived logs

RMAN> list backup of archivelog all;


List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
2       397.52M    DISK        00:00:04     11-SEP-19
        BP Key: 2   Status: AVAILABLE  Compressed: NO  Tag: TAG20190911T102225
        Piece Name: /u01/app/oracle/fast_recovery_area/ORCL/backupset/2019_09_11/o1_mf_annnn_TAG20190911T102225_gqkcp1k9_.bkp

  List of Archived Logs in backup set 2
  Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
  ---- ------- ---------- --------- ---------- ---------
  1    6       2120330    10-SEP-19 2155559    10-SEP-19
  1    7       2155559    10-SEP-19 2257139    11-SEP-19
  1    8       2257139    11-SEP-19 2327294    11-SEP-19
  1    9       2327294    11-SEP-19 2342937    11-SEP-19

BS Key  Size       Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
7       5.00K      DISK        00:00:00     11-SEP-19
        BP Key: 7   Status: AVAILABLE  Compressed: NO  Tag: TAG20190911T102330
        Piece Name: /u01/app/oracle/fast_recovery_area/ORCL/backupset/2019_09_11/o1_mf_annnn_TAG20190911T102330_gqkcr2n1_.bkp

  List of Archived Logs in backup set 7
  Thrd Seq     Low SCN    Low Time  Next SCN   Next Time
  ---- ------- ---------- --------- ---------- ---------
  1    10      2342937    11-SEP-19 2342996    11-SEP-19

RMAN>

4- Restore the database up to the required sequence (the last backed-up archived log is sequence 10, so we use UNTIL SEQUENCE 11, which is exclusive)

RMAN> restore database until sequence 11;

Starting restore at 11-SEP-19
Starting implicit crosscheck backup at 11-SEP-19
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=24 device type=DISK
Crosschecked 18 objects
...
...
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
Finished restore at 11-SEP-19

RMAN>

5- Recover the database

RMAN> recover database until sequence 11;

Starting recover at 11-SEP-19
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 10 is already on disk as file /u01/app/oracle/fast_recovery_area/ORCL/archivelog/2019_09_11/o1_mf_1_10_gqkcr27d_.arc
archived log file name=/u01/app/oracle/fast_recovery_area/ORCL/archivelog/2019_09_11/o1_mf_1_10_gqkcr27d_.arc thread=1 sequence=10
media recovery complete, elapsed time: 00:00:00
Finished recover at 11-SEP-19

RMAN>

6- Open the database in RESETLOGS mode

RMAN> alter database open resetlogs;

Statement processed

RMAN>

PDBs Complete Recovery

To recover a PDB we can :
– Connect to the root and then use the RESTORE PLUGGABLE DATABASE and RECOVER PLUGGABLE DATABASE commands.
1- Close the PDB to recover

SQL> alter pluggable database pdb1 close;

Pluggable database altered.

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           MOUNTED
         4 PDB2                           READ WRITE NO
SQL>

2- Connect to rman on the root container and issue the restore command

RMAN> RESTORE PLUGGABLE DATABASE  PDB1;

Starting restore at 11-SEP-19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=56 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00009 to /u01/app/oracle/oradata/ORCL/pdb1/system01.dbf
channel ORA_DISK_1: restoring datafile 00010 to /u01/app/oracle/oradata/ORCL/pdb1/sysaux01.dbf
channel ORA_DISK_1: restoring datafile 00011 to /u01/app/oracle/oradata/ORCL/pdb1/undotbs01.dbf
channel ORA_DISK_1: restoring datafile 00012 to /u01/app/oracle/oradata/ORCL/pdb1/users01.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/ORCL/92359B0BEC8B4545E0531502A8C0F64E/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T144816_gqkv9btm_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/ORCL/92359B0BEC8B4545E0531502A8C0F64E/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T144816_gqkv9btm_.bkp tag=TAG20190911T144816
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
Finished restore at 11-SEP-19

3- Recover the pluggable database

RMAN> RECOVER PLUGGABLE DATABASE  PDB1;

Starting recover at 11-SEP-19
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 11-SEP-19

4- Open the pluggable database

RMAN> alter pluggable database PDB1 open;

Statement processed

RMAN>

– Connect to the PDB and use the RESTORE DATABASE and RECOVER DATABASE commands.
1- Close the PDB to recover

SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO
         4 PDB2                           MOUNTED
SQL>

2- Connect to the PDB and issue the RESTORE DATABASE command

[oracle@oraadserver pdb1]$ rman target sys/root@pdb2

Recovery Manager: Release 19.0.0.0.0 - Production on Wed Sep 11 15:19:03 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL:PDB2 (DBID=3996013191, not open)

RMAN> RESTORE DATABASE;

Starting restore at 11-SEP-19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=45 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00013 to /u01/app/oracle/oradata/ORCL/pdb2/system01.dbf
channel ORA_DISK_1: restoring datafile 00014 to /u01/app/oracle/oradata/ORCL/pdb2/sysaux01.dbf
channel ORA_DISK_1: restoring datafile 00015 to /u01/app/oracle/oradata/ORCL/pdb2/undotbs01.dbf
channel ORA_DISK_1: restoring datafile 00016 to /u01/app/oracle/oradata/ORCL/pdb2/users01.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/ORCL/92359E387C754644E0531502A8C02C00/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T144816_gqkv9tfq_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/ORCL/92359E387C754644E0531502A8C02C00/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T144816_gqkv9tfq_.bkp tag=TAG20190911T144816
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
Finished restore at 11-SEP-19

RMAN>

3- Recover the pluggable database

RMAN> recover database;

Starting recover at 11-SEP-19
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 11-SEP-19

4- Open the database

RMAN> alter database open ;

Statement processed

RMAN>

Complete Tablespace Recovery in a PDB

-Non-SYSTEM Tablespace

To recover a non-SYSTEM tablespace in a PDB, we can follow these steps:
1- Take the tablespace offline while connected to the PDB

SQL> ALTER TABLESPACE MYTAB OFFLINE;

Tablespace altered.

SQL>

2- Connect to the PDB with RMAN and issue the RESTORE TABLESPACE command

[oracle@oraadserver pdb2]$ rman target sys/root@pdb2

Recovery Manager: Release 19.0.0.0.0 - Production on Wed Sep 11 16:52:37 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL:PDB2 (DBID=3996013191)

RMAN> RESTORE TABLESPACE MYTAB;

Starting restore at 11-SEP-19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=274 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00017 to /u01/app/oracle/oradata/ORCL/pdb2/mytab01.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/ORCL/92359E387C754644E0531502A8C02C00/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T163708_gql1o4gx_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/ORCL/92359E387C754644E0531502A8C02C00/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T163708_gql1o4gx_.bkp tag=TAG20190911T163708
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 11-SEP-19

3- Issue the RECOVER TABLESPACE command

RMAN> RECOVER TABLESPACE MYTAB;

Starting recover at 11-SEP-19
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 11-SEP-19

RMAN>

4- Put the tablespace back ONLINE

RMAN> ALTER TABLESPACE MYTAB ONLINE;

Statement processed

RMAN>

-SYSTEM Tablespace

To recover a SYSTEM tablespace in a PDB:

1- Shut down the entire CDB and mount it

[oracle@oraadserver pdb2]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Sep 11 17:09:33 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> shut immediate
ORA-01116: error in opening database file 13
ORA-01110: data file 13: '/u01/app/oracle/oradata/ORCL/pdb2/system01.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
SQL> shut abort;
ORACLE instance shut down.
SQL>

SQL> startup mount
ORACLE instance started.

Total System Global Area  872413680 bytes
Fixed Size                  9140720 bytes
Variable Size             310378496 bytes
Database Buffers          545259520 bytes
Redo Buffers                7634944 bytes
Database mounted.
SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       MOUNTED
         3 PDB1                           MOUNTED
         4 PDB2                           MOUNTED
SQL>

2- Connect to the root container and restore the corresponding data files (the files can be identified with the REPORT SCHEMA command, for example)

RMAN> RESTORE DATAFILE 13;

Starting restore at 11-SEP-19
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=26 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00013 to /u01/app/oracle/oradata/ORCL/pdb2/system01.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/ORCL/92359E387C754644E0531502A8C02C00/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T163708_gql1o4gx_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/ORCL/92359E387C754644E0531502A8C02C00/backupset/2019_09_11/o1_mf_nnndf_TAG20190911T163708_gql1o4gx_.bkp tag=TAG20190911T163708
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
Finished restore at 11-SEP-19

RMAN>

3- Recover the Data File

RMAN> RECOVER DATAFILE 13;

Starting recover at 11-SEP-19
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 11-SEP-19

4- Open all containers

RMAN> alter database open;

Statement processed

RMAN> ALTER PLUGGABLE DATABASE ALL OPEN READ WRITE;

Statement processed

RMAN>

Conclusion


In this blog we explained the basics of how to use RMAN in a multitenant environment. We did not talk about point-in-time recovery (PITR); we will cover it in a coming blog.
Note also that we did not use RMAN commands like LIST FAILURE, ADVISE FAILURE and REPAIR FAILURE, but these commands also work.
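
For completeness, a minimal sketch of how those Data Recovery Advisor commands would be run (REPAIR FAILURE PREVIEW only shows the proposed repair without executing it):

# Data Recovery Advisor: list, advise on and preview the repair of detected failures
rman target / <<'EOF'
LIST FAILURE;
ADVISE FAILURE;
REPAIR FAILURE PREVIEW;
EOF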

The post RMAN in a Multitenant Environment appeared first on Blog dbi services.

How to get a big picture of K8s pods and PVs by script


A couple of weeks ago, during an internal dbi services workshop about Docker and K8s, I got an interesting question: is it possible to get a big picture of the pods connected to one specific persistent volume (PV) by using kubectl commands?

It was an interesting question because the usual kubectl commands don't natively provide the desired output. In fact, on one hand we get information about PVs and their related persistent volume claims (PVCs), and on the other hand pods with their PVCs.

I didn't find any relevant result with my google-fu, so here is my contribution: a bash script that provides the overall picture of the pods tied to a specific PV. It could be helpful to quickly identify stateful(set) applications like SQL Server databases (and others) and their related storage 🙂

#!/bin/bash

# Optional argument: the name of a specific persistent volume (PV)
pv=$1

# Build a list of "pv|namespace|pvc" entries, either for the given PV or for all PVs
pvcs=$(kubectl get pv --output=json | jq -r --arg PV "$pv" '.items[] | select (.metadata.name==$PV or $PV=="") | "\(.metadata.name)" + "|" + "\(.spec.claimRef.namespace)" + "|" + "\(.spec.claimRef.name)"')

for pvc in $pvcs
do
    # Split each entry into PV name, namespace and PVC name
    p_pv=$(echo $pvc | cut -d'|' -f1)
    p_ns=$(echo $pvc | cut -d'|' -f2)
    p_pvc=$(echo $pvc | cut -d'|' -f3)

    echo "===================================================="
    echo "==> pv: $p_pv"
    # List the pods in the PVC's namespace that reference this claim in their volumes
    kubectl get pods -n $p_ns --output=json | jq -c --arg CLAIM "$p_pvc" '.items[] | {Pod: .metadata.name, Namespace: .metadata.namespace, ClaimName: .spec | select ( has ("volumes") ).volumes[] | select( has ("persistentVolumeClaim") ).persistentVolumeClaim | select (.claimName==$CLAIM) }'
done
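
The script relies on jq to parse the kubectl JSON output, so make sure it is installed before running it. A minimal check could look like this:

# sanity check: jq is required for the JSON parsing
command -v jq >/dev/null 2>&1 || { echo "jq is required"; exit 1; }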

 

If you provide a specific PV as input, you will get a picture of all the pods tied to it; otherwise all PVs and their linked pods will be shown in the output.

  • With a specific PV as input
$ ./get_pods_by_pv.sh pvc-a969c5d7-d654-11e9-ab0d-06376ae701a9
====================================================
==> pv: pvc-a969c5d7-d654-11e9-ab0d-06376ae701a9
{"Pod":"mssql-deployment-85bfdfc66c-ht4wl","Namespace":"ci","ClaimName":{"claimName":"mssql-data"}}

 

  • No PV as input
$ ./get_pods_by_pv.sh
====================================================
==> pv: pvc-a969c5d7-d654-11e9-ab0d-06376ae701a9
{"Pod":"mssql-deployment-85bfdfc66c-ht4wl","Namespace":"ci","ClaimName":{"claimName":"mssql-data"}}
====================================================
==> pv: pvc-eb90982d-d654-11e9-ab0d-06376ae701a9
{"Pod":"mssql-deployment-2-7db97bf7d-9bsp6","Namespace":"ci","ClaimName":{"claimName":"mssql-data-2"}}
====================================================
==> pv: pvc-f9c14523-d61e-11e9-ab0d-06376ae701a9

 

Feel free to comment, to share or to improve!

See you!

 

 

The post How to get a big picture of K8s pods and PVs by script appeared first on Blog dbi services.
