Channel: dbi Blog

PostgreSQL 14: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly


It is a common misunderstanding that VACUUM FULL saves you when you are already running out of disk space. Running a VACUUM FULL temporarily requires at least double the space, as the table (and the indexes on the table) get completely rewritten. PostgreSQL 14 will probably come with a solution for that, as this patch introduces the possibility to move relations from one tablespace to another when CLUSTER, VACUUM FULL or REINDEX is executed.

As this is about moving relations from one tablespace to another we obviously need at least two tablespaces to play with:

postgres=# \! mkdir /var/tmp/tbs1
postgres=# \! mkdir /var/tmp/tbs2
postgres=# create tablespace tbs1 location '/var/tmp/tbs1';
CREATE TABLESPACE
postgres=# create tablespace tbs2 location '/var/tmp/tbs2';
CREATE TABLESPACE
postgres=# \db
          List of tablespaces
    Name    |  Owner   |   Location    
------------+----------+---------------
 pg_default | postgres | 
 pg_global  | postgres | 
 tbs1       | postgres | /var/tmp/tbs1
 tbs2       | postgres | /var/tmp/tbs2
(4 rows)

Let's assume we have a table in the first tablespace and we face space pressure on that file system:

postgres=# create table t1 ( a int, b date ) tablespace tbs1;
CREATE TABLE
postgres=# insert into t1 select x, now() from generate_series(1,1000000) x;
INSERT 0 1000000
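To get a feel for the temporary extra space a VACUUM FULL of this table would need, you can size the relation first (a quick sketch; the table, its TOAST data and all its indexes get rewritten, so this is roughly the additional space required):

```sql
-- Roughly the extra space VACUUM FULL will temporarily need:
-- the table (incl. TOAST) plus all its indexes are rewritten.
select pg_size_pretty(pg_total_relation_size('t1'));
```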

Without that patch there is not much you can do, except for this (which blocks for the duration of the operation):

postgres=# alter table t1 set tablespace tbs2;
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 
Tablespace: "tbs2"

This moves the files of that table to the new tablespace (but not the indexes). If you really want to reclaim the space on disk with VACUUM FULL, you can now do that:

postgres=# vacuum (tablespace tbs1, full true)  t1;
VACUUM
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 
Tablespace: "tbs1"

The very same is possible with reindex:

postgres=# create index i1 on t1 (a);
CREATE INDEX
postgres=# reindex (tablespace tbs2) index i1;
REINDEX

… and cluster:

postgres=# cluster (tablespace tbs1, index_tablespace tbs1) t1 using i1;
CLUSTER
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 
Indexes:
    "i1" btree (a) CLUSTER, tablespace "tbs1"
Tablespace: "tbs1"

postgres=# 

Nice.
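For comparison, before PostgreSQL 14 indexes could only be moved with the blocking ALTER commands, which also exist for evacuating a whole tablespace at once (both take exclusive locks for the duration):

```sql
-- blocking alternatives available in older releases
alter index i1 set tablespace tbs2;
alter table all in tablespace tbs1 set tablespace tbs2;
```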

This article, PostgreSQL 14: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly, first appeared on the dbi services Blog.


Documentum – Custom facets not showing up after full reindex?


At the beginning of the year, while performing a migration from a Documentum 7.3 environment on VMs to Documentum 16.4 on Kubernetes, a customer had an issue where their custom facets weren't showing up in D2 after a full reindex. At the end of the migration, since xPlore had been upgraded as well (from xPlore 1.5 to 16.4, from VM to K8s), a full reindex was executed so that all the documents would be indexed. In this case, several million documents were indexed and it took a few days. Unfortunately, at the end of the full reindex, the customer saw that the facets weren't working…

Why is that exactly? Well, when configuring custom facets, you need to add a sub-path configuration for the facet computing, and that is a schema change inside the index. Each and every schema change requires at the very least an online rebuild of the index, so that the schema change is propagated to each and every node of the index. Unless you do this online rebuild, the xPlore index schema will NOT be refreshed and the indexing of documents will keep using the old schema. In case you are wondering what this "online rebuild" is, it's the action behind the "Rebuild Index" button that you can find in the Dsearch Admin UI under "Home >> Data Management >> <DOMAIN_NAME> (usually the Repo name) >> <COLLECTION_NAME> (e.g.: default or Node1_CPS1 or Node4_CPS2 …)".

This action will not index any new content; it will, however, create a new index based on the refreshed schema and then copy all the nodes from the current index to the new one. At the end, it replaces the current index with the new one, and this can be done online, without downtime. This button was initially present for both Data collections (where your documents are) and ApplicationInfo collections (ACLs, groups). However, in recent versions of xPlore (at least since 16.4), the feature has been removed for the ApplicationInfo collections.
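For illustration, such a facet definition is a sub-path entry in the indexserverconfig.xml. The sketch below is only indicative: the attribute name my_custom_attr is a placeholder, and the exact set of supported sub-path attributes depends on your xPlore version, so check the xPlore administration guide before copying it:

```xml
<sub-path type="string" path="dmftmetadata//my_custom_attr"
          full-text-search="true" value-comparison="true"
          returning-contents="true" compress="false"
          include-descendants="false" leading-wildcard="false"/>
```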

 

So, what is the minimum required to configure custom facets? The answer is that it depends… :). Here are some examples:

  • If xPlore has never been started, the index doesn't exist yet, so configuring the facets in the indexserverconfig.xml file takes effect immediately at the first startup. In this case, an online rebuild wouldn't even be needed. However, it might not always be easy to modify the indexserverconfig.xml file before xPlore even starts; it depends on how you are deploying the components…
  • If xPlore has been started at least once but indexing hasn't started yet (0 content inside the Data collections), you can just log in to the Dsearch Admin UI and perform the online rebuild on the empty collections. This is almost instantaneous, so you will most probably not even see it happen.
    • If this is a new environment, make sure the IndexAgent is then started in normal mode so that it processes incoming indexing requests, and that's it
    • If this is an existing environment, you will need to execute a full reindex using your preferred method (IndexAgent full reindex action, select queries, or the ids.txt)
  • If xPlore has been started at least once and indexing has completed, you will need to perform the online rebuild as well. However, this time it will probably take quite some time because, as mentioned earlier, it needs to copy all the indexed nodes to a new index. This process is normally faster than a full reindex because it only involves xPlore-internal communications: it only duplicates the existing index (applying the schema change) and there is no exchange with the Content Server. Once the online rebuild has been performed, the facets should be available.

 

Even if an online rebuild is faster than a full reindex, depending on the size of the index it might still take hours or even days to complete. It is therefore quite important to plan this properly in advance in case of a migration or upgrade, so that you can start with an online rebuild on an empty index (which is instantaneous) and perform the needed full reindex afterwards, instead of the opposite. This might save you several days of pain with your users and considerably reduce the load on the Dsearch/CPS.

This behavior wasn't really well documented before. I had some exchanges with OpenText on this topic and they created KB15765485 based on these exchanges and on what is described in this blog. I'm not sure the documentation is really better now, but at least there is a little bit more information.

 

This article, Documentum – Custom facets not showing up after full reindex?, first appeared on the dbi services Blog.

Documentum – xPlore online rebuild stopped because of “immense term”


In relation to my previous blog about custom facets not showing up after a full reindex: a customer had just completed a migration, and after the full reindex there were no facets, because of what I explained in that blog. Knowing that the online rebuild is normally faster than a full reindex, I helped start this operation, but after a little more than a day of processing it failed on a document. The online rebuild is something really useful in xPlore, and I have usually found it pretty robust since it normally works quite well.

The online rebuild stopped with the following error in the dsearch.log:

2020-01-21 17:53:44,853 WARN [Index-Rebuilder-default-0-Worker-0] c.e.d.c.f.indexserver.core.index.plugin.CPSPlugin - Content Processing Service failed for [090f1234800d647e] with error code [7] and message [Communication error while processing req 090f1234800d647e]
2020-01-21 17:53:45,758 WARN [Index-Rebuilder-default-0] c.e.d.c.f.i.core.collection.FtReindexTask - Reindex for index default.dmftdoc failed
com.emc.documentum.core.fulltext.common.exception.IndexServerException: java.lang.IllegalArgumentException: Document contains at least one immense term in field="<>/dmftcontents<0>/dmftcontent<0>/ tkn" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[109, 97, 115, 116, 101, 114, 102, 105, 108, 101, 32, 112, 115, 117, 114, 32, 99, 97, 115, 101, 32, 114, 101, 118, 105, 101, 119, 32, 32, 32]...', original message: bytes can be at most 32766 in length; got 39938386
	at com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.recreatePathIndexNB(ESSCollection.java:3391)
	at com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.reindexNB(ESSCollection.java:1360)
	at com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.reindex(ESSCollection.java:1249)
	at com.emc.documentum.core.fulltext.indexserver.core.collection.FtReindexTask.run(FtReindexTask.java:204)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="<>/dmftcontents<0>/dmftcontent<0>/ tkn" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[109, 97, 115, 116, 101, 114, 102, 105, 108, 101, 32, 112, 115, 117, 114, 32, 99, 97, 115, 101, 32, 114, 101, 118, 105, 101, 119, 32, 32, 32]...', original message: bytes can be at most 32766 in length; got 39938386
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1252)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1234)
	at com.xhive.xDB_10_7_r4498571.xo.addEntry(xdb:156)
	at com.xhive.xDB_10_7_r4498571.qo.a(xdb:194)
	at com.xhive.xDB_10_7_r4498571.qo.a(xdb:187)
	at com.xhive.core.index.ExternalIndex.add(xdb:368)
	at com.xhive.core.index.XhiveIndex.a(xdb:321)
	at com.xhive.core.index.XhiveIndex.a(xdb:330)
	at com.xhive.xDB_10_7_r4498571.eq$b$1.a(xdb:142)
	at com.xhive.xDB_10_7_r4498571.bo$a.a(xdb:58)
	at com.xhive.xDB_10_7_r4498571.bo$f.a(xdb:86)
	at com.xhive.xDB_10_7_r4498571.eq$b.a(xdb:126)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:335)
	at com.xhive.core.index.PathValueIndexModifier.b(xdb:291)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:279)
	at com.xhive.core.index.PathValueIndexModifier.d(xdb:514)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:456)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:435)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:414)
	at com.xhive.core.index.PathValueIndexModifier.b(xdb:403)
	at com.xhive.core.index.PathValueIndexModifier.a(xdb:397)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:666)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:504)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:494)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:362)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:213)
	at com.xhive.xDB_10_7_r4498571.ca.a(xdb:179)
	at com.xhive.core.index.XhiveIndexInConstruction.indexNext(xdb:199)
	at com.emc.documentum.core.fulltext.indexserver.core.collection.ESSCollection.reindexByWorker(ESSCollection.java:3538)
	at com.emc.documentum.core.fulltext.indexserver.core.collection.FtReindexTask$ReindexWorker.run(FtReindexTask.java:91)
	... 1 common frames omitted
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 39938386
	at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
	at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:663)
	... 36 common frames omitted

 

I don't remember seeing this error before in relation to Documentum, but I did see something similar on another Lucene-based engine, and as you can see in the exception stack, it is indeed linked to Lucene… Anyway, I tried to start the online rebuild again, but it failed on the exact same document. I wasn't sure whether this was a document issue or some kind of bug in xPlore, so I opened the SR#4481792 and in the meantime did some checks. On the current index, I could display the dmftxml content of any random document in less than a second, except for this specific document, where it just loaded forever. Since the availability of the facets was rather time sensitive, I removed this specific document from the index using the "deleteDocs.sh" script and started the online rebuild again… However, it failed on a second document.

The error above happened for at least two documents, but there might have been many more. Trial and error, deleting impacted documents and restarting the online rebuild each time, could potentially have taken ages. I was certain that the full reindex would complete for the millions of documents in a couple of days, because it had just done so. Therefore, instead of continuing with the online rebuild, which could have failed dozens of times on bad documents, I chose another approach:

  • Delete the Data collections containing the indexed documents
    • Navigate to: Home >> Data Management >> <DOMAIN_NAME> (usually Repo name)
    • Delete the collection(s) with Category=dftxml and Usage=Data using the red cross icon on the right side of the table
  • Re-create the needed collections with the same parameters
    • Still under: Home >> Data Management >> <DOMAIN_NAME> (usually Repo name)
    • Click on: New Collection
    • Set the Name to: <COLLECTION_NAME> (e.g.: default or Node1_CPS1 or Node4_CPS2 …)
    • Set the Usage to: Data
    • Set the Document Category to: dftxml
    • Set the Binding Instance to the Dsearch which should be used, probably PrimaryDsearch
    • Select the correct location to use. If you select the “Same location as domain”, it will put the new collection as usual on your domain data folder. If you want to use another location, select the checkbox and pick the correct one: in this case, you must have already created in advance the needed storage location (“Home >> System Overview >> Global Configuration >> Storage Location“)
  • Perform the online rebuild (as mentioned above) on the empty collections (instantaneous)
  • Perform the full reindex

Doing the above removes all indexed documents, meaning that searches will not return anything anymore, which is worse than just not having facets from a user's perspective. However, it was just before the weekend, so in this case it was fine for the end users, and at least it completely solved the issue: the facets were available on the next Monday morning. With the full reindex logs and some smart processing (I tried to give some examples in this blog), I could find the list of all documents that had the above issue… In the end, it was really a document content issue and nothing related to xPlore. As mentioned in the previous blog, I had some exchanges with OpenText on this topic and they created KB15765485 based on them. It's not exactly the procedure that I applied, since I did it from the Dsearch Admin UI, but the result should be the same to clean up the index. As one would say, all roads lead to Rome… 😉
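The log processing mentioned above can be as simple as extracting the failing object ids with grep. The pattern below is only a sketch, assuming the ids you are after are 16-character Documentum r_object_ids starting with 09 (dm_document), as in the stack trace above:

```shell
# Extract candidate r_object_ids from the reindex log, deduplicated,
# ready to be fed back to the IndexAgent via ids.txt
grep -oE '09[0-9a-f]{14}' dsearch.log | sort -u > ids.txt
```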

 

This article, Documentum – xPlore online rebuild stopped because of "immense term", first appeared on the dbi services Blog.

Documentum – IndexAgent can’t start in normal mode


Everybody familiar with Documentum knows that just starting the JBoss/WildFly instance hosting an IndexAgent isn't enough to have indexing working: the IndexAgent must also be started from its UI (or via DA, via the job, via iapi, automatically at Repository startup, …). Starting the IA in "normal mode" is usually something that takes a few seconds. A few times, I have faced an IA that apparently didn't want to start: whenever the request was submitted, it would just try but never succeed. In this blog, I will try to explain why this happens and what can be done to restore it.

When an IndexAgent starts, it does a few things: it sets up the filters/exclusions, it checks all the configured parameters and finally it communicates with the Repository to do a cleanup. The step that most probably causes this "issue" is the last one. Whenever the IndexAgent is running, it consumes documents for indexing, and during this process it marks some of the items in the dmi_queue_item table as taken into account. However, if the xPlore Server is stopped while these items are being processed, the processing might not complete, leaving cancelled in-progress tasks behind. To avoid non-indexed documents, the very first task of the IndexAgent, even before it is marked as started in normal mode, is therefore to reset the status of these items, putting them back into the queue to be processed. The IndexAgent will never be marked as running if this doesn't complete, and this is exactly what happens when you face this issue of an IndexAgent stuck in the start process.

To see the details of the start process of an IndexAgent, you can just look into its log file whenever you submit the request. This is an example of a “working” startup:

2020-11-13 14:29:29,765 INFO FtIndexAgent [http--0.0.0.0-9202-3]DM_INDEX_AGENT_START
2020-11-13 14:29:29,808 INFO Context [http--0.0.0.0-9202-3]Filter cabinets_to_exclude value: Temp, System, Resources,
2020-11-13 14:29:29,808 INFO Context [http--0.0.0.0-9202-3]Filter types_to_exclude value: dmi_expr_code, dmc_jar, dm_method, dm_activity, dmc_module, dmc_aspect_type, dm_registered, dm_validation_descriptor, dm_location, dmc_java_library, dm_public_key_certificate, dm_client_registration, dm_procedure, dmc_dar, dm_process, dmc_tcf_activity_template, dm_ftwatermark, dmc_wfsd_type_info, dm_menu_system, dm_plugin, dm_script, dmc_preset_package, dm_acs_config, dm_business_pro, dm_client_rights, dm_cont_transfer_config, dm_cryptographic_key, dm_docbase_config, dm_esign_template, dm_format_preferences, dm_ftengine_config, dm_ftfilter_config, dm_ftindex_agent_config, dm_jms_config, dm_job, dm_mount_point, dm_outputdevice, dm_server_config, dm_xml_application, dm_xml_config, dm_ftquery_subscription, dm_smart_list,
2020-11-13 14:29:29,808 INFO Context [http--0.0.0.0-9202-3]Filter folders_to_exclude value: /Temp/Jobs, /System/Sysadmin/Reports, /System/Sysadmin/Jobs,
2020-11-13 14:29:29,811 INFO AgentInfo [http--0.0.0.0-9202-3]Start
Documentum Index Agent 1.5.0170.0173
Java Version                    1.7.0_72
DFC Version                     7.2.0170.0165
DMCL Version                    7.2.0170.0165
Docbase (Repo01)                7.2.0160.0297  Linux64.Oracle

Start Configuration Information
 Instance
  indexagent_instance_name(AgentInstanceName)=xplore_server01_9200_IndexAgent
  docbase_name(DocbaseName)=Repo01
  docbase_user(DocbaseUser)=
  docbase_domain(DocbaseDomain)=
  runaway_item_timeout(RunawayItemTimeout)=600000
  runaway_thread_timeout(RunawayThreadTimeout)=600000
  parameter_list(InstanceOptionalParams)
 Status
  frequency(StatusFrequency)=5000
  history_size(StatusHistorySize)=20
 Connectors
  class_name(ClassName)=com.documentum.server.impl.fulltext.indexagent.connector.DocbaseNormalModeConnector
  parameter_list(Options)
   parameter=save_queue_items, value=false
   parameter=queue_user, value=dm_fulltext_index_user
   parameter=wait_time, value=60000
   parameter=batch_size, value=1000
  class_name(ClassName)=com.documentum.server.impl.fulltext.indexagent.connector.FileConnector
  parameter_list(Options)
   parameter=wait_time, value=2000
   parameter=batch_size, value=100
   parameter=file_name, value=ids.txt
 Exporter
  queue_size(PrepQSize)=250
  queue_low_percent(PrepQLowPercentage)=90
  wait_time(PrepWaitTime)=100
  thread_count(PrepWorkers)=2
  shutdown_timeout(PrepShutdownTimeout)=60000
  runaway_timeout(RunawayItemTimeout)=600000
  all_filestores_local(areAll_filestores_local)=false
  local_content_area(LocalContentArea)=/data/primary/Indexagent_Repo01/export
  local_filestore_map(LocalFileStoreMap)
  local_content_remote_mount(LocalContentRemoteMount)=null
  content_clean_interval(ContentCleanInterval)=2000000
  keep_dftxml(KeepDftxml)=false
  parameter_list(PrepOptionalParameters)=
   parameter=contentSizeLimit, value=367001600
 Indexer
  queue_size(IndexQSize)=500
  queue_low_percent(IndexQLowPercentage)=90
  queue_size(CallbackQSize)=200
  queue_low_percent(CallbackQLowPercentage)=90
  wait_time(IndexWaitTime)=100
  thread_count(IndexWorkers)=1
  shutdown_timeout(IndexShutdownTimeout)=60000
  runaway_timeout(IndexRunawayTimeout)60000
  partition_config
   default_partition collection_name(DefaultCollection)=null
  partitions(PartitionMap)
 Indexer Plugin Config
  class_name(IndexerClassName)=com.documentum.server.impl.fulltext.indexagent.plugins.enterprisesearch.DSearchFTPlugin
  parameter_list(IndexerParams)
   parameter=dsearch_qrserver_host, value=lb_xplore_server.domain.com
   parameter=query_plugin_mapping_file, value=/app/dctm/server/fulltext/dsearch/dm_AttributeMapping.xml
   parameter=max_tries, value=2
   parameter=max_pending_requests, value=10000
   parameter=load_balancer_enabled, value=true
   parameter=dsearch_qrserver_protocol, value=HTTPS
   parameter=dsearch_qrygen_mode, value=both
   parameter=security_mode, value=BROWSE
   parameter=max_requests_in_batch, value=10
   parameter=dsearch_qrserver_port, value=9302
   parameter=dsearch_config_port, value=9302
   parameter=dsearch_config_host, value=xplore_server01.domain.com
   parameter=max_batch_wait_msec, value=1000
   parameter=dsearch_qrserver_target, value=/dsearch/IndexServerServlet
   parameter=dsearch_domain, value=Repo01
   parameter=group_attributes_exclude_list, value=i_all_users_names
End Configuration Information

2020-11-13 14:29:29,828 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] running DQL query: select primary_class from dmc_module where any a_interfaces = 'com.documentum.fc.indexagent.IDfCustomIndexFilter'
2020-11-13 14:29:29,833 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter: com.documentum.services.message.impl.type.MailMessageChildFilter
2020-11-13 14:29:29,834 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter: com.documentum.services.message.impl.type.MailMessageChildFilter
2020-11-13 14:29:29,834 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter: com.documentum.server.impl.fulltext.indexagent.filter.defaultCabinetFilterAction
2020-11-13 14:29:29,834 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter: com.documentum.server.impl.fulltext.indexagent.filter.defaultFolderFilterAction
2020-11-13 14:29:29,834 INFO ObjectFilter [http--0.0.0.0-9202-3][DM_INDEX_AGENT_CUSTOM_FILTER_INFO] instantiated filter: com.documentum.server.impl.fulltext.indexagent.filter.defaultTypeFilterAction
2020-11-13 14:29:29,869 INFO defaultFilters [http--0.0.0.0-9202-3]Populated cabinet cache for filter CabinetsToExclude with count 3
2020-11-13 14:29:30,462 INFO defaultFilters [http--0.0.0.0-9202-3]Populated folder id cache for filter FoldersToExclude with count 140
2020-11-13 14:29:30,488 INFO DocbaseNormalModeConnector [http--0.0.0.0-9202-3][DM_INDEX_AGENT_QUERY_BEGIN] update dmi_queue_item objects set task_state = ' ', set sign_off_user = ' ', set dequeued_by = ' ', set message = ' ' where name = 'dm_fulltext_index_user' and task_state = 'acquired' and sign_off_user = 'xplore_server01_9200_IndexAgent'
2020-11-13 14:29:30,488 INFO DocbaseNormalModeConnector [http--0.0.0.0-9202-3][DM_INDEX_AGENT_QUERY_UPDATE_COUNT] 0
2020-11-13 14:29:30,489 INFO ESSIndexer [http--0.0.0.0-9202-3][DM_INDEX_AGENT_PLUGIN] DSS Server host: xplore_server01.domain.com
2020-11-13 14:29:30,489 INFO ESSIndexer [http--0.0.0.0-9202-3][DM_INDEX_AGENT_PLUGIN] DSS Server protocol: HTTPS
2020-11-13 14:29:30,489 INFO ESSIndexer [http--0.0.0.0-9202-3][DM_INDEX_AGENT_PLUGIN] DSS Server port: 9302
2020-11-13 14:29:30,489 INFO ESSIndexer [http--0.0.0.0-9202-3][DM_INDEX_AGENT_PLUGIN] DSS Server domain: Repo01
2020-11-13 14:29:30,502 INFO ESSIndexer [http--0.0.0.0-9202-3][DM_INDEX_AGENT_PLUGIN] Index Server Status: normal

 

When this issue occurs, the last lines of the listing above, from the DM_INDEX_AGENT_QUERY_UPDATE_COUNT line onwards, will not appear. As you can see, the DQL query executed is recorded in the log, as well as the number of items it updated. The "issue" is that if there are too many items matching the WHERE clause (acquired items), this query can take hours to complete (if it completes at all), and it therefore looks as if the start isn't working. Because of how DQL works, this kind of query on thousands of objects or more is very DB-intensive, which introduces a big performance hit.

How do you end up with hundreds of thousands or even millions of acquired items, you may ask? Well, each time it happened to me, it was related to huge batches or jobs updating millions of items, or to big migrations/imports of objects. As you know, the events registered in the dmi_registry table trigger the creation of new entries in the dmi_queue_item table. Therefore, whenever you are importing a lot of documents, it is highly recommended to carefully manage the queue table, because it can cause huge performance issues: it is used a lot inside Documentum for various purposes. This is especially true whenever Lifecycles are in the picture, because then processes (like ApplyD2Config) generate a lot of dm_save events per document and therefore duplicates in the table. I won't go into these details in this blog, but in short, you can choose to remove the events from dmi_registry during the import and put them back afterwards (manually indexing the imported documents at the end), or do manual cleanups of the dmi_queue_item table during the process. Unfortunately, if you aren't aware that a huge migration is taking place, the situation can quickly become complicated, with millions and millions of items. The last time I saw something similar, it was an import started "in secret" before the weekend, filling the dmi_queue_item table. The IndexAgent was started and processed items, but it wasn't fast enough. On the Monday morning, we had the pleasant surprise of seeing around 6 million acquired items and 9 million more awaiting…
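A quick way to spot this situation is to look at the distribution of the fulltext queue items per state, with a DQL query along these lines (a sketch, run with an account allowed to see the queue):

```sql
-- DQL: how many fulltext queue items are in each state?
select task_state, count(*)
from dmi_queue_item
where name = 'dm_fulltext_index_user'
group by task_state
```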

I think (to be confirmed) this behavior changed in more recent versions, but this environment was using xPlore 1.5, where the IndexAgent might pull new batches of documents for processing even if a lot are still in progress. The xPlore Servers (a Federation) weren't idle at all, since they had already processed millions of items, but there were just too many to handle, and unfortunately the IA entered a kind of dead end where updating the dmi_queue_item table would take too long for processing to become effective again. I didn't try to restart the IndexAgent because I knew it would never complete, but I thought this might make an interesting blog post. There is probably a KB on the OpenText site describing this, since it is rather well known.

As you might expect, triggering a DQL query that is supposed to update 6 million rows in a table containing at least three times that many isn't going to happen. So what can be done to restore system performance and allow the IndexAgent to restart properly? DQL isn't very good at processing huge batches, so your best bet is to go to the database directly to avoid the Documentum overhead. Instead of executing one single SQL command to update the 6 million items, you should also split it into smaller batches, for example by adding a WHERE clause on the date. That helps tremendously, and it's not something the IndexAgent can do by itself, because it has no idea when things started to go south… So which kind of command should be executed? In this case, I wouldn't recommend doing what the IndexAgent does: if you simply reset the status from acquired to awaiting, the IndexAgent will be able to start, but it will still have 6+9 million items awaiting processing, so you will still have bad performance and a pretty high probability that the number of acquired items rises again… Therefore, the only reasonable choice is to export all distinct items from the dmi_queue_item table and then clean/remove all FT items. With some luck, you might have 5 or 10 duplicates per document, so instead of indexing 15 million items, it would just be 1 or 2 million (distinct).

An example SQL command (for Oracle) to clean up all items in a one-hour timeframe would be the following (I would suggest making sure the IA isn't running when messing with the table):

DELETE FROM dmi_queue_item_s
WHERE name='dm_fulltext_index_user'
  AND delete_flag=0
  AND date_sent>=to_date('2020-06-28 22:00:00','YYYY-MM-DD HH24:MI:SS')
  AND date_sent<to_date('2020-06-28 23:00:00','YYYY-MM-DD HH24:MI:SS');
commit;

 

This cleanup can be done online without issue; just make sure you take an export of all distinct item_id values to re-index afterwards, otherwise you will have to execute the FT Integrity utility to find the documents missing from the index. With parallel execution on several DB sessions, the cleanup can actually be done rather quickly, and then it's just background processing for the index, via the ids.txt for example.
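The export mentioned above can be as simple as a distinct list of the remaining object ids, spooled to a file and fed back to the IndexAgent through the ids.txt. A sketch, to be run before the DELETE batches, using the same table and predicates:

```sql
-- distinct object ids still queued for indexing (spool this to ids.txt)
SELECT DISTINCT item_id
FROM dmi_queue_item_s
WHERE name = 'dm_fulltext_index_user'
  AND delete_flag = 0;
```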

 

This article, Documentum – IndexAgent can't start in normal mode, first appeared on the dbi services Blog.

How to declare TNS entries in Oracle Unified Directory (OUD)


Using an LDAP server to store the TNS connection strings provides a single point of declaration for all client tools. Note that it can be a single point of failure too; thus, a highly available LDAP configuration is recommended for production use. In my case, I was interested in using an LDAP server as the TNS connection repository for the WebLogic Domain data source connections. I used an Oracle Unified Directory (OUD).

The first step is to enable the Oracle Database Net Services in OUD. I chose to create a new Naming Context to isolate the TNS declarations from the users and groups.

Connect to the OUD
Connect to the OUD instance

Create a new Naming Context in the configuration TAB. I chose “dc=databaseconnection,dc=com” as the new naming context
Creating a new Naming Context

Enable this new Naming Context to store Oracle Database Net Services
Enabling the New Naming Context for Database Net Services

Move to the OUD data browser and select the OracleContext entry in the dc=databaseconnection,dc=com Naming Context created above.
OUD data browser
This OracleContext entry was created automatically and, at the same time, some policies were created to allow anonymous users to query this OracleContext.

Create a TNS entry pointing to the database. This is done by creating a new entry and selecting the orclNetService object class in the first wizard.
Create New TNS entry
New TNS Entry: give a name
New TNS entry: fill up the attributes
New TNS entry: select the Attribute used in the DN
New TNS Entry: Summary
New TNS Entry: The ORCL entry once created.

Test that the TNS name resolution works using an ldapsearch request:

[oracle@vm02 ~]$ /u00/app/oracle/product/12.1.0/dbhome_1/bin/ldapsearch -h vm01 -p 1389 -b dc=DatabaseConnection,dc=com cn=orcl
cn=orcl,cn=OracleContext,dc=databaseconnection,dc=com
orclNetDescString=(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)   (HOST = vm02.dbi-workshop.com)(PORT =1521))) (CONNECT_DATA = (SID = ORCL)))
orclVersion=12.2.0.4
cn=orcl
objectClass=top
objectClass=orclNetService
orclNetDescName=ORCL Demonstration DB

Of course, once the new Naming Context has been created and the Oracle Net Services enabled in it, the same TNS entry can also be created using an ldif file.

[oracle@vm02 ~]$ more orcl.ldif
dn: cn=orcl,cn=OracleContext,dc=databaseconnection,dc=com
objectClass: top
objectClass: orclNetService
orclNetDescString: (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)   (HOST = vm02.dbi-workshop.com)(PORT =1521))) (CONNECT_DATA = (SID = ORCL)))
orclVersion: 12.2.0.4
cn: orcl
orclNetDescName: ORCL Demonstration DB
[oracle@vm02 ~]$ ldapadd -h vm01 -p 1389 -D "cn=Directory Manager" -w ****** -f orcl.ldif
adding new entry cn=orcl,cn=OracleContext,dc=databaseconnection,dc=com

[oracle@vm02 ~]$

Now that we have our LDAP server configured to store TNS connections, the following blog will explain how to configure a WebLogic JDBC Datasource to resolve the database connection using an LDAP definition.

The article How to declare TNS entries in Oracle Unified Directory (OUD) appeared first on the dbi services Blog.

Using LDAP resolved connection descriptor in WebLogic JDBC data sources.

I got the question whether it is possible to have a WebLogic JDBC Data Source resolved by an LDAP entry. The answer is yes: since WebLogic 12.2.1.3, a JDBC connection pool URL declaration can point to an LDAP entry.

This can be done by simply editing an existing JDBC data source.

jdbc:oracle:thin:@ldap://vm01.dbi-workshop.com:1389/cn=orcl,cn=OracleContext,dc=DatabaseConnection,dc=com

WebLogic_JDBC_Connection_pool

Of course, the LDAP server needs to be configured to store the TNS entries; I explained how to do this in one of my previous blogs: how-to-declare-tns-entries-in-oracle-unified-directory-oud.
The data source can also be created using WLST scripts. First, a properties file:

DS_NAME=MyLDAPDatasource
JNDIName=jdbc/MyLDAPDatasource
TEST_TABLE_QUERY=SQL SELECT 1 FROM DUAL
JDBC_DRIVER=oracle.jdbc.OracleDriver
TARGET=Server1
JDBC_URL=jdbc:oracle:thin:@ldap://vm01.dbi-workshop.com:1389/cn=orcl,cn=OracleContext,dc=DatabaseConnection,dc=com
DB_USER=USER01
DB_USER_PASSWORD=Welcome1
POOL_MIN_CAPACITY=10
POOL_MAX_CAPACITY=100
POOL_INITIAL_CAPACITY=1
POOL_STATEMENT_CACHE=10           
XA_TRANSACTION_TIMEOUT=7200
XA_RETRY_INTERVAL_SECONDS=60
XA_RETRY_DURATION_SECONDS=300
JDBC_DEBUG_LEVEL=10

and then the Python script:

import os
import sys
import jarray
from java.lang import String
from javax.management import ObjectName

#read the domain properties file
try:
  print "Load properties file"
  properties =  os.environ["WEBLOGIC_DOMAIN_DEF_DIR"] + "/" + os.environ["WEBLOGIC_DOMAIN"] + "/domain.properties"
  print properties
  loadProperties(properties)
except :
  print "unable to load domain.properties file"
#  exit(exitcode=1)

try:
  jdbcProperties=os.path.realpath(os.path.dirname(sys.argv[0])) + "/JDBC_Datasource.properties"
  print jdbcProperties
  loadProperties(jdbcProperties)

except :
  print "Unable to load JDBC_Camunda.properties"
  exit(exitcode=1)

#AdminUser=raw_input('Please Enter WebLogic Domain Admin user Name: ')
#AdminPassword= "".join(java.lang.System.console().readPassword("%s", ['Please enter WebLogic Domain Admin user password:']))


try:
    #Connect to AdminServer
    connect(userConfigFile=CONFIG_FILE,userKeyFile=KEY_FILE,url=ADMIN_URL)
    #connect(url=ADMIN_URL)
    #connect(AdminUser,AdminPassword,ADMIN_URL)
    #connect()
    #connect('weblogic','Welcome1')
except:
    print "Unable to connect"
    exit(exitcode=1)
	
try: 
    edit()
    startEdit()

    cd('/')
    cmo.createJDBCSystemResource(DS_NAME)
    cd('/JDBCSystemResources/'+DS_NAME+'/JDBCResource/'+DS_NAME)
    cmo.setName(DS_NAME)
    cd('/JDBCSystemResources/'+DS_NAME+'/JDBCResource/'+DS_NAME+'/JDBCDataSourceParams/'+DS_NAME)

    print "Setting JNDI Names"
    set('JNDINames',jarray.array([String(JNDIName)], String))
    cd('/JDBCSystemResources/'+DS_NAME+'/JDBCResource/'+DS_NAME)
    cmo.setDatasourceType('GENERIC')
    cd('/JDBCSystemResources/'+DS_NAME+'/JDBCResource/'+DS_NAME+'/JDBCDriverParams/'+DS_NAME)

    print "Setting JDBC URL"
    cmo.setUrl(JDBC_URL)

    print "Setting Driver Name"
    cmo.setDriverName(JDBC_DRIVER)
    
    print "Setting Password"
    set('Password', DB_USER_PASSWORD)
    cd('/JDBCSystemResources/'+DS_NAME+'/JDBCResource/'+DS_NAME+'/JDBCConnectionPoolParams/'+DS_NAME)
    cmo.setTestTableName(TEST_TABLE_QUERY)
    cd('/JDBCSystemResources/'+DS_NAME+'/JDBCResource/'+DS_NAME+'/JDBCDriverParams/'+DS_NAME+'/Properties/'+DS_NAME)
    cmo.createProperty('user')
    cd('/JDBCSystemResources/'+DS_NAME+'/JDBCResource/'+DS_NAME+'/JDBCDriverParams/'+DS_NAME+'/Properties/'+DS_NAME+'/Properties/user')
    cmo.setValue(DB_USER)

    cd('/JDBCSystemResources/'+DS_NAME)
    set('Targets',jarray.array([ObjectName('com.bea:Name='+TARGET+',Type=Server')], ObjectName))
   
    print "Saving and activating changes"
    save()    
    activate()

except Exception, e:
    dumpStack()
    print "ERROR 2... check error messages for cause."
    print e
    dumpStack()
    stopEdit(defaultAnswer='y')
    exit(exitcode=1)
	
try: 
    edit()
    startEdit()

    cd('/JDBCSystemResources/'+DS_NAME+'/JDBCResource/'+DS_NAME+'/JDBCDriverParams/'+DS_NAME+'/Properties/'+DS_NAME+'/Properties/user')
    cmo.unSet('SysPropValue')
    cmo.unSet('EncryptedValue')
    cmo.setValue(DB_USER)
    cd('/JDBCSystemResources/'+DS_NAME+'/JDBCResource/'+DS_NAME+'/JDBCConnectionPoolParams/'+DS_NAME)
    cmo.setInitialCapacity(long(POOL_INITIAL_CAPACITY))
    cmo.setMinCapacity(long(POOL_MIN_CAPACITY))
    cmo.setStatementCacheSize(long(POOL_STATEMENT_CACHE))
    cmo.setMaxCapacity(long(POOL_MAX_CAPACITY))
    cmo.setStatementCacheType('LRU')
    cd('/JDBCSystemResources/'+DS_NAME+'/JDBCResource/'+DS_NAME+'/JDBCDataSourceParams/'+DS_NAME)
    cmo.setGlobalTransactionsProtocol('OnePhaseCommit')
       
    save()
    activate()

except Exception, e:
    print "ERROR... check error messages for cause."
    print e
    stopEdit(defaultAnswer='y')
    exit(exitcode=1)
	
exit(exitcode=0)

This script and properties file can be used to create the JDBC connection on one WebLogic Server defined as TARGET in the properties file.
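As a sketch of how the two files are used together (the script name and the paths below are assumptions, adjust them to your environment), the script would be executed through WLST:

```shell
# Hypothetical names/paths - the script reads domain.properties from
# $WEBLOGIC_DOMAIN_DEF_DIR/$WEBLOGIC_DOMAIN and JDBC_Datasource.properties
# from its own directory (see the loadProperties() calls at the top)
export WEBLOGIC_DOMAIN_DEF_DIR=/u01/config
export WEBLOGIC_DOMAIN=my_domain
$MW_HOME/oracle_common/common/bin/wlst.sh createLDAPDataSource.py
```

The connect() call in the script expects a user configuration/key file pair (CONFIG_FILE/KEY_FILE) and the ADMIN_URL to be defined in the domain.properties file it loads first.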

The article Using LDAP resolved connection descriptor in WebLogic JDBC data sources appeared first on the dbi services Blog.

Documentum – D2 doesn’t load repositories with “Unexpected error occured”

I had a case today where all Documentum components were up and running, including D2, but when accessing its login page the repositories wouldn't appear; a message “An unexpected error occurred. Please refresh your browser” would pop up in the lower-right corner and disappear quickly. Refreshing the browser or opening a private window didn't change anything. In such cases, the first thing to do is of course to make sure the docbroker and repositories are responding, but if they are, then what could be the problem? The root cause can be several things, I assume, since it's a rather generic behavior, but I have seen this a few times already and it might not be obvious at first glance, so sharing some thoughts about it might prove useful for someone.

Here is the login screen of D2 having the issue:

In my case, the repositories were apparently available on the Content Server and responding (connections through iapi/idql worked). The next step would probably be to check the D2 logs with DEBUG enabled to capture as much as possible. This is what you would see in the logs when accessing the D2 login URL:

2020-11-29 11:12:36,434 UTC [DEBUG] ([ACTIVE] ExecuteThread: '37' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x3.portal.server.utils.X3PortalJspUtils   : D2 full build version: 16.5.1050 build 096
2020-11-29 11:12:36,435 UTC [DEBUG] ([ACTIVE] ExecuteThread: '37' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x3.portal.server.utils.X3PortalJspUtils   : patch version: 16.5.1050
2020-11-29 11:12:36,886 UTC [DEBUG] ([ACTIVE] ExecuteThread: '66' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x.s.s.labels.X3ResourceBundleFactory      : getAllBundle for resources.i18n en
2020-11-29 11:12:36,924 UTC [DEBUG] ([ACTIVE] ExecuteThread: '99' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x.p.s.s.settings.RpcSettingsServiceImpl   : Fetching Server properties
2020-11-29 11:12:36,940 UTC [DEBUG] ([ACTIVE] ExecuteThread: '21' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x.p.s.s.settings.RpcSettingsServiceImpl   : Fetching Server shiro.ini
2020-11-29 11:12:36,942 UTC [DEBUG] ([ACTIVE] ExecuteThread: '55' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x.p.s.s.settings.RpcSettingsServiceImpl   : Fetching Server adminMessage Settings
2020-11-29 11:12:36,978 UTC [DEBUG] ([ACTIVE] ExecuteThread: '84' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x.s.s.labels.X3ResourceBundleFactory      : getAllBundle for resources.i18n en_US
2020-11-29 11:12:37,709 UTC [DEBUG] ([ACTIVE] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.common.dctm.objects.DfDocbaseMapEx    : Load docbases from docbrocker 0.623s
2020-11-29 11:12:37,711 UTC [INFO ] ([ACTIVE] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.d2fs.dctm.web.services.D2fsRepositories   : Loaded repositories from docbroker: GR_Repo,Repo1
2020-11-29 11:12:37,712 UTC [INFO ] ([ACTIVE] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.d2fs.dctm.web.services.D2fsRepositories   : loginRepositoryFilter=GR_Repo
2020-11-29 11:12:37,713 UTC [INFO ] ([ACTIVE] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.d2fs.dctm.web.services.D2fsRepositories   : Filtering out repository GR_Repo
2020-11-29 11:12:37,713 UTC [DEBUG] ([ACTIVE] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2.api.config.D2OptionsCache          : D2Info element not for in cache
2020-11-29 11:12:37,713 UTC [ERROR] ([ACTIVE] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2.api.config.D2OptionsCache          : Trying to fetch D2Info before it's been set
2020-11-29 11:12:37,815 UTC [DEBUG] ([ACTIVE] ExecuteThread: '51' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2.api.D2Session                      : D2Session::initTBOEx after tbos from map 0.000s
2020-11-29 11:12:37,815 UTC [DEBUG] ([ACTIVE] ExecuteThread: '51' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2.api.D2Session                      : D2Session::initTBOEx after tbos C6-dbor bundle 0.001s
2020-11-29 11:12:38,808 UTC [INFO ] ([ACTIVE] ExecuteThread: '27' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.x3.portal.server.X3HttpSessionListener  : Created http session 3tGeYTFa9ChEQJP-V7GdMyQreCk3t7_BFfS3EixfHtTbO6qFtOg3!781893690!1606648358808
2020-11-29 11:12:38,809 UTC [DEBUG] ([ACTIVE] ExecuteThread: '27' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x3.portal.server.utils.X3PortalJspUtils   : XSRF_TOKEN not found in session
2020-11-29 11:12:38,811 UTC [DEBUG] ([ACTIVE] ExecuteThread: '27' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x3.portal.server.utils.X3PortalJspUtils   : D2 full build version: 16.5.1050 build 096
2020-11-29 11:12:38,811 UTC [DEBUG] ([ACTIVE] ExecuteThread: '27' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x3.portal.server.utils.X3PortalJspUtils   : patch version: 16.5.1050


At first glance, the log content doesn't look strange: there are no obvious warnings or errors clearly showing the issue. As you can see, the list of repositories is present and filtered properly, so the drop-down should display something, but it doesn't. The only message that might give you a hint is the one error and its associated debug message just before, about the D2OptionsCache: the D2Info elements aren't in the cache while D2 is trying to use them. In this case, the only way to clearly see what the issue actually is would be to restart the Application Server of D2 to force the LoadOnStartup to be re-executed. Maybe this is only true when LoadOnStartup is enabled; I didn't test without it, but it might be worth checking whether D2 is able to refresh the cache at runtime in that case. After a restart of the Application Server, it becomes clear what the problem is:

2020-11-29 11:18:28,421 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.d2fs.services.ServiceBeanPostProcessor  : Initialized Bean : d2fs
2020-11-29 11:18:28,426 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.d2fs.services.ServiceBeanPostProcessor  : Initialized Bean : subscriptionsService
2020-11-29 11:18:28,427 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.d2fs.services.ServiceBeanPostProcessor  : Service Bean is set to Remote
2020-11-29 11:18:28,431 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.d2fs.services.ServiceBeanPostProcessor  : Initialized Bean : exceptionResolver
2020-11-29 11:18:28,433 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.d2fs.services.ServiceBeanPostProcessor  : Initialized Bean : soapProvider
2020-11-29 11:18:28,503 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.d2fs.dctm.servlets.init.LoadOnStartup   : DFC version : 16.4.0200.0080
2020-11-29 11:18:28,543 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : LoadOnStartup - START =====================================
2020-11-29 11:18:28,544 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : LoadOnStartup - HTTP Headers
Remote : null (null)
Locale : null
Request Protocol : null
Request Method : null
Context Path : /D2
Request URI : null
Request encoding : null
Request Parameters :
Request Headers :
2020-11-29 11:18:30,799 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : LoadOnStartup - Plugins (0.001s)
2020-11-29 11:18:30,803 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : LoadOnStartup - Start plugin before : D2-Widget v16.5.1050 build 096
2020-11-29 11:18:30,804 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : LoadOnStartup - End plugin before : D2-Widget v16.5.1050 build 096 0.000s
2020-11-29 11:18:30,806 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : LoadOnStartup - Standard Servlet :
2020-11-29 11:18:30,808 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.d2fs.dctm.servlets.init.LoadOnStartup   : Cache BOCS URL disabled.
2020-11-29 11:18:40,865 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.emc.d2fs.dctm.servlets.init.LoadOnStartup   : Free memory=3.1386707 GB, Total memory=4.0 GB
2020-11-29 11:18:50,217 UTC [ERROR] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : LoadOnStartup - DfException:: THREAD: [ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'; MSG: [DM_STORAGE_E_NOT_ACCESSIBLE]error:  "Storage area filestore_01 is not currently accessible.  Reason:  errno: 2, message: No such file or directory."; ERRORCODE: 100; NEXT: null
2020-11-29 11:18:50,220 UTC [ERROR] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : {}
com.documentum.fc.common.DfException: [DM_STORAGE_E_NOT_ACCESSIBLE]error:  "Storage area filestore_01 is not currently accessible.  Reason:  errno: 2, message: No such file or directory."
        at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
        at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:329)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForInt(NetwiseDocbaseRpcClient.java:600)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$6.evaluate(DocbaseConnection.java:1382)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1180)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForInt(DocbaseConnection.java:1375)
        at com.documentum.fc.client.impl.docbase.DocbaseApi.makePuller(DocbaseApi.java:630)
        at com.documentum.fc.client.impl.connection.docbase.RawPuller.<init>(RawPuller.java:22)
        at com.documentum.fc.client.impl.session.Session.makePuller(Session.java:3796)
        at com.documentum.fc.client.impl.session.SessionHandle.makePuller(SessionHandle.java:2468)
        at com.documentum.fc.client.content.impl.BlockPuller.<init>(BlockPuller.java:27)
        at com.documentum.fc.client.content.impl.PusherPullerContentAccessor.buildStreamFromContext(PusherPullerContentAccessor.java:40)
        at com.documentum.fc.client.content.impl.PusherPullerContentAccessor.getStream(PusherPullerContentAccessor.java:28)
        at com.documentum.fc.client.content.impl.ContentAccessorFactory.getStream(ContentAccessorFactory.java:37)
        at com.documentum.fc.client.content.impl.Store.getStream(Store.java:64)
        at com.documentum.fc.client.content.impl.FileStore___PROXY.getStream(FileStore___PROXY.java)
        at com.documentum.fc.client.content.impl.Content.getStream(Content.java:185)
        at com.documentum.fc.client.content.impl.Content___PROXY.getStream(Content___PROXY.java)
        at com.documentum.fc.client.content.impl.ContentManager.getStream(ContentManager.java:84)
        at com.documentum.fc.client.content.impl.ContentManager.namelessGetFile(ContentManager.java:252)
        at com.documentum.fc.client.content.impl.ContentManager.getFile(ContentManager.java:198)
        at com.documentum.fc.client.content.impl.ContentManager.getFile(ContentManager.java:173)
        at com.documentum.fc.client.DfSysObject.getFileEx2(DfSysObject.java:1978)
        at com.documentum.fc.client.DfSysObject.getFileEx(DfSysObject.java:1970)
        at com.documentum.fc.client.DfSysObject.getFile(DfSysObject.java:1965)
        at com.emc.d2.api.config.modules.property.D2PropertyConfig___PROXY.getFile(D2PropertyConfig___PROXY.java)
        at com.emc.common.java.xml.XmlCacheValue.<init>(XmlCacheValue.java:63)
        at com.emc.common.java.xml.XmlCacheImpl.getXmlDocument(XmlCacheImpl.java:154)
        at com.emc.common.java.xml.XmlCacheImpl.getXmlDocument(XmlCacheImpl.java:182)
        at com.emc.d2fs.dctm.servlets.init.LoadOnStartup.loadXmlCache(LoadOnStartup.java:501)
        at com.emc.d2fs.dctm.servlets.init.LoadOnStartup.refreshCache(LoadOnStartup.java:424)
        at com.emc.d2fs.dctm.servlets.init.LoadOnStartup.processRequest(LoadOnStartup.java:208)
        at com.emc.d2fs.dctm.servlets.D2HttpServlet.execute(D2HttpServlet.java:244)
        at com.emc.d2fs.dctm.servlets.D2HttpServlet.doGetAndPost(D2HttpServlet.java:510)
        at com.emc.d2fs.dctm.servlets.D2HttpServlet.doGet(D2HttpServlet.java:113)
        at com.emc.d2fs.dctm.servlets.init.LoadOnStartup.init(LoadOnStartup.java:136)
        at javax.servlet.GenericServlet.init(GenericServlet.java:244)
		...
        at weblogic.work.ExecuteThread.execute(ExecuteThread.java:420)
        at weblogic.work.ExecuteThread.run(ExecuteThread.java:360)
2020-11-29 11:18:50,230 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.d.dctm.web.services.D2fsSessionManager    : Using non-sso shiro SSO filter with non-sso.enableDFCPrincipalMode=false
2020-11-29 11:18:50,231 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.d.dctm.web.services.D2fsSessionManager    : Not using DFC Principal Support
2020-11-29 11:18:50,232 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : LoadOnStartup - Free memory=2.5813167 GB. Total memory=4.0 GB.
2020-11-29 11:18:50,232 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - com.emc.d2fs.dctm.servlets.D2HttpServlet      : LoadOnStartup - END (21.726s) =====================================

2020-11-29 11:18:50,235 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x3.portal.server.servlet.init.LogMemory   : D2SecurityConfiguration : Start
2020-11-29 11:18:50,235 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x3.portal.server.servlet.init.LogMemory   : ServletContext: D2
2020-11-29 11:18:50,269 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x3.portal.server.servlet.init.LogMemory   : D2SecurityConfiguration : End
2020-11-29 11:18:50,270 UTC [INFO ] ([ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)') - c.e.x3.portal.server.servlet.init.LogMemory   : Free memory=2.5780156 GB, Total memory=4.0 GB


So, as you can see above, the issue is actually linked to the data of the repositories not being available, and it only shows up during the LoadOnStartup execution; afterwards it doesn't appear anymore. Here, it was the NAS that was unreachable at that time, so D2 was impacted and nobody could log in. From my point of view, it's a pity that D2 behaves this way… Even if the data/documents aren't reachable, in a perfect world this shouldn't prevent you from logging into the system and using it, except for actions involving the content of the documents of course. Browsing the repository, checking properties and some other things should work without issue, but they don't, because of how Documentum is designed and how it works.

Because the LoadOnStartup actions are only executed at startup (if enabled), it means that once the data of the repositories is back, you will need to restart D2 again, otherwise the issue will remain. Therefore, if you face this issue, it is worth checking whether the data was available when D2 started, even if it is available now. In addition, a restart of D2 never really hurts…
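Since the trigger is storage availability, a quick reachability check of the filestore root before (re)starting D2 can save some time. This is just a sketch; the path below is an assumption, the real one has to be looked up in the dm_location/dm_filestore objects of your repository:

```shell
#!/bin/sh
# Hypothetical filestore root - check dm_location/dm_filestore for the real path
DATA_ROOT="${DOCUMENTUM:-/app/dctm}/data/Repo1/content_storage_01"
if [ -d "$DATA_ROOT" ]; then
  STATUS="reachable"
else
  STATUS="not reachable"
fi
echo "Filestore root $DATA_ROOT is $STATUS"
```

Remember that even once the path is reachable again, D2 still needs a restart so that LoadOnStartup can re-run.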

If you encountered this behavior of D2 with another root cause, feel free to share!


The article Documentum – D2 doesn’t load repositories with “Unexpected error occured” appeared first on the dbi services Blog.

The PostgreSQL shared/global catalog

A PostgreSQL instance (or cluster) can contain many databases; three of them (template0, template1 and postgres) are there by default. Over the last years we trained many people on PostgreSQL Essentials, and there have been mainly two points that needed more clarification when it comes to catalogs and the postgres default database:

  1. Does the postgres default database define the catalog, and is it somehow the master database?
  2. What exactly is in the global catalog?

In this post we’ll look into both points, and I hope to make it clearer what the shared/global catalog contains, and that the postgres default database is not a master database and does not define the postgres catalog.

For the first point (is the default postgres database a master database, and does it define the catalog?) the answer can be given quite easily. The default postgres database is there for only one reason: most client utilities assume it is there and connect to that database by default. But this does not mean that the default postgres database is special in any way; you can go ahead and drop it:

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# drop database postgres;
DROP DATABASE
template1=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(2 rows)

template1=# 

We even have customers which do that by default. The default postgres database is nothing special and initially it is exactly the same as template1. You can easily re-create it if you want:

template1=# create database postgres;
CREATE DATABASE
template1=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

This answers the first question: the default postgres database is not a master database and it does not define the PostgreSQL catalog. Again, check here if you want more details about the three default databases.

The second question can be answered easily as well: What exactly is in the global/shared catalog? Most of the PostgreSQL catalog tables are per database, such as pg_tables:

postgres=# \d pg_tables
              View "pg_catalog.pg_tables"
   Column    |  Type   | Collation | Nullable | Default 
-------------+---------+-----------+----------+---------
 schemaname  | name    |           |          | 
 tablename   | name    |           |          | 
 tableowner  | name    |           |          | 
 tablespace  | name    |           |          | 
 hasindexes  | boolean |           |          | 
 hasrules    | boolean |           |          | 
 hastriggers | boolean |           |          | 
 rowsecurity | boolean |           |          | 

All these catalog tables and views are in a system schema called “pg_catalog”. This schema is not listed by default when you use the “\dn” shortcut in psql:

postgres=# \dn
  List of schemas
  Name  |  Owner   
--------+----------
 public | postgres
(1 row)

You need to add “S” (for system) to list the system schemas:

postgres=# \dnS
        List of schemas
        Name        |  Owner   
--------------------+----------
 information_schema | postgres
 pg_catalog         | postgres
 pg_toast           | postgres
 public             | postgres
(4 rows)

Some catalog tables/views are global to the cluster/instance and are not per database. The obvious ones are users/roles and tablespaces: neither is per database, as users/roles can have access to various databases, and various databases can store relations in the same tablespace. The question now is: how can I know if a catalog table/view is global or per database? Even global catalog tables/views are listed in the local catalog schema:

postgres=# \d pg_catalog.pg_roles
                         View "pg_catalog.pg_roles"
     Column     |           Type           | Collation | Nullable | Default 
----------------+--------------------------+-----------+----------+---------
 rolname        | name                     |           |          | 
 rolsuper       | boolean                  |           |          | 
 rolinherit     | boolean                  |           |          | 
 rolcreaterole  | boolean                  |           |          | 
 rolcreatedb    | boolean                  |           |          | 
 rolcanlogin    | boolean                  |           |          | 
 rolreplication | boolean                  |           |          | 
 rolconnlimit   | integer                  |           |          | 
 rolpassword    | text                     |           |          | 
 rolvaliduntil  | timestamp with time zone |           |          | 
 rolbypassrls   | boolean                  |           |          | 
 rolconfig      | text[]                   | C         |          | 
 oid            | oid                      |           |          | 

By only looking in the catalog schema we cannot answer that question. What we can do, however, is look at the data directory ($PGDATA). The databases are in “base” and the global/shared catalog is in “global”:

postgres@centos8pg:/home/postgres/ [pgdev] cd $PGDATA
postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] ls -l | egrep "base|global"
drwx------. 6 postgres postgres    58 Nov 21 09:50 base
drwx------. 2 postgres postgres  4096 Nov 21 09:48 global

When we look into the “global” directory we’ll see a number of OIDs (object identifiers); this is how PostgreSQL internally references its relations:

postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] ls -l global/
total 564
-rw-------. 1 postgres postgres  8192 Nov 21 03:52 1213
-rw-------. 1 postgres postgres 24576 Nov 20 22:52 1213_fsm
-rw-------. 1 postgres postgres  8192 Nov 21 03:53 1213_vm
-rw-------. 1 postgres postgres  8192 Nov 20 22:52 1214
-rw-------. 1 postgres postgres 24576 Nov 20 22:52 1214_fsm
-rw-------. 1 postgres postgres  8192 Nov 20 22:52 1214_vm
-rw-------. 1 postgres postgres 16384 Nov 20 22:52 1232
-rw-------. 1 postgres postgres 16384 Nov 20 22:52 1233
-rw-------. 1 postgres postgres  8192 Nov 20 22:57 1260
-rw-------. 1 postgres postgres 24576 Nov 20 22:52 1260_fsm
-rw-------. 1 postgres postgres  8192 Nov 20 22:52 1260_vm
...

Each of these OIDs is one relation of the global/shared catalog. As we are not interested in the visibility maps and free space maps, let’s exclude them and only list the relation OIDs:

postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] ls -l global/ | awk -F " " '{print $9}' | egrep "^[0-9]" | egrep -v "fsm|vm"
1213
1214
1232
1233
1260
1261
1262
2396
2397
2671
2672
2676
2677
2694
2695
2697
2698
2846
2847
2964
2965
2966
2967
3592
3593
4060
4061
4175
4176
4177
4178
4181
4182
4183
4184
4185
4186
6000
6001
6002
6100
6114
6115

These are the relations in the global/shared catalog. For translating these OIDs into human-readable names there is oid2name. Without any additional parameters, oid2name gives you the names of the databases listed in the “base” directory:

postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] oid2name 
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  24616       postgres  pg_default
  12905      template0  pg_default
      1      template1  pg_default

We can also pass the OIDs of the shared/global catalog to oid2name and the result will answer the second question: What, exactly, is in the global/shared catalog?

postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] for i in `ls -l global/ | awk -F " " '{print $9}' | egrep "^[0-9]" | egrep -v "fsm|vm"`; do oid2name -x -S -q -o $i; done | grep -v "index"
      1213  pg_tablespace  1213  pg_catalog   pg_global
      1214  pg_shdepend  1214  pg_catalog   pg_global
      1260   pg_authid  1260  pg_catalog   pg_global
      1261  pg_auth_members  1261  pg_catalog   pg_global
      1262  pg_database  1262  pg_catalog   pg_global
      2396  pg_shdescription  2396  pg_catalog   pg_global
      2846  pg_toast_2396  2846  pg_toast   pg_global
      2964  pg_db_role_setting  2964  pg_catalog   pg_global
      2966  pg_toast_2964  2966  pg_toast   pg_global
      3592  pg_shseclabel  3592  pg_catalog   pg_global
      4060  pg_toast_3592  4060  pg_toast   pg_global
      4175  pg_toast_1260  4175  pg_toast   pg_global
      4177  pg_toast_1262  4177  pg_toast   pg_global
      4181  pg_toast_6000  4181  pg_toast   pg_global
      4183  pg_toast_6100  4183  pg_toast   pg_global
      4185  pg_toast_1213  4185  pg_toast   pg_global
      6000  pg_replication_origin  6000  pg_catalog   pg_global
      6100  pg_subscription  6100  pg_catalog   pg_global

Here is the answer (excluding the indexes). If we exclude the toast tables as well, you’ll notice that not many catalog tables/views are in the global/shared catalog:

postgres@centos8pg:/u02/pgdata/DEV/ [pgdev] for i in `ls -l global/ | awk -F " " '{print $9}' | egrep "^[0-9]" | egrep -v "fsm|vm"`; do oid2name -x -S -q -o $i; done | egrep -v "index|toast"
      1213  pg_tablespace  1213  pg_catalog   pg_global
      1214  pg_shdepend  1214  pg_catalog   pg_global
      1260   pg_authid  1260  pg_catalog   pg_global
      1261  pg_auth_members  1261  pg_catalog   pg_global
      1262  pg_database  1262  pg_catalog   pg_global
      2396  pg_shdescription  2396  pg_catalog   pg_global
      2964  pg_db_role_setting  2964  pg_catalog   pg_global
      3592  pg_shseclabel  3592  pg_catalog   pg_global
      6000  pg_replication_origin  6000  pg_catalog   pg_global
      6100  pg_subscription  6100  pg_catalog   pg_global
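There is also a pure-SQL cross-check for this list: pg_class exposes a relisshared flag, which is true for relations stored in the “global” directory. A quick sketch (assuming a psql connection to any database of the cluster):

```shell
# Shared relations are flagged in pg_class.relisshared; relkind 'r' lists
# the tables and 't' the toast tables (drop the filter to include indexes too).
psql -X -c "select oid, relname, relkind
            from pg_class
            where relisshared
              and relkind in ('r','t')
            order by oid;"
```

The output should match the oid2name listing above.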

That’s it, hope it helps.

This article, “The PostgreSQL shared/global catalog”, first appeared on the dbi services blog.


Cross-cloud PMM: which TCP ports to open


By Franck Pachot.

I recently installed Percona Monitoring & Management (PMM) on AWS (free tier), and here is how to monitor an instance on another cloud (OCI), showing which TCP ports must be opened.

PMM server

I installed PMM from the AWS Marketplace, following those instructions: https://www.percona.com/doc/percona-monitoring-and-management/deploy/server/ami.html. I’ll not reproduce the instructions, just some screenshots I took during the install:

I have opened the HTTPS port in order to access the console and to configure the clients, which will also connect through HTTPS (but I’m not using a signed certificate).

Once installed, two targets are visible: the PMM server host (Linux) and database (PostgreSQL):

Note that I didn’t secure HTTPS here and I’ll have to accept insecure SSL.

PMM client

I’ll monitor an Autonomous Linux instance that I have on Oracle Cloud (Free Tier). Autonomous Linux is based on OEL, which is based on RHEL (see https://blog.dbi-services.com/al7/) and is called “autonomous” because it updates the kernel without the need to reboot. I first install the Percona release RPM, which sets up the Percona yum repositories:


[opc@al ~]$ sudo yum -y install https://repo.percona.com/yum/percona-release-latest.noarch.rpm


Loaded plugins: langpacks
percona-release-latest.noarch.rpm                                                                        |  19 kB  00:00:00
Examining /var/tmp/yum-root-YgvokG/percona-release-latest.noarch.rpm: percona-release-1.0-25.noarch
Marking /var/tmp/yum-root-YgvokG/percona-release-latest.noarch.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package percona-release.noarch 0:1.0-25 will be installed
--> Finished Dependency Resolution
al7/x86_64                                                                                               | 2.8 kB  00:00:00
al7/x86_64/primary_db                                                                                    |  21 MB  00:00:00
epel-apache-maven/7Server/x86_64                                                                         | 3.3 kB  00:00:00
ol7_UEKR5/x86_64                                                                                         | 2.5 kB  00:00:00
ol7_latest/x86_64                                                                                        | 2.7 kB  00:00:00
ol7_x86_64_userspace_ksplice                                                                             | 2.8 kB  00:00:00

Dependencies Resolved

================================================================================================================================
 Package                        Arch                  Version               Repository                                     Size
================================================================================================================================
Installing:
 percona-release                noarch                1.0-25                /percona-release-latest.noarch                 31 k

Transaction Summary
================================================================================================================================
Install  1 Package

Total size: 31 k
Installed size: 31 k
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded

Running transaction
  Installing : percona-release-1.0-25.noarch                                                                                1/1
* Enabling the Percona Original repository
 All done!
* Enabling the Percona Release repository
 All done!
The percona-release package now contains a percona-release script that can enable additional repositories for our newer products.

For example, to enable the Percona Server 8.0 repository use:

  percona-release setup ps80

Note: To avoid conflicts with older product versions, the percona-release setup command may disable our original repository for some products.

For more information, please visit:
  https://www.percona.com/doc/percona-repo-config/percona-release.html

  Verifying  : percona-release-1.0-25.noarch                                                                                1/1

Installed:
  percona-release.noarch 0:1.0-25


This package helps to enable additional repositories. Here, I need the PMM 2 Client:


[opc@al ~]$ sudo percona-release enable pmm2-client


* Enabling the PMM2 Client repository
 All done!

Once enabled, it is easy to install it with YUM:


[opc@al ~]$ sudo yum install -y pmm2-client


Loaded plugins: langpacks
percona-release-noarch                                                                                   | 2.9 kB  00:00:00
percona-release-x86_64                                                                                   | 2.9 kB  00:00:00
pmm2-client-release-x86_64                                                                               | 2.9 kB  00:00:00
prel-release-noarch                                                                                      | 2.9 kB  00:00:00
(1/4): percona-release-noarch/7Server/primary_db                                                         |  24 kB  00:00:00
(2/4): pmm2-client-release-x86_64/7Server/primary_db                                                     | 3.5 kB  00:00:00
(3/4): prel-release-noarch/7Server/primary_db                                                            | 2.5 kB  00:00:00
(4/4): percona-release-x86_64/7Server/primary_db                                                         | 1.2 MB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package pmm2-client.x86_64 0:2.11.1-6.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================================================================
 Package                     Arch                   Version                        Repository                              Size
================================================================================================================================
Installing:
 pmm2-client                 x86_64                 2.11.1-6.el7                   percona-release-x86_64                  42 M

Transaction Summary
================================================================================================================================
Install  1 Package

Total download size: 42 M
Installed size: 42 M
Downloading packages:
warning: /var/cache/yum/x86_64/7Server/percona-release-x86_64/packages/pmm2-client-2.11.1-6.el7.x86_64.rpm: Header V4 RSA/SHA256
 Signature, key ID 8507efa5: NOKEY
Public key for pmm2-client-2.11.1-6.el7.x86_64.rpm is not installed
pmm2-client-2.11.1-6.el7.x86_64.rpm                                                                      |  42 MB  00:00:07
Retrieving key from file:///etc/pki/rpm-gpg/PERCONA-PACKAGING-KEY
Importing GPG key 0x8507EFA5:
 Userid     : "Percona MySQL Development Team (Packaging key) "
 Fingerprint: 4d1b b29d 63d9 8e42 2b21 13b1 9334 a25f 8507 efa5
 Package    : percona-release-1.0-25.noarch (@/percona-release-latest.noarch)
 From       : /etc/pki/rpm-gpg/PERCONA-PACKAGING-KEY
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : pmm2-client-2.11.1-6.el7.x86_64                                                                              1/1
  Verifying  : pmm2-client-2.11.1-6.el7.x86_64                                                                              1/1

Installed:
  pmm2-client.x86_64 0:2.11.1-6.el7

Complete!

That’s all for software installation. I just need to configure the agent to connect to the PMM Server:


[opc@al ~]$ sudo pmm-admin config --server-url https://admin:secretpassword@18.194.119.174 --server-insecure-tls $(curl ident.me) generic OPC-$(hostname)


Checking local pmm-agent status...
pmm-agent is running.
Registering pmm-agent on PMM Server...
Registered.
Configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml updated.
Reloading pmm-agent configuration...
Configuration reloaded.
Checking local pmm-agent status...
pmm-agent is running.

As you can see, I use the “ident.me” web service to identify my public IP address, but you probably already know yours.

This configuration goes to a file, which you should protect because it contains the password in clear text:


[opc@al ~]$ ls -l /usr/local/percona/pmm2/config/pmm-agent.yaml
-rw-r-----. 1 pmm-agent pmm-agent 805 Nov 28 20:22 /usr/local/percona/pmm2/config/pmm-agent.yaml

# Updated by `pmm-agent setup`.
---
id: /agent_id/853027e6-563e-42b8-a417-f144541358ff
listen-port: 7777
server:
  address: 18.194.119.174:443
  username: admin
  password: secretpassword
  insecure-tls: true
paths:
  exporters_base: /usr/local/percona/pmm2/exporters
  node_exporter: /usr/local/percona/pmm2/exporters/node_exporter
  mysqld_exporter: /usr/local/percona/pmm2/exporters/mysqld_exporter
  mongodb_exporter: /usr/local/percona/pmm2/exporters/mongodb_exporter
  postgres_exporter: /usr/local/percona/pmm2/exporters/postgres_exporter
  proxysql_exporter: /usr/local/percona/pmm2/exporters/proxysql_exporter
  rds_exporter: /usr/local/percona/pmm2/exporters/rds_exporter
  tempdir: /tmp
  pt_summary: /usr/local/percona/pmm2/tools/pt-summary
ports:
  min: 42000
  max: 51999
debug: false
trace: false

What is interesting here is the port the server connects to in order to pull metrics from the client: 42000, the first port of the configured 42000-51999 exporter range.

I’ll need to open this port. I can see the error from the PMM server: https://18.194.119.174/prometheus/targets

I open this port on the host:


[opc@al ~]$ sudo iptables -I INPUT 5 -i ens3 -p tcp --dport 42000 -m state --state NEW,ESTABLISHED -j ACCEPT
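This opens only the first exporter port. Since the agent allocates exporter ports from the configured range (42000 to 51999), a variant covering the whole range avoids reopening the firewall for every additional exporter. A sketch (interface name and rule position are from my setup; adapt them):

```shell
# Open the whole PMM exporter port range in one rule; iptables accepts a
# port range written as first:last with -p tcp.
sudo iptables -I INPUT 5 -i ens3 -p tcp --dport 42000:51999 \
     -m state --state NEW,ESTABLISHED -j ACCEPT
```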

and on the ingress rules as well:

Testing

I’m running two processes here to test if I get the right metrics


[opc@al ~]$ while true ; do sudo dd bs=100M count=1                if=$(df -Th | sort -rhk3 | awk '/^[/]dev/{print $1;exit}') of=/dev/null ; done &
[opc@al ~]$ while true ; do sudo dd bs=100M count=10G iflag=direct if=$(df -Th | sort -rhk3 | awk '/^[/]dev/{print $1;exit}') of=/dev/null ; done &

The latter does mostly I/O, as it reads with O_DIRECT, and the former mainly system CPU, as it reads from the filesystem cache.

Here is the Grafana dashboard from PMM:

I see my two processes, and 80% of CPU stolen by the hypervisor as I’m running on the Free Tier here which provides 1/8th of OCPU.

If you have MySQL or PostgreSQL databases there, they can easily be monitored (“pmm-admin add mysql” or “pmm-admin add postgresql”; you can see all that in Elisa Usai’s demo: https://youtu.be/VgOR_GCUpVw?t=1558).

Last test, let’s see what happens if the monitored host reboots:


[opc@al ~]$ date
Sat Nov 28 22:54:36 CET 2020
[opc@al ~]$ uptrack-uname -a
Linux al 4.14.35-2025.402.2.1.el7uek.x86_64 #2 SMP Fri Oct 23 22:27:16 PDT 2020 x86_64 x86_64 x86_64 GNU/Linux
[opc@al ~]$ uname -a
Linux al 4.14.35-1902.301.1.el7uek.x86_64 #2 SMP Tue Mar 31 16:50:32 PDT 2020 x86_64 x86_64 x86_64 GNU/Linux
[opc@al ~]$ sudo systemctl reboot
Connection to 130.61.159.88 closed by remote host.
Connection to 130.61.159.88 closed.

Yes… I do not reboot it frequently because it is Autonomous Linux and the Effective kernel is up to date (latest patches from October) even if the last restart was in March. But this deserves a test.

The first interesting thing is that PMM seems to keep the last read metrics for a while:

The host was shut down at 22:55 and it shows the last metrics for 5 minutes before stopping.

I had to wait for a while because my Availability Domain was out of capacity for the free tier:


[opc@al ~]$ systemctl status pmm-agent.service
● pmm-agent.service - pmm-agent
   Loaded: loaded (/usr/lib/systemd/system/pmm-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2020-11-28 23:40:10 UTC; 1min 15s ago
 Main PID: 46446 (pmm-agent)
   CGroup: /system.slice/pmm-agent.service
           ├─46446 /usr/sbin/pmm-agent --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml
           └─46453 /usr/local/percona/pmm2/exporters/node_exporter --collector.bonding --collector.buddyinfo --collector.cpu ...

No problem, the installation of PMM client has defined the agent to restart on reboot.

In summary, PMM pulls the metrics from the exporters, so you need to open inbound ports on the host where the PMM client agent runs, and HTTPS on the PMM server. Then everything is straightforward.

This article, “Cross-cloud PMM: which TCP ports to open”, first appeared on the dbi services blog.

JBoss EAP 7 – Domain Configuration


In a previous blog I talked about domain creation; at the end we saw that some server groups and servers are created by default, which is usually not what we need.
In this blog, I will show you how to clean this unneeded configuration and how to configure your domain according to your defined architecture.

How did this default configuration arrive in my domain?

As explained before, we use the default domain.xml and (host.xml, host-master.xml or host-slave.xml) files to create a domain; in fact, that is where the default configuration comes from 😉

Server groups preconfigured in domain.xml

In the domain.xml you will find below configuration:

    <server-groups>
        <server-group name="main-server-group" profile="full">
            <jvm name="default">
                <heap size="1000m" max-size="1000m"/>
            </jvm>
            <socket-binding-group ref="full-sockets"/>
        </server-group>
        <server-group name="other-server-group" profile="full-ha">
            <jvm name="default">
                <heap size="1000m" max-size="1000m"/>
            </jvm>
            <socket-binding-group ref="full-ha-sockets"/>
        </server-group>
    </server-groups>

This means that two server groups are already configured.

Servers configured in host*.xml

host-master.xml

No servers defined, which is normal because it is not recommended to have servers on the master host.

host.xml

    <servers>
        <server name="server-one" group="main-server-group">
        </server>
        <server name="server-two" group="main-server-group" auto-start="true">
            <socket-bindings port-offset="150"/>
        </server>
        <server name="server-three" group="other-server-group" auto-start="false">
        </server>
    </servers>

host-slave.xml

    <servers>
        <server name="server-one" group="main-server-group"/>
        <server name="server-two" group="other-server-group">
            <socket-bindings port-offset="150"/>
        </server>
    </servers>

So, depending on which preconfigured file is used, servers are created in your domain.

How to clean default configuration?

You have two choices:

Update xml preconfigured files

The idea is to update the xml files before you start the domain and remove this default configuration on each host, which means:
Remove all lines between

    <server-groups>

and

    </server-groups>

from domain.xml

Remove all lines between

    <servers>

and

    </servers>

from host.xml or host-slave.xml

Be careful: servers are assigned to server groups, so if you remove the server groups you have to remove the related servers; otherwise this will cause issues because the server groups will not be found!

Clean default configuration after domain start

Start your domain and connect to the CLI:

[jboss@vmjboss ~]$ $JBOSS_HOME/bin/jboss-cli.sh -c --controller=vmjboss:9990
[domain@vmjboss:9990 /] 

You need first to stop servers then remove them:

[domain@vmjboss:9990 /] /host=slave1/server-config=server-one:stop(blocking=true)
{
    "outcome" => "success",
    "result" => "STOPPED"
}
[domain@vmjboss:9990 /] /host=slave1/server-config=server-one:remove             
{
    "outcome" => "success",
    "result" => undefined,
    "server-groups" => undefined
}

Repeat the operation for all servers on all hosts.
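With more than a couple of servers this gets tedious. The commands can be generated in a loop and executed as a CLI batch file. A sketch, where the host and server names are just examples based on the defaults shown earlier (adapt them to your topology):

```shell
# Generate the stop/remove commands for every default server on every host,
# then run the whole file in one CLI invocation (last line left commented).
for host in slave1 slave2; do
  for server in server-one server-two server-three; do
    echo "/host=${host}/server-config=${server}:stop(blocking=true)"
    echo "/host=${host}/server-config=${server}:remove"
  done
done > /tmp/remove-default-servers.cli
# $JBOSS_HOME/bin/jboss-cli.sh -c --controller=vmjboss:9990 --file=/tmp/remove-default-servers.cli
```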

Now, you will be able to remove default server groups:

[domain@vmjboss:9990 /] /server-group=main-server-group:remove
{
    "outcome" => "success",
    "result" => undefined,
    "server-groups" => undefined
}

Repeat the operation to remove all default server groups.

Configure the domain

Define server groups

Server groups were already explained in this blog.

To define a server group you should at least know:
– Which profile is needed (default, full, full-ha, ha)
– Which socket-binding-group according to your profile (standard-sockets, full-sockets, full-ha-sockets, ha-sockets)
– socket-binding-port-offset if needed

You can create a server group via the CLI or the console; here is the CLI command to create an HA server group:

[domain@vmjboss:9990 /] /server-group=HA-GROUP:add(profile=ha,socket-binding-group=ha-sockets)
{
    "outcome" => "success",
    "result" => undefined,
    "server-groups" => undefined
}

Define servers

Servers were already explained in this blog.

To create a server you should at least know:
– On which host?
– Assigned to which group?

The server is created on a host and assigned to a server group; below is the command to create a server:

[domain@vmjboss:9990 /] /host=slave1/server-config=server1:add(group=HA-GROUP,socket-binding-port-offset=100,auto-start=true)
{
    "outcome" => "success",
    "result" => undefined,
    "server-groups" => undefined
}

Now you know how to clean the default servers and server groups, and how to create yours according to your needs. Don’t hesitate to ask questions 😉

This article, “JBoss EAP 7 – Domain Configuration”, first appeared on the dbi services blog.

PostgreSQL 14: Automatic hash and list partitioning?


Declarative partitioning was introduced in PostgreSQL 10 and has improved quite a lot over the last releases. Today almost everything you would expect from such a feature is there:

  • You can partition by range, list and hash
  • Attaching and detaching partitions
  • Foreign keys
  • Sub-partitioning
  • Indexing and constraints on partitions
  • Partition pruning

What is missing is the possibility to let PostgreSQL create partitions automatically. With this patch, once it gets committed, this will finally be possible for hash and list partitioning.

Let’s start with list partitioning. Looking at the patch, new syntax is introduced:

CREATE TABLE tbl_list (i int) PARTITION BY LIST (i)
CONFIGURATION (values in (1, 2), (3, 4) DEFAULT PARTITION tbl_default);

Taking that as an example we should see all partitions created automatically, if we create a partitioned table like this:

postgres=# create table tpart_list ( a text primary key, b int, c int )
           partition by list(a)
           configuration (values in ('a'),('b'),('c'),('d') default partition tpart_list_default);
CREATE TABLE

That should have created 5 partitions automatically: a, b, c, d and the default partition:

postgres=# \d+ tpart_list
                           Partitioned table "public.tpart_list"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 a      | text    |           | not null |         | extended |              | 
 b      | integer |           |          |         | plain    |              | 
 c      | integer |           |          |         | plain    |              | 
Partition key: LIST (a)
Indexes:
    "tpart_list_pkey" PRIMARY KEY, btree (a)
Partitions: tpart_list_0 FOR VALUES IN ('a'),
            tpart_list_1 FOR VALUES IN ('b'),
            tpart_list_2 FOR VALUES IN ('c'),
            tpart_list_3 FOR VALUES IN ('d'),
            tpart_list_default DEFAULT
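A quick way to verify the automatic routing is to insert a few rows and look at the hidden tableoid column, which resolves to the partition that stores each row. A sketch against the table created above (‘a’ is a listed value; ‘x’ is not, so it should fall into the default partition):

```shell
# 'a' should land in tpart_list_0, 'x' in tpart_list_default.
psql -X -c "insert into tpart_list values ('a', 1, 1), ('x', 2, 2);" \
     -c "select tableoid::regclass as partition, a, b, c from tpart_list;"
```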

Nice. The same works for hash-partitioned tables, but the syntax is slightly different:

CREATE TABLE tbl_hash (i int) PARTITION BY HASH (i)
CONFIGURATION (modulus 3);

The idea is the same, of course: You need to specify the “configuration” and when you go for hash partitioning you need to provide the modulus:

postgres=# create table tpart_hash ( a int primary key, b text)
           partition by hash (a)
           configuration (modulus 5);
CREATE TABLE
postgres=# \d+ tpart_hash
                           Partitioned table "public.tpart_hash"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 a      | integer |           | not null |         | plain    |              | 
 b      | text    |           |          |         | extended |              | 
Partition key: HASH (a)
Indexes:
    "tpart_hash_pkey" PRIMARY KEY, btree (a)
Partitions: tpart_hash_0 FOR VALUES WITH (modulus 5, remainder 0),
            tpart_hash_1 FOR VALUES WITH (modulus 5, remainder 1),
            tpart_hash_2 FOR VALUES WITH (modulus 5, remainder 2),
            tpart_hash_3 FOR VALUES WITH (modulus 5, remainder 3),
            tpart_hash_4 FOR VALUES WITH (modulus 5, remainder 4)

Really nice, great work and thanks to all involved. I hope that the next steps will be:

  • Support automatic partition creation for range partitioning
  • Support automatic partition creation on the fly when incoming data requires a new partition. In the thread this is referred to as “dynamic” partitioning, while what is implemented here is “static” partitioning.

This article, “PostgreSQL 14: Automatic hash and list partitioning?”, first appeared on the dbi services blog.

Installing MySQL Database Service (MDS)


In a previous blog post, we saw how to create an account on Oracle OCI using the Oracle Cloud Free Tier offer and then how to install MySQL Server on the compute instance.
Some weeks later, the new MySQL Database Service (MDS) was released, and I can now show you how to install and configure it.

We are talking about MySQL 8.0 Enterprise Edition on the Oracle Generation 2 Cloud Infrastructure. For the moment it’s only available in some of the data regions (Frankfurt and London for the EMEA zone), but others should normally be activated at the beginning of 2021 (Zurich for example). Most of these regions have 3 Availability Domains (physical buildings), each of them composed of three Fault Domains (a group of separated hardware and infrastructure).
When we connect to the OCI console, the first step is to create a Virtual Cloud Network (VCN) in order to have our own private cloud network on OCI. This will be created in our Compartment, the main container that will contain our resources (elisapriv in my case).
We click on Networking > Virtual Cloud Network:

We can start then the VCN Wizard:

and define our VCN name and subnet:

We click on Create to finalize the VCN creation:

When it’s done, we can create a compute instance, which means our host.
We click on Compute > Instances:

We click then on Create Instance:

At this point we can adapt the compute instance configuration in terms of placement for the availability and fault domains, the resources and the OS images:

For example I decided to have 1 OCPU and 8GB of memory:

We need to upload our public key to connect then via ssh, and we can click on Create to create the compute instance:

We can get now the public IP address that we will use to connect then via ssh:

At this point, we can take care of the MySQL part.
If we need to set a variable to a value other than the default one, before creating our MySQL Server we have to create a new ad-hoc configuration, clicking on MySQL > Configurations:

and then clicking on Create MySQL Configuration:

We can now name our configuration and provide the value for the variable that we want to adapt. In my case, for example, I increased the maximum number of connections to 500:


It’s time to create our MySQL Server, clicking on MySQL > DB Systems:

and then clicking on Create MySQL DB System:

We can now name our MySQL DB System, and decide which kind of placement and hardware to use:

If we want to use the ad-hoc configuration that we defined just before, we have to click on Change Configuration:

and then select the right one:

We can choose the storage size for InnoDB and binary log data, and the maintenance window (I suggest specifying a time that is convenient for you, because otherwise one will be chosen for you):

There is still some information to fill in, such as the administration user (which has to be different from root) and its password, and network details (the VCN, the subnet, the port):

and the backup configuration (the backups will be executed through snapshots):

This creation operation will take a few minutes, and then our MySQL DB System will be up and running.
One more thing to do before connecting to it: enable the network traffic to reach our MySQL Server.
On the VCN page, we have to click on Security Lists and to select the subnet:


We click on “Add Ingress Rules” and we can add the 3306 as destination port:

We can connect now to our compute instance and install the MySQL Shell:

ssh -i C:\MySQL\Cloud\ssh-key-2020-11-24.key opc@xxx.xx.xxx.xxx
[opc@instance-20201127-1738 ~]$ sudo yum install https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
[opc@instance-20201127-1738 ~]$ sudo yum install mysql-shell

and then connect to the MySQL DB System using MySQL Shell:

[opc@instance-20201127-1738 ~]$ mysqlsh --sql admin@10.0.1.3
Please provide the password for 'admin@10.0.1.3': *************
MySQL Shell 8.0.22

Copyright (c) 2016, 2020, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
Creating a session to 'admin@10.0.1.3'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 33
Server version: 8.0.22-u2-cloud MySQL Enterprise - Cloud
No default schema selected; type \use <schema> to set one.
 MySQL  10.0.1.3:3306 ssl  SQL >
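If the connection hangs instead of prompting for the password, it is usually the missing ingress rule. A quick reachability check from the compute instance, before blaming credentials (a sketch; the private IP is the one from my setup):

```shell
# bash can test a TCP connection without any extra tool installed;
# replace 10.0.1.3 with the private IP of your MySQL DB System.
timeout 5 bash -c 'cat < /dev/null > /dev/tcp/10.0.1.3/3306' \
  && echo "3306 reachable" || echo "3306 blocked"
```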

Great, no?
And this is not everything: I think that news concerning analytics for MySQL data will come in the next few weeks.
So stay tuned! 😉

This article, “Installing MySQL Database Service (MDS)”, first appeared on the dbi services blog.

DynamoDB Scan (and why 128.5 RCU?)


By Franck Pachot.

In the previous post I described the PartiQL SELECT for DynamoDB and mentioned that a SELECT without a WHERE clause on the partition key may result in a Scan, but the result is automatically paginated. This pagination, and the cost of a Scan, may not be very clear from the documentation, and I’ll show it here on the regular DynamoDB API. Because it is not very clear, many people in the AWS community fear that, with this new PartiQL API, there is a risk of full-scanning tables and consuming expensive RCUs. I was also misled, when I started to look at DynamoDB, by the AWS CLI “--no-paginate” option, as well as its “Consumed Capacity” always showing 128.5 even for very large scans. So those examples should, hopefully, clear out some doubts.

I have created a HASH/RANGE partitioned table and filled it with a few thousands of items:


aws dynamodb create-table --attribute-definitions \
  AttributeName=MyKeyPart,AttributeType=N \
  AttributeName=MyKeySort,AttributeType=N \
 --key-schema \
  AttributeName=MyKeyPart,KeyType=HASH \
  AttributeName=MyKeySort,KeyType=RANGE \
 --billing-mode PROVISIONED \
 --provisioned-throughput ReadCapacityUnits=25,WriteCapacityUnits=25 \
 --table-name Demo

for i in {1..5000} ; do aws dynamodb put-item --table-name Demo --item '{"MyKeyPart":{"N":"'$(( $RANDOM /1000 ))'"},"MyKeySort":{"N":"'$SECONDS'"},"MyUnstructuredData":{"S":"'$(printf %-1000s | tr ' ' x)'"}}' ; done

Here is what those items look like:

I have created large items with a 1000-byte “MyUnstructuredData” attribute. According to https://zaccharles.github.io/dynamodb-calculator/ the item size is around 1042 bytes. And that’s exactly the size I see here (5209783/5000=1041.96) from the console “Items summary” (I waited a few hours to get it updated in the screenshot above). This means that around 1000 items can fit in a 1MB page. We will see why I’m mentioning 1MB here: the title says 128.5 RCU and that’s the consumed capacity when reading 1MB with eventual consistency (0.5 RCU per 4KB read is 128 RCU per 1MB). Basically, this post will try to explain why we see a 128.5 consumed capacity at maximum when scanning any large table:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL\
 --no-consistent-read --output table

----------------------------------
|              Scan              |
+----------+---------------------+
|   Count  |    ScannedCount     |
+----------+---------------------+
|  5000    |  5000               |
+----------+---------------------+
||       ConsumedCapacity       ||
|+----------------+-------------+|
||  CapacityUnits |  TableName  ||
|+----------------+-------------+|
||  128.5         |  Demo       ||
|+----------------+-------------+|
[opc@a aws]$

TL;DR: this number is wrong 😉
You cannot scan 5000 items of 1000 bytes with 128.5 RCU as this is nearly 5MB scanned and you need 0.5 RCU per 4KB read.

Output text

I’ve run this with “–output table” for a pretty print of it. Let’s have a look at the other formats (json and text):


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output json
{
    "Count": 5000,
    "ScannedCount": 5000,
    "ConsumedCapacity": {
        "TableName": "Demo",
        "CapacityUnits": 128.5
    }
}

[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL\
 --no-consistent-read --output text

1007    None    1007
CONSUMEDCAPACITY        128.5   Demo
1007    None    1007
1007    None    1007
1007    None    1007
972     None    972

The JSON format is similar to the TABLE one, but the TEXT output gives more information about this 128.5 consumed capacity as it appears after a count of 1007 items. Yes, this makes sense, 1007 is approximately the number of items I expect in 1MB and, as I mentioned earlier, reading 1MB in eventual consistency consumes 128 RCU (0.5 RCU per 4KB). What actually happens here is pagination. A scan call is always limited to read 1MB at maximum (you can compare that to a fetch size in a SQL database, except that it is about the amount read rather than returned) and what happens here is that the AWS CLI fetches the next pages in order to get the whole COUNT. Unfortunately, the “--return-consumed-capacity TOTAL” shows the value from the first fetch only. And only the TEXT format shows this count for each call. The TABLE and JSON formats do the sum of the counts for you (which is nice) but hide the fact that the Consumed Capacity is the one from the first call only.
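The 128.5 per page can be reproduced from these numbers. A quick sketch, assuming the ~1042-byte average item size measured above and the 1007 items per page seen in the output (RCU is rounded up to 4KB units, 0.5 RCU each with eventual consistency):

```shell
items_per_page=1007; item_bytes=1042           # measured averages, see above
page_bytes=$(( items_per_page * item_bytes ))  # ~1MB
units=$(( (page_bytes + 4095) / 4096 ))        # round up to 4KB read units: 257
rcu=$(awk "BEGIN{print $units * 0.5}")         # eventual consistency: 0.5 RCU/unit
echo "$rcu"   # 128.5
```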

Debug

The partial display of the consumed capacity is a problem with AWS CLI but each call actually returns the right value, which we can see with “–debug”:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
 --no-consistent-read --output table --debug 2>&1 | grep '"ConsumedCapacity"'

b'{"ConsumedCapacity":{"CapacityUnits":128.5,"TableName":"Demo"},"Count":1007,"LastEvaluatedKey":{"MyKeyPart":{"N":"2"},"MyKeySort":{"N":"96744"}},"ScannedCount":1007}'
b'{"ConsumedCapacity":{"CapacityUnits":128.5,"TableName":"Demo"},"Count":1007,"LastEvaluatedKey":{"MyKeyPart":{"N":"27"},"MyKeySort":{"N":"91951"}},"ScannedCount":1007}'
b'{"ConsumedCapacity":{"CapacityUnits":128.5,"TableName":"Demo"},"Count":1007,"LastEvaluatedKey":{"MyKeyPart":{"N":"20"},"MyKeySort":{"N":"85531"}},"ScannedCount":1007}'
b'{"ConsumedCapacity":{"CapacityUnits":128.5,"TableName":"Demo"},"Count":1007,"LastEvaluatedKey":{"MyKeyPart":{"N":"29"},"MyKeySort":{"N":"90844"}},"ScannedCount":1007}'
b'{"ConsumedCapacity":{"CapacityUnits":124.0,"TableName":"Demo"},"Count":972,"ScannedCount":972}'

Here the total RCU is 128.5+128.5+128.5+128.5+124=638 which is about what we can expect to scan a 5MB table (0.5*5209783/4096=636).
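Summing the per-call values can be done directly with an awk one-liner over the debug output. A sketch with the debug lines above inlined in a here-document, so it runs without any AWS credentials:

```shell
# Sum the CapacityUnits values from scan --debug output (sample lines inlined)
total=$(awk -F'"CapacityUnits":' '/CapacityUnits/{split($2,a,",");sum+=a[1]}END{print sum}' <<'EOF'
b'{"ConsumedCapacity":{"CapacityUnits":128.5,"TableName":"Demo"},"Count":1007,"ScannedCount":1007}'
b'{"ConsumedCapacity":{"CapacityUnits":128.5,"TableName":"Demo"},"Count":1007,"ScannedCount":1007}'
b'{"ConsumedCapacity":{"CapacityUnits":128.5,"TableName":"Demo"},"Count":1007,"ScannedCount":1007}'
b'{"ConsumedCapacity":{"CapacityUnits":128.5,"TableName":"Demo"},"Count":1007,"ScannedCount":1007}'
b'{"ConsumedCapacity":{"CapacityUnits":124.0,"TableName":"Demo"},"Count":972,"ScannedCount":972}'
EOF
)
echo "$total"  # 638
```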

Pagination

To make the confusion bigger, “pagination” has several meanings here. One is about the fact that a scan call reads at maximum 1MB of DynamoDB storage (and thus consumes at maximum 128.5 RCU, or 257 with strong consistency). The other is about the fact that, from the AWS CLI, we can automatically fetch the next pages.

I can explicitly ask to read only one page with the “--no-paginate” option (a misleading name, isn’t it? It actually means “do not automatically read the next pages”):


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output json --no-paginate

{
    "Count": 1007,
    "ScannedCount": 1007,
    "LastEvaluatedKey": {
        "MyKeyPart": {
            "N": "2"
        },
        "MyKeySort": {
            "N": "96744"
        }
    },
    "ConsumedCapacity": {
        "TableName": "Demo",
        "CapacityUnits": 128.5
    }
}

[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output text --no-paginate

1007    1007
CONSUMEDCAPACITY        128.5   Demo
MYKEYPART       2
MYKEYSORT       96744

Here, in all output formats, things are clear as the consumed capacity matches the number of items. The first 1MB page has 1007 items, which consumes 128.5 RCU, and, in order to know the total number, we need to read the next pages.

Unlike auto pagination (automatically calling the next pages until the end), the read pagination always happens for scans: you will never read more than 1MB from the DynamoDB storage in one call. This is how the API avoids any surprise in the response time: the fetch size depends on the cost of data access rather than on the result. If I add a filter to my scan so that no rows are returned, the same pagination happens:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output table \
--filter-expression "MyUnstructuredData=:v" --expression-attribute-values '{":v":{"S":"franck"}}'

----------------------------------
|              Scan              |
+----------+---------------------+
|   Count  |    ScannedCount     |
+----------+---------------------+
|  0       |  5000               |
+----------+---------------------+
||       ConsumedCapacity       ||
|+----------------+-------------+|
||  CapacityUnits |  TableName  ||
|+----------------+-------------+|
||  128.5         |  Demo       ||
|+----------------+-------------+|

[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output text \
--filter-expression "MyUnstructuredData=:v" --expression-attribute-values '{":v":{"S":"franck"}}'

0       None    1007
CONSUMEDCAPACITY        128.5   Demo
0       None    1007
0       None    1007
0       None    1007
0       None    972

Here no items match my filter (Count=0) but all items had to be scanned (ScannedCount=5000). And then, as displayed with the text output, pagination happened, returning empty pages. This is a very important point to understand about DynamoDB scans: as there is no access filter (no key value) it does a Full Table Scan, with the cost of it, and the filtering is done afterwards. This means that empty pages can be returned and you may need multiple roundtrips even for no rows. And this is what I had here with the COUNT: 5 calls to get the answer “Count: 0”.

For my Oracle Database readers, you can think of the DynamoDB scan operation like a “TABLE ACCESS FULL” in an execution plan (but not like a “TABLE ACCESS STORAGE FULL”, which offloads the predicates to the storage), where you pay for the reads, throttled per second. The cost of the operation depends on the volume read (the size of the table) but not on the result. Yes, the message in DynamoDB is “avoid scan as much as possible”, like we had “avoid full table scans as much as possible” in SQL databases when used for OLTP, except for small tables. And guess what? The advantage of the DynamoDB scan operation comes when you need to read a large part of the table, because it can read many items with one call, with a 1MB read I/O size on the storage… Yes, 1MB, the same maximum I/O size that the db_file_multiblock_read_count default has always implied for Oracle full table scans. APIs change but many concepts are the same.
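For reference, that Oracle default resolves to the same 1MB (this is a sketch of the arithmetic only; the parameter value depends on the version, but with the common 8KB block size it computes to 128):

```shell
# db_file_multiblock_read_count=128 with the default 8KB block size
block_size=8192
mbrc=128
io_bytes=$(( mbrc * block_size ))
echo "$io_bytes"  # 1048576 bytes = 1MB, the same as a DynamoDB scan page
```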

Page size

We can control the page size, if we want smaller pages (but you need very good reasons to do so), by specifying the number of items. In order to show that it is about the number of items scanned, not the number returned after filtering, I keep my filter that removes all items from the result:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output text --filter-expression "MyUnstructuredData=:v" --expression-attribute-values '{":v":{"S":"franck"}}' \
--page-size 500

0       None    500
CONSUMEDCAPACITY        64.0    Demo
0       None    500
0       None    500
0       None    500
0       None    500
0       None    500
0       None    500
0       None    500
0       None    500
0       None    500
0       None    0

Here each page is smaller than before, limited to 500 items. Then the RCU consumed by each call is smaller, as well as the response time. However, it is clear that the total RCU consumed is not lower; it is even higher:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output text --filter-expression "MyUnstructuredData=:v" --expression-attribute-values '{":v":{"S":"franck"}}' \
--page-size 10 \
--debug  2>&1 | awk -F, '/b.{"ConsumedCapacity":{"CapacityUnits":/{sub(/[^0-9.]*/,"");cu=cu+$1}END{print cu}'

750.5

and the response time is higher because of additional roundtrips.
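Those 750.5 RCU can be reconstructed from the item size estimated earlier. A sketch, assuming the ~1042-byte average item size, the per-page round-up to 4KB units, and the 0.5 RCU minimum charged for the final empty page:

```shell
items=5000; per_page=10; item_bytes=1042      # item size estimated earlier
pages=$(( items / per_page ))                 # 500 pages of 10 items
units_per_page=$(( (per_page * item_bytes + 4095) / 4096 ))   # 3 x 4KB units
# each page: 3 * 0.5 = 1.5 RCU (matching the debug output), plus one empty page
total=$(awk "BEGIN{print $pages * $units_per_page * 0.5 + 0.5}")
echo "$total"  # 750.5
```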

We can define smaller pages, but never larger than 1MB:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output text --filter-expression "MyUnstructuredData=:v" --expression-attribute-values '{":v":{"S":"franck"}}' \
--page-size 1500

0       None    1067
CONSUMEDCAPACITY        128.5   Demo
0       None    1074
0       None    1057
0       None    1054
0       None    748

Here, even if I asked for 1500 items per page, I get the same as before because, given the item size, the 1MB limit is reached first.

A few additional tests:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output table --page-size 1500 --no-paginate                                                                                             

Cannot specify --no-paginate along with pagination arguments: --page-size

[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output table --page-size 1 --debug 2>&1 | grep '"ConsumedCapacity"' | head -2

b'{"ConsumedCapacity":{"CapacityUnits":0.5,"TableName":"Demo"},"Count":1,"LastEvaluatedKey":{"MyKeyPart":{"N":"7"},"MyKeySort":{"N":"84545"}},"ScannedCount":1}'
b'{"ConsumedCapacity":{"CapacityUnits":0.5,"TableName":"Demo"},"Count":1,"LastEvaluatedKey":{"MyKeyPart":{"N":"7"},"MyKeySort":{"N":"85034"}},"ScannedCount":1}'

[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output table --page-size 10 --debug 2>&1 | grep '"ConsumedCapacity"' | head -2

b'{"ConsumedCapacity":{"CapacityUnits":1.5,"TableName":"Demo"},"Count":10,"LastEvaluatedKey":{"MyKeyPart":{"N":"7"},"MyKeySort":{"N":"85795"}},"ScannedCount":10}'
b'{"ConsumedCapacity":{"CapacityUnits":1.5,"TableName":"Demo"},"Count":10,"LastEvaluatedKey":{"MyKeyPart":{"N":"7"},"MyKeySort":{"N":"86666"}},"ScannedCount":10}'

[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output table --page-size 100 --debug 2>&1 | grep '"ConsumedCapacity"' | head -2

b'{"ConsumedCapacity":{"CapacityUnits":13.0,"TableName":"Demo"},"Count":100,"LastEvaluatedKey":{"MyKeyPart":{"N":"7"},"MyKeySort":{"N":"94607"}},"ScannedCount":100}'
b'{"ConsumedCapacity":{"CapacityUnits":13.0,"TableName":"Demo"},"Count":100,"LastEvaluatedKey":{"MyKeyPart":{"N":"8"},"MyKeySort":{"N":"89008"}},"ScannedCount":100}'

First, I cannot disable auto pagination when defining a page size. This is why I used the debug mode to get the RCU consumed. Reading only one item (about 1KB) consumed 0.5 RCU because this is the minimum: 0.5 to read up to 4KB. Then I called for 10 and 100 items per page. This helps to estimate the size of items. 13 RCU for 100 items means that the average item size is 13*4096/0.5/100=1065 bytes.
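That estimation is just the RCU formula reversed. A sketch of the arithmetic:

```shell
# Reverse the RCU formula: bytes read = RCU / 0.5 * 4096, then divide by items
rcu=13; items=100
avg=$(awk "BEGIN{printf \"%.0f\", $rcu * 4096 / 0.5 / $items}")
echo "$avg"  # 1065 bytes per item on average
```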

You probably don’t want to reduce the page size under 1MB, except maybe if your RCUs are throttled and you experience timeouts. It is a response time vs. throughput decision. And in any case, a scan page should return many items. Scanning all my 5000 items with --page-size 1 would require 2500 RCU because each call costs 0.5 RCU at minimum:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL --no-consistent-read --output text --filter-expression "MyUnstructuredData=:v" --expression-attribute-values '{":v":{"S":"franck"}}' --page-size 1 --debug  2>&1 | awk -F, '/b.{"ConsumedCapacity":{"CapacityUnits":/{sub(/[^0-9.]*/,"");cu=cu+$1}END{print cu}'

2500.5

This cost with the smallest page size is the same as reading each item with a GetItem operation. So you see, except in this extreme small-page example, that scan is not always evil. When you need to read many items, scan can get them with fewer RCU. How many is “many”? The maths is easy here. With 0.5 RCU you can read a whole 1MB page with scan, or just one item with GetItem. Then, as long as, on average, you read more than one item per page, you get a benefit from scan. You can estimate the number of items you retrieve. And you can divide the table size by 1MB. But keep in mind that if the table grows, the cost of scan increases. And sometimes, you prefer scalable and predictable response time over fast response time.
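The break-even maths, as a rough sketch with my table’s numbers (eventual consistency, items under 4KB so one GetItem costs 0.5 RCU; the real scan cost is slightly higher because each page is rounded up individually):

```shell
table_bytes=5209783; items=5000
# full scan: 0.5 RCU per 4KB unit read, whatever the filter returns
scan_rcu=$(awk "BEGIN{print int(($table_bytes + 4095) / 4096) * 0.5}")
# one GetItem per item: 0.5 RCU each
getitem_rcu=$(awk "BEGIN{print $items * 0.5}")
echo "scan=$scan_rcu getitem=$getitem_rcu"  # scan=636 getitem=2500
```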

Max items

In addition to the page size (number of items scanned) we can also paginate the result. This doesn’t work for count:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=COUNT --return-consumed-capacity TOTAL \
--no-consistent-read --output text --max-items 1

1007    None    1007
CONSUMEDCAPACITY        128.5   Demo
1007    None    1007
1007    None    1007
1007    None    1007
972     None    972

Here, despite the “--max-items 1”, the full count has been returned.

I’m now selecting (projection) two attributes, with a “–max-items 5”:

[opc@a aws]$ aws dynamodb scan --table-name Demo --select=SPECIFIC_ATTRIBUTES --projection-expression=MyKeyPart,MyKeySort \
--return-consumed-capacity TOTAL --no-consistent-read --output text --max-items 5

1007    1007
CONSUMEDCAPACITY        128.5   Demo
MYKEYPART       7
MYKEYSORT       84545
MYKEYPART       7
MYKEYSORT       85034
MYKEYPART       7
MYKEYSORT       85182
MYKEYPART       7
MYKEYSORT       85209
MYKEYPART       7
MYKEYSORT       85359
NEXTTOKEN       eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDV9

This, like pagination, gives a “next token” to get the remaining items.


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=SPECIFIC_ATTRIBUTES --projection-expression=MyKeyPart,MyKeySort \
--return-consumed-capacity TOTAL --no-consistent-read --output text --max-items 5 \
--starting-token eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDV9

0       0
CONSUMEDCAPACITY        128.5   Demo
MYKEYPART       7
MYKEYSORT       85380
MYKEYPART       7
MYKEYSORT       85516
MYKEYPART       7
MYKEYSORT       85747
MYKEYPART       7
MYKEYSORT       85769
MYKEYPART       7
MYKEYSORT       85795
NEXTTOKEN       eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDEwfQ==

The displayed cost here is the same as a full 1MB scan: 128.5 RCU. And, if you look at the calls with “--debug”, you will see that about a thousand items were read. However, the ScannedCount displayed is zero:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=SPECIFIC_ATTRIBUTES --projection-expression=MyKeyPart,MyKeySort \
--return-consumed-capacity TOTAL --no-consistent-read --output table --max-items 5 \
--starting-token eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDV9

-----------------------------------------------------------------------------------------------------------
|                                                  Scan                                                   |
+-------+---------------------------------------------------------------------------------+---------------+
| Count |                                    NextToken                                    | ScannedCount  |
+-------+---------------------------------------------------------------------------------+---------------+
|  0    |  eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDEwfQ==   |  0            |
+-------+---------------------------------------------------------------------------------+---------------+
||                                           ConsumedCapacity                                            ||
|+---------------------------------------------------------+---------------------------------------------+|
||                      CapacityUnits                      |                  TableName                  ||
|+---------------------------------------------------------+---------------------------------------------+|
||  128.5                                                  |  Demo                                       ||
|+---------------------------------------------------------+---------------------------------------------+|
||                                                 Items                                                 ||

Does it make sense? No items scanned but a thousand items retrieved, with the RCU of a 1MB read?

Let’s try to answer this. I’ll query this again and update one item:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=ALL_ATTRIBUTES --return-consumed-capacity TOTAL \
--no-consistent-read --output text --max-items 5 \
--starting-token eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDV9 \
| cut -c1-80

0       0
CONSUMEDCAPACITY        128.5   Demo
MYKEYPART       7
MYKEYSORT       85380
MYUNSTRUCTUREDDATA      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MYKEYPART       7
MYKEYSORT       85516
MYUNSTRUCTUREDDATA      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MYKEYPART       7
MYKEYSORT       85747
MYUNSTRUCTUREDDATA      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MYKEYPART       7
MYKEYSORT       85769
MYUNSTRUCTUREDDATA      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MYKEYPART       7
MYKEYSORT       85795
MYUNSTRUCTUREDDATA      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
NEXTTOKEN       eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6ID

[opc@a aws]$ aws dynamodb execute-statement \
--statement "update Demo set MyunstructuredData='Hello' where MyKeyPart=7 and MyKeySort =85516"

------------------
|ExecuteStatement|
+----------------+

I’ve used the PartiQL SQL-like API just because I find it really convenient for this.

Now scanning from the same next token:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=ALL_ATTRIBUTES --return-consumed-capacity TOTAL \
--no-consistent-read --output text --max-items 5 \
--starting-token eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDV9 \
| cut -c1-80

0       0
CONSUMEDCAPACITY        128.5   Demo
MYKEYPART       7
MYKEYSORT       85380
MYUNSTRUCTUREDDATA      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MYKEYPART       7
MYKEYSORT       85516
MYUNSTRUCTUREDDATA      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MYUNSTRUCTUREDDATA      Hello
MYKEYPART       7
MYKEYSORT       85747
MYUNSTRUCTUREDDATA      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MYKEYPART       7
MYKEYSORT       85769
MYUNSTRUCTUREDDATA      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MYKEYPART       7
MYKEYSORT       85795
MYUNSTRUCTUREDDATA      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
NEXTTOKEN       eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6ID

Besides the fact that my update added a new attribute MyunstructuredData rather than replacing MyUnstructuredData, because I made a typo (not easy to spot with the text output as it uppercases all attribute names), the important point is that I’ve read the new value. Despite the “ScannedCount=0”, I have obviously read the items again. Nothing stays in cache and there are no stateful cursors in a NoSQL database.

So just be careful with “--max-items”. It limits the result, but not the work done in one page read. RCU is always calculated from the number of 4KB units that are read to get the page from the storage, well before any filtering. Where “--max-items” can limit the cost is when using auto pagination, by avoiding reading more pages than necessary:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=SPECIFIC_ATTRIBUTES --projection-expression=MyKeyPart,MyKeySort --return-consumed-capacity TOTAL \
--no-consistent-read --output text \
--max-items 2500 --debug 2>&1 | grep '"ConsumedCapacity"' | cut -c1-80

b'{"ConsumedCapacity":{"CapacityUnits":128.5,"TableName":"Demo"},"Count":1007,"I
b'{"ConsumedCapacity":{"CapacityUnits":128.5,"TableName":"Demo"},"Count":1007,"I
b'{"ConsumedCapacity":{"CapacityUnits":128.5,"TableName":"Demo"},"Count":1007,"I

Here, limiting the displayed result to 2500 items, only 3 pages (of 1007 items each, none of which pass the filter into the result) have been read.

Consistency

I’ve run all those scans with “--no-consistent-read”, which is the default, just to make it explicit that we accept missing the latest changes. With consistent reads, we are sure to read the latest version, but this requires more reads (from a quorum of mirrors) and doubles the RCU consumption:


[opc@a aws]$ aws dynamodb scan --table-name Demo --select=SPECIFIC_ATTRIBUTES --projection-expression=MyKeyPart,MyKeySort \
--return-consumed-capacity TOTAL --consistent-read --output text --max-items 5 \
--starting-token eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDV9

0       0
CONSUMEDCAPACITY        257.0   Demo
MYKEYPART       7
MYKEYSORT       85380
MYKEYPART       7
MYKEYSORT       85516
MYKEYPART       7
MYKEYSORT       85747
MYKEYPART       7
MYKEYSORT       85769
MYKEYPART       7
MYKEYSORT       85795
NEXTTOKEN       eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6ID==

The cost here is 257 RCU, as 4KB consistent reads cost 1 RCU instead of the 0.5 for eventually consistent reads.
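A sketch of the doubling, with the 1007-item page measured earlier (same round-up to 4KB units, only the per-unit cost changes):

```shell
items=1007; item_bytes=1042
units=$(( (items * item_bytes + 4095) / 4096 ))   # 257 4KB units in the 1MB page
eventual=$(awk "BEGIN{print $units * 0.5}")       # eventual consistency: 128.5 RCU
strong=$(awk "BEGIN{print $units * 1.0}")         # strong consistency: 257 RCU
echo "$eventual $strong"  # 128.5 257
```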

Query

I focused on the scan operation here, but the same multi-item read applies to the query operation. Except that you do not read the whole table, even with pagination, as you define a specific partition by providing a value for the partition key, which is hashed to locate one partition.

Those tests are fully reproducible on the Free Tier. You don’t need billions of items to understand how it works. And once you understand how it works, simple math will tell you how it scales to huge tables.

Cet article DynamoDB Scan (and why 128.5 RCU?) est apparu en premier sur Blog dbi services.

Convert private key generated via OCI Console to ppk


I am pretty new to the Oracle Cloud Infrastructure technology, so maybe I am talking about something you already know. But anyway I prefer to share this case: it can help if you encounter the same problem as me. Let’s take the risk of giving too much information rather than none! 😉

The problem

I was doing some tests on the new MySQL Database Service and during the setup I decided to generate my ssh keys via the OCI console:

When I tried to connect via PuTTY or MobaXterm to my compute instance using the opc account and my private key (generated previously), I got the following error:

Looking at the keys generated via the Oracle Cloud console, I saw that they were defined in the following format:

The solution

Actually I don’t work directly on a Linux system. So I need to convert my private key if I want to make it usable via my connection tools.
First step is to transform it to RSA format. I can do it using OpenSSL:

# openssl rsa -in ssh-key-2020-11-24.key -out ssh-key-2020-11-24.rsa

Second and last step is to convert it to ppk format. I can do it using PuTTYgen.
I load the private key:

I filter on all files types:

I select my RSA key and I click on Open:

I click on Ok on the following message:

and then on Save private key:

So I save the key with a ppk format:

Tests

I can now use my private key to connect to my OCI compute instance via PuTTY:

or MobaXterm:

Hope this can help you!

Cet article Convert private key generated via OCI Console to ppk est apparu en premier sur Blog dbi services.

An Introduction to Pester – Unit Testing and Infrastructure checks in PowerShell


Introduction

If you never heard of it, Pester is a PowerShell module, written in PowerShell.
It’s a framework for writing and running unit tests, integration tests, and also infrastructure checks as we will see in a moment.
Pester is used for example to test PowerShell Core and Pester itself.

In this blog post, I’ll do a short introduction to Pester with Installation and basic checks examples.

Installation

Pester is shipped by default with Windows 10 and Windows Server 2016. The version installed is 3.4.
The latest version is available in the PSGallery. It is currently version 5.1.
If you have the 3.4 version installed and would like to update it you will face errors with Update-Module. You need to use the following command to get the latest version:

PS C:\> Install-Module -Name Pester -Force -SkipPublisherCheck
PS C:\> Get-InstalledModule

Version    Name                                Repository           Description
-------    ----                                ----------           -----------
5.1.0      Pester                              PSGallery            Pester provides a framework for...

PowerShell function example

I will now show you a very basic Pester test.
Let’s say I want to write a Pester test for the following PowerShell function.

This is a very basic function that reverses the string characters. This is the output:

Now that I have a working function I can start to write the Pester test.

Create a Pester Tests file

The Pester function New-Fixture will create a template file for me but you could definitely create it yourself.
By convention, Pester test files should end with “Tests.ps1”.
The Tests file has been created.
I already edited the file and wrote the test.
This is what a Pester test looks like:
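The test file was also an image in the original post. A sketch of a matching Get-ReverseString.Tests.ps1, with the dot-sourcing header that New-Fixture generates (names assumed from the surrounding text):

```powershell
# Dot-source the function under test into the session (generated by New-Fixture)
. $PSScriptRoot\Get-ReverseString.ps1

Describe "Get-ReverseString" {
    It "Reverses the characters of a string" {
        Get-ReverseString -String "dbi services" | Should -Be "secivres ibd"
    }
}
```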

Pester basics

Pester is very declarative and easy to read.
You can ignore the first 3 rows in the Tests file: they come from the New-Fixture function and just dot-source the function under test into the PowerShell session.

The main commands with Pester are Describe, It, Context, and Should.

Describe is a block that contains tests. You will often have one Describe block for each function you want to test.
Context blocks are like Describe, they contain It blocks. They are optional and are useful to organize your test code.
The It block is the one that actually contains the test. The It block should have an expressive phrase describing the expected test outcome.
Finally, the Should command defines the test condition to be met. If the assertion is not met, the test fails and an exception is thrown. I used the -Be parameter; many more are available like -BeFalse, -BeGreaterOrEqual, -BeLike, -Contain, etc.

In this example the test is simple. I set the expected value that I should get and I compare it to the actual value returned by the function.

Running Pester Tests

Now let’s run the test itself with Invoke-Pester.
Also, running Invoke-Pester without any parameter will run all the “Tests.ps1” files in the current location.
So, everything is green. We can see that one test was performed and it Passed.
Now let’s say another developer worked on the Get-ReverseString function and the latest change introduced a bug. Function behavior has changed and the Pester test will now throw a beautiful exception:
What is great is all the details (in red color) we can get when a test fails.

Infrastructure Testing

Pester is often used by sysadmins to do infrastructure testing. Your environment changes frequently and you need to be sure that your infrastructure stays aligned with your standards.
Here are a few examples of such tests. This time the test is done directly inside a Pester code block, not using a function like I did previously.

Check that my Windows Server Power Plan is set to High Performance.

Describe "Power Plan" {
    $PowerPlan = (Get-CimInstance -ClassName Win32_PowerPlan -Namespace 'root\cimv2\power' | Where-Object IsActive).ElementName
    It "Should be set to High Performance" {
        $PowerPlan | Should -be "High Performance" -Because "This Power Plan increases performance"
    }
}

The best practice for SQL Server disks is to have an allocation unit size of 64 KB, here is the check:

Describe "File Allocation Unit Size" {    
    $BlockSize = (Get-CimInstance -ClassName Win32_Volume | Where-Object DriveLetter -eq $SQLDisk).BlockSize
    It "Should be 64 KB" {
        $BlockSize | Should -Be 65536 -Because "It is recommended to set a File Allocation Unit Size value to 64 KB on partitions where resides SQL Server data or log files"
    }
}

Here I used the dbatools command Get-DbaErrorLogConfig to get the number of files configured for my ErrorLog. My best practice is to have 30 files instead of 6 by default.

Describe "SQL Server Error Log Files" {
    $errorLogCount = (Get-DbaErrorLogConfig -SqlInstance $SQLInstance).LogCount
    It "Should have Number of Log files set to 30" {
        $errorLogCount | Should -Be 30 -Because "Best practices requires 30 logs files to perform daily recycling"
    }
}

When all put together the output of the Tests looks like this:
As you can see I can easily validate that my SQL Server infrastructure is configured as expected.

Code Coverage

Code Coverage is the percentage of lines of code that is tested by unit tests.
It’s an indicator of how thoroughly your code has been tested. Having 100% coverage doesn’t mean that the code is bug-free, it just indicates that all your code is being executed during the test.

If I add some code to my Get-ReverseString.ps1 file, the -CodeCoverage functionality will tell me exactly what is not covered by tests:

Conclusion

This blog post was just to get you started learning Pester. There are a lot more possibilities with Pester. I might cover more advanced usage in a future post, like TestDrive or Mocking.
Here are some resources I’d recommend:

You can find all the code from this blog post on GitHub.

Cet article An Introduction to Pester – Unit Testing and Infrastructure checks in PowerShell est apparu en premier sur Blog dbi services.


Database announcements at re:Invent 2020


By Franck Pachot.
This year is not very nice for conferences as everything is virtual and we miss the most important part: meeting and sharing with people. But AWS re:Invent is actually a great experience. As an AWS Data Hero, I received an Oculus Quest 2 to teleport to the virtual Neon City where we can meet and have fun in Virtual Reality (but with incredibly real-life chatting):

There are 3 important new launches announced around databases: Babelfish for Aurora, Aurora Serverless v2 and AWS Glue Elastic Views, but let's start with a recap of this year's pre-re:Invent features.

We have more regions, with one even planned in Switzerland, and also more cloud-at-customer solutions, like RDS on Outposts in addition to RDS on VMware. We got new versions: PostgreSQL 12, MariaDB 10.5 and SQL Server 2019 (SQL Server even came with SSRS), as well as recent Release Updates for Oracle (July 2020).
About new features from 2020: we can export RDS snapshots to S3 in Parquet format, we can share an AD with RDS from multiple VPCs, and we have connection pooling with RDS Proxy (session-state aware). SQL Server supports parallel backups, Oracle supports backups to other regions, RDS can use Always On for SQL Server read replicas, and Oracle does not need the Active Data Guard option when the replica is not used for read workloads. Talking about licenses, there's License Manager for Oracle to help manage them. There are also the new Graviton2 processors for RDS PostgreSQL and MySQL.

All of that was about relational databases. There are also new features in NoSQL databases, like DynamoDB export to S3 and PartiQL queries. But let's now go to the new launches.

AWS Glue Elastic Views

I mentioned that we can query NoSQL DynamoDB tables with a SQL-like API, PartiQL. Now those PartiQL queries can do more: continuous queries that propagate data and changes, like materialized views. This event sourcing is based on CDC (not Streams). It propagates changes in near real-time (asynchronously, and it can be throttled by the target capacity) to multiple destinations: Elasticsearch for search, S3 for a data lake, Redshift for analytics. A nice serverless solution for CQRS: DynamoDB for ingest and OLTP, with propagation to purpose-built services for the queries that cannot be done in the NoSQL operational database. And it is serverless: billed per second of compute and volume of storage.

Currently, those materialized views support only selection and projection, but hopefully, in the future, they will be able to maintain aggregations with GROUP BY. As I’m not a fan of writing procedural code to process data, I really like materialized views for replication, rather than triggers and lambdas.
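Since the views currently support only selection and projection, maintaining a target from a change stream is conceptually a fold over CDC events. Here is a toy Python sketch of the idea; the event shape and all names are my assumptions, not the Glue API:

```python
def apply_cdc_event(view, event, predicate, columns):
    """Maintain a selection/projection 'materialized view' from one CDC event.

    view      : dict mapping primary key -> projected row
    event     : {'op': 'insert'|'update'|'delete', 'key': ..., 'row': {...}}
    predicate : the selection, row -> bool
    columns   : the projection, list of column names to keep
    """
    if event['op'] == 'delete':
        view.pop(event['key'], None)
        return
    row = event['row']
    if predicate(row):                                     # selection
        view[event['key']] = {c: row[c] for c in columns}  # projection
    else:                                                  # row no longer qualifies
        view.pop(event['key'], None)

view = {}
is_open = lambda r: r['status'] == 'open'
apply_cdc_event(view, {'op': 'insert', 'key': 1,
                       'row': {'id': 1, 'status': 'open', 'note': 'x'}},
                is_open, ['id', 'status'])
apply_cdc_event(view, {'op': 'update', 'key': 1,
                       'row': {'id': 1, 'status': 'closed', 'note': 'x'}},
                is_open, ['id', 'status'])
print(view)  # {} -- the update moved the row out of the view
```

An aggregation (GROUP BY) would additionally require keeping per-group state, which is exactly what is not supported yet.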

Aurora Serverless v2

You don't want to pre-plan the capacity, but have your database server scale up, out, and down according to the load? That's serverless. You don't provision servers, but capacity units: Aurora Capacity Units (ACU). Rather than multiplying the capacity when needed by changing the instance size, the new Aurora Serverless v2 elasticity has a granularity of 0.5 ACU: you start by provisioning 0.5 ACU (not zero, because you don't want to wait seconds on the first connection after being idle). Compared with v1 (which is still available), the starting capacity is lower, the increment is finer, and the scale-down happens in minutes rather than after a 15-minute cool-down. And it has all Aurora features: Multi-AZ, Global Database, Proxy,… Basically, this relies on the ability to add vCPU and memory online, and to reduce them again (this includes shrinking the buffer pool according to the LRU). This means scaling up and down in place as long as it is possible (which depends on the neighbors' activity on the same VM). It can scale out as well within the compute fleet, and move to another VM if needed, but the goal is to be able to scale up in place most of the time.

Releasing idle CPU is easy, but knowing how much RAM can be released without significantly increasing I/O and response time is probably more challenging. Anyway, we can expect min/max controls for it. The goal is not to replace capacity planning, but to be more elastic with unplanned workloads.

You have the choice to migrate to v2, but look at the price. The ACU is more expensive, but given the elasticity, you probably save a lot (start lower, increase by smaller steps, decrease sooner).
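To see why you probably save despite the higher unit price, compare how much capacity each version provisions for the same demand: v1 jumps in powers of two, while v2 adds 0.5 ACU steps. A sketch under those assumptions (function names are mine):

```python
import math

def v1_capacity(demand_acu, start=1):
    """Serverless v1 scales in powers of two from the starting capacity."""
    cap = start
    while cap < demand_acu:
        cap *= 2
    return cap

def v2_capacity(demand_acu, step=0.5):
    """Serverless v2 scales in 0.5 ACU increments."""
    return math.ceil(demand_acu / step) * step

demand = 9  # ACUs actually needed by the workload
print(v1_capacity(demand))  # 16  -> next power of two, 7 ACU over-provisioned
print(v2_capacity(demand))  # 9.0 -> next 0.5 increment, almost exact
```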

Babelfish

This is the most revolutionary launch, in my opinion. We want polyglot databases, and not only to have the choice of language or API for new developments. Many databases run applications, like ERPs, that are tied to a specific commercial database. Companies want to get out of this vendor lock-in, but migrating those applications is impossible. They use specific behaviour, or code, in the database, and they do it for a reason: the agility and performance of processing data within the database. The business logic is tied to the data, for consistency and performance, in stored procedures. There have been many attempts to translate the code, but this only works partially. And that's not sufficient for enterprise software: rewriting is easy, but testing… who will sign off the UAT validation that business code, working for years in one database engine, has been rewritten to show the same behaviour?

This is different when there is no application change at all, and that's the idea of Babelfish, starting with SQL Server compatibility in Aurora. Given the powerful extensibility of PostgreSQL, AWS has built extensions to understand T-SQL and SQL Server-specific datatype behaviour. They also add endpoints that understand the MS SQL network protocol. It can then run applications written for SQL Server, without any change besides the connection to the new endpoint. Of course, this is not easy. Each application may have specificities and need new extensions. And for this reason, AWS decided to open source this compatibility layer. Who will contribute? Look at an ISV that has an ERP running on SQL Server. It can invest in developing the compatibility with Babelfish, and then propose to its customers to move out of the commercial database, to PostgreSQL. Of course, the goal of AWS is to get them to Aurora, providing the high availability and scalability that big companies may require. But Babelfish's target is PostgreSQL, the community one.

About the target: Aurora comes in two flavors, using the upper layer from either MySQL or PostgreSQL. PostgreSQL was chosen as it is probably the most compatible with commercial databases, and provides easy extensibility in procedural languages, datatypes and extensions. About the source: it is SQL Server for the moment (a commercial reply to the licensing policy set for cloud competitors), but I'm sure Oracle will come one day. Probably not 100% compatible, given its complexity, but the goal of an ISV is to provide 100% compatibility for one application. And, once compatibility is there, the database is also accessible through the native PostgreSQL API for further developments.

I'm looking forward to seeing how this open-source project will get contributions. Aurora has a bad reputation in the PostgreSQL community: taking the community code, making money with it, and not giving back its optimizations. But Babelfish can really extend the popularity of this reliable open-source database. Contributions are not only extensions for code compatibility; I expect a lot of contributions around test cases and documentation.

I've seen a demo about T-SQL and the Money datatype. This is nice, but it's a single-user test case. I'll test concurrency as soon as I have the preview. Isolation of transactions and read/write consistency in multi-user workloads are very different in PostgreSQL and SQL Server, and a test case for compatibility acceptance is not easy.

You can expect more technical insights on this blog as soon as I have access to the preview. For the moment, let me share some pictures of the Oculus Quest 2 I got from the AWS Heroes program, and of the Neon City place where we meet:

I forgot to mention io2 Block Express, which will be very interesting for database bandwidth with 4 GB/s (and 256K IOPS if you really need that):


and the EC2 R5b instance:

This article Database announcements at re:Invent 2020 first appeared on the dbi services Blog.

Even faster data loading with PostgreSQL 14? wal_level=none


PostgreSQL is already very fast at loading large amounts of data. You can follow this post for some recommendations on loading data as fast as possible. In addition you can create unlogged tables, but that is at the table level, not the whole cluster. With this patch there will be another option: wal_level=none. With this setting, only minimal WAL is written, but of course at the cost of losing durability: if the cluster crashes in that mode, the whole cluster is corrupted and cannot be started anymore. If you accept that risk, this can be something for you, especially when doing data warehousing and load time is one of the most important factors.

To have a baseline to start with, let's create a simple file we'll use for loading a table:

postgres=# create table t1 ( a int, b text, c date );
CREATE TABLE
postgres=# insert into t1 select x, md5(x::text), now() from generate_series(1,6000000) x;
INSERT 0 6000000
postgres=# copy t1 to '/var/tmp/demo.txt';
COPY 6000000
postgres=# \! ls -lha /var/tmp/demo.txt
-rw-r--r--. 1 postgres postgres 297M Nov 23 15:51 /var/tmp/demo.txt
postgres=# 

My current wal_level is replica, so let's change that to minimal:

postgres=# alter system set wal_level = minimal;
ALTER SYSTEM
postgres=# \! pg_ctl restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-23 15:53:23.424 CET - 1 - 209537 -  - @ LOG:  redirecting log output to logging collector process
2020-11-23 15:53:23.424 CET - 2 - 209537 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# show wal_level;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# show wal_level;
 wal_level 
-----------
 minimal
(1 row)

How long does it take to load that file into a new table with wal_level=minimal and how much WAL was generated?

postgres=# create table t2 ( like t1 );
CREATE TABLE
postgres=# \timing
Timing is on.
postgres=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/39872628
(1 row)

Time: 0.757 ms
postgres=# copy t2 from '/var/tmp/demo.txt';
COPY 6000000
Time: 10008.335 ms (00:10.008)
postgres=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/4C693DD8
(1 row)

Time: 0.857 ms
postgres=# select pg_wal_lsn_diff('0/4C693DD8','0/39872628');
 pg_wal_lsn_diff 
-----------------
       316807088
(1 row)
Time: 2.714 ms
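pg_wal_lsn_diff is plain arithmetic on the two-part LSN format (a high and a low 32-bit hex word separated by a slash); as a sketch, a hypothetical Python equivalent of the calculation above:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '0/4C693DD8' to an absolute byte position."""
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

def lsn_diff(a: str, b: str) -> int:
    """Bytes of WAL between two LSNs, like pg_wal_lsn_diff(a, b)."""
    return lsn_to_bytes(a) - lsn_to_bytes(b)

print(lsn_diff('0/4C693DD8', '0/39872628'))  # 316807088 bytes, ~316 MB of WAL
```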

The load took around 10 seconds and generated around 316 MB of WAL. How does that change if we go with wal_level=none?

postgres=# alter system set wal_level = none;
ALTER SYSTEM
Time: 28.625 ms
postgres=# \! pg_ctl restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....2020-11-23 16:00:25.648 CET - 1 - 209599 -  - @ LOG:  redirecting log output to logging collector process
2020-11-23 16:00:25.648 CET - 2 - 209599 -  - @ HINT:  Future log output will appear in directory "pg_log".
 done
server started
postgres=# show wal_level;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
Time: 21.286 ms
postgres=# show wal_level;
 wal_level 
-----------
 none
(1 row)

Time: 1.251 ms

Same test as above:

postgres=# create table t3 ( like t1 );
CREATE TABLE
Time: 44.676 ms
postgres=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/4CA0A550
(1 row)

Time: 7.053 ms
postgres=# copy t3 from '/var/tmp/demo.txt';
COPY 6000000
Time: 7968.204 ms (00:07.968)
postgres=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/4CA0A550
(1 row)

Time: 0.948 ms
postgres=# select pg_wal_lsn_diff('0/4CA0A550','0/4CA0A550');
 pg_wal_lsn_diff 
-----------------
               0
(1 row)

Time: 3.857 ms
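Putting the two COPY timings side by side quantifies the gain; a quick back-of-envelope calculation on the numbers from the two runs:

```python
with_wal = 10008.335    # ms, COPY with wal_level=minimal
without_wal = 7968.204  # ms, COPY with wal_level=none

print(round(with_wal / without_wal, 2))           # 1.26 -> ~1.26x faster
print(round(100 * (1 - without_wal / with_wal)))  # ~20% less load time
```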

We are down to about 8 seconds and no WAL generated at all. That means faster loading and no space consumption in the pg_wal directory. Really nice, but be aware that the cluster gets corrupted if it crashes during loading:

postgres=# \! ps -ef | grep "postgres -D"
postgres  209599       1  0 16:00 ?        00:00:00 /u01/app/postgres/product/DEV/db_1/bin/postgres -D /u02/pgdata/DEV
postgres  209644  209534  0 16:04 pts/1    00:00:00 sh -c ps -ef | grep "postgres -D"
postgres  209646  209644  0 16:04 pts/1    00:00:00 grep postgres -D
postgres=# create table t4 ( like t1 );
CREATE TABLE
Time: 3.731 ms
postgres=# copy t4 from '/var/tmp/demo.txt';
COPY 6000000
Time: 8070.995 ms (00:08.071)

In another session kill the postmaster while the load is running:

postgres@centos8pg:/home/postgres/ [pgdev] kill -9 209599

If you try to restart the cluster afterwards this is the result:

2020-11-23 16:05:17.441 CET - 1 - 210089 -  - @ LOG:  database system was interrupted; last known up at 2020-11-23 16:00:25 CET
2020-11-23 16:05:17.441 CET - 2 - 210089 -  - @ FATAL:  detected an unexpected server shutdown when WAL logging was disabled
2020-11-23 16:05:17.441 CET - 3 - 210089 -  - @ HINT:  It looks like you need to deploy a new cluster from your full backup again.
2020-11-23 16:05:17.444 CET - 7 - 210087 -  - @ LOG:  startup process (PID 210089) exited with exit code 1
2020-11-23 16:05:17.444 CET - 8 - 210087 -  - @ LOG:  aborting startup due to startup process failure
2020-11-23 16:05:17.449 CET - 9 - 210087 -  - @ LOG:  database system is shut down

If you can accept that, this can be a huge step toward loading data faster than is possible now.

This article Even faster data loading with PostgreSQL 14? wal_level=none first appeared on the dbi services Blog.

Incremental materialized view maintenance for PostgreSQL 14?


Since PostgreSQL 9.3 it has been possible to create materialized views in PostgreSQL. PostgreSQL 9.4 (one year later) brought concurrent refresh, which already was a major step forward, as it allowed querying the materialized view while it is being refreshed. What is still missing are materialized views which refresh themselves as soon as there are changes to the underlying tables. This might change with PostgreSQL 14, as this patch is in active development (at least since mid-2019). Let's have a look at how that currently works and what the limitations are. If you want to play with this yourself and do not want to apply the patches: there is a Docker container you can use for your testing as well.

If you want to have a materialized view that is incrementally updated you need to specify this when the materialized view is created:

postgres=# \h create materialized view
Command:     CREATE MATERIALIZED VIEW
Description: define a new materialized view
Syntax:
CREATE [ INCREMENTAL ] MATERIALIZED VIEW [ IF NOT EXISTS ] table_name
    [ (column_name [, ...] ) ]
    [ USING method ]
    [ WITH ( storage_parameter [= value] [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    AS query
    [ WITH [ NO ] DATA ]

URL: https://www.postgresql.org/docs/devel/sql-creatematerializedview.html

If you skip "INCREMENTAL", the materialized view will not be updated automatically and you get the behavior as it is now. As we want to have a look at the new feature, let's create a base table and then add an incrementally updated materialized view on top of it:

postgres=# create table t1 ( a int, b text, c date );
CREATE TABLE
postgres=# insert into t1 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
postgres=# create incremental materialized view mv1 as select * from t1 with data;
SELECT 1000000
postgres=# 

“\d+” will show you that this materialized view is incrementally updated:

postgres=# \d+ mv1
                              Materialized view "public.mv1"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 a      | integer |           |          |         | plain    |              | 
 b      | text    |           |          |         | extended |              | 
 c      | date    |           |          |         | plain    |              | 
View definition:
 SELECT t1.a,
    t1.b,
    t1.c
   FROM t1;
Access method: heap
Incremental view maintenance: yes

If we update the underlying table, the materialized view gets updated automatically:

postgres=# insert into t1 (a,b,c) values(-1,'aaa',now());
INSERT 0 1
postgres=# select * from mv1 where a = -1;
 a  |  b  |     c      
----+-----+------------
 -1 | aaa | 2020-11-23
(1 row)

postgres=# update t1 set a = -2 where a = -1;
UPDATE 1
postgres=# select * from mv1 where a = -2;
 a  |  b  |     c      
----+-----+------------
 -2 | aaa | 2020-11-23
(1 row)

postgres=# 

That's really cool, but you need to be aware that this comes with a cost: modifying (insert/update/delete) the underlying table(s) becomes more expensive. Let's compare a small bulk load into a table without a materialized view on top of it against the same load into a table with one:

postgres=# truncate table t1;
TRUNCATE TABLE
postgres=# create table t2 ( a int, b text, c date );
CREATE TABLE
postgres=# \timing
Timing is on.
postgres=# insert into t1 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 3214.712 ms (00:03.215)
postgres=# insert into t2 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 1285.578 ms (00:01.286)
postgres=# insert into t1 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 4117.097 ms (00:04.117)
postgres=# insert into t2 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 1511.681 ms (00:01.512)
postgres=# insert into t1 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 3844.273 ms (00:03.844)
postgres=# insert into t2 select x, x::text, now() from generate_series(1,1000000) x;
INSERT 0 1000000
Time: 1463.377 ms (00:01.463)

Without a materialized view, the load is around 3 times faster, so you have to decide what is more important to you: fast loading, or up-to-date materialized views.
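Averaging the three runs of each table gives the actual overhead factor:

```python
with_mv = [3214.712, 4117.097, 3844.273]     # ms, inserts into t1 (has an incremental MV)
without_mv = [1285.578, 1511.681, 1463.377]  # ms, inserts into t2 (no MV)

overhead = (sum(with_mv) / len(with_mv)) / (sum(without_mv) / len(without_mv))
print(round(overhead, 2))  # 2.62 -- roughly a factor of 3
```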

Finally: Here is the Wiki page that summarizes the feature and also lists some limitations.

This article Incremental materialized view maintenance for PostgreSQL 14? first appeared on the dbi services Blog.

Aurora Serverless v2 (preview) – RAM


By Franck Pachot.
What is Aurora Serverless? That's the RDS Aurora name for auto-scaling: instead of provisioning an instance size (from the burstable db.t3.small with 2 vCPU and 2 GB RAM to db.r5.16xlarge with 64 vCPU and 512 GB RAM) you define a range in terms of ACU (Aurora Capacity Units). An ACU is a combination of CPU and RAM. This blog post will focus on RAM.

Aurora Serverless v1

In Serverless v1 the ACU goes from 1 (2 GB RAM) to 256 (488 GB RAM) and the granularity is in powers of two: each scale-up doubles the instance. You can also opt for a minimum capacity of 0, where the instance is stopped when unused (no connections for 5 minutes), but then you accept that the first connection after that takes a few minutes, to start up again at the minimum capacity defined. Scaling happens on measured metrics like CPU (scale up when >70%, down when <30%) and the number of connections (as a percentage of the maximum, which is itself derived from the instance RAM). I don't think the RAM threshold considers the usage of the shared buffer pool in Serverless v1. Aurora tries to scale up as soon as one threshold is reached, but scale-down requires both CPU and connections to be below their thresholds, and cannot happen within 15 minutes after a scale-up (the cool-down period). And, as scaling in Serverless v1 means stopping the database and starting it in another VM, it tries to do this outside of active connections, and may time out or force the operation (your choice).
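The asymmetry described above (scale up as soon as one metric crosses its threshold, scale down only when both are low and the cool-down has passed) can be sketched as follows; the function shape and names are mine, with the thresholds from the text:

```python
def v1_scaling_decision(cpu_pct, conn_pct, minutes_since_scale_up,
                        up=70, down=30, cooldown=15):
    """Aurora Serverless v1 style scaling decision, as described in the text."""
    if cpu_pct > up or conn_pct > up:      # any metric high -> scale up
        return 'scale-up'
    if (cpu_pct < down and conn_pct < down
            and minutes_since_scale_up >= cooldown):  # all low, cool-down over
        return 'scale-down'
    return 'hold'

print(v1_scaling_decision(75, 20, 5))   # scale-up: CPU alone is enough
print(v1_scaling_decision(20, 20, 5))   # hold: still in the cool-down period
print(v1_scaling_decision(20, 20, 20))  # scale-down: both low, cool-down over
```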

Serverless v1 will still be available. There are currently some features that are not available in v2, like PostgreSQL compatibility or the Data API, but they will come, and I suppose v2 will replace v1 one day.

Aurora Serverless v2 (preview)

This new service has a finer grain of auto-scaling. With server virtualization, there's the possibility to increase the number of vCPUs of a VM without a restart, and MySQL 5.7.5 can resize the buffer pool online. This gives a finer grain in scaling up and down (an announced granularity of 0.5 ACU), and without waiting. The preview goes from 4 ACU (8 GB) to 32 ACU (64 GB), but the plan is for the minimum to be as low as 0.5 ACU. You will then probably not opt for stopping the instance, to avoid cold-start latency, but keep it low at 0.5 ACU so the database is immediately available when a connection comes. And the granularity is an addition of 0.5 ACU rather than doubling the instance. So, even if the ACU is more expensive in v2, you probably consume less. And a scale-down doesn't have to wait 15 minutes to cut the capacity by half, as it can be decreased progressively online. Of course, having the instance restarted is still a possibility if there's no vCPU available on the shared host, but that should not happen often.

Here is an example where I created an 8 GB demo table:


--------------
create procedure demo(n int)
begin
 declare i int default 0;
 create table demo (id int not null primary key auto_increment, n int,x varchar(1000));
 insert into demo(n,x) values (1,lpad('x',898,'x'));
 while i < n do
  set i = i + 1;
  insert into demo(n,x) select n,x from demo;
  commit;
 end while;
end
--------------

--------------
call demo(23)
--------------
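Each pass of the WHILE loop re-inserts every existing row, doubling the table, so `call demo(23)` ends with 2^23 rows. A quick sanity check of the size (the roughly 1 KB per row is my estimate from the 898-character padding plus row overhead):

```python
rows = 1                     # the procedure seeds demo with a single row
for _ in range(23):          # each iteration: insert ... select ... from demo
    rows *= 2                # doubles the row count
print(rows)                  # 8388608 rows (2**23)
print(rows * 1024 // 2**30)  # ~8 GB at roughly 1 KB per row
```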

VM movement

During this creation I experienced multiple scale-downs (after the instance creation) and scale-ups (during the row ingestion), and you can see that in this case the VM was probably moved to another server and had to restart. The "Engine Uptime" in CloudWatch testifies to the restarts, and "Serverless Database Capacity" is the ACU (capacity units):

During those VM movements, I got this kind of error:


ERROR 2013 (HY000) at line 29: Lost connection to MySQL server during query

You just have to retry in this case. If you can't, then you will set some minimum/maximum ACU, or maybe go for a provisioned database.


--------------
analyze table demo
--------------
Table           Op       Msg_type   Msg_text
-----------     -------  --------   --------
franck.demo     analyze  status     OK

--------------
select round(data_length/1024/1024) TABLE_MB,round(index_length/1024/1024) INDEX_MB,table_rows,table_name,table_type,engine from information_schema.tables where table_schema='franck'
--------------

TABLE_MB  INDEX_MB        table_rows      table_name      table_type      engine
--------  --------        ----------      ----------      ----------      ------
    8200         0           7831566            demo      BASE TABLE      InnoDB

Here I checked the size of my table: about 8GB.

buffer pool

I mentioned vCPU, but what about RAM? The VM memory can also be resized online, but there's a difference. With CPU, if you scaled down too early, the service can scale up immediately and you get back to the previous performance. But when you do that with RAM, you have evicted some data from the caches, and it will not be back immediately: the first sessions have to warm it up again. So, Serverless v2 has to look at the InnoDB LRU (Least Recently Used) buffers to estimate the risk of dropping them. I mention InnoDB because, for the moment, Aurora Serverless v2 comes with MySQL compatibility only.

On my DEMO table I’ve run the following continuously:


use franck;
set profiling = 1;
select count(*) from demo where x='$(date) $RANDOM';
show profiles;

I ran that in a loop, so one session was continuously active, reading 8 GB (the predicate does not filter anything and is there just to run a different query each time, as I want to show the effect on the buffer pool and not the query cache).

Then, from 18:00 to 18:23 approximately I have run another session:


use franck;
delimiter $$
drop procedure if exists cpu;
create procedure cpu()
begin
 declare i int default 0;
 while 1  do
  set i = i + 1;
 end while;
end$$
delimiter ;
call cpu();

Please don't judge me on my MySQL procedural code 😉 I'm just looping on CPU.
Then, after 20 minutes:


MySQL [(none)]> show full processlist;

--------------
show full processlist
--------------

+-----+----------+--------------------+--------+---------+------+--------------+------------------------------------------------------------------------+
| Id  | User     | Host               | db     | Command | Time | State        | Info                                                                   |
+-----+----------+--------------------+--------+---------+------+--------------+------------------------------------------------------------------------+
|  42 | admin    | 192.169.29.1:22749 | franck | Query   |    0 | NULL         | call cpu()                                                             |
| 302 | rdsadmin | localhost          | NULL   | Sleep   |    1 | NULL         | NULL                                                                   |
| 303 | rdsadmin | localhost          | NULL   | Sleep   |    0 | NULL         | NULL                                                                   |
| 304 | rdsadmin | localhost          | NULL   | Sleep   |    0 | NULL         | NULL                                                                   |
| 305 | rdsadmin | localhost          | NULL   | Sleep   |    0 | NULL         | NULL                                                                   |
| 339 | admin    | 192.169.29.1:30495 | NULL   | Query   |    0 | starting     | show full processlist                                                  |
| 342 | admin    | 192.169.29.1:42711 | franck | Query   |    3 | Sending data | select count(*) from demo where x='Sun Dec  6 18:23:25 CET 2020 28911' |
+-----+----------+--------------------+--------+---------+------+--------------+------------------------------------------------------------------------+
7 rows in set (0.00 sec)

MySQL [(none)]> kill 42;
--------------
kill 42
--------------

I’ve stopped my running loop.
And here is what I can see from CloudWatch:

  • DB Connections: I always have one connection busy most of the time, for the repeated scans on DEMO, and during 20 minutes a second one (my CPU procedure). The third one at the end is when I connected to kill the session
  • Serverless Database Capacity: this is the number of ACUs. The value was 6 when only the scan was running, and scaled up to 11 when the CPU session was running
  • CPU Utilization (Percent): this is the percentage of the number of OS threads. 13.6% when only the scan session was running, reaching 23% during the additional CPU run. Note that the number of OS threads was not increased when the ACU scaled from 6 to 11… I'll come back to that later
  • Engine Uptime: increasing by 1 minute every minute, which means that all scaling up/down was done without a restart of the engine

Now, something that you don't see in the CloudWatch metrics is the response time for my DEMO table scans.


Dec 06 17:52:37 1       79.03579950     select count(*) from demo where x='Sun Dec  6 17:51:17 CET 2020 13275'
Dec 06 17:53:59 1       79.83154300     select count(*) from demo where x='Sun Dec  6 17:52:37 CET 2020 4418'
Dec 06 17:55:20 1       79.81895825     select count(*) from demo where x='Sun Dec  6 17:53:59 CET 2020 11596'
Dec 06 17:56:40 1       78.29040100     select count(*) from demo where x='Sun Dec  6 17:55:20 CET 2020 25484'
Dec 06 17:58:02 1       80.15728125     select count(*) from demo where x='Sun Dec  6 17:56:40 CET 2020 15321'
Dec 06 17:59:42 1       98.29309550     select count(*) from demo where x='Sun Dec  6 17:58:02 CET 2020 31126'
Dec 06 18:01:09 1       85.07732725     select count(*) from demo where x='Sun Dec  6 17:59:42 CET 2020 29792'
Dec 06 18:02:30 1       79.16154650     select count(*) from demo where x='Sun Dec  6 18:01:09 CET 2020 21930'
Dec 06 18:02:34 1       2.81377450      select count(*) from demo where x='Sun Dec  6 18:02:30 CET 2020 12269'
Dec 06 18:02:38 1       2.77996150      select count(*) from demo where x='Sun Dec  6 18:02:34 CET 2020 30306'
Dec 06 18:02:42 1       2.73756325      select count(*) from demo where x='Sun Dec  6 18:02:38 CET 2020 22678'
Dec 06 18:02:47 1       2.77504400      select count(*) from demo where x='Sun Dec  6 18:02:42 CET 2020 933'
Dec 06 18:02:51 1       2.73966275      select count(*) from demo where x='Sun Dec  6 18:02:47 CET 2020 21922'
Dec 06 18:02:56 1       2.87023975      select count(*) from demo where x='Sun Dec  6 18:02:51 CET 2020 9158'
Dec 06 18:03:00 1       2.75959675      select count(*) from demo where x='Sun Dec  6 18:02:56 CET 2020 31710'
Dec 06 18:03:04 1       2.72658975      select count(*) from demo where x='Sun Dec  6 18:03:00 CET 2020 27248'
Dec 06 18:03:09 1       2.71731325      select count(*) from demo where x='Sun Dec  6 18:03:04 CET 2020 18965'

More than one minute to scan those 8 GB: that's about 100 MB/s, which is what we can expect from physical reads. 8 GB doesn't fit in a 6 ACU instance. Then, when I started the other session, which triggered auto-scaling to 11 ACU, the memory became large enough, and this is why the response time for the scan dropped to less than 3 seconds. I mentioned that I'd come back to CPU usage, because it's not easy to do the maths without looking at the OS; that will deserve another blog post. I saw 13.6% CPU utilization when the count was running alone. There was I/O involved here, but as far as I know 13.6% on 6 ACU is the equivalent of one session running on CPU, so the scan was probably not throttled by I/O. Then I added another session, which I know was running fully on CPU, and the scan was also running fully on CPU (all from the buffer pool) during that time. I had 23% CPU utilization, and I think that at 11 ACU two sessions fully on CPU take 26%. I'll do more tests to try to confirm this. I miss Performance Insights here to get the real picture…
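The throughput figure is easy to check against the table size reported by information_schema (8200 MB) and the roughly 80-second cold scans above:

```python
table_mb = 8200    # data_length reported by information_schema.tables
scan_seconds = 80  # typical elapsed time of the cold (physical-read) scans

print(table_mb / scan_seconds)  # 102.5 MB/s, physical-read territory
```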

Then, as you have seen that the scale-down happened when I stopped my concurrent session, you can imagine the response time:


Dec 06 18:24:23 1       3.03963650      select count(*) from demo where x='Sun Dec  6 18:24:17 CET 2020 2815'
Dec 06 18:24:27 1       2.80417800      select count(*) from demo where x='Sun Dec  6 18:24:23 CET 2020 16763'
Dec 06 18:24:31 1       2.77208025      select count(*) from demo where x='Sun Dec  6 18:24:27 CET 2020 29473'
Dec 06 18:24:36 1       3.13085700      select count(*) from demo where x='Sun Dec  6 18:24:31 CET 2020 712'
Dec 06 18:24:41 1       2.77904025      select count(*) from demo where x='Sun Dec  6 18:24:36 CET 2020 17967'
Dec 06 18:24:45 1       2.76111900      select count(*) from demo where x='Sun Dec  6 18:24:41 CET 2020 20407'
Dec 06 18:24:49 1       2.79092475      select count(*) from demo where x='Sun Dec  6 18:24:45 CET 2020 7644'
Dec 06 18:26:17 1       85.68287300     select count(*) from demo where x='Sun Dec  6 18:24:49 CET 2020 691'
Dec 06 18:27:40 1       81.58135400     select count(*) from demo where x='Sun Dec  6 18:26:17 CET 2020 14101'
Dec 06 18:29:02 1       80.00523900     select count(*) from demo where x='Sun Dec  6 18:27:40 CET 2020 31646'
Dec 06 18:30:22 1       78.79213700     select count(*) from demo where x='Sun Dec  6 18:29:02 CET 2020 811'
Dec 06 18:31:42 1       78.37765950     select count(*) from demo where x='Sun Dec  6 18:30:22 CET 2020 24539'
Dec 06 18:33:02 1       78.64492525     select count(*) from demo where x='Sun Dec  6 18:31:42 CET 2020 789'
Dec 06 18:34:22 1       78.36776750     select count(*) from demo where x='Sun Dec  6 18:33:02 CET 2020 2321'
Dec 06 18:35:42 1       78.38105625     select count(*) from demo where x='Sun Dec  6 18:34:22 CET 2020 27716'
Dec 06 18:37:04 1       79.74060525     select count(*) from demo where x='Sun Dec  6 18:35:42 CET 2020 487'

Yes, down to 6 ACU, the buffer pool is shrunk, and we are back to physical I/O…

This is the risk when CPU and RAM are scaled in proportion: a single thread may not have enough RAM to save on CPU and, paradoxically, can get more when there is more concurrent activity. Here is the kind of scenario I would not like to encounter:

ACU and Buffer Pool size

I’ve seen 0 ACU only during the creation, but in this preview there is no “pause” option and we can only scale between 4 and 32 ACU.

I have tested all of them to check the related settings. You see ACU and max_connections on the x-axis, and the VM size as well as the buffer pool size and query cache size:

Here you see what happened with my 8GB demo table: it didn’t fit in the 6 ACU shape, where the buffer pool is 6GB, but stayed in memory in the 11 ACU shape with 13.5 GB.
Here are the interesting buffer pool settings that are different in Serverless:


--------------
show global variables like '%innodb%buffer_pool%'
--------------

Variable_name   Value
innodb_buffer_pool_chunk_size   157286400
innodb_buffer_pool_dump_at_shutdown     OFF
innodb_buffer_pool_dump_now     OFF
innodb_buffer_pool_dump_pct     25
innodb_buffer_pool_filename     ib_buffer_pool
innodb_buffer_pool_instances    8
innodb_buffer_pool_load_abort   OFF
innodb_buffer_pool_load_at_startup      OFF
innodb_buffer_pool_load_now     OFF
innodb_buffer_pool_size 3774873600
innodb_shared_buffer_pool_uses_huge_pages       OFF
--------------

Huge pages are off in Serverless. innodb_shared_buffer_pool_uses_huge_pages is not a MySQL parameter but an Aurora-specific one, which is ON in the provisioned flavor of Aurora but OFF in Serverless. This makes sense given how Serverless allocates and de-allocates memory.
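The values above are also consistent with standard MySQL behavior: the InnoDB buffer pool is resized in units of innodb_buffer_pool_chunk_size multiplied by innodb_buffer_pool_instances. A small sketch checking the numbers from the variables dump (how Aurora Serverless actually resizes internally is not documented, so this is only a consistency check):

```python
# MySQL resizes the InnoDB buffer pool in units of chunk_size * instances.
chunk_size = 157286400       # innodb_buffer_pool_chunk_size (150 MB)
instances  = 8               # innodb_buffer_pool_instances
pool_size  = 3774873600      # innodb_buffer_pool_size from the dump above

resize_unit = chunk_size * instances      # granularity of a resize step
assert pool_size % resize_unit == 0       # the pool is a whole number of units
print(pool_size // resize_unit)           # 3 resize units
print(round(resize_unit / 1024**3, 2))    # ~1.17 GB per step
```

So this shape holds exactly 3 resize units of roughly 1.17 GB, a granularity that fits well with a buffer pool that grows and shrinks on auto-scaling.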

ACU and CPU

As I mentioned, this deserves a new blog post. Amazon does not give the ACU to vCPU equivalence. And, given the CPU Utilization percentages I see, I don’t think that the VM is resized, except when we see that the engine is restarted.

The price

About the price, I let you read Jeremy Daly’s analysis: https://www.jeremydaly.com/aurora-serverless-v2-preview/
My opinion: serverless is a feature that the cloud provider gives you to lower their own revenue, so of course it has to be more expensive. The cloud provider must keep a margin of idle CPU in order to absorb your scale-out without moving the VM (which would take more time and flush memory, compromising availability). You pay more when busy, but you save on idle time without risking saturation at peak.

And anyway, don’t forget that Serverless is an option. It may fit your needs or not. If you don’t want the buffer pool effect that I’ve described above, you can provision an instance where you know exactly how much RAM you have. And don’t forget this is a preview, like a beta, and anything can change. The next post gives more details about CPU Utilization: https://blog.dbi-services.com/aurora-serverless-v2-cpu/

The article Aurora Serverless v2 (preview) – RAM appeared first on the dbi services blog.

Aurora Serverless v2 (preview) – CPU

By Franck Pachot

This follows my previous post https://blog.dbi-services.com/aurora-serverless-v2-ram/ which you should read before this one. There I was looking at the auto-scaling of RAM, and it is now time to look at CPU Utilization.

I have created an Aurora Serverless v2 database (please don’t forget it is the beta preview) with auto-scaling from 4 ACU to 32 ACU. Earlier I was looking at a table scan to show how the buffer pool is dynamically resized with auto-scaling. Here I’ll run the same cpu() procedure in one, then two, then three… concurrent sessions to show auto-scaling and the related metrics.

Here is the global workload in number of queries per second (I have installed PMM on AWS in a previous post so let’s use it):


And the summary of what I’ve run, with the auto-scaled capacity and the CPU utilization measured:


10:38 1 session  running,  6 ACU , 14% CPU usage
10:54 2 sessions running, 11 ACUs, 26% CPU usage
11:09 3 sessions running, 16 ACUs, 39% CPU usage
11:25 4 sessions running, 21 ACUs, 50% CPU usage
11:40 5 sessions running, 26 ACUs, 63% CPU usage
11:56 6 sessions running, 31 ACUs, 75% CPU usage
12:12 7 sessions running, 32 ACUs, 89% CPU usage
12:27 8 sessions running, 32 ACUs, 97% CPU usage

The timestamps show when I started one more session running on CPU, so that we can match with the CloudWatch metrics. From there, it looks like the Aurora database engine runs on an 8 vCPU machine and that the increase in ACU did not dynamically change the number of OS threads the “CPU Utilization” metric is based on.
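A small sanity check of that 8 vCPU hypothesis against the table above (the host size is my inference, not an AWS-documented figure): if the host always exposes 8 threads, CPU Utilization should stay close to sessions/8 whatever the auto-scaled ACU.

```python
# Measured CPU Utilization per number of CPU-bound sessions (from the table).
observed = {1: 14, 2: 26, 3: 39, 4: 50, 5: 63, 6: 75, 7: 89, 8: 97}
for sessions, measured_pct in observed.items():
    expected_pct = 100 * sessions / 8      # model: fixed pool of 8 threads
    assert abs(expected_pct - measured_pct) <= 5   # within a few percent
    print(f"{sessions} sessions: expected {expected_pct:.1f}%, measured {measured_pct}%")
```

Every measurement lands within a few percent of the fixed-8-threads model, which is why a dynamically resized thread count looks unlikely here.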

Here are the details from CloudWatch:

The metrics are:

  • Serverless Capacity Units on top-left: the auto-scaled ACU from 4 to 32 (in the preview), with a granularity of 0.5
  • CPU Utilization on top-right: the sessions running on CPU as a percentage of the available threads
  • Engine Uptime on bottom-left: there was no restart during those runs
  • DB Connections on bottom-right: I had 4 idle sessions before starting, so subtract 4 and you have the sessions running

With 8 sessions in CPU, I’ve saturated the CPU and, as we reached 100%, my guess is that those are 8 cores, not hyperthreaded. As this is 32 ACUs, this would mean that an ACU is 1/4th of a core, but…

Here is the same metric I displayed from PMM, but here from CloudWatch, to look again how the workload scales:

If ACUs were proportional to the OS cores, I would expect linear scalability, which is not the case. One session runs at 1.25M queries per second on 6 ACUs. Two sessions run at 1.8M queries per second on 11 ACUs. Three sessions at 2.5M queries/s on 16 ACUs. So the math is not so simple. Does this mean that 16 ACUs do not offer the same throughput as two times 8 ACUs? And 8 vCPU with 64 GB: does that mean that when I start a serverless database with a 32 ACU maximum, it runs on a db.r5.2xlarge whatever the actual ACU it scales to? Is the VM simply provisioned for the maximum ACU and CPU-limited by cgroups or similar?
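To quantify how far from linear this is, here is the scaling efficiency per session (the throughput numbers are my approximate readings from the charts above):

```python
# Total queries/s per number of sessions, read from the charts.
throughput = {1: 1.25e6, 2: 1.8e6, 3: 2.5e6}
base = throughput[1]
for sessions, qps in throughput.items():
    efficiency = qps / (sessions * base)   # 1.0 would be linear scaling
    print(f"{sessions} sessions: scaling efficiency {efficiency:.2f}")
```

Efficiency drops from 1.00 to 0.72 and then 0.67: each added session (and the ACUs scaled up with it) delivers well under a full session’s worth of extra throughput.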

I’ve done another test, this time fixing the min and max ACU to 16. So, maybe, this is similar to provisioning a db.r5.xlarge.
And I modified my cpu() procedure to stop after 10 million loops:


delimiter $$
drop procedure if exists cpu;
-- CPU-bound busy loop: increment a counter 10 million times
create procedure cpu()
begin
 declare i int default 0;
 while i < 1e7  do
  set i = i + 1;
 end while;
end$$
delimiter ;

1 million loops take about 50 seconds on dbfiddle, and you can test it on other platforms where you have an idea of the CPU speed.

I’ve run a loop that connects, runs this procedure, displays the elapsed time, and loops again:


Dec 07 18:41:45 real    0m24.271s
Dec 07 18:42:10 real    0m25.031s
Dec 07 18:42:35 real    0m25.146s
Dec 07 18:43:00 real    0m24.817s
Dec 07 18:43:24 real    0m23.868s
Dec 07 18:43:48 real    0m24.180s
Dec 07 18:44:12 real    0m23.758s
Dec 07 18:44:36 real    0m24.532s
Dec 07 18:45:00 real    0m23.651s
Dec 07 18:45:23 real    0m23.540s
Dec 07 18:45:47 real    0m23.813s
Dec 07 18:46:11 real    0m24.295s
Dec 07 18:46:35 real    0m23.525s

This is one session, and CPU usage is 26% here (this is why I think that my 16 ACU serverless database runs on a 4 vCPU server).


Dec 07 18:46:59 real    0m24.013s
Dec 07 18:47:23 real    0m24.318s
Dec 07 18:47:47 real    0m23.845s
Dec 07 18:48:11 real    0m24.066s
Dec 07 18:48:35 real    0m23.903s
Dec 07 18:49:00 real    0m24.842s
Dec 07 18:49:24 real    0m24.173s
Dec 07 18:49:49 real    0m24.557s
Dec 07 18:50:13 real    0m24.684s
Dec 07 18:50:38 real    0m24.860s
Dec 07 18:51:03 real    0m24.988s

This is two sessions (I’m displaying the time for one only) and CPU usage is 50%, which confirms my guess: I’m using half of the CPU resources. And the response time per session is still the same as when only one session was running.


Dec 07 18:51:28 real    0m24.714s
Dec 07 18:51:53 real    0m24.802s
Dec 07 18:52:18 real    0m24.936s
Dec 07 18:52:42 real    0m24.371s
Dec 07 18:53:06 real    0m24.161s
Dec 07 18:53:31 real    0m24.543s
Dec 07 18:53:55 real    0m24.316s
Dec 07 18:54:20 real    0m25.183s

I am now running 3 sessions and the response time is still similar. I am at 75% CPU usage, so obviously I have more than 2 cores here, and no hyper-threading, or I would have seen some performance penalty when running more threads than cores.


Dec 07 18:54:46 real    0m25.937s
Dec 07 18:55:11 real    0m25.063s
Dec 07 18:55:36 real    0m24.400s
Dec 07 18:56:01 real    0m25.223s
Dec 07 18:56:27 real    0m25.791s
Dec 07 18:57:17 real    0m24.798s
Dec 07 18:57:42 real    0m25.385s
Dec 07 18:58:07 real    0m24.561s

This was with 4 sessions in total. The CPU is near 100% busy and the response time is still ok, which confirms I have 4 cores available to run that.


Dec 07 18:58:36 real    0m28.562s
Dec 07 18:59:06 real    0m30.618s
Dec 07 18:59:36 real    0m30.002s
Dec 07 19:00:07 real    0m30.921s
Dec 07 19:00:39 real    0m31.931s
Dec 07 19:01:11 real    0m32.233s
Dec 07 19:01:43 real    0m32.138s
Dec 07 19:02:13 real    0m29.676s
Dec 07 19:02:44 real    0m30.483s

One more session here. Now the CPU is at 100% and, with only 4 threads available, the 5 processes have to wait 1/5th of their time in the run queue. That wait is the increase we can see in the response time.
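The slowdown matches simple run-queue arithmetic; a sketch assuming 4 cores and the roughly 24.5 seconds of CPU time measured in the earlier runs:

```python
# 5 CPU-bound sessions sharing 4 cores: each thread is on CPU 4/5 of the
# time and waits in the run queue for the remaining 1/5.
cpu_time_s = 24.5                 # elapsed time per run with <= 4 sessions
elapsed_s  = cpu_time_s * 5 / 4   # 5 runnable threads, 4 cores
wait_fraction = 1 - cpu_time_s / elapsed_s
print(round(elapsed_s, 1))        # ~30.6 s, close to the timings above
print(round(wait_fraction, 2))    # 0.2 -> 1/5th of the elapsed time waiting
```

The predicted ~30.6 seconds is right in the middle of the 28.5 to 32.2 second range observed once the fifth session started.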

Not starting more processes, but increasing the capacity now, setting the maximum ACU to 24 which then enables auto-scaling:


...
Dec 07 19:08:02 real    0m33.176s
Dec 07 19:08:34 real    0m32.346s
Dec 07 19:09:01 real    0m26.912s
Dec 07 19:09:25 real    0m24.319s
Dec 07 19:09:35 real    0m10.174s
Dec 07 19:09:37 real    0m1.704s
Dec 07 19:09:39 real    0m1.952s
Dec 07 19:09:41 real    0m1.600s
Dec 07 19:09:42 real    0m1.487s
Dec 07 19:10:07 real    0m24.453s
Dec 07 19:10:32 real    0m25.794s
Dec 07 19:10:57 real    0m24.917s
...
Dec 07 19:19:48 real    0m25.939s
Dec 07 19:20:13 real    0m25.716s
Dec 07 19:20:40 real    0m26.589s
Dec 07 19:21:06 real    0m26.341s
Dec 07 19:21:34 real    0m27.255s

At 19:00 I increased the maximum ACU to 24 and let it auto-scale. The engine restarted at 19:09:30 and I got some errors until 19:21, when I reached the optimal response time again. I have 5 sessions running on a machine sized for 24 ACUs, which I think is 6 OS threads, so I expect 5/6 = 83% CPU utilization if all my hypotheses are right. Here are the CloudWatch metrics:
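Applying the same model as before (one OS thread per 4 ACU is my working hypothesis, not a documented ratio):

```python
# Hypothesis from the previous tests: one OS thread per 4 ACU.
acu = 24
threads = acu // 4        # -> 6 threads on the 24 ACU shape
sessions = 5              # CPU-bound sessions still running
print(round(100 * sessions / threads))   # 83 (% expected CPU Utilization)
```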

Yes, it seems we reached this 83% after some fluctuations. Those irregularities may be a consequence of my scripts running loops of long procedures. When the engine restarted (visible in “Engine Uptime”), I was disconnected for a while (visible in “DB Connections”), then the load decreased (visible in “CPU Utilization”), and then the available resources were scaled down (visible in “Serverless Capacity Units”).

Please remember, all of this is guesswork, as very little information is disclosed about how it works internally. And this is a preview beta; many things will be different when GA. The goal of this blog is only to show that a little understanding of how it works is useful when deciding between provisioned and serverless, thinking about side effects, and interpreting the CloudWatch metrics. And we don’t need huge workloads for this investigation: learn on small labs and validate on real stuff.

The article Aurora Serverless v2 (preview) – CPU appeared first on the dbi services blog.
