
Creating archived redolog-files in group dba instead of oinstall


Since Oracle 11g, files created by the database belong by default to the Linux group oinstall. Changing the default group after the central inventory has been created is difficult. In this blog I want to show how locally created archived redo logs can be placed in the group dba instead of oinstall.

One of my customers had the requirement to provide read access on archived redo logs to an application for log mining. To make them available, we created an additional local archive log destination:


LOG_ARCHIVE_DEST_9 = 'LOCATION=/logmining/ARCHDEST/NCEE19C valid_for=(online_logfile,primary_role)'
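If the destination is added while the database is up, it can also be set dynamically. A minimal sketch, reusing the location from the example above:

SQL> alter system set log_archive_dest_9='LOCATION=/logmining/ARCHDEST/NCEE19C valid_for=(online_logfile,primary_role)' scope=both;
SQL> alter system set log_archive_dest_state_9=ENABLE scope=both;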

We then provided NFS access to that directory for the application. To allow the application to read the archived redo logs, the remote user was a member of a remote dba group with the same group ID (GID) as the dba group on the DB server. Everything worked fine until we migrated to a new server and changed the setup to use oinstall as the default group for Oracle. The application could no longer read the files, because they were now created with group oinstall:


oracle@19c:/logmining/ARCHDEST/NCEE19C/ [NCEE19C] ls -ltr
-rw-r-----. 1 oracle oinstall 24403456 Oct 9 21:21 1_32_1017039068.dbf
-rw-r-----. 1 oracle oinstall 64000 Oct 9 21:25 1_33_1017039068.dbf
-rw-r-----. 1 oracle oinstall 29625856 Oct 9 21:27 1_34_1017039068.dbf
oracle@19c:/logmining/ARCHDEST/NCEE19C/ [NCEE19C]

One possible workaround would have been to use the ID mapper on Linux, but there is something better:

With the setgid bit (sometimes called the group sticky bit) set on a directory, all files created in that directory inherit the group of the directory. For example:


oracle@19c:/logmining/ARCHDEST/ [NCEE19C] ls -l
total 0
drwxr-xr-x. 1 oracle dba 114 Oct 9 21:27 NCEE19C
oracle@19c:/logmining/ARCHDEST/ [NCEE19C] chmod g+s NCEE19C
oracle@19c:/logmining/ARCHDEST/ [NCEE19C] ls -l
drwxr-sr-x. 1 oracle dba 114 Oct 9 21:27 NCEE19C

Whenever an archived redo log is created in that directory, it will belong to the dba group:


SQL> alter system switch logfile;
 
System altered.
 
SQL> exit
 
oracle@19c:/logmining/ARCHDEST/ [NCEE19C] cd NCEE19C/
oracle@19c:/logmining/ARCHDEST/NCEE19C/ [NCEE19C] ls -ltr
-rw-r-----. 1 oracle oinstall 24403456 Oct 9 21:21 1_32_1017039068.dbf
-rw-r-----. 1 oracle oinstall 64000 Oct 9 21:25 1_33_1017039068.dbf
-rw-r-----. 1 oracle oinstall 29625856 Oct 9 21:27 1_34_1017039068.dbf
-rw-r-----. 1 oracle dba 193024 Oct 9 21:50 1_35_1017039068.dbf
oracle@19c:/logmining/ARCHDEST/NCEE19C/ [NCEE19C]

To move the already existing files into the dba group as well, use chgrp with the newest archived log as a reference:


oracle@19c:/logmining/ARCHDEST/NCEE19C/ [NCEE19C] chgrp --reference 1_35_1017039068.dbf 1_3[2-4]*.dbf
oracle@19c:/logmining/ARCHDEST/NCEE19C/ [NCEE19C] ls -ltr
-rw-r-----. 1 oracle dba 24403456 Oct 9 21:21 1_32_1017039068.dbf
-rw-r-----. 1 oracle dba 64000 Oct 9 21:25 1_33_1017039068.dbf
-rw-r-----. 1 oracle dba 29625856 Oct 9 21:27 1_34_1017039068.dbf
-rw-r-----. 1 oracle dba 193024 Oct 9 21:50 1_35_1017039068.dbf
oracle@19c:/logmining/ARCHDEST/NCEE19C/ [NCEE19C]

Hope this helps somebody.



odacli create-database error DCS-10802: Insufficient disk space on file system: database


Introduction

I was reimaging an X6-2M ODA after various patching troubles, and everything was fine. But after several databases had been created, the next ones could no longer be created.

DCS-10802: Insufficient disk space on file system: database. Expected free space (MB): {1}, available space (MB): {2}

I spent some time on it and finally found the cause of the problem. And the solution.

Context

After successfully reimaging an X6-2M ODA with 18.5 and applying the patch for the firmware, ILOM and disks, I was creating all the databases with odacli using the following commands:


odacli create-database -hm XyXyXyXyXyXy --dbstorage ACFS --characterset WE8MSWIN1252 --databaseUniqueName HPMVRN --dbhomeid '0704ef7c-0cb9-4525-8edb-8d70b8f7ddfb' --dblanguage AMERICAN --dbname HPMVRN --dbshape odb1s --dbterritory AMERICA --no-cdb --no-dbconsole --json
odacli create-database ...

Each database is created with the smallest shape, odb1s, as I later fine-tune each instance according to my needs.

After the 8th or 9th database, the next creations ended with a failure:

odacli describe-job -i "2ad9f7e8-331d-4a82-bb7b-c88ddad1cdf8"

Job details
----------------------------------------------------------------
ID: 2ad9f7e8-331d-4a82-bb7b-c88ddad1cdf8
Description: Database service creation with db name: HPMVRN
Status: Failure
Created: November 8, 2019 1:33:23 PM CET
Message: DCS-10802:Insufficient disk space on file system: database. Expected free space (MB): {1}, available space (MB): {2}

Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Database Service creation November 8, 2019 1:33:23 PM CET November 8, 2019 1:33:58 PM CET Failure
Database Service creation November 8, 2019 1:33:23 PM CET November 8, 2019 1:33:58 PM CET Failure
Setting up ssh equivalance November 8, 2019 1:33:23 PM CET November 8, 2019 1:33:23 PM CET Success
Creating volume datHPMVRN November 8, 2019 1:33:23 PM CET November 8, 2019 1:33:42 PM CET Success
Creating ACFS filesystem for DATA November 8, 2019 1:33:42 PM CET November 8, 2019 1:33:57 PM CET Success
Database Service creation November 8, 2019 1:33:57 PM CET November 8, 2019 1:33:58 PM CET Failure
Database Creation November 8, 2019 1:33:57 PM CET November 8, 2019 1:33:58 PM CET Failure

Analysis

The error seems obvious: insufficient disk space. Let’s check the disk space:


df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroupSys-LogVolRoot
30G 5.6G 23G 20% /
tmpfs 126G 1.1G 125G 1% /dev/shm
/dev/sda1 477M 41M 411M 9% /boot
/dev/mapper/VolGroupSys-LogVolOpt
109G 85G 19G 82% /opt
/dev/mapper/VolGroupSys-LogVolU01
148G 31G 110G 22% /u01
/dev/asm/datdbtest-8 100G 2.2G 98G 3% /u02/app/oracle/oradata/DBTEST
/dev/asm/datgotst-8
100G 2.2G 98G 3% /u02/app/oracle/oradata/GOTST
/dev/asm/datgeval-8
100G 2.2G 98G 3% /u02/app/oracle/oradata/GEVAL
/dev/asm/commonstore-8
5.0G 391M 4.7G 8% /opt/oracle/dcs/commonstore
/dev/asm/datsmval-8
100G 44G 57G 44% /u02/app/oracle/oradata/SMVAL
/dev/asm/datvival-8
100G 34G 67G 34% /u02/app/oracle/oradata/VIVAL
/dev/asm/datvjval-8
100G 56G 45G 56% /u02/app/oracle/oradata/VJVAL
/dev/asm/dump-8 200G 132G 69G 66% /dpdumplocal
/dev/asm/dataoval-8
100G 2.2G 98G 3% /u02/app/oracle/oradata/AOVAL
/dev/asm/datgrtst-8 100G 2.2G 98G 3% /u02/app/oracle/oradata/GRTST
/dev/asm/datgival-8
100G 7.8G 93G 8% /u02/app/oracle/oradata/GIVAL
/dev/asm/reco-329 74G 56G 19G 76% /u03/app/oracle
/dev/asm/datgetst-8 100G 2.2G 98G 3% /u02/app/oracle/oradata/GETST
/dev/asm/datgftst-8
100G 30G 71G 30% /u02/app/oracle/oradata/GFTST
/dev/asm/datgctst-8
100G 2.2G 98G 3% /u02/app/oracle/oradata/GCTST
/dev/asm/dathpmvrn-8 100G 448M 100G 1% /u02/app/oracle/oradata/HPMVRN

No filesystem is full, and create-database managed to create the ACFS volume for data successfully. Let’s try to put something in it:


cp /the_path/the_big_file /u02/app/oracle/oradata/HPMVRN/

No problem with this ACFS volume.

Let’s try to create the database in ASM:


odacli list-databases | tail -n 1

bed7f9a0-e108-4423-8b2c-d7c33c795e87 HPMVRN Si 12.1.0.2 false Oltp Odb1s Acfs Failed 0704ef7c-0cb9-4525-8edb-8d70b8f7ddfb

odacli delete-database -i "bed7f9a0-e108-4423-8b2c-d7c33c795e87"

odacli create-database -hm XyXyXyXyXyXy --dbstorage ASM --characterset WE8MSWIN1252 --databaseUniqueName HPMVRN --dbhomeid '0704ef7c-0cb9-4525-8edb-8d70b8f7ddfb' --dblanguage AMERICAN --dbname HPMVRN --dbshape odb1s --dbterritory AMERICA --no-cdb --no-dbconsole --json

No problem here, so it really seems to be related to ACFS.

Where else does create-database need free space? For sure, in the recovery area (RECO) filesystem, created together with the very first database:


Filesystem Size Used Avail Use% Mounted on
/dev/asm/reco-329 74G 56G 19G 76% /u03/app/oracle

Let’s create a file in this filesystem:

cp /the_path/the_big_file /u03/app/oracle/

No problem.

Quite strange, and as the database is not yet created, there is no alert_HPMVRN.log in which to look for the error…

Maybe the RECO filesystem is not big enough for odacli. ACFS filesystems are autoextensible, but as the Fast Recovery Areas of all my databases will probably not fit in the allocated 74GB, odacli may fail. How do you extend the ACFS RECO filesystem? Just put enough dummy files in it to trigger the autoextension, and then remove them.


cd /u03/app/oracle
cp /the_path/the_big_file tmpfile1
cp tmpfile1 tmpfile2
cp tmpfile1 tmpfile3
cp tmpfile1 tmpfile4
cp tmpfile1 tmpfile5
cp tmpfile1 tmpfile6
rm -rf tmp*

df -h /u03/app/oracle
Filesystem Size Used Avail Use% Mounted on
/dev/asm/reco-329 88G 59G 30G 67% /u03/app/oracle

The RECO filesystem is now slightly bigger; let’s retry the database creation:


odacli delete-database -i "985b1d37-6f84-4d64-884f-3a429c195a5d"

odacli create-database -hm XyXyXyXyXyXy --dbstorage ACFS --characterset WE8MSWIN1252 --databaseUniqueName HPMVRN --dbhomeid '0704ef7c-0cb9-4525-8edb-8d70b8f7ddfb' --dblanguage AMERICAN --dbname HPMVRN --dbshape odb1s --dbterritory AMERICA --no-cdb --no-dbconsole --json

odacli describe-job -i "d5022d8b-9ddb-4f93-a84a-36477657794f"

Job details
----------------------------------------------------------------
ID: d5022d8b-9ddb-4f93-a84a-36477657794f
Description: Database service creation with db name: HPMVRN
Status: Success
Created: November 8, 2019 1:47:28 PM CET
Message:

Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting up ssh equivalance November 8, 2019 1:47:30 PM CET November 8, 2019 1:47:40 PM CET Success
Creating volume datHPMVRN November 8, 2019 1:47:40 PM CET November 8, 2019 1:47:59 PM CET Success
Creating ACFS filesystem for DATA November 8, 2019 1:47:59 PM CET November 8, 2019 1:48:14 PM CET Success
Database Service creation November 8, 2019 1:48:14 PM CET November 8, 2019 1:54:38 PM CET Success
Database Creation November 8, 2019 1:48:14 PM CET November 8, 2019 1:53:02 PM CET Success
Change permission for xdb wallet files November 8, 2019 1:53:02 PM CET November 8, 2019 1:53:02 PM CET Success
Place SnapshotCtrlFile in sharedLoc November 8, 2019 1:53:02 PM CET November 8, 2019 1:53:04 PM CET Success
SqlPatch upgrade November 8, 2019 1:54:02 PM CET November 8, 2019 1:54:35 PM CET Success
updating the Database version November 8, 2019 1:54:35 PM CET November 8, 2019 1:54:38 PM CET Success
create Users tablespace November 8, 2019 1:54:38 PM CET November 8, 2019 1:54:40 PM CET Success

That’s it. I was then able to create the next databases without any problem.

Final words

Keep in mind that all your Fast Recovery Areas should fit in the dedicated filesystem. To make sure that you will not encounter this problem, resize your RECO filesystem to the sum of all your target FRA sizes with acfsutil:

acfsutil size 500G -d /dev/asm/reco-329 /u03/app/oracle
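You can then check the resulting size and the ACFS details of the mount point, for example (the mount point is the one from this ODA, adapt it to yours):

df -h /u03/app/oracle
acfsutil info fs /u03/app/oracle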

An autoextensible filesystem never guarantees that the extension will actually succeed.

Or simply go for ASM instead of ACFS on your ODA if you do not need ACFS features like snapshots. ASM is simpler and more efficient because, unlike ACFS, it does not provision disk space for each database.


Some words about SOUG Day in Lausanne


Today I am participating in the SOUG Day, which takes place in Lausanne at the “Centre Pluriculturel et social d’Ouchy”.

After a coffee and a welcome speech by Yann Neuhaus, Ludovico Caldara and Flora Barriele, the event starts with 2 global sessions:

A l’heure du serverless, le futur va-t-il aller aux bases de données distribuées?

Franck Pachot makes a comparison between Oracle products (Active Data Guard, RAC, Sharding) and new distributed databases in order to scale-up and scale-out.
Briefly, his talk refers to:
– Differences among RDBMS, NoSQL and NewSQL according to the CAP Theorem
– Definition and needs for NoSQL and NewSQL
– Definition of services such as Google’s Cloud Spanner, TiDB, CockroachDB, YugabyteDB.

From DBA to Data Engineer – How to survive a career transition?

Kamran Agayev from Azerbaijan speaks about what Big Data is in general and the transition from DBA to Data Engineer.
He addresses several interesting topics:
– Definition of Big Data
– The skills needed by a Data Engineer or a Data Architect (broader competences than “just” being a Database Administrator)
– Definition of products like Hadoop, Kafka, NoSQL
After the coffee break, the choice is between 2 different streams. Here are some words about the sessions I attend.

Amazing SQL

Laetitia Avrot from EnterpriseDB talks about SQL, which is much more than what we know. SQL is different from other programming languages, but it must be treated as one of them. At school we still learn pre-1992 SQL, but in 1999 the standard changed to add relational algebra and data atomicity. PostgreSQL is very close to this standard. Laetitia shows lots of concrete examples of subqueries, common table expressions (CTEs), lateral joins (not implemented in MySQL for the moment), anti joins, rollup, window functions and recursive CTEs, and also gives some explanations about keywords such as IN, VALUES, NOT IN and NOT EXISTS.

Graph Database

After lunch, Gianni Ceresa presents property graph databases as a combination of vertices (node, properties, ID) and edges (node, ID, label, properties). To start working with Oracle graphs, we can use PGX (Oracle Labs Parallel Graph AnalytiX); the OTN version is better for documentation. Through a demo, Gianni shows how to build a graph using Apache Zeppelin as interpreter, and Python and Jupyter to visualize it. And then we can also use it to write some data analysis.

5 mistakes that you’re making while presenting your data

Luiza Nowak, a non-IT girl working with IT people (she is a board member of POUG), talks about IT presentations. There are 4 important parts defining them: the content, the story, the speaker performance and visualization.
Here are recurring errors in IT presentations and how to handle them:
1. Lack of data selection – you need to filter your data, to consider who and where you are talking to
2. Too much at once – you need to divide your content, create some suspense and put less information into slides to let the audience listen to you instead of reading them
3. Forget about contrast – you have to use contrast on purpose because it can be useful to underline something but it can also distract your audience
4. Wrong type of charts – you have to be clear about your data and explain results
5. You don’t have any story – you need to conceptualize your data.

How can Docker help a MariaDB cluster for disaster/recovery

Saïd Mendi from dbi services explains what a MariaDB Galera Cluster is, its benefits, and how Docker can help in some critical situations. For example, you can create some delayed slaves, which can be useful to emulate flashback functionality.

Conclusion

The SOUG Day comes to an end. It was a nice opportunity to meet international speakers, discuss with some customers and colleagues, learn and share. As usual, this is part of the dbi services spirit and matches our values!
And now I have to say goodbye: it’s time for the aperitif and dinner with the community 😉 Hope to see you at the next SOUG event.


A day of conferences with the Swiss Oracle User Group


Introduction

I’m not that excited by all these events around Oracle technologies (and beyond), but they are always a good place to learn new things and, maybe most importantly, to discover new ways of thinking. And on that point, I was not disappointed.

Franck Pachot: serverless and distributed database

Franck talked about scaling out, which means avoiding monoliths. Most database servers are this kind of monolith today. He advises us to think microservices. It’s not so easy for the database component, but it could surely simplify the management of different modules by different developer teams. Achieving scale-out also means getting rid of the old cluster technologies (think about RAC) and instead adopting “shared nothing”: no storage sharing, no network sharing, etc.
It also means the need for database replication, and also scaling the writes: that point is more complicated. Sharding is a key point for scaling out (put the associated data where the users reside).

I discovered the CAP theorem, a very interesting theorem that shows us that there is actually no ultimate solution. You need to choose your priorities: Consistency and Availability, Availability and Partition Tolerance, or Consistency and Partition Tolerance. Just remember to keep your database infrastructure adapted to your needs: a Google-like infrastructure is probably nice, but do you really need the same?

Kamran Agayev: Transition from DBA to data engineer

Times are changing. I have known that for several years, but now it’s obvious: as a traditional DBA, I will soon be deprecated. Old-school DBA jobs will be replaced by a lot of new jobs: data architect, data engineer, data analyst, data scientist, machine learning engineer, AI engineer, …

Kamran focused on the Hadoop ecosystem and Spark, especially when he needed to archive data from Exadata to Hadoop (and explained how Hadoop manages data through the HDFS filesystem and datanodes – a sort of ASM). He used a dedicated connector, a sort of wrapper using external tables. Actually, this is also what is inside the Big Data Appliance from Oracle. This task was out of the scope of a traditional DBA, as a good knowledge of the data was needed. So, the traditional DBA is dead.

Stefan Oehrli – PDB isolation and security

Since Oracle announced the availability of 3 free PDBs with each container database, the interest in Multitenant has increased.

We had an overview of the top 10 security risks, all about privileges, privilege abuse, unauthorized privilege elevation, platform vulnerabilities, SQL injection, etc. If you’re already in the cloud with PaaS or DBaaS, the risks are the same.

We had a presentation of several clues for risk mitigation:
– path_prefix: it’s some kind of chroot for the PDB
– PDB_os_credential (still bugs but…): concerns credentials and dbms_scheduler
– lockdown profiles: a tool for restricting database features like queuing, partitioning, Java OS access or altering the database. Restrictions work by inclusion or exclusion (see the sketch below).
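A minimal lockdown profile sketch, as I understood it (the profile and PDB names are just examples, not taken from the session):

SQL> -- in CDB$ROOT: create the profile and choose what to disable
SQL> create lockdown profile app_lock;
SQL> alter lockdown profile app_lock disable statement=('ALTER SYSTEM');
SQL> alter lockdown profile app_lock disable feature=('NETWORK_ACCESS');
SQL> -- assign the profile to a PDB
SQL> alter session set container=PDB1;
SQL> alter system set pdb_lockdown=app_lock;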

Paolo Kreth and Thomas Bauman: The role of the DBA in the era of Cloud, Multicloud and Autonomous Database

We already heard today that the classic DBA will soon be dead. And now the second bullet. The fact is that Oracle has worked hard to improve autonomous features during the last 20 years and, as it was presented, you realize that it’s clearly true. Who cares about extent management now?

But there is still hope. The DBA of tomorrow is starting today. As the DBA role actually sits between the infrastructure team and the data scientists, there is a way to architect your career: keep a foot in the technical stuff, but become a champion in data analysis and machine learning.

Or focus on development with open source and cloud. The DBA job can shift; don’t miss this opportunity.

Nikitas Xenakis – MAA with 19c and GoldenGate 19c: a real-world case study

Hey! Finally, the DBA is not dead yet! Some projects still need technical skills and complex architectures. The presented project was driven by downtime costs, and for some kinds of businesses a serious downtime can kill the company. The customer concerned by this project cannot afford more than 1 hour of global downtime.

We had an introduction of MAA (standing for Maximum Availability Architecture – see Oracle documentation for that).

You first need to estimate:
– the RPO: how much data you can afford to lose
– the RTO: how quickly you’ll be up again
– the performance you expect after the downtime: because it matters

The presented infrastructure was composed of RHEL, RAC with Multitenant (1 PDB only), Active Data Guard and GoldenGate. The middleware was not from Oracle but was configured to work with Transparent Application Failover.

For sure, you still need several old-school DBAs to set up and manage this kind of infrastructure.

Luiza Nowak: Error when presenting your data

You can refer to the blog from Elisa USAI for more information.

For me, it was very surprising to discover how a presentation can be boring, confusing, or miss the point just because of inappropriate slides. Be precise, be captivating, use graphics instead of sentences, and use those graphics well, if you want your presentation to have the expected impact.

Julian Frey: Database cloning in a multitenant environment

Back to pure DBA stuff. A quick reminder of why we need to clone and what we need to clone (data, metadata, partial data, refreshed data only, anonymized data, etc.). And now always with GDPR compliance in mind!

Cloning before 12c was mainly done with these well-known tools: RMAN duplicate, Data Pump, GoldenGate, dblinks, storage cloning, and the embedded clone.pl script (I had not heard about this one before).

Starting from 12c, and only if you’re using multitenant, new convenient tools are available for cloning: PDB snapshot copy, snapshot carousel, refreshable copy, …

I discovered that you can duplicate a PDB without actually putting the source PDB in read-only mode: you just need to put your source PDB in backup mode, copy the files, generate the metadata file and create the database with resetlogs. Nice feature.

You have to know that cloning a PDB is native with multitenant, a PDB always being a clone of something (at the very least, an empty PDB is created from PDB$SEED).

Note that snapshot copy of a PDB is limited to certain kinds of filesystems, the best known being NFS and ACFS. If you decide to go for multitenant without actually having the option, don’t forget to limit the maximum number of PDBs in your CDB settings; it’s actually a parameter, MAX_PDBS (see the sketch below). Another interesting feature is the possibility to create a PDB from a source PDB without the data (but the tablespaces and tables are created).
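A minimal sketch for these two points (the value 3 corresponds to the user PDBs allowed without the multitenant option, and the PDB names are just examples):

SQL> alter system set max_pdbs=3 scope=both;
SQL> -- clone a PDB keeping the structure but not the data (assuming OMF and local undo for a hot clone)
SQL> create pluggable database PDB2 from PDB1 no data;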

Finally, and against all odds, Data Pump is still a great tool for most cases. You’d better keep considering this tool too.

Conclusion

This was a great event from great organizers, and while pure Oracle DBA is probably not a job that makes younger people dream, jobs dealing with data are not going to disappear in the near future.


First day at DOAG 2019 Conference


This year I have the opportunity to attend one of the most popular Oracle conferences, the DOAG. I also have the chance to give a talk on Thursday about Docker containers. Yes! Because at DOAG we are not only speaking about Oracle products.

I’m really impressed by the quality of the presentations. I followed 4 of them, on different subjects and technologies, and their quality was very good.

David Hueber, OCI vs AWS for Oracle SE2


David presented the differences between Oracle Cloud Infrastructure and Amazon Web Services for an Oracle Standard Edition 2 database. The goal of his presentation was to go into deep detail beyond the buzz idea that AWS is the best cloud provider: technically, for each feature used daily by an Oracle user, what is the difference between these two cloud providers? In the end, I was surprised that, for this specific case, AWS is not as good as OCI.

Lykle Thijssen, The Pillars of Continuous Delivery

Very interesting presentation from Lykle Thijssen regarding the basics of DevOps, with a clear definition and explanations about continuous delivery and all the concepts around it.

Continuous delivery introduction:

  • Explains the difference between continuous delivery and continuous deployment
  • Build, test and release software with speed and frequency
  • Reduce the cost, time and risk of delivering changes by making more incremental updates

The idea is to properly define the rules for continuous delivery and avoid mistakes, in order to get the full benefits from it.

Agile:
The agile development process is important for continuous delivery:

  • Flexible to change
  • Dev and test working in parallel
  • Shorter release cycles
  • More motivation to work on a short life cycle. You get faster feedback from users.

Why is a microservices architecture important for continuous delivery?

  • Cross-functional teams
  • Runtime dependencies
  • Reduced dependencies

Test Automation:
Test automation is important for continuous delivery:

  • Control the quality immediately
  • Automate repeatable steps every day for example
  • Test upon every deployment

Source control system (GitLab):
One of the most important pieces of the “DevOps standard architecture” is source version control, GitLab for instance.

It allows us to create multiple branches in the Git server based on features and to test them separately, which allows us to integrate them into the automated integration process.

To conclude, all these points are very good reminders for us, as we are working on a big DevOps project implementing continuous integration and delivery for one of our customers. We can figure out that we will need to manage the source control integration (development process) better, by splitting the code into multiple branches and merging progressively. The integration of test automation is something very interesting and important that we need to take into account to complete our process and be fully “DevOps compliant”.

Oracle OCI und Kubernetes: Aufbau eines Clusters (Oracle OCI and Kubernetes: building a cluster)

Yes, I followed a session in German :-)! A very interesting session about Kubernetes in OCI (Oracle Cloud Infrastructure). Thorsten Wussow explained to us the Kubernetes architecture in OCI and all its specifics.

Oracle Container Engine for Kubernetes key points:

  • OKE is CNCF certified
  • max 1000 nodes
  • workers are managed by NodePools
  • REST API available
  • OCI CLI available
  • SSH to the nodes also works

Franck Pachot, Oracle 19 features

Last but not least, the presentation from the popular Franck Pachot. An amazing room (Tokyo) and a recorded session; for me it was a first, and really impressive.

Franck started by talking about the DBRanking website, which is considered a reliable source by most people.
In fact, he explained to us that it cannot be used as a reliable comparison tool, because the ranking is only based on the popularity of the databases, not on their features and performance.

After the introduction, Franck presented 19 features of Oracle compared with other open source databases such as PostgreSQL, with a lot of interesting demos for some of the features, as usual.
The core message was very clear: don’t only look at the buzz and all the information from social media, but keep in mind that we are technical experts and we need to go beyond the buzzwords.


Focus on 19c NOW!


Introduction

For years, Oracle used the same mechanism for database versioning: a major version, represented by the first number, and then a release number, 1 for the very first edition and a mature, reliable release 2 for production databases. Both of them had patchsets (the last number) and regular patchset updates (the date optionally displayed at the end) to remove bugs and increase security. Jumping from release 1 to release 2 required a migration, as if you were coming from an older version. Recently, Oracle broke this release pace to introduce a new versioning system based on the year of release, like Microsoft and a lot of others did. Patchsets are also replaced by release updates, which is quite logical: it’s been a long time since patchsets became complete releases. Lots of Oracle DBAs are now in the fog and, as a result, could make the wrong decision regarding the version to choose.

A recent history of Oracle Database versioning

Let’s focus on the versions currently running on most of customer’s databases:

  • 11.2.0.4: The terminal version of 11gR2 (long-term). 4 is the latest patchset of the 11gR2, there will never exist a 11.2.0.5. If you install the latest PSU (Patchset update) your database will precisely run on 11.2.0.4.191015 (as of the 29th of November 2019)
  • 12.1.0.2: The terminal version of 12cR1 (sort of long-term). A 12.1.0.1 existed but for a very short time
  • 12.2.0.1: first version of 12cR2 (short-term). This is the latest version with old versioning model
  • 18c: actually 12.2.0.2 – first patchset of the 12.2.0.1 (short-term). You cannot apply this patchset on top of the 12.2.0.1
  • 19c: actually 12.2.0.3 – terminal version of the 12cR2 (long-term). The next version will no longer be based on the 12.2 database kernel

18c and 19c also have a sort of patchset, but the name has changed: we’re now talking about RUs (Release Updates). The RU is actually the second number, 18.8 for example. Each release update can also be updated, the date still being the last part of the number, for example 18.8.0.0.191015.
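To check exactly which release update a database is running, a quick query like this can help (a sketch, assuming an 18c or newer database for BANNER_FULL):

SQL> select banner_full from v$version;
SQL> select patch_id, description, status from dba_registry_sqlpatch;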

Is there a risk to use older versions?

Actually, there is no risk in using 11.2.0.4 or 12.1.0.2. These versions represent almost all the Oracle databases running in the world; few people have already migrated to 12.2 or newer versions. The risk is more related to the support provided by Oracle. With premier support (linked to the support fees almost every customer pays each year), you have limited access to My Oracle Support: looking up something in the knowledge base is OK, downloading old patches is OK, but downloading the newest patches is no longer possible. And if you open an SR, the Oracle support team could ask you to buy extended support, or at least to apply the latest PSU you cannot download. If you want to keep your databases fully supported by Oracle, you’ll have to ask and pay for extended support, as long as your version is still eligible for it. For sure, 11gR1, 10gR2 and older versions are no longer eligible for extended support.

Check this My Oracle Support note for fresh information about support timeline: Doc ID 742060.1

Should I migrate to 12.2 or 18c?

If you plan to migrate to 12.2 or 18c in 2020, think twice. The problem with these versions is that premier support is ending soon: before the end of 2020 for 12.2 and in the middle of 2021 for 18c. That is very short, and you probably won’t have the possibility to buy extended support (these are not terminal releases), so you’ll have to migrate again, to 19c or a newer version, in 2020 or 2021.

Why 19c is probably the only version you should migrate to?

19c is the long-term support release, meaning that premier support will last longer (until 2023) and also that extended support will be available (until 2026). If you plan to migrate to 19c in 2020, you will benefit from all the desired patches and full support for 3 years. And there is a chance that Oracle will also offer extended support for the first year or more, as they did for 11.2 and 12.1, even if this is pure assumption.

How about the costs?

You probably own perpetual licenses, meaning that the Oracle database product is yours (if you are compliant regarding the number of users or processors defined in your contract). Your licenses are not attached to a specific version: you can use 11gR2, 12c, 18c, 19c… Each year, you pay support fees: these fees give you access to My Oracle Support, for downloading patches or opening a Service Request in case of problems. But you are supposed to run a recent version of the database with this premier support. For example, as of the 29th of November 2019, the versions covered by premier support are 12.2.0.1, 18c and 19c. If you’re using older versions, like 12.1.0.2 or 11.2.0.4, you should pay additional fees for extended support. Extended support is not something you have to subscribe to indefinitely, as the purpose is only to keep your database supported until you migrate to a newer version and return to premier support.

So, keeping older versions will cost you more, and in-time migration will keep your support fees as low as possible.

For sure, migrating to 19c also comes at a cost, but we’re now quite aware of the importance of migrating software and staying up to date, for a lot of reasons.

Conclusion

Motivate your software vendor or your development team to validate and support 19c. The amount of work for supporting 19c versus 18c or 12.2 is about the same, all these versions actually being 12c, and the behaviour of the database will be the same for most of us. Avoid migrating to 12.2.0.1 or 18c, as you’ll have to migrate again within a year. If you’re not ready yet, keep your 11gR2 and/or 12cR1 and take extended support for one year while preparing the migration to 19c. 20c will be a kind of very first release 1: you probably won’t migrate to this version if you mostly value stability and reliability for your databases.


Dbvisit Standby 9 : Do you know the new snapshot feature?


The Dbvisit snapshot option is a new feature available starting from version 9.0.06. I have tested this option, and in this blog I describe the tasks I performed.
The configuration I am using is the following:
dbvisit1: primary server
dbvisit2: standby server
orcl: an Oracle 19c database
We suppose that the Dbvisit environment is already set up and that the replication is running fine. See a previous blog for setting up Dbvisit Standby.
There are two options:
-Snapshot Groups
-Single Snapshots

Snapshot Groups
This option is ideal for companies using Oracle Standard Edition (SE, SE1, SE2) that would like to have a “database that is read-only, but also being kept up to date”. It allows you to create 2 or more read-only snapshots of the standby at a given time interval. Each snapshot is opened in read-only mode after its creation. A new service, which always points to the latest snapshot, is created and automatically managed by the listener.
Using the feature is very easy. As it requires managing filesystems (mount and umount), the user that creates the snapshots needs some admin privileges.
In our case we have configured the user with full sudo privileges, but you can find all the required privileges in the documentation.

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] sudo grep oracle /etc/sudoers
oracle ALL=(ALL) NOPASSWD:ALL
oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)]

The option can be used from the command line or from the graphical dbvserver console, but it is highly recommended to use the graphical tool.
When the snapshot option is available with your license, you will see the following tab in the dbvserver console:

To create a snapshot group we just have to choose the option NEW SNAPSHOT GROUP

And then fill in the required information. In this example:
-We will create a service named snap_service
-We will generate a snapshot every 15 minutes (900 seconds)
-There will be a maximum of 2 snapshots. When the limit is reached, the oldest snapshot is deleted. Note that the oldest snapshot is only deleted once the new snapshot has been successfully created; that’s why we can sometimes have 3 snapshots
-The snapshots will be prefixed by MySna

Once the snapshot group is created without errors, we have to start the snapshot generation by clicking on the green button

And then we can see the status

At OS level we can verify that the first snapshot is created and that the corresponding instance is started

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] ps -ef | grep pmon
oracle    2300     1  0 09:05 ?        00:00:00 ora_pmon_orcl
oracle    6765     1  0 09:57 ?        00:00:00 ora_pmon_MySna001
oracle    7794  1892  0 10:00 pts/0    00:00:00 grep --color=auto pmon
oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)]

We can also verify that the service snap_service is registered in the listener. This service will automatically point to the latest snapshot of the group.

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] lsnrctl status

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 16-JAN-2020 10:01:49

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbvisit2)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                16-JAN-2020 09:05:07
Uptime                    0 days 0 hr. 56 min. 42 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/dbvisit2/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dbvisit2)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
…
…
…
Service "snap_service" has 1 instance(s).
  Instance "MySna001", status READY, has 1 handler(s) for this service...
The command completed successfully
oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)]

To connect to this service, we just have to create an alias like

snapgroup =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbvisit2)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = snap_service)
    )
  )

15 minutes later we can see that a new snapshot was generated

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] ps -ef | grep pmon
oracle    2300     1  0 09:05 ?        00:00:00 ora_pmon_orcl
oracle    6765     1  0 09:57 ?        00:00:00 ora_pmon_MySna001
oracle   11355     1  0 10:11 ?        00:00:00 ora_pmon_MySna002
oracle   11866  1892  0 10:13 pts/0    00:00:00 grep --color=auto pmon
oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)]

Note that we can only open the snapshot in read-only mode

oracle@dbvisit1:/home/oracle/ [orcl (CDB$ROOT)] sqlplus sys/*****@snapgroup as sysdba

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           MOUNTED
         4 PDB2                           MOUNTED
SQL> alter pluggable database all open;
alter pluggable database all open
*
ERROR at line 1:
ORA-65054: Cannot open a pluggable database in the desired mode.


SQL>  alter pluggable database all open read only;

Pluggable database altered.

Of course you have probably seen that you can stop, start, pause and remove snapshots from the GUI.

Single Snapshot
This option allows you to create read-only as well as read-write snapshots of the database.
To create a single snapshot, we just have to choose NEW SINGLE SNAPSHOT.
In the following example the snapshot will be opened in read-write mode

At the end of the creation we can see the status

We can verify that a service SingleSn was also created

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] lsnrctl status | grep Singl
  Instance "SingleSn", status READY, has 1 handler(s) for this service...
  Instance "SingleSn", status READY, has 1 handler(s) for this service...
Service "SingleSn" has 1 instance(s).
  Instance "SingleSn", status READY, has 1 handler(s) for this service...
  Instance "SingleSn", status READY, has 1 handler(s) for this service...
  Instance "SingleSn", status READY, has 1 handler(s) for this service...
  Instance "SingleSn", status READY, has 1 handler(s) for this service

And that the instance SingleSn is started

oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)] ps -ef | grep pmon
oracle    3294     1  0 16:04 ?        00:00:00 ora_pmon_SingleSn
oracle    3966  1748  0 16:08 pts/0    00:00:00 grep --color=auto pmon
oracle   14349     1  0 13:57 ?        00:00:00 ora_pmon_orcl
oracle@dbvisit2:/home/oracle/ [orcl (CDB$ROOT)]

We can also note that at OS level a filesystem is mounted for the snapshot, so there must be enough free space in the LVM volume group that hosts the database.

oracle@dbvisit2:/home/oracle/ [MySna004 (CDB$ROOT)] df -h | grep snap
/dev/mapper/ora_data-SingleSn   25G   18G  6.3G  74% /u01/app/dbvisit/standby/snap/orcl/SingleSn
oracle@dbvisit2:/home/oracle/ [MySna004 (CDB$ROOT)]

Using the alias

singlesnap =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbvisit2)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = SingleSn)
    )
  )

We can see that the new snapshot is opened in read-write mode

SQL> select name,open_mode from v$database;

NAME      OPEN_MODE
--------- --------------------
SINGLESN  READ WRITE

SQL>

Conclusion

What we will retain is that a snapshot group is a group of snapshots taken at a regular time interval. These snapshots can only be opened in read-only mode.
A single snapshot can be created in read-only or read-write mode. When opened in read-write mode, it can perhaps be compared to an Oracle snapshot standby.
The Dbvisit snapshot option requires a license and is only supported on Linux.
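For comparison, on Enterprise Edition with Data Guard, the equivalent read-write copy is a snapshot standby, obtained roughly like this (a sketch, assuming a mounted physical standby with a fast recovery area configured):

SQL> alter database recover managed standby database cancel;
SQL> alter database convert to snapshot standby;
SQL> alter database open;
SQL> -- later, to revert (restart the database in mount mode first)
SQL> alter database convert to physical standby;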


Dbvisit 9: Adding datafiles and or tempfiles


One question I was asking myself was whether the standby_file_management parameter is relevant in a Dbvisit environment with Oracle Standard Edition. I did some tests and I show here what I did.
We suppose that Dbvisit Standby is already set up and that the replication is working fine:

[oracle@dbvisit1 trace]$ /u01/app/dbvisit/standby/dbvctl -d dbstd -i
=============================================================
Dbvisit Standby Database Technology (9.0.08_0_g99a272b) (pid 19567)
dbvctl started on dbvisit1: Fri Jan 17 16:48:16 2020
=============================================================

Dbvisit Standby log gap report for dbstd at 202001171648:
-------------------------------------------------------------
Description       | SCN          | Timestamp
-------------------------------------------------------------
Source              2041731        2020-01-17:16:48:18 +01:00
Destination         2041718        2020-01-17:16:48:01 +01:00

Standby database time lag (DAYS-HH:MI:SS): +00:00:17

Report for Thread 1
-------------------
SOURCE
Current Sequence 8
Last Archived Sequence 7
Last Transferred Sequence 7
Last Transferred Timestamp 2020-01-17 16:48:07

DESTINATION
Recovery Sequence 8

Transfer Log Gap 0
Apply Log Gap 0

=============================================================
dbvctl ended on dbvisit1: Fri Jan 17 16:48:23 2020
=============================================================

[oracle@dbvisit1 trace]$

The standby_file_management parameter is set to MANUAL on both servers:

[oracle@dbvisit1 trace]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Jan 17 16:50:50 2020
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

SQL> show parameter standby_file_management;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
standby_file_management              string      MANUAL
SQL>


[oracle@dbvisit2 back_dbvisit]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Jan 17 16:51:15 2020
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

SQL>  show parameter standby_file_management;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
standby_file_management              string      MANUAL
SQL>

Let’s create a tablespace MYTAB on the primary database

SQL> create tablespace mytab datafile '/u01/app/oracle/oradata/DBSTD/mytab01.dbf' size 10M;

Tablespace created.

SQL>

SQL> alter system switch logfile;

System altered.

SQL> select name from v$tablespace;

NAME
------------------------------
SYSAUX
SYSTEM
UNDOTBS1
USERS
TEMP
MYTAB

6 rows selected.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/system01.dbf
/u01/app/oracle/oradata/DBSTD/sysaux01.dbf
/u01/app/oracle/oradata/DBSTD/undotbs01.dbf
/u01/app/oracle/oradata/DBSTD/mytab01.dbf
/u01/app/oracle/oradata/DBSTD/users01.dbf

SQL>

A few moments later we can see that the new datafile has been replicated on the standby:

SQL> select name from v$tablespace;

NAME
------------------------------
SYSAUX
SYSTEM
UNDOTBS1
USERS
TEMP
MYTAB

6 rows selected.

SQL>  select name from v$datafile
  2  ;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/system01.dbf
/u01/app/oracle/oradata/DBSTD/sysaux01.dbf
/u01/app/oracle/oradata/DBSTD/undotbs01.dbf
/u01/app/oracle/oradata/DBSTD/mytab01.dbf
/u01/app/oracle/oradata/DBSTD/users01.dbf

SQL>

Now let’s repeat the tablespace creation while the parameter is set to AUTO on both sides

SQL> show parameter standby_file_management;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
standby_file_management              string      AUTO
SQL>  create tablespace mytab2 datafile '/u01/app/oracle/oradata/DBSTD/mytab201.dbf' size 10M;

Tablespace created.

SQL>

SQL> alter system switch logfile;

System altered.

SQL>

A few moments later, the tablespace MYTAB2 was also replicated on the standby

SQL> select name from v$tablespace;

NAME
------------------------------
SYSAUX
SYSTEM
UNDOTBS1
USERS
TEMP
MYTAB
MYTAB2

7 rows selected.

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/system01.dbf
/u01/app/oracle/oradata/DBSTD/mytab201.dbf
/u01/app/oracle/oradata/DBSTD/sysaux01.dbf
/u01/app/oracle/oradata/DBSTD/undotbs01.dbf
/u01/app/oracle/oradata/DBSTD/mytab01.dbf
/u01/app/oracle/oradata/DBSTD/users01.dbf

6 rows selected.

In the Dbvisit documentation we can find this:
Note 2: This feature is independent of the Oracle parameter STANDBY_FILE_MANAGEMENT. Dbvisit Standby will detect if STANDBY_FILE_MANAGEMENT has added the datafile to the standby database, and if so, Dbvisit Standby will not add the datafile.
Note 3: STANDBY_FILE_MANAGEMENT can only be used in Enterprise Edition and should not be set in Standard Edition.

Dbvisit does not use STANDBY_FILE_MANAGEMENT for datafile replication, so I decided to set this parameter back to its default value, which is MANUAL.
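A minimal sketch for that (MANUAL is the default, so resetting the parameter has the same effect as setting it explicitly):

SQL> alter system set standby_file_management=MANUAL scope=both;
SQL> -- or simply remove it from the spfile
SQL> alter system reset standby_file_management scope=spfile sid='*';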

What about adding a tempfile in a Dbvisit environment? On the primary I create a new temporary tablespace:

SQL> show parameter standby_file_management;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
standby_file_management              string      MANUAL
SQL> create temporary tablespace temp2 tempfile '/u01/app/oracle/oradata/DBSTD/temp2_01.dbf' size 10M;

Tablespace created.

SQL> alter system switch logfile;

System altered.

SQL>

We can see on the primary that we now have two tempfiles.

SQL> select name from v$tablespace;

NAME
------------------------------
SYSAUX
SYSTEM
UNDOTBS1
USERS
TEMP
MYTAB
MYTAB2
TEMP2

8 rows selected.

SQL>  select name from v$tempfile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/temp01.dbf
/u01/app/oracle/oradata/DBSTD/temp2_01.dbf

SQL>

On the standby side, the new temporary tablespace was replicated:

SQL> select name from v$tablespace;

NAME
------------------------------
SYSAUX
SYSTEM
UNDOTBS1
USERS
TEMP
MYTAB
MYTAB2
TEMP2

8 rows selected.

But the new tempfile is not listed on the standby

SQL>  select name from v$tempfile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/temp01.dbf

SQL>

In fact this is the expected behavior. In the documentation we can find the following:
If your preference is to have exactly the same number of temp files referenced in the standby control file as your current primary database, then once a new temp file has been added on the primary, you need to recreate a standby control file by running the following command from the primary server:
dbvctl -f create_standby_ctl -d DDC

So let’s recreate the standby control file

[oracle@dbvisit1 ~]$ /u01/app/dbvisit/standby/dbvctl -f create_standby_ctl -d dbstd
=>Replace current standby controfiles on dbvisit2 with new standby control
file?  [No]: yes

>>> Create standby control file... done

>>> Copy standby control file to dbvisit2... done

>>> Recreate standby control file... done

>>> Standby controfile(s) on dbvisit2 recreated. To complete please run dbvctl on the
    primary, then on the standby.
[oracle@dbvisit1 ~]$

We can then verify that the new tempfile is now visible on the standby side:

SQL> select name from v$tempfile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/DBSTD/temp01.dbf
/u01/app/oracle/oradata/DBSTD/temp2_01.dbf

SQL>



Make Oracle database simple again!


Introduction

Let’s have a look at how to make Oracle database as simple as it was before.

Oracle Database is a great piece of software: yes, it’s quite expensive, but it’s still the reference, and most companies can find a configuration that fits their needs and their budget. Another complaint about Oracle is the complexity: nothing is really simple, and you’ll need skillful DBAs to deploy, manage, upgrade and troubleshoot your databases. But complexity is sometimes caused by wrong decisions made without the necessary knowledge, mainly because some choices add significant complexity compared to others.

The goal

Why do things need to be simple?

Obviously, simplification means:

  • easier troubleshooting
  • something more “understandable by the others”
  • reinstallation made possible in case of big troubles
  • avoiding bugs related to the mix of multiple components
  • less work, because you probably have enough work with migrations, patching, performance, …
  • more reliability, because fewer components means fewer problems

On the hardware side

Rules for simplifying on the hardware side are:

  • Choose the same hardware for all your environments (DEV/TEST/QUAL/PROD/…): same server models, same CPU family, same revision. Make only slight variations on memory amount, number of disks and processor cores configuration if needed. Order all the servers at the same time. If a problem is related to hardware, you will be able to test the fix on a less important environment before going on production
  • Don’t use SAN: SAN is very nice, but SAN is not the performance guarantee you expect. Adopt local SSD disks: NVMe type SSDs have amazing speed, they are a true game changer in today’s database performance. Getting rid of the SAN is also getting rid of multipathing, resource sharing, complex troubleshooting, external dependencies and so on
  • Provision very large volumes for data: dealing with space pressure is not the most interesting part of your job, and it’s time consuming. You need 4TB of disks? Order 12TB and you’ll be ready for every situation, even those not planned. For sure it’s more expensive, but adding disks afterwards is not always that simple. It makes me think of a customer case where trying to add a single disk led to a nightmare (production down for several days)
  • Consider ODA appliances (Small or Medium): even if it’s not simplifying everything, at least hardware is all that you need and is dedicated to Oracle database software
  • Think consolidation: Oracle Database has strong database isolation, which makes consolidation easy. Consolidating to limit the number of servers you need to manage also simplifies your environment
  • Avoid virtualization: without even talking about the license, virtualization for sure adds underlying complexity

On the system side

Some rules are also good to know regarding the system:

  • Go for Redhat or Oracle Linux: mainly because it’s the most common OS for Oracle databases. Releases and patches are always available for Linux first. UNIX and Windows are decreasing in popularity for Oracle Databases these past 10 years
  • Same OS: please keep your operating systems strictly identical from development to production. If you decide to upgrade the OS, do it first on TEST/DEV and finish with the production servers without waiting months. And never update through the internet, as packages can be different each time you update a system
  • Limit the number of filesystems for your databases: 1 big oradata and 1 big FRA are enough on SSD; you don’t need to slice everything as we did before, and slicing always wastes space

On the software side

A lot of things should be done, or not done regarding software:

  • Install the same Oracle version (release + patch) and use the same tree everywhere. Use OFA (/u01/…) or not but be consistent
  • Limit the Oracle versions in use: inform your software vendors that your platform cannot support too old versions, especially non-terminal releases like 11.2.0.3. 19c, 12.1.0.2 and eventually 11.2.0.4 are the only recommended versions to deploy
  • Don’t use ASM: because ext4 is fine and SSDs now bring you maximum performance even on a classic filesystem. ASM will always be linked to Grid Infrastructure making dependencies between the DB Homes and that Grid stack, making patching much more complex
  • Don’t use RAC: because most of your applications cannot correctly manage high availability. RAC is much more complex compared to single instance databases. Not choosing RAC is getting rid of interconnect, cluster repository, fusion cache for SGA sharing, SAN or NAS technologies, split brains, scan listeners and complex troubleshooting. Replacing RAC with Data Guard or Dbvisit standby is the new way of doing sort of high availability without high complexity

Simplify backup configuration

For sure you will use RMAN, but how to simplify backups with RMAN?

  • Use the same backup script for all the databases
  • Use the same backup frequency for each database because when you need to restore, you’d better have a fresh backup
  • Configure only the retention differently on each database
  • Back up to disk (the most convenient being a big NFS share) and without any specific library (back up your /backup filesystem later with your enterprise backup tool if needed)
  • Provision a filesystem large enough to never need to delete backups that are still within the retention period

Using the same backup strategy means being able to use the same restore procedure on all databases because you always need a quick restore of a broken database.

Always back up the controlfile and the spfile on the standby databases: the resulting backupset has a very small footprint and makes it easier to restore the standby using database backupsets from the primary, without the need for duplication.
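As an illustration, a minimal RMAN sketch that could serve as the common backup script (the destination, format and retention are examples to adapt to your environment; on a standby only the controlfile and spfile parts are needed):

rman target / <<EOF
configure retention policy to recovery window of 14 days;
run {
  backup database format '/backup/%d_db_%U';
  backup archivelog all format '/backup/%d_arc_%U' delete input;
  backup current controlfile format '/backup/%d_ctl_%U';
  backup spfile format '/backup/%d_spf_%U';
  delete noprompt obsolete;
}
EOF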

Consider RMAN catalog only if you have enough databases to manage.

Simplify database management and configuration

  • Create scripts for database configuration and tablespace creation (for example: configure_SID.sql and tablespaces_SID.sql) to be able to reconfigure the same database elsewhere
  • don’t create grid and oracle users if you plan to use Grid Infrastructure/ASM: as a DBA you probably manage both ASM and database instances. Instead of loosing time switching between these 2 users, configure only one oracle user for both
  • never use graphical tools to create a database, deploy an appliance, configure something: because screenshots are far less convenient than pure text commands easily repeatable and scriptable
  • Use OMF: configure only db_create_file_dest and db_recovery_file_dest and Oracle will multiplex the controlfile and the redologs in these areas. OMF is also naming datafiles for you: there is no need for manual naming, who really cares about the name of the datafiles?
  • Don’t use multitenant: multitenant is fine but it’s been years we’re living with non-CDB databases and it works like a charm. You can still use non-CDB architecture in 19c, so multitenant is not mandatory even on this latest version. Later migration from non-CDB to pluggable database is quite easy, you will be able to use multitenant later
  • Keep your spfile clean: don’t set unused parameters or parameters that are explicitly set to their default value. Remove such parameters from the spfile using ALTER SYSTEM RESET parameter SCOPE=spfile SID='*'; (see the example after this list)
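As referenced above, a minimal SQL*Plus sketch of these two points. The disk group names, the FRA size and the parameter chosen for the reset are only examples, not recommendations:

-- OMF: only two destinations to configure, Oracle names and multiplexes the files
alter system set db_create_file_dest = '+DATA' scope=both sid='*';
alter system set db_recovery_file_dest_size = 500G scope=both sid='*';
alter system set db_recovery_file_dest = '+FRA' scope=both sid='*';

-- spfile cleanup: remove a parameter that is explicitly set to its default value
alter system reset result_cache_max_size scope=spfile sid='*';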

Simplify patching

Patching can also be simplified:

  • Patch once a year, because you need to patch, but you don’t need to spend all your time applying each PSU every 3 months
  • Start with test/dev databases and take the time to test from the application
  • Don’t wait too long to patch the other environments: production should be patched a few weeks after the first patched environment

Simplify Oracle*Net configuration

Simplifying also concerns Oracle*Net configuration:

  • Avoid configuring multiple listeners on a system because one is enough for all your databases
  • Put your Oracle*Net configuration files in /etc because you don’t want multiple files in multiple homes (a minimal sketch follows this list)
  • Keep your Oracle*Net configuration files clean and organized for increased readability
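As referenced above, a minimal sketch of that idea. Pointing TNS_ADMIN to /etc is one way to do it, and the host and service names are only examples, not necessarily the exact setup used here:

# in the profile of the oracle user
export TNS_ADMIN=/etc

# /etc/tnsnames.ora: one short, readable entry per database
DB1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbserver1)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = DB1))
  )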

Make your database movable

Last but not least, one of the biggest mistakes is to create a strong dependency between a database and a system. How to make your database easily movable? By configuring a standby database and using Data Guard or Dbvisit standby. Moving your database to another server is done within a few minutes with a single SWITCHOVER operation.
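With a Data Guard Broker configuration in place, such a move is indeed a single command. This is a sketch only, assuming the broker configuration already exists; DB1_SITE1 and DB1_SITE2 are hypothetical database names:

dgmgrl sys@DB1_SITE1
DGMGRL> show configuration;
DGMGRL> switchover to DB1_SITE2;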

Using standby databases makes your life easier for all of these purposes:

  • you need to move your server to another datacenter
  • a power outage happened on one site
  • you need to update hardware or software
  • you suspect a hardware problem impacting the database

Don’t only create standbys for production databases: even development databases are some kind of production for developers. If a database cannot be down for 1 day, you need a standby database.

The finest configuration is not dedicating one server to the primaries and another to the standbys, but dispatching the primaries between 2 identical servers on 2 sites, each database having a preferred server for its primary and its standby on the opposite server.

Conclusion

It’s so easy to increase complexity without any good reason. Simplifying is the power of saying NO. No to interesting features and configurations that are not absolutely necessary. All you need for your databases is reliability, safety, availability, performance. Simplicity helps you in that way.

Cet article Make Oracle database simple again! est apparu en premier sur Blog dbi services.

ROLLBACK TO SAVEPOINT;


By Franck Pachot

.
I love databases and, rather than trying to compare and rank them, I like to understand their differences. Sometimes, you make a mistake and encounter an error. Let’s take the following example:
create table DEMO (n int);
begin transaction;
insert into DEMO values (0);
select n "after insert" from DEMO;
update DEMO set n=1/n;
select n "after error" from DEMO;
commit;
select n "after commit" from DEMO;

The “begin transaction” is not valid syntax in all databases because transactions may be started implicitly, but the other statements are valid syntax in all the common SQL databases. They all raise an error in the update execution because there’s one row with N=0 and then we cannot calculate 1/N as it is a math error. But, what about the result of the last select?

If I run this with Oracle, DB2, MS SQL Server, MySQL (links go to examples in db<>fiddle), the row added by the insert is always visible by my session: after the insert, of course, after the update error, and after the commit (then visible by everybody).

The same statements run with PostgreSQL have a different result. You cannot do anything after the error. You can only roll back the transaction. Even if you “commit”, it will roll back.

Yes, no rows are remaining there! Same code but different result.

You can have the same behavior as the other databases by defining a savepoint before the statement, and rollback to savepoint after the error. Here is the db<>fiddle. With PostgreSQL you have to define an explicit savepoint if you want to continue in your transaction after the error. Other databases take an implicit savepoint. By the way, I said “statement” but here is Tanel Poder showing that in Oracle the transaction is actually not related to the statement but the user call: Oracle State Objects and Reading System State Dumps Hacking Session Video – Tanel Poder’s blog
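Here is a minimal sketch of that explicit-savepoint pattern in PostgreSQL, reusing the DEMO table from the example above (the savepoint name is arbitrary):

begin transaction;
insert into DEMO values (0);
savepoint before_update;
update DEMO set n=1/n;                 -- fails with a division by zero error
rollback to savepoint before_update;
select n from DEMO;                    -- the inserted row is still visible
commit;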

In Oracle, you can run multiple statements in a user call with a PL/SQL block. With PostgreSQL, you can group multiple statements in one command but you can also run a PL/pgSQL block. And with both, you can catch errors in the exception block. And then, it is PostgreSQL that takes now an implicit savepoint as I explained in a previous post: PostgreSQL subtransactions, savepoints, and exception blocks
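On the PostgreSQL side, a minimal sketch of such an exception block, which creates the implicit savepoint described in the linked post (the DEMO table is the one from the example above):

do $$
begin
  update DEMO set n=1/n;
exception
  when division_by_zero then
    raise notice 'error caught, the transaction can continue';
end;
$$;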

This previous post was on Medium ( you can read https://www.linkedin.com/pulse/technology-advocacy-why-i-am-still-nomad-blogger-franck-pachot/ where I explain my blog “nomadism”), but as you can see I’m back on the dbi-services blog for my 500th post there. 

My last post here was called “COMMIT” (https://blog.dbi-services.com/commit/) where I explained that I was quitting consulting for CERN to start something new. But even if I decided to change, I was really happy at dbi-services (as I mentioned on a LinkedIn post about great places to work). And when people like to work together it creates an implicit SAVEPOINT where you can come back if you encounter some unexpected results. Yes… this far-fetched analogy just to mention that I’m happy to come back to dbi services and this is where I’ll blog again.

As with many analogies, it reaches the limits of the comparison very quickly. You do not ROLLBACK a COMMIT and it is not a real rollback because this year at CERN was a good experience. I’ve met great people there, learned interesting things about matter and anti-matter, and went out of my comfort zone like co-organizing a PostgreSQL meetup and inviting external people ( https://www.linkedin.com/pulse/working-consultants-only-externalization-franck-pachot/) for visits and conferences. 

This “rollback” is actually a step further, but back in the context I like: solve customer problems in a company that cares about its employees and customers. And I’m not exactly coming back to the same “savepoint”. I was mostly focused on Oracle and I’m now covering more technologies in the database ecosystem. Of course, consulting on Oracle Database will still be a major activity. But today, many other databases are rising: NoSQL, NewSQL… Open Source is more and more relevant. And in this jungle, the replication and federation technologies are rising too. I’ll continue to share on these areas and you can follow this blog, the RSS feed, and/or my twitter account.

Cet article ROLLBACK TO SAVEPOINT; est apparu en premier sur Blog dbi services.

Should I go for ODA 19.5 or should I wait until 19.6?


Introduction

As you may know, Oracle Database 19c has been available for new (X8-2) or older Oracle Database Appliances for several weeks. The current version is 19.5. But when you go to the official ODA documentation, it still proposes version 18.7 first, which is not compatible with 19c databases. Here is why.

19c database is the final 12.2

First of all, 19c is an important release because it’s the terminal release of the 12.2 family, as 11.2.0.4 was for 11.2. Please refer to my other blog to understand the new Oracle versioning. ODA always supports new releases a few months after they become available on Linux, and that’s why 19c is only available on it now.

Drawbacks of 19.5

19.5 is available on your ODA, but you will not be able to patch to this version. The reason is quite simple: it’s not a complete patch, you can only download the ISO for reimaging plus the 19c grid and database software, and that’s it. The reason for not yet having a patch resides in the difficulty of updating the OS part. 19.5 runs on Linux 7.7, and all previous releases are stuck with Linux 6.10, meaning that the patch would have to include the OS upgrade, and this jump is not so easy. That’s the first drawback.

Second drawback is that you cannot run another database version. If you still need 18c, 12.2, 12.1 or 11.2, this 19.5 is not for you.

The third drawback is that you will not be able to patch from 19.5 to 19.6 or a newer version, simply because 19.5 is a dead-end release outside the normal patching path.

Another drawback concerns the documentation, which is not yet complete: many parts are copied and pasted from 18.7. For example, the described initctl command to restart the DCS agent is not a command that actually exists on Linux 7.

Moreover, my first tests on this version show annoying bugs related to database creation, which are under investigation by Oracle.

When 19.6 will be ready?

19.6 is planned for 2020, yes, but which month? There is no official date, it could come in March, or during the summer, nobody knows. As a result, you will have to wait for this patch to be released to start your migration to 19c on ODA.

So, what to do?

3 solutions are possible:

  • You can deal with your old databases until the patch is released: buy extended support for 11gR2/12cR1. Premier support is still OK for 12.2.0.1 and 18c
  • Migrate your old 11gR2 and 12cR1 to 18c to be prepared for 19c and avoid buying extended support, differences between 18c and 19c should be minimal
  • Deploy 19.5 for testing purpose on a test ODA and start your migration project to get prepared for 19.6. Once available, patch or redeploy your ODAs and migrate all your databases

Conclusion

Not having 19.6 now is really annoying. After all, we chose ODA because it’s easier to get updates. But you can still prepare everything for the 19c migration, by first migrating to 18c or by giving 19c a try with this 19.5 release.

Cet article Should I go for ODA 19.5 or should I wait until 19.6? est apparu en premier sur Blog dbi services.

NVMe the afterburner for your database


Over 1 million IOPS (@8 KByte) and more than 26 GByte/s (@1MByte): Read more to see all impressive benchmark figures from a server with 14 NVMe drives and read why this is still not the best you could get…


At the end of last year, I got a call from Huawei. They (finally) agreed to provide me with a server with their enterprise NVMe drives for performance testing.
To say that I was delighted is a huge understatement. It felt like an 8-year old waiting for Christmas to get his present.

Choosing the right hardware for a database server is always important and challenging. Only if you build a rock-solid, stable and performant base can you build a reliable database service with predictable performance. That sounds expensive and most of the time it is, but NVMe drives can be a game-changer in this field.

After a very nice and friendly guy from Huawei delivered this server to me, I immediately started to inspect this wonderful piece of hardware I’ve got.

Testserver

Huawei provided me with an H2288 2-socket server with 15x 800 GByte NVMe drives (ES3600P V5 800 GByte, PCIe Gen3).

The server would be able to handle 25 NVMe drives but I think I’ve got every drive, which was available at this time 🙂

All drives have a U.2 PCIe connector. The drives are connected over two PCIe Gen3 x16 adapter cards with 4 cables on each card (this information will become important later…)

The PCIe cards provide the PCIe lanes for the NVMe drives. These cards are just passthrough cards without any RAID function at all.
So all your RAID configuration has to be done within a software raid (e.g. ASM, LVM, etc.)
The drives are also available with the sizes of 1-6.4 TByte.

Expected performance

As often with SSD/NVMe drives the performance is tied to the size of the drive. We are testing the smallest drive from the series with the lowest specs. But “low” depends on your angle of view.

The 800 GByte drive has a spec performance of
420k IOPS read @ 4KByte blocksize
115k IOPS write @ 4KByte blocksize
3500 IOPS read @ 1MByte blocksize
1000 IOPS write @ 1MByte blocksize

Because we are testing with the Oracle default blocksize we can translate that to
210k IOPS read
57k IOPS write @ 8KByte blocksize PER DRIVE!

The bigger drives have even higher specs as you can see in the picture.

Even more impressive is the specified read and write latency which is 0.088ms and 0.014ms

Of course, these numbers are a little bit like the fuel consumption numbers in a car commercial: impressive but mostly not true. At least once you exceed the cache size of your drive, performance will drop. The interesting question is: how big is the drop?

Test setup

I used an Oracle Linux 7.7 OS and ext4 filesystem with fio for benchmarking tests as well as Oracle 19c with ASM and calibrate_io to prove the result.

Because I used 1 drive for the OS I had “only” 14 drives for testing the performance.

Please see my other blog post Storage performance benchmarking with FIO for detailed information about the fio tests.
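For reference, an 8 KByte random read run of the kind used here could look like the following. This is a sketch only: the target directory, file size, queue depth and job count are assumptions, not the exact parameters of the referenced post:

fio --name=randread8k --directory=/u02/fio --size=10G \
    --rw=randread --bs=8k --direct=1 --ioengine=libaio \
    --iodepth=32 --numjobs=10 --runtime=120 --time_based \
    --group_reporting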

Test results with fio

Because SSD/NVMe drives tend to be faster when empty, all drives were first filled 100% before the performance measuring started.

The test was made with 1 drive and 1 thread up to 14 drives and 10 threads per drive.

8KByte random read/write

For random reads, the results start at 12k IOPS for 1 drive and 1 thread and go up to 1.186 million IOPS (roughly 9.5 GByte/s!!) with 14 drives and 10 threads per drive. The latency was between 82 and 117 microseconds

Random write tests started even more impressively at 64.8k IOPS with 1 drive and 1 thread and went up to 0.956 million IOPS (roughly 7.6 GByte/s!!) at max.
The write latency was between 14 and 145 microseconds which is absolutely insane, especially compared to the latency you can expect from an all-flash storage system with an FC network in between.

8KByte sequential read/write

The numbers for 8KB sequential read and write are even better but in the life of a DB server a little bit less important.

1MByte sequential read/write

Let us talk about the 1MByte blocksize results.
Here we start at around 1 GByte/s with 1 drive and 1 thread and go up to 26.9 GByte/s with 14 drives and 10 threads. The max value was already reached with 13 drives and 6 threads per drive.
Latency also looks good with 1-5ms which is a good value for IO of this blocksize.

The two PCIe Gen3 x16 adapter cards are the bottleneck here: two x16 slots provide roughly 2 x 15.8 GByte/s of theoretical bandwidth, so at 26.9 GByte/s we hit the maximum throughput of the bus. The drives could deliver more performance than the bus, and this with just 14 of the possible 25 drives!

The numbers for sequential writes are not as high as for sequential reads but still quite impressive. We reached 0.656 GByte/s with 1 drive/1 thread, up to a maximum speed of 12 GByte/s.
The latency is also good with 1.5-12ms.

If you wanna compare these numbers with an all-flash SAN storage go to my blog post about the Dorado 6000 V3 Benchmark.

Verify the results with a secondary tool

To verify the results of the filesystem test with a secondary tool, I created a database with all 14 drives in one ASM diskgroup with external redundancy.

dbms_resource_manager.calibrate_io confirms the fio results with 1.35 million IOPS and 25 GByte/s throughput. Not exactly the same numbers, but close enough to prove the fio results: not only did we add ASM as an additional layer instead of LVM, we also tested within the database instead of directly on the OS layer.
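For reference, a minimal sketch of such a calibration run. The num_physical_disks and max_latency values are assumptions; the procedure also requires timed_statistics and asynchronous I/O to be enabled:

set serveroutput on
declare
  l_max_iops pls_integer;
  l_max_mbps pls_integer;
  l_latency  pls_integer;
begin
  dbms_resource_manager.calibrate_io(
    num_physical_disks => 14,
    max_latency        => 10,
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_latency);
  dbms_output.put_line('max_iops='||l_max_iops||' max_mbps='||l_max_mbps||' latency='||l_latency);
end;
/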

Even more speed possible?

There is no clear answer:
If we could switch to bigger drives and more of them, we could expect up to 4 times more IOPS at 8 KByte (2x more for doubling the number of drives and 2x more for the better specs of the bigger drives).
BUT the bus will be the limitation. We already hit the limit of the two PCIe Gen3 x16 cards with 1 MByte sequential reads in this test and will possibly do so for the other tests when we are testing with more drives.

The server has at least 4 additional PCIe slots but all with only 8 lanes. So here is a bottleneck we are not able to solve in a server with Intel CPUs.

AMD has already switched to PCIe Gen4 (2x the bandwidth of PCIe Gen3) and many more PCIe lanes with their CPUs, but most enterprise servers are Intel-based, and even where AMD-based servers are available they are rarely seen at customer sites (at least at my customers).

Intel also has a roadmap for providing PCIe Gen4, but the CPUs will be available at the end of this year at the earliest ;-(

Conclusion

This setup is freaking fast! Not only are the throughput numbers astonishing, the latency is also very impressive. Even a modern storage system has a hard time providing such numbers and has almost no chance to get down to this latency level.
And even if a storage system can eventually hit these numbers, you need a 200 GBit/s connection to get them down to your server…
With local NVMe drives you can get these numbers out of every box, not sharing with anybody!

As my colleague has written very well in his blog post Make Oracle Database simple again I could not agree more: Do not use SAN when it is not necessary, especially when your other option is local NVMe drives!
As long as you are not using Oracle RAC, you’ve got everything you need for your high availability with Data Guard or other HA solutions. Combine it with Oracle Multitenant and snapshot standbys: Ready is your high-performance database platform to consolidate all your databases on.

Skip the SAN Storage and put the saved money in some additional NVMe drives and set your database to hyperspeed.
When choosing the right CPUs with the needed number of cores and the best available clock speed, you can build a beast of a database server (not only for Oracle databases…).

Talking about money, the NVMe drives are getting cheaper almost every month and you can get a 6.4 TByte enterprise NVMe drive quite cheap today.

Of course, there are downsides: scalability and RAID.
At the moment your journey ends at 25×6.4 TByte drives (160 TByte raw / 80 TByte mirrored) of capacity.
If you need more space, you have to go for another solution, at least for the moment. I expect to see NVMe drives with >15 TByte this year, and the switch from PCIe Gen3 to Gen4 will bring even more performance for a single drive.

The other downside is the RAID controller, which just does not exist at the moment (as far as I know). You have to go with software RAID.
Not a big deal if you are using Oracle ASM, even if you can only do a RAID-1 equivalent where you are “losing” 50% of your raw capacity. With DM/LVM you could do a software RAID-5; I cannot tell you about the performance impact of that, because I did not have enough time to test it.
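For example, a minimal sketch of such a software mirror with ASM, using normal redundancy (the RAID-1 equivalent mentioned above). The diskgroup name, device paths and compatibility attributes are examples only:

create diskgroup DATA normal redundancy
  disk '/dev/nvme1n1', '/dev/nvme2n1', '/dev/nvme3n1', '/dev/nvme4n1'
  attribute 'compatible.asm' = '19.0', 'compatible.rdbms' = '19.0';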

But stay tuned, I am looking for a server with bigger and more drives for my next tests.

Cet article NVMe the afterburner for your database est apparu en premier sur Blog dbi services.

Running SQL Server on the Oracle Free tier


By Franck Pachot

The Oracle Cloud is not only for Oracle Database. You can create a VM running Oracle Linux with full root access to it, even in the free tier: a free VM that will be always up, never expires, with full ssh connectivity to a sudoer user, where you are able to tunnel any port. Of course, there are some limits that I’ve detailed in a previous post. But that is sufficient to run a database, given that you configure a low memory usage. For Oracle Database XE, Kamil Stawiarski mentions that you can just hack the memory test in the RPM shell script.
But for Microsoft SQL Server, that’s a bit more complex because this test is hardcoded in the sqlservr binary and the solution I propose here is to intercept the call to the sysinfo() system call.

Creating a VM in the Oracle Cloud is very easy, here are the steps in one picture:

I’m connecting to the public IP address with ssh (the public key is uploaded when creating the VM) and I will run everything as root:

ssh opc@129.213.138.34
sudo su -
cat /etc/oracle-release

I install docker engine (version 19.3 there)
yum install -y docker-engine

I start docker

systemctl start docker
docker info


I’ll use the latest SQL Server 2019 image built on RHEL
docker pull mcr.microsoft.com/mssql/rhel/server:2019-latest
docker images

5 minutes to download a 1.5GB image. Now trying to start it.
The nice thing (when I compare to Oracle) is that we don’t have to manually accept the license terms with a click-through process. I just mention that I have read and accepted them with: ACCEPT_EULA=Y 

I try to run it:
docker run \
-e "ACCEPT_EULA=Y" \
-e 'MSSQL_PID=Express' \
-p 1433:1433 \
-e 'SA_PASSWORD=**P455w0rd**' \
--name mssql \
mcr.microsoft.com/mssql/rhel/server:2019-latest

There’s a hardcoded prerequisite verification to check that the system has at least 2000 MB of RAM. And I have less than one GB here in this free tier:


awk '/^Mem/{print $0,$2/1024" MB"}' /proc/meminfo

Fortunately, there’s always a nice geek on the internet with an awesome solution: hack the sysinfo() system call with a LD_PRELOAD’ed wrapper : A Slightly Liberated Microsoft SQL Server Docker image

Let’s get it:
git clone https://github.com/justin2004/mssql_server_tiny.git
cd mssql_server_tiny

I changed the FROM to build from the 2019 RHEL image and I preferred to use /etc/ld.so.preload rather than overriding the CMD command with LD_PRELOAD:


FROM oraclelinux:7-slim AS build0
WORKDIR /root
RUN yum update -y && yum install -y binutils gcc
ADD wrapper.c /root/
RUN gcc -shared -ldl -fPIC -o wrapper.so wrapper.c
FROM mcr.microsoft.com/mssql/rhel/server:2019-latest
COPY --from=build0 /root/wrapper.so /root/
ADD wrapper.c /root/
USER root
RUN echo "/root/wrapper.so" > /etc/ld.so.preload
USER mssql

I didn’t change the wrapper for the sysinfo function:
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
#include <sys/sysinfo.h>
int sysinfo(struct sysinfo *info){
// clear it
//dlerror();
void *pt=NULL;
typedef int (*real_sysinfo)(struct sysinfo *info);
// we need the real sysinfo function address
pt = dlsym(RTLD_NEXT,"sysinfo");
//printf("pt: %x\n", *(char *)pt);
// call the real sysinfo system call
int real_return_val=((real_sysinfo)pt)(info);
// but then modify its returned totalram field if necessary
// because sqlserver needs to believe it has "2000 megabytes"
// physical memory
if( info->totalram < 1000l * 1000l * 1000l * 2l ){
info->totalram = 1000l * 1000l * 1000l * 2l ;
}
return real_return_val;
}

I build the image from there:

docker build -t mssql .


I run it:

docker run -d \
-e "ACCEPT_EULA=Y" \
-e 'MSSQL_PID=Express' \
-p 1433:1433 \
-e 'SA_PASSWORD=**P455w0rd**' \
--name mssql \
mssql

I wait until it is ready:

until docker logs mssql | grep -C10 "Recovery is complete." ; do sleep 1 ; done

All is ok and I connect and check the version:

Well… as you can see my first attempt failed. I am running with very low memory here, and then many memory allocation problems can be expected. If you look at the logs after a while, many automatic system tasks fail. But that’s sufficient for a minimal lab and you can tweak some Linux and SQL Server parameters if you need it. Comments are welcome here for feedback and ideas…

The port 1433 is exposed here locally and it can be tunneled through ssh. This is a free lab environment always accessible from everywhere to do small tests in MS SQL, running on the Oracle free tier. Here is how I connect with DBeaver from my laptop, just mentioning the public IP address, private ssh key and connection information:

Cet article Running SQL Server on the Oracle Free tier est apparu en premier sur Blog dbi services.

How SQL Server MVCC compares to Oracle and PostgreSQL


By Franck Pachot

.
Microsoft SQL Server has implemented MVCC in 2005, which has been proven to be the best approach for transaction isolation (the I in ACID) in OLTP. But are you sure that writers do not block readers with READ_COMMITTED_SNAPSHOT? I’ll show here that some reads are still blocked by locked rows, contrary to the precursors of MVCC like PostgreSQL and Oracle.

For this demo, I run SQL Server 2019 RHEL image on docker in an Oracle Cloud compute running OEL7.7 as explained in the previous post. If you don’t have the memory limit mentioned, you can simply run:

docker run -d -e "ACCEPT_EULA=Y" -e 'MSSQL_PID=Express' -p 1433:1433 -e 'SA_PASSWORD=**P455w0rd**' --name mssql mcr.microsoft.com/mssql/rhel/server:2019-latest
time until docker logs mssql | grep -C10 "Recovery is complete." ; do sleep 1 ; done

Test scenario description

Here is what I’ll run in a first session:

  1. create a DEMO database
  2. (optional) set MVCC with Read Committed Snapshot isolation level
  3. create a DEMO table with two rows. One with “a”=1 and one with “a”=2
  4. (optional) build an index on column “a”
  5. update the first line where “a”=1


cat > session1.sql <<'SQL'
drop database if exists DEMO;
create database DEMO;
go
use DEMO;
go
-- set MVCC to read snapshot rather than locked current --
-- alter database DEMO set READ_COMMITTED_SNAPSHOT on;
go
drop table if exists DEMO;
create table DEMO(id int primary key, a int not null, b int);
begin transaction;
insert into DEMO values(1,1,1);
insert into DEMO values(2,2,2);
commit;
go
select * from DEMO;
go
-- index to read only rows that we want to modify --
-- create index DEMO_A on DEMO(a);
go
begin transaction;
update DEMO set b=b+1 where a=1;
go
SQL

I’ll run it in the background (you can also run it in another terminal) where it waits 60 seconds before quitting:

( cat session1.sql ; sleep 60 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

[root@instance-20200208-1719 ~]# ( cat session1.sql ; sleep 60 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &
[1] 27759
[root@instance-20200208-1719 ~]# Feb 09 17:05:43 drop database if exists DEMO;
Feb 09 17:05:43 create database DEMO;
Feb 09 17:05:43
Feb 09 17:05:43 use DEMO;
Feb 09 17:05:43
Feb 09 17:05:43 Changed database context to 'DEMO'.
Feb 09 17:05:43 -- set MVCC to read snapshot rather than locked current --
Feb 09 17:05:43 -- alter database DEMO set READ_COMMITTED_SNAPSHOT on;
Feb 09 17:05:43
Feb 09 17:05:43 drop table if exists DEMO;
Feb 09 17:05:43 create table DEMO(id int primary key, a int not null, b int);
Feb 09 17:05:43 begin transaction;
Feb 09 17:05:43 insert into DEMO values(1,1,1);
Feb 09 17:05:43 insert into DEMO values(2,2,2);
Feb 09 17:05:43 commit;
Feb 09 17:05:43
Feb 09 17:05:43
Feb 09 17:05:43 (1 rows affected)
Feb 09 17:05:43
Feb 09 17:05:43 (1 rows affected)
Feb 09 17:05:43 select * from DEMO;
Feb 09 17:05:43
Feb 09 17:05:43 id          a           b
Feb 09 17:05:43 ----------- ----------- -----------
Feb 09 17:05:43           1           1           1
Feb 09 17:05:43           2           2           2
Feb 09 17:05:43
Feb 09 17:05:43 (2 rows affected)
Feb 09 17:05:43 -- index to read only rows that we want to modify --
Feb 09 17:05:43 -- create index DEMO_A on DEMO(a);
Feb 09 17:05:43
Feb 09 17:05:43 begin transaction;
Feb 09 17:05:43 update DEMO set b=b+1 where a=1;
Feb 09 17:05:43
Feb 09 17:05:43
Feb 09 17:05:43 (1 rows affected)

SQL Server default

While this session has locked the first row, I’ll run the following, reading a row of the same table while another row is locked by the other transaction:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO | ts
-- read access the row that is not locked
select * from DEMO where a=2;
go

This hangs until the first transaction is canceled:

[root@instance-20200208-1719 ~]# docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO | ts
-- read access the row that is not locked
 select * from DEMO where a=2;
go
Feb 09 17:06:42
Feb 09 17:06:42
Feb 09 17:06:42
Feb 09 17:06:42 Sqlcmd: Warning: The last operation was terminated because the user pressed CTRL+C.
Feb 09 17:06:42
Feb 09 17:06:42 -- read access the row that is not locked
Feb 09 17:06:42  select * from DEMO where a=2;
Feb 09 17:06:42
Feb 09 17:06:42 id          a           b
Feb 09 17:06:42 ----------- ----------- -----------
Feb 09 17:06:42           2           2           2
Feb 09 17:06:42
Feb 09 17:06:42 (1 rows affected)

The “Sqlcmd: Warning: The last operation was terminated because the user pressed CTRL+C” message is from the first session, and only then was my foreground session able to continue. This is the worst you can encounter with the default isolation level in SQL Server, where writes and reads block each other even when not touching the same row (I read the a=2 row and only the a=1 one was locked). The reason for this is that I have no index for this predicate and I have to read all rows in order to find mine:

set showplan_text on ;
go
select * from DEMO where a=2;
go

go
Feb 09 17:07:24 set showplan_text on ;
Feb 09 17:07:24
select * from DEMO where a=2;
go
Feb 09 17:07:30 select * from DEMO where a=2;
Feb 09 17:07:30
Feb 09 17:07:30 StmtText
Feb 09 17:07:30 -------------------------------
Feb 09 17:07:30 select * from DEMO where a=2;
Feb 09 17:07:30
Feb 09 17:07:30 (1 rows affected)
Feb 09 17:07:30 StmtText
Feb 09 17:07:30 ---------------------------------------------------------------------------------------------------------------------------------------------------
Feb 09 17:07:30   |--Clustered Index Scan(OBJECT:([DEMO].[dbo].[DEMO].[PK__DEMO__3213E83F2AD8547F]), WHERE:([DEMO].[dbo].[DEMO].[a]=CONVERT_IMPLICIT(int,[@1],0)))
Feb 09 17:07:30
Feb 09 17:07:30 (1 rows affected)

Now, in order to avoid this situation, I’ll run the same but with an index on column “a”.
It was commented out in the session1.sql script and then I just re-ran everything without those comments:

( sed -e '/create index/s/--//' session1.sql ; sleep 60 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

I’m running the same, now with a 3 seconds timeout so that I don’t have to wait for my background session to terminate:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts
-- read access the row that is not locked
select * from DEMO where a=2;
go

[root@instance-20200208-1719 ~]# docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts
-- read access the row that is not locked
 select * from DEMO where a=2;
 go
Feb 09 17:29:25 -- read access the row that is not locked
Feb 09 17:29:25  select * from DEMO where a=2;
Feb 09 17:29:25
Feb 09 17:29:25 Timeout expired

Here I’m blocked again like in the previous scenario because the index was not used.
I can force the index access with a hint:

-- read access the row that is not locked forcing index access
select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
go

-- read access the row that is not locked forcing index access
 select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
 go
Feb 09 17:29:30 -- read access the row that is not locked forcing index access
Feb 09 17:29:30  select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
Feb 09 17:29:30
Feb 09 17:29:30 id          a           b
Feb 09 17:29:30 ----------- ----------- -----------
Feb 09 17:29:30           2           2           2
Feb 09 17:29:30
Feb 09 17:29:30 (1 rows affected)

This didn’t wait because the index access didn’t have to go to the locked row.

However, when I read the same row that is concurrently locked I have to wait:

-- read access the row that is locked
select * from DEMO where a=1;
go

 -- read access the row that is locked
 select * from DEMO where a=1;
 go
Feb 09 17:29:34  -- read access the row that is locked
Feb 09 17:29:34  select * from DEMO where a=1;
Feb 09 17:29:34
Feb 09 17:29:34 Timeout expired

Here is the confirmation that the index was used only with the hint:

set showplan_text on ;
go
select * from DEMO where a=2;
go
select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
go

Feb 09 17:29:50 set showplan_text on ;
Feb 09 17:29:50
 select * from DEMO where a=2;
 go
Feb 09 17:29:50  select * from DEMO where a=2;
Feb 09 17:29:50
Feb 09 17:29:50 StmtText
Feb 09 17:29:50 --------------------------------
Feb 09 17:29:50  select * from DEMO where a=2;
Feb 09 17:29:50
Feb 09 17:29:50 (1 rows affected)
Feb 09 17:29:50 StmtText
Feb 09 17:29:50 --------------------------------------------------------------------------------------------------------------------------
Feb 09 17:29:50   |--Clustered Index Scan(OBJECT:([DEMO].[dbo].[DEMO].[PK__DEMO__3213E83F102B4054]), WHERE:([DEMO].[dbo].[DEMO].[a]=(2)))
Feb 09 17:29:50
Feb 09 17:29:50 (1 rows affected)
 select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
 go
Feb 09 17:29:52  select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
Feb 09 17:29:52
Feb 09 17:29:52 StmtText
Feb 09 17:29:52 -----------------------------------------------------
Feb 09 17:29:52  select * from DEMO WITH (INDEX(DEMO_A)) where a=2;
Feb 09 17:29:52
Feb 09 17:29:52 (1 rows affected)
Feb 09 17:29:52 StmtText                                                                                                                                                
Feb 09 17:29:52 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Feb 09 17:29:52   |--Nested Loops(Inner Join, OUTER REFERENCES:([DEMO].[dbo].[DEMO].[id]))                                                                              
Feb 09 17:29:52        |--Index Seek(OBJECT:([DEMO].[dbo].[DEMO].[DEMO_A]), SEEK:([DEMO].[dbo].[DEMO].[a]=(2)) ORDERED FORWARD)                                         
Feb 09 17:29:52        |--Clustered Index Seek(OBJECT:([DEMO].[dbo].[DEMO].[PK__DEMO__3213E83F102B4054]), SEEK:([DEMO].[dbo].[DEMO].[id]=[DEMO].[dbo].[DEMO].[id]) LOOKUP ORDERED FORWARD)
Feb 09 17:29:52
Feb 09 17:29:52 (3 rows affected)

So, with the default isolation level and index access, we can read a row that is not locked. The last query, the SELECT * FROM DEMO WHERE A=1, was blocked because we are in the legacy, and default, mode where readers are blocked by writers.

SQL Server MVCC

In order to improve this situation, Microsoft has implemented MVCC. With it, we do not need to read the current version of the rows (which requires waiting when they are concurrently modified) because the past versions of the rows are stored in TEMPDB and we can read a past snapshot of them. Typically, with the READ COMMITTED SNAPSHOT isolation level, we read a snapshot as of the point in time our query began.
In general, we need to read all rows from a consistent point in time. This can be the one where our query started, and then while the query is running, a past version may be reconstructed to remove concurrent changes. Or, when there is no MVCC to rebuild this snapshot, this consistent point can only be the one when our query is completed. This means that while we read rows, we must lock them to be sure that they stay the same until the end of our query. Of course, even with MVCC there are cases where we want to read the latest value and then we will lock with something like a SELECT FOR UPDATE. But that’s not the topic here.

I’ll run the same test as the first one, but now have the database with READ_COMMITTED_SNAPSHOT on:

( sed -e '/READ_COMMITTED/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

[root@instance-20200208-1719 ~]# ( sed -e '/READ_COMMITTED/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &
[1] 38943
[root@instance-20200208-1719 ~]# Feb 09 18:21:19 drop database if exists DEMO;
Feb 09 18:21:19 create database DEMO;
Feb 09 18:21:19
Feb 09 18:21:19 use DEMO;
Feb 09 18:21:19
Feb 09 18:21:19 Changed database context to 'DEMO'.
Feb 09 18:21:19 -- set MVCC to read snapshot rather than locked current --
Feb 09 18:21:19  alter database DEMO set READ_COMMITTED_SNAPSHOT on;
Feb 09 18:21:19
Feb 09 18:21:19 drop table if exists DEMO;
Feb 09 18:21:19 create table DEMO(id int primary key, a int not null, b int);
Feb 09 18:21:19 begin transaction;
Feb 09 18:21:19 insert into DEMO values(1,1,1);
Feb 09 18:21:19 insert into DEMO values(2,2,2);
Feb 09 18:21:19 commit;
Feb 09 18:21:19
Feb 09 18:21:19
Feb 09 18:21:19 (1 rows affected)
Feb 09 18:21:19
Feb 09 18:21:19 (1 rows affected)
Feb 09 18:21:19 select * from DEMO;
Feb 09 18:21:19
Feb 09 18:21:19 id          a           b
Feb 09 18:21:19 ----------- ----------- -----------
Feb 09 18:21:19           1           1           1
Feb 09 18:21:19           2           2           2
Feb 09 18:21:19
Feb 09 18:21:19 (2 rows affected)
Feb 09 18:21:19 -- index to read only rows that we want to modify --
Feb 09 18:21:19 -- create index DEMO_A on DEMO(a);
Feb 09 18:21:19
Feb 09 18:21:19 begin transaction;
Feb 09 18:21:19 update DEMO set b=b+1 where a=1;
Feb 09 18:21:19
Feb 09 18:21:19
Feb 09 18:21:19 (1 rows affected)

And then running the same scenario:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts
-- read access the row that is not locked
select * from DEMO where a=2;
go
-- read access the row that is locked
select * from DEMO where a=1;
go
-- write access on the row that is not locked
delete from DEMO where a=2;
go

[root@instance-20200208-1719 ~]# docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3 | ts

-- read access the row that is not locked
select * from DEMO where a=2;
go
Feb 09 18:21:36 -- read access the row that is not locked
Feb 09 18:21:36 select * from DEMO where a=2;
Feb 09 18:21:36
Feb 09 18:21:36 id          a           b
Feb 09 18:21:36 ----------- ----------- -----------
Feb 09 18:21:36           2           2           2
Feb 09 18:21:36
Feb 09 18:21:36 (1 rows affected)

-- read access the row that is locked
select * from DEMO where a=1;
go
Feb 09 18:21:47 -- read access the row that is locked
Feb 09 18:21:47 select * from DEMO where a=1;
Feb 09 18:21:47
Feb 09 18:21:47 id          a           b
Feb 09 18:21:47 ----------- ----------- -----------
Feb 09 18:21:47           1           1           1
Feb 09 18:21:47
Feb 09 18:21:47 (1 rows affected)

-- write access on the row that is not locked
delete from DEMO where a=2;
go
Feb 09 18:22:01 -- write access on the row that is not locked
Feb 09 18:22:01 delete from DEMO where a=2;
Feb 09 18:22:01
Feb 09 18:22:01 Timeout expired

Ok, that’s better. I confirm that readers are not blocked by writers. But the modification on “A”=2 was blocked. This is not a writer-writer situation because we are not modifying the row that is locked by the other session. Here, I have no index on “A”, so the delete statement must first read the table and had to read this locked row. And obviously, this read is blocked. It seems that DML must read the current version of the row even when MVCC is available. That means that reads can be blocked by writes when those reads are part of a writing transaction.

Last test on SQL Server: the same, with MVCC, and the index on “A”

( sed -e '/READ_COMMITTED/s/--//' -e '/create index/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &

[root@instance-20200208-1719 ~]# ( sed -e '/READ_COMMITTED/s/--//' -e '/create index/s/--//' session1.sql ; sleep 120 ) | docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e | ts &
[1] 40320
[root@instance-20200208-1719 ~]#
[root@instance-20200208-1719 ~]# Feb 09 18:30:15 drop database if exists DEMO;
Feb 09 18:30:15 create database DEMO;
Feb 09 18:30:15
Feb 09 18:30:15 use DEMO;
Feb 09 18:30:15
Feb 09 18:30:15 Changed database context to 'DEMO'.
Feb 09 18:30:15 -- set MVCC to read snapshot rather than locked current --
Feb 09 18:30:15  alter database DEMO set READ_COMMITTED_SNAPSHOT on;
Feb 09 18:30:15
Feb 09 18:30:15 drop table if exists DEMO;
Feb 09 18:30:15 create table DEMO(id int primary key, a int not null, b int);
Feb 09 18:30:15 begin transaction;
Feb 09 18:30:15 insert into DEMO values(1,1,1);
Feb 09 18:30:15 insert into DEMO values(2,2,2);
Feb 09 18:30:15 commit;
Feb 09 18:30:15
Feb 09 18:30:15
Feb 09 18:30:15 (1 rows affected)
Feb 09 18:30:15
Feb 09 18:30:15 (1 rows affected)
Feb 09 18:30:15 select * from DEMO;
Feb 09 18:30:15
Feb 09 18:30:15 id          a           b
Feb 09 18:30:15 ----------- ----------- -----------
Feb 09 18:30:15           1           1           1
Feb 09 18:30:15           2           2           2
Feb 09 18:30:15
Feb 09 18:30:15 (2 rows affected)
Feb 09 18:30:15 -- index to read only rows that we want to modify --
Feb 09 18:30:15  create index DEMO_A on DEMO(a);
Feb 09 18:30:15
Feb 09 18:30:15 begin transaction;
Feb 09 18:30:15 update DEMO set b=b+1 where a=1;
Feb 09 18:30:15
Feb 09 18:30:15
Feb 09 18:30:15 (1 rows affected)

Here is my full scenario to see where it blocks:

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3
-- read access the row that is not locked
select * from DEMO where a=2;
go
-- read access the row that is locked
select * from DEMO where a=1;
go
-- write access on the row that is not locked
delete from DEMO where a=2;
go
-- write access on the row that is locked
delete from DEMO where a=1;
go

docker exec -i mssql /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "**P455w0rd**" -e -d DEMO -t 3
-- read access the row that is not locked
select * from DEMO where a=2;
go
-- read access the row that is not locked
select * from DEMO where a=2;

id          a           b
----------- ----------- -----------
          2           2           2

(1 rows affected)

-- read access the row that is locked
select * from DEMO where a=1;
go
-- read access the row that is locked
select * from DEMO where a=1;

id          a           b
----------- ----------- -----------
          1           1           1

(1 rows affected)

-- write access on the row that is not locked
delete from DEMO where a=2;
go
-- write access on the row that is not locked
delete from DEMO where a=2;


(1 rows affected)

-- write access on the row that is locked
delete from DEMO where a=1;
go
-- write access on the row that is locked
delete from DEMO where a=1;

Timeout expired

Finally, the only blocking situation here is when I want to write on the same row. The index access reduces the risk of being blocked.

In summary, we can achieve the best concurrency with READ_COMMITTED_SNAPSHOT isolation level, and ensuring that we read only the rows we will update, with proper indexing and maybe hinting. This is, in my opinion, very important to know because we rarely cover those situations during integration tests. But they can happen quickly in production with high load.

PostgreSQL

Let’s do the same with PostgreSQL which is natively MVCC:

cat > session1.sql <<'SQL'
drop database if exists DEMO;
create database DEMO;
\c demo
drop table if exists DEMO;
create table DEMO(id int primary key, a int not null, b int);
begin transaction;
insert into DEMO values(1,1,1);
insert into DEMO values(2,2,2);
commit;
select * from DEMO;
begin transaction;
update DEMO set b=b+1 where a=1;
SQL

No specific settings, and no index created here.

( cat session1.sql ; sleep 120 ; echo "commit;") | psql -e | ts &

-bash-4.2$ ( cat session1.sql ; sleep 120 ; echo "commit;") | psql -e | ts &
[1] 31125
-bash-4.2$
-bash-4.2$ Feb 09 18:42:48 drop database if exists DEMO;
Feb 09 18:42:48 DROP DATABASE
Feb 09 18:42:48 create database DEMO;
Feb 09 18:42:49 CREATE DATABASE
Feb 09 18:42:49 You are now connected to database "demo" as user "postgres".
Feb 09 18:42:49 drop table if exists DEMO;
NOTICE:  table "demo" does not exist, skipping
Feb 09 18:42:49 DROP TABLE
Feb 09 18:42:49 create table DEMO(id int primary key, a int not null, b int);
Feb 09 18:42:49 CREATE TABLE
Feb 09 18:42:49 begin transaction;
Feb 09 18:42:49 BEGIN
Feb 09 18:42:49 insert into DEMO values(1,1,1);
Feb 09 18:42:49 INSERT 0 1
Feb 09 18:42:49 insert into DEMO values(2,2,2);
Feb 09 18:42:49 INSERT 0 1
Feb 09 18:42:49 commit;
Feb 09 18:42:49 COMMIT
Feb 09 18:42:49 select * from DEMO;
Feb 09 18:42:49  id | a | b
Feb 09 18:42:49 ----+---+---
Feb 09 18:42:49   1 | 1 | 1
Feb 09 18:42:49   2 | 2 | 2
Feb 09 18:42:49 (2 rows)
Feb 09 18:42:49
Feb 09 18:42:49 begin transaction;
Feb 09 18:42:49 BEGIN
Feb 09 18:42:49 update DEMO set b=b+1 where a=1;
Feb 09 18:42:49 UPDATE 1

While the transaction updating the first row is in the background, I run the following readers and writers:

psql demo | ts
set statement_timeout=3000;
-- read access the row that is not locked
select * from DEMO where a=2;
-- read access the row that is locked
select * from DEMO where a=1;
-- write access on the row that is not locked
delete from DEMO where a=2;
-- write access on the row that is locked
delete from DEMO where a=1;

-bash-4.2$ psql demo | ts
set statement_timeout=3000;
Feb 09 18:43:00 SET
-- read access the row that is not locked
select * from DEMO where a=2;
Feb 09 18:43:08  id | a | b
Feb 09 18:43:08 ----+---+---
Feb 09 18:43:08   2 | 2 | 2
Feb 09 18:43:08 (1 row)
Feb 09 18:43:08
-- read access the row that is locked
select * from DEMO where a=1;
Feb 09 18:43:16  id | a | b
Feb 09 18:43:16 ----+---+---
Feb 09 18:43:16   1 | 1 | 1
Feb 09 18:43:16 (1 row)
Feb 09 18:43:16
-- write access on the row that is not locked
delete from DEMO where a=2;
Feb 09 18:43:24 DELETE 1
-- write access on the row that is locked
delete from DEMO where a=1;

ERROR:  canceling statement due to statement timeout
CONTEXT:  while deleting tuple (0,1) in relation "demo"

Nothing is blocked except, of course, when modifying the row that is locked.

Oracle Database

One of the many things I’ve learned from Tom Kyte when I was reading AskTom regularly is how to build the simplest test cases. And with Oracle there is no need to run multiple sessions to observe multi-transaction concurrency. I can do it with an autonomous transaction in one session, and one advantage is that I can share a dbfiddle example:
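The embedded fiddle is not reproduced here, but a minimal sketch of the idea looks like this (the line numbers in the fiddle do not match this sketch exactly):

create table DEMO(id int primary key, a int not null, b int);
insert into DEMO values(1,1,1);
insert into DEMO values(2,2,2);
commit;
update DEMO set b=b+1 where a=1;   -- the main transaction locks the a=1 row
declare
  pragma autonomous_transaction;   -- behaves like a second, concurrent session
begin
  for r in (select * from DEMO where a=2) loop null; end loop;  -- not blocked
  for r in (select * from DEMO where a=1) loop null; end loop;  -- not blocked
  delete from DEMO where a=2;                                   -- not blocked
  delete from DEMO where a=1;  -- waits on the main transaction: ORA-00060 deadlock
  commit;
end;
/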

In the fiddle, the deadlock reported at line 14 means that only the “delete where a=1” encountered a blocking situation with the “update where a=1”. All previous statements, the selects on any row and the update of other rows, were executed without conflict.

A DML statement has two phases: one to find the rows and the second one to modify them. A DELETE or UPDATE in Oracle and Postgres runs the first in snapshot mode: non-blocking MVCC. The second must, of course, modify the current version. This is a very complex mechanism because it may require a retry (restart) when the current version does not match the consistent snapshot that was used for filtering. Both PostgreSQL and Oracle can ensure this write consistency without the need to block the reads. SQL Server has implemented MVCC more recently and provides non-blocking reads only for the SELECT reads. But a read can still be in blocking situation for the query phase of an update statement.

Cet article How SQL Server MVCC compares to Oracle and PostgreSQL est apparu en premier sur Blog dbi services.

odacli create-database extremely slow on ODA X8-2 with 19.5


Introduction

ODA X8-2, in the S, M or HA flavour, is the new database appliance from Oracle. But as it is a brand new product, you can experience troubles due to its lack of maturity. This time, it concerns database creation, which can be extremely slow on these servers if you are using ODA 19.5.

Deployment, appliance creation and core configuration

X8-2 reimaging is fine, and appliance creation is done without any problem on X8-2. Don’t hesitate to create a DBTEST database during appliance creation, it’s a good practice to check if everything is OK after deployment. Once appliance creation is done you should find these pmon processes running on the system:

[root@oda-cr-test ~]# ps -ef | grep pmon
oracle 17600 1 0 11:49 ? 00:00:00 asm_pmon_+ASM1
oracle 20604 1 0 11:50 ? 00:00:00 apx_pmon_+APX1
oracle 21651 1 0 11:50 ? 00:00:00 ora_pmon_DBTEST
root 81984 37658 0 13:44 pts/0 00:00:00 grep --color=auto pmon

If you deployed your ODA for Enterprise Edition, you should now apply the core configuration, meaning disabling the unlicensed cores. It’s better to do that immediately after deployment to avoid using cores you didn’t pay for. One EE license is 2 cores, 2 licenses are 4 cores, etc:

odacli update-cpucore -c 2

Core configuration is done within a few seconds, you could check with this command:

odacli describe-cpucore
Node Cores Modified Job Status
----- ------ ------------------------------ ---------------
0 2 January 27, 2020 9:56:37 AM CET Configured

Starting from now, you can create your own databases.

Database creation: extremely slow and eventually failing

Without additional steps you can create your very first DB. I’m using basic shape for all my databases, as I fine tune everything later using a SQL script (SGA and PGA targets, undo_retention, archive_lag_target, redolog configuration, options, etc):

[root@oda-cr-test ~]# odacli list-dbhomes
ID Name DB Version Home Location Status
---------------------------------------- -------------------- ---------------------------------------- --------------------------------------------- ----------
ae442886-6bcc-497c-8e16-2e8b4e55157e OraDB19000_home1 19.5.0.0.191015 /u01/app/oracle/product/19.0.0.0/dbhome_1 Configured
odacli create-database -m MAnager_2020_dbi -cs AL32UTF8 -no-c -u DBSUS -dh 'ae442886-6bcc-497c-8e16-2e8b4e55157e' -n DBSUS -s odb1s -l AMERICAN -dt AMERICA -no-co -r asm

ASM is a better solution than acfs for me: no need to have “real” filesystems and optimal storage usage.

Creating a database normally takes less than 10 minutes, but not this time:


odacli describe-job -i 94726b11-ea0c-46b0-ab55-6d709ef747d3
Job details
----------------------------------------------------------------
ID: 94726b11-ea0c-46b0-ab55-6d709ef747d3
Description: Database service creation with db name: DBSUS
Status: Failure
Created: February 13, 2020 11:24:53 AM CET
Message: DCS-10001:Internal error encountered: Fail to create User tablespace .
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Database Service creation February 13, 2020 11:24:53 AM CET February 13, 2020 6:23:16 PM CET Failure
Database Service creation February 13, 2020 11:24:53 AM CET February 13, 2020 6:23:16 PM CET Failure
Setting up ssh equivalance February 13, 2020 11:26:26 AM CET February 13, 2020 11:26:36 AM CET Success
Database Service creation February 13, 2020 11:26:36 AM CET February 13, 2020 6:23:11 PM CET Success
Database Creation February 13, 2020 11:26:36 AM CET February 13, 2020 6:21:37 PM CET Success
Change permission for xdb wallet files February 13, 2020 6:21:37 PM CET February 13, 2020 6:21:38 PM CET Success
Place SnapshotCtrlFile in sharedLoc February 13, 2020 6:21:38 PM CET February 13, 2020 6:22:06 PM CET Success
SqlPatch upgrade February 13, 2020 6:23:02 PM CET February 13, 2020 6:23:05 PM CET Success
Running dbms_stats init_package February 13, 2020 6:23:05 PM CET February 13, 2020 6:23:07 PM CET Success
updating the Database version February 13, 2020 6:23:07 PM CET February 13, 2020 6:23:11 PM CET Success
create Users tablespace February 13, 2020 6:23:11 PM CET February 13, 2020 6:23:16 PM CET Failure
Creating Users tablespace February 13, 2020 6:23:11 PM CET February 13, 2020 6:23:16 PM CET Failure

It took 7 hours and finished with a failure!

The failure message can be slightly different, for example:
DCS-10001:Internal error encountered: configure snapshot control file for databaseDBSUS.

Troubleshooting

Creation of the DBTEST database was fine at appliance creation. So what changed between deployment and database creation? The cpu-core configuration. Something is probably wrong with this core configuration.

dbca log is not very interesting regarding our problem:

cat /u01/app/oracle/cfgtoollogs/dbca/DBSUS/DBSUS.log
[ 2020-02-13 11:37:46.492 CET ] Prepare for db operation
DBCA_PROGRESS : 10%
[ 2020-02-13 11:38:22.441 CET ] Registering database with Oracle Restart
DBCA_PROGRESS : 14%
[ 2020-02-13 11:38:51.365 CET ] Copying database files
DBCA_PROGRESS : 43%
[ 2020-02-13 12:09:44.044 CET ] Creating and starting Oracle instance
DBCA_PROGRESS : 45%
DBCA_PROGRESS : 49%
DBCA_PROGRESS : 53%
DBCA_PROGRESS : 56%
DBCA_PROGRESS : 62%
[ 2020-02-13 15:44:47.163 CET ] Completing Database Creation
DBCA_PROGRESS : 68%
[ 2020-02-13 17:48:28.024 CET ] [WARNING] ORA-13516: AWR Operation failed: AWR Schema not initialized
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 356
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 389
ORA-06512: at line 1
DBCA_PROGRESS : 71%
[ 2020-02-13 18:20:49.119 CET ] Executing Post Configuration Actions
DBCA_PROGRESS : 100%
[ 2020-02-13 18:20:49.123 CET ] Database creation complete. For details check the logfiles at:
/u01/app/oracle/cfgtoollogs/dbca/DBSUS.
Database Information:
Global Database Name:DBSUS.salt.ch
System Identifier(SID):DBSUS

Let’s check the ASM alert.log:

tail /u01/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
2020-02-13T14:35:24.880987+01:00
LGWR (ospid: 17755) waits for event 'kfk: async disk IO' for 44 secs.
2020-02-13T14:35:24.881028+01:00
LGWR (ospid: 17755) is hung in an acceptable location (inwait 0x1.ffff).
2020-02-13T14:35:30.039917+01:00
LGWR (ospid: 17755) waits for event 'kfk: async disk IO' for 59 secs.
2020-02-13T14:35:40.280979+01:00
LGWR (ospid: 17755) waits for event 'kfk: async disk IO' for 75 secs.
2020-02-13T14:37:13.180477+01:00
LGWR (ospid: 17755) waits for event 'kfk: async disk IO' for 39 secs.
2020-02-13T14:37:13.180536+01:00
LGWR (ospid: 17755) is hung in an acceptable location (inwait 0x1.ffff).

There is a problem with the disks.

What does the system tell us?

tail /var/log/messages
...
Feb 13 14:34:49 oda-cr-test kernel: nvme nvme1: I/O 615 QID 1 timeout, completion polled
Feb 13 14:35:09 oda-cr-test kernel: nvme nvme0: I/O 348 QID 1 timeout, completion polled
Feb 13 14:35:12 oda-cr-test kernel: nvme nvme1: I/O 623 QID 1 timeout, completion polled
Feb 13 14:35:12 oda-cr-test kernel: nvme nvme1: I/O 624 QID 1 timeout, completion polled
Feb 13 14:35:16 oda-cr-test kernel: nvme nvme0: I/O 349 QID 1 timeout, completion polled
Feb 13 14:35:40 oda-cr-test kernel: nvme nvme0: I/O 348 QID 1 timeout, completion polled
Feb 13 14:35:46 oda-cr-test kernel: nvme nvme0: I/O 349 QID 1 timeout, completion polled
Feb 13 14:35:46 oda-cr-test kernel: nvme nvme0: I/O 351 QID 1 timeout, completion polled
Feb 13 14:35:46 oda-cr-test kernel: nvme nvme0: I/O 352 QID 1 timeout, completion polled
Feb 13 14:35:46 oda-cr-test kernel: nvme nvme0: I/O 353 QID 1 timeout, completion polled
Feb 13 14:35:46 oda-cr-test kernel: nvme nvme0: I/O 354 QID 1 timeout, completion polled
...

Let’s try to create a database with more cores:

odacli update-cpucore -c 16
reboot
odacli create-database -m MAnager_2020_dbi -cs AL32UTF8 -no-c -u DBSUP -dh '2d147842-4f42-468a-93c9-112ce9c23ee7' -n DBSUP -s odb1s -l AMERICAN -dt AMERICA -no-co -r asm
odacli describe-job -i 204fcd53-a9f1-416e-953b-c50448207fc1

Job details
----------------------------------------------------------------
ID: 204fcd53-a9f1-416e-953b-c50448207fc1
Description: Database service creation with db name: DBSUP
Status: Success
Created: February 7, 2020 3:41:54 PM CET
Message:
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting up ssh equivalance February 7, 2020 3:41:55 PM CET February 7, 2020 3:41:55 PM CET Success
Database Service creation February 7, 2020 3:41:56 PM CET February 7, 2020 3:50:07 PM CET Success
Database Creation February 7, 2020 3:41:56 PM CET February 7, 2020 3:48:16 PM CET Success
Change permission for xdb wallet files February 7, 2020 3:48:16 PM CET February 7, 2020 3:48:16 PM CET Success
Place SnapshotCtrlFile in sharedLoc February 7, 2020 3:48:16 PM CET February 7, 2020 3:48:19 PM CET Success
SqlPatch upgrade February 7, 2020 3:49:52 PM CET February 7, 2020 3:50:01 PM CET Success
Running dbms_stats init_package February 7, 2020 3:50:01 PM CET February 7, 2020 3:50:05 PM CET Success
updating the Database version February 7, 2020 3:50:05 PM CET February 7, 2020 3:50:07 PM CET Success
create Users tablespace February 7, 2020 3:50:07 PM CET February 7, 2020 3:50:12 PM CET Success
Clear all listeners from Databse {8a6b0534-26be-4a9a-90fd-f2167f57fded} February 7, 2020 3:50:12 PM CET February 7, 2020 3:50:14 PM CET Success

Success in 9 minutes.

odacli update-cpucore -c 8 --force
reboot
odacli create-database -m MAnager_2020_dbi -cs AL32UTF8 -no-c -u DBSUQ -dh '2d147842-4f42-468a-93c9-112ce9c23ee7' -n DBSUQ -s odb1s -l AMERICAN -dt AMERICA -no-co -r asm
odacli describe-job -i 210e1bd7-fe87-4d60-adce-84b121448c2b

Job details
----------------------------------------------------------------
ID: 210e1bd7-fe87-4d60-adce-84b121448c2b
Description: Database service creation with db name: DBSUQ
Status: Success
Created: February 7, 2020 4:05:48 PM CET
Message:
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting up ssh equivalance February 7, 2020 4:05:49 PM CET February 7, 2020 4:05:49 PM CET Success
Database Service creation February 7, 2020 4:05:49 PM CET February 7, 2020 5:53:09 PM CET Success
Database Creation February 7, 2020 4:05:49 PM CET February 7, 2020 5:46:50 PM CET Success
Change permission for xdb wallet files February 7, 2020 5:46:50 PM CET February 7, 2020 5:46:50 PM CET Success
Place SnapshotCtrlFile in sharedLoc February 7, 2020 5:46:50 PM CET February 7, 2020 5:48:55 PM CET Success
SqlPatch upgrade February 7, 2020 5:50:28 PM CET February 7, 2020 5:53:02 PM CET Success
Running dbms_stats init_package February 7, 2020 5:53:02 PM CET February 7, 2020 5:53:06 PM CET Success
updating the Database version February 7, 2020 5:53:06 PM CET February 7, 2020 5:53:09 PM CET Success
create Users tablespace February 7, 2020 5:53:09 PM CET February 7, 2020 5:53:45 PM CET Success
Clear all listeners from Databse {f0900b63-baf8-4896-8572-a4120770a362} February 7, 2020 5:53:45 PM CET February 7, 2020 5:53:47 PM CET Success

Success in 1h50.


odacli update-cpucore -c 4 --force
reboot
odacli create-database -m MAnager_2020_dbi -cs AL32UTF8 -no-c -u DBSUR -dh '2d147842-4f42-468a-93c9-112ce9c23ee7' -n DBSUR -s odb1s -l AMERICAN -dt AMERICA -no-co -r asm
odacli describe-job -i 1931b118-3407-413b-babc-ff9a832fab59

Job details
----------------------------------------------------------------
ID: 1931b118-3407-413b-babc-ff9a832fab59
Description: Database service creation with db name: DBSUR
Status: Success
Created: February 7, 2020 6:25:37 PM CET
Message:
Task Name Start Time End Time Status
---------------------------------------- ----------------------------------- ----------------------------------- ----------
Setting up ssh equivalance February 7, 2020 6:25:38 PM CET February 7, 2020 6:25:38 PM CET Success
Database Service creation February 7, 2020 6:25:38 PM CET February 7, 2020 10:09:17 PM CET Success
Database Creation February 7, 2020 6:25:38 PM CET February 7, 2020 9:30:40 PM CET Success
Change permission for xdb wallet files February 7, 2020 9:30:40 PM CET February 7, 2020 9:30:40 PM CET Success
Place SnapshotCtrlFile in sharedLoc February 7, 2020 9:30:40 PM CET February 7, 2020 9:35:21 PM CET Success
SqlPatch upgrade February 7, 2020 9:51:49 PM CET February 7, 2020 10:08:17 PM CET Success
Running dbms_stats init_package February 7, 2020 10:08:17 PM CET February 7, 2020 10:09:14 PM CET Success
updating the Database version February 7, 2020 10:09:14 PM CET February 7, 2020 10:09:17 PM CET Success
create Users tablespace February 7, 2020 10:09:17 PM CET February 7, 2020 10:39:01 PM CET Success
Clear all listeners from Databse {8cb507cf-3c84-4ad9-8302-844005965f6b} February 7, 2020 10:39:01 PM CET February 7, 2020 10:41:49 PM CET Success

Success in 4h15.

That’s it. Decreasing the cores dramatically decreases the I/O performance and makes our ODA unusable.

And this problem is not limited to database creation: don't expect a database to run correctly with this bug. Creating a small datafile will take long minutes.

Are you concerned?

This problem does not concern everybody. If you use Standard Edition 2 databases, you don't need to decrease the cores on your ODA, so you won't experience this problem. If you have enough Enterprise Edition licenses (at least to enable half of the total cores), you also won't have this problem. It only impacts those who have a limited number of licenses.
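If you are not sure how many cores are currently enabled on your ODA (and therefore whether you are in the risky configuration), a quick check with the standard odacli command is enough:

odacli describe-cpucore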

All my tests were done with ODA 19.5, so it's probably limited to this specific version. ODA X8-2 with 18.7 shouldn't have this bug.

Is there a workaround?

For now, there is no workaround. But it seems that an updated Linux kernel could solve the problem. Current kernel provided with 19.5 is 4.14.35-1902.5.2.el7uek.x86_64. Oracle will probably provide something soon.

Conclusion

One of the main advantages of ODA is the core configuration to fit the license. But for now, reducing the cores does not work well on ODA X8-2 with 19.5. If you can keep 11g/12c/18c, waiting for release 19.6 is probably better.

Cet article odacli create-database extremely slow on ODA X8-2 with 19.5 est apparu en premier sur Blog dbi services.


SQLNET.EXPIRE_TIME and ENABLE=BROKEN


By Franck Pachot

.
Those parameters, SQLNET.EXPIRE_TIME in sqlnet.ora and ENABLE=BROKEN in a connection description, have existed for a long time but may have changed in behavior. They are both related to detecting dead TCP connections with keep-alive probes: the former from the server, and the latter from the client.

The change in 12c is described in the following MOS note: Oracle Net 12c: Changes to the Functionality of Dead Connection Detection (DCD) (Doc ID 1591874.1). Basically, instead of sending a TNS packet as the keep-alive, the server Dead Connection Detection now relies on the TCP keep-alive feature when available. The note mentions that it may be required to set (ENABLE=BROKEN) in the connection string “in some circumstances”, which is not very precise. This “ENABLE=BROKEN” was used in the past for transparent failover when we had no VIP (virtual IP) in order to detect a lost connection to the server.

I don’t like those statements like “on some platform”, “in some circumstances”, “with some drivers”, “it may be necessary”… so there’s only one solution: test it in your context.

My listener is on (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521))) and I will connect to it and keep my connection idle (no user call to the server). I trace the server (through the forks of the listener, found by pgrep with the name of the listener associated with this TCP address) and color it in green (GREP_COLORS='ms=01;32'):

pkill strace ; strace -fyye trace=socket,setsockopt -p $(pgrep -f "tnslsnr $(lsnrctl status "(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))" | awk '/^Alias/{print $2}') ") 2>&1 | GREP_COLORS='ms=01;32' grep --color=auto -E '^|.*sock.*|^=*' &

I trace the client and color it in yellow (GREP_COLORS='ms=01;33'):

strace -fyye trace=socket,setsockopt sqlplus demo/demo@"(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=PDB1))(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))" <<<quit 2>&1 | GREP_COLORS='ms=01;33' grep --color=auto -E '^|.*sock.*|^=*'

I’m mainly interested by the setsockopt() here because this is how to enable TCP Keep Alive.

(ENABLE=BROKEN) on the client

My first test is without enabling DCD on the server: I have nothing defined in sqlnet.ora on the server side. I connect from the client without mentioning “ENABLE=BROKEN”:


The server (green) has set SO_KEEPALIVE but not the client.

Now I run the same scenario but adding (ENABLE=BROKEN) in the description:

strace -fyye trace=socket,setsockopt sqlplus demo/demo@"(DESCRIPTION=(ENABLE=BROKEN)(CONNECT_DATA=(SERVICE_NAME=PDB1))(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))" <<<quit 2>&1 | GREP_COLORS='ms=01;33' grep --color=auto -E '^|.*sock.*|^=*'

The client (yellow) has now a call to set keep-alive:

setsockopt(9<TCP:[1810151]>, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0

However, as I’ll show later, this uses the TCP defaults:

[oracle@db195 tmp]$ tail /proc/sys/net/ipv4/tcp_keepalive*
==> /proc/sys/net/ipv4/tcp_keepalive_intvl <== 
75
==> /proc/sys/net/ipv4/tcp_keepalive_probes <== 
9
==> /proc/sys/net/ipv4/tcp_keepalive_time <== 
7200

After 2 hours (7200 seconds) of idle connection, the client will send a probe 9 times, every 75 seconds. If you want to reduce it, you must change it in the client system settings. If you don't add “(ENABLE=BROKEN)”, the dead broken connection will not be detected before the next user call, after the default TCP timeout (15 minutes).
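If waiting 2 hours on the client side is too long, those defaults can be lowered at the OS level. This is a system-wide change, so validate it with your sysadmin; a minimal sketch with example values:

# probe after 10 minutes of idle time, then every 60 seconds, give up after 5 unanswered probes
sudo sysctl -w net.ipv4.tcp_keepalive_time=600
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=60
sudo sysctl -w net.ipv4.tcp_keepalive_probes=5
# add the same keys to /etc/sysctl.conf (or a file under /etc/sysctl.d) to make them persistent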

That’s only from the client when its connection to the server is lost.

SQLNET.EXPIRE_TIME on the server

On the server side, we have seen that SO_KEEPALIVE is set, using the TCP defaults. But, there, it may be important to detect dead connections faster because a session may hold some locks. You can (and should) set a lower value in sqlnet.ora with SQLNET.EXPIRE_TIME. Before 12c this parameter was used to send TNS packets as keep-alive probes, but now that SO_KEEPALIVE is set, this parameter controls the keep-alive idle time (using TCP_KEEPIDLE instead of the default /proc/sys/net/ipv4/tcp_keepalive_time).
Here is the same as my first test (without the client ENABLE=BROKEN) but after having set SQLNET.EXPIRE_TIME=42 in $ORACLE_HOME/network/admin/sqlnet.ora

Side note: I get the “do we need to restart the listener?” question very often about changes in sqlnet.ora, and the answer is clearly “no”. This file is read for each new connection to the database. The listener forks the server (aka shadow) process, and it is this forked process that reads sqlnet.ora and sets up the socket, as we can see here when I “strace -f” the listener.

Here is the new setsockopt() from the server process:

[pid  5507] setsockopt(16<TCP:[127.0.0.1:1521->127.0.0.1:31374]>, SOL_TCP, TCP_KEEPIDLE, [2520], 4) = 0
[pid  5507] setsockopt(16<TCP:[127.0.0.1:1521->127.0.0.1:31374]>, SOL_TCP, TCP_KEEPINTVL, [6], 4) = 0
[pid  5507] setsockopt(16<TCP:[127.0.0.1:1521->127.0.0.1:31374]>, SOL_TCP, TCP_KEEPCNT, [10], 4) = 0

This means that the server waits for 42 minutes of inactivity (the EXPIRE_TIME that I’ve set, here TCP_KEEPIDLE=2520 seconds) and then sends a probe. Without answer (ack) it re-probes every 6 seconds during one minute (the 6 seconds interval is defined by TCP_KEEPINTVL and TCP_KEEPCNT sets the retries to 10 times). We control the idle time with SQLNET.EXPIRE_TIME and then can expect that a dead connection is closed after one additional minute of retry.
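You can verify the effective timers on an established connection with ss (a quick check, assuming iproute2 is available; the remaining keep-alive time should reflect the EXPIRE_TIME value):

# -o displays the timer information, e.g. something like timer:(keepalive,41min,0)
sudo ss -tnpo state established '( sport = :1521 )'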

Here is a combination of SQLNET.EXPIRE_TIME (server detecting dead connection in 42+1 minute) and ENABLE=BROKEN (client detecting dead connection after the default of 2 hours):

tcpdump and iptable drop

The above, with strace, shows the translation of Oracle settings to Linux settings. Now I’ll translate to the actual behavior by tracing the TCP packets exchanged, with tcpdump:

sqlplus demo/demo@"(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=PDB1))(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))"
host cat $ORACLE_HOME/network/admin/sqlnet.ora
host sudo netstat -np  | grep sqlplus
host sudo netstat -np  | grep 36316
set time on escape on
host sudo tcpdump -vvnni lo port 36316 \&

“netstat -np | grep sqlplus” finds the client connection in order to get the port and “netstat -np | grep $port” shows both connections (“sqlplus” for the client and “oracleSID” for the server).

I have set SQLNET.EXPIRE_TIME=3 here and I can see that the server sends a 0-length packet every 3 minutes (connection at 14:43:39, then idle, 1st probe: 14:46:42, 2nd probe: 14:49:42…). Each time, the client replies with an ACK and the server knows that the connection is still alive:

Now I simulate a client that doesn’t answer, by blocking the input packets:

host sudo iptables -I INPUT 1 -p tcp --dport 36316 -j DROP
host sudo netstat -np  | grep 36316

Here I see the next probe 3 minutes after the last one (14:55:42) and then, as there is no reply, the 10 probes every 6 seconds:

At the end, I checked the TCP connections and the server one has disappeared. But the client side remains. That is exactly what DCD does: when a session is idle for a while it tests if the connection is dead and closes it to release all resources.
If I continue from there and try to run a query, the server cannot be reached and I’ll hang for the default TCP timeout of 15 minutes. If I try to cancel, I get “ORA-12152: TNS:unable to send break message” as it tries to send an out-of-bound break. SQLNET.EXPIRE_TIME is only for the server-side. The client detects nothing until it tries to send something.

For the next test, I remove my iptables rule to stop blocking the packets:

host sudo iptables -D INPUT 1

And I’m now running the same but with (ENABLE=BROKEN)

connect demo/demo@(DESCRIPTION=(ENABLE=BROKEN)(CONNECT_DATA=(SERVICE_NAME=PDB1))(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))
host sudo netstat -np  | grep sqlplus
host sudo netstat -np  | grep 37064
host sudo tcpdump -vvnni lo port 37064 \&
host sudo iptables -I INPUT 1 -p tcp --dport 37064 -j DROP
host sudo netstat -np  | grep 37064
host sudo iptables -D INPUT 1
host sudo netstat -np  | grep 37064

Here is the same as before: DCD after 3 minutes idle, and 10 probes that fail because I’ve blocked again with iptables:

As with the previous test, the server connection (the oracleSID) has been closed and only the client one remains. As I know that SO_KEEPALIVE has been enabled thanks to (ENABLE=BROKEN) the client will detect the closed connection:

17:52:48 is 2 hours after the last activity, and then the client probes 9 times, every 75 seconds, according to the system defaults:

[oracle@db195 tmp]$ tail /proc/sys/net/ipv4/tcp_keepalive*
==> /proc/sys/net/ipv4/tcp_keepalive_intvl <==    TCP_KEEPINTVL
75
==> /proc/sys/net/ipv4/tcp_keepalive_probes <==     TCP_KEEPCNT
9
==> /proc/sys/net/ipv4/tcp_keepalive_time <==      TCP_KEEPIDLE
7200

It took a long time (but you can change those defaults on the client side) but finally, the client connection is cleaned up (sqlplus is no longer there in the last netstat).
Now, an attempt to run a user call fails immediately with the famous ORA-03113 because the client knows that the connection is closed:

Just a little additional test to show ORA-03135. If the server has detected and closed the dead connection, but the client has not yet detected it, we have seen that we wait for a 15-minute timeout. But that's only because the iptables rule was still there to drop the packets. If I remove the rule before attempting a user call, the server can be reached (so no wait and no timeout) and it detects immediately that there's no endpoint anymore. This raises “connection lost contact”.

In summary:

  • On the server, the keep-alive is always enabled and SQLNET.EXPIRE_TIME is used to reduce the tcp_keepalive_time defined by the system, because it is probably too long.
  • On the client, the keep-alive is enabled only when (ENABLE=BROKEN) is in the connection description, and uses the tcp_keepalive_time from the system. Without it, the broken connection will be detected only when attempting a user call.

Setting SQLNET.EXPIRE_TIME to a few minutes (like 10) is a good idea because you don't want to keep resources and locks on the server when a simple ping can confirm that the connection is lost and we have to roll back. If we don't, the dead connections may disappear only after 2 hours and 12 minutes (the idle time + the probes). On the client side, it is also a good idea to add (ENABLE=BROKEN) so that idle sessions that have lost contact have a chance to know it before trying to use the connection. This is a performance gain if it helps to avoid sending a “select 1 from dual” each time you grab a connection from the pool.
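Putting both recommendations together, here is a minimal configuration sketch (the alias, host and service names are placeholders, not taken from this test):

# server side, in $ORACLE_HOME/network/admin/sqlnet.ora
SQLNET.EXPIRE_TIME=10

# client side, in tnsnames.ora (or any connection description)
MYAPP=(DESCRIPTION=(ENABLE=BROKEN)(ADDRESS=(PROTOCOL=tcp)(HOST=db-host)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=MYAPP_SVC)))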

And, most important: the documentation is imprecise, which means that the behavior can change without notification. This is a test on specific OS, specific driver, specific version,… Do not take the results from this post, but now you know how to check in your environment.

Cet article SQLNET.EXPIRE_TIME and ENABLE=BROKEN est apparu en premier sur Blog dbi services.

Oracle 20c : The new PREPARE DATABASE FOR DATA GUARD


As you may know, Oracle 20c is in the cloud with new features. The one I have tested is the PREPARE DATABASE FOR DATA GUARD.
This command configures a database for use as a primary database in a Data Guard broker configuration. Database initialization parameters are set to recommended values.
Let's see what this command will do for us.
The db_unique_name of the primary database is prod20 and, in the Data Guard configuration I will build, it will be changed to prod20_site1.

SQL> show parameter db_unique_name

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
db_unique_name			     string	 prod20
SQL> 

Now let's connect to the broker and run the help command to see the syntax

[oracle@oraadserver ~]$ dgmgrl
DGMGRL for Linux: Release 20.0.0.0.0 - Production on Tue Feb 18 21:36:39 2020
Version 20.2.0.0.0

Copyright (c) 1982, 2020, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect /
Connected to "prod20_site1"
Connected as SYSDG.
DGMGRL> 
 
DGMGRL> help prepare    

Prepare a primary database for a Data Guard environment.

Syntax:

  PREPARE DATABASE FOR DATA GUARD
    [WITH [DB_UNIQUE_NAME IS <db_unique_name>]
          [DB_RECOVERY_FILE_DEST IS <directory>]
          [DB_RECOVERY_FILE_DEST_SIZE IS <size>]
          [BROKER_CONFIG_FILE_1 IS <broker config file 1>]
          [BROKER_CONFIG_FILE_2 IS <broker config file 2>]];

And then run the command

DGMGRL> PREPARE DATABASE FOR DATA GUARD with DB_UNIQUE_NAME is prod20_site1;
Preparing database "prod20" for Data Guard.
Initialization parameter DB_UNIQUE_NAME set to 'prod20_site1'.
Initialization parameter DB_FILES set to 1024.
Initialization parameter LOG_BUFFER set to 268435456.
Primary database must be restarted after setting static initialization parameters.
Shutting down database "prod20_site1".
Database closed.
Database dismounted.
ORACLE instance shut down.
Starting database "prod20_site1" to mounted mode.
ORACLE instance started.
Database mounted.
Initialization parameter DB_FLASHBACK_RETENTION_TARGET set to 120.
Initialization parameter DB_LOST_WRITE_PROTECT set to 'TYPICAL'.
RMAN configuration archivelog deletion policy set to SHIPPED TO ALL STANDBY.
Adding standby log group size 209715200 and assigning it to thread 1.
Adding standby log group size 209715200 and assigning it to thread 1.
Adding standby log group size 209715200 and assigning it to thread 1.
Initialization parameter STANDBY_FILE_MANAGEMENT set to 'AUTO'.
Initialization parameter DG_BROKER_START set to TRUE.
Database set to FORCE LOGGING.
Database set to FLASHBACK ON.
Database opened.
DGMGRL> 

The output shows the changes done by the PREPARE command. We can do some checks:

SQL> show parameter db_unique_name

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
db_unique_name			     string	 prod20_site1
SQL> select flashback_on,force_logging from v$database;

FLASHBACK_ON	   FORCE_LOGGING
------------------ ---------------------------------------
YES		   YES

SQL> 

SQL> show parameter standby_file

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
standby_file_management 	     string	 AUTO
SQL> 

But here I can see that I have only 3 standby redo log groups instead of the expected 4 (as I have 3 online redo log groups, and the recommendation is one additional standby group per thread). Adding the missing group is shown after the queries below.

SQL> select bytes,group# from v$log;

     BYTES     GROUP#
---------- ----------
 209715200	    1
 209715200	    2
 209715200	    3

SQL> 


SQL> select group#,bytes from v$standby_log;

    GROUP#	BYTES
---------- ----------
	 4  209715200
	 5  209715200
	 6  209715200

SQL> 
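A minimal sketch to add the missing fourth standby log group, matching the size of the existing groups (to be repeated on the standby side once the configuration is built):

SQL> alter database add standby logfile thread 1 size 209715200;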

After building the Data Guard configuration I did some checks (note that these steps are not shown here, but they are the same as for other versions).
For the configuration

DGMGRL> show configuration verbose;

Configuration - prod20

  Protection Mode: MaxPerformance
  Members:
  prod20_site1 - Primary database
    prod20_site2 - Physical standby database 

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'prod20_CFG'
    ConfigurationSimpleName         = 'prod20'

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS

For the primary database

DGMGRL> show database verbose 'prod20_site1';

Database - prod20_site1

  Role:                PRIMARY
  Intended State:      TRANSPORT-ON
  Instance(s):
    prod20

  Properties:
    DGConnectIdentifier             = 'prod20_site1'
    ObserverConnectIdentifier       = ''
    FastStartFailoverTarget         = ''
    PreferredObserverHosts          = ''
    LogShipping                     = 'ON'
    RedoRoutes                      = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '30'
    TransportLagThreshold           = '30'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    ApplyInstances                  = '0'
    ArchiveLocation                 = ''
    AlternateLocation               = ''
    StandbyArchiveLocation          = ''
    StandbyAlternateLocation        = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    LogXptStatus                    = '(monitor)'
    SendQEntries                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    HostName                        = 'oraadserver'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oraadserver)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=prod20_site1_DGMGRL)(INSTANCE_NAME=prod20)(SERVER=DEDICATED)))'
    TopWaitEvents                   = '(monitor)'
    SidName                         = '(monitor)'

  Log file locations:
    Alert log               : /u01/app/oracle/diag/rdbms/prod20_site1/prod20/trace/alert_prod20.log
    Data Guard Broker log   : /u01/app/oracle/diag/rdbms/prod20_site1/prod20/trace/drcprod20.log

Database Status:
SUCCESS

DGMGRL> 

For the standby database

DGMGRL> show database verbose 'prod20_site2';

Database - prod20_site2

  Role:                PHYSICAL STANDBY
  Intended State:      APPLY-ON
  Transport Lag:       0 seconds (computed 1 second ago)
  Apply Lag:           0 seconds (computed 1 second ago)
  Average Apply Rate:  2.00 KByte/s
  Active Apply Rate:   0 Byte/s
  Maximum Apply Rate:  0 Byte/s
  Real Time Query:     OFF
  Instance(s):
    prod20

  Properties:
    DGConnectIdentifier             = 'prod20_site2'
    ObserverConnectIdentifier       = ''
    FastStartFailoverTarget         = ''
    PreferredObserverHosts          = ''
    LogShipping                     = 'ON'
    RedoRoutes                      = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '30'
    TransportLagThreshold           = '30'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    ApplyInstances                  = '0'
    ArchiveLocation                 = ''
    AlternateLocation               = ''
    StandbyArchiveLocation          = ''
    StandbyAlternateLocation        = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    LogXptStatus                    = '(monitor)'
    SendQEntries                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    HostName                        = 'oraadserver2'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oraadserver2)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=PROD20_SITE2_DGMGRL)(INSTANCE_NAME=prod20)(SERVER=DEDICATED)))'
    TopWaitEvents                   = '(monitor)'
    SidName                         = '(monitor)'

  Log file locations:
    Alert log               : /u01/app/oracle/diag/rdbms/prod20_site2/prod20/trace/alert_prod20.log
    Data Guard Broker log   : /u01/app/oracle/diag/rdbms/prod20_site2/prod20/trace/drcprod20.log

Database Status:
SUCCESS

DGMGRL> 

Conclusion

I am sure that you will adopt this nice command.

Cet article Oracle 20c : The new PREPARE DATABASE FOR DATA GUARD est apparu en premier sur Blog dbi services.

Oracle 20c Data Guard : Validating a Fast Start Failover Configuration


In Oracle 20c, we can now validate a Fast Start Failover configuration with the new command VALIDATE FAST_START FAILOVER. This command helps to identify issues in the configuration. I tested this new feature.
The Fast Start Failover is configured and the observer is running fine as we can see below.

DGMGRL> show configuration verbose

Configuration - prod20

  Protection Mode: MaxPerformance
  Members:
  prod20_site1 - Primary database
    prod20_site2 - (*) Physical standby database 

  (*) Fast-Start Failover target
  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'prod20_CFG'
    ConfigurationSimpleName         = 'prod20'

Fast-Start Failover: Enabled in Potential Data Loss Mode
  Lag Limit:          30 seconds
  Threshold:          30 seconds
  Active Target:      prod20_site2
  Potential Targets:  "prod20_site2"
    prod20_site2 valid
  Observer:           oraadserver2
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configuration Status:
SUCCESS

DGMGRL> 

If we run the command, we can see that everything is working and that a failover will happen if needed:

DGMGRL> VALIDATE FAST_START FAILOVER;
  Fast-Start Failover:  Enabled in Potential Data Loss Mode
  Protection Mode:      MaxPerformance
  Primary:              prod20_site1
  Active Target:        prod20_site2

DGMGRL>

Now let’s stop the observer

DGMGRL> stop observer

Observer stopped.

And if we run the VALIDATE command again, we get the following message:

DGMGRL> VALIDATE FAST_START FAILOVER;

  Fast-Start Failover:  Enabled in Potential Data Loss Mode
  Protection Mode:      MaxPerformance
  Primary:              prod20_site1
  Active Target:        prod20_site2

Fast-start failover not possible:
  Fast-start failover observer not started.

DGMGRL> 
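To get back to the initial state, the observer just needs to be started again, for example from a DGMGRL session on the observer host (in my understanding START OBSERVER keeps that session busy in the foreground, so run the validation from another session). A minimal sketch:

DGMGRL> start observer

Running VALIDATE FAST_START FAILOVER again from another session should then no longer report the issue.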

Cet article Oracle 20c Data Guard : Validating a Fast Start Failover Configuration est apparu en premier sur Blog dbi services.

Speed up datapump export for migrating big databases


Introduction

Big Oracle databases (several TB) are still tough to migrate to another version on a new server. For most of them, you'll probably use RMAN restore or Data Guard, but datapump is always a cleaner way to migrate. With datapump, you can easily migrate to a new filesystem (ASM for example), rethink your tablespace organization, reorganize all the segments, exclude unneeded components, etc. All of these tasks in one operation. But datapump export can take hours and hours to complete. This blog post describes a method I used on several projects: it helped me a lot to optimize migration time.

Why does datapump export take so much time?

First of all, exporting data with datapump means actually extracting all the objects from the database, so it's easy to understand why it's much slower than copying datafiles. Datapump speed mainly depends on the speed of the disks where the datafiles reside, and on the parallelism level. Increasing parallelism does not always speed up the export, simply because on mechanical disks, reading multiple objects concurrently from the same disks is slower than reading them serially. So there is some kind of limit, and for big databases the export can last hours. Another problem is that a long-lasting export needs more undo data: if your datapump export lasts 10 hours, you'll need 10 hours of undo_retention (if you need a consistent dump – at least when testing the migration because the application is running). You also risk DDL changes on the database, and undo_retention cannot do anything about that. Be careful, because an incomplete dump is perfectly usable to import data, but you'll be missing several objects, which is not the goal I presume.
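As a reminder, consistency must be requested explicitly with FLASHBACK_TIME or FLASHBACK_SCN, and the undo retention has to cover the whole export window. A minimal sketch with example values (the directory name and credentials are placeholders):

-- in the source database: allow undo to be kept for 12 hours (and size the undo tablespace accordingly)
alter system set undo_retention=43200;

# datapump export consistent as of its start time
expdp system/*** full=y directory=DUMP_DIR dumpfile=expfull_%U.dmp parallel=8 \
  flashback_time=systimestamp logfile=expfull.log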

The solution would be trying to reduce the time needed for datapump export to avoid such problems.

SSD is the solution

SSD is probably the best choice for today's databases: no more I/O bottleneck, that's all we were waiting for. But your source database, an old 11gR2 or 12cR1, probably doesn't run on SSD, especially if it's a big database. SSDs were quite small and expensive several years ago. So what? You probably didn't plan an SSD migration on the source server, as you will decommission it as soon as the migration is finished.

The solution is to use a temporary server fitted with fast SSDs. You don't need a real server with a fully redundant configuration. You don't even need RAID to protect your data, because this server will be used only once: JBOD is OK.

How to configure this server?

This server will have:

  • exactly the same OS, or something really similar compared to source server
  • the exact same Oracle version
  • the same configuration of the filesystems
  • enough free space to restore the source database
  • SSD-only storage for datafiles without redundancy
  • enough cores to maximise the parallelism level
  • a shared folder to put the dump, this shared folder would also be mounted on target server
  • a shared folder to pick up the latest backups from source database
  • enough bandwidth for the shared folders. A 1Gbps network is only about 100MB/s, so don't expect very high speed with that kind of network
  • you don’t need a listener
  • you’ll never use this database for you application
  • if you’re reusing a server, make sure it will be dedicated for this purpose (no other running processes)

And regarding the license?

As you may know, this server would need a license. But you also know that during the migration project, you'll have twice the licenses in use on your environment for several weeks: still using the old servers, and already using the new servers for migrated databases. To avoid any problem, you can use a server that was previously running Oracle databases and is already decommissioned. Tweak it with SSDs and it will be fine. And please make sure to be fully compliant with the Oracle license on your target environment.

How to proceed?

We won't use this server as a one-shot path for the migration, because we first need to check whether the method is good enough and also find the best settings for datapump.

To proceed, the steps are:

  • declare the database in /etc/oratab
  • create a pfile on source server and copy it to $ORACLE_HOME/dbs on the temporary server
  • edit the parameters to disable references to the source environment, for example local and remote_listeners and Data Guard settings. The goal is to make sure starting this database will have no impact on production
  • startup the instance on this pfile
  • restore the controlfile from the very latest controlfile autobackup
  • restore the database
  • recover the database and check the SCN
  • take a new archivelog backup on the source database (to simulate the real scenario)
  • catalog the backup folder on the temporary database with RMAN
  • do another recover database on temporary database, it should apply the archivelogs of the day, then check again the SCN
  • open the database in resetlogs mode
  • create the target directory for datapump on the database
  • do the datapump export with maximum parallelism level (2 times the number of cores available on your server – it will be too many at the beginning, but not enough at the end)

You can try various parallelism levels to adjust to the best value. Once you’ve found the best value, you can schedule the real migration.

Production migration

Now that you have mastered the method, let's imagine that you planned to migrate to production tonight at 18:00.

09:00 – have a cup of coffee first, you’ll need it!
09:15 – remove all the datafiles on the temporary server, also remove redologs and controlfiles, and empty the FRA. Only keep the pfile.
09:30 – startup force your temporary database, it should stop in nomount mode
09:45 – restore the latest controlfile autobackup on temporary database. Make sure no datafile will be added today on production
10:00 – restore the database on the temporary server. During the restore, production is still available on source server. At the end of the restore, do a first recover but DON’T open your database with resetlogs now
18:00 – your restore should be finished now, you can disconnect everyone from source database, and take the very latest archivelog backup on source database. From now your application should be down.
18:20 – on your temporary database, catalog the backup folder with RMAN. It will discover the latest archivelog backups.
18:30 – do a recover of your temporary database again. It should apply the latest archivelogs (generated during the day). If you want to make sure that everything is OK, check the current_scn on source database, it should be nearly the same as your temporary database
18:45 – open the temporary database with RESETLOGS
19:00 – do the datapump export with your optimal settings

Once done, you now have to do the datapump import on your target database. Parallelism will depend on the cores available on the target server, and on the resources you want to preserve for the other databases already running on this server.
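For reference, the import side is symmetrical; a minimal sketch (directory, credentials and parallel degree are placeholders to adapt to your target server):

impdp system/*** full=y directory=DUMP_DIR dumpfile=expfull_%U.dmp parallel=8 logfile=impfull.log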

Benefits and drawbacks

The obvious benefit is that it takes less than 30 minutes to apply the archivelogs of the day on the temporary database, and the total duration of the export can be cut by several hours.

The first drawback is that you'll need a server of this kind, or you'll need to build one. The second drawback concerns Standard Edition: don't expect to save that many hours, as datapump has no parallelism at all in this edition. Big databases are not very well served by Standard Edition, as you may know.

Real world example

This is a recent case. The source database is 12.1, about 2TB on mechanical disks. The datapump export was not working correctly: it lasted more than 19 hours with lots of errors. One of the big problems of this database is a bigfile tablespace of 1.8TB. Who did this kind of configuration?

The temporary server is a DEV server, already decommissioned, running the same version of Oracle and using the same Linux kernel. This server is fitted with enough TB of SSD: the mount paths were changed to match the source database filesystems.

On source server:

su - oracle
. oraenv <<< BP3
sqlplus / as sysdba
create pfile='/tmp/initBP3.ora' from spfile;
exit
scp /tmp/initBP3.ora oracle@db32-test:/tmp

On temporary server:
su - oracle
cp /tmp/initBP3.ora /opt/orasapq/oracle/product/12.1.0.2/dbs/
echo "BP3:/opt/orasapq/oracle/product/12.1.0.2:N" >> /etc/oratab
. oraenv <<< BP3
vi $ORACLE_HOME/dbs/initBP3.ora
# remove db_unique_name, dg_broker_start, fal_server, local_listener, log_archive_config, log_archive_dest_2, log_archive_dest_state_2, service_names from this pfile
sqlplus / as sysdba
startup force nomount;
exit
ls -lrt /backup/db42-prod/BP3/autobackup | tail -n 1
/backup/db42-prod/BP3/autobackup/c-2226533455-20200219-01
rman target /
restore controlfile from '/backup/db42-prod/BP3/autobackup/c-2226533455-20200219-01';
alter database mount;
CONFIGURE DEVICE TYPE DISK PARALLELISM 8 BACKUP TYPE TO BACKUPSET;
restore database;
...
recover database;
exit;

On source server:
Take a last backup of archivelogs with your own script: the one used in scheduled tasks.
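If you want the equivalent without the scheduled script, a minimal RMAN sketch would be (the backup destination is whatever your existing configuration already uses):

rman target /
backup archivelog all not backed up 1 times;
exit;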

On temporary server:
su - oracle
. oraenv <<< BP3
rman target /
select current_scn from v$database;
CURRENT_SCN
-----------
11089172427
catalog start with '/backup/db42-prod/BP3/backupset/';
recover database;
select current_scn from v$database;
CURRENT_SCN
-----------
11089175474
alter database open resetlogs;
exit;
sqlplus / as sysdba
create or replace directory migration as '/backup/dumps/';
exit;
expdp \'/ as sysdba\' full=y directory=migration dumpfile=expfull_BP3_`date +%Y%m%d_%H%M`_%U.dmp parallel=24 logfile=expfull_BP3_`date +%Y%m%d_%H%M`.log

Export was done in less than 5 hours, 4 times less than on source database. Database migration could now fit in one night. Much better isn’t it?

Other solutions

If you’re used to Data Guard, you can create a standby on this temporary server that would be dedicated to this purpose. No need to manually apply the latest archivelog backup of the day because it’s already in sync. Just convert this standby to primary without impacting the source database, or do a simple switchover then do the datapump export.

Transportable tablespace is a mixed solution where datafiles are copied to destination database, only metadata being exported and imported. But don’t expect any kind of reorganisation here.

If you cannot afford a downtime of several hours of migration, you should think about logical replication. Solutions like Golden Gate are perfect for keeping application running. But as you probably know, it comes at a cost.

Conclusion

If several hours of downtime is acceptable, datapump is still a good option for migration. Downtime is all about disk speed and parallelism.

Cet article Speed up datapump export for migrating big databases est apparu en premier sur Blog dbi services.

Oracle 20c SQL Macros: a scalar example to join agility and performance


By Franck Pachot

.
Let's say you have a PEOPLE table with FIRST_NAME and LAST_NAME and you want, in many places of your application, to display the full name. Usually my name will be displayed as 'Franck Pachot' and I can simply add a virtual column to my table, or view, as: initcap(FIRST_NAME)||' '||initcap(LAST_NAME). Those are simple SQL functions. No need for procedural code there, right? But, one day, the business will come with new requirements. In some countries (I've heard about Hungary but there are others), my name may be displayed with the last name first, like: 'Pachot Franck'. And in some contexts, it may have a comma like: 'Pachot, Franck'.

There comes a religious debate between Dev and Ops:

  • Developer: We need a function for that, so that the code can evolve without changing all SQL queries or views
  • DBA: That’s the worst you can do. Calling a function for each row is a context switch between SQL and PL/SQL engine. Not scalable.
  • Developer: Ok, let’s put all that business logic in the application so that we don’t have to argue with the DBA…
  • DBA: Oh, that’s even worse. The database cannot perform correctly with all those row-by-row calls!
  • Developer: No worry, we will put the database on Kubernetes, shard and distribute it, and scale as far as we need for acceptable throughput

And this is where we arrive in an unsustainable situation. Because we didn’t find a tradeoff between code maintainability and application performance, we get the worst from each of them: crazy resource usage for medium performance.

However, in Oracle 20c, we have a solution for that. Did you code some C programs where you replace functions by pre-processor macros? So that your code is readable and maintainable like when using modules and functions. But compiled as if those functions have been merged to the calling code at compile time? What was common in those 3rd generation languages is now possible in a 4th generation declarative language: Oracle SQL.

Let’s take an example. I’m building a PEOPLE table using the Linux /usr/share/dict of words:


create or replace directory "/usr/share/dict" as '/usr/share/dict';
create table people as
with w as (
select *
 from external((word varchar2(60))
 type oracle_loader default directory "/usr/share/dict" access parameters (nologfile) location('linux.words'))
) select upper(w1.word) first_name , upper(w2.word) last_name
from w w1,w w2 where w1.word like 'ora%' and w2.word like 'aut%'
order by ora_hash(w1.word||w2.word)
/

I have a table of about 110,000 rows here with first and last names.
Here is a sample:


SQL> select count(*) from people;

  COUNT(*)
----------
    110320

SQL> select * from people where rownum<=10;

FIRST_NAME                     LAST_NAME
------------------------------ ------------------------------
ORACULUM                       AUTOMAN
ORANGITE                       AUTOCALL
ORANGUTANG                     AUTHIGENOUS
ORAL                           AUTOPHOBIA
ORANGUTANG                     AUTOGENEAL
ORATORIAN                      AUTOCORRELATION
ORANGS                         AUTOGRAPHICAL
ORATORIES                      AUTOCALL
ORACULOUSLY                    AUTOPHOBY
ORATRICES                      AUTOCRATICAL

PL/SQL function

Here is my function that displays the full name, with the Hungarian specificity as an example but, as it is a function, it can evolve further:


create or replace function f_full_name(p_first_name varchar2,p_last_name varchar2)
return varchar2
as
 territory varchar2(64);
begin
 select value into territory from nls_session_parameters
 where parameter='NLS_TERRITORY';
 case (territory)
 when 'HUNGARY'then return initcap(p_last_name)||' '||initcap(p_first_name);
 else               return initcap(p_first_name)||' '||initcap(p_last_name);
 end case;
end;
/
show errors

The functional result depends on my session settings:


SQL> select f_full_name(p_first_name=>first_name,p_last_name=>last_name) from people
     where rownum<=10;

F_FULL_NAME(P_FIRST_NAME=>FIRST_NAME,P_LAST_NAME=>LAST_NAME)
------------------------------------------------------------------------------------------------
Oraculum Automan
Orangite Autocall
Orangutang Authigenous
Oral Autophobia
Orangutang Autogeneal
Oratorian Autocorrelation
Orangs Autographical
Oratories Autocall
Oraculously Autophoby
Oratrices Autocratical

10 rows selected.

But let’s run it on many rows, like using this function in the where clause, with autotrace:


SQL> set timing on autotrace on
select f_full_name(first_name,last_name) from people
where f_full_name(p_first_name=>first_name,p_last_name=>last_name) like 'Oracle Autonomous';

F_FULL_NAME(FIRST_NAME,LAST_NAME)
------------------------------------------------------------------------------------------------------
Oracle Autonomous

Elapsed: 00:00:03.47

Execution Plan
----------------------------------------------------------
Plan hash value: 2528372185

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1103 | 25369 |   129   (8)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| PEOPLE |  1103 | 25369 |   129   (8)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("F_FULL_NAME"("P_FIRST_NAME"=>"FIRST_NAME","P_LAST_NAME"=>
              "LAST_NAME")='Oracle Autonomous')


Statistics
----------------------------------------------------------
     110361  recursive calls
          0  db block gets
        426  consistent gets
          0  physical reads
          0  redo size
        608  bytes sent via SQL*Net to client
        506  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

More than 110,000 recursive calls. That is bad and not scalable. The time spent in context switches from the SQL to the PL/SQL engine is a waste of CPU cycles.

Note that this is difficult to improve because we cannot create an index for that predicate:


SQL> create index people_full_name on people(f_full_name(first_name,last_name));
create index people_full_name on people(f_full_name(first_name,last_name))
                                        *
ERROR at line 1:
ORA-30553: The function is not deterministic

Yes, this function cannot be deterministic because it depends on many other parameters (like the territory in this example, in order to check if I am in Hungary)

Update 25-FEB-2020

Please, before reading the next part, keep in mind that this example, using NLS_TERRITORY, is not a good idea at all, at least in the current 20c preview. I thought that changing NLS_TERRITORY would re-parse the query, but it seems that cursors are shared across different territories:


In addition to that, the documentation says:
Although the DETERMINISTIC property cannot be specified, a SQL macro is always implicitly deterministic.
So this example was a bit too optimistic (not changing the signature of the function). We should add the 'NLS_TERRITORY' as a parameter here. I'll update the post later.

SQL Macro

The solution in 20c, currently available in the Oracle Cloud, is very easy here. I create a new function, M_FULL_NAME, where the only differences with F_FULL_NAME are:

  1. I add the SQL_MACRO(SCALAR) keyword and change the return type to varchar2 (if not already)
  2. I enclose the return expression value in quotes (using q'[ … ]' for better readability) to return it as a varchar2 containing the expression string, where variable names are just placeholders (no bind variables here!)

create or replace function m_full_name(p_first_name varchar2,p_last_name varchar2)
return varchar2 SQL_MACRO(SCALAR)
as
 territory varchar2(64);
begin
 select value into territory from nls_session_parameters
 where parameter='NLS_TERRITORY';
 case (territory)
 when 'HUNGARY'then return q'[initcap(p_last_name)||' '||initcap(p_first_name)]';
 else               return q'[initcap(p_first_name)||' '||initcap(p_last_name)]';
 end case;
end;
/

Here is the difference if I call both of them:


SQL> set serveroutput on
SQL> exec dbms_output.put_line(f_full_name('AAA','BBB'));
Aaa Bbb

PL/SQL procedure successfully completed.

SQL> exec dbms_output.put_line(m_full_name('AAA','BBB'));
initcap(p_first_name)||' '||initcap(p_last_name)

PL/SQL procedure successfully completed.

SQL> select m_full_name('AAA','BBB') from dual;

M_FULL_
-------
Aaa Bbb

One returns the function value, the other returns the expression that can be used to return the value. It is a SQL Macro that can be applied to a SQL text to replace part of it – a scalar expression in this case as I mentioned SQL_MACRO(SCALAR)

The result is the same as with the previous function:


SQL> select m_full_name(p_first_name=>first_name,p_last_name=>last_name) from people
     where rownum<=10;

M_FULL_NAME(P_FIRST_NAME=>FIRST_NAME,P_LAST_NAME=>LAST_NAME)
-------------------------------------------------------------------------------------------------------
Oraculum Automan
Orangite Autocall
Orangutang Authigenous
Oral Autophobia
Orangutang Autogeneal
Oratorian Autocorrelation
Orangs Autographical
Oratories Autocall
Oraculously Autophoby
Oratrices Autocratical

10 rows selected.

And now let’s look at the query using this as a predicate:


SQL> set timing on autotrace on
SQL> select m_full_name(first_name,last_name) from people
     where m_full_name(p_first_name=>first_name,p_last_name=>last_name) like 'Oracle Autonomous';

M_FULL_NAME(FIRST_NAME,LAST_NAME)
-----------------------------------------------------------------------------------------------------
Oracle Autonomous

Elapsed: 00:00:00.06

Execution Plan
----------------------------------------------------------
Plan hash value: 2528372185

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1103 | 25369 |   122   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| PEOPLE |  1103 | 25369 |   122   (3)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(INITCAP("FIRST_NAME")||' '||INITCAP("LAST_NAME")='Oracle
              Autonomous')


Statistics
----------------------------------------------------------
         40  recursive calls
          4  db block gets
        502  consistent gets
          0  physical reads
          0  redo size
        608  bytes sent via SQL*Net to client
        506  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

I don’t have all those row-by-row recursive calls. And the difference is easy to see in the execution plan predicate sections: there’s no call to my PL/SQL function there. It was called only at parse time to transform the SQL statement: now only using the string returned by the macro, with parameter substitution.
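To see the statement the SQL engine actually receives, DBMS_UTILITY.EXPAND_SQL_TEXT can be used; as far as I can see it also expands SQL macros, but verify it in your release, this is only a sketch:

set serveroutput on
declare
 l_sql clob;
begin
 -- expand the statement as the parser would rewrite it (views and, apparently, SQL macros)
 dbms_utility.expand_sql_text(
  input_sql_text  => q'[select m_full_name(first_name,last_name) from people]',
  output_sql_text => l_sql);
 dbms_output.put_line(l_sql);
end;
/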

That was my goal: stay in SQL engine for the execution, calling only standard SQL functions. But while we are in the execution plan, can we do something to avoid the full table scan? My function is not deterministic but has a small number of variations. Two in my case. Then I can create an index for each one:


 
SQL>
SQL> create index people_full_name_first_last on people(initcap(first_name)||' '||initcap(last_name));
Index created.

SQL> create index people_full_name_first_first on people(initcap(last_name)||' '||initcap(first_name));
Index created.

And run my query again:


SQL> select m_full_name(first_name,last_name) from people
     where m_full_name(p_first_name=>first_name,p_last_name=>last_name) like 'Autonomous Oracle';

no rows selected

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1341595178

------------------------------------------------------------------------------------------------
| Id  | Operation        | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                             |  1103 | 25369 |   118   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| PEOPLE_FULL_NAME_FIRST_LAST |   441 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access(INITCAP("FIRST_NAME")||' '||INITCAP("LAST_NAME")='Autonomous Oracle')

Performance and agility

Now we are ready to bring back the business logic into the database so that it is co-located with data and run within the same process. Thanks to SQL Macros, we can even run it within the same engine, SQL, calling the PL/SQL one only at compile time to resolve the macro. And we keep full code maintainability as the logic is defined in a function that can evolve and be used in many places without duplicating the code.

Cet article Oracle 20c SQL Macros: a scalar example to join agility and performance est apparu en premier sur Blog dbi services.
