Interesting observation about standby redo logs in Data Guard

martin.bach's picture

Some of you might have followed the discussion around the number of standby redo logs on twitter, but since 140 characters are woefully short for the complete story here’s the writeup that prompted the question. This is a test with 12.1.0.2 on virtualised Linux, repeated on a proper platform with physical hardware.

First of all here’s my setup. I have a dbca-based database (CDB, but doesn’t matter) that features 3 groups for its online redo logs. They are all 50 MB in size-important for this test, but not realistic :) Following the Oracle documentation I created n + 1 groups (per thread) on the standby to stop Data Guard broker from complaining about missing standby redo logs (SRL).

The end result was positive, here’s what the broker thinks:

DGMGRL> show configuration

Configuration - test

  Protection Mode: MaxPerformance
  Members:
  CDB1  - Primary database
    STDBY - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 23 seconds ago)

The broker likes to bemoan you in case SRLs are missing on either primary and/or standby. So I’m all good to go! But wait a second, when I’m using one of the cool new 12c features to check the failover readiness, something is odd:

DGMGRL> validate database "STDBY"

  Database Role:     Physical standby database
  Primary Database:  CDB1

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    CDB1:   Off
    STDBY:  Off

  Current Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (CDB1)                  (STDBY)
    1         3                       2                       Insufficient SRLs

  Future Log File Groups Configuration:
    Thread #  Online Redo Log Groups  Standby Redo Log Groups Status
              (STDBY)                 (CDB1)
    1         3                       0                       Insufficient SRLs
    Warning: standby redo logs not configured for thread 1 on CDB1

DGMGRL>

Pardon me? I _do_ have 4 groups of SRLs:

SQL> r
  1* select group#,sequence#,bytes,used,status from v$standby_log

    GROUP#  SEQUENCE#      BYTES       USED STATUS
---------- ---------- ---------- ---------- ----------
         4         27   52428800       2048 ACTIVE
         5         29   52428800      11264 ACTIVE
         6          0   52428800          0 UNASSIGNED
         7          0   52428800          0 UNASSIGNED

And 3 online redo logs:

SQL> select group#,thread#,sequence#,bytes,status from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES STATUS
---------- ---------- ---------- ---------- ----------------
         1          1         31   52428800 CURRENT
         2          1         29   52428800 INACTIVE
         3          1         30   52428800 INACTIVE

But the fun fact remains that out of my 4 standby redo logs, only 2 are ever used. @pioro suggested that it’s because there is no need to use any of the other two because of a lack of redo to be applied and that sounds plausible. To validate this I created a test environment on physical hardware with proper storage because my tests on the lab turned out to suffer from IO problems.

The setup is the same.

I created a couple of sessions that start off by creating a 1,000,000 rows table, then delete from it only to roll back after the delete completed. That’s a lot of redo for 50 MB files (again you wouldn’t use 50MB online redo logs in production-this is just a test). My observation remains: only groups 4 and 5 are used. When the redo generation on the primary gets too far ahead, there will be RFS transfers of the archived redo log. Here are a few queries with their output:

PROCESS   STATUS       CLIENT_P GROUP#                                      THREAD#  SEQUENCE#     BLOCK# ACTIVE_AGENTS
--------- ------------ -------- ---------------------------------------- ---------- ---------- ---------- -------------
ARCH      CLOSING      ARCH     5                                                 1        149      83968             0
ARCH      CONNECTED    ARCH     N/A                                               0          0          0             0
ARCH      CLOSING      ARCH     5                                                 1        147      86016             0
ARCH      CLOSING      ARCH     4                                                 1        148      83968             0
MRP0      APPLYING_LOG N/A      N/A                                               1        147      85766            25
RFS       IDLE         UNKNOWN  N/A                                               0          0          0             0
RFS       IDLE         UNKNOWN  N/A                                               0          0          0             0
RFS       RECEIVING    LGWR     3                                                 1        150      85744             0
RFS       IDLE         ARCH     N/A                                               0          0          0             0

Is this a problem? Not as far as I can tell. And maybe I did something incorrectly on my side too. There wasn’t a problem-my archiver process was quick enough to archive the SRLs and as soon as it fell behind it resorted to fetch archived redo logs from the primary. When it uses the FAL process you see line like this in the alert.log:

2014-09-17 02:34:45.164000 -05:00
RFS[4]: Selected log 4 for thread 1 sequence 211 dbid 788205474 branch 857548261
Media Recovery Log +RECO/STDBY/ARCHIVELOG/2014_09_17/thread_1_seq_209.368.858479683
Archived Log entry 141 added for thread 1 sequence 210 ID 0x2efbcba0 dest 1:
2014-09-17 02:34:46.215000 -05:00

When Real-Time-Apply was active (whenever redo transfer rates dropped), I saw this, alternating between group 4 and 5:

2014-09-17 02:35:33.653000 -05:00
Media Recovery Waiting for thread 1 sequence 230 (in transit)
Recovery of Online Redo Log: Thread 1 Group 5 Seq 230 Reading mem 0
  Mem# 0: +DATA/STDBY/ONLINELOG/group_5.380.858076769
  Mem# 1: +RECO/STDBY/ONLINELOG/group_5.342.858076769

So for some reason yet unknown to me there are only ever 2 SRLs in use.

To prevent automated spam submissions leave this field empty.