

Configuring device mapper multipath on OEL5 update 5

I have always wondered how to configure the device-mapper multipath package for a Linux system. I knew how to do it in principle, but had never been involved in a configuration from start to finish. Today I got the chance to work on this. The system is used for a lab test and is not a production box (otherwise I probably wouldn't have been allowed on it). It's actually part of a two-node cluster.

So the first step is to find out which partitions are visible to the system. The Linux kernel presents this information in the /proc/partitions table, as in the following example:


[root@node1 ~]# cat /proc/partitions
major minor  #blocks  name

 104     0   71652960 cciss/c0d0
 104     1     152586 cciss/c0d0p1
 104     2   71497282 cciss/c0d0p2
 8     0       2880 sda
 8    16  190479360 sdb
 8    32   23809920 sdc
 8    48   23809920 sdd
 8    64   23809920 sde
 8    80   23809920 sdf
 8    96   23809920 sdg
 8   112    1048320 sdh
 8   128    1048320 sdi
 8   144    1048320 sdj
 8   160       2880 sdk
 8   176  190479360 sdl
 8   192   23809920 sdm
 8   208   23809920 sdn
 8   224   23809920 sdo
 8   240   23809920 sdp
 65     0   23809920 sdq
 65    16    1048320 sdr
 65    32    1048320 sds
 65    48    1048320 sdt
 253     0    5111808 dm-0
 253     1   25591808 dm-1
 253     2   10223616 dm-2
 253     3    1015808 dm-3
 253     4   16777216 dm-4
[root@node1 ~]#

With a keen eye you can see that sdk is the same size as sda, which suggests we have two paths to each of sda through sdj; we'll confirm this later. The more HBAs and paths you have, the more entries you are going to see here. This is where the multipathing software comes into play: it abstracts from the physical paths and presents a single logical device, and it offers additional goodies such as path failover and limited load balancing.

Before proceeding I checked the status of the multipath daemon:

[root@node1 ~]# service multipathd status
multipathd is stopped
[root@node1 ~]# chkconfig --list multipathd
multipathd      0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@node1 ~]# chkconfig multipathd on

As you can see it was not started and would not have come up after a reboot either, so it was necessary to enable the service at boot time using the chkconfig command. This automatically creates links in /etc/rc.d/rcX.d to start and stop the service. As an additional benefit, chkconfig respects the dependencies the authors of the startup script have defined and creates the {K,S}xxmultipathd links accordingly.
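
A quick way to double-check this is chkconfig itself: enabling a service switches it on for runlevels 2 to 5 by default, and the corresponding start links appear in the rcX.d directories. The exact link priority depends on the init script's chkconfig header, so your output may differ slightly:

[root@node1 ~]# chkconfig --list multipathd
multipathd      0:off   1:off   2:on    3:on    4:on    5:on    6:off
[root@node1 ~]# ls /etc/rc.d/rc3.d/ | grep multipathd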

I next loaded the necessary modules, dm-multipath and dm-round-robin:

[root@node1 ~]# modprobe dm-multipath
[root@node1 ~]# modprobe dm-round-robin
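
A quick sanity check that the modules are actually loaded; note that the kernel reports the module names with underscores rather than dashes:

[root@node1 ~]# lsmod | egrep 'dm_multipath|dm_round_robin'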

With the modules loaded, I need to get the WWIDs of all attached devices. At some point the WWIDs start repeating: those are additional paths to devices already seen, and this is where you stop counting new devices. Let's have a look at the output of this first. You have to change directory to /sys, as the scsi_id calls below use paths relative to it.

[node1 sys]# for i in `cat /proc/partitions | awk '{print $4}' |grep sd`; do echo "### $i: `scsi_id -g -u -s /block/$i`"; done
### sda: 360000970000294900664533030303238
### sdb: 360000970000294900664533030344133
### sdc: 360000970000294900664533030344142
### sdd: 360000970000294900664533030344143
### sde: 360000970000294900664533030344144
### sdf: 360000970000294900664533030344239
### sdg: 360000970000294900664533030344241
### sdh: 360000970000294900664533030344244
### sdi: 360000970000294900664533030344245
### sdj: 360000970000294900664533030344246
### sdk: 360000970000294900664533030303238
### sdl: 360000970000294900664533030344133
### sdm: 360000970000294900664533030344142
### sdn: 360000970000294900664533030344143
### sdo: 360000970000294900664533030344144
### sdp: 360000970000294900664533030344239
### sdq: 360000970000294900664533030344241
### sdr: 360000970000294900664533030344244
### sds: 360000970000294900664533030344245
### sdt: 360000970000294900664533030344246
[node1 sys]#

Here you see again that sda and sdk have the same WWID. I like to assign alias names to the multipath devices; that makes it easier to find out what they are used for. I now have to get the disk sizes and map them to their intended use.

Getting disk sizes:

[node1 sys]# fdisk -l 2>/dev/null | grep ^Disk
Disk /dev/cciss/c0d0: 73.3 GB, 73372631040 bytes        local
Disk /dev/sda: 2 MB, 2949120 bytes                ignore
Disk /dev/sdb: 195.0 GB, 195050864640 bytes
Disk /dev/sdc: 24.3 GB, 24381358080 bytes
Disk /dev/sdd: 24.3 GB, 24381358080 bytes
Disk /dev/sde: 24.3 GB, 24381358080 bytes
Disk /dev/sdf: 24.3 GB, 24381358080 bytes
Disk /dev/sdg: 24.3 GB, 24381358080 bytes
Disk /dev/sdh: 1073 MB, 1073479680 bytes
Disk /dev/sdi: 1073 MB, 1073479680 bytes
Disk /dev/sdj: 1073 MB, 1073479680 bytes
Disk /dev/sdk: 2 MB, 2949120 bytes                ignore
Disk /dev/sdl: 195.0 GB, 195050864640 bytes
Disk /dev/sdm: 24.3 GB, 24381358080 bytes
Disk /dev/sdn: 24.3 GB, 24381358080 bytes
Disk /dev/sdo: 24.3 GB, 24381358080 bytes
Disk /dev/sdp: 24.3 GB, 24381358080 bytes
Disk /dev/sdq: 24.3 GB, 24381358080 bytes
Disk /dev/sdr: 1073 MB, 1073479680 bytes
Disk /dev/sds: 1073 MB, 1073479680 bytes
Disk /dev/sdt: 1073 MB, 1073479680 bytes

The cleaned-up, consolidated view of the storage:

### sdb: 360000970000294900664533030344133    195G
### sdc: 360000970000294900664533030344142    24.3G
### sdd: 360000970000294900664533030344143    24.3G   
### sde: 360000970000294900664533030344144    24.3G   
### sdf: 360000970000294900664533030344239    24.3G   
### sdg: 360000970000294900664533030344241    24.3G   
### sdh: 360000970000294900664533030344244    1G
### sdi: 360000970000294900664533030344245    1G
### sdj: 360000970000294900664533030344246    1G

### sdl: 360000970000294900664533030344133    repeat - second path
### sdm: 360000970000294900664533030344142
### sdn: 360000970000294900664533030344143
### sdo: 360000970000294900664533030344144
### sdp: 360000970000294900664533030344239
### sdq: 360000970000294900664533030344241
### sdr: 360000970000294900664533030344244
### sds: 360000970000294900664533030344245
### sdt: 360000970000294900664533030344246

Finally here's the mapping I will use:

  • sdb    DATA001
  • sdc    REDO001
  • sdd    FRA001
  • sde    FRA002
  • sdf    ACFS001
  • sdg    ACFS002
  • sdh, sdi, sdj    VOTINGOCR{1,2,3}

The mapping between WWID and alias happens in the /etc/multipath.conf file. The defaults section has been taken from MOS note 555603.1. The devnode_blacklist section has to be set up according to your storage configuration; in my case I ignore IDE devices and the internal RAID adapter.

[root@node1 ~]# cat /etc/multipath.conf
defaults {
 udev_dir                /dev
 polling_interval        10
 selector                "round-robin 0"
 path_grouping_policy    multibus
 getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
 prio_callout            /bin/true
 path_checker            readsector0
 rr_min_io               100
 rr_weight               priorities
 failback                immediate
 no_path_retry           fail
 user_friendly_names     no
}

devnode_blacklist {
 devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
 devnode "^hd[a-z]"
 devnode "^cciss!c[0-9]d[0-9]*"
}

multipaths {
 multipath {
 wwid 360000970000294900664533030344133
 alias data001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344142
 alias redo001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344143
 alias fra001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344144
 alias fra002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344239
 alias acfs001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344241
 alias acfs002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344244
 alias votingocr001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344245
 alias votingocr002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344246
 alias votingocr003
 path_grouping_policy failover
 }
}

The mapping is really simple: for each device you use, create a "multipath" section and enter the WWID, an alias and a path grouping policy. Done! See if that worked by starting the multipath daemon:

[root@node1 ~]# service multipathd start

As always, /var/log/messages is a good place to check:

Nov 16 16:34:58 loninengblc204 kernel: device-mapper: table: 253:5: multipath: error getting device
Nov 16 16:34:58 loninengblc204 kernel: device-mapper: ioctl: error adding target to table
Nov 16 16:34:58 loninengblc204 multipathd: 360000970000294900664533030303238: load table [0 5760 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:160 1000]
Nov 16 16:34:58 loninengblc204 multipathd: data001: load table [0 380958720 multipath 0 0 2 1 round-robin 0 1 1 8:16 1000 round-robin 0 1 1 8:176 1000]
Nov 16 16:34:58 loninengblc204 multipathd: redo001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:192 1000]
Nov 16 16:34:58 loninengblc204 multipathd: fra001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:48 1000 round-robin 0 1 1 8:208 1000]
Nov 16 16:34:58 loninengblc204 multipathd: fra002: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:64 1000 round-robin 0 1 1 8:224 1000]
Nov 16 16:34:58 loninengblc204 multipathd: acfs001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:80 1000 round-robin 0 1 1 8:240 1000]
Nov 16 16:34:58 loninengblc204 multipathd: acfs002: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:96 1000 round-robin 0 1 1 65:0 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr001: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:112 1000 round-robin 0 1 1 65:16 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr002: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:128 1000 round-robin 0 1 1 65:32 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr003: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:144 1000 round-robin 0 1 1 65:48 1000]
Nov 16 16:34:58 loninengblc204 multipathd: 360000970000294900664533030303238: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: data001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: redo001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: fra001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: fra002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: acfs001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: acfs002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr003: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: path checkers start u

Great - are all paths working?

[root@node1 ~]# multipath -ll | head
fra002 (360000970000294900664533030344144) dm-9 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:4 sde 8:64  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:4 sdo 8:224 [active][ready]
fra001 (360000970000294900664533030344143) dm-8 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:3 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:3 sdn 8:208 [active][ready]
acfs002 (360000970000294900664533030344241) dm-11 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:6 sdg 8:96  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:6 sdq 65:0  [active][ready]

Congratulations! Distribute the working multipath.conf to all cluster nodes and start multipathd there as well.
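
For a two-node cluster like this one, something along these lines does the trick; the second node is called node2 here purely for illustration, so adjust it to your host names:

[root@node1 ~]# scp /etc/multipath.conf node2:/etc/multipath.conf
[root@node1 ~]# ssh node2 'chkconfig multipathd on; modprobe dm-multipath; modprobe dm-round-robin; service multipathd start'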

The beauty of this over a solution such as PowerPath is that the device names are consistent across the cluster. With PowerPath I have come across a situation where /dev/rdsk/emcpower1a on node1 was /dev/rdsk/emcpower4a on node2 and again a different device on the other nodes. Not really user friendly, but not a big issue with ASM either: it will read the information from the disk headers anyway. It was more of a problem before 11.2, when you had to use block devices to store the OCR and voting files.


Oracle RAC One Node revisited – 11.2.0.2

Since we published the RAC book, Oracle has released patchset 11.2.0.2. Amongst other things, this improved the RAC One Node option, exactly the way we expected.

How it was

A quick recap on the product as it was in 11.2.0.1: RAC One Node is part of Oracle Enterprise Edition; other software editions are explicitly not allowed, and neither is third-party clusterware. RAC One Node is a hybrid between full-blown RAC and an active/passive cluster. The option uses Grid Infrastructure for cluster management and storage provisioning via ASM. The RAC One instance starts its life as a RAC database limited to a single cluster node: it only ever runs on one node, but that node can change. It is strongly recommended to create a service for that database. The raconeinit utility provides a text-based command line interface to transform the database into a "RAC One Node" instance; in the process, the administrator can elect which nodes should be allowed to run it. The Omotion utility allowed the DBA to move the RAC One Node instance from the current node to another one. Optionally a time threshold could be set, after which all ongoing transactions were to move to the new node; this feature required TAF or FAN to be set up correctly. The raconestatus utility allowed you to view the status of your RAC One Node instances, and conversion to full RAC was made possible by the racone2rac utility.

If you were after a Data Guard setup you’d be disappointed: that wasn’t (and AFAIK still is not) supported.

So all in all, that seemed a little premature. A patch to be downloaded and applied, no Data Guard support and a new set of utilities are not really user friendly. Plus, initially the patch was available for Linux only. But at least a MOS note exists (which I didn't find until after having finished writing this!): RAC One — Changes in 11.2.0.2 [ID 1232802.1].

Changes

Instead of having to apply patch 9004119 to your environment, RAC One Node is available "out of the box" with 11.2.0.2. Sadly, the Oracle RAC One Node manual has not been updated, and searches on Metalink reveal no new information. One interesting piece of information: the patch for RAC One Node is listed under the "undocumented Oracle Server" section.

The creation of a RAC One Node database has been greatly simplified: dbca now supports it, both on the command line for silent installations and in the interactive GUI. Consider these options for dbca:

$ dbca -help
dbca  [-silent | -progressOnly | -customCreate] {  }  |
 { [ [options] ] -responseFile   }
 [-continueOnNonFatalErrors ]
Please refer to the manual for details.
You can enter one of the following command:

Create a database by specifying the following parameters:
-createDatabase
 -templateName 
 [-cloneTemplate]
 -gdbName 
 [-RACOneNode
 -RACOneNodeServiceName  ]
 [-policyManaged | -adminManaged ]
 [-createServerPool ]
 [-force ]
 -serverPoolName 
 -[cardinality ]
 [-sid ]
...
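
Going by the options listed above, a silent creation of a policy-managed RAC One Node database might look roughly like this. The template, database name, server pool and service name are made up for illustration, and the storage and password options are omitted:

$ dbca -silent -createDatabase \
    -templateName General_Purpose.dbc \
    -gdbName rontest \
    -sid rontest \
    -RACOneNode \
    -RACOneNodeServiceName rontestsrv \
    -policyManaged \
    -createServerPool \
    -serverPoolName rontest \
    -cardinality 1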

With RAC One Node you will most likely end up with a policy-managed database in the end; I can't see how an admin-managed database would make sense.

The srvctl command line tool has been improved to deal with RAC One Node. The most important operations are add, remove, config and status. The nice thing about dbca is that it actually registers the database in the OCR. Immediately after the installation, you see this status information:

$ srvctl status database -d rontest
Instance rontest_1 is running on node node2
Online relocation: INACTIVE

$ srvctl config database -d rontest
Database unique name: rontest
Database name:
Oracle home: /data/oracle/product/11.2.0.2
Oracle user: oracle
Spfile: +DATA/rontest/spfilerontest.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: rontest
Database instances:
Disk Groups: DATA
Mount point paths:
Services: rontestsrv
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: rontest
Candidate servers: node2,node3
Database is administrator managed

Note that the instance name, although the database is administrator managed, changed to ${ORACLE_SID}_1. Relocating now works with the srvctl relocate database command, as in this example:

$ srvctl relocate database -d rontest -n node2

You’ll get feedback about this in the output of the “status” command:

$ srvctl status database -d rontest
Instance rontest_1 is running on node node2
Online relocation: ACTIVE
Source instance: rontest_1 on node2
Destination instance: rontest_2 on node3

After the command completed, check the status again:

$ srvctl status database -d rontest
Instance rontest_2 is running on node node2
Online relocation: INACTIVE

The important difference between an admin-managed and a policy-managed database is that with an admin-managed database you are responsible for the undo tablespaces. If you don't create and configure them, the relocate command will fail:

$ srvctl relocate database -d rontest -n node3
PRCD-1222 : Online relocation of database rontest failed but database was restored to its original state
PRCD-1129 : Failed to start instance rontest_2 for database rontest
PRCR-1064 : Failed to start resource ora.rontest.db on node node3
CRS-5017: The resource action "ora.rontest.db start" encountered the following error:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-30013: undo tablespace 'UNDOTBS1' is currently in use
Process ID: 1587
Session ID: 35 Serial number: 1

CRS-2674: Start of 'ora.rontest.db' on 'node3' failed

In this case the database keeps running on the original node. Check the ORACLE_SID of the instance that failed to start (rontest_2 in my case) and fix the initialisation parameter.

SQL> select tablespace_name from dba_data_files where tablespace_name like '%UNDO%';

TABLESPACE_NAME
------------------------------
UNDOTBS1
UNDOTBS2

So the tablespace was there, but the initialisation parameter was wrong! Let’s correct this:

SQL> alter system set undo_tablespace='UNDOTBS1' sid='rontest_1';

System altered.

SQL> alter system set undo_tablespace='UNDOTBS2' sid='rontest_2';

System altered.

Now the relocate will succeed.

To wrap this article up, the srvctl convert database command will convert between single instance, RAC One Node and RAC databases.
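
As a hedged example (double-check the exact options with "srvctl convert database -h" on your release), converting the test database to a full RAC database, and back again, should look something like this:

$ srvctl convert database -d rontest -c RAC
$ srvctl convert database -d rontest -c RACONENODE -i rontest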


RHEL 6 Released…

So RHEL 6 was released yesterday. Excuse my lack of interest, but I don’t think I will be bothering with it until Oracle Enterprise Linux 6 is available on eDelivery. Then I will no doubt start a frenzy of installations and testing. :)

Cheers

Tim…


Fedora 14 on my desktop…

I wrote a post a few days ago about Fedora 14. Over the weekend I could resist no longer and switched to Fedora 14 as my desktop OS. Prior to this I had been using CentOS 5 for ages.

Now remember, I do almost everything in VMs, so all my Oracle stuff is still on OEL5 x86-64. This is just the desktop I use to run VirtualBox and a browser.

So far so good. The installation went fine and VirtualBox is behaving itself OK, so all my VMs are running with no problems. For the most part it all feels very similar to CentOS 5, but because all the underlying pieces are up to date I get to run a few extra things, like Chrome as my browser, Shutter for image capture and a newer version of Gimp.

I think Ubuntu is a more natural desktop than Fedora, but I’ve been using Red Hat versions of Linux for years, so I just feel a little happier on them. Fingers crossed this will work out OK.

Cheers

Tim…


Build your own stretch cluster part V

This post is about the installation of Grid Infrastructure, and this is where it gets really exciting: the third, NFS-based voting disk is going to be presented, and I am going to show you how simple it is to add it to the disk group chosen for OCR and voting disks.

Let’s start with the installation of Grid Infrastructure. This is really simple, and I won’t go into too much detail. Start by downloading the required file from MOS: a simple search for patch 10098816 should bring you to the 11.2.0.2 download for Linux; just make sure you select the 64-bit version. The file we need for now is called p10098816_112020_Linux-x86-64_3of7.zip. The file names don’t necessarily relate to their contents; the readme helps in finding out which piece of the puzzle is used for which functionality.

I alluded to my software distribution method in one of the earlier posts, so here is a little more detail. My dom0 exports the /m directory to the 192.168.99.0/24 network, the one accessible to all my domUs. This really simplifies software deployments.
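
For reference, the export on the dom0 and the corresponding client-side mount look something like this in my lab. The exact export options are not critical, the dom0 address (192.168.99.10, which also serves DNS) is specific to my setup, and /mnt as the mount point matches what I used elsewhere in this series:

# /etc/exports on the dom0
/m      192.168.99.0/24(ro,sync,no_root_squash)

# on each domU, as root
[root@edcnode1 ~]# mount -t nfs 192.168.99.10:/m /mnt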

So starting off, the file has been unzipped:

openSUSE-112-64-minimal:/m/download/db11.2/11.2.0.2 # unzip -q p10098816_112020_Linux-x86-64_3of7.zip

This creates the subdirectory “grid”. Switch back to edcnode1 and log in as oracle. As I already explained I won’t use different accounts for Grid Infrastructure and the RDBMS in this example.

If you have not already done so, mount the /m directory on the domU (this requires root privileges). Move to the newly unzipped “grid” directory under your mount point and begin to set up user equivalence. On edcnode1 and edcnode2, create RSA and DSA keys for SSH:

[oracle@edcnode1 ~]$ ssh-keygen -t rsa

All prompts can be answered with the return key; it’s important to leave the passphrase empty. Repeat the call to ssh-keygen with the argument “-t dsa”. Then navigate to ~/.ssh and create the authorized_keys file as follows:

[oracle@edcnode1 .ssh]$ cat *.pub >> authorized_keys

Then copy the authorized_keys file to edcnode2 and add the public keys:

[oracle@edcnode1 .ssh]$ scp authorized_keys oracle@edcnode2:`pwd`
[oracle@edcnode1 .ssh]$ ssh oracle@edcnode2

If you are prompted, add the host to the ~/.ssh/known_hosts file by typing in “yes”.

[oracle@edcnode2 .ssh]$ cat *.pub >> authorized_keys

Change the permissions on the authorized_keys file to 0400 on both hosts, otherwise it won’t be considered when trying to log in. With all of this done, you can add all the unknown hosts to each node’s known_hosts file. The easiest way is a for loop:

[oracle@edcnode1 ~]$ for i in edcnode1 edcnode2 edcnode1-priv edcnode2-priv; do ssh $i hostname; done

Run this twice on each node, answering “yes” when asked whether the new address should be added. Important: ensure that there is no banner (/etc/motd, .profile, .bash_profile etc.) writing to stdout or stderr, or you are going to see strange error messages about user equivalence not being set up correctly.

I hear you say: but 11.2 can set up user equivalence in OUI now. That is of course correct, but I wanted to run cluvfy at this point, which requires a working setup.

Cluster Verification

It is good practice to run a check to see if the prerequisites for the Grid Infrastructure installation are met, and keep the output. Change to the NFS mount where the grid directory is exported, and execute runcluvfy.sh as in this example:

[oracle@edcnode1 grid]$ ./runcluvfy.sh stage -pre crsinst -n edcnode1,edcnode2 -verbose -fixup 2>&1 | tee /tmp/preCRS.tx

The nice thing is that you can run the fixup script now to fix kernel parameter settings:

[root@edcnode2 ~]# /tmp/CVU_11.2.0.2.0_oracle/runfixup.sh
/usr/bin/id
Response file being used is :/tmp/CVU_11.2.0.2.0_oracle/fixup.response
Enable file being used is :/tmp/CVU_11.2.0.2.0_oracle/fixup.enable
Log file location: /tmp/CVU_11.2.0.2.0_oracle/orarun.log
Setting Kernel Parameters...
fs.file-max = 327679
fs.file-max = 6815744
net.ipv4.ip_local_port_range = 9000 65500
net.core.wmem_max = 262144
net.core.wmem_max = 1048576

Repeat this on the second node, edcnode2. Obviously you should fix any other problem cluvfy reports before proceeding.

In the previous post I created the /u01 mount point. Double-check that /u01 is actually mounted, otherwise you’d end up writing to your root_vg’s root_lv, which is not an ideal situation.
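
A quick check like this avoids that surprise; if the grep returns nothing, /u01 is not mounted:

[root@edcnode1 ~]# mount | grep u01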

You are now ready to start the installer: type in ./runInstaller to start the installation.

Grid Installation

This is rather mundane, and instead of providing screenshots I opted for a description of the steps to execute in the OUI session.

  • Screen 01: Skip software updates (I don’t have an Internet connection on my lab)
  • Screen 02: Install and configure Grid Infrastructure for a cluster
  • Screen 03: Advanced Installation
  • Screen 04: Keep defaults or add additional languages
  • Screen 05: Cluster Name: edc, SCAN name edc-scan, SCAN port: 1521, do not configure GNS
  • Screen 06: Ensure that both hosts are listed in this screen; add/edit as appropriate. Hostnames are edcnode{1,2}.localdomain, VIPs are to be edcnode{1,2}-vip.localdomain. Enter the oracle user’s password and click on next
  • Screen 07: Assign eth0 to public, eth1 to private and eth2 to “do not use”.
  • Screen 08: Select ASM
  • Screen 09: Disk group name: OCRVOTE with NORMAL redundancy. Tick the boxes for "ORCL:OCR01FILER01", "ORCL:OCR01FILER02" and "ORCL:OCR02FILER01"
  • Screen 10: Choose suitable passwords for SYS and ASMSNMP
  • Screen 11: Don’t use IPMI
  • Screen 12: Assign DBA to OSDBA, OSOPER and OSASM. Again, in the real world you should think about role separation and assign different groups
  • Screen 13: ORACLE_BASE: /u01/app/oracle, Software location: /u01/app/11.2.0/grid
  • Screen 14: Oracle inventory: /u01/app/oraInventory
  • Screen 15: Ignore all; there should only be references to swap, cvuqdisk, ASM device checks and NTP. If you have additional warnings, fix them first!
  • Screen 16: Click on install!

The usual installation will now take place. At the end, run the root.sh script on edcnode1 and after it completes, on edcnode2. The output is included here for completeness:

[root@edcnode1 u01]# /u01/app/11.2.0/grid/root.sh 2>&1 | tee /tmp/root.sh.out
Running Oracle 11g root script...

The following environment variables are set as:
 ORACLE_OWNER= oracle
 ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
 Copying dbhome to /usr/local/bin ...
 Copying oraenv to /usr/local/bin ...
 Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
 root wallet
 root wallet cert
 root cert export
 peer wallet
 profile reader wallet
 pa wallet
 peer wallet keys
 pa wallet keys
 peer cert request
 pa cert request
 peer cert
 pa cert
 peer root cert TP
 profile reader root cert TP
 pa root cert TP
 peer pa cert TP
 pa peer cert TP
 profile reader pa cert TP
 profile reader peer cert TP
 peer user cert
 pa user cert
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-2672: Attempting to start 'ora.mdnsd' on 'edcnode1'
CRS-2676: Start of 'ora.mdnsd' on 'edcnode1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'edcnode1'
CRS-2676: Start of 'ora.gpnpd' on 'edcnode1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'edcnode1'
CRS-2672: Attempting to start 'ora.gipcd' on 'edcnode1'
CRS-2676: Start of 'ora.gipcd' on 'edcnode1' succeeded
CRS-2676: Start of 'ora.cssdmonitor' on 'edcnode1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'edcnode1'
CRS-2672: Attempting to start 'ora.diskmon' on 'edcnode1'
CRS-2676: Start of 'ora.diskmon' on 'edcnode1' succeeded
CRS-2676: Start of 'ora.cssd' on 'edcnode1' succeeded

ASM created and started successfully.

Disk Group OCRVOTE created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4256: Updating the profile
Successful addition of voting disk 38f2caf7530c4f67bfe23bb170ed2bfe.
Successful addition of voting disk 9aee80ad14044f22bf6211b81fe6363e.
Successful addition of voting disk 29fde7c3919b4fd6bf626caf4777edaa.
Successfully replaced voting disk group with +OCRVOTE.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   38f2caf7530c4f67bfe23bb170ed2bfe (ORCL:OCR01FILER01) [OCRVOTE]
 2. ONLINE   9aee80ad14044f22bf6211b81fe6363e (ORCL:OCR01FILER02) [OCRVOTE]
 3. ONLINE   29fde7c3919b4fd6bf626caf4777edaa (ORCL:OCR02FILER01) [OCRVOTE]
Located 3 voting disk(s).
CRS-2672: Attempting to start 'ora.asm' on 'edcnode1'
CRS-2676: Start of 'ora.asm' on 'edcnode1' succeeded
CRS-2672: Attempting to start 'ora.OCRVOTE.dg' on 'edcnode1'
CRS-2676: Start of 'ora.OCRVOTE.dg' on 'edcnode1' succeeded
ACFS-9200: Supported
ACFS-9200: Supported
CRS-2672: Attempting to start 'ora.registry.acfs' on 'edcnode1'
CRS-2676: Start of 'ora.registry.acfs' on 'edcnode1' succeeded
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

[root@edcnode2 ~]# /u01/app/11.2.0/grid/root.sh 2>&1 | tee /tmp/rootsh.out
Running Oracle 11g root script...

The following environment variables are set as:
 ORACLE_OWNER= oracle
 ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
 Copying dbhome to /usr/local/bin ...
 Copying oraenv to /usr/local/bin ...
 Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node edcnode1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
[root@edcnode2 ~]#

Congratulations! You have a working setup! Check if everything is ok:

[root@edcnode2 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.OCRVOTE.dg
 ONLINE  ONLINE       edcnode1
 ONLINE  ONLINE       edcnode2
ora.asm
 ONLINE  ONLINE       edcnode1                 Started
 ONLINE  ONLINE       edcnode2
ora.gsd
 OFFLINE OFFLINE      edcnode1
 OFFLINE OFFLINE      edcnode2
ora.net1.network
 ONLINE  ONLINE       edcnode1
 ONLINE  ONLINE       edcnode2
ora.ons
 ONLINE  ONLINE       edcnode1
 ONLINE  ONLINE       edcnode2
ora.registry.acfs
 ONLINE  ONLINE       edcnode1
 ONLINE  ONLINE       edcnode2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
 1        ONLINE  ONLINE       edcnode2
ora.LISTENER_SCAN2.lsnr
 1        ONLINE  ONLINE       edcnode1
ora.LISTENER_SCAN3.lsnr
 1        ONLINE  ONLINE       edcnode1
ora.cvu
 1        ONLINE  ONLINE       edcnode1
ora.edcnode1.vip
 1        ONLINE  ONLINE       edcnode1
ora.edcnode2.vip
 1        ONLINE  ONLINE       edcnode2
ora.oc4j
 1        ONLINE  ONLINE       edcnode1
ora.scan1.vip
 1        ONLINE  ONLINE       edcnode2
ora.scan2.vip
 1        ONLINE  ONLINE       edcnode1
ora.scan3.vip
 1        ONLINE  ONLINE       edcnode1
[root@edcnode2 ~]#

[root@edcnode1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   38f2caf7530c4f67bfe23bb170ed2bfe (ORCL:OCR01FILER01) [OCRVOTE]
 2. ONLINE   9aee80ad14044f22bf6211b81fe6363e (ORCL:OCR01FILER02) [OCRVOTE]
 3. ONLINE   29fde7c3919b4fd6bf626caf4777edaa (ORCL:OCR02FILER01) [OCRVOTE]
Located 3 voting disk(s).

Adding the NFS voting disk

It’s about time to deal with this subject. If you have not done so already, start the domU “filer03”. Log in as openfiler and ensure that the NFS server is started; on the services tab, click on enable next to the NFS server if needed. Next navigate to the shares tab, where you should find the volume group and logical volume created earlier. The volume group I created is called “ocrvotenfs_vg”, and it has one logical volume, “nfsvol_lv”. Click on the name of the LV to create a new share. I named the new share “ocrvote”: enter this in the popup window and click on “create sub folder”.

The new share should now appear underneath nfsvol_lv. Proceed by clicking on “ocrvote” to set the share’s properties. Before you get to enter these, click on “make share”. Scroll down to the host access configuration section in the following screen. In this section you can configure all sorts of access technologies: SMB, NFS, WebDAV, FTP and RSYNC. For this example, everything but NFS should be set to “NO”.

For NFS the story is different: ensure you set the radio button to “RW” for both hosts, then click on Edit for each machine. This is important! The anonymous UID and GID must match the grid owner’s uid and gid. In my scenario I entered “500” for both; you can check your settings using the id command as oracle, which prints the UID and GID plus other information.

The UID/GID mapping then has to be set to all_squash, the IO mode to sync, and the write delay to wdelay. Leave the default for “requesting origin port”, which was set to “secure < 1024” in my configuration.
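
On the filer you can cross-check what Openfiler actually wrote for the export with exportfs. The output should contain an entry roughly along these lines; the exact client list and option order depend on your host access configuration, so treat this as illustrative only:

[root@filer03 ~]# exportfs -v
/mnt/ocrvotenfs_vg/nfsvol_lv/ocrvote
                192.168.101.0/24(rw,wdelay,all_squash,anonuid=500,anongid=500)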

I decided to create /ocrvote on both nodes to mount the NFS export:

[root@edcnode2 ~]# mkdir /ocrvote

Edit the /etc/fstab file to make the mount persistent across reboots. I added this line to the file on both nodes:

192.168.101.52:/mnt/ocrvotenfs_vg/nfsvol_lv/ocrvote /ocrvote nfs rw,bg,hard,intr,rsize=32768,wsize=32768,tcp,noac,nfsvers=3,timeo=600,addr=192.168.101.51

The “addr” option instructs Linux to use the storage network when mounting the share. Now you are ready to mount the export on all nodes, using the “mount /ocrvote” command.

I then changed the ownership of the shared directory on the filer to the uid/gid combination of the oracle account (or, on an installation with a separate grid software owner, to its uid/gid combination):

[root@filer03 ~]# cd /mnt/ocrvotenfs_vg/nfsvol_lv/
[root@filer03 nfsvol_lv]# ls -l
total 44
-rw-------  1 root    root     6144 Sep 24 15:38 aquota.group
-rw-------  1 root    root     6144 Sep 24 15:38 aquota.user
drwxrwxrwx  2 root    root     4096 Sep 24 15:26 homes
drwx------  2 root    root    16384 Sep 24 15:26 lost+found
drwxrwsrwx  2 ofguest ofguest  4096 Sep 24 15:31 ocrvote
-rw-r--r--  1 root    root      974 Sep 24 15:45 ocrvote.info.xml
[root@filer03 nfsvol_lv]# chown 500:500 ocrvote
[root@filer03 nfsvol_lv]# ls -l
total 44
-rw-------  1 root root  7168 Sep 24 16:09 aquota.group
-rw-------  1 root root  7168 Sep 24 16:09 aquota.user
drwxrwxrwx  2 root root  4096 Sep 24 15:26 homes
drwx------  2 root root 16384 Sep 24 15:26 lost+found
drwxrwsrwx  2  500  500  4096 Sep 24 15:31 ocrvote
-rw-r--r--  1 root root   974 Sep 24 15:45 ocrvote.info.xml
[root@filer03 nfsvol_lv]#

ASM requires zero-padded files as “disks”, so create one:

[root@filer03 nfsvol_lv]# dd if=/dev/zero of=ocrvote/nfsvotedisk01 bs=1G count=2
[root@filer03 nfsvol_lv]# chown 500:500 ocrvote/nfsvotedisk01

Add the third voting disk

Almost there! Before performing any change to the cluster configuration it is always a good idea to take a backup.

[root@edcnode1 ~]# ocrconfig -manualbackup

edcnode1     2010/09/24 17:11:51     /u01/app/11.2.0/grid/cdata/edc/backup_20100924_171151.ocr

You only need to do this on one node. Recall that the current state is:

[oracle@edcnode1 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   38f2caf7530c4f67bfe23bb170ed2bfe (ORCL:OCR01FILER01) [OCRVOTE]
 2. ONLINE   9aee80ad14044f22bf6211b81fe6363e (ORCL:OCR01FILER02) [OCRVOTE]
 3. ONLINE   29fde7c3919b4fd6bf626caf4777edaa (ORCL:OCR02FILER01) [OCRVOTE]
Located 3 voting disk(s).

ASM sees it the same way:

SQL> select mount_status,header_status, name,failgroup,library
 2  from v$asm_disk
 3  /

MOUNT_S HEADER_STATU NAME                           FAILGROUP       LIBRARY
------- ------------ ------------------------------ --------------- ------------------------------------------------------------
CLOSED  PROVISIONED                                                 ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
CLOSED  PROVISIONED                                                 ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
CLOSED  PROVISIONED                                                 ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
CLOSED  PROVISIONED                                                 ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
CACHED  MEMBER       OCR01FILER01                   OCR01FILER01    ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
CACHED  MEMBER       OCR01FILER02                   OCR01FILER02    ASM Library - Generic Linux, version 2.0.4 (KABI_V2)
CACHED  MEMBER       OCR02FILER01                   OCR02FILER01    ASM Library - Generic Linux, version 2.0.4 (KABI_V2)

7 rows selected.

Now here’s the idea: you add the NFS location to the ASM disk string in addition to “ORCL:*” and all is well. But that didn’t work:

SQL> show parameter disk  

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string
asm_diskstring                       string      ORCL:*
SQL> 

SQL> alter system set asm_diskstring = 'ORCL:*, /ocrvote/nfsvotedisk01' scope=memory sid='*';
alter system set asm_diskstring = 'ORCL:*, /ocrvote/nfsvotedisk01' scope=memory sid='*'
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-15014: path 'ORCL:OCR01FILER01' is not in the discovery set

Regardless of what I tried, the system complained. Grudgingly I used the GUI – asmca.

After starting asmca, click on Disk Groups, then select disk group “OCRVOTE” and right-click to “add disks”. The trick is to click on “change discovery path”. Enter “ORCL:*, /ocrvote/nfsvotedisk01” (without quotes) into the dialog field and close it. Strangely, the NFS disk now appears. Tick two boxes: the one before the disk path, and the quorum box. A click on the OK button starts the magic, and you should be presented with a success message. The ASM instance reports a little more:

ALTER SYSTEM SET asm_diskstring='ORCL:*','/ocrvote/nfsvotedisk01' SCOPE=BOTH SID='*';
2010-09-29 10:54:52.557000 +01:00
SQL> ALTER DISKGROUP OCRVOTE ADD  QUORUM DISK '/ocrvote/nfsvotedisk01' SIZE 500M /* ASMCA */
NOTE: Assigning number (1,3) to disk (/ocrvote/nfsvotedisk01)
NOTE: requesting all-instance membership refresh for group=1
2010-09-29 10:54:54.445000 +01:00
NOTE: initializing header on grp 1 disk OCRVOTE_0003
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0xd032bc02 (OCRVOTE) on local instance.
2010-09-29 10:54:57.154000 +01:00
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0xd032bc02 (OCRVOTE) on local instance.
2010-09-29 10:55:00.718000 +01:00
GMON updating for reconfiguration, group 1 at 5 for pid 27, osid 15253
NOTE: group 1 PST updated.
NOTE: initiating PST update: grp = 1
GMON updating group 1 at 6 for pid 27, osid 15253
2010-09-29 10:55:02.896000 +01:00
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xd032bc02 (OCRVOTE)
2010-09-29 10:55:05.285000 +01:00
GMON querying group 1 at 7 for pid 18, osid 4247
NOTE: cache opening disk 3 of grp 1: OCRVOTE_0003 path:/ocrvote/nfsvotedisk01
GMON querying group 1 at 8 for pid 18, osid 4247
SUCCESS: refreshed membership for 1/0xd032bc02 (OCRVOTE)
2010-09-29 10:55:06.528000 +01:00
SUCCESS: ALTER DISKGROUP OCRVOTE ADD  QUORUM DISK '/ocrvote/nfsvotedisk01' SIZE 500M /* ASMCA */
2010-09-29 10:55:08.656000 +01:00
NOTE: Attempting voting file refresh on diskgroup OCRVOTE
NOTE: Voting file relocation is required in diskgroup OCRVOTE
NOTE: Attempting voting file relocation on diskgroup OCRVOTE
NOTE: voting file allocation on grp 1 disk OCRVOTE_0003
2010-09-29 10:55:10.047000 +01:00
NOTE: voting file deletion on grp 1 disk OCR02FILER01
NOTE: starting rebalance of group 1/0xd032bc02 (OCRVOTE) at power 1
Starting background process ARB0
ARB0 started with pid=29, OS id=15446
NOTE: assigning ARB0 to group 1/0xd032bc02 (OCRVOTE) with 1 parallel I/O
2010-09-29 10:55:13.178000 +01:00
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
2010-09-29 10:55:15.533000 +01:00
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0xd032bc02 (OCRVOTE)
GMON updating for reconfiguration, group 1 at 9 for pid 31, osid 15451
NOTE: group 1 PST updated.
2010-09-29 10:55:17.907000 +01:00
NOTE: membership refresh pending for group 1/0xd032bc02 (OCRVOTE)
2010-09-29 10:55:20.481000 +01:00
GMON querying group 1 at 10 for pid 18, osid 4247
SUCCESS: refreshed membership for 1/0xd032bc02 (OCRVOTE)
2010-09-29 10:55:23.490000 +01:00
NOTE: Attempting voting file refresh on diskgroup OCRVOTE
NOTE: Voting file relocation is required in diskgroup OCRVOTE
NOTE: Attempting voting file relocation on diskgroup OCRVOTE

Superb! But did it kick out the correct disk? Yes it did: you now see OCR01FILER01 and OCR01FILER02 plus the NFS disk:

[oracle@edcnode1 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   38f2caf7530c4f67bfe23bb170ed2bfe (ORCL:OCR01FILER01) [OCRVOTE]
 2. ONLINE   9aee80ad14044f22bf6211b81fe6363e (ORCL:OCR01FILER02) [OCRVOTE]
 3. ONLINE   6107050ad9ba4fd1bfebdf3a029c48be (/ocrvote/nfsvotedisk01) [OCRVOTE]
Located 3 voting disk(s).

Preferred Mirror Read

One of the cool features introduced in 11.1 allows administrators of stretched RAC systems to instruct each ASM instance to read mirrored extents rather than primary extents. This can speed up data access in cases where data would otherwise have to be sent from the remote array, so setting this parameter is crucial to many implementations. In preparation for the RDBMS installation (to be detailed in the next post), I created a disk group consisting of four ASM disks, two from each filer. The syntax for the disk group creation is as follows:

SQL> create diskgroup data normal redundancy
  2  failgroup sitea disk 'ORCL:ASM01FILER01','ORCL:ASM01FILER02'
  3* failgroup siteb disk 'ORCL:ASM02FILER01','ORCL:ASM02FILER02'
SQL> /

Diskgroup created.

As you can see, the disks in failure group sitea all come from filer01, while the disks originating from filer02 form the second failure group, siteb.

You can see the result in v$asm_disk, as this example shows:

SQL> select name,failgroup from v$asm_disk;

NAME                           FAILGROUP
------------------------------ ------------------------------
ASM01FILER01                   SITEA
ASM01FILER02                   SITEA
ASM02FILER01                   SITEB
ASM02FILER02                   SITEB
OCR01FILER01                   OCR01FILER01
OCR01FILER02                   OCR01FILER02
OCR02FILER01                   OCR02FILER01
OCRVOTE_0003                   OCRVOTE_0003

8 rows selected.

Now all that remains to be done is to instruct the ASM instances to read from the local storage if possible. This is performed by setting an instance-specific init.ora parameter. I used the following syntax:

SQL> alter system set asm_preferred_read_failure_groups='DATA.SITEB' scope=both sid='+ASM2';

System altered.

SQL> alter system set asm_preferred_read_failure_groups='DATA.SITEA' scope=both sid='+ASM1';

System altered.
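
To verify that the setting took effect, v$asm_disk exposes a PREFERRED_READ flag in 11g. On instance +ASM1, which prefers SITEA, I would expect output along these lines:

SQL> select name, failgroup, preferred_read from v$asm_disk where failgroup like 'SITE%';

NAME                           FAILGROUP                      P
------------------------------ ------------------------------ -
ASM01FILER01                   SITEA                          Y
ASM01FILER02                   SITEA                          Y
ASM02FILER01                   SITEB                          N
ASM02FILER02                   SITEB                          N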

So I’m all set for the next step, the installation of the RDBMS software. But that’s for another post…


Kindle version of Pro Oracle Database 11g RAC on Linux

I had a few questions from readers whether or not there was going to be a kindle version of Pro Oracle Database 11g RAC on Linux.

The good news for those waiting is: yes! But it might take a couple of weeks for it to be released.

I checked with Jonathan Gennick who expertly oversaw the whole project and he confirmed that Amazon have been contacted to provide a kindle version.

As soon as I hear more, I’ll post it here.


Fedora 14…

Fedora 14 is here and so are the obligatory articles.

My attitude to Fedora and Ubuntu has changed today, with most of that shift due to VirtualBox.

Before I switched to VirtualBox I was always reliant on my OS being able to run VMware Server. Over the years I had repeatedly encountered problems running VMware Server on Ubuntu and Fedora. Not all of them were show-stoppers, but enough to put me off them as my main desktop OS. Why did I stick with VMware Server? Just because it supported shared virtual disks, which allowed me to easily create virtual RAC installations. Version 3.2.8 of VirtualBox included support for shared disks for the first time, so I ditched VMware Server and launched full-scale into using VirtualBox.

While I was playing around with Fedora 14 I was thinking how cool it would be to have a newer OS on my desktop that could run Google Chrome, then it dawned on me that now I can. I’ve been free of VMware Server for a while now and I hadn’t realized the knock-on effect of that.

My years of using RHEL mean I feel a little more comfortable with Fedora than Ubuntu, but to be honest all I do on a desktop is fire up VirtualBox, use a browser (preferably Chrome) and use a terminal for SSH. Virtually everything else is done in VMs.

Now, do I waste a few days assessing the various options for my desktop, or do I just stick with CentOS and deal with the fact I can’t use Chrome on it? :)

Cheers

Tim…


UltraEdit on Linux and Mac…

When I was a Windows user, one tool I felt I couldn’t live without was UltraEdit. It’s awesome.

A few months ago I checked the UltraEdit website and saw a Linux version of the editor was available. Unfortunately, it only had a subset of the functionality found in the Windows version. I checked again yesterday, and the Linux version is still lagging behind, but it’s a bit better than it was. I wrote to the company (IDM Computer Solutions) to ask when/if some of the functionality I require would be coming and it looks like the next release (start of next year) will include everything I need for my day-to-day use. What’s more, towards the end of this year there should be a Mac version available. Joy!

There are of course alternatives out there, but I really like UltraEdit and I’m happy to pay for a lifetime-updates license on each platform (I already have a Windows one) if I have to. I’m keeping my fingers crossed for a nice Christmas present from IDM Computer Solutions. :)

Cheers

Tim…


Build your own 11.2.0.2 stretched RAC part IV

Finally I have some more time to work on the next article in this series, dealing with the setup of my two cluster nodes. This is actually going to be quite short compared to the other articles so far, mainly because I have streamlined the deployment of new Oracle-capable machines to a degree where I can comfortably set up a cluster in two hours. It’s a bit more work initially, but it paid off. The setup of my reference VM is documented on this blog as well; search for virtualisation and opensuse to get to the article.

When I first started working in my lab environment I created a virtual machine called “rhel55ref”. In reality it’s OEL, because of Red Hat’s windooze-like policy of requiring an activation code. I would have considered CentOS as well, but when I created the reference VM the community hadn’t yet provided “update 5”. I like the brand new shiny things most :)

Seems like I’m lucky now as well: with the introduction of Oracle’s own Linux kernel I am ready for the future. Hopefully Red Hat will get their act together soon and release version 6 of their distribution. As much as I like Oracle, I don’t want them to dominate the OS market too much, with Solaris now in their hands as well…

Anyway, to get started with my first node I cloned my template. After moving to /var/lib/xen/images, all I had to do was “cp -a rhel55ref edcnode1”. A repetition for edcnode2 gave me my second node. Xen (or libvirt, for that matter) stores the VM configuration in xenstore, a backend database which can be interrogated easily. So I dumped the XML configuration of my rhel55ref VM and stored it in edcnode{1,2}.xml. The command to dump the information is “virsh dumpxml domainName > edcnode{1,2}.xml”.
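
Spelled out, the cloning of the first node boils down to these commands on the dom0 (repeat with edcnode2 for the second node):

[root@dom0]# cd /var/lib/xen/images
[root@dom0]# cp -a rhel55ref edcnode1
[root@dom0]# virsh dumpxml rhel55ref > edcnode1.xml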

The domU folder contains the virtual disk for the root file system of my VM, called disk0. I then created a new “hard disk”, called disk01, to contain the Oracle binaries. Experience told me not to make it too small; 20G should be enough for my /u01 mount point holding the Grid Infrastructure and RDBMS binaries.

[root@dom0]# /var/lib/xen/images/edcnode1 # dd if=/dev/zero of=disk01 bs=1 count=0 seek=20G
0+0 records in
0+0 records out
0 bytes (0 B) copied, 1.3869e-05 s, 0.0 kB/

I like to speed the file creation up by using the sparse file trick: the file disk01 will be reported to be 20G in size, but it will only use that much space once the virtual machine actually needs it. It’s a bit like Oracle creating a temporary tablespace.
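
You can see the effect by comparing the apparent file size with the actual space used: ls reports the full 20G, while du shows next to nothing until the domU starts writing to the disk:

[root@dom0]# ls -lh /var/lib/xen/images/edcnode1/disk01
[root@dom0]# du -sh /var/lib/xen/images/edcnode1/disk01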

With that information it’s time to modify the dumped XML file. Again it’s important to define MAC addresses for the network interfaces, otherwise the system will try to use DHCP for your NICs, destroying the carefully crafted /etc/sysconfig/network-scripts/ifcfg-eth{0,1,2} files. Oh, and remember that the first three octets are reserved for Xen, so don’t change “00:16:3e”! Your UUID also has to be unique. In the end my first VM’s XML description looked like this:


<domain type='xen'>
 <name>edcnode1</name>
 <uuid>46a36f98-4e52-45a5-2579-80811b38a3ab</uuid>
 <memory>4194304</memory>
 <currentMemory>524288</currentMemory>
 <vcpu>2</vcpu>
 <bootloader>/usr/bin/pygrub</bootloader>
 <bootloader_args>-q</bootloader_args>
 <os>
  <type>linux</type>
 </os>
 <on_poweroff>destroy</on_poweroff>
 <on_reboot>restart</on_reboot>
 <on_crash>destroy</on_crash>
 <devices>
  <emulator>/usr/lib64/xen/bin/qemu-dm</emulator>
  <disk type='file' device='disk'>
   <driver name='file'/>
   <source file='/var/lib/xen/images/edcnode1/disk0'/>
   <target dev='xvda' bus='xen'/>
  </disk>
  <disk type='file' device='disk'>
   <driver name='file'/>
   <source file='/var/lib/xen/images/edcnode1/disk01'/>
   <target dev='xvdb' bus='xen'/>
  </disk>
  <interface type='bridge'>
   <mac address='00:16:3e:ab:cd:ef'/>
   <source bridge='br1'/>
  </interface>
  <interface type='bridge'>
   <mac address='00:16:3e:10:13:1a'/>
   <source bridge='br2'/>
  </interface>
  <interface type='bridge'>
   <mac address='00:16:3e:11:12:ef'/>
   <source bridge='br3'/>
  </interface>
  <console type='pty'>
   <target port='0'/>
  </console>
 </devices>
</domain>
You can see that the interfaces refer to br1, br2 and br3; these are the bridges that were previously defined in the first article. Anything that is dynamically assigned, such as the interface’s target device name, doesn’t matter as it will be regenerated anyway.

When done, you can define the new VM and start it:

[root@dom0]# virsh define edcnode{1,2}.xml
[root@dom0]# xm start edcnode1 -c

You are directly connected to the VM’s console (80×24, just like in the old times!) and have to wait a looooong time for the DHCP requests for eth0, eth1 and eth2 to time out. This is the first thing to address. Log in to the system as root and navigate straight to /etc/sysconfig/network-scripts to change ifcfg-eth{0,1,2}. Alternatively, use system-config-network-tui to change the network settings.

The following settings should be used for edcnode1:

  • eth0:    192.168.99.56/24
  • eth1:    192.168.100.56/24
  • eth2:    192.168.101.56/24

These are the settings for edcnode2:

  • eth0:    192.168.99.58/24
  • eth1:    192.168.100.58/24
  • eth2:    192.168.101.58/24

The nameserver for both is my dom0, in this case 192.168.99.10. Enter the appropriate hostname as well as the nameserver. Note that 192.168.99.57 and .59 are reserved for the node VIPs, hence the “gap”. Then edit /etc/hosts to enter the information about the private interconnect, which for obvious reasons is not included in DNS. If you like, persist your public and VIP information in /etc/hosts as well, as shown below. Don’t do this with the SCAN: it is not recommended to have the SCAN resolve through /etc/hosts, although it works.
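
Putting the addresses above together, /etc/hosts on both nodes ended up looking roughly like this in my setup (the -priv names match the ones used for the user equivalence setup later on):

127.0.0.1        localhost.localdomain     localhost
# public network
192.168.99.56    edcnode1.localdomain      edcnode1
192.168.99.58    edcnode2.localdomain      edcnode2
# virtual IPs
192.168.99.57    edcnode1-vip.localdomain  edcnode1-vip
192.168.99.59    edcnode2-vip.localdomain  edcnode2-vip
# private interconnect
192.168.100.56   edcnode1-priv.localdomain edcnode1-priv
192.168.100.58   edcnode2-priv.localdomain edcnode2-priv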

Now for the big moment: restart the network service and get out of the uncomfortable 80×24 character limitation:

[root@edcnode1]# service network restart

The complete configuration for edcnode1 is printed here for the sake of completeness:

[root@edcnode1 ~]# cat /etc/resolv.conf
nameserver 192.168.99.10
search localdomain
[root@edcnode1 ~]#

[root@edcnode1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth{0,1,2}
# Xen Virtual Ethernet
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
HWADDR=00:16:3e:ab:cd:ef
NETMASK=255.255.255.0
IPADDR=192.168.99.56
TYPE=Ethernet
# Xen Virtual Ethernet
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
HWADDR=00:16:3e:10:13:1a
NETMASK=255.255.255.0
IPADDR=192.168.100.56
TYPE=Ethernet
# Xen Virtual Ethernet
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
HWADDR=00:16:3e:11:12:ef
NETMASK=255.255.255.0
IPADDR=192.168.101.56
TYPE=Ethernet

[root@edcnode1 ~]# cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=edcnode1

Next on the agenda is the iSCSI initiator. This isn’t part of my standard build and had to be added. All my software is exported from the dom0 via NFS and mounted under /mnt:

[root@edcnode1 ~]# find /mnt -iname "iscsi*"
/mnt/oracleEnterpriseLinux/source/iscsi-initiator-utils-6.2.0.871-0.16.el5.x86_64.rpm

[root@edcnode1 ~]# cd /mnt/oracleEnterpriseLinux/source/
[root@edcnode1 ~]# rpm -ihv iscsi-initiator-utils-6.2.0.871-0.16.el5.x86_64.rpm
warning: iscsi-initiator-utils-6.2.0.871-0.16.el5.x86_64.rpm: ...
Preparing...                ########################################### [100%]
 1:iscsi-initiator-utils  ########################################### [100%

It's important to edit the initiator name, i.e. the name the initiator reports back to OpenFiler. I changed it to include edcnode1 and edcnode2 on their respective hosts. The file to edit is /etc/iscsi/initiatorname.iscsi.
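
After the change, the file on edcnode1 contains a single InitiatorName line along these lines; the IQN shown here is made up for illustration, just keep it unique per node:

[root@edcnode1 ~]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:edcnode1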

Time to get serious now:

[root@edcnode1 ~]# /etc/init.d/iscsi start
iscsid is stopped
Starting iSCSI daemon:                                     [  OK  ]
 [  OK  ]
Setting up iSCSI targets: iscsiadm: No records found!
 [  OK  ]

We are ready to roll. First we need to discover the targets from the OpenFiler appliance, starting with the first one, filer01:

[root@edcnode1 ~]# iscsiadm -m discovery -t sendtargets -p 192.168.101.50
192.168.101.50:3260,1 iqn.2006-01.com.openfiler:asm01Filer01
192.168.101.50:3260,1 iqn.2006-01.com.openfiler:ocrvoteFiler01
192.168.101.50:3260,1 iqn.2006-01.com.openfiler:asm02Filer01

A restart of the iscsi service will automatically log in and persist the settings (this is very wide output; it works best in a 1280x-something resolution):

[root@edcnode1 ~]# service iscsi restart
Stopping iSCSI daemon:
iscsid dead but pid file exists                            [  OK  ]
Starting iSCSI daemon:                                     [  OK  ]
 [  OK  ]
Setting up iSCSI targets: Logging in to [iface: default, target: iqn.2006-01.com.openfiler:asm01Filer01, portal: 192.168.101.50,3260]
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:ocrvoteFiler01, portal: 192.168.101.50,3260]
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:asm02Filer01, portal: 192.168.101.50,3260]
Login to [iface: default, target: iqn.2006-01.com.openfiler:asm01Filer01, portal: 192.168.101.50,3260]: successful
Login to [iface: default, target: iqn.2006-01.com.openfiler:ocrvoteFiler01, portal: 192.168.101.50,3260]: successful
Login to [iface: default, target: iqn.2006-01.com.openfiler:asm02Filer01, portal: 192.168.101.50,3260]: successful
 [  OK  ]

Fine! Now over to fdisk the new devices. I know that my “local” storage is named /dev/xvd*, so anything new (“/dev/sd*”) will be iSCSI-provided storage. If you are unsure you can always check the /var/log/messages file to see which devices have just been discovered. You should see something similar to this output:

Sep 24 12:20:08 edcnode1 kernel: Loading iSCSI transport class v2.0-871.
Sep 24 12:20:08 edcnode1 kernel: cxgb3i: tag itt 0x1fff, 13 bits, age 0xf, 4 bits.
Sep 24 12:20:08 edcnode1 kernel: iscsi: registered transport (cxgb3i)
Sep 24 12:20:08 edcnode1 kernel: Broadcom NetXtreme II CNIC Driver cnic v2.1.0 (Oct 10, 2009)
Sep 24 12:20:08 edcnode1 kernel: Broadcom NetXtreme II iSCSI Driver bnx2i v2.1.0 (Dec 06, 2009)
Sep 24 12:20:08 edcnode1 kernel: iscsi: registered transport (bnx2i)
Sep 24 12:20:08 edcnode1 kernel: iscsi: registered transport (tcp)
Sep 24 12:20:08 edcnode1 kernel: iscsi: registered transport (iser)
Sep 24 12:20:08 edcnode1 kernel: iscsi: registered transport (be2iscsi)
Sep 24 12:20:08 edcnode1 iscsid: iSCSI logger with pid=20558 started!
Sep 24 12:20:08 edcnode1 kernel: scsi0 : iSCSI Initiator over TCP/IP
Sep 24 12:20:08 edcnode1 kernel: scsi1 : iSCSI Initiator over TCP/IP
Sep 24 12:20:08 edcnode1 kernel: scsi2 : iSCSI Initiator over TCP/IP
Sep 24 12:20:09 edcnode1 kernel:   Vendor: OPNFILER  Model: VIRTUAL-DISK      Rev: 0
Sep 24 12:20:09 edcnode1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Sep 24 12:20:09 edcnode1 kernel:   Vendor: OPNFILER  Model: VIRTUAL-DISK      Rev: 0
Sep 24 12:20:09 edcnode1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Sep 24 12:20:09 edcnode1 kernel:   Vendor: OPNFILER  Model: VIRTUAL-DISK      Rev: 0
Sep 24 12:20:09 edcnode1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Sep 24 12:20:09 edcnode1 kernel:   Vendor: OPNFILER  Model: VIRTUAL-DISK      Rev: 0
Sep 24 12:20:09 edcnode1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
Sep 24 12:20:09 edcnode1 kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 0
Sep 24 12:20:09 edcnode1 kernel: scsi 1:0:0:0: Attached scsi generic sg1 type 0
Sep 24 12:20:09 edcnode1 kernel: scsi 2:0:0:0: Attached scsi generic sg2 type 0
Sep 24 12:20:09 edcnode1 kernel: scsi 1:0:0:1: Attached scsi generic sg3 type 0
Sep 24 12:20:09 edcnode1 kernel: SCSI device sda: 20971520 512-byte hdwr sectors (10737 MB)
Sep 24 12:20:09 edcnode1 kernel: sda: Write Protect is off
Sep 24 12:20:09 edcnode1 kernel: SCSI device sda: drive cache: write through
Sep 24 12:20:09 edcnode1 kernel: SCSI device sda: 20971520 512-byte hdwr sectors (10737 MB)
Sep 24 12:20:09 edcnode1 kernel: sda: Write Protect is off
Sep 24 12:20:09 edcnode1 kernel: SCSI device sda: drive cache: write through
Sep 24 12:20:09 edcnode1 kernel:  sda: unknown partition table
Sep 24 12:20:09 edcnode1 kernel: sd 0:0:0:0: Attached scsi disk sda

The output will continue with /dev/sdb and other devices exported by the filer.
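
If you prefer not to dig through the messages file, iscsiadm can print the session details including the attached block devices. A sketch (output omitted here; device names will differ on your system):

# maps each iSCSI target to the sd* device(s) it produced
[root@edcnode1 ~]# iscsiadm -m session -P 3 | egrep "Target:|Attached scsi disk"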

Prepare the local Oracle Installation

Using fdisk, modify /dev/xvdb: create a partition spanning the whole disk and set its type to “8e” – Linux LVM. It’s always a good idea to install the Oracle binaries into LVM, as it makes extending the file system later much easier. I’ll include the fdisk output for this device but won’t for later partitioning exercises.

[root@edcnode1 ~]# fdisk /dev/xvdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 1305.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
 (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
 e   extended
 p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1305, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1305, default 1305):
Using default value 1305

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Once /dev/xvdb1 is ready, we can start its transformation into a logical volume. First, create a physical volume:

[root@edcnode1 ~]# pvcreate /dev/xvdb1
 Physical volume "/dev/xvdb1" successfully created

The physical volume (“PV”) is then used to form a volume group (“VG”). In real life, you’d probably have more than 1 PV to form a VG… I named my volume group “oracle_vg”. The existing volume group is called “root_vg” by the way.

[root@edcnode1 ~]# vgcreate oracle_vg /dev/xvdb1
 Volume group "oracle_vg" successfully create

Wonderful! I never quite remember how many extents this VG has, so I need to query it. Using --size 10G would throw an error – some internal overhead reduces the available capacity to just shy of 10G:

[root@edcnode1 ~]# vgdisplay oracle_vg
 --- Volume group ---
 VG Name               oracle_vg
 System ID
 Format                lvm2
 Metadata Areas        1
 Metadata Sequence No  1
 VG Access             read/write
 VG Status             resizable
 MAX LV                0
 Cur LV                0
 Open LV               0
 Max PV                0
 Cur PV                1
 Act PV                1
 VG Size               10.00 GB
 PE Size               4.00 MB
 Total PE              2559
 Alloc PE / Size       0 / 0
 Free  PE / Size       2559 / 10.00 GB
 VG UUID               QgHgnY-Kqsl-noAR-VLgP-UXcm-WADN-VdiwO7

Right, so now let’s create a logical volume (“LV”) with 2559 extents:

[root@edcnode1 ~]# lvcreate --extents 2559 --name grid_lv oracle_vg
 Logical volume "grid_lv" created

And now we need a file system:

[root@edcnode1 ~]# mkfs.ext3 /dev/oracle_vg/grid_lv

You are done! Create the mountpoint for your oracle installation, /u01/ in my case, and grant oracle:oinstall ownership to it. In this lab exercise I didn’t create a separate owner for the Grid Infrastructure to avoid potentially undiscovered problems in 11.2.0.2 and stretched RAC. Finally add this to /etc/fstab to make it persistent:

[root@edcnode1 ~]# echo "/dev/oracle_vg/grid_lv   /u01   ext3 defaults 0 0" >> /etc/fstab
[root@edcnode1 ~]# mount /u01
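
For reference, the mountpoint preparation mentioned above might look like this minimal sketch (adjust ownership and permissions to your own standards):

# create the mountpoint and hand it over to the oracle software owner
[root@edcnode1 ~]# mkdir -p /u01
[root@edcnode1 ~]# chown oracle:oinstall /u01
[root@edcnode1 ~]# chmod 775 /u01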

Now continue to partition the iSCSI volumes, but don’t create file systems on top of them. Don’t assign a partition type other than the default “Linux” to them either.
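
If you have a few LUNs to prepare, a non-interactive tool saves some typing. A sketch using sfdisk, assuming /dev/sda is one of the iSCSI LUNs and you want a single partition of the default “Linux” type spanning the whole disk:

# ",,83" = default start, rest of the disk, partition type 83 (Linux)
[root@edcnode1 ~]# echo ",,83" | sfdisk /dev/sda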

ASMLib

Yes I know… the age-old argument, but I decided to use it anyway. The reason is simple: scsi_id doesn’t return a value in para-virtualised Linux, which makes it impossible to set up device name persistence with udev. And ASMLib is easier to use anyway! But if your system administrators are database-agnostic and not willing to learn the basics about ASM, then rolling out ASMLib is probably not a good idea. It’s only a matter of time until someone runs an “rpm -Uhv kernel*” on your box and of course a) doesn’t tell the DBAs and b) doesn’t bother installing the matching ASMLib kernel module. But I digress.

Before you are able to use ASMLib you have to configure it on each cluster node. A sample session could look like this:

[root@edcnode1 ~]# /etc/init.d/oracleasm configure
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting <ENTER> without typing an
answer will keep that current value.  Ctrl-C will abort.

Default user to own the driver interface []: oracle
Default group to own the driver interface []: dba
Start Oracle ASM library driver on boot (y/n) [n]:
Scan for Oracle ASM disks on boot (y/n) [y]:
Writing Oracle ASM library driver configuration: done
Dropping Oracle ASMLib disks:                              [  OK  ]
Shutting down the Oracle ASMLib driver:                    [  OK  ]
[root@edcnode1 ~]#

Now with this done, it is possible to create the ASMLib-maintained ASM disks. For the LUNs presented by filer01 these are:

  • ASM01FILER01
  • ASM02FILER01
  • OCR01FILER01
  • OCR02FILER01

The disks are created using the /etc/init.d/oracleasm createdisk command as in these examples:

[root@edcnode1 ~]# /etc/init.d/oracleasm createdisk asm01filer01 /dev/sda1
Marking disk "asm01filer01" as an ASM disk:                [  OK  ]
[root@edcnode1 ~]# /etc/init.d/oracleasm createdisk asm02filer01 /dev/sdc1
Marking disk "asm02filer01" as an ASM disk:                [  OK  ]
[root@edcnode1 ~]# /etc/init.d/oracleasm createdisk ocr01filer01 /dev/sdb1
Marking disk "ocr01filer01" as an ASM disk:                [  OK  ]
[root@edcnode1 ~]# /etc/init.d/oracleasm createdisk ocr02filer01 /dev/sdd1
Marking disk "ocr02filer01" as an ASM disk:                [  OK  ]

Switch over to the second node now to validate the configuration and to continue the configuration of the iSCSI LUNs from filer02. Define the domU with a similar configuration file as shown above for edcnode1, and start the domU. Once the DHCP timeouts have passed and you are presented with a login, set up the network as shown above. Install the iSCSI initiator package, change the initiator name and discover the targets from filer02 in addition to those from filer01.

[root@edcnode2 ~]# iscsiadm -t st -p 192.168.101.51 -m discovery
192.168.101.51:3260,1 iqn.2006-01.com.openfiler:asm02Filer02
192.168.101.51:3260,1 iqn.2006-01.com.openfiler:ocrvoteFiler02
192.168.101.51:3260,1 iqn.2006-01.com.openfiler:asm01Filer02
[root@edcnode2 ~]# iscsiadm -t st -p 192.168.101.50 -m discovery
192.168.101.50:3260,1 iqn.2006-01.com.openfiler:asm01Filer01
192.168.101.50:3260,1 iqn.2006-01.com.openfiler:ocrvoteFiler01
192.168.101.50:3260,1 iqn.2006-01.com.openfiler:asm02Filer01

Still on the second node, continue by logging in to the iSCSI targets so the SCSI devices appear:

[root@edcnode2 ~]# service iscsi start
iscsid (pid  2802) is running...
Setting up iSCSI targets: Logging in to [iface: default, target: iqn.2006-01.com.openfiler:asm02Filer02, portal: 192.168.101.51,3260]
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:ocrvoteFiler02, portal: 192.168.101.51,3260]
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:asm01Filer01, portal: 192.168.101.50,3260]
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:asm01Filer02, portal: 192.168.101.51,3260]
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:ocrvoteFiler01, portal: 192.168.101.50,3260]
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:asm02Filer01, portal: 192.168.101.50,3260]
Login to [iface: default, target: iqn.2006-01.com.openfiler:asm02Filer02, portal: 192.168.101.51,3260]: successful
Login to [iface: default, target: iqn.2006-01.com.openfiler:ocrvoteFiler02, portal: 192.168.101.51,3260]: successful
Login to [iface: default, target: iqn.2006-01.com.openfiler:asm01Filer01, portal: 192.168.101.50,3260]: successful
Login to [iface: default, target: iqn.2006-01.com.openfiler:asm01Filer02, portal: 192.168.101.51,3260]: successful
Login to [iface: default, target: iqn.2006-01.com.openfiler:ocrvoteFiler01, portal: 192.168.101.50,3260]: successful
Login to [iface: default, target: iqn.2006-01.com.openfiler:asm02Filer01, portal: 192.168.101.50,3260]: successful

Partition the disks from filer02 the same way as shown in the previous example. On edcnode2, fdisk reported the following as new disks:

Disk /dev/sda doesn't contain a valid partition table
Disk /dev/sdb doesn't contain a valid partition table
Disk /dev/sdf doesn't contain a valid partition table

Disk /dev/sda: 10.6 GB, 10670309376 bytes
Disk /dev/sdb: 2650 MB, 2650800128 bytes
Disk /dev/sdf: 10.7 GB, 10737418240 bytes

Note that /dev/sda and /dev/sdf are the two 10G LUNs for ASM data, and /dev/sdb is the OCR/voting disk combination. Next, create the additional ASMLib disks:

[root@edcnode2 ~]# /etc/init.d/oracleasm scandisks
...
[root@edcnode2 ~]# /etc/init.d/oracleasm createdisk asm01filer02 /dev/sda1
Marking disk "asm01filer02" as an ASM disk:                [  OK  ]
[root@edcnode2 ~]# /etc/init.d/oracleasm createdisk asm02filer02 /dev/sdf1
Marking disk "asm02filer02" as an ASM disk:                [  OK  ]
[root@edcnode2 ~]# /etc/init.d/oracleasm createdisk ocr01filer02 /dev/sdb1
Marking disk "ocr01filer02" as an ASM disk:                [  OK  ]
[root@edcnode2 ~]# /etc/init.d/oracleasm listdisks
ASM01FILER01
ASM01FILER02
ASM02FILER01
ASM02FILER02
OCR01FILER01
OCR01FILER02
OCR02FILER01

Perform another scandisks command on edcnode1 so that it too sees all the disks:

[root@edcnode1 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks:               [  OK  ]
[root@edcnode1 ~]# /etc/init.d/oracleasm listdisks
ASM01FILER01
ASM01FILER02
ASM02FILER01
ASM02FILER02
OCR01FILER01
OCR01FILER02
OCR02FILER01

Summary

All done! And I seriously thought initially that this was going to be a shorter post than the others; how wrong I was. Congratulations on having made it to the bottom of the article, by the way.

In the course of this post I prepared my virtual machines to begin the installation of Grid Infrastructure. The ASM disk names will be persistent across reboots thanks to ASMLib, and no messing around with udev for that matter. You might notice that there are two ASM disks from filer01 but only one from filer02 for the voting disk/OCR disk group, and that’s for a reason. I’m being cheeky and won’t tell you here; that’s for another post later…


First contact with Oracle 11.2.0.2 RAC

As you may know, Oracle released the first patchset on top of 11g Release 2. At the time of this writing, the patchset is out for 32bit and 64bit Linux, 32bit and 64bit Solaris SPARC and Intel. What an interesting combination of platforms… I thought there was no Solaris 32bit on Intel anymore.

Upgrade

Oracle has come up with a fundamentally different approach to patching with this patchset. The long version of this can be found in MOS document 1189783.1 “Important Changes to Oracle Database Patch Sets Starting With 11.2.0.2”. The short version is that new patchsets will be supplied as full releases. This is really cool, and some people have asked why that wasn’t always the case. In 10g Release 2, to get to the latest version with all the patches, you had to

  • Install the base release for Clusterware, ASM and at least one RDBMS home
  • Install the latest patchset on Clusterware, ASM and the RDBMS home
  • Apply the latest PSU for Clusterware/RDBMS, ASM and RDBMS

Applying the PSUs for Clusterware in particular was very labour-intensive. In fact, for a fresh install it was usually easier to install and patch everything on only one node and then extend the patched software homes to the other nodes of the cluster.

Now in 11.2.0.2 things are different. You no longer have to apply any of the interim releases – the patchset contains everything you need, already at the correct version. The above process is shortened to:

  • Install Grid Infrastructure 11.2.0.2
  • Install RDBMS home 11.2.0.2

Optionally, apply PSUs or other patches when they become available. Currently, MOS note 756671.1 doesn’t list any patch as recommended on top of 11.2.0.2.

Interestingly, upgrading from 11.2.0.1 to 11.2.0.2 is more painful than from Oracle 10g, at least on the Linux platform. Before you can run rootupgrade.sh, the script tests whether you have applied the Grid Infrastructure PSU 11.2.0.1.2. OUI hadn’t performed this test when it checked for prerequisites, which caught me off-guard. The casual observer may now ask: why do I have to apply a PSU when the bug fixes should be rolled up into the patchset anyway? I honestly don’t have an answer, other than that if you are not on Linux you should be fine.

Grid Infrastructure will be an out-of-place upgrade, which means you have to manage your local disk space very carefully from now on. I would not use anything less than 50-75G on my Grid Infrastructure mount point. This takes the new Cluster Health Monitor facility (see below) into account, as well as the fact that Oracle performs log rotation for most logs in $GRID_HOME/log.

The RDBMS binaries can be patched either in-place or out-of-place. I’d say that the out-of-place upgrade for RDBMS binaries is wholeheartedly recommended as it makes backing out a change so much easier. As I said, you don’t have a choice for Grid Infrastructure which is always out-of-place.

And then there is the multicast issue Julian Dyke (http://juliandyke.wordpress.com/) has written about. I couldn’t reproduce the test case, and my lab and real-life clusters run happily with 11.2.0.2.

Changes to Grid Infrastructure

After the successful upgrade you’d be surprised to find new resources in Grid Infrastructure. Have a look at these:

[grid@node1] $ crsctl stat res -t -init
-----------------------------------------------------------------
NAME           TARGET  STATE        SERVER          STATE_DETAILS
-----------------------------------------------------------------
Cluster Resources
-----------------------------------------------------------------
ora.asm
 1        ONLINE  ONLINE       node1           Started
ora.cluster_interconnect.haip
 1        ONLINE  ONLINE       node1
ora.crf
 1        ONLINE  ONLINE       node1
ora.crsd
 1        ONLINE  ONLINE       node1
ora.cssd
 1        ONLINE  ONLINE       node1
ora.cssdmonitor
 1        ONLINE  ONLINE       node1
ora.ctssd
 1        ONLINE  ONLINE       node1           OBSERVER
ora.diskmon
 1        ONLINE  ONLINE       node1
ora.drivers.acfs
 1        ONLINE  ONLINE       node1
ora.evmd
 1        ONLINE  ONLINE       node1
ora.gipcd
 1        ONLINE  ONLINE       node1
ora.gpnpd
 1        ONLINE  ONLINE       node1
ora.mdnsd
 1        ONLINE  ONLINE       node1

The cluster_interconnect.haip resource is yet another step towards a self-contained system. The Grid Infrastructure installation guide for Linux states:

“With Redundant Interconnect Usage, you can identify multiple interfaces to use for the cluster private network, without the need of using bonding or other technologies. This functionality is available starting with Oracle Database 11g Release 2 (11.2.0.2).”

So – good news for anyone relying on third-party software such as HP ServiceGuard for network bonding. Linux has always done this for you, even in the days of the 2.4 kernel, and Linux network bonding is actually quite simple to set up. But anyway, I’ll run a few tests in the lab when I have time with this new feature enabled, deliberately taking down NICs to see if it does what it says on the tin. The documentation states that you don’t need to bond your NICs for the private interconnect: simply leave the ethx devices (or whatever name your NICs have on your OS) as they are, and mark the ones you’d like to use for the private interconnect as private during the installation. If you decide to add a NIC to the cluster for use with the private interconnect later, use oifcfg as root to add the new interface (or watch this space for a later blog post; a quick sketch follows below). Oracle states that if one of the private interconnects fails, it will transparently use another one. In addition to the high availability benefit, Oracle apparently also performs load balancing across the configured interconnects.
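
The interface name and subnet in the following sketch are made up purely for illustration; the syntax is what I would expect for registering an additional private interface:

# list the currently registered interfaces, then register a new private one
[root@node1 ~]# oifcfg getif
[root@node1 ~]# oifcfg setif -global eth3/192.168.102.0:cluster_interconnect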

To learn more about the redundant interconnect feature I had a glance at its profile. As with any resource in the lower stack (or HA stack), you need to append the “-init” argument to crsctl.

[oracle@node1] $ crsctl stat res ora.cluster_interconnect.haip -p -init
NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_INTERVAL=30
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for a Highly Available network IP"
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=balanced
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_DEPENDENCIES=hard(ora.gpnpd,ora.cssd)pullup(ora.cssd)
START_TIMEOUT=60
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(ora.cssd)
STOP_TIMEOUT=0
UPTIME_THRESHOLD=1m
USR_ORA_AUTO=
USR_ORA_IF=
USR_ORA_IF_GROUP=cluster_interconnect
USR_ORA_IF_THRESHOLD=20
USR_ORA_NETMASK=
USR_ORA_SUBNET=

With this information at hand, we see that the resource is controlled through ORAROOTAGENT, and judging from the start sequence position and the fact that we queried crsctl with the “-init” flag, it must be OHASD’s ORAROOTAGENT.

Indeed, there are references to it in the $GRID_HOME/log/`hostname -s`/agent/ohasd/orarootagent_root/ directory. Further references to the resource can be found in cssd.log, which makes perfect sense: CSSD uses the interconnect for many things, not least fencing.

[ USRTHRD][1122056512] {0:0:2} HAIP: configured to use 1 interfaces
...
[ USRTHRD][1122056512] {0:0:2} HAIP:  Updating member info HAIP1;192.168.52.0#0
[ USRTHRD][1122056512] {0:0:2} InitializeHaIps[ 0]  infList 'inf bond1, ip 192.168.52.155, sub 192.168.52.0'
[ USRTHRD][1122056512] {0:0:2} HAIP:  starting inf 'bond1', suggestedIp '169.254.79.209', assignedIp ''
[ USRTHRD][1122056512] {0:0:2} Thread:[NetHAWork]start {
[ USRTHRD][1122056512] {0:0:2} Thread:[NetHAWork]start }
[ USRTHRD][1089194304] {0:0:2} [NetHAWork] thread started
[ USRTHRD][1089194304] {0:0:2}  Arp::sCreateSocket {
[ USRTHRD][1089194304] {0:0:2}  Arp::sCreateSocket }
[ USRTHRD][1089194304] {0:0:2} Starting Probe for ip 169.254.79.209
[ USRTHRD][1089194304] {0:0:2} Transitioning to Probe State
[ USRTHRD][1089194304] {0:0:2}  Arp::sProbe {
[ USRTHRD][1089194304] {0:0:2} Arp::sSend:  sending type 1
[ USRTHRD][1089194304] {0:0:2}  Arp::sProbe }
...
[ USRTHRD][1122056512] {0:0:2} Completed 1 HAIP assignment, start complete
[ USRTHRD][1122056512] {0:0:2} USING HAIP[  0 ]:  bond1 - 169.254.79.209
[ora.cluster_interconnect.haip][1117854016] {0:0:2} [start] clsn_agent::start }
[    AGFW][1117854016] {0:0:2} Command: start for resource: ora.cluster_interconnect.haip 1 1 completed with status: SUCCESS
[    AGFW][1119955264] {0:0:2} Agent sending reply for: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:343
[    AGFW][1119955264] {0:0:2} ora.cluster_interconnect.haip 1 1 state changed from: STARTING to: ONLINE
[    AGFW][1119955264] {0:0:2} Started implicit monitor for:ora.cluster_interconnect.haip 1 1
[    AGFW][1119955264] {0:0:2} Agent sending last reply for: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:343
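
If you want to locate these messages on your own cluster, a simple grep against the agent and CSSD logs should do. A sketch, assuming GRID_HOME is set in root’s environment:

# HAIP references in OHASD's rootagent log and in the cssd logs
[root@node1 ~]# grep -i haip $GRID_HOME/log/$(hostname -s)/agent/ohasd/orarootagent_root/orarootagent_root.log
[root@node1 ~]# grep -ril haip $GRID_HOME/log/$(hostname -s)/cssd/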

OK, I now understand this a bit better. But the log information mentioned something else as well: an IP address that I haven’t assigned to the cluster. It turns out that this IP address is another virtual IP on the private interconnect, bound to bond1:1:

[grid]grid@node1 $ /sbin/ifconfig
bond1     Link encap:Ethernet  HWaddr 00:23:7D:3d:1E:77
 inet addr:192.168.52.155  Bcast:192.168.52.255  Mask:255.255.255.0
 inet6 addr: fe80::223:7dff:fe3c:1e74/64 Scope:Link
 UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
 RX packets:33155040 errors:0 dropped:0 overruns:0 frame:0
 TX packets:20677269 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:0
 RX bytes:21234994775 (19.7 GiB)  TX bytes:10988689751 (10.2 GiB)
bond1:1   Link encap:Ethernet  HWaddr 00:23:7D:3d:1E:77
 inet addr:169.254.79.209  Bcast:169.254.255.255  Mask:255.255.0.0
 UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

Ah, something running multicast. I tried to sniff that traffic but couldn’t make any sense of it. There is UDP (not TCP) multicast traffic on that interface. This can be checked with tcpdump:

[root@node1 ~]# tcpdump src 169.254.79.209 -i bond1:1 -c 10  -s 1514
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond1:1, link-type EN10MB (Ethernet), capture size 1514 bytes
14:30:18.704688 IP 169.254.79.209.55310 > 169.254.228.144.31112: UDP, length 252
14:30:18.704943 IP 169.254.79.209.55310 > 169.254.169.62.20057: UDP, length 252
14:30:18.705155 IP 169.254.79.209.55310 > 169.254.45.135.30040: UDP, length 252
14:30:18.895764 IP 169.254.79.209.51227 > 169.254.228.144.57323: UDP, length 192
14:30:18.895976 IP 169.254.79.209.51227 > 169.254.228.144.21319: UDP, length 296
14:30:18.897109 IP 169.254.79.209.48094 > 169.254.45.135.40464: UDP, length 192
14:30:18.897633 IP 169.254.79.209.48094 > 169.254.45.135.40464: UDP, length 192
14:30:18.897998 IP 169.254.79.209.48094 > 169.254.169.62.48215: UDP, length 192
14:30:18.902325 IP 169.254.79.209.51227 > 169.254.228.144.57323: UDP, length 192
14:30:18.902422 IP 169.254.79.209.51227 > 169.254.228.144.21319: UDP, length 296
10 packets captured
14 packets received by filter
0 packets dropped by kernel

If you are interested in the actual messages, use this command instead to capture a packet:

[root@node1 ~]# tcpdump src 169.254.79.209 -i bond1:1 -c 1 -X -s 1514
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond1:1, link-type EN10MB (Ethernet), capture size 1514 bytes
14:31:43.396614 IP 169.254.79.209.58803 > 169.254.169.62.16178: UDP, length 192
 0x0000:  4500 00dc 0000 4000 4011 ed04 a9fe 4fd1  E.....@.@.....O.
 0x0010:  a9fe a93e e5b3 3f32 00c8 4de6 0403 0201  ...>..?2..M.....
 0x0020:  e403 0000 0000 0000 4d52 4f4e 0003 0000  ........MRON....
 0x0030:  0000 0000 4d4a 9c63 0000 0000 0000 0000  ....MJ.c........
 0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
 0x0050:  a9fe 4fd1 4d39 0000 0000 0000 0000 0000  ..O.M9..........
 0x0060:  e403 0000 0000 0000 0100 0000 0000 0000  ................
 0x0070:  5800 0000 ff7f 0000 d0ff b42e 0f2b 0000  X............+..
 0x0080:  a01e 770d 0403 0201 0b00 0000 67f2 434c  ..w.........g.CL
 0x0090:  0000 0000 b1aa 0500 0000 0000 cf0f 3813  ..............8.
 0x00a0:  0000 0000 0400 0000 0000 0000 a1aa 0500  ................
 0x00b0:  0000 0000 0000 ae2a 644d 6026 0000 0000  .......*dM`&....
 0x00c0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
 0x00d0:  0000 0000 0000 0000 0000 0000            ............
1 packets captured
10 packets received by filter
0 packets dropped by kernel

Substitute the correct values of course for interface and source address.

Oracle CRF resources

Another interesting new feature is the CRF resource, which seems to be an implementation of the IPD/OS Cluster Health Monitor on the cluster nodes. I need to dig a little deeper into this feature; currently I can’t get any configuration data from the cluster:

[grid@node1] $ oclumon showobjects

 Following nodes are attached to the loggerd
[grid@node1] $

You will see some additional background processes now, namely ologgerd and osysmond.bin, which are started through the CRF resource. The resource profile (shown below) suggests that this resource is started through OHASD’s ORAROOTAGENT and can take custom logging levels.

[grid]grid@node1 $ crsctl stat res ora.crf -p -init
NAME=ora.crf
TYPE=ora.crf.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=CRFMOND=0,CRFLDREP=0,...,CRFM=0
DAEMON_TRACING_LEVELS=CRFMOND=0,CRFLDREP=0,...,CRFM=0
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for Crf Agents"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=hard(ora.gpnpd)
START_TIMEOUT=120
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(shutdown:ora.gipcd)
STOP_TIMEOUT=120
UPTIME_THRESHOLD=1m
USR_ORA_ENV=
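
A quick sanity check that the daemons mentioned above are actually running could look like this sketch:

# the CRF resource spawns these two background processes
[grid@node1] $ ps -ef | egrep "ologgerd|osysmond" | grep -v egrep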

An investigation of orarootagent_root.log revealed that the rootagent indeed starts the CRF resource. This resource in turn starts the ologgerd and osysmond processes, which then write their log files into $GRID_HOME/log/`hostname -s`/crf{logd,mond}.

Configuration of the daemons can be found in $GRID_HOME/ologgerd/init and $GRID_HOME/osysmond/init. Except for the PID files of the daemons there didn’t seem to be anything of value in these directories.

The command line of the ologgerd process shows its configuration options:

root 13984 1 0 Oct15 ? 00:04:00 /u01/crs/11.2.0.2/bin/ologgerd -M -d /u01/crs/11.2.0.2/crf/db/node1

The directory specified by the “-d” flag is where the process stores its logging information. The files are in BDB format – Berkeley DB (now an Oracle product, too). The oclumon tool should be able to read these files, but until I can persuade it to connect to the host there is no output.
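
Out of curiosity you can at least confirm the file format on disk. A sketch, with the path taken from the ologgerd command line above:

# the loggerd repository consists of Berkeley DB files
[root@node1 ~]# ls -lh /u01/crs/11.2.0.2/crf/db/node1
[root@node1 ~]# file /u01/crs/11.2.0.2/crf/db/node1/* | head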

CVU

Unlike the previous resources, the cvu resource is actually cluster-aware. It’s the Cluster Verification Utility we all know from installing RAC. Going by the profile (shown below), I conclude that the utility is run through the grid software owner’s scriptagent and has exactly one incarnation on the cluster. It is only executed every 6 hours and restarted if it fails. If you’d like to execute a manual check, simply run the action script with the command line argument “check” (a sketch follows further down).

[root@node1 tmp]# crsctl stat res ora.cvu -p
NAME=ora.cvu
TYPE=ora.cvu.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=%CRS_HOME%/bin/cvures%CRS_SCRIPT_SUFFIX%
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/scriptagent
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=21600
CHECK_RESULTS=
CHECK_TIMEOUT=600
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION=Oracle CVU resource
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=balanced
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=600
SERVER_POOLS=*
START_DEPENDENCIES=hard(ora.net1.network)
START_TIMEOUT=0
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(ora.net1.network)
STOP_TIMEOUT=0
TYPE_VERSION=1.1
UPTIME_THRESHOLD=1h
USR_ORA_ENV=
VERSION=11.2.0.2.0

The action script $GRID_HOME/bin/cvures implements the usual callbacks required by scriptagent: start(), stop(), check(), clean(), abort(). All log information goes into $GRID_HOME/log/`hostname -s`/cvu.
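
Following on from the note about manual checks, invoking the action script directly should be possible as the grid software owner. A sketch:

# run an ad-hoc cluster verification via the resource's action script
[grid@node1] $ $GRID_HOME/bin/cvures check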

The actual check performed is this one: $GRID_HOME/bin/cluvfy comp health -_format & > /dev/null 2>&1

Summary

Enough for now, this has become a far longer post than I initially anticipated. There are so many more new things around, like Quality of Service Management, that need exploring, making it very difficult to keep up.
