Veritas VCS on AIX.


VCS configuration file:     /etc/VRTSvcs/conf/config/main.cf

To check cluster status:    /opt/VRTSvcs/bin/hastatus -sum


JFS2 Filesystem fscklog Information


The fsck utility is used to repair a damaged filesystem. For JFS2 filesystems, fsck keeps a log in the fsck working area of what was run and the output it produced. It keeps this log for the most recent run of fsck and for the run before that. Both logs can be printed using /sbin/helpers/jfs2/fscklog.

The fscklog can contain more verbose output than what was written to the screen when fsck was originally run, which helps when debugging filesystem problems.

How can I check which log is current, or if there is one for my filesystem?

In the header file /usr/include/j2/j2_superblock.h we see there is a value for the fscklog, which is stored in the superblock of the filesystem:

int32 s_fscklog; /* 4: which fsck service log is most recent
                  * 0 => no service log data yet
                  * 1 => the first one
                  * 2 => the 2nd one
                  */

The superblock of a JFS2 filesystem starts 32 KB (0x8000) into the logical volume, and the variable s_fscklog is at offset 0x80A0. The first 32-bit word of the output below is its value:

# lquerypv -h /dev/fslv05 80A0 10
000080A0   00000001 00000000 00000000 00000000  |................|

So on this filesystem the first log is the most recent, because s_fscklog = 1.
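
The interpretation above can be scripted. This is a minimal sketch, not an IBM-supplied tool: the function name decode_fscklog is hypothetical, and it assumes the lquerypv output has the layout shown above (offset in field 1, s_fscklog in field 2).

```shell
# Decode s_fscklog from one line of "lquerypv -h <lv> 80A0 10" output on stdin.
# Field 1 is the offset, field 2 is the first 32-bit word (s_fscklog).
# The values are compared as strings so that hex digits are never misparsed.
decode_fscklog() {
    awk '{
        if ($2 == "00000000")      print "no service log data yet"
        else if ($2 == "00000001") print "first log is most recent"
        else if ($2 == "00000002") print "second log is most recent"
        else                       print "unexpected value: " $2
    }'
}

# On AIX, as root, the call would look like:
#   lquerypv -h /dev/fslv05 80A0 10 | decode_fscklog
```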

How can I view one of the fsck logs?

Using the fscklog utility you can print the contents of either the current or the previous log. You can supply either the filesystem mount point or the device (logical volume) the filesystem is on.

# /sbin/helpers/jfs2/fscklog /dev/fslv00

or use the filesystem mount point:

# /sbin/helpers/jfs2/fscklog /mymount

To print the prior log, use the -p flag:

# /sbin/helpers/jfs2/fscklog -p /dev/fslv00

Missing disk on AIX



AIX LVM will mark a disk as "missing" when it cannot successfully determine if that disk belongs to the volume group that it is in. Here are some ways to diagnose this problem and some possible solutions to get the disk back in an active state again.

One main symptom is that the lsvg command shows the disk in a "missing" state:

# lsvg -p datavg

datavg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk5            active            646         486         130..00..98..129..129
hdisk6            missing           646         6           00..00..00..00..06
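
When a volume group has many disks, the missing ones can be filtered out of the listing. A minimal sketch, assuming the lsvg -p column layout shown above; the function name missing_pvs is hypothetical.

```shell
# Print the names of physical volumes whose PV STATE is "missing".
# Reads "lsvg -p <vg>" output on stdin. The "vgname:" banner and the
# column header never have "missing" in field 2, so they are skipped.
missing_pvs() {
    awk '$2 == "missing" { print $1 }'
}

# On AIX the call would look like:
#   lsvg -p datavg | missing_pvs
```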


Diagnosing the problem

First data gathering steps to take.
These steps can all be performed by a non-root user.

* Are there any disk errors for this physical volume in the system error report?
$ errpt | less
or for more in-depth error information
$ errpt -a | less

* Is the disk marked in an "Available" state in lsdev output?
$ lsdev -Cc disk -l hdiskX

* Does the disk show up in lspv?
$ lspv

* Is the disk in an "active" state in lsvg?
$ lsvg -p VGNAME
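
The lsdev check can be automated across all disks. This is a sketch under the assumption that lsdev prints the device name in the first column and its state (for example Available or Defined) in the second; the function name not_available_disks is hypothetical.

```shell
# List disks whose state is anything other than "Available".
# Reads "lsdev -Cc disk" output on stdin and prints "<name> <state>".
not_available_disks() {
    awk 'NF >= 2 && $2 != "Available" { print $1 " " $2 }'
}

# On AIX the call would look like:
#   lsdev -Cc disk | not_available_disks
```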

Resolving the problem

Try to read the PVID directly off the drive. This technique uses a lower-level command that bypasses the ODM and will print out information recorded on the disk. You will need to be root or su'ed to root to run this command and many of the ones that follow.

# lquerypv -h /dev/hdiskX 80 10
00000080   00050A85 A9B17061 00000000 00000000  |......pa........|

In the above output the PVID is the 2nd and 3rd columns combined.
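
Comparing by eye is error-prone, because lquerypv prints the PVID as two uppercase words while lspv prints it as one lowercase string. A minimal sketch that normalizes both; the function names are hypothetical, and the field positions assume the output layouts shown in this document.

```shell
# Join the 2nd and 3rd columns of "lquerypv -h <disk> 80 10" output
# (read on stdin) into one lowercase PVID, the form lspv uses.
pvid_from_lquerypv() {
    awk '{ print tolower($2 $3) }'
}

# Print the PVID that lspv reports for disk $1 (lspv output on stdin).
pvid_from_lspv() {
    awk -v d="$1" '$1 == d { print $2 }'
}

# On AIX, as root, a comparison would look like:
#   on_disk=$(lquerypv -h /dev/hdisk6 80 10 | pvid_from_lquerypv)
#   in_odm=$(lspv | pvid_from_lspv hdisk6)
#   [ "$on_disk" = "$in_odm" ] && echo match || echo mismatch
```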

Does the PVID returned match the output from lspv?

1. If yes then it's possible there was some temporary problem accessing the disk.

1a. Try running:

# varyonvg VGNAME

which should force the volume group to go out and probe all physical volumes belonging to it.

1b. If varyonvg does not go out and find the disk, we may have to force it into an available state so LVM will check it. To do this use:

# chpv -va hdiskX

It's possible after this you may need to try the varyonvg again.


2. If a PVID is returned but does not match what lspv or the ODM show, then does it exist in the VGDA?

The PVIDs in the VGDA are most easily viewed using lqueryvg:

# lqueryvg -Ptp hdiskX
( -P will list the PVIDs only )

If the PVID on the drive is in the VGDA, but not in the ODM, the ODM can be updated by forcing a re-read of the PVID from the drive using the chdev command.

Do NOT run this command unless you have verified that there is a PVID on the drive AND that PVID is in the VGDA also on the drive.

# chdev -a pv=yes -l hdiskX

After this, check that the physical volume shows up with no errors:

# lspv
# lsvg -p VGNAME


3. If the PVID on disk does not exist in the VGDA, a new PVID can be written to the drive and ODM, and the VGDA updated with that new PVID.

      NOTE: You should be suspicious that this may not be the proper disk for this volume group. For example if a LUN was unmapped and then a different one remapped accidentally, that LUN may belong to a completely different volume group!

      You can check this using lqueryvg:

      # lqueryvg -Atp hdiskX

      Run this against the disk in question and against a known good disk in the volume group. Compare PVIDs, logical volume names, etc. to ensure the disk really belongs to the same volume group. If it does not, do not proceed with the steps below.

The volume group will be removed and re-imported using recreatevg.

3a. First get a list of all disks that are part of this volume group

# lsvg -p VGNAME

3b. The next steps will require that all logical volumes in the volume group be closed, so unmount any filesystems and stop any applications that are using raw logical volumes.

3c. Now remove the volume group:

# varyoffvg VGNAME
# exportvg VGNAME

3d. And bring it back in using recreatevg. In this instance we DO NOT want recreatevg to add the default prefixes onto the logical volume names and filesystem mount points, so we add flags to prevent that.

When using recreatevg in this manner, it is IMPORTANT to list ALL disks belonging to this volume group. Unlike importvg, recreatevg needs a complete list of physical volumes in order to completely import the volume group and all logical volumes. The exception to this is when "-f" is used.

# recreatevg -L / -Y NA -y VGNAME hdiskX hdiskY hdiskZ

This will write new PVIDs to all drives listed on the command-line and update the VGDA with those PVIDs. It will also import and vary on the volume group.
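
Because the disk list must be complete, it helps to build the recreatevg command from the lsvg -p output saved in step 3a rather than typing it by hand. The following is a dry-run sketch: it only prints the commands for review and runs nothing. The function name plan_recreatevg is hypothetical, and it assumes every disk name in the volume group starts with "hdisk".

```shell
# Print (do not execute) the commands for steps 3c and 3d.
# $1    = volume group name
# stdin = "lsvg -p VGNAME" output captured BEFORE the volume group is exported
plan_recreatevg() {
    vg=$1
    # Collect every first-column entry that looks like a disk name.
    disks=$(awk '$1 ~ /^hdisk/ { printf "%s%s", sep, $1; sep = " " }')
    if [ -z "$disks" ]; then
        echo "no disks found in input" >&2
        return 1
    fi
    echo "varyoffvg $vg"
    echo "exportvg $vg"
    echo "recreatevg -L / -Y NA -y $vg $disks"
}

# On AIX the call would look like:
#   lsvg -p datavg > /tmp/datavg.pvs
#   plan_recreatevg datavg < /tmp/datavg.pvs
```

Review the printed disk list against step 3a before running anything.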


4. If no PVID is returned at all, or the lqueryvg command hangs, then there is a disk problem. No LVM command will fix this. Contact the team that supports the disk type in use and have them resolve the problem.

Even a brand-new drive, or one completely clean of any LVM information should return with either a PVID or all zeroes:

# lspv | grep hdisk7
hdisk7          none                                None

# lquerypv -h /dev/hdisk7 80 10
00000080   00000000 00000000 00000000 00000000  |................|
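
The four cases above can be summarized as a small decision helper. This is only a sketch of the logic in this section: the function name classify_pvid is hypothetical, and it expects PVIDs that have already been extracted and normalized to the same form (for example 16 lowercase hex digits).

```shell
# $1 = PVID read from the disk with lquerypv ("" if the read returned nothing)
# $2 = PVID shown by lspv for that disk (or "none")
# $3 = whitespace-separated list of PVIDs found in the VGDA
classify_pvid() {
    disk_pvid=$1
    odm_pvid=$2
    vgda_pvids=$3
    case $disk_pvid in
        "")
            echo "case 4: nothing returned, suspect a disk problem"
            return ;;
        0000000000000000)
            echo "disk is readable but carries no PVID (new or clean disk)"
            return ;;
    esac
    if [ "$disk_pvid" = "$odm_pvid" ]; then
        echo "case 1: PVID matches lspv, try varyonvg"
        return
    fi
    for p in $vgda_pvids; do
        if [ "$p" = "$disk_pvid" ]; then
            echo "case 2: PVID is in the VGDA but not in the ODM"
            return
        fi
    done
    echo "case 3: PVID is not in the VGDA, verify the disk before recreatevg"
}
```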


Examples of errors that may accompany this issue.

This list is not complete, but may help you to identify the source of the problem. Many of these errors have been seen in conjunction with a disk marked as "missing".

LVM Errors you may see in errpt:

LABEL: LVM_SA_STALEPP
IDENTIFIER: EAA3D429
Description: PHYSICAL PARTITION MARKED STALE

LABEL: LVM_SA_PVMISS
IDENTIFIER: F7DDA124
Description: PHYSICAL VOLUME DECLARED MISSING

LABEL: LVM_SA_WRTERR
IDENTIFIER: 52715FA5
Description: FAILED TO WRITE VOLUME GROUP STATUS AREA

LABEL: LVM_IO_FAIL
IDENTIFIER: E86653C3
Description: I/O ERROR DETECTED BY LVM

LABEL: LVM_QUORUMNOQUORUM
IDENTIFIER: 5BEAD71B
Description: Activation of a no quorum volume group without 100% of the disks.

LABEL: LVM_MISSPVADDED
IDENTIFIER: 26120107
Description: PHYSICAL VOLUME DEFINED AS MISSING


Filesystem errors you may see in errpt:

LABEL: J2_METADATA_EIO
IDENTIFIER: 78ABDDEB
Description: META-DATA I/O ERROR

LABEL: J2_FSCK_REQUIRED
IDENTIFIER: B6DB68E0
Description: FILE SYSTEM RECOVERY REQUIRED

LABEL: J2_LOG_EIO
IDENTIFIER: C1348779
Description: LOG I/O ERROR


Disk errors that may be seen during this problem:

LABEL: SC_DISK_ERR1
IDENTIFIER: 747725D9

LABEL: SC_DISK_ERR2
IDENTIFIER: B6267342

LABEL: SC_DISK_ERR7
IDENTIFIER: DE3B8540

0403-031 The fork function failed. There is not enough memory available.



If this error is reported, the system cannot create new processes because memory is exhausted. Find and kill the process that is consuming the most memory; if that is not possible, reboot the server.
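
If a shell on the server can still start commands, the largest memory consumers can be ranked from ps output. A minimal sketch, assuming BSD-style "ps aux" output whose first four columns are USER, PID, %CPU and %MEM; the function name top_mem is hypothetical.

```shell
# Print "%MEM PID USER" for the N processes using the most memory.
# Reads "ps aux"-style output on stdin; $1 = N (default 5).
# Only the first four columns are used, so later columns may vary.
top_mem() {
    n=${1:-5}
    awk 'NR > 1 && NF >= 4 { print $4, $2, $1 }' | LC_ALL=C sort -rn | head -n "$n"
}

# On AIX the call would look like:
#   ps aux | top_mem 10
```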






AIX Boot and Console Logs

Boot Logs:  

Boot and Console messages can be used to identify and fix problems.
To view stored messages, use the alog command.

# alog -L
( -L lists the defined log types )

Example:
# alog -L
boot
bosinst
nim
console
cfg
mdmplog
dumpsymp
lvmt
lvmcfg



-o
    Lists the contents of the log file. Writes the contents of the log file to standard output in sequential order.

-t LogType
    Identifies a log defined in the alog configuration database. The alog command gets the log's file name and size from the alog configuration database. If LogFile does not exist, one is created.

          
# alog -L -t boot
    #file:size:verbosity
    /var/adm/ras/bootlog:131072:1
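
The attribute line shown above is easy to split in a script, for example to find where a log is stored before archiving it. A minimal sketch, assuming the output format shown here; the function name alog_attrs is hypothetical.

```shell
# Split the "file:size:verbosity" line printed by "alog -L -t <type>".
# Reads that output on stdin; the "#file:size:verbosity" header is skipped.
alog_attrs() {
    awk -F: '
        { sub(/^[ \t]+/, "") }          # drop leading indentation
        /^#/ || NF != 3 { next }        # skip the header and anything malformed
        { print "file=" $1; print "size=" $2; print "verbosity=" $3 }
    '
}

# On AIX the calls would look like:
#   alog -L -t boot | alog_attrs
#   alog -o -t boot          # print the boot log itself
```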


-L
    Lists the log types currently defined in the alog configuration database. If you use the -L flag with the -t LogType flag, the attributes for the specified LogType are listed. The current values of the File, Size, and Verbosity attributes are listed as colon-separated values:

            <File>:<Size>:<Verbosity>


========================================================================
To display all previous logins and logoffs, use the last command, which reads its records from /var/adm/wtmp.

Usage:
# last
# last root console
# last shutdown
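
For a quick overview, the output of last can be reduced to a count of sessions per user. This is a sketch: the function name logins_per_user is hypothetical, and it assumes the user name is the first field of each record and that the pseudo-entries "reboot", "shutdown" and the trailing "wtmp begins" line should be ignored.

```shell
# Count login records per user from "last" output read on stdin.
# Prints "<count> <user>", most frequent first, ties ordered by name.
logins_per_user() {
    awk 'NF > 0 && $1 != "wtmp" && $1 != "reboot" && $1 != "shutdown" { n[$1]++ }
         END { for (u in n) print n[u], u }' | sort -k1,1nr -k2,2
}

# On AIX the call would look like:
#   last | logins_per_user
```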