Friday, October 23, 2009

My Kids

Friday, February 20, 2009

Enabling large file support dynamically with VxFS

I recently encountered a VxFS file system that didn’t support largefiles. This issue was causing one of our Oracle databases to complain, which was preventing us from using datafiles optimized for our application access patterns. Since the file system was a Veritas File System (VxFS), I was able to fix this problem with the fsadm utility:

$ /usr/lib/fs/vxfs/fsadm -F vxfs -o largefiles /u01

$ mount -p | grep u01
/dev/vx/dsk/oradg/oravol01 - /u01 vxfs - no rw,suid,delaylog,largefiles,ioerror=mwdisable

This operation can be run against mounted live file systems, which is great for production environments.

Solaris notes

Disk Identification

The /dev/dsk and /dev/rdsk directories are checked for disk partition entries - that is, symbolic links with names of the form cN[tN]dNsN, where N represents a decimal number.

cN is the logical controller number, an arbitrary number assigned by this program to designate a particular disk controller. The first controller found the first time this program is run on a system is assigned number 0.

tN is the bus-address number of a subsidiary controller attached to a peripheral bus such as SCSI or IPI (the target number for SCSI, and the facility number for IPI controllers).

dN is the number of the disk attached to the controller, and

sN is the partition, or slice, number of the entry.

c0t0d0s0

c - controller

t - target (e.g. scsi target)

d - disk (usually 0)

s - slice
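To make the naming scheme concrete, here is a small sketch that splits a cN[tN]dNsN name into its parts. parse_disk_name is a made-up helper name for illustration, not a Solaris command, and it only understands the two forms described above.

```shell
# Split a disk name of the form cN[tN]dNsN into labelled parts.
# parse_disk_name is a hypothetical helper, not part of Solaris.
parse_disk_name() {
    name=${1##*/}    # drop a leading /dev/dsk/ or /dev/rdsk/ if present
    printf '%s\n' "$name" | sed -n \
        -e 's/^c\([0-9][0-9]*\)t\([0-9][0-9]*\)d\([0-9][0-9]*\)s\([0-9][0-9]*\)$/controller=\1 target=\2 disk=\3 slice=\4/p' \
        -e 's/^c\([0-9][0-9]*\)d\([0-9][0-9]*\)s\([0-9][0-9]*\)$/controller=\1 disk=\2 slice=\3/p'
}

parse_disk_name /dev/dsk/c0t3d0s2
```

Names that do not fit the pattern (an sd? style name, for instance) produce no output.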

To map sd? style disk names to this format, check the link displayed on a long listing.

ls -l /dev/sd?c


Solstice disk suite
The disk suite package normally lives in /usr/opt/SUNWmd/usr/sbin. Add this to the PATH of the root user.

PATH=$PATH:/usr/opt/SUNWmd/usr/sbin; export PATH


The examples for mirror disks use the idea of a "mirror top". This is the metadisk which is actually mounted, as opposed to the two "sides" of the mirror.
Decoding the md.tab

concatenation of 2

d7 2 1 /dev/dsk/c0t0d0s0 1 /dev/dsk/c1t0d0s0

stripe of 2, with 16k interlace

d8 1 2 /dev/dsk/c0t0d0s0 /dev/dsk/c1t0d0s0 -i 16k

concat of 2 stripes of 2

d3 2 2 /dev/dsk/c0t0d0s0 /dev/dsk/c1t0d0s0 -i 16k \
2 /dev/dsk/c0t0d1s0 /dev/dsk/c1t0d1s0

i.e. dx

mirror

d1 -m d2

where d1 is top
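Reading the concatenation and stripe examples above, an entry seems to be: the metadevice, the number of stripes, then for each stripe its width followed by its slices. The sketch below builds the two simple cases under that assumption. mdtab_stripe and mdtab_concat are made-up helper names, and the output is only text to paste into md.tab.

```shell
# mdtab_stripe METADEVICE INTERLACE SLICE...
# One stripe across all the slices: "dN 1 <width> <slices> -i <interlace>".
mdtab_stripe() {
    md=$1; interlace=$2; shift 2
    printf '%s 1 %d %s -i %s\n' "$md" "$#" "$*" "$interlace"
}

# mdtab_concat METADEVICE SLICE...
# One single-slice stripe per slice: "dN <count> 1 <slice> 1 <slice> ...".
mdtab_concat() {
    md=$1; shift
    line="$md $#"
    for slice in "$@"; do
        line="$line 1 $slice"
    done
    printf '%s\n' "$line"
}

mdtab_concat d7 /dev/dsk/c0t0d0s0 /dev/dsk/c1t0d0s0
mdtab_stripe d8 16k /dev/dsk/c0t0d0s0 /dev/dsk/c1t0d0s0
```

The two calls reproduce the d7 and d8 lines shown above.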
Activating the md.tab

metainit -a will read whole file

metainit d2 will initialise given disk

metainit d1 will initialise top of mirror

metattach d1 d3 will attach other side of mirror (synch done in background)

To detach one side of a mirror

metadetach [ -f ] mirror submirror

To attach the "second side"

metattach mirror submirror

Note that the newly attached side resilvers in the background automatically. Therefore, if multiple mirrors exist on the same physical disk, beware of maxing out the I/O on the physical drive when re-attaching several of them. It is possible to check how much of the work is complete as follows:

metastat | grep %
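For scripting, the percentage can be pulled out as a bare number. This sketch assumes the progress line reads "Resync in progress: N % done"; that wording is from memory and may differ between versions. resync_percent is a made-up helper name.

```shell
# Print the resync percentage(s) found in metastat output on stdin.
# Assumes lines of the form "Resync in progress: 15 % done".
resync_percent() {
    sed -n 's/.*Resync in progress: *\([0-9][0-9]*\) *%.*/\1/p'
}

# On a live system this would be:  metastat d0 | resync_percent
# Here a sample line stands in for the real output.
printf '    Resync in progress: 15 %% done\n' | resync_percent
```

Nothing is printed when no resync is running, which makes it easy to loop until the output is empty before re-attaching the next mirror.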


To clear errors on a metadisk
This procedure should only be applied where the error state is a result of SCSI errors - not when there is an actual error on the disk.

Note also that this process should only be applied for an unmounted disk.

metaclear clears the mdisk from the metadb. The reference will still exist in /usr/opt/SUNWmd/md.tab.

metainit re-reads the info in md.tab and activates the mdisk.

e.g.

# metastat d0
d0: Mirror
Submirror 0: d32
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 4140480 blocks

d32: Submirror of d0
State: Needs maintenance
Invoke: after replacing "Maintenance" components:
metareplace d0 c1t4d0s1
Size: 4140480 blocks
Stripe 0: (interlace: 64 blocks)
Device Start Block Dbase State Hot Spare
c1t3d0s0 0 No Okay
c1t4d0s1 0 No Last Erred
c1t5d0s3 0 No Last Erred

# umount
# metaclear -r d0
# metainit d32
# metainit d0
# mount

To clear a mirror disk where one is in maintenance/one in last err
This procedure should only be applied where the error state is a result of SCSI errors - not when there is an actual error on the disk.

metareplace -e mirror /dev/dsk/c?t?d?s? where the slice named is the maintenance side of the mirror.

Await resynching, then repeat for the last erred side of the mirror.

e.g.

# metastat d0

d0: Mirror
Submirror 0: d32
State: Needs maintenance
Submirror 1: d33
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 4149600 blocks

d32: Submirror of d0
State: Needs maintenance
Invoke: metareplace d0 c1t0d0s0
Size: 4149600 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c1t0d0s0 0 No Maintenance

d33: Submirror of d0
State: Needs maintenance
Invoke: after replacing "Maintenance" components:
metareplace d0 c1t1d0s0
Size: 4149600 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c1t1d0s0 0 No Last Erred


# metareplace -e d0 /dev/dsk/c1t0d0s0
# # wait for the resync to complete
# metareplace -e d0 /dev/dsk/c1t1d0s0
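The order matters (maintenance side first, last erred side second), so it can help to let a script read it off the metastat output. A rough sketch, assuming component rows look like the ones in the example above. replace_order is a made-up helper name, and on older systems nawk may be needed in place of awk.

```shell
# Read metastat output on stdin and list the components to re-enable,
# "Maintenance" ones first, then "Last Erred" ones.
replace_order() {
    awk '
        $1 ~ /^c[0-9]+(t[0-9]+)?d[0-9]+s[0-9]+$/ && /Maintenance[ \t]*$/ { maint[++m] = $1 }
        $1 ~ /^c[0-9]+(t[0-9]+)?d[0-9]+s[0-9]+$/ && /Last Erred[ \t]*$/  { erred[++e] = $1 }
        END {
            for (i = 1; i <= m; i++) print maint[i]
            for (i = 1; i <= e; i++) print erred[i]
        }'
}

# Sample rows copied from the metastat output above, deliberately out of order.
replace_order <<'EOF'
c1t1d0s0 0 No Last Erred
c1t0d0s0 0 No Maintenance
EOF
```

Each name printed is a candidate for metareplace -e, one at a time, waiting for the resync in between.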


Boot
Boot Block


To install the boot information at the start of a given disk (in a reserved area, so data is not overwritten):

installboot /usr/platform/architecture/lib/fs/ufs/bootblk /dev/rdsk/c?t?d?s?

where architecture is like sun4d. This can be ascertained from uname -m.
prtdiag

prtdiag displays system configuration and diagnostic information.

/usr/platform/architecture/sbin/prtdiag -v

where architecture is like sun4d. This can be ascertained from uname -m.

The boot prompt is ok
The boot command

boot for full, normal, multiuser
-r to reconfigure device tree (if devices added or removed)
-s single user
-v verbose
-w writable (for cdrom boot)
-a interactive - rarely used, this option asks location of kernel, root filesys type, etc

boot cdrom -svwr to boot from cdrom, single user,verbose, writable fsys, build device tree

boot kadb to boot with the kernel debugger
Getting to the prom
To be precise, ok is the "prom prompt" rather than the "boot prompt" but mostly, we just want to boot when there is an ok prompt. Sometimes, you may need to get to the prom on a running system. The method depends on the console you have. If it is a proper Sun console, there's a good chance that stop-A will work. Otherwise, try F5, Break or L1-A. Be aware that this can hang or kill the system, so use it with caution.
A hanging system
If the system is hanging or you are trying to crash it, be sure to sync the system before rebooting. This way, you have a chance of getting a core dump as the system comes back up.

ok sync

After reboot, if you don't have enough room in /var/crash (and, really, you should have a /var/crash which is larger than physical memory), you can attempt a manual savecore to another area.

e.g.

# cd /spare
# savecore -d .

While you're at the prom level

banner for additional info

reset run hardware diagnostics and reboot

help [command] list of commands or help on given command

probe-scsi-all look for attached SCSI devices

test [device|device-type] run selftests on device e.g. test net

printenv all bootflags

setenv e.g. setenv auto-boot? false

eeprom is the os-level equivalent of printenv and setenv

e.g.

eeprom auto-boot\?=false

Disk Formatting
Disk geometry

An example disk geometry from /etc/format.dat

disk_type = "SUN9.0G" \
: ctlr = SCSI : fmt_time = 4 \
: ncyl = 4924 : acyl = 2 : pcyl = 4926 : nhead = 27 \
: nsect = 133 : rpm = 7200

disk_type is a label

ncyl - number of cylinders available

acyl - alternate cylinders

pcyl - physical cylinders (ncyl + acyl)

nhead - number of heads

nsect - number of sectors per track: 1 sector = 1 block = 512 bytes

rpm - rotational speed in revolutions per minute

nsect * nhead = blocks per cylinder

ncyl * nsect * nhead / 2048 = useable space in MB
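As a worked example, the two formulas can be run against the SUN9.0G geometry above with plain shell arithmetic. The function names are made up for illustration.

```shell
# blocks_per_cyl NHEAD NSECT
blocks_per_cyl() { echo $(( $1 * $2 )); }

# usable_blocks NCYL NHEAD NSECT
usable_blocks() { echo $(( $1 * $2 * $3 )); }

# usable_mb NCYL NHEAD NSECT  (512-byte blocks, so 2048 blocks per MB)
usable_mb() { echo $(( $1 * $2 * $3 / 2048 )); }

blocks_per_cyl 27 133        # prints 3591
usable_blocks 4924 27 133    # prints 17682084
usable_mb 4924 27 133        # prints 8633
```

The division is integer division, so the MB figure is rounded down.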
VTOC

An example VTOC from /etc/format.dat

partition = "SUN9.0G_4-4GB"\
: disk = "SUN9.0G" : ctlr = SCSI\
: 2 = 0, 17682084\
: 0 = 0, 8618400\
: 1 = 2400, 438102\
: 3 = 2522, 8618400\
: 7 = 4922, 7182

partition is a label

disk matches a defined disk geometry

partition number = starting cylinder, length in blocks \

where \ is a continuation character

length in blocks should be a multiple of nsect * nhead
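The whole-cylinder rule is easy to check before writing a label. A minimal sketch using the lengths and geometry from the SUN9.0G example; check_slice is a made-up helper name.

```shell
# check_slice LENGTH NHEAD NSECT
# Succeeds if LENGTH (in blocks) is a whole number of cylinders.
check_slice() {
    if [ $(( $1 % ($2 * $3) )) -eq 0 ]; then
        echo "$1 blocks = $(( $1 / ($2 * $3) )) cylinders"
    else
        echo "$1 blocks is not a whole number of cylinders"
        return 1
    fi
}

# The four distinct lengths from the example VTOC, with nhead=27 and nsect=133.
for len in 17682084 8618400 438102 7182; do
    check_slice "$len" 27 133
done
```

For the example table this reports 4924, 2400, 122 and 2 cylinders, which also agrees with the starting cylinders listed (0, 2400, 2522, 4922).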

partition 2 is the whole disk. Length in blocks of partition 2 should be equal to useable size i.e. nsect * nhead * ncyl

Note that with the format menu, the shortest unique string will do - a is sufficient rather than analyze below.

It is often easier to use fmthard to duplicate a disk format

prtvtoc /dev/rdsk/c1t0d0s2 > /tmp/vtoc

fmthard -s /tmp/vtoc /dev/rdsk/c1t1d0s2


# format
Specify disk (enter its number): 3
selecting c0t3d0
[disk formatted]
Warning: Current Disk has mounted partitions.

FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
quit
format> analyze

ANALYZE MENU:
read - read only test (doesn't harm SunOS)
refresh - read then write (doesn't harm data)
test - pattern testing (doesn't harm data)
write - write then read (corrupts data)
compare - write, read, compare (corrupts data)
purge - write, read, write (corrupts data)
verify - write entire disk, then verify (corrupts data)
print - display data buffer
setup - set analysis parameters
config - show analysis parameters
quit


Package info

/var/sadm/install/contents holds details of all packages on the system. pkginfo gives a list of packages installed.
Photon arrays

Minimal configuration is 5 disks. The first slots to be filled must be 3 and 6 at the front and 0, 3 and 6 in the back. This minimum configuration is a requirement because circuitry on the disk drives regenerates and retimes the data signals, which corrects signal quality loss accumulated through the bypass circuitry between empty slots. Empty backplanes can generate errors which are intermittent and difficult to isolate. As disks are added they should be spaced to minimize the gaps between disks.
Incorrect screen size

stty rows 24
stty cols 80

Too many open files

The default value for maximum number of files open in Solaris 2.5.1 is 64 - this can be checked with ulimit -n. To set this to a higher default value in 2.5.1, add the following to /etc/system

set rlim_fd_max=value #(NOT greater than 1024)
set rlim_fd_cur=value

and reboot the system.

To set the value for one user or session:

ulimit -n value #(NOT greater than 1024)
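To avoid typing the limit wrong, something like the following can generate the two /etc/system lines and refuse values over the 1024 ceiling mentioned above. fd_limit_lines is a made-up helper name, and it only prints text; nothing is written to /etc/system.

```shell
# fd_limit_lines VALUE
# Print the /etc/system settings for VALUE, rejecting anything over 1024.
fd_limit_lines() {
    case $1 in
        ''|*[!0-9]*) echo "not a number: $1" >&2; return 1 ;;
    esac
    if [ "$1" -gt 1024 ]; then
        echo "refusing $1: must not be greater than 1024" >&2
        return 1
    fi
    printf 'set rlim_fd_max=%s\nset rlim_fd_cur=%s\n' "$1" "$1"
}

fd_limit_lines 256
```

The current per-session value can still be checked afterwards with ulimit -n.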

Installing Disks

* shut down the system (init 0)
* plug in the new disks, ensuring no duplicate scsi numbers within a controller. This can be checked with a probe-scsi-all
* do a reconfigure boot (boot -r)

If disks are added and the system is booted without a reconfigure, they can be "seen" by running the following:

drvconfig ensures the /devices tree is up to date and correct

disks creates the symbolic links to /devices in /dev/[r]dsk

tapes creates the symbolic links to /devices in /dev/[r]mt
Reading utmp and wtmp
/usr/bin/last converts /var/adm/wtmpx to readable data. cat /var/adm/utmp | /usr/lib/acct/fwtmp > /tmp/utmp.ascii can be used to make utmp readable.
swapfile

To add additional (temporary) swap, you can create a swap file on an existing filesystem

# mkfile 32M /u/swapfile
# swap -a /u/swapfile

Add a line like the following to /etc/vfstab to make it permanent

/u/swapfile - - swap - no -

kernel drivers

To find out which modules are currently loaded into the kernel:

# modinfo

Id Loadaddr Size Info Rev Module Name
5 600fa000 3b30 1 1 specfs (filesystem for specfs)
7 6011f000 2bc8 1 1 TS (time sharing sched class)
8 600f0a88 4a4 - 1 TS_DPTBL (Time sharing dispatch table)
9 60142000 234b8 2 1 ufs (filesystem for ufs)
10 6017c000 dc4b 226 1 rpcmod (RPC syscall)
10 6017c000 dc4b 1 1 rpcmod (rpc interface str mod)
11 60196000 277ef 0 1 ip (IP Streams module)
11 60196000 277ef 3 1 ip (IP Streams device)
12 60123550 127f 1 1 rootnex (sun4u root nexus)
. . .

The first field of output is an identifier. It is possible to unload a module from the kernel using that number. This will fail if the driver is busy.

# modunload -i 23
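Rather than reading the Id off the listing by eye, it can be looked up by module name. A small sketch, assuming the column layout shown above (module name in the sixth column). module_ids is a made-up helper name.

```shell
# module_ids NAME
# Read modinfo output on stdin and print the Id of each module called NAME.
module_ids() {
    awk -v name="$1" '$6 == name && !seen[$1]++ { print $1 }'
}

# On a live system this would be:  modinfo | module_ids rpcmod
# Here a few lines from the listing above stand in for the real output.
module_ids rpcmod <<'EOF'
Id Loadaddr Size Info Rev Module Name
9 60142000 234b8 2 1 ufs (filesystem for ufs)
10 6017c000 dc4b 226 1 rpcmod (RPC syscall)
10 6017c000 dc4b 1 1 rpcmod (rpc interface str mod)
EOF
```

A module that appears on several lines, like rpcmod, is reported once. The number printed is what modunload -i expects.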

To load a module

# cd /kernel/drv #(or wherever it lives e.g. /usr/kernel/drv )
# modload st

To unload the st driver

The st driver is used to control tape drives. If a tape drive definition has changed, the driver needs to be reloaded before it will use the new definitions. It is possible to manually load it as shown above, but it will be autoloaded as soon as a tape drive is used. Make sure none of your tape drives are in use:

# for tape in /dev/rmt/?
> do
> mt -f $tape rewind
> done
#

If you get an error like "/dev/rmt/0: no tape loaded or drive offline" don't worry it just means no tapes are loaded and therefore the device can't be in use.
Remote root login
It is, of course, a bad idea to allow network logins as root all the time but it can be extremely useful during initial install once you've got the network up. Edit /etc/default/login and comment out the following line to allow remote login by root.

CONSOLE=/dev/console

Remember to uncomment it when you're done

Network Printers
These days, most printers which are going to be accessed from a Sun box will already be set up and working on the office network. They will already have a name defined in DNS and/or a well known IP address. This can be added to /etc/hosts if required. Once you know where it is, you need to tell the print service as follows:

lpadmin -p printername -s printername

If the printer is not directly on the network but can be accessed via a defined print server, you can use the following form:

lpadmin -p printername -s print_server!printername

To make a printer the default for the server:

lpadmin -d printername

Pretty printing
A sub-topic for setting up printing is to actually print nicely. Desktop-oriented printers probably aren't going to cope well with unix style files but it is possible to get around this using the 'mp' print pre-processor. The simplest example is:

mp filename | lpr

Or

filep -l -h filename
# -l for landscape format, -h no header page

Look at the man page for more tips and tricks.
makewhatis
What do you do when you get annoying messages like

/usr/man/windex: No such file or directory

when you try to use man -k (also known as apropos)?

You need to find and use makewhatis to make your index.

Patching
However much - or little - patching you do, knowing the patch return codes is very useful. The most 'ignorable' codes are 2 (patch already applied) and 8 (attempt to patch a non-installed package). For a complete list, have a look here.

Courthney's Aching eye

Today Courthney kept complaining about her aching eyes, so my wife and I are thinking that maybe she needs glasses.

Wednesday, February 18, 2009

Fish...

Last week my wife bought an aquarium set (the one worth RM50). Yesterday the last fish died. My kids do not seem to be bothered, but Alex (my second) asked me what we should do with the fish. I did not know what to tell her; I just said we need to bury them.

Tuesday, February 17, 2009

US Visa interview Experience

Today we went to the US embassy for a tourist visa interview for the family. We arrived at the embassy at around 9:30am; our appointment was scheduled for 10am.

After some time queuing and being verified at the embassy gate, we were allowed to enter the embassy premises. We were then asked to wait in the waiting area. My wife and I anxiously waited for our number to be called while we listened to other people being interviewed by the consul. This made me nervous, since I heard that some were being denied. My wife noticed that the consul at window number 3 always approved the visa. We were hoping to be interviewed by that consul for that reason.

When our number was called (A101 was our number), we were asked to proceed to window #3. I think we were lucky to be called by the same consul at window 3. During the interview, the consul only asked about my monthly salary and the nature of my work, and looked at my employment letter. After less than 2 minutes of chatting with the consul, she told us our visa was approved and wished us a good trip. She then gave us instructions on how to pick up our passports the next day.

I think my wife and I were lucky today.

Sunday, February 15, 2009

Alex and Mica