Today, I got an error like this:
SUNW-MSG-ID: FMD-8000-2K, TYPE: Defect, VER: 1, SEVERITY: Minor
EVENT-TIME: Tue Aug 4 14:52:43 WIT 2009
PLATFORM: SUNW,SPARC-Enterprise, CSN: BEF09142B8, HOSTNAME: server
SOURCE: fmd-self-diagnosis, REV: 1.0
EVENT-ID: d232928b-dd11-e149-b515-82ff5df189f8
DESC: A Solaris Fault Manager component has experienced an error that required the module to be disabled. Refer to http://sun.com/msg/FMD-8000-2K for more information.
AUTO-RESPONSE: The module has been disabled. Events destined for the module will be saved for manual diagnosis.
IMPACT: Automated diagnosis and response for subsequent events associated with this module will not occur.
REC-ACTION: Use fmdump -v -u <EVENT-ID> to locate the module. Use fmadm reset <module> to reset the module.
root@server #
and this:
Aug 4 14:37:13 server ufs: NOTICE: alloc: /var: file system full
It seems the “/var” file system is full. OK, let's check disk space.
root@server # df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/md/dsk/d10         20G   7.0G    12G    37%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                    88G   1.6M    88G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
fd                       0K     0K     0K     0%    /dev/fd
/dev/md/dsk/d30         20G    20G     0K   100%    /var
swap                    88G    32K    88G     1%    /tmp
swap                    88G    72K    88G     1%    /var/run
/dev/md/dsk/d40         12G    12M    12G     1%    /oracle
/dev/md/dsk/d50         36G    37M    36G     1%    /internaldisk1
That's correct: “/var” is at 100%.
Now, let's check what's going on in the “/var” directory.
root@server # cd /var/adm/
root@server # ls -ltr
total 4242
-rw-rw-rw-   1 root  bin         0 Aug 25  2008 spellhist
-rw-------   1 uucp  bin         0 Aug 25  2008 aculog
drwxr-xr-x   2 adm   adm       512 Jun 25 18:15 exacct
drwxr-xr-x   2 adm   adm       512 Jun 25 18:15 log
drwxr-xr-x   2 root  sys       512 Jun 25 18:15 streams
drwxr-xr-x   2 root  sys       512 Jun 25 18:19 pool
drwxrwxr-x   5 adm   adm       512 Jun 25 18:26 acct
drwxrwxr-x   2 adm   sys       512 Jun 25 18:26 sa
drwxr-xr-x   2 root  sys       512 Jun 25 18:57 sm.bin
-rw-r--r--   1 root  root        0 Jun 25 19:25 vold.log
-rw-r--r--   1 root  root    34472 Jun 25 19:27 messages.3
-rw-r--r--   1 root  root    73131 Jul  2 13:40 messages.2
-rw-r--r--   1 root  root      249 Jul 24 18:35 messages.1
-rw-r--r--   1 root  root  1906434 Jul 31 20:15 messages.0
-r--r--r--   1 root  root       28 Aug  4 15:18 lastlog
-rw-r--r--   1 root  bin      2232 Aug  4 15:18 utmpx
-rw-r--r--   1 adm   adm     51708 Aug  4 15:18 wtmpx
-rw-r--r--   1 root  root    81457 Aug  4 15:19 messages
wtmpx, utmpx, and messages are all normal in size.
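Rather than judging file sizes from the ls output by eye, find can list only the files above a size threshold. This is a minimal sketch, not what I actually ran: the function name files_larger_than and its default threshold are my own illustration, and it relies only on the standard find options -xdev, -type and -size (which counts in 512-byte blocks).

```shell
#!/bin/sh
# files_larger_than [DIR] [BLOCKS]  (hypothetical helper name)
# List regular files under DIR that are larger than BLOCKS 512-byte blocks,
# without crossing into other mounted file systems (-xdev).
files_larger_than() {
    dir=${1:-/var}
    blocks=${2:-204800}    # 204800 * 512 bytes = 100 MB
    find "$dir" -xdev -type f -size +"$blocks" -exec ls -l {} \;
}
```

For example, files_larger_than /var 204800 would print a long listing of every file over roughly 100 MB on the /var file system.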
Check crash dump and core files:
root@server # dumpadm
      Dump content: kernel pages
       Dump device: /dev/md/dsk/d30 (dedicated)
Savecore directory: /var/crash/server
  Savecore enabled: yes
root@server #
root@server # cd /var/crash/
root@server # ls
server
root@server # ls -ltr
total 2
drwx------   2 root  root  512 Jun 25 19:25 server
root@server # cd server/
root@server # ls
root@server # ls -ltr
total 0
OK, no crash dump or core dump on /var.
Finally, after checking each directory under “/var” one by one, I found that “/var/fm” was using 18GB of space.
root@server # du -sh /var/fm/fmd
18G fmd
root@server # pwd
/var/fm/fmd
root@server # ls
ckpt           core.fmd.1506  core.fmd.1726  core.fmd.1948  core.fmd.2168
core.fmd.1284  core.fmd.1508  core.fmd.1730  core.fmd.1950  core.fmd.2170
core.fmd.1287  core.fmd.1510  core.fmd.1732  core.fmd.1952  core.fmd.2172
core.fmd.1290  core.fmd.1512  core.fmd.1734  core.fmd.1954  core.fmd.2174
core.fmd.1293  core.fmd.1514  core.fmd.1736  core.fmd.1956  core.fmd.2176
core.fmd.1295  core.fmd.1516  core.fmd.1738  core.fmd.1958  core.fmd.2178
core.fmd.1297  core.fmd.1518  core.fmd.1740  core.fmd.1960  core.fmd.2180
core.fmd.1299  core.fmd.1520  core.fmd.1742  core.fmd.1962  core.fmd.2182
core.fmd.1302  core.fmd.1522  core.fmd.1744  core.fmd.1964  core.fmd.2184
core.fmd.1304  core.fmd.1526  core.fmd.1746  core.fmd.1966  core.fmd.2186
core.fmd.1307  core.fmd.1528  core.fmd.1748  core.fmd.1968  core.fmd.2188
core.fmd.1310  core.fmd.1530  core.fmd.1750  core.fmd.1970  core.fmd.2190
core.fmd.1312  core.fmd.1532  core.fmd.1752  core.fmd.1972  core.fmd.2192
core.fmd.1314  core.fmd.1534  core.fmd.1754  core.fmd.1974  core.fmd.2194
core.fmd.1316  core.fmd.1536  core.fmd.1756  core.fmd.1976  core.fmd.2196
etc…
I found a lot of “core.fmd.xxx” files in “/var/fm/fmd”.
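The directory-by-directory search described above could probably be shortened by letting du and sort rank the entries. The following is only a sketch of that idea: largest_dirs is a name I made up for illustration, and it uses nothing beyond du -sk, sort -n and tail.

```shell
#!/bin/sh
# largest_dirs [DIR] [COUNT]  (hypothetical helper name)
# Print the COUNT largest entries directly under DIR.
# Sizes are in kilobytes and the largest entry is printed last.
largest_dirs() {
    dir=${1:-/var}
    count=${2:-10}
    # du -s gives one total per argument, -k reports it in kilobytes.
    du -sk "$dir"/* 2>/dev/null | sort -n | tail -"$count"
}
```

Running largest_dirs /var and then repeating it on the biggest entry (for example largest_dirs /var/fm) should lead to the culprit in a couple of steps.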
As it turns out, this error is related to Solaris Fault Management and is generated by the fmd service on Solaris.
What is the fmd service?
- fmd is a daemon that runs in the background on each Solaris system. fmd receives telemetry information relating to problems detected by the system software, diagnoses these problems, and initiates proactive self-healing activities such as disabling faulty components. When appropriate, the fault manager also sends a message to the syslogd(1M) service to notify an administrator that a problem has been detected. The message directs administrators to a knowledge article on Sun’s web site, http://www.sun.com/msg/, which explains more about the problem impact and appropriate responses.
Each problem diagnosed by the fault manager is assigned a Universal Unique Identifier (UUID). The UUID uniquely identifies this particular problem across any set of systems. The fmdump(1M) utility can be used to view the list of problems diagnosed by the fault manager, along with their UUIDs and knowledge article message identifiers. The fmadm(1M) utility can be used to view the resources on the system believed to be faulty. The fmstat(1M) utility can be used to report statistics kept by the fault manager. The fault manager is started automatically when Solaris boots, so it is not necessary to use the fmd command directly. Sun’s web site explains more about what capabilities are currently available for the fault manager on Solaris.
The Solaris Fault Management Facility is designed to be integrated into the Service Management Facility to provide a self-healing capability to Solaris 10 systems. The fmd daemon is responsible for monitoring several aspects of system health.
My “/var/fm/fmd” filled up because today I was working on IPMP configuration and testing, and there was a problem with my network connection to the Cisco switch. That is why Solaris created a lot of “core.fmd” files.
You can run “fmdump -v -u <EVENT-ID>” to locate the module.
Use “fmadm reset <module>” to reset the module.
Example:
# fmdump -v -u 815bf413-9de6-4667-e118-93dc3bc33e71
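Put together, the recommended action from the error message might look like the sketch below. It is a dry-run wrapper, not something I ran on this server: DRY_RUN, run and reset_fm_module are illustrative names of my own, and the module name has to come from the fmdump output. Only fmdump -v -u, fmadm config and fmadm reset are real Solaris commands here.

```shell
#!/bin/sh
# Sketch only: DRY_RUN and run() are illustrative helpers, not part of Solaris.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# reset_fm_module EVENT_ID MODULE  (hypothetical helper name)
reset_fm_module() {
    event_id=$1
    module=$2
    run fmdump -v -u "$event_id"    # show the event to locate the module
    run fmadm config                # list fault manager modules and their status
    run fmadm reset "$module"       # reset the module named in the event
}
```

Calling reset_fm_module with the EVENT-ID from the message and the module name prints the three commands first, and setting DRY_RUN=0 would execute them for real.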
As a temporary solution, I disabled the “fmd” service and removed all core.fmd files from the “/var/fm/fmd” directory.
root@server # svcs fmd
STATE          STIME    FMRI
online         Jul_29   svc:/system/fmd:default
root@server # svcadm disable fmd
root@server # svcs fmd
STATE          STIME    FMRI
disabled       16:48:03 svc:/system/fmd:default
root@server # rm core.fmd.*
I'll re-enable the fmd service once I have finished my IPMP configuration.
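A slightly more careful variant of the cleanup, which does not depend on the current directory and prints each file as it is removed, could look like this. It is a sketch under my own assumptions: remove_fmd_cores is a made-up name, and the default path is simply the directory from this incident.

```shell
#!/bin/sh
# remove_fmd_cores [DIR]  (hypothetical helper name)
# Delete regular files named core.fmd.* under DIR, printing each path first.
# Anything else (such as the ckpt directory) is left alone.
remove_fmd_cores() {
    dir=${1:-/var/fm/fmd}
    find "$dir" -type f -name 'core.fmd.*' -print -exec rm -f {} \;
}
```

Once the IPMP work is done, svcadm enable fmd followed by svcs fmd should show the service online again.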
Source:
Fault Manager Daemon man pages
http://www.princeton.edu/~unix/Solaris/troubleshoot/fm.html