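Grid Control started out as an experiment but has gradually become an important part of our monitoring stack, so I rebuilt it on a more stable server (RHEL 5.3). A few rebels turned up along the way; they have been put down.
1. Hunting for libdb.so.2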
During installation the following error appeared (screenshot):

It says OPMN could not be started. Checking opmn status shows HTTP_Server in the Down state:
[oracle@query ~]$ cd $OMS_HOME/opmn/bin
[oracle@query bin]$ ./opmnctl status
Processes in Instance: EnterpriseManager0.query.rwdata
-------------------+--------------------+---------+---------
ias-component | process-type | pid | status
-------------------+--------------------+---------+---------
DSA | DSA | N/A | Down
HTTP_Server | HTTP_Server | N/A | Down
LogLoader | logloaderd | N/A | Down
dcm-daemon | dcm-daemon | N/A | Down
OC4J | home | 12214 | Alive
WebCache | WebCache | 12224 | Alive
WebCache | WebCacheAdmin | 12215 | Alive
Stopping the other components and restarting OPMN did not help; HTTP_Server was still a tragedy:
[oracle@query bin]$ ./opmnctl stopproc process-type=home
opmnctl: stopping opmn managed processes...
[oracle@query bin]$ ./opmnctl stopproc process-type=WebCache
opmnctl: stopping opmn managed processes...
[oracle@query bin]$ ./opmnctl stopproc process-type=WebCacheAdmin
opmnctl: stopping opmn managed processes...
[oracle@query bin]$ ./opmnctl startall
opmnctl: starting opmn and all managed processes...
================================================================================
opmn id=query.rwdata:6202
3 of 4 processes started.
ias-instance id=EnterpriseManager0.query.rwdata
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ias-component/process-type/process-set:
HTTP_Server/HTTP_Server/HTTP_Server
Error
--> Process (pid=12409)
failed to start a managed process after the maximum retry limit
Log:
/oracle/gridcontrol/oms10g/opmn/logs/HTTP_Server~1
Checking the log /oracle/gridcontrol/oms10g/opmn/logs/HTTP_Server~1 suggests that libdb.so.2 is tangled up in this somehow...
10/05/28 13:37:16 Start process
--------
/oracle/gridcontrol/oms10g/Apache/Apache/bin/apachectl start: execing httpd
/oracle/gridcontrol/oms10g/Apache/Apache/bin/httpd: error while loading shared libraries: libdb.so.2: cannot open shared object file: No such file or directory
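The missing library name can also be pulled out mechanically instead of by eye. Below is a minimal sketch, assuming the log line format shown above; the function names missing_libs_from_log and unresolved_libs are my own (hypothetical) and are not part of any Oracle tool.

```shell
# Print each library named in an "error while loading shared libraries"
# message found in the given log file (one name per line, de-duplicated).
missing_libs_from_log() {
    sed -n 's/.*error while loading shared libraries: \([^:]*\): cannot open shared object file.*/\1/p' "$1" | sort -u
}

# Ask the dynamic loader directly: print every library the binary needs
# that ldd reports as "not found".
unresolved_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Example usage (paths taken from the log above):
#   missing_libs_from_log /oracle/gridcontrol/oms10g/opmn/logs/HTTP_Server~1
#   unresolved_libs /oracle/gridcontrol/oms10g/Apache/Apache/bin/httpd
```

Running unresolved_libs against httpd before starting the installer would probably have exposed the problem without waiting for OPMN to give up.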
Trying to start the OMS through emctl produces the same complaint: libdb.so.2 is talked about everywhere but nowhere to be seen...
[oracle@query bin]$ cd $OMS_HOME/bin
[oracle@query bin]$ ./emctl start oms
Oracle Enterprise Manager 10g Release 3 Grid Control
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
opmnctl: opmn is already running
ADMN-202027
A problem has occurred reading the initial configuration and storing it into repository
Resolution:
Please refer to the base exception for resolution, or call Oracle support.
Base Exception:
/oracle/gridcontrol/oms10g/Apache/Apache/bin/httpd: error while loading shared libraries: libdb.so.2: cannot open shared object file: No such file or directory
Resolution:
Please make sure the values entered in OHS configuration files are correct.
oracle.ias.sysmgmt.exception.InvalidConfigurationException: Base Exception:
/oracle/gridcontrol/oms10g/Apache/Apache/bin/httpd: error while loading shared libraries: libdb.so.2: cannot open shared object file: No such file or directory
Resolution:
Please make sure the values entered in OHS configuration files are correct.
at oracle.ias.sysmgmt.repository.plugin.advanced.apache.StateTranslator.checkConfigFileValidity(Unknown Source)
at oracle.ias.sysmgmt.repository.plugin.advanced.apache.StateTranslator.validateConfigDuringEvaluate(Unknown Source)
at oracle.ias.sysmgmt.repository.plugin.advanced.apache.PlugInImpl.localConfigValidation(Unknown Source)
at oracle.ias.sysmgmt.repository.DcmPlugin.localConfigValidation(Unknown Source)
at oracle.ias.sysmgmt.repository.RepositoryImpl.performLocalValidation(Unknown Source)
at oracle.ias.sysmgmt.repository.SyncUpHandler._updatePluginConfigData(Unknown Source)
at oracle.ias.sysmgmt.repository.SyncUpHandler.syncUpFromLocalFiles(Unknown Source)
at oracle.ias.sysmgmt.repository.RepositoryImpl.syncUpFromLocalFiles(Unknown Source)
at oracle.ias.sysmgmt.utility.editpropagator.PropagateLocalEdit.repositoryInit(Unknown Source)
at oracle.ias.sysmgmt.persistence.utility.PMUtility.initConfiguration(Unknown Source)
at oracle.ias.sysmgmt.task.TaskMaster.initConfiguration(Unknown Source)
at oracle.ias.sysmgmt.task.TaskMaster.sysInit(Unknown Source)
at oracle.ias.sysmgmt.task.TaskMaster.sysInit(Unknown Source)
at oracle.ias.sysmgmt.task.InstanceManager.sysInit(Unknown Source)
at oracle.ias.sysmgmt.task.InstanceManager.init(Unknown Source)
at oracle.ias.sysmgmt.cmdline.DcmCmdLine.checkInit(Unknown Source)
at oracle.ias.sysmgmt.cmdline.DcmCmdLine.execute(Unknown Source)
at oracle.ias.sysmgmt.cmdline.DcmCmdLine.main(Unknown Source)
Starting HTTP Server ...
Starting Oracle Management Server ...
Checking Oracle Management Server Status ...
Oracle Management Server is not functioning because of the following reason:
Unexpected error occurred. Check error and log files.
Fix: create a symbolic link /usr/lib/libdb.so.2 that points to /usr/lib/libgdbm.so.2.0.0
[oracle@query bin]$ su - root
Password:
[root@query ~]# ln -s /usr/lib/libgdbm.so.2.0.0 /usr/lib/libdb.so.2
[root@query ~]# chmod 755 /usr/lib/libgdbm.so.2.0.0
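For repeated installs, the same link can be created a little more defensively. This is only a sketch: link_if_missing is a hypothetical helper name, and the paths in the usage comment are the ones from this post, which may differ on other systems.

```shell
# Create the symlink "link" pointing at "target", but only when the target
# exists and nothing is already present at the link path.
link_if_missing() {
    target=$1
    link=$2
    if [ ! -e "$target" ]; then
        echo "target not found: $target" >&2
        return 1
    fi
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "already present, leaving untouched: $link" >&2
        return 0
    fi
    ln -s "$target" "$link"
}

# Example usage (as root), with the paths used in this post:
#   link_if_missing /usr/lib/libgdbm.so.2.0.0 /usr/lib/libdb.so.2
```

Refusing to overwrite an existing path avoids clobbering a real libdb.so.2 if one is installed later.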
Restart OPMN, and HTTP Server comes back to life on the spot:
[root@query ~]# su - oracle
[oracle@query ~]$ source .bash_profile_rwgrid
[oracle@query ~]$ cd $OMS_HOME/opmn/bin
[oracle@query bin]$ ./opmnctl stopproc process-type=home
opmnctl: stopping opmn managed processes...
[oracle@query bin]$ ./opmnctl stopproc process-type=WebCache
opmnctl: stopping opmn managed processes...
[oracle@query bin]$ ./opmnctl stopproc process-type=WebCacheAdmin
opmnctl: stopping opmn managed processes...
[oracle@query bin]$ ./opmnctl startall
opmnctl: starting opmn and all managed processes...
[oracle@query bin]$
[oracle@query bin]$
[oracle@query bin]$ ./opmnctl status
Processes in Instance: EnterpriseManager0.query.rwdata
-------------------+--------------------+---------+---------
ias-component | process-type | pid | status
-------------------+--------------------+---------+---------
DSA | DSA | N/A | Down
HTTP_Server | HTTP_Server | 13829 | Alive
LogLoader | logloaderd | N/A | Down
dcm-daemon | dcm-daemon | N/A | Down
OC4J | home | 13830 | Alive
WebCache | WebCache | 13841 | Alive
WebCache | WebCacheAdmin | 13832 | Alive
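The status table is easy to filter when only the unhealthy rows matter. The sketch below assumes the pipe-separated layout shown above; not_alive is a hypothetical function name. Note that DSA, LogLoader and dcm-daemon stayed Down in this post even after the fix, so a Down row is not automatically a fault.

```shell
# Read "opmnctl status" output on stdin and print component/process-type
# for every row whose status column is anything other than Alive.
not_alive() {
    awk -F'|' 'NF == 4 && $4 !~ /status/ && $4 !~ /Alive/ {
        gsub(/[ \t]/, "", $1)
        gsub(/[ \t]/, "", $2)
        print $1 "/" $2
    }'
}

# Example usage:
#   cd $OMS_HOME/opmn/bin && ./opmnctl status | not_alive
```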
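Unfortunately, the interrupted installation cannot be resumed from where it stopped... the only option is to start over - -||
2. Agents return home
A new OMS lord has taken the throne, so the scattered Agent brothers need to be rallied once more...
The method is simple: stop the Agent, edit $AGENT_HOME/sysman/config/emd.properties so that the relevant URLs point to the new OMS, then restart the Agent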
REPOSITORY_URL=https://query.rwdata:1159/em/upload
emdWalletSrcUrl=http://b2b.rwdata:4889/em/wallets/emd
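With more than a handful of Agents, the edit is worth scripting. A minimal sketch follows; set_prop is a hypothetical helper name, and it assumes each key appears at the start of a line as key=value and that the new value contains neither "|" nor "&" (both are special in the sed expression used here).

```shell
# Replace the value of an existing key=value line in a properties file,
# keeping a backup copy of the original as <file>.bak.
set_prop() {
    file=$1
    key=$2
    value=$3
    if ! grep -q "^$key=" "$file"; then
        echo "key not found: $key" >&2
        return 1
    fi
    sed -i.bak "s|^$key=.*|$key=$value|" "$file"
}

# Example usage on an Agent host (URL taken from this post):
#   $AGENT_HOME/bin/emctl stop agent
#   set_prop $AGENT_HOME/sysman/config/emd.properties \
#       REPOSITORY_URL https://query.rwdata:1159/em/upload
#   $AGENT_HOME/bin/emctl start agent
```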
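Some documents say the Agent's existing configuration files must be deleted and rebuilt. In my testing with version 10.2.0.5, simply restarting the Agent is enough, and the existing host, database, and listener names are all retained.
Of course, with bad luck... you may end up having to rebuild the Agent... In that case, first remove the Agent and Target configuration from the OMS. Targets can be deleted through the web interface, and the PL/SQL to delete an Agent is as follows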
-- run against the OMS repository database (typically as SYSMAN)
begin
mgmt_admin.cleanup_agent('hostname:port');
commit;
end;
/
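I recommend deleting the Targets first and the Agent afterwards; otherwise leftover rows may remain in the OMS repository tables and cause an error when the Target is added again later, like this: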
java.sql.SQLException: ORA-20600: The specified target is in the process of being deleted.(target name = RWDB)(target type = oracle_database)(target guid = DBCB2D54577145C54B2A6188B5101F4F)
ORA-06512: at "SYSMAN.TARGETS_INSERT_TRIGGER", line 46
ORA-04088: error during execution of trigger 'SYSMAN.TARGETS_INSERT_TRIGGER'
ORA-06512: at "SYSMAN.EM_TARGET", line 2117
ORA-06512: at "SYSMAN.MGMT_TARGET", line 2701
ORA-06512: at line 1
In that case the leftover targets have to be deleted manually with PL/SQL
begin
mgmt_admin.delete_target_internal('RWDB', 'oracle_database');
commit;
end;
/
begin
mgmt_admin.delete_target_internal('LISTENER_RWDB', 'oracle_listener');
commit;
end;
/
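One last reminder: after rebuilding the OMS, don't forget to add each Agent's hostname and IP address to /etc/hosts on the OMS host... otherwise operating on or viewing Agent information from the web console will be affected... To be honest... your humble servant was careless and was stuck on this embarrassingly silly problem for two whole days...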
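A quick check like the one below would have saved me those two days. It is only a sketch: hosts_has is a hypothetical function name, and the Agent hostnames in the usage comment are placeholders, not real machines from this setup.

```shell
# Succeed if the given hostname appears as a name or alias in a hosts file
# (default /etc/hosts). Comments are ignored; the match is case-insensitive.
hosts_has() {
    awk -v h="$1" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if (tolower($i) == tolower(h)) found = 1 }
        END { exit !found }
    ' "${2:-/etc/hosts}"
}

# Example usage on the OMS host (placeholder hostnames):
#   for a in agent1.rwdata agent2.rwdata; do
#       hosts_has "$a" || echo "missing from /etc/hosts: $a"
#   done
```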