How to Fix Problems with the ocssd Process
Updated by: HHH   Date: 2023-1-7


Many people without hands-on experience are at a loss when the ocssd process misbehaves. This article summarizes why the problem occurs and how to fix it.

Troubleshooting the ocssd process:

Yesterday a database user reported that on their database server, entries like the following were being written to /var/log/messages every 5 seconds:
Feb 27 08:11:44 bj su(pam_unix)[7692]: session opened for user oracle by (uid=0)
Feb 27 08:11:44 bj su(pam_unix)[7692]: session closed for user oracle
Feb 27 08:11:44 bj logger: Failure in CSS initialization opening OCR.
Feb 27 08:11:49 bj su(pam_unix)[7731]: session opened for user oracle by (uid=0)
Feb 27 08:11:49 bj su(pam_unix)[7731]: session closed for user oracle
Feb 27 08:11:49 bj logger: Failure in CSS initialization opening OCR.

For comparison, here is the same file on another 10g database server:
Feb 28 10:02:27 bj sshd(pam_unix)[5985]: session opened for user lisa by (uid=502)
Feb 28 10:06:40 bj sshd(pam_unix)[5985]: session closed for user lisa
Feb 28 15:31:17 bj sshd(pam_unix)[6115]: session opened for user lisa by (uid=502)
Feb 28 15:32:09 bj sshd(pam_unix)[6115]: session closed for user lisa
Mar  1 10:19:54 bj sshd(pam_unix)[15042]: session opened for user lisa by (uid=502)
Mar  1 10:54:29 bj su(pam_unix)[15086]: session opened for user root by lisa(uid=502)
Mar  1 10:54:33 bj su(pam_unix)[15119]: session opened for user oracle by lisa(uid=0)
Mar  1 12:12:30 bj su(pam_unix)[15189]: session opened for user root by lisa(uid=501)

This log only records user logins and su sessions. The number in square brackets is the process ID, and the uid in parentheses is the user ID.
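To pull those two fields out of a log line mechanically, here is a minimal sketch using sed. The function name parse_pam_line is my own, and it assumes the pam_unix line format shown above; lines without a uid (such as "session closed") report the uid as "-".

```shell
#!/usr/bin/env bash
# Sketch: extract the PID (square brackets) and uid (parentheses)
# from a pam_unix line in /var/log/messages. Function name is illustrative.
parse_pam_line() {
  local line=$1 pid uid
  # The PID follows the program tag, e.g. "su(pam_unix)[7692]:"
  pid=$(printf '%s\n' "$line" | sed -n 's/.*(pam_unix)\[\([0-9]*\)].*/\1/p')
  # The uid appears as "(uid=502)" on "session opened" lines only
  uid=$(printf '%s\n' "$line" | sed -n 's/.*(uid=\([0-9]*\)).*/\1/p')
  printf 'pid=%s uid=%s\n' "$pid" "${uid:--}"
}

# Usage:
#   parse_pam_line 'Feb 28 10:02:27 bj sshd(pam_unix)[5985]: session opened for user lisa by (uid=502)'
#   -> pid=5985 uid=502
```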

On the problem server I checked the bdump and udump directories and the alert.log file; none of them contained anything abnormal.
Checking the system processes:
[oracle@db1 udump]$ ps -ef | grep css
root      5716     1  0 Jan11 ?        00:00:00 /bin/sh /etc/init.d/init.cssd run
root      5721  5716  0 Jan11 ?        00:13:17 /bin/sh /etc/init.d/init.cssd startcheck
oracle   17210  5844  0 14:02 pts/2    00:00:00 grep css

The processes on a server where CSS works correctly:
[root@bj log]# ps -ef | grep css
root      4669     1  0  2004 ?        00:00:00 /bin/su oracle -c exec /home/oracle/product/10.1.0/db_1/bin/ocssd
oracle    4771  4669  0  2004 ?        00:25:53 /home/oracle/product/10.1.0/db_1/bin/ocssd.bin
root     15278 15225  0 14:05 pts/0    00:00:00 grep css
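The comparison above (the healthy server runs ocssd.bin, while the problem server only shows init.cssd stuck in startcheck) can be expressed as a small classifier over `ps -ef` output. This is a sketch; the function name and the three labels are my own, not Oracle terminology.

```shell
#!/usr/bin/env bash
# Sketch: classify `ps -ef` output read from stdin. Labels are illustrative.
classify_css() {
  local ps_out
  ps_out=$(cat)
  if grep -q 'ocssd\.bin' <<<"$ps_out"; then
    echo "running"              # ocssd.bin is up, as on the healthy server
  elif grep -q 'init\.cssd startcheck' <<<"$ps_out"; then
    echo "stuck-in-startcheck"  # init.cssd keeps checking but ocssd never starts
  else
    echo "not-configured"       # no CSS processes at all
  fi
}

# Usage:
#   ps -ef | classify_css
```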

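Next I looked at /etc/init.d/init.cssd but got nothing out of it; the script is long and I did not read it carefully.

The CSS section of the Oracle documentation says: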
Oracle Cluster Synchronization Services (CSS) is a daemon process that is configured by the root.sh script when you install Oracle Database 10g for the first time. It is configured to start every time the system boots. This daemon process is required to enable synchronization between Oracle ASM and database instances. It must be running if an Oracle database is using ASM for database file storage.
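In short, CSS is configured at installation and started automatically at boot, and it synchronizes ASM with the database instances; it is mandatory only when ASM is used.

That eased my mind somewhat: this database does not use ASM, so if all else failed, the process could simply be stopped.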

Next I read the section of the Oracle documentation on Reconfiguring Oracle Cluster Synchronization Services. Excerpts follow:
1. Identifying Oracle Database 10g Oracle Homes
To identify all of the Oracle Database 10g Oracle home directories, enter one of the following commands:
$ more /etc/oratab

Here is the result on my server:
[root@bj log]# more /etc/oratab
#

# This file is used by ORACLE utilities.  It is created by root.sh
# and updated by the Database Configuration Assistant when creating
# a database.

# A colon, ':', is used as the field terminator.  A new line terminates
# the entry.  Lines beginning with a pound sign, '#', are comments.
#
# Entries are of the form:
#   $ORACLE_SID:$ORACLE_HOME::
#
# The first and second fields are the system identifier and home
# directory of the database respectively.  The third filed indicates
# to the dbstart utility that the database should , "Y", or should not,
# "N", be brought up at system boot time.
#
# Multiple entries with the same $ORACLE_SID are not allowed.
#
#
# *:/home/oracle/product/10.1.0/db_1:N
$ORACLE_SID:/home/oracle/product/10.1.0/db_1:N
*:/home/oracle/product/10.1.0/db_1:N
$ORACLE_SID:/home/oracle/product/10.1.0/db_1:N

From the output, identify any Oracle home directories where Oracle Database 10g is installed. Oracle homes that contain Oracle Database 10g typically have paths similar to the following. However, they might use different paths.
/mount_point/app/oracle/product/10.1.0/db_n
If there is only one Oracle home directory that contains Oracle Database 10g, see the "Deleting the Oracle CSS Daemon Configuration" section for information about deleting the Oracle CSS daemon configuration.
If you identify more than one Oracle Database 10g Oracle home directory, see the following section for information about reconfiguring the Oracle CSS daemon.
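The decision rule in this excerpt (one 10g home: delete the CSS configuration; more than one: reconfigure CSS to run from a home you keep) can be sketched as below. It assumes every home listed in the oratab-style file is a 10g home, and it only applies to the deinstallation scenario the excerpt describes; the function names are my own.

```shell
#!/usr/bin/env bash
# Sketch: list distinct Oracle homes in an oratab-style file
# (SID:ORACLE_HOME:Y|N), skipping comments and blank lines.
oracle_homes() {
  local oratab=${1:-/etc/oratab}
  awk -F: '!/^[ \t]*#/ && NF >= 2 && $2 != "" { print $2 }' "$oratab" | sort -u
}

# Apply the excerpt's rule. Assumes every listed home is a 10g home.
css_action() {
  local n
  n=$(oracle_homes "${1:-/etc/oratab}" | grep -c . || true)
  if [ "$n" -eq 0 ]; then
    echo "none"
  elif [ "$n" -eq 1 ]; then
    echo "delete"       # see "Deleting the Oracle CSS Daemon Configuration"
  else
    echo "reconfigure"  # see "Reconfiguring the Oracle CSS Daemon"
  fi
}

# Usage:
#   oracle_homes /etc/oratab
#   css_action /etc/oratab
```

On the oratab shown above, the three non-comment entries all point to the same home, so this sketch would report a single home.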

2. Reconfiguring the Oracle CSS Daemon
To reconfigure the Oracle CSS daemon so that it runs from an Oracle home that you are not removing, follow these steps:
In all Oracle home directories on the system, stop all Oracle ASM instances and any Oracle Database instances that use ASM for database file storage.
Switch user to root.
Depending on your operating system, enter one of the following commands to identify the Oracle home directory being used to run the CSS daemon:
# more /etc/oracle/ocr.loc
The output from this command is similar to the following:
ocrconfig_loc=/u01/app/oracle/product/10.1.0/db_1/cdata/localhost/local.ocr
local_only=TRUE

Here is the result on my server:
[root@bj log]# more /etc/oracle/ocr.loc
ocrconfig_loc=/home/oracle/product/10.1.0/db_1/cdata/localhost/local.ocr
local_only=TRUE

The ocrconfig_loc parameter specifies the location of the Oracle Cluster Registry (OCR) used by the CSS daemon. The path up to the cdata directory is the Oracle home directory where the CSS daemon is running (/u01/app/oracle/product/10.1.0/db_1 in this example).
Note:
If the value for the local_only parameter is FALSE, Oracle CRS is installed on this system. See the Oracle Real Application Clusters Installation and Configuration Guide for information about removing RAC or CRS.  
If this Oracle home directory is not the Oracle home that you want to remove, you can continue to the "Removing Oracle Software" section.
Change directory to the Oracle home directory for an Oracle Database 10g installation that you are not removing.
Set the ORACLE_HOME environment variable to specify the path to this Oracle home directory:
Bourne, Bash, or Korn shell:
# ORACLE_HOME=/u01/app/oracle/product/10.1.0/db_2;
# export ORACLE_HOME
C shell:
# setenv ORACLE_HOME /u01/app/oracle/product/10.1.0/db_2
Enter the following command to reconfigure the CSS daemon to run from this Oracle home:
# $ORACLE_HOME/bin/localconfig reset $ORACLE_HOME
The script stops the Oracle CSS daemon, reconfigures it in the new Oracle home, and then restarts it. When the system boots, the CSS daemon starts automatically from the new Oracle home.
To remove the original Oracle home directory, see the "Removing Oracle Software" section.
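The rule quoted in this section (the ocrconfig_loc path up to the cdata directory is the Oracle home CSS runs from, and local_only=FALSE means CRS is installed) can be checked with a short sketch. The function names are my own; it assumes ocr.loc uses the key=value layout shown above.

```shell
#!/usr/bin/env bash
# Sketch: derive the CSS Oracle home from ocr.loc.
css_home_from_ocr_loc() {
  local loc_file=${1:-/etc/oracle/ocr.loc} ocr
  ocr=$(sed -n 's/^ocrconfig_loc=//p' "$loc_file")
  # Drop "/cdata/..." and everything after it
  printf '%s\n' "${ocr%%/cdata/*}"
}

# Succeeds when local_only=TRUE; FALSE means Oracle CRS is installed,
# in which case the RAC/CRS guide applies instead.
is_local_only() {
  grep -qi '^local_only=true' "${1:-/etc/oracle/ocr.loc}"
}

# Usage:
#   css_home_from_ocr_loc /etc/oracle/ocr.loc
#   is_local_only /etc/oracle/ocr.loc && echo "single-instance CSS"
```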

3. Deleting the Oracle CSS Daemon Configuration
To delete the Oracle CSS daemon configuration, follow these steps:
Note:
Delete the CSS daemon configuration only if you are certain that no other Oracle Database 10g installation requires it.  
Remove any databases or ASM instances associated with this Oracle home. See the preceding sections for information about how to complete these tasks.
Switch user to root.
Change directory to the Oracle home directory that you are removing.
Set the ORACLE_HOME environment variable to specify the path to this Oracle home directory:
Bourne, Bash, or Korn shell:
# ORACLE_HOME=/u01/app/oracle/product/10.1.0/db_1;
# export ORACLE_HOME
C shell:
# setenv ORACLE_HOME /u01/app/oracle/product/10.1.0/db_1
Enter the following command to delete the CSS daemon configuration from this Oracle home:
# $ORACLE_HOME/bin/localconfig delete
The script stops the Oracle CSS daemon, then deletes its configuration. When the system boots, the CSS daemon no longer starts.


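So one option is to reset or delete the CSS daemon configuration. Both operations must be run as root, however, and I did not have the root password for the problem server; nor was I confident they would work.

So I started checking two other servers of mine that have 10g installed.
First server: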
[lisa@localhost lisa]$ ps -ef | grep css
lisa      3336  3294  0 14:39 pts/0    00:00:00 grep css
No CSS process at all. Ha.

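Someone online suggested that removing the last line of /etc/inittab stops the ocssd.bin process from being started, although this is not recommended: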
[lisa@localhost lisa]$ cat /etc/inittab
......
# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
h2:35:respawn:/etc/init.d/init.cssd run >/dev/null 2>&1

/etc/oracle/ocr.loc and /etc/oratab both looked fine; they are identical to the configuration on the working server.

The log shows crsstart being attempted every 5 minutes, which I take to be an attempt to start the ocssd process:
[root@localhost log]# tail messages
Mar  1 14:39:39 localhost logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 14:39:39 localhost logger:    Could not access /etc/oracle/scls_scr/localhost.localdomain/root/crsstart.
Mar  1 14:39:39 localhost logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 14:39:39 localhost logger:    Could not access /etc/oracle/scls_scr/localhost.localdomain/root/crsstart.
Mar  1 14:39:39 localhost logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 14:39:39 localhost logger:    Could not access /etc/oracle/scls_scr/localhost.localdomain/root/crsstart.
Mar  1 14:39:39 localhost logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 14:39:39 localhost logger:    Could not access /etc/oracle/scls_scr/localhost.localdomain/root/crsstart.
Mar  1 14:39:39 localhost init: Id "h2" respawning too fast: disabled for 5 minutes
Mar  1 14:41:48 localhost su(pam_unix)[3482]: session opened for user root by lisa(uid=502)

There is no localhost.localdomain directory under /etc/oracle/scls_scr/.
Checking the environment:
[root@localhost scls_scr]# env
HOSTNAME=localhost.localdomain
So the wrong HOSTNAME is presumably the cause, and I set out to change it.
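The logs show init.cssd looking for /etc/oracle/scls_scr/&lt;hostname&gt;/root/crsstart, so a quick check is whether a directory exists under scls_scr for the current hostname. This is a minimal sketch; the function name is my own, and it assumes the directory layout seen in the logs above.

```shell
#!/usr/bin/env bash
# Sketch: does CSS's per-host directory exist for this hostname?
check_scls_dir() {
  local base=${1:-/etc/oracle/scls_scr} host=${2:-$(hostname)}
  if [ -d "$base/$host" ]; then
    echo "ok: $base/$host exists"
  else
    # List what is there; usually the host name used at install time
    echo "missing: $base/$host (existing entries: $(ls -1 "$base" 2>/dev/null | tr '\n' ' '))"
    return 1
  fi
}

# Usage (defaults to /etc/oracle/scls_scr and the output of `hostname`):
#   check_scls_dir
```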
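I ran into some trouble changing the HOSTNAME, so at that point I was ready to give up, and I commented out the last line of /etc/inittab to try to keep the process from starting.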

[lisa@localhost lisa]$ cat /etc/inittab
......
# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
#h2:35:respawn:/etc/init.d/init.cssd run >/dev/null 2>&1

Commenting out that line did not stop the log entries as I had hoped; the problem remained exactly as before.
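For reference, the inittab edit described above can be scripted as below, keeping a backup first. As noted earlier, this approach is not recommended, and it did not help here. One fact worth knowing: SysV init only picks up changes to /etc/inittab after it is told to re-read the file, for example with `telinit q`; whether that step was missing in this case is my assumption, not something the log confirms. The function name is my own, and `sed -i` assumes GNU sed.

```shell
#!/usr/bin/env bash
# Sketch: comment out the CSS respawn entry (id "h2") in an inittab file.
# Not recommended; shown only to make the step above concrete.
disable_cssd_inittab() {
  local f=${1:-/etc/inittab}
  cp -p "$f" "$f.bak" || return 1   # keep a backup of the original
  # GNU sed in-place edit: prefix the h2 line that runs init.cssd with '#'
  sed -i 's|^h2:\(.*init\.cssd run.*\)$|#h2:\1|' "$f"
}

# Usage (as root), then make init re-read the file:
#   disable_cssd_inittab /etc/inittab && telinit q
```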

Running that file directly fails with "Permission denied", whether as root or as oracle:
[root@localhost log]# /etc/oracle/scls_scr/*/root/crsstart
bash: /etc/oracle/scls_scr/****/root/crsstart: Permission denied

In the end, with the network administrator's help, I managed to change the HOSTNAME (a little embarrassing, I admit).

Checking the log file again:
[root@bj34 log]# tail messages
Mar  1 15:23:01 localhost su(pam_unix)[8583]: session opened for user oracle by (uid=0)
Mar  1 15:23:01 localhost su(pam_unix)[8583]: session closed for user oracle
Mar  1 15:23:01 localhost logger: Failed 3 to bind listening endpoint: (ADDRESS=(PROTOCOL=tcp)(HOST=**))
Mar  1 15:23:05 localhost su(pam_unix)[8623]: session opened for user root by lisa(uid=502)
Mar  1 15:23:06 localhost su(pam_unix)[8655]: session opened for user oracle by (uid=0)
Mar  1 15:23:06 localhost su(pam_unix)[8655]: session closed for user oracle
Mar  1 15:23:06 localhost logger: Failed 3 to bind listening endpoint: (ADDRESS=(PROTOCOL=tcp)(HOST=**))
Mar  1 15:23:11 localhost su(pam_unix)[8695]: session opened for user oracle by (uid=0)
Mar  1 15:23:11 localhost su(pam_unix)[8695]: session closed for user oracle
Mar  1 15:23:11 localhost logger: Failed 3 to bind listening endpoint: (ADDRESS=(PROTOCOL=tcp)(HOST=**))
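There are still errors, but the problem has changed: an entry is now written every 5 seconds. It still looked like a hostname configuration problem.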

Checking the system processes:
[root@bj34 log]# ps -ef | grep css
root      4722     1  0 15:14 ?        00:00:00 /bin/su -l oracle -c exec /home/oracle/product/10.1.0/db_1/bin/ocssd
oracle    9834  4722  0 15:25 ?        00:00:00 /home/oracle/product/10.1.0/db_1/bin/ocssd.bin
root      9957  9921  0 15:27 pts/0    00:00:00 grep css

The processes, at least, now look right.

This time I edited /etc/hosts, and that solved the problem: no more entries were written to the log.
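The post does not say exactly what was changed in /etc/hosts. Assuming the fix was making the new hostname resolvable (the "Failed 3 to bind listening endpoint ... (HOST=**)" message points at a host lookup), here is a sketch that checks whether a name appears in a hosts file. The function name is my own.

```shell
#!/usr/bin/env bash
# Sketch: succeed if HOST is listed as a name or alias in a hosts file.
host_in_hosts_file() {
  local host=$1 file=${2:-/etc/hosts}
  awk -v h="$host" '
    /^[ \t]*#/ { next }
    {
      # field 1 is the address; the remaining fields are names and aliases
      for (i = 2; i <= NF; i++) if ($i == h) { found = 1; exit }
    }
    END { exit !found }' "$file"
}

# Usage:
#   host_in_hosts_file "$(hostname)" /etc/hosts || echo "hostname not in /etc/hosts"
```

On glibc systems, `getent hosts "$(hostname)"` checks resolution through the full name-service configuration rather than /etc/hosts alone.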


Second server:
System processes:
[root@bj72 root]# ps -ef | grep css
root      5498     1  0 15:50 ?        00:00:00 /bin/sh /etc/init.d/init.cssd run
root      5501  5498  0 15:50 ?        00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root      5515  5465  0 15:51 pts/0    00:00:00 grep css

[root@bj72 lisa]# tail /var/log/messages
Mar  1 15:45:50 bj72 logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 15:45:50 bj72 logger:    Could not access /etc/oracle/scls_scr/****/root/crsstart.
Mar  1 15:45:50 bj72 logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 15:45:50 bj72 logger:    Could not access /etc/oracle/scls_scr/****/root/crsstart.
Mar  1 15:45:50 bj72 logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 15:45:50 bj72 logger:    Could not access /etc/oracle/scls_scr/****/root/crsstart.
Mar  1 15:45:50 bj72 logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 15:45:50 bj72 logger:    Could not access /etc/oracle/scls_scr/****/root/crsstart.
Mar  1 15:45:50 bj72 init: Id "h2" respawning too fast: disabled for 5 minutes
Mar  1 15:47:55 bj72 su(pam_unix)[5464]: session opened for user root by lisa(uid=502)

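This server also logs errors, but for a different reason.
There is no **** directory under /etc/oracle/scls_scr/, and the environment shows that **** is the HOSTNAME. After the database was installed and had been running for some time, this server was moved to another machine room and given a new IP address and HOSTNAME; that is presumably what caused the problem.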
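Since the HOSTNAME cannot be changed back, the next step taken here is to rename the per-host directory to the new name. A guarded version of that rename might look like this sketch. The function name and the old/new host arguments are hypothetical; the old name is whatever directory the install created. Note that in this case the rename changed the symptoms but did not fix them.

```shell
#!/usr/bin/env bash
# Sketch: rename the scls_scr directory from the old hostname to the new one.
# Arguments: base dir, old host, new host (all hypothetical placeholders).
rename_scls_dir() {
  local base=$1 old=$2 new=$3
  if [ ! -d "$base/$old" ]; then
    echo "no directory for old host: $base/$old" >&2
    return 1
  fi
  if [ -e "$base/$new" ]; then
    echo "refusing to overwrite existing $base/$new" >&2
    return 1
  fi
  mv "$base/$old" "$base/$new"
}

# Usage (as root; OLD_HOST is a placeholder for the install-time name):
#   rename_scls_dir /etc/oracle/scls_scr OLD_HOST "$(hostname)"
```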

The HOSTNAME could not be changed this time, so I renamed the **** directory to the new HOSTNAME. Five minutes later:
[root@bj72 root]# tail /var/log/messages
Mar  1 15:45:50 bj72 logger: Oracle Cluster Ready Services disabled by corrupt install
Mar  1 15:45:50 bj72 logger:    Could not access /etc/oracle/scls_scr/****/root/crsstart.
Mar  1 15:45:50 bj72 init: Id "h2" respawning too fast: disabled for 5 minutes
Mar  1 15:47:55 bj72 su(pam_unix)[5464]: session opened for user root by lisa(uid=502)
Mar  1 15:50:51 bj72 su(pam_unix)[5505]: session opened for user oracle by (uid=0)
Mar  1 15:50:51 bj72 su(pam_unix)[5505]: session closed for user oracle
Mar  1 15:51:51 bj72 su(pam_unix)[5498]: session opened for user oracle by (uid=0)
Mar  1 15:51:52 bj72 su(pam_unix)[5498]: session closed for user oracle
Mar  1 15:51:52 bj72 su(pam_unix)[5532]: session opened for user oracle by (uid=0)
Mar  1 15:51:52 bj72 su(pam_unix)[5532]: session closed for user oracle

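The symptoms changed, but the problem persisted. Nothing I tried worked, so I decided to stop the process.
First:
[root@bj72 etc]# cp inittab.no_cssd inittab
As far as I can tell this only removes the last line, and the log entries kept coming.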

Then I ran:
[root@bj72 bin]# $ORACLE_HOME/bin/localconfig reset $ORACLE_HOME
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Oracle Cluster Registry for cluster has been initialized

Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Adding to inittab
/home/oracle/product/10.1.0/db_1/bin/localconfig: line 1:  /bin/cp: No such file or directory
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.
Giving up: Oracle CSS stack appears NOT to be running.
Oracle CSS service would not start as installed
Automatic Storage Management(ASM) cannot be used until Oracle CSS service is started

Checking the processes again, the CSS processes are gone:
[root@bj72 bin]# ps -ef | grep css
root      6279  5465  0 16:30 pts/0    00:00:00 grep css

The log shows the reconfiguration taking place, and nothing more was written afterward:
[root@bj72 bin]# tail  /var/log/messages
Mar  1 16:17:19 bj72 su(pam_unix)[6125]: session opened for user oracle by (uid=0)
Mar  1 16:17:20 bj72 su(pam_unix)[6125]: session closed for user oracle
Mar  1 16:17:20 bj72 su(pam_unix)[6156]: session opened for user oracle by (uid=0)
Mar  1 16:17:20 bj72 su(pam_unix)[6156]: session closed for user oracle
Mar  1 16:18:20 bj72 su(pam_unix)[6149]: session opened for user oracle by (uid=0)
Mar  1 16:18:21 bj72 su(pam_unix)[6149]: session closed for user oracle
Mar  1 16:18:21 bj72 su(pam_unix)[6178]: session opened for user oracle by (uid=0)
Mar  1 16:18:21 bj72 su(pam_unix)[6178]: session closed for user oracle
Mar  1 16:19:09 bj72 lisa: (Oracle CSSD will be run out of init)
Mar  1 16:19:09 bj72 init: Re-reading inittab

Judging from these three servers, the user's problem was most likely caused by a HOSTNAME change, so I advised the user to run the following as root:
$ORACLE_HOME/bin/localconfig reset $ORACLE_HOME

The result:
[root@db1 oracle]# $ORACLE_HOME/bin/localconfig reset $ORACLE_HOME
nThe following environment variables are set as:
   ORACLE_OWNER= oracle
   ORACLE_HOME=  /home1/oracle/product/10.1.0/db_1
Failure at scls_scr_create with code 1
Internal Error Information:
 Category: 1234
 Operation: scls_scr_create
 Location: mkdir
 Other: Unable to make user dir
 Dep: 2
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Oracle Cluster Registry for cluster has been initialized

Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Usage: /etc/init.d/init.cssd {start|stop|run|fatal|startcheck|activatevg}
Adding to inittab
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
       db1
CSS is active on all nodes.
Oracle CSS service is installed and running under init(1M)

Checking the processes:
[oracle@db1 oracle]$ ps -ef | grep css
root      5716     1  0 Jan11 ?        00:00:00 /bin/su -l oracle -c exec /home1/oracle/product/10.1.0/db_1/bin/ocssd
oracle    9933  5716  0 17:15 ?        00:00:00 /home1/oracle/product/10.1.0/db_1/bin/ocssd.bin

No more entries were written to the log, and the problem was solved.

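In summary: if a database server's HOSTNAME is changed after installation, the ocssd process fails to start, because the directory it starts from has the host name hard-coded in its path (/etc/oracle/scls_scr/<hostname>, as the logs above show). Running the following as root
$ORACLE_HOME/bin/localconfig reset $ORACLE_HOME
reconfigures CSS and fixes the problem.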
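A guarded wrapper around that command might look like the sketch below. It only adds pre-checks (ORACLE_HOME set, localconfig present and executable, running as root) before calling `localconfig reset` exactly as shown above; the function name is my own.

```shell
#!/usr/bin/env bash
# Sketch: pre-checks before "$ORACLE_HOME/bin/localconfig reset $ORACLE_HOME".
reset_css() {
  local home=${1:-${ORACLE_HOME:-}}
  if [ -z "$home" ]; then
    echo "ORACLE_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$home/bin/localconfig" ]; then
    echo "$home/bin/localconfig not found or not executable" >&2
    return 1
  fi
  if [ "$(id -u)" -ne 0 ]; then
    echo "localconfig reset must be run as root" >&2
    return 1
  fi
  "$home/bin/localconfig" reset "$home"
}

# Usage (as root, with ORACLE_HOME exported):
#   reset_css
```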
