This article explains how the Oracle Exadata Storage Server works: which processes run on it, how I/O flows between the database and the cells, and what happens when a component fails.
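How Exadata Storage Server works
Exadata I/O flow
Exadata fault tolerance
– Database behavior when storage fails
– Grid Infrastructure / DB Patch: five-field version numbers such as 11.2.0.3.8
– Exadata Storage Server Software Patch: five-field version numbers such as 11.2.3.1.1
e.g.) Same as the "11.2" in Oracle Database 11.2.0.3
e.g.) Same as the "3" in Oracle Database 11.2.0.3
※ 11.2.2.4.2 corresponds to 11.2.0.3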
Exadata architecture (overview)

InfiniBand Network
→ Enables full scale-out of the cells
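Which processes run on the Exadata Storage Server?
How does Oracle connect the I/O layer of ASM/DB to the Exadata Storage Server?

Cell Server (CELLSRV) handles the exchange of information with the DB Server. It is multithreaded, and each thread performs asynchronous I/O against disk and network.
Management Server (MS) handles grid disk creation, H/W changes, SNMP traps, alerts, email notifications, and faults.
Restart Server (RS)
Monitors CELLSRV and MS: whether the processes are alive, their memory usage, and so on.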

How Exadata Storage Server Software detects hardware failures
– Hardware failures can also be viewed from CellCLI
– ILOM sends SNMP traps to the IP address of its own node's eth0 (the management interface).


■ As seen from CellCLI
[root@cell01] # cellcli -e list cell detail | tail -n 3
cellsrvStatus: running
msStatus: running
rsStatus: running
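The three status lines above can also be checked programmatically. The following is a minimal sketch in Python, assuming the text has the `name: value` shape shown above; the function names `parse_service_status` and `stopped_services` are hypothetical helpers, not part of CellCLI.

```python
def parse_service_status(text):
    """Parse 'name: value' lines such as the cellcli output above into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            status[key.strip()] = value.strip()
    return status


def stopped_services(status):
    """Return the expected services whose status is missing or not 'running'."""
    expected = ("cellsrvStatus", "msStatus", "rsStatus")
    return [name for name in expected if status.get(name) != "running"]


sample = """cellsrvStatus: running
msStatus: running
rsStatus: running"""

print(stopped_services(parse_service_status(sample)))
```

An empty list means all three services report `running`.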
■ As seen from ps
CELLSRV process
[root@cell01] # ps -ef | grep "bin/cellsrv " | grep -v grep
root 13085 13084 [omitted] /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellsrv 100 5000 9 5042
RS process
[root@cell01] # ps -ef | grep cellrssrm | grep -v grep
root 11081 1 [omitted] /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrssrm
■ As seen from ps (continued)
MS process
[root@cell01]# ps -ef | grep oc4j | grep -v grep
root 12093 12092 [omitted] /usr/java/jdk1.5.0_15/bin/java -Xms256m -Xmx512m -Djava.library.path=/opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/lib -Ddisable.checkForUpdate=true -jar /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/oc4j/ms/j2ee/home/oc4j.jar -out /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/deploy/log/ms.lst -err /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/deploy/log/ms.err
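Master diskmon (diskmon): started through CRS together with CSS; communicates with CELLSRV
Slave diskmon (dskm): exists as part of each instance and communicates with the master
Detects cell failures and distributes I/O fencing and I/O resource management information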

cellinit.ora and cellip.ora
# cat /etc/oracle/cell/network-config/cellinit.ora
ipaddress1=192.168.20.81/22

# cat /etc/oracle/cell/network-config/cellip.ora
cell="192.168.20.91"
cell="192.168.20.92"
cell="192.168.20.93"
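To illustrate how a DB Server side tool might read the list of cells, here is a minimal Python sketch that extracts the addresses from cellip.ora-style text. It assumes one `cell="address"` entry per line with straight double quotes, as in the listing above; `parse_cellip` is a hypothetical helper, not an Oracle utility.

```python
import re

# One entry per line: cell="<address>"
CELL_LINE = re.compile(r'\s*cell\s*=\s*"([^"]+)"')


def parse_cellip(text):
    """Return the cell addresses listed in cellip.ora-style text, in file order."""
    cells = []
    for line in text.splitlines():
        match = CELL_LINE.match(line)
        if match:
            cells.append(match.group(1))
    return cells


sample = '''cell="192.168.20.91"
cell="192.168.20.92"
cell="192.168.20.93"'''

print(parse_cellip(sample))
```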
diskmon and dskm
■ As seen from crsctl
[root@db01] # /u01/app/11.2.0.3/grid/bin/crsctl stat res -t -init | grep -A2 diskmon
ora.diskmon
1 ONLINE ONLINE katana01m
■ As seen from ps
[root@db01] # ps -ef | grep diskmon | grep -v grep
oracle 24362 1 0 Aug06 ? 00:01:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
[root@db01] # ps -ef | grep dskm | grep -v grep
oracle 22500 1 0 Aug09 ? 00:00:00 ora_dskm_dgprmy1
oracle 25083 1 0 Aug06 ? 00:00:00 asm_dskm_+ASM1
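How Cell Server recognizes the DB Server

DB/ASM communicates with CELLSRV through libcell (a library)
The communication uses the iDB protocol
iDB is an Exadata-specific protocol used for Exadata storage communication
It is built on the RDS protocol and runs over InfiniBand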

Exadata Storage Server Software alert logs
Trace files and alert logs are stored in the Automatic Diagnostic Repository (ADR)
alert.log (from RS and CELLSRV), ms-odl.log, ms-odl.trc, rs*trc, svtrc*.trc
As with Oracle Database, they can be managed with ADRCI

Log files and trace files
–$ADR_BASE/diag/asm/cell/<hostname>/trace/
– Contains the alert logs and trace files of CELLSRV, RS, and MS
alert.log : alert log of CELLSRV and RS
ms-odl.log / ms-odl.trc : alert log and trace file of MS
svtrc_<pid>_<tid>.trc (svtrc_13661_0.trc) : CELLSRV trace file
rstrc_<pid>_<tid>.trc (rstrc_27528_4.trc) : RS trace file
Diskmon log files
–$ORA_CRS_HOME/log/<hostname>/diskmon/
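Processes and configuration files (database side)
diskmon, dskm
The master diskmon (diskmon) starts together with CSS and communicates with CELLSRV.
The slave diskmon (dskm) is part of each instance and communicates with the master diskmon.
Together they propagate cell failures, I/O fencing, and I/O resource management plans.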
cellip.ora
cellinit.ora

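Memory usage, etc. Backup RS monitors Core RS.
Role of each process
Process | Server | Role |
CELLSRV | Exadata | Issues asynchronous I/O to disk and network. |
MS | Exadata | Creates/deletes grid disks, handles H/W changes and display, and manages SNMP traps, alerts, and faults |
Core RS | Exadata | Monitors CELLSRV and MS: process liveness and memory usage. |
Backup RS | Exadata | Monitors Core RS |
Diskmon | DB Server | Starts together with CSS and communicates with CELLSRV. Propagates cell failures, I/O fencing, and I/O resource management plans |
Dskm | DB Server | Background process of each instance; communicates with the master diskmon. |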
Monitoring by the RS processes ①
When alter cell startup services all is executed (Backup RS startup)
[root@cell01] # ps -ef | grep cellrssrm | grep -v grep
root 11081 1 【略】 /opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrssrm
[root@cell01] # ps -ef | grep cellrsbkm | grep -v grep
root 11089 11087 [omitted]
/opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/bin/cellrsbkm
-rs_conf
/opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/deploy/config/cellinit.ora
-ms_conf
/opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/deploy/config/cellrsms.state
-cellsrv_conf
/opt/oracle/cell11.2.3.1.1_LINUX.X64_120607/cellsrv/deploy/config/cellrsos.state
-debug 0

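Monitoring by the RS processes ②
When alter cell startup services all is executed (CELLSRV and MS startup)
When a monitored process starts, a process that monitors it is spawned.
Monitoring processes: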
–cellrssmt – server monitoring process
–cellrsbmt – backup monitoring process
–cellrsomt – oss (outdated name of cellsrv) monitoring process
–cellrsmmt – ms monitoring process
All names prefixed with cellrs
monitoring procs suffixed with mt
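The naming convention above (prefix `cellrs`, monitors end in `mt`) is easy to apply when reading `ps` output. The sketch below is a small Python lookup built only from the process names listed in this article; the `describe` function and the wording of the role strings are illustrative assumptions.

```python
# Process names taken from the ps listings and the list above.
RS_PROCESSES = {
    "cellrssrm": "Core RS",
    "cellrsbkm": "Backup RS",
    "cellrssmt": "server monitoring process",
    "cellrsbmt": "backup monitoring process",
    "cellrsomt": "oss (cellsrv) monitoring process",
    "cellrsmmt": "ms monitoring process",
}


def describe(name):
    """Classify a process name using the cellrs prefix and the mt suffix."""
    if not name.startswith("cellrs"):
        return "not an RS process"
    role = RS_PROCESSES.get(name, "unknown RS process")
    kind = "monitor" if name.endswith("mt") else "service"
    return f"{kind}: {role}"


for proc in ("cellrssrm", "cellrsomt", "cellsrv"):
    print(proc, "->", describe(proc))
```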

Heartbeats and incident creation
– Before a service is restarted, the monitoring process generates an ADR incident

Monitoring processes as seen from ps ① to ④
Behavior when a main process fails ①
[Thu Aug 23 20:25:40 JST 2012 : kill the Core RS process]
-------------------------------------------------------------------------------
Thu Aug 23 20:25:41 2012 : RS-7445 [Serv RS_MAIN is absent] [It will be restarted]
Thu Aug 23 20:25:42 2012 : cellrsomt / cellrsbmt / cellrsmmt shut down
Thu Aug 23 20:25:43 2012 : [RS] Started Service RS_MAIN with pid 26177
Thu Aug 23 20:25:43 2012 : cellrsomt / cellrsbmt / cellrsmmt restarted with new pids
-------------------------------------------------------------------------------
[In addition, the parent of the CELLSRV / MS / Backup RS processes becomes "1"]
[Thu Aug 23 20:47:56 : kill the Backup RS process]
-------------------------------------------------------------------------------
Thu Aug 23 20:47:57 2012 : RS-7445 [Serv RS_BACKUP is absent] [It will be restarted]
Thu Aug 23 20:47:58 2012 : cellrssmt shut down
Thu Aug 23 20:47:58 2012 : [RS] Started Service RS_BACKUP with pid 28476
Thu Aug 23 20:47:58 2012 : cellrssmt restarted with a new pid
[Thu Aug 23 20:55:44 JST 2012 : kill the MS process]
-------------------------------------------------------------------------------
Thu Aug 23 20:55:44 2012 : RS-7445 [Serv MS is absent] [It will be restarted]
: [RS] Started Service MS with pid 3839
[Thu Aug 23 20:38:56 JST 2012 : kill the CELLSRV process]
-------------------------------------------------------------------------------
Thu Aug 23 20:38:56 2012 : RS-7445 [Serv CELLSRV is absent] [It will be restarted]
: cellrsomt restarted with a new pid
Thu Aug 23 20:38:59 2012 : FlashLog enabled
Thu Aug 23 20:38:59 2012 : [RS] Started Service CELLSRV with pid 9593
Thu Aug 23 20:39:00 2012 : diskmon heartbeat starts
Thu Aug 23 20:39:03 2012 : FlashCache enabled
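The timelines above show that RS restarts a killed service within a few seconds. As a rough illustration, the following Python sketch measures the gap between an `RS-7445 ... is absent` line and the matching `Started Service` line. It assumes the `timestamp : message` layout used in this article's annotated timelines (a real alert.log may be laid out differently) and an English locale for day and month names; `restart_latencies` is a hypothetical helper.

```python
import re
from datetime import datetime

ABSENT = re.compile(r"^(.*?) : RS-7445 \[Serv (\w+) is absent\]")
STARTED = re.compile(r"^(.*?) : \[RS\] Started Service (\w+) with pid (\d+)")
FMT = "%a %b %d %H:%M:%S %Y"  # e.g. Thu Aug 23 20:25:41 2012


def restart_latencies(lines):
    """Return seconds between 'is absent' and 'Started Service' per service."""
    absent_at = {}
    latencies = {}
    for line in lines:
        m = ABSENT.match(line)
        if m:
            absent_at[m.group(2)] = datetime.strptime(m.group(1), FMT)
            continue
        m = STARTED.match(line)
        if m and m.group(2) in absent_at:
            started = datetime.strptime(m.group(1), FMT)
            latencies[m.group(2)] = (started - absent_at.pop(m.group(2))).total_seconds()
    return latencies


timeline = [
    "Thu Aug 23 20:25:41 2012 : RS-7445 [Serv RS_MAIN is absent] [It will be restarted]",
    "Thu Aug 23 20:25:43 2012 : [RS] Started Service RS_MAIN with pid 26177",
    "Thu Aug 23 20:38:56 2012 : RS-7445 [Serv CELLSRV is absent] [It will be restarted]",
    "Thu Aug 23 20:38:59 2012 : [RS] Started Service CELLSRV with pid 9593",
]

print(restart_latencies(timeline))
```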
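Types of Exadata I/O
– Block I/O: I/O of one or more database blocks
– Smart I/O: row filtering (predicate evaluation) and column filtering (projection) are executed on the Storage Server
– The result of the data processing on the Storage Server is returned to the DB Server, which performs the remaining processing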
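Types of Smart I/O
– Row filtering: only the required rows are returned to the DB Server
– Column filtering: only the required columns are returned to the DB Server
– Join filtering: a Bloom filter is used to filter rows in the cell before the join
– Index scan: an index fast full scan is eligible for Smart Scan
– Only the blocks that have changed are read (incremental backup)
– Data block formatting is offloaded to the cell (data file creation)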
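To make the join filtering idea concrete, here is a self-contained Python sketch of a Bloom filter: the join keys of the small table are inserted into a bit array, and the large table's rows are tested against it before being returned. This only illustrates the general technique; the hash scheme (SHA-256 with a per-hash prefix), the sizes, and the names `BloomFilter` and `cell_side_filter` are assumptions and do not describe Oracle's internal implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def cell_side_filter(rows, bloom, join_column):
    """Keep only rows whose join key may match; done before the join itself."""
    return [row for row in rows if bloom.might_contain(row[join_column])]


# Small (dimension) side of the join: the keys that survive its predicates.
bloom = BloomFilter()
for key in (10, 20):
    bloom.add(key)

# Large (fact) side, filtered before any rows are shipped to the join.
fact_rows = [{"cust_id": i, "amount": i * 1.5} for i in range(100)]
survivors = cell_side_filter(fact_rows, bloom, "cust_id")
print(len(fact_rows), "rows scanned,", len(survivors), "rows returned")
```

A Bloom filter never drops a matching row, but it can let a few non-matching rows through, so the join on the DB Server still performs the exact comparison.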
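How Smart I/O works

When Smart Scan is chosen
– Direct Read: data is read directly into the PGA, bypassing the SGA (buffer cache)
– Cases in which Direct Read is used (11g)
Parallel execution
Serial full table scans when the table is large
→ Long-running processing such as parallel queries and full table scans becomes a candidate for Smart Scan.
1. The CBO builds the execution plan at the global level, regardless of whether Exadata is in use
2. The database decides whether to perform a Direct Read
3. If the target object's files reside on Exadata, Smart Scan is used
※ Starting with 11.2.0.3.8, an Exadata-specific way of gathering system statistics was added, so the Optimizer can now build plans that take the use of Exadata into account.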
Evolution of the Optimizer
From 11.2.0.3.8 onward, costing is improved by an Exadata option added to the system statistics options
– Exadata V2 Half 11.2.0.3.9 / 11.2.3.1.1
– Table test_tbl
※ Source table: 60,000,000 rows, filtered down to 1/6,000

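Cost calculation in 11.2.0.3.8 ①
– Affects other queries
– On Exadata, the estimated cost of a full scan is lower than before
Number of db blocks read in a single OS read
– On Exadata, a single OS read is 1 MB
So with an 8 KB db block, 128 blocks are read at once
=> However, by default the Optimizer uses MBRC = 8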
begin
dbms_stats.gather_system_stats('EXADATA');
end;
/
– MBRC is set to the value of DB_FILE_MULTIBLOCK_READ_COUNT (default: 1 MB / block size)
(On Exadata, available in 11.2.0.2.18 or 11.2.0.3.8 and later)
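The arithmetic behind the MBRC discussion can be checked with a few lines of Python. The sketch below only counts multiblock read requests for a full scan; it is not the Optimizer's actual cost formula, and the table size of 1,000,000 blocks is a made-up example value.

```python
import math


def blocks_per_read(read_size_bytes, block_size_bytes):
    """Number of db blocks covered by one OS read."""
    return read_size_bytes // block_size_bytes


def multiblock_reads(table_blocks, mbrc):
    """Number of multiblock read requests needed to scan the whole table."""
    return math.ceil(table_blocks / mbrc)


mbrc_exadata = blocks_per_read(1024 * 1024, 8 * 1024)  # 1 MB read, 8 KB block -> 128
table_blocks = 1_000_000  # hypothetical table size in blocks

print("MBRC with 1 MB reads:", mbrc_exadata)
print("reads assumed with MBRC = 8:  ", multiblock_reads(table_blocks, 8))
print("reads assumed with MBRC = 128:", multiblock_reads(table_blocks, mbrc_exadata))
```

With MBRC = 8 the scan is assumed to need 16 times as many read requests as with MBRC = 128, which is why full scans look cheaper to the Optimizer once Exadata system statistics are gathered.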




Cases where Smart Scan does not apply
For details, see the User's Guide, "7 Monitoring and Tuning Oracle Exadata Storage Server Software"
The CELL_OFFLOAD_PROCESSING parameter is set to FALSE.
The table or partition being scanned is small.
The optimizer does not use direct path read.
A scan is performed on a clustered table.
A scan is performed on an index-organized table.
A fast full scan is performed on compressed indexes.
A fast full scan is performed on reverse key indexes.
The table has row dependencies enabled or the rowscn is being fetched.
The optimizer wants the scan to return rows in ROWID order.
The command is CREATE INDEX using nosort.
A LOB or LONG column is being selected or queried.
A SELECT … VERSIONS query is done on a table.
A query that has more than 255 columns referenced and heap table is
uncompressed, or Basic or OLTP compressed. However such queries on
Exadata Hybrid Columnar Compression-compressed tables are offloaded.
The tablespace is encrypted, and the CELL_OFFLOAD_DECRYPTION
parameter is set to FALSE. In order for Exadata Cell to perform
decryption, Oracle Database needs to send the decryption key to Exadata
Cell. If there are security concerns about keys being shipped across the
network to Exadata Cell, then disable the decryption feature.
The tablespace is not completely stored on Exadata Cell.
The predicate evaluation is on a virtual column.
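The list above reads naturally as a checklist. As an illustration only, the Python sketch below encodes a handful of those conditions and reports which ones would block Smart Scan for a given scan. The `ScanInfo` class and its field names are hypothetical; in practice the database makes this decision internally, and the list above contains more conditions than are modeled here.

```python
from dataclasses import dataclass


@dataclass
class ScanInfo:
    """Hypothetical description of one scan; defaults describe an eligible scan."""
    cell_offload_processing: bool = True   # CELL_OFFLOAD_PROCESSING parameter
    uses_direct_path_read: bool = True
    is_clustered_table: bool = False
    is_index_organized: bool = False
    selects_lob_or_long: bool = False
    tablespace_fully_on_cells: bool = True


def smart_scan_blockers(scan):
    """Return the reasons (from the list above) that rule out Smart Scan."""
    checks = [
        (not scan.cell_offload_processing, "CELL_OFFLOAD_PROCESSING is FALSE"),
        (not scan.uses_direct_path_read, "optimizer does not use direct path read"),
        (scan.is_clustered_table, "scan on a clustered table"),
        (scan.is_index_organized, "scan on an index-organized table"),
        (scan.selects_lob_or_long, "LOB or LONG column selected"),
        (not scan.tablespace_fully_on_cells, "tablespace not completely on Exadata Cell"),
    ]
    return [reason for blocked, reason in checks if blocked]


print(smart_scan_blockers(ScanInfo()))
print(smart_scan_blockers(ScanInfo(is_index_organized=True, selects_lob_or_long=True)))
```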
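Smart Scan on encrypted data
– Smart Scan works on both TDE tablespace-encrypted data and TDE column-encrypted data
– It can take advantage of the AES-NI feature of Xeon 5600 / E7 processors
– Offloading the decryption overhead to storage greatly improves CPU efficiency on the DB Server
– Filtering the encrypted data in the cell reduces the amount of data sent to the DB Server
Behavior during HCC compression
– Compression is performed during direct path load
– Data is compressed column by column
– Compression is performed by the database server's own processes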

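Behavior when querying HCC-compressed data
– Without Smart Scan, data is read into the buffer cache in compressed form and decompressed in the PGA
– Full table scan case
– Decompression runs on the Exadata Storage Server and Smart Scan is applied.
※ With column filtering, only the relevant columns are sent to the Database Server, still compressed.
– Reduces the CPU load of decompression on the Database Server
– Row filtering on the compressed data reduces the amount of data sent to the DB Server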

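Smart I/O [Optimized Smart Scan]
A new feature starting in 11.2.2.3
Improves Smart Scan performance by removing CPU bottlenecks on the cell
Cell CPU utilization is monitored; when it exceeds a threshold, part of the processing that the cell would normally perform is executed on the DB side instead
It runs automatically, with no configuration needed from users or administrators
When Optimized Smart Scan decides not to perform Smart Scan, the following CPU-consuming processing on the cell is skipped
– Smart Scan
– Decompression of compressed data
– Decryption of encrypted data
– Deciding whether to build a Storage Index, etc.
Use cases for Optimized Smart Scan
All Smart I/O is subject to Optimized Smart Scan. The decision is based on cell and DB CPU utilization

Characteristics of Optimized Smart Scan
Each cell makes the Optimized Smart Scan decision per 1 MB unit of data, not per SQL statement or per database
The cellsrv process samples cell CPU utilization every 0.2 seconds
When a Smart I/O request arrives, the cell uses the current cell and DB CPU utilization to decide whether to process it or to return the data to the DB without performing Smart Scan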
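The per-unit decision described above can be sketched as follows. This is a toy Python model under stated assumptions: the 0.90 threshold and the rule "pass through only when the cell is over the threshold and the DB server is less busy than the cell" are invented for illustration, since the article does not give the real threshold or comparison logic.

```python
CELL_CPU_THRESHOLD = 0.90  # hypothetical; the real threshold is not stated in the source


def decide_unit(cell_cpu, db_cpu, threshold=CELL_CPU_THRESHOLD):
    """Decide how one 1 MB unit of a Smart I/O request is handled (assumed rule)."""
    if cell_cpu > threshold and db_cpu < cell_cpu:
        return "passthrough"  # blocks are returned to the DB server unprocessed
    return "smart_scan"       # the cell filters, decompresses, and decrypts


def plan_request(cpu_samples):
    """One decision per 1 MB unit, each using the CPU sample current at that moment."""
    return [decide_unit(cell_cpu, db_cpu) for cell_cpu, db_cpu in cpu_samples]


# (cell CPU, DB CPU) as seen when each 1 MB unit is about to be processed.
samples = [(0.50, 0.30), (0.95, 0.30), (0.95, 0.99), (0.60, 0.40)]
print(plan_request(samples))
```

Because the decision is made per unit rather than per statement, a single query can mix offloaded and passed-through units as the load on the cell changes.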
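Integration of the DB and the Exadata Storage Grid
– No stale I/O is executed after the cluster membership changes
– The configuration can be changed efficiently without corrupting data
The database can see what the storage is doing, and the storage can see what the database is doing
This cooperation is available only with Exadata storage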

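How ASM handles hardware failures
– Mirroring is performed at the Allocation Unit (AU) level.
– With Exadata, a failure group (a set of disks that may fail together) is created automatically for each cell. The primary and mirror AUs are stored in different failure groups.
– Disk and cell failures are handled transparently to the database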
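The point of per-cell failure groups is that the two copies of an AU never share a cell. The Python sketch below demonstrates that property with a simple round-robin placement. The placement policy, the cell names, and the disk names are all illustrative assumptions; ASM's real allocation algorithm is more elaborate.

```python
def place_extent(extent_no, failure_groups):
    """Pick a primary and a mirror disk from two different failure groups."""
    names = sorted(failure_groups)
    if len(names) < 2:
        raise ValueError("mirroring needs at least two failure groups")
    primary_fg = names[extent_no % len(names)]
    mirror_fg = names[(extent_no + 1) % len(names)]  # always a different group
    primary_disks = failure_groups[primary_fg]
    mirror_disks = failure_groups[mirror_fg]
    primary = primary_disks[extent_no % len(primary_disks)]
    mirror = mirror_disks[extent_no % len(mirror_disks)]
    return (primary_fg, primary), (mirror_fg, mirror)


def survives_cell_loss(placements, failed_fg):
    """True if every extent still has a copy outside the failed failure group."""
    return all(p[0] != failed_fg or m[0] != failed_fg for p, m in placements)


# Hypothetical layout: one failure group per cell, two grid disks each.
failure_groups = {
    "CELL01": ["CD_00_CELL01", "CD_01_CELL01"],
    "CELL02": ["CD_00_CELL02", "CD_01_CELL02"],
    "CELL03": ["CD_00_CELL03", "CD_01_CELL03"],
}

placements = [place_extent(n, failure_groups) for n in range(12)]
for cell in failure_groups:
    print(cell, "lost, data still readable:", survives_cell_loss(placements, cell))
```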

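Brownout protection
* Brownout = a temporary stall or similar interruption
– Read I/O is redirected to the mirrored copy of the data.
– Failed write I/O is tracked.
– e.g.) a cell crash or temporary hang
– e.g.) a cell software update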

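Behavior when a cell or CELLSRV fails
– Many failures are temporary and short-lived.
– When the CELLSRV process fails, the RS process detects it immediately and restarts CELLSRV. In this case no I/O error is returned to the I/O client; I/O processing continues through automatic reconnection (Automatic Reconnect).
– When a cell goes down (OS or hardware shutdown), it takes a long time until it is restarted.
– The Diskmon process on the DB Server monitors the cells. When it detects a cell failure, it evicts that cell, and ASM then takes the disks on that cell offline. DB and ASM I/O is redirected to the ASM mirror copies on other cells.
– From the I/O client's point of view, CELLSRV performance appears to degrade and I/O waits: I/O to that cell hangs until diskmon evicts the failed cell.
Brownout time when a cell or CELLSRV fails
– The Outage Matrix documents, for each version, how long applications are affected when each component fails
– The latest Outage Matrix is in the following Note
–Oracle Exadata Database Machine Unplanned Outage Matrix 11203 BP7 – 11.2.3.1.1 (Doc ID 1471527.1)
Detecting cell and CELLSRV failures
(11.2.0.3 + 11.2.3.1.1)
Example 1) CELLSRV process failure, etc.
Brownout time: a few seconds (8 seconds at most)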



Stopping the software stack on the cell
– Because CELLSRV, MS, and RS are all stopped, the processes cannot be restarted
[root@jigenc01 ~]# date; service celld stop; date;
Mon Aug 6 19:16:53 JST 2012
Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of services was successful.
Mon Aug 6 19:17:04 JST 2012
[root@jigenc01 ~]#
Checking the Diskmon log
2012-08-06 19:16:52.281: [ DISKMON][16663:1105365312] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2012-08-06 19:16:55.284: [ DISKMON][16663:1105365312] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2012-08-06 19:16:58.285: [ DISKMON][16663:1105365312] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2012-08-06 19:16:58.463: [ DISKMON][16663:1096874304] dskm_tcpmon_thrd_main: Detected a cell death o/192.168.20.51. Posting hbb thread.
2012-08-06 19:16:58.463: [ DISKMON][16663:1111669056] dskm_hb_thrd_main7: posted out of skgxpwait()
2012-08-06 19:16:58.463: [ DISKMON][16663:1111669056] dskm_hb_thrd_main7.1: posted by TCPmon thread
2012-08-06 19:16:58.463: [ DISKMON][16663:1111669056] INFO: Entering Cell Reconnect: rscnam: o/192.168.20.51 rsc: 0x10b81040 state: UNREACHABLE reconn_attempts: 0 last_reconn_ts: 1344248182
2012-08-06 19:16:58.463: [ DISKMON][16663:1111669056] dskm_node_guids_are_offline: query SM done. retcode = 56891(REACHABLE)
2012-08-06 19:16:58.464: [ DISKMON][16663:1111669056] dskm_oss_get_net_info3: oss_get_net_info for device o/192.168.20.51 failed with error 5 (nip = 1)
2012-08-06 19:16:58.464: [ DISKMON][16663:1111669056] dskm_ant_rsc_monitor_start1: dskm_oss_get_net_info failed with error 56841
2012-08-06 19:16:58.464: [ DISKMON][16663:1111669056] dskm_ant_rsc_monitor_start: rscnam: o/192.168.20.51 rsc: 0x10b81040 state: UNREACHABLE reconn_attempts: 1 last_reconn_ts: 1344248218
2012-08-06 19:17:00.466: [ DISKMON][16663:1111669056] dskm_node_guids_are_offline: query SM done. retcode = 56891(REACHABLE)
2012-08-06 19:17:00.467: [ DISKMON][16663:1111669056] dskm_oss_get_net_info3: oss_get_net_info for device o/192.168.20.51 failed with error 5 (nip = 1)
2012-08-06 19:17:00.467: [ DISKMON][16663:1111669056] dskm_ant_rsc_monitor_start1: dskm_oss_get_net_info failed with error 56841
2012-08-06 19:17:00.467: [ DISKMON][16663:1111669056] dskm_ant_rsc_monitor_start: rscnam: o/192.168.20.51 rsc: 0x10b81040 state: UNREACHABLE reconn_attempts: 2 last_reconn_ts: 1344248220
2012-08-06 19:17:10.302: [ DISKMON][16663:1105365312] dskm_process_msg5: received msg type KGZM_PING (0x0011)
2012-08-06 19:17:10.478: [ DISKMON][16663:1111669056] dskm_node_guids_are_offline: query SM done. retcode = 56891(REACHABLE)
2012-08-06 19:17:10.479: [ DISKMON][16663:1111669056] dskm_oss_get_net_info3: oss_get_net_info for device o/192.168.20.51 failed with error 5 (nip = 1)
2012-08-06 19:17:10.479: [ DISKMON][16663:1111669056] dskm_ant_rsc_monitor_start1: dskm_oss_get_net_info failed with error 56841
2012-08-06 19:17:10.479: [ DISKMON][16663:1111669056] dskm_ant_rsc_monitor_start: rscnam: o/192.168.20.51 rsc: 0x10b81040 state: UNREACHABLE reconn_attempts: 7 last_reconn_ts: 1344248230
2012-08-06 19:17:10.479: [ DISKMON][16663:1111669056] dskm_queue_tcpmon_request: posting
2012-08-06 19:17:10.479: [ DISKMON][16663:1111669056] dskm_post_tcpmon_thrd
2012-08-06 19:17:10.480: [ DISKMON][16663:1096874304] dskm_tcpmon_thrd_main: posted, poll returned with retcode = 45
2012-08-06 19:17:10.480: [ DISKMON][16663:1096874304] dskm_tcpmon_thrd_main: Got a request with type 2, cellname = o/192.168.20.51, cellname length 16, cell incarnation = 0
2012-08-06 19:17:10.480: [ DISKMON][16663:1096874304] dskm_tcpmon_thrd_main: Cant find the corresponding monitor request in progress, unmonitor request will be ignored
After the reconnect fails 8 times, diskmon begins evicting the cell.
diskmon then notifies dskm that the cell is down.
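The time diskmon spends between detecting the cell death and giving up on reconnection can be read off the log above. The following Python sketch extracts it from three of the log lines shown earlier; `summarize_cell_death` is a hypothetical helper, and it assumes the `YYYY-MM-DD HH:MM:SS.mmm:` timestamp prefix and the `reconn_attempts:` field seen in this excerpt.

```python
import re
from datetime import datetime

TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):")
ATTEMPTS = re.compile(r"reconn_attempts: (\d+)")


def summarize_cell_death(lines):
    """Return the last reconnect attempt counter and seconds since cell death."""
    death_at = None
    last_attempt_at = None
    attempts = 0
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if not stamp:
            continue
        when = datetime.strptime(stamp.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if "Detected a cell death" in line:
            death_at = when
        found = ATTEMPTS.search(line)
        if found and death_at is not None:
            attempts = int(found.group(1))
            last_attempt_at = when
    if death_at is None or last_attempt_at is None:
        return None
    return {
        "attempts": attempts,
        "seconds": (last_attempt_at - death_at).total_seconds(),
    }


log = [
    "2012-08-06 19:16:58.463: [ DISKMON][16663:1096874304] dskm_tcpmon_thrd_main: Detected a cell death o/192.168.20.51. Posting hbb thread.",
    "2012-08-06 19:16:58.464: [ DISKMON][16663:1111669056] dskm_ant_rsc_monitor_start: rscnam: o/192.168.20.51 rsc: 0x10b81040 state: UNREACHABLE reconn_attempts: 1 last_reconn_ts: 1344248218",
    "2012-08-06 19:17:10.479: [ DISKMON][16663:1111669056] dskm_ant_rsc_monitor_start: rscnam: o/192.168.20.51 rsc: 0x10b81040 state: UNREACHABLE reconn_attempts: 7 last_reconn_ts: 1344248230",
]

result = summarize_cell_death(log)
print(result)
```

For this excerpt the gap is about 12 seconds, which matches the 19:17:10 timestamp at which the ASM alert log below starts taking the disks offline.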
Mon Aug 06 19:17:10 2012
Exadata cell: o/192.168.20.51 is no longer accessible. I/O errors to disks on this might get suppressed
Mon Aug 06 19:17:10 2012
NOTE: process _user14223_+asm1 (14223) initiating offline of disk 74.3916043773 (DATA_H_CD_05_JIGENC01) with mask 0x7e[0x7f] in group 1
WARNING: Disk 74 (DATA_H_CD_05_JIGENC01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 72 (DATA_H_CD_04_JIGENC01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 73 (DATA_H_CD_02_JIGENC01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 75 (DATA_H_CD_01_JIGENC01) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
<snip: repeated for every grid disk on the cell>
NOTE: initiating PST update: grp = 1, dsk = 79/0xe96a1602, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 80/0xe96a1603, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 81/0xe96a1604, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 82/0xe96a1605, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 1, dsk = 83/0xe96a1606, mask = 0x6a, op = clear
Mon Aug 06 19:17:10 2012
NOTE: process _user26170_+asm1 (26170) initiating offline of disk 66.3916043034 (DBFS_DG_CD_11_JIGENC01) with mask 0x7e[0x7f] in group 4
WARNING: Disk 66 (DBFS_DG_CD_11_JIGENC01) in group 4 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 60 (DBFS_DG_CD_08_JIGENC01) in group 4 in mode 0x7f is now being taken offline on ASM inst 1
WARNING: Disk 61 (DBFS_DG_CD_07_JIGENC01) in group 4 in mode 0x7f is now being taken offline on ASM inst 1
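The disks on the affected cell are taken offline immediately
2-1. The TCP connection is cut, so the cell death is detected immediately
Proactive disk drop
(11.2.1.3.1 and later)
– The disk is proactively dropped from the ASM disk group before the ASM timeout expires
– ASM rebalances the data on the failed disk onto other disks
This restores the redundancy of the ASM disk group promptly and reduces the risk of data loss
Processes involved in automatic repair
– MS process
– XDMG process (Exadata Automation Manager)
– XDWK process (Exadata Automation Worker)
Conditions that trigger Proactive Disk Drop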
–HDD
–Flash disk
–HDD
–Flash disk
–Flash disk
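Note: simply pulling an HDD out physically does not put the disk into the Failed state.
In that case the handling is the same as before: the ASM disk is taken offline and, once disk_repair_time expires, it is dropped. When the disk is reinserted, the grid disks and the cell disk are re-created automatically.
The ASM disk offline and add operations are performed automatically.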