Author: 李晓辉

Contact:

1. WeChat: Lxh_Chat

2. Email: 939958092@qq.com

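Describing the OSD Map

The OSD map contains the address and state of each OSD, the list of pools and their details, and other information such as the OSD near-capacity limits.

When the cluster infrastructure changes, for example when an OSD joins or leaves the cluster, the MONs update the corresponding map. The MONs maintain a history of map revisions. Ceph identifies each version of each map with an ordered set of incrementing integers called epochs.

Use the ceph status -f json-pretty command to display the epoch of each map. Use the ceph <map> dump subcommand (for example, ceph osd dump) to display an individual map.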

[root@serverc ~]# ceph status -f json-pretty | more

{
    "fsid": "2ae6d05a-229a-11ec-925e-52540000fa0c",
    "health": {
        "status": "HEALTH_OK",
        "checks": {},
        "mutes": []
    },
    "election_epoch": 122,
    "quorum": [
        0,
        1,
        2,
        3
    ],
    "quorum_names": [
        "serverc.lab.example.com",
        "clienta",
        "serverd",
        "servere"
    ],
    "quorum_age": 3832,
    "monmap": {
        "epoch": 4,
        "min_mon_release_name": "pacific",
        "num_mons": 4
    },
    "osdmap": {
        "epoch": 290,
        "num_osds": 9,
        "num_up_osds": 9,
        "osd_up_since": 1722915456,
        "num_in_osds": 9,
        "osd_in_since": 1635491258,
        "num_remapped_pgs": 0
    },
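The per-map epochs can also be read programmatically. The following is a minimal sketch, assuming the JSON has the same shape as the ceph status output in this section; the helper name map_epochs is hypothetical, and the sample is a trimmed copy of that output rather than a query against a live cluster.

```python
import json

# Trimmed sample modeled on the `ceph status -f json-pretty` output in this post.
SAMPLE_STATUS = """
{
  "election_epoch": 122,
  "monmap": {"epoch": 4, "num_mons": 4},
  "osdmap": {"epoch": 290, "num_osds": 9, "num_up_osds": 9, "num_in_osds": 9}
}
"""


def map_epochs(status):
    """Return {section_name: epoch} for each top-level section with an 'epoch' key.

    Hypothetical helper for illustration; it only inspects the parsed JSON.
    """
    return {
        name: section["epoch"]
        for name, section in status.items()
        if isinstance(section, dict) and "epoch" in section
    }


if __name__ == "__main__":
    epochs = map_epochs(json.loads(SAMPLE_STATUS))
    for name, epoch in sorted(epochs.items()):
        print(f"{name}: epoch {epoch}")
```

On a real cluster, the same function could presumably be fed the output of ceph status -f json captured with subprocess, but that path is not exercised here.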

View the OSD information

As expected, the output shows the epoch, the pool information, the OSD information, and more.

[root@serverc ~]# ceph osd dump | more
epoch 290
fsid 2ae6d05a-229a-11ec-925e-52540000fa0c
created 2021-10-01T09:30:32.028240+0000
modified 2024-08-07T06:56:44.969909+0000
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 36
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release pacific
stretch_mode_enabled false
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 261 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 262 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.log' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 263 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 264 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 265 lfor 0/184/182 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 8 application rgw
pool 6 'abc' erasure profile default size 4 min_size 3 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 290 flags hashpspool stripe_width 8192 application rbd
pool 7 'ecpool' erasure profile lxhecprofile size 6 min_size 5 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 289 flags hashpspool stripe_width 16384 application rbd
max_osd 9
osd.0 up in weight 1 up_from 190 up_thru 281 down_at 189 last_clean_interval [24,185) [v2:172.25.250.12:6800/1196803373,v1:172.25.250.12:6801/1196803373] [v2:172.25.249.12:6802/1196803373,v1:172.25.249.12:6803/1196803373] exists,up 5be66be9-8262-4c4b-9654-ed549f6280f7
osd.1 up in weight 1 up_from 189 up_thru 276 down_at 188 last_clean_interval [24,185) [v2:172.25.250.12:6807/4220845531,v1:172.25.250.12:6808/4220845531] [v2:172.25.249.12:6809/4220845531,v1:172.25.249.12:6810/4220845531] exists,up 3f751363-a03c-4b76-af92-8114e38bfa09
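To show how the fields in this dump relate to the description of the OSD map (epoch, capacity ratios, pools, per-OSD state), here is a rough sketch that summarizes the plain-text output. It assumes the line layout seen in this post; the function name summarize_osd_dump and the shortened sample are illustrative only. For anything beyond a quick look, the structured form from ceph osd dump -f json is likely a more robust input than the text format.

```python
# Sample lines copied from the `ceph osd dump` output in this post (OSD lines shortened).
SAMPLE_DUMP = """\
epoch 290
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
pool 6 'abc' erasure profile default size 4 min_size 3 crush_rule 3
max_osd 9
osd.0 up in weight 1 up_from 190 up_thru 281 down_at 189
osd.1 up in weight 1 up_from 189 up_thru 276 down_at 188
"""


def summarize_osd_dump(text):
    """Collect the epoch, capacity ratios, pool names, and OSD up/in states.

    Hypothetical helper: it relies on whitespace-separated fields and would
    misread pool names that contain spaces.
    """
    summary = {"epoch": None, "ratios": {}, "pools": [], "osds": {}}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key = fields[0]
        if key == "epoch":
            summary["epoch"] = int(fields[1])
        elif key.endswith("_ratio"):
            summary["ratios"][key] = float(fields[1])
        elif key == "pool":
            # Third field is the quoted pool name, e.g. 'abc'.
            summary["pools"].append(fields[2].strip("'"))
        elif key.startswith("osd."):
            # Second and third fields are the up/down and in/out states.
            summary["osds"][key] = (fields[1], fields[2])
    return summary


if __name__ == "__main__":
    print(summarize_osd_dump(SAMPLE_DUMP))
```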

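Analyzing OSD Map Updates

Ceph updates the OSD map every time an OSD joins or leaves the cluster. An OSD can leave the Ceph cluster because of an OSD failure or a hardware failure.

OSDs do not use a leader to manage the OSD map; they propagate the map among themselves. OSDs tag every message they exchange with the OSD map epoch. When an OSD detects that it has fallen behind, it performs a map update with its peer OSD, and the receiving node applies the change as an incremental map update.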
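The epoch tagging and incremental catch-up described above can be pictured with a toy model. This is only a sketch under simplifying assumptions, not Ceph's implementation: the ToyOSD class, its in-memory "map" (a dict of OSD states), and the way incrementals are stored are all invented for illustration.

```python
class ToyOSD:
    """Toy model of an OSD that tracks a map epoch and per-epoch incrementals."""

    def __init__(self, name):
        self.name = name
        self.epoch = 0
        self.state = {}         # the "map": OSD name -> "up" or "down"
        self.incrementals = {}  # epoch -> change that produced that epoch

    def apply(self, epoch, change):
        """Apply one incremental; it must be exactly the next epoch."""
        if epoch != self.epoch + 1:
            raise ValueError("incrementals must be applied in epoch order")
        self.state.update(change)
        self.incrementals[epoch] = change
        self.epoch = epoch

    def send(self, peer, payload):
        """Every message carries the sender's current map epoch."""
        peer.receive(self, self.epoch, payload)

    def receive(self, sender, sender_epoch, payload):
        """If the sender is ahead, pull only the missing incrementals from it."""
        if sender_epoch > self.epoch:
            for epoch in range(self.epoch + 1, sender_epoch + 1):
                self.apply(epoch, sender.incrementals[epoch])


a, b = ToyOSD("osd.0"), ToyOSD("osd.1")
a.apply(1, {"osd.2": "up"})
a.apply(2, {"osd.2": "down"})
a.send(b, "heartbeat")      # b notices epoch 2 > 0 and catches up
print(b.epoch, b.state)     # prints: 2 {'osd.2': 'down'}
```

The point of the sketch is that the lagging OSD replays epochs 1 and 2 in order instead of receiving a full copy of the map.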

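Propagating the OSD Map

OSDs regularly report their status to the monitors. In addition, OSDs exchange heartbeats so that an OSD can detect the failure of a peer and report that event to the monitors.

When the leader monitor learns of an OSD failure, it updates the map, increments the epoch, and uses the Paxos update protocol to notify the other monitors, revoking their leases at the same time. After a majority of monitors acknowledge the update and the cluster has a quorum, the leader monitor issues a new lease so that the monitors can distribute the updated OSD map. This approach ensures that the map epoch never goes backward anywhere in the cluster and that no older lease is found to be still valid.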
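The lease-and-majority flow above can be sketched as a toy model. It is heavily simplified and is not the Paxos protocol Ceph runs: the ToyMonitor class, the leader_update function, and the reachable parameter are assumptions made purely for illustration. The monitor names are borrowed from the quorum_names list in the ceph status output.

```python
class ToyMonitor:
    """Toy monitor holding a map epoch and a lease flag."""

    def __init__(self, name):
        self.name = name
        self.epoch = 0
        self.has_lease = True


def leader_update(leader, peons, reachable):
    """Toy flow: revoke leases, propose epoch + 1, re-lease on majority ack.

    `reachable` is the set of peon names that acknowledge the proposal
    (a hypothetical stand-in for real network behavior).
    """
    monitors = [leader] + peons
    for mon in monitors:
        mon.has_lease = False          # revoke leases before changing the map
    proposed = leader.epoch + 1
    acked = [leader] + [p for p in peons if p.name in reachable]
    if len(acked) * 2 <= len(monitors):
        return False                   # no majority: nothing is committed
    for mon in acked:
        mon.epoch = proposed           # commit the new epoch
        mon.has_lease = True           # new lease: may serve the updated map
    return True


leader = ToyMonitor("serverc")
peons = [ToyMonitor(n) for n in ("clienta", "serverd", "servere")]
ok = leader_update(leader, peons, reachable={"clienta", "serverd"})
for mon in [leader] + peons:
    print(mon.name, mon.epoch, mon.has_lease)
```

In this run three of four monitors acknowledge, so the update commits. The unreachable monitor keeps its old epoch but holds no lease, so in this model it cannot hand out the stale map.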