Ceph BlueStore WAL/DB on Software RAID1 for Redundancy

As you may already know, installing Ceph on top of hardware or software RAID is not recommended because of performance issues. There is one scenario where RAID is still useful, though. To quote the Ceph documentation:

If there is a mix of fast and slow devices (spinning and solid state), it is recommended to place block.db on the faster device while block (data) lives on the slower (spinning drive).


If block.db/block.wal is placed on a faster device (SSD/NVMe) and that device dies, you lose every OSD that uses it. Depending on your CRUSH rule, such an event could cost you all your data. The best way to mitigate this is to use RAID1 for the fast device holding your block.db/WAL data.

If you have hardware RAID, this is an easy task. If you don't, here is the LVM-based software RAID1 setup I used in one of my implementations:

# Assuming we have 1TB HDD OSD
pvcreate /dev/sdc
# change 01 to 02 for your second OSD
vgcreate cephblock01 /dev/sdc
lvcreate -l 100%FREE -n data cephblock01

# here I had 2 x 512GB ssd
pvcreate /dev/sda1 /dev/sdb1
vgcreate cephdb /dev/sda1 /dev/sdb1

# Optionally create swap on the RAID1 SSD/NVMe pair
lvcreate --mirrors 1 --type raid1 -L 4G -n swap cephdb

# for each 1TB OSD reserve 40GiB for block-db
# (LV names must be unique within the VG, so change 01 to 02 for your second OSD)
lvcreate --mirrors 1 --type raid1 -L 40G -n block-db01 cephdb
# for each 1TB OSD reserve 10GiB for block-wal
lvcreate --mirrors 1 --type raid1 -L 10G -n block-wal01 cephdb

# change 01 to 02 for your second OSD
ceph-volume lvm create --bluestore --data cephblock01/data --block.db cephdb/block-db01 --block.wal cephdb/block-wal01
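
Since the per-OSD steps only differ by the 01/02 suffix and the HDD device, they can be generated instead of retyped. Below is a minimal dry-run sketch: it only prints the commands and executes nothing. The function name print_osd_cmds, the second HDD /dev/sdd, and the per-OSD LV names (block-db01, block-wal01, ...) are my own assumptions for illustration; the cephdb VG on the SSD pair is assumed to exist already, as created above.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints the commands for one OSD,
# it does NOT run them.
# Usage: print_osd_cmds <osd-suffix> <hdd-device> [db-vg]
print_osd_cmds() {
    id=$1               # two-digit OSD suffix, e.g. 01, 02
    hdd=$2              # slow data device, e.g. /dev/sdc
    dbvg=${3:-cephdb}   # VG on the SSD pair, created once beforehand

    # Sizes (40G DB, 10G WAL per 1TB OSD) are the ones used in this post.
    cat <<EOF
pvcreate ${hdd}
vgcreate cephblock${id} ${hdd}
lvcreate -l 100%FREE -n data cephblock${id}
lvcreate --mirrors 1 --type raid1 -L 40G -n block-db${id} ${dbvg}
lvcreate --mirrors 1 --type raid1 -L 10G -n block-wal${id} ${dbvg}
ceph-volume lvm create --bluestore --data cephblock${id}/data --block.db ${dbvg}/block-db${id} --block.wal ${dbvg}/block-wal${id}
EOF
}

print_osd_cmds 01 /dev/sdc
print_osd_cmds 02 /dev/sdd   # /dev/sdd is an assumed second HDD
```

Review the printed commands before running them. Afterwards, lvs -a -o name,segtype,devices cephdb should list each DB/WAL volume with segment type raid1 and sub-volumes on both SSDs.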

You can even use RAID0 for faster DB/WAL reads/writes, but RAID0 has no redundancy, so you will need to plan your CRUSH rule wisely.
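
To get a rough feel for the trade-off, the sketch below estimates how many OSDs' DB/WAL volumes fit on the SSD pair in each layout. It is only back-of-the-envelope arithmetic: the function name max_osds is hypothetical, the 476 GiB figure is my approximation of a 512GB SSD, and LVM metadata and extent rounding are ignored.

```shell
#!/bin/sh
# Rough capacity estimate (hypothetical helper); all sizes in GiB.
# Usage: max_osds <raid1|raid0> <ssd_a> <ssd_b> <swap> <db> <wal>
max_osds() {
    mode=$1; a=$2; b=$3; swap=$4; db=$5; wal=$6

    # The smaller device limits both layouts.
    if [ "$a" -lt "$b" ]; then small=$a; else small=$b; fi

    case $mode in
        raid1) usable=$small ;;           # every LV is stored on both devices
        raid0) usable=$((small * 2)) ;;   # LVs are striped evenly across both
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac

    # Whatever is left after swap is divided into per-OSD DB+WAL slices.
    echo $(( (usable - swap) / (db + wal) ))
}

max_osds raid1 476 476 4 40 10   # prints 9
max_osds raid0 476 476 4 40 10   # prints 18
```

RAID0 roughly doubles the number of OSDs the pair can serve, but a single SSD failure then takes all of them down at once, so the CRUSH failure domain should treat those OSDs as one unit.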

