Storage
A Ceph build flag that caused a major performance drop
zphj1987 · 2024-03-21 · updated 2024-04-17

Background

Browsing Ceph's official blog recently, I came across a post describing a build flag on Ubuntu that causes a RocksDB performance regression. The root cause appears to be that a flag in Ceph's own build code never took effect.
Affected systems:

- Ceph releases before Pacific (the P release)
- the operating system is Ubuntu
- certain Ceph package builds

Several conditions have to line up, so a running environment is not necessarily affected. Let's pin down exactly which builds are affected; non-Ubuntu systems can ignore this issue.

I know the 15.x (Octopus) release best, so I will use it as the example.
Affected build

This is the build synced from Ceph upstream:

```
https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-15.2.17/pool/main/c/ceph/
```
Unaffected build

```
https://launchpad.net/ubuntu/focal/+source/ceph
```

This is the build packaged by Ubuntu itself.
Downloading the different versions

The apt sources on the test machine:
```
deb http://archive.ubuntu.com/ubuntu/ focal main restricted
deb http://archive.ubuntu.com/ubuntu/ focal-updates main restricted
deb http://archive.ubuntu.com/ubuntu/ focal universe
deb http://archive.ubuntu.com/ubuntu/ focal-updates universe
deb http://archive.ubuntu.com/ubuntu/ focal multiverse
deb http://archive.ubuntu.com/ubuntu/ focal-updates multiverse
deb http://archive.ubuntu.com/ubuntu/ focal-backports main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu/ focal-security main restricted
deb http://security.ubuntu.com/ubuntu/ focal-security universe
deb http://security.ubuntu.com/ubuntu/ focal-security multiverse
deb [trusted=yes] https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-15.2.17 focal main
```
The sources above are what the ceph packages get installed from: comment out the last line and you get Ubuntu's official build; keep it and you get the Ceph upstream build. The package version strings differ, so the two are easy to tell apart.
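To confirm which build apt will actually pick with these sources, a standard check works (nothing here is version-specific):

```
apt-get update
# The Ceph upstream repo wins by version: 15.2.17-1focal sorts higher
# than Ubuntu's 15.2.17-0ubuntu0.20.04.6 in dpkg version ordering.
apt-cache policy ceph
```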
Differences between the two versions

First download Ubuntu's debian packaging files:
```
wget https://launchpad.net/ubuntu/+archive/primary/+sourcefiles/ceph/15.2.17-0ubuntu0.20.04.6/ceph_15.2.17-0ubuntu0.20.04.6.debian.tar.xz
```
Ubuntu's package carries the debian/ directory split out as its own tarball.
Then download the Ceph upstream source tarball, which matches what is in git, and look at its contents:
```
wget https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-15.2.17/pool/main/c/ceph/ceph_15.2.17.orig.tar.gz
```
The Ceph upstream tarball already contains a debian/ directory, so we can inspect it directly.

What we need to compare is the content of debian/rules; a quick way to do the comparison is sketched below.
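A minimal sketch of that comparison, assuming the two downloads above are in the current directory; the /tmp paths are arbitrary:

```
mkdir -p /tmp/pkg-ubuntu /tmp/pkg-ceph
# Ubuntu ships only the debian/ directory; the upstream tarball is a full source tree
tar xf ceph_15.2.17-0ubuntu0.20.04.6.debian.tar.xz -C /tmp/pkg-ubuntu
tar xf ceph_15.2.17.orig.tar.gz -C /tmp/pkg-ceph
diff -u /tmp/pkg-ceph/ceph-15.2.17/debian/rules /tmp/pkg-ubuntu/debian/rules
```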
Ubuntu official:

```
export JAVAC=javac
extraopts += -DWITH_OCF=ON -DWITH_NSS=ON -DWITH_PYTHON3=ON -DWITH_DEBUG=ON
extraopts += -DWITH_PYTHON2=OFF -DMGR_PYTHON_VERSION=3
extraopts += -DWITH_CEPHFS_JAVA=ON
extraopts += -DWITH_CEPHFS_SHELL=ON
extraopts += -DWITH_TESTS=OFF
extraopts += -DWITH_SYSTEM_BOOST=ON
extraopts += -DWITH_LTTNG=OFF -DWITH_EMBEDDED=OFF
extraopts += -DCMAKE_INSTALL_LIBEXECDIR=/usr/lib
extraopts += -DWITH_MGR_DASHBOARD_FRONTEND=OFF
extraopts += -DWITH_SYSTEMD=ON -DCEPH_SYSTEMD_ENV_DIR=/etc/default
extraopts += -DCMAKE_INSTALL_SYSCONFDIR=/etc
extraopts += -DCMAKE_INSTALL_SYSTEMD_SERVICEDIR=/lib/systemd/system
extraopts += -DWITH_RADOSGW_KAFKA_ENDPOINT=OFF
extraopts += -DCMAKE_BUILD_TYPE=RelWithDebInfo
ifneq (,$(filter parallel=%,$(DEB_BUILD_OPTIONS)))
  NUMJOBS = $(patsubst parallel=%,%,$(filter parallel=%,$(DEB_BUILD_OPTIONS)))
  extraopts += -DBOOST_J=$(NUMJOBS)
endif
```
Ceph upstream:

```
extraopts += -DWITH_OCF=ON -DWITH_LTTNG=ON
extraopts += -DWITH_MGR_DASHBOARD_FRONTEND=OFF
extraopts += -DWITH_PYTHON3=3
extraopts += -DWITH_CEPHFS_JAVA=ON
extraopts += -DWITH_CEPHFS_SHELL=ON
extraopts += -DWITH_SYSTEMD=ON -DCEPH_SYSTEMD_ENV_DIR=/etc/default
extraopts += -DWITH_GRAFANA=ON
extraopts += -DCMAKE_INSTALL_LIBDIR=/usr/lib
extraopts += -DCMAKE_INSTALL_LIBEXECDIR=/usr/lib
extraopts += -DCMAKE_INSTALL_SYSCONFDIR=/etc
extraopts += -DCMAKE_INSTALL_SYSTEMD_SERVICEDIR=/lib/systemd/system
ifneq (,$(filter parallel=%,$(DEB_BUILD_OPTIONS)))
  NUMJOBS = $(patsubst parallel=%,%,$(filter parallel=%,$(DEB_BUILD_OPTIONS)))
  extraopts += -DBOOST_J=$(NUMJOBS)
endif
```
The difference that matters is this line:

```
extraopts += -DCMAKE_BUILD_TYPE=RelWithDebInfo
```
RelWithDebInfo is the CMake build type that is both optimized and debuggable (with GCC it expands to -O2 -g -DNDEBUG; if no build type is set at all, CMake contributes no optimization flags of its own).
With this parameter set, the rocksdb configure step also ends up with:

```
-DCMAKE_CXX_FLAGS='-Wno-deprecated-copy -Wno-pessimizing-move'
```

That is, the two warning-suppression flags come through, and some other debug behavior gets switched off; a production package should be built with this parameter.
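To see concretely what the build type contributes, here is a throwaway check; the directory and project name are made up, and with GCC it should print -O2 -g -DNDEBUG:

```
mkdir -p /tmp/buildtype-demo && cd /tmp/buildtype-demo
cat > CMakeLists.txt <<'EOF'
cmake_minimum_required(VERSION 3.10)
project(buildtype_demo CXX)
# Flags CMake appends on top of CMAKE_CXX_FLAGS for this build type
message(STATUS "RelWithDebInfo flags: ${CMAKE_CXX_FLAGS_RELWITHDEBINFO}")
EOF
cmake -S . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo | grep "RelWithDebInfo flags"
```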
Ceph upstream controls these flags in its own CMake instead, but when the deb package was built, the packaging brought flags of its own, the variable got clobbered, and the setting never took effect.
```
https://github.com/ceph/ceph/pull/55500
```
Upstream has now changed this (the PR above), which should resolve the issue; Ubuntu never had the problem because it controls the flag directly at the top level. The change modifies cmake/modules/BuildRocksDB.cmake:
```
 endif()
 include(CheckCXXCompilerFlag)
 check_cxx_compiler_flag("-Wno-deprecated-copy" HAS_WARNING_DEPRECATED_COPY)
+set(rocksdb_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
 if(HAS_WARNING_DEPRECATED_COPY)
-  set(rocksdb_CXX_FLAGS -Wno-deprecated-copy)
+  string(APPEND rocksdb_CXX_FLAGS " -Wno-deprecated-copy")
 endif()
 check_cxx_compiler_flag("-Wno-pessimizing-move" HAS_WARNING_PESSIMIZING_MOVE)
 if(HAS_WARNING_PESSIMIZING_MOVE)
-  set(rocksdb_CXX_FLAGS "${rocksdb_CXX_FLAGS} -Wno-pessimizing-move")
+  string(APPEND rocksdb_CXX_FLAGS " -Wno-pessimizing-move")
 endif()
 if(rocksdb_CXX_FLAGS)
   list(APPEND rocksdb_CMAKE_ARGS -DCMAKE_CXX_FLAGS='${rocksdb_CXX_FLAGS}')
```
During packaging you can check whether the two flags made it in:
```
[  8%] Performing configure step for 'rocksdb_ext'
cd /ceph/ceph-15.2.17/obj-x86_64-linux-gnu/src/rocksdb && /usr/bin/cmake -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DWITH_GFLAGS=OFF -DCMAKE_PREFIX_PATH= -DCMAKE_CXX_COMPILER=/usr/bin/c++ -DWITH_SNAPPY=TRUE -DWITH_LZ4=TRUE -DLZ4_INCLUDE_DIR=/usr/include -DLZ4_LIBRARIES=/usr/lib/x86_64-linux-gnu/liblz4.so -DWITH_ZLIB=TRUE -DPORTABLE=ON -DCMAKE_AR=/usr/bin/ar -DCMAKE_BUILD_TYPE=RelWithDebInfo -DFAIL_ON_WARNINGS=OFF -DUSE_RTTI=1 "-GUnix Makefiles" -DCMAKE_C_FLAGS=-Wno-stringop-truncation "-DCMAKE_CXX_FLAGS=' -Wno-deprecated-copy -Wno-pessimizing-move'" "-GUnix Makefiles" /ceph/ceph-15.2.17/src/rocksdb
-- Build files have been written to: /ceph/ceph-15.2.17/obj-x86_64-linux-gnu/src/rocksdb
```
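When rebuilding the package yourself, one way to confirm the flags actually reached the rocksdb configure step is to search the captured build output; build.log is just an assumed name for wherever the dpkg-buildpackage output was saved:

```
grep -A2 "Performing configure step for 'rocksdb_ext'" build.log \
  | grep -oE "CMAKE_BUILD_TYPE=[A-Za-z]+|-Wno-deprecated-copy|-Wno-pessimizing-move" \
  | sort -u
```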
So, to recap:

- Ceph added the flags in its own code; they were clobbered during packaging, which caused the performance drop.
- Ubuntu added a different parameter in its own ceph packaging that makes the two flags take effect, so its packages are fine.

That is roughly the story: a released package that was not built as an optimized build. The mistake is small but the impact fairly large, unless the package you happen to run is Ubuntu's own build.
Performance test

That covers where the problem comes from; now let's measure the difference directly.

For accuracy I built a single-node, single-replica environment with one NVMe OSD. The OSD is created on the host; it is then stopped and mapped into a docker container, where it is started by hand. The point of doing it this way is to keep the OSD itself unchanged and the container OS identical, minimizing variables: the only thing swapped out is the Ceph package. (For reference, a sketch of the pool setup follows.)
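The test pool used below can be created roughly like this (a sketch: the pg count is arbitrary, and on Octopus and later a size-1 pool has to be explicitly allowed):

```
# Allow replica count 1, then create the single-replica test pool
ceph config set global mon_allow_pool_size_one true
ceph osd pool create data 64 64
ceph osd pool set data size 1 --yes-i-really-mean-it
ceph osd pool set data min_size 1
```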
The container is started like this:

```
docker run -it --privileged=true --name ceph_deb -v /dev/:/dev/ -v /var/lib/ceph/:/var/lib/ceph -v /etc/ceph:/etc/ceph --network host ubuntu:focal /bin/bash
```
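Inside the container, install the matching packages from whichever apt source is enabled; the package names below are the standard focal ones:

```
apt-get update
# Installs either 15.2.17-1focal (Ceph repo) or 15.2.17-0ubuntu0.20.04.6
# (Ubuntu archive), depending on which sources list is active
apt-get install -y ceph-osd ceph-common
```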
Ceph official package

The installed version (from dpkg -l):

```
ii  ceph    15.2.17-1focal    amd64    distributed storage and file system
```
Start the OSD by hand:

```
/usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
```
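Before benchmarking, it's worth confirming the OSD actually came up:

```
ceph -s         # cluster status; expect 1 osd up
ceph osd tree   # osd.0 should be listed as "up"
```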
Run the test:

```
rados -p data bench 30 write -b 4096
```
Other commands would work too; small 4 KiB I/O shows the gap most clearly, and since the runs are short, the full outputs are pasted below.
```
[root@lab103 zp]# rados -p data bench 30 write -b 4096
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 30 seconds or 0 objects
Object prefix: benchmark_data_lab103_235759
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16      6811      6795   26.5416    26.543  0.00314961  0.00234824
    2      15     13061     13046   25.4781    24.418  0.00280007  0.00244734
    3      16     18804     18788   24.4609   22.4297  0.00179994  0.00255073
    4      16     24430     24414    23.839   21.9766   0.0027436  0.00261775
    5      16     29900     29884   23.3441   21.3672  0.00184661  0.00267347
    6      16     35338     35322   22.9932   21.2422  0.00376749  0.00271445
    7      16     40864     40848   22.7917   21.5859   0.0023982  0.00273855
    8      16     46654     46638   22.7695   22.6172  0.00265023   0.0027412
    9      15     52491     52476   22.7731   22.8047  0.00237885  0.00274086
   10      16     58334     58318   22.7775   22.8203   0.0032622  0.00274042
   11      16     64011     63995   22.7224   22.1758  0.00224359  0.00274685
   12      16     69693     69677   22.6782   22.1953  0.00282742  0.00275244
   13      16     75483     75467   22.6733   22.6172   0.0024294  0.00275293
   14      16     81331     81315   22.6852   22.8438  0.00243682  0.00275157
   15      16     87110     87094   22.6776   22.5742  0.00249853  0.00275236
   16      16     92892     92876   22.6717   22.5859  0.00234551  0.00275325
   17      16     98682     98666   22.6683   22.6172  0.00279748  0.00275354
   18      16    104350    104334   22.6388   22.1406  0.00247821  0.00275716
   19      16    110108    110092   22.6309   22.4922  0.00265833  0.00275813
2024-03-21T18:09:10.318468+0800 min lat: 0.000901059 max lat: 0.0108369 avg lat: 0.00275815
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16    115907    115891   22.6318   22.6523  0.00308136  0.00275815
   21      16    121632    121616   22.6188   22.3633  0.00107606  0.00275984
   22      16    127334    127318   22.6029   22.2734  0.00307552  0.00276152
   23      16    133012    132996   22.5844   22.1797  0.00268477  0.00276388
   24      16    138652    138636   22.5612   22.0312  0.00284827  0.00276685
   25      16    144273    144257   22.5369    21.957   0.0022021  0.00276986
   26      16    149771    149755    22.496   21.4766  0.00333811  0.00277466
   27      16    155250    155234   22.4553   21.4023  0.00249963  0.00277992
   28      16    160745    160729   22.4199   21.4648  0.00245874  0.00278427
   29      16    166247    166231   22.3878   21.4922  0.00295188  0.00278836
Total time run:         30.0023
Total writes made:      171762
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     22.3631
Stddev Bandwidth:       1.02477
Max bandwidth (MB/sec): 26.543
Min bandwidth (MB/sec): 21.2422
Average IOPS:           5724
Stddev IOPS:            262.341
Max IOPS:               6795
Min IOPS:               5438
Average Latency(s):     0.00279145
Stddev Latency(s):      0.000623496
Max latency(s):         0.0108369
Min latency(s):         0.000901059
Cleaning up (deleting benchmark objects)
Removed 171762 objects
Clean up completed and total clean up time :30.6685
```
Ubuntu official package

The installed version (from dpkg -l):

```
ii  ceph       15.2.17-0ubuntu0.20.04.6    amd64    distributed storage and file system
ii  ceph-base  15.2.17-0ubuntu0.20.04.6    amd64    common ceph daemon libraries and management tools
```
Start the OSD by hand:

```
/usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
```
Run the test:

```
rados -p data bench 30 write -b 4096
```
```
[root@lab103 zp]# rados -p data bench 30 write -b 4096
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 30 seconds or 0 objects
Object prefix: benchmark_data_lab103_236362
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16     15303     15287   59.7107   59.7148  0.00106137  0.00104294
    2      16     30338     30322   59.2169   58.7305 0.000799582  0.00105216
    3      16     46691     46675   60.7683   63.8789  0.00116935  0.00102532
    4      16     62548     62532   61.0597   61.9414 0.000970205  0.00102051
    5      16     78056     78040   60.9618   60.5781 0.000754982  0.00102213
    6      16     93210     93194   60.6659   59.1953 0.000796767  0.00102717
    7      16    108491    108475   60.5256   59.6914 0.000820318   0.0010296
    8      15    123545    123530     60.31   58.8086  0.00117671  0.00103331
    9      15    139028    139013   60.3281   60.4805   0.0010041  0.00103303
   10      16    154628    154612   60.3878   60.9336  0.00106916  0.00103203
   11      16    169869    169853   60.3095   59.5352  0.00100121  0.00103337
   12      16    185143    185127    60.255   59.6641 0.000874392  0.00103434
   13      16    200261    200245    60.162   59.0547 0.000937596  0.00103596
   14      16    212138    212122   59.1782   46.3945 0.000740705  0.00105322
   15      16    227092    227076   59.1267   58.4141 0.000774189  0.00105417
   16      16    241651    241635   58.9853   56.8711  0.00140325  0.00105664
   17      16    257762    257746    59.217   62.9336 0.000777715  0.00105251
   18      15    273338    273323   59.3072   60.8477 0.000842773  0.00105092
   19      16    288882    288866   59.3808   60.7148 0.000944547  0.00104963
2024-03-21T18:13:37.381819+0800 min lat: 0.000545632 max lat: 0.037882 avg lat: 0.00104838
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16    304446    304430   59.4512   60.7969 0.000774379  0.00104838
   21      16    320053    320037   59.5228   60.9648 0.000891544  0.00104713
   22      16    335395    335379   59.5409   59.9297  0.00105081  0.00104681
   23      16    351159    351143   59.6291   61.5781 0.000691308  0.00104527
   24      15    365701    365686   59.5112   56.8086 0.000910524  0.00104736
   25      16    381282    381266   59.5649   60.8594 0.000934152  0.00104642
   26      16    396656    396640   59.5834   60.0547 0.000904033  0.00104609
   27      15    411989    411974   59.5947   59.8984  0.00273626  0.00104583
   28      16    424394    424378   59.1965   48.4531 0.000757671  0.00105296
   29      15    435855    435840   58.6989   44.7734  0.00237303  0.00106191
Total time run:         30.0008
Total writes made:      449321
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     58.5038
Stddev Bandwidth:       4.48913
Max bandwidth (MB/sec): 63.8789
Min bandwidth (MB/sec): 44.7734
Average IOPS:           14976
Stddev IOPS:            1149.22
Max IOPS:               16353
Min IOPS:               11462
Average Latency(s):     0.00106545
Stddev Latency(s):      0.000482816
Max latency(s):         0.0431733
Min latency(s):         0.00047586
Cleaning up (deleting benchmark objects)
Removed 449321 objects
Clean up completed and total clean up time :22.541
```
The comparison: 14976 IOPS (58 MB/s) with the Ubuntu package versus 5724 IOPS (22 MB/s) with the Ceph package, roughly a 2.6x gap.

The difference is unmistakable. A read-side double-check is sketched below.
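To double-check from the read side as well, the benchmark objects can be kept and then read back; this is a sketch using the same pool:

```
rados -p data bench 30 write -b 4096 --no-cleanup   # keep objects for the read pass
rados -p data bench 30 rand -t 16                   # 4k random reads against them
rados -p data cleanup                               # remove the benchmark objects
```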
If you happen to be hit by this, you can either modify debian/rules the way Ubuntu did, or pull the Ceph PR into the source and rebuild the package. To verify the rebuild, check the flags in the packaging output, or simply re-run a single-node performance test like the one above.
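A rough sketch of the rebuild route, using the same version and URL as above; the sed anchor assumes the 15.2.17 debian/rules shown earlier, so treat it as illustrative:

```
# Fetch and unpack the upstream source (contains debian/)
wget https://mirrors.tuna.tsinghua.edu.cn/ceph/debian-15.2.17/pool/main/c/ceph/ceph_15.2.17.orig.tar.gz
tar xf ceph_15.2.17.orig.tar.gz && cd ceph-15.2.17

# Add the build-type line the Ubuntu packaging has, next to the other extraopts
sed -i '/^extraopts += -DWITH_OCF=ON/a extraopts += -DCMAKE_BUILD_TYPE=RelWithDebInfo' debian/rules

# Rebuild the packages, keeping a log to verify the rocksdb flags afterwards
apt-get build-dep -y ceph          # needs deb-src entries enabled
dpkg-buildpackage -us -uc -j"$(nproc)" 2>&1 | tee ../build.log
```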