Reduce redundant call of prepareClientToWrite when call addReply* continuously. #670
Conversation
Force-pushed from 3ec1437 to 4549e46 (compare)
Codecov Report: all modified and coverable lines are covered by tests ✅

Coverage Diff (unstable → #670):
  Coverage: 70.17% → 70.20% (+0.02%)
  Files: 110 → 110
  Lines: 60077 → 60100 (+23)
  Hits: 42160 → 42194 (+34)
  Misses: 17917 → 17906 (-11)
Ping @valkey-io/core-team, could you take a look at whether this optimization is sensible?
Having a couple of unusual optimizations that are dangerous to get wrong (we won't install the correct handler) makes me uneasy. Do we know why that specific function is taking 7% of the CPU? Is there some cache thrashing going on that maybe we can fix more generically? I read through the code in prepareClientToWrite and it seemed fairly straightforward.
Is this just an estimate, or measured using the same tests? Can you share the raw test numbers before and after the change?
+1.
@madolson @PingXie @zuiderkwast Thanks for your review on this.
@madolson I did some checking; the write handler should only be installed on the first call.
The extra cost may be caused by L1 cache misses (see the profiling results below).
@PingXie The 7% improvement is based on this test-suite https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1key-set-1K-elements-smembers.yml.
I did some profiling of cache misses; the results below show where they occur.
Your code looks correct. However, there are other cases, such as hitting the CoB limit, that will cause your implementation to behave slightly differently.
This is interesting. I don't have a dedicated bare metal server to test on, but we are just calling this function a lot.
Some other observations while looking at the code more:
I wonder if we should try to optimize the codepath first?
Force-pushed from 9b5d3a8 to b7b83bd (compare)
…tinuously. Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com> Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
@madolson Thanks for your suggestions.
You mean we could move the logic below outside: if (!clientHasPendingReplies(c) && io_threads_op == IO_THREADS_OP_IDLE) putClientInPendingWriteQueue(c);
Could you be more specific?
Do you want to combine them with this patch, or keep them separate?
I assume this is the data cache misses?
+1 on 1) and 3) but not sure about 2).
LGTM. Thanks @lipzhu!
Can you confirm that you get the expected performance boost with these changes, and how much is it?
Yes.
@PingXie I tested 4 test suites that may benefit from this patch; they are all from the https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites repo.
This change is much simpler, so I'm OK merging this if we are still seeing a good performance boost.
Yes. I disassembled it for my x86 machine, and there were a lot of conditional jumps which might have been causing some issues on the instruction cache. I'm surprised there wasn't a better way to optimize it though.
You're right, I glanced at it briefly and did the math wrong. This shouldn't be a concern.
@madolson Thanks for your help. I've updated the performance boost info in the top comment. For the suggestion listed in #687, I'd like to find a benchmark test suite later that can help us prove it really works. What do you think?
Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com> Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
I'm honestly okay just dropping that issue. I prefer an approach where we start with a benchmark we think is useful, and then try to optimize that case, like you did here. My suggestions were more to see if there was a way to have a more general optimization that might help the case you brought up.
Worth mentioning in the release notes as one of the performance optimizations?
There are always going to be a lot of little performance optimizations. In Redis we usually only listed large ones (maybe this counts as large?). I guess it's probably better to mark it and then we can decide later to remove them. |
…tinuously (#13412) This PR is based on the commits from PR valkey-io/valkey#670. ## Description While exploring hotspots by profiling some benchmark workloads, we noticed the high cycles ratio of `prepareClientToWrite`, taking about 9% of the CPU of the `smembers` and `lrange` commands. After a deep dive into the code logic, we found we can gain performance by reducing redundant calls of `prepareClientToWrite` when addReply* is called continuously. For example: in https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L1080-L1082, `prepareClientToWrite` is called three times in a row. --------- Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com> Co-authored-by: Lipeng Zhu <lipeng.zhu@intel.com> Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
Description

While exploring hotspots by profiling some benchmark workloads, we noticed the high cycles ratio of `prepareClientToWrite`, taking about 9% of the CPU of the `smembers` and `lrange` commands. After a deep dive into the code logic, we found we can gain performance by reducing redundant calls of `prepareClientToWrite` when addReply* is called continuously. For example: in https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L1080-L1082, `prepareClientToWrite` is called three times in a row.

Profiling of Hotspots
Test Environment
Performance Result
Perf Boost
Test Scenario
1. Start server
2. Prepare test data
3. Run benchmark
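A sketch of what the "Run benchmark" step might look like (host, port, key names, and the memtier_benchmark flags are assumptions, not taken from this PR). To stay self-contained it only prints the invocations that would exercise the multi-bulk reply paths this patch optimizes:

```shell
# Illustrative only: print memtier_benchmark commands for the two
# reply-heavy workloads discussed in this PR (SMEMBERS, LRANGE).
HOST=127.0.0.1
PORT=6379
for CMD in "SMEMBERS myset" "LRANGE mylist 0 -1"; do
    echo memtier_benchmark -s "$HOST" -p "$PORT" \
        -c 50 -t 4 --test-time=60 --hide-histogram "--command=$CMD"
done
```

Dropping the leading `echo` would run the benchmarks against a live server; the actual test suites in the redis-benchmarks-specification repo define their own client counts and durations.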