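C++ Chinese Weekly (C++ 中文周刊), Issue 94

Weekly project repository

WeChat official account

Submissions are welcome: recommend articles, software, or resources, whether your own or someone else's

Submit an issue

2022 is almost over. Happy New Year, everyone.

I think I was among the earlier ones to test positive in this wave. I hope everyone is okay.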


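Articles

A recap of value categories. Many readers may still be stuck at the auto material covered in Effective Modern C++.

Consider this requirement: encode binary data as ASCII. base64 is a bit complicated, so base16 is the simpler choice.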

void encode_scalar(const uint8_t *source, size_t len, char *target) {
  const uint16_t table[] = {
      0x3030, 0x3130, 0x3230, 0x3330, 0x3430, ...
      0x6366, 0x6466, 0x6566, 0x6666};
  for (size_t i = 0; i < len; i++) {
    uint16_t code = table[source[i]];
    ::memcpy(target, &code, 2);
    target += 2;
  }
}
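Since the lookup table above is elided, here is a minimal sketch that builds an equivalent 256-entry table at startup and runs the same per-byte loop. The names `make_hex_table` and `encode_base16` are hypothetical and do not come from the original post, and returning a `std::string` is a convenience assumption rather than the benchmarked interface.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Build the 256-entry table at startup instead of spelling it out.
// Entry b holds the two lowercase hex characters of byte b,
// high nibble first, stored as chars so endianness does not matter.
static std::array<std::array<char, 2>, 256> make_hex_table() {
  const char digits[] = "0123456789abcdef";
  std::array<std::array<char, 2>, 256> table{};
  for (int b = 0; b < 256; b++) {
    table[b][0] = digits[b >> 4];
    table[b][1] = digits[b & 0xf];
  }
  return table;
}

// Hypothetical helper: encodes len bytes into 2 * len hex characters.
std::string encode_base16(const uint8_t *source, size_t len) {
  static const auto table = make_hex_table();
  std::string out(2 * len, '\0');
  for (size_t i = 0; i < len; i++) {
    // One table lookup and one 2-byte copy per input byte.
    std::memcpy(&out[2 * i], table[source[i]].data(), 2);
  }
  return out;
}
```

Storing each entry as two chars rather than one `uint16_t` keeps the output identical on big-endian and little-endian machines, at the cost of departing slightly from the layout used in the original snippet.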

Obviously, this can be vectorized with SIMD:

  __m128i shuf = _mm_set_epi8('f', 'e', 'd', 'c', 'b', 'a', '9', '8', '7', '6',
                              '5', '4', '3', '2', '1', '0');
  size_t i = 0;
  __m128i maskf = _mm_set1_epi8(0xf);
  for (; i + 16 <= len; i += 16) {
    __m128i input = _mm_loadu_si128((const __m128i *)(source + i));
    __m128i inputbase = _mm_and_si128(maskf, input);
    __m128i inputs4 =
        _mm_and_si128(maskf, _mm_srli_epi16(input, 4));
    __m128i firstpart = _mm_unpacklo_epi8(inputs4, inputbase);
    __m128i output1 = _mm_shuffle_epi8(shuf, firstpart);
    __m128i secondpart = _mm_unpackhi_epi8(inputs4, inputbase);
    __m128i output2 = _mm_shuffle_epi8(shuf, secondpart);
    _mm_storeu_si128((__m128i *)(target), output1);
    target += 16;
    _mm_storeu_si128((__m128i *)(target), output2);
    target += 16;
  }

The full code is not listed here; it is at https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2022/12/23/base16.cpp

Straight to the performance numbers:

Method            Speed
table lookup      0.9 GB/s
128-bit vectors   6.4 GB/s
256-bit vectors   11 GB/s

The conventional approach:

static const std::unordered_set<std::string_view> special_set = {
    "ftp", "file", "http", "https", "ws", "wss"};

bool hash_is_special(std::string_view input) {
  return special_set.find(input) != special_set.end();
}

Enumerating the comparisons:

bool direct_is_special(std::string_view input) {
  return (input == "https") | (input == "http") | (input == "ftp") |
         (input == "file") | (input == "ws") | (input == "wss");
}

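Since these are special short strings, they can be converted directly to integers, which is quite an eye-opener:

// Assumes at least 8 bytes are readable at view.data(); callers must pad shorter inputs.
static inline uint64_t string_to_uint64(std::string_view view) {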
  uint64_t val;
  std::memcpy(&val, view.data(), sizeof(uint64_t));
  return val;
}

uint32_t string_to_uint32(const char *data) {
  uint32_t val;
  std::memcpy(&val, data, sizeof(uint32_t));
  return val;
}


bool fast_is_special(std::string_view input) {
  uint64_t inputu = string_to_uint64(input);
  if ((inputu & 0xffffffffff) == string_to_uint64("https\0\0\0")) {
    return input.size() == 5;
  }
  if ((inputu & 0xffffffff) == string_to_uint64("http\0\0\0\0")) {
    return input.size() == 4;
  }
  if (uint32_t(inputu) == string_to_uint32("file")) {
    return input.size() == 4;
  }
  if ((inputu & 0xffffff) == string_to_uint32("ftp\0")) {
    return input.size() == 3;
  }
  if ((inputu & 0xffffff) == string_to_uint32("wss\0")) {
    return input.size() == 3;
  }
  if ((inputu & 0xffff) == string_to_uint32("ws\0\0")) {
    return input.size() == 2;
  }
  return false;
}

Now the speed:

GCC 11 Intel Ice Lake

Method               Speed
std::unordered_map   20 ns/string
direct               9.1 ns/string
fast                 3.0 ns/string

Apple M2 LLVM 12

Method               Speed
std::unordered_map   14 ns/string
direct               5.5 ns/string
fast                 1.6 ns/string

The brute-force conversion works quite well.

Look at the code:
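The `fast_is_special` code above copies 8 bytes from `view.data()` regardless of the view's length, so it relies on the input being padded. As a sketch of the same integer-comparison idea without that requirement, the variant below copies only the available bytes into a zeroed integer. `load_padded` and `padded_is_special` are hypothetical names, not from the original post, and the timings above do not apply to this version.

```cpp
#include <cstdint>
#include <cstring>
#include <string_view>

// Copy at most 8 bytes of the view into a zero-initialized integer.
// Unlike the unpadded load, this never reads past view.size().
// Assumes view.data() is non-null whenever view.size() > 0.
static uint64_t load_padded(std::string_view view) {
  uint64_t val = 0;
  std::memcpy(&val, view.data(), view.size() < 8 ? view.size() : 8);
  return val;
}

// Hypothetical variant of the integer comparison for the same six schemes.
bool padded_is_special(std::string_view input) {
  // All six candidates are between 2 and 5 characters long.
  if (input.size() < 2 || input.size() > 5) {
    return false;
  }
  const uint64_t v = load_padded(input);
  // The length check guards against inputs with trailing NUL bytes,
  // which would otherwise look identical to a shorter padded candidate.
  auto is = [&](std::string_view s) {
    return input.size() == s.size() && v == load_padded(s);
  };
  return is("https") || is("http") || is("file") || is("ftp") ||
         is("wss") || is("ws");
}
```

The bounded `memcpy` trades some speed for safety on arbitrary input; when the caller can guarantee 8 readable bytes, the unpadded load shown earlier avoids the variable-length copy.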

struct person {
    int age;
    std::string name;
};

// ...

// person is an aggregate, so initialize its members in declaration order
auto alice = std::make_shared<person>(person{38, "Alice"});


std::shared_ptr<std::string> name(alice, &alice->name);
assert(alice.use_count() == name.use_count()); // single-threaded use only

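With this second form of construction (the aliasing constructor), name effectively becomes an alias of alice. What is wrong with writing it this way?

alice may have been tampered with or may already be invalid, and using alice's name at that point is a problem.

An extreme example:

std::shared_ptr<void> null;
std::shared_ptr<std::string> weirdo(null, &some_global_string_that_is_always_valid);

null is empty, yet weirdo is valid. https://godbolt.org/z/xT5qzK443

Avoid this shared_ptr constructor. It makes traps very easy to write.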
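To make the two behaviours above concrete, here is a small sketch of the aliasing constructor in both situations. The helper names `name_of` and `alias_empty_owner` are hypothetical, and the member order of `person` is chosen for this example only.

```cpp
#include <memory>
#include <string>

struct person {
  std::string name;
  int age;
};

// Returns an aliasing pointer to the name member. It shares ownership of the
// whole person object, so the person stays alive as long as the result does.
// Assumes p is non-null.
std::shared_ptr<std::string> name_of(const std::shared_ptr<person> &p) {
  return std::shared_ptr<std::string>(p, &p->name);
}

// Aliasing an empty owner: the result owns nothing (use_count() == 0), yet
// get() is non-null, so nothing keeps the pointee alive.
std::shared_ptr<std::string> alias_empty_owner(std::string *target) {
  std::shared_ptr<void> empty;
  return std::shared_ptr<std::string>(empty, target);
}
```

With a live owner, resetting the original `shared_ptr<person>` leaves the string usable because the alias still holds a reference. With an empty owner, the result tests as non-null but manages no lifetime at all, which is the trap described above.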

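Copying data is about the most common operation there is, with data constantly passed back and forth, so a fast memcpy matters.

So what does a fast memcpy implementation need to take into account?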

  • Fail: Copy-in-RAM and DMA engines
  • Both the CPU side and the memory side of caches matter
  • Load/Store Partial and Double-width-shift help significantly
  • Longer Prefetching matters
  • Letting the memory controller know about prefetch length can be 3x faster
  • Controlling cache pollution matters
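
Fixed-precision formatting of floating-point numbers

Looks very impressive. I have never really understood floating-point stuff.

An introduction to new C++23 features. Not much to add; this has been covered many times.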

template<auto... Ns> consteval auto fn() {
  std::vector v{Ns...};
  return std::size(v);
}

static_assert(3uz == fn<1, 2, 3>());

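Hmm, how to put it: I cannot think of a use for this code.

Avoid exceptions on performance-critical paths. Those who know, know. Enough said.

It can be a bit faster, but this is rarely the bottleneck.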

size_t sve_strlen(const char *s) {
  /*if(svcntb() > 256) {
    // do something here because we assume that our
    // vectors have no more than 256.
  }*/
  size_t len = 0;
  while (true) {
    svuint8_t input = svldff1_u8(svptrue_b8(), (const uint8_t *)s + len);
    svbool_t matches = svcmpeq_n_u8(svptrue_b8(), input, 0);
    if (svptest_any(svptrue_b8(), matches)) {
      return len + svlastb_u8(svbrka_z(matches, matches), svindex_u8(0, 1));
    }
    len += svcntb();
  }
}

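It is all a pile of painful code.

Have the timer set a flag internally, then let the outer UI event framework prioritize based on that flag?

No such thing. Stop dreaming.

A step-by-step guide to embedding complex file data, such as a tar archive, with CMake.

Worth a look.

Videos

cppcon

A talk about struct_pack. It is similar to Boost.PFR.

Open source projects looking for help

  • asteria: an embeddable scripting language that is looking for contributors long-term. Please lend a hand, friends; you can also join group 384042845 to spar with the author

New projects / version updates

  • saf: a scheduler built on top of asio
  • libenvpp: A modern C++ library for type-safe environment variable parsing
  • xmake.sh: an xmake script

Permalink to this article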