Releases · wanghenshui/cppweeklynews

13 Apr 14:23

wanghenshui

v1.5.3

1451711

C++ 中文周刊 (C++ Chinese Weekly) 2024-03-30, Issue 153

This issue is sponsored by HNY

There have been few blog posts lately, so this issue mostly bundles them together

News

Standards committee updates / IDE / compiler news go here

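The hottest story lately is without doubt the backdoor planted in xz: the person behind it contributed to the project, earned the maintainers' trust, and gained direct commit access

They slipped the trojan in while the main maintainer was away, but it had flaws: while running tests, an engineer noticed sshd using more CPU than expected and traced the culprit to xz

Straight out of Infernal Affairs

Articles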

How can I tell C++ that I want to discard a nodiscard value?

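Use std::ignore, or declare `decltype(std::ignore) _;` and assign to `_`; or don't roll your own and wait for C++26, where `_` becomes a proper placeholder name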
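A minimal sketch of the common ways to drop a `[[nodiscard]]` result without a warning. `try_reserve` is a hypothetical function made up for illustration. Note that `std::ignore = expr;` works on all mainstream implementations but was only formally specified for this use in C++26.

```cpp
#include <cassert>
#include <tuple>  // std::ignore

// Hypothetical function whose result callers are expected to check.
[[nodiscard]] int try_reserve(int n) { return n > 0 ? n : -1; }

// Each statement discards the result without triggering an unused-result warning.
void discard_results() {
    std::ignore = try_reserve(8);                  // assign to std::ignore
    static_cast<void>(try_reserve(16));            // classic cast to void
    [[maybe_unused]] auto kept = try_reserve(32);  // named, explicitly unused
}
```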

Step-by-Step Analysis of Crash Caused by Resize from Zero

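resize throws when its argument is negative (e.g. a count that accidentally went negative through overflow): the parameter is size_t, so the negative value wraps to a huge size and resize throws (typically std::length_error)

To be honest, exceptions in the standard library are sometimes strange and heavy-handed. There ought to be a clear line between side effects and exceptions, but for now everything just throws

Take stoi throwing: in these cases expected<T> would be a better fit, or traditional C-style return-value handling would be more reasonable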
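A minimal sketch of the non-throwing style preferred above, built on `std::from_chars` (C++17). `parse_int` is a hypothetical helper, not a standard function, and it returns `std::optional` rather than `std::expected` only so that it compiles as C++17.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parses the whole string as a base-10 int. Instead of throwing
// std::invalid_argument / std::out_of_range like std::stoi, it returns
// std::nullopt on any failure, including trailing junk.
// (Unlike stoi, from_chars also rejects leading whitespace.)
std::optional<int> parse_int(std::string_view s) {
    int value{};
    const char* first = s.data();
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```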

RDMA性能优化经验浅谈（一）

An introductory explainer

GCC 14 Boasts Nice ASCII Art For Visualizing Buffer Overflows

The warnings become much easier to read

Improvements to static analysis in the GCC 14 compiler

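GCC 14 extends the static analyzer enabled with -fanalyzer (the option itself dates back to GCC 10)

This includes the buffer-overflow diagrams above and infinite-loop detection, e.g. https://godbolt.org/z/vn55nn43z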

void test (int m, int n) {
  float arr[m][n];
  for (int i = 0; i < m; i++)
    for (int j = 0; j < n; i++)
      arr[i][j] = 0.f;
  /* etc */
}

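The inner loop increments i instead of j, so its condition j < n never changes and the loop never ends

The compiler can catch this: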

<source>: In function 'test':
<source>:5:23: warning: infinite loop [CWE-835] [-Wanalyzer-infinite-loop]
    5 |     for (int j = 0; j < n; i++)
      |                     ~~^~~
  'test': events 1-5
    |
    |    5 |     for (int j = 0; j < n; i++)
    |      |                     ~~^~~  ~~~
    |      |                       |     |
    |      |                       |     (4) looping back...
    |      |                       (1) infinite loop here
    |      |                       (2) when 'j < n': always following 'true' branch...
    |      |                       (5) ...to here
    |    6 |       arr[i][j] = 0.f;
    |      |       ~~~~~~~~~        
    |      |             |
    |      |             (3) ...to here
    |

This is very useful: once GCC 14 is available to you, turn it on. It's basically free static analysis

A case in API ergonomics for ordered containers

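The issue is with ranges: if the two bounds are given in the wrong order, you can get undefined behavior

For example, the usual ways to get a range: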

std::set<int> x=...;

// elements in [a,b]
auto first = x.lower_bound(a);
auto last  = x.upper_bound(b);
 
while(first != last) std::cout<< *first++ <<" ";

// elements in [a,b)
auto first = x.lower_bound(a);
auto last  = x.lower_bound(b);

// elements in (a,b]
auto first = x.upper_bound(a);
auto last  = x.upper_bound(b);

// elements in (a,b)
auto first = x.upper_bound(a);
auto last  = x.lower_bound(b);

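All of these implicitly assume a < b. If that doesn't hold you're in trouble, and there seems to be no way to guard against the mistake other than a manual assert

That makes it error-prone. Can the interface avoid putting hidden costs on its users?

Boost.MultiIndex offers this interface: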

template<typename LowerBounder,typename UpperBounder>
std::pair<iterator,iterator>
range(LowerBounder lower, UpperBounder upper);

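Each bound is its own type with an implicit check built in. It doesn't look convenient at first, but combined with Boost.Lambda2 it reads very naturally: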

// equivalent to std::set<int>
boost::multi_index_container<int> x=...;

using namespace boost::lambda2;

// [a,b]
auto [first, last] = x.range(_1 >= a, _1 <= b);

// [a,b)
auto [first, last] = x.range(_1 >= a, _1 < b);

// (a,b]
auto [first, last] = x.range(_1 > a,  _1 <= b);

// (a,b)
auto [first, last] = x.range(_1 > a,  _1 < b);

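The idea is to hand back the range as a whole rather than having the caller assemble it by hand; even if a > b, the worst you get is an empty range

That is safer than the usage above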
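The same idea can be approximated for a plain `std::set` with a small wrapper. `closed_range` is a hypothetical helper (not a std or Boost API) that returns an empty range when the bounds are reversed instead of a pair whose first iterator comes after its last.

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <utility>

// Returns iterators delimiting the elements in [a, b].
// If b < a, returns the empty range [end, end) instead of invalid iterators.
template <class T>
std::pair<typename std::set<T>::const_iterator, typename std::set<T>::const_iterator>
closed_range(const std::set<T>& s, const T& a, const T& b) {
    if (b < a) {
        return {s.end(), s.end()};
    }
    return {s.lower_bound(a), s.upper_bound(b)};
}
```

Usage: `auto [first, last] = closed_range(x, a, b);` can then be iterated with `first != last` regardless of the order of `a` and `b`.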

Sigh, API design still has plenty of pitfalls to watch out for

C++ left arrow operator

A humorous bit of code (don't write this):

#include <iostream>
 
template<class T>
struct larrow {
    larrow(T* a_) : a(a_) { }
    T* a;
};
 
template <class T, class R>
R operator<(R (T::* f)(), larrow<T> it) {
    return (it.a->*f)();
}
 
template<class T>
larrow<T> operator-(T& a) {
    return larrow<T>(&a);
}
 
struct C {
    void f() { std::cout << "foo\n"; }    
};
 
int main() {
    C x;
    (&C::f)<-x;
}

Upgrading the compiler: undefined behaviour uncovered

TL;DR: a bug caused by an enum variable left uninitialized, just like an uninitialized int

Trivial, but not trivially default constructible

An example:

template<class T>
struct S {
    S() requires (sizeof(T) > 3) = default;
    S() requires (sizeof(T) < 5) = default;
};

static_assert(std::is_trivial_v<S<int>>);
static_assert(not std::is_default_constructible_v<S<int>>);

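S<int> is trivial, but both constrained default constructors are viable for it, so default construction is ambiguous and S<int> can't be default-constructed

You ask what use this is? None, really. Just pretend you never saw it

Today a reader also asked me what to do if T's constructor throws inside push_back

All I can say is that such a T is badly behaved; apart from bad_alloc, I don't want to deal with anything else

Let's all be well-behaved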
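For the reader's question, a small sketch of what the standard does promise: `std::vector::push_back` gives the strong exception guarantee, so if constructing the new element throws, the vector is left unchanged. `Picky` is a hypothetical type whose copy constructor throws on demand.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical element type whose copy constructor can be told to throw.
struct Picky {
    int value;
    bool throw_on_copy;
    Picky(int v, bool t) : value(v), throw_on_copy(t) {}
    Picky(const Picky& other) : value(other.value), throw_on_copy(other.throw_on_copy) {
        if (other.throw_on_copy) throw std::runtime_error("copy failed");
    }
    Picky& operator=(const Picky&) = default;
};

// Returns the vector's contents after a push_back whose element copy throws.
std::vector<Picky> push_after_failure() {
    std::vector<Picky> v;
    v.push_back(Picky{1, false});
    const Picky bad{2, true};
    try {
        v.push_back(bad);  // throws while constructing the new element
    } catch (const std::runtime_error&) {
        // strong guarantee: v still holds exactly the element pushed before
    }
    return v;
}
```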

Understanding and implementing fixed point numbers

Couldn't follow this one

Random distributions are not one-size-fits-all (part 1)

Random distributions are not one-size-fits-all (part 2)

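Random number generation depends heavily on the use case. Lemire's algorithm avoids the modulo (%) in the common case, yet in some scenarios it doesn't beat the version that uses %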
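For reference, a sketch of Lemire's published "nearly divisionless" bounded method: multiply a 32-bit random value by the range and keep the high 32 bits, computing `%` only on the rare path where the low bits fall below the range. The function names and the choice of `std::mt19937` as the 32-bit source are assumptions for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Returns a uniformly distributed value in [0, range), range > 0.
// gen() must produce uniformly distributed 32-bit values (mt19937 does).
template <class Gen>
std::uint32_t bounded_nearly_divisionless(Gen& gen, std::uint32_t range) {
    std::uint64_t m = std::uint64_t(std::uint32_t(gen())) * range;
    std::uint32_t low = std::uint32_t(m);
    if (low < range) {  // rare path: may need to reject and redraw
        std::uint32_t threshold = std::uint32_t(0u - range) % range;  // 2^32 mod range
        while (low < threshold) {
            m = std::uint64_t(std::uint32_t(gen())) * range;
            low = std::uint32_t(m);
        }
    }
    return std::uint32_t(m >> 32);
}

// The plain modulo version it is usually compared against (slightly biased).
template <class Gen>
std::uint32_t bounded_modulo(Gen& gen, std::uint32_t range) {
    return std::uint32_t(gen()) % range;
}
```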

How fast is rolling Karp-Rabin hashing?

It's just a rolling hash, like this:

uint32_t hash = 0;
for (size_t i = 0; i < len; i++) {
  hash = hash * B + data[i];
}
return hash;

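B is often a prime such as 31, but the exact value doesn't matter much here

Now consider substring matching, where you need the hash of every substring of length N in a long string. The code looks like this: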

for(size_t i = 0; i < len-N; i++) {
  uint32_t hash = 0;
  for(size_t j = 0; j < N; j++) {
    hash = hash * B + data[i+j];
  }
  //...
}

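The problem is that this is slow. How can we do better?

There is obvious redundant work: consecutive windows share N-1 characters, so the hash of the first window can be computed once up front

After that, each step multiplies by B, adds the incoming character, and subtracts the outgoing character times B^N: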

uint32_t BtoN = 1;
for(size_t i = 0; i < N; i++) { BtoN *= B; }

uint32_t hash = 0;
for(size_t i = 0; i < N; i++) {
  hash = hash * B + data[i];
}
// ...
for(size_t i = N; i < len; i++) {
  hash = hash * B + data[i] - BtoN * data[i-N];
  // ...
}

Did you follow that?

Precomputing like this can easily give a speedup of around five times

Code here: https://github.com/lemire/clhash/

And also: https://github.com/lemire/rollinghashcpp
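A self-contained sketch that puts the two snippets above side by side and checks that they agree. The function names and the use of `std::string_view` input are choices made for this example; all arithmetic wraps modulo 2^32 just like the original `uint32_t` code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

constexpr std::uint32_t B = 31;

// Hash of every length-N window, recomputed from scratch: O(len * N).
// (i + N <= size, so the last window is included.)
std::vector<std::uint32_t> window_hashes_naive(std::string_view s, std::size_t N) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; N > 0 && i + N <= s.size(); i++) {
        std::uint32_t hash = 0;
        for (std::size_t j = 0; j < N; j++) {
            hash = hash * B + static_cast<unsigned char>(s[i + j]);
        }
        out.push_back(hash);
    }
    return out;
}

// Same hashes with the rolling update: O(len).
std::vector<std::uint32_t> window_hashes_rolling(std::string_view s, std::size_t N) {
    std::vector<std::uint32_t> out;
    if (N == 0 || N > s.size()) return out;
    std::uint32_t BtoN = 1;
    for (std::size_t i = 0; i < N; i++) BtoN *= B;  // B^N
    std::uint32_t hash = 0;
    for (std::size_t i = 0; i < N; i++) {
        hash = hash * B + static_cast<unsigned char>(s[i]);
    }
    out.push_back(hash);
    for (std::size_t i = N; i < s.size(); i++) {
        // shift in s[i]; s[i - N] has been scaled up to B^N, so subtract it
        hash = hash * B + static_cast<unsigned char>(s[i])
             - BtoN * static_cast<unsigned char>(s[i - N]);
        out.push_back(hash);
    }
    return out;
}
```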

Videos

C++ Weekly - Ep 421 - You're Using optional, variant, pair, tuple, any, and expected Wrong!

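Don't return a plain value and let it convert into a wrapper such as optional: that defeats RVO, because the object has to be moved into the optional. Construct the optional directly instead, e.g. with make_optional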
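A small sketch of the point, assuming C++17 or later. `Tracked` is a hypothetical type that counts its copies and moves: returning a local `Tracked` from a function whose return type is `std::optional<Tracked>` costs one move into the optional, while `std::make_optional<Tracked>(args...)` constructs the object in place.

```cpp
#include <cassert>
#include <optional>

// Hypothetical type that counts every copy or move of itself.
struct Tracked {
    static inline int transfers = 0;
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& o) : value(o.value) { ++transfers; }
    Tracked(Tracked&& o) noexcept : value(o.value) { ++transfers; }
};

// Local object converted to optional on return: one extra move.
std::optional<Tracked> make_via_local(int v) {
    Tracked t{v};
    return t;
}

// Object constructed directly inside the returned optional: no copy or move.
std::optional<Tracked> make_in_place(int v) {
    return std::make_optional<Tracked>(v);
}
```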


24 Mar 16:47

wanghenshui

v1.5.2

1bd5bd5

C++ 中文周刊 (C++ Chinese Weekly) 2024-03-25, Issue 152

Project page

WeChat official account

QQ group: click to join

RSS https://github.com/wanghenshui/cppweeklynews/releases.atom

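Submissions welcome: recommend or self-recommend articles, software, resources, and more; leave a comment

News

Standards committee updates / IDE / compiler news go here

The C++26 Tokyo meeting is in full swing; Mick235711 has already posted the details, and so has the WeChat account, so I won't repeat them here

Articles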

Introduction To Low Latency Programming: External Processing

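It boils down to computing things ahead of time, including (but not limited to) generating them at build time with scripts, or with constexpr

Or split the work off asynchronously and let something else compute it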
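A minimal illustration of the "compute it ahead of time" idea with constexpr: a hypothetical popcount lookup table generated by the compiler, so the hot path is only table lookups. (C++20's `std::popcount` makes this particular table unnecessary; it only serves as an example of the technique.)

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Built entirely at compile time (C++17 constexpr).
constexpr std::array<std::uint8_t, 256> make_popcount_table() {
    std::array<std::uint8_t, 256> table{};
    for (int i = 0; i < 256; ++i) {
        int bits = 0;
        for (int x = i; x != 0; x >>= 1) bits += x & 1;
        table[i] = static_cast<std::uint8_t>(bits);
    }
    return table;
}

inline constexpr auto kPopcountTable = make_popcount_table();

// Counts set bits in a 32-bit value with four table lookups.
constexpr int popcount32(std::uint32_t v) {
    return kPopcountTable[v & 0xFF] + kPopcountTable[(v >> 8) & 0xFF] +
           kPopcountTable[(v >> 16) & 0xFF] + kPopcountTable[v >> 24];
}
```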

Condvars and atomics do not mix

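With a condition variable, the condition must be modified while holding the mutex, even if the variable is atomic; otherwise the waiter can miss the notification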
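A minimal sketch of the correct pattern; `ReadySignal` is a hypothetical name. The flag is atomic, but it is still stored under the mutex so the store cannot slip in between the waiter's predicate check and its blocking on the condition variable.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

struct ReadySignal {
    std::mutex m;
    std::condition_variable cv;
    std::atomic<bool> ready{false};  // atomic, yet still written under the mutex

    void set() {
        {
            // Without this lock the store could land after the waiter checked
            // `ready` but before it blocked, and the notify would be lost.
            std::lock_guard<std::mutex> lock(m);
            ready.store(true);
        }
        cv.notify_one();
    }

    void wait() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return ready.load(); });
    }
};
```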

Jumbled Protocol Buffer Message Layout

protoc reorders the fields you declare when it lays them out in memory

static void OptimizeLayoutHelper(std::vector<const FieldDescriptor*>* fields,
                                 const Options& options,
                                 MessageSCCAnalyzer* scc_analyzer) {
  if (fields->empty()) return;

  // The sorted numeric order of Family determines the declaration order in the
  // memory layout.
  enum Family {
    REPEATED = 0,
    STRING = 1,
    // Laying out LAZY_MESSAGE before MESSAGE allows a single memset to zero
    // MESSAGE and ZERO_INITIALIZABLE fields together.
    LAZY_MESSAGE = 2,
    MESSAGE = 3,
    ZERO_INITIALIZABLE = 4,
    OTHER = 5,
    kMaxFamily
  };

The author noticed that after removing some of the many fields, performance should have improved, but it didn't

message Stats {
    int64 ts                   = 1;
    int64 show                 = 2;
    int64 click                = 3;
    int64 cost                 = 4;
    int64 hour_show            = 5;
    int64 hour_click           = 6;
    int64 hour_cost            = 7;
    int64 acc_show             = 8;
    int64 acc_click            = 9;
    int64 acc_cost             = 10;
    repeated int64 bucket      = 11;
    repeated int64 hour_bucket = 12;
    repeated int64 acc_bucket  = 13;
}

The trimmed version:

// After remove the `hour_*` fields
message StatsOpt {
    int64 ts                   = 1;
    int64 show                 = 2;
    int64 click                = 3;
    int64 cost                 = 4;
    int64 acc_show             = 5;
    int64 acc_click            = 6;
    int64 acc_cost             = 7;
    repeated int64 bucket      = 8;
    repeated int64 acc_bucket  = 9;
}

The original memory layout:

+------ 16 BYTE ------+- 8 BYTE -+------ 16 BYTE ------+- 8 BYTE -+------ 16 BYTE ------+
|    (11)bucket       | (11)size |      (12)hour       | (12)size |  (13)acc_bucket     |
+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+
| (13.a)   |   (1)ts  |  (2)show | (3)click |  (4)cost |    (5)   |    (6)   | (7)      |
+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+
|    (8)   |    (9)   |   (10)   |     *    |     *    |     *    |     *    |     *    |
+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+

The new one:

+------ 16 BYTE ------+- 8 BYTE -+------ 16 BYTE ------+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+
|    (8)bucket        | (8)size  |      (9)hour        | (9)size  | (13.a)   |   (1)ts  |
+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+
|  (2)show | (3)click |  (4)cost |    (5)   |    (6)   | (7)      |     *    |     *    |
+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+- 8 BYTE -+

The generated code:

struct Impl_ {
  ::PROTOBUF_NAMESPACE_ID::RepeatedField< int64_t > bucket_;
  mutable std::atomic<int> _bucket_cached_byte_size_;
  ::PROTOBUF_NAMESPACE_ID::RepeatedField< int64_t > acc_bucket_;
  mutable std::atomic<int> _acc_bucket_cached_byte_size_;
  int64_t ts_;
  int64_t show_;
  int64_t click_;
  int64_t cost_;
  int64_t acc_show_;
  int64_t acc_click_;
  int64_t acc_cost_;
  mutable ::PROTOBUF_NAMESPACE_ID::internal::CachedSize _cached_size_;
};

union { Impl_ _impl_; };

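You can see that the ts..cost group now straddles a cache line. Before, the message had more fields, but that group didn't cross a cache line; after the "optimization" it does, which makes things slower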
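A sketch of how to check this kind of straddling with `offsetof`, assuming a typical 64-bit ABI and an object that starts on a 64-byte boundary. `RepeatedStandIn` and `StatsOptImpl` are simplified stand-ins built from the diagram above, not protobuf's real generated types.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Stand-in for RepeatedField<int64_t>: assumed 16 bytes, 8-byte aligned.
struct RepeatedStandIn {
    std::int64_t storage;
    std::int32_t size;
    std::int32_t capacity;
};

// Simplified mirror of the generated Impl_ for StatsOpt.
struct StatsOptImpl {
    RepeatedStandIn bucket_;
    std::atomic<int> bucket_cached_byte_size_;
    RepeatedStandIn acc_bucket_;
    std::atomic<int> acc_bucket_cached_byte_size_;
    std::int64_t ts_;
    std::int64_t show_;
    std::int64_t click_;
    std::int64_t cost_;
    std::int64_t acc_show_;
    std::int64_t acc_click_;
    std::int64_t acc_cost_;
};

// True if the byte range [begin, end) spans more than one cache line.
constexpr bool spans_cache_lines(std::size_t begin, std::size_t end, std::size_t line = 64) {
    return begin / line != (end - 1) / line;
}
```

With this layout, `ts_` lands at offset 48 and `cost_` ends at offset 80, so the four hot fields cross the 64-byte boundary.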

C++ exit-time destructors

Covers the runtime details of exit-time destructors; very detailed and worth reading

Daily bit(e) of C++ | Coroutines: step by step

Yet another coroutine tutorial

C++23: Encoding related changes

Covers encoding-related improvements, including locale/multilingual support, Unicode, and so on

std::locale::global(std::locale("Russian.1251"));
auto s = std::format("День недели: {}", std::chrono::Monday);

Bug hunting in Btrfs

While debugging, the author found a bug in Btrfs and then tracked down where it was. Very detailed and worth reading; marked TODO here

C++20: Basic Chrono Terminology with Time Duration and Time Point

Guess how long std::chrono::months is? 30.436875 days

And std::chrono::years? 365.2425 days (the average Gregorian year)

Honestly speechless. Does such a dumb interface even need to exist?

Two handy GDB breakpoint tricks

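GDB tips

Jobs

Kingsoft is hiring; click the link if interested

Interaction

The WeChat account finally has a comment section

Things have been hectic lately, so updates are a bit late; these past weeks also lacked substantive articles, which is a pity. If something fun comes up I'll post it separately

Previous issue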


10 Mar 16:43

wanghenshui

v1.5.1

a7f44d3

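C++ 中文周刊 (C++ Chinese Weekly) 2024-03-09, Issue 151

Submissions welcome: recommend or self-recommend articles, software, resources, and more

Please open an issue or leave a comment

This issue is sponsored by 不语, HNY, and {}

A busy weekend, so there isn't much content; I'll try to put out two issues this week

Seeing other people's paid 知识星球 (Knowledge Planet) groups really makes me think: people pay for this?

C++ knowledge is still far from widespread, in both depth and breadth, and quality content is scarce, yet people who understand just a little already open paid groups and teach courses

News

Standards committee updates / IDE / compiler news go here

Articles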

A brief look at applications of intrusive data structures

Very well summarized

In C++/WinRT, you shouldn’t destroy an object while you’re co_awaiting it

If the object being co_awaited is destroyed while the await is in progress, you can hit a bug

struct MyThing : winrt::implements<MyThing, winrt::IInspectable>
{
    winrt::IAsyncAction m_pendingAction{ nullptr };

    winrt::IAsyncAction DoSomethingAsync() {
        auto lifetime = get_strong();
        m_pendingAction = LongOperationAsync();
        co_await m_pendingAction;
        PostProcessing();
    }

    void Cancel() {
        if (m_pendingAction) {
            m_pendingAction.Cancel();
            m_pendingAction = nullptr;
        }
    }
};

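In this code, if another thread calls Cancel while DoSomethingAsync is suspended in co_await, m_pendingAction is reset, the object being awaited is destroyed out from under it, and the co_await crashes

Fix: co_await a copy, either copied into a local first or via the C++23 decay-copy auto{...}

This kind of pointer problem has little to do with co_await itself, but it's easy to overlook. Ordinary functions can trigger it too, e.g. a lambda invoked asynchronously while something else resets the pointer; in such cases you may need to work on a copy of the object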
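A standard-C++ sketch of the same hazard outside coroutines, with no C++/WinRT involved. `Owner`, `Action`, and the `later` queue are hypothetical stand-ins for the object, the pending operation, and an async scheduler; the safe version captures its own copy of the `shared_ptr`, mirroring "co_await a copy".

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

struct Action { int id; };

struct Owner {
    std::shared_ptr<Action> pending;
    std::vector<std::function<void()>> later;  // callbacks run "asynchronously" later
    int last_seen = -1;

    // Risky: reads the member when the callback runs, after cancel() may have reset it.
    void schedule_unsafe() {
        later.push_back([this] { last_seen = pending->id; });
    }

    // Safer: the lambda holds its own copy, so the Action outlives cancel().
    void schedule_safe() {
        later.push_back([this, p = pending] { last_seen = p->id; });
    }

    void cancel() { pending.reset(); }

    void run_later() {
        for (auto& f : later) f();
        later.clear();
    }
};
```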

Borrow Checker, Lifetimes and Destructor Arguments in C++ Avanced compile-time validation with stateful

Made my eyes hurt. Live demo: https://godbolt.org/z/71qs619Ge

LLVM's 'RFC: C++ Buffer Hardening' at Google

Buffer hardening for C++; in Google's internal testing the performance regression after hardening was only about 1%. Hopefully LLVM merges it

RAII all the things?

Use unique_ptr for this; we've covered something similar before. Here's the code:

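#include <cstddef>     // std::size_t, std::nullptr_t
#include <cstdio>      // std::FILE, std::fclose
#include <memory>      // std::unique_ptr
#include <sys/mman.h>  // ::mmap, ::munmap (POSIX)
#include <unistd.h>    // ::close (POSIX)

struct fcloser {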
    void operator()(std::FILE* fp) const noexcept {
        std::fclose(fp);
    }
};

using file_ptr = std::unique_ptr<std::FILE, fcloser>;


struct mem_unmapper {
    size_t length{};

    void operator()(void* addr) const noexcept {
        ::munmap(addr, length);
    }
};

using mapped_mem_ptr = std::unique_ptr<void, mem_unmapper>;

[[nodiscard]] inline mapped_mem_ptr make_mapped_mem(void* addr, size_t length, int prot, int flags, int fd, off_t offset) {
    void* p = ::mmap(addr, length, prot, flags, fd, offset);
    if (p == MAP_FAILED) { // MAP_FAILED is not NULL
        return nullptr;
    }
    return {p, mem_unmapper{length}}; // unique_ptr owns a deleter, which remembers the length
}


// Intentionally non-RAII
class file_descriptor {
    int fd_{-1};
public:
    file_descriptor(int fd = -1): fd_(fd) {}
    file_descriptor(std::nullptr_t) {}
    operator int() const { return fd_; }
    explicit operator bool() const { return fd_ != -1; }
    friend bool operator==(file_descriptor, file_descriptor) = default; // Since C++20
};

struct fd_closer {
    using pointer = file_descriptor; // IMPORTANT
    void operator()(pointer fd) const noexcept {
        ::close(int(fd));
    }
};
//using unique_fd = std::unique_ptr<int,             fd_closer>; // Ok
using unique_fd = std::unique_ptr<file_descriptor, fd_closer>; // Ok
//using unique_fd = std::unique_ptr<void,            fd_closer>; // Ok

For other cases, roll your own defer or scope_exit
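A minimal scope guard of the kind meant here; this is a sketch, not `std::experimental::scope_exit`, and the class name simply borrows the idea. It runs the stored callable when the scope ends unless `release()` was called.

```cpp
#include <cassert>
#include <utility>

template <class F>
class scope_exit {
public:
    explicit scope_exit(F f) : f_(std::move(f)) {}
    scope_exit(const scope_exit&) = delete;
    scope_exit& operator=(const scope_exit&) = delete;
    ~scope_exit() {
        if (active_) f_();
    }
    void release() noexcept { active_ = false; }  // cancel the cleanup

private:
    F f_;
    bool active_ = true;
};
```

Usage: `scope_exit guard{[&] { std::fclose(fp); }};` relies on class template argument deduction (C++17) to pick up the lambda type.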

How do I make an expression non-movable? What’s the opposite of std::move?

How do you force a copy and prevent a move?

template<typename T>
std::remove_reference_t<T> const& no_move(T&& t)
{
    return t;
}

std::vector<int> v = no_move(make_vector());
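A quick way to see what `no_move` does, using a hypothetical `Counted` type that tracks copies and moves: binding the result to a const lvalue reference forces the copy constructor even when the argument is a temporary.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <typename T>
std::remove_reference_t<T> const& no_move(T&& t) {
    return t;  // always a const lvalue, so the caller must copy
}

// Hypothetical type that counts copies and moves.
struct Counted {
    static inline int copies = 0;
    static inline int moves = 0;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};

Counted make_counted() { return Counted{}; }
```

The temporary passed to `no_move` lives until the end of the full expression, so copying from the returned reference in the same statement is safe; keeping the reference around afterwards would dangle.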

How the JIT in Python 3.13 is implemented

This one is also kind of interesting

Pimpl without a pointer

The code is short:

class impl
{
public:
  impl();
  ~impl();

  int add(int x) const;

private:
  alignas(private_align) unsigned char buffer[private_size];
};

class impl_private
{
public:
  explicit impl_private(int y);

  int add(int x) const;

private:
  int _y;
};

template<typename T>
using impl_t = std::conditional_t<std::is_const_v<std::remove_pointer_t<T>>,
                                  impl_private const,
                                  impl_private>*;

#define IMPL std::launder(reinterpret_cast<impl_t<decltype(this)>>(buffer))

impl::impl()
{
  (void)new (buffer) impl_private(1);
}

impl::~impl()
{
  IMPL->~impl_private();
}

int impl::add(int x) const
{
  return IMPL->add(x);
}

The core idea is to keep the implementation object in a raw buffer and cast back to it. Just for fun
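A self-contained sketch of the same technique, with the parts the excerpt leaves out filled in by assumption: `private_size` and `private_align` get concrete values here and are checked with `static_assert`, and the body of `impl_private::add` is made up. Copying is deleted because copying the raw buffer would not properly copy the object living inside it.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Assumed values; a real header must keep these in sync with impl_private,
// which the static_asserts below enforce.
inline constexpr std::size_t private_size = sizeof(int);
inline constexpr std::size_t private_align = alignof(int);

class impl {
public:
    impl();
    ~impl();
    impl(const impl&) = delete;
    impl& operator=(const impl&) = delete;
    int add(int x) const;

private:
    alignas(private_align) unsigned char buffer[private_size];
};

// --- would normally live in the .cpp file ---
class impl_private {
public:
    explicit impl_private(int y) : y_(y) {}
    int add(int x) const { return x + y_; }

private:
    int y_;
};

static_assert(sizeof(impl_private) <= private_size, "buffer too small");
static_assert(alignof(impl_private) <= private_align, "buffer under-aligned");

impl::impl() { ::new (static_cast<void*>(buffer)) impl_private(1); }

impl::~impl() { std::launder(reinterpret_cast<impl_private*>(buffer))->~impl_private(); }

int impl::add(int x) const {
    return std::launder(reinterpret_cast<const impl_private*>(buffer))->add(x);
}
```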

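Jobs

ByteDance's graph database team is hiring; a friendly plug:

The ByteDance graph database ByteGraph team is hiring database R&D engineers to work on the core code of ByteGraph's storage engine, query engine, and database/compute fusion engine.
Internships, new-graduate, and experienced hires are all welcome, based in Beijing, Chengdu, Hangzhou, or Singapore. Because of visa requirements, Singapore roles have some level requirements; message for details.

Reach out on WeChat at Expelidarmas or by email at huyingqian@bytedance.com

Details here

They have published at VLDB, and ByteDance's business is strong too

Interaction

Still short on quality content; please send ideas

Previous issue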


03 Mar 15:46

wanghenshui

v1.5.0

d833c87

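C++ 中文周刊 (C++ Chinese Weekly) 2024-02-24, Issue 150

This issue is sponsored by 黄亮Anthony and Amnesia

The White House report making waves lately calls for moving to memory-safe languages, an obvious jab at C++

Apart from restating the NSA's earlier position, there's nothing new in it

It's like a spouse who wants a divorce complaining, "You never change, look at that other language"

If you ask me, this is exactly what's wrong with the US: nobody has any patience, from top to bottom

I've been too busy to watch the videos lately; I'll catch up later, and video summaries may be posted separately

News

Standards committee updates / IDE / compiler news go here

Our correspondent kenshin reports: Visual Studio recently shipped some very useful features, including build-time analysis, struct memory-layout visualization, and include analysis

If interested, upgrade: What’s New for C++ Developers in Visual Studio 2022 17.9

Clang and GCC have their own tools for this, e.g. Clang's -ftime-trace, or this one

xmake 2.8.7 released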

https://github.com/xmake-io/xmake/wiki/Xmake-v2.8.7-released,-Add-cosmocc-toolchain-support,-build%E2%80%90once-run%E2%80%90anywhere

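A new Boost parser library is under review: https://lists.boost.org/Archives/boost/2024/02/255957.php

It's similar to Boost.Spirit; the code is here: https://github.com/tzlaine/parser

think-cell weighed in: they maintain Boost.Spirit within their own library and think reinventing the wheel isn't reasonable. Details: https://www.think-cell.com/en/career/devblog/parsers-vs-unicode

Articles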

Rage Against The Glue: Beyond Run-Time Media Frameworks with Modern C++

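The audio/video world has an M × N problem

M different media processors times N platforms drives up API complexity until it becomes unmaintainable

The article explores an interface design that keeps the code lean

After all the deliberation, the answer is concepts plus Boost.PFR-style interface detection and policy templates

Code here

Some more ideas here

constexpr and consteval functions

Just remember this code: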

// This is a pure compile-time function.
// Any evaluation is fully done at compile-time;
// no runtime code will be generated by the compiler, just like `static_assert`.
consteval size_t strlen_ct(const char* s) {
    size_t n = 0;
    for (; s[n] != '\0'; ++n);
    return n;
}

// This is a pure runtime function, which can only be invoked at runtime.
size_t strlen(const char* s);

// This function can be invoked both at both compile-time and at runtime,
// depending on the context.
constexpr size_t strlen_dual(const char* s) {
    if consteval {
        return strlen_ct(s); // compile-time path
    } else {
        return strlen(s);    // runtime path
    }
}

For a constexpr function it's best to implement both branches, to avoid surprises

How to debug C and C++ programs with rr

rr is similar to GDB's record feature. I think this doc's examples would make a good video; marked TODO

Undefined behavior in C and C++

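Covers the situations that produce UB and how to avoid them. The suggested remedies: compiler flags, UBSan, and switching to another language. Gee, thanks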

Measuring energy usage: regular code vs. SIMD code

New work from Lemire: as long as your code is fast enough, even power-hungry SIMD instructions don't use much energy overall

Code here

Worth reproducing

My late discovery of std::filesystem - Part I

Introduces the filesystem APIs, such as directory iteration; I won't list them here

How to write unit tests in C++ relying on non-code files?

How to build non-code files together with your code; Bazel and CMake both have configure_file-style mechanisms. Come on, #embed, hurry up

Navigating Memory in C++: A Guide to Using std::uintptr_t for Address Handling

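Why use it? If you reinterpret_cast a pointer to std::uint32_t (or another integer too small for a pointer), it fails to compile on 64-bit targets; GCC reports an error that only -fpermissive would downgrade. For example: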

#include <cstdint>

namespace HAL {
class UART {
public:
 explicit UART(std::uint32_t base_address); 
 void write(std::byte byte);
 std::byte read() const;
 // ...
private:
 struct Registers* const registers;
};
}

constexpr std::uint32_t com1_based_address = 0x4002'0000U;

int main() {
 HAL::UART com1{com1_based_address}; 
 com1.write(std::byte{0x5A});
 auto val = com1.read();
 ...
}
namespace {
 struct Registers // layout of hardware UART
{ 
  std::uint32_t status;     // status register
  std::uint32_t data;      // data register
  std::uint32_t baud_rate;    // baud rate register
  ...
  std::uint32_t guard_prescaler; // Guard time and prescaler register
 };
 static_assert(sizeof(Registers) == 40, "Registers struct has padding");
 Registers Mock_registers{};
} 

// doctest
TEST_CASE("UART Construction") {
  constexpr std::uint32_t baud_115k = 0x8b;
  constexpr std::uint32_t b8_np_1stopbit = 0x200c;

  HAL::UART com3 {reinterpret_cast<std::uint32_t>(&Mock_registers)}; // error on 64-bit: a pointer doesn't fit in std::uint32_t

  CHECK(Mock_registers.baud_rate == baud_115k);
  CHECK(Mock_registers.ctrl_1 == b8_np_1stopbit);
}

But if the UART constructor takes std::uintptr_t instead, it works:

#include <cstdint>

namespace HAL {
class UART {
public:
 explicit UART(std::uintptr_t base_addr); 
 ...
};
}

// doctest
TEST_CASE("UART Construction") {
  constexpr std::uint32_t baud_115k = 0x8b;
  constexpr std::uint32_t b8_np_1stopbit = 0x200c;

  HAL::UART com3 {reinterpret_cast<std::uintptr_t>(&Mock_registers)};

  CHECK(Mock_registers.baud_rate == baud_115k);
  CHECK(Mock_registers.ctrl_1 == b8_np_1stopbit);
}

Why not intptr_t? If you need negative offsets, or differences that can be negative, intptr_t works; otherwise uintptr_t is the better fit
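A small sketch of the round trip that makes `std::uintptr_t` suitable here. `Registers` is a trimmed hypothetical register block and the helper names are made up. The standard guarantees that a pointer converted to a large-enough integer and back to the same pointer type keeps its value; `std::uintptr_t` itself is optional in the standard but present on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register block, standing in for the article's Registers.
struct Registers {
    std::uint32_t status;
    std::uint32_t data;
    std::uint32_t baud_rate;
};

inline std::uintptr_t address_of(Registers* r) {
    return reinterpret_cast<std::uintptr_t>(r);
}

inline Registers* registers_at(std::uintptr_t address) {
    return reinterpret_cast<Registers*>(address);
}
```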

Introduction To Low Latency Programming: Minimize Branching And Jumping

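The author has written a book; 120 pages for $10 is a bit pricey. The book is here: https://a.co/d/0U6KOfb

This article is an excerpt. The gist is to reduce jumps and branches:

Make use of && and || short-circuit evaluation
Prefer code that can compile to cmov: a conditional move costs about as much as a plain mov, so replacing test + jump with cmov is better. What produces cmov? The ternary (?:) operator
Use fewer virtual functions; emulating them with variant is still dynamic dispatch, don't fool yourself
Inline, e.g. with [[gnu::always_inline]]
Use __builtin_expect wisely
Turn if/else chains into a switch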
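A small sketch of two of the points above, with hypothetical names: a ternary select that compilers commonly (but not necessarily) turn into `cmov` on x86-64, and a branch hinted with GCC/Clang's `__builtin_expect`. The `LIKELY_FALSE` macro is made up here so the code still compiles on other compilers; C++20's `[[unlikely]]` attribute is the portable alternative.

```cpp
#include <cassert>

#if defined(__GNUC__) || defined(__clang__)
#define LIKELY_FALSE(x) __builtin_expect(!!(x), 0)  // GCC/Clang builtin
#else
#define LIKELY_FALSE(x) (x)
#endif

// A simple ternary select; often lowered to cmov rather than a jump.
inline int select_max(int a, int b) {
    return a > b ? a : b;
}

// Hints that the division-by-zero path is rare, so the common path falls through.
inline int safe_div(int a, int b) {
    if (LIKELY_FALSE(b == 0)) {
        return 0;
    }
    return a / b;
}
```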

An example:

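void processThisString(std::string_view input)
{
  if (input == "production") {
    processProd(input);
  } else if (input == "RC") {
    processRC(input);
  } else if (input == "beta") {
    processBeta(input);
  }
}

// Switch-based alternative to the version above: it dispatches on the first
// character only, so it assumes the input is one of the three strings and
// that their first characters differ.
void processThisString(std::string_view input)
{
  constexpr auto i = 0;
  if (input.empty()) return;  // input[i] on an empty view would be UB
  switch (input[i]) {
    case "production"[i]: processProd(input); break;
    case "RC"[i]: processRC(input); break;
    case "beta"[i]: processBeta(input); break;
  }
}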

1 << n vs. 1U << n and a cell phone autofocus problem

#include <stdio.h>

static unsigned long set_bit_a(int bit) {
  return 1 << bit;
}

static unsigned long set_bit_b(int bit) {
  return 1U << bit;
}

int main() {
  printf("sizeof(unsigned long) here: %zu\n", sizeof(unsigned long));

  for (int i = 0; i < 32; ++i) {
    printf("1 << %d : 0x%lx | 0x%lx\n", i, set_bit_a(i), set_bit_b(i));
  }

  return 0;
}

On a 64-bit machine, what does i = 31 print? https://gcc.godbolt.org/z/qa3o34hrW

Very funny:

1 << 0 : 0x1 | 0x1
1 << 1 : 0x2 | 0x2
1 << 2 : 0x4 | 0x4
1 << 3 : 0x8 | 0x8
1 << 4 : 0x10 | 0x10
1 << 5 : 0x20 | 0x20
...
1 << 29 : 0x20000000 | 0x20000000
1 << 30 : 0x40000000 | 0x40000000
1 << 31 : 0xffffffff80000000 | 0x80000000

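With -m32 this doesn't happen, because unsigned long is 32 bits there. The cause: the literal 1 is an int, so 1 << 31 shifts into the sign bit (undefined behavior in C) and yields a negative int, which is then sign-extended when converted to the 64-bit unsigned long; 1U << 31 is unsigned and is zero-extended instead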
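A minimal fix, assuming a 64-bit result is what you want: shift an unsigned 64-bit one instead of the int literal `1`. The helper name is hypothetical.

```cpp
#include <cstdint>

// Well-defined for 0 <= n < 64: the shifted value is already 64-bit unsigned,
// so there is no sign bit to hit and no sign extension afterwards.
constexpr std::uint64_t bit64(unsigned n) {
    return std::uint64_t{1} << n;
}
```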

Understanding memory_order from the bottom up

Worth working through

Using std::expected from C++23

If you don't know this yet, you'd better learn it

#include <charconv>
#include <expected>
#include <string>
#include <system_error> 
#include <iostream>

std::expected<int, std::string> convertToInt(const std::string& input) {
    int value{};
    auto [ptr, ec] = std::from_chars(input.data(), input.data() + input.size(), value);
    
    if (ec == std::errc())
        return value;

    if (ec == std::errc::invalid_argument)
        return std::unexpected("Invalid number format");
    else if (ec == std::errc::result_out_of_range)
        return std::unexpected("Number out of range");

    return std::unexpected("Unknown conversion error");
}

int main() {
    std::string userInput = "111111111111111";

    auto result = convertToInt(userInput);
    if (result)
        std::cout << "Converted number: " << *result << '\n';
    else
        std::cout << "Error: " << result.error() << '\n';
}

C23 offers a way to emulate it:

#include <stdio.h>

struct { bool success; int value; } parse(const char* s) {
    if (s == NULL)
       return (typeof(parse(0))) { false, 1};

    return (typeof(parse(0))) { true, 1 };
};

int main() {
   auto r = parse("1");
   if (r.success) {
     printf("%d", r.value);
   }
}

Not recommended

A detailed look at LLVM's uniformity analysis framework: divergence sources and instruction uniformity

Learn some LLVM

A story of a very large loop with a long instruction dependency chain

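In short: split the loop (loop fission) and shrink the data buffer so it fits in the L1 cache, which improves performance

Needs reproducing

How did they find out it was a memory-subsystem problem (the buffer not fitting in cache)? With likwid. This experiment is worth reproducing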

When an instruction depends on the previous instruction depends on the previous instructions… : long instruction dependency chains and performance

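In short: speed things up by interleaving independent pieces of work. This differs from loop fission: fission simply splits a loop, while interleaving feels more like distributing and scheduling different tasks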
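A minimal sketch of the dependency-chain idea, with hypothetical function names: one accumulator forms a single chain where every add waits for the previous one, while four accumulators give the CPU four independent chains to overlap. For plain integer sums an optimizing compiler may already do this (or vectorize), so treat it as an illustration of the pattern rather than a benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: a single long dependency chain.
std::uint64_t sum_one_chain(const std::vector<std::uint64_t>& v) {
    std::uint64_t s = 0;
    for (std::uint64_t x : v) s += x;
    return s;
}

// Four independent accumulators interleaved over the input.
std::uint64_t sum_four_chains(const std::vector<std::uint64_t>& v) {
    std::uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < v.size(); ++i) s0 += v[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}
```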

Atomics And Concurrency

Fairly basic, about the same as cppreference

The Auto macro

#pragma once

template<class L>
class AtScopeExit {
    L& m_lambda;
public:
    AtScopeExit(L& action) : m_lambda(action) {}
    ~AtScopeExit() noexcept(false) { m_lambda(); }
};

#define TOKEN_PASTEx(x, y) x ## y
#define TOKEN_PASTE(x, y) TOKEN_PASTEx(x, y)

#define Auto_INTERNAL1(lname, aname, ...) ...


27 Feb 15:45

wanghenshui

v1.4.9

075c7ee

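C++ 中文周刊 (C++ Chinese Weekly) 2024-02-17, Issue 149

Submissions welcome: recommend or self-recommend articles, software, resources, and more

Please open an issue or leave a comment

This issue is sponsored by 不语, 黄亮Anthony, Tudou, and kenshin

This barely counts as half an issue; there was no time on the trip back, so just a quick update

News

Standards committee updates / IDE / compiler news go here

February mailing

The focus is pretty clear: execution, reflection, graph, and the like; the rest is mostly fixes

fiber_context has been around for a long time, too

Articles

What to do when Clang has a bug? Let's fix the compiler together!

One read and the whole Clang/LLVM pipeline clicks into place. Everyone should learn it; LLVM is the future, friends

Why constexpr string doesn't work in C++20

Marvel at GCC's convoluted SSO implementation

Also, Clang didn't originally do SSO for constexpr strings, but it was recently added

We discussed it in the WeChat group too and asked MaskRay for his view; it's probably to match libstdc++'s behavior

I agree with that take: if it's constexpr anyway, why bother with SSO

The history of constexpr in C++!

I feel like I no longer recognize const

Velox: Meta’s Unified Execution Engine

Pretty interesting

too dangerous for c++

Compared with Rust, C++'s shared_ptr does little to constrain lifetimes, so it's easy to misuse; C++ is just too permissive

On the virtues of the trailing comma

A trailing comma at the end of each line, as below, also makes git merges friendlier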

// C, C++
Thing a[] = {
    { 1, 2 },
    { 3, 4 },
    { 5, 6 },
    //      ^ trailing comma
};

// C#
Thing[] a = new[] {
    new Thing {
        Name = "Bob",
        Id = 31415,
        //        ^ trailing comma
    },
    new Thing {
        Name = "Alice",
        Id = 2718,
        //       ^ trailing comma
    },
//   ^ trailing comma
};

var d = new Dictionary<string, Thing>() {
    ["Bob"] = new Thing("Bob") { Id = 31415 },
    ["Alice"] = new Thing("Alice", 2718),
    //                                  ^ trailing comma
};

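It has become something of an unwritten convention (C and C++ explicitly allow it in brace-enclosed initializer lists)

Formatting User-Defined Types in C++20

A simple implementation: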

#include <format>
#include <iostream>

class SingleValue {
 public: 
    SingleValue() = default;
    explicit SingleValue(int s): singleValue{s} {}
    int getValue() const {
        return singleValue;
    }
 private:
    int singleValue{};
};

template<>
struct std::formatter<SingleValue> : std::formatter<int> {             // (1)
  auto format(const SingleValue& singleValue, std::format_context& context) const {
    return std::formatter<int>::format(singleValue.getValue(), context);
  }
};

int main() {
    SingleValue singleValue0;
    SingleValue singleValue2020{2020};
    SingleValue singleValue2023{2023};

    std::cout << std::format("{:*<10}", singleValue0) << '\n';
    std::cout << std::format("{:*^10}", singleValue2020) << '\n';
    std::cout << std::format("{:*>10}", singleValue2023) << '\n';
}

Visual overview of a custom malloc() implementation

An introduction to a typical memory-pool style implementation

Vectorizing Unicode conversions on real RISC-V hardware

I can't read RISC-V, so this one is beyond me

C++20 Concepts Applied – Safe Bitmasks Using Scoped Enums

Just pasting the code:

template<typename T>
constexpr std::
  enable_if_t<
    std::conjunction_v<std::is_enum<T>,
      // look for enable_bitmask_operator_or
      // to  enable this operator ①
      std::is_same<bool,
        decltype(enable_bitmask_operator_or(
          std::declval<T>()))>>,
  T>
operator|(const T lhs, const T rhs) {
  using underlying = std::underlying_type_t<T>;
  return static_cast<T>(
    static_cast<underlying>(lhs) |
    static_cast<underlying>(rhs));
}
namespace Filesystem {
  enum class Permission : uint8_t {
    Read    = 0x01,  // distinct bits, so Execute is not Read | Write
    Write   = 0x02,
    Execute = 0x04,
  };
  // Opt-in for operator| ②
  constexpr bool 
    enable_bitmask_operator_or(Permission);
} // namespace Filesystem

The trick is to provide operator| only for enum classes that opt in by declaring enable_bitmask_operator_or

It's 2024 now; is there a newer way?

Concepts:

template<typename T>
requires(std::is_enum_v<T>and requires(T e) {
  // look for enable_bitmask_operator_or to
  // enable this operator ①
  enable_bitmask_operator_or(e);
}) constexpr auto
operator|(const T lhs, const T rhs) {
  using underlying = std::underlying_type_t<T>;
  return static_cast<T>(
    static_cast<underlying>(lhs) |
    static_cast<underlying>(rhs));
}
namespace Filesystem {
  enum class Permission : uint8_t {
    Read    = 0x01,
    Write   = 0x02,
    Execute = 0x04,
  };
  // Opt-in for operator| ②
  consteval 
    void enable_bitmask_operator_or(Permission);
} // namespace Filesystem

C++23 adds std::to_underlying:

template<typename T>
requires(std::is_enum_v<T>and requires(T e) {
  enable_bitmask_operator_or(e);
}) constexpr auto
operator|(const T lhs, const T rhs)
{
  return static_cast<T>(std::to_underlying(lhs) |
                        std::to_underlying(rhs));
}

A tiny bit cleaner

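Open-source project spotlight

asteria: an embeddable scripting language. The author is always looking for contributors, so please lend a hand; you can also join QQ group 753302367 to talk to the author directly
Unilang: a general-purpose programming language from deepin with some interesting ideas. It also needs people; if interested, check out the GitHub discussions or the deepin forum. It stays on the long-term recommendation list here
graphiz: a graph traversal demo library, quite fun
mantis: an implementation of P2M: A Fast Solver for Querying Distance from Point to Mesh Surface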


09 Feb 09:28

wanghenshui

v1.4.8

25b191b

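C++ 中文周刊 (C++ Chinese Weekly) 2024-02-09, Issue 148

Project page

QQ group: click to join

Submissions welcome: recommend or self-recommend articles, software, resources, and more

Please open an issue or leave a comment

This issue is sponsored by 黄亮Anthony, HNY, and 不语

Happy Lunar New Year, everyone

News

Boost adds Charconv

It brings from_chars to C++11. My advice: drop C++11, it's 2024, bro. Docs

Standards committee updates / IDE / compiler news go here

What’s New in vcpkg January 2024

There's also a campaign to rewrite Loki, which is a bit funny; click through if you're interested

(Hasn't Loki already been rewritten? That's what folly is.)

brpc 1.8 released, release notes

Articles

Using hugetlb to improve performance

Why doesn't Redis use it? To avoid slowing down RDB generation?

There's also a hugetop command

Code here: https://gitlab.com/procps-ng/procps

Misuses of C++ exceptions and how to improve them

Translated by a group member; very substantive and worth a read

Still, exceptions really are badly designed

[RFC] Upstreaming ClangIR https://discourse.llvm.org/t/rfc-upstreaming-clangir/76587/19

CIR, the MLIR-based C/C++ effort we discussed earlier, is getting ready to be upstreamed into LLVM

Clang clearly seems more aggressive, while GCC is still run by a bunch of old-timers

Option Soup: the subtle pitfalls of combining compiler flags https://hacks.mozilla.org/2024/01/option-soup-the-subtle-pitfalls-of-combining-compiler-flags/

An annoying locale problem: even if you link libstdc++ statically (-static-libstdc++), locale handling is not static

errno and libc https://dxuuu.xyz/errno.html

Is errno set by the kernel or by libc? By libc, of course

How to verify it: make the same system call once through the syscall() wrapper and once through raw assembly, and watch how errno changes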

static int use_wrapper(int cmd, union bpf_attr *attr, unsigned int size) {
    long ret;
    errno = 0;
    ret = syscall(__NR_bpf, cmd, attr, size);
    if (ret < 0)
        printf("wrapped syscall failed, ret=%ld, errno=%d\n", ret, errno);
    else
        printf("wrapped syscall succeeded\n");
}

The raw-assembly version:

static int use_raw(int cmd, union bpf_attr *attr, unsigned int size) {
    long ret;
    errno = 0;
    __asm__(
        "movq %1, %%rax\n"        /* syscall number */
        "movq %2, %%rdi\n"        /* arg1 */
        "movq %3, %%rsi\n"        /* arg2 */
        "movq %4, %%rdx\n"        /* arg3 */
        "syscall\n"
        "movq %%rax, %0\n"

        /* retval */
        : "=r"(ret)

        /* input operands */
        : "r"((long)__NR_bpf), "r"((long)cmd), "r"((long)attr), "r"((long)size)

        /* clobbers: the syscall instruction also overwrites rcx and r11 */
        : "rax", "rdi", "rsi", "rdx", "rcx", "r11", "memory"
       );

    /* Check return value */
    if (ret < 0)
        printf("raw syscall failed, ret=%ld, errno=%d\n", ret, errno);
    else
        printf("raw syscall succeeded\n");
}
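A smaller Linux-only check of the same idea, using `close` on an invalid descriptor instead of `bpf` so it needs no privileges. The kernel itself returns `-EBADF`; it is glibc's `syscall()` wrapper that turns this into a return value of `-1` plus `errno`. The function name is made up for this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/syscall.h>
#include <unistd.h>

// Invokes close(-1) through glibc's generic syscall() wrapper and reports
// the errno value the wrapper stored (0 if the call unexpectedly succeeded).
int errno_after_bad_close() {
    errno = 0;
    long ret = syscall(SYS_close, -1);
    return ret == -1 ? errno : 0;
}
```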

A GCC 7.3 bug: class template argument deduction fails in new-expression

template <typename T1, typename T2>
struct Bar {
    Bar(T1, T2) { }
};

int main() {
    auto x = Bar(1, 2);
    auto y = new Bar(3, 4);
    auto z = new Bar{3, 4};
}

With older GCC (these days "older" means 7 or 8), work around it with braces. Why list this? Because I ran into it

Unexpected Ways Memory Subsystem Interacts with Branch Prediction

https://github.com/ibogosavljevic/johnysswlab/blob/master/2023-12-branches-memory/binary_search.cpp#L53

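// `st` (a search_type value selecting which variant runs) is defined outside this excerpt; see the linked source.
int binary_search(int* array, int number_of_elements, int key) {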
    int low = 0, high = number_of_elements-1, mid;
    while(low <= high) {
        mid = (low + high)/2;

        if (st == search_type::REGULAR) {
            if(array[mid] < key)
                low = mid + 1; 
            else if(array[mid] == key)
                return mid;
            else
                high = mid-1;
        }

        if (st == search_type::CONDITIONAL_MOVE) {
            int middle = array[mid];
            if (middle == key) {
                    return mid;
                }

            int new_low = mid + 1;
            int new_high = mid - 1;
            __asm__ (
                "cmp %[array_middle], %[key];"
                "cmovae %[new_low], %[low];"
                "cmovb %[new_high], %[high];"
                : [low] "+&r"(low), [high] "+&r"(high)
                : [new_low] "g"(new_low), [new_high] "g"(new_high), [array_middle] "g"(middle), [key] "g"(key)
                : "cc"
            );
        }

        if (st == search_type::ARITHMETIC) {
            int middle = array[mid];
            if (middle == key) {
                return mid;
            }

            int new_low = mid + 1;
            int new_high = mid - 1;
            int condition = array[mid] < key;
            int condition_true_mask = -condition;
            int condition_false_mask = -(1 - condition);

            low += condition_true_mask & (new_low - low);
            high += condition_false_mask & (new_high - high); 

        }
    }
    return -1;
}

Array size (elements)	Original (runtime / instructions / CPI / mem. data volume)	Conditional moves (same columns)	Arithmetic (same columns)
4 K	0.22 s / 434 M / 1.96 / 0.45 GB	0.14 s / 785 M / 0.728 / 0.25 GB	0.19 s / 1102 M / 0.69 / 0.32 GB
16 K	0.26 s / 511 M / 2.01 / 0.49 GB	0.19 s / 928 M / 0.77 / 0.39 GB	0.24 s / 1308 M / 0.72 / 0.46 GB
64 K	0.32 s / 584 M / 2.143 / 0.48 GB	0.24 s / 1064 M / 0.90 / 0.25 GB	0.31 s / 1504 M / 0.82 / 0.26 GB
256 K	0.43 s / 646 M / 2.59 / 0.36 GB	0.39 s / 1199 M / 1.28 / 0.32 GB	0.47 s / 1698 M / 1.09 / 0.36 GB
1 M	0.56 s / 727 M / 3.05 / 0.67 GB	0.59 s / 1333 M / 1.72 / 0.59 GB	0.70 s / 1891 M / 1.42 / 0.68 GB
4 M	1.127 s / 798 M / 4.65 / 9.94 GB	1.48 s / 1467 M / 3.1 / 3.75 GB	1.59 s / 2084 M / 2.45 / 3.9 GB
16 M	1.65 s / 870 M / 6.26 / 18.48 GB	2.75 s / 1601 M / 4.16 / 6.95 GB	2.90 s / 2277 M / 3.18 / 7.05 GB
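For comparison, a portable sketch of a branchless lower bound without inline assembly. This is a different, widely used formulation (halving the remaining length each step) rather than the article's code, and whether the ternary actually becomes a `cmov` is up to the compiler.

```cpp
#include <cassert>
#include <cstddef>

// Returns a pointer to the first element >= key in the sorted array [first, first + n).
// For a given n the loop always runs the same number of iterations; the only
// data-dependent choice is the ternary, which compilers often lower to a cmov.
const int* branchless_lower_bound(const int* first, std::size_t n, int key) {
    const int* base = first;
    while (n > 1) {
        std::size_t half = n / 2;
        base = (base[half] < key) ? base + half : base;
        n -= half;
    }
    return (n == 1 && *base < key) ? base + 1 : base;
}
```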

Further reading

Quicksort partition code, scalar and then SSE:

static int partition(std::vector<float>& vector, int low, int high) {
    float pivot = vector[high];
    int i = (low - 1);
    for (int j = low; j < high; j++) {
        if (vector[j] <= pivot) {
            i++;
            std::swap(vector[i], vector[j]);
        }
    }
    i = i + 1;
    std::swap(vector[i], vector[high]);
    return i;
}

#include <smmintrin.h>  // SSE4.1: _mm_blendv_ps, _mm_extract_epi32
#include <utility>
#include <vector>

// Branchless version: each step exchanges *vector_i and *vector_j via blends and
// advances vector_i only when *vector_j < pivot (strictly less, unlike the <= above)
static int partition(std::vector<float>& vector, int low, int high) {
    float* vector_i = &vector[low];
    float* vector_j = &vector[low];
    float* vector_end = &vector[high];

    __m128 pivot = _mm_load_ss(&vector[0] + high);
    while(true) {
        if (vector_j >= vector_end) break;
        __m128 vec_i = _mm_load_ss(vector_i);
        __m128 vec_j = _mm_load_ss(vector_j);

        __m128 compare = _mm_cmplt_ss(vec_j, pivot); // if (vec_j < pivot)
        __m128 new_vec_i = _mm_blendv_ps(vec_i, vec_j, compare);
        __m128 new_vec_j = _mm_blendv_ps(vec_j, vec_i, compare);

        // the low lane of compare is all ones or all zeros, so this yields 1 or 0
        int increment = _mm_extract_epi32(_mm_castps_si128(compare), 0) & 0x1;

        _mm_store_ss(vector_i, new_vec_i);
        _mm_store_ss(vector_j, new_vec_j);

        vector_i += increment;

        vector_j++;
    }

    std::swap(*vector_i, *vector_end);
    return (vector_i - &vector[0]);
}
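Here is the same trick without intrinsics, as a minimal sketch. It is a Lomuto partition that always performs a select-based exchange and advances i by the 0/1 comparison result, so it never branches on the data. `partition_branchless` is a hypothetical name. Whether the compiler actually emits cmov/blend instructions for the selects is up to it. Like the SSE version, it moves only elements strictly less than the pivot.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Branchless Lomuto partition (sketch): no data-dependent branch in the loop body.
static int partition_branchless(std::vector<float>& v, int low, int high) {
    const float pivot = v[high];
    int i = low;  // v[low..i) holds the elements already known to be < pivot
    for (int j = low; j < high; ++j) {
        const bool less = v[j] < pivot;
        const float a = v[i];
        const float b = v[j];
        v[i] = less ? b : a;  // exchange v[i] and v[j] only when less is true
        v[j] = less ? a : b;
        i += less;            // bool converts to 0 or 1
    }
    std::swap(v[i], v[high]);  // put the pivot between the two halves
    return i;
}
```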

The code is here: https://github.com/ibogosavljevic/johnysswlab/blob/master/2022-01-sort/

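This feels worth a deeper dive. I asked the author for permission and will keep covering it in later issues, after reproducing the results locally.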

constexpr number parsing

Honestly, the from_chars interface is a bit awkward to use:

auto res = std::from_chars(str.data() + start, str.data() + str.size(), result);    
if (res.ec == std::errc{}) {...}

Structured bindings make it look a little nicer.
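Below is a small sketch of the structured-binding style, wrapped in a hypothetical helper `parse_int`. The helper also rejects trailing characters. That extra check is an assumption about what the caller wants and is not part of from_chars itself.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Parse a decimal int from str[start..], binding the from_chars_result members directly.
std::optional<int> parse_int(std::string_view str, std::size_t start = 0) {
    int result = 0;
    auto [ptr, ec] = std::from_chars(str.data() + start, str.data() + str.size(), result);
    if (ec == std::errc{} && ptr == str.data() + str.size()) {
        return result;  // success, and the whole remaining input was consumed
    }
    return std::nullopt;  // invalid input, out of range, or trailing characters
}
```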

In C++26, from_chars_result gains an explicit operator bool, so you can write it directly like this:

auto res = std::from_chars(str.data() + start, str.data() + str.size(), result);    
if (res) { ... }

converting string_view to time point

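#include <chrono>
#include <sstream>
#include <string>
#include <string_view>

// raw_data: the text to parse; fmtstring: a format such as "%Y-%m-%d %H:%M:%S"
std::chrono::sys_seconds convertToTimePoint(std::string_view raw_data, std::string_view fmtstring)
{
    std::chrono::sys_seconds syssec{};
    std::istringstream in{std::string{raw_data}};
    // C++20's chrono::parse takes the format as const CharT* or std::basic_string, not string_view
    in >> std::chrono::parse(std::string{fmtstring}, syssec);
    return syssec;
}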

No std::chrono::parse (not every standard library ships it yet)? Fall back to std::get_time:

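std::chrono::sys_seconds convertToTimePoint(std::string_view raw_data, std::string_view fmtstring)
{
    std::chrono::sys_seconds syssec{};
    std::istringstream in{std::string{raw_data}};
    std::tm tm = {};
    const std::string fmt{fmtstring};  // get_time needs a null-terminated format string
    in >> std::get_time(&tm, fmt.c_str());
    std::time_t time = std::mktime(&tm);  // note: mktime interprets tm as local time
    if(!in.fail())  // good() also turns false when eofbit is set at the end of the input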
      syssec =  std::chrono::time_point_cast< std::c...


29 Jan 03:07

wanghenshui

v1.4.7

abef22e

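C++ Chinese Weekly 2024-01-26, Issue 147

QQ group: click to join

Submissions welcome: recommend or self-recommend articles, software, resources, etc.

This issue is sponsored by 不语沧海彩虹蛇皮虾

JetBrains published its review of the C++ ecosystem in 2023: https://blog.jetbrains.com/clion/2024/01/the-cpp-ecosystem-in-2023/

Have a look if you're interested; there's not much of note in it.

News

Standards committee updates, IDE and compiler news go here

January mailing

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/#mailing2024-01

Articles

A brand-new constructor: the relocate constructor in C++

This idea has been discussed for a long time. Our old acquaintance Arthur O’Dwyer has written many related proposals and patches, which are worth reading if you're interested. It's an optimization opportunity.

We've covered it before, for example:

On the current state of trivial relocation and its open-source implementations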

Source	Function	T.r. types	Non-t.r. types	Throwing-move types	Rightward motion (`insert`)	Leftward motion (`erase`)	Non-pointer iterators
STL Classic (non-relocating)	`std::copy`	N/A	N/A	✓	UB	✓	✓
STL Classic (non-relocating)	`std::copy_n`	N/A	N/A	✓	UB	UB	✓
STL Classic (non-relocating)	`std::copy_backward`	N/A	N/A	✓	✓	UB	✓
cstring	`memcpy`	✓	UB	✓	UB	UB	SFINAE
cstring	`memmove`	✓	UB	✓	✓	✓	SFINAE
Qt	`q_uninitialized_relocate_n`	✓	✓	✓?	UB	UB	SFINAE
Qt	`q_relocate_overlap_n`	✓	✓	✓	✓	✓	SFINAE
BSL	`destructiveMove`	✓	✓	✓	UB	UB	SFINAE
P2786R0	`trivially_relocate`	✓	SFINAE	SFINAE	✓	✓	SFINAE
P2786R0	`relocate`	✓	✓	SFINAE	✓	✓	SFINAE
P2786R0	`move_and_destroy`	✓	✓	SFINAE	UB	?	✓
P1144R6	`uninitialized_relocate`	✓	✓	✓	UB	✓	✓
P1144R6	`uninitialized_relocate_n`	✓	✓	✓	UB	✓	✓
P1144R7	`uninitialized_relocate_backward`	✓	✓	✓	✓	UB	✓

std::relocate’s implementation is cute

...and so on; there's plenty of related material.
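Here is a minimal sketch that makes "relocate" concrete. It move-constructs into the destination and destroys the source, and it collapses to a plain memcpy when that is known to be safe. `relocate_at` is a hypothetical helper, not an API from any of the proposals. This sketch only takes the memcpy path for trivially copyable types. The proposals aim to extend that path to types like std::unique_ptr, which are "trivially relocatable" without being trivially copyable.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Relocate *src into the raw storage at dst; afterwards src's storage holds no object.
template <typename T>
T* relocate_at(T* src, void* dst) {
    if constexpr (std::is_trivially_copyable_v<T>) {
        // Bitwise copy; a trivially copyable source needs no destructor call.
        std::memcpy(dst, src, sizeof(T));
        return std::launder(static_cast<T*>(dst));
    } else {
        // General case: move-construct into dst, then end the source's lifetime.
        T* out = ::new (dst) T(std::move(*src));
        std::destroy_at(src);
        return out;
    }
}
```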

why gcc and clang sometimes emit an extra mov instruction for std::clamp on x86

Straight to the code: https://godbolt.org/z/rq9dsGxh5

#include <algorithm>

double incorrect_clamp(double v, double lo, double hi){
    return std::min(hi, std::max(lo, v));
}

double official_clamp(double v, double lo, double hi){ 
    return std::clamp(v, lo, hi); 
}

double official_clamp_reordered(double hi, double lo, double v){ 
    return std::clamp(v, lo, hi); 
}

double correct_clamp(double v, double lo, double hi){
    return std::max(std::min(v, hi), lo);
}

double correct_clamp_reordered(double lo, double hi, double v){
    return std::max(std::min(v, hi), lo);
}

The corresponding assembly:

incorrect_clamp(double, double, double):
        maxsd   xmm0, xmm1
        minsd   xmm0, xmm2
        ret
official_clamp(double, double, double):
        maxsd   xmm1, xmm0
        minsd   xmm2, xmm1
        movapd  xmm0, xmm2
        ret
official_clamp_reordered(double, double, double):
        maxsd   xmm1, xmm2
        minsd   xmm0, xmm1
        ret
correct_clamp(double, double, double):
        minsd   xmm2, xmm0
        maxsd   xmm1, xmm2
        movapd  xmm0, xmm1
        ret
correct_clamp_reordered(double, double, double):
        minsd   xmm1, xmm2
        maxsd   xmm0, xmm1
        ret

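Why does the correct code need an extra movapd?

It comes down to floating-point ±0. The standard requires std::clamp(v, lo, hi) to return v itself unless v < lo or hi < v, so std::clamp(-0.0f, +0.0f, +0.0f) must return -0.0f. On x86, minsd/maxsd return the second (source) operand when the inputs compare equal, as +0.0 and -0.0 do. The compiler therefore has to use the operand order that yields v. With the argument order above, the result then lands in a register other than xmm0 and has to be moved there.

With -ffinite-math-only -fno-signed-zeros the final assembly is identical: https://godbolt.org/z/esMY18a5z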
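Here is a quick runtime check of the signed-zero behavior. The function names are illustrative, and it assumes a build without -ffast-math or the flags above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Same shape as incorrect_clamp: std::max(a, b) returns a when the two compare
// equal, and -0.0 == +0.0, so max(lo, v) hands back lo (+0.0) instead of v (-0.0).
double min_max_clamp(double v, double lo, double hi) {
    return std::min(hi, std::max(lo, v));
}

// std::clamp returns v itself unless v < lo or hi < v.
double std_clamp(double v, double lo, double hi) {
    return std::clamp(v, lo, hi);
}
```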

Fuzzing an API with libfuzzer

A fuzzing example; everyone should learn from it.

Looking at this entry point, you may have no idea where to start:

extern "C"
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)

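The class we want to test looks like this. Note that push writes data_[++size_], so slot 0 is never used and pushing the Capacity-th element writes one past the end of data_. The hand-written tests below never fill the stack, so they can't catch this.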

template <typename T, size_t Capacity>
requires (std::is_nothrow_move_constructible_v<T> && std::is_nothrow_move_assignable_v<T>)
class fixed_stack
{
public:
    T& push(T t) {
        if (size() == capacity()) throw size_error("push on full stack");
        return data_[++size_] = std::move(t);
    }
    T& back() {
        if (empty()) throw size_error("back on empty stack");
        return data_[size_];
    }
    T pop() {
        if (empty()) throw size_error("pop on empty stack");
        return std::move(data_[size_--]);
    }
    [[nodiscard]] bool empty() const { return size() == 0; }
    [[nodiscard]] size_t size() const { return size_; }
    [[nodiscard]] static size_t capacity() { return Capacity; }
private:
    size_t size_ = 0;
    std::array<T, Capacity> data_{};
};

Now consider the test code.

It might look like this:

#include <functional>
#include <iomanip>
#include <iostream>
#include <string>

struct failure : std::string {
    using std::string::string;
};

#define REQUIRE(...) if (__VA_ARGS__) {;} else throw failure(#__VA_ARGS__)
#define FAIL(...) throw failure(__VA_ARGS__)

int main() {
    unsigned fail_count = 0;
    struct test {
        const char* name;
        std::function<void()> f;
    };
    test tests[] {
            { "default constructed stack is empty",
              []{
                fixed_stack<int, 8> s;
                REQUIRE(s.size() == 0);
                REQUIRE(s.empty());
            }},
            { "Each push grows size by one",
              [] {
                fixed_stack<int, 8> s;
                s.push(3);
                REQUIRE(s.size() == 1);
                s.push(2);
                REQUIRE(s.size() == 2);
                s.push(8);
                REQUIRE(s.size() == 3);
              }
            },
            { "Pop returns the pushed elements in reverse order",
              []{
                fixed_stack<int, 8> s;
                s.push(3);
                s.push(2);
                s.push(8);
                REQUIRE(s.pop() == 8);
                REQUIRE(s.pop() == 2);
                REQUIRE(s.pop() == 3);
            }
            },
            { "pop on empty throws",
              []{
                fixed_stack<int, 8> s;
                s.push(3);
                s.pop();
                try {
                    s.pop();
                    FAIL("didn't throw");
                }
                catch (const size_error&)
                {
                    // good!
                }
            }}
    };
    for (auto& t : tests){
        try {
            std::cout << std::setw(60) << std::left << t.name << "\t";
            t.f();
            std::cout << "PASS!";
        } catch (const failure& f) {
            std::cout << "FAILED!\nError: " << f << '\n';
            ++fail_count;
        } catch (...) {
            std::cout <<  "FAILED!\nUnknown reason!";
            ++fail_count;
        }
        std::cout << '\n';
    }
    return fail_count == 0 ? 0 : 1;  // non-zero exit status if any test failed
}

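Now, how do we turn this test code into a fuzz test?

Put simply, the input is just a blob of bytes. How do we decode it into a sequence of actions, and into the arguments for those actions?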

struct exhausted {};
struct source {
    std::span<const uint8_t> input;
    template <typename T>
    requires (std::is_trivial_v<T>)
    T get() {
        constexpr auto data_size = sizeof(T);
        if (input.size() < data_size) throw exhausted{};
        alignas (T) uint8_t buff[data_size];
        std::copy_n(input.begin(), data_size, buff);
        input = input.subspan(data_size);
        return std::bit_cast<T>(buff);
    }
};
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    source s{{data, size}};
    std::vector<int> comparison;
    std::optional<fixed_stack<std::unique_ptr<int>, 8>> stack;
    try {
        for (;;) {
            if (!stack.has_value()) {
                stack.emplace();
            }
            // take one uint8_t from the source to select the action
            const auto action = s.get<uint8_t>();
            switch (action) {
                case 0: // push
                {
                    // take the value to push from the source
                    const int v = s.get<int>();
                    const auto size = stack->size();
                    try {
                        stack->push(std::make_unique<int>(v));
                        comparison.push_back(v);
                        assert(stack->size() == comparison.size());
                        assert(stack->b...


03 Feb 03:15

wanghenshui

v1.4.1

abef22e

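C++ Chinese Weekly, Issue 141

QQ group: tap to join from mobile QQ

RSS https://github.com/wanghenshui/cppweeklynews/releases.atom

I've been job hunting and preparing for interviews lately, so updates may be a bit slow; apologies.

Not much content this week.

This issue is sponsored by 黄亮Anthony, HNY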

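News

Standards committee updates, IDE and compiler news go here

CLion adds an AI Assistant: https://www.jetbrains.com/clion/whatsnew/

For the latest compiler news, I recommend following the hellogcc WeChat account: OSDT Weekly 2023-12-06, Issue 231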

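Articles

现代C++语言核心特性解析 (Modern C++ Core Language Features Explained): C++23 supplement, a free e-book

Take a look if you're interested; it's short.

Outline: a code-outlining optimization

It extracts identical machine-code sequences into a single shared function to reduce binary size. It is the opposite of inlining.

It can cause performance regressions, since the outlined code now costs a call.

How does it work? How do you find the repeated instruction sequences? Brute-force search with a suffix tree.

It can also be done at a different level, e.g. on the IR.

The details are intricate; read it if you're interested.

HotColdSplitting: outlining for performance

This uses outlining to separate hot and cold code, which improves performance. Quite interesting; it's effectively a part of PGO, working from profile data.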
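Hot/cold splitting can also be done by hand, roughly as in this sketch. The rare error path goes into its own out-of-line function so the hot loop stays compact. The function names are hypothetical. `[[gnu::cold]]` and `[[gnu::noinline]]` are GCC/Clang attributes. The compiler pass aims to achieve the same separation automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <vector>

// Cold path: kept out of line and marked cold so it doesn't bloat the hot loop.
[[noreturn, gnu::cold, gnu::noinline]]
static void report_negative(std::size_t index, int value) {
    std::fprintf(stderr, "negative value %d at index %zu\n", value, index);
    throw std::invalid_argument("negative value");
}

// Hot path: the loop body holds only a compare plus a call for the rare case.
long long sum_non_negative(const std::vector<int>& values) {
    long long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < 0) [[unlikely]] {
            report_negative(i, values[i]);
        }
        total += values[i];
    }
    return total;
}
```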

For processing strings, streams in C++ can be slow

Streams are simply garbage, and nobody uses strstream; C++23 replaces it with std::spanstream.

Parsing 8-bit integers quickly

New work from Dr. Lemire

The conventional approach:

int parse_uint8_naive(const char *str, size_t len, uint8_t *num) {
  uint32_t n = 0;
  for (size_t i = 0, r = len & 0x3; i < r; i++) {
    uint8_t d = (uint8_t)(str[i] - '0');
    if (d > 9)
     return 0;
    n = n * 10 + d;
  }
  *num = (uint8_t)n;
  return n < 256 && len && len < 4;
}

In C++ you can of course speed this up with std::from_chars:

int parse_uint8_fromchars(const char *str, size_t len, uint8_t *num) {
  auto [p, ec] = std::from_chars(str, str + len, *num);
  return (ec == std::errc());
}

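Can it go faster? The value fits in a uint8_t (at most 3 digits), so consider SWAR: load the characters into one 32-bit integer and process them all at once. Note that this version always reads 4 bytes from str, even when len is smaller.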

int parse_uint8_fastswar(const char *str, size_t len, 
    uint8_t *num) {
  if(len == 0 || len > 3) { return 0; }
  union { uint8_t as_str[4]; uint32_t as_int; } digits;
  memcpy(&digits.as_int, str, sizeof(digits));
  digits.as_int ^= 0x30303030lu;
  digits.as_int <<= ((4 - len) * 8);
  uint32_t all_digits = 
    ((digits.as_int | (0x06060606 + digits.as_int)) & 0xF0F0F0F0) 
       == 0;
  *num = (uint8_t)((0x640a01 * digits.as_int) >> 24);
  return all_digits 
   & ((__builtin_bswap32(digits.as_int) <= 0x020505));
}

In the comments, Bob offered an even faster version:

int parse_uint8_fastswar_bob(const char *str, size_t len, uint8_t *num) {
  union { uint8_t as_str[4]; uint32_t as_int; } digits;
  memcpy(&digits.as_int, str, sizeof(digits));
  digits.as_int ^= 0x303030lu;
  digits.as_int <<= (len ^ 3) * 8;
  *num = (uint8_t)((0x640a01 * digits.as_int) >> 16);
  return ((((digits.as_int + 0x767676) | digits.as_int) & 0x808080) == 0) 
   && ((len ^ 3) < 3) 
   && __builtin_bswap32(digits.as_int) <= 0x020505ff;
}

Have a play with it if you're interested.
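The fastswar version always loads 4 bytes, so it assumes that reading past `str + len` is safe. Below is a sketch of a variant that copies only the `len` valid bytes into a zeroed word and then applies the same XOR/shift/multiply steps. `parse_uint8_swar_safe` is a hypothetical name. Like the original, it assumes a little-endian target and GCC/Clang's `__builtin_bswap32`. The multiply by 0x640a01 works because it adds 100×, 10× and 1× the three digit bytes into the top byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

int parse_uint8_swar_safe(const char* str, std::size_t len, std::uint8_t* num) {
    if (len == 0 || len > 3) return 0;
    std::uint32_t x = 0;
    std::memcpy(&x, str, len);      // read only the len valid bytes
    x ^= 0x30303030u;               // ASCII '0'..'9' -> 0..9 in each byte
    x <<= (4 - len) * 8;            // shift out the padding bytes; digits move to the top
    const bool all_digits =
        ((x | (0x06060606u + x)) & 0xF0F0F0F0u) == 0;         // every byte must be <= 9
    *num = static_cast<std::uint8_t>((0x640a01u * x) >> 24);  // decimal value, mod 256
    return all_digits && __builtin_bswap32(x) <= 0x020505u;   // reject values above 255
}
```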

Nerd Snipe: Small Integer Parsing

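The scenario: a perfect hash keyed by short strings (up to 4 bytes; here the decimal strings "0" to "255"). How do you compute the hash quickly?

Just treat the 4 bytes of the string as an integer: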

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SIZE 512
uint8_t lut[SIZE] = {};

// multiply, shift, mask
uint32_t simple_hash(uint32_t u) {
    uint64_t h = (uint64_t) u * 0x43ff9fb13510940a;
    h = (h >> 32) % SIZE;
    return (uint32_t) h;
}

// generate, cast and hash
void build_lut() {
    char strings[256*4];
    memset(strings, 0, sizeof(strings));
    char *iter = strings;
    for (int i = 0; i < 256; ++i) {
        sprintf(iter, "%d", i);
        iter += 4;
    }

    iter = strings;
    for (int i = 0; i < 256; ++i) {
        unsigned c;
        memcpy(&c, iter, sizeof c);  // avoids the strict-aliasing UB of *(unsigned*) iter
        iter += 4;
        unsigned idx = simple_hash(c);
        lut[idx] = i;
    }
}

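Videos

CppCon 2023 has started releasing videos on weekdays; here are the fun ones from this week.

A Long Journey of Changing std::sort Implementation at Scale - Danila Kutenin - CppCon 2023 https://www.youtube.com/watch?v=cMRyQkrjEeI

The speaker, danlark, is quite active in LLVM.

This talk is well worth watching. It goes through improvements to sort, the differences between implementations on different systems, and the pitfalls of nth_element's side effects.

Many libraries' median implementations are actually wrong!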

https://godbolt.org/z/9xWoYTfMP

int median(std::vector<int>& v) {
   int mid = v.size() / 2;
   std::nth_element(v.begin(), v.begin() + mid, v.end());
   int result = v[mid];
   if (v.size() % 2 == 0) {
     std::nth_element(v.begin(), v.begin() + mid - 1, v.end());
     result = (v[mid] + v[mid-1])/2;  
     // result = (result + v[mid-1]) /2;
   }
   return result;
}

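nth_element only guarantees that the element at position n ends up where it would be in sorted order, with smaller-or-equal elements before it and greater-or-equal ones after. The rest of the range is not sorted. The second call, with n = mid - 1, may therefore move the element the first call placed at v[mid], and the code then reads a wrong v[mid]. The commented-out line, which reuses the saved result, is the fix.

Yet many median implementations out there get this wrong.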
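One way to avoid the second nth_element entirely is shown in this sketch. After the first call, every element before `mid` is <= v[mid], so the lower middle value is just the maximum of that prefix. `median_fixed` is a hypothetical name. It returns a double so even-sized inputs aren't truncated, unlike the int version above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty vector; reorders v like the original.
double median_fixed(std::vector<int>& v) {
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    const int upper = v[mid];  // the element that belongs at index mid when sorted
    if (v.size() % 2 != 0) {
        return upper;
    }
    // Everything in [begin, begin + mid) is <= upper, so its maximum is the
    // element that belongs at index mid - 1; nothing moves v[mid] afterwards.
    const int lower = *std::max_element(v.begin(), v.begin() + mid);
    return (static_cast<double>(lower) + upper) / 2.0;
}
```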

Customization Methods: Connecting User and C++ Library Code - Inbal Levi - CppCon 2023 https://www.youtube.com/watch?v=mdh9GLWXWyY

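Covers designs for customization lookup: from swap and ADL, to CPOs and tag_invoke, to the latest discussions around customizable functions.

Reasonably interesting. But honestly, tag_invoke is contorted, and so are CPOs.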

Variable Monitoring with Declarative Interfaces - Nikolaj Fogh - Meeting C++ 2023 https://www.youtube.com/watch?v=AJDbu1kaj5g

auto myMonitor = Monitor([](int i){ return i > 0; }, [](bool valid){ std::cout << "Valid: " << valid << std::endl; });
int variable = 0;
myMonitor(variable); // Prints Valid: 0
variable = 1;
myMonitor(variable); // Prints Valid: 1

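Not sure what it's for, though; it's something like a signal handler.

Take memory monitoring: if you really hit a bottleneck, a plain conditional check right where you detect it would do just as well.

Or something like bvar, which hands the data to another component through export callbacks.

I don't know what scenario would need it.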
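Judging only from the usage above, Monitor could be as small as the following sketch. It stores a predicate and a callback, and on each call it evaluates the predicate and passes the result on. This is an assumption for illustration. The talk's actual design may differ; for example, it may only fire when the result changes.

```cpp
#include <cassert>
#include <utility>

// Hypothetical minimal Monitor: predicate decides validity, callback receives it.
template <typename Predicate, typename Callback>
class Monitor {
public:
    Monitor(Predicate predicate, Callback callback)
        : predicate_(std::move(predicate)), callback_(std::move(callback)) {}

    // Re-evaluate the predicate for the current value and report the result.
    template <typename T>
    void operator()(const T& value) {
        callback_(static_cast<bool>(predicate_(value)));
    }

private:
    Predicate predicate_;
    Callback callback_;
};
```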

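Jobs

ByteDance's audio/video team works mainly on audio/video and non-linear editing for 剪映 (CapCut). The business outlook is good, and there are currently openings in three areas:

Desktop audio/video development https://job.toutiao.com/s/i8enPrw5
Multi-platform audio/video engine development https://job.toutiao.com/s/i8enr7Es
C++ engineering infrastructure development https://job.toutiao.com/s/i8enjTHT

The role can be based in Beijing, Shanghai, Guangzhou, Shenzhen or Hangzhou, and the salary is open. If interested, apply via the links.

NVIDIA is hiring LLVM interns

https://nvidia.wd5.myworkdayjobs.com/NVIDIAExternalCareerSite/job/China-Shanghai/Software-Intern--LLVM-Compiler-Optimization_JR1976102

Contact: vwei@nvidia.com

Or WeChat aoewqf1997 (please add the note “LLVM实习生”, i.e. "LLVM intern")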


29 Jan 03:03

wanghenshui

v1.4.6

bb2adb9

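C++ Chinese Weekly 2024-01-19, Issue 146

QQ group: click to join

Submissions welcome: recommend or self-recommend articles, software, resources, etc.

This issue is sponsored by 黄亮Anthony, Amnisia, HNY, CHENL

Last week I got held up having dinner with friends and never found time to write.

News

Standards committee updates, IDE and compiler news go here

The biggest recent topic: people in the Linux community are once again discussing bringing in C++. Many C macros effectively do part of the job of concepts, and having concepts would be great. Linus probably won't allow it in his lifetime, though, so it's just the same old debate rehashed.

Wishing Linus good health.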

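Articles

The C++20 Naughty and Nice List for Game Devs

Covers which C++20 features are good (and not so good) for game development.

<=> is nice
Coroutines are nice
std::bit_cast is nice: it converts by copying the bits, avoiding UB
<numbers> is nice: finally a standard pi
New synchronization primitives: <barrier>, <latch>, and <semaphore>
<span> is fine
Designated initializers are very handy. C has had them all along, yet C++20's version is still not fully compatible with C's (e.g. designators must follow declaration order)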

struct Point {
    float x;
    float y;
    float z;
};

Point origin{.x = 0.f, .y = 0.f, .z = 0.f};

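char8_t is brain-dead. char8_t is a distinct type whose underlying type is unsigned char. The u8 prefix used to produce char string literals, and since C++20 it produces char8_t literals instead.

This breaks existing u8 code. MSVC can turn it off with /Zc:char8_t-, and GCC with -fno-char8_t.

https://en.cppreference.com/w/cpp/language/string_literal items (5) and (6):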
(5,6) UTF-8 string literal

const char[N](until C++20)

const char8_t[N](since C++20)

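[[no_unique_address]]: MSVC has ABI problems with it (it ignores the standard attribute and offers [[msvc::no_unique_address]] instead), so use it with care

Modules: unusable

Ranges: useless as hell

std::format: the binary size is way too big

source_location: no usability gain to speak of, and std::source_location::file_name returns a const char*, of all things

What the hell were they thinking? I'm speechless.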

Why My Print Didn't Output Before a Segmentation Fault

#include <stdio.h>

int main(void)
{
        printf("%s", "Hello!");
        int *p = NULL;
        *p = 5;
        // Will not be reached due to crash above
        printf("%s", "Another Hello!");
}
//$ gcc -Wall -Wextra -o hello hello.c && ./hello
//Segmentation fault (core dumped)

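This is classic buffered I/O: the buffer was never flushed before the crash. How do you fix it? End the output with \n (stdout is line-buffered when it is a terminal), write to stderr (unbuffered), or call fflush(stdout).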
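Here are the three fixes side by side, as a sketch. `print_hello_safely` is an illustrative name. Only the explicit flush also works when stdout is redirected to a pipe or file.

```cpp
#include <cassert>
#include <cstdio>

// Returns the result of the final fflush (0 on success).
int print_hello_safely() {
    std::printf("%s\n", "Hello!");         // 1) newline: flushed right away only if stdout is a terminal
    std::fprintf(stderr, "%s", "Hello!");  // 2) stderr is unbuffered by default
    std::printf("%s", "Hello!");
    return std::fflush(stdout);            // 3) explicit flush, works for pipes and files too
}
```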

C++ time_point wackiness across platforms

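time_point loses precision on macOS. libc++'s system_clock::duration is microseconds while libstdc++'s is nanoseconds, so adding nanoseconds to a system_clock::time_point is a lossy conversion that doesn't compile on macOS. Code: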

#include <stdio.h>

#include <chrono>

int main() {
  std::chrono::system_clock::time_point tp =
      std::chrono::system_clock::from_time_t(1234567890);

  // Okay.
  tp += std::chrono::milliseconds(1);

  // No problem here so far.
  tp += std::chrono::microseconds(1);

  // But... this fails on Macs:
  // tp += std::chrono::nanoseconds(123);

  // So you adapt, and this works everywhere.  It slices off some of that
  // precision without any hint as to why or when, and it's ugly too!

  tp += std::chrono::duration_cast<std::chrono::system_clock::duration>(
      std::chrono::nanoseconds(123));

  // Something like this swaps the horizontal verbosity for vertical
  // stretchiness (and still slices off that precision).

  using std::chrono::duration_cast;
  using std::chrono::system_clock;
  using std::chrono::nanoseconds;

  tp += duration_cast<system_clock::duration>(nanoseconds(123));

  // This is what you ended up with:

  auto tse = tp.time_since_epoch();

  printf("%lld\n", (long long) duration_cast<nanoseconds>(tse).count());

  // Output meaning when split up:
  //
  //        sec        ms  us  ns
  //
  // macOS: 1234567890 001 001 000  <-- 000 = loss of precision (246 ns)
  //
  // Linux: 1234567890 001 001 246  <-- 246 = 123 + 123 (expected)
  //

  return 0;
}

Implementing the missing sign instruction in AVX-512

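The sign function is commonly used. SSSE3/AVX2 provide it as _mm_sign_epi8 / _mm256_sign_epi8, but AVX-512 has no equivalent. It looks roughly like this: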


function sign(a, b): # a and b are integers
   if b == 0 : return 0
   if b < 0 : return -a 
   if b > 0 : return a

abs is easy to implement with sign:

abs(a) = sign(a,a)

Now to the main point: writing an AVX-512 sign (the intrinsics below require AVX512BW).

#include <x86intrin.h>

__m512i _mm512_sign_epi8(__m512i a, __m512i b) {
  __m512i zero = _mm512_setzero_si512();
  __mmask64 blt0 = _mm512_movepi8_mask(b);
  __mmask64 ble0 = _mm512_cmple_epi8_mask(b, zero);
  __m512i a_blt0 = _mm512_mask_mov_epi8(zero, blt0, a);
  return _mm512_mask_sub_epi8(a, ble0, zero, a_blt0);
}

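If b == 0 can be treated like a positive b (the "cheated" variant described in the comment below), it can be done like this: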

#include <x86intrin.h>

__m512i _mm512_sign_epi8_cheated(__m512i a, __m512i b) {
  __m512i zero = _mm512_setzero_si512();
  __mmask64 blt0 = _mm512_movepi8_mask(b);
  return _mm512_mask_sub_epi8(a, blt0, zero, a);
}

/*
function sign_cheated(a, b): # a and b are integers
   if b < 0 : return -a 
   if b ≥ 0 : return a
*/
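A scalar reference for a single 8-bit lane is handy for testing the AVX-512 versions lane by lane. Here is a minimal sketch; `sign_epi8_scalar` is a hypothetical name. Like the vector subtraction, negating -128 wraps back to -128.

```cpp
#include <cassert>
#include <cstdint>

// sign(a, b) for one int8 lane: 0 if b == 0, -a if b < 0, a otherwise.
constexpr std::int8_t sign_epi8_scalar(std::int8_t a, std::int8_t b) {
    if (b == 0) return 0;
    if (b < 0) return static_cast<std::int8_t>(-a);  // wraps modulo 2^8 (well-defined since C++20)
    return a;
}
```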

What the `func` is that?

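By C++26 we'll have four function wrappers: std::function, std::move_only_function (C++23),
std::copyable_function and std::function_ref.

What on earth are all these?

std::function_ref is easy: it's a non-owning reference (view) counterpart of std::function. So why isn't it called std::function_view?

And what are the other two?

Back to std::function: what's wrong with it? Look at this code: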

struct Functor {
    void operator()() { std::cout << "Non-const\n"; }
    void operator()() const { std::cout << "Const\n"; }
};

const Functor ftor;                   // I'm const!
const std::function<void()> f = ftor; // So am I! Const all the way
f();                                  // Prints "Non-const"

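The problem is how std::function behaves. It stores its own copy of the callable, and although its operator() is const, it invokes that stored copy as non-const. The non-const overload therefore runs even through a const std::function.

That's a defect! What should the correct behavior look like? This: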

      std::function<void()> f = ftor; f(); // prints "Non-const"
const std::function<void()> f = ftor; f(); // prints "Const"

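To fix this const problem, move_only_function was introduced. Its signature can carry a qualifier (std::move_only_function<void() const>), so const calls go to the const overload. As the name says, it can only be moved, not copied.

copyable_function was then added to tell everyone that std::function should really have been copyable_function: copyable, with the same const-correct signatures. Mind the semantics.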

Raymond Chen corner; I don't fully follow these.

How do I prevent my C++/WinRT implementation class from participating in COM aggregation?

In C++/WinRT, how can I await multiple coroutines and capture the results?, part 1
In C++/WinRT, how can I await multiple coroutines and capture the results?, part 2
In C++/WinRT, how can I await multiple coroutines and capture the results?, part 3

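Videos

Taro: Task Graph-Based Asynchronous Programming Using C++ Coroutine – Dian-Lun Lin - CppCon 2023

This design is basically a coroutine version of Taskflow! Honestly, I'd had this idea before, but they put in the effort to build it while I only thought about it.

Interaction

Influenza A has been very bad lately; many people around me caught it and ended up in hospital. Treating it like a common cold at first didn't help. It takes antivirals and plenty of water.

I hope none of you catch it.

That's all, meeting adjourned!

Permalink to this post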


29 Jan 03:01

wanghenshui

v1.4.5

14a845d

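C++ Chinese Weekly, Issue 145

QQ group: click to join

RSS https://github.com/wanghenshui/cppweeklynews/releases.atom

Submissions welcome: recommend or self-recommend articles, software, resources, etc. in the comments

This issue is sponsored by 黄亮Anthony, HNY

2024-01-07

News

Standards committee updates, IDE and compiler news go here

Articles

Modern branch prediction: from academia to industry

Read it for fun. Knowing the concepts helps you understand how CPUs execute code.

What does instruction selection in LLVM look like?

LLVM knowledge: learn it. It's all knowledge, and you'll run into it sooner or later.

[Data structures] The radix tree in jemalloc

An analysis of a key jemalloc data structure

Fresh jemalloc knowledge; learn it.

Optimizing the unoptimizable: a journey to faster C++ compile times

Compilation is slow; how do you find out why?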

#include <fmt/core.h>

int main() {
  fmt::print("Hello, {}!\n", "world");
}
// c++ -ftime-trace -c hello.cc -I include -std=c++20

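The JSON written by Clang's -ftime-trace can be loaded into a browser trace viewer, e.g. chrome://tracing/

In Firefox you can use the Firefox Profiler: https://profiler.firefox.com/from-url/https%3A%2F%2Fvitaut.net%2Ffiles%2F2024-hello-before.json/marker-chart/?globalTrackOrder=0&hiddenLocalTracksByPid=65312-fwx3&thread=0&timelineType=category&v=10

I didn't quite follow how he attributed the time to the headers. In short, the fix is to forward-declare std::string instead of including <string>: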

#ifdef FMT_BEGIN_NAMESPACE_STD
FMT_BEGIN_NAMESPACE_STD
template <typename Char>
struct char_traits;
template <typename T>
class allocator;
template <typename Char, typename Traits, typename Allocator>
class basic_string;
FMT_END_NAMESPACE_STD
#else
# include <string>
#endif

But an interface like this doesn't compile:

template <typename... T>
 FMT_NODISCARD FMT_INLINE auto format(format_string<T...> fmt, T&&... args)
    -> basic_string<char> {
   return vformat(fmt, fmt::make_format_args(args...));
 }

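This is because basic_string<char> is only forward-declared (an incomplete type), and a function definition with a non-dependent return type needs that type to be complete. How do you get around it?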

template <typename... T, typename Char = char>
 FMT_NODISCARD FMT_INLINE auto format(format_string<T...> fmt, T&&... args)
    -> basic_string<Char> {
   return vformat(fmt, fmt::make_format_args(args...));
 }

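Making the return type depend on the template parameter Char postpones the completeness check until the template is instantiated, and this alone saved a lot of compile time.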
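Here is the same trick in isolation, as a sketch with a made-up Widget type. A non-dependent incomplete return type is rejected at the definition. A return type that depends on a defaulted template parameter is only checked at instantiation, after the type has been completed.

```cpp
#include <cassert>

struct Widget;  // only forward-declared at this point

// Widget make_widget_now() { return Widget{}; }  // error: incomplete return type

// OK: W is dependent, so completeness is only required when make_widget is instantiated.
template <typename W = Widget>
W make_widget() {
    return W{};
}

// The full definition arrives later.
struct Widget {
    int id = 7;
};
```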

Why doesn’t my code compile when I change a shared_ptr(p) to an equivalent make_shared(p)?

The structure looks like this:

class WidgetContainer : IWidgetCallback
{
    //    
};

    auto widget = std::shared_ptr<Widget>(new Widget(this));

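Can it be replaced with make_shared? No, because the inheritance is private. Only WidgetContainer's own members can convert this to IWidgetCallback*, and make_shared<Widget>(this) would perform that conversion inside the standard library.

The fix is to do the conversion yourself, where it is accessible: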

    auto widget = std::make_shared<Widget>(
        static_cast<IWidgetCallback*>(this));
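Here is a self-contained sketch of the situation. `Widget`, `IWidgetCallback` and `on_event` are stand-ins for the article's types, not its actual definitions.

```cpp
#include <cassert>
#include <memory>

struct IWidgetCallback {
    virtual ~IWidgetCallback() = default;
    virtual int on_event() = 0;
};

struct Widget {
    explicit Widget(IWidgetCallback* cb) : callback(cb) {}
    IWidgetCallback* callback;
};

// With `class`, bases are private by default: only WidgetContainer's members
// may convert WidgetContainer* to IWidgetCallback*.
class WidgetContainer : IWidgetCallback {
public:
    std::shared_ptr<Widget> make_widget() {
        // std::make_shared<Widget>(this) would not compile: the conversion would
        // happen inside make_shared, outside WidgetContainer's access scope.
        return std::make_shared<Widget>(static_cast<IWidgetCallback*>(this));
    }
    int on_event() override { return 42; }
};
```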

Did you know about C++26 static reflection proposal (2/N)?

struct foo {
  int a{};
  int b{};
  int c{};
};

static_assert(3 == std::size(std::meta::nonstatic_data_members_of(^foo)));

Inside STL: The deque, implementation

MSVC's deque implementation has a nasty gotcha:

	gcc	clang	msvc
Block size	as many as fit in 512 bytes but at least 1 element	as many as fit in 4096 bytes but at least 16 elements	power of 2 that fits in 16 bytes but at least 1 element
Initial map size	8	2	8
Map growth	2×	2×	2×
Map shrinkage	On request	On request	On request
Initial first/last	Center	Start	Start
Members	block** map; size_t map_size; iterator first; iterator last;	block map; block first_block; block last_block; block end_block; size_t first; size_t size;	block** map; size_t map_size; size_t first; size_t size;
Map layout	counted array	simple_deque	counted array
Valid range	Pair of iterators	Start and count	Start and count
Iterator	T* current; T* current_block_begin; T* current_block_end; block** current_block;	T* current; block** current_block;	deque* parent; size_t index;
begin()/end()	Copy first and last.	Break first and first + size into block index and offset.	Break first and first + size into block index and offset.
Spare blocks	Aggressively pruned	Keep one on each end	Keep all

MSVC's block size is far too small: 16 bytes means, for example, only two 8-byte elements per block.

Windows-related

Videos

What we've been (a)waiting for? - Hana Dusíková - Meeting C++ 2023

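Introduces coroutines and builds co_curl (coroutines on top of curl). Quite interesting; I've also uploaded the video to Bilibili: https://www.bilibili.com/video/BV1NG411B7Fy/

The code is here: https://github.sheincorp.cn/hanickadot/co_curl

Open-source project updates / new projects

fmt 10.2 is out. It supports the %j specifier for durations, and you can also do this: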

#include <fmt/chrono.h>

int main() {
  fmt::print("{}\n", std::chrono::days(42)); // prints "42d"
}

mp-units 2.1.0 released!

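A library for compile-time physical units and quantities

nanobind: a Python binding library with good speed and performance, recommended by group member kenshin
asteria: an embeddable scripting language that is always looking for contributors. Please lend a hand, or join QQ group 753302367 to talk to the author directly
Unilang: a general-purpose programming language from deepin. It has interesting ideas and is also short of contributors; if interested, see the GitHub discussions or the deepin forum. It has a standing spot here.
gcc-mcf: those who know, know

Jobs

https://job.toutiao.com/s/i8Tv36Jf
ByteDance Hangzhou: virtual machine (V8) development

ByteDance's audio/video team works mainly on audio/video and non-linear editing for 剪映 (CapCut). The business outlook is good, and there are currently openings in three areas:

Desktop audio/video development https://job.toutiao.com/s/i8enPrw5
Multi-platform audio/video engine development https://job.toutiao.com/s/i8enr7Es
C++ engineering infrastructure development https://job.toutiao.com/s/i8enjTHT

The role can be based in Beijing, Shanghai, Guangzhou, Shenzhen or Hangzhou, and the salary is open. If interested, apply via the links.

Interaction

A new year has begun, and this weekly has now been running for three years. I hope everyone stays healthy, and I'll keep the updates coming.

Permalink to this post

If you have questions or comments, it's best to leave them in the comment section at the link above so they're searchable. The WeChat official account is rather closed off, and Zhihu swallows comments.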


Releases: wanghenshui/cppweeklynews

C++ Chinese Weekly 2024-03-30, Issue 153

News

Articles

Videos

C++ Chinese Weekly 2024-03-25, Issue 152

News

Articles

Jobs

Interaction

C++ Chinese Weekly 2024-03-09, Issue 151

News

Articles

Jobs

Interaction

C++ Chinese Weekly 2024-02-24, Issue 150

News

Articles

C++ Chinese Weekly 2024-02-17, Issue 149

News

Articles

Open-source projects

C++ Chinese Weekly 2024-02-09, Issue 148

News

Articles

[RFC] Upstreaming ClangIR https://discourse.llvm.org/t/rfc-upstreaming-clangir/76587/19

Option Soup: the subtle pitfalls of combining compiler flags https://hacks.mozilla.org/2024/01/option-soup-the-subtle-pitfalls-of-combining-compiler-flags/

errno and libc https://dxuuu.xyz/errno.html

C++ Chinese Weekly 2024-01-26, Issue 147

News

Articles

C++ Chinese Weekly, Issue 141

News

Articles

Videos

Jobs

C++ Chinese Weekly 2024-01-19, Issue 146

News

Articles

Raymond Chen corner; I don't fully follow these

Videos

Taro: Task Graph-Based Asynchronous Programming Using C++ Coroutine – Dian-Lun Lin - CppCon 2023

What's new in popular libraries

Interaction

C++ Chinese Weekly, Issue 145

News

Articles

Windows-related

Videos

Open-source project updates / new projects

Jobs

Interaction
