TiFlash crashes with "Attempted access has violated the permissions assigned to the memory area" #7316

Closed
lilinghai opened this issue Apr 19, 2023 · 3 comments · Fixed by #7335

Comments


lilinghai commented Apr 19, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Stale read workload with chaos injection

2. What did you expect to see? (Required)

TiFlash runs stably.

3. What did you see instead (Required)

TiFlash crashed and could not start again.

[2023/04/19 09:04:45.250 +08:00] [ERROR] [BaseDaemon.cpp:376] [########################################] [source=BaseDaemon] [thread_id=6009]
[2023/04/19 09:04:45.251 +08:00] [ERROR] [BaseDaemon.cpp:377] ["(from thread 101) Received signal Segmentation fault(11)."] [source=BaseDaemon] [thread_id=6009]
[2023/04/19 09:04:45.251 +08:00] [ERROR] [BaseDaemon.cpp:407] ["Address: 0x7f6d9a751000"] [source=BaseDaemon] [thread_id=6009]
[2023/04/19 09:04:45.251 +08:00] [ERROR] [BaseDaemon.cpp:413] ["Access: read."] [source=BaseDaemon] [thread_id=6009]
[2023/04/19 09:04:45.251 +08:00] [ERROR] [BaseDaemon.cpp:419] ["Attempted access has violated the permissions assigned to the memory area."] [source=BaseDaemon] [thread_id=6009]
[2023/04/19 09:04:45.292 +08:00] [ERROR] [BaseDaemon.cpp:569] ["
       0x753aae1    faultSignalHandler(int, siginfo_t*, void*) [tiflash+122923745]
                    libs/libdaemon/src/BaseDaemon.cpp:220
  0x7f707f10ed90    <unknown symbol> [libc.so.6+347536]
       0x8653b9c    crc64::_detail::update_simd(unsigned long, void const*, unsigned long) [tiflash+140852124]
                    libs/libcommon/src/crc64_sse2_asimd.cpp:63
       0x865378d    crc64::Digest::Digest(crc64::Mode)::$_2::__invoke(unsigned long, void const*, unsigned long) [tiflash+140851085]
                    libs/libcommon/src/crc64.cpp:52
       0x1b0ce43    DB::PS::V3::BlobStore<DB::PS::V3::u128::BlobStoreTrait>::write(DB::WriteBatch&&, std::__1::shared_ptr<DB::WriteLimiter> const&) [tiflash+28364355]
                    dbms/src/Storages/Page/V3/BlobStore.cpp:361
       0x75a8c8a    DB::PS::V3::PageStorageImpl::writeImpl(DB::WriteBatch&&, std::__1::shared_ptr<DB::WriteLimiter> const&) [tiflash+123374730]
                    dbms/src/Storages/Page/V3/PageStorageImpl.cpp:138
       0x7556597    DB::PageWriter::write(DB::WriteBatchWrapper&&, std::__1::shared_ptr<DB::WriteLimiter>) const [tiflash+123037079]
                    dbms/src/Storages/Page/PageStorage.cpp:580
       0x7f944a0    DB::RegionPersister::doPersist(std::__1::tuple<unsigned long, DB::MemoryWriteBuffer, unsigned long, unsigned long>&, DB::RegionTaskLock const&, DB::Region const&) [tiflash+133776544]
                    dbms/src/Storages/Transaction/RegionPersister.cpp:111
       0x7f93ca6    DB::RegionPersister::persist(DB::Region const&, DB::RegionTaskLock const&) [tiflash+133774502]
                    dbms/src/Storages/Transaction/RegionPersister.cpp:72
       0x7f3638d    DB::KVStore::persistRegion(DB::Region const&, DB::RegionTaskLock const&, char const*) [tiflash+133391245]
                    dbms/src/Storages/Transaction/KVStore.cpp:340
       0x7f37752    DB::KVStore::forceFlushRegionDataImpl(DB::Region&, bool, DB::TMTContext&, DB::RegionTaskLock const&, unsigned long, unsigned long) [tiflash+133396306]
                    dbms/src/Storages/Transaction/KVStore.cpp:425
       0x7f36e38    DB::KVStore::canFlushRegionDataImpl(std::__1::shared_ptr<DB::Region> const&, unsigned char, bool, DB::TMTContext&, DB::RegionTaskLock const&, unsigned long, unsigned long) [tiflash+133393976]
                    dbms/src/Storages/Transaction/KVStore.cpp:410
       0x7f371ee    DB::KVStore::tryFlushRegionData(unsigned long, bool, bool, DB::TMTContext&, unsigned long, unsigned long) [tiflash+133394926]
                    dbms/src/Storages/Transaction/KVStore.cpp:377
       0x7f55ae8    TryFlushData [tiflash+133520104]
                    dbms/src/Storages/Transaction/ProxyFFI.cpp:153
  0x7f7080b4a1b7    _$LT$engine_store_ffi..observer..TiFlashObserver$LT$T$C$ER$GT$$u20$as$u20$raftstore..coprocessor..AdminObserver$GT$::pre_exec_admin::h8a2321ec830b8ac0 [libtiflash_proxy.so+23855543]
  0x7f7081a2328d    raftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::apply_raft_cmd::h0d6480cc9a77a25f [libtiflash_proxy.so+39424653]
  0x7f7081a3ec28    raftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::process_raft_cmd::hc5c1bcdc35a04965 [libtiflash_proxy.so+39537704]
  0x7f7081a444ba    raftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::handle_raft_committed_entries::h6af5feec71e8f970 [libtiflash_proxy.so+39560378]
  0x7f7081a15bdc    raftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_apply::h0e908bd981bf034b [libtiflash_proxy.so+39369692]
  0x7f7081a1a131    raftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_tasks::hceef58dc853a2734 [libtiflash_proxy.so+39387441]
  0x7f7080c30f0e    _$LT$raftstore..store..fsm..apply..ApplyPoller$LT$EK$GT$$u20$as$u20$batch_system..batch..PollHandler$LT$raftstore..store..fsm..apply..ApplyFsm$LT$EK$GT$$C$raftstore..store..fsm..apply..ControlFsm$GT$$GT$::handle_normal::hcef411993d0cec81 [libtiflash_proxy.so+24801038]
  0x7f7080ba8d03    batch_system::batch::Poller$LT$N$C$C$C$Handler$GT$::poll::h735b90a4bed179e6 [libtiflash_proxy.so+24243459]
  0x7f7080c98da8    std::sys_common::backtrace::__rust_begin_short_backtrace::h1cb2e050923d86be [libtiflash_proxy.so+25226664]
  0x7f7080cde33e    core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h3a3308035458aa5d [libtiflash_proxy.so+25510718]
  0x7f70821d5915    std::sys::unix::thread::Thread::new::thread_start::hd2791a9cabec1fda [libtiflash_proxy.so+47495445]
                    /rustc/96ddd32c4bfb1d78f0cd03eb068b1710a8cebeef/library/std/src/sys/unix/thread.rs:108
  0x7f707f159802    start_thread [libc.so.6+653314]"] [source=BaseDaemon] [thread_id=6009]

4. What is your TiFlash version? (Required)

master

@JaySon-Huang
Contributor

/assign @JaySon-Huang

Contributor

JaySon-Huang commented Apr 21, 2023

The root cause is that `BlobStore<Trait>::handleLargeWrite` gets a buffer from `MemoryWriteBuffer` and then reads `write.size` bytes from it, which goes past the memory bound of that buffer.

```cpp
BufferBase::Buffer data_buf = write.read_buffer->buffer();
// data_buf covers only one chunk of the MemoryWriteBuffer, yet write.size
// bytes are read from its start, which can run past the end of that chunk.
digest.update(data_buf.begin(), write.size);
entry.checksum = digest.checksum();
UInt64 field_begin, field_end;
for (size_t i = 0; i < write.offsets.size(); ++i)
{
    ChecksumClass field_digest;
    field_begin = write.offsets[i].first;
    field_end = (i == write.offsets.size() - 1) ? write.size : write.offsets[i + 1].first;
    field_digest.update(data_buf.begin() + field_begin, field_end - field_begin);
    write.offsets[i].second = field_digest.checksum();
}
if (!write.offsets.empty())
{
    // we can swap from WriteBatch instead of copying
    entry.field_offsets.swap(write.offsets);
}
try
{
    auto blob_file = getBlobFile(blob_id);
    blob_file->write(data_buf.begin(), offset_in_file, write.size, write_limiter);
    // ... (excerpt ends here)
```
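The per-field loop in `handleLargeWrite` gives field `i` the byte range from `offsets[i].first` up to the next field's offset, with the last field ending at `write.size`, and every checksum reads straight from `data_buf`. A minimal sketch of the same range logic follows, with explicit bounds checks that turn an oversized size into an exception instead of an out-of-bounds read. It assumes the caller knows how many bytes the buffer really holds (`buf_len`); `checksumFields` and `Fnv1aDigest` are hypothetical names, and FNV-1a stands in for TiFlash's crc64 digest only to keep the example self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the crc64 digest: 64-bit FNV-1a. NOT crc64.
struct Fnv1aDigest
{
    uint64_t state = 14695981039346656037ULL;
    void update(const char * data, size_t len)
    {
        for (size_t i = 0; i < len; ++i)
        {
            state ^= static_cast<unsigned char>(data[i]);
            state *= 1099511628211ULL;
        }
    }
    uint64_t checksum() const { return state; }
};

// Mirrors the per-field loop: field i covers [offsets[i].first, next offset),
// and the last field ends at `size`. Returns the whole-entry checksum and
// stores each field checksum in offsets[i].second. The checks reject any range
// that would read beyond the `buf_len` bytes the buffer actually owns.
uint64_t checksumFields(const char * buf, size_t buf_len, size_t size,
                        std::vector<std::pair<uint64_t, uint64_t>> & offsets)
{
    if (size > buf_len)
        throw std::out_of_range("write size exceeds the readable buffer");
    Fnv1aDigest digest;
    digest.update(buf, size);
    for (size_t i = 0; i < offsets.size(); ++i)
    {
        uint64_t field_begin = offsets[i].first;
        uint64_t field_end = (i == offsets.size() - 1) ? size : offsets[i + 1].first;
        if (field_begin > field_end || field_end > size)
            throw std::out_of_range("field offsets out of range");
        Fnv1aDigest field_digest;
        field_digest.update(buf + field_begin, field_end - field_begin);
        offsets[i].second = field_digest.checksum();
    }
    return digest.checksum();
}
```

In the crash above, the check that matters is `size > buf_len`: the original code has no equivalent, so a `write.size` larger than the chunk behind `data_buf` is read anyway.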

`MemoryWriteBuffer` is actually composed of a list of chunks:

```cpp
class MemoryWriteBuffer
    : public WriteBuffer
    , public IReadableWriteBuffer
    , boost::noncopyable
    , private Allocator<false>
{
public:
    /// Use max_total_size_ = 0 for unlimited storage
    explicit MemoryWriteBuffer(
        size_t max_total_size_ = 0,
        size_t initial_chunk_size_ = DBMS_DEFAULT_BUFFER_SIZE,
        double growth_rate_ = 2.0,
        size_t max_chunk_size_ = 128 * DBMS_DEFAULT_BUFFER_SIZE);

    void nextImpl() override;

    ~MemoryWriteBuffer() override;

protected:
    std::shared_ptr<ReadBuffer> getReadBufferImpl() override;

    const size_t max_total_size;
    const size_t initial_chunk_size;
    const size_t max_chunk_size;
    const double growth_rate;

    /// The written data lives in separately allocated chunks, not in one contiguous buffer.
    using Container = std::forward_list<BufferBase::Buffer>;
    Container chunk_list;
    Container::iterator chunk_tail;
    // ... (remaining members omitted)
};
```
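Because the data is spread over a chain of chunks, a pointer to the first chunk is only valid for that chunk's length. The sketch below models this with a hypothetical `ChunkedBuffer` class. It assumes each new chunk is `min(previous size * growth_rate, max_chunk_size)` bytes, starting from `initial_chunk_size`; that rule is inferred from the constructor parameters above, not taken from the TiFlash source. Reading the data safely means visiting each chunk in turn (`forEachRange`) or copying the chunks into one contiguous buffer first, rather than reading `size()` bytes from the start of the first chunk. It illustrates the failure mode only and is not the fix merged in #7335.

```cpp
#include <algorithm>
#include <cstddef>
#include <forward_list>
#include <vector>

// Hypothetical, simplified model of a chunked write buffer.
class ChunkedBuffer
{
public:
    ChunkedBuffer(size_t initial_chunk_size, double growth_rate, size_t max_chunk_size)
        : initial(initial_chunk_size), growth(growth_rate), max_chunk(max_chunk_size)
    {}

    // Append bytes, allocating a new chunk whenever the tail chunk is full.
    void write(const char * data, size_t len)
    {
        while (len > 0)
        {
            if (chunks.empty() || used_in_tail == tail->size())
                addChunk();
            size_t n = std::min(len, tail->size() - used_in_tail);
            std::copy(data, data + n, tail->data() + used_in_tail);
            used_in_tail += n;
            total += n;
            data += n;
            len -= n;
        }
    }

    // The only range a pointer to the first chunk may legally read.
    size_t firstChunkSize() const { return chunks.empty() ? 0 : chunks.front().size(); }
    size_t size() const { return total; }

    // Visit every written byte range in order, never crossing a chunk boundary.
    template <typename F>
    void forEachRange(F && f) const
    {
        size_t remaining = total;
        for (const auto & chunk : chunks)
        {
            size_t n = std::min(remaining, chunk.size());
            if (n == 0)
                break;
            f(chunk.data(), n);
            remaining -= n;
        }
    }

private:
    void addChunk()
    {
        // Assumed growth rule: first chunk = initial, then previous * growth, capped.
        size_t next = chunks.empty()
            ? initial
            : std::min(max_chunk, std::max<size_t>(1, static_cast<size_t>(tail->size() * growth)));
        if (chunks.empty())
            tail = chunks.before_begin();
        tail = chunks.emplace_after(tail, next); // zero-filled chunk of `next` bytes
        used_in_tail = 0;
    }

    size_t initial;
    double growth;
    size_t max_chunk;
    std::forward_list<std::vector<char>> chunks;
    std::forward_list<std::vector<char>>::iterator tail;
    size_t used_in_tail = 0;
    size_t total = 0;
};
```

With an initial chunk of 4 bytes, a growth rate of 2.0 and a 16-byte cap, writing 50 bytes produces chunks of 4, 8, 16, 16 and 16 bytes. `firstChunkSize()` is then 4 while `size()` is 50, so reading 50 bytes from the first chunk's start would overrun it by 46 bytes, the same pattern as `digest.update(data_buf.begin(), write.size)`.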

This affects versions v6.2 through v7.1 when the region data exceeds 256 MB, which is usually caused by a large transaction.

@JaySon-Huang
Contributor

After restarting from the crash, TiFlash may fail to start again with an error message like
["DB::Exception: Reading with entries meet checksum not match ...[file=/path/to/kvstore/blobfile_xxx]"] ...
or
[DB::Exception: invalid flag \u0000 in lock value 50267480000000000000505F728000000000C013FB8180B0E5CF84A28F06DBC805760A80000100000000000000000000000000000000000000000000000000]

7 participants