forked from duckdb/duckdb
Add s3_url_compatibility_mode to create_secret_functions #82
Closed
Conversation
…_install) The idea is that if a `.info` file ends up being corrupt, you will be stuck in a loop like: `duckdb -c "FORCE INSTALL x"` → `IO Error: Failed to read info file for 'x' extension: Try reinstalling the extension using 'FORCE INSTALL x;'`. And given that in the force_install codepath (that is, either in case of `FORCE INSTALL` or in case of `UPDATE EXTENSIONS`) the info file will be rewritten in any case, I think this is fine. Note that we could also just avoid the read (but only in the case of FORCE INSTALL), but I wanted to avoid yet another slightly different codepath.
…e to detect headers and types. (duckdb#14174) When globbing multiple CSV files, if a file that would be run with the adaptive sniffer has only one row, the adaptive sniffer could misidentify whether that one row is a header or a data row. This PR extends the checks to identify this case and adds tests for it. Fixes duckdb#14166
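A hypothetical repro sketch of the scenario through the Python client (file names and values are invented here; this is not the test added in the PR):

```python
import duckdb

# One file with a header and several data rows, one file with a single row only.
with open("part_a.csv", "w") as f:
    f.write("id,name\n1,alice\n2,bob\n")
with open("part_b.csv", "w") as f:
    f.write("3,carol\n")  # the sniffer must decide: header or data row?

# Globbing both files: the lone row in part_b.csv should come back as data,
# not be swallowed as a mis-sniffed header.
print(duckdb.sql("SELECT * FROM read_csv(['part_a.csv', 'part_b.csv'])").fetchall())
```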
Fixes duckdb#14294. I ran the test with 1 thread and saw that it consistently yielded 5 files, so this should be the proper lower bound.
This is a follow-up to duckdb#13712, using a more fine-grained locking approach. duckdb#13712 fixed an issue where checkpoints together with concurrent queries that referenced the same table multiple times could lead to deadlocks. The locking approach in that PR used a transaction-level lock to coordinate lock acquisition for the same table. For a query with many table scans (over different tables), this could now block all executor threads if only a single table lock could not be acquired (e.g. due to a concurrent checkpoint): one thread would be blocked trying to acquire the checkpoint lock for the given table, and many others would be blocked on the active_locks_lock, even though they were only interested in acquiring the checkpoint lock for another table that wasn't being checkpointed.
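The idea can be illustrated with a small Python sketch (invented names, not DuckDB's implementation): coordination happens per table, so a thread waiting for one table's checkpoint lock never holds up threads that only touch other tables.

```python
import threading
from collections import defaultdict

class PerTableLocks:
    def __init__(self):
        self._guard = threading.Lock()                  # protects the dict only, held briefly
        self._table_locks = defaultdict(threading.Lock)

    def lock_for(self, table_name):
        with self._guard:                               # short critical section: look up / create
            return self._table_locks[table_name]

locks = PerTableLocks()
with locks.lock_for("lineitem"):                        # blocking here does not hold _guard,
    pass                                                # so scans of other tables proceed
```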
…b#14395) This PR fixes duckdb#14386. Offsets were not used correctly; now they are.
…kdb#14397) File list materialization can be expensive and should be done only when necessary.
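A minimal sketch of the lazy-materialization idea with invented names (the actual change is in DuckDB's C++ file-list handling):

```python
import functools
import glob

class LazyFileList:
    def __init__(self, pattern):
        self.pattern = pattern

    @functools.cached_property
    def files(self):
        # The expensive glob expansion runs at most once, and only if the
        # concrete file list is actually requested.
        return sorted(glob.glob(self.pattern))
```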
Clamp the range of shifted defaults/NULLs to the end of the partition, not the end of the chunk. fixes: duckdb#14398 fixes: duckdblabs/duckdb-internal#3300
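The clamping rule in one line of arithmetic (a sketch with hypothetical names, not the windowing code itself):

```python
def clamped_shift_range(start, count, partition_end):
    # The shifted run of defaults/NULLs must stop at the end of the partition,
    # even if the current chunk extends beyond it.
    end = min(start + count, partition_end)
    return start, max(end - start, 0)
```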
str(summarized_phase.percentage * 100)[:6] gives a misleading result if the number is of the form 1.23e-05 or similar. Changed to round.
The existing code to calculate the percentages shown in the HTML report, `str(summarized_phase.percentage * 100)[:6]`, gives a misleading value if the number is of the form 8.09860748592347e-07 or similar. Changed to `round(summarized_phase.percentage * 100, 4)` to give a correctly rounded number.
```python
summarized_phase.percentage = 8.09860748592347e-07
str(summarized_phase.percentage * 100)[:6]
> '8.0986'
round(summarized_phase.percentage * 100, 4)
> 0.0001
```
…_ROW_ID (duckdb#14480) Updated two instances where LogicalType::BIGINT was mistakenly used instead of ROW_TYPE when handling the COLUMN_IDENTIFIER_ROW_ID pseudo-column. This change aligns the code with the rest of the project for consistency and clarity.
… installing) (duckdb#14272) The idea is that if a `.info` file ends up being corrupt, you will be stuck in a loop like:
```
duckdb -c "FORCE INSTALL x"
IO Error: Failed to read info file for 'x' extension: ...
....
Try reinstalling the extension using 'FORCE INSTALL x;'
```
And given that in the force_install codepath (that is, either in case of `FORCE INSTALL` or in case of `UPDATE EXTENSIONS`) the info file will be rewritten in any case, I think this is OK, and there is no need to notify the user that this happened. In the LOAD codepath this will still be visible; we only need a way to get unblocked (yes, removing the file works, but it's not super cool). Note that we could also just avoid the read (but only in the case of a proper `FORCE INSTALL`), but I wanted to avoid yet another slightly different codepath.
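A rough sketch of the intended behavior in Python terms (the helper names are hypothetical; the real change is in the C++ extension-install code):

```python
def parse_info(path):
    with open(path) as f:
        text = f.read()
    if not text.startswith("extension:"):       # stand-in for "corrupt info file"
        raise ValueError("Failed to read info file")
    return text

def force_install(info_path, fresh_info):
    try:
        parse_info(info_path)                   # previously a corrupt file made this fatal,
    except (OSError, ValueError):               # leaving FORCE INSTALL stuck in a loop
        pass                                    # now: ignore it, the file is rewritten anyway
    with open(info_path, "w") as f:
        f.write(fresh_info)                     # FORCE INSTALL / UPDATE EXTENSIONS rewrite the .info file
```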
…ls but avoid testing them (duckdb#14510) Discussed with @Mytherin: Python 3.7 is already end of life; we keep the support only because it seems weird to remove something that currently works (but it keeps generating noise).
…ning empty lists (duckdb#14538) This PR fixes duckdb#14497. I can't produce a test for this other than the reproduction from the issue, which uses pyiceberg. The problem was obvious though: there are no offsets when the list is empty, so `offsets[0]` accesses uninitialized memory; it no longer does that.
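In Python terms the guard looks roughly like this (hypothetical names; the actual fix lives on the C++ side):

```python
def list_child_bounds(offsets, length):
    if length == 0:
        # An empty list can come with an empty offsets buffer, so there is no
        # offsets[0] to read; return an empty range instead.
        return 0, 0
    return offsets[0], offsets[length]
```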
…unicode (python enforces this)
…b#14556) Fixes duckdb#14545. Yacc "helpfully" allows you to accidentally define an empty alternative, which defaults to returning nothing, in the following way:
```
pivot_header:
	| d_expr { $$ = list_make1($1); }
```
This PR fixes the issue and makes the query return a syntax error (as it should).
…pushdown (duckdb#14553) This PR fixes duckdb#14344. I've also included a small change to TableFilter: using ToString is impossible when debugging, as `ToString("c0")` would complain about `c0` not being a `std::string`. DebugToString inserts a bogus default column name so the method can be called.
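The convenience can be mimicked in Python (purely illustrative; only the ToString/DebugToString naming is taken from the PR):

```python
def to_string(filter_expr, column_name):
    # Like TableFilter::ToString, this needs an explicit column name.
    return f"{column_name} {filter_expr}"

def debug_to_string(filter_expr):
    # Like DebugToString, substitute a bogus default column name so the
    # method can be invoked argument-free from a debugger.
    return to_string(filter_expr, "<column>")
```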
Pre-allocating the hash vector results in a large amount of memory being consumed by the duckdb::Vector type (70% of the program's memory) and also in a 3-fold increase in resident memory usage.
Fixes an internally found issue
…uckdb#14570) The `DistinctStatistics` constructor pre-initializes a hash vector of type `duckdb::Vector` with a size of `sizeof(uint64) * 2048` = 16KB. This pre-initialization ends up consuming a lot of memory, as seen in this massif output of a program that attaches 50 catalogs, each having 97 columns. We see that 78MB of the 116MB allocated is related to this vector.
```
--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 45    933,253,335,399    113,180,936    112,322,871       858,065            0
 46    933,265,400,713    114,960,272    114,086,884       873,388            0
 47    945,145,815,618    114,955,328    114,082,020       873,308            0
 48    948,725,400,279    116,662,272    115,778,767       883,505            0
99.24% (115,778,767B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->67.65% (78,921,728B) 0xEB0B05: duckdb::VectorBuffer::CreateStandardVector(duckdb::PhysicalType, unsigned long) (in /home/admin/a.out)
| ->67.65% (78,921,728B) 0xEB1F0D: duckdb::Vector::Initialize(bool, unsigned long) (in /home/admin/a.out)
| ->67.65% (78,921,728B) 0xEB2065: duckdb::Vector::Vector(duckdb::LogicalType, unsigned long) (in /home/admin/a.out)
| ->59.93% (69,910,528B) 0x2670C4F: duckdb::DistinctStatistics::DistinctStatistics(duckdb::unique_ptr<duckdb::HyperLogLog, std::__1::default_delete<duckdb::HyperLogLog>, true>, unsigned long, unsigned long) (in /home/admin/a.out)
| | ->59.93% (69,910,528B) 0x26573C6: duckdb::DistinctStatistics::Deserialize(duckdb::Deserializer&) (in /home/admin/a.out)
| | ->59.93% (69,910,528B) 0x267340E: duckdb::unique_ptr<duckdb::DistinctStatistics, std::__1::default_delete<duckdb::DistinctStatistics>, true> duckdb::Deserializer::ReadPropertyWithExplicitDefault<duckdb::unique_ptr<duckdb::DistinctStatistics, std::__1::default_delete<duckdb::DistinctStatistics>, true> >(unsigned short, char const*, duckdb::unique_ptr<duckdb::DistinctStatistics, std::__1::default_delete<duckdb::DistinctStatistics>, true>&&) (in /home/admin/a.out)
| | ->59.93% (69,910,528B) 0x2670A4E: duckdb::ColumnStatistics::Deserialize(duckdb::Deserializer&) (in /home/admin/a.out)
| | ->59.93% (69,910,528B) 0x26AF648: duckdb::TableStatistics::Deserialize(duckdb::Deserializer&, duckdb::ColumnList&) (in /home/admin/a.out)
| | ->59.93% (69,910,528B) 0x259D9FC: duckdb::TableDataReader::ReadTableData() (in /home/admin/a.out)
```
After this change, the same program takes up only 36MB, and the dominant memory consumption is from `duckdb::FileBuffer` at 26MB (not surprising at all).
```
--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 45  1,580,151,349,285     31,102,848     30,481,800       621,048            0
 46  1,580,163,374,582     31,273,928     30,640,637       633,291            0
 47  1,642,024,314,218     31,269,416     30,636,181       633,235            0
 48  1,645,707,202,738     36,805,360     36,075,097       730,263            0
98.02% (36,075,097B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->71.81% (26,429,440B) 0xD76F48: duckdb::Allocator::DefaultAllocate(duckdb::PrivateAllocatorData*, unsigned long) (in /home/admin/a.out)
| ->71.81% (26,429,440B) 0xD76FFF: duckdb::Allocator::AllocateData(unsigned long) (in /home/admin/a.out)
| ->71.78% (26,419,200B) 0xD8ADC3: duckdb::FileBuffer::Resize(unsigned long) (in /home/admin/a.out)
| | ->71.22% (26,214,400B) 0x255CBE3: duckdb::Block::Block(duckdb::Allocator&, long, unsigned long) (in /home/admin/a.out)
| | | ->71.22% (26,214,400B) 0x257A571: duckdb::SingleFileBlockManager::CreateBlock(long, duckdb::FileBuffer*) (in /home/admin/a.out)
| | | ->71.22% (26,214,400B) 0x2596FBD: duckdb::BlockHandle::Load(duckdb::unique_ptr<duckdb::FileBuffer, std::__1::default_delete<duckdb::FileBuffer>, true>) (in /home/admin/a.out)
| | | ->71.22% (26,214,400B) 0x256F56C: duckdb::StandardBufferManager::Pin(duckdb::shared_ptr<duckdb::BlockHandle, true>&) (in /home/admin/a.out)
| | | ->71.22% (26,214,400B) 0x2625E40: duckdb::MetadataReader::ReadNextBlock() (in /home/admin/a.out)
```
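A quick back-of-the-envelope check of those numbers, assuming for the sketch that every column of every attached catalog gets its own pre-allocated hash vector:

```python
bytes_per_vector = 8 * 2048                  # sizeof(uint64) * 2048 = 16 KiB per DistinctStatistics
catalogs, columns = 50, 97
total = catalogs * columns * bytes_per_vector
print(f"{total:,} bytes ~= {total / 2**20:.1f} MiB")   # 79,462,400 bytes ~= 75.8 MiB
```

That is the same order of magnitude as the ~70-79MB that massif attributes to vector allocation above.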
duckdb#14290 changes the timing field from `operator_timing` to `latency` for the root node. This follows that change to make the query_graph tool work again.
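A hedged sketch of coping with the rename when consuming the profiler output from Python (the JSON layout below is an assumption for illustration, not the tool's actual code):

```python
import json

def root_timing(profile_json_path):
    with open(profile_json_path) as f:
        root = json.load(f)
    # Newer profiles expose the root node's time as "latency"; older ones used
    # "operator_timing". Accept either so the tool works across versions.
    return root.get("latency", root.get("operator_timing"))
```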
carlopi force-pushed the fix_secret_attempt2 branch from 6a90eb4 to c33b4b9 on October 29, 2024 at 14:38