Skip to content

Update last written LSN for gin/gist index metadata #182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 19, 2022

Conversation

knizhnik
Copy link

This commit fix the problem with pg_regress with last_written_lsn_cache.
Indexes which are populated in special way (gin,gist,spgist,...) requires update of last written LSN for relation metadata

@knizhnik knizhnik requested a review from hlinnaka July 14, 2022 21:11
@hlinnaka hlinnaka requested review from MMeent and aome510 July 14, 2022 22:26
Copy link

@aome510 aome510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, I didn't receive notification for review requested in this PR. LGTM, but let's wait for CIs in https://github.com/neondatabase/neon/tree/exp-07-18 to make sure tests with the new changes passed.

Comment on lines 424 to 425
SetLastWrittenLSN(XactLastRecEnd, index->rd_smgr->smgr_rnode.node.relNode, 0, RelationGetNumberOfBlocks(index));
SetLastWrittenLSN(XactLastRecEnd, index->rd_smgr->smgr_rnode.node.relNode, REL_METADATA_PSEUDO_BLOCKNO, REL_METADATA_PSEUDO_BLOCKNO);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API where you have to call SetLastWrittenLSN twice, is a bit awkward. In all of these calls where you pass REL_METADATA_PSEUDO_BLOCKNO, you have just loaded the relation with data. How replacing SetLastWrittenLSN() with two functions:

SetLastWrittenLSNForRelation(Relation, ForkNumber, XLogRecPtr);
SetLastWrittenLSNForBlock(RelFileNode, ForkNumber, BlockNumber, XLogRecPtr);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@hlinnaka
Copy link
Contributor

but let's wait for CIs in https://github.com/neondatabase/neon/tree/exp-07-18 to make sure tests with the new changes passed.

thanks!

Comment on lines -87 to -91
/*
* Pseudo block number used to associate LSN with relation metadata (relation size)
*/
#define REL_METADATA_PSEUDO_BLOCKNO InvalidBlockNumber

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move this back here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately - not, it is use in pagestore_smgr.c in zenith_get_request_lsn -> GeLastWrittenLsn.
Certainly it is possible to add two versions of GetLastWrittenLsn function and somehow explain zenith_get_request_lsn which version to call. But it complicates code and I am not sure that it makes it ore readable. So I left it as it is.

@knizhnik knizhnik merged commit 7517d1c into main Jul 19, 2022
@knizhnik knizhnik deleted the last_written_lsn_cache_fix branch July 19, 2022 04:50
@hlinnaka
Copy link
Contributor

Got a CI failure with this:

test_runner/batch_others/test_recovery.py:64: in test_pageserver_recovery
    cur.execute("select count(*) from foo")
E   psycopg2.errors.IoError: could not read block 1 in rel 1663/12972/2663.0 from page server at lsn 0/016AE7D8
E   LINE 1: select count(*) from foo
E                                ^
E   DETAIL:  page server returned error: initdb failed

@hlinnaka
Copy link
Contributor

It looks like this isn't really ready yet. How about we revert these changes, ie. commit 595ac69 and 7517d1c, so that we can ship the other changes on the 'main' branch. Namely, adding uuid-ossp to the list of supported extensions.

@knizhnik
Copy link
Author

I can not reproduce the problem with test_recovery failure locally, did you? Or it just happens in CI?

Certainly, if there are such errors and this patch is blocking other patches, then it should be undone.
Do you want me to do it? Just reset main branch to 9c99008
and then apply db67809 on top of it?

Also I have implemented version of last written LSN cache which uses buffer tag as a key:
https://github.com/neondatabase/postgres/tree/last_written_lsn_buftag_cache
I still think that it is enough to have relnode as a key, because it makes comparison/hash calculation operations several times faster. And probability of collision is small. But if you think that it is more important to avoid conflicts we can use this branch.

So, if we undo this commits, what do you think will be the next steps with this PR?
Should we try to fix errors and merge it before launch?
Or do you think that there is something wrong with the proposed approach and we should think about something different?
Original patch (with per-relation cache) was much simpler but doesn't optimize case with parallel insert/selects from huge table.
I am not sure how popular this workload can be and will per-chunk last written lsn cache be really able to increase performance of such workload.

But first of all it will be nice to understand the reason of this failure...
It may indicate soe other error not directly related with last written LSN cache.

@hlinnaka
Copy link
Contributor

I can not reproduce the problem with test_recovery failure locally, did you? Or it just happens in CI?

Certainly, if there are such errors and this patch is blocking other patches, then it should be undone. Do you want me to do it? Just reset main branch to 9c99008 and then apply db67809 on top of it?

That works, but requires a force-push. I'd suggest git revert 7517d1cec45224841eac327cad7e0ddc81c734ff 595ac69260719d8d7b43c09ab7dfd8f232542e50, and git rebase -i origin/main to squash the two revert commits into one.

Yeah, please go ahead and do the revert. Let's unblock the uuid-ossp change, and get vendor/postgres back in sync with the postgres main-branch again.

Also I have implemented version of last written LSN cache which uses buffer tag as a key: https://github.com/neondatabase/postgres/tree/last_written_lsn_buftag_cache I still think that it is enough to have relnode as a key, because it makes comparison/hash calculation operations several times faster. And probability of collision is small. But if you think that it is more important to avoid conflicts we can use this branch.

It's more about readability; it's surprising to use just the relfilenode oid as the key. I don't feel strongly about it, but unless you can see a measurable, significant performance difference, I would prefer using the buffer tag.

So, if we undo this commits, what do you think will be the next steps with this PR? Should we try to fix errors and merge it before launch? Or do you think that there is something wrong with the proposed approach and we should think about something different? Original patch (with per-relation cache) was much simpler but doesn't optimize case with parallel insert/selects from huge table. I am not sure how popular this workload can be and will per-chunk last written lsn cache be really able to increase performance of such workload.

I think the high-level approach here makes sense, we're just talking about the details.

But first of all it will be nice to understand the reason of this failure... It may indicate soe other error not directly related with last written LSN cache.

Agreed. It certainly looks like an LSN-cache related bug, but in any case we need to find the bug.

@knizhnik
Copy link
Author

I tried to run CI test with this postgres versions.
2 tests are failed in debug build: test_tenant_relocation and test_wal_acceptor.
Both are failed with timeouts and looks like their failure has no relation to this patch.
@hlinnaka are you sure that test_recovery.py on the latest version of this PR and we should undo it?

knizhnik added a commit that referenced this pull request Jul 19, 2022
This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
@aome510
Copy link

aome510 commented Jul 19, 2022

2 tests are failed in debug build: test_tenant_relocation and test_wal_acceptor.
Both are failed with timeouts and looks like their failure has no relation to this patch.

One possible cause is that this patch slows down the performance of those tests. That said, I wasn't able to reproduce the failures locally when running tests individually. Maybe, running in parallel would be different.

The same tests also fail in https://app.circleci.com/pipelines/github/neondatabase/neon/8175/workflows/96416e07-8028-46d0-94d7-2a470ae42425/jobs/83642. They're consistent, so definitely there is something wrong here.

knizhnik added a commit that referenced this pull request Jul 26, 2022
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
MMeent pushed a commit that referenced this pull request Aug 18, 2022
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
MMeent pushed a commit that referenced this pull request Aug 18, 2022
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
lubennikovaav pushed a commit that referenced this pull request Nov 21, 2022
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
lubennikovaav pushed a commit that referenced this pull request Nov 21, 2022
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
MMeent pushed a commit that referenced this pull request Feb 10, 2023
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
MMeent pushed a commit that referenced this pull request Feb 10, 2023
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
MMeent pushed a commit that referenced this pull request Feb 10, 2023
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
MMeent pushed a commit that referenced this pull request Feb 10, 2023
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
MMeent pushed a commit that referenced this pull request May 11, 2023
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
MMeent pushed a commit that referenced this pull request May 11, 2023
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
tristan957 pushed a commit that referenced this pull request Aug 10, 2023
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
tristan957 pushed a commit that referenced this pull request Aug 10, 2023
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
tristan957 pushed a commit that referenced this pull request Nov 8, 2023
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
tristan957 pushed a commit that referenced this pull request Nov 8, 2023
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
tristan957 pushed a commit that referenced this pull request Nov 8, 2023
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
tristan957 pushed a commit that referenced this pull request Nov 8, 2023
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
tristan957 pushed a commit that referenced this pull request Feb 5, 2024
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
tristan957 pushed a commit that referenced this pull request Feb 5, 2024
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
tristan957 pushed a commit that referenced this pull request Feb 5, 2024
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
tristan957 pushed a commit that referenced this pull request Feb 5, 2024
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
tristan957 pushed a commit that referenced this pull request Feb 6, 2024
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
tristan957 pushed a commit that referenced this pull request Feb 6, 2024
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
tristan957 pushed a commit that referenced this pull request May 10, 2024
* Update last written LSN for gin/gist index metadata

* Replace SetLastWrittenLSN with family of SetLastWrittenLSNFFor* functions
tristan957 pushed a commit that referenced this pull request May 10, 2024
…183)

This reverts commit 7517d1c.

Revert "Large last written lsn cache (#177)"

This reverts commit 595ac69.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants