Skip to content

Cached resolver #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Cached resolver #1

wants to merge 3 commits into from

Conversation

magec
Copy link
Collaborator

@magec magec commented Nov 22, 2022

Pgcat is a postgres pooler written in Rust that uses Tokio to allow concurency.

We are comparing several poolers, not only Pgcat.

The fact is that it lacks (PgCat and the rest of poolers we've seen so far) an important feature that Pgbouncer has:

In Pgbouncer (from its docu):

[...] Host names are resolved at connection time, the result is cached per dns_max_ttl parameter. When a host name’s resolution changes, existing server connections are automatically closed when they are released (according to the pooling mode), and new server connections immediately use the new resolution. If DNS returns several results, they are used in a round-robin manner [...].

Even though the DNS round robin is not used in our infrastructure, the DNS caching feature allow us to change read only replicas without having to restart anything.

This is a (working) proof of concept of an implementation of such feature in PgCat. The purpose of this was to see:

  • How feasible was making a contribution
  • See how mature the project is

This change is Reviewable

Copy link

@lilymara-onesignal lilymara-onesignal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 11 files at r1, all commit messages.
Reviewable status: 7 of 11 files reviewed, 2 unresolved discussions (waiting on @magec)


Cargo.toml line 37 at r1 (raw file):

phf = { version = "0.11.1", features = ["macros"] }
exitcode = "1.1.2"
trust-dns-resolver = "*"

Wildcard version should not be used, pin to latest version - 0.22


src/config.rs line 211 at r1 (raw file):

    pub fn default_dns_max_ttl() -> u64 {
	30

please run cargo fmt in the repository, there are many formatting issues and I'm having a hard time parsing the code as-written

Copy link

@lilymara-onesignal lilymara-onesignal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r2, all commit messages.
Reviewable status: all files reviewed, 19 unresolved discussions (waiting on @magec)


src/client.rs line 62 at r2 (raw file):

    client_server_map: ClientServerMap,

    /// Client parameters, e.g. user, client_encoding, etc1

extra chars added to EoL


src/client.rs line 69 at r2 (raw file):

    stats: Reporter,

    /// Clients want to talk to admin database0

extra chars added to EoL


src/dns_cache.rs line 17 at r2 (raw file):

#[derive(Clone, PartialEq, Debug)]
pub struct AddrSet {
    pub set: HashSet<IpAddr>,

field should not be public


src/dns_cache.rs line 21 at r2 (raw file):

impl AddrSet {
    pub fn new() -> AddrSet {

fn should not be public


src/dns_cache.rs line 39 at r2 (raw file):

//
// A CachedResolver is dns resolution cache mechanism with customizable expiration time.

comments on items in this file should be "doc comments" - use /// instead of //. This will ensure the documentation system picks them up and includes them in autogenerated documentation


src/dns_cache.rs line 61 at r2 (raw file):

pub struct CachedResolver {
    // The configuration of the cached_resolver.
    pub config: CachedResolverConfig,

field should not be public


src/dns_cache.rs line 77 at r2 (raw file):

}

impl CachedResolver {

None of the functions in this struct should need to take &mut, since all mutations are done behind an RwLock. &self should be fine


src/dns_cache.rs line 95 at r2 (raw file):

        // Construct a new Resolver with default configuration options
        let resolver = Arc::new(TokioAsyncResolver::tokio_from_system_conf()?);
        let data = Arc::new(RwLock::new(HashMap::new()));

imo the refresh_dns_entries_loop fn should be private and it should be spawned here inside of the new fn. new should return an Arc<Self>


src/dns_cache.rs line 106 at r2 (raw file):

    // Schedules the refresher
    pub async fn refresh_dns_entries_loop(&mut self) {
        let data = self.data.clone();

clone not necessary, use self.data directly


src/dns_cache.rs line 121 at r2 (raw file):

            for hostname in hostnames.iter() {
                match CachedResolver::fetch_from_data_cache(data.clone(), hostname.as_str()) {

use fetch_from_cache


src/dns_cache.rs line 121 at r2 (raw file):

            for hostname in hostnames.iter() {
                match CachedResolver::fetch_from_data_cache(data.clone(), hostname.as_str()) {

It's fine to use .expect() on this function, since hosts cannot be removed from the cache, so a hostname fetched from the list of keys on the cache is guaranteed to hold a value.


src/dns_cache.rs line 135 at r2 (raw file):

                                    addrset, new_addrset
                                );
                                CachedResolver::store_in_cache(data.clone(), hostname, new_addrset);

ditto &self


src/dns_cache.rs line 184 at r2 (raw file):

            None => {
                debug!("Not found, executing a dns query!");
                let addr_set = AddrSet::from(self.resolver.clone().lookup_ip(host).await?);

resolver clone not necessary


src/dns_cache.rs line 209 at r2 (raw file):

    }

    fn fetch_from_data_cache(

this function shouldn't exist,fetch_from_cache should be used where this function is used


src/pool.rs line 164 at r2 (raw file):

                Err(err) => {
                    error!("Error Starting cached_resolver error: {:?}", err);
                    error!("Will continue without this feature");

log messages should be single events containing all relevant information, they should not be spread across two messages


src/server.rs line 93 at r2 (raw file):

        stats: Reporter,
    ) -> Result<Server, Error> {
        let addr_set = match *cached_resolver.clone() {

the two clones in this block can probably be removed, can you try using .as_ref instead?


src/server.rs line 93 at r2 (raw file):

        stats: Reporter,
    ) -> Result<Server, Error> {
        let addr_set = match *cached_resolver.clone() {

I notice that we don't use this new DNS cache for anything except invalidating connections. That's probably fine, but calling the feature a "DNS cache" seems misleading


src/server.rs line 581 at r2 (raw file):

    /// This happens with clients that misbehave.
    pub fn is_bad(&self) -> bool {
        if let Some(cached_resolver) = &(*(self.cached_resolver.clone())) {

the RHS of this could probably be shortened to self.cached_resolver.as_ref()

Copy link
Collaborator Author

@magec magec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 19 unresolved discussions (waiting on @lilymara-onesignal)


Cargo.toml line 37 at r1 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

Wildcard version should not be used, pin to latest version - 0.22

Done.


src/client.rs line 62 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

extra chars added to EoL

Done.


src/client.rs line 69 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

extra chars added to EoL

Done.


src/dns_cache.rs line 17 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

field should not be public

Done.


src/dns_cache.rs line 21 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

fn should not be public

Done


src/dns_cache.rs line 39 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

comments on items in this file should be "doc comments" - use /// instead of //. This will ensure the documentation system picks them up and includes them in autogenerated documentation

Done.


src/dns_cache.rs line 61 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

field should not be public

Done.


src/dns_cache.rs line 77 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

None of the functions in this struct should need to take &mut, since all mutations are done behind an RwLock. &self should be fine

👍


src/dns_cache.rs line 95 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

imo the refresh_dns_entries_loop fn should be private and it should be spawned here inside of the new fn. new should return an Arc<Self>

Done.


src/dns_cache.rs line 106 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

clone not necessary, use self.data directly

Done.


src/dns_cache.rs line 121 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

use fetch_from_cache

Done.


src/dns_cache.rs line 121 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

It's fine to use .expect() on this function, since hosts cannot be removed from the cache, so a hostname fetched from the list of keys on the cache is guaranteed to hold a value.

Done.


src/dns_cache.rs line 135 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

ditto &self

Done.


src/dns_cache.rs line 184 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

resolver clone not necessary

Done.


src/dns_cache.rs line 209 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

this function shouldn't exist,fetch_from_cache should be used where this function is used

Done.


src/pool.rs line 164 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

log messages should be single events containing all relevant information, they should not be spread across two messages

Done.


src/server.rs line 93 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

the two clones in this block can probably be removed, can you try using .as_ref instead?

Done.


src/server.rs line 93 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

I notice that we don't use this new DNS cache for anything except invalidating connections. That's probably fine, but calling the feature a "DNS cache" seems misleading

Well, I kind of disagree, even though we do not use all of its features in the code, the struct is a CachedResolver, maybe the name can be improved, but the statement A CachedResolver is a DNS resolution cache mechanism with customizable expiration time. is correct IMO.


src/server.rs line 581 at r2 (raw file):

Previously, lilymara-onesignal (Lily Mara) wrote…

the RHS of this could probably be shortened to self.cached_resolver.as_ref()

Done.

@magec magec force-pushed the cached-resolver branch 6 times, most recently from 8bed231 to a7890c9 Compare December 7, 2022 14:10
@magec magec force-pushed the cached-resolver branch 3 times, most recently from 96cf5c5 to dfe1d0e Compare January 23, 2023 13:11
@magec magec force-pushed the cached-resolver branch 2 times, most recently from a8b3ad8 to 7a2d5e4 Compare February 2, 2023 12:09
@magec magec force-pushed the cached-resolver branch 2 times, most recently from 5026387 to 99871c6 Compare February 23, 2023 09:25
@magec magec force-pushed the cached-resolver branch from 99871c6 to ab932a9 Compare March 13, 2023 09:14
@magec magec force-pushed the cached-resolver branch from ab932a9 to 87a0d0f Compare March 28, 2023 15:45
Copy link

@lilymara-onesignal lilymara-onesignal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r3, 1 of 8 files at r5, 7 of 9 files at r8, 8 of 8 files at r9, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @magec)


src/dns_cache.rs line 79 at r9 (raw file):

    // This is the hash that contains the hash.
    data: Option<RwLock<HashMap<String, AddrSet>>>,

I don't see any place where data might be stored as None, so I'm not sure why this field is marked as optional


src/dns_cache.rs line 82 at r9 (raw file):

    // The resolver to be used for DNS queries.
    resolver: Option<TokioAsyncResolver>,

Ditto, resolver is never None


src/dns_cache.rs line 174 at r9 (raw file):

    // Schedules the refresher
    async fn refresh_dns_entries_loop(&self) {
        let resolver = TokioAsyncResolver::tokio_from_system_conf().unwrap();

Why create a resolver in this function if there's one available on self?


src/errors.rs line 16 at r8 (raw file):

    TlsError,
    StatementTimeout,
    DNSCachedError(String),

This isn't really your problem to solve, but upstream pgcat doesn't have a great implementation of error-handling.

@magec magec force-pushed the cached-resolver branch 2 times, most recently from 3e0fbbd to 9a4eac1 Compare April 24, 2023 10:34
magec added 2 commits May 2, 2023 09:56
… changes.

Adds a module to deal with dns_cache feature. It's
main struct is CachedResolver, which is a simple thread safe
hostname <-> Ips cache with the ability to refresh resolutions
every `dns_max_ttl` seconds. This way, a client can check whether its
ip address has changed.
@magec magec force-pushed the cached-resolver branch from 9a4eac1 to ac39ea0 Compare May 2, 2023 07:57
@nicolasvan
Copy link
Member

Closing this since this made it into upstream pgcat in postgresml#249

@nicolasvan nicolasvan closed this Mar 18, 2024
@nicolasvan nicolasvan deleted the cached-resolver branch March 18, 2024 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants