Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add tars discovery #6599

Merged
merged 9 commits into from
Mar 21, 2022
Merged

feat: add tars discovery #6599

merged 9 commits into from
Mar 21, 2022

Conversation

zhixiongdu027
Copy link
Contributor

What this PR does / why we need it:

Pre-submission checklist:

  • Did you explain what problem does this PR solve? Or what new features have been added?
  • Have you added corresponding test cases?
  • Have you modified the corresponding document?
  • Is this PR backward compatible? If it is not backward compatible, please discuss on the mailing list first

#6570

Hello everyone:

This is a PR for the tars discovery module, which is part of "tars-proxy".

Work on "tars-proxy" continues, but maybe we can finish tars discovery first

use t::APISIX 'no_plan';

repeat_each(1);
log_level('debug');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to use debug level?


=== TEST 1: create initial server and servant
--- timeout: 12
--- yaml_config eval: $::yaml_config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already set the yaml_config?

"DONE\n",
"{ 1 1 1 }\n",
]
--- no_error_log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's save the no_error_log like

if ((!defined $block->error_log) && (!defined $block->no_error_log)) {


function _M.init_worker()
-- TODO: maybe we can read dict name from discovery config
endpoint_dict = ngx.shared.discovery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use a new shdict instead of hijacking authz-keycloak's

end

local conf = local_conf.discovery.tars
default_weight = local_conf.discovery.tars.default_weight or 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
default_weight = local_conf.discovery.tars.default_weight or 100
default_weight = conf.default_weight

The default weight is set already.


if not last_fetch_error then
db_cli:set_keepalive(120 * 1000, 1)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need close if err?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle this.

Copy link
Contributor Author

@zhixiongdu027 zhixiongdu027 Mar 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need close if err?

Sorry, I think this place doesn't actively call db:close() should be acceptable,
Here provides agentzh's view

return nil
end

local endpoint = core.json.decode(endpoint_content)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get the err and put it in the msg

local activated_buffer = core.table.new(10, 0)
local nodes_buffer = core.table.new(0, 5)

local endpoints_pattern = core.table.concat(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you show us the data which is matched by this regex?

end


local function create_endpoint_lrucache(servant)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
local function create_endpoint_lrucache(servant)
local function add_endpoint_to_lrucache(servant)



local function get_endpoint(servant)
local endpoint_version = endpoint_dict:get_stale(servant .. "#version")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you tell me why get_statle is used instead of get?

@zhixiongdu027
Copy link
Contributor Author

@spacewander
Hi, all comment issues have been taken care of. please help review.
Many TKS.

@zhixiongdu027
Copy link
Contributor Author

image

@spacewander @membphis

Sorry, I don't understand how this failure happened.
Please give a hand. TKS


if not last_fetch_error then
db_cli:set_keepalive(120 * 1000, 1)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle this.

@@ -266,6 +266,7 @@ nginx_config: # config for render the template to generate n
introspection: 10m
access-tokens: 1m
ext-plugin: 1m
tars: 1m
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it configurable in the next PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle this.

Sorry,I did not get your point
Your means we can keep using this db hanlder event get last_db_error ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to close the handler when getting last_db_error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean we need to decide whether to close or not based on the value of error ?

Currently the case of error=='again' has been handled
But So Sorry for I don't know what other error values do not need to be closed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the example, we need to close if we don't keepalive?
https://github.com/openresty/lua-resty-mysql#synopsis

				-- put it into the connection pool of size 100,
                -- with 10 seconds max idle timeout
                local ok, err = db:set_keepalive(10000, 100)
                if not ok then
                    ngx.say("failed to set keepalive: ", err)
                    return
                end

                -- or just close the connection right away:
                -- local ok, err = db:close()
                -- if not ok then
                --     ngx.say("failed to close: ", err)
                --     return
                -- end

Copy link
Contributor Author

@zhixiongdu027 zhixiongdu027 Mar 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think this place doesn't actively call db:close() should be acceptable,
this link provides agentzh's view
@spacewander



local function get_endpoint(servant)
local endpoint_version, err = endpoint_dict:get_stale(servant .. "#version")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use get_stale when we don't have an expiration time?

Copy link
Contributor Author

@zhixiongdu027 zhixiongdu027 Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fitst:
All servant:nodes pairs are from SQL queries and have no expiration time in the cache sense

Next:
When we read the results of fectch_full the logic is as follows:

1. endpoint_dict:flush_all() 
2. for (result : results )
    { 
         endpoint_dict[servant]=nodes 
    } 
3. endpoint_dict:flush_expired()

get_endpoint may happen at 2, we have to use endpoint_dict:get_stale

@spacewander

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't set any expiration for the shdict's data, right? So why do we need to use get_stale & flush_expired?

Copy link
Contributor Author

@zhixiongdu027 zhixiongdu027 Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't set any expiration for the shdict's data, right?

We call endpoint_dict:flush_all() to marks all the existing items as expired before read full query results.

Why:

Consider the following situation:
If the current endpoint is as follows:
k1:v1, k2:v2, k3:v3

Then we full query get the following results
k1:v1, k4:v4, k5:v5

At this time, we need
1: add k4:v4, k5:v5
2: delete k2:v2, k3:v3

flush_all and flush_expired can achieve this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your explanation. Let's enrich the comment in the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhixiongdu027
Please add some comment about the expiration operation, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I add comment in function fetch_full and forget function get_endpoint()

I will update

database: db_tars
user: root
password: tars2022
full_fetch_interval: 90
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a shorter interval instead of waiting > 90 in the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this value Is check by schema ,
can we use an fake schema.lua int test ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can reduce the min in the schema? A sane user may not use a low value.

Copy link
Contributor Author

@zhixiongdu027 zhixiongdu027 Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A sane user may not use a low value.

Yes, I agree with this

But I think the meaning of scheame's existence is to ensure that this value is within a reasonable range,

Is it not appropriate to modify the schema if it is only for testing .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about rewriting schema in the test?
Like

local client = require("skywalking.client")
client.backendTimerDelay = 0.5

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will try

local core = require("apisix.core")
local d = require("apisix.discovery.tars")

ngx.sleep(11)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

@zhixiongdu027
Copy link
Contributor Author

@spacewander

Hi, I still have some questions to ask, please take a look at the comments I wrote.
thank you very much.

apisix/discovery/tars/init.lua Outdated Show resolved Hide resolved
apisix/discovery/tars/init.lua Show resolved Hide resolved
apisix/discovery/tars/init.lua Outdated Show resolved Hide resolved
apisix/discovery/tars/init.lua Show resolved Hide resolved
apisix/discovery/tars/init.lua Show resolved Hide resolved
@zhixiongdu027
Copy link
Contributor Author

@spacewander @tzssangglass
Hi , I have submitted a new submission,
Please help to review, thank you very much

return err
end
extract_endpoint(res)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For style, let's add return nil at the end



local function get_endpoint(servant)
local endpoint_version, err = endpoint_dict:get_stale(servant .. "#version")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhixiongdu027
Please add some comment about the expiration operation, thanks.

end
extract_endpoint(res)
end
endpoint_dict:flush_expired()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For style, let's add return nil at the end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some comment about the expiration operation, thanks.

Sorry, I add comment in function fetch_full and forget function get_endpoint()

I will update

@tzssangglass
Copy link
Member

hi @zhixiongdu027 , file conflict, you need to merge master branch to your PR.

@zhixiongdu027
Copy link
Contributor Author

hi @zhixiongdu027 , file conflict, you need to merge master branch to your PR.

Done

@spacewander spacewander merged commit 5b21d31 into apache:master Mar 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants