-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Product management (1st part) #773
Conversation
alembic.ini
Outdated
# WARN: This needs to be changed before "alembic" calls depending on | ||
# which database we are revisioning, the RUN or the PRODUCT CONFIG one. | ||
|
||
# Value for RUN databases: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have two ini files if that is possible than having to change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if alembic supports such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can either have separate directories with separate alembic.ini, or we can share the ini file:
f you really wanted two completely independent sets of migration scripts, then you'd run two migration environments.
They can share the same alembic.ini like this:
[my_db_one]
sqlalchemy.url =
[my_db_two]
sqlalchemy.url =
you then run alembic with "alembic -n my_db_one" or "alembic -n my_db_two". The "default" config area is set by -n.
A single env.py script can get multiple database URLs in any way it wants, as it determines how config is accessed. If you look in the multidb/env.py script, you'll see it's pulling multiple database urls from one section using config.get_section(name) - config file:
[alembic]
# path to migration scripts
script_location = ${script_location}
# template used to generate migration files
# file_template = %%(rev)s_%%(slug)s
databases = engine1, engine2
[engine1]
sqlalchemy.url = driver://user:pass@localhost/dbname
[engine2]
sqlalchemy.url = driver://user:pass@localhost/dbname2
usage:
config = context.config
db_names = config.get_main_option('databases')
for name in re.split(r',\s*', db_names):
engines[name] = rec = {}
rec['engine'] = engine_from_config(
config.get_section(name),
prefix='sqlalchemy.',
poolclass=pool.NullPool)
Over here I have both forms of multi db at the same time. There's two migration environments, and one migration environment does two databases that are largely mirrored, so three databases total. All three make use of a common env.py script that's in my application as a library, they then implement an env.py in the migration environment that draws upon the myapp/lib/env.py script for common features.
You can pass instructions to a single env.py that may be controlling multiple databases using --tag:
"alembic --tag my_tag"
"my_tag" is available in env.py as context.get_tag_argument(). You can use that to conditionally run migrations on one database or the other.
This is all DIY. Multi-database migrations can happen in many different ways so you'd need to build the approach that suits your situation best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Xazax-hun Thanks for clearing this up. I've updated the script.
api/products.thrift
Outdated
|
||
|
||
/* | ||
struct PrivilegeRecord { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not like the idea of checking in commented code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in line with the not yet fully committed database tables for product access, in the config_db_model
. I'd rather not remove this, as these structs and object models set forth the further development line.
* `8001` - CodeChecker result viewer | ||
* `8001` - CodeChecker server | ||
|
||
The server listens only on the local machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should mention that there is a command line argument to not to constrain the listening to the local machine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is why it says Default configuration. If someone checks the help for server
, they will see.
libcodechecker/cmd/product_client.py
Outdated
LOG.debug("Sending request to add product...") | ||
success = client.addProduct(prod) | ||
if success: | ||
LOG.info("Added successfully.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have a more descriptive info message, like Product added successfully.
libcodechecker/libclient/client.py
Outdated
Check if server product API is supported by the client. | ||
""" | ||
|
||
version = client.getAPIVersion() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intended that the same version queried here and above? Also, this functionality is inside the code for at least 3 times, (e.g. for database versioning). Maybe we should not duplicate this that many times but only have one version checking function since we are using the same versioning schema for everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not the same version. There is a database version, a product API version, and a run API version. The fact that they all are 6.0
right now is merely coincidental, as we are heavily breaking API and not versioning it at this point, but there will be one release which will set forth versioning from 6.0
and onwards.
Same as how the database migration is not handled, as the whole infrastructure in a big WIP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want separate versions for those APIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would, yes. It could very well be that the run API changes earlier than the product API. They are very essentially two different systems, which could have two different clients, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is it likely to have a client that is only interested in one of the APIs and not the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. If a client can make sure through every other means that the product exists (and later on, the user has rights to access), then it need not access the product-api at all.
But... then again, is it likely that we will have clients not our own? 😁
I'll eliminate these and put the version into the shared file. But the code that checks the version here must stay, because this function and the other can be called entirely separately. (One client is the one that returns the same version from endpoint Products
, the other from CodeCheckerService
.) Two distinct control flow, even though the value right now is the same, and even if the value will be made actually the same constant all the time.
|
||
import os | ||
import sys | ||
# import datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeps this file in accordance with the other two helpers in the module.
"=" + session_token} | ||
self.transport.setCustomHeaders(headers) | ||
|
||
# ------------------------------------------------------------ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not like these kind of comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeps this file in accordance with the other two helpers in the module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but why do we have a separate ThriftClientCall
wrapper? Shouldn't we reduce the duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is unknown. I think back when doing the Authentication module we decided to keep the pattern. Also, different endpoint clients could have different ways of handling the received exceptions. I think there are subtle differences in the except
statements already across the modules.
I would rather not introduce extra refactorings into this package. The scheme was already given and existing in the code, and I like keeping that.
libcodechecker/libhandlers/cmd.py
Outdated
help="The URL of the product to store the " | ||
"results for, in the format of " | ||
"host:port/ProductName.") | ||
elif not needs_product_url: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a simple else instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says above, that needs_product_url
handles None
, True
and False
separately.
None
indicates that the command does not connect to any sort of server, and no arguments related to connection are, thus, added.
If it is not None
, then:
True
means command connects to a product (most of thecmd
subcommands), and this code includes--url
to the arg list.False
means command connects to a server (such asproducts
, andlogin
), and the code includes--host
and--port
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but I was wondering, if None and True already handled, there is only one case left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I could change. I would have opted for a more explicit code, due to lack of a trit type.
"specified! The database specified here must exist, and be " | ||
"connectible by the server.") | ||
|
||
# TODO: --dbSOMETHING arguments are kept to not break interface from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? I mean, we do not need to preserve backward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These TODOs will be cleared later on, as part of #745. Right now, this is here, to keep the argument list akin to that of server
.
|
||
# The split array looks like ['', 'product-name', ...]. | ||
first_part = urlparse.urlparse(self.path).path.split('/', 2)[1] | ||
return first_part if first_part not in NON_PRODUCT_NAMES else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't be better to have an URL like http://localhost:8001/product/[product-name]/#{request-parts}
so we do not need to maintain a list like NON_PRODUCT_NAMES
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would make the command line and the thrift client creation lot more complicated. (*) This list doesn't really need to be maintained, as there is almost no chance that a new subfolder under www-root
is added, as we already have every "resource folder" (fonts, styles, etc.) present and used.
(*): Right now, due to how URLs work, setting up the API endpoint in Thrift in the web code like new Thrift("Authentication")
automatically puts it under /[product-name]/Authentication
. Do we want to make it /product/[product-name]/service/Authentication
or... something else? It would make the webserver code a lot more complicated, as we need to start actually parsing the active URL there.
b5dde20
to
79284a3
Compare
api/products.thrift
Outdated
} | ||
|
||
/* ProductConfiguration carries administrative data regarding product settings. */ | ||
struct ProductConfiguration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost the same as Product. Do we need both of these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will, yes. Product will be extended with extra data, and Product carries information for users.
ProductConfiguration on the other hand carries configuration data, such as database stuff, which is applicable for the - upcoming - Web admin and command line management tools.
api/products.thrift
Outdated
3: i32 port, | ||
4: string username_b64, | ||
5: optional string password_b64, // Database password is NOT sent server->client! | ||
6: string database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a more descriptive name like database_name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would not be more descriptive. If engine
is sqlite
, this is a database path. If postgresql
, this is a database. (And almost all connection strings in the world name this field as database.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment would be nice what could be the values of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any new comments on this?
1: string engine, // The database engine, such as "sqlite" or "postgresql". | ||
2: string host, | ||
3: i32 port, | ||
4: string username_b64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the b64 suffix? Do we really want to encode this in base64 also in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, no. But right now, Thrift is pretty much messed up and there isn't a good way to get over this. This was a decision to be made. Later on, when the upstream packages are updated, the API will be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the just put a comment that this value should be b46 encoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't changing how data is transmitted considered a backward-incompatible API break? In this case, the rename of the variable would automatically make the break for us.
~~~~~~ | ||
|
||
Start web server to view the results. | ||
Check a test project. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check sub command no longer starts the server even when we are using SQLite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Since the "one storage and viewer server" update in 5.9, server needed to be started before a check. This update only changes so that check takes URL instead of host-port.
@@ -831,13 +836,13 @@ CodeChecker server -w ~/codechecker_wp --dbname myProjectdb --dbport 5432 --dbad | |||
The checking process can be started separately on the same machine | |||
|
|||
~~~~~~~~~~~~~~~~~~~~~ | |||
CodeChecker check -w ~/codechecker_wp -n myProject -b "make -j 4" --host localhost --port 8001 | |||
CodeChecker check -w ~/codechecker_wp -n myProject -b "make -j 4" --url localhost:8001/Default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to spell out the Default? Can't we omit it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, you can emit the entire --url
parameter, as it defaults to localhost:8001/Default
automatically. However, I think it is better, if the very advanced usage guides explicitly state that this argument is there.
Just as how, before the change, --host
and --port
was there, even though it was not needed, as this example here set them to their default values already.
libcodechecker/libclient/client.py
Outdated
@@ -156,7 +148,93 @@ def setup_client(host, port, uri): | |||
"login'.") | |||
sys.exit(1) | |||
|
|||
client = thrift_helper.ThriftClientHelper(host, port, uri, session_token) | |||
|
|||
def setup_product_client(host, port, product_name=None, already_authed=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a good idea to have an already_authed
argument? Can't you query that using the session token somehow?
libcodechecker/libclient/client.py
Outdated
product = product_client.getProducts(product_name, None) | ||
product_error_str = None | ||
if not (product and len(product) == 1): | ||
product_error_str = "It does not exist." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe Product does not exist.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the context of this function, you'll see that the full error printed to the user is formatted as:
The given product {URL_ENDPOINT} cannot be used! {ERROR_REASON}
It would be superfluous to say "product" once again.
"=" + session_token} | ||
self.transport.setCustomHeaders(headers) | ||
|
||
# ------------------------------------------------------------ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but why do we have a separate ThriftClientCall
wrapper? Shouldn't we reduce the duplication?
self.__connected = False | ||
self.__last_connect_attempt = (datetime.datetime.now(), ex.message) | ||
|
||
def teardown(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to make this a context manager so it can work with with
statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't. These database connection engines, just like the config one, are shared and alive on the entire server. These methods only exist to support loading and unloading databases mid-operations.
Contexts are created with the scoped_session
objects instantiated from the session_factory
property, see do_POST
for reference.
Session.configure(bind=self.__engine) | ||
self.sc_session = Session | ||
self.context = context | ||
self.check_env = check_env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need these new fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They store the external environment "pointers" from libhandlers/server.py
, where this class is theoretically initialised. Otherwise, at every product database connection, the context configuration files and the checker environment would need to be loaded, which results in an increased disk usage, and an increased bloat in the log output.
function (declare, domConstruct, ItemFileWriteStore, topic, Button, | ||
TextBox, BorderContainer, ContentPane, DataGrid, util) { | ||
|
||
function prettifyDuration(seconds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file contain the same function: https://github.com/whisperity/CodeChecker/blob/product_management/www/scripts/codecheckerviewer/ListOfRuns.js#L22.
Move it to a common place such as util.js.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for noticing this. It was an unused method in this file anyways.
(Needless to say I used ListOfRuns
as a source when I started doing this page 😀 )
api/products.thrift
Outdated
} | ||
typedef list<Product> Products | ||
|
||
service productService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other service names start with the prefix codeChecker
:
- codeCheckerAuthentication
- codeCheckerDBAccess
I think we should use the same convention here (E.g.: codeCheckerProduct)
fe2e5b2
to
7a459ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of my comments are not related to the diff but to how the code is organized right now. If you do not want to deal with those problems now, I'm fine with adding a TODO comment.
def get_auth_client(port, host='localhost', uri='/Authentication', | ||
auto_handle_connection=True, session_token=None): | ||
return CCAuthHelper(host, port, uri, | ||
auto_handle_connection, | ||
session_token) | ||
|
||
|
||
def get_product_client(port, host='localhost', uri='/Products', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have these simple forwarding functions rather than instantiating the clients?
(And why do they have helper in their names? )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ask the original author of the test suite, the one who wrote the functions above. I kept myself to this scheme.
They have the name Helper
in their types just as how the Thrift client helpers (call stub forwarders) in libcodechecker/libclient
do. These objects are roughly the same as those, but they don't expose their method stubs to the environment either, and act as a true low-level forwarders... whereas those in libclient
have at least a method stub defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These even more low-level stubs exist here so theoretically, the test uses uses the least possible code from libcodechecker
.
# import only if necessary; some tests may not add this to PYTHONPATH | ||
def __init__(self, host, port, product, endpoint, | ||
auto_handle_connection=True, session_token=None): | ||
# Import only if necessary; some tests may not add this to PYTHONPATH. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a good pattern? Wouldn't we end up importing these twice? Does that have a negative side effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Importing a module multiple times is not a problem docs. It only gives a performance overhead on name resolution.
A module can contain executable statements as well as function definitions. [...] They are executed only the first time the module name is encountered in an import statement.
But even if importing the same module multiple times would execute, the things imported here are functions and constant, which are stateless.
from thrift_client_to_db import get_auth_client | ||
|
||
from functional import PKG_ROOT | ||
from functional import REPO_ROOT | ||
|
||
|
||
def get_free_port(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a good way to get a free port? Once we closed the socket, the OS might give this port to someone else, so there might be a race condition (although it should be pretty rare to run into this problem, and this is in test code anyways)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is absolutely not a good way, due to that problem.
It is also absolutely a method that has been moved from another location to this place.
But since we only ever set up a single server (excluding some certain more instristic test cases), this method is used way less times than before this patch. AFAIK, we call this 4 times, as opposed to the around 20-25 (postgresql AND server port was retrieved by this!) previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I would add a TODO though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
added.
@@ -71,6 +71,10 @@ def setUp(self): | |||
self._test_config = env.import_test_cfg(test_workspace) | |||
self._run_names = env.get_run_names(test_workspace) | |||
|
|||
self._url = self._test_config['codechecker_cfg']['viewer_host'] + \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw similar code multiple times. Maybe a config->url converter utility would be helpful? Or you might even have one somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appeared in 3 places. I've factored it out.
""" | ||
|
||
try: | ||
product = self.__session.query(Product) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you could use get
instead of filter
and early return when the result is None? (So we do not fail when the clients try to remove a non-existing product.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't attempting to remove a non-existent thing a failure? I agree with an early-"return", but in form of an early raise
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is fine for me as well.
msg) | ||
|
||
@timeit | ||
def getCurrentProduct(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the current product? Where is the state stored? Is this per connection? When and why would a client query this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line immediately after the comment of this method points you to where the "current product" is retrieved from. It is in self.__product
. This is populated by the router, if one connects to this endpoint as /[product-name]/ProductService
instead of /ProductService
.
Currently, only one place uses it. The run viewer client (the old home page) queries the name of the current product via this endpoint (setting up the API call in the aforementioned manner), and puts it into the browser's title page, and presents it on the web page body too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but I was wondering whether self.__product
is per connection or what is its lifetime. So should we afraid of race conditions? I would rather eliminate a state like this in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable pointed by self.__product
has a global lifetime of the entire server, as this is a Product
connection object, containing the database engine and everything, defined in client_db_access_server
. For each product currently connected, such an object exists.
self
itself is the Thrift handler, which is constructed per-API-call in do_POST
of the server handler, which gives one of these Product
instances as argument.
if not server_product.connected: | ||
server_product.connect() | ||
|
||
name = base64.b64encode(prod.display_name.encode('utf-8')) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this pattern duplicated. Wouldn't it be better to always store a name instead of always checking for this being None?
__tablename__ = 'products' | ||
|
||
__table_args__ = ( | ||
UniqueConstraint('endpoint'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we expect endpoint to be unique, shouldn't that be the primary key instead of having a separate autoincrement id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather have an ID as primary key in the product_admin
table, and various other permission tables that are upcoming. Strings as primary keys are a problematic subject... for example, MSSQL does not allow foreign keying onto a string (personal experience from half a year ago).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, while endpoint is expected to be unique, it isn't expected to always stay the same.
from sqlalchemy.ext.declarative import declarative_base | ||
from sqlalchemy.orm import * | ||
|
||
CC_META = MetaData(naming_convention={ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we move this to a common place? (Maybe even Base)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can not move Base
to a common place, at it is the base class of this ORM hierarchy (and run_db_model
has its own). And the two hierarchies are separate, they cannot be joined by the same base class. It would result in the database creation always creating both schemas in every database, which eventually prohibits every chance of migration.
@@ -4,41 +4,37 @@ | |||
# License. See LICENSE.TXT for details. | |||
# ------------------------------------------------------------------------- | |||
""" | |||
Database server handling. | |||
Database server handling for the server's configuration database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still a bit uncomfortable with duplicating this whole file. Could you explain why is it better than instantiating the database objects multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it uses entirely different semantics in subtle places. First of all, the used schema is imported from a different place. Second of all, failure to instantiate the config connection is a critical condition that brings down the entire server -- failure to instantiate a product connection is only an error. Config connection is initiated from the command-line, whereas product connections are created from connection strings.
From above, the two files look very similar. But there are many subtle differences. Just as how we have duplicated the common parts in the schema (such as the Metatable), and duplicate the migration environment's common code, I think it's much better if the two database connection systems are separated on the package name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we could pass the schema as an argument, it could always raise an exception and in case the exception is from the product database, the caller could bring down the server. And we have utilities to convert between connection strings and command line arguments. In this case, my gut feeling is that the need to maintain the code twice is worse than abstracting these away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I think, it would make debugging harder. I ran into a situation while creating this patch where it was very useful that in the stack trace it was obvious which handler was used in a certain case.
If we were only passing by arguments, this information would be lost. I think there is a divide between us about this, because I am very much for keeping these things separate. (Originally, they were even in separate folders under libcodechecker.server
but I did not want to have import strings that contain 3-4 elements in their path.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern, this way we always need to remember to change the code in multiple places. The information you mentioned is not lost, you can always check how the database engine was created (or add debug prints).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll do this, most likely tomorrow.
5e16bea
to
2115e35
Compare
87d323e
to
e167bb2
Compare
e167bb2
to
0c71680
Compare
b45da9b
to
5171df5
Compare
4ac35ca
to
d2de71b
Compare
api/products.thrift
Outdated
3: i32 port, | ||
4: string username_b64, | ||
5: optional string password_b64, // Database password is NOT sent server->client! | ||
6: string database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment would be nice what could be the values of it.
typedef list<ProductConfiguration> ProductConfigurations | ||
|
||
/* Product carries data to the end user's product list and tasks. */ | ||
struct Product { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we use a more descriptive name here? There is access control data for a specific user and global product data mixed together.
Should we separate this information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This struct is used (akin to RunData
) to query the information for listing products (and checking their accessibility). Separating this would result in the client (both the GUI and the CLI) having to send multiple API requests to figure out if a product exists.
(In this case, from the client's perspective, exists
means: the product endpoint is registered in the configuration
(with some extra fluff such as displayed name and ID) + the server can connect to the database
+ the user has rights to access the product
. If any of this doesn't hold, the clients will report an appropriate error, as no runs can be accessed in the product.)
Perhaps the next patch will remove ąccessible
. It is not fleshed out yet, only a placeholder exists so the GUI can properly present a 🔒 symbol. Currently, the API handler for getProducts()
hardcodes accessible=True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer api-s which check only one thing and not multiple things. Is it really that much of an overhead if multiple api calls are made (2-3 extra?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For each product, this would mean the following:
- Construction of a scope-specific permission configuration with the individual product's ID
- Encoding this data and making an API call
- The server opening a session on the configuration database and querying whether the user has rights to view the product
- This might incur more than one query due to permission inheritance
- Creating the response and sending it back
- The UI handling this
For each product. Which we don't know how many will be.
This will be almost as bad if for each Run
we would individually query the number of reports for each detection status. AKA:
- getDetectionStatusForRun(run_id, NEW)
- gDSFR(id, UNRESOLVED)
- ...
For each run.
Whereas this data (especially whether or not the database connection to the product could be made) is readily available at product listing, and is an integral part of getting user-facing information about the product.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I understand this, but I still think clean api is better which does/checks only one thing.
I did not sad we have to change it I've just sad I prefer not to mix multiple types of checks
"product endpoint exists" and "the user has access rights to it" into one api call.
On my query speed up I did not change the behavior of the api call or merged together with another just to perform better. I've improved an already existing api call which does only one thing.
@@ -4,6 +4,8 @@ | |||
// License. See LICENSE.TXT for details. | |||
// ------------------------------------------------------------------------- | |||
|
|||
const string API_VERSION = '6.0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any other data what we share between the new product api, the report server api and the auth api?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. I think this change here was a result of a discussion with the team in person to not have separate API versions for every endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've just asked it because there are multiple structs in this file which should be moved to the report server thrift file, if they are used only there.
|
||
optional arguments: | ||
-h, --help show this help message and exit | ||
-n DISPLAY_NAME, --name DISPLAY_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could be misleading if the display name is changed to a different value. How can I get the product name for a given display name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The display name is what appears in the title bar when you are viewing the product's runs. It can contain spaces and special characters.
CodeChecker cmd products list
shows the product's endpoint (and so does your browser's address bar when viewing the runs). The endpoint will have heavy restrictions imposed on its format, such as only alphanumeric characters, not many special chars, certain keywords will be prohibited, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Access in the command-line is done through the endpoint (although cmd products list
shows the displayed name). The display name is purely cosmetic, but the Web GUI filter filters based on display name. Think of it as run ID vs run name. Run name is purely cosmetic, as API access and database integrity is done through the ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think that this can be a bit confusing that if I wan to list the product I get the displayed name but if I want to interact with a product I have to use the product name in the command line.
What do you think @dkrupp, @Xazax-hun ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Product endpoint, not name. Those are two different things, as explained earlier.
Values of these arguments are ignored, unless '--postgresql' is specified! | ||
The database specified here must exist, and be connectible by the server. | ||
|
||
--dbaddress DBADDRESS, --db-host DBADDRESS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this go directly to the database? The centrally started server could not do this, through the api?
So in this case why do we need the configuration database connection arguments here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the question. This utility is cmd products add
, which essentially calls the addProduct
API function.
The run database could be anywhere, just like how the configuration database could be hosted on an arbitrary machine. The server configuration is modular in the sense that you can connect any run database (to which you have valid credentials for, of course).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think I understand my own confusion, and hopefully yours, too. These are NOT the configuration database options here. These are the database options for the run database.
CodeChecker cmd products add \
# This says which server to connect to.
# The server is already running and has a configuration database mounted through
# its cmdline arguments.
--host codechecker.mydomain.com --port 9999 \
MyProduct --name "A fancy product" \
--postgresql --dbaddress fancyproduct.mycompany.net \
# These are the arguments where the RUN database is located.
--dbport 5432 --dbuser whisperity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for clearing it up!
@@ -64,9 +62,6 @@ def handle_auth(host, port, username, login=False): | |||
if auth_token and handshake.sessionStillActive: | |||
LOG.info("You are already logged in.") | |||
return | |||
else: | |||
LOG.info("Server requires authentication to access. Please use " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we provide the information to the user somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. When the user tries to log in, they get this on the cmd
:
$ CodeChecker cmd runs
[ERROR] [18:59] - Access denied. This server requires authentication.
[ERROR] [18:59] - Please log in onto the server using 'CodeChecker cmd login'.
This message was removed so that typing in CodeChecker cmd login
does not prompt like this:
$ CodeChecker cmd login
[19:00] - Server requires authentication to access. Please use 'CodeChecker cmd login' to authenticate.
[19:00] - Logging in using credentials from command line...
Please provide password for user 'whisperity'
(The first line is removed. It is superfluous to tell the user to use [command] to authenticate right when they are actually using [command].)
a2d7550
to
cf8ec7d
Compare
alembic.ini
Outdated
# are written from script.py.mako | ||
# output_encoding = utf-8 | ||
|
||
sqlalchemy.url = postgres://codechecker@localhost:5432/codechecker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we use codechecker_cfg
database name so it will not be the same as the current database name used to store the reports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these values used anywhere? When the database is started in PostgreSQL mode, the user has to explicitly specify the config database's name.
api/products.thrift
Outdated
3: i32 port, | ||
4: string username_b64, | ||
5: optional string password_b64, // Database password is NOT sent server->client! | ||
6: string database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any new comments on this?
1: string engine, // The database engine, such as "sqlite" or "postgresql". | ||
2: string host, | ||
3: i32 port, | ||
4: string username_b64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the just put a comment that this value should be b46 encoded?
typedef list<ProductConfiguration> ProductConfigurations | ||
|
||
/* Product carries data to the end user's product list and tasks. */ | ||
struct Product { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer api-s which check only one thing and not multiple things. Is it really that much of an overhead if multiple api calls are made (2-3 extra?)
|
||
optional arguments: | ||
-h, --help show this help message and exit | ||
-n DISPLAY_NAME, --name DISPLAY_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think that this can be a bit confusing that if I wan to list the product I get the displayed name but if I want to interact with a product I have to use the product name in the command line.
What do you think @dkrupp, @Xazax-hun ?
if success: | ||
LOG.info("Product added successfully.") | ||
else: | ||
LOG.error("Adding the product has failed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sys.exit(1)
if adding fails?
if success: | ||
LOG.info("Product removed.") | ||
else: | ||
LOG.error("An error occurred in product removal.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sys.exit(1)
here too if we fail to remove the product?
libcodechecker/libclient/client.py
Outdated
try: | ||
host, port, product_name = split_product_url(product_url) | ||
except: | ||
LOG.error("Couldn't understand the product URL!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "Wrong product URL format was provided"
would be a better error message.
help="Name of the database to use.") | ||
|
||
def __handle(args): | ||
"""Custom handler for 'add' so custom error messages can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is is worth to create a separate handle just because we do not want to capture the exceptions in main?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This same code resides in server.py
too. This is because main()
everywhere deals with a proper argparse.Namespace
. This __handle
is basically an extension over argparse's argument parsing routine, as there are some special rules that can NOT be expressed with argparse's existing tooling.
@@ -8,6 +8,8 @@ | |||
functionality of 'store' and 'parse'. | |||
""" | |||
|
|||
# TODO: This command should be removed as a whole. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An issue would be better than a TODO comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#745's first action point is this removal.
023e259
to
2559306
Compare
@@ -26,6 +26,7 @@ | |||
from sqlalchemy import func | |||
|
|||
import shared | |||
from shared import constants as shared_constants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shared.constants
could be used if only shared
is imported its length is the same as shared_constants
2559306
to
22ba5ea
Compare
22ba5ea
to
b4db8fe
Compare
b4db8fe
to
d087ed9
Compare
- Introduce new configuration database schema, and database connection handling to this new database - Added basic API for product management - PostgreSQL server is not started automatically anymore - Products have a distinct endpoint (localhost:8001/ProductName/CodeCheckerService) in the request, and the requests are routed to the individual products based on this. - If server is started with SQLite configuration database and this is a brand new start, automatically configure a 'Default' product in the same folder. - Store a connection failure state for products that could not be connected to at server start. As no other packages manage and use database connections anymore, put the ORM and the run database handling into libcodechecker.server. This commit breaks the tests. It will be fixed later on. [ci skip]
…multiple database engines at the same time [ci skip]
- The CodeChecker Viewer now by default opens a product list. Users can select the product they want to view and will be routed to the run list. - If only ONE products exist on the server, simply opening http://host:port will automatically redirect the user to the run list of this single product. - This will only happen if this only product is properly connected. - Added a button to run lists to explicitly return to the product list page - The current product's name is now shown in the run list's title bar [ci skip]
- 'store' and 'cmd' now take --url PRODUCT_URL where applicable, instead of --host and --port. This URL defaults to the default product created by a brand new 'server' start, so can be omitted by end-users if they only work a single product. - Added 'cmd products' to handle basic product-related configuration, such as listing, adding and removing products - Added the appropriate server-side code to on-the-fly add and disconnect product databases Tests will be updated in a later commit. [ci skip]
- Tests now start a single server at the beginning of testing - PostgreSQL must be started prior the tests being ran - Each test suite registers a new product with a unique endpoint, and uses that particular run database to execute the tests that were already implemented. - These databases are dropped when the test is over
d087ed9
to
f484490
Compare
❗ Backward compatibility has broken
This patch massively breaks database backwards compatibility, because the meaning of the DB arguments changed! Do NOT start this without saving out your
~/.codechecker/codechecker.sqlite
file, or your PSQL database. Most likely data will not be lost (as the table names are orthogonal), but you won't be able to access the run results anymore, at least not from CodeChecker.This patch implements the first half of the product management system.
Changes summed up
host:port/ProductID
(localhost:8001/Default
), in the browser. This is the old CodeChecker page listing the runs in that product.--host
and--port
, an--url
argument, which is formatted exactly like the new web URL to access a product, that is,localhost:8001/Default
.Some stuff, such as below, are yet to be implemented:
This product does not exist.
It does not bring down the server, but it certainly can be more graceful, it will be later on.)The "webadmin" pages are missing, too. There is currently only way to manage products is via the command-line.