Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Product management (1st part) #773

Merged
merged 6 commits into from
Sep 1, 2017

Conversation

whisperity
Copy link
Contributor

Partially implements #691.

❗ Backward compatibility has broken

This patch massively breaks database backwards compatibility, because the meaning of the DB arguments changed! Do NOT start this without saving out your ~/.codechecker/codechecker.sqlite file, or your PSQL database. Most likely data will not be lost (as the table names are orthogonal), but you won't be able to access the run results anymore, at least not from CodeChecker.


This patch implements the first half of the product management system.

Changes summed up

  • The server process now organises run databases into products, which are unique database (files or psql connections), each of which are a "run database". (Run database is the new identifier name for what was the CodeChecker database before this patch. The other database is called the config database.)
  • Products are accessible through host:port/ProductID (localhost:8001/Default), in the browser. This is the old CodeChecker page listing the runs in that product.
  • The command-line also takes, instead of --host and --port, an --url argument, which is formatted exactly like the new web URL to access a product, that is, localhost:8001/Default.
  • Initial start of a SQLite server automatically creates this Default product for ease of use.
  • Tests have been rewritten to use the new structure. Testing time takes around half with this change, as the tests no longer bring up a server and wait for a database on every test.
  • The new homepage, the product list, automatically redirects to the only product's run list, if there is only one product configured, yet again, to ease the use for those who don't want to delve into this extensive feature.

Some stuff, such as below, are yet to be implemented:

  • Sanitising product URL parts (endpoints) from problematic special chars.
  • Handling if the database dies out mid-operation more gracefully. (Currently the server will continuously respond This product does not exist. It does not bring down the server, but it certainly can be more graceful, it will be later on.)
  • The concepts of root, superuser, product admin does not apply. If you have access to the server, you have access to every product whatsoever.

The "webadmin" pages are missing, too. There is currently only way to manage products is via the command-line.

@whisperity whisperity added API change 📄 Content of patch changes API! WARN ⚠️: Backward compatibility breaker! MIND THE GAP! Merging this patch will mess up compatibility with the previous releases! CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands database 🗄️ Issues related to the database schema. enhancement 🌟 labels Aug 7, 2017
@whisperity whisperity added this to the 6.0 pre2 milestone Aug 7, 2017
alembic.ini Outdated
# WARN: This needs to be changed before "alembic" calls depending on
# which database we are revisioning, the RUN or the PRODUCT CONFIG one.

# Value for RUN databases:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have two ini files if that is possible than having to change this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if alembic supports such.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can either have separate directories with separate alembic.ini, or we can share the ini file:

f you really wanted two completely independent sets of migration scripts, then you'd run two migration environments.  

They can share the same alembic.ini like this:

[my_db_one]
sqlalchemy.url =

[my_db_two]
sqlalchemy.url =

you then run alembic with "alembic -n my_db_one" or "alembic -n my_db_two".    The "default" config area is set by -n.

A single env.py script can get multiple database URLs in any way it wants, as it determines how config is accessed.   If you look in the multidb/env.py script, you'll see it's pulling multiple database urls from one section using config.get_section(name) - config file:

[alembic]
# path to migration scripts
script_location = ${script_location}

# template used to generate migration files
# file_template = %%(rev)s_%%(slug)s

databases = engine1, engine2

[engine1]
sqlalchemy.url = driver://user:pass@localhost/dbname

[engine2]
sqlalchemy.url = driver://user:pass@localhost/dbname2

usage:

    config = context.config

    db_names = config.get_main_option('databases')

    for name in re.split(r',\s*', db_names):
        engines[name] = rec = {}
        rec['engine'] = engine_from_config(
                                    config.get_section(name),
                                    prefix='sqlalchemy.',
                                    poolclass=pool.NullPool)

Over here I have both forms of multi db at the same time.   There's two migration environments, and one migration environment does two databases that are largely mirrored, so three databases total.   All three make use of a common env.py script that's in my application as a library, they then implement an env.py in the migration environment that draws upon the myapp/lib/env.py script for common features.

You can pass instructions to a single env.py that may be controlling multiple databases using --tag:

"alembic --tag my_tag"

"my_tag" is available in env.py as context.get_tag_argument().   You can use that to conditionally run migrations on one database or the other.

This is all DIY.  Multi-database migrations can happen in many different ways so you'd need to build the approach that suits your situation best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Xazax-hun Thanks for clearing this up. I've updated the script.



/*
struct PrivilegeRecord {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like the idea of checking in commented code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in line with the not yet fully committed database tables for product access, in the config_db_model. I'd rather not remove this, as these structs and object models set forth the further development line.

* `8001` - CodeChecker result viewer
* `8001` - CodeChecker server

The server listens only on the local machine.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should mention that there is a command line argument to not to constrain the listening to the local machine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is why it says Default configuration. If someone checks the help for server, they will see.

LOG.debug("Sending request to add product...")
success = client.addProduct(prod)
if success:
LOG.info("Added successfully.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have a more descriptive info message, like Product added successfully.

Check if server product API is supported by the client.
"""

version = client.getAPIVersion()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended that the same version queried here and above? Also, this functionality is inside the code for at least 3 times, (e.g. for database versioning). Maybe we should not duplicate this that many times but only have one version checking function since we are using the same versioning schema for everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not the same version. There is a database version, a product API version, and a run API version. The fact that they all are 6.0 right now is merely coincidental, as we are heavily breaking API and not versioning it at this point, but there will be one release which will set forth versioning from 6.0 and onwards.

Same as how the database migration is not handled, as the whole infrastructure in a big WIP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want separate versions for those APIs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would, yes. It could very well be that the run API changes earlier than the product API. They are very essentially two different systems, which could have two different clients, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is it likely to have a client that is only interested in one of the APIs and not the other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If a client can make sure through every other means that the product exists (and later on, the user has rights to access), then it need not access the product-api at all.

But... then again, is it likely that we will have clients not our own? 😁

I'll eliminate these and put the version into the shared file. But the code that checks the version here must stay, because this function and the other can be called entirely separately. (One client is the one that returns the same version from endpoint Products, the other from CodeCheckerService.) Two distinct control flow, even though the value right now is the same, and even if the value will be made actually the same constant all the time.


import os
import sys
# import datetime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeps this file in accordance with the other two helpers in the module.

"=" + session_token}
self.transport.setCustomHeaders(headers)

# ------------------------------------------------------------
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like these kind of comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeps this file in accordance with the other two helpers in the module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but why do we have a separate ThriftClientCallwrapper? Shouldn't we reduce the duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is unknown. I think back when doing the Authentication module we decided to keep the pattern. Also, different endpoint clients could have different ways of handling the received exceptions. I think there are subtle differences in the except statements already across the modules.

I would rather not introduce extra refactorings into this package. The scheme was already given and existing in the code, and I like keeping that.

help="The URL of the product to store the "
"results for, in the format of "
"host:port/ProductName.")
elif not needs_product_url:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a simple else instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says above, that needs_product_url handles None, True and False separately.

None indicates that the command does not connect to any sort of server, and no arguments related to connection are, thus, added.

If it is not None, then:

  • True means command connects to a product (most of the cmd subcommands), and this code includes --url to the arg list.
  • False means command connects to a server (such as products, and login), and the code includes --host and --port.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but I was wondering, if None and True already handled, there is only one case left.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I could change. I would have opted for a more explicit code, due to lack of a trit type.

"specified! The database specified here must exist, and be "
"connectible by the server.")

# TODO: --dbSOMETHING arguments are kept to not break interface from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I mean, we do not need to preserve backward compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These TODOs will be cleared later on, as part of #745. Right now, this is here, to keep the argument list akin to that of server.


# The split array looks like ['', 'product-name', ...].
first_part = urlparse.urlparse(self.path).path.split('/', 2)[1]
return first_part if first_part not in NON_PRODUCT_NAMES else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be better to have an URL like http://localhost:8001/product/[product-name]/#{request-parts} so we do not need to maintain a list like NON_PRODUCT_NAMES?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would make the command line and the thrift client creation lot more complicated. (*) This list doesn't really need to be maintained, as there is almost no chance that a new subfolder under www-root is added, as we already have every "resource folder" (fonts, styles, etc.) present and used.

(*): Right now, due to how URLs work, setting up the API endpoint in Thrift in the web code like new Thrift("Authentication") automatically puts it under /[product-name]/Authentication. Do we want to make it /product/[product-name]/service/Authentication or... something else? It would make the webserver code a lot more complicated, as we need to start actually parsing the active URL there.

@whisperity whisperity requested a review from gyorb August 7, 2017 12:15
@whisperity whisperity force-pushed the product_management branch 4 times, most recently from b5dde20 to 79284a3 Compare August 7, 2017 14:48
}

/* ProductConfiguration carries administrative data regarding product settings. */
struct ProductConfiguration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost the same as Product. Do we need both of these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will, yes. Product will be extended with extra data, and Product carries information for users.

ProductConfiguration on the other hand carries configuration data, such as database stuff, which is applicable for the - upcoming - Web admin and command line management tools.

3: i32 port,
4: string username_b64,
5: optional string password_b64, // Database password is NOT sent server->client!
6: string database
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a more descriptive name like database_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would not be more descriptive. If engine is sqlite, this is a database path. If postgresql, this is a database. (And almost all connection strings in the world name this field as database.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment would be nice what could be the values of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any new comments on this?

1: string engine, // The database engine, such as "sqlite" or "postgresql".
2: string host,
3: i32 port,
4: string username_b64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the b64 suffix? Do we really want to encode this in base64 also in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, no. But right now, Thrift is pretty much messed up and there isn't a good way to get over this. This was a decision to be made. Later on, when the upstream packages are updated, the API will be changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the just put a comment that this value should be b46 encoded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't changing how data is transmitted considered a backward-incompatible API break? In this case, the rename of the variable would automatically make the break for us.

~~~~~~

Start web server to view the results.
Check a test project.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check sub command no longer starts the server even when we are using SQLite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. Since the "one storage and viewer server" update in 5.9, server needed to be started before a check. This update only changes so that check takes URL instead of host-port.

@@ -831,13 +836,13 @@ CodeChecker server -w ~/codechecker_wp --dbname myProjectdb --dbport 5432 --dbad
The checking process can be started separately on the same machine

~~~~~~~~~~~~~~~~~~~~~
CodeChecker check -w ~/codechecker_wp -n myProject -b "make -j 4" --host localhost --port 8001
CodeChecker check -w ~/codechecker_wp -n myProject -b "make -j 4" --url localhost:8001/Default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to spell out the Default? Can't we omit it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, you can emit the entire --url parameter, as it defaults to localhost:8001/Default automatically. However, I think it is better, if the very advanced usage guides explicitly state that this argument is there.

Just as how, before the change, --host and --port was there, even though it was not needed, as this example here set them to their default values already.

@@ -156,7 +148,93 @@ def setup_client(host, port, uri):
"login'.")
sys.exit(1)

client = thrift_helper.ThriftClientHelper(host, port, uri, session_token)

def setup_product_client(host, port, product_name=None, already_authed=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a good idea to have an already_authed argument? Can't you query that using the session token somehow?

product = product_client.getProducts(product_name, None)
product_error_str = None
if not (product and len(product) == 1):
product_error_str = "It does not exist."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe Product does not exist.?

Copy link
Contributor Author

@whisperity whisperity Aug 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the context of this function, you'll see that the full error printed to the user is formatted as:

The given product {URL_ENDPOINT} cannot be used! {ERROR_REASON}

It would be superfluous to say "product" once again.

"=" + session_token}
self.transport.setCustomHeaders(headers)

# ------------------------------------------------------------
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but why do we have a separate ThriftClientCallwrapper? Shouldn't we reduce the duplication?

self.__connected = False
self.__last_connect_attempt = (datetime.datetime.now(), ex.message)

def teardown(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to make this a context manager so it can work with with statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't. These database connection engines, just like the config one, are shared and alive on the entire server. These methods only exist to support loading and unloading databases mid-operations.

Contexts are created with the scoped_session objects instantiated from the session_factory property, see do_POST for reference.

Session.configure(bind=self.__engine)
self.sc_session = Session
self.context = context
self.check_env = check_env
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these new fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They store the external environment "pointers" from libhandlers/server.py, where this class is theoretically initialised. Otherwise, at every product database connection, the context configuration files and the checker environment would need to be loaded, which results in an increased disk usage, and an increased bloat in the log output.

function (declare, domConstruct, ItemFileWriteStore, topic, Button,
TextBox, BorderContainer, ContentPane, DataGrid, util) {

function prettifyDuration(seconds) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file contain the same function: https://github.com/whisperity/CodeChecker/blob/product_management/www/scripts/codecheckerviewer/ListOfRuns.js#L22.
Move it to a common place such as util.js.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for noticing this. It was an unused method in this file anyways.
(Needless to say I used ListOfRuns as a source when I started doing this page 😀 )

}
typedef list<Product> Products

service productService {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other service names start with the prefix codeChecker:

  • codeCheckerAuthentication
  • codeCheckerDBAccess

I think we should use the same convention here (E.g.: codeCheckerProduct)

@whisperity whisperity force-pushed the product_management branch 2 times, most recently from fe2e5b2 to 7a459ba Compare August 8, 2017 08:38
Copy link
Contributor

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of my comments are not related to the diff but to how the code is organized right now. If you do not want to deal with those problems now, I'm fine with adding a TODO comment.

def get_auth_client(port, host='localhost', uri='/Authentication',
auto_handle_connection=True, session_token=None):
return CCAuthHelper(host, port, uri,
auto_handle_connection,
session_token)


def get_product_client(port, host='localhost', uri='/Products',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have these simple forwarding functions rather than instantiating the clients?
(And why do they have helper in their names? )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ask the original author of the test suite, the one who wrote the functions above. I kept myself to this scheme.

They have the name Helper in their types just as how the Thrift client helpers (call stub forwarders) in libcodechecker/libclient do. These objects are roughly the same as those, but they don't expose their method stubs to the environment either, and act as a true low-level forwarders... whereas those in libclient have at least a method stub defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These even more low-level stubs exist here so theoretically, the test uses uses the least possible code from libcodechecker.

# import only if necessary; some tests may not add this to PYTHONPATH
def __init__(self, host, port, product, endpoint,
auto_handle_connection=True, session_token=None):
# Import only if necessary; some tests may not add this to PYTHONPATH.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a good pattern? Wouldn't we end up importing these twice? Does that have a negative side effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing a module multiple times is not a problem docs. It only gives a performance overhead on name resolution.

A module can contain executable statements as well as function definitions. [...] They are executed only the first time the module name is encountered in an import statement.

But even if importing the same module multiple times would execute, the things imported here are functions and constant, which are stateless.

from thrift_client_to_db import get_auth_client

from functional import PKG_ROOT
from functional import REPO_ROOT


def get_free_port():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a good way to get a free port? Once we closed the socket, the OS might give this port to someone else, so there might be a race condition (although it should be pretty rare to run into this problem, and this is in test code anyways)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is absolutely not a good way, due to that problem.
It is also absolutely a method that has been moved from another location to this place.

But since we only ever set up a single server (excluding some certain more instristic test cases), this method is used way less times than before this patch. AFAIK, we call this 4 times, as opposed to the around 20-25 (postgresql AND server port was retrieved by this!) previously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I would add a TODO though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO added.

@@ -71,6 +71,10 @@ def setUp(self):
self._test_config = env.import_test_cfg(test_workspace)
self._run_names = env.get_run_names(test_workspace)

self._url = self._test_config['codechecker_cfg']['viewer_host'] + \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw similar code multiple times. Maybe a config->url converter utility would be helpful? Or you might even have one somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appeared in 3 places. I've factored it out.

"""

try:
product = self.__session.query(Product) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could use get instead of filter and early return when the result is None? (So we do not fail when the clients try to remove a non-existing product.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't attempting to remove a non-existent thing a failure? I agree with an early-"return", but in form of an early raise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fine for me as well.

msg)

@timeit
def getCurrentProduct(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the current product? Where is the state stored? Is this per connection? When and why would a client query this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line immediately after the comment of this method points you to where the "current product" is retrieved from. It is in self.__product. This is populated by the router, if one connects to this endpoint as /[product-name]/ProductService instead of /ProductService.

Currently, only one place uses it. The run viewer client (the old home page) queries the name of the current product via this endpoint (setting up the API call in the aforementioned manner), and puts it into the browser's title page, and presents it on the web page body too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but I was wondering whether self.__product is per connection or what is its lifetime. So should we afraid of race conditions? I would rather eliminate a state like this in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable pointed by self.__product has a global lifetime of the entire server, as this is a Product connection object, containing the database engine and everything, defined in client_db_access_server. For each product currently connected, such an object exists.

self itself is the Thrift handler, which is constructed per-API-call in do_POST of the server handler, which gives one of these Product instances as argument.

if not server_product.connected:
server_product.connect()

name = base64.b64encode(prod.display_name.encode('utf-8')) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this pattern duplicated. Wouldn't it be better to always store a name instead of always checking for this being None?

__tablename__ = 'products'

__table_args__ = (
UniqueConstraint('endpoint'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we expect endpoint to be unique, shouldn't that be the primary key instead of having a separate autoincrement id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather have an ID as primary key in the product_admin table, and various other permission tables that are upcoming. Strings as primary keys are a problematic subject... for example, MSSQL does not allow foreign keying onto a string (personal experience from half a year ago).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, while endpoint is expected to be unique, it isn't expected to always stay the same.

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import *

CC_META = MetaData(naming_convention={
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we move this to a common place? (Maybe even Base)

Copy link
Contributor Author

@whisperity whisperity Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can not move Base to a common place, at it is the base class of this ORM hierarchy (and run_db_model has its own). And the two hierarchies are separate, they cannot be joined by the same base class. It would result in the database creation always creating both schemas in every database, which eventually prohibits every chance of migration.

@@ -4,41 +4,37 @@
# License. See LICENSE.TXT for details.
# -------------------------------------------------------------------------
"""
Database server handling.
Database server handling for the server's configuration database.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still a bit uncomfortable with duplicating this whole file. Could you explain why is it better than instantiating the database objects multiple times?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it uses entirely different semantics in subtle places. First of all, the used schema is imported from a different place. Second of all, failure to instantiate the config connection is a critical condition that brings down the entire server -- failure to instantiate a product connection is only an error. Config connection is initiated from the command-line, whereas product connections are created from connection strings.

From above, the two files look very similar. But there are many subtle differences. Just as how we have duplicated the common parts in the schema (such as the Metatable), and duplicate the migration environment's common code, I think it's much better if the two database connection systems are separated on the package name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we could pass the schema as an argument, it could always raise an exception and in case the exception is from the product database, the caller could bring down the server. And we have utilities to convert between connection strings and command line arguments. In this case, my gut feeling is that the need to maintain the code twice is worse than abstracting these away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I think, it would make debugging harder. I ran into a situation while creating this patch where it was very useful that in the stack trace it was obvious which handler was used in a certain case.

If we were only passing by arguments, this information would be lost. I think there is a divide between us about this, because I am very much for keeping these things separate. (Originally, they were even in separate folders under libcodechecker.server but I did not want to have import strings that contain 3-4 elements in their path.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern, this way we always need to remember to change the code in multiple places. The information you mentioned is not lost, you can always check how the database engine was created (or add debug prints).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll do this, most likely tomorrow.

@whisperity whisperity force-pushed the product_management branch 2 times, most recently from 5e16bea to 2115e35 Compare August 10, 2017 14:48
@whisperity whisperity added the WIP 💣 Work In Progress label Aug 10, 2017
@whisperity whisperity force-pushed the product_management branch 6 times, most recently from 87d323e to e167bb2 Compare August 11, 2017 09:43
@whisperity whisperity removed the WIP 💣 Work In Progress label Aug 11, 2017
3: i32 port,
4: string username_b64,
5: optional string password_b64, // Database password is NOT sent server->client!
6: string database
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment would be nice what could be the values of it.

typedef list<ProductConfiguration> ProductConfigurations

/* Product carries data to the end user's product list and tasks. */
struct Product {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use a more descriptive name here? There is access control data for a specific user and global product data mixed together.
Should we separate this information?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct is used (akin to RunData) to query the information for listing products (and checking their accessibility). Separating this would result in the client (both the GUI and the CLI) having to send multiple API requests to figure out if a product exists.

(In this case, from the client's perspective, exists means: the product endpoint is registered in the configuration (with some extra fluff such as displayed name and ID) + the server can connect to the database + the user has rights to access the product. If any of this doesn't hold, the clients will report an appropriate error, as no runs can be accessed in the product.)

Perhaps the next patch will remove ąccessible. It is not fleshed out yet, only a placeholder exists so the GUI can properly present a 🔒 symbol. Currently, the API handler for getProducts() hardcodes accessible=True.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer api-s which check only one thing and not multiple things. Is it really that much of an overhead if multiple api calls are made (2-3 extra?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each product, this would mean the following:

  • Construction of a scope-specific permission configuration with the individual product's ID
  • Encoding this data and making an API call
  • The server opening a session on the configuration database and querying whether the user has rights to view the product
    • This might incur more than one query due to permission inheritance
  • Creating the response and sending it back
  • The UI handling this

For each product. Which we don't know how many will be.

This will be almost as bad if for each Run we would individually query the number of reports for each detection status. AKA:

  • getDetectionStatusForRun(run_id, NEW)
  • gDSFR(id, UNRESOLVED)
  • ...

For each run.

Whereas this data (especially whether or not the database connection to the product could be made) is readily available at product listing, and is an integral part of getting user-facing information about the product.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gyorb You yourself had to make #844 to speed up the queries. Your comment here states the exact opposite of what you seemed to have to do in #844 to make the system faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I understand this, but I still think clean api is better which does/checks only one thing.
I did not sad we have to change it I've just sad I prefer not to mix multiple types of checks
"product endpoint exists" and "the user has access rights to it" into one api call.
On my query speed up I did not change the behavior of the api call or merged together with another just to perform better. I've improved an already existing api call which does only one thing.

@@ -4,6 +4,8 @@
// License. See LICENSE.TXT for details.
// -------------------------------------------------------------------------

const string API_VERSION = '6.0'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any other data what we share between the new product api, the report server api and the auth api?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. I think this change here was a result of a discussion with the team in person to not have separate API versions for every endpoint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've just asked it because there are multiple structs in this file which should be moved to the report server thrift file, if they are used only there.


optional arguments:
-h, --help show this help message and exit
-n DISPLAY_NAME, --name DISPLAY_NAME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be misleading if the display name is changed to a different value. How can I get the product name for a given display name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The display name is what appears in the title bar when you are viewing the product's runs. It can contain spaces and special characters.

CodeChecker cmd products list shows the product's endpoint (and so does your browser's address bar when viewing the runs). The endpoint will have heavy restrictions imposed on its format, such as only alphanumeric characters, not many special chars, certain keywords will be prohibited, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Access in the command-line is done through the endpoint (although cmd products list shows the displayed name). The display name is purely cosmetic, but the Web GUI filter filters based on display name. Think of it as run ID vs run name. Run name is purely cosmetic, as API access and database integrity is done through the ID.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that this can be a bit confusing that if I wan to list the product I get the displayed name but if I want to interact with a product I have to use the product name in the command line.
What do you think @dkrupp, @Xazax-hun ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Product endpoint, not name. Those are two different things, as explained earlier.

Values of these arguments are ignored, unless '--postgresql' is specified!
The database specified here must exist, and be connectible by the server.

--dbaddress DBADDRESS, --db-host DBADDRESS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this go directly to the database? The centrally started server could not do this, through the api?

So in this case why do we need the configuration database connection arguments here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the question. This utility is cmd products add, which essentially calls the addProduct API function.

The run database could be anywhere, just like how the configuration database could be hosted on an arbitrary machine. The server configuration is modular in the sense that you can connect any run database (to which you have valid credentials for, of course).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I understand my own confusion, and hopefully yours, too. These are NOT the configuration database options here. These are the database options for the run database.

CodeChecker cmd products add \
 # This says which server to connect to.
 # The server is already running and has a configuration database mounted through
 # its cmdline arguments.
  --host codechecker.mydomain.com --port 9999 \  
  MyProduct --name "A fancy product"          \
  --postgresql --dbaddress fancyproduct.mycompany.net \
 # These are the arguments where the RUN database is located.
  --dbport 5432 --dbuser whisperity  

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for clearing it up!

@@ -64,9 +62,6 @@ def handle_auth(host, port, username, login=False):
if auth_token and handshake.sessionStillActive:
LOG.info("You are already logged in.")
return
else:
LOG.info("Server requires authentication to access. Please use "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we provide the information to the user somewhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. When the user tries to log in, they get this on the cmd:

$ CodeChecker cmd runs
[ERROR] [18:59] - Access denied. This server requires authentication.
[ERROR] [18:59] - Please log in onto the server using 'CodeChecker cmd login'.

This message was removed so that typing in CodeChecker cmd login does not prompt like this:

$ CodeChecker cmd login
[19:00] - Server requires authentication to access. Please use 'CodeChecker cmd login' to authenticate.
[19:00] - Logging in using credentials from command line...
Please provide password for user 'whisperity'

(The first line is removed. It is superfluous to tell the user to use [command] to authenticate right when they are actually using [command].)

@whisperity whisperity force-pushed the product_management branch 2 times, most recently from a2d7550 to cf8ec7d Compare August 25, 2017 15:48
alembic.ini Outdated
# are written from script.py.mako
# output_encoding = utf-8

sqlalchemy.url = postgres://codechecker@localhost:5432/codechecker
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use codechecker_cfg database name so it will not be the same as the current database name used to store the reports?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these values used anywhere? When the database is started in PostgreSQL mode, the user has to explicitly specify the config database's name.

3: i32 port,
4: string username_b64,
5: optional string password_b64, // Database password is NOT sent server->client!
6: string database
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any new comments on this?

1: string engine, // The database engine, such as "sqlite" or "postgresql".
2: string host,
3: i32 port,
4: string username_b64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the just put a comment that this value should be b46 encoded?

typedef list<ProductConfiguration> ProductConfigurations

/* Product carries data to the end user's product list and tasks. */
struct Product {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer api-s which check only one thing and not multiple things. Is it really that much of an overhead if multiple api calls are made (2-3 extra?)


optional arguments:
-h, --help show this help message and exit
-n DISPLAY_NAME, --name DISPLAY_NAME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that this can be a bit confusing that if I wan to list the product I get the displayed name but if I want to interact with a product I have to use the product name in the command line.
What do you think @dkrupp, @Xazax-hun ?

if success:
LOG.info("Product added successfully.")
else:
LOG.error("Adding the product has failed.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys.exit(1) if adding fails?

if success:
LOG.info("Product removed.")
else:
LOG.error("An error occurred in product removal.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys.exit(1) here too if we fail to remove the product?

try:
host, port, product_name = split_product_url(product_url)
except:
LOG.error("Couldn't understand the product URL!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "Wrong product URL format was provided" would be a better error message.

help="Name of the database to use.")

def __handle(args):
"""Custom handler for 'add' so custom error messages can be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is worth to create a separate handle just because we do not want to capture the exceptions in main?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This same code resides in server.py too. This is because main() everywhere deals with a proper argparse.Namespace. This __handle is basically an extension over argparse's argument parsing routine, as there are some special rules that can NOT be expressed with argparse's existing tooling.

@@ -8,6 +8,8 @@
functionality of 'store' and 'parse'.
"""

# TODO: This command should be removed as a whole.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An issue would be better than a TODO comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#745's first action point is this removal.

@whisperity whisperity force-pushed the product_management branch 2 times, most recently from 023e259 to 2559306 Compare August 30, 2017 11:55
@@ -26,6 +26,7 @@
from sqlalchemy import func

import shared
from shared import constants as shared_constants
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shared.constants could be used if only shared is imported its length is the same as shared_constants

 - Introduce new configuration database schema, and database connection
   handling to this new database
 - Added basic API for product management
 - PostgreSQL server is not started automatically anymore
 - Products have a distinct endpoint
   (localhost:8001/ProductName/CodeCheckerService) in the request, and the
   requests are routed to the individual products based on this.
 - If server is started with SQLite configuration database and this is a
   brand new start, automatically configure a 'Default' product in the
   same folder.
 - Store a connection failure state for products that could not be
   connected to at server start.

As no other packages manage and use database connections anymore, put
the ORM and the run database handling into libcodechecker.server.

This commit breaks the tests. It will be fixed later on.
[ci skip]
…multiple database engines at the same time

[ci skip]
 - The CodeChecker Viewer now by default opens a product list. Users can
   select the product they want to view and will be routed to the run list.
 - If only ONE products exist on the server, simply opening
   http://host:port will automatically redirect the user to the run list
   of this single product.
    - This will only happen if this only product is properly connected.
 - Added a button to run lists to explicitly return to the product list
   page
 - The current product's name is now shown in the run list's title bar

[ci skip]
 - 'store' and 'cmd' now take --url PRODUCT_URL where applicable,
   instead of --host and --port. This URL defaults to the default
   product created by a brand new 'server' start, so can be omitted by
   end-users if they only work a single product.
 - Added 'cmd products' to handle basic product-related configuration,
   such as listing, adding and removing products
 - Added the appropriate server-side code to on-the-fly add and
   disconnect product databases

Tests will be updated in a later commit.
[ci skip]
 - Tests now start a single server at the beginning of testing
 - PostgreSQL must be started prior the tests being ran
 - Each test suite registers a new product with a unique endpoint, and
   uses that particular run database to execute the tests that were
   already implemented.
 - These databases are dropped when the test is over
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API change 📄 Content of patch changes API! CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands database 🗄️ Issues related to the database schema. documentation 📖 Changes to documentation. enhancement 🌟 GUI 🎨 server 🖥️ test ☑️ Adding or refactoring tests WARN ⚠️: Backward compatibility breaker! MIND THE GAP! Merging this patch will mess up compatibility with the previous releases!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants