Skip to content
This repository was archived by the owner on Jan 3, 2023. It is now read-only.

Conversation

@rsm0001
Copy link
Contributor

@rsm0001 rsm0001 commented Apr 20, 2017

Karthik,
pl. review without merging. Pl. notify Kushal also, if an extra review helps. Any
suggestions to conform to current project coding standards would be appreciated.
Thanks.
Ramesh

@rsm0001 rsm0001 requested a review from kgururaj April 20, 2017 19:36
Copy link
Contributor

@kgururaj kgururaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good code.
General comment: Let's replace "meta" with "mapping". "meta" is overloaded so many times in the genomics area that it causes a lot of confusion.

CMakeLists.txt Outdated
find_library(LIBRT_LIBRARY rt)

#librt
find_library(LIBRT_LIBRARY rt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate rt?

CMakeLists.txt Outdated

#pgsql driver and dbi libs
find_package(libdbi REQUIRED)
include_directories(${LIBDBI_INCLUDE_DIR})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this an optional dependency - if LIBDBI_FOUND, then a preprocessor macro -DLIBDBI_FOUND should be passed (see add_definitions() for libcsv in the CMakeLists.txt).

In the source code,

#ifdef LIBDBI
  //All libdbi related code
#endif

CMakeLists.txt Outdated
if(LIBCSV_FOUND)
target_link_libraries(${target} ${LIBCSV_LIBRARY})
endif()
target_link_libraries(${target} ${LIBDBI_DEV_LIBRARY} ${LIBDBI1_LIBRARY} ${LIBPGSQL_DRIVER_LIBRARY})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional dependency

find_path(LIBDBI_INCLUDE_DIR NAMES dbi.h HINTS "/usr/include/dbi")
find_library(LIBDBI_DEV_LIBRARY NAMES dbi HINTS "/usr/lib/x86_64-linux-gnu")
find_library(LIBDBI1_LIBRARY NAMES dbi HINTS "/usr/lib/x86_64-linux-gnu")
find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "/usr/lib/x86_64-linux-gnu/dbd")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions must have a variable LIBDBI_ROOT (similar to LIBCSV_DIR) which the user can set to point to the libdbi libraries and headers. This allows users to install the headers/libraries in non-standard locations and still compile GenomicsDB

find_library(LIBDBI1_LIBRARY NAMES dbi HINTS "/usr/lib/x86_64-linux-gnu")
find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "/usr/lib/x86_64-linux-gnu/dbd")
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(libcsv "Could not find libcsv headers and/or libraries ${DEFAULT_MSG}" LIBDBI_INCLUDE_DIR LIBDBI_DEV_LIBRARY LIBDBI1_LIBRARY LIBPGSQL_DRIVER_LIBRARY)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change libcsv to libdbi


~SQLBasedVidMapper();
protected:
int num_contigs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to store this in the object - use local variable in the function. Once data is loaded into the VidMapper objects m_contig_idx_to_info.size() will give you the number of contigs anyway.

protected:
int num_contigs;
dbi_conn conn;
dbi_inst instance;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, we distinguish members of a class from local variables with a naming convention.

  • Google coding style recommends adding "_" at the end of variable name.
  • In GenomicsDB, I have been using the Hungarian notation - prefix every member variable name with "m_"
  • Snake case for variable names.

dbi_conn_set_option(conn, "username", "postgres");
dbi_conn_set_option(conn, "password", "postgres");
dbi_conn_set_option(conn, "dbname", "gendb");
dbi_conn_set_option(conn, "encoding", "UTF-8");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be arguments to the constructor. Other arguments include:

  • Workspace (string)
  • Array (string)

##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##reference=file:///seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this file deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange. I find it in my local repository. Let me check.

@rsm0001
Copy link
Contributor Author

rsm0001 commented Apr 21, 2017

Thank you Karthik! @kgururaj
Will make necessary fixes.

Copy link
Contributor

@kgururaj kgururaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a good time to start adding CI tests in Travis. Let's start by installing the pre-requisites (libdbi-dev and libdbd-pgsql modules) and compile on Travis

build_GenomicsDB_executable(example_libtiledb_variant_driver)
build_GenomicsDB_executable(test_genomicsdb_bcf_generator)
build_GenomicsDB_executable(test_genomicsdb_importer)
build_GenomicsDB_executable(test_mapping_data_loader)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if(LIBDBI_FOUND)
build_GenomicsDB_executable(test_mapping_data_loader)
endif()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

cpp/src/utils/lut.cc
cpp/src/utils/known_field_info.cc
cpp/src/utils/vid_mapper.cc
cpp/src/utils/vid_mapper_sql.cc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if(LIBDBI_FOUND)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary. There is a macro enveloping the code which prevents code from compiling
if libdbi is not installed. Should we remove that check and place this check instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, that's fine

dbi_conn_set_option(m_conn, "encoding", "UTF-8");

if (dbi_conn_connect(m_conn) < 0) {
VERIFY_OR_THROW(1 < 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw exception with a descriptive error message "Could not connect to DB : with "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

contig_name = dbi_result_get_string(result, "name");

/**
* There is probably a constraint which prevents the following
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

class SQLBasedVidMapper : public VidMapper {
public:
SQLBasedVidMapper(const SQLVidMapperRequest&);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably delete the copy constructor since there is no way to do deep copy of the libdbi structures.
If you wish to create a move constructor, you can probably copy the libdbi fields to the current object and zero out the other structure.
To delete the copy and move constructors, do:
SQLBasedVidMapper(const SQLBasedVidMapper& other) = delete;
SQLBasedVidMapper(SQLBasedVidMapper&& other) = delete;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not a copy constructor. We are passing a request object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that.

The compiler will always initialize a default copy constructor if you do not explicitly define one or delete the copy constructor. The default copy constructor will likely be incorrect since we have no idea whether the dbi* structures will be copied 'deeply' by an assignment operator. Hence, I'm suggesting to delete the copy and move constructors

.travis.yml Outdated
- sudo apt-get update -q
- sudo apt-get install g++-4.9 -y
- sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 60
- sudo apt-get install libdbi-dev libdbd-pgsql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apt-get -y install


#pgsql driver and dbi libs
find_package(libdbi)
if(LIBDBI_FOUND)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the libdbi dependence be optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is optional. If libdbi is not installed in an environment, the dependent files won't be compiled.
SQLVidMapper in this instance.

MappingDataLoaderTester tester(request);

tester.print_contig_info();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a test code with any assertion is meaningless. Instead of printing these values, why not check whether they match with known values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.


SQLBasedVidMapper::SQLBasedVidMapper(const SQLVidMapperRequest& request) : VidMapper() {
dbi_initialize_r(NULL, &m_instance);
m_conn = dbi_conn_new_r("pgsql", m_instance);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't hardcode the driver name! This goes to the header file as a constant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

continue;
}

int64_t tiledb_column_offset = dbi_result_get_longlong(result, "tiledb_column_offset");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgururaj, these field names are hard-coded in many places. better to use macros for all occurrences

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

#vcf_metadata_and_tile
vcf_only_tile
echo "--------------------------------------"
# -----------------------------------------------------------------------------
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

@rsm0001
Copy link
Contributor Author

rsm0001 commented May 4, 2017

@kdatta @kgururaj Pl. review.
Yet to code populating callsets.

Copy link
Contributor

@kgururaj kgururaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one - let's add callset mapping query

int load_mapping_data_from_db();

std::vector<ContigInfo> get_contigs() { return(m_contig_idx_to_info); }
std::vector<ContigInfo>& get_contigs() { return(m_contig_idx_to_info); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move this to VidMapper?

std::vector<std::pair<int64_t, int>>& get_contig_end_offsets() {
return(m_contig_end_2_idx);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

ss <<"and (b.name = '" <<m_array_name <<"') and ";
ss <<"(a.reference_set_id = b.reference_set_id))";
ss <<"(a.reference_set_id = b.reference_set_id)) ";
ss <<"order by a.tiledb_column_offset";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
let the DB do the work


SQLBasedVidMapper&& operator=(SQLBasedVidMapper&&) = delete;

int load_mapping_data_from_db();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rsm0001 add documentation to the class functions/members in the header file.

ss <<"order by a.tiledb_column_offset";

std::string query = ss.str();
std::cout <<"QUERY: <" <<query <<">\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete hanging print statements like this one.

@@ -0,0 +1,170 @@
#! /bin/bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove author name. Add Intel MIT license

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


# -----------------------------------------------------------------------------

create_load_tile_cfg() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_load_tiledb_cfg

# -----------------------------------------------------------------------------

create_load_tile_cfg() {
outfile="${INITDIR}/load_to_tile.cfg"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename load_to_tiledb.cfg

Copy link
Contributor

@kgururaj kgururaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next review phase completed


find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "${LIBDBI_DIR}/lib/${CMAKE_C_LIBRARY_ARCHITECTURE}/dbd")

find_library(LIBDBI_DEV_LIBRARY NAMES dbi)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the same HINTS be passed for libdbi also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEV_LIB is being found without any issues. It was necessary for PGSQL driver because the
library is in a subdirectory (dbd).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's being found because you are installing the libraries in a standard system directory (/usr ?) - if the library is installed in a non-standard location, it wouldn't be detected. Adding HINTS would fix that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Karthik - these are libraries installed thro' package manager. Unless we build libraries from source, we shouldn't have to worry. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


while (dbi_result_next_row(result)) {
int64_t field_id = dbi_result_get_longlong(result, DBTABLE_FIELD_COLUMN_ID.c_str());
int64_t field_idx = (field_id - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field_id returned from the DB could be much larger than the num_fields for this specific array when the mapping DB has information about multiple arrays. Computing field_idx from the field_id will cause writes to un-allocated memory in the lines below. Use an incrementing counter for field_idx.

auto known_field_enum = 0u;
auto is_known_field = KnownFieldInfo::get_known_field_enum_for_name(field_name, known_field_enum);

if ((0 == length_type.compare("ERROR")) || length_type.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the std::string, you can use length_type == "ERROR"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Let us leave it this way.

get_input("pass_word", "postgres", request.pass_word);
get_input("db_name", "gendb", request.db_name);

get_input("work_space", "/home/rmantrix/git_repos/NomixDB/INST_GenomicsDB/tests/workspace", request.work_space);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coded path :(
Even if this is default - no hard-coding for workspace, array, please. localhost for mapping DB is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

CMakeLists.txt Outdated
build_GenomicsDB_executable_common(${target})
endif()
endfunction()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the tests being called in the CI somewhere? if you use add_test() in CMake, they will be called when "make test" is invoked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. However, CLASSPATH and LD_LIBRARY_PATH are expected in the environment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you modify these environment variables in the .travis.yml file as needed? See the section:

env:
    global:

You have to install gtest on the Travis VM - add commands to the .travis.yml. The tests didn't build because the library wasn't found.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you should add install(TARGETS ${target} RUNTIME DESTINATION bin) for the gtest executable inside the if block (after line 216)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added gtest installation in .travis.yml
LD_LIBRARY_PATH and CLASSPATH already exist
install TARGET is on line 203

}
}

std::string combine_op = dbi_result_get_string(result, DBTABLE_FIELD_COLUMN_COMBOP.c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the result be null? Based on your PR Intel-HLS/GenomicsSampleAPIs#33, it looks like it can be

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Fixed.

Copy link
Contributor

@kgururaj kgururaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final review - if the CI tests pass, can be merged

CMakeLists.txt Outdated
build_GenomicsDB_executable_common(${target})
endif()
endfunction()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you modify these environment variables in the .travis.yml file as needed? See the section:

env:
    global:

You have to install gtest on the Travis VM - add commands to the .travis.yml. The tests didn't build because the library wasn't found.


find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "${LIBDBI_DIR}/lib/${CMAKE_C_LIBRARY_ARCHITECTURE}/dbd")

find_library(LIBDBI_DEV_LIBRARY NAMES dbi)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's being found because you are installing the libraries in a standard system directory (/usr ?) - if the library is installed in a non-standard location, it wouldn't be detected. Adding HINTS would fix that

@rsm0001
Copy link
Contributor Author

rsm0001 commented Jun 21, 2017

Karthik - these are libraries installed thro' package manager. Unless we build libraries from source, we shouldn't have to worry. What do you think?

@kgururaj
Copy link
Contributor

Merge with latest master commit and I will merge the PR into the master

@kgururaj kgururaj merged commit 8a2ff80 into master Jun 27, 2017
@kgururaj kgururaj deleted the metadata_server branch June 27, 2017 18:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants