-
Notifications
You must be signed in to change notification settings - Fork 28
Metadata server #104
Metadata server #104
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good code.
General comment: Let's replace "meta" with "mapping". "meta" is overloaded so many times in the genomics area that it causes a lot of confusion.
CMakeLists.txt
Outdated
| find_library(LIBRT_LIBRARY rt) | ||
|
|
||
| #librt | ||
| find_library(LIBRT_LIBRARY rt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate rt?
CMakeLists.txt
Outdated
|
|
||
| #pgsql driver and dbi libs | ||
| find_package(libdbi REQUIRED) | ||
| include_directories(${LIBDBI_INCLUDE_DIR}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this an optional dependency - if LIBDBI_FOUND, then a preprocessor macro -DLIBDBI_FOUND should be passed (see add_definitions() for libcsv in the CMakeLists.txt).
In the source code,
#ifdef LIBDBI
//All libdbi related code
#endif
CMakeLists.txt
Outdated
| if(LIBCSV_FOUND) | ||
| target_link_libraries(${target} ${LIBCSV_LIBRARY}) | ||
| endif() | ||
| target_link_libraries(${target} ${LIBDBI_DEV_LIBRARY} ${LIBDBI1_LIBRARY} ${LIBPGSQL_DRIVER_LIBRARY}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional dependency
cmake/Modules/Findlibdbi.cmake
Outdated
| find_path(LIBDBI_INCLUDE_DIR NAMES dbi.h HINTS "/usr/include/dbi") | ||
| find_library(LIBDBI_DEV_LIBRARY NAMES dbi HINTS "/usr/lib/x86_64-linux-gnu") | ||
| find_library(LIBDBI1_LIBRARY NAMES dbi HINTS "/usr/lib/x86_64-linux-gnu") | ||
| find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "/usr/lib/x86_64-linux-gnu/dbd") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions must have a variable LIBDBI_ROOT (similar to LIBCSV_DIR) which the user can set to point to the libdbi libraries and headers. This allows users to install the headers/libraries in non-standard locations and still compile GenomicsDB
cmake/Modules/Findlibdbi.cmake
Outdated
| find_library(LIBDBI1_LIBRARY NAMES dbi HINTS "/usr/lib/x86_64-linux-gnu") | ||
| find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "/usr/lib/x86_64-linux-gnu/dbd") | ||
| include(FindPackageHandleStandardArgs) | ||
| find_package_handle_standard_args(libcsv "Could not find libcsv headers and/or libraries ${DEFAULT_MSG}" LIBDBI_INCLUDE_DIR LIBDBI_DEV_LIBRARY LIBDBI1_LIBRARY LIBPGSQL_DRIVER_LIBRARY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change libcsv to libdbi
|
|
||
| ~SQLBasedVidMapper(); | ||
| protected: | ||
| int num_contigs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to store this in the object - use local variable in the function. Once data is loaded into the VidMapper objects m_contig_idx_to_info.size() will give you the number of contigs anyway.
| protected: | ||
| int num_contigs; | ||
| dbi_conn conn; | ||
| dbi_inst instance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, we distinguish members of a class from local variables with a naming convention.
- Google coding style recommends adding "_" at the end of variable name.
- In GenomicsDB, I have been using the Hungarian notation - prefix every member variable name with "m_"
- Snake case for variable names.
| dbi_conn_set_option(conn, "username", "postgres"); | ||
| dbi_conn_set_option(conn, "password", "postgres"); | ||
| dbi_conn_set_option(conn, "dbname", "gendb"); | ||
| dbi_conn_set_option(conn, "encoding", "UTF-8"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be arguments to the constructor. Other arguments include:
- Workspace (string)
- Array (string)
| ##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads"> | ||
| ##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities"> | ||
| ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias"> | ||
| ##reference=file:///seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this file deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange. I find it in my local repository. Let me check.
|
Thank you Karthik! @kgururaj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a good time to start adding CI tests in Travis. Let's start by installing the pre-requisites (libdbi-dev and libdbd-pgsql modules) and compile on Travis
example/CMakeLists.txt
Outdated
| build_GenomicsDB_executable(example_libtiledb_variant_driver) | ||
| build_GenomicsDB_executable(test_genomicsdb_bcf_generator) | ||
| build_GenomicsDB_executable(test_genomicsdb_importer) | ||
| build_GenomicsDB_executable(test_mapping_data_loader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(LIBDBI_FOUND)
build_GenomicsDB_executable(test_mapping_data_loader)
endif()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| cpp/src/utils/lut.cc | ||
| cpp/src/utils/known_field_info.cc | ||
| cpp/src/utils/vid_mapper.cc | ||
| cpp/src/utils/vid_mapper_sql.cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(LIBDBI_FOUND)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary. There is a macro enveloping the code which prevents code from compiling
if libdbi is not installed. Should we remove that check and place this check instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, that's fine
| dbi_conn_set_option(m_conn, "encoding", "UTF-8"); | ||
|
|
||
| if (dbi_conn_connect(m_conn) < 0) { | ||
| VERIFY_OR_THROW(1 < 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw exception with a descriptive error message "Could not connect to DB : with "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| contig_name = dbi_result_get_string(result, "name"); | ||
|
|
||
| /** | ||
| * There is probably a constraint which prevents the following |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
| class SQLBasedVidMapper : public VidMapper { | ||
| public: | ||
| SQLBasedVidMapper(const SQLVidMapperRequest&); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably delete the copy constructor since there is no way to do deep copy of the libdbi structures.
If you wish to create a move constructor, you can probably copy the libdbi fields to the current object and zero out the other structure.
To delete the copy and move constructors, do:
SQLBasedVidMapper(const SQLBasedVidMapper& other) = delete;
SQLBasedVidMapper(SQLBasedVidMapper&& other) = delete;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not a copy constructor. We are passing a request object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that.
The compiler will always initialize a default copy constructor if you do not explicitly define one or delete the copy constructor. The default copy constructor will likely be incorrect since we have no idea whether the dbi* structures will be copied 'deeply' by an assignment operator. Hence, I'm suggesting to delete the copy and move constructors
.travis.yml
Outdated
| - sudo apt-get update -q | ||
| - sudo apt-get install g++-4.9 -y | ||
| - sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 60 | ||
| - sudo apt-get install libdbi-dev libdbd-pgsql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apt-get -y install
|
|
||
| #pgsql driver and dbi libs | ||
| find_package(libdbi) | ||
| if(LIBDBI_FOUND) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the libdbi dependence be optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is optional. If libdbi is not installed in an environment, the dependent files won't be compiled.
SQLVidMapper in this instance.
| MappingDataLoaderTester tester(request); | ||
|
|
||
| tester.print_contig_info(); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a test code with any assertion is meaningless. Instead of printing these values, why not check whether they match with known values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
|
|
||
| SQLBasedVidMapper::SQLBasedVidMapper(const SQLVidMapperRequest& request) : VidMapper() { | ||
| dbi_initialize_r(NULL, &m_instance); | ||
| m_conn = dbi_conn_new_r("pgsql", m_instance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't hardcode the driver name! This goes to the header file as a constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| continue; | ||
| } | ||
|
|
||
| int64_t tiledb_column_offset = dbi_result_get_longlong(result, "tiledb_column_offset"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgururaj, these field names are hard-coded in many places. better to use macros for all occurrences
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| #vcf_metadata_and_tile | ||
| vcf_only_tile | ||
| echo "--------------------------------------" | ||
| # ----------------------------------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good one - let's add callset mapping query
| int load_mapping_data_from_db(); | ||
|
|
||
| std::vector<ContigInfo> get_contigs() { return(m_contig_idx_to_info); } | ||
| std::vector<ContigInfo>& get_contigs() { return(m_contig_idx_to_info); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move this to VidMapper?
| std::vector<std::pair<int64_t, int>>& get_contig_end_offsets() { | ||
| return(m_contig_end_2_idx); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
| ss <<"and (b.name = '" <<m_array_name <<"') and "; | ||
| ss <<"(a.reference_set_id = b.reference_set_id))"; | ||
| ss <<"(a.reference_set_id = b.reference_set_id)) "; | ||
| ss <<"order by a.tiledb_column_offset"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
let the DB do the work
|
|
||
| SQLBasedVidMapper&& operator=(SQLBasedVidMapper&&) = delete; | ||
|
|
||
| int load_mapping_data_from_db(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rsm0001 add documentation to the class functions/members in the header file.
| ss <<"order by a.tiledb_column_offset"; | ||
|
|
||
| std::string query = ss.str(); | ||
| std::cout <<"QUERY: <" <<query <<">\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete hanging print statements like this one.
| @@ -0,0 +1,170 @@ | |||
| #! /bin/bash | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove author name. Add Intel MIT license
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tests/load_genomics_metadata.sh
Outdated
|
|
||
| # ----------------------------------------------------------------------------- | ||
|
|
||
| create_load_tile_cfg() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_load_tiledb_cfg
tests/load_genomics_metadata.sh
Outdated
| # ----------------------------------------------------------------------------- | ||
|
|
||
| create_load_tile_cfg() { | ||
| outfile="${INITDIR}/load_to_tile.cfg" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename load_to_tiledb.cfg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next review phase completed
cmake/Modules/Findlibdbi.cmake
Outdated
|
|
||
| find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "${LIBDBI_DIR}/lib/${CMAKE_C_LIBRARY_ARCHITECTURE}/dbd") | ||
|
|
||
| find_library(LIBDBI_DEV_LIBRARY NAMES dbi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the same HINTS be passed for libdbi also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEV_LIB is being found without any issues. It was necessary for PGSQL driver because the
library is in a subdirectory (dbd).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's being found because you are installing the libraries in a standard system directory (/usr ?) - if the library is installed in a non-standard location, it wouldn't be detected. Adding HINTS would fix that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Karthik - these are libraries installed thro' package manager. Unless we build libraries from source, we shouldn't have to worry. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
|
||
| while (dbi_result_next_row(result)) { | ||
| int64_t field_id = dbi_result_get_longlong(result, DBTABLE_FIELD_COLUMN_ID.c_str()); | ||
| int64_t field_idx = (field_id - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
field_id returned from the DB could be much larger than the num_fields for this specific array when the mapping DB has information about multiple arrays. Computing field_idx from the field_id will cause writes to un-allocated memory in the lines below. Use an incrementing counter for field_idx.
| auto known_field_enum = 0u; | ||
| auto is_known_field = KnownFieldInfo::get_known_field_enum_for_name(field_name, known_field_enum); | ||
|
|
||
| if ((0 == length_type.compare("ERROR")) || length_type.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the std::string, you can use length_type == "ERROR"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Let us leave it this way.
| get_input("pass_word", "postgres", request.pass_word); | ||
| get_input("db_name", "gendb", request.db_name); | ||
|
|
||
| get_input("work_space", "/home/rmantrix/git_repos/NomixDB/INST_GenomicsDB/tests/workspace", request.work_space); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard-coded path :(
Even if this is default - no hard-coding for workspace, array, please. localhost for mapping DB is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
CMakeLists.txt
Outdated
| build_GenomicsDB_executable_common(${target}) | ||
| endif() | ||
| endfunction() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the tests being called in the CI somewhere? if you use add_test() in CMake, they will be called when "make test" is invoked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. However, CLASSPATH and LD_LIBRARY_PATH are expected in the environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you modify these environment variables in the .travis.yml file as needed? See the section:
env:
global:
You have to install gtest on the Travis VM - add commands to the .travis.yml. The tests didn't build because the library wasn't found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you should add install(TARGETS ${target} RUNTIME DESTINATION bin) for the gtest executable inside the if block (after line 216)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added gtest installation in .travis.yml
LD_LIBRARY_PATH and CLASSPATH already exist
install TARGET is on line 203
| } | ||
| } | ||
|
|
||
| std::string combine_op = dbi_result_get_string(result, DBTABLE_FIELD_COLUMN_COMBOP.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the result be null? Based on your PR Intel-HLS/GenomicsSampleAPIs#33, it looks like it can be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final review - if the CI tests pass, can be merged
CMakeLists.txt
Outdated
| build_GenomicsDB_executable_common(${target}) | ||
| endif() | ||
| endfunction() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you modify these environment variables in the .travis.yml file as needed? See the section:
env:
global:
You have to install gtest on the Travis VM - add commands to the .travis.yml. The tests didn't build because the library wasn't found.
cmake/Modules/Findlibdbi.cmake
Outdated
|
|
||
| find_library(LIBPGSQL_DRIVER_LIBRARY NAMES dbdpgsql HINTS "${LIBDBI_DIR}/lib/${CMAKE_C_LIBRARY_ARCHITECTURE}/dbd") | ||
|
|
||
| find_library(LIBDBI_DEV_LIBRARY NAMES dbi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's being found because you are installing the libraries in a standard system directory (/usr ?) - if the library is installed in a non-standard location, it wouldn't be detected. Adding HINTS would fix that
|
Karthik - these are libraries installed thro' package manager. Unless we build libraries from source, we shouldn't have to worry. What do you think? |
|
Merge with latest master commit and I will merge the PR into the master |
…csDB into metadata_server
Karthik,
pl. review without merging. Pl. notify Kushal also, if an extra review helps. Any
suggestions to conform to current project coding standards would be appreciated.
Thanks.
Ramesh