correctly catch errors thrown from Java in the Coma algorithm, raised in #58 (#69)
kPsarakis authored Nov 9, 2023
1 parent 3f8a9e5 commit 057aec8
Showing 2 changed files with 20 additions and 12 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -34,10 +34,11 @@ Valentine can be used to find matches among columns of a given pair of pandas Da
### Matching methods
In order to do so, the user can choose one of the following 5 matching methods:

- 1. `Coma(int: max_n str: strategy)` is a python wrapper around [COMA 3.0 Comunity edition](https://sourceforge.net/projects/coma-ce/)
+ 1. `Coma(int: max_n, bool: use_instances, str: java_xmx)` is a python wrapper around [COMA 3.0 Comunity edition](https://sourceforge.net/projects/coma-ce/)
  * **Parameters**:
- * **max_n**(*int*) - Accept similarity threshold, default is 0.
- * **strategy**(*str*) - Choice of "COMA\_OPT" (schema based matching - default) or "COMA\_OPT\_INST" (schema and instance based matching)
+ * **max_n**(*int*) - Accept similarity threshold, (default: 0).
+ * **use_instances**(*bool*) - Wheather Coma will make use of the data instances or just the schema information, (default: False).
+ * **java_xmx**(*str*) - The amount of RAM that Coma is allowed to use, (default: "1024m") .

2. `Cupid(float: w_struct, float: leaf_w_struct, float: th_accept)` is the python implementation of the paper [Generic Schema Matching with Cupid](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.79.4079&rep=rep1&type=pdf)
* **Parameters**:
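The three `Coma` parameters documented in this README hunk end up as arguments on the `java` command line built in `coma.py`. The following is a minimal sketch of that mapping, not the repository's code: `build_coma_command` is a hypothetical helper, and the rule that `use_instances` selects between the `COMA_OPT` and `COMA_OPT_INST` strategy names is an assumption based on the strategy choices listed in the removed README line.

```python
from typing import List


def build_coma_command(jar_path: str,
                       source_csv: str,
                       target_csv: str,
                       output_path: str,
                       max_n: int = 0,
                       use_instances: bool = False,
                       java_xmx: str = "1024m") -> List[str]:
    """Hypothetical helper: assemble the argv list used to launch the COMA jar."""
    # Assumption: use_instances toggles between the two strategy names
    # that the previous README version exposed directly.
    strategy = "COMA_OPT_INST" if use_instances else "COMA_OPT"
    return [
        "java", f"-Xmx{java_xmx}",          # JVM heap limit, e.g. -Xmx1024m
        "-cp", jar_path,
        "-DinputFile1=" + source_csv,
        "-DinputFile2=" + target_csv,
        "-DoutputFile=" + output_path,
        "-DmaxN=" + str(max_n),
        "-Dstrategy=" + strategy,
        "Main",
    ]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(build_coma_command("coma.jar", "source.csv", "target.csv", "out.txt",
                             use_instances=True, java_xmx="2048m"))
```

Passing the command as a list rather than a single string keeps each `-D` property a separate argument, so paths containing spaces need no shell quoting.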
25 changes: 16 additions & 9 deletions valentine/algorithms/coma/coma.py
@@ -10,6 +10,10 @@
  from ...utils.utils import get_project_root


+ class JavaException(Exception):
+     pass
+
+
  class Coma(BaseMatcher):

      def __init__(self,
@@ -45,15 +49,18 @@ def __run_coma_jar(self,
          source_data = os.path.join(tmp_folder_path, source_table_f_name)
          target_data = os.path.join(tmp_folder_path, target_table_f_name)
          coma_output_path = os.path.join(tmp_folder_path, coma_output_path)
-         with open(os.path.join(tmp_folder_path, "NUL"), "w") as fh:
-             subprocess.call(['java', f'-Xmx{self.__java_XmX}',
-                              '-cp', jar_path,
-                              '-DinputFile1=' + source_data,
-                              '-DinputFile2=' + target_data,
-                              '-DoutputFile=' + coma_output_path,
-                              '-DmaxN=' + str(self.__max_n),
-                              '-Dstrategy=' + self.__strategy,
-                              'Main'], stdout=fh, stderr=fh)
+         try:
+             subprocess.check_output(['java', f'-Xmx{self.__java_XmX}',
+                                      '-cp', jar_path,
+                                      '-DinputFile1=' + source_data,
+                                      '-DinputFile2=' + target_data,
+                                      '-DoutputFile=' + coma_output_path,
+                                      '-DmaxN=' + str(self.__max_n),
+                                      '-Dstrategy=' + self.__strategy,
+                                      'Main'], stderr=subprocess.DEVNULL)
+         except subprocess.CalledProcessError:
+             raise JavaException("Either Java (JRE) is not installed or Java does not have enough memory to operate. "
+                                 "Try raising the java_xmx parameter of the Coma class")

      def __write_schema_csv_files(self,
                                   table1: BaseTable,

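The switch from `subprocess.call` to `subprocess.check_output` matters because `call` only returns the exit status, while `check_output` raises `subprocess.CalledProcessError` when the child process exits with a non-zero status. One thing to keep in mind: if the executable cannot be found at all, Python raises `FileNotFoundError` before any process starts, so that case may not reach an `except subprocess.CalledProcessError` handler. The sketch below illustrates both failure modes with the standard library only. `ToolError` and `run_tool` are hypothetical names standing in for `JavaException` and `__run_coma_jar`, and a child Python process stands in for the JVM.

```python
import subprocess
import sys
from typing import List


class ToolError(Exception):
    """Hypothetical stand-in for the JavaException added in this commit."""


def run_tool(cmd: List[str]) -> bytes:
    """Run cmd, return its stdout, and map both failure modes to ToolError."""
    try:
        return subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
    except FileNotFoundError as exc:
        # The executable itself was not found (for example, no JRE on PATH).
        raise ToolError(f"executable not found: {cmd[0]}") from exc
    except subprocess.CalledProcessError as exc:
        # The process started but exited with a non-zero status.
        raise ToolError(f"process exited with status {exc.returncode}") from exc


if __name__ == "__main__":
    # Successful run: the child prints to stdout and exits with status 0.
    print(run_tool([sys.executable, "-c", "print('ok')"]))

    failing_commands = [
        [sys.executable, "-c", "import sys; sys.exit(3)"],  # non-zero exit
        ["definitely-not-a-real-binary-xyz"],               # missing executable
    ]
    for cmd in failing_commands:
        try:
            run_tool(cmd)
        except ToolError as err:
            print("ToolError:", err)
```

Chaining with `from exc` keeps the original exception attached to the new one, which makes it easier to tell a missing runtime apart from a crash inside the tool when reading a traceback.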