Entity mapping - expose score and candidates

dariosky · dariosky · commit 30a5e8b86492 · 2020-11-04T16:53:28.000+01:00
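
A minimal sketch of what this change exposes. It assumes the two classes behave as shown in the `ravenpackapi/models/mapping.py` hunk of this commit, and copies them so the snippet runs without an API key. The payload is hypothetical: the field names come from the diff and the entity values from the new test, but the score value and the error text are invented for illustration.

```python
# Stand-alone copy of the model classes from this commit (no network access needed).


class RPMappingCandidate(object):
    def __init__(self, data):
        self.id = data['rp_entity_id']
        self.name = data['rp_entity_name']
        self.type = data['rp_entity_type']
        self.score = data['score']


class RPMappingMatch(object):
    def __init__(self, data):
        self.request = data['request_data']
        self.errors = data['errors']
        # Candidates are now parsed even when the mapping reports errors.
        self.candidates = [
            RPMappingCandidate(candidate)
            for candidate in data.get('rp_entities', [])
        ]

        if not self.errors:
            # Confident match: copy the best candidate onto the object.
            best_match = self.candidates[0]
            self.id = best_match.id
            self.name = best_match.name
            self.type = best_match.type
            self.score = best_match.score


# HYPOTHETICAL server payload for a low-confidence mapping.
# The 'errors' content and the 'score' value are assumptions.
low_confidence = {
    'request_data': {'isin': 'US88339J1051',
                     'name': 'TRADE DESK INC/THE -CLASS A'},
    'errors': ['no confident match'],
    'rp_entities': [
        {'rp_entity_id': '0E698B',
         'rp_entity_name': 'The Trade Desk Inc.',
         'rp_entity_type': 'comp',
         'score': 0.5},
    ],
}

close_match = RPMappingMatch(low_confidence)
if close_match.candidates:
    best = close_match.candidates[0]
    print(best.id, best.name, best.type, best.score, close_match.request)
```

With errors present, `id`, `name`, `type` and `score` are not set on the match itself, so callers would read them from `candidates[0]` and decide for themselves whether the score is good enough.
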
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,42 +1,53 @@
 # Changelog
 
+## v1.0.43 (2020-11-04)
+
+* Entity-mapping - expose the matching score and the candidates
+
 ## v1.0.40 (2020-10-07)
+
 * Support for the /jobs endpoint (to list the user's past jobs)
 
 ## v1.0.39 (2020-09-22)
+
 * Persistent sessions between API-calls
 
 ## v1.0.38 (2020-09-04)
+
 * Text-Analytics API additional functions: `get_analytics` and `get_annotated`
 
 ## v1.0.36 (2020-05-14)
+
 * Text-Analytics API endpoint updated: folders & richer metadata
 * Extended error handling to support Feed disconnection problems
 
 ## v1.0.35 (2020-02-22)
+
 Initial support for the Text-Analytics API endpoints
 
 ## v1.0.34 (2019-11-11)
+
 Retrieve a lazy-loaded dataset when setting one of its parameters.
 
 ## v1.0.33 (2019-10-17)
 
 * A default timeout of 10" on connection and 60" on silence has been added to all the API calls
 * Retrieve or save a flatfile using the new methods `get_flatfile` and `save_flatfile`.
-See `get_historical_flat_list.py` for a complete example.
+  See `get_historical_flat_list.py` for a complete example.
 
 ## v1.0.32 (2019-08-13)
+
 The RPApi instance gets two new methods:
 
 * `get_document_url` to retrieve the document url from a RP_STORY_ID
 * `get_flatfile_list` to retrieve the list of the available flatfiles for `companies`
- or `full` (for all the entities)
+  or `full` (for all the entities)
 
 ## v1.0.29 (2019-05-21)
+
 **dataset creation explicit parameters**
 
-The Dataset parameters are not explictly passed in the constructor
-instead of being hidden in the kwargs.
+The Dataset parameters are now explicitly passed in the constructor instead of being hidden in the kwargs.
 
 This also allows clear support for custom_fields and conditions.
 
@@ -45,6 +56,7 @@ A few new examples have been added or updated:
 [create a dataset with custom_fields and conditions](ravenpackapi/examples/indicator_datasets.py).
 
 ## v1.0.28 (2019-05-15)
+
 **dataset.count method**
 
 ```python
diff --git a/README.rst b/README.rst
@@ -11,7 +11,7 @@ Installation
 
 ::
 
-    pip install ravenpackapi
+   pip install ravenpackapi
 
 About
 -----
@@ -24,7 +24,7 @@ Usage
 -----
 
 In order to be able to use the RavenPack API you will need an API KEY.
-If you don't already have one please contact your `customer
+If you don’t already have one please contact your `customer
 support <mailto:sales@ravenpack.com>`__ representative.
 
 To begin using the API you will need to instantiate an API object that
@@ -35,9 +35,9 @@ environment variable or set it in your code:
 
 .. code:: python
 
-    from ravenpackapi import RPApi
+   from ravenpackapi import RPApi
 
-    api = RPApi(api_key="YOUR_API_KEY")
+   api = RPApi(api_key="YOUR_API_KEY")
 
 Creating a new dataset
 ~~~~~~~~~~~~~~~~~~~~~~
@@ -47,19 +47,19 @@ API with a Dataset instance.
 
 .. code:: python
 
-    from ravenpackapi import Dataset
+   from ravenpackapi import Dataset
 
-    ds = api.create_dataset(
-        Dataset(
-            name="New Dataset",
-            filters={
-                "relevance": {
-                    "$gte": 90
-                }
-            },
-        )
-    )
-    print("Dataset created", ds)
+   ds = api.create_dataset(
+       Dataset(
+           name="New Dataset",
+           filters={
+               "relevance": {
+                   "$gte": 90
+               }
+           },
+       )
+   )
+   print("Dataset created", ds)
 
 Getting data from the datasets
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -71,10 +71,10 @@ Here is how you may get a dataset definition for a pre-existing dataset
 
 .. code:: python
 
-    # Get the dataset description from the server, here we use 'us30'
-    # one of RavenPack public datasets with the top30 companies in the US  
+   # Get the dataset description from the server, here we use 'us30'
+   # one of RavenPack public datasets with the top30 companies in the US  
 
-    ds = api.get_dataset(dataset_id='us30')
+   ds = api.get_dataset(dataset_id='us30')
 
 Downloads: json
 ^^^^^^^^^^^^^^^
@@ -85,13 +85,13 @@ use the asynchronous datafile endpoint instead.
 
 .. code:: python
 
-    data = ds.json(
-        start_date='2018-01-05 18:00:00',
-        end_date='2018-01-05 18:01:00',
-    )
+   data = ds.json(
+       start_date='2018-01-05 18:00:00',
+       end_date='2018-01-05 18:01:00',
+   )
 
-    for record in data:
-        print(record)
+   for record in data:
+       print(record)
 
 Json queries are limited to \* granular datasets: 10,000 records \*
 indicator datasets: 500 entities, timerange 1 year, lookback window 1
@@ -108,13 +108,13 @@ some time to complete.
 
 .. code:: python
 
-    job = ds.request_datafile(
-        start_date='2018-01-05 18:00:00',
-        end_date='2018-01-05 18:01:00',
-    )
+   job = ds.request_datafile(
+       start_date='2018-01-05 18:00:00',
+       end_date='2018-01-05 18:01:00',
+   )
 
-    with open('output.csv') as fp:
-        job.save_to_file(filename=fp.name)
+   with open('output.csv', 'w') as fp:
+       job.save_to_file(filename=fp.name)
 
 Streaming real-time data
 ~~~~~~~~~~~~~~~~~~~~~~~~
@@ -130,64 +130,64 @@ You can find a `real-time streaming example
 here <ravenpackapi/examples/get_realtime_news.py>`__.
 
 The Result object handles the conversion of various fields into the
-appropriate type, i.e. ``record.timestamp_utc`` will be converted to
+appropriate type, i.e. \ ``record.timestamp_utc`` will be converted to
 ``datetime``
 
 Entity mapping
 ~~~~~~~~~~~~~~
 
-The entity mapping endpoint allow you to find the RP\_ENTITY\_ID mapped
-to your universe of entities.
+The entity mapping endpoint allows you to find the RP_ENTITY_ID mapped to
+your universe of entities.
 
 .. code:: python
 
-    universe = [
-        "RavenPack",
-        {'ticker': 'AAPL'},
-        'California USA',
-        {  # Amazon, specifying various fields
-            "client_id": "12345-A",
-            "date": "2017-01-01",
-            "name": "Amazon Inc.",
-            "entity_type": "COMP",
-            "isin": "US0231351067",
-            "cusip": "023135106",
-            "sedol": "B58WM62",
-            "listing": "XNAS:AMZN"
-        },
-        
-    ]
-    mapping = api.get_entity_mapping(universe)
-
-    # in this case we match everything
-    assert len(mapping.matched) == len(universe)
-    assert [m.name for m in mapping.matched] == [
-        "RavenPack International S.L.",
-        "Apple Inc.",
-        "California, U.S.",
-        "Amazon.com Inc."
-    ]
+   universe = [
+       "RavenPack",
+       {'ticker': 'AAPL'},
+       'California USA',
+       {  # Amazon, specifying various fields
+           "client_id": "12345-A",
+           "date": "2017-01-01",
+           "name": "Amazon Inc.",
+           "entity_type": "COMP",
+           "isin": "US0231351067",
+           "cusip": "023135106",
+           "sedol": "B58WM62",
+           "listing": "XNAS:AMZN"
+       },
+       
+   ]
+   mapping = api.get_entity_mapping(universe)
+
+   # in this case we match everything
+   assert len(mapping.matched) == len(universe)
+   assert [m.name for m in mapping.matched] == [
+       "RavenPack International S.L.",
+       "Apple Inc.",
+       "California, U.S.",
+       "Amazon.com Inc."
+   ]
 
 Entity reference
 ~~~~~~~~~~~~~~~~
 
 The entity reference endpoint gives you all the available information for
-an Entity given the RP\_ENTITY\_ID
+an Entity given the RP_ENTITY_ID
 
 .. code:: python
 
-    ALPHABET_RP_ENTITY_ID = '4A6F00'
+   ALPHABET_RP_ENTITY_ID = '4A6F00'
 
-    references = api.get_entity_reference(ALPHABET_RP_ENTITY_ID)
+   references = api.get_entity_reference(ALPHABET_RP_ENTITY_ID)
 
-    # show all the names over history
-    for name in references.names:
-        print(name.value, name.start, name.end)
-        
-    # print all the ticket valid today
-    for ticker in references.tickers:
-        if ticker.is_valid():
-            print(ticker)
+   # show all the names over history
+   for name in references.names:
+       print(name.value, name.start, name.end)
+       
+   # print all the tickers valid today
+   for ticker in references.tickers:
+       if ticker.is_valid():
+           print(ticker)
 
 Text Analytics
 ~~~~~~~~~~~~~~
@@ -211,15 +211,15 @@ internal proxy:
 
 .. code:: python
 
-    api = RPApi()
-    api.common_request_params.update(
-        dict(
-            proxies={'https': 'http://your_internal_proxy:9999'},
-            verify=False,
-        )
-    )
+   api = RPApi()
+   api.common_request_params.update(
+       dict(
+           proxies={'https': 'http://your_internal_proxy:9999'},
+           verify=False,
+       )
+   )
 
-    # use the api to do requests
+   # use the api to do requests
 
 PS. For setting your internal proxies, requests will honor the
-HTTPS\_PROXY environment variable.
+HTTPS_PROXY environment variable.
diff --git a/ravenpackapi/core.py b/ravenpackapi/core.py
@@ -15,7 +15,7 @@
 from ravenpackapi.utils.dynamic_sessions import DynamicSession
 
 _VALID_METHODS = ('get', 'post', 'put', 'delete', 'patch')
-VERSION = '1.0.42'
+VERSION = '1.0.43'
 
 logger = logging.getLogger("ravenpack.core")
 
diff --git a/ravenpackapi/examples/query_entity_mapping.py b/ravenpackapi/examples/query_entity_mapping.py
@@ -1,13 +1,23 @@
 from ravenpackapi import RPApi
 
 if __name__ == '__main__':
-    entities = [{'ticker': 'AAPL', 'name': 'Apple Inc.'},
-                {'ticker': 'JPM'},
-                {'listing': 'XNYS:DVN'}]
+    entities = [
+        {'ticker': 'AAPL', 'name': 'Apple Inc.'},
+        {'ticker': 'JPM'},
+        {'listing': 'XNYS:DVN'},
+
+        # this won't match with confidence
+        {'isin': 'US88339J1051', 'name': 'TRADE DESK INC/THE -CLASS A'},
+    ]
     api = RPApi()
 
     mapping = api.get_entity_mapping(entities)
 
     # show the matched entities
     for match in mapping.matched:
-        print(match.id, match.name, match.type, match.request)
+        print(match.id, match.name, match.type, match.score, match.request)
+
+    for close_match in mapping.errors:
+        if close_match.candidates:
+            best_match = close_match.candidates[0]
+            print(best_match.id, best_match.name, best_match.type, best_match.score, close_match.request)
diff --git a/ravenpackapi/models/mapping.py b/ravenpackapi/models/mapping.py
@@ -16,10 +16,23 @@ class RPMappingMatch(object):
     def __init__(self, data):
         self.request = data['request_data']
         self.errors = data['errors']
+        self.candidates = [
+            RPMappingCandidate(candidate)
+            for candidate in data.get('rp_entities', [])
+        ]
+
         if not self.errors:
-            self.candidates = data['rp_entities']
             # let's put the best candidate data on the obj for convenience
             best_match = self.candidates[0]
-            self.id = best_match['rp_entity_id']
-            self.name = best_match['rp_entity_name']
-            self.type = best_match['rp_entity_type']
+            self.id = best_match.id
+            self.name = best_match.name
+            self.type = best_match.type
+            self.score = best_match.score
+
+
+class RPMappingCandidate(object):
+    def __init__(self, data):
+        self.id = data['rp_entity_id']
+        self.name = data['rp_entity_name']
+        self.type = data['rp_entity_type']
+        self.score = data['score']
diff --git a/ravenpackapi/tests/test_entity_mapping.py b/ravenpackapi/tests/test_entity_mapping.py
@@ -59,3 +59,20 @@ def test_matching_by_cusip(self):
         mapping = api.get_entity_mapping(entities)
         assert not mapping.errors
         assert len(mapping.matched) == len(mapping.submitted) == 3
+
+    def test_multiple_candidates(self):
+        entities = [
+            {'isin': 'US88339J1051', 'name': 'TRADE DESK INC/THE -CLASS A'},
+        ]
+        api = self.api
+        mapping = api.get_entity_mapping(entities)
+        assert len(mapping.errors) == 1
+        assert len(mapping.matched) == 0
+
+        for close_match in mapping.errors:
+            if close_match.candidates:
+                best_match = close_match.candidates[0]
+                assert best_match.id == '0E698B'
+                assert best_match.name == 'The Trade Desk Inc.'
+                assert best_match.type == 'comp'
+            assert close_match.request == entities[0]
diff --git a/setup.py b/setup.py
diff --git a/tox.ini b/tox.ini