Commit 30a5e8b

Entity mapping - expose score and candidates
1 parent 756d27e commit 30a5e8b

File tree

8 files changed: 147 additions and 95 deletions

CHANGELOG.md

Lines changed: 16 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -1,42 +1,53 @@
11
# Changelog
22

3+
## v1.0.43 (2020-11-04)
4+
5+
* Entity-mapping - expose the matching score and the candidates
6+
37
## v1.0.40 (2020-10-07)
8+
49
* Support for the /jobs endpoint (to list the user past endpoints)
510

611
## v1.0.39 (2020-09-22)
12+
713
* Persistent sessions between API-calls
814

915
## v1.0.38 (2020-09-04)
16+
1017
* Text-Analytics API additional functions: `get_analytics` and `get_annotated`
1118

1219
## v1.0.36 (2020-05-14)
20+
1321
* Text-Analytics API endpoint updated: folders & richer metadata
1422
* Extended error handling to support Feed disconnection problems
1523

1624
## v1.0.35 (2020-02-22)
25+
1726
Initial support for the Text-Analytics API endpoints
1827

1928
## v1.0.34 (2019-11-11)
29+
2030
Retrieve a lazy-loaded dataset when setting one of its paramters.
2131

2232
## v1.0.33 (2019-10-17)
2333

2434
* A default timeout of 10" on connection and 60" on silence has been added to all the API calls
2535
* Retrieve or save a flatfile using the new methods `get_flatfile` and `save_flatfile`.
26-
See `get_historical_flat_list.py` for a complete example.
36+
See `get_historical_flat_list.py` for a complete example.
2737

2838
## v1.0.32 (2019-08-13)
39+
2940
The RPApi instance gets two new methods:
3041

3142
* `get_document_url` to retrieve the document url from a RP_STORY_ID
3243
* `get_flatfile_list` to retrieve the list of the available flatfiles for `companies`
33-
or `full` (for all the entities)
44+
or `full` (for all the entities)
3445

3546
## v1.0.29 (2019-05-21)
47+
3648
**dataset creation explicit parameters**
3749

38-
The Dataset parameters are not explictly passed in the constructor
39-
instead of being hidden in the kwargs.
50+
The Dataset parameters are not explictly passed in the constructor instead of being hidden in the kwargs.
4051

4152
This allows also to clearly support custom_fields and conditions.
4253

@@ -45,6 +56,7 @@ A few new examples have been added or updated:
4556
[create a dataset with custom_fields and conditions](ravenpackapi/examples/indicator_datasets.py).
4657

4758
## v1.0.28 (2019-05-15)
59+
4860
**dataset.count method**
4961

5062
```python

README.rst

Lines changed: 80 additions & 80 deletions
@@ -11,7 +11,7 @@ Installation
1111

1212
::
1313

14-
pip install ravenpackapi
14+
pip install ravenpackapi
1515

1616
About
1717
-----
@@ -24,7 +24,7 @@ Usage
2424
-----
2525

2626
In order to be able to use the RavenPack API you will need an API KEY.
27-
If you don't already have one please contact your `customer
27+
If you dont already have one please contact your `customer
2828
support <mailto:sales@ravenpack.com>`__ representative.
2929

3030
To begin using the API you will need to instantiate an API object that
@@ -35,9 +35,9 @@ environment variable or set it in your code:
3535

3636
.. code:: python
3737
38-
from ravenpackapi import RPApi
38+
from ravenpackapi import RPApi
3939
40-
api = RPApi(api_key="YOUR_API_KEY")
40+
api = RPApi(api_key="YOUR_API_KEY")
4141
4242
Creating a new dataset
4343
~~~~~~~~~~~~~~~~~~~~~~
@@ -47,19 +47,19 @@ API with a Dataset instance.
4747

4848
.. code:: python
4949
50-
from ravenpackapi import Dataset
50+
from ravenpackapi import Dataset
5151
52-
ds = api.create_dataset(
53-
Dataset(
54-
name="New Dataset",
55-
filters={
56-
"relevance": {
57-
"$gte": 90
58-
}
59-
},
60-
)
61-
)
62-
print("Dataset created", ds)
52+
ds = api.create_dataset(
53+
Dataset(
54+
name="New Dataset",
55+
filters={
56+
"relevance": {
57+
"$gte": 90
58+
}
59+
},
60+
)
61+
)
62+
print("Dataset created", ds)
6363
6464
Getting data from the datasets
6565
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -71,10 +71,10 @@ Here is how you may get a dataset definition for a pre-existing dataset
7171

7272
.. code:: python
7373
74-
# Get the dataset description from the server, here we use 'us30'
75-
# one of RavenPack public datasets with the top30 companies in the US
74+
# Get the dataset description from the server, here we use 'us30'
75+
# one of RavenPack public datasets with the top30 companies in the US
7676
77-
ds = api.get_dataset(dataset_id='us30')
77+
ds = api.get_dataset(dataset_id='us30')
7878
7979
Downloads: json
8080
^^^^^^^^^^^^^^^
@@ -85,13 +85,13 @@ use the asynchronous datafile endpoint instead.
8585

8686
.. code:: python
8787
88-
data = ds.json(
89-
start_date='2018-01-05 18:00:00',
90-
end_date='2018-01-05 18:01:00',
91-
)
88+
data = ds.json(
89+
start_date='2018-01-05 18:00:00',
90+
end_date='2018-01-05 18:01:00',
91+
)
9292
93-
for record in data:
94-
print(record)
93+
for record in data:
94+
print(record)
9595
9696
Json queries are limited to \* granular datasets: 10,000 records \*
9797
indicator datasets: 500 entities, timerange 1 year, lookback window 1
@@ -108,13 +108,13 @@ some time to complete.
108108

109109
.. code:: python
110110
111-
job = ds.request_datafile(
112-
start_date='2018-01-05 18:00:00',
113-
end_date='2018-01-05 18:01:00',
114-
)
111+
job = ds.request_datafile(
112+
start_date='2018-01-05 18:00:00',
113+
end_date='2018-01-05 18:01:00',
114+
)
115115
116-
with open('output.csv') as fp:
117-
job.save_to_file(filename=fp.name)
116+
with open('output.csv') as fp:
117+
job.save_to_file(filename=fp.name)
118118
119119
Streaming real-time data
120120
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -130,64 +130,64 @@ You can find a `real-time streaming example
130130
here <ravenpackapi/examples/get_realtime_news.py>`__.
131131

132132
The Result object handles the conversion of various fields into the
133-
appropriate type, i.e. ``record.timestamp_utc`` will be converted to
133+
appropriate type, i.e. \ ``record.timestamp_utc`` will be converted to
134134
``datetime``
135135

136136
Entity mapping
137137
~~~~~~~~~~~~~~
138138

139-
The entity mapping endpoint allow you to find the RP\_ENTITY\_ID mapped
140-
to your universe of entities.
139+
The entity mapping endpoint allow you to find the RP_ENTITY_ID mapped to
140+
your universe of entities.
141141

142142
.. code:: python
143143
144-
universe = [
145-
"RavenPack",
146-
{'ticker': 'AAPL'},
147-
'California USA',
148-
{ # Amazon, specifying various fields
149-
"client_id": "12345-A",
150-
"date": "2017-01-01",
151-
"name": "Amazon Inc.",
152-
"entity_type": "COMP",
153-
"isin": "US0231351067",
154-
"cusip": "023135106",
155-
"sedol": "B58WM62",
156-
"listing": "XNAS:AMZN"
157-
},
158-
159-
]
160-
mapping = api.get_entity_mapping(universe)
161-
162-
# in this case we match everything
163-
assert len(mapping.matched) == len(universe)
164-
assert [m.name for m in mapping.matched] == [
165-
"RavenPack International S.L.",
166-
"Apple Inc.",
167-
"California, U.S.",
168-
"Amazon.com Inc."
169-
]
144+
universe = [
145+
"RavenPack",
146+
{'ticker': 'AAPL'},
147+
'California USA',
148+
{ # Amazon, specifying various fields
149+
"client_id": "12345-A",
150+
"date": "2017-01-01",
151+
"name": "Amazon Inc.",
152+
"entity_type": "COMP",
153+
"isin": "US0231351067",
154+
"cusip": "023135106",
155+
"sedol": "B58WM62",
156+
"listing": "XNAS:AMZN"
157+
},
158+
159+
]
160+
mapping = api.get_entity_mapping(universe)
161+
162+
# in this case we match everything
163+
assert len(mapping.matched) == len(universe)
164+
assert [m.name for m in mapping.matched] == [
165+
"RavenPack International S.L.",
166+
"Apple Inc.",
167+
"California, U.S.",
168+
"Amazon.com Inc."
169+
]
170170
171171
Entity reference
172172
~~~~~~~~~~~~~~~~
173173

174174
The entity reference endpoint give you all the available information for
175-
an Entity given the RP\_ENTITY\_ID
175+
an Entity given the RP_ENTITY_ID
176176

177177
.. code:: python
178178
179-
ALPHABET_RP_ENTITY_ID = '4A6F00'
179+
ALPHABET_RP_ENTITY_ID = '4A6F00'
180180
181-
references = api.get_entity_reference(ALPHABET_RP_ENTITY_ID)
181+
references = api.get_entity_reference(ALPHABET_RP_ENTITY_ID)
182182
183-
# show all the names over history
184-
for name in references.names:
185-
print(name.value, name.start, name.end)
186-
187-
# print all the ticket valid today
188-
for ticker in references.tickers:
189-
if ticker.is_valid():
190-
print(ticker)
183+
# show all the names over history
184+
for name in references.names:
185+
print(name.value, name.start, name.end)
186+
187+
# print all the ticket valid today
188+
for ticker in references.tickers:
189+
if ticker.is_valid():
190+
print(ticker)
191191
192192
Text Analytics
193193
~~~~~~~~~~~~~~
@@ -211,15 +211,15 @@ internal proxy:
211211

212212
.. code:: python
213213
214-
api = RPApi()
215-
api.common_request_params.update(
216-
dict(
217-
proxies={'https': 'http://your_internal_proxy:9999'},
218-
verify=False,
219-
)
220-
)
214+
api = RPApi()
215+
api.common_request_params.update(
216+
dict(
217+
proxies={'https': 'http://your_internal_proxy:9999'},
218+
verify=False,
219+
)
220+
)
221221
222-
# use the api to do requests
222+
# use the api to do requests
223223
224224
PS. For setting your internal proxies, requests will honor the
225-
HTTPS\_PROXY environment variable.
225+
HTTPS_PROXY environment variable.

ravenpackapi/core.py

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
1515
from ravenpackapi.utils.dynamic_sessions import DynamicSession
1616

1717
_VALID_METHODS = ('get', 'post', 'put', 'delete', 'patch')
18-
VERSION = '1.0.42'
18+
VERSION = '1.0.43'
1919

2020
logger = logging.getLogger("ravenpack.core")
2121

Lines changed: 14 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,23 @@
11
from ravenpackapi import RPApi
22

33
if __name__ == '__main__':
4-
entities = [{'ticker': 'AAPL', 'name': 'Apple Inc.'},
5-
{'ticker': 'JPM'},
6-
{'listing': 'XNYS:DVN'}]
4+
entities = [
5+
{'ticker': 'AAPL', 'name': 'Apple Inc.'},
6+
{'ticker': 'JPM'},
7+
{'listing': 'XNYS:DVN'},
8+
9+
# this won't match with confidence
10+
{'isin': 'US88339J1051', 'name': 'TRADE DESK INC/THE -CLASS A'},
11+
]
712
api = RPApi()
813

914
mapping = api.get_entity_mapping(entities)
1015

1116
# show the matched entities
1217
for match in mapping.matched:
13-
print(match.id, match.name, match.type, match.request)
18+
print(match.id, match.name, match.type, match.score, match.request)
19+
20+
for close_match in mapping.errors:
21+
if close_match.candidates:
22+
best_match = close_match.candidates[0]
23+
print(best_match.id, best_match.name, best_match.type, best_match.score, close_match.request)
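
Review note: the loop over `mapping.errors` above prints the first candidate for every unmatched request. A caller may want to accept a close match only above some score. The sketch below shows one way to do that. It uses stand-in objects instead of a real `RPApi` call, and both the helper name `pick_close_matches` and the score values are hypothetical; the actual score scale returned by the API is not stated in this commit.

```python
from types import SimpleNamespace


def pick_close_matches(errors, min_score):
    """Return (request, best_candidate) pairs whose best score is >= min_score.

    Assumes, as the example above does, that candidates[0] is the best one.
    """
    picked = []
    for close_match in errors:
        if not close_match.candidates:
            continue  # nothing was proposed for this request
        best = close_match.candidates[0]
        if best.score >= min_score:
            picked.append((close_match.request, best))
    return picked


# Stand-ins for the objects found in mapping.errors (made-up scores).
errors = [
    SimpleNamespace(
        request={'isin': 'US88339J1051', 'name': 'TRADE DESK INC/THE -CLASS A'},
        candidates=[SimpleNamespace(id='0E698B', name='The Trade Desk Inc.',
                                    type='comp', score=0.8)],
    ),
    SimpleNamespace(request={'name': 'Unknown Corp'}, candidates=[]),
]

accepted = pick_close_matches(errors, min_score=0.5)
for request, best in accepted:
    print(best.id, best.name, best.type, best.score, request)
```

The threshold is passed in explicitly so that the caller decides how much ambiguity is acceptable.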

ravenpackapi/models/mapping.py

Lines changed: 17 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -16,10 +16,23 @@ class RPMappingMatch(object):
1616
def __init__(self, data):
1717
self.request = data['request_data']
1818
self.errors = data['errors']
19+
self.candidates = [
20+
RPMappingCandidate(candidate)
21+
for candidate in data.get('rp_entities', [])
22+
]
23+
1924
if not self.errors:
20-
self.candidates = data['rp_entities']
2125
# let's put the best candidate data on the obj for convenience
2226
best_match = self.candidates[0]
23-
self.id = best_match['rp_entity_id']
24-
self.name = best_match['rp_entity_name']
25-
self.type = best_match['rp_entity_type']
27+
self.id = best_match.id
28+
self.name = best_match.name
29+
self.type = best_match.type
30+
self.score = best_match.score
31+
32+
33+
class RPMappingCandidate(object):
34+
def __init__(self, data):
35+
self.id = data['rp_entity_id']
36+
self.name = data['rp_entity_name']
37+
self.type = data['rp_entity_type']
38+
self.score = data['score']
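
Review note: a standalone sketch of how the two classes behave after this change may help when reading the diff. The class bodies below mirror the added lines; the sample payload (in particular the error text and the score value) is made up for illustration and is not a real API response.

```python
class RPMappingCandidate(object):
    def __init__(self, data):
        self.id = data['rp_entity_id']
        self.name = data['rp_entity_name']
        self.type = data['rp_entity_type']
        self.score = data['score']


class RPMappingMatch(object):
    def __init__(self, data):
        self.request = data['request_data']
        self.errors = data['errors']
        # Candidates are now built whether or not the match had errors.
        self.candidates = [
            RPMappingCandidate(candidate)
            for candidate in data.get('rp_entities', [])
        ]
        if not self.errors:
            # Confident match: copy the best candidate onto the object.
            best_match = self.candidates[0]
            self.id = best_match.id
            self.name = best_match.name
            self.type = best_match.type
            self.score = best_match.score


# Hypothetical payload for a request that did not match with confidence.
ambiguous = {
    'request_data': {'isin': 'US88339J1051'},
    'errors': ['illustrative error text'],
    'rp_entities': [{
        'rp_entity_id': '0E698B',
        'rp_entity_name': 'The Trade Desk Inc.',
        'rp_entity_type': 'comp',
        'score': 0.5,
    }],
}

close_match = RPMappingMatch(ambiguous)
print(close_match.candidates[0].name, close_match.candidates[0].score)
print(hasattr(close_match, 'score'))  # the shortcut attributes are not set
```

One thing worth checking in review: if a response ever carried no errors and an empty `rp_entities` list, `self.candidates[0]` would raise `IndexError`. Whether the API can return that combination is not shown in this commit.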

ravenpackapi/tests/test_entity_mapping.py

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,3 +59,20 @@ def test_matching_by_cusip(self):
5959
mapping = api.get_entity_mapping(entities)
6060
assert not mapping.errors
6161
assert len(mapping.matched) == len(mapping.submitted) == 3
62+
63+
def test_multiple_candidates(self):
64+
entities = [
65+
{'isin': 'US88339J1051', 'name': 'TRADE DESK INC/THE -CLASS A'},
66+
]
67+
api = self.api
68+
mapping = api.get_entity_mapping(entities)
69+
assert len(mapping.errors) == 1
70+
assert len(mapping.matched) == 0
71+
72+
for close_match in mapping.errors:
73+
if close_match.candidates:
74+
best_match = close_match.candidates[0]
75+
assert best_match.id == '0E698B'
76+
assert best_match.name == 'The Trade Desk Inc.'
77+
assert best_match.type == 'comp'
78+
assert close_match.request == entities[0]
