Exposing lower level model evaluation data #35
that sounds interesting. would like to see a PR, yes.
On Mon, Feb 6, 2017 at 10:21 PM, Phillip Baker wrote:

Thanks for all the hard work on this! Parserator has definitely made it easy to create a model with crfsuite. As I dig into fine-tuning my model, I'd like to have access to the metrics provided by crfsuite (accuracy, precision, recall).
It looks like the python wrapper does provide access to this data (scrapinghub/python-crfsuite#42 (comment)). What do you think of a PR that exposes this as a return value of trainModel (https://github.com/datamade/parserator/blob/master/parserator/training.py#L29)?
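A minimal sketch of what that return value could look like, assuming trainModel keeps its current training steps and that training_data is an iterable of (tokens, labels) pairs; the function name train_with_metrics and the parameter values below are illustrative, not part of parserator:

```python
import pycrfsuite


def train_with_metrics(training_data, module, model_path='learned_settings.crfsuite'):
    # Hypothetical variant of parserator's trainModel: same training loop,
    # but the parsed crfsuite training log is returned to the caller.
    trainer = pycrfsuite.Trainer(verbose=False)
    trainer.set_params({'c1': 0.1, 'c2': 0.01, 'feature.minfreq': 0})

    for tokens, labels in training_data:
        trainer.append(module.tokens2features(tokens), labels)

    trainer.train(model_path)

    # python-crfsuite parses crfsuite's log into trainer.logparser;
    # last_iteration holds the final loss, plus per-label precision/recall/F1
    # when holdout evaluation is enabled.
    return trainer.logparser.last_iteration
```

Returning the whole trainer.logparser (it also keeps the per-iteration history in .iterations) would be another option.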
Just following up on this. We ended up using modified versions of the `parse` and `tag` functions:

```python
from collections import OrderedDict


def parse(raw_string, verbose=False):
    if not TAGGER:
        raise IOError(
            '\nMISSING MODEL FILE: %s\nYou must train the model before you can '
            'use the parse and tag methods\nTo train the model and create the '
            'model file, run:\nparserator train [traindata] [modulename]' % MODEL_FILE)

    tokens = tokenize(raw_string)
    if not tokens:
        return []

    features = tokens2features(tokens)
    tags = TAGGER.tag(features)

    if verbose:
        # Per-token marginal probabilities from the CRF for each predicted tag.
        probabilities = []
        for index, tag in enumerate(tags):
            probabilities.append(TAGGER.marginal(tag, index))
        return list(zip(tokens, tags, probabilities))

    return list(zip(tokens, tags))


def tag(raw_string, probability_cutoff=None):
    tagged = OrderedDict()

    if probability_cutoff:
        tagged_probability = OrderedDict()
        for token, label, probability in parse(raw_string, verbose=True):
            tagged_probability.setdefault(label, {'tokens': []})
            # Multiply the marginals of every token carrying this label to get
            # a single confidence score for the label.
            if tagged_probability[label].get('probability'):
                tagged_probability[label]['probability'] *= probability
            else:
                tagged_probability[label]['probability'] = probability
            tagged_probability[label]['tokens'].append(token)
        # Keep only labels whose combined probability clears the cutoff.
        for label, token_probabilities in tagged_probability.items():
            if token_probabilities['probability'] > probability_cutoff:
                tagged[label] = token_probabilities['tokens']
    else:
        for token, label in parse(raw_string):
            tagged.setdefault(label, []).append(token)

    for label in tagged:
        component = ' '.join(tagged[label])
        component = component.strip(' ,;')
        tagged[label] = component

    return tagged
```
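For anyone following along, a quick sketch of how the verbose and cutoff paths behave, assuming a module trained the usual parserator way; the example string and the 0.9 cutoff are placeholders, not anything from parserator itself:

```python
# parse() with verbose=True yields (token, tag, marginal) triples.
for token, label, probability in parse('123 Main St Chicago IL', verbose=True):
    print(token, label, round(probability, 3))

# tag() with a cutoff keeps only labels whose combined probability clears it;
# without a cutoff it behaves like the stock parserator tag().
print(tag('123 Main St Chicago IL', probability_cutoff=0.9))
print(tag('123 Main St Chicago IL'))
```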