Skip to content

Conversation

@pkuyym
Copy link
Contributor

@pkuyym pkuyym commented May 16, 2017

fixes #2078

@pkuyym
Copy link
Contributor Author

pkuyym commented May 16, 2017

@guoshengCS Thanks for testing and document writing.

@wangkuiyi wangkuiyi requested review from pengli09 and reyoung May 16, 2017 15:41
this->storeLocalValues();
std::vector<std::string> buffers;
paddle::str::split(name, '.', &buffers);
auto it = this->values_.find(buffers[buffers.size() - 1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change buffers[buffers.size() - 1] to buffers.back()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


private:
void storeLocalValues() const {
CHECK_GT(numOutputSegments_, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change CHECK_GT to CHECK_GE, numOutputSegments_ can be 0 in practice. For example, the label sequence is O O O O.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

private:
void storeLocalValues() const {
CHECK_GT(numOutputSegments_, 0);
CHECK_GT(numLabelSegments_, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change CHECK_GT to CHECK_GE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

void storeLocalValues() const {
CHECK_GT(numOutputSegments_, 0);
CHECK_GT(numLabelSegments_, 0);
double precision = (double)numCorrect_ / numOutputSegments_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change it to double precision = !numOutputSegments_ ? 0 : (double)numCorrect_ / numOutputSegments_;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

CHECK_GT(numOutputSegments_, 0);
CHECK_GT(numLabelSegments_, 0);
double precision = (double)numCorrect_ / numOutputSegments_;
double recall = (double)numCorrect_ / numLabelSegments_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change it to double recall = !numLabelSegments_ ? 0 : (double)numCorrect_ / numLabelSegments_;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

To make it clear, let's illustrate by a NER example.
Assuming that there are two named entity types including ORG and PER which are called 'chunk type' here,
if 'IOB' scheme were used, the label set will be extended to a set including B-ORG, I-ORG, B-PER, I-PER and O,
in which B-ORG for begining of ORG and I-ORG for end of ORG.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change end of ORG to inside of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

.. code-block:: python
'plain' means the whole chunk must contain exactly the same chunk label.
Realizing that the number of is chunk type is 2 and number of tag type is 2, it is easy to validate this.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should change the example to make chunk type and tag type of different number. In this case, both of them are 2 and may fail to help the users to clarify their misunderstanding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

For each label in the label sequence, we have:
To use chunk evaluator, the construction of label dict should obey the following rules:
(1) Use one of the listed labelling schemes. These schemes differ in ways indicating chunk boundry.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should define "chunk type", "tag type" before the following table. And we'd better have a running example to show how to label the words using different schemes. In fact, the following table is the protocol for assigning tag types, not the definition of the schemes. Therefore, I think we also need another table for the definitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

The total number of different labels is numTagType*numChunkTypes+1.
We support 4 labelling scheme.
The tag type for each of the scheme is shown as follows:
(2) Map can be done correctly by the listed equations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change Map to Mapping
Change can be done to is done. I think is done is better. Because if can be done was used, it may mislead the users to think this is only one of the feasible options. However, this is the only feasible one because we hard coded it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

IOB 0 1 - -
IOE - 0 1 -
IOBES 0 1 2 3
Continue the NER example, and the label dict should like this to satify above equations:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change like to look like

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

@pkuyym pkuyym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.Override getTypeImpl instead of getType.
2.I think holding precision, recall and F1-score into an unified map could make the code cleaner and easier to maintain and the extra computation cost is trivial.
3.Revise the document following the review comments.

Copy link
Contributor

@pengli09 pengli09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except getNames(). Please consult other members to make the final decision.

@luotao1
Copy link
Contributor

luotao1 commented May 18, 2017

请参考 http://www.paddlepaddle.org/develop/doc_cn/howto/dev/write_docs_cn.html 看一下生成出的文档格式是否正确。

IOB Two labels for chunk type X, B-X for chunk begining and I-X for chunk inside.
IOE Two labels for chunk type X, E-X for chunk ending and I-X for chunk inside.
IOBES Four labels for chunk type X, B-X for chunk begining, I-X for chunk inside, E-X for chunk end and S-X for single word chunk.
.. code-block:: python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

366行可以去掉,只需要一个.. code-block::python即可。下同。
请生成下文档看下显示是否正确。目前粗看,会有一些问题。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

The construction of label dict should obey the following rules:
(1) Use one of the listed labelling schemes. These schemes differ in ways indicating chunk boundry.
.. code-block:: python
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why code-block is python? It seems a plain text?

..[SPACE]code-block:: [language]
[EMPTY_LINE]
[SPACE][SPACE][SPACE]Your texts.
.. code-block:: text

   abc
   def

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

@reyoung reyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM.

@pkuyym pkuyym merged commit d3e003b into PaddlePaddle:develop May 22, 2017
@pkuyym pkuyym deleted the fix-2078 branch May 24, 2017 06:30
fsylmxx pushed a commit to fsylmxx/Paddle that referenced this pull request Nov 25, 2025
* fix update time

* add rerun

* fix permission error

* fix delete container

* fix mlu env

* fix mlu ci error

* fix cleanup

* fix cleanup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Evaluator cannot return the expected metrics.

4 participants