[formrecognizer] initial selection marks #14024

Merged
merged 10 commits into from
Oct 16, 2020
5 changes: 5 additions & 0 deletions sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md
@@ -11,6 +11,11 @@ methods to recognize data from business cards.
- Recognize receipt methods now take keyword argument `locale` to optionally indicate the locale of the receipt for
improved results
- Added ability to create a composed model from the `FormTrainingClient` by calling method `begin_create_composed_model()`
- Added support to train and recognize custom forms with selection marks such as check boxes and radio buttons.
This functionality is only available for models trained with labels
- Added property `selection_marks` to `FormPage` which contains a list of `FormSelectionMark`
- When passing `include_field_elements=True`, the property `field_elements` on `FieldData` and `FormTableCell` will
also be populated with any selection marks found on the page
- Added the properties `model_name` and `properties` to types `CustomFormModel` and `CustomFormModelInfo`
- Added keyword argument `model_name` to `begin_training()` and `begin_create_composed_model()`
- Added model type `CustomFormModelProperties` that includes information like if a model is a composed model
18 changes: 13 additions & 5 deletions sdk/formrecognizer/azure-ai-formrecognizer/README.md
@@ -4,7 +4,7 @@ Azure Cognitive Services Form Recognizer is a cloud service that uses machine le
from form documents. It includes the following main functionalities:

* Custom models - Recognize field values and table data from forms. These models are trained with your own data, so they're tailored to your forms.
* Content API - Recognize text and table structures, along with their bounding box coordinates, from documents. Corresponds to the REST service's Layout API.
* Content API - Recognize text, table structures, and selection marks, along with their bounding box coordinates, from documents. Corresponds to the REST service's Layout API.
* Prebuilt receipt model - Recognize data from USA sales receipts using a prebuilt model.
* Prebuilt business card model - Recognize data from business cards using a prebuilt model.

@@ -134,15 +134,15 @@ form_recognizer_client = FormRecognizerClient(
- Recognizing form fields and content using custom models trained to recognize your custom forms. These values are returned in a collection of `RecognizedForm` objects.
- Recognizing common fields from US receipts, using a pre-trained receipt model. These fields and metadata are returned in a collection of `RecognizedForm` objects.
- Recognizing common fields from business cards, using a pre-trained business card model. These fields and metadata are returned in a collection of `RecognizedForm` objects.
- Recognizing form content, including tables, lines and words, without the need to train a model. Form content is returned in a collection of `FormPage` objects.
- Recognizing form content, including tables, lines, words, and selection marks, without the need to train a model. Form content is returned in a collection of `FormPage` objects.

Sample code snippets are provided to illustrate using a FormRecognizerClient [here](#recognize-forms-using-a-custom-model "Recognize Forms Using a Custom Model").

### FormTrainingClient
`FormTrainingClient` provides operations for:

- Training custom models without labels to recognize all fields and values found in your custom forms. A `CustomFormModel` is returned indicating the form types the model will recognize, and the fields it will extract for each form type. See the [service documentation][fr-train-without-labels] for a more detailed explanation.
- Training custom models with labels to recognize specific fields and values you specify by labeling your custom forms. A `CustomFormModel` is returned indicating the fields the model will extract, as well as the estimated accuracy for each field. See the [service documentation][fr-train-with-labels] for a more detailed explanation.
- Training custom models with labels to recognize specific fields, selection marks, and values you specify by labeling your custom forms. A `CustomFormModel` is returned indicating the fields the model will extract, as well as the estimated accuracy for each field. See the [service documentation][fr-train-with-labels] for a more detailed explanation.
- Managing models created in your account.
- Copying a custom model from one Form Recognizer resource to another.
- Creating a composed model from a collection of existing trained models with labels.
@@ -239,6 +239,14 @@ for cell in table.cells:
    print("Cell text: {}".format(cell.text))
    print("Location: {}".format(cell.bounding_box))
    print("Confidence score: {}\n".format(cell.confidence))

print("Selection marks found on page {}:".format(page[0].page_number))
for selection_mark in page[0].selection_marks:
    print("Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
        selection_mark.state,
        selection_mark.bounding_box,
        selection_mark.confidence
    ))
```
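
A common follow-up to printing every selection mark is keeping only the ones reported as selected. The sketch below is a minimal illustration that runs without a service call: `Mark`, `selected_marks`, and the `0.8` threshold are hypothetical names and values chosen for the example, not part of the library. `Mark` only mimics the `state`, `bounding_box`, and `confidence` attributes used above.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the attributes printed for each
# selection mark above (state, bounding_box, confidence).
Mark = namedtuple("Mark", ["state", "bounding_box", "confidence"])


def selected_marks(marks, min_confidence=0.8):
    """Return marks whose state is "selected" with at least min_confidence."""
    return [
        mark for mark in marks
        if mark.state == "selected" and mark.confidence >= min_confidence
    ]


marks = [
    Mark("selected", None, 0.95),
    Mark("unselected", None, 0.99),
    Mark("selected", None, 0.40),
]
for mark in selected_marks(marks):
    print("Selected with confidence {}".format(mark.confidence))
```

The threshold is a judgment call for each application; low-confidence marks may be worth routing to manual review rather than discarding.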

### Recognize Receipts
@@ -382,8 +390,8 @@ model_id = "<model_id from the Train a Model sample>"

custom_model = form_training_client.get_custom_model(model_id=model_id)
print("Model ID: {}".format(custom_model.model_id))
print("Model name: {}".format(model.model_name))
print("Is composed model?: {}".format(model.properties.is_composed_model))
print("Model name: {}".format(custom_model.model_name))
print("Is composed model?: {}".format(custom_model.properties.is_composed_model))
print("Status: {}".format(custom_model.status))
print("Training started on: {}".format(custom_model.training_started_on))
print("Training completed on: {}".format(custom_model.training_completed_on))
@@ -33,6 +33,8 @@
CustomFormModelField,
FieldValueType,
CustomFormModelProperties,
FormSelectionMark,
SelectionMarkState,
)
from ._api_versions import FormRecognizerApiVersion

@@ -65,6 +67,8 @@
'CustomFormModelField',
'FieldValueType',
'CustomFormModelProperties',
'FormSelectionMark',
'SelectionMarkState',
]

__VERSION__ = VERSION
@@ -27,6 +27,10 @@ def get_element_type(element_pointer):
if re.search(line_ref, element_pointer):
return "line"

selection_mark_ref = re.compile(r'/readResults/\d+/selectionMarks/\d+')
if re.search(selection_mark_ref, element_pointer):
return "selectionMark"

return None


@@ -45,6 +49,11 @@ def get_element(element_pointer, read_result):
ocr_line = read_result[read].lines[line]
return "line", ocr_line, read+1

if get_element_type(element_pointer) == "selectionMark":
mark = indices[1]
selection_mark = read_result[read].selection_marks[mark]
return "selectionMark", selection_mark, read+1

return None, None, None
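
For readers following the pointer handling in this hunk, here is a minimal standalone sketch of how an element reference could be classified and split into indices. The `selectionMarks` pattern is taken from the diff; the word and line patterns, the index extraction with `re.findall`, and the function name `parse_element_pointer` are assumptions made for illustration and may differ from the actual helper module.

```python
import re

# The selectionMarks pattern comes from the diff. The word and line
# patterns are assumed to follow the same shape.
WORD_REF = re.compile(r"/readResults/\d+/lines/\d+/words/\d+")
LINE_REF = re.compile(r"/readResults/\d+/lines/\d+")
SELECTION_MARK_REF = re.compile(r"/readResults/\d+/selectionMarks/\d+")


def parse_element_pointer(element_pointer):
    """Return (kind, indices) for a pointer, or (None, []) if unrecognized."""
    indices = [int(s) for s in re.findall(r"\d+", element_pointer)]
    # Check the most specific pattern first: a word pointer also matches LINE_REF.
    if WORD_REF.search(element_pointer):
        return "word", indices
    if LINE_REF.search(element_pointer):
        return "line", indices
    if SELECTION_MARK_REF.search(element_pointer):
        return "selectionMark", indices
    return None, []


print(parse_element_pointer("#/readResults/0/selectionMarks/2"))
```

For a selection mark pointer, the first index is the read result (page) and the second is the position in that page's selection marks, which matches how `indices[1]` is used in the hunk above.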


@@ -4,7 +4,7 @@
# Licensed under the MIT License.
# ------------------------------------

# pylint: disable=protected-access
# pylint: disable=protected-access, too-many-lines

from enum import Enum
from collections import namedtuple
@@ -31,7 +31,8 @@ def resolve_element(element, read_result):
return FormWord._from_generated(element, page=page)
if element_type == "line":
return FormLine._from_generated(element, page=page)

if element_type == "selectionMark":
return FormSelectionMark._from_generated(element, page=page)
raise ValueError("Failed to parse element reference.")


@@ -60,9 +61,20 @@ def get_field_value(field, value, read_result): # pylint: disable=too-many-retu
key: FormField._from_generated(key, value, read_result)
for key, value in value.value_object.items()
}
if value.type == "selectionMark":
return value.text

return None


class SelectionMarkState(str, Enum):
    """State of the selection mark.
    """

    SELECTED = "selected"
    UNSELECTED = "unselected"
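
Because the enum mixes in `str`, its members compare equal to the raw strings returned by the service. A minimal sketch, using a local copy of the enum so that it runs without the SDK installed:

```python
from enum import Enum


class SelectionMarkState(str, Enum):
    """Local copy of the enum from the diff, for illustration only."""

    SELECTED = "selected"
    UNSELECTED = "unselected"


# Members compare equal to their string values.
print(SelectionMarkState.SELECTED == "selected")
# A raw string can be converted back to the matching member.
print(SelectionMarkState("unselected") is SelectionMarkState.UNSELECTED)
```

This means callers can compare `state` against either the enum member or the plain string.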


class FieldValueType(str, Enum):
"""Semantic data type of the field value.
"""
@@ -75,6 +87,7 @@ class FieldValueType(str, Enum):
INTEGER = "integer"
LIST = "list"
DICTIONARY = "dictionary"
SELECTION_MARK = "selectionMark"


class LengthUnit(str, Enum):
@@ -152,9 +165,9 @@ class FormElement(object):
:ivar int page_number:
The 1-based number of the page in which this content is present.
:ivar str kind:
The kind of form element. Possible kinds are "word" or "line" which
correspond to a :class:`~azure.ai.formrecognizer.FormWord` or
:class:`~azure.ai.formrecognizer.FormLine`, respectively.
The kind of form element. Possible kinds are "word", "line", or "selectionMark" which
correspond to a :class:`~azure.ai.formrecognizer.FormWord`, :class:`~azure.ai.formrecognizer.FormLine`,
or :class:`~azure.ai.formrecognizer.FormSelectionMark`, respectively.
"""
def __init__(self, **kwargs):
self.bounding_box = kwargs.get("bounding_box", None)
@@ -213,7 +226,7 @@ class FormField(object):

:ivar str value_type: The type of `value` found on FormField. Described in
:class:`~azure.ai.formrecognizer.FieldValueType`, possible types include: 'string',
'date', 'time', 'phoneNumber', 'float', 'integer', 'dictionary', or 'list'.
'date', 'time', 'phoneNumber', 'float', 'integer', 'dictionary', 'list', or 'selectionMark'.
:ivar ~azure.ai.formrecognizer.FieldData label_data:
Contains the text, bounding box, and field elements for the field label.
Note that this is not returned for forms analyzed by models trained with labels.
@@ -289,7 +302,7 @@ class FieldData(object):
elements constituting this field or value is returned. The list
consists of elements such as lines and words.
:vartype field_elements: list[Union[~azure.ai.formrecognizer.FormElement, ~azure.ai.formrecognizer.FormWord,
~azure.ai.formrecognizer.FormLine]]
~azure.ai.formrecognizer.FormLine, ~azure.ai.formrecognizer.FormSelectionMark]]
"""

def __init__(self, **kwargs):
@@ -356,6 +369,10 @@ class FormPage(object):
certain cases proximity is treated with higher priority. As the sorting order depends on
the detected text, it may change across images and OCR version updates. Thus, business
logic should be built upon the actual line location instead of order.
:ivar selection_marks: List of selection marks extracted from the page.
:vartype selection_marks: list[~azure.ai.formrecognizer.FormSelectionMark]
.. versionadded:: v2.1-preview
*selection_marks* property
"""

def __init__(self, **kwargs):
@@ -366,6 +383,7 @@ def __init__(self, **kwargs):
self.unit = kwargs.get("unit", None)
self.tables = kwargs.get("tables", None)
self.lines = kwargs.get("lines", None)
self.selection_marks = kwargs.get("selection_marks", None)

@classmethod
def _from_generated_prebuilt_model(cls, read_result):
@@ -381,15 +399,17 @@ def _from_generated_prebuilt_model(cls, read_result):
) for page in read_result]

def __repr__(self):
return "FormPage(page_number={}, text_angle={}, width={}, height={}, unit={}, tables={}, lines={})" \
return "FormPage(page_number={}, text_angle={}, width={}, height={}, unit={}, tables={}, lines={}, " \
"selection_marks={})" \
.format(
self.page_number,
self.text_angle,
self.width,
self.height,
self.unit,
repr(self.tables),
repr(self.lines)
repr(self.lines),
repr(self.selection_marks)
)[:1024]


@@ -474,6 +494,52 @@ def __repr__(self):
)[:1024]


class FormSelectionMark(FormElement):
    """Information about the extracted selection mark.

    :ivar str text: The text content - not returned for FormSelectionMark.
    :ivar list[~azure.ai.formrecognizer.Point] bounding_box:
        A list of 4 points representing the quadrilateral bounding box
        that outlines the selection mark. The points are listed in clockwise
        order: top-left, top-right, bottom-right, bottom-left.
        Units are in pixels for images and inches for PDF.
    :ivar float confidence: Confidence value.
    :ivar state: Required. State of the selection mark. Possible values include: "selected",
        "unselected".
    :type state: str or ~azure.ai.formrecognizer.SelectionMarkState
    :ivar int page_number:
        The 1-based number of the page in which this content is present.
    :ivar str kind: For FormSelectionMark, this is "selectionMark".
    """

    def __init__(
        self,
        **kwargs
    ):
        super(FormSelectionMark, self).__init__(kind="selectionMark", **kwargs)
        self.confidence = kwargs['confidence']
        self.state = kwargs['state']

    @classmethod
    def _from_generated(cls, mark, page):
        return cls(
            confidence=mark.confidence,
            state=mark.state,
            bounding_box=get_bounding_box(mark),
            page_number=page
        )

    def __repr__(self):
        return "FormSelectionMark(text={}, bounding_box={}, confidence={}, page_number={}, state={})" \
            .format(
                self.text,
                self.bounding_box,
                self.confidence,
                self.page_number,
                self.state
            )[:1024]


class FormTable(object):
"""Information about the extracted table contained on a page.

@@ -528,7 +594,7 @@ class FormTableCell(object): # pylint:disable=too-many-instance-attributes
consists of elements such as lines and words.
For calls to begin_recognize_content(), this list is always populated.
:vartype field_elements: list[Union[~azure.ai.formrecognizer.FormElement, ~azure.ai.formrecognizer.FormWord,
~azure.ai.formrecognizer.FormLine]]
~azure.ai.formrecognizer.FormLine, ~azure.ai.formrecognizer.FormSelectionMark]]
"""

def __init__(self, **kwargs):
@@ -14,7 +14,8 @@
FormTable,
FormTableCell,
FormPageRange,
RecognizedForm
RecognizedForm,
FormSelectionMark
)


@@ -71,6 +72,8 @@ def prepare_content_result(response):
unit=page.unit,
lines=[FormLine._from_generated(line, page=page.page) for line in page.lines] if page.lines else None,
tables=prepare_tables(page_result[idx], read_result),
selection_marks=[FormSelectionMark._from_generated(mark, page.page) for mark in page.selection_marks]
if page.selection_marks else None
)
pages.append(form_page)
return pages
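
Note that in this hunk `selection_marks` is set to `None`, not an empty list, when a page has no selection marks, so iterating over it directly would raise a `TypeError` on such pages. A minimal sketch of a guard on the caller's side, using a hypothetical `Page` stand-in rather than the SDK's `FormPage` (`Page` and `iter_selection_marks` are illustrative names only):

```python
class Page(object):
    """Hypothetical stand-in for FormPage; only selection_marks is modeled."""

    def __init__(self, selection_marks=None):
        self.selection_marks = selection_marks


def iter_selection_marks(page):
    """Yield the page's selection marks, treating None as an empty list."""
    for mark in page.selection_marks or []:
        yield mark


print(list(iter_selection_marks(Page())))
print(list(iter_selection_marks(Page(["mark-1"]))))
```

The same `or []` idiom can be applied inline wherever a caller loops over `selection_marks`.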
@@ -21,7 +21,7 @@ All of these samples need the endpoint to your Form Recognizer resource ([instru
|**File Name**|**Description**|
|----------------|-------------|
|[sample_authentication.py][sample_authentication] and [sample_authentication_async.py][sample_authentication_async]|Authenticate the client|
|[sample_recognize_content.py][sample_recognize_content] and [sample_recognize_content_async.py][sample_recognize_content_async]|Recognize text and table structures of a document|
|[sample_recognize_content.py][sample_recognize_content] and [sample_recognize_content_async.py][sample_recognize_content_async]|Recognize text, selection marks, and table structures in a document|
|[sample_recognize_receipts.py][sample_recognize_receipts] and [sample_recognize_receipts_async.py][sample_recognize_receipts_async]|Recognize data from a file of a US sales receipt using a prebuilt model|
|[sample_recognize_receipts_from_url.py][sample_recognize_receipts_from_url] and [sample_recognize_receipts_from_url_async.py][sample_recognize_receipts_from_url_async]|Recognize data from a URL of a US sales receipt using a prebuilt model|
|[sample_recognize_business_cards.py][sample_recognize_business_cards] and [sample_recognize_business_cards_async.py][sample_recognize_business_cards_async]|Recognize data from a file of a business card using a prebuilt model|
@@ -103,6 +103,13 @@ async def get_bounding_boxes(self):
format_bounding_box(word.bounding_box),
word.confidence
))
elif element.kind == "selectionMark":
print("......Selection mark is '{}' within bounding box '{}' "
"and has a confidence of {}".format(
element.state,
format_bounding_box(element.bounding_box),
element.confidence
))

print("---------------------------------------------------")
print("-----------------------------------")
@@ -78,6 +78,12 @@ async def recognize_content(self):
))
for word in line.words:
print("...Word '{}' has a confidence of {}".format(word.text, word.confidence))
for selection_mark in content.selection_marks:
print("Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
selection_mark.state,
format_bounding_box(selection_mark.bounding_box),
selection_mark.confidence
))
print("----------------------------------------")


@@ -100,7 +100,13 @@ def get_bounding_boxes(self):
format_bounding_box(word.bounding_box),
word.confidence
))

elif element.kind == "selectionMark":
print("......Selection mark is '{}' within bounding box '{}' "
"and has a confidence of {}".format(
element.state,
format_bounding_box(element.bounding_box),
element.confidence
))
print("---------------------------------------------------")
print("-----------------------------------")

@@ -72,6 +72,12 @@ def recognize_content(self):
))
for word in line.words:
print("...Word '{}' has a confidence of {}".format(word.text, word.confidence))
for selection_mark in content.selection_marks:
print("Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
selection_mark.state,
format_bounding_box(selection_mark.bounding_box),
selection_mark.confidence
))
print("----------------------------------------")

