Commit

fix: ValueError when converting cells to html (#359)
This PR addresses #357 and #358.

### Summary
- Add logic to validate the input parameter of `fill_cells()`. The function now checks whether the input is a list of dictionaries before processing.
- Correct the type hint for the parameter `cells` in `table_cells_to_dataframe()`.
christinestraub authored Jun 21, 2024
1 parent 0911892 commit 662571a
Showing 4 changed files with 12 additions and 2 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,4 +1,9 @@
## 0.7.36

fix: add input parameter validation to `fill_cells()` when converting cells to html

## 0.7.35

Fix syntax for generated HTML tables

## 0.7.34
2 changes: 1 addition & 1 deletion unstructured_inference/__version__.py
@@ -1 +1 @@
-__version__ = "0.7.35"  # pragma: no cover
+__version__ = "0.7.36"  # pragma: no cover
4 changes: 3 additions & 1 deletion unstructured_inference/inference/layoutelement.py
@@ -214,7 +214,9 @@ def reduce(keep: Rectangle, reduce: Rectangle):
             reduce(keep=region_b, reduce=region_a)


-def table_cells_to_dataframe(cells: dict, nrows: int = 1, ncols: int = 1, header=None) -> DataFrame:
+def table_cells_to_dataframe(
+    cells: List[dict], nrows: int = 1, ncols: int = 1, header=None
+) -> DataFrame:
     """convert table-transformer's cells data into a pandas dataframe"""
     arr = np.empty((nrows, ncols), dtype=object)
     for cell in cells:
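
The corrected hint matches how the loop above consumes `cells`: each element is itself indexed as a dictionary. A minimal sketch of why `List[dict]` describes the input better than `dict`; `collect_row_nums` is a hypothetical helper, and the `"row_nums"` key is assumed here from its use in `fill_cells()`:

```python
from typing import List


def collect_row_nums(cells: List[dict]) -> List[int]:
    """Hypothetical helper: iterate cells and index each one as a dict."""
    rows: List[int] = []
    for cell in cells:
        rows.extend(cell["row_nums"])
    return rows


# A list of cell dictionaries iterates cell by cell.
cells = [
    {"row_nums": [0], "column_nums": [0]},
    {"row_nums": [1], "column_nums": [0]},
]
print(collect_row_nums(cells))

# A single dict iterates over its string keys, so indexing each "cell"
# with a string key raises TypeError.
try:
    collect_row_nums({"row_nums": [0], "column_nums": [0]})
except TypeError as exc:
    print(f"TypeError: {exc}")
```

The change to the signature is annotation-only; it does not alter runtime behavior.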
3 changes: 3 additions & 0 deletions unstructured_inference/models/tables.py
@@ -664,6 +664,9 @@ def fill_cells(cells: List[dict]) -> List[dict]:
         whether this cell is a column header
     """
+    if not cells:
+        return []
+
     table_rows_no = max({row for cell in cells for row in cell["row_nums"]})
     table_cols_no = max({col for cell in cells for col in cell["column_nums"]})
     filled = np.zeros((table_rows_no + 1, table_cols_no + 1), dtype=bool)
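
The early return matters because `max()` raises `ValueError` when given an empty collection, which is presumably the error named in the commit title. A minimal sketch of that failure and of the guard; `max_row_index` and `fill_cells_guarded` are hypothetical stand-ins, not the library's functions, and only the lines shown in the hunk are reproduced:

```python
from typing import List


def max_row_index(cells: List[dict]) -> int:
    """Hypothetical: the first computation in fill_cells(), without the guard."""
    return max({row for cell in cells for row in cell["row_nums"]})


def fill_cells_guarded(cells: List[dict]) -> List[dict]:
    """Hypothetical stand-in that reproduces only the new early return."""
    if not cells:
        return []
    # The real function goes on to pad missing cells; omitted in this sketch.
    return cells


# Without the guard, an empty list reaches max() and raises ValueError.
try:
    max_row_index([])
except ValueError as exc:
    print(f"ValueError: {exc}")

# With the guard, an empty input short-circuits to an empty result.
print(fill_cells_guarded([]))
```

Note that `if not cells` also catches `None`, so a missing cell list takes the same early return as an empty one.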