Question: Can Unstructured track hierarchical header info, such as H1, H2, H3, etc.? #1354

@aaronsteers

HTML and Markdown both have a concept of nested, hierarchical headers, and I think it would be helpful to track that hierarchy for TitleText elements.

Does Unstructured currently support any of the following?

  1. Identifying the hierarchical level of a title during parsing.
  2. Storing this hierarchy level (H1, H2, etc.) in the metadata.
  3. Exposing the hierarchy of relevant headers on sub-elements, for example in the metadata of a NarrativeText element.
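
To make the three points above concrete, here is a minimal Python sketch of the behavior being asked about. It is not Unstructured's API. `Element`, `partition_with_headers`, and the metadata keys `header_level`, `parent_headers`, and `header_path` are all hypothetical names chosen for illustration. It assumes Markdown input with ATX-style headers (`#` through `######`) and ignores code fences and Setext headers.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical sketch only: none of these names come from Unstructured.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.+)$")


@dataclass
class Element:
    category: str  # "Title" or "NarrativeText"
    text: str
    metadata: dict = field(default_factory=dict)


def partition_with_headers(markdown: str) -> list[Element]:
    """Split Markdown into elements and record header hierarchy in metadata."""
    elements: list[Element] = []
    stack: list[tuple[int, str]] = []  # (level, text) of the currently open headers
    paragraph: list[str] = []

    def flush() -> None:
        # Emit the buffered paragraph with the headers it sits under (point 3).
        if paragraph:
            path = [text for _, text in stack]
            elements.append(
                Element("NarrativeText", " ".join(paragraph), {"header_path": path})
            )
            paragraph.clear()

    for line in markdown.splitlines():
        stripped = line.strip()
        match = HEADER_RE.match(stripped)
        if match:
            flush()
            level = len(match.group(1))  # point 1: "#" count gives the level
            title = match.group(2)
            # A new header closes any open header at the same or a deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            parents = [text for _, text in stack]
            stack.append((level, title))
            # Point 2: store the level (and ancestors) on the title itself.
            elements.append(
                Element("Title", title, {"header_level": level, "parent_headers": parents})
            )
        elif stripped:
            paragraph.append(stripped)
        else:
            flush()
    flush()
    return elements


SAMPLE = """# Guide

Intro text.

## Install

Run the installer.

### Linux

Use the package manager.

## Usage

Call the API.
"""

if __name__ == "__main__":
    for element in partition_with_headers(SAMPLE):
        print(element.category, repr(element.text), element.metadata)
```

With something like this, a downstream consumer could filter or group NarrativeText elements by the section they belong to, which is the main use case behind the question.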

Totally understandable if this is out of scope or not relevant to mainstream use cases. I just wanted to understand whether Unstructured can do this today and, if not, whether there are plans to add something like it in the future.

Thanks!
