Skip to content
This repository has been archived by the owner on Jun 15, 2024. It is now read-only.

CrucibleSDS/tungsten

Repository files navigation

Tungsten

Tungsten

A material safety data sheet parser.

Installation

Tungsten is available on PyPi via pip. To install, run the following command:

pip install tungsten-sds

Note

Currently, the version of tabula-java (1.0.5) included with tabula-py (2.5.1) by default is inadequate as it does not provide page number metadata. To fix this, you must create a custom build of the tabula-java JAR file. To do this, follow the instructions in the tabula-java repository: https://github.com/tabulapdf/tabula-java#building-from-source . Commit 50ff2df2e62644260d519e2d875a4db7d87d6746 has been tested to work with Tungsten. To enable this custom build, set the TABULA_JAR environment variable to the path of the JAR file.

Usage Example

from pathlib import Path

from tungsten import SigmaAldrichSdsParser

sds_parser = SigmaAldrichSdsParser()
sds_path = Path("sigma_aldrich_w4502.pdf")

# Convert PDF file to parsed data
with open(sds_path, "rb") as f:
    sds = sds_parser.parse_to_ghs_sds(f, sds_name=sds_path.stem)

# Serialize parsed data to JSON and dump to a file
with open(sds_path.stem + ".json", "w") as f:
    sds.dump(f)

License

This work is licensed under MIT. Media assets in the assets directory are licensed under a Creative Commons Attribution-NoDerivatives 4.0 International Public License.

Notes

This library currently comes bundled with a new build of tabula-java, which is also licensed under MIT, to see the full license, see https://github.com/tabulapdf/tabula-java/blob/master/LICENSE.