1. Embedded images are extracted to a dedicated folder, which I have observed for some of the documents. However, some graphical images in the PDF below are not extracted to a separate folder.
2. The PDF also contains superscripts, which are not referenced.

[sample_document.pdf](https://github.com/user-attachments/files/16749392/sample_document.pdf)