This repository contains various Indian scriptures in a structured .csv format. Each file contains the verses in the original Sanskrit along with their verse numbers.
The data folder contains both raw and processed data. The raw data is the direct output of the Scrapy spiders; the processed data is the result of the additional processing done in the notebooks.
The notebooks folder contains the notebooks used to create the processed dataset.
The scriptures folder is a Scrapy project containing the spiders that scrape the data from the web.
The project aims to provide Indian scriptures in a format that is suitable for text mining and natural language processing. If you would like to propose any changes, kindly send a pull request.
All the files are scraped from https://www.upanishads.iitk.ac.in using the Scrapy framework.
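
As a rough sketch of how the verse files might be read for text mining, the snippet below parses a CSV with Python's standard csv module. The column names `verse_number` and `verse`, the helper `load_verses`, and the sample rows are assumptions made for illustration, not taken from the dataset; check the header of the actual file before using it.

```python
import csv
import io


def load_verses(csv_file):
    """Return a list of (verse_number, verse) tuples from an open CSV file.

    Assumes a header row with columns named `verse_number` and `verse`.
    These names are hypothetical; adjust them to match the real header.
    """
    reader = csv.DictReader(csv_file)
    return [(row["verse_number"], row["verse"]) for row in reader]


# In-memory sample standing in for a real file from the data folder.
# The rows are illustrative text only, not copied from the dataset.
sample = io.StringIO(
    "verse_number,verse\n"
    "1,ॐ पूर्णमदः पूर्णमिदं\n"
    "2,ईशा वास्यमिदं सर्वं\n"
)
verses = load_verses(sample)
print(len(verses))
```

For a file on disk, the same function can be given the result of `open(path, encoding="utf-8", newline="")`, where `path` points to one of the .csv files; opening with UTF-8 explicitly keeps the Devanagari text intact on every platform.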