Legal Search Engine

Welcome to the Legal Search Engine project! This search engine is designed to facilitate the retrieval of legal judgments and orders from the Supreme Court of Pakistan. The project involves various tasks, including data collection, vocabulary generation, inverted index construction, interface development, ranking, evaluation, information presentation, and report writing.

Project Overview

[Task 1] Data Collection

The first task involved gathering a textual document collection of legal judgments/orders from the Supreme Court of Pakistan. The corpus consists of 1000 to 1500 judgments/orders categorized into 5 to 10 classes, such as Criminal Cases, Civil Appeals, Human Rights, Suo Moto, and Family Cases.

The categorization is recorded in a plain text file that lists the category names and gives a brief description of each order in the respective category.

[Task 2] Vocabulary Generation

Each document in the collection was processed to build the vocabulary. The vocabulary maps each term to an index number, and a separate plain text file maps each document to its index number.
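The indexing step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the tokenizer, the function names `tokenize` and `build_indexes`, and the sample file names are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_indexes(documents):
    """Assign an integer index to every document name and every distinct term.

    `documents` is a dict of {document name: raw text}.
    """
    doc_index = {name: i for i, name in enumerate(sorted(documents))}
    terms = sorted({t for text in documents.values() for t in tokenize(text)})
    vocab_index = {term: i for i, term in enumerate(terms)}
    return doc_index, vocab_index


# Hypothetical two-document collection
docs = {
    "C.A.1.txt": "The appeal is allowed.",
    "Crl.P.2.txt": "The petition is dismissed.",
}
doc_index, vocab_index = build_indexes(docs)
```

Sorting before numbering keeps the index assignment stable across runs, which matters if later files refer to terms and documents by number.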

[Task 3] Inverted Index Construction

An inverted index was generated for four term weightings, each written to its own plain text file: raw term frequency, log frequency weighting, TF-IDF weighting, and BM25 weighting.
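As a rough sketch of how the four weightings could be computed in one pass, consider the following. The formulas used here are common textbook variants and are assumptions: the project may use different ones. The function name `build_inverted_index` and the defaults `k1=1.2`, `b=0.75` are also illustrative.

```python
import math
from collections import Counter


def build_inverted_index(tokenized_docs, k1=1.2, b=0.75):
    """Return {term: {doc_id: {"tf", "log_tf", "tf_idf", "bm25"}}}.

    `tokenized_docs` is a dict of {doc_id: list of tokens}.
    """
    n_docs = len(tokenized_docs)
    avg_len = sum(len(tokens) for tokens in tokenized_docs.values()) / n_docs
    counts = {doc_id: Counter(tokens) for doc_id, tokens in tokenized_docs.items()}
    # Document frequency: number of documents that contain each term
    df = Counter(term for c in counts.values() for term in c)

    index = {}
    for doc_id, c in counts.items():
        doc_len = len(tokenized_docs[doc_id])
        for term, tf in c.items():
            log_tf = 1 + math.log10(tf)
            idf = math.log10(n_docs / df[term])
            # Smoothed BM25 idf, which stays positive for very common terms
            bm25_idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * doc_len / avg_len)
            index.setdefault(term, {})[doc_id] = {
                "tf": tf,
                "log_tf": log_tf,
                "tf_idf": log_tf * idf,
                "bm25": bm25_idf * tf * (k1 + 1) / (tf + length_norm),
            }
    return index


# Hypothetical tokenized documents
docs = {"d1": ["court", "appeal", "appeal"], "d2": ["court", "petition"]}
index = build_inverted_index(docs)
```

A term that occurs in every document, such as "court" here, gets a TF-IDF weight of zero, while the smoothed BM25 idf still gives it a small positive weight.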

[Task 4] Interface and Queries Benchmark

A simple web interface was developed for searching legal documents. The interface, modeled on the Google search page, lets users enter queries. The query benchmark contains ten queries: five two-term queries and five three-term queries.

[Task 5] Cosine Similarity and Ranking

The application computes the cosine similarity between each document and the query and ranks the documents accordingly. Retrieval is run on 10 queries with both TF-IDF and BM25 weights, and the results are written to plain text files.
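The ranking step can be illustrated with a small sketch over sparse term-weight vectors. The names `cosine_similarity` and `rank`, the sample vectors, and the dict-based vector representation are assumptions for the example; the weights could come from either the TF-IDF or the BM25 scheme.

```python
import math


def cosine_similarity(query_vec, doc_vec):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())
    q_norm = math.sqrt(sum(w * w for w in query_vec.values()))
    d_norm = math.sqrt(sum(w * w for w in doc_vec.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0
    return dot / (q_norm * d_norm)


def rank(query_vec, doc_vecs, top_k=10):
    """Return the top_k (doc_id, score) pairs, highest score first."""
    scores = {d: cosine_similarity(query_vec, v) for d, v in doc_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical document vectors
doc_vecs = {
    "doc_a": {"bail": 1.0, "appeal": 1.0},
    "doc_b": {"bail": 1.0},
    "doc_c": {"custody": 2.0},
}
ranking = rank({"bail": 1.0}, doc_vecs)
```

Because cosine similarity normalizes by vector length, `doc_b` ranks above `doc_a` for the query "bail" even though both contain the term with the same weight.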

[Task 6] Evaluation

The search engine is evaluated using precision, recall, F-measure, average precision, and mean average precision. Two plain text files are provided for each query, showing the required information for both TF-IDF and BM25 weights.
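For reference, the measures above can be computed along the following lines. This is a sketch under stated assumptions: average precision is divided by the total number of relevant documents (one common convention), and the function names and sample relevance judgments are hypothetical.

```python
def precision_recall_f(retrieved, relevant):
    """Set-based precision, recall, and balanced F-measure for one query."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    relevant = set(relevant)
    hits, precision_sum = 0, 0.0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """`runs` is a list of (ranked list, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical ranking and relevance judgments for a single query
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
p, r, f = precision_recall_f(ranked, relevant)
ap = average_precision(ranked, relevant)
```

In this example the relevant documents appear at ranks 1 and 3, so the average precision is the mean of 1/1 and 2/3.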

[Task 7] Information Presentation

The top 10 documents, ranked by cosine similarity between the query and each document, are presented as abstracts in a list of snippets. A word cloud built from the words in those top documents is displayed on the right side of each page.
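One possible way to produce the snippets and the word frequencies that feed a word cloud is sketched below. Everything here is an assumption for illustration: the snippet window, the tiny stopword list, and the function names `make_snippet` and `word_cloud_counts`. The rendering of the cloud itself is left out.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real list would be much longer
STOPWORDS = {"the", "of", "and", "is", "in", "to", "a"}


def make_snippet(text, query_terms, width=60):
    """Return a window of `width` characters around the earliest query term match."""
    lowered = text.lower()
    positions = [lowered.find(term.lower()) for term in query_terms]
    positions = [p for p in positions if p >= 0]
    start = max(min(positions) - width // 2, 0) if positions else 0
    return text[start:start + width].strip()


def word_cloud_counts(texts, top_n=20):
    """Count non-stopword terms across the top documents, most frequent first."""
    words = (w for text in texts for w in re.findall(r"[a-z]+", text.lower()))
    return Counter(w for w in words if w not in STOPWORDS).most_common(top_n)


# Hypothetical top-ranked documents for a query
top_docs = ["The bail petition is dismissed.", "Bail is granted in the appeal."]
snippet = make_snippet(top_docs[0], ["petition"], width=20)
counts = dict(word_cloud_counts(top_docs))
```

The resulting `(word, count)` pairs can be passed to whatever component draws the cloud, with the count controlling the font size.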

[Task 8] Report Writing

The report documents the assignment in full, covering the methodology, results, and findings.

Getting Started

To run the Legal Search Engine locally, follow these steps:

Clone the repository: git clone https://github.com/hzaheer48/LegalSearchEngine.git
Navigate to the project directory: cd LegalSearchEngine
Run the web interface: python web_interface.py

