ElasticSearch-Movie-Search-Engine

Project Description

This project is a movie search engine that lets users search for movies by attributes such as title, actors, and genre, or by free-text keywords. It uses Elasticsearch for search and Sentence Transformers to embed movie descriptions. The application consists of three main components:

Data Cleaning: Cleans and preprocesses the movie dataset.
Embedding and Storing: Generates embeddings for the cleaned data and stores them in Elasticsearch.
Search Application: Provides a user interface for searching movies.

Features

Data Cleaning: Handles missing values, converts data types, and cleans text fields.
Embedding Generation: Uses Sentence Transformers to generate embeddings for movie descriptions.
Elasticsearch Integration: Stores movie data and embeddings in Elasticsearch for fast and efficient search.
Search Interface: A Streamlit-based web application for searching movies.

Requirements

Python 3.7 or higher
Pandas
Sentence Transformers
Elasticsearch Python client
Streamlit

Installation

Clone the repository:

```
git clone https://github.com/meggitt/ElasticSearch-Movie-Search-Engine.git
cd ElasticSearch-Movie-Search-Engine
```

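Install the required packages:
```
pip install -r requirements.txt
# or, if the repository has no requirements.txt:
pip install pandas sentence-transformers elasticsearch streamlit
```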
Movie dataset: the repository includes imdb_top_1000.csv; make sure it is in the root directory.

Configuration

Configure Elasticsearch:
- Create an example.ini file with your Elasticsearch deployment's cloud ID and API key (Elastic Cloud offers a 14-day free trial):
```
[DEFAULT]
cloud_id = "DeploymentCloudID"
apikey_id = "API Key ID"
apikey_key = "API Key"
```
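A minimal sketch of reading this file, assuming Python's standard configparser module and version 8.x of the elasticsearch client; load_es_settings and make_client are hypothetical names, not functions from this repository. configparser keeps the double quotes shown above as part of each value, so the sketch strips them.

```python
import configparser


def load_es_settings(path: str = "example.ini") -> dict:
    """Read cloud_id, apikey_id and apikey_key from the [DEFAULT] section."""
    parser = configparser.ConfigParser()
    # read() returns the list of files it parsed; an empty list means the file is missing.
    if not parser.read(path):
        raise FileNotFoundError(f"config file not found: {path}")
    section = parser["DEFAULT"]
    # configparser keeps surrounding quotes, so strip them from each value.
    return {key: section[key].strip('"') for key in ("cloud_id", "apikey_id", "apikey_key")}


def make_client(settings: dict):
    # Imported lazily so the config parsing above works without the client installed.
    from elasticsearch import Elasticsearch

    # The 8.x client accepts the API key as an (id, key) tuple.
    return Elasticsearch(
        cloud_id=settings["cloud_id"],
        api_key=(settings["apikey_id"], settings["apikey_key"]),
    )
```

Keeping parsing and client construction separate makes it easy to check the configuration before any network call is made.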

Running the Application

Step 1: Data Cleaning

Run the data cleaning script:
```
python clean_data.py
```
This script will clean the dataset and save it as cleaned_dataset.csv.
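The exact rules in clean_data.py are not described here. The sketch below illustrates the kinds of steps listed under Features (missing values, type conversion, text cleanup), assuming the column names of the widely used Kaggle IMDB Top 1000 CSV (Series_Title, Released_Year, Runtime, Gross, Star1 to Star4, and so on). clean_movies is a hypothetical helper, and filling a missing Certificate with "Unrated" is an arbitrary choice.

```python
import pandas as pd

# Assumed text columns (Kaggle "IMDB Top 1000" layout); adjust to the actual CSV.
TEXT_COLUMNS = ["Series_Title", "Genre", "Overview", "Director", "Star1", "Star2", "Star3", "Star4"]


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()

    # "142 min" -> 142; values without digits become NaN.
    df["Runtime"] = pd.to_numeric(
        df["Runtime"].astype(str).str.extract(r"(\d+)", expand=False), errors="coerce"
    )

    # "28,341,469" -> 28341469.0; missing grosses stay NaN.
    df["Gross"] = pd.to_numeric(
        df["Gross"].astype(str).str.replace(",", "", regex=False), errors="coerce"
    )

    # Non-numeric years are coerced to missing; Int64 is pandas' nullable integer type.
    df["Released_Year"] = pd.to_numeric(df["Released_Year"], errors="coerce").astype("Int64")

    # Text fields: fill missing values with "" and trim surrounding whitespace.
    for col in TEXT_COLUMNS:
        if col in df.columns:
            df[col] = df[col].fillna("").astype(str).str.strip()

    if "Certificate" in df.columns:
        df["Certificate"] = df["Certificate"].fillna("Unrated")
    return df


# Usage (reads and writes the files named in this step):
# clean_movies(pd.read_csv("imdb_top_1000.csv")).to_csv("cleaned_dataset.csv", index=False)
```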

Step 2: Embedding and Storing

Run the embedding and storing script:
```
python embed_and_store_data.py
```
This script will generate embeddings for the cleaned data and store them in Elasticsearch.
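A sketch of this step, assuming the 8.x elasticsearch client, the all-MiniLM-L6-v2 Sentence Transformers model (384-dimensional vectors), the Overview column as the movie description, and a hypothetical index name movies. build_actions and main are illustrative names, not taken from embed_and_store_data.py.

```python
import math
from typing import Dict, Iterable, Iterator, List

INDEX_NAME = "movies"            # hypothetical index name
MODEL_NAME = "all-MiniLM-L6-v2"  # assumed model; it produces 384-dimensional vectors
EMBEDDING_DIMS = 384

# Only the vector field is mapped explicitly; other columns use dynamic mapping.
INDEX_MAPPINGS = {
    "properties": {
        "embedding": {
            "type": "dense_vector",
            "dims": EMBEDDING_DIMS,
            "index": True,
            "similarity": "cosine",
        }
    }
}


def build_actions(records: List[Dict], vectors: Iterable[List[float]],
                  index: str = INDEX_NAME) -> Iterator[Dict]:
    """Pair each cleaned row with its embedding as an action for helpers.bulk."""
    for i, (record, vector) in enumerate(zip(records, vectors)):
        # Drop NaN values: NaN is not valid JSON, so it should not be sent.
        source = {k: v for k, v in record.items()
                  if not (isinstance(v, float) and math.isnan(v))}
        source["embedding"] = list(vector)
        yield {"_index": index, "_id": i, "_source": source}


def main() -> None:
    # Heavy dependencies are imported here so build_actions can be used on its own.
    import configparser

    import pandas as pd
    from elasticsearch import Elasticsearch, helpers
    from sentence_transformers import SentenceTransformer

    cfg = configparser.ConfigParser()
    cfg.read("example.ini")
    conf = {k: v.strip('"') for k, v in cfg["DEFAULT"].items()}
    es = Elasticsearch(cloud_id=conf["cloud_id"],
                       api_key=(conf["apikey_id"], conf["apikey_key"]))

    df = pd.read_csv("cleaned_dataset.csv")
    model = SentenceTransformer(MODEL_NAME)
    # encode() returns a NumPy array; tolist() gives plain Python floats for JSON.
    vectors = model.encode(df["Overview"].fillna("").tolist()).tolist()

    if not es.indices.exists(index=INDEX_NAME):
        es.indices.create(index=INDEX_NAME, mappings=INDEX_MAPPINGS)
    helpers.bulk(es, build_actions(df.to_dict(orient="records"), vectors))
```

Calling main() runs the whole step. Mapping the embedding field explicitly as dense_vector with cosine similarity is what allows the search step to run kNN queries against it.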

Step 3: Search Application

Run the search application:
```
streamlit run search_app.py
```
Open the URL that Streamlit prints (by default http://localhost:8501) in your browser to access the search interface.
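A minimal sketch of what a Streamlit front end like search_app.py could look like. format_hit, render, and the search_fn callback are hypothetical, and the field names assume the Kaggle IMDB Top 1000 columns (Series_Title, Released_Year, Genre, Overview).

```python
from typing import Callable, Dict, List


def format_hit(hit: Dict) -> str:
    """Turn one Elasticsearch hit into a short Markdown line."""
    src = hit.get("_source", {})
    title = src.get("Series_Title", "Untitled")
    year = src.get("Released_Year")
    if isinstance(year, float):
        # Years read back from CSV may be floats; NaN.is_integer() is False, so NaN is dropped.
        year = int(year) if year.is_integer() else None
    line = f"**{title}**"
    if year not in (None, ""):
        line += f" ({year})"
    genre = src.get("Genre", "")
    if genre:
        line += f" | {genre}"
    return line


def render(search_fn: Callable[[str], List[Dict]]) -> None:
    """Minimal Streamlit page; search_fn(query) is assumed to return a list of hits."""
    import streamlit as st  # imported here so format_hit works without Streamlit

    st.title("Movie Search")
    query = st.text_input("Search by title, actor, genre, or keyword")
    if not query:
        return
    hits = search_fn(query)
    if not hits:
        st.info("No matching movies found.")
    for hit in hits:
        st.markdown(format_hit(hit))
        overview = hit.get("_source", {}).get("Overview")
        if overview:
            st.caption(overview)
```

In the app script, render would be called at module level with a function that runs the Elasticsearch query and returns response["hits"]["hits"]. Streamlit re-runs the whole script on every input change, so no explicit event loop is needed.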

Security Considerations

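Ensure Elasticsearch is securely configured (authentication enabled, no anonymous access) to prevent unauthorized access.
Keep your API keys and other secrets out of version control; do not commit an example.ini that contains real credentials.
Validate and sanitize user input, and pass it to Elasticsearch as a plain query value rather than as raw query syntax, to prevent injection attacks.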
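A minimal input-sanitizing sketch; sanitize_query and the 200-character limit are hypothetical. Sanitizing is only part of the defense: passing the cleaned text as the value of a match or multi_match query, rather than a query_string query, means Elasticsearch treats it as plain text and does not interpret Lucene operators such as AND, wildcards, or field:value.

```python
import re
import unicodedata

MAX_QUERY_LENGTH = 200  # hypothetical limit


def sanitize_query(raw: str, max_length: int = MAX_QUERY_LENGTH) -> str:
    """Normalize user input before it is used as a search value."""
    # NFKC folds compatibility characters, e.g. full-width letters, to their plain forms.
    text = unicodedata.normalize("NFKC", raw)
    # Drop control characters (category Cc) such as NUL, but keep whitespace for now.
    text = "".join(ch for ch in text if ch.isspace() or unicodedata.category(ch) != "Cc")
    # Collapse runs of whitespace (tabs, newlines) into single spaces and trim.
    text = re.sub(r"\s+", " ", text).strip()
    return text[:max_length]
```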

Scenarios Handled

Search by Title: Users can search for movies by entering the title.
Search by Actor: Users can search for movies by entering an actor's name.
Search by Genre: Users can search for movies by entering a genre.
Search by Keywords: Users can search using any relevant keywords.
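One way to cover all four scenarios with a single request is a multi_match query across title, actor, director, genre, and overview fields, optionally combined with a kNN clause over the stored embeddings for semantic keyword search. This is a sketch, assuming Elasticsearch 8.4 or later (top-level knn in search requests), an embedding field named embedding, and the Kaggle IMDB Top 1000 column names; build_search_body is hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical field names; "^3" boosts matches in the title.
SEARCH_FIELDS = ["Series_Title^3", "Star1", "Star2", "Star3", "Star4", "Director", "Genre", "Overview"]


def build_search_body(text: str, query_vector: Optional[List[float]] = None,
                      size: int = 10) -> Dict:
    """Build keyword arguments for es.search covering title, actor, genre and keyword search."""
    body: Dict = {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": SEARCH_FIELDS,
                "fuzziness": "AUTO",  # tolerate small typos in titles and names
            }
        },
    }
    if query_vector is not None:
        # Optional semantic part: kNN over the stored description embeddings.
        body["knn"] = {
            "field": "embedding",
            "query_vector": query_vector,
            "k": size,
            "num_candidates": 10 * size,
        }
    return body
```

With the 8.x client this could be used as es.search(index="movies", **build_search_body(q, model.encode(q).tolist())). When both query and knn are present, Elasticsearch combines the lexical and vector scores, so exact title or actor matches and semantically similar descriptions can both rank highly.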

Contributions

Contributions are welcome! Please create an issue or submit a pull request with your changes.
