Overview: This is an NLP project that generates tags for a given movie after extracting details about the movie using its IMDb ID.
The project uses BeautifulSoup for web scraping and LangChain to manage the interaction with the OpenAI API for generating and cleaning tags.
For this project, we made three major classes.
The first class extracts the movie details. Initializing an object of this class requires two things:
1. The TMDb API key
2. The OMDb API key
Once the object is initialized, call the extract_data function of this class to get the details of the movie using its IMDb ID.
The function will return a dictionary containing the following details of the movie:
'IMDb ID', 'Title', 'Plot Synopsis (IMDb)', 'Movie Summary (TMDb)', 'About Movie (Wikipedia)', 'Plot Summary (OMDb)', 'Director', 'Cast', 'Genres', 'Keywords'
The class extracts data from multiple sources:
- IMDb: Plot synopsis and basic information.
- TMDb: Movie summary, genres, cast, and keywords.
- OMDb: Plot summary.
- Wikipedia: Detailed plot summary using Wikidata.
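As a rough illustration of how the per-source results could be combined into the dictionary described above, here is a minimal sketch. The helper name `build_movie_record`, the shape of its inputs, and the "first source wins" rule are assumptions made for this example, not the project's actual code. Only the output keys come from the list above, and the sample values are made-up placeholders.

```python
from typing import Any, Dict

# Keys of the dictionary returned by extract_data, as listed above.
RECORD_KEYS = [
    "IMDb ID", "Title", "Plot Synopsis (IMDb)", "Movie Summary (TMDb)",
    "About Movie (Wikipedia)", "Plot Summary (OMDb)",
    "Director", "Cast", "Genres", "Keywords",
]


def build_movie_record(imdb_id: str, *sources: Dict[str, Any]) -> Dict[str, Any]:
    """Merge per-source results into one record (hypothetical helper).

    Each source is a dict holding some of RECORD_KEYS. Earlier sources win
    when two sources supply the same key; missing keys are left as None.
    """
    record: Dict[str, Any] = {key: None for key in RECORD_KEYS}
    record["IMDb ID"] = imdb_id
    for source in sources:
        for key, value in source.items():
            # Ignore unknown keys and never overwrite a value already set.
            if key in record and record[key] is None and value:
                record[key] = value
    return record


# Placeholder values standing in for scraped and API data.
imdb_part = {"Title": "Example Movie", "Plot Synopsis (IMDb)": "A synopsis."}
tmdb_part = {"Movie Summary (TMDb)": "A summary.", "Genres": ["Drama"], "Title": "Other"}
record = build_movie_record("tt0000000", imdb_part, tmdb_part)
```

Keeping every key present, even when a source returns nothing, means later steps can rely on a fixed dictionary shape.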
The second class, TagGenerator, does two things:
1. It generates tags based on the extracted movie details.
2. It then cleans the generated tags, i.e., it removes repeated tags, irrelevant tags, and so on.
Initializing an object of this class requires three things:
- The API key for OpenAI
- The model to use (I have set it to GPT-4 by default; this can be changed)
- The temperature value, which controls the randomness of the generation (I have set it to 0 by default, because we want the output to be as deterministic as possible; this can also be changed)
Once the object is initialized, call the generate_tags function of this class to generate tags based on the movie details.
The function returns a list containing the tags for the movie.
The class uses LangChain to manage the interaction with the OpenAI API, with two prompt templates:
- tags_generator_template: the prompt template used for generating the tags.
- tags_cleaner_template: the prompt template used for cleaning the generated tags.
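The generate-then-clean flow can be sketched without any external dependency. This is only an illustration: the prompt texts below are placeholders (the project's real templates are not reproduced here), the comma-separated reply format is an assumption, and the model is passed in as a plain callable `llm` instead of going through LangChain so that the example runs offline.

```python
from typing import Callable, List

# Placeholder prompt texts; the project's real templates are not shown here.
TAGS_GENERATOR_TEMPLATE = (
    "Generate comma-separated tags for this movie.\n\nDetails:\n{details}"
)
TAGS_CLEANER_TEMPLATE = (
    "Remove repeated or irrelevant tags from this list and return the rest "
    "comma-separated.\n\nTags:\n{tags}"
)


def parse_tags(text: str) -> List[str]:
    """Split a comma-separated reply into stripped, non-empty tags."""
    return [tag.strip() for tag in text.split(",") if tag.strip()]


def generate_tags(details: str, llm: Callable[[str], str]) -> List[str]:
    """Run the two prompts in sequence: generate first, then clean."""
    raw = llm(TAGS_GENERATOR_TEMPLATE.format(details=details))
    cleaned = llm(TAGS_CLEANER_TEMPLATE.format(tags=", ".join(parse_tags(raw))))
    return parse_tags(cleaned)


def fake_llm(prompt: str) -> str:
    """Stand-in for the model so the sketch runs without an API key."""
    if prompt.startswith("Generate"):
        return "heist, thriller, heist, "
    return "heist, thriller"


tags = generate_tags("A crew plans a robbery.", fake_llm)
```

Passing the model in as a callable keeps the two-step logic testable on its own; in the real class, that callable would be whatever LangChain chain wraps the OpenAI model.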
The third class scores the tags. Initializing an object of this class requires only one thing:
1. The API key for OpenAI
This time, I did not add the model name and the temperature as parameters, because I assumed they would be the same as for the previous class.
If the parameters (model name and temperature) are changed in the TagGenerator class, make sure to change them here too.
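One possible way to avoid changing the model name and temperature in two places is to keep them in a single shared object. The sketch below is a suggestion, not how the project is written: `LLMSettings` and `TagScorer` are hypothetical names, and both constructors are simplified stand-ins that only store their arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    """Model parameters shared by both classes (hypothetical helper)."""
    model_name: str = "gpt-4"  # default model described above
    temperature: float = 0.0   # 0 keeps the output as repeatable as possible


class TagGenerator:
    """Simplified stand-in: only stores the key and the shared settings."""

    def __init__(self, api_key: str, settings: LLMSettings = LLMSettings()):
        self.api_key = api_key
        self.settings = settings


class TagScorer:
    """Hypothetical name for the scoring class; same constructor shape."""

    def __init__(self, api_key: str, settings: LLMSettings = LLMSettings()):
        self.api_key = api_key
        self.settings = settings


# Change the parameters once and hand the same object to both classes.
settings = LLMSettings(model_name="gpt-4", temperature=0.2)
generator = TagGenerator("YOUR_OPENAI_API_KEY", settings)
scorer = TagScorer("YOUR_OPENAI_API_KEY", settings)
```

Because the dataclass is frozen, neither class can modify the settings after construction, so the two stay in sync.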
Once the object is initialized, call the score_tags function of this class to score the tags based on the movie details.
The function returns a list containing the scored tags for the movie.
The class also uses LangChain to manage the interaction with the OpenAI API for scoring the tags.
- tags_scoring_template: the prompt template used for scoring the tags.
Keep in mind that the scores will generally be high, because the previous class has already made sure that only the relevant tags remain. Since every remaining tag already looks relevant to the movie details, high scores are expected.
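The scoring step can be sketched in the same dependency-free way. Everything specific here is an assumption for illustration: the prompt text, the 0 to 10 scale, the one `tag: score` pair per line reply format, and the `llm` callable that stands in for the LangChain call. The real tags_scoring_template and output format may differ.

```python
from typing import Callable, List, Tuple

# Placeholder prompt; the real tags_scoring_template is not reproduced here.
TAGS_SCORING_TEMPLATE = (
    "Score each tag from 0 to 10 for relevance to the movie. "
    "Reply with one 'tag: score' pair per line.\n\n"
    "Details:\n{details}\n\nTags:\n{tags}"
)


def parse_scores(reply: str) -> List[Tuple[str, float]]:
    """Parse 'tag: score' lines, skipping anything that does not fit."""
    scored: List[Tuple[str, float]] = []
    for line in reply.splitlines():
        tag, sep, score = line.rpartition(":")
        if not sep or not tag.strip():
            continue  # no colon, or nothing before it
        try:
            scored.append((tag.strip(), float(score)))
        except ValueError:
            continue  # the part after the colon is not a number
    return scored


def score_tags(
    details: str, tags: List[str], llm: Callable[[str], str]
) -> List[Tuple[str, float]]:
    """Ask the model for scores and return them, highest score first."""
    prompt = TAGS_SCORING_TEMPLATE.format(details=details, tags=", ".join(tags))
    return sorted(parse_scores(llm(prompt)), key=lambda pair: pair[1], reverse=True)


def fake_llm(prompt: str) -> str:
    """Stand-in for the model so the sketch runs without an API key."""
    return "thriller: 8.5\nheist: 9\nnot a score line\nromance: n/a"


scored = score_tags("A crew plans a robbery.", ["heist", "thriller"], fake_llm)
```

Skipping malformed lines instead of raising keeps one badly formatted reply line from discarding the scores that did parse.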
The final outputs obtained from running the files have been uploaded to the Outputs folder.