
Neth Pehesara Botheju
Second-year Computer Science student at the University of Westminster.
Movie Prediction Application
Project Overview
The Movie Prediction Application predicts a movie from the user's summary, or from whatever they remember about the film. The goal is to create an accurate predictor.
Features
- Users can submit a summary through the frontend web application and receive predictions in a user-friendly interface without needing to work with the CLI.
- Users will receive the top 3 predicted movies based on the submitted summary, along with a similarity score for each movie.
- The frontend will display additional metadata about each predicted movie, such as the release year and genres, so users can confirm the match.
Technical Implementation
Dataset
I couldn't find a dataset on the internet that includes movie data along with the plot as a column for each record. So, I merged multiple datasets to create the final dataset.
- First, I obtained three datasets: the IMDb movie dataset, the Wikipedia movie dataset, and a dataset containing movie plots with Wikipedia movie IDs.
- I merged the IMDb and Wikipedia datasets on movie name and release year, then merged the result with the Wikipedia movie plot dataset on the Wikipedia movie ID.
- During model training, after text cleaning, I generated embeddings using the all-MiniLM-L6-v2 model.
- I fine-tuned the model using triplet loss (anchor = summary text, positive = correct title, negative = random title).
- I built a FAISS index with the embeddings and saved it for future predictions.
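The two-step merge described in the dataset bullets above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the field names (`title`, `year`, `wiki_id`, `plot`), the placeholder records, and the `normalise` and `merge_datasets` helpers are all assumptions made for the example.

```python
# Placeholder records; every field name and value here is hypothetical.
imdb_rows = [
    {"title": "Example Movie", "year": 2010, "genres": "Drama"},
    {"title": "The Remake", "year": 1995, "genres": "Crime"},
    {"title": "Unmatched Film", "year": 2001, "genres": "Comedy"},
]
wiki_rows = [
    {"title": "example movie", "year": 2010, "wiki_id": 101},
    {"title": "The Remake", "year": 1995, "wiki_id": 202},
    {"title": "The Remake", "year": 1986, "wiki_id": 303},  # same name, other year
]
plot_rows = [
    {"wiki_id": 101, "plot": "Plot text for the example movie."},
    {"wiki_id": 202, "plot": "Plot text for the 1995 remake."},
]


def normalise(title):
    """Lower-case and collapse whitespace so titles compare reliably."""
    return " ".join(title.lower().split())


def merge_datasets(imdb_rows, wiki_rows, plot_rows):
    """Inner-join IMDb and Wikipedia rows on (title, year), then add plots by wiki_id."""
    wiki_by_key = {(normalise(r["title"]), r["year"]): r for r in wiki_rows}
    plot_by_id = {r["wiki_id"]: r["plot"] for r in plot_rows}

    merged = []
    for row in imdb_rows:
        wiki = wiki_by_key.get((normalise(row["title"]), row["year"]))
        if wiki is None:
            continue  # no Wikipedia record with the same title and year
        plot = plot_by_id.get(wiki["wiki_id"])
        if plot is None:
            continue  # matched record has no plot text
        merged.append({**row, "wiki_id": wiki["wiki_id"], "plot": plot})
    return merged


merged = merge_datasets(imdb_rows, wiki_rows, plot_rows)
for record in merged:
    print(record["title"], record["year"], record["wiki_id"])
```

Joining on the title alone would be ambiguous for films that share a name, which is why the release year is part of the key in the first step.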
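To make the triplet objective mentioned above more concrete, here is a small sketch of how the loss behaves for a single (anchor, positive, negative) triple. The distance metric and margin used in the project are not stated, so cosine distance and a margin of 0.5 are assumptions, and the two-dimensional vectors are toy stand-ins for real sentence embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Penalise triples where the positive is not closer than the negative by `margin`."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings: anchor = summary text, positive = correct title, negative = random title.
summary = [1.0, 0.0]
correct_title = [1.0, 0.0]
random_title = [0.0, 1.0]

well_separated = triplet_loss(summary, correct_title, random_title)
badly_separated = triplet_loss(summary, random_title, correct_title)
print(well_separated, badly_separated)
```

The loss is zero once the correct title is closer to the summary than the random title by at least the margin, and it grows when the random title is the closer one.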
Backend
The backend is built with Python and the Flask framework. It predicts movies by running efficient similarity searches against the pre-saved FAISS index, and it exposes APIs that accept the client's request and return the predictions in the response.
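The lookup the backend performs can be illustrated with a brute-force version of the same idea. This sketch deliberately does not use FAISS: it scores the query against every stored vector with cosine similarity and keeps the three best, which is the result a FAISS search approximates or computes far more efficiently on a large index. The `top_k` and `unit` helpers, the placeholder titles, and the three-dimensional vectors are assumptions for the example.

```python
import math


def unit(vec):
    """Scale a non-zero vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query, index, k=3):
    """Return the k (title, similarity) pairs most similar to the query embedding."""
    q = unit(query)
    scored = []
    for title, vec in index:
        v = unit(vec)
        scored.append((title, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical index entries: (title, plot embedding).
toy_index = [
    ("Movie A", [1.0, 0.0, 0.0]),
    ("Movie B", [0.9, 0.1, 0.0]),
    ("Movie C", [0.0, 1.0, 0.0]),
    ("Movie D", [0.0, 0.0, 1.0]),
]

# Stand-in for the embedding of the user's summary.
results = top_k([1.0, 0.05, 0.0], toy_index)
for title, score in results:
    print(f"{title}: {score:.3f}")
```

Each returned pair corresponds to one predicted movie and its similarity score, matching the top 3 results shown to the user.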
Frontend
The frontend of the application is developed using Vue.js, providing a responsive and interactive user interface. It communicates with the backend by connecting to the exposed API endpoints, allowing users to send their prompts and receive movie predictions in real time.
Future Enhancements
- Collect additional information from the user if they know it (e.g., actors, genre, year), which can help make more accurate predictions.
- Testing suggests that the model overfits and does not fully capture the meaning of the input sentences. Future work will focus on improving how well the model generalizes.
- Display movie posters alongside the predicted movies for a better user experience.