Introduction
Have you ever been unsure which movie to watch next, wanted something from a particular genre, or looked for a film similar to your favourite one?
Well, why not build our own Movie Recommender that takes a text query as input and returns a list of movies matching that query?
This use case can be treated as a text search problem: the search system takes a text query as input and looks for movies in the database that are similar to it.
How does it Work?
This recommender uses textual data to find the most relevant movies according to the text provided. We provide the name, description, and genre of the movies to the model from our database.
Technical Stack → Jina AI, REST API, Dart
Database Used → IMDB Movies Dataset
Architecture Diagram
Following is the step-by-step walkthrough of the application logic:
- Fetch the Movies Dataset from Kaggle.
- Add the data to Jina’s DocumentArray for further pre-processing and indexing.
- Pass the DocumentArray to Jina’s Flow for indexing of data using Jina Hub Executors.
- The search flow will encode the input query and search through the indexed data for the nearest match.
- After finding the best possible matches, the results are returned through a REST API, which can be consumed by various frontend frameworks.
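The steps above can be sketched without Jina to show what "encode, index, search for the nearest match" means. This is only a toy illustration: the bag-of-words `embed` function stands in for the neural encoder, cosine similarity is assumed as the ranking measure, and the sample movies and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy encoder: a bag-of-words count vector (stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(movies):
    """Index step: store each movie alongside its vector."""
    return [(movie, embed(movie["text"])) for movie in movies]


def search(index, query, limit=10):
    """Search step: encode the query and return the closest movies first."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [movie for movie, _ in ranked[:limit]]


# Hypothetical sample data, not taken from the real dataset.
sample_movies = [
    {"title": "Space Trip", "text": "astronauts travel to a distant planet sci-fi"},
    {"title": "Laugh Out Loud", "text": "two friends start a comedy club comedy"},
    {"title": "Cold Case", "text": "a detective reopens an old murder case crime"},
]

index = build_index(sample_movies)
top = search(index, "funny comedy about friends", limit=2)
print([m["title"] for m in top])
```

A word-count vector only matches exact words, whereas a pre-trained sentence encoder can also match queries that share meaning but not vocabulary, which is why the real application relies on one.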
Code Walkthrough
Python
We will use Python to build our Jina AI backend. First, let's install Jina:
pip install jina
You may want to use a virtual environment for this.
The key components of this application are building a Flow, indexing, and search functionality. Let’s look at each of them one by one:
Create a document → Now we need to take our data and convert it into a DocArray type. Let's make a file named helper.py. DocArray is a library for nested, unstructured data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer the data with a Pythonic API.
from random import randint

from docarray.document.generators import from_csv
from jina import DocumentArray

# Map the CSV's Summary column to each Document's text field.
with open("Python_Code/data/Movies_Reduced.csv") as file:
    movies = DocumentArray(from_csv(file, field_resolver={'Summary': 'text'}))

# shuffle() expects an integer seed, so call randint instead of passing the function itself.
movies = movies.shuffle(seed=randint(0, 10000))

# Append genre and title to the summary, separated by spaces, so they are encoded together.
for i in range(len(movies)):
    movies[i].text = f"{movies[i].text} {movies[i].tags['Genres']} {movies[i].tags['Title']}"
Here, we feed the variable movies a .csv file and map the Summary column of the dataset to the text field of each Document. We then append the genre and title of each movie to its summary. Now let's move on to the model.
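To make the column mapping concrete, here is a minimal sketch using only Python's standard csv module instead of DocArray. The two sample rows and the `load_movies` helper are hypothetical; the column names Title, Genres, and Summary are assumed to match the ones used in helper.py.

```python
import csv
import io

# Hypothetical two-row sample; the real file has many more rows.
SAMPLE_CSV = """Title,Genres,Summary
Space Trip,Sci-Fi,Astronauts travel to a distant planet.
Laugh Out Loud,Comedy,Two friends start a comedy club.
"""


def load_movies(file_obj):
    """Read CSV rows into dicts with a 'text' field and a 'tags' dict."""
    movies = []
    for row in csv.DictReader(file_obj):
        # 'Summary' becomes the text field; the other columns stay as tags.
        summary = row.pop("Summary")
        text = " ".join([summary, row["Genres"], row["Title"]])
        movies.append({"text": text, "tags": row})
    return movies


movies = load_movies(io.StringIO(SAMPLE_CSV))
print(movies[0]["text"])
```

Joining the parts with spaces keeps the last word of the summary from being glued to the genre, which would otherwise produce a token the encoder has never seen.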
Create a Flow → The Flow consists of two Executors: TransformerTorchEncoder and SimpleIndexer. These encode and index our textual data. We will expose HTTP port 12345 so that our app can reach the Jina AI backend. Jina AI also provides a Swagger UI as a testing environment, so we will not need Postman to test our API. We will use a pre-trained model to cut down on training time and get better results. All the indexed data is stored in a database named index.db, which we will query to get our results.
from jina import Flow

flow = (
    Flow(port_expose='12345', protocol='http')
    .add(
        uses="jinahub://TransformerTorchEncoder",
        uses_with={
            "pretrained_model_name_or_path":
                "sentence-transformers/paraphrase-distilroberta-base-v1"
        },
        name="encoder",
        install_requirements=True
    )
    .add(
        uses="jinahub://SimpleIndexer/latest",
        uses_metas={"workspace": "workspace"},
        volumes="./workspace:/workspace/workspace",
        name="indexer"
    )
)
Create an Index function → The index function takes the movie dataset in text format, already converted into Jina’s native DocumentArray, and passes it to the Flow for indexing. We provide the input as a DocumentArray, and this is where the processing starts: since the encoder is pre-trained, the step mainly computes and stores embeddings. Once indexing has finished, our Python script starts listening for API calls and returns results as responses.
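with flow as f:
    # Send the movies through the encoder and the indexer.
    f.post(on="/index", inputs=movies, show_progress=True)
    f.post(on="/", show_progress=True)
    f.cors = True
    # Keep the Flow alive so it can serve HTTP requests until interrupted.
    f.block()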
We are finally done with the Python implementation. Let's see how we can call our API from Dart/Flutter.
Dart
Create a search function → The search function takes the text input and makes an HTTP POST request to fetch similar movie titles from the Jina backend. The following code is written in Dart for an enhanced user experience. We have to use the address of the machine running the backend together with our assigned port. Then we build a JSON body containing our query and its parameters. After that, we send the POST request and wait for the results.
import 'dart:convert';
import 'package:http/http.dart';

makePostRequest() async {
  final uri = Uri.parse('http://192.168.1.9:12345/search');
  final headers = {'Content-Type': 'application/json'};
  var final_data = [];
  Map<String, dynamic> body = {
    "data": [
      {"text": "comedy"}
    ],
    "parameters": {"limit": 10}
  };
  String jsonBody = json.encode(body);
  final encoding = Encoding.getByName('utf-8');
  Response response = await post(
    uri,
    headers: headers,
    body: jsonBody,
    encoding: encoding,
  );
  int statusCode = response.statusCode;
  String responseBody = response.body;
  print(statusCode);
  var convertedData = jsonDecode(responseBody);
  final_data = convertedData['data'][0]['matches'];
  for (var item in final_data) {
    print(item['tags']['Title']);
  }
}

void main(List<String> arguments) {
  print("Starting");
  makePostRequest();
}
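For quick testing without a Flutter app, the same request can be sketched in Python with only the standard library. The helper names below are hypothetical, the base URL is an assumption you should replace with your backend's address, and the response shape (data, then matches, then tags and Title) is taken from what the Dart code above reads.

```python
import json
import urllib.request


def build_search_request(base_url, query, limit=10):
    """Build a POST request for the /search endpoint with a JSON body."""
    body = {"data": [{"text": query}], "parameters": {"limit": limit}}
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_titles(response_json):
    """Pull movie titles out of the matches of the first returned document."""
    matches = response_json["data"][0]["matches"]
    return [match["tags"]["Title"] for match in matches]


def search_movies(base_url, query, limit=10):
    """Send the request and return matching titles (needs a running backend)."""
    request = build_search_request(base_url, query, limit)
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_titles(json.load(response))


# Hypothetical response, shaped like the one the Dart code parses.
sample_response = {"data": [{"matches": [{"tags": {"Title": "Laugh Out Loud"}}]}]}
print(extract_titles(sample_response))
```

With the backend running, a call such as search_movies("http://localhost:12345", "comedy") would perform the actual lookup; the sample response is used here so the parsing logic can be checked offline.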
And that’s it. It’s this easy to make a neural search engine using Jina AI.
Movie Recommender in Action
Learning Resources
GitHub Repository
For full application source code with Flutter Implementation, check out this GitHub Repository.