Generates sentence-level embeddings from movie plot synopses after quick sanity checks in EDA. Normalizes vectors for cosine geometry, saves a reusable .npy artifact, and sets a consistent foundation for downstream similarity search and graph analytics.
Builds and inspects a cosine-similarity matrix over the embeddings to surface near-duplicates, sequels, and tight thematic clusters. Highlights top-k neighbors per title and uses thresholds/heatmaps to confirm that the representation captures meaningful plot-level semantics.
Creates a RediSearch index with text/numeric metadata and a vector field, ingests the corpus, and implements KNN queries using the same embedding model. Demonstrates responsive semantic search with optional hybrid filters (e.g., year/genre) and lays the groundwork for a simple API.
Materializes a similarity graph in Neo4j (nodes = movies, edges = weighted similarity) and runs GDS algorithms like PageRank, Betweenness, and Louvain. The results reveal influential bridge titles and community structure, useful for recommendations, playlists, and catalog exploration.
“IN THE END… We only regret the chances we didn’t take, the relationships we were afraid to have,and the decisions we waited too long to make.” ― Lewis Carroll