Retail review intelligence
Topic analysis across thousands of e-commerce clothing reviews — feedback categorized into themes with semantic and similarity search over customer sentiment.
Retail / E-commerce
Business-aware search and analysis
OpenAI embeddings · ChromaDB · t-SNE · Cosine similarity · pandas / scikit-learn
Applied project
The problem
A women's clothing retailer holds thousands of customer reviews — too many to read, too valuable to ignore.
What was engineered
Vector embeddings
Every review is embedded as a vector, making meaning computable. The embedding space is visualized with t-SNE to expose the natural structure of the feedback.
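A minimal sketch of the t-SNE step, assuming scikit-learn's TSNE and using random vectors in place of real review embeddings. The array shape, perplexity, and seed are illustrative choices, not values from the project.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder data (assumption): 60 random 32-dimensional vectors
# standing in for real review embeddings, which are much wider.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(60, 32))

# Reduce to 2 dimensions for plotting. perplexity must be smaller
# than the number of samples; 15 is an arbitrary choice here.
tsne = TSNE(n_components=2, perplexity=15, init="random", random_state=0)
coords = tsne.fit_transform(embeddings)

# coords has one (x, y) row per review, ready for a scatter plot.
```

Plotting coords as a scatter plot, colored by theme or rating, is one way to see whether similar reviews sit together.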
Theme categorization
Each review is assigned to the closest of four themes (quality, fit, style, comfort), measured by cosine distance between the review's embedding and each theme's embedding.
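The categorization can be sketched end to end with toy data. The 3-dimensional vectors below are invented for illustration, and the helper names cosine_distance and categorize are hypothetical. Real embeddings would come from the embedding model.

```python
import math

def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def categorize(review_embedding, category_embeddings, categories):
    """Pick the theme whose embedding is closest to the review."""
    distances = [cosine_distance(review_embedding, c)
                 for c in category_embeddings]
    return categories[distances.index(min(distances))]

# Toy vectors (assumption), one per theme.
categories = ["quality", "fit", "style", "comfort"]
category_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
]

review_embedding = [0.1, 0.9, 0.2]
theme = categorize(review_embedding, category_embeddings, categories)
print(theme)
```

Cosine distance ignores vector length, so a long review and a short one that say the same thing land on the same theme.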
Semantic search
The full corpus is loaded into a vector database for semantic search: describe a sentiment in plain language and retrieve the reviews that express it.
From the build
similarities = [
    {"distance": cosine(review_embedding, category_emb),
     "index": i}
    for i, category_emb in enumerate(category_embeddings)
]
closest = min(similarities, key=lambda s: s["distance"])
category = categories[closest["index"]]
# Semantic search over the full corpus: describe a
# sentiment in plain language, retrieve the reviews
# that express it.
results = collection.query(
    query_texts=["silky, comfortable, wore it all day"],
    n_results=3,
)
Why it matters
This is business-aware search in miniature — find feedback, documents, or customers by what they mean, not what they say. The same architecture powers the search and analysis work in our client applications.
Stack
All work has been anonymized to protect clients.