A department store chain receives a steady stream of Google reviews for its stores, but needs a better way to draw insights from this data. A centralized place to see not only what people are saying about each store, but also a summarized view of the whole portfolio, would be ideal. And we could go a step further: is it possible to see the same kind of information about competitor stores?
Reviews of both our stores and our competitors' stores were scraped from the internet. The text of the reviews is analyzed with two questions in mind: what are the main topics of the reviews, and what is the sentiment of the messages?
For the first question, we can break each message down into individual words, or tokens, to understand what is being talked about the most in the reviews. We can also look at combinations of words, such as bigrams (two words), trigrams (three words), and so on. With these tokens, we can compute statistics about the most popular words appearing in the reviews over any given period of time.
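The token and n-gram counting described above can be sketched with the standard library alone (the actual pipeline used NLTK's tokenizer; the sample reviews and the tiny stopword list here are illustrative only):

```python
import re
from collections import Counter

STOPWORDS = frozenset({"the", "a", "is", "and", "was", "but"})

def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def ngrams(tokens, n):
    """Return consecutive n-token tuples (bigrams for n=2, trigrams for n=3)."""
    return list(zip(*(tokens[i:] for i in range(n))))

# Hypothetical sample reviews standing in for the scraped data.
reviews = [
    "The staff was friendly and the checkout was fast",
    "Friendly staff, but the checkout line was slow",
]

tokens = [t for r in reviews for t in tokenize(r)]
top_tokens = Counter(tokens).most_common(3)
top_bigrams = Counter(b for r in reviews for b in ngrams(tokenize(r), 2)).most_common(3)
```

Counting tokens per time window (e.g. the past week) is then just a matter of filtering the reviews by date before aggregating.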
For the second question, understanding the sentiment of each review (negative, positive, or neutral), we use a lexicon and rule-based analysis that produces a compound score: it sums the valence scores of each word found in the lexicon, adjusts them according to a set of rules, and normalizes the result to between -1 and +1 (most extreme negative and positive, respectively). This gives us a useful one-dimensional measure of sentiment for any given sentence.
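A minimal sketch of that idea, with a toy lexicon and a single simplified negation rule (the project itself used VADER's full lexicon and rule set; the normalization x / sqrt(x² + α) with α = 15 is the one VADER uses to map the summed valences into (-1, +1)):

```python
import math

# Toy valence lexicon; VADER-style scores range roughly from -4 to +4.
LEXICON = {"great": 3.1, "good": 1.9, "terrible": -2.1, "slow": -1.0}
NEGATIONS = {"not", "never", "no"}

def compound(tokens, alpha=15.0):
    """Sum per-word valences, flipping the sign of a word that follows a
    negation (a simplified rule), then normalize to (-1, 1)."""
    total, negate = 0.0, False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        valence = LEXICON.get(tok, 0.0)
        total += -valence if negate else valence
        negate = False
    return total / math.sqrt(total * total + alpha)
```

So `compound(["great"])` comes out positive, `compound(["not", "good"])` negative, and a sentence with no lexicon words scores exactly 0.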
With this information, we can visualize these statistics for a single store, for the stores in a specific region, or for the whole portfolio, making it easy to see the total number of reviews, the ratio of positive to negative reviews, the top tokens written in the past week, and more.
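The per-store rollup behind those views can be sketched as a pandas group-by over a per-review table (the column names and sample scores here are hypothetical; the 0.05 cutoff is VADER's conventional threshold for calling a compound score positive):

```python
import pandas as pd

# Hypothetical per-review table: store, region, and the compound sentiment score.
reviews = pd.DataFrame({
    "store":    ["A", "A", "B", "B", "B"],
    "region":   ["North", "North", "South", "South", "South"],
    "compound": [0.62, -0.45, 0.80, 0.10, -0.30],
})

# Flag a review as positive above the conventional 0.05 compound threshold.
reviews["positive"] = reviews["compound"] > 0.05

# Per-store summary; grouping by "region" instead gives the regional view.
summary = reviews.groupby("store").agg(
    n_reviews=("compound", "size"),
    positive_ratio=("positive", "mean"),
    mean_compound=("compound", "mean"),
)
```

Swapping `"store"` for `"region"`, or dropping the group-by entirely, yields the regional and portfolio-wide views from the same table.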
This allows the store owners to identify issues and opportunities: fixing what customers complain about, rolling out to every store what is being praised at any one of them, and even picking up hints from what our competitors' customers write about them.
Reviews were scraped with Python, using Selenium and Beautiful Soup. With the review texts in a pandas DataFrame, they were cleaned and tokenized with NLTK to extract the most common tokens, and sentiment was scored with VADER's lexicon and rule-based analysis.
If you'd like to learn more about my projects or work together, feel free to reach out! You can also connect with me on LinkedIn.