Food Pulse — top reviewed dishes from your local restaurant

Rahul Maramreddy
11 min read · Feb 3, 2021

You are trying to find the most delectable items from a restaurant you have not been to before. You don’t want to goof up the order and regret it, so what do you do? You scan the Google reviews or Yelp to try and find which dishes are the most talked about, so you can make an informed decision (the one time when peer pressure actually makes sense). The problem is that you don’t have time to scroll through all the reviews, keep track of the food items, and pick the “best” ones. With Food Pulse, you can select the restaurant you are interested in and receive a real-time analysis of the top food items for that restaurant. No more wasting time endlessly scrolling and being indecisive.

Published Chrome Extension
Food Pulse in action (video)
Source Code

Business Problem

“There has to be a better way.” That is what I told myself after struggling to come up with what to order at a local restaurant. There were many options to choose from and there were over 500 reviews. Turns out that I am not the only person with this problem. Check out the “Investor pitch”.

Final Results

Before I dive into the details, feel free to give our extension a whirl (brace yourselves Rick and Morty fans).

This is one of my favorite restaurants near my house. Let’s see what Food Pulse can do.

Screen shot of a sample result

Can confirm the “crab fried rice” is really good.

Now that you have a “flavor” for Food Pulse, let’s “unpack” this project (puns are intended).

Starting the project

I liked the idea of having a motivated partner to work on the project with, so I reached out to a buddy of mine, Rohan Dasika, and pitched the idea. He also liked the idea of building something to address this problem, and so we partnered up and began.

“Man, this project doesn’t seem too bad.” — us after we discussed it (biggest lie we told ourselves)

Constraints

Just to make things interesting, we wanted to do this entire project without spending a single dollar. There was no particular constraint on time, but the sooner the better. Besides that, anything was fair game.

Why a Chrome extension?

Before deciding on the extension, it would help to understand why other applications were not feasible.

Web-App: When you’re about to check out on Amazon, have you noticed that the navigation buttons to the rest of the website disappear? Amazon wants you to check out right away. They don’t want you to go back to the website and potentially abandon your cart (and their profits)…

Minimizing user friction was a priority for us. Imagine having to navigate to yet another web-app, enter a URL, and wait a while just to find out the best foods. How many people would actually be willing to go through that cumbersome process? When they’re hungry and searching for good food, they are more likely to skip the app altogether. So deciding not to pursue a standalone web-app was easy.

Furthermore, having the user search for the restaurant would have required the use of the Google Places API to find a particular restaurant and possibly the use of other 3rd party APIs to get the corresponding reviews. This would have been an extremely costly way of obtaining the reviews.

Mobile App: Why not have users download an app? Same reasons as the web-app, but now we would have to build a cross-platform app to support both iOS and Android users. Too much complexity.

Chrome Extension: What if we could get rid of users manually entering URLs, have a built-in URL error checker, and end up with something relatively easier to build? Chrome extensions to the rescue. Although the extension would be easy to use, the biggest trade-off is that you are restricted to desktop browser searches, even though ~44% of total web visits are mobile.

Have no fear, the Chrome extension is here

Approach

There are a few large components to this project:

  • Creating a user interface that interacts with the cloud
  • Architecting the processing of the data on the cloud
  • Scraping all the restaurant reviews
  • Identifying the food entities and their corresponding sentiment in reviews
  • Publishing the extension on the Chrome Web Store

I’ll be covering how we addressed each component in a different post, but the final data pipeline is shown below:

Visual representation of the data pipeline
  1. Send the restaurant URL from the Extension to an AWS Lambda function (1).
  2. Lambda function 1 opens up a headless browser and scrapes all of the restaurant reviews at the specified URL.
  3. The scraped reviews are sent to another Lambda function (2), which uses a trained spaCy named-entity recognition (NER) model to find all of the food entities and then performs sentiment analysis to find the most “passionately” talked about foods.
  4. Lambda function 2 then returns the foods along with their sentiment to Lambda function 1, which then returns the data back to the Extension for final processing (pretty simple, right?).
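
To give a rough feel for the ranking part of step 3, here is a minimal sketch. It is not our production code: it assumes the NER and sentiment steps have already produced (food, polarity) pairs, with polarity being a float between -1.0 and 1.0 like the one TextBlob reports. The function name `rank_foods`, the input shape, and the ranking rule (most mentions first, ties broken by average polarity) are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def rank_foods(mentions, top_n=5):
    """Rank foods by mention count, breaking ties by mean polarity.

    `mentions` is an iterable of (food, polarity) pairs. The input shape and
    the ranking rule are assumptions for this sketch, not the real pipeline.
    """
    polarities = defaultdict(list)
    for food, polarity in mentions:
        # Normalize casing so "Crab Fried Rice" and "crab fried rice" merge.
        polarities[food.strip().lower()].append(polarity)

    summary = [
        {"food": food, "mentions": len(scores), "avg_polarity": round(mean(scores), 2)}
        for food, scores in polarities.items()
    ]
    summary.sort(key=lambda row: (row["mentions"], row["avg_polarity"]), reverse=True)
    return summary[:top_n]


# Made-up sample data standing in for the output of the NER + sentiment steps.
sample_mentions = [
    ("Crab Fried Rice", 0.8),
    ("pad thai", 0.5),
    ("crab fried rice", 0.6),
    ("spring rolls", -0.2),
]
top_foods = rank_foods(sample_mentions, top_n=2)
```

With the sample data, the two "crab fried rice" mentions are merged and ranked first. A real scoring rule would probably need to weigh mention count against sentiment more carefully than this.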

Project challenges

To say there were a few challenges would be a gross understatement. As soon as one issue got resolved, another would pop up, but in the end, it was overcoming these challenges that made the final product that much more special.

Me after fixing one problem only to run into another one

I’m going to go through a few of the large obstacles we faced throughout the project and how we dealt with them:

Lack of high quality training data: Any data scientist will tell you that proper training data is paramount to getting a decent ML model. As the saying for training ML models goes, garbage in = garbage out. Unfortunately, with our high budget of $0, we didn’t have the luxury of paying for Amazon Mechanical Turk or other crowdsourced labeling services to gather properly annotated review data. After a lot of Googling, we managed to find a large enough dataset to get the ball rolling. It wasn’t the ideal dataset (there were a lot of labeling mistakes), but beggars can’t be choosers.

No option to use APIs that provided scraped review data: There were services that offered scraped Google reviews, but we nearly fainted when we found out how much they were charging (actual daylight robbery). The only solution was to scrape the reviews ourselves — which we did using Selenium. By the way, if you want to create a startup to compete with those guys, hit us up!

Headless scraping: After we got the scraping working locally, we needed it to work remotely on the AWS Lambda function. After a lot of trial and error, we finally managed to get it to work with the help of this wonderful repo by Jairo Vadillo.

Fitting all of the logic into one Lambda function: Because of the size of the dependencies needed to scrape on the cloud and perform NER / sentiment analysis (Chrome binary, spaCy, etc.), we had to split the logic into two separate Lambda functions. This obviously added overhead, but it ended up getting the job done.

Getting TextBlob to work on Lambda: So apparently, you have to be REALLY careful when you upload dependencies for a specific package. Since Lambda uses a Linux environment, certain .whl dependencies have to be manually downloaded from PyPI in order for the package to work. In the case of TextBlob, we had to manually upload the Linux-based numpy .whl file. #facepalm

API Gateway failure: After configuring the entire pipeline using AWS API Gateway and setting up an API endpoint to call from our extension, we realized that the gateway times out after 30 seconds. Obviously, our extension takes longer than 30 seconds to run, so using an AWS API endpoint was out of the question. This called for an interesting solution: We had to use the node aws-sdk to send the request to the Lambda function. But wait… there’s more (problems)! How in the world do you ‘require’ modules on an extension that is not server based? Like a knight rescuing a damsel in distress, Browserify swoops in to save the day. From their website, “Browserify lets you require(‘modules’) in the browser by bundling up all of your dependencies.” Problem solved.

Me after realizing using API Gateway isn’t going to work

Securing of API Keys: Without security, there is chaos. Because our keys have to be made available for our extension to send and receive requests, the best we can do is take the proper defensive measures by following AWS best practices. By following the principle of least privilege, the keys are only allowed to invoke the one Lambda function we specify (and absolutely nothing else). Should a bad actor try to use our keys for other activities, like spinning up 100 EC2 instances to mine crypto, this concept prevents them from doing so.
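
For a concrete picture of least privilege, here is a sketch of what an invoke-only IAM policy document looks like, built as a Python dictionary. The helper name `invoke_only_policy` is made up, and the ARN (region, account ID, function name) is a placeholder rather than our real one. The policy format itself (`Version`, `Statement`, `Effect`, `Action`, `Resource`) and the `lambda:InvokeFunction` action are standard IAM.

```python
import json


def invoke_only_policy(function_arn):
    """Build an IAM policy document that allows invoking one Lambda function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": function_arn,
            }
        ],
    }


# Hypothetical ARN: the region, account ID, and function name are placeholders.
policy = invoke_only_policy(
    "arn:aws:lambda:us-east-1:123456789012:function:food-pulse-scraper"
)
policy_json = json.dumps(policy, indent=2)
```

Because IAM denies everything that is not explicitly allowed, keys attached to a policy like this cannot launch EC2 instances, read S3 buckets, or invoke any other function.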

Foods not being returned for more than 450 reviews: For some reason, whenever the number of reviews exceeded the 450 mark, nothing was returned to the extension. After a lot of debugging using CloudWatch logs and making sure the code was fine, we found the culprit: the aws-sdk. Turns out the aws-sdk has built-in timeout and retry functionality, which kicked in regardless of the timeouts we had set on the Lambda functions. Since more reviews take more time to process (duh), the results had not been sent back by the time the aws-sdk timed out, so it retried the initial request a few times and quit. #nevergiveup
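
The failure mode is easier to see with a toy model. The sketch below does not touch AWS at all: `simulate_invoke` is a made-up function, and the numbers (150 seconds of work, a 120 second client timeout, 3 retries) are illustrative rather than measured. It only shows why retrying cannot help when each attempt restarts the same slow job.

```python
def simulate_invoke(processing_seconds, client_timeout_seconds, max_retries):
    """Model a client that retries a slow request after each timeout.

    Returns (delivered, attempts, wasted_seconds). Every attempt restarts the
    same work from scratch, so retrying cannot rescue a request whose
    processing time exceeds the client timeout.
    """
    attempts = 0
    wasted_seconds = 0
    for _ in range(max_retries + 1):
        attempts += 1
        if processing_seconds <= client_timeout_seconds:
            return True, attempts, wasted_seconds
        # The client gives up on this attempt before the response arrives.
        wasted_seconds += client_timeout_seconds
    return False, attempts, wasted_seconds


# Illustrative numbers only: 150 s of work against a 120 s client timeout.
default_like = simulate_invoke(150, client_timeout_seconds=120, max_retries=3)
# Raising the client timeout above the processing time and disabling retries.
tuned = simulate_invoke(150, client_timeout_seconds=300, max_retries=0)
```

In the first case the client burns through four attempts and still gets nothing back. In the second, a single attempt succeeds. In practice the equivalent change is made through the SDK's own timeout and retry settings.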

Us after finishing the project

Lessons Learned

What a journey this project was. What started as a fun idea turned into something much more. The lows, the highs, the testing moments, and the moments of triumph were a roller coaster ride of emotions. Here are a few tips I can give for others who want to build something interesting:

Work on things you actually care about: Food Pulse was my solution to a seemingly trivial, but deeply entrenched routine. I love to try new restaurants and cuisines, and I would scour the reviews for any hints on the best items to order. The fact that I would use this app every time I ordered out made building this product all the more personal. There are many problems out there that can be solved, but you need to be passionate about solving them, or there is a good chance you will give up or cut corners.

Find other like-minded people: In my opinion, this is tied in importance with the previous point. Doing a project by yourself is great, but having another person who sees the same vision as you allows you to bounce ideas off of each other and helps you stay motivated.

See each challenge to the end: There were many times I could have just ignored small bugs that most users wouldn’t notice or skipped over features that were seemingly hard to implement. But the reality of product development is that growth happens when you’re most uncomfortable. The most valuable lessons come from the biggest frustrations. Just by sticking with seemingly impossible challenges until a reasonable conclusion, you’ll be surprised at how much you pick up along the way.

Seek help: This is a very underrated tip. You’ll be surprised at how many people would jump to help — you just have to ask. Reach out to friends, or DM people on LinkedIn and see how effortlessly an experienced individual can guide you to solve a problem that you are stuck on. Not everyone has a secret agenda.

Have fun: If you have not picked a project you care about, this one might be tough. Find some way to make the project more interesting. Although I knew it was extra work, I went through and added a lot of Rick and Morty themed stuff, because come on, who doesn’t like Rick and Morty? #picklerick

Inspired yet?

Next steps

Like everything in life, things are always a work in progress. If you would like to work on the next iteration of Food Pulse, here are a few possible next steps.

Software Architecture:

  • Let the user look up a restaurant and select it from a dropdown of possible restaurants.
  • Store the reviews in a db, so you can write once and read many to avoid redundancies in lookup. If a user searches for a restaurant and it’s not available, then perform the live analysis.
  • Set up an offline batch processing script which mass scrapes restaurants and adds them to a db.
  • Combine the two Lambda functions into one. This will require the use of Lambda layers for Lambda function 1.
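
The “write once, read many” idea can be sketched with nothing more than SQLite. This is only an illustration of the lookup flow, not something we built: the table layout, the function name `get_top_foods`, the `fake_analysis` stand-in for the scrape + NER pipeline, and the example URL are all hypothetical.

```python
import json
import sqlite3


def get_top_foods(conn, restaurant_url, analyze_live):
    """Return cached results for a restaurant, running the live analysis on a miss."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (url TEXT PRIMARY KEY, foods TEXT NOT NULL)"
    )
    row = conn.execute(
        "SELECT foods FROM results WHERE url = ?", (restaurant_url,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # cache hit: no scraping needed

    foods = analyze_live(restaurant_url)  # cache miss: run the slow pipeline once
    conn.execute(
        "INSERT INTO results (url, foods) VALUES (?, ?)",
        (restaurant_url, json.dumps(foods)),
    )
    conn.commit()
    return foods


calls = []


def fake_analysis(url):
    """Stand-in for the scrape + NER pipeline; records how often it runs."""
    calls.append(url)
    return ["crab fried rice", "pad thai"]


conn = sqlite3.connect(":memory:")
first = get_top_foods(conn, "https://example.com/restaurant", fake_analysis)
second = get_top_foods(conn, "https://example.com/restaurant", fake_analysis)
```

The second lookup is served from the table, so the expensive analysis runs only once per restaurant. A real version would also need some way to expire old entries as new reviews come in.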

Data Science:

  • For people/orgs with a big wallet ($400), Prodi.gy seems like a promising way to get better results. Since our budget was a whopping $0, this was a bit out of reach.
  • Look into other possible ways to train a more robust model (e.g., using NeuroNER). Another idea is to create a list of possible food items for a restaurant type (American, Indian, Chinese, etc.), and use Levenshtein distance to find the closest match between items in a review and the list of possible food items.
  • Attempt to perform sentiment analysis the ‘right’ way as there is more to it than using TextBlob.
  • What about edge cases where food entities are referenced in a different sentence? This is where you need to dive into another interesting topic called ‘entity linking’.
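
The Levenshtein idea from the list above is simple enough to sketch in a few lines. This is a minimal, unoptimized version under a few assumptions: the menu list is made up, `closest_food` is a hypothetical helper, and the cutoff of 2 edits is an arbitrary threshold that would need tuning on real reviews.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def closest_food(candidate, known_foods, max_distance=2):
    """Return the known food nearest to `candidate`, or None if none is close enough."""
    candidate = candidate.strip().lower()
    best = min(known_foods, key=lambda food: levenshtein(candidate, food))
    return best if levenshtein(candidate, best) <= max_distance else None


# Made-up menu for a hypothetical Thai restaurant.
thai_menu = ["pad thai", "crab fried rice", "green curry"]
match = closest_food("Pad Tai", thai_menu)
no_match = closest_food("cheeseburger", thai_menu)
```

Here the misspelled "Pad Tai" is one edit away from "pad thai" and gets matched, while "cheeseburger" is too far from everything on the menu and is rejected.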

Closing remarks

What if I told you that the core ideas behind Food Pulse are used by ~80 million monthly active users?

As it turns out, the “Grubhub” of India, Zomato, has a pretty slick feature that makes ordering food a heck of a lot easier. Prepare to have your mind blown. You ready for it…?

You can rate individual food items after you order them.

Sample restaurant from Zomato (sorry vegans/vegetarians)

This astonishingly simple idea pretty much gets rid of the need to read and scroll through reviews to find out what dish to order. Imagine being able to sort by rating (feature they don’t have) and boom — you have the top rated food items from that restaurant.

No scraping, no NER, no sentiment analysis, and most importantly — no user guessing.

Whaaaaaaaaat?!

Now the better question to ask is why have you guys not implemented this feature? @Grubhub, @Postmates, @Doordash, @UberEats

Data Science (Part 2) — Coming soon
Scraping+ Serverless (Part 3) — Coming soon

Super Special Thanks

Pranay Marella — Best front-end engineer I know. Helped with tricky front end features.
Leyuan Yu — Provided guidance for the data science components. Agreed to help out after randomly reaching out to him on LinkedIn.

Thanks

Google — For being Google.
Stackoverflow — For being Stackoverflow.

Reach out to either Rohan or me for any questions, comments, or concerns about Food Pulse or about this article. Feedback is always appreciated.

I’m always interested in working on interesting data science related software engineering projects, so if you have any ideas, reach out!
