Yelp is a massive directory for businesses worldwide. You can find whatever you need: a plumber to fix your pipes, a realtor to sell your house, or some sushi to satisfy your craving, in any location! Before the days of the internet, you had to flip through your phone book to find what you were looking for. We data-savvy people can use Yelp for a number of things: we can extract data about businesses and combine it with data from elsewhere (e.g. Twitter or Google Analytics), or perform sentiment analysis on the reviews, to name a few. While we can do this with web scraping, I advise using the Yelp API, as it is easier to deal with and saves us from working with HTML. In this tutorial I cover the basics of getting started with Yelp’s API.
Before you can get started, you need to create a Yelp Developer account. Once that’s done, go to the Yelp Fusion API section of the Yelp Developer page and click ‘Get Started’.
The next step is to create an app and fill out the required fields. Once that’s done, you’ll be given an API key. Once you’re in, you can have a look at the documentation; this article specifically covers the Business Search endpoint.
Now that we’re set up, let’s get into the Python!
Our only dependencies will be requests, to query the data, and pandas, to store it. We assign https://api.yelp.com/v3/businesses/search to our url variable and the API key we were given to the key variable. We then create the headers and parameters dictionaries; don’t worry about the headers, you can just copy mine.
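A minimal sketch of that setup might look like the following. The `YOUR_API_KEY` string is a placeholder I am assuming here, standing in for the key from your app page; the `Authorization: Bearer` header is the format Yelp Fusion expects for API keys.

```python
# Endpoint for the Business Search described in this article.
url = "https://api.yelp.com/v3/businesses/search"

# Placeholder (assumption): paste the API key from your Yelp app page here.
key = "YOUR_API_KEY"

# Yelp Fusion authenticates each request with a bearer token header.
headers = {"Authorization": f"Bearer {key}"}
```

In a real project you would probably load the key from an environment variable or a config file rather than hard-coding it in a notebook.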
Breaking down the parameters:
Only location is required (unless you pass latitude and longitude instead); the others are optional.
- location = the place to search around, such as an address or a city
- term = what kinds of businesses we are looking for
- radius = search radius in meters around the location (40,000 at most)
- limit = maximum number of results the query returns
In summary, we’re looking at all the malls within ~3 miles of an apartment complex in Beverly Hills, but we only want to see the first 3 results.
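As a rough sketch, the parameters dictionary for that search could be built like this. The location string is an assumption on my part (the exact apartment address is not shown here), and the radius is simply 3 miles converted to meters.

```python
METERS_PER_MILE = 1609.344

params = {
    # Assumed placeholder; swap in the exact street address you care about.
    "location": "Beverly Hills, CA",
    "term": "mall",
    # Yelp expects an integer number of meters, so convert ~3 miles.
    "radius": round(3 * METERS_PER_MILE),
    # Only return the first 3 matches.
    "limit": 3,
}
```

Here `params["radius"]` works out to 4828 meters, comfortably under Yelp’s 40,000 meter ceiling.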
Now it’s time to send our request. It should return a 200 status code, meaning it’s good to go. We use the .json() method to see the results of our query.
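One way to sketch this step is shown below. The function name `run_search` and the 10 second timeout are my own choices for illustration, not part of Yelp’s API; the function expects the url, headers, and parameters described earlier.

```python
import requests


def run_search(url, headers, params):
    """Send the search request and return the decoded JSON body."""
    response = requests.get(url, headers=headers, params=params, timeout=10)
    # 200 means the query succeeded; raise_for_status() turns any
    # 4xx/5xx answer (bad key, bad parameter) into an exception.
    print(response.status_code)
    response.raise_for_status()
    return response.json()


# Example call (needs a real key and network access):
# results = run_search(
#     "https://api.yelp.com/v3/businesses/search",
#     {"Authorization": "Bearer YOUR_API_KEY"},
#     {"location": "Beverly Hills, CA", "term": "mall", "limit": 3},
# )
# results["businesses"] is then a list with one dictionary per business.
```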
All Together Now!
Now it’s time to put all of this into a function that also extracts the features we want and puts the information in a pandas DataFrame. Basically, this function puts the name, rating, number of $ given for price, number of reviews, and address into a DataFrame. Here’s a list of all the features you can choose from: id, alias, name, image_url, is_closed, url, review_count, categories, rating, coordinates, transactions, price, location, phone, display_phone, distance.
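Here is a hedged sketch of what such a function could look like. The names `yelp_search` and `businesses_to_frame` are hypothetical, and I am assuming the address is taken from the `display_address` list inside each business’s `location` field; treat it as one possible version rather than the only way to do it.

```python
import pandas as pd
import requests

SEARCH_URL = "https://api.yelp.com/v3/businesses/search"
COLUMNS = ["name", "rating", "price", "review_count", "address"]


def businesses_to_frame(businesses):
    """Flatten a list of Yelp business dictionaries into a small DataFrame."""
    rows = []
    for business in businesses:
        price = business.get("price")  # e.g. "$$"; some businesses have none
        location = business.get("location") or {}
        rows.append({
            "name": business.get("name"),
            "rating": business.get("rating"),
            "price": len(price) if price else None,  # count the $ signs
            "review_count": business.get("review_count"),
            "address": ", ".join(location.get("display_address") or []),
        })
    return pd.DataFrame(rows, columns=COLUMNS)


def yelp_search(api_key, **params):
    """Query the Business Search endpoint and return the results as a DataFrame."""
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.get(SEARCH_URL, headers=headers, params=params, timeout=10)
    response.raise_for_status()
    return businesses_to_frame(response.json().get("businesses", []))


# Example call (needs a real key and network access):
# df = yelp_search("YOUR_API_KEY", location="Beverly Hills, CA",
#                  term="mall", radius=4828, limit=3)
```

Splitting the flattening step from the request keeps the DataFrame logic easy to try out on a saved response without hitting the API again.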
Here’s our very tiny dataset!
Once we have identified businesses as searchable terms, we can query far more information about them across other platforms! While the possibilities are endless, I’ll just cover how to get tweets regarding a given business. As with Yelp, we must create a Twitter developer account and create an app. Once that’s done, you will be given 4 different keys. I assigned the tokens to variables in an earlier cell; as long as the tokens are passed into OAuthHandler as strings, you’re good.
Here we can collect tweets on a given topic. If you’re interested in other data, you can just do api.search('your topic name'). As you can see, we can simply query the name of the business and get access to a whole new world of data!
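A sketch of the Twitter side, under a few assumptions: I am assuming the client library is Tweepy (it provides `OAuthHandler`), the function names and the four credential argument names are hypothetical, and the search method is looked up under both of its names because Tweepy 4.x calls it `search_tweets` while 3.x calls it `search`.

```python
def make_twitter_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Build an authenticated Tweepy client from the app's four credentials."""
    # Imported here so the search helper below can be tried without Tweepy.
    import tweepy  # third-party: pip install tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth)


def tweets_about(api, business_name, count=10):
    """Return the text of up to `count` tweets matching the business name."""
    # Tweepy 4.x exposes search_tweets; older 3.x releases expose search.
    search = getattr(api, "search_tweets", None) or api.search
    return [status.text for status in search(q=business_name, count=count)]


# Example (needs real credentials and network access):
# api = make_twitter_api("CONSUMER_KEY", "CONSUMER_SECRET",
#                        "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
# tweets_about(api, "Beverly Center", count=5)
```

Feeding the `name` column from the Yelp results into `tweets_about` one row at a time is a simple way to link the two sources.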