My first REST API

I have always wanted to try creating a REST API, but my busy uni life (plus a bit of laziness) meant that I never had the chance. Fear not, I have finally done it!

As part of a job application process, I was tasked with creating a REST API that returns the 5 largest public GitHub repositories for a given username, and serving it on a publicly visible URL. Writing the API was actually rather easy: I spent more time deploying it on my nginx-powered website than writing the code.

This blog post explains how I achieved this, both in general terms and in technical detail. All the code is available in this repository on my GitHub.

What is a REST API?

An API (application programming interface) abstracts away functionality, so that you can interact with something without needing to understand how it works internally. For instance, when you log in to your Google account, all you have to do is type in your email and password, be it in a browser or in an app on any operating system, and click the login button, without having to know what happens behind the scenes.

In the same sense, a REST API serves information over HTTP in a way that lets it be shared while maintaining interoperability, i.e. the same information is provided regardless of the device or browser you request it from. Just like visiting a website in a browser, you simply have to use the correct URL to reach the desired information.

Information is served at endpoints. An endpoint is a URL at which a particular piece of information is served, usually identified by the last part of the address. For example, https://www.melaus.xyz/about/ is the endpoint for my About Me page. The information behind an endpoint can be exchanged in any format: JSON is common for REST APIs, but XML is sometimes used, too.

On top of interoperability, a REST API is powerful in the sense that not only can you request information (GET), you can also send new information (POST), update stored information (PUT) and remove it (DELETE), all through a URL structure that is easily understood.

Live Demo

Be gentle with it, as there is a rate limit of 5000 per hour (i.e. only 5000 requests are allowed per hour).

The base URL for the API is https://apis.melaus.xyz/top5, and it has two endpoints, <username> and <username>/detailed. The first endpoint gives you a summary of three fields about the 5 largest public repositories of the given GitHub username, and the second gives you everything the GitHub API returns for them. Information is returned as a JSON list of objects.

The following shows an example when I queried for the largest 5 repos of Facebook using the first endpoint:

# Querying https://apis.melaus.xyz/top5/facebook
[
  {
    "html_url": "https://github.com/facebook/pose-aligned-deep-networks",
    "name": "pose-aligned-deep-networks",
    "size": 1617006
  },
  {
    "html_url": "https://github.com/facebook/mysql-8.0",
    "name": "mysql-8.0",
    "size": 1272017
  },
  {
    "html_url": "https://github.com/facebook/buck",
    "name": "buck",
    "size": 506004
  },
  {
    "html_url": "https://github.com/facebook/mysql-5.6",
    "name": "mysql-5.6",
    "size": 413715
  },
  {
    "html_url": "https://github.com/facebook/facebook-clang-plugins",
    "name": "facebook-clang-plugins",
    "size": 299725
  }
]

Implementation

API

Languages Used

My love for Python means that this API is implemented in Python, using the Flask framework. In my opinion, Python is a great language for dealing with JSON-based REST APIs, since JSON objects can simply be manipulated as dictionaries in Python. Libraries such as the built-in json and the third-party requests make it very easy to fetch and manipulate data and to create JSON objects for output. In fact, Flask provides a function called jsonify that turns Python data into a JSON response ready to be returned from an endpoint.
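To illustrate the point about dictionaries, here is a minimal sketch using only the built-in json module. The two entries are copied from the example response shown earlier; the variable names and the is_large field are made up for this illustration and are not taken from the actual code.

```python
import json

# Two entries copied from the example response earlier in this post.
raw = '''
[
  {"html_url": "https://github.com/facebook/buck", "name": "buck", "size": 506004},
  {"html_url": "https://github.com/facebook/mysql-5.6", "name": "mysql-5.6", "size": 413715}
]
'''

# A JSON array of objects becomes a Python list of dictionaries.
repos = json.loads(raw)
first = repos[0]

# Fields are read and written with ordinary dictionary syntax.
repo_name = first["name"]
first["is_large"] = first["size"] > 500000  # hypothetical extra field

# json.dumps turns the Python structure back into JSON text.
output = json.dumps(repos, indent=2)
print(output)
```

Inside a Flask view, the last step would typically be left to jsonify rather than calling json.dumps by hand.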

Structure

My code is split into two files:

  • github_lib.py fetches and manipulates the data to obtain the 5 largest public repos of a user.
  • top5_api.py defines the endpoints and calls the function required to obtain the 5 largest repos.

Splitting the code that handles the data and the code that defines the API into two files makes it easy to make changes and add functionality in the future.
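As a rough idea of what the data-handling half of this split could contain, the sketch below picks the largest repos from a list of repo dictionaries and trims them down to the three summary fields seen in the live demo. The function names, the constant and the sample data are hypothetical; the real github_lib.py may be organised differently.

```python
from typing import Any, Dict, List

# The three fields returned by the summary endpoint in the live demo.
SUMMARY_FIELDS = ("html_url", "name", "size")


def largest_repos(repos: List[Dict[str, Any]], count: int = 5) -> List[Dict[str, Any]]:
    """Return the `count` largest repos, biggest first, judged by their "size" field."""
    return sorted(repos, key=lambda repo: repo["size"], reverse=True)[:count]


def summarise(repos: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the summary fields of each repo."""
    return [{field: repo[field] for field in SUMMARY_FIELDS} for repo in repos]


# Made-up sample data standing in for what the GitHub API would return.
sample = [
    {"name": "a", "size": 3, "html_url": "https://example.invalid/a", "fork": False},
    {"name": "b", "size": 1, "html_url": "https://example.invalid/b", "fork": True},
    {"name": "c", "size": 9, "html_url": "https://example.invalid/c", "fork": False},
]

top_two = summarise(largest_repos(sample, count=2))
print(top_two)
```

With the logic kept in plain functions like these, the API file only has to call them and hand the result to jsonify.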

Pagination

The GitHub API returns at most 100 items per call (the exact limit varies from endpoint to endpoint). As we need the 5 largest repositories, we have to fetch all of a user's repos, so pagination is needed whenever a user has more repositories than fit in a single response (e.g. Facebook).

The GitHub API returns a Link field in the HTTP header if there are more pages of content. As you can see below, there are 6 pages in total and the next page is 2. It even provides you with the link to the next page.

>>> import requests
>>> content = requests.get('https://api.github.com/users/facebook/repos')
>>> content.headers['Link']
'<https://api.github.com/user/69631/repos?page=2>; rel="next", <https://api.github.com/user/69631/repos?page=6>; rel="last"'

Now, pagination is easy. All we have to do is keep requesting the next page and storing the results until we reach the last page (or simply, until there is no 'next' page left in the Link field). Using Python, we can easily write the code required to do this and obtain the top 5.
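The loop described above can be sketched as follows. This is not the code from the repository: the function names are hypothetical, and the network call is replaced by an injected get_page function with made-up pages so that the example runs offline. In real code, get_page could wrap requests.get and return the decoded JSON together with the Link header.

```python
import re
from typing import Callable, List, Optional, Tuple

# One page of results: (items on this page, Link header or None).
Page = Tuple[List[dict], Optional[str]]


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, or None if there is none."""
    if not link_header:
        return None
    # Assumes the URLs contain no commas, which holds for the header shown above.
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match and match.group(2) == "next":
            return match.group(1)
    return None


def fetch_all(first_url: str, get_page: Callable[[str], Page]) -> List[dict]:
    """Follow 'next' links from first_url and collect the items of every page."""
    items: List[dict] = []
    url: Optional[str] = first_url
    while url:
        page_items, link_header = get_page(url)
        items.extend(page_items)
        url = next_page_url(link_header)
    return items


# Stand-in for the network: two made-up pages keyed by URL.
FAKE_PAGES = {
    "https://example.invalid/repos": (
        [{"name": "a"}],
        '<https://example.invalid/repos?page=2>; rel="next", '
        '<https://example.invalid/repos?page=2>; rel="last"',
    ),
    "https://example.invalid/repos?page=2": (
        [{"name": "b"}],
        '<https://example.invalid/repos?page=1>; rel="first"',
    ),
}

all_items = fetch_all("https://example.invalid/repos", FAKE_PAGES.__getitem__)
print(all_items)
```

The requests library also exposes the parsed header as the links attribute of a response, which avoids the manual parsing. Once every page has been collected, sorting the repos by size and keeping the first five gives the top 5.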

Deployment

Deployment is always challenging if you are not experienced with how something works. Luckily, I learnt a bit about how nginx and self-hosting work [1]. However, that was a year ago, so it still took me a long while to get it working…!

Firstly, I create the subdomain apis.melaus.xyz, as I want to keep the API away from the main website. To do so, I add a DNS record that tells the Web where apis.melaus.xyz should lead. In my case, I am serving the API and the subdomain on the same server as my website, so all I have to do is say that it is an 'alias to melaus.xyz.' [2].

Next, to ensure a stable API, I serve the Flask application using the uWSGI application server [3], which requires a configuration file, top5_api.ini, and a file, wsgi.py, that defines how the application is run. I wrap this as a service so that it can be easily restarted when I make changes or after rebooting the server.

Finally, I create an nginx config for apis.melaus.xyz to make the API live. All of this was done in about half a day, which is not too bad for a first time, I reckon!

Footnotes

  1. DigitalOcean provides many guides that make it easy to host your website. I definitely recommend them if you want to host your own website. See my previous posts about this.

  2. See this guide if you are interested in finding out more about DNS records.

  3. More about that in the documentation and this guide.