Retrieving ordered chromosomes with ENSEMBL REST API

Submitted by toniher on Fri, 29/08/2014 - 4:59pm

It's been almost two years since I wrote about the upcoming RESTful times in Bioinformatics. And, well, they are already here...
Just as a curiosity, I will explain my recent experience using the ENSEMBL REST API.

For a long time I've been in charge of mirroring several popular biological databases, such as ENSEMBL, at my research centre. One common task we perform with those data is building indexes for different alignment programs (e.g., Bowtie). We actually provide indexes that map to well-established canonical chromosome locations, but the whole genome sequence you can download for a given species does not always contain only those (e.g., for human in release 76). For the sake of simplicity (and disk space) we only download the toplevel files, and then, for certain indexers, we remove the "non-chromosomal" entries.
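
The cleaning step can be pictured as a streaming filter over the toplevel FASTA file. This is a minimal Python sketch, not our actual pipeline: the `keep_chromosomes` helper is hypothetical, and it assumes that the record ID is the first whitespace-separated token of each header line (as in Ensembl headers such as `>1 dna:chromosome ...`). Records are copied in their original file order, and the list of wanted chromosomes has to come from somewhere else.

```python
from typing import Iterable, TextIO


def keep_chromosomes(fasta_in: TextIO, fasta_out: TextIO, wanted: Iterable[str]) -> list[str]:
    """Hypothetical helper: copy only FASTA records whose ID is in `wanted`.

    Assumes the record ID is the first whitespace-separated token after '>'.
    Returns the IDs that were kept, in file order.
    """
    wanted_ids = set(wanted)
    kept: list[str] = []
    keep = False  # whether lines of the current record are being copied
    for line in fasta_in:
        if line.startswith(">"):
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            keep = record_id in wanted_ids
            if keep:
                kept.append(record_id)
        if keep:
            fasta_out.write(line)
    return kept
```

Since the file is processed line by line, memory use stays small even for a whole human genome. Compressed downloads can be read by passing a handle opened with `gzip.open(path, "rt")`.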
But how do we know which chromosomes a species has, and in which order they are expected? This may seem a silly question, but it is not as easy as assuming an ascending integer sequence (which is only partially the case for human). Not only because of sex chromosomes and mitochondria, but also because of the different conventions of well-established communities, such as the fruit fly one.
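
To illustrate why simple sorting is not enough, here is a small Python sketch comparing a plain string sort with a naive "numbers first" heuristic. The `natural_key` function is hypothetical and only meant as a strawman: it fixes the `10` before `2` problem, but it has no way of knowing where a given community places non-numeric names such as `X`, `Y` or `MT`.

```python
import re


def natural_key(name: str) -> tuple:
    """Hypothetical heuristic: numeric names first (numerically), then the rest alphabetically."""
    match = re.fullmatch(r"(\d+)(.*)", name)
    if match:
        return (0, int(match.group(1)), match.group(2))
    return (1, 0, name)


human_like = ["MT", "X", "10", "2", "1", "Y"]
print(sorted(human_like))                   # ['1', '10', '2', 'MT', 'X', 'Y']
print(sorted(human_like, key=natural_key))  # ['1', '2', '10', 'MT', 'X', 'Y']
```

The heuristic puts `MT` before `X` and `Y` simply because of alphabetical order, while the usual human listing puts the mitochondrial genome last. That placement is a convention, which is exactly the kind of information that has to be looked up rather than computed.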

To retrieve this information, I used some ini files that provided it. Even though I was warned that this was a deprecated method, I must admit I kept using it since, in the end, it kept working... And, well, it worked for human until the corresponding ini file was recently updated for the new GRCh38 assembly.

I could have followed the advice above and resorted to the remote MySQL server, but I found an alternative that was far easier in my opinion: retrieving the chromosome list via the ENSEMBL REST API. For instance, for human and fruit fly, it is enough to read the karyotype key from the JSON response. However, it must be noted that for some species the response to this query is rather large, so right now I would discourage using it directly in real-time web services. Of course, it can always be cached, and for mirroring pipelines (which was my use case) this is not a problem at all...
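
A minimal Python sketch of this approach, using only the standard library. It assumes the current public server at `https://rest.ensembl.org` and its `info/assembly/:species` endpoint (the host used in 2014 may have been different). The helper names `fetch_assembly_info` and `get_karyotype` are hypothetical. Only the `karyotype` list is written to a small local cache file, so the potentially large full response is downloaded at most once per species.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable

# Assumed public Ensembl REST host.
SERVER = "https://rest.ensembl.org"


def fetch_assembly_info(species: str, server: str = SERVER) -> dict:
    """Hypothetical helper: download the info/assembly/:species summary as a dict."""
    url = f"{server}/info/assembly/{species}"
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def get_karyotype(species: str, cache_dir: Path,
                  fetch: Callable[[str], dict] = fetch_assembly_info) -> list[str]:
    """Hypothetical helper: return the ordered 'karyotype' list, caching it on disk.

    The cache file holds only the karyotype, not the whole (possibly large)
    assembly response. It never expires; delete it to force a new download.
    """
    cache_file = Path(cache_dir) / f"{species}.karyotype.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    karyotype = list(fetch(species).get("karyotype", []))
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(karyotype))
    return karyotype
```

For example, `get_karyotype("homo_sapiens", Path("ensembl_cache"))` or `get_karyotype("drosophila_melanogaster", Path("ensembl_cache"))`. The `fetch` parameter lets a different downloader (or a stub, in tests) be plugged in, and refreshing the cache after each new Ensembl release is a sensible habit for a mirroring pipeline.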

I hope this personal anecdote helps you imagine different possibilities for integrating biological REST APIs into your workflows.

BONUS: Get a prettier view of JSON output with JSONView.
