bioinformatics

Are you moonlighting?

A few weeks ago I finally defended my PhD dissertation. Below you can see the slides I presented (in Catalan), in which I tried to cover, in around three quarters of an hour, many of the different projects involved.

One of the topics I mentioned was protein moonlighting (or multitasking), that is, the property that some protein molecules have additional functions or roles apart from the one that is primarily annotated or known. As an example, the group in which I worked during my PhD studies maintains an exhaustive list of these cases.

This usage can be considered a kind of metaphor based on the original meaning of moonlighting. According to Urban Dictionary, the term refers to having an additional job, normally during moonlit hours (at night).

For a non-native English speaker it's always hard to know how popular certain words or expressions are. It was funny to learn that this word seems to have existed in the USA at least as early as the 1970s. However, as a conversation involving the main character of Taxi Driver shows, not everyone may have been fully familiar with it.

Using Neo4j with NCBI taxonomy and Gene Ontology datasets

Around a year and a half ago I started testing graph databases (only Neo4j so far), using the Gene Ontology and NCBI taxonomy datasets as sample cases. I explained my experience in this presentation in February 2015:

After a while, I finally found time to update my import scripts and Java API extension so they could work with newer versions of Neo4j and Py2neo (2.2.3 and 2.0.7 at the time of writing).

Regarding Py2neo, I noticed that the Neo4j REST API seems to rely more explicitly on Cypher queries than it did in the past. With the help of this article about multiprocessing in Python and Py2neo, and after several tries, I managed to get the import to run within an acceptable time.

As final tips, if you plan to use a similar approach with your own data, I would suggest creating nodes and populating their properties at the same time (keeping data in memory if necessary). I also noticed that trying to create relationships with multiple parallel processes fails, so keep only one worker for those steps.
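The two tips above can be sketched roughly as follows. This is only an illustration and not the actual import scripts: the helper names (`chunked`, `node_batch`, `relationship_batch`, `plan_import`), the `GO_TERM` label, the `id` and `name` property names and the batch sizes are all assumptions made for the example. It only builds Cypher statements and their parameters, leaving the submission to Neo4j (through Py2neo or any other client) out of scope. The `{rows}` parameter style is the one used by Neo4j 2.x; newer releases expect `$rows` instead.

```python
def chunked(rows, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def node_batch(label, rows):
    """Build one statement that creates nodes together with all their properties."""
    # Labels cannot be passed as Cypher parameters, so the label is interpolated.
    statement = "UNWIND {rows} AS row CREATE (n:%s) SET n = row" % label
    return statement, {"rows": rows}


def relationship_batch(label, rel_type, pairs):
    """Build one statement that links already existing nodes by their `id` property."""
    # Relationship types cannot be parameterised either.
    statement = (
        "UNWIND {rows} AS row "
        "MATCH (a:%s {id: row.source}), (b:%s {id: row.target}) "
        "CREATE (a)-[:%s]->(b)" % (label, label, rel_type)
    )
    rows = [{"source": source, "target": target} for source, target in pairs]
    return statement, {"rows": rows}


def plan_import(label, rel_type, nodes, pairs, batch_size=1000, node_workers=4):
    """Return the two import stages with the number of workers each may use."""
    node_batches = [node_batch(label, b) for b in chunked(nodes, batch_size)]
    rel_batches = [relationship_batch(label, rel_type, b)
                   for b in chunked(pairs, batch_size)]
    return [
        # Node batches are independent of each other, so they can run in parallel.
        {"stage": "nodes", "workers": node_workers, "batches": node_batches},
        # Relationships are kept on a single worker, following the tip above.
        {"stage": "relationships", "workers": 1, "batches": rel_batches},
    ]


# Small in-memory sample with three Gene Ontology terms.
sample_nodes = [
    {"id": "GO:0008150", "name": "biological_process"},
    {"id": "GO:0009987", "name": "cellular process"},
    {"id": "GO:0008152", "name": "metabolic process"},
]
sample_pairs = [("GO:0009987", "GO:0008150"), ("GO:0008152", "GO:0008150")]
sample_plan = plan_import("GO_TERM", "is_a", sample_nodes, sample_pairs, batch_size=2)
```

Each entry in `sample_plan` could then be handed to a pool with the stated number of workers, for instance a `multiprocessing.Pool`, whose workers send every `(statement, parameters)` pair to the database.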

We are starting to live in RESTful times in Biosciences

Molecular Biology data (from sequence strings to protein structural coordinates) have long been released openly to the public, and the Web has long served as an interface for exploring and visualizing those data. Indeed, my first approaches to Bioinformatics and to more or less serious programming consisted of preparing CGI entry points to command-line applications or to the results of analysed data.

Effectively sharing bioinformatics research data

As the nodalpoint.org portal notes following the publication of NAR's annual special issue on bioinformatics databases, it is becoming evident that there are ever more of them than anyone will actually be able to consult properly. All this is made worse by the fact that, although access is normally free, their use tends to be restricted or, with luck and more or less skill, they require particular attention.
