If you regularly search the biomedical literature with engines such as novoseek or PubMed, you may have noticed that different query terms can lead to the same results. Why is that? Simply because there are many ways (e.g. synonyms) to refer to the same biomedical concept (disease, gene, pharmacological substance, etc.).
The issue with search engines is that everyone has their own knowledge and habits when referring to something. Some will use the full name (for instance “breast cancer type I susceptibility protein”), some an abbreviation (“BRCA1”) and others another synonym (“BRCC1”). When it comes to databases and information extraction, the problem becomes even trickier.
Why does this make information retrieval more difficult? Because when you search with a biomedical search engine, the system generally retrieves information from one or several databases that compile thousands of journals. Needless to say, each author uses gene names according to his or her own field of study and knowledge. This plurality of terms makes retrieval difficult, because a system that matches terms literally cannot treat them all as one concept. You would therefore find the publications that mention “BRCA1” verbatim but miss those that refer to it as “Breast Cancer Type I susceptibility”, which is the same thing!
To cope with this problem, novoseek has developed a unique information extraction system that relies on dictionaries to return the relevant publications no matter which synonym is used.
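To make the idea of dictionary-based extraction more concrete, here is a minimal sketch in Python. It is not novoseek's actual implementation: the `SYNONYMS` table, the concept ID `GENE:BRCA1` and the function names are all assumptions made up for illustration. It only shows how every known name can be mapped to a single concept before documents are matched.

```python
import re

# Hypothetical synonym dictionary: every surface form maps to one concept ID.
# The names come from the examples in this post; the ID format is invented.
SYNONYMS = {
    "BRCA1": "GENE:BRCA1",
    "BRCC1": "GENE:BRCA1",
    "breast cancer type I susceptibility protein": "GENE:BRCA1",
}


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip()).lower()


# Pre-normalized lookup table so matching is case-insensitive.
LOOKUP = {normalize(name): concept for name, concept in SYNONYMS.items()}


def find_concepts(text):
    """Return the set of concept IDs whose synonyms occur in the text."""
    haystack = normalize(text)
    found = set()
    for name, concept in LOOKUP.items():
        # Match whole terms only, so a name embedded in a longer token is ignored.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, haystack):
            found.add(concept)
    return found
```

With such a table, an abstract that mentions only “BRCC1” and one that spells out the full protein name are both tagged with the same concept, so a query for either name can return both.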
Did you know that, on average, there are 7 synonyms for a single human gene? Interestingly, the gene with the most synonyms has 164 of them. How do we know that? Because in our databases we fill in the IDF (Identifier), FA (Functional Annotation) and SYN (Synonyms) fields for each gene. Based on that, we are able to compute statistics about each of them.
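As a rough illustration of how such figures can be derived, the sketch below stores one record per gene with the three fields mentioned above and computes the average and maximum number of synonyms. The record layout, the toy data and the function name are assumptions made for this example; they do not reflect the real database schema or its contents.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class GeneRecord:
    idf: str                                      # IDF: identifier
    fa: str                                       # FA: functional annotation
    syn: List[str] = field(default_factory=list)  # SYN: synonyms


def synonym_stats(records):
    """Return (average synonym count, gene with most synonyms, its count)."""
    counts = {record.idf: len(record.syn) for record in records}
    top = max(counts, key=counts.get)
    return mean(counts.values()), top, counts[top]


# Toy records built from the names used in this post (not real database rows).
records = [
    GeneRecord("GLO1", "detoxification of methylglyoxal",
               ["Lactoylglutathione lyase"]),
    GeneRecord("BRCA1", "placeholder annotation",
               ["BRCC1", "breast cancer type I susceptibility protein"]),
]

average, top_gene, top_count = synonym_stats(records)
```

On the real database, the same kind of computation is what gives the average of 7 and the maximum of 164 quoted above.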
Let’s look at a search example in both PubMed and novoseek. We are going to try GLO1 (a glutathione-binding protein involved in the detoxification of methylglyoxal, a side product of glycolysis). This search in novoseek gives 739 results, and the query is mapped to the gene GLO1. Clicking on the gene opens a window that shows the synonyms for this term, as shown in the image below.

Performing this search in PubMed returns 204 results, while a search for one of its synonyms (Lactoylglutathione lyase) returns more than 700 results.
Now have a look at the same search in novoseek and see how it has been interpreted.

Interesting, isn’t it?
What novoseek does is perform a concept search: it takes into account all the synonyms (alternative names) of the search term in order to return all the corresponding results. This makes searches easier and more comprehensive, as you do not have to look any further. The information extraction process is illustrated below.

You can now see the benefit of this technology: it returns all the relevant publications no matter which synonym is used.
I could tell you now about the importance of context to disambiguate the results and return the publications that you need to read… We will do that in another post!
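For readers who like to see things in code, a concept search can be pictured as query expansion. The Python sketch below is only an assumption-laden illustration, not the novoseek engine: the `CONCEPTS` table and both function names are hypothetical, and the boolean `OR` string is just one simple way to show what “searching for all synonyms at once” means.

```python
# Hypothetical concept table: one entry per gene, listing every known name.
CONCEPTS = {
    "GLO1": ["GLO1", "Lactoylglutathione lyase"],
}


def expand_query(term):
    """Map a query term to its concept and return all names for that concept.

    Unknown terms are returned unchanged, with no concept attached.
    """
    wanted = term.strip().lower()
    for concept, names in CONCEPTS.items():
        if wanted in (name.lower() for name in names):
            return concept, list(names)
    return None, [term]


def to_boolean_query(names):
    """Join the names into a single OR query with quoted phrases."""
    return " OR ".join('"{}"'.format(name) for name in names)


concept, names = expand_query("lactoylglutathione lyase")
query = to_boolean_query(names)
```

Whichever name the user types, the expanded query is the same, which is why the result count no longer depends on the synonym chosen.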
