Improvements in novoseek – March 2010


There have been several major improvements this month in novoseek:

  • Select the Publication Type from the Advanced Search panel
    Users have been asking for it, and it is now available on the Advanced Search panel.

    TIP 1: Hold Ctrl (Control) to select several Publication Types.

    TIP 2: Learn more about the different Publication Types and how to use them when looking for scientific publications.
  • Complete author list for each article
    To give you more information about the authors of an article, we have updated the metadata of every publication with its complete list of authors.

    - On the search results page, you will see the first two authors and the last author of each publication. Check with a search example.

    TIP: When you search for a specific author, that author is highlighted within the results and you will see four authors in total for each publication: the three mentioned above plus the author you are looking for. Check with this example for Eley Robert.

    - On the publication detail page, all of the authors are now listed. Check the authors on a publication detail page.

  • Disambiguation of authors
    A common problem in the scientific literature is the wide variety of text formatting, which also affects author names: an author may be written as "Last name, First name", "Last name, First initial", and so on. We now index all the known aliases of an author to make searches for an author's publications more comprehensive. Check this direct example of author disambiguation.

  • Better navigation between search results pages
    Users suggested a more intuitive navigation menu at the bottom of search results pages to move from one page to another. This is done!
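The two author features above (the short author list on results pages and alias-based author matching) can be sketched in Python. This is a minimal sketch under our own assumptions, not novoseek's implementation: the helper names `author_key` and `authors_to_display` are hypothetical, and the alias key (lower-cased last name plus first initial) is just one simple normalization. Such a key would also merge different researchers who share a last name and initial, so real disambiguation needs more evidence than the name alone.

```python
import re
import unicodedata
from typing import List, Optional


def strip_accents(text: str) -> str:
    # Decompose accented characters and drop the combining marks (é -> e)
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def author_key(name: str) -> str:
    """Hypothetical alias key: lower-cased last name plus first initial."""
    name = strip_accents(name).strip()
    if not name:
        return ""
    if "," in name:
        # "Last, First" or "Last, F."
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        parts = name.split()
        if len(parts) >= 2 and re.fullmatch(r"[A-Z]{1,3}", parts[-1]):
            # Initials-last style such as "Eley R"
            last, first = " ".join(parts[:-1]), parts[-1]
        else:
            # "First Last" or "F. Last"
            last, first = parts[-1], " ".join(parts[:-1])
    return f"{last.lower()} {first[:1].lower()}".strip()


def authors_to_display(authors: List[str], searched: Optional[str] = None) -> List[str]:
    """First two authors plus the last one; a searched author found
    elsewhere in the list is appended as a fourth entry."""
    shown = list(authors) if len(authors) <= 3 else authors[:2] + [authors[-1]]
    if searched:
        key = author_key(searched)
        for name in authors:
            # Match on the alias key, and skip authors that are already shown
            if author_key(name) == key and name not in shown:
                shown.append(name)
                break
    return shown
```

With this sketch, "Eley, Robert", "Eley, R.", "Robert Eley", "R. Eley" and "Eley R" all map to the same key, so a search for any of them highlights the same author.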

7 on average: a story about synonyms in biomedical concepts


If you regularly search with biomedical search engines such as novoseek or PubMed, you may have noticed that different query terms can lead to the same results in the biomedical literature. Why? Simply because there are many ways (e.g. synonyms) to refer to the same biomedical concept (a disease, gene, pharmacological substance, etc.).

The issue for search engines is that everyone has their own knowledge and habits when referring to something. Some will use an explicit name (for instance “breast cancer type I susceptibility protein”), some an abbreviation (“BRCA 1”) and others another synonym (“BRCC1”). When it comes to databases and information extraction, the problem becomes even trickier.

Why does this make information retrieval more difficult? Because when you search with a biomedical search engine, the system generally retrieves information from one or several databases that compile thousands of journals. Needless to say, each author uses their own gene terms according to their field of study and knowledge. This plurality of terms makes retrieval difficult when the system cannot treat all of them as one concept. A search for “BRCA 1” would therefore find the publications that use that exact term but miss those that refer to it as “Breast Cancer Type I susceptibility”, which is the same thing!

To cope with this problem, we have developed at novoseek a unique dictionary-based information extraction system that returns the relevant publications no matter which synonym is used.
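A dictionary-based extraction step of this kind can be illustrated with a short Python sketch. It is not novoseek's system: the concept IDs (`GENE:BRCA1`, `GENE:GLO1`) and the function names are hypothetical, and the synonym lists contain only names mentioned in this post. The idea is to normalize every synonym into a token sequence, then scan each text for the longest matching sequence and record the concept it belongs to, so that different spellings resolve to the same concept.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical dictionary: concept IDs are made up; the names come from this post.
DICTIONARY: Dict[str, List[str]] = {
    "GENE:BRCA1": ["BRCA1", "BRCA 1", "BRCC1",
                   "breast cancer type I susceptibility protein"],
    "GENE:GLO1": ["GLO1", "Lactoylglutathione lyase"],
}


def tokenize(text: str) -> List[str]:
    # Lower-case and split on anything that is not a letter or a digit
    return re.findall(r"[a-z0-9]+", text.lower())


def build_lexicon(dictionary: Dict[str, List[str]]) -> Dict[Tuple[str, ...], str]:
    """Map the token sequence of every synonym to its concept ID."""
    lexicon: Dict[Tuple[str, ...], str] = {}
    for concept_id, synonyms in dictionary.items():
        for synonym in synonyms:
            lexicon[tuple(tokenize(synonym))] = concept_id
    return lexicon


def annotate(text: str, lexicon: Dict[Tuple[str, ...], str]) -> Set[str]:
    """Return the concept IDs mentioned in text (greedy longest match)."""
    tokens = tokenize(text)
    max_len = max(len(key) for key in lexicon)
    found: Set[str] = set()
    i = 0
    while i < len(tokens):
        # Try the longest window first so longer names win over shorter ones
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            concept = lexicon.get(tuple(tokens[i:i + n]))
            if concept is not None:
                found.add(concept)
                i += n
                break
        else:
            i += 1  # no synonym starts at this token
    return found
```

With this toy dictionary, “BRCA 1”, “BRCA1” and “BRCC1” all resolve to `GENE:BRCA1`. Trying the longest window first ensures that, when a short synonym is part of a longer one, the longer name is the one recorded.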

Did you know that, on average, there are 7 synonyms for a single human gene? Interestingly, the gene with the most synonyms has 164 of them. How do we know that? Because in our databases we fill in the IDF (Identifier), FA (Functional Annotation) and SYN (Synonyms) fields for each gene, which lets us compute statistics like these for every one of them.
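Once every gene record carries its SYN field, figures such as the average number of synonyms and the gene with the most synonyms are simple aggregates. The sketch below assumes a hypothetical record layout that mirrors the IDF/FA/SYN fields named above; the two records are toy data, so the numbers it produces are not the 7 and 164 quoted from our database.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Tuple


@dataclass
class GeneRecord:
    # Field names mirror IDF / FA / SYN from the post; the layout is assumed
    idf: str                                      # identifier
    fa: str                                       # functional annotation
    syn: List[str] = field(default_factory=list)  # synonyms


def synonym_stats(records: List[GeneRecord]) -> Tuple[float, str, int]:
    """Return (average synonyms per gene, ID of the gene with most, that count)."""
    counts = {r.idf: len(set(r.syn)) for r in records}  # ignore duplicate synonyms
    top_id = max(counts, key=counts.get)
    return mean(list(counts.values())), top_id, counts[top_id]


# Toy records, NOT real database content
records = [
    GeneRecord("GLO1", "(toy annotation)", ["GLO1", "Lactoylglutathione lyase"]),
    GeneRecord("BRCA1", "(toy annotation)",
               ["BRCA1", "BRCA 1", "BRCC1", "breast cancer type I susceptibility protein"]),
]
print(synonym_stats(records))
```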

Let’s look at a search example in both PubMed and novoseek. We will try GLO1 (a glutathione-binding protein involved in the detoxification of methylglyoxal, a side-product of glycolysis). This search in novoseek gives 739 results, and the query is mapped to the gene GLO1. Clicking on the gene opens a window that shows the synonyms for this term, as shown in the image below.

[Screenshot: GLO1 synonyms window in novoseek]

The same search in PubMed returns 204 results, while a search for one of its synonyms (Lactoylglutathione lyase) returns more than 700 results.

Now look at this same search in novoseek and see how the query has been interpreted.

[Screenshot: novoseek interpretation of the query Lactoylglutathione lyase]

Interesting, isn’t it?

What novoseek does is perform a concept search: it analyzes all the synonyms (alternative names) of the current search term in order to return all the corresponding results. This makes searches easier and more comprehensive, as you do not have to look any further. The information extraction process is illustrated below.

[Diagram: novoseek information extraction process for synonyms]
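To make the difference between a literal keyword search and a concept search concrete, here is a small Python sketch over a toy corpus of four made-up titles. The concept ID and all function names are hypothetical, and this is not novoseek's engine: it simply maps the query to a concept through a synonym dictionary, then retrieves every document that mentions any synonym of that concept, falling back to literal matching for unknown terms.

```python
import re
from typing import Dict, List, Set

# Hypothetical concept dictionary and toy corpus (made-up titles, not real papers)
SYNONYMS: Dict[str, List[str]] = {"GENE:GLO1": ["GLO1", "Lactoylglutathione lyase"]}
DOCS: Dict[int, str] = {
    1: "GLO1 expression in tumor cells",
    2: "Lactoylglutathione lyase detoxifies methylglyoxal",
    3: "Glycolysis side-products and GLO1 activity",
    4: "An unrelated paper on BRCA1",
}


def tokens(text: str) -> List[str]:
    # Lower-case and split on anything that is not a letter or a digit
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(doc: List[str], phrase: List[str]) -> bool:
    # True if the phrase's tokens occur consecutively in the document
    n = len(phrase)
    return any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))


def keyword_search(query: str, docs: Dict[int, str]) -> Set[int]:
    """Literal search: only documents that contain the query itself."""
    q = tokens(query)
    return {doc_id for doc_id, text in docs.items() if contains_phrase(tokens(text), q)}


def concept_search(query: str, docs: Dict[int, str],
                   synonyms: Dict[str, List[str]]) -> Set[int]:
    """Map the query to a concept, then search for every synonym of it."""
    q = tokens(query)
    concepts = [cid for cid, names in synonyms.items()
                if any(tokens(name) == q for name in names)]
    if not concepts:
        return keyword_search(query, docs)  # unknown term: literal search
    hits: Set[int] = set()
    for cid in concepts:
        for name in synonyms[cid]:
            hits |= keyword_search(name, docs)
    return hits
```

Here `keyword_search("GLO1", DOCS)` finds only documents 1 and 3, and `keyword_search("Lactoylglutathione lyase", DOCS)` only document 2, whereas `concept_search` returns documents 1, 2 and 3 for either name. This mirrors, on a much smaller scale, the PubMed vs novoseek behavior described above.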

You can now see the benefit of this analysis technology: it returns all the publications no matter which synonym is used.

I could now tell you about the importance of context for disambiguating the results and returning the publications you need to read… but we will do that in another post!

Is Bing that Big?


Microsoft has long been striving to become a major player in the search engine landscape. Unfortunately, it has never quite succeeded. But in May 2009, Microsoft struck again. It released Bing, a new search engine backed by a budget of over 100 million, which should be enough to make it a serious competitor. So we put it to the test. Is Bing that big? What does it have to offer? Is it worth using for searching the biomedical literature?

Is Bing that big? This is what we wondered after its release and all the discussion surrounding it. As we care about delivering useful resources to life sciences professionals, we decided to put Bing to the test for biomedical research.

What biomedical information can this search engine give you? We started with a very basic search on Breast Cancer. First of all, the number of results is huge: no fewer than 50,300,000 results for this simple search. Because there are so many results, you need some time to get used to the page before you can work out where to find the information you need. The main results, the ones you are used to seeing in every search engine, are displayed in the center, below the search box and the premium advertising. One amusing thing about Bing is that they didn’t beat around the bush: they present the information the same way Google does (blue titles, black text and green URLs). Who knows, they could be right after all.

Once you are used to the page layout, you can start analyzing how the information is organized, as shown below.

[Screenshot: Bing results page layout]
  1. The main search result, meant to be the most relevant. As with Google, this is often the Wikipedia entry.
  2. When available, Bing displays a set of categories related to your query, such as Articles, Symptoms, Treatment, Stages, Surgery, Prevention and Reference. This is an interesting feature, but unfortunately it only works for very general queries such as Breast Cancer. Try Breast Carcinoma (a synonym of breast cancer) and you won’t get any related information of this kind.
  3. Related searches for the current query. Don’t expect anything very specific, though: the related searches are general. For instance, the last two related searches suggest Lung Cancer and Prostate Cancer. The same search in novoseek would let you refine it with more specific breast cancer-related filters such as invasive breast cancer, brca2 mutation or contralateral breast cancer.
    [Screenshot: novoseek related filters for breast cancer]
  4. Search history. A nice feature and very simple to use. You can turn it off whenever you want, or go to your search history and clear individual entries if you need to.
  5. Let’s keep reviewing Bing a bit more. As you scroll down the results, you see the information grouped into the categories mentioned above. Clicking on one gives you the whole set of results in that category. This is an interesting feature, but unfortunately it barely works.

    One interesting thing Bing does is preview results, as shown in the image below. This lets you read more of an article without having to click on it. It is not revolutionary, but it is still a worthwhile additional feature.

    [Screenshot: Bing result preview]

    After this sample search, it seems clear that Bing does not offer the same amount of information, or the same ways of processing it, as a biomedical search engine would. To be honest, Bing does not offer much compared to Google, but it still benefits from the brand-new-product effect. The technology is not yet outstanding, and the results are pretty similar to those of Google or Yahoo. So Bing may not be that big, but it could compete with Yahoo, which is still lagging behind in bringing innovation to the market.

    Last but not least: how hard it must be for Microsoft to realize that, just as it launches Bing, Google is previewing Google Wave (the web of the future?) and announcing the beta release of Google Squared, another way of searching the web. It seems that whatever its competitors do, Google is always a step ahead.

Judgmental heuristics


Robert B. Cialdini uses the term judgmental heuristics in his book Influence: Science and Practice to describe the mental shortcuts we build to deal with an increasingly complex and rapidly moving environment. One of the examples he gives is expensive = good.

Many people seem to follow a similar rule when evaluating results from search engines.

In the years I have been presenting search engine products, it was fairly common to see people surprised or annoyed when the system I was presenting retrieved fewer results than the one they were using. However, they didn’t seem to mind when, for a different query, it was my system that returned more results. They were probably following the judgmental heuristic more results = better.

Given the amount of information available, its accessibility and our daily use of information retrieval technology, can we still afford this kind of judgmental heuristic? Can we cope with the number of results we actually get from search engines?

Just as more expensive is not necessarily better, more results do not mean better results. But I guess the fear of missing relevant results keeps the more results = better shortcut from disappearing entirely.