Entries Tagged 'User experience'

Data organization and interaction

This is the third and last post in a series about the technology behind novoseek. In the first post we talked about the problem of synonyms, and in the second we showed the challenge of dealing with homonyms. In this third post we would like to share some thoughts on data organization and its representation, a common issue for any type of Web application.

I must confess that I am fond of data visualization. I love all those keynote presentations and graphics with great colors and shapes. They catch my attention even when I do not understand them or when the information they provide is irrelevant to me. Some of them, however, are truly amazing. When I was doing research in bioinformatics, I was desperately looking for ways to represent all my protein-interaction data so that I could first get the big picture and then focus on the details. I found a few amazing things at Visual Complexity, but none of them was flexible enough. I must also confess that I failed in my attempt to apply my programming skills to this task.

AKS

When I joined Bioalma and started promoting our first product, AKS, I was really excited about one of its main features: it represents the relations among concepts based on their co-occurrence in the literature. It is a great piece of software that lets you see at a glance which concepts are most closely related and visualize clusters. However, the information behind it was not always easy to understand.

When we started the novoseek project we decided to embrace the KIS (Keep It Simple) principle. Although we try to stick to this philosophy, I must confess that in our meetings, with a development manager, a marketing director and an art director around the table, it is hard to say whether we are even close to it.

Regarding the novoseek interface

As you might remember from previous posts, novoseek analyzes the whole literature with an algorithm that integrates database information and takes into account the context of terms in order to annotate them. So when we started the project and had all the data from the analysis of the literature, we asked ourselves: “What should we do with it? How can users take advantage of all this analysis?” Putting it in a search engine that is simple, clear and easy to use was clearly our best choice. We needed to start organizing the data and designing an interface to visualize it and interact with it.

We needed to arrange all that information in a data structure that could provide a fast, efficient and scalable service. Scalability was a major concern: we did not want to have to change the data model once the system needed to serve millions of simultaneous requests.

We also needed a clear picture of what type of information we wanted to display and how users could interact with it. Based on our experience, we knew that we had to develop something not only simple but also familiar to the end user, and that an advanced interface packed with information would likely disconcert users. Our CEO kept telling us: “We need to build something that doesn’t need to be explained to be used and understood.” And so we did.

The indexing technology and the automatic disambiguation method enable novoseek to find the most relevant documents faster and more efficiently. We decided to take advantage of that and build what we call the Profile: a list of the concepts most relevant to the query, built from the results of our text-mining analysis. We thought this list would be really helpful because it gives a quick idea of the main themes related to the query. Since we wanted the list to be interactive, we added some functionality to it: whenever you click on one of its terms, you get all the documents that contain both the query term and the clicked concept. You can see examples in our use cases.
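The Profile described above can be approximated with a toy inverted index. The sketch below is only an illustration under stated assumptions, not novoseek's implementation: documents are assumed to be already annotated with dictionary concept IDs, "relevance" is approximated by plain co-occurrence counts with the query concept, and all concept IDs and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical annotated corpus: document ID -> concept IDs that an upstream
# text-mining step (assumed, not shown) has attached to the document.
ANNOTATIONS = {
    "doc1": {"GENE:BRCA1", "DISEASE:breast_cancer", "PROCESS:DNA_repair"},
    "doc2": {"GENE:BRCA1", "DISEASE:breast_cancer"},
    "doc3": {"GENE:BRCA1", "PROCESS:DNA_repair", "GENE:RAD51"},
    "doc4": {"DISEASE:breast_cancer", "DRUG:tamoxifen"},
}


def build_inverted_index(annotations):
    """Map each concept ID to the set of documents annotated with it."""
    index = defaultdict(set)
    for doc_id, concepts in annotations.items():
        for concept in concepts:
            index[concept].add(doc_id)
    return index


def profile(index, annotations, query_concept, top_n=3):
    """Rank the other concepts by the number of query documents they co-occur
    in; ties are broken alphabetically so the output is deterministic."""
    counts = Counter()
    for doc_id in index.get(query_concept, set()):
        counts.update(c for c in annotations[doc_id] if c != query_concept)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


def refine(index, query_concept, clicked_concept):
    """Documents annotated with both the query concept and the clicked one."""
    return index.get(query_concept, set()) & index.get(clicked_concept, set())


if __name__ == "__main__":
    idx = build_inverted_index(ANNOTATIONS)
    print(profile(idx, ANNOTATIONS, "GENE:BRCA1"))
    print(sorted(refine(idx, "GENE:BRCA1", "PROCESS:DNA_repair")))
```

Run as a script, this prints `[('DISEASE:breast_cancer', 2), ('PROCESS:DNA_repair', 2), ('GENE:RAD51', 1)]` and then `['doc1', 'doc3']`. An inverted index is a common choice for this kind of concept-to-documents lookup, since both the Profile and the click-to-refine step reduce to set operations; the post does not say which data structure novoseek actually uses.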

After that, we added many other features, some of which are really handy! Others are a bit more hidden, aimed at advanced users who want to make the most of the system.

However, understanding our users, how they interact with us, what is useful and what can be removed to stay true to the KIS philosophy is an ongoing, never-ending process. At Bioalma, we are always studying what would happen if we put this menu here, chose this color or placed this log-in box there. We mix our own craziness with user suggestions, and sometimes we clearly come up with a different (or strange) interface. So stay tuned to find out the results of our conversations with users and of our own schizophrenia.

What is your path to successful searches in PubMed?

The other day Valentin and I were discussing how scientists confront the time-consuming task of looking for information in the scientific literature. From my experience as a scientist and from conversations with friends and colleagues, we found that many of them end up frustrated when searching, and that their path to successful searches in PubMed can be summarized by one of the following four options:

  1. Direct
     We manage to find results in PubMed, although in some cases we have to resort to MeSH terms. Hashtag #nsdirect

  2. Ask for help
     After some time battling the PubMed search engine without any success, we decide to ask a colleague or a librarian for help. Some of our friends told us that they never take this path without first trying on their own. Hashtag #nsafh

  3. Alternatives
     After performing some searches in PubMed without success, we look for alternative search engines such as Google, third-party PubMed tools or, obviously, novoseek (we asked our friends, so what else would you expect them to use besides PubMed? ;-) Hashtag #nsalt

  4. Beer. Why not?
     I mean, after a hard working day, what is better than a beer? We can face the challenge some other day. Hashtag #nsbeer

Take a look at the image below, it's so funny and so true ;-)

What is your path to successful searches in PubMed?



Now we need you to act! What is your path? Tweet this post to your followers, adding the #hashtag that best describes you.

  1. Direct, this is my path to successful searches in Pubmed
  2. Ask for help, this is my path to successful searches in Pubmed
  3. Alternatives, this is my path to successful searches in Pubmed
  4. Beer, why not?, this is my path to successful searches in Pubmed

Have a great weekend.

The importance of context in text disambiguation

Some time ago, we explained how novoseek interprets a query and returns relevant publications regardless of the synonym used in the article or in the query. Using synonyms to extend a search makes possible one of the user’s main goals (and, as a matter of fact, ours): finding the best and most comprehensive information about a research area. This seemed all the more important as TechCrunch recently pointed out that NetBase was returning irrelevant, and sometimes downright inappropriate, results due to serious problems with its text-mining techniques and semantic knowledge.

However, the path to returning accurate and comprehensive information to the end user is a tricky one. Once the synonyms of a query word have been analyzed, a second challenging problem arises: disambiguating homonyms.

Homonyms are terms that share the same spelling but have different meanings. When a search is performed, many of the potential results may deal with a completely different area of interest. This forces users to try new queries until they are sure the system has understood the query correctly. A system that grasps the intended meaning from the start avoids these extra searches.

Obviously, this takes a long time, and the frustration can be summed up in one sentence: “If only the search engine knew the meaning of the search term, this process could be reduced to minutes.”

How is homonym disambiguation performed?
Novoseek looks for the word in the literature and, based on the semantic role of the word in the sentence and an analysis of its context, assigns it to an entry in our built-in biomedical dictionary. Below is a sample image showing the context of a spot in an extract of an article found for BRCA1.

spot_context

As a result of the analysis, we can determine whether a document is on-topic or off-topic. For example, CAT is the gene symbol of the human gene catalase, but it is also a homonym of cat (the animal) and of carnitine acetyltransferase. This means that if “CAT” appears in a document, a text-mining system must decide which concept it actually refers to, and disambiguate the symbol, before proceeding to any higher-level analysis.

CAT
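To make the CAT example concrete, here is a toy context-overlap disambiguator in the spirit of the textbook simplified Lesk algorithm. This is a swapped-in technique for illustration only, not novoseek's method (which, as described above, also relies on the semantic role of the word in the sentence); the sense inventory, clue words and window size are all invented assumptions.

```python
import re

# Hypothetical sense inventory for the symbol "CAT": each sense lists clue
# words that suggest it when they appear near the ambiguous term.
SENSES = {
    "catalase (human gene)": {"catalase", "gene", "expression", "oxidative",
                              "hydrogen", "peroxide", "enzyme"},
    "cat (animal)": {"feline", "cats", "pet", "veterinary", "animal", "kitten"},
    "carnitine acetyltransferase": {"carnitine", "acetyl", "acetyltransferase",
                                    "mitochondria", "fatty", "acid"},
}


def tokenize(text):
    """Lower-case alphabetic tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(sentence, senses=SENSES, window=6):
    """Return the sense whose clue words overlap most with the words within
    `window` tokens of the first 'CAT'. Returns None if CAT is absent or no
    clue word matches (i.e. the context gives no evidence)."""
    tokens = tokenize(sentence)
    if "cat" not in tokens:
        return None
    i = tokens.index("cat")
    context = set(tokens[max(0, i - window): i + window + 1]) - {"cat"}
    scores = {sense: len(clues & context) for sense, clues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For instance, `disambiguate("Overexpression of CAT reduced oxidative damage caused by hydrogen peroxide.")` returns `"catalase (human gene)"`, while a sentence mentioning a veterinary clinic and a kitten resolves to the animal. Returning `None` when there is no evidence, rather than guessing, matters here because, as the next paragraph explains, errors at this stage propagate through every later step.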

Furthermore, there can be ambiguity because the same gene can have the same name in different organisms. The context analysis must therefore be able to tell which organism is being referred to. At this level, it is crucial for a text-mining system to get the analysis right and to associate with a given biological entity only those documents that actually mention it. Errors at this level would propagate throughout the system, and the end result presented to the user would be wrong.

novoseek_process_homonyms

In regular search engines you get all documents for a query term regardless of its meaning. With novoseek you can focus on the meaning you intend for your term and retrieve just the documents you are looking for.

The text analysis is just one of the first steps in novoseek’s text-mining technology. The results of these analyses have to be structured and delivered to the user quickly and simply. But we’ll talk about this in another post.

Considerations on the upcoming PubMed enhancements

The idea for this post came to me while I was talking with a relative. She is a medical resident and told me that she had suddenly had to start using PubMed and found it rather complicated. This confirmed to me that PubMed is pretty hard for novices to use, and I took the opportunity to pitch novoseek to her. Need I remind you that novoseek is a free, easy and intuitive biomedical search engine? Anyway, this discussion reminded me that some time ago I heard (thanks to fellow followers attending the MLA meeting in Hawaii) that PubMed was about to enhance its interface this summer.

This announcement is big news for the life sciences community, as PubMed, the search engine of the National Institutes of Health, is one of the most widely used options on the web today. Given the number of queries it receives every day, improving the user experience was natural and expected. Alisha Miles (a medical librarian for a non-profit hospital in Georgia) said: “these all sound like wonderful improvements. Hopefully, we will get to a point where we can provide input to NLM before some changes are rolled out”.

These changes aim to make PubMed “easier to use”, “simplify the interface”, “refresh the look” and offer “better organized text on screen”. It is interesting that PubMed is moving towards a simpler user interface, something novoseek has pursued from the beginning.

If you are not familiar with PubMed, have a look at the screenshot below to see how the layout is currently organized.

pubmed_current1

Compare it to novoseek’s current layout.

novoseek_layout_vs_pubmed

We acknowledge that a change, however slight, was necessary. PubMed is difficult to use: handling it properly requires learning, training and practice. This is why there are so many resources about it (see, for instance, “18 ways to improve your pubmed searches”) and classes on it. The changes will be the following:

  1. The tabs will disappear
  2. A narrower top banner
  3. Combination of Abstract and Abstract +
  4. “+” below each citation
  5. “Send to” option a lot more visible
  6. A wider right column occupying almost 25% of the screen, showing the related articles, the “Also try” option and recent activity

If you want a sneak preview of what it will look like, you can check David Gillikin’s presentation directly, although its images are deliberately not optimized for viewing. To make a long story short: PubMed is about to become a bit more social and up to date.

Obviously, I have to compare these changes with novoseek’s features. PubMed currently has more functions than novoseek. However, novoseek was designed from the beginning to be an easy-to-use, simple and fast biomedical search engine, and now PubMed seems to be going that way, too.

In addition, we keep adding new functions based on your needs. Through my novoseek you can now check your search history, save searches and articles, create alerts and manage labels. We developed these functions in response to users’ expectations. Being close to users through Twitter and UserVoice makes interaction and quick answers to their questions possible. We believe this is one of our strengths compared with PubMed.

If you want to learn how to get the most out of novoseek, have a look at the presentation below:

7 on average: a story about synonyms in biomedical concepts

If you regularly search the literature with biomedical search engines such as novoseek or PubMed, you might have noticed that different query terms can lead to the same results. Why? Simply because there are many ways (e.g. synonyms) to refer to the same biomedical concept (a disease, a gene, a pharmacological substance, etc.).

The issue for search engines is that almost everyone has their own knowledge and habits when referring to something. Some will use an explicit name (for instance “breast cancer type I susceptibility protein”), some an abbreviation (“BRCA 1”) and others yet another synonym (“BRCC1”). When it comes to dealing with databases and information extraction, the problem becomes even trickier.

Why does this make information retrieval more difficult? Because when you search with a biomedical search engine, the system generally retrieves information from one or several databases that compile thousands of journals. Needless to say, each author uses their own gene names according to their field of study and knowledge. This plurality of terms makes information retrieval difficult, because a system that only matches the literal query cannot treat all these names as one concept. As a result, you would find the publications that mention “BRCA 1” literally, but miss those where it is referred to as “Breast Cancer Type I susceptibility”, which is the same thing!

To cope with this problem, we at novoseek have developed a unique, dictionary-based information extraction system that returns the relevant publications regardless of the synonym used.

Did you know that, on average, there are 7 synonyms for a single human gene? Interestingly, the gene with the most synonyms reaches 164 terms. How do we know that? Because in our databases we fill in the IDF (Identifier), FA (Functional Annotation) and SYN (Synonyms) fields for each gene, which lets us compute statistics about each of them.
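The sketch below shows how per-gene records carrying IDF, FA and SYN fields could be turned into a synonym lookup table and into statistics such as the average and maximum number of synonyms. The record layout (one "FIELD value" line per entry, "//" closing each gene) is an assumption loosely modelled on flat-file biological databases, and the two sample records are illustrative, not novoseek's data.

```python
from statistics import mean

# Hypothetical flat-file dump: one "FIELD value" line per entry and "//" at
# the end of each gene. IDF = identifier, FA = functional annotation,
# SYN = synonym (one line per synonym).
RAW = """\
IDF GLO1
FA glutathione-binding protein; detoxification of methylglyoxal
SYN Lactoylglutathione lyase
SYN glyoxalase I
//
IDF BRCA1
FA breast cancer susceptibility
SYN BRCA 1
SYN BRCC1
SYN breast cancer type I susceptibility protein
//
"""


def parse_records(raw):
    """Yield one dict per gene with 'IDF', 'FA' and a list of 'SYN' values."""
    record = {"SYN": []}
    for line in raw.splitlines():
        if line.strip() == "//":
            yield record
            record = {"SYN": []}
            continue
        field, _, value = line.partition(" ")
        if field == "SYN":
            record["SYN"].append(value)
        elif field:
            record[field] = value


def synonym_index(records):
    """Map every lower-cased name (identifier or synonym) to its gene ID."""
    index = {}
    for rec in records:
        for name in [rec["IDF"], *rec["SYN"]]:
            index[name.lower()] = rec["IDF"]
    return index


records = list(parse_records(RAW))
counts = {rec["IDF"]: len(rec["SYN"]) for rec in records}
# Average synonyms per gene and the gene with the most synonyms.
print(mean(counts.values()), max(counts, key=counts.get))
```

On these two toy records the script prints `2.5 BRCA1`; applied to a full gene dictionary, the same two lines of arithmetic are the kind of computation that yields figures like the average of 7 and the maximum of 164 quoted above.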

Let’s look at a search example in both PubMed and novoseek. We are going to try GLO1 (a glutathione-binding protein involved in the detoxification of methylglyoxal, a by-product of glycolysis). This search in novoseek gives 739 results, and the query is mapped to GLO1. When you click on the gene, a window pops up showing the synonyms for this term, as shown in the image below.

screenshot_GLO1

Performing this search in PubMed returns 204 results, while a search for one of its synonyms (Lactoylglutathione lyase) returns more than 700 results.

Now have a look at the same synonym search in novoseek and see how it has been interpreted.

screenshot_Lactoylglutathione_lyase

Interesting, isn’t it?

What novoseek does is perform a concept search: it analyzes all the synonyms (alternative names) of the current search term in order to return all the corresponding results. This makes searches easier and more comprehensive, as you do not have to look any further. The information extraction process is illustrated below.

novoseek_process_synonyms
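The difference between a literal keyword search and a concept search can be sketched in a few lines: map the query to a concept ID through the synonym dictionary, then match documents on any of that concept's names. This is a minimal sketch under simplifying assumptions, not novoseek's pipeline: the dictionary and documents are hypothetical, matching is plain substring search, and a real system would match annotated mentions and would also have to disambiguate homonyms.

```python
# Hypothetical dictionary: concept ID -> all known names (synonyms).
DICTIONARY = {
    "GLO1": ["GLO1", "Lactoylglutathione lyase", "glyoxalase I"],
    "BRCA1": ["BRCA1", "BRCA 1", "BRCC1",
              "breast cancer type I susceptibility protein"],
}

# Reverse lookup built once: lower-cased name -> concept ID.
NAME_TO_CONCEPT = {name.lower(): cid
                   for cid, names in DICTIONARY.items() for name in names}

# Hypothetical mini-corpus keyed by made-up document IDs.
DOCS = {
    "pmid:1": "Glyoxalase I activity in diabetic tissue.",
    "pmid:2": "Expression of lactoylglutathione lyase in tumours.",
    "pmid:3": "GLO1 copy number variation and anxiety.",
    "pmid:4": "BRCC1 mutations in familial breast cancer.",
}


def keyword_search(query, docs=DOCS):
    """Literal matching: only documents that contain the query string."""
    q = query.lower()
    return {doc_id for doc_id, text in docs.items() if q in text.lower()}


def concept_search(query, docs=DOCS):
    """Map the query to a concept, then match on any of its synonyms."""
    concept = NAME_TO_CONCEPT.get(query.lower())
    if concept is None:
        return keyword_search(query, docs)  # unknown term: fall back
    hits = set()
    for name in DICTIONARY[concept]:
        hits |= keyword_search(name, docs)
    return hits
```

Here `keyword_search("GLO1")` finds only `pmid:3`, whereas `concept_search("GLO1")` and `concept_search("Lactoylglutathione lyase")` both return all three GLO1 documents, which mirrors, on a toy scale, the gap between the PubMed and novoseek result counts discussed above.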

You can now see the benefit of this analysis technology: it returns all the relevant publications regardless of the synonym used.

I could now tell you about the importance of context for disambiguating results and returning the publications you need to read… but we will do that in another post!