Data organization and interaction


This is the third and last post in a series of posts about the technology behind novoseek. In the first post we talked about the problem of synonyms, and in the second we showed the challenge of dealing with homonyms. In this third post we would like to share some thoughts on data organization and its representation, an issue common to any type of web application.

I must confess, I am fond of data visualization. I love all those keynotes and graphics with great colors and shapes. They catch my attention even when I do not understand them or when they give me irrelevant information. Some of them, however, are really amazing. When I was doing my research in bioinformatics I was desperately looking for ways to represent all the data I had on protein interactions so that I could get the big picture first and then focus on the details. I found a few amazing things at Visual Complexity, but they were not flexible enough. I must confess that I failed in my attempt to apply my programming skills to this task.

AKS

When I joined Bioalma and started promoting our first product, AKS, I was really excited about one of its main features, which represents the relations among concepts based on their co-occurrence in the literature. It is a great piece of software that lets you see at a glance which concepts are most closely related and visualize clusters. However, the information behind it was not always easy to understand.
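To give an idea of what "relations based on co-occurrence" means, here is a minimal sketch in Python. It is not the AKS implementation, only an illustration under a simple assumption: two concepts are related whenever they are annotated in the same abstract, and the strength of the link is the number of abstracts they share. The function name `cooccurrence` and the sample concepts are made up for this example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(abstracts):
    """Count how many abstracts mention each unordered pair of concepts."""
    pairs = Counter()
    for concepts in abstracts:
        # sorted(set(...)) removes repeats and gives each pair one canonical
        # order, so (a, b) and (b, a) are counted as the same link.
        pairs.update(combinations(sorted(set(concepts)), 2))
    return pairs

# Hypothetical annotations: one list of concepts per abstract.
abstracts = [
    ["BRCA1", "breast cancer", "DNA repair"],
    ["BRCA1", "breast cancer"],
    ["DNA repair", "BRCA1"],
]
links = cooccurrence(abstracts)
```

The resulting counts can be read as the edge weights of a concept graph: the higher the count, the closer two concepts would be drawn, which is how clusters become visible.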

When we started the novoseek project we decided to embrace the KIS (Keep It Simple) principle. Although we try to keep up with this philosophy, I must confess that in our meetings with the development manager, the marketing director and an art director, it is hard to say whether we are even close to it.

Regarding the novoseek interface

As you might remember from previous posts, novoseek analyzes all the literature with an algorithm that integrates database information and takes into account the context of terms to annotate them. So when we started the project and had all the data from that analysis, we asked ourselves, “What should we do with it? How could the user take advantage of all this analysis?” Obviously, putting it in a search engine that is simple, clear and easy to use was our best choice. We needed to start organizing the data and designing a visualization interface to interact with it.

We needed to arrange all that information in a data structure that could provide a fast, efficient and scalable service. Scalability was a really important concern: we didn’t want to have to change the data model once the system needed to serve millions of simultaneous requests.

We also needed a picture of what type of information we wanted to display and how the user could interact with it. Based on our experience we knew that we needed to develop something not only simple but also familiar to the end user. We knew that an advanced interface with lots of information would likely disconcert users. Our CEO was always telling us “we need to do something that doesn’t need to be explained to use it and understand it”. And so we did.

The indexing technology and the automatic disambiguation method enable novoseek to find the most relevant documents faster and more efficiently. We decided to take advantage of that and build what we called the Profile. The Profile uses the results of our text-mining analysis to build a list of the concepts most relevant to the query. We thought this list would be really helpful since it gives a quick idea of the main themes related to the query. Since we thought the list needed to be interactive, we then added some functionality to it: whenever you click on one of the terms in the list, you get all the documents that match both the original query term and the clicked concept. You can check examples with our user cases.
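The idea of the Profile and its click behavior can be sketched in a few lines of Python. This is only an illustration, not novoseek's actual code: the toy index `DOCS`, the functions `search` and `profile`, and the ranking by plain document count are all assumptions made for the example (the real relevance measure is not described here).

```python
from collections import Counter

# Hypothetical toy index: document id -> set of concepts annotated in it.
DOCS = {
    "d1": {"p53", "apoptosis", "cancer"},
    "d2": {"p53", "cancer"},
    "d3": {"p53", "cancer", "mdm2"},
    "d4": {"insulin", "diabetes"},
}

def search(*concepts):
    """Return the ids of documents annotated with every given concept."""
    wanted = set(concepts)
    return sorted(doc_id for doc_id, found in DOCS.items() if wanted <= found)

def profile(query, top_n=5):
    """List the concepts that appear alongside `query`, most frequent first."""
    counts = Counter()
    for doc_id in search(query):
        counts.update(DOCS[doc_id] - {query})
    # Sort by descending count, then by name, so ties have a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

themes = profile("p53")               # the list shown next to the results
clicked = search("p53", "apoptosis")  # what a click on "apoptosis" returns
```

Clicking a concept is then just a second search that requires both terms, which is why the list works as a quick way to narrow down a query.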

After that, we added many other features, some of which are really handy! Others may be a bit more hidden, aimed at advanced users who want to make the most of the system.

However, understanding the users, the way they interact with us, what is useful and what can be removed to keep up with the KIS philosophy is an endless, ongoing process. At Bioalma, we are always studying what would happen if we put this menu here, chose this color or set up this log-in box there. Indeed, we mix our own craziness with user suggestions, and sometimes we clearly come up with a different (or strange) interface. So stay tuned and find out soon the results of our conversations with users and our own schizophrenia.

What is your path to successful searches in PubMed?


The other day Valentin and I were discussing how scientists confront the time-consuming task of looking for information in the scientific literature. From my experience as a scientist and from conversations with friends and colleagues, we found that many of them end up frustrated when searching, and that their path to successful searches in PubMed can be summarized in one of the following four options:

  1. Direct
     We manage to find results from PubMed, although in some cases we have to face the use of MeSH terms. Hashtag #nsdirect

  2. Ask for help
     After some time struggling with the PubMed search engine without any success, we decide to ask a colleague or a librarian for help. Some of our friends told us that they don’t take this path without first trying to do it themselves. Hashtag #nsafh

  3. Alternatives
     After performing some searches in PubMed without success, we just look for alternative search engines like Google, third-party PubMed tools or, obviously, novoseek (we asked our friends, so what do you expect them to use besides PubMed? ;-) Hashtag #nsalt

  4. Beer. Why not?
     I mean, after a hard day's work, what is better than having a beer and facing the challenge some other day? Hashtag #nsbeer

Take a look at the image below, it’s so funny and so real ;-)

[Image: What is your path to successful searches in PubMed?]



Now we need you to act! What is your path? Tweet this post to your followers, adding the #hashtag that best describes you.

  1. Direct, this is my path to successful searches in PubMed
  2. Ask for help, this is my path to successful searches in PubMed
  3. Alternatives, this is my path to successful searches in PubMed
  4. Beer, why not?, this is my path to successful searches in PubMed

Have a great weekend.

Thank you.


On the second of February 2009 we officially announced novoseek. Now we are a year older. We have learned a lot along the way from our users, partners and competitors. We have gone through some difficult and some really exciting moments. We continue to develop our system in order to give our users an alternative to PubMed that is easier to use and returns relevant results faster.

New challenges are coming up this year. We are eager to show the new features that novoseek is going to offer, which most of you will love while others will just think “why didn’t I come up with that idea” ;-)

Stay tuned and don’t miss this image that represents a few of the things we have been doing this past year. We hope you like it.

Thank you!

novoseek, the first year


Scientific literature helps to avoid tricky situations


In ‘There’s Something About Mary’ (1998), high school senior Ted Stroehmann (Ben Stiller) suffers a tremendous accident triggered by an awkward situation that takes place while he is in the bathroom.

This scene came to mind the moment Christian, inspired by a discussion over a post on the NCBI ROFL blog, sent me a novoseek search result for penis zipper. I must confess that I was shocked to see that novoseek found 16 Medline articles on the subject and that the earliest document goes all the way back to the 70s. I guess what really surprised me is that in 2006 it still seemed to be an unsolved problem.

One of my favorite studies among the results was the one from The American Journal of Emergency Medicine comparing two different methods of emergent zipper release. In it, they study an alternate method of zipper release that is up to 65.3 seconds faster than the standard procedure, which takes 15 seconds over the minute. It also concludes that the “optimal procedure is also dependent on the location of the entrapped tissue and the type of zipper”. What is also interesting about this study is that it was done with volunteers. I can hardly imagine taking part in this type of study. And on top of that, testing on different types of tissue? Wow!!

Anyway, another interesting result is that novoseek didn’t find any awarded grant for this kind of research. Does that mean that it is not an interesting research issue anymore? Has it been solved already?

I guess that if Ted Stroehmann had only known a way to get out of his situation in 10.8 seconds, it wouldn’t have been such an embarrassment, and the movie wouldn’t have been as funny.

Open access vs Free access


We have recently added new articles from PubMed Central to novoseek. This new feature provides access to “full text publications”, and we have noticed that there is quite some misunderstanding about what has actually been indexed. So let us explain it in detail.

Indeed, we have included the Open Access subset of PubMed Central. What is that? Well, Open Access is the free online access to research papers. Obviously, this definition has led to some confusion and misuse of the term “open” access, as it is often considered a synonym for “free” access.

The first definition of open access came out of the Budapest Open Access Initiative and was later revised in Bethesda and Berlin. This led to what Peter Suber calls the BBB open access definition, on which most of the Open Access movement agrees.
The Open Access definition rests on two ideas:

  • Free-of-charge accessibility
  • Removal of permission barriers

Consequently, these ideas make distribution, copying and the production of derivative works possible for anyone.

Interestingly, we’ve observed that most of the time open access is used as a synonym for free access. This is not quite correct, since open access goes beyond just free access to content. For a better understanding of the differences between them, have a look at the graphic below.

open-access

PubMed Central is a free, peer-reviewed digital archive of biomedical and life sciences literature developed and managed by the NIH. It gives free access to articles, some of which are open access. As we have discussed in previous posts, the NIH public access policy has ensured access to the published results of NIH-funded research. However, it does not say whether that has to be through a free access or an open access policy.

In novoseek, we have analyzed the full text of the open access subset with our text mining algorithms and made it public. So now you will find full text articles in which you can highlight all the relevant keywords and enjoy the great features of our technology.

We hope you like this new data set, and we more than welcome your comments and suggestions.