April 16th, 2010 | Valentin | Media coverage | Tags: bioinformatics, blog, cloud computing, PubMed, PubMed central, quality of service, User experience, web 2.0
This interview with Christian Blaschke PhD, scientific director at novoseek, was originally published in Spanish under the title Bioinformatics in the business world on José María Fernández González’s blog. José María Fernández González is a bioinformatician at CNIO (Madrid, Spain). He has developed web services for iHOP. To share the interview with the rest of the scientific community and with English-speaking readers, we have translated it and published it here.
I’ve always wondered how bioinformatics development is done in the business world, because the objectives are different. In the scientific world you almost always have to publish before your competitors, whereas in business the objectives have more to do with the versatility and robustness of the tools and systems developed.
Therefore, when I got the opportunity to put a series of questions to someone who is “on the other side”, I grabbed the chance. Christian Blaschke, who works at Bioalma, answered my questions about the development of novoseek, a text-mining product.
Christian Blaschke is a graduate in Plant Physiology from the University of Salzburg and has a Ph.D. in Molecular Biology from the Autónoma University in Madrid. He began his career developing data-mining and information-extraction systems in the Protein Design Group. Today he is the Research & Development Director and Principal Investigator in several European projects in which Bioalma takes part. He was also the coordinator of the first edition of BioCreAtivE, an assessment of text-mining algorithms. He has been conducting research in text mining for more than 10 years.
- In general, for ordinary people, what is novoseek?
It is a Web 2.0 search engine for scientific literature and also an alternative to PubMed for searching Medline, full-text articles from PubMedCentral and U.S. Grants. It is based on a unique text-mining technology that analyzes and processes the nearly 20 million publications available in PubMed and the 3 million concepts found in the literature. Our technology takes into account the synonyms and homonyms of the search term, which allows it to return relevant and complete results in the very first search. In addition, a profile (which appears in the left bar of the browser) is created for each search. This profile displays important concepts related to the search so that they can be used as filters to make the search more specific. Thus, users find the publications they need to read in a simpler, faster and more reliable way.
- What was the original idea behind this tool?
In the late 90s I was fortunate to work with Alfonso Valencia (then working at the National Center for Biotechnology in Madrid) on subjects dealing with text processing and information extraction. He was among the first to work on these subjects in the field of molecular biology and bioinformatics, and I was able to explore many ideas. At that time we were interested in extracting protein interactions and in analyzing the results of DNA microarrays based on the knowledge published in the scientific literature. Later we realized we could offer the benefits of the technologies we had developed to a wider audience and find a way for biomedical researchers to benefit from them. So at Bioalma we started working on products based on text analysis in the biomedical field. One could say that novoseek is the third generation of this product development and that we have now brought it online.
- How many people were necessary for the development of novoseek? Did they / Do they have highly specialized profiles (text mining, databases, etc …)?
We started with a few people, and currently a dozen of us participate actively in the development of novoseek. We are a multidisciplinary team that includes people trained in many areas: software engineers, database development experts, bioinformaticians, biochemists, pharmacists and experts in artificial intelligence. In addition, we have long been dealing with texts and analyzing natural language. This is an area in which most of our team has experience.
- Are there critical points with the current tools and web systems, such as keeping the information updated and consistent? Did you have / Do you have many issues?
At first it wasn’t easy because the set of documents included in PubMed was much larger than anything we had processed before. But I have to say that we have a great team, and today we integrate documents published in PubMed (abstracts of publications) and PubMedCentral (full text) every day.
- How do you get feedback from regular users? I mean, do they propose interesting features, or do they help you detect problems or system failures?
Novoseek is a service based on state-of-the-art technology. The people working in the company are quite young, know the internet well and are concerned with constantly improving the user experience. Therefore, user feedback is very important to us. We have opened discussion platforms that each have a particular role. In uservoice, users tend to make suggestions about new developments and usability. We study them and include them in our development “road map”. There are things that are easy to implement and take little time (like export to CiteULike) and others that we need to assess and that may take longer (such as search in figures and images). Twitter (@novoseek) is a tool we use for real-time communication with our users and to share information such as interesting publications, news and links for our community, surveys, or more direct feedback. For example, I remember the time someone asked us if novoseek was down, and within 5 minutes, 5 people (including us) told her that it was not.
I admit that there is a subtle balance between what people want in the web-based service and what we think is good for efficient searches and a nice user experience. In general, user feedback helps us a lot.
- If today you had to start from scratch the design of a tool with the same target as novoseek, having the background that you now have, what would you not do?
Our professional education is very technical and this was reflected in our previous products. They were very powerful but sometimes too complex for our target audience. We thought that more (functionality) was better than less, and we did not give enough consideration to the point of view of our users. For us this has been quite a journey in which we learned a lot. In recent months we have conducted many usability tests, and we realized that there are elements that are not clear enough. So we are currently working on a redesign of novoseek. This should help users better understand how it differs from PubMed and what it actually brings them.
- In the current scientific landscape of web 2.0, web services, bibliographic social networks (such as CiteULike, Zotero 2.0, …), etc., which is beginning to go beyond PubMed or Google Scholar, are you facing many challenges in linking (or providing links) to these resources?
Given our work and activity online, we know well the other web 2.0 tools that today are part of the life of a novoseek user. They are tools that we also use ourselves and that we consider important because they complement the service offered by novoseek. It is a requirement that we must meet so that people keep using novoseek. So far, we have done it for CiteULike, and it is pending for Zotero 2.0 and Mendeley. As these web 2.0 services grow in number and their use increases among scientists, novoseek has to become more compatible with them.
- Nearly all bioinformatics services today (either academic or commercial) offer programmatic APIs. What can you tell us about yours?
For novoseek’s API we have used REST based on the XML standard because it is relatively simple to use and there are libraries for most programming languages available today.
As for the functionality it offers, we tried to bring most things that can be done in novoseek to the API. One can do searches based on words and biological concepts (e.g. genes, diseases, drugs or chemicals) to retrieve documents. The documents offer all the entries included in novoseek, and these can be used as a basis for new text-mining services. The API also offers the key concepts that are calculated for a search, which relate to the documents returned and characterize this set of documents.
Our main goal is to offer the possibility of integrating the functionality of novoseek into other platforms, for example to enrich the content of web pages or blogs. Furthermore, it is now very common to do “mash-ups” between different systems to create something totally new. We wanted people to be able to use novoseek in new ways beyond what might occur to us. People interested can request an API Key at http://api.novoseek.com
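To give readers a feel for what working with a REST API that returns XML can look like, here is a minimal Python sketch. It is not novoseek’s actual interface: the `/search` path, the `q` and `key` parameter names, and the XML element names (`results`, `document`, `title`, `concept`) are all assumptions made up for illustration, and the sample response is invented data. Please refer to the official API documentation for the real details.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# ASSUMPTION: the "/search" path is hypothetical; only the host appears in the post.
BASE_URL = "http://api.novoseek.com/search"


def build_query_url(query, api_key, base_url=BASE_URL):
    """Build a GET URL for a search. Parameter names 'q' and 'key' are hypothetical."""
    return base_url + "?" + urlencode({"q": query, "key": api_key})


# ASSUMPTION: invented XML layout and invented data, used only to show the parsing step.
SAMPLE_RESPONSE = """
<results>
  <document id="doc-1">
    <title>Example title A</title>
    <concept type="gene">BRCA1</concept>
    <concept type="disease">breast cancer</concept>
  </document>
  <document id="doc-2">
    <title>Example title B</title>
    <concept type="drug">tamoxifen</concept>
  </document>
</results>
"""


def parse_results(xml_text):
    """Turn the (hypothetical) XML response into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    documents = []
    for doc in root.findall("document"):
        documents.append({
            "id": doc.get("id"),
            "title": doc.findtext("title"),
            # Each concept is kept as a (type, name) pair.
            "concepts": [(c.get("type"), c.text) for c in doc.findall("concept")],
        })
    return documents


if __name__ == "__main__":
    print(build_query_url("breast cancer", "YOUR_API_KEY"))
    for document in parse_results(SAMPLE_RESPONSE):
        print(document["id"], document["title"], document["concepts"])
```

The sketch parses a local sample string instead of calling the network, so it can be run as is; in a real mash-up the XML text would come from an HTTP request to the URL built above.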
- What are the future plans for a tool like novoseek?
In the future we want to extract more and more information from the documents which are indexed in novoseek to allow ever more powerful searches. One problem is that in PubMed, for example, you cannot search for a person. If you search for “John Smith” the system will return documents where the name refers to different people. Or in documents where “J Smith” appears as an author, you do not know whether it refers to “John Smith” or “Jeff Smith”. Another problem that requires a lot of work is finding specific information such as which drugs treat a disease or what the genetic causes of a disease are. We want to solve these problems for our users to save them time spent searching, so that they can devote it to actually reading the documents that are relevant to them.
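The author-name problem described above can be illustrated with a few lines of Python. This is only a toy sketch of why abbreviated names are ambiguous, not a description of how novoseek or PubMed handle authors; the function names and the third author in the sample list are made up for the example.

```python
from collections import defaultdict


def abbreviate(full_name):
    """Reduce 'John Smith' to 'J Smith' (first initial plus last name)."""
    parts = full_name.split()
    return f"{parts[0][0]} {parts[-1]}"


def group_by_abbreviation(names):
    """Map each abbreviated form to the set of full names that produce it."""
    groups = defaultdict(set)
    for name in names:
        groups[abbreviate(name)].add(name)
    return groups


# Illustrative author list; "Maria Garcia" is an invented extra name.
authors = ["John Smith", "Jeff Smith", "Maria Garcia"]
groups = group_by_abbreviation(authors)

# Keep only the abbreviations that could refer to more than one person.
ambiguous = {abbr: sorted(full) for abbr, full in groups.items() if len(full) > 1}

if __name__ == "__main__":
    print(ambiguous)
```

Here “J Smith” maps to both “Jeff Smith” and “John Smith”, so the abbreviated form alone cannot tell the two people apart; resolving it requires extra information extracted from the documents themselves.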
- Can you tell more about the infrastructure needed to provide this service?
At first we set up novoseek on a small cluster of Linux machines installed in our offices in Madrid. But we realized that keeping a 24-hour service running with minimum disruption was not easy. We depended on a single Internet line that failed several times in the first months. The air conditioning system was not reliable enough and we could not withstand power outages of over 15 minutes. After evaluating many options, such as renting machines in a data center or colocating our own hardware in one, we chose the web services offered by Amazon (known as AWS, Amazon Web Services, including EC2 and S3). Amazon offers what is known today as “the cloud”, a system of virtual machines that can be configured in a flexible way. It is easy to create more nodes to meet our growing needs, and we pay only for what is actually used. The decision to migrate novoseek to the Amazon platform solved the problems I mentioned before, because it is a very stable environment that has not failed us so far.
Thank you to José María Fernández González and Christian Blaschke for the time and dedication they gave to this interview.
February 26th, 2010 | allende | Thoughts, User experience | Tags: novoseek, PubMed, search engines, search path
The other day Valentin and I were discussing how scientists confront the time-consuming task of looking for information in the scientific literature. From my experience as a scientist and from conversations with friends and colleagues, we found out that many of them end up in a frustrating situation when searching and that their path to successful searches in Pubmed can be summarized in one of the 4 following options:
- Direct
We manage to find results from PubMed although in some cases we have to face the use of MeSH terms. Hashtag #nsdirect
- Ask for help
After some time facing the Pubmed search engine without any success, we decide to ask for help from a colleague or a librarian. Some of our friends told us that they don’t take this path without first trying to do it themselves. Hashtag #nsafh
- Alternatives
After performing some searches in PubMed and not succeeding, we just look for alternative search engines like Google, 3rd party PubMed tools or obviously novoseek (we asked our friends; what would you expect them to use besides PubMed?). Hashtag #nsalt
- Beer. Why not?
I mean, after a hard working day, what is better than having a beer and facing the challenge some other day? Hashtag #nsbeer
Take a look at the image below, it’s so funny and so real.
Now we need you to act! What is your path? Tweet this post to your followers, adding the #hashtag that best describes you.
- Direct, this is my path to successful searches in Pubmed
- Ask for help, this is my path to successful searches in Pubmed
- Alternatives, this is my path to successful searches in Pubmed
- Beer, why not?, this is my path to successful searches in Pubmed
Have a great weekend.
December 14th, 2009 | Valentin | Events, librarians | Tags: blog, Events, librarians, medical librarians, open access, PubMed, Resources, social media, twitter, web 2.0
Welcome to the MedLib’s Round 1.9. This month, the MedLib’s Round did not specify a special theme. This may have encouraged medical librarians and you to submit articles, as 16 people took part in this round. It is always interesting to read about medical librarians’ concerns, as they use state-of-the-art tools and techniques to work better and face the new challenges of communication and information retrieval, and they are always keen on sharing their impressions of new services and debates. You will notice that this month, the MedLib’s Round leaves room for discussions on Social Media, web 2.0 services and Tips for health.
Thoughts
Social Media in health
- How Can We Help? Roles for Librarians in Public Health on PH/HA News by Alison Aldrich
Alison shares her impressions after attending the American Public Health Association conference. Her post is a nice summary of the conference and of the people who lectured there. The number of sessions about social media for health matters gives an idea of its importance in current discussions. She also raises a great question about the importance of advocating open access to public health research. Indeed, she spent some time in the National Library of Medicine’s booth in the exhibit hall, where she could talk about one common question: “how can I get all of your journal articles for free?“.
- What is Google Wave and why should I care? on Krafty Library.
Michelle sums up what Google Wave is about and how you can use it. This article will be perfect for you to discover, understand and start using Wave in a proper way. She describes how medical librarians have already created dedicated waves but still has doubts about the usefulness of the tool. (Follow her on Krafty)
- Manhunt: Google Wave for Community (Emergency?) Communication posted at Eagle Dawg Blog by Nicole S. Dettmar.
Nicole took part in the Google wave about the manhunt in Seattle that happened in early December. She shows how powerful that wave was for exchanging information and how users helped enrich previous content. At the same time, she raises the problem of false information in waves (a new email tool from Google encouraging real-time exchange) and of spam that may get into these new communication channels. (Follow EagleDawg on twitter)
- FDASM Highlights for UM Stakeholders, pt. 1: Early Presenters as SWOT-Plus posted at Emerging Technologies Librarian by Patricia F Anderson
Patricia wrote a great post (the second) on the highlights of the FDASM. The FDASM is an initiative from the FDA about the use of the internet and social media for health-related communications on FDA-regulated products. This public hearing, held in early November, was a first step towards discovering how to use social media channels to communicate about products. In that field, she recalls how the FDA has already been providing essential resources online. There is more to learn and I encourage you to read it. (Follow Patricia F. Anderson on twitter)
Web 2.0 services for health
- Biomedical search on Biomedsearch by Dr. Shock on Shock M.D.
In this article, we learn with Dr. Shock about a new tool that aims to provide free access to documents relating to the biomedical field. He explains the functions of this search engine and wonders whether it can be an alternative to the redesigned PubMed.
- How to switch from one to the other antidepressant by Dr Shock.
There’s one common problem with antidepressants: either the antidepressant does not work or it provokes side effects. When that happens, you have to switch from one to another. This can be a tricky task, and Dr. Shock presents some great online resources to manage it. (Follow Dr. Shock on twitter)
- Medpedia Now Includes News & Analysis, Alerts, Q&A by Walter Jessen on HighlightHealth
Walter Jessen focuses here on new functionalities recently brought to Medpedia that create a richer experience for users. Medpedia is a medical wiki and has useful functionalities. You will now be able to use the following features in Medpedia: News & Analysis from over 150 professionals, Alerts from real-time web platforms and Answers (a kind of medical Yahoo Answers). He then wonders about the possibility of Medpedia becoming a medical wikipedia thanks to the amount of reliable information it has. (Follow HighlightHealth on twitter)
Tips
- Adding Methodological Filters to MyNCBI posted at Laika’s MedLibLog by Jacqueline
Jacqueline has created a great tutorial to learn how to add methodological filters to MyNCBI. MyNCBI is one’s account on Pubmed. Obviously, creating filters is a must-use option when you are keen on research and need to automate search processes. In that case, she shows how Pubmed allows you to create and run advanced filters to save time. It is always nice to read well-detailed techniques that leave the reader better prepared for searches. Jacqueline writes a lot about Pubmed and has great experience with it. Enjoy learning with her. (Follow Laikas on twitter)
- How to follow Twitter users in Google Reader on Clinical Cases and Images Blog by Dr. Ves Dimov.
In this post, Dr. Ves Dimov (who has a great blog in medicine) shows us a way to easily read Twitter updates in Google Reader without even following the people. Dr. Ves Dimov explains how this approach makes it easier for him to manage multiple information streams. Plus, Google Reader is web-based and can be accessed from any device with an internet connection. (Follow Dr. Ves Dimov on twitter)
- How to make and maintain a Library Twitter account on DigiCMB by Guus Van Den Brekel
Guus shows, using the example of a new twitter account, how to fine-tune parameters to receive all the possibly interesting updates and tweets right in your twitter account. This tutorial will definitely take you to the best practices in terms of interconnection and follow-up! (Follow DigiCMB on twitter)
- Allergy Notes: If you think blogs don’t matter, think again: this blog is the number one search result for “allergic rhinitis guidelines” on Allergy News Updated Daily Blog by Dr. Ves Dimov.
An interesting reflection by Dr. Ves Dimov on the role of blogs when looking for information online through search engines. Starting from the example “allergic rhinitis guidelines”, for which the first result on Google is a blog post (hence listed before NEJM), he shares with us his vision of the future of search results. Blogs and fresh content can play a significant role, but better-quality sources should always be sought. (Follow Dr. Ves Dimov on twitter)
- A review of the main reference management softwares on Knowledge beyond words by Valentin.
In a detailed post, we describe the main citation managers available out there and their particularities. You should consult this article if you need to decide which citation manager is best suited to your needs and uses. It also includes the results of a poll launched on twitter asking people what their favorite citation manager is. (Follow novoseek on twitter)
Thank you for reading this MedLib’s Round on Knowledge beyond words. We’d like to help spread the message from Jacqueline, who is looking for ideas for a logo and a new name for the MedLib’s Round, which is, according to Berci, one of the important things for a blog carnival. So feel free to send her your ideas; it will be much appreciated.
Feel free to subscribe to the RSS feed of MedLib’s Round Blog Carnival. Next MedLib’s Round will be published next January 5th on Dr. Shock’s blog and you can already submit your materials via this form.
October 27th, 2009 | Valentin | News releases | Tags: PubMed, quality of service, search engines
There was quite a surprise yesterday on the world wide web when the redesigned version of Pubmed was released all of a sudden, as Stephanie Fulton said on twitter. However, the surprise did not last long, as the new version was taken down almost right away, which made Librarian EagleDawg write about it. In fact, it looks like Pubmed expected technical difficulties when releasing the redesigned version of its search engine.
Guys, we would like all of the Pubmed users to know that we -novoseek- are not responsible at all for this and that we did not touch or unplug Pubmed at any moment.
You can click the image to view it in 1280 x 800 pixels and save it to your computer.
September 2nd, 2009 | Valentin | News releases, User experience | Tags: presentation, PubMed, search results, User experience
The idea for this post came to me while I was talking with a relative. She is a medical resident and told me that she had to start using Pubmed overnight and found it a bit complicated. This confirmed to me that Pubmed is pretty hard for novices to use, and I took advantage of the opportunity to pitch novoseek to her. Should I remind you that novoseek is a free, easy and intuitive biomedical search engine? Anyway, this discussion with my relative reminded me that some time ago I heard (thanks to followers who attended the MLA meeting in Hawaii) that Pubmed was about to enhance its interface this summer.
This announcement is actually big news for the life sciences community, as Pubmed, the search engine of the National Institutes of Health, is one of the most used among the choices offered on the web today. Given the number of queries it handles every day, improving the user experience was to be expected. Alisha Miles (a medical librarian for a non-profit hospital in Georgia) declared: “these all sound like wonderful improvements. Hopefully, we will get to a point where we can provide input to NLM before some changes are rolled out“.
Interestingly, these changes aim to make it “easier to use“, will “simplify the interface” and “refresh the look” and offer “better organized text on screen“. It is interesting that Pubmed is moving towards a simpler user interface, as novoseek has been doing this from the beginning.
If you are not familiar with Pubmed, let’s have a look at the screenshot below to see how the layout is currently organized.

Compare it to novoseek’s current layout.

We acknowledge that a change -as slight as it can be- was necessary. Indeed, Pubmed is difficult to use. It requires learning, training and improving skill to handle it properly. This is why there are many resources (Check this for instance: 18 ways to improve your pubmed searches) and classes about it. The changes will be the following:
- The tabs will disappear
- A narrower top banner
- Combination of Abstract and Abstract +
- “+” below each citation
- “Send to” option a lot more visible
- The right column will be wider and occupy almost 25% of the screen. It will show: the related articles, “Also try” option and recent activity
If you want to have a sneak preview of what it’ll look like you can check directly on David Gillikin’s presentation, although the images are not optimized for viewing on purpose. To make a long story short: Pubmed is about to go a bit more social and current.
Obviously, I have to compare these changes to novoseek’s features. Pubmed currently has more functions than novoseek. However, novoseek has been developed from the beginning with the goal of making it an easy to use, simple and fast biomedical search engine. Now Pubmed seems to be going that way, too.
In addition, we are adding new functions according to your needs. You can now check your search history, save searches and articles, create alerts and manage labels through my novoseek. These are functions we have developed according to users’ expectations. Indeed, being close to users through twitter and uservoice makes interaction and quick answers to their questions possible. We believe it is one of our strengths compared with Pubmed.
Should you need to discover how to use novoseek to the best of its ability, you should have a look at the presentation below: