Entries Tagged 'Media coverage' ↓

Interview with Christian Blaschke, scientific director of novoseek


This interview with Christian Blaschke, PhD, scientific director at novoseek, was originally published in Spanish under the title Bioinformatics in the business world on José María Fernández González’s blog. José María Fernández González is a bioinformatician at CNIO (Madrid, Spain) and has developed web services for iHOP. To share it with the rest of the scientific community and with English speakers, we have translated it and published it here.



I’ve always wondered how bioinformatics development works in the business world, because the objectives are different. In the scientific world you almost always have to publish before your competitors, whereas in the business world the objectives have more to do with the versatility and robustness of the tools and systems developed.

Therefore, when I got the opportunity to put a series of questions to someone who is “on the other side”, I grabbed the chance. Christian Blaschke, who works at Bioalma, answered my questions about the development of novoseek, a text-mining product.

Christian Blaschke is a graduate in Plant Physiology from the University of Salzburg and has a Ph.D. in Molecular Biology from the Autónoma University in Madrid. He began his career developing data-mining and information extraction systems in the Protein Design Group. Today he is the Research & Development Director and Principal Investigator in several European projects in which Bioalma takes part. He was also the coordinator of the first edition of BioCreAtivE, an assessment of text-mining algorithms. He has been conducting research in text mining for more than 10 years.

  1. In general, for ordinary people, what is novoseek?
  2. It is a Web 2.0 search engine for scientific literature and also an alternative to PubMed for searching Medline, full-text articles from PubMedCentral and U.S. Grants. It is based on a unique text-mining technology that analyzes and processes the nearly 20 million publications available in PubMed and the 3 million concepts found in the literature. Our technology takes into account the synonyms and homonyms of the search term, which allows it to return relevant and complete results in the very first search. In addition, a profile (which appears in the left bar of the browser) is created for each search. This profile displays important concepts related to the search so that they can be used as filters to make the search more specific. Thus, the user finds the publications he needs to read more simply, quickly and reliably.

  3. What was the original idea behind creating this tool?
  4. In the late 90s I was fortunate to work with Alfonso Valencia (then working at the National Center for Biotechnology in Madrid) on subjects dealing with text processing and information extraction. He was among the first to work on these subjects in the field of molecular biology and bioinformatics, and I was able to explore many ideas. At that time we were interested in extracting protein interactions and in analyzing DNA microarray results based on the knowledge published in the scientific literature. Later we realized we could offer the benefits of the technologies we had developed to a wider audience and find a way in which biomedical researchers could benefit from them. So at Bioalma we started working on products based on text analysis in the biomedical field. One could say that novoseek is the third generation of this product development and that we have now brought it online.

  5. How many people were necessary for the development of novoseek? Did they / Do they have highly specialized profiles (text mining, databases, etc …)?
  6. We started with a few people and there are currently a dozen of us actively taking part in the development of novoseek. We are a multidisciplinary team that includes people trained in many areas, from software engineers, database development experts, bioinformaticians, biochemists and pharmacists to experts in artificial intelligence. In addition, we have long been working with texts and analyzing natural language. This is an area in which most of our team has experience.

  7. Are there critical points with the current tools and web systems, such as keeping the information updated and consistent? Did you have / Do you have many issues?
  8. At first it wasn’t easy because the set of documents included in PubMed was much larger than anything we had processed before. But I have to say that we have a great team, and today we integrate documents published in PubMed (abstracts of publications) and PubMedCentral (full text) every day.

  9. How do you get feedback from regular users? I mean, do they propose interesting features, or do they help you detect problems or system failures?
  10. Novoseek is a service based on state-of-the-art technology. The people working in the company are quite young, know the internet well and are concerned with constantly improving the user experience. User feedback is therefore very important to us. We have opened discussion platforms that each have a particular role. On uservoice, users tend to send us suggestions about new developments and usability. We study them and include them in our development “road map”. There are things that are easy to implement and take little time (like export to CiteULike) and others that we need to assess and that may take longer (such as search in figures and images). Twitter (@novoseek) is a tool we use for real-time communication with our users and to share information such as interesting publications, news and links for our community, surveys, or more direct feedback. For example, I remember the time someone asked us if novoseek was down and within 5 minutes, 5 people (including us) told her that it was not.

    I admit that there is a subtle balance between what people want in the web-based service and what we think is good for efficient searches and a nice user experience. In general, user feedback helps us a lot.

  11. If you had to design a tool with the same target as novoseek from scratch today, with the background that you now have, what would you not do?
  12. Our professional education is very technical and this was reflected in our previous products. They were very powerful but sometimes too complex for our target audience. We thought that more (functionality) was better than less and we did not give enough consideration to our users’ point of view. For us this has been quite a journey in which we learned a lot. In recent months we have conducted many usability tests, and we realized that there are elements that are not clear enough. So we are currently working on a redesign of novoseek. This should help users better understand how it differs from PubMed and what it actually brings them.

  13. In the current scientific landscape of web 2.0, web services, bibliographic social networks (such as CiteULike, Zotero 2.0, …), etc., which is beginning to go beyond PubMed or Google Scholar, are you facing many challenges in linking (or providing links) to these resources?
  14. Given our work and activity online, we are very familiar with the other web 2.0 tools that are now part of the life of a novoseek user. They are tools that we also use ourselves and that we consider important because they complement the service offered by novoseek. Supporting them is a requirement we must meet so that people keep using novoseek. So far, we have done it for CiteULike, and it is pending for Zotero 2.0 and Mendeley. As these web 2.0 services grow in number and their use increases among scientists, novoseek has to become more compatible with them.

  15. Nearly all bioinformatics services today (either academic or commercial) offer programmatic APIs. What can you tell us about yours?
  16. For novoseek’s API we have used REST based on the XML standard because it is relatively simple to use and there are libraries for most programming languages available today.
    As for the functionality it offers, we tried to bring most things that can be done in novoseek to the API. One can run searches based on words and biological concepts (such as genes, diseases, drugs or chemicals) to retrieve documents. The documents offer all the entries included in novoseek, and these can be used as a basis for new text-mining services. The API also offers the key concepts that are calculated for a search, which relate to the documents returned and characterize that set of documents.
    Our main goal is to make it possible to integrate the functionality of novoseek into other platforms, for example to enrich the content of web pages or blogs. Furthermore, it is now very common to do “mash-ups” between different systems to create something totally new. We wanted people to be able to use novoseek in new ways beyond what might occur to us. Anyone interested can request an API key at http://api.novoseek.com

  17. What are the future plans for a tool like novoseek?
  18. In the future we want to extract more and more information from the documents indexed in novoseek to allow ever more powerful searches. One problem is that in PubMed, for example, you cannot search for a person. If you search for “John Smith”, the system will return documents where the name refers to different people. And in documents where “J Smith” appears as an author, you do not know whether it refers to “John Smith” or “Jeff Smith”. Another problem that requires a lot of work is finding specific information, such as which drugs treat a disease or what the genetic causes of a disease are. We want to solve these problems for our users to save them time spent searching, so that they can devote it to actually reading the documents that are relevant to them.

  19. Can you tell us more about the infrastructure needed to provide this service?
  20. At first we set up novoseek on a small cluster of Linux machines installed in our offices in Madrid. But we realized that keeping a 24-hour service running with minimum disruption was not easy. We depended on a single Internet line that failed several times in the first months. The air conditioning system was not reliable enough and we could not withstand power outages of over 15 minutes. After evaluating many options, such as hosting machines in a data center or colocating our own hardware in one, we chose the web services offered by Amazon (known as AWS, Amazon Web Services, which includes EC2 and S3). Amazon offers what is known today as “the cloud”, a system of virtual machines that can be configured flexibly. It is easy to create more nodes to meet our growing needs, and we pay only for what is actually used. The decision to migrate novoseek to the Amazon platform solved the problems I mentioned before, because it is a very stable environment that has not failed us so far.
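
To make the API answer above more concrete, here is a minimal sketch of what a client for a REST service that returns XML could look like. Everything specific in it is an assumption made for illustration: the base URL, the parameter names (q, key, type) and the XML layout are invented and are not taken from the novoseek API documentation. It only shows the general pattern of building a request URL and reading documents and their concepts out of an XML reply.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint; the real novoseek API may use a different URL.
BASE_URL = "http://api.example.org/search"


def build_search_url(query, api_key, concept_type=None):
    """Build a GET URL for a search (parameter names are assumptions)."""
    params = {"q": query, "key": api_key}
    if concept_type is not None:
        params["type"] = concept_type
    return BASE_URL + "?" + urlencode(params)


# Invented sample reply, standing in for what such a service might return.
SAMPLE_RESPONSE = """<results total="2">
  <document id="1">
    <title>Sample article A</title>
    <concept type="disease">Addison's disease</concept>
    <concept type="disease">hyperpigmentation</concept>
  </document>
  <document id="2">
    <title>Sample article B</title>
    <concept type="disease">Addison's disease</concept>
  </document>
</results>"""


def parse_documents(xml_text):
    """Turn the XML reply into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    documents = []
    for doc in root.findall("document"):
        documents.append({
            "id": doc.get("id"),
            "title": doc.findtext("title"),
            "concepts": [c.text for c in doc.findall("concept")],
        })
    return documents


url = build_search_url("Addison's disease", "MY_KEY", "disease")
documents = parse_documents(SAMPLE_RESPONSE)
for d in documents:
    print(d["id"], d["title"], d["concepts"])
```

Because the reply is plain XML over HTTP, the same pattern works in most programming languages, which is the advantage mentioned in the interview.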

Thank you to José María Fernández González and Christian Blaschke for the time and dedication they gave to this interview.

You can get an API key here


A use case inspired by Flash Forward and a poll


Flash Forward is unsurprisingly one of the most exciting TV shows of this year, and I’m dying to watch a new episode every week. When I do, I have a great time, and it leads to great discussions at the office about whether what the characters see is really the future and how it can be changed. A few weeks ago I noticed the interesting case of Edward Ned (also called Ned Ned), whose flash forward finds him in a club with totally black skin, whereas he is currently white. Dr. Olivia Benford chooses to treat him as a regular patient regardless of his flash forward, but her colleague Dr. Bryce Varley, now totally changed by his own flash forward, has another opinion. He thinks that this color change may be due to a disease, which would explain many things about this patient. This is why he decides to turn to an online search engine to look for more information.

In order to learn more about Ned’s health condition, Bryce looks for “Pigment Change” in a symptoms search engine. His search returns 107 results and helps him explain afterwards that:

- Ned may have Addison’s disease, which would explain why he is black in the future (as he sees himself in his flash forward)
- The disease forces his body to make melanin compounds instead of adrenaline
- Without adrenaline his body is unable to build a proper stress response (which explains why he is so serene)

Obviously, novoseek has different goals from the webpage Bryce is using, as it is designed for exploring the scientific literature. Nevertheless, we can search for that disease, Addison’s disease, and see what the results look like.

  1. A search for Addison’s disease via the Advanced Search panel returns 2,563 results in Medline.
  2. Observing the related concepts sidebar we can see that the most relevant diseases related to Addison’s disease are: Adrenal insufficiencies, primary adrenal insufficiency, autoimmune addisons disease, diabetes and Hyperpigmentation (with a relevance of 41%).
  3. [Screenshot: diseases related to Addison’s disease]
  4. Also, the most relevant related Signs and Symptoms indicate: alopecia, fatigue, malaise, cryoglobulinemic purpura, scalp pruritus…
  5. We click the “hyperpigmentation” disease and it is added to the current search: there are now 66 results in Medline.
  6. From there, we can start exploring the literature and read interesting publications such as “Adrenal autoantibodies and organ-specific autoimmunity in patients with Addison’s disease”, “Generalized pigmentation due to Addison disease” and “Long-lasting subclinical Addison’s disease”.
  7. Reading these is a good starting point for learning more about the disease, its origins and possible treatments.
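
The narrowing in the steps above (a broad search, a sidebar of related concepts, then a click that adds a concept as a filter) can be illustrated with a small sketch. This is a toy model only: the four documents are invented, and the share of results mentioning each concept is a stand-in that is not claimed to be how novoseek computes its relevance percentages.

```python
from collections import Counter

# Invented toy corpus: each document is tagged with a set of concepts.
DOCS = [
    {"id": 1, "concepts": {"Addison's disease", "hyperpigmentation"}},
    {"id": 2, "concepts": {"Addison's disease", "adrenal insufficiency"}},
    {"id": 3, "concepts": {"Addison's disease", "hyperpigmentation", "fatigue"}},
    {"id": 4, "concepts": {"diabetes"}},
]


def search(docs, required):
    """Return the documents tagged with every required concept."""
    required = set(required)
    return [d for d in docs if required <= d["concepts"]]


def related_concepts(results, exclude):
    """List other concepts with the share of results that mention them."""
    counts = Counter(
        c for d in results for c in d["concepts"] if c not in exclude
    )
    total = len(results)
    return [(concept, n / total) for concept, n in counts.most_common()]


# Step 1: broad search.
first = search(DOCS, ["Addison's disease"])
# Step 2: the "sidebar" of related concepts for that result set.
sidebar = related_concepts(first, {"Addison's disease"})
# Step 3: clicking a concept adds it to the search and narrows the results.
refined = search(DOCS, ["Addison's disease", "hyperpigmentation"])

print(len(first), "results, then", len(refined), "after filtering")
```

Each added concept can only shrink the result set, which is why the Medline count drops once “hyperpigmentation” is added to the search.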

Obviously, this complementary information helps save Ned during surgery, and Dr. Olivia Benford now has to admit that Ned’s flash forward actually helped save him. This shows how important research is for learning more about a disease, its symptoms and the existing treatments. Furthermore, a search for Addison’s disease in US Grants could show which studies on this disease are currently under way.

And now, I’m asking you:

Do you think Dr. Bryce Varley should use novoseek next time?


NIH public access policy made permanent, new challenges


Good news!!! Today I saw that the NIH public access policy will be made permanent. Under the policy, articles become freely available within 12 months of publication, which is quite some time in an area as competitive as science. Since the policy was implemented, the number of manuscripts sent to PMC has increased to over 3,000 new articles each month.

If the information was already overwhelming with 2,000 new articles per day (more than 18M scientific articles altogether), free access to full articles will further increase the amount of data relevant to biomedicine. The increase is not only in the number of articles available but also in the total amount of information, since the whole text of each article is going to be accessible. This brings interesting new challenges.

The question now is, how do we get through all the new information quickly and efficiently? Systems that help retrieve relevant scientific information, such as novoseek, are needed more than ever.

However, is it really useful for scientists to have the results freely available a year late? Obviously it is not the best possible scenario, but analyzing the literature together with grant information could give us an insight into which articles are likely to come next.