Workflow shapes
Capturing methodologies from scientific literature

The methods we choose for any analysis strongly influence its outcome. Across science it is also common to find many different methods that perform the same basic task, each making its own assumptions about the data. This variety of methods and assumptions makes it harder to communicate the methodology used in an experiment, and harder for new researchers to decide which methods to use in their own experiments.

My work has centred on the computational collection and analysis of methodologies from full-text scientific articles. We have already conducted a survey of 22,000 articles related to phylogenetic analysis. The results highlight interesting field-specific and temporal trends in phylogenetic practice. We have also investigated the influence of 'expert' authors relative to others in their field.

Software, services and resources for text mining and bioinformatics

The level of sophistication in text mining software is very high. For various reasons, however, much of this software is not made available to the wider research community. I have been working on core text mining software and services (see below) to automate common text mining tasks, such as corpus collection. I have deliberately aimed to make my software available through a range of interfaces (browser, web service, API, local GUI) that support different needs. Most of the software I have written is in Java, which runs on a wide range of machines. I have also made the vast majority of the code for these projects available to view, download and generally mess around with, under standard open source licenses. My software projects have all made use of third-party open source libraries and projects; without these, much of the work would have been significantly more laborious.

Information mining from full text scientific literature

Scientific literature is now widely available in a variety of electronic formats. This enables us to use computational methods to survey vast literature collections for information. An article can contain many different forms of information that are relevant to other researchers (e.g. citations, protocols, algorithms, hypotheses, sequence data). As a preliminary step towards isolating and extracting these different forms of information, I have developed a text classifier that labels the sections of an article (e.g. Introduction, Results). This lets us target information mining software at the specific sections of an article most likely to contain the information we are seeking, which improves accuracy and reduces the computational effort required.
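The idea of labelling sections and then mining only the relevant ones can be illustrated with a small sketch. This is not the classifier described above: it is a toy keyword scorer written in Python for brevity, and the cue words, the function names `label_section` and `select_sections`, and the sample paragraphs are all invented for illustration. A real classifier would learn its features from labelled articles.

```python
import re
from collections import Counter

# Hypothetical cue words for each section label (assumption, not learned from data).
SECTION_CUES = {
    "Introduction": {"background", "previously", "aim", "understood", "here"},
    "Methods": {"aligned", "using", "performed", "parameters", "software"},
    "Results": {"found", "showed", "observed", "significant", "figure"},
    "Discussion": {"suggest", "consistent", "future", "implications", "may"},
}


def label_section(text):
    """Return the section whose cue words occur most often in the text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        section: sum(tokens[word] for word in cues)
        for section, cues in SECTION_CUES.items()
    }
    best = max(scores, key=scores.get)
    # No cue word matched at all: refuse to guess.
    return best if scores[best] > 0 else "Unknown"


def select_sections(paragraphs, wanted):
    """Keep only the paragraphs labelled as the wanted section."""
    return [p for p in paragraphs if label_section(p) == wanted]


# Invented sample paragraphs standing in for the text of an article.
paragraphs = [
    "The evolutionary history of this gene family is poorly understood. "
    "Here we aim to resolve it.",
    "Sequences were aligned using ClustalW and trees were built using "
    "default parameters.",
    "We found that the two clades showed significant divergence.",
]

# Only the Methods-like text is passed on to downstream mining software.
methods_only = select_sections(paragraphs, "Methods")
```

Filtering before mining means that a tool looking for protocol details, for example, only has to process the paragraphs in `methods_only` rather than the whole article.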

Publications
2009
Afzal H, Eales J, Stevens R and Nenadic G
Mining Semantic Networks of Bioinformatics e-Resources from the Literature
Proceedings of the SWAT4LS Workshop, 2009, Amsterdam, The Netherlands, Full Text (PDF), Proceedings
Sarafraz F, Eales JM, Mohammadi R, Dickerson J, Robertson DL, Nenadic G.
Biomedical event detection using rules, conditional random fields and parse tree distances
Proceedings of the Workshop on BioNLP: Shared Task, NAACL-HLT 2009, Boulder, USA, Full Text (PDF), Proceedings
2008
Eales JM, Stevens RD, Robertson DL.
Full-Text Mining: Linking Practice, Protocols and Articles in Biological Research
Proceedings of the BioLink SIG, ISMB 2008, Toronto, Canada, Full Text (PDF), Proceedings
Eales JM, Pinney JW, Stevens RD, Robertson DL.
Methodology capture: discriminating between the "best" and the rest of community practice.
BMC Bioinformatics 9:359, Full Text (HTML), Full Text (PDF), doi:10.1186/1471-2105-9-359, UK PubMed Central
Software and Web Services
Software
Article Section Classifier: A classifier that determines which part of the standard scientific article structure a supplied piece of text comes from. Google Code Project Site
Full Text Article Downloader: An intelligent software agent that downloads the full text of scientific articles in PDF form. Google Code Project Site
Services (Web or otherwise)
Article Section Classifier Web App: A web application that lets you use the text classifier described above, as well as some other services. It also links to the WSDL/SOAP implementations of these services.
Full Text Web Services: A static list of SOAP/WSDL web services that can be used in the processing of full text articles. More services are currently being added.
The text mining repository (HomeTextRepo): A resource website that describes, links and structures resources used in text mining research. It is intended to be a central repository for text mining resources.
Poster Presentations
2007
Eales JM, Pinney JW, Stevens RD, Robertson DL.
Capturing phylogenetic workflows from scientific literature
ISMB/ECCB 2007, Vienna, Austria, Poster
Eales JM, Stevens RD, Robertson DL.
Capturing best practice phylogenetic workflows from scientific literature
PopGroup 2007, Manchester, UK, Poster