Assessing the skills shortage in bioinformatics
Introduction
This report presents the results of a survey of the skills portfolio sought by potential employers of today's graduates in bioinformatics. A number of European companies and academic institutes (predominantly UK-based) involved in bioinformatics, biotechnology or drug development were contacted. EMBER participants and other EMBnet nodes were also included in the survey.
Methods
A questionnaire, which is appended to the end of this document, was sent by e-mail to a number of companies and institutes.
Results
A total of 188 contacts were made during this survey (30 EMBnet nodes and 158 others); the numbers of responses are given in the following table:
Table 1
| Type | Contacts | Responses | No response |
|---|---|---|---|
| EMBnet node | 30 | 5 | 25 |
| Academic | 13 | 2 | 11 |
| Corporate | 145 | 24 | 121 |
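The response rates implied by Table 1 can be worked out directly from the counts. The following is a minimal sketch in Python; the variable and function names are illustrative and not part of the survey.

```python
# Contact and response counts copied from Table 1: (contacts, responses).
contacts = {
    "EMBnet node": (30, 5),
    "Academic": (13, 2),
    "Corporate": (145, 24),
}


def response_rate(contacted, responded):
    """Return the response rate as a percentage of those contacted."""
    return 100.0 * responded / contacted


# Per-category response rates.
rates = {kind: response_rate(c, r) for kind, (c, r) in contacts.items()}

# Overall response rate across all three categories.
total_contacted = sum(c for c, _ in contacts.values())
total_responded = sum(r for _, r in contacts.values())
overall_rate = response_rate(total_contacted, total_responded)

for kind, rate in rates.items():
    print(f"{kind}: {rate:.1f}%")
print(f"Overall: {overall_rate:.1f}% ({total_responded} of {total_contacted})")
```

On these figures, the response rate is similar across the three categories, at roughly one contact in six.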
Of the companies/institutes ('Academic' and 'Corporate' in table 1) that responded, eight did not complete the questionnaire because they did not use bioinformatics and, hence, did not seek such graduates. Of those that did complete the questionnaire, the following table provides a breakdown of the results:
Table 2
| Question | Response: Yes | Response: No |
|---|---|---|
| 1A Sequence, structure and database technologies | 22 | 1 |
| 1B Database management systems | 15 | 8 |
| 1C Database development strategies | 11 | 11 |
| 2A Molecular evolution | 16 | 7 |
| 2B Sequence similarity tools | 23 | 0 |
| 2C Multiple sequence alignment tools | 23 | 0 |
| 3A Secondary, super-secondary and tertiary structure | 21 | 2 |
| 3B Visualisation tools | 20 | 2 |
| 3C Prediction techniques | 16 | 5 |
| 4A Genomic structure | 21 | 2 |
| 4B Sequencing techniques | 17 | 6 |
| 4C Gene prediction methods | 21 | 2 |
| 5A EST clustering and assembly techniques | 17 | 5 |
| 5B High-throughput gene expression techniques | 17 | 6 |
| 5C Microarray design and analysis | 17 | 6 |
| 6A Experimental techniques in protein identification and characterisation | 14 | 8 |
| 6B Protein prediction techniques | 17 | 4 |
| 6C Image analysis | 11 | 11 |
| 7A Knowledge of procedural and object-oriented programming languages | 16 | 6 |
| 7B Basic statistics and information theory | 21 | 1 |
| 7C Alignment algorithms | 18 | 4 |
| 7D Biological ontologies | 14 | 8 |
| 8 Are there any obvious deficiencies in graduates from bioinformatics courses | 10 | 4 |
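To compare topics, the counts in Table 2 can be converted to the share of "Yes" answers among those who answered each question (not every respondent answered every question, so the denominators vary). The sketch below is illustrative only: it uses the question codes as keys and leaves out question 8, which asks about deficiencies rather than expected skills.

```python
# (yes, no) counts copied from Table 2, keyed by question code.
responses = {
    "1A": (22, 1), "1B": (15, 8), "1C": (11, 11),
    "2A": (16, 7), "2B": (23, 0), "2C": (23, 0),
    "3A": (21, 2), "3B": (20, 2), "3C": (16, 5),
    "4A": (21, 2), "4B": (17, 6), "4C": (21, 2),
    "5A": (17, 5), "5B": (17, 6), "5C": (17, 6),
    "6A": (14, 8), "6B": (17, 4), "6C": (11, 11),
    "7A": (16, 6), "7B": (21, 1), "7C": (18, 4), "7D": (14, 8),
}

# Share of "Yes" among respondents who answered each question.
yes_share = {q: yes / (yes + no) for q, (yes, no) in responses.items()}

# Questions ordered from most to least frequently expected.
ranked = sorted(yes_share, key=lambda q: yes_share[q], reverse=True)

for q in ranked:
    yes, no = responses[q]
    print(f"{q}: {100 * yes_share[q]:.0f}% yes ({yes} of {yes + no})")
```

On these counts, sequence similarity and multiple sequence alignment tools (2B, 2C) were expected by every respondent who answered, whereas database development strategies (1C) and image analysis (6C) split opinion evenly.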
Discussion
The results in Table 2 indicate that most employers of bioinformatics graduates expect general knowledge of all bioinformatics techniques, although opinions differed slightly depending on the interests of the contact. For example, some contacts commented that a working knowledge of sequencing techniques (question 4B in Table 2) is not always a prerequisite; however, some appreciation of the techniques used is expected in order to communicate effectively with biologists and to help solve problems.
When asked to express any concerns regarding graduates from current bioinformatics courses (question 8 in Table 2), a general issue was the lack of bioinformaticists coupled with the need for diversity in the workplace. It would seem that both generalists and specialists in bioinformatics are required, as well as those who apply techniques (computer-literate biologists) and those who develop software (biology-literate computer scientists). Many bioinformatics courses are geared towards biologists; however, it was suggested that graduate skills could be funnelled: many biologists come to bioinformatics with a need to strengthen their computer skills, whereas more computer scientists should be attracted with a view to providing them with the appropriate biological knowledge. As it is likely that MSc courses will produce generalists and PhD research programmes will produce specialists, this project should be aimed at producing generalists and, perhaps, strengthening the skills of both biologists and computer scientists.
Other issues highlighted were the need to know the success and reliability of various techniques and to understand the results they generate. For example, few graduates fully appreciate the difference between BLAST and PSI-BLAST, or the meaning of E-values reported in BLAST results. It was also suggested by one contact that many courses are preoccupied with sequence analysis techniques, leading to a lack of knowledge in other areas, for example: the use of enzymology, signal transduction, gene regulation and histology databases; Hidden Markov Models; neural networks; logic programming; and statistics. Perhaps by focusing on certain skill sets required by postgraduate students, such areas can be introduced into a course without compromising crucial modules.
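As an illustration of the kind of understanding referred to above, the E-value of a BLAST hit is the number of alignments with at least that score expected by chance. In the standard Karlin-Altschul formulation it can be written in terms of the bit score S' as E = m * n * 2^(-S'), where m and n are the query and database lengths. The sketch below is a simplification under stated assumptions: the function name and the example lengths are hypothetical, and BLAST itself uses edge-corrected effective lengths, so the E-values it reports will differ somewhat.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Simplified: uses raw lengths rather than the effective lengths
    that BLAST computes internally.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical example: a 250-residue query against a database
# of one billion residues (illustrative numbers only).
m, n = 250, 1_000_000_000

for score in (30, 40, 50):
    print(f"bit score {score}: E ~ {expected_hits(score, m, n):.3g}")
```

Two consequences follow from the formula: each additional bit of score halves the E-value, and the same alignment becomes less significant as the database searched grows.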
Conclusion
In order to provide a course suitable for those with a biological or computational background, a syllabus is required that consists of four main categories: informatics, molecular biology, general bioinformatics and new techniques. The first category is required for biologists to strengthen their computer skills; the second to provide computer scientists with adequate biological knowledge; the third to provide postgraduates from all backgrounds with a basic knowledge of bioinformatics techniques; and the fourth to increase awareness of new and upcoming technologies. During the survey, a lack of awareness of microarray design and analysis was highlighted; the last category is therefore necessary to make students, at a minimum, aware of new technologies that may be in place when they take up employment.
Questionnaire
To assess the extent of the current skills shortage in bioinformatics, we wish to learn about the skills portfolio sought by potential employers of today's graduates. Please could you help us by answering the following questions, and feel free to add your own comments:
What do you expect graduates to be familiar with:
1. Databases and database technologies
(a) Sequence, structure and specialised databases?
(b) Database management systems?
(c) Database development strategies?
(d) Specific comments:
2. Sequence analysis
(a) Molecular evolution?
(b) Sequence similarity search tools?
(c) Multiple sequence alignment tools?
(d) Specific comments:
3. Protein structure
(a) Secondary, super-secondary and tertiary structure?
(b) Visualisation tools?
(c) Prediction techniques?
(d) Specific comments:
4. Genomics
(a) Genome structure?
(b) Sequencing techniques?
(c) Gene prediction methods?
(d) Specific comments:
5. EST analysis
(a) EST clustering and assembly techniques?
(b) High-throughput gene expression techniques?
(c) Microarray design and analysis?
(d) Specific comments:
6. Proteomics
(a) Experimental techniques in protein identification and characterisation?
(b) Protein prediction methods?
(c) Image analysis?
(d) Specific comments:
7. Biocomputing and statistics
(a) Knowledge of procedural and object-oriented programming languages?
(b) Basic statistics and information theory?
(c) Alignment algorithms?
(d) Biological ontologies?
(e) Specific comments:
8. Are there any obvious deficiencies in graduates from current bioinformatics courses?
Specific comments: