Assessing the skills shortage in bioinformatics

Introduction

This report presents the results of a survey of the skills portfolio sought by potential employers of today's bioinformatics graduates. A number of European companies and academic institutes (predominantly UK-based) involved in bioinformatics, biotechnology or drug development were contacted. EMBER participants and other EMBnet nodes were also included in the survey.

Methods

A questionnaire, which is appended to the end of this document, was sent by e-mail to a number of companies and institutes.

Results

A total of 188 contacts were made during this survey (30 EMBnet nodes and 158 other companies and institutes); the numbers of responses are given in the following table:

Table 1  Contacts made and responses received, by type of organisation

Type          Contacts   Responses   No response
EMBnet node         30           5            25
Academic            13           2            11
Corporate          145          24           121
Total              188          31           157
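The response rates behind Table 1 can be checked with a few lines of Python. This is a minimal sketch: the counts are transcribed from Table 1 and the accompanying text, while the names `TABLE_1`, `NOT_COMPLETED`, `response_rate` and `summarise` are hypothetical helpers introduced only for illustration.

```python
# Counts transcribed from Table 1: (contacts, responses) per type of organisation.
TABLE_1 = {
    "EMBnet node": (30, 5),
    "Academic": (13, 2),
    "Corporate": (145, 24),
}

# Academic and corporate respondents that did not complete the questionnaire
# because they do not use bioinformatics (figure stated in the text).
NOT_COMPLETED = 8


def response_rate(contacts: int, responses: int) -> float:
    """Fraction of contacts that replied."""
    return responses / contacts


def summarise(table: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Return the response rate for each type of organisation and overall."""
    rates = {kind: response_rate(c, r) for kind, (c, r) in table.items()}
    total_contacts = sum(c for c, _ in table.values())
    total_responses = sum(r for _, r in table.values())
    rates["Total"] = response_rate(total_contacts, total_responses)
    return rates


if __name__ == "__main__":
    for kind, rate in summarise(TABLE_1).items():
        print(f"{kind:<12} {rate:6.1%}")
    # Academic and corporate replies that went on to complete the questionnaire.
    replied = TABLE_1["Academic"][1] + TABLE_1["Corporate"][1]
    print("Completed (academic + corporate):", replied - NOT_COMPLETED)
```

Run as a script, it reports an overall response rate of about 16.5% (31 of 188 contacts), with all three types of organisation falling between roughly 15% and 17%, and 18 completed questionnaires from the academic and corporate respondents.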

 

Of the 26 companies/institutes that responded ('Academic' and 'Corporate' in Table 1), eight did not complete the questionnaire because they did not use bioinformatics and hence did not seek such graduates. The following table gives a breakdown of the answers from those that did complete the questionnaire:

Table 2  Answers to the questionnaire (number of respondents)

Question                                                                    Yes     No
1A  Sequence, structure and specialised databases                            22      1
1B  Database management systems                                              15      8
1C  Database development strategies                                          11     11
2A  Molecular evolution                                                      16      7
2B  Sequence similarity search tools                                         23      0
2C  Multiple sequence alignment tools                                        23      0
3A  Secondary, super-secondary and tertiary structure                        21      2
3B  Visualisation tools                                                      20      2
3C  Prediction techniques                                                    16      5
4A  Genome structure                                                         21      2
4B  Sequencing techniques                                                    17      6
4C  Gene prediction methods                                                  21      2
5A  EST clustering and assembly techniques                                   17      5
5B  High-throughput gene expression techniques                               17      6
5C  Microarray design and analysis                                           17      6
6A  Experimental techniques in protein identification and characterisation   14      8
6B  Protein prediction methods                                               17      4
6C  Image analysis                                                           11     11
7A  Knowledge of procedural and object-oriented programming languages        16      6
7B  Basic statistics and information theory                                  21      1
7C  Alignment algorithms                                                     18      4
7D  Biological ontologies                                                    14      8

8   Obvious deficiencies in graduates from current bioinformatics courses    10      4
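Because the skill questions (1A to 7D) were answered by slightly different numbers of respondents, between 21 and 23 per question, the counts in Table 2 are easier to compare as the share of 'yes' answers among those who answered. The sketch below is illustrative only: it assumes that blank answers should simply be left out of the denominator, and the names `TABLE_2`, `yes_share` and `without_majority` are hypothetical.

```python
# (yes, no) counts transcribed from Table 2 for the skill questions 1A-7D.
TABLE_2 = {
    "1A": (22, 1), "1B": (15, 8), "1C": (11, 11),
    "2A": (16, 7), "2B": (23, 0), "2C": (23, 0),
    "3A": (21, 2), "3B": (20, 2), "3C": (16, 5),
    "4A": (21, 2), "4B": (17, 6), "4C": (21, 2),
    "5A": (17, 5), "5B": (17, 6), "5C": (17, 6),
    "6A": (14, 8), "6B": (17, 4), "6C": (11, 11),
    "7A": (16, 6), "7B": (21, 1), "7C": (18, 4), "7D": (14, 8),
}


def yes_share(yes: int, no: int) -> float:
    """Fraction of those answering a question who said 'yes' (blanks excluded)."""
    return yes / (yes + no)


def without_majority(table: dict[str, tuple[int, int]]) -> list[str]:
    """Questions on which 'yes' did not win a strict majority."""
    return sorted(q for q, (y, n) in table.items() if y <= n)


if __name__ == "__main__":
    # Rank questions from most to least widely expected.
    ranked = sorted(TABLE_2.items(), key=lambda item: yes_share(*item[1]), reverse=True)
    for question, (yes, no) in ranked:
        print(f"{question}  {yes_share(yes, no):6.1%}  (answered by {yes + no})")
    print("No 'yes' majority:", without_majority(TABLE_2))
```

Sorted this way, sequence similarity search tools and multiple sequence alignment tools (2B, 2C) head the list with no 'no' answers, while database development strategies and image analysis (1C, 6C) are the only questions without a 'yes' majority.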

Discussion

The results in Table 2 indicate that most employers of bioinformatics graduates expect a general knowledge of the full range of bioinformatics techniques, although opinions differed slightly depending on the interests of the contact. Each of the skill questions (1A to 7D) received more 'yes' than 'no' answers, except questions 1C (database development strategies) and 6C (image analysis), on which respondents were split evenly, 11 to 11. For example, some contacts commented that a working knowledge of sequencing techniques (question 4B in Table 2) is not always a prerequisite; however, some appreciation of the techniques used is expected in order to communicate effectively with biologists and to help solve problems.

When asked to express any concerns about graduates from current bioinformatics courses (question 8 in Table 2, where ten of the 14 contacts who answered reported obvious deficiencies), a common concern was the shortage of bioinformaticians, coupled with the need for diversity in the workplace. It would seem that both generalists and specialists in bioinformatics are required, as well as both those who apply techniques (computer-literate biologists) and those who develop software (biology-literate computer scientists). Many bioinformatics courses are geared towards biologists; however, it was suggested that graduate skills could be funnelled: many biologists come to bioinformatics needing to strengthen their computer skills, whereas more computer scientists should be attracted, with a view to providing them with the appropriate biological knowledge. As MSc courses are likely to produce generalists and PhD research programmes to produce specialists, this project should aim to produce generalists and, perhaps, to strengthen the skills of both biologists and computer scientists.

Other issues highlighted were the need to know how successful and reliable the various techniques are, and to understand the results they generate. For example, few graduates fully appreciate the difference between BLAST and PSI-BLAST, or the meaning of the E-values reported in BLAST results. One contact also suggested that many courses are preoccupied with sequence analysis techniques, leading to a lack of knowledge in other areas, namely the use of enzymology, signal transduction, gene regulation and histology databases; hidden Markov models; neural networks; logic programming; and statistics. Perhaps by focusing on the particular skill sets required by postgraduate students, such areas could be introduced into a course without compromising crucial modules.
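As background to the point about E-values: the Karlin-Altschul statistics used by BLAST relate a raw alignment score S to the number of hits with at least that score expected by chance when a query of length m is searched against a database of total length n, E = K m n exp(-lambda S), or equivalently E = m n 2^(-S') with the bit score S' = (lambda S - ln K) / ln 2. The sketch below illustrates only those two formulas: the lambda and K values are placeholders rather than parameters for any particular scoring matrix, real BLAST programs also correct m and n for edge effects (so this will not reproduce the figures BLAST reports), and the function names are hypothetical.

```python
import math


def evalue_from_raw(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """Equivalent form in bits: E = m * n * 2 ** (-S')."""
    return m * n * 2.0 ** (-bits)


if __name__ == "__main__":
    # Placeholder parameters for illustration only; real values depend on
    # the scoring matrix and gap costs chosen for the search.
    LAMBDA, K = 0.267, 0.041
    query_len, db_len, raw = 300, 50_000_000, 100
    print(evalue_from_raw(raw, query_len, db_len, LAMBDA, K))
    # Same alignment score, database twice as large.
    print(evalue_from_raw(raw, query_len, 2 * db_len, LAMBDA, K))
    print(evalue_from_bits(bit_score(raw, LAMBDA, K), query_len, db_len))
```

The second printed value is exactly twice the first: for the same alignment score, twice as many chance hits are expected in a database twice as large, which is why an E-value describes a hit in the context of a particular search rather than the alignment alone. The third value matches the first, confirming that the bit-score form is the same quantity.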

Conclusion

To provide a course suitable for students with either a biological or a computational background, a syllabus is required that consists of four main categories: informatics, molecular biology, general bioinformatics and new techniques. The first category is required to strengthen the computer skills of biologists; the second to provide computer scientists with adequate biological knowledge; the third to give postgraduates from all backgrounds a basic knowledge of bioinformatics techniques; and the fourth to increase awareness of new and emerging technologies. The survey highlighted a lack of awareness of microarray design and analysis; the last category is therefore necessary to make students, at a minimum, aware of new technologies that may be in use when they take up employment.


Appendix: Questionnaire

Evaluating the Skills Shortage in Bioinformatics

To assess the extent of the current skills shortage in bioinformatics, we wish to learn about the skills portfolio sought by potential employers of today's graduates. Please could you help us by answering the following questions, and feel free to add your own comments:

What do you expect graduates to be familiar with:

1. Databases and database technologies

(a)     Sequence, structure and specialised databases?                                    

(b)     Database management systems?                                                       

(c)     Database development strategies?                                                       

(d)     Specific comments:

2. Sequence analysis

(a)     Molecular evolution?                                                               

(b)     Sequence similarity search tools?                                                              

(c)     Multiple sequence alignment tools?                                                              

(d)     Specific comments:

3. Protein structure

(a)     Secondary, super-secondary and tertiary structure?                                       

(b)     Visualisation tools?                                                                              

(c)     Prediction techniques?                                                            

(d)     Specific comments:

4. Genomics

(a)     Genome structure?                                                                           

(b)     Sequencing techniques?                                                            

(c)     Gene prediction methods?                                                               

(d)     Specific comments:

5. EST analysis

(a)     EST clustering and assembly techniques?                                          

(b)     High-throughput gene expression techniques?                                    

(c)     Microarray design and analysis?                                                         

(d)     Specific comments:

6. Proteomics

(a)     Experimental techniques in protein identification and characterisation?                

(b)     Protein prediction methods?                                                               

(c)     Image analysis?                                                                           

(d)     Specific comments:

7. Biocomputing and statistics

(a)     Knowledge of procedural and object-oriented programming languages?       

(b)     Basic statistics and information theory?                                                

(c)     Alignment algorithms?                                                            

(d)     Biological ontologies?                                                             

(e)     Specific comments:

8. Are there any obvious deficiencies in graduates from current bioinformatics courses?          

Specific comments: