Supplementary online material for:
From Escherich to the Escherichia coli genome:
how a commensal bacterium shaped the history of modern microbiology

Guillaume Méric1, Matthew D. Hitchings2, Ben Pascoe1,3, Samuel K. Sheppard1,3,4

1Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, UK; 2Swansea University Medical School, Institute of Life Science, Singleton Park, Swansea, UK; 3MRC Cloud Infrastructure for Microbial Bioinformatics (CLIMB) Consortium, UK; 4Department of Zoology, University of Oxford, Oxford, UK.

This companion webpage contains supplementary material to the study of the dissemination and genomics of Escherichia coli strain NCTC86, one of the very first isolates of Theodor Escherich from 1885, published in 2016 in The Lancet Infectious Diseases.

Supplementary methods and analyses (genomics):

From the original stock archived at the NCTC, we obtained the whole genome sequence of NCTC86 using an Illumina MiSeq. Raw reads were assembled with Velvet into 314 contiguous sequences (contigs) with a total assembled length of 4,933,644 bp (N50 = 32,645 bp; N90 = 9,133 bp). The genome sequence was archived in a BIGSdb web-based platform, which allowed the export of specific sequences using BLAST [1]. Contigs were automatically annotated using RAST [2], giving a list of 4,856 predicted coding sequences in NCTC86. A reference pan-genome [3] was created using 12 reference genomes and NCTC86 (Table S1), and the resulting list of 7,608 genes was examined in a gene-by-gene manner [4] in a total of 72 public reference genomes, including NCTC86. All genomes shared 2,600 core genes, from which a concatenated gene-by-gene [4,5] alignment and a neighbour-joining phylogenetic tree were created (Figure S2). Strain NCTC86 shared 4,142 genes (85.3% of its coding genome) with the pig E. coli strain B41 (NCBI accession: NZ_AFAH00000000.2), which was its closest relative in our comparison dataset on the phylogenetic tree (Figure S2). Of these shared genes, 2,688 alleles (64.8% of all shared genes) were identical. Thirty-five genes were found only in NCTC86, including genes possibly involved in maltose/maltodextrin utilisation and mannitol metabolism (Table S2), which may have contributed to the positive maltose fermentation test performed on the strain in 1952 (Figure S1). The list also includes some fimbriae and fimbrial assembly precursors (Table S2) that could play a role in host colonisation.
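For readers unfamiliar with the N50 and N90 assembly statistics quoted above, the following minimal Python sketch shows how such values are conventionally computed from a list of contig lengths. The function name `nx` and the example contig lengths are hypothetical and chosen for illustration only; they are not the NCTC86 assembly, and this is not necessarily the exact tool used in the study.

```python
def nx(lengths, fraction):
    """Return the contig length L such that contigs of length >= L
    together cover at least `fraction` of the total assembly length.
    fraction=0.5 gives N50, fraction=0.9 gives N90."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contig lengths supplied")


# Hypothetical contig lengths in bp (illustration only, not NCTC86 data).
contigs = [100, 80, 60, 40, 20]

n50 = nx(contigs, 0.5)  # half of 300 bp is reached within the 80 bp contig
n90 = nx(contigs, 0.9)  # 90% of 300 bp is reached within the 40 bp contig
print(n50, n90)
```

By this definition N90 is never larger than N50, as is the case for the values reported for NCTC86.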

NCTC86 belongs to the common ST-10 within phylogroup A. Its reported resistance to penicillin and erythromycin (Figure S2) was not reflected by the presence of any allele listed in the Comprehensive Antibiotic Resistance Database (CARD) [6] or in the Lahey Clinic comprehensive list, suggesting that specific mutations or general efflux mechanisms could contribute to this reported resistance. In our study, no gene associated with resistance to synthetic antibiotics such as sulphonamides or quinolones was detected in NCTC86. The isolate did not harbour any gene encoding Shiga toxin subunits. Moreover, using the pathogenicity island (PAI) detection server PAIDB v2.0 [7], we found no obvious PAI in the NCTC86 genome. Additionally, very few virulence/colonisation-associated factors were detected: (a) the Hek outer membrane protein, which is an auto-aggregating adhesin and invasin [8]; (b) sfaEFA, from the S-fimbriae biosynthesis cluster, which is linked to the attachment of bacteria to host tissues [9]; and (c) the entire vpeRABC locus, responsible for the biosynthesis of a carbohydrate permease that has been linked to the fitness and virulence of uropathogenic E. coli (UPEC) [10].

  1. Jolley KA, Maiden MC (2010) BIGSdb: Scalable analysis of bacterial genome variation at the population level. BMC Bioinformatics 11: 595.
  2. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, et al. (2008) The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9: 75.
  3. Meric G, Yahara K, Mageiros L, Pascoe B, Maiden MC, et al. (2014) A reference pan-genome approach to comparative bacterial genomics: identification of novel epidemiological markers in pathogenic Campylobacter. PLoS One 9: e92798.
  4. Sheppard SK, Jolley KA, Maiden MCJ (2012) A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter. Genes 3: 261-277.
  5. Maiden MC, van Rensburg MJ, Bray JE, Earle SG, Ford SA, et al. (2013) MLST revisited: the gene-by-gene approach to bacterial genomics. Nature Reviews Microbiology 11: 728-736.
  6. McArthur AG, Waglechner N, Nizam F, Yan A, Azad MA, et al. (2013) The comprehensive antibiotic resistance database. Antimicrob Agents Chemother 57: 3348-3357.
  7. Yoon SH, Park YK, Kim JF (2015) PAIDB v2.0: exploration and analysis of pathogenicity and resistance islands. Nucleic Acids Res 43: D624-630.
  8. Fagan RP, Smith SG (2007) The Hek outer membrane protein of Escherichia coli is an auto-aggregating adhesin and invasin. FEMS Microbiol Lett 269: 248-255.
  9. Balsalobre C, Morschhauser J, Jass J, Hacker J, Uhlin BE (2003) Transcriptional analysis of the sfa determinant revealing mRNA processing events in the biogenesis of S fimbriae in pathogenic Escherichia coli. J Bacteriol 185: 620-629.
  10. Martinez-Jehanne V, Pichon C, du Merle L, Poupel O, Cayet N, et al. (2012) Role of the vpe carbohydrate permease in Escherichia coli urovirulence and fitness in vivo. Infect Immun 80: 2655-2666.

Supplementary figures and tables:

Figure S1. Reproduction of the earliest Batch Records for strain NCTC86 at the National Collection of Type Cultures.


Figure S2. Phylogenetic context of strain NCTC86. The tree was constructed using a neighbour-joining algorithm, based on gene-by-gene whole-genome alignments of strain NCTC86 with 71 other publicly available representative genomes. The scale indicates the number of substitutions per site.
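To make the scale unit of Figure S2 concrete, the sketch below computes an uncorrected pairwise distance (p-distance), i.e. the proportion of compared alignment sites at which two sequences differ. This is only an illustration: the distance model actually used to build the tree is not stated here, so the choice of p-distance, the function name `p_distance` and the short example sequences are all assumptions.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.
    Sites where either sequence has a gap or an ambiguous base are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical aligned fragments (illustration only).
strain_x = "ACGTACGTAC-"
strain_y = "ACGTTCGTACA"

# 10 comparable sites, 1 difference: 0.1 substitutions per site.
distance = p_distance(strain_x, strain_y)
print(distance)
```

A neighbour-joining algorithm takes a matrix of such pairwise distances, computed for every pair of genomes, as its input.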


Table S1. Genomes used to construct a reference pan-genome used in this study.


  1. Blattner FR, Plunkett G, 3rd, Bloch CA, Perna NT, Burland V, et al. (1997) The complete genome sequence of Escherichia coli K-12. Science 277: 1453-1462.
  2. Rasko DA, Rosovitz MJ, Myers GS, Mongodin EF, Fricke WF, et al. (2008) The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. Journal of Bacteriology 190: 6881-6893.
  3. Crossman LC, Chaudhuri RR, Beatson SA, Wells TJ, Desvaux M, et al. (2010) A commensal gone bad: complete genome sequence of the prototypical enterotoxigenic Escherichia coli strain H10407. J Bacteriol 192: 5822-5831.
  4. Cress BF, Linhardt RJ, Koffas MA (2013) Draft Genome Sequence of Escherichia coli Strain Nissle 1917 (Serovar O6:K5:H1). Genome Announc 1: e0004713.
  5. Hayashi T, Makino K, Ohnishi M, Kurokawa K, Ishii K, et al. (2001) Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res 8: 11-22.
  6. Touchon M, Hoede C, Tenaillon O, Barbe V, Baeriswyl S, et al. (2009) Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genetics 5: e1000344.
  7. Welch RA, Burland V, Plunkett G, 3rd, Redford P, Roesch P, et al. (2002) Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A 99: 17020-17024.
  8. Yang F, Yang J, Zhang X, Chen L, Jiang Y, et al. (2005) Genome dynamics and diversity of Shigella species, the etiologic agents of bacillary dysentery. Nucleic Acids Res 33: 6445-6458.
  9. Jin Q, Yuan Z, Xu J, Wang Y, Shen Y, et al. (2002) Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res 30: 4432-4441.

Table S2. Genes specifically detected in E. coli strain NCTC86 and not in 71 other reference E. coli genomes. Annotations and predicted functions are based on automatic annotations using the RAST server [2]. FIGfam codes are those inferred by RAST; genes sharing a FIGfam code are predicted to encode proteins with similar functions. RAST evidence codes summarise the functional groups of the predicted gene functions.
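The notions of core genes (present in every genome) and strain-specific genes (present in one genome and absent from all others) can be expressed as simple set operations on a gene presence table. The sketch below is a simplified illustration under stated assumptions: it assumes presence/absence has already been determined for each gene, and the strain labels, gene identifiers and function names are hypothetical rather than taken from the study's dataset or pipeline.

```python
def core_genes(presence):
    """Genes found in every genome of the dataset."""
    gene_sets = list(presence.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def strain_specific(presence, strain):
    """Genes found in `strain` and in no other genome of the dataset."""
    others = set().union(
        *(genes for name, genes in presence.items() if name != strain)
    )
    return presence[strain] - others


# Hypothetical presence table: genome name -> set of detected gene identifiers.
presence = {
    "strain_1": {"g1", "g2", "g3", "g5"},
    "strain_2": {"g1", "g2", "g3"},
    "strain_3": {"g1", "g2", "g4"},
}

core = core_genes(presence)                     # genes shared by all three
unique = strain_specific(presence, "strain_1")  # genes seen only in strain_1
print(sorted(core), sorted(unique))
```

In this toy table the core genome is g1 and g2, and g5 is the only gene specific to strain_1.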


Data deposition:
The genome sequence of E. coli NCTC86 is available on the NCBI repository (BioProject: PRJNA312020; Biosample: SAMN04492850).

Work in the Sheppard laboratory is supported by grants from the Wellcome Trust and the Medical Research Council (MRC). Genome sequencing was supported by the A4B project funded by the Welsh Assembly. G.M. is supported by a NISCHR Health Research Fellowship (HF-14-13). We thank Trevor Hince and Naomi King at the Lister Institute for their very kind help and access to archives. We thank Julie Russell for discussions and access to the archives of the NCTC labs.

Complete list of references cited in the manuscript:

  1. Escherich T (1885) Die Darmbakterien des Neugeborenen und Säuglings. Fortschr Med 3: 515-522; 547-554.
  2. Durham HE (1901) Some theoretical considerations upon the nature of agglutinins, together with further observations upon Bacillus typhi abdominalis, Bacillus enteritidis, Bacillus coli communis, Bacillus lactis aerogenes, and some other bacilli of allied character. J Exp Med 5: 353-388.
  3. MacConkey A (1900) The differentiation and isolation from mixtures of the Bacillus coli communis and Bacillus typhosus by the use of sugars and the salts of bile. In: Boyce R, Sherrington CS, editors. The Thompson Yates Laboratories Report: The University Press of Liverpool.
  4. Chick H (1901) The distribution of B. coli commune, Part II. In: Boyce R, Sherrington CS, editors. The Thompson Yates Laboratories Report: The University Press of Liverpool.
  5. MacConkey A (1905) Lactose-Fermenting Bacteria in Faeces. J Hyg (Lond) 5: 333-379.
  6. MacConkey AT (1908) Bile Salt Media and their advantages in some Bacteriological Examinations. J Hyg (Lond) 8: 322-334.
  7. G.F.P. (1931) Obituary. Dr. A.T. MacConkey. Nature 127: 980-981.
  8. Chick J (2011) War on Disease: a History of the Lister Institute. BMJ 343: d8209.
  9. Boycott AE (1906) Observations on the Bacteriology of Paratyphoid Fever and on the Reactions of Typhoid and Paratyphoid Sera. J Hyg (Lond) 6: 33-73.
  10. Bainbridge FA (1911) The Action of Certain Bacteria on Proteins. J Hyg (Lond) 11: 341-355.
  11. Russell JE (2014) Historical Perspectives. The National Collection of Type Cultures. Microbiologist: Society for Applied Microbiology.
  12. Castellani A, Chalmers AJ (1919) Manual of Tropical Medicine, 3rd edition: William Wood and Company, New York.
  13. Friedmann HC (2006) Escherich and Escherichia. Adv Appl Microbiol 60: 133-196.
  14. Lederberg J, Tatum EL (1946) Gene recombination in Escherichia coli. Nature 158: 558.
  15. Susman M (1995) The Cold Spring Harbor Phage Course (1945-1970): a 50th anniversary remembrance. Genetics 139: 1101-1106.
  16. Benzer S (1961) On the Topography of the Genetic Fine Structure. Proc Natl Acad Sci U S A 47: 403-415.
  17. Blattner FR, Plunkett G, 3rd, Bloch CA, Perna NT, Burland V, et al. (1997) The complete genome sequence of Escherichia coli K-12. Science 277: 1453-1462.