From “reads” to WGS-based first-line influenza and SARS-CoV-2 laboratory surveillance

INSaFLU (“INSide the FLU”) is a free, web-based bioinformatics suite that takes primary sequencing data (reads) and automatically generates the outputs that constitute the core first-line “genetic requests” for effective and timely influenza and SARS-CoV-2 laboratory surveillance (e.g., type and sub-type, gene and whole-genome consensus sequences, variant annotation, alignments and phylogenetic trees). Data integration is continuously scalable, meeting the need for real-time epidemiological surveillance during flu and COVID-19 epidemics.

The INSaFLU bioinformatics pipeline currently consists of 6 core steps (see Figure). For more details about INSaFLU, please read Borges V, Pinheiro M et al. Genome Medicine (2018) 10:46. A detailed tutorial of INSaFLU usage, as well as documentation about the current software settings and versions, is provided here.

Read quality analysis and improvement

This first step automatically analyzes and improves the quality of the uploaded raw sequencing data (reads). It generates improved NGS data and graphical quality control reports.
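As an illustration of what "improving" read quality can mean, the sketch below trims a read at the first window of bases whose mean Phred score drops below a threshold. It is a minimal example under stated assumptions, not a description of the tools INSaFLU runs: the function names, the window size of 5 and the quality cutoff of 20 are all hypothetical choices made for this example.

```python
def phred_from_ascii(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_read(qualities, window=5, min_mean_quality=20):
    """Return how many leading bases of a read to keep.

    Slides a fixed-size window from the 5' end and cuts the read at the
    start of the first window whose mean quality is below the threshold.
    The window size and cutoff are illustrative defaults (assumptions).
    """
    for start in range(0, len(qualities) - window + 1):
        mean_q = sum(qualities[start:start + window]) / window
        if mean_q < min_mean_quality:
            return start
    # No low-quality window found (or read shorter than the window): keep all.
    return len(qualities)


if __name__ == "__main__":
    scores = [40] * 10 + [2] * 5  # good bases followed by a poor-quality tail
    keep = trim_read(scores)
    print(f"keep the first {keep} of {len(scores)} bases")
```

A read with uniformly high scores is returned untrimmed, while a read with a degraded 3' tail is cut shortly before the tail begins.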

Type and sub-type identification

INSaFLU automatically detects the influenza type and sub-type/lineage of each sample upon data submission. These typing data guide the subsequent downstream module and complement traditional real-time RT-PCR assays, as INSaFLU is able to discriminate all 18 currently defined hemagglutinin subtypes, 11 neuraminidase subtypes and influenza B lineages.

Variant detection and consensus generation

This module provides the first-line "genetic data" for seasonal influenza laboratory surveillance, i.e., the list of variants (SNPs and indels) with their effect at the protein level, together with consensus sequences (at both the locus and whole-genome levels). The consensus sequences form the basis for the downstream phylogenetic inferences that drive the continuous tracking of influenza temporal and geographical spread.
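The relationship between a variant list and a consensus sequence can be sketched as follows. This is a simplified illustration, not INSaFLU's implementation: it handles SNPs only (no indels), and the function name, the 0-based position convention and the minimum depth of 10 used for masking are assumptions introduced for the example.

```python
def build_consensus(reference, snps, depth, min_depth=10):
    """Apply SNPs to a reference sequence and mask poorly covered sites.

    reference: reference sequence of one locus (string)
    snps:      dict mapping 0-based position -> alternative base
    depth:     per-position depth of coverage, one value per reference base
    min_depth: positions below this depth are written as 'N' (assumed cutoff)
    """
    if len(depth) != len(reference):
        raise ValueError("depth must have one value per reference position")
    consensus = []
    for pos, ref_base in enumerate(reference):
        if depth[pos] < min_depth:
            consensus.append("N")  # not enough reads to call this site
        else:
            consensus.append(snps.get(pos, ref_base))
    return "".join(consensus)


if __name__ == "__main__":
    ref = "ACGTAC"
    variants = {1: "T", 4: "G"}
    per_site_depth = [30, 30, 30, 5, 30, 30]
    print(build_consensus(ref, variants, per_site_depth))
```

Masking low-coverage positions rather than copying the reference base avoids presenting unsupported sites as if they had been sequenced.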

Coverage analysis

INSaFLU automatically provides an in-depth analysis of the vertical and horizontal coverage per sample (for each amplicon), generating several coverage statistics as well as plots of how the depth of coverage fluctuates along each amplicon region. This step allows users to assess the technical success of sequencing and may reveal relevant genetic events, such as reassortment or homologous recombination.
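To make the two notions of coverage concrete, the sketch below computes a mean depth (vertical coverage) and the percentage of positions covered at or above a minimum depth (horizontal coverage) for one amplicon. The function name, the returned keys and the threshold of 10 are assumptions made for this example; the statistics INSaFLU actually reports may differ.

```python
def coverage_stats(depth, min_depth=10):
    """Summarize per-position depth values for a single amplicon.

    depth:     list of read depths, one per position of the amplicon
    min_depth: depth a position needs to count as covered (assumed cutoff)
    """
    n = len(depth)
    if n == 0:
        raise ValueError("depth list is empty")
    covered = sum(1 for d in depth if d >= min_depth)
    return {
        "mean_depth": sum(depth) / n,          # vertical coverage
        "pct_covered": 100.0 * covered / n,    # horizontal coverage
    }


if __name__ == "__main__":
    amplicon_depth = [0, 5, 10, 20, 40]
    print(coverage_stats(amplicon_depth))
```

Reporting both values matters because a high mean depth can hide regions with no reads at all.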


Alignment and phylogeny

INSaFLU dynamically builds ready-to-explore nucleotide/amino acid sequence alignments and phylogenetic trees at both locus and whole-genome scale. These outputs are automatically rebuilt and updated as more samples are added to user-restricted INSaFLU projects, making continuous data integration completely flexible and scalable. Alignments and trees can be explored in situ or through multiple compatible downstream applications for fine-tuned data analysis.
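As one example of a downstream analysis on an exported alignment, the sketch below computes pairwise p-distances (the proportion of differing sites) between aligned sequences. It is not how INSaFLU builds its trees; the function names and the choice to skip gaps and "N" positions are assumptions made for illustration.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') or an 'N' are skipped.
    Returns NaN when no position can be compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else float("nan")


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of sample name -> aligned sequence."""
    names = sorted(alignment)
    return {(x, y): p_distance(alignment[x], alignment[y])
            for x in names for y in names}


if __name__ == "__main__":
    aln = {"sampleA": "ACGT-A", "sampleB": "ACTTTA", "sampleC": "ACGTTA"}
    for pair, dist in distance_matrix(aln).items():
        print(pair, round(dist, 3))
```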

Intra-host minor variant detection (and uncovering of putative mixed infections)

Influenza evolution has historically been inferred from consensus sequences representing the dominant virus lineage within each infected host at a particular instant, which has limited our knowledge of intra-patient virus population diversity and transmission dynamics. In this context, INSaFLU additionally allows users to investigate influenza intra-patient sub-population dynamics through the scrutiny and annotation of minor intra-host single nucleotide variants (iSNVs). INSaFLU also automatically flags samples as “putative mixed infections” if more than one type, HA/NA subtype or lineage is detected, or if the population admixture involves influenza viruses with clearly distinct genetic backgrounds (e.g., mixed infection with distinct same-subtype viruses).
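The two ideas in this step can be sketched in a few lines. The first function classifies a variant by its intra-host frequency; the second covers only the first flagging criterion described above (more than one type, HA/NA subtype or lineage detected) and does not attempt the admixture-based criterion. The function names and the frequency bounds of 1% and 50% are hypothetical values chosen for this example, not documented INSaFLU settings.

```python
def classify_variant(frequency, minor_min=0.01, minor_max=0.50):
    """Label a variant by its within-sample frequency (0 to 1).

    The bounds are illustrative assumptions: variants below minor_min are
    treated as noise, those up to minor_max as minor iSNVs, the rest as major.
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if frequency < minor_min:
        return "below_threshold"
    if frequency <= minor_max:
        return "minor"
    return "major"


def flag_mixed_infection(types, ha_subtypes, na_subtypes, lineages):
    """Flag a sample when any category holds more than one distinct label."""
    categories = (types, ha_subtypes, na_subtypes, lineages)
    return any(len(set(labels)) > 1 for labels in categories)


if __name__ == "__main__":
    print(classify_variant(0.05))
    print(flag_mixed_infection(["A"], ["H3", "H1"], ["N2"], []))
```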

Future Directions

INSaFLU is under active development. Planned features include modules to automatically detect virus reassortment and to perform temporal and geographical data integration and visualization.

How to cite

If you use INSaFLU in your work, please cite this publication:

Borges V, Pinheiro M et al. Genome Medicine (2018) 10:46


INSaFLU development is being co-funded by the European Commission on behalf of the OneHealth EJP TELE-Vir project.


We thank the Infraestrutura Nacional de Computação Distribuída (INCD) for providing computational resources for testing the INSaFLU platform. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.


If you have any questions, comments or suggestions, please contact us:

vitor.borges at j.paulo.gomes at