16.2: Introduction to Integration of Systems - Biology


What you’ll learn to do: Discuss how different body systems interact with one another

As we’ve learned, our bodies are complicated systems made up of cells, tissues, organs, and organ systems. In this section, we’ll see how these systems work together, and we’ll learn about a few essential life functions that require work from multiple body systems.

Systems Biology - A Pivotal Research Methodology for Understanding the Mechanisms of Traditional Medicine

Systems biology is a young discipline in the life sciences that aims at a systems-level understanding of biological systems. Thanks to significant progress in high-throughput technologies and molecular biology, systems biology occupies an important place in post-genome-era research.


The characteristics of systems biology and its applicability to traditional medicine research have been discussed from three points of view: data and databases, network analysis and inference, and modeling and systems prediction.


The existing databases are mostly associated with medicinal herbs and their activities, but new databases reflecting clinical situations, along with platforms to easily extract, visualize and analyze data, still need to be constructed. Network pharmacology is a key element of systems biology, so addressing the multi-component, multi-target aspect of pharmacology is important. Studies in network pharmacology highlight the drug-target network and the network target. Mathematical modeling and simulation are still in their infancy here, but mathematical modeling of dynamic biological processes is a central aspect of systems biology. Computational simulations allow structured systems and their functional properties to be understood, and the effects of herbal medicines in clinical situations to be predicted.


Systems biology based on a holistic approach is a pivotal research methodology for understanding the mechanisms of traditional medicine. If systems biology is to be incorporated into traditional medicine, computational technologies and holistic insights need to be integrated.

ELIXIR Omics Integration and Systems Biology – Online

The National Bioinformatics Infrastructure Sweden (NBIS) / ELIXIR Sweden is pleased to announce the workshop in Omics Integration and Systems Biology. This workshop is open for PhD students, postdocs, group leaders and core facility staff from European institutions looking for an introduction to multi-omics integration and systems biology approaches.

This workshop will include lectures and hands-on exercises from NBIS / Scilifelab experts from Stockholm, Lund and Gothenburg, as well as guest sessions from:

  • , PhD, Babraham Institute, United Kingdom
  • , PhD, Melbourne University, Australia
  • , PhD, EMBL-EBI, United Kingdom
  • , PhD, DTU Technical University of Denmark, Denmark

More information to come on the course website.

Important Dates

  • Application opens: 3 June
  • Application closes: 9 August
  • Confirmation to accepted students: 16 August


This online training event has no fee. However, if you accept a position at the workshop and do not participate (no-show), you will be invoiced 2000 SEK.

* Please note that NBIS cannot invoice individuals.


The aim of this workshop is to provide an integrated view of data-driven hypothesis generation through biological network analysis, constraint-based modelling, and supervised and unsupervised integration methods. A general description of different methods for analysing different omics data (e.g. transcriptomics and genomics) will be presented with some of the lectures discussing key methods and pitfalls in their integration. The techniques will be discussed in terms of their rationale and applicability. The course will also include hands-on sessions and several seminars by invited speakers.

Some of the covered topics include:

  • Data pre-processing and cleaning prior to integration
  • Application of key machine learning methods for multi-omics analysis including deep learning
  • Multi-omics integration, clustering and dimensionality reduction
  • Biological network inference, community and topology analysis and visualization
  • Condition-specific and personalized modeling through genome-scale metabolic models for integration of transcriptomic, proteomic, metabolomic and fluxomic data
  • Identification of key biological functions and pathways
  • Identification of potential biomarkers and targetable genes through modeling and biological network analysis
  • Application of network approaches in meta-analyses
  • Similarity network fusion and matrix factorization techniques
  • Integrated data visualization techniques
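
As a toy illustration of the unsupervised integration and dimensionality-reduction topics above, the following Python sketch performs naive "early" integration: each omics block is z-scored per feature, the blocks are concatenated, and samples are projected onto the top principal components via SVD (the function name and design are ours, not course material).

```python
import numpy as np

def integrate_pca(omics_blocks, n_components=2):
    """Naive early integration: z-score each omics block per feature,
    concatenate along the feature axis, then project the samples onto
    the top principal components via SVD."""
    scaled = []
    for X in omics_blocks:
        mu, sd = X.mean(axis=0), X.std(axis=0)
        sd = np.where(sd == 0, 1.0, sd)   # guard against constant features
        scaled.append((X - mu) / sd)
    Z = np.hstack(scaled)                 # samples x (total features)
    Z = Z - Z.mean(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # sample scores
```

More robust alternatives covered in the course, such as similarity network fusion or matrix factorisation, address the block-imbalance problems this naive concatenation ignores.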

Further details about the course content may be found on the course website.

Entry requirements

This course is open to PhD students, postdocs, group leaders and core facility staff from European institutions. Please note that NBIS training events do not provide any formal university credits. If formal credits are crucial, the student needs to confer with their home department before submitting a course application in order to establish whether the course is valid for formal credits or not.

Practical exercises can be performed using R or Python, so we only accept students with previous experience in one of those programming languages. We will not discuss how to process specific omics, and the students are referred to other NBIS courses for this matter.

  • Basic knowledge in R or Python
  • Basic understanding of frequentist statistics
  • A computer with web camera, Zoom, and permissions for installing software.
  • Experience with analysis of NGS and other omic data
  • Completing NBIS courses “Introduction to Bioinformatics using NGS data” and “Introduction to biostatistics and machine learning”
  • Basic conda and git knowledge

This workshop can accommodate a maximum of 25 participants. If we receive more applications, participants will be selected based on several criteria, including the entry requirements, motivation to attend the course, and gender and geographical balance.


First-generation bioinformatics solutions for data integration typically employed specific, non-generalizable, non-modular approaches to translate data from one format into another, which meant writing programs to parse, extract and transform the necessary data for each particular application. Second-generation data integration solutions provide a more structured environment for code reuse and more flexible, scalable, robust integration. They can roughly be divided into two major categories according to access and architecture: the data warehousing approach and the federated approach.3,5

The data warehouse approach copies data sources into a centralized system with a global data schema and an indexing system for integration and navigation. Such systems require reliable operation and maintenance, and fairly stable underlying databases. Examples of the warehousing approach include the UCSC Genome Browser13, the EnsEMBL Database Project6, and AllGenes14.

Federated approaches do not require a centralized persistent database, so the underlying data sources remain autonomous. Federated systems maintain a common data model and rely on schema mapping to translate heterogeneous database schemas into the target schema for integration5. The advantages of the federated approach are its flexibility, scalability and modularity. Examples of federated systems include TAMBIS9, ACEDB15, Kleisli7, OPM8 and DiscoveryLink16.
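
The federated idea can be sketched in a few lines of Python (a toy illustration under our own naming, not the architecture of any of the systems cited): each source keeps its own schema, and a per-source mapping translates records into the shared target schema on demand, so no central copy of the data is required.

```python
# Shared target schema and per-source schema mappings (all names are
# hypothetical; real federated systems also handle types and queries).
TARGET_FIELDS = ("gene_symbol", "organism")

SOURCE_MAPPINGS = {
    "source_a": {"gene_symbol": "symbol", "organism": "species"},
    "source_b": {"gene_symbol": "hgnc_name", "organism": "taxon"},
}

def federate(source, record):
    """Translate one record from its source schema into the target schema."""
    mapping = SOURCE_MAPPINGS[source]
    return {field: record[mapping[field]] for field in TARGET_FIELDS}
```

A warehouse would instead run such translations once, in bulk, at load time; a federated system runs them at query time, which is what keeps the sources autonomous.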

Each of these “general purpose” data integration systems has its own strengths; however, it has not been shown whether they can be applied effectively in research domains other than molecular biology. Although the SenseLab17 project at Yale University has been successful in integrating multidisciplinary neuroscience data at the genetic, protein, cellular and circuit levels, it is not a “general purpose” system: all data sources were pooled into a single EAV/CR database system, and it therefore cannot easily be reconfigured for use by diverse research groups. Further discussion of the BioMediator architecture can be found in the Architecture section of our 2004 IIWeb paper12.


The EU FP6 BioBridge Systems Medicine project focused on integrating genomics and chronic disease phenotype data with modelling and simulation tools that support clinicians in understanding, diagnosing and treating chronic diseases. We configured and extended the generic BioXM knowledge management environment to create the knowledge base for this translational systems biology approach, focusing on chronic obstructive pulmonary disease (COPD) as an initial use case.

Data model configuration

In general, within BioXM a particular scientific area of interest is semantically modelled as a network of related elements (see Table 1 for a list of the fundamental semantic concepts available in BioXM). While there is some agreement throughout the life sciences regarding a number of semantic objects, such as gene or phenotype, communities such as clinical research, virology, plant research or synthetic biology differ on the concepts and definitions they use. Not only will a plant-related knowledge base require the object plant instead of patient; a vector in virology might describe an infectious agent, while in synthetic biology it will more likely define a DNA expression shuttle. Ontology development initiatives try to develop a consensus on these issues, but currently no overarching "life science" data model can be defined. One of the main features of BioXM is therefore the ability to dynamically create a data model specific to the project at hand and, as the project develops, easily adapt that model based on the consensus between the project stakeholders.

Within BioBridge we held initial discussions between the project partners (clinicians, experimentalists and modellers) about the kind of knowledge that needed to be represented (see Table 2 for the resulting list), as well as how this representation should be conceptualised. This is a typical first step for a knowledge management project, and it usually provides only an initial, rough idea of the final goal that will need many iterations to reach. Setting up and changing the data model is done directly within the data model graph (Figure 1) using a context menu to create and edit the fundamental semantic objects (Figure 1B). From the "new" function the basic semantic concepts (element, relation, annotation, experiment, context, ontology, external object) become available to generate, for example, an element gene. Existing object definitions can be selected and edited, connected by new or existing relations, assigned an annotation or assigned to a context. We created elements such as "gene", "protein", "patient" and "compound"; a total of 15 element types sufficed to describe the clinical and biological knowledge relevant to chronic disease (see additional file 1 for a full data model XML export). Information about the interconnection of entities is captured in relations such as "is medicated by" for the compound a patient receives as medication, or "regulates" for a protein that regulates expression of a gene. The initial data model was configured within one week; many subsequent iterations extended and adapted the model while the knowledge base was already being populated and in productive use. In total, the current BioBridge COPD knowledge base uses 82 relation types to capture the details of semantic connections between element types, ontology terms, experiments and sub-networks. To describe assemblies of entities or relations with some common feature, e.g. a signalling pathway, we defined sub-networks as contexts.
16 types of contexts were used to capture, for example, SBML-based simulation models [32], KEGG pathways [33] or inflammatory processes involved in COPD derived by literature mining. All semantic concepts (such as elements, relations, contexts or ontologies) can be associated with annotations carrying information such as age, weight and gender for a patient, function for a gene or experimental evidence for a protein-protein interaction. Annotations are based on freely definable forms and support hierarchical organisation of information (nested annotation forms). Multiple semantic objects can share an annotation to imply relationships. Within BioBridge we defined 61 annotation forms with 892 attributes to provide, for example, electronic case report forms for anthropometric, diagnostic, physiologic and questionnaire data. Experimental data is conceptually treated as special, performance-optimised annotation. We defined seven experiment formats covering data types such as transcription, metabolite or enzyme kinetics. BioXM supports the conceptualisation of entire areas of interest by using ontologies, which can be used to infer facts and construct abstract queries; 19 pre-existing ontologies such as GO or the NCI Thesaurus were integrated into the BioBridge data model. The graphical data model builder and the context menu used for the configuration of the BioBridge-specific data model are shown in Figure 1. Within BioBridge we focused on configuring a data model suited to knowledge, clinical and experimental data around COPD; other instances of BioXM have been configured to cover different diseases, or indeed other fields of life science research such as enzyme biotechnology or synthetic biology, underscoring the general applicability of the semantic data model configuration process.
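
To make the concepts above concrete, here is a toy semantic network in Python (illustrative only, not the actual BioXM API): typed elements, typed relations, free-form annotations, and one-step navigation over the resulting graph.

```python
class Element:
    """A typed entity of the data model, e.g. Gene, Patient, Compound."""
    def __init__(self, etype, name, **annotations):
        self.etype, self.name, self.annotations = etype, name, annotations

class Relation:
    """A typed, directed connection between two elements."""
    def __init__(self, rtype, source, target):
        self.rtype, self.source, self.target = rtype, source, target

# A tiny network using relation types mentioned in the text.
patient = Element("Patient", "P001", age=64, gender="m")
drug    = Element("Compound", "salbutamol")
gene    = Element("Gene", "TNF")
network = [
    Relation("is medicated by", patient, drug),
    Relation("regulates", Element("Protein", "NFKB1"), gene),
]

def neighbours(network, element):
    """One navigation step: all elements directly related to `element`."""
    out = []
    for r in network:
        if r.source is element:
            out.append(r.target)
        elif r.target is element:
            out.append(r.source)
    return out
```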

Populating the data model

The model described above enabled us to semantically integrate existing public databases and information derived from the literature with clinical and experimental data created during the BioBridge project.

To populate the knowledge base with data from large, regularly updated public databases, we mainly use virtual objects with a manual mapping of the object concept into the data model. For sources without appropriate interfaces or with weak performance, and for project-internal data and knowledge, we manually generated mappings in import templates. A graphical wizard for import-template generation provides a selection of possible import options and mappings. Only applicable objects of the data model are presented; for example, after defining an object as a gene, only relations enabled for genes are available for the next mapping step. The available selection automatically adapts to any change in the data model configuration. From this selection the import operations are assembled by drag-and-drop to provide the mapping for a given data source, forming an import script which can be saved and re-used (see Figure 5). For sources which are imported and provide regular updates, a scheduling system is used to define automatic execution of the corresponding access and import methods.

Populating the data model. Based on the given data model, the import wizard provides the selection of available import operations in the left frame. These are moved by drag-and-drop into the right frame where they form the import script which provides the mapping information between a data source and the data model. Here two elements from the data source are defined as type "Protein" and are referenced by their UniProt IDs. The relation between the two proteins is a "Protein interaction" from the Reactome data source and the associated evidence is stored as annotation.
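
The import script assembled in the wizard can be pictured as a small declarative template (the format below is hypothetical, not BioXM's actual one): each operation maps a column of a tabular source onto an element, relation or annotation of the data model, mirroring the protein-interaction example in the figure.

```python
# Hypothetical import template for a protein-interaction source: two
# protein elements identified by UniProt accessions, one relation
# between them, and the supporting evidence stored as annotation.
TEMPLATE = [
    ("element",    "Protein",             "uniprot_a"),
    ("element",    "Protein",             "uniprot_b"),
    ("relation",   "Protein interaction", ("uniprot_a", "uniprot_b")),
    ("annotation", "evidence",            "evidence_code"),
]

def run_import(template, row):
    """Apply the template to one row of the source table."""
    elements, relations, annotations = {}, [], {}
    for op, otype, cols in template:
        if op == "element":
            elements[cols] = (otype, row[cols])
        elif op == "relation":
            a, b = cols
            relations.append((otype, row[a], row[b]))
        elif op == "annotation":
            annotations[otype] = row[cols]
    return elements, relations, annotations
```

Because the template is data rather than code, it can be saved, re-used and re-run on scheduled updates, as described above.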

Mapping a resource to the data model requires expertise about the semantic concept of the resource and about the configured BioXM data model. To integrate the individual entities of a data source semantically, the mapping method for the entities needs to be defined. Where available, BioXM makes use of namespace-based standard identifiers, existing cross-references and ontologies for the population of the data model. In most cases the semantics of a given data source are not (yet) described in machine-readable form and the initial mapping template needs to be generated manually. The BioXM core framework is extended with pre-defined semantic mappings, currently existing for about 70 public data resources and formats (see additional file 2). In addition, text-mining and sequence-similarity (BLAST) based mappings are enabled; however, users need to be aware of the pitfalls of these methods, as no automatic conflict resolution is attempted.

For the COPD knowledge base we use entities, references and ID mappings provided by EntrezGene [34], Genbank [35], RefSeq [36], HGNC [37], ENSEMBL [38], UniProt [39] and EMBL [40] to populate the system with instances of genes and proteins from human, mouse and rat. Starting with EntrezGene, we create gene instances and map entities from the other sources iteratively by reference. For each database the quality of the references to external sources needs to be judged individually and constrained accordingly against ambiguous connections. UniProt protein entries, for example, provide references to DNA databases, with some references pointing to mRNAs, which allow the corresponding gene to be identified uniquely, and others pointing to contigs and whole chromosomes with multiple gene references. Use of references in this case is therefore constrained to the target entry type "mRNA". A new instance is generated for each database entry from the corresponding organisms which cannot be mapped to an existing instance by ID reference; no name-based mapping or name-conflict resolution is attempted at this stage. As the knowledge base develops, iterative rounds of extension occur with additional data sources. Based on non-ambiguous identifiers we map additional information from the sources described below, with mappings being extended, removed and remapped during each updating round.
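
The constrained, reference-based mapping described above can be sketched as follows (data structures are simplified for illustration): an entry attaches to an existing instance only via a cross-reference of an allowed target type, analogous to the "mRNA" constraint for UniProt, and anything that cannot be mapped becomes a new instance rather than being matched by name.

```python
def map_by_reference(instances, entries, allowed_target="mRNA"):
    """Attach entries to existing instances via typed cross-references;
    entries with no acceptable reference become new instances (no
    name-based matching or conflict resolution is attempted)."""
    new_instances = []
    for entry in entries:
        hit = next((ref_id for ref_type, ref_id in entry["refs"]
                    if ref_type == allowed_target and ref_id in instances),
                   None)
        if hit is not None:
            instances[hit].append(entry["id"])   # mapped by unambiguous ID
        else:
            new_instances.append(entry["id"])    # becomes a new instance
    return new_instances
```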

Generating an import template using the import wizard requires no software development knowledge and for many sources only takes minutes (e.g. for protein-protein interaction data which uses UniProt accessions to unambiguously identify the protein entities and the Molecular Interaction Ontology [41] to describe the interaction type and evidence). However, integration can also take up to a week of software development if extensive parsing and transformation of a complex data source such as ENSEMBL is required.

Naming conflicts and a lack of descriptive, structured metainformation are the main reasons for the lack of semantic integration in the life sciences; these issues are as much sociological as technological. The use of a structured knowledge management tool within BioBridge ensured that all newly produced data used unique identifiers and provided extensive, structured metainformation. This semantic integration and standardisation fostered data exchange as well as the social interactions within the project, which are a prerequisite for translational systems biology projects and their highly diverse multi-subject expert teams. In addition, the semantic integration greatly simplifies future sharing of the produced data, as it is immediately available in semantic form.

For the import of data, several formats are supported, from simple manually mapped delimiter formats such as tab-delimited files to XML formats with potentially fully automatic semantic mapping, such as Pedro [42], SBML or OWL. If machine-readable metainformation is provided, such as MIRIAM references in SBML, it is used to automatically map the imported entities to existing instances of semantic objects. In the current version, the knowledge base integrates more than 20 different public databases (see Table 2), representing a total of 80 793 relevant genes (30 246 human, 27 237 mouse and 23 310 rat), 1 307 pathways, 78 528 compounds with related gene/disease information, 1 525 474 protein interactions and the entire Gene Expression Omnibus and PubChem databases, resulting in a total of 3 666 313 connections within the knowledge network. In addition, two BioBridge-specific datasets, 54 inflammation- and tissue-specific pathways and 122 COPD- and exercise-specific metabolite and enzyme concentrations and activities, were manually curated from the literature within the project. The pathway curation followed a standard text-mining-supported process as described, for example, in [43], while the enzyme concentration and activity curation was fully manual due to the small set of available relevant publications. To our knowledge, BioBridge thus provides the first semantically integrated knowledge base of public COPD-specific information. The resource will be continuously extended as more COPD-specific data becomes publicly available; e.g. the experimental data generated within BioBridge (160 pre- and post-training expression, metabolite and proteomics data sets) will become publicly available as soon as the consortium has completed an initial analysis of the data. Currently the COPD knowledge base contains almost 10 million experimental results, of which almost 6 million come from public data.
In other projects we are currently using BioXM with several hundred million data points on networks with tens of million edges and nodes, showing that the approach scales for at least two more orders of magnitude (unpublished data).

Browse, query and retrieve

Here we provide an overview of the available functionality; for a detailed step-by-step tutorial of the knowledge base, please see additional file 3: Step_by_Step_Tutorial_BioXM.pdf. When accessing the BioBridge portal you will have to register to access the knowledge network (registration is required to sustain funding support and enable personalisation of the interface; no further use of personal data will be made). Then access the knowledge base by following the BioXM links. The BioXM user interface (Figure 6) provides a visually driven query system with which information can be browsed from a network graph (see Figure 2B), based on pre-defined queries (purple smart folders in the navigation tree, see Figure 2A) or by interactive query generation (a detailed application example is described below).

BioXM graphical user interface overview. The BioXM graphical user interface (GUI) consists of three frames. A navigation bar provides the functions for importing, managing, reporting and searching data. A project and repositories frame on the left allows all data available to a user to be accessed in the repositories section, and the data to be organised in a user- and project-specific way in the projects section. The right frame displays detailed information about any object selected in the left frame.

Users visually browse and query the network simply by right-clicking on any focus of interest (e.g. a gene, a patient or a protein-protein interaction) so that associated entities can be added to the existing network visualisation. The corresponding context menu is dynamic, offering for selection all those entities which, based on the data model, are directly associated with the initial focus (i.e. one step in the network). A researcher could, for example, expand from a gene to include its relationships with diseases. From the disease association it may be of interest to identify patients represented in the gene expression database who share that particular diagnosis. Entities separated by more than one step in the data model can be associated with each other by complex queries which traverse several nodes within the graph and aggregate information to decide whether a connection is valid. These complex queries are transparent to the user, who executes them as part of the graphical navigation when asking for "associated objects" (see below for query construction). Graph-based navigation becomes difficult, in terms of both visualisation layout and performance, beyond several thousand objects.

The intuitive graphical query system is therefore supplemented by a more complex wizard that allows dynamic networks to be created by in-depth, structured searches, which combine semantic terms that are dynamically pre-defined by the data model (see Figure 3 for the query wizard and Figure 2A for a resulting network; additional files 3 and 4 provide details and example data on how to create a query). The query construction is natural-language-like and thus allows complex searches to be generated without knowledge of special query languages such as SQL or SPARQL, although some knowledge of the data model must be acquired to work efficiently with the wizard. A search for all patients diagnosed with COPD severity grade above 2 but no cancer who have a low body mass index, for example, would read: "Object to find is a Patient which simultaneously is annotated by Patient diagnostic data which has GOLD attribute greater than 2 and is annotated by Patient Anthropometrics which has BMI-BT attribute less than 18 and never is diagnosed with a NCI Thesaurus entry which is inferred by ontology entry which has name like '*cancer*'". A query can be saved as a "smart folder" or query template for re-use, allowing experienced users to share their complex queries with less frequent users. For saved queries, "query variables" can be defined so that generic smart-folder queries can be adapted to specific questions. In the example above, the actual parameters for COPD severity grade, BMI and diagnosed disease might be set as variables for other users to change. Normal folders (yellow) allow users to organise data manually in their private space by drag-and-drop, e.g. to create a permanent list of "favourite genes" or a specific pathway. In contrast, the content of "smart folders" is dynamic: it is actually a query result, which updates immediately whenever the content of the knowledge base changes.
Defining a query takes anywhere from seconds, for simple "search all compounds used as medication" type questions, to tens of minutes for complex questions which traverse the full connectivity of the semantic network. In the same way, the performance of query execution depends directly on the complexity of the query. Queries traversing many connections in the semantic network with entwined constraints may take several minutes to execute, while simple queries, even with millions of results, return within seconds.
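
Stripped of the wizard, the quoted smart-folder query is just a conjunction of filters over patient annotations; a rough Python equivalent (field names invented for illustration) might look like:

```python
def find_patients(patients, gold_gt=2, bmi_lt=18, exclude_diag="cancer"):
    """COPD severity (GOLD) above gold_gt, BMI below bmi_lt, and no
    diagnosis whose name contains exclude_diag."""
    return [p["id"] for p in patients
            if p["gold"] > gold_gt
            and p["bmi"] < bmi_lt
            and not any(exclude_diag in d.lower() for d in p["diagnoses"])]
```

The wizard's query variables correspond to the keyword arguments here: a saved smart folder fixes the logic while letting other users change gold_gt, bmi_lt or exclude_diag.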

All individual entries, contents of folders and query results are organised in configurable reports presenting the initial element and any desired associated information. To reflect differences in interest and focus among users, multiple reports can be defined for any object, for example providing a quick overview of patient laboratory data for the clinician, while another patient report provides the expression data for the data analyst. Reports can be exported in both tabular and XML format. Reports can transparently integrate external applications (e.g. the statistical programming environment R) in order to derive visualisations and/or further analysis of the data (Figure 7). The report is based on individual "view items" defined with the same type of wizard as used for query generation and import-template generation.

Report with integrated external application result. This result view shows the first three significantly different pathways between sedentary and trained healthy people. The 3D scatterplot on top visualises the PC1 for each experiment as a spot in 3D space, with the KEGG pathways as dimensions. Green (experiment group 1, here pre-training) and red (experiment group 2, here post-training) spots clearly occupy two different regions of the plot, indicating differences. The significance of the differences is visible in the tabular report, where the first column provides the name of the pathway. Columns 2 and 3 list the PC1 values for each of the associated experiments in groups 1 and 2. Columns 4 and 5 show the overall PC1 mean of the pre- and post-training data. The following columns list the t-, p- and adjusted p-value, respectively.

Configuring reports therefore requires no software development skill, but it does rely on an understanding of the data model configuration. As with the query and import-template wizards, view items are drawn from the data model using functions such as "related object", "assigned annotation" or "query result", which can be further restricted to specific types such as "relation of type protein expression". Configuring a new report on average takes only minutes but, as reports can contain query results, can also take tens of minutes if a new complex query needs to be defined. Reports defined for an object like gene can be re-used as "nested reports" wherever a gene-type object is included as a view item in another report, allowing complex reports to be assembled from simple units.

Report display performance directly depends on the configuration and takes between seconds and several minutes. Simple, fast reports depend on directly related information e.g. a gene report which brings together Sequence Variant, gene-disease and gene-compound information. Complex, slow reports integrate queries to traverse the semantic network and pull together distant information e.g. the medication for the patients for which a given gene was upregulated.

View items can also be used within "information layers", which visualise information directly on top of a network graph by changing the size and colour of the displayed objects. To define an information layer, ranges of expected numerical or nominal values in the view item are assigned to colour and size ranges for the graphical object display. In a simple case this is used to display expression data on top of a gene network, but by using query results as view items it can also be used to display, for example, the number of publications associated with a gene-phenotype association. Within the graph, information layers are executed for every suitable object displayed and, depending on the complexity of the defined view item, the generation of the overlay can take between seconds and several minutes.
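
The value-to-display mapping at the heart of an information layer can be sketched as simple binning (colour names and ranges below are illustrative, not BioXM's actual scheme):

```python
def layer_colour(value, lo, hi, colours=("lightcoral", "red", "darkred")):
    """Clamp a numeric view-item value to the expected [lo, hi] range
    and bin it into one of the display colours."""
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    return colours[min(int(t * len(colours)), len(colours) - 1)]
```

The same idea applies to object size: a second binning function would map the value onto a range of node diameters instead of colours.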

Application case

The BioBridge knowledge base implementation enables the integrative analysis of clinical data (e.g. questionnaires, anthropometric and physiologic data) with gene expression and metabolomics data and literature-derived molecular knowledge. The knowledge base is currently used by data analysis and modelling groups within BioBridge to extend literature-derived, COPD-specific molecular networks with probabilistic networks derived from expression data (method described in [44]). Output of the probabilistic networks, together with expression and metabolomics data, is then used to tune mathematical models of central metabolism [45] for COPD-specific simulations.

As an example use case, we briefly describe the initial search for connections between molecular sub-networks affected by exercise, which was generated as a starting point of the BioBridge investigation (for detailed descriptions of this and further use cases, see the associated step-by-step tutorial). To this end we searched the expression data in the COPD knowledge base to retrieve studies involving patients diagnosed with diseases affecting muscle tissue or involving "exercise" as treatment. Based on these experiments we used the R interface to conduct a principal component analysis to extract the KEGG pathways most strongly affected by expression changes (R scripts provided in additional files 5 and 6). Interestingly, the affected pathways are mainly associated with tissue remodelling and signal transduction. Using the enzyme and compound concentration and kinetic measurements extracted from key manuscripts on muscle dysfunction in COPD and training effects, we find that a number of compounds and proteins involved in key pathways with altered expression derived from the principal component analysis show significant changes in concentration/activity in the published literature. The integrated COPD knowledge resource thus provides independent support from existing knowledge for the statistical analysis of the expression data. For these compounds and proteins we use the network search algorithm to search the entire COPD knowledge network, but restrict the allowed connections to human genetic interaction, protein-protein interaction, gene-compound or gene-disease interaction (see Table 2 for the individual sources mapped to each of these relation types). The resulting network connects inflammatory and metabolic processes affected in COPD patients (see Figure 8A and additional file 7: Knowledge_network_table.pdf).
Visualising the number of disease-specific pathways associated with each node as an information layer in this network immediately indicates the potential weight of individual nodes for future investigation (see Figure 8B), with TNF-receptor associated factors 2 and 6 showing the highest relevance in this respect among all newly connected nodes. The definition of the network search can be changed to include multiple additional information types, e.g. drug relations. Weighting of evidence is achieved by specifying penalty weights in the network search parameter set for each type of relation searched. Additionally, query results provide further evidence directly visualised as an information layer on top of the network graph, changing the size and colour of the displayed objects. Within the query, all attributes available in the knowledge base can be used as described in the previous section, from the number of independent literature occurrences supporting a gene-disease relation to the category of the experimental evidence supporting a protein-protein interaction, to derive an informed, weighted list of further investigation targets. Different information layers can be defined to overlay different types of information, such as quantitative experimental data from gene expression or qualitative text-mining results. The decision about what kind of evidence should be weighted in which way depends on the individual type of data considered; we currently do not provide automatic scoring or ranking algorithms. However, based on the R plug-in or the API, R scripts or external algorithms can be integrated to calculate corresponding scores. Because an information layer can integrate complex queries over many objects, this is a performance-critical function and may take several minutes to execute.
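The penalty-weighted network search can be approximated by a shortest-path search in which each relation type carries a user-defined cost, so that paths through well-trusted evidence types are preferred. The sketch below (plain Dijkstra in Python) is a hypothetical stand-in for the actual BioBridge algorithm; the node names, relation types, and penalty values are invented for illustration.

```python
import heapq

def weighted_network_search(edges, penalties, source, target):
    """Shortest path in an undirected knowledge network where each edge's
    cost is the penalty weight assigned to its relation type.

    edges     : list of (node_a, node_b, relation_type)
    penalties : dict relation_type -> cost (lower = more trusted evidence)
    """
    # Build an adjacency list, translating relation types into costs
    graph = {}
    for a, b, rel in edges:
        w = penalties[rel]
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    # Standard Dijkstra search
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None, float("inf")
    # Reconstruct the path from the predecessor map
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[target]
```

With a penalty set that trusts protein-protein and gene-compound relations (cost 1) more than gene-disease relations (cost 5), a two-hop path through a shared interactor is preferred over a single heavily penalised edge, which is the behaviour the penalty parameters are meant to control.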

Network connecting inflammation with central metabolism. A) Individual nodes (compounds and proteins) are connected based on the shortest path algorithm. The initial compounds and proteins (marked in yellow) were selected based on their involvement in pathways detected by the PCA-pathway analysis and their significantly different concentration/activity in the COPD- or training-specific literature. Proteins and compounds are connected by following all possible relations within the COPD knowledge network, which yields the most parsimonious network revealing a putative mechanistic connection between inflammation-related processes and central metabolism. B) Visualisation of disease-specific pathway associations as a proxy for possible protein importance in a disease mechanism. Each protein in the network is queried for the number of associated disease-specific pathways manually curated from the literature into the knowledge base. Numbers are visualised from light red (few pathways) to deep red (a high number of associated pathways).


From Lamarck, 1802: Biology: this is one of the three divisions of terrestrial physics; it includes all which pertains to living bodies and particularly to their organization, their developmental processes, the structural complexity resulting from prolonged action of vital movements, the tendency to create special organs and to isolate them by focusing activity in a center, and so on.

From Treviranus, 1802: The objects of our research will be the different forms and phenomena of life, the conditions and laws under which they occur and the causes whereby they are brought into being. The science which concerns itself with these objects we shall designate Biology or the Science of Life.

SOURCE: As translated by William Coleman in Biology in the Nineteenth Century: Problems of Form, Function, and Transformation (1971), p. 2.

Despite arguments for the unity of the increasingly diverse biological sciences, controversies and debates erupt between biologists about fundamental concepts in the biological sciences. Differences are especially pronounced between more reductionistic, physicalist, laboratory-driven, and experimental sciences such as molecular biology and biochemistry and more integrative, field-oriented, observational, and historical sciences such as evolutionary biology and ecology. In the mid-1960s, university biology departments became divided over differences in conceptual foundations, goals, methodology, philosophy, and scientific style. As a result, at locations such as Harvard University, departments of biology formally divided into departments of molecular biology and organismic biology, an area defined as an integrative approach to the biological sciences that includes a strong historical and ecological component. Roughly at this time ecology, a science of enormous heterogeneity drawing on a range of approaches, practices, and methodologies and rooted in questions pertaining to adaptive responses to varying environments, became integrated with evolutionary approaches and instituted in departments of ecology and evolution. Often located within ecology and evolution departments are systematics and biodiversity studies, a newer area concerned with biodiversity, including classification and conservation.

In 1961 the evolutionary biologist, historian, and philosopher Ernst Mayr, reflecting on some of these growing differences between biologists, provocatively suggested that biology in fact comprises two sciences. The first is a biology based on proximate causes that answers questions of function (molecular biology, biochemistry, and physiology). The second is a biology based on ultimate causes that seeks historical explanation (evolutionary biology, systematics, and the larger discipline of organismic biology). While the biology of proximate causes is reductionistic and physicalist, the biology of ultimate causes is historical and is characterized by emergent properties. Mayr's reflections on the structure of the biological sciences have formed the backbone of the history and philosophy of biology and have made their way into some textbooks in the biological sciences. While vitalism is no longer tenable in biology, there is considerable support for the belief that complex properties emerge from simpler strata in biology and for the idea that such emergent properties are useful in explaining life.

The Interventions

Only Systems Interventions Will Work

As explained above, type 2 diabetes is multifactorial and thus requires systems interventions. This implies that all underlying causes are quantified and addressed in a personalized and (chrono-)logical order. These causes fall into categories spanning the biological, psychological, sociological/environmental, and spiritual domains. Interventions are likely to fail if one or more of these domains is not properly addressed, or when the interactions between these domains are not well understood. It also means going beyond the treatment of symptoms to understand the syndemics behind health problems (45). Each type of intervention is addressed in detail below. Some common features that should mark all interventions include:

1. The efficacy of a systems approach is based on its individual components and a tailored analysis of the best combination of components (15).

2. Interventions for lifestyle-related diseases need to be based on “self-empowerment.” Most externally imposed interventions are not sustainable. This is the case for the large majority of interventions, with the Look AHEAD study as a prime example (46).

3. Interventions always aim to improve flexibility and/or resilience. This holds for both the biology/physiology and psychosocial aspects.

Dietary Interventions

(Chronic) Calorie Restriction

There is no doubt that caloric restriction and weight loss ameliorate metabolic anomalies in patients with T2D (47, 48). Indeed, loss of 5% of bodyweight or more reduces HbA1c, lipoprotein levels, and blood pressure. Restricting energy intake to 600 kcal/day for 8 weeks normalizes beta-cell function and hepatic insulin sensitivity in obese type 2 diabetics, coinciding with a reduction of hepatic and pancreatic fat content (23, 49, 50). These data clearly indicate that type 2 diabetes is a reversible disease, which can be cured by appropriate dietary measures (although the genetic predisposition obviously never disappears). Interestingly, upon publication of the data, many T2D patients reported similar effects of very low calorie intake in their daily life practice, demonstrating that the disease can also be reversed in a non-research, self-empowerment setting (51). (Severe) caloric restriction is difficult to sustain, if not plainly deleterious in the long run. Thus, the “DiRECT” study was designed as a 4-year demonstrator of the above-mentioned caloric restriction, where the treatment of 149 type 2 diabetes patients consisted of withdrawal of antidiabetic and antihypertensive drugs, total diet replacement (~850 kcal/day formula diet for 3–5 months), stepped food reintroduction (2–8 weeks), and structured support for long-term weight loss maintenance. The first-year results have been published and show remission of type 2 diabetes (i.e., HbA1c below 6.5% without medication) in 48% of the subjects, while the control group (standard care) showed 4% remission. Interestingly, remission was associated with weight loss, with subjects losing more than 15 kg showing 86% remission (52).

(Intermittent) Fasting

Fasting, including caloric energy restriction and various intermittent fasting regimes, has been shown to be effective for weight loss, improving insulin sensitivity, and decreasing cardiovascular risk in both non-diabetic and diabetic subjects (53–56). A study by Halberg et al., in which participants followed an alternate-day fasting scheme while maintaining body weight and fat mass, still demonstrated increased insulin-mediated whole-body glucose uptake rates, insulin-induced inhibition of adipose tissue lipolysis, and increased plasma adiponectin levels (57). This suggests that the positive effects of intermittent fasting are not solely attributable to weight loss, but are also driven by other mechanisms enhancing metabolic/phenotypic flexibility. The profound metabolic benefits of intermittent and periodic fasting have been well documented in preclinical experiments. For example, alternate-day fasting (i.e., complete fasting for 24 h alternated with 24 h periods of ad libitum intake) fully reverses the high insulin and glucose levels of db/db diabetic mice to normal, despite similar overall food intake and stable bodyweight compared to ad libitum fed animals (58). However, fasting every other day is probably even less realistic than chronic calorie restriction, and its effects in humans are poorly understood (59). The periodic and prolonged use of so-called fasting-mimicking diets (FMDs) may offer an effective and safe alternative for the treatment of T2D in humans. FMDs are meal replacement plans that mimic the endocrine and metabolic effects of fasting while containing a modest number of calories. The characteristics of these diets that are critical for appropriately mimicking the effects of total fasting (despite considerable calorie content) are a lack of refined carbohydrate, a very low protein content, and high levels of healthy fats, all from plant-based sources (60).
Both animal and human studies suggest that FMDs can be applied as infrequently as once a month for 5 days, requiring an approximately 50% reduction in calories to be effective in promoting strong effects on metabolic syndrome risk factors (61, 62). Remarkably, and in keeping with previous reports (58), animal studies indicate that the effects of the periodic FMD on disease risk factors do not require overall calorie restriction, since mice on the FMD consumed the same number of calories per month as mice on the ad libitum diet (61). The molecular mechanisms that underpin the benefits involve the persistent endocrine and metabolic shifts typically induced by fasting: (1) reduction of (bioavailable) insulin-like growth factor-1, insulin, ectopic fat storage, and endogenous glucose production; (2) increased adipose lipolysis and fat oxidation; and (3) use of glycerol and ketone bodies instead of glucose as preferred carbon sources (59). Notably, recent experimental evidence suggests that periodic use of FMDs can drive beta-cell regeneration to restore insulin production in animal models of type 2 (and type 1) diabetes (24). The potential benefits of intermittent fasting could lie in “flexibility training,” i.e., the frequent switching between metabolic modes, from glucose to free fatty acids and ketone bodies as energy source, that is inherent to fasting (59). This is in line with the positive impact of intermittent fasting on insulin sensitivity, inflammatory markers, oxidative damage, and stimulation of autophagy (59, 63, 64).

Some studies, however, did show adverse effects of intermittent fasting in healthy, non-obese subjects, including increased levels of free fatty acids and impaired glucose tolerance (65, 66), suggesting that intermittent fasting should only be applied in metabolically inflexible persons.

No studies have yet been performed that compare the effectiveness of an isocaloric intermittent fasting scheme with caloric energy restriction or a healthy diet intervention. Such studies are required to confirm the ability to “train” metabolic flexibility with intermittent fasting.

Collectively, currently available evidence strongly supports the clinical potential of periodic fasting as an effective and safe alternative to chronic restriction of calories. FMDs may be a feasible mode to implement this therapeutic strategy.

Macronutrient Composition

The macronutrient composition of food is relevant for metabolic control as well. As mentioned above, in theory, abolishing the need to store excess glucose by low-carbohydrate (particularly sugar- and starch-restricted) diets obviates the need for glucose-lowering medication, simply because there is no glucose to be stored. Numerous studies have been performed with low or poorly digestible carbohydrate in type 2 diabetics [e.g., Ref. (67–71)]. Follow-up was 1 year in most of them. The data suggest that, in the short term (4 weeks to 6 months), carbohydrate restriction is effective in terms of glucose and weight control. However, in the long run, the effects of low-carbohydrate diets on bodyweight, glucose control, and lipid profiles do not appear to be superior to those of high-carbohydrate diets (72). A meta-analysis of intervention studies (in diabetic and non-diabetic humans) revealed that substitution of carbohydrates with poly-unsaturated fatty acids significantly reduces plasma HbA1c and insulin, while replacing carbohydrate with saturated fat reduces only insulin levels somewhat (73). Besides carbohydrates and fats, the effects of dietary protein on type 2 diabetes have also received a lot of attention in the literature. The acute insulinotropic effect of whey protein in particular has been quite well established (74). This effect seems to be mainly driven by an increase in GLP-1, as well as decreased gastric emptying (75, 76). However, increased insulin production is not a beneficial effect if not accompanied by increased insulin sensitivity. Overstimulation of the pancreas and stimulation of gluconeogenesis by high-protein diets may in the long term even result in increased (muscle) insulin resistance (77), if not compensated by increases in lean muscle mass or weight loss (78, 79).
It can thus be concluded that manipulating the macronutrient content of the diet without restriction of calories does not cure T2D in the vast majority of patients, although it can improve the diabetic phenotype to a certain extent.

Micronutrients and Non-Nutrients

Many micronutrients and non-nutrients have been shown to improve glucose control, interacting with a large variety of pathways and processes. Interestingly, almost all processes involved in maintaining systems flexibility (see above) are targets of micro- or non-nutrients. These are summarized in many dedicated reviews (17, 29, 80, 81). Micronutrients are involved in ectopic lipid deposition [e.g., choline deficiency reduces fatty acid oxidation (82, 83) and hampers VLDL particle synthesis, and thus stimulates fatty liver (84)]. Various nutritional therapies have been shown to be efficacious for non-alcoholic fatty liver disease (84). Nutritional anti-inflammatory compounds (e.g., polyphenols, omega-3 fatty acids) contribute to reversing type 2 diabetes, although primarily in combination with other interventions (85), and in a gender-specific manner (86). Insulin secretion is optimized by zinc (87), magnesium (88), and vitamin D (coinciding with its effect on inflammation and glucose control) (89). In theory, a healthy diet should supply sufficient amounts of these micronutrients. Given that type 2 diabetics usually do not have a track record of healthy eating, targeted prescription of micronutrients and non-nutrients may be required. Diagnosis of the malfunctioning components of systems flexibility in T2D (or any other disease) will allow the design of specific (nutritional) therapies.


One of the reasons for the lack of consistent, significant effects of the dietary interventions described above probably pertains to the fact that people are not born or raised equal. Moreover, as described in the previous paragraphs, systems diseases require systems diagnosis based on quantifying phenotypic flexibility. This reveals the underlying physiological disease cause(s), embedded in a complex network of metabolic and inflammatory processes. For optimal systems flexibility, each process needs to function optimally. In type 2 diabetes, many organs and processes can contribute to disruption of (glucose) metabolism (90). The degree of insulin sensitivity of the three main organs involved in maintaining glucose homeostasis (muscle, liver, and adipose tissue), and the degree of insulin secretion by the pancreas, can be assessed by measuring glucose and insulin at 30-min intervals during an OGTT, together with fasting free fatty acids (16). The severity of insulin resistance can differ between the various organs, and different interventions may have organ-specific effects on insulin sensitivity (91), as demonstrated by the example of treatment of type 2 diabetic patients with a very low-caloric diet (VLCD) or physical exercise. When insulin resistance primarily affects muscle, physical exercise rapidly restores glucose tolerance (92). In contrast, patients with insulin resistance of the liver respond particularly well to a VLCD (23). However, when β-cell capacity is insufficient, neither VLCD nor physical exercise will fully restore glucose tolerance (23, 93–95). Similar tissue-specific metabolic effects of distinct interventions were also demonstrated in the CordioPrev cohort, where the Mediterranean diet specifically improved glucose metabolism in T2D patients who predominantly had muscle insulin resistance, while a low-fat diet specifically benefited patients with liver insulin resistance (36).
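As an illustration of how OGTT time courses are condensed into a single sensitivity estimate, the widely used Matsuda whole-body insulin sensitivity index combines fasting and mean glucose/insulin values from the sampled time points. Note that this is one common index, offered here only as an example; it is not necessarily the specific method of Ref. (16), and units of mg/dL for glucose and μU/mL for insulin are assumed.

```python
import math

def matsuda_index(glucose, insulin):
    """Whole-body insulin sensitivity index (Matsuda & DeFronzo) from an
    OGTT sampled at fixed intervals, e.g. 0, 30, 60, 90, 120 min.

    glucose : plasma glucose values in mg/dL (first value = fasting)
    insulin : plasma insulin values in uU/mL (first value = fasting)
    """
    g0, i0 = glucose[0], insulin[0]          # fasting values
    g_mean = sum(glucose) / len(glucose)     # mean over the OGTT
    i_mean = sum(insulin) / len(insulin)
    # Higher values indicate greater whole-body insulin sensitivity
    return 10000.0 / math.sqrt(g0 * i0 * g_mean * i_mean)
```

For a subject with flat curves of 100 mg/dL glucose and 10 μU/mL insulin this returns 10.0; quadrupling the insulin values at the same glucose levels drops the index to 2.5, reflecting reduced sensitivity.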
These observations are supported by mechanistic data on metabolic flexibility, demonstrating differential molecular routes to insulin resistance in distinct organs (96). This calls for systems interventions based on diagnosis of decreased flexibility of (tissue-)specific health-related processes. Taking this concept further, toward all aspects of phenotypic flexibility, Table 1 gives an example of how we envision systems diagnosis and related interventions for T2D. Many systems biology-based examples of nutrient–flexibility–health relationships and the ways in which they can be personalized have been reviewed (29). In the area of hypertension, an “integrative approach” has been proposed, using a multitude of food products and supplements on top of the DASH diet (97). Recently, an international panel's recommendations for the prevention and management of metabolic syndrome with lifestyle also showed clear scientific evidence for lifestyle treatments based on weight loss and increased energy expenditure through physical activity, for a Mediterranean-type diet with or without energy restriction, as well as for other similar dietary patterns, next to quitting smoking and reducing the intake of sugar-sweetened beverages and meat products (17).

Physical Activity Interventions

There is strong evidence for the beneficial effects of physical activity on insulin resistance, type 2 diabetes, dyslipidemia, hypertension, and obesity (98, 99). Even a single exercise bout improves blood pressure, glycemia, carbohydrate oxidation during exercise, and fat oxidation after exercise (100). Physical activity programs have been shown to be effective in NAFLD treatment (101). Specific physical activity programs achieve specific metabolic health improvements. For example, high-intensity intermittent exercise was shown to specifically reduce liver fat (102). Sixty minutes of walking resulted in a 75% increase of whole-body insulin sensitivity (103). On the other hand, just like overeating, lack of physical exercise as such is associated with a number of lifestyle-related diseases (104–106). Physical activity programs can be targeted to specifically reduce ectopic fat and thus become part of a lifestyle program for T2D. A meta-analysis indicated that especially endurance training (aerobic exercise) had an effect on visceral fat and possibly intrahepatic fat in type 2 diabetics (107).

The mechanisms involved are restoration of metabolic flexibility, restoration of the oxidative component of insulin resistance, and reduction of reactive oxygen species production. Physical activity also attenuates inflammation, which contributes to the management and reversal of type 2 diabetes (108). Most of these mechanisms involve the mitochondria and are especially relevant for muscle (109), suggesting a specific benefit of physical activity for patients with muscle insulin resistance.

The Role of Medication in Lifestyle Therapy

The current pharmaceutical approach to diabetes care is to assist the patient in maintaining glucose, lipid, and blood pressure control. Metformin primarily decreases hepatic glucose production. Thiazolidinediones assist in fatty acid storage in adipose tissue, thereby increasing the use of glucose as energy source and reducing ectopic fat storage. Sulfonylureas stimulate insulin secretion, DPP4 inhibitors prolong the half-life of insulin-stimulating hormones, and exogenous insulin facilitates organ glucose uptake. Medication is prescribed depending on the stage and severity of the disease. Yet, none of these medications addresses the root cause of T2D and will thus not cure the patient.

Plant-based diets almost immediately reduce the need for insulin in “insulin-dependent” type 2 diabetics (110), because absorption of complex carbohydrates from plants only modestly increases plasma glucose levels, which obviates the need for insulin to facilitate glucose storage. Very strict low-calorie (600 kcal/day) dieting restores plasma glucose concentrations to normal in T2D patients within 1 week (23). Yet, although such interventions initiate the reversal of type 2 diabetes, the trajectory toward a real cure (i.e., restoration of organ insulin sensitivity and beta-cell insulin secretion) will take much more effort and time. Meanwhile, medication may be needed. Therefore, a lifestyle intervention as therapy for T2D will need to be supervised by medical professionals. Notably, patients who choose to adopt lifestyle changes to treat their T2D often need to convince their doctors to adapt their medication. Doctors generally consider medication as required for glucose management (111).

In a fully integrated approach to the cure of T2D, lifestyle and medication should (sometimes) be combined. Pharmaceutical strategies should be designed to support optimal reversal of tissue insulin resistance and beta-cell failure, i.e., a combined “precision medicine/precision lifestyle” strategy. As described above, type 2 diabetes essentially needs a systems view, systems diagnosis, and systems therapy to restore insulin sensitivity of all relevant tissues (liver, muscle, and adipose) and to reactivate beta-cell insulin secretory capacity. The extent to which insulin sensitivity and/or action are compromised in each of these tissues may differ between patients. Therefore, the treatment strategy should be tailored to personal disease characteristics. For example, hepatic insulin resistance due to hepatosteatosis in a relatively lean subject may be due to impaired fatty acid uptake by subcutaneous fat or to excessive consumption of refined carbohydrates. Combining a PPAR agonist with restriction of sugar (and alcohol) intake could then be a beneficial “food-pharma” couple.

Depending on the disease progression, two other aspects related to medication need to be taken into account. First, not all aspects of systems flexibility may be fully restored, as irreversible damage may have occurred. For example, impaired insulin production can be reversed by reducing pancreatic fat storage and glucose toxicity (112), but beta-cell damage may not be fully reversible. Second, not all comorbidities and pathologies resulting from hyperglycemia are reversible, although the causal drivers (obesity, hypercholesterolemia, hyperglycemia) can be reversed. Indeed, cardiovascular, renal, ocular, and neural complications might need specific medication.

Sustained Behavior Change

A lasting change in health requires sustainable lifestyle changes of individuals and active self-management. Behavioral change interventions are designed to affect the actions that individuals take with regard to their health. These interventions can be directed at the individual via psychological determinants of behavior, or target their social or physical environment to create a supportive environment.

First of all, sustained behavior change involves several phases. Initiation of behavior change is by no means a guarantee of continuation. Rothman et al. suggested four phases of behavior change: behavioral initiation, continuation, maintenance, and habit (Table 3) (113). Before the initiation of new behaviors, old habits (like unhealthy eating) are prominent. In the phases of initiation and continuation, the newly adopted behaviors can conflict with old patterns, and within these phases relapse to old habits is likely. If the new behavior becomes the dominant response across contexts, and people are able to override previous automatic responses, maintenance and habit become likely. Each phase needs tailored behavior change strategies (113). Whereas the initiation phase is mainly based on outcome expectations and efficacy beliefs, continuation depends on the reward gained from the behavior and on self-regulatory skills such as planning, sustained self-efficacy, and self-regulatory effort, including self-monitoring (114, 115). Maintenance has been linked to factors like self-identity and satisfaction/enjoyment of the new behavior (116). A new habit is achieved when the newly formed behavior has become the automated response.

Table 3. The various phases in behavioral change, together with the individual’s perspective and coaching methods.

Behavior change interventions have been effective in creating changes in physical activity and dietary intake (117, 118) and have resulted in changes in HbA1c levels (119), especially for those with a higher baseline HbA1c and for interventions of at least 1 year. Behavioral maintenance of physical activity and dietary behaviors can be achieved when these interventions are conducted over a longer period (>24 weeks) and include face-to-face contact, multiple intervention strategies, and follow-up prompts (120). The effectiveness of these interventions depends on a balanced combination of behavior change strategies to promote and support behavior change (118).

Studies that have effectively supported type 2 diabetes patients have been based on:

– prompt focus on past success (identifying and emphasizing successful behavior change from the individual’s past)

– barrier identification/problem-solving (identifying salient barriers to physical activity for the individual and strategies to overcome them)

– use of follow-up prompts (such as reminder postcards or motivational telephone calls)

– provide information on where and when to perform the behavior (individuals are given explicit information on locations, times, and opportunities available locally for changing physical activity behavior)

– prompt review of behavioral goals followed by revisions or adjustments (121)

– maintenance motives (focusing on reward of changing the behavior) (122)

– environmental restructuring (to facilitate the desired behavior) (123).

Lifestyle change is not an effortless or innately pleasurable process for most people. Thus, engaging in healthy behaviors needs to be fulfilling or rewarding, at least until the long-term benefits of the new healthy lifestyle become manifest and the new lifestyle has become habitual. Reward-based reinforcement is a much-researched learning theory strategy to promote behavior change, which includes vicarious reinforcement (imitating behaviors of others who have experienced rewards due to the behavior) and incentives [which can be (financial) rewards]. Incentive-based interventions have been successfully applied to various areas of behavior change, ranging from smoking cessation and physical exercise to counseling attendance and medication adherence (124). Reward-based approaches address the principle that people are prone to succumb to instant gratification rather than investing in the future. Some people are more prone to instant gratification than others, and it is often associated with the immediate reward of eating. Meta-analytical evidence shows that providing (financial) incentives is an effective strategy in health behavior change (124). As a consequence, reward-based programs have gained a lot of interest among policymakers and governments. Both the United States and the United Kingdom governments advocate the use of incentives to encourage healthy lifestyle choices and have started large-scale reward programs that target vulnerable populations. In South Africa, the health insurer Discovery has developed an advanced reward program that collaborates with businesses to form a “lifestyle loyalty” program. Participants earn reward points for all kinds of health activities and behaviors, and the points can be exchanged for goods and discounts at local and national businesses. Active participation in the program was shown to increase exercise and healthy food purchases and to decrease healthcare costs.
A comparable approach is adopted in the Netherlands for cardiac patients and could be easily adjusted to type 2 diabetes.

Persuasiveness is a key element in promoting behavioral maintenance and adherence to interventions (125). Although monetary rewards may function as incentives, other types of persuasive design methods for eHealth interventions have been examined as a means to increase engagement with interventions and behavior change. Whereas behavior-change strategies mainly target primary task support (supporting the behavior change itself), other persuasive design features focus on dialog support, increasing the attractiveness of interventions or the reward of behavior change. A particular persuasive strategy is the use of gaming or gamification principles. These so-called serious games are not only meant to entertain but also aim to educate or promote behavior change. Serious games can promote the use of, and adherence to, behavior change interventions. Indeed, recent meta-analyses examining the role of video games in the promotion of lifestyle changes suggested that serious gaming is effective, albeit to a limited extent (126, 127). A particular type of gaming is exergaming, or active video games, which are games that require physical activity. These include games such as dancing games, which are generally done indoors. Exergames seem to be able to contribute to physical activity and bodyweight control, although the effects appear to be limited in uncontrolled settings, and keeping people involved seems a challenge (128, 129). Nevertheless, with the emergence of new types of mobile exergames, for instance combining virtual reality with stimulation of physical activity at any time and place, exergaming may have potential (130).

Therapeutic alliance is another type of engagement strategy. Therapeutic alliance has been defined as a non-specific feature of treatment, reflecting the extent of collaboration, purposeful action, and emotional connection between a client and therapist. Within the context of interventions guided by healthcare workers, it has been shown that therapeutic alliance increases adherence and effectiveness (131). Also in eHealth interventions, therapeutic alliance has been shown to enhance engagement (132, 133). This certainly is important, as early dropout tends to occur during eHealth interventions.

Lifestyle-based therapy aims at reversing and possibly curing type 2 diabetes. Reversal may be achieved within a relatively short period (weeks to months, depending on the intensity of the program and the severity of the diabetic state). After this "cure phase," continued support may be helpful lifelong, focusing both on maintaining an optimal lifestyle and on supporting behavior. Figure 2 presents a schematic overview of the various phases, mentioning some of the important components that can be applied during each phase.

Figure 2. Schematic representation of a personal cure and maintenance trajectory. Depending on the personal 360° diagnosis, the duration and intensity of the cure and maintenance phases and the relative contribution of the components may vary.

Health Literacy

Health literacy in type 2 diabetes is often but not unambiguously associated with glycemic control and other disease endpoints (134). Since care and cure of type 2 diabetes heavily depend on self-management, and since this is a complex matter involving many aspects of information, awareness, reasoning, and knowledge, health literacy is of high importance. As mentioned above, health literacy should be taken into account when diagnosing a patient.

Health literacy methods are a topic of active research, and evaluations of their efficacy are in progress. Taken together, multiple aspects and approaches are available, and real-life implementation in a tailored, if not personalized, manner involving all stakeholders needs to be part of the diabetes care and cure agenda (135).

Structural Interventions for Risk Groups

Besides individual support, structural interventions should be initiated for particular groups of people with T2D who are hampered in changing their lifestyle by social-contextual constraints, including neighborhood characteristics, availability of foods, poverty, the local role of primary care, etc. (136). Structural interventions, i.e., interventions aimed at altering the context in which people live, are likely to promote sustainability and may have additional positive effects beyond lifestyle changes and changes in physical health, such as lower experienced loneliness and improved social cohesion (137). This may be needed in particular for people at risk, e.g., citizens of low socioeconomic status, as they often experience a multitude of problems that need to be targeted by interventions before they can effectively change their lifestyles (138). In many countries, solutions are sought in community-oriented multidisciplinary primary care interventions.

eHealth and mHealth in Type 2 Diabetes

Information and communication technologies, i.e., eHealth, can facilitate health care (139). Particularly in the area of eHealth self-management for chronic somatic conditions, guided and embedded interventions have been shown to be as effective as face-to-face treatment but are usually more cost-effective (140). In type 2 diabetes, innovations in eHealth have demonstrated potential for supporting patients with self-management behaviors, in particular dietary and physical activity behaviors, and may result in better diabetes outcomes such as HbA1c (141). Innovations that can have positive impacts on self-management behaviors include text messages, smartphone apps, and web-based programs (141). Several systematic reviews showed that telemonitoring can improve HbA1c levels (142–144). Video games, virtual and augmented reality, and wearables are also promising, but individuals should be adequately trained in the use of these technologies (141). Moreover, guided eHealth interventions are usually more effective than non-guided interventions (140), which argues for "blended care" solutions, in which eHealth is part of a structured care plan.

Thousands of mobile health (mHealth) apps for diabetes are available for download in app stores (145). Only a small number of the available apps are evidence based (146). Some systematic reviews and meta-analyses showed that mobile phone interventions lead to improvements in glycemic control and self-management (147, 148). Apps for diabetes self-management typically share a limited number of basic functions, which can be classified into several categories, e.g., self-monitoring, education, support, alerts and reminders, and communication (149). To evaluate the effectiveness of the different functions on glycemic efficacy, Wu et al. developed a taxonomy of apps for diabetes self-management (150). In line with earlier studies, they showed that the use of mobile app-based interventions yields a clinically significant HbA1c reduction among adults with type 2 diabetes. Having a complication prevention module and/or a structured display was associated with a greater HbA1c reduction. Other functions (medication management, general education, personalized feedback, communication, potential-risk intervention with clinical decision-making) were not associated with greater HbA1c reductions.
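Such a basic-function taxonomy amounts to a simple mapping from categories to features. The sketch below illustrates the idea; the category names follow the text (149), while the individual feature names and the example app profile are invented for illustration.

```python
# Illustrative sketch only: the category names mirror those in the text;
# the feature names and the example app are hypothetical.

TAXONOMY = {
    "self-monitoring": ["blood glucose log", "weight log", "diet diary"],
    "education": ["diabetes lessons", "complication prevention module"],
    "support": ["peer forum", "coach messaging"],
    "alerts and reminders": ["medication reminder", "measurement alert"],
    "communication": ["share data with clinician"],
}

def categorize(features):
    """Map an app's feature list onto the taxonomy categories it covers."""
    return sorted(
        category
        for category, members in TAXONOMY.items()
        if any(f in members for f in features)
    )

# A hypothetical app offering two features:
app_features = ["blood glucose log", "medication reminder"]
print(categorize(app_features))  # ['alerts and reminders', 'self-monitoring']
```

Classifying apps this way is what makes it possible to ask, as Wu et al. did, which categories are associated with greater HbA1c reductions.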

eHealth and mHealth can be applied both in the management and in the prevention of diabetes and thus should be embedded in the continuum of prevention, care, and cure. The American Diabetes Association has stated that eHealth technologies, such as Internet-based social networks, distance learning, DVD-based content, and mobile applications, may be a useful element of effective lifestyle modification to prevent diabetes (151). The US Centers for Disease Control and Prevention and the Diabetes Prevention Recognition Program have begun to certify eHealth- and mHealth-based modalities as effective media for diabetes prevention interventions, which may be considered together with more traditional face-to-face and coach-driven programs (blended care). Apps for weight loss and diabetes prevention have been validated for their ability to reduce HbA1c in the setting of prediabetes (152). Traditionally, most weight loss apps focus on reducing caloric intake, but for people with (pre)diabetes it is more important to make food choices that induce normal post-meal glycemic responses (153).

mHealth can be greatly improved when ICT technologies are combined with evidence-based behavior change interventions. It has been shown that eHealth interventions using more behavior change strategies were more likely to change health behavior effectively (physical activity, healthy eating, weight loss) (118, 154). Through cultural tailoring (e.g., tailoring to gender, age, religion, ethnic background, or health literacy), the acceptance, loyalty, and effectiveness of digital health can be improved (155). Furthermore, computer-tailored interventions have become increasingly common for facilitating improvement in health behaviors. Dynamic tailoring (where the intervention variables are assessed before giving feedback) was shown to be more efficacious than static tailoring (where all feedback is based on one baseline assessment) and has long-term effects (156). For all eHealth interventions, guided treatments are usually more effective than non-guided treatments (140).

Combining face-to-face counseling with extended care (via dynamically tailored support) has the potential to increase the effectiveness of T2D management (157). There is a clear need for delivery systems to use team-based models and to engage patients in shared decision-making (SDM), in which patients and providers together make healthcare decisions tailored to the specific characteristics and values of the patient. It has been demonstrated that such an approach leads to patients reporting a better understanding of diabetes and showing improved HbA1c values, while healthcare providers reported that the SDM aids increased cohesion among team members (including patients) and facilitated patient education and behavioral goal setting (158). In a large pragmatic trial making use of motivational interviewing, SDM, and collaborative goal setting in chronic conditions, a striking difference in mortality rates was found after 2 years of telephone-based health coaching (OR = 0.64, p = 0.005), achieved with an average of 12.9 calls per patient (159).
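For readers less familiar with the statistic, an odds ratio such as the OR = 0.64 quoted above compares the odds of the outcome in two groups. The counts below are hypothetical, chosen only to reproduce an OR of roughly 0.64; they are not the trial's actual data.

```python
# Minimal illustration of an odds-ratio calculation from a 2x2 outcome
# table. All counts are hypothetical.

def odds_ratio(events_a, total_a, events_b, total_b):
    """Odds ratio of a binary outcome in group A vs. group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b

# Hypothetical: 119 deaths among 2000 coached patients vs.
# 180 deaths among 2000 usual-care patients.
print(round(odds_ratio(119, 2000, 180, 2000), 2))  # 0.64
```

An OR below 1 indicates lower odds of the outcome (here, mortality) in the coached group than in the comparison group.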

The Value of a Timeline of the Health and Behavior Trajectory

Ideally, biomarkers and diagnostics develop along two dimensions: first, from a single process to the complete quantification of health, including flexibility (the "systems flexibility biomarker" described above), and second, along the timeline of an individual's health trajectory, building the life story of systems flexibility, a personal "biopassport." Loss of systems flexibility is a process that develops over a timespan of many years. Interventions are most successful in early stages, when full reversal and cure are possible. The storage and availability of biomarker data has been common practice in longitudinal cohorts, but the translation of the results into health care is a tediously slow process. Also, personal health(care) data are usually not available in a structured and understandable manner for citizens/patients to valorize for their personal health. Since lifestyle-related health depends primarily on self-management and self-empowerment, it is vital that the citizen/patient has access to all relevant health data and information (8). If biomarkers of phenotypic flexibility are the key to optimizing metabolic health and to preventing and treating metabolic diseases, they need to be measured at regular intervals. At this moment, this is neither practical nor affordable, and moreover most healthcare systems neither focus on nor reimburse preventive diagnostics. Therefore, new diagnostic applications need to be developed that are cost-effective, minimally invasive, and preferably available as "do-it-yourself" applications. Developments both in ICT (personal health portals) and in diagnostics ("gadgets," dried blood spot diagnostics, etc.) are rapidly opening up this area. Various sensors (e.g., wearable, on-phone, at home) and other measurement devices (e.g., glucose monitors, weighing scales) have become cheaper and more user friendly. The challenge is to use these techniques in a complementary manner (160).
As an example, the "Nutrition Researcher Cohort" (161) started this movement in 2011. However, this cohort encountered major obstacles related to research ethics, as it appeared to be virtually impossible to merge personal health data collection with "citizen-science" research, and further development of "participant-led research" is needed (162). The NIH "All of Us" cohort, part of the Precision Medicine Initiative, is professionalizing this movement (163), although it does not focus on lifestyle. New, partly commercial activities are maturing this area (11). Also, "big data" and artificial intelligence-derived solutions are emerging for clinical decision support, like IBM's health analytics (164). Finally, apart from the above-mentioned developments that shift diagnosis away from the traditional medical domain toward self-empowerment, other, more unexpected developments are arising. Internet search engine data analysis is rapidly becoming a powerful tool for surveillance (165). These opportunities are now moving from surveillance to research tools (166), but it may take some time before they become applicable for personal diagnosis due to privacy and ethical constraints. A biopassport is the ideal starting point for the design of lifestyle-based personal health optimization and self-empowerment strategies. The biopassport can be extended with the above-described "360° diagnosis." eHealth applications can tap into the biopassport to deliver advice and guidance. Ecological Momentary Assessment/Intervention applications can be embedded in, and will enrich, the biopassport (167). Essentially, the personal availability of a timeline of health data will trigger a wealth of applications and economic developments that support personal healthy living and lifestyle-based therapies.

Health Data Cooperatives As Ultimate Platform of Health Democracy

Science, business, and society are beginning to realize the value of citizen-owned health data (8). Numerous participant- or patient-centric initiatives have emerged, either from within or outside the healthcare institutions, both commercial and non-profit, and all of them based on the Internet and social networking (168, 169). An enormous power is being unlocked by extending and integrating these data, as acknowledged by investments in this area through companies like 23andMe and PatientsLikeMe, and by the activities of most major "big data companies" (170). Of course, the best entity to valorize citizen-owned health data for any purpose (science, health care, or economy) should be the legal owner, i.e., the citizens themselves. The "health data cooperative" (171) may be an attractive model for this purpose, as it is the most democratic shared-ownership and decision-making legal entity available. This would unlock the value of personal health data for the benefit of personal and public health, facilitate an optimal merger of healthcare quality management and research, and give the citizen/consumer/patient the power to become the center of health care. If indeed the economic value of health data can be invested for the genuine benefit of the citizen/consumer/patient instead of commercial stakeholders, this has the potential to transform the current healthcare system into a citizen-centered and self-empowering healthcare system and economy, where services can be implemented that optimize health, prevent lifestyle-related disease, and cure it using the right tools. Among the right tools should be support of a structured population health management strategy aimed at stratifying the population with T2D according to their risk of various relevant adverse health outcomes.
This can be realized by linking and applying advanced analysis to (where needed, coded and anonymized) routine healthcare data from all domains, resulting in a structured approach to individuals who are members of subpopulations sharing defined risks.

Toward Integrated "Companion Systems"

Once the above exposé materializes in actionable interventions, patients with type 2 diabetes will be confronted with a wide range of advice and information. Generally speaking, these patients have not excelled in adherence to advice, nor do they all have the skills required to deal adequately with such a range of advice and information. Thus, many components line up for a new implementation failure, with patients dropping out after an initial start. Looking specifically at mHealth, the number of mobile phone apps for type 2 diabetes is enormous; diabetes was the largest application area among the 100,000+ health apps available in 2014 (145). It is unlikely that any of these apps covers all areas needed, and it is unlikely that any significant percentage of people with type 2 diabetes will be able to maneuver sensibly through this overwhelming offering.

We therefore propose a different approach, in which a multitude of overlapping personal advice applications is replaced by an integrated ecosystem of health services, provided to the end-user in an "on demand" manner based on real-time needs. Health services can span advice on diet, medication compliance, physical activity, behavioral guidance, health and product information, community building, etc. These services should only be presented to the (ex-)type 2 diabetes patient when relevant and needed, in a format and language fine-tuned to the user's socioeconomic and cultural needs, and should optimally facilitate liaison with healthcare professionals. Obviously, these services can become very complex and may include artificial intelligence, ecological momentary assessment and intervention, just-in-time adaptive interventions, and similar approaches. Also, a wealth of personal health data can be used as input, spanning from personal health monitoring to medical records. Yet none of this complexity should be visible to the end-user; interaction with this "life companion" should be minimized to non-intrusive essentials matching the health literacy of the user. Various presentation modes (mobile, desktop, life coaching, or SDM) are possible. Figure 3 presents a schematic overview of the functionalities of a life companion approach. Examples are emerging which combine the layers described above, integrating all aspects of P4 medicine (personalized, predictive, preventive, and participatory) with data- and knowledge-driven advice systems in the area of type 2 diabetes (172).

Figure 3. Schematic overview of a "life companion" approach. The citizen/patient interacts with a single eHealth platform (any combination of phone, desktop, life coach, healthcare provider) and receives interventions in all relevant areas (diet and lifestyle, behavior, information, etc.) at the right time in the right message format, based on both initial and continued diagnosis. The interventions are generated by "health services," i.e., models that exploit personal health and behavior data. Timelines of diagnostic and intervention information are owned by the citizen/patient and may be shared within a community (health data cooperative), thus further strengthening the personal health data service with a "big data" component.

Developing GDi-CRISPR System for Multi-copy Integration in Saccharomyces cerevisiae

In recent years, Saccharomyces cerevisiae has been widely used in the production of biofuels and value-added chemicals. To express target products stably, multiple target genes must be integrated into the chromosome of S. cerevisiae. CRISPR multi-copy integration technology relying on delta sites has been developed, but it often requires high-throughput screening or resistance markers, making it poorly reproducible and costly. This study aims to develop a low-cost and easy-to-use multi-copy integration tool in S. cerevisiae. First, twenty-one Cas proteins from different microorganisms were tested in S. cerevisiae to identify functional Cas proteins with optimal cleavage ability. Eight Cas proteins were found to complete gene editing. However, most of the transformants had low copy numbers, possibly because the cutting efficiency exceeded the repair rate. Therefore, the effect of donor delivery order was further investigated, and 4 copies were obtained when the donor was transformed first. Then, the gene-drive delta-site integration CRISPR system (GDi-CRISPR) was developed by combining the gene drive principle with the CRISPR system: the gRNA was placed within the donor fragments, so that integration of both into the genome drives further cutting and repair as the number of gRNA copies increases. Without high-throughput screening or resistance pressure, 6 copies were obtained in only 5–6 days using the GDi-CRISPR system. This tool is expected to further advance the development of multi-copy integration tools for S. cerevisiae.


Omics Integration and Systems Biology – ONLINE

The course is open to PhD students, postdocs, and researchers looking for an introduction to multi-omics integration and systems biology. This course is run by the National Bioinformatics Infrastructure Sweden (NBIS).

Due to the COVID-19 situation, the 2021 course will be held online.

Important Dates

  • Application opens: 15 January
  • Application closes: 12 March
  • Confirmation to accepted students: 19 March

Application and more information can be found on the course website.

Contact information

For questions about this workshop please contact: [email protected]

This online training event has no fee. However, if you accept a position at the workshop and do not participate (no-show) you will be invoiced 2000 SEK.

* Please note that NBIS cannot invoice individuals.

Course content

The aim of this workshop is to provide an integrated view of biological network construction and integration, constraint-based modelling, multi-omics integration through Machine Learning, and data-driven hypothesis generation. It will provide a general description of different methods for analysing different omics data, including key methods and pitfalls in their integration. The techniques will be discussed in terms of their rationale and applicability, with a particular focus on possible confounding factors. The course will also include hands-on sessions and invited speaker seminars.

Some of the covered topics include:

  • Data wrangling in omics studies
  • Condition-specific and personalized modeling through Genome-scale Metabolic models based on integration of transcriptomic, proteomic and metabolomic data
  • Biological network inference, community and topology analysis and visualization
  • Identification of key biological functions and pathways
  • Identification of potential biomarkers and targetable genes through modeling and biological network analysis
  • Application of key machine learning methods for multi-omics analysis including deep learning
  • Multi-omics integration at single-cell level
  • Multi-omics integration, clustering and dimensionality reduction
  • Similarity network fusion and Recommender systems
  • Integrated data visualization techniques
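One of the topics above, biological network inference, can be sketched in a few lines: genes whose expression profiles are strongly correlated across samples are connected by an edge. The toy profiles and the 0.8 correlation threshold below are arbitrary choices for illustration, not course material.

```python
# Illustrative correlation-based network inference on toy expression data.
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Expression of four hypothetical genes across five samples.
profiles = {
    "geneA": [1.0, 2.1, 3.0, 3.9, 5.2],
    "geneB": [0.9, 2.0, 3.1, 4.1, 4.8],   # tracks geneA closely
    "geneC": [5.0, 4.1, 2.9, 2.2, 1.1],   # anti-correlated with geneA
    "geneD": [2.0, 0.5, 3.3, 1.1, 2.7],   # unrelated
}

# Connect gene pairs whose absolute correlation exceeds the threshold.
names = sorted(profiles)
edges = [
    (a, b, round(pearson(profiles[a], profiles[b]), 2))
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(pearson(profiles[a], profiles[b])) >= 0.8
]
for edge in edges:
    print(edge)
```

In practice, dedicated packages handle the statistics, multiple-testing correction, and visualization; this sketch only shows the core idea of turning co-expression into network edges.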

Further details about the course content may be found on the course homepage.

Entry requirements

The course is aimed at M.Sc., PhD- or postdoc-level researchers with basic programming experience (e.g. R or Python). We will not discuss how to process the raw omics data and the students are referred to other NBIS courses for this matter.

Required to be able to follow the course and complete the practical exercises:

  • Programming/scripting experience (in R or Python).
  • Basic understanding of frequentist statistics
  • Be able to use your own computer with a web camera and R or Python installed for the practical computational exercises. Instructions on installation will be sent by email to accepted participants.
  • Experience with analysis of omic data (e.g. metabolomics, proteomics, transcriptomics) and NGS analysis
  • Completing NBIS courses “Introduction to Bioinformatics using NGS data”, “Introduction to biostatistics and machine learning”

Due to limited space, the course can accommodate a maximum of 25 participants. If we receive more applications, participants will be selected based on several criteria, including meeting the entry requirements, motivation to attend the course, and gender and geographical balance.

Science and Engineering Information Integration and Informatics (SEIII)

Full Proposal Deadline(s) (due by 5 p.m. submitter's local time):


In furtherance of the President's Management Agenda, in Fiscal Year 2005, NSF has identified 23 programs that will offer proposers the option to utilize Grants.gov to prepare and submit proposals. Grants.gov provides a single Government-wide portal for finding and applying for Federal grants online.

Proposers may opt to submit proposals in response to this Program Solicitation via Grants.gov or via the NSF FastLane system.

In determining which method to utilize in the electronic preparation and submission of the proposal, please note the following:

  1. Collaborative Proposals. All collaborative proposals must be submitted via the NSF FastLane system. This includes collaborative proposals submitted:
    • by one organization (and which include one or more subawards) or
    • as separate submissions from multiple organizations.

Proposers are advised that collaborative proposals submitted in response to this Program Solicitation via Grants.gov will be requested to be withdrawn, and proposers will need to resubmit these proposals via FastLane. (Chapter II, Section D.3 of the Grant Proposal Guide provides additional information on collaborative proposals.)

  2. All Other Types of Proposals That Contain Subawards. All other types of proposals that contain one or more subawards also must be submitted via the NSF FastLane system.

The following Revisions and Updates were included in the original program solicitation NSF 04-528:

The Dear Colleague Letter, "Proposal Submission Deadlines for the Division of Information and Intelligent Systems [IIS]" (NSF 01-156, dated September 6, 2001), established two annual proposal submission deadlines, March 1 and November 16. The Dear Colleague Letter is being replaced by individual IIS program solicitations, each with one annual proposal submission deadline. Please see the IIS Web site for additional information.

Effective on the day this program solicitation is posted by NSF, the deadlines for Science and Engineering Information Integration and Informatics proposals are March 4, 2004; December 15, 2004; and December 15 annually thereafter. Proposals submitted in anticipation of a November 16, 2003 deadline will be accepted and reviewed with those submitted for the March 4, 2004 deadline.


General Information

Science and Engineering Information Integration and Informatics (SEIII)

The Science and Engineering Information Integration and Informatics (SEIII) program focuses on advancing the state of the art in the application of advanced information technology to science and engineering problems in specific domains, such as astronomy, biology, the geosciences, public health and health care delivery. Since many scientific problems have common needs for information management and data analysis, the advancement of these technologies is central to SEIII. Similarly, within computer science, the study of complex distributed computer and network systems requires the collection and analysis of timely, accurate and reliable information.  Although methods for the analysis of scientific data and information will be supported by the program, a special emphasis will be placed on domain-specific and general-purpose tools for integrating information from disparate sources. Such integration is a key step of many projects yet is rarely addressed in full generality. The SEIII program will have two separate components to address these research areas: Science and Engineering Informatics (SEI) and Information Integration (II).

Within this program, the NSF intends to support a group of projects that will advance the understanding of technology to enable scientific discovery, and that will creatively integrate research and education for the benefit of technical specialists and the general population.

Cognizant Program Officer(s):

James C. French, Program Director, Directorate for Computer & Information Science & Engineering, Division of Information and Intelligent Systems, 1125 S, telephone: (703) 292-8930, fax: (703) 292-9073, email: [email protected]

Sylvia Spengler, Program Director, Directorate for Computer & Information Science & Engineering, Division of Information and Intelligent Systems, 1125 N, telephone: (703) 292-8936, fax: (703) 292-9073, email: [email protected]

Applicable Catalog of Federal Domestic Assistance (CFDA) Number(s):

Eligibility Information

  • Organization Limit: None Specified.
  • PI Eligibility Limit: None Specified.
  • Limit on Number of Proposals: None Specified.

Award Information

  • Anticipated Type of Award: Standard or Continuing Grant
  • Estimated Number of Awards: 25 to 30
  • Anticipated Funding Amount: $14,500,000

Proposal Preparation and Submission Instructions

A. Proposal Preparation Instructions
  • Full proposals submitted via FastLane:
    • Grant Proposal Guide (GPG) Guidelines apply

  • Full proposals submitted via Grants.gov:

    • NSF Application Guide: A Guide for the Preparation and Submission of NSF Applications via Grants.gov: Guidelines apply. (Note: The NSF Application Guide is available on the Grants.gov website and on the NSF website. To obtain copies of the Application Guide and Application Forms Package, click on the Apply tab on the Grants.gov website, then click on the Apply Step 1: Download a Grant Application Package and Application Instructions link, enter the funding opportunity number (the program solicitation number without the NSF prefix), and press the Download Package button.)

    This solicitation contains information that supplements the standard Grant Proposal Guide (GPG) proposal preparation guidelines. Please see the full text of this solicitation for further information.

    B. Budgetary Information
    • Cost Sharing Requirements: Cost Sharing is not required by NSF.
    • Indirect Cost (F&A) Limitations: Not Applicable.
    • Other Budgetary Limitations: Not Applicable.
    C. Due Dates

    Proposal Review Information

    Award Administration Information

    • Award Conditions: Standard NSF award conditions apply.
    • Reporting Requirements: Standard NSF reporting requirements apply.



    The efficiency and progress of the scientific enterprise have been chronically hampered by inadequate access to appropriate data and tools for analyzing and visualizing scientific data. Domain informatics specifically recognizes the importance of domain-specific information and data, and the analysis methods necessary to support significant advances in data-driven inquiry. The Science and Engineering Information Integration and Informatics (SEIII) program supports research and related educational programs with the goal of maximally exploiting data and information to enable new scientific discovery in the areas of science and engineering that are supported by the various Directorates of NSF.

    The importance of a coordinated SEIII effort cannot be overstated. SEIII seeks to catalyze and capitalize on synergies between general information technology and domain-specific informatics. Data-driven inquiry requires evaluating multiple competing hypotheses using multiple types of evidence and relating new findings to the existing knowledge and literature in a field. Throughout the nation and the world, huge quantities of data are gathered at great expense; the scientific community as a whole is deluged with new data from a variety of sources, yet each individual scientist sees only a small fraction of this data. Widely dispersed, multidisciplinary groups collaborating to enable scientific discovery produce large amounts of incongruous data. The challenge for SEIII is to exploit these assets so that science can be done more efficiently and to represent data in such a way as to make it useful for discovery. In particular, the plethora of data formats, interface protocols, and vocabulary differences across disciplines must be tamed.

    Within the study of complex computer and network systems, requirements for timely, accurate and reliable information integration are becoming increasingly critical to ensure enhanced performance for existing technologies such as the Internet and to enable new functionalities through emerging ubiquitous information technologies such as sensors.

    Our society relies on a well-trained and diverse workforce to develop new ideas and make the technical advances affecting its well-being. The SEIII program supports activities directed toward improving the tools and environment available to researchers and the use of those tools in educational environments. The goal is to revolutionize the education of researchers in science and engineering to accelerate the pace of knowledge discovery for future generations.


    The goal of the Science and Engineering Information Integration and Informatics (SEIII) program is to focus information technology research on addressing problems that will enable scientific discovery via analysis of large data sets or information resources.  There are typically two steps in this problem: the assembly of empirical data and other information, and their subsequent analysis to generate or test hypotheses. Specifically, this program encompasses two related components: (1) Science and Engineering Informatics (SEI) and (2) Information Integration (II). The SEIII program has the following two objectives.

    1.    Stimulation of multi-disciplinary research in Science and Engineering Informatics (SEI) that addresses significant, real requirements of an application domain. Understanding of the requirements should be derived through collaboration with the domain scientists or engineers.  An ideal project will have three key elements:

    A significant domain challenge

    A significant computer science problem that is a barrier to achieving the domain challenge and

    Demonstrated expertise in these two aspects.

    2.    Information Integration (II) research that leads to a uniform interface to a multitude of heterogeneous, independently developed data sources. The goal is to free users from having to locate the data sources, interact with each data source in isolation, and manually combine data from multiple formats and multiple sources. 

    To take maximum advantage of these SEIII activities, innovative approaches are needed in education so that capable students participate in research and so that research results are quickly integrated into the educational process.

    Proposals are encouraged on methodologies and tools for the representation and manipulation of large volumes of science or engineering data in distributed or heterogeneous environments.  In this context, projects in two related areas are encouraged: 

    A. Science and Engineering Informatics (SEI)

    1. Science and Engineering Data Models and Systems.  Theoretical foundations for the representation and manipulation of advanced data types (e.g., temporal, spatial and image data, textual data, spectrum data, engineering design data, materials data, chemical compounds, sequences, graphs, user-defined objects with inheritance and encapsulation, or declarative extensions); data/knowledge calibration and validation; and handling and visualization of uncertainty in the underlying data.  Systems issues include system extensibility; rapid prototyping support; development of user-transparent, multi-level storage management (main memory through tertiary storage); multi-media data indexing; partial-match retrieval algorithms; and archiving and version control.  Research in this area must consider the special data and information characteristics associated with a science or engineering domain necessary to make a contribution to a particular science or engineering problem.
    2. Analysis of Science Databases and Information Resources.  Topics span computing environment transparency; establishing baseline patterns; data examination, selection, analysis and manipulation of temporally or spatially related data; knowledge discovery algorithms; information extraction (e.g., from abstracts of publications); citation analysis; scientific visualization; parallel model execution and cross-validation on large volumes of data; automated knowledge acquisition; incorporation of new knowledge into a system; and audit trail provisions, including data provenance.  The research in this area must be done in connection with a specific science or engineering problem. Computer science problems are not excluded in this context. It would be quite appropriate, for example, to propose a new method for gathering and analyzing operational network data (the tool) with the goal of supporting real-time network adaptation (the problem).
    3. Analysis of Scientific and Engineering Images.  A key research challenge in many research problems is to derive measurements or abstract features from 2-D, 3-D and multispectral images and to use this derived information for generating or evaluating hypotheses.
    4. Shared Resources Environments.  The construction of shared, archived, and documented data, publication, or software resources that can accelerate the rate of scientific discovery.

    The topics listed above are not intended to represent the complete set of issues comprising the area; they are intended to be suggestive rather than limiting.

    Scope and Scale of Support of Science and Engineering Informatics (SEI)

    The awards are anticipated to provide support for inter-disciplinary teams, that is, researcher(s) in computer and information science and engineering collaborating with domain scientist(s) or engineer(s). A typical award is expected to be for 3 years, although awards of longer duration are possible.  The fiscal year 2004 plan includes $6.5 million for awards under this part of the solicitation, contingent on the quality of projects proposed and the availability of funds.

    B. Information Integration (II)

    Traditionally, an individual researcher developed hypotheses, designed experiments to test these hypotheses, collected observational data, and published results based on experiments. The data were often published in print to allow others to build upon or verify the results.  In nearly every field of 21st century science and engineering, including all of the disciplines funded by the NSF, research is now achieved by teams of researchers analyzing data sets that are far too large to publish in journals and sometimes collected independently by other scientists with different goals in mind. The goal of information integration research is to provide the necessary foundations to provide science and engineering researchers seamless access to a multitude of independently developed, heterogeneous data sources.

    Information integration seeks to maximally exploit available information to create new scientific knowledge. Effective information integration will also enhance public education by facilitating comprehensive access to distributed information resources.  Even though the Information Integration effort is directed specifically at science and engineering information, the research results developed under this research activity are expected to be broadly applicable to information of all kinds.  The focus of this area is integrating information, not manipulating it after the integration.  

    The information integration environment should have the following capabilities:

    • Integrate many different, disparate and possibly distributed sources
    • Support automated discovery of new data sources and information within them
    • Facilitate configuration, management and system maintenance
    • Incorporate structured, semi-structured, text, image, video, time-series, 3D images, citations, graphs, and data streams and
    • Provide flexible querying of the sources and the data.
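    One way to picture such an environment is the classic mediator/wrapper pattern: each wrapper translates its source's native records into a shared schema, and a mediator fans a single query out to every registered source. The sketch below is illustrative only; all class, field, and source names are invented here, not drawn from the solicitation.

```python
class SourceWrapper:
    """Uniform query interface over one heterogeneous data source."""

    def __init__(self, name, records, normalize):
        self.name = name
        self._records = records      # raw records in the source's native format
        self._normalize = normalize  # maps a native record into the mediated schema

    def query(self, predicate):
        """Return mediated-schema records satisfying the predicate."""
        return [r for r in map(self._normalize, self._records) if predicate(r)]


class Mediator:
    """Fans a single user query out to every registered source."""

    def __init__(self):
        self._sources = []

    def register(self, source):
        self._sources.append(source)

    def query(self, predicate):
        results = []
        for s in self._sources:
            results.extend(s.query(predicate))
        return results


# Two toy sources with different native formats and units.
csv_like = SourceWrapper(
    "csv", [("sensor-1", 21.5), ("sensor-2", 30.0)],
    normalize=lambda row: {"id": row[0], "temp_c": row[1]})
json_like = SourceWrapper(
    "json", [{"sensorId": "sensor-3", "tempF": 95.0}],
    normalize=lambda d: {"id": d["sensorId"],
                         "temp_c": (d["tempF"] - 32) * 5 / 9})

mediator = Mediator()
mediator.register(csv_like)
mediator.register(json_like)

# One query spans both sources; the user never sees the native formats.
hot = mediator.query(lambda r: r["temp_c"] > 25.0)
print(sorted(r["id"] for r in hot))  # ['sensor-2', 'sensor-3']
```

A production system would add automated source discovery, query planning and pushdown of predicates to the sources; this sketch only conveys the uniform-interface idea.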

    Some of the specific challenges include:

    1. Unifying Data Models and System Descriptions:  There is a need to develop stronger theoretical foundations for the representation and integration of information of various types from extant data models (e.g., temporal, spatial and image data, textual data, spectrum data, engineering design data, materials data, chemical compounds, sequences, graphs, user-defined objects) as well as the scientific literature into conceptually coherent views.  Specific topics include: metadata management and integration; the automated collection of metadata from instruments and processes that transform data; ontologies and taxonomies; data/knowledge calibration; heterogeneity of data type and format; scale of distributed systems; and rapid integration of new information sources.  Research in this area must consider the special data characteristics associated with science and engineering disciplines.
    2. Reconciling heterogeneous formats, schemas and ontologies:  The fundamental problem in any data sharing application is that systems are heterogeneous in many different aspects, such as different ways of representing data and/or knowledge about the world, different representation mechanisms (e.g., relational databases, legacy systems, XML schemas, ontologies), and different access methods and policies. In order to share data among heterogeneous sources, approaches to form a semantic mapping of their respective representations are needed to avoid manual intervention in each step of converting and merging data resources.
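    As a toy illustration of such a semantic mapping (the field names, units, and conversion rules below are invented for this sketch, not drawn from any real database), a small declarative mapping table per source can replace manual, step-by-step conversion of each record:

```python
def apply_mapping(record, mapping):
    """Translate one source record into the target schema.

    mapping: target_field -> (source_field, converter)
    """
    return {tgt: conv(record[src]) for tgt, (src, conv) in mapping.items()}


# Hypothetical target schema: {"gene": str, "expression": float} on a raw scale.
lab_a_map = {"gene": ("gene_symbol", str),
             "expression": ("expr_level", float)}          # values stored as strings
lab_b_map = {"gene": ("GeneID", str),
             "expression": ("log2_expr", lambda v: 2.0 ** v)}  # de-log to raw scale

a = apply_mapping({"gene_symbol": "TP53", "expr_level": "8.0"}, lab_a_map)
b = apply_mapping({"GeneID": "TP53", "log2_expr": 3.0}, lab_b_map)
print(a == b)  # True: both records land in the same schema and units
```

Real semantic-mapping research tackles the harder problem of discovering such tables automatically; once a mapping exists, merging sources reduces to applying it.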
    3. Web semantics:  Data on the web needs to be defined and linked in a way that allows it to be used by machines, not just for display purposes, but also for automation, integration and reuse of data across various applications. Supported research topics will include frameworks for describing resources, methods of automating inferences about web data and resources, and the development of interoperable ontologies, markup languages and representations for specific scientific domains.
    4. Decentralized data-sharing: Traditional data integration systems use a centralized mediation approach, in which a centralized mediator, employing a mediated schema, accepts user queries and reformulates them over the schemas of the different sources. However, mediated schemas are often hard to agree upon, construct and maintain. For example, labs conducting geosciences research share their experimental results with each other, but may do it in an ad hoc fashion. A similar scenario is found in data sharing among government agencies. Architectures and protocols that enable large-scale sharing of data with no central control are needed.
    5. Data-sharing on advanced cyberinfrastructure: Research topics will include models for federating information resources in advanced grid computing and/or Web services; integration and understanding of sensor information; and the collection of metadata from sensors, including models and tools to cope with the scale, pervasiveness, concurrency and redundancy of sensor data. Effective integration of network management information will be critical to enable basic networking functions such as routing, overlay node placement, denial-of-service detection, and fault recovery.  The integration of network management information will facilitate adapting network resources to changing conditions.
    6. On-the-fly integration: Currently, data integration systems rely on relatively static configurations with a set of long-lived data sources. On-the-fly integration refers to scenarios where one wants to integrate data from a source immediately after discovering it. We may use a source only a few times for a particular set of tasks. The challenge is to significantly reduce the time and skill needed to integrate data sources so that scientists can focus on domain problems instead of information technology problems.
    7. Information Integration Resources: Proposals are encouraged that create toolkits for data integration that can be shared among researchers. These toolkits should remove the need for implementing an entire data integration system from scratch for every project and will facilitate large-scale collaborations. There will also be a need for a small number of test beds to validate the techniques being pursued by the funded projects in this theme area. More definite progress will be made if competing techniques can be evaluated on a level playing field. Thus, proposals for innovative test beds and evaluation methodology are also encouraged. 
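    The on-the-fly integration challenge (item 6 above) can be made concrete with a toy heuristic; the field names and similarity threshold here are illustrative assumptions. A name-similarity match proposes a field mapping for a freshly discovered source, which a scientist can then confirm or correct instead of writing a wrapper from scratch:

```python
import difflib


def infer_mapping(target_fields, source_fields, cutoff=0.5):
    """Match each target field to the closest-named source field, if any."""
    mapping = {}
    for tgt in target_fields:
        # get_close_matches returns the best candidates scoring >= cutoff
        match = difflib.get_close_matches(tgt, source_fields, n=1, cutoff=cutoff)
        if match:
            mapping[tgt] = match[0]
    return mapping


# A newly discovered source with unfamiliar column names.
target = ["name", "email", "phone"]
discovered = ["full_name", "email_address", "phone_number", "notes"]
mapping = infer_mapping(target, discovered)
print(mapping)  # {'name': 'full_name', 'email': 'email_address', 'phone': 'phone_number'}
```

Name similarity alone is brittle; real systems combine it with data-type and instance-level evidence, and a human reviews the proposed mapping. The point is the workflow: discovery, a machine-proposed mapping, quick confirmation, immediate querying.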

    Scope and Scale of Support of Information Integration (II)

    The awards are anticipated to provide support for inter-disciplinary teams, that is, researcher(s) in computer and information science and engineering collaborating with domain scientist(s) or engineer(s). A typical award is expected to be for 3-5 years.  The fiscal year 2004 plan includes $8 million for awards under this part of the solicitation, contingent on the quality of projects proposed and the availability of funds.


    Education to develop and maintain both a highly skilled SEIII workforce and an informed populace is essential to the nation. To develop, maintain, and enhance this critical educational infrastructure, all proposals must include an educational component. Proposals must specifically describe their educational contributions. Appropriate goals include integration of research and education, promotion of knowledge transfer, reaching diverse populations and promoting diversity.

    Sample activities include: developing materials to integrate SEIII into existing courses; providing access to science data, both raw and refined, to the general public; mentoring faculty of K-12 institutions; creating tutorial material to bring an understanding of the applicability of state-of-the-art information technology to specific scientific communities; and developing online resources for faculty and students.


    The categories of proposers identified in the Grant Proposal Guide are eligible to submit proposals under this program announcement/solicitation.


    Estimated program budget, number of awards, and average award size/duration are subject to the availability of funds. The NSF anticipates making 25-30 Standard or Continuing Grants under this solicitation in FY 2004. The estimated program budget for FY 2004 is $14.5 million.


    A. Proposal Preparation Instructions

    Full Proposal Instructions:

    Proposals submitted in response to this program announcement/solicitation should be prepared and submitted in accordance with the general guidelines contained in the NSF Grant Proposal Guide (GPG). The complete text of the GPG is available electronically on the NSF Website. Paper copies of the GPG may be obtained from the NSF Publications Clearinghouse, telephone (703) 292-7827 or by e-mail from [email protected]

    Proposers are reminded to identify this program announcement/solicitation number in the program announcement/solicitation block on the NSF Cover Sheet For Proposal to the National Science Foundation. Compliance with this requirement is critical to determining the relevant proposal processing guidelines. Failure to submit this information may delay processing.

    Proposals submitted in response to this program solicitation via Grants.gov should be prepared and submitted in accordance with the NSF Grants.gov Application Guide: A Guide for the Preparation and Submission of NSF Applications via Grants.gov. The complete text of the Application Guide is available on the Grants.gov website and on the NSF website. To obtain copies of the Application Guide and Application Forms Package, click on the Apply tab on the Grants.gov site, then click on the Apply Step 1: Download a Grant Application Package and Application Instructions link, enter the funding opportunity number (the program solicitation number without the NSF prefix) and press the Download Package button. Paper copies of the Application Guide may also be obtained from the NSF Publications Clearinghouse, telephone (703) 292-7827 or by e-mail from [email protected]

    Supplemental Proposal Preparation Instructions for Use When Submitting via Either FastLane or Grants.gov

    Special attention should be paid to the following items when submitting a proposal to the SEIII Program:

    Proposal Titles: To assist NSF staff in sorting proposals for review, proposal titles should begin with "SEI:" or "II:", corresponding to the major technical areas of the solicitation. The title may be prefixed with "SEI+II:" when significant aspects of both technical areas are involved. Proposals for SEI projects involving applications in a particular scientific discipline may also choose to give the label of the NSF directorate primarily concerned with that research area [e.g., a title may begin with "SEI(GEO):" or "SEI(BIO):"]. NSF will, however, make the final decision on where to review each proposal.

    B. Budgetary Information

    Cost sharing is not required by NSF in proposals submitted under this Program Announcement.

    C. Due Dates

    Proposals must be submitted by the following date(s):

    Full Proposal Deadline(s) (due by 5 p.m. submitter's local time):

    D. FastLane/Grants.gov Requirements

    Detailed instructions for proposal preparation and submission via FastLane are available on the FastLane Website. For FastLane user support, call the FastLane Help Desk at 1-800-673-6188 or e-mail [email protected] The FastLane Help Desk answers general technical questions related to the use of the FastLane system. Specific questions related to this program announcement/solicitation should be referred to the NSF program staff contact(s) listed in Section VIII of this announcement/solicitation.

    Submission of Electronically Signed Cover Sheets. The Authorized Organizational Representative (AOR) must electronically sign the proposal Cover Sheet to submit the required proposal certifications (see Chapter II, Section C of the Grant Proposal Guide for a listing of the certifications). The AOR must provide the required electronic certifications within five working days following the electronic submission of the proposal. Proposers are no longer required to provide a paper copy of the signed Proposal Cover Sheet to NSF. Further instructions regarding this process are available on the FastLane Website.

    Before using Grants.gov for the first time, each organization must register with Grants.gov to create an institutional profile.  Once registered, the applicant’s organization can then apply for any federal grant on the Grants.gov website.

    The Grants.gov Grant Community User Guide is a comprehensive reference document that provides technical information about Grants.gov. Proposers can download the User Guide as a Microsoft Word document or as a PDF document from the Grants.gov website.  In addition, the NSF Grants.gov Application Guide provides additional technical guidance regarding preparation of proposals via Grants.gov. For Grants.gov user support, contact the Grants.gov Contact Center at 1-800-518-4726 or by email: [email protected]  The Grants.gov Contact Center answers general technical questions related to the use of Grants.gov. Specific questions related to this program solicitation should be referred to the NSF program staff contact(s) listed in Section VIII of this solicitation.

    Submitting the Proposal.  Once all documents have been completed, the Authorized Organizational Representative (AOR) must submit the application to Grants.gov and verify the desired funding opportunity and agency to which the application is submitted. The AOR must then sign and submit the application to Grants.gov. The completed application will be transferred to the NSF FastLane system for further processing.


    A. NSF Proposal Review Process

    Reviews of proposals submitted to NSF are solicited from peers with expertise in the substantive area of the proposed research or education project. These reviewers are selected by Program Officers charged with the oversight of the review process. NSF invites the proposer to suggest, at the time of submission, the names of appropriate or inappropriate reviewers. Care is taken to ensure that reviewers have no conflicts with the proposer. Special efforts are made to recruit reviewers from non-academic institutions, minority-serving institutions, or disciplines adjacent to that principally addressed in the proposal.

    The National Science Board approved revised criteria for evaluating proposals at its meeting on March 28, 1997 (NSB 97-72). All NSF proposals are evaluated through use of the two merit review criteria. In some instances, however, NSF will employ additional criteria as required to highlight the specific objectives of certain programs and activities.

    On July 8, 2002, the NSF Director issued Important Notice 127, Implementation of new Grant Proposal Guide Requirements Related to the Broader Impacts Criterion. This Important Notice reinforces the importance of addressing both criteria in the preparation and review of all proposals submitted to NSF. NSF continues to strengthen its internal processes to ensure that both of the merit review criteria are addressed when making funding decisions.

    In an effort to increase compliance with these requirements, the January 2002 issuance of the GPG incorporated revised proposal preparation guidelines relating to the development of the Project Summary and Project Description. Chapter II of the GPG specifies that Principal Investigators (PIs) must address both merit review criteria in separate statements within the one-page Project Summary. This chapter also reiterates that broader impacts resulting from the proposed project must be addressed in the Project Description and described as an integral part of the narrative.

    Effective October 1, 2002, NSF will return without review proposals that do not separately address both merit review criteria within the Project Summary. It is believed that these changes to NSF proposal preparation and processing guidelines will more clearly articulate the importance of broader impacts to NSF-funded projects.

    The two National Science Board approved merit review criteria are listed below (see the Grant Proposal Guide Chapter III.A for further information). The criteria include considerations that help define them. These considerations are suggestions and not all will apply to any given proposal. While proposers must address both merit review criteria, reviewers will be asked to address only those considerations that are relevant to the proposal being considered and for which he/she is qualified to make judgments.

      What is the intellectual merit of the proposed activity?
      How important is the proposed activity to advancing knowledge and understanding within its own field or across different fields? How well qualified is the proposer (individual or team) to conduct the project? (If appropriate, the reviewer will comment on the quality of the prior work.) To what extent does the proposed activity suggest and explore creative and original concepts? How well conceived and organized is the proposed activity? Is there sufficient access to resources?
      What are the broader impacts of the proposed activity?
      How well does the activity advance discovery and understanding while promoting teaching, training, and learning? How well does the proposed activity broaden the participation of underrepresented groups (e.g., gender, ethnicity, disability, geographic, etc.)? To what extent will it enhance the infrastructure for research and education, such as facilities, instrumentation, networks, and partnerships? Will the results be disseminated broadly to enhance scientific and technological understanding? What may be the benefits of the proposed activity to society?

    NSF staff will give careful consideration to the following in making funding decisions:

      Integration of Research and Education
      One of the principal strategies in support of NSF's goals is to foster integration of research and education through the programs, projects, and activities it supports at academic and research institutions. These institutions provide abundant opportunities where individuals may concurrently assume responsibilities as researchers, educators, and students and where all can engage in joint efforts that infuse education with the excitement of discovery and enrich research through the diversity of learning perspectives.
      Integrating Diversity into NSF Programs, Projects, and Activities
      Broadening opportunities and enabling the participation of all citizens -- women and men, underrepresented minorities, and persons with disabilities -- is essential to the health and vitality of science and engineering. NSF is committed to this principle of diversity and deems it central to the programs, projects, and activities it considers and supports.

    B. Review Protocol and Associated Customer Service Standard

    All proposals are carefully reviewed by at least three other persons outside NSF who are experts in the particular field represented by the proposal. Proposals submitted in response to this announcement/solicitation will be reviewed by Ad Hoc and/or panel review.

    Reviewers will be asked to formulate a recommendation to either support or decline each proposal. The Program Officer assigned to manage the proposal's review will consider the advice of reviewers and will formulate a recommendation.

    A summary rating and accompanying narrative will be completed and submitted by each reviewer. In all cases, reviews are treated as confidential documents. Verbatim copies of reviews, excluding the names of the reviewers, are sent to the Principal Investigator/Project Director by the Program Director. In addition, the proposer will receive an explanation of the decision to award or decline funding.

    NSF is striving to be able to tell proposers whether their proposals have been declined or recommended for funding within six months. The time interval begins on the closing date of an announcement/solicitation, or the date of proposal receipt, whichever is later. The interval ends when the Division Director accepts the Program Officer's recommendation.

    In all cases, after programmatic approval has been obtained, the proposals recommended for funding will be forwarded to the Division of Grants and Agreements for review of business, financial, and policy implications and the processing and issuance of a grant or other agreement. Proposers are cautioned that only a Grants and Agreements Officer may make commitments, obligations or awards on behalf of NSF or authorize the expenditure of funds. No commitment on the part of NSF should be inferred from technical or budgetary discussions with a NSF Program Officer. A Principal Investigator or organization that makes financial or personnel commitments in the absence of a grant or cooperative agreement signed by the NSF Grants and Agreements Officer does so at their own risk.


    A. Notification of the Award

    Notification of the award is made to the submitting organization by a Grants Officer in the Division of Grants and Agreements. Organizations whose proposals are declined will be advised as promptly as possible by the cognizant NSF Program Division administering the program. Verbatim copies of reviews, not including the identity of the reviewer, will be provided automatically to the Principal Investigator. (See section VI.A. for additional information on the review process.)

    B. Award Conditions

    An NSF award consists of: (1) the award letter, which includes any special provisions applicable to the award and any numbered amendments thereto; (2) the budget, which indicates the amounts, by categories of expense, on which NSF has based its support (or otherwise communicates any specific approvals or disapprovals of proposed expenditures); (3) the proposal referenced in the award letter; (4) the applicable award conditions, such as Grant General Conditions (NSF-GC-1)* or Federal Demonstration Partnership (FDP) Terms and Conditions*; and (5) any announcement or other NSF issuance that may be incorporated by reference in the award letter. Cooperative agreement awards are administered in accordance with NSF Cooperative Agreement Financial and Administrative Terms and Conditions (CA-FATC). Electronic mail notification is the preferred way to transmit NSF awards to organizations that have electronic mail capabilities and have requested such notification from the Division of Grants and Agreements.

    *These documents may be accessed electronically on NSF's Website. Paper copies of these documents may be obtained from the NSF Publications Clearinghouse, telephone (703) 292-7827 or by e-mail from [email protected]

    More comprehensive information on NSF Award Conditions is contained in the NSF Grant Policy Manual (GPM) Chapter II, available electronically on the NSF Website. The GPM is also for sale through the Superintendent of Documents, Government Printing Office (GPO), Washington, DC 20402. The telephone number at GPO for subscription information is (202) 512-1800. The GPM may be ordered through the GPO Website.

    C. Reporting Requirements

    For all multi-year grants (including both standard and continuing grants), the PI must submit an annual project report to the cognizant Program Officer at least 90 days before the end of the current budget period.

    Within 90 days after the expiration of an award, the PI also is required to submit a final project report. Failure to provide final technical reports delays NSF review and processing of pending proposals for the PI and all Co-PIs. PIs should examine the formats of the required reports in advance to assure availability of required data.

    PIs are required to use NSF's electronic project reporting system, available through FastLane, for preparation and submission of annual and final project reports. This system permits electronic submission and updating of project reports, including information on project participants (individual and organizational), activities and findings, publications, and other specific products and contributions. PIs will not be required to re-enter information previously provided, either with a proposal or in earlier updates using the electronic system.


    General inquiries regarding this program should be made to:

    James C. French, Program Director, Directorate for Computer & Information Science & Engineering, Division of Information and Intelligent Systems, 1125 S, telephone: (703) 292-8930, fax: (703) 292-9073, email: [email protected]

    Sylvia Spengler, Program Director, Directorate for Computer & Information Science & Engineering, Division of Information and Intelligent Systems, 1125 N, telephone: (703) 292-8936, fax: (703) 292-9073, email: [email protected]

    Other divisions within CISE and other Directorates within NSF are interested in aspects of this solicitation. PIs are encouraged to designate additional programmatic interest in their submissions to this solicitation.

    For questions related to the use of Grants.gov, contact:

    • Grants.gov Contact Center: If the Authorized Organizational Representative (AOR) has not received a confirmation message from Grants.gov within 48 hours of submission of the application, please contact Grants.gov via telephone: 1-800-518-4726 or e-mail: [email protected]

    For questions related to the use of FastLane, contact:

    Velma J. Swales, Lead Program Assistant, Directorate for Computer & Information Science & Engineering, Division of Information and Intelligent Systems, 1125 S, telephone: (703) 292-7845, fax: (703) 292-9073, email: [email protected]


    The NSF Guide to Programs is a compilation of funding for research and education in science, mathematics, and engineering. The NSF Guide to Programs is available electronically on the NSF Website. General descriptions of NSF programs, research areas, and eligibility information for proposal submission are provided in each chapter.

    Many NSF programs offer announcements or solicitations concerning specific proposal requirements. To obtain additional information about these requirements, contact the appropriate NSF program offices. Any changes in NSF's fiscal year programs occurring after press time for the Guide to Programs will be announced in the NSF E-Bulletin, which is updated daily on the NSF Website, and in individual program announcements/solicitations. Subscribers can also sign up for NSF's MyNSF News Service to be notified of new funding opportunities that become available.


    The National Science Foundation (NSF) funds research and education in most fields of science and engineering. Awardees are wholly responsible for conducting their project activities and preparing the results for publication. Thus, the Foundation does not assume responsibility for such findings or their interpretation.

    NSF welcomes proposals from all qualified scientists, engineers and educators. The Foundation strongly encourages women, minorities and persons with disabilities to compete fully in its programs. In accordance with Federal statutes, regulations and NSF policies, no person on grounds of race, color, age, sex, national origin or disability shall be excluded from participation in, be denied the benefits of, or be subjected to discrimination under any program or activity receiving financial assistance from NSF, although some programs may have special requirements that limit eligibility.

    Facilitation Awards for Scientists and Engineers with Disabilities (FASED) provide funding for special assistance or equipment to enable persons with disabilities (investigators and other staff, including student research assistants) to work on NSF-supported projects. See the GPG Chapter II, Section D.2 for instructions regarding preparation of these types of proposals.

    The National Science Foundation promotes and advances scientific progress in the United States by competitively awarding grants and cooperative agreements for research and education in the sciences, mathematics, and engineering.
