1.1: A Systematic Approach - Biology

Learning Objectives

  • Describe how microorganisms are classified and distinguished as unique species
  • Compare historical and current systems of taxonomy used to classify microorganisms

Once microbes became visible to humans with the help of microscopes, scientists began to realize their enormous diversity. Microorganisms vary in all sorts of ways, including their size, their appearance, and their rates of reproduction. To study this incredibly diverse new array of organisms, researchers needed a way to systematically organize them.

The Science of Taxonomy

Taxonomy is the classification, description, identification, and naming of living organisms. Classification is the practice of organizing organisms into different groups based on their shared characteristics. The most famous early taxonomist was a Swedish botanist, zoologist, and physician named Carolus Linnaeus (1707–1778). In 1735, Linnaeus published Systema Naturae, an 11-page booklet in which he proposed the Linnaean taxonomy, a system of categorizing and naming organisms using a standard format so scientists could discuss organisms using consistent terminology. He continued to revise and add to the book, which grew into multiple volumes (Figure 1).

In his taxonomy, Linnaeus divided the natural world into three kingdoms: animal, plant, and mineral (the mineral kingdom was later abandoned). Within the animal and plant kingdoms, he grouped organisms using a hierarchy of increasingly specific levels and sublevels based on their similarities. The names of the levels in Linnaeus’s original taxonomy were kingdom, class, order, family, genus (plural: genera), and species. Species was, and continues to be, the most specific and basic taxonomic unit.

Evolving Trees of Life (Phylogenies)

With advances in technology, other scientists gradually made refinements to the Linnaean system and eventually created new systems for classifying organisms. In the 1800s, there was a growing interest in developing taxonomies that took into account the evolutionary relationships, or phylogenies, of all different species of organisms on earth. One way to depict these relationships is via a diagram called a phylogenetic tree (or tree of life). In these diagrams, groups of organisms are arranged by how closely related they are thought to be. In early phylogenetic trees, the relatedness of organisms was inferred by their visible similarities, such as the presence or absence of hair or the number of limbs. Now, the analysis is more complicated. Today, phylogenetic analyses include genetic, biochemical, and embryological comparisons, as will be discussed later in this chapter.

Linnaeus’s tree of life contained just two main branches for all living things: the animal and plant kingdoms. In 1866, Ernst Haeckel, a German biologist, philosopher, and physician, proposed another kingdom, Protista, for unicellular organisms (Figure 2). He later proposed a fourth kingdom, Monera, for unicellular organisms whose cells lack nuclei, like bacteria.

Nearly 100 years later, in 1969, American ecologist Robert Whittaker (1920–1980) proposed adding another kingdom—Fungi—in his tree of life. Whittaker’s tree also contained a level of categorization above the kingdom level—the empire or superkingdom level—to distinguish between organisms that have membrane-bound nuclei in their cells (eukaryotes) and those that do not (prokaryotes). Empire Prokaryota contained just the Kingdom Monera. The Empire Eukaryota contained the other four kingdoms: Fungi, Protista, Plantae, and Animalia. Whittaker’s five-kingdom tree was considered the standard phylogeny for many years.

Figure 3 shows how the tree of life has changed over time. Note that viruses are not found in any of these trees. That is because they are not made up of cells and thus it is difficult to determine where they would fit into a tree of life.

Exercise 1

Briefly summarize how our evolving understanding of microorganisms has contributed to changes in the way that organisms are classified.

Clinical Focus: PART 2

Antibiotic drugs are specifically designed to kill or inhibit the growth of bacteria. But after a couple of days on antibiotics, Cora shows no signs of improvement. Also, her CSF cultures came back from the lab negative. Since bacteria or fungi were not isolated from Cora’s CSF sample, her doctor rules out bacterial and fungal meningitis. Viral meningitis is still a possibility.

However, Cora now reports some troubling new symptoms. She is starting to have difficulty walking. Her muscle stiffness has spread from her neck to the rest of her body, and her limbs sometimes jerk involuntarily. In addition, Cora’s cognitive symptoms are worsening. At this point, Cora’s doctor becomes very concerned and orders more tests on the CSF samples.

Exercise 2

What types of microorganisms could be causing Cora’s symptoms?

The Role of Genetics in Modern Taxonomy

Haeckel’s and Whittaker’s trees presented hypotheses about the phylogeny of different organisms based on readily observable characteristics. But the advent of molecular genetics in the late 20th century revealed other ways to organize phylogenetic trees. Genetic methods allow for a standardized way to compare all living organisms without relying on observable characteristics that can often be subjective. Modern taxonomy relies heavily on comparing the nucleic acids (deoxyribonucleic acid [DNA] or ribonucleic acid [RNA]) or proteins from different organisms. The more similar the nucleic acids and proteins are between two organisms, the more closely related they are considered to be.
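The idea that greater sequence similarity implies closer relatedness can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in any published phylogeny: it assumes the sequences have already been aligned to the same length, it uses simple percent identity as the similarity measure, and the three short fragments and the names organism_A, organism_B, and organism_C are made up for the example.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences carry the same base.

    Assumes the sequences are already aligned and of equal length;
    positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gap positions
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Made-up 10-base fragments, for illustration only (not real rRNA sequences).
fragments = {
    "organism_A": "AUGGCUACGU",
    "organism_B": "AUGGCUACGA",
    "organism_C": "AUCGAUUCGA",
}

identity_ab = percent_identity(fragments["organism_A"], fragments["organism_B"])
identity_ac = percent_identity(fragments["organism_A"], fragments["organism_C"])
```

Under these assumptions, organism_A shares more positions with organism_B than with organism_C, so A and B would be treated as the more closely related pair. Real analyses work with much longer sequences and need an alignment step first.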

In the 1970s, American microbiologist Carl Woese discovered what appeared to be a “living record” of the evolution of organisms. He and his collaborator George Fox created a genetics-based tree of life based on similarities and differences they observed in the small subunit ribosomal RNA (rRNA) of different organisms. In the process, they discovered that a certain type of bacteria, called archaebacteria (now known simply as archaea), were significantly different from other bacteria and eukaryotes in terms of the sequence of small subunit rRNA. To accommodate this difference, they created a tree with three Domains above the level of Kingdom: Archaea, Bacteria, and Eukarya (Figure 4). Genetic analysis of the small subunit rRNA suggests archaea, bacteria, and eukaryotes all evolved from a common ancestral cell type. The tree is skewed to show a closer evolutionary relationship between Archaea and Eukarya than they have to Bacteria.

Exercise 3

  1. In modern taxonomy, how do scientists determine how closely two organisms are related?
  2. Explain why the branches on the “tree of life” all originate from a single “trunk.”

Naming Microbes

In developing his taxonomy, Linnaeus used a system of binomial nomenclature, a two-word naming system for identifying organisms by genus and species. For example, modern humans are in the genus Homo and have the species name sapiens, so their scientific name in binomial nomenclature is Homo sapiens. In binomial nomenclature, the genus part of the name is always capitalized; it is followed by the species name, which is not capitalized. Both names are italicized.

Taxonomic names in the 18th through 20th centuries were typically derived from Latin, since that was the common language used by scientists when taxonomic systems were first created. Today, newly discovered organisms can be given names derived from Latin, Greek, or English. Sometimes these names reflect some distinctive trait of the organism; in other cases, microorganisms are named after the scientists who discovered them. The archaeon Haloquadratum walsbyi is an example of both of these naming schemes. The genus, Haloquadratum, describes the microorganism’s saltwater habitat (halo is derived from the Greek word for “salt”) as well as the arrangement of its square cells, which are arranged in square clusters of four cells (quadratum is Latin for “foursquare”). The species, walsbyi, is named after Anthony Edward Walsby, the microbiologist who discovered Haloquadratum walsbyi in 1980. While it might seem easier to give an organism a common descriptive name—like a red-headed woodpecker—we can imagine how that could become problematic. What happens when another species of woodpecker with red head coloring is discovered? The systematic nomenclature scientists use eliminates this potential problem by assigning each organism a single, unique two-word name that is recognized by scientists all over the world.

In this text, we will typically abbreviate an organism’s genus and species after its first mention. The abbreviated form is simply the first initial of the genus, followed by a period and the full name of the species. For example, the bacterium Escherichia coli is shortened to E. coli in its abbreviated form. You will encounter this same convention in other scientific texts as well.
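The abbreviation rule described above is mechanical enough to write down as a short Python sketch. The function name abbreviate_binomial is a hypothetical helper introduced only for this illustration, and the sketch assumes the input is a plain two-word genus and species name with no strain or subspecies designation.

```python
def abbreviate_binomial(name: str) -> str:
    """Abbreviate a two-word scientific name: genus initial, period, full species name."""
    parts = name.split()
    if len(parts) != 2:
        raise ValueError("expected exactly two words: genus and species")
    genus, species = parts
    # The genus initial is capitalized; the species name is not.
    return f"{genus[0].upper()}. {species.lower()}"


short_name = abbreviate_binomial("Escherichia coli")
```

With the example from the text, short_name is "E. coli". A name with extra parts, such as a strain designation, is rejected by this sketch and not guessed at.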

Bergey’s Manuals

Whether in a tree or a web, microbes can be difficult to identify and classify. Without easily observable macroscopic features like feathers, feet, or fur, scientists must capture, grow, and devise ways to study their biochemical properties to differentiate and classify microbes. Despite these hurdles, a group of microbiologists created and updated a set of manuals for identifying and classifying microorganisms. First published in 1923 and since updated many times, Bergey’s Manual of Determinative Bacteriology and Bergey’s Manual of Systematic Bacteriology are the standard references for identifying and classifying different prokaryotes. (Appendix D of this textbook is partly based on Bergey’s manuals; it shows how the organisms that appear in this textbook are classified.) Because so many bacteria look identical, methods based on nonvisual characteristics must be used to identify them. For example, biochemical tests can be used to identify chemicals unique to certain species. Likewise, serological tests can be used to identify specific antibodies that will react against the proteins found in certain species. Ultimately, DNA and rRNA sequencing can be used both for identifying a particular bacterial species and for classifying newly discovered species.

  • What is binomial nomenclature and why is it a useful tool for naming organisms?
  • Explain why a resource like one of Bergey’s manuals would be helpful in identifying a microorganism in a sample.


Within one species of microorganism, there can be several subtypes called strains. While different strains may be nearly identical genetically, they can have very different attributes. The bacterium Escherichia coli is infamous for causing food poisoning and traveler’s diarrhea. However, there are actually many different strains of E. coli, and they vary in their ability to cause disease.

One pathogenic (disease-causing) E. coli strain that you may have heard of is E. coli O157:H7. In humans, infection from E. coli O157:H7 can cause abdominal cramps and diarrhea. Infection usually originates from contaminated water or food, particularly raw vegetables and undercooked meat. In the 1990s, there were several large outbreaks of E. coli O157:H7 thought to have originated in undercooked hamburgers.

While E. coli O157:H7 and some other strains have given E. coli a bad name, most E. coli strains do not cause disease. In fact, some can be helpful. Different strains of E. coli found naturally in our gut help us digest our food, provide us with some needed chemicals, and fight against pathogenic microbes.


  • Carolus Linnaeus developed a taxonomic system for categorizing organisms into related groups.
  • Binomial nomenclature assigns organisms Latinized scientific names with a genus and species designation.
  • A phylogenetic tree is a way of showing how different organisms are thought to be related to one another from an evolutionary standpoint.
  • The first phylogenetic tree contained kingdoms for plants and animals; Ernst Haeckel proposed adding a kingdom for protists.
  • Robert Whittaker’s tree contained five kingdoms: Animalia, Plantae, Protista, Fungi, and Monera.
  • Carl Woese used small subunit ribosomal RNA to create a phylogenetic tree that groups organisms into three domains based on their genetic similarity.
  • Bergey’s manuals of determinative and systematic bacteriology are the standard references for identifying and classifying bacteria, respectively.
  • Bacteria can be identified through biochemical tests, DNA/RNA analysis, and serological testing methods.


binomial nomenclature
a universal convention for the scientific naming of organisms using Latinized names for genus and species
eukaryote
an organism made up of one or more cells that contain a membrane-bound nucleus and organelles
phylogeny
the evolutionary history of a group of organisms
prokaryote
an organism whose cell structure does not include a membrane-bound nucleus
taxonomy
the classification, description, identification, and naming of living organisms


  • Nina Parker (Shenandoah University), Mark Schneegurt (Wichita State University), Anh-Hue Thi Tu (Georgia Southwestern State University), Philip Lister (Central New Mexico Community College), and Brian M. Forster (Saint Joseph’s University) with many contributing authors. Original content via OpenStax (CC BY 4.0; Access for free at

Biology in Time and Space: A Partial Differential Equation Modeling Approach

How do biological objects communicate, make structures, make measurements and decisions, search for food, i.e., do all the things necessary for survival? Designed for an advanced undergraduate audience, this book uses mathematics to begin to tell that story. It builds on a background in multivariable calculus, ordinary differential equations, and basic stochastic processes and uses partial differential equations as the framework within which to explore these questions.

An instructor's solutions manual for this title is available electronically to those instructors who have adopted the textbook for classroom use. Please send email to [email protected] for more information.

This book tells the story of living processes that change in time and space. Driven by scientific inquiry, methods from partial differential equations, stochastic processes, dynamical systems, and numerical methods are brought to bear on the subject, and their exposition seems effortless in the pursuit of deeper biological understanding. With subjects ranging from spruce budworm populations to calcium dynamics and from tiger bush patterns to collective behavior, this is a must-read for anyone who is serious about modern mathematical biology.

-- Mark Lewis, University of Alberta

Prof. Keener is one of the Great Minds in Math Biology who has trained generations of fine scientists and mathematicians over the years.

-- Leah Edelstein-Keshet, University of British Columbia

This is a fantastic book for those of us who teach mathematical modelling of spatiotemporal phenomena in biology, and for anyone who wishes to move into the field. It guides the reader on how one should tackle the art of modelling and, in a very systematic and natural way, introduces many of the necessary mathematical and computational approaches, seamlessly integrating them with the biology. It is a pleasure to read.

-- Philip Maini, University of Oxford

Mathematical Biology has few foundational texts. But this is one.

-- Michael C. Reed, Duke University


Undergraduate and graduate students and researchers interested in mathematical biology and PDEs.

A systematic approach to biology

At the turn of the millennium, and as the Human Genome Project approached its last phase, the pharmaceutical industry was buzzing with optimism about the innovative new medicines that would derive from the coming avalanche of human genomic and proteomic data. The evidence since then, however, has shown that this hype has not been justified; in fact, fewer novel pharmaceuticals are entering the clinic each year now than during the 1990s. The industry is increasingly coming to realise that, just as biochemical processes do not work in isolation, a drug aimed at a single target is likely to affect others in the same pathway, with unexpected and potentially damaging consequences.

Many companies, therefore, are turning to the new discipline of systems or network biology to solve this dilemma. Adriano Henney, director of AstraZeneca’s Pathways Capability at Alderley Park, UK, defines systems biology in the context of the pharmaceutical industry as ‘understanding how a target will respond in the context of cellular networks’. Yet the same industry, strapped for cash, is hardly in a position to invest in complete new facilities. Sensibly, many companies are choosing collaboration: working closely with both academia and software companies to produce tailor-made solutions. One company that has found this approach productive is Merrimack Pharmaceuticals, based in Cambridge, Massachusetts, USA. The scientists at Merrimack collaborate closely with Massachusetts-based software company The MathWorks, and have been impressed with the performance of their novel systems biology platform, SimBiology.

Merrimack is a biopharmaceutical company, founded in 2000, that is developing a portfolio of biotherapeutics for the treatment of autoimmune disease and cancer. Its first product, MM-093, a recombinant version of human alpha-fetoprotein, has recently concluded a Phase 2 clinical trial for rheumatoid arthritis and a pilot study in psoriasis. It is now concentrating more on the oncology pipeline, developing monoclonal and bi-specific antibodies that will bind to, and so ‘turn off’ the growth factors that drive the growth of tumours. The Network Biology team at Merrimack uses the principles of chemical engineering, computational modelling and high-throughput biology to identify which of the many signalling proteins involved in a pathway is likely to be the best target for drug development, and what types of intervention will be the most effective. ‘The models and our Network Biology approach are used to better understand complex pathways in disease, to design targeted therapeutics, to predict synergistic drug combinations, and to identify patients most likely to respond to targeted therapies,’ says Birgit Schoeberl, Merrimack’s director of Network Biology.

Merrimack’s collaboration with The MathWorks goes back to the company’s beginnings. Initially Merrimack just used the latter’s complex Matlab environment to design and develop models of biochemical pathways. It has now been using SimBiology for about two years, initially as beta-testers. This product is essentially a graphical user interface (GUI) that sits on top of Matlab. Schoeberl explains the rationale behind Merrimack’s choice and use of this product: ‘SimBiology is a GUI that allows us to easily build and share mathematical models based on biochemical reaction schemes between modellers and experimentalists without losing any of the flexibility of writing our own code where we need to. Using SimBiology, it is very easy to change one or more input conditions and see how it affects the simulation results,’ adds Schoeberl. ‘The ease of use and the flexibility of basic Matlab in combination with all the toolboxes is why we implemented it. It makes it much easier for theoretical and experimental biologists to collaborate.’

Schoeberl and her colleagues examined a number of other modelling tools and GUIs, including open-source solutions, before settling on SimBiology. She is sure that one of the main factors influencing their choice was the fact that they were already familiar with Matlab for other applications. ‘In choosing SimBiology, we found a solution that enabled us to bolt the GUI seamlessly on to a comprehensive platform that we already knew well, and that includes bioinformatics and statistics tools as well as simulation software.’ They are also finding the facilities for searching for reactions and chemical species within models particularly useful, as some of the networks they work with can contain as many as 500 initial conditions and 200 kinetic parameters. They have found few disadvantages with the software. ‘SimBiology is still a little slow solving large systems of nonlinear ODE, and we would prefer it to be faster especially for sensitivity analysis or parameter estimation,’ admits Schoeberl.

The close working relationship between Merrimack and The MathWorks has also allowed Schoeberl and her colleagues to suggest novel functionality for SimBiology, and to see changes implemented promptly. One recent example has been the incorporation of parameter estimation and sensitivity analysis, techniques that Merrimack’s modellers have developed in-house, into the SimBiology platform. Schoeberl explains: ‘If you are modelling a complex protein interaction network you will want to know which parameters and conditions the output will be most sensitive to, as these are likely to be the most rational ones to adjust. Sensitivity analysis is a method of identifying these.’ As this issue went to press, Merrimack and The MathWorks were due to present a poster at the seventh annual international conference on systems biology (ICSB), held in Yokohama, Japan. The work presented there uses the example of the ErbB receptor network, which is important in cancer development, to explore differences between global and local sensitivity analysis.

Sensitivity analysis uses all the available experimental data about the pathway concerned and the parameters predicted to be most sensitive are then modified in order to fit the model better to that data. At present, the most important limitation of this method is that much of the data, particularly the kinetic parameters that describe the speed of reactions, is still not known.
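To make the idea of local sensitivity analysis concrete, here is a minimal Python sketch. It is not SimBiology's or Merrimack's implementation, and it is far simpler than the ErbB network mentioned above. It assumes a toy model in which one species is produced at a constant rate k_prod and degraded at rate k_deg, and it estimates normalized sensitivities by central finite differences. The model, the parameter names and values, and the function names are all assumptions made for the example.

```python
import math


def output(params, t=10.0):
    """Toy model: amount at time t of a species produced at rate k_prod and
    degraded at rate k_deg, starting from zero (closed-form solution of
    dy/dt = k_prod - k_deg * y)."""
    k_prod, k_deg = params["k_prod"], params["k_deg"]
    return k_prod / k_deg * (1.0 - math.exp(-k_deg * t))


def local_sensitivities(model, params, rel_step=1e-4):
    """Normalized local sensitivity (p / y) * dy/dp for each parameter p,
    estimated with a central finite difference around the given values."""
    base = model(params)
    result = {}
    for name, value in params.items():
        up = dict(params, **{name: value * (1 + rel_step)})
        down = dict(params, **{name: value * (1 - rel_step)})
        derivative = (model(up) - model(down)) / (2 * rel_step * value)
        result[name] = derivative * value / base
    return result


# Hypothetical parameter values, chosen only for illustration.
params = {"k_prod": 2.0, "k_deg": 0.5}
sens = local_sensitivities(output, params)

# Rank parameters by how strongly the output responds to them.
ranked = sorted(sens, key=lambda name: abs(sens[name]), reverse=True)
```

A normalized sensitivity near 1 means a 1% change in the parameter moves the output by about 1%, and a negative value means the output falls as the parameter rises. A global analysis, by contrast, would vary all parameters across their plausible ranges and not just around a single set of values.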

A consortium of German academic groups is taking a similar approach to Merrimack in teaming up with a software company, GeneData from Basel, Switzerland, to simulate a cellular system of clinical importance. Jens Timmer from the Physics Institute in Freiburg, Germany, is the coordinator of the HepatoSys project, which was set up four or five years ago to model hepatocytes (liver cells). The consortium includes experimental and theoretical biologists and clinicians organised into local networks, some of which investigate specific biochemical processes while others develop generic methods. The German government recently agreed that HepatoSys’ first funding period had delivered ‘proof of principle’ and doubled its funding to 24 million Euros over three years.

HepatoSys scientists’ collaboration with GeneData came about because of the need to store quantities of data and models in standard formats and share them between all the groups in a disparate consortium. ‘When we set up the consortium we found that although all the groups were happy to share their data, it was stored in different formats and often poorly documented, so collaboration was very difficult in practice. Now we all use a single data format via an Oracle database built by GeneData. We hope that it will become a “gold standard” throughout Germany and beyond for the storage of data for systems biology, in parallel with the widely used Systems Biology Markup Language (SBML) standard for models,’ says Timmer. The choice of GeneData as a partner was largely based on that company’s years of experience in database design for the pharmaceutical industry. The database was written to a 100-page requirement specification produced by Timmer and his collaborators, who are also using GeneData’s Phylosopher software to integrate different types of molecular data and use it to reconstruct metabolic networks.

This collaboration has advantages for the industrial partner as well as the academic consortium. ‘The science funded through the HepatoSys project is cutting-edge, and it is very rewarding to work with them,’ says GeneData’s Hans-Peter Fischer. ‘There are also many technological challenges involved in setting up a large and complex database to store both experimental and simulated data.’ Fischer and colleagues are hoping that there will be further opportunities to market the database they have developed to the pharmaceutical and biotechnology industries when these industries are ready to put more investment into systems biology.

Nat Goodman from the Institute of Systems Biology in Seattle, Washington, USA agrees that The MathWorks and GeneData produce software that is useful, reliable and mathematically sound. However, he prefers to stress the limitations currently imposed on the modelling community by the dearth of accurate kinetic data. ‘So far, the most impressive network and pathway models have come from simple, isolated and well-studied systems where very detailed data is available, such as the effect of a single-gene “switch”: not so much “low-hanging fruit” as fruit that is already falling off the trees,’ he says. He believes that accurate simulations of more complex, interesting and clinically relevant networks will need data gathering ‘on a scale that most biologists cannot even imagine’. However, the first of Merrimack’s oncology products developed using network modelling are now in research and moving towards the clinic. A successful launch of one of these would prove the principle that even the imperfect models in current use are of significant value in modelling the complex process that is cancer development.


Collection of FDA-approved anticancer drugs and their relation information

We collected anticancer drugs approved by the FDA from 1949 through the end of 2014 from multiple data sources. We started with anticancer drug-focused websites, including National Cancer Institute (NCI) drug information [15], the MediLexicon cancer drug list [16], and NavigatingCancer [17]. Then, we employed MedEx-UIMA, a new natural language processing system, to retrieve the generic names for these drugs [18]. Using the generic names, we searched [email protected] [19] and downloaded their FDA labels. For those that could not be found in [email protected], we obtained their labels from Dailymed [20] or DrugBank [21]. From each drug label, we manually retrieved the initial approval year, drug action mechanism, drug target, delivery method, and indication. We further checked multiple sources, such as MyCancerGenome [22], DrugBank, and several publications [4, 23], to obtain the drug targets. For drug category, we manually checked ChemoCare [24] to classify the drugs as cytotoxic or targeted agents. Our curated drug list does not include medicines used to treat drug side effects, cancer pain, or other conditions, or used for cancer prevention.

Classes of drug targets and cancer

For these targeted agents, we collected their targets from FDA drug labels, DrugBank, and MyCancerGenome. We then manually curated the primary effect-mediating targets for each drug. We further retrieved the gene annotation from Ingenuity Pathway Analysis (IPA) [25] to obtain their subcellular location and family classes. For the indication, we first collected the detailed information from FDA drug labels and then manually classified it into higher-level classes for the purpose of data analysis. For example, according to its FDA label, the drug idelalisib can be used to treat relapsed chronic lymphocytic leukemia (CLL), relapsed follicular B-cell non-Hodgkin lymphoma (FL), and relapsed small lymphocytic lymphoma (SLL). In our data analysis, we recorded the drug’s therapeutic classes as leukemia and lymphoma.

Cancer genes and somatic mutations of the cancer genome

The cancer gene set contains 594 genes from the Cancer Gene Census, which have been implicated in tumorigenesis by experimental evidence in the literature (July 14, 2016) [26]. We obtained 50 oncogenes (OCGs) and 50 tumor suppressor genes (TSGs) with high confidence from Davoli et al. [27]. The somatic mutations were obtained from Supplementary Table 2 in one previous work [28]. The table contains the somatic mutations in 3268 patients across 12 types of cancer. They are bladder urothelial carcinoma (BLCA), breast adenocarcinoma (BRCA), colon and rectal adenocarcinoma (COAD/READ), glioblastoma (GBM), head and neck squamous cell carcinoma (HNSC), kidney renal clear cell carcinoma (KIRC), acute myeloid leukemia (LAML), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), ovarian cancer (OV), and uterine corpus endometrioid carcinoma (UCEC). The mutations include missense, silent, nonsense, splice site, readthrough, frameshift indels (insertions/deletions) and inframe indels [28].

Network analysis

We built two networks based on our curated data: a drug-cancer network and a drug-cancer-target network. In the drug-cancer network, there are two types of nodes, representing drugs and cancer types, and an edge indicates that the drug is an approved treatment for the cancer. In the drug-cancer-target network, there are three types of nodes, representing cancer types, drugs, and drug targets, and edges indicate cancer-drug associations or drug-target interactions. The network degree, i.e., the number of edges of each node in the network, is used to assess the topological feature of each cancer type and drug.
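As an illustration of how node degree can be computed in a drug-cancer network, the following Python sketch counts the edges touching each node. It is a minimal sketch, not the authors' code. The idelalisib edges follow the indication example given earlier in the text, while drug_X is a hypothetical placeholder and not a curated entry.

```python
from collections import Counter

# Each edge links a drug to a cancer type it is approved to treat.
# The idelalisib edges follow the example in the text; "drug_X" is a
# hypothetical placeholder, not a real curated entry.
edges = [
    ("idelalisib", "leukemia"),
    ("idelalisib", "lymphoma"),
    ("drug_X", "lymphoma"),
]


def node_degrees(edge_list):
    """Return the number of edges touching each node.

    Nodes are keyed as (node_type, name) so that a drug and a cancer type
    can never be confused even if they happened to share a name.
    """
    degree = Counter()
    for drug, cancer in edge_list:
        degree[("drug", drug)] += 1
        degree[("cancer", cancer)] += 1
    return degree


degrees = node_degrees(edges)
```

In this toy network, idelalisib and lymphoma each have degree 2, while drug_X and leukemia each have degree 1. The same counting extends to the drug-cancer-target network by adding a third node type for targets.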

Common target-based approach

We used a common target-based approach to discover novel drug-cancer associations [29]. It is one of the “guilt-by-association” strategies and rests on whether two drugs share common targets. If two drugs A and B have a common target, and drug A is currently used to treat cancer type C while drug B is used for cancer type D, then drug A is considered likely to be effective against cancer type D and drug B against cancer type C.
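The inference rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the authors' implementation. All drug, target, and cancer names are hypothetical placeholders, any single shared target is treated as sufficient, and the function name predict_associations is introduced only for this example.

```python
from itertools import combinations


def predict_associations(drug_targets, drug_cancers):
    """Propose new (drug, cancer) pairs from shared targets.

    For every pair of drugs with at least one target in common, each drug is
    proposed for the cancers the other drug already treats, excluding
    indications it already has.
    """
    candidates = set()
    for a, b in combinations(sorted(drug_targets), 2):
        if not (drug_targets[a] & drug_targets[b]):
            continue  # no shared target, so no inference is made
        cancers_a = drug_cancers.get(a, set())
        cancers_b = drug_cancers.get(b, set())
        candidates.update((a, cancer) for cancer in cancers_b - cancers_a)
        candidates.update((b, cancer) for cancer in cancers_a - cancers_b)
    return candidates


# Hypothetical placeholder data, for illustration only.
drug_targets = {
    "drug_A": {"target_1", "target_2"},
    "drug_B": {"target_2"},
    "drug_E": {"target_3"},
}
drug_cancers = {
    "drug_A": {"cancer_C"},
    "drug_B": {"cancer_D"},
    "drug_E": {"cancer_C"},
}

predicted = predict_associations(drug_targets, drug_cancers)
```

Here drug_A and drug_B share target_2, so the sketch proposes drug_A for cancer_D and drug_B for cancer_C. drug_E shares no target with either and yields no prediction. Such pairs are hypotheses to be tested, not established indications.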


We have presented a systematic benchmark evaluation comprehensively comparing 18 scRNA-seq imputation methods. Our comparison is subject to several limitations. Firstly, the imputation methods were mostly compared with default parameters which may not achieve optimal performance across all datasets. Our work could be further improved with the use of methods such as molecular cross-validation (MCV) [56]. In addition, we used 72 h as the time limit for convergence for imputation methods, which does not guarantee algorithmic convergence for some methods. In our evaluation of imputation methods on inferring pseudotime with trajectory analysis methods, the cell types of the tissue HCA_10x_tissue cells were computationally annotated. Another limitation is that there are no disease tissues included in this study. In the future, it is worthwhile to continue investigating how conclusions presented in this study may translate to applications in a diseased setting such as cancer tissues. One challenge is that the expression of some genes in diseased cells might be abnormal [57–60], which could lead to the false identification of similar cells and could affect the imputation performance.

An open problem not investigated in our current study is the impact of imputation methods on the RNA velocity analysis [61–63]. Since RNA velocity is estimated by analyzing unspliced and spliced mRNA, it takes into account both spliced and unspliced counts. Existing imputation methods deal with the drop-out events by imputing the gene expression values rather than the original reads, and the gene expression is usually quantified on exons only which may not distinguish contributions from the spliced versus unspliced transcripts. Therefore, whether existing imputation methods can also be applied to velocity analysis and to separately impute spliced and unspliced transcripts (including introns) remains an open problem that requires extensive future investigation which is beyond the scope of the present study. In addition to the velocity analysis, evaluating how imputation methods may impact other emerging analyses such as spatial transcriptomics [64–68] also warrants future investigation.


The original methods used in cladistic analysis and the school of taxonomy derived from the work of the German entomologist Willi Hennig, who referred to it as phylogenetic systematics (also the title of his 1966 book); the terms "cladistics" and "clade" were popularized by other researchers. Cladistics in the original sense refers to a particular set of methods used in phylogenetic analysis, although it is now sometimes used to refer to the whole field. [7]

What is now called the cladistic method appeared as early as 1901 with a work by Peter Chalmers Mitchell for birds [8] [9] and subsequently by Robert John Tillyard (for insects) in 1921, [10] and W. Zimmermann (for plants) in 1943. [11] The term "clade" was introduced in 1958 by Julian Huxley after having been coined by Lucien Cuénot in 1940, [12] "cladogenesis" in 1958, [13] "cladistic" by Arthur Cain and Harrison in 1960, [14] "cladist" (for an adherent of Hennig's school) by Ernst Mayr in 1965, [15] and "cladistics" in 1966. [13] Hennig referred to his own approach as "phylogenetic systematics". From the time of his original formulation until the end of the 1970s, cladistics competed as an analytical and philosophical approach to systematics with phenetics and so-called evolutionary taxonomy. Phenetics was championed at this time by the numerical taxonomists Peter Sneath and Robert Sokal, and evolutionary taxonomy by Ernst Mayr.

Originally conceived, if only in essence, by Willi Hennig in a book published in 1950, cladistics did not flourish until its translation into English in 1966 (Lewin 1997). Today, cladistics is the most popular method for inferring phylogenetic trees from morphological data.

In the 1990s, the development of effective polymerase chain reaction techniques allowed the application of cladistic methods to biochemical and molecular genetic traits of organisms, vastly expanding the amount of data available for phylogenetics. At the same time, cladistics rapidly became popular in evolutionary biology, because computers made it possible to process large quantities of data about organisms and their characteristics.

The cladistic method interprets each shared character state transformation as a potential piece of evidence for grouping. Synapomorphies (shared, derived character states) are viewed as evidence of grouping, while symplesiomorphies (shared ancestral character states) are not. The outcome of a cladistic analysis is a cladogram – a tree-shaped diagram (dendrogram) [16] that is interpreted to represent the best hypothesis of phylogenetic relationships. Although traditionally such cladograms were generated largely on the basis of morphological characters and originally calculated by hand, genetic sequencing data and computational phylogenetics are now commonly used in phylogenetic analyses, and the parsimony criterion has been abandoned by many phylogeneticists in favor of more "sophisticated" but less parsimonious evolutionary models of character state transformation. Cladists contend that these models are unjustified because there is no evidence that they recover more "true" or "correct" results from actual empirical data sets. [17]

Every cladogram is based on a particular dataset analyzed with a particular method. Datasets are tables consisting of molecular, morphological, ethological [18] and/or other characters and a list of operational taxonomic units (OTUs), which may be genes, individuals, populations, species, or larger taxa that are presumed to be monophyletic and therefore to form, all together, one large clade; phylogenetic analysis infers the branching pattern within that clade. Different datasets and different methods, not to mention violations of the mentioned assumptions, often result in different cladograms. Only scientific investigation can show which is more likely to be correct.

Until recently, for example, cladograms like the following have generally been accepted as accurate representations of the ancestral relations among turtles, lizards, crocodilians, and birds: [19]

If this phylogenetic hypothesis is correct, then the last common ancestor of turtles and birds, at the branch near the ▼ lived earlier than the last common ancestor of lizards and birds, near the ♦ . Most molecular evidence, however, produces cladograms more like this: [20]

If this is accurate, then the last common ancestor of turtles and birds lived later than the last common ancestor of lizards and birds. Since the cladograms show two mutually exclusive hypotheses to describe the evolutionary history, at most one of them is correct.

The cladogram to the right represents the current universally accepted hypothesis that all primates, including strepsirrhines like the lemurs and lorises, had a common ancestor all of whose descendants were primates, and so form a clade; the name Primates is therefore recognized for this clade. Within the primates, all anthropoids (monkeys, apes and humans) are hypothesized to have had a common ancestor all of whose descendants were anthropoids, so they form the clade called Anthropoidea. The "prosimians", on the other hand, form a paraphyletic taxon. The name Prosimii is not used in phylogenetic nomenclature, which names only clades; the "prosimians" are instead divided between the clades Strepsirhini and Haplorhini, where the latter contains Tarsiiformes and Anthropoidea.

The following terms, coined by Hennig, are used to identify shared or distinct character states among groups: [21] [22] [23]

  • A plesiomorphy ("close form") or ancestral state is a character state that a taxon has retained from its ancestors. When two or more taxa that are not nested within each other share a plesiomorphy, it is a symplesiomorphy (from syn-, "together"). Symplesiomorphies do not mean that the taxa that exhibit that character state are necessarily closely related. For example, Reptilia is traditionally characterized by (among other things) being cold-blooded (i.e., not maintaining a constant high body temperature), whereas birds are warm-blooded. Since cold-bloodedness is a plesiomorphy, inherited from the common ancestor of traditional reptiles and birds, and thus a symplesiomorphy of turtles, snakes and crocodiles (among others), it does not mean that turtles, snakes and crocodiles form a clade that excludes the birds.
  • An apomorphy ("separate form") or derived state is an innovation. It can thus be used to diagnose a clade – or even to help define a clade name in phylogenetic nomenclature. Features that are derived in individual taxa (a single species or a group that is represented by a single terminal in a given phylogenetic analysis) are called autapomorphies (from auto-, "self"). Autapomorphies express nothing about relationships among groups; clades are identified (or defined) by synapomorphies (from syn-, "together"). For example, the possession of digits that are homologous with those of Homo sapiens is a synapomorphy within the vertebrates. The tetrapods can be singled out as consisting of the first vertebrate with such digits homologous to those of Homo sapiens together with all descendants of this vertebrate (an apomorphy-based phylogenetic definition). [24] Importantly, snakes and other tetrapods that do not have digits are nonetheless tetrapods: other characters, such as amniotic eggs and diapsid skulls, indicate that they descended from ancestors that possessed digits which are homologous with ours.
  • A character state is homoplastic or "an instance of homoplasy" if it is shared by two or more organisms but is absent from their common ancestor or from a later ancestor in the lineage leading to one of the organisms. It is therefore inferred to have evolved by convergence or reversal. Both mammals and birds are able to maintain a high constant body temperature (i.e., they are warm-blooded). However, the accepted cladogram explaining their significant features indicates that their common ancestor is in a group lacking this character state, so the state must have evolved independently in the two clades. Warm-bloodedness is separately a synapomorphy of mammals (or a larger clade) and of birds (or a larger clade), but it is not a synapomorphy of any group including both these clades. Hennig's Auxiliary Principle [25] states that shared character states should be considered evidence of grouping unless they are contradicted by the weight of other evidence; thus, homoplasy of some feature among members of a group may only be inferred after a phylogenetic hypothesis for that group has been established.

The terms plesiomorphy and apomorphy are relative; their application depends on the position of a group within a tree. For example, when trying to decide whether the tetrapods form a clade, an important question is whether having four limbs is a synapomorphy of the earliest taxa to be included within Tetrapoda: did all the earliest members of the Tetrapoda inherit four limbs from a common ancestor, whereas all other vertebrates did not, or at least not homologously? By contrast, for a group within the tetrapods, such as birds, having four limbs is a plesiomorphy. Using these two terms allows a greater precision in the discussion of homology, in particular allowing clear expression of the hierarchical relationships among different homologous features.

It can be difficult to decide whether a character state is in fact the same and thus can be classified as a synapomorphy, which may identify a monophyletic group, or whether it only appears to be the same and is thus a homoplasy, which cannot identify such a group. There is a danger of circular reasoning: assumptions about the shape of a phylogenetic tree are used to justify decisions about character states, which are then used as evidence for the shape of the tree. [26] Phylogenetics uses various forms of parsimony to decide such questions; the conclusions reached often depend on the dataset and the methods. Such is the nature of empirical science, and for this reason, most cladists refer to their cladograms as hypotheses of relationship. Cladograms that are supported by a large number and variety of different kinds of characters are viewed as more robust than those based on more limited evidence. [27]
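One common form of parsimony counts the minimum number of character state changes a tree requires. The sketch below uses Fitch's algorithm for a single binary character on two toy trees written as nested tuples. The tree shapes, taxon names and the function name `fitch` are illustrative assumptions rather than a published analysis, and a real study would sum the counts over many characters.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, minimum changes) for a binary tree.

    A tree is either a taxon name (leaf) or a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, states)
    right_states, right_changes = fitch(right, states)
    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # The children disagree: one additional change on this branch pair.
    return left_states | right_states, left_changes + right_changes + 1


# Character: warm-bloodedness (1 = present, 0 = absent).
warm_blooded = {"mammal": 1, "lizard": 0, "crocodile": 0, "bird": 1}

# Toy topology in which birds group with crocodiles.
tree_a = ("mammal", ("lizard", ("crocodile", "bird")))
# Toy topology that groups the two warm-blooded taxa together.
tree_b = (("mammal", "bird"), ("lizard", "crocodile"))

changes_a = fitch(tree_a, warm_blooded)[1]
changes_b = fitch(tree_b, warm_blooded)[1]
print(changes_a, changes_b)
```

Taken alone, this character needs fewer changes on the second tree, which shows how a homoplastic character can point to the wrong grouping and why conclusions depend on the full dataset.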

Mono-, para- and polyphyletic taxa can be understood based on the shape of the tree (as done above), as well as based on their character states. [22] [23] [28] These are compared in the table below.

| Term | Node-based definition | Character-based definition |
| --- | --- | --- |
| Monophyly | A clade, a monophyletic taxon, is a taxon that includes all descendants of an inferred ancestor. | A clade is characterized by one or more apomorphies: derived character states present in the first member of the taxon, inherited by its descendants (unless secondarily lost), and not inherited by any other taxa. |
| Paraphyly | A paraphyletic assemblage is one that is constructed by taking a clade and removing one or more smaller clades. [29] (Removing one clade produces a singly paraphyletic assemblage, removing two produces a doubly paraphyletic assemblage, and so on.) [30] | A paraphyletic assemblage is characterized by one or more plesiomorphies: character states inherited from ancestors but not present in all of their descendants. As a consequence, a paraphyletic assemblage is truncated, in that it excludes one or more clades from an otherwise monophyletic taxon. An alternative name is evolutionary grade, referring to an ancestral character state within the group. While paraphyletic assemblages are popular among paleontologists and evolutionary taxonomists, cladists do not recognize paraphyletic assemblages as having any formal information content – they are merely parts of clades. |
| Polyphyly | A polyphyletic assemblage is one which is neither monophyletic nor paraphyletic. | A polyphyletic assemblage is characterized by one or more homoplasies: character states which have converged or reverted so as to be the same but which have not been inherited from a common ancestor. No systematist recognizes polyphyletic assemblages as taxonomically meaningful entities, although ecologists sometimes consider them meaningful labels for functional participants in ecological communities (e.g., primary producers, detritivores, etc.). |
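The node-based definition of monophyly can be checked mechanically on a given tree: a group is monophyletic exactly when it equals the full set of terminals below some node. The sketch below applies this to a simplified primate tree in nested-tuple form. The tree encoding and the helper names `clades` and `is_monophyletic` are assumptions made for illustration.

```python
def clades(tree):
    """Return (leaves of this tree, list of the leaf sets of all its clades)."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the clade rooted at this node
    return leaves, found


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of terminals below one node."""
    return frozenset(group) in clades(tree)[1]


# Simplified tree: (Strepsirrhini, Haplorhini), with terminals as plain strings.
primates = (("lemurs", "lorises"), ("tarsiers", "anthropoids"))

haplorhini = {"tarsiers", "anthropoids"}
prosimians = {"lemurs", "lorises", "tarsiers"}

print(is_monophyletic(primates, haplorhini))
print(is_monophyletic(primates, prosimians))
```

On this tree the haplorhines pass the test, while the "prosimians" fail it because the anthropoids, which descend from the same ancestor, are left out.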

Cladistics, either generally or in specific applications, has been criticized from its beginnings. Decisions as to whether particular character states are homologous, a precondition of their being synapomorphies, have been challenged as involving circular reasoning and subjective judgements. [31] Of course, the potential unreliability of evidence is a problem for any systematic method, or for that matter, for any empirical scientific endeavor at all. [32]

Transformed cladistics arose in the late 1970s [33] in an attempt to resolve some of these problems by removing a priori assumptions about phylogeny from cladistic analysis, but it has remained unpopular. [34]

The cladistic method does not identify fossil species as actual ancestors of a clade. [35] Instead, fossil taxa are identified as belonging to separate extinct branches. While a fossil species could be the actual ancestor of a clade, there is no way to know that. Therefore, a more conservative hypothesis is that the fossil taxon is related to other fossil and extant taxa, as implied by the pattern of shared apomorphic features. [36]

The comparisons used to acquire data on which cladograms can be based are not limited to the field of biology. [37] Any group of individuals or classes that are hypothesized to have a common ancestor, and to which a set of common characteristics may or may not apply, can be compared pairwise. Cladograms can be used to depict the hypothetical descent relationships within groups of items in many different academic realms. The only requirement is that the items have characteristics that can be identified and measured.

Anthropology and archaeology: [38] Cladistic methods have been used to reconstruct the development of cultures or artifacts using groups of cultural traits or artifact features.

Comparative mythology and folktale use cladistic methods to reconstruct the protoversion of many myths. Mythological phylogenies constructed with mythemes clearly support low horizontal transmissions (borrowings), historical (sometimes Palaeolithic) diffusions and punctuated evolution. [39] They also are a powerful way to test hypotheses about cross-cultural relationships among folktales. [40] [41]

Literature: Cladistic methods have been used in the classification of the surviving manuscripts of the Canterbury Tales, [42] and the manuscripts of the Sanskrit Charaka Samhita. [43]

Historical linguistics: [44] Cladistic methods have been used to reconstruct the phylogeny of languages using linguistic features. This is similar to the traditional comparative method of historical linguistics, but is more explicit in its use of parsimony and allows much faster analysis of large datasets (computational phylogenetics).

Textual criticism or stemmatics: [43] [45] Cladistic methods have been used to reconstruct the phylogeny of manuscripts of the same work (and reconstruct the lost original) using distinctive copying errors as apomorphies. This differs from traditional historical-comparative linguistics in enabling the editor to evaluate and place in genetic relationship large groups of manuscripts with large numbers of variants that would be impossible to handle manually. It also enables parsimony analysis of contaminated traditions of transmission that would be impossible to evaluate manually in a reasonable period of time.

Astrophysics [46] infers the history of relationships between galaxies to create branching diagram hypotheses of galaxy diversification.

Systematics: Meaning, Branches and Its Application

The term systematics is derived from the Latinised Greek word ‘systema’, meaning ‘together’. Systematics partly overlaps with taxonomy and was originally used to describe the system of classification prescribed by early biologists. Linnaeus applied the word “Systematics” to the system of classification in his famous book ‘Systema Naturae’, published in 1735.

Blackwelder and Boyden (1952) gave the definition that “systematics is the entire field dealing with the kinds of animals, their distinction, classification and evolution”. G. G. Simpson (1961) considers that “Systematics is the scientific study of the kinds and diversity of organisms and of any and all relationships among them”.

The simpler definition by Ernst Mayr (1969), and Mayr and Ashlock (1991), is “Systematics is the science of the diversity of organisms”. Christoffersen (1995) defined systematics as “the theory, principles and practice of identifying (discovering) systems, i.e., of ordering the diversity of organisms (parts) into more general systems of taxa according to the most general causal processes”.

Systematics includes both taxonomy and evolution. Taxonomy includes classification and nomenclature but relies heavily on systematics for its concepts. The study of systematics therefore covers a much broader field that includes not only morphology and anatomy but also genetics, molecular biology, behaviour and evolutionary biology.

Recent approaches in biology have added a new dimension to the science of classification, and the new systematics has emerged as a synthesis of progress in all the major disciplines of biology.

Branches of Systematics:

The new systematics may be divided into the following branches:

1. Numerical systematics:

This type of systematics uses biostatistical methods for the identification and classification of animals. This branch is also called biometry.

2. Biochemical systematics:

This branch of systematics deals with the classification of animals on the basis of biochemical analysis of protoplasm.

3. Experimental systematics:

This branch of systematics deals with the identification of the various evolutionary units within a species and their role in the process of evolution. Here mutation is considered the evolutionary unit.

Application of Systematics in Biology:

1. Systematics is the study of the diversity of organisms, past and present, and of the relationships among living things. Relationships are established by constructing cladograms, phylogenetic trees and phylogenies. A phylogeny is the evolutionary history of an animal or plant, or of a taxonomic group.

Phylogenies include two parts: the first shows the group relationships and the second indicates the amount of evolution. Phylogenetic trees of species and higher taxa are established from morphological, physiological and molecular characteristics, and the distribution of animals and their ancestors is related to geography. In this way systematics is used to understand the evolutionary history of organisms.

2. The field of systematics provides the scientific names of organisms, descriptions of species, the ordering of organisms into higher taxa, the classification of organisms and their evolutionary histories.

3. Systematics is also important for conservation because it attempts to explain biodiversity, that is, the variety of species, and so can be used to preserve and protect endangered animals and plants.

The loss of biodiversity is extremely harmful to the existence of mankind. The unchecked human population destroys many kinds of plants and animals for food and for other reasons.

4. The destruction or suppression of harmful pests or animals by the introduction and increase of their natural enemies is called biological control.

The natural enemies of pests are often introduced for biological control to the advantage of agriculture and forestry. These natural enemies include insectivorous spiders, centipedes, some insects, frogs and birds, which are much more economical than chemical control because they have no injurious side effects.

Predaceous insects play a vital role in the natural control of injurious insects. The adult and larval stages of ladybird beetles (Coccinellidae) are economically very important predators and are responsible for the destruction of colonies of plant lice, scale insects, mealybugs and whiteflies, which are serious pests in various parts of the world.

Some chrysopids are also predatory enemies of mealybugs and plant lice. An egg parasite, Trichogramma sp., is used in India to control sugarcane borers and cotton bollworms.

In all cases the proper identification of parasites and their hosts is necessary for the control of pests. Systematists are therefore involved in implementing biological control programmes for pests and diseases as effectively as possible.

5. Many insects act as vectors of human diseases. For example, some species of Anopheles are vectors of malaria, Aedes aegypti spreads the dengue fever virus and Phlebotomus argentipes spreads the pathogen of kala-azar.

So taxonomists play a vital role in identifying vector species, and vector control programmes should be planned so that the target species is the one attacked.

Classification of Enzymes | Biochemistry

This article discusses the classification of enzymes.

Some enzymes are often designated by common names based on usage (pepsin, trypsin, chymotrypsin, papain, etc.), but these names contain no information on the substrate or the reaction catalyzed. In certain cases, enzymes catalyzing hydrolysis reactions are designated by the name of the substrate followed by the suffix “ase” (peptidase, phosphatase, arginase, etc.).

A slightly more precise denomination uses the name of the substrate and then that of the reaction catalyzed, with the suffix “ase”, for example, malate dehydrogenase.

In 1961, the Commission on Enzymes of the International Union of Biochemistry established a systematic and much more rigorous classification and nomenclature comprising 6 classes divided into sub-classes; the latter are themselves divided into numbered sub-sub-classes.

In this new denomination, malate dehydrogenase is called malate-NAD-oxidoreductase; this name reflects not only the type of reaction catalyzed, but also the name of the substrate and that of the hydrogen acceptor.
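The hierarchy of class, sub-class and sub-sub-class lends itself to a dotted numeric code. The sketch below splits such a code into its levels and looks up the class name. It assumes the usual four-part form (class, sub-class, sub-sub-class, serial number) and the six class names of the original scheme, of which the sixth (ligases) does not appear in the examples in this article. The function name `parse_ec` and the sample code are illustrative only.

```python
# The six classes of the 1961 scheme (assumed mapping; see lead-in).
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}


def parse_ec(code):
    """Split a dotted enzyme code such as '2.7.1.1' into its hierarchy levels."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    enzyme_class, sub_class, sub_sub_class, serial = (int(p) for p in parts)
    if enzyme_class not in EC_CLASSES:
        raise ValueError(f"unknown enzyme class {enzyme_class}")
    return {
        "class": EC_CLASSES[enzyme_class],
        "sub_class": f"{enzyme_class}.{sub_class}",
        "sub_sub_class": f"{enzyme_class}.{sub_class}.{sub_sub_class}",
        "serial": serial,
    }


# The code below is used purely as a format example.
info = parse_ec("2.7.1.1")
print(info["class"], info["sub_class"], info["sub_sub_class"])
```

Reading the code from left to right moves from the general reaction type to the specific enzyme, which is the same order the listing that follows uses for its headings.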

The following are a few examples illustrating this classification:

1. Oxidoreductases

This class comprises the enzymes which were earlier called dehydrogenases, oxidases, peroxidases, hydroxylases, oxygenases, etc.

1.1.1 With NAD+ or NADP+ as hydrogen acceptor

Ex. L-malate: NAD-oxidoreductase (see fig. 4-38).

L-lactate: NAD-oxidoreductase (see fig. 4-30).

1.1.2 With a cytochrome as acceptor

Ex. L-lactate: ferricytochrome c-oxidoreductase

1.1.3 With O2 as hydrogen acceptor

Ex. glucose oxidase or β-D-glucose: oxygen-oxidoreductase

1.2.1 With NAD+ or NADP+ as acceptor

Ex. D-glyceraldehyde-3-phosphate: NAD oxidoreductase (see fig. 4-27).

Ex. xanthine: oxygen oxidoreductase

1.2.4 With lipoic acid as acceptor.

1.3.1 With NAD+ or NADP+ as acceptor

Ex. 4,5-dihydrouracil: NAD oxidoreductase

1.3.2 With a cytochrome as acceptor.

Ex. 4,5-dihydro-orotate: oxygen oxidoreductase (see fig. 6-22).

1.4.1 With NAD+ or NADP+ as acceptor

Ex. L-glutamate: NAD oxidoreductase (see fig. 7-3), etc.

2. Transferases

2.1 Transferring a monocarbon group (C1):

Ex. S-adenosyl-methionine: L-homocysteine S-methyl transferase

2.1.2 Hydroxymethyl transferases and formyl transferases

Ex. L-serine: tetrahydrofolate 5,10 hydroxymethyl transferase (see fig. 7-9).

2.1.3 Carboxyl transferases and carbamoyl transferases

Ex. carbamylphosphate: L-aspartate carbamyl transferase

Ex. UDPG-glucose: D-fructose glucosyl transferase

2.4.2 Pentosyl transferases

Ex. Uridine: orthophosphate ribosyltransferase

2.6 Transferring nitrogen groups:

Ex. L-aspartate: ketoglutarate amino transferase

2.7 Phosphoryl transferases:

2.7.1 With an alcohol group as acceptor

Ex. ATP: D-hexose-6-phosphotransferase (see fig. 4-20).

2.7.2 With a carboxylic group as acceptor

Ex. ATP: 3-phosphoglycerate 1-phosphotransferase (see fig. 4-28).

2.7.3 With a nitrogen group as acceptor

Ex. ATP: creatine phosphotransferase (see fig. 7-13), etc.

3. Hydrolases

3.1 Splitting the ester bonds:

Ex. lipase or glycerol-ester hydrolase

Ex. alkaline phosphatase (see fig. 6-13).

Ex. ribonucleases, deoxyribonucleases (see figs. 6-10 to 6-12); deoxyribonucleate 3′ nucleotido hydrolase

Ex. β-glucosidase or β-D-glucoside glucohydrolase

3.4 Splitting peptide bonds

3.4.1 α-aminopeptido-amino acid hydrolases

Ex. aminopeptidase or aminoacyl-peptide hydrolase

3.4.2 α-carboxypeptido-amino acid hydrolases

Ex. carboxypeptidase A or peptidyl-L-amino acid hydrolase

3.4.4 Peptido-peptide hydrolases (endopeptidases).

4. Lyases

These enzymes catalyze the removal of a group by a process other than hydrolysis (often with formation of a double bond) or, on the contrary, the addition of a group.

4.1.1 Carboxy-lyases (Carboxylases or Decarboxylases)

Ex. aspartate decarboxylase or L-aspartate 4-carboxy-lyase

Ex. fructose-bisphosphate aldolase or fructose-1,6-bisphosphate: D-glyceraldehyde-3-phosphate lyase (see fig. 4-26).

Ex. L-aspartate-ammonium lyase (see fig. 7-5), etc.

5. Isomerases

5.1 Racemases and epimerases

5.1.1 Acting on amino acids

Ex. D-ribulose-5-phosphate-3-epimerase (see fig. 4-40).

Ex. 4-maleyl-acetoacetate cis-trans isomerase (see fig. 7-24).

5.3 Intramolecular oxidoreductases

5.3.1 Catalyzing the interconversion aldose-ketose.

Ex. D-glyceraldehyde-3-phosphate keto-isomerase or triosephosphate isomerase (see fig. 4-26).

5.4 Intramolecular transferases

Ex. L-methylmalonyl-CoA-CoA-carbonyl mutase (see fig. 5-13), etc.


WGS data are also a rich source for chloroplast assemblies. For nearly half of the analyzed data without available chloroplast genome, we could generate complete assemblies using at least one of the tools.

Still, even with simulated (i.e., “perfect”) data, not all tools succeeded in generating complete chloroplast assemblies. Therefore, we determined the strengths and weaknesses of the specific tools and have provided guidelines for users. It might however be necessary to combine different methods or manually explore the parameter space. Ultimately, large-scale studies reconstructing hundreds or thousands of chloroplast genomes are now feasible using the currently available tools.

The biology of human overfeeding: A systematic review

George A. Bray, MD and Claude Bouchard, PhD, Pennington Biomedical Research Center, Louisiana State University System, Baton Rouge, LA, USA.

Pennington Biomedical Research Center, Louisiana State University System, Baton Rouge, Louisiana, USA


This systematic review has examined more than 300 original papers dealing with the biology of overfeeding. Studies have varied from 1 day to 6 months. Overfeeding produced weight gain in adolescents, adult men and women and in older men. In longer term studies, there was a clear and highly significant relationship between energy ingested and weight gain and fat storage with limited individual differences. There is some evidence for a contribution of a genetic component to this response variability. The response to overfeeding was affected by the baseline state of the groups being compared: those with insulin resistance versus insulin sensitivity; those prone to obesity versus those resistant to obesity; and those with metabolically abnormal obesity versus those with metabolically normal obesity. Dietary components, such as total fat, polyunsaturated fat and carbohydrate, influenced the patterns of adipose tissue distribution, as did the history of low or normal birth weight. Overfeeding affected the endocrine system with increased circulating concentrations of insulin and triiodothyronine frequently present. Growth hormone, in contrast, was rapidly suppressed. Changes in plasma lipids were influenced by diet, exercise and the magnitude of weight gain. Adipose tissue and skeletal muscle morphology and metabolism are substantially altered by chronic overfeeding.

Table S1. Main and Ancillary Studies Related to Overfeeding

Table S2. Interaction of overfeeding and exercise on resting metabolic rate (RMR), thermic effect of food (TEM) in the upper panel and fat oxidation and carbohydrate oxidation in the lower panel. Data show changes from baseline measurements to the subsequent effects of increasing energy intake or energy expenditure to maintain energy flux, increasing energy flux or reducing energy flux.

Table S3. Summary of the Effect of Diet and Overfeeding on Changes in Thyroid Hormones

Table S4. Response of Glucose, Insulin and Insulin Sensitivity to Overfeeding

Table S5. Effect of Overfeeding on Gastrointestinal

Table S6. Effect of Overfeeding on Neuronal Responses Using Functional Magnetic Resonance Imaging (fMRI)

Table S7. Response of Triglycerides, Cholesterol, LDL-Chol, HDL-Chol, FFA, and Other Parameters to Overfeeding

Table S8. Effect of Overfeeding on inflammatory Markers

Table S9. The effects of long-term overfeeding on selected skeletal muscle characteristics

Table S10. Effect of overfeeding for 100 days on the energy cost and respiratory exchange ratio of selected resting and exercise workloads.

Figure S1. Plot of fat mass gain in relation to the number of overfed calories in 17 experimental groups retrieved after removing the low-protein overfeeding studies. 7,9,13,25,38,46,47,58,96,117,199-201 Study numbers are defined in Table 4 of the main manuscript.

Figure S2. Plot of fat-free mass gain in relation to the number of overfed calories in 17 experimental groups retrieved after removing the low-protein overfeeding studies. 7,9,13,25,38,46,47,58,96,117,199-201 Study numbers are defined in Table 4 of the main manuscript.

Figure S3. Plot of total body energy gain in relation to the number of overfed calories in 17 experimental groups retrieved after removing the low-protein overfeeding studies. 7,9,13,25,38,46,47,58,96,117,199-201 Study numbers are defined in Table 4 of the main manuscript.

Figure S4. Plot of body energy gain in % of the overfed calories in relation to the number of overfed calories in 17 experimental groups retrieved after removing the low-protein overfeeding studies. 7,9,13,25,38,46,47,58,96,117,199-201 Study numbers are defined in Table 4 of the main manuscript.

Figure S5. Observed-minus-predicted TEE (Shaded Bars). Based on the regression of Total Daily Energy Expenditure in a model combining FFM and fat mass in the same subjects at their initial weight Mean (±SD). Reproduced from Leibel et al. 35
