Robust Phylogenetic Inference over Parallel and Distributed Digital Evolution Systems

Figure: retention visualization for a hereditary stratigraphy policy.

The capability to detect phylogenetic cues within digital evolution has become increasingly necessary in both applied and scientific contexts. These cues unlock post hoc insight into evolutionary history, particularly with respect to ecology and selection pressure, and can also be harnessed to drive digital evolution algorithms as they unfold. However, parallel and distributed evaluation complicates, among other concerns, maintenance of an evolutionary record. Existing phylogenetic record-keeping requires inerrant and complete collation of birth and death reports within a centralized data structure. Such perfect-tracking approaches are brittle to data loss or corruption and impose communication overhead.

A phylogenetic inference approach, as opposed to phylogenetic tracking, has the potential to improve scalability and robustness. Under such a model, history is estimated by comparing available extant genomes, in line with the familiar paradigm of phylogenetic work in wet biology. However, this raises the question of how best to design digital genomes to facilitate phylogenetic inference.

This work introduces a new technique, called hereditary stratigraphy, that works by attaching a set of immutable historical “checkpoints” — referred to as strata — as an annotation on evolving genomes. Checkpoints can be strategically discarded to reduce annotation size at the cost of increasing inference uncertainty. An accompanying software library, hstrat, provides a plug-and-play implementation of hereditary stratigraphy that can be incorporated into any digital evolution system.
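The core idea can be illustrated with a minimal, self-contained sketch. It is not the hstrat API: the `Column` class, its `deposit` and `clone_descendant` methods, and the 8-bit random "differentia" value stamped on each stratum are illustrative assumptions. Each generation appends one immutable stratum, and offspring copy all parental strata, so relatives share identical strata up to the generation where their lineages split.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical minimal hereditary stratigraphic column (not hstrat's API).

    `strata` maps each deposition rank (generation) to a random differentia
    value. Strata are never modified after deposition; this sketch retains
    all of them (no discarding).
    """

    differentia_bits: int = 8
    num_deposited: int = 0
    strata: dict = field(default_factory=dict)

    def deposit(self, rng: random.Random) -> None:
        # Stamp a new stratum for the current generation with random bits.
        self.strata[self.num_deposited] = rng.getrandbits(self.differentia_bits)
        self.num_deposited += 1

    def clone_descendant(self, rng: random.Random) -> "Column":
        # Offspring inherit every parental stratum unchanged, then deposit
        # one fresh stratum of their own.
        child = Column(self.differentia_bits, self.num_deposited, dict(self.strata))
        child.deposit(rng)
        return child


rng = random.Random(42)
ancestor = Column()
ancestor.deposit(rng)  # founding generation, rank 0
for _ in range(9):
    ancestor = ancestor.clone_descendant(rng)  # ranks 1..9

# Two lineages descend independently from the same ancestor.
lineage_a, lineage_b = ancestor, ancestor
for _ in range(5):
    lineage_a = lineage_a.clone_descendant(rng)
    lineage_b = lineage_b.clone_descendant(rng)

# Ranks 0..9 are identical copies; ranks 10..14 were drawn independently.
print(lineage_a.num_deposited)  # 15
print(all(lineage_a.strata[r] == lineage_b.strata[r] for r in range(10)))  # True
```

Because shared strata are exact copies, any later comparison of two extant columns can locate where their histories diverge without consulting a centralized record.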

Publications & Software
2022 hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations
Journal of Open Source Software (Under Revision)
Date November 7th, 2022
Venue Journal of Open Source Software (Under Revision)
Abstract

Digital evolution systems instantiate evolutionary processes over populations of virtual agents in silico. These programs can serve as rich experimental model systems. Insights from digital evolution experiments expand evolutionary theory and can often directly improve heuristic optimization techniques. Perfect observability, in particular, enables in silico experiments that would otherwise be impossible in vitro or in vivo. Notably, availability of the full evolutionary history (phylogeny) of a given population enables powerful analyses.

As a slow but highly parallelizable process, digital evolution will benefit greatly by continuing to capitalize on profound advances in parallel and distributed computing [@moreno2020practical;@ackley2014indefinitely], particularly emerging unconventional computing architectures [@ackley2011homeostatic;@lauterbach2021path;@furber2014spinnaker]. However, scaling up digital evolution presents many challenges. Among these is the existing centralized perfect-tracking phylogenetic data collection model, which is inefficient and difficult to realize in parallel and distributed contexts. Here, we implement an alternative approach to tracking phylogenies across vast and potentially unreliable hardware networks.

The hstrat Python library exists to facilitate application of hereditary stratigraphy, a new technique that enables phylogenetic inference over distributed digital evolution populations. This technique departs from the traditional perfect-tracking approach to phylogenetic record-keeping. Instead, hereditary stratigraphy enables phylogenetic history to be inferred from heritable annotations attached to evolving digital agents. This approach aligns with phylogenetic reconstruction methodologies in evolutionary biology. Hereditary stratigraphy attaches a set of immutable historical “checkpoints,” referred to as strata, as an annotation on evolving genomes. Checkpoints can be strategically discarded to reduce annotation size at the cost of increasing inference uncertainty. A particular strategy specifying which checkpoints to discard, and when, is referred to as a stratum retention policy. We refer to the set of retained strata as a hereditary stratigraphic column.

The appropriate choice of stratum retention policy varies by application. For example, if annotation size is not a concern, it may be best to preserve all strata. In other situations, it may be necessary to constrain annotation size to remain within a fixed memory budget.
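To make the notion of a fixed memory budget concrete, here is a hedged sketch of one possible retention policy; it is a hypothetical illustration, not one of hstrat's built-in policies. `retained_ranks` keeps every s-th rank plus the newest stratum, doubling the stride s as needed to stay within `budget` strata. Because the retained set depends only on how many strata have been deposited, and strides only ever double, a discarded stratum never needs to reappear.

```python
def retained_ranks(num_deposited: int, budget: int) -> list:
    """Hypothetical fixed-budget retention policy (illustrative only).

    Keeps ranks 0, s, 2s, ... plus the newest rank, where the stride s is the
    smallest power of two that fits the result within `budget` strata.
    """
    if budget < 2:
        raise ValueError("budget must allow at least two strata")
    if num_deposited == 0:
        return []
    newest = num_deposited - 1
    stride = 1
    # ceil(num_deposited / stride) multiples of stride, plus the newest rank
    # when it is not itself a multiple of the stride.
    while -(-num_deposited // stride) + (newest % stride != 0) > budget:
        stride *= 2
    ranks = list(range(0, num_deposited, stride))
    if ranks[-1] != newest:
        ranks.append(newest)
    return ranks


def check_policy(budget: int, generations: int) -> None:
    """Confirm the budget holds and discarded strata never reappear."""
    previous = set()
    for n in range(1, generations + 1):
        current = set(retained_ranks(n, budget))
        assert len(current) <= budget
        # Everything retained now was retained before or is brand new.
        assert current <= previous | {n - 1}
        previous = current


check_policy(budget=8, generations=1000)
print(retained_ranks(100, budget=8))  # [0, 16, 32, 48, 64, 80, 96, 99]
```

Under this particular policy the gap between retained strata grows with lineage depth, so MRCA uncertainty also grows with depth: the size-versus-precision trade-off in miniature.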

Key features of the library include:

  • object-oriented hereditary stratigraphic column implementation to annotate arbitrary genomes,
  • modular interchangeability and user extensibility of stratum retention policies,
  • programmatic interface to query guarantees and behavior of stratum retention policies,
  • modular interchangeability and user extensibility of back-end data structure used to store annotation data,
  • a suite of visualization tools to elucidate stratum retention policies,
  • support for automatic parameterization of stratum retention policies to meet user-specified size complexity or inference precision requirements,
  • tools to compare two columns and extract information about the phylogenetic relationship between them,
  • extensive documentation hosted on ReadTheDocs,
  • a comprehensive test suite to ensure stability and reliability,
  • convenient availability as a Python package via the PyPI repository, and
  • pure Python implementation to ensure universal portability.
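Comparing two columns is what turns retained strata into phylogenetic information. The following sketch uses a hypothetical function, `mrca_bounds`, rather than hstrat's comparison API. It walks the ranks both columns retain: a matching stratum at rank r indicates that both descend from the same generation-r individual, while the first mismatch shows the lineages had already split. The most recent common ancestor (MRCA) generation therefore lies between the last match and the first mismatch, assuming no spurious differentia collisions.

```python
from typing import Optional, Tuple


def mrca_bounds(
    strata_a: dict, strata_b: dict
) -> Tuple[Optional[int], Optional[int]]:
    """Bound the MRCA generation of two columns.

    Each argument maps retained rank -> differentia value. Returns
    (lower, upper) such that lower <= MRCA generation < upper. `lower` is
    None when no commonly retained stratum matches; `upper` is None when no
    mismatch is found among commonly retained ranks.
    """
    last_match = None
    for rank in sorted(strata_a.keys() & strata_b.keys()):
        if strata_a[rank] != strata_b[rank]:
            # Distinct differentia: the two lineages had split by this rank.
            return last_match, rank
        last_match = rank
    return last_match, None


# Columns retaining different (but overlapping) sets of ranks.
a = {0: 17, 4: 201, 8: 33, 12: 90, 15: 5}
b = {0: 17, 4: 201, 8: 33, 12: 64, 14: 250}
print(mrca_bounds(a, b))  # (8, 12)
```

Only ranks retained by both columns are informative, which is why retention policies that keep overlapping ranks across lineages matter for inference precision.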
BibTeX
@article{moreno2022hstrat,
  author = {Moreno, Matthew Andres and Dolson, Emily and Ofria, Charles},
  title = "{hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations}",
  journal = {Journal of Open Source Software},
  year = {Under Revision},
}
Citation

Matthew Andres Moreno, Emily Dolson, and Charles Ofria. hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations. Journal of Open Source Software. Under Revision.


2022 Hereditary stratigraphy: genome annotations to enable phylogenetic inference over distributed populations
The Genetic and Evolutionary Computation Conference
Date May 13th, 2022
DOI 10.1145/3520304.3533937
Venue The Genetic and Evolutionary Computation Conference
Abstract

Phylogenetic analyses can enable insight into evolutionary and ecological dynamics such as selection pressure and frequency-dependent selection in digital evolution systems. Traditionally, digital evolution systems have recorded data for phylogenetic analyses through perfect tracking, where each birth event is recorded in a centralized data structure. This approach, however, does not easily scale to distributed computing environments where evolutionary individuals may migrate between a large number of disjoint processing elements. To provide for phylogenetic analyses in these environments, we propose an approach to infer phylogenies via heritable genetic annotations rather than directly track them. We introduce a “hereditary stratigraphy” algorithm that enables efficient, accurate phylogenetic reconstruction with tunable, explicit trade-offs between annotation memory footprint and reconstruction accuracy. This approach can estimate, for example, the most recent common ancestor (MRCA) generation of two genomes within 10% relative error with 95% confidence up to a depth of a trillion generations with genome annotations smaller than a kilobyte. We also simulate inference over known lineages, recovering up to 85.70% of the information contained in the original tree using a 64-bit annotation.
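The confidence level quoted above reflects that random checkpoint values can coincide by chance. As a back-of-envelope illustration (our assumption, not the paper's derivation): if each stratum carries a w-bit "differentia" drawn uniformly at random, two unrelated strata match with probability 2^-w, so k consecutive spurious matches occur with probability 2^(-wk). The hypothetical helper `collisions_to_rule_out` finds the smallest such k that is implausible at a given confidence level; a conservative MRCA estimate could then treat the k - 1 matches immediately preceding the first mismatch as potentially spurious.

```python
def collisions_to_rule_out(differentia_bits: int, confidence: float) -> int:
    """Smallest k such that k consecutive spurious differentia matches have
    probability at most 1 - confidence.

    Assumes each differentia is drawn independently and uniformly at random,
    so two unrelated strata coincide with probability 2**-differentia_bits.
    """
    significance = 1.0 - confidence
    k = 1
    while 2.0 ** (-differentia_bits * k) > significance:
        k += 1
    return k


for bits in (1, 2, 8, 64):
    print(bits, collisions_to_rule_out(bits, confidence=0.95))
# 1 5
# 2 3
# 8 1
# 64 1
```

Narrow differentia save memory per stratum but make each individual comparison less decisive, which is one axis of the explicit memory-versus-accuracy trade-off.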

BibTeX
@inproceedings{moreno2022hereditary_gecco,
  author = {Moreno, Matthew Andres and Dolson, Emily and Ofria, Charles},
  title = {Hereditary Stratigraphy: Genome Annotations to Enable Phylogenetic Inference over Distributed Populations},
  year = {2022},
  isbn = {9781450392686},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3520304.3533937},
  doi = {10.1145/3520304.3533937},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference Companion},
  pages = {65--66},
  numpages = {2},
  keywords = {phylogenetics, decentralized algorithms, genetic algorithms, digital evolution, genetic programming},
  location = {Boston, Massachusetts},
  series = {GECCO '22}
}
Citation

Matthew Andres Moreno, Emily Dolson, and Charles Ofria. 2022. Hereditary stratigraphy: genome annotations to enable phylogenetic inference over distributed populations. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '22). Association for Computing Machinery, New York, NY, USA, 65–66. https://doi.org/10.1145/3520304.3533937


2022 Hereditary stratigraphy: genome annotations to enable phylogenetic inference over distributed populations
The 2022 Conference on Artificial Life
Date May 13th, 2022
DOI 10.1162/isal_a_00550
Venue The 2022 Conference on Artificial Life
Abstract

Phylogenies provide direct accounts of the evolutionary trajectories behind evolved artifacts in genetic algorithm and artificial life systems. Phylogenetic analyses can also enable insight into evolutionary and ecological dynamics such as selection pressure and frequency-dependent selection. Traditionally, digital evolution systems have recorded data for phylogenetic analyses through perfect tracking, where each birth event is recorded in a centralized data structure. This approach, however, does not easily scale to distributed computing environments where evolutionary individuals may migrate between a large number of disjoint processing elements. To provide for phylogenetic analyses in these environments, we propose an approach to enable phylogenies to be inferred via heritable genetic annotations rather than directly tracked. We introduce a “hereditary stratigraphy” algorithm that enables efficient, accurate phylogenetic reconstruction with tunable, explicit trade-offs between annotation memory footprint and reconstruction accuracy. In particular, we demonstrate an approach that enables estimation of the most recent common ancestor (MRCA) between two individuals with fixed relative accuracy irrespective of lineage depth while only requiring logarithmic annotation space complexity with respect to lineage depth. This approach can estimate, for example, the MRCA generation of two genomes within 10% relative error with 95% confidence up to a depth of a trillion generations with genome annotations smaller than a kilobyte. We also simulate inference over known lineages, recovering up to 85.70% of the information contained in the original tree using 64-bit annotations.
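Why logarithmic space suffices for fixed relative accuracy can be seen with a quick count. Suppose, as an assumption that need not match the paper's exact policy, that retained strata sit on a geometric ladder of distances back from the most recent generation, so each gap is roughly at most max(1, relative_error × distance). The hypothetical function `strata_for_relative_resolution` counts the rungs needed to cover a given lineage depth.

```python
import math


def strata_for_relative_resolution(depth: int, relative_error: float) -> int:
    """Count retained strata on a geometric ladder covering `depth` generations.

    Rungs are distances back from the most recent generation. Each step grows
    the distance by a factor of (1 + relative_error), but by at least one
    generation, so each gap is roughly at most max(1, relative_error * distance).
    """
    count = 0
    distance = 1
    while distance <= depth:
        count += 1
        distance = max(distance + 1, int(distance * (1 + relative_error)))
    return count


for exponent in (3, 6, 9, 12):
    n = strata_for_relative_resolution(10**exponent, relative_error=0.1)
    # For comparison: log(depth) / log(1 + relative_error).
    approx = math.log(10**exponent) / math.log(1.1)
    print(f"depth 1e{exponent}: {n} strata (log estimate {approx:.0f})")
```

Once depths are large, each tenfold increase in depth adds only about two dozen rungs (log 10 / log 1.1 ≈ 24), so the stratum count grows logarithmically with lineage depth.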

BibTeX
@inproceedings{moreno2022hereditary,
    author = {Moreno, Matthew Andres and Dolson, Emily and Ofria, Charles},
    title = "{Hereditary Stratigraphy: Genome Annotations to Enable Phylogenetic Inference over Distributed Populations}",
    volume = {ALIFE 2022: The 2022 Conference on Artificial Life},
    series = {ALIFE 2022: The 2022 Conference on Artificial Life},
    year = {2022},
    month = {07},
    doi = {10.1162/isal_a_00550},
    url = {https://doi.org/10.1162/isal\_a\_00550},
    note = {64},
    eprint = {https://direct.mit.edu/isal/proceedings-pdf/isal/34/64/2035363/isal\_a\_00550.pdf},
}
Citation

Matthew Andres Moreno, Emily Dolson, Charles Ofria; July 18–22, 2022. “Hereditary Stratigraphy: Genome Annotations to Enable Phylogenetic Inference over Distributed Populations.” Proceedings of the ALIFE 2022: The 2022 Conference on Artificial Life. ALIFE 2022: The 2022 Conference on Artificial Life. Online. (pp. 64). MIT Press. https://doi.org/10.1162/isal_a_00550


2022 hstrat
Python package published via PyPI
Date January 1st, 2022
Venue Python package published via PyPI

hstrat enables phylogenetic inference on distributed digital evolution populations.

BibTeX
@article{moreno2022hereditary,
  author = {Moreno, Matthew Andres and Dolson, Emily and Ofria, Charles},
  doi = {10.1162/isal_a_00550},
  journal = {Proceedings of the ALIFE 2022: The 2022 Conference on Artificial Life},
  month = {7},
  pages = {64--74},
  title = {{Hereditary Stratigraphy: Genome Annotations to Enable Phylogenetic Inference over Distributed Populations}},
  volume = {Proceedings of the ALIFE 2022: The 2022 Conference on Artificial Life},
  year = {2022}
}
Citation

Moreno, M. A., Dolson, E., & Ofria, C. (2022). Hereditary Stratigraphy: Genome Annotations to Enable Phylogenetic Inference over Distributed Populations. Proceedings of the ALIFE 2022: The 2022 Conference on Artificial Life, 64–74. https://doi.org/10.1162/isal_a_00550