Robust Phylogenetic Inference over Parallel and Distributed Digital Evolution Systems
Retention visualization for hereditary stratigraphy policy.
The capability to detect phylogenetic cues within digital evolution has become increasingly necessary in both applied and scientific contexts. These cues unlock post hoc insight into evolutionary history — particularly with respect to ecology and selection pressure — but also can be harnessed to drive digital evolution algorithms as they unfold. However, parallel and distributed evaluation complicates, among other concerns, maintenance of an evolutionary record. Existing phylogenetic record keeping requires inerrant and complete collation of birth and death reports within a centralized data structure. Such perfect tracking approaches are brittle to data loss or corruption and impose communication overhead.
A phylogenetic inference approach, as opposed to phylogenetic tracking, has the potential to improve scalability and robustness. Under such a model, history is estimated by comparing available extant genomes, in line with the familiar paradigm of phylogenetic work in wet biology. However, this raises the question of how best to design digital genomes to facilitate phylogenetic inference.
This work introduces a new technique, called hereditary stratigraphy, that works by attaching a set of immutable historical “checkpoints” — referred to as strata — as an annotation on evolving genomes.
Checkpoints can be strategically discarded to reduce annotation size at the cost of increasing inference uncertainty.
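The checkpoint-and-discard idea can be illustrated with a small Python toy. This is a hypothetical sketch, not hstrat's API. `Column`, `deposit`, `spacing`, `DIFFERENTIA_BITS`, and `RESOLUTION` are illustrative names. The pruning rule is one simple policy assumed for illustration: keep ranks that are multiples of a power-of-two spacing that grows with depth, plus the newest stratum. Because the spacing never shrinks, a stratum that has been discarded is never needed again. The number of retained strata stays on the order of 2 × `RESOLUTION`, while the gap between checkpoints, and therefore the inference uncertainty, grows with depth.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical toy parameters (assumptions, not taken from hstrat)
DIFFERENTIA_BITS = 8  # width of each stratum's random fingerprint
RESOLUTION = 4        # policy knob: larger keeps more strata, less uncertainty


def spacing(num_deposited: int, resolution: int = RESOLUTION) -> int:
    """Largest power of two not exceeding max(1, num_deposited // resolution).

    This never decreases as the column grows, so multiples of the new
    spacing are always a subset of what was retained before."""
    target = max(1, num_deposited // resolution)
    return 1 << (target.bit_length() - 1)


@dataclass
class Column:
    # (rank, differentia) pairs for retained strata, oldest first
    strata: List[Tuple[int, int]] = field(default_factory=list)
    num_deposited: int = 0

    def deposit(self, rng: random.Random) -> None:
        """Append one immutable checkpoint, then discard per the toy policy."""
        rank = self.num_deposited
        self.strata.append((rank, rng.getrandbits(DIFFERENTIA_BITS)))
        self.num_deposited += 1
        s = spacing(self.num_deposited)
        # keep ranks on the current spacing grid, plus the newest stratum
        self.strata = [(r, d) for r, d in self.strata if r % s == 0 or r == rank]

    def clone(self) -> "Column":
        """Offspring inherit an exact copy of the parent's annotation."""
        return Column(list(self.strata), self.num_deposited)


rng = random.Random(1)
col = Column()
for _ in range(1000):
    col.deposit(rng)
print([r for r, _ in col.strata])  # [0, 128, 256, 384, 512, 640, 768, 896, 999]
```

After 1,000 generations, this toy keeps 9 checkpoints. Any divergence point can therefore be bracketed only to within 128 generations. Raising `RESOLUTION` narrows that bracket at the cost of a larger annotation.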
An accompanying software library, hstrat, provides a plug-and-play implementation of hereditary stratigraphy that can be incorporated into any digital evolution system.
Publications & Software
Authors | Matthew Andres Moreno, Emily Dolson, Charles Ofria |
Date | November 7th, 2022 |
DOI | 10.21105/joss.04866 |
Venue | Journal of Open Source Software |
Abstract
Digital evolution systems instantiate evolutionary processes over populations of virtual agents in silico. These programs can serve as rich experimental model systems. Insights from digital evolution experiments expand evolutionary theory and can often directly improve heuristic optimization techniques. Perfect observability, in particular, enables in silico experiments that would otherwise be impossible in vitro or in vivo. Notably, access to the full evolutionary history (phylogeny) of a given population enables very powerful analyses.
As a slow but highly parallelizable process, digital evolution will benefit greatly by continuing to capitalize on profound advances in parallel and distributed computing, particularly emerging unconventional computing architectures. However, scaling up digital evolution presents many challenges. Among these is the existing centralized perfect-tracking phylogenetic data collection model, which is inefficient and difficult to realize in parallel and distributed contexts. Here, we implement an alternative approach to tracking phylogenies across vast and potentially unreliable hardware networks.
BibTeX
@article{moreno2022hstrat,
doi = {10.21105/joss.04866},
url = {https://doi.org/10.21105/joss.04866},
year = {2022},
publisher = {The Open Journal},
volume = {7},
number = {80},
pages = {4866},
author = {Matthew Andres Moreno and Emily Dolson and Charles Ofria},
title = {hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations},
journal = {Journal of Open Source Software}
}
Citation
Moreno et al., (2022). hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations. Journal of Open Source Software, 7(80), 4866, https://doi.org/10.21105/joss.04866
Authors | Matthew Andres Moreno, Emily Dolson, Charles Ofria |
Date | May 13th, 2022 |
DOI | 10.1145/3520304.3533937 |
Venue | The Genetic and Evolutionary Computation Conference |
Abstract
Phylogenetic analyses can enable insight into evolutionary and ecological dynamics such as selection pressure and frequency-dependent selection in digital evolution systems. Traditionally, digital evolution systems have recorded data for phylogenetic analyses through perfect tracking, where each birth event is recorded in a centralized data structure. This approach, however, does not easily scale to distributed computing environments where evolutionary individuals may migrate between a large number of disjoint processing elements. To provide for phylogenetic analyses in these environments, we propose an approach to infer phylogenies via heritable genetic annotations rather than tracking them directly. We introduce a “hereditary stratigraphy” algorithm that enables efficient, accurate phylogenetic reconstruction with tunable, explicit trade-offs between annotation memory footprint and reconstruction accuracy. This approach can estimate, for example, the generation of the most recent common ancestor (MRCA) of two genomes within 10% relative error with 95% confidence up to a depth of a trillion generations with genome annotations smaller than a kilobyte. We also simulate inference over known lineages, recovering up to 85.70% of the information contained in the original tree using a 64-bit annotation.
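The comparison step behind MRCA estimation can be sketched in Python. This is a hypothetical toy rather than hstrat's API. `descend`, `thin`, and `mrca_bounds` are illustrative names, and a column is modeled as a plain dict from rank (the generation that deposited a stratum) to a random 64-bit differentia. The key logic works as follows. Strata at or before the MRCA are inherited copies, so a differentia mismatch proves that the lineages had already diverged. A match only suggests shared ancestry, because two independent differentiae collide by chance with probability 2^-bits. That chance of collision is why estimates are naturally stated with a confidence level.

```python
import random
from typing import Dict, Tuple

# Hypothetical toy model: rank -> differentia (random fingerprint)
Column = Dict[int, int]


def descend(parent: Column, generations: int, rng: random.Random,
            bits: int = 64) -> Column:
    """Copy the parent's retained strata, then deposit one new stratum
    (a fresh random differentia) per elapsed generation."""
    child = dict(parent)
    start = max(parent) + 1 if parent else 0  # assumes the newest stratum is kept
    for rank in range(start, start + generations):
        child[rank] = rng.getrandbits(bits)
    return child


def thin(col: Column, every: int) -> Column:
    """Hypothetical discard rule: keep every `every`-th rank plus the newest."""
    newest = max(col)
    return {r: d for r, d in col.items() if r % every == 0 or r == newest}


def mrca_bounds(a: Column, b: Column) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the generation of the MRCA of a and b.

    A mismatch at rank k proves the MRCA precedes k. A match may be a chance
    collision, so the lower bound holds only with high probability.
    (-1, -1) means no shared ancestry was detected."""
    lower = -1
    upper = min(max(a), max(b))  # an MRCA cannot postdate either column
    for rank in sorted(a.keys() & b.keys()):
        if a[rank] != b[rank]:
            return lower, rank - 1
        lower = rank
    return lower, upper


rng = random.Random(42)
ancestor = descend({}, 100, rng)             # shared history: ranks 0..99
left = thin(descend(ancestor, 50, rng), 10)  # independent ranks 100..149
right = thin(descend(ancestor, 80, rng), 10) # independent ranks 100..179
print(mrca_bounds(left, right))  # expected: (90, 99); true MRCA is generation 99
```

Discarding all but every tenth stratum widens the interval to ten generations. Keeping more strata, or using wider differentiae to make chance matches rarer, trades memory for precision and confidence.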
BibTeX
@inproceedings{moreno2022hereditary_gecco,
author = {Moreno, Matthew Andres and Dolson, Emily and Ofria, Charles},
title = {Hereditary Stratigraphy: Genome Annotations to Enable Phylogenetic Inference over Distributed Populations},
year = {2022},
isbn = {9781450392686},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3520304.3533937},
doi = {10.1145/3520304.3533937},
booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference Companion},
pages = {65--66},
numpages = {2},
keywords = {phylogenetics, decentralized algorithms, genetic algorithms, digital evolution, genetic programming},
location = {Boston, Massachusetts},
series = {GECCO '22}
}
Citation
Matthew Andres Moreno, Emily Dolson, and Charles Ofria. 2022. Hereditary stratigraphy: genome annotations to enable phylogenetic inference over distributed populations. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '22). Association for Computing Machinery, New York, NY, USA, 65–66. https://doi.org/10.1145/3520304.3533937
Authors | Matthew Andres Moreno, Emily Dolson, Charles Ofria |
Date | May 13th, 2022 |
DOI | 10.1162/isal_a_00550 |
Venue | The 2022 Conference on Artificial Life |
Abstract
Phylogenies provide direct accounts of the evolutionary trajectories behind evolved artifacts in genetic algorithm and artificial life systems. Phylogenetic analyses can also enable insight into evolutionary and ecological dynamics such as selection pressure and frequency-dependent selection. Traditionally, digital evolution systems have recorded data for phylogenetic analyses through perfect tracking, where each birth event is recorded in a centralized data structure. This approach, however, does not easily scale to distributed computing environments where evolutionary individuals may migrate between a large number of disjoint processing elements. To provide for phylogenetic analyses in these environments, we propose an approach that enables phylogenies to be inferred from heritable genetic annotations rather than tracked directly. We introduce a “hereditary stratigraphy” algorithm that enables efficient, accurate phylogenetic reconstruction with tunable, explicit trade-offs between annotation memory footprint and reconstruction accuracy. In particular, we demonstrate an approach that enables estimation of the most recent common ancestor (MRCA) between two individuals with fixed relative accuracy irrespective of lineage depth, while requiring only logarithmic annotation space complexity with respect to lineage depth. This approach can estimate, for example, the MRCA generation of two genomes within 10% relative error with 95% confidence up to a depth of a trillion generations with genome annotations smaller than a kilobyte. We also simulate inference over known lineages, recovering up to 85.70% of the information contained in the original tree using 64-bit annotations.
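A back-of-envelope count shows why fixed relative accuracy is compatible with logarithmic annotation space. The sketch below is not the paper's retention policy. `checkpoint_ages` is a hypothetical helper that places checkpoints by age at a single instant, so that each gap is at most `epsilon` times the age where the gap begins. An MRCA falling inside such a gap is then located to within relative error `epsilon`. Because the gaps widen geometrically, the checkpoint count grows with the logarithm of depth rather than linearly. The sketch ignores how checkpoints would be maintained online as generations accumulate, and it ignores confidence, that is, chance matches between differentiae.

```python
from typing import List


def checkpoint_ages(depth: int, epsilon: float) -> List[int]:
    """Hypothetical helper: ages (generations before the present; the newest
    stratum has age 0) of checkpoints placed greedily so that the gap above
    each checkpoint is at most epsilon times its age, and never below 1.

    If the MRCA's age t satisfies a < t <= a' for neighbors a and a', then
    the uncertainty a' - a is at most epsilon * a < epsilon * t, or exactly
    zero when the gap is a single generation."""
    ages = [0]
    oldest = depth - 1  # age of the first stratum ever deposited (rank 0)
    while ages[-1] < oldest:
        step = max(1, int(epsilon * ages[-1]))
        ages.append(min(oldest, ages[-1] + step))
    return ages


for depth in (10**3, 10**6, 10**9, 10**12):
    # count should grow by roughly a constant amount per 1000x depth increase
    print(f"depth {depth:>17,}: {len(checkpoint_ages(depth, 0.1))} checkpoints")
```

Under these assumptions, even a trillion-generation lineage needs only a few hundred checkpoints at 10% relative resolution. That count is consistent in spirit with the logarithmic space complexity claimed above, though the actual annotation size also depends on how each stratum is encoded.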
BibTeX
@inproceedings{moreno2022hereditary,
author = {Moreno, Matthew Andres and Dolson, Emily and Ofria, Charles},
title = "{Hereditary Stratigraphy: Genome Annotations to Enable Phylogenetic Inference over Distributed Populations}",
volume = {ALIFE 2022: The 2022 Conference on Artificial Life},
series = {ALIFE 2022: The 2022 Conference on Artificial Life},
year = {2022},
month = {07},
doi = {10.1162/isal_a_00550},
url = {https://doi.org/10.1162/isal_a_00550},
note = {64},
eprint = {https://direct.mit.edu/isal/proceedings-pdf/isal/34/64/2035363/isal_a_00550.pdf},
}
Citation
Matthew Andres Moreno, Emily Dolson, Charles Ofria; July 18–22, 2022. “Hereditary Stratigraphy: Genome Annotations to Enable Phylogenetic Inference over Distributed Populations.” Proceedings of the ALIFE 2022: The 2022 Conference on Artificial Life. Online. (pp. 64). MIT Press. https://doi.org/10.1162/isal_a_00550
Authors | Matthew Andres Moreno, Emily Dolson, Charles Ofria |
Date | January 1st, 2022 |
Venue | Python package published via PyPI |
hstrat enables phylogenetic inference on distributed digital evolution populations.
BibTeX
@article{moreno2022hereditary,
author = {Moreno, Matthew Andres and Dolson, Emily and Ofria, Charles},
doi = {10.1162/isal_a_00550},
journal = {Proceedings of the ALIFE 2022: The 2022 Conference on Artificial Life},
month = {7},
pages = {64--74},
title = {{Hereditary Stratigraphy: Genome Annotations to Enable Phylogenetic Inference over Distributed Populations}},
volume = {Proceedings of the ALIFE 2022: The 2022 Conference on Artificial Life},
year = {2022}
}
Citation
Moreno, M. A., Dolson, E., & Ofria, C. (2022). Hereditary Stratigraphy: Genome Annotations to Enable Phylogenetic Inference over Distributed Populations. Proceedings of the ALIFE 2022: The 2022 Conference on Artificial Life, 64–74. https://doi.org/10.1162/isal_a_00550