hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations

Date November 7th, 2022
Venue Journal of Open Source Science (Under Revision)

Digital evolution systems instantiate evolutionary processes over populations of virtual agents in silico. These programs can serve as rich experimental model systems. Insights from digital evolution experiments expand evolutionary theory, and can often directly improve heuristic optimization techniques . Perfect observability, in particular, enables in silico experiments that would be otherwise impossible in vitro or in vivo. Notably, availability of the full evolutionary history (phylogeny) of a given population enables very powerful analyses.

As a slow but highly parallelizable process, digital evolution will benefit greatly by continuing to capitalize on profound advances in parallel and distributed computing [@moreno2020practical;@ackley2014indefinitely], particularly emerging unconventional computing architectures [@ackley2011homeostatic;@lauterbach2021path;@furber2014spinnaker]. However, scaling up digital evolution presents many challenges. Among these is the existing centralized perfect-tracking phylogenetic data collection model, which is inefficient and difficult to realize in parallel and distributed contexts. Here, we implement an alternative approach to tracking phylogenies across vast and potentially unreliable hardware networks.

The hstrat Python library exists to facilitate application of hereditary stratigraphy, a cutting-edge technique to enable phylogenetic inference over distributed digital evolution populations. This technique departs from the traditional perfect-tracking approach to phylogenetic record-keeping. Instead, hereditary stratigraphy enables phylogenetic history to be inferred from heritable annotations attached to evolving digital agents. This approach aligns with phylogenetic reconstruction methodologies in evolutionary biology. Hereditary stratigraphy attaches a set of immutable historical “checkpoints” — referred to as strata — as an annotation on evolving genomes. Checkpoints can be strategically discarded to reduce annotation size at the cost of increasing inference uncertainty. A particular strategy for which checkpoints to discard when is referred to as a stratum retention policy. We refer to the set of retained strata as a hereditary stratigraphic column.

Appropriate stratum retention policy choice varies by application. For example, if annotation size is not a concern it may be best to preserve all strata. In other situations, it may be necessary to constrain annotation size to remain within a fixed memory budget.

Key features of the library include:

  • object-oriented hereditary stratigraphic column implementation to annotate arbitrary genomes,
  • modular interchangeability and user extensibility of stratum retention policies,
  • programmatic interface to query guarantees and behavior of stratum retention policy,
  • modular interchangeability and user extensibility of back-end data structure used to store annotation data,
  • a suite of visualization tools to elucidate stratum retention policies,
  • support for automatic parameterization of stratum retention policies to meet user size complexity or inference precision specifications,
  • tools to compare two columns and extract information about the phylogenetic relationship between them,
  • extensive documentation hosted on ReadTheDocs,
  • a comprehensive test suite to ensure stability and reliability,
  • convenient availability as a Python package via the PyPI repository, and
  • pure Python implementation to ensure universal portability.
⎘ copy to clipboard
  author = {Moreno, Matthew Andres and Dolson, Emily and Ofria, Charles},
  title = "{hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations}",
  journal = {Journal of Open Source Software},
  year = {Under Revision},
⎘ copy to clipboard

Matthew Andres Moreno, Emily Dolson, and Charles Ofria. hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations. Journal of Open Source Software. Under Revision.