Packaging Composable Research Software Libraries

Include graph for DISHTINY software.

Packaging and distributing software multiplies the impact of research, both by opening the door to follow-on research within the scientific community and by facilitating direct real-world applications. However, realizing this goal requires special attention to organization, documentation, and reliability. Many of my research projects are organized so as to maximize the contribution of general-purpose library software back to the community. This usually involves adding software features to an existing project or publishing a standalone Python or C++ library.

Publications & Software
2024 DendroPy 5: a mature Python library for phylogenetic computing
Journal of Open Source Software
Date September 23rd, 2024
DOI 10.21105/joss.06943
Venue Journal of Open Source Software
Abstract

Contemporary bioinformatics has brought profound new visibility into the composition, structure, and history of the natural world around us. Arguably, the central pillar of bioinformatics is phylogenetics: the study of hereditary relatedness among organisms. Insight from phylogenetic analysis has touched nearly every corner of biology. Examples range across natural history, population genetics and phylogeography, conservation biology, public health, medicine, in vivo and in silico experimental evolution, application-oriented evolutionary algorithms, and beyond. High-throughput genetic and phenotypic data has realized groundbreaking results, in large part, through conjunction with open-source software used to process and analyze it. Indeed, the preceding decades have ushered in a flourishing ecosystem of bioinformatics software applications and libraries. Over the course of its nearly fifteen-year history, the DendroPy library for phylogenetic computation in Python has established a generalist niche in serving the bioinformatics community. Here, we report on the recent major release of the library, DendroPy version 5. The software release represents a major milestone in transitioning the library to a sustainable long-term development and maintenance trajectory. As such, this work positions DendroPy to continue fulfilling a key supporting role in phyloinformatics infrastructure.

BibTeX
@article{moreno2024dendropy,
  doi = {10.21105/joss.06943},
  url = {https://doi.org/10.21105/joss.06943},
  year = {2024},
  publisher = {The Open Journal},
  volume = {9},
  number = {101},
  pages = {6943},
  author = {Matthew Andres Moreno and Mark T. Holder and Jeet Sukumaran},
  title = {DendroPy 5: a mature Python library for phylogenetic computing},
  journal = {Journal of Open Source Software}
}
Citation

Moreno, M. A., Holder, M. T., & Sukumaran, J. (2024). DendroPy 5: a mature Python library for phylogenetic computing. Journal of Open Source Software, 9(101), 6943, https://doi.org/10.21105/joss.06943

Supporting Materials

2024 Structured Downsampling for Fast, Memory-efficient Curation of Online Data Streams
arXiv
Date September 10th, 2024
DOI 10.48550/arXiv.2409.06199
Venue arXiv
Abstract

Operations over data streams typically hinge on efficient mechanisms to aggregate or summarize history on a rolling basis. For high-volume data streams, it is critical to manage state in a manner that is fast and memory efficient, particularly in resource-constrained or real-time contexts. Here, we address the problem of extracting a fixed-capacity, rolling subsample from a data stream. Specifically, we explore “data stream curation” strategies to fulfill requirements on the composition of sample time points retained. Our “DStream” suite of algorithms targets three temporal coverage criteria: (1) steady coverage, where retained samples should spread evenly across elapsed data stream history; (2) stretched coverage, where early data items should be proportionally favored; and (3) tilted coverage, where recent data items should be proportionally favored. For each algorithm, we prove worst-case bounds on rolling coverage quality. We focus on the more practical, application-driven case of maximizing coverage quality given a fixed memory capacity. As a core simplifying assumption, we restrict algorithm design to a single update operation: writing from the data stream to a calculated buffer site, with data never being read back, no metadata stored (e.g., sample timestamps), and data eviction occurring only implicitly via overwrite. Drawing only on primitive, low-level operations and ensuring full, overhead-free use of available memory, this “DStream” framework ideally suits domains that are resource-constrained, performance-critical, and fine-grained (e.g., individual data items as small as single bits or bytes). The proposed approach supports O(1) data ingestion via concise bit-level operations. To further practical applications, we provide plug-and-play open-source implementations targeting both scripted and compiled application domains.
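
The single-operation constraint described in this abstract can be illustrated with the simplest possible site-selection rule, a ring buffer. This sketch is not one of the DStream algorithms and offers none of their coverage guarantees, since a ring buffer keeps only the most recent items. It only shows the shape of the update: compute a site from the item's index, then overwrite. The names `ring_site` and `ingest` are hypothetical, chosen for this example.

```python
def ring_site(capacity: int, index: int) -> int:
    """Pick a buffer site for the index-th item.

    Assumes capacity is a power of two, so a bitmask can stand in for modulo.
    """
    return index & (capacity - 1)  # same result as index % capacity


def ingest(buffer: list, index: int, item) -> None:
    """Store item by overwriting one site; nothing is read back or timestamped."""
    buffer[ring_site(len(buffer), index)] = item


buffer = [None] * 4
for index, item in enumerate("abcdef"):
    ingest(buffer, index, item)
print(buffer)  # ['e', 'f', 'c', 'd']
```

Substituting a different site-selection rule for `ring_site` changes which time points survive in the buffer, which is the design space the steady, stretched, and tilted criteria describe.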

BibTeX
@misc{moreno2024structured,
      doi={10.48550/arXiv.2409.06199},
      url={https://arxiv.org/abs/2409.06199},
      title={Structured Downsampling for Fast, Memory-efficient Curation of Online Data Streams},
      author={Matthew Andres Moreno and Luis Zaman and Emily Dolson},
      year={2024},
      eprint={2409.06199},
      archivePrefix={arXiv},
      primaryClass={cs.DS}
}
Citation

Moreno, M. A., Zaman, L., & Dolson, E. (2024). Structured Downsampling for Fast, Memory-efficient Curation of Online Data Streams. arXiv preprint arXiv:2409.06199. https://doi.org/10.48550/arXiv.2409.06199

Supporting Materials

2024 Empirical: A scientific software library for research, education, and public engagement
Journal of Open Source Software
Date June 2nd, 2024
DOI 10.21105/joss.06617
Venue Journal of Open Source Software
Abstract

Empirical is a C++ library designed to promote open science and facilitate the development of scientific software that is efficient, reliable, and easily distributable to researchers and non-experts alike. Specifically, the library sets out to fulfill the following goals:

  1. Utility: Empirical tools streamline common scientific computing tasks such as configuration, end-to-end data management, and mathematical manipulations.
  2. Efficiency: Empirical implements general-purpose data structures and algorithms that emphasize computational efficiency to support scientific computing workloads.
  3. Reliability: Empirical provides sophisticated debug-mode instrumentation including audited memory management and safety-checked versions of standard library containers.
  4. Distributability: Empirical is highly portable, uses common data formats, and facilitates compile-to-web app development with object-oriented bindings for Emscripten/WebAssembly GUI elements, all with the goal of building broadly accessible scientific software.
BibTeX
@article{vostinar2024empirical,
  year = {2024},
  publisher = {The Open Journal},
  author = {Vostinar, Anya and Lalejini, Alexander and Ofria, Charles and Dolson, Emily and Moreno, Matthew Andres},
  title = {Empirical: A scientific software library for research, education, and public engagement},
  journal = {Journal of Open Source Software},
  volume = {9},
  number = {98},
  pages = {6617},
  doi = {10.21105/joss.06617},
  url = {https://doi.org/10.21105/joss.06617},
}
Citation

Vostinar, A., Lalejini, A., Ofria, C., Dolson, E., & Moreno, M. A. (2024). Empirical: A scientific software library for research, education, and public engagement. Journal of Open Source Software, 9(98), 6617, https://doi.org/10.21105/joss.06617

Supporting Materials

2024 Phylotrack: C++ and Python libraries for in silico phylogenetic tracking
arXiv
Date May 15th, 2024
DOI 10.48550/arXiv.2405.09389
Venue arXiv
Abstract

In silico evolution instantiates the processes of heredity, variation, and differential reproductive success (the three “ingredients” for evolution by natural selection) within digital populations of computational agents. Consequently, these populations undergo evolution, and can be used as virtual model systems for studying evolutionary dynamics. This experimental paradigm — used across biological modeling, artificial life, and evolutionary computation — complements research done using in vitro and in vivo systems by enabling experiments that would be impossible in the lab or field. One key benefit is complete, exact observability. For example, it is possible to perfectly record all parent-child relationships across simulation history, yielding complete phylogenies (ancestry trees). This information reveals when traits were gained or lost, and also facilitates inference of underlying evolutionary dynamics.

The Phylotrack project provides libraries for tracking and analyzing phylogenies in in silico evolution. The project is composed of 1) Phylotracklib: a header-only C++ library, developed under the umbrella of the Empirical project, and 2) Phylotrackpy: a Python wrapper around Phylotracklib, created with Pybind11. Both components supply a public-facing API to attach phylogenetic tracking to digital evolution systems, as well as a stand-alone interface for measuring a variety of popular phylogenetic topology metrics. Underlying design and C++ implementation prioritizes efficiency, allowing for fast generational turnover for agent populations numbering in the tens of thousands. Several explicit features (e.g., phylogeny pruning and abstraction, etc.) are provided for reducing the memory footprint of phylogenetic information.
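
To make the idea of perfect tracking with pruning concrete, here is a minimal sketch in plain Python. It is for illustration only: the `Tracker` class and its methods are hypothetical and do not reflect the Phylotracklib or Phylotrackpy API. It assumes each organism dies at most once and that only living organisms reproduce.

```python
from typing import Dict, Optional


class Tracker:
    """Records parent-child relationships and prunes extinct lineages."""

    def __init__(self) -> None:
        self.parent: Dict[int, Optional[int]] = {}  # taxon id -> parent id
        self.num_children: Dict[int, int] = {}  # taxon id -> tracked children
        self.alive: set = set()
        self._next_id = 0

    def add_birth(self, parent: Optional[int] = None) -> int:
        """Register a new organism and return its id."""
        taxon = self._next_id
        self._next_id += 1
        self.parent[taxon] = parent
        self.num_children[taxon] = 0
        self.alive.add(taxon)
        if parent is not None:
            self.num_children[parent] += 1
        return taxon

    def add_death(self, taxon: Optional[int]) -> None:
        """Mark an organism dead and drop any lineage left without descendants."""
        self.alive.discard(taxon)
        # Walk toward the root, removing dead taxa that have no tracked children.
        while (
            taxon is not None
            and taxon not in self.alive
            and self.num_children[taxon] == 0
        ):
            parent = self.parent.pop(taxon)
            del self.num_children[taxon]
            if parent is not None:
                self.num_children[parent] -= 1
            taxon = parent

    def lineage(self, taxon: Optional[int]) -> list:
        """Return ids from taxon back to its root ancestor."""
        out = []
        while taxon is not None:
            out.append(taxon)
            taxon = self.parent[taxon]
        return out


tracker = Tracker()
root = tracker.add_birth()
a = tracker.add_birth(root)
b = tracker.add_birth(root)
tracker.add_death(root)  # kept: still has living descendants
tracker.add_death(a)  # pruned: dead with no descendants
print(tracker.lineage(b))  # [2, 0]
print(sorted(tracker.parent))  # [0, 2]
```

Pruning on death keeps memory proportional to the ancestry of the living population rather than to every organism that ever existed.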

BibTeX
@misc{dolson2024phylotrack,
      doi={10.48550/arXiv.2405.09389},
      url={https://arxiv.org/abs/2405.09389},
      title={Phylotrack: C++ and Python libraries for in silico phylogenetic tracking},
      author={Emily Dolson and Santiago Rodriguez-Papa and Matthew Andres Moreno},
      year={2024},
      eprint={2405.09389},
      archivePrefix={arXiv},
      primaryClass={q-bio.PE}
}
Citation

Dolson, E., Rodriguez-Papa, S., & Moreno, M. A. (2024). Phylotrack: C++ and Python libraries for in silico phylogenetic tracking. arXiv preprint arXiv:2405.09389. https://doi.org/10.48550/arXiv.2405.09389

Supporting Materials

2024 qspool
Python package published via PyPI
Date March 24th, 2024
Venue Python package published via PyPI

qspool is a dependency-free solution for spooling jobs into the SLURM scheduler without exceeding queue capacity limits.

BibTeX
@software{moreno2024qspool,
  author = {Matthew Andres Moreno},
  title = {mmore500/qspool},
  month = mar,
  year = 2024,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.10864602},
  url = {https://doi.org/10.5281/zenodo.10864602}
}
Citation

Matthew Andres Moreno. (2024). mmore500/qspool. Zenodo. https://doi.org/10.5281/zenodo.10864602

Supporting Materials

2024 pecking
Python package published via PyPI
Date March 21st, 2024
Venue Python package published via PyPI

pecking identifies the set of lowest-ranked groups and set of highest-ranked groups in a dataset using nonparametric statistical tests

BibTeX
@software{moreno2024pecking,
  author = {Matthew Andres Moreno},
  title = {mmore500/pecking},
  month = feb,
  year = 2024,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.10701185},
  url = {https://doi.org/10.5281/zenodo.10701185}
}
Citation

Matthew Andres Moreno. (2024). mmore500/pecking. Zenodo. https://doi.org/10.5281/zenodo.10701185

Supporting Materials

2024 colorclade
Python package published via PyPI
Date March 11th, 2024
Venue Python package published via PyPI

colorclade draws phylogenies with hierarchical coloring for easier visual comparison

BibTeX
@software{moreno2024colorclade,
  author = {Matthew Andres Moreno},
  title = {mmore500/colorclade},
  month = mar,
  year = 2024,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.10802404},
  url = {https://doi.org/10.5281/zenodo.10802404}
}
Citation

Matthew Andres Moreno. (2024). mmore500/colorclade. Zenodo. https://doi.org/10.5281/zenodo.10802404

Supporting Materials

2024 joinem
Python package published via PyPI
Date February 20th, 2024
Venue Python package published via PyPI

joinem provides a CLI for fast, flexible concatenation of tabular data using polars

BibTeX
@software{moreno2024joinem,
  author = {Matthew Andres Moreno},
  title = {mmore500/joinem},
  month = feb,
  year = 2024,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.10701182},
  url = {https://doi.org/10.5281/zenodo.10701182}
}
Citation

Matthew Andres Moreno. (2024). mmore500/joinem. Zenodo. https://doi.org/10.5281/zenodo.10701182

Supporting Materials

2023 outset
Python package published via PyPI
Date December 22nd, 2023
Venue Python package published via PyPI

add zoom indicators, insets, and magnified panels to matplotlib/seaborn visualizations with ease!

BibTeX
@software{moreno2023outset,
  author = {Matthew Andres Moreno},
  title = {mmore500/outset},
  month = dec,
  year = 2023,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.10426106},
  url = {https://doi.org/10.5281/zenodo.10426106}
}
Citation

Matthew Andres Moreno. (2023). mmore500/outset. Zenodo. https://doi.org/10.5281/zenodo.10426106

Supporting Materials

2022 phylotrackpy
Python package published via PyPI
Date January 1st, 2022
Venue Python package published via PyPI

phylotrackpy is a Python phylogeny tracker.

BibTeX
@misc{dolson2024phylotrack,
      doi={10.48550/arXiv.2405.09389},
      url={https://arxiv.org/abs/2405.09389},
      title={Phylotrack: C++ and Python libraries for in silico phylogenetic tracking},
      author={Emily Dolson and Santiago Rodriguez-Papa and Matthew Andres Moreno},
      year={2024},
      eprint={2405.09389},
      archivePrefix={arXiv},
      primaryClass={q-bio.PE}
}
Citation

Dolson, E., Rodriguez-Papa, S., & Moreno, M. A. (2024). Phylotrack: C++ and Python libraries for in silico phylogenetic tracking. arXiv preprint arXiv:2405.09389. https://doi.org/10.48550/arXiv.2405.09389

Supporting Materials

2022 opytional
Python package published via PyPI
Date January 1st, 2022
Venue Python package published via PyPI

opytional makes working with values that might be None safer and easier.
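
The kind of helper this describes can be sketched in a few lines. The functions `apply_if` and `or_value` below are written from scratch for this example and should be read as hypothetical stand-ins; consult the package documentation for its actual interface.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def apply_if(value: Optional[T], func: Callable[[T], U]) -> Optional[U]:
    """Apply func to value unless value is None, in which case pass None on."""
    return None if value is None else func(value)


def or_value(value: Optional[T], default: T) -> T:
    """Return value unless it is None, in which case return default."""
    return default if value is None else value


# Falsy-but-valid values such as 0 survive, unlike with `value or default`.
print(or_value(0, 10))  # 0
print(apply_if(None, str.upper))  # None
print(apply_if("abc", str.upper))  # ABC
```

Testing explicitly against `None` is the point of such helpers: the common `value or default` idiom silently replaces `0`, `""`, and empty containers as well.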

Supporting Materials

2022 interval-search
Python package published via PyPI
Date January 1st, 2022
Venue Python package published via PyPI

interval-search provides predicate-based binary and doubling search implementations.
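
For a sense of what predicate-based search involves, here is a minimal sketch in plain Python. The function names `binary_search` and `doubling_search` and their signatures are assumptions made for illustration and may not match the package's API. Both assume the predicate is monotonic: false below some threshold and true from there on.

```python
from typing import Callable, Optional


def binary_search(
    predicate: Callable[[int], bool], lower: int, upper: int
) -> Optional[int]:
    """Return the smallest x in [lower, upper] where predicate(x) is true.

    Returns None if the predicate is false everywhere on the interval.
    """
    if lower > upper or not predicate(upper):
        return None
    while lower < upper:
        mid = (lower + upper) // 2
        if predicate(mid):
            upper = mid  # threshold is at mid or to its left
        else:
            lower = mid + 1  # threshold is strictly right of mid
    return lower


def doubling_search(
    predicate: Callable[[int], bool], lower: int = 0
) -> Optional[int]:
    """Return the smallest x >= lower where predicate(x) is true.

    Grows the probe distance geometrically until the predicate holds, then
    binary searches the final gap. Assumes the predicate is eventually true.
    """
    step = 1
    prev = lower  # smallest candidate not yet ruled out
    upper = lower  # current probe position
    while not predicate(upper):
        prev = upper + 1
        upper = lower + step
        step *= 2
    return binary_search(predicate, prev, upper)


print(binary_search(lambda x: x >= 7, 0, 100))  # 7
print(doubling_search(lambda x: x * x >= 1000))  # 32
```

Doubling search is useful when no upper bound is known in advance: it needs only a logarithmic number of predicate calls relative to the distance from `lower` to the answer.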

Supporting Materials

2022 hstrat
Python package published via PyPI
Date January 1st, 2022
Venue Python package published via PyPI

hstrat enables phylogenetic inference on distributed digital evolution populations.

BibTeX
@article{moreno2022hstrat,
  doi = {10.21105/joss.04866},
  url = {https://doi.org/10.21105/joss.04866},
  year = {2022},
  publisher = {The Open Journal},
  volume = {7},
  number = {80},
  pages = {4866},
  author = {Matthew Andres Moreno and Emily Dolson and Charles Ofria},
  title = {hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations},
  journal = {Journal of Open Source Software}
}
Citation

Moreno, M. A., Dolson, E., & Ofria, C. (2022). hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations. Journal of Open Source Software, 7(80), 4866, https://doi.org/10.21105/joss.04866

Supporting Materials

2022 alifedata-phyloinformatics-convert
Python package published via PyPI
Date January 1st, 2022
Venue Python package published via PyPI

alifedata-phyloinformatics-convert helps apply traditional phyloinformatics software to alife standardized data.

BibTeX
@software{moreno2024apc,
  author = {Matthew Andres Moreno and Santiago {Rodriguez Papa}},
  title = {mmore500/alifedata-phyloinformatics-convert},
  month = feb,
  year = 2024,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.10701178},
  url = {https://doi.org/10.5281/zenodo.10701178}
}
Citation

Matthew Andres Moreno, Santiago Rodriguez Papa. (2024). mmore500/alifedata-phyloinformatics-convert. Zenodo. https://doi.org/10.5281/zenodo.10701178

Supporting Materials

2020 Zero to Sixty: Onboarding Tutorials for Native & Web Software Development with C++
Workshop for Avida-ED Software Development
Date May 26th, 2020
Venue Workshop for Avida-ED Software Development

Hands-on, asynchronous four-day tutorial series covering foundational web development competencies, C++ development with the Empirical library, and compiling for the web with Emscripten.


2020 teeplot
Python package published via PyPI
Date January 1st, 2020
Venue Python package published via PyPI

teeplot wrangles your data visualizations out of notebooks for you.

BibTeX
@software{moreno2023teeplot,
  author = {Matthew Andres Moreno},
  title = {mmore500/teeplot},
  month = dec,
  year = 2023,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.10440670},
  url = {https://doi.org/10.5281/zenodo.10440670}
}
Citation

Matthew Andres Moreno. (2023). mmore500/teeplot. Zenodo. https://doi.org/10.5281/zenodo.10440670

Supporting Materials

2020 signalgp-lite
header-only C++ library
Date January 1st, 2020
Venue header-only C++ library

A genetic programming implementation designed for large-scale artificial life applications. Organized as a header-only C++ library. Inspired by Alex Lalejini’s SignalGP.

BibTeX
@misc{moreno2021signalgp,
  doi = {10.48550/ARXIV.2108.00382},
  url = {https://arxiv.org/abs/2108.00382},
  author = {Moreno, Matthew Andres and Rodriguez Papa, Santiago and Lalejini, Alexander and Ofria, Charles},
  keywords = {Neural and Evolutionary Computing (cs.NE), FOS: Computer and information sciences},
  title = {SignalGP-Lite: Event Driven Genetic Programming Library for Large-Scale Artificial Life Applications},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
Citation

Moreno, M. A., Rodriguez Papa, S., Lalejini, A., & Ofria, C. (2021). SignalGP-Lite: Event Driven Genetic Programming Library for Large-Scale Artificial Life Applications. arXiv preprint arXiv:2108.00382.

Supporting Materials

2020 conduit
header-only C++ library
Date January 1st, 2020
Venue header-only C++ library

C++ library that wraps intra-thread, inter-thread, and inter-process communication in a uniform, modular, object-oriented interface, with a focus on asynchronous high-performance computing applications.

BibTeX
@inproceedings{moreno2021conduit,
  author = {Moreno, Matthew Andres and Rodriguez Papa, Santiago and Ofria, Charles},
  title = {Conduit: A C++ Library for Best-Effort High Performance Computing},
  year = {2021},
  isbn = {9781450383516},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3449726.3463205},
  doi = {10.1145/3449726.3463205},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference Companion},
  pages = {1795--1800},
  numpages = {6},
  keywords = {high performance computing, best-effort computing},
  location = {Lille, France},
  series = {GECCO '21}
}
Citation

Matthew Andres Moreno, Santiago Rodriguez Papa, and Charles Ofria. 2021. Conduit: a C++ library for best-effort high performance computing. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO ’21). Association for Computing Machinery, New York, NY, USA, 1795–1800. https://doi.org/10.1145/3449726.3463205

Supporting Materials

2020 dishtiny
header-only C++ library
Date January 1st, 2020
Venue header-only C++ library

C++ library for digital evolution simulations studying digital multicellularity and fraternal major evolutionary transitions in individuality.

Supporting Materials

2019 keyname
Python package published via PyPI
Date January 1st, 2019
Venue Python package published via PyPI

keyname helps easily pack and unpack metadata in a filename.
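
As an illustration of the idea, the sketch below packs a dictionary of metadata into a filename and recovers it again. The `key=value` pairs joined by `+` are a convention assumed for this example, and `pack` and `unpack` are hypothetical helpers written here, not necessarily the package's API.

```python
from typing import Dict


def pack(metadata: Dict[str, str]) -> str:
    """Join key=value pairs with '+', placing 'ext' last.

    Keeping the extension last means the filename still ends in e.g. '.csv'.
    """
    keys = sorted(k for k in metadata if k != "ext")
    if "ext" in metadata:
        keys.append("ext")
    return "+".join(f"{k}={metadata[k]}" for k in keys)


def unpack(filename: str) -> Dict[str, str]:
    """Recover the metadata dictionary from a packed filename."""
    return dict(pair.split("=", 1) for pair in filename.split("+"))


name = pack({"seed": "100", "treatment": "control", "ext": ".csv"})
print(name)  # seed=100+treatment=control+ext=.csv
print(unpack(name)["seed"])  # 100
```

This sketch does no escaping, so keys containing `=` or `+` and values containing `+` would not round-trip.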

Supporting Materials

2018 Empirical
header-only C++ library
Date January 1st, 2018
Venue header-only C++ library

Empirical is a library of tools for developing useful, efficient, reliable, and available scientific software. The provided code is header-only and encapsulated into the emp namespace, so it is simple to incorporate into existing projects.

BibTeX
@software{Ofria_Empirical_C_library_2020,
  author = {Ofria, Charles and Moreno, Matthew Andres and Dolson, Emily and Lalejini, Alex and Rodriguez Papa, Santiago and Fenton, Jake and Perry, Katherine and Jorgensen, Steven and hoffmanriley and grenewode and Baldwin Edwards, Oliver and Stredwick, Jason and cgnitash and theycallmeHeem and Vostinar, Anya and Moreno, Ryan and Schossau, Jory and Zaman, Luis and djrain},
  doi = {10.5281/zenodo.4141943},
  license = {MIT},
  month = {10},
  title = {{Empirical: C++ library for efficient, reliable, and accessible scientific software}},
  url = {https://github.com/devosoft/Empirical},
  version = {0.0.4},
  year = {2020}
}
Citation

Ofria, C., Moreno, M. A., Dolson, E., Lalejini, A., Rodriguez Papa, S., Fenton, J., Perry, K., Jorgensen, S., hoffmanriley, grenewode, Baldwin Edwards, O., Stredwick, J., cgnitash, theycallmeHeem, Vostinar, A., Moreno, R., Schossau, J., Zaman, L., & djrain. (2020). Empirical: C++ library for efficient, reliable, and accessible scientific software (Version 0.0.4) [Computer software]. https://doi.org/10.5281/zenodo.4141943

Supporting Materials