Packaging Composable Research Software Libraries
Include graph for DISHTINY software.
Packaging and distribution of software multiplies the impact of research, both by opening the door to follow-on research within the scientific community and by facilitating direct real-world applications. However, realizing this goal requires special attention to organization, documentation, and reliability. Many of my research projects are organized so as to maximize the contribution of general-purpose library software back to the community. This usually involves adding software features to an existing project or publishing a standalone Python or C++ library.
Publications & Software
View at Publisher
Authors | Matthew Andres Moreno, Mark T. Holder, Jeet Sukumaran |
Date | September 23rd, 2024 |
DOI | 10.21105/joss.06943 |
Venue | Journal of Open Source Software |
Abstract
Contemporary bioinformatics has ushered in profound new visibility into the composition, structure, and history of the natural world around us. Arguably, the central pillar of bioinformatics is phylogenetics, the study of hereditary relatedness among organisms. Insight from phylogenetic analysis has touched nearly every corner of biology. Examples range across natural history, population genetics and phylogeography, conservation biology, public health, medicine, in vivo and in silico experimental evolution, application-oriented evolutionary algorithms, and beyond. High-throughput genetic and phenotypic data has realized groundbreaking results, in large part, through conjunction with open-source software used to process and analyze it. Indeed, the preceding decades have ushered in a flourishing ecosystem of bioinformatics software applications and libraries. Over the course of its nearly fifteen-year history, the DendroPy library for phylogenetic computation in Python has established a generalist niche in serving the bioinformatics community. Here, we report on the recent major release of the library, DendroPy version 5. The software release represents a major milestone in transitioning the library to a sustainable long-term development and maintenance trajectory. As such, this work positions DendroPy to continue fulfilling a key supporting role in phyloinformatics infrastructure.
BibTeX
@article{moreno2024dendropy,
doi = {10.21105/joss.06943},
url = {https://doi.org/10.21105/joss.06943},
year = {2024},
publisher = {The Open Journal},
volume = {9},
number = {101},
pages = {6943},
author = {Matthew Andres Moreno and Mark T. Holder and Jeet Sukumaran},
title = {DendroPy 5: a mature Python library for phylogenetic computing},
journal = {Journal of Open Source Software}
}
Citation
Moreno, M. A., Holder, M. T., & Sukumaran, J. (2024). DendroPy 5: a mature Python library for phylogenetic computing. Journal of Open Source Software, 9(101), 6943, https://doi.org/10.21105/joss.06943
Supporting Materials
View at Publisher
Authors | Matthew Andres Moreno, Luis Zaman, Emily Dolson |
Date | September 10th, 2024 |
DOI | 10.48550/arXiv.2409.06199 |
Venue | arXiv |
Abstract
Operations over data streams typically hinge on efficient mechanisms to aggregate or summarize history on a rolling basis. For high-volume data streams, it is critical to manage state in a manner that is fast and memory efficient, particularly in resource-constrained or real-time contexts. Here, we address the problem of extracting a fixed-capacity, rolling subsample from a data stream. Specifically, we explore “data stream curation” strategies to fulfill requirements on the composition of sample time points retained. Our “DStream” suite of algorithms targets three temporal coverage criteria: (1) steady coverage, where retained samples should spread evenly across elapsed data stream history; (2) stretched coverage, where early data items should be proportionally favored; and (3) tilted coverage, where recent data items should be proportionally favored. For each algorithm, we prove worst-case bounds on rolling coverage quality. We focus on the more practical, application-driven case of maximizing coverage quality given a fixed memory capacity. As a core simplifying assumption, we restrict algorithm design to a single update operation: writing from the data stream to a calculated buffer site, with data never being read back, no metadata stored (e.g., sample timestamps), and data eviction occurring only implicitly via overwrite. Drawing only on primitive, low-level operations and ensuring full, overhead-free use of available memory, this “DStream” framework ideally suits domains that are resource-constrained, performance-critical, and fine-grained (e.g., individual data items as small as single bits or bytes). The proposed approach supports O(1) data ingestion via concise bit-level operations. To further practical applications, we provide plug-and-play open-source implementations targeting both scripted and compiled application domains.
BibTeX
@misc{moreno2024structured,
doi={10.48550/arXiv.2409.06199},
url={https://arxiv.org/abs/2409.06199},
title={Structured Downsampling for Fast, Memory-efficient Curation of Online Data Streams},
author={Matthew Andres Moreno and Luis Zaman and Emily Dolson},
year={2024},
eprint={2409.06199},
archivePrefix={arXiv},
primaryClass={cs.DS}
}
Citation
Moreno, M. A., Zaman, L., & Dolson, E. (2024). Structured Downsampling for Fast, Memory-efficient Curation of Online Data Streams. arXiv preprint arXiv:2409.06199. https://doi.org/10.48550/arXiv.2409.06199
Supporting Materials
View at Publisher
Authors | Anya Vostinar, Alexander Lalejini, Charles Ofria, Emily Dolson, Matthew Andres Moreno |
Date | June 2nd, 2024 |
DOI | 10.21105/joss.06617 |
Venue | Journal of Open Source Software |
Abstract
Empirical is a C++ library designed to promote open science and facilitate the development of scientific software that is efficient, reliable, and easily distributable to researchers and non-experts alike. Specifically, the library sets out to fulfill the following goals:
- Utility: Empirical tools streamline common scientific computing tasks such as configuration, end-to-end data management, and mathematical manipulations.
- Efficiency: Empirical implements general-purpose data structures and algorithms that emphasize computational efficiency to support scientific computing workloads.
- Reliability: Empirical provides sophisticated debug-mode instrumentation including audited memory management and safety-checked versions of standard library containers.
- Distributability: Empirical is highly portable, uses common data formats, and facilitates compile-to-web app development with object-oriented bindings for Emscripten/WebAssembly GUI elements, all with the goal of building broadly accessible scientific software.
BibTeX
@article{vostinar2024empirical,
year = {2024},
publisher = {The Open Journal},
author = {Vostinar, Anya and Lalejini, Alexander and Ofria, Charles and Dolson, Emily and Moreno, Matthew Andres},
title = {Empirical: A scientific software library for research, education, and public engagement},
journal = {Journal of Open Source Software},
volume = {9},
number = {98},
pages = {6617},
doi = {10.21105/joss.06617},
url = {https://doi.org/10.21105/joss.06617},
}
Citation
Vostinar, A., Lalejini, A., Ofria, C., Dolson, E., & Moreno, M. A. (2024). Empirical: A scientific software library for research, education, and public engagement. Journal of Open Source Software, 9(98), 6617, https://doi.org/10.21105/joss.06617
View at Publisher
Authors | Emily Dolson, Santiago Rodriguez-Papa, Matthew Andres Moreno |
Date | May 15th, 2024 |
DOI | 10.48550/arXiv.2405.09389 |
Venue | arXiv |
Abstract
In silico evolution instantiates the processes of heredity, variation, and differential reproductive success (the three “ingredients” for evolution by natural selection) within digital populations of computational agents. Consequently, these populations undergo evolution, and can be used as virtual model systems for studying evolutionary dynamics. This experimental paradigm — used across biological modeling, artificial life, and evolutionary computation — complements research done using in vitro and in vivo systems by enabling experiments that would be impossible in the lab or field. One key benefit is complete, exact observability. For example, it is possible to perfectly record all parent-child relationships across simulation history, yielding complete phylogenies (ancestry trees). This information reveals when traits were gained or lost, and also facilitates inference of underlying evolutionary dynamics.
The Phylotrack project provides libraries for tracking and analyzing phylogenies in in silico evolution. The project is composed of 1) Phylotracklib: a header-only C++ library, developed under the umbrella of the Empirical project, and 2) Phylotrackpy: a Python wrapper around Phylotracklib, created with Pybind11. Both components supply a public-facing API to attach phylogenetic tracking to digital evolution systems, as well as a stand-alone interface for measuring a variety of popular phylogenetic topology metrics. Underlying design and C++ implementation prioritize efficiency, allowing for fast generational turnover for agent populations numbering in the tens of thousands. Several explicit features (e.g., phylogeny pruning and abstraction) are provided for reducing the memory footprint of phylogenetic information.
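To make the idea of phylogeny tracking with pruning concrete, here is a minimal pure-Python sketch. It is not the Phylotracklib or Phylotrackpy API: the `Taxon` and `PhylogenyTracker` names, their methods, and the one-taxon-per-organism simplification are all assumptions made for illustration. The sketch records parent-child links as organisms are born and, when an organism dies, discards any lineage that has no surviving descendants.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Taxon:
    """One node in the ancestry tree (hypothetical structure)."""

    uid: int
    parent: Optional["Taxon"] = None
    num_offspring: int = 0  # child nodes that are still being tracked
    alive: bool = True


class PhylogenyTracker:
    """Toy tracker: records births and prunes extinct lineages on death."""

    def __init__(self) -> None:
        self._next_uid = 0
        self.taxa: Dict[int, Taxon] = {}

    def add_organism(self, parent: Optional[Taxon] = None) -> Taxon:
        # Register a newborn and link it to its parent, if any.
        taxon = Taxon(uid=self._next_uid, parent=parent)
        self._next_uid += 1
        self.taxa[taxon.uid] = taxon
        if parent is not None:
            parent.num_offspring += 1
        return taxon

    def remove_organism(self, taxon: Taxon) -> None:
        # Mark the organism dead, then walk toward the root and drop every
        # node that is dead and has no tracked offspring left.
        taxon.alive = False
        node: Optional[Taxon] = taxon
        while node is not None and not node.alive and node.num_offspring == 0:
            del self.taxa[node.uid]
            if node.parent is not None:
                node.parent.num_offspring -= 1
            node = node.parent
```

With pruning of this kind, memory use scales with the ancestry of the living population rather than with every organism that has ever existed.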
BibTeX
@misc{dolson2024phylotrack,
doi={10.48550/arXiv.2405.09389},
url={https://arxiv.org/abs/2405.09389},
title={Phylotrack: C++ and Python libraries for in silico phylogenetic tracking},
author={Emily Dolson and Santiago Rodriguez-Papa and Matthew Andres Moreno},
year={2024},
eprint={2405.09389},
archivePrefix={arXiv},
primaryClass={q-bio.PE}
}
Citation
Dolson, E., Rodriguez-Papa, S., & Moreno, M. A. (2024). Phylotrack: C++ and Python libraries for in silico phylogenetic tracking. arXiv preprint arXiv:2405.09389. https://doi.org/10.48550/arXiv.2405.09389
View at Publisher
Authors | Matthew Andres Moreno |
Date | March 24th, 2024 |
Venue | Python package published via PyPI |
qspool is a dependency-free solution to spool jobs into the SLURM scheduler without exceeding queue capacity limits.
BibTeX
@software{moreno2024qspool,
author = {Matthew Andres Moreno},
title = {mmore500/qspool},
month = mar,
year = 2024,
publisher = {Zenodo},
doi = {10.5281/zenodo.10864602},
url = {https://doi.org/10.5281/zenodo.10864602}
}
Citation
Matthew Andres Moreno. (2024). mmore500/qspool. Zenodo. https://doi.org/10.5281/zenodo.10864602
Supporting Materials
View at Publisher
Authors | Matthew Andres Moreno |
Date | March 21st, 2024 |
Venue | Python package published via PyPI |
pecking identifies the set of lowest-ranked groups and set of highest-ranked groups in a dataset using nonparametric statistical tests
BibTeX
@software{moreno2024pecking,
author = {Matthew Andres Moreno},
title = {mmore500/pecking},
month = feb,
year = 2024,
publisher = {Zenodo},
doi = {10.5281/zenodo.10701185},
url = {https://doi.org/10.5281/zenodo.10701185}
}
Citation
Matthew Andres Moreno. (2024). mmore500/pecking. Zenodo. https://doi.org/10.5281/zenodo.10701185
Supporting Materials
View at Publisher
Authors | Matthew Andres Moreno |
Date | March 11th, 2024 |
Venue | Python package published via PyPI |
colorclade draws phylogenies with hierarchical coloring for easier visual comparison
BibTeX
@software{moreno2024colorclade,
author = {Matthew Andres Moreno},
title = {mmore500/colorclade},
month = mar,
year = 2024,
publisher = {Zenodo},
doi = {10.5281/zenodo.10802404},
url = {https://doi.org/10.5281/zenodo.10802404}
}
Citation
Matthew Andres Moreno. (2024). mmore500/colorclade. Zenodo. https://doi.org/10.5281/zenodo.10802404
Supporting Materials
View at Publisher
Authors | Matthew Andres Moreno |
Date | February 20th, 2024 |
Venue | Python package published via PyPI |
joinem provides a CLI for fast, flexible concatenation of tabular data using polars
BibTeX
@software{moreno2024joinem,
author = {Matthew Andres Moreno},
title = {mmore500/joinem},
month = feb,
year = 2024,
publisher = {Zenodo},
doi = {10.5281/zenodo.10701182},
url = {https://doi.org/10.5281/zenodo.10701182}
}
Citation
Matthew Andres Moreno. (2024). mmore500/joinem. Zenodo. https://doi.org/10.5281/zenodo.10701182
Supporting Materials
View at Publisher
Authors | Matthew Andres Moreno |
Date | December 22nd, 2023 |
Venue | Python package published via PyPI |
outset adds zoom indicators, insets, and magnified panels to matplotlib/seaborn visualizations with ease!
BibTeX
@software{moreno2023outset,
author = {Matthew Andres Moreno},
title = {mmore500/outset},
month = dec,
year = 2023,
publisher = {Zenodo},
doi = {10.5281/zenodo.10426106},
url = {https://doi.org/10.5281/zenodo.10426106}
}
Citation
Matthew Andres Moreno. (2023). mmore500/outset. Zenodo. https://doi.org/10.5281/zenodo.10426106
Supporting Materials
- documentation via GitHub Pages
- source archive via Zenodo
- A Killer Fix for Scrunched Axes, Step-by-step, article via Towards Data Science
- A Comprehensive Guide to Inset Axes in Matplotlib, article via Towards Data Science
- Let Your Data Breathe: Tips, tricks, & tools to level up your FacetGrid game, article via Level Up Coding
View at Publisher
Authors | Emily Dolson, Santiago Rodriguez-Papa, Matthew Andres Moreno |
Date | January 1st, 2022 |
Venue | Python package published via PyPI |
phylotrackpy is a Python phylogeny tracker.
BibTeX
@misc{dolson2024phylotrack,
doi={10.48550/arXiv.2405.09389},
url={https://arxiv.org/abs/2405.09389},
title={Phylotrack: C++ and Python libraries for in silico phylogenetic tracking},
author={Emily Dolson and Santiago Rodriguez-Papa and Matthew Andres Moreno},
year={2024},
eprint={2405.09389},
archivePrefix={arXiv},
primaryClass={q-bio.PE}
}
Citation
Dolson, E., Rodriguez-Papa, S., & Moreno, M. A. (2024). Phylotrack: C++ and Python libraries for in silico phylogenetic tracking. arXiv preprint arXiv:2405.09389. https://doi.org/10.48550/arXiv.2405.09389
Supporting Materials
View at Publisher
Authors | Matthew Andres Moreno |
Date | January 1st, 2022 |
Venue | Python package published via PyPI |
opytional makes working with values that might be None safer and easier.
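As a rough illustration of the kind of helper this description suggests, the sketch below defines two small functions for handling values that may be None. The names `map_optional` and `value_or` are hypothetical, chosen for this example, and should not be read as the opytional API.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_optional(value: Optional[T], func: Callable[[T], U]) -> Optional[U]:
    """Apply func to value unless value is None, in which case pass None on."""
    return None if value is None else func(value)


def value_or(value: Optional[T], default: T) -> T:
    """Return value unless it is None, in which case return default."""
    return default if value is None else value
```

Testing explicitly against None, rather than writing `value or default`, keeps legitimate falsy values such as `0` or an empty string from being silently replaced.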
Supporting Materials
View at Publisher
Authors | Matthew Andres Moreno |
Date | January 1st, 2022 |
Venue | Python package published via PyPI |
interval-search provides predicate-based binary and doubling search implementations.
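The two techniques named here can be sketched in a few lines of self-contained Python. This is an illustrative implementation rather than the interval-search API: the function names and signatures are assumptions. It also assumes the predicate is monotone over the integers, meaning it is False up to some threshold and True from there on.

```python
from typing import Callable, Optional


def binary_search(
    predicate: Callable[[int], bool], lower: int, upper: int
) -> Optional[int]:
    """Return the smallest n in [lower, upper] satisfying predicate.

    Returns None if no value in the interval satisfies it.
    """
    if lower > upper or not predicate(upper):
        return None
    while lower < upper:
        mid = (lower + upper) // 2
        if predicate(mid):
            upper = mid  # mid satisfies the predicate; answer is at or below it
        else:
            lower = mid + 1  # mid fails; answer is strictly above it
    return lower


def doubling_search(predicate: Callable[[int], bool], lower: int = 0) -> int:
    """Return the smallest n >= lower satisfying predicate.

    No upper bound is needed, but some satisfying value must exist,
    otherwise the loop never terminates.
    """
    step = 1
    prev = lower  # smallest value not yet ruled out
    bound = lower
    while not predicate(bound):
        prev = bound + 1
        bound = lower + step
        step *= 2  # grow the probed offset geometrically
    return binary_search(predicate, prev, bound)
```

For example, `doubling_search(lambda n: n * n >= 50)` probes 0, 1, 2, 4, and 8, then finishes with a binary search over the interval from 5 to 8.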
Supporting Materials
View at Publisher
Authors | Matthew Andres Moreno, Emily Dolson, Charles Ofria |
Date | January 1st, 2022 |
Venue | Python package published via PyPI |
hstrat enables phylogenetic inference on distributed digital evolution populations.
BibTeX
@article{moreno2022hstrat,
doi = {10.21105/joss.04866},
url = {https://doi.org/10.21105/joss.04866},
year = {2022},
publisher = {The Open Journal},
volume = {7},
number = {80},
pages = {4866},
author = {Matthew Andres Moreno and Emily Dolson and Charles Ofria},
title = {hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations},
journal = {Journal of Open Source Software}
}
Citation
Moreno, M. A., Dolson, E., & Ofria, C. (2022). hstrat: a Python Package for phylogenetic inference on distributed digital evolution populations. Journal of Open Source Software, 7(80), 4866, https://doi.org/10.21105/joss.04866
Supporting Materials
View at Publisher
Authors | Matthew Andres Moreno, Santiago Rodriguez Papa |
Date | January 1st, 2022 |
Venue | Python package published via PyPI |
alifedata-phyloinformatics-convert helps apply traditional phyloinformatics software to alife standardized data.
BibTeX
@software{moreno2024apc,
author = {Matthew Andres Moreno and Santiago {Rodriguez Papa}},
title = {mmore500/alifedata-phyloinformatics-convert},
month = feb,
year = 2024,
publisher = {Zenodo},
doi = {10.5281/zenodo.10701178},
url = {https://doi.org/10.5281/zenodo.10701178}
}
Citation
Matthew Andres Moreno, Santiago Rodriguez Papa. (2024). mmore500/alifedata-phyloinformatics-convert. Zenodo. https://doi.org/10.5281/zenodo.10701178
Supporting Materials
Authors | Matthew Andres Moreno, Santiago Rodriguez Papa |
Date | May 26th, 2020 |
Venue | Workshop for Avida-ED Software Development |
Hands-on, asynchronous four-day tutorial series covering foundational web development competencies, C++ development with the Empirical library, and compiling for the web with Emscripten.
View at Publisher
Authors | Matthew Andres Moreno |
Date | January 1st, 2020 |
Venue | Python package published via PyPI |
teeplot wrangles your data visualizations out of notebooks for you.
BibTeX
@software{moreno2023teeplot,
author = {Matthew Andres Moreno},
title = {mmore500/teeplot},
month = dec,
year = 2023,
publisher = {Zenodo},
doi = {10.5281/zenodo.10440670},
url = {https://doi.org/10.5281/zenodo.10440670}
}
Citation
Matthew Andres Moreno. (2023). mmore500/teeplot. Zenodo. https://doi.org/10.5281/zenodo.10440670
Authors | Matthew Andres Moreno, Santiago Rodriguez Papa, Alexander Lalejini, Charles Ofria |
Date | January 1st, 2020 |
Venue | header-only C++ library |
A genetic programming implementation designed for large-scale artificial life applications. Organized as a header-only C++ library. Inspired by Alex Lalejini’s SignalGP.
BibTeX
@misc{moreno2021signalgp,
doi = {10.48550/ARXIV.2108.00382},
url = {https://arxiv.org/abs/2108.00382},
author = {Moreno, Matthew Andres and Rodriguez Papa, Santiago and Lalejini, Alexander and Ofria, Charles},
keywords = {Neural and Evolutionary Computing (cs.NE), FOS: Computer and information sciences},
title = {SignalGP-Lite: Event Driven Genetic Programming Library for Large-Scale Artificial Life Applications},
publisher = {arXiv},
year = {2021},
copyright = {arXiv.org perpetual, non-exclusive license}
}
Citation
Moreno, M. A., Rodriguez Papa, S., Lalejini, A., & Ofria, C. (2021). SignalGP-Lite: Event Driven Genetic Programming Library for Large-Scale Artificial Life Applications. arXiv preprint arXiv:2108.00382.
Supporting Materials
Authors | Matthew Andres Moreno, Santiago Rodriguez Papa, Charles Ofria |
Date | January 1st, 2020 |
Venue | header-only C++ library |
C++ library that wraps intra-thread, inter-thread, and inter-process communication in a uniform, modular, object-oriented interface, with a focus on asynchronous high-performance computing applications.
BibTeX
@inproceedings{moreno2021conduit,
author = {Moreno, Matthew Andres and Rodriguez Papa, Santiago and Ofria, Charles},
title = {Conduit: A C++ Library for Best-Effort High Performance Computing},
year = {2021},
isbn = {9781450383516},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3449726.3463205},
doi = {10.1145/3449726.3463205},
booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference Companion},
pages = {1795--1800},
numpages = {6},
keywords = {high performance computing, best-effort computing},
location = {Lille, France},
series = {GECCO '21}
}
Citation
Matthew Andres Moreno, Santiago Rodriguez Papa, and Charles Ofria. 2021. Conduit: a C++ library for best-effort high performance computing. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '21). Association for Computing Machinery, New York, NY, USA, 1795–1800. https://doi.org/10.1145/3449726.3463205
Supporting Materials
Authors | Matthew Andres Moreno, Santiago Rodriguez Papa, Katherine Perry, Charles Ofria |
Date | January 1st, 2020 |
Venue | header-only C++ library |
C++ library for digital evolution simulations studying digital multicellularity and fraternal major evolutionary transitions in individuality.
Supporting Materials
View at Publisher
Authors | Matthew Andres Moreno |
Date | January 1st, 2019 |
Venue | Python package published via PyPI |
keyname helps easily pack and unpack metadata in a filename.
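A minimal sketch of the general idea follows, assuming a hypothetical filename convention in which `key=value` pairs are joined with `+` and followed by an optional extension. Both the convention and the `pack`/`unpack` signatures are assumptions made for illustration, not a statement of what keyname does.

```python
from typing import Dict


def pack(metadata: Dict[str, str], ext: str = "") -> str:
    """Encode metadata as 'key=value' pairs joined by '+', plus an extension."""
    for key, value in metadata.items():
        for text in (key, value):
            # Reserved characters would make the filename ambiguous to decode.
            if any(ch in text for ch in "=+/"):
                raise ValueError(f"reserved character in {text!r}")
    stem = "+".join(f"{k}={v}" for k, v in sorted(metadata.items()))
    return stem + ext


def unpack(filename: str, ext: str = "") -> Dict[str, str]:
    """Decode a filename produced by pack back into a metadata dict."""
    stem = filename
    if ext and filename.endswith(ext):
        stem = filename[: len(filename) - len(ext)]
    if not stem:
        return {}
    return dict(pair.split("=", 1) for pair in stem.split("+"))
```

Under this assumed convention, `pack({"seed": "42", "treatment": "control"}, ext=".csv")` yields `seed=42+treatment=control.csv`, and `unpack` with the same `ext` recovers the original dictionary.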
Supporting Materials
Date | January 1st, 2018 |
Venue | header-only C++ library |
Empirical is a library of tools for developing useful, efficient, reliable, and available scientific software. The provided code is header-only and encapsulated into the emp namespace, so it is simple to incorporate into existing projects.
BibTeX
@software{Ofria_Empirical_C_library_2020,
author = {Ofria, Charles and Moreno, Matthew Andres and Dolson, Emily and Lalejini, Alex and Rodriguez Papa, Santiago and Fenton, Jake and Perry, Katherine and Jorgensen, Steven and hoffmanriley and grenewode and Baldwin Edwards, Oliver and Stredwick, Jason and cgnitash and theycallmeHeem and Vostinar, Anya and Moreno, Ryan and Schossau, Jory and Zaman, Luis and djrain},
doi = {10.5281/zenodo.4141943},
license = {MIT},
month = {10},
title = {{Empirical: C++ library for efficient, reliable, and accessible scientific software}},
url = {https://github.com/devosoft/Empirical},
version = {0.0.4},
year = {2020}
}
Citation
Ofria, C., Moreno, M. A., Dolson, E., Lalejini, A., Rodriguez Papa, S., Fenton, J., Perry, K., Jorgensen, S., hoffmanriley, grenewode, Baldwin Edwards, O., Stredwick, J., cgnitash, theycallmeHeem, Vostinar, A., Moreno, R., Schossau, J., Zaman, L., & djrain. (2020). Empirical: C++ library for efficient, reliable, and accessible scientific software (Version 0.0.4) [Computer software]. https://doi.org/10.5281/zenodo.4141943