Week 1, 2020: Version control with git and Github
🔗 What are we going to be talking about and why?
Git is a program that keeps track of changes that different people make to a piece of software and facilitates merging them together. While git is useful for projects of all sizes, for as complex a collaborative project as Empirical, MABE, or Avida it is absolutely essential. It lets us all work on the code at once, undo changes that break stuff, and keep track of what state the code was in when used it to run a set of experiments (important for doing science!).
Github is a website where people host git repositories. Importantly, it is where the repositories for all of our code live, and as a result it’s a big part of our development workflow. In today’s encrichment, we’ll be talking about these tools, taking a tour through the Empirical repo, making some practice pull requests, and, if we have time, having some discussions.
🔗 Pre-Discussion Learning Material
Everyone’s got different levels of comfort and experience with git coming into this, so we’re doing a self-paced lesson to get everyone up to speed. I love teaching people git interactively, though, so please feel free to ask questions or summon me to a zoom room to explain stuff!
-
Software carpentry lesson on git:
This is a tutorial that covers all the most important features of git in a way that’s easy to follow along with on your own.
It’s also the resource most of my demo was drawn from.
- Priority: Be comfortable with the material in lessons 1 - 4 and 7 (it will be very basic to some and not at all to others)
- Ideally: Also be comfortable with 8 and 9
- The stuff in 5 and 6 will improve your life and I recommend looking through it, but it is not essential to know before the enrichment session.
🔗 Notes on what we’ll be covering
🔗 Branches and forks
- A branch is a different sequence of changes to the code that exists within a repo
- A fork is another copy of a repository
- You can fork a repo on Github by clicking on the “fork” button.
🔗 Empirical Github notes
- When you start working on a new feature, open a pull request and give it the “Merge: not ready” label. This way we can all see what everyone’s working on, but no one will accidentally merge in your code before it’s ready. Also add other appropriate labels describing what you’re working on.
- Once you’re done developing a new feature, follow these steps:
- Make sure all of the tests pass (there will be a warning if they aren’t currently)
- Make sure you’ve written enough tests for your new code to meet the minimum requirements (overall test coverage for the repository can’t decrease and also the files you touched need to meet a certain minimum coverage level). Caveat: we can’t currently measure coverage for web tests, but you should still test your web code.
- Add the “Merge: ready” label
- Get an Empirical maintainer (which includes most of the mentors) to review and approve your code111
- If you encounter a bug in the code currently in the master branch of Empirical, open an issue on github describing the observed behavior and why it doesn’t match the correct behavior.
- Do not commit directly to the master branch. If you accidentally do this on your computer, definitely do not push it the master branch on the Empirical repository. In general, you should be doing all of your development either on a different branch in the main repository, or in your own fork of the repository (and then, still probably in a branch other than master).
- We’ve got a few external tools hooked into the GitHub repository:
- Readthedocs, which builds our documentation
- Travis CI which runs all of our automated tests and tells us if they fail
- Codecov, which tells us how much of our code our tests actually test (note that we currently can’t measure code coverage for web-only code, although we do have tests for it)
🔗 Making a pull request
- A pull request is a request that someone else pull from your branch or fork of a git repo. It’s how we contribute code to repositories in an orderly way
- Steps:
- Create the branch or fork you’re making the pull request from. In our demo today, we’ll make a fork of the WAVES workshop repo. You can do that by clicking the link to the repo, and clicking the fork button in the upper right.
- Make the changes to the code that you want to make. In this case, that will be createing a stub for a blog post about your project. To do that, go into the blog/_posts directory and click the “create new file” button. Your file name should follow the convention of the file that’s aleady present (year-month-day-post-title) and should begin with a header like the one in the existing file.
- Go back to the main WAVES repo and click “pull requests” at the top, and then the green “create pull request” button. Click the “compare across forks” link on the next page to tell Github you want to make a pull request from your fork. Then select your fork from the dropdown menu on the right, and select the correct branch of it from the dropdown next to that.
- This will take you to a screen where you can give your pull request a title and note describing what it does. Then you can click the submit button to create the pull request. You’re done!
🔗 Open source development and licenses
- A lot of code on Github, including Empirical, is open-source. That means all of it is freely available for others to view. The specific rules on what people can do with open source code vary based on the exact license that that code is under.
- When you make your code publicly available, you should put a license on it. If you don’t you’re creating ambiguity about in what ways you’re comfortable with people using your code.
- Some core questions to ask yourself when choosing a license:
- Do I want people who are using my code to give me credit (attribution)?
- Do I care if other people profit off of my code (commercial vs. non-commercial use)?
- Is it okay for people to modify my code or can they only reuse it exactly as-is?
- Do I want all code based on mine to use a similar license (infective license/copyleft)?
- Note: This can have unanticipated side effects
- This website is a great guide to choosing a license
- There are a lot of benefits to open source code, especially in science (since we care more about scientific credit than profit): helps build a community of users that are invested and contribute back to the software, can be widely used (does more good in the world!), and allows reproducible science.
🔗 Conversation Starters/Discussion Questions
- How often do you think you should make commits in git?
- When should you make new branches?
- Have you encountered any best practices for working with git that you want to share?
- Why do merge conflicts happen?
- What license do you think scientific software should use?
🔗 Other resources
- Levels to git mastery: Great resource quickly summarizing the main commands you need to know at each level of understanding of git (thanks to Jory for suggesting it!).
- Git branch explainer: A reference for learning more about how branches work
- Git flow: A way of thinking about how to organize branches in a large software project. We don’t 100% conform to this in Empirical yet, but it’s useful to think about and move closer to.
- Refined github: this is a browser extension that I recently discovered and totally love. It alters GitHub to add additional features. They’re mostly on the advanced side, so this might be something to check out once you’re comfortable with the standard interface. My favorite feature is that it gives you a button to revert changes on a file that you accidentally included in a pull request.