Five months ago, we released a small library called Analytics.js by submitting it to Hacker News. A couple hours in it hit the #1 spot, and over the course of the day it grew from just 20 stars to over 1,000. Since then we’ve learned a ton about managing an open-source library, so I wanted to share some of those tips.
At the very beginning, we knew absolutely nothing about managing an open-source library. I don’t think any of us had even been on the merging side of a pull request before. So we had to learn fast.
Since Analytics.js has over 2,000 stars now, lots of people are making amazing contributions from the open-source community. Along the way, we’ve learned a lot about what we can do to keep pull requests top-quality, and how to streamline the development process for contributors.
1. Keep a consistent style.
New contributors will look at your existing codebase to learn how to add functionality to your library. And that’s exactly what they should be doing. Every developer wants to match the structure of a library they contribute to, but don’t own. Your job as a maintainer is to make that as easy as possible.
The trouble starts when your library leaves ambiguity in its source. If you do the same thing two different ways in two different places, how are contributors going to know which way is recommended? Answer: they won’t.
In the worst case they might even decide that because you aren’t consistent, they don’t have to be either!
Solving this takes a lot of discipline and consistency. As a rule, you shouldn’t experiment with different styles inside a single open source repository. If you want to change styles, do it quickly and globally. Otherwise, newcomers won’t be able to differentiate new conventions from ones you abandoned months ago.
We started off being very poorly equipped to handle this. All of our code lived in a single file, and the functions weren’t organized at all. (And if you check out the commits, that was after a cleanup!) We hadn’t taken the time to set a consistent style - the library was a jumble of different conventions.
As the pull requests started coming in, each one conflicted with the others. Everyone was modifying the same parts of the same files and adding their own utility functions wherever they felt like it.
Which leads into my next point…
2. Make the “right” way, the easy way.
The initial way we structured our code was leading to loads of problems with merging pull requests: namely we had no structure! One of the big changes we made to fix our structure was moving over to Component.
We love Component because it eliminates the magic from our code and reduces our library’s scope. It lets us use CommonJS, so we can just
require the modules we need, right from where we need them. Everything is explicit, which means our is code much easier for newcomers to follow. It’s a maintainer’s dream.
While making the switch, we wrote a bunch of our own components to replace all the utility functions we had been attaching to our global
analytics object. And now, since components are easy to include and use everywhere else in the library, pull requesters just use them by default!
As soon as we released the right way and made it clear, pull request quality went up dramatically.
As far as keeping a consistent style goes, you have to be militant when it comes to new code. You cannot be afraid of commenting on pull requests even if it seems like a minor style correction, or refusing requests which needlessly clutter your API.
And remember, that goes for your own code as well! If you get lazy while adding new features, why shouldn’t contributors? The more clean code in your repository, the more good examples you have for newcomers to learn from.
Speaking of not getting lazy…
3. Write tests, and hook up Travis.
Having great test coverage is easily the best way to speed up development. We push changes all the time, so we can’t afford to spend time worrying about breaking existing functionality. We write lots of tests, and get lots of benefits: much fastoer development, more confidence in our own code, more trust from outsiders, and…
It also leads to much higher-quality pull requests!
When developers copy the library coding style, that extends to tests as well. We don’t let contributors add their own code without adding corresponding tests.
The good part is we don’t have to enforce this too much. Thanks to Travis-CI, nearly all of our pull requests come complete with tests patterned from our existing tests. And since we’ve made sure our existing tests are high quality, the copied tests naturally start off at a higher bar.
Travis is so well integrated with GitHub, that it will make merging your pull requests significantly easier. Both you and your contributors get notifications when a pull request doesn’t pass all your tests, so they’ll be incentivized to fix a breaking change.
Having a passing Travis badge also inspires trust that your library actually does work correctly and is still maintained. Not to mention peer pressures you into keeping your tests in good condition (which we all know is the hardest part).
4. Version from day one.
Versioning properly, just like testing, takes discipline and is completely worth it. Most developers when they are having issues will immediately check to see if they are running the later version of a library. If not, they’ll update and pray the bug is fixed. Without versioning, you’ll get issues being reported about bugs you’ve already fixed.
When we started, we had no idea how to manage versions at all; our repository was basically just a pile of commits. If you push frequent updates, this will needlessly hassle the developers using your library.
From the start, there are three things your repo needs for versioning:
Readme.mddescribing what your library does.
Version numbers both in the source and in git tags.
History.mdcontaining versioned and dated descriptions of your changes.
Having a changelog is essential for letting developers track down issues. Any time a developer finds a potential bug, the changelog if the first place they’ll go to see if an upgrade will fix it.
Tagging our repository also turned out to be immensely useful. Each version has a well defined point where the code has stabilized. If you’re using a frontend registry like Component or Bower, you also get the advantage of automatic packaging.
Don’t forget to put the version somewhere in your source too, where the developer using the library can access it. Because when you’re helping someone remotely debug you’ll want to have a quick way to check what version they’re running so you don’t waste time needlessly.
Oh, and use semver. It’s the standard for open-source projects.
5. Add a Makefile—the hacker’s instruction manual.
I’ll let you in on a little secret: we have close to no documentation for people wanting to contribute to the library. (That’s another problem we need to fix.) But I’m amazed at how many pull requests we receive even without any building or testing instructions whatsoever.
Just from looking at our repository, how do new developers know how to build the script?
Considering that most of them have never seen a Component-based repository, my best guess is that they learn through our Makefile. Running
make forms a natural entry point to see how a new project works. It’s become the de facto instruction manual for editing code.
We use long flags in our scripts, and give each command a descriptive name. By using Component to structure and a build our library, we can essentially defer to their documentation before writing any of our own. Once you understand how Component works, you know how any Component-based repo is built.
We’ve done a lot to clean up our Makefile through the different iterations of ourcode. Remember, your build and test processes are part of your code as well. They should be clean and readable as they are the starting point newcomers.
6. Keep iterating on your process.
Notice how all the tips I’ve mentioned are about streamlining your process? That’s because maintaining a popular repository is all about staying above water. You’ll be making lots of little changes throughout the day as new issues are filed, and if you don’t optimize your development process all of your free time will evaporate.
Not only that, but the quality of your library will suffer. Without a good build system, automated testing, and a clean codebase, fixing small bugs becomes a chore, so issues start taking longer and longer to resolve. No one wants that.
We’ve learned a lot when it comes to managing a repo of our size, but we still have a long ways to go in terms of managing a really big project. Managing a large open source project takes a lot of work. As the codebase grows, it will be harder and harder to make major overhauls of the code. More and more people will start depending on it, and the number of pull requests will start growing.
We still have a few more TODOs that will hopefully make maintaining Analytics.js even easier:
We want to split up our tests even more to make them as manageable as possible. Right now the file sizes are getting pretty out of control, which means it’s hard for newcomers to keep everything in their head at once.
Add better contributor documentation. It’s kind of ridiculous that we haven’t done this yet. We’re surprised we’ve even gotten any pull requests at all without it, so this is very high on our list.
Start pull requesting every change we make to the repository ourselves as well. This way we can always peer review each other’s changes, and other contributors can get involved with discussions.
Lots of these tactics come from the Node.js source, which has great guidelines for new contributors.
Every Node.js commit is first pull requested, and reviewed by a core contributor before it is merged. New features are discussed first as issues or pull requests, so multiple opinions are considered. Node commit logs are clean, yet detailed. They have an extensive guide for new contributors, and a linter to serve as a rough style guide.
No project is perfect, but learning these lessons firsthand has helped us adopt better practices across the rest of our libraries as well. Hopefully, they’ll help you too!
How to collaborate across marketing & engineering teams when purchasing new technology
Learn how to align and collaborate across marketing and engineering teams, especially when it comes to launching new marketing software that benefits them both.
AI + Personalization: 5 Ways to Use It
Companies that use AI and a CDP can create strong, personalized campaigns that are unique to their customers. This blog explores 5 use cases, complete with examples.
Four recent GDPR changes your business needs to know about
We cover GDPR changes: AI's impact, updated cookie banners, cross-border enforcement law, EU-U.S. Data Privacy Framework; Twilio Segment's role in GDPR compliance through consent, PII protection, local data processing.