Understanding the “Repository Pattern”

“Repository” is a heavily overloaded term in software engineering, so it’s important to understand its different facets: the repository as an in-memory collection facade, its role as a transaction script layer, its use as a generic data access primitive, its role within domain-driven design, and its general place in a developer’s toolkit.

All of these facets are closely related: each one is an evolution of the original repository pattern.

The pattern predates the Gang of Four, Domain Driven Design, and many of the popular frameworks and strategies that are out there today. You can read about repositories in An Introduction to Software Architecture by David Garlan and Mary Shaw (1994). So, what is it? The gist of it is:

Create a wrapper around all of your data access logic with a database-agnostic interface, so that you can defer commitment to a particular datastore.

The main benefit here is that at any time you can swap the underlying data store by implementing the same interface in a different repository, and your business logic remains unchanged. Of course you’d still need to migrate the data, so it’s not all that easy, but it does insulate your business logic from the infrastructural decision of picking a database. A skeptic might ask: how often do you actually change your entire db? Not that often. But you cache expensive functions all the time, which I’ll address more in a bit.
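As a rough illustration of the idea above, here is a minimal TypeScript sketch. The `User` type, the `UserRepository` interface, and the `changeEmail` function are all hypothetical names made up for this example; the in-memory class simply stands in for whatever datastore you would really use.

```typescript
// Hypothetical domain type, used only for illustration.
interface User {
  id: string;
  email: string;
}

// The database-agnostic interface that business logic depends on.
interface UserRepository {
  findById(id: string): Promise<User | undefined>;
  save(user: User): Promise<void>;
}

// One implementation, backed by a Map. A class backed by a real database
// would implement the same interface with real queries.
class InMemoryUserRepository implements UserRepository {
  private readonly users = new Map<string, User>();

  async findById(id: string): Promise<User | undefined> {
    const user = this.users.get(id);
    // Return a copy so callers cannot mutate stored state by reference.
    return user ? { ...user } : undefined;
  }

  async save(user: User): Promise<void> {
    this.users.set(user.id, { ...user });
  }
}

// Business logic only knows about the interface, never the datastore.
async function changeEmail(
  repo: UserRepository,
  id: string,
  email: string
): Promise<User> {
  const user = await repo.findById(id);
  if (!user) {
    throw new Error(`User ${id} not found`);
  }
  const updated: User = { ...user, email };
  await repo.save(updated);
  return updated;
}
```

Swapping the datastore would then mean writing another class that implements `UserRepository` and passing it to `changeEmail`; the function itself would not change.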

There are other benefits worth mentioning:

Testability: it becomes much easier to test data access logic in isolation. It’s also easy to stub repository methods, so you don’t need to store enormous amounts of test seed data.
Indexing: with all of your data access logic for a table or a collection in one place, it’s easy to spot opportunities to add indexes or optimize queries.
Cache-ability: since your data access logic doesn’t bleed into the business domain, you can easily create higher-order functions which cache high-throughput calls either in memory (careful!) or in a distributed cache.
Reusability: since your data access logic is all in the same place, you’re bound to find reuse opportunities.
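To make the caching point a bit more concrete, here is one possible sketch of a higher-order function that wraps a repository lookup. The `Product` shape, the `FindProduct` signature, and `withCache` are assumptions invented for this example, and the cache is a plain in-memory Map with a time-to-live rather than a distributed cache.

```typescript
// Hypothetical product shape and lookup signature, for illustration only.
interface Product {
  id: string;
  name: string;
}

type FindProduct = (id: string) => Promise<Product | undefined>;

interface CacheEntry {
  value: Product | undefined;
  expiresAt: number;
}

// Wraps any lookup function with a simple in-memory cache.
// `now` is injectable so expiry can be tested without waiting.
function withCache(
  find: FindProduct,
  ttlMs: number,
  now: () => number = Date.now
): FindProduct {
  const cache = new Map<string, CacheEntry>();

  return async (id: string) => {
    const hit = cache.get(id);
    if (hit && hit.expiresAt > now()) {
      return hit.value; // still fresh, skip the datastore
    }
    const value = await find(id);
    cache.set(id, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

Usage would look something like `const findProduct = withCache((id) => productRepository.findById(id), 60000)`, where `productRepository` is whatever repository you already have. Note that this Map never evicts entries, which is part of why in-memory caching deserves the “careful!” above. The same trick of passing in a function also covers the testability point: a test can hand over a stub lookup instead of a real repository method.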

So, what does a repository look like? The most basic is a single file which contains all of your data access logic. Nothing fancy, just database queries, related helper functions, index declarations, and anything else database related. You can of course separate out persistence models into their own files, whatever makes sense. This is the core of the “repository design pattern”. Some may argue that it’s actually “transaction scripting”, but in reality it is a very specialized subtype of that, one which exposes operations as if you were operating on an in-memory set.

Want to see some code? There are plenty of resources from reputable sources available online. Here’s an old implementation in .NET which also speaks to the pattern a bit (I’m sure there are newer versions). I’m a JavaScript / TypeScript developer, so if you want to see concrete, usable implementations of the repository pattern, check out this GitHub repository, where you can find working, tested repositories for almost a dozen different data stores, from MongoDB to Redis to Google Cloud Firestore. Star it if you find it useful, or, better yet, submit a PR with a datastore you use if you don’t see it listed! Note: the GitHub repo is meant to be used as example boilerplate for “base class implementations”, and less as a downloadable NPM package.

As your application grows, you need to determine a model partitioning strategy. A popular framework for doing so is Domain-Driven Design, or DDD. Eric Evans (author of Domain-Driven Design: Tackling Complexity in the Heart of Software) defines a Repository as part of what he calls “Tactical DDD”, which is just a concrete implementation of the generic repository. It’s “tactical” because while it separates your domain service layer from your data access layer, it does not address the core concern of DDD, which is forming a ubiquitous language within your team and organizing around discrete problem sets. In strategic domain-driven design, the goal is to partition your model around “bounded contexts”, which are the boundaries within the business domain where terminology remains consistent.

In this context, the DDD Repository is a “repository implementation” with the added constraint that it provides data access for an aggregate root or an entire bounded context.

To contrast with what was mentioned above, a “vanilla” repository implementation (or, tactical only) might lead to Noun-Driven Development, where every noun or table in the database has its own Repository, Service, Controller, etc. (hexagonal architecture constructs). With DDD, you would only have repositories for aggregate roots, such as a User repository which provides access to anything within the user boundary, and a separate Product repository which provides whatever’s needed from the product context.
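Here is a small sketch of what a repository scoped to an aggregate root might look like. Everything in it is hypothetical: I’m assuming a user aggregate that owns a list of addresses, and an in-memory store that serializes the whole aggregate as one unit. The point is only that there is no separate address repository.

```typescript
// Hypothetical aggregate: a user owns its addresses, so addresses are
// only ever loaded and saved through the user repository.
interface Address {
  label: string;
  city: string;
}

interface UserAggregate {
  id: string;
  name: string;
  addresses: Address[];
}

interface UserRepository {
  findById(id: string): Promise<UserAggregate | undefined>;
  save(user: UserAggregate): Promise<void>;
}

class InMemoryUserRepository implements UserRepository {
  // Each aggregate is stored as a single serialized document.
  private readonly rows = new Map<string, string>();

  async findById(id: string): Promise<UserAggregate | undefined> {
    const raw = this.rows.get(id);
    return raw === undefined ? undefined : (JSON.parse(raw) as UserAggregate);
  }

  async save(user: UserAggregate): Promise<void> {
    this.rows.set(user.id, JSON.stringify(user));
  }
}

// Changes to child entities go through the aggregate root.
async function addAddress(
  repo: UserRepository,
  userId: string,
  address: Address
): Promise<UserAggregate> {
  const user = await repo.findById(userId);
  if (!user) {
    throw new Error(`User ${userId} not found`);
  }
  const updated: UserAggregate = {
    ...user,
    addresses: [...user.addresses, address],
  };
  await repo.save(updated);
  return updated;
}
```

In a noun-driven design the same feature would likely involve a second repository for addresses; here the user boundary decides what gets loaded and saved together.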

Your mileage may vary on how many layers are required for the domain or application you’re working in, but in a large, complex domain it’s worth noting that repositories shouldn’t speak to each other across contexts. For example, a user repository shouldn’t fetch favorite products directly from the product repository. Cross-context communication should go through the service layer, with each service communicating directly with its own repository. The same boundaries apply within the db layer as well: avoid joining directly to outside tables.
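The favorite products example could be wired up roughly like this. The interfaces, class names, and method names are all assumptions made for the sketch; what matters is that the user service talks to the product service, and each service only touches its own repository.

```typescript
// Hypothetical types and interfaces, for illustration only.
interface Product {
  id: string;
  name: string;
}

// Each context exposes its own repository interface.
interface ProductRepository {
  findByIds(ids: string[]): Promise<Product[]>;
}

interface UserRepository {
  findFavoriteProductIds(userId: string): Promise<string[]>;
}

class ProductService {
  private readonly products: ProductRepository;

  constructor(products: ProductRepository) {
    this.products = products;
  }

  getProducts(ids: string[]): Promise<Product[]> {
    return this.products.findByIds(ids);
  }
}

class UserService {
  private readonly users: UserRepository;
  private readonly productService: ProductService;

  constructor(users: UserRepository, productService: ProductService) {
    this.users = users;
    this.productService = productService;
  }

  async getFavoriteProducts(userId: string): Promise<Product[]> {
    // The user context only stores product ids...
    const ids = await this.users.findFavoriteProductIds(userId);
    // ...and asks the product context for the details via its service.
    return this.productService.getProducts(ids);
  }
}
```

The user repository never imports the product repository, and no query joins across the two contexts; the only crossing point is the call from `UserService` to `ProductService`.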

So, in conclusion, implementing a repository is almost always a good idea because it requires very little boilerplate. You never know when you may need to make a change, add a cache, or add a little extra test coverage, and a repository makes all of those much easier.

As always, post below if you have questions or want to know next steps and I promise to address them in a future post!

Ben Lugavere is a Lead Engineer and Architect at Boxed where he develops primarily using JavaScript and TypeScript. Ben writes about all things JavaScript, system design and architecture and can be found on twitter at @benlugavere.
