How to migrate data when building micro services

When you have a big-ball-of-mud monolith and you get this crazy idea to try building micro services, where do you start? You have users, right? Let’s build a User Service!

Let’s set the stage. You’re using MongoDB. You’ve got a huge users collection, and other collections reference the user’s ID all over the place. Let’s look at a couple of examples. First, orders reference a given user, and when you need the user’s current email, you simply populate the user object and grab the email from there. Easy, right? Second, you need an admin page to modify the user’s attributes, say, to edit their email address. Simple enough.
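To make the coupling concrete, here’s a minimal sketch of those two code paths. It assumes hypothetical `User` and `Order` shapes and uses in-memory `Map`s as stand-ins for the Mongo collections; with Mongoose, the lookup in `getOrderEmail` is roughly what `populate` does for you.

```typescript
// Hypothetical document shapes; in-memory Maps stand in for the Mongo collections.
interface User {
  _id: string;
  email: string;
  name: string;
}

interface Order {
  _id: string;
  userId: string; // reference to User._id
  total: number;
}

const users = new Map<string, User>();
const orders = new Map<string, Order>();

// "Populate" the order's user (an in-code join) and read the current email.
function getOrderEmail(orderId: string): string {
  const order = orders.get(orderId);
  if (!order) throw new Error(`order ${orderId} not found`);
  const user = users.get(order.userId);
  if (!user) throw new Error(`user ${order.userId} not found`);
  return user.email;
}

// Admin page: edit a user's email directly in the shared users collection.
function updateEmail(userId: string, email: string): void {
  const user = users.get(userId);
  if (!user) throw new Error(`user ${userId} not found`);
  users.set(userId, { ...user, email });
}
```

Both functions reach straight into the same users collection. That shared access is exactly what has to go once users move into their own service.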

How does this look in the micro services world? What do we need to change, to decouple users and rip it out into a separate micro service?

Well, the whole point of the micro service architecture is team autonomy, and that’s lost when services share a database, so we’re not going to make that mistake. We need to spin up a new service with its own brand new database. Services need their own databases because the moment something outside the service can manipulate your data, potentially crashing it in unexpected ways, you need to test both processes with every change. That means coupled release cycles and lost autonomy.

First collection? users. Now we’ve got two apps, our main app and our user app.

User creation code, of course, lives in the main app and does all sorts of dependent things. If those things currently happen synchronously (first create the user, then create an address referencing the user), we have two options. We can call the micro service via REST, which might be problematic because the user service could be down. Or we can create the user in the local system and asynchronously construct it in the user service by publishing that event to the event stream.
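Here’s a rough sketch of both options. The `NewUser` and `UserCreatedEvent` shapes are hypothetical, an in-memory array stands in for the event stream, and `UserServiceClient` is a hypothetical stand-in for a REST call to the user service (a real HTTP call would be async; it’s synchronous here to keep the sketch small).

```typescript
// Hypothetical shapes used for illustration only.
interface NewUser {
  id: string;
  email: string;
}

interface UserCreatedEvent {
  type: "USER_CREATED";
  user: NewUser;
}

// Stand-ins: a hypothetical REST client for the user service, an in-memory
// array for the event stream, and Maps for the main app's collections.
type UserServiceClient = (user: NewUser) => void; // throws if the service is down
const userEvents: UserCreatedEvent[] = [];
const localUsers = new Map<string, NewUser>();
const addresses = new Map<string, { userId: string; line1: string }>();

// Option 1: call the user service synchronously. If it's down, signup fails
// before the dependent address is ever written.
function signUpViaRest(client: UserServiceClient, user: NewUser, line1: string): void {
  client(user);
  addresses.set(user.id, { userId: user.id, line1 });
}

// Option 2: create the user locally so dependent writes can proceed, then
// publish an event; the user service builds its copy asynchronously.
function signUpLocallyThenPublish(user: NewUser, line1: string): void {
  localUsers.set(user.id, user);
  addresses.set(user.id, { userId: user.id, line1 });
  userEvents.push({ type: "USER_CREATED", user });
}
```

With the first option, a user service outage fails the whole signup. With the second, the main app keeps working and the user service catches up whenever it consumes the event.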

If there aren’t synchronous dependencies, newly created users can go straight to the new user app via a REST call. Let’s go this route for fun.

Now we’ve got a problem. We’ve got tons of users in our main app and a handful of users in our new user app. The main app populates the user (does an in-code join, for those coming from a SQL background) all the time, and now our new users only exist in the new user database! This is going to cause all sorts of bugs!

To solve this problem, you’ll need to do something we love to do in the NoSQL world: duplicate data! When a user is created in the new user service, you need to somehow notify the main app to duplicate that user in the original system. There are many ways to do this. You can publish a command on a point-to-point queue, or publish a USER_CREATED event to an exchange. You can also use an event stream such as Kafka, so these messages are replayable. Publishing an event is the better option because it’s open to extension: you can add new subscribers without changing the publisher, whereas issuing additional commands requires changing the publisher. So let’s do that.
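A minimal sketch of why the event is open to extension: the publisher below never changes when a new subscriber shows up. The `Exchange` class is an in-memory stand-in for a broker exchange or Kafka topic, and the `UserCreated` fields are assumptions, not a fixed schema.

```typescript
// Hypothetical event shape; the fields are illustrative, not a fixed schema.
interface UserCreated {
  type: "USER_CREATED";
  userId: string;
  email: string;
  occurredAt: string;
}

type Handler = (event: UserCreated) => void;

// In-memory fan-out exchange standing in for a broker exchange or Kafka topic.
class Exchange {
  private subscribers: Handler[] = [];

  subscribe(handler: Handler): void {
    this.subscribers.push(handler);
  }

  publish(event: UserCreated): void {
    // Every subscriber receives the event; the publisher doesn't know who they are.
    for (const handler of this.subscribers) handler(event);
  }
}

// The user service states a fact ("a user was created") instead of commanding anyone.
function createUser(exchange: Exchange, userId: string, email: string): void {
  // ...persist the user to the user service's own database here...
  exchange.publish({ type: "USER_CREATED", userId, email, occurredAt: new Date().toISOString() });
}
```

Subscribing a second consumer (say, a welcome-email sender) next to the main app’s sync consumer requires no change to `createUser`. A log-based stream like Kafka adds something this sketch doesn’t model: a new consumer can also replay past events from the beginning.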

Now we’ve got this event on a stream, and we need a consumer. We can build a thin little process that reads the stream and, for every event, durably writes the user data (for example, through an API on the main app) to our main app’s users collection. This ensures the user is where the main system expects it. We’re going to use this pattern a lot. From now on, whenever something changes on that user, you’ll need to publish an event describing the change, and, depending on whether you move all of your write endpoints to the new service, you may need a consumer to patch that object in the local system.
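Here’s a minimal sketch of that thin consumer. It assumes a hypothetical per-user `version` number on every event (incremented by the user service) and an in-memory `Map` standing in for the main app’s users collection. Streams commonly deliver at least once, so the consumer skips events it has already applied, which keeps redeliveries and replays harmless.

```typescript
// Hypothetical event shapes. `version` is an assumed per-user counter that the
// user service increments on every change.
type UserEvent =
  | { type: "USER_CREATED"; userId: string; version: number; user: { email: string; name: string } }
  | { type: "USER_UPDATED"; userId: string; version: number; changes: { email?: string; name?: string } };

interface LocalUser {
  _id: string;
  email: string;
  name: string;
  version: number;
}

// In-memory stand-in for the main app's users collection.
const mainAppUsers = new Map<string, LocalUser>();

// Apply one event to the main app's copy. Applying the same event twice is a no-op.
function applyUserEvent(event: UserEvent): void {
  const existing = mainAppUsers.get(event.userId);
  // Skip duplicates and stale events; at-least-once delivery can redeliver them.
  if (existing && existing.version >= event.version) return;

  if (event.type === "USER_CREATED") {
    mainAppUsers.set(event.userId, { _id: event.userId, ...event.user, version: event.version });
    return;
  }
  if (!existing) {
    // Update arrived before the create: fail so the event can be retried or dead-lettered.
    throw new Error(`update for unknown user ${event.userId}`);
  }
  mainAppUsers.set(event.userId, {
    ...existing,
    email: event.changes.email ?? existing.email,
    name: event.changes.name ?? existing.name,
    version: event.version,
  });
}

// Thin consumer: apply events in stream order and return how many were processed.
function consume(events: UserEvent[]): number {
  let processed = 0;
  for (const event of events) {
    applyUserEvent(event);
    processed += 1;
  }
  return processed;
}
```

In this sketch an update for an unknown user throws and stops the loop, so nothing after it is applied; a real consumer would retry that event or route it to a dead-letter queue instead of silently dropping it.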

How do we handle existing users? We’re going to have to pipe them to the micro service somehow. It doesn’t need to happen all at once. If we want them all, we can run a script to seed the database, or we can do it lazily! At some point, say when the user places their next order (or logs in, whatever), we’ll push the entire user onto our stream and have a consumer add it to the micro service.
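A sketch of the lazy approach, assuming a hypothetical `migratedAt` marker on the monolith’s user document and in-memory stand-ins for the stream and the user service’s database. `migrateUserLazily` would be called from an existing path such as order placement or login.

```typescript
// Hypothetical shapes. `migratedAt` is an assumed marker field on the
// monolith's user document, used to avoid re-publishing the same user.
interface MonolithUser {
  _id: string;
  email: string;
  name: string;
  migratedAt?: string;
}

interface ServiceUser {
  id: string;
  email: string;
  name: string;
}

interface UserMigrated {
  type: "USER_MIGRATED";
  user: ServiceUser;
}

const monolithUsers = new Map<string, MonolithUser>();
const stream: UserMigrated[] = []; // stand-in for the event stream
const userServiceDb = new Map<string, ServiceUser>(); // the user service's own store

// Call from an existing code path, e.g. right after an order is placed or a login succeeds.
function migrateUserLazily(userId: string): void {
  const user = monolithUsers.get(userId);
  if (!user || user.migratedAt) return; // unknown or already migrated
  stream.push({ type: "USER_MIGRATED", user: { id: user._id, email: user.email, name: user.name } });
  monolithUsers.set(userId, { ...user, migratedAt: new Date().toISOString() });
}

// User service consumer: insert only if absent, so replays can't clobber newer data.
function consumeMigrations(): void {
  for (const event of stream.splice(0)) {
    if (!userServiceDb.has(event.user.id)) userServiceDb.set(event.user.id, event.user);
  }
}
```

Inserting only when the user is absent matters once the user service is the golden source: a late or replayed migration event shouldn’t overwrite newer data there. In a real system you’d also set `migratedAt` only after the publish is acknowledged.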

So, the admin UI’s update email form now writes to the micro service via HTTP. The micro service updates its local store and emits an event to the stream saying something changed. A consumer in the main app receives that event and updates its local users collection. Orders can still populate the user as needed. The users collection in the main app is now eventually consistent, but the golden source of user data is always the User Micro Service.
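Sketching that write path end to end: the handler logic behind a hypothetical `PATCH /users/:id/email` endpoint on the user service, an in-memory array standing in for the stream, and a main app consumer that uses an assumed `version` counter to ignore stale or duplicate events.

```typescript
interface UserRecord {
  id: string;
  email: string;
  version: number;
}

// Hypothetical event emitted by the user service after a successful write.
interface UserEmailChanged {
  type: "USER_EMAIL_CHANGED";
  userId: string;
  email: string;
  version: number;
}

// User service: the golden source, with its own store.
const userServiceStore = new Map<string, UserRecord>();
const events: UserEmailChanged[] = []; // stand-in for the event stream

// Logic behind a hypothetical `PATCH /users/:id/email` endpoint on the user service.
function handleUpdateEmail(userId: string, email: string): UserRecord {
  const current = userServiceStore.get(userId);
  if (!current) throw new Error(`user ${userId} not found`);
  const updated: UserRecord = { ...current, email, version: current.version + 1 };
  userServiceStore.set(userId, updated); // 1. update the local store
  events.push({ type: "USER_EMAIL_CHANGED", userId, email, version: updated.version }); // 2. emit
  return updated;
}

// Main app: eventually consistent copy that order population still reads from.
const mainAppUsers = new Map<string, { _id: string; email: string; version: number }>();

function onUserEmailChanged(event: UserEmailChanged): void {
  const local = mainAppUsers.get(event.userId);
  if (local && local.version >= event.version) return; // stale or duplicate
  mainAppUsers.set(event.userId, { _id: event.userId, email: event.email, version: event.version });
}
```

Note that updating the store and emitting the event are two separate steps here; if the process dies between them, the event is lost. The transactional outbox pattern is one common way to close that gap.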

But what if things get overloaded and there’s a race condition? A new user gets created in the user micro service, but it takes too long to reach the main app, and other processes that expect the user to be there get called first (remember, this is brownfield). In that case, you might need to change your strategy: create the new user in the main app first, instead of in the micro service, with the bare essentials, and emit an event to the event stream saying a user was created. This lets the user service construct the fuller user in an asynchronous process, which can later emit another event to fill in the main app with more details.
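Here’s a sketch of the main-app-first flow, with in-memory queues in both directions. The event names and the enrichment step (deriving a `displayName` from the email) are hypothetical placeholders for whatever the user service actually adds.

```typescript
interface UserCreated {
  type: "USER_CREATED";
  userId: string;
  email: string;
}

// Hypothetical follow-up event carrying the details the user service filled in.
interface UserProfileCompleted {
  type: "USER_PROFILE_COMPLETED";
  userId: string;
  displayName: string;
}

// Main app store plus in-memory queues standing in for the stream in each direction.
const mainAppUsers = new Map<string, { _id: string; email: string; displayName?: string }>();
const toUserService: UserCreated[] = [];
const toMainApp: UserProfileCompleted[] = [];

// Main app: create the bare user synchronously so existing code paths find it immediately.
function createUserInMainApp(userId: string, email: string): void {
  mainAppUsers.set(userId, { _id: userId, email });
  toUserService.push({ type: "USER_CREATED", userId, email });
}

// User service: build the fuller record asynchronously, then report back.
const userServiceUsers = new Map<string, { id: string; email: string; displayName: string }>();

function userServiceConsume(): void {
  for (const event of toUserService.splice(0)) {
    // Hypothetical enrichment step; a real service might call other systems here.
    const at = event.email.indexOf("@");
    const displayName = at > 0 ? event.email.slice(0, at) : event.email;
    userServiceUsers.set(event.userId, { id: event.userId, email: event.email, displayName });
    toMainApp.push({ type: "USER_PROFILE_COMPLETED", userId: event.userId, displayName });
  }
}

// Main app: patch in the extra details once they arrive.
function mainAppConsume(): void {
  for (const event of toMainApp.splice(0)) {
    const user = mainAppUsers.get(event.userId);
    if (user) mainAppUsers.set(event.userId, { ...user, displayName: event.displayName });
  }
}
```

Existing code can read the bare user as soon as `createUserInMainApp` returns. The richer fields arrive later, so anything that reads them has to tolerate their absence for a while.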

A key concern here is that these ordering dependencies aren’t always apparent to a new developer extending a system they haven’t worked on before. Had we gone from 0 to 100 all at once, we could have had crippling problems rippling through the app. To prevent this, or at least limit the risk, we should put our lazy migration code, and frankly every code path that uses the new micro service, behind a feature flag. You can roll your own or use a vendor like LaunchDarkly, but the goal is to roll the change out with full backwards compatibility.
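A minimal sketch of gating the new write path, assuming a hypothetical `FeatureFlags` interface. This is not any vendor’s SDK; a product like LaunchDarkly or a home-grown config table could sit behind it. The in-memory implementation does a percentage rollout by hashing the flag name and user ID into a stable bucket, and the flag name `user-service-writes` is made up for the example.

```typescript
// Hypothetical flag interface; this is not any vendor's SDK API.
interface FeatureFlags {
  isEnabled(flag: string, userId: string): boolean;
}

// Deterministic 0-99 bucket, so a given user always lands on the same side of a rollout.
function bucket(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0;
  }
  return hash % 100;
}

// In-memory flags: each flag maps to a rollout percentage from 0 to 100.
class InMemoryFlags implements FeatureFlags {
  rollout: Record<string, number>;

  constructor(rollout: Record<string, number>) {
    this.rollout = rollout;
  }

  isEnabled(flag: string, userId: string): boolean {
    const percent = this.rollout[flag] ?? 0; // unknown flags are off
    return bucket(`${flag}:${userId}`) < percent;
  }
}

type EmailWriter = (userId: string, email: string) => void;

// Send writes to the user service only when the flag is on; otherwise keep the legacy path.
function updateEmail(
  flags: FeatureFlags,
  legacyWrite: EmailWriter,
  userServiceWrite: EmailWriter,
  userId: string,
  email: string,
): "user-service" | "legacy" {
  if (flags.isEnabled("user-service-writes", userId)) {
    userServiceWrite(userId, email);
    return "user-service";
  }
  legacyWrite(userId, email);
  return "legacy";
}
```

Because the bucket is deterministic, a given user stays on the same path between requests, and dialing the rollout back to 0 sends everyone back to the legacy path.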

Remember to write thorough automated tests for these major code paths in a distributed test environment with real running micro services, rather than relying solely on unit and local integration tests. Don’t follow the advice of those who only test in production; getting testing right is critical before migrating to micro services.

Ben Lugavere is a Lead Engineer and Architect at Boxed where he develops primarily using JavaScript and TypeScript. Ben writes about all things JavaScript, system design and architecture and can be found on Twitter at @benlugavere.