Handling "schema" changes in production

I have often heard and read about this situation: you start a brand-new application on a NoSQL datastore, everything goes fine, and you’re almost happy. Then all of a sudden you hit a critical point: you need to change the “schema” of your application, and you’re already live, running a production solution.

At this point you must ask yourself: is my amount of data (i.e. the document count) small enough that I can write a small conversion program and run a batch process to update all the documents in bulk?

Unfortunately, it won’t always turn out this way. Sometimes the amount of data you’re dealing with is so large that bulk batch updates aren’t feasible, because of the time they take and their impact on performance.

In that case you should consider a Lazy Update Approach: your application checks whether a document is in the ‘previous schema’ when it reads it in, and updates it to the new schema when it writes it out again.
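Here is a minimal sketch of the lazy update idea in Python. Everything in it is an assumption made for illustration: the in-memory `store` dict stands in for a real NoSQL collection, the `schema_version` field is a hypothetical marker, and the example schema change (splitting `name` into `first_name` and `last_name`) is invented.

```python
from typing import Any, Dict

CURRENT_SCHEMA_VERSION = 2  # hypothetical version number of the new schema

# In-memory stand-in for a NoSQL collection, keyed by document id.
store: Dict[str, Dict[str, Any]] = {
    "u1": {"name": "Ada Lovelace"},  # previous schema: no version field
    "u2": {"schema_version": 2, "first_name": "Alan", "last_name": "Turing"},
}


def upgrade(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return the document in the current schema, converting it if needed."""
    if doc.get("schema_version", 1) >= CURRENT_SCHEMA_VERSION:
        return doc  # already in the new schema
    # Invented schema change: split the single name field in two.
    first, _, last = doc.get("name", "").partition(" ")
    upgraded = {key: value for key, value in doc.items() if key != "name"}
    upgraded.update(
        first_name=first,
        last_name=last,
        schema_version=CURRENT_SCHEMA_VERSION,
    )
    return upgraded


def read_document(doc_id: str) -> Dict[str, Any]:
    """Read a document; the caller always sees the current schema."""
    return upgrade(store[doc_id])


def write_document(doc_id: str, doc: Dict[str, Any]) -> None:
    """Write a document; this is the point where the migration is persisted."""
    store[doc_id] = upgrade(doc)
```

Note that reading alone does not change what is stored. The document is only persisted in the new schema the next time the application writes it, which is why rarely written documents can stay in the ‘previous schema’ for a long time.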

Over time this will migrate documents from the ‘previous schema’ to the new one, though you may end up with documents that are rarely accessed and so remain in the ‘previous schema’. You must then wait until the number of documents remaining in the ‘previous schema’ is small enough that you can run batch jobs to update them.
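The "wait, then clean up" step could look roughly like the sketch below. It again assumes a plain dict as a stand-in for the datastore and a hypothetical `schema_version` field; `threshold` and `batch_size` are illustrative parameters, and `convert` is whatever conversion function your application already uses.

```python
from typing import Any, Callable, Dict

CURRENT_SCHEMA_VERSION = 2  # hypothetical version number of the new schema

Document = Dict[str, Any]


def is_previous_schema(doc: Document) -> bool:
    """A document without a version field is treated as version 1."""
    return doc.get("schema_version", 1) < CURRENT_SCHEMA_VERSION


def count_remaining(store: Dict[str, Document]) -> int:
    """Count the documents still in the previous schema."""
    return sum(1 for doc in store.values() if is_previous_schema(doc))


def migrate_remaining(
    store: Dict[str, Document],
    convert: Callable[[Document], Document],
    threshold: int,
    batch_size: int = 100,
) -> int:
    """Convert the leftovers, but only once few enough of them remain.

    Returns the number of documents that were migrated.
    """
    remaining_ids = [
        doc_id for doc_id, doc in store.items() if is_previous_schema(doc)
    ]
    if len(remaining_ids) > threshold:
        return 0  # still too many: keep relying on lazy updates
    migrated = 0
    for start in range(0, len(remaining_ids), batch_size):
        # A real job might pause between batches to limit the load.
        for doc_id in remaining_ids[start:start + batch_size]:
            store[doc_id] = convert(store[doc_id])
            migrated += 1
    return migrated
```

Running `count_remaining` periodically gives you the number to watch. Once it drops below a level you are comfortable with, `migrate_remaining` finishes the job.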

During this conversion process you need to be very careful with any process that operates over multiple documents. This is the downside: those processes must at least be carefully reviewed, and they might need to be rewritten as well.
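To illustrate what such a review can turn up, here is a small sketch of a multi-document operation that has to cope with both schemas while the migration is in progress. The field names (`name`, `first_name`, `last_name`) are invented for the example, and the list of dicts stands in for a query result.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def last_name_of(doc: Dict[str, Any]) -> str:
    """Read the last name from a document in either schema."""
    if "last_name" in doc:  # new schema
        return doc["last_name"]
    # Previous schema: take everything after the first space of "name".
    return doc.get("name", "").partition(" ")[2]


def count_by_last_name(docs: Iterable[Dict[str, Any]]) -> Counter:
    """Aggregate over documents that may be in a mix of both schemas."""
    return Counter(last_name_of(doc) for doc in docs)
```

A version of `count_by_last_name` that only read `doc["last_name"]` would fail on, or silently skip, every document that has not been migrated yet. That is exactly the kind of problem to look for in these processes.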
