Access Rights — Data Synchronization: Patterns, Tools, & Techniques

The discussion of access rights, and specifically how they are handled is similar in concept to the discussion of deletions. In both scenarios, the sync API needs to be modified to be more general than just exposing a list of changed records, and in this way, either topic can be discussed and approached without regard to order. In other words, they are conceptual siblings and are not dependent upon one another conceptually.

It should come as no surprise then that the implementation of access rights in regards to sync will be similar to deletes, in that they both utilize SQL joins when fetching changed records for the purpose of altering the response of that API. This, however, is where the simplicity of this discussion ends. The implementation of access control can be non-trivial, as a variety of considerations must be taken into account. Let’s briefly take a summary of those potential complexities before analyzing them more fully.

Potential Complexities

The potential complexities, and therefore considerations, that can arise when discussing access rights and how they pertain to data synchronization are varied; however, few applications have the luxury of avoiding them. Additionally, access rights and permissions, in general, are often as varied as the applications in which they are securing. Best practices do exist in the security sector, but the implementation of those best practices can just as varied as the implementations for data synchronization to which we are discussing. It is the intersection of those complexities that lends itself to additional consideration.

For all their variety, the topics for consideration can generally be broken down into three categories:

Excluding Records

The most basic and fundamental consideration of access rights is to exclude certain records from the privy of others. This can take the form of simple ownership, to more fine-grained visibility modifiers, but all share the same common objective of restricting and granting access.

Granting & Revoking Access

Rarely is data static when discussing synchronization, and this is especially more pronounced when it comes to adding and removing permissions over existing data. As a user’s visibility to data changes, so to do the techniques needed to synchronize their data.

Object Hierarchies

Object hierarchies, or the concept of parent/child relationships in data, present unique challenges to access rights when synchronizing data.

Excluding Records

Excluding records due to access rights is perhaps the most fundamental of access control requirements. All types of applications have the concept of permissions, and with that comes limiting visibility of data based on the authenticated user.

Excluding whole records from synchronization involves the conceptually simple process of ignoring those records when the sync API would have otherwise included them. Stated another way, even if a record’s updated_at timestamp would normal indicate its inclusion in a sync, some other attribute of that record would override this decision and therefore exclude it from synchronization. This is perhaps best illustrated by returning to our running example.

In our running example, let’s add the concept of a paid membership to our user model. Paid users in this example will have access to certain information that non-paid users do not – specifically campground records themselves will be marked as either premium or not.

campground_tier column

This change can be as straightforward as the addition of a single column, as is illustrated in the diagram above. By adding a campground_tier column to our Campground model, we can alter the synchronization process to key off that field when determining if a given campground should be included in a resultset. Specifically for our example, we will use a value of zero to represent the default tier, and therefore something that is included for everyone, whereas values greater than zero would necessitate that a user has a premium account – this decision is purely arbitrary, however, and simply serves to move our example forward. The DDL for this change can be represented as;

Until this chapter, however, our example has not needed a User model – the examples have intentionally glossed over things such as security or error handling for the purpose of not obfuscating the principals of which we were discussing. The discussion of access rights necessitates that we introduce at least a minimum concept of a user model. For our example, a user may simply have a username, password hash, and access tier (which we will tie back to the campground tier). Therefore, the DDL necessary may look like;

Users table

With the users table added to the database, a User model can be created in Javascript as follows;

For brevity, we will not dwell on the details of the model, as its form follows the patterns we have discussed to present, with an exception for a small discussion of the timestamps. In chapter 1, when discussing whole database synchronization, our models lacked such timestamps entirely. In chapter 2, when discussing delta synchornization, we added those retroactively to our example at the cost of lost data – we had to retroactively supply values for both updated_at and created_at for any existing database records. While there may not be any obvious utility in syncing user models, there is also limited downside in adding the necessary timestamp fields on the outset. In fact, doing so from the beginning is the recommended pattern in that should a need arise for them, they will already be there. Following this pattern also allows developers on a project to follow a consistent routine without every developer necessarily having to fully understand the nuances for the pattern (though it is the author’s opinion that every developer would be benefited from such knowledge).

With both the User model in place, as well as a campground_tier property added to the Campground model, we can return our discussion to that of access rights. This is accomplished by comparing the set of potential records to sync with the current user to identify the subset of those records that meet the desired characteristics. In terms of our example, this means campgrounds with specific tiers as correlated to each user’s access_tier.

Access Rights Restriction

The above diagram helps to illustrate this concept more tangibly. The large outer circle illustrates all known campgrounds in our sample system, while the smaller blue circle towards the bottom represents all the campgrounds which have been updated since a specific user last synced. Without additional modification to our example system, the blue also represents all the records the sync API will return to a mobile client in this scenario. However, the red circle represents all the free tier campgrounds (e.g. those available to users of our sample application without paid subscriptions), and in this scenario, the user who is trying to sync is not a paid subscriber. Therefore, the desired behavior is not to sync all the blue campgrounds, but rather the intersection of the red and blue circles – the recently updated free campgrounds.

Set operations such as this are what relational databases excel at, and precisely how this restriction of visibility can be best implemented – via SQL joins. By modifying the sync API to join the active user to the campground model, in addition to its current logic, we can easily implement this restriction of visibility. Let’s review what this would look like in code to better help illustrate;

The above code snippet utilizes specific features of the OnionRM library used within the example application, however, the generalities are common to most any database access library. The above code performs a SQL join from campgrounds to users based on the relationship between the user’s access tier and a given campground’s tier. While the above syntax can be jarring, the logic essentially executes the following SQL:

As currently implemented, such logic can only be applied to the Campground model, as no other model in our example should or can be directly joined to the User model like the above code snippet is doing. This presents a difficulty when attempting to incorporate the above logic into the sync API as a whole – how do we permit some models to be restricted based on access rights while limiting others. A naive approach might involve an if-statement, to account for this specific scenario;

The above solution, while correct in behavior, is far from desirable in terms of its implementation. When we discuss object hierarchies later in this chapter we will revisit this and discuss a more general purpose solution, but until then there is still much in the way of clean up and preparation we can do. Specifically, the above approach contains an inner block of duplicate code resulting from the necessity of an outer conditional. This can be solved by abstracting away the data access request into an access chain pattern.

Access Chains

An access chain is a pattern which we will develop throughout the course of this chapter, by which a chained access descriptor will be constructed piecemeal such that a holistic approach to access rights can be asserted over any given piece of data, but without overly burdening any single component of the application with the knowledge of the entirety. Like many concepts, this will make more intuitive sense with exposure and practice. Returning to the dilemma of the previous section, let’s see how we might implement the access chain pattern to remove the redundancy of that previous code snippet. Let’s start by simply stating our goal for that bit of code;

Find all recently updated records of a given model
If that model happens to be the Campground model, additional restrict those records based on the logic described in the last section
Process those records for final output to the client

Stated this way, we can also state that our current implementation duplicates step 1 and 3, whereas ideally, we’d prefer code without any duplication of those three steps. This duplication is the result of the conditionality of the second step – sometimes only the first and last steps are required, other times all three. To solve this we need to construct the query in such a way in the first step that it can be executed as-is, or on a conditional modified, without branching the execution path. The solution, therefore, is to construct the query via a chained set of operations.

Those familiar with jQuery may well be familiar with the pattern of chained objects. An initial jQuery selector is performed and returns another jQuery object which can be further refined until such time a final and desired result is achieved. An example of such jQuery usage might look like:

Has this trivial jQuery analogy required that sometimes the second p tag be hidden, and other times the first, it could instead be written like this:

Usefulness or best practices of the above jQuery code aside, as this is not a book on jQuery, the above does serve to illustrate an analogy of concept to that of access chains. Returning to our sample project, we can apply this pattern via OnionRM’s chained query model in the following fashion:

The initial query is constructed on the first line of the above code sample, whereas the conditional join to users is implemented on lines 2 through 5. By the time the application reaches line 7, and finally runs the query, it always expresses the desired result, regardless of which code path it took. Other ORMs allow for the implementation of this concept, provided they allow queries to be constructed in piecemeal.

Functionally we are left with a piece of software that is no different than it was at the end of the last section but has gone from 20 lines to 13 lines, and in the process has reduced the redundancy of mental complexity significantly.

Throughout the following sections of this chapter we will revisit this pattern and expand upon our usage of it, as doing so will allow for more succinct expression of intent. With this lose-end wrapped up, and pattern established, we can return our discussion to access rights and delve into the complexities of granting and revoking access.

Granting & Revoking Access

Both the addition and removal of access rights has an impact on data synchronization. This is especially noteworthy in the impact it creates for devices which have already synced – creating the opportunity for gaps in synchronized records.

When adding permissions to an already synced user, the existing techniques we first discussed in the delta-based synchronization chapter begin to show their limitation. Specifically, if records are synced based on the intersection of the updated_at timestamp, and each device’s last sync timestamp, then there exists an opportunity by which records could be omitted from syncing to a client device following the addition of additional permissions. The reason for this occurrence is counterintuitive at passing-thought but can be easily expressed with an example.

We have already seen scenarios that illustrate corner cases in synchronization – the scenario we began chapter 2 with when discussing handling deletions. In that scenario, we saw where the timing of events could lead to different representations of data on different users systems, despite both users’ being up-to-date with their sync. Similar timing events can arise when granting or revoking access. Let’s return to our fictitious Alice and Ann from chapter 2. Alice is once again a user of our system, and Ann reprises her role as an administrator of our database. With that background, consider then the following series of events:

Alice, being an existing free-tier user, is already fully synced.
The campground titled AAA Deluxe Campground is an existing premium campground in the system.
The AAA Deluxe Campground is not accessible to Alice, for intended reasons.
Alice, upgrades to a paid subscription.
Alice’s device syncs.

At this point, despite Alice having a paid subscription, she still cannot see the AAA Deluxe Campground in her application. She will never see this campground in fact, not unless one of the following were to occur:

Ann, being the administrator, was to update the campground’s listing.
Alice was to delete and re-install the app.

Let’s take a moment to analyze why this is.

In the first possible “solution”, the missing campground will be picked up due to its timestamp being updated. In the second, the campground is picked up because the application is forced to re-perform its initial sync. Both of these possibilities hint at the underlying issue – the updated_at timestamp of the AAA Deluxe Campground is older than Alice’s last sync timestamp and is, therefore, being excluded from our existing sync algorithm. Our algorithm to date has never had the possibility for excluding records and has therefore never encountered a scenario where a user might gain access to records without the necessity of their updated_at timestamp also being updated.

To resolve this the algorithm needs to somehow make allowances for retrieving some records that are older than the last sync timestamp.

One possible solution to this is to track when a user’s access tier changes, and use that to essentially re-force the initial sync from the server side. To do this we will first need to add a column to the users table:

And a field and replicated logic to the User model to keep this populated, and to update it when a user’s access_tier changes:

Of particular importance are lines 12 through 14 of the User model. On those lines, within the before save-event, the application is checking to see if a change has occurred to the access_tier field, and if so, it is also setting the last_access_tier_change field to the current date and time. This takes advantage of a feature of the ORM by which a model can be interrogated as to if a field has been modified – this functionality is not unique to OnionRM, and can be found in many other ORMS.

Once the application can properly track when a user’s access_tier has last changed, we can update the sync API to make use of this.

In the above code, we can see the general idea here – modify the API, and on a conditional basis re-perform the initial sync. If the user’s permissions have recently changed, then ignore the timestamp supply and just re-sync everything. This is crude but effective. There will never be a situation where the client device will be missing data, it is easy to implement, but the downside is that it is very wasteful. Especially in situations where user permissions can change frequently or where the dataset is large.

Revoking access involves challenges too. Much like granting access can result in users missing data, revoking access (if not implemented properly) can result in the opposite – users may find themselves still able to see stale copies of data that they should no longer have access to.

The situation arises when a user, having previously been fully synced, loses access to a portion of that data which they have already synced. The key element here is a portion. Had the user lost access to all the data – perhaps in the case of a corporate app that only employees can access – then other, non-synchronization based solutions would be sufficient. In the case of a partial loss of access, the user should still be able to use and sync data within the application, albeit a smaller subset than previously.

In the last chapter, we discussed various techniques for handling deletions, one of which involved replicating hard deletes from the server to the client. If implemented, as records are deleted on the server, those IDs are recorded so that the same records can be deleted at a future date on a client device. This has the added benefit of allowing the server to revoke records from a client device.

To conceptualize this, think of it as virtual deletions. The client device, when told to delete a record with a specific ID, has no means of knowing if such a request is the result of a hard delete, soft delete, or in our current case a revocation of access.

To implement this the server must, when processing sync requests, identify the IDs of all records which a user previously had access to, but does not any longer, and present those IDs in the sync API response as-if they had been deleted.

In our example application, we can find all the campgrounds that a user has access to by joining to the users table on user.access_tier >= campgrounds.campground_tier, so, therefore, it holds to reason that all the campgrounds a user does not have access to can be identified by inverting that relationship.

This only needs to be performed when the user’s last_access_tier_change field is newer than the last sync timestamp supplied to the sync API – in other words, it only needs to be performed when the user’s permissions have changed since they last synced.

Applying these changes to our example sync API involves adding an additional parallel code block. This is a rather unfortunate compromise and is the same compromise we had to make when discussing deletes. As was the case then, this too is logic that will be refactored in Part III of this book when we begin discussing performance and scalability – as this compromise is certainly neither. However, what it lacks in those respects it makes up for in terms of being simple to comprehend and keeps us focused on the task of conceptualizing the logic involved.

The implementation as seen in the immediately-preceding code snippet is presented in its entirety, though the bulk of the code is boilerplate. The core of the logic is isolated to these few lines:

In that logic, we invert the normal criteria for access rights, by asserting the inequality of less-than (as opposed to greater-than-or-equal). Additionally, there are provisions for disregarding the API supplied last sync time, and instead reverting to the point in time when the user’s access rights were updated. These two small, but impactful, changes have the result of syncing virtual deletions records upon access right revocation.

Object Hierarchies

When constructing permission constructs of real-world data, it is not uncommon to encounter parent-child relationships among such entities. These hierarchies of data present a unique challenge, and opportunity, when it comes to synchronization.

For the purpose of accurately modeling a problem space, we might want the relationship between a child and its parent to have a meaningful impact on our application. Doing so not only permits the model to more closely mirror the real world, but it also allows the administrators of such an application to more concisely work with the users and their permissions. Said another way, when the system more closely mirrors the reality of the problem it is solving, the users of that system experience less mental fatigue when performing routine tasks.

Up to this point, however, our synchronization algorithms have essentially dealt with one model at a time. Updating the sync API in this regard involves revisiting the topic of query chains we discussed earlier in this chapter, with the goals of;

Building a query that joins parent to child
allowing us to sync the child as-normal
but, also allowing us to use the characteristics of the parent to additionally restrict which child objects are fetched.

parent/child access rights

The above figure illustrates the objective. By asserting a relationship between a parent and its child, we can influence the child records fetching during any given sync.

Return briefly to our running example, this could be the case with campground reviews. Currently, our example application has a mechanism for restricting campgrounds synced to a user, but despite that, the reviews of those campgrounds are sync regardless of if the user needs them.

At best this creates waste. Unnecessary objects are synced to client devices, wasting:

Server resources (e.g. CPU & memory)
Bandwidth (both client and server)
Storage space on client devices

In our example, other than being wasteful, syncing these reviews is unlikely to be a consequential oversight. The same cannot be said for all applications and scenarios. If a child record represents private, confidential, or otherwise sensitive information, we would not want to sync those records. An example of this might be an insurance application, where notes are associated with claims. In this scenario, an insurance claim might be visible to only specific people, such as the insurance policyholder. As the claims process is worked, various notes might be added to the claim. these notes might even contain private information, and unnecessarily syncing those to mobile users who do not have permissions over them is not just wasteful, it could also be a legal liability to the insurance company.

Despite our running example not having the same level of criticality as the insurance example we just discussed, it will suffice nevertheless.

To begin we must extend the query chain, but do so in a way that is general – all calls to the sync API must execute the same logic, regardless of whether they belong to a parent object.

One way to accomplish this is to add a method to each model by which it can be interrogated. The logic would look like this:

The sync API invokes method on object
The method returns a query chain
1. If model has no parent, the chain is the model itself
2. If model has a parent, the chain is the model joined to its parent
The API extends the chain as needed
The API finally executes the query chain and serializes the results to send to the caller

Steps 3 & 4 in the preceding outline should look familiar, as they are essential steps we discussed in the section on access chains. For that matter, the second step should also be vaguely familiar as it too is essentially similar to the algorithm outlined previously. The real distinction here is that we are pushing this logic into the application’s models, and therefore abstracting this conditional behavior from the API. From the API’s perspective, it treats all models the same, simply differing to it to get the necessary query chain and going from there. Each model is then at the liberty to construct its own access chain as is unique and necessary to that model.

Let’s review how this would be implemented in our sample application.

First, we add a class-level (i.e. static) method to each model called scopedForUser, accepting a single parameter of the currently authenticated user, and returning a query chain for the current model.

The previous code snippet represents the default case for all models. We will circle back to this, but first, let’s update the API to make use of these new methods. Recall that the sync API first creates a query chain based on the user-supplied criteria like this:

This is also the line of code that will need to be updated. Instead of directly starting a query chain, the API should instead delegate this functionality to the model, like this:

At this point, the API remains functional – and, functionally unchanged for that matter. If we were to stop at this point the only downside is that of more obtuse code. Luckily it is not our intention to stop refactoring here, and instead, we will return our focus back to the scopedForUser methods we created earlier.

In the previously outlined steps, we mentioned that our scopedForUser method should return a query chain that either:

If model has no parent, the chain is the model itself, or
If model has a parent, the chain is the model joined to its parent

Applying this to our example app, we will leave the Campground model’s scopedForUser alone, as Campground has no parent. The CampgroundPhoto and CampgroundReview models will be updated to join to their parent (Campground).

All that remains is to push the access tier logic into the scopedForUser method, and thusly remove it from the API directly.

For Campground, this is as straightforward as porting the existing logic we previously created in the API. Thusly the Campground’s scopedForUser would look like:

For the CampgroundReview and CampgroundPhoto models, the process is similar. By joining each of those models to the Campground model, we can then update that query chain to perform similar logic as the Campground model itself performs. Thusly, when a campground will have been excluded from a sync, its child objects will also be excluded by the same logic.

Observant readers may have questioned why we ever joined the Campground model to the User model earlier in this chapter. On the surface, it does appear to be a frivolous step. Twice again we have refactored that logic, keeping the join each time – and even adding such logic to additional classes, as we have just done. The reasoning for this is one of a separation of concerns, and a reasoning that would have not been apparent until having first discussed both access chains and future refactoring such chains into the models themselves like we have just done.

By joining the Campground model to the User model we have been able to perform the necessary security checks on the SQL join directly, as opposed to within the Campground model’s usual where-clause. This distinction is important, in that it separates out those such conditions that relate to security. Consider again how the API consumes the scopedForUser method:

From the perspective of the API, the scopedForUser method is a black-box – it has no idea what steps it performs, just that it returns a query chain that can be extended, and it does extended it. With the very last function call in the previously show snippet, it appends addition where-clauses by way of the find method. Suppose for a moment that, either by accident or malice, that the filter supplied to find contained an assertment of the campground’s tier.

Had the scopedForUser method asserted the user’s visibility by validating directly against the same model, then the previous logic would have overwritten such asserts and potentially exposed more data to a user than was intended. By instead joining to another table, and perform security constraints there we avoid the potential for such simple, and yet unfortunate, lapses in security. This change by no means makes the system fool-proof but certainly does a great deal towards that goals.

Mobile

Throughout this chapter, we have focused on server-side changes. The mobile application was not an afterthought in this process, however. Many of the designs we set out were actually crafted with the intention of minimizing, if not entirely preventing, changes in the mobile application. Indeed, a carefully crafted sync API is designed to do just this, as changes on client devices are more expensive than changes on the server. Due to the inherent nature of such systems, one can never have full control over when client devices upgrade. For this reason, a well-designed sync API needs to maintain a degree of backward compatibility, and the more forethought and care are taken to prevent, or limit, changes necessary on the client, the less backward compatibility that needs to be supported going forward.

If not already implemented for the purpose of deletions, the mobile techniques discussed in the last chapter for processing deletions should be implemented at this time. By doing so, even if no true deletions will occur, the techniques outlined in this chapter for virtual deletions can be implemented.

Outside of that one consideration, no additional changes should be necessary for a mobile application to facilitate the techniques discussed in this chapter.

Summary

As we set out to accomplish, we have surveyed some of the various complexities that can arise when introducing permissions and access rights. In the course of discussing the patterns for these challenges, we have also seen the complexity of the sync API increase quite substantially. We have taken some limited steps to reduce, or at least constraint the increase, but we have not set out to refactor that yet.

In part III of this book, we will return to this subject and refactor our API as we prepare to scale – both in terms of API usage as well as API complexity – but, for now, we have completed our initial tour of server to client synchronization, and thusly part I of this book.