10 10th, 2008On my current project we decided to leverage Microsoft Sync Services (since we were already using SQL CE for our local database in a smart-client solution with an Oracle back end, it seemed to be the logical choice). After a few months of tweaking and jumping through a few hoops to get SQL CE and Oracle to play together, we had a solution that appeared to work just fine… on the surface anyway. We decided to assess the solution we had and see how far down the road it would take us. This article describes our findings which are some inherent problems, some problems integrating with Oracle, some performance related, and some just because one size does not fit all.
Note: I will not assume that any or all of these hurdles cannot be overcome with some level of effort. On our project the effort to implement our own synchronization service far outweighed the cost of trying to work around these issues and in the end we still weren’t sure it would work.
System overview
Database Server: Oracle 10g
App Server: Sync Web Services hosted in IIS
Local Database: SQL CE
Synchronization characteristics
The database contains around 40-50 tables, some tables are download-only and others are unidirectional. The business rules ensure that only one user can edit a record at one time and the user must explicitly give up ownership before another user can edit the record. The data is what I would call document based, in that the user owns and works with a parent record and many detailed child records. Each parent record and it’s child records are self-contained. Because users cannot edit a record at the same time there is no need for merges (at least in theory). Data is only logically deleted which simplifies the sync process in that there is no need to the tombstone table.
How Sync Services works
Client-side change tracking: SQL CE has a hidden feature to track changes on rows; this feature is leveraged by Sync Services as long as the tables are created via Sync Services. Each time a row is modified a flag is set on a hidden field for that row.
Server-side change tracking: If you’re using SQL Server 2008, change tracking is integrated. SQL Server 2008 change tracking resolves some of the issues that be identified in this article. It’s change tracking is built into the SQL Server core engine and outperforms similar implementations using triggers. Change tracking also occurs per transaction which is harder to accomplish with triggers. If you’re unfortunate to be working without SQL Server 2008 (or Oracle in our case) you will need to create a couple columns for each table and implement a trigger for each table to be change tracked. The trigger simply updates a client id and timestamp when the record is modified.
Synchronization occurs via the Client and Server Providers. Each of these provide a set of methods which are used to perform sync operations. No transport mechanism is provided out of the box, however it’s a trivial task to publish the server provider methods in a web service.
A sync operation consists of looping through each table on the client, checking for changes, packing up the changes and sending them to the ServerProvider which applies the changes. This occurs for each table.
Tables can be synchronized at different frequencies by creating a separate service. For example lookup tables may only require synchronization once per day.
Tables can be grouped. All grouping does is ensure that a dependency tables are updated after their dependency. Customer would need to be updated before Address (for example).
The anchor: Each time a client requests data from the server, the server provides a timestamp which the client stores. On subsequent download requests the client sends its anchor to the server. The server queries the table in question to determine if there are any records with a timestamp greater than the supplied timestamp.
Transaction isolation: On the server, serializable or snapshot isolation is required for both upload and download operations to ensure that data does not change during the sync.
The first problem: no-change-sync takes 30 seconds
Sync Services must perform a separate check on each client and server table to check for changes. Each table check is a separate call to the web service. In our case it took 30 seconds to perform a no-change-sync. Ideally a no-change-sync should be sub-second. The client should “know” if it has any changes (just maintaining a single flag would suffice). For server changes, the client should ask “Are there any changes for me?” with a simple yes/no response from the server. Further on I will discuss why background sync is not feasible, but basically a 30 second no-change-sync means that the user is locked out of the system for 30 seconds when there aren’t even any changes.
The sub-second goal is particularly important on our project because of background sync. In most cases there will be no data changes and the number of hits on the server will be pretty high. 6000 users pinging the server every 30 seconds equates to 720,000 requests per hour (200 requests per second) on the server. It has to be fast.
No granularity on transactional units
A sync operation is an all-or-nothing deal. Because Sync services has no knowledge of the relationships on your data (other than order which is determined by sync groups), the longer the client waits to sync, the more data there is to sync in a single operation. The transactional unit in Sync Services is the entire mass of data that needs to be uploaded or downloaded. There is no way to break the data up into smaller chunks without knowing how the data is related. Of course you can sync your lookup tables separately and batch those into smaller chunks. But when the data is all related there are no other options. This inherent flaw is the cause of the next three problems.
Long running transactions
Each time an upload operation starts a transaction is started on the server and the transaction is closed when the last table is checked and uploaded. The longer you wait to sync, the more data you have to sync and the longer the transaction. Long running transactions can cause serious problems. The length of these transactions is also dependent on the user’s bandwidth and other conditions which leads to the next problem…
Client controlled transactions
Just off the bat this seems like a bad idea. What I mean by this is that it is the client that initiates the start of the Oracle (server) transaction and the client that determines the end of the transaction. The length of the transaction can be determined by: their bandwidth, how much data they have that’s not synchronized, or if they just decide to suspend their laptop in the middle of a sync operation. If user’s behave nicely and sync often, make sure they have adequate bandwidth and be sure to not terminate the application or suspend the laptop while a sync is in progress things may work just fine… but we all know this isn’t going to happen.
Sync operations are non-recoverable
By non-recoverable, I mean that if an upload or download fails part-way through, the sync operation must completely restart (again for tables without relationships it’s easy to continue where you left off). For data that has relationships to other data, ALL the data must be synchronized in a single operation and if there are any failures the sync operation must start from the beginning. If the user has an unreliable connection (low QoS), this problem could compound. Poor QoS means that the connection could be lost resulting in a hanging long running Oracle transaction. The user will possibly continue to add more data, and by the time they try to sync again, they have even more data to synchronize which means even less chance of getting all their data through, possibly leaving a few more hanging transactions before a successful attempt is made. These scenarios increase the network traffic and load on the servers and database.
Data integrity issues under load conditions
The recommended solution for Database Servers (outside of SQL Server 2008) is using triggers with timestamps. This solution is flawed right off the bat. The best way to describe this is with the help of the help of the following example. (Be sure to read the definition of anchor above before continuing). Assuming a parent table, “Project” and child table “Line Item.” Client A is performing an upload which will last for 15 seconds. Prior to upload a transaction is started. After 7 seconds of uploading all the project records are uploaded and just then Client B performs a download sync. Because the projects have not yet been committed the server returns no results (however, Client B’s anchor is updated to the 7 second mark). At 15 seconds client A has completed its upload and Client B performs another download – the only data that’s downloaded are the line items. Because Client B’s anchor was at 7 seconds only changes after the 7 second mark are retrieved. Of course this will result in FK Violations on the client because the Projects that the Line Item references don’t exist.

SQL Server 2008 solves this problem by (a) not using timestamps; and (b) by setting the version of the rows all to the same version which is based on the transaction.
Impossible to support background sync
Because the sync operations must scan ALL data and SQL CE does not support snapshot or serializable transaction isolation, it’s a lost cause even trying to attempt background sync. Once a sync operations starts the user cannot modify the data. If snapshot isolation were available, the sync operation could start a transaction and the data within the transaction wouldn’t change.
read comments (1)