Categories: Feature deep dive

Deep Dive: New bookmark sync in Nightly

For the last two years, the Firefox Sync team has been hard at work improving bookmarks on all our platforms. Last year, we added support for uploading bookmarks to Firefox for iOS, and made change tracking more durable on Android. Today, we’d like to tell you about our latest project to overhaul bookmark sync in Firefox for Desktop.

A homemade cardboard bookmark shaped like a fox

Firefox bookmark. Image credit: Lonnie Jacobsen

Historically, bookmark sync has been plagued by problems that were difficult to isolate and fix.

  • Bookmarks would be duplicated, lost, or reordered, temporarily or permanently.
  • Folders with different contents would smush together.
  • New bookmarks wouldn’t make their way to all devices, causing them to gradually fall out of sync.
  • Moves would be partially or completely undone.

At the root of all these issues was an approach to syncing that didn’t consider the unique challenges of bookmarks. In this post, we’ll dive into an overview of how Sync works, why bookmarks are special, and the advantages of the new design. Whether you’re a new or long-time Sync user, we invite you to flip the pref, try the new bookmarks engine out, and send us your feedback!

TL;DR: How do I opt-in?

The new bookmarks engine is currently behind a pref in Beta 61 and Nightly 62, but you can turn it on easily. First, we recommend you make a backup of your bookmarks, just in case:

  1. Go to Bookmarks > Show All Bookmarks to open the Library.
  2. In the Library toolbar, click the “Import and backup your bookmarks” button to bring up the menu.
  3. Click “Backup…” to export your bookmarks in JSON form.

Now, you can enable the new engine:

  1. Open about:config.
  2. Search for services.sync.engine.bookmarks.buffer.
  3. If it’s true, congrats, you’re already using the new engine! If it’s false, double-click the row to toggle the pref to true.

That’s it! Keep an eye on your bookmarks: do you notice any issues when you sync? Try adding, deleting, and moving bookmarks around on all your devices, and see if your changes sync everywhere. If you’ve been using Sync for a while, there’s a good chance you have some inconsistencies on the server already. After you turn the new engine on for the first time, Sync will download all your bookmarks from the server, and run a full merge. This is a good time to notice if any of your bookmarks are deleted or rearranged.

If you start seeing problems:

  1. Install the About Sync add-on.
  2. Go to Tools > About Sync, or open about:sync.
  3. In the “Log Files” section at the top, set “Level of messages written by Sync engines” and “Level of messages written to about:sync-logs log files” to “Debug”.
  4. Make sure “Create log files even on success?” is checked.
  5. Trigger a sync to reproduce the problem.
  6. Download the logs as a zip file.
  7. Scroll down to the “Data provider options” section at the bottom.
  8. Select “Load local sync data”, and make sure “Anonymize data” is checked.
  9. Click “Export to file” to generate an export of your validation data, with all titles, URLs, tags, and other identifying fields obscured.
  10. File a bug in “Firefox :: Sync” on Bugzilla, with the logs and anonymized validation export as attachments. Please be sure to include in your report which devices are attached to your account: other Desktops, Android, and iOS.

You don’t need to understand how bookmark sync works under the hood to use new bookmark sync, but read on if you’re curious!

Anatomy of a Synced Data Type

Sync is a complicated beast—30 files and 25k lines of JavaScript, written over ten years—but the core ideas are simple. Every syncable data type (bookmarks, history, tabs, passwords, and so on) is backed by a tracker, a store, and an engine.

  • The tracker monitors and records changes, like when you add a bookmark, update a saved password, or open a new tab. The tracker notes the change, and tells the engine to sync once the number of changes exceeds a threshold.
  • The store is the glue between Sync and storage in Firefox. It marshals Sync records, which are encrypted, type-specific JSON blobs, to and from a backing data store. For bookmarks and history, the backing store is Places, which is an SQLite database that lives in your profile folder. For passwords, addresses, and credit cards, the backing store is an on-disk JSON file.
  • The engine manages the sync lifecycle. It queries the tracker for a list of what’s changed since the last sync, downloads new records from the server, resolves conflicts with local changes, applies the new records to the store, asks the store to inflate Sync records for new local items, and uploads the new items back up to the server.

Hasty Generalization

Sync is very generic, which has some important consequences that we’ll talk about later.

A record is just an encrypted JSON blob: it’s up to the store on each device to make sense of it, figure out how to represent items (for example, a record per bookmark, a record per page with a list of visits, or a record per device with all open tabs), and translate the records into the right format for the underlying data store.

Records are decrypted and applied to the store one at a time, in no particular order, and typically without a transaction. Conflict resolution considers each record in isolation, and uses timestamps to decide which is newer. Timestamps are subject to clock drift in the best case, and wildly inaccurate in the worst.

This approach is an artifact of Sync’s history more than a deliberate decision. Sync began its life in 2008 as an add-on called “Weave”, and integrated into Firefox in 2010. Backing stores like Places were designed long before this, and most weren’t built with syncing in mind.

The generic design works surprisingly well in most cases, but an important lesson that we’ve learned from working on Sync is that this only works well for the simplest data types. However, each data type has different requirements.

  • Append-only data, like history visits, are easy because they can be synced out of order, and we don’t need to worry about conflicts.
  • Semistructured data, like passwords and addresses, are independent, but need some kind of conflict handling: what happens if you change a password for a site on your laptop, and the username for the same site on your phone?
  • Hierarchical data, like bookmarks, are especially thorny, and need robust conflict resolution that considers multiple records and the relationships between them.

Bookmarks

Bookmarks are probably the most complicated data type we sync, and one of the more valuable. You might visit dozens or hundreds of sites in a week, and it’s okay if some pages get lost in the shuffle. But when you bookmark a page, you’re signaling that it’s important in some way.

Bookmarks are versatile. Some folks use them as a reading list, or a drop file of things they’d like to see again. Others meticulously organize the bookmarks they’ve collected over many years, and use keywords and tags for easy access. Still others go with a mix of strategies. I have just under 4000 bookmarks that I’ve saved over eight years, most in “Other Bookmarks”, some organized into folders, and about a dozen in the toolbar.

It would be a shame if I took the time to tend to my bookmarks, only to have Sync lose or scramble them. Sadly, this has been a common complaint from folks over the years, eroding trust in Sync and Firefox.

What makes bookmarks so challenging? The short answer: they’re trees! Your bookmarks form a hierarchy, where each one lives in a folder, and has a unique position within that folder. Sometimes, the position and folder doesn’t matter; other times, it does. I’d be hard-pressed to remember that this awesome article about how SQLite works is #1480 in “Other Bookmarks”, but I’ll notice right away if my favorite recipes end up in the menu instead of my recipes folder, if the separators between my folders are off, or if my toolbar suddenly shows Bugzilla ahead of Purrli!

Sync can’t know how you use bookmarks, so it must be able to handle every case.

The server doesn’t distinguish between bookmarks and other data types: everything is stored in a collection of flat, unordered, and encrypted records. Folders keep pointers to their children, and children back to their parents. This means that some changes, like moving a bookmark between two folders, or deleting an entire folder, require uploading multiple records. Corruption happens when these changes are lost, not made in lockstep, or applied out of order.

Corruption doesn’t mean all your bookmarks are unrecoverable. They might be missing, or appear in different folders or the wrong order on different devices. Thanks to the magic of complex distributed systems, corruption also isn’t stable: inconsistent data can become eventually consistent, and much of Sync relies on this property to work.

In a common case, your desktop might realize halfway through the sync that the bookmark tree on the server doesn’t make sense, maybe because your laptop didn’t finish uploading all its changed bookmarks. At that point, it’s too late. Sync can’t undo the changes it made, because it already applied all the records it saw directly to Places.

Another case where corruption typically happens is when you connect a new device to your account. A first sync can take minutes for large trees, as inserting or updating into Places incurs a half-dozen database statements per bookmark. Since bookmarks are unordered on the server, the new device might see a child before its parent folder, and stash the bookmark in “Other Bookmarks” until the folder arrives. Some folks will notice this and try to help Sync out, which usually makes the problem worse.

Remember how I mentioned earlier that Sync records are encrypted and opaque to the server? This is still very much the case! It means that Mozilla can’t see any of your bookmarks, which is a core privacy guarantee. On the other hand, it means that Firefox needs to detect corruption and resolve merge conflicts locally. Everything must happen client-side; the server can’t help at all, because it can’t decrypt your bookmarks.

We’ve learned from experience, and many reports of bizarre bookmark issues, that “stash everything in Places and trust that we’ll get it right eventually” doesn’t work. Last summer, we set out to fix these problems once and for all.

Mobile First!

For inspiration on how to fix bookmark syncing on Desktop, we turned to Firefox for iOS.

In contrast to Desktop, iOS was built to sync from the start. Bookmarks are stored in a database schema that separates “value”, like the title, URL, or description, and “structure”, or parent-child relationships. When you change a bookmark on your phone, or make a change on your laptop that’s synced to your phone, iOS doesn’t mutate the canonical representation of that bookmark in a “bookmarks” table, as on Desktop. Instead, iOS keeps the original bookmark value and structure in a “mirror” table, and records the changes in a separate table: “local” for changes that you make, and “buffer” for changes that Sync makes.

The mirror helps with conflict resolution; preserving the value and structure until the next sync, as well as keeping them separate, makes three-way merges possible. You might recognize this as the same idea behind version control systems like Git and Mercurial. The “shared parent” is the original value and structure, and the left and right sides are the “local” and “buffer” tables.

During a sync, iOS walks the mirror, local, and buffer trees to produce a complete, consistent merged tree. Any bookmarks added to a folder on one side that’s deleted on the other side are relocated to the deleted folder’s parent, to avoid data loss.

Three-way merges make value-structure conflicts, like renaming a folder on one side, and moving or reordering its children on the other, easy to resolve without resorting to the timestamp. iOS still uses the timestamp to break the tie in case of conflicting value and structure changes—this is where a system like Git would insert conflict markers and pause merging until they’re resolved—but these are rare.

We can’t redo bookmark storage on Desktop to match iOS. That would be an invasive change touching almost every part of Places, and require extensive regression and performance testing. But we can take away some insights from how iOS does things.

Staging

Instead of modifying “mirror” directly, iOS writes synced changes into a separate “buffer” table. This allows Sync to detect and bail on inconsistent or incomplete trees before merging. Likewise, changes that you make are staged in an outgoing “local” table, meaning Sync won’t upload partial changes if you happen to move or rename a bookmark at the exact time a sync is running.

Structured tree merging

Structured tree merging. The Sync record format smushes value and structure for folders, which is why Sync has historically mishandled easy conflicts like renaming a folder on your laptop, and adding some bookmarks to the same folder on your phone. Deleting entire folders and moving bookmarks to new folders are other cases where Sync has done the wrong thing. Walking the entire tree to decide on a final structure solves these issues.

Content-based deduplication

Sync tries to avoid creating duplicates where it can. If you bookmark the same page in the same folder on your desktop and tablet, it’ll only sync one. This is also handy if you import bookmarks from another browser before syncing. Thanks to structured merging, iOS can zip trees together, while Desktop looks for similar bookmarks anywhere in the tree, and takes the first match it finds. This causes interesting bugs like merging the contents of two untitled folders.

Applying in a single transaction

This improves performance by reducing writes, and makes interruptions safe. Quitting the app or interrupting the sync means the transaction doesn’t commit, and the merge can start over on the next sync. Merging in a transaction also allows iOS to roll back on errors like inconsistent trees.

How New Sync Works

The new bookmark sync borrows a lot from iOS.

There are two important parts: a mirror and a merger. The mirror maintains a local copy of the server’s bookmark tree in a separate SQLite database, applies merged trees back to Places, and stages locally changed bookmarks for upload. The merger takes the local bookmark tree in Places, the remote bookmark tree in the mirror, and builds a complete, consistent merged tree, with all conflicts resolved.

You can think of the new sync in iOS terms as “Desktop Mirror = iOS mirror + iOS buffer”, and “Desktop Places = iOS mirror + iOS local”.

Unlike iOS, Desktop doesn’t store the shared parent; it knows that a bookmark or folder changed, but not how. Since Desktop don’t know the shared parent, it only has enough information for a two-way merge of two complete trees. This is fine for resolving simple conflicts, which are usually “I added two different bookmarks to the same folder on two different devices.” The structured merge can still detect value and structure changes, and fix up Places or the server to match.

During each sync, the new engine stages incoming bookmarks in the mirror, instead of writing them directly to Places. The mirror handles missing children and parent folders, so the order of records doesn’t matter. The mirror also has weaker consistency guarantees; unlike Places, the mirror don’t need to worry about maintaining consistency until it’s ready to ‘inflate’ a tree for merging.

Unfortunately, Sync might have uploaded corruption to the server in the past, especially in the early days. For example, you might have bookmarks referencing nonexistent parent folders, folders referencing nonexistent children, or a bookmark and folder that disagree about where it belongs. The mirror tries to make the structure temporarily consistent, as the missing or updated records usually show up on the next sync.

Next, the merger recurses down the local and remote trees to build a merged tree, following the same process as iOS. You can dive into the code if you’re curious about how this works. Once the merger has built the new tree, the mirror stuffs the new structure into an in-memory SQL table, and applies the tree to Places using a pile of triggers. The triggers handle deduplication, URL, title, keyword, and tag changes. Much of the complexity is mapping the simpler Sync record model to the Places model, especially for keywords and tags.

Finally, the mirror ‘inflates’ Sync records for all outgoing bookmarks. The engine takes the records, uploads them to the server, and writes them back to the mirror at the end of the sync. This lets the engine resume cleanly if the sync is interrupted during or after upload.

What have we learned?

In a fun turn of events, the strategy that we initially rejected—building a “shadow” bookmarking system that kept synced bookmarks separate, and merged with Places—was the strategy we implemented.

Bookmark merging is hard! Much of the work was about getting the semantics right, and adapting the three-way merger from iOS into a two-way merger on Desktop. “How should we resolve this conflict?” came up more often early on, than “what’s the best way to write the conflict resolution logic?”

Pushing more logic into SQL fixed many edge cases in the first cut of the mirror, which implemented most of the “apply to Places” step in JavaScript. Applying a complete merge tree to Places boils down to two views for value and structure, five INSTEAD OF DELETE triggers on the views to deduplicate and update existing bookmarks, insert new bookmarks, fix parent-child relationships and position, and flag folders with resolved merge conflicts for re-upload, and two DELETE statements to process every row in each view. This also makes the merge step more efficient, as everything is contained within SQLite. There’s no need to accumulate row objects in memory, pass results and statements back and forth between the main and storage threads, or cross the JavaScript-C++ barrier.

There are subtleties in handling keywords and tags, which Places associates with URLs instead of bookmarks.

Some of the performance characteristics of SQLite surprised us! For example, a query containing a WITH RECURSIVE expression (handy for walking trees in SQL!) and LEFT JOINs took 10 seconds to run for 5000 rows, compared to 5 milliseconds for a simpler query and recursing in JavaScript. An INSERT OR IGNORE...SELECT with a LEFT JOIN took over 4 minutes to insert 40000 rows, compared to 2 seconds with a subquery. Removing most LEFT JOINs on TEXT primary keys, and avoiding cascading TEXT foreign key deletes, reduced the merge time for 40000 bookmarks from over 4 minutes to under 10 seconds.

Thank you

Shipping new bookmarks was a months-long team effort, and I’d like to close this post by acknowledging the awesome folks who made this happen. Bookmark merging wouldn’t have landed without their astute and insightful feedback, support and patience, planning, mentoring, hours of code review, data analysis, and testing.

3 comments on “Deep Dive: New bookmark sync in Nightly”

Post a comment

  1. Ezh wrote on

    When is it planned to turn on by-default in beta/release?

    Reply

    1. kit wrote on

      It’ll be a couple of months before we have enough telemetry data and bug reports to feel confident that the new engine won’t make things worse. Likely Firefox 64 at the earliest, but we might be pleasantly surprised. 🙂 There’s a meta bug that captures the remaining work.

      In general, folks on Nightly and early Betas tend to have been using Sync the longest, and have more interesting kinds of corruption that could confuse the new engine. Y’all are also great about noticing problems and filing bugs pretty quickly!

      For all our folks on Release, we can only rely on error telemetry to show trends. If the new engine builds a bad tree, we’ll abort the sync, but telemetry can’t show us the structure of the bad tree, so it’s hard to figure out what we did wrong.

      Reply

  2. Wellington Torrejais da Silva wrote on

    Nice work!
    I’m working with my friends in synchronization solutions too.

    And I think these ideas sound so good. Activated for my clients right now.

    Thanks for this 🙂

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *