[BSP](BSP) clients may need to store their own metadata as well as the data they sync via BSP. Metadata may have been extracted from the data (e.g. the subject line of a message), or it may describe the state of the client (e.g. whether the user has starred a message). It may refer to a single message or to relationships between messages.

We have to decide whether the metadata should be stored in the same database as the data and, if so, what the API for storing and querying metadata should look like.

Issues to consider:

* Encapsulation - if clients have low-level access to the sync layer's database, they can't be insulated from each other
* Modularity - if we want to release the protocol stack as a separate library, it should have a well-defined API rather than exposing all of SQL
* Expressiveness - if we provide an API for metadata, it must be rich enough that clients don't need to use a separate database
* Performance - if we provide an API for metadata, it must have comparable performance to using a separate database
* Transactions - do clients need to update metadata and data in a single atomic operation? Do they need atomic operations spanning multiple messages?
* Encryption - if clients use their own metadata storage, it won't benefit from our database encryption

Use cases to consider:

* Full text search - efficient queries over multiple messages
* Attachments - a simple case of relationships between messages
* Peer moderation - users share upvotes and downvotes that refer to messages; if a message is eligible to be shared, so are all its ancestors
* Expiry - delete discussion threads that have been inactive for a certain amount of time

Considering the above issues and use cases, my current thinking is as follows:

* Store metadata in the same database as data
* Allow arbitrary key/value pairs to be associated with each message and group
* Initially, support the following queries:
    * Get the IDs of all messages with a given metadata key
    * Ditto, also retrieving the metadata value
    * Queries can be scoped to a single group or all the client's groups
* The sync layer needs to know about dependencies between messages
    * References between messages are encoded in the message body rather than the header
        * References belong in the body because signatures etc. may need to cover them
    * The body is opaque to the sync layer
    * When the client validates a message, it parses the body and informs the sync layer of the dependencies
    * Not all references between messages have to be dependencies
    * When a message is shared, the sync layer transitively shares its dependencies
* The client can flag expired messages
    * The sync layer garbage collects expired messages that aren't transitive dependencies of unexpired messages
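
To make the proposed queries concrete, here is a minimal in-memory sketch in Python. It only models per-message metadata (group metadata would work the same way), assumes each message belongs to exactly one group, and uses hypothetical names (`MetadataStore`, `add_message`, `set_metadata`, `get_ids`, `get_values`) that are not an existing BSP API.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class MetadataStore:
    """Hypothetical in-memory model of per-message key/value metadata."""

    def __init__(self) -> None:
        self.group_of: Dict[str, str] = {}  # message ID -> group ID
        self.meta: Dict[str, Dict[str, bytes]] = defaultdict(dict)  # message ID -> {key: value}

    def add_message(self, message_id: str, group_id: str) -> None:
        self.group_of[message_id] = group_id

    def set_metadata(self, message_id: str, key: str, value: bytes) -> None:
        self.meta[message_id][key] = value

    def _in_scope(self, message_id: str, group_id: Optional[str]) -> bool:
        # group_id=None means "all of the client's groups"
        return group_id is None or self.group_of.get(message_id) == group_id

    def get_ids(self, key: str, group_id: Optional[str] = None) -> Set[str]:
        """IDs of all messages with the given metadata key."""
        return {m for m, kv in self.meta.items()
                if key in kv and self._in_scope(m, group_id)}

    def get_values(self, key: str, group_id: Optional[str] = None) -> Dict[str, bytes]:
        """Ditto, also retrieving the metadata value."""
        return {m: kv[key] for m, kv in self.meta.items()
                if key in kv and self._in_scope(m, group_id)}


store = MetadataStore()
store.add_message("m1", "forum")
store.add_message("m2", "blog")
store.set_metadata("m1", "starred", b"1")
store.set_metadata("m2", "starred", b"1")
store.set_metadata("m2", "subject", b"Hello")
print(sorted(store.get_ids("starred")))   # ['m1', 'm2']
print(store.get_ids("starred", "forum"))  # {'m1'}
print(store.get_values("subject"))        # {'m2': b'Hello'}
```

Passing `group_id=None` models the "all the client's groups" scope. A database-backed version would presumably answer the same two queries from an indexed table rather than by scanning every message.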

Sketch of how full text search would work:

* The client parses each message and extracts search words
* The client creates a metadata key for each search word
    * The metadata value is a list of positions where the word appears in the message
* One metadata query finds all messages matching a word
* Boolean operators are handled by the client (we could push this down to the sync layer later if useful)
* To search for the phrase "foo bar", use two queries to get the message IDs and metadata values for "foo" and "bar", then manually combine the results to find message IDs where `position("foo") + 1 == position("bar")` for some pair of positions in the two lists
    * This could be done with a single join query in SQL, so this approach doesn't score perfectly on expressiveness or performance
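
The steps above can be sketched in Python under simplifying assumptions: a plain dictionary stands in for the sync layer's metadata table, the `word:` key prefix and the regex tokenizer are hypothetical choices, and position lists are kept as Python lists rather than serialized metadata values.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical stand-in for the sync layer's metadata: metadata[message_id][key] -> value
metadata: Dict[str, Dict[str, List[int]]] = defaultdict(dict)


def index_message(message_id: str, body: str) -> None:
    """Client side: extract search words and store their positions as metadata."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for pos, word in enumerate(re.findall(r"\w+", body.lower())):
        positions[word].append(pos)
    for word, pos_list in positions.items():
        metadata[message_id]["word:" + word] = pos_list


def query(key: str) -> Dict[str, List[int]]:
    """One metadata query: IDs and values of all messages with the given key."""
    return {m: kv[key] for m, kv in metadata.items() if key in kv}


def search_and(*words: str) -> Set[str]:
    """Boolean AND, evaluated by the client by intersecting the ID sets."""
    results = [set(query("word:" + w.lower())) for w in words]
    return set.intersection(*results) if results else set()


def search_phrase(first: str, second: str) -> Set[str]:
    """Two queries combined by the client: some position p of `first`
    must be followed by position p + 1 of `second` in the same message."""
    a = query("word:" + first.lower())
    b = query("word:" + second.lower())
    matches = set()
    for message_id in a.keys() & b.keys():
        following = set(b[message_id])
        if any(p + 1 in following for p in a[message_id]):
            matches.add(message_id)
    return matches


index_message("m1", "Foo bar baz")
index_message("m2", "bar foo")
print(sorted(search_and("foo", "bar")))     # ['m1', 'm2']
print(sorted(search_phrase("foo", "bar")))  # ['m1']
```

The client-side loop in `search_phrase` is the part that a single SQL join on the positions could replace.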

Sketch of how attachments would work:

* An attachment is a dependency of the message it's attached to
* Sharing the message automatically shares the attachment
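
A minimal Python sketch of the attachment case, assuming a hypothetical JSON body format with an `attachments` field and a toy `SyncLayer` class (neither is part of BSP). The body stays opaque to the sync layer, which only learns the dependencies the client reports during validation and follows them when sharing.

```python
import json
from typing import Dict, List, Set


class SyncLayer:
    """Hypothetical model: the sync layer sees message IDs and the
    dependencies clients report, never the message bodies themselves."""

    def __init__(self) -> None:
        self.dependencies: Dict[str, List[str]] = {}
        self.shared: Set[str] = set()

    def set_dependencies(self, message_id: str, deps: List[str]) -> None:
        self.dependencies[message_id] = list(deps)

    def share(self, message_id: str) -> None:
        """Share a message and, transitively, everything it depends on."""
        stack = [message_id]
        while stack:
            m = stack.pop()
            if m in self.shared:
                continue
            self.shared.add(m)
            stack.extend(self.dependencies.get(m, []))


def validate(sync: SyncLayer, message_id: str, body: bytes) -> None:
    """Client side: parse the (hypothetical JSON) body and report the
    attachments it references as dependencies."""
    parsed = json.loads(body)
    sync.set_dependencies(message_id, parsed.get("attachments", []))


sync = SyncLayer()
validate(sync, "photo", b'{"data": "jpeg bytes"}')
validate(sync, "msg", b'{"text": "Look at this", "attachments": ["photo"]}')
sync.share("msg")
print(sorted(sync.shared))  # ['msg', 'photo']
```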

Sketch of how peer moderation would work:

* Messages depend on their parents
* Sharing a message automatically shares its ancestors
* Moderation votes don't necessarily depend on the messages they refer to (we may want to share votes without sharing the messages they refer to, especially downvotes)
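
The difference between parent links and vote references can be sketched in Python with a toy in-memory dependency graph; `dependencies`, `vote_target`, `add_post`, `add_vote` and `closure` are hypothetical names. Posts depend on their parents, while a vote only records its target as client metadata without making it a dependency.

```python
from typing import Dict, List, Optional, Set

# Hypothetical dependency graph kept by the sync layer: message ID -> dependencies
dependencies: Dict[str, List[str]] = {}
# Client-side metadata: which message each vote refers to (not a dependency)
vote_target: Dict[str, str] = {}


def add_post(message_id: str, parent: Optional[str]) -> None:
    # A post depends on its parent, so sharing it shares all its ancestors
    dependencies[message_id] = [parent] if parent is not None else []


def add_vote(vote_id: str, target: str) -> None:
    # A vote refers to its target but deliberately does not depend on it
    dependencies[vote_id] = []
    vote_target[vote_id] = target


def closure(message_id: str) -> Set[str]:
    """Everything the sync layer would share along with message_id."""
    result: Set[str] = set()
    stack = [message_id]
    while stack:
        m = stack.pop()
        if m not in result:
            result.add(m)
            stack.extend(dependencies.get(m, []))
    return result


add_post("root", None)
add_post("reply", "root")
add_post("reply2", "reply")
add_vote("downvote", "reply2")

print(sorted(closure("reply2")))    # ['reply', 'reply2', 'root']
print(sorted(closure("downvote")))  # ['downvote']
```

Sharing `reply2` pulls in its whole ancestry, whereas sharing the downvote leaves the post it refers to unshared.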

Sketch of how expiry would work:

* All messages older than a certain age are flagged as expired
* The sync layer automatically garbage collects threads without any unexpired messages
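
Expiry can be sketched as two steps: the client flags old messages, and the sync layer collects expired messages that no unexpired message transitively depends on. This Python sketch assumes integer timestamps, an arbitrary `max_age`, and that replies depend only on their parents, as in the peer moderation sketch; under that assumption a thread with no unexpired messages is collected, while a single recent reply keeps its ancestors alive.

```python
from typing import Dict, List, Set


def flag_expired(timestamps: Dict[str, int], now: int, max_age: int) -> Set[str]:
    """Client side: flag every message older than max_age as expired."""
    return {m for m, t in timestamps.items() if now - t > max_age}


def garbage_collect(dependencies: Dict[str, List[str]], expired: Set[str]) -> Set[str]:
    """Sync layer: return the expired messages that are not transitive
    dependencies of any unexpired message, i.e. those safe to delete."""
    keep: Set[str] = set()
    stack = [m for m in dependencies if m not in expired]  # start from unexpired messages
    while stack:
        m = stack.pop()
        if m not in keep:
            keep.add(m)
            stack.extend(dependencies.get(m, []))
    return expired - keep


# Two threads: a1 <- a2 <- a3 and b1 <- b2 (each reply depends on its parent)
dependencies = {"a1": [], "a2": ["a1"], "a3": ["a2"], "b1": [], "b2": ["b1"]}
timestamps = {"a1": 0, "a2": 10, "a3": 95, "b1": 0, "b2": 5}

expired = flag_expired(timestamps, now=100, max_age=30)
print(sorted(expired))                                 # ['a1', 'a2', 'b1', 'b2']
print(sorted(garbage_collect(dependencies, expired)))  # ['b1', 'b2']
```

Thread `b` is collected because all its messages have expired, while the recent reply `a3` keeps its expired ancestors `a1` and `a2` from being deleted.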