|
|
[[_TOC_]]
|
|
[[_TOC_]]
|
|
|
|
|
|
|
|
## Abstract
|
|
# Abstract
|
|
|
|
|
|
|
|
This report describes the Briar team's ongoing research into human mobility patterns and the implications of these patterns for the design of delay-tolerant networks.
|
|
This report describes the Briar team's ongoing research into human mobility patterns and the implications of these patterns for the design of delay-tolerant networks.
|
|
|
We analyse three published datasets that record encounters between people carrying short-range wireless devices in different enviroments: an urban university, an office building and a rural village.
|
|
We analyse three published datasets that record encounters between people carrying short-range wireless devices in different enviroments: an urban university, an office building and a rural village.
|
| ... | @@ -16,13 +16,13 @@ For many sender-recipient pairs, delivery latencies are on the order of days or |
... | @@ -16,13 +16,13 @@ For many sender-recipient pairs, delivery latencies are on the order of days or |
|
|
The consistency of these results across three different environments (university, office building and village) suggests that the findings are robust.
|
|
The consistency of these results across three different environments (university, office building and village) suggests that the findings are robust.
|
|
|
These findings will inform the Briar team's strategy for enabling people to communicate securely without access to the Internet.
|
|
These findings will inform the Briar team's strategy for enabling people to communicate securely without access to the Internet.
|
|
|
|
|
|
|
|
## Research Goals
|
|
# Research Goals
|
|
|
|
|
|
|
|
The goal of this research is to investigate whether offline communication using delay-tolerant networks made up of smartphones or similar mobile devices is feasible.
|
|
The goal of this research is to investigate whether offline communication using delay-tolerant networks made up of smartphones or similar mobile devices is feasible.
|
|
|
|
|
|
|
|
We analyse several human mobility datasets to find bounds on the performance that can be achieved by any network of this kind, given realistic mobility patterns.
|
|
We analyse several human mobility datasets to find bounds on the performance that can be achieved by any network of this kind, given realistic mobility patterns.
|
|
|
|
|
|
|
|
## Human Mobility Datasets
|
|
# Human Mobility Datasets
|
|
|
|
|
|
|
|
The datasets analysed in this research were all produced by similar methods: the researchers asked a group of participants to carry or wear small electronic devices for several days as they went about their lives.
|
|
The datasets analysed in this research were all produced by similar methods: the researchers asked a group of participants to carry or wear small electronic devices for several days as they went about their lives.
|
|
|
The devices were capable of communicating wirelessly over a range of a few metres.
|
|
The devices were capable of communicating wirelessly over a range of a few metres.
|
| ... | @@ -30,7 +30,7 @@ Each device would periodically detect the presence of other participants' device |
... | @@ -30,7 +30,7 @@ Each device would periodically detect the presence of other participants' device |
|
|
|
|
|
|
|
Detection events indicate the participants' mobility patterns only indirectly, but for our purposes this is fine: in designing delay-tolerant networks we're less interested in the precise details of how people move around than in the question of whether wireless devices carried by people can detect and communicate with nearby devices, and this is exactly what the datasets record.
|
|
Detection events indicate the participants' mobility patterns only indirectly, but for our purposes this is fine: in designing delay-tolerant networks we're less interested in the precise details of how people move around than in the question of whether wireless devices carried by people can detect and communicate with nearby devices, and this is exactly what the datasets record.
|
|
|
|
|
|
|
|
### Haggle Content Dataset
|
|
## Haggle Content Dataset
|
|
|
|
|
|
|
|
https://ieee-dataport.org/open-access/crawdad-cambridgehaggle-v-2009-05-29
|
|
https://ieee-dataport.org/open-access/crawdad-cambridgehaggle-v-2009-05-29
|
|
|
|
|
|
| ... | @@ -74,7 +74,7 @@ All contacts with devices not involved in the experiment were ignored. |
... | @@ -74,7 +74,7 @@ All contacts with devices not involved in the experiment were ignored. |
|
|
This is the version of the dataset that we used for our analysis.
|
|
This is the version of the dataset that we used for our analysis.
|
|
|
Specifically, we used the file `haggle-one-cambridge-city-complete.tsv` from the archive `haggle-one-cambridge-city-complete.zip`, published at https://ieee-dataport.org/open-access/crawdad-uoihaggle.
|
|
Specifically, we used the file `haggle-one-cambridge-city-complete.tsv` from the archive `haggle-one-cambridge-city-complete.zip`, published at https://ieee-dataport.org/open-access/crawdad-uoihaggle.
|
|
|
|
|
|
|
|
### SocioPatterns Office Dataset
|
|
## SocioPatterns Office Dataset
|
|
|
|
|
|
|
|
http://www.sociopatterns.org/datasets/contacts-in-a-workplace/
|
|
http://www.sociopatterns.org/datasets/contacts-in-a-workplace/
|
|
|
|
|
|
| ... | @@ -99,7 +99,7 @@ However, due to the much shorter scan interval used in this experiment, the diff |
... | @@ -99,7 +99,7 @@ However, due to the much shorter scan interval used in this experiment, the diff |
|
|
|
|
|
|
|
Our analysis is based on the file `tij_InVS.dat` from the archive `tij_InVS.zip`, published by the SocioPatterns project at http://www.sociopatterns.org/datasets/contacts-in-a-workplace/.
|
|
Our analysis is based on the file `tij_InVS.dat` from the archive `tij_InVS.zip`, published by the SocioPatterns project at http://www.sociopatterns.org/datasets/contacts-in-a-workplace/.
|
|
|
|
|
|
|
|
### SocioPatterns Village Dataset
|
|
## SocioPatterns Village Dataset
|
|
|
|
|
|
|
|
http://www.sociopatterns.org/datasets/contact-patterns-in-a-village-in-rural-malawi/
|
|
http://www.sociopatterns.org/datasets/contact-patterns-in-a-village-in-rural-malawi/
|
|
|
|
|
|
| ... | @@ -119,7 +119,7 @@ We translated the detection events into contacts in the same way as for the offi |
... | @@ -119,7 +119,7 @@ We translated the detection events into contacts in the same way as for the offi |
|
|
|
|
|
|
|
Our analysis is based on the file `tnet_malawi_pilot.csv`, published by the SocioPatterns project at http://www.sociopatterns.org/datasets/contact-patterns-in-a-village-in-rural-malawi/.
|
|
Our analysis is based on the file `tnet_malawi_pilot.csv`, published by the SocioPatterns project at http://www.sociopatterns.org/datasets/contact-patterns-in-a-village-in-rural-malawi/.
|
|
|
|
|
|
|
|
### HYCCUPS 2012 Dataset
|
|
## HYCCUPS 2012 Dataset
|
|
|
|
|
|
|
|
https://ieee-dataport.org/open-access/crawdad-upbhyccups
|
|
https://ieee-dataport.org/open-access/crawdad-upbhyccups
|
|
|
|
|
|
| ... | @@ -150,7 +150,7 @@ We don't know how to reconcile these numbers. |
... | @@ -150,7 +150,7 @@ We don't know how to reconcile these numbers. |
|
|
|
|
|
|
|
After trying to contact the researchers and receiving no reply, we reluctantly decided to exclude this dataset from our research, as we don't understand the data well enough to analyse it.
|
|
After trying to contact the researchers and receiving no reply, we reluctantly decided to exclude this dataset from our research, as we don't understand the data well enough to analyse it.
|
|
|
|
|
|
|
|
## Directed and Undirected Contacts
|
|
# Directed and Undirected Contacts
|
|
|
|
|
|
|
|
The detection events recorded in the datasets are *directed*, meaning that an event in which device `A` detects device `B` is distinct from an event in which `B` detects `A`.
|
|
The detection events recorded in the datasets are *directed*, meaning that an event in which device `A` detects device `B` is distinct from an event in which `B` detects `A`.
|
|
|
|
|
|
| ... | @@ -171,17 +171,15 @@ Therefore we use the less restrictive definition: there's an undirected contact |
... | @@ -171,17 +171,15 @@ Therefore we use the less restrictive definition: there's an undirected contact |
|
|
|
|
|
|
|
The difference between these definitions is less significant for the SocioPatterns datasets, with their 20-second scan intervals, than for the Haggle dataset, where the scan interval of most devices is 10 minutes.
|
|
The difference between these definitions is less significant for the SocioPatterns datasets, with their 20-second scan intervals, than for the Haggle dataset, where the scan interval of most devices is 10 minutes.
|
|
|
|
|
|
|
|
## Initial Analysis of Contact Patterns
|
|
# Initial Analysis of Contact Patterns
|
|
|
|
|
|
|
|
Before moving on to analysing the emergent properties of the datasets we'll discuss some characteristics that are apparent in the raw data.
|
|
Before moving on to analysing the emergent properties of the datasets we'll discuss some characteristics that are apparent in the raw data.
|
|
|
|
|
|
|
|
### Contact Events per Hour
|
|
## Contact Events per Hour
|
|
|
|
|
|
|
|
Counting the number of contact events per hour reveals clear daily patterns in all the datasets.
|
|
Counting the number of contact events per hour reveals clear daily patterns in all the datasets.
|
|
|
(Here we use "contact event" as shorthand for the detection event that begins a contact between two devices.)
|
|
(Here we use "contact event" as shorthand for the detection event that begins a contact between two devices.)
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|

|
|

|
|
|
|
|
|
|
|
**Contact events per hour in the Haggle dataset.**
|
|
**Contact events per hour in the Haggle dataset.**
|
| ... | @@ -220,13 +218,11 @@ The office and village datasets were collected using the same experimental setup |
... | @@ -220,13 +218,11 @@ The office and village datasets were collected using the same experimental setup |
|
|
|
|
|
|
|
The difference may be due in part to the fact that contacts in the office dataset only occur during office hours (roughly a quarter of all hours in the week).
|
|
The difference may be due in part to the fact that contacts in the office dataset only occur during office hours (roughly a quarter of all hours in the week).
|
|
|
|
|
|
|
|
### Number of Connected Devices
|
|
## Number of Connected Devices
|
|
|
|
|
|
|
|
We refer to a device as "connected" when it's involved in one or more undirected contacts.
|
|
We refer to a device as "connected" when it's involved in one or more undirected contacts.
|
|
|
All three datasets show daily patterns in the number of connected devices, with similar patterns to the number of contact events per hour.
|
|
All three datasets show daily patterns in the number of connected devices, with similar patterns to the number of contact events per hour.
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|

|
|

|
|
|
|
|
|
|
|
**Number of connected devices for the Haggle dataset.**
|
|
**Number of connected devices for the Haggle dataset.**
|
| ... | @@ -252,15 +248,13 @@ The Haggle and office datasets have clear differences between weekdays and weeke |
... | @@ -252,15 +248,13 @@ The Haggle and office datasets have clear differences between weekdays and weeke |
|
|
Both of the SocioPatterns datasets show a lot of rapid variation: even within each daily peak, the number of connected devices frequently drops to zero for short periods, whereas in the Haggle dataset the peaks are smoother.
|
|
Both of the SocioPatterns datasets show a lot of rapid variation: even within each daily peak, the number of connected devices frequently drops to zero for short periods, whereas in the Haggle dataset the peaks are smoother.
|
|
|
This may be due in part to the greater detection range of the devices used in the Haggle experiment (5-10 metres versus 1.5 metres), and the fact that the devices used in the SocioPatterns experiments were designed to detect only face-to-face contacts.
|
|
This may be due in part to the greater detection range of the devices used in the Haggle experiment (5-10 metres versus 1.5 metres), and the fact that the devices used in the SocioPatterns experiments were designed to detect only face-to-face contacts.
|
|
|
|
|
|
|
|
### Time Spent Connected
|
|
## Time Spent Connected
|
|
|
|
|
|
|
|
The three datasets differ considerably in the amount of time devices spend connected.
|
|
The three datasets differ considerably in the amount of time devices spend connected.
|
|
|
|
|
|
|
|
The following figure shows the distribution for each dataset.
|
|
The following figure shows the distribution for each dataset.
|
|
|
The boxes show the interquartile range, while the whiskers show the maximum and minimum values.
|
|
The boxes show the interquartile range, while the whiskers show the maximum and minimum values.
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|

|
|

|
|
|
|
|
|
|
|
**Fraction of time spent connected, compared across datasets.**
|
|
**Fraction of time spent connected, compared across datasets.**
|
| ... | @@ -274,7 +268,7 @@ The difference between the Haggle dataset and the others may be due in part to t |
... | @@ -274,7 +268,7 @@ The difference between the Haggle dataset and the others may be due in part to t |
|
|
Contacts in the office dataset only occur during office hours, which are roughly a quarter of all hours in the week.
|
|
Contacts in the office dataset only occur during office hours, which are roughly a quarter of all hours in the week.
|
|
|
But even after accounting for this there's still a large difference between the office and village datasets, indicating that face-to-face contacts in the office environment are sparser than in the village environment.
|
|
But even after accounting for this there's still a large difference between the office and village datasets, indicating that face-to-face contacts in the office environment are sparser than in the village environment.
|
|
|
|
|
|
|
|
## Static Contact Graphs
|
|
# Static Contact Graphs
|
|
|
|
|
|
|
|
The following figures show the *static contact graphs* for the three datasets.
|
|
The following figures show the *static contact graphs* for the three datasets.
|
|
|
|
|
|
| ... | @@ -283,8 +277,6 @@ An edge connects two nodes if the corresponding devices were in contact at any t |
... | @@ -283,8 +277,6 @@ An edge connects two nodes if the corresponding devices were in contact at any t |
|
|
The thickness of the edge represents the total duration the devices spent in contact, in proportion to the maximum duration any two devices spent in contact.
|
|
The thickness of the edge represents the total duration the devices spent in contact, in proportion to the maximum duration any two devices spent in contact.
|
|
|
Contact durations are calculated using the definition of an undirected contact given above.
|
|
Contact durations are calculated using the definition of an undirected contact given above.
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|

|
|

|
|
|
|
|
|
|
|
**Static contact graph for the Haggle dataset.**
|
|
**Static contact graph for the Haggle dataset.**
|
| ... | @@ -314,7 +306,7 @@ The village dataset's static contact graph is sparser than the others: most node |
... | @@ -314,7 +306,7 @@ The village dataset's static contact graph is sparser than the others: most node |
|
|
Several cliques of three to five nodes are visible, perhaps representing households.
|
|
Several cliques of three to five nodes are visible, perhaps representing households.
|
|
|
A small component of two nodes is disconnected from the rest of the graph.
|
|
A small component of two nodes is disconnected from the rest of the graph.
|
|
|
|
|
|
|
|
### Isolated Nodes and Small Components
|
|
## Isolated Nodes and Small Components
|
|
|
|
|
|
|
|
The static contact graph for the Haggle dataset has two isolated nodes representing devices that didn't make contact with any other participating devices during the experiment, while the static contact graph for the village dataset has a small component representing two devices that made contact with each other but not with any other devices.
|
|
The static contact graph for the Haggle dataset has two isolated nodes representing devices that didn't make contact with any other participating devices during the experiment, while the static contact graph for the village dataset has a small component representing two devices that made contact with each other but not with any other devices.
|
|
|
|
|
|
| ... | @@ -324,7 +316,7 @@ In the real world there are many people who are physically isolated from others, |
... | @@ -324,7 +316,7 @@ In the real world there are many people who are physically isolated from others, |
|
|
These people are less likely to be well-served by delay-tolerant networks than people who are more mobile.
|
|
These people are less likely to be well-served by delay-tolerant networks than people who are more mobile.
|
|
|
By including isolated nodes and small components in our analysis, rather than analysing only the main component of the graph, we aim to ensure that this aspect of reality is reflected in our results.
|
|
By including isolated nodes and small components in our analysis, rather than analysing only the main component of the graph, we aim to ensure that this aspect of reality is reflected in our results.
|
|
|
|
|
|
|
|
### Measuring Static Reachability
|
|
## Measuring Static Reachability
|
|
|
|
|
|
|
|
One way to measure the connectivity of a dataset is to count how many devices in the dataset's static contact graph are reachable from a given source device by following the edges of the contact graph.
|
|
One way to measure the connectivity of a dataset is to count how many devices in the dataset's static contact graph are reachable from a given source device by following the edges of the contact graph.
|
|
|
Averaging over all possible source devices gives a metric that we call the *mean static reachability* of the dataset.
|
|
Averaging over all possible source devices gives a metric that we call the *mean static reachability* of the dataset.
|
| ... | @@ -347,7 +339,7 @@ For example, given the mobility patterns recorded in the Haggle dataset, no dela |
... | @@ -347,7 +339,7 @@ For example, given the mobility patterns recorded in the Haggle dataset, no dela |
|
|
|
|
|
|
|
Next we'll see how we can tighten this upper bound by including information about the order in which contacts happened.
|
|
Next we'll see how we can tighten this upper bound by including information about the order in which contacts happened.
|
|
|
|
|
|
|
|
## Dynamic Contact Graphs
|
|
# Dynamic Contact Graphs
|
|
|
|
|
|
|
|
The static contact graph captures information about which devices made contact with each other during the experiment, but it doesn't include any information about the order in which contacts happened.
|
|
The static contact graph captures information about which devices made contact with each other during the experiment, but it doesn't include any information about the order in which contacts happened.
|
|
|
To capture this information as well, we can construct a *dynamic contact graph* for each dataset.
|
|
To capture this information as well, we can construct a *dynamic contact graph* for each dataset.
|
| ... | @@ -384,21 +376,19 @@ A dynamic contact graph can be built from directed or undirected contacts, resul |
... | @@ -384,21 +376,19 @@ A dynamic contact graph can be built from directed or undirected contacts, resul |
|
|
As discussed above, for the purposes of this research we're mainly interested in undirected contacts, because the delay-tolerant networks we're interested in building will use two-way communication to exchange messages.
|
|
As discussed above, for the purposes of this research we're mainly interested in undirected contacts, because the delay-tolerant networks we're interested in building will use two-way communication to exchange messages.
|
|
|
But it's possible to imagine delay-tolerant networks that use one-way broadcasts to propagate messages, and such networks can be modeled using dynamic contact graphs with directed contacts - that is, with directed edges within each layer.
|
|
But it's possible to imagine delay-tolerant networks that use one-way broadcasts to propagate messages, and such networks can be modeled using dynamic contact graphs with directed contacts - that is, with directed edges within each layer.
|
|
|
|
|
|
|
|
### Events With the Same Timestamp
|
|
## Events With the Same Timestamp
|
|
|
|
|
|
|
|
The Haggle and SocioPatterns datasets use timestamps with one-second precision and contain many events with the same timestamp.
|
|
The Haggle and SocioPatterns datasets use timestamps with one-second precision and contain many events with the same timestamp.
|
|
|
When building a dynamic contact graph, if two or more events have the same timestamp then we process them in the order in which they occurred in the original dataset.
|
|
When building a dynamic contact graph, if two or more events have the same timestamp then we process them in the order in which they occurred in the original dataset.
|
|
|
In particular, an event indicating the end of a contact is always processed after the event indicating the beginning of the same contact, even if they have the same timestamp.
|
|
In particular, an event indicating the end of a contact is always processed after the event indicating the beginning of the same contact, even if they have the same timestamp.
|
|
|
(This is common, because as discussed above, many contacts have recorded durations of zero, and so the events indicating the beginning and end of such a contact have the same timestamp.)
|
|
(This is common, because as discussed above, many contacts have recorded durations of zero, and so the events indicating the beginning and end of such a contact have the same timestamp.)
|
|
|
|
|
|
|
|
### Example
|
|
## Example
|
|
|
|
|
|
|
|
The following figures show how to construct a dynamic contact graph for a simple dataset with three devices: `A`, `B` and `C`.
|
|
The following figures show how to construct a dynamic contact graph for a simple dataset with three devices: `A`, `B` and `C`.
|
|
|
The dataset starts at time `t1`.
|
|
The dataset starts at time `t1`.
|
|
|
There's a contact between `A` and `B` beginning at `t2` and ending at `t3`, and later there's a contact between `B` and `C` beginning at `t4` and ending at `t5`.
|
|
There's a contact between `A` and `B` beginning at `t2` and ending at `t3`, and later there's a contact between `B` and `C` beginning at `t4` and ending at `t5`.
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|

|
|

|
|
|
|
|
|
|
|
**The bottom layer of the contact graph has a node for each device.**
|
|
**The bottom layer of the contact graph has a node for each device.**
|
| ... | @@ -430,7 +420,7 @@ Thus, unlike a static contact graph, the dynamic contact graph captures the fact |
... | @@ -430,7 +420,7 @@ Thus, unlike a static contact graph, the dynamic contact graph captures the fact |
|
|
|
|
|
|
|
If we collapsed all the layers of the dynamic contact graph into a single layer then we'd recover the static contact graph.
|
|
If we collapsed all the layers of the dynamic contact graph into a single layer then we'd recover the static contact graph.
|
|
|
|
|
|
|
|
## Dynamic Reachability Analysis
|
|
# Dynamic Reachability Analysis
|
|
|
|
|
|
|
|
By building a dynamic contact graph for each dataset we can begin to study which devices are reachable from which others at which times, which tells us which devices would potentially be able to send messages to each other in a delay-tolerant network.
|
|
By building a dynamic contact graph for each dataset we can begin to study which devices are reachable from which others at which times, which tells us which devices would potentially be able to send messages to each other in a delay-tolerant network.
|
|
|
|
|
|
| ... | @@ -447,7 +437,7 @@ The upper bound is also useful as a way of assessing whether *any* delay-toleran |
... | @@ -447,7 +437,7 @@ The upper bound is also useful as a way of assessing whether *any* delay-toleran |
|
|
If the upper bound for delivery performance is too low to be useful then we don't need to investigate the performance of specific designs.
|
|
If the upper bound for delivery performance is too low to be useful then we don't need to investigate the performance of specific designs.
|
|
|
A negative result at this stage could save us a lot of effort designing and deploying real networks, if we can see that no network could possibly achieve the level of performance we need.
|
|
A negative result at this stage could save us a lot of effort designing and deploying real networks, if we can see that no network could possibly achieve the level of performance we need.
|
|
|
|
|
|
|
|
### Measuring Dynamic Reachability
|
|
## Measuring Dynamic Reachability
|
|
|
|
|
|
|
|
Just as we measured static reachability in a static contact graph by counting how many devices are reachable from a given source device, in a dynamic contact graph we can measure *dynamic reachability* by counting how many devices are reachable *from a given source device, starting at a given time*.
|
|
Just as we measured static reachability in a static contact graph by counting how many devices are reachable from a given source device, in a dynamic contact graph we can measure *dynamic reachability* by counting how many devices are reachable *from a given source device, starting at a given time*.
|
|
|
Averaging over all possible source devices gives a metric that we call the *mean dynamic reachability* at the given time.
|
|
Averaging over all possible source devices gives a metric that we call the *mean dynamic reachability* at the given time.
|
| ... | @@ -461,8 +451,6 @@ The figures below show the mean dynamic reachability for each dataset as a funct |
... | @@ -461,8 +451,6 @@ The figures below show the mean dynamic reachability for each dataset as a funct |
|
|
The dashed line in each figure shows the mean static reachability, for comparison.
|
|
The dashed line in each figure shows the mean static reachability, for comparison.
|
|
|
Static reachability is always at least as high as dynamic reachability, as static reachability ignores the order in which contacts occur.
|
|
Static reachability is always at least as high as dynamic reachability, as static reachability ignores the order in which contacts occur.
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|

|
|

|
|
|
|
|
|
|
|
**Mean dynamic reachability for the Haggle dataset.**
|
|
**Mean dynamic reachability for the Haggle dataset.**
|
| ... | @@ -504,8 +492,6 @@ If the hypothesis of a boundary effect is wrong and the observed decline in reac |
... | @@ -504,8 +492,6 @@ If the hypothesis of a boundary effect is wrong and the observed decline in reac |
|
|
On the other hand, if the hypothesis of a boundary effect is right then cutting off the end of the dataset will reduce the dynamic reachability at times before the cutoff time.
|
|
On the other hand, if the hypothesis of a boundary effect is right then cutting off the end of the dataset will reduce the dynamic reachability at times before the cutoff time.
|
|
|
By measuring how far back from the cutoff time the change in reachability extends, we'll be able to estimate how far back from the end of the full dataset the boundary effect extends.
|
|
By measuring how far back from the cutoff time the change in reachability extends, we'll be able to estimate how far back from the end of the full dataset the boundary effect extends.
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|

|
|

|
|
|
|
|
|
|
|
**Mean dynamic reachability for the Haggle dataset with different cutoff times.**
|
|
**Mean dynamic reachability for the Haggle dataset with different cutoff times.**
|
| ... | @@ -531,7 +517,7 @@ Simulations based on these datasets would not yield meaningful results: messages |
... | @@ -531,7 +517,7 @@ Simulations based on these datasets would not yield meaningful results: messages |
|
|
So we have a problem.
|
|
So we have a problem.
|
|
|
How can we prevent the boundary effect from distorting our results?
|
|
How can we prevent the boundary effect from distorting our results?
|
|
|
|
|
|
|
|
## Looping the Data
|
|
# Looping the Data
|
|
|
|
|
|
|
|
In an ideal world we'd like to go back and repeat the experiments over much longer durations, capturing several week of data.
|
|
In an ideal world we'd like to go back and repeat the experiments over much longer durations, capturing several week of data.
|
|
|
We would measure the extent of the boundary effect by truncating the data, as we did above, and then run the reachability analysis for all start times up to the beginning of the boundary effect.
|
|
We would measure the extent of the boundary effect by truncating the data, as we did above, and then run the reachability analysis for all start times up to the beginning of the boundary effect.
|
| ... | @@ -548,8 +534,6 @@ It's easy to find suitable start and end times for the office dataset, which has |
... | @@ -548,8 +534,6 @@ It's easy to find suitable start and end times for the office dataset, which has |
|
|
|
|
|
|
|
The following figures show the part of each dataset that we selected for looping.
|
|
The following figures show the part of each dataset that we selected for looping.
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|

|
|

|
|
|
|
|
|
|
|
**Data selected to be looped for the Haggle dataset.**
|
|
**Data selected to be looped for the Haggle dataset.**
|
| ... | @@ -568,15 +552,13 @@ The following figures show the part of each dataset that we selected for looping |
... | @@ -568,15 +552,13 @@ The following figures show the part of each dataset that we selected for looping |
|
|
|
|
|
|
|
---
|
|
---
|
|
|
|
|
|
|
|
## Static Contact Graphs for Looped Data
|
|
# Static Contact Graphs for Looped Data
|
|
|
|
|
|
|
|
The static contact graphs for the looped data are somewhat sparser than the graphs for the full datasets.
|
|
The static contact graphs for the looped data are somewhat sparser than the graphs for the full datasets.
|
|
|
|
|
|
|
|
Recall that in a static contact graph, each node represents a device and an edge connects two nodes if the corresponding devices were in contact at any time.
|
|
Recall that in a static contact graph, each node represents a device and an edge connects two nodes if the corresponding devices were in contact at any time.
|
|
|
The thickness of the edge represents the total duration the devices spent in contact, in proportion to the maximum duration any two devices spent in contact.
|
|
The thickness of the edge represents the total duration the devices spent in contact, in proportion to the maximum duration any two devices spent in contact.
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|

|
|

|
|
|
|
|
|
|
|
**Static contact graph for the looped Haggle dataset.**
|
|
**Static contact graph for the looped Haggle dataset.**
|
| ... | @@ -613,12 +595,10 @@ Due to these isolated nodes and small components, static reachability is lower f |
... | @@ -613,12 +595,10 @@ Due to these isolated nodes and small components, static reachability is lower f |
|
|
| Office | 100.0% | 95.7% |
|
|
| Office | 100.0% | 95.7% |
|
|
|
| Village | 95.4% | 82.2% |
|
|
| Village | 95.4% | 82.2% |
|
|
|
|
|
|
|
|
## Dynamic Reachability Analysis for Looped Data
|
|
# Dynamic Reachability Analysis for Looped Data
|
|
|
|
|
|
|
|
Repeating the dynamic reachability analysis with five weeks of looped data confirms that the decline in reachability seen with the original data is due to a boundary effect.
|
|
Repeating the dynamic reachability analysis with five weeks of looped data confirms that the decline in reachability seen with the original data is due to a boundary effect.
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|

|
|

|
|
|
|
|
|
|
|
**Mean dynamic reachability for the looped Haggle dataset.**
|
|
**Mean dynamic reachability for the looped Haggle dataset.**
|
| ... | @@ -645,14 +625,12 @@ For the office dataset, the extent of the boundary effect is 14 days, while for |
... | @@ -645,14 +625,12 @@ For the office dataset, the extent of the boundary effect is 14 days, while for |
|
|
This leaves us with at least two weeks of "clean" data at the start of each looped dataset.
|
|
This leaves us with at least two weeks of "clean" data at the start of each looped dataset.
|
|
|
The rest of the analysis of the looped data will focus on these two weeks of "clean" data.
|
|
The rest of the analysis of the looped data will focus on these two weeks of "clean" data.
|
|
|
|
|
|
|
|
### Time-to-Live
|
|
## Time-to-Live
|
|
|
|
|
|
|
|
If our understanding of the boundary effect caused by the end of the dataset is right then some paths take many days to reach their destinations.
|
|
If our understanding of the boundary effect caused by the end of the dataset is right then some paths take many days to reach their destinations.
|
|
|
We can explore this further by imposing a *time-to-live*: paths that take longer than this to reach their destinations are ignored when measuring reachability.
|
|
We can explore this further by imposing a *time-to-live*: paths that take longer than this to reach their destinations are ignored when measuring reachability.
|
|
|
This lets us see how many source-destination pairs can reach each other within a given amount of time.
|
|
This lets us see how many source-destination pairs can reach each other within a given amount of time.
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|

|
|

|
|
|
|
|
|
|
|
**Mean dynamic reachability for the looped Haggle dataset, with various time-to-live values.**
|
|
**Mean dynamic reachability for the looped Haggle dataset, with various time-to-live values.**
|
| ... | @@ -696,7 +674,7 @@ In principle it's possible for paths to depend on longer chains of meetings, and |
... | @@ -696,7 +674,7 @@ In principle it's possible for paths to depend on longer chains of meetings, and |
|
|
|
|
|
|
|
We can see from the figures above that this is not just a theoretical concern: for all of the datasets but especially the village dataset, a significant fraction of paths don't reach their destinations within one week.
|
|
We can see from the figures above that this is not just a theoretical concern: for all of the datasets but especially the village dataset, a significant fraction of paths don't reach their destinations within one week.
|
|
|
|
|
|
|
|
### Path Latency
|
|
## Path Latency
|
|
|
|
|
|
|
|
The figures in the previous section can be read in two ways: for a given time-to-live, they can tell us the fraction of paths that reach their destinations; or for a given probability of reaching the destination, they can tell us the time-to-live required to achieve that probability.
|
|
The figures in the previous section can be read in two ways: for a given time-to-live, they can tell us the fraction of paths that reach their destinations; or for a given probability of reaching the destination, they can tell us the time-to-live required to achieve that probability.
|
|
|
In other words, the figures show the distribution of path latencies.
|
|
In other words, the figures show the distribution of path latencies.
|
| ... | @@ -706,7 +684,7 @@ The distribution changes throughout the week due to weekly patterns in the parti |
... | @@ -706,7 +684,7 @@ The distribution changes throughout the week due to weekly patterns in the parti |
|
|
|
|
|
|
|
It's clear that for all of the datasets, these latencies are much longer than the latencies that users of Internet-based communication tools are used to.
|
|
It's clear that for all of the datasets, these latencies are much longer than the latencies that users of Internet-based communication tools are used to.
|
|
|
|
|
|
|
|
### Maximum Path Latency
|
|
## Maximum Path Latency
|
|
|
|
|
|
|
|
For each dataset, we can find the minimum time-to-live that achieves the same level of reachability as an unlimited time-to-live.
|
|
For each dataset, we can find the minimum time-to-live that achieves the same level of reachability as an unlimited time-to-live.
|
|
|
This tells us the *maximum path latency* of the dataset.
|
|
This tells us the *maximum path latency* of the dataset.
|
| ... | @@ -729,7 +707,7 @@ Real-world performance may therefore *decrease* when the time-to-live is increas |
... | @@ -729,7 +707,7 @@ Real-world performance may therefore *decrease* when the time-to-live is increas |
|
|
|
|
|
|
|
Fully understanding these performance tradeoffs requires a more realistic simulation of a specific delay-tolerant network design, including its rules for deciding which messages to store and forward when storage and bandwidth are limited.
|
|
Fully understanding these performance tradeoffs requires a more realistic simulation of a specific delay-tolerant network design, including its rules for deciding which messages to store and forward when storage and bandwidth are limited.
|
|
|
|
|
|
|
|
## Summary of Findings
|
|
# Summary of Findings
|
|
|
|
|
|
|
|
* All of the datasets show daily patterns in the frequency of contact events and the number of connected devices. The Haggle and office datasets also show weekly patterns.
|
|
* All of the datasets show daily patterns in the frequency of contact events and the number of connected devices. The Haggle and office datasets also show weekly patterns.
|
|
|
* The datasets differ greatly in the amount of time devices spend connected.
|
|
* The datasets differ greatly in the amount of time devices spend connected.
|
| ... | @@ -740,7 +718,7 @@ Fully understanding these performance tradeoffs requires a more realistic simula |
... | @@ -740,7 +718,7 @@ Fully understanding these performance tradeoffs requires a more realistic simula |
|
|
* The static contact graphs for the looped data are qualitatively similar to those for the original data, but with a higher incidence of isolated nodes and small components. Thus static reachability for the office and village datasets is lower for the looped data than for the original data.
|
|
* The static contact graphs for the looped data are qualitatively similar to those for the original data, but with a higher incidence of isolated nodes and small components. Thus static reachability for the office and village datasets is lower for the looped data than for the original data.
|
|
|
* Path latencies are high for all datasets. For the Haggle dataset, 44-79% of paths take more than a day to reach their destinations (depending on the time of week), while the slowest paths take 14 days to reach their destinations. The other datasets have even higher latencies.
|
|
* Path latencies are high for all datasets. For the Haggle dataset, 44-79% of paths take more than a day to reach their destinations (depending on the time of week), while the slowest paths take 14 days to reach their destinations. The other datasets have even higher latencies.
|
|
|
|
|
|
|
|
## Conclusions and Future Work
|
|
# Conclusions and Future Work
|
|
|
|
|
|
|
|
We extracted static and dynamic contact graphs from three datasets and analysed the graphs to obtain some bounds on the performance achievable by any delay-tolerant network, given the mobility patterns recorded in the datasets.
|
|
We extracted static and dynamic contact graphs from three datasets and analysed the graphs to obtain some bounds on the performance achievable by any delay-tolerant network, given the mobility patterns recorded in the datasets.
|
|
|
|
|
|
| ... | |
... | |
| ... | | ... | |