Changes

akwizgran · 6f34f412
--- a/Analysis-of-Human-Mobility-Datasets.md
+++ b/Analysis-of-Human-Mobility-Datasets.md
+[[_TOC_]]
+## Abstract
+This report describes the Briar team's ongoing research into human mobility patterns and their implications for the design of delay-tolerant networks.
+We analysed three published datasets that record encounters between people carrying short-range wireless devices in different enviroments: an urban university, an office building and a rural village.
+We constructed a *transitive contact graph* for each dataset and analysed these graphs to understand the emergent structures created by the individual encounters.
+By analysing the transitive contact graphs we were able to obtain some bounds on the performance achievable by any delay-tolerant network when operating under the conditions recorded in the datasets.
+This analysis produced two interesting findings.
+First, the upper bound for *delivery probability* is high: under the conditions recorded in the datasets, a delay-tolerant network can, in principle, deliver a message from any sender to any recipient with high probability.
+Second, the lower bound for *delivery latency* is also high: under the conditions recorded in the datasets, a delay-tolerant network cannot, even in principle, deliver messages quickly between all potential senders and recipients.
+For some sender-recipient pairs, delivery latencies were on the order of days or weeks, rather than the seconds or minutes that users of internet-based communication tools are accustomed to.
+The consistency of these results across three different environments (university, office building and village) suggests that the findings are robust.
+These findings will inform the Briar team's strategy for enabling people to communicate securely without access to the internet.
+## Research Goals
+The goal of this research is to investigate whether offline communication using a delay-tolerant network made up of smartphones or similar mobile devices is feasible.
+We analyse several human mobility datasets to find bounds on the performance that can be achieved by any network of this kind, given realistic mobility patterns.
+## Human Mobility Datasets
+The datasets analysed in this research were all produced by similar methods: the researchers asked a group of participants to carry or wear small electronic devices for several days as they went about their lives.
+The devices were capable of communicating wirelessly over a range of a few metres.
+Each device would periodically detect the presence of other participants' devices nearby and record the time at which these *detection events* took place.
+Detection events indicate the participants' mobility patterns only indirectly, but for our purposes this is fine: in designing delay-tolerant networks we're interested in the question of whether wireless devices carried by people can detect and communicate with nearby devices, and this is exactly what the datasets record.
+### Haggle Content Dataset
+https://ieee-dataport.org/open-access/crawdad-cambridgehaggle-v-2009-05-29
+* 54 devices (36 mobile and 18 stationary), 52 of which had at least one contact
+* Duration 12 days
+* Scan interval 2-10 minutes
+This dataset was collected by researchers at Cambridge University in 2005.
+The dataset records Bluetooth contacts between small devices carried by participants, or placed at locations that the researchers expected participants to visit, in and around the city of Cambridge, UK.
+One of the stationary devices was placed at the reception of the Computer Lab where most of the participants were students.
+Each mobile device scanned for nearby devices every 10 minutes.
+The stationary devices had scan intervals ranging from 2 to 10 minutes.
+The scan intervals were slightly randomised to avoid a situation where two devices would repeatedly fail to detect each other due to scanning at the same time.
+The researchers deployed 40 mobile and 21 stationary devices, of which 36 mobile and 18 stationary devices produced usable results.
+Two of the stationary devices did not have any contacts with other devices involved in the experiment.
+For 25 days, the devices recorded contacts with each other and with devices not involved in the experiment.
+After 12 days there were no further contacts between devices involved in the experiment, although contacts with other devices were recorded for a further 13 days.
+For our purposes we're only interested in contacts between devices involved in the experiment, so we ignore the contacts with other devices and treat the dataset as having a duration of 12 days.
+Due to the limited amount of storage on the devices, the raw scan results were not recorded.
+Instead, each device recorded the start and end times of contacts with other devices.
+This was done by keeping track of which devices had been detected in the last two scans.
+A contact between devices was considered to start when a device was detected that had not been detected in the last two scans, and to end when the device was not detected in two consecutive scans.
+The last time when the device had been detected was then recorded as the end time of the contact.
+The researchers used two consecutive detection failures to define the end of a contact because there were too many cases where devices were not detected in one scan but were detected in the next, and there was not enough storage on the devices to record all these intermittent contacts.
+Measuring the contact duration from the time when a device was first detected to the time when it was last detected is a reasonable choice, but it means that the recorded contact duration should be understood as a lower bound on the amount of time that the device was within communication range.
+The device might actually have come into range almost one scan interval before it was first detected, and remained within range almost one scan interval after it was last detected.
+Due to the long scan intervals in this dataset (10 minutes for most devices), this leads to considerable uncertainty about the true contact durations.
+In particular, if a device was detected in one scan, with a gap of two scans on either side in which is was not detected, then the recorded contact duration would be zero (or very close to zero, if the device was detected twice in the same scan, which occasionally happened).
+About 36% of the contacts in the dataset have a recorded duration of zero.
+Researchers at the University of Ioannina converted the Haggle Content dataset into the format used by the Opportunistic Network Environment (ONE) simulator.
+All contacts with devices not involved in the experiment were ignored.
+This is the version of the dataset that we used for our analysis.
+Specifically, we used the file `haggle-one-cambridge-city-complete.tsv` from the archive `haggle-one-cambridge-city-complete.zip` published at https://ieee-dataport.org/open-access/crawdad-uoihaggle.
+### SocioPatterns Office Dataset
+http://www.sociopatterns.org/datasets/contacts-in-a-workplace/
+* 92 mobile devices, all of which had at least one contact
+* Duration 12 days
+* Scan interval 20 seconds
+This dataset was collected by researchers at the SocioPatterns project, a collaboration between researchers at the ISI Foundation in Turin, the CNRS in Marseille, and Bitmanufactory in Cambridge.
+A research paper describing the experiment is available at https://doi.org/10.1017/nws.2015.10.
+This dataset records contacts between workers in an office building in France in 2013.
+The participants wore devices for 12 days that scanned for nearby devices every 20 seconds.
+The devices were designed so that they would only register face-to-face contacts between the participants at a distance of up to 1.5 metres.
+Unlike the Haggle dataset, the list of devices detected by each scan was published (in anonymised form), leaving us with some flexibility in translating the detection events into contacts.
+For consistency with the Haggle dataset, we decided to define a contact as a series of consecutive detections of the same device, with the contact duration measured from the time of the first detection to the time of the last detection.
+This means that as with the Haggle dataset, the recorded contact duration should be understood as a lower bound on the actual duration.
+In particular, if a device was only detected in a single scan then the recorded contact duration would be zero.
+However, due to the much shorter scan interval in this experiment, the difference between the recorded contact duration and the true duration should be smaller than for the Haggle dataset.
+Our analysis is based on the file `tij_InVS.dat` from the archive `tij_InVS.zip` published by the SocioPatterns project at http://www.sociopatterns.org/datasets/contacts-in-a-workplace/.
+### SocioPatterns Village Dataset
+http://www.sociopatterns.org/datasets/contact-patterns-in-a-village-in-rural-malawi/
+* 86 mobile devices, all of which had at least one contact
+* Duration 14 days
+* Scan interval 20 seconds
+This dataset was collected by researchers at the SocioPatterns project.
+It records contacts between individuals living in a village in rural Malawi in 2019-2020.
+The recording method was similar to the SocioPatterns office dataset: the participants wore devices for 14 days that scanned for nearby devices every 20 seconds, registering face-to-face contacts between the participants.
+A research paper describing the experiment is available at https://doi.org/10.1140/epjds/s13688-021-00302-w.
+We translated the detection events into contacts in the same way as for the office dataset.
+Our analysis is based on the file `tnet_malawi_pilot.csv` published by the SocioPatterns project at http://www.sociopatterns.org/datasets/contact-patterns-in-a-village-in-rural-malawi/.
+### HYCCUPS 2012 Dataset
+https://ieee-dataport.org/open-access/crawdad-upbhyccups
+* 73 devices, 43 of which had at least one contact
+* Duration 63 days
+This dataset was collected by researchers at the University Politehnica of Bucharest in 2012.
+The participants were UPB students and staff.
+The dataset records contacts between the participants' Android smartphones, which used a peer-to-peer networking library called AllJoyn to detect other participants' smartphones that were connected to the same Wi-Fi network.
+Details of how AllJoyn detects other devices, and how the detection events are recorded in the dataset, have unfortunately not been published.
+The AllJoyn library and the Android app used by the researchers are no longer available, so we can't reverse-engineer this information from the source code.
+This makes it difficult to interpret the data or make comparisons with other datasets.
+One unusual feature of this dataset is that contacts are rarely symmetrical.
+That is, if device `A` records a contact with device `B` then it's uncommon for device `B` to record a contact with device `A` at the same time.
+For the other datasets this is very common, which makes intuitive sense: if two devices remain within detection range of each other for long enough then we'd expect them both to detect each other.
+The HYCCUPS dataset considers two devices to be in contact if they're connected to the same Wi-Fi network, rather than if they're within detection range of each other, but this is still a symmetrical condition (if `A` is connected to the same network as `B` then `B` is connected to the same network as `A`), so we'd still expect the recorded contacts to be roughly symmetrical.
+The lack of symmetry may be due to some feature of the AllJoyn library - perhaps if `A` connects to `B` via AllJoyn then `B` stops trying to connect to `A`, for example.
+But without being able to confirm this guess it's difficult to draw meaningful conclusions from the data.
+Another open question about this dataset is how the data relates to the researchers' publications.
+The 2012 paper "Exploring Predictability in Mobile Interaction", which is based on this dataset, mentions that there were about 21,000 encounters via Wi-Fi and about 7,000 via Bluetooth during the experiment.
+The dataset includes about 8,400 Wi-Fi contacts.
+We don't know how to reconcile these numbers.
+After trying to contact the researchers and receiving no reply, we reluctantly decided to exclude this dataset from our research, as we don't understand the data well enough to analyse it.
+## Directed and Undirected Contacts
+The detection events recorded in the datasets are *directed*, meaning that an event in which device `A` detects device `B` is distinct from an event in which `B` detects `A`.
+We expect that the delay-tolerant networks we're interested in building will establish two-way communication between devices when they detect each other, so that the devices can exchange messages in both directions.
+Since the Haggle and SocioPatterns experiments only recorded detection events and did not try to establish two-way communication between devices, we need to find a reasonable way of modeling two-way communication based on the detection events that were recorded.
+We model this by assuming that if device `A` can detect device `B` or vice versa then two-way communication between `A` and `B` is possible.
+Thus we define an *undirected contact* between `A` and `B` as beginning when there is a directed contact from `A` to `B` or vice versa, and ending when there is no longer a directed contact from `A` to `B` or vice versa.
+An alternative approach would be to assume that if `A` can detect `B` *and* vice versa then two-way communication is possible.
+Thus an undirected contact would start when there were directed contacts in both directions, and end when either of the directed contacts ended.
+We consider this alternative definition to be too restrictive because it doesn't take into account the periodic nature of scanning: if `A` detects `B` shortly after `B` has completed its scan, then nearly a full scan interval will elapse before `B` has an opportunity to detect `A`, even if the devices remain within detection range the whole time.
+This is less significant for the SocioPatterns datasets, with their 20-second scan intervals, than for the Haggle dataset, where the scan interval of most devices is 10 minutes.
+Therefore we use the less restrictive definition: there's an undirected contact between `A` and `B` whenever there's a directed contact from `A` to `B` or vice versa.
+## Initial Analysis of Contact Patterns
+Before moving on to analysing the emergent properties of the datasets we'll discuss some characteristics that are apparent in the raw data.
+### Contact Graphs
+The following figures show the contact graphs for the three datasets.
+The nodes represent devices.
+An edge connects two nodes if the corresponding devices were in contact at any time during the experiment.
+The thickness of the edge represents the total duration the devices spent in contact during the experiment, in proportion to the maximum duration any two devices spent in contact.
+Contact durations are calculated using the definition of an undirected contact given above.
+---
+![haggle-contact-graph.png](uploads/img/haggle-contact-graph.png)
+**Contact graph for the Haggle dataset.**
+---
+![office-contact-graph.png](uploads/img/office-contact-graph.png)
+**Contact graph for the SocioPatterns office dataset.**
+---
+![malawi-contact-graph.png](uploads/img/malawi-contact-graph.png)
+**Contact graph for the SocioPatterns village dataset.**
+---
+The Haggle contact graph has a densely connected core and a sparsely connected periphery.
+Within the core are two clusters of nodes connected by thick edges, representing devices that spent a lot of time in contact with each other.
+The periphery contains about a dozen devices that each made contact with one or two devices from the core.
+There are two isolated nodes representing the two devices that didn't make contact with any other devices.
+The office contact graph is more uniform: there's no clear core or periphery, no obvious clusters, and most of the edges have similar thickness.
+The village contact graph is sparser than the others: most nodes have fewer edges than in the other graphs.
+Several cliques of three to five nodes are visible, perhaps representing households.
+A small component of two nodes is disconnected from the rest of the graph.
+### Contact Events per Hour
+Counting the number of contact events per hour reveals clear daily patterns in all the datasets.
+(Here we use "contact event" as shorthand for the detection event that begins a contact between two devices.)
+---
+![haggle-contact-frequency.png](uploads/img/haggle-contact-frequency.png)
+**Contact events per hour in the Haggle dataset.**
+---
+![office-contact-frequency.png](uploads/img/office-contact-frequency.png)
+**Contact events per hour in the SocioPatterns office dataset.**
+---
+![malawi-contact-frequency.png](uploads/img/malawi-contact-frequency.png)
+**Contact events per hour in the SocioPatterns village dataset.**
+---
+These figures show an obvious but important property of delay-tolerant networks based on human mobility: contacts happen more frequently during the daytime than at night.
+The office dataset has no contacts at all during the night.
+As well as daily patterns, the Haggle and office datasets show weekly patterns, with weekdays clearly distinguishable from weekends (the Haggle dataset starts at 9:55 local time on a Friday morning, while the office dataset starts at 8:00 local time on a Monday morning).
+The weekly pattern is especially pronounced for the office dataset, which has no contacts at all during the weekend.
+The village dataset, on the other hand, does not show any clear differences between weekdays and weekends (we don't know the day and time when this dataset starts).
+| Dataset | Mean contacts per hour |
+|---------|------------------------|
+| Haggle  |                   39.5 |
+| Office  |                   14.0 |
+| Village |                  102.3 |
+Contacts are far more frequent in the village dataset than the other datasets.
+The office and village datasets were collected using the same experimental setup, so the difference is not attributable to the collection methodology.
+This may be due in part to the fact that contacts in the office dataset only occur during office hours (roughly a quarter of all hours in the week).
+### Number of Connected Devices
+We refer to a device as "connected" when it's involved in one or more undirected contacts.
+All three datasets show daily patterns in the number of connected devices, with similar patterns to the number of contact events per hour.
+---
+![haggle-density.png](uploads/img/haggle-density.png)
+**Number of connected devices for the Haggle dataset.**
+---
+![office-density.png](uploads/img/office-density.png)
+**Number of connected devices for the SocioPatterns office dataset.**
+---
+![village-density.png](uploads/img/village-density.png)
+**Number of connected devices for the SocioPatterns village dataset.**
+---
+During the daily peaks, the Haggle dataset has far more connected devices than the other datasets.
+As expected, the office dataset has no connected devices at night or during the weekend.
+The Haggle and office datasets have clear differences between weekdays and weekends, while the village dataset does not.
+Both of the SocioPatterns datasets show a lot of rapid variation: even within each daily peak, the number of connected devices frequently drops to zero for short periods, whereas in the Haggle dataset the peaks are smoother.
+This may be due in part to the greater detection range of the devices used in the Haggle experiment (5-10 metres versus 1.5 metres), and the fact that the devices used in the SocioPatterns experiments were designed to detect only face-to-face contacts.
+### Time Spent Connected
+The three datasets differ considerably in the amount of time devices spend connected.
+---
+![time-active.png](uploads/img/time-active.png)
+**Fraction of time spent connected, compared across datasets.**
+---
+The median device in the Haggle dataset spends 9.8% of its time connected, whereas the median is just 0.2% for the office dataset and 2.7% for the village dataset.
+The difference between the Haggle dataset and the others may be due in part to the greater detection range of the devices used in the Haggle experiment and the fact that the devices used in the SocioPatterns experiments detected only face-to-face contacts.
+Contacts in the office dataset only occur during office hours, which are roughly a quarter of all hours in the week, but even after accounting for this there's still a large difference between the office and village datasets, indicating that face-to-face contacts in the office environment are sparser than in the village environment, even during office hours.
+### Isolated Nodes and Small Components
+Before continuing with our analysis we need to decide how to handle isolated nodes and small components in the contact graphs.
+The contact graph for the Haggle dataset has two isolated nodes representing devices that didn't make contact with any other participating devices during the experiment, while the contact graph for the village dataset has a small component representing two devices that made contact with each other but not with any other devices.
+Clearly devices such as these will cause difficulties for delay-tolerant communication, as they can't reach or be reached by any of the devices in the main component.
+If we excluded them from our analysis then we might reach unrealistically positive conclusions about the feasibility of delay-tolerant networks.
+In the real world there are many people who are physically isolated from others, or from those outside their households: for example, people who are unable to leave their homes due to illness or disability.
+These people are less likely to be well-served by delay-tolerant networks than people who are more mobile.
+By including isolated nodes and small components in our analysis, we aim to ensure that this aspect of reality is reflected in our results.
+## Transitive Contact Graphs
+In order to study the datasets at a higher level of abstraction, we can extract a *transitive contact graph* from each dataset.
+This graph represents the whole series of events recorded in the dataset and allows us to study the emergent properties of those events.
+To do this, we create a layered graph in which each layer represents the contacts between devices at a given instant.
+We start by creating the bottom layer, which simply contains a node representing each device.
+Then we process the detection events in order of time.
+When a contact begins, we add a new layer to the graph that duplicates the nodes and edges of the layer below.
+We then an edge to the new layer that connects the two devices involved in the new contact.
+When a contact ends, we likewise add a new layer that duplicates the nodes and edges of the layer below.
+We then remove from the new layer the edge representing the contact that has just ended.
+In both cases, we add a directed edge from each node in the layer below to the node representing the same device in the new layer.
+The result is a graph in which it is possible to follow edges horizontally within each layer, and vertically from each layer to the layer above (later in time), but *not* to the layer below (earlier in time).
+Horizontal edges represent contacts between devices at a given instant, while vertical edges connect nodes representing the same device at different instants.
+Thus, for example, if a contact occurs between devices `A` and `B`, and then after this contact has ended there is a second contact between devices `B` and `C`, the graph allows us to see that `C` is transitively reachable from `A`, but not vice versa.
+`A` could transfer a message to `B` during their contact, and `B` could then transfer this message to `C` during their later contact, but the reverse would not be possible, as the contact between `A` and `B` ends before the contact between `B` and `C` begins.
+This asymmetry is an important property of delay-tolerant networks, and the directed edges between layers allow our model to capture it.
+We can use the transitive contact graph to study the emergent properties of the mobility patterns recorded in the datasets.
+For example, if we start from a given device at a given time, which devices are transitively reachable, and how long does it take to reach them?
+How do these properties vary with the choice of starting device or starting time?
+A transitive contact graph can be built from directed or undirected contacts.
+As discussed above, we're mainly interested in undirected contacts because the delay-tolerant networks we're interested in building will use two-way communication to exchange messages.
+But it's possible to imagine delay-tolerant networks that use one-way broadcasts to propagate messages, and such networks can be modeled using transitive contact graphs with directed contacts - that is, with directed edges within each layer.
+### Events With the Same Timestamp
+The Haggle and SocioPatterns datasets use timestamps with one-second precision and contain many events with the same timestamp.
+When building a transitive contact graph, if two or more events have the same timestamp then we process them in the order in which they occurred in the original dataset.
+In particular, an event indicating the end of a contact is always processed after the event indicating the beginning of the same contact, even if they have the same timestamp.
+(This is common, because as discussed above, many contacts have recorded durations of zero, and so the events indicating the beginning and end of such a contact have the same timestamp.)
+## Initial Reachability Analysis
+By building a transitive contact graph for each dataset we can begin to study which devices are reachable from which others, which tells us which devices would potentially be able to send messages to each other in a delay-tolerant network.
+It's important to bear in mind that a real delay-tolerant network might not be able to deliver all messages that are theoretically deliverable according to the transitive contact graph.
+A real network will have to deal with issues such as limited storage space on devices and limited communication bandwidth between devices, which may prevent some messages from being delivered.
+On the other hand, if a message is *not* deliverable according to the transitive contact graph then no real network could deliver that message - there is simply no path by which the message could reach its destination.
+Thus the reachability analysis provides an upper bound for the delivery performance that any real network could achieve.
+This information is useful alongside more realistic simulations of specific network designs, as it tells us how close a simulated network is coming to the upper bound.
+If a simulated network does not manage to deliver all messages then the upper bound provided by the reachability analysis can tell us to what extent this is due to the inherent properties of the mobility data, and to what extent it's due to the specific network design being simulated.
+The upper bound is also useful as a way of assessing whether any delay-tolerant network could achieve useful levels of performance, given the mobility data.
+If the upper bound for delivery performance is too low to be useful then we don't need to investigate the performance of specific designs.
+A negative result at this stage could save us a lot of effort designing and deploying real networks, if we can see that no network could possibly achieve the level of performance we need.
+### Mean Reachability
+A basic way to measure reachability is to count how many devices are reachable, starting from a given source device at a given time. By averaging over all possible source devices, we obtain the *mean reachability* at the given time. This is the fraction of source-destination pairs such that delay-tolerant path exists from the source to the destination starting at the given time, and thus a message sent by the source at the given time could theoretically reach the destination.
+---
+![haggle-reachability.png](uploads/img/haggle-reachability.png)
+**Mean reachability for the Haggle dataset.**
+---
+![office-reachabilty.png](uploads/img/office-reachabilty.png)
+**Mean reachability for the SocioPatterns office dataset.**
+---
+![malawi-reachability.png](uploads/img/malawi-reachability.png)
+**Mean reachability for the SocioPatterns village dataset.**
+---
+It's immediately apparent that mean reachability declines with time for all the datasets.
+At the beginning of the experiments, mean reachability is above 90% for every dataset.
+By the end of the experiments, mean reachability is zero for every dataset.
+Given that we've already seen above that contacts continue to happen for the duration of the experiments (subject to daily and weekly patterns), it seems unlikely that this decline in reachability is due to a change in the participants' mobility patterns.
+Instead, we hypothesise that the decline is due to a *boundary effect* created by the end of the experiment.
+The mean reachability metric counts all paths through the transitive contact graph that start at a given time - that is, at a given layer of the graph - and reach their destinations by the end of the experiment.
+Paths that start earlier in the experiment can access more layers of the transitive contact graph than paths that start later.
+If, counterfactually, the experiment had ended earlier than it did, then the upper layers of the transitive contact graph would not have existed, and any paths passing through those layers would not have reached their destinations.
+Thus we would expect the mean reachability to have been lower than what was measured in reality, especially for times closer to the counterfactually earlier end of the experiment.
+The experiment's earlier end would have caused the boundary effect to occur earlier.
+If our hypothesis is wrong and the observed decline in reachability is *not* due to a boundary effect, then ending the experiment earlier would have had no effect on the mean reachability.
+Thus we can test the hypothesis about a boundary effect by cutting off the end of the dataset, effectively ending the experiment earlier, and comparing the results with the full dataset.
+---
+![haggle-reachability-cutoff-time.png](uploads/img/haggle-reachability-cutoff-time.png)
+**Mean reachability for the Haggle dataset with different cutoff times.**
+---
+The results support the hypothesis of a boundary effect.
+Cutting off the end of the dataset reduces the mean reachability compared with the full dataset, and later times (closer to the cutoff time) are affected more than earlier times (further from the cutoff time).
+For some cutoff times, reachability drops to zero before the cutoff time, while for others it remains above zero, indicating that some devices are transitively connected to each other when the cutoff time is reached (ie in the highest remaining layer of the transitive contact graph).
+By looking at the left edge of the figure above, we can see that the mean reachability at the start of the experiment is affected by the cutoff time.
+Even a cutoff time of 11 days slightly affects the mean reachability at the beginning of the experiment, indicating that some of the paths that started at the beginning of the experiment took more than 11 days to reach their destinations, and were thus removed when the cutoff time was set to 11 days.
+This leads to the troubling conclusion that the boundary effect extends at least 11 days back from the end of the 12-day dataset, and thus we have little or no data that is not affected by the boundary.
+It isn't meaningful to calculate average measurements across the whole dataset, or to compare one day to another, if we know that the results are distorted by this boundary effect.
+So we have a problem.
+How can we prevent the boundary effect from distorting our results?
+## Looping the Data
+In an ideal world we'd like to repeat the experiments over much longer durations, capturing several weeks' worth of data.
+We would measure the extent of the boundary effect by truncating the data, as we did above, and then we would run the reachability analysis on the full dataset, discarding the results near the end of the experiment that fell within the boundary effect.
+We would need to ensure that each experiment lasted long enough to leave at least one week's worth of "clean" results to analyse after discarding the results affected by the boundary.
+This would let us average out the daily and weekly patterns in the clean results, and compare one day to another, without any distortions created by the boundary effect.
+Of course we can't actually repeat the experiments, but we can try to approximate the results of a longer experiment by selecting a week's worth of data from each dataset and looping it - in other words, processing the detection events as though the experiment had lasted for multiple weeks and the same sequence of events had occurred in every week.
+To avoid any distortions caused by suddenly "jumping back in time", we can choose the start and end times of the loop so that they're exactly a week apart and no devices are in contact with each other at the start time or the end time.
+All of the datasets have suitable start and end times where no devices are in contact.
+It's easy to find suitable start and end times for the office dataset, which has a long period every night with no contacts, but the Haggle dataset has fewer suitable times and the village dataset has fewer still.
+The following figures show the part of each dataset that we selected for looping.
+---
+![haggle-loop-points.png](uploads/img/haggle-loop-points.png)
+**Data selected to be looped for the Haggle dataset.**
+---
+![office-loop-points.png](uploads/img/office-loop-points.png)
+**Data selected to be looped for the SocioPatterns office dataset.**
+---
+![village-loop-points.png](uploads/img/village-loop-points.png)
+**Data selected to be looped for the SocioPatterns village dataset.**
+---
+### Contact Graphs of the Looped Data
+The contact graphs for the looped data are somewhat sparser than the graphs for the full datasets.
+---
+![haggle-contact-graph-looped.png](uploads/img/haggle-contact-graph-looped.png)
+**Contact graph for the looped Haggle dataset.**
+---
+![office-contact-graph-looped.png](uploads/img/office-contact-graph-looped.png)
+**Contact graph for the looped SocioPatterns office dataset.**
+---
+![malawi-contact-graph-looped.png](uploads/img/malawi-contact-graph-looped.png)
+**Contact graph for the looped SocioPatterns village dataset.**
+---
+This shows that our technique for simulating a longer dataset by looping a week's worth of data isn't perfect: some devices that make contact during the full dataset don't make contact during the week's worth of data that we selected for looping.
+The contact graph for the looped version of the village dataset has three small compoments that are disconnected from the rest of the graph, whereas the full version of the dataset has only one small compoment.
+There are also two devices in the office dataset and one in the village dataset that don't have any contacts during the week's worth of data chosen for the loop.
+These are shown as isolated nodes in the contact graphs above.
+The full versions of these datasets have no isolated nodes.
+As discussed above, we prefer to include these isolated nodes and small components in our analysis, as they represent a realistic feature of human mobility patterns with implications for the feasibility of delay-tolerant networks.
+## Reachability Analysis with Looped Data
+TODO