This analysis produces two interesting findings.

First, the upper bound for *delivery probability* is high: under the conditions recorded in the datasets, a delay-tolerant network can, in principle, deliver a message from any sender to any recipient with high probability.

Second, the lower bound for *delivery latency* is also high: under the conditions recorded in the datasets, a delay-tolerant network cannot, even in principle, deliver messages quickly between all potential senders and recipients.

For many sender-recipient pairs, delivery latencies are on the order of days or weeks, rather than the seconds or minutes that users of Internet-based communication tools are used to.

The consistency of these results across three different environments (university, office building and village) suggests that the findings are robust.

These findings will inform the Briar team's strategy for enabling people to communicate securely without access to the Internet.

The devices were capable of communicating wirelessly over a range of a few metres.

Each device would periodically detect the presence of other participants' devices nearby and record the time at which these *detection events* took place.

Detection events indicate the participants' mobility patterns only indirectly, but for our purposes this is fine: in designing delay-tolerant networks we're less interested in the precise details of how people move around than in the question of whether wireless devices carried by people can detect and communicate with nearby devices, and this is exactly what the datasets record.

### Haggle Content Dataset

Each mobile device scanned for nearby devices every 10 minutes.

The stationary devices had scan intervals ranging from 2 to 10 minutes.

The scan intervals were slightly randomised to avoid a situation where nearby devices would repeatedly fail to detect each other due to scanning at the same time.

The researchers deployed 40 mobile and 21 stationary devices, of which 36 mobile and 18 stationary devices produced usable results.

Two of the stationary devices did not have any contacts with other devices involved in the experiment.

The device might actually have come into range almost one scan interval before it was first detected, and remained within range almost one scan interval after it was last detected.

Due to the long scan intervals in this dataset (10 minutes for most devices), this leads to considerable uncertainty about the true contact durations.

In particular, if a device was detected in one scan, with a gap of two scans on either side in which it was not detected, then the recorded contact duration would be zero (or very close to zero, if the device was detected twice in the same scan, which occasionally happened), whereas the true duration could be up to 20 minutes.

About 36% of the contacts in the dataset have recorded durations of zero.

Researchers at the University of Ioannina converted the Haggle Content dataset into the format used by the Opportunistic Network Environment (ONE) simulator.

All contacts with devices not involved in the experiment were ignored.

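To make the uncertainty concrete, here's a minimal sketch (our own illustrative code, not the researchers'; the function names and seconds-based timestamps are assumptions) of the bounds that the scan interval places on a contact's true duration:

```python
# Sketch: the recorded duration runs from the first to the last detection
# event, so it underestimates the true duration by up to one scan interval
# at each end.

SCAN_INTERVAL = 600  # seconds - the 10-minute interval of most Haggle mobile devices

def recorded_duration(detection_times):
    """Lower bound on the contact duration: first detection to last detection."""
    return max(detection_times) - min(detection_times)

def duration_bounds(detection_times, scan_interval=SCAN_INTERVAL):
    """(lower, upper) bounds on the true contact duration."""
    lower = recorded_duration(detection_times)
    # The device may have come into range almost one scan interval before the
    # first detection and left almost one scan interval after the last one.
    upper = lower + 2 * scan_interval
    return lower, upper

# A device detected in a single scan has a recorded duration of zero,
# but the true duration could be up to 20 minutes.
print(duration_bounds([3600]))        # (0, 1200)
print(duration_bounds([3600, 4200]))  # (600, 1800)
```
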
http://www.sociopatterns.org/datasets/contacts-in-a-workplace/

* 92 mobile devices, all of which had at least one contact
* Duration 12 days
* Scan interval 20 seconds

This means that as with the Haggle dataset, the recorded contact duration should be understood as a lower bound on the actual duration.

In particular, if a device was only detected in a single scan then the recorded contact duration would be zero.

However, due to the much shorter scan interval used in this experiment, the difference between the recorded contact duration and the true duration should typically be smaller than for the Haggle dataset.

Our analysis is based on the file `tij_InVS.dat` from the archive `tij_InVS.zip`, published by the SocioPatterns project at http://www.sociopatterns.org/datasets/contacts-in-a-workplace/.

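As a rough illustration, assuming the SocioPatterns `tij` layout (one `t i j` line per 20-second scan in which devices `i` and `j` were in contact), a pair's detection times can be grouped into contacts by starting a new contact whenever the gap between detections exceeds a threshold. The helper below is a hypothetical sketch; the exact gap rule used in our analysis may differ.

```python
# Sketch: grouping a device pair's detection times into (start, end) contacts.
SCAN_INTERVAL = 20  # seconds

def contacts(detection_times, max_gap=SCAN_INTERVAL):
    """Group sorted detection times into (start, end) contacts, starting a
    new contact whenever the gap between detections exceeds max_gap."""
    result = []
    for t in sorted(detection_times):
        if result and t - result[-1][1] <= max_gap:
            result[-1] = (result[-1][0], t)  # extend the current contact
        else:
            result.append((t, t))  # start a new contact
    return result

# A pair detected in a single scan yields a zero-duration contact.
print(contacts([100]))            # [(100, 100)]
print(contacts([100, 120, 200]))  # [(100, 120), (200, 200)]
```
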
* Duration 14 days
* Scan interval 20 seconds

This dataset was also collected by researchers at the SocioPatterns project.

It records contacts between individuals living in a village in rural Malawi in 2019-2020.

The recording method was similar to the SocioPatterns office dataset: the participants wore devices that scanned for nearby devices every 20 seconds, registering face-to-face contacts between the participants.

Individual participants joined and left the study at different times, so the researchers published data for a 14-day period during which all participants were taking part.

For the other datasets this is very common, which makes intuitive sense: if two devices remain within detection range of each other for long enough then we'd expect them both to detect each other sooner or later.

The HYCCUPS dataset considers two devices to be in contact if they're connected to the same Wi-Fi network, rather than if they're within detection range of each other, but this is still a symmetrical condition (if `A` is connected to the same network as `B` then `B` is connected to the same network as `A`), so we'd still expect the recorded contacts to be roughly symmetrical.

The lack of symmetry may be due to some characteristic of the AllJoyn library - perhaps if `A` connects to `B` via AllJoyn then `B` stops trying to connect to `A`, for example.

But without being able to confirm this guess it's difficult to draw meaningful conclusions from the data.

Another open question about this dataset is how the data relates to the researchers' publications.

Thus we define an *undirected contact* between `A` and `B` as beginning when there is a directed contact from `A` to `B` or vice versa, and ending when there is no longer a directed contact from `A` to `B` or vice versa.

An alternative approach would be to assume that if `A` can detect `B` *and* vice versa then two-way communication is possible.

Thus an undirected contact would be defined as starting when there were directed contacts in both directions, and ending when either of the directed contacts ended.

We consider this alternative definition to be too restrictive because it doesn't take into account the periodic nature of scanning: if `A` detects `B` shortly after `B` has completed its scan then nearly a full scan interval will elapse before `B` has an opportunity to detect `A`, even if the devices remain within detection range the whole time.

The more restrictive definition of an undirected contact would consider `A` and `B` not to be in contact during this time.

Therefore we use the less restrictive definition: there's an undirected contact between `A` and `B` whenever there's a directed contact from `A` to `B` or vice versa.

The difference between these definitions is less significant for the SocioPatterns datasets, with their 20-second scan intervals, than for the Haggle dataset, where the scan interval of most devices is 10 minutes.

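The less restrictive definition amounts to taking the union of the two directions' contact intervals. A minimal sketch (our own illustrative code, with `(start, end)` pairs as an assumed representation of directed contacts):

```python
# Sketch: an undirected contact between A and B exists whenever there is a
# directed contact in either direction, i.e. the union of the two interval
# lists, with overlapping or touching intervals merged.

def undirected_contacts(a_to_b, b_to_a):
    """Merge two lists of (start, end) directed contacts into the
    undirected contacts between the pair."""
    merged = []
    for start, end in sorted(a_to_b + b_to_a):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# A detects B from t=0 to t=50; B only detects A from t=40 to t=90,
# perhaps because B's scan fell just after A came into range.
print(undirected_contacts([(0, 50)], [(40, 90)]))  # [(0, 90)]
```

Under the more restrictive definition, the same example would yield only the intersection, (40, 50).
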
## Initial Analysis of Contact Patterns

Before moving on to analysing the emergent properties of the datasets we'll discuss some characteristics that are apparent in the raw data.

... | @@ -201,7 +203,8 @@ Counting the number of contact events per hour reveals clear daily patterns in a |
|
|
These figures show an obvious but important property of delay-tolerant networks based on human mobility: contacts happen more frequently during the daytime than at night.
|
|
These figures show an obvious but important property of delay-tolerant networks based on human mobility: contacts happen more frequently during the daytime than at night.
|
|
|
The office dataset has no contacts at all during the night.
|
|
The office dataset has no contacts at all during the night.
|
|
|
|
|
|
|
|
As well as daily patterns, the Haggle and office datasets show weekly patterns, with weekdays clearly distinguishable from weekends (the Haggle dataset starts at 9:55 local time on a Friday morning, while the office dataset starts at 8:00 local time on a Monday morning).
|
|
As well as daily patterns, the Haggle and office datasets show weekly patterns, with weekdays clearly distinguishable from weekends.
|
|
|
|
(The Haggle dataset starts at 9:55 local time on a Friday morning, while the office dataset starts at 8:00 local time on a Monday morning.)
|
|
|
The weekly pattern is especially pronounced for the office dataset, which has no contacts at all during the weekend.
|
|
The weekly pattern is especially pronounced for the office dataset, which has no contacts at all during the weekend.
|
|
|
|
|
|
|
|
The village dataset, on the other hand, doesn't show any clear differences between weekdays and weekends (we don't know the day and time when this dataset starts).
|
|
The village dataset, on the other hand, doesn't show any clear differences between weekdays and weekends (we don't know the day and time when this dataset starts).
|
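The counting itself is straightforward; a sketch (illustrative only, assuming contact start times measured in seconds from the start of the dataset):

```python
# Sketch: bucket contact start times into hours to reveal daily patterns.
from collections import Counter

def contacts_per_hour(contact_start_times):
    """Map hour index (0 = first hour of the dataset) to contact count."""
    return Counter(int(t // 3600) for t in contact_start_times)

counts = contacts_per_hour([10, 200, 3700, 7300, 7400])
print(dict(counts))  # {0: 2, 1: 1, 2: 2}
```
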
---

During the daily peaks, the Haggle dataset has more connected devices than the other datasets.

As expected, the office dataset has no connected devices at night or during the weekend.

The Haggle and office datasets have clear differences between weekdays and weekends, while the village dataset does not.

The Haggle dataset's static contact graph has a densely connected core and a sparsely connected periphery.

Within the core there are two clusters of nodes connected by thick edges, representing devices that spent a lot of time in contact with each other.

The periphery contains about a dozen devices that each made contact with one or two devices from the core.

There are two isolated nodes, which represent the two stationary devices that didn't make contact with any other participating devices during the experiment.

The office dataset's static contact graph is more uniform: there's no clear core or periphery, no obvious clusters, and most of the edges have similar thickness.

The village dataset's static contact graph is sparser than the others: most nodes have fewer edges than in the other graphs.

Several cliques of three to five nodes are visible, perhaps representing households.

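A static contact graph of this kind can be represented as a weighted adjacency map, with each edge weighted by the pair's total contact duration. The sketch below is our own illustration, not the code used to produce the figures:

```python
# Sketch: build a static contact graph where the weight of edge (a, b) is the
# pair's total contact duration, mirroring the edge thickness in the figures.
from collections import defaultdict

def static_contact_graph(devices, contacts):
    """contacts: iterable of (a, b, start, end) undirected contacts.
    Returns {device: {neighbour: total_contact_duration}}."""
    graph = {d: defaultdict(float) for d in devices}
    for a, b, start, end in contacts:
        graph[a][b] += end - start
        graph[b][a] += end - start
    return graph

g = static_contact_graph(
    ['A', 'B', 'C'],
    [('A', 'B', 0, 60), ('A', 'B', 100, 130)],
)
print(g['A']['B'])   # 90.0 - repeated contacts accumulate into one thick edge
print(list(g['C']))  # [] - an isolated node, like the two stationary Haggle
                     # devices that had no contacts
```
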
To do this, we create a layered graph in which each layer represents the contacts between devices at a given instant.

Every time there's a connectivity change (ie the beginning or end of a contact), we add a new layer to the graph representing the new state.

Directed connections between the layers represent the passage of time.

We start by creating the bottom layer, which simply contains a node representing each device.

Then we process the detection events in order of time.

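The layered construction can be sketched as follows (an illustrative simplification of our own: each layer is stored as the set of pairs in contact at that instant, and the per-device nodes and directed time edges between layers are left implicit):

```python
# Sketch: build the layers of a dynamic contact graph from time-ordered
# connectivity-change events.

def dynamic_contact_layers(events):
    """events: time-ordered (time, 'start'|'end', a, b) connectivity changes.
    Returns a list of layers, each a (time, frozenset_of_contact_pairs)."""
    current = set()
    layers = [(0, frozenset())]  # bottom layer: no contacts yet
    for time, kind, a, b in events:
        pair = frozenset((a, b))
        if kind == 'start':
            current.add(pair)
        else:
            current.discard(pair)
        layers.append((time, frozenset(current)))  # one new layer per change
    return layers

layers = dynamic_contact_layers([
    (10, 'start', 'A', 'B'),
    (20, 'end', 'A', 'B'),
    (30, 'start', 'B', 'C'),
])
print(len(layers))  # 4: the bottom layer plus one per connectivity change
```
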
### Measuring Dynamic Reachability

Just as we measured static reachability in a static contact graph by counting how many devices are reachable from a given source device, in a dynamic contact graph we can measure *dynamic reachability* by counting how many devices are reachable *from a given source device, starting at a given time*.

Averaging over all possible source devices gives a metric that we call the *mean dynamic reachability* at the given time.

This is the fraction of source-destination pairs such that a delay-tolerant path exists from the source to the destination starting at the given time, and thus a message sent by the source at the given time could, in principle, eventually reach the destination.

It's important to note that unlike the mean static reachability, the mean dynamic reachability depends on the start time.

The figures below show the mean dynamic reachability for each dataset as a function of the start time.

The dashed line in each figure shows the mean static reachability, for comparison.

Static reachability is always at least as high as dynamic reachability, as static reachability ignores the order in which contacts occur.

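A sketch of the measurement (our own simplification: contacts are treated as instantaneous `(time, a, b)` events rather than intervals, so a single time-ordered pass suffices):

```python
# Sketch: dynamic reachability from a source device at a given start time.
# A device is reached once it has a contact with an already-reached device
# at or after the start time - a delay-tolerant path.

def dynamic_reachability(devices, contacts, source, start_time):
    """Fraction of the other devices reachable from source, starting at start_time."""
    reached = {source}
    for time, a, b in sorted(contacts):
        if time >= start_time and (a in reached or b in reached):
            reached |= {a, b}
    return (len(reached) - 1) / (len(devices) - 1)

def mean_dynamic_reachability(devices, contacts, start_time):
    """Average over all possible source devices."""
    return sum(dynamic_reachability(devices, contacts, d, start_time)
               for d in devices) / len(devices)

devices = ['A', 'B', 'C', 'D']
contacts = [(1, 'A', 'B'), (2, 'B', 'C'), (3, 'C', 'D')]
print(dynamic_reachability(devices, contacts, 'A', 0))  # 1.0
print(dynamic_reachability(devices, contacts, 'A', 2))  # 0.0 - starting later,
# A has no remaining contacts, so the same topology is unreachable in time
```
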
Instead, we hypothesise that the decline is due to a *boundary effect* created by the end of the experiment.

The mean dynamic reachability metric counts all paths through the dynamic contact graph that start at a given time and reach their destinations by the end of the experiment.

Paths that start at earlier times have more time to reach their destinations than paths that start later.

If, counterfactually, the experiment had ended earlier than it did, then any paths that had not yet reached their destinations by the experiment's earlier end time would not have been counted.

Thus we would expect the dynamic reachability to have been lower than what was measured in reality, especially for start times close to the earlier end time of the experiment.

Ending the experiment earlier would have caused the boundary effect to occur earlier.

We can test this hypothesis by cutting off the end of the dataset, effectively ending the experiment earlier, and comparing the results with the full dataset.

If the hypothesis of a boundary effect is wrong and the observed decline in reachability is *not* due to a boundary effect, then cutting off the end of the dataset will have no effect on the dynamic reachability at start times before the cutoff time.

On the other hand, if the hypothesis of a boundary effect is right then cutting off the end of the dataset will reduce the dynamic reachability at times before the cutoff time.

By measuring how far back from the cutoff time the change in reachability extends, we'll be able to estimate how far back from the end of the full dataset the boundary effect extends.

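The truncation test can be sketched as follows (again our own simplification, with instantaneous `(time, a, b)` contacts and a single source device):

```python
# Sketch: drop every contact after a cutoff time and compare which devices
# remain reachable from a source, versus the full dataset.

def reachable(contacts, source, start_time):
    """Set of devices reachable from source via a delay-tolerant path
    starting at start_time."""
    reached = {source}
    for time, a, b in sorted(contacts):
        if time >= start_time and (a in reached or b in reached):
            reached |= {a, b}
    return reached

contacts = [(1, 'A', 'B'), (5, 'B', 'C'), (9, 'C', 'D')]

full = reachable(contacts, 'A', 0)
truncated = reachable([c for c in contacts if c[0] <= 7], 'A', 0)

print(sorted(full))       # ['A', 'B', 'C', 'D']
print(sorted(truncated))  # ['A', 'B', 'C'] - the path to D needed the
                          # contact at t=9, beyond the cutoff
```

Even though the cutoff is at t=7, it changes a measurement taken at start time 0 - exactly the boundary effect described above.
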
The results support the hypothesis of a boundary effect.

Cutting off the end of the dataset reduces the mean dynamic reachability at times before the cutoff time, compared with the full dataset.

For some cutoff times, reachability drops to zero at the cutoff time, while for others it remains above zero.

This is not a particularly important difference - it just indicates whether any devices are connected to each other when the cutoff time is reached (ie, in the highest remaining layer of the dynamic contact graph).

By looking at the left edge of the figure above, we can see that the dynamic reachability at the beginning of the experiment is affected by changing the cutoff time.

Even a cutoff time of 11 days (versus the original duration of 12 days) slightly affects the dynamic reachability at the beginning of the experiment, indicating that some of the paths that started at the beginning of the experiment took more than 11 days to reach their destinations, and were thus removed when the cutoff time was set to 11 days.

This leads to the troubling conclusion that the boundary effect extends at least 11 days back from the end of the 12-day dataset, and thus we have little or no "clean" data that's not affected by the boundary.

Results for the other datasets are similar.

It isn't meaningful to calculate average measurements across the whole dataset, or to compare one day to another, if we know that the results are substantially distorted by this boundary effect.

Simulations based on these datasets would not yield meaningful results: messages sent near the start of a simulation would be vastly more likely to reach their destinations than messages sent near the end.

So we have a problem.

How can we prevent the boundary effect from distorting our results?

## Looping the Data
|
|
## Looping the Data
|
|
|
|
|
|
|
|
In an ideal world we'd like to go back and repeat the experiments over much longer durations, capturing several weeks' worth of data.
|
|
In an ideal world we'd like to go back and repeat the experiments over much longer durations, capturing several week of data.
|
|
|
We would measure the extent of the boundary effect by truncating the data, as we did above, and then run the reachability analysis for all start times up to the beginning of the boundary effect.
|
|
We would measure the extent of the boundary effect by truncating the data, as we did above, and then run the reachability analysis for all start times up to the beginning of the boundary effect.
|
|
|
We'd need to ensure that each experiment lasted long enough to leave at least one week's worth of "clean" results unaffected by the boundary.
|
|
We would need to ensure that each experiment ran for long enough to have at least one week of "clean" data before the boundary.
|
|
|
This would let us average out the daily and weekly patterns in the clean results, without any distortions created by the boundary effect.
|
|
This would let us average out the daily and weekly patterns in the clean results, without any distortions created by the boundary effect.
|
|
|
|
|
|
|
|
Of course we can't actually repeat the experiments, but we can try to approximate the results of a longer experiment by selecting a week's worth of data from each dataset and looping it - in other words, processing the detection events as though the experiment had lasted for multiple weeks and the same sequence of events had occurred in every week.
|
|
Of course we can't actually repeat the experiments, but we can try to approximate the results of a longer experiment by selecting one week of data from each dataset and looping it - in other words, replicating the detection events from that week and then processing the events as though the experiment had lasted for multiple weeks and the same sequence of events had occurred in every week.
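
The looping step can be sketched like this (a simplified illustration, again assuming `(start, end, device_a, device_b)` event tuples rather than the datasets' actual format):

```python
WEEK = 7 * 86400

def loop_events(week_events, n_weeks):
    """Replicate one week of (start, end, a, b) contact events `n_weeks`
    times, shifting each copy by a whole number of weeks."""
    looped = []
    for w in range(n_weeks):
        offset = w * WEEK
        for start, end, a, b in week_events:
            looped.append((start + offset, end + offset, a, b))
    return sorted(looped)

week = [(3600, 7200, "u1", "u2")]  # one contact, an hour into the week
five_weeks = loop_events(week, 5)
print(len(five_weeks))  # 5: the same contact recurs once per week
```

The reachability analysis can then be run on the looped trace exactly as if it were a single long recording.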

To avoid any distortions caused by suddenly "jumping back in time" at the end of the loop, we can choose the start and end times of the loop so that no devices are in contact with each other at the start time or the end time.

All of the datasets have suitable start and end times exactly a week apart where no devices are in contact.

It's easy to find suitable start and end times for the office dataset, which has a long period every night with no contacts, but the Haggle dataset has fewer suitable times and the village dataset has fewer still.
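
Finding such cut points can be sketched as follows (illustrative only; `candidates` might be, say, every minute of the trace, and the event format is the same assumed tuple form as above):

```python
WEEK = 7 * 86400

def no_active_contact(events, t):
    # True if no contact interval spans the instant t.
    return all(not (start < t < end) for start, end, _, _ in events)

def valid_loop_starts(events, candidates):
    """Candidate start times t such that the trace can be cut cleanly at
    both t and t + WEEK: no device pair is mid-contact at either moment."""
    return [t for t in candidates
            if no_active_contact(events, t)
            and no_active_contact(events, t + WEEK)]

# Illustrative trace: one contact early in the week, one a week later.
events = [(100, 200, "a", "b"), (WEEK + 100, WEEK + 150, "a", "b")]
print(valid_loop_starts(events, [50, 150]))  # prints [50]; t=150 is mid-contact
```

A dataset with long contact-free periods (like the office dataset at night) yields many valid start times; a busier trace yields fewer.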

The following figures show the part of each dataset that we selected for looping.

---

The differences between the static contact graphs for the full and looped versions of the datasets show that our technique for simulating a longer dataset by looping one week of data isn't perfect: some devices that make contact during the full dataset don't make contact during the week that we selected for looping.

The static contact graph for the looped version of the village dataset has three small components that are disconnected from the rest of the graph, whereas the full version of the dataset has only one small component.

There are also two devices in the office dataset and one in the village dataset that don't have any contacts during the week selected for looping.
These are shown as isolated nodes in the graphs above.
The full versions of these datasets have no isolated nodes.

Due to these isolated nodes and small components, static reachability is lower for the looped versions of the office and village datasets.

## Dynamic Reachability Analysis for Looped Data

Repeating the dynamic reachability analysis with five weeks of looped data confirms that the decline in reachability seen with the original data is due to a boundary effect.

---



**Mean dynamic reachability for the looped Haggle dataset.**

---



**Mean dynamic reachability for the looped SocioPatterns office dataset.**

---



**Mean dynamic reachability for the looped SocioPatterns village dataset.**

---

Each of the looped datasets starts with a long period during which the mean dynamic reachability is stable and matches the mean static reachability.

For the Haggle dataset, this period lasts until 13 days before the end of the looped dataset.
In other words, the boundary effect caused by the end of the dataset affects the last 13 days of data.

For the office dataset, the extent of the boundary effect is 14 days, while for the village dataset it's 18 days.

This leaves us with at least two weeks of "clean" data at the start of each looped dataset.
The rest of the analysis of the looped data will focus on these two weeks of "clean" data.

### Time-to-Live

If our understanding of the boundary effect caused by the end of the dataset is right, then some paths take many days to reach their destinations.

We can explore this further by imposing a *time-to-live*: paths that take longer than this to reach their destinations are ignored when measuring reachability.
This lets us see how many source-destination pairs can reach each other within a given amount of time.
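
A minimal sketch of reachability under a time-to-live, simplifying contacts to instantaneous meetings `(time, a, b)` (the real analysis works on interval contacts, so this is an approximation with assumed names):

```python
DAY = 86400

def reachable_within(meetings, source, t0, ttl):
    """Earliest-arrival search over instantaneous meetings (time, a, b),
    processed in time order; returns the nodes reached by t0 + ttl."""
    arrival = {source: t0}  # earliest time the message can be at each node
    for t, a, b in sorted(meetings):
        if t < t0 or t > t0 + ttl:
            continue  # meeting is outside the [t0, t0 + ttl] window
        for x, y in ((a, b), (b, a)):  # a meeting carries data either way
            if x in arrival and arrival[x] <= t and arrival.get(y, float("inf")) > t:
                arrival[y] = t
    return {node for node, when in arrival.items() if when <= t0 + ttl}

meetings = [(1 * DAY, "alice", "bob"), (2 * DAY, "bob", "carol")]
print(reachable_within(meetings, "alice", 0, 2 * DAY))  # all three nodes
print(reachable_within(meetings, "alice", 0, 1 * DAY))  # carol is too far
```

Sweeping `t0` over the clean part of the trace and averaging the size of the returned set, for each candidate `ttl`, gives curves of the kind shown below.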

---



**Mean dynamic reachability for the looped Haggle dataset, with various time-to-live values.**

---



**Mean dynamic reachability for the looped SocioPatterns office dataset, with various time-to-live values.**

---



**Mean dynamic reachability for the looped SocioPatterns village dataset, with various time-to-live values.**

---

The figures above show that reachability increases with time-to-live, which is expected (all devices that are reachable within one day are also reachable within two days, and so on).
More interesting are the weekly patterns, which are very apparent for shorter time-to-live values and less so for longer values.

The office dataset is particularly interesting here: with a time-to-live of one day there's a "valley" at the weekend where reachability is zero.
In this dataset there are no contacts at all during the weekend, so any paths that start between Friday evening and Sunday morning have no chance to reach their destinations within the time-to-live.
Some paths that start on Sunday may reach their destinations on Monday when the office reopens.

When the time-to-live is increased to two days, the width of this "valley" is reduced: now some paths that start on Saturday may reach their destinations on Monday.
When the time-to-live is increased again to three days, the "valley" no longer touches zero: at any time during the weekend, some paths may reach their destinations after the office reopens, although not all paths are able to do so within the three-day limit.

The Haggle dataset also shows a dip in reachability at the weekend, although it's less pronounced than for the office dataset as there are some contacts during the weekend.

For all three datasets, reachability with a time-to-live of one week is lower than with an unlimited time-to-live.
This indicates that some paths take longer than a week to reach their destinations.

This is puzzling, because these datasets consist of one week of data looped five times.
The contacts that occur in the first week are repeated in the second week and every subsequent week.
How then can there be some paths that don't manage to reach their destinations during the first week, but manage to do so during later weeks?

An example shows how this is possible.
Imagine that Alice meets Bob every Monday and Bob meets Carol every Tuesday.
The path from Alice to Carol takes one day if it starts just before Alice and Bob meet, but it takes eight days if it starts just after Alice and Bob meet (seven days waiting for Alice and Bob's next meeting, then one day waiting for Bob and Carol's meeting).
In principle it's possible for paths to depend on longer chains of meetings, and thus to take several weeks to reach their destinations, even though each meeting happens weekly in the looped data.
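
The Alice-Bob-Carol example can be checked numerically with a small sketch (times in days; meetings are treated as instants at the start of each day, and the weekly schedule is the hypothetical one described above):

```python
def latency_to_carol(start_day, weeks=5):
    """Earliest delivery from Alice to Carol, in days, for a message
    created at `start_day` (meetings treated as instants at day starts)."""
    mondays = [7 * w for w in range(weeks)]       # Alice meets Bob
    tuesdays = [7 * w + 1 for w in range(weeks)]  # Bob meets Carol
    at_bob = min(t for t in mondays if t >= start_day)    # first usable A-B meeting
    at_carol = min(t for t in tuesdays if t >= at_bob)    # first B-C meeting after that
    return at_carol - start_day

print(latency_to_carol(0))     # created just before Monday's meeting: 1 day
print(latency_to_carol(0.01))  # created just after it: almost 8 days
```

A chain of k such weekly meetings behaves the same way: an unluckily timed message can wait almost a full week at every hop.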

We can see from the figures above that this is not just a theoretical concern: for all of the datasets, but especially the village dataset, a significant fraction of paths don't reach their destinations within one week.

### Path Latency

The figures in the previous section can be read in two ways: for a given time-to-live, they can tell us the fraction of paths that reach their destinations; or for a given probability of reaching the destination, they can tell us the time-to-live required to achieve that probability.
In other words, the figures show the distribution of path latencies.

For example, looking at the left edge of the Haggle figure, we can see that 31% of paths reach their destinations within one day, 48% within two days, and so on.
The distribution changes throughout the week due to weekly patterns in the participants' mobility.

It's clear that for all of the datasets, these latencies are much longer than the latencies that users of Internet-based communication tools are used to.
Many activities that we're used to coordinating through such tools would be seriously disrupted by communication latencies on the order of days or weeks.

## Maximum Path Latency

For each dataset, we can find the minimum time-to-live that achieves the same level of reachability as an unlimited time-to-live.
This tells us the *maximum path latency* of the dataset.
This is also the *maximum useful time-to-live* for the dataset, as increasing the time-to-live beyond this value has no effect on reachability.
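
Given the latency of every delivered path, both quantities are straightforward to compute. A sketch with illustrative numbers (not the measured data):

```python
def max_useful_ttl(path_latencies):
    # The largest observed path latency; a longer TTL admits no new paths.
    return max(path_latencies)

def reachability(path_latencies, n_pairs, ttl):
    """Fraction of source-destination pairs whose fastest path completes
    within `ttl`; pairs with no path at all contribute no latency entry."""
    return sum(1 for lat in path_latencies if lat <= ttl) / n_pairs

latencies = [1, 2, 2, 5, 14]  # days; illustrative numbers, not measured data
print(max_useful_ttl(latencies))  # prints 14
```

Raising the time-to-live beyond `max_useful_ttl` leaves `reachability` unchanged, which is what the table below reports for each dataset.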

| Dataset | Maximum Path Latency |
|---------|----------------------|
| Haggle  | 14 days              |
| Office  | 17 days              |
| Village | 22 days              |

It's worth mentioning that only a small fraction of paths come close to the maximum path latency.
In other words, the time-to-live could be set a few days shorter than the maximum path latency without a large impact on reachability.

This is likely to be a relevant concern in real networks, where increasing the time-to-live may increase the storage and bandwidth costs of carrying a given volume of message traffic.
Real-world performance may therefore *decrease* when the time-to-live is increased, as old messages (which have a low probability of reaching their destinations if they have not already done so, and are redundant if they *have* already done so) compete for storage and bandwidth with new messages.

Fully understanding these performance tradeoffs requires a more realistic simulation of a specific delay-tolerant network design, including its rules for deciding which messages to store and forward when storage and bandwidth are finite.

The maximum path latency gives us a lower bound on the worst-case delivery latency that any real delay-tolerant network could achieve, given the mobility patterns recorded in the datasets.
For example, for the Haggle dataset, no real delay-tolerant network could achieve a delivery latency of less than 14 days in the worst case (i.e., for the sender-recipient pairs that are connected by the slowest paths).

## Summary of Findings

* All of the datasets show daily patterns in the frequency of contact events and the number of connected devices.

* The datasets differ greatly in the amount of time devices spend connected.

* The static contact graphs have different structures: the Haggle graph has a dense core and a sparse periphery, the office graph is uniform, and the village graph is sparse and clustered.

* Static reachability is high (at least 92.7%) for all datasets. It's below 100% for the Haggle and village datasets because of a few isolated nodes and small components. These reflect realistic conditions: some people are socially isolated or unable to leave their homes.

* The datasets are too short to be used for realistic network simulations, or for understanding how reachability changes with time: a boundary effect caused by the end of the dataset would distort the results.

* We can correct for this by selecting one week of data and looping it, then analysing the "clean" data that's outside the boundary effect.

* The static contact graphs for the looped data are qualitatively similar to those for the original data, but with a higher incidence of isolated nodes and small components, and thus static reachability for the office and village datasets is lower.

* Path latencies are high for all datasets. For the Haggle dataset, 21 to 56% of paths reach their destinations within one day (depending on the time of week) and some paths take 14 days to reach their destinations. The other datasets have even higher latencies.

## Conclusions

TODO