Changes

akwizgran · 1b5b144f
--- a/Analysis-of-Human-Mobility-Datasets.md
+++ b/Analysis-of-Human-Mobility-Datasets.md
@@ -172,46 +172,6 @@ Therefore we use the less restrictive definition: there's an undirected contact
 Before moving on to analysing the emergent properties of the datasets we'll discuss some characteristics that are apparent in the raw data.
-### Static Contact Graphs
-The following figures show the *static contact graphs* for the three datasets.
-Each node in a static contact graph represents a device.
-An edge connects two nodes if the corresponding devices were in contact at any time.
-The thickness of the edge represents the total duration the devices spent in contact, in proportion to the maximum duration any two devices spent in contact.
-Contact durations are calculated using the definition of an undirected contact given above.
---
-![haggle-contact-graph.png](uploads/img/haggle-contact-graph.png)
-**Static contact graph for the Haggle dataset.**
---
-![office-contact-graph.png](uploads/img/office-contact-graph.png)
-**Static contact graph for the SocioPatterns office dataset.**
---
-![malawi-contact-graph.png](uploads/img/malawi-contact-graph.png)
-**Static contact graph for the SocioPatterns village dataset.**
---
-The Haggle dataset's static contact graph has a densely connected core and a sparsely connected periphery.
-Within the core are two clusters of nodes connected by thick edges, representing devices that spent a lot of time in contact with each other.
-The periphery contains about a dozen devices that each made contact with one or two devices from the core.
-There are two isolated nodes, which represent the two devices that didn't make contact with any other participating devices during the experiment.
-The office datatet's static contact graph is more uniform: there's no clear core or periphery, no obvious clusters, and most of the edges have similar thickness.
-The village dataset's static contact graph is sparser than the others: most nodes have fewer edges than in the other graphs.
-Several cliques of three to five nodes are visible, perhaps representing households.
-A small component of two nodes is disconnected from the rest of the graph.
 ### Contact Events per Hour
 Counting the number of contact events per hour reveals clear daily patterns in all the datasets.
@@ -310,6 +270,46 @@ The difference between the Haggle dataset and the others may be due in part to t
 Contacts in the office dataset only occur during office hours, which are roughly a quarter of all hours in the week.
 But even after accounting for this there's still a large difference between the office and village datasets, indicating that face-to-face contacts in the office environment are sparser than in the village environment.
+### Static Contact Graphs
+The following figures show the *static contact graphs* for the three datasets.
+Each node in a static contact graph represents a device.
+An edge connects two nodes if the corresponding devices were in contact at any time.
+The thickness of the edge represents the total duration the devices spent in contact, in proportion to the maximum duration any two devices spent in contact.
+Contact durations are calculated using the definition of an undirected contact given above.
+---
+![haggle-contact-graph.png](uploads/img/haggle-contact-graph.png)
+**Static contact graph for the Haggle dataset.**
+---
+![office-contact-graph.png](uploads/img/office-contact-graph.png)
+**Static contact graph for the SocioPatterns office dataset.**
+---
+![malawi-contact-graph.png](uploads/img/malawi-contact-graph.png)
+**Static contact graph for the SocioPatterns village dataset.**
+---
+The Haggle dataset's static contact graph has a densely connected core and a sparsely connected periphery.
+Within the core are two clusters of nodes connected by thick edges, representing devices that spent a lot of time in contact with each other.
+The periphery contains about a dozen devices that each made contact with one or two devices from the core.
+There are two isolated nodes, which represent the two devices that didn't make contact with any other participating devices during the experiment.
+The office dataset's static contact graph is more uniform: there's no clear core or periphery, no obvious clusters, and most of the edges have similar thickness.
+The village dataset's static contact graph is sparser than the others: most nodes have fewer edges than in the other graphs.
+Several cliques of three to five nodes are visible, perhaps representing households.
+A small component of two nodes is disconnected from the rest of the graph.
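As a rough illustration of the construction described above, the following Python sketch builds a static contact graph from a list of contact intervals. It is a minimal sketch, not the code used to produce the figures: the input format `(a, b, start, end)`, the function names `static_contact_graph` and `isolated_nodes`, and the assumption that the intervals have already been reduced to undirected contacts are all hypothetical.

```python
from collections import defaultdict


def static_contact_graph(contacts):
    """Return {frozenset({a, b}): relative_weight} for each pair that made contact.

    `contacts` is assumed to be an iterable of (a, b, start, end) tuples, each
    describing one undirected contact between devices a and b.
    The relative weight is the pair's total contact duration divided by the
    largest total duration of any pair, mirroring the edge thickness.
    """
    totals = defaultdict(float)
    for a, b, start, end in contacts:
        if a == b:
            continue  # ignore self-contacts, which carry no information
        totals[frozenset((a, b))] += end - start
    if not totals:
        return {}
    longest = max(totals.values()) or 1.0  # avoid dividing by zero
    return {pair: duration / longest for pair, duration in totals.items()}


def isolated_nodes(devices, graph):
    """Return the devices that don't appear in any edge of the graph."""
    connected = set()
    for pair in graph:
        connected.update(pair)
    return set(devices) - connected
```

For example, with contacts `[("a", "b", 0, 10), ("b", "a", 20, 30), ("b", "c", 0, 5)]` the edge between `a` and `b` has relative weight 1.0 and the edge between `b` and `c` has relative weight 0.25. A device that never appears in a contact shows up only via `isolated_nodes`.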
 ### Isolated Nodes and Small Components
 The static contact graph for the Haggle dataset has two isolated nodes representing devices that didn't make contact with any other participating devices during the experiment, while the static contact graph for the village dataset has a small component representing two devices that made contact with each other but not with any other devices.
@@ -431,7 +431,7 @@ Static reachability is always at least as high as dynamic reachability, as it ig
 ---
-It's immediately apparent that dynamic reachability declines with time for all the datasets.
+It's immediately apparent that dynamic reachability declines with time for all datasets.
 (Static reachability doesn't vary with time, by definition.)
 At the beginning of the experiments, the mean dynamic reachability is above 90% for every dataset.
 By the end of the experiments, the mean dynamic reachability is zero for all datasets.
@@ -447,12 +447,12 @@ If, counterfactually, the experiment had ended earlier than it did, then the upp
 Thus we would expect the dynamic reachability to have been lower than what was measured in reality.
 The experiment's earlier end would have caused the boundary effect to occur earlier.
-We can test the hypothesis of a boundary effect by cutting off the end of the dataset, effectively ending the experiment earlier, and comparing the results with the full dataset.
+We can test this hypothesis by cutting off the end of the dataset, effectively ending the experiment earlier, and comparing the results with the full dataset.
-If the hypothesis of a boundary effect is right then cutting off the end of the dataset will reduce reachability at times before the cutoff time.
+If the hypothesis is wrong and the observed decline in reachability is *not* due to a boundary effect, then cutting off the end of the dataset will have no effect on the mean dynamic reachability at times before the cutoff time.
-By measuring how far back from the cutoff time the reduction in reachability extends, we'll be able to estimate how far back from the end of the full dataset the boundary effect extends.
-If the hypothesis is wrong and the observed decline in reachability is *not* due to a boundary effect, then cutting off the end of the dataset will have no effect on reachability at times before the cutoff time.
+On the other hand, if the hypothesis of a boundary effect is right then cutting off the end of the dataset will reduce the mean dynamic reachability at times before the cutoff time.
+By measuring how far back from the cutoff time the change in reachability extends, we'll be able to estimate how far back from the end of the full dataset the boundary effect extends.
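The cutoff test can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the analysis code behind the figures: contacts are treated as instantaneous events `(time, a, b)`, messages are forwarded with no delay, and the mean dynamic reachability at time `start` is taken to be the fraction of ordered device pairs joined by a time-respecting path that begins at or after `start` and ends before `cutoff`. The function name and parameters are hypothetical.

```python
def mean_dynamic_reachability(events, devices, start, cutoff):
    """Fraction of ordered (source, destination) pairs reachable in [start, cutoff).

    `events` is assumed to be an iterable of (time, a, b) contact events.
    Events at or after `cutoff` are discarded, which simulates ending the
    experiment earlier.
    """
    window = sorted(
        (e for e in events if start <= e[0] < cutoff), key=lambda e: e[0]
    )
    n = len(devices)
    if n < 2:
        return 0.0
    reachable_pairs = 0
    for source in devices:
        reached = {source}
        # Scanning events in time order only follows time-respecting paths.
        for _, a, b in window:
            if a in reached or b in reached:
                reached.update((a, b))
        reachable_pairs += len(reached) - 1  # don't count the source itself
    return reachable_pairs / (n * (n - 1))


events = [(1, "a", "b"), (5, "b", "c"), (9, "c", "d")]
devices = ["a", "b", "c", "d"]
full = mean_dynamic_reachability(events, devices, start=0, cutoff=10)
cut = mean_dynamic_reachability(events, devices, start=0, cutoff=6)
```

In this toy example `full` is 0.75 and `cut` is 5/12: removing the contact at time 9 lowers the reachability measured at time 0, well before the cutoff. This is the signature a boundary effect would be expected to leave.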
 ---
@@ -463,14 +463,14 @@ If the hypothesis is wrong and the observed decline in reachability is *not* due
 ---
 The results support the hypothesis of a boundary effect.
-Cutting off the end of the dataset reduces the dynamic reachability compared with the full dataset, and later times (closer to the cutoff time) are affected more than earlier times (further from the cutoff time).
+Cutting off the end of the dataset reduces the mean dynamic reachability at times before the cutoff time, compared with the full dataset.
 For some cutoff times, reachability drops to zero before the cutoff time, while for others it remains above zero, indicating that some devices are transitively connected to each other when the cutoff time is reached (i.e., in the highest remaining layer of the dynamic contact graph).
-By looking at the left edge of the figure above, we can see that the dynamic reachability at the start of the experiment is affected by changing the cutoff time.
+By looking at the left edge of the figure above, we can see that the dynamic reachability at the beginning of the experiment is affected by changing the cutoff time.
 Even a cutoff time of 11 days slightly affects the dynamic reachability at the beginning of the experiment, indicating that some of the paths that started at the beginning of the experiment took more than 11 days to reach their destinations, and were thus removed when the cutoff time was set to 11 days.
-This leads to the troubling conclusion that the boundary effect extends at least 11 days back from the end of the 12-day dataset, and thus we have little or no data that is not affected by the boundary.
+This leads to the troubling conclusion that the boundary effect extends at least 11 days back from the end of the 12-day dataset, and thus we have little or no "clean" data that's not affected by the boundary.
 Results for the other datasets are similar.
@@ -541,7 +541,7 @@ The thickness of the edge represents the total duration the devices spent in con
 ---
-This shows that our technique for simulating a longer dataset by looping a week's worth of data isn't perfect: some devices that make contact during the full dataset don't make contact during the week's worth of data that we selected for looping.
+The differences between the static contact graphs for the full and looped versions of the datasets show that our technique for simulating a longer dataset by looping a week's worth of data isn't perfect: some devices that make contact during the full dataset don't make contact during the week's worth of data that we selected for looping.
 The static contact graph for the looped version of the village dataset has three small components that are disconnected from the rest of the graph, whereas the full version of the dataset has only one small component.
@@ -551,7 +551,7 @@ The full versions of these datasets have no isolated nodes.
 As discussed above, we prefer to include these isolated nodes and small components in our analysis, as they represent a realistic feature of human mobility patterns that has implications for the feasibility of delay-tolerant networks.
-Due to these isolated nodes and small components, the mean static reachability is lower for the looped versions of the office and village datasets than for the full versions of the same datasets.
+Due to these isolated nodes and small components, static reachability is lower for the looped versions of the office and village datasets than for the full versions of the same datasets.
 | Dataset | Mean static reachability (full) | Mean static reachability (looped) |
 |---------|---------------------------------|-----------------------------------|