Port the LAN peer discovery code from the Disco app to the public mesh testbed, bring it up to date and wire it up to the UI. Then see whether we can use wake locks, Wi-Fi locks and/or multicast locks to solve the issues reported here, where the success rate drops when the screen is turned off. We also need to test on newer API levels and work out which permissions are required on each API level.
It would be useful to know whether LAN peer discovery typically works for devices connected to a Wi-Fi hotspot (created manually via the system settings app) and/or a Wi-Fi Direct legacy mode hotspot (we could use the Offline Hotspot app for testing this).
Two devices (Huawei P8 Lite 2015 and P8 Lite 2017) don't seem to receive LSD multicast advertisements, even when the screen is turned on. Both devices can send LSD advertisements. The Huawei Y6P and Alcatel A3 XL can send and receive.
When using NSD (with DNS service discovery rather than UPnP), all four devices can send and receive advertisements. However, the P8 Lite 2015 doesn't receive the attributes map in the advertisement.
The P8 Lite 2015 does however send an attributes map, which is received by the other three devices.
It's probably worth noting that NSD is also a special case for Android apps running on ChromeOS: apps can't generally make or receive connections via the LAN, except in the case of outgoing connections to services discovered via NSD. See briar#1362.
TODO: Test whether NSD works over WFD groups, and specifically whether a device providing a legacy mode hotspot can discover clients via NSD. This comment suggests that NSD service resolution will fail for network interfaces that don't have an associated Network instance:
When discovering services via NSD, one needs to resolve each service found to find out IP address, port etc.
NsdManager has a method resolveService() for this. Only one service can be resolved at a time: a second simultaneous resolve request fails with the error code FAILURE_ALREADY_ACTIVE. Hence it is important not to call that method right away when new services are discovered; instead, a queue needs to be maintained and worked through one by one. Resolving a service can succeed or fail, and after either event it should be possible to resolve the next service. It seems, however, that sometimes resolving neither succeeds nor fails, and instead we receive a service lost event. We don't get any further updates on the resolve request, and we cannot submit a new one either: new requests still fail with a FAILURE_ALREADY_ACTIVE error. Even restarting the service discovery entirely did not seem to help.
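The one-at-a-time queue described above could look roughly like the sketch below. It is plain Java rather than Android code: `Resolver` and `Callback` are hypothetical stand-ins for `NsdManager.resolveService()` and its listener, and services are identified by name only. It avoids FAILURE_ALREADY_ACTIVE caused by overlapping requests, but it does nothing about the stuck state where neither callback ever arrives.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Serialises resolve requests so that at most one is in flight at a time. */
final class ResolveQueue {

    /** Hypothetical stand-in for the platform's resolve call. */
    interface Resolver {
        void resolve(String serviceName, Callback callback);
    }

    /** Hypothetical stand-in for the platform's resolve listener. */
    interface Callback {
        void onResolved(String serviceName);
        void onFailed(String serviceName);
    }

    private final Resolver resolver;
    private final Queue<String> pending = new ArrayDeque<>();
    private String inFlight = null;

    ResolveQueue(Resolver resolver) {
        this.resolver = resolver;
    }

    /** Call when discovery reports a new service. */
    synchronized void enqueue(String serviceName) {
        // Ignore duplicates that are already queued or being resolved
        if (serviceName.equals(inFlight) || pending.contains(serviceName)) return;
        pending.add(serviceName);
        startNext();
    }

    /** Call when discovery reports a lost service: no point resolving it. */
    synchronized void onServiceLost(String serviceName) {
        pending.remove(serviceName);
    }

    private void startNext() {
        if (inFlight != null) return; // a request is still outstanding
        String next = pending.poll();
        if (next == null) return;
        inFlight = next;
        resolver.resolve(next, new Callback() {
            @Override
            public void onResolved(String name) {
                finished(name);
            }

            @Override
            public void onFailed(String name) {
                finished(name);
            }
        });
    }

    /** Success and failure both free the slot for the next request. */
    private synchronized void finished(String name) {
        if (!name.equals(inFlight)) return;
        inFlight = null;
        startNext();
    }
}
```

A lost event for the in-flight service is deliberately not treated as completion here, since the platform may still consider the request active. Whether a timeout would help in that case is something we would need to test.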
It looks to me like the only way to remove the NsdService's ClientInfo instance, which holds state relating to the client (i.e. our app), is for the service to receive an UNREGISTER_CLIENT message.
If this binder object unexpectedly goes away (typically because its hosting process has been killed), then the given IBinder.DeathRecipient's DeathRecipient.binderDied() method will be called.
You will only receive death notifications for remote binders, as local binders by definition can't die without you dying as well.
So it seems that the UNREGISTER_CLIENT message is only meant to be used when the remote process (i.e. our app) dies. I can't see any way for us to remove the ClientInfo instance, which contains the mResolvedService field that prevents any further resolutions from starting, short of exiting our app process.
It was marked as fixed on Mar 31, 2016. The next Android release after that was Android Nougat 7.0 (API 24), so I'm guessing that's the first version this should work on, and all devices with lower API levels are affected by the bug?
Looks like you're right - the Moto G 4G (API 22) and Moto E3 (API 23) are both affected, while the Huawei P8 Lite 2017 and Alcatel A3 XL (both API 24) are not.
This means we can't advertise attributes, unless we encode them in the service name along with the address or fetch them separately via unicast. This (abandoned) library implements the latter approach to work around the empty attributes map issue mentioned above:
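As a rough illustration of the first option (encoding attributes in the service name), here is a minimal sketch. The `mesh-` prefix, the `key=value&key=value` layout and the class name are all made up for this example. The 63-byte limit is the DNS-SD limit on instance names. The handling of a renamed service assumes that any suffix added to resolve a name conflict starts with a space, which we would need to verify.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Packs a small attribute map into a DNS-SD service instance name. */
final class ServiceNameCodec {

    // DNS-SD instance names are limited to 63 bytes of UTF-8
    static final int MAX_NAME_BYTES = 63;
    // Hypothetical prefix to recognise our own services
    static final String PREFIX = "mesh-";

    static String encode(Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            String k = e.getKey(), v = e.getValue();
            // '&' and '=' are used as separators in this made-up layout
            if (k.isEmpty() || k.contains("=") || k.contains("&") || v.contains("&"))
                throw new IllegalArgumentException("Unsupported attribute: " + k);
            if (sb.length() > 0) sb.append('&');
            sb.append(k).append('=').append(v);
        }
        // URL-safe Base64 keeps the name free of spaces and dots
        String payload = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(sb.toString().getBytes(StandardCharsets.UTF_8));
        String name = PREFIX + payload;
        if (name.getBytes(StandardCharsets.UTF_8).length > MAX_NAME_BYTES)
            throw new IllegalArgumentException("Attributes too large for a service name");
        return name;
    }

    /** Returns the attributes, or null if the name is not one of ours. */
    static Map<String, String> decode(String name) {
        if (!name.startsWith(PREFIX)) return null;
        String payload = name.substring(PREFIX.length());
        // Assumption: a conflict suffix, if any, starts with a space
        int space = payload.indexOf(' ');
        if (space >= 0) payload = payload.substring(0, space);
        String text;
        try {
            text = new String(Base64.getUrlDecoder().decode(payload), StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return null; // not valid Base64, so not one of our names
        }
        Map<String, String> attrs = new LinkedHashMap<>();
        if (text.isEmpty()) return attrs;
        for (String pair : text.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) return null;
            attrs.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return attrs;
    }
}
```

With the 5-byte prefix this leaves 58 Base64 characters, i.e. at most 43 bytes of attribute text, so it is only workable for an address, a port and perhaps one short extra field.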
Alternatively we could implement our own unicast protocol for fetching attributes, while still using NSD for the multicast part, to benefit from the special treatment it seems to get on some devices.
TODO: Modify the LSD branch to use broadcast rather than multicast (following the approach used by KDE Connect). Test whether this affects which devices can discover each other over which LANs.
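For reference, a minimal sketch of what the broadcast variant might involve, using only `java.net`. The class and method names are made up and this is not taken from the LSD branch or from KDE Connect. On a real device the address and prefix length would presumably come from `java.net.InterfaceAddress` (`getAddress()`, `getNetworkPrefixLength()`, or `getBroadcast()` directly); the port is whatever the branch already uses.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.UnknownHostException;

final class BroadcastSketch {

    /** Computes the directed broadcast address of an IPv4 subnet. */
    static InetAddress broadcastAddress(InetAddress addr, int prefixLength)
            throws UnknownHostException {
        byte[] b = addr.getAddress();
        if (b.length != 4) throw new IllegalArgumentException("IPv4 only");
        if (prefixLength < 0 || prefixLength > 32)
            throw new IllegalArgumentException("Bad prefix length");
        int ip = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
                | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        // Java masks shift counts to 5 bits, so /32 needs a special case
        int hostMask = prefixLength == 32 ? 0 : (0xFFFFFFFF >>> prefixLength);
        int bc = ip | hostMask; // set all host bits to 1
        return InetAddress.getByAddress(new byte[] {
                (byte) (bc >>> 24), (byte) (bc >>> 16), (byte) (bc >>> 8), (byte) bc});
    }

    /** Sends one advertisement datagram to the given broadcast address. */
    static void sendAdvert(byte[] payload, InetAddress broadcast, int port)
            throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true); // allow sending to a broadcast address
            socket.send(new DatagramPacket(payload, payload.length, broadcast, port));
        }
    }
}
```

For example, `broadcastAddress` maps 192.168.1.17 with a prefix length of 24 to 192.168.1.255. Receiving needs no group membership, just a `DatagramSocket` bound to the port, which is part of what makes this worth comparing against multicast on the devices that fail to receive LSD advertisements.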