Librem PUSH Notification platform [FUNDING REQUIRED]

Caliga · October 1, 2019, 11:19am

To my knowledge, if a TCP socket waits for new data, the process (the thread, rather) sleeps until the timeout is reached. If this requires additional keep-alive packets under the hood (does it?), the kernel handles them. This could possibly be optimized (in-kernel) by aligning such packets so that the kernel can sleep for e.g. 15s and then send all keep-alive packets at once.

In a similar fashion, apps that are polling could be synced by (voluntarily) listening to some DBus events. This should be customizable. First I wanted to write I’m totally fine with having my mails polled only every 10 minutes, but then I realized I’m actually fine with “no polling at all” while the screen is off.
No need to poll my mails 50 or even 500 times every night when I won’t read them before turning on the screen anyway.

Yes, I know, we have a fancy LED to indicate incoming messages, but personally I’d poll emails only when the screen is on, and maybe disable messenger polling during night time.

rogro82 · October 1, 2019, 11:25am

It’s an interesting case, and I guess some standardized DBus events to notify applications about updates, or even just the presence of an update, would work.

But instead of just one solution I’d rather see a multitude of solutions which can hook into this… I don’t want all my apps consumed by one big external messaging system like GCM (decentralized or not) that I have no control over.

I’d rather have a self-hosted solution where I can connect “all” my apps / services ( e.g. through polling ) and then specify that as the back-end for such a solution.

Caliga · October 1, 2019, 11:30am

That’s not quite what I meant. Rather, DBus would only hint apps to do their polling in a synced fashion.
E.g., “Dear app, if you need fast polling, use this scheme: 10:10:15, 10:10:25, 10:10:35, … For medium, go 10:10:15, 10:11:15, 10:12:15, … and for slow polling, go 10:10:15, 10:20:15, 10:30:15”

Or, in other words, “Please sync on the timestamp 10:10:15, and then poll every 10 seconds, every minute or every 10 minutes”
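
A minimal sketch of that idea, assuming the sync timestamp has already arrived through some DBus hint (the hint itself is not modelled here). The names `PERIODS` and `next_poll` are made up for illustration; the three periods come from the example above.

```python
from datetime import datetime, timedelta

# Hypothetical polling classes; the periods are the ones from the example above.
PERIODS = {
    "fast": timedelta(seconds=10),
    "medium": timedelta(minutes=1),
    "slow": timedelta(minutes=10),
}


def next_poll(now, sync_point, kind):
    """Return the first slot at or after `now` on the grid anchored at `sync_point`."""
    period = PERIODS[kind]
    if now <= sync_point:
        return sync_point
    elapsed = now - sync_point
    # Ceiling division: timedelta // timedelta yields an int (floor).
    steps = -(-elapsed // period)
    return sync_point + steps * period
```

Because every app computes its slots from the same anchor, a medium or slow poll always coincides with a fast one, so the radio only has to wake on the fast grid.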

ruff · October 1, 2019, 11:33am

OK, let’s imagine the following situation (again, I’m on the XMPP-heavy case, which is realtime and stateful).
So we have the XMPP session set up (several dozen round trips), and we have SM, SI and TLS.
When the phone goes to sleep and suspends apps, the client detects that and pushes an SI inactive message. The server treats that as “stop spamming”: it increases keepalive timers, holds presence broadcasts, maybe queues PEP events. However, since it can scarcely queue messages and IQs (those are interactive), it must fire them off even in the SI inactive case. So the server writes to the TCP socket.

Since the client is still sleeping, the packets don’t reach it; perhaps they are queued at the operator’s gateway. The server sends retries, the client still sleeps. Again, these retries now come from the server-side TCP stack (kernel), not the application, so they are not SI-state aware. So, to avoid socket starvation, the stack will most probably be configured with a shorter backoff and may eventually time out before the client wakes up and resumes the TCP conversation.

If there’s a push, though, the server may hold everything and rely on the fact that the push should wake the client up; the client will then send the active state and the server will flush the buffer. That allows the socket to survive long silent periods and actually rely on the client’s interaction, preserving the connection and resources.

It has nothing to do with the notification LED.

dcz · October 1, 2019, 1:18pm

I think you’re going to have a more meaningful response if you expand your acronyms

ruff · October 1, 2019, 1:37pm

Sorry: stream management (SM) and state indication (SI). SM tracks the XML stream to enforce XML stanza framing integrity, and SI allows mobile clients to indicate that they are entering a suspended state, i.e. do not disturb unless urgently needed.
SM was created in response to the fact that a reliable TCP connection is still not reliable enough for higher-level application framing, and SI was created specifically for mobile devices. However, SI alone is still not sufficient, as I described above; a push service fills the gap.
Note: push service becomes required in the mobile compliance suite 2020 for XMPP advanced servers.

jon.armani · October 1, 2019, 3:32pm

Do you mind posting or linking your dissertation?

itay-grudev · October 1, 2019, 10:40pm

Now let me try and answer all of your questions.

If you mean implementation details I would love to share all of my ideas. I plan to release everything as free software so nothing will be closed from the public. Still I am in the very early stages of development.

Multiple applications polling different services means a lot more network overhead and more radio time, simply because you need to maintain several TCP connections with several servers. Yes, you can sync the requests so you don’t wake up the radio module unnecessarily, but just to keep a TCP socket alive you need to send keepalives every several seconds. Some NATs (and mobile connections are very often in a local network behind a NAT) drop connections if there haven’t been any packets for more than 30s. This is radio time, which could be saved by maintaining only a single socket (or at the very least fewer sockets). And if the polling is done at a higher level, it is even worse: we are no longer talking about TCP keepalives or simple UDP messages but about more complicated protocol-specific communication.
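
For reference, a rough sketch of what keeping one such socket alive looks like from an application. The 25-second values are an assumption chosen to stay under the 30s NAT timeout mentioned above, and `enable_keepalive` is a made-up helper name. The `TCP_KEEP*` options are platform dependent, so they are applied only where the Python `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle=25, interval=25, probes=3):
    """Turn on TCP keepalives; timer values are illustrative assumptions."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained timers are not available on every platform, so each
    # option is set only if this build of the socket module defines it.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds of silence before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # failed probes before the connection is dropped
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

Every app that holds its own connection pays this cost separately, which is the overhead a single shared socket would avoid.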

Decentralised in the way Matrix works. You can deploy and use an arbitrary PUSH server and instruct services to use the server you specified. Currently browsers and platforms provide a PUSH service and all messages go through Google’s or Mozilla’s servers. And yes Mozilla sounds much better than Google, but still they get a copy of every message transferred and there is no end-to-end encryption enforced at the protocol level.

The way it will work is essentially a Publish Subscribe architecture with message persistency so you can lookup past messages. I will explain more implementation details in a separate post.

I have seen and used the MaidSafe network in the past. While it is a cool technology, it would still produce too much overhead for this purpose. Also, it is a P2P network. I want a more client-server architecture because this is where the energy efficiency advantage comes from: you have fewer connections to maintain and fewer keepalives. Also, mobile networks often employ NATs, which are very hard or impossible to punch through and would make the protocol unreliable in a P2P network.

I have spent so much time researching this. I wanted to avoid having to reinvent the wheel and I looked at an enormous number of alternatives, from XMPP and MQTT to the Matrix protocol. Unfortunately almost all current protocols lack certain features, have heavy implementations, or lack security aspects required for this communication. Also, I want this to be as lightweight as possible. Less CPU usage and less network usage mean more phone battery life.

I want to avoid HTTP on the device side. HTTP is awesome and simple for applications to send notifications, but on the device side the protocol has to be compressed, efficient and binary. HTTP introduces a huge overhead: headers, cookies, browser versions, protocol upgrades and downgrades, expensive HTTPS/TLS negotiation. I want to go simpler and more efficient without compromising security. I will explain the proposed architecture in a separate post.

This is true at the physical layer, but at the network layer to persist a socket, especially through a NAT you need regular keepalives.

Sure, if you are up for reading 60 pages of the initial protocol version, write to me on Matrix: @itay-grudev:m.hacktag.uk. I wouldn’t mind sharing it.

dcz · October 2, 2019, 8:26am

That’s not what I asked about, though. Do you have any measurements that prove what you’re saying?

That’s plain wrong; my browser connects to Purism at forums.puri.sm directly, and any other site directly (content delivery network notwithstanding). My game client connects to my friend’s game server directly. My torrent client connects to the seeds/peers directly.

In a different category, my email client connects to my email provider directly, my Mastodon client as well, my Matrix client as well.

Apps being forced to use Firebase fall rather into the former category, so my question was about being better than that existing arrangement.

Why don’t you share it here?

ruff · October 2, 2019, 8:46am

No wait, a push service is not data push, and you named it right: push notification service. I have no idea why a browser would use a push notification service like FCM or APNs, as a browser is by definition interactive and hence pulling. And if it needs to push data during an interactive session there are plenty of methods, starting from long polling, XHR event polling, HTTP/2, WebSockets, you name it.
An HTML app which uses the browser as a runtime, yes, may rely on a push service, as it cannot guarantee the browser will be executing the app when something happens on the backend. So a push notification may trigger an app start, which will spin up the browser runtime for the app.

Either way I agree standardized push service may contribute to more efficient power management. I.e. it’s not an absolute remedy, but it’s a tool which may help to improve it.

itay-grudev · October 3, 2019, 8:17am

Architecture

The most important factor that drives the architecture is that devices will communicate mostly through home and public wireless networks and mobile data networks all of which are usually behind a NAT. This forces a very strict requirement on how the protocol needs to work - the device needs to connect to a relay server, since the connection wouldn’t work in reverse.

Encryption and message format

The protocol needs to be end-to-end encrypted from the application sending notifications to the device. This means that the device will generate a set of keys and send them to the application upon subscription. There are two ways to approach the encryption problem - symmetric and asymmetric. Each has advantages and disadvantages. But the main problem with asymmetric encryption is efficiency and since this protocol needs to be as efficient as possible I propose we use the following scheme using symmetric keys. Upon registration the device generates a set of two symmetric keys:

Public key - This key will be sent to the relay server and will be used to authenticate messages sent from the application and forward them to the device.
Private key - Used for encrypting and signing the message sent.

When an application is sending a message, it is first signed with the private key, then encrypted with the private key. The final step is to sign the encrypted message with the public key. This last step allows verification on the relay server.
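
A rough sketch of that layering, under loud assumptions: since both keys are symmetric, “signing” is modelled here as HMAC-SHA256, and the cipher is left as a caller-supplied `encrypt` function because the post does not name one. The names `seal` and `relay_accepts` are made up for illustration and are not part of the proposal.

```python
import hashlib
import hmac


def seal(message, private_key, public_key, encrypt):
    """Sign, encrypt, then sign again, in the order described above.

    `encrypt(key, data)` is a placeholder for whatever symmetric cipher
    the protocol ends up using; none is specified in the post.
    """
    inner_tag = hmac.new(private_key, message, hashlib.sha256).digest()
    ciphertext = encrypt(private_key, message + inner_tag)
    outer_tag = hmac.new(public_key, ciphertext, hashlib.sha256).digest()
    return ciphertext, outer_tag


def relay_accepts(ciphertext, outer_tag, public_key):
    """Relay-side check: needs only the public key, never the private one."""
    expected = hmac.new(public_key, ciphertext, hashlib.sha256).digest()
    return hmac.compare_digest(expected, outer_tag)
```

The relay can authenticate and route the message without being able to read it, since it never holds the private key.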

Before using a specific relay server, the user needs to register on the server with a unique username or email and a password. The user’s password is never sent to the relay server; authentication is performed using the algorithm specified in the SRP (Secure Remote Password) protocol. One deviation from the SRP protocol is that the PBKDF (password-based key derivation function) step is applied twice to the password before it is used for authentication with the server. The reasoning for this will be explained later in the password based messaging use case. After registration the relay server assigns each device a unique identification number.

There are other precautions that need to be taken, like replay attack mitigation. This is achieved by assigning a unique counter (unsigned int) to each message. The first message starts at zero and every subsequent message sent from either side needs to have a counter greater than that of the previous message. When the counter approaches integer overflow the subscription needs to be re-established. A message whose counter is equal to or smaller than the previous one is considered either an invalid message or a replayed message and should be treated specially in the device/application software. The fields carried within a message will be as follows:

Message, encrypted with the Private Key
Application identifier, encrypted with the Private Key
Device ID, encrypted with the Public Key
Counter, encrypted with the Public Key
Timestamp (need not be accurate, for increased privacy), encrypted with the Public Key
Private Key signature (includes the counter and device ID)
Public Key signature (includes the counter, device ID and the private key signature)

The message destination within the device (the corresponding application) is identified by the public key. As such it needs to be random, but unique for each subscription for the given device.
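
The counter rule for replay mitigation can be sketched roughly as follows. The 32-bit width, the re-subscription margin and the `ReplayGuard` name are assumptions made for illustration; the description above only says “unsigned int”.

```python
class ReplayGuard:
    """Tracks the last accepted counter for one subscription."""

    MAX_COUNTER = 2**32 - 1  # assumed 32-bit unsigned counter

    def __init__(self):
        self.last = None  # nothing accepted yet, so counter 0 is valid

    def accept(self, counter):
        """Return True for a fresh message, False for a stale or replayed one."""
        if not 0 <= counter <= self.MAX_COUNTER:
            raise ValueError("counter out of range")
        if self.last is not None and counter <= self.last:
            return False
        self.last = counter
        return True

    def needs_resubscribe(self, margin=1024):
        """True once the counter is within `margin` of overflowing."""
        return self.last is not None and self.last >= self.MAX_COUNTER - margin
```

A rejected counter is only reported here; what the device or application does with such a message is left open, as in the description above.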

Password based messaging

There is one special use case in which the public and private keys can’t be random. This is when, instead of an application sending messages to the device, a user would like to send messages to their own device. An example use case is device management, similar to Find My Device/iPhone. In this case the user needs to issue commands to their device from a different computer that may never have exchanged any public or private keys with it in the past. Instead the user can use password-derived keys generated using a PBKDF. In this situation the user will first authenticate with the relay server. Then the public key is computed by applying the PBKDF twice to the password, using the user’s salt from the relay server registration, and the private key is computed by applying the PBKDF step only once, using the application identifier as salt. (Note that a PBKDF step does NOT refer to the iteration count, i.e. the function complexity, but simply to the number of times the function is performed.) This is done so the protocol remains safe even if the user re-uses the same password they used for authenticating with the relay server, and guarantees that even in this case the relay server will be unable to decrypt message contents.
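
A minimal sketch of that derivation, assuming PBKDF2-HMAC-SHA256 as the PBKDF and an arbitrary iteration count, neither of which is specified above. “Applying the PBKDF twice” is read here as feeding the first output back in with the same salt, which is one possible interpretation. `pbkdf_step` and `derive_keys` are made-up names.

```python
import hashlib

ITERATIONS = 100_000  # assumed work factor; not specified in the post


def pbkdf_step(secret, salt):
    """One PBKDF step (a full PBKDF2 run, not a single iteration)."""
    return hashlib.pbkdf2_hmac("sha256", secret, salt, ITERATIONS)


def derive_keys(password, user_salt, app_id):
    """Return (public_key, private_key) derived from the password alone."""
    pw = password.encode("utf-8")
    # Public key: two steps with the user's relay-registration salt.
    public_key = pbkdf_step(pbkdf_step(pw, user_salt), user_salt)
    # Private key: a single step, salted with the application identifier.
    private_key = pbkdf_step(pw, app_id.encode("utf-8"))
    return public_key, private_key
```

Since the private key uses a different salt and is never derived from the public key, knowing the public key does not hand the relay server the private key directly.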

Message priority and persistency

The protocol will be able to persist messages. There will be 4 basic priority levels as follows:

ZERO - No persistency required for these messages
LOW - Persistency is required but discard first if the user is running low on message storage space
MEDIUM - Persistency is required but discard if the user is running low on message storage space and there are no LOW level messages that can be discarded instead.
HIGH - Persistency is required but discard if the user is running low on message storage space and there are no LOW or MEDIUM level messages that can be discarded instead.

Messages should be discarded oldest first.

The persistency priority levels are there just to handle the use case when each user on a relay server is allocated a certain message storage space and that space is exceeded.
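
The discard order can be sketched like this: lower levels go first and, within a level, the oldest message goes first. The `Stored` record, the `evict` function and the byte-based quota are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    ZERO = 0    # never persisted, so it should not appear in storage
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Stored:
    priority: Priority
    received_at: float  # arrival time; smaller means older
    size: int           # bytes counted against the user's quota


def evict(stored, capacity):
    """Return the messages kept once the total size fits within `capacity`."""
    # Discard order: lowest priority first, oldest first within a level.
    kept = sorted(stored, key=lambda m: (m.priority, m.received_at))
    total = sum(m.size for m in kept)
    while kept and total > capacity:
        total -= kept.pop(0).size
    return kept
```

With this ordering a HIGH message is only dropped once no LOW or MEDIUM message is left to discard.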

Final words

This is just a rough explanation of how the protocol will work, but I hope it gives you an insight into what I’m trying to build. Regarding my dissertation: it was written for a device management protocol called the Secure Device Management Protocol (SDMP), not a generic PUSH protocol, but based on the work I did there I wanted to implement a general purpose protocol that can be used for mobile devices of any kind, IoT devices, etc. Anyone curious can find it here and can read about all of my initial thoughts and considerations when designing the first version of the protocol, along with references to all of the science it is based on.

ruff · October 3, 2019, 9:56am

Uhm, OK, there’s significant focus on confidentiality and integrity but very little on actual accessibility (flow control). Don’t get me wrong, I’ll try to attack your approach from different angles to make sure I understand it correctly.
First line:
End-to-end encryption: this is a significant requirement and means apps willing to support it would need to implement a separate custom (non-IETF) crypto stack. While there’s an obvious benefit in this (ability to pass the data), introducing yet another method is a bit of an overkill. It also lacks multi-device support. Why not reuse something akin to the Signal/Axolotl/Double Ratchet mechanism? There are implementations in the wild which should not be too complex to adopt.
Second line:
If we simply need C/I (confidentiality/integrity) over public media, why not reuse the strongSwan stack and use IKEv2 + ESP without re-inventing the wheel? ESP is in-kernel (efficient), IKE is in userspace (controllable and flexible). v2 supports resumption and mobility, so it should tolerate long sleeps and roaming (e.g. from WWAN to WLAN).
Third line: Flow control: Why not introduce something like MQTT for the initial metadata poll? E.g. as initial setup, do a quick poll to see if there’s anything pending, getting the number, priorities and sizes (to estimate whether you have enough wake time to digest it or need a full wake to process it). Then do a higher-level pull of the payloads with cryptography.

As you can see, my main line is: why not try to reuse existing technologies, reducing the required implementation footprint and possible mistakes/flaws?

kieran · October 3, 2019, 12:23pm

Just between you and me, why is that? For IPv4, sure, I get it. Why not use IPv6 and say goodbye to any requirement for NAT and hence allow the protocol to reflect its requirements rather than the artificial requirements imposed by NAT?

My current phone is globally accessible (inbound) via IPv6 when it is at home connected via WiFi. (Whether that is a good thing from a security point of view is another question.) We just need telcos to get their act together.

It can perhaps be assumed that the major players who might be generating notifications would already support IPv6 and if they don’t then they should!

ruff · October 3, 2019, 1:24pm

Really, why? I’ve asked Liberty Global (now Vodafone), and they say IPv4 with NAT works fine for them.

patch · October 6, 2019, 8:34pm

First of all: well done; it looks like you’ve put a lot of work into this, and I’m sure you’ve put much more thought into it than I ever will.

Here are my thoughts. Take them with a grain of salt: I don’t speak with authority on this matter. I’ve only skim-read your paper and this thread, so I apologise if you’ve already expressed yourself on these points. Also, I’m not someone who’s likely to fund the project, so don’t spend too much time trying to appease me!

There’s a difference between the dissertation and this forum thread

It’s worth restating that your dissertation was about a device management protocol but this forum thread is about a more general purpose push notification protocol. You did say this, but I failed to register it on first reading.

I still don’t get what this is for

My overriding feeling is that I still don’t understand what you are proposing. Can you write some new user stories for the expanded project scope?

Who is expected to host the relay server?

Who is expected to run or host applications?

What’s an application?

It’s also not altogether clear to me what an “Application” is in your architecture. Having given it some thought, my assumption is that these could be:

Phone management applications (for “find my phone” and “remote erase” kinds of use cases) - hosted on a server or running only on a ‘client’ device.
Self-hosted or other cloud applications that run on a server and implement this protocol
Adaptors that connect to cloud services and implement this protocol on behalf of those services (e.g. something that connects to your IMAP email account and issues push notifications when new mail arrives).

Is that correct?

I’m struggling not to confuse these applications with the applications that run on the mobile devices. It might be worth changing the naming convention to make it clearer what an application is. (Or what kind of application it is.)

The terms “client” and “server” are also ambiguous, because it depends on what perspective you are looking at the system from.

How does it interface with third party software and services?

I assume there will be a library that implements the protocol, for use by applications? Or is there a daemon that implements the protocol and the applications communicate with that?

This question goes for both endpoints of the protocol: on the mobile device and on the other end (the application). How does one go about integrating push notifications into, say, an email app for the Librem 5 and into a corresponding email server?

Is NAT the real reason to have a relay server?

When I saw that your requirement for a relay server was exclusively derived from your requirement FR-2 (“The protocol must work when devices are behind a NAT and/or a firewall limiting incoming connections”) my first thought was that this seemed like something that should be an implementation detail rather than a core part of the protocol.

You acknowledged that there are other ways this could be implemented. I’d be inclined to define a relay abstraction that could represent either a relay server or some other means of relaying the messages. In practice, an alternative implementation could masquerade as a relay server even without a clean abstraction being defined, but that would be a bit of a hack.

It seems to me that the real reason to have a relay server might be to have a single point of contact for the mobile devices. Otherwise the devices would need to maintain connections to each individual application, or perhaps participate in a peer-to-peer network.

Won’t NAT be a solved problem where server applications are concerned?

It seems to me that many applications in this architecture would most sensibly be run on a server, in which case NAT may well be a solved problem, since those application servers would need to be publicly addressable regardless of whether they issue push notifications. Someone who can run a server without NAT being a problem can probably also run a relay server without NAT being a problem. Or does the push notification platform aim to replace other protocols that applications might use to communicate with the device, thereby allowing them to run from behind NAT?

Do some of the components of this system already exist?

@ruff made some points on this theme.

It struck me that the relay server is, in part, a message broker. Those already exist, and their protocols can be quite efficient with bandwidth and CPU cycles.

itay-grudev · October 7, 2019, 10:19am

@patch I’m currently away for 2 days. I’ll answer all your questions when I come back.