This document overhauls the XMPP protocol extension Entity Capabilities (XEP-0115). It defines an XMPP protocol extension for broadcasting and dynamically discovering client, device, or generic entity capabilities. In order to minimize network impact, the transport mechanism is standard XMPP presence broadcast (thus forestalling the need for polling related to service discovery data), the capabilities information can be cached either within a session or across sessions, and the format has been kept as small as possible.
WARNING: This document has been automatically Deferred after 12 months of inactivity in its previous Experimental state. Implementation of the protocol described herein is not recommended for production systems. However, exploratory implementations are encouraged to resume the standards process.
XMPP applications often face choices based on the disco#info (see Service Discovery (XEP-0030) [1]) exposed by other entities. For example, for a client, knowledge about whether a roster entry is a Mediated Information eXchange (MIX) (XEP-0369) [2] entity or a normal client is important for user experience. It may also be desirable to provide indicators on the type of client a contact is using (mobile or not).
The canonical way to do so has been to issue XEP-0030 requests to the entities emitting presence. With the ever-growing feature set of XMPP, this induces a lot of traffic for all involved parties, especially during startup. This is a waste of resources, as XEP-0030 information rarely changes and, moreover, common client configurations and versions share exactly the same information.
Entity Capabilities (XEP-0115) [3] has provided the XMPP ecosystem with a way to share this information with less bandwidth. Entities using that protocol send a hash of their disco#info result along with presence or stream features. As those hashes can be cached, entities receiving these hashes only need to query the information for each hash once, greatly reducing the Service Discovery traffic.
However, XEP-0115 has two main flaws:
The hash agility mechanism is underspecified. While it is possible to change the hash function, there is no clearly defined way to send multiple hashes at once to allow for a transition period. Even though it is technically not forbidden to send multiple XEP-0115 <c/> elements with different hashes at once, it is unclear how implementations behave when this happens. Possible issues lie in the use of caps optimization, as well as clients expecting only one <c/> element.
The algorithm to generate the input for the hash function has flaws, as pointed out by Waqas Hussain [4]. Even though these flaws have partially been fixed and worked around, the fundamental problem persists: the structural information of the individual strings from the disco response is lost.
Entities must be able to participate without connectivity to services except their own XMPP server and without connectivity to specialized XMPP services, including cached information from those services.
Entities should be able to learn Service Discovery information without actively querying for it.
The bandwidth consumption should be as minimal as possible, while reusing existing specifications.
Entities must be able to update their published information arbitrarily often in a single presence session.
Server infrastructure beyond XMPP Core and XMPP IM must not be required for this to work (but may be beneficial).
Entities must be able to be confident that the information obtained from the broadcast is equivalent to the information which would be obtained from querying the generating entity directly at the time the broadcast was generated.
The input to this algorithm is a Service Discovery (XEP-0030) [1] disco#info <query/> response. The output is an octet string which can be used as input to a hash function or an error.
General remarks:
The algorithm strongly distinguishes between character data (sequences of Unicode code points) and octet strings (sequences of 8-bit bytes). Whenever character data is encoded to octet strings in the following algorithm, the UTF-8 encoding as specified in RFC 3629 [10] is used. Whenever octet strings are sorted in the following algorithm, the i;octet collation as specified in RFC 4790 [11] is used.
The algorithm uses the xml:lang attribute. Implementations must take implicit values for the xml:lang attribute into account, for example those inherited from the disco#info, the IQ element, or from the root <stream> tag.
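The two conventions above can be sketched in Python as follows. This is a minimal illustration, not normative text; the helper names are hypothetical. It relies on the fact that Python compares bytes objects octet by octet as unsigned values, with a shorter string sorting before any longer string it is a prefix of, which corresponds to the i;octet collation.

```python
def encode(text: str) -> bytes:
    """Encode character data to an octet string using UTF-8."""
    return text.encode("utf-8")


def sort_octets(strings: list[bytes]) -> list[bytes]:
    """Sort octet strings byte-wise, as the i;octet collation does."""
    return sorted(strings)


# Sorting operates on the encoded octet strings, never on locale-aware text.
names = ["zeta", "Zulu", "émile", "alpha"]
ordered = sort_octets([encode(n) for n in names])
```

Note that upper-case ASCII letters sort before lower-case ones, and multi-byte UTF-8 sequences sort after all ASCII characters.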
For each <feature/> element: Encode the character data of the 'var' attribute and append an octet of value 0x1f (ASCII Unit Separator).
Join the resulting octet strings together, ordered from lesser to greater.
Append an octet of value 0x1c (ASCII File Separator).
The result of this step is referenced as Features String later.
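As a non-normative sketch, the construction of the Features String could look like this in Python. The function name and the representation of the input as a list of 'var' values are assumptions made for illustration.

```python
UNIT_SEP = b"\x1f"  # ASCII Unit Separator
FILE_SEP = b"\x1c"  # ASCII File Separator


def features_string(feature_vars: list[str]) -> bytes:
    """Build the Features String from the 'var' values of all <feature/> elements."""
    # Encode each 'var' value as UTF-8 and append the Unit Separator.
    encoded = [var.encode("utf-8") + UNIT_SEP for var in feature_vars]
    # Sort byte-wise (i;octet), join, and terminate with the File Separator.
    return b"".join(sorted(encoded)) + FILE_SEP
```

For example, features_string(["jabber:iq:version", "http://jabber.org/protocol/disco#info"]) places the disco#info feature first, because the octet for "h" is less than the octet for "j".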
Processing of <identity/> nodes:
For each <identity/> node:
Encode the character data of the 'category', 'type', 'xml:lang' and 'name' attributes.
Append an octet of value 0x1f (ASCII Unit Separator) to each resulting octet string.
Join the resulting octet strings together, in the order of 'category', 'type', 'xml:lang' and 'name', resulting in a single octet string for the <identity/> node.
Append an octet of value 0x1e (ASCII Record Separator).
Join the resulting octet strings together, ordered from lesser to greater.
Append an octet of value 0x1c (ASCII File Separator).
The result of this step is referenced as Identities String later.
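The identity processing above can be sketched along the same lines. This is an illustration only: the Identity type is hypothetical, and treating an absent attribute as an empty string is an assumption of this sketch. The 'lang' field is expected to hold the effective xml:lang value, including inherited ones.

```python
from typing import NamedTuple

UNIT_SEP = b"\x1f"    # ASCII Unit Separator
RECORD_SEP = b"\x1e"  # ASCII Record Separator
FILE_SEP = b"\x1c"    # ASCII File Separator


class Identity(NamedTuple):
    """Hypothetical container for the attributes of one <identity/> node."""
    category: str
    type: str
    lang: str  # effective xml:lang; "" if absent (assumption of this sketch)
    name: str  # "" if absent (assumption of this sketch)


def identities_string(identities: list[Identity]) -> bytes:
    """Build the Identities String from all <identity/> nodes."""
    records = []
    for ident in identities:
        # Fixed attribute order: category, type, xml:lang, name.
        parts = (ident.category, ident.type, ident.lang, ident.name)
        record = b"".join(p.encode("utf-8") + UNIT_SEP for p in parts)
        records.append(record + RECORD_SEP)
    # Sort the per-identity strings byte-wise, join, and terminate.
    return b"".join(sorted(records)) + FILE_SEP
```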
The entity picks a set of hash functions it wishes to use. The set of hash functions MUST include at least one hash function which MUST be implemented according to Use of Cryptographic Hash Functions in XMPP (XEP-0300) [9] and SHOULD NOT include any hash functions which MUST NOT be supported according to XEP-0300.
Using the algorithm from the previous subsection, the entity calculates the input for the hash functions. It then runs the input through each hash function individually. The resulting tuples of hash algorithm and hash values constitute the Capability Hash Set.
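A minimal sketch of this step, using Python's hashlib, is shown below. The mapping from XEP-0300 algorithm names to hashlib names and the choice to represent each hash value as base64 text are assumptions of this sketch; the function name is hypothetical.

```python
import base64
import hashlib

# Assumed mapping from XEP-0300 algorithm names to hashlib constructor names.
HASHLIB_NAMES = {"sha-256": "sha256", "sha3-256": "sha3_256"}


def capability_hash_set(hash_input: bytes,
                        algos: tuple[str, ...] = ("sha-256", "sha3-256")) -> dict[str, str]:
    """Run the hash function input through each hash function individually."""
    result = {}
    for algo in algos:
        digest = hashlib.new(HASHLIB_NAMES[algo], hash_input).digest()
        # Store the value as base64 text (an assumption of this sketch).
        result[algo] = base64.b64encode(digest).decode("ascii")
    return result
```

The returned dictionary maps each algorithm name to its hash value and plays the role of the Capability Hash Set.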
The algorithm takes a Capability Hash Set as input and returns successfully if the hash matches and an error otherwise.
Pick a Capability Hash from the Capability Hash Set.
Query the Generating Entity for disco#info on the Capability Hash Node for the chosen hash as described above. If the entity returns an error, abort with an error.
Locally calculate the Capability Hash of the response, using the same hash function as the chosen Capability Hash and the hash function input algorithm. If the algorithm exits with an error, abort with an error.
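The verification steps can be sketched as follows. This is a hedged illustration: fetch_hash_input is a hypothetical callable that queries the Generating Entity for the chosen hash and runs the hash function input algorithm on the response, raising an exception on any error. The final comparison is inferred from the stated outcome (success if the hash matches, an error otherwise), and base64 text is assumed as the representation of hash values.

```python
import base64
import hashlib
from typing import Callable

# Assumed mapping from XEP-0300 algorithm names to hashlib constructor names.
HASHLIB_NAMES = {"sha-256": "sha256", "sha3-256": "sha3_256"}


class VerificationError(Exception):
    """Raised when a Capability Hash Set cannot be verified."""


def verify(hash_set: dict[str, str],
           fetch_hash_input: Callable[[str, str], bytes]) -> None:
    """Return normally if the chosen hash matches, raise VerificationError otherwise."""
    # Pick a Capability Hash whose hash function is locally supported.
    supported = [algo for algo in hash_set if algo in HASHLIB_NAMES]
    if not supported:
        raise VerificationError("no supported hash function in the set")
    algo = supported[0]
    expected = hash_set[algo]
    try:
        # Query the Generating Entity and build the hash function input.
        hash_input = fetch_hash_input(algo, expected)
    except Exception as exc:
        raise VerificationError("query or input algorithm failed") from exc
    digest = hashlib.new(HASHLIB_NAMES[algo], hash_input).digest()
    if base64.b64encode(digest).decode("ascii") != expected:
        raise VerificationError("hash mismatch")
```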
The two examples walk through the process of constructing a Capability Hash Set for SHA-256 and SHA3-256. The full algorithm for generating the hash function input is explained.
The data from the example was the first entry in the capsdb [15] hashes subdirectory which had no data forms at the time of writing. The features have been shuffled to show the sorting step in the algorithm.
The algorithm starts by constructing the Features String. For this, the values of the 'var' attributes of the feature nodes are encoded as UTF-8 and suffixed with 0x1f (ASCII Unit Separator). The first three of those features are shown as a hexdump below:
Note the appended 0x1f octet for each of the three strings. Now the strings are ordered using the i;octet collation and concatenated. The result is suffixed with 0x1c (ASCII File Separator), which gives the following hexdump of the final Features String:
For the Identities String, first the character data of the 'category', 'type', 'xml:lang' and 'name' attributes is encoded as UTF-8 and suffixed with 0x1f (ASCII Unit Separator). The resulting individual strings have the following hexdumps:
The strings are now joined together and the result is suffixed with 0x1e (ASCII Record Separator):
Normally, a sorting step would occur here. As the example only has a single string, the sorting and joining is a no-op. The string is now suffixed with 0x1c (ASCII File Separator) to get the Identities String:
The Extensions String is simply the 0x1c (ASCII File Separator) used to terminate it as no extensions are contained in the example. Thus, the final input for the hash function is, as hexdump:
Running this octet string through the hash functions yields the following Capability Hash Set:
The data from the example is the shortest entry from the capsdb [15] hashes subdirectory which had data forms and multiple identities at the time of writing. The features have been shuffled to show the sorting step in the algorithm.
We skip over the process for the Features String and only present the final result encoded as base64 for reference:
In the previous example, it was already shown how the individual parts of each <identity/> element are combined. We get the following octet strings as hexdumps:
The second string is ordered before the first string in the i;octet collation and afterwards the strings are joined and the result is suffixed with 0x1c (ASCII File Separator) to close the identities part of the input. The final Identities String is thus, as hexdump:
The example has a Service Discovery Extensions (XEP-0128) [12] form. For each field, a string consisting of the 'var' attribute's character data and the values is created as per the algorithm:
The strings need to be sorted using i;octet and joined together. The result is suffixed with 0x1d (ASCII Group Separator), which closes the form. As this is the only form, the resulting Extensions String is obtained by adding a 0x1c (ASCII File Separator) to close the extensions section of the hash input:
Note the "os" field is now before the other fields but after "FORM_TYPE", due to the sorting.
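The form processing walked through above can be sketched as follows. The separators closing a form (0x1d) and the extensions section (0x1c) are taken from the text. The exact per-field layout (the 'var' value and each sorted value suffixed with 0x1f, the field closed with 0x1e) is an assumption of this sketch, as is the representation of a form as a dictionary; the field values in the usage note are placeholders, not the data of the example.

```python
UNIT_SEP = b"\x1f"    # ASCII Unit Separator
RECORD_SEP = b"\x1e"  # ASCII Record Separator
GROUP_SEP = b"\x1d"   # ASCII Group Separator
FILE_SEP = b"\x1c"    # ASCII File Separator


def field_string(var: str, values: list[str]) -> bytes:
    """Build the string for one field (layout assumed, see lead-in)."""
    encoded_values = sorted(v.encode("utf-8") + UNIT_SEP for v in values)
    return var.encode("utf-8") + UNIT_SEP + b"".join(encoded_values) + RECORD_SEP


def extensions_string(forms: list[dict[str, list[str]]]) -> bytes:
    """Build the Extensions String from a list of forms (var -> values)."""
    form_strings = []
    for form in forms:
        fields = sorted(field_string(var, vals) for var, vals in form.items())
        # The Group Separator closes the form.
        form_strings.append(b"".join(fields) + GROUP_SEP)
    # The File Separator closes the extensions section.
    return b"".join(sorted(form_strings)) + FILE_SEP
```

With a form such as {"software": ["Example"], "os": ["Linux"], "FORM_TYPE": ["urn:xmpp:dataforms:softwareinfo"]}, the "FORM_TYPE" field sorts first because upper-case ASCII letters precede lower-case ones in i;octet, followed by "os" and then "software". With no forms at all, the result is the single octet 0x1c.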
The final hash function input is obtained by concatenating the Features String, Identities String and Extensions String:
Feeding the concatenated octet string as input to the hash functions yields the following Capability Hash Set:
If an entity supports Entity Capabilities 2.0, it MUST advertise the fact by returning a feature of "urn:xmpp:caps".
5.2 Advertisement of Support and Capabilities by Servers
A server MAY advertise its support for this protocol as well as the current hashes in the stream features.
When a connected client or peer server sends a service discovery information request to determine the entity capabilities of a server that advertises capabilities via the stream feature, the requesting entity MUST send the disco#info request to the server's JID as provided in the 'from' attribute of the response stream header. To enable this functionality, a server that advertises support for entity capabilities MUST provide a 'from' address in its response stream headers, in accordance with RFC 6120 [5].
5.5 Service Discovery Query for a Specific Hash Value
To query the Service Discovery (XEP-0030) [1] information for a specific Capability Hash value, an entity MUST query a Service Discovery node equal to the Capability Hash Node [16].
An entity is free to choose for which Capability Hash of a Capability Hash Set the request is sent.
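As an illustration, choosing a Capability Hash and deriving the node to query could be sketched like this. The node format used here ("urn:xmpp:caps#", the algorithm name, a ".", and the base64 hash value) is an assumption of this sketch, since this section does not restate the definition of the Capability Hash Node; the preference order and function names are hypothetical.

```python
# Hypothetical local preference order for choosing a hash from the set.
PREFERENCE = ("sha3-256", "sha-256")


def pick_hash(hash_set: dict[str, str]) -> tuple[str, str]:
    """Choose one Capability Hash from a Capability Hash Set."""
    for algo in PREFERENCE:
        if algo in hash_set:
            return algo, hash_set[algo]
    raise LookupError("no supported hash function in the set")


def capability_hash_node(algo: str, b64_value: str) -> str:
    """Derive the disco#info node for a hash (format assumed, see lead-in)."""
    return f"urn:xmpp:caps#{algo}.{b64_value}"
```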
A server MAY support pushing of Capability Hashes from clients before sending initial presence. This allows servers to discover capabilities of clients before those have sent initial presence, which may be useful or important for some protocols (such as Mediated Information eXchange (MIX) (XEP-0369) [2]). This feature is called Gratuitous Capabilities.
To advertise support, the server publishes the urn:xmpp:caps:gratuitous feature:
After determining server support, a client can send Capability Hashes via Gratuitous Capabilities before sending initial presence:
The server replies with an empty result on success.
The server MUST NOT broadcast the Capability Hashes submitted via Gratuitous Capabilities using presence.
Clients SHOULD NOT send Gratuitous Capabilities after they have sent initial presence; instead, they SHOULD re-send presence to update the Capability Hashes. Otherwise, entities subscribed to the presence will not receive the updated Capability Hashes.
Entities MUST respond to disco#info queries for all Capability Hash Nodes of at least the most recent 3 Capability Hash Sets emitted.
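One way to satisfy this rule is to retain the disco#info for a bounded number of recently emitted Capability Hash Sets. The following sketch uses a fixed-size deque; the class and method names are hypothetical, and the disco#info is treated as an opaque object.

```python
from collections import deque
from typing import Optional


class EmittedHashSets:
    """Remember the disco#info for the most recently emitted Capability Hash Sets."""

    def __init__(self, keep: int = 3):
        if keep < 3:
            raise ValueError("at least the 3 most recent sets must be kept")
        self._recent = deque(maxlen=keep)

    def emit(self, hash_set: dict[str, str], disco_info: object) -> None:
        """Record a newly emitted hash set; the oldest entry is dropped when full."""
        self._recent.append((dict(hash_set), disco_info))

    def lookup(self, algo: str, value: str) -> Optional[object]:
        """Return the disco#info for a hash, or None if it is no longer retained."""
        for hash_set, disco_info in reversed(self._recent):
            if hash_set.get(algo) == value:
                return disco_info
        return None
```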
Entities MUST broadcast the Capability Hash Set of the current disco#info it publishes in every non-directed "available" <presence/> they send and SHOULD do so for directed "available" <presence/>.
After initial presence has been sent, entities MUST re-broadcast the Capability Hash Set after their disco#info response changes, but MAY limit the rate at which presences are emitted solely for the purpose of sending new Capability Hash Sets.
Before initial presence has been sent and if the server supports Gratuitous Capabilities, entities SHOULD send Gratuitous Capabilities after their disco#info response changes, but MAY limit the rate at which Gratuitous Capabilities are sent. (For example, a client may load and enable additional functionality (thus changing its features) based on server support and only send Gratuitous Capabilities once all functionality has been set up, not after each individual feature.)
Entities MAY assume that another entity supports Entity Capabilities 2.0 after receiving a Capability Hash Set from that entity.
A Capability Hash MAY be stored alongside its disco#info in a Capability Hash Cache. A received Capability Hash which has not been verified MUST NOT be stored.
Instead of issuing a Service Discovery (XEP-0030) [1] disco#info <query/> with absent 'node' attribute to a target entity, an entity MAY use a Capability Hash Cache to obtain the response. To look up the disco#info response in the Capability Hash Cache, an entity MUST use a hash from the Capability Hash Set which was most recently received from the entity to which the <query/> would have been sent otherwise. If none of the most recently received Capability Hashes are found in the Capability Hash Cache, the entity MUST fall back to sending the request.
An entity MUST NOT use Capability Hashes which were not included in the most recent Capability Hash Set received from the target entity.
An entity MAY use external data sources to fill the Capability Hash Cache.
An entity MUST ensure that implicit values for xml:lang attributes are preserved when disco#info data is cached. This can, for example, be achieved by making the implicit values explicit in the storage.
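The caching rules above can be sketched as a small in-memory structure. This is an illustration under stated assumptions: the class and method names are hypothetical, the disco#info is treated as an opaque object that already carries explicit xml:lang values, and verification is expected to have happened before store_verified is called.

```python
from typing import Optional


class CapabilityHashCache:
    """Minimal in-memory Capability Hash Cache."""

    def __init__(self) -> None:
        self._disco = {}   # (algo, hash value) -> verified disco#info
        self._latest = {}  # JID -> most recently received Capability Hash Set

    def store_verified(self, algo: str, value: str, disco_info: object) -> None:
        """Store disco#info for a hash; call only after successful verification."""
        self._disco[(algo, value)] = disco_info

    def presence_received(self, jid: str, hash_set: dict[str, str]) -> None:
        """Replace the hash set known for a JID with the most recent one."""
        self._latest[jid] = dict(hash_set)

    def lookup(self, jid: str) -> Optional[object]:
        """Return cached disco#info for the JID, or None to signal a fallback query."""
        # Only hashes from the most recently received set may be used.
        for algo, value in self._latest.get(jid, {}).items():
            disco_info = self._disco.get((algo, value))
            if disco_info is not None:
                return disco_info
        return None
```

A return value of None means the caller falls back to sending the disco#info request.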
6.3 Additional Rules for Clients and Servers implementing Caps Optimizations
Servers MAY strip off the <c/> element if it has not changed since the previous presence broadcast.
Servers MUST ensure that the first presence notification sent to each subscriber contains the most recent <c/> element, if any were sent in the current presence session.
Servers MUST ensure that every change in the <c/> element is sent to all subscribers.
Clients MAY omit the <c/> element if it has not changed since the last presence iff they determined that their server supports Caps Optimization.
Servers MAY answer disco#info requests for Capability Hash Nodes on behalf of their own and others' clients if the disco#info response belonging to that Capability Hash is known to them.
Servers MAY implement Query Interception to further optimise bandwidth consumption. The idea is that servers intercept Service Discovery (XEP-0030) [1] disco#info queries sent to clients if they already know the answer from Capability Hashes published by the client. The rules for Query Interception are the following (to be applied in this order):
Servers MUST NOT intercept disco#info queries except those with empty node or a node which refers to a Capability Hash Node known to the server.
Servers MUST NOT intercept disco#info queries on behalf of the resource unless the query would be forwarded to the resource otherwise.
Servers MUST NOT intercept disco#info queries to resources which do not support Entity Capabilities 2.0 (clients not implementing Entity Capabilities 2.0 may legitimately use disco#info nodes matching the format of Capability Hash Nodes for different purposes).
Servers SHOULD intercept disco#info queries with empty node and answer them with the disco#info of the most recent Capability Hash Set published by the client.
Servers SHOULD intercept disco#info queries for a valid Capability Hash Node, if the server knows the disco#info for the Capability Hash Node. Otherwise, the query MUST be forwarded to the addressed resource. Note that it is valid for a server to reply for Capability Hash Nodes which have not been published by the resource.
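The Query Interception rules, applied in order, can be sketched as a single decision function. This is an illustration only: the Resource type, the parameter names, and the use of a dictionary mapping node strings to known disco#info are assumptions of this sketch. A return value of None means the query is not intercepted and is handled as it would be without Query Interception.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """Hypothetical server-side view of a connected client resource."""
    supports_caps2: bool
    # Capability Hash Nodes of the most recent hash set published by the client.
    latest_nodes: list[str] = field(default_factory=list)


def intercept(node: Optional[str], resource: Resource,
              would_be_forwarded: bool,
              known_disco: dict[str, object]) -> Optional[object]:
    """Return the disco#info to answer with, or None if not intercepting."""
    # Rule 1: only an empty node or a known Capability Hash Node qualifies.
    if node and node not in known_disco:
        return None
    # Rule 2: only intercept queries that would otherwise reach the resource.
    if not would_be_forwarded:
        return None
    # Rule 3: never intercept for resources without Entity Capabilities 2.0.
    if not resource.supports_caps2:
        return None
    # Rule 4: empty node, answer with the most recently published disco#info.
    if not node:
        for latest in resource.latest_nodes:
            if latest in known_disco:
                return known_disco[latest]
        return None
    # Rule 5: known Capability Hash Node, even if not published by this resource.
    return known_disco[node]
```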
It is RECOMMENDED that entities use the caching mechanisms outlined in the Caching Business Rules. Entities MAY share caches among connections and accounts.
Generating Entities are encouraged to also emit Entity Capabilities (XEP-0115) [3] <c/> elements in their presence updates (as specified in XEP-0115) for a reasonable transition period.
When receiving a Capability Hash Set along with XEP-0115 capabilities, a Processing Entity MAY obtain the disco#info <query/> for verification from an XEP-0115 based cache instead of querying the Generating Entity directly. A Processing Entity MUST NOT use disco#info data from an XEP-0115 cache without verification if an Entity Capabilities 2.0 <c/> element is available.
The codepoints used for separating the different parts in the Hash Function Input Algorithm (0x1c (ASCII File Separator) through 0x1f (ASCII Unit Separator)) are not allowed in well-formed XML 1.0 character data [17]. As entities are, per XMPP Core [5], required to close a stream if non-well-formed XML 1.0 data is received, these codepoints cannot occur in the input to the algorithm and their use as separators is safe.
If the algorithm for constructing the input to the hash function or the used hash function itself allow for cheap collisions, caching the hashes will become dangerous as it allows for cache poisoning. This in turn allows entities to effectively fake disco#info responses of other entities.
This was an issue with Entity Capabilities (XEP-0115) [3] and has been addressed with a new algorithm for generating the hash function input which keeps the structural information of the disco#info input.
An entity MUST NOT use disco#info obtained from a cache via a Capability Hash unless that disco#info has been verified to belong to that Capability Hash. Using cache contents from a trusted source (at the discretion of the entity) counts as verifying.
A malicious entity could send a large number of Capability Hash Sets in short intervals, while making sure that it provides matching disco#info responses. If a Processing Entity uses caching, this can overflow or thrash the caches. Processing Entities should be aware of this risk and apply proper rate-limiting for processing Capability Hash Sets. To reduce the attack surface, an entity MAY choose to not cache Capability Hashes obtained from entities not in its roster.
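One possible form of such rate-limiting is a per-sender sliding window, sketched below. The limits, the class name, and the injectable clock are assumptions chosen for illustration; the specification does not prescribe a particular mechanism.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class HashSetRateLimiter:
    """Allow at most max_events processed hash sets per sender per time window."""

    def __init__(self, max_events: int = 5, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._max_events = max_events
        self._window = window
        self._clock = clock
        self._events = defaultdict(deque)  # sender -> timestamps of processed sets

    def allow(self, sender: str) -> bool:
        """Return True if a hash set from this sender may be processed now."""
        now = self._clock()
        events = self._events[sender]
        # Drop timestamps that have left the sliding window.
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._max_events:
            return False
        events.append(now)
        return True
```

A Processing Entity would call allow() before verifying and caching a newly received Capability Hash Set, and skip or defer processing when it returns False.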
As mentioned earlier, when storing disco#info data in a cache for later retrieval, implementations MUST ensure that implicit values for xml:lang attributes are reconstructed correctly when the disco#info is restored.
Entities MAY choose to not send Capability Hash Sets with directed presence (for example to increase privacy). In that case, entities SHOULD also refuse direct Service Discovery (XEP-0030) [1] queries.
The server replies to certain disco#info queries on behalf of the client. This means that the client has no choice as to whom it replies. Otherwise, a client could choose to reply with <service-unavailable/> to mask its existence. We consider two effects of this:
A remote entity could attempt to detect that an entity exists behind a resource. For this, they send a disco#info query to the resource since nearly everyone implements disco#info. As the client responds with <service-unavailable/>, it looks as if no client was present at this resource.
With Query Interception, the server would reply on behalf of the client. However, the consensus in the community is that by measuring the difference between the reply from the server of the resource and the reply from the actual resource, it would generally be possible to detect the existence of a resource.
A remote entity can obtain the disco#info information of any resource which supports Entity Capabilities 2.0, provided that it knows the resource.
This cannot be mitigated with Query Interception. The risk is deemed acceptable considering that resources should generally be chosen randomly.
A common way to canonicalize XML which could be used is Canonical XML [18]. It was decided not to use Canonical XML for the following reasons:
Implementing it is quite some effort and not all XML libraries come with an implementation.
It is sensitive to the relative ordering of the elements. The relative ordering of children in disco#info <query/> elements, however, does not matter.
Several children of Service Discovery Extensions (XEP-0128) [12] data forms are deliberately ignored, like instructions and other descriptive text. The descriptive text is not relevant to the information being conveyed.
Thus, using Canonical XML would require additional, non-trivial software support and still require non-trivial additional canonicalization rules.
Thanks to the authors of Entity Capabilities (XEP-0115) [3] for coming up with the original idea of using presence broadcast to convey service discovery information, as well as the optimization strategies.
Permission is hereby granted, free of charge, to any person obtaining a copy of this specification (the "Specification"), to make use of the Specification without restriction, including without limitation the rights to implement the Specification in a software program, deploy the Specification in a network service, and copy, modify, merge, publish, translate, distribute, sublicense, or sell copies of the Specification, and to permit persons to whom the Specification is furnished to do so, subject to the condition that the foregoing copyright notice and this permission notice shall be included in all copies or substantial portions of the Specification. Unless separate permission is granted, modified works that are redistributed shall not contain misleading information regarding the authors, title, number, or publisher of the Specification, and shall not claim endorsement of the modified works by the authors, any organization or project to which the authors belong, or the XMPP Standards Foundation.
Disclaimer of Warranty
## NOTE WELL: This Specification is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. ##
Limitation of Liability
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall the XMPP Standards Foundation or any author of this Specification be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising from, out of, or in connection with the Specification or the implementation, deployment, or other use of the Specification (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if the XMPP Standards Foundation or such author has been advised of the possibility of such damages.
IPR Conformance
This XMPP Extension Protocol has been contributed in full conformance with the XSF's Intellectual Property Rights Policy (a copy of which can be found at <https://xmpp.org/about/xsf/ipr-policy> or obtained by writing to XMPP Standards Foundation, P.O. Box 787, Parker, CO 80134 USA).
Visual Presentation
The HTML representation (you are looking at) is maintained by the XSF. It is based on the YAML CSS Framework, which is licensed under the terms of the CC-BY-SA 2.0 license.
The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 6120) and XMPP IM (RFC 6121) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.
The following requirements keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".