This specification defines an XMPP protocol extension for providing language translation facilities over XMPP. It supports human, machine, client-based, and server-based translations.
NOTICE: The protocol defined herein is a Draft Standard of the XMPP Standards Foundation. Implementations are encouraged and the protocol is appropriate for deployment in production systems, but some changes to the protocol are possible before it becomes a Final Standard.
No standard currently exists for describing language translations over a text chat protocol. While numerous products and services translate text, there is no standardized protocol extension for requesting a translation and expressing the details of the translation over XMPP (see XMPP Core [1]). This document describes how to express a translation and its components in an XMPP message as well as a method to request translation.
Direct translation can be realized either by client-side translation before sending or by transparent components translating messages on the fly. Discovering XMPP entities capable of translation allows clients to request translation from them based on their capabilities. The remote XMPP entity could be either an automated translation service or a human providing translation.
This is the message text that was originally created by the sender. This is the text that is translated.
Translated Text
This is the message text that has been translated by the language translation engines. This is also called the destination text. For any given message there can be multiple destination text message bodies.
Pivot Language
Pivoting is the process of using one or more intermediate languages to translate from a given source language to a specific destination language. For example, if you needed to translate from English to Russian but only had translators that went from English to French and from French to Russian, you could use French as a pivot language.
Pivot Text
This is the translated text of the original message in a pivot language. For any given destination language, there can be zero or more pivot text bodies. The ordering of pivoting is required to be specified for the destination language.
Language Translation Engine
Because language translation engines differ in quality, some classes of users need to know which translation engine was used. It is equally important to be able to select a specific translation engine for a given language pairing if more than one engine is available.
Language Translation Character Set
Some language translation engines can only translate text between languages if certain character sets (or code pages) are used.
Language Translation Dictionary
To enhance accuracy, most translation engines support the concept of mission-specific dictionaries.
In the simplest case, a message is translated directly by the originating XMPP entity or by a transparent XMPP entity and is delivered to a remote entity with only the required elements of source and destination language. The source language is known because there is no <translation/> tag describing it. Three translation methods are distinguished as follows:
If no 'engine' attribute is present, then manual (or human) translation was performed.
If an 'engine' attribute is present, then machine (or automated) translation was performed, where the translation engine is identified by the value of the 'engine' attribute. If the 'engine' attribute is present but its value is an empty string, then the name of the translation engine was not available.
If the 'engine' attribute and the 'reviewed' attribute are present, then machine translation was performed but the message text was reviewed and possibly modified by a human.
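The three rules above can be sketched as a small classifier. This is an illustrative sketch only, not part of the protocol: the function names and the returned labels are hypothetical, and it assumes that a 'reviewed' value of "false" or "0" means the text was not reviewed.

```python
import xml.etree.ElementTree as ET


def translation_method(translation: ET.Element) -> str:
    """Classify a <translation/> element as human, machine, or machine-reviewed."""
    engine = translation.get("engine")
    if engine is None:
        # No 'engine' attribute: manual (human) translation.
        return "human"
    # 'reviewed' is an xs:boolean, so both "1" and "true" count as true.
    # Assumption: an absent or false 'reviewed' means no human review took place.
    reviewed = translation.get("reviewed", "false").strip() in ("1", "true")
    return "machine-reviewed" if reviewed else "machine"


def engine_name(translation: ET.Element):
    """Return the engine name, or None if it is absent or was not available."""
    # An empty 'engine' value means the engine name was not available.
    return translation.get("engine") or None
```

A client could use the returned label to drive the user interface indication of how a message was translated.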
A message translated by the originating XMPP entity or a transparent XMPP entity delivered to a remote entity with the pivot languages used to accomplish the translation. The source language is known because there is no <x/> translation tag describing it. When a translation is done via a pivot language, the pivot languages and their order of use MUST be specified.
A message translated by the originating XMPP entity or a transparent XMPP entity delivered to a remote entity using pivot languages and machine translation. The source language is known because there is no <x/> translation tag describing it.
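A receiving client may want to recover the order in which pivot languages were used. The following sketch assumes, for illustration, that each translation step is recorded as its own <translation/> element carrying 'source_lang' and 'destination_lang' attributes inside an <x/> element qualified by the urn:xmpp:langtrans namespace; the function name pivot_chain is hypothetical.

```python
import xml.etree.ElementTree as ET

LANGTRANS = "urn:xmpp:langtrans"  # assumed namespace of the <x/> wrapper


def pivot_chain(x: ET.Element, destination_lang: str) -> list:
    """Return the languages from the original to the destination, in order of use."""
    # Map each translated language to the language it was derived from.
    derived_from = {
        t.get("destination_lang"): t.get("source_lang")
        for t in x.findall(f"{{{LANGTRANS}}}translation")
    }
    chain = [destination_lang]
    # Walk backwards until reaching a language that has no <translation/>
    # describing it, which is the original language. The length guard
    # stops the walk if malformed input contains a cycle.
    while chain[-1] in derived_from and len(chain) <= len(derived_from):
        chain.append(derived_from[chain[-1]])
    chain.reverse()
    return chain
```

For a message translated from English to Russian via French, the languages between the first and last entries of the returned list are the pivots.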
4.2.1 Discovering Translation Providers On a Server
When connected to a server, an XMPP entity can locate translation providers by asking the server which translation providers are attached to it; this MUST be done using Service Discovery (XEP-0030) [2]. The server SHOULD return the availability of translation providers and the language pairings that the user has rights to use.
Service Discovery is used to determine if a JID provides translation services. The JID can also be a bot (e.g., <towerofbabel@shakespeare.lit>) or a server component (e.g., <translation.shakespeare.lit>).
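As a rough sketch of this check, a client can send a disco#info query to the JID and look for a translation feature in the result. The disco#info namespace is defined by XEP-0030; the assumption here is that the advertised feature is urn:xmpp:langtrans, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

DISCO_INFO = "http://jabber.org/protocol/disco#info"
LANGTRANS = "urn:xmpp:langtrans"  # assumed feature name for this protocol


def build_disco_info_query(to_jid: str, stanza_id: str) -> ET.Element:
    """Build an IQ-get that asks a JID which features it supports."""
    iq = ET.Element("iq", {"type": "get", "to": to_jid, "id": stanza_id})
    ET.SubElement(iq, f"{{{DISCO_INFO}}}query")
    return iq


def supports_translation(iq_result: ET.Element) -> bool:
    """Return True if a disco#info result advertises the translation feature."""
    query = iq_result.find(f"{{{DISCO_INFO}}}query")
    if query is None:
        return False
    features = query.findall(f"{{{DISCO_INFO}}}feature")
    return any(f.get("var") == LANGTRANS for f in features)
```

The same check applies whether the JID is a bot or a server component, since both answer disco#info in the same way.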
The supported languages and other details of the service must be known before it can be used. It is permissible for a translation service to provide multiple translation engines for the same language pairing; if this is done, then a separate <item/> tag MUST be used for each pairing. A 'dictionary' attribute MAY be used to specify the dictionary for a specific <item/>. To specify more than one dictionary for a given language pairing, a separate <item/> tag MUST be used for each dictionary specification for that language pairing.
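Because one language pairing can appear in several <item/> elements (one per engine or dictionary), a client will typically group the items by pairing. The sketch below assumes the items arrive in a <query/> element qualified by urn:xmpp:langtrans:items and that each <item/> carries 'source_lang', 'destination_lang', 'engine', 'dictionary', and 'pivotable' attributes; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ITEMS_NS = "urn:xmpp:langtrans:items"


def engines_by_pairing(iq_result: ET.Element) -> dict:
    """Group <item/> entries by their (source_lang, destination_lang) pairing."""
    table = defaultdict(list)
    query = iq_result.find(f"{{{ITEMS_NS}}}query")
    if query is None:
        return {}
    for item in query.findall(f"{{{ITEMS_NS}}}item"):
        pairing = (item.get("source_lang"), item.get("destination_lang"))
        table[pairing].append({
            "engine": item.get("engine"),          # None if not advertised
            "dictionary": item.get("dictionary"),  # None if not advertised
            # xs:boolean: "1" and "true" both mean true.
            "pivotable": item.get("pivotable", "false").strip() in ("1", "true"),
        })
    return dict(table)
```

Each list in the result then holds every engine and dictionary combination offered for that pairing, in the order the service returned them.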
To request service from a translation provider you can send a message to a provider requesting translations. The lack of a 'source_lang' attribute in the <translation/> element indicates a request for a translation.
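The distinguishing rule (no 'source_lang' attribute means "please translate") can be sketched as follows. Only the translation payload is built here; the enclosing stanza and the carrier of the text to be translated are left to the caller. The urn:xmpp:langtrans namespace on <x/> is an assumption, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

LANGTRANS = "urn:xmpp:langtrans"  # assumed namespace of the <x/> wrapper


def build_request_payload(destination_lang: str) -> ET.Element:
    """Build an <x/> payload requesting translation into one language."""
    x = ET.Element(f"{{{LANGTRANS}}}x")
    # Deliberately no 'source_lang': its absence marks this as a request.
    ET.SubElement(
        x, f"{{{LANGTRANS}}}translation", {"destination_lang": destination_lang}
    )
    return x


def is_translation_request(x: ET.Element) -> bool:
    """Return True if every <translation/> in the payload lacks 'source_lang'."""
    translations = x.findall(f"{{{LANGTRANS}}}translation")
    return bool(translations) and all(
        t.get("source_lang") is None for t in translations
    )
```

A provider can apply is_translation_request to incoming payloads to tell requests apart from messages that merely describe a translation already performed.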
4.3.2 Requesting a Translation With Multiple Destination Languages
4.3.3 Requesting a Translation With a Specific Dictionary
If a specific dictionary is required, the requesting entity MAY request it by name. The dictionary name SHOULD be one that was returned during service discovery, although a dictionary that was not returned MAY also be requested. Dictionary names are specific to the translation engine and are free-form text.
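A request for several destination languages, optionally naming a dictionary for some of them, might be assembled as in the sketch below. It assumes that one <translation/> element is sent per destination language and that the dictionary is named with a 'dictionary' attribute on that element, mirroring the <item/> attribute; the function name and the dictionary value used in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET

LANGTRANS = "urn:xmpp:langtrans"  # assumed namespace of the <x/> wrapper


def build_multi_request_payload(targets) -> ET.Element:
    """Build an <x/> payload from (destination_lang, dictionary_or_None) pairs."""
    x = ET.Element(f"{{{LANGTRANS}}}x")
    for destination_lang, dictionary in targets:
        attrs = {"destination_lang": destination_lang}
        if dictionary is not None:
            # Free-form, engine-specific dictionary name.
            attrs["dictionary"] = dictionary
        ET.SubElement(x, f"{{{LANGTRANS}}}translation", attrs)
    return x
```

For example, build_multi_request_payload([("it", None), ("de", "medical")]) yields one plain request for Italian and one request for German that names a dictionary.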
If the translation service cannot complete the translation, it SHOULD return an <item-not-found/> error indicating that some part of the translation request was problematic, unless doing so would violate the privacy and security considerations in XMPP Core and XMPP IM, or local security and privacy policies.
If privacy or security considerations make returning an <item-not-found/> error infeasible, the service SHOULD return a <service-unavailable/> error.
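The choice between the two error conditions can be sketched as a single policy flag. The stanza error namespace and the "cancel" error type come from RFC 6120; the function name and the may_disclose parameter are hypothetical, and how a service decides that flag is outside the scope of this sketch.

```python
import xml.etree.ElementTree as ET

STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"


def build_error_reply(request_iq: ET.Element, may_disclose: bool) -> ET.Element:
    """Build an IQ-error reply for a translation request that cannot be completed."""
    # Reveal that part of the request was problematic only if policy allows it.
    condition = "item-not-found" if may_disclose else "service-unavailable"
    reply = ET.Element("iq", {"type": "error", "id": request_iq.get("id", "")})
    # Swap the addresses so the error travels back to the requester.
    if request_iq.get("from") is not None:
        reply.set("to", request_iq.get("from"))
    if request_iq.get("to") is not None:
        reply.set("from", request_iq.get("to"))
    error = ET.SubElement(reply, "error", {"type": "cancel"})
    ET.SubElement(error, f"{{{STANZAS_NS}}}{condition}")
    return reply
```

Keeping the decision in one place makes it easier to audit that no detail about language pairings or engines leaks when disclosure is not permitted.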
In order to reduce user confusion and misunderstanding of a translated message body, it is RECOMMENDED that implementations of langtran implement the following user interface features.
Translated messages should be clearly identified as being a translation.
The display of a translated message should clearly show how the message was translated (automated, manual, or automated with human review).
The display of a message should clearly show if the translation is the destination, original or pivot language.
If pivoting is used, the destination message text should be marked to indicate that it was translated via one or more pivot languages, what those languages are, and in what order they were used; the actual pivot language text should also be accessible to the user.
It is recommended that only one level of pivoting be used as quality of the destination translation degrades significantly after each pivot.
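One way to honor the single-pivot recommendation is to search the available language pairings for the shortest route and refuse routes that need too many pivots. The following is a minimal breadth-first sketch under that assumption; the function name, the max_pivots parameter, and the representation of pairings as (source, destination) tuples are all hypothetical.

```python
from collections import deque


def find_route(pairings, source, destination, max_pivots=1):
    """Return the shortest language route, or None if it needs too many pivots."""
    graph = {}
    for src, dst in pairings:
        graph.setdefault(src, []).append(dst)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        # A route with N pivots has N + 1 hops; do not extend beyond that.
        if len(path) - 1 > max_pivots:
            continue
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Because the search is breadth-first, a direct pairing is always preferred over a pivoted route when both exist.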
Note: The 'reviewed' and 'pivotable' attributes are of type "boolean" and MUST be handled accordingly. [3]
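Handling these attributes "accordingly" means accepting both lexical styles of xs:boolean. A minimal parser might look like the sketch below; the function name is hypothetical, and rejecting other spellings such as "TRUE" follows the XML Schema definition rather than anything stated in this specification.

```python
def parse_xs_boolean(value: str) -> bool:
    """Parse an xs:boolean lexical value ("0", "1", "false", "true")."""
    # xs:boolean collapses surrounding whitespace before validation.
    text = value.strip()
    if text in ("1", "true"):
        return True
    if text in ("0", "false"):
        return False
    raise ValueError(f"not a valid xs:boolean: {value!r}")
```

Comparing the attribute value against the literal string "true" alone would silently misread a 'reviewed' or 'pivotable' value of "1".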
In order to properly process multi-language messages, clients MUST implement support for multiple message bodies differentiated by the 'xml:lang' attribute as described in RFC 6120.
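As an illustration of this requirement, a client can index the bodies of a message by language, letting a body without its own 'xml:lang' inherit the value from the enclosing <message/>. The sketch is hypothetical in its function name and matches <body/> by local name so that it works whether or not the stanza carries a default namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def bodies_by_lang(message: ET.Element) -> dict:
    """Map each language tag to the text of the matching <body/> element."""
    default_lang = message.get(XML_LANG, "")
    result = {}
    for child in message:
        # Compare the local name only, ignoring any namespace prefix.
        if child.tag.rsplit("}", 1)[-1] == "body":
            result[child.get(XML_LANG, default_lang)] = child.text or ""
    return result
```

A client can then show the body matching the user's preferred language and keep the others available as original or pivot text.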
Attacks may be easier against services that implement translation because of the potential disclosure of information regarding the language pairings, engines, and dictionaries used; however, no specific vulnerabilities are introduced.
This possible weakness can be mitigated by not returning specifics to requesting entities; the responding entity MAY also perform authorization checks in order to determine how to respond.
Note: Before version 1.1 of this specification, the name of the items namespace was urn:xmpp:langtrans#items. Because the '#' character is not recommended in URN syntax (see RFC 2141 [6]), the name was changed to urn:xmpp:langtrans:items.
Permission is hereby granted, free of charge, to any person obtaining a copy of this specification (the "Specification"), to make use of the Specification without restriction, including without limitation the rights to implement the Specification in a software program, deploy the Specification in a network service, and copy, modify, merge, publish, translate, distribute, sublicense, or sell copies of the Specification, and to permit persons to whom the Specification is furnished to do so, subject to the condition that the foregoing copyright notice and this permission notice shall be included in all copies or substantial portions of the Specification. Unless separate permission is granted, modified works that are redistributed shall not contain misleading information regarding the authors, title, number, or publisher of the Specification, and shall not claim endorsement of the modified works by the authors, any organization or project to which the authors belong, or the XMPP Standards Foundation.
Disclaimer of Warranty
## NOTE WELL: This Specification is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. ##
Limitation of Liability
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall the XMPP Standards Foundation or any author of this Specification be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising from, out of, or in connection with the Specification or the implementation, deployment, or other use of the Specification (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if the XMPP Standards Foundation or such author has been advised of the possibility of such damages.
IPR Conformance
This XMPP Extension Protocol has been contributed in full conformance with the XSF's Intellectual Property Rights Policy (a copy of which can be found at <https://xmpp.org/about/xsf/ipr-policy> or obtained by writing to XMPP Standards Foundation, P.O. Box 787, Parker, CO 80134 USA).
The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 6120) and XMPP IM (RFC 6121) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.
The following requirements keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".
3. In accordance with Section 3.2.2.1 of XML Schema Part 2: Datatypes, the allowable lexical representations for the xs:boolean datatype are the strings "0" and "false" for the concept 'false' and the strings "1" and "true" for the concept 'true'; implementations MUST support both styles of lexical representation.
4. The Internet Assigned Numbers Authority (IANA) is the central coordinator for the assignment of unique parameter values for Internet protocols, such as port numbers and URI schemes. For further information, see <http://www.iana.org/>.
5. The XMPP Registrar maintains a list of reserved protocol namespaces as well as registries of parameters used in the context of XMPP extension protocols approved by the XMPP Standards Foundation. For further information, see <https://xmpp.org/registrar/>.
With author approval, the XMPP Registrar changed the items namespace from urn:xmpp:langtrans#items to urn:xmpp:langtrans:items because # is not recommended in URN syntax.
Modified semantics to use IQ stanzas for communication with servers; changed dst_lang to destination_lang and src_lang to source_lang; changed destination to destination_lang and derived_from to source_lang.