Static HTML

There are cases where you cannot use a framework to localize your content, and all you have is an HTML document. This guide explains how to use the @transifex/dom JavaScript library to:

  • Segment an HTML document into phrases for localization
  • Send them over to a Transifex Native project for translation
  • Translate the HTML page into a target language

@transifex/dom works both in the browser and in a Node.js environment, where it needs a DOM library such as jsdom, happy-dom, or linkedom.

jsdom is the most compatible server-side HTML DOM library, but also the slowest. Feel free to explore the alternatives if performance is a concern.

Make sure you have all the required libraries installed:

$ npm i @transifex/native @transifex/dom jsdom --save

Note: Use version 3.2.0 or later of the Transifex packages.

Send phrases for localization

Given an HTML document, the first step is to extract its phrases and push them to Transifex for localization.
The following Node.js code shows how:

const { TxNativeDOM } = require('@transifex/dom');
const { createNativeInstance } = require('@transifex/native');
const { JSDOM } = require('jsdom');

async function push() {
  // Use an isolated Transifex Native instance or use the global "tx" one
  const tx = createNativeInstance({
    token: 'YOUR-TX-NATIVE-PUBLIC-TOKEN',
    secret: 'YOUR-TX-NATIVE-SECRET',
  });

  // Create a TxNativeDOM instance
  const txdom = new TxNativeDOM();

  // Grab your HTML code from a url, file or an API request
  const html = '<html><body><p>Hello world</p></body></html>';
 
  // Create a jsdom instance
  const jsdom = new JSDOM(html);

  // Attach the document node to the TxNativeDOM instance
  txdom.attachDOM(jsdom.window.document);

  // Push source content to Transifex for localization
  await tx.pushSource(txdom.getStringsJSON());
}

push();
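The comment in push() mentions grabbing the HTML from a url, file or API request. As a minimal sketch of the file case, the hypothetical helper below (loadHtml is not part of @transifex/dom) reads the document from disk using only Node's built-in fs module:

```javascript
const fs = require('fs');

// Hypothetical helper (not part of any Transifex package):
// read the source HTML from disk as a UTF-8 string.
function loadHtml(filePath) {
  const html = fs.readFileSync(filePath, 'utf8');
  // Fail early instead of pushing an empty document for localization.
  if (html.trim() === '') {
    throw new Error(`No HTML content found in ${filePath}`);
  }
  return html;
}

module.exports = { loadHtml };
```

Inside push(), the hard-coded string could then be replaced with something like const html = loadHtml('index.html'); where the file name is only an example.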

Get translated HTML

Once the phrases are translated in Transifex, we can reassemble the translated HTML file.

const { TxNativeDOM } = require('@transifex/dom');
const { createNativeInstance } = require('@transifex/native');
const { JSDOM } = require('jsdom');

async function pull() {
  // Use an isolated Transifex Native instance or use the global "tx" one
  const tx = createNativeInstance({
    token: 'YOUR-TX-NATIVE-PUBLIC-TOKEN',
    secret: 'YOUR-TX-NATIVE-SECRET',
  });

  // Create a TxNativeDOM instance
  const txdom = new TxNativeDOM();

  // Grab your HTML code from a url, file or an API request
  const html = '<html><body><p>Hello world</p></body></html>';
 
  // Create a jsdom instance
  const jsdom = new JSDOM(html);

  // Attach the document node to the TxNativeDOM instance
  txdom.attachDOM(jsdom.window.document);

  // Set instance target language. This will fetch translations locally.
  await tx.setCurrentLocale('fr');

  // Translate HTML document
  txdom.toLanguage(tx.getCurrentLocale(), (key) => {
    return tx.cache.get(key, tx.getCurrentLocale());
  });

  // Get the HTML code and do something with it
  const translatedHtml = jsdom.serialize();

  // Clean-up
  jsdom.window.close();
}

pull();
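The pull() example leaves it open what to do with translatedHtml. One possible approach, sketched here with hypothetical helper names (localizedPath and saveTranslation are illustrations, not Transifex APIs), is to write one file per locale next to the source file:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: derive "index.fr.html" from "index.html" and "fr".
function localizedPath(sourcePath, locale) {
  const { dir, name, ext } = path.parse(sourcePath);
  return path.join(dir, `${name}.${locale}${ext}`);
}

// Hypothetical helper: write the translated markup next to the source file
// and return the path that was written.
function saveTranslation(sourcePath, locale, translatedHtml) {
  const target = localizedPath(sourcePath, locale);
  fs.writeFileSync(target, translatedHtml, 'utf8');
  return target;
}

module.exports = { localizedPath, saveTranslation };
```

In pull(), this could be called as saveTranslation('index.html', 'fr', translatedHtml) right after jsdom.serialize(). The naming scheme is an assumption; adapt it to however your site serves localized pages.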

Working with HTML fragments

Sometimes you need to translate an HTML fragment rather than a whole document. One way to do this is to load the fragment into the body of an empty document:

// Assumes JSDOM and txdom have been created as in the examples above
const htmlFragment = '<p>Hello world</p>';

const jsdom = new JSDOM('<html><body></body></html>');
jsdom.window.document.body.innerHTML = htmlFragment;

// Attach the document node to the TxNativeDOM instance
txdom.attachDOM(jsdom.window.document);

// ... push or pull content

// Get back the translated fragment
const translatedFragment = jsdom.window.document.body.innerHTML;

How string segmentation works

This is how the HTML content is segmented:

Block HTML tags

Examples of block tags are: DIV, P, H1, TABLE, UL, OL, etc.

When the content of a block tag is a combination of plain text and inline elements such as SPAN, all the content is considered a single segment.

HTML:
  <div>
    <p>This is a paragraph</p>
    <p>This is a paragraph with <span>inline element</span></p>
  </div>

Segments:
  "This is a paragraph"
  "This is a paragraph with <span>inline element</span>"

Plain text

When a block tag contains only tags and no plain text of its own, only the plain text inside those tags is extracted, one segment per tag.

HTML:
  <div>
    <p>
      <span>My span text</span>
      <span>Another span text</span>
    </p>
  </div>

Segments:
  "My span text"
  "Another span text"

CSS data binding in Angular or React

CSS classes may also be used for data binding in the Angular or React framework, and the DOM is inspected to recognize content managed this way. In such cases the text inside the inline element is controlled directly by the framework rather than being modified through HTML (as is the case with jQuery, for example). Because of this, when a block tag combines plain text with inline elements such as SPAN that carry a data-binding CSS class from Angular or React, the text of the inline element is evaluated separately, which results in multiple segments.

HTML:
<div>
  <p>This is a paragraph</p>
  <p>This is a paragraph with <span class="AngularReact">an inline element</span></p>
</div>

Segments:
"This is a paragraph"
"This is a paragraph with"
"an inline element"

Page title

HTML:
  <title>My title</title>

Segments:
  "My title"

Anchor titles

HTML:
  <a title="My title">..</a>

Segments:
  "My title"

Image titles and alt text

HTML:
  <img title="My title" alt="My alt text"/>

Segments:
  "My title"
  "My alt text"

Input values and placeholders

Input values are only detected for inputs of type button, reset, or submit.

Textarea placeholders

HTML:
  <textarea placeholder="My placeholder text"></textarea>

Segments:
  "My placeholder text"

Meta keywords and descriptions

HTML:
  <meta name="keywords" content="tag1, tag2, tag3">
  <meta name="description" content="My page description">
  <meta name="title" content="My page title">
  <meta property="og:title" content="Localization Platform for Translating Digital Content | Transifex">
  <meta property="og:description" content="Integrate with Transifex to manage the creation of multilingual websites and app content. Order translations, see translation progress, and tools like TM.">

Segments:
  "tag1, tag2, tag3"
  "My page description"
  "My page title"
  "Localization Platform for Translating Digital Content | Transifex"
  "Integrate with Transifex to manage the creation of multilingual websites and app content. Order translations, see translation progress, and tools like TM."

Input elements of "image" type

HTML:
  <input type="image" alt="Submit">

Segments:
  "Submit"

SVG elements

SVG tags may contain nested TEXT tags. These are parsed and their strings are extracted, but the elements are not marked in the UI. When you mouse over them, however, the options for the strings are shown (ignore string, follow link, etc.).

Ignored elements

The following elements are ignored: script, style, link, iframe, noscript, canvas, audio, video, code.

Social widgets such as Facebook and Twitter that have tags with class names facebook_container and twitter_container are also ignored.

How to handle non-translatable content

You can manually define a block or node as non-translatable by adding a notranslate class.

For example:

<div class="notranslate">This content will not be translated</div>

Marking attributes for translation

Apart from the attributes that are automatically detected for translations, you can define custom attributes for translation using the tx-attrs="attr1, attr2,..." attribute.

Before:

HTML:
  <span title="My title" data-content="My data">

Segments: Nothing detected

After:

HTML:
  <span title="My title" data-content="My data"
        tx-attrs="title, data-content">

Segments:
  "My title"
  "My data"

How to tag strings in the source language

You can automatically tag source strings by using the tx-tags="tag1, tag2,..." attribute.

These tags propagate to child elements as well.

For example:

<div tx-tags="marketing">...</div>

How to handle inline block variables

To define variables or placeholders within a block that shouldn't be translated, add class="notranslate" to the variable nodes or wrap them in var tags.

For example:

HTML:
  Hi, you are visitor <span class="notranslate">142</span>
  Hi, you are visitor <var>341</var>

Segments:
  "Hi, you are visitor {{0}}"
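To make the role of the {{0}} placeholder concrete, the sketch below shows how a translated segment and the original variable value could be recombined. It is an illustration only: fillPlaceholders is a hypothetical helper and the French string is an example translation, and neither describes how @transifex/dom restores values internally.

```javascript
// Illustrative only: replace {{0}}, {{1}}, ... with the matching entry of `values`.
function fillPlaceholders(segment, values) {
  return segment.replace(/\{\{(\d+)\}\}/g, (match, index) => {
    const value = values[Number(index)];
    // Leave unknown placeholders untouched instead of printing "undefined".
    return value === undefined ? match : String(value);
  });
}

// The translator only ever sees the segment with the placeholder...
const translated = 'Bonjour, vous êtes le visiteur {{0}}';

// ...and the untranslated value is put back afterwards.
console.log(fillPlaceholders(translated, ['142']));
// prints "Bonjour, vous êtes le visiteur 142"

module.exports = { fillPlaceholders };
```

Because the number never reaches the translator, both HTML lines above map to the same segment and share a single translation.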

How to handle URLs as translatable content

When images <img> or links <a> appear within a segment, their URLs are handled by default as non-translatable content (i.e., variables).

Translating images

To translate an image you should treat its URL as translatable text. To do so, use the special directive tx-content="translate_urls" to enable this functionality for a node and its children.

Before:

HTML:
  <div>
    <img src="/uploads/smiley.jpg" alt="Smiley face" width="42" height="42">
  </div>

Segments:
  "<img src="{{0}}" alt="Smiley face" width="42" height="42">"

After:

HTML:
  <div tx-content="translate_urls">
    <img src="/uploads/smiley.jpg" alt="Smiley face" width="42" height="42">
  </div>

Segments:
  "<img src="/uploads/smiley.jpg" alt="Smiley face" width="42" height="42">"

Translating links

To translate a link you should treat each URL as translatable text. To do so, use the special directive tx-content="translate_urls" to enable this functionality for a node and its children.

Before:

HTML:
  <div>
    Click to go to the <a href="/features">features</a> page
  </div>

Segments:
  "Click to go to the <a href="{{0}}">features</a> page"

After:

HTML:
  <div tx-content="translate_urls">
    Click to go to the <a href="/features">features</a> page
  </div>

Segments:
  "Click to go to the <a href="/features">features</a> page"

Tip: To treat ALL URLs within a page as translatable content, add tx-content="translate_urls" to the opening BODY tag.

How to define custom variables

If your content uses custom patterns that should be treated as variables rather than as translatable text, you can add custom rules for how variables are detected within a string segment.

For example:

const { TxNativeDOM } = require('@transifex/dom');

const txdom = new TxNativeDOM({
  variablesParser: (text, fn) => {
    // Example of replacing the value of an s-href attribute with a variable.
    // Input: Hello <a s-href="doc:example">Click here</a>
    // Output: Hello <a s-href="{{0}}">Click here</a>
    // We use a regular expression to match the attribute
    return text.replace(/s-href="([^"]*)"/g, (match, value) => {
      // match contains the whole attribute: s-href="doc:example"
      // value contains the captured group: doc:example
      // fn registers "doc:example" as a variable and returns
      // a variable expression to replace it: {{0}}
      return `s-href="${fn(value)}"`;
    });
  },
});
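To try the parser logic without a TxNativeDOM instance, the sketch below runs a variant of the same replacement as a standalone function. The createRegistry helper is a hypothetical stand-in for the callback that the library passes as fn; its behavior (store the value, return {{0}}, {{1}}, ...) is an assumption based on the comments in the example above, not the library's actual implementation.

```javascript
// The same replacement rule as in the example, as a plain function.
function variablesParser(text, fn) {
  return text.replace(/s-href="([^"]*)"/g, (match, value) => `s-href="${fn(value)}"`);
}

// Hypothetical stand-in for the "fn" callback: it stores each value
// and returns a numbered variable expression such as {{0}}.
function createRegistry() {
  const variables = [];
  const register = (value) => {
    variables.push(value);
    return `{{${variables.length - 1}}}`;
  };
  return { variables, register };
}

const { variables, register } = createRegistry();
const output = variablesParser(
  'Hello <a s-href="doc:example">Click here</a>',
  register
);

console.log(output);    // Hello <a s-href="{{0}}">Click here</a>
console.log(variables); // [ 'doc:example' ]

module.exports = { variablesParser, createRegistry };
```

Testing the rule in isolation like this makes it easier to check the regular expression against your own markup before wiring it into the TxNativeDOM constructor.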

How to fine tune translatable content

For even finer control over how strings are detected, use the tx-content HTML attribute, which can contain the following values:

  • exclude to mark a node and its children to be excluded from string detection
  • include to mark a node and its children within an exclude block to be included in string detection
  • block to mark a node and its children to be detected as a single string
  • notranslate_urls to mark a node and its children to handle URLs as variables (default)
  • translate_urls to mark a node and its children to handle URLs as translatable content

Include/exclude example.

Before:

HTML:
  <div>
    <p>First text</p>
    <p>Second text</p>
    <p>Third text</p>
  </div>

Segments:
  "First text"
  "Second text"
  "Third text"

After:

HTML:
  <div tx-content="exclude">
    <p>First text</p>
    <p tx-content="include">Second text</p>
    <p>Third text</p>
  </div>

Segments:
  "Second text"

Block example.

Before:

HTML:
  <div>
    <h1>A header</h1>
    <p>A paragraph</p>
  </div>

Segments:
  "A header"
  "A paragraph"

After:

HTML:
  <div tx-content="block">
    <h1>A header</h1>
    <p>A paragraph</p>
  </div>

Segments:
  "<h1>A header</h1><p>A paragraph</p>"

Note: Strings that match the following regular expression are ignored:
^( |\s|\d|[-\/:-?~@#!"^_`\.,\[\]])*$
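To check which strings such a pattern skips, here is a small sketch that applies it to a few samples. The pattern is written with the bracket characters escaped inside the character class, which is an assumption about the intended expression; isIgnored is a hypothetical helper, not a @transifex/dom API.

```javascript
// Assumed form of the ignore pattern: whitespace, digits and a fixed set
// of punctuation characters only, with "[" and "]" escaped in the class.
const IGNORE_PATTERN = /^( |\s|\d|[-\/:-?~@#!"^_`.,[\]])*$/;

// Hypothetical helper: true when a string would be skipped.
function isIgnored(text) {
  return IGNORE_PATTERN.test(text);
}

console.log(isIgnored('123'));           // true: digits only
console.log(isIgnored('12:30 - 14:00')); // true: digits, spaces and punctuation
console.log(isIgnored('Hello world'));   // false: contains letters
console.log(isIgnored('$5'));            // false: "$" is not in the character set

module.exports = { IGNORE_PATTERN, isIgnored };
```

In other words, segments made up purely of numbers, whitespace and the listed punctuation are skipped, since they contain nothing to translate.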