Static HTML
There are cases where you cannot use a framework to localize your content, and all you have is an HTML document. This guide explains how you can use the @transifex/dom JavaScript library to:
- Segment an HTML document into phrases for localization
- Send them over to a Transifex Native project for translation
- Translate the HTML page into a target language
@transifex/dom works both in the browser and in a Node.js environment, where it relies on a DOM library such as jsdom, happy-dom, or linkedom.
jsdom is the most compatible server-side HTML DOM library, but also the slowest. Feel free to explore the alternatives if performance is a concern.
Make sure you have all the required libraries installed:
$ npm i @transifex/native @transifex/dom jsdom --save
Note: Use Transifex versions 3.2.0 or greater.
Send phrases for localization
Given an HTML document, we first prepare it for localization.
The following Node.js code shows how:
const { TxNativeDOM } = require('@transifex/dom');
const { createNativeInstance } = require('@transifex/native');
const { JSDOM } = require('jsdom');
async function push() {
  // Use an isolated Transifex Native instance or use the global "tx" one
  const tx = createNativeInstance({
    token: 'YOUR-TX-NATIVE-PUBLIC-TOKEN',
    secret: 'YOUR-TX-NATIVE-SECRET',
  });
  // Create a TxNativeDOM instance
  const txdom = new TxNativeDOM();
  // Grab your HTML code from a url, file or an API request
  const html = '<html><body><p>Hello world</p></body></html>';
  // Create a jsdom instance
  const jsdom = new JSDOM(html);
  // Attach the document node to the TxNativeDOM instance
  txdom.attachDOM(jsdom.window.document);
  // Push source content to Transifex for localization
  await tx.pushSource(txdom.getStringsJSON());
}
push();
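The comment above mentions grabbing the HTML from a file. For a static site made of several files, a minimal sketch of that step could look like the following. The helper names listHtmlFiles and readHtml are hypothetical and not part of @transifex/dom; they only use the Node.js standard library.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: recursively collect the .html files under a directory.
function listHtmlFiles(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...listHtmlFiles(fullPath));
    } else if (entry.isFile() && entry.name.endsWith('.html')) {
      found.push(fullPath);
    }
  }
  return found.sort();
}

// Hypothetical helper: read one HTML file as a UTF-8 string.
function readHtml(filePath) {
  return fs.readFileSync(filePath, 'utf8');
}
```

Each string returned by readHtml can take the place of the hard-coded html constant in the push example.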
Get translated HTML
With the phrases translated in Transifex, we can now reassemble the translated HTML file.
const { TxNativeDOM } = require('@transifex/dom');
const { createNativeInstance } = require('@transifex/native');
const { JSDOM } = require('jsdom');
async function pull() {
  // Use an isolated Transifex Native instance or use the global "tx" one
  const tx = createNativeInstance({
    token: 'YOUR-TX-NATIVE-PUBLIC-TOKEN',
    secret: 'YOUR-TX-NATIVE-SECRET',
  });
  // Create a TxNativeDOM instance
  const txdom = new TxNativeDOM();
  // Grab your HTML code from a url, file or an API request
  const html = '<html><body><p>Hello world</p></body></html>';
  // Create a jsdom instance
  const jsdom = new JSDOM(html);
  // Attach the document node to the TxNativeDOM instance
  txdom.attachDOM(jsdom.window.document);
  // Set instance target language. This will fetch translations locally.
  await tx.setCurrentLocale('fr');
  // Translate HTML document
  txdom.toLanguage(tx.getCurrentLocale(), (key) => {
    return tx.cache.get(key, tx.getCurrentLocale());
  });
  // Get the HTML code and do something with it
  const translatedHtml = jsdom.serialize();
  // Clean-up
  jsdom.window.close();
}
pull();
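One way to "do something" with the serialized HTML is to write it to a per-language folder. The sketch below assumes a layout such as dist/fr/index.html; the helper name writeTranslatedHtml and the directory layout are hypothetical choices, not something the library prescribes.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: store translated HTML under <outDir>/<locale>/<fileName>
// and return the path that was written.
function writeTranslatedHtml(outDir, locale, fileName, html) {
  const target = path.join(outDir, locale, fileName);
  // Create the locale folder if it does not exist yet
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, html, 'utf8');
  return target;
}
```

Inside pull(), this could be called as writeTranslatedHtml('dist', 'fr', 'index.html', translatedHtml) before the jsdom window is closed.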
Working with HTML fragments
There are cases where you might need to translate HTML fragments rather than whole documents. One way to do this is to place the fragment inside an empty document:
const htmlFragment = '<p>Hello world</p>';
const jsdom = new JSDOM('<html><body></body></html>');
jsdom.window.document.body.innerHTML = htmlFragment;
// Attach the document node to the TxNativeDOM instance
txdom.attachDOM(jsdom.window.document);
// ... push or pull content
// Get back the translated fragment
const translatedFragment = jsdom.window.document.body.innerHTML;
How string segmentation works
This is how the HTML content is segmented:
Block HTML tags
Examples of block tags are DIV, P, H1, TABLE, UL, OL, etc.
When the content of a block tag is a combination of plain text and inline elements such as SPAN, all the content is considered a single segment.
HTML:
<div>
<p>This is a paragraph</p>
<p>This is a paragraph with <span>inline element</span></p>
</div>
Segments:
"This is a paragraph"
"This is a paragraph with <span>inline element</span>"
Plain text
When the content of a block tag is NOT a combination of plain text and a tag, only the plain text content is extracted.
HTML:
<div>
<p>
<span>My span text</span>
<span>Another span text</span>
</p>
</div>
Segments:
"My span text"
"Another span text"
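The two rules above can be illustrated with a toy node tree. This is only a sketch of the described behavior, not the library's actual implementation: the node shape ({ text } for text nodes, { tag, children } for elements) and the function names are hypothetical, and attributes are ignored.

```javascript
// Serialize a toy node back to HTML (no attributes, for brevity).
function serialize(node) {
  if (node.text !== undefined) return node.text;
  return `<${node.tag}>${node.children.map(serialize).join('')}</${node.tag}>`;
}

// True when the element has at least one non-blank text node as a direct child.
function hasDirectText(node) {
  return node.children.some((c) => c.text !== undefined && c.text.trim() !== '');
}

// Collect segments: an element that mixes text with inline elements becomes
// one segment; otherwise its children are visited one by one.
function segment(node, out = []) {
  if (node.text !== undefined) {
    if (node.text.trim()) out.push(node.text.trim());
    return out;
  }
  if (hasDirectText(node)) {
    out.push(node.children.map(serialize).join('').trim());
    return out;
  }
  node.children.forEach((child) => segment(child, out));
  return out;
}

const mixed = {
  tag: 'p',
  children: [
    { text: 'This is a paragraph with ' },
    { tag: 'span', children: [{ text: 'inline element' }] },
  ],
};
const spansOnly = {
  tag: 'p',
  children: [
    { tag: 'span', children: [{ text: 'My span text' }] },
    { tag: 'span', children: [{ text: 'Another span text' }] },
  ],
};
console.log(segment(mixed));     // one segment that keeps the <span> markup
console.log(segment(spansOnly)); // two plain-text segments
```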
CSS data binding in the Angular or React framework
CSS classes may also be used for data binding in the Angular or React framework. In that case, the text inside the inline element is controlled directly by the framework, as opposed to being modified through HTML (e.g. in the case of jQuery). Because of this, when a block tag is a combination of plain text and inline elements such as SPAN that carry a framework data-binding CSS class, the text of each part needs to be evaluated separately. This results in the creation of multiple segments.
HTML:
<div>
<p>This is a paragraph</p>
<p>This is a paragraph with <span class="AngularReact">an inline element</span></p>
</div>
Segments:
"This is a paragraph"
"This is a paragraph with"
"an inline element"
Page title
HTML:
<title>My title</title>
Segments:
"My title"
Anchor titles
HTML:
<a title="My title">..</a>
Segments:
"My title"
Image titles and alt text
HTML:
<img title="My title" alt="My alt text"/>
Segments:
"My title"
"My alt text"
Input values and placeholders
Input values are only detected for inputs of type button, reset, or submit.
Textarea placeholders
HTML:
<textarea placeholder="My placeholder text"></textarea>
Segments:
"My placeholder text"
Meta keywords and descriptions
HTML:
<meta name="keywords" content="tag1, tag2, tag3">
<meta name="description" content="My page description">
<meta name="title" content="My page title" >
<meta property="og:title" content="Localization Platform for Translating Digital Content | Transifex">
<meta property="og:description" content="Integrate with Transifex to manage the creation of multilingual websites and app content. Order translations, see translation progress, and tools like TM.">
Segments:
"tag1, tag2, tag3"
"My page description"
"My page title"
"Localization Platform for Translating Digital Content | Transifex"
"Integrate with Transifex to manage the creation of multilingual websites and app content. Order translations, see translation progress, and tools like TM."
Input elements of "image" type
HTML:
<input type="image" alt="Submit">
Segments:
"Submit"
SVG elements
SVG tags may contain nested TEXT tags, which are parsed and have their strings extracted, but these elements are not marked in the UI. However, when you mouse over these elements, the options for the strings are shown (ignore string, follow link, etc.).
Elements that are ignored: script, style, link, iframe, noscript, canvas, audio, video, code.
Social widgets such as Facebook and Twitter that have tags with class names facebook_container and twitter_container are also ignored.
How to handle non-translatable content
You can manually define a block or node as non-translatable by adding a notranslate class.
For example:
<div class="notranslate">This content will not be translated</div>
Marking attributes for translation
Apart from the attributes that are automatically detected for translation, you can define custom attributes for translation using the tx-attrs="attr1, attr2,..." attribute.
Before:
HTML:
<span title="My title" data-content="My data content">
Segments: Nothing detected
After:
HTML:
<span title="My title" data-content="My data"
tx-attrs="title, data-content">
Segments:
"My title"
"My data"
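The comma-separated format of tx-attrs can be sketched as follows. This is only an illustration of the before/after behavior shown above, using a plain object in place of a real element; parseTxList and pickMarkedAttributes are hypothetical names, not library functions.

```javascript
// Split a value such as "title, data-content" into clean attribute names.
function parseTxList(value) {
  return value
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean);
}

// Return the values of the attributes listed in tx-attrs, in listed order.
function pickMarkedAttributes(attributes) {
  const names = parseTxList(attributes['tx-attrs'] || '');
  return names
    .filter((name) => Object.prototype.hasOwnProperty.call(attributes, name))
    .map((name) => attributes[name]);
}

const before = { title: 'My title', 'data-content': 'My data content' };
const after = {
  title: 'My title',
  'data-content': 'My data',
  'tx-attrs': 'title, data-content',
};
console.log(pickMarkedAttributes(before)); // nothing marked
console.log(pickMarkedAttributes(after));  // both marked values
```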
How to tag strings in the source language
You can automatically tag source strings by using the tx-tags="tag1, tag2,..." attribute.
These tags propagate to child elements as well.
For example:
<div tx-tags="marketing">...</div>
How to handle inline block variables
To define variables or placeholders within a block that shouldn't be translated, use class="notranslate" in the variable nodes or encapsulate them inside var tags.
For example:
HTML:
Hi, you are visitor <span class="notranslate">142</span>
Hi, you are visitor <var>341</var>
Segments:
"Hi, you are visitor {{0}}"
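The effect on the segment can be approximated with a small string-based sketch. It is an assumption-laden illustration, not how the library works internally: extractVariables is a hypothetical name, and the regular expression only covers the two exact markup forms shown above (no extra attributes, no nesting).

```javascript
// Replace notranslate spans and <var> elements with numbered placeholders.
function extractVariables(html) {
  const variables = [];
  const pattern = /<span class="notranslate">[\s\S]*?<\/span>|<var>[\s\S]*?<\/var>/g;
  const segment = html.replace(pattern, (match) => {
    variables.push(match);
    return `{{${variables.length - 1}}}`;
  });
  return { segment, variables };
}

const fromSpan = extractVariables('Hi, you are visitor <span class="notranslate">142</span>');
const fromVar = extractVariables('Hi, you are visitor <var>341</var>');
console.log(fromSpan.segment);
console.log(fromVar.segment);
```

Both inputs produce the same segment text, which is why the two HTML lines above result in a single source string.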
How to handle URLs as translatable content
When images <img> or links <a> appear within a segment, their URLs are handled by default as non-translatable content (i.e., variables).
Translating images
To translate an image, you should treat its URL as translatable text. To do so, use the special directive tx-content="translate_urls" to enable this functionality for a node and its children.
Before:
HTML:
<div>
<img src="/uploads/smiley.jpg" alt="Smiley face" width="42" height="42">
</div>
Segments:
"<img src="{{0}}" alt="Smiley face" width="42" height="42">"
After:
HTML:
<div tx-content="translate_urls">
<img src="/uploads/smiley.jpg" alt="Smiley face" width="42" height="42">
</div>
Segments:
"<img src="/uploads/smiley.jpg" alt="Smiley face" width="42" height="42">"
Translating links
To translate a link, you should treat its URL as translatable text. To do so, use the special directive tx-content="translate_urls" to enable this functionality for a node and its children.
Before:
HTML:
<div>
Click to go to the <a href="/features">features</a> page
</div>
Segments:
"Click to go to the <a href="{{0}}">features</a> page"
After:
HTML:
<div tx-content="translate_urls">
Click to go to the <a href="/features">features</a> page
</div>
Segments:
"Click to go to the <a href="/features">features</a> page"
Tip: To treat ALL URLs as translatable content within a page, add tx-content="translate_urls" to the opening BODY tag.
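The difference between the default handling and translate_urls can be sketched on plain strings. The maskUrls function and its translateUrls option are hypothetical names used only for this illustration; the real library works on DOM nodes rather than on strings.

```javascript
// By default, replace src/href values with numbered variables.
// With translateUrls enabled, leave the markup untouched.
function maskUrls(html, { translateUrls = false } = {}) {
  if (translateUrls) return { segment: html, urls: [] };
  const urls = [];
  const segment = html.replace(/\b(src|href)="([^"]*)"/g, (match, attr, url) => {
    urls.push(url);
    return `${attr}="{{${urls.length - 1}}}"`;
  });
  return { segment, urls };
}

const link = 'Click to go to the <a href="/features">features</a> page';
console.log(maskUrls(link).segment);
console.log(maskUrls(link, { translateUrls: true }).segment);
```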
How to define custom variables
If you use your own custom patterns and want such text to be ignored and handled as a variable, you can add custom rules for how variables are handled within a string segment.
For example:
const txdom = new TxNativeDOM({
  variablesParser: (text, fn) => {
    // Example of replacing the value of an s-href attribute with a variable.
    // Input: Hello <a s-href="doc:example">Click here</a>
    // Output: Hello <a s-href="{{0}}">Click here</a>
    // We use a regular expression to match the attribute
    text = text.replace(/s-href="([^"]*)"/g, (match, value) => {
      // match contains the full attribute: s-href="doc:example"
      // value contains the captured group: doc:example
      // fn registers the content of "doc:example" as a variable
      // and returns a variable expression to replace it: {{0}}
      // Rebuild the attribute so that only its value is replaced
      return `s-href="${fn(value)}"`;
    });
    return text;
  },
});
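A parser like the one above can be tried outside the library by calling it with a stand-in for fn. The createRegistry stub below is hypothetical: it only assumes, as the comments above state, that fn registers a value and returns a {{n}} expression for it.

```javascript
// The same kind of parser as above, written as a standalone function.
function variablesParser(text, fn) {
  return text.replace(/s-href="([^"]*)"/g, (match, value) => `s-href="${fn(value)}"`);
}

// Hypothetical stand-in for the fn callback supplied by the library.
function createRegistry() {
  const variables = [];
  const fn = (value) => {
    variables.push(value);
    return `{{${variables.length - 1}}}`;
  };
  return { variables, fn };
}

const registry = createRegistry();
const output = variablesParser('Hello <a s-href="doc:example">Click here</a>', registry.fn);
console.log(output);             // Hello <a s-href="{{0}}">Click here</a>
console.log(registry.variables); // [ 'doc:example' ]
```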
How to fine tune translatable content
For even finer control over how strings are detected, use the tx-content HTML attribute, which can contain the following values:
- exclude: mark a node and its children to be excluded from string detection
- include: mark a node and its children within an exclude block to be included in string detection
- block: mark a node and its children to be detected as a single string
- notranslate_urls: mark a node and its children to handle URLs as variables (default)
- translate_urls: mark a node and its children so that their URLs are translated
Include/exclude example.
Before:
HTML:
<div>
<p>First text</p>
<p>Second text</p>
<p>Third text</p>
</div>
Segments:
"First text"
"Second text"
"Third text"
After:
HTML:
<div tx-content="exclude">
<p>First text</p>
<p tx-content="include">Second text</p>
<p>Third text</p>
</div>
Segments:
"Second text"
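The way exclude and include interact can be sketched with a toy node tree. This only mirrors the behavior shown in the example above and is not the library's implementation; the node shape ({ text } for text nodes, { tag, txContent, children } for elements) and the name detectStrings are hypothetical.

```javascript
// Walk the tree, tracking whether the current subtree is excluded.
function detectStrings(node, excluded = false, out = []) {
  if (node.text !== undefined) {
    const text = node.text.trim();
    if (!excluded && text) out.push(text);
    return out;
  }
  // A tx-content value on this element overrides the inherited state
  let childExcluded = excluded;
  if (node.txContent === 'exclude') childExcluded = true;
  if (node.txContent === 'include') childExcluded = false;
  for (const child of node.children) detectStrings(child, childExcluded, out);
  return out;
}

const tree = {
  tag: 'div',
  txContent: 'exclude',
  children: [
    { tag: 'p', children: [{ text: 'First text' }] },
    { tag: 'p', txContent: 'include', children: [{ text: 'Second text' }] },
    { tag: 'p', children: [{ text: 'Third text' }] },
  ],
};
console.log(detectStrings(tree)); // [ 'Second text' ]
```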
Block example.
Before:
HTML:
<div>
<h1>A header</h1>
<p>A paragraph</p>
</div>
Segments:
"A header"
"A paragraph"
After:
HTML:
<div tx-content="block">
<h1>A header</h1>
<p>A paragraph</p>
</div>
Segments:
"<h1>A header</h1><p>A paragraph</p>"
Note: Strings that match the following regular expression are ignored:
^( |\s|\d|[-/:-?~@#!"^_`.,[]])*$