Localization Using LLMs
How we use GPT-4o to localize Sitecloud


Localization and internationalization are, broadly speaking, the processes involved in offering an application, website, or blog in other languages. To be more specific, internationalization (often abbreviated as i18n) is the practice of designing software so it can be adapted to multiple languages, while localization (l10n) is the adaptation of that software to a specific region or language.
As you can imagine, it is a key process in any market strategy, as it allows you to reach a broader audience and provide better service to customers.
Our commercial website is developed with Astro, one of our favorite frameworks. It allows us to combine components in pure HTML or with other frameworks like React. Additionally, it is easy to integrate with other services and generates highly optimized static sites.
An example of integration on our website is our pricing API. Every time we launch a deal or a new product, we update our prices and need these changes to be reflected on the site. It’s an interesting internal process that we’ll dedicate a post to in the future, but now let’s continue with the topic of localizing our products.
LLMs for Localization
By now, most people know about LLMs (Large Language Models), or at least their more popular versions, like ChatGPT or DeepSeek. We use these tools in various internal processes, both in market strategies (go-to-market) and engineering tasks. Our experience has shown us that they are very useful tools, but we always use them judiciously, without delegating absolute control to them.
For example, for our blog, we use ChatGPT to edit the texts. Our approach is to write the posts manually and then use an LLM to analyze them and suggest improvements. We go through several iterations until we reach a result we are satisfied with.
For localization, we follow a similar approach, but with scripts. Currently, we offer our products in Spanish and English, two languages in which the models from OpenAI provide high-quality translations.
How We Localize at Sitecloud
As mentioned, we use Astro for our commercial website, which includes built-in support for internationalization. You can find more details in its documentation: Astro i18n.
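As a sketch of what enabling this looks like, Astro's built-in i18n routing is configured in the project config file (the exact locale list below is our assumption, based on the two languages mentioned in this post):

```typescript
// astro.config.ts — a minimal sketch of enabling Astro's built-in i18n routing.
import { defineConfig } from "astro/config";

export default defineConfig({
  i18n: {
    defaultLocale: "es",   // content is authored in Spanish first
    locales: ["es", "en"], // English pages are served from the translated dictionaries
  },
});
```

With this in place, Astro handles locale-prefixed routes and lets each page pull its strings from the matching dictionary file.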
Another essential tool in our toolkit is llm, a command-line utility created by Simon Willison. You can install it with pip (pip install llm) or with Homebrew on macOS (brew install llm). This tool allows you to chain commands and send prompts to various language models, such as GPT-4o, GPT-3.5, or Llama. In our case, we use GPT-4o.
The localization process is quite simple. First, we write all the texts in Spanish (we are a company from Spain). The text strings are stored in a specific file for this language. For example, the navigation bar has the following localization keys:
export default {
"navbar.product": "Productos",
"navbar.features": "Características",
"navbar.pricing": "Precios",
"navbar.company": "Empresa",
}
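Running the translation step produces the corresponding English file, ui.en.ts. The exact output depends on the model, but it would look like this:

```typescript
// src/i18n/ui.en.ts — the kind of output we expect from the translation step.
const en = {
  "navbar.product": "Products",
  "navbar.features": "Features",
  "navbar.pricing": "Pricing",
  "navbar.company": "Company",
};

export default en;
```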
To translate this file into English with GPT-4o from the command line, we use:
cat ./src/i18n/ui.es.ts | llm "Translate to English the following TypeScript localization dictionary. Do not include any headers, just write the new translated dictionary. Do not include markdown." > ./src/i18n/ui.en.ts
In the prompt, we instruct the model to generate only the translated code, without headers or additional decorations. The result is a valid TypeScript file with the localization in English.
To simplify the process, we have a Node.js script that executes this command on demand. Then, we conduct a manual inspection before pushing the changes to our code repository.
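The post does not show that script, but a minimal sketch of such an on-demand runner (the file name and structure are our assumptions) could look like this:

```typescript
// scripts/translate.ts — hypothetical sketch of an on-demand translation runner.
// It shells out to Simon Willison's `llm` CLI with the same prompt used above.
import { execSync } from "node:child_process";

const PROMPT =
  "Translate to English the following TypeScript localization dictionary. " +
  "Do not include any headers, just write the new translated dictionary. " +
  "Do not include markdown.";

// Build the shell command separately so it can be inspected before running.
function buildCommand(srcFile: string, destFile: string): string {
  return `cat ${srcFile} | llm "${PROMPT}" > ${destFile}`;
}

// Only execute when explicitly asked to, so importing this file has no side effects.
if (process.argv[2] === "run") {
  execSync(buildCommand("./src/i18n/ui.es.ts", "./src/i18n/ui.en.ts"), {
    stdio: "inherit",
    shell: "/bin/bash",
  });
}
```

Wired into package.json as a script, this keeps translation a deliberate, reviewable action rather than a step that runs on every build.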
Validation and Safety
As mentioned, we perform a visual inspection before pushing the translations to the repository, since the model may make mistakes or generate invalid code. In addition, the Astro build step checks that the code is valid and fails if it detects errors.
The next validation step occurs before merging into the main branch on GitHub. This is our second layer of safety, where we review any remaining translation or syntax errors. Once the changes are approved, they are deployed to production automatically.
Localizing the Blog
For the blog, we follow the same process, but we adjust the prompt to handle files in extended Markdown (MDX) format. For example, to translate a blog post, we use:
cat lanzamiento-wordpress.mdx | llm "Translate the following MDX file to English" > launching-wordpress.mdx
The inspection process is then the same: we always validate the translation and check for syntax errors before publishing.
Conclusion
We hope these ideas are useful for localizing your applications. This process can be applied to any type of software project. Not only is it simple, but it is also cost-effective, as the cost per token in localization is low, given that only new content is translated.
Although the process could be fully automated, we do not recommend it. It is essential to validate translations to ensure that they are accurate and of high quality.