Data

Designing Databases for i18n and l10n in Global Services

jin@catsriding.com
Feb 23, 2025
Published byJin
Designing Databases for i18n and l10n in Global Services

Designing a Database for Localized Content

To support global services, designing a database that efficiently stores and manages multilingual data is essential. This goes beyond storing translated strings—it requires careful consideration of region-specific content, performance, and scalability. Various approaches exist, such as using separate tables, expanding columns, storing JSON structures, or leveraging key-value stores. Understanding the strengths, limitations, and appropriate use cases of each method is critical.

This post analyzes database design strategies with internationalization (i18n) and localization (l10n) in mind and explores how to apply them in real-world scenarios.

1. Various localization strategies

When designing a system that requires multilingual support, the database layer can adopt several strategies. Each approach differs in performance, scalability, and maintainability, making it important to choose the right one based on the use case. This section outlines three common strategies.

1-1. Column expansion strategy

The column expansion strategy is one of the simplest ways to store multilingual data. This approach adds separate columns for each language within the same table. For example, a product_name field might be expanded to product_name_ko, product_name_en, and product_name_ja to support Korean, English, and Japanese respectively.

designing-a-database-for-localized-content_00.png

This strategy is intuitive and easy to implement, especially in the early stages of a project. Queries remain straightforward since language-specific data resides in a single table. However, as the number of supported languages increases, the schema becomes bloated, potentially degrading performance. Moreover, adding new languages requires altering the table structure, limiting scalability.

1-1-1. Strengths and weaknesses of the column expansion strategy

The column expansion strategy is ideal when only a few languages need to be supported. It allows for fast retrieval of language-specific data without joins and keeps SQL queries simple. However, it lacks long-term flexibility. Every new language requires schema changes, and as column count grows, index efficiency can drop and table scans may become more expensive depending on the database engine.

  • Fast data retrieval: All language variants are stored in the same row, enabling fast access without joins.
  • Simple SQL: Queries are easy to write and maintain.
  • Easy to adopt: Can be quickly integrated into existing systems with minimal changes.

However, the strategy has notable downsides:

  • Limited scalability: Adding a new language requires modifying the schema, which makes long-term maintenance more difficult. As more languages are added, the table grows in width, increasing storage use and potentially impacting query performance.
  • Column bloat: The more languages supported, the more columns are added, making the structure harder to manage.
  • NULL inflation: If translations for some languages are missing, their respective columns will often be left as NULL, leading to sparsity.
1-1-2. Table schema for the column expansion strategy

In this strategy, separate columns are defined for each translatable field in every supported language. The following schema demonstrates how this might be implemented:

ColumnExpansionStrategy
create table products
(
    id              bigint auto_increment primary key,
    product_name_ko varchar(255),  -- Product name in Korean
    product_name_en varchar(255),  -- Product name in English
    product_name_ja varchar(255),  -- Product name in Japanese
    description_ko  varchar(1000), -- Product description in Korean
    description_en  varchar(1000), -- Product description in English
    description_ja  varchar(1000), -- Product description in Japanese
    price           decimal(10, 2)
);

When inserting a product, values for all languages should be provided:

ColumnExpansionStrategy
insert into products (product_name_ko, product_name_en, product_name_ja, description_ko, description_en, description_ja, price)
values ('스마트폰', 'Smartphone', 'スマートフォン', '최신 기술이 적용된 고급 스마트폰입니다.', 'A high-end smartphone with advanced features.', '最新技術が搭載された高級スマートフォンです。', 999.99);

To retrieve content in a specific language, use a query like this:

ColumnExpansionStrategy
select
    id                      as `Product ID`,
    product_name_en         as `Product Name`,
    products.description_en as `Product Description`
from
    products
where
    id = 1;

The result would be:

Product IDProduct NameProduct Description
1SmartphoneA high-end smartphone with advanced features.

As shown, the column expansion strategy enables fast, join-free queries for each language. However, its rigidity in accommodating new languages can become a significant limitation over time.

1-2. Column-based translation table strategy

The column-based translation table strategy stores multilingual data in a separate table, rather than using individual columns within the main table. Fields that require translation—such as product names or descriptions—are not stored directly in the primary entity table. Instead, the application retrieves localized content by referencing a translation table.

designing-a-database-for-localized-content_01.png
1-2-1. Strengths and weaknesses of the column-based translation table strategy

This approach allows multilingual support to be introduced with minimal changes to the existing data model. It reduces redundancy by centralizing translation data and supports easy expansion: new languages can be added by inserting data rather than altering the schema. However, it typically requires joins between the main and translation tables, which may affect performance and increase query complexity.

  • Minimal changes to existing schema: New languages can be supported without modifying table structures—only new data needs to be inserted.
  • Reusability of translation entries: When the same translated value is used in multiple places, a single reference can be reused.
  • Consistency of translated data: Centralized management of translations ensures consistent messaging across the application.

There are also some trade-offs to consider:

  • Increased SQL complexity: Retrieving multilingual data requires joining with the translation table. This can lead to more complex queries, especially when multiple fields (e.g., product name and description) each require a join.
  • Potential for NULLs: If a translation for a particular language is missing, the corresponding field in the translation table will return NULL.
1-2-2. Table schema for the column-based translation table strategy

In this model, translations are stored in a separate table, while the main entity table keeps reference IDs to the appropriate translations. This allows multilingual content to be managed efficiently and flexibly. Below is an example schema:

ColumnBasedTranslationTableStrategy
create table translations
(
    id      bigint auto_increment primary key,
    text_ko varchar(1000), -- Translated text in Korean
    text_en varchar(1000), -- Translated text in English
    text_ja varchar(1000)  -- Translated text in Japanese
);

create table products
(
    id                         bigint auto_increment primary key,
    category_id                bigint         not null,
    price                      decimal(10, 2) not null,
    image_url                  varchar(255)   not null,
    name_translation_id        bigint         not null,
    description_translation_id bigint         not null,
    foreign key (name_translation_id) references translations (id),
    foreign key (description_translation_id) references translations (id)
);

Here, the products table does not contain any multilingual fields directly. Instead, it refers to entries in the translations table.

ColumnBasedTranslationTableStrategy
-- Insert translations (each row contains all language variants)
insert into 
    translations (text_ko, text_en, text_ja)
values
    ('스마트폰', 'Smartphone', 'スマートフォン'),
    ('최신 기술이 적용된 고급 스마트폰입니다.', 'A high-end smartphone with advanced features.', '最新技術が搭載された高級スマートフォンです。');

-- Insert product with references to the translation rows
insert into
    products (category_id, price, image_url, name_translation_id, description_translation_id)
values
    (1, 999.99, 'image.jpg', 1, 2);

To retrieve Japanese content for product name and description, you can use the following query:

ColumnBasedTranslationTableStrategy
select
    p.id                as `Product ID`,
    productName.text_ja as `Product Name`,
    productDesc.text_ja as `Product Description`
from
    products p
    left join translations productName on p.name_translation_id = productName.id
    left join translations productDesc on p.description_translation_id = productDesc.id
where
    p.id = 1;

The result would be:

Product IDProduct NameProduct Description
1スマートフォン最新技術が搭載された高級スマートフォンです。

This approach enables multilingual data to be managed independently of the main schema, allowing new languages to be added without any structural changes.

1-3. Row-based translation table strategy

The row-based translation table strategy stores multilingual data in separate rows rather than columns. In this approach, translatable fields are not kept in the main table but are retrieved from a translation table based on a locale value.

designing-a-database-for-localized-content_02.png
1-3-1. Strengths and weaknesses of the row-based translation table strategy

This strategy offers excellent scalability, as new languages can be added simply by inserting additional rows, with no changes to the schema. It also supports optimized queries by filtering on locale, minimizing the need for complex joins.

  • Highly scalable: New languages can be supported by inserting data, without altering the table structure.
  • Query optimization by locale: Only the required language data is retrieved by filtering on locale.
  • Minimal data duplication: Translations can be reused across entities, improving consistency and reducing redundancy.

However, there are some limitations:

  • Joins required for retrieval: Retrieving multilingual data typically involves joining the main table and the translation table.
  • Performance impact on full scans: While locale-specific queries are efficient, retrieving all translations can lead to a large number of rows, which may affect performance if not properly optimized.
1-3-2. Table schema for the row-based translation table strategy

In this approach, translatable fields are moved to a separate translation table, and the main table holds only the non-localized data. The translation table stores one row per locale per entity, allowing multiple translations to be maintained efficiently. The example schema below illustrates this setup:

RowBasedTranslationTableStrategy
create table products
(
    id          bigint auto_increment primary key,
    category_id bigint         not null,
    price       decimal(10, 2) not null,
    image_url   varchar(255)   not null
);

create table product_translations
(
    id          bigint auto_increment primary key,
    product_id  bigint       not null,
    locale      varchar(10)  not null, -- Language code (e.g., ko, en, ja)
    name        varchar(255) not null, -- Translated product name
    description text         not null, -- Translated product description
    unique key unique_translation (product_id, locale),
    foreign key (product_id) references products (id) on delete cascade
);

Translatable fields are now stored in the product_translations table, separated by locale. This structure supports clean multilingual management and allows new languages to be added without modifying the schema.

RowBasedTranslationTableStrategy
-- Insert base product
insert into
    products (category_id, price, image_url)
values
    (1, 999.99, 'image.jpg');

-- Insert translations (one row per language)
insert into
    product_translations (product_id, locale, name, description)
values
    (1, 'ko', '스마트폰', '최신 기술이 적용된 고급 스마트폰입니다.'),
    (1, 'en', 'Smartphone', 'A high-end smartphone with advanced features.'),
    (1, 'ja', 'スマートフォン', '最新技術が搭載された高級スマートフォンです。');

To retrieve English content for a given product, use the following query:

RowBasedTranslationTableStrategy
select
    product_id  as `Product ID`,
    name        as `Product Name`,
    description as `Product Description`
from
    product_translations
where
      product_id = 1
  and locale = 'en';

The result will be:

Product IDProduct NameProduct Description
1SmartphoneA high-end smartphone with advanced features.

This strategy provides an efficient and scalable way to manage multilingual data, with no need to alter the schema when adding new languages.

2. Implementation of a localized database

Previously, we explored three strategies for managing translated content tailored to specific regions and cultures. Among them, the row-based translation table strategy offers a balanced choice in terms of scalability and maintainability. It allows new languages to be added without altering the schema and provides reasonable performance. In this section, we’ll implement this strategy in code and examine how it works in practice.

2-1. Database schema definition

First, define the tables needed to store multilingual data. The products table holds base product information, while the product_translations table stores the localized text for each product. The two tables are linked via product_id, and translations are filtered by locale.

schema.sql
create table if not exists products
(
    id          bigint auto_increment primary key,
    category_id bigint         not null,
    price       decimal(10, 2) not null,
    image_url   varchar(255)   not null
);

create table if not exists product_translations
(
    id          bigint auto_increment primary key,
    product_id  bigint       not null,
    locale      varchar(10)  not null,
    name        varchar(255) not null,
    description text         not null,
    unique key unique_translation (product_id, locale),
    foreign key (product_id) references products (id) on delete cascade
);

2-2. Insert sample data

Next, insert sample data to verify how multilingual content is stored. Add a product to the products table, then insert translated content for that product into the product_translations table.

data.sql
insert into
    products (category_id, price, image_url)
values
    (10, 999.99, 'iphone.jpg');

insert into
    product_translations (product_id, locale, name, description)
values
    (1, 'en', 'iPhone 15', 'The latest iPhone model'),
    (1, 'ko', '아이폰 15', '최신 아이폰 모델'),
    (1, 'ja', 'アイフォン15', '最新のアイフォンモデル');

Each translation is stored as a separate row, making it easy to expand the system by simply inserting new data for additional languages.

2-3. API implementation

Now let’s implement an API that returns localized data. Using Spring Boot, create a controller that returns product information with translations based on the requested locale.

ProductController.java
@GetMapping("/products/{id}")
public ResponseEntity<?> products(
        @PathVariable Long id
) {
    ProductResult result = service.retrieve(id);
    return ResponseEntity
            .ok(result);
}

To provide localized content, the application needs access to the request's locale. Spring’s i18n support automatically extracts the locale from the Accept-Language header and stores it in a thread-local context. This enables locale-aware responses throughout the application without requiring custom logic.

ProductService.java
public ProductResult retrieve(Long id) {
    Locale locale = LocaleContextHolder.getLocale();
    return productRepository.findProductWithTranslationById(id, locale);
}

The query to retrieve data for a specific locale can then be written as follows:

ProductRepositoryImpl.java
@Override
public ProductResult findProductWithTranslationById(Long id, Locale locale) {
    return queryFactory
            .select(Projections.constructor(
                    ProductResult.class,
                    product.id,
                    product.price,
                    product.imageUrl,
                    productTranslation.name,
                    productTranslation.description
            ))
            .from(product)
            .innerJoin(productTranslation).on(productTranslation.product.eq(product))
            .where(
                    product.id.eq(id),
                    productTranslation.locale.eq(locale.getLanguage())
            )
            .fetchOne();
}

2-4. API request and response

With the multilingual API in place, let’s test it in action. The client sends a request with the desired language set via the Accept-Language header, and the server returns the translated data accordingly. This enables a single endpoint to serve multiple languages.

To retrieve English content, include Accept-Language: en in the request:

English
GET /products/1 HTTP/1.1
Content-Type: application/json
Accept-Language: en
Host: localhost:8080

The response will contain English translations:

response-in-english.json
{
  "id": 1,
  "price": 999.99,
  "imageUrl": "iphone.jpg",
  "name": "iPhone 15",
  "description": "The latest iPhone model"
}

To retrieve the Japanese version, use Accept-Language: ja:

Japanese
GET /products/1 HTTP/1.1
Content-Type: application/json
Accept-Language: ja
Host: localhost:8080

The response will be:

response-in-japanese.json
{
  "id": 1,
  "price": 999.99,
  "imageUrl": "iphone.jpg",
  "name": "アイフォン15",
  "description": "最新のアイフォンモデル"
}

This setup allows multilingual content to be served dynamically based on the client’s locale. Most importantly, it avoids the need to modify the database schema when new languages are added—only new rows must be inserted—making the system highly maintainable and easy to extend.

3. Final thoughts

While planning a side project with potential for global reach, I naturally began to think about multilingual support. Although the project started as a simple prototype, I felt a growing sense of responsibility to support at least Korean, rather than offering it in English only.

This led me to explore various strategies for managing multilingual data and designing a database that can support them efficiently. Beyond the strategies covered here, other approaches exist—such as storing schema-less data in JSON fields, using key-value stores, maintaining separate tables per language, or integrating with external translation services. Each option has its place, depending on the project’s scale, number of supported languages, frequency of content updates, and performance requirements.

The implementation covered in this post is available in the 🐙 GitHub repository. 🚀