System Architecture Design

Level: Expert | Module: Architecture | 19 min read | Lesson 48 of 55

Overview

What you’ll learn:
- How to design scalable iDempiere deployments using multi-tenant architecture, clustering, and high availability patterns
- How to implement database replication, load balancing, and disaster recovery strategies for production environments
- How to plan capacity, monitor system health, and deploy iDempiere on cloud platforms using infrastructure as code
Prerequisites: Lessons 1-39 (complete Beginner, Intermediate, and Advanced paths), production iDempiere administration experience
Estimated reading time: 25 minutes

Introduction

A successful iDempiere implementation eventually faces a critical inflection point: the system must scale beyond a single server serving a handful of users. Whether you are onboarding additional business entities as tenants, expanding to new geographies, or simply preparing for growth, the architecture decisions you make at this stage will determine system reliability, performance, and operational cost for years to come.

This lesson covers the full spectrum of system architecture design for iDempiere — from multi-tenant models and clustering strategies to database replication, disaster recovery, and cloud deployment. The goal is to equip you with the knowledge to design an iDempiere deployment that meets enterprise-grade availability and scalability requirements.

Multi-Tenant Architecture Patterns

iDempiere’s built-in Client/Organization model provides native multi-tenancy at the application level. Every record in the database carries AD_Client_ID and AD_Org_ID columns, and the security framework enforces strict data isolation between clients. However, there are multiple ways to architect a multi-tenant deployment, each with distinct trade-offs.

Single Instance, Multi-Client

This is iDempiere’s native multi-tenancy model. A single application server and a single database host multiple clients. Each client has its own chart of accounts, business partners, products, and transactional data, all stored in the same database tables but isolated by AD_Client_ID.

Advantages: Lowest infrastructure cost, simplest administration, single codebase and plugin set to maintain, shared system-level configuration (reference lists, countries, currencies).
Disadvantages: A heavy workload in one client can affect performance for all clients (noisy neighbor problem). Database schema changes apply to all clients simultaneously. Upgrade and maintenance windows affect everyone.
Best for: Related business entities under a single parent company, SaaS providers with small-to-medium tenants, development and staging environments.
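To make the isolation mechanism concrete, the following minimal sketch uses an in-memory SQLite table as a stand-in for a tenant-aware iDempiere table. The table name `demo_partner`, the sample IDs, and the helper `rows_for_client` are hypothetical, and the sketch assumes that client 0 holds the shared system-level rows. iDempiere applies this kind of filter inside its own persistence and role layers, so treat this as an illustration of the idea rather than of the actual implementation.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like a tenant-aware table (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo_partner ("
        "ad_client_id INTEGER, ad_org_id INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO demo_partner VALUES (?, ?, ?)",
        [
            (0, 0, "System default"),            # shared system-level row
            (1000000, 1000001, "Acme Supplier"),
            (1000002, 1000003, "Globex Supplier"),
        ],
    )
    return conn


def rows_for_client(conn: sqlite3.Connection, client_id: int) -> list[str]:
    """Return names visible to one client: its own rows plus shared rows (client 0)."""
    cur = conn.execute(
        "SELECT name FROM demo_partner "
        "WHERE ad_client_id IN (0, ?) ORDER BY ad_client_id",
        (client_id,),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    print(rows_for_client(make_demo_db(), 1000000))
```

Every tenant shares the same table, so a missing filter anywhere in custom SQL leaks data across clients. This is one reason custom reports and plugins deserve extra review in a single-instance deployment.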

Multiple Instances, Separate Databases

Each tenant gets its own iDempiere application server and its own database. The instances are completely independent and share nothing at the application level.

Advantages: Complete workload isolation, independent upgrade schedules, tenant-specific customizations and plugin versions, no noisy neighbor risk, simpler backup and restore per tenant.
Disadvantages: Higher infrastructure cost, more operational complexity (N instances to monitor, patch, and upgrade), no shared data between tenants without additional integration.
Best for: Large enterprises with independent business units, managed service providers hosting unrelated companies, regulatory environments requiring strict data separation.

Hybrid Approach

Group related tenants into shared instances while keeping large or high-security tenants on dedicated instances. This balances cost efficiency with isolation requirements. Use resource quotas, connection pooling limits, and monitoring to manage the shared instances.

Horizontal Scaling Strategies

Vertical scaling (adding more CPU, RAM, and faster storage to a single server) has limits. Horizontal scaling distributes the workload across multiple servers. For iDempiere, horizontal scaling requires careful consideration of the application’s stateful nature.

Application-Level Scaling

iDempiere maintains user session state on the application server. This means you cannot simply place multiple iDempiere instances behind a round-robin load balancer — a user’s requests must consistently reach the same instance (session affinity/sticky sessions). There are two primary approaches:

Sticky sessions with a load balancer: Configure your load balancer (HAProxy, Nginx, AWS ALB) to route all requests from a given session to the same backend instance. This is the simplest approach but limits failover — if an instance dies, active sessions on that instance are lost.
Externalized session store: Store session data in a shared external store (Redis, Memcached, or a database) so that any instance can serve any request. This enables true stateless scaling but requires custom modifications to iDempiere’s session management layer, which is a non-trivial effort.
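The trade-off of sticky sessions can be sketched in a few lines. The `StickyRouter` class here is a hypothetical model of what a load balancer does with an affinity cookie, not the API of HAProxy, Nginx, or AWS ALB. It pins each session to one backend and only reassigns it when that backend is marked down, at which point the session state held on the failed instance is gone.

```python
class StickyRouter:
    """Toy model of session affinity: one session always maps to one backend."""

    def __init__(self, backends: list[str]) -> None:
        self.backends = list(backends)
        self.healthy = set(backends)
        self.assignments: dict[str, str] = {}  # session id -> backend
        self._next = 0                          # round-robin counter for new sessions

    def mark_down(self, backend: str) -> None:
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        if backend in self.backends:
            self.healthy.add(backend)

    def route(self, session_id: str) -> tuple[str, bool]:
        """Return (backend, session_lost) for one request."""
        current = self.assignments.get(session_id)
        if current in self.healthy:
            return current, False
        candidates = [b for b in self.backends if b in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy backend available")
        backend = candidates[self._next % len(candidates)]
        self._next += 1
        self.assignments[session_id] = backend
        # A session that was pinned to a failed backend has lost its server-side state.
        return backend, current is not None


if __name__ == "__main__":
    router = StickyRouter(["app1", "app2"])
    print(router.route("session-a"))
    router.mark_down("app1")
    print(router.route("session-a"))
```

The second call reports `session_lost=True`: the user is routed to a healthy instance but has to log in again, which is the failover limitation described above.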

Database-Level Scaling

The database is often the bottleneck before the application server. Database scaling strategies include:

Read replicas: Route read-heavy queries (reports, dashboards, lookups) to replica databases while writes go to the primary. This requires either application-level routing or a database proxy that supports read/write splitting, such as Pgpool-II. PgBouncer pools connections but does not route queries by type.
Connection pooling: Use PgBouncer or Pgpool-II in front of PostgreSQL to manage connection limits efficiently. iDempiere can consume many database connections, especially under high concurrency.
Table partitioning: For very large transactional tables (C_Order, C_Invoice, Fact_Acct), PostgreSQL table partitioning by date range or client can dramatically improve query performance and maintenance operations.
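Connection limits are easiest to reason about as a budget. The following sketch shows the arithmetic, assuming every application instance opens up to a fixed pool of connections and that some headroom is reserved for administrative and monitoring sessions. The function name and the default reserve of 10 are illustrative assumptions, not iDempiere settings.

```python
def max_pool_size_per_instance(
    max_connections: int, instances: int, reserved: int = 10
) -> int:
    """Largest per-instance pool that keeps the total within max_connections.

    `reserved` is headroom for admin, monitoring, and ad hoc sessions (assumed value).
    """
    if instances < 1:
        raise ValueError("at least one application instance is required")
    usable = max_connections - reserved
    if usable < instances:
        raise ValueError("max_connections is too low for this many instances")
    return usable // instances


if __name__ == "__main__":
    # Three app instances sharing a PostgreSQL server with max_connections = 200
    print(max_pool_size_per_instance(200, 3, reserved=20))
```

If the resulting pool size is too small for peak load, that is the signal to put a pooler in front of PostgreSQL rather than to keep raising max_connections.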

Clustering iDempiere Instances

Clustering multiple iDempiere instances provides both high availability and increased capacity. There are two fundamental clustering models.

Shared-Database Clustering

Multiple iDempiere application server instances connect to the same database. Each instance handles a subset of users, and the database is the single source of truth. This is the most common clustering model for iDempiere.

Architecture: Load balancer (with sticky sessions) in front of N iDempiere instances, all pointing to one PostgreSQL primary.
Considerations: Database connection limits must accommodate all instances. Cache invalidation across instances must be addressed — when one instance modifies cached data (such as Application Dictionary metadata), other instances may serve stale cache until their next refresh cycle. Application-level schedulers (e.g., accounting processors, workflow processors) must run on only one instance to avoid duplicate processing.
Scheduler coordination: Designate one instance as the scheduler leader or use a distributed lock mechanism (database advisory locks or a tool like Apache ZooKeeper) to ensure only one instance runs scheduled processes.
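The advisory-lock option can be sketched as follows. `pg_try_advisory_lock` is a real PostgreSQL function that takes a session-level lock without blocking. Everything else is assumed for illustration: the lock key `48001` is an arbitrary application-chosen number, the function expects any DB-API connection that uses `%s` placeholders (as psycopg does), and `FakeConnection` is only a stand-in so the example runs without a database.

```python
SCHEDULER_LOCK_KEY = 48001  # hypothetical, application-chosen advisory lock id


def try_become_scheduler_leader(conn, lock_key: int = SCHEDULER_LOCK_KEY) -> bool:
    """Try to take a session-level advisory lock; True means this instance leads."""
    cur = conn.cursor()
    try:
        cur.execute("SELECT pg_try_advisory_lock(%s)", (lock_key,))
        row = cur.fetchone()
        return bool(row[0])
    finally:
        cur.close()


class _FakeCursor:
    """Minimal cursor that imitates the lock being held by at most one caller."""

    def __init__(self, held: set) -> None:
        self._held = held
        self._result = None

    def execute(self, sql: str, params: tuple) -> None:
        key = params[0]
        acquired = key not in self._held
        if acquired:
            self._held.add(key)
        self._result = (acquired,)

    def fetchone(self):
        return self._result

    def close(self) -> None:
        pass


class FakeConnection:
    """Stand-in for a database connection; instances share one lock table."""

    def __init__(self, held: set) -> None:
        self._held = held

    def cursor(self) -> _FakeCursor:
        return _FakeCursor(self._held)


if __name__ == "__main__":
    locks: set = set()
    print(try_become_scheduler_leader(FakeConnection(locks)))  # first instance
    print(try_become_scheduler_leader(FakeConnection(locks)))  # second instance
```

In a real deployment the lock lives as long as the database session that took it, so the leader must hold a dedicated connection open. A pooler in transaction mode would break that assumption. When the leader's session ends, the lock is released and another instance can take over on its next attempt.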

Shared-Nothing Clustering

Each iDempiere instance has its own database, and tenants are partitioned across instances. There is no shared state between instances at the application or database level.

Architecture: A routing layer (DNS-based, or a reverse proxy with path-based routing) directs each tenant to its assigned instance.
Considerations: Simpler to reason about (no cross-instance cache issues), easier to scale incrementally (add a new instance for a new tenant), but requires a management layer to handle tenant provisioning and routing.
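A routing layer for the shared-nothing model can be as small as a lookup table. The sketch below assumes tenants are identified by the first label of the host name; the domain, tenant names, and upstream addresses are all hypothetical. A production setup would keep this mapping in DNS or in the reverse proxy configuration and update it from the tenant provisioning process.

```python
from urllib.parse import urlsplit

# Hypothetical tenant-to-instance mapping maintained by the provisioning layer
TENANT_ROUTES = {
    "acme": "http://idempiere-a.internal:8080",
    "globex": "http://idempiere-b.internal:8080",
}


def upstream_for(url: str, routes: dict[str, str] = TENANT_ROUTES) -> str:
    """Resolve the tenant from the first host label and return its instance."""
    host = urlsplit(url).hostname or ""
    tenant = host.split(".", 1)[0]
    try:
        return routes[tenant]
    except KeyError:
        raise LookupError(f"no instance registered for tenant {tenant!r}") from None


if __name__ == "__main__":
    print(upstream_for("https://acme.erp.example.com/webui/"))
```

Adding a tenant then means provisioning an instance and adding one entry, which is the incremental scaling property described above.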

Database Replication

Database replication is the foundation of both high availability and read scaling. PostgreSQL offers two primary replication mechanisms.

Streaming Replication (Physical Replication)

PostgreSQL streaming replication copies the entire database cluster at the WAL (Write-Ahead Log) level. The standby server is a byte-for-byte replica of the primary.

# postgresql.conf on primary
wal_level = replica
max_wal_senders = 5
wal_keep_size = '1GB'

# Enable archiving for point-in-time recovery.
# The test guard refuses to overwrite a segment that is already archived.
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/archive/%f && cp %p /var/lib/postgresql/archive/%f'

# Shell command, run on the standby host (not part of postgresql.conf):
# initialize the standby with pg_basebackup
pg_basebackup -h primary-host -D /var/lib/postgresql/data -U replicator -P -R

# The -R flag creates standby.signal and sets primary_conninfo
# in postgresql.auto.conf automatically

Synchronous vs asynchronous: Asynchronous replication has minimal performance impact on the primary but allows a small window of data loss if the primary fails. Synchronous replication guarantees zero data loss but adds latency to every write transaction (the primary waits for the standby to confirm receipt).
Best for: High availability failover, read replicas for reporting, disaster recovery.

Logical Replication

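Logical replication decodes WAL changes into logical operations (INSERT, UPDATE, DELETE) and applies them to the subscriber. Unlike streaming replication, the subscriber can carry extra columns or different indexes, and can even run a different PostgreSQL major version. Schema changes (DDL) are not replicated, so the tables must already exist on the subscriber, and the publisher must run with wal_level = logical.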

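-- On the publisher (requires wal_level = logical in postgresql.conf)
CREATE PUBLICATION idempiere_pub FOR ALL TABLES;

-- On the subscriber (the tables must already exist with matching definitions)
CREATE SUBSCRIPTION idempiere_sub
    CONNECTION 'host=primary-host dbname=idempiere user=replicator'
    PUBLICATION idempiere_pub;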

Best for: Selective table replication, cross-version upgrades, data warehouse feeding, partial replicas for specific reporting needs.

Read Replicas for Reporting

Reporting workloads (JasperReports, custom SQL reports, BI tools) can be extremely resource-intensive. Directing them to a read replica prevents report execution from degrading the performance of transactional users. Implementation approaches include:

Configure a separate JDBC connection pool in iDempiere that points to the replica, and route reporting processes to use this pool.
Use a database proxy with read/write splitting rules.
Configure BI tools (Metabase, Apache Superset, Pentaho) to connect directly to the replica.
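The application-level routing option can be sketched as a small classifier that decides which pool a statement should use. The function name and the pool labels are hypothetical, and the rules are deliberately conservative: anything that is not clearly a standalone read goes to the primary. It does not catch every case (for example, a SELECT that calls a sequence or a writing function), and a replica may lag behind the primary, so reads that must see the latest write belong on the primary as well.

```python
def choose_pool(sql: str, in_transaction: bool = False) -> str:
    """Return 'replica' for standalone read-only statements, otherwise 'primary'."""
    if in_transaction:
        # Keep every statement of a transaction on one server.
        return "primary"
    statement = sql.lstrip().lower()
    is_select = statement.startswith("select")
    takes_row_locks = "for update" in statement or "for share" in statement
    if is_select and not takes_row_locks:
        return "replica"
    return "primary"


if __name__ == "__main__":
    for query in (
        "SELECT * FROM c_invoice WHERE dateinvoiced >= '2024-01-01'",
        "UPDATE c_order SET docstatus = 'CO' WHERE c_order_id = 1000123",
        "SELECT * FROM c_order WHERE c_order_id = 1000123 FOR UPDATE",
    ):
        print(choose_pool(query), "<-", query)
```

Routing whole processes (such as a report run) rather than individual statements is usually simpler to reason about, which is why the first option above refers to reporting processes rather than queries.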

Disaster Recovery Planning

Disaster recovery (DR) is not optional for production ERP systems. A comprehensive DR plan covers data backup, infrastructure redundancy, and documented recovery procedures.

Backup Strategies

A robust backup strategy includes multiple layers:

Database backups: Use pg_dump for logical backups (portable, selective) and pg_basebackup for physical backups (faster restore for large databases). Schedule both daily. Enable WAL archiving for point-in-time recovery (PITR), which lets you restore to any moment between the oldest retained base backup and the most recently archived WAL segment, not just to the time of the last backup.
File system backups: Back up the iDempiere installation directory (plugins, configuration files, custom reports), attachment storage, and any external document repositories. Use incremental backup tools like rsync or cloud-native snapshots.
Configuration backups: Export 2Pack packages of critical Application Dictionary customizations. Version-control all plugin source code, migration scripts, and deployment configurations in Git.

Recovery Time and Recovery Point Objectives

Define your targets clearly:

RPO (Recovery Point Objective): How much data loss is acceptable? With synchronous streaming replication and WAL archiving, RPO can be near zero. With daily backups only, RPO is up to 24 hours.
RTO (Recovery Time Objective): How quickly must the system be operational after a failure? This determines whether you need a hot standby (minutes), a warm standby (tens of minutes), or a cold restore from backup (hours).
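The relationship between these targets and the backup setup is simple arithmetic, sketched here. The function names are hypothetical, and the tier thresholds (5 minutes and 1 hour) are assumed cut-offs chosen only to mirror the "minutes, tens of minutes, hours" ranges above; pick your own from business requirements.

```python
from datetime import timedelta
from typing import Optional


def worst_case_rpo(
    backup_interval: timedelta,
    wal_archive_interval: Optional[timedelta] = None,
    synchronous_replica: bool = False,
) -> timedelta:
    """Upper bound on data loss for a given protection setup (simplified)."""
    if synchronous_replica:
        return timedelta(0)
    if wal_archive_interval is not None:
        return min(backup_interval, wal_archive_interval)
    return backup_interval


def standby_tier_for(rto: timedelta) -> str:
    """Map an RTO target to a standby tier, using assumed thresholds."""
    if rto <= timedelta(minutes=5):
        return "hot standby"
    if rto <= timedelta(hours=1):
        return "warm standby"
    return "cold restore"


if __name__ == "__main__":
    print(worst_case_rpo(timedelta(hours=24)))
    print(worst_case_rpo(timedelta(hours=24), wal_archive_interval=timedelta(minutes=5)))
    print(standby_tier_for(timedelta(minutes=30)))
```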

DR Testing

A disaster recovery plan that has not been tested is not a plan — it is a hope. Schedule regular DR drills:

Restore a database backup to a separate server and verify data integrity.
Practice failover to a streaming replica and confirm the application functions correctly.
Time the entire recovery process and compare against your RTO target.
Document the exact steps and update the runbook after each drill.

High Availability Patterns

High availability (HA) eliminates single points of failure across every layer of the architecture.

Application Server HA

Deploy at least two iDempiere instances behind a load balancer. The load balancer performs health checks (HTTP health endpoint or TCP connection check) and removes unhealthy instances from the pool. Users on a failed instance will lose their session but can log in again and be routed to a healthy instance.
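Load balancers usually avoid reacting to a single failed probe. The following sketch models that behaviour with consecutive-result thresholds, similar in spirit to the rise and fall settings found in load balancers such as HAProxy. The class name and default values are assumptions for illustration.

```python
class HealthTracker:
    """Track one instance: leave the pool after `fall` consecutive failures,
    rejoin after `rise` consecutive successes."""

    def __init__(self, fall: int = 3, rise: int = 2) -> None:
        self.fall = fall
        self.rise = rise
        self.in_pool = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Feed one health-check result; return whether the instance is in the pool."""
        if check_passed == self.in_pool:
            self._streak = 0
            return self.in_pool
        self._streak += 1
        needed = self.fall if self.in_pool else self.rise
        if self._streak >= needed:
            self.in_pool = check_passed
            self._streak = 0
        return self.in_pool


if __name__ == "__main__":
    tracker = HealthTracker(fall=3, rise=2)
    for result in (False, False, False, True, True):
        print(result, "->", "in pool" if tracker.record(result) else "removed")
```

Thresholds trade detection speed against flapping: a lower fall count removes a dead instance sooner but also ejects instances during a brief garbage-collection pause.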

Database HA

Use PostgreSQL streaming replication with automatic failover. Tools like Patroni (combined with etcd or Consul) provide automated leader election, failover, and replica management. When the primary fails, Patroni promotes a replica to primary and reconfigures the remaining replicas. Detection is bounded by the leader-lock ttl (30 seconds in the excerpt below), so failover typically completes within tens of seconds.

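# Patroni configuration excerpt (patroni.yml)
scope: idempiere-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008

postgresql:
  listen: 0.0.0.0:5432
  data_dir: /var/lib/postgresql/data
  parameters:
    # Node-local settings may stay here
    shared_buffers: 4GB

bootstrap:
  # Applied once when the cluster is initialized; change later with patronictl edit-config
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576   # bytes of WAL a replica may lag and still be promoted
    postgresql:
      parameters:
        # Patroni manages these cluster-wide through the DCS and
        # ignores values set in the node-local section above
        max_connections: 200
        wal_level: replica
        max_wal_senders: 5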

Load Balancer HA

The load balancer itself must not be a single point of failure. Use a pair of load balancers in active-passive or active-active mode with a virtual IP (keepalived/VRRP) or use a cloud-managed load balancer (AWS ALB/NLB, Azure Load Balancer, GCP Cloud Load Balancing) which is inherently redundant.

Storage HA

If iDempiere stores attachments or documents on the file system, use a shared storage solution (NFS with redundancy, AWS EFS, Azure Files, GlusterFS) so that all application instances access the same file store. Alternatively, configure iDempiere to store attachments in the database (the default behavior), which is then protected by database replication.

Capacity Planning

Capacity planning ensures your infrastructure can handle current and projected workloads. The sizing guidelines and performance baselines that follow give you starting points to measure against.

Sizing Guidelines

Component	Small (up to 25 users)	Medium (26-100 users)	Large (more than 100 users)
Application Server CPU	4 cores	8 cores	16+ cores (or multiple instances)
Application Server RAM	8 GB	16 GB	32+ GB per instance
Database Server CPU	4 cores	8 cores	16+ cores
Database Server RAM	8 GB	32 GB	64+ GB
Database Storage	50 GB SSD	200 GB SSD	500+ GB NVMe SSD
JVM Heap Size	2-4 GB	4-8 GB	8-16 GB per instance

These are starting points. Actual requirements depend heavily on the volume of transactions, the complexity of customizations, reporting load, and the number of concurrent (not just named) users.
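The table can be encoded as data so that provisioning scripts and reviews use the same numbers. This is a minimal sketch: the dictionary keys and the `tier_for` helper are hypothetical names, the values are copied from the table, and for the large tier they are minimums ("16+" and so on). It assumes the user count passed in is the number of concurrent users.

```python
# Values from the sizing table; the "large" row lists minimums.
SIZING = {
    "small": {"app_cpu_cores": 4, "app_ram_gb": 8, "db_cpu_cores": 4,
              "db_ram_gb": 8, "db_storage_gb": 50, "jvm_heap_gb": (2, 4)},
    "medium": {"app_cpu_cores": 8, "app_ram_gb": 16, "db_cpu_cores": 8,
               "db_ram_gb": 32, "db_storage_gb": 200, "jvm_heap_gb": (4, 8)},
    "large": {"app_cpu_cores": 16, "app_ram_gb": 32, "db_cpu_cores": 16,
              "db_ram_gb": 64, "db_storage_gb": 500, "jvm_heap_gb": (8, 16)},
}


def tier_for(concurrent_users: int) -> str:
    """Pick the sizing tier for a number of concurrent users."""
    if concurrent_users < 0:
        raise ValueError("user count cannot be negative")
    if concurrent_users <= 25:
        return "small"
    if concurrent_users <= 100:
        return "medium"
    return "large"


if __name__ == "__main__":
    tier = tier_for(60)
    print(tier, SIZING[tier])
```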

Performance Baselines

Establish baselines early in the deployment lifecycle. Measure and record:

Average response time for key transactions (document completion, record saves, report generation)
Database query execution times for critical queries
JVM heap utilization, garbage collection frequency and pause times
Database connection pool utilization
Disk I/O throughput and latency
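Averages hide slow outliers, so a baseline is more useful when it also records percentiles. The sketch below summarizes a list of response times with the standard library; the function name and the choice of p50 and p95 are assumptions, and the sample data is synthetic.

```python
import statistics


def summarize_latencies(samples_ms: list[float]) -> dict[str, float]:
    """Return mean, median (p50), 95th percentile, and max of response times."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; index 49 is p50 and index 94 is p95.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "max": max(samples_ms),
    }


if __name__ == "__main__":
    # Synthetic document-completion timings in milliseconds
    timings = [120, 135, 128, 140, 131, 125, 980, 133, 129, 138]
    print(summarize_latencies(timings))
```

Recording the same summary after each release or configuration change gives a like-for-like comparison against the baseline.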

Monitoring and Alerting

Production iDempiere systems require comprehensive monitoring at every layer.

Application Monitoring

JMX metrics: iDempiere exposes JVM metrics via JMX (Java Management Extensions). Use tools like Prometheus with JMX Exporter to collect heap usage, thread counts, GC statistics, and class loading data.
Application logs: Centralize iDempiere logs using the ELK stack (Elasticsearch, Logstash, Kibana) or a managed log service. Monitor for error patterns, slow queries, and authentication failures.
Health endpoints: Implement a custom health check servlet or plugin that verifies database connectivity, cache state, and scheduler status, and expose it at a known URL for load balancer health checks.

Database Monitoring

pg_stat_statements: Enable this PostgreSQL extension to track query execution statistics. Identify the slowest and most frequently executed queries for optimization.
Replication lag: Monitor the delay between primary and replicas. Alerting thresholds should be set based on your RPO — if lag exceeds the acceptable data loss window, trigger an alert.
Connection counts: Alert when connection usage approaches max_connections. Connection exhaustion is a common cause of outages.
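Replication lag is often measured in bytes of WAL by comparing log sequence numbers (LSNs). The sketch below assumes the two LSN strings have already been read, for example from `pg_current_wal_lsn()` on the primary and the `replay_lsn` column of `pg_stat_replication`. The helper names and the 1 MiB threshold are illustrative. Byte lag is only a proxy for the time-based RPO, so the threshold needs calibrating against your write rate.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay (never negative)."""
    return max(0, lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn))


def lag_alert(primary_lsn: str, replica_lsn: str, threshold_bytes: int) -> bool:
    """True when the replica is further behind than the threshold allows."""
    return replication_lag_bytes(primary_lsn, replica_lsn) > threshold_bytes


if __name__ == "__main__":
    lag = replication_lag_bytes("0/3000060", "0/3000000")
    print(lag, lag_alert("0/3000060", "0/3000000", threshold_bytes=1_048_576))
```

PostgreSQL can do the same subtraction server-side with `pg_wal_lsn_diff`; computing it in the monitoring layer is useful when the two values come from different servers.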

Infrastructure Monitoring

Use tools like Prometheus + Grafana, Datadog, or cloud-native monitoring (CloudWatch, Azure Monitor, GCP Cloud Monitoring) to track CPU, memory, disk, and network metrics across all servers. Set alerts for resource utilization thresholds (e.g., CPU above 80% sustained for 5 minutes, disk usage above 85%).
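A "sustained" threshold means the metric stayed above the limit for the whole window, not that it spiked once. Monitoring tools implement this for you; the following sketch only illustrates the logic, with a hypothetical function name and samples given as (timestamp in seconds, value) pairs in ascending time order.

```python
def sustained_breach(
    samples: list[tuple[float, float]], threshold: float, window_seconds: float
) -> bool:
    """True if the latest run of above-threshold samples spans the whole window."""
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp
        else:
            breach_start = None  # any dip below the threshold resets the run
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= window_seconds


if __name__ == "__main__":
    cpu = [(0, 85), (60, 90), (120, 88), (180, 91), (240, 86), (300, 83)]
    print(sustained_breach(cpu, threshold=80, window_seconds=300))
```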

Cloud Deployment Considerations

Deploying iDempiere on cloud platforms offers elasticity, managed services, and global availability, but requires adaptation of traditional deployment practices.

Compute Options

Virtual machines (EC2, Azure VMs, GCE): The most straightforward migration path. Run iDempiere on VMs just as you would on bare metal, but with the ability to resize, snapshot, and auto-scale.
Containers (Docker + Kubernetes): Containerizing iDempiere enables consistent deployments, rolling updates, and orchestration. The iDempiere community maintains Docker images. Kubernetes can manage scaling, health checks, and self-healing. However, the stateful nature of iDempiere sessions requires careful configuration of persistent storage and session affinity.

Managed Database Services

Use cloud-managed PostgreSQL (AWS RDS, Azure Database for PostgreSQL, GCP Cloud SQL) to offload backup, patching, replication, and failover management. These services provide automated point-in-time recovery, read replicas, and high availability with minimal operational effort. Be aware of version compatibility — ensure the managed service supports the PostgreSQL version required by your iDempiere release.

Infrastructure as Code

Define your entire iDempiere infrastructure in code using tools like Terraform, AWS CloudFormation, or Pulumi. Infrastructure as code provides:

Reproducibility: Spin up identical environments for development, staging, and production from the same templates.
Version control: Track infrastructure changes in Git alongside application code.
Disaster recovery: Rebuild the entire infrastructure from code in a new region or account if needed.

# Terraform example: PostgreSQL RDS instance for iDempiere
resource "aws_db_instance" "idempiere_db" {
  identifier        = "idempiere-production"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.r6g.xlarge"
  allocated_storage = 200
  storage_type      = "gp3"
  storage_encrypted = true

  db_name  = "idempiere"
  username = "adempiere"
  password = var.db_password # supply via a sensitive variable, never hard-code

  multi_az                = true
  backup_retention_period = 14

  # Guard against accidental removal and keep a snapshot if the instance is destroyed
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "idempiere-production-final"

  vpc_security_group_ids = [aws_security_group.db_sg.id]
  db_subnet_group_name   = aws_db_subnet_group.db_subnets.name

  parameter_group_name = aws_db_parameter_group.idempiere_params.name

  tags = {
    Environment = "production"
    Application = "iDempiere"
  }
}

Putting It All Together: Reference Architecture

A production-ready iDempiere architecture for a medium-to-large deployment might look like this:

Load balancer tier: Cloud-managed load balancer (e.g., AWS ALB) with SSL termination and sticky sessions, distributing traffic across two or more iDempiere instances.
Application tier: Two or more iDempiere instances running on VMs or containers, each with 16 GB heap, configured with shared attachment storage (EFS/NFS) and one designated scheduler leader.
Database tier: Managed PostgreSQL (RDS Multi-AZ) with automatic failover, 64 GB RAM, provisioned IOPS storage, and a read replica for reporting workloads.
Monitoring: Prometheus + Grafana dashboards for application, database, and infrastructure metrics, with PagerDuty or Opsgenie for alerting.
Backup: Automated database snapshots with 14-day retention and point-in-time recovery (on RDS this comes from automated backups; a self-managed PostgreSQL would archive WAL to S3 instead), nightly file system backups, and 2Pack exports of AD customizations stored in version control.
DR: Cross-region read replica for database, infrastructure-as-code templates ready to deploy in the DR region, documented and tested runbook.

Summary

Designing an iDempiere architecture for production requires decisions across multiple dimensions: multi-tenancy model, scaling strategy, clustering approach, replication topology, and disaster recovery posture. The key principles are:

Eliminate single points of failure at every layer.
Separate read and write workloads where possible.
Automate everything — backups, failover, monitoring, and infrastructure provisioning.
Test your disaster recovery plan regularly.
Right-size for today but architect for growth.

In the next lesson, we will explore integration patterns and middleware — how to connect your iDempiere deployment with external systems using EDI, message queues, and API-driven architectures.

繁體中文翻譯

概述

您將學到：
- 如何使用多租戶架構、叢集和高可用性模式設計可擴展的 iDempiere 部署
- 如何為生產環境實施資料庫複寫、負載平衡和災難復原策略
- 如何規劃容量、監控系統健康狀態，以及使用基礎設施即程式碼在雲端平台部署 iDempiere
先修條件：第 1-39 課（完成初學者、中級和進階路徑），具備 iDempiere 生產環境管理經驗
預估閱讀時間：25 分鐘

導論

成功的 iDempiere 實施最終會面臨一個關鍵轉折點：系統必須擴展到超越服務少量用戶的單一伺服器。無論您是將額外的業務實體作為租戶加入、擴展到新的地理區域，還是單純為增長做準備，您在此階段做出的架構決策將決定未來數年的系統可靠性、效能和營運成本。

本課涵蓋 iDempiere 系統架構設計的完整範圍——從多租戶模型和叢集策略到資料庫複寫、災難復原和雲端部署。目標是讓您具備設計符合企業級可用性和可擴展性要求的 iDempiere 部署的知識。

多租戶架構模式

iDempiere 內建的用戶端/組織模型在應用程式層級提供原生的多租戶功能。資料庫中的每筆記錄都帶有 AD_Client_ID 和 AD_Org_ID 欄位，安全框架在用戶端之間強制執行嚴格的資料隔離。然而，多租戶部署有多種架構方式，每種都有不同的取捨。

單一實例，多用戶端

這是 iDempiere 的原生多租戶模型。單一應用程式伺服器和單一資料庫託管多個用戶端。每個用戶端擁有自己的會計科目表、業務夥伴、產品和交易資料，全部儲存在相同的資料庫表中，但透過 AD_Client_ID 隔離。

優點：最低的基礎設施成本、最簡單的管理、維護單一程式碼庫和外掛集、共享系統層級配置（參考清單、國家、幣別）。
缺點：一個用戶端的繁重工作負載可能影響所有用戶端的效能（吵鬧鄰居問題）。資料庫結構變更同時適用於所有用戶端。升級和維護時段影響所有人。
適用於：單一母公司下的相關業務實體、服務中小型租戶的 SaaS 供應商、開發和暫存環境。

多個實例，獨立資料庫

每個租戶獲得自己的 iDempiere 應用程式伺服器和自己的資料庫。實例完全獨立，在應用程式層級不共享任何東西。

優點：完全的工作負載隔離、獨立的升級排程、租戶特定的自訂和外掛版本、無吵鬧鄰居風險、更簡單的逐租戶備份和還原。
缺點：較高的基礎設施成本、更多的營運複雜度（N 個實例需要監控、修補和升級）、租戶之間無共享資料（除非有額外整合）。
適用於：擁有獨立業務單位的大型企業、託管不相關公司的受管服務供應商、要求嚴格資料分離的法規環境。

混合方式

將相關租戶分組到共享實例中，同時將大型或高安全性租戶放在專用實例上。這在成本效率和隔離要求之間取得平衡。使用資源配額、連線池限制和監控來管理共享實例。

水平擴展策略

垂直擴展（向單一伺服器添加更多 CPU、RAM 和更快的儲存）有其限制。水平擴展將工作負載分散到多台伺服器上。對於 iDempiere，水平擴展需要仔細考慮應用程式的有狀態特性。

應用程式層級擴展

iDempiere 在應用程式伺服器上維護使用者會話狀態。這意味著您不能簡單地將多個 iDempiere 實例放在循環負載平衡器後面——使用者的請求必須一致地到達相同的實例（會話親和性/黏性會話）。有兩種主要方法：

使用負載平衡器的黏性會話：配置您的負載平衡器（HAProxy、Nginx、AWS ALB）將指定會話的所有請求路由到相同的後端實例。這是最簡單的方法，但限制了容錯移轉——如果一個實例死亡，該實例上的活動會話將丟失。
外部化會話存儲：將會話資料儲存在共享外部存儲（Redis、Memcached 或資料庫）中，使任何實例都可以服務任何請求。這實現了真正的無狀態擴展，但需要對 iDempiere 的會話管理層進行自訂修改，這是一項不小的工作。

資料庫層級擴展

資料庫通常是比應用程式伺服器更早的瓶頸。資料庫擴展策略包括：

唯讀副本：將讀取密集的查詢（報表、儀表板、查詢）路由到副本資料庫，寫入操作則到主資料庫。這需要應用程式層級的路由，或支援讀寫分離的資料庫代理（例如 Pgpool-II）。PgBouncer 只提供連線池，不會依查詢類型分流。
連線池：在 PostgreSQL 前面使用 PgBouncer 或 PgPool-II 來有效管理連線限制。iDempiere 可能消耗許多資料庫連線，特別是在高並發情況下。
表分區：對於非常大的交易表（C_Order、C_Invoice、Fact_Acct），PostgreSQL 按日期範圍或用戶端的表分區可以顯著提高查詢效能和維護操作。

iDempiere 實例叢集

叢集多個 iDempiere 實例提供高可用性和增加的容量。有兩種基本的叢集模型。

共享資料庫叢集

多個 iDempiere 應用程式伺服器實例連接到同一個資料庫。每個實例處理一部分使用者，資料庫是唯一的真實來源。這是 iDempiere 最常見的叢集模型。

架構：負載平衡器（帶黏性會話）在 N 個 iDempiere 實例前面，全部指向一個 PostgreSQL 主資料庫。
注意事項：資料庫連線限制必須容納所有實例。必須處理跨實例的快取失效——當一個實例修改了快取資料（如應用程式字典元資料），其他實例可能在下次重新整理週期之前提供過期快取。應用程式層級的排程器（如會計處理器、工作流程處理器）必須只在一個實例上執行，以避免重複處理。
排程器協調：指定一個實例作為排程器領導者，或使用分散式鎖機制（資料庫諮詢鎖或 Apache ZooKeeper 等工具）確保只有一個實例執行排程流程。

無共享叢集

每個 iDempiere 實例有自己的資料庫，租戶在實例間分區。在應用程式或資料庫層級，實例之間沒有共享狀態。

架構：路由層（基於 DNS 或基於路徑路由的反向代理）將每個租戶導向其指定的實例。
注意事項：更容易理解（無跨實例快取問題）、更容易漸進式擴展（為新租戶添加新實例），但需要管理層來處理租戶的佈建和路由。

資料庫複寫

資料庫複寫是高可用性和讀取擴展的基礎。PostgreSQL 提供兩種主要的複寫機制。

串流複寫（實體複寫）

PostgreSQL 串流複寫在 WAL（預寫日誌）層級複製整個資料庫叢集。備援伺服器是主伺服器的逐位元組副本。

# 主伺服器上的 postgresql.conf
wal_level = replica
max_wal_senders = 5
wal_keep_size = '1GB'

# 啟用歸檔以進行時間點復原
archive_mode = on
archive_command = 'cp %p /var/lib/postgresql/archive/%f'

# 在備援伺服器上，使用 pg_basebackup 初始化
pg_basebackup -h primary-host -D /var/lib/postgresql/data -U replicator -P -R

# -R 旗標自動建立 standby.signal 並在
# postgresql.auto.conf 中設定 primary_conninfo

同步與非同步：非同步複寫對主伺服器的效能影響最小，但如果主伺服器故障則允許小範圍的資料丟失。同步複寫保證零資料丟失，但為每個寫入交易增加延遲（主伺服器等待備援確認接收）。
適用於：高可用性容錯移轉、報表用唯讀副本、災難復原。

邏輯複寫

邏輯複寫將 WAL 變更解碼為邏輯操作（INSERT、UPDATE、DELETE）並套用到訂閱者。與串流複寫不同，訂閱者可以擁有不同的結構、索引，甚至是不同的 PostgreSQL 版本。

-- 在發布者上
CREATE PUBLICATION idempiere_pub FOR ALL TABLES;

-- 在訂閱者上
CREATE SUBSCRIPTION idempiere_sub
    CONNECTION 'host=primary-host dbname=idempiere user=replicator'
    PUBLICATION idempiere_pub;

適用於：選擇性表複寫、跨版本升級、資料倉儲饋送、特定報表需求的部分副本。

報表用唯讀副本

報表工作負載（JasperReports、自訂 SQL 報表、BI 工具）可能極度耗費資源。將它們導向唯讀副本可防止報表執行降低交易使用者的效能。實施方法包括：

在 iDempiere 中配置指向副本的獨立 JDBC 連線池，並將報表流程路由到使用此池。
使用具有讀寫分離規則的資料庫代理。
配置 BI 工具（Metabase、Apache Superset、Pentaho）直接連接到副本。

災難復原規劃

災難復原（DR）對生產 ERP 系統不是可選的。全面的 DR 計畫涵蓋資料備份、基礎設施冗餘和文件化的復原程序。

備份策略

穩健的備份策略包括多個層面：

資料庫備份：使用 pg_dump 進行邏輯備份（可攜帶、可選擇性）和 pg_basebackup 進行實體備份（大型資料庫的還原更快）。兩者都安排每日執行。啟用 WAL 歸檔以進行時間點復原（PITR）——這讓您可以還原到任何時間點，而不僅是最後一次備份。
檔案系統備份：備份 iDempiere 安裝目錄（外掛、配置檔、自訂報表）、附件儲存和任何外部文件庫。使用增量備份工具如 rsync 或雲端原生快照。
配置備份：匯出關鍵應用程式字典自訂的 2Pack 套件。在 Git 中版本控制所有外掛原始碼、遷移腳本和部署配置。

復原時間和復原點目標

明確定義您的目標：

RPO（復原點目標）：可接受多少資料丟失？透過同步串流複寫和 WAL 歸檔，RPO 可接近零。僅使用每日備份，RPO 最多 24 小時。
RTO（復原時間目標）：故障後系統必須多快恢復運作？這決定了您是否需要熱備援（分鐘）、暖備援（數十分鐘），還是從備份冷還原（小時）。

DR 測試

未經測試的災難復原計畫不是計畫——而是希望。安排定期 DR 演練：

將資料庫備份還原到獨立伺服器並驗證資料完整性。
練習容錯移轉到串流副本並確認應用程式正常運作。
計時整個復原過程並與 RTO 目標比較。
記錄確切步驟並在每次演練後更新操作手冊。

高可用性模式

高可用性（HA）消除架構每一層的單點故障。

應用程式伺服器 HA

在負載平衡器後面部署至少兩個 iDempiere 實例。負載平衡器執行健康檢查（HTTP 健康端點或 TCP 連線檢查），並從池中移除不健康的實例。故障實例上的使用者將丟失會話，但可以重新登入並被路由到健康的實例。

資料庫 HA

使用帶有自動容錯移轉的 PostgreSQL 串流複寫。Patroni（結合 etcd 或 Consul）等工具提供自動化的領導者選舉、容錯移轉和副本管理。當主伺服器故障時，Patroni 將副本提升為主伺服器並重新配置剩餘副本——通常在數秒內完成。

# Patroni 配置摘錄（patroni.yml）
scope: idempiere-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008

postgresql:
  listen: 0.0.0.0:5432
  data_dir: /var/lib/postgresql/data
  parameters:
    max_connections: 200
    shared_buffers: 4GB
    wal_level: replica
    max_wal_senders: 5

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576

負載平衡器 HA

負載平衡器本身不能成為單點故障。使用一對負載平衡器以主動-被動或主動-主動模式搭配虛擬 IP（keepalived/VRRP），或使用具有內建冗餘的雲端管理負載平衡器（AWS ALB/NLB、Azure Load Balancer、GCP Cloud Load Balancing）。

儲存 HA

如果 iDempiere 在檔案系統上儲存附件或文件，使用共享儲存解決方案（具有冗餘的 NFS、AWS EFS、Azure Files、GlusterFS），使所有應用程式實例存取相同的檔案儲存。或者，配置 iDempiere 在資料庫中儲存附件（預設行為），這將受到資料庫複寫的保護。

容量規劃

容量規劃確保您的基礎設施能夠處理當前和預期的工作負載。需要調整的關鍵指標包括：

容量建議

元件	小型（最多 25 位使用者）	中型（25-100 位使用者）	大型（100+ 位使用者）
應用程式伺服器 CPU	4 核心	8 核心	16+ 核心（或多個實例）
應用程式伺服器 RAM	8 GB	16 GB	每個實例 32+ GB
資料庫伺服器 CPU	4 核心	8 核心	16+ 核心
資料庫伺服器 RAM	8 GB	32 GB	64+ GB
資料庫儲存	50 GB SSD	200 GB SSD	500+ GB NVMe SSD
JVM 堆積大小	2-4 GB	4-8 GB	每個實例 8-16 GB

這些是起始點。實際需求很大程度上取決於交易量、自訂的複雜度、報表負載和並發（不僅是具名的）使用者數量。

效能基線

在部署生命週期早期建立基線。測量和記錄：

關鍵交易的平均回應時間（單據完成、記錄儲存、報表產生）
關鍵查詢的資料庫查詢執行時間
JVM 堆積使用率、垃圾收集頻率和暫停時間
資料庫連線池使用率
磁碟 I/O 吞吐量和延遲

監控和告警

生產 iDempiere 系統需要在每一層進行全面監控。

應用程式監控

JMX 指標：iDempiere 透過 JMX（Java 管理擴充功能）公開 JVM 指標。使用 Prometheus 搭配 JMX Exporter 等工具來收集堆積使用量、執行緒數、GC 統計和類別載入資料。
應用程式日誌：使用 ELK 堆疊（Elasticsearch、Logstash、Kibana）或受管日誌服務集中 iDempiere 日誌。監控錯誤模式、慢查詢和認證失敗。
健康端點：實作自訂健康檢查 Servlet 或外掛，驗證資料庫連線、快取狀態和排程器狀態，並在已知 URL 公開供負載平衡器健康檢查使用。

資料庫監控

pg_stat_statements：啟用此 PostgreSQL 擴充功能以追蹤查詢執行統計。識別最慢和最頻繁執行的查詢進行優化。
複寫延遲：監控主伺服器和副本之間的延遲。告警閾值應基於您的 RPO 設定——如果延遲超過可接受的資料丟失窗口，觸發告警。
連線數：當連線使用量接近 max_connections 時告警。連線耗盡是停機的常見原因。

基礎設施監控

使用 Prometheus + Grafana、Datadog 或雲端原生監控（CloudWatch、Azure Monitor、GCP Cloud Monitoring）等工具追蹤所有伺服器的 CPU、記憶體、磁碟和網路指標。為資源使用率閾值設定告警（例如 CPU 持續超過 80% 達 5 分鐘、磁碟使用率超過 85%）。

雲端部署注意事項

在雲端平台上部署 iDempiere 提供彈性、受管服務和全球可用性，但需要調整傳統的部署實務。

運算選項

虛擬機器（EC2、Azure VMs、GCE）：最直接的遷移路徑。如同在裸機上一樣在 VM 上執行 iDempiere，但具有調整大小、快照和自動擴展的能力。
容器（Docker + Kubernetes）：將 iDempiere 容器化可實現一致的部署、滾動更新和編排。iDempiere 社群維護 Docker 映像。Kubernetes 可以管理擴展、健康檢查和自我修復。然而，iDempiere 會話的有狀態特性需要仔細配置持久儲存和會話親和性。

受管資料庫服務

使用雲端受管 PostgreSQL（AWS RDS、Azure Database for PostgreSQL、GCP Cloud SQL）來卸載備份、修補、複寫和容錯移轉管理。這些服務以最少的營運工作提供自動化的時間點復原、唯讀副本和高可用性。注意版本相容性——確保受管服務支援您的 iDempiere 版本所需的 PostgreSQL 版本。

基礎設施即程式碼

使用 Terraform、AWS CloudFormation 或 Pulumi 等工具在程式碼中定義整個 iDempiere 基礎設施。基礎設施即程式碼提供：

可重現性：從相同的模板建立開發、暫存和生產環境的相同副本。
版本控制：在 Git 中與應用程式碼一起追蹤基礎設施變更。
災難復原：在需要時從程式碼在新的區域或帳戶中重建整個基礎設施。

# Terraform 範例：iDempiere 的 PostgreSQL RDS 實例
resource "aws_db_instance" "idempiere_db" {
  identifier           = "idempiere-production"
  engine               = "postgres"
  engine_version       = "15.4"
  instance_class       = "db.r6g.xlarge"
  allocated_storage    = 200
  storage_type         = "gp3"

  db_name              = "idempiere"
  username             = "adempiere"
  password             = var.db_password

  multi_az             = true
  backup_retention_period = 14

  vpc_security_group_ids = [aws_security_group.db_sg.id]
  db_subnet_group_name   = aws_db_subnet_group.db_subnets.name

  parameter_group_name = aws_db_parameter_group.idempiere_params.name

  tags = {
    Environment = "production"
    Application = "iDempiere"
  }
}

整合在一起：參考架構

中大型部署的生產就緒 iDempiere 架構可能如下：

負載平衡器層：雲端管理的負載平衡器（例如 AWS ALB），帶有 SSL 終止和黏性會話，將流量分散到兩個或更多 iDempiere 實例。
應用程式層：兩個或更多 iDempiere 實例在 VM 或容器上運行，每個配置 16 GB 堆積，配置共享附件儲存（EFS/NFS），並指定一個排程器領導者。
資料庫層：受管 PostgreSQL（RDS Multi-AZ），帶自動容錯移轉、64 GB RAM、佈建的 IOPS 儲存，以及用於報表工作負載的唯讀副本。
監控：Prometheus + Grafana 儀表板用於應用程式、資料庫和基礎設施指標，搭配 PagerDuty 或 Opsgenie 進行告警。
備份：自動化資料庫快照，14 天保留期，WAL 歸檔到 S3 用於 PITR，每晚檔案系統備份，以及儲存在版本控制中的 AD 自訂 2Pack 匯出。
DR：跨區域的資料庫唯讀副本，準備在 DR 區域部署的基礎設施即程式碼模板，已記錄並測試的操作手冊。

摘要

為生產設計 iDempiere 架構需要在多個維度上做決策：多租戶模型、擴展策略、叢集方法、複寫拓撲和災難復原態勢。關鍵原則是：

消除每一層的單點故障。
盡可能分離讀取和寫入工作負載。
自動化一切——備份、容錯移轉、監控和基礎設施佈建。
定期測試您的災難復原計畫。
為今天調整大小，但為增長設計架構。

在下一課中，我們將探討整合模式和中介軟體——如何使用 EDI、訊息佇列和 API 驅動架構將您的 iDempiere 部署與外部系統連接。

日本語翻訳

概要

学習内容：
- マルチテナントアーキテクチャ、クラスタリング、高可用性パターンを使用してスケーラブルなiDempiere展開を設計する方法
- 本番環境のためのデータベースレプリケーション、ロードバランシング、災害復旧戦略の実装方法
- キャパシティプランニング、システムヘルスの監視、Infrastructure as Codeを使用したクラウドプラットフォームへのiDempiere展開方法
前提条件：第1-39課（初級、中級、上級パスの完了）、iDempiere本番環境の管理経験
推定読了時間：25分

はじめに

成功したiDempiere導入は最終的に重要な転換点に直面します：システムは少数のユーザーに対応する単一サーバーを超えて拡張しなければなりません。追加の事業体をテナントとしてオンボーディングする場合も、新しい地域に拡大する場合も、単に成長に備える場合も、この段階で行うアーキテクチャの決定が今後数年にわたるシステムの信頼性、パフォーマンス、運用コストを決定します。

本レッスンでは、iDempiereのシステムアーキテクチャ設計の全範囲をカバーします——マルチテナントモデルとクラスタリング戦略からデータベースレプリケーション、災害復旧、クラウド展開まで。目標は、エンタープライズグレードの可用性とスケーラビリティ要件を満たすiDempiere展開を設計するための知識を身につけることです。

マルチテナントアーキテクチャパターン

iDempiereの組み込みクライアント/組織モデルはアプリケーションレベルでネイティブなマルチテナンシーを提供します。データベースのすべてのレコードはAD_Client_IDとAD_Org_IDカラムを持ち、セキュリティフレームワークがクライアント間の厳格なデータ隔離を強制します。ただし、マルチテナント展開のアーキテクチャには複数の方法があり、それぞれ異なるトレードオフがあります。

単一インスタンス、マルチクライアント

これはiDempiereのネイティブなマルチテナンシーモデルです。単一のアプリケーションサーバーと単一のデータベースが複数のクライアントをホストします。各クライアントは独自の勘定科目表、取引先、製品、トランザクションデータを持ち、すべて同じデータベーステーブルに格納されますがAD_Client_IDで隔離されています。

利点：最低のインフラコスト、最もシンプルな管理、維持する単一のコードベースとプラグインセット、共有のシステムレベル設定（参照リスト、国、通貨）。
欠点：一つのクライアントの重いワークロードがすべてのクライアントのパフォーマンスに影響する可能性（ノイジーネイバー問題）。データベーススキーマの変更がすべてのクライアントに同時に適用される。アップグレードとメンテナンスウィンドウが全員に影響。
適用先：単一の親会社の下の関連事業体、中小規模のテナントを持つSaaSプロバイダー、開発・ステージング環境。

複数インスタンス、個別データベース

各テナントが独自のiDempiereアプリケーションサーバーと独自のデータベースを取得します。インスタンスは完全に独立しており、アプリケーションレベルで何も共有しません。

利点：完全なワークロード隔離、独立したアップグレードスケジュール、テナント固有のカスタマイズとプラグインバージョン、ノイジーネイバーリスクなし、テナントごとのシンプルなバックアップとリストア。
欠点：より高いインフラコスト、より多くの運用複雑性（N個のインスタンスを監視、パッチ、アップグレード）、追加の統合なしではテナント間でデータ共有不可。
適用先：独立した事業部門を持つ大企業、無関係の企業をホストするマネージドサービスプロバイダー、厳格なデータ分離を要求する規制環境。

ハイブリッドアプローチ

関連テナントを共有インスタンスにグループ化し、大規模またはセキュリティの高いテナントは専用インスタンスに配置します。これによりコスト効率と隔離要件のバランスを取ります。リソースクォータ、接続プーリング制限、監視を使用して共有インスタンスを管理します。

水平スケーリング戦略

垂直スケーリング（単一サーバーへのCPU、RAM、高速ストレージの追加）には限界があります。水平スケーリングはワークロードを複数のサーバーに分散させます。iDempiereの水平スケーリングには、アプリケーションのステートフルな性質を慎重に考慮する必要があります。

アプリケーションレベルのスケーリング

iDempiereはアプリケーションサーバー上でユーザーセッション状態を維持します。つまり、複数のiDempiereインスタンスをラウンドロビンロードバランサーの背後に単純に配置することはできません——ユーザーのリクエストは一貫して同じインスタンスに到達する必要があります（セッションアフィニティ/スティッキーセッション）。二つの主要なアプローチがあります：

ロードバランサーによるスティッキーセッション：ロードバランサー（HAProxy、Nginx、AWS ALB）を設定して、特定のセッションからのすべてのリクエストを同じバックエンドインスタンスにルーティングします。これは最もシンプルなアプローチですが、フェイルオーバーが制限されます——インスタンスが停止すると、そのインスタンス上のアクティブセッションは失われます。
外部化セッションストア：セッションデータを共有外部ストア（Redis、Memcached、またはデータベース）に格納し、任意のインスタンスが任意のリクエストを処理できるようにします。これにより真のステートレススケーリングが可能になりますが、iDempiereのセッション管理レイヤーへのカスタム修正が必要で、これは軽微な作業ではありません。

データベースレベルのスケーリング

データベースはアプリケーションサーバーより先にボトルネックになることが多いです。データベースのスケーリング戦略には以下が含まれます：

リードレプリカ：読み取りが多いクエリ（レポート、ダッシュボード、ルックアップ）をレプリカデータベースにルーティングし、書き込みはプライマリに送ります。これにはアプリケーションレベルのルーティング、または読み書き分離に対応したデータベースプロキシ（Pgpool-IIなど）が必要です。PgBouncerは接続プーリングのみを行い、クエリの種類による振り分けは行いません。
接続プーリング：PostgreSQLの前にPgBouncerまたはPgPool-IIを使用して接続制限を効率的に管理します。iDempiereは特に高い同時実行性の下で多くのデータベース接続を消費する可能性があります。
テーブルパーティショニング：非常に大きなトランザクションテーブル（C_Order、C_Invoice、Fact_Acct）に対して、PostgreSQLの日付範囲またはクライアントによるテーブルパーティショニングがクエリパフォーマンスとメンテナンス操作を劇的に改善できます。

Clustering iDempiere Instances

Clustering multiple iDempiere instances provides both high availability and increased capacity. There are two basic clustering models.

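Shared-Database Clustering

Multiple iDempiere application server instances connect to the same database. Each instance handles a subset of users, and the database is the single source of truth. This is the most common clustering model for iDempiere.

Architecture: A load balancer (with sticky sessions) in front of N iDempiere instances, all pointing to one PostgreSQL primary.
Considerations: The database connection limit must accommodate all instances. Cache invalidation across instances must be addressed: when one instance changes cached data (such as Application Dictionary metadata), the other instances may serve stale cache entries until their next refresh cycle. Application-level schedulers (Accounting Processor, Workflow Processor, and so on) must run on only one instance to avoid duplicate processing.
Scheduler coordination: Designate one instance as the scheduler leader, or use a distributed locking mechanism (database advisory locks or a tool such as Apache ZooKeeper) to ensure that only one instance runs scheduled processes.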
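
The advisory-lock option for scheduler coordination can be sketched as follows. PostgreSQL’s pg_try_advisory_lock(bigint) is a real function; everything else here is an assumption for illustration: the helper names lock_key and try_become_leader, the lock name "idempiere-scheduler", and a DB-API connection that uses the %s parameter style (as psycopg2 does). iDempiere itself does not ship this logic.

```python
import zlib


def lock_key(name: str) -> int:
    """Derive a stable integer key for the advisory lock from a readable name."""
    # crc32 yields an unsigned 32-bit value, which fits in PostgreSQL's bigint.
    return zlib.crc32(name.encode("utf-8"))


def try_become_leader(conn, name: str = "idempiere-scheduler") -> bool:
    """Return True if this database session acquired the scheduler lock.

    `conn` is assumed to be an open DB-API connection. The lock is held at
    session level, so the caller must keep this connection open for as long
    as it acts as leader; closing it releases the lock.
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT pg_try_advisory_lock(%s)", (lock_key(name),))
        return bool(cur.fetchone()[0])
    finally:
        cur.close()
```

Each instance would call try_become_leader at startup (and periodically afterwards) and start its schedulers only when the call returns True. If the leader crashes, its database session ends, the lock is released, and another instance can take over on its next attempt.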

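Shared-Nothing Clustering

Each iDempiere instance has its own database, and tenants are partitioned across instances. There is no shared state between instances at the application or database level.

Architecture: A routing layer (DNS-based, or a reverse proxy with path-based routing) directs each tenant to its assigned instance.
Considerations: Easier to reason about (no cross-instance cache issues) and easy to scale incrementally (add a new instance for new tenants), but it requires a management layer to handle tenant provisioning and routing.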
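
As a rough illustration of the routing layer, the sketch below resolves a tenant from the first DNS label of the request host and looks up its assigned instance. The host naming scheme, the tenant names, the backend URLs, and the function name resolve_instance are all hypothetical; a real deployment would usually express the same mapping in reverse proxy or DNS configuration.

```python
def resolve_instance(host: str, tenant_map: dict[str, str]) -> str:
    """Return the backend URL for the tenant named by the first DNS label."""
    # Strip an optional port, then take the left-most label as the tenant key.
    tenant = host.split(":", 1)[0].split(".", 1)[0].lower()
    try:
        return tenant_map[tenant]
    except KeyError:
        raise LookupError(f"unknown tenant: {tenant}") from None


if __name__ == "__main__":
    # Hypothetical tenant-to-instance assignments.
    tenants = {
        "acme": "http://10.0.1.10:8080",
        "globex": "http://10.0.2.10:8080",
    }
    print(resolve_instance("acme.erp.example.com:443", tenants))
```

Provisioning a new tenant then amounts to creating its instance and database and adding one entry to the mapping, which is the job of the management layer mentioned above.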

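Database Replication

Database replication is the foundation for both high availability and read scaling. PostgreSQL provides two main replication mechanisms.

Streaming Replication (Physical Replication)

PostgreSQL streaming replication copies the entire database cluster at the WAL (Write-Ahead Log) level. The standby server is an exact byte-for-byte replica of the primary.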

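# postgresql.conf on the primary
wal_level = replica
max_wal_senders = 5
wal_keep_size = '1GB'    # requires PostgreSQL 13 or later

# Enable archiving for point-in-time recovery
archive_mode = on
# Refuse to overwrite a WAL file that is already in the archive
archive_command = 'test ! -f /var/lib/postgresql/archive/%f && cp %p /var/lib/postgresql/archive/%f'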

# On the standby, initialize the data directory with pg_basebackup
pg_basebackup -h primary-host -D /var/lib/postgresql/data -U replicator -P -R

# The -R flag creates standby.signal and writes
# primary_conninfo to postgresql.auto.conf automatically

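Synchronous vs. asynchronous: Asynchronous replication has minimal performance impact on the primary but allows a small window of data loss if the primary fails. Synchronous replication guarantees zero loss of committed transactions but adds latency to every write transaction (the primary waits for the standby to confirm receipt).
Best for: High-availability failover, read replicas for reporting, and disaster recovery.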

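Logical Replication

Logical replication decodes WAL changes into logical operations (INSERT, UPDATE, DELETE) and applies them on a subscriber. Unlike streaming replication, the subscriber can have a different schema, different indexes, and even a different PostgreSQL version.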

-- On the publisher (requires wal_level = logical)
CREATE PUBLICATION idempiere_pub FOR ALL TABLES;

-- On the subscriber (the tables must already exist there)
CREATE SUBSCRIPTION idempiere_sub
    CONNECTION 'host=primary-host dbname=idempiere user=replicator'
    PUBLICATION idempiere_pub;

Best for: Selective table replication, cross-version upgrades, feeding a data warehouse, and partial replicas for specific reporting needs.

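Read Replicas for Reporting

Reporting workloads (JasperReports, custom SQL reports, BI tools) can be extremely resource-intensive. Directing them to a read replica prevents report execution from degrading performance for transactional users. Implementation approaches include:

Configure a separate JDBC connection pool in iDempiere that points to the replica, and route report processes to use this pool.
Use a database proxy with read/write splitting rules.
Configure BI tools (Metabase, Apache Superset, Pentaho) to connect directly to the replica.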

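Disaster Recovery Planning

Disaster recovery (DR) is not optional for a production ERP system. A comprehensive DR plan covers data backups, infrastructure redundancy, and documented recovery procedures.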

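Backup Strategies

A robust backup strategy includes multiple layers:

Database backups: Use pg_dump for logical backups (portable and selective) and pg_basebackup for physical backups (faster to restore for large databases). Schedule both daily. Enable WAL archiving for point-in-time recovery (PITR), which lets you restore to any point in time rather than only to the last backup.
File system backups: Back up the iDempiere installation directory (plugins, configuration files, custom reports), attachment storage, and any external document repositories. Use incremental backup tools such as rsync or cloud-native snapshots.
Configuration backups: Export 2Pack packages of critical Application Dictionary customizations. Keep all plugin source code, migration scripts, and deployment configuration under version control in Git.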

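Recovery Time and Recovery Point Objectives

Define your objectives clearly:

RPO (Recovery Point Objective): How much data loss is acceptable? With synchronous streaming replication and WAL archiving, RPO can be near zero. With daily backups only, RPO is up to 24 hours.
RTO (Recovery Time Objective): How quickly must the system be operational after a failure? This determines whether you need a hot standby (minutes), a warm standby (tens of minutes), or a cold restore from backup (hours).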
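
The relationship between these objectives and the mechanisms above can be written down as simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 10-minute and 60-minute cut-offs are assumed values chosen to match the "minutes / tens of minutes / hours" wording, not fixed industry thresholds.

```python
from typing import Optional


def worst_case_rpo_seconds(
    backup_interval_s: int,
    wal_archive_interval_s: Optional[int] = None,
    synchronous: bool = False,
) -> int:
    """Estimate the worst-case data-loss window for a protection setup."""
    if synchronous:
        return 0  # committed transactions are confirmed by the standby
    if wal_archive_interval_s is not None:
        # With WAL archiving, loss is bounded by how often WAL is shipped.
        return min(backup_interval_s, wal_archive_interval_s)
    return backup_interval_s  # backups only: everything since the last one


def standby_tier(rto_minutes: float) -> str:
    """Pick a standby tier for an RTO target (cut-offs are assumptions)."""
    if rto_minutes < 10:
        return "hot standby"
    if rto_minutes < 60:
        return "warm standby"
    return "cold restore from backup"


if __name__ == "__main__":
    print(worst_case_rpo_seconds(24 * 3600))  # daily backups only
    print(standby_tier(5))
```

Working through the numbers this way makes it easier to justify the cost of a given tier: a 4-hour RTO does not need a hot standby, while a 5-minute RTO cannot be met from a cold restore.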

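DR Testing

An untested disaster recovery plan is not a plan; it is a hope. Schedule regular DR drills:

Restore a database backup to a separate server and verify data integrity.
Practice failing over to a streaming replica and confirm that the application functions correctly.
Time the entire recovery process and compare it against your RTO target.
Document the exact steps and update the runbook after each drill.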

High Availability Patterns

High availability (HA) eliminates single points of failure at every layer of the architecture.

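Application Server HA

Deploy at least two iDempiere instances behind a load balancer. The load balancer performs health checks (an HTTP health endpoint or a TCP connection check) and removes unhealthy instances from the pool. Users on the failed instance lose their sessions but can log in again and are routed to a healthy instance.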
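
The pool-pruning behavior described above can be sketched as a small bookkeeping class. This is an assumption-laden illustration rather than any load balancer’s actual algorithm: the class name BackendHealth and the threshold of three consecutive failures are hypothetical, and the probe itself (HTTP or TCP) is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class BackendHealth:
    """Track consecutive failed health checks per backend."""

    fail_threshold: int = 3  # assumed value; real balancers make this configurable
    failures: dict[str, int] = field(default_factory=dict)

    def record(self, backend: str, ok: bool) -> None:
        """Store one probe result; a success resets the failure counter."""
        self.failures[backend] = 0 if ok else self.failures.get(backend, 0) + 1

    def healthy(self, backends: list[str]) -> list[str]:
        """Return the backends that are still below the failure threshold."""
        return [b for b in backends if self.failures.get(b, 0) < self.fail_threshold]


if __name__ == "__main__":
    pool = ["idempiere-1:8080", "idempiere-2:8080"]  # hypothetical instances
    health = BackendHealth()
    for _ in range(3):
        health.record("idempiere-2:8080", ok=False)
    print(health.healthy(pool))
```

Requiring several consecutive failures before removing an instance avoids ejecting a server because of a single slow response, at the cost of a slightly longer detection time.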

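Database HA

Use PostgreSQL streaming replication with automatic failover. Tools such as Patroni (combined with etcd or Consul) provide automated leader election, failover, and replica management. When the primary fails, Patroni promotes a replica to primary and reconfigures the remaining replicas, typically within seconds to tens of seconds depending on settings such as ttl and loop_wait.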

# Patroni configuration excerpt (patroni.yml)
scope: idempiere-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008

postgresql:
  listen: 0.0.0.0:5432
  data_dir: /var/lib/postgresql/data
  parameters:
    max_connections: 200
    shared_buffers: 4GB
    wal_level: replica
    max_wal_senders: 5

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576

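Load Balancer HA

The load balancer itself must not be a single point of failure. Use a pair of load balancers in active-passive or active-active mode with a virtual IP (keepalived/VRRP), or use a cloud-managed load balancer that is inherently redundant (AWS ALB/NLB, Azure Load Balancer, GCP Cloud Load Balancing).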

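Storage HA

If iDempiere stores attachments and documents on the file system, use a shared storage solution (NFS with redundancy, AWS EFS, Azure Files, GlusterFS) so that all application instances can access the same file store. Alternatively, configure iDempiere to store attachments in the database (the default behavior) so that they are protected by database replication.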

Capacity Planning

Capacity planning ensures that the infrastructure can handle current and projected workloads. The key metrics to size are as follows:

Sizing Guidelines

Component	Small (up to 25 users)	Medium (25-100 users)	Large (100+ users)
Application server CPU	4 cores	8 cores	16+ cores (or multiple instances)
Application server RAM	8 GB	16 GB	32+ GB per instance
Database server CPU	4 cores	8 cores	16+ cores
Database server RAM	8 GB	32 GB	64+ GB
Database storage	50 GB SSD	200 GB SSD	500+ GB NVMe SSD
JVM heap size	2-4 GB	4-8 GB	8-16 GB per instance

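These are starting points. Actual requirements depend heavily on transaction volume, customization complexity, reporting load, and the number of concurrent (not just named) users.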
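
For scripted provisioning, the sizing table can be encoded as a lookup. The sketch below only restates the JVM heap row of the table; the function name sizing_tier and the dictionary JVM_HEAP_GB are hypothetical, and because the table’s ranges overlap at 25 and 100 users, the sketch assumes those boundary values fall into the smaller tier.

```python
# JVM heap ranges in GB, taken from the sizing table (large is per instance).
JVM_HEAP_GB = {"small": (2, 4), "medium": (4, 8), "large": (8, 16)}


def sizing_tier(concurrent_users: int) -> str:
    """Map a concurrent-user count to a tier from the sizing table."""
    if concurrent_users < 0:
        raise ValueError("concurrent_users must be non-negative")
    if concurrent_users <= 25:   # boundary assigned to the smaller tier (assumption)
        return "small"
    if concurrent_users <= 100:  # boundary assigned to the smaller tier (assumption)
        return "medium"
    return "large"


if __name__ == "__main__":
    tier = sizing_tier(60)
    low, high = JVM_HEAP_GB[tier]
    print(f"{tier}: start with a {low}-{high} GB JVM heap")
```

Treat the result as an initial allocation to validate against measured load, in line with the caveat above.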

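Performance Baselines

Establish baselines early in the deployment lifecycle. Measure and record:

Average response times for key transactions (document completion, record save, report generation)
Execution times for critical database queries
JVM heap utilization, garbage collection frequency, and pause times
Database connection pool utilization
Disk I/O throughput and latency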

Monitoring and Alerting

A production iDempiere system needs comprehensive monitoring at every layer.

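Application Monitoring

JMX metrics: iDempiere exposes JVM metrics via JMX (Java Management Extensions). Use tools such as Prometheus with the JMX Exporter to collect heap usage, thread counts, GC statistics, and class-loading data.
Application logs: Centralize iDempiere logs with the ELK stack (Elasticsearch, Logstash, Kibana) or a managed logging service. Monitor for error patterns, slow queries, and authentication failures.
Health endpoint: Implement a custom health check servlet or plugin that verifies database connectivity, cache state, and scheduler status, and expose it at a well-known URL for load balancer health checks.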

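Database Monitoring

pg_stat_statements: Enable this PostgreSQL extension to track query execution statistics. Identify the slowest and most frequently executed queries for optimization.
Replication lag: Monitor the delay between the primary and its replicas. Base the alert threshold on your RPO: trigger an alert when lag exceeds the acceptable data-loss window.
Connection count: Alert when connection usage approaches max_connections. Connection exhaustion is a common cause of downtime.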
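
The two database alert rules above reduce to simple comparisons. In this sketch the function name database_alerts is hypothetical and the 80% connection ratio is an assumed definition of "approaching" max_connections; in practice the same rules would normally be written as alert expressions in the monitoring system.

```python
def database_alerts(
    lag_seconds: float,
    rpo_seconds: float,
    connections: int,
    max_connections: int,
    conn_ratio: float = 0.8,  # assumed threshold for "approaching the limit"
) -> list[str]:
    """Return the alert messages that should fire for the given readings."""
    alerts = []
    if lag_seconds > rpo_seconds:
        alerts.append("replication lag exceeds RPO")
    if connections >= conn_ratio * max_connections:
        alerts.append("connections near max_connections")
    return alerts


if __name__ == "__main__":
    # Example readings: 120 s of lag against a 60 s RPO, 170 of 200 connections.
    for message in database_alerts(120, 60, 170, 200):
        print(message)
```

Tying the lag threshold to the RPO, rather than to an arbitrary number, means an alert always corresponds to a real breach of the agreed data-loss window.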

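Infrastructure Monitoring

Use tools such as Prometheus + Grafana, Datadog, or cloud-native monitoring (CloudWatch, Azure Monitor, GCP Cloud Monitoring) to track CPU, memory, disk, and network metrics on all servers. Set alerts on resource utilization thresholds (for example, CPU sustained above 80% for 5 minutes, or disk usage above 85%).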
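
A "sustained above a threshold" rule differs from a simple threshold check because a single dip resets the clock. The sketch below shows that logic under stated assumptions: the function name sustained_above is hypothetical, and samples are assumed to be (timestamp in seconds, value) pairs sorted by time. Monitoring tools provide this natively, so the code is only meant to clarify the semantics.

```python
def sustained_above(
    samples: list[tuple[float, float]],
    threshold: float,
    min_duration_s: float,
) -> bool:
    """Return True if the value has stayed above threshold for min_duration_s.

    The breach must be continuous and must extend to the latest sample.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # a new breach begins here
        else:
            breach_start = None  # any dip resets the window
    return breach_start is not None and samples[-1][0] - breach_start >= min_duration_s


if __name__ == "__main__":
    # One CPU reading per minute for five minutes, all above 80%.
    cpu = [(t, 90.0) for t in range(0, 301, 60)]
    print(sustained_above(cpu, threshold=80.0, min_duration_s=300))
```

Requiring a sustained breach filters out short spikes, such as a report run or a garbage collection burst, that would otherwise generate noisy alerts.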

Cloud Deployment Considerations

Deploying iDempiere on cloud platforms offers elasticity, managed services, and global availability, but it requires adapting traditional deployment practices.

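Compute Options

Virtual machines (EC2, Azure VMs, GCE): The most direct migration path. Run iDempiere on VMs just as you would on bare metal, with the added ability to resize, snapshot, and auto-scale.
Containers (Docker + Kubernetes): Containerizing iDempiere enables consistent deployments, rolling updates, and orchestration. The iDempiere community maintains Docker images. Kubernetes can manage scaling, health checks, and self-healing. However, the stateful nature of iDempiere sessions requires careful configuration of persistent storage and session affinity.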

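Managed Database Services

Use cloud-managed PostgreSQL (AWS RDS, Azure Database for PostgreSQL, GCP Cloud SQL) to offload backup, patching, replication, and failover management. These services provide automated point-in-time recovery, read replicas, and high availability with minimal operational effort. Watch version compatibility: confirm that the managed service supports the PostgreSQL version required by your iDempiere release.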

Infrastructure as Code

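Use tools such as Terraform, AWS CloudFormation, or Pulumi to define the entire iDempiere infrastructure in code. Infrastructure as Code provides:

Reproducibility: Spin up identical development, staging, and production environments from the same templates.
Version control: Track infrastructure changes in Git alongside application code.
Disaster recovery: Rebuild the entire infrastructure from code in a new region or account when needed.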

# Terraform example: PostgreSQL RDS instance for iDempiere
resource "aws_db_instance" "idempiere_db" {
  identifier           = "idempiere-production"
  engine               = "postgres"
  engine_version       = "15.4"
  instance_class       = "db.r6g.xlarge"
  allocated_storage    = 200
  storage_type         = "gp3"

  db_name              = "idempiere"
  username             = "adempiere"
  password             = var.db_password

  multi_az             = true
  backup_retention_period = 14

  vpc_security_group_ids = [aws_security_group.db_sg.id]
  db_subnet_group_name   = aws_db_subnet_group.db_subnets.name

  parameter_group_name = aws_db_parameter_group.idempiere_params.name

  tags = {
    Environment = "production"
    Application = "iDempiere"
  }
}

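Putting It All Together: Reference Architecture

A production-ready iDempiere architecture for a medium-to-large deployment looks like this:

Load balancer tier: A cloud-managed load balancer (for example, AWS ALB) with SSL termination and sticky sessions, distributing traffic to two or more iDempiere instances.
Application tier: Two or more iDempiere instances running on VMs or containers, each with a 16 GB heap, configured with shared attachment storage (EFS/NFS) and one designated scheduler leader.
Database tier: Managed PostgreSQL with automatic failover (RDS Multi-AZ), 64 GB RAM, provisioned IOPS storage, and a read replica for reporting workloads.
Monitoring: Prometheus + Grafana dashboards for application, database, and infrastructure metrics, with alerting through PagerDuty or Opsgenie.
Backup: Automated database snapshots with 14-day retention, WAL archiving to S3 for PITR, nightly file system backups, and 2Pack exports of AD customizations stored in version control.
DR: A cross-region read replica of the database, Infrastructure as Code templates that can be deployed in the DR region, and a documented, tested runbook.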

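Summary

Designing a production iDempiere architecture requires decisions along several dimensions: multi-tenancy model, scaling strategy, clustering approach, replication topology, and disaster recovery posture. The key principles are:

Eliminate single points of failure at every layer.
Separate read and write workloads wherever possible.
Automate everything: backups, failover, monitoring, and infrastructure provisioning.
Test your disaster recovery plan regularly.
Size for today, but architect for growth.

The next lesson explores integration patterns and middleware: how to connect your iDempiere deployment to external systems using EDI, message queues, and API-driven architectures.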