The embedded-postgres library hardcodes --lc-messages=en_US.UTF-8 and
strips the parent process environment when spawning initdb/postgres.
In slim Docker images (e.g. node:20-bookworm-slim), the en_US.UTF-8
locale isn't installed, causing initdb to exit with code 1.
Two fixes applied:
1. Add --lc-messages=C to all initdbFlags arrays (overrides the
library's hardcoded locale since our flags come after in the spread)
2. pnpm patch on embedded-postgres to preserve process.env in spawn
calls, preventing loss of PATH, LD_LIBRARY_PATH, and other vars
Co-Authored-By: Paperclip <noreply@paperclip.ing>
On macOS, `initdb` defaults to SQL_ASCII encoding because it infers
locale from the system environment. When `ensurePostgresDatabase()`
creates a database without specifying encoding, the new database
inherits SQL_ASCII from the cluster. This causes string functions like
`left()` to operate on bytes instead of characters, producing invalid
UTF-8 when multi-byte characters are truncated.
Two-part fix:
1. Pass `--encoding=UTF8 --locale=C` via `initdbFlags` to all
EmbeddedPostgres constructors so the cluster defaults to UTF-8.
2. Explicitly set `encoding 'UTF8'` in the CREATE DATABASE statement
with `template template0` (required because template1 may already
have a different encoding) and `C` locale for portability.
Existing databases created with SQL_ASCII are NOT automatically fixed;
users must delete their local `data/db` directory and restart to
re-initialize the cluster.
Relates to #636
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>