Multi-Tenant Architecture: Isolating Healthcare Data at Scale


The Multi-Tenancy Challenge

Imagine you're running a SaaS application. Now imagine that application handles the most sensitive data possible: medical records. Now imagine that a single bug could expose Patient A's cancer diagnosis to Hospital B.

Those are the stakes we're playing with in a multi-tenant healthcare system.

Most SaaS apps solve multi-tenancy at the application layer—same database, different tenant_id columns. That works for Slack or Notion. It doesn't work for healthcare.

Here's why:

  • Regulatory compliance: HIPAA requires strong data isolation

  • Different schemas: Hospital A stores patient names as patient_name, Hospital B uses full_name

  • Legacy systems: Each tenant runs their own ancient MySQL database on-premise

  • Zero trust: One tenant's data must be physically impossible to access from another tenant's queries

The Architecture: Database-Level Isolation

Instead of logical isolation (same DB, different rows), I went with physical isolation—each tenant gets their own database connection to their own database.

// Each tenant has their own encrypted database URL
pub struct TenantDatabaseConfig {
    pub id: i32,
    pub client_id: String,
    pub database_url: String,  // Encrypted
    pub database_type: DatabaseType,
}
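
DatabaseType isn't defined in the snippet; here's a minimal sketch of what it might cover, with the variant list being an assumption (MySQL is the only engine mentioned explicitly in this post):

// Hypothetical sketch - the real enum may list more engines.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DatabaseType {
    MySql,
    Postgres,
    SqlServer,
}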

The Connection Pool Dance

Here's the tricky part: you can't create a new database connection for every request. Connections are expensive (100ms+ handshake). But you also can't keep 1000 connection pools open if you have 1000 tenants.

I built a TenantPoolManager that:

  1. Lazily creates connection pools (only when needed)

  2. Caches them in memory for fast access

  3. Evicts idle pools after 30 minutes of inactivity (a sketch of this step follows the code below)

  4. Validates connections before returning them

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

pub struct TenantPoolManager {
    pools: Arc<Mutex<HashMap<String, DatabasePool>>>,
    encryption_key: Vec<u8>,
}

impl TenantPoolManager {
    pub async fn get_pool(&self, client_id: &str) -> Result<DatabasePool> {
        // Check cache first
        {
            let pools = self.pools.lock().unwrap();
            if let Some(pool) = pools.get(client_id) {
                return Ok(pool.clone());
            }
        }

        // Not cached - fetch config, decrypt URL, create pool
        let config = self.fetch_tenant_config(client_id).await?;
        let decrypted_url = self.decrypt_database_url(&config.database_url)?;
        let pool = self.create_pool(&decrypted_url).await?;

        // Cache it
        let mut pools = self.pools.lock().unwrap();
        pools.insert(client_id.to_string(), pool.clone());

        Ok(pool)
    }
}
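
The code above covers lazy creation and caching but not eviction (item 3). Here's a minimal sketch of that step, assuming the cache entry is extended to carry a last-used timestamp; the type and method names are illustrative, not the actual implementation:

use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

// Hypothetical variant of the cache entry: the pool plus its last access time.
struct CachedPool {
    pool: DatabasePool, // same pool type used by TenantPoolManager above
    last_used: Instant,
}

pub struct EvictingPoolCache {
    pools: Arc<Mutex<HashMap<String, CachedPool>>>,
    idle_timeout: Duration, // e.g. Duration::from_secs(30 * 60)
}

impl EvictingPoolCache {
    /// Drop every pool that has been idle longer than `idle_timeout`.
    /// Meant to be called from a periodic background task
    /// (e.g. a tokio::time::interval loop).
    pub fn evict_idle(&self) {
        let idle_timeout = self.idle_timeout;
        let mut pools = self.pools.lock().unwrap();
        pools.retain(|_, entry| entry.last_used.elapsed() < idle_timeout);
    }

    /// Record an access so a pool in active use isn't evicted.
    pub fn touch(&self, client_id: &str) {
        let mut pools = self.pools.lock().unwrap();
        if let Some(entry) = pools.get_mut(client_id) {
            entry.last_used = Instant::now();
        }
    }
}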

Encryption

Database URLs contain credentials. Storing them in plaintext would be... unwise.

Every tenant's database URL is encrypted using AES-256-GCM before hitting the database:

fn encrypt_database_url(&self, url: &str) -> Result<String> {
    let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(&self.encryption_key));
    let nonce = Aes256Gcm::generate_nonce(&mut OsRng);

    let ciphertext = cipher.encrypt(&nonce, url.as_bytes())
        .map_err(|e| AdapterError::Encryption(e.to_string()))?;

    // Encode as base64 for storage
    let mut combined = nonce.to_vec();
    combined.extend_from_slice(&ciphertext);
    Ok(base64::encode(&combined))
}
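
Decryption isn't shown here, but get_pool relies on it. A minimal sketch that mirrors the function above, assuming the 12-byte GCM nonce is prepended to the ciphertext exactly as encrypt_database_url writes it (Nonce is aes_gcm::Nonce):

fn decrypt_database_url(&self, encoded: &str) -> Result<String> {
    let combined = base64::decode(encoded)
        .map_err(|e| AdapterError::Encryption(e.to_string()))?;
    if combined.len() < 12 {
        return Err(AdapterError::Encryption("ciphertext too short".to_string()));
    }

    // The first 12 bytes are the GCM nonce written by encrypt_database_url.
    let (nonce_bytes, ciphertext) = combined.split_at(12);

    let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(&self.encryption_key));
    let plaintext = cipher
        .decrypt(Nonce::from_slice(nonce_bytes), ciphertext)
        .map_err(|e| AdapterError::Encryption(e.to_string()))?;

    String::from_utf8(plaintext).map_err(|e| AdapterError::Encryption(e.to_string()))
}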

The encryption key lives in environment variables, never in the database.
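
A small sketch of how that key might be loaded at startup; the environment variable name, the hex encoding, and the use of the hex crate are assumptions rather than the actual deployment:

// Hypothetical startup helper: reads a hex-encoded 256-bit key from the
// environment and fails fast if it's missing or the wrong length.
fn load_encryption_key() -> Result<Vec<u8>, String> {
    let hex_key = std::env::var("TENANT_DB_ENCRYPTION_KEY")
        .map_err(|_| "TENANT_DB_ENCRYPTION_KEY is not set".to_string())?;

    // AES-256 needs exactly 32 bytes of key material.
    let key = hex::decode(hex_key.trim()).map_err(|e| e.to_string())?;
    if key.len() != 32 {
        return Err(format!("expected a 32-byte key, got {} bytes", key.len()));
    }
    Ok(key)
}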

Configuration Caching: Speed vs Consistency

Each tenant has a configuration:

  • Which FHIR resources they support

  • How to map FHIR paths to database columns

  • Custom transformations (e.g., "1"/"0" → true/false)

  • Nested array handling strategies
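
A rough sketch of what that configuration might look like as a struct; every field name and type here is an assumption for illustration, not the real schema:

use std::collections::HashMap;

#[derive(Clone)]
pub struct TenantConfig {
    /// FHIR resource types this tenant supports, e.g. "Patient", "Observation".
    pub supported_resources: Vec<String>,
    /// FHIR path -> database column, e.g. "Patient.name.family" -> "patient_name".
    pub field_mappings: HashMap<String, String>,
    /// Value-level transformations, e.g. "1"/"0" -> true/false.
    pub transformations: Vec<TransformRule>,
    /// How nested arrays (e.g. multiple addresses) map onto rows or JSON columns.
    pub array_strategy: ArrayStrategy,
}

// Hypothetical supporting types; the real mapping engine is the subject of Part 3.
#[derive(Clone)]
pub struct TransformRule {
    pub column: String,
    pub kind: String, // e.g. "string_to_bool"
}

#[derive(Clone)]
pub enum ArrayStrategy {
    JoinedRows,
    JsonColumn,
}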

Loading this from MySQL on every request would be slow (5-10ms per query). But caching it forever means configuration changes don't take effect.

I built a ConfigResolver with a TTL-based cache:

use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

pub struct ConfigResolver {
    cache: Arc<Mutex<HashMap<String, CachedConfig>>>,
    cache_ttl: Duration,
}

struct CachedConfig {
    config: TenantConfig,
    loaded_at: Instant,
}

impl ConfigResolver {
    pub async fn get_config(&self, client_id: &str) -> Result<TenantConfig> {
        // Check cache
        {
            let cache = self.cache.lock().unwrap();
            if let Some(cached) = cache.get(client_id) {
                if cached.loaded_at.elapsed() < self.cache_ttl {
                    return Ok(cached.config.clone());
                }
            }
        }

        // Cache miss or expired - reload
        let config = self.load_from_database(client_id).await?;

        let mut cache = self.cache.lock().unwrap();
        cache.insert(client_id.to_string(), CachedConfig {
            config: config.clone(),
            loaded_at: Instant::now(),
        });

        Ok(config)
    }

    // Admin panel calls this after configuration changes
    pub async fn invalidate(&self, client_id: &str) {
        let mut cache = self.cache.lock().unwrap();
        cache.remove(client_id);
    }
}

Authentication: Keycloak Integration

Multi-tenancy isn't just about data—it's about who can access that data.

I integrated Keycloak for enterprise SSO:

  • Client ID embedded in JWT: every request includes a client_id claim

  • Role-based access: fhir-read, fhir-write, fhir-admin

The authentication flow:

  1. User logs in via Keycloak and receives a JWT

  2. Every request to the adapter includes this JWT

  3. Middleware validates JWT and extracts client_id

  4. All database queries are scoped to that client_id

Keycloak issues a JWT with claims like these:

{
  "sub": "user-123",
  "client_id": "tenant-42",
  "realm_access": {
    "roles": ["fhir-read", "fhir-write"]
  }
}

No way to query across tenants. The client_id is the security boundary.

// Middleware that extracts and validates tenant context
pub async fn auth_middleware(
    State(state): State<AppState>,
    req: Request,
    next: Next,
) -> Result<Response, AppError> {
    let token = extract_bearer_token(&req)?;

    // Validate JWT signature and expiration
    let claims = state.keycloak.validate_token(&token).await?;

    // Extract client_id - this determines data isolation
    let client_id = claims.client_id
        .ok_or_else(|| AppError::Unauthorized("Missing client_id"))?;

    // Inject into request extensions for downstream handlers
    req.extensions_mut().insert(AuthContext {
        user_id: claims.sub,
        client_id,
        roles: claims.realm_access.roles,
    });

    Ok(next.run(req).await)
}
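
Putting the pieces together, a downstream handler might look roughly like this. The handler name, the AppState fields, and the mapper call are illustrative assumptions; the point is that the client_id from AuthContext drives both the config and the pool lookup:

use axum::{extract::{Path, State}, Extension, Json};

// Sketch only: everything is scoped by the client_id that auth_middleware
// extracted from the JWT and injected into the request extensions.
pub async fn get_patient(
    State(state): State<AppState>,
    Extension(auth): Extension<AuthContext>,
    Path(patient_id): Path<String>,
) -> Result<Json<serde_json::Value>, AppError> {
    // Tenant-specific mapping configuration (TTL-cached).
    let config = state.config_resolver.get_config(&auth.client_id).await?;

    // Tenant-specific database pool (lazily created, then cached).
    let pool = state.pool_manager.get_pool(&auth.client_id).await?;

    // The query runs against this tenant's own database; there is no
    // cross-tenant path because the pool itself is tenant-scoped.
    let patient = state.mapper.read_patient(&pool, &config, &patient_id).await?;
    Ok(Json(patient))
}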

Lessons Learned

✅ What Worked

  1. Database-level isolation: no risk of a missing WHERE tenant_id clause leaking data across tenants

  2. Lazy pool creation: Most tenants are idle most of the time

  3. TTL-based config cache: 5-minute TTL is a sweet spot

❌ What Didn't

  1. Initial design had no pool eviction: Memory grew unbounded

  2. Encrypted URLs in logs: I accidentally logged encrypted database URLs (they look like gibberish, but it's still bad hygiene)

  3. No circuit breakers: One tenant's broken DB took down the whole adapter

🔧 Improvements Made

  1. Added pool eviction after 30 min idle

  2. Sanitized all logging (URLs, credentials, PHI)

  3. Per-tenant circuit breakers, so one tenant's failure no longer affects the others (sketched below)
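
A minimal sketch of what such a per-tenant breaker could look like; the thresholds, names, and half-open behavior are illustrative assumptions rather than the actual implementation:

use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

// After `max_failures` consecutive errors, a tenant's circuit opens and its
// requests are rejected until `cooldown` has elapsed.
struct BreakerState {
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

pub struct TenantCircuitBreaker {
    states: Mutex<HashMap<String, BreakerState>>,
    max_failures: u32,   // e.g. 5
    cooldown: Duration,  // e.g. Duration::from_secs(60)
}

impl TenantCircuitBreaker {
    /// Returns true if requests for this tenant are currently allowed.
    pub fn allow(&self, client_id: &str) -> bool {
        let states = self.states.lock().unwrap();
        match states.get(client_id).and_then(|s| s.opened_at) {
            // Circuit is open: only allow once the cooldown has passed (half-open trial).
            Some(opened) => opened.elapsed() >= self.cooldown,
            None => true,
        }
    }

    /// Record the outcome of a call to the tenant's database.
    pub fn record(&self, client_id: &str, success: bool) {
        let mut states = self.states.lock().unwrap();
        let state = states.entry(client_id.to_string()).or_insert(BreakerState {
            consecutive_failures: 0,
            opened_at: None,
        });
        if success {
            state.consecutive_failures = 0;
            state.opened_at = None;
        } else {
            state.consecutive_failures += 1;
            if state.consecutive_failures >= self.max_failures {
                state.opened_at = Some(Instant::now());
            }
        }
    }
}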

Up Next: The Mapping Engine

Physical isolation solves security. But how do we handle the fact that Hospital A stores patient names as patient_name while Hospital B uses nombre_paciente?

That's where the dynamic mapping engine comes in.


Part 3 will dive deep into how the adapter translates arbitrary database schemas into standard FHIR resources at runtime—without code generation.


Discussion: How do you handle multi-tenancy in your systems? Physical vs logical isolation? Let me know your thoughts.