# Multi-Tenant Architecture: Isolating Healthcare Data at Scale

## The Multi-Tenancy Challenge

Imagine you're running a SaaS application. Now imagine that application handles the most sensitive data possible: medical records. Now imagine that a single bug could expose Patient A's cancer diagnosis to Hospital B.

That's the stakes we're playing with in a multi-tenant healthcare system.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1767622362440/485ecc9a-6055-4fa9-8f34-c00ae7da3ec2.png align="center")

Most SaaS apps solve multi-tenancy at the application layer—same database, different `tenant_id` columns. That works for Slack or Notion. It doesn't work for healthcare.

Here's why:

* **Regulatory compliance:** HIPAA requires strong data isolation
    
* **Different schemas:** Hospital A stores patient names as `patient_name`, Hospital B uses `full_name`
    
* **Legacy systems:** Each tenant runs their own ancient MySQL database on-premise
    
* **Zero trust:** One tenant's data must be *physically impossible* to access from another tenant's queries
    

## The Architecture: Database-Level Isolation

Instead of logical isolation (same DB, different rows), I went with **physical isolation**—each tenant gets their own database connection to their own database.

```rust
// Each tenant has their own encrypted database URL
pub struct TenantDatabaseConfig {
    pub id: i32,
    pub client_id: String,
    pub database_url: String,  // Encrypted
    pub database_type: DatabaseType,
}
```

### The Connection Pool Dance

Here's the tricky part: you can't create a new database connection for every request. Connections are expensive (100ms+ handshake). But you also can't keep 1000 connection pools open if you have 1000 tenants.

I built a `TenantPoolManager` that:

1. **Lazily creates** connection pools (only when needed)
    
2. **Caches them** in memory for fast access
    
3. **Evicts idle pools** after 30 minutes of inactivity
    
4. **Validates connections** before returning them
    

```rust
pub struct TenantPoolManager {
    pools: Arc<Mutex<HashMap<String, DatabasePool>>>,
    encryption_key: Vec<u8>,
}

impl TenantPoolManager {
    pub async fn get_pool(&self, client_id: &str) -> Result<DatabasePool> {
        // Check cache first
        {
            let pools = self.pools.lock().unwrap();
            if let Some(pool) = pools.get(client_id) {
                return Ok(pool.clone());
            }
        }

        // Not cached - fetch config, decrypt URL, create pool
        let config = self.fetch_tenant_config(client_id).await?;
        let decrypted_url = self.decrypt_database_url(&config.database_url)?;
        let pool = self.create_pool(&decrypted_url).await?;

        // Cache it
        let mut pools = self.pools.lock().unwrap();
        pools.insert(client_id.to_string(), pool.clone());

        Ok(pool)
    }
}
```

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1767622392052/ffc64773-6887-44d8-9177-dc2225f6e9f8.png align="center")

## Encryption

Database URLs contain credentials. Storing them in plaintext would be... unwise.

Every tenant's database URL is encrypted using AES-256-GCM before hitting the database:

```rust
fn encrypt_database_url(&self, url: &str) -> Result<String> {
    let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(&self.encryption_key));
    let nonce = Aes256Gcm::generate_nonce(&mut OsRng);

    let ciphertext = cipher.encrypt(&nonce, url.as_bytes())
        .map_err(|e| AdapterError::Encryption(e.to_string()))?;

    // Encode as base64 for storage
    let mut combined = nonce.to_vec();
    combined.extend_from_slice(&ciphertext);
    Ok(base64::encode(&combined))
}
```

The encryption key lives in environment variables, never in the database.

## Configuration Caching: Speed vs Consistency

Each tenant has a configuration:

* Which FHIR resources they support
    
* How to map FHIR paths to database columns
    
* Custom transformations (e.g., "1"/"0" → true/false)
    
* Nested array handling strategies
    

Loading this from MySQL on every request would be slow (5-10ms per query). But caching it forever means configuration changes don't take effect.

I built a `ConfigResolver` with a TTL-based cache:

```rust
pub struct ConfigResolver {
    cache: Arc<Mutex<HashMap<String, CachedConfig>>>,
    cache_ttl: Duration,
}

struct CachedConfig {
    config: TenantConfig,
    loaded_at: Instant,
}

impl ConfigResolver {
    pub async fn get_config(&self, client_id: &str) -> Result<TenantConfig> {
        // Check cache
        {
            let cache = self.cache.lock().unwrap();
            if let Some(cached) = cache.get(client_id) {
                if cached.loaded_at.elapsed() < self.cache_ttl {
                    return Ok(cached.config.clone());
                }
            }
        }

        // Cache miss or expired - reload
        let config = self.load_from_database(client_id).await?;

        let mut cache = self.cache.lock().unwrap();
        cache.insert(client_id.to_string(), CachedConfig {
            config: config.clone(),
            loaded_at: Instant::now(),
        });

        Ok(config)
    }

    // Admin panel calls this after configuration changes
    pub async fn invalidate(&self, client_id: &str) {
        let mut cache = self.cache.lock().unwrap();
        cache.remove(client_id);
    }
}
```

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1767622413017/4a629de8-82e7-4a84-ab63-0468b349d41b.png align="center")

## Authentication: Keycloak Integration

Multi-tenancy isn't just about data—it's about *who can access* that data.

I integrated Keycloak for enterprise SSO:

* **Client ID embedded in JWT:** Every request includes `client_id` claim
    
* **Role-based access:** `fhir-read`, `fhir-write`, `fhir-admin`
    

The authentication flow:

1. User logs in via Keycloak
    
2. Every request to the adapter includes this JWT
    
3. Middleware validates JWT and extracts `client_id`
    
4. All database queries are scoped to that `client_id`
    

Keycloak issues JWT with claims:

```json
{
  "sub": "user-123",
  "client_id": "tenant-42",
  "realm_access": {
    "roles": ["fhir-read", "fhir-write"]
  }
}
```

**No way to query across tenants. The** `client_id` is the security boundary.

```rust
// Middleware that extracts and validates tenant context
pub async fn auth_middleware(
    State(state): State<AppState>,
    req: Request,
    next: Next,
) -> Result<Response, AppError> {
    let token = extract_bearer_token(&req)?;

    // Validate JWT signature and expiration
    let claims = state.keycloak.validate_token(&token).await?;

    // Extract client_id - this determines data isolation
    let client_id = claims.client_id
        .ok_or_else(|| AppError::Unauthorized("Missing client_id"))?;

    // Inject into request extensions for downstream handlers
    req.extensions_mut().insert(AuthContext {
        user_id: claims.sub,
        client_id,
        roles: claims.realm_access.roles,
    });

    Ok(next.run(req).await)
}
```

## Lessons Learned

### ✅ What Worked

1. **Database-level isolation:** No worrying about leaked `WHERE` clauses
    
2. **Lazy pool creation:** Most tenants are idle most of the time
    
3. **TTL-based config cache:** 5-minute TTL is a sweet spot
    

### ❌ What Didn't

1. **Initial design had no pool eviction:** Memory grew unbounded
    
2. **Encrypted URLs in logs:** Accidentally logged encrypted URLs (look like gibberish, but still bad)
    
3. **No circuit breakers:** One tenant's broken DB took down the whole adapter
    

### 🔧 Improvements Made

1. **Added pool eviction** after 30 min idle
    
2. **Sanitized all logging** (URLs, credentials, PHI)
    
3. **Per-tenant circuit breakers** (now one tenant can fail without affecting others)
    

## Up Next: The Mapping Engine

Physical isolation solves *security*. But how do we handle the fact that Hospital A stores patient names as `patient_name` while Hospital B uses `nombre_paciente`?

That's where the **dynamic mapping engine** comes in.

---

*Part 3 will dive deep into how the adapter translates arbitrary database schemas into standard FHIR resources at runtime—without code generation.*

---

**Discussion:** How do you handle multi-tenancy in your systems? Physical vs logical isolation? Let me know your thoughts.
