Spring - Using partitioning to do multi-schema multi-tenancy with dynamic tenants
I'm writing a web application that must be multi-tenant. I'm using JPA as the persistence layer, and I'm evaluating EclipseLink with interest.
The multi-tenant strategy I want to use is one schema per customer. Hibernate supports such a strategy (http://docs.jboss.org/hibernate/orm/4.2/devguide/en-us/html/ch16.html#d5e4771), and I've used it successfully. However, AFAIK it supports it only through the native Hibernate API, while I want to use JPA.
EclipseLink, on the other hand, supports the single-table and multi-table multi-tenancy strategies. However, it also supports partitioning, and with a simple custom partitioning policy I could set up one partition for each customer.
The first question is whether using partitioning for this use case is appropriate at all.
The main problem, however, is that the customer base may (hopefully) grow over time, so I have to make EclipseLink "know" about new customers dynamically (i.e. without restarting the webapp). As far as I understand, to set up partitioning in EclipseLink I have to configure the persistence unit with different "connection pools" (or "nodes"): every node has a configured data source and a name. The partitioning policy then determines which node to use by its name. So far so good. I plan to set up the persistence unit using Spring's LocalContainerEntityManagerFactoryBean. I can discover the customers dynamically on startup, when the LocalContainerEntityManagerFactoryBean is processed, and pass the needed node/customer properties at that time, but what happens if a new customer is added afterwards? I don't think that changing the persistence unit properties dynamically has any effect on the already constructed EntityManagerFactory singleton instance... and I fear EclipseLink will complain if I request a partition for which no corresponding node was known at EntityManagerFactory creation time. Correct me if I'm wrong.
I think declaring the LocalContainerEntityManagerFactoryBean as a "prototype" scoped bean is a bad idea, and I don't think it would work at all. On the other hand, since each customer interaction is bound to a specific HTTP session, I could alternatively use a "middle" approach and declare the LocalContainerEntityManagerFactoryBean with "session" scope, but in that case I think I would have to deal with increased memory consumption and with shared cache coordination between multiple EntityManagerFactories (one for each customer using the application at a given time).
If I can't make this strategy work, I think I'll have to abandon partitioning altogether and fall back to a "dynamic data source routing" approach, but in that case I'm concerned about EclipseLink shared cache consistency (I think I'd have to disable the shared cache, which is a real disadvantage).
Thanks in advance for any feedback on this.
Honestly, I didn't try Chris's suggestion; I opted for a more fine-tuned solution. Here is my solution.
- In my case, tenant = customer; each customer has its data in its own database schema, potentially located in a dedicated DBMS instance (of whatever vendor); in other words, I have a different data source for each customer
- Since I use partitioning, every customer has its own partition; each partition is identified by the corresponding unique customer ID
- Every user who logs into the application belongs to a customer; I use Spring Security to handle authentication and authorization, hence I can retrieve information about the user (including the owning customer) by querying the SecurityContextHolder
- I defined my own EclipseLink PartitioningPolicy that determines the customer of the logged-in user, as described in the previous point, and returns a list containing the Accessor that identifies that customer's partition; since all tables must be partitioned and I don't want to specify this on every entity with annotations, I registered the partitioning policy with EclipseLink on startup and set it as the default one; briefly:
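```java
import org.eclipse.persistence.jpa.JpaEntityManagerFactory;
import org.eclipse.persistence.sessions.server.ServerSession;

// ...

// Reach the EclipseLink ServerSession behind the JPA EntityManagerFactory
JpaEntityManagerFactory jpaEmf = entityManagerFactory.unwrap(JpaEntityManagerFactory.class);
ServerSession serverSession = jpaEmf.getServerSession();
// Register the custom policy with the project and make it the session's default policy
serverSession.getProject().addPartitioningPolicy(myCustomerPolicy);
serverSession.setPartitioningPolicy(myCustomerPolicy);
```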
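The decision such a policy makes can be modeled without any EclipseLink types. The following is a hypothetical plain-Java sketch, not the actual policy: the CustomerPartitionRouting class is illustrative, a ThreadLocal stands in for Spring Security's SecurityContextHolder, and a set of pool names stands in for the ServerSession's connection pools. In a real EclipseLink policy, if I recall the PartitioningPolicy API correctly, this lookup would live in the overridden getConnectionsForQuery(...) method and return the Accessor for the chosen pool rather than its name.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a per-customer routing decision; no EclipseLink types are used.
final class CustomerPartitionRouting {

    // Stand-in for SecurityContextHolder: the customer ID of the user bound to the current thread.
    private static final ThreadLocal<String> CURRENT_CUSTOMER = new ThreadLocal<>();

    // Names of the connection pools registered so far; each pool is named after a customer ID.
    private final Set<String> knownPools = ConcurrentHashMap.newKeySet();

    static void setCurrentCustomer(String customerId) {
        CURRENT_CUSTOMER.set(customerId);
    }

    static void clearCurrentCustomer() {
        CURRENT_CUSTOMER.remove();
    }

    void registerPool(String customerId) {
        knownPools.add(customerId);
    }

    // Picks the partition (pool name) for the current customer, failing fast if none is known.
    String poolForCurrentCustomer() {
        String customerId = CURRENT_CUSTOMER.get();
        if (customerId == null) {
            throw new IllegalStateException("no logged-in customer on this thread");
        }
        if (!knownPools.contains(customerId)) {
            throw new IllegalStateException("no connection pool registered for customer " + customerId);
        }
        return customerId;
    }

    public static void main(String[] args) {
        CustomerPartitionRouting routing = new CustomerPartitionRouting();
        routing.registerPool("acme");
        setCurrentCustomer("acme");
        try {
            System.out.println(routing.poolForCurrentCustomer()); // prints acme
        } finally {
            clearCurrentCustomer();
        }
    }
}
```

Failing fast when no pool is registered reflects the requirement that the customer's pool be added at login, before any customer-specific query reaches the policy.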
Then, to dynamically add data sources to EclipseLink (they are called "connection pools" in EclipseLink terminology), so that the customer ID specified by the policy above is matched against a known "connection pool", I do the following:
- A listener intercepts each successful user login
- This listener asks EclipseLink whether it knows a connection pool identified by the user's customer ID; if it does, we're done, since EclipseLink can correctly handle the partition; otherwise a new connection pool is created and added to EclipseLink; proof of concept:
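```java
import javax.sql.DataSource;
import org.eclipse.persistence.jpa.JpaEntityManagerFactory;
import org.eclipse.persistence.platform.database.DatabasePlatform;
import org.eclipse.persistence.sessions.DatabaseLogin;
import org.eclipse.persistence.sessions.JNDIConnector;
import org.eclipse.persistence.sessions.server.ConnectionPool;
import org.eclipse.persistence.sessions.server.ExternalConnectionPool;
import org.eclipse.persistence.sessions.server.ServerSession;

// ...

String customerId = principal.getCustomerId();
JpaEntityManagerFactory jpaEmf = entityManagerFactory.unwrap(JpaEntityManagerFactory.class);
ServerSession serverSession = jpaEmf.getServerSession();
// Pools are keyed by name; here the pool name is the customer ID.
// Note: this check-then-add is not atomic, so two concurrent first logins of the same customer may race.
if (!serverSession.getConnectionPools().containsKey(customerId)) {
    // createDataSourceForCustomer and determineDbVendorPlatform are application-specific helpers
    DataSource customerDataSource = createDataSourceForCustomer(customerId);
    DatabaseLogin login = new DatabaseLogin();
    login.useDataSource(customerId);
    // Wrap the customer DataSource directly instead of looking it up through JNDI
    login.setConnector(new JNDIConnector(customerDataSource));
    Class<? extends DatabasePlatform> databasePlatformClass = determineDbVendorPlatform(customerId);
    login.usePlatform(databasePlatformClass.getDeclaredConstructor().newInstance());
    ConnectionPool connectionPool = new ExternalConnectionPool(customerId, login, serverSession);
    connectionPool.startUp();
    serverSession.addConnectionPool(connectionPool);
}
```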
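One caveat with this proof of concept: containsKey followed by addConnectionPool is a check-then-act sequence, so two simultaneous first logins of the same customer could both build a pool. A minimal way to make the step atomic is to keep your own ConcurrentHashMap and create pools inside computeIfAbsent, which applies its function at most once per key. The sketch below is hypothetical plain Java: LazyPoolRegistry is an illustrative name and the type parameter P stands in for ExternalConnectionPool. In a real listener the factory function would build, start and register the EclipseLink pool.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical registry that creates at most one pool per customer ID, even under concurrent logins.
final class LazyPoolRegistry<P> {

    private final Map<String, P> pools = new ConcurrentHashMap<>();
    private final Function<String, P> poolFactory;

    LazyPoolRegistry(Function<String, P> poolFactory) {
        this.poolFactory = poolFactory;
    }

    // computeIfAbsent runs atomically, so the factory is applied at most once per customer ID.
    P ensurePool(String customerId) {
        return pools.computeIfAbsent(customerId, poolFactory);
    }

    int size() {
        return pools.size();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger created = new AtomicInteger();
        LazyPoolRegistry<String> registry = new LazyPoolRegistry<>(id -> {
            created.incrementAndGet(); // stands in for creating and starting a real pool
            return "pool-" + id;
        });
        ExecutorService logins = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            logins.submit(() -> registry.ensurePool("acme")); // many concurrent first logins
        }
        logins.shutdown();
        logins.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(created.get()); // prints 1
    }
}
```

The factory should stay reasonably quick, because other updates to the same map bin may wait while it runs; starting a connection pool is usually acceptable here since it happens once per customer.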
The user login itself is of course performed against a central database (or another source of authentication), so the above code runs before any customer-specific JPA query is executed (and hence the customer's connection pool is added to EclipseLink before the partitioning policy ever references it).
There's an important aspect to take into consideration, though. In EclipseLink, data partitioning means that an identifiable piece of data (= entity instance) lives either in exactly one partition or is replicated identically in multiple partitions. Entity instance identity is determined through its identifier (= primary key). This means that there must not be two different entity instances of type E with the same id=X for two different customers/tenants T1 and T2, otherwise EclipseLink might think they are the exact same entity instance. This may lead to data from different customers being mixed when read/written during a single JPA session => disaster. Possible solutions:
- In my scenario, the partition to use is determined by the logged-in user, which means it is the same for every query executed within the scope of an HTTP session; since I use transaction-scoped entity managers, whose lifespan is at most the duration of a request (which in turn falls within an HTTP session), disabling the EclipseLink shared cache avoids any possibility of mixing data from different customers; however, this is still undesirable
- The best option I could find is to make sure that IDs (= primary keys) are generated, and that their generation is handled by EclipseLink in a central, cross-customer way, so that id=X for entity E is assigned to one entity instance of one customer only; this effectively "partitions" the ID assignment sequences across customers and rules out MySQL auto-increment columns (aka the database IDENTITY generation type); I opted for the TABLE generation type for entity identifiers, and put the sequence table in the central database where user and customer information is stored
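To make the identity clash concrete, here is a tiny plain-Java illustration (not EclipseLink code): an identity map keyed only by entity type and primary key, standing in for a shared cache. The Invoice class, the IdentityCollisionDemo name and the two in-memory "schemas" are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the identity clash; not EclipseLink code.
final class IdentityCollisionDemo {

    // A minimal "entity": which customer's schema it came from, and its primary key.
    static final class Invoice {
        final String customer;
        final long id;

        Invoice(String customer, long id) {
            this.customer = customer;
            this.id = id;
        }
    }

    // Identity map keyed by entity type and primary key only; the tenant is not part of the key.
    private final Map<String, Invoice> cache = new HashMap<>();

    // Simulates find-by-id: return the cached instance if present, else load it from the customer's schema.
    Invoice find(String customer, long id, Map<String, Map<Long, Invoice>> schemas) {
        return cache.computeIfAbsent("Invoice#" + id, key -> schemas.get(customer).get(id));
    }

    public static void main(String[] args) {
        Map<String, Map<Long, Invoice>> schemas = new HashMap<>();
        schemas.put("t1", Map.of(1L, new Invoice("t1", 1L)));
        schemas.put("t2", Map.of(1L, new Invoice("t2", 1L)));

        IdentityCollisionDemo session = new IdentityCollisionDemo();
        session.find("t1", 1L, schemas);                              // T1 loads Invoice id=1
        System.out.println(session.find("t2", 1L, schemas).customer); // prints t1: T2 got T1's row
    }
}
```

With IDs drawn from one central sequence, as in the second option, T1 and T2 would never both own id=1, so a key made of type and ID would stay unambiguous.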
The last little problem to solve in order to implement option 2 correctly is that, although the EclipseLink documentation says it's possible to specify a connection pool (= data source) dedicated to table sequencing using the eclipselink.connection-pool.sequence
configuration option, this option seems to be ignored when a default partitioning policy is set as described above. In fact, my customer partitioning policy gets invoked for every query, including those used for ID allocation. For this reason, the policy must intercept these queries and route them to the central data source. I couldn't find a definitive solution to this problem; the best options I could think of are:
- If the SQL string of the query starts with "UPDATE SEQUENCE ", it's an ID allocation query, assuming the table dedicated to sequence allocation is called SEQUENCE (this is the default)
- If I adopt the convention of adding a "_SEQUENCE" suffix to all generator names, then any executed query whose name ends with "_SEQUENCE" is an ID allocation query
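Both heuristics fit in a small, configurable helper that the policy can consult before picking a pool. This is a hypothetical plain-Java sketch: the SequenceQueryRouter class, its Strategy enum and the pool names are illustrative. Inside an EclipseLink policy, the query name and SQL would presumably come from the DatabaseQuery being routed (DatabaseQuery exposes getName() and getSQLString()), and the chosen pool name would then be turned into an Accessor.

```java
import java.util.Locale;

// Hypothetical helper a partitioning policy could delegate to; names and pools are illustrative.
final class SequenceQueryRouter {

    // The two heuristics; which one is active is a configuration choice.
    enum Strategy { SQL_PREFIX, NAME_SUFFIX }

    private final Strategy strategy;
    private final String centralPool;

    SequenceQueryRouter(Strategy strategy, String centralPool) {
        this.strategy = strategy;
        this.centralPool = centralPool;
    }

    // True if the query looks like an ID allocation query under the configured heuristic.
    boolean isSequenceQuery(String queryName, String sql) {
        switch (strategy) {
            case SQL_PREFIX:
                // Assumes the default sequence table name, SEQUENCE.
                return sql != null && sql.trim().toUpperCase(Locale.ROOT).startsWith("UPDATE SEQUENCE ");
            case NAME_SUFFIX:
                // Assumes every generator is named with the _SEQUENCE suffix.
                return queryName != null && queryName.toUpperCase(Locale.ROOT).endsWith("_SEQUENCE");
            default:
                throw new IllegalStateException("unknown strategy: " + strategy);
        }
    }

    // ID allocation goes to the central pool; everything else goes to the customer's own pool.
    String poolFor(String queryName, String sql, String customerPool) {
        return isSequenceQuery(queryName, sql) ? centralPool : customerPool;
    }

    public static void main(String[] args) {
        SequenceQueryRouter router = new SequenceQueryRouter(Strategy.SQL_PREFIX, "central");
        String allocation = "UPDATE SEQUENCE SET SEQ_COUNT = SEQ_COUNT + ? WHERE SEQ_NAME = ?";
        System.out.println(router.poolFor(null, allocation, "acme"));                   // prints central
        System.out.println(router.poolFor("findAll", "SELECT ID FROM INVOICE", "acme")); // prints acme
    }
}
```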
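I opted for option 2, defining the ID generation mappings accordingly, for example:
```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.TableGenerator;

@Entity
public class MyEntity {
    @Id
    // Generator name ends with _SEQUENCE so its allocation query can be recognized by the policy
    @TableGenerator(name = "MYENTITY_SEQUENCE", allocationSize = 10)
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "MYENTITY_SEQUENCE")
    private Long id;
}
```
This makes EclipseLink use a table named SEQUENCE
containing one row whose SEQ_NAME
column has the value MYENTITY_SEQUENCE.
The query used to update this sequence for ID allocation is named MYENTITY_SEQUENCE,
and we're done. I made my partitioning policy configurable, so that I can switch from one sequence-query-identification strategy to the other at any time, in case changes in the EclipseLink implementation break these "heuristics".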
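The effect of allocationSize = 10 on the central SEQUENCE table can be modeled in plain Java. This is a hypothetical simulation, not EclipseLink's sequencing code: CentralTableSequence and its Allocator are illustrative names, each reservation roughly mirrors the UPDATE SEQUENCE query by bumping SEQ_COUNT by the allocation size, and each allocator hands out IDs from its reserved block. Because every allocator draws from the same central row (for example, two application instances sharing the central database), the blocks never overlap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of TABLE id generation against one central table; not EclipseLink code.
final class CentralTableSequence {

    // One row per generator: SEQ_NAME -> SEQ_COUNT.
    private final Map<String, Long> seqCount = new HashMap<>();

    // Roughly "UPDATE SEQUENCE SET SEQ_COUNT = SEQ_COUNT + allocationSize WHERE SEQ_NAME = ?" plus a read:
    // reserves the IDs (old count, new count] and returns the first of them.
    synchronized long reserveBlock(String seqName, int allocationSize) {
        long newCount = seqCount.merge(seqName, (long) allocationSize, Long::sum);
        return newCount - allocationSize + 1;
    }

    // Hands out IDs from its reserved block and reserves a new block when the current one runs out.
    static final class Allocator {
        private final CentralTableSequence table;
        private final String seqName;
        private final int allocationSize;
        private long next = 1;  // next ID to hand out
        private long limit = 0; // last ID of the current block (0 means no block yet)

        Allocator(CentralTableSequence table, String seqName, int allocationSize) {
            this.table = table;
            this.seqName = seqName;
            this.allocationSize = allocationSize;
        }

        long nextId() {
            if (next > limit) {
                next = table.reserveBlock(seqName, allocationSize);
                limit = next + allocationSize - 1;
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        CentralTableSequence table = new CentralTableSequence();
        Allocator nodeA = new Allocator(table, "MYENTITY_SEQUENCE", 10);
        Allocator nodeB = new Allocator(table, "MYENTITY_SEQUENCE", 10);
        System.out.println(nodeA.nextId()); // prints 1  (block 1..10)
        System.out.println(nodeB.nextId()); // prints 11 (block 11..20)
        System.out.println(nodeA.nextId()); // prints 2
    }
}
```

The trade-off of a larger allocationSize is fewer round trips to the central database per inserted entity, at the cost of larger gaps in the ID sequence when a node restarts with unused IDs in its block.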
This is substantially the whole picture. So far, it has been working well. Any feedback, improvements or suggestions are welcome.