Data Persistence with PostgreSQL
By default, the in-memory stores are provided through dependency injection. The sql implementations can be used by simply registering the respective extensions (e.g. asset-index-sql, contract-negotiation-store-sql, …).
1. Configuring DataSources
To use the sql extensions, a DataSource is needed, and it should be registered on the DataSourceRegistry service.
The sql-pool-apache-commons extension is responsible for creating and registering pooled data sources starting from configuration. At least one data source named "default" is required.
edc.datasource.default.url=...
edc.datasource.default.name=...
edc.datasource.default.password=...
It is recommended to hold these values in the Vault rather than in configuration. The config key (e.g. edc.datasource.default.url) serves as the secret alias. If no vault entries are found for these keys, they will be obtained from the configuration. This is unsafe and should be avoided!
Other datasources can be defined using the same settings structure:
edc.datasource.<datasource-name>.url=...
edc.datasource.<datasource-name>.name=...
edc.datasource.<datasource-name>.password=...
<datasource-name> is a string that can then be used in a store's configuration to select a specific data source.
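For example, a second datasource for policy-related tables could be defined like this (the name, URL and credentials are purely illustrative):
edc.datasource.policies.url=jdbc:postgresql://db.example.com:5432/edc_policies
edc.datasource.policies.name=edc
edc.datasource.policies.password=...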
1.2 Using custom datasource in stores
Using a custom datasource in a store can be done by configuring the setting:
edc.sql.store.<store-context>.datasource=<datasource-name>
Note that <store-context> can be an arbitrary string, but it is recommended to use a descriptive name. For example, the SqlPolicyStoreExtension defines a data source name as follows:
@Extension("SQL policy store")
public class SqlPolicyStoreExtension implements ServiceExtension {
@Setting(value = "The datasource to be used", defaultValue = DataSourceRegistry.DEFAULT_DATASOURCE)
public static final String DATASOURCE_NAME = "edc.sql.store.policy.datasource";
@Override
public void initialize(ServiceExtensionContext context) {
var datasourceName = context.getConfig().getString(DATASOURCE_NAME, DataSourceRegistry.DEFAULT_DATASOURCE);
//...
}
}
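With this setting, the policy store could then be pointed at the illustrative policies datasource defined earlier:
edc.sql.store.policy.datasource=policies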
2. SQL Statement abstraction
EDC does not use any sort of Object-Relational Mapper (ORM), which would automatically translate Java object graphs to SQL statements. Instead, EDC uses pre-canned parameterized SQL statements.
We typically distinguish between literals such as table names or column names and “templates”, which are SQL statements such as INSERT. Both are declared as getters in an interface that extends the SqlStatements interface, with literals being default methods and templates being implemented by a BaseSqlDialectStatements class.
A simple example could look like this:
public class BaseSqlDialectStatements implements SomeEntityStatements {

    @Override
    public String getDeleteByIdTemplate() {
        return executeStatement().delete(getSomeEntityTable(), getIdColumn());
    }

    @Override
    public String getUpdateTemplate() {
        return executeStatement()
                .column(getIdColumn())
                .column(getSomeStringFieldColumn())
                .column(getCreatedAtColumn())
                .update(getSomeEntityTable(), getIdColumn());
    }

    //...
}
Note that the example makes use of the SqlExecuteStatement utility class, which should be used to construct all SQL statements - except queries. Queries are special in that they have a highly dynamic aspect to them. For more information, please read on in this chapter.
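For completeness, the statements interface that declares these getters might look roughly like this - a sketch with hypothetical table and column names, not the actual EDC interface:
public interface SomeEntityStatements extends SqlStatements {

    // literals: table and column names, declared as default methods
    default String getSomeEntityTable() {
        return "edc_some_entity";
    }

    default String getIdColumn() {
        return "id";
    }

    default String getSomeStringFieldColumn() {
        return "some_string_field";
    }

    default String getCreatedAtColumn() {
        return "created_at";
    }

    // templates: implemented by BaseSqlDialectStatements
    String getDeleteByIdTemplate();

    String getUpdateTemplate();
}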
As a general rule of thumb, issuing multiple statements (within one transaction) should be preferred over writing complex nested statements. It is very easy to inadvertently create an inefficient or wasteful statement that causes high resource load on the database server. The latency that is introduced by sending multiple statements to the DB server is likely negligible in comparison, especially because EDC is architected towards reliability rather than latency.
3. Querying PostgreSQL databases
Generally speaking, the basis for all queries is a QuerySpec object. This means that at some point a QuerySpec must be translated into an SQL SELECT statement. The place to do this is the SqlStatements implementation, often called BaseSqlDialectStatements:
@Override
public SqlQueryStatement createQuery(QuerySpec querySpec) {
    var select = "SELECT * FROM %s".formatted(getSomeEntityTable());
    return new SqlQueryStatement(select, querySpec, new SomeEntityMapping(this), operatorTranslator);
}
Now, there are a few things to unpack here:
- the SELECT statement serves as starting point for the query
- individual WHERE clauses get added by parsing the filterExpression property of the QuerySpec
- LIMIT and OFFSET clauses get appended based on QuerySpec#offset and QuerySpec#limit
- the SomeEntityMapping maps the canonical form onto the SQL literals
- the operatorTranslator is used to convert operators such as = or like into SQL operators
3.1 The canonical form
Theoretically it is possible to map every schema onto every other schema, given that they are of equal cardinality. To achieve that, EDC introduces the notion of a canonical form, which is our internal working schema for entities. In other words, this is the schema in which objects are represented internally. If we ever support a wider variety of translation and transformation paths, everything would have to be transformed into that canonical format first.
In actuality, the canonical form of an object is defined by the Java class and its field names. For instance, a query for contract negotiations must be specified using the field names of a ContractNegotiation object:
public class ContractNegotiation {
    // ...
    private ContractAgreement contractAgreement;
    // ...
}

public class ContractAgreement {
    // ...
    private final String assetId;
}
Consequently, contractAgreement.assetId would be valid, whereas contract_agreement.asset_id would be invalid. In other words, the left-hand operand looks as if we were traversing the Java object graph. This is what we call the canonical form. Note the omission of the root object contractNegotiation!
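For illustration, such a filter could be expressed in code roughly like this (a minimal sketch, assuming the QuerySpec and Criterion types from the EDC query SPI and an arbitrary asset ID):
// the left-hand operand traverses the Java object graph, omitting the root object
var querySpec = QuerySpec.Builder.newInstance()
        .filter(List.of(new Criterion("contractAgreement.assetId", "=", "asset-1")))
        .build();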
3.2 Translation Mappings
Translation mappings are EDC's way to map a QuerySpec to SQL statements. At their core, they contain a Map from the Java entity field names to the related SQL column names.
In order to decouple the canonical form from the SQL schema (or any other database schema), a mapping scheme exists to map the canonical model onto the SQL model. This TranslationMapping is essentially a graph-like metamodel of the entities: every Java entity has a related mapping class that contains its field names and the associated SQL column names. The convention is to append *Mapping to the class name, e.g. PolicyDefinitionMapping.
3.2.1 Mapping primitive fields
Primitive fields are stored directly as columns in SQL tables. Thus, mapping primitive data types is trivial: a simple mapping from one onto the other is all that is necessary. For example, ContractNegotiation.counterPartyAddress would be represented in the ContractNegotiationMapping as an entry
"counterPartyAddress"->"counterparty_address"
When constructing WHERE/AND clauses, the canonical property is simply replaced by the respective SQL column name.
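For illustration, such an entry could be registered in the mapping class roughly like this (a sketch; the constructor parameter type and the column getter are assumptions, not the actual EDC code):
public ContractNegotiationMapping(ContractNegotiationStatements statements) {
    // maps the canonical field name onto the SQL column name
    add("counterPartyAddress", statements.getCounterPartyAddressColumn());
    // ...
}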
3.2.2 Mapping complex objects
For fields that are of a complex type, such as the ContractNegotiation.contractAgreement field, the mapping must accommodate this, depending on how the relational data model is defined. There are two basic variants we use:
Option 1: using foreign keys
In this case, the referenced object is stored in a separate table using a foreign key relation. Thus, the canonical property (contractAgreement) is mapped onto the SQL schema using another *Mapping class. Here, this would be the ContractAgreementMapping. When resolving a property in the canonical format (contractAgreement.assetId), this means we must recursively descend into the model graph and resolve the correct SQL expression.
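Conceptually, this is achieved by registering another mapping object for the complex field, roughly like this (a sketch with assumed names):
// inside ContractNegotiationMapping (sketch): the nested mapping allows
// "contractAgreement.assetId" to be resolved recursively
add("contractAgreement", new ContractAgreementMapping(statements));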
Note: mapping one-to-many relations (= arrays/lists) with foreign keys is not implemented at this time.
Option 2a: encoding the object
Another popular way to store complex objects is to encode them in JSON and store them in a VARCHAR column. In PostgreSQL we use the specific JSON type instead of VARCHAR. For example, the TransferProcess is stored in a table called edc_transfer_process, and its DataAddress property is encoded in JSON and stored in a JSON field.
Querying for TransferProcess objects: when mapping the filter expression contentDataAddress.properties.somekey=somevalue, the contentDataAddress is represented as JSON, therefore in the TransferProcessMapping the contentDataAddress field maps to a JsonFieldTranslator:
public TransferProcessMapping(TransferProcessStoreStatements statements) {
    // ...
    add(FIELD_CONTENTDATAADDRESS, new JsonFieldTranslator(statements.getContentDataAddressColumn()));
    // ...
}
which would then get translated to:
SELECT *
FROM edc_transfer_process
-- omit LEFT OUTER JOIN for readability
WHERE content_data_address -> 'properties' ->> 'somekey' = 'somevalue'
Note that JSON queries are specific to PostgreSQL and are not portable to other database technologies!
Option 2b: encoding lists/arrays
Like accessing objects, accessing lists/arrays of objects is possible using special JSON operators. In this case the special Postgres function json_array_elements() is used. Please refer to the official documentation.
For an example of how this is done, please look at how the TransferProcessMapping maps a ResourceManifest, which in turn contains a List<ResourceDefinition>, using the ResourceManifestMapping.
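To give a rough idea, a query filtering on an element of such a JSON-encoded list could look like this (a sketch with assumed column and property names, not the statement actually generated by EDC):
SELECT *
FROM edc_transfer_process,
     json_array_elements(resource_manifest -> 'definitions') AS definition
WHERE definition ->> 'id' = 'some-definition-id'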