Best practices and recommendations
1. Preface
This document gives guidelines and recommendations to developers who want to use or extend EDC or EDC modules in their applications, and to DevOps engineers who are tasked with packaging and operating EDC modules as runnable application images.
Please understand this document as a recommendation from the EDC project committers team that they compiled to the best of their knowledge. We realize that use case scenarios are plentiful and requirements vary, and not every best practice is applicable everywhere. You know your use case best.
This document is not an exhaustive list of prescribed steps that will shield adopters from any conceivable harm or danger, but rather should serve as a starting point for engineers to build upon.
Finally, it should go without saying that the software of the EDC project is distributed “as is” and committers of EDC take no responsibility or liability, direct or indirect, for any harm or damage caused by the use of it. This document does not change that.
2. Security recommendations
2.1 Exposing APIs to the internet
The EDC code base has several outward-facing APIs, exclusively implemented as HTTP/REST endpoints. These have different purposes, different intended consumers and thus different security implications.
As a general rule, APIs should not be exposed directly to the internet. That does not mean that they shouldn’t be accessible via the internet; obviously, the connector and related components cannot work without a network connection. It only means that API endpoints should not face the internet directly; instead, appropriate infrastructure should be placed in front of them.
It also means that we advise extreme caution when making APIs accessible via the internet - by default only the DSP API and the data plane’s public API should be accessible via the internet, the others (management API, signaling API,…) are intended only for local network access, e.g. within a Kubernetes cluster.
Corporate security policies might require that only HTTPS/TLS connections be used, even between pods in a Kubernetes cluster. While the EDC project makes no argument pro or contra, that is certainly an idea worth considering in high security environments.
The key take-away is that all of EDC’s APIs - if accessible outside the local network - should only be accessible through separate software components such as API gateways or load balancers. These are specialized tools with the sole purpose of performing authentication, authorization, rate limiting, IP blacklisting/whitelisting etc.
There is a plethora of ready-made components available, both commercial and open-source; therefore, the EDC project will not provide that functionality. Feature requests and issues to that effect will be ignored.
In the particular case of the DSP API, the same principle holds, although with the exception of authentication and authorization. That is handled by the DSP protocol itself.
We have a rudimentary token-based API security module available, which can be used to secure the connection API gateway <-> connector if so desired. It should be noted that it is not designed to act as an ingress point!
TL;DR: don’t expose any APIs if you can help it, but if you must, use available tools to harden the ingress.
2.2 Use only official TLS certificates/CAs
Typically, JVMs ship with trust stores that contain a number of widely accepted CAs. Any attempts to package additional CAs/certificates with runtime base images are discouraged, for the following reasons:
- scalability: in heterogeneous networks one cannot assume such a custom CA to be accepted by the counterparty
- maintainability: TLS certificates expire, so there is a chance that mandatory software rollouts become necessary because of expired certificates lest the network breaks down completely.
- security: there have been a number of issues with CAs (1, 2), so adding non-sanctioned ones brings a potential security weakness
2.3 Use appropriate network infrastructure
As discussed earlier, EDC does not (and will not) provide or implement tooling to harden network ingress, as that is an orthogonal concern, and there are tools better suited for that.
We encourage every connector deployment to plan and design their network layout and infrastructure right from the onset, before even writing code. Adding that later can be difficult and time-consuming.
For example, in Kubernetes deployments, which are the de-facto industry standard, networking can be taken on by ingress controllers and load balancers. Additional external infrastructure, such as API gateways, is recommended to handle authentication, authorization and request throttling.
2.4 A word on authentication and authorization
EDC does not have a concept of a “user account” as many client-facing applications do. In terms of identity, the connector itself represents a participant in a dataspace, so that is the level of granularity the connector operates on. That means that client-consumable APIs such as the Management API only have rudimentary security. This is by design, and finer-grained access control must be solved out-of-band.
The reasoning behind this is that requirements for authentication and authorization are so diverse and heterogeneous that it is virtually impossible for the EDC project to satisfy them all, or even most of them. In addition, there is very mature software available that is designed for this very use case.
Therefore, adopters of EDC have two options to consider:
- develop a custom AuthenticationService (or even a ContainerRequestFilter) that integrates with an IDP
- use a dedicated API gateway (recommended)
Both these options are viable, and may have merit depending on the use case.
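The first option could look roughly like the sketch below. To keep the example compilable on its own, AuthenticationService is redeclared here as a simplified local stand-in; the real EDC interface, its package and the way it is registered should be taken from the EDC version in use. The class IdpTokenAuthenticationService and the tokenValidator predicate, which stands in for the actual call to the IDP, are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Simplified local stand-in for EDC's AuthenticationService
// (assumption: the real interface may differ between EDC versions).
interface AuthenticationService {
    boolean isAuthenticated(Map<String, List<String>> headers);
}

// Hypothetical implementation that delegates token validation to an IDP.
class IdpTokenAuthenticationService implements AuthenticationService {
    private static final String BEARER_PREFIX = "Bearer ";

    // Stands in for the IDP call, e.g. a signature check or token introspection.
    private final Predicate<String> tokenValidator;

    IdpTokenAuthenticationService(Predicate<String> tokenValidator) {
        this.tokenValidator = tokenValidator;
    }

    @Override
    public boolean isAuthenticated(Map<String, List<String>> headers) {
        if (headers == null) {
            return false;
        }
        return headers.entrySet().stream()
                // HTTP header names are case-insensitive
                .filter(e -> "Authorization".equalsIgnoreCase(e.getKey()))
                .filter(e -> e.getValue() != null)
                .flatMap(e -> e.getValue().stream())
                .filter(v -> v != null && v.startsWith(BEARER_PREFIX))
                .map(v -> v.substring(BEARER_PREFIX.length()).trim())
                // Accept the request only if the IDP considers a token valid
                .anyMatch(token -> !token.isEmpty() && tokenValidator.test(token));
    }
}
```

Requests without an Authorization header, or with a token the validator rejects, are treated as unauthenticated. A dedicated API gateway remains the recommended option; a sketch like this only makes sense when the gateway cannot cover the requirement.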
2.5 Docker builds
As Docker is a very popular method to build and ship applications, we put forward the following recommendations:
- use official Eclipse Temurin base images for Java
- use dedicated non-root users: in your Dockerfile, add the following lines
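    # name and numeric ID of the unprivileged user
    ARG APP_USER=docker
    ARG APP_UID=10100
    RUN addgroup --system "$APP_USER"
    RUN adduser \
         --shell /sbin/nologin \
         --disabled-password \
         --gecos "" \
         --ingroup "$APP_USER" \
         --no-create-home \
         --uid "$APP_UID" \
         "$APP_USER"
    # run all subsequent instructions and the container itself as that user
    USER "$APP_USER"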
2.6 Use proper database security
Database connections are secured with a username and a password. Please choose non-default users and strong passwords. In addition, database credentials should be stored in an HSM (vault).
Further, the roles of the technical user for the connector should be limited to SELECT, INSERT, UPDATE, and DELETE. There is no reason for that user to have permissions to modify databases, tables, permissions or execute other DDL statements.
2.7 Store sensitive data in a vault
While the default behaviour of EDC is that configuration values are taken either from environment variables, system properties or from configuration extensions, it is highly recommended to store sensitive data in a vault when developing EDC extensions.
Here is a (non-exhaustive) list of examples of such sensitive values:
- database credentials
- cryptographic keys, e.g. private keys in an asymmetric key pair
- symmetric keys
- API keys/tokens
- credentials for other third-party services, even if temporary
Sensitive values should not be passed through multiple layers of code. Instead, they should be referenced by their alias, and be resolved from the vault wherever they are used. Do not store sensitive data as class members but use local variables that are garbage-collected when leaving execution scope.
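A minimal sketch of this pattern follows. The Vault interface is redeclared locally as a simplified stand-in that models only the lookup by alias, so it should not be read as the exact EDC API. ThirdPartyClient, authorizationHeader and the alias used in the usage note are hypothetical.

```java
import java.util.Map;
import java.util.Objects;

// Simplified local stand-in for EDC's Vault
// (assumption: only the lookup by alias is modelled here).
interface Vault {
    String resolveSecret(String alias);
}

// Hypothetical component that needs an API key to call a third-party service.
class ThirdPartyClient {
    // Only the vault reference and the alias are kept as fields, never the secret.
    private final Vault vault;
    private final String apiKeyAlias;

    ThirdPartyClient(Vault vault, String apiKeyAlias) {
        this.vault = Objects.requireNonNull(vault);
        this.apiKeyAlias = Objects.requireNonNull(apiKeyAlias);
    }

    // Builds the Authorization header value for a single request.
    String authorizationHeader() {
        // The secret is resolved at the point of use and held in a local variable only.
        String apiKey = vault.resolveSecret(apiKeyAlias);
        if (apiKey == null) {
            throw new IllegalStateException("No secret found for alias '" + apiKeyAlias + "'");
        }
        return "Bearer " + apiKey;
    }
}
```

With a map-backed vault for testing, `new ThirdPartyClient(secrets::get, "api-key-alias").authorizationHeader()` resolves the secret on every call, so a rotated secret is picked up without restarting the component.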
3. General recommendations
3.1 Use only official releases
We recommend using only official releases of our components. The latest version can be obtained from the project’s GitHub releases page and the modules are available from Maven Central.
Snapshots are less stable, less tested and less reliable than release versions and they make for non-repeatable builds.
That said, we realize that sometimes living on the bleeding edge of technology is thrilling, or in some circumstances even necessary. EDC components publish a -SNAPSHOT build on every commit to the main branch, so there could be several such builds per day, each overwriting the previous one. In addition, we publish nightly builds that are versioned <VERSION>-<YYYYMMDD>-SNAPSHOT, and those don’t get overwritten. For more information please refer to the respective documentation.
3.2 Dependency hygiene
It should be at the top of every software engineer’s todo list to keep application dependencies current, to avoid security issues, minimize technical debt and prevent difficult upgrade paths. We strongly recommend using a tool to keep dependencies up-to-date, or at least notify when a new version is out.
This is especially true for EDC versions. Since the project has not yet reached a state of equilibrium, where we can follow SemVer rules, major (potentially breaking) changes and incompatibilities are to be expected on every version increment.
Internally we use Dependabot to maintain our dependencies, as it is well integrated with GitHub Actions, but this is not an endorsement. Alternatives exist.
3.3 Use database persistence wherever possible
While the connector runtime provides in-memory persistence by default, it is recommended to use database persistence in production scenarios, if possible. Hosting the persistence of several modules (e.g. AssetIndex and PolicyDefinitionStore) in the same database is generally OK.
This is because although memory stores are fast and easy to use, they have certain drawbacks, for instance:
- clustered deployments: multiple replicas don’t share the same data, thus they would operate on inconsistent data
- security: if an attacker is able to create a memdump of the pod, they gain access to all application data
- memory consumption: Kubernetes has no memory limits out-of-the-box, so depending on the amount of data that is stored by a connector, this could cause runtime problems when databases start to grow, especially on resource constrained deployments.
3.4 Use proper Vault implementations
Similar to the previous section, proper HSM (Vault) implementations should be used in all but the most basic test and demo scenarios. Vaults are used to store the most sensitive information, and by default EDC provides only an in-memory variant.
3.5 Use UUIDs as object identifiers
While we don’t enforce any particular shape or form for object identifiers, we recommend using UUIDs because they are reasonably unique, reasonably compact, and reasonably available on most tech stacks. Use the JDK UUID implementation. It’s good enough.
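As an illustration, generating such an identifier with the JDK takes a single call. The helper class ObjectIds and its method names are hypothetical and serve only to frame the example.

```java
import java.util.UUID;

// Hypothetical helper for creating and checking object identifiers.
class ObjectIds {
    // Returns a random (version 4) UUID in its canonical 36-character string form.
    static String newId() {
        return UUID.randomUUID().toString();
    }

    // Returns true if UUID.fromString accepts the string. Note that the JDK
    // parser is lenient about the length of the individual hex groups.
    static boolean isUuid(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            UUID.fromString(candidate);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

The resulting string can be used wherever an object identifier is expected, for example as the ID of an asset or policy definition.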