Contributors Manual

0. Intended audience

This document is aimed at software developers who have already read the adopter documentation and want to contribute code to the Eclipse Dataspace Components project.

Its purpose is to explain the core concepts of EDC in greater detail. After reading it, readers should have a good understanding of EDC’s inner workings, implementation details and some of the advanced concepts.

So if you are a solution architect looking for a high-level description of how to integrate EDC, or a software engineer who wants to use EDC in their project, then this guide is not for you. More suitable resources can be found here and here respectively.

1. Getting started

1.1 Prerequisites

This document presumes a good understanding of, and proficiency in, the following technical areas:

  • JSON and JSON-LD
  • HTTP/REST
  • relational databases (PostgreSQL) and transaction management
  • git and git workflows

Further, the following tools are required:

  • Java Development Kit 17+
  • Gradle 8+
  • a POSIX compliant shell (bash, zsh,…)
  • a text editor
  • CLI tools like curl and git

This guide uses CLI tools as the common denominator, but in many cases graphical alternatives exist (e.g. Postman, Insomnia, database clients, etc.), and most developers will likely use IDEs like IntelliJ or VSCode. We are of course aware of them and absolutely recommend their use, but we simply cannot cover and explain every possible combination of OS, tool and tool version.

Note that Windows is not a supported OS at the moment. If Windows is a must, we recommend using WSL2 or setting up a Linux VM.

1.2 Terminology

  • runtime: a Java process executing code written in the EDC programming model (e.g. a control plane)
  • distribution: a specific combination of modules, compiled into a runnable form, e.g. a fat JAR file, a Docker image etc.
  • launcher: a runnable Java module that pulls in other modules to form a distribution. “Launcher” and “distribution” are sometimes used synonymously
  • connector: a control plane runtime and 1…N data plane runtimes. Sometimes used interchangeably with “distribution”.
  • consumer: a dataspace participant who wants to ingest data under the access rules imposed by the provider
  • provider: a dataspace participant who offers data assets under a set of access rules

1.3 Architectural and coding principles

When EDC was originally created, there were a few fundamental architectural principles around which we designed and implemented all dataspace components. These include:

  • asynchrony: all external mutations of internal data structures happen in an asynchronous fashion. While the REST requests that trigger the mutations may still be synchronous, the actual state changes happen in an asynchronous and persistent way. For example, starting a contract negotiation through the API only returns the negotiation’s ID, and the control plane then cyclically advances the negotiation’s state (see the sketch after this list).
  • single-thread processing: the control plane is designed around a set of sequential state machines that employ pessimistic locking to guard against race conditions and other problems.
  • idempotency: requests that do not trigger a mutation are idempotent. The same is true when provisioning external resources.
  • error-tolerance: the design goal of the control plane was to favor correctness and reliability over (low) latency. That means that even if a communication partner is temporarily unreachable due to a transient error, the control plane is designed to cope with that error and attempt to overcome it.
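
To illustrate the asynchrony principle, here is a minimal sketch that starts a contract negotiation via the Management API and then polls its state until the control plane’s state machine has advanced it to a terminal state. Host, port, paths, request body and response handling are simplified assumptions for illustration, not the authoritative Management API contract.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NegotiationPollingExample {

    private static final String MANAGEMENT_API = "http://localhost:8181/management";

    public static void main(String[] args) throws Exception {
        var client = HttpClient.newHttpClient();

        // hypothetical request body - the real Management API expects a JSON-LD contract request
        var requestBody = "{ \"...\": \"...\" }";

        var initiate = HttpRequest.newBuilder(URI.create(MANAGEMENT_API + "/v3/contractnegotiations"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody))
                .build();

        // the response only contains the negotiation's ID - the negotiation itself is not done yet
        var response = client.send(initiate, HttpResponse.BodyHandlers.ofString()).body();
        var negotiationId = extractId(response);

        // poll until the state machine has advanced the negotiation to a terminal state
        var stateRequest = HttpRequest.newBuilder(
                        URI.create(MANAGEMENT_API + "/v3/contractnegotiations/" + negotiationId + "/state"))
                .GET()
                .build();
        String state;
        do {
            Thread.sleep(1_000);
            state = client.send(stateRequest, HttpResponse.BodyHandlers.ofString()).body();
        } while (!state.contains("FINALIZED") && !state.contains("TERMINATED"));
    }

    private static String extractId(String json) {
        // naive placeholder - use a proper JSON parser in real code
        var start = json.indexOf("\"@id\"");
        var firstQuote = json.indexOf('"', json.indexOf(':', start) + 1);
        var secondQuote = json.indexOf('"', firstQuote + 1);
        return json.substring(firstQuote + 1, secondQuote);
    }
}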

Prospective contributors to the Eclipse Dataspace Components are well-advised to follow these principles and build their applications around them.

There are other, less technical principles of EDC, such as simplicity and self-containedness. We are extremely careful when adding third-party libraries or technologies, in order to maintain a simple, fast and unopinionated platform.

Take a look at our coding principles and our styleguide.

2. The control plane

Simply put, the control plane is the brains of a connector. Its tasks include handling protocol and API requests, managing various internal asynchronous processes, validating policies, performing participant authentication and delegating the data transfer to a data plane. Its job is to handle (almost) all business logic. For that, it is designed to favor reliability over low latency. It does not directly transfer data from source to destination.

The primary way to interact with a connector’s control plane is through the Management API; all relevant Java modules are located at extensions/control-plane/api/management-api.

2.1 Entities

Detailed documentation about entities can be found here.

2.2 Programming Primitives

This chapter describes the fundamental architectural and programming paradigms that are used in EDC. Typically, they are not related to one single extension or feature area; rather, they are of an overarching character.

Detailed documentation about programming primitives can be found here.

2.3 Serialization via JSON-LD

JSON-LD is a JSON-based format for serializing Linked Data that allows adding specific “context” to data expressed as JSON. JSON-LD 1.0 has been a W3C Recommendation since 2014.

Detailed information about how JSON-LD is used in EDC can be found here.

2.4 Extension model

One of the principles EDC is built around is extensibility. This means that by simply putting a Java module on the classpath, the code in it will be used to enrich and influence the runtime behaviour of EDC. For instance, contributing additional data persistence implementations can be achieved this way. Such a module is sometimes also referred to as a “plugin”.
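
As a minimal sketch (the class and log message are made up for this example, and package names may differ slightly between EDC versions), such an extension module contains a class implementing ServiceExtension and registers it for ServiceLoader discovery via a resources/META-INF/services/org.eclipse.edc.spi.system.ServiceExtension file containing the class’s fully qualified name:

import org.eclipse.edc.runtime.metamodel.annotation.Extension;
import org.eclipse.edc.spi.system.ServiceExtension;
import org.eclipse.edc.spi.system.ServiceExtensionContext;

@Extension(value = MySampleExtension.NAME)
public class MySampleExtension implements ServiceExtension {

    public static final String NAME = "My Sample Extension";

    @Override
    public String name() {
        return NAME;
    }

    @Override
    public void initialize(ServiceExtensionContext context) {
        // runs once at startup - this is where the extension contributes its behaviour
        context.getMonitor().info("Initialized " + NAME);
    }
}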

Detailed documentation about the EDC extension model can be found here.

2.5 Dependency injection deep dive

In EDC, dependency injection is available to inject services into extension classes (implementors of the ServiceExtension interface). The ServiceExtensionContext acts as the service registry, and since it’s not quite an IoC container, we’ll refer to it simply as the “context” in this chapter.
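
The basic pattern is sketched below: a field annotated with @Inject is resolved from the context before initialize() is called, and a method annotated with @Provider contributes a service back to the context so other extensions can inject it. The GreetingService interface is made up for this example, annotation package names may vary between EDC versions, and injecting a Clock assumes a default Clock service is registered.

import org.eclipse.edc.runtime.metamodel.annotation.Inject;
import org.eclipse.edc.runtime.metamodel.annotation.Provider;
import org.eclipse.edc.spi.system.ServiceExtension;
import org.eclipse.edc.spi.system.ServiceExtensionContext;

import java.time.Clock;

public class MyDiExtension implements ServiceExtension {

    // resolved from the context before initialize() is invoked; startup fails if the
    // service cannot be resolved and the injection is not marked as optional
    @Inject
    private Clock clock;

    @Override
    public void initialize(ServiceExtensionContext context) {
        context.getMonitor().info("Extension initialized at " + clock.instant());
    }

    // contributes a service to the context so that other extensions can @Inject it
    @Provider
    public GreetingService greetingService() {
        return name -> "Hello, " + name;
    }

    public interface GreetingService {
        String greet(String name);
    }
}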

Detailed documentation about the EDC dependency injection mechanism can be found here.

2.6 Service layers

Like many other applications and application frameworks, EDC is built upon a vertically oriented set of different layers that we call “service layers”.

Detailed documentation about the EDC service layers can be found here.

2.7 Policy Monitor

The policy monitor is a component that watches over ongoing transfers and ensures that the policies associated with them are still valid.

Detailed documentation about the policy monitor can be found here.

2.8 Protocol extensions (DSP)

This chapter describes how EDC abstracts the interaction between connectors in a Dataspace through protocol extensions and introduces the current default implementation which follows the Dataspace protocol specification.

Detailed documentation about protocol extensions can be found here.

3. (Postgre-)SQL persistence

PostgreSQL is a very popular open-source database with a large community and broad vendor adoption. It is also EDC’s data persistence technology of choice.

Every store in EDC that is intended to persist state comes with two out-of-the-box implementations:

  • in-memory
  • sql (PostgreSQL dialect)

By default, the in-memory stores are provided by dependency injection; the SQL variants can be used by simply adding the relevant extensions (e.g. asset-index-sql, contract-negotiation-store-sql, …) to the classpath.

Detailed documentation about EDC’s PostgreSQL implementations can be found here.

4. The data plane

4.1 Data plane signaling

Data Plane Signaling (DPS) is the communication protocol that is used between control planes and data planes. Detailed information about it and other topics such as data plane self-registration and public API authentication can be found here.

4.2 Writing a custom data plane extension (sink/source)

The EDC Data Plane is built on top of the Data Plane Framework (DPF), which can be used for building custom data planes. The framework has extensibility points for supporting different data sources and sinks (e.g. S3, HTTP, Kafka) and can perform direct streaming between different source and sink types.

Detailed documentation about writing a custom data plane extension can be found here.

4.3 Writing a custom data plane (using only DPS)

Since the communication between control plane and data plane is well defined in the DPS protocol, it’s possible to write a data plane from scratch (without using EDC and DPF) and make it work with the EDC control plane.

Detailed documentation about writing a custom data plane can be found here.

5. Development best practices

5.1 Writing Unit-, Component-, Integration-, Api-, EndToEnd-Tests

EDC’s testing strategy follows the well-known test pyramid. Like any other project, EDC has established a set of recommendations and rules that contributors must adhere to in order to guarantee smooth collaboration with the project. Note that familiarity with our formal contribution guidelines is assumed. There are additional recommendations we have compiled that are relevant when deploying and administering EDC instances.

5.2 Coding best practices

Code should be written to conform with the EDC style guide and our coding principles.

A frequent subject of critique in pull requests is logging. Spurious and very verbose log lines like “Entering/Leaving method X” or “Performing action Z” should be avoided because they pollute the log output and don’t contribute any value.
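
To make the distinction concrete, here is a rough sketch using EDC’s Monitor service (package name and method signatures as in recent EDC versions; the surrounding class and identifiers are made up for this example):

import org.eclipse.edc.spi.monitor.Monitor;

class LoggingStyleExample {

    void onTransferStarted(Monitor monitor, String transferProcessId) {
        // avoid: spurious trace-style lines that add no information
        monitor.debug("Entering method onTransferStarted");

        // prefer: concise lines that carry the relevant context
        monitor.info("Transfer process %s moved to state STARTED".formatted(transferProcessId));
    }

    void onCounterPartyUnreachable(Monitor monitor, String counterPartyAddress, int attempt, Exception failure) {
        // include identifiers and the cause, so the line is actionable when debugging
        monitor.warning("Could not reach counter-party at %s (attempt %d), will retry".formatted(counterPartyAddress, attempt), failure);
    }
}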

Please find detailed information about logging here.

5.3 Testing best practices

Every class in the EDC code base should have a test class that verifies the correct functionality of the code.
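
As a minimal illustration, a unit test for a (made-up) class could look like this, using JUnit 5 and AssertJ:

import org.junit.jupiter.api.Test;

import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;

class PercentageCalculatorTest {

    private final PercentageCalculator calculator = new PercentageCalculator();

    @Test
    void calculate_shouldReturnRatioAsPercentage() {
        assertThat(calculator.calculate(25, 200)).isEqualTo(12.5);
    }

    @Test
    void calculate_shouldThrowException_whenTotalIsZero() {
        assertThatThrownBy(() -> calculator.calculate(1, 0))
                .isInstanceOf(IllegalArgumentException.class);
    }

    // hypothetical class under test, defined here only to keep the example self-contained
    static class PercentageCalculator {
        double calculate(double part, double total) {
            if (total == 0) {
                throw new IllegalArgumentException("total must not be zero");
            }
            return part / total * 100;
        }
    }
}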

Detailed information about testing can be found here.

5.4 Other best practices

Please find general best practices and recommendations here.

6. Further concepts

6.1 Autodoc

In EDC there is an automated way to generate basic documentation about extensions, plug points, SPI modules and configuration settings. To achieve this, simply annotate the respective elements directly in Java code:

@Extension(value = "Some supercool extension", categories = {"category1", "category2"})
public class SomeSupercoolExtension implements ServiceExtension {

  // default value -> not required
  @Setting(value = "Some string config property", type = "string", defaultValue = "foobar", required = false)
  public static final String SOME_STRING_CONFIG_PROPERTY = "edc.some.supercool.string";

  //no default value -> required
  @Setting(value = "Some numeric config", type = "integer", required = true)
  public static final String SOME_INT_CONFIG_PROPERTY = "edc.some.supercool.int";

  // ...
}

During compilation, the EDC build plugin generates documentation for each module as structured JSON.

Detailed information about autodoc can be found here.

6.2 Adapting the Gradle build

The EDC build process is based on Gradle and as such uses several plugins to customize the build and centralize certain functionality. One of these plugins has already been discussed in the previous chapter. All of EDC’s plugins are hosted in the GradlePlugins repository.

The most important plugin is the “EDC build” plugin. It consists essentially of these things (a simplified sketch follows the list):

  • a plugin class: implements Plugin<Project> from the Gradle API to hook into the Gradle task infrastructure
  • extensions: POJOs that serve as model classes for configuration
  • conventions: individual mutations that are applied to the project. For example, we use conventions to add some standard repositories to all projects, or to implement publishing to OSSRH and MavenCentral in a generic way.
  • tasks: executable Gradle tasks that perform a certain action, like merging OpenAPI Specification documents.
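
For orientation, a heavily simplified plugin class might look like the following. This is not the actual EDC build plugin, only a sketch of the pattern using the Gradle Java API: the plugin class applies a convention (adding a standard repository) and registers a task.

import org.gradle.api.Plugin;
import org.gradle.api.Project;

public class SampleBuildPlugin implements Plugin<Project> {

    @Override
    public void apply(Project project) {
        // convention: a mutation applied to every project the plugin is applied to
        project.getRepositories().mavenCentral();

        // task: an executable unit of work hooked into Gradle's task infrastructure
        project.getTasks().register("printModuleName", task ->
                task.doLast(t -> System.out.println("Module: " + project.getName())));
    }
}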

It is important to note that a Gradle build is separated into phases, namely Initialization, Configuration and Execution (see documentation). Some of our conventions, as well as other plugins, have to be applied in the Configuration phase.

6.3 The EDC Release process

Generally speaking, EDC publishes -SNAPSHOT build artifacts to OSSRH Snapshots and release build artifacts to MavenCentral.

We further distinguish our artifacts into “core” modules and “technology” modules. The former consists of the Connector, IdentityHub and FederatedCatalog as well as the RuntimeMetamodel and the aforementioned GradlePlugins. The latter comprises technology-specific implementations of core SPIs, for example cloud-based object storage or Vault implementations.

6.3.1 Releasing “core” modules

The build processes for the two module classes are separated from one another. All modules in the “core” class are published under the same Maven group-id org.eclipse.edc. This makes it necessary to publish them all at the same time, because once publishing of an artifact of a certain group-id is completed, no further artifacts with the same group-id can be published anymore.

That means that we cannot publish the Connector repository, then the IdentityHub repository and finally the FederatedCatalog repository, because by the time we get to IdentityHub, the publishing of Connector would already be complete and the publishing of IdentityHub would fail.

The way to get around this limitation is to merge all “core” modules into one big root project, where the project structure is synthesized and contains all “core” modules as subprojects, and to publish the entire root project. The artifact names remain unchanged.

This functionality is implemented in the Release repository, which also contains GitHub Actions workflows to publish snapshots, nightly builds and release builds.

6.3.2 Releasing “technology” modules

Building and publishing releases for “technology” modules is much simpler, because they do not have to be built together with any other repository. With them, we can employ a conventional build-and-publish approach.