Dataspaces
A brief introduction to what a dataspace is and how it relates to EDC.
The quickest way to get started building with EDC is to work through the samples. The samples cover everything from basic scenarios involving sharing files to advanced streaming and large data use cases.
The EDC Minimum Viable Dataspace (MVD) sets up and runs a complete demonstration dataspace between two organizations. Its automated setup brings up the full dataspace environment in a few minutes.
EDC is architected as modules called extensions that can be combined and customized to create components that perform specific tasks. These components (the “C” in EDC) are not what is commonly referred to as “microservices.” Rather, EDC components may be deployed as separate services or collocated in a runtime process. This section provides a quick overview of the key EDC components.
The Connector is a pair of components that control data sharing and execute data transfer. These components are the Control Plane and Data Plane, respectively. In keeping with EDC’s modular design philosophy, connector components may be deployed in a single monolith (for simple use cases) or provisioned as clusters of individual services. It is recommended to separate the Control Plane and Data Plane so they can be individually managed and scaled.
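To make the idea of combining modules into components more concrete, here is a minimal sketch in plain Java. It does not use the EDC extension API: `SketchExtension`, `SketchRuntime`, and the two extension classes are hypothetical names invented for this illustration, and the "services" they register are just placeholder strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an extension: a named module that
// contributes services to whatever runtime loads it.
interface SketchExtension {
    String name();

    void initialize(Map<String, Object> services);
}

// Hypothetical runtime: a component is simply the set of extensions loaded together.
class SketchRuntime {
    private final List<SketchExtension> extensions = new ArrayList<>();
    private final Map<String, Object> services = new LinkedHashMap<>();

    SketchRuntime with(SketchExtension extension) {
        extensions.add(extension);
        return this;
    }

    // Initializes every extension in registration order and returns the registered services.
    Map<String, Object> boot() {
        for (SketchExtension extension : extensions) {
            extension.initialize(services);
        }
        return services;
    }

    // Names of the modules bundled into this runtime.
    List<String> moduleNames() {
        List<String> names = new ArrayList<>();
        for (SketchExtension extension : extensions) {
            names.add(extension.name());
        }
        return names;
    }
}

class ControlPlaneSketchExtension implements SketchExtension {
    public String name() {
        return "control-plane";
    }

    public void initialize(Map<String, Object> services) {
        services.put("contract-negotiation", "enabled"); // placeholder service
    }
}

class DataPlaneSketchExtension implements SketchExtension {
    public String name() {
        return "data-plane";
    }

    public void initialize(Map<String, Object> services) {
        services.put("data-transfer", "enabled"); // placeholder service
    }
}

class ModularitySketch {
    public static void main(String[] args) {
        // Collocated: both planes share one runtime process.
        SketchRuntime monolith = new SketchRuntime()
                .with(new ControlPlaneSketchExtension())
                .with(new DataPlaneSketchExtension());
        // Separated: the control plane runs in a runtime of its own.
        SketchRuntime controlPlaneOnly = new SketchRuntime()
                .with(new ControlPlaneSketchExtension());

        System.out.println(monolith.moduleNames() + " -> " + monolith.boot().keySet());
        System.out.println(controlPlaneOnly.moduleNames() + " -> " + controlPlaneOnly.boot().keySet());
    }
}
```

The same extension classes appear in both runtimes; only the bundling differs. That is the sense in which a connector can run as a single monolith or as individually scaled services.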
The Control Plane is responsible for creating contract agreements that grant access to data, managing data transfers, and monitoring usage policy compliance. For example, a data consumer’s Control Plane initiates a contract negotiation with a data provider’s connector. The negotiation is an asynchronous process that results in a contract agreement if approved. The consumer connector then uses the contract agreement to initiate a data transfer with the provider connector. A data transfer can be a one-shot (finite) transfer, such as a discrete set of data, or an ongoing (non-finite) data stream. The provider control plane can pause, resume, or terminate transfers in response to certain conditions, for example, if a contract agreement expires.
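The transfer lifecycle described above can be pictured as a small state machine. The following is a simplified sketch, not the EDC implementation: the class `SketchTransfer`, its state names, and the `enforceAgreement` method are assumptions made for illustration only.

```java
import java.time.Instant;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model of a transfer as seen by a provider control plane.
class SketchTransfer {
    // State names are assumptions chosen for this sketch.
    enum State { REQUESTED, STARTED, SUSPENDED, COMPLETED, TERMINATED }

    private final boolean finite;
    private State state = State.REQUESTED;

    SketchTransfer(boolean finite) {
        this.finite = finite;
    }

    State state() {
        return state;
    }

    void start() {
        move(EnumSet.of(State.REQUESTED), State.STARTED);
    }

    void pause() {
        move(EnumSet.of(State.STARTED), State.SUSPENDED);
    }

    void resume() {
        move(EnumSet.of(State.SUSPENDED), State.STARTED);
    }

    // Only a finite (one-shot) transfer reaches a natural end.
    void complete() {
        if (!finite) {
            throw new IllegalStateException("a non-finite stream has no natural end");
        }
        move(EnumSet.of(State.STARTED), State.COMPLETED);
    }

    void terminate() {
        move(EnumSet.of(State.REQUESTED, State.STARTED, State.SUSPENDED), State.TERMINATED);
    }

    // Terminates a still-active transfer once the contract agreement has expired.
    void enforceAgreement(Instant agreementExpiry, Instant now) {
        boolean active = state != State.COMPLETED && state != State.TERMINATED;
        if (active && now.isAfter(agreementExpiry)) {
            terminate();
        }
    }

    private void move(Set<State> allowedFrom, State target) {
        if (!allowedFrom.contains(state)) {
            throw new IllegalStateException("cannot go from " + state + " to " + target);
        }
        state = target;
    }
}

class TransferLifecycleSketch {
    public static void main(String[] args) {
        // An ongoing stream that is paused, resumed, then cut off when the agreement expires.
        SketchTransfer stream = new SketchTransfer(false);
        stream.start();
        stream.pause();
        stream.resume();
        stream.enforceAgreement(
                Instant.parse("2024-01-01T00:00:00Z"),  // hypothetical expiry
                Instant.parse("2024-06-01T00:00:00Z")); // hypothetical current time
        System.out.println("stream: " + stream.state());

        // A one-shot transfer that simply runs to completion.
        SketchTransfer file = new SketchTransfer(true);
        file.start();
        file.complete();
        System.out.println("file: " + file.state());
    }
}
```

In this sketch the stream ends in the `TERMINATED` state because the check runs after the agreement's expiry, while the one-shot transfer ends in `COMPLETED`. Rejecting illegal transitions with an exception is one design choice among several; a real control plane would more likely persist state and retry asynchronously.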
The Data Plane is responsible for executing data transfers, which are managed by the Control Plane. A Data Plane sends data using specialized technology such as a messaging system or data integration platform. EDC includes the Data Plane Framework (DPF) for building custom Data Planes. Alternatively, a Data Plane can be built using other languages or technologies and integrated with the EDC Control Plane by implementing the Data Plane Signaling API.
The Federated Catalog (FC) is responsible for crawling and caching data catalogs from other participants. The FC builds a local cache that can be queried or processed without resorting to complex distributed queries across multiple participants.
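The crawl-then-query-locally idea can be sketched in a few lines of Java. This is an illustration only and not the Federated Catalog API: `SketchCatalogCache`, the participant identifiers, and the asset names are all hypothetical, and a catalog is reduced to a list of asset names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical local cache of catalogs, keyed by participant identifier.
class SketchCatalogCache {
    private final Map<String, List<String>> cache = new LinkedHashMap<>();

    // Crawl: ask each participant for its catalog once and store a local copy.
    void crawl(List<String> participants, Function<String, List<String>> fetchCatalog) {
        for (String participant : participants) {
            cache.put(participant, new ArrayList<>(fetchCatalog.apply(participant)));
        }
    }

    // Query: answered from the local cache only, with no remote calls.
    List<String> findParticipantsOffering(String assetName) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : cache.entrySet()) {
            if (entry.getValue().contains(assetName)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}

class CatalogCacheSketch {
    public static void main(String[] args) {
        // Stand-in for remote catalog requests; a real crawler would call other participants.
        Map<String, List<String>> remote = new HashMap<>();
        remote.put("participant-a", Arrays.asList("weather-data", "traffic-data"));
        remote.put("participant-b", Arrays.asList("traffic-data"));

        SketchCatalogCache cache = new SketchCatalogCache();
        cache.crawl(Arrays.asList("participant-a", "participant-b"), remote::get);

        System.out.println(cache.findParticipantsOffering("traffic-data"));
    }
}
```

Once the crawl has run, any number of queries can be served without contacting other participants, which is the point of keeping a local cache. The trade-off, which this sketch ignores, is that cached entries can go stale until the next crawl.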
The Identity Hub securely stores and manages W3C Verifiable Credentials, including the presentation of VCs and the issuance and re-issuance process.
EDC components are deployed to create a dataspace ecosystem. It is important to understand that there is no such thing as “dataspace software.” At its most basic level, a dataspace is simply a context between two participants.
The Federated Catalog fetches data catalogs from other participants. A Connector negotiates a contract agreement for data access between two participants and manages data transfers using a data plane technology. The Identity Hub presents verifiable credentials that a participant connector uses to determine whether it trusts and should grant data access to a counterparty.
The above EDC components can be deployed in a single runtime process (e.g., K8S ReplicaSet) or a distributed topology (multiple ReplicaSets or clusters). The connector components can be further decomposed. For example, multiple control plane components can be deployed within an organization in a federated manner where departments or subdivisions manage specific instances termed Management Domains.
EDC was designed with the philosophy that one size does not fit all. Before deploying an EDC-powered data sharing ecosystem, you’ll need to build customizations and bundle them into one or more distributions. Specifically:
Third parties and other open source projects distribute EDC extensions that can be included in a distribution. These will typically be hosted on Maven Central.
A brief introduction to what a dataspace is and how it relates to EDC.
An overview of the EDC modularity system.
Explains how data, policies, access control, and transfers are managed.
Describes how the EDC integrates with off-the-shelf protocols such as HTTP, Kafka, cloud object storage, and other technologies to transfer data between parties.
Details how EDC implements decentralized identity, access control, and trust using standards such as Decentralized Identifiers and W3C Verifiable Credentials.
Covers how publishing and retrieving federated data catalogs works.
Explains how to create distributions and design deployment architectures. This chapter also provides an overview of Management Domains and system configuration.
Details how to add customizations, features, and new capabilities to EDC components.
Covers how to use EDC test runtimes.