High-level architecture

CloudBees SDM is in preview, with early access for select preview members. Product features and documentation are updated frequently. If you find an issue or have a suggestion, please contact CloudBees Support. Learn more about the preview program.

CloudBees SDM consists of four main components: data ingestion, the System of Record, a web user interface (UI), and an authentication service.

Architecture diagram

Data ingestion

CloudBees SDM uses integrations, also called data apps, to import data from third-party applications. A data app lets CloudBees SDM stay synchronized with the current state of an application and makes that application's data available for retrieval and manipulation in CloudBees SDM.

Currently, CloudBees SDM supports data integrations with GitHub, Jira, DevOptics, and Jenkins. The ingestion mechanism depends on the integration, but at a high level each integration uses both webhooks and fetch requests to ingest raw data into the System of Record database.

The data ingestion services are Java applications that run within a Kubernetes cluster. At a minimum, they are responsible for deploying and monitoring the Apache Flink topologies used to process the integration data. These services may also expose endpoints for handling webhooks and configuring client authentication. The raw data received from a webhook is placed on an Apache Kafka queue and consumed by the associated Flink topology.
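The webhook-to-queue handoff described above can be illustrated with a minimal sketch. It is written in Python for brevity, although the actual services are Java applications, and it uses an in-memory queue as a stand-in for Apache Kafka. The names `handle_webhook` and `drain` are hypothetical and do not come from CloudBees SDM.

```python
import json
import queue

# In-memory stand-in for the Kafka queue (assumption: the real system uses Apache Kafka).
raw_events = queue.Queue()


def handle_webhook(body: bytes) -> None:
    """Hypothetical webhook endpoint: enqueue the raw payload without interpreting it."""
    raw_events.put(body)


def drain(process) -> int:
    """Hypothetical consumer loop standing in for the Flink topology.

    Decodes every queued payload, passes it to `process`, and returns
    the number of events handled.
    """
    handled = 0
    while True:
        try:
            body = raw_events.get_nowait()
        except queue.Empty:
            return handled
        process(json.loads(body))
        handled += 1


# Two webhook deliveries arrive, then the consumer processes them in order.
seen = []
handle_webhook(b'{"action": "opened", "number": 7}')
handle_webhook(b'{"action": "closed", "number": 7}')
count = drain(seen.append)
```

Keeping the endpoint limited to enqueuing means a slow or failing consumer does not block webhook deliveries, which is one common reason to put a queue between the two.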

Apache Flink is a highly scalable stream-processing framework for JVM-based languages such as Java and Scala. A Flink topology defines how one or more streams of data are processed. Each integration has an associated topology that processes raw data and keeps the System of Record in sync with the third-party application. Currently, the Flink cluster is deployed on Google Kubernetes Engine.
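As a rough illustration of what such a topology does, the sketch below chains two processing stages over a stream of raw events so that a store ends up mirroring the source application. It uses plain Python generators, not the Flink API. The event shape (`id`, `deleted`) and the function names are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, Tuple

Record = Dict[str, object]
Change = Tuple[str, str, Record]


def parse(events: Iterable[Record]) -> Iterator[Change]:
    """Stage 1: turn each raw event into an (operation, key, record) change."""
    for event in events:
        op = "delete" if event.get("deleted") else "upsert"
        yield op, str(event["id"]), event


def apply_changes(changes: Iterable[Change], store: Dict[str, Record]) -> None:
    """Stage 2 (sink): apply each change so the store mirrors the source."""
    for op, key, record in changes:
        if op == "delete":
            store.pop(key, None)
        else:
            store[key] = record


# Hypothetical raw events: two creations, one update, one deletion.
events = [
    {"id": 1, "title": "Fix login"},
    {"id": 2, "title": "Add docs"},
    {"id": 1, "title": "Fix login bug"},
    {"id": 2, "deleted": True},
]
store: Dict[str, Record] = {}
apply_changes(parse(events), store)
```

After the stream is consumed, the store holds only the latest version of record 1, because record 2 was deleted at the source.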

System of Record

The System of Record (SOR) provides dynamic association and querying of data in CloudBees SDM. The SOR is composed of a PostgreSQL database and a Java GraphQL API service. Raw data from integrations is stored as JSON blobs, which lets the SOR use PostgreSQL JSON functions for retrieval and manipulation. Each data record is associated with a specific data type, or schema, that indicates the structure of the record along with any implicit relationships with other schemas. Because schemas are dynamic, relationships can be established and queried across disparate data types. GraphQL, an open-source data query and manipulation language originally developed by Facebook, provides flexible querying across both CloudBees and third-party data.
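The idea of schema-typed JSON records with implicit relationships can be sketched in a few lines. The example keeps everything in memory rather than in PostgreSQL, and the schema names, field names, and helper functions are all hypothetical. It shows only how a field in one schema can point at a record of another.

```python
import json

# Hypothetical schema registry: for each schema, the fields that implicitly
# reference another schema, given as field -> (target schema, target field).
SCHEMAS = {
    "jira_issue": {"relations": {}},
    "github_pull_request": {"relations": {"issue_key": ("jira_issue", "key")}},
}

# Records are kept as (schema name, JSON blob) pairs, mirroring raw JSON storage.
RECORDS = [
    ("jira_issue", json.dumps({"key": "SDM-1", "summary": "Fix login"})),
    ("jira_issue", json.dumps({"key": "SDM-2", "summary": "Add docs"})),
    ("github_pull_request", json.dumps({"number": 42, "issue_key": "SDM-1"})),
]


def find(schema, field, value):
    """Return the decoded records of one schema whose field equals value."""
    matches = []
    for record_schema, blob in RECORDS:
        record = json.loads(blob)
        if record_schema == schema and record.get(field) == value:
            matches.append(record)
    return matches


def related(schema, record):
    """Follow every implicit relationship declared by the record's schema."""
    result = {}
    for field, (target_schema, target_field) in SCHEMAS[schema]["relations"].items():
        result[target_schema] = find(target_schema, target_field, record.get(field))
    return result
```

For example, `related("github_pull_request", find("github_pull_request", "number", 42)[0])` resolves the pull request to the Jira issue it references, even though the two records came from different integrations.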

Web UI

When users connect to CloudBees SDM, their browser receives a ReactJS front-end web bundle hosted by an NGINX web server running in a Kubernetes container. The UI redirects users to authenticate before they can access their CloudBees SDM account.

Authentication service

Users are authenticated through an authentication service, which provides a short-lived access token for making requests to the System of Record GraphQL API. The same service also authorizes integrations, granting them write access to the data of a specific account that has enabled the integration.
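The source does not describe the token format, so the following is only a sketch of how a short-lived token could work in principle: a signed payload carrying an expiry time, which the verifier rejects once that time has passed. The signing key, the five-minute lifetime, and the function names are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only
TTL_SECONDS = 300        # hypothetical token lifetime


def issue_token(account, now=None):
    """Return a signed token that names an account and expires after TTL_SECONDS."""
    now = time.time() if now is None else now
    payload = json.dumps({"account": account, "exp": now + TTL_SECONDS}).encode()
    body = base64.urlsafe_b64encode(payload)
    signature = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + signature


def verify_token(token, now=None):
    """Return the account name if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    body, _, signature = token.partition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims["account"] if claims["exp"] > now else None
```

A short lifetime limits how long a leaked token remains useful, at the cost of requiring clients to obtain new tokens regularly.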

When installing an integration, the user grants the integration access to their account in the third-party service, using the authorization mechanism defined by that service. For cloud providers, this likely includes a redirect to the third-party service, where the user accepts an authorization request.
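One common form of such a redirect is an OAuth-style authorization-code flow. The source does not confirm that any particular integration uses it, so the sketch below is an assumption. The endpoint URL, parameter set, and function names are hypothetical. It shows the redirect URL being built with a random `state` value and that value being checked when the third-party service calls back.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical third-party authorization endpoint.
AUTHORIZE_URL = "https://thirdparty.example/oauth/authorize"


def build_redirect(client_id, callback_url):
    """Return the URL to send the user to, plus the state value to remember."""
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": callback_url,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


def callback_is_valid(callback_query, expected_state):
    """Accept the callback only if it carries a code and the state we issued."""
    params = parse_qs(callback_query)
    return "code" in params and params.get("state") == [expected_state]
```

Checking the returned `state` against the issued one ties the callback to the request that started it, which guards against forged callbacks.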