Multi-Tenant Finance Data Platform at UBS

Multi-tenant: Logical separation by business unit, geography, and application

4 functions: Product Control, Treasury, Tax, Group Finance on one platform

Self-serve: Business teams onboard data sources without IT bottlenecks

AI-powered: Real-time data quality monitoring with active learning

The Business Problem

Large banks accumulate a particular kind of data debt. Each function (product control, treasury, tax, group finance) builds its own data infrastructure, often on its own platform with its own ingestion patterns and its own definitions of common entities. Within each function, individual reporting applications add another layer of bespoke pipelines. The result is a sprawl of overlapping systems holding similar data with different lineage, different quality controls, and different access models.

For UBS, this structural data fragmentation was creating cost and risk in three places:

Regulatory reporting. Reports like Swiss Capital Reporting, Swiss Cost Reporting, group disclosures, SG Tax, UK VAT, and 263A all needed common reference data and overlapping financial sources, but each was being assembled from a different combination of upstream systems with no shared platform layer.
Compliance and analytics. Trade reconciliation, product control metrics, GLASS Anomaly Detection, GL reconciliation, RBPT, IPV, GPL, and other compliance and analytics applications were each reinventing data ingestion and quality patterns rather than sharing a platform.
Capital and value risk control. Capital reporting and value risk control work needed unified data across business units and geographies, but the existing infrastructure made cross-function analysis slow and brittle.

The bank engaged airisDATA to design and deliver a Finance Data Hub for UBS: a single multi-tenant platform that could host applications across product control, treasury, tax, and group finance. Its data quality, governance, and self-service capabilities were intended to let the platform scale across the organisation rather than become the next bottleneck. The engagement built directly on the architecture and reusable IP airisDATA had previously developed for Credit Suisse. With roughly 80% of patterns reusable across the two banks, UBS could skip the discovery and architectural design phases that a greenfield build would have required.

The Solution

The Finance Data Hub is a multi-tenant data platform built on Cloudera with Spark, Impala, Kafka, HBase, Hive, H2O, and R as the supporting analytical stack. The architecture is organised around four design principles.

1. Multi-Tenant by Design

The platform supports both logical tenants (product control, treasury, tax, group finance) and micro tenants (individual reporting and analytics applications within each function). Tenant separation is enforced at the data access layer through RBAC, with applications grouped by function, location, and data sharing requirements.
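As an illustration only, the two-level tenant model can be pictured as a deny-by-default grant lookup. This is a minimal sketch, not the platform's actual RBAC implementation: the Grant fields, role names, and the "*" wildcard for "every micro tenant in this function" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One hypothetical RBAC grant: a role may read one micro tenant in one location."""
    role: str
    logical_tenant: str  # function, e.g. "product_control"
    micro_tenant: str    # application, or "*" for every application in the function
    location: str        # cluster region, e.g. "NA" or "EU"


def can_read(grants, roles, logical_tenant, micro_tenant, location):
    """Return True if any of the caller's roles holds a matching grant."""
    for g in grants:
        if g.role not in roles:
            continue
        if g.logical_tenant != logical_tenant or g.location != location:
            continue
        if g.micro_tenant in ("*", micro_tenant):
            return True
    return False  # deny by default: no matching grant means no access


# Illustrative grants; the role names are invented for this sketch.
GRANTS = [
    Grant("pc_analyst_na", "product_control", "trade_reconciliation", "NA"),
    Grant("tax_lead_eu", "tax", "*", "EU"),
]

print(can_read(GRANTS, {"pc_analyst_na"}, "product_control", "trade_reconciliation", "NA"))
print(can_read(GRANTS, {"pc_analyst_na"}, "tax", "uk_vat", "EU"))
```

Keeping function, application, and location as separate fields on each grant mirrors the grouping described above, and lets one role cover a whole function without opening other functions or regions.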

The platform runs across two clusters (North America and Europe) to support data sovereignty and regulatory boundary requirements, with the same logical platform model on each. Applications can be deployed to either cluster based on data residency rules without requiring application-level changes.
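One way to picture residency-driven placement is as a policy lookup that sits outside the application. The sketch below is hypothetical: the data classifications, the rule table, and the function names are assumptions for illustration, not the bank's actual residency rules.

```python
# Hypothetical residency policy: data classification -> clusters allowed to hold it.
RESIDENCY_RULES = {
    "swiss_client_data": {"EU"},
    "us_tax_data": {"NA"},
    "group_reference_data": {"EU", "NA"},
}


def allowed_clusters(data_classes):
    """Intersect the allowed clusters of every data class an application uses."""
    allowed = {"EU", "NA"}
    for dc in data_classes:
        if dc not in RESIDENCY_RULES:
            raise KeyError(f"no residency rule for data class: {dc}")
        allowed &= RESIDENCY_RULES[dc]
    return allowed


def choose_cluster(data_classes, preferred="EU"):
    """Pick the preferred cluster if permitted, otherwise any permitted cluster."""
    allowed = allowed_clusters(data_classes)
    if not allowed:
        raise ValueError("no single cluster satisfies every residency rule")
    return preferred if preferred in allowed else sorted(allowed)[0]


print(choose_cluster(["swiss_client_data", "group_reference_data"]))
print(choose_cluster(["us_tax_data"]))
```

Because the placement decision depends only on the data classes an application declares, the application code itself stays the same on either cluster.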

In production today, the platform hosts applications across:

Product Control: Trade Reconciliation, FDE, PC Metrics, GLASS Anomaly Detector, GL REC, RBPT, IPV, GPL, FACE, BEAT
Treasury: Capital reporting and cost analysis applications
Tax: SG Tax, 263A, UK VAT, group disclosures, group QR
Group Finance: Swiss Capital Reporting, Swiss Cost Reporting, FDE, STP, IMBS, LEAR, GLREF

2. Self-Service Data Ingestion

One of the key design goals was to remove IT as a bottleneck for data onboarding. The platform supports both batch (FTP) and streaming data sources, with a self-serve ingestion layer that lets business teams onboard new data sources without filing IT tickets for every change. A common data model and reference data layer keep the platform coherent as new sources come online, and Control-M handles job scheduling across the ingestion and processing pipelines.

This was as much an operating-model decision as a technical one. The self-service ingestion layer required investment in data governance and stewardship to be safe, but it eliminated the months-long backlog that had existed when every new source required a custom IT engagement.
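To make the governance point concrete, self-service onboarding can be thought of as a declarative registration that is checked automatically before any pipeline runs. The sketch below is an assumption-laden illustration: the SourceRegistration fields, the slice of common data model, and the validation rules are invented for the example and are not the platform's real schema.

```python
from dataclasses import dataclass, field

# Hypothetical slice of a common data model: canonical field name -> expected type.
COMMON_MODEL = {
    "trade_id": "string",
    "book": "string",
    "amount": "decimal",
    "value_date": "date",
}
ALLOWED_MODES = {"batch_ftp", "streaming"}


@dataclass
class SourceRegistration:
    """What a business team would declare when onboarding a source (illustrative)."""
    name: str
    owner_tenant: str
    mode: str
    steward: str
    field_mapping: dict = field(default_factory=dict)  # source column -> canonical field


def validate(reg):
    """Return a list of problems; an empty list means the source can be onboarded."""
    problems = []
    if reg.mode not in ALLOWED_MODES:
        problems.append(f"unsupported mode: {reg.mode}")
    if not reg.steward:
        problems.append("a data steward must be named")
    unknown = sorted(set(reg.field_mapping.values()) - set(COMMON_MODEL))
    if unknown:
        problems.append(f"fields not in common model: {unknown}")
    return problems


good = SourceRegistration(
    name="fx_trades_feed",
    owner_tenant="product_control",
    mode="batch_ftp",
    steward="jane.doe",
    field_mapping={"TRD_ID": "trade_id", "AMT": "amount"},
)
print(validate(good))
```

Checks like these are what allow onboarding to happen without a ticket: the stewardship and common-model rules are enforced by the registration step rather than by a manual review.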

3. AI-Powered Data Quality

Data quality is enforced at multiple layers, with AI models providing real-time quality monitoring on top of standard rule-based validation. The AI data quality layer covers data profiling, statistical measures, extreme value analysis, deduplication, imputation, linkage, column value profiling, canonicalisation, co-occurrence analysis, data deviation detection, and referential integrity checking.
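Three of the checks named above (extreme value analysis, deduplication, and referential integrity) can be sketched with the standard library alone. This is a simplified illustration, not the platform's models: the modified z-score based on the median absolute deviation is a technique chosen for this example, and the function names, threshold, and sample rows are assumptions.

```python
from statistics import median


def extreme_values(values, threshold=3.5):
    """Return indices whose modified z-score (median and MAD based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


def duplicate_keys(rows, key_fields):
    """Return the business keys that appear more than once, in order of re-occurrence."""
    seen, dupes = set(), []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key in seen:
            dupes.append(key)
        else:
            seen.add(key)
    return dupes


def orphan_references(rows, ref_field, reference_keys):
    """Return values of ref_field that do not exist in the reference data set."""
    return [row[ref_field] for row in rows if row[ref_field] not in reference_keys]


# Illustrative rows; the books and amounts are made up.
rows = [
    {"trade_id": "T1", "book": "FX01", "amount": 100},
    {"trade_id": "T2", "book": "FX01", "amount": 102},
    {"trade_id": "T2", "book": "FX01", "amount": 102},
    {"trade_id": "T3", "book": "ZZ99", "amount": 5000},
]
print(duplicate_keys(rows, ["trade_id"]))
print(orphan_references(rows, "book", {"FX01", "EQ02"}))
print(extreme_values([100, 102, 98, 101, 99, 5000]))
```

Rule-style checks of this kind give a baseline; statistical or learned models would sit on top of them rather than replace them.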

An Active Data Quality dashboard surfaces issues in real time to data stewards, and an active learning loop captures human resolutions to refine the quality models over time. For regulatory reporting workflows, key data element (KDE) exceptions are tracked end-to-end with full lineage: which data sources contributed, which systems stored a given KDE, which columns were used, what transformations were applied, and what queries produced the aggregate. This lineage capability is essential for satisfying audit and regulatory inquiries on regulated reports.
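The lineage questions in the paragraph above map naturally onto a single record per key data element. The sketch below is a hypothetical shape for such a record, assuming one field per question; the class name, field names, and sample values are illustrative and do not describe the platform's actual lineage store.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class KdeLineage:
    """Illustrative lineage record for one key data element in a regulated report."""
    kde: str
    sources: tuple          # which data sources contributed
    stored_in: tuple        # which systems stored the element
    columns: tuple          # which columns were used
    transformations: tuple  # what transformations were applied
    query: str              # what query produced the aggregate


def missing_lineage(record):
    """Return the names of empty lineage fields so an incomplete trail is caught early."""
    return [name for name, value in asdict(record).items() if not value]


# Sample values are invented for the sketch.
complete = KdeLineage(
    kde="total_rwa",
    sources=("gl_feed",),
    stored_in=("hive.finance.gl",),
    columns=("amount", "entity"),
    transformations=("fx_convert", "sum_by_entity"),
    query="SELECT entity, SUM(amount) FROM gl GROUP BY entity",
)
print(missing_lineage(complete))
```

A completeness check like missing_lineage is one simple way to ensure that a report is not published with a gap in its audit trail.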

4. Defense-in-Depth Data Protection

Data protection is enforced through CORA (the bank's data protection layer) integrated into the platform, with role-based access control at the data layer, application-level authorisation, and infrastructure-level network segmentation. Data segregation between business units respects information barriers and internal policies, and the access control framework applies consistently across both pre-built reports and ad-hoc query interfaces.

Architecture at a Glance

Platform layer: Cloudera, Spark, Impala, Kafka, HBase, Hive, H2O, R
Storage: HDFS-based data lake, protected and shared by business unit, by tenant, by application
Ingestion: Self-serve batch (FTP) and streaming, with common data model and reference data
Quality: AI-driven data quality with active learning loop, plus rule-based validation
Access control: RBAC enforced at data access layer, integrated with bank SSO
Data protection: CORA integration plus infrastructure-level protections
Scheduling: Control-M for job orchestration
Geography: Dual-cluster deployment (North America and Europe) for data sovereignty

The platform was designed to support a many-to-many relationship between business units and applications, with grouping by function, location, and data sharing patterns. This is the architectural choice that lets the same platform host product control work in North America and Swiss tax reporting in Europe without requiring separate platform stacks or compromising on the shared services beneath them.

The Results

The Finance Data Hub became the production foundation for finance data at UBS, replacing the fragmented per-application infrastructure that preceded it. Specific outcomes included:

Eliminated duplicate ingestion. Common data sources were ingested once into the platform rather than being re-ingested by each consuming application, cutting both pipeline cost and data inconsistency.
Self-service onboarding at scale. Business teams could add new data sources and stand up new analytics applications without month-long IT engagements, shortening the time from business need to working data product.
Real-time data quality. AI-driven quality monitoring caught issues at the ingestion layer rather than letting them propagate to consuming reports and analytics. The active learning loop meant the quality models continued to improve as the platform matured.
Audit-ready lineage. End-to-end data lineage for regulatory reports satisfied audit requirements that had previously required substantial manual reconstruction.
Cross-function analytics. Applications and analytics that needed data across product control, treasury, tax, and group finance could now run against a single platform rather than stitching together extracts from multiple systems.
Architectural reusability. The platform model has since been adapted for similar multi-tenant data platform use cases at other tier-1 banks. The architecture, the ontology approach, the multi-tenant access pattern, and the AI-driven data quality layer all transfer.

Why This Pattern Matters Beyond Banking

The Finance Data Hub architecture is not specific to finance. The pattern (multi-tenant data platform with self-service ingestion, AI-driven quality, role-based access, and full lineage) applies anywhere an enterprise has multiple business functions sharing common data and accumulating per-function data infrastructure debt. We have adapted the same architecture for clinical and commercial data platforms in life sciences, customer data platforms in retail, and network data platforms in telecom.

The reusability of the platform architecture is one of airisDATA's core advantages. New engagements start from a working reference rather than a blank architecture document.

About airisDATA

airisDATA is the AI and data engineering practice of Innovative Information Technologies. Founded in 2015 and based in Princeton, NJ with delivery teams in Hyderabad and Pune, airisDATA has shipped production AI and data platforms inside tier-1 banks for more than a decade. The Finance Data Hub work spans both Credit Suisse (where the original architecture and reference IP were built) and UBS (where the platform was deployed using the proven patterns), alongside related deliveries in automated trade reconciliation, contract review for the LIBOR transition, regulatory data quality automation, treasury liquidity and funding forecasting, and on-demand value-at-risk computation.