backend
Featured

Datarul - Enterprise Data Governance Platform

Enterprise SaaS data governance platform for managing metadata, lineage, and PII across client databases. Built as a polyglot microservices system (.NET Core, Python, Go) on Kubernetes, with Business Glossary, Data Dictionary, Report Catalog, Data Lineage, and Data Quality features. Includes automated metadata discovery across SQL/NoSQL databases, AI-assisted PII detection, and interactive lineage visualization built with React, D3.js, and RedisGraph.

Project Details

Role: Senior Software Engineer
Timeline: July 2023 - February 2025
Tech Stack: .NET Core, Python, Go, React, TypeScript, PostgreSQL, MongoDB, Elasticsearch, Redis, RabbitMQ, Docker, Kubernetes, GraphQL, gRPC

Key Features

  • Business Glossary: Term standardization and corporate knowledge management with synonym detection
  • Data Dictionary: Automated metadata discovery across SQL/NoSQL databases with scheduled imports and change tracking
  • Report Catalog: Centralized report repository with impact analysis, version control, and cross-environment consistency
  • Data Lineage: Interactive graph visualization showing end-to-end data flow from source systems through transformations to reports
  • AI-assisted data classification for sensitive data (PII, PCI, PHI) tagging
  • Natural language search across metadata using Elasticsearch
  • Role-Based Access Control (RBAC) with Active Directory/LDAP integration and granular permissions
  • RESTful and GraphQL APIs for integration with existing enterprise tools
  • Scheduled metadata synchronization with configurable import frequencies and conflict resolution
  • Historical versioning of metadata changes with audit trails
  • Multi-tenancy support with data isolation between clients
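
To make the change-tracking idea behind the Data Dictionary more concrete, here is a minimal sketch of how two metadata snapshots of the same table could be compared between scheduled imports. It is an illustration only: the `SchemaDiff` type, the `diff_schema` function, and the `{column: data_type}` snapshot shape are assumptions made for this example, not the platform's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    """Hypothetical result type: what changed between two imports."""
    added: list = field(default_factory=list)
    removed: list = field(default_factory=list)
    changed: list = field(default_factory=list)


def diff_schema(previous: dict[str, str], current: dict[str, str]) -> SchemaDiff:
    """Compare two {column_name: data_type} snapshots of one table."""
    diff = SchemaDiff()
    # Columns present only in the new snapshot.
    for column in sorted(current.keys() - previous.keys()):
        diff.added.append(column)
    # Columns that disappeared since the last import.
    for column in sorted(previous.keys() - current.keys()):
        diff.removed.append(column)
    # Columns present in both snapshots whose declared type differs.
    for column in sorted(previous.keys() & current.keys()):
        if previous[column] != current[column]:
            diff.changed.append((column, previous[column], current[column]))
    return diff


if __name__ == "__main__":
    before = {"id": "int", "email": "varchar(100)", "age": "int"}
    after = {"id": "bigint", "email": "varchar(100)", "phone": "varchar(20)"}
    print(diff_schema(before, after))
```

A diff like this could feed both the audit trail and conflict resolution, since each entry records the old and new state of a column.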

Challenges

  • Indexing metadata across heterogeneous data sources (Oracle, SQL Server, PostgreSQL, MongoDB, etc.)
  • Building a lineage parser that handles complex SQL with CTEs, subqueries, and window functions
  • Implementing graph traversal for impact analysis across many data dependencies
  • Designing multi-tenant architecture with strict data isolation
  • Balancing eventual consistency across microservices with audit/integrity requirements
  • Supporting deployment on both on-premise air-gapped environments and cloud infrastructure
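
The impact-analysis challenge comes down to finding every asset downstream of a changed one. The sketch below shows that traversal as a breadth-first search over a plain in-memory adjacency map. The map stands in for a graph database purely for illustration, and the asset names and the `downstream_impact` function are hypothetical.

```python
from collections import deque


def downstream_impact(edges: dict[str, list[str]], start: str) -> list[str]:
    """Return every asset reachable from `start`, in breadth-first order."""
    seen = {start}          # guards against cycles and repeated visits
    queue = deque([start])
    impacted = []
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


if __name__ == "__main__":
    # Hypothetical lineage: source table -> staging -> warehouse -> reports.
    lineage = {
        "crm.customers": ["staging.customers"],
        "staging.customers": ["dw.dim_customer"],
        "dw.dim_customer": ["report.churn", "report.revenue"],
    }
    print(downstream_impact(lineage, "staging.customers"))
```

Tracking visited nodes keeps the traversal linear in the number of assets and edges, and it terminates even if the lineage graph contains a cycle.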

Solutions

  • Event-driven microservices using domain-driven design with bounded contexts
  • Custom SQL parser using ANTLR generating abstract syntax trees for lineage extraction
  • Graph database (RedisGraph) with optimized traversal for impact analysis
  • Multi-tenant PostgreSQL schema with row-level security
  • Adapter pattern for pluggable integration with new data sources
  • Elasticsearch cluster with custom analyzers for metadata search
  • Containerized services with Kubernetes enabling hybrid on-premise/cloud deployments
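
As an illustration of the adapter pattern mentioned above, the following sketch shows how different data sources could sit behind one metadata interface so that a new source only requires a new adapter. All class and function names here (`MetadataSource`, `InMemorySqlSource`, `InMemoryDocumentSource`, `build_catalog`) are assumptions for this example, and the in-memory sources stand in for real database connections.

```python
from abc import ABC, abstractmethod


class MetadataSource(ABC):
    """Hypothetical common interface every source adapter implements."""

    @abstractmethod
    def list_columns(self) -> list[tuple[str, str, str]]:
        """Return (container, field, data_type) rows for this source."""


class InMemorySqlSource(MetadataSource):
    """Stand-in for a relational source with declared column types."""

    def __init__(self, tables: dict[str, dict[str, str]]):
        self._tables = tables

    def list_columns(self) -> list[tuple[str, str, str]]:
        return [
            (table, column, data_type)
            for table, columns in self._tables.items()
            for column, data_type in columns.items()
        ]


class InMemoryDocumentSource(MetadataSource):
    """Stand-in for a schemaless source: fields are inferred from samples."""

    def __init__(self, collections: dict[str, list[dict]]):
        self._collections = collections

    def list_columns(self) -> list[tuple[str, str, str]]:
        rows = []
        for name, documents in self._collections.items():
            fields: dict[str, str] = {}
            for document in documents:
                for key, value in document.items():
                    # Keep the type seen first for each field.
                    fields.setdefault(key, type(value).__name__)
            rows.extend((name, key, kind) for key, kind in fields.items())
        return rows


def build_catalog(sources: list[MetadataSource]) -> list[tuple[str, str, str]]:
    """Collect metadata without knowing which kind of source it came from."""
    catalog = []
    for source in sources:
        catalog.extend(source.list_columns())
    return catalog


if __name__ == "__main__":
    sql = InMemorySqlSource({"users": {"id": "int"}})
    docs = InMemoryDocumentSource({"events": [{"kind": "click"}, {"kind": "view", "ts": 1}]})
    print(build_catalog([sql, docs]))
```

The catalog code depends only on the abstract interface, so supporting another database means adding one adapter class and leaving the rest untouched.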

Project Gallery

Business Glossary - Creating corporate memory and standardizing business terms
Data Dictionary - Managing database assets under one roof
Report Catalog - Monitoring reporting tools and tracking changes
Data Lineage - Visualizing and analyzing data flow with diagrams
Data Quality - Monitoring data accuracy, consistency, and reliability