Strategic Database Choices for Resilient Software

Before starting a project, it’s important to pause and carefully consider the choice of architecture. This decision significantly impacts how the project will evolve and the challenges it may face in the future. It is tempting to jump straight into implementation: pick microservices and PostgreSQL without deeper analysis, then dive into records, classes, web clients, and repositories. Instead, I want to follow best practices. My goal is to analyze individual topics in a structured manner using ADRs (Architectural Decision Records) and explore where this process leads.

The first step in designing a robust system is to define its key attributes. Ideally, every application would embody all the qualities of "good software" (I’ve briefly discussed what this entails elsewhere). In practice no project can maximize all of them at once, so it’s common practice to identify the 7–10 architectural traits that matter most.

Based on my analysis of both business and technical requirements, I have identified the priorities below. The list is ordered from most to least important, reflecting the aspects that will most strongly influence strategic decisions. Ordering them was challenging, but I ultimately settled on this arrangement:

Maintainability – The ease of modifying the application’s code and logic, essential for adapting quickly to legal or business changes.
Security – Protecting sensitive data, such as personal information or proprietary business details.
Legal compliance – Ensuring alignment with legal requirements (e.g., GDPR), especially given the project’s focus on legal aspects of business operations.
Agility – Facilitating rapid system expansion, testing, and deployment of new features—a key factor in personal projects aiming for fast iteration cycles.
Privacy – Ensuring that only authorized users can access resources or sensitive data.
Usability – Creating an intuitive and user-friendly interface, even for non-technical users.
Data integrity – Maintaining data consistency, preventing duplication, and ensuring accuracy.
Concurrency – Supporting multiple users working simultaneously on the same documents or forms.

Choosing the right database architecture is one of the key aspects of software design and development. The decision about the number of databases an application should use is far from trivial and requires careful consideration of many factors. To make an informed decision, it’s crucial to first understand the application’s context and its functional requirements.

Using the first four attributes—Maintainability, Security, Legal compliance, and Agility—as a foundation, I began analyzing the technical aspects of the application’s architecture. At this stage of design, I will treat them as constants and base all other design decisions on them. This limitation stems from the fact that it is impossible to design a program that satisfies all priorities equally. Solutions are often mutually exclusive (e.g., performance optimization vs. code readability, scalability vs. simplicity of architecture, and many others). In programming, as in life, there are always trade-offs—choosing one priority often comes at the expense of another.

One such trade-off is the choice of database architecture. It exemplifies the challenge of balancing priorities, as this decision impacts performance, scalability, and maintainability.

To illustrate this process, let's look at an application that will manage various types of data. It needs to store and process information about users (their roles and access levels), documents (template data and dynamic fields filled in by users), and company data such as financial information, personal details of board members and employees, and company structure.

Let's decide now how many databases the application needs.

In this subsection, we will analyze different scenarios and criteria to consider when choosing the number of databases. We will look at the advantages and disadvantages of using a single database versus multiple databases in the context of our key attributes. I have prepared a comparison in the table below.

Analysis:

| Criterion | One Database | Multiple Databases |
| --- | --- | --- |
| Security | Centralization may mean higher risk in case of an attack, but simpler access policy management. | Separation of sensitive data improves security but complicates policy synchronization. |
| GDPR Compliance | Possible centralized audits and logging of changes, easier to manage. | Ability to geolocate and separate data in accordance with legal regulations. |
| Maintainability | A single structure simplifies migrations and backups. | Each database requires separate maintenance processes and synchronization. |
| Agility | Simplicity in managing schema and data logic in one place. | Flexibility in designing dedicated solutions for specific types of data. |
| Performance | As data volume increases, performance may become an issue. | Separation of data can improve performance for specific operations. |
| Cost | Lower setup and maintenance cost. | Higher setup cost and separate backup and management processes. |

Decision:

Due to the large variety of data and the priority of security and GDPR compliance, I have decided to use multiple databases. Separating sensitive user data from other types of data enhances security by limiting access to personal information. It also supports GDPR compliance, because personal data can be stored in a specific location and managed according to legal requirements. Moreover, it allows greater flexibility in design and future changes: if a new regulation changes how personal data must be stored, a dedicated database for user data keeps that change contained. The downside is that maintaining multiple databases is more challenging. To be honest, I was unsure whether to go this route, especially since the project is just starting and the databases could be separated later. Nevertheless, it seems worth making the decision in line with the stated priorities, even though it complicates the application.
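As a rough illustration of what this separation could look like, the sketch below keeps personal data in one store and company memberships in another, linked only by a user ID. Everything here is an assumption made for the example: the table names, the helper functions `members_of` and `erase_user`, and the use of in-memory SQLite as a stand-in for real database servers.

```python
import sqlite3

# Two independent stores; in-memory SQLite stands in for separate servers.
user_db = sqlite3.connect(":memory:")     # sensitive personal data
company_db = sqlite3.connect(":memory:")  # company data

user_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
company_db.execute(
    "CREATE TABLE memberships (company_id INTEGER, user_id INTEGER, role TEXT)"
)

user_db.execute("INSERT INTO users (id, email) VALUES (1, 'anna@example.com')")
company_db.execute("INSERT INTO memberships VALUES (10, 1, 'board_member')")


def members_of(company_id: int) -> list:
    """Return (user_id, role) pairs; no personal data is read from the user store."""
    rows = company_db.execute(
        "SELECT user_id, role FROM memberships WHERE company_id = ?",
        (company_id,),
    )
    return rows.fetchall()


def erase_user(user_id: int) -> None:
    """An erasure request touches only the dedicated user database."""
    user_db.execute("DELETE FROM users WHERE id = ?", (user_id,))
    user_db.commit()
```

Note that after `erase_user(1)` the membership row in the company store still references user 1. That leftover reference is exactly the kind of cross-database synchronization work this decision introduces.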

Consequences:

System complexity: Requires additional synchronization processes.
Scalability: Flexibility for future modifications to the data structure.
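The decision above could be captured as a small structured record. This is only a sketch of one possible ADR layout: the class name `ADR`, its field names, and the rendered format are assumptions for illustration, not a standard template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ADR:
    """A minimal Architectural Decision Record (field names are illustrative)."""
    number: int
    title: str
    context: str
    decision: str
    consequences: List[str] = field(default_factory=list)
    status: str = "accepted"

    def render(self) -> str:
        """Render the record as plain text, one section per heading."""
        lines = [
            f"ADR-{self.number:03d}: {self.title}",
            f"Status: {self.status}",
            "",
            "Context:",
            self.context,
            "",
            "Decision:",
            self.decision,
            "",
            "Consequences:",
        ]
        lines += [f"- {item}" for item in self.consequences]
        return "\n".join(lines)


adr_001 = ADR(
    number=1,
    title="Number of databases",
    context=(
        "The application stores user data, documents, and company data; "
        "security and GDPR compliance are top priorities."
    ),
    decision="Use multiple databases and keep sensitive user data separate.",
    consequences=[
        "System complexity: requires additional synchronization processes.",
        "Scalability: flexibility for future modifications to the data structure.",
    ],
)
print(adr_001.render())
```

Keeping each decision in the same shape makes it easier to compare later decisions against the same priorities.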

Now, let's look at the next step: we need to choose the type of database.

In this subsection, I decided to compare relational and non-relational databases in the context of our key attributes. Of course, I am aware that there are other types like graph or time-series databases, but I admit I haven't had much experience with them, and I don't want to introduce too many new elements into the project.

Analysis of database types:

| Criterion | Relational (RDBMS) | Non-Relational (NoSQL) |
| --- | --- | --- |
| Security | Good support for authorization mechanisms and encryption. | Support for distributed systems, but requires more configuration. |
| GDPR Compliance | Built-in audit and compliance mechanisms. | More difficult to implement audit at the system level. |
| Maintainability | Standard tools and wide support. | Greater flexibility in modifying data structures. |
| Agility | Limited capabilities for dynamic changes in data structure. | The data schema can change freely. |
| Performance | Efficient for transactional data. | Fast for non-relational data (e.g., JSON, documents). |
| Use Cases | Financial data, logins, and roles. | Document templates, data with variable formats. |

Decision:

Having chosen multiple databases in the first decision, I will pick the database type for each kind of data based on its characteristics.

| Data Type | Description | Database Choice |
| --- | --- | --- |
| Company Data | Financial data, organizational structure, personal data of board members and employees. | Relational database due to the structural nature and integration needs. |
| Application User Data | Information about users, roles, access levels, connection with company data. | Relational database for easy management of relationships between users and companies. |
| Form and Document Templates | Document definitions and dynamic fields (e.g., forms, consent for data processing). | Non-relational database for flexibility in handling changing data structures. |
| Document Content | Texts generated from document templates (large, unstructured text data). | Non-relational database for full-text search and content analysis. |
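To make the contrast concrete, here is a small sketch of both styles. The relational part uses SQLite to model the user-to-company relationship with enforced foreign keys. The document part uses plain dictionaries to stand in for a document store holding templates with differing fields. All table names, template IDs, and field names are invented for the example.

```python
import json
import sqlite3

# Relational side: users, companies, and a link table with enforced references.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT NOT NULL UNIQUE);
    CREATE TABLE user_company (
        user_id INTEGER NOT NULL REFERENCES users(id),
        company_id INTEGER NOT NULL REFERENCES companies(id),
        PRIMARY KEY (user_id, company_id)
    );
""")
db.execute("INSERT INTO companies VALUES (1, 'Acme')")
db.execute("INSERT INTO users VALUES (1, 'anna')")
db.execute("INSERT INTO user_company VALUES (1, 1)")


def link_is_rejected(user_id: int, company_id: int) -> bool:
    """Try to link a user to a company; True if the database refuses the row."""
    try:
        db.execute("INSERT INTO user_company VALUES (?, ?)", (user_id, company_id))
        return False
    except sqlite3.IntegrityError:
        return True


# Document side: each template carries its own set of fields, no fixed schema.
templates = {
    "consent": {
        "title": "Consent for data processing",
        "fields": [
            {"name": "full_name", "type": "text"},
            {"name": "signed_on", "type": "date"},
        ],
    },
    "board_resolution": {
        "title": "Board resolution",
        "fields": [{"name": "resolution_text", "type": "text"}],
        "required_signatures": 3,  # extra attribute only this template has
    },
}


def field_names(template_id: str) -> list:
    """List the dynamic field names defined by a template."""
    return [f["name"] for f in templates[template_id]["fields"]]
```

The relational store rejects a link to a company that does not exist, which suits user and company data. The templates can gain attributes such as `required_signatures` without a schema migration, which suits the document data.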

Finally, to address the synchronization challenge, we need to consider how to manage data synchronization between different databases.

The application manages various types of data, which are stored in different databases. These data are strongly interconnected, e.g., a user of the application may be assigned to one or more companies. Changes to company data should be immediately visible in the systems using user data and vice versa. It is important to synchronize data between different databases (relational and non-relational) in a way that ensures:

Data consistency (especially for sensitive and dependent data, e.g., users and companies).
Performance (avoiding excessive delays and synchronization costs).
Compliance with legal regulations (e.g., logging changes in data).

The following solutions were considered to address the problem: Event-driven architecture, Batch processing, Centralized synchronization system with middleware.

Event-driven Architecture

Description

Any change in company data (e.g., updating the company name) generates an event that is transmitted to a communication management system (e.g., Apache Kafka, RabbitMQ). Event recipients (e.g., user database) process them and update their data.
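The flow described above can be sketched with a tiny in-process event bus. This is not how Kafka or RabbitMQ are used in practice; the `EventBus` class, the `company.renamed` event type, and the two dictionaries standing in for the databases are all hypothetical. The queue is drained explicitly to show the delay between publishing and consuming.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    type: str
    payload: dict


class EventBus:
    """In-process stand-in for a message broker."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)
        self._queue: List[Event] = []

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        # Publishing only enqueues; the producer does not wait for consumers.
        self._queue.append(event)

    def deliver_pending(self) -> int:
        """Deliver queued events to subscribers; return how many were processed."""
        delivered = 0
        while self._queue:
            event = self._queue.pop(0)
            for handler in self._handlers[event.type]:
                handler(event)
            delivered += 1
        return delivered


# Stand-ins for the company database and a copy held by the user database.
company_store = {10: {"name": "Acme"}}
user_side_company_names = {10: "Acme"}

bus = EventBus()


def on_company_renamed(event: Event) -> None:
    """Consumer on the user-database side: update its local copy."""
    user_side_company_names[event.payload["company_id"]] = event.payload["name"]


bus.subscribe("company.renamed", on_company_renamed)


def rename_company(company_id: int, name: str) -> None:
    """Producer: change the source of truth, then announce the change."""
    company_store[company_id]["name"] = name
    bus.publish(Event("company.renamed", {"company_id": company_id, "name": name}))
```

After `rename_company(10, "Acme Ltd")`, the user-side copy still says "Acme" until `bus.deliver_pending()` runs. That window is the eventual consistency listed among the disadvantages.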

Advantages

Agility: Easy to extend the system with new functionalities.
Maintainability: Modular architecture where each system operates independently.
Performance: Asynchronous processing offloads systems from long transactions.

Disadvantages

Delays: Data may be inconsistent for a short time (eventual consistency).
Complexity: Implementing and maintaining an event-driven system requires additional infrastructure.
Security: Event flow security must be ensured (e.g., encryption, authorization).

Periodic Synchronization (Batch Processing)

Description

Data is synchronized at defined intervals (e.g., every 5 minutes) using ETL (Extract, Transform, Load) processes or dedicated scripts.
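A minimal sketch of one batch run, assuming each row carries an `updated_at` value that the job can use as a watermark. The table layout, the `sync_batch` function, and the integer timestamps are assumptions for the example; in-memory SQLite stands in for the two databases, and a scheduler would call the function at the chosen interval.

```python
import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
schema = "CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
source.execute(schema)
target.execute(schema)


def sync_batch(last_synced_at: int) -> int:
    """Copy rows changed after the watermark; return the new watermark."""
    # Extract: only rows modified since the previous run.
    rows = source.execute(
        "SELECT id, name, updated_at FROM companies "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_synced_at,),
    ).fetchall()
    # Load: insert new rows or overwrite existing ones with the same id.
    target.executemany(
        "INSERT OR REPLACE INTO companies (id, name, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    # If nothing changed, keep the old watermark.
    return max((row[2] for row in rows), default=last_synced_at)
```

Between two runs the target keeps serving the old values. Also note that a change to the `companies` columns would require editing both the SELECT and the INSERT, which is the maintainability cost listed below.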

Advantages

Simplicity: Easy to implement and understand.
Performance: Fewer real-time operations reduce database load.
Security: Data is synchronized in a controlled manner.

Disadvantages

Lack of Timeliness: Data may be inconsistent between synchronizations.
Legal Compliance: It may be harder to track real-time changes.
Maintainability: Any changes to database schemas require modifications to ETL processes.

Centralized Synchronization System with Middleware

Description

The middleware system manages real-time synchronization but works centrally, providing greater control over the process. Middleware receives data from one database, processes it, and writes it to the second database.
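The idea can be sketched as a single component through which every write passes. The `SyncMiddleware` class, its `allowed_actors` rule, and the dictionary-based stores are hypothetical and only illustrate the shape of the approach.

```python
from typing import Dict, List


class SyncMiddleware:
    """Central component that applies a change to both stores and logs it."""

    def __init__(self, company_store: Dict[int, dict], user_store: Dict[int, dict]) -> None:
        self.company_store = company_store
        self.user_store = user_store
        self.change_log: List[dict] = []
        self.allowed_actors = {"admin"}  # hypothetical access rule

    def rename_company(self, actor: str, company_id: int, new_name: str) -> None:
        # Access control point: reject callers that are not allowed to write.
        if actor not in self.allowed_actors:
            raise PermissionError(f"{actor} may not modify company data")

        old_name = self.company_store[company_id]["name"]
        # Write to the first store ...
        self.company_store[company_id]["name"] = new_name
        # ... then propagate to every dependent record in the second store.
        for user in self.user_store.values():
            if user["company_id"] == company_id:
                user["company_name"] = new_name
        # Record the change for later reporting.
        self.change_log.append(
            {"actor": actor, "company_id": company_id, "old": old_name, "new": new_name}
        )
```

Because every write goes through `rename_company`, logging and access checks live in one place. The same property means that if this component is down, no synchronization happens at all.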

Advantages

Data Consistency: Data is synchronized in real-time.
Legal Compliance: The centralized system enables accurate change logging and reporting.
Security: Middleware can serve as an additional access control point.

Disadvantages

Complexity: Developing and maintaining middleware requires significant effort.
Costs: Can increase infrastructure costs (e.g., middleware servers).
Single Point of Failure: Middleware issues can impact synchronization.

Decision:

Given the priorities set earlier, I have chosen Event-driven Architecture. Events can be encrypted and signed, which secures them during transmission (Security). The system is modular, so changes stay local to the affected component (Maintainability), and new features or additional systems can be integrated quickly by subscribing to existing events (Agility). Each event can also be logged, providing a complete audit trail (Legal compliance). The price is the short-lived inconsistency and extra infrastructure noted among the disadvantages.
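As a sketch of the signing and audit part of this decision, the example below signs each event with an HMAC and records every received event, accepted or not, in an audit log. It covers integrity only; encryption is left out. The key, the envelope layout, and the function names are assumptions for illustration, and a real key would come from a secret manager rather than source code.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical shared key


def _canonical_bytes(event: dict) -> bytes:
    """Serialize deterministically so producer and consumer hash the same bytes."""
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_event(event: dict, key: bytes = SECRET_KEY) -> dict:
    """Wrap an event in an envelope carrying an HMAC-SHA256 signature."""
    signature = hmac.new(key, _canonical_bytes(event), hashlib.sha256).hexdigest()
    return {"event": event, "signature": signature}


def verify_event(envelope: dict, key: bytes = SECRET_KEY) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical_bytes(envelope["event"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])


audit_log = []


def consume(envelope: dict) -> bool:
    """Verify an incoming event and record the outcome in the audit log."""
    accepted = verify_event(envelope)
    audit_log.append({"event": envelope["event"], "accepted": accepted})
    return accepted
```

A consumer that receives an envelope whose payload was altered after signing will reject it, and the rejection itself appears in `audit_log`.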

Conclusion:

In conclusion, the architectural decisions made in the early stages of a project are crucial for its long-term success and adaptability. Prioritizing key attributes such as maintainability, security, legal compliance, and agility, and documenting the resulting choices as Architectural Decision Records (ADRs), lays the groundwork for a robust and flexible system design. The decision to use multiple databases tailored to specific data types, along with an event-driven architecture for synchronization, reflects a commitment to security, compliance, and scalability. As the project evolves, these foundational choices should support rapid iteration and integration of new features, contributing to a more resilient application.
