Data Integrity controls for a more secure cloud platform using the CIA Triad model (checklist)
In the last article we covered Confidentiality: CIA Triad in Cloud Security (Part 1: Confidentiality).
As a quick review, the CIA Triad is a model for securing software infrastructure and has three aspects:
- Confidentiality: Data access is restricted to authorized users.
- Integrity: Maintain data accuracy and trust.
- Availability: Keeps systems accessible when needed.
In this Part 2 of the series, let's look at… Integrity.
In the CIA Triad, Integrity refers to the principle of keeping data accurate, reliable, and unaltered unless modified through authorized processes. We need this to maintain trust in the information systems that individuals and organizations rely on.
Keep in mind that data integrity can be damaged by:
- Malware and viruses that corrupt or modify data.
- Man-in-the-middle attacks that alter data in transit.
- Insider threats from employees or authorized users making unauthorized changes.
- Ransomware attacks that encrypt data, effectively altering its state and rendering it unusable until a ransom is paid — or corrupting it beyond recovery.
- SQL injection attacks or other vulnerabilities in web applications that may execute unauthorized database or CLI commands, potentially altering or deleting critical data.
- Human errors, such as accidental deletion, overwriting, or improper data handling by authorized personnel.
As such, in terms of implementation, we seek to prevent these from occurring and to monitor closely so we can respond if they do.
Here are the Integrity-related topics we’ll cover for this article:
- Data Hashing
- Digital Signatures
- Checksum Verification
- Version Control
- Database Constraints
- Input Validation
- Redundancy Checks
- Data Backup Policies
- Integrity Monitoring (tools)
- Patch and Configuration Management
- Virus and Malware Scans
- Immutable Storage
- Blockchain
- Audit Trails and Logs
- End-to-End Data Integrity Verification
- Data Classification
Some frameworks also place Incident Response in the Integrity category, but I generally prefer to cover it under the third category, Availability, since incident response and recovery restores the availability of the correct data and software. You may still see it listed under Integrity in some versions of the CIA triad. We'll cover it later!
Role-based access control (RBAC), covered in the previous Part 1 article, also relates to this article, since it governs authorization of read/write access to data.
Apply cryptographic hashing to data to detect any unauthorized changes by comparing current data hashes with expected values.
Hashing validates data integrity by generating a fixed-size unique hash value for input data. Any change in the data results in a completely different hash.
- Use secure cryptographic hash algorithms like SHA-256 or SHA-3.
- Store the expected hash values in a secure location separate from the data (such as AWS Secrets Manager, Azure Key Vault).
- Regularly verify data integrity by recalculating hashes and comparing them to the stored values.
- AWS Lambda can be used for automated hash checks, as can Azure Functions or Google Cloud Functions if you are on those platforms.
- OpenSSL can create cryptographic signatures for data or files, so that any unauthorized modification is detectable; verifying these signatures helps confirm the integrity of the data. You can also use OpenSSL to generate cryptographic hashes as a unique fingerprint of data, and, of course, for what it's mostly used for: securing data in transit with TLS/SSL encryption.
- HashiCorp Vault is a secrets management and data protection tool that is cloud platform independent. It can encrypt sensitive data at rest and in transit, so unauthorized changes are detectable and reversible. Vault has a lot of features and can use roles to restrict access and can generate short-lived secrets for applications or users.
- Blockchains use cryptographic hashes to ensure the blocks produced are soundly linked.
- Bitcoin, Solana and Ethereum use various related cryptographic strategies with hashing. With Bitcoin, a SHA-256 hash is used with proof-of-work, while proof-of-stake variations in Solana and Ethereum link hashed blocks to maintain integrity.
- IPFS, a decentralized file storage system, also uses file integrity hashes for distributed blockchain storage of files.
In AWS for example:
- A Lambda function is triggered when a new file is uploaded (use S3 events).
- The Lambda function retrieves the uploaded file from S3.
- It computes the SHA-256 hash of the file content such as with hashlib in Python.
- Compares the computed hash against a predefined known hash value (perhaps stored in a database).
This can get you started with the hash part of the Lambda in Python (you still would have to add your own handler):
import hashlib
import os

import boto3

# Initialize S3 client
s3_client = boto3.client('s3')

# Predefined hash for comparison (SHA-256 of an empty file as a placeholder)
EXPECTED_HASH = os.getenv(
    "EXPECTED_HASH",
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)

def calculate_file_hash(file_path):
    """Calculate the SHA-256 hash of a file."""
    hash_sha256 = hashlib.sha256()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_sha256.update(chunk)
    return hash_sha256.hexdigest()

# etc... add your own lambda_handler (a minimal sketch follows below)
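Continuing the snippet above, a minimal handler sketch might look like this. It assumes the Lambda is wired to S3 object-created events and that the expected hash is supplied via the EXPECTED_HASH environment variable; adapt the failure handling to your own alerting.

def lambda_handler(event, context):
    """Download the new S3 object, hash it, and compare to the expected value."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    local_path = f"/tmp/{os.path.basename(key)}"  # Lambda's writable scratch space
    s3_client.download_file(bucket, key, local_path)

    actual_hash = calculate_file_hash(local_path)
    if actual_hash != EXPECTED_HASH:
        # In practice you might publish an SNS alert or write a finding instead of raising
        raise ValueError(f"Integrity check failed for s3://{bucket}/{key}")
    return {"status": "ok", "sha256": actual_hash}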
Use digital signatures to verify the authenticity and integrity of data, ensuring it has not been altered.
- PKI (Public Key Infrastructure) solutions can be used to manage digital certificates and keys. This automates the issuance, renewal, and revocation of digital certificates, for consistent key management practices.
- Certificate authorities (CAs) provide trusted verification of public keys, preventing impersonation or misuse.
- Make sure your key generation and management is scalable. (PKI solutions)
- Used to verify that software updates or binaries are unaltered by attackers.
- Legally binding electronic signatures for contracts, reducing reliance on physical paperwork.
- Validates transactions and blocks to maintain the trust and immutability of distributed ledgers and databases.
- Protects IoT devices and firmware updates from tampering or unauthorized access.
- Implement signing at the point of data generation to verify an unbroken chain of trust. Validate integrity of data as it moves through various storage, transmission, and processing stages. Timestamping digital signatures is also advisable.
- Verify digital signatures during data retrieval or processing to validate data authenticity.
- Open-source tools like GPG and OpenPGP are useful, and other related tools like GPG Suite. These tools can be used for checking integrity of data and are popular in DevOps for signing code and containers, confirming the integrity of build pipelines.
- You can also look into in-toto, “A framework to secure the integrity of software supply chains”… “in-toto can help you protect your software supply chain.”
Tools:
- Cloud services like AWS KMS, AWS ACM, Azure Key Vault, Google Cloud KMS provide cloud platform digital signatures and verification. These adhere to high compliance standards like FIPS 140–2 and SOC 2, instilling confidence in the robustness of their security measures.
- Let’s Encrypt Boulder: “An ACME-based certificate authority, written in Go.”
- Smallstep Certificates: “A private certificate authority (X.509 & SSH) & ACME server for secure automated certificate management, so you can use TLS everywhere & SSO for SSH.”
- Dogtag PKI “The Dogtag Certificate System is an enterprise-class open source Certificate Authority (CA). It is a full-featured system, and has been hardened by real-world deployments.”
- Square Certstrap: “Tools to bootstrap CAs, certificate requests, and signed certificates.”
- AWS Private Certificate Authority (AWS PCA) is good for internal PKI needs within an organization. Allows issuing and managing private certificates for applications, devices, and users.
- AWS Certificate Manager (ACM) is best for managing SSL/TLS certificates for web applications on AWS. But does not issue certificates for client applications or devices outside of its AWS-supported ecosystem.
Lite: in AWS, a very basic approach where a full-service PKI is not required (a sketch follows the steps below):
- AWS KMS to create a customer-managed asymmetric key pair.
- Create a Lambda function that uses the KMS Sign operation.
- Get the data/file and use KMS to sign the data, returning a base64-encoded signature.
- Save the signature in an S3 bucket alongside the original data.
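Here is a minimal sketch of the signing step with boto3. The KMS key alias and bucket name are hypothetical, and the SigningAlgorithm must match the key spec you chose when creating the asymmetric key.

import base64

import boto3

kms = boto3.client("kms")
s3 = boto3.client("s3")

KEY_ID = "alias/data-signing-key"   # hypothetical asymmetric KMS key
BUCKET = "my-signed-data-bucket"    # hypothetical bucket

def sign_and_store(key: str, data: bytes) -> str:
    """Sign the data with KMS and store the signature next to the object."""
    response = kms.sign(
        KeyId=KEY_ID,
        Message=data,
        MessageType="RAW",  # for large files, hash first and use "DIGEST"
        SigningAlgorithm="RSASSA_PKCS1_V1_5_SHA_256",
    )
    signature_b64 = base64.b64encode(response["Signature"]).decode()

    s3.put_object(Bucket=BUCKET, Key=key, Body=data)
    s3.put_object(Bucket=BUCKET, Key=f"{key}.sig", Body=signature_b64)
    return signature_b64

Verification is the mirror image: call kms.verify with the same signing algorithm, the data, and the stored signature.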
Checksums detect accidental or malicious alterations in files during storage or transmission by computing and verifying checksum values.
- You can use algorithms like CRC32 or MD5 (for non-critical data) for basic light checksum tasks, though for higher security it is recommended to use SHA-256 or SHA-3.
- Checksums help safeguard against transmission errors or storage corruption in distributed systems.
- You can determine if data has been tampered with or corrupted by comparing the checksum to the original.
- Automate checksum generation and validation in workflows using CI/CD pipelines. This is especially useful for validating that build files or containers remain unaltered throughout the deployment lifecycle.
- Without this, a virus or hacker could add malicious code or change data without you knowing, causing users to interact with malicious programs.
- Integrate checksum verification in file transfer protocols like SFTP or AWS S3 multipart uploads. AWS S3 multipart uploads automatically compute and verify checksums for each part of a file, validating the integrity of large uploads. AWS S3 supports multiple checksum algorithms (SHA-256, CRC32C, etc.) for validating object integrity after upload.
- Open-source command-line utilities like md5sum, sha256sum, and cksum can compute and verify checksums. With scripting you can combine these tools with automation frameworks to handle large-scale checksum operations efficiently.
- Cloud services have implementations such as AWS S3 Checksum, Azure Blob Storage Checksum, Google Cloud Storage Integrity Checks.
In AWS when you upload a file you can specify checksum function (ordinarily you might do this programmatically with the SDK), for example:
There are a lot of docs on this here: Checking object integrity
See the tutorial:
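For example, a minimal sketch with boto3 (the bucket and key names are hypothetical); S3 computes the same checksum server-side and rejects the upload if it does not match:

import boto3

s3 = boto3.client("s3")

# Upload with an additional SHA-256 checksum; S3 verifies it on receipt
with open("report.csv", "rb") as f:
    s3.put_object(
        Bucket="my-integrity-demo-bucket",   # hypothetical bucket
        Key="reports/report.csv",
        Body=f,
        ChecksumAlgorithm="SHA256",
    )

# Later, retrieve the stored checksum to re-verify the object
head = s3.head_object(
    Bucket="my-integrity-demo-bucket",
    Key="reports/report.csv",
    ChecksumMode="ENABLED",
)
print(head.get("ChecksumSHA256"))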
✅ Quick promo, article continues below!
🚀 Check out some of my ebooks 📚 on Cloud Best Practices and Cloud Cost Savings at Store.SystemsArchitect.io:
Implement version control systems to manage data changes and maintain an accurate history of modifications.
- Use version control systems like Git for managing code and data changes. This creates an auditable history of all modifications. If a hacker or malicious actor makes a change it is easier to trace the history of the change and rollback to a legitimate version.
- Git is the most used versioning software and Github is the most popular cloud service for Git repos, though there are many other services you can use too.
- Git links changes to specific users, so if a user account has been compromised that can be also determined more quickly.
- Versioning provides a transparent history of changes for audits, satisfying regulatory requirements for many industries like finance and healthcare.
- Implement branch protections and code reviews to avoid unintended modifications. Use branch protection as access control for code repos and rules for who can deploy to various environments.
- Regularly back up repos for integrity and availability. Although GitHub and other tools back up repos, it is useful to keep additional backups of your own.
- You can run Git with other cloud SaaS tools and platforms like GitLab and Azure DevOps. Some may be preferable to others depending on your current stack. GitHub is very full-featured and a favorite of many, but others may give you certain advantages if your stack is on their platform.
- Take note: AWS is phasing out its cloud Git service: “AWS CodeCommit is no longer available to new customers.”
See Github docs or Gitlab docs
Constraints like primary keys, foreign keys, and unique constraints enforce rules at the database level to make sure that only valid and consistent data is stored.
- Constraints help reduce the risk of invalid data entering the database. This can come from malicious apps inserting random or fake data or human error.
- Define and enforce schema validation during database setup.
- Each record in a table should have a unique and non-null identifier, preventing duplicate entries and facilitating efficient indexing and retrieval. Implement not-null constraints, default values, and strict data type checks.
- Guarantee that specific columns, such as email addresses or employee IDs, do not contain duplicate values across rows.
- Use triggers and stored procedures to handle complex integrity checks. Triggers can enforce rules like preventing the deletion of parent records if dependent child records exist.
- Regularly audit and test constraints during schema migrations to avoid unintended data corruption.
- AWS Database Migration Service (DMS) can help with schema conversion. DMS tools like the Schema Conversion Tool (SCT) are used to “[a]utomatically assess and convert the source database schema and a majority of the database code objects, including views, stored procedures, and functions, to a format compatible with the target database.” AWS Glue Data Catalog can help with unstructured data migrations.
- Tools like Flyway or Liquibase can automate and validate schema changes in a controlled and repeatable manner.
A very simple example with SQL:
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY, -- Primary Key Constraint
name VARCHAR(50) NOT NULL, -- NOT NULL Constraint
age INT CHECK (age > 18), -- CHECK Constraint
email VARCHAR(100) UNIQUE, -- UNIQUE Constraint
department_id INT, -- Foreign Key to departments table
FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
There are more complex use cases, but it's a good idea to question every data type and column you define and to set up constraints in advance.
Some databases have special tools or features for this.
For example, in Postgres you can use GENERATED ALWAYS AS for a computed column.
CREATE TABLE sales (
sale_id SERIAL PRIMARY KEY,
quantity INT NOT NULL,
price_per_item DECIMAL(10, 2) NOT NULL,
total_price DECIMAL(10, 2) GENERATED ALWAYS AS (quantity * price_per_item) STORED -- Automatically computes total
);
Validate inputs to prevent injection attacks or corrupted data from entering the system, safeguarding data accuracy.
- Create a list of inputs at entry points, such as forms, APIs, or file uploads.
- Use regular expressions or libraries to strictly validate user inputs against expected formats.
- Use strict type-checking libraries.
- Implement server-side validation in addition to client-side checks for enhanced security.
- Utilize frameworks with built-in input sanitization mechanisms, like Django or Spring Boot, to reduce vulnerabilities.
- OWASP ZAP (to identify vulnerabilities)
- Validator.js (for JavaScript input validation), Zod
- Python: pydantic, Cerberus
- AWS WAF, Azure Application Gateway (with web application firewall).
- Great Expectations: “Great Expectations (GX) is a framework for describing data using expressive tests and then validating that the data meets test criteria. GX Core is a Python library that provides a programmatic interface to building and running data validation workflows using GX.” (docs)
From the ZAP site: “Zed Attack Proxy (ZAP) by Checkmarx is a free, open-source penetration testing tool. ZAP is designed specifically for testing web applications and is both flexible and extensible.”
Python regex:
import re

# Input validation functions
def validate_email(email):
    email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    if not re.match(email_regex, email):
        raise ValueError("Invalid email address")
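For heavier lifting, pydantic (mentioned above) can declare types and formats on a model. A small sketch, assuming pydantic v2 with the optional email-validator extra installed; the model and fields are hypothetical:

from pydantic import BaseModel, EmailStr, Field, ValidationError

class SignupRequest(BaseModel):
    # Reject anything that is not a well-formed email address
    email: EmailStr
    # Constrain the username to a safe character set and length
    username: str = Field(min_length=3, max_length=30, pattern=r"^[a-zA-Z0-9_]+$")
    age: int = Field(ge=18, le=120)

try:
    SignupRequest(email="not-an-email", username="x", age=5)
except ValidationError as e:
    print(e)  # Lists every field that failed validation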
Perform redundancy checks to verify data accuracy, comparing data across different sources or systems.
- Use periodic reconciliation processes to compare records across primary and secondary storage.
- Use distributed databases or data lakes with automatic consistency checks, such as AWS Aurora or Apache Cassandra.
- Automate redundancy validation with tools that flag mismatched data for immediate correction.
- Great Expectations (covered above under Input Validation) can also validate that data meets expectations across sources. (docs)
- Apache Hadoop (for distributed redundancy checks), Elasticsearch (for real-time redundancy analysis).
- AWS S3 with Cross-Region Replication, Google Cloud Spanner (for consistency across replicas).
- Rubrik/Commvault: Multi-cloud backup solutions for enterprise-grade data management.
Testing redundancy on AWS:
- Set up Cross-Region Replication (CRR) to automatically replicate objects between two buckets in different regions.
- Use S3 Inventory or AWS DataSync to periodically validate that objects match between the source and destination buckets (see the spot-check sketch after this list).
- Aurora Global Databases can be used to provide automatic synchronization between primary and secondary regions.
- Check AWS CloudWatch to monitor replication lag and consistency.
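Here is a quick spot-check sketch with boto3: compare the stored SHA-256 checksums of the same key in the source and replica buckets. The bucket names are hypothetical, and it assumes both copies were stored with a SHA-256 checksum.

import boto3

s3 = boto3.client("s3")

def checksums_match(key: str) -> bool:
    """Compare the stored SHA-256 checksum of an object across two buckets."""
    source = s3.head_object(
        Bucket="primary-data-bucket", Key=key, ChecksumMode="ENABLED"
    )
    replica = s3.head_object(
        Bucket="replica-data-bucket", Key=key, ChecksumMode="ENABLED"
    )
    return source.get("ChecksumSHA256") == replica.get("ChecksumSHA256")

print(checksums_match("reports/report.csv"))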
Regularly back up data so it can be restored to a known accurate state if integrity is compromised.
- Use automated backup schedules with appropriate retention policies to cover business needs.
- Adopt incremental or differential backup strategies to optimize storage and reduce backup times while making sure to have comprehensive data recovery options.
- High-priority data should have more frequent backup schedules, based on recovery point objectives (RPOs). An RPO is the maximum amount of data loss you can tolerate in the event of a disaster, system failure, or data breach. RTO is the recovery time objective: how quickly you need to recover.
- Store backups in geographically distributed locations to avoid loss from regional outages.
- Use the 3–2–1 Backup Rule: Keep 3 copies of your data, store them on 2 different media, with at least 1 copy stored offsite.
- Use tiered retention policies, such as daily backups for the past 7 days, weekly backups for a month, and monthly backups for a year.
- Test backups periodically for restorability and data accuracy.
- Some tools include Bacula (backup and recovery), Restic (lightweight backup tool). Bacula is a backup and recovery solution for enterprise environments, supporting disk, tape, and cloud storage targets.
- For cloud tools try AWS Backup, Azure Backup. AWS Backup is a centralized, fully managed service that automates backup tasks for AWS resources like RDS, EBS, S3, and DynamoDB. Use cross-region backups and integration with CloudWatch for monitoring and alerts
- Supported services with AWS Backup: Amazon EBS, Amazon EC2, Amazon RDS, Amazon DynamoDB, Amazon EFS, AWS Storage Gateway, Amazon FSx
- S3 Backup strategy using AWS S3 Lifecycle Policies
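As a sketch of that last point (the bucket name and day counts are hypothetical, and the noncurrent-version rule assumes versioning is enabled on the bucket):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-backups",
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},
                # Move backups to Glacier after 30 days
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                # Keep old versions for 90 days, then delete them
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            }
        ]
    },
)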
Continuously monitor systems and databases for signs of tampering, including unauthorized modifications or deletions.
- Integrity monitoring is basically observability for all the integrity techniques we’ve been talking about.
- Deploy file integrity monitoring tools, such as Tripwire or OSSEC, for real-time alerts. Tripwire tracks changes to critical files and configurations and provides detailed logs and alerts when unauthorized modifications are detected. OSSEC monitors file integrity and system logs.
- Use database activity monitoring (DAM) solutions to detect and log suspicious queries. DAM solutions can flag activities like SQL injection attempts, privilege escalations, or unauthorized schema modifications.
- Implement SIEM (Security Information and Event Management) tools like Crowdstrike, Datadog, Splunk or AWS CloudTrail for comprehensive oversight.
- Also there are a variety of SIEM tools in AWS Marketplace.
- OSSEC — OSSEC is a scalable, multi-platform, open source Host-based Intrusion Detection System (HIDS)
- Tripwire — “Your Integrity Management Ally. Detect and neutralize threats with superior security and continuous compliance”
- Open Source Tripwire — “A Tripwire check compares the current filesystem state against a known baseline state, and alerts on any changes it detects. The baseline and check behavior are controlled by a policy file, which specifies which files or directories to monitor, and which attributes to monitor on them, such as hashes, file permissions, and ownership.”
- Wazuh (file integrity monitoring), Lynis (security auditing and integrity monitoring).
- AWS Config, AWS CloudTrail, Azure Sentinel.
For basic checks, sha256sum can generate baseline hashes of files — “The program sha256sum is designed to verify data integrity using the SHA-256 (SHA-2 family with a digest length of 256 bits). SHA-256 hashes used properly can confirm both file integrity and authenticity. SHA-256 serves a similar purpose to a prior algorithm recommended by Ubuntu, MD5, but is less vulnerable to attack.” (source)
sha256sum /path/to/critical/file > baseline_hash.txt
Set up monitoring to re-check file integrity:
sha256sum -c baseline_hash.txt
Regularly update software and firmware to fix vulnerabilities that could compromise data integrity.
- Minimize downtime, disruptions, and security issues due to outdated software. Proactively managing patches and configurations prevents unexpected failures, keeping system availability high. Use staging environments to test patches before deploying them to production systems.
- Implement a scheduled patch cycle, prioritize critical patches to address vulnerabilities promptly.
- Use policies and automation to enforce automatic updates and configuration baselines aligned with compliance requirements. Use Infrastructure as Code (IaC) tools like Terraform to enforce consistent configurations across environments.
- Schedule patches during low-traffic periods.
- Subscribe to vulnerability alert services (such as CVE databases, OWASP) to stay informed about critical patches.
- OSSEC is a good tool for keeping up with updates. Crowdstrike Falcon is a popular enterprise security software that does patch updates.
- AWS Systems Manager Patch Manager: Automates patching for Amazon EC2 and on-premises servers (a compliance-check sketch follows this list).
- AWS Config: Tracks configuration changes and detects non-compliance with desired baselines.
- Amazon Inspector: Scans for vulnerabilities and missing patches.
- Microsoft Azure Update Management, Google Cloud OS Patch Management, Red Hat Satellite
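A small compliance-check sketch with AWS Systems Manager (the instance ID is hypothetical):

import boto3

ssm = boto3.client("ssm")

# Check patch compliance for specific managed instances
response = ssm.describe_instance_patch_states(
    InstanceIds=["i-0123456789abcdef0"]  # hypothetical instance ID
)

for state in response["InstancePatchStates"]:
    if state["MissingCount"] > 0:
        print(f"{state['InstanceId']} is missing {state['MissingCount']} patches")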
Regularly scan systems to prevent malicious software from compromising data integrity.
- Schedule automatic antivirus and anti-malware scans across all endpoints and servers.
- Use advanced threat detection solutions, like AWS GuardDuty or Microsoft Defender, to identify and mitigate sophisticated attacks.
- Keep signatures and threat detection databases up to date for effectiveness against new threats.
- There are many threat detection tools on the market, both paid and open source; we can't review them all here.
- Zscaler, Cloudflare Warp, Netskope
- ClamAV (open-source antivirus), YARA (to detect malware patterns).
- AWS GuardDuty, Google Cloud Security Command Center.
- Integrate with Amazon S3 to automatically scan uploaded files.
- Use a Lambda function triggered by S3 events to invoke antivirus tools like ClamAV. See this article for some AWS ideas: Integrating Amazon S3 Malware Scanning into Your Application Workflow with Cloud Storage Security
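A rough sketch of that Lambda, assuming ClamAV is packaged in the function's container image or a layer and that the file fits in /tmp (the quarantine bucket name is hypothetical):

import os
import subprocess

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    """Scan a newly uploaded S3 object with ClamAV and quarantine it on detection."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    local_path = f"/tmp/{os.path.basename(key)}"
    s3.download_file(bucket, key, local_path)

    # clamscan exits 0 for clean, 1 for infected
    result = subprocess.run(["clamscan", "--no-summary", local_path])
    if result.returncode == 1:
        # Move the object to a quarantine bucket (hypothetical name) for review
        s3.copy_object(
            Bucket="quarantine-bucket",
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
        )
        s3.delete_object(Bucket=bucket, Key=key)
        return {"status": "infected", "key": key}
    return {"status": "clean", "key": key}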
Use immutable storage options (such as AWS S3 Object Lock or WORM storage) to prevent overwriting or deletion of critical data, for the integrity of stored data.
- Configure Write Once, Read Many (WORM) policies to prevent tampering or accidental deletion of sensitive files.
- Regularly review and manage retention policies to make sure data remains accessible as needed while adhering to compliance.
- Use immutable storage for critical data like financial records, audit logs, or legal documents.
- Open Source Tools: OpenZFS (supports WORM-like snapshots), MinIO (object storage with immutability features).
- Cloud Tools: AWS S3 Object Lock, Azure Immutable Blob Storage.
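As a sketch of S3 Object Lock with boto3 (the bucket is hypothetical and must have been created with Object Lock enabled), a retention period can be applied per object at upload time:

from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

with open("audit-log-2024.json", "rb") as f:
    s3.put_object(
        Bucket="my-worm-bucket",  # hypothetical bucket created with Object Lock enabled
        Key="audit-logs/audit-log-2024.json",
        Body=f,
        # COMPLIANCE mode: nobody, including root, can shorten the retention window
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=365),
    )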
There is an overlap between immutable storage and blockchains.
Some blockchain storage solutions, for example IPFS, create an immutable storage option:
- IPFS is a decentralized file storage system that uses content-addressable storage.
- Files are split into chunks, cryptographically hashed, and distributed across the network.
- Each file chunk has a unique hash. If a file changes, its hash changes, ensuring integrity.
We should not overlook the data integrity advantages of a blockchain like Bitcoin, Solana or Ethereum.
- Immutability. Data added to a blockchain cannot be altered without consensus, whether by validators in a proof-of-stake system (Solana, Ethereum) or by miners with Bitcoin's proof-of-work.
- Cryptographic Hashing. Every block contains a cryptographic hash of its contents and the previous block, linking them together securely (illustrated in the toy example below).
- Consensus Mechanisms. Transactions are validated by network participants, ensuring only legitimate data is added.
- Bitcoin uses Proof of Work (PoW) and a decentralized ledger to record all transactions. Each transaction is cryptographically hashed and grouped into a block. Miners validate blocks and add them to the blockchain. Altering any transaction requires re-mining the entire chain from the altered block onward, which is computationally infeasible.
- Solana and Ethereum use Proof-of-Stake (and Solana also “Proof of History”) for immutable, linked blocks and a timestamped ledger.
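To illustrate the hash-linking idea from the list above in a few lines of Python (a toy example, not a real blockchain):

import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents, which include the previous block's hash."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

genesis = {"index": 0, "data": "genesis", "prev_hash": "0" * 64}
block_1 = {"index": 1, "data": "tx: alice -> bob 5", "prev_hash": block_hash(genesis)}
block_2 = {"index": 2, "data": "tx: bob -> carol 2", "prev_hash": block_hash(block_1)}

# Tampering with an earlier block changes its hash, breaking every later link
genesis["data"] = "tampered"
print(block_1["prev_hash"] == block_hash(genesis))  # False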
Implement audit trails and logging mechanisms to track changes and detect unauthorized modifications, for traceability and accountability.
- Use logging frameworks to centralize and structure logs for ease of analysis, such as JSON or structured log formats (a minimal example appears after this list).
- Implement tamper-proof logging mechanisms to make sure logs cannot be altered.
- Regularly review and analyze logs to identify suspicious activity or unexpected data changes.
- Open Source Tools: Elastic Stack (ELK for log management), Fluentd (for log aggregation).
- Cloud Tools: AWS CloudTrail, Google Cloud Logging.
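A minimal sketch of structured JSON logging with the Python standard library (the field names are just an example):

import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record):
        # Emit one JSON object per log line for easy ingestion and analysis
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("audit")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("user=alice action=update_record table=employees id=42")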
Implement end-to-end checks that verify data integrity from the point of creation to its final destination, especially in distributed systems.
- This extends integrity verification beyond a single stored file or artifact to the full path data takes from creation through transmission, processing, and final storage.
- Use cryptographic checksums (such as SHA-256) to verify data at each step in the pipeline (sketched after this list).
- Implement automated integrity verification as part of data transfer or replication processes.
- Monitor and log integrity verification results to quickly identify and resolve discrepancies.
- Open Source Tripwire, ZFS (Zettabyte File System) Open ZFS, Rsync (checksum-based file transfers), Great Expectations, Apache Kafka (supports end-to-end event integrity checks).
- Cloud Tools: AWS DataSync, Azure Data Factory.
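A sketch of carrying a checksum with the payload through a pipeline stage so the consumer can verify it on receipt (the message format here is hypothetical):

import hashlib
import json

def wrap(payload: bytes) -> str:
    """Producer side: attach a SHA-256 digest to the payload."""
    return json.dumps({
        "sha256": hashlib.sha256(payload).hexdigest(),
        "payload": payload.decode(),
    })

def unwrap(message: str) -> bytes:
    """Consumer side: verify the digest before processing."""
    msg = json.loads(message)
    payload = msg["payload"].encode()
    if hashlib.sha256(payload).hexdigest() != msg["sha256"]:
        raise ValueError("End-to-end integrity check failed")
    return payload

print(unwrap(wrap(b"order_id=123,amount=49.95")))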
Classify data based on sensitivity and integrity requirements, applying stricter controls to high-integrity data for protection.
We did cover Data classification also in Part 1 of this series.
One thing I didn’t understand earlier in my career was how important tagging is.
Tagging helps you inventory resources for a variety of uses, including cost optimization, security, and data classification, so we know the data is only being accessed by the right people.
As a review:
- Implement tagging systems for automatic classification of data based on sensitivity levels (see the tagging example after this list).
- Apply encryption, access controls, and monitoring based on the classification level.
- Regularly audit classification rules for compliance with regulatory and organizational standards.
- Open-source tools may help, like Apache Ranger (for policy-based data classification) and OpenDLP (data loss prevention and classification).
- Use cloud tools like AWS Macie (automated data classification) and Azure Information Protection.
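As a sketch, tags can carry the classification level so access policies and monitoring can key off them (the bucket, key, and tag values are hypothetical):

import boto3

s3 = boto3.client("s3")

# Tag an object with its classification level
s3.put_object_tagging(
    Bucket="my-data-bucket",
    Key="hr/salaries-2024.csv",
    Tagging={
        "TagSet": [
            {"Key": "data-classification", "Value": "confidential"},
            {"Key": "owner", "Value": "hr-team"},
        ]
    },
)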