Such data, which can include personally identifiable information (PII), intellectual property (IP) and financial information, needs to be secured. Cybersecurity company Varonis reports that there were more than 3,950 confirmed data breaches in 2020, and data breaches were among the top causes of data loss. Given the number of data loss incidents reported each day, it is imperative for organizations to develop a strategy to ensure sensitive data is not lost, misused or accessed by unauthorized users.
A leak of, or threat to, a data asset does not necessarily originate outside an organization. In fact, employees and contractors are more likely to cause a data breach. Employees could unwittingly copy files to an unsecured USB drive or misuse the organization's intellectual property for personal gain. Contract employees, vendors and partners trusted with access to an organization's network pose a similar threat. Organizations spend more dealing with negligent insiders than with any other threat profile. Data loss exposes organizations to leaks, breaches and industrial espionage, leading to regulatory non-compliance as well as long-term damage to reputation and competitive edge.
Data loss prevention (DLP) is a framework in which business data is identified, data usage is monitored, and security policies are enforced to prevent users from moving sensitive data outside an organization's network. The scope of a DLP solution spans the entire IT environment of an organization, including data repositories, laptops, desktops and communication tools such as e-mail and cloud applications.
Predictive technologies, using machine learning (ML) algorithms, can enhance DLP solutions and provide granular control over security breaches within the enterprise. ML models can detect both unintentional and deliberate insider breaches, as well as other threats, with greater accuracy. AI engines get smarter at DLP with every data loss situation they analyze.
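The sketch below illustrates one way such a model could work: an unsupervised isolation forest trained on hypothetical per-user activity features (files copied to removable media, after-hours logins, data uploaded externally). The feature names, sample values and contamination setting are illustrative assumptions, not part of any specific DLP product.

```python
# A minimal sketch of anomaly-based insider-threat detection, assuming
# hypothetical per-user activity features collected by a DLP agent.
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical baseline: one row per user-day,
# columns = [files_copied_to_usb, after_hours_logins, mb_uploaded_externally]
baseline_activity = np.array([
    [2, 0, 15],
    [1, 1, 20],
    [3, 0, 10],
    [2, 0, 25],
    [1, 0, 18],
    [2, 1, 12],
])

# Fit on normal behaviour; contamination is the assumed fraction of
# anomalous records and would be tuned against real activity data.
model = IsolationForest(contamination=0.01, random_state=42)
model.fit(baseline_activity)

# Score today's activity: a prediction of -1 flags a potential anomaly
# that the DLP solution could surface for review.
todays_activity = np.array([[250, 6, 4000]])
print(model.predict(todays_activity))  # e.g. [-1] -> raise a DLP alert
```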
Before an organization invests in a DLP solution, it needs to understand its requirements. Data in an organization is scattered across its network, endpoints, servers, file shares, databases and so on. Data is classified as high, medium, low or non-sensitive based on predefined business criteria. This classification exercise helps organizations build security controls based on sensitivity. AI-ML models can be used to classify sensitive data with high accuracy.
For example, a model can search for nine-digit numeric strings to detect social security numbers, or flag 13-19 digit sequences that pass the Luhn check formula as potentially sensitive card data. The data classification exercise not only helps in data protection but is also required to ensure compliance with regulations such as PCI DSS, GDPR, HIPAA and others. Amazon Macie leverages machine learning and pattern-matching techniques to classify data and raise alerts on sensitive data such as PII.
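A minimal sketch of the pattern-matching side of such classification, implementing the nine-digit and Luhn-check rules mentioned above; the regular expressions and sample text are illustrative assumptions, and a production detector would add context checks to reduce false positives.

```python
# A minimal sketch of rule-based sensitive-data detection: a nine-digit
# pattern for possible US social security numbers and a 13-19 digit
# sequence validated with the Luhn checksum for payment card numbers.
import re

def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(d) for d in number][::-1]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_sensitive(text: str) -> dict:
    """Flag candidate SSNs and card numbers for a DLP classification pass."""
    ssn_candidates = re.findall(r"\b\d{3}-?\d{2}-?\d{4}\b", text)
    card_candidates = [c for c in re.findall(r"\b\d{13,19}\b", text) if luhn_valid(c)]
    return {"ssn": ssn_candidates, "cards": card_candidates}

print(find_sensitive("Invoice paid with 4111111111111111, employee SSN 123-45-6789"))
```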
The second component of the DLP framework is monitoring across network egress points and public-facing devices. It involves installing a DLP agent on all hosts that process data within the production environment. The monitoring process should also scan for configuration changes and raise alerts in case of accidental deviations from security policies or unauthorized access to the data. AWS services such as Amazon CloudWatch and AWS CloudTrail help in monitoring, while alerts can be managed using Amazon SNS. Machine learning helps set up real-time alerts for timely remediation.
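A minimal sketch of this monitoring step using boto3, assuming CloudTrail is already enabled in the account and an SNS topic for DLP alerts exists (the topic ARN below is a placeholder): it looks up recent S3 bucket ACL changes and publishes an alert for each one.

```python
# A minimal monitoring sketch: surface recent bucket ACL changes from
# CloudTrail and notify the security team via SNS.
from datetime import datetime, timedelta
import boto3

cloudtrail = boto3.client("cloudtrail")
sns = boto3.client("sns")
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:dlp-alerts"  # placeholder ARN

# Look for bucket ACL changes in the last hour, a common sign of
# accidental exposure of sensitive data.
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "PutBucketAcl"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
)

for event in events.get("Events", []):
    # Publish an alert so the deviation can be remediated quickly.
    sns.publish(
        TopicArn=ALERT_TOPIC_ARN,
        Subject="DLP alert: S3 bucket ACL changed",
        Message=f"{event['EventName']} by {event.get('Username', 'unknown')} at {event['EventTime']}",
    )
```

In practice this check would run on a schedule or be replaced by an event-driven rule, but the flow of detect-then-notify is the same.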
The final component of the DLP framework is policy enforcement. Specific rules are created based on the content and context of the data classification. Each rule dictates the action to take when a certain type of data is accessed or leaked. For example, if an AWS admin accidentally tries to make a secure bucket public, the policy enforcement mechanism should prevent the action from being completed. The AWS Organizations service helps define policies across AWS resources. AWS Config can also be leveraged to continuously evaluate the entire AWS account for compliance. To build robust security governance, organizations should codify these policies through automation, using open-source tools such as Cloud Custodian.
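A minimal sketch of one such enforcement action using boto3, with a hypothetical bucket name: it applies the S3 public access block to a bucket classified as sensitive, so an accidental attempt to open it to the public does not take effect.

```python
# A minimal enforcement sketch: lock down public access on a bucket
# that the classification step has flagged as sensitive.
import boto3

s3 = boto3.client("s3")

def enforce_no_public_access(bucket_name: str) -> None:
    """Apply the S3 public access block so the bucket cannot be exposed publicly."""
    s3.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

enforce_no_public_access("sensitive-finance-data")  # hypothetical bucket name
```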
With the proliferation of connected devices leading to a borderless network, organizations need visibility and granular control over data usage to reduce the enterprise attack surface. Implementing a DLP framework brings complete visibility into the location and usage of data within the AWS environment, and leverages machine learning to classify sensitive data, monitor its usage in real time and enforce security policies.
Enterprises should proactively adopt a DLP framework as a strategic initiative; leveraging machine learning in this effort will improve end-to-end security operations and help protect sensitive business data.