Automatic MASS data anonymization in Microsoft Dynamics AX / 365

The development of IT technologies and services has brought business to a completely new level and accelerated the development of civilization. Today, transactions and business operations are carried out almost immediately, anywhere in the world.

However, the digital era has not changed the need to use various types of data that are necessary to implement business processes.

Moreover, it has made disclosing, processing, copying and, finally, trading information far easier. Personal data, as well as business data, are a highly sought-after “commodity” today. Protecting them is a legal requirement under the GDPR (Regulation (EU) 2016/679). It is a key element of protecting our privacy as well as the culture and organization of a business. Market independence and competitiveness are also at stake. The media regularly report leaks, thefts or discoveries of huge amounts of (shared) sensitive data.

To meet such threats, states and international organizations introduce laws that oblige businesses to protect sensitive data. In practice, this means that companies must anonymize and pseudonymize data in their ERP systems and beyond.

Anonymization and pseudonymization in ERP?

Viewed from the outside, anonymization has the same effect as deleting data: a person who is not authorized to do so has no access to the sensitive data stored in ERP-related databases. Technically, it means transforming sensitive data into random strings. In other words, anonymization can be viewed as irreversibly replacing data with other values. As a result, no one can read the original values from the database.
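The idea of irreversibly replacing values with random strings can be sketched in a few lines of Python. This is only an illustration, not the way Dynamics AX / 365 does it: the `anonymize_value` helper, the sample record, and the choice to preserve the original length are all assumptions made for the example.

```python
import secrets
import string

# Hypothetical alphabet for the replacement strings (an assumption of this sketch).
ALPHABET = string.ascii_uppercase + string.digits


def anonymize_value(original: str) -> str:
    """Return a random string of the same length as the original.

    Only the length of the original is used, and nothing is stored,
    so the original value cannot be recovered from the result.
    """
    length = max(len(original), 1)  # never produce an empty value
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Illustrative record; the field names are made up for this example.
record = {"name": "Jan Kowalski", "email": "jan.kowalski@example.com"}
anonymized = {field: anonymize_value(value) for field, value in record.items()}
```

Because no mapping between old and new values is kept, running the function twice on the same input gives two unrelated strings, which is exactly what makes the replacement irreversible.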

Pseudonymization, on the other hand, is an identical process except that it is reversible. In very specific and strictly defined situations, an entity authorized by law (an authority or institution) may gain access to the secured data. It can then use the saved keys to decrypt the indicated values that constitute sensitive data. In both cases, securing such information is challenging. For example, deleting data from a cell in Excel does not disturb the program itself. In the case of an ERP system such as Microsoft Dynamics 365, things are different.
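Before turning to the ERP specifics, the reversible nature of pseudonymization can be made concrete with a minimal Python sketch. It deliberately swaps encryption for a simpler technique, a token lookup table, in which the table plays the role of the saved key and would have to be stored separately under strict access control. The `Pseudonymizer` class, its method names, and the `PSN-` token prefix are hypothetical.

```python
import secrets


class Pseudonymizer:
    """Replace values with random tokens and keep a lookup table to reverse them."""

    def __init__(self) -> None:
        # In practice this table would live outside the protected database.
        self._token_to_original: dict[str, str] = {}
        self._original_to_token: dict[str, str] = {}

    def pseudonymize(self, value: str) -> str:
        """Return a stable token for the value, creating one on first use."""
        if value in self._original_to_token:
            return self._original_to_token[value]
        token = "PSN-" + secrets.token_hex(8)
        self._token_to_original[token] = value
        self._original_to_token[value] = token
        return token

    def reidentify(self, token: str) -> str:
        """Recover the original value; only an authorized party should call this."""
        return self._token_to_original[token]


pseudonymizer = Pseudonymizer()
token = pseudonymizer.pseudonymize("Jan Kowalski")
restored = pseudonymizer.reidentify(token)
```

Destroying the lookup table turns the result into anonymized data, because the tokens themselves carry no information about the originals.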

In large ERP systems, the processes are interdependent. A small change in one process (e.g. deleting data from a table) forces changes in another. Deleting key personal data from the list of recipients may break relationships within the database. As a result, this could lead to a serious or total disruption of the application.
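The effect of such dependencies can be illustrated with a toy database. The sketch below uses SQLite from the Python standard library as a stand-in; the `customer` and `sales_order` tables are invented for the example, and how a real ERP enforces its relations (in the database or in the application layer) is outside the scope of this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE sales_order ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Jan Kowalski')")
conn.execute("INSERT INTO sales_order VALUES (100, 1)")


def try_delete_customer(connection: sqlite3.Connection, customer_id: int) -> bool:
    """Attempt to delete a customer; return False if a relation blocks it."""
    try:
        connection.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False


# Deleting the row fails because an order still references it ...
deleted = try_delete_customer(conn, 1)

# ... whereas overwriting the personal value leaves the relation intact.
conn.execute("UPDATE customer SET name = ? WHERE id = ?", ("Customer 000001", 1))
current_name = conn.execute("SELECT name FROM customer WHERE id = 1").fetchone()[0]
```

In this toy case the database refuses the deletion outright. A system without such a safeguard would instead be left with orders pointing at a customer that no longer exists.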

This is one of many reasons why tables are not deleted in an ERP system. Such a scenario may occur only in strictly defined situations and under the supervision of the application architect. Similarly, in exceptional circumstances it is possible to clear certain values, provided the requirements of the program itself are respected: some fields cannot be empty or null, and some must be unique. As a result, the anonymization of sensitive data in ERP consists of replacing it with fictitious values.
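A minimal sketch of such a replacement, under the constraints just listed (not empty, not null, unique), might look as follows. The `fictitious_name` helper, the `Customer` prefix, the 60-character limit, and the sample rows are assumptions for illustration and are not taken from any Dynamics schema.

```python
def fictitious_name(row_number: int, max_length: int = 60, prefix: str = "Customer") -> str:
    """Build a non-empty value that is unique per row number.

    Uniqueness comes from the row number; the length check guards
    against exceeding the (assumed) maximum length of the field.
    """
    value = f"{prefix} {row_number:06d}"
    if len(value) > max_length:
        raise ValueError(f"fictitious value exceeds {max_length} characters")
    return value


# Illustrative rows; note that two of them share the same real name.
rows = [
    {"id": 1, "name": "Jan Kowalski"},
    {"id": 2, "name": "Anna Nowak"},
    {"id": 3, "name": "Jan Kowalski"},
]

for position, row in enumerate(rows, start=1):
    row["name"] = fictitious_name(position)

new_names = [row["name"] for row in rows]
```

Deriving the value from a counter rather than from the original data keeps every row distinct even when the real values were duplicated, and it leaks nothing about the originals.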

Why do we anonymize the data?

We can consider the process from several perspectives, two of which are key:

  • legal
  • business.

From the legal perspective, the company is bound by the letter of the law, especially the GDPR and accompanying acts. According to these, sensitive data can be stored, processed, and duplicated only under justifiable circumstances.

Nevertheless, the interpretation of the provisions in this area is ambiguous, especially given the complexity of ERP systems. This makes it difficult for many companies to determine correctly what exactly sensitive data is and what types of data fall into this category. Is it the name, surname, address, personal ID number, e-mail address, and telephone number? Or maybe the date of birth and bank account number?

This is critical for data protection related to ERP systems. What’s more, companies using such systems usually have many environments: production (containing all data), test, development, and many others.

This is all the more important as companies often copy production data into non-production environments to make the work of testers and developers easier. From the point of view of the GDPR, such duplication is problematic: if there is no need to copy sensitive data, companies should not do it.

However, from a business point of view, we can identify many cases, including in the IT industry, that create a need for anonymization. One example is developers who have access to sensitive data, such as personal data, addresses, salaries, e-mail addresses, and employee account numbers. This, of course, runs counter to the letter of the law as well as to the internal policies of most companies.

From the company’s point of view, there is no need for a developer to know the customer base or the list of subcontractors. Here, we must take into account the risk of illegal trading in sensitive information. As mentioned earlier, deleting fields and entire tables from the database is complicated and causes many problems. Besides, field values are validated against many rules. When a database contains several thousand related tables, the problem of anonymization becomes very complicated.

We could perform such a process at the SQL level, but its complexity, implementation time, and related cost would be disproportionate to the data protection needs.

A partial solution may be to restrict access rights. In practice, test teams are given limited rights to access sensitive data. However, this is not anonymization. Although the testers do not perform their tasks using real data, the problem remains: the database administrator always has access to such a resource. Also, the restrictions do not change the fact that the information is still stored in the test environment.

So, there are several legal and business needs for data anonymization:

  • GDPR (2016/679) and other EU regulations 
  • removing data from hundreds or thousands of database tables is very complicated and dangerous for the ERP system
  • the complexity, cost, and time required for anonymization at the SQL level
  • insufficient data protection through access authorization management.

Replacing values rather than deleting them is feasible because the processes in ERP depend only secondarily on the actual field values in the database. The exceptions are situations in which the business process requires specific values, e.g. costs or a catalog of cities to choose from. In all other cases, the field only needs to contain a string that is sufficient to pass its validation.