Would companies protect our personal information if there were no laws and regulations?

Sunday, January 13, 2008

Data Obfuscation tools

In this post I would like to discuss data masking (also called obfuscation or anonymization) tools. Several tools on the market today are competing for market share. Each tool has its own advantages and limitations, and each caters to certain segments. By segments I mean that some are strong with mainframe flat files, others are strong with databases like Oracle, SQL, etc., and a few niche tools even cater to ERP and CRM applications. Here are some of the tools I would like to cover.

1. Princeton Softech's Optim

2. DataVantage

3. Camouflage


I will discuss these tools in detail in my future posts.

Tuesday, January 8, 2008

Data Discovery (E-Discovery)

Data discovery is very useful if you are planning to search for PII that resides at various locations* in the organization. I would like to expand on a few of the terms mentioned here.

PII - Personally Identifiable Information such as SSN, name, address, phone, email, etc., in any combination of one or more items.

*Locations - File servers, application servers, Databases, Websites, Mainframes, AS400 systems etc.

But why is it necessary to search for SSNs, customer names and credit card numbers in the organization? The answer is that more and more companies want to protect your personal data, make sure it is not leaked to malicious users, and stay in compliance with current regulations. So companies use data discovery processes to find this sensitive data. Once all the locations where sensitive data resides are known, the management team can take appropriate action to protect it. The other main use of data discovery (e-discovery) is to enforce the data retention rules required by various regulations.
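
To make the idea of discovery a little more concrete, here is a minimal sketch in Python that scans a piece of text for SSN-like and credit-card-like numbers. The patterns, the function names (`find_pii`, `luhn_valid`) and the sample text are my own assumptions for illustration. Real discovery tools use far richer rules and also have to connect to file servers, databases, mainframes and so on.

```python
import re

# Assumed, simplified patterns (illustrative only, not from any specific tool).
SSN_PATTERN = re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for SSN-like and card-like numbers."""
    findings = [("ssn", m.group()) for m in SSN_PATTERN.finditer(text)]
    for m in CARD_PATTERN.finditer(text):
        # The Luhn check filters out many digit runs that are not card numbers.
        if luhn_valid(m.group()):
            findings.append(("card", m.group()))
    return findings


if __name__ == "__main__":
    sample = "Customer John, SSN 123-45-6789, card 4111 1111 1111 1111."
    for kind, value in find_pii(sample):
        print(kind, value)
```

A pattern match only tells you that something looks like an SSN or a card number. Someone still has to confirm the findings before acting on them.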

Keep visiting; I am planning to write an article on how to discover data across multiple platforms.

Monday, January 7, 2008

Data Obfuscation

I overheard a guy talking about "Data Obfuscation". The term fascinated me, so I asked him what it meant. He looked at me as if I were a first grader. Forget what happened then. I have decided to educate people like me about the different terms used for a single technology.

I had heard of "Data Masking", "Data Redaction", "Data Sanitization" and "Data Anonymization", but not "Data Obfuscation" until recently. Data obfuscation, sanitization, redaction and anonymization all refer to the same concept, while "Data Masking" refers to hiding part of the data. We have all called a help desk or customer support line. They ask you for the last four digits of your Social Security number. They see only those four digits and nothing else; the rest of the digits are masked. This is what I call "Data Masking".

"Data obfuscation", on the other hand, is the concept of altering the actual text or number in such a way that it no longer has any reference to the original. All the synonyms mentioned above refer to this same concept.
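
The help desk example can be sketched in a few lines of Python. The function name `mask_ssn`, the mask character and the choice to keep spaces and hyphens in place are my own assumptions, not how any particular product does it.

```python
def mask_ssn(ssn: str, visible: int = 4, mask_char: str = "X") -> str:
    """Hide every digit except the last `visible` ones; keep separators as-is."""
    total_digits = sum(c.isdigit() for c in ssn)
    to_hide = max(total_digits - visible, 0)
    out = []
    for c in ssn:
        if c.isdigit() and to_hide > 0:
            out.append(mask_char)   # replace a leading digit with the mask
            to_hide -= 1
        else:
            out.append(c)           # separators and the last digits pass through
    return "".join(out)


if __name__ == "__main__":
    print(mask_ssn("123 45 6789"))  # XXX XX 6789
```

Note that the real number is still stored behind the mask. Only the display changes.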

For example, if I have an SSN "123 45 6789", then by employing any one of several techniques (ask me and I will blog about them) it will be transformed into a new number such as "678 45 3456". Several financial companies use this technique to prepare data for testing and development.
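
Here is a rough sketch of one possible technique, written in Python. It is my own illustration and not what any of the commercial tools do: it derives replacement digits from a keyed hash (HMAC-SHA256) of the original digits, so the same input always yields the same output and the spacing is preserved. The function name `obfuscate_digits` and the key are hypothetical, and it will not reproduce the exact sample number above.

```python
import hashlib
import hmac


def obfuscate_digits(value: str, key: bytes) -> str:
    """Replace the digits in `value` with digits derived from a keyed hash.

    Separators stay where they are, so "123 45 6789" keeps its 3-2-4 shape.
    """
    digits = "".join(c for c in value if c.isdigit())
    digest = hmac.new(key, digits.encode("ascii"), hashlib.sha256).digest()
    # Reduce the digest to a number with the same count of decimal digits.
    number = int.from_bytes(digest, "big") % (10 ** len(digits))
    replacement = iter(str(number).zfill(len(digits)))
    return "".join(next(replacement) if c.isdigit() else c for c in value)


if __name__ == "__main__":
    secret = b"example-key-change-me"  # hypothetical key, for illustration only
    print(obfuscate_digits("123 45 6789", secret))
```

Because the mapping is deterministic, the same SSN in two different tables gets the same replacement. It is one-way, though, and two different inputs could in principle collide, so treat it as a sketch and not a finished design.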

Companies use this technique to provide developers and testers with "real-like" data, so that developers and testers cannot tell whether they are looking at generated data or real customer data.

I know you must be wondering, "What about integrity constraints in the database? How does the application handle this data?" Come back next week and I will give you the answers.