About Various Ideas Of Privacy Data Protection


This paper explores various ideas in privacy data protection. Today, many distinct types of information must be stored, and the level of protection should depend on the type of information: if the stored data is highly sensitive, such as customers' personal details or an organization's business records, it must be protected rigorously, since any loophole in the protection mechanism can cause catastrophic damage to people's lives. This paper explains how the privacy protection of databases has been implemented by different methods. One approach is to restrict database access to authorized people, which can be achieved by authentication methods such as fingerprint verification. As technology has evolved, people want to carry out operations virtually at any time and place, which raises the need to protect data so that it is used only for its intended purpose.

Many organizations make their databases public; on social networking sites, for example, query results can expose multiple users' details. We must restrict such results so that users' private information is protected. Information leakage is a severe threat to user privacy. Protecting released databases is very important, and since we cannot simply hide every bit of the data, we opt for data transformation so that an individual's confidential information is not revealed. In this paper, we discuss two such techniques, namely k-anonymity and randomization, summarize the issues mentioned above, and provide solutions for the respective problems that compromise an individual's privacy.

The information stored in databases varies widely, including fingerprint data, personal data, financial information, health information, and more, so there is an absolute need to protect it. People are increasingly concerned about their privacy, and because most online purchases require the user's personal data, privacy concerns are estimated to have reduced online commerce by US$15 billion in 2001. If an organization does not provide mechanisms to protect user data, people are reluctant to share their personal details. Since databases are accessed by many people, there is a high probability of information leakage. To avoid this leakage, we follow various methods such as k-anonymity and fingerprint authentication. Some databases make their data available to the public for useful purposes. A health care database, for example, can support medical research: if the symptoms of a patient's rare disease and the course of treatment used on that patient are known to all doctors, it helps them treat many others suffering from the same symptoms.

But such data involves confidential patient information, so before publishing the database there is a need to hide the patient's identity. This can be done with the k-anonymity method, a popular approach to protecting the privacy of published databases that generalizes or suppresses the identifying attributes so that at least k records share the same set of identifying attributes. However, making k records share the same identifying attributes does not fully protect the patient's identity: if there are only a limited number of records, an attacker has a good chance of obtaining the confidential information by uniform guessing. To prevent this problem, we use randomization techniques alongside k-anonymity, introducing random noise into the original data. By combining the two methods, randomization and k-anonymity, we make it much harder for an attacker to guess an individual's identity. This paper also evaluates the combined method by conducting experiments on real-world data.
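To illustrate the k-anonymity idea, here is a minimal sketch in Python. The record layout, the ZIP-code attribute, and the suppression scheme are hypothetical examples, not the paper's actual implementation:

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())

def generalize_zip(record, digits=3):
    """Suppress the trailing digits of a ZIP code, e.g. 47677 -> 476**."""
    out = dict(record)
    z = out["zip"]
    out["zip"] = z[:digits] + "*" * (len(z) - digits)
    return out

records = [
    {"zip": "47677", "age": 29, "disease": "flu"},
    {"zip": "47602", "age": 22, "disease": "cold"},
    {"zip": "47678", "age": 27, "disease": "flu"},
    {"zip": "47605", "age": 23, "disease": "asthma"},
]
qids = ["zip"]
generalized = [generalize_zip(r) for r in records]
print(is_k_anonymous(records, qids, 2))      # False: every raw ZIP is unique
print(is_k_anonymous(generalized, qids, 2))  # True: all ZIPs collapse to 476**
```

Note that after suppression every record carries the same generalized ZIP, so the four records form a single anonymity group of size 4 ≥ k.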

In the modern smart world, authentication systems using biometric techniques such as fingerprint and face recognition are widespread, which creates an absolute need to protect the databases that store this biometric information. With the growing use of smart gadgets, the risk of revealing sensitive data is rapidly increasing, so this type of information needs immediate protection. This paper discusses a method in which fingerprint data is stored in the form of templates and describes how access to the database is granted. In general, biometric templates stored in databases may be stolen and modified so that the authorized person can no longer enter the system, and unlike passwords, modified biometric templates cannot simply be replaced. Various data-hiding techniques ensure the privacy of fingerprint templates stored in a database.

We use a method that stores thinned fingerprint image files, which are much smaller in size yet retain all key features. Keeping the fingerprint images alone as the template is not sufficient to reconstruct the original fingerprint. To preserve all features of the fingerprint template during data embedding, a novel lossless data-hiding method for thinned fingerprints is proposed. Using the proposed scheme, we can reconstruct the original thinned fingerprint from the marked fingerprint without data extraction, so the fingerprint matching accuracy is not affected after data embedding. The following sections explain the methodology used to protect an individual's confidential information from being revealed. Before discussing that methodology for released databases, we need the following definitions.

Numerical data randomization: given n records of the same attribute x1, x2, . . ., xn, noise values y1, y2, . . ., yn are independent, identically distributed random variables added to x1, x2, . . ., xn respectively, such that zi = xi + yi.
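The numerical randomization zi = xi + yi can be sketched in a few lines of Python. The attribute values, noise distribution, and standard deviation below are illustrative assumptions, not parameters from the paper:

```python
import random

def randomize(values, noise_std=5.0, seed=0):
    """Release z_i = x_i + y_i, where the y_i are i.i.d. Gaussian noise."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_std) for x in values]

ages = [29, 22, 27, 23]
disguised = randomize(ages)
# Individual values are perturbed, but aggregate statistics such as the
# mean stay close to those of the original data.
print(disguised)
```

Because the noise is zero-mean, analysts can still estimate aggregate statistics from the disguised column while any single released value no longer reveals the original.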

In this technique, we introduce disturbances into the data being stored. Quasi-identifier attribute set: quasi-identifiers [1] are pieces of data that are not unique identifiers by themselves but are sufficiently well correlated with an entity that they can be combined with other quasi-identifiers to create a unique identifier. When joined, quasi-identifiers can therefore become personally identifying information; this process is called re-identification. For instance, Latanya Sweeney has demonstrated that although neither gender, birth date, nor postal code uniquely identifies an individual, the combination of all three is sufficient to identify 87% of people in the United States (see the anonymized health care data table). In this paper, we try to prevent linking attacks against an index value t given a record Rt. To explain the method simply, we assume the table contains only two types of attributes, namely quasi-identifier attributes and sensitive private attributes. Let Q denote the set of quasi-identifier attributes and S denote the set of private sensitive attributes. We have Rt = [Qt, St], in which Qt and St are respectively the quasi-identifiers and sensitive attributes in record Rt.
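Sweeney-style re-identification can be demonstrated with a toy linking attack. Both tables, all names, and all attribute values below are fabricated for illustration only:

```python
# An "anonymized" medical table still carries quasi-identifiers that can
# be joined against a public roster such as a voter list.
medical = [
    {"zip": "02138", "dob": "1960-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1985-03-12", "sex": "M", "diagnosis": "asthma"},
]
voter_roll = [
    {"name": "Alice Smith", "zip": "02138", "dob": "1960-07-31", "sex": "F"},
    {"name": "Bob Jones",   "zip": "02139", "dob": "1985-03-12", "sex": "M"},
]

def link(medical, roster, qids=("zip", "dob", "sex")):
    """Join the two tables on the quasi-identifier attributes Q."""
    matches = []
    for m in medical:
        for p in roster:
            if all(m[q] == p[q] for q in qids):
                matches.append((p["name"], m["diagnosis"]))
    return matches

print(link(medical, voter_roll))
# [('Alice Smith', 'hypertension'), ('Bob Jones', 'asthma')]
```

Even though the medical table contains no names, the join on (zip, dob, sex) re-attaches each diagnosis to a named individual.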

Assume that all attributes are independent. The probability of a successful linking attack [2] is the conditional probability of obtaining the record index t given prior knowledge of R′t, where R′t is the disguised data record, different from the original data record Rt: P(t | R′t). Similarly, S′t denotes the disguised sensitive attributes. The probability of reconstructing a sensitive attribute, P(sti | S′t), measures the chance of recovering the original sensitive value after a successful linking of the attributes. When attackers know both the record index t and any attribute in the sensitive attribute set of the same record Rt, we say that the privacy of record Rt is violated. The privacy breach probability is defined as P = P(t, sti | [Q′t, S′t]), in which sti is any sensitive attribute in record Rt.

Methodology to protect the data of a released database: a combined k-anonymity and randomization approach to prevent privacy breaches.

1. Privacy breach prevention Algorithm with a Combined Approach (PACA). In this paper, two techniques are combined, namely k-anonymization of the quasi-identifier attributes and randomization of the private sensitive attributes, to protect the data more efficiently: after we k-anonymize the quasi-identifier attributes, we apply multi-group randomization to the sensitive attributes. Consider a case where k-anonymity alone is applied to a small set of records. All records in an anonymity group must then share common components in their identifying attributes. In the health care example above, where postal code and date of birth are the only quasi-identifiers, finding records in the same anonymity group lets us conclude that the patients either live in nearby areas or were born on the same day. In other words, generalizing the identifying attributes leaves the data records tightly grouped.

To avoid this situation, we combine k-anonymity with randomization. We use multi-group randomization, in which the same randomization is applied to an attribute for all records of the same group, while the noise introduced is independent from group to group. Randomization here means adding noise to the sensitive attributes. Multi-group randomization preserves the relationships among data in the same group and achieves better precision than traditional randomization. The k-anonymization part of our algorithm is based on the Incognito algorithm. The basic idea of Incognito is to begin by checking single-attribute subsets of the quasi-identifier and then iterate, checking k-anonymity with respect to increasingly large subsets. In each iteration, a modified breadth-first search over a graph of candidate multi-attribute generalizations is conducted, which yields the set of multi-attribute generalizations of size i with respect to which the table T is k-anonymous. The algorithm then constructs the set of candidate nodes of size i + 1 using the subset property. For details, please refer to [3].
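The combined step can be sketched as follows. This is a simplified illustration, not the PACA implementation: it assumes the records have already been generalized into anonymity groups, and the attribute names, noise distribution, and parameter ranges are invented for the example:

```python
import random
from collections import defaultdict

def paca_sketch(records, quasi_ids, sensitive, k=2, seed=0):
    """Group records by their (already generalized) quasi-identifiers,
    then randomize the sensitive attribute with noise parameters drawn
    independently for each group (multi-group randomization)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_ids)].append(r)
    out = []
    for members in groups.values():
        if len(members) < k:
            continue  # a real implementation would generalize further
        std = rng.uniform(1.0, 10.0)  # noise scale independent per group
        for r in members:
            d = dict(r)
            d[sensitive] = r[sensitive] + rng.gauss(0.0, std)
            out.append(d)
    return out

data = [
    {"zip": "476**", "salary": 52000},
    {"zip": "476**", "salary": 48000},
    {"zip": "480**", "salary": 61000},
    {"zip": "480**", "salary": 64000},
]
released = paca_sketch(data, ["zip"], "salary")
print(len(released))  # 4
```

Because each group draws its own noise parameters, relationships within a group are preserved while noise remains independent across groups, matching the multi-group idea described above.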

2. Privacy Preservation Analysis. Knowing the privacy breach probability of published data, we can show that the above algorithm provides more privacy than applying either k-anonymity or randomization alone. Consider the privacy breach probabilities under the different techniques: Pa for k-anonymization, Pr for randomization, and Pc for the combined approach. A theorem specifies the relationship between these probabilities. Theorem: Pc < Pr and Pc < Pa. Proof: from the definition of the privacy breach probability, assume that linking attacks occur independently of deriving the original sensitive attributes from the disguised data. This implies that the probability of obtaining the record index (re-identification) and meanwhile deriving the original sensitive data from the randomized attributes equals the product of the probabilities of the two events happening separately.

Thus we have Pc = Pc(t | [Q′t, S′t]) · Pc(sti | [Q′t, S′t]). Since algorithm PACA uses k-anonymization to de-identify rows, the probabilities of obtaining the record index (re-identification) in the k-anonymity approach and in PACA are equal: Pa(t | [Q′t, S′t]) = Pc(t | [Q′t, S′t]). When only k-anonymization is applied to the identifying attributes, we have St = S′t and thus sti ∈ S′t, so Pa(sti | [Q′t, S′t]) = 1, whereas for the combined approach Pc(sti | [Q′t, S′t]) < 1. Thus we have Pc < Pa. Now consider the case where randomization is applied only to the sensitive attributes, i.e., Qt = Q′t. Without k-anonymization, the probability of re-identifying a record increases: Pr(t | [Q′t, S′t]) > Pc(t | [Q′t, S′t]). Therefore, with Pr(sti | [Q′t, S′t]) = Pc(sti | [Q′t, S′t]), we have Pc < Pr.

Evaluation: the time efficiency of PACA is evaluated by conducting experiments on real-world data, and the results are compared with plain k-anonymization. The experiments show that combining in the randomization technique does not change the cost significantly.
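The inequalities in the proof can be illustrated numerically. All probabilities below are made-up values chosen only to show the structure of the argument, not measurements from the paper:

```python
# Suppose re-identification succeeds with probability 1/k under
# k-anonymity and with high probability without it, and a randomized
# sensitive attribute is guessed correctly with probability p_guess < 1.
k = 5
p_reid_anon = 1 / k        # k-anonymity: attacker picks among k candidates
p_reid_raw = 0.9           # no anonymization: quasi-identifiers nearly unique
p_guess = 0.3              # chance of recovering the randomized value

Pa = p_reid_anon * 1.0     # k-anonymity only: sensitive value released as-is
Pr = p_reid_raw * p_guess  # randomization only
Pc = p_reid_anon * p_guess # combined approach (PACA)

print(Pa, Pr, Pc)           # 0.2 0.27 0.06
print(Pc < Pa and Pc < Pr)  # True
```

The combined probability is the product of the two smaller factors, which is why it falls below both single-technique breach probabilities.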

We evaluated our algorithm using the Adult data set from the UC Irvine Machine Learning Repository, which contains US Census details. In this evaluation, we use nine attributes, of which two are confidential and seven are quasi-identifiers. The experiment uses a data set of 30,152 records. The table below describes the number of unique values of each attribute. Depending on the number of unique values, the generalization is based on either a categorical taxonomy tree or simple suppression; no generalization is applied to the sensitive attributes.


About Various Ideas Of Privacy Data Protection. (2022, Apr 27). Retrieved from https://graduateway.com/about-various-ideas-of-privacy-data-protection/
