Health Breach Notification Rulemaking #541358-00087

Submission Number:
Bradley Malin
Vanderbilt University
Initiative Name:
Health Breach Notification Rulemaking
The following comments are in response to 16 C.F.R. Part 318: Notice of Proposed Rulemaking and Request for Public Comments Concerning Proposed Health Breach Notification Rule, Pursuant to the American Recovery and Reinvestment Act of 2009. The comments pertain to whether limited data sets, and similar types of data, should be included in or excluded from breach notification requirements.

First, it should be made clear that when an organization creates a Limited Data Set (LDS), the organization may or may not retain a table that links LDS records to their corresponding identities. If such a linking table is not retained, or is not made available, an organization experiencing a breach cannot directly establish the identities of those in the LDS. Even if the organization cross-references the LDS with public records harboring identifiable information (assuming such records are available), it is questionable whether such an action would sufficiently assist in notification. While it is true that many individuals are unique based on the combination of features in an LDS (e.g., dates, geocodes, and gender), the entire population is not unique. Thus, even if an organization could link an LDS to an external naming source, it would be unable to establish the identities of all individuals who would need to be notified.

At the same time, one must question the extent to which an LDS, or a variant of such data, could be leveraged to compromise privacy. The DHHS inquired whether the suppression of additional features, such as the last three digits of a zip code or the day and month of birth, sufficiently protects records from identification. While we are unable to define “sufficient” for the department, we are able to provide objective estimates regarding how such policies protect data. Using an existing statistical estimation approach [1] and the 2000 U.S. Census, we computed the percentage of a state’s population that is expected to be unique given various demographics that could be made available under the policies. Specifically, we computed results for the following demographics:

1. {Gender, Year of Birth, State} - basically a safe harbor data set (SH)
2. {Gender, Date of Birth, 5-digit zip code} - basically a limited data set (LDS)
3. {Gender, Year of Birth, 5-digit zip code} - call this (LDS-Year), which was explicitly requested in the DHHS call
4. {Gender, Date of Birth, 4-digit zip code} - call this (LDS-4zip)
5. {Gender, Date of Birth, 3-digit zip code} - call this (LDS-3zip)
6. {Gender, Date of Birth, 2-digit zip code} - call this (LDS-2zip), which was also explicitly requested in the DHHS call

Our analysis revealed the following uniqueness estimates. The results are reported as the average (and standard deviation) across the U.S. states:

1. SH: 0.0001% (0.0002%)
2. LDS: 68.4% (7.9%)
3. LDS-Year: 0.38% (0.42%)
4. LDS-4zip: 36.8% (14.3%)
5. LDS-3zip: 7.5% (8.3%)
6. LDS-2zip: 0.33% (0.71%)

Notice that LDS-Year and LDS-2zip are similar in their risks, though both pose a significantly greater risk than the safe harbor policy. It should be further noted that we did not include the Census race attribute, which is permissible to disclose under both the limited data set and safe harbor policies and is available in public records such as voter registration lists. If we include this feature, the uniqueness of the population increases, but not by much (according to our calculations).

Thank you for your time and consideration of this matter.

Bradley Malin, Ph.D.
Assistant Professor of Biomedical Informatics, School of Medicine
Director, Health Information Privacy Laboratory
Vanderbilt University

Reference:
[1] P. Golle. Revisiting the uniqueness of simple demographics in the US population. Proceedings of the 5th ACM Workshop on Privacy in the Electronic Society. 2006, pages 77-80.
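
Illustrative note: the uniqueness percentages reported in this comment come from a statistical estimate over Census counts. A minimal sketch of one such occupancy-style estimate is given below. It is not the exact procedure used for the figures above. It assumes that people within a coarse Census cell (for example, the same gender, year of birth, and zip code) are spread uniformly at random over a fixed number of finer bins (for example, 365 possible birthdays). Under that assumption, a person in a cell of n people is unique with probability (1 - 1/b)^(n - 1), where b is the number of bins. The function name, parameter names, and example counts are hypothetical.

```python
def expected_unique_fraction(cell_counts, bins_per_cell):
    """Estimate the fraction of a population expected to be unique.

    cell_counts   -- number of people in each coarse cell
                     (e.g. one gender / birth-year / zip-code combination)
    bins_per_cell -- number of equally likely finer bins within a cell
                     (e.g. 365 birthdays); uniformity is an assumption
    """
    total = sum(cell_counts)
    if total == 0:
        return 0.0
    # A given person is unique when none of the other n - 1 people in the
    # cell land in the same bin, which has probability (1 - 1/b)^(n - 1).
    p_other_elsewhere = 1.0 - 1.0 / bins_per_cell
    expected_unique = sum(
        n * p_other_elsewhere ** (n - 1) for n in cell_counts if n > 0
    )
    return expected_unique / total


if __name__ == "__main__":
    # Hypothetical cell sizes, for illustration only (not Census data).
    example_cells = [40, 150, 600]
    share = expected_unique_fraction(example_cells, bins_per_cell=365)
    print(f"Expected unique share: {share:.1%}")
```

Replacing year of birth with full date of birth corresponds to splitting each cell into many more bins, which is why the LDS variants with date of birth show far higher uniqueness than LDS-Year in the results above.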