Showing posts with label data usage. Show all posts

Sunday, February 04, 2018

Data Ethics Regulation: Two key updates in 2018

This year, two important new regulations will affect research with human subjects: the EU's General Data Protection Regulation (GDPR), which takes effect in May 2018, and the USA's updated Common Rule, called the Final Rule, in effect from January 2018. Both changes concern the protection of individuals' private information, and both will affect researchers who use behavioral data: data collection, access and use, applications for ethics committee (IRB) approvals or exemptions, collaborations within the same country or region and beyond, and collaborations with industry.
Both the GDPR and the Final Rule try to modernize what constitutes "private data" today, as well as data subjects' rights, and to balance these against the "free flow of information between EU countries" (GDPR) or . However, the GDPR's approach leans much more strongly toward protecting private data.
Here are a few points to note about the GDPR and the Final Rule:

  1. "Personal data" (GDPR) or "private information" (Final Rule) is very broadly defined and includes data on physical, physiological or behavioral characteristics of a person "which allow or confirm the unique identification of that natural person".
  2. The GDPR affects any organization within the EU as well as "external organizations that are trading within the EU". It applies to personal data on any person, not just EU citizens/residents.
  3. The GDPR distinguishes between the "data controller" (the entity that holds the data in the eyes of the data subjects, e.g. a hospital) and the "data processor" (the entity that operates on the data). Both entities are bound by the GDPR and liable under it.
  4. The GDPR distinguishes between "data processing" (any operation on the data, including storage, structuring, record deletion and transfer) and "profiling" (automated processing of personal data to "evaluate personal aspects relating to a natural person").
  5. The Final Rule now offers an option of relying on broad consent obtained for future research as an alternative to seeking IRB approval to waive the consent requirement.
  6. Domestic collaborations within the US now require approval from a single institutional review board (IRB) for the portion of the research that takes place within the US (effective 2021).
The Final Rule tries to lower the burden on low-risk research. One such attempt is a set of new "exemption" categories for secondary research use of identifiable private information (i.e., re-using identifiable information collected for some other "primary" or "initial" activity) when:
  • The identifiable private information is publicly available;
  • The information is recorded by the investigator in such a way that the identity of subjects cannot readily be ascertained, and the investigator does not contact subjects or try to re-identify subjects; 
  • The secondary research activity is regulated under HIPAA; or
  • The secondary research activity is conducted by or on behalf of a federal entity and involves the use of federally generated non-research information provided that the original collection was subject to specific federal privacy protections and continues to be protected.
This approach to secondary data, and specifically to observational data from public sources, seems to contrast with the GDPR approach, which states that the new regulations also apply when processing historical data for "historical research purposes". Metcalf (2018) criticized the above Final Rule exemption because "these criteria for exclusion focus on the status of the dataset (e.g., is it public? does it already exist?), not the content of the dataset nor what will be done with the dataset, which are more accurate criteria for determining the risk profile of the proposed research".

Sunday, November 14, 2010

Data visualization in the media: Interesting video

A colleague who knows my fascination with data visualization pointed me to an interesting recent video by Geoff McGhee, Journalism in the Age of Data. In this 8-part video, he interviews the people who create visualizations for the websites of the New York Times, Washington Post, CNBC and other outlets. It is interesting to see their view of why interactive visualization might be useful to their audience, and how it is linked to "good journalism".

Also interviewed are a few visualization interface developers (e.g., IBM's Many Eyes designers) as well as infographics experts and participants at the major infographics conference in Pamplona, Spain. The line between beautiful visualizations (art) and effective ones is discussed in Part IV ("too sexy for its own good" - Gert Nielsen); see also John Grimwade's article.


The videos can be downloaded as a series of 8 podcasts, for those with narrower bandwidth.

Thursday, September 06, 2007

Data mining = Evil?

Some get a chill when they hear "data mining" because they associate it with "big brother". Well, here's one more major incident that sheds darkness on smart algorithms: The Department of Homeland Security declared the end of a data mining program called ADVISE (Analysis, Dissemination, Visualization, Insight and Semantic Enhancement). Why? Because it turns out that they were testing it for two years on live data on real people "without meeting privacy requirements" (Yahoo! News: DHS ends criticized data-mining program).

There is nothing wrong or evil about data mining. It's like any other tool: you can use it or abuse it. Issues of privacy and confidentiality in data usage have always been there and will continue to be a major concern as more and more of our private data gets stored in commercial, government, and other databases.

Many students in my data mining class use data from their workplace for their term project. The projects almost always turn out to be insightful and useful beyond the class exercise. But we always make sure to obtain permission, de-identify the data, and protect and restrict access to it as needed. Good practice is the key to keeping "data mining" a positive term!
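For readers curious what the de-identification step can look like in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in the class. It assumes tabular records held as dictionaries; the column names (`customer_id`, `name`, `email`, `phone`) and the helpers `pseudonymize` and `deidentify` are hypothetical. Direct identifiers are dropped, and the record ID is replaced by a keyed hash (HMAC-SHA256) whose secret key stays with the data owner. This way the analyst can still link records that belong to the same person without learning who that person is.

```python
import hashlib
import hmac
import secrets

# Hypothetical column names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex chars."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify(records, id_field, key):
    """Drop direct identifiers and the raw ID; add a pseudonymous 'pid' column."""
    cleaned = []
    for rec in records:
        out = {k: v for k, v in rec.items()
               if k not in DIRECT_IDENTIFIERS and k != id_field}
        out["pid"] = pseudonymize(str(rec[id_field]), key)
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    # Assumption: the key is generated and kept by the data owner (e.g., the
    # workplace), and only the de-identified rows are handed to the student.
    key = secrets.token_bytes(32)
    raw = [
        {"customer_id": 1017, "name": "A. Smith", "email": "a@example.com",
         "region": "West", "spend": 120.5},
        {"customer_id": 2044, "name": "B. Jones", "email": "b@example.com",
         "region": "East", "spend": 88.0},
    ]
    for row in deidentify(raw, "customer_id", key):
        print(row)
```

A keyed hash is used rather than a plain hash so that someone without the key cannot simply hash a list of known IDs and match them. Note that this covers only direct identifiers: quasi-identifiers such as ZIP code, birth date or a rare job title can still allow re-identification when combined, so they may also need to be generalized or removed.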