
California Enacts "genAI" Laws That Introduce New Privacy and Transparency Requirements, Among Others 

In September 2024, California Governor Gavin Newsom signed a number of new generative AI (“genAI”) bills into law. These laws address risks associated with deepfakes, training dataset transparency, use of genAI in healthcare settings, privacy, and AI literacy in schools. California is the first US state to enact such sweeping genAI regulations. While we’ve seen states like Colorado take steps to address “high-risk” use cases, California’s new laws mark a significant shift away from regulating only high-risk use cases toward broad regulation of the development and deployment of genAI. In this post, we highlight several of these new laws that are most impactful to businesses. 

AB1008: Expands CCPA’s definition of “personal information” 

In its current text, the California Consumer Privacy Act (CCPA) defines the forms in which personal information can exist, including documents, images, digital files, and encrypted files. AB1008 broadens this definition to expressly provide that personal information can exist in “artificial intelligence systems that are capable of outputting personal information.” This amendment, as echoed in the Staff Position Paper on the Bill, provides a “helpful reiteration of existing law” and “underscores” that the CCPA’s rights apply to AI systems. 

In practice, for models trained on California residents’ personal information, the application of the CCPA may raise new questions and challenges, especially with regard to actioning data subject rights. It’s widely recognized that deleting data from a trained model is extraordinarily burdensome, if not impossible; it’s also arguably not reasonable to delete an entire model in response to a single personal data deletion request. As regulators and companies navigate this new era of regulation, techniques like output filters and personal data suppression should be explored as potential strategies for managing obligations under the CCPA. 
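To illustrate (purely as a sketch, not a compliance solution), an output filter might suppress recognizable personal data patterns before a model response is returned to a user. All names and patterns below are hypothetical, and real deployments would rely on far more robust PII-detection tooling:

```python
import re

# Hypothetical patterns; production systems would use dedicated
# PII-detection tooling rather than simple regular expressions.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def suppress_personal_data(model_output: str) -> str:
    """Redact recognizable personal-data patterns from a model's output."""
    for label, pattern in PII_PATTERNS.items():
        model_output = pattern.sub(f"[{label.upper()} REDACTED]", model_output)
    return model_output

print(suppress_personal_data("Reach Jane at jane.doe@example.com or 555-123-4567."))
# -> Reach Jane at [EMAIL REDACTED] or [PHONE REDACTED].
```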

This Bill highlights an emerging question: whether a model itself is personal information that can be regulated under the law. Notably, the Bill does not define “artificial intelligence systems.” It’s not clear whether the California legislature intends for the CCPA to apply only to personal information contained in training data and model outputs, or also to the model itself. In a July 2024 discussion paper, the Hamburg Data Protection Authority (DPA) clarified its position on this question, concluding that while the inputs and outputs of a model may contain personal information, a genAI model likely doesn’t itself contain personal information because personal information isn’t stored in it. Instead, the DPA reasoned that the data stored in a model isn’t identifiable: as part of the training process, personal information is transformed into a form that no longer relates to an identifiable person. The Hamburg DPA offers a helpful example (page 3):  

“The German sentence ‘Ist ein LLM personenbezogen?’, which loosely translates to ‘Does an LLM store personal data?’, is divided into 12 tokens by a typical tokenizer, for example, as follows: [I][st][ e][in][ LL][M][ person][en][be][z][ogen][?]. These tokens are converted into numerical values, which are used exclusively within the model in the following process.”  

The Hamburg DPA reasons that these tokens lack identifiable information and therefore do not represent personal information, even if the data they were derived from was identifiable.  
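To make the tokenization point concrete, the sketch below uses the open-source tiktoken library (an assumed choice; the Hamburg DPA does not name a tokenizer, and its exact token boundaries will differ) to show how a sentence is reduced to numeric IDs before training:

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an assumed tokenizer choice

sentence = "Ist ein LLM personenbezogen?"
token_ids = enc.encode(sentence)

# The model only ever sees these integer IDs; the exact values and
# splits depend on the tokenizer's learned vocabulary.
print(token_ids)
print([enc.decode([t]) for t in token_ids])
```

The model's weights are then derived from statistics over such IDs, which is the basis for the DPA's view that the original, identifiable text is not stored in the model.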

Under this view, where data stored in a model (e.g., the tokens, embeddings, and other derived metadata) is not linked to an identifiable person, the model itself is arguably not personal information and therefore is not regulated by the CCPA. It’s not clear whether the California Attorney General or the California Privacy Protection Agency will take this view, and the analysis is likely to be fact-dependent. AB1008 will take effect on January 1, 2025.   

AB2013: Introduces AI transparency requirements 

AB2013 introduces new disclosure requirements for developers that “train” or “substantially modify” genAI systems or services made available for Californians to use. The definition of “train” is broad and includes testing and fine-tuning; this breadth may sweep in businesses that would not typically consider themselves AI model developers. 

AB2013 will require developers to post on their websites “high-level summaries” of the datasets used to train an AI system or service before the system or service is released, and again any time it is substantially modified. The summary must include (a hypothetical example follows the list): 

  • whether the datasets include personal information or aggregate consumer information 

  • how the datasets further the purpose of the AI system or service 

  • whether there was any cleaning, processing, or other modification to the datasets by the developer 

  • when the data in the datasets was collected and if collection is ongoing 

  • the dates the datasets were first used in the development of the AI system or service 

  • whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain 

  • the owner of the datasets the system or service was developed on 

  • whether the datasets were purchased or licensed by the developer 

  • the number of data points in the datasets 

  • a description of the types of data points within the datasets, and 

  • whether the AI system or service uses or used synthetic data generation in its development. 
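AB2013 does not prescribe a format for these summaries. Purely as a hypothetical illustration, a developer might organize the required elements along the following lines (every field name and value below is invented, not statutory language):

```python
# Hypothetical AB2013 dataset summary; all values are illustrative.
dataset_summary = {
    "dataset_owner": "Example Data Co.",
    "acquired_via": "licensed",  # purchased vs. licensed by the developer
    "contains_personal_information": True,
    "contains_aggregate_consumer_information": False,
    "purpose_served": "Improves conversational fluency of the assistant",
    "cleaning_or_processing": "Deduplicated and profanity-filtered",
    "collection_period": {"start": "2019-01", "end": "2023-06", "ongoing": False},
    "first_used_in_development": "2023-08",
    "ip_status": "Includes copyrighted text; not entirely public domain",
    "num_data_points": "approximately 1.2 billion",
    "data_point_types": ["web text", "forum posts", "product documentation"],
    "synthetic_data_used": True,
}
```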

The law exempts certain AI systems and services that aren’t generally made available to the public from these disclosure requirements, including those whose sole purpose is ensuring data security and those provided to federal entities for defense purposes. The law goes into effect on January 1, 2026, and the disclosure obligations apply to AI systems and services released on or after January 1, 2022. 

SB 942: Introduces the AI Transparency Act and AI detection and watermarking requirements

SB 942, also known as the California AI Transparency Act, establishes AI detection and watermarking requirements for “covered providers.” A covered provider is a developer of a genAI system that has over 1,000,000 monthly visitors or users and is publicly accessible within California. Covered providers are required to:  

  • Make available an AI detection tool, at no cost to users. The tool must: (a) allow users to assess whether content was created or altered by the provider’s genAI system; (b) output system provenance data detected in the content; (c) not output personal provenance data (personal information, or device/service information associable with a particular user) in the content; (d) be publicly accessible; (e) allow users to upload content or provide a URL linking to content; and (f) support an API allowing use of the tool without visiting the provider’s website. Covered providers are prohibited from collecting or retaining personal information of users of the tool and are limited in their ability to retain content or personal provenance data in content submitted to the tool. 

  • Include an option to have a “manifest” disclosure in the AI-generated image, video, or audio content that is being created or altered by the genAI system. Manifest means that the disclosure is easily perceived on the face of the content. The disclosure must: (a) identify the content as AI generated; (b) be “clear, conspicuous, appropriate for the medium of the content, and understandable to a reasonable person;” and (c) be permanent or extraordinarily difficult to remove, to the extent it is technically feasible. 

  • Include a “latent disclosure” or watermark in AI-generated image, video, or audio content created or altered by the genAI system, to the extent technically feasible (a rough sketch of these metadata fields follows this list). Latent means present but not necessarily easily perceived (e.g., only detectable by software). The disclosure must: (a) include the name of the covered provider, the name and version of the genAI system used, the date of the content’s creation, and a unique identifier; (b) be detectable by the covered provider’s AI detection tool; (c) be consistent with widely accepted industry standards; and (d) be permanent or extraordinarily difficult to remove, to the extent it is technically feasible. 

  • Contractually require third parties to whom it licenses the genAI system to maintain the latent disclosure in the content the system creates. If the covered provider learns that a third-party licensee has modified the genAI system such that it no longer includes the latent disclosure, the covered provider must revoke the third party’s license within 96 hours of discovery. 
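As a rough illustration of the latent-disclosure fields referenced above, the sketch below writes the enumerated metadata into a PNG's text chunks using Pillow. This is only one assumed encoding: plain text chunks are trivially strippable, so they would not on their own satisfy the Act's "permanent or extraordinarily difficult to remove" standard, for which providers would look to widely accepted industry standards such as C2PA-style content credentials:

```python
# pip install Pillow
import uuid
from datetime import datetime, timezone

from PIL import Image, PngImagePlugin

def attach_latent_disclosure(image: Image.Image, out_path: str) -> None:
    """Embed SB 942's enumerated latent-disclosure fields as PNG metadata.

    Illustrative only: plain metadata is easily removed and would not by
    itself meet the Act's durability expectations.
    """
    info = PngImagePlugin.PngInfo()
    info.add_text("provider_name", "Example AI Co.")  # hypothetical provider
    info.add_text("system_name", "ExampleGen")        # hypothetical system
    info.add_text("system_version", "2.1")            # hypothetical version
    info.add_text("created_at", datetime.now(timezone.utc).isoformat())
    info.add_text("unique_id", str(uuid.uuid4()))
    image.save(out_path, pnginfo=info)

attach_latent_disclosure(Image.new("RGB", (512, 512)), "generated.png")
```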

The law will go into effect on January 1, 2026. Between now and then, potential covered providers should assess which of the genAI systems they develop or modify are in scope and build the product or service updates needed to launch a compliant AI detection tool and content watermarking and labeling. It may also help to monitor developing industry standards for guidance on how best to meet these requirements.  

AB3030: Introduces certain AI disclosure requirements in healthcare settings 

AB3030 will require certain healthcare providers to disclose their use of genAI when it is used to generate communications to a patient pertaining to the patient’s clinical information. The provider must include a disclaimer informing the patient that genAI was used to generate the communication, along with instructions on how the patient may contact a human healthcare provider. These requirements do not apply when a human healthcare provider reviews the AI-generated communication.   

 

Emily Litka is a Senior Associate at Hintze Law PLLC. Emily focuses her practice on global privacy and emerging AI laws and regulations.

Hintze Law PLLC is a Chambers-ranked and Legal 500-recognized boutique law firm that provides counseling exclusively on global privacy, data security, and AI law. Its attorneys and data consultants support technology, ecommerce, advertising, media, retail, healthcare, and mobile companies, organizations, and industry associations in all aspects of privacy, data security, and AI law. 
