By Emily Litka
In September 2024, California Governor Gavin Newsom signed a number of new generative AI (“genAI”) bills into law. These laws address risks associated with deepfakes, training dataset transparency, use of genAI in healthcare settings, privacy, and AI literacy in schools. California is the first US state to enact such sweeping genAI regulations. While states like Colorado have taken steps to address “high-risk” use cases, California’s new laws mark a significant shift away from regulating only high-risk use cases toward broad regulation of the development and deployment of genAI. In this post, we highlight several of these new laws that are most impactful to businesses.
AB1008: Expands CCPA’s definition of “personal information”
In the current California Consumer Privacy Act (CCPA) text, the law defines the forms in which personal information can exist, including documents, images, digital files, and encrypted files. AB1008 will broaden this definition to expressly include personal information that exists in “artificial intelligence systems that are capable of outputting personal information.” This amendment, as echoed in the Staff Position Paper on this Bill, provides a “helpful reiteration of existing law” and “underscores” that the CCPA’s rights apply to AI systems.
In practice, for models trained on California residents’ personal information, the application of the CCPA may raise new questions and challenges, especially with regard to acting on data subject rights. It’s widely recognized that deleting data from a trained model is extraordinarily burdensome, if not impossible; it’s also arguably not reasonable to delete an entire model in response to a single personal data deletion request. As regulators and companies navigate this new era of regulation, techniques like output filters and personal data suppression should be explored as potential strategies for managing obligations under the CCPA.
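As one illustration, the following minimal Python sketch shows an output filter that suppresses common categories of personal data before a model response reaches a user. The patterns and names are our own illustrative assumptions, not anything prescribed by the CCPA or AB1008; production systems would rely on far more robust detection, such as trained PII classifiers:

```python
import re

# Illustrative patterns only; real deployments would use trained PII
# detectors, not simple regexes. (Hypothetical example, not a legal standard.)
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def suppress_personal_data(model_output: str) -> str:
    """Redact likely personal data from a model's output before display."""
    for label, pattern in PII_PATTERNS.items():
        model_output = pattern.sub(f"[{label} removed]", model_output)
    return model_output

print(suppress_personal_data("Contact Jane at jane.doe@example.com or 555-123-4567."))
# -> "Contact Jane at [email removed] or [phone removed]."
```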
This Bill highlights an emerging question regarding whether a model itself is personal information that can be regulated under the law. Notably, the Bill does not define “artificial intelligence systems.” It’s not clear whether the California legislature intends for the CCPA to apply to personal information contained in training data and model outputs only, or also to the model itself. In a July 2024 Discussion Paper, the Hamburg Data Protection Authority clarified its position on this question, concluding that while the inputs and outputs of a model may contain personal information, a genAI model likely does not contain personal information because personal information isn’t stored in it. Instead, the authority highlighted that the data stored in models isn’t identifiable because, as part of the training process, personal information is transformed into a form that renders the data no longer identifiable. The Hamburg DPA offers a helpful example to explain this (page 3):
“The German sentence ‘Ist ein LLM personenbezogen?’, which loosely translates to ‘Does an LLM store personal data?’, is divided into 12 tokens by a typical tokenizer, for example, as follows: [I][st][ e][in][ LL][M][ person][en][be][z][ogen][?]. These tokens are converted into numerical values, which are used exclusively within the model in the following process.”
The Hamburg DPA reasons that these tokens lack identifiable information and therefore do not represent personal information, even if the data they were derived from was identifiable.
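To make this concrete, here is a minimal Python sketch of that tokenization step using tiktoken, one openly available tokenizer (our choice for illustration; the Hamburg DPA’s paper does not name a specific tokenizer, so the exact splits and counts will differ from its example):

```python
# Minimal illustration of the tokenization the Hamburg DPA describes:
# text is split into fragments and mapped to numeric IDs, and only those
# numbers are processed inside the model.
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
sentence = "Ist ein LLM personenbezogen?"

token_ids = enc.encode(sentence)                    # the numeric values
tokens = [enc.decode([tid]) for tid in token_ids]   # the fragment each ID represents

print(tokens)     # fragments such as ['Ist', ' ein', ...] -- splits vary by tokenizer
print(token_ids)  # a list of integers; these numbers, not the text, live in the model
```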
Under this view, where data stored in a model (e.g., the tokens, embeddings, and other derived metadata) is not linked to an identifiable person, the model itself is arguably not personal information and therefore is not regulated by the CCPA. It’s not yet clear whether the California Attorney General or the California Privacy Protection Agency will take this view, and the analysis is likely to be fact dependent. AB1008 will take effect on January 1, 2025.
AB2013: Introduces AI transparency requirements
AB2013 introduces new disclosure requirements for developers that “train” or “substantially modify” genAI systems or services made available for Californians to use. The definition of “train” is broad and includes testing and fine-tuning; this breadth may sweep in businesses that would not otherwise consider themselves AI model developers.
AB2013 will require that developers post on their websites “high-level summaries” of the datasets used to train an AI system or service before launch and any time there is a substantial modification (one way to structure such a summary is sketched after the list below). The summary must include:
whether the datasets include personal information or aggregate consumer information
how the datasets further the purpose of the AI system or service
whether there was any cleaning, processing, or other modification to the datasets by the developer
when the data in the datasets was collected and if collection is ongoing
the dates the datasets were first used in the development of the AI system or service
whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain
the owner of the datasets the system or service was developed on
whether the datasets were purchased or licensed by the developer
the number of datapoints in the dataset
a description of the types of data points within the datasets, and
whether the AI system or service uses or used synthetic data generation in its development.
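AB2013 mandates the content of the summary, not its format, but the required elements map naturally onto a structured record. The following Python sketch is purely illustrative; every field name is our own assumption rather than statutory language:

```python
from dataclasses import dataclass, field

# Hypothetical record mirroring AB2013's required disclosure elements;
# the statute prescribes the content of the summary, not this (or any) format.
@dataclass
class DatasetSummary:
    contains_personal_information: bool
    contains_aggregate_consumer_information: bool
    purpose_description: str                   # how the data furthers the system's purpose
    cleaning_or_processing_applied: bool
    collection_period: str                     # e.g. "2019-2023"
    collection_ongoing: bool
    first_used_in_development: str             # e.g. "2023-04"
    includes_ip_protected_data: bool           # copyright, trademark, or patent
    entirely_public_domain: bool
    dataset_owner: str
    purchased_or_licensed: bool
    approximate_datapoint_count: int
    datapoint_types: list[str] = field(default_factory=list)
    uses_synthetic_data: bool = False
```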
The law exempts certain AI systems and services that aren’t generally made available to the public from these disclosure requirements, including those whose sole purpose is ensuring data security and those provided to federal entities for defense purposes. The law goes into effect on January 1, 2026, and the disclosure requirements apply to genAI systems and services released on or after January 1, 2022.
SB 942: Introduces the AI Transparency Act and AI detection and watermarking requirements
SB 942, also known as the California AI Transparency Act, establishes AI detection and watermarking requirements for “covered providers.” A covered provider is a developer of a genAI system that has over 1,000,000 monthly visitors or users and is publicly accessible from California. Covered providers are required to:
Make available an AI detection tool, at no cost to users. The tool must: (a) allow users to assess whether content was created or altered by the provider’s genAI system; (b) output system provenance data detected in the content; (c) not output personal provenance data (personal information, or device/service information associable with a particular user) in the content; (d) be publicly accessible; (e) allow users to upload content or provide a URL linking to content; and (f) support an API allowing use of the tool without visiting the provider’s website. Covered providers are prohibited from collecting or retaining personal information of users of the tool and are limited in their ability to retain content or personal provenance data in content submitted to the tool.
Include an option for a “manifest” disclosure in AI-generated image, video, or audio content created or altered by the genAI system. Manifest means that the disclosure is easily perceived on the face of the content. The disclosure must: (a) identify the content as AI generated; (b) be “clear, conspicuous, appropriate for the medium of the content, and understandable to a reasonable person;” and (c) be permanent or extraordinarily difficult to remove, to the extent it is technically feasible.
Include a “latent disclosure” or watermark in AI-generated image, video, or audio content created or altered by the genAI system, to the extent it is technically feasible. Latent means present but not necessarily easily perceived (e.g., only detectable by software). The disclosure must: (a) include the name of the covered provider, the name and version of the genAI system used, the date of the content’s creation, and a unique identifier; (b) be detectable by the covered provider’s AI detection tool; (c) be consistent with widely accepted industry standards; and (d) be permanent or extraordinarily difficult to remove, to the extent it is technically feasible. (A sketch of one way to structure this provenance metadata follows this list.)
Contractually require third parties to whom the covered provider licenses the genAI system to maintain the latent disclosure in the content the system creates. If the covered provider learns that a third-party licensee has modified the genAI system such that it no longer includes the latent disclosure, the covered provider must revoke the third party’s license within 96 hours of discovery.
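To make these obligations concrete, here is a minimal Python sketch of the provenance metadata SB 942 requires in a latent disclosure, together with a toy version of the detection step that recovers system provenance data while excluding personal provenance data. The structure and names are our own illustrative assumptions; in practice, providers would likely rely on an industry provenance standard such as C2PA, and the actual embedding and extraction of a watermark from pixels or audio samples is outside this sketch:

```python
import json
import uuid
from datetime import date

# Hypothetical payload carrying the elements SB 942 requires in a latent
# disclosure. The statute does not prescribe a format; real systems would
# likely use a standard such as C2PA content credentials.
def build_latent_disclosure(provider: str, system: str, version: str) -> str:
    payload = {
        "provider_name": provider,                   # name of the covered provider
        "system_name": system,                       # name of the genAI system
        "system_version": version,                   # version of the genAI system
        "creation_date": date.today().isoformat(),   # date of the content's creation
        "unique_identifier": str(uuid.uuid4()),      # unique identifier
    }
    return json.dumps(payload)

# A detection tool (also required by SB 942) would surface this system
# provenance data while excluding any personal provenance data.
def detect_system_provenance(embedded_payload: str) -> dict:
    data = json.loads(embedded_payload)
    allowed_keys = {"provider_name", "system_name", "system_version",
                    "creation_date", "unique_identifier"}
    return {k: v for k, v in data.items() if k in allowed_keys}

disclosure = build_latent_disclosure("ExampleAI", "ExampleGen", "2.1")
print(detect_system_provenance(disclosure))
```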
The law will go into effect on January 1, 2026. Between now and then, potential covered providers should assess which of the genAI systems they develop or modify are in scope for this law and plan the product or service updates needed to launch a compliant AI detection tool and to watermark and label content. Monitoring developing industry standards may also help guide how best to act on these requirements.
AB3030: Introduces certain AI disclosure requirements in healthcare settings
AB3030 will require certain healthcare providers to disclose their use of genAI when it is used to generate communications to a patient pertaining to the patient’s clinical information. AB3030 will require healthcare providers to include a disclaimer that informs the patient that genAI was used to generate the communication, along with instructions for how the patient may contact a human healthcare provider. These requirements do not apply when a human healthcare provider reviews the AI-generated communication.
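As a simple illustration of how this logic might be implemented, here is a hypothetical Python sketch that appends a disclaimer unless a human provider has reviewed the message; the disclaimer wording, placeholder contact details, and function names are our own assumptions, not statutory text:

```python
# Hypothetical helper reflecting AB3030's structure: add a genAI disclaimer
# to patient-facing clinical communications unless a human healthcare
# provider reviewed the message. Wording is illustrative, not statutory.
DISCLAIMER = (
    "This message was generated using artificial intelligence. "
    "To reach a human healthcare provider, call our office at [phone number]."
)

def prepare_patient_message(ai_generated_text: str, human_reviewed: bool) -> str:
    if human_reviewed:
        # The disclaimer requirement does not apply to human-reviewed messages.
        return ai_generated_text
    return f"{ai_generated_text}\n\n{DISCLAIMER}"
```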
Emily Litka is a Senior Associate at Hintze Law PLLC. Emily focuses her practice on global privacy and emerging AI laws and regulations.
Hintze Law PLLC is a Chambers-ranked and Legal 500-recognized, boutique law firm that provides counseling exclusively on global privacy, data security, and AI law. Its attorneys and data consultants support technology, ecommerce, advertising, media, retail, healthcare, and mobile companies, organizations, and industry associations in all aspects of privacy, data security, and AI law.