Description
This is a working document with some elements that are ready for development.
While there is convergence on what constitutes an "AI incident," there are still considerable differences in how the concept of an "incident in waiting" is defined. We call these "issues," the OECD calls them "hazards," Robust Intelligence calls them "risks," ARVA calls them "vulnerabilities," AIAAIC calls them "controversies," and various algorithmic assessment organizations call them audits (or at least, audits will always produce one or more incidents in waiting). All these things vary subtly in definition, application, and use between organizations.
The role of the Responsible AI Collaborative, going back to the original research publication, has always been to act as the union of multiple perspectives and provide tools to support sharing across those perspectives. This is a challenging proposition. Pretty much every multi-stakeholder ontological project I am aware of has inevitably degenerated into never-ending discussions over the most difficult elements to define. For something that has no underlying, singular "right" answer, it is best to find ways of moving forward that don't require universal agreement. The purpose of this GitHub issue is to detail how to proceed technologically without needing to resolve the definitional question of "incident in waiting."
The AIID's current entrant into this space is the "AI Issue." We chose this term intentionally to cover multiple aspects of "incident in waiting." It is meant to be specific enough to capture elements of risk, while general enough to cover the field. The "issue" term also means we can index concepts covered by other communities and link out to those communities if/when they operate their own processes. While we would prefer such organizations join the Responsible AI Collaborative and integrate from the beginning, that will not be possible universally (e.g., when a database is operated by a sovereign state). Therefore, we need to maintain flexibility.
This also plugs into the drive for federating the AI Incident Database -- something that we will soon have a test case for with an index of deepfakes. Incident databases for things like deepfakes require different editing processes and metadata. How federation works with incidents is fairly clear. Incidents have a natural scope that will support federating responsibilities among multiple nodes. However, this does not work for incidents in waiting. Often there is no concrete definition of what specific system can produce the incident. Worse, all systems will produce a great many incidents when placed into the wrong context. Behind every system is an infinity of risks. This is why the ForHumanity audit criteria center on these four elements:
- Scope: The boundaries of a system; what is covered and what is not covered
- Nature: The forces and processes that influence and control the variables and features
- Purpose: The aim or goal of a system
- Context: The circumstances in which an event occurs, including jurisdiction and/or location, and the behaviour and functional inputs to an AAA System that are appropriate
Without some variation of these elements, the risks producible by a system cannot be bounded or expressed in any meaningful or useful way. For example, an LLM can be applied to an infinity of applications (safely or unsafely), while a webserver logging vulnerability is inherently scoped to the webserver. LLMs are scope/context free and yet present incidents in waiting in a massive array of circumstances. There is no closed world within which to index their risks, so they defy enumeration.
More concretely,
Problem: The safety community currently lacks an enumerable definition of "system+context" and we are likely never to have one. The notion of a system constantly changes with version, deployment circumstance, organizational processes, etc. The world context for these systems similarly evolves through time. Absent a more universal grounding of system+context, it is not possible to enumerate risks in a useful way. There will be too much noise.
Solution: Organizing issues in terms of a numeric identifier or hierarchical structure is a road to editorial ruin. Don't attempt to universally enumerate context-free risk. Instead of organizing issues according to a definite scope, issues themselves can be tagged according to salient attributes, and those tags can then be queried according to values of interest that populate a listing.
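To make the tag-then-query approach concrete, here is a minimal sketch assuming a MongoDB-backed reports collection; every collection, field, and tag name below is an illustrative placeholder rather than a settled schema.

```typescript
import { MongoClient } from "mongodb";

// Illustrative shape only: field and tag names are assumptions, not a settled schema.
interface IssueReport {
  url: string; // source of the report
  record_type: "audit" | "hazard" | "risk" | "vulnerability" | "controversy" | "other";
  tags: Record<string, string[]>; // salient attributes, e.g. { system: [...], context: [...] }
}

// Rather than enumerating a universal scope, query the tags for values of
// interest and let the result set populate a listing.
async function queryIssues(client: MongoClient) {
  const reports = client.db("aiidprod").collection("reports");
  return reports
    .find({ "tags.system": "dolittle-llm", "tags.context": "malware-generation" })
    .toArray();
}
```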
Let me introduce this by example.
Example Applied to an LLM
<< For illustrative purposes only >>
Press Release: "Dolittle LLM runs all LLMs produced to date with RLHF selecting among candidate outputs to produce an unbeatable hybrid LLM."
Audit: "Dolittle can generate several classes of malware through prompt hacking; Dolittle may attempt to end people's marriages"
Audit Metadata: {identifiers for hundreds of constituent LLMs, scope, nature, purpose, context, structured representation of findings, ...}
Hazard, Risk, and Vulnerability Record Metadata: {identifiers for hundreds of constituent LLMs, additional reporting, various taxonomies, ...}
Controversy 1: "This new superintelligent AI is coming for your marriage"
Controversy Metadata: {company, ...}
(subsequent incident)
"Incident 27311: Dolittle LLM allegedly produced malware that subsequently destroyed the records of 17 hospital systems"
Metadata: {Relevant Issue reports, Event Date, Alleged Developer, Alleged Deployer(s), Alleged Harmed Party(ies), ...}
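One way to think about the records above is as a common report envelope with type-specific metadata riding alongside it. A minimal TypeScript sketch of that shape (all field names are assumptions for illustration):

```typescript
// Illustrative only: field names are assumptions, not a settled schema.

// Common envelope shared by every issue report, regardless of source community.
interface IssueReportBase {
  title: string;
  source_url: string;
  date_published: string; // ISO 8601
  record_type: "audit" | "hazard" | "risk" | "vulnerability" | "controversy";
  systems: string[]; // identifiers for the implicated systems, e.g. constituent LLMs
}

// Type-specific metadata extends the envelope.
interface AuditReport extends IssueReportBase {
  record_type: "audit";
  scope: string;   // ForHumanity-style boundaries of the system
  nature: string;  // forces and processes controlling the variables
  purpose: string; // aim or goal of the system
  context: string; // circumstances in which events occur
  findings: Record<string, unknown>; // structured representation of findings
}

interface ControversyReport extends IssueReportBase {
  record_type: "controversy";
  company?: string;
}

// A subsequent incident links back to the issue reports that foreshadowed it.
interface IncidentRecord {
  incident_id: number;       // e.g., 27311 in the example above
  related_reports: string[]; // source_urls of the relevant issue reports
  event_date: string;
  alleged_developer: string[];
  alleged_deployer: string[];
  alleged_harmed_parties: string[];
}
```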
Now what can we do with this? Let's consider each of the report types as issue reports and present them all on a new page, but first we need to decide which reports are queried.
Populating an Issue Profile from a Query
Here I am introducing a new collection type, the "Issue Profile," which is programmatically generated from reports and never edited directly.
It is easy to present singular reports in isolation. That is what we are already doing here. What we are missing is some notion of issue profiles whereby elements of audit, risk, vulnerability, etc. can be jointly presented. Issue profiles can be queried from the collection of metadata expressed across all reports.
User Story 1: "I want to know whether a particular model I am considering using has been implicated in any risks so I can decide whether I integrate it into my product"
Query: {select the model and its target operating context and see what returns}
User Story 2: "I want to know whether a particular scope has been identified as at-risk in an audit for any systems so I can know what to worry about"
Query: {select the scope and see which audited systems return}
User Story 3: "I want to know all the examples of LLM jailbreaks consistent with the Dolittle model so I can begin training safety systems"
Query: {select vulnerabilities for the Dolittle system and subset to input/output data}
User Story 4: "I want to monitor the space of emerging risks across all similarly disposed systems"
Query: {select a collection of similarly positioned systems}
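As a sketch of how these user stories might translate into queries against the MongoDB database (a minimal sketch only; the collection, field, and tag names are carried over from the illustrative example above, not a settled schema):

```typescript
import { MongoClient } from "mongodb";

// Minimal sketches of the four user-story queries against a hypothetical
// `reports` collection; every field and value below is illustrative.
async function userStoryQueries(client: MongoClient) {
  const reports = client.db("aiidprod").collection("reports");

  // User Story 1: has this model, in my target operating context,
  // been implicated in any risks?
  const risks = await reports
    .find({ systems: "dolittle-llm", "tags.context": "customer-support" })
    .toArray();

  // User Story 2: has a particular scope been flagged as at-risk in any audit?
  const scoped = await reports
    .find({ record_type: "audit", scope: "adversarially-produced malware generation" })
    .toArray();

  // User Story 3: all jailbreak examples consistent with the Dolittle model,
  // subset to the input/output data needed for training safety systems.
  const jailbreaks = await reports
    .find({ record_type: "vulnerability", systems: "dolittle-llm" })
    .project({ "findings.inputs": 1, "findings.outputs": 1 })
    .toArray();

  // User Story 4: monitor emerging risks across similarly positioned systems.
  const monitored = await reports
    .find({ systems: { $in: ["dolittle-llm", "comparable-llm-a", "comparable-llm-b"] } })
    .toArray();

  return { risks, scoped, jailbreaks, monitored };
}
```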
After generating the query, what gets displayed?
New Page Type for Joining Issue Reports Returned by the Query
Right now the /cite/### pages have the following sections,
- Title
- Description
- Tools
- Stats
- Taxonomies
- Reports timeline
- Reports
- Variants
We can define each of these as follows,
- Title: Programmatically generate "{All Versions of Dolittle LLM} applied within the context of {adversarially-produced malware generation}" (a sketch of this generation follows the list below)
- Description: (None)
- Tools: This would be a palette of actions that can be taken on the query, such as "Subscribe" and "Cite," along with another button for creating a new incident from the assemblage.
- Stats: Information summarizing the reports returned by the query
- Taxonomies: (None at present, these would be scoped to reports)
- Reports timeline: Render the reports according to their publication dates
- Reports: The full index of information pulled from the report or the related databases
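For the programmatically generated title referenced above, a minimal template sketch; the facet names `systems` and `context` are assumptions about what the query exposes:

```typescript
// A minimal sketch of generating an Issue Profile title from query facets;
// `systems` and `context` are assumed names for the query parameters.
function issueProfileTitle(systems: string[], context: string): string {
  const systemLabel =
    systems.length > 1 ? `All Versions of ${systems[0]}` : systems[0];
  return `{${systemLabel}} applied within the context of {${context}}`;
}

// issueProfileTitle(["Dolittle LLM", "Dolittle LLM v2"], "adversarially-produced malware generation")
// => "{All Versions of Dolittle LLM} applied within the context of {adversarially-produced malware generation}"
```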
Much of this still requires discussion, but there are several elements on which we can proceed.
Required Functionality in Codebase
These are likely "Epics" in the agile world.
Today (ready for work)
- Adopt definitions for Audit, Hazard, Vulnerability, Risk, Assessment, and Controversy, then index all records from interoperability-minded databases that meet those definitions. Databases with useful but non-conforming definitions can also be indexed, but under an "other records" heading.
- Tag all issue reports with the appropriate lower-level term as adopted above
- Create a new page, "/apps/reports/", which reuses many of the components of the "Discover" application but is centered around constructing queries into the MongoDB database rather than the Algolia index. As the query is constructed, the URL rewrites to include the query parameters (a sketch of this encoding follows the list below). When the user is done, or when the user refreshes the page, the panel for forming the query collapses and the page is made to look more similar to the /cite/### pages. Design-wise, although the page is populated like an Amazon shopping cart, it can be presented more like the static /cite/### pages where the tags are headings with report cards underneath. Whoever picks this up should talk with @lmcnulty since there are overlaps here with the risk checklisting work.
- Extend the taxonomy component stack to be applicable to reports. It currently can only be applied to incident profiles. By making it applicable to reports, it becomes possible to index the structured data contained within related databases that don't necessarily have an incident number.
- Add attachment support to reports, initially only for PDFs. We will likely start with these in the MongoDB database, but eventually rip them out and put them in S3 or similar.
- Write a migration to pull in all the data associated with the AI Litigation Database and structure it in a new taxonomy consistent with its tabular fields. This is a stress test to help us figure out what is necessary for bringing in additional record types. Walk Bob at GW through the results and develop next steps.
- Change the layout of the /cite/### page so that taxonomies are added from the tools panel rather than a panel that displays to every permissioned user. We are going to have a lot more taxonomies.
- Produce a taxonomy of audits, then collect a set of audits to populate this as a queryable element.
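As referenced in the /apps/reports/ item above, here is a minimal sketch of how the query could be rewritten into and recovered from the URL so that refreshing the page reconstructs the collapsed query panel; the parameter names are assumptions:

```typescript
// A minimal sketch of encoding/decoding the query state in the URL.
// Parameter names (`system`, `context`, `record_type`) are assumptions.
interface ReportsQuery {
  system?: string[];
  context?: string[];
  record_type?: string[];
}

// Serialize the query into the URL's search string as it is constructed.
function queryToSearchParams(q: ReportsQuery): string {
  const params = new URLSearchParams();
  for (const [key, values] of Object.entries(q)) {
    for (const value of values ?? []) params.append(key, value);
  }
  return params.toString();
}

// Recover the query from the URL on page load or refresh.
function searchParamsToQuery(search: string): ReportsQuery {
  const params = new URLSearchParams(search);
  return {
    system: params.getAll("system"),
    context: params.getAll("context"),
    record_type: params.getAll("record_type"),
  };
}

// e.g. /apps/reports/?system=dolittle-llm&context=malware-generation&record_type=audit
```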
Soon (needs more definition)
- SBOM-like indexing and entity resolution of systems. This is a huge topic requiring a lot more ink.
- Validate and index the data of TeslaDeaths
Eventually (whenever other efforts become ready)
- Integrate functionality represented here with the risk checklisting project
- Programmatically synchronize with emerging databases