August 9, 2022


Introducing the ML Model Attribution Challenge

Author


Hyrum Anderson

Hyrum is our CTO.

Call for Contestants to Compete in the Machine Learning Model Attribution Challenge

Beginning with the announcement of GPT-2 in 2019, large language models (LLMs) have enabled the generation of text with “unprecedented quality.” They have great potential as a force for good, but there are also reasonable concerns that bad actors could repurpose LLMs for malicious use, fine-tuning them to spread misinformation at scale, undermine elections, distort public discourse, or harm public health. Today, no general technical method has been widely deployed to track LLMs that are being used as “Weapons of Mass Distortion.”

To address this gap, organizers from Robust Intelligence, Lincoln Network, MITRE, Hugging Face, and Microsoft Security invite AI researchers, infosec professionals, and academics to participate in the ML Model Attribution Challenge, a competition to develop creative technical solutions for discovering the provenance of fine-tuned LLMs. Contestants will attempt to reliably determine the underlying base model behind an anonymous online service.

Context: Tracking Weapons of Mass Distortion

Some argue that AI can be used to create text that is nearly indistinguishable from human-written text. Left unchecked, this technology might have serious implications for our society and economy. But a form of “AI forensics” may enable tracking and mitigation.

“Many of these AI models seem to leave behind faint traces,” says Deepesh Chaudhari, Senior Fellow at the Plaintext Group. “There may be certain characteristics or patterns in the outputs of large language models that are a product of a plethora of datasets and design decisions that go into fine-tuning a base model. They are a bit like fingerprints.” The goal of ML model attribution is to trace an AI-based influence campaign back to its original source. By identifying the base model in use, one might predict future outputs generated by the same model, which could help combat LLM-driven influence campaigns at scale.

Chaudhari connects the competition to larger efforts with the same goal. “Our group's recent work on machine learning model attribution is very different from synthetic text detection. While the latter aims to determine whether a text is synthetic, the former looks at pre-trained models and attempts to identify which one was used to generate a given output. This would be an important step towards achieving the goal of the IAEA’s Misinformation CRP, and towards building AI Safety coordination efforts at a UN agency, as it will allow us long-term monitoring of how AI systems are evolving.”

Participating in the Competition

Officially kicking off at the DEF CON AI Village, the ML Model Attribution Challenge (MLMAC) will run for four weeks, from August 12 through September 9, 2022, anywhere on Earth (AoE). Contestants can register and participate by visiting the website.

Contestants are presented with a scenario in which a naive adversary has stolen an LLM and fine-tuned it for a specific task. The owners of the base models made no attempt to watermark them. Participants are given full API access to the base models, but only gated access to the fine-tuned models, through an API that anonymizes each fine-tuned model's identity and counts queries. Contestants collect evidence that connects each fine-tuned model to its underlying base model and submit their solution as a set of (fine-tuned model, base model) pairs.
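To make the attribution task concrete, here is a minimal sketch of one naive strategy: send the same prompts to every base model and every anonymized fine-tuned model, then pair each fine-tuned model with the base model whose outputs overlap most. Everything here is an illustrative assumption, not the challenge's actual API or a recommended solution: the models are stand-in Python callables, `CountingModel` and `attribute` are hypothetical names, and word-level Jaccard similarity is a deliberately crude stand-in for the richer “fingerprint” signals a competitive entry would likely use.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: a "model" is any callable mapping a prompt to generated text.
Model = Callable[[str], str]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercased word sets of two texts (0.0 to 1.0)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


class CountingModel:
    """Hypothetical wrapper that counts queries, mimicking the gated fine-tuned API."""

    def __init__(self, model: Model) -> None:
        self._model = model
        self.queries = 0

    def __call__(self, prompt: str) -> str:
        self.queries += 1
        return self._model(prompt)


def attribute(
    fine_tuned: Dict[str, CountingModel],
    base: Dict[str, Model],
    prompts: List[str],
) -> List[Tuple[str, str]]:
    """Pair each fine-tuned model with the base model whose outputs overlap most."""
    if not prompts:
        raise ValueError("need at least one prompt")
    # Base models are freely queryable, so query each once per prompt and cache.
    base_outputs = {name: [m(p) for p in prompts] for name, m in base.items()}
    pairs = []
    for ft_name, ft_model in fine_tuned.items():
        # Each call here costs one query against the (counted) fine-tuned API.
        ft_outputs = [ft_model(p) for p in prompts]
        # Average per-prompt similarity against every candidate base model.
        scores = {
            b_name: sum(jaccard(f, o) for f, o in zip(ft_outputs, outs)) / len(prompts)
            for b_name, outs in base_outputs.items()
        }
        pairs.append((ft_name, max(scores, key=scores.get)))
    return pairs


# Toy usage with stub models standing in for real LLM endpoints.
base = {
    "base-a": lambda p: f"{p} the quick brown fox",
    "base-b": lambda p: f"{p} lorem ipsum dolor sit",
}
ft = {
    "anon-1": CountingModel(lambda p: f"{p} the quick brown dog"),
    "anon-2": CountingModel(lambda p: f"{p} lorem ipsum dolor amet"),
}
pairs = attribute(ft, base, ["Hello", "Tell me a story"])
print(pairs)
print({name: m.queries for name, m in ft.items()})
```

The query counter reflects the trade-off the organizers describe: every prompt sent to a fine-tuned model costs a query, so a stronger entry would try to choose a small number of highly discriminative prompts rather than many generic ones.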

Besides the chance to contribute to the burgeoning field of AI forensics, contestants compete for a share of up to USD $10,000. Winners will be selected using evaluation criteria that include:

(1) Correctness of the submitted results,

(2) Number of queries made to the fine-tuned models, and

(3) Submission time.

To be awarded a prize, a winning team must publish its solution. So that contestants from a variety of backgrounds can participate, publishing might mean a conference submission, an arXiv preprint, or even a blog post. First and second place receive cash equivalents of USD $3,000 and $1,500, respectively. In addition, up to three student bonus awards will be granted to undergraduate or graduate degree-seeking students at an accredited university; each consists of a USD $1,500 travel voucher plus admission to either CAMLIS 2022 in October or SaTML 2023.

The ML Model Attribution Challenge is sponsored by some of the world's leading AI and security companies. Its goal is to develop creative techniques for ML model attribution. The most successful entrants will identify which pre-trained model was used to produce each fine-tuned model with the highest accuracy and the fewest queries to the fine-tuned models.

Competition Details

The competition runs from August 12, 2022, AoE, through September 9, 2022.
Registration will remain open through the duration of the competition.
Winners will be announced in late September and will be contacted via the email address provided during solution submission.
Prizes totaling up to USD $10,000 will be awarded across first place, second place, and the student bonus awards.

Societal and Business Risks of AI

This competition is part of a broader effort to address the risks associated with developing and deploying AI systems.

Competition organizers are collaborating with researchers at Hugging Face, Microsoft Security, MITRE, OpenAI, Anthropic, and other leading organizations to address societal risk. As highlighted by BCG, capable AI platforms must adhere to societal guidelines every bit as much as they adhere to technical guidelines for responsible AI development. The ML Model Attribution Challenge aims to raise awareness and drive technical methods that instill accountability in the organizations that develop and deploy AI systems.

Our customers at Robust Intelligence understand that AI risk is business risk. They want to ensure that their systems are operationally reliable, that their outcomes are ethical, and that they are secure from threat actors who might exploit the growing attack surface of deployed AI. Organizations must be empowered to know what their AI models are doing, especially in settings the developers did not originally envision. The combination of stress testing before deployment, continuous testing and monitoring after deployment, and protection against bad inputs helps organizations proactively maximize business value while minimizing risk.

About Robust Intelligence

Robust Intelligence is an end-to-end machine learning integrity solution that proactively eliminates failure at every stage of the ML lifecycle. From pre-deployment vulnerability detection and remediation to post-deployment monitoring and protection, Robust Intelligence gives organizations the confidence to scale models in production across a variety of use cases and modalities. This enables teams to harden their models and stop bad data from entering AI systems, including adversarial attacks.
