Google Reports Over 100,000 Prompts Used in Attempted Gemini Clone via Model Extraction

Google has disclosed that external actors sent more than 100,000 prompts to its Gemini AI chatbot in a single session in an attempt to extract knowledge for training a competing model. The activity, which Google labels “model extraction,” spanned multiple non-English languages and appears to be driven by commercial actors seeking to replicate Gemini’s capabilities at lower cost. The findings come from Google’s regular self-assessment of threats to its products, which positions the company as both a target and a responder in the AI security landscape.

Model extraction involves using outputs from an existing AI system to train a new, often smaller or cheaper model, a technique commonly referred to in the industry as distillation. For entities that lack the resources to develop a large language model from scratch (Google invested billions of dollars and years of effort in Gemini), this approach offers a shortcut by building on a model that has already been trained. Google asserts that the practice violates its terms of service and constitutes intellectual property theft, framing it as a significant threat to its proprietary technology.
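
To make the idea concrete, here is a deliberately tiny sketch of distillation in Python. It is an illustration only, not a description of how anyone queried Gemini: the `teacher` function is a hypothetical stand-in for an existing model's API, and the "student" is a two-parameter linear model, chosen only so the example runs without any libraries. The pattern is the same at any scale: send inputs to the existing model, record its outputs, and fit a new model to those input-output pairs.

```python
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for an existing model; here just a fixed function."""
    return 3.0 * x + 1.0


def collect_pairs(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Query the teacher n times and record (input, output) pairs."""
    rng = random.Random(seed)
    queries = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, teacher(x)) for x in queries]


def fit_student(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Fit a student y = w*x + b to the teacher's outputs by least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# The student never sees the teacher's internals, only its answers.
pairs = collect_pairs(1000)
w, b = fit_student(pairs)
print(f"student learned w={w:.3f}, b={b:.3f}")
```

In this toy setting the student recovers the teacher's behavior almost exactly, because the teacher is simple. A real language model is vastly more complex, which is one reason an extraction attempt would need a very large number of prompts.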

However, Google’s position on intellectual property is complicated by its own history with data sourcing. The company’s large language models, including Gemini, were built using material scraped from the internet without explicit permission, a common but contentious practice in AI development. That context complicates claims of theft, as the industry grapples with evolving norms around data ownership and model training. Google has also faced allegations of similar copycat behavior: in 2023, reports indicated that Google’s Bard team used ChatGPT outputs shared on public platforms such as ShareGPT to help train its chatbot. Google denied the claims, though it reportedly stopped the practice.

The incident involving over 100,000 prompts highlights the scale and persistence of model extraction attempts. Google identifies the perpetrators as primarily private companies and researchers from around the world, seeking a competitive edge by cloning advanced AI systems. While Google has not named specific suspects, the report underscores the global nature of these attacks and the ongoing challenges in safeguarding AI models against such exploitation. This activity reflects a broader trend in the AI field, where access to cutting-edge technology is often gatekept by major players, prompting others to seek alternative methods for advancement.

Google’s response to model extraction includes technical measures and policy enforcement, but the effectiveness of these defenses remains an open question. The company’s quarterly assessments serve as both a transparency tool and a strategic narrative, emphasizing its role in combating AI threats. Yet, as the industry evolves, debates over ethics, ownership, and competition are likely to intensify, with model extraction at the center of discussions about how AI knowledge should be shared and protected.
