AI researchers from Google, UC Berkeley, and other institutions recently engineered what they described as a “silly attack” on ChatGPT. They asked the model to repeat the word “poem” forever, which caused it to emit large quantities of its training data, including names, email addresses, physical addresses, phone numbers, and other personally identifiable information (PII).
Though superficially amusing, the attack has troubling implications. It vividly demonstrates how generic multimodal LLMs like ChatGPT, Bing, or Claude can expose training data. Custom GPT models can also leak data, as demonstrated by researchers at Northwestern University, who used prompt injection attacks (malicious use of prompts) to retrieve user-uploaded files. Given the capacity of both generic and custom GPT models to leak data, prudent risk managers should be wary of using models built on data whose provenance is unknown.
Independent of this most recent research, OpenAI and other generic-model developers face litigation and regulatory challenges from all quarters. Copyright holders have filed multiple lawsuits against these developers. Domestically, the Federal Trade Commission has launched a probe of OpenAI for potential violations of consumer protection laws. From a supervisory perspective, these models implicate compliance risk and reputation risk, two of the eight risk categories monitored by the Office of the Comptroller of the Currency (OCC). Internationally, Italian, Japanese, and other regulators have begun investigations.
In light of this legal uncertainty and regulatory scrutiny, Informed’s view is that generic multimodal models are not currently viable for every use case. For industries with sensitive data, like lending, we use AI models that produce verifiable outputs and have the following characteristics:
- The training data is properly licensed and permissioned: Our models only train on data explicitly permissioned by the data rights holders;
- Models are transparent and subject to model governance: We constantly monitor our models, ensuring that we meet model governance and regulatory requirements;
- Models are trained on high-quality data: Our training data comes from highly scrutinized and vetted data sources, like customer applications or income documents;
- Training data is fit for purpose: We have large quantities of the precise information historically used to solve the lending problems faced by our clients, whether it’s auto lending or other lending verticals. Rather than using someone else’s generic LLMs, we create and use “vertical” models; and
- The models operate behind security walls: Our models are not publicly available. Non-public data sets have security virtues: they minimize data exposure and opportunities for hostile attacks.
We call this approach Vertical AI. Informed’s Vertical AI meets lenders’ business challenges from the initial ingestion of a loan application all the way through loan approval. Vertical AI leverages many of the benefits of large multimodal models without the troubling baggage. Our models perform highly accurate, real-time linkages and analyses of the images, words, and text in the lender documents our customers depend upon. Like generic models, Vertical AI models show continuous improvement as they grow. But unlike generic, publicly available models, this growth occurs in a safe, secure, and protected data environment.
It’s worth stressing that Vertical AI minimizes intellectual property and privacy risks by using explicitly permissioned data. Our Vertical AI data comes from the data rights holders and is governed by mutual expectations and obligations defined in contracts, including commitments to comply with regulatory requirements. In financial services, our Vertical AI models comply with the Gramm-Leach-Bliley Act, the Equal Credit Opportunity Act, the Dodd-Frank Act, and other financial laws.
We are extremely bullish on multimodal LLMs. But we believe there is a right (and careful) way to adopt them. We only use these models when we have sufficient confidence in their safety and reliability, and visibility into how they were trained. A model that lacks transparency and appropriate permissioning bears extra scrutiny, and is presumptively not a good fit for lenders or others who must comply with rigorous regulations and meet consumer privacy expectations.
AI has extraordinary promise for lending and other industries. Let’s build it out the right way.
Tom Oscherwitz is Informed’s VP of Legal and Regulatory Advisor. He has over 25 years of experience as a senior government regulator (CFPB, U.S. Senate) and as a fintech legal executive working at the intersection of consumer data, analytics, and regulatory policy.