Transparency is sorely lacking amid growing AI interest

Transparency is still lacking around how foundation models are trained, and this gap can lead to increasing tension with users as more organizations look to adopt artificial intelligence (AI). 

In Asia-Pacific, excluding China, spending on AI is projected to grow 28.9% from $25.5 billion in 2022 to $90.7 billion by 2027, according to IDC. The research firm estimated that the bulk of this spending, at 81%, will be directed toward predictive and interpretative AI applications. 

Also: Five ways to use AI responsibly

So while there is much hype around generative AI, that segment will account for just 19% of the region’s AI expenditure, noted Chris Marshall, vice president of data, analytics, AI, sustainability, and industry research at IDC Asia-Pacific. 

The research highlights a market that needs a broader approach to AI that spans beyond generative AI, said Marshall, who was speaking at the Intel AI Summit held in Singapore this week. 

However, 84% of Asia-Pacific organizations believe that tapping generative AI models will give their business a significant competitive edge, IDC noted. By doing so, these enterprises hope to achieve gains in operational efficiency and employee productivity, improve customer satisfaction, and develop new business models, the research firm added. 

Also: The best AI chatbots: ChatGPT and other noteworthy alternatives

IDC also expects the majority of organizations in the region to increase edge IT spending this year, with 75% of enterprise data projected to be generated and processed at the edge, outside of traditional data centers and the cloud, by 2025. 

“To truly bring AI everywhere, the technologies used must provide accessibility, flexibility, and transparency to individuals, industries, and society at large,” said Alexis Crowell, Intel’s Asia-Pacific Japan CTO. “As we witness increasing growth in AI investments, the next few years will be critical for markets to build out their AI maturity foundation in a responsible and thoughtful manner.”

Industry players and governments have often touted the importance of building trust and transparency in AI, and of assuring consumers that AI systems are “fair, explainable, and safe”. However, this transparency appears to still be lacking in some key aspects. 

When ZDNET asked if there was currently sufficient transparency around how open large language models (LLMs) and foundation models were trained, Crowell said: “No, not enough.”

Also: Today’s AI boom will amplify social problems if we don’t act now

She pointed to a study by researchers from Stanford University, MIT, and Princeton that assessed the transparency of 10 major foundation models, in which the top-scoring platform managed a score of just 54%. “That’s a failing mark,” she said during a media briefing at the summit. 

The mean score came in at just 37%, according to the study, which assessed the models based on 100 indicators including processes involved in building the model, such as information about training data, the model’s architecture and risks, and policies that govern its use. The top scorer with 54% was Meta’s Llama 2, followed by BigScience’s Bloomz at 53%, and OpenAI’s GPT-4 at 48%.  

“No major foundation model developer is close to providing adequate transparency, revealing a fundamental lack of transparency in the AI industry,” the researchers noted. 

Transparency is necessary

Crowell expressed hope that this situation might change with the availability of benchmarks and organizations monitoring these developments. She added that lawsuits, such as the one brought by The New York Times against OpenAI and Microsoft, could help bring further legal clarity. 

In particular, there should be governance frameworks similar to data protection legislation, such as Europe’s GDPR (General Data Protection Regulation), so users know how their data is being used, she noted. 

Businesses, too, need to make purchasing decisions based on how their data is captured and where it goes, she said, adding that growing tension from users demanding more transparency might fuel industry action. 

As it is, 54% of AI users do not trust the data used to train AI systems, revealed a recent Salesforce survey, which polled almost 6,000 knowledge workers across nine markets, including Singapore, India, Australia, the UK, the US, and Germany.

Also: AI and advanced applications are straining current technology infrastructures

Contrary to common belief, accuracy does not have to come at the expense of transparency, Crowell said, citing a research report led by Boston Consulting Group.

The report looked at how black- and white-box AI models performed on almost 100 benchmark classification datasets, including pricing, medical diagnosis, bankruptcy prediction, and purchasing behavior. For nearly 70% of the datasets, black-box and white-box models produced similarly accurate results.

“In other words, more often than not, there was no tradeoff between accuracy and explainability,” the report said. “A more explainable model could be used without sacrificing accuracy.”
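The report's methodology is not reproduced here, but the comparison it describes is straightforward to illustrate at small scale. The sketch below is our own illustration, not the BCG setup: it pits a white-box model (logistic regression, whose per-feature coefficients can be inspected directly) against a black-box model (a gradient-boosted ensemble) on one public classification dataset from scikit-learn, where the two typically land within a few percentage points of each other.

```python
# Toy illustration (not the BCG study's benchmark suite): compare test
# accuracy of a white-box and a black-box classifier on one dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# White-box: a linear model whose coefficients are directly interpretable.
white_box = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Black-box: an ensemble of trees whose decision logic is harder to explain.
black_box = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

white_acc = white_box.score(X_test, y_test)
black_acc = black_box.score(X_test, y_test)
print(f"white-box accuracy: {white_acc:.3f}")
print(f"black-box accuracy: {black_acc:.3f}")
```

On datasets like this one, the accuracy gap between the two models is typically small, which is the kind of result the BCG report generalizes across its roughly 100 benchmark datasets.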

Getting full transparency, though, remains challenging, said Marshall, who noted that discussions around AI explainability were once lively but have since died down because it is a difficult issue to address. 

Also: 5 ways to prepare for the impact of generative AI on the IT profession

Organizations behind major foundation models may not be willing to be forthcoming about their training data over concerns about getting sued, said Laurence Liew, director of AI innovation at government agency AI Singapore (AISG). 

He added that being selective about training data would also impact AI accuracy rates. 

Liew explained that, due to the potential issues with using all publicly available datasets, AISG chose not to use certain ones for its own LLM initiative, SEA-LION (Southeast Asian Languages in One Network). 

As a result, the open-source architecture is not as accurate as some major LLMs in the market today, he said. “It’s a fine balance,” he noted, adding that achieving a high accuracy rate would mean adopting an open approach to using any data available. Choosing the “ethical” path and leaving certain datasets untouched will mean a lower accuracy rate than those achieved by commercial players, he said. 

But while Singapore has chosen a high ethical bar with SEA-LION, it still is often challenged by users who call for more datasets to be tapped to improve the LLM’s accuracy, Liew said. 

A group of authors and publishers in Singapore last month expressed concerns about the possibility their work may be used to train SEA-LION. Among their grievances is the apparent lack of commitment to “pay fair compensation” for the use of all writings. They also noted the need for clarity and explicit acknowledgement that the country’s intellectual property and copyright laws, and existing contractual arrangements, will be upheld in creating and training LLMs. 

Being transparent about open source

Such recognition should also extend into open-source frameworks on which AI applications may be developed, according to Red Hat CEO Matt Hicks.

Models are trained on large volumes of data provided by copyright holders, and using these AI systems responsibly means adhering to the licenses under which they are built, said Hicks during a virtual media briefing this week on the back of Red Hat Summit 2024.

Also: Want to work in AI? How to pivot your career in 5 steps

This is pertinent for open-source models, which may be released under a variety of licenses, including copyleft licenses such as the GPL and permissive licenses such as Apache. 

He underscored the importance of transparency and of taking responsibility for understanding the data that models are trained on and how their outputs are handled. For both the safety and security of AI architectures, it is necessary to ensure the models are protected against malicious exploits. 

Red Hat is looking to help its customers with such efforts through a host of tools, including Red Hat Enterprise Linux AI (RHEL AI), which it unveiled at the summit. The product comprises four components, including Open Granite language and code models from the InstructLab community, which are supported and indemnified by Red Hat. 

The approach addresses challenges organizations often face in their AI deployment, including managing the application and model lifecycle, the open-source vendor said. 

“[RHEL AI] creates a foundation model platform for bringing open source-licensed GenAI models into the enterprise,” it said. “With InstructLab alignment tools, Granite models, and RHEL AI, Red Hat aims to apply the benefits of true open source projects — freely accessible and reusable, transparent, and open to contributions — to GenAI in an effort to remove these obstacles.”
