Giskard’s open-source framework evaluates AI models before they’re pushed into production.

by Alex Turner
Photo: Giskard

The Paris-based French startup Giskard is developing an open-source testing framework for large language models. It can warn developers about potential biases, security vulnerabilities, and a model’s propensity to generate harmful or toxic content.

Despite all the excitement around AI models, machine learning testing systems are quickly becoming an essential topic of discussion, as regulation is about to take effect in the EU with the AI Act and in other countries. Companies that build AI models will have to prove that they comply with the rules and take steps to mitigate risks if they want to avoid substantial fines.

Giskard is an example of an artificial intelligence (AI) company that embraces regulation. It is also one of the first developer tools focused specifically on testing models in a more efficient way.

“I previously worked at Dataiku, mostly on NLP model integration. And I could see, when I was in charge of testing, that there were things that didn’t work well when you wanted to apply them to practical cases, and that it was very difficult to compare the performance of suppliers with one another,” Giskard co-founder and CEO Alex Combessie told me.

Giskard’s testing framework consists of three parts that work together. The company has released an open-source Python library that can be integrated into an LLM project, and more specifically into retrieval-augmented generation (RAG) projects. Already quite popular on GitHub, it is also compatible with other tools in the machine learning ecosystem, such as Hugging Face, MLflow, Weights & Biases, PyTorch, TensorFlow, and LangChain.
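As a rough sketch of what that integration can look like, the snippet below wraps an existing prediction function and a small evaluation dataset with the library’s documented Model and Dataset wrappers. Exact parameter names can vary between library versions, and answer_question is a hypothetical stand-in for your own LLM call:

    # pip install giskard
    import pandas as pd
    import giskard

    def answer_question(df: pd.DataFrame) -> list[str]:
        # Hypothetical stand-in: call your own LLM or RAG pipeline for each row.
        return ["stub answer" for _ in df["question"]]

    # Wrap the prediction function so Giskard's scanners and tests can call it.
    giskard_model = giskard.Model(
        model=answer_question,
        model_type="text_generation",
        name="Climate QA assistant",
        description="Answers user questions about climate change.",
        feature_names=["question"],
    )

    # A tiny evaluation dataset; in practice this would come from real user queries.
    giskard_dataset = giskard.Dataset(
        pd.DataFrame({"question": ["What causes sea level rise?"]}),
    )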

Once the initial setup is done, Giskard helps you generate a test suite that will be run regularly against your model. These tests cover a wide range of issues, including performance, hallucinations, misinformation, non-factual output, biases, data leakage, harmful content generation, and prompt injections.
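A hedged sketch of that step with the open-source library: giskard.scan probes the wrapped model and generate_test_suite turns its findings into a reusable suite. The stub model below is purely illustrative, and the scan’s LLM-assisted detectors typically need an LLM API key configured:

    import giskard

    # Purely illustrative stub; in practice, wrap your real application as shown earlier.
    model = giskard.Model(
        model=lambda df: ["stub answer"] * len(df),
        model_type="text_generation",
        name="Demo LLM app",
        description="Answers general user questions.",
        feature_names=["question"],
    )

    # The automated scan probes for hallucination, harmful content generation,
    # prompt injection, data leakage and other issue categories.
    scan_results = giskard.scan(model)
    scan_results.to_html("scan_report.html")  # shareable HTML report

    # Turn the findings into a test suite that can be re-run on every new model version.
    test_suite = scan_results.generate_test_suite("LLM regression tests")
    test_suite.run()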

“And there are several aspects: you’ll have the performance aspect, which will be the first thing on a data scientist’s mind. But more and more, you have the ethical aspect to consider, both from a brand image point of view and now from a regulatory point of view,” Combessie said.

Developers can then integrate these tests into their continuous integration and continuous delivery (CI/CD) pipeline so that they run every time a new iteration of the codebase is created. If something is wrong, developers receive a scan report in their GitHub repository, for instance.
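One way to wire such a gate into a pipeline is a small script run as a CI step; this is only a sketch (build_giskard_model is a hypothetical helper from your own codebase, and the passed flag on the suite result should be checked against your Giskard version):

    # ci_check_llm.py -- run as a CI step, e.g. "python ci_check_llm.py"
    import sys

    import giskard

    from my_app import build_giskard_model  # hypothetical helper wrapping your model

    model = build_giskard_model()

    # Re-scan the current build and regenerate the tests from its findings.
    scan_results = giskard.scan(model)
    scan_results.to_html("scan_report.html")  # attach as a CI artifact

    suite_results = scan_results.generate_test_suite("CI gate").run()

    # Fail the pipeline if any generated test failed.
    # ("passed" is assumed to be a boolean on the result object; verify for your version.)
    sys.exit(0 if suite_results.passed else 1)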

The tests are tailored to the model’s end use case to make them as relevant as possible. Companies working on RAG can give Giskard access to their vector databases and knowledge repositories so that the test suite stays on topic. For instance, if you are building a chatbot that answers questions about climate change based on the most recent IPCC report and uses an LLM from OpenAI, Giskard’s tests will check whether the model can produce misinformation about climate change, contradict itself, and so on.
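For a climate chatbot like that one, a hedged sketch of how the use case context might be handed to the scanner: the model description tells the scan what the bot is supposed to do, so the generated probes stay on topic. Here, retrieve_ipcc_passages is an illustrative stand-in for your own retriever, not a Giskard API, and the OpenAI calls assume an API key is configured:

    import pandas as pd
    import giskard
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def retrieve_ipcc_passages(question: str) -> str:
        # Illustrative stand-in for a retriever over a vector store built from the IPCC report.
        return "relevant IPCC passages would be returned here"

    def rag_answer(df: pd.DataFrame) -> list[str]:
        answers = []
        for question in df["question"]:
            context = retrieve_ipcc_passages(question)
            completion = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": "Answer using only this context:\n" + context},
                    {"role": "user", "content": question},
                ],
            )
            answers.append(completion.choices[0].message.content)
        return answers

    rag_model = giskard.Model(
        model=rag_answer,
        model_type="text_generation",
        name="IPCC climate chatbot",
        description="Answers climate change questions strictly from the latest IPCC report.",
        feature_names=["question"],
    )

    # The scan uses the description above to generate climate-specific probes for
    # misinformation, self-contradiction and off-topic answers.
    giskard.scan(rag_model).to_html("climate_bot_scan.html")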

Giskard’s second product is an AI Quality Hub that helps with debugging and comparing large language models with other models. This quality hub is part of Giskard’s premium offering. Down the road, the startup hopes it will be able to generate documentation proving that a model complies with the applicable regulations.

“We are starting to sell the AI Quality Hub to companies such as the Banque de France and L’Oréal to help them debug and find the causes of errors. In the future, this is where we’re going to put all the regulatory features,” Combessie said.

The company’s third product is called LLMon. It is a real-time monitoring tool that can evaluate LLM answers for the most common issues (toxicity, hallucinations, fact-checking, etc.) before the response is sent back to the user.
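LLMon’s own interface isn’t documented here, but the general pattern it addresses, screening a reply before it reaches the user, looks roughly like the hypothetical sketch below (every function in it is illustrative; none of this is LLMon’s actual API):

    from dataclasses import dataclass

    @dataclass
    class Verdict:
        ok: bool
        reason: str = ""

    def check_toxicity(reply: str) -> Verdict:
        # Illustrative placeholder: a real deployment would call a moderation or
        # monitoring service (such as LLMon) here.
        blocked_terms = ["idiot", "stupid"]
        toxic = any(term in reply.lower() for term in blocked_terms)
        return Verdict(ok=not toxic, reason="toxic language" if toxic else "")

    def check_groundedness(reply: str, context: str) -> Verdict:
        # Illustrative placeholder for a hallucination / fact-checking check
        # against the retrieved context.
        return Verdict(ok=True)

    def guarded_reply(reply: str, context: str) -> str:
        # Screen the LLM's answer before sending it back to the user.
        for verdict in (check_toxicity(reply), check_groundedness(reply, context)):
            if not verdict.ok:
                return "Sorry, I can't share that answer."  # safe fallback response
        return reply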

It currently works with companies that use OpenAI’s APIs and LLMs as their foundation model, but the firm is working on integrations with Hugging Face, Anthropic, and others.

There are several ways to regulate AI models. Based on conversations with people in the AI ecosystem, it is still unclear whether the AI Act will apply to foundation models from OpenAI, Anthropic, Mistral, and others, or only to applied use cases.

In the latter case, Giskard seems particularly well positioned to alert developers to potential misuses of LLMs enriched with external data, which AI researchers call retrieval-augmented generation (RAG).

Giskard currently employs 20 people. “We see a very clear market fit with customers on LLMs, so we’re going to roughly double the size of the team to be the best LLM antivirus on the market,” Combessie said.
