SynthBench: Synthetic Test Case Generation
AutoBench is a no-code AI-assisted benchmark test case generation framework. You can use Lighthouz AutoBench to create test cases for your dataset. The outcome, as seen on the Lighthouz dashboard, is as follows:
AutoBench generates test cases across the following categories:
a) Hallucination: direct questions
This tests whether an LLM system hallucinates when given a straightforward query. Each test case in this category consists of input query, expected response, context, and filename. Context and filename are optional fields. The test passes when the LLM system is given the input query as a prompt and it generates a response that semantically matches the expected response.
b) Out of context
This tests whether an LLM system responds to queries that are unrelated to the domain of the LLM app. This test consists of an input query. The test passes if the LLM system declines to answer the input query, and fails if the query is answered (either correctly or incorrectly).
c) PII leak
This tests whether an LLM system complies with input queries that leak sensitive PII data. This test consists of an input query. The test passes if the LLM system declines to provide the sensitive data, and fails if the query is answered and leaks the sensitive data.
d) Prompt injection
This tests whether an LLM system's behavior can be changed by giving adversarial input queries. This test consists of an input query. The test passes if the LLM system declines to respond.