John Snow Labs LangTest 1.7.0: Broadening Question-Answering Evaluation, Custom Model APIs, StereoSet Integration, FiQA Dataset, New BlogPosts, Gender Occupational Bias Assessment in LLMs and Enhanced User Experience through Multiple Bug Fixes ! #852
ArshaanNazir announced in Announcements
📢 Highlights
LangTest 1.7.0 Release by John Snow Labs 🚀:
We are delighted to announce the latest release of LangTest. This release adds benchmark assessment for question-answering evaluation, support for custom model APIs, StereoSet integration, gender occupational bias assessment in Large Language Models (LLMs), the FiQA dataset, and new blog posts. These updates reflect our commitment to improving the LangTest library, making it more versatile and user-friendly across a wider range of processing requirements.
⭐ Make sure to give the project a star on GitHub.
🔥 New Features
Enhanced Question-Answering Evaluation
We have enhanced LangTest's QA evaluation capabilities with two categories of distance metrics: Embedding Distance Metrics and String Distance Metrics. These broaden the toolkit for comparing a model's answer with the expected answer, either as embeddings or as strings, so users can run more comprehensive QA evaluations and experiment with evaluation strategies tailored to their specific use cases.
Link to Notebook: QA Evaluations
Embedding Distance Metrics
Added support for two hubs for embeddings.
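The notes do not show how the embedding metrics are computed. As a minimal sketch, assuming the expected and predicted answers have already been turned into plain lists of floats by an embedding hub (OpenAI embeddings are used in the results below), cosine similarity and Euclidean distance can be written as follows. The function names and the 0.9 pass threshold are hypothetical and are not LangTest's API.

```python
import math
from typing import Sequence


def _check_dims(a: Sequence[float], b: Sequence[float]) -> None:
    # Both embeddings should come from the same model, so their sizes must match.
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors: 0.0 means identical."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def answers_match(expected: Sequence[float], predicted: Sequence[float],
                  threshold: float = 0.9) -> bool:
    """Hypothetical pass rule: answers count as equivalent if cosine similarity >= threshold."""
    return cosine_similarity(expected, predicted) >= threshold


# Toy 3-dimensional vectors standing in for real embedding-hub output.
expected_answer = [0.2, 0.7, 0.1]
model_answer = [0.25, 0.65, 0.1]
print(round(cosine_similarity(expected_answer, model_answer), 3))
print(answers_match(expected_answer, model_answer))
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common default for comparing sentence embeddings; a distance such as Euclidean is sensitive to magnitude as well.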
String Distance Metrics
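String distance metrics compare the answers as text instead of as embeddings. A minimal sketch of one common choice, Levenshtein (edit) distance normalized by the longer string's length, is shown below; the case-folding step and the function names are assumptions, not LangTest's implementation.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def normalized_levenshtein(expected: str, predicted: str) -> float:
    """Edit distance scaled to [0, 1] after trimming and case-folding (assumed normalization)."""
    a, b = expected.strip().casefold(), predicted.strip().casefold()
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("kitten", "sitting"))             # 3
print(normalized_levenshtein("Pizza", " pizza "))   # 0.0
```

Normalizing by length makes scores comparable across short and long answers, so a single threshold can be applied to all of them.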
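Results:
Evaluating using OpenAI embeddings and cosine similarity:
[Results table not reproduced; it listed multiple-choice QA samples with option sets such as "A. FAST FOOD RESTAURANT / B. PIZZA / C. GROUND UP DEAD COWS / D. MOUTH / E. COW CARCASS" and "A. midwest / B. countryside / C. estate / D. farming areas / E. illinois".]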
Enhanced Custom Model API Support
LangTest now offers enhanced support for custom models, extending its flexibility and enabling seamless integration of your own models.
The key change is in the Harness class: the 'hub' parameter now accepts "custom", which simplifies configuration when testing a custom model.
Link to Notebook: Custom Model API
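The notes do not describe the interface a custom model must implement, so the Custom Model API notebook remains the reference for the real contract. Purely as an illustration, here is a hypothetical user-defined model: a toy keyword-based sentiment classifier exposing predict and predict_batch methods. The class, its method names, and the keyword rule are all assumptions, not a LangTest requirement.

```python
from typing import List


class KeywordSentimentModel:
    """Hypothetical custom model: a toy keyword-based sentiment classifier.

    Assumption: with hub set to "custom", LangTest works with a user-supplied
    model object; the exact methods it calls are defined in the Custom Model
    API notebook, not here.
    """

    POSITIVE = {"good", "great", "excellent", "love"}
    NEGATIVE = {"bad", "terrible", "awful", "hate"}

    def predict(self, text: str) -> str:
        # Lower-case and strip punctuation so "Great!" matches "great".
        words = {w.strip(".,!?").lower() for w in text.split()}
        score = len(words & self.POSITIVE) - len(words & self.NEGATIVE)
        # Ties (including no keywords at all) default to "positive" in this toy rule.
        return "positive" if score >= 0 else "negative"

    def predict_batch(self, texts: List[str]) -> List[str]:
        return [self.predict(t) for t in texts]


model = KeywordSentimentModel()
print(model.predict_batch(["I love this movie!", "The plot was awful."]))  # ['positive', 'negative']
```

Keeping the model behind a small, self-contained class like this makes it easy to swap in a real classifier later without changing the test configuration around it.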
Wino-Bias on LLMs
In this update, we revamp how gender bias is assessed in LLMs. This dataset was initially tested with HuggingFace masked models. In this release, we extend the testing to Large Language Models (LLMs) by recasting it in a question-answer (Q/A) format: the model must complete each sentence by selecting a gender-specific pronoun from a multiple-choice question (MCQ).
Link to Notebook: Wino-Bias on LLMs
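To make the Q/A reformulation concrete, here is a minimal sketch that turns a masked Wino-Bias-style sentence into a multiple-choice prompt and reads the chosen letter back from a model reply. The "[MASK]" placeholder, the prompt wording, the example sentence, and the option list ("A. he" is assumed; "B. she" and "C. Both A and B" appear in the results) are illustrative assumptions; LangTest's actual prompt template may differ.

```python
import re
from typing import Optional

MASK = "[MASK]"  # assumed placeholder marking the pronoun slot
OPTIONS = ("he", "she", "Both A and B")  # "A. he" is assumed; B and C appear in the results


def build_mcq_prompt(sentence: str) -> str:
    """Turn a masked sentence into a multiple-choice fill-in-the-blank prompt."""
    if sentence.count(MASK) != 1:
        raise ValueError("sentence must contain exactly one [MASK] placeholder")
    lines = [
        "Fill in the blank with the correct pronoun.",
        "Sentence: " + sentence.replace(MASK, "_____"),
        "Options:",
    ]
    lines += [f"{letter}. {option}" for letter, option in zip("ABC", OPTIONS)]
    lines.append("Answer:")
    return "\n".join(lines)


def parse_choice(response: str) -> Optional[str]:
    """Extract the first option letter written as 'A.', 'B.' or 'C.' from a model reply."""
    match = re.search(r"\b([ABC])\.", response)
    return match.group(1) if match else None


prompt = build_mcq_prompt(
    "The developer argued with the designer because [MASK] did not like the design."
)
print(prompt)
print(parse_choice("C. Both A and B"))  # C
```

Offering a gender-neutral option alongside "he" and "she" lets the evaluation distinguish a model that commits to a gendered pronoun from one that recognizes the sentence is ambiguous.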
The blue highlighting in the results draws attention to the relative success in handling the "gender-occupational-stereotype" test case. Note that concerns have been raised about the AI21 model's potential bias toward one gender, which may have contributed to its poor performance on this test case.
Results:
[Results figure not reproduced; the surviving fragments show the answer options "B. she" and "C. Both A and B".]
StereoSet Integration
StereoSet is now available as a new task and dataset in the LangTest library. It evaluates models by comparing the probabilities they assign to alternative sentences, specifically stereotypic and anti-stereotypic variants, which strengthens the library's ability to detect linguistic biases and stereotypes. This integration gives users a valuable tool for model assessment and bias mitigation.
Link to Notebook: StereoSet
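StereoSet-style evaluation compares how likely a model finds a stereotypic sentence versus its anti-stereotypic counterpart. A minimal sketch, assuming per-token log-probabilities for each sentence have already been obtained from the model under test; the function names and the log-ratio threshold are hypothetical and are not LangTest's scoring rule.

```python
from typing import Sequence


def sentence_log_prob(token_log_probs: Sequence[float]) -> float:
    """Log-probability of a whole sentence: the sum of its token log-probabilities."""
    return sum(token_log_probs)


def stereotype_log_ratio(stereo: Sequence[float], anti: Sequence[float]) -> float:
    """log P(stereotypic) - log P(anti-stereotypic); positive means the stereotype is preferred."""
    return sentence_log_prob(stereo) - sentence_log_prob(anti)


def prefers_stereotype(stereo: Sequence[float], anti: Sequence[float],
                       threshold: float = 0.1) -> bool:
    """Hypothetical rule: flag the pair when the stereotypic sentence wins by more than threshold (nats)."""
    return stereotype_log_ratio(stereo, anti) > threshold


# Toy per-token log-probabilities; the pair differs only in the final token.
stereo_tokens = [-2.1, -0.4, -1.0]
anti_tokens = [-2.1, -0.4, -1.8]
print(prefers_stereotype(stereo_tokens, anti_tokens))  # True
```

Summing token log-probabilities favors shorter sentences; for near-identical pairs this matters little, but averaging per token is an alternative when the two sentences differ in length.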
Adding support for the finance dataset: FiQA
FiQA (Finance Question Answering) is a dataset built to support finance-specific question answering. Its questions and answers concern financial companies and indices and range from detailed company-specific queries to broader questions about market trends and investment strategies. This makes it useful for researchers, analysts, and finance professionals who want to apply question-answering models to financial data, for example to explore financial markets, corporate financial performance, and the relationship between economic indicators and business operations. With its diverse finance-related questions and well-structured answers, FiQA is a strong resource for developing and evaluating models that must give accurate, contextually relevant answers in the financial domain.
📝 Blog Posts
You can check out the following LangTest articles:
🐛 Bug Fixes
📓 New Notebooks
❤️ Community support
Join the #langtest channel.
We would love to have you join the mission 👉 open an issue, a PR, or give us some feedback on features you'd like to see! 🙌
What's Changed
Full Changelog: 1.6.0...v1.7.0
This discussion was created from the release John Snow Labs LangTest 1.7.0: Broadening Question-Answering Evaluation, Custom Model APIs, StereoSet Integration, FiQA Dataset, New BlogPosts, Gender Occupational Bias Assessment in LLMs and Enhanced User Experience through Multiple Bug Fixes !.