Release/2.6.0 #1184

Merged: 75 commits, Mar 10, 2025
Commits
77945d2
chore: update certifi, idna, zipp versions and add extras in poetry.lock
chakravarthik27 Jan 9, 2025
c4c831f
Merge pull request #1162 from JohnSnowLabs/fix/vulnerabilities-and-se…
chakravarthik27 Jan 13, 2025
c79bb3e
feat: add debiasing functionality with initial approach
chakravarthik27 Jan 13, 2025
11fa891
refactor: improve code formatting and readability in debias.py
chakravarthik27 Jan 13, 2025
ad05cc1
feat: enhance debiasing functionality with improved data handling and…
chakravarthik27 Jan 14, 2025
7628f87
feat: enhance DebiasTextProcessing with support for OpenAI and Ollama…
chakravarthik27 Jan 15, 2025
0a67de4
feat: add Ollama package support in poetry.lock and pyproject.toml
chakravarthik27 Jan 15, 2025
02ab09d
refactor: remove commented-out OpenAI client code in interaction_llm …
chakravarthik27 Jan 15, 2025
dfb4440
feat: add ollama-sdk support in poetry.lock
chakravarthik27 Jan 15, 2025
cd32ec8
refactor: rename bias detection classes and update field titles to de…
chakravarthik27 Jan 15, 2025
9de2947
fix: linting issues
chakravarthik27 Jan 15, 2025
a0294a9
refactor: improve formatting of system prompt in DebiasTextProcessing…
chakravarthik27 Jan 16, 2025
13b2615
feat: enhance bias detection response structure and improve debiasing…
chakravarthik27 Jan 16, 2025
46b3714
feat: add standard bias evaluation prompt and improve bias detection …
chakravarthik27 Jan 16, 2025
d93e3e6
feat: add new robustness classes for false confidence, none of the ab…
chakravarthik27 Jan 17, 2025
47d5bf7
fix: correct typos in bias evaluation prompt and update output datase…
chakravarthik27 Jan 17, 2025
c9dbc5e
fix: rename "original text" to "biased_text" in debias_info DataFrame
chakravarthik27 Jan 17, 2025
9a50917
feat: add risk level to bias detection response and update debias_inf…
chakravarthik27 Jan 20, 2025
259ae69
fix: rename "row" to "row_id" in debias_info DataFrame
chakravarthik27 Jan 20, 2025
7bd46d4
feat: enhance model handling with additional info and output schema s…
chakravarthik27 Jan 21, 2025
a011ba0
feat: add output schema support to model initialization and improve m…
chakravarthik27 Jan 21, 2025
6b76e75
feat: enhance QASample result validation to support Custom Output Schema
chakravarthik27 Jan 22, 2025
d82daca
feat: enhance model handler to support dynamic module imports and upd…
chakravarthik27 Jan 24, 2025
e4f8d7e
feat: improve error handling for module imports in PretrainedModelForQA
chakravarthik27 Jan 27, 2025
dcec433
feat: refactor model handling to use unified MODEL_CLASSES structure …
chakravarthik27 Jan 27, 2025
8ebaf6a
feat: extend output schema support in PretrainedModelForQA to include…
chakravarthik27 Jan 27, 2025
51dc1bd
feat: enhance bias evaluation prompt with structured categories and t…
chakravarthik27 Jan 27, 2025
b313395
feat: add FCT class for clinical tests with transformation and run me…
chakravarthik27 Jan 29, 2025
f94da73
NOTA test is implemented in clincial category.
chakravarthik27 Jan 30, 2025
add3d53
refactor: update FCT and NOTA transform methods to improve options ha…
chakravarthik27 Feb 3, 2025
9daebd2
Merge pull request #1164 from JohnSnowLabs/feature/data-augmentation-…
chakravarthik27 Feb 6, 2025
bbfa984
refactor: improve sample transformation in FCT and remove unused robu…
chakravarthik27 Feb 6, 2025
95d0733
feat: add FQT class for clinical tests with transformation and run me…
chakravarthik27 Feb 7, 2025
33db450
feat: add progress bars for bias detection and debiasing processes
chakravarthik27 Feb 9, 2025
e691ee7
change pronouns-> gender-specific bias
ArshaanNazir Feb 10, 2025
2a680ba
add is_pronoun field
ArshaanNazir Feb 10, 2025
2349aa7
updated: is_pronoun conditions moved to detect_bias.
chakravarthik27 Feb 10, 2025
0e66d4c
chore: update openai package to version 1.61.1 and adjust Python vers…
chakravarthik27 Feb 11, 2025
b22bc45
fix: add error handling in debiasing process and improve regex for ge…
chakravarthik27 Feb 11, 2025
f10cbb6
feat: add support for question answering model in JSL model handler
chakravarthik27 Feb 13, 2025
81f69bf
feat: add support for summarization and text generation models in JSL…
chakravarthik27 Feb 14, 2025
ccda1d4
updated: add ExtractiveSummarization support to JSL model handler
chakravarthik27 Feb 14, 2025
d6a108e
Merge pull request #1172 from JohnSnowLabs/feature/data-augmentation-…
chakravarthik27 Feb 18, 2025
2ed0335
Merge branch 'release/2.6.0' of https://github.com/JohnSnowLabs/langt…
chakravarthik27 Feb 18, 2025
213732d
fix: enhance output processing to handle <think> tags in LLM response…
chakravarthik27 Feb 18, 2025
424632d
Merge pull request #1168 from JohnSnowLabs/feature/modelapi-json-sche…
chakravarthik27 Feb 18, 2025
e354f91
Merge pull request #1174 from JohnSnowLabs/update/supports-the-qa-tas…
chakravarthik27 Feb 18, 2025
b280aa0
Merge branch 'release/2.6.0' of https://github.com/JohnSnowLabs/langt…
chakravarthik27 Feb 18, 2025
8841252
fix: improve output handling to support string responses in LLM model…
chakravarthik27 Feb 18, 2025
0cac1c5
fix: enhance model type handling in QA and TextGen processing
chakravarthik27 Feb 21, 2025
54bc5b9
refactor: update model handling in OpenAI and AzureOpenAI configurations
chakravarthik27 Feb 25, 2025
77129e4
Merge pull request #1178 from JohnSnowLabs/fix/error-in-templatic-aug…
chakravarthik27 Feb 25, 2025
82a8b6e
feat: add support for generating templates using Ollama provider
chakravarthik27 Feb 25, 2025
c6c3604
fix: improve error handling in template generation and update default…
chakravarthik27 Feb 25, 2025
bdba91e
fix: enhance error messaging in template generation and update docume…
chakravarthik27 Feb 26, 2025
36472dd
Merge pull request #1176 from JohnSnowLabs/feature/add-integration-to…
chakravarthik27 Feb 26, 2025
9365ace
Merge pull request #1170 from JohnSnowLabs/feature/implement-med-halt…
chakravarthik27 Feb 26, 2025
4be9c0e
Merge pull request #1180 from JohnSnowLabs/feature/templatic-augmenta…
chakravarthik27 Feb 26, 2025
6d77f1e
fix: handle potential None value in additional_info during model para…
chakravarthik27 Feb 27, 2025
e8a036d
fix: return None for unsupported model types in text generation check
chakravarthik27 Feb 28, 2025
e1bfb01
fix: correctly assign model_type and annotator in QA model initializa…
chakravarthik27 Feb 28, 2025
690d270
Default model_type for OpenAI and Azure-OpenAI to ensure backward com…
chakravarthik27 Mar 3, 2025
105150e
fix: update conditional check for model_type in PretrainedModelForQA
chakravarthik27 Mar 3, 2025
2fb3969
Merge pull request #1182 from JohnSnowLabs/fix/issues-found-while-tes…
chakravarthik27 Mar 3, 2025
9109db2
fix: improve handling of additional model parameters in Harness class
chakravarthik27 Mar 4, 2025
be442eb
fix: add handling for additional model information in Harness class
chakravarthik27 Mar 4, 2025
d39e8bf
Notebook: evaluation with structured outputs
chakravarthik27 Mar 5, 2025
2b8dfb0
feat: add enhance_text method for debiasing text based on bias tolera…
chakravarthik27 Mar 5, 2025
5843350
fix: format enhance_text method for improved readability
chakravarthik27 Mar 5, 2025
4aeaf7c
fix: update langchain-openai to 0.3.7 and update the fqt and nota tests.
chakravarthik27 Mar 7, 2025
8c369bd
Notebook: Added for Med Halt Tests
chakravarthik27 Mar 7, 2025
6f3ea4c
Notebook: JSL Medical LLM QA and Sum
chakravarthik27 Mar 7, 2025
817b917
Merge pull request #1183 from JohnSnowLabs/fix/issues-found-while-tes…
chakravarthik27 Mar 7, 2025
41e9dc7
chore: update version to 2.6.0 and enhance tutorial documentation wit…
chakravarthik27 Mar 10, 2025
05e51d6
Merge pull request #1185 from JohnSnowLabs/chore/final_website_updates
chakravarthik27 Mar 10, 2025
feat: enhance debiasing functionality with improved data handling and bias detection logic
chakravarthik27 committed Jan 14, 2025
commit ad05cc15d46c591d8648a7eeeb609b1487b8a85e
9 changes: 8 additions & 1 deletion langtest/augmentation/__init__.py
@@ -1,4 +1,11 @@
from .base import BaseAugmentaion, AugmentRobustness, TemplaticAugment
from .augmenter import DataAugmenter
from .debias import DebiasTextProcessing

__all__ = ["BaseAugmentaion", "AugmentRobustness", "TemplaticAugment", "DataAugmenter"]
__all__ = [
"DebiasTextProcessing",
"BaseAugmentaion",
"AugmentRobustness",
"TemplaticAugment",
"DataAugmenter",
]
116 changes: 93 additions & 23 deletions langtest/augmentation/debias.py
@@ -1,7 +1,9 @@
from typing import Dict, List, Literal, Union
from typing import Dict, List, Literal, TypeVar, Union
from pydantic import BaseModel, Field
import pandas as pd

_Schema = TypeVar("_Schema")


class BiasDetectionRequest(BaseModel):
"""
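Note on the `_Schema` TypeVar added in this hunk: it is what lets `interaction_llm` (further down in this file's diff) declare `output_schema: type[_Schema]` and return `_Schema`, so a caller gets back an instance of whichever schema class it passes in. A minimal, dependency-free sketch of that pattern is below. `parse_as`, `fake_llm`, and the two dataclasses are hypothetical stand-ins (the commit itself uses pydantic models and the OpenAI client), and plain JSON parsing here takes the place of the client's structured-output call.

```python
import json
from dataclasses import dataclass
from typing import Type, TypeVar

_Schema = TypeVar("_Schema")


@dataclass
class BiasReport:  # hypothetical stand-in for BiasDetectionResponse
    category: str
    sub_category: str
    bias_rationale: str


@dataclass
class Rewritten:  # hypothetical stand-in for DebiasedTextResponse
    debiased_text: str


def parse_as(raw_json: str, output_schema: Type[_Schema]) -> _Schema:
    """Build an instance of the requested schema class from a JSON object."""
    return output_schema(**json.loads(raw_json))


def fake_llm(text: str) -> str:
    """Hypothetical model call; always returns the same canned JSON string."""
    return json.dumps(
        {
            "category": "gender",
            "sub_category": "stereotype",
            "bias_rationale": "assumes a role",
        }
    )


# The same helper yields differently typed results depending on the schema.
report = parse_as(fake_llm("some text"), BiasReport)
rewritten = parse_as('{"debiased_text": "neutral text"}', Rewritten)
```

Because the return type is tied to the argument, a type checker can see that `report.category` and `rewritten.debiased_text` exist without any casts.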
@@ -73,6 +75,10 @@ class DebiasingRequest(BaseModel):
dataset: pd.DataFrame = Field(..., title="Dataset to debias")
text_column: str = Field(..., title="Column name containing text")

model_config: Dict = {
"arbitrary_types_allowed": True,
}


class DebiasingResult(BaseModel):
"""
@@ -86,52 +92,116 @@ class DebiasingResult(BaseModel):
debiased_dataset: pd.DataFrame = Field(..., title="Debiased dataset")
debias_info: List[Dict] = Field(..., title="Information about debiasing")

model_config: Dict = {
"arbitrary_types_allowed": True,
}


class DebiasTextProcessing:
def __init__(self, dataset: pd.DataFrame, text_column: str):
self.dataset = dataset
self.text_column = text_column
self.debias_info = []
def __init__(self, model: str, hub: str, prompt: str):
# from langtest.tasks.task import TaskManager

# task = TaskManager("question-answering")

def initialize(self, model: str, hub: str):
# Placeholder for model initialization
self.debias_model = (model, hub)
# # self.debias_model = task.model(model_path=model, model_hub=hub, model_type="chat")
self.prompt = prompt
self.debias_info = pd.DataFrame(
{"row": [], "reason": [], "category": [], "sub_category": []}
)

def initialize(
self, input_dataset: pd.DataFrame, text_column: str, output_dataset: str = None
):
self.input_dataset = input_dataset
self.text_column = text_column
self.output_dataset: pd.DataFrame = output_dataset

def identify_bias(self):
for index, row in self.dataset.iterrows():
for index, row in self.input_dataset.iterrows():
text = row[self.text_column]
reason, category, sub_category = self.detect_bias(text)
if reason:
self.debias_info.append(
{
category, sub_category, rationale = self.detect_bias(text)
if rationale:
if index not in self.debias_info["row"].values:
self.debias_info.loc[len(self.debias_info)] = {
"row": index,
"reason": reason,
"reason": rationale,
"category": category,
"sub_category": sub_category,
}
)
else:
self.debias_info.loc[row["row"], "reason"] = rationale
self.debias_info.loc[row["row"], "category"] = category
self.debias_info.loc[row["row"], "sub_category"] = sub_category
self.debias_info = self.debias_info.reset_index(drop=True)

def detect_bias(
self, text: Union[str, BiasDetectionRequest]
) -> BiasDetectionResponse:
# Placeholder for bias detection logic
if isinstance(text, BiasDetectionRequest):
text = text.text

output_data = self.interaction_llm(text, output_schema=BiasDetectionResponse)

return (
output_data.category,
output_data.sub_category,
output_data.bias_rationale,
)

return (None, None, None)
def interaction_llm(self, text: str, output_schema: type[_Schema]) -> _Schema:
import openai

def debias_text(self, text: str):
client = openai.Client()

response = client.beta.chat.completions.parse(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": self.prompt},
{"role": "user", "content": text},
],
response_format=output_schema,
)

output_data = response.choices[0].message.parsed

return output_data

def debias_text(
self, text: str, category: str, sub_category: str, reason: str
) -> str:
# Placeholder for debiasing logic
return text
prompt = f"""
Debias the text with following bias information and reason.

Category: {category}
Sub-category: {sub_category}
Reason: {reason}

Text: {text}

Output:
"""

debiased_text = self.interaction_llm(prompt, output_schema=DebiasedTextResponse)

return debiased_text.debiased_text

def apply_debiasing(self):
for info in self.debias_info:
original_text = self.dataset.at[info["row"], self.text_column]
debiased_text = self.debias_text(original_text)
self.dataset.at[info["row"], self.text_column] = debiased_text
for idx, row in self.debias_info.iterrows():
original_text = self.input_dataset.at[row["row"], self.text_column]
debiased_text = self.debias_text(
original_text,
category=row["category"],
sub_category=row["sub_category"],
reason=row["reason"],
)
self.output_dataset.loc[row["row"], self.text_column] = debiased_text

def process(self):
self.identify_bias()
self.apply_debiasing()
return self.dataset, self.debias_info
return self.output_dataset, self.debias_info

def load_data(self, source: str, source_type: str):
if source_type == "csv":
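Note on `identify_bias` in the diff above: the `else` branch indexes `self.debias_info` with `row["row"]`, but `row` there comes from `self.input_dataset.iterrows()`, so that lookup appears to work only if the input data happens to contain a `row` column. A minimal sketch of the same insert-or-update idea, keyed by the dataset index instead, is below. It uses a plain dict rather than a DataFrame to stay dependency-free, and `detect` is a hypothetical stand-in for `detect_bias`; it illustrates the idea under those assumptions and is not the commit's code.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

Detection = Tuple[Optional[str], Optional[str], Optional[str]]
Record = Dict[str, Optional[str]]


def identify_bias(
    rows: Iterable[Tuple[Hashable, str]],
    detect: Callable[[str], Detection],
    debias_info: Optional[Dict[Hashable, Record]] = None,
) -> Dict[Hashable, Record]:
    """Insert or update one record per row index when a rationale is returned."""
    info = {} if debias_info is None else debias_info
    for index, text in rows:
        category, sub_category, rationale = detect(text)
        if not rationale:
            continue  # nothing flagged for this row
        # Keyed by index: a repeated pass overwrites instead of duplicating.
        info[index] = {
            "reason": rationale,
            "category": category,
            "sub_category": sub_category,
        }
    return info


def detect(text: str) -> Detection:
    """Hypothetical detector: flags any text containing the word 'always'."""
    if "always" in text:
        return ("gender", "stereotype", "overgeneralization")
    return (None, None, None)


rows = [(0, "Nurses are always women."), (1, "The meeting is at noon.")]
info = identify_bias(rows, detect)
info = identify_bias(rows, detect, info)  # second pass over the same rows
```

Keying on the index makes the "already recorded" check and the update the same operation, which avoids the separate membership test and the trailing `reset_index` call.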