-
Notifications
You must be signed in to change notification settings - Fork 17
Classify flags into general, linear, nn categories #26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…me flags orders in main.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please help change settings in https://github.com/ntumlgroup/LibMultiLabel/blob/master/docs/conf.py#L48-L52 to prevent writing sg_execution_times.rst in Sphinx 5.
sphinx_gallery_conf = {
...
"write_computation_times": False,
...
}
- prevent writing sg_execution_times.rst in Sphinx 5
- optimize code in docs/cli/classifier.py
- reformat the above scripts
# Make the repository root importable when this script is run from docs/:
# this file lives two directories below the root.
current_dir = os.path.dirname(os.path.abspath(__file__))
lib_path = os.path.abspath(os.path.join(current_dir, os.pardir, os.pardir))
sys.path.insert(0, lib_path)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| current_dir = os.path.dirname(os.path.abspath(__file__)) | |
| lib_path = os.path.abspath(os.path.join(current_dir, "..", "..")) | |
| sys.path.insert(0, lib_path) | |
| sys.path.insert(0, os.getcwd()) |
def classify(raw_flags):
    """Group option flags into general/linear/nn categories based on which
    source files reference their config keys.

    Args:
        raw_flags: raw argparse flag definitions, as consumed by
            ``fetch_option_flags``.

    Returns:
        dict mapping each category name to a list of
        ``{"name", "description"}`` entries (with ``--`` escaped for reST),
        plus a ``"details"`` list recording every file/line where each key
        is used.
    """
    category_set = {"general": set(), "linear": set(), "nn": set()}
    flags = fetch_option_flags(raw_flags)
    allowed_keys = {flag["instruction"] for flag in flags}

    # Scan every source file for `config.<key>` usages of the allowed keys.
    # Fix: usage_map was a defaultdict(list) whose values were sets assigned
    # directly, so the default factory never fired — a plain dict is correct.
    usage_map = {}  # file path -> set of keys used in that file
    collected = {}  # key -> list of {"file", "lines"} usage records
    for file_path in fetch_all_files():
        detailed_results = find_config_usages_in_file(file_path, allowed_keys)
        if detailed_results:
            usage_map[file_path] = set(detailed_results.keys())
            for key, usage in detailed_results.items():
                collected.setdefault(key, []).append(usage)

    # A key inherits the category of every file it appears in; keys seen in
    # more than one category are consolidated into "general" afterwards.
    for path, keys in usage_map.items():
        category, _ = classify_file_category(path)
        category_set[category] |= keys
    category_set = move_duplicates_together(category_set, "general")

    # Attach the resolved category to each flag; keys never seen in any
    # scanned file default to "general".
    for flag in flags:
        for category, keys in category_set.items():
            if flag["instruction"] in keys:
                flag["category"] = category
        flag.setdefault("category", "general")

    result = {}
    for flag in flags:
        result.setdefault(flag["category"], []).append(
            {"name": flag["name"].replace("--", r"\-\-"), "description": flag["description"]}
        )

    # Flatten the usage records: the first row for a key carries its name,
    # additional rows for the same key leave the name column blank.
    result["details"] = []
    for key, usages in collected.items():
        result["details"].append(
            {"name": key, "file": usages[0]["file"], "location": ", ".join(usages[0]["lines"])}
        )
        for extra in usages[1:]:
            result["details"].append(
                {"name": "", "file": extra["file"], "location": ", ".join(extra["lines"])}
            )
    return result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about simplify the data structure (e.g., unused detailed line numbers) after the spec is decided?
| def classify(raw_flags): | |
| category_set = {"general": set(), "linear": set(), "nn": set()} | |
| flags = fetch_option_flags(raw_flags) | |
| allowed_keys = set(flag["instruction"] for flag in flags) | |
| file_set = fetch_all_files() | |
| usage_map = defaultdict(list) | |
| collected = {} | |
| for file_path in file_set: | |
| detailed_results = find_config_usages_in_file(file_path, allowed_keys) | |
| if detailed_results: | |
| usage_map[file_path] = set(detailed_results.keys()) | |
| for k, v in detailed_results.items(): | |
| if k not in collected: | |
| collected[k] = [] | |
| collected[k].append(v) | |
| for path, keys in usage_map.items(): | |
| category, path = classify_file_category(path) | |
| category_set[category] = category_set[category].union(keys) | |
| category_set = move_duplicates_together(category_set, "general") | |
| for flag in flags: | |
| for k, v in category_set.items(): | |
| for i in v: | |
| if flag["instruction"] == i: | |
| flag["category"] = k | |
| if "category" not in flag: | |
| flag["category"] = "general" | |
| result = {} | |
| for flag in flags: | |
| if flag["category"] not in result: | |
| result[flag["category"]] = [] | |
| result[flag["category"]].append( | |
| {"name": flag["name"].replace("--", r"\-\-"), "description": flag["description"]} | |
| ) | |
| result["details"] = [] | |
| for k, v in collected.items(): | |
| result["details"].append({"name": k, "file": v[0]["file"], "location": ", ".join(v[0]["lines"])}) | |
| if len(v) > 1: | |
| for i in v[1:]: | |
| result["details"].append({"name": "", "file": i["file"], "location": ", ".join(i["lines"])}) | |
| return result | |
| def classify(raw_flags): | |
| category_set = {"general": set(), "linear": set(), "nn": set()} | |
| flags = fetch_option_flags(raw_flags) | |
| allowed_keys = set(flag["instruction"] for flag in flags) | |
| file_set = fetch_all_files() | |
| for file_path in file_set: | |
| find_config_usages_in_file(file_path, allowed_keys, category_set) | |
| category_set = move_duplicates_together(category_set) | |
| result = defaultdict(list) | |
| for flag in raw_flags: | |
| for category, keys in category_set.items(): | |
| for key in keys: | |
| if key in flag["name"]: | |
| result[category].append(flag) | |
| return result |
def find_config_usages_in_file(file_path, allowed_keys):
    """Find which keys in *allowed_keys* are referenced as ``config.<key>``
    in the file at *file_path*.

    For ``main.py`` only the body of ``main()`` is scanned, and reported
    line numbers are offset so they still match the file on disk.

    Args:
        file_path: path of the file to scan.
        allowed_keys: set of config attribute names to look for.

    Returns:
        dict mapping each found key to ``{"file": <repo-relative path>,
        "lines": [<line numbers as strings>]}``.  Empty dict when the file
        cannot be read as UTF-8 text.
    """
    pattern = re.compile(r"\bconfig\.([a-zA-Z_][a-zA-Z0-9_]*)")
    detailed_results = {}
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            lines = f.readlines()
    except (IOError, UnicodeDecodeError):
        # Fix: return the same type as the success path ({} not []) —
        # callers iterate .keys()/.items() on the result.
        return {}

    _, path = classify_file_category(file_path)

    # Fix: main_start was unbound (NameError at the append below) when a
    # main.py had no "def main(" line; default to no offset.
    main_start = 0
    if file_path.endswith("main.py"):
        # Restrict the scan to the main() function body.
        for idx, line in enumerate(lines):
            if line.startswith("def main("):
                lines = lines[idx:]
                main_start = idx
                break
        # Cut at the first top-level statement after the def line.
        # Fix: enumerate from 1 — the old `enumerate(lines[1:])` started at
        # 0, so `lines[:i]` truncated one line too early (and emptied the
        # list when the line right after the def was top-level).
        for i, line in enumerate(lines[1:], start=1):
            if line and line[0] not in (" ", "\t") and line.strip() != "":
                lines = lines[:i]
                break

    for i, line in enumerate(lines, start=1):
        for key in pattern.findall(line):
            if key in allowed_keys:
                if key not in detailed_results:
                    detailed_results[key] = {"file": path, "lines": []}
                # main_start shifts numbers back to full-file coordinates;
                # it is 0 for every file other than a truncated main.py.
                detailed_results[key]["lines"].append(str(i + main_start))
    return detailed_results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar in this function,
| def find_config_usages_in_file(file_path, allowed_keys): | |
| pattern = re.compile(r"\bconfig\.([a-zA-Z_][a-zA-Z0-9_]*)") | |
| detailed_results = {} | |
| try: | |
| with open(file_path, "r", encoding="utf-8") as f: | |
| lines = f.readlines() | |
| except (IOError, UnicodeDecodeError): | |
| return [] | |
| _, path = classify_file_category(file_path) | |
| if file_path.endswith("main.py"): | |
| for idx in range(len(lines)): | |
| if lines[idx].startswith("def main("): | |
| lines = lines[idx:] | |
| main_start = idx | |
| break | |
| for i, line in enumerate(lines[1:]): | |
| if line and line[0] not in (" ", "\t") and line.strip() != "": | |
| lines = lines[:i] | |
| break | |
| for i, line in enumerate(lines, start=1): | |
| matches = pattern.findall(line) | |
| for key in matches: | |
| if key in allowed_keys: | |
| if key not in detailed_results: | |
| detailed_results[key] = {"file": path, "lines": []} | |
| if file_path.endswith("main.py"): | |
| detailed_results[key]["lines"].append(str(i + main_start)) | |
| else: | |
| detailed_results[key]["lines"].append(str(i)) | |
| return detailed_results | |
| def find_config_usages_in_file(file_path, allowed_keys, category_set): | |
| pattern = re.compile(r"\bconfig\.([a-zA-Z_][a-zA-Z0-9_]*)") | |
| try: | |
| with open(file_path, "r", encoding="utf-8") as f: | |
| lines = f.readlines() | |
| except (IOError, UnicodeDecodeError): | |
| return [] | |
| # get start line in main.py | |
| if file_path.endswith("main.py"): | |
| for idx in range(len(lines)): | |
| if lines[idx].startswith("def main("): | |
| lines = lines[idx:] | |
| break | |
| all_str = " ".join(lines) | |
| matches = set(pattern.findall(all_str)) & allowed_keys | |
| category = classify_file_category(file_path)[0] | |
| for key in matches: | |
| category_set[category].add(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBD: try-catch here can be removed, as we want to see error instantly when someone change the files (not like production settings)
def classify_file_category(path):
    """Map a file path to its flag category.

    Args:
        path: path of a source file under the repository root
            (``lib_path``, a module-level global).

    Returns:
        ``(category, relative_path)`` where category is ``"linear"``,
        ``"nn"`` or ``"general"`` and relative_path is the POSIX-style
        path relative to the repository root.
    """
    rel = Path(path).relative_to(lib_path)
    rel_posix = rel.as_posix()

    # Classify by the path below the top-level package directory when
    # there is one; otherwise use the relative path itself.
    parts = rel.parts
    inner = Path(*parts[1:]).as_posix() if len(parts) > 1 else rel_posix

    if inner.startswith("linear"):
        return "linear", rel_posix
    if inner.startswith(("torch", "nn")):
        return "nn", rel_posix
    return "general", rel_posix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After simplifying find_config_usage_in_file, we no longer need return_path here.
Let's discuss how it can be simplified further.
def move_duplicates_together(data, keep):
    """Consolidate items shared between categories into the *keep* category.

    Any item present in two or more of the sets in *data* is added to
    ``data[keep]`` and removed from every other set, so the returned sets
    are pairwise disjoint.

    Args:
        data: dict mapping category name -> set of items; mutated in place.
        keep: the category that absorbs all shared items.

    Returns:
        The same dict, after consolidation.
    """
    categories = list(data)

    # Collect every item that occurs in more than one category.
    shared = set()
    for idx, first in enumerate(categories):
        for second in categories[idx + 1:]:
            shared.update(data[first] & data[second])

    # Shared items live only in the `keep` category from now on.
    data[keep].update(shared)
    for category in categories:
        if category != keep:
            data[category].difference_update(shared)

    return data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBD: readability
What does this PR do?
(Some descriptions here...)
Test CLI & API (bash tests/autotest.sh): test the APIs used by main.py.
Check API Document
If any new APIs are added, please check if the description of the APIs is added to API document.
Test quickstart & API (bash tests/docs/test_changed_document.sh): if any APIs used in quickstarts or tutorials are modified, please run this test to check that the current examples still run correctly after the modified APIs are released.