added support for automatic_label_demo.py without chatgpt #142

lsch0lz · 2023-04-16T11:02:57Z

Hi! Since automatic_label_demo.py can only be used with a paid OpenAI account, I implemented a way to use the functionality without ChatGPT.
To achieve this, I used NLTK to extract nouns from the caption. I also used a lemmatizer to improve the produced tags.
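
The noun extraction described above could look roughly like the sketch below. It is not the code from this PR: the function names `nouns_from_tagged` and `extract_tags` are made up for illustration, and the sketch assumes the NLTK tokenizer, tagger and WordNet data are already installed.

```python
from typing import Callable, Iterable, List, Tuple


def nouns_from_tagged(
    tagged: Iterable[Tuple[str, str]],
    lemmatize: Callable[[str], str] = lambda word: word,
) -> List[str]:
    """Keep tokens whose Penn Treebank tag starts with 'NN'.

    Words are lowercased, passed through `lemmatize`, and de-duplicated
    while preserving their first-seen order.
    """
    tags: List[str] = []
    for word, pos in tagged:
        if pos.startswith("NN"):
            lemma = lemmatize(word.lower())
            if lemma not in tags:
                tags.append(lemma)
    return tags


def extract_tags(caption: str) -> List[str]:
    """Tokenize, POS-tag and lemmatize a caption with NLTK (hypothetical helper)."""
    # Imported lazily so the pure filtering step above works without NLTK.
    import nltk
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    tagged = nltk.pos_tag(nltk.word_tokenize(caption))
    return nouns_from_tagged(tagged, lambda w: lemmatizer.lemmatize(w, pos="n"))
```

Splitting the filtering step from the NLTK calls keeps the tag selection easy to check by hand with pre-tagged tokens.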

Your implementation with ChatGPT:

My implementation without ChatGPT:

I implemented the feature so that the user can choose whether to use ChatGPT. If the script is started without the openai_key, NLTK is used by default. If the user provides a valid key, ChatGPT is used.
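
A minimal sketch of this fallback, assuming the key arrives through a `--openai_key` command-line option. The helper names `choose_backend` and `parse_args` are hypothetical, and unlike the description above the sketch only checks that a key was given, not that it is valid.

```python
import argparse
from typing import List, Optional


def choose_backend(openai_key: Optional[str]) -> str:
    """Return 'chatgpt' when a non-empty key is supplied, otherwise 'nltk'."""
    return "chatgpt" if openai_key else "nltk"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; the key is optional and defaults to None."""
    parser = argparse.ArgumentParser(description="tag extraction backend demo")
    parser.add_argument(
        "--openai_key",
        type=str,
        default=None,
        help="OpenAI API key; when omitted, NLTK is used instead of ChatGPT",
    )
    return parser.parse_args(argv)
```

For example, `choose_backend(parse_args([]).openai_key)` selects the NLTK path, while passing `["--openai_key", "<key>"]` selects ChatGPT.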

Let me know what you think about this feature.

Andy1621 · 2023-04-16T14:46:36Z

@lsch0lz Thanks for your commit! I also tried to use nltk in the beginning, but in some cases it's hard to extract the nouns directly.

where the group is not a current num, but it is also hard for our ChatGPT prompt now, haha.
However, it often works. By the way, in my environment, I had to download the packages manually.
Maybe you can add more instructions to the README, and then I will accept the change.

wget https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip
# unzip in '~/nltk_data/tokenizers'
wget https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/taggers/averaged_perceptron_tagger.zip
# unzip in '~/nltk_data/taggers'
wget https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/wordnet.zip
# unzip in '~/nltk_data/corpora'

lsch0lz · 2023-04-16T16:32:11Z

@Andy1621 Thanks for the feedback!
I totally forgot that I had already downloaded the NLTK packages. Now all the necessary packages are downloaded automatically when NLTK is used. I think that's more elegant than making the user download all the packages manually.
Let me know what you think about that.
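
One way such an automatic download could be written is sketched below. It is an illustration, not the committed code: `ensure_nltk_data` and `NLTK_RESOURCES` are hypothetical names, and the package names are taken from the manual download list earlier in this thread (newer NLTK releases may expect differently named packages).

```python
from typing import Callable, Dict, List, Optional

# Resource path inside nltk_data -> package name passed to the downloader.
NLTK_RESOURCES: Dict[str, str] = {
    "tokenizers/punkt": "punkt",
    "taggers/averaged_perceptron_tagger": "averaged_perceptron_tagger",
    "corpora/wordnet": "wordnet",
}


def ensure_nltk_data(
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], object]] = None,
) -> List[str]:
    """Download every missing resource and return the packages that were fetched.

    `find` must raise LookupError for a missing resource, as nltk.data.find does.
    Both callables default to the real NLTK functions.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the logic can be exercised with fakes

        find = find or nltk.data.find
        download = download or (lambda name: nltk.download(name, quiet=True))

    fetched: List[str] = []
    for path, package in NLTK_RESOURCES.items():
        try:
            find(path)
        except LookupError:
            download(package)
            fetched.append(package)
    return fetched
```

Checking with `find` first avoids contacting the download server on every run once the data is in place.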

lsch0lz · 2023-04-16T16:34:40Z

Also what do you mean by:

where the group is not a current num

If I understand the intention correctly, group is a noun and should be detected as a tag in that case, right?

pierizvi

Looks Good To Test 🛠

Andy1621 · 2023-04-17T02:27:26Z

@lsch0lz Thanks! As for group, unfortunately I found that some caption models generate unrelated nouns in phrases such as 'a group of', 'a family of' and so on. These nouns are used like adjectives and should not be detected. For example, Tag2Text generates the following caption:

We should only detect the bear, shore and water.

However, nltk can handle most cases, while ChatGPT can handle some potentially difficult captions with fine-grained prompts, for example by ignoring phrases like 'a group (or other noun) of'.
@rentainhe I approve accepting it!
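
The 'a group (or other noun) of' case could also be approximated on the NLTK side with a simple rule: skip any noun that is immediately followed by "of". The sketch below is a hypothetical heuristic, not part of this PR; the function name `drop_quantifier_nouns` and the example tokens are made up, and the rule will also drop legitimate nouns in phrases such as "the roof of the house".

```python
from typing import Iterable, List, Tuple


def drop_quantifier_nouns(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return lowercased nouns, skipping any noun directly followed by 'of'.

    `tagged` holds (word, Penn Treebank tag) pairs, e.g. from nltk.pos_tag.
    """
    tokens = list(tagged)
    kept: List[str] = []
    for i, (word, pos) in enumerate(tokens):
        if not pos.startswith("NN"):
            continue
        next_word = tokens[i + 1][0].lower() if i + 1 < len(tokens) else ""
        if next_word == "of":
            # 'group' in 'a group of bears' only quantifies the real object.
            continue
        kept.append(word.lower())
    return kept


# Made-up tagged caption for illustration: "a group of bears on the shore near water"
example = [
    ("a", "DT"), ("group", "NN"), ("of", "IN"), ("bears", "NNS"),
    ("on", "IN"), ("the", "DT"), ("shore", "NN"), ("near", "IN"), ("water", "NN"),
]
tags = drop_quantifier_nouns(example)
```

A ChatGPT prompt can express the same exclusion in natural language, which is why it copes better with the less regular cases.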

rentainhe · 2023-04-18T07:12:56Z

Thanks for this PR, I'm going to merge it!

added support for automatic_label_demo.py without chatgpt

18a7caa

added automatically download of nltk data

89f47bc

rearranged order for NLTK usafe in README.md

f7044c9

pierizvi reviewed Apr 16, 2023

rentainhe merged commit bc526e6 into IDEA-Research:main Apr 18, 2023
