Rewrite new simulator to use JSON mode; additional fixes to new simulator #34
Conversation
…ken is <|endoftext|>, and also fix bug when activation is outside of expected range of 0 to 10
…t of simulator (should be handled by preprocessing outside of library in case GPT bugs are fixed), add better debug logging
nice! just some nits
normalized_activations = normalize_activations(
    activation_record.activations, max_activation=max_activation
)
return json.dumps({
do you think it would help to add indent=2 (or 4) here? maybe gpt3.5 is better at parsing indented blocks of json rather than having everything on the same line.
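For reference, a quick sketch of what that suggestion would change (the record contents are made up):

import json

record = {"token": " seminal", "activation": 10}
print(json.dumps(record))
# {"token": " seminal", "activation": 10}
print(json.dumps(record, indent=2))
# {
#   "token": " seminal",
#   "activation": 10
# }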
Could help and definitely worth testing. IMO I haven't seen GPT have trouble parsing unindented JSON, though I can't prove it.
The main problem we ran into is GPT gets confused by non-ascii characters (it stops generating abruptly, or it will add extra non-existent tokens). For example, the bullet point, ellipsis, pound symbol, etc. I should open a separate issue for this but it's not really fixable by this repo.
But I think OpenAI is aware and trying to fix:
https://community.openai.com/t/gpt-4-1106-preview-is-not-generating-utf-8/482839/6
https://community.openai.com/t/gpt-4-1106-preview-messes-up-function-call-parameters-encoding/478500/36?page=2
In the meantime our workaround is to double-escape non-ascii chars BEFORE we feed it to automated-interpretability:
hijohnnylin/neuronpedia-scorer@21c07d8
E.g., "\u2022" becomes "\\u2022"
I decided to put those "pre-processing" changes outside of this repo, since it's a temporary workaround until OpenAI fixes it - but lmk if you think it should be here instead. Can also make it an additional flag like replace_non_ascii or something.
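For illustration, a minimal sketch of that kind of double-escaping (the helper name replace_non_ascii just reuses the suggested flag name, and this assumes all tokens stay within the Basic Multilingual Plane):

def replace_non_ascii(text: str) -> str:
    # Replace each non-ascii character with its literal \uXXXX escape,
    # e.g. "•" becomes the six ascii characters "\u2022", so GPT only
    # ever sees plain ascii in the prompt.
    return "".join(ch if ord(ch) < 128 else f"\\u{ord(ch):04x}" for ch in text)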
if it seems like GPT doesn't have trouble with unindented JSON then it's probably fine to leave it. in terms of the non-ascii characters, how prevalent is this problem? i.e. if you try to use the simulator (either in plaintext or json mode) how often does it bork the result?
Apologies for delay - have been working on other parts of NP.
It borks it quite frequently on non-ascii in json mode (not sure about plaintext). I started out by special casing each character, but after 5-6 special cases it was apparent that it needed to all be excluded (that's also when I found the OpenAI community threads).
Here's a code sample to reproduce the issue. Run with the latest api_client that supports json mode. GPT ends up returning truncated json that's unparseable.
import os
import asyncio
import json
os.environ["OPENAI_API_KEY"] = "YOUR_KEY"
from neuron_explainer.api_client import ApiClient
api_client = ApiClient(model_name="gpt-3.5-turbo-1106", max_concurrent=1)
to_send = {
    "neuron": 3,
    "explanation": "'protect', 'know', 'with' and 'save'",
    "activations": [
        {"token": "hello", "activation": None},
        {"token": "hello", "activation": None},
        {"token": "hello", "activation": None},
        {"token": "hello", "activation": None},
        {"token": "hello", "activation": None},
        {"token": "hello", "activation": None},
        {"token": " …", "activation": None},
        {"token": " \u2022", "activation": None},
        {"token": " £", "activation": None},
    ],
}
prompt = [
{
"role": "system",
"content": "We're studying neurons in a neural network. Each neuron looks for some particular thing in a short document. Look at an explanation of what the neuron does, and try to predict its activations on a particular token.\n\nFor each sequence, you will see the tokens in the sequence where the activations are left blank. You will print, in valid json, the exact same tokens verbatim, but with the activation values filled in according to the explanation.\nFill out the activation values from 0 to 10. Most activations will be 0.\n",
},
{
"role": "user",
"content": '{"neuron": 1, "explanation": "language related to something being groundbreaking", "activations": [{"token": "The", "activation": None}, {"token": " editors", "activation": None}, {"token": " of", "activation": None}, {"token": " Bi", "activation": None}, {"token": "opol", "activation": None}, {"token": "ym", "activation": None}, {"token": "ers", "activation": None}, {"token": " are", "activation": None}, {"token": " delighted", "activation": None}, {"token": " to", "activation": None}, {"token": " present", "activation": None}, {"token": " the", "activation": None}, {"token": " ", "activation": None}, {"token": "201", "activation": None}, {"token": "8", "activation": None}, {"token": " Murray", "activation": None}, {"token": " Goodman", "activation": None}, {"token": " Memorial", "activation": None}, {"token": " Prize", "activation": None}, {"token": " to", "activation": None}, {"token": " Professor", "activation": None}, {"token": " David", "activation": None}, {"token": " N", "activation": None}, {"token": ".", "activation": None}, {"token": " Ber", "activation": None}, {"token": "atan", "activation": None}, {"token": " in", "activation": None}, {"token": " recognition", "activation": None}, {"token": " of", "activation": None}, {"token": " his", "activation": None}, {"token": " seminal", "activation": None}, {"token": " contributions", "activation": None}, {"token": " to", "activation": None}, {"token": " bi", "activation": None}, {"token": "oph", "activation": None}, {"token": "ysics", "activation": None}, {"token": " and", "activation": None}, {"token": " their", "activation": None}, {"token": " impact", "activation": None}, {"token": " on", "activation": None}, {"token": " our", "activation": None}, {"token": " understanding", "activation": None}, {"token": " of", "activation": None}, {"token": " charge", "activation": None}, {"token": " transport", "activation": None}, {"token": " in", "activation": None}, {"token": " biom", "activation": None}, {"token": "olecules", "activation": None}, {"token": ".\\n\\n", "activation": None}, {"token": "In", "activation": None}, {"token": "aug", "activation": None}, {"token": "ur", "activation": None}, {"token": "ated", "activation": None}, {"token": " in", "activation": None}, {"token": " ", "activation": None}, {"token": "200", "activation": None}, {"token": "7", "activation": None}, {"token": " in", "activation": None}, {"token": " honor", "activation": None}, {"token": " of", "activation": None}, {"token": " the", "activation": None}, {"token": " Bi", "activation": None}, {"token": "opol", "activation": None}, {"token": "ym", "activation": None}, {"token": "ers", "activation": None}, {"token": " Found", "activation": None}, {"token": "ing", "activation": None}, {"token": " Editor", "activation": None}, {"token": ",", "activation": None}, {"token": " the", "activation": None}, {"token": " prize", "activation": None}, {"token": " is", "activation": None}, {"token": " awarded", "activation": None}, {"token": " for", "activation": None}, {"token": " outstanding", "activation": None}, {"token": " accomplishments", "activation": None}]}',
},
{
"role": "assistant",
"content": '{"neuron": 1, "explanation": "language related to something being groundbreaking", "activations": [{"token": "The", "activation": 0}, {"token": " editors", "activation": 0}, {"token": " of", "activation": 0}, {"token": " Bi", "activation": 0}, {"token": "opol", "activation": 0}, {"token": "ym", "activation": 0}, {"token": "ers", "activation": 0}, {"token": " are", "activation": 0}, {"token": " delighted", "activation": 0}, {"token": " to", "activation": 0}, {"token": " present", "activation": 0}, {"token": " the", "activation": 0}, {"token": " ", "activation": 0}, {"token": "201", "activation": 0}, {"token": "8", "activation": 0}, {"token": " Murray", "activation": 0}, {"token": " Goodman", "activation": 0}, {"token": " Memorial", "activation": 0}, {"token": " Prize", "activation": 0}, {"token": " to", "activation": 0}, {"token": " Professor", "activation": 0}, {"token": " David", "activation": 0}, {"token": " N", "activation": 0}, {"token": ".", "activation": 0}, {"token": " Ber", "activation": 0}, {"token": "atan", "activation": 0}, {"token": " in", "activation": 0}, {"token": " recognition", "activation": 0}, {"token": " of", "activation": 0}, {"token": " his", "activation": 0}, {"token": " seminal", "activation": 10}, {"token": " contributions", "activation": 0}, {"token": " to", "activation": 0}, {"token": " bi", "activation": 0}, {"token": "oph", "activation": 0}, {"token": "ysics", "activation": 0}, {"token": " and", "activation": 0}, {"token": " their", "activation": 0}, {"token": " impact", "activation": 0}, {"token": " on", "activation": 0}, {"token": " our", "activation": 0}, {"token": " understanding", "activation": 0}, {"token": " of", "activation": 0}, {"token": " charge", "activation": 0}, {"token": " transport", "activation": 0}, {"token": " in", "activation": 0}, {"token": " biom", "activation": 0}, {"token": "olecules", "activation": 0}, {"token": ".\\n\\n", "activation": 0}, {"token": "In", "activation": 0}, {"token": "aug", "activation": 0}, {"token": "ur", "activation": 0}, {"token": "ated", "activation": 0}, {"token": " in", "activation": 0}, {"token": " ", "activation": 0}, {"token": "200", "activation": 0}, {"token": "7", "activation": 0}, {"token": " in", "activation": 0}, {"token": " honor", "activation": 0}, {"token": " of", "activation": 0}, {"token": " the", "activation": 0}, {"token": " Bi", "activation": 0}, {"token": "opol", "activation": 0}, {"token": "ym", "activation": 0}, {"token": "ers", "activation": 0}, {"token": " Found", "activation": 0}, {"token": "ing", "activation": 1}, {"token": " Editor", "activation": 0}, {"token": ",", "activation": 0}, {"token": " the", "activation": 0}, {"token": " prize", "activation": 0}, {"token": " is", "activation": 0}, {"token": " awarded", "activation": 0}, {"token": " for", "activation": 0}, {"token": " outstanding", "activation": 0}, {"token": " accomplishments", "activation": 0}]}',
},
{
"role": "user",
"content": '{"neuron": 2, "explanation": "the word \\u201cvariant\\u201d and other words with the same \\u201dvari\\u201d root", "activations": [{"token": "{\\"", "activation": None}, {"token": "widget", "activation": None}, {"token": "Class", "activation": None}, {"token": "\\":\\"", "activation": None}, {"token": "Variant", "activation": None}, {"token": "Matrix", "activation": None}, {"token": "Widget", "activation": None}, {"token": "\\",\\"", "activation": None}, {"token": "back", "activation": None}, {"token": "order", "activation": None}, {"token": "Message", "activation": None}, {"token": "\\":\\"", "activation": None}, {"token": "Back", "activation": None}, {"token": "ordered", "activation": None}, {"token": "\\",\\"", "activation": None}, {"token": "back", "activation": None}, {"token": "order", "activation": None}, {"token": "Message", "activation": None}, {"token": "Single", "activation": None}, {"token": "Variant", "activation": None}, {"token": "\\":\\"", "activation": None}, {"token": "This", "activation": None}, {"token": " item", "activation": None}, {"token": " is", "activation": None}, {"token": " back", "activation": None}, {"token": "ordered", "activation": None}, {"token": ".\\",\\"", "activation": None}, {"token": "ordered", "activation": None}, {"token": "Selection", "activation": None}, {"token": "\\":", "activation": None}, {"token": "true", "activation": None}, {"token": ",\\"", "activation": None}, {"token": "product", "activation": None}, {"token": "Variant", "activation": None}, {"token": "Id", "activation": None}, {"token": "\\":", "activation": None}, {"token": "0", "activation": None}, {"token": ",\\"", "activation": None}, {"token": "variant", "activation": None}, {"token": "Id", "activation": None}, {"token": "Field", "activation": None}, {"token": "\\":\\"", "activation": None}, {"token": "product", "activation": None}, {"token": "196", "activation": None}, {"token": "39", "activation": None}, {"token": "_V", "activation": None}, {"token": "ariant", "activation": None}, {"token": "Id", "activation": None}, {"token": "\\",\\"", "activation": None}, {"token": "back", "activation": None}, {"token": "order", "activation": None}, {"token": "To", "activation": None}, {"token": "Message", "activation": None}, {"token": "Single", "activation": None}, {"token": "Variant", "activation": None}, {"token": "\\":\\"", "activation": None}, {"token": "This", "activation": None}, {"token": " item", "activation": None}, {"token": " is", "activation": None}, {"token": " back", "activation": None}, {"token": "ordered", "activation": None}, {"token": " and", "activation": None}, {"token": " is", "activation": None}, {"token": " expected", "activation": None}, {"token": " by", "activation": None}, {"token": " {", "activation": None}, {"token": "0", "activation": None}, {"token": "}.", "activation": None}, {"token": "\\",\\"", "activation": None}, {"token": "low", "activation": None}, {"token": "Price", "activation": None}, {"token": "\\":", "activation": None}, {"token": "999", "activation": None}, {"token": "9", "activation": None}, {"token": ".", "activation": None}, {"token": "0", "activation": None}, {"token": ",\\"", "activation": None}, {"token": "attribute", "activation": None}, {"token": "Indexes", "activation": None}, {"token": "\\":[", "activation": None}, {"token": "],\\"", "activation": None}, {"token": "productId", "activation": None}, {"token": "\\":", "activation": None}, {"token": "196", "activation": None}, {"token": "39", "activation": None}, {"token": 
",\\"", "activation": None}, {"token": "price", "activation": None}, {"token": "V", "activation": None}, {"token": "ariance", "activation": None}, {"token": "\\":", "activation": None}, {"token": "true", "activation": None}, {"token": ",\\"", "activation": None}]}',
},
{
"role": "assistant",
"content": '{"neuron": 2, "explanation": "the word \\u201cvariant\\u201d and other words with the same \\u201dvari\\u201d root", "activations": [{"token": "{\\"", "activation": 0}, {"token": "widget", "activation": 0}, {"token": "Class", "activation": 0}, {"token": "\\":\\"", "activation": 0}, {"token": "Variant", "activation": 6}, {"token": "Matrix", "activation": 0}, {"token": "Widget", "activation": 0}, {"token": "\\",\\"", "activation": 0}, {"token": "back", "activation": 0}, {"token": "order", "activation": 0}, {"token": "Message", "activation": 0}, {"token": "\\":\\"", "activation": 0}, {"token": "Back", "activation": 0}, {"token": "ordered", "activation": 0}, {"token": "\\",\\"", "activation": 0}, {"token": "back", "activation": 0}, {"token": "order", "activation": 0}, {"token": "Message", "activation": 0}, {"token": "Single", "activation": 0}, {"token": "Variant", "activation": 0}, {"token": "\\":\\"", "activation": 0}, {"token": "This", "activation": 0}, {"token": " item", "activation": 0}, {"token": " is", "activation": 0}, {"token": " back", "activation": 0}, {"token": "ordered", "activation": 0}, {"token": ".\\",\\"", "activation": 0}, {"token": "ordered", "activation": 0}, {"token": "Selection", "activation": 0}, {"token": "\\":", "activation": 0}, {"token": "true", "activation": 0}, {"token": ",\\"", "activation": 0}, {"token": "product", "activation": 0}, {"token": "Variant", "activation": 0}, {"token": "Id", "activation": 0}, {"token": "\\":", "activation": 0}, {"token": "0", "activation": 0}, {"token": ",\\"", "activation": 0}, {"token": "variant", "activation": 0}, {"token": "Id", "activation": 0}, {"token": "Field", "activation": 0}, {"token": "\\":\\"", "activation": 0}, {"token": "product", "activation": 0}, {"token": "196", "activation": 0}, {"token": "39", "activation": 0}, {"token": "_V", "activation": 0}, {"token": "ariant", "activation": 0}, {"token": "Id", "activation": 0}, {"token": "\\",\\"", "activation": 0}, {"token": "back", "activation": 0}, {"token": "order", "activation": 0}, {"token": "To", "activation": 0}, {"token": "Message", "activation": 0}, {"token": "Single", "activation": 0}, {"token": "Variant", "activation": 0}, {"token": "\\":\\"", "activation": 0}, {"token": "This", "activation": 0}, {"token": " item", "activation": 0}, {"token": " is", "activation": 0}, {"token": " back", "activation": 0}, {"token": "ordered", "activation": 0}, {"token": " and", "activation": 0}, {"token": " is", "activation": 0}, {"token": " expected", "activation": 0}, {"token": " by", "activation": 0}, {"token": " {", "activation": 0}, {"token": "0", "activation": 0}, {"token": "}.", "activation": 0}, {"token": "\\",\\"", "activation": 0}, {"token": "low", "activation": 0}, {"token": "Price", "activation": 0}, {"token": "\\":", "activation": 0}, {"token": "999", "activation": 0}, {"token": "9", "activation": 0}, {"token": ".", "activation": 0}, {"token": "0", "activation": 0}, {"token": ",\\"", "activation": 0}, {"token": "attribute", "activation": 0}, {"token": "Indexes", "activation": 0}, {"token": "\\":[", "activation": 0}, {"token": "],\\"", "activation": 0}, {"token": "productId", "activation": 0}, {"token": "\\":", "activation": 0}, {"token": "196", "activation": 0}, {"token": "39", "activation": 0}, {"token": ",\\"", "activation": 0}, {"token": "price", "activation": 0}, {"token": "V", "activation": 0}, {"token": "ariance", "activation": 1}, {"token": "\\":", "activation": 0}, {"token": "true", "activation": 0}, {"token": ",\\"", "activation": 0}]}',
},
{
"role": "user",
"content": json.dumps(to_send),
},
]
print("activations length sent to GPT: " + str(len(to_send["activations"])))
print("activation tokens sent: ")
print([activation["token"] for activation in to_send["activations"]])
async def run():
    response = await api_client.make_request(
        messages=prompt, max_tokens=2000, temperature=0, json_mode=True
    )
    choice = response["choices"][0]
    completion = choice["message"]["content"]
    print("received string: " + completion)
    received_json = json.loads(completion)
    print(
        "activations length received from GPT: "
        + str(len(received_json["activations"]))
    )
    print([activation["token"] for activation in received_json["activations"]])

asyncio.run(run())
You should get the following output and error:
activations length sent to GPT: 9
activation tokens sent:
['hello', 'hello', 'hello', 'hello', 'hello', 'hello', ' …', ' •', ' £']
received string: {"neuron": 3, "explanation": "'protect', 'know', 'with' and 'save'", "activations": [{"token": "hello", "activation": 0}, {"token": "hello", "activation": 0}, {"token": "hello", "activation": 0}, {"token": "hello", "activation": 0}, {"token": "hello", "activation": 0}, {"token": "hello", "activation": 0}, {"token": " \\u2026", "activation": 0}, {"token": " \", "
Traceback (most recent call last):
File "/Users/johnnylin/neuronpedia-scorer/src/test-json-ellipses.py", line 75, in <module>
asyncio.run(run())
File "/Users/johnnylin/.pyenv/versions/3.10.0/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Users/johnnylin/.pyenv/versions/3.10.0/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete
return future.result()
File "/Users/johnnylin/neuronpedia-scorer/src/test-json-ellipses.py", line 67, in run
received_json = json.loads(completion)
File "/Users/johnnylin/.pyenv/versions/3.10.0/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/Users/johnnylin/.pyenv/versions/3.10.0/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/johnnylin/.pyenv/versions/3.10.0/lib/python3.10/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 465 (char 464)
Here's another reproducible example - replace to_send's activations with the array below. We give GPT 63 activations and only receive 62 back - the ellipsis symbol (…) token and activation are missing from the response. In this version of the bug, GPT just silently eliminates the token.
"activations": [
{"token": " on", "activation": None},
{"token": " some", "activation": None},
{"token": " days", "activation": None},
{"token": " we", "activation": None},
{"token": " post", "activation": None},
{"token": " an", "activation": None},
{"token": " afternoon", "activation": None},
{"token": " story", "activation": None},
{"token": " at", "activation": None},
{"token": " around", "activation": None},
{"token": " 2", "activation": None},
{"token": " PM", "activation": None},
{"token": ".", "activation": None},
{"token": " After", "activation": None},
{"token": " every", "activation": None},
{"token": " new", "activation": None},
{"token": " story", "activation": None},
{"token": " we", "activation": None},
{"token": " send", "activation": None},
{"token": " out", "activation": None},
{"token": " an", "activation": None},
{"token": " alert", "activation": None},
{"token": " to", "activation": None},
{"token": " our", "activation": None},
{"token": " e", "activation": None},
{"token": "-", "activation": None},
{"token": "mail", "activation": None},
{"token": " list", "activation": None},
{"token": " and", "activation": None},
{"token": " our", "activation": None},
{"token": " FB", "activation": None},
{"token": " page", "activation": None},
{"token": ".", "activation": None},
{"token": "\n", "activation": None},
{"token": "\n", "activation": None},
{"token": "Learn", "activation": None},
{"token": " about", "activation": None},
{"token": " Scientology", "activation": None},
{"token": " with", "activation": None},
{"token": " our", "activation": None},
{"token": " numerous", "activation": None},
{"token": " series", "activation": None},
{"token": " with", "activation": None},
{"token": " experts", "activation": None},
{"token": "…", "activation": None},
{"token": "\n", "activation": None},
{"token": "\n", "activation": None},
{"token": "BL", "activation": None},
{"token": "OG", "activation": None},
{"token": "G", "activation": None},
{"token": "ING", "activation": None},
{"token": " DI", "activation": None},
{"token": "AN", "activation": None},
{"token": "ET", "activation": None},
{"token": "ICS", "activation": None},
{"token": ":", "activation": None},
{"token": " We", "activation": None},
{"token": " read", "activation": None},
{"token": " Scientology", "activation": None},
{"token": "��", "activation": None},
{"token": "s", "activation": None},
{"token": " founding", "activation": None},
{"token": " text", "activation": None},
]
You should see the following output, which shows the incorrect length of GPT's response.
activations length sent to GPT: 63
activation tokens sent:
[' on', ' some', ' days', ' we', ' post', ' an', ' afternoon', ' story', ' at', ' around', ' 2', ' PM', '.', ' After', ' every', ' new', ' story', ' we', ' send', ' out', ' an', ' alert', ' to', ' our', ' e', '-', 'mail', ' list', ' and', ' our', ' FB', ' page', '.', '\n', '\n', 'Learn', ' about', ' Scientology', ' with', ' our', ' numerous', ' series', ' with', ' experts', '…', '\n', '\n', 'BL', 'OG', 'G', 'ING', ' DI', 'AN', 'ET', 'ICS', ':', ' We', ' read', ' Scientology', '��', 's', ' founding', ' text']
activations length received from GPT: 62
[' on', ' some', ' days', ' we', ' post', ' an', ' afternoon', ' story', ' at', ' around', ' 2', ' PM', '.', ' After', ' every', ' new', ' story', ' we', ' send', ' out', ' an', ' alert', ' to', ' our', ' e', '-', 'mail', ' list', ' and', ' our', ' FB', ' page', '.', '\n', '\n', 'Learn', ' about', ' Scientology', ' with', ' our', ' numerous', ' series', ' with', ' experts', '\n', '\n', 'BL', 'OG', 'G', 'ING', ' DI', 'AN', 'ET', 'ICS', ':', ' We', ' read', ' Scientology', '\\ufffd\\ufffd', 's', ' founding', ' text']
no worries about the delay and thanks for the thorough instructions on reproducing the problem! in that case, I think it would be great to put the non-ascii preprocessing code in this file and add a flag to enable it like you suggested 🙏
return zero_prediction
predicted_activations = []
# check that there is a token and activation value
# no need to double check the token matches exactly
where's the first check?
I think I should have placed this comment one line lower. If you mean the token and activation value check:
token check is line 665 if "token" not in activation:
activation check is line 669 if "activation" not in activation:
Or do you mean a different check?
If I'm understanding the code correctly it looks like you only check that the number of tokens is as expected, but you don't check that any of the tokens individually are correct. is that right?
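For concreteness, a sketch of the kind of per-token check being discussed (reusing the to_send / received_json names from the repro script above):

sent_tokens = [a["token"] for a in to_send["activations"]]
received_tokens = [a["token"] for a in received_json["activations"]]
# Flag any position where GPT echoed back a different token than it was sent.
mismatches = [
    (i, sent, received)
    for i, (sent, received) in enumerate(zip(sent_tokens, received_tokens))
    if sent != received
]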
zero_prediction = [0] * len(tokens)
token_lines = completion.strip("\n").split("༗\n")
# FIX: Strip the last ༗\n, otherwise all last activations are invalid
good catch! let's remove the FIX, I don't think it makes sense out of context of the PR :)
token_lines = completion.strip("\n").split("༗\n")
# FIX: Strip the last ༗\n, otherwise all last activations are invalid
token_lines = completion.strip("\n").strip("༗\n").split("༗\n")
# Edge Case #2: Sometimes GPT doesn't use the special character when it answers, it only uses the \n"
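To see why the extra strip matters, a quick illustration with a made-up completion string:

completion = "the\t0༗\nword\t7༗\n"
print(completion.strip("\n").split("༗\n"))
# ['the\t0', 'word\t7༗']  <- the trailing ༗ survives, so the last activation "7༗" parses as invalid
print(completion.strip("\n").strip("༗\n").split("༗\n"))
# ['the\t0', 'word\t7']   <- with the fix, the last activation is clean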
Line breaks are fairly common. How often do we get cases where GPT doesn't use the special character and the text doesn't contain \n?
predicted_activation = token_line.split("\t")[1]
if predicted_activation not in VALID_ACTIVATION_TOKENS:
predicted_activation_split = token_line.split("\t")
# Ensure token line has correct size after splitting. If not then assume it's a zero.
feels like there's a better way to do this, since I imagine tabs aren't rare. maybe we could split on tabs and take the last element in the list? and then if there's a problem with the result, it will be caught by the activation parsing code below?
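Something along these lines (a sketch of the suggestion, with a made-up token line):

token_line = "a\ttoken\twith\ttabs\t7"
# Take the last tab-separated field as the activation; if the token itself
# contains tabs, everything before the last tab is the token text.
predicted_activation = token_line.split("\t")[-1]  # "7"
# A garbled last field would then be rejected by the activation parsing below.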
self.explanation,
)
response = await self.api_client.make_request(
    messages=prompt, max_tokens=1000, temperature=0
let's make max_tokens = 2000 a constant and have the above make_request call use it too
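i.e. something along these lines (the constant name is an assumption, and this fragment mirrors the diff context above rather than standing alone):

MAX_SIMULATION_TOKENS = 2000

response = await self.api_client.make_request(
    messages=prompt, max_tokens=MAX_SIMULATION_TOKENS, temperature=0
)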
This rewrites the new chat-based simulator to use JSON mode. New prompts and parsers were added to do this, plus a new json_mode flag for ApiClient. It works pretty well - in my testing it performed much more accurately than non-JSON mode, to the point where we are able to use gpt-3.5-turbo-1106 instead of gpt-4, resulting in massive cost and time savings. GPT-4 took about 30 seconds; gpt-3.5-turbo-1106 takes about 10 seconds. JSON mode also eliminates the need for many of the response parsing edge cases. This also sets temperature = 0 as originally intended by the documentation.
This pull request also fixes other edge cases, some of which apply to JSON mode as well:
- GPT's response, which is a list of tokens and activations, often omits the space before tokens (seen in roughly 40% of results). Currently the response parser considers this an invalid response and returns zero activations for all tokens. This PR allows the first token to be missing the space and still be considered valid.
- The new simulator uses the special character ༗\n as a unique separator between lines (since \n is too common). However, GPT sometimes (~5% of the time) doesn't return ༗\n and only returns \n, which causes the response parser to consider the response invalid. This PR allows \n to be the separator in the case that ༗\n doesn't work. However, if an activation text hits this edge case and also contains \n, this fix won't work either. A better future fix is to re-query GPT as a followup, e.g. "it looks like you didn't include the ༗\n separator i originally included. can you try again?"
- The <|endoftext|> token in activation texts confuses the new chat-based simulator. Fix is to replace these occurrences with <|not_endoftext|>.
- GPT sometimes gives a non-int activation (like 9.5 - it's never told that it needs to be an int). Since this allows more granularity it makes sense to allow it, so this PR changes int to float and enforces a value of 0 to 10 inclusive. Everything else is considered 0 (see the sketch below).
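A minimal sketch of that float handling (the function name is hypothetical):

def parse_simulated_activation(raw) -> float:
    # Accept non-integer activations like 9.5; anything unparseable or
    # outside the inclusive range [0, 10] is treated as 0, per this PR.
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return 0.0
    return value if 0.0 <= value <= 10.0 else 0.0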
Some of these are fairly opinionated fixes, so feel free to exclude or alter them in any way you see fit.