Rewrite new simulator to use JSON mode; additional fixes to new simulator #34
Conversation
…ken is <|endoftext|>, and also fix bug when activation is outside of expected range of 0 to 10
…t of simulator (should be handled by preprocessing outside of library in case GPT bugs are fixed), add better debug logging
nice! just some nits
normalized_activations = normalize_activations(
    activation_record.activations, max_activation=max_activation
)
return json.dumps({
do you think it would help to add indent=2 (or 4) here? maybe gpt3.5 is better at parsing indented blocks of json rather than having everything on the same line.
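For reference, a quick sketch of what that suggestion would change (the record contents are made up):

import json

record = {"token": " seminal", "activation": 10}
print(json.dumps(record))
# {"token": " seminal", "activation": 10}
print(json.dumps(record, indent=2))
# {
#   "token": " seminal",
#   "activation": 10
# }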
Could help and definitely worth testing. IMO I haven't seen GPT have trouble parsing unindented JSON, though I can't prove it.
The main problem we ran into is GPT gets confused by non-ascii characters (it stops generating abruptly, or it will add extra non-existent tokens). For example, the bullet point, ellipsis, pound symbol, etc. I should open a separate issue for this but it's not really fixable by this repo.
But I think OpenAI is aware and trying to fix:
https://community.openai.com/t/gpt-4-1106-preview-is-not-generating-utf-8/482839/6
https://community.openai.com/t/gpt-4-1106-preview-messes-up-function-call-parameters-encoding/478500/36?page=2
In the meantime our workaround is to double-escape non-ascii chars BEFORE we feed it to automated-interpretability:
hijohnnylin/neuronpedia-scorer@21c07d8
E.g., "\u2022" becomes "\\u2022"
I decided to put those "pre-processing" changes outside of this repo, since it's a temporary workaround until OpenAI fixes it - but lmk if you think it should be here instead. Can also make it an additional flag like replace_non_ascii or something.
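For illustration, a minimal sketch of that kind of double-escaping (the helper name replace_non_ascii just reuses the suggested flag name, and this assumes all tokens stay within the Basic Multilingual Plane):

def replace_non_ascii(text: str) -> str:
    # Replace each non-ascii character with its literal \uXXXX escape,
    # e.g. "•" becomes the six ascii characters "\u2022", so GPT only
    # ever sees plain ascii in the prompt.
    return "".join(ch if ord(ch) < 128 else f"\\u{ord(ch):04x}" for ch in text)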
if it seems like GPT doesn't have trouble with unindented JSON then it's probably fine to leave it. in terms of the non-ascii characters, how prevalent is this problem? i.e. if you try to use the simulator (either in plaintext or json mode) how often does it bork the result?
Apologies for delay - have been working on other parts of NP.
It borks it quite frequently on non-ascii in json mode (not sure about plaintext). I started out by special casing each character, but after 5-6 special cases it was apparent that it needed to all be excluded (that's also when I found the OpenAI community threads).
Here's a code sample to reproduce the issue. Run with the latest api_client that supports json mode. GPT ends up returning truncated json that's unparseable.
import os
import asyncio
import json
os.environ["OPENAI_API_KEY"] = "YOUR_KEY"
from neuron_explainer.api_client import ApiClient
api_client = ApiClient(model_name="gpt-3.5-turbo-1106", max_concurrent=1)
to_send = {
    "neuron": 3,
    "explanation": "'protect', 'know', 'with' and 'save'",
    "activations": [
        {"token": "hello", "activation": None},
        {"token": "hello", "activation": None},
        {"token": "hello", "activation": None},
        {"token": "hello", "activation": None},
        {"token": "hello", "activation": None},
        {"token": "hello", "activation": None},
        {"token": " …", "activation": None},
        {"token": " \u2022", "activation": None},
        {"token": " £", "activation": None},
    ],
}
prompt = [
{
"role": "system",
"content": "We're studying neurons in a neural network. Each neuron looks for some particular thing in a short document. Look at an explanation of what the neuron does, and try to predict its activations on a particular token.\n\nFor each sequence, you will see the tokens in the sequence where the activations are left blank. You will print, in valid json, the exact same tokens verbatim, but with the activation values filled in according to the explanation.\nFill out the activation values from 0 to 10. Most activations will be 0.\n",
},
{
"role": "user",
"content": '{"neuron": 1, "explanation": "language related to something being groundbreaking", "activations": [{"token": "The", "activation": None}, {"token": " editors", "activation": None}, {"token": " of", "activation": None}, {"token": " Bi", "activation": None}, {"token": "opol", "activation": None}, {"token": "ym", "activation": None}, {"token": "ers", "activation": None}, {"token": " are", "activation": None}, {"token": " delighted", "activation": None}, {"token": " to", "activation": None}, {"token": " present", "activation": None}, {"token": " the", "activation": None}, {"token": " ", "activation": None}, {"token": "201", "activation": None}, {"token": "8", "activation": None}, {"token": " Murray", "activation": None}, {"token": " Goodman", "activation": None}, {"token": " Memorial", "activation": None}, {"token": " Prize", "activation": None}, {"token": " to", "activation": None}, {"token": " Professor", "activation": None}, {"token": " David", "activation": None}, {"token": " N", "activation": None}, {"token": ".", "activation": None}, {"token": " Ber", "activation": None}, {"token": "atan", "activation": None}, {"token": " in", "activation": None}, {"token": " recognition", "activation": None}, {"token": " of", "activation": None}, {"token": " his", "activation": None}, {"token": " seminal", "activation": None}, {"token": " contributions", "activation": None}, {"token": " to", "activation": None}, {"token": " bi", "activation": None}, {"token": "oph", "activation": None}, {"token": "ysics", "activation": None}, {"token": " and", "activation": None}, {"token": " their", "activation": None}, {"token": " impact", "activation": None}, {"token": " on", "activation": None}, {"token": " our", "activation": None}, {"token": " understanding", "activation": None}, {"token": " of", "activation": None}, {"token": " charge", "activation": None}, {"token": " transport", "activation": None}, {"token": " in", "activation": None}, {"token": " biom", "activation": None}, {"token": "olecules", "activation": None}, {"token": ".\\n\\n", "activation": None}, {"token": "In", "activation": None}, {"token": "aug", "activation": None}, {"token": "ur", "activation": None}, {"token": "ated", "activation": None}, {"token": " in", "activation": None}, {"token": " ", "activation": None}, {"token": "200", "activation": None}, {"token": "7", "activation": None}, {"token": " in", "activation": None}, {"token": " honor", "activation": None}, {"token": " of", "activation": None}, {"token": " the", "activation": None}, {"token": " Bi", "activation": None}, {"token": "opol", "activation": None}, {"token": "ym", "activation": None}, {"token": "ers", "activation": None}, {"token": " Found", "activation": None}, {"token": "ing", "activation": None}, {"token": " Editor", "activation": None}, {"token": ",", "activation": None}, {"token": " the", "activation": None}, {"token": " prize", "activation": None}, {"token": " is", "activation": None}, {"token": " awarded", "activation": None}, {"token": " for", "activation": None}, {"token": " outstanding", "activation": None}, {"token": " accomplishments", "activation": None}]}',
},
{
"role": "assistant",
"content": '{"neuron": 1, "explanation": "language related to something being groundbreaking", "activations": [{"token": "The", "activation": 0}, {"token": " editors", "activation": 0}, {"token": " of", "activation": 0}, {"token": " Bi", "activation": 0}, {"token": "opol", "activation": 0}, {"token": "ym", "activation": 0}, {"token": "ers", "activation": 0}, {"token": " are", "activation": 0}, {"token": " delighted", "activation": 0}, {"token": " to", "activation": 0}, {"token": " present", "activation": 0}, {"token": " the", "activation": 0}, {"token": " ", "activation": 0}, {"token": "201", "activation": 0}, {"token": "8", "activation": 0}, {"token": " Murray", "activation": 0}, {"token": " Goodman", "activation": 0}, {"token": " Memorial", "activation": 0}, {"token": " Prize", "activation": 0}, {"token": " to", "activation": 0}, {"token": " Professor", "activation": 0}, {"token": " David", "activation": 0}, {"token": " N", "activation": 0}, {"token": ".", "activation": 0}, {"token": " Ber", "activation": 0}, {"token": "atan", "activation": 0}, {"token": " in", "activation": 0}, {"token": " recognition", "activation": 0}, {"token": " of", "activation": 0}, {"token": " his", "activation": 0}, {"token": " seminal", "activation": 10}, {"token": " contributions", "activation": 0}, {"token": " to", "activation": 0}, {"token": " bi", "activation": 0}, {"token": "oph", "activation": 0}, {"token": "ysics", "activation": 0}, {"token": " and", "activation": 0}, {"token": " their", "activation": 0}, {"token": " impact", "activation": 0}, {"token": " on", "activation": 0}, {"token": " our", "activation": 0}, {"token": " understanding", "activation": 0}, {"token": " of", "activation": 0}, {"token": " charge", "activation": 0}, {"token": " transport", "activation": 0}, {"token": " in", "activation": 0}, {"token": " biom", "activation": 0}, {"token": "olecules", "activation": 0}, {"token": ".\\n\\n", "activation": 0}, {"token": "In", "activation": 0}, {"token": "aug", "activation": 0}, {"token": "ur", "activation": 0}, {"token": "ated", "activation": 0}, {"token": " in", "activation": 0}, {"token": " ", "activation": 0}, {"token": "200", "activation": 0}, {"token": "7", "activation": 0}, {"token": " in", "activation": 0}, {"token": " honor", "activation": 0}, {"token": " of", "activation": 0}, {"token": " the", "activation": 0}, {"token": " Bi", "activation": 0}, {"token": "opol", "activation": 0}, {"token": "ym", "activation": 0}, {"token": "ers", "activation": 0}, {"token": " Found", "activation": 0}, {"token": "ing", "activation": 1}, {"token": " Editor", "activation": 0}, {"token": ",", "activation": 0}, {"token": " the", "activation": 0}, {"token": " prize", "activation": 0}, {"token": " is", "activation": 0}, {"token": " awarded", "activation": 0}, {"token": " for", "activation": 0}, {"token": " outstanding", "activation": 0}, {"token": " accomplishments", "activation": 0}]}',
},
{
"role": "user",
"content": '{"neuron": 2, "explanation": "the word \\u201cvariant\\u201d and other words with the same \\u201dvari\\u201d root", "activations": [{"token": "{\\"", "activation": None}, {"token": "widget", "activation": None}, {"token": "Class", "activation": None}, {"token": "\\":\\"", "activation": None}, {"token": "Variant", "activation": None}, {"token": "Matrix", "activation": None}, {"token": "Widget", "activation": None}, {"token": "\\",\\"", "activation": None}, {"token": "back", "activation": None}, {"token": "order", "activation": None}, {"token": "Message", "activation": None}, {"token": "\\":\\"", "activation": None}, {"token": "Back", "activation": None}, {"token": "ordered", "activation": None}, {"token": "\\",\\"", "activation": None}, {"token": "back", "activation": None}, {"token": "order", "activation": None}, {"token": "Message", "activation": None}, {"token": "Single", "activation": None}, {"token": "Variant", "activation": None}, {"token": "\\":\\"", "activation": None}, {"token": "This", "activation": None}, {"token": " item", "activation": None}, {"token": " is", "activation": None}, {"token": " back", "activation": None}, {"token": "ordered", "activation": None}, {"token": ".\\",\\"", "activation": None}, {"token": "ordered", "activation": None}, {"token": "Selection", "activation": None}, {"token": "\\":", "activation": None}, {"token": "true", "activation": None}, {"token": ",\\"", "activation": None}, {"token": "product", "activation": None}, {"token": "Variant", "activation": None}, {"token": "Id", "activation": None}, {"token": "\\":", "activation": None}, {"token": "0", "activation": None}, {"token": ",\\"", "activation": None}, {"token": "variant", "activation": None}, {"token": "Id", "activation": None}, {"token": "Field", "activation": None}, {"token": "\\":\\"", "activation": None}, {"token": "product", "activation": None}, {"token": "196", "activation": None}, {"token": "39", "activation": None}, {"token": "_V", "activation": None}, {"token": "ariant", "activation": None}, {"token": "Id", "activation": None}, {"token": "\\",\\"", "activation": None}, {"token": "back", "activation": None}, {"token": "order", "activation": None}, {"token": "To", "activation": None}, {"token": "Message", "activation": None}, {"token": "Single", "activation": None}, {"token": "Variant", "activation": None}, {"token": "\\":\\"", "activation": None}, {"token": "This", "activation": None}, {"token": " item", "activation": None}, {"token": " is", "activation": None}, {"token": " back", "activation": None}, {"token": "ordered", "activation": None}, {"token": " and", "activation": None}, {"token": " is", "activation": None}, {"token": " expected", "activation": None}, {"token": " by", "activation": None}, {"token": " {", "activation": None}, {"token": "0", "activation": None}, {"token": "}.", "activation": None}, {"token": "\\",\\"", "activation": None}, {"token": "low", "activation": None}, {"token": "Price", "activation": None}, {"token": "\\":", "activation": None}, {"token": "999", "activation": None}, {"token": "9", "activation": None}, {"token": ".", "activation": None}, {"token": "0", "activation": None}, {"token": ",\\"", "activation": None}, {"token": "attribute", "activation": None}, {"token": "Indexes", "activation": None}, {"token": "\\":[", "activation": None}, {"token": "],\\"", "activation": None}, {"token": "productId", "activation": None}, {"token": "\\":", "activation": None}, {"token": "196", "activation": None}, {"token": "39", "activation": None}, {"token": 
",\\"", "activation": None}, {"token": "price", "activation": None}, {"token": "V", "activation": None}, {"token": "ariance", "activation": None}, {"token": "\\":", "activation": None}, {"token": "true", "activation": None}, {"token": ",\\"", "activation": None}]}',
},
{
"role": "assistant",
"content": '{"neuron": 2, "explanation": "the word \\u201cvariant\\u201d and other words with the same \\u201dvari\\u201d root", "activations": [{"token": "{\\"", "activation": 0}, {"token": "widget", "activation": 0}, {"token": "Class", "activation": 0}, {"token": "\\":\\"", "activation": 0}, {"token": "Variant", "activation": 6}, {"token": "Matrix", "activation": 0}, {"token": "Widget", "activation": 0}, {"token": "\\",\\"", "activation": 0}, {"token": "back", "activation": 0}, {"token": "order", "activation": 0}, {"token": "Message", "activation": 0}, {"token": "\\":\\"", "activation": 0}, {"token": "Back", "activation": 0}, {"token": "ordered", "activation": 0}, {"token": "\\",\\"", "activation": 0}, {"token": "back", "activation": 0}, {"token": "order", "activation": 0}, {"token": "Message", "activation": 0}, {"token": "Single", "activation": 0}, {"token": "Variant", "activation": 0}, {"token": "\\":\\"", "activation": 0}, {"token": "This", "activation": 0}, {"token": " item", "activation": 0}, {"token": " is", "activation": 0}, {"token": " back", "activation": 0}, {"token": "ordered", "activation": 0}, {"token": ".\\",\\"", "activation": 0}, {"token": "ordered", "activation": 0}, {"token": "Selection", "activation": 0}, {"token": "\\":", "activation": 0}, {"token": "true", "activation": 0}, {"token": ",\\"", "activation": 0}, {"token": "product", "activation": 0}, {"token": "Variant", "activation": 0}, {"token": "Id", "activation": 0}, {"token": "\\":", "activation": 0}, {"token": "0", "activation": 0}, {"token": ",\\"", "activation": 0}, {"token": "variant", "activation": 0}, {"token": "Id", "activation": 0}, {"token": "Field", "activation": 0}, {"token": "\\":\\"", "activation": 0}, {"token": "product", "activation": 0}, {"token": "196", "activation": 0}, {"token": "39", "activation": 0}, {"token": "_V", "activation": 0}, {"token": "ariant", "activation": 0}, {"token": "Id", "activation": 0}, {"token": "\\",\\"", "activation": 0}, {"token": "back", "activation": 0}, {"token": "order", "activation": 0}, {"token": "To", "activation": 0}, {"token": "Message", "activation": 0}, {"token": "Single", "activation": 0}, {"token": "Variant", "activation": 0}, {"token": "\\":\\"", "activation": 0}, {"token": "This", "activation": 0}, {"token": " item", "activation": 0}, {"token": " is", "activation": 0}, {"token": " back", "activation": 0}, {"token": "ordered", "activation": 0}, {"token": " and", "activation": 0}, {"token": " is", "activation": 0}, {"token": " expected", "activation": 0}, {"token": " by", "activation": 0}, {"token": " {", "activation": 0}, {"token": "0", "activation": 0}, {"token": "}.", "activation": 0}, {"token": "\\",\\"", "activation": 0}, {"token": "low", "activation": 0}, {"token": "Price", "activation": 0}, {"token": "\\":", "activation": 0}, {"token": "999", "activation": 0}, {"token": "9", "activation": 0}, {"token": ".", "activation": 0}, {"token": "0", "activation": 0}, {"token": ",\\"", "activation": 0}, {"token": "attribute", "activation": 0}, {"token": "Indexes", "activation": 0}, {"token": "\\":[", "activation": 0}, {"token": "],\\"", "activation": 0}, {"token": "productId", "activation": 0}, {"token": "\\":", "activation": 0}, {"token": "196", "activation": 0}, {"token": "39", "activation": 0}, {"token": ",\\"", "activation": 0}, {"token": "price", "activation": 0}, {"token": "V", "activation": 0}, {"token": "ariance", "activation": 1}, {"token": "\\":", "activation": 0}, {"token": "true", "activation": 0}, {"token": ",\\"", "activation": 0}]}',
},
{
"role": "user",
"content": json.dumps(to_send),
},
]
print("activations length sent to GPT: " + str(len(to_send["activations"])))
print("activation tokens sent: ")
print([activation["token"] for activation in to_send["activations"]])
async def run():
    response = await api_client.make_request(
        messages=prompt, max_tokens=2000, temperature=0, json_mode=True
    )
    choice = response["choices"][0]
    completion = choice["message"]["content"]
    print("received string: " + completion)
    received_json = json.loads(completion)
    print(
        "activations length received from GPT: "
        + str(len(received_json["activations"]))
    )
    print([activation["token"] for activation in received_json["activations"]])

asyncio.run(run())
You should get the following output and error:
activations length sent to GPT: 9
activation tokens sent:
['hello', 'hello', 'hello', 'hello', 'hello', 'hello', ' …', ' •', ' £']
received string: {"neuron": 3, "explanation": "'protect', 'know', 'with' and 'save'", "activations": [{"token": "hello", "activation": 0}, {"token": "hello", "activation": 0}, {"token": "hello", "activation": 0}, {"token": "hello", "activation": 0}, {"token": "hello", "activation": 0}, {"token": "hello", "activation": 0}, {"token": " \\u2026", "activation": 0}, {"token": " \", "
Traceback (most recent call last):
File "/Users/johnnylin/neuronpedia-scorer/src/test-json-ellipses.py", line 75, in <module>
asyncio.run(run())
File "/Users/johnnylin/.pyenv/versions/3.10.0/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Users/johnnylin/.pyenv/versions/3.10.0/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete
return future.result()
File "/Users/johnnylin/neuronpedia-scorer/src/test-json-ellipses.py", line 67, in run
received_json = json.loads(completion)
File "/Users/johnnylin/.pyenv/versions/3.10.0/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/Users/johnnylin/.pyenv/versions/3.10.0/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/johnnylin/.pyenv/versions/3.10.0/lib/python3.10/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 465 (char 464)
Here's another reproducible example - replace to_send's activations with the array below. We give GPT 63 activations and only receive 62 back - the ellipsis symbol (…) token and activation are missing from the response. In this version of the bug, GPT just silently eliminates the token.
"activations": [
{"token": " on", "activation": None},
{"token": " some", "activation": None},
{"token": " days", "activation": None},
{"token": " we", "activation": None},
{"token": " post", "activation": None},
{"token": " an", "activation": None},
{"token": " afternoon", "activation": None},
{"token": " story", "activation": None},
{"token": " at", "activation": None},
{"token": " around", "activation": None},
{"token": " 2", "activation": None},
{"token": " PM", "activation": None},
{"token": ".", "activation": None},
{"token": " After", "activation": None},
{"token": " every", "activation": None},
{"token": " new", "activation": None},
{"token": " story", "activation": None},
{"token": " we", "activation": None},
{"token": " send", "activation": None},
{"token": " out", "activation": None},
{"token": " an", "activation": None},
{"token": " alert", "activation": None},
{"token": " to", "activation": None},
{"token": " our", "activation": None},
{"token": " e", "activation": None},
{"token": "-", "activation": None},
{"token": "mail", "activation": None},
{"token": " list", "activation": None},
{"token": " and", "activation": None},
{"token": " our", "activation": None},
{"token": " FB", "activation": None},
{"token": " page", "activation": None},
{"token": ".", "activation": None},
{"token": "\n", "activation": None},
{"token": "\n", "activation": None},
{"token": "Learn", "activation": None},
{"token": " about", "activation": None},
{"token": " Scientology", "activation": None},
{"token": " with", "activation": None},
{"token": " our", "activation": None},
{"token": " numerous", "activation": None},
{"token": " series", "activation": None},
{"token": " with", "activation": None},
{"token": " experts", "activation": None},
{"token": "…", "activation": None},
{"token": "\n", "activation": None},
{"token": "\n", "activation": None},
{"token": "BL", "activation": None},
{"token": "OG", "activation": None},
{"token": "G", "activation": None},
{"token": "ING", "activation": None},
{"token": " DI", "activation": None},
{"token": "AN", "activation": None},
{"token": "ET", "activation": None},
{"token": "ICS", "activation": None},
{"token": ":", "activation": None},
{"token": " We", "activation": None},
{"token": " read", "activation": None},
{"token": " Scientology", "activation": None},
{"token": "��", "activation": None},
{"token": "s", "activation": None},
{"token": " founding", "activation": None},
{"token": " text", "activation": None},
]
You should see the following output, which shows the incorrect length of GPT's response.
activations length sent to GPT: 63
activation tokens sent:
[' on', ' some', ' days', ' we', ' post', ' an', ' afternoon', ' story', ' at', ' around', ' 2', ' PM', '.', ' After', ' every', ' new', ' story', ' we', ' send', ' out', ' an', ' alert', ' to', ' our', ' e', '-', 'mail', ' list', ' and', ' our', ' FB', ' page', '.', '\n', '\n', 'Learn', ' about', ' Scientology', ' with', ' our', ' numerous', ' series', ' with', ' experts', '…', '\n', '\n', 'BL', 'OG', 'G', 'ING', ' DI', 'AN', 'ET', 'ICS', ':', ' We', ' read', ' Scientology', '��', 's', ' founding', ' text']
activations length received from GPT: 62
[' on', ' some', ' days', ' we', ' post', ' an', ' afternoon', ' story', ' at', ' around', ' 2', ' PM', '.', ' After', ' every', ' new', ' story', ' we', ' send', ' out', ' an', ' alert', ' to', ' our', ' e', '-', 'mail', ' list', ' and', ' our', ' FB', ' page', '.', '\n', '\n', 'Learn', ' about', ' Scientology', ' with', ' our', ' numerous', ' series', ' with', ' experts', '\n', '\n', 'BL', 'OG', 'G', 'ING', ' DI', 'AN', 'ET', 'ICS', ':', ' We', ' read', ' Scientology', '\\ufffd\\ufffd', 's', ' founding', ' text']
no worries about the delay and thanks for the thorough instructions on reproducing the problem! in that case, I think it would be great to put the non-ascii preprocessing code in this file and add a flag to enable it like you suggested 🙏
return zero_prediction
predicted_activations = []
# check that there is a token and activation value
# no need to double check the token matches exactly
where's the first check?
I think I should have placed this comment one line lower. If you mean the token and activation value check:
token check is line 665 if "token" not in activation:
activation check is line 669 if "activation" not in activation:
Or do you mean a different check?
If I'm understanding the code correctly it looks like you only check that the number of tokens is as expected, but you don't check that any of the tokens individually are correct. is that right?
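For concreteness, a sketch of the kind of per-token check being discussed (reusing the to_send / received_json names from the repro script above):

sent_tokens = [a["token"] for a in to_send["activations"]]
received_tokens = [a["token"] for a in received_json["activations"]]
# Flag any position where GPT echoed back a different token than it was sent.
mismatches = [
    (i, sent, received)
    for i, (sent, received) in enumerate(zip(sent_tokens, received_tokens))
    if sent != received
]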
zero_prediction = [0] * len(tokens)
token_lines = completion.strip("\n").split("༗\n")
# FIX: Strip the last ༗\n, otherwise all last activations are invalid
good catch! let's remove the FIX, I don't think it makes sense out of context of the PR :)
token_lines = completion.strip("\n").split("༗\n")
# FIX: Strip the last ༗\n, otherwise all last activations are invalid
token_lines = completion.strip("\n").strip("༗\n").split("༗\n")
# Edge Case #2: Sometimes GPT doesn't use the special character when it answers, it only uses the \n"
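To see why the extra strip matters, a quick illustration with a made-up completion string:

completion = "the\t0༗\nword\t7༗\n"
print(completion.strip("\n").split("༗\n"))
# ['the\t0', 'word\t7༗']  <- the trailing ༗ survives, so the last activation "7༗" parses as invalid
print(completion.strip("\n").strip("༗\n").split("༗\n"))
# ['the\t0', 'word\t7']   <- with the fix, the last activation is clean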
Line breaks are fairly common. How often do we get cases where GPT doesn't use the special character and the text doesn't contain \n?
predicted_activation = token_line.split("\t")[1]
if predicted_activation not in VALID_ACTIVATION_TOKENS:
predicted_activation_split = token_line.split("\t")
# Ensure token line has correct size after splitting. If not then assume it's a zero.
feels like there's a better way to do this, since I imagine tabs aren't rare. maybe we could split on tabs and take the last element in the list? and then if there's a problem with the result, it will be caught by the activation parsing code below?
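Something along these lines (a sketch of the suggestion, with a made-up token line):

token_line = "a\ttoken\twith\ttabs\t7"
# Take the last tab-separated field as the activation; if the token itself
# contains tabs, everything before the last tab is the token text.
predicted_activation = token_line.split("\t")[-1]  # "7"
# A garbled last field would then be rejected by the activation parsing below.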
self.explanation,
)
response = await self.api_client.make_request(
    messages=prompt, max_tokens=1000, temperature=0
let's make max_tokens = 2000 a constant and have the above make_request call use it too
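i.e. something along these lines (the constant name is an assumption, and this fragment mirrors the diff context above rather than standing alone):

MAX_SIMULATION_TOKENS = 2000

response = await self.api_client.make_request(
    messages=prompt, max_tokens=MAX_SIMULATION_TOKENS, temperature=0
)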
This rewrites the new chat-based simulator to use JSON mode. New prompts and parsers were added to do this, plus a new json_mode flag for ApiClient. It works pretty well - in my testing it performed much more accurately than non-JSON mode, to the point where we are able to use gpt-3.5-turbo-1106 instead of gpt-4, resulting in massive cost and time savings. GPT-4 took about 30 seconds; gpt-3.5-turbo-1106 takes about 10 seconds. JSON mode also eliminates the need for many of the response parsing edge cases. This also sets temperature = 0 as originally intended by the documentation.
This pull request also fixes other edge cases, some of which apply to JSON mode as well:
- GPT's response, which is a list of tokens and activations, often omits the space before tokens (seen in roughly 40% of results). Currently the response parser considers this an invalid response and returns zero activations for all tokens. This PR allows the first token to be missing the space and still be considered valid.
- The new simulator uses the special character ༗\n as a unique separator between lines (since \n is too common). However, GPT sometimes (~5% of the time) doesn't return ༗\n and only returns \n, which causes the response parser to consider the response invalid. This PR allows \n to be the separator in the case that ༗\n doesn't work. However, if an activation text hits this edge case and also contains \n, this fix won't work either. A better future fix is to re-query GPT as a followup, e.g. "it looks like you didn't include the ༗\n separator i originally included. can you try again?"
- The <|endoftext|> token in activation texts confuses the new chat-based simulator. Fix is to replace these occurrences with <|not_endoftext|>.
- GPT sometimes gives a non-int activation (like 9.5 - it's never told that it needs to be an int). Since this allows more granularity it makes sense to allow it, so this PR changes int to float and enforces a value of 0 to 10 inclusive. Everything else is considered 0 (see the sketch below).
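A minimal sketch of that float handling (the function name is hypothetical):

def parse_simulated_activation(raw) -> float:
    # Accept non-integer activations like 9.5; anything unparseable or
    # outside the inclusive range [0, 10] is treated as 0, per this PR.
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return 0.0
    return value if 0.0 <= value <= 10.0 else 0.0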
Some of these are fairly opinionated fixes, so feel free to exclude or alter them in any way you see fit.