-
I've thought about this a bit. It could be useful to look at how the probability of the answer token changes as a function of the chain of thought. For example, when asked what 2+2 is, if R1 says "oh I think it's 4, wait, let me check, maybe it's 5, no, I think it's 4," does the probability of the "4" token actually change when R1 does this backtracking/reflection with "wait"s? This is measurable with LogitLens.
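For concreteness, here is a minimal sketch of what that measurement could look like with HuggingFace transformers, assuming a Llama/Qwen-style architecture where the final norm lives at `model.model.norm` and the unembedding at `model.lm_head`; the prompt and the " 4" token choice are just placeholders:

```python
# Sketch: track P(" 4") at each CoT position via the logit lens.
# Assumes a Llama/Qwen-style HF model (final norm = model.model.norm,
# unembedding = model.lm_head); adjust attribute names for other architectures.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

cot = "What is 2+2? <think>Oh I think it's 4, wait let me check, maybe it's 5, no it's 4.</think>"
ids = tok(cot, return_tensors="pt").input_ids
answer_id = tok(" 4", add_special_tokens=False).input_ids[0]

with torch.no_grad():
    out = model(ids, output_hidden_states=True)

# Logit lens: take the residual stream at an intermediate layer and push it
# through the final norm + unembedding to read off early next-token predictions.
layer = len(out.hidden_states) - 2                 # e.g. just before the last layer
h = out.hidden_states[layer][0]                    # (seq_len, d_model)
logits = model.lm_head(model.model.norm(h))        # project every position to vocab space
p_answer = logits.softmax(-1)[:, answer_id]        # P(" 4") as the next token, per position

for pos, p in enumerate(p_answer.tolist()):
    print(pos, repr(tok.decode([int(ids[0, pos])])), round(p, 4))
```

Sweeping `layer` over all intermediate layers would show whether the backtracking changes the answer's probability deep in the network or only at the output.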
-
This is a very rough update on this project: the data collection is still ongoing.

Next steps:
-
For the first visualization, what would it look like if sorted by something other than question ID? For example, sorting by the values for the different colors?
-
It might be worth framing this within the chain-of-thought faithfulness literature (https://arxiv.org/abs/2307.13702, https://arxiv.org/pdf/2305.04388), which considers very similar questions: e.g., Lanham et al. interrupt the model's generation, ask it to answer the question directly, and study how this probability changes through the generation. There is also this paper from Prof. Lakkaraju's lab with mixed results on using probing or finetuning to increase faithfulness (https://arxiv.org/abs/2406.10625), though I don't believe they tested reasoning models. I think it might be cool to investigate mechanistic questions related to 'pivotal tokens', steps in the reasoning process where the probability of obtaining the right answer after resampling increases within a few steps. If the model responds to a math question with tokens y_1 through y_T, then a pivotal token is some index t' such that resampling conditioned on y_1 through y_{t'} produces the correct answer with small probability, while resampling conditioned on y_1 through y_{t'+1} produces it with large probability. They were used to improve the model in the Phi-4 technical report (https://arxiv.org/abs/2412.08905v1, https://arxiv.org/abs/2411.19943), and I have some work on these as well (https://arxiv.org/abs/2502.00921). I think it could be interesting to see mechanistically what these pivotal tokens correspond to. Can we see this narrowing down of final answers through probing?
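A rough resampling sketch for locating such pivotal tokens might look like the following; the model choice, the `\boxed{}` answer-extraction convention, the probe stride, and the 0.5 jump threshold are all illustrative assumptions, not anything prescribed by the papers above:

```python
# Sketch: locate "pivotal tokens" by resampling completions from successive
# CoT prefixes and watching where P(correct answer) jumps.
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

def answer_of(text: str) -> str | None:
    # Illustrative convention: the final answer appears as "\boxed{...}".
    m = re.findall(r"\\boxed\{(.+?)\}", text)
    return m[-1] if m else None

def p_correct(prefix_ids: torch.Tensor, correct: str, n: int = 8) -> float:
    """Fraction of n resampled continuations of this prefix that reach `correct`."""
    outs = model.generate(prefix_ids, do_sample=True, temperature=0.6,
                          max_new_tokens=512, num_return_sequences=n,
                          pad_token_id=tok.eos_token_id)
    texts = tok.batch_decode(outs[:, prefix_ids.shape[1]:], skip_special_tokens=True)
    return sum(answer_of(t) == correct for t in texts) / n

question = "..."            # some math question (placeholder)
correct = "4"               # its known answer (placeholder)
full_ids = tok(question, return_tensors="pt").input_ids

# One greedy rollout whose prefixes we probe (y_1 ... y_T in the notation above).
rollout = model.generate(full_ids, do_sample=False, max_new_tokens=1024,
                         pad_token_id=tok.eos_token_id)

probs = []
for t in range(full_ids.shape[1], rollout.shape[1], 16):   # probe every 16 tokens
    probs.append((t, p_correct(rollout[:, :t], correct)))

# A pivotal index t' is one where P(correct) jumps sharply between consecutive probes.
pivots = [t for (t, p), (_, p_next) in zip(probs, probs[1:]) if p_next - p > 0.5]
print(pivots)
```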
-
Just a quick update on the visualization. I am using the viridis color map: the brighter the color, the higher the probability of the correct answer (if we exit the thinking at that token). In general, there are three interesting patterns:

Productive Reasoning: These are the ideal situations where the reasoning is connected with the model's choice (of the right answer). See HTML visualizations here: https://drive.google.com/drive/folders/1DPCig7Sz3HvQ40Kq6yFrB_X1oOqfjOMX?usp=drive_link

Counterproductive Reasoning: It's certainly unfortunate that the additional reasoning led to the wrong answer, but I do not think this behavior is an anomaly given that the model does not know the correct answer beforehand, and its reasoning did affect its final answer. See this link for the corresponding HTML visualizations: https://drive.google.com/drive/folders/1gTiLaUt0fHKa-6-Te2jHM_0QKaDM5lcg?usp=sharing

Meaningless Effort: See more examples here: https://drive.google.com/drive/folders/1MUMbKV6J4XKOATwiWNB5d-oXDzELr1ru?usp=sharing

Ongoing: I am creating more visualizations like this and will upload them to the folders above. Meanwhile, I am also starting on the probing for this hidden progress.
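For anyone who wants to reproduce this style of thumbnail, here is a minimal matplotlib sketch of the coloring scheme; the per-token probability trace is synthetic, just to show the layout and the viridis mapping, and is not the project's actual rendering code:

```python
# Sketch: render a per-token "probability of correct answer if we exit thinking here"
# trace as a viridis-colored matrix, roughly mirroring the thumbnails described above.
import numpy as np
import matplotlib.pyplot as plt

# Synthetic placeholder trace of P(correct | exit thinking at token t).
p_correct = np.clip(np.cumsum(np.random.randn(600)) * 0.02 + 0.3, 0, 1)

width = 30                                    # tokens per row of the thumbnail
n_rows = int(np.ceil(len(p_correct) / width))
grid = np.full(n_rows * width, np.nan)        # pad the last row with NaN
grid[:len(p_correct)] = p_correct
grid = grid.reshape(n_rows, width)

plt.imshow(grid, cmap="viridis", vmin=0, vmax=1, aspect="auto")
plt.colorbar(label="P(correct answer | exit thinking at this token)")
plt.xlabel("token (within row)")
plt.ylabel("row")
plt.title("Reasoning progress thumbnail (synthetic data)")
plt.show()
```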
-
Research Question
When I asked the DeepSeek models a challenging abstract algebra question, they often generated hundreds of tokens of reasoning before providing the final answer. Yet, on some questions, if I removed the generated reasoning and asked for the answer immediately, the answer was still correct.
On these questions, when the model begins its reasoning with phrases like "Wait, wait, ...", is it actually reasoning through the problem for itself, or is it merely performing for the user? Can we tell from the question's representation whether reasoning is necessary for producing the correct answer? Given the representation of an ongoing CoT, can we tell whether it reflects genuine reasoning or is simply a post-hoc explanation of a pre-planned answer?
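As a concrete illustration of the "answer without reasoning" probe, here is a minimal sketch. It assumes the common trick of pre-filling an empty `<think></think>` block so an R1-distilled model answers immediately; the exact strings may need adjusting to the model's chat template, and the question text is a placeholder:

```python
# Sketch: ask the model the same question with and without its chain of thought,
# by pre-filling an empty <think></think> block to skip the reasoning.
# Depending on the chat-template version, "<think>" may already be appended
# by apply_chat_template; adjust the suffix accordingly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

question = "Which of the following is a group under addition? (A) ... (B) ... (C) ... (D) ..."  # placeholder
prompt = tok.apply_chat_template([{"role": "user", "content": question}],
                                 tokenize=False, add_generation_prompt=True)

with_thinking = prompt                                    # model is free to reason in <think>...</think>
without_thinking = prompt + "<think>\n\n</think>\n\n"     # reasoning block closed immediately

for name, p in [("with thinking", with_thinking), ("no thinking", without_thinking)]:
    ids = tok(p, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=2048, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    print(name, "->", tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)[-200:])
```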
Owner
Yida Chen (@yc015)
Contributors
Martin Wattenberg (@wattenberg), Aghyad Deeb (@aghyad-deeb)
Project status
Ongoing: I am temporarily pausing taking on new collaborators for this project, but it may reopen in the near future!
Current Findings
On the test set of the MMLU benchmark (with questions from abstract_algebra, anatomy, and astronomy), the number of thinking tokens used by DeepSeek-R1-Distill-Qwen-7B varied a lot across repeated samples of the output. Meanwhile, the number of tokens used in the thinking (CoT) has almost no correlation with the correctness of the output (point-biserial correlation ≈ -0.05).
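For reference, that correlation can be computed directly with scipy; the records below are placeholder data standing in for the collected (thinking-token count, correctness) pairs:

```python
# Sketch: point-biserial correlation between CoT length and answer correctness.
from scipy.stats import pointbiserialr

# Each record: (number of thinking tokens, 1 if final answer was correct else 0).
records = [(412, 1), (1305, 0), (287, 1), (956, 1), (2210, 0)]   # placeholder data

n_tokens = [n for n, _ in records]
correct = [c for _, c in records]

r, p_value = pointbiserialr(correct, n_tokens)   # the binary variable goes first
print(f"point-biserial r = {r:.3f}, p = {p_value:.3g}")
```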
On some questions, replacing the content between the `<think>` (beginning of thinking) and `</think>` (end of thinking) tokens with a blank space or a sequence of random tokens has no influence on the model's choice of answer.
Feb 24th:
Distilled a dataset of 4,688 multiple-choice questions with DeepSeek-R1-Distill-Qwen-7B's answers before/after thinking.
Have code that collects the logit of the correct answer throughout the reasoning progress. Currently working on creating visualizations of the reasoning progress. Once this part is done, I will give probing a try and potentially cluster these time series at the same time.
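A simplified sketch of that logit-collection step is below. It reads off the next-token probability of each answer letter at every CoT position from a single forward pass, which is only an approximation of truncating the thinking and re-prompting for an answer; the actual collection code may differ:

```python
# Sketch (not necessarily the exact method used here): per-position next-token
# probability of each answer letter along a generated chain of thought.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

# Token id of each choice letter (with a leading space, as it would follow a prompt).
choice_ids = {c: tok(f" {c}", add_special_tokens=False).input_ids[0] for c in "ABCD"}

def choice_probs_over_cot(prompt_plus_cot: str) -> dict[str, torch.Tensor]:
    """Per-position next-token probability of each choice letter along the CoT."""
    ids = tok(prompt_plus_cot, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0]        # (seq_len, vocab)
    probs = logits.softmax(-1)
    return {c: probs[:, i] for c, i in choice_ids.items()}

# traces["B"][t] ~ how strongly the model leans toward "B" right after token t.
traces = choice_probs_over_cot("Question ... <think> Let me think ...")   # placeholder prompt
```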
March 6th:
See this website https://yc015.github.io/reasoning-progress-viz/ for a collection of visualizations of the distilled Qwen-7B model's reasoning progress on the MMLU dataset.
March 8th:
Now, the visualizations for most of the questions in the first two rows of this website are ready to be viewed.
If you click on any subject, you will be redirected to a gallery of the Qwen-7B's thinking process on the questions from that subject.
How to read these visualizations: Each colored matrix is a thumbnail of the actual thinking process. The thinking tokens are colored using four colors: green indicates the probability that the model outputs the correct choice if we stop the reasoning right at that token, while the other three colors indicate the probabilities of the three other (wrong) choices.
Question 12 thumbnail:
[thumbnail image]
Detailed reasoning visualization for question 12:
[detailed visualization image]
Does the model's internal reasoning outrun its external output? Or does it speak too fast, leaving behind its "thought"?