
How should I understand the token generation process? I'm confused about the current context_length; I thought it was just a fixed parameter. #2

Open · jimmmwong opened this issue Apr 18, 2024 · 3 comments

Comments

@jimmmwong

import torch
from torch.nn import functional as F

# (a method of the model class)
def generate(self, idx, max_new_tokens):
    # idx is a (B, T) array of token indices in the current context
    for _ in range(max_new_tokens):
        # Crop idx to the max size of our positional embeddings table,
        # so the model never sees more than context_length tokens
        idx_crop = idx[:, -self.context_length:]
        # Get predictions
        logits, loss = self(idx_crop)
        # Keep only the last time step; logits has shape (B, T, C)
        logits_last_timestep = logits[:, -1, :]
        # Apply softmax to turn the logits into probabilities
        probs = F.softmax(input=logits_last_timestep, dim=-1)
        # Sample the next token from the probability distribution
        idx_next = torch.multinomial(input=probs, num_samples=1)
        # Append the sampled index idx_next to idx
        idx = torch.cat((idx, idx_next), dim=1)
    return idx

@waylandzhang (Owner)

This part is a bit hard to explain in plain text; you can watch my video posts, if you can read Chinese. ^
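
To make the mechanism concrete: context_length is indeed a fixed hyperparameter (the maximum sequence length the positional embedding table supports). What grows during generation is idx, the sequence produced so far; the slice idx[:, -self.context_length:] just keeps feeding the model the most recent window of it. A minimal standalone sketch of that sliding window, with an assumed context_length of 4 and fake token ids:

import torch

context_length = 4  # fixed hyperparameter; value assumed for illustration
idx = torch.tensor([[10, 11, 12]])  # (B=1, T=3) tokens generated so far

for step in range(3):
    # The model input never exceeds context_length, however long idx gets
    idx_crop = idx[:, -context_length:]
    print(f"step {step}: model sees {idx_crop.tolist()}")
    idx_next = torch.tensor([[100 + step]])  # stand-in for a sampled token
    idx = torch.cat((idx, idx_next), dim=1)

# step 0: model sees [[10, 11, 12]]
# step 1: model sees [[10, 11, 12, 100]]
# step 2: model sees [[11, 12, 100, 101]]

So idx can grow without bound, but the model only ever attends over the last context_length tokens.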

@Gooddz1 commented Apr 24, 2024

How should I understand idx_next = torch.multinomial(input=probs, num_samples=1)? I thought it would take the index of the largest (or smallest) value in probs, but it turns out that's not what happens.

@waylandzhang (Owner)

It does return a token index, but how that index is chosen is defined by the multinomial function. There are several ways to pick it, and it is not necessarily the one with the highest probability.
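
In other words, torch.multinomial treats probs as weights and draws a random index in proportion to them, whereas torch.argmax (greedy decoding) would always return the single most likely token. A small demo with a made-up distribution:

import torch

probs = torch.tensor([[0.1, 0.6, 0.3]])  # made-up probabilities over 3 tokens

greedy = torch.argmax(probs, dim=-1)  # always index 1, the highest probability
samples = [torch.multinomial(probs, num_samples=1).item() for _ in range(10)]
print(greedy.item(), samples)  # e.g. 1 [1, 1, 2, 1, 0, 1, 1, 2, 1, 1]

Sampling keeps the generated text varied; greedy decoding is deterministic and tends to repeat itself.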
