server : parallel decoding and multimodal (cont) #3677

Merged
merged 72 commits into from
Oct 22, 2023
Merged
The diff below shows the changes from 1 of the 72 commits.

Commits:
- 63f99b1 implementing parallel decoding in server example (FSSRepo, Oct 11, 2023)
- 4712302 crash fixed (FSSRepo, Oct 11, 2023)
- 7850421 save dev progress (FSSRepo, Oct 12, 2023)
- b716eeb Merge branch 'master' of https://github.com/ggerganov/llama.cpp (FSSRepo, Oct 12, 2023)
- 29c8cdd refactored sampling function (FSSRepo, Oct 12, 2023)
- 8148480 completion endpoint working (FSSRepo, Oct 12, 2023)
- 5b8e29d multiple client support (FSSRepo, Oct 12, 2023)
- 83c2b35 grammar + no stream completion (FSSRepo, Oct 12, 2023)
- 500ac71 cached prompt support (FSSRepo, Oct 13, 2023)
- 4ba5a50 chat.mjs support cached prompt + some fixes (FSSRepo, Oct 13, 2023)
- 6358ae5 server ui now support multiple clients (FSSRepo, Oct 13, 2023)
- a410a9e unused change reverted (FSSRepo, Oct 13, 2023)
- b6d9e21 fixed timings per slot (FSSRepo, Oct 13, 2023)
- a2c2d98 add context swap (FSSRepo, Oct 13, 2023)
- eb08201 add changes to README.md (FSSRepo, Oct 13, 2023)
- 9d98cdd llava multimodal integration (FSSRepo, Oct 13, 2023)
- de35b47 fixed tokens probs (FSSRepo, Oct 13, 2023)
- 9f72b44 add multimodal input - alfa (FSSRepo, Oct 14, 2023)
- 7e64bfe refactor code + remove unused comments + improved README.md (FSSRepo, Oct 14, 2023)
- 299f6b5 fix compilation errors with llvm (damian0815, Oct 14, 2023)
- 4e5c5c4 notify the user from server ui that multimodality is unavialable (FSSRepo, Oct 14, 2023)
- f47fd17 Merge branch 'ggerganov:master' into master (FSSRepo, Oct 15, 2023)
- 9035978 Merge pull request #6 from damian0815/fssrepo_mac_fixes (FSSRepo, Oct 15, 2023)
- ce961a3 some ci fixes (FSSRepo, Oct 15, 2023)
- b727e02 fix ci make build undefined ref errors (FSSRepo, Oct 15, 2023)
- fd64f04 fix long prompt than ctx proposed in #3639 (FSSRepo, Oct 15, 2023)
- 2d9f11d fixed premature end due stop word (FSSRepo, Oct 16, 2023)
- d7eca25 context shift fixed (FSSRepo, Oct 16, 2023)
- 4d18043 fix llava implementation (FSSRepo, Oct 16, 2023)
- aa2268f sync README.md changes (FSSRepo, Oct 17, 2023)
- fa0f22f Merge remote-tracking branch 'upstream/master' (FSSRepo, Oct 17, 2023)
- 58f8ae9 readme change (FSSRepo, Oct 17, 2023)
- 6c277ea update api like OpenAI (FSSRepo, Oct 17, 2023)
- ed0c11c multimodal support enabled by default (FSSRepo, Oct 17, 2023)
- d2b1fac fix make bui;d errors (FSSRepo, Oct 17, 2023)
- c02c52e fix multiple clients (FSSRepo, Oct 17, 2023)
- 35fd374 fix zig build (FSSRepo, Oct 17, 2023)
- 84b8f2b Merge branch 'ggerganov:master' into master (FSSRepo, Oct 18, 2023)
- 7196c4e new sampling API (FSSRepo, Oct 18, 2023)
- 8540568 Merge branch 'master' of https://github.com/ggerganov/llama.cpp (FSSRepo, Oct 18, 2023)
- ab2fc00 latest changes of sampling API (FSSRepo, Oct 18, 2023)
- e44ed60 server : coding-style normalization (ggerganov, Oct 19, 2023)
- 654e0a1 server : coding-style normalization (part 2) (ggerganov, Oct 19, 2023)
- a8c981b server : remove beam-search functionality (ggerganov, Oct 19, 2023)
- 3d5929e server : bug fix in ingest_images (ggerganov, Oct 19, 2023)
- e3a2c3f server : use refs + use llama_batch_clear() (ggerganov, Oct 19, 2023)
- 9740824 server : snake case (ggerganov, Oct 19, 2023)
- 325d179 server : minor sync (ggerganov, Oct 19, 2023)
- 6b2437e added thread safe pipeline (FSSRepo, Oct 20, 2023)
- 113dd60 server : bach has to be allocated for n_parallel sequences (ggerganov, Oct 20, 2023)
- 5d540e8 server : no need for atomic int - already using mutex (ggerganov, Oct 20, 2023)
- 778c070 server : logs + minor code style (ggerganov, Oct 20, 2023)
- 17b23eb server : fix multibyte handle in partial response (#3706) (jhen0409, Oct 21, 2023)
- 2eb4c11 fix image load + view image in chat (FSSRepo, Oct 21, 2023)
- 176993c Merge branch 'master' into server-rev (ggerganov, Oct 22, 2023)
- 4b4ab72 make : silence stb warnings (ggerganov, Oct 22, 2023)
- 715f384 clip : link to ggml, not to llama (ggerganov, Oct 22, 2023)
- 197a0a9 server : fix switch fallthrough (ggerganov, Oct 22, 2023)
- ef18f4d server : fix crash in Debug on macOS (I have no idea why this fixes i… (ggerganov, Oct 22, 2023)
- 569ebf1 server : refactor ctx_sampling init + n_ctx + names (ggerganov, Oct 22, 2023)
- f67d971 server : bug fix for prompt caching (ggerganov, Oct 22, 2023)
- 5359fb9 Do not save/load image_data to localStorage (monatis, Oct 22, 2023)
- f305d64 editorconfig : new line in index.html (ggerganov, Oct 22, 2023)
- a806317 server : completion requests remember slot_id (ggerganov, Oct 22, 2023)
- 2679c43 Update readme to document multimodal in server (monatis, Oct 22, 2023)
- a4d69d8 Merge branch 'server-rev' of https://github.com//ggerganov/llama.cpp … (monatis, Oct 22, 2023)
- dd1af2e server : minor style (ggerganov, Oct 22, 2023)
- 3d6a687 Update readme to document multimodal in server (monatis, Oct 22, 2023)
- 00ae55b server : hide ctx_sampling->prev behind API (#3696) (ggerganov, Oct 22, 2023)
- 8fe7ca4 server : apply fix from #3722 (ggerganov, Oct 22, 2023)
- 83e1490 server : fix slot reuse (ggerganov, Oct 22, 2023)
- c0f4d54 server : add comment about changing slot_state to bool (ggerganov, Oct 22, 2023)
server : fix crash in Debug on macOS (I have no idea why this fixes it!?)
ggerganov committed Oct 22, 2023
commit ef18f4d579b90344eff045dcc29d15048310e6bc
31 changes: 21 additions & 10 deletions examples/server/server.cpp
```diff
@@ -2440,20 +2440,31 @@ int main(int argc, char **argv)
         {"hostname", sparams.hostname},
         {"port", sparams.port},
     });
-    std::thread t([&llama]()
-    {
-        bool running = true;
-        while (running)
+
+    // run the HTTP server in a thread - see comment below
+    std::thread t([&]()
         {
-            running = llama.update_slots();
-        }
-    }
-    );
+            if (!svr.listen_after_bind())
+            {
+                return 1;
+            }
 
-    if (!svr.listen_after_bind())
+            return 0;
+        });
+
+    // GG: if I put the main loop inside a thread, it crashes on the first request when build in Debug!?
+    //     "Bus error: 10" - this is on macOS, it does not crash on Linux
+    //std::thread t2([&]()
     {
-        return 1;
+        bool running = true;
+        while (running)
+        {
+            running = llama.update_slots();
+        }
     }
+    //);
+
     t.join();
+
     llama_backend_free();
     return 0;
```