Merge with mlc-ai/main (adc6ee6ae2de97a507291aaff6279af4e3d16a83, July 2nd 2024) #272

Merged 494 commits on Jul 8, 2024
fd65973
[Bugfix] layer_norm_eps in GPT2Config should be float (#2240)
rickzx Apr 27, 2024
63a3804
[REFACTOR] Migrate JSONFFIEngine to formal namespace (#2241)
tqchen Apr 27, 2024
1a8bad0
[Serving] Share disco sessions among multiple model function tables (…
vinx13 Apr 28, 2024
5a26795
[DOC] Improve Install via environment variable (#2245)
JackWeiw Apr 29, 2024
3cb2ee8
[Sampler] FlashInfer sampling func integration (#2224)
MasterJH5574 Apr 29, 2024
d3d264d
Model Library Delivery (#2139)
Kartik14 Apr 29, 2024
2489964
[Support] Simplify function names in encoding.h (#2251)
Ubospica Apr 30, 2024
afde65c
[Serving] Introduce DraftTokenWorkspaceManager (#2250)
vinx13 Apr 30, 2024
6a43570
[Fix] fix a typo in event_trace_recorder (#2253)
Kevin-XiongC Apr 30, 2024
ca7cdcc
[Tokenizer] Support ByteLevel BPE in tokenizer token table (#2248)
Ubospica Apr 30, 2024
51391c3
[Eagle] Avoid worker - engine transfer for hidden states (#2256)
vinx13 May 1, 2024
eb4d624
[Serving] Add engine stats for speculative decoding (#2257)
vinx13 May 1, 2024
d206c44
[Serving] Fix lints (#2258)
vinx13 May 1, 2024
9941b4f
[Sampler] Avoid unnecessary sync in GPU verifier (#2260)
vinx13 May 1, 2024
cfd3b2c
Fix typo in token_postproc_method names (#2261)
zifeitong May 1, 2024
8e5af29
[Sampler] Add missing sync in gpu verifier (#2262)
vinx13 May 1, 2024
e756f23
[Model] Remove redundant space in llama2 tokenizer (#2263)
vinx13 May 2, 2024
878be83
[Model] Fix llama2 chat template and remove redundant separator added…
vinx13 May 3, 2024
b310ee1
[Refactor][Serving] EngineConfig refactor and "model-lib-path" rename…
MasterJH5574 May 3, 2024
17fb1c4
[Serving] Add some try-except captures in AsyncMLCEngine (#2265)
yongwww May 3, 2024
b124b0b
[Eagle] Fix token shifting for prefill step (#2266)
vinx13 May 3, 2024
c030660
[Fix] Fix the two-stage softmax func by removing log2e (#2269)
MasterJH5574 May 3, 2024
8d58e52
[Eagle] Fix missing broadcast in hidden states gather/scatter (#2271)
vinx13 May 4, 2024
c166a90
[Sampler] Use pivot-based renormalization for top-p sampling (#2272)
MasterJH5574 May 4, 2024
0ca6b33
[JSONFFI] Update JSONFFI error checking with the Result class (#2275)
MasterJH5574 May 5, 2024
f181ce2
[Bugfix] fix _kv_cache_transpose_append buffer read region error (#2277)
JackWeiw May 5, 2024
23636e5
[GenConfig] Set upper bound for prefill chunk size (#2278)
MasterJH5574 May 5, 2024
6bcd70c
[iOS] Initial scaffolding of MLCEngine in Swift (#2279)
tqchen May 6, 2024
d31941f
Rename READMD.md to README.md
tqchen May 6, 2024
5ae393a
[Serving] Image support in JSONFFIEngine (#2208)
anibohara2000 May 6, 2024
cd09933
[Pass] Attach manual softmax-with-temperature (#2280)
MasterJH5574 May 6, 2024
eb1454f
[Model] Remove unused import to fix lint (#2284)
MasterJH5574 May 6, 2024
44b5675
[Serving] Fix BatchVerify to feed the extra token when fully accepted…
MasterJH5574 May 7, 2024
ec6cc30
Update engine.cc
tqchen May 7, 2024
d01e1fc
[CMAKE][BUILD] Add config option to enable OpenCL Host ptr (#2287)
krishnaraj36 May 7, 2024
0829bcf
[Serving][Fix] Pass draft length when constructing draft action (#2291)
MasterJH5574 May 7, 2024
2306086
[Pass] Fix sampling func attachment to not read existing vocab size (…
MasterJH5574 May 7, 2024
b499d2b
[SLM] Introduce microsoft/Phi-3 (#2222)
mengshyu May 7, 2024
3621bf6
[Eagle] Run additional decode for draft model when all proposals are …
vinx13 May 7, 2024
df4e2f3
[iOS] Introducing package CLI for iOS app packaging (#2297)
MasterJH5574 May 8, 2024
8a31986
Increase the timeout in PopenServer (#2298)
yongwww May 8, 2024
65f9716
[LLM-CHAT] Enable gpu softmax for penality softmax (#2288)
krishnaraj36 May 8, 2024
1bd1ab0
[iOS][REFACTOR] Restructure the iOS folders (#2299)
tqchen May 8, 2024
c580140
[KVCACHE][TIR] Improved tir schedule for decode tir page attention (#…
krishnaraj36 May 8, 2024
10f3e4d
[Sampler] Remove unneeded output_prob_dist param (#2300)
vinx13 May 9, 2024
33c15e7
Enable cuda graph for batch_verify (#2304)
vinx13 May 9, 2024
dbd13f4
[Android] Introducing mlc4j and app packaging (#2305)
MasterJH5574 May 10, 2024
b62dd91
[DOCS] Minor cleanup (#2308)
tqchen May 10, 2024
37230db
[DOCS] Update android doc (#2309)
tqchen May 10, 2024
8bb1d6e
[DOCS] Update android doc (#2310)
tqchen May 10, 2024
459ffe3
[SLM] Support BERT architecture. Implement a text embedding module (#…
rickzx May 10, 2024
ea391de
[Serving] Log batch size in NVTX (#2312)
vinx13 May 10, 2024
b01cfab
[Model] Removing unnecessary reshapes in get_logits (#2314)
vinx13 May 10, 2024
347222c
Skip cublas dispatch for single batch (#2315)
vinx13 May 10, 2024
73b733d
Auto updated submodule references
May 10, 2024
3a0b42c
[DOCS] Remove mention of legacy modules (#2318)
tqchen May 10, 2024
2b8aadf
[Android] Add `-j` option to cmake build (#2321)
MasterJH5574 May 10, 2024
98f0424
[DOCS] More clear android instruction (#2327)
tqchen May 11, 2024
21feb70
[Serving] Refactor to consolidate new request prefill (#2329)
vinx13 May 12, 2024
45a0487
[iOS] Make MLCEngine input to take in structured data (#2330)
tqchen May 12, 2024
679d3a8
[REFACTOR] Refactor JSONFFI Conv template (#2331)
tqchen May 13, 2024
821ee5d
[Eagle] Fix the requests for additional decode in eagle verify (#2336)
vinx13 May 13, 2024
bc6e3ed
[Serving][Grammar] Refactor GrammarStateMatcher and support LLaMA-3 (…
Ubospica May 14, 2024
0c03537
[DebugChat] Fix DebugChat softmax function and save logits to debug f…
rickzx May 14, 2024
b247f8d
[Serving] Add Medusa speculative decoding (#2337)
vinx13 May 14, 2024
2bbbd52
Fix cublas offloading (#2343)
vinx13 May 15, 2024
227dbb8
Add false for arg worker0_only in disco.empty (#2344)
yongwww May 15, 2024
9b89e04
Auto updated submodule references
May 15, 2024
56ea156
[JSONFFIEngine] Refactor device argument and request_stream_callback …
anibohara2000 May 15, 2024
152ecc4
[Serving] Add reset_engine in debug_entrypoints (#2347)
yongwww May 16, 2024
ac1cd51
[Bugfix] Make sequence_length dtype int64 in EngineConfig. Fix Mistra…
rickzx May 18, 2024
96fc289
[JSON FFI] Example Android Application using JSON FFI Engine (#2322)
Kartik14 May 18, 2024
0e3d536
[iOS] Update MLCEngine API to latest JSON FFI convention (#2359)
tqchen May 18, 2024
9998076
[JSONFFI] Fix JSONFFI conv template. Add unit tests (#2360)
rickzx May 19, 2024
beb126c
[Fix][Serving] Fix prefill chunk in interactive mode (#2363)
MasterJH5574 May 20, 2024
2146f15
[Fix][Serving] Respect sliding window size in config inference (#2364)
MasterJH5574 May 20, 2024
27dc5c8
[iOS] Add padding to app icon (#2365)
Neet-Nestor May 21, 2024
8aed35e
[Serving] Fix the self-ref in engine (#2367)
tqchen May 21, 2024
5444fd5
[Serving] Prefix Cache (#2295)
cyx-6 May 21, 2024
3c0b15c
[Fix] Use static_cast for `.size()` for safety (#2369)
MasterJH5574 May 21, 2024
ff39925
[Serving] Sliding-window-aware request prefill (#2370)
MasterJH5574 May 22, 2024
db039cf
[iOS] Update MLCSwift to fully follow OAI style. (#2371)
tqchen May 22, 2024
edc434d
Add nvtx in logic update (#2372)
yongwww May 22, 2024
8d3194c
[Test] Use HF model for JIT as much as possible (#2373)
MasterJH5574 May 22, 2024
20c198f
[Fix] Fix prefix cache reset and forking logic (#2374)
cyx-6 May 22, 2024
a5e71b3
[CLI] Migrate CLI to use the new Engine (#2375)
tqchen May 22, 2024
0724983
[TESTING] Introduce testing util to manage models (#2377)
tqchen May 22, 2024
6dd6c89
[REFACTOR][Rename] MLC_LLM_SOURCE_DIR and TVM_SOURCE_DIR source dire…
tqchen May 22, 2024
6de0f55
[REFACTOR][ENV] MLC_CACHE_DIR to MLC_LLM_HOME (#2379)
tqchen May 22, 2024
547060a
[iOS] Switch MLC Chat to use MLCEngine (#2380)
tqchen May 22, 2024
db833aa
[REFACTOR] Cleanup legacy code (#2381)
tqchen May 22, 2024
600a3e5
[Fix] Update prefix cache config (#2382)
cyx-6 May 22, 2024
2e1ff62
[PREFIX-CACHE] Fix some issues with prefix cache (#2384)
tqchen May 23, 2024
7eaeed1
[FIX] Typo on OpenAI Chat class in engine (#2385)
Faolain May 23, 2024
ac4dff7
[Serving][Refactor] Metrics and stats for CLI (#2387)
MasterJH5574 May 23, 2024
fbe3b9e
[REFACTOR] Organize metrics (#2390)
tqchen May 23, 2024
9631cc3
[Fix] Avoid ref capture in prefix cache contruction (#2391)
MasterJH5574 May 23, 2024
370fca5
[REFACTOR] Cleanup Metrics (#2392)
tqchen May 23, 2024
00c2292
[FIX] Fix mlc llm source dir argument (#2394)
tqchen May 23, 2024
ddbec62
[Fix] Fix the serialization of SpecDecodeMetrics (#2395)
MasterJH5574 May 23, 2024
eb546ee
[Fix] Update missing change in engine ffi func name (#2396)
cyx-6 May 23, 2024
040b10e
Auto updated submodule references
May 24, 2024
641b64b
[Fix] Fix no prefix cache (#2397)
cyx-6 May 24, 2024
988e9f0
add hasattr safecheck for MLCEngineBase (#2400)
BodhiHu May 24, 2024
70f2a76
[Refactor] Expose EngineConfig in engine constructor (#2399)
MasterJH5574 May 24, 2024
37da8e4
[REFACTOR] Introduce RequestMetrics and metrics endpoint (#2401)
tqchen May 24, 2024
a6d3cc1
[Fix] Fix format issue of MLCEngineBase (#2402)
MasterJH5574 May 24, 2024
9f96333
[FIX] fix comments in radix_tree.py (#2403)
ita9naiwa May 24, 2024
db78862
[Fix] Fix metric names in tests and static PrefixCacheModes (#2404)
MasterJH5574 May 24, 2024
d12afce
[Op] Tree attention (#2376)
spectrometerHBH May 24, 2024
d39272a
[REFACTOR] Reorganize GenerationConfig DebugConfig and FFI (#2407)
tqchen May 24, 2024
d770270
[Fix] Fix vector OOB when no inputs can be prefilled in spec decode (…
MasterJH5574 May 24, 2024
97df697
[Fix] Update number of available pages after prefix cache free (#2409)
MasterJH5574 May 24, 2024
7eba612
[REFACTOR] Enable validation logic in GenerationConfig (#2411)
tqchen May 24, 2024
905620c
[Chat] Support chat completion config override (#2412)
MasterJH5574 May 24, 2024
cd79b96
Change name RedixPage -> RadixPage in RadixTree.cc (#2413)
ita9naiwa May 24, 2024
cfc0597
[Fix] Fix ignore_eos support (#2414)
MasterJH5574 May 24, 2024
135419e
[Test][Refactor] Update tests to use require_test_model (#2415)
MasterJH5574 May 25, 2024
b18284b
[Serving] Enable GPU Sampling (#2368)
Hzfengsy May 25, 2024
0b2cbb2
[REFACTOR] Support latest include_usage and DebugOptions (#2417)
tqchen May 26, 2024
3b272eb
[DOWNLOAD] MLC_DOWNLOAD_POLICY and MLC_LLM_READONLY_WEIGHT_CACHES (#2…
tqchen May 26, 2024
c62e143
[REFACTOR] Rename MLC_LLM_READONLY_WEIGHT_CACHES (#2423)
tqchen May 26, 2024
13c0661
[Tokenizer] Auto-detect TokenizerInfo from tokenizer.json (#2416)
Ubospica May 26, 2024
8b38a4b
[REFACTOR] Remove dependencies on legacy chat_module (#2424)
tqchen May 26, 2024
ff91749
[REFACTOR] Terminology download=>download_cache (#2425)
tqchen May 26, 2024
14bec5a
[REFACTOR] Move GenerationConfig to protocol (#2427)
tqchen May 26, 2024
ae88612
Update README.md
Neet-Nestor May 27, 2024
0df00bf
[site] Add hero section to website (#2430)
Neet-Nestor May 27, 2024
1025926
[Compile] Skip CUDA graph rewrite when target is not CUDA (#2433)
MasterJH5574 May 27, 2024
00e79d1
[DOCS] Simplify read me (#2435)
tqchen May 27, 2024
21ac3a2
[DOCS] Update title to focus on engine feature
tqchen May 27, 2024
4538cc7
[Metadata] Remove stale KV cache size (#2434)
MasterJH5574 May 27, 2024
526114e
[iOS] Update the MLCSwift APIs to async (#2436)
tqchen May 27, 2024
c87d369
[Android] Switch MLC Chat to use MLCEngine (#2410)
mengshyu May 27, 2024
5b73ec3
[iOS] Remove Legacy ChatModule (#2437)
tqchen May 27, 2024
16fb729
[Delivery] Update model delivery script to support specifying the out…
rickzx May 27, 2024
ba8e20a
[Android] Remove Legacy ChatModule (#2438)
mengshyu May 27, 2024
be15b22
[Refactor] Remove ChatModule (#2440)
MasterJH5574 May 27, 2024
50adede
[Fix][REST] Fix usage-related server tests (#2441)
MasterJH5574 May 27, 2024
dc40656
[Site] Enlarge hero image in small screens
Neet-Nestor May 27, 2024
f2db8e4
Fix lint
tqchen May 27, 2024
d93e5a6
[ANDROID] Patches to enable windows usescase (#2443)
tqchen May 28, 2024
709644f
[DOCS] Guides for android on windows (#2444)
tqchen May 28, 2024
4df3abf
[DOCS] mention git-lfs (#2445)
tqchen May 28, 2024
2fc9c63
Fix Llama-3 conversation template. Add unit test (#2442)
rickzx May 27, 2024
cd4a853
[Grammar][Wasm] Update new grammar to wasm runtime (#2446)
CharlieFRuan May 28, 2024
de61926
[Model] Use float32 for RoPE calculation (#2449)
MasterJH5574 May 28, 2024
cf4bffe
[LogitProcessor] Use min float value as the mask value (#2451)
MasterJH5574 May 28, 2024
570380c
[Protocol] Use `by_alias=True` when dumping pydantic classes (#2450)
MasterJH5574 May 28, 2024
30e46b4
[Protocol] Use `by_alias=True` when dumping pydantic classes (#2452)
MasterJH5574 May 28, 2024
e9a63ed
[DOCS] Updates the URL of the Android APK (#2453)
mengshyu May 28, 2024
d1f5f51
Auto updated submodule references
May 28, 2024
6c31701
[Fix][Phi3] Add `</s>` as stop token for phi3 (#2455)
CharlieFRuan May 28, 2024
d7c159e
[Site] Add GitHub link to hero section
Neet-Nestor May 29, 2024
477da69
Update README.md
Neet-Nestor May 29, 2024
dc091e7
[Hermes2] Add conv template for Hermes2-Pro-Llama3 (#2457)
CharlieFRuan May 29, 2024
27d1f6f
[Compile] Add max_batch_size to metadata (#2463)
MasterJH5574 May 29, 2024
f2c1582
[REFACTOR] Re-organize the modules after transition to MLCEngine (#2464)
tqchen May 29, 2024
e90f2e7
[Serving] Add ICHECK for running batch size (#2465)
MasterJH5574 May 29, 2024
5df26b6
Auto updated submodule references
May 29, 2024
a8e85d0
[TEST] Start to categorize tests (#2466)
tqchen May 29, 2024
249b945
Implemented FP8 calibration (#2454)
vinx13 May 29, 2024
9efb1ba
[CI] Update CUDA build script with FlashInfer options (#2469)
MasterJH5574 May 30, 2024
e0e779a
[Serving] Use preferred host memory for host NDArrays (#2468)
MasterJH5574 May 30, 2024
515823c
[TEST] Temp disable UT stage
tqchen May 30, 2024
c4d337d
[CUDA] Turn on cuda graph at O2 (#2467)
vinx13 May 30, 2024
96d752c
[CI] Enable GPU env in CI (#2476)
tqchen May 30, 2024
cf0278f
[CMake] Update config.cmake generation script (#2478)
MasterJH5574 May 30, 2024
16f0af4
[TEST] MockEchoEngine (#2479)
tqchen May 31, 2024
33dbfd1
Auto updated submodule references
May 31, 2024
ab52b72
[Fix] Fix JSONFFI MemoryBufferStream after dmlc bump (#2480)
MasterJH5574 May 31, 2024
61889fe
[JSON-FFI] Enable n generation and pass in json schema (#2481)
tqchen May 31, 2024
8fc5efa
Refactor model delivery script to use pydantic (#2482)
rickzx May 31, 2024
589c76f
Fix tokenizers encode batch (#2484)
vinx13 Jun 1, 2024
c1628dd
[Bugfix] Fix delivered log issue in delivery cli (#2489)
rickzx Jun 2, 2024
abd7d51
Support Qwen2-MoE Architecture (#2089)
Hzfengsy Jun 2, 2024
46ee63a
[3rdparty] Bump tokenizers-cpp to include HF tokenizers bump (#2490)
MasterJH5574 Jun 2, 2024
71828b0
[Bench] Add mlc bench (#2474)
yongwww Jun 3, 2024
5b4fc07
Auto updated submodule references
Jun 3, 2024
91cc194
Enable n-sampling for Medusa spec decoding (#2495)
vinx13 Jun 3, 2024
94de2a4
[CONFIG] Remove mean_gen_len from the config (#2493)
tqchen Jun 3, 2024
c8bfb50
Update ios android docs (#2497)
tqchen Jun 3, 2024
5a8a728
[Bench] Add seed to __init__ and some minor change (#2496)
yongwww Jun 4, 2024
90170e6
[Fix][Config] Max total sequence length overflow with sliding window …
MasterJH5574 Jun 4, 2024
c0c33a5
[Serving] PagedKVCache tree-attention integration (#2487)
MasterJH5574 Jun 4, 2024
d6f7a58
[Sampler] Enhance checks for whether FlashInfer is enabled (#2502)
MasterJH5574 Jun 4, 2024
70b3102
[Android] Updates the default mode list and the APK link in the docum…
mengshyu Jun 4, 2024
e63aab4
[Fix] Fix the global func name of TokenizerDecode (#2514)
MasterJH5574 Jun 5, 2024
8e56d95
[Fix] Use the correct model to validate stream_options (#2508)
zifeitong Jun 5, 2024
4179922
[Fix] Typo in docs/install/tvm.rst (#2507)
zifeitong Jun 5, 2024
64e33c5
[FP8] Use f32 scale to enable better fusion (#2505)
vinx13 Jun 5, 2024
3bdc8f6
[Metrics] Add ttft and itl to server metrics (#2510)
yongwww Jun 5, 2024
3184294
[Model] Fix config detection for Mistral (#2504)
MasterJH5574 Jun 5, 2024
78e59ab
[Fix] Provide a GetTokenId API for SampleResult (#2516)
Ubospica Jun 5, 2024
3f36236
[Reapply][BUGFIX] Fix rare deadlock in threaded engine (#2429) (#2518)
MasterJH5574 Jun 6, 2024
fbc75c0
[Fix] Fix metrics division by 0 (#2519)
MasterJH5574 Jun 6, 2024
80789f4
Corrected the folder path for Android Studio Project (#2520)
Ramees025 Jun 6, 2024
fd51f97
Update tvm.rst
tqchen Jun 6, 2024
9de380c
[iOS] Update model list (#2524)
spectrometerHBH Jun 6, 2024
1881992
[Android] Updates the order of mode list and the APK link in the docu…
mengshyu Jun 6, 2024
61f5623
[Sampler] Skip top-p renormalization if top-p is 1 in CPUSampler (#2528)
MasterJH5574 Jun 6, 2024
9d16fec
[Docs] Rename javascript.rst to webllm.rst (#2531)
CharlieFRuan Jun 6, 2024
69c600c
[Conv] Add tinyLlama v1.0 conv template (#2530)
CharlieFRuan Jun 6, 2024
868334d
[iOS] correct mistral q3 url and handle screen switch off (#2529)
tqchen Jun 6, 2024
206db55
[Grammar] Fix include protection and paths in docstring (#2515)
Ubospica Jun 7, 2024
50a1a7c
[Tokenizer][Fix] Fix SegFault when analyzing tokenizers without token…
Ubospica Jun 7, 2024
5f71aa9
[Serving] Use stop strs and token ids for completions (#2534)
MasterJH5574 Jun 7, 2024
a096c91
[Serving] Support tensor parallel shards override in command line (#2…
MasterJH5574 Jun 7, 2024
9be4b92
Add tie_word_embedding option for Qwen2 model (#2535)
rickzx Jun 7, 2024
b5b40ee
[Bench] Defaults to aiohttp client, add ServerMetrics (#2527)
yongwww Jun 7, 2024
e601409
[Android] Remove var capture in TVM_SOURCE_DIR (#2538)
MasterJH5574 Jun 7, 2024
d5fbde2
[Fix] Fix inconsistent system prompt handling (#2539)
MasterJH5574 Jun 7, 2024
208642d
[Attention] Fix attn kernel for general GQA group size (#2543)
MasterJH5574 Jun 7, 2024
fcb50a2
fix: typo error (#2544)
michaelhenry Jun 7, 2024
6bd049e
[Fix] Fix attn kernel build issue (#2545)
MasterJH5574 Jun 7, 2024
961d5f1
[iOS] Add Qwen2 support (#2547)
tqchen Jun 7, 2024
78b6e1f
[Android] Add Qwen2 support (#2548)
mengshyu Jun 7, 2024
26a9cf0
[Android] Escape backslashes and quotation marks (#2546)
MasterJH5574 Jun 7, 2024
6bbd49c
[EngineConfig] Add override options (#2550)
MasterJH5574 Jun 7, 2024
f489d8d
[Site] Update link to webllm
Neet-Nestor Jun 8, 2024
db896d1
[Site] Update heading
Neet-Nestor Jun 8, 2024
203cda6
[Preset] Add model preset for model delivery (#2553)
CharlieFRuan Jun 8, 2024
9633c9f
Update docs to remove mention of older models (#2557)
tqchen Jun 8, 2024
c25834d
[Docs] Fix typo in mlc_llm chat command (#2560)
Neet-Nestor Jun 9, 2024
931587b
Fix compilation for gcc 13.2 (#2561)
elvin-n Jun 10, 2024
4234262
[Tokenizer] Priorize HuggingFace/SentencePiece over ByteLevelBPE (#2559)
MasterJH5574 Jun 10, 2024
42f146d
[Serving][Grammar] Jump-forward decoding (#2551)
Ubospica Jun 11, 2024
a231ae1
[Delivery] Update model delivery script (#2565)
rickzx Jun 11, 2024
873827c
[Model] Enhance error reporting for invalid tensor-parallel settings …
MasterJH5574 Jun 12, 2024
dcece51
[Serving] Apply tree structure in draft token verification (#2563)
vinx13 Jun 12, 2024
07c92b0
[Bench] Json mode bench (#2552)
cyx-6 Jun 12, 2024
94a0295
[Model] Support Multi-GPU for Qwen-MoE model (#2573)
MasterJH5574 Jun 13, 2024
ceba951
[Metrics] Add missing fields in `Reset` (#2574)
MasterJH5574 Jun 13, 2024
75b970b
[Doc] Update WebLLM doc (#2578)
CharlieFRuan Jun 14, 2024
e9340c3
[Op] Top-4 implementation for MoE model (#2586)
MasterJH5574 Jun 17, 2024
437166a
[Model] Gemma 1.1 compatibility (#2594)
MasterJH5574 Jun 19, 2024
6a48a02
[Serving] Hybrid prefill (#2604)
cyx-6 Jun 25, 2024
cbf0b02
Update quick_start.rst to fix broken links (#2607)
GunjanDhanuka Jun 27, 2024
d911c60
[Fix] Set the missed prefill finish time (#2613)
MasterJH5574 Jul 1, 2024
fbb6a48
[Android] Reduce binary size (#2606)
MasterJH5574 Jul 1, 2024
0575b92
[Fix] Gemma hidden_activation compatibility (#2614)
MasterJH5574 Jul 1, 2024
c09b108
Update debug_compare (#2612)
Hzfengsy Jul 2, 2024
2d32094
[SLM] Add support for InternLM2 architecture (#2608)
tlopex Jul 2, 2024
0fb5609
[Fix] Prefix cache only enables sliding window on leaf sequence (#2615)
cyx-6 Jul 2, 2024
adc6ee6
[Android] Update include path for tvm runtime src (#2616)
MasterJH5574 Jul 2, 2024
18b6e2b
remove
sunggg Jul 3, 2024
4712016
Merge remote-tracking branch 'upstream/main' into spark/merge-upstrea…
sunggg Jul 3, 2024
48e807e
works
sunggg Jul 3, 2024
6b5b0a9
seems working
sunggg Jul 3, 2024
0cb6ed5
Merge remote-tracking branch 'origin/mlc-serve-v0.2.0' into HEAD
sunggg Jul 8, 2024
9 changes: 0 additions & 9 deletions .gitmodules

This file was deleted.

1 change: 0 additions & 1 deletion 3rdparty/argparse
Submodule argparse deleted from 557948
1 change: 0 additions & 1 deletion 3rdparty/googletest
Submodule googletest deleted from 458046
1 change: 0 additions & 1 deletion 3rdparty/tokenizers-cpp
Submodule tokenizers-cpp deleted from 27dbe1
176 changes: 0 additions & 176 deletions CMakeLists.txt

This file was deleted.

6 changes: 0 additions & 6 deletions CONTRIBUTORS.md

This file was deleted.
