Releases: opendatalab/WebMainBench

release v1.0.0

24 Oct 07:27
898ed30

Includes 4 extractors and a benchmark of 545 data samples.

What's Changed

  • fix bug: duplicated tables by @pekopoke in #42
  • Optimize table edit distance calculation by using normalize by @pekopoke in #43
  • add extractor version in results by @pekopoke in #44
  • fix back to old formula match by @pekopoke in #45
  • feat: add language and style classify by @e06084 in #46
  • Use an LLM to correct predicted formulas by @1041206149 in #47
  • feat: refactor _extract_from_markdown with LLM-enhanced table/formula/code extraction by @1041206149 in #48
  • Dev: add a method for trafilatura to output txt by @pekopoke in #50
  • Move the LLM API configuration into config.py by @1041206149 in #51
  • fix: skip table and formula extraction inside inline and block-level code by @pekopoke in #52

Full Changelog: v0.2.0...v1.0.0

v0.2.0

08 Sep 02:54
b34a46f

What's Changed

  • feat: add multi extractor compare script by @e06084 in #34
  • feat(metrics): implement comprehensive memoization for TEDS algorithm by @SHUzhangshuo in #35
  • Main html by @darkrush in #37
  • feat: add dataset statistics script by @e06084 in #38
  • feat: text_edit metric use all text by @e06084 in #39
  • Dev: optimize table splitting, remove inline code splitting, improve TEDS performance by @pekopoke in #40
  • fix code match by @pekopoke in #41

Full Changelog: v0.0.1...v0.2.0

v0.1.0

28 Aug 02:36
3cdd9e6

What's Changed

Full Changelog: https://github.com/ccprocessor/WebMainBench/commits/v0.0.1