
[Bug]: Uploading a large *.xls file (e.g. 50 MB) fails; here is my fix #4856

Closed
@SkyfireWXY

Description


Is there an existing issue for the same bug?

  • I have checked the existing issues.

RAGFlow workspace code commit ID

none

RAGFlow image version

v0.15.0

Other environment information

Actual behavior

I didn't keep the error report, sorry.

Expected behavior

No response

Steps to reproduce

Upload a large .xls file.
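One cause worth ruling out before blaming the size: openpyxl's `load_workbook`, which `RAGFlowExcelParser` calls first, only reads the Office Open XML formats (.xlsx, .xlsm, .xltx, .xltm), so a file saved in the legacy binary .xls format fails there whatever its size. The two formats can be told apart by their first bytes: .xls is an OLE2 compound file and .xlsx is a ZIP archive. A minimal sketch, assuming you have the raw bytes of the upload; `sniff_excel_format` is a hypothetical helper, not part of RAGFlow:

```python
# Leading "magic" bytes of the two container formats.
OLE2_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")  # legacy .xls (BIFF stored in an OLE2 compound file)
ZIP_SIGNATURE = b"PK\x03\x04"                        # .xlsx/.xlsm are ZIP archives


def sniff_excel_format(data: bytes) -> str:
    """Hypothetical helper: return 'xls', 'xlsx' or 'unknown' from the leading bytes."""
    if data.startswith(OLE2_SIGNATURE):
        return "xls"
    if data.startswith(ZIP_SIGNATURE):
        # Any ZIP archive matches; the archive contents are not inspected.
        return "xlsx"
    return "unknown"
```

For example, `sniff_excel_format(open("big.xls", "rb").read(8))` should return "xls" for a genuine legacy file. Only the container is checked, so a .doc file also reads as "xls" and a .docx as "xlsx".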

Additional information

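I fixed the bug by modifying `RAGFlowExcelParser.__call__` in deepdoc/parser/excel_parser.py. If openpyxl's `load_workbook` fails, the file is read with pandas instead and copied into an in-memory openpyxl workbook, so the rest of the method works unchanged: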

```python
import logging
from io import BytesIO

import pandas as pd
from openpyxl import Workbook, load_workbook


class RAGFlowExcelParser:
    # (other methods of the class omitted)

    def __call__(self, fnm):
        # Original code:
        # if isinstance(fnm, str):
        #     wb = load_workbook(fnm)
        # else:
        #     wb = load_workbook(BytesIO(fnm))

        # fnm is either a file path or the raw file bytes
        s_fnm = fnm if isinstance(fnm, str) else BytesIO(fnm)

        try:
            wb = load_workbook(s_fnm)
        except Exception as e:
            logging.warning("file parser error: %s, trying to convert with pandas", e)
            # load_workbook may have moved the stream position; rewind before re-reading
            if not isinstance(s_fnm, str):
                s_fnm.seek(0)
            # Reading legacy .xls with pandas requires the xlrd package.
            # Only the first sheet is read here.
            df = pd.read_excel(s_fnm)
            # Copy the DataFrame into a fresh workbook (it starts with one empty sheet)
            wb = Workbook()
            ws = wb.active
            ws.title = "Data"
            for col_num, column_name in enumerate(df.columns, 1):
                ws.cell(row=1, column=col_num, value=column_name)
            for row_num, row in enumerate(df.values, 2):
                for col_num, value in enumerate(row, 1):
                    if pd.isna(value):
                        continue  # leave empty cells empty instead of writing NaN
                    ws.cell(row=row_num, column=col_num, value=value)

        res = []
        for sheetname in wb.sheetnames:
            ws = wb[sheetname]
            rows = list(ws.rows)
            if not rows:
                continue
            ti = list(rows[0])  # header row
            for r in rows[1:]:
                fields = []
                for i, c in enumerate(r):
                    if not c.value:
                        continue
                    t = str(ti[i].value) if i < len(ti) else ""
                    t += (":" if t else "") + str(c.value)
                    fields.append(t)
                line = "; ".join(fields)
                if sheetname.lower().find("sheet") < 0:
                    line += " ——" + sheetname
                res.append(line)
        return res
```
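After loading, the second half of `__call__` flattens every sheet into one text line per data row, prefixing each non-empty cell with its column header and appending the sheet name when that name does not contain "sheet". The same logic is sketched below as a standalone function over plain Python lists, so it can be tried without a workbook; the name `rows_to_lines` and its list-of-lists input are hypothetical, chosen only for illustration:

```python
from typing import Any, List, Sequence


def rows_to_lines(sheetname: str, rows: Sequence[Sequence[Any]]) -> List[str]:
    """Hypothetical helper: turn a sheet (header row first) into 'header:value; ...' lines."""
    if not rows:
        return []
    header = list(rows[0])
    lines = []
    for row in rows[1:]:
        fields = []
        for i, value in enumerate(row):
            if not value:  # skips empty cells, but also 0 and False
                continue
            title = str(header[i]) if i < len(header) else ""
            fields.append(title + (":" if title else "") + str(value))
        line = "; ".join(fields)
        # Sheets with custom names (not containing "sheet") get the name appended
        if "sheet" not in sheetname.lower():
            line += " ——" + sheetname
        lines.append(line)
    return lines
```

For example, `rows_to_lines("Prices", [["item", "price"], ["apple", 3], ["pear", None]])` gives `["item:apple; price:3 ——Prices", "item:pear ——Prices"]`. Like the parser, it renders a missing header cell (None) as the text "None" and drops cells holding 0.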

Metadata


Assignees: No one assigned
Labels: 🐞 bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
