-
Notifications
You must be signed in to change notification settings - Fork 287
Closed
Description
System Info
Using two official Docker images (latest and main).
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
After running
docker run --gpus all -p 8080:80 -v /home/ubuntu/lorax_weight_cache ghcr.io/predibase/lorax:latest --model-id microsoft/Phi-3-mini-128k-instruct --trust-remote-code
I get:
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 83, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 309, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 243, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 251, in get_model
return FlashPhi3(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_phi3.py", line 88, in __init__
model = FlashPhi3ForCausalLM(config, weights)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py", line 482, in __init__
self.model = FlashPhi3Model(config, weights)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py", line 422, in __init__
[
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py", line 423, in <listcomp>
FlashPhi3Layer(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py", line 360, in __init__
self.self_attn = FlashPhi3Attention(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py", line 190, in __init__
self.rotary_emb = PositionRotaryEmbedding.static(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/layers.py", line 879, in static
scaling_factor = rope_scaling["factor"]
KeyError: 'factor'
After running
docker run --gpus all -p 8080:80 -v /home/ubuntu/lorax_weight_cache ghcr.io/predibase/lorax:main --model-id microsoft/Phi-3-mini-128k-instruct --trust-remote-code=True
I get:
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 87, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 371, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 251, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 275, in get_model
return FlashPhi3(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_phi3.py", line 88, in __init__
model = FlashPhi3ForCausalLM(config, weights)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py", line 482, in __init__
self.model = FlashPhi3Model(config, weights)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py", line 422, in __init__
[
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py", line 423, in <listcomp>
FlashPhi3Layer(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py", line 360, in __init__
self.self_attn = FlashPhi3Attention(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py", line 190, in __init__
self.rotary_emb = PositionRotaryEmbedding.static(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/layers.py", line 479, in static
raise NotImplementedError(f"rope scaling type {rope_type} is not implemented or invalid")
NotImplementedError: rope scaling type longrope is not implemented or invalid
The --trust-remote-code flag is not the cause; running the same command without it produces the same errors.
Expected behavior
Server should start successfully.
Metadata
Metadata
Assignees
Labels
No labels