Python: dev_setup and readme updates (#5770)
This pull request primarily updates the `DEV_SETUP.md` and `README.md`
files in the `python` directory. The changes focus on improving the
setup instructions, updating the code snippets, and providing additional
information on running tests, code coverage, and keeping the local
repository up-to-date.

Changes to setup instructions:

* [`python/DEV_SETUP.md`](diffhunk://#diff-22a026622159be328328c5bb6fef874e8aa799efb6ab8844d0c94ad051b08f3cR75-R90):
Added instructions on how to use Python 3.11 with Poetry, how to install
pre-commit hooks, and how to run tests using VSCode Tasks.
[[1]](diffhunk://#diff-22a026622159be328328c5bb6fef874e8aa799efb6ab8844d0c94ad051b08f3cR75-R90)
[[2]](diffhunk://#diff-22a026622159be328328c5bb6fef874e8aa799efb6ab8844d0c94ad051b08f3cR112-R114)
[[3]](diffhunk://#diff-22a026622159be328328c5bb6fef874e8aa799efb6ab8844d0c94ad051b08f3cR131-R133)
[[4]](diffhunk://#diff-22a026622159be328328c5bb6fef874e8aa799efb6ab8844d0c94ad051b08f3cL287-R252)

Code snippets updates:

* [`python/DEV_SETUP.md`](diffhunk://#diff-22a026622159be328328c5bb6fef874e8aa799efb6ab8844d0c94ad051b08f3cL140-L227):
Updated the code snippets to reflect the latest changes in the codebase.
This includes changes in the use of Pydantic and the use of
KernelBaseModel.
[[1]](diffhunk://#diff-22a026622159be328328c5bb6fef874e8aa799efb6ab8844d0c94ad051b08f3cL140-L227)
[[2]](diffhunk://#diff-22a026622159be328328c5bb6fef874e8aa799efb6ab8844d0c94ad051b08f3cL244-R177)
* [`python/README.md`](diffhunk://#diff-217ed82f87b78b399e304b07a159b27dff327c0f83adf4a2fc30b03bcbf84b01L58-R66):
Updated the code snippet to reflect the change in getting request
settings from a service ID.

Additional information:

* [`python/DEV_SETUP.md`](diffhunk://#diff-22a026622159be328328c5bb6fef874e8aa799efb6ab8844d0c94ad051b08f3cL287-R252):
Added information on how to run code coverage, how to keep the local
repository up-to-date, and how to resolve conflicts after a rebase.
* [`python/README.md`](diffhunk://#diff-217ed82f87b78b399e304b07a159b27dff327c0f83adf4a2fc30b03bcbf84b01R7-R14):
Added instructions on how to install optional dependencies.
eavanvalkenburg authored Apr 4, 2024
1 parent d050f8d commit 0a75ba8
Showing 3 changed files with 69 additions and 90 deletions.
2 changes: 1 addition & 1 deletion python/.vscode/tasks.json
@@ -32,7 +32,7 @@
            }
        },
        {
            "label": "Python: Run Checks - PR",
            "label": "Python: Run Checks - Staged",
            "type": "shell",
            "command": "poetry",
            "args": [
147 changes: 59 additions & 88 deletions python/DEV_SETUP.md
@@ -72,15 +72,22 @@ Note: SK requires at least Poetry 1.2.0.

```bash
# Install poetry package
pip3 install poetry

# optionally, define which python version you want to use
poetry env use python3.11

# Use poetry to install base project dependencies
poetry install

# If you want to use connectors such as hugging face
# poetry install --with <connector group name>
# If you want to get all dependencies for tests installed, use
# poetry install --with tests
# example: poetry install --with hugging_face

# Use poetry to activate project venv
poetry shell

# Optionally, you can install the pre-commit hooks
poetry run pre-commit install
# this will run linters and mypy checks on all the changed code.
```

## VSCode Setup
@@ -90,8 +97,7 @@ command from the command palette. Make sure the virtual env (venv) created by
`poetry` is selected.
The python you're looking for should be under `~/.cache/pypoetry/virtualenvs/semantic-kernel-.../bin/python`.

If prompted, install `black` and `flake8` (if VSCode doesn't find those packages,
it will prompt you to install them).
If prompted, install `ruff` and `black` (these should have been installed as part of `poetry install`).

## Tests

@@ -103,6 +109,9 @@ You can run the unit tests under the [tests/unit](tests/unit/) folder.

```bash
poetry run pytest tests/unit
```

Alternatively, you can run them using VSCode Tasks. Open the command palette
(`Ctrl+Shift+P`) and type `Tasks: Run Task`. Select `Python: Tests - Unit` or `Python: Tests - Code Coverage` from the list.

You can run the integration tests under the [tests/integration](tests/integration/) folder.

```bash
poetry run pytest tests/integration
```

@@ -119,6 +128,9 @@ You can also run all the tests together under the [tests](tests/) folder.

```bash
poetry run pytest tests
```

Alternatively, you can run them using VSCode Tasks. Open the command palette
(`Ctrl+Shift+P`) and type `Tasks: Run Task`. Select `Python: Tests - All` from the list.

## Tools and scripts

## Implementation Decisions
@@ -137,94 +149,13 @@ with either `async def` or `def` to understand if something is asynchronous or not

This section describes how one can enable serialization for their class using Pydantic.

IMPORTANT: This document (and SemanticKernel) currently use Pydantic 1.x. When SK is upgraded
to use Pydantic 2.x, this document will be upgraded accordingly.

### Terminology

There are 3 types of classes you need to be aware of when enabling serialization with Pydantic:

1. Classes which contain no data - examples are Protocols, ABC subclasses and any other classes
that don't contain any data that needs to be serialized.
2. Classes which contain data that need to be serialized, but don't contain any generic classes.
3. Classes which contain data that need to be serialized, AND contain generic classes.

### Upgrading existing classes to use Pydantic

#### Classes without any data

Let's take the following classes as examples - 1 ABC, 1 Protocol, and 1 class that only contains
data that doesn't need to be serialized.

```python
class A(Protocol):
    def some_method(self, *args, **kwargs): ...

class B(ABC):
    def some_method(self, *args, **kwargs): ...

class C:
    def __init__(self):
        # IMPORTANT: These variables are NOT being passed into the initializer
        # so they don't need to be serialized. If they are though, you'll have
        # to treat this as a class that contains data that needs to be serialized
        self._a = ...
```

For `Protocol` subclasses, nothing needs to be done, and they can be left as is.

For the remaining types, SemanticKernel provides a class named `PydanticField`. Subclassing
from this field is sufficient to have these types of classes as valid Pydantic fields, and allows
any class using them as attributes to be serialized.

```python
from semantic_kernel.kernel_pydantic import PydanticField

class B(PydanticField): ... # correct, B is still an ABC because PydanticField subclasses ABC
class B(PydanticField, ABC): ... # Also correct
class B(ABC, PydanticField): ... # ERROR: Python cannot find a valid super class ordering.

class C(PydanticField): ... # No other changes needed
```

The classes B and C can now be used as valid Pydantic Field annotations.

```python
from pydantic import BaseModel

class MyModel(BaseModel):
    b: B
    c: C
```

Class A can only be used as a Pydantic Field annotation for a Pydantic BaseModel subclass
which is configured to allow arbitrary field types like so:

```python
from pydantic import BaseModel

class IncorrectModel(BaseModel):
    a: A  # Pydantic error

class CorrectModel(BaseModel):
    a: A  # Okay
    class Config:  # Configuration that tells Pydantic to allow field types that it can't serialize
        arbitrary_types_allowed = True
```
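
The snippet above uses the Pydantic 1-style inner `Config` class. Since this PR moves the other snippets toward Pydantic 2 and `KernelBaseModel`, the equivalent configuration in Pydantic 2 is `model_config`; a minimal, illustrative sketch (the model name is made up for this example):

```python
from pydantic import BaseModel, ConfigDict

class CorrectModelV2(BaseModel):
    # Pydantic 2 replaces the inner Config class with model_config.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    # `A` is the data-less Protocol class from the example earlier in this section.
    a: A  # Okay: the field type no longer has to be something Pydantic can serialize
```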

#### Classes with data, but no Generic types that need to be serialized

If your class has any data that needs to be serialized, but the field annotation for that data type
in your class is not a Generic type, this section applies to you.

Let's take the following example:

```python
class A:
    def __init__(self, a: int, b: float, c: List[float], d: dict[str, tuple[float, str]] = {}):
        # Since a, b, c and d are needed to initialize this class, they need to be serialized
        # if they can be serialized.
        # Although a, b, c and d are builtin python types, any valid pydantic field can be used
        # here. This includes the classes defined in the previous category.
        self.a = a
        self.b = b
        self.c = c
        self.d = d
```

@@ -241,9 +172,9 @@

```python
class A(KernelBaseModel):
    # The notation for the fields is similar to dataclasses.
    a: int
    b: float
    c: List[float]
    c: list[float]
    # Only, instead of using dataclasses.field, you would use pydantic.Field
    d: dict[str, tuple[flost, str]] = Field(default_factory=dict)
    d: dict[str, tuple[float, str]] = Field(default_factory=dict)
```
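
To make the benefit concrete, here is a small round-trip sketch based on the class above (assuming `KernelBaseModel` behaves like a plain Pydantic v2 `BaseModel`, i.e. it exposes `model_dump_json` and `model_validate_json`):

```python
from pydantic import Field

from semantic_kernel.kernel_pydantic import KernelBaseModel


class A(KernelBaseModel):
    a: int
    b: float
    c: list[float]
    d: dict[str, tuple[float, str]] = Field(default_factory=dict)


# Serialize to JSON and rebuild an equal instance from that JSON.
original = A(a=1, b=2.5, c=[0.1, 0.2], d={"x": (1.0, "one")})
payload = original.model_dump_json()
restored = A.model_validate_json(payload)
assert restored == original
```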

#### Classes with data that need to be serialized, and some of them are Generic types
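
The body of this subsection is collapsed in the diff above. As a rough, hedged sketch of the general idea only (not necessarily the exact guidance this section gives), a generic model can be written by parameterizing the class with `typing.Generic`, assuming `KernelBaseModel` supports standard Pydantic generic models:

```python
from typing import Generic, TypeVar

from semantic_kernel.kernel_pydantic import KernelBaseModel

T = TypeVar("T")


class Wrapper(KernelBaseModel, Generic[T]):
    # The generic payload is an ordinary field; the concrete type is bound at use time.
    value: T


# Bind the type parameter when annotating or constructing.
ints = Wrapper[int](value=3)
floats = Wrapper[list[float]](value=[0.1, 0.2])
```
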
@@ -284,5 +215,45 @@ To run the same checks that run during the GitHub Action build, you can use
this command, from the [python](../python) folder:

```bash
poetry run pre-commit run -c ../.pre-commit-config.yaml -a
poetry run pre-commit run -a
```

or use the following task (using `Ctrl+Shift+P`):
- `Python - Run Checks` to run the checks on the whole project.
- `Python - Run Checks - Staged` to run the checks on the currently staged files only.

Ideally you should run these checks before committing any changes; use `poetry run pre-commit install` to set that up.

## Code Coverage

We try to maintain a high code coverage for the project. To run the code coverage on the unit tests, you can use the following command:

```bash
cd python
poetry run pytest --cov=semantic_kernel --cov-report=term-missing:skip-covered tests/unit/
```
or use the following task (using `Ctrl+Shift+P`):
- `Python: Tests - Code Coverage` to run the code coverage on the whole project.

This will show you which files are not covered by the tests, including the specific lines not covered.

## Catching up with the latest changes

There are many people committing to Semantic Kernel, so it is important to keep your local repository up to date. To do this, you can run the following commands:

```bash
git fetch upstream main
git rebase upstream/main
git push --force-with-lease
```
or:

```bash
git fetch upstream main
git merge upstream/main
git push
```

This assumes the `upstream` remote points to the main repository. If your remote for the main repository has a different name, replace `upstream` with that name.

After running the rebase command, you may need to resolve any conflicts that arise. If you are unsure how to resolve a conflict, please refer to [GitHub's documentation on resolving conflicts](https://docs.github.com/en/get-started/using-git/resolving-merge-conflicts-after-a-git-rebase), or to the [VSCode documentation](https://code.visualstudio.com/docs/sourcecontrol/overview#_merge-conflicts).

10 changes: 9 additions & 1 deletion python/README.md
@@ -4,6 +4,14 @@ Install the latest package:

python -m pip install --upgrade semantic-kernel

If you want to use some of the optional dependencies (OpenAI is installed by default), you can install them with:

python -m pip install --upgrade semantic-kernel[hugging_face]

or all of them:

python -m pip install --upgrade semantic-kernel[all]

# AI Services

## OpenAI / Azure OpenAI API keys
@@ -55,7 +63,7 @@ kernel.add_service(
# )

# Define the request settings
req_settings = kernel.get_service(service_id).get_prompt_execution_settings_class()(service_id=service_id)
req_settings = kernel.get_prompt_execution_settings_from_service_id(service_id)
req_settings.max_tokens = 2000
req_settings.temperature = 0.7
req_settings.top_p = 0.8
