Commit 2429ea2

committed
update create generators
1 parent 9e77814 commit 2429ea2

File tree

5 files changed

+84
-33
lines changed

pages/generators/train.mdx

Lines changed: 80 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -16,51 +16,102 @@ You can quickly train a new generator with a single tabular data file.
1616

1717
<CustomCallout>You can also train generators with two-table and multi-table datasets. For more information, see [Set table relationships](/generators/configure/set-table-relationships).</CustomCallout>
1818

19-
<Tabs items={['UI', 'Synthetic Data SDK']}>
19+
<Tabs items={['Platform', 'Assistant', 'Synthetic Data SDK']}>
2020
<Tabs.Tab>
21-
If you use the web application, you can start the training of a new generator from the **Generators** page.
2221

23-
**Steps**
22+
1. On the MOSTLY AI platform, open **Generators** from the left-side navigation menu.
23+
2. There are four ways to create a new generator:
2424

25-
1. On the **Generators** page, click **+ New generator**.
25+
| Method | Description |
26+
| ---------------------- | ----------------------------------------------------------------------------------------- |
27+
| Start from a connector | Use an existing [connector](/connectors) to train a new generator. |
28+
| Upload your data | Provide a CSV, Parquet, or TSV file to train a new generator from your local file system. |
29+
| Use the SDK | Navigate to the [Synthetic Data SDK](/python-sdk) repository. |
30+
| Import a generator | Upload a [configured generator](/generators/export-import#export-a-generator) file. |
2631

27-
<Image src="/docimages/generators/train/01-click-new-generator.webp" alt="MOSTLY AI - Generators page - Click New generator" width={800} height={300} />
28-
**Step result**: You now have a generator object created in the MOSTLY AI database and the generator is listed on the **Generators** page.
32+
3. After selecting your training method and uploading any required files, click **Configure models**.
33+
4. Each connected or uploaded table supports its own configuration. Expand each table description to customize model behavior.
2934

30-
The **Add data** window appears prompting you to add tabular data for your generator to train on.
35+
| Setting | Description |
36+
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
37+
| Model | The model your generator uses to create synthetic data. |
38+
| Compute | The [compute resources](/administration/compute) used to train the generator. |
39+
| Training parameters | The model-level parameters that control the training process. Each parameter is defined by a tooltip in the platform. |
40+
| Differential privacy | Use differential privacy when you need a mathematical guarantee of privacy, with epsilon quantifying the upper bound on an individual's impact on the trained model. |
41+
| Flexible generation | Enabled by default, flexible generation lets you apply smart imputation, data rebalancing, seeded generation, and fairness when you generate synthetic datasets with the model. |
42+
| Value protection | Value protection prevents membership inference by replacing rare categories and removing extreme values from your dataset. |
43+
| Model report | Enabled by default, the Model report provides metrics and charts to gauge the quality of a model. The metrics cover accuracy, similarity, and distances between original and synthetic samples. The charts compare correlations and univariate and bivariate distributions between the original and synthetic data. |
3144

32-
2. Add a table with file upload.
33-
1. From the **Add data** window, click **Upload file**.
34-
<Callout>You can also add data from a database or cloud bucket when you select **Connect to a source**. To learn more, see [Add data from a database](/generators/configure/add-data#add-data-from-a-database) and [Add data from a cloud storage bucket](/generators/configure/add-data#add-data-from-a-cloud-bucket).</Callout>
35-
<Image src="/docimages/generators/train/02-select-upload-file.webp" alt="MOSTLY AI - Generators page - Click Upload file" width={800} height={300} />
36-
2. Under **Upload file**, drag a local file onto the box or click the box to browse your local file system.<br />
37-
<Callout>If you need a dataset, download one from the [Datasets](/datasets) page.</Callout>
38-
<Image src="/docimages/generators/train/03-upload-file-drag-browse.webp" alt="MOSTLY AI - Generators page - Drag to upload or click to browse" width={800} height={300} />
39-
3. (Optional) Enter a name for the table.<br />
40-
The table name appears in the list of tables added to the generator. Also, the table name that you provide is what appears in each generated synthetic dataset.
41-
<Image src="/docimages/generators/train/04-enter-table-name.webp" alt="MOSTLY AI - Generators page - Name the new table" width={575} height={300} />
42-
4. Click **Proceed**.
43-
3. Train the generator.
44-
1. Click **Configure models** in the upper right.
45-
<Image src="/docimages/generators/train/05-click-configure-models.webp" alt="MOSTLY AI - Data configuration - Click Configure models" width={800} height={300} />
46-
2. On the **Model configuration** page, click **Start training**.
47-
<Image src="/docimages/generators/train/06-click-start-training.webp" alt="MOSTLY AI - Model configuration - click Start training" width={800} height={300} />
45+
{' '}
4846

49-
**Result**
47+
<Callout>MOSTLY AI offers three training Presets in the **Model configuration** section header if you don't want to configure individual parameters: **Accuracy**, **Speed**, and **Turbo**.</Callout>
5048

51-
Your generator now starts training. When the training completes, your generator is ready to generate synthetic data.
49+
5. In the **Model configuration** section header, you can optionally set **Random State**, a seed value that ensures reproducible results during training. If left empty, a random seed is used each time.
50+
6. After completing configuration, click **Start training** to begin the training process.
51+
52+
Follow progress in the **Training status** section on the generator page.
5253

5354
</Tabs.Tab>
5455
<Tabs.Tab>
55-
Use the code example below to train a new generator with the MOSTLY AI Synthetic Data SDK.
5656

57-
```python copy filename="python"
57+
1. Start a new chat with the Assistant by clicking **New chat** in the left-side navigation menu.
58+
2. Prompt the Assistant to connect to a configured [dataset](/datasets) or upload a dataset file into the Assistant workspace.
59+
60+
```bash
61+
Connect to the Berka dataset and briefly describe this resource.
62+
```
63+
64+
3. Prompt the Assistant to create a generator with the defined resource.
65+
66+
```bash
67+
Configure a generator that will produce data which follows the statistical patterns of the least active accounts in the dataset.
68+
```
69+
70+
</Tabs.Tab>
71+
<Tabs.Tab>
72+
73+
1. [Install the MOSTLY AI Synthetic Data SDK](/python-sdk#installation).<br /><br />
74+
You can install and use the SDK in **Local** or **Client** mode.
75+
76+
- In **Local mode**, you use the SDK with the compute resources on your local machine (or any Python environment) to [train generators](/generators/train) and [create synthetic datasets](/synthetic-datasets/generate).
77+
- In **Client mode**, you connect to a remote MOSTLY AI Platform instance and use its available compute resources.
78+
For details, see [_Local and Client modes_](/python-sdk#local-and-client-modes).<br /><br />
79+
80+
2. Create your first generator using the [US Census Income](/datasets#us-census-income-dataset) dataset, start its training, and wait for it to finish.
81+
82+
<Tabs items={['Local mode', 'Client mode']}>
83+
<Tabs.Tab>
84+
85+
```python copy filename="python" {7}
# 1. Load original data into a pd.DataFrame
5886
import pandas as pd
87+
df = pd.read_csv("https://docs.mostly.ai/datasets/us-census-income.csv.gz")
88+
89+
# 2. Instantiate in Local mode
5990
from mostlyai.sdk import MostlyAI
60-
mostly = MostlyAI(api_key="INSERT_API_KEY")
91+
mostly = MostlyAI(local=True)
92+
93+
# 3. Create a generator and launch its training
94+
g = mostly.train(data=df, start=True, wait=True)
95+
```
96+
97+
</Tabs.Tab>
98+
<Tabs.Tab>
99+
100+
```python copy filename="python" {7}
101+
# 1. Load original data into a pd.DataFrame
102+
import pandas as pd
61103
df = pd.read_csv("https://docs.mostly.ai/datasets/us-census-income.csv.gz")
62-
g = mostly.train(data=df, name="US Census Income")
104+
105+
# 2. Instantiate in Client mode by connecting to a remote Platform instance
106+
from mostlyai.sdk import MostlyAI
107+
mostly_remote = MostlyAI(base_url="https://app.mostly.ai", api_key="INSERT_API_KEY")
108+
109+
# 3. Create a generator and launch its training
110+
g = mostly_remote.train(data=df, start=True, wait=True)
63111
```
64112

65113
</Tabs.Tab>
66114
</Tabs>
115+
116+
</Tabs.Tab>
117+
</Tabs>
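
As a rough companion to the model configuration settings listed in the Platform tab of `pages/generators/train.mdx`, the sketch below assembles a per-table configuration as a plain Python dictionary. Nothing here comes from this commit: the helper `build_generator_config` is hypothetical, and the key names (`tabular_model_configuration`, `model`, `max_training_time`, `value_protection`, `enable_flexible_generation`, `enable_model_report`, `differential_privacy`, `max_epsilon`) and the model name are assumptions about the SDK's generator config schema that should be checked against the SDK reference.

```python
# Hypothetical helper: builds a generator config dict without calling the SDK.
# All key names below are ASSUMPTIONS about the SDK's config schema.
def build_generator_config(name, table_name, data, max_training_time=10, epsilon=None):
    model_config = {
        "model": "MOSTLY_AI/Medium",               # assumed model identifier
        "max_training_time": max_training_time,    # assumed to be in minutes
        "value_protection": True,                  # "Value protection" setting
        "enable_flexible_generation": True,        # "Flexible generation" setting
        "enable_model_report": True,               # "Model report" setting
    }
    # Only add differential privacy when an epsilon bound is requested.
    if epsilon is not None:
        model_config["differential_privacy"] = {"max_epsilon": epsilon}
    return {
        "name": name,
        "tables": [
            {
                "name": table_name,
                "data": data,  # e.g. a pandas DataFrame; None in this sketch
                "tabular_model_configuration": model_config,
            }
        ],
    }


config = build_generator_config("US Census Income", "census", data=None, epsilon=5.0)
print(sorted(config["tables"][0]["tabular_model_configuration"]))
```

If the schema matches, a dictionary like this could presumably be passed to the SDK as `mostly.train(config=config, start=True, wait=True)`, with `data` set to the loaded DataFrame instead of `None`.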

pages/quick-start/connector-quickstart.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ Follow these instructions to create a new connector. You can share connectors wi
1717
<Steps>
1818
## Step 1: Create a connector
1919

20-
<Tabs items={['Manual', 'Assistant', 'Synthetic Data SDK']}>
20+
<Tabs items={['Platform', 'Assistant', 'Synthetic Data SDK']}>
2121
<Tabs.Tab>
2222

2323
1. On the MOSTLY AI platform, open **Connectors** from the left-side navigation menu.

pages/quick-start/data-consumers.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ Use a [generator](/generators) to create synthetic data based on your requirement
1818
<Steps>
1919
## Step 1: Generate synthetic data
2020

21-
<Tabs items={['Manual', 'Assistant', 'Synthetic Data SDK']}>
21+
<Tabs items={['Platform', 'Assistant', 'Synthetic Data SDK']}>
2222
<Tabs.Tab>
2323
1. Navigate to the generator that you wish to use by clicking **Generators** in the left-side navigation menu and selecting from the available generators.
2424
2. On the generator page, click **Generate data**.

pages/quick-start/dataset-quickstart.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ Follow these instructions to create a new dataset. You can share datasets with o
1818
<Steps>
1919
## Step 1: Create a dataset
2020

21-
<Tabs items={['Manual', 'Assistant']}>
21+
<Tabs items={['Platform', 'Assistant']}>
2222
<Tabs.Tab>
2323

2424
1. On the MOSTLY AI platform, open **Datasets** from the left-side navigation menu.

pages/quick-start/model-creators.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ Follow these instructions to create a new generator. You can transfer the genera
1818
<Steps>
1919
## Step 1: Train a generator
2020

21-
<Tabs items={['Manual', 'Assistant', 'Synthetic Data SDK']}>
21+
<Tabs items={['Platform', 'Assistant', 'Synthetic Data SDK']}>
2222
<Tabs.Tab>
2323

2424
1. On the MOSTLY AI platform, open **Generators** from the left-side navigation menu.
