pages/generators/train.mdx
+80 -29 (80 additions & 29 deletions)
Original file line number
Diff line number
Diff line change
@@ -16,51 +16,102 @@ You can quickly train a new generator with a single tabular data file.
16
16
17
17
<CustomCallout>You can also train generators with two-table and multi-table datasets. For more information, see [Set table relationships](/generators/configure/set-table-relationships).</CustomCallout>
18
18
19
-
<Tabs items={['UI', 'Synthetic Data SDK']}>
19
+
<Tabs items={['Platform', 'Assistant', 'Synthetic Data SDK']}>
20
20
<Tabs.Tab>
21
-
If you use the web application, you can start the training of a new generator from the **Generators** page.
22
21
23
-
**Steps**
22
+
1. On the MOSTLY AI platform, open **Generators** from the left-side navigation menu.
23
+
2. There are four ways to create a new generator:
24
24
25
-
1. On the **Generators** page, click **+ New generator**.
| Model | The model your generator uses to create synthetic data. |
38
+
| Compute | The [compute resources](/administration/compute) used to train the generator. |
39
+
| Training parameters | The model-level parameters that control the training process. Each parameter is explained in a tooltip in the platform. |
40
+
| Differential privacy | Use differential privacy when you need a mathematical guarantee of privacy, with epsilon quantifying the upper bound on an individual's impact on the trained model. |
41
+
| Flexible generation | Enabled by default, flexible generation lets you apply smart imputation, data rebalancing, seeded generation, and fairness when you generate synthetic datasets with the model. |
42
+
| Value protection | Value protection prevents membership inference by replacing rare categories and removing extreme values from your dataset. |
43
+
| Model report | Enabled by default, the Model report provides metrics and charts to gauge the quality of a model. The metrics include accuracy, similarity, and distances between original and synthetic samples. The charts cover correlations and univariate and bivariate distributions, so that you can compare the original and synthetic data. |
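
The Value protection row above describes two operations: replacing rare categories and removing extreme values. As a rough illustration only, the following sketch shows what such steps could look like on plain Python lists. It is not MOSTLY AI's implementation; the function names, the `min_count` threshold, the `_RARE_` placeholder, and the `k` parameter are all hypothetical.

```python
from collections import Counter


def protect_rare_categories(values, min_count=5, placeholder="_RARE_"):
    """Replace categories that occur fewer than min_count times (hypothetical rule)."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else placeholder for v in values]


def drop_extremes(values, k=1):
    """Drop the k smallest and k largest values, keeping the original order."""
    # Indices of the values, sorted by value.
    order = sorted(range(len(values)), key=values.__getitem__)
    # Collect the positions of the k lowest and k highest values.
    drop = set(order[:k]) | set(order[len(order) - k:])
    return [v for i, v in enumerate(values) if i not in drop]


if __name__ == "__main__":
    categories = ["a"] * 6 + ["b"] * 5 + ["c"]
    print(protect_rare_categories(categories))
    print(drop_extremes([10, 12, 11, 5000, 9, 13, -700], k=1))
```

The idea is that a category or value that only one or a few records carry is the easiest to trace back to an individual, so it is masked or removed before training.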
31
44
32
-
2. Add a table with file upload.
33
-
1. From the **Add data** window, click **Upload file**.
34
-
<Callout>You can also add data from a database or cloud bucket when you select **Connect to a source**. To learn more, see [Add data from a database](/generators/configure/add-data#add-data-from-a-database) and [Add data from a cloud storage bucket](/generators/configure/add-data#add-data-from-a-cloud-bucket).</Callout>
35
-
<Image src="/docimages/generators/train/02-select-upload-file.webp" alt="MOSTLY AI - Generators page - Click Upload file" width={800} height={300} />
36
-
2. Under **Upload file**, drag a local file onto the box or click the box to browse your local file system.<br />
37
-
<Callout>If you need a dataset, download one from the [Datasets](/datasets) page.</Callout>
38
-
<Image src="/docimages/generators/train/03-upload-file-drag-browse.webp" alt="MOSTLY AI - Generators page - Drag to upload or click to browse" width={800} height={300} />
39
-
3. (Optional) Enter a name for the table.<br />
40
-
The table name appears in the list of tables added to the generator. Also, the table name that you provide is what appears in each generated synthetic dataset.
41
-
<Image src="/docimages/generators/train/04-enter-table-name.webp" alt="MOSTLY AI - Generators page - Name the new table" width={575} height={300} />
42
-
4. Click **Proceed**.
43
-
3. Train the generator.
44
-
1. Click **Configure models** in the upper right.
45
-
<Image src="/docimages/generators/train/05-click-configure-models.webp" alt="MOSTLY AI - Data configuration - Click Configure models" width={800} height={300} />
46
-
2. On the **Model configuration** page, click **Start training**.
47
-
<Image src="/docimages/generators/train/06-click-start-training.webp" alt="MOSTLY AI - Model configuration - click Start training" width={800} height={300} />
45
+
{''}
48
46
49
-
**Result**
47
+
<Callout>If you don't want to configure individual parameters, choose one of the three training presets in the **Model configuration** section header: **Accuracy**, **Speed**, or **Turbo**.</Callout>
50
48
51
-
Your generator now starts training. When the training completes, your generator is ready to generate synthetic data.
49
+
5. In the **Model configuration** section header, you can optionally set **Random State**, a seed value that makes training results reproducible. If you leave it empty, a random seed is used each time.
50
+
6. After completing configuration, click **Start training** to begin the training process.
51
+
52
+
Follow progress in the **Training status** section on the generator page.
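
The effect of a seed such as **Random State** can be illustrated with a generic sketch that uses only Python's standard library. It does not call the MOSTLY AI platform or SDK; `simulated_training_run` and its `random_state` argument are hypothetical stand-ins for any seeded process.

```python
import random


def simulated_training_run(random_state=None):
    """Return a few pseudo-random draws that stand in for a stochastic training run."""
    # With random_state=None, Python seeds the generator from system entropy,
    # so each call produces different draws.
    rng = random.Random(random_state)
    return [round(rng.random(), 6) for _ in range(3)]


if __name__ == "__main__":
    # Same seed: identical draws on every run.
    print(simulated_training_run(42) == simulated_training_run(42))
    # No seed: draws vary from call to call.
    print(simulated_training_run(), simulated_training_run())
```

Fixing the seed is useful when you want to compare two configurations and rule out random variation as the cause of any difference.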
52
53
53
54
</Tabs.Tab>
54
55
<Tabs.Tab>
55
-
Use the code example below to train a new generator with the MOSTLY AI Synthetic Data SDK.
56
56
57
-
```python copy filename="python"
57
+
1. Start a new chat with the Assistant by clicking **New chat** in the left-side navigation menu.
58
+
2. Prompt the Assistant to connect to a configured [dataset](/datasets) or upload a dataset file into the Assistant workspace.
59
+
60
+
```bash
61
+
Connect to the Berka dataset and briefly describe this resource.
62
+
```
63
+
64
+
3. Prompt the Assistant to create a generator with the defined resource.
65
+
66
+
```bash
67
+
Configure a generator that will produce data which follows the statistical patterns of the least active accounts in the dataset.
68
+
```
69
+
70
+
</Tabs.Tab>
71
+
<Tabs.Tab>
72
+
73
+
1. [Install the MOSTLY AI Synthetic Data SDK](/python-sdk#installation).<br /><br />
74
+
You can install and use the SDK in **Local** or **Client** mode.
75
+
76
+
- In **Local mode**, you use the SDK with the compute resources on your local machine (or any Python environment) to [train generators](/generators/train) and [create synthetic datasets](/synthetic-datasets/generate).
77
+
- In **Client mode**, you connect to a remote MOSTLY AI Platform instance and use its available compute resources.
78
+
For details, see [_Local and Client modes_](/python-sdk#local-and-client-modes).<br /><br />
79
+
80
+
2. Create your first generator using the [US Census Income](/datasets#us-census-income-dataset) dataset, start its training, and wait for it to finish.
81
+
82
+
<Tabs items={['Local mode', 'Client mode']}>
83
+
<Tabs.Tab>
84
+
85
+
```python copy filename="python" {7} # 1. Load original data into a pd.DataFrame
pages/quick-start/data-consumers.mdx
+1 -1 (1 addition & 1 deletion)
Original file line number
Diff line number
Diff line change
@@ -18,7 +18,7 @@ Use a [generator](/generators) to create synthetic data based on your requirement
18
18
<Steps>
19
19
## Step 1: Generate synthetic data
20
20
21
-
<Tabs items={['Manual', 'Assistant', 'Synthetic Data SDK']}>
21
+
<Tabs items={['Platform', 'Assistant', 'Synthetic Data SDK']}>
22
22
<Tabs.Tab>
23
23
1. Navigate to the generator that you wish to use by clicking **Generators** in the left-side navigation menu and selecting from the available generators.
24
24
2. On the generator page, click **Generate data**.