```
Claude response: A JSON schema is a declarative format for describing the struct
```
This is how we access the LLM's responses.
### 3. Using the System Prompt to Guide the Output Format
We now want to obtain more structured outputs, for instance a response that contains a topic, citations and a short answer. So we pose the same question on the JSON schema, but this time we provide the model with an example of the desired output format directly in the prompt, separately including an example Python dictionary, a JSON string and a YAML string.
For the model to output a Python dictionary, we include the following example in the prompt:
1. Python Dictionary:
```python
example_dictionary = {
"answer": "The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed."
}
```
Dictionaries are native Python data structures. They are easy to work with in Python code, but not easily shared with other programming languages. A more portable format is:
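To make this concrete, here is a quick round trip (with a shortened answer for brevity): the native dictionary serializes to a language-independent JSON string, and parsing that string back recovers an equivalent dictionary on the other side.

```python
import json

# Shortened version of the example dictionary above.
example_dictionary = {
    "topic": "zip format",
    "citations": [{"citation_number": 1, "source": "https://example.com"}],
    "answer": "The .zip format groups multiple files into a single archive.",
}

# Serialize the native dict to a language-independent JSON string...
json_string = json.dumps(example_dictionary)

# ...and parse it back into an equivalent dictionary.
assert json.loads(json_string) == example_dictionary
```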
2. JSON (JavaScript Object Notation):
```python
example_json_string = '{"topic": "zip format", "citations": [{"citation_number": 1, "source": "https://example.com"}], "answer": "The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed."}'
```
JSON is easy for humans to read and write, and easy for machines to parse and generate. It's language-independent and widely used for API responses and configuration files. An even more human-friendly format is:
3. YAML (YAML Ain't Markup Language):
```python
citations:
answer: The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed.
"""
```
YAML is a data serialization standard, often used for configuration files and in applications where data is stored or transmitted. It is more readable than JSON for complex structures, but can be prone to errors due to its reliance on indentation.
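A small illustration of that indentation sensitivity, using pyyaml's `safe_load` (the dedented `source` line in the second snippet is a deliberate mistake): moving one line to the left margin silently changes the parsed structure instead of raising an error.

```python
import yaml  # pyyaml

correct = """\
topic: zip format
citations:
  - citation_number: 1
    source: https://example.com
answer: A compressed archive format.
"""
parsed = yaml.safe_load(correct)
# 'source' lands inside the citation entry, as intended.
assert parsed["citations"][0]["source"] == "https://example.com"

# Dedent 'source' to the left margin and it silently becomes a top-level key:
dedented = """\
topic: zip format
citations:
  - citation_number: 1
source: https://example.com
answer: A compressed archive format.
"""
parsed_bad = yaml.safe_load(dedented)
assert "source" not in parsed_bad["citations"][0]
assert parsed_bad["source"] == "https://example.com"
```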
Now, let's use these three examples in our prompts.
Anthropic's models are trained to receive instructions through the system prompt, which is a message that is prepended to the user's message and sent along with it to help guide the model's response. In order to access the system prompt, we use the `system` argument in the `messages.create` function.
```python
response_list = []
```

This will give us the following output in a Python dictionary format:

```python
"answer": "A JSON schema is a declarative language that allows you to annotate and validate JSON documents, defining the structure, constraints, and documentation of JSON data."
334
332
}
335
333
```
Here's the answer from the JSON string format:
```json
{
}
```
And this is the answer from the YAML string format:
```yaml
topic: JSON schema
citations:
answer: A JSON schema is a declarative language that allows you to annotate and validate JSON documents, defining the structure, constraints, and documentation of JSON data.
```
Parsing each of these responses demonstrates how we can work with different formats. Note that executing generated Python code can be risky in general. We're only doing this for illustrative purposes, and with a model as safe and reliable as Claude 3.5 Sonnet. In real-world applications, always validate and sanitize any data before processing.
```python
import json
for response, format_type in zip(response_list, ['dict', 'json', 'yaml']):
print(f"\nFailed to parse {format_type.upper()} response")
```
This code shows how to parse each type of response. For the Python dictionary, we use `ast.literal_eval()`. For JSON, we use the built-in `json` module, and for YAML, we use the `pyyaml` library's `safe_load()` function.
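The same parsing logic can be exercised in isolation with hardcoded stand-ins for the three responses (the strings below are shortened illustrations, not actual model output):

```python
import ast
import json

import yaml  # pyyaml

# Hardcoded stand-ins for the three model responses.
dict_response = "{'topic': 'JSON schema', 'answer': 'A declarative language for validating JSON documents.'}"
json_response = '{"topic": "JSON schema", "answer": "A declarative language for validating JSON documents."}'
yaml_response = "topic: JSON schema\nanswer: A declarative language for validating JSON documents.\n"

# ast.literal_eval evaluates a Python literal without executing arbitrary code.
parsed_from_dict = ast.literal_eval(dict_response)
parsed_from_json = json.loads(json_response)
parsed_from_yaml = yaml.safe_load(yaml_response)

# All three formats decode to the same Python dictionary.
assert parsed_from_dict == parsed_from_json == parsed_from_yaml
```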
With these different formats and parsing methods, structured generation produces outputs that are not only human-readable but also easily machine-processable. This flexibility is what allows us to integrate LLM outputs into broader workflows.
As a final example, we will process a larger list of file formats, create a dictionary of their responses indexed by the topic with answers as values, and save it to disk as a JSON file.
```python
with open('file_formats_info.json', 'w') as f:
print("File format information has been saved to file_formats_info.json")
```
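End to end, the loop just described can be sketched as follows. The `ask_claude` helper below is a hypothetical stand-in for the actual `messages.create` call, returning canned answers so the sketch runs offline:

```python
import json

def ask_claude(file_format: str) -> dict:
    """Hypothetical stand-in for the API call; returns a canned response."""
    return {
        "topic": file_format,
        "answer": f"Placeholder description of the {file_format} format.",
    }

file_formats = [".zip", ".tar", ".csv"]

# Index each response by its topic, keeping the answer as the value.
file_formats_info = {}
for file_format in file_formats:
    response = ask_claude(file_format)
    file_formats_info[response["topic"]] = response["answer"]

with open("file_formats_info.json", "w") as f:
    json.dump(file_formats_info, f, indent=2)

print("File format information has been saved to file_formats_info.json")
```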
Overall, for simple tasks like the one above, prompt engineering proves an efficient strategy to guide the model towards the desired output format.
## Conclusion
In this tutorial, we've presented a practical implementation of structured generation via prompt engineering. Using Anthropic's Claude API, we have successfully steered LLM outputs into predefined formats (Python dictionaries, JSON, and YAML).
Key takeaways:
1. Structured generation addresses the challenge of inconsistent LLM outputs, a crucial step for integrating AI into enterprise systems.
2. Combining prompt engineering with structured generation techniques offers fine-grained control over model responses.
3. Different output formats (Python dict, JSON, YAML) cater to various use cases and system integration needs.
4. Parsing and processing structured outputs enables seamless incorporation into broader workflows.
As we advance in this series, we'll look into more sophisticated techniques like assistant response prefilling and function calling, and we'll leverage Pydantic for schema generation and validation. These methods will further refine our ability to harness the full potential of AI-generated content while maintaining the consistency and reliability required for production-grade systems.
## Learning More
Here are some valuable resources on structured generation and related topics:
1. **[Anthropic Cookbook](https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/extracting_structured_json.ipynb)**: Explore practical examples of using Claude for structured JSON data extraction. The cookbook covers tasks like summarization, entity extraction, and sentiment analysis, from basic implementations to advanced use cases.
2. **[Instructor Library](https://jxnl.github.io/instructor/)**: A powerful tool for generating structured outputs with LLMs, built on top of Pydantic.
3. **[Outlines Library](https://github.com/dottxt-ai/outlines)**: An open-source implementation for structured generation with multiple model integrations.