
Commit f28f30c

Modified Gentle Intro (Blog 1)
1 parent 6acdcdd commit f28f30c

1 file changed (+34, -54 lines)

tribe_blog/1_gentle_intro.md

Lines changed: 34 additions & 54 deletions
````diff
@@ -267,15 +267,10 @@ Claude response: A JSON schema is a declarative format for describing the struct
 ```
 This is how we access the LLM's responses.
 ### 3. Using the System Prompt to Guide the Output Format
-We now want to obtain more structured outputs.
-We will pose the same question on the JSON schema, but this time we will provide the model with an example of the desired output format directly in the prompt, separately including an example Python dictionary, a JSON string and a YAML string.
+We now want to obtain more structured outputs, for instance
+a response that contains a topic, citations and a short answer. So we pose the same question on the JSON schema, but this time we provide the model with an example of the desired output format directly in the prompt, separately including an example Python dictionary, a JSON string and a YAML string.
 
-Anthropic's models are trained to receive instructions through the system prompt, which is a message that is prepended to the user's message and sent along with it to help guide the model's response. In order to access the system prompt, we use the `system` argument in the `messages.create` function.
-
-
-Imagine we wanted our structured response to contain a topic, citations and a short answer.
-
-For the model to be able and output a Python dictionary, we provide the following example:
+For the model to be able to output a Python dictionary, we include the following example in the prompt:
 1. Python Dictionary:
 ```python
 example_dictionary = {
````
````diff
@@ -284,13 +279,13 @@ example_dictionary = {
 "answer": "The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed."
 }
 ```
-Dictionaries are native Python data structures. They are easy to work with in Python code but not easily interchangeable with other programming languages. A more exchangeable format is:
+Dictionaries are native Python data structures. They are easy to work with in Python, but not easily interchangeable with other programming languages. A more exchangeable format is:
 
 2. JSON (JavaScript Object Notation):
 ```python
 example_json_string = '{"topic": "zip format", "citations": [{"citation_number": 1, "source": "https://example.com"}], "answer": "The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed."}'
 ```
-JSON is easy for humans to read and write, and easy for machines to parse and generate. It's language-independent and widely used for API responses and configuration files. Finally, a more human-friendly format is:
+JSON is easy for humans to read and write, and easy for machines to parse and generate. It's language-independent and widely used for API responses and configuration files. An even more human-friendly format is:
 
 3. YAML (YAML Ain't Markup Language):
 ```python
````
````diff
@@ -301,9 +296,12 @@ citations:
 answer: The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed.
 """
 ```
-YAML is data serialization standard, often used for configuration files and in applications where data is stored or transmitted. It is more readable than JSON for complex structures but can be more prone to errors due to its reliance on indentation.
+YAML is a data serialization standard, often used for configuration files and in applications where data is stored or transmitted. It is more readable than JSON for complex structures, but can be prone to errors due to its reliance on indentation.
 
-Now, let's use these examples in our prompts:
+Now, let's use these three examples in our prompts.
+
+
+Anthropic's models are trained to receive instructions through the system prompt, which is a message that is prepended to the user's message and sent along with it to help guide the model's response. To set the system prompt, we pass the `system` argument to the `messages.create` method.
 
 ```python
 response_list = []
````
````diff
@@ -333,7 +331,7 @@ This will give us the following output in a Python dictionary format:
 "answer": "A JSON schema is a declarative language that allows you to annotate and validate JSON documents, defining the structure, constraints, and documentation of JSON data."
 }
 ```
-Here's the answer from the json string format:
+Here's the answer from the JSON string format:
 
 ```json
 {
````
````diff
@@ -348,7 +346,7 @@ Here's the answer from the json string format:
 }
 ```
 
-And this is the answer from the yaml string format:
+And this is the answer from the YAML string format:
 
 ```yaml
 topic: JSON schema
````
````diff
@@ -360,7 +358,7 @@ citations:
 answer: A JSON schema is a declarative language that allows you to annotate and validate JSON documents, defining the structure, constraints, and documentation of JSON data.
 ```
 
-Now, let's parse each of these responses to demonstrate how we can work with different formats. It's important to note that executing generated Python code can be risky in general. We're only doing this for a simple example and with a safe and reliable model like Claude 3.5 Sonnet. In real-world applications, always validate and sanitize any data before processing.
+Parsing each of these responses demonstrates how we can work with different formats. Note that executing generated Python code can be risky in general. We're only doing this for illustrative purposes, and with a model as safe and reliable as Claude 3.5 Sonnet. In real-world applications, always validate and sanitize any data before processing.
 
 ```python
 import json
````
````diff
@@ -392,9 +390,9 @@ for response, format_type in zip(response_list, ['dict', 'json', 'yaml']):
 print(f"\nFailed to parse {format_type.upper()} response")
 ```
 
-This code demonstrates how to parse each type of response. For the Python dictionary, we use `ast.literal_eval()`. For JSON, we use the built-in `json` module, and for YAML, we use the `pyyaml` library's `safe_load()` function.
+This code shows how to parse each type of response. For the Python dictionary, we use `ast.literal_eval()`. For JSON, we use the built-in `json` module, and for YAML, we use the `pyyaml` library's `safe_load()` function.
 
-By using these different formats and parsing methods, we can see how structured generation can produce outputs that are not only human-readable but also easily processable by machines. This flexibility is what allows us to integrate LLM outputs into broader workflows.
+By using these different formats and parsing methods, structured generation produces outputs that are not only human-readable, but also easily processable by machines. This flexibility is what allows us to integrate LLM outputs into broader workflows.
 
 As a final example, we will process a larger list of file formats, create a dictionary of their responses indexed by the topic with answers as values, and save it to disk as a JSON file.
 
````
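The parsing approach described in the tutorial text above (`ast.literal_eval()` for the dictionary, the `json` module for JSON, PyYAML's `safe_load()` for YAML), together with its advice to validate model output and its final topic-indexed example, can be sketched as follows. This is a minimal sketch under stated assumptions, not the tutorial's own code: the helper names (`strip_code_fences`, `validate`, `parse_response`), the fence-stripping step, the required-field check (modelled on the `topic`/`citations`/`answer` example format) and the sample responses are all hypothetical. PyYAML is imported lazily so the dictionary and JSON paths run without it.

```python
import ast
import json

# Assumed schema, mirroring the example output format (topic, citations, answer).
REQUIRED_FIELDS = {"topic": str, "citations": list, "answer": str}


def strip_code_fences(text: str) -> str:
    """Remove a surrounding ```lang ... ``` fence, if the model added one."""
    text = text.strip()
    if text.startswith("```"):
        lines = text.splitlines()[1:]  # drop the opening fence line, e.g. ```json
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return text


def validate(data) -> dict:
    """Reject anything that is not a mapping with the expected fields and types."""
    if not isinstance(data, dict):
        raise ValueError("response is not a mapping")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or malformed field: {field!r}")
    return data


def parse_response(text: str, format_type: str) -> dict:
    """Parse a response in 'dict', 'json' or 'yaml' format, then validate it."""
    text = strip_code_fences(text)
    if format_type == "dict":
        data = ast.literal_eval(text)  # accepts literals only; never calls eval()
    elif format_type == "json":
        data = json.loads(text)
    elif format_type == "yaml":
        import yaml  # PyYAML, imported lazily so the other formats work without it

        data = yaml.safe_load(text)
    else:
        raise ValueError(f"unknown format: {format_type!r}")
    return validate(data)


# Hypothetical responses standing in for the model's output.
responses = [
    ("{'topic': 'zip format', 'citations': [], 'answer': 'An archive format.'}", "dict"),
    ('```json\n{"topic": "JSON schema", "citations": [], "answer": "A vocabulary for annotating and validating JSON documents."}\n```', "json"),
]

# Index answers by topic, as in the final example, and serialize them.
answers_by_topic = {}
for text, format_type in responses:
    parsed = parse_response(text, format_type)
    answers_by_topic[parsed["topic"]] = parsed["answer"]

serialized = json.dumps(answers_by_topic, indent=2)
```

Writing `serialized` to `file_formats_info.json` (or calling `json.dump` on an open file) completes the final example. Raising `ValueError` on a malformed field, rather than silently accepting it, makes it explicit which responses failed validation.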
````diff
@@ -428,58 +426,40 @@ with open('file_formats_info.json', 'w') as f:
 print("File format information has been saved to file_formats_info.json")
 ```
 
-Overall, for simple tasks like the one above, prompt engineering is an efficient strategyto guide the model towards the desired output format.
+Overall, for simple tasks like the one above, prompt engineering proves an efficient strategy to guide the model towards the desired output format.
 
 ## Conclusion
 
-In this tutorial, we've demonstrated a practical implementation of structured generation via prompt engineering. Using Anthropic's Claude API, we have successfully steered LLM outputs into predefined formats (Python dictionaries, JSON, and YAML), which enhance the reliability and usability of AI-generated content in various applications.
+In this tutorial, we've presented a practical implementation of structured generation via prompt engineering. Using Anthropic's Claude API, we have successfully steered LLM outputs into predefined formats (Python dictionaries, JSON, and YAML).
 
 Key takeaways:
-1. Structured generation addresses the challenge of inconsistent LLM outputs, crucial for integrating AI into enterprise systems.
+1. Structured generation addresses the challenge of inconsistent LLM outputs, a crucial step for integrating AI into enterprise systems.
 2. Combining prompt engineering with structured generation techniques offers fine-grained control over model responses.
-3. Different output formats (Python dict, JSON, YAML) cater to various use cases and integration needs.
-4. Parsing and processing structured outputs enables seamless incorporation into existing workflows.
+3. Different output formats (Python dict, JSON, YAML) cater to various use cases and system integration needs.
+4. Parsing and processing structured outputs enables seamless incorporation into broader workflows.
 
-As we advance in this series, we'll look deeper into more sophisticated techniques like assistant response prefilling, function calling, and leveraging Pydantic for schema generation and validation. These advanced methods will further refine our ability harness the full potential of LLMs while maintaining the consistency and reliability required for production-grade systems.
-
-If you found this tutorial helpful, please consider showing your support:
-
-1. Star our GitHub repository: [StructuredGenTutorial](https://github.com/HK3-Lab-Team/StructuredGenTutorial)
-2. Stay tuned for our next blog post on [tribe.ai/blog](https://www.tribe.ai/blog)
-3. Follow us on Twitter:
-- [@hyp_enri](https://twitter.com/hyp_enri)
-- [@cyndesama](https://twitter.com/cyndesama)
-
-We're excited to continue the next chapter of this series on Schema Engineering, explaining advanced techniques like assistant response prefilling, function calling, and integrating Pydantic for more complex structured generation tasks.
+As we advance in this series, we'll look into more sophisticated techniques such as assistant response prefilling, function calling, and using Pydantic for schema generation and validation. These methods will further refine our ability to harness the full potential of AI-generated content, while maintaining the consistency and reliability required for production-grade systems.
 
 
+## Learning More
 
+Here are some valuable resources on structured generation and related topics:
 
+1. **[Anthropic Cookbook](https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/extracting_structured_json.ipynb)**: Explore practical examples of using Claude for structured JSON data extraction. The cookbook covers tasks like summarization, entity extraction, and sentiment analysis, from basic implementations to advanced use cases.
 
-## Learning More
+2. **[Instructor Library](https://jxnl.github.io/instructor/)**: A powerful tool for generating structured outputs with LLMs, built on top of Pydantic.
 
-Here are some valuable resources to deepen your understanding of structured generation and related topics:
+3. **[Outlines Library](https://github.com/dottxt-ai/outlines)**: An open-source implementation for structured generation with multiple model integrations.
 
-1. **Anthropic Cookbook**: Explore practical examples of using Claude for structured JSON data extraction. The cookbook covers tasks like summarization, entity extraction, and sentiment analysis:
-[Extracting Structured JSON with Claude](https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/extracting_structured_json.ipynb)
 
-2. **Instructor Library**: A powerful tool for generating structured outputs with LLMs, built on top of Pydantic:
-[Instructor Documentation](https://jxnl.github.io/instructor/)
-
-Key features:
-- Support for multiple LLM models (GPT-3.5, GPT-4, GPT-4-Vision, Mistral/Mixtral, Anyscale, Ollama, llama-cpp-python)
-- Simplifies management of validation context and retries
-- Enables streaming of Lists and Partial responses
+4. **[Pydantic Documentation](https://docs.pydantic.dev/)**: For in-depth information on schema generation and validation.
 
-3. **[Outlines Library](https://github.com/dottxt-ai/outlines)**: An open-source implementation for structured generation with notable features:
-- Multiple model integrations (OpenAI, transformers, llama.cpp, exllama2, mamba)
-- Prompting primitives based on Jinja templating
-- Support for multiple choices, type constraints, and dynamic stopping
-- Fast regex and JSON generation
-- Grammar-structured generation
-- Python function interleaving with completions
-- Caching and batch inference capabilities
+___
 
-4. **[Pydantic Documentation](https://docs.pydantic.dev/)**: For in-depth information on schema generation and validation.
+If you found this tutorial helpful, please consider showing your support:
 
-These resources provide an overview of structured generation techniques, from basic implementations to advanced use case.
+1. Star our GitHub repository: [StructuredGenTutorial](https://github.com/HK3-Lab-Team/StructuredGenTutorial)
+2. Stay tuned for our next blog post on [tribe.ai/blog](https://www.tribe.ai/blog)
+3. Follow us on Twitter:
+- [@hyp_enri](https://twitter.com/hyp_enri)
+- [@cyndesama](https://twitter.com/cyndesama)
````
