-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathprompt.txt
91 lines (86 loc) · 3.34 KB
/
prompt.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
# DataComp Cluster Quality Classifier for CLIP Model Training
Classify image-text clusters as either "Acceptable" or "Unacceptable" for training high-quality CLIP (Contrastive Language-Image Pre-training) models. These models learn visual concepts from natural language supervision. Focus on clusters that contain diverse, informative, and high-quality samples that would benefit CLIP's ability to understand and connect images with text.
## Key Considerations
1. Diversity: Prioritize clusters with a wide range of subjects, scales, and perspectives.
2. Informative Content: Favor clusters with educational value and rich descriptive text.
3. Quality: Look for clusters likely to contain high-quality imagery based on descriptions.
4. Avoid Repetition: Be cautious of clusters with near-duplicate or highly similar entries.
5. Ethical Concerns: Be wary of clusters potentially containing explicit content or harmful stereotypes.
6. Long-tail Knowledge: Consider the value of specialized knowledge, but be cautious of overly niche content.
## Examples
### Example 1
Cluster data:
```json
{{
"closest_samples": [
"Detailed landscape photography of mountains",
"Close-up of a rare flower species",
"Historic architecture in European cities",
"Portrait of a person from an indigenous culture",
"Microscopic image of a unique cellular structure"
],
"furthest_samples": [
"Abstract digital art",
"Underwater photography of coral reefs",
"Aerial view of urban sprawl",
"Close-up of a mechanical watch movement",
"Traditional cuisine from different cultures"
]
}}
```
Reason: Highly diverse content covering various subjects, scales, and perspectives. Rich descriptive text suggests high-quality imagery. Educational value across multiple domains.
Classification: Acceptable
### Example 2
Cluster data:
```json
{{
"closest_samples": [
"IMG_001.jpg",
"IMG_002.jpg",
"IMG_003.jpg",
"IMG_004.jpg",
"IMG_005.jpg"
],
"furthest_samples": [
"IMG_100.jpg",
"IMG_101.jpg",
"IMG_102.jpg",
"IMG_103.jpg",
"IMG_104.jpg"
]
}}
```
Reason: Generic filenames without descriptive content. Likely low-quality or repetitive images. Provides no useful text-image pairs for CLIP training.
Classification: Unacceptable
### Example 3
Cluster data:
```json
{{
"closest_samples": [
"Quantum computing explained with visual diagrams",
"Renewable energy technologies comparison infographic",
"Ancient civilization archaeological discoveries photos",
"Cutting-edge medical research findings illustrated",
"Innovative sustainable architecture designs renderings"
],
"furthest_samples": [
"Renaissance art masterpieces high-resolution scans",
"Modern dance performance action shots",
"Culinary techniques from around the world step-by-step photos",
"Space exploration milestones historic images",
"Breakthrough in artificial intelligence conceptual illustrations"
]
}}
```
Reason: Extremely diverse and informative content covering various fields of science, technology, culture, and arts. Rich descriptive text suggests high-quality visuals. Excellent material for training CLIP on complex concepts across multiple domains.
Classification: Acceptable
## Cluster to Classify
Cluster data:
```json
{{
"closest_samples": {closest_samples},
"furthest_samples": {furthest_samples}
}}
```
Reason:
Classification: