@@ -24,12 +24,6 @@ Testing has shown that |compass| has minimal impact in prototype
24 24 deployments, though additional performance testing and monitoring is in
25 25 progress.
26 26
27- For best results, use MongoDB 3.2 or higher, which includes the
28- :manual:`$sample </reference/operator/aggregation/sample/>` operator for
29- efficient sampling on a collection. On older versions of MongoDB,
30- |compass| falls back on a
31- :ref:`less efficient sampling method <compass_fallback_sampling>`.
32-
33 27 You should only execute queries that are indexed appropriately in the
34 28 database to avoid scanning the entire collection.
35 29
@@ -64,63 +58,6 @@ Why am I seeing a warning about a non-genuine MongoDB server?
64 58
65 59 .. include:: /includes/fact-non-genuine-warning.rst
66 60
67- .. _compass-faq-sampling:
68-
69- What is sampling and why is it used?
70- ------------------------------------
71-
72- Sampling in |compass| is the selection a subset of data
73- from a particular collection and analyzing the documents within the
74- sample set.
75-
76- Sampling is a common technique in statistical analysis because analyzing
77- a subset of the data gives similar results to analyzing all of it. In
78- addition, sampling allows results to be generated quickly rather than
79- performing a computationally-expensive collection scan.
80-
81- How does sampling work?
82- -----------------------
83-
84- |compass| employs two distinct sampling mechanisms.
85-
86- In MongoDB 3.2, collections are sampled with the
87- :manual:`$sample </reference/operator/aggregation/sample/>` operator via
88- the :manual:`aggregation pipeline </core/aggregation-pipeline>`. This
89- provides efficient random sampling without replacement over the entire
90- collection, or over the subset of documents specified by a query.
91-
92- .. _compass_fallback_sampling:
93-
94- In MongoDB 3.0, collections are sampled via a
95- backwards-compatible algorithm executed entirely within |compass|. It
96- takes place in three stages:
97-
98- 1. |compass| opens a :term:`cursor` on the desired collection, limited
99- to at most 10,000 documents sorted in descending order of the ``_id``
100- field.
101- 2. ``sampleSize`` documents are randomly selected from the stream. To
102- do this efficiently, |compass| employs `reservoir sampling
103- <http://en.wikipedia.org/wiki/Reservoir_sampling>`_.
104- 3. |compass| performs a query to select the chosen documents directly
105- via ``_id``.
106-
107- ``sampleSize`` is set to 1000 documents.
108-
109- .. note::
110- The choice of sampling method is done transparently in the
111- background, with no changes required by the user.
112-
113- Won't sampling miss documents?
114- ------------------------------
115-
116- Sampling is chosen for its efficiency: the amount of time required to
117- perform a sample is minimal, on the order of a few seconds. Increasing
118- the sample confidence will demand more processing power and time.
119- Furthermore, sophisticated outlier detection requires an inspection of
120- every document in a MongoDB deployment, which would be unfeasible for
121- large data sets. The MongoDB team is in the process of conducting user
122- tests on large data sets to find a reasonable balance.
123-
124 61 What happens to long running queries?
125 62 -------------------------------------
126 63
@@ -133,9 +70,9 @@ Slow Sampling
133 70 All queries that Compass sends to your MongoDB instance have a timeout
134 71 flag set which automatically aborts a request if it takes longer than
135 72 the specified timeout. This timeout is currently set to 10 seconds. If
136- sampling on the database takes longer, Compass will notify you about
137- the timeout and give you the options of (a) retrying with a longer
138- timeout (60 seconds) or (b) running a different query.
73+ :ref:`sampling <sampling>` on the database takes longer, Compass will
74+ notify you about the timeout and give you the options of (a) retrying
75+ with a longer timeout (60 seconds) or (b) running a different query.
139 76
140 77 .. note::
141 78