Commit fdd643d

update docs
1 parent 7ab7fd5 commit fdd643d

12 files changed: +22 -22 lines changed


.env

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-es_host=b963fb901829469e968500b83b71ab81.us-central1.gcp.cloud.es.io
+es_host=391f53d8501748afaf1fbfc591ed43e1.us-central1.gcp.cloud.es.io
 es_username=elastic
 es_port=9243
-es_pw=ytuLqABRF5zE7gbwEqL1aR7l
+es_pw=VjLfWuuVTOnUZbFlI6a9eriq
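
These values are consumed by the ingestion code that run.sh (below) invokes. A minimal sketch of how they might be read, assuming python-dotenv and the elasticsearch 7.x client; the actual wiring in src/haystack/elasticsearch/db may differ:

import os

from dotenv import load_dotenv
from elasticsearch import Elasticsearch

load_dotenv()  # pulls es_host, es_username, es_port, es_pw into the environment

es = Elasticsearch(
    [os.environ["es_host"]],
    http_auth=(os.environ["es_username"], os.environ["es_pw"]),
    scheme="https",
    port=int(os.environ["es_port"]),
)
print(es.info())  # smoke test: returns cluster metadata if the credentials work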

prod.sh

Lines changed: 1 addition & 3 deletions
@@ -1,10 +1,8 @@
-echo "Starting to deploy docker image..."
-
-AWS_REGION=us-east-2
 DOCKER_CONTAINER_NAME=chatbox-nlp-api-gunicorn-container
 REPOSITORY_URI=public.ecr.aws/q0s5b2t6/chatbox-nlp-api
 DEPLOY_DOCKER_COMPOSE_FILE=/home/ec2-user/server/docker-compose.yml
 
+echo "Starting to deploy docker image..."
 echo "Stopping previous containers..."
 docker ps -q --filter "name=$DOCKER_CONTAINER_NAME" | grep -q . && docker stop $DOCKER_CONTAINER_NAME && docker rm -fv $DOCKER_CONTAINER_NAME
 if [[ "$(docker images -q $REPOSITORY_URI:latest 2> /dev/null)" != "" ]]; then

run.sh

Lines changed: 1 addition & 4 deletions
@@ -4,7 +4,4 @@ python -m src.dataset.start
 zip wiki_algos_text.zip src/dataset/general_kb/data/* -j
 
 # connect and write to the document store
-python -m src.haystack.elasticsearch.db > out.txt
-
-# refresh the dependency list
-pipenv lock -r > requirements.txt
+python -m src.haystack.elasticsearch.db

src/dataset/general_kb/data/B-tree.txt

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ According to Knuth's definition, a B-tree of order m is a tree which satisfies t
 1. Every node has at most m children.
 2. Every internal node has at least ⌈m/2⌉ children.
 3. Every non-leaf node has at least two children.
-4. All leaves appear on the same level and carry no information.
+4. All leaves appear on the same level.
 5. A non-leaf node with k children contains k−1 keys.
 
 Each internal node's keys act as separation values which divide its subtrees. For example, if an internal node has 3 child nodes (or subtrees) then it must have 2 keys: a1 and a2. All values in the leftmost subtree will be less than a1, all values in the middle subtree will be between a1 and a2, and all values in the rightmost subtree will be greater than a2.
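
That separation-value lookup is easy to sketch in Python (a hypothetical Node with sorted keys and a children list, not code from this repo):

from bisect import bisect_right

class Node:
    # Hypothetical B-tree node: an internal node holds len(keys) + 1 children.
    def __init__(self, keys, children=None):
        self.keys = keys          # sorted separation values
        self.children = children  # None for a leaf

def search(node, value):
    # bisect_right picks the subtree: with keys a1 and a2, a value below a1
    # goes left (index 0), between a1 and a2 middle (1), above a2 right (2).
    i = bisect_right(node.keys, value)
    if i > 0 and node.keys[i - 1] == value:
        return True
    if node.children is None:
        return False
    return search(node.children[i], value)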

src/dataset/general_kb/data/Depth-first__search.txt

Lines changed: 1 addition & 0 deletions
@@ -88,6 +88,7 @@ Another possible implementation of iterative depth-first search uses a stack of
 
 procedure DFS_iterative(G, v) is
     let S be a stack
+    label v as discovered
     S.push(iterator of G.adjacentEdges(v))
     while S is not empty do
         if S.peek().hasNext() then
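
The stack-of-iterators variant translates almost directly to Python; a sketch (the adjacency-dict representation is an assumption), including the "label v as discovered" line added above:

def dfs_iterative(graph, start):
    # graph: dict mapping each vertex to an iterable of neighbors.
    discovered = {start}          # the added line: mark the start vertex up front
    order = [start]
    stack = [iter(graph[start])]  # the stack holds iterators, not vertices
    while stack:
        try:
            w = next(stack[-1])   # peek the top iterator and advance it
        except StopIteration:
            stack.pop()           # iterator exhausted: backtrack
            continue
        if w not in discovered:
            discovered.add(w)
            order.append(w)
            stack.append(iter(graph[w]))
    return order

Without the up-front label, the start vertex could be re-discovered through a cycle, which is exactly what the one-line change prevents.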

src/dataset/general_kb/data/Heapsort.txt

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element from it and inserting it into the sorted region. Unlike selection sort, heapsort does not waste time with a linear-time scan of the unsorted region; rather, heapsort maintains the unsorted region in a heap data structure to more quickly find the largest element in each step.
 
-Although somewhat slower in practice on most machines than a well-implemented quicksort, it has the advantage of a more favorable worst-case O(n log n) runtime. Heapsort is an in-place algorithm, but it is not a stable sort.
+Although somewhat slower in practice on most machines than a well-implemented quicksort, it has the advantage of a more favorable worst-case O(n log n) runtime (and as such is used by Introsort as a fallback should it detect that quicksort is becoming degenerate). Heapsort is an in-place algorithm, but it is not a stable sort.
 
 Heapsort was invented by J. W. J. Williams in 1964. This was also the birth of the heap, presented already by Williams as a useful data structure in its own right. In the same year, Robert W. Floyd published an improved version that could sort an array in-place, continuing his earlier research into the treesort algorithm.
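
The "improved selection sort" description maps to a few lines of Python; a minimal sketch with a max-heap maintained by sift-down (illustrative, not this repo's code):

def heapsort(a):
    # In-place heapsort: build a max-heap, then repeatedly move the max
    # into the growing sorted region at the end of the array.
    def sift_down(end, i):
        while (child := 2 * i + 1) < end:
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1                  # take the larger child
            if a[i] >= a[child]:
                return
            a[i], a[child] = a[child], a[i]
            i = child

    n = len(a)
    for i in range(n // 2 - 1, -1, -1):     # heapify (Floyd's bottom-up construction)
        sift_down(n, i)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]         # extract max into the sorted region
        sift_down(end, 0)
    return a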

src/dataset/general_kb/data/Longest__common__subsequence__problem.txt

Lines changed: 4 additions & 0 deletions
@@ -242,6 +242,10 @@ The third drawback is that of collisions. Since the checksum or hash is not guar
 
 If only the length of the LCS is required, the matrix can be reduced to a <math-expression>2\times \min(n,m)</math-expression> matrix with ease, or to a <math-expression>\min(m,n)+1</math-expression> vector (smarter) as the dynamic programming approach only needs the current and previous columns of the matrix. Hirschberg's algorithm allows the construction of the optimal sequence itself in the same quadratic time and linear space bounds.
 
+### Reduce cache misses
+
+Chowdhury and Ramachandran devised a quadratic-time linear-space algorithm for finding the LCS length along with an optimal sequence which runs faster than Hirschberg's algorithm in practice due to its superior cache performance. The algorithm has an asymptotically optimal cache complexity under the Ideal cache model. Interestingly, the algorithm itself is cache-oblivious, meaning that it does not make any choices based on the cache parameters (e.g., cache size and cache line size) of the machine.
+
 ### Further optimized algorithms
 
 Several algorithms exist that run faster than the presented dynamic programming approach. One of them is Hunt–Szymanski algorithm, which typically runs in <math-expression>O((n+r)\log(n))</math-expression> time (for <math-expression>n>m</math-expression>), where <math-expression>r</math-expression> is the number of matches between the two sequences. For problems with a bounded alphabet size, the Method of Four Russians can be used to reduce the running time of the dynamic programming algorithm by a logarithmic factor.
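
The two-row reduction described above fits in a few lines of Python (a sketch, not from this repo):

def lcs_length(x, y):
    # O(min(m, n)) extra space: the DP only ever needs the previous row.
    if len(x) < len(y):
        x, y = y, x                      # iterate over the longer string
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, 1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

Recovering the subsequence itself within the same space bound is what Hirschberg's divide-and-conquer adds on top of this.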

src/dataset/general_kb/data/Quicksort.txt

Lines changed: 4 additions & 4 deletions
@@ -12,9 +12,7 @@ The quicksort algorithm was developed in 1959 by Tony Hoare while he was a visit
 
 Quicksort gained widespread adoption, appearing, for example, in Unix as the default library sort subroutine. Hence, it lent its name to the C standard library subroutine qsort and in the reference implementation of Java.
 
-Robert Sedgewick's PhD thesis in 1975 is considered a milestone in the study of Quicksort where he resolved many open problems related to the analysis of various pivot selection schemes including Samplesort, adaptive partitioning by Van Emden as well as derivation of expected number of comparisons and swaps. Jon Bentley and Doug McIlroy incorporated various improvements for use in programming libraries, including a technique to deal with equal elements and a pivot scheme known as pseudomedian of nine, where a sample of nine elements is divided into groups of three and then the median of the three medians from three groups is chosen. Bentley described another simpler and compact partitioning scheme in his book Programming Pearls that he attributed to Nico Lomuto. Later Bentley wrote that he used Hoare's version for years but never really understood it but Lomuto's version was simple enough to prove correct. Bentley described Quicksort as the "most beautiful code I had ever written" in the same essay. Lomuto's partition scheme was also popularized by the textbook Introduction to Algorithms although it is inferior to Hoare's scheme because it does three times more swaps on average and degrades to O(n²) runtime when all elements are equal.
-
-In 2009, Vladimir Yaroslavskiy proposed a new Quicksort implementation using two pivots instead of one. In the Java core library mailing lists, he initiated a discussion claiming his new algorithm to be superior to the runtime library's sorting method, which was at that time based on the widely used and carefully tuned variant of classic Quicksort by Bentley and McIlroy. Yaroslavskiy's Quicksort has been chosen as the new default sorting algorithm in Oracle's Java 7 runtime library after extensive empirical performance tests.
+Robert Sedgewick's PhD thesis in 1975 is considered a milestone in the study of Quicksort where he resolved many open problems related to the analysis of various pivot selection schemes including Samplesort, adaptive partitioning by Van Emden as well as derivation of expected number of comparisons and swaps. Jon Bentley and Doug McIlroy in 1993 incorporated various improvements for use in programming libraries, including a technique to deal with equal elements and a pivot scheme known as pseudomedian of nine, where a sample of nine elements is divided into groups of three and then the median of the three medians from three groups is chosen. Bentley described another simpler and compact partitioning scheme in his book Programming Pearls that he attributed to Nico Lomuto. Later Bentley wrote that he used Hoare's version for years but never really understood it but Lomuto's version was simple enough to prove correct. Bentley described Quicksort as the "most beautiful code I had ever written" in the same essay. Lomuto's partition scheme was also popularized by the textbook Introduction to Algorithms although it is inferior to Hoare's scheme because it does three times more swaps on average and degrades to O(n²) runtime when all elements are equal. McIlroy would further produce an AntiQuicksort (aqsort) function in 1998, which consistently drives even his 1993 variant of Quicksort into quadratic behavior by producing adversarial data on-the-fly.
 
 ## Algorithm
@@ -273,7 +271,7 @@ Another, less common, not-in-place, version of quicksort uses O(n) space for wor
273271

274272
Quicksort is a space-optimized version of the binary tree sort. Instead of inserting items sequentially into an explicit tree, quicksort organizes them concurrently into a tree that is implied by the recursive calls. The algorithms make exactly the same comparisons, but in a different order. An often desirable property of a sorting algorithm is stability – that is the order of elements that compare equal is not changed, allowing controlling order of multikey tables (e.g. directory or folder listings) in a natural way. This property is hard to maintain for in situ (or in place) quicksort (that uses only constant additional space for pointers and buffers, and O(log n) additional space for the management of explicit or implicit recursion). For variant quicksorts involving extra memory due to representations using pointers (e.g. lists or trees) or files (effectively lists), it is trivial to maintain stability. The more complex, or disk-bound, data structures tend to increase time cost, in general making increasing use of virtual memory or disk.
275273

276-
The most direct competitor of quicksort is heapsort. Heapsort's running time is O(n log n), but heapsort's average running time is usually considered slower than in-place quicksort. This result is debatable; some publications indicate the opposite. Introsort is a variant of quicksort that switches to heapsort when a bad case is detected to avoid quicksort's worst-case running time.
274+
The most direct competitor of quicksort is heapsort. Heapsort's running time is O(n log n), but heapsort's average running time is usually considered slower than in-place quicksort. This result is debatable; some publications indicate the opposite. Introsort is a variant of quicksort that switches to heapsort when a bad case is detected to avoid quicksort's worst-case running time. Major programming languages, such as C++ (in the GNU and LLVM implementations), use introsort.
277275

278276
Quicksort also competes with merge sort, another O(n log n) sorting algorithm. Mergesort is a stable sort, unlike standard in-place quicksort and heapsort, and has excellent worst-case performance. The main disadvantage of mergesort is that, when operating on arrays, efficient implementations require O(n) auxiliary space, whereas the variant of quicksort with in-place partitioning and tail recursion uses only O(log n) space.
279277

@@ -311,6 +309,8 @@ Also developed by Powers as an O(K) parallel PRAM algorithm. This is again a com
 
 In any comparison-based sorting algorithm, minimizing the number of comparisons requires maximizing the amount of information gained from each comparison, meaning that the comparison results are unpredictable. This causes frequent branch mispredictions, limiting performance. BlockQuicksort rearranges the computations of quicksort to convert unpredictable branches to data dependencies. When partitioning, the input is divided into moderate-sized blocks (which fit easily into the data cache), and two arrays are filled with the positions of elements to swap. (To avoid conditional branches, the position is unconditionally stored at the end of the array, and the index of the end is incremented if a swap is needed.) A second pass exchanges the elements at the positions indicated in the arrays. Both loops have only one conditional branch, a test for termination, which is usually taken.
 
+The BlockQuicksort technique is incorporated into LLVM's C++ STL implementation, libcxx, providing a 50% improvement on random integer sequences. Pattern-defeating quicksort (pdqsort), a version of introsort, also incorporates this technique.
+
 #### Partial and incremental quicksort
 
 Several variants of quicksort exist that separate the k smallest or largest elements from the rest of the input.
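
The "store unconditionally, bump the index conditionally" trick from the BlockQuicksort paragraph, sketched for one block in Python (illustrative only; real implementations run branch-free C++ over paired left and right buffers):

BLOCK = 64  # hypothetical block size, chosen so a block fits in the data cache

def collect_offsets(a, start, end, pivot):
    # One pass over a block: always write the candidate position, and only
    # advance the write index when the element actually needs to move.
    offsets = [0] * BLOCK
    n = 0
    for i in range(start, min(start + BLOCK, end)):
        offsets[n] = i          # unconditional store at the current end
        n += a[i] >= pivot      # bool is 0 or 1: keep the slot only on a hit
    return offsets[:n]          # positions that belong on the pivot's right side

A second, symmetric pass collects positions from a right-hand block, after which the elements at paired positions are swapped.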

src/dataset/general_kb/data/Red–black__tree.txt

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ Tracking the color of each node requires only one bit of information per node be
 
 In 1972, Rudolf Bayer invented a data structure that was a special order-4 case of a B-tree. These trees maintained all paths from root to leaf with the same number of nodes, creating perfectly balanced trees. However, they were not binary search trees. Bayer called them a "symmetric binary B-tree" in his paper and later they became popular as 2–3–4 trees or just 2–4 trees.
 
-In a 1978 paper, "A Dichromatic Framework for Balanced Trees", Leonidas J. Guibas and Robert Sedgewick derived the red–black tree from the symmetric binary B-tree. The color "red" was chosen because it was the best-looking color produced by the color laser printer available to the authors while working at Xerox PARC. Another response from Guibas states that it was because of the red and black pens available to them to draw the trees. Author's name was Rudolf Bayer so he took the initials from his name that is R B and in colours, R means red and B means Black
+In a 1978 paper, "A Dichromatic Framework for Balanced Trees", Leonidas J. Guibas and Robert Sedgewick derived the red–black tree from the symmetric binary B-tree. The color "red" was chosen because it was the best-looking color produced by the color laser printer available to the authors while working at Xerox PARC. Another response from Guibas states that it was because of the red and black pens available to them to draw the trees.
 
 In 1993, Arne Andersson introduced the idea of a right leaning tree to simplify insert and delete operations.

src/dataset/general_kb/data/Travelling__salesman__problem.txt

Lines changed: 1 addition & 1 deletion
@@ -136,7 +136,7 @@ The most direct solution would be to try all permutations (ordered combinations)
 
 One of the earliest applications of dynamic programming is the Held–Karp algorithm that solves the problem in time <math-expression>O(n^{2}2^{n})</math-expression>. This bound has also been reached by Exclusion-Inclusion in an attempt preceding the dynamic programming approach.
 
-Improving these time bounds seems to be difficult. For example, it has not been determined whether a classical exact algorithm for TSP that runs in time <math-expression>O(1.9999^{n})</math-expression> exists.
+Improving these time bounds seems to be difficult. For example, it has not been determined whether a classical exact algorithm for TSP that runs in time <math-expression>O(1.9999^{n})</math-expression> exists. The currently best quantum exact algorithm for TSP due to Ambainis et al. runs in time <math-expression>O(1.728^{n})</math-expression>.
 
 Other approaches include:
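
The Held–Karp recurrence mentioned above, as a compact Python sketch (exact tour cost only; illustrative, and exponential in n by design):

from itertools import combinations

def held_karp(dist):
    # dist: n x n cost matrix; city 0 is the fixed start of the tour.
    # C[(S, j)]: cheapest path from 0 through the vertex set S (a bitmask
    # over cities 1..n-1), ending at city j.
    n = len(dist)
    C = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = sum(1 << j for j in subset)
            for j in subset:
                C[(S, j)] = min(C[(S ^ (1 << j), k)] + dist[k][j]
                                for k in subset if k != j)
    full = sum(1 << j for j in range(1, n))
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))

The bookkeeping makes the bound visible: 2^n subsets times n ending cities, each resolved by an O(n) minimum, gives O(n^2 * 2^n).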
