Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions SilverSolver/assignment5/tests.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
import unittest
from unknown_alphabet import *

class TestUnknownAlphabet(unittest.TestCase):

def test_0(self):
self.assertEqual(unknown_alphabet([]), [])

def test_1(self):
self.assertEqual(unknown_alphabet(["a"]), ["a"])
self.assertEqual(unknown_alphabet(["1", "2", "3"]), ["1", "2", "3"])
self.assertFalse(unknown_alphabet(["1", "2", "3", "2"]))

def test_2(self):
self.assertEqual(unknown_alphabet(["1a", "1b", "a1", "ab"]), ["1", "a", "b"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests, we should merge them with @crossopt 's unittests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've already think about it


def test_3(self):
self.assertTrue(unknown_alphabet(["ART", "RAT", "CAT", "CAR"]) == ["A", "T", "R", "C"] or
unknown_alphabet(["ART", "RAT", "CAT", "CAR"]) == ["T", "A", "R", "C"])

if __name__ == '__main__':
unittest.main()
49 changes: 49 additions & 0 deletions SilverSolver/assignment5/unknown_alphabet.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
from collections import defaultdict

def dfs(ordered_vertices, edges, color, v):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dfs is seriously scary.

I think it needs at least some comments. What does it return and why?

My current guess is topsort order and a color array if it exists or None if the graph has a cycle, but why return them if the function modifies them anyway?

Also imho searching for cycles and topsorting in the same function causes unnecessary confusion. Maybe comments would help, but why not split this stuff into two functions anyway?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I think we should look for cycles during the topsort, because they both require searching on graph. I liked the coloring, which helps detecting if it is possible or not, but not sure how to use it for finding maximal satisfying subset.

Also, comments would be great indeed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll add comments soon

if color[v] == 1:
return (ordered_vertices, None)
if color[v] == 2:
return (ordered_vertices, color)
color[v] = 1
if v in edges.keys():
for v_ in edges[v]:
ordered_vertices, color = dfs(ordered_vertices, edges, color, v_)
if v_ in edges.keys() and color == None:
return (ordered_vertices, None)
ordered_vertices.append(v)
color[v] = 2
return (ordered_vertices, color)

def topological_sort(edges, symbols):
ordered_vertices = []
N = len(symbols)
color = dict()
for v in symbols:
color[v] = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stuff like this could look cleaner as
color = dict.fromkeys(symbols, 0)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree

for v in symbols:
ordered_vertices, color = dfs(ordered_vertices, edges, color, v)
if color == None:
return False # False if there is no correct alphabet possible
residual_symbols = list(symbols.difference(set(ordered_vertices)))
ordered_vertices.extend(residual_symbols)
return ordered_vertices[::-1]

def unknown_alphabet(dictionary):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name of the function is not definitive. Something like "find_alphabet" might be better?

if len(dictionary) < 1:
return []
edges = defaultdict(set)
symbols = set()
symbols.update(dictionary[0])

for w1, w2 in zip(dictionary[:-1], dictionary[1:]):
for i in range(min(len(w1), len(w2))):
symbols.add(w1[i])
if w1[i] != w2[i]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, I do get the feeling that there's a lot going on in these functions.

Possibly splitting up some of the logic could help?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments here. Hope it will be more clear now.

edges[w1[i]].add(w2[i])
symbols.add(w2[i])
break
symbols.update(w2)

alphabet = topological_sort(edges, symbols)
return alphabet