Merged
151 changes: 78 additions & 73 deletions README.md
@@ -1,73 +1,78 @@
# Automated-Email
> Building an email reader that classifies whether an email is important or not

### Summary

Every day, our department gets over 500 Google Alerts about Colby alumni. Google Alerts are emails containing links to articles that match the names of our constituents. The goal of this project is to build a classifier that identifies whether the person mentioned in an article is affiliated with Colby.

My model currently achieves an accuracy of 92%.

The process breaks down into four phases: retrieving the email, scraping the websites linked in the alert, classifying the email based on extracted features, and finally displaying the results in a graphical user interface (GUI).
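As a sketch of how the four phases chain together (the function names here are illustrative placeholders, not the project's actual API):

```python
# Sketch of the four-phase pipeline. The phase functions are passed in as
# callables; every name here is illustrative, not the project's actual API.
def run_pipeline(fetch_emails, scrape_article, score_article, display):
    results = []
    for email in fetch_emails():
        # a single Google Alert may carry several article links,
        # so each link is processed separately
        for link in email["links"]:
            text = scrape_article(link)
            results.append(score_article(text))
    display(results)
    return results
```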

### Retrieving the Email

To connect Python to Gmail, I used the imaplib module. All the methods I use to communicate with my inbox are in the EmailReader.py file.
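A minimal sketch of what such an imaplib connection can look like; the sender address, mailbox name, and helper names are my assumptions, not necessarily what EmailReader.py actually does:

```python
import imaplib

def alert_query(sender="googlealerts-noreply@google.com"):
    """Build IMAP SEARCH criteria for unread mail from a sender.
    The sender address is an assumed default, not confirmed by the project."""
    return f'(UNSEEN FROM "{sender}")'

def fetch_alert_ids(user, password, host="imap.gmail.com"):
    """Log in over SSL and return the message ids of unread alert emails."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    conn.select("INBOX")
    # SEARCH returns data like [b'1 2 3']; split it into individual ids
    _, data = conn.search(None, alert_query())
    return data[0].split()
```

Each returned id can then be passed to `conn.fetch` to retrieve the message body for the scraping phase.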

### Web Scraping

To collect information about each article, I used the requests and BeautifulSoup packages to scrape websites. Because some emails contain multiple links, I treat each link separately and merge the results in the classification phase. All the scraping methods can be found in scraper.py.
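The internals of scraper.py aren't shown here, but a minimal requests + BeautifulSoup sketch of the per-link step might look like this (function names are illustrative):

```python
from bs4 import BeautifulSoup

def extract_words(html):
    """Strip tags and return the lower-cased words of a page."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator=" ").lower().split()

def scrape(url, timeout=10):
    """Fetch one article link from an alert and return its words."""
    import requests  # imported here so the text helper works without network deps
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    return extract_words(resp.text)
```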

### Classification

After retrieving the words from the article, I devised three scoring metrics:
- Occupation Score
- Occupation Score Adjusted
- Colby Score
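The README names these metrics but not their formulas. One plausible reading, treating each metric as a keyword count (both term sets below are made up for illustration):

```python
# Illustrative term sets -- NOT the project's actual keyword lists.
OCCUPATION_TERMS = {"professor", "engineer", "ceo", "doctor"}
COLBY_TERMS = {"colby", "waterville", "mules"}

def occupation_score(words):
    """Count occupation-related keywords in the article."""
    return sum(w in OCCUPATION_TERMS for w in words)

def occupation_score_adjusted(words):
    """Normalize the occupation score by article length."""
    return occupation_score(words) / len(words) if words else 0.0

def colby_score(words):
    """Count Colby-related keywords; per the 1.2.0 changelog,
    only words of at least 3 characters are scored."""
    return sum(w in COLBY_TERMS for w in words if len(w) >= 3)
```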

### Frontend GUI

I also built a GUI that displays the model's results:


<div style="display: block; float: left">
<img src="visualization/GUI.png" width="400" height="250">
</div>


## Future Endeavors

I have been studying natural language processing on my own, which I hope to incorporate into this project to increase the accuracy.

## Changelog

Beta version completed

12-5-18

- Added sorting capabilities to the dataframe
- Changed the master df to no longer carry an extra index column

Version 1.2.0

WINDOWS VERSION

- basic GUI completed
- scoring accuracy of 92%
- features 3 scoring metrics
- original text words part of dataset
- preliminary analysis of text data from links (e.g., word lengths)

- changed double click to right click for letting the user access links
- added pathlib as a dependency to handle paths on both Mac and Windows
- changed the naming convention for logs from colons (:) to periods (.)
- removed SigAlarm because it doesn't work on Windows; threading should be used instead

- Fixed bug (binds not working): set focus AFTER displaying the graph

- Changed scoring metric: words must now be at least 3 characters long


Version 1.3.0
10-18-18

WINDOWS VERSION

- decoded email id into string data type
  - it will be encoded back to bytes later
- made constituent id an int data type
- added a GUI option to enable automation and set the threshold
3 changes: 1 addition & 2 deletions scraper.py
@@ -25,7 +25,7 @@ def __init__(self):

try:
path = 'datasets/OrganizationRelationships_NickNamesAdded_5.24.2018.csv'
self.constituents_df = pd.read_csv(path, index_col=0, low_memory=False)
self.constituents_df = pd.read_csv(path, low_memory=False)
except FileNotFoundError:
warnings.warn('unable to find Constituents data. Please use set_constituents_path to locate the datafile')

@@ -433,7 +433,6 @@ def create_scores_data(self, df, label=None, split_up_links=False):
df[['Occupation score', 'Occupation score adjusted', 'Colby score']] = pd.DataFrame(scores, index=df.index)
df['constituent_id'] = constituent_id

print(df)
print('finished adding scores')

# if given a label (for training) add label as a column
61 changes: 51 additions & 10 deletions tkinter-skeleton.py
@@ -319,9 +319,15 @@ def setBindings(self):
# binds the logs listbox to onselect
self.logs_lbox.bind('<<ListboxSelect>>', self.onselect)

# binds double click to bottom table
# binds double click to doubleClick function
self.root.bind('<Double-Button-1>', self.doubleClick)

# binds left click to handleLeftMouseClick function
self.root.bind('<Button-1>', self.handleLeftMouseClick)

# binds right click to handleRightMouseClick function
self.root.bind('<Button-3>', self.handleRightMouseClick)

# binds control-e to switch the label of an element in the bottom table
self.bottomFrame.bind('<Control-e>', self.switchLabel)
self.bottomFrame.bind('<Control-w>', self.switchMovedState)
@@ -332,7 +338,6 @@ def handleQuit(self, event=None):
print('Terminating')
self.root.destroy()


################################ Build Tables and Graphs to Display ############################

def buildBottomTable(self):
@@ -342,6 +347,8 @@ def buildBottomTable(self):

# delete previous tables, if ones exist
self.refreshFrame(self.bottomFrame)
# this isn't necessary, only to prevent left shifting of the main frames
self.refreshFrame(self.rightmainframe)

self.tree = ttk.Treeview(self.bottomFrame)

@@ -376,11 +383,14 @@ def process_text(string, length=50, total_string_size=100):
string = string[:total_string_size]
return '\n'.join(textwrap.wrap(string, length)) + '...'

# dictionary that shortens the long column names from the main dataframe for display in the treeview
self.tree_column_shortened = {'first_name': 'first',
'last_name': 'last',
'time': 'date',
'constituent_id': 'id'}

# preprocesses the dataframe
df = df.rename(index=int, columns={'first_name': 'first',
'last_name': 'last',
'time': 'date',
'constituent_id': 'id'}) # make the columns shorter
df = df.rename(index=int, columns=self.tree_column_shortened) # make the columns shorter
df['text'] = df['text'].apply(process_text) # limits the characters in the text column
df['confidence'] = df['confidence'].apply(lambda x: np.around(x, 3)) # rounds the confidence
df['date'] = df['date'].apply(lambda x: x.split()[0]) # only shows the date and not the time
@@ -417,7 +427,6 @@ def buildScoresTable(self, curItem):
'''
passes in the current selected treeview row to display the score table
'''
print(curItem)
row_num = self.tree.item(curItem)['text']
scores = self.df[['Occupation score', 'Occupation score adjusted', 'Colby score']].iloc[row_num]

@@ -538,7 +547,6 @@ def embedChart(self, master, fig=None, side=tk.TOP, legends=None, title=None):
def onselect(self, event):
w = event.widget


# for when the logs listbox is selected
if w == self.logs_lbox:
try:
@@ -563,6 +571,7 @@ def onselect(self, event):

# for when the bottom table is selected
elif w == self.tree:

self.refreshFrame(self.rightmainframe)

# displays the scores graph
@@ -583,18 +592,50 @@ def onselect(self, event):
# builds the label that displays the constituent's occupation
self.buildOccsLabel(curItem)

def handleLeftMouseClick(self, event=None):
w = event.widget
x, y = event.x, event.y

# if the user left clicks the heading of a tree -> sort it by that order
if w == self.tree and self.tree.identify_region(x, y) == 'heading':
self.sort_dataframe(x, ascending=True)

def doubleClick(self, event):
def handleRightMouseClick(self, event=None):
w = event.widget
x, y = event.x, event.y

if w == self.tree:
print(w, self.tree.identify_region(x, y))
# if the user right clicks the heading of a tree -> sort that column in descending order
if w == self.tree and self.tree.identify_region(x, y) == 'heading':
self.sort_dataframe(x, ascending=False)

def doubleClick(self, event=None):
w = event.widget
x, y = event.x, event.y

# Double click to access a link inside the bottom_tree widget
if w == self.tree and self.tree.identify_region(x, y) == 'cell':
print('double clicked')

# selects scores from the current row
curItem = self.tree.focus()

self.openUrl(curItem)


# sorts the dataframe by the column name
# prereq: the user clicked a heading on self.tree
def sort_dataframe(self, mouseX, ascending):
label = self.tree.heading(self.tree.identify_column(mouseX))['text']

# if a label was shortened, recover the original column name from the shortening dictionary
for k, v in self.tree_column_shortened.items():
if label == v:
label = k

self.df = self.df.sort_values([label], ascending=ascending).reset_index(drop=True)
self.buildBottomTable()

# opens the url of the currently selected item
def openUrl(self, curItem):
row_num = self.tree.item(curItem)['text']