Feature audit based on logs history #3267
I think the logs in the Google Cloud console should show the frequency of URLs like /tree, /rstudio, /lab, etc. They would also include gitpuller, Syncthing, and desktop URLs. Basically, any feature that shows up in the URL can be tracked. However, course info is not in the URL, except where it happens to be part of a git repo name, and those names don't follow any strict format: instructors can name their repos however they like.
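A minimal sketch of the URL-based counting described above, assuming log lines contain request paths of the form `/user/<name>/<feature>...`. The `FEATURE_PREFIXES` mapping is hypothetical (in particular, `git-pull` as the nbgitpuller endpoint should be checked against real log lines), and the sample lines are made up.

```python
import re
from collections import Counter

# Hypothetical mapping from the first path segment after /user/<name>/ to a
# feature name. Verify each prefix against real log lines before relying on it.
FEATURE_PREFIXES = {
    "tree": "classic notebook (/tree)",
    "lab": "JupyterLab (/lab)",
    "rstudio": "RStudio (/rstudio)",
    "git-pull": "nbgitpuller",
    "syncthing": "Syncthing",
    "desktop": "Desktop",
}

# Captures the first path segment after /user/<name>/, stopping at '/', '?' or
# whitespace. Links that go through /hub/user-redirect/ are not matched here;
# they show up once the redirect lands on /user/<name>/...
USER_PATH = re.compile(r"/user/[^/\s]+/([^/?\s]+)")


def count_features(log_lines):
    """Count how often each known feature prefix appears in request paths."""
    counts = Counter()
    for line in log_lines:
        match = USER_PATH.search(line)
        if match and match.group(1) in FEATURE_PREFIXES:
            counts[FEATURE_PREFIXES[match.group(1)]] += 1
    return counts


# Made-up sample lines; /lab/tree/... counts as JupyterLab, not /tree.
sample = [
    "GET /user/alice/lab?reset HTTP/1.1",
    "GET /user/bob/rstudio/ HTTP/1.1",
    "GET /user/alice/lab/tree/hw1.ipynb HTTP/1.1",
    "GET /user/carol/git-pull?repo=https://github.com/example/course HTTP/1.1",
]
print(count_features(sample).most_common())
```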
@ryanlovett Great to know that feature-related metrics can be tracked, and I understand the complexity of retrieving course-related information. I will follow up on ways to retrieve feature-related information from the Google Cloud console.
Next Steps from March Sprint Planning Meeting:
@felder When you have some free time, can you let me know whether we can get analytics data to answer the questions below? It would help build a narrative around Datahub's value proposition.
If instructor-level data is not available, how would you frame these questions so that we can get data that is closely relevant to what is being asked?
@felder Sharing context from our Slack conversation: instructor- and course-specific data cannot be retrieved from the GCP logs we currently store. Getting it would require another mechanism (possibly nbgitpuller links). I will set up some time with you the week after to figure out the near-term scope of the data to retrieve and the options we can explore to answer the highlighted questions in the longer run.
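As a rough illustration of the nbgitpuller idea, the sketch below pulls the GitHub owner and repo name out of a link's `repo` query parameter. The function name and sample URL are hypothetical, and since instructors name repos freely, the result is only a proxy for course identity.

```python
from urllib.parse import parse_qs, urlsplit


def repo_from_gitpuller_url(url):
    """Return (owner, repo) from an nbgitpuller link's `repo` parameter, or None."""
    # parse_qs also percent-decodes values such as https%3A%2F%2Fgithub.com...
    repos = parse_qs(urlsplit(url).query).get("repo")
    if not repos:
        return None
    # For https://github.com/<owner>/<repo>, the path is /<owner>/<repo>.
    parts = urlsplit(repos[0]).path.strip("/").split("/")
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


# Hypothetical link; real links may carry other parameters (branch, urlpath, ...).
link = (
    "/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fexample-org"
    "%2Fstat-course.git&branch=main&urlpath=lab%2Ftree%2Fstat-course"
)
print(repo_from_gitpuller_url(link))
```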
Qualitative Insights based on preliminary log analysis from the last 30 days:
@balajialg Interesting, thanks! Do you know what admin is being used for? Is it just to view the list of people, or is it being used to stop/start servers too?
@ryanlovett I searched for `resource.type="k8s_container" resource.labels.cluster_name="fall-2019" resource.labels.namespace_name=` in the Log Explorer to see how many users actively click the "access server" option to open other users' hub instances. It appears that almost all of the hub users noted in the comment above access other users' instances. Let me know whether querying for "oauth2/authorize?" in the text payload is the right way to find users clicking the access server option. Here is the link to the GCP Log Explorer with the search query
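An `oauth2/authorize` request is also made when users open their own servers, so counting those lines alone likely overstates "access server" use. One way to separate the two cases, sketched below, is to compare the server owner with the requesting user. This assumes the `client_id` follows a `jupyterhub-user-<name>` pattern and that each line ends with `(requester@ip)`; both are assumptions to verify against real hub logs, and the sample lines are made up.

```python
import re
from urllib.parse import unquote

# Assumed shape of a hub log line for the OAuth handshake (verify against real logs):
#   ... GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-<owner>&... (<requester>@<ip>) 12.00ms
AUTHORIZE = re.compile(
    r"oauth2/authorize\?\S*?client_id=jupyterhub-user-(?P<owner>[^&\s]+)"
    r".*\((?P<requester>[^@\s()]+)@"
)


def cross_user_accesses(log_lines):
    """Return (requester, owner) pairs where the requester is not the server's owner."""
    pairs = []
    for line in log_lines:
        match = AUTHORIZE.search(line)
        if not match:
            continue
        owner = unquote(match["owner"])  # the owner name may be percent-encoded
        if match["requester"] != owner:
            pairs.append((match["requester"], owner))
    return pairs


# Made-up sample lines: alice opening her own server, then gsi1 opening alice's.
sample_lines = [
    "302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-alice&response_type=code (alice@10.0.0.4) 12.00ms",
    "302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-alice&response_type=code (gsi1@10.0.0.9) 15.20ms",
]
print(cross_user_accesses(sample_lines))
```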
@balajialg I'm not sure, but maybe. Another way is to look at just the hub logs and search for uses of the admin panel by URL, with
The other point is that these logs only cover the last 30 days, so we can't make inferences about longer-term usage patterns. We can start saving the nginx logs too, though, to make that possible.
@yuvipanda Completely agree! I am treating the points above as hypotheses to test against long-term log data. My other hypothesis is that, apart from a few variations, this data should correlate highly with the long-term data (since it is a snapshot from mid-semester), but I could be completely wrong about that. When searching the hub logs, I see entries for all hubs that I can't make sense of. Should I interpret them as admin-access features that instructors/GSIs used widely across hubs over the past month, or are some of them configuration-related entries that were not triggered by a user action? Check the log results here
@balajialg What log lines do you get when you access the admin page on the hub yourself? Basically, we need to look at those and derive regexes and filters from them. Some post-processing may also be needed. Everything with
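One way to turn "derive regexes and filters" into code: write a regex from a sample admin-page log line, then post-process the matches, here by collapsing repeated visits into distinct users per day. The line format (timestamp prefix, `/hub/admin` path, `(user@ip)` suffix) is an assumption, and the sample lines are illustrative rather than copied from our logs.

```python
import re
from collections import defaultdict

# Regex derived from an assumed admin-page log line such as:
#   [I 2022-03-01 10:00:00.000 JupyterHub log:189] 200 GET /hub/admin (gsi1@10.0.0.9) 25.00ms
ADMIN_VISIT = re.compile(
    r"^\[\w (?P<date>\d{4}-\d{2}-\d{2}) .*?\] 200 GET /hub/admin(?:[?#\s])"
    r".*\((?P<user>[^@\s()]+)@"
)


def admin_users_per_day(log_lines):
    """Post-process matches: map each date to the sorted users who loaded the admin page."""
    per_day = defaultdict(set)  # sets collapse repeated visits by the same user
    for line in log_lines:
        match = ADMIN_VISIT.search(line)
        if match:
            per_day[match["date"]].add(match["user"])
    return {day: sorted(users) for day, users in sorted(per_day.items())}


# Illustrative lines: two visits by gsi1, one by prof, and one unrelated API call.
sample_lines = [
    "[I 2022-03-01 10:00:00.000 JupyterHub log:189] 200 GET /hub/admin (gsi1@10.0.0.9) 25.00ms",
    "[I 2022-03-01 10:02:00.000 JupyterHub log:189] 200 GET /hub/admin (gsi1@10.0.0.9) 21.00ms",
    "[I 2022-03-02 09:00:00.000 JupyterHub log:189] 200 GET /hub/admin (prof@10.0.0.7) 30.00ms",
    "[I 2022-03-02 09:01:00.000 JupyterHub log:189] 200 GET /hub/api/users (prof@10.0.0.7) 8.00ms",
]
print(admin_users_per_day(sample_lines))
```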
I think a basic process should be to:
@yuvipanda Looked at the hub logs by searching for the Thanks for detailing the process! I am spending a lot of time fine-tuning the search query (learning regex on the side) so that the results only show entries for that particular query, and it is time-intensive. Is the nginx log structure similar to the current logs, or would the search query need fine-tuning again based on those logs?
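On the nginx question: if the logs use nginx's predefined `combined` format, every line has the same field order, so a single regex can parse it and the path can be pulled out directly. Whether our ingress actually writes this format is an assumption; a custom `log_format` would need a different pattern. Because `re.match` only anchors at the start, extra fields appended after the combined ones are tolerated. The sample line is made up.

```python
import re
from urllib.parse import urlsplit

# nginx's predefined "combined" format:
#   $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent
#   "$http_referer" "$http_user_agent"
# Only the leading fields are captured; anything after them is ignored.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+)'
)


def parse_combined(line):
    """Parse the leading combined-format fields of an access log line, or return None."""
    match = COMBINED.match(line)
    if not match:
        return None  # malformed request lines or a different log_format
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["path"] = urlsplit(entry["uri"]).path  # drops the query string
    return entry


sample_line = (
    '10.0.0.4 - - [01/Mar/2022:10:00:00 +0000] "GET /user/alice/lab?reset HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"'
)
print(parse_combined(sample_line))
```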
Summary
Thanks to @yuvipanda's nudge, I started working on the feature matrix document to segment our instructors by the type of features they use. My initial understanding is that we can classify instructors into three archetypes: those with foundational, intermediate, or complex use cases. I also spent some time in the doc mapping features to these archetypes. I'm open to the team's input on whether the classification makes sense.
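One possible rule for the three archetypes is to place each instructor in the most advanced tier among the features they use. The feature-to-tier mapping below is purely hypothetical and only illustrates the shape of the rule; the real mapping would come from the feature matrix document.

```python
# Ordering of archetypes from least to most advanced.
TIERS = {"foundational": 0, "intermediate": 1, "complex": 2}

# Hypothetical feature-to-archetype mapping, for illustration only.
FEATURE_TIER = {
    "JupyterLab": "foundational",
    "nbgitpuller": "foundational",
    "RStudio": "intermediate",
    "Desktop": "complex",
    "Syncthing": "complex",
}


def archetype(features_used):
    """Classify an instructor by the most advanced tier among the features they use."""
    tiers = [FEATURE_TIER[f] for f in features_used if f in FEATURE_TIER]
    if not tiers:
        return None  # no recognized features, so no archetype
    return max(tiers, key=TIERS.__getitem__)


print(archetype({"JupyterLab", "nbgitpuller"}))
print(archetype({"JupyterLab", "Desktop"}))
```

A "highest tier wins" rule is simple but sensitive to a single use of an advanced feature; requiring a minimum usage count per feature would be a reasonable alternative.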
I would like the team's input on whether it is possible to retrieve usage metrics for a particular feature (thanks @ryanlovett for nudging me in this direction). Just as the Python popularity dashboard tracks the usage of Python libraries, could we track the features our instructors use most? We could use this information during semester onboarding to tailor feature demos to instructors' prior usage.
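A sketch of a popularity-style ranking, assuming the log data can be reduced to hypothetical `(hub, user, feature)` tuples. Counting distinct users rather than raw requests keeps a few heavy users from dominating the ranking; hub and user names in the sample are made up.

```python
from collections import defaultdict


def top_features(events, n=3):
    """Rank features per hub by the number of distinct users who used them.

    `events` is an iterable of (hub, user, feature) tuples, a hypothetical
    shape for data extracted from the logs.
    """
    users = defaultdict(set)  # (hub, feature) -> distinct users
    for hub, user, feature in events:
        users[(hub, feature)].add(user)

    ranking = defaultdict(list)  # hub -> [(feature, user_count), ...]
    for (hub, feature), who in users.items():
        ranking[hub].append((feature, len(who)))

    # Sort by descending user count, breaking ties alphabetically, and keep the top n.
    return {
        hub: sorted(items, key=lambda item: (-item[1], item[0]))[:n]
        for hub, items in ranking.items()
    }


events = [
    ("hub-a", "alice", "JupyterLab"),
    ("hub-a", "alice", "JupyterLab"),  # repeat use by the same user counts once
    ("hub-a", "bob", "JupyterLab"),
    ("hub-a", "bob", "nbgitpuller"),
    ("hub-b", "carol", "RStudio"),
]
print(top_features(events))
```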
Feature List
File Management
User Management
Application
3rd party libraries
High-Performance Computing
User Stories
Tasks