<!doctype html><html lang=en>
<head>
<meta charset=utf-8>
<meta http-equiv=x-ua-compatible content="chrome=1">
<meta name=HandheldFriendly content="True">
<meta name=MobileOptimized content="320">
<meta name=viewport content="width=device-width,initial-scale=1">
<meta name=referrer content="no-referrer">
<meta name=description content="Adam Drake is an advisor to scale-up tech companies. He writes about ML/AI/crypto/data, leadership, and building tech teams.">
<title>
On Data Governance - Adam Drake
</title>
<link rel="shortcut icon" href=/static/favicon.ico>
<link rel=stylesheet href=https://adamdrake.com/sass/style.min.4b0d3fd52024283b14d542e540f013de2976b7a9ca4436a50d9555c6a678c3be.css integrity="sha256-Sw0/1SAkKDsU1ULlQPAT3il2t6nKRDalDZVVxqZ4w74=" crossorigin=anonymous media=screen>
<meta name=twitter:card content="summary_large_image">
<meta name=twitter:image content="https://adamdrake.com/static/images/twitter-card.jpg">
<meta name=twitter:title content="On Data Governance">
<meta name=twitter:description content="The short version is that Data Governance covers all of the business concerns surrounding data: data quality, management, risks, and similar non-technical matters. It comprises the kinds of concerns that business people typically have about data, although quality is also critically important for machine learning practitioners. For our purposes, I’ll group the concerns into quality, management, and risks. Each of these can be pursued very deeply, but an overview is enough to make some key points.">
<meta property="og:title" content="On Data Governance">
<meta property="og:description" content="The short version is that Data Governance covers all of the business concerns surrounding data: data quality, management, risks, and similar non-technical matters. It comprises the kinds of concerns that business people typically have about data, although quality is also critically important for machine learning practitioners. For our purposes, I’ll group the concerns into quality, management, and risks. Each of these can be pursued very deeply, but an overview is enough to make some key points.">
<meta property="og:type" content="article">
<meta property="og:url" content="https://adamdrake.com/on-data-governance.html"><meta property="og:image" content="https://adamdrake.com/static/images/twitter-card.jpg"><meta property="article:section" content="posts">
<meta property="article:published_time" content="2014-09-07T00:00:00+00:00">
<meta property="article:modified_time" content="2014-09-07T00:00:00+00:00">
</head>
<body>
<div class=title-box>
<div class=title-left>
<h1 class=name><a href=/>Adam Drake</a></h1>
</div>
<div class=title-right>
<div class=social-icons>
<a href=https://github.com/adamdrake>
<img class=icon src=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEAAAABACAYAAACqaXHeAAAC5UlEQVR4nO2a0XXiMBBFKYESKIEOlg5CB3EHmw5wB9DBbgebDnAHTgemA9PB3Q8J1mtkaSSPjTnRPYePHIzem/FoJFtZrTKZTCaTyWQmBFgD78ABOAMNjzT2u4O9dv1s36OwQf8EakewUmo7xuskA9gAR6AdEXif1o65eXZ8XjDlqxm4KxGHZ8f5ALBlXKnHUrOUagAKpr3rQ7TAfgnBP5viOwd/Y94ksKzgb8wzHTAN7xlzPkTLHI0Rf7ffdZJUAleFwK52rK0de+e5tp46+NJn1HH92vGbK1DZz6f93P7uJ6zEsQsMJKycKvgN/tKvAr/dIyhRybU2WUNMMxWAUyDzlbrosBdfAvSrAFPKocZXqYr6/YQS0KL5AAV8BAQBWjXBsB/JKlRoCkr2+Q9NcCqQrS46KwKm/CXsVARlnnxLYZfx0wDZru9TIa5YX6E+ABq7Q8LdH2a8+x1fe4GvUkMolOnZ5r7DW6gXVBoizeQi6d5CN6fREAlRjQ8l2VuwD2iI5AR89wRcJhdJ9xbioiEiWW83o4XifW0EvioNIUkCitFC8b4kG7RKQ0iyETqPDyna11ng66QhJNlxwTKfBVS2wtKHoZoZDjGtH+kplI4f4GsJSYgM/ktTOOYcoMa+wdUEU/ZNhI9CU3zN44NHZRPze8DAH+BNQfvNjhWD/gMaj6vB/YQWcw7g2zCdMUfnwcrA3Okjsi4/xPju7zDmqoKWfwcWW8f3XS4I+oPVCe4+PVwlOqlJcL0cvb+Bxd8rPkbqSBHrJIF7Z1gKzIsbI/J1vk81Rcx9c65S/+89PKaM9zYZOxJKMiH4a0ySRzFwh34pa8QyT/Adg675flQcP4ZCSzfWpCsJDWbJ+9H5vCeMvezgO0Z3CE5qEsYNMd+cD4FpjN73Bglj+qgWE3wXTNd3VkPCWEN3fdp1fiyYJfDUT0TCOP3AT7zS/wyvVvcmWZGwN7cB3x64XivwTCaTyWQymVfgL42yxWFGEKJcAAAAAElFTkSuQmCC width=64 height=64>
</a>
<a href=https://twitter.com/aadrake>
<img class=icon src=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEAAAABACAYAAACqaXHeAAACQ0lEQVR4nO2a0XWjMBBFXQIluIR0EDpIOjAdLB1kO0g6sDvIdgAd2B3gDkwHNx8jEkIMNmTGkvfMPUc//hjmPTTSIHm1chzHcRzHcRzHSQggAx6BlzA2wEPsvMwBcqBinCYYkl2I8WyRXKEe9Ct2BmwnhJ8z4mEQ46ln3lo7wTwELlQDrz7F72eI7zgBr8D74Pc37RxXwK73gEI59naB+DEOTJTHb5IcvqFCKW7+U8PvxIex0cqxS/QcW4W4jZL4rhyqvhka2rtEx9izcHsC1krih7RLc5pK9jDxwBMXtqaRmKWB+KO6+JDsvyse3gCbGTH/Kou3WQBDssWMRBrgz6VkDAywER+SzZDamkuFmPFjWqJsgJn4XsIaNVsh+/4L0y3vbMwNCCZcsxZEwVr4K7JtZXzvClOhtTagY49M3VMcnaPU1gZM9QEpsLM24C22wgsU1gZYta1a2J8Wke4sOJqLDwZkpLkW6B+A3JkJtz0sRUxIpRwONxU/MGKNNEXHiAYUMYQ/I81QN2I1Rcebiw8GrFn2ZaiN/tn/DBO0v+XnUkcT3zOhjiS+RfviY6EBGXE+j4vY2r+BHJTcak3YxdZ7FmQ2lMiMsDIj3p4/B+S2p9EWj+WhpwbILBheVGpQJy0e6Qu2BsIh4Zp/RM4Jl1xrX0NLzEanJzRH+v0K/boeI70pj3R+1ltdDeSxtU6CXJFpdoAtMsPu6w9QyMJXsuxz+ICcJ8SvcS2QLTAPo0RKphtl+P2+3rLjOI7jOI7jOP8tH5ahgbeYuZE9AAAAAElFTkSuQmCC width=64 height=64>
</a>
<a href=https://linkedin.com/in/aadrake>
<img class=icon src=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABoAAAAaCAYAAACpSkzOAAAAzElEQVRIie2WWxGDMBBFVwISkIAEnBQJOCoOKqE4AAeNg+Lg9AOY2VmSTls29If7eTPJmbubl8gioAEG/DQAF9ECOkeA1VUnya1GgPEA0CARs2dO2QLBi2RBAShU78pcoF6MgCkH6AlUClJ7QGKgFXYD7l6QFOhbBeYN9Hb3bkCRHqX8CaiNV6aAe0CV9Ra/IHIsfgapRYuIv7lt9iTSi7VmbHP+9oB0eR6peR6gj+adoBP0X9ART/l43OdkiZnzu9XZmjb4lnFkTSIiLyov4WUSpGLDAAAAAElFTkSuQmCC width=64 height=64>
</a>
</div>
<button class="subscribe subscribe-btn">
<a href=https://www.digitalmaneuver.com/#/portal>Subscribe to my newsletter</a>
</button>
</div>
</div>
<div class="nav-box row">
<div class=nav-left-menu>
<ul>
<li><a href=/>Latest</a> | </li>
<li><a href=/about.html>About</a> | </li>
<li><a href=/cases.html>Case Studies</a> | </li>
<li><a href=/contact.html>Contact</a> | </li>
<li><a href=/press.html>Press</a></li>
</ul>
</div>
</div>
<section class=section>
<div class=container>
<a href=https://applybyapi.com><button class=btn>Struggling to hire developers? Check out ApplyByAPI!</button></a>
<h1 class=page-title>On Data Governance</h1>
<h2 class=content-date>September 7, 2014</h2>
<div class=share-links>
Share this:
<a class=twitter-share-button href="https://twitter.com/intent/tweet?text=Read%20On%20Data%20Governance%20https%3a%2f%2fadamdrake.com%2fon-data-governance.html" onclick="return window.open(this.href,'twitter-share','width=550,height=235'),!1">
twitter
</a> //
<a class=icon-facebook href="https://www.facebook.com/sharer/sharer.php?u=https%3a%2f%2fadamdrake.com%2fon-data-governance.html" onclick="return window.open(this.href,'facebook-share','width=580,height=296'),!1">
facebook
</a> //
<a class=icon-linkedin href="https://www.linkedin.com/shareArticle?mini=true&amp;url=https%3a%2f%2fadamdrake.com%2fon-data-governance.html&amp;title=On%20Data%20Governance&amp;source=Adam%20Drake" onclick="return window.open(this.href,'linkedin-share','width=980,height=980'),!1">
linkedin
</a>
</div>
<div class=content>
<h3 id=overview class=anchor-link><a href=#overview>Overview</a></h3>
<p>The short version is that Data Governance covers all of the business concerns surrounding data: data quality, management, risks, and similar non-technical matters. It comprises the kinds of concerns that business people typically have about data, although quality is also critically important for machine learning practitioners. For our purposes, I’ll group the concerns into quality, management, and risks. Each of these can be pursued very deeply, but an overview is enough to make some key points.</p>
<h3 id=quality class=anchor-link><a href=#quality>Quality</a></h3>
<p>The quality of data starts at collection, and although it may be a contrarian or controversial point, I maintain that all high-quality data needs a schema, even in transit. Sending data around in any kind of string format, where serialization and deserialization are not enforced by a type system, works against having high-quality data. If data does not have a schema at collection, it falls to the entire software engineering organization to pass data around in consistent and reliable formats by convention alone. Since that is not feasible, a schema is required. Besides, if your data has no schema, have you really evaluated the underlying problems and the data needed to solve them, or are you just collecting everything in the hope that it will someday be useful? That is a common anti-pattern, and simply collecting all data should be avoided.</p>
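<p>As a minimal sketch of what a schema in transit can look like, the Python below parses a JSON message into a typed record and rejects anything that does not match. The <code>PageView</code> record, its fields, and the <code>parse_page_view</code> helper are hypothetical names chosen only for illustration; a schema-based serialization format could fill the same role.</p>

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PageView:
    """Hypothetical event record; the field names are illustrative only."""
    user_id: int
    url: str
    duration_ms: int


def parse_page_view(raw: str) -> PageView:
    """Deserialize a JSON string, enforcing the field names and types."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")

    expected = {f.name: f.type for f in fields(PageView)}
    if set(payload) != set(expected):
        raise ValueError(f"fields do not match schema: {sorted(payload)}")

    for name, expected_type in expected.items():
        # Exact type match, so that True is not accepted where an int is required.
        if type(payload[name]) is not expected_type:
            raise ValueError(f"{name} must be {expected_type.__name__}")

    return PageView(**payload)


event = parse_page_view('{"user_id": 7, "url": "/pricing", "duration_ms": 1200}')
```

<p>A message with a missing field, an extra field, or a string where an integer is expected raises <code>ValueError</code> at the boundary instead of propagating into downstream systems.</p>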
<h3 id=management class=anchor-link><a href=#management>Management</a></h3>
<p>Data management topics are arguably the most important of all. Consider the impact of data spread across different places, or of people waiting for the data access they need in order to work effectively. These are just two of the many concerns in the area of data management, but they are often the two biggest data management problems that companies face. Data sitting in disparate places is the commonly noted <em>data silo</em> problem that plagues many organizations. It often stems from <em>Conway’s Law</em>, which effectively states that any organization producing a technical system will invariably produce one that reflects the communication structure within the organization. As a result, it’s very common for groups to replicate data for the sole purpose of further processing. Sometimes this is needed for scalability, but often it’s simply a matter of overcoming an organizational problem with a technical solution (albeit not a very good one). The result is that data is locked away in silos, and often it’s the same data.</p>
<p>An additional problem is that when multiple teams develop information products that should draw on the same data source but instead use different ones, the results of those products disagree. This creates massive additional cost for the organization: less trust from customers, development time lost hunting down esoteric bugs between systems that process similar data, and the unquantifiable loss from reduced morale among employees who feel they are working on a dysfunctional system. In reality, the system isn’t dysfunctional and could be improved greatly by simply eliminating redundant data sources. For data to be used well in an organization, the management of the data is perhaps the biggest burden. This is a post-collection and pre-product topic, so the bulk of data problems reside here.</p>
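<p>To make the discrepancy problem concrete, here is a small Python sketch that compares two copies of what should be the same data. The <code>billing</code> and <code>analytics</code> dictionaries and their values are invented for illustration, and <code>find_discrepancies</code> is a hypothetical helper, not part of any particular tool.</p>

```python
def find_discrepancies(source_a: dict, source_b: dict) -> dict:
    """Map each key that differs between the sources to its (a, b) values.

    None marks a record that is missing from one of the sources.
    """
    report = {}
    for key in sorted(set(source_a) | set(source_b)):
        value_a = source_a.get(key)
        value_b = source_b.get(key)
        if value_a != value_b:
            report[key] = (value_a, value_b)
    return report


# Hypothetical monthly revenue per customer, as held by two separate teams.
billing = {"c1": 100, "c2": 250, "c3": 80}
analytics = {"c1": 100, "c2": 240, "c4": 60}

report = find_discrepancies(billing, analytics)
# report == {"c2": (250, 240), "c3": (80, None), "c4": (None, 60)}
```

<p>Every entry in such a report is a question someone eventually has to answer, which is the cost that eliminating the redundant source removes.</p>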
<h3 id=risks class=anchor-link><a href=#risks>Risks</a></h3>
<p>Some of the more subtle risks have already been mentioned, but more obvious ones include data that can be subpoenaed in the event of legal action. If the organization retains unnecessary data, that data constitutes a risk in legal cases because it can be used to discover or prove actions that carry penalties for the company.</p>
<p>In a similar vein, there is the possibility of a data breach in which data is unintentionally exposed to the public. Here, as in the legal example, having unnecessary data is a risk to the company and its customers. If more customer data is stored than is needed to operate the business and its products, the excess is also at risk of being stolen or leaked.</p>
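<p>One way to act on this is to drop unneeded fields before a record is ever stored. The Python below is a rough sketch under the assumption that the business needs only three fields; <code>REQUIRED_FIELDS</code>, <code>minimize</code>, and the sample record are all hypothetical.</p>

```python
# Hypothetical allow-list: the only fields this business needs to operate.
REQUIRED_FIELDS = frozenset({"customer_id", "email", "plan"})


def minimize(record: dict) -> dict:
    """Keep only allow-listed fields so nothing extra is ever stored."""
    return {key: value for key, value in record.items() if key in REQUIRED_FIELDS}


incoming = {
    "customer_id": 1,
    "email": "a@example.com",
    "plan": "pro",
    "birth_date": "1990-01-01",   # not needed, so not kept
    "ip_address": "203.0.113.5",  # not needed, so not kept
}

stored = minimize(incoming)
# stored == {"customer_id": 1, "email": "a@example.com", "plan": "pro"}
```

<p>An allow-list is used rather than a deny-list so that a newly added field is excluded by default until someone decides it is needed.</p>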
<p>Low-quality data carries risks as well, which ties in with the data management topic above. It causes numerous problems for organizations, including the aforementioned decrease in customer trust, reduction in employee morale, and lost development time, and it can additionally lead to the adoption of flawed strategies or initiatives. If low-quality data is used to support business decisions, the consequences of those bad decisions can be attributed directly to the bad data used to make them.</p>
<h3 id=summary class=anchor-link><a href=#summary>Summary</a></h3>
<p>This is just a brief overview of some of the general topics encountered when considering data governance within an organization. Ultimately, the reasons for implementing more effective data governance are usually to further the company’s strategic goals for its products or services, or to reduce the surface area for various kinds of data risk. Without some basic data governance in place, the challenges of using data effectively in an organization are often too great to overcome, resulting in frustration and failed efforts to become data-driven.</p>
</div>
</div>
</section>
<div class="nav-box row">
<div class=nav-left-menu>
<ul>
<li><a href=/>Latest</a> | </li>
<li><a href=/about.html>About</a> | </li>
<li><a href=/cases.html>Case Studies</a> | </li>
<li><a href=/contact.html>Contact</a> | </li>
<li><a href=/press.html>Press</a></li>
</ul>
</div>
</div>
<div class="footer-box row">
<div class="footer-left col-md-6 col-xs-12">
<div class="footer-bio content">
<p><strong>Adam Drake</strong> leads technical business transformations in global and multi-cultural environments. He has a passion for helping companies become more productive by improving internal leadership capabilities, and accelerating product development through technology and data architecture guidance. Adam has served as a White House Presidential Innovation Fellow and is an IEEE Senior Member.</p>
</div>
</div>
<div class=footer-right>
<div class=social-icons>
<a href=https://github.com/adamdrake>
<img class=icon src=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEAAAABACAYAAACqaXHeAAAC5UlEQVR4nO2a0XXiMBBFKYESKIEOlg5CB3EHmw5wB9DBbgebDnAHTgemA9PB3Q8J1mtkaSSPjTnRPYePHIzem/FoJFtZrTKZTCaTyWQmBFgD78ABOAMNjzT2u4O9dv1s36OwQf8EakewUmo7xuskA9gAR6AdEXif1o65eXZ8XjDlqxm4KxGHZ8f5ALBlXKnHUrOUagAKpr3rQ7TAfgnBP5viOwd/Y94ksKzgb8wzHTAN7xlzPkTLHI0Rf7ffdZJUAleFwK52rK0de+e5tp46+NJn1HH92vGbK1DZz6f93P7uJ6zEsQsMJKycKvgN/tKvAr/dIyhRybU2WUNMMxWAUyDzlbrosBdfAvSrAFPKocZXqYr6/YQS0KL5AAV8BAQBWjXBsB/JKlRoCkr2+Q9NcCqQrS46KwKm/CXsVARlnnxLYZfx0wDZru9TIa5YX6E+ABq7Q8LdH2a8+x1fe4GvUkMolOnZ5r7DW6gXVBoizeQi6d5CN6fREAlRjQ8l2VuwD2iI5AR89wRcJhdJ9xbioiEiWW83o4XifW0EvioNIUkCitFC8b4kG7RKQ0iyETqPDyna11ng66QhJNlxwTKfBVS2wtKHoZoZDjGtH+kplI4f4GsJSYgM/ktTOOYcoMa+wdUEU/ZNhI9CU3zN44NHZRPze8DAH+BNQfvNjhWD/gMaj6vB/YQWcw7g2zCdMUfnwcrA3Okjsi4/xPju7zDmqoKWfwcWW8f3XS4I+oPVCe4+PVwlOqlJcL0cvb+Bxd8rPkbqSBHrJIF7Z1gKzIsbI/J1vk81Rcx9c65S/+89PKaM9zYZOxJKMiH4a0ySRzFwh34pa8QyT/Adg675flQcP4ZCSzfWpCsJDWbJ+9H5vCeMvezgO0Z3CE5qEsYNMd+cD4FpjN73Bglj+qgWE3wXTNd3VkPCWEN3fdp1fiyYJfDUT0TCOP3AT7zS/wyvVvcmWZGwN7cB3x64XivwTCaTyWQymVfgL42yxWFGEKJcAAAAAElFTkSuQmCC width=64 height=64>
</a>
<a href=https://twitter.com/aadrake>
<img class=icon src=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEAAAABACAYAAACqaXHeAAACQ0lEQVR4nO2a0XWjMBBFXQIluIR0EDpIOjAdLB1kO0g6sDvIdgAd2B3gDkwHNx8jEkIMNmTGkvfMPUc//hjmPTTSIHm1chzHcRzHcRzHSQggAx6BlzA2wEPsvMwBcqBinCYYkl2I8WyRXKEe9Ct2BmwnhJ8z4mEQ46ln3lo7wTwELlQDrz7F72eI7zgBr8D74Pc37RxXwK73gEI59naB+DEOTJTHb5IcvqFCKW7+U8PvxIex0cqxS/QcW4W4jZL4rhyqvhka2rtEx9izcHsC1krih7RLc5pK9jDxwBMXtqaRmKWB+KO6+JDsvyse3gCbGTH/Kou3WQBDssWMRBrgz6VkDAywER+SzZDamkuFmPFjWqJsgJn4XsIaNVsh+/4L0y3vbMwNCCZcsxZEwVr4K7JtZXzvClOhtTagY49M3VMcnaPU1gZM9QEpsLM24C22wgsU1gZYta1a2J8Wke4sOJqLDwZkpLkW6B+A3JkJtz0sRUxIpRwONxU/MGKNNEXHiAYUMYQ/I81QN2I1Rcebiw8GrFn2ZaiN/tn/DBO0v+XnUkcT3zOhjiS+RfviY6EBGXE+j4vY2r+BHJTcak3YxdZ7FmQ2lMiMsDIj3p4/B+S2p9EWj+WhpwbILBheVGpQJy0e6Qu2BsIh4Zp/RM4Jl1xrX0NLzEanJzRH+v0K/boeI70pj3R+1ltdDeSxtU6CXJFpdoAtMsPu6w9QyMJXsuxz+ICcJ8SvcS2QLTAPo0RKphtl+P2+3rLjOI7jOI7jOP8tH5ahgbeYuZE9AAAAAElFTkSuQmCC width=64 height=64>
</a>
<a href=https://linkedin.com/in/aadrake>
<img class=icon src=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABoAAAAaCAYAAACpSkzOAAAAzElEQVRIie2WWxGDMBBFVwISkIAEnBQJOCoOKqE4AAeNg+Lg9AOY2VmSTls29If7eTPJmbubl8gioAEG/DQAF9ECOkeA1VUnya1GgPEA0CARs2dO2QLBi2RBAShU78pcoF6MgCkH6AlUClJ7QGKgFXYD7l6QFOhbBeYN9Hb3bkCRHqX8CaiNV6aAe0CV9Ra/IHIsfgapRYuIv7lt9iTSi7VmbHP+9oB0eR6peR6gj+adoBP0X9ART/l43OdkiZnzu9XZmjb4lnFkTSIiLyov4WUSpGLDAAAAAElFTkSuQmCC width=64 height=64>
</a>
</div>
<button class="subscribe subscribe-btn">
<a href=https://www.digitalmaneuver.com/#/portal>Subscribe to my newsletter</a>
</button>
</div>
</div>
<div class="container has-text-centered footer-copyright">
</div>
</body>