Commit d572eeb

alamb, wesm, adriangb, akurmustafa, zhuqi-lucas authored
Add explicit PMC/committers list to governance docs page (#17574)
* Add committers explicitly to governance page, with script
* add license header
* Update Wes McKinney's affiliation in governance.md
* Update adriangb's affiliation
* Update affiliation
* Andy Grove Affiliation
* Update Qi Zhu affiliation
* Update linwei's info
* Update docs/source/contributor-guide/governance.md
* Update docs/source/contributor-guide/governance.md
* Apply suggestions from code review
  Co-authored-by: Oleks V <comphead@users.noreply.github.com>
  Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
* Apply suggestions from code review
  Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
  Co-authored-by: Yang Jiang <jiangyang381@163.com>
  Co-authored-by: Yongting You <2010youy01@gmail.com>
* Apply suggestions from code review
  Co-authored-by: Yijie Shen <henry.yijieshen@gmail.com>
* Apply suggestions from code review
* Apply suggestions from code review
  Co-authored-by: Brent Gardner <bgardner@squarelabs.net>
  Co-authored-by: Dmitrii Blaginin <github@blaginin.me>
  Co-authored-by: Jax Liu <liugs963@gmail.com>
  Co-authored-by: Ifeanyi Ubah <ify1992@yahoo.com>
* Apply suggestions from code review
  Co-authored-by: Will Jones <willjones127@gmail.com>
* Clarify what is updated in the script
* Apply suggestions from code review
  Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com>
  Co-authored-by: Dan Harris <1327726+thinkharderdev@users.noreply.github.com>
* Update docs/source/contributor-guide/governance.md
* Update docs/source/contributor-guide/governance.md
  Co-authored-by: Parth Chandra <parthc@apache.org>
* Update docs/source/contributor-guide/governance.md
* prettier

---------

Co-authored-by: Wes McKinney <wesm@apache.org>
Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
Co-authored-by: Mustafa Akur <akurmustafa@gmail.com>
Co-authored-by: Qi Zhu <821684824@qq.com>
Co-authored-by: 张林伟 <lewiszlw520@gmail.com>
Co-authored-by: xudong.w <wxd963996380@gmail.com>
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
Co-authored-by: Yang Jiang <jiangyang381@163.com>
Co-authored-by: Yongting You <2010youy01@gmail.com>
Co-authored-by: Yijie Shen <henry.yijieshen@gmail.com>
Co-authored-by: Brent Gardner <bgardner@squarelabs.net>
Co-authored-by: Dmitrii Blaginin <github@blaginin.me>
Co-authored-by: Jax Liu <liugs963@gmail.com>
Co-authored-by: Ifeanyi Ubah <ify1992@yahoo.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com>
Co-authored-by: Dan Harris <1327726+thinkharderdev@users.noreply.github.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Parth Chandra <parthc@apache.org>
1 parent b9517a1 commit d572eeb

2 files changed: +344 -4 lines changed

Lines changed: 266 additions & 0 deletions
@@ -0,0 +1,266 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+"""
+Utility for updating the committer list in the governance documentation
+by reading from the Apache DataFusion phonebook and combining with existing data.
+"""
+
+import re
+import requests
+import sys
+import os
+from typing import Dict, List, NamedTuple, Set
+
+
+class Committer(NamedTuple):
+    name: str
+    apache: str
+    github: str
+    affiliation: str
+    role: str
+
+
+# Return (pmcs, committers), each a dictionary like
+# key: apache id
+# value: Real name
+
+def get_asf_roster():
+    """Get the current roster from Apache phonebook."""
+    # See https://home.apache.org/phonebook-about.html
+    committers_url = "https://whimsy.apache.org/public/public_ldap_projects.json"
+
+    # people https://whimsy.apache.org/public/public_ldap_people.json
+    people_url = "https://whimsy.apache.org/public/public_ldap_people.json"
+
+    try:
+        # Use a timeout so a stalled connection cannot hang the script
+        r = requests.get(committers_url, timeout=30)
+        r.raise_for_status()
+        j = r.json()
+        proj = j['projects']['datafusion']
+
+        # Get PMC members and committers
+        pmc_ids = set(proj['owners'])
+        committer_ids = set(proj['members']) - pmc_ids
+
+    except Exception as e:
+        print(f"Error fetching ASF roster: {e}")
+        # Return empty dictionaries to match the documented return type
+        return {}, {}
+
+    # Fetch people to get real names
+    #
+    # The data looks like this:
+    # {
+    #   "lastCreateTimestamp": "20250913131506Z",
+    #   "people_count": 9932,
+    #   "people": {
+    #     "a_budroni": {
+    #       "name": "Alessandro Budroni",
+    #       "createTimestamp": "20160720223917Z"
+    #     },
+    #     ...
+    # }
+    try:
+        r = requests.get(people_url, timeout=30)
+        r.raise_for_status()
+        j = r.json()
+        people = j['people']
+
+        # make a dictionary with each pmc_id and value their real name
+        pmcs = {p: people[p]['name'] for p in pmc_ids}
+        committers = {c: people[c]['name'] for c in committer_ids}
+
+    except Exception as e:
+        print(f"Error fetching ASF people: {e}")
+        # pmcs and committers may be unset here, so do not fall through
+        return {}, {}
+
+    return pmcs, committers
+
+
+
+def parse_existing_table(content: str) -> List[Committer]:
+    """Parse the existing committer table from the markdown content."""
+    committers = []
+
+    # Find the table between the markers
+    start_marker = "<!-- Begin Auto-Generated Committer List -->"
+    end_marker = "<!-- End Auto-Generated Committer List -->"
+
+    start_idx = content.find(start_marker)
+    end_idx = content.find(end_marker)
+
+    if start_idx == -1 or end_idx == -1:
+        return committers
+
+    table_content = content[start_idx:end_idx]
+
+    # Parse table rows (skip header and separator)
+    lines = table_content.split('\n')
+    for line in lines:
+        line = line.strip()
+        if line.startswith('|') and '---' not in line and line.count('|') >= 4:
+            # Split by | and clean up
+            parts = [part.strip() for part in line.split('|')]
+            # parts[0] is the empty string before the leading '|', so
+            # reading parts[5] requires at least 6 elements
+            if len(parts) >= 6:
+                name = parts[1].strip()
+                apache = parts[2].strip()
+                github = parts[3].strip()
+                affiliation = parts[4].strip()
+                role = parts[5].strip()
+
+                if name and name != 'Name' and '-----' not in name:
+                    committers.append(Committer(name, apache, github, affiliation, role))
+
+    return committers
+
+
+def generate_table_row(committer: Committer) -> str:
+    """Generate a markdown table row for a committer."""
+    # committer.github is written as-is: rows parsed from the existing table
+    # already hold a markdown link, and new entries hold an empty placeholder
+    return f"| {committer.name:<23} | {committer.apache:<39} | {committer.github:<39} | {committer.affiliation:<11} | {committer.role:<9} |"
+
+
+def sort_committers(committers: List[Committer]) -> List[Committer]:
+    """Sort committers by role ('PMC Chair', PMC, Committer) then by apache id."""
+    role_order = {'PMC Chair': 0, 'PMC': 1, 'Committer': 2}
+
+    return sorted(committers, key=lambda c: (role_order.get(c.role, 3), c.apache.lower()))
+
+
+def update_governance_file(file_path: str):
+    """Update the governance file with the latest committer information."""
+    try:
+        # The table contains non-ASCII names, so read it as UTF-8 explicitly
+        with open(file_path, 'r', encoding='utf-8') as f:
+            content = f.read()
+    except FileNotFoundError:
+        print(f"Error: File {file_path} not found")
+        return False
+
+    # Parse existing committers
+    existing_committers = parse_existing_table(content)
+    print(f"Found {len(existing_committers)} existing committers")
+
+    # Get ASF roster
+    asf_pmcs, asf_committers = get_asf_roster()
+    print(f"Found {len(asf_pmcs)} PMCs and {len(asf_committers)} committers in ASF roster")
+
+    # An empty roster means the fetch failed; stop here rather than
+    # overwriting the existing table with an empty one
+    if not asf_pmcs and not asf_committers:
+        print("Error: ASF roster is empty, leaving file unchanged")
+        return False
+
+    # Create a map of existing committers by apache id
+    existing_by_apache = {c.apache: c for c in existing_committers}
+
+    # Update the entries based on the ASF roster
+    updated_committers = []
+    for apache_id, name in {**asf_pmcs, **asf_committers}.items():
+        role = 'PMC' if apache_id in asf_pmcs else 'Committer'
+        if apache_id in existing_by_apache:
+            existing = existing_by_apache[apache_id]
+            # Preserve PMC Chair role if already set
+            if existing.role == 'PMC Chair':
+                role = 'PMC Chair'
+            updated_committers.append(Committer(
+                name=existing.name,
+                apache=apache_id,
+                github=existing.github,
+                affiliation=existing.affiliation,
+                role=role
+            ))
+        # add a new entry for new committers with placeholder values
+        else:
+            print(f"New entry found: {name} ({apache_id})")
+            # Placeholder github and affiliation
+            updated_committers.append(Committer(
+                name=name,
+                apache=apache_id,
+                github="",  # user should update
+                affiliation="",  # user should update
+                role=role
+            ))
+
+
+    # Sort the committers
+    sorted_committers = sort_committers(updated_committers)
+
+    # Generate new table
+    table_lines = [
+        "| Name | Apache ID | github | Affiliation | Role |",
+        "|-------------------------|-----------|----------------------------|-------------|-----------|"
+    ]
+
+    for committer in sorted_committers:
+        table_lines.append(generate_table_row(committer))
+
+    new_table = '\n'.join(table_lines)
+
+    # Replace the table in the content
+    start_marker = "<!-- Begin Auto-Generated Committer List -->"
+    end_marker = "<!-- End Auto-Generated Committer List -->"
+
+    start_idx = content.find(start_marker)
+    end_idx = content.find(end_marker)
+
+    if start_idx == -1 or end_idx == -1:
+        print("Error: Could not find table markers in file")
+        return False
+
+    # Find the end of the start marker line
+    start_line_end = content.find('\n', start_idx) + 1
+
+    new_content = (
+        content[:start_line_end] +
+        new_table + '\n' +
+        content[end_idx:]
+    )
+
+    # Write back to file
+    try:
+        with open(file_path, 'w', encoding='utf-8') as f:
+            f.write(new_content)
+        print(f"Successfully updated {file_path}")
+        return True
+    except Exception as e:
+        print(f"Error writing file: {e}")
+        return False
+
+
+def main():
+    """Main function."""
+    # Default path to governance file
+    script_dir = os.path.dirname(os.path.abspath(__file__))
+    repo_root = os.path.dirname(script_dir)
+    governance_file = os.path.join(repo_root, "source", "contributor-guide", "governance.md")
+
+    if len(sys.argv) > 1:
+        governance_file = sys.argv[1]
+
+    if not os.path.exists(governance_file):
+        print(f"Error: Governance file not found at {governance_file}")
+        sys.exit(1)
+
+    print(f"Updating committer list in {governance_file}")
+
+    if update_governance_file(governance_file):
+        print("Committer list updated successfully")
+    else:
+        print("Failed to update committer list")
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()

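The merge step of the script can be tried without network access. The sketch below is a simplified, standalone restatement of that logic rather than the script itself: `merge_roster` is a hypothetical helper name, and the Apache IDs, names, and affiliations in the sample data are made up for illustration.

```python
from typing import Dict, List, NamedTuple


class Committer(NamedTuple):
    name: str
    apache: str
    github: str
    affiliation: str
    role: str


def merge_roster(
    existing: List[Committer],
    pmcs: Dict[str, str],
    committers: Dict[str, str],
) -> List[Committer]:
    """Combine roster data with hand-maintained columns (hypothetical helper)."""
    by_apache = {c.apache: c for c in existing}
    merged = []
    for apache_id, name in {**pmcs, **committers}.items():
        role = "PMC" if apache_id in pmcs else "Committer"
        old = by_apache.get(apache_id)
        if old is not None:
            # Keep the hand-edited columns; the role follows the roster,
            # except that an existing 'PMC Chair' label is preserved.
            if old.role == "PMC Chair":
                role = "PMC Chair"
            merged.append(old._replace(role=role))
        else:
            # New people get empty github/affiliation for a human to fill in.
            merged.append(Committer(name, apache_id, "", "", role))
    order = {"PMC Chair": 0, "PMC": 1, "Committer": 2}
    return sorted(merged, key=lambda c: (order.get(c.role, 3), c.apache.lower()))


# Made-up sample data, for illustration only.
existing = [
    Committer("Ada Example", "ada", "[ada](https://github.com/ada)", "ExampleCo", "PMC Chair"),
    Committer("Bob Example", "bob", "[bob](https://github.com/bob)", "", "Committer"),
]
result = merge_roster(
    existing,
    pmcs={"ada": "Ada Example", "bob": "Bob Example"},
    committers={"cy": "Cy Example"},
)
for c in result:
    print(c.apache, c.role)
```

With this sample data, `ada` keeps the `PMC Chair` label, `bob` moves from `Committer` to `PMC` while keeping the existing GitHub link, and `cy` appears as a new row with empty placeholders. A person present in the existing table but absent from the roster would be dropped, since the loop iterates over the roster only.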
docs/source/contributor-guide/governance.md

Lines changed: 78 additions & 4 deletions
@@ -19,10 +19,6 @@
 
 # Governance
 
-The current PMC and committers are listed in the [Apache Phonebook].
-
-[apache phonebook]: https://projects.apache.org/committee.html?datafusion
-
 ## Overview
 
 DataFusion is part of the [Apache Software Foundation] and is governed following
@@ -38,6 +34,84 @@ As much as practicable, we strive to make decisions by consensus, and anyone in
 the community is encouraged to propose ideas, start discussions, and contribute
 to the project.
 
+## People
+
+DataFusion is currently governed by the following individuals:
+
+<!--
+
+The following table can be updated by running the following script:
+
+docs/scripts/update_committer_list.py
+
+Notes:
+
+* The script only updates the Name and Apache ID columns. The rest of the data
+  is manually provided.
+
+-->
+
+<!-- Begin Auto-Generated Committer List -->
+
+| Name | Apache ID | github | Affiliation | Role |
+| ----------------------- | ---------------- | ------------------------------------------------------- | -------------- | --------- |
+| Andrew Lamb | alamb | [alamb](https://github.com/alamb) | InfluxData | PMC Chair |
+| Andrew Grove | agrove | [andygrove](https://github.com/andygrove) | Apple | PMC |
+| Mustafa Akur | akurmustafa | [akurmustafa](https://github.com/akurmustafa) | OHSU | PMC |
+| Berkay Şahin | berkay | [berkaysynnada](https://github.com/berkaysynnada) | Synnada | PMC |
+| Oleksandr Voievodin | comphead | [comphead](https://github.com/comphead) | Apple | PMC |
+| Daniël Heres | dheres | [Dandandan](https://github.com/Dandandan) | | PMC |
+| QP Hou | houqp | [houqp](https://github.com/houqp) | | PMC |
+| Jie Wen | jackwener | [jakevin](https://github.com/jackwener) | | PMC |
+| Jay Zhan | jayzhan | [jayzhan211](https://github.com/jayzhan211) | | PMC |
+| Jonah Gao | jonah | [jonahgao](https://github.com/jonahgao) | | PMC |
+| Kun Liu | liukun | [liukun4515](https://github.com/liukun4515) | | PMC |
+| Mehmet Ozan Kabak | ozankabak | [ozankabak](https://github.com/ozankabak) | Synnada, Inc | PMC |
+| Tim Saucer | timsaucer | [timsaucer](https://github.com/timsaucer) | | PMC |
+| L. C. Hsieh | viirya | [viirya](https://github.com/viirya) | Databricks | PMC |
+| Ruihang Xia | wayne | [waynexia](https://github.com/waynexia) | Greptime | PMC |
+| Wes McKinney | wesm | [wesm](https://github.com/wesm) | Posit | PMC |
+| Will Jones | wjones127 | [wjones127](https://github.com/wjones127) | LanceDB | PMC |
+| Xudong Wang | xudong963 | [xudong963](https://github.com/xudong963) | Polygon.io | PMC |
+| Adrian Garcia Badaracco | adriangb | [adriangb](https://github.com/adriangb) | Pydantic | Committer |
+| Brent Gardner | avantgardner | [avantgardnerio](https://github.com/avantgardnerio) | Coralogix | Committer |
+| Dmitrii Blaginin | blaginin | [blaginin](https://github.com/blaginin) | SpiralDB | Committer |
+| Piotr Findeisen | findepi | [findepi](https://github.com/findepi) | dbt Labs | Committer |
+| Jax Liu | goldmedal | [goldmedal](https://github.com/goldmedal) | Canner | Committer |
+| Huaxin Gao | huaxingao | [huaxingao](https://github.com/huaxingao) | | Committer |
+| Ifeanyi Ubah | iffyio | [iffyio](https://github.com/iffyio) | Validio | Committer |
+| Jeffrey Vo | jeffreyvo | [Jefffrey](https://github.com/Jefffrey) | | Committer |
+| Liu Jiayu | jiayuliu | [jimexist](https://github.com/jimexist) | | Committer |
+| Ruiqiu Cao | kamille | [Rachelint](https://github.com/Rachelint) | Tencent | Committer |
+| Kazuyuki Tanimura | kazuyukitanimura | [kazuyukitanimura](https://github.com/kazuyukitanimura) | | Committer |
+| Eduard Karacharov | korowa | [korowa](https://github.com/korowa) | | Committer |
+| Siew Kam Onn | kosiew | [kosiew](https://github.com/kosiew) | | Committer |
+| Lewis Zhang | linwei | [lewiszlw](https://github.com/lewiszlw) | diit.cn | Committer |
+| Matt Butrovich | mbutrovich | [mbutrovich](https://github.com/mbutrovich) | Apple | Committer |
+| Metehan Yildirim | mete | [metegenez](https://github.com/metegenez) | | Committer |
+| Marko Milenković | milenkovicm | [milenkovicm](https://github.com/milenkovicm) | | Committer |
+| Wang Mingming | mingmwang | [mingmwang](https://github.com/mingmwang) | | Committer |
+| Michael Ward | mjward | [Michael-J-Ward](https://github.com/Michael-J-Ward) | | Committer |
+| Marco Neumann | mneumann | [crepererum](https://github.com/crepererum) | InfluxData | Committer |
+| Zhong Yanghong | nju_yaho | [yahoNanJing](https://github.com/yahoNanJing) | | Committer |
+| Paddy Horan | paddyhoran | [paddyhoran](https://github.com/paddyhoran) | Assured Allies | Committer |
+| Parth Chandra | parthc | [parthchandra](https://github.com/parthchandra) | Apple | Committer |
+| Rémi Dettai | rdettai | [rdettai](https://github.com/rdettai) | | Committer |
+| Chao Sun | sunchao | [sunchao](https://github.com/sunchao) | OpenAI | Committer |
+| Daniel Harris | thinkharderdev | [thinkharderdev](https://github.com/thinkharderdev) | Coralogix | Committer |
+| Raphael Taylor-Davies | tustvold | [tustvold](https://github.com/tustvold) | | Committer |
+| Weijun Huang | weijun | [Weijun-H](https://github.com/Weijun-H) | OrbDB | Committer |
+| Yang Jiang | yangjiang | [Ted-jiang](https://github.com/Ted-jiang) | Ebay | Committer |
+| Yijie Shen | yjshen | [yjshen](https://github.com/yjshen) | DataPelago | Committer |
+| Yongting You | ytyou | [2010YOUY01](https://github.com/2010YOUY01) | Independent | Committer |
+| Qi Zhu | zhuqi | [zhuqi-lucas](https://github.com/zhuqi-lucas) | Polygon.io | Committer |
+
+<!-- End Auto-Generated Committer List -->
+
+Note that the authoritative list of PMC members and committers is the [Apache Phonebook].
+
+[apache phonebook]: https://projects.apache.org/committee.html?datafusion
+
 ## Roles
 
 - **Contributors**: Anyone who contributes to the project, whether it be code,

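The `Begin`/`End` comment markers in governance.md are what allow the table to be regenerated in place. A minimal sketch of that replace-between-markers technique follows, assuming both markers sit on their own lines. `replace_between_markers` is a hypothetical name used only for illustration, and the sample document is made up.

```python
BEGIN = "<!-- Begin Auto-Generated Committer List -->"
END = "<!-- End Auto-Generated Committer List -->"


def replace_between_markers(content: str, new_block: str) -> str:
    """Return content with the text between BEGIN and END replaced."""
    start = content.find(BEGIN)
    end = content.find(END)
    if start == -1 or end == -1 or end < start:
        raise ValueError("table markers not found")
    # Keep everything up to and including the newline that ends the
    # begin-marker line; the old block after it is discarded.
    head_end = content.find("\n", start) + 1
    if head_end == 0 or head_end > end:
        raise ValueError("markers must be on separate lines")
    return content[:head_end] + new_block + "\n" + content[end:]


# Made-up sample document, for illustration only.
doc = "intro\n" + BEGIN + "\n| old |\n" + END + "\noutro\n"
updated = replace_between_markers(doc, "| new |")
print(updated)
```

Because the markers themselves are preserved, running the replacement again with the same block leaves the document unchanged, which is what makes the update safe to repeat.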