Commit

seperate pyfiles and optimize group cases
autolordz committed Sep 20, 2019
1 parent 130f828 commit 122b75e
Showing 11 changed files with 871 additions and 788 deletions.
57 changes: 16 additions & 41 deletions README.md
@@ -3,26 +3,10 @@
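> * Automated batch generation of postal notes for court legal work (Legal agency postal notes automatically generate app)
> * Lets legal mailing staff extract party addresses from the legal OA data sheet (Excel) and the published judgment documents (docx), then generate postal notes in batch. This reduces staff workload, especially for series cases, where many parties and many addresses make manual entry repetitive and error-prone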


[![](https://img.shields.io/github/release/autolordz/docx-content-modify.svg?style=popout&logo=github&colorB=ff69b4)](https://github.com/autolordz/docx-content-modify/releases)
[![](https://img.shields.io/badge/github-source-orange.svg?style=popout&logo=github)](https://github.com/autolordz/docx-content-modify)
[![](https://img.shields.io/github/license/autolordz/docx-content-modify.svg?style=popout&logo=github)](https://github.com/autolordz/docx-content-modify/blob/master/LICENSE)

## Table of Contents

<!-- MarkdownTOC autoanchor="true" autolink="true" uri_encoding="false" -->

- [Environment](#环境)
- [Updates](#更新)
- [Contents](#内容)
- [Rules](#规则)
- [Detailed Guide](#详细指南)
- [Licence](#licence)

<!-- /MarkdownTOC -->

<a id="环境"></a>
## Environment

> * conda : 4.6.14
@@ -31,18 +15,17 @@
> * Components: python-docx,pandas,StyleFrame,configparser
> * Packaging tool: pyinstaller
<a id="更新"></a>
## Updates

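【2019-6-19】
> * Added series-case merging to save printing resources
【2019-9-19】

【2019-6-12】
> * Reorganized the series-case merging feature and optimized the code
> * Updated the filter vocabulary for judgment documents
【2019-6-19】

> * Added series-case merging to save printing resources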

<a id="内容"></a>
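## Contents

- [x] Rename judgment documents according to the format
@@ -57,7 +40,6 @@
- [x] Output postal notes according to the Data sheet
- [x] Once all information is filled in, running again outputs the postal notes specified in the Data sheet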

<a id="规则"></a>
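## Rules

1. Recipient rule for parties: each party without an agent lawyer receives one copy; a party with a retained lawyer has a single copy sent to the lawyer; with several lawyers, the copy goes to the first lawyer; lawyers from the same firm also share one copy
@@ -89,21 +71,17 @@ Demo of some Data sheet fields: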

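4. 【适用程序】 (applicable procedure) rule, used for series cases:

Fill this in the 【适用程序】 field of the OA sheet: when you judge that several cases belong to the same series, mark them in this field with a unique tag where len(str)>3, and the series cases are merged automatically

len(str)>3 = the tag is longer than three characters
Here, when several cases in the OA sheet have exactly the same parties, they are merged into one case and a single postal note is sent; if the parties differ even slightly, the cases are still handled separately as before

For example:

| 【适用程序】 (applicable procedure) | 【案号】 (case number) |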
| --- | --- |
| AAA | 2773 |
| 2774-2776 | 2774 |
| 2774-2776 | 2775 |
| 2774-2776 | 2776 |
| 2160、2161_集合 | 2160 |
| 2160、2161_集合 | 2161 |
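
The merging rule in item 4 (cases whose parties are exactly identical are merged and mailed once, while any difference keeps them separate) can be sketched roughly as below. This is only an illustration under assumptions: the function name `merge_identical_party_cases` and the `(case_number, parties)` input shape are hypothetical and are not taken from the project's code.

```python
from collections import defaultdict


def merge_identical_party_cases(cases):
    """Group case numbers whose party sets are exactly identical.

    `cases` is a list of (case_number, parties) pairs, where `parties`
    is a list of party names (hypothetical input shape).
    Returns a list of (case_numbers, parties) pairs, one per merged group.
    """
    groups = defaultdict(list)
    for case_number, parties in cases:
        # A frozenset ignores the order of parties, so ['C', 'D'] and
        # ['D', 'C'] land in the same group; any extra or missing party
        # produces a different key and therefore a separate group.
        groups[frozenset(parties)].append(case_number)
    return [(sorted(numbers), sorted(parties))
            for parties, numbers in groups.items()]


cases = [
    ('2773', ['A', 'B']),
    ('2774', ['C', 'D']),
    ('2775', ['D', 'C']),
    ('2776', ['C', 'D', 'E']),  # one extra party: stays separate
]
merged = merge_identical_party_cases(cases)
```

In this sketch, 2774 and 2775 end up in one group and would share a single postal note, while 2776 is kept apart because its party list differs.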


5. config.txt:
5. conf.txt:
```python
[config]
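data_xlsx = data_main.xlsx # data template path
@@ -114,20 +92,18 @@ flag_append_oa = 1 # whether to import OA data
flag_to_postal = 1 # whether to print postal notes
flag_check_jdocs = 0 # whether to check the user format and print hints
flag_check_postal = 0 # whether to check the postal note format and print hints
flag_output_log = 1 # whether to save the printed output
data_case_codes = # case numbers to print, several allowed, e.g. AAA,BBB, priority 1
data_case_codes = # case numbers to print, several allowed, e.g. AAA号,BBB号, priority 1
data_date_range = # date range of data to print, e.g. 2018-09-01:2018-12-01, priority 2
data_last_lines = 10 # number of trailing rows to print, priority 3
data_last_lines = 3 # number of trailing rows to print, priority 3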
```
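
As a rough illustration of how a file like this can be read, the sketch below parses a small sample with `configparser`, using the same `allow_no_value` and `inline_comment_prefixes` options that appear in `configure.py`. The `pick_filter` helper and the way it resolves priorities 1 to 3 are assumptions made for this example only; the project's actual selection logic is not shown in this diff.

```python
import configparser

SAMPLE = """\
[config]
data_xlsx = data_main.xlsx # data template path
flag_to_postal = 1 # whether to print postal notes
data_case_codes = # case numbers to print, priority 1
data_date_range = 2018-09-01:2018-12-01 # date range, priority 2
data_last_lines = 3 # number of trailing rows, priority 3
"""


def load(text):
    """Parse the config text; inline '#' and ';' comments are stripped."""
    cfg = configparser.ConfigParser(allow_no_value=True,
                                    inline_comment_prefixes=('#', ';'))
    cfg.read_string(text)
    return dict(cfg['config'])


def pick_filter(conf):
    """Hypothetical helper: choose the first non-empty selector by priority."""
    if conf['data_case_codes']:                      # priority 1
        return ('case_codes', conf['data_case_codes'].split(','))
    if conf['data_date_range']:                      # priority 2
        start, end = conf['data_date_range'].split(':')
        return ('date_range', (start, end))
    return ('last_lines', int(conf['data_last_lines']))  # priority 3


conf = load(SAMPLE)
selector = pick_filter(conf)
```

Because the comment after `data_case_codes =` is stripped, that option reads back as an empty string, so under the assumed priority order the date range is the selector that applies here.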

<a id="详细指南"></a>
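## Detailed Guide

Abbreviations:
- [Sheet A: data_oa.xlsx, the OA sheet; download your own, this one is only a reference](./demo_docs/data_oa.xlsx)
- [Sheet B: data_main.xlsx, generated automatically, but it also needs editing](./demo_docs/data_main.xlsx)
- [Directory C: jdocs/, the judgment-document directory; put the downloaded judgment documents here](./demo_docs/jdocs/)
- [Document D: sheet.docx, the postal note template; postal notes are generated over its background](./demo_docs/sheet.docx)
- [Sheet A: data_oa.xlsx, the OA sheet; download your own, this one is only a reference](./demo_docs/data_oa.xlsx)
- [Sheet B: data_main.xlsx, generated automatically, but it also needs editing](./demo_docs/data_main.xlsx)
- [Directory C: jdocs/, the judgment-document directory; put the downloaded judgment documents here](./demo_docs/jdocs/)
- [Document D: sheet.docx, the postal note template; postal notes are generated over its background](./demo_docs/sheet.docx)
- [Directory E: postal/, the postal note directory](./demo_docs/postal/)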

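1. Following the format of **Sheet A**, prepare your own OA sheet (the program is useless without data). First edit 【适用程序】 (for series cases) in the OA sheet, then edit the conf.txt file, referring to [Rules](#规则); if the file is missing, running the program again regenerates it
@@ -147,9 +123,8 @@ data_last_lines = 10 # number of trailing rows to print, priority 3
5. Second run (with 【诉讼代理人】, the litigation agent)
This repeats steps 3.4, 3.5 and 3.6.

6. Beginners without a Python environment can download the latest exe build directly, [win7/win10 (32/64)](https://github.com/autolordz/docx-content-modify/releases/download/1.0.1/exe-win7win10-8962f68c.zip); the config file still needs to be set up
6. Beginners without a Python environment can download the latest exe build directly; configure the conf.txt file before use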

<a id="licence"></a>
## Licence

[See Licence](https://github.com/autolordz/docx-content-modify/blob/master/LICENSE)
56 changes: 56 additions & 0 deletions configure.py
@@ -0,0 +1,56 @@
# -*- coding: utf-8 -*-
"""
Created on Wed Sep 11 11:41:46 2019
@author: autol
"""

import configparser

#%% config and default values

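def write_config(cfgfile):
    cfg = configparser.ConfigParser(allow_no_value=1,
                                    inline_comment_prefixes=('#', ';'))

    # Each value carries an inline comment that is written to the file
    # and stripped again when the file is read back.
    cfg['config'] = dict(
        data_xlsx = 'data_main.xlsx # data template path',
        data_oa_xlsx = 'data_oa.xlsx # OA data path',
        sheet_docx = 'sheet.docx # postal note template path',
        flag_fill_jdocs_infos = '1 # whether to fill in addresses from judgment documents',
        flag_append_oa = '1 # whether to import OA data',
        flag_to_postal = '1 # whether to print postal notes',
        flag_check_jdocs = '0 # whether to check the user format and print hints',
        flag_check_postal = '0 # whether to check the postal note format and print hints',
        data_case_codes = ' # case numbers to print, several allowed, e.g. AAA,BBB, priority 1',
        data_date_range = ' # date range of data to print, e.g. 2018-09-01:2018-12-01, priority 2',
        data_last_lines = '3 # number of trailing rows to print, priority 3',
        )

    with open(cfgfile, 'w',encoding='utf-8-sig') as configfile:
        cfg.write(configfile)
    print('>>> 重新生成配置 %s ...'%cfgfile)
    return cfg['config']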


#%%
def read_config(cfgfile):
    cfg = configparser.ConfigParser(allow_no_value=True,
                                    inline_comment_prefixes=('#', ';'))
    cfg.read(cfgfile,encoding='utf-8-sig')
    ret = dict(
        data_xlsx = cfg['config']['data_xlsx'],
        data_oa_xlsx = cfg['config']['data_oa_xlsx'],
        sheet_docx = cfg['config']['sheet_docx'],
        data_case_codes = cfg['config']['data_case_codes'],
        data_date_range = cfg['config']['data_date_range'],
        data_last_lines = cfg['config']['data_last_lines'],
        flag_fill_jdocs_infos = int(cfg['config']['flag_fill_jdocs_infos']),
        flag_append_oa = int(cfg['config']['flag_append_oa']),
        flag_to_postal = int(cfg['config']['flag_to_postal']),
        flag_check_jdocs = int(cfg['config']['flag_check_jdocs']),
        flag_check_postal = int(cfg['config']['flag_check_postal']),
        )
    return ret
    # return dict(cfg.items('config'))

133 changes: 133 additions & 0 deletions copyinfos.py
@@ -0,0 +1,133 @@
# -*- coding: utf-8 -*-
"""
Created on Wed Sep 11 12:08:08 2019
@author: autol
"""


#%%
import re
from collections import Counter
from util import split_list,user_to_list,save_adjust_xlsx
from globalvar import *

#%%

def copy_users_compare(jrow,df,errs=list(' ')):
    '''Copy users and check user completeness.
    errs=['【OA无用户记录】','【用户错别字】','【字段重复】','【系列案】']
    (no OA user record, user-name typo, duplicated field, series case)
    Comparison rules:
    no overlap: no user record in OA
    duplicated fields: print the duplicated content
    a ratio decides whether a user-name typo is suspected; if undecidable, output normally
    judgment document has more people than the current case: treated as a series case
    judgment document has fewer people than the current case: the current case lacks some addresses
    '''

    code0 = str(df['案号']).strip()
    code1 = str(df['原一审案号']).strip()
    jcode = str(jrow['判决书源号']).strip()
    x = Counter(user_to_list(df['当事人'])) # current case
    y = Counter(list(jrow['new_adr'].keys())) # judgment document
    rxy = len(list((x&y).elements()))/len(list((x|y).elements()))
    rxyx = len(list((x&y).elements()))/len(list(x.elements()))
    rxyy = len(list((x&y).elements()))/len(list(y.elements()))
    # print('x=',x);print('y=',y);print('rxy=',rxy)
    # print('rxyx=',rxyx);print('rxyy=',rxyy)
    if rxy == 0: # disjoint, completely unrelated
        return errs[0]
    if max(x.values()) > 1 or max(y.values()) > 1: # duplicated fields
        xdu = [k for k,v in x.items() if v > 1] # duplicated entries
        ydu = [k for k,v in y.items() if v > 1]
        print_log('>>> %s 用户有字段重复【%s】-【案件:%s】 vs 【判决书:%s】'
                  %(code0,'{0:.0%}'.format(rxy),xdu,ydu))
        return errs[2]
    if rxy == 1: # exact match
        return df['当事人']
    if 0 < rxy < 1: # possible typos
        dx = list((x-y).elements())
        dy = list((y-x).elements())
        xx = Counter(''.join(dx))
        yy = Counter(''.join(dy))
        rxxyy = len(list(xx&yy.keys()))/len(list(xx|yy.keys()))
        # print('rxxyy=',rxxyy)
        if rxxyy >= .6:
            print_log('>>> %s 认为【错别字率 %s】->【案件:%s vs 判决书:%s】'
                      %(code0,'{0:.0%}'.format(1-rxxyy),dx,dy))
            return errs[1]
        elif rxxyy >= .2:
            print_log('>>> %s 认为【不好判断当正常处理【差异率 %s】vs【相同范围:%s】->【差异范围:案件:%s vs 判决书:%s】 '
                      %(code0,'{0:.0%}'.format(1-rxxyy),
                        list((x&y).elements()),
                        dx,dy))
            return df['当事人']
        if rxyx > .8:
            print_log('>>> %s 案件 %s人 < 判决书 %s人'%(code0,len(x),len(y)))
            if jcode != code1: # series case
                print_log('>>> %s 认为【系列案,判决书人员 %s 多出地址】'%(code0,list((y-x).elements())))
                return errs[3]
            else:
                return df['当事人']
        elif rxyy > .8:
            print_log('>>> %s 案件 %s人 > 判决书 %s人'%(code0,len(x),len(y)))
            print_log('>>> %s 认为【当前案件人员 %s 缺地址】'%(code0,list((x-y).elements())))
            return df['当事人']
    return errs[0]


def copy_rows_adr1(x,n_adr):
    ''' copy jdocs address to address column
    Format: ['当事人','诉讼代理人','地址','new_adr','案号']
    Also excludes entries for users who already have an agent
    '''
    user = x['当事人'];agent = x['诉讼代理人'];adr = x['地址']; codes = x['案号']
    if not isinstance(n_adr,dict):
        return adr
    else:
        y = split_list(r'[,,]',adr)
        adr1 = y.copy()
        for i,k in enumerate(n_adr):
            by_agent = any([k in ag for ag in re.findall(r'[\w+、]*\/[\w+]*',agent)]) # match the agent format 'XX、XX/XX_123123'
            if by_agent and k in adr: # drop the user's address when the user has an agent
                y = list(filter(lambda x:not k in x,y))
            if type(n_adr) == dict and not k in adr and k in user and not by_agent:
                y += [k+adr_tag+n_adr.get(k)] # append the address in the output format
        adr2 = y.copy()
        adr = ','.join(list(filter(None, y)))
        if Counter(adr1) != Counter(adr2) and adr and flag_check_jdocs:
            print_log('>>> 【%s】成功复制判决书地址=>【%s】'%(codes,adr))
        return adr

address_tmp_xlsx = 'address_tmp.xlsx'

def copy_rows_user_func(dfj,dfo):

    '''Copy addresses from judgment-document rows (dfj) into the matching OA rows (dfo).'''
    errs = ['【OA无用户记录】','【用户错别字】','【字段重复】','【系列案】']

    dfo['判决书源号'] = ''

    def find_source():
        # Uses i, dfor, n_adr, code0, code1 and jcode from the enclosing loops.
        print_log('\n>>> 判决书信息 | 案号=%s | 源号=%s | 判决书源号=%s'%(code0,code1,jcode))
        dfo.loc[i,'地址'] = copy_rows_adr1(dfor,n_adr)
        dfo.loc[i,'判决书源号'] = jcode

    for (i,dfor) in dfo.iterrows():
        for (j,dfjr) in dfj.iterrows():
            code0 = str(dfor['案号']).strip()
            code1 = str(dfor['原一审案号']).strip()
            jcode = str(dfjr['判决书源号']).strip()
            n_adr = dfjr['new_adr']
            if isinstance(n_adr,dict):
                if not n_adr:continue # failed to extract jdocs fields
                if code1 == jcode: # same case number: match found
                    find_source() ; break
                else: #[::-1] # no matching case number
                    tag1 = copy_users_compare(dfjr,dfor,errs)
                    if tag1 not in errs:
                        find_source() ; break
                    else: pass
    dfj = dfj.fillna('')
    save_adjust_xlsx(dfj,address_tmp_xlsx,textfit=('判决书源号','new_adr')) # save the temporarily extracted info
    return dfo
13 changes: 13 additions & 0 deletions demo_docs/conf.txt
@@ -0,0 +1,13 @@
[config]
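data_xlsx = data_main.xlsx # data template path
data_oa_xlsx = data_oa.xlsx # OA data path
sheet_docx = sheet.docx # postal note template path
flag_fill_jdocs_infos = 1 # whether to fill in addresses from judgment documents
flag_append_oa = 1 # whether to import OA data
flag_to_postal = 1 # whether to print postal notes
flag_check_jdocs = 0 # whether to check the user format and print hints
flag_check_postal = 0 # whether to check the postal note format and print hints
data_case_codes = # case numbers to print, several allowed, e.g. AAA,BBB, priority 1
data_date_range = # date range of data to print, e.g. 2018-09-01:2018-12-01, priority 2
data_last_lines = 3 # number of trailing rows to print, priority 3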
