<format>: Add grapheme clusterization support for width computation #2119

Merged Apr 27, 2022 (71 commits)
Changes from 70 commits
23c009a
Add generator for unicode static data and add static data to <format>
barcharcraz Jul 14, 2021
ac723c9
work on utf-8 conversion function that can deal with invalid unicode
barcharcraz Jul 15, 2021
5fc54a9
work on utf8 error handling decoder.
barcharcraz Jul 17, 2021
7dc4b9a
utf-8 decoder works
barcharcraz Jul 19, 2021
8a3f277
add _Unicode_codepoint_iterator, to be wrapped by the grapheme cluste…
barcharcraz Jul 22, 2021
697ca14
unicode iterator actually works
barcharcraz Jul 23, 2021
87c02c8
in progress break iterator op++
barcharcraz Jul 23, 2021
6108f9c
add grapemem test data gen and code generator
barcharcraz Aug 4, 2021
c883233
minor comment revisions.
barcharcraz Jul 27, 2021
79c6e36
clusterization tests pass
barcharcraz Aug 9, 2021
a87dd27
start work on porting c++ data generator to python.
barcharcraz Aug 9, 2021
98033c5
add python data generator
barcharcraz Aug 11, 2021
ed69f8a
remove the c++ data generator.
barcharcraz Aug 12, 2021
a1b3a2d
small comment correction
barcharcraz Aug 12, 2021
348e217
add grapheme clusterization
barcharcraz Aug 13, 2021
3ee49f4
add license header
barcharcraz Aug 13, 2021
416866f
constexpr gb11 regex.
barcharcraz Aug 13, 2021
8726535
line length in data generator.
barcharcraz Aug 13, 2021
a3a6213
teach validate to ignore unicode data files.
barcharcraz Aug 16, 2021
5ee59cb
use lower bound instead of upper bound, as lower bound is in xutil.
barcharcraz Aug 18, 2021
a23093a
wave 1 of review suggestions.
barcharcraz Aug 25, 2021
d9aaf62
fix typos in the test (sorry CI)
barcharcraz Aug 27, 2021
daa566f
more code review comments
barcharcraz Aug 30, 2021
da303e3
don't parenthesize *this
barcharcraz Aug 30, 2021
52a97cc
simplify _Unicode_codepoint_iterator::operator==
barcharcraz Aug 30, 2021
02003b4
Don't _STD qualify call to _Decode_utf
barcharcraz Aug 30, 2021
0814e86
use private: explicitly at the start of _GB11_LeftHand_regex
barcharcraz Aug 30, 2021
f4bf037
use capital hexits
barcharcraz Aug 30, 2021
090a356
spell "number" correctly
barcharcraz Aug 30, 2021
99ba0fe
newlines between if statements
barcharcraz Aug 30, 2021
fb79192
don't hyphenate codepoint
barcharcraz Aug 30, 2021
f00fb33
generated type name capitalization changes
barcharcraz Aug 30, 2021
9067d46
More capitalization changes
barcharcraz Aug 30, 2021
53f4170
More capitalization changes
barcharcraz Aug 30, 2021
4804180
use a foreach loop in test
barcharcraz Aug 31, 2021
d8b1ed4
comment consistency, fix two byte utf-8 decode (+ test)
barcharcraz Sep 14, 2021
528e61e
address review comments and add many utf decoding tests.
barcharcraz Sep 15, 2021
b1c8c83
add generated code comment
barcharcraz Sep 15, 2021
c9d9521
nodiscard and noexceptify the new iterators
barcharcraz Sep 23, 2021
e942ee7
apply more code review changes
barcharcraz Sep 23, 2021
5564063
comment changes
barcharcraz Sep 23, 2021
65c6cb5
_STL_INTERNAL_CHECK wants one argument.
barcharcraz Sep 23, 2021
c0826f9
tests pass
barcharcraz Sep 25, 2021
ae4a9bb
no need to keep track of RIs across EGC boundaries
barcharcraz Sep 25, 2021
86bd7b7
Merge remote-tracking branch 'upstream/main' into format_uax29
barcharcraz Jan 15, 2022
21ec105
Make decoding functions constexpr
barcharcraz Jan 18, 2022
2747743
Merge remote-tracking branch 'upstream/main' into format_uax29_inprog
barcharcraz Jan 19, 2022
c7e6782
add constexpr to a few more decoding functions
barcharcraz Jan 19, 2022
64d699f
Merge remote-tracking branch 'upstream/main' into format_uax29
barcharcraz Jan 27, 2022
a1a9b25
fix clang-format
barcharcraz Jan 27, 2022
06e3018
address some code review comments
barcharcraz Jan 28, 2022
564bded
remove iterator concept and linebreaks in generators
barcharcraz Jan 29, 2022
300c03e
remove linebreaks in generators
barcharcraz Jan 29, 2022
c5b691b
fix tests
barcharcraz Jan 29, 2022
db10cec
typename -> class
barcharcraz Jan 29, 2022
d944035
iter_value_t
barcharcraz Jan 29, 2022
5b3db99
don't use implicit private
barcharcraz Jan 29, 2022
2895204
remove unicode data files from the repo.
barcharcraz Feb 7, 2022
580baec
remove skipping of unicode data from validate.cpp, add copyright head…
barcharcraz Mar 24, 2022
bb25a08
correct clang-format being a little too enthusiastic
barcharcraz Mar 26, 2022
9e1bd64
split format generated data out from <format>
barcharcraz Mar 29, 2022
eb4f585
change a copyright symbol to (c) in the unicode copyright banner
barcharcraz Mar 29, 2022
8851217
Merge branch 'main' into format_uax29
barcharcraz Mar 30, 2022
da64f7a
use _Get_fmt_codec() in the legacy width measureing iterator
barcharcraz Mar 31, 2022
eca7de6
use the cached _Fmt_codec in the legacy iterator.
barcharcraz Mar 31, 2022
572b809
remove bad unicode characters from comments and break some long lines.
barcharcraz Mar 31, 2022
5a3c30c
disable clang-format for the _entire_ .gitignore file.
barcharcraz Mar 31, 2022
11b726f
Merge branch 'main' into format_uax29
StephanTLavavej Apr 22, 2022
0516c26
Teach parallelize.cpp to skip dotfiles.
StephanTLavavej Apr 22, 2022
1c4217e
Code review feedback.
StephanTLavavej Apr 22, 2022
48a2e10
Code review feedback from Nicole and Charlie.
StephanTLavavej Apr 23, 2022
49 changes: 49 additions & 0 deletions NOTICE.txt
@@ -167,3 +167,52 @@ In addition, certain files include the notices provided below.
// of this Software are embedded into a machine-executable object form of such
// source code, you may redistribute such embedded portions in such object form
// without including the above copyright and permission notices.

----------------------

// UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE
//
// See Terms of Use <https://www.unicode.org/copyright.html>
// for definitions of Unicode Inc.'s Data Files and Software.
//
// NOTICE TO USER: Carefully read the following legal agreement.
// BY DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S
// DATA FILES ("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"),
// YOU UNEQUIVOCALLY ACCEPT, AND AGREE TO BE BOUND BY, ALL OF THE
// TERMS AND CONDITIONS OF THIS AGREEMENT.
// IF YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE
// THE DATA FILES OR SOFTWARE.
//
// COPYRIGHT AND PERMISSION NOTICE
//
// Copyright (c) 1991-2022 Unicode, Inc. All rights reserved.
// Distributed under the Terms of Use in https://www.unicode.org/copyright.html.
//
// Permission is hereby granted, free of charge, to any person obtaining
// a copy of the Unicode data files and any associated documentation
// (the "Data Files") or Unicode software and any associated documentation
// (the "Software") to deal in the Data Files or Software
// without restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, and/or sell copies of
// the Data Files or Software, and to permit persons to whom the Data Files
// or Software are furnished to do so, provided that either
// (a) this copyright and permission notice appear with all copies
// of the Data Files or Software, or
// (b) this copyright and permission notice appear in associated
// Documentation.
//
// THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF
// ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
// WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT OF THIRD PARTY RIGHTS.
// IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS
// NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
// DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
// DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
// TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
// PERFORMANCE OF THE DATA FILES OR SOFTWARE.
//
// Except as contained in this notice, the name of a copyright holder
// shall not be used in advertising or otherwise to promote the sale,
// use or other dealings in these Data Files or Software without prior
// written authorization of the copyright holder.
1 change: 1 addition & 0 deletions stl/CMakeLists.txt
@@ -4,6 +4,7 @@
set(HEADERS
${CMAKE_CURRENT_LIST_DIR}/inc/__msvc_all_public_headers.hpp
${CMAKE_CURRENT_LIST_DIR}/inc/__msvc_chrono.hpp
${CMAKE_CURRENT_LIST_DIR}/inc/__msvc_format_ucd_tables.hpp
${CMAKE_CURRENT_LIST_DIR}/inc/__msvc_int128.hpp
${CMAKE_CURRENT_LIST_DIR}/inc/__msvc_system_error_abi.hpp
${CMAKE_CURRENT_LIST_DIR}/inc/__msvc_tzdb.hpp
425 changes: 425 additions & 0 deletions stl/inc/__msvc_format_ucd_tables.hpp

Large diffs are not rendered by default.

32 changes: 0 additions & 32 deletions stl/inc/algorithm
@@ -6173,39 +6173,7 @@ namespace ranges {
};

inline constexpr _Lower_bound_fn lower_bound{_Not_quite_object::_Construct_tag{}};
} // namespace ranges
#endif // __cpp_lib_concepts

template <class _FwdIt, class _Ty, class _Pr>
_NODISCARD _CONSTEXPR20 _FwdIt upper_bound(_FwdIt _First, _FwdIt _Last, const _Ty& _Val, _Pr _Pred) {
// find first element that _Val is before
_Adl_verify_range(_First, _Last);
auto _UFirst = _Get_unwrapped(_First);
_Iter_diff_t<_FwdIt> _Count = _STD distance(_UFirst, _Get_unwrapped(_Last));

while (0 < _Count) { // divide and conquer, find half that contains answer
_Iter_diff_t<_FwdIt> _Count2 = _Count / 2;
const auto _UMid = _STD next(_UFirst, _Count2);
if (_Pred(_Val, *_UMid)) {
_Count = _Count2;
} else { // try top half
_UFirst = _Next_iter(_UMid);
_Count -= _Count2 + 1;
}
}

_Seek_wrapped(_First, _UFirst);
return _First;
}

template <class _FwdIt, class _Ty>
_NODISCARD _CONSTEXPR20 _FwdIt upper_bound(_FwdIt _First, _FwdIt _Last, const _Ty& _Val) {
// find first element that _Val is before
return _STD upper_bound(_First, _Last, _Val, less<>{});
}

#ifdef __cpp_lib_concepts
namespace ranges {
template <class _It, class _Ty, class _Pr, class _Pj>
_NODISCARD constexpr _It _Upper_bound_unchecked(
_It _First, iter_difference_t<_It> _Count, const _Ty& _Val, _Pr _Pred, _Pj _Proj) {
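
The `<algorithm>` hunk above removes the iterator-based `upper_bound` overloads from that header; where they end up is not visible in this excerpt. For readers unfamiliar with why a binary search is relevant to Unicode property data, here is a minimal sketch of a range-table lookup built on the public `std::upper_bound`. The `CodePointRange` type, the `kSampleRanges` table, and `in_sample_ranges` are hypothetical names made up for illustration. They are not taken from `__msvc_format_ucd_tables.hpp`, and the real generated tables may be laid out quite differently.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical table entry: a closed range [first, last] of code points.
struct CodePointRange {
    std::uint32_t first;
    std::uint32_t last;
};

// Illustrative data only (NOT the generated tables from this PR).
// The ranges are sorted by `first` and do not overlap.
constexpr std::array<CodePointRange, 3> kSampleRanges{{
    {0x0300, 0x036F},   // Combining Diacritical Marks
    {0x1F1E6, 0x1F1FF}, // Regional Indicator symbols
    {0xE0100, 0xE01EF}, // Variation Selectors Supplement
}};

// Returns true if `cp` lies inside one of the ranges in kSampleRanges.
inline bool in_sample_ranges(std::uint32_t cp) {
    // upper_bound yields the first range whose `first` is greater than cp,
    // so the only candidate that can contain cp is the range just before it.
    const auto it = std::upper_bound(kSampleRanges.begin(), kSampleRanges.end(), cp,
        [](std::uint32_t value, const CodePointRange& range) { return value < range.first; });

    if (it == kSampleRanges.begin()) {
        return false; // cp is below the first range
    }

    return cp <= (it - 1)->last;
}
```

Under these assumptions, `in_sample_ranges(0x0301)` is true and `in_sample_ranges(0x0041)` is false. Each query costs O(log n) comparisons, which is the usual reason to keep property data as sorted ranges rather than one entry per code point.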