| author | Pierre Letouzey | 2016-05-19 15:18:26 +0200 |
|---|---|---|
| committer | Pierre Letouzey | 2016-05-19 15:18:26 +0200 |
| commit | 244d7a9aafe7ad613dd2095ca3126560cb3ea1d0 | |
| tree | 26102e433f0072ab32f724fa231693510119c37b /lib/unicode.mli | |
| parent | c14e6eebc6c3696623a440cd7eaa4a8d8fe4f492 | |
Unicode.ascii_of_ident is now truly injective
A non-ASCII character is now converted to _UUxxxx_, where xxxx is its Unicode
index in hexadecimal, and any preexisting _UU substring in the identifier is
converted to _UUU.
The switch from __Uxxxx_ to _UUxxxx_ is cosmetic: it just helps the extraction
(fewer __ in names). The other part of the patch (detection of preexisting
_UU substrings) is critical to make ascii_of_ident truly injective and to avoid
the following kind of proof of False via native_compute:
Definition α := 1.
Definition __U03b1_ := 2.
Lemma oups : False.
Proof.
assert (α = __U03b1_). { native_compute. reflexivity. }
discriminate.
Qed.
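The encoding described in the commit message can be sketched as follows. This is only an illustration and not the code from lib/unicode.ml: the names has_uu and ascii_of_ident_sketch are hypothetical, UTF-8 decoding is delegated to String.get_utf_8_uchar (which assumes OCaml 4.14 or later), and malformed input is rejected with Invalid_argument purely as a choice for this example.

```ocaml
(* Illustrative sketch only; NOT the implementation from lib/unicode.ml.
   Assumes OCaml >= 4.14 for String.get_utf_8_uchar. *)

(* [has_uu s i] is true when the substring "_UU" starts at byte offset [i]. *)
let has_uu s i =
  i + 2 < String.length s && s.[i] = '_' && s.[i + 1] = 'U' && s.[i + 2] = 'U'

(* Hypothetical re-creation of the scheme from the commit message:
   - a preexisting "_UU" is escaped to "_UUU";
   - a non-ASCII character becomes "_UUxxxx_" (at least four hex digits);
   - every other ASCII byte is copied unchanged. *)
let ascii_of_ident_sketch (s : string) : string =
  let len = String.length s in
  let out = Buffer.create (2 * len) in
  let rec go i =
    if i < len then
      if has_uu s i then begin
        Buffer.add_string out "_UUU";
        go (i + 3)
      end else if Char.code s.[i] < 128 then begin
        Buffer.add_char out s.[i];
        go (i + 1)
      end else begin
        let d = String.get_utf_8_uchar s i in
        if not (Uchar.utf_decode_is_valid d) then
          invalid_arg "ascii_of_ident_sketch: malformed UTF-8";
        Printf.bprintf out "_UU%04x_" (Uchar.to_int (Uchar.utf_decode_uchar d));
        go (i + Uchar.utf_decode_length d)
      end
  in
  go 0;
  Buffer.contents out

let () =
  (* "\xce\xb1" is the UTF-8 encoding of the Greek letter alpha (U+03B1). *)
  assert (ascii_of_ident_sketch "\xce\xb1" = "_UU03b1_");
  (* An identifier that already looks like an encoding is escaped, so it
     cannot collide with the encoding of alpha. *)
  assert (ascii_of_ident_sketch "_UU03b1_" = "_UUU03b1_")
```

Under this sketch the two identifiers from the example above map to different ASCII names: α becomes _UU03b1_, while __U03b1_ contains no _UU substring and is left unchanged.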
Diffstat (limited to 'lib/unicode.mli')
| -rw-r--r-- | lib/unicode.mli | 13 |
1 files changed, 7 insertions, 6 deletions
diff --git a/lib/unicode.mli b/lib/unicode.mli
index b8a11e2945..aaf455dec5 100644
--- a/lib/unicode.mli
+++ b/lib/unicode.mli
@@ -27,14 +27,15 @@ val ident_refutation : string -> (bool * string) option
     @raise Assert_failure if the input string is empty. *)
 val lowercase_first_char : string -> string
 
-(** Return [true] if all UTF-8 characters in the input string are just plain ASCII characters.
-    Returns [false] otherwise. *)
+(** Return [true] if all UTF-8 characters in the input string are just plain
+    ASCII characters. Returns [false] otherwise. *)
 val is_basic_ascii : string -> bool
 
-(** [ascii_of_ident s] maps UTF-8 string to a string composed solely from ASCII characters.
-    Those UTF-8 characters which do not have their ASCII counterparts are
-    translated to ["__Uxxxx_"] where {i xxxx} are four hexadecimal digits.
-    @raise Unsupported if the input string contains unsupported UTF-8 characters. *)
+(** [ascii_of_ident s] maps UTF-8 string to a string composed solely from ASCII
+    characters. The non-ASCII characters are translated to ["_UUxxxx_"] where
+    {i xxxx} is the Unicode index of the character in hexadecimal (from four
+    to six hex digits). To avoid potential name clashes, any preexisting
+    substring ["_UU"] is turned into ["_UUU"]. *)
 val ascii_of_ident : string -> string
 
 (** Validate an UTF-8 string *)
