author Pierre Letouzey 2016-05-19 15:18:26 +0200
committer Pierre Letouzey 2016-05-19 15:39:03 +0200
commit d2f9a457d0bb2fd11ac7d5f6587174a79ca9c4b6
tree 52e2fcfa652b460399777769f8840539e6c5d202 /lib/unicode.mli
parent 9b2beca375e1b3fd8f1298ee13656124fe24e807
Unicode.ascii_of_ident is now truly injective
A non-ASCII char is now converted to _UUxxxx_, with xxxx being its Unicode index in hexadecimal. Any preexisting _UU substring in the ident is converted to _UUU.

The switch from __Uxxxx_ to _UUxxxx_ is cosmetic: it just helps the extraction (fewer __ in names). But the other part of the patch (detection of preexisting _UU substrings) is critical to make ascii_of_ident truly injective and to avoid the following kind of proof of False via native_compute:

    Definition α := 1.
    Definition __U03b1_ := 2.

    Lemma oups : False.
    Proof.
      assert (α = __U03b1_). { native_compute. reflexivity. }
      discriminate.
    Qed.

Conflicts:
    lib/unicode.mli
Diffstat (limited to 'lib/unicode.mli')
-rw-r--r-- lib/unicode.mli 9
1 files changed, 8 insertions, 1 deletions
diff --git a/lib/unicode.mli b/lib/unicode.mli
index 65e75a20d6..00211164fb 100644
--- a/lib/unicode.mli
+++ b/lib/unicode.mli
@@ -23,8 +23,15 @@ val ident_refutation : string -> (bool * string) option
(** First char of a string, converted to lowercase *)
val lowercase_first_char : string -> string
-(** For extraction, turn a unicode string into an ascii-only one *)
+(** Return [true] if all UTF-8 characters in the input string are just plain
+ ASCII characters. Returns [false] otherwise. *)
val is_basic_ascii : string -> bool
+
+(** [ascii_of_ident s] maps UTF-8 string to a string composed solely from ASCII
+ characters. The non-ASCII characters are translated to ["_UUxxxx_"] where
+ {i xxxx} is the Unicode index of the character in hexadecimal (from four
+ to six hex digits). To avoid potential name clashes, any preexisting
+ substring ["_UU"] is turned into ["_UUU"]. *)
val ascii_of_ident : string -> string
(** Validate an UTF-8 string *)
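
The encoding documented for [ascii_of_ident] can be illustrated with a small standalone sketch. This is not the code from lib/unicode.ml: the names decode_utf8, starts_with_at and ascii_of_ident_sketch are hypothetical, the decoder assumes its input is already valid UTF-8, and the lowercase hexadecimal output is an assumption based on the __U03b1_ example in the commit message.

```ocaml
(* Illustrative sketch only; not the implementation in lib/unicode.ml. *)

(* Decode the UTF-8 character starting at byte [i] of [s].
   Returns (code point, number of bytes consumed).
   Assumption: [s] is valid UTF-8, so no validation is performed. *)
let decode_utf8 s i =
  let byte k = Char.code s.[i + k] in
  let cont k = byte k land 0x3F in
  let b0 = byte 0 in
  if b0 < 0x80 then (b0, 1)
  else if b0 < 0xE0 then (((b0 land 0x1F) lsl 6) lor (cont 1), 2)
  else if b0 < 0xF0 then
    (((b0 land 0x0F) lsl 12) lor ((cont 1) lsl 6) lor (cont 2), 3)
  else
    (((b0 land 0x07) lsl 18) lor ((cont 1) lsl 12)
     lor ((cont 2) lsl 6) lor (cont 3), 4)

(* [true] if [prefix] occurs in [s] at byte offset [i]. *)
let starts_with_at s i prefix =
  let n = String.length prefix in
  i + n <= String.length s && String.sub s i n = prefix

(* Map a UTF-8 identifier to an ASCII-only string:
   - a preexisting "_UU" becomes "_UUU" (the escape of the escape prefix);
   - a non-ASCII character becomes "_UUxxxx_" (at least four hex digits);
   - any other ASCII character is copied unchanged. *)
let ascii_of_ident_sketch s =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec loop i =
    if i < len then
      if starts_with_at s i "_UU" then begin
        Buffer.add_string buf "_UUU";
        loop (i + 3)
      end else begin
        let (cp, n) = decode_utf8 s i in
        if cp < 0x80 then Buffer.add_char buf (Char.chr cp)
        else Buffer.add_string buf (Printf.sprintf "_UU%04x_" cp);
        loop (i + n)
      end
  in
  loop 0;
  Buffer.contents buf

let () =
  (* "\xce\xb1" is the UTF-8 encoding of α (U+03B1). *)
  print_endline (ascii_of_ident_sketch "\xce\xb1");  (* _UU03b1_ *)
  print_endline (ascii_of_ident_sketch "_UU03b1_")   (* _UUU03b1_ *)
```

Under this scheme the identifiers α and _UU03b1_ map to different ASCII names, which is the kind of clash the commit message describes for the old __Uxxxx_ encoding.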