fix: use utf8proc to detect emojis #29645

dundargoc · 2024-07-10T14:18:03Z

More accurately, it will check if the codepoint has the
"Extended_Pictographic" property, which according to
https://unicode.org/reports/tr51/#Emoji_Properties_and_Data_Files is
described as:

"The Extended_Pictographic characters contain all the Emoji characters
except for some Emoji_Component characters."

This should in most cases align with what people refer to when they say
"emoji".

dundargoc · 2024-07-10T14:19:24Z

I think this new solution should be superior but it wouldn't hurt if some emoji experts/connoisseurs tested this PR.

clason · 2024-07-10T14:35:42Z

How is this change breaking precisely?

dundargoc · 2024-07-10T15:29:41Z

> How is this change breaking precisely?

Previously, neovim counted a codepoint as an emoji if it had any of the properties "Emoji", "Emoji_Presentation", "Emoji_Modifier", "Emoji_Modifier_Base" or "Emoji_Component" in unicode.org/reports/tr51. In this PR, neovim counts a codepoint as an emoji if it has the "Extended_Pictographic" property.

I am not 100% sure what this entails in practice. Here's a list of changed codepoints. Note that this list is not exhaustive, as not all emojis in emoji-data.txt are explicitly listed.
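
To make the difference between the two rules concrete, here is a minimal Python sketch. It is not Neovim's or utf8proc's code: the `SAMPLE` table is a small hand-picked excerpt of property assignments from emoji-data.txt, and the helper names (`is_emoji_old`, `is_emoji_new`, `changed_codepoints`) are hypothetical. It only illustrates the two rules as described in the comment above.

```python
# Hand-picked excerpt of emoji-data.txt property assignments.
# Illustrative only; the real data file covers far more codepoints.
SAMPLE = {
    0x0023: {"Emoji", "Emoji_Component"},                          # number sign
    0x00A9: {"Emoji", "Extended_Pictographic"},                    # copyright sign
    0x200D: {"Emoji_Component"},                                   # zero width joiner
    0x1F000: {"Extended_Pictographic"},                            # mahjong tile east wind
    0x1F1E6: {"Emoji", "Emoji_Presentation", "Emoji_Component"},   # regional indicator A
    0x1F3FB: {"Emoji", "Emoji_Presentation", "Emoji_Modifier",
              "Emoji_Component"},                                  # light skin tone modifier
    0x1F600: {"Emoji", "Emoji_Presentation",
              "Extended_Pictographic"},                            # grinning face
}

# Properties that made a codepoint count as an emoji under the old rule.
OLD_PROPERTIES = {
    "Emoji",
    "Emoji_Presentation",
    "Emoji_Modifier",
    "Emoji_Modifier_Base",
    "Emoji_Component",
}


def is_emoji_old(cp: int) -> bool:
    """Old rule: the codepoint has at least one of the five emoji properties."""
    return bool(SAMPLE.get(cp, set()) & OLD_PROPERTIES)


def is_emoji_new(cp: int) -> bool:
    """New rule: the codepoint has the Extended_Pictographic property."""
    return "Extended_Pictographic" in SAMPLE.get(cp, set())


def changed_codepoints() -> list[int]:
    """Codepoints in the sample that the two rules classify differently."""
    return sorted(cp for cp in SAMPLE if is_emoji_old(cp) != is_emoji_new(cp))


if __name__ == "__main__":
    for cp in changed_codepoints():
        print(f"U+{cp:04X}: old={is_emoji_old(cp)} new={is_emoji_new(cp)}")
```

In this sample, the component-like codepoints (number sign, zero width joiner, regional indicator, skin tone modifier) stop counting as emoji under the new rule, while a pictograph that lacks the Emoji property (the mahjong tile) starts counting.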

clason · 2024-07-10T15:31:26Z

Could also count as "fix" (not breaking) if we declare utf8proc as the source of truth for this (which arguably we should)?

More accurately, it will check if the codepoint has the "Extended_Pictographic" property, which according to https://unicode.org/reports/tr51/#Emoji_Properties_and_Data_Files is described as: "The Extended_Pictographic characters contain all the Emoji characters except for some Emoji_Component characters." This should in most cases align with what people refer to when they say "emoji".

zeertzjq · 2024-07-11T01:50:49Z

It's strange to replace emoji_all without replacing emoji_wide.

dundargoc · 2024-08-10T10:54:06Z

So it turns out it's not possible to determine from the codepoint alone whether a character is an emoji, as some codepoints can be either text or emoji depending on the variation selector(?). Closing for the time being until we have a better solution in mind.
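
The point about variation selectors can be illustrated with a short Python sketch. It assumes the standard Unicode variation selectors U+FE0E (text presentation) and U+FE0F (emoji presentation); the function name `requested_presentation` is hypothetical and this is not how Neovim handles the case.

```python
VS15 = "\uFE0E"  # variation selector-15: requests text presentation
VS16 = "\uFE0F"  # variation selector-16: requests emoji presentation


def requested_presentation(sequence: str) -> str:
    """Return the presentation requested for the first character of `sequence`.

    Only the character immediately after the base codepoint is inspected:
    'text' for VS15, 'emoji' for VS16, otherwise 'default' (the renderer
    falls back to the codepoint's default presentation).
    """
    if len(sequence) >= 2:
        if sequence[1] == VS15:
            return "text"
        if sequence[1] == VS16:
            return "emoji"
    return "default"


# U+2764 HEAVY BLACK HEART is the same base codepoint in all three sequences.
heart = "\u2764"

if __name__ == "__main__":
    for seq in (heart, heart + VS15, heart + VS16):
        codepoints = " ".join(f"U+{ord(c):04X}" for c in seq)
        print(f"{codepoints}: {requested_presentation(seq)}")
```

The base codepoint is identical in every case, so a lookup keyed on the codepoint alone cannot tell the text form from the emoji form; the following character has to be taken into account.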

github-actions bot added the build and breaking-change labels Jul 10, 2024

dundargoc force-pushed the build/utf8proc/emoji branch from 198f051 to a6e5e5e Compare July 10, 2024 18:50

dundargoc changed the title ~~build!: use utf8proc to detect emojis~~ fix: use utf8proc to detect emojis Jul 10, 2024

dundargoc removed the breaking-change label Jul 10, 2024

dundargoc closed this Aug 10, 2024

dundargoc deleted the build/utf8proc/emoji branch August 10, 2024 12:13

justinmk added the unicode 💩 label Aug 11, 2024
