Module:affix
This module, accessed through Module:affix/templates, powers most of the morphology templates, including {{affix}}, {{prefix}}, {{suffix}}, {{confix}}, {{compound}}, {{blend}}, {{univerbation}} and various others, as well as {{prefixsee}} and {{suffixsee}}.
Exported functions
About different types of hyphens ( "template", "display" and "lookup" ):
- The "template hyphen" is the per-script hyphen character that is used in template calls to indicate that a term is an affix. This is always a single Unicode char, but there may be multiple possible hyphens for a given script. Normally this is just the regular hyphen character "-", but for some non-Latin-script languages (currently only right-to-left languages), it is different.
- The "display hyphen" is the string (which might be an empty string) that is added onto a term as displayed and linked, to indicate that a term is an affix. Currently this is always either the same as the template hyphen or an empty string, but the code below is written generally enough to handle arbitrary display hyphens. Specifically:
  - For East Asian languages, the display hyphen is always blank.
  - For Arabic-script languages, either tatweel (ـ) or ZWNJ (zero-width non-joiner) is allowed as a template hyphen, where ZWNJ is supported primarily for Farsi, because some suffixes have non-joining behavior. The display hyphen corresponding to tatweel is also tatweel, but the display hyphen corresponding to ZWNJ is blank (tatweel is also the default display hyphen, for calls to {{prefix}}/{{suffix}}/etc. that don't include an explicit hyphen).
- The "lookup hyphen" is the hyphen that is used when looking up language-specific affix mappings. (These mappings are discussed in more detail below when discussing link affixes.) It depends only on the script of the affix in question. Most scripts (including East Asian scripts) use a regular hyphen "-" as the lookup hyphen, but Hebrew and Arabic have their own lookup hyphens (respectively maqqef and tatweel). Note that for Arabic in particular, there are three possible template hyphens that are recognized (tatweel, ZWNJ and regular hyphen), but mappings must use tatweel.
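The relationship between the three kinds of hyphen can be sketched in a few lines of standalone Lua. This is only an illustration of the rules described above, not the module's actual code: the function names `display_hyphen` and `lookup_hyphen`, the helper `base_script` and the abbreviated list of hyphenless scripts are assumptions made for the example.

```lua
-- Hypothetical standalone sketch; names are illustrative, not the module's API.
local ZWNJ = "\226\128\140"  -- U+200C zero-width non-joiner (UTF-8 bytes)
local TATWEEL = "\217\128"   -- U+0640 Arabic tatweel
local MAQQEF = "\214\190"    -- U+05BE Hebrew maqqef

local lookup_hyphens = { Arab = TATWEEL, Hebr = MAQQEF }
-- Abbreviated, assumed list of scripts whose display hyphen is always blank.
local no_hyphen_scripts = { Hani = true, Jpan = true, Thai = true }

-- Strip a language prefix, so that "fa-Arab" is treated as "Arab".
local function base_script(code)
	return (code:gsub("^.*%-", ""))
end

-- Display hyphen for a script, given the template hyphen (nil if none was written).
local function display_hyphen(script_code, template_hyphen)
	local sc = base_script(script_code)
	if no_hyphen_scripts[sc] then
		return ""
	end
	if sc == "Arab" then
		if template_hyphen == nil then
			return TATWEEL -- default when no explicit hyphen is given
		elseif template_hyphen == ZWNJ then
			return "" -- ZWNJ displays as nothing
		end
		return template_hyphen
	end
	if template_hyphen == nil then
		return sc == "Hebr" and MAQQEF or "-"
	end
	return template_hyphen
end

-- Lookup hyphen: depends only on the script.
local function lookup_hyphen(script_code)
	return lookup_hyphens[base_script(script_code)] or "-"
end
```

Under these assumptions, an Arabic-script suffix written with ZWNJ displays with no hyphen at all, yet is still looked up in the mapping tables with tatweel.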
About different types of affixes ( "template", "display", "link", "lookup" and "category" ):
- A "template affix" is an affix in its source form as it appears in a template call. Generally, a template affix has an attached template hyphen (see above) to indicate that it is an affix and indicate what type of affix it is (prefix, suffix, interfix/infix or circumfix), but some of the older-style templates such as {{suffix}}, {{prefix}}, {{confix}}, etc. have "positional" affixes where the presence of the affix in a certain position (e.g. the second or third parameter) indicates that it is a certain type of affix, whether or not it has an attached template hyphen.
- A "display affix" is the corresponding affix as it is actually displayed to the user. The display affix may differ from the template affix for various reasons:
  - The display affix may be specified explicitly using the |altN= parameter, the <alt:...> inline modifier or a piped link of the form e.g. [[-kas|-käs]] (here indicating that the affix should display as -käs but be linked as -kas). Here, the template affix is arguably the entire piped link, while the display affix is -käs.
  - Even in the absence of |altN= parameters, <alt:...> inline modifiers and piped links, certain languages have differences between the "template hyphen" specified in the template (which always needs to be specified somehow or other in templates like {{affix}}, to indicate that the term is an affix and what type of affix it is) and the display hyphen (see above), with corresponding differences between template and display affixes.
- A (regular) "link affix" is the affix that is linked to when the affix is shown to the user. The link affix is usually the same as the display affix, but will differ in one of two circumstances:
  - The display and link affixes are explicitly made different using |altN= parameters, <alt:...> inline modifiers or piped links, as described above under "display affix".
  - For certain languages, certain affixes are mapped to canonical form using language-specific mappings. For example, in Finnish, the adjective-forming suffix -kas appears as -käs after front vowels, but logically both forms are the same suffix and should be linked and categorized the same. Similarly, in Latin, the negative and intensive prefixes spelled in- (etymologically two distinct prefixes) appear variously as il-, im- or ir- before certain consonants. Mappings are supplied in Module:affix/lang-data/LANGCODE to convert Finnish -käs to -kas for linking and categorization purposes. Note that the affixes in the mappings use "lookup hyphens" to indicate the different types of affixes, which is usually the same as the template hyphen but differs for Arabic scripts, because there are multiple possible template hyphens recognized but only one lookup hyphen (tatweel). The form of the affix as used to look up in the mapping tables is called the "lookup affix"; see below.
- A "stripped link affix" is a link affix that has been passed through the language's makeEntryName() function, which may strip certain diacritics: e.g. macrons in Latin and Old English (indicating length); acute and grave accents in Russian and various other Slavic languages (indicating stress); vowel diacritics in most Arabic-script languages; and also tatweel in some Arabic-script languages (currently, for example, Persian, Arabic and Urdu strip tatweel, but Ottoman Turkish does not). Stripped link affixes are currently what are used in category names.
- A "lookup affix" is the form of the affix as it is looked up in the language-specific lookup mappings described above under link affixes. There are actually two lookup stages:
  - First, the affix is looked up in a modified display form (specifically, the same as the display affix but using lookup hyphens). Note that this lookup does not occur if an explicit display form is given using |altN= or an <alt:...> inline modifier, or if the template affix contains a piped or embedded link.
  - If no entry is found, the affix is then looked up in a modified link form (specifically, the modified display form passed through the language's makeEntryName() function, which strips out certain diacritics, but with the lookup hyphen re-added if it was stripped out, as in the case of tatweel in many Arabic-script languages).
  The reason for this double lookup procedure is to allow for mappings that are sensitive to the extra diacritics, but also allow for mappings that are not sensitive in this fashion (e.g. Russian -ливый occurs both stressed and unstressed, but is the same suffix either way).
- A "category affix" is the affix as it appears in categories such as Category:Finnish terms suffixed with -kas. The category affix is currently always the same as the stripped link affix. This means that for Arabic-script languages, it may or may not have a tatweel, even if the corresponding display affix and regular link affix have a tatweel. As mentioned above, makeEntryName() strips tatweel for Arabic, Persian and Urdu, but not for Ottoman Turkish. Hence affix categories for Arabic, Persian and Urdu will be missing the tatweel, but affix categories for Ottoman Turkish will have it. An additional complication is that if the template affix contains a ZWNJ, the display (and hence the link and category affixes) will have no hyphen attached in any case.
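The two-stage lookup described above can be sketched as follows. This is a simplified, standalone illustration rather than the module's implementation: `map_affix`, `strip_macrons`, the `skip_display_lookup` flag and the sample mapping table are all hypothetical, and the macron-bearing form in the usage note is an invented input chosen only to exercise the second stage.

```lua
-- Hypothetical mapping table in the spirit of Module:affix/lang-data/LANGCODE.
local sample_mappings = {
	["-käs"] = "-kas", -- Finnish front-vowel variant
	["il-"] = "in-",   -- Latin assimilated variant
}

-- Stand-in for a language's makeEntryName(); this one strips only the macron on "ī".
local function strip_macrons(term)
	return (term:gsub("ī", "i"))
end

-- Returns the canonical affix if a mapping applies, else nil.
-- `lookup_form` is assumed to already carry lookup hyphens.
local function map_affix(lookup_form, mappings, make_entry_name, skip_display_lookup)
	-- Stage 1: try the display-like form, diacritics included.
	-- Skipped when an explicit display form (alt text, piped link) was given.
	if not skip_display_lookup then
		local hit = mappings[lookup_form]
		if hit then
			return hit
		end
	end
	-- Stage 2: try again after stripping diacritics.
	return mappings[make_entry_name(lookup_form)]
end
```

With this sketch, `map_affix("-käs", sample_mappings, strip_macrons)` is resolved in the first stage, while a form written with a macron such as `īl-` only matches in the second stage, after stripping.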
export.join_formatted_parts
function export.join_formatted_parts(data)
Join formatted parts (in parts_formatted) together with any overall |lit= spec (in lit) plus categories, which are formatted by prepending the language name as found in lang. The value of an entry in categories can be either a string (which is formatted using sort_key) or a table of the form {cat=category, sort_key=sort_key, sort_base=sort_base}, specifying the sort key and sort base to use when formatting the category. If nocat is given, no categories are added; otherwise, force_cat causes categories to be added even on userspace pages. In the current implementation, parts_formatted and categories are fields of data, while lang, lit, nocat, sort_key and force_cat are read from data.data.
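As a rough illustration of the two accepted shapes of a category entry, the following standalone sketch normalizes either shape into a single record. The helper `normalize_category` and its return format are assumptions made for this example; the module itself passes the values on to the category formatter in Module:utilities instead.

```lua
-- Hypothetical helper: turn a string or table category entry into one uniform record.
local function normalize_category(cat, lang_name, default_sort_key)
	if type(cat) == "table" then
		-- Table form carries its own sort key and sort base.
		return {
			name = lang_name .. " " .. cat.cat,
			sort_key = cat.sort_key,
			sort_base = cat.sort_base,
		}
	end
	-- String form falls back to the overall sort key.
	return {
		name = lang_name .. " " .. cat,
		sort_key = default_sort_key,
	}
end

local plain = normalize_category("terms suffixed with -kas", "Finnish", nil)
local keyed = normalize_category({ cat = "compound terms", sort_key = "abc" }, "Finnish", "zzz")
```

In both cases the language name is prepended to the category text; only the source of the sort key differs.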
export.concat_parts
function export.concat_parts(lang, parts_formatted, categories, nocat, sort_key, lit, force_cat)
Older entry point for calling join_formatted_parts(). FIXME: Convert callers.
export.link_term
function export.link_term(part, data)
Construct a single linked part based on the information in part, for use by show_affix() and other entry points. This should be called after canonicalize_part() is called on the part. This is a thin wrapper around full_link() in Module:links unless part.part_lang is specified (indicating that a part-specific language was given), in which case format_derived() in Module:etymology is called to display a term in a language other than the language of the overall term (specified in data.lang). data contains the entire object passed into the entry point and is used to access information for constructing the categories added by format_derived().
export.make_affix
function export.make_affix(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id)
Add a hyphen to a term in the appropriate place, based on the specified affix type, stripping off any existing hyphens in that place. For example, if affix_type == "prefix", we'll add a hyphen onto the end if it's not already there (or is of the wrong type). Three values are returned: the link term, display term and lookup term. This function is a thin wrapper around parse_term_for_affixes; see the comments above that function for more information. Note that this function is exposed externally because it is called by Module:category tree/poscatboiler/data/affixes and compounds; see the comment in parse_term_for_affixes for more information.
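The basic idea of adding a hyphen "in the appropriate place" can be illustrated with a much-reduced standalone sketch. It handles only the regular ASCII hyphen and returns a single string, whereas the real function is script-aware and returns three values; the name `add_hyphen` and the exact set of affix types handled are assumptions for the example.

```lua
-- Hypothetical, ASCII-only sketch of hyphen placement by affix type.
local function add_hyphen(term, affix_type)
	if affix_type == "prefix" then
		-- Drop any trailing hyphen, then add exactly one.
		return (term:gsub("%-$", "")) .. "-"
	elseif affix_type == "suffix" then
		-- Drop any leading hyphen, then add exactly one.
		return "-" .. (term:gsub("^%-", ""))
	elseif affix_type == "infix" or affix_type == "interfix" then
		local bare = term:gsub("^%-", ""):gsub("%-$", "")
		return "-" .. bare .. "-"
	end
	-- Not an affix: leave the term untouched.
	return term
end
```

Stripping before adding makes the operation idempotent, so a term that already carries the right hyphen is returned unchanged.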
export.show_affix
function export.show_affix(data)
Implementation of {{affix}} and {{surface analysis}}. data contains all the information describing the affixes to be displayed, and contains the following:
- .lang (required): Overall language object. Different from term-specific language objects (see .parts below).
- .sc: Overall script object (usually omitted). Different from term-specific script objects.
- .parts (required): List of objects describing the affixes to show. The general format of each object is as would be passed to full_link(), except that the .lang field should be missing unless the term is of a language different from the overall .lang value (in such a case, the language name is shown along with the term and an additional "derived from" category is added). WARNING: The data in .parts will be destructively modified.
- .pos: Overall part of speech (used in categories, defaults to "terms"). Different from term-specific part of speech.
- .sort_key: Overall sort key. Normally omitted except e.g. in Japanese.
- .type: Type of compound, if the parts in .parts describe a compound. Strictly optional, and if supplied, the compound type is displayed before the parts (normally capitalized, unless .nocap is given).
- .nocap: Don't capitalize the first letter of text displayed before the parts (relevant only if .type or .surface_analysis is given).
- .notext: Don't display any text before the parts (relevant only if .type or .surface_analysis is given).
- .nocat: Disable all categorization.
- .lit: Overall literal definition. Different from term-specific literal definitions.
- .force_cat: Always display categories, even on userspace pages.
- .surface_analysis: Implement {{surface analysis}}; adds "By surface analysis, " before the parts.
WARNING: This destructively modifies both data and the individual structures within .parts.
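Because the entry points modify their argument in place, a caller that wants to reuse the same description may prefer to pass a copy. The following standalone sketch shows one way to do that for plain tables; `deep_copy` is a hypothetical helper, the required .lang field is omitted because the sketch does not call the module, and copying real language or script objects this way is not something the documentation above guarantees to work.

```lua
-- Hypothetical helper: recursively copy plain tables (no metatables, no cycles).
local function deep_copy(value)
	if type(value) ~= "table" then
		return value
	end
	local copy = {}
	for k, v in pairs(value) do
		copy[k] = deep_copy(v)
	end
	return copy
end

-- A data table shaped like the one described above (.lang left out on purpose).
local data = {
	pos = "terms",
	parts = { { term = "un-" }, { term = "happy" } },
}

local scratch = deep_copy(data)
scratch.parts[1].term = "changed" -- stands in for a destructive modification
```

After the simulated modification, `data.parts[1].term` is still `"un-"`, since only the copy was touched.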
export.show_surface_analysis
function export.show_surface_analysis(data)
This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.
export.show_compound
function export.show_compound(data)
Implementation of {{compound}}.
WARNING: This destructively modifies both data and the individual structures within .parts.
export.show_compound_like
function export.show_compound_like(data)
Implementation of {{blend}}, {{univerbation}} and similar "compound-like" templates.
WARNING: This destructively modifies both data and the individual structures within .parts.
export.show_circumfix
function export.show_circumfix(data)
Implementation of {{circumfix}}.
WARNING: This destructively modifies both data and .prefix, .base and .suffix.
export.show_confix
function export.show_confix(data)
Implementation of {{confix}}.
WARNING: This destructively modifies both data and .prefix, .base and .suffix.
export.show_infix
function export.show_infix(data)
Implementation of {{infix}}.
WARNING: This destructively modifies both data and .base and .infix.
export.show_prefix
function export.show_prefix(data)
Implementation of {{prefix}}.
WARNING: This destructively modifies both data and the structures within .prefixes, as well as .base.
export.show_suffix
function export.show_suffix(data)
Implementation of {{suffix}}.
WARNING: This destructively modifies both data and the structures within .suffixes, as well as .base.
local export = {}
local debug_force_cat = false -- if set to true, always display categories even on userspace pages
local m_links = require("Module:links")
local m_str_utils = require("Module:string utilities")
local m_table = require("Module:table")
local etymology_module = "Module:etymology"
local pron_qualifier_module = "Module:pron qualifier"
local scripts_module = "Module:scripts"
local utilities_module = "Module:utilities"
-- Export this so the category code in [[Module:category tree/poscatboiler/data/terms by etymology]] can access it.
export.affix_lang_data_module_prefix = "Module:affix/lang-data/"
local rsub = m_str_utils.gsub
local usub = m_str_utils.sub
local ulen = m_str_utils.len
local rfind = m_str_utils.find
local rmatch = m_str_utils.match
local pluralize = require("Module:en-utilities").pluralize
local u = m_str_utils.char
local ucfirst = m_str_utils.ucfirst
-- Export this so the category code in [[Module:category tree/poscatboiler/data/terms by etymology]] can access it.
export.langs_with_lang_specific_data = {
	["az"] = true,
	["fi"] = true,
	["izh"] = true,
	["la"] = true,
	["sah"] = true,
	["tr"] = true,
}
local default_pos = "term"
--[==[ intro:
===About different types of hyphens ( "template", "display" and "lookup" ):===
* The "template hyphen" is the per-script hyphen character that is used in template calls to indicate that a term is an
affix. This is always a single Unicode char, but there may be multiple possible hyphens for a given script. Normally
this is just the regular hyphen character "-", but for some non-Latin-script languages (currently only right-to-left
languages), it is different.
* The "display hyphen" is the string (which might be an empty string) that is added onto a term as displayed and linked,
to indicate that a term is an affix. Currently this is always either the same as the template hyphen or an empty
string, but the code below is written generally enough to handle arbitrary display hyphens. Specifically:
*# For East Asian languages, the display hyphen is always blank.
*# For Arabic-script languages, either tatweel (ـ) or ZWNJ (zero-width non-joiner) are allowed as template hyphens,
where ZWNJ is supported primarily for Farsi, because some suffixes have non-joining behavior. The display hyphen
corresponding to tatweel is also tatweel, but the display hyphen corresponding to ZWNJ is blank (tatweel is also
the default display hyphen, for calls to {{tl|prefix}}/{{tl|suffix}}/etc. that don't include an explicit hyphen).
* The "lookup hyphen" is the hyphen that is used when looking up language-specific affix mappings. (These mappings are
discussed in more detail below when discussing link affixes.) It depends only on the script of the affix in question.
Most scripts (including East Asian scripts) use a regular hyphen "-" as the lookup hyphen, but Hebrew and Arabic
have their own lookup hyphens (respectively maqqef and tatweel). Note that for Arabic in particular, there are
three possible template hyphens that are recognized (tatweel, ZWNJ and regular hyphen), but mappings must use tatweel.
===About different types of affixes ( "template", "display", "link", "lookup" and "category" ):===
* A "template affix" is an affix in its source form as it appears in a template call. Generally, a template affix has
an attached template hyphen (see above) to indicate that it is an affix and indicate what type of affix it is
(prefix, suffix, interfix/infix or circumfix), but some of the older-style templates such as {{tl|suffix}},
{{tl|prefix}}, {{tl|confix}}, etc. have "positional" affixes where the presence of the affix in a certain position
(e.g. the second or third parameter) indicates that it is a certain type of affix, whether or not it has an attached
template hyphen.
* A "display affix" is the corresponding affix as it is actually displayed to the user. The display affix may differ
from the template affix for various reasons:
*# The display affix may be specified explicitly using the {{para|alt<var>N</var>}} parameter, the `<alt:...>` inline
modifier or a piped link of the form e.g. `<nowiki>[[-kas|-käs]]</nowiki>` (here indicating that the affix should
display as `-käs` but be linked as `-kas`). Here, the template affix is arguably the entire piped link, while the
display affix is `-käs`.
*# Even in the absence of {{para|alt<var>N</var>}} parameters, `<alt:...>` inline modifiers and piped links, certain
languages have differences between the "template hyphen" specified in the template (which always needs to be
specified somehow or other in templates like {{tl|affix}}, to indicate that the term is an affix and what type of
affix it is) and the display hyphen (see above), with corresponding differences between template and display affixes.
* A (regular) "link affix" is the affix that is linked to when the affix is shown to the user. The link affix is usually
the same as the display affix, but will differ in one of two circumstances:
*# The display and link affixes are explicitly made different using {{para|alt<var>N</var>}} parameters, `<alt:...>`
inline modifiers or piped links, as described above under "display affix".
*# For certain languages, certain affixes are mapped to canonical form using language-specific mappings. For example,
in Finnish, the adjective-forming suffix [[-kas]] appears as [[-käs]] after front vowels, but logically both
forms are the same suffix and should be linked and categorized the same. Similarly, in Latin, the negative and
intensive prefixes spelled [[in-]] (etymologically two distinct prefixes) appear variously as [[il-]], [[im-]] or
[[ir-]] before certain consonants. Mappings are supplied in [[Module:affix/lang-data/LANGCODE]] to convert
Finnish [[-käs]] to [[-kas]] for linking and categorization purposes. Note that the affixes in the mappings use
"lookup hyphens" to indicate the different types of affixes, which is usually the same as the template hyphen but
differs for Arabic scripts, because there are multiple possible template hyphens recognized but only one lookup
hyphen (tatweel). The form of the affix as used to look up in the mapping tables is called the "lookup affix";
see below.
* A "stripped link affix" is a link affix that has been passed through the language's `makeEntryName()` function, which
may strip certain diacritics: e.g. macrons in Latin and Old English (indicating length); acute and grave accents in
Russian and various other Slavic languages (indicating stress); vowel diacritics in most Arabic-script languages; and
also tatweel in some Arabic-script languages (currently, for example, Persian, Arabic and Urdu strip tatweel, but
Ottoman Turkish does not). Stripped link affixes are currently what are used in category names.
* A "lookup affix" is the form of the affix as it is looked up in the language-specific lookup mappings described above
under link affixes. There are actually two lookup stages:
*# First, the affix is looked up in a modified display form (specifically, the same as the display affix but using
lookup hyphens). Note that this lookup does not occur if an explicit display form is given using
{{para|alt<var>N</var>}} or an `<alt:...>` inline modifier, or if the template affix contains a piped or embedded
link.
*# If no entry is found, the affix is then looked up in a modified link form (specifically, the modified display
form passed through the language's `makeEntryName()` function, which strips out certain diacritics, but with the
lookup hyphen re-added if it was stripped out, as in the case of tatweel in many Arabic-script languages).
The reason for this double lookup procedure is to allow for mappings that are sensitive to the extra diacritics, but
also allow for mappings that are not sensitive in this fashion (e.g. Russian [[-ливый]] occurs both stressed and
unstressed, but is the same suffix either way).
* A "category affix" is the affix as it appears in categories such as [[:Category:Finnish terms suffixed with -kas]].
The category affix is currently always the same as the stripped link affix. This means that for Arabic-script
languages, it may or may not have a tatweel, even if the corresponding display affix and regular link affix have a
tatweel. As mentioned above, makeEntryName() strips tatweel for Arabic, Persian and Urdu, but not for Ottoman Turkish.
Hence affix categories for Arabic, Persian and Urdu will be missing the tatweel, but affix categories for
Ottoman Turkish will have it. An additional complication is that if the template affix contains a ZWNJ, the display
(and hence the link and category affixes) will have no hyphen attached in any case.
]==]
-----------------------------------------------------------------------------------------
-- Template and display hyphens --
-----------------------------------------------------------------------------------------
--[=[
Per-script template hyphens. The template hyphen is what appears in the {{affix}}/{{prefix}}/{{suffix}}/etc. template
(in the wikicode). See above.
The key below is a script code, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab'
and 'ur-Arab' will match 'Arab'.
The value below is a string consisting of one or more hyphen characters. If there is more than one character, the
default hyphen must come last and a non-default function must be specified for the script in display_hyphens[] so
the correct display hyphen will be specified when no template hyphen is given (in {{suffix}}/{{prefix}}/etc.).
Script detection is normally done when linking, but we need to do it earlier. However, under most circumstances we
don't need to do script detection. Specifically, we only need to do script detection for a given language if
(a) the language has multiple scripts; and
(b) at least one of those scripts is listed below or in display_hyphens.
]=]
local ZWNJ = u(0x200C) -- zero-width non-joiner
local template_hyphens = {
	-- This covers all Arabic scripts. See above.
	["Arab"] = "ـ" .. ZWNJ .. "-", -- tatweel + zero-width non-joiner + regular hyphen
	["Hebr"] = "־", -- Hebrew-specific hyphen termed "maqqef"
["Mong"]="᠊",
["mnc-Mong"]="᠊",
["sjo-Mong"]="᠊",
["xwo-Mong"]="᠊",
-- FIXME! What about the following right-to-left scripts?
-- Adlm (Adlam)
-- Armi (Imperial Aramaic)
-- Avst (Avestan)
-- Cprt (Cypriot)
-- Khar (Kharoshthi)
-- Mand (Mandaic/Mandaean)
-- Mani (Manichaean)
-- Mend (Mende/Mende Kikakui)
-- Narb (Old North Arabian)
-- Nbat (Nabataean/Nabatean)
-- Nkoo (N'Ko)
-- Orkh (Orkhon runes)
-- Phli (Inscriptional Pahlavi)
-- Phlp (Psalter Pahlavi)
-- Phlv (Book Pahlavi)
-- Phnx (Phoenician)
-- Prti (Inscriptional Parthian)
-- Rohg (Hanifi Rohingya)
-- Samr (Samaritan)
-- Sarb (Old South Arabian)
-- Sogd (Sogdian)
-- Sogo (Old Sogdian)
-- Syrc (Syriac)
-- Thaa (Thaana)
}
-- Hyphens used when looking up an affix in a lang-specific affix mapping. Defaults to regular hyphen (-). The keys
-- are script codes, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab' and 'ur-Arab'
-- will match 'Arab'. The value should be a single character.
local lookup_hyphens = {
["Hebr"]="־",
-- This covers all Arabic scripts. See above.
["Arab"]="ـ",
}
-- Default display-hyphen function.
local function default_display_hyphen(script, hyph)
	if not hyph then
		return template_hyphens[script] or "-"
	end
	return hyph
end
local function arab_get_display_hyphen(script, hyph)
	if not hyph then
		return "ـ" -- tatweel
	elseif hyph == ZWNJ then
		return ""
	else
		return hyph
	end
end
local function no_display_hyphen(script, hyph)
	return ""
end
-- Per-script function to return the correct display hyphen given the script and template hyphen. The function should
-- also handle the case where the passed-in template hyphen is nil, corresponding to the situation in
-- {{prefix}}/{{suffix}}/etc. where no template hyphen is specified. The key is the script code after removing a hyphen
-- and anything preceding, so 'fa-Arab', 'ur-Arab' etc. will match 'Arab'.
local display_hyphens = {
-- This covers all Arabic scripts. See above.
["Arab"]=arab_get_display_hyphen,
["Bopo"]=no_display_hyphen,
["Hani"]=no_display_hyphen,
["Hans"]=no_display_hyphen,
["Hant"]=no_display_hyphen,
-- The following is a mixture of several scripts. Hopefully the specs here are correct!
["Jpan"]=no_display_hyphen,
["Jurc"]=no_display_hyphen,
["Kitl"]=no_display_hyphen,
["Kits"]=no_display_hyphen,
["Laoo"]=no_display_hyphen,
["Nshu"]=no_display_hyphen,
["Shui"]=no_display_hyphen,
["Tang"]=no_display_hyphen,
["Thaa"]=no_display_hyphen,
["Thai"]=no_display_hyphen,
}
-----------------------------------------------------------------------------------------
-- Basic Utility functions --
-----------------------------------------------------------------------------------------
local function glossary_link(entry, text)
	text = text or entry
	return "[[Appendix:Glossary#" .. entry .. "|" .. text .. "]]"
end
local function track(page)
	if type(page) == "table" then
		for i, pg in ipairs(page) do
			page[i] = "affix/" .. pg
		end
	else
		page = "affix/" .. page
	end
	require("Module:debug/track")(page)
end
local function ine(val)
	return val ~= "" and val or nil
end
-----------------------------------------------------------------------------------------
-- Compound types --
-----------------------------------------------------------------------------------------
local function make_compound_type(typ, alttext)
	return {
		text = glossary_link(typ, alttext) .. " compound",
		cat = typ .. " compounds",
	}
end
-- Make a compound type entry with a simple rather than glossary link.
-- These should be replaced with a glossary link when the entry in the glossary
-- is created.
local function make_non_glossary_compound_type(typ, alttext)
	local link = alttext and "[[" .. typ .. "|" .. alttext .. "]]" or "[[" .. typ .. "]]"
	return {
		text = link .. " compound",
		cat = typ .. " compounds",
	}
end
local function make_raw_compound_type(typ, alttext)
	return {
		text = glossary_link(typ, alttext),
		cat = pluralize(typ),
	}
end
local function make_borrowing_type(typ, alttext)
	return {
		text = glossary_link(typ, alttext),
		borrowing_type = pluralize(typ),
	}
end
export.etymology_types={
["adapted borrowing"]=make_borrowing_type("adapted borrowing"),
["adap"]="adapted borrowing",
["abor"]="adapted borrowing",
["alliterative"]=make_non_glossary_compound_type("alliterative"),
["allit"]="alliterative",
["antonymous"]=make_non_glossary_compound_type("antonymous"),
["ant"]="antonymous",
["bahuvrihi"]=make_compound_type("bahuvrihi","bahuvrīhi"),
["bahu"]="bahuvrihi",
["bv"]="bahuvrihi",
["coordinative"]=make_compound_type("coordinative"),
["coord"]="coordinative",
["descriptive"]=make_compound_type("descriptive"),
["desc"]="descriptive",
["determinative"]=make_compound_type("determinative"),
["det"]="determinative",
["dvandva"]=make_compound_type("dvandva"),
["dva"]="dvandva",
["dvigu"]=make_compound_type("dvigu"),
["dvi"]="dvigu",
["endocentric"]=make_compound_type("endocentric"),
["endo"]="endocentric",
["exocentric"]=make_compound_type("exocentric"),
["exo"]="exocentric",
["izafet I"]=make_compound_type("izafet I"),
["iz1"]="izafet I",
["izafet II"]=make_compound_type("izafet II"),
["iz2"]="izafet II",
["izafet III"]=make_compound_type("izafet III"),
["iz3"]="izafet III",
["karmadharaya"]=make_compound_type("karmadharaya","karmadhāraya"),
["karma"]="karmadharaya",
["kd"]="karmadharaya",
["kenning"]=make_raw_compound_type("kenning"),
["ken"]="kenning",
["rhyming"]=make_non_glossary_compound_type("rhyming"),
["rhy"]="rhyming",
["synonymous"]=make_non_glossary_compound_type("synonymous"),
["syn"]="synonymous",
["tatpurusa"]=make_compound_type("tatpurusa","tatpuruṣa"),
["tat"]="tatpurusa",
["tp"]="tatpurusa",
}
local function process_etymology_type(typ, nocap, notext, has_parts)
	local text_sections = {}
	local categories = {}
	local borrowing_type
	if typ then
		local typdata = export.etymology_types[typ]
		if type(typdata) == "string" then
			typdata = export.etymology_types[typdata]
		end
		if not typdata then
			error("Internal error: Unrecognized type '" .. typ .. "'")
		end
		local text = typdata.text
		if not nocap then
			text = ucfirst(text)
		end
		local cat = typdata.cat
		borrowing_type = typdata.borrowing_type
		local oftext = typdata.oftext or " of"
		if not notext then
			table.insert(text_sections, text)
			if has_parts then
				table.insert(text_sections, oftext)
				table.insert(text_sections, " ")
			end
		end
		if cat then
			table.insert(categories, cat)
		end
	end
	return text_sections, categories, borrowing_type
end
-----------------------------------------------------------------------------------------
-- Utility functions --
-----------------------------------------------------------------------------------------
-- Iterate an array up to the greatest integer index found.
local function ipairs_with_gaps(t)
	local indices = m_table.numKeys(t)
	local max_index = #indices > 0 and math.max(unpack(indices)) or 0
	local i = 0
	return function()
		while i < max_index do
			i = i + 1
			return i, t[i]
		end
	end
end
export.ipairs_with_gaps = ipairs_with_gaps
--[==[
Join formatted parts (in `parts_formatted`) together with any overall {{para|lit}} spec (in `lit`) plus categories,
which are formatted by prepending the language name as found in `lang`. The value of an entry in `categories` can be
either a string (which is formatted using `sort_key`) or a table of the form `{ {cat=<var>category</var>,
sort_key=<var>sort_key</var>, sort_base=<var>sort_base</var>}`, specifying the sort key and sort base to use when
formatting the category. If `nocat` is given, no categories are added; otherwise, `force_cat` causes categories to be
added even on userspace pages.
]==]
function export.join_formatted_parts(data)
	local cattext
	local lang = data.data.lang
	local force_cat = data.data.force_cat or debug_force_cat
	if data.data.nocat then
		cattext = ""
	else
		for i, cat in ipairs(data.categories) do
			if type(cat) == "table" then
				data.categories[i] = require(utilities_module).format_categories({lang:getFullName() .. " " .. cat.cat},
					lang, cat.sort_key, cat.sort_base, force_cat)
			else
				data.categories[i] = require(utilities_module).format_categories({lang:getFullName() .. " " .. cat}, lang,
					data.data.sort_key, nil, force_cat)
			end
		end
		cattext = table.concat(data.categories)
	end
	local result = table.concat(data.parts_formatted, " +‎ ") .. (data.data.lit and ", literally " ..
		m_links.mark(data.data.lit, "gloss") or "")
	local q = data.data.q
	local qq = data.data.qq
	local l = data.data.l
	local ll = data.data.ll
	if q and q[1] or qq and qq[1] or l and l[1] or ll and ll[1] then
		result = require(pron_qualifier_module).format_qualifiers {
			lang = lang,
			text = result,
			q = q,
			qq = qq,
			l = l,
			ll = ll,
		}
	end
	return result .. cattext
end
--[==[
Older entry point for calling `join_formatted_parts()`. FIXME: Convert callers.
]==]
function export.concat_parts(lang, parts_formatted, categories, nocat, sort_key, lit, force_cat)
	return export.join_formatted_parts {
		data = {
			lang = lang,
			nocat = nocat,
			sort_key = sort_key,
			lit = lit,
			force_cat = force_cat,
		},
		parts_formatted = parts_formatted,
		categories = categories,
	}
end
local function pluralize(pos)
	if pos ~= "nouns" and usub(pos, -5) ~= "verbs" and usub(pos, -4) ~= "ives" then
		if pos:find("[sx]$") then
			pos = pos .. "es"
		else
			pos = pos .. "s"
		end
	end
	return pos
end
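--[=[
A small standalone sketch of how the pluralization rule above behaves on a few inputs. It substitutes plain
`string.sub` for `usub`, which is only equivalent for ASCII input; `pluralize_sketch` is an illustrative name, not
part of this module.

```lua
local function pluralize_sketch(pos)
	-- "nouns", anything ending in "verbs" and anything ending in "ives" are taken to be plural already.
	if pos ~= "nouns" and pos:sub(-5) ~= "verbs" and pos:sub(-4) ~= "ives" then
		if pos:find("[sx]$") then
			pos = pos .. "es"
		else
			pos = pos .. "s"
		end
	end
	return pos
end

local results = {
	pluralize_sketch("noun"),       -- singular: gets "s"
	pluralize_sketch("prefix"),     -- ends in "x": gets "es"
	pluralize_sketch("nouns"),      -- exempt
	pluralize_sketch("adverbs"),    -- exempt (ends in "verbs")
	pluralize_sketch("adjectives"), -- exempt (ends in "ives")
	pluralize_sketch("terms"),      -- not exempt: ends in "s", so gets "es"
}
```

The last case suggests that callers are expected to pass a singular part of speech unless it is one of the exempted
plurals.
]=]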
-- Remove links and call lang:makeEntryName(term).
local function make_entry_name_no_links(lang, term)
	-- Double parens because makeEntryName() returns multiple values. Yuck.
	return (lang:makeEntryName(m_links.remove_links(term)))
end
--[=[
Convert a raw part as passed into an entry point into a part ready for linking. `lang` and `sc` are the overall
language and script objects. This uses the overall language and script objects as defaults for the part and parses off
any fragment from the term. We need to do the latter so that fragments don't end up in categories and so that we
correctly do affix mapping even in the presence of fragments.
]=]
local function canonicalize_part(part, lang, sc)
	if not part then
		return
	end
	-- Save the original (user-specified, part-specific) value of `lang`. If such a value is specified, we don't insert
	-- a '*fixed with' category, and we format the part using format_derived() in [[Module:etymology]] rather than
	-- full_link() in [[Module:links]].
	part.part_lang = part.lang
	part.lang = part.lang or lang
	part.sc = part.sc or sc
	local term = part.term
	if not term then
		return
	elseif not part.fragment then
		part.term, part.fragment = m_links.get_fragment(term)
	else
		part.term = m_links.get_fragment(term)
	end
end
--[==[
Construct a single linked part based on the information in `part`, for use by `show_affix()` and other entry points.
This should be called after `canonicalize_part()` is called on the part. This is a thin wrapper around `full_link()` in
[[Module:links]] unless `part.part_lang` is specified (indicating that a part-specific language was given), in which
case `format_derived()` in [[Module:etymology]] is called to display a term in a language other than the language of
the overall term (specified in `data.lang`). `data` contains the entire object passed into the entry point and is used
to access information for constructing the categories added by `format_derived()`.
]==]
function export.link_term(part, data)
	local result
	if part.part_lang then
		-- A part-specific language was given, so format the part as a term derived from that language.
		result = require(etymology_module).format_derived {
			lang = data.lang,
			terminfo = part,
			sort_key = data.sort_key,
			nocat = data.nocat,
			borrowing_type = data.borrowing_type,
			force_cat = data.force_cat or debug_force_cat,
		}
	else
		-- No part-specific language was given, so link the term directly with full_link().
		result = m_links.full_link(part, "term")
	end
	if part.q and part.q[1] or part.qq and part.qq[1] or part.l and part.l[1] or part.ll and part.ll[1] or
		part.refs and part.refs[1] then
		result = require(pron_qualifier_module).format_qualifiers {
			lang = part.lang,
			text = result,
			q = part.q,
			qq = part.qq,
			l = part.l,
			ll = part.ll,
			refs = part.refs,
		}
	end
	return result
end
local function canonicalize_script_code(scode)
	-- Convert fa-Arab, ur-Arab etc. to Arab.
	return (scode:gsub("^.*%-", ""))
end
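--[=[
A quick standalone illustration of the pattern used above: the greedy `^.*%-` strips everything up to and including
the last hyphen, and the outer parentheses discard the substitution count that `gsub` returns as a second value.
`canonicalize_script_code_sketch` is an illustrative copy, not a separate function of this module.

```lua
local function canonicalize_script_code_sketch(scode)
	-- Parenthesized so that only the resulting string is returned, not the match count.
	return (scode:gsub("^.*%-", ""))
end

local from_prefixed = canonicalize_script_code_sketch("fa-Arab")
local from_plain = canonicalize_script_code_sketch("Latn")
-- Exactly one value comes back thanks to the parentheses.
local num_returned = select("#", canonicalize_script_code_sketch("ur-Arab"))
```
]=]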
-----------------------------------------------------------------------------------------
-- Affix-handling functions --
-----------------------------------------------------------------------------------------
-- Figure out the appropriate script for the given affix and language (unless the script is explicitly passed in), and
-- return the values of template_hyphens[], display_hyphens[] and lookup_hyphens[] for that script, substituting
-- default values as appropriate. Four values are returned:
-- DETECTED_SCRIPT, TEMPLATE_HYPHEN, DISPLAY_HYPHEN, LOOKUP_HYPHEN
local function detect_script_and_hyphens(text, lang, sc)
	local scode
	-- 1. If the script is explicitly passed in, use it.
	if sc then
		scode = sc:getCode()
	else
		local possible_script_codes = lang:getScriptCodes()
		-- YUCK! `possible_script_codes` comes from loadData() so #possible_script_codes doesn't work (always returns 0).
		local num_possible_script_codes = m_table.length(possible_script_codes)
		if num_possible_script_codes == 0 then
			-- This shouldn't happen; if the language has no script codes,
			-- the list {"None"} should be returned.
			error("Something is majorly wrong! Language " .. lang:getCanonicalName() .. " has no script codes.")
		end
		if num_possible_script_codes == 1 then
			-- 2. If the language has only one possible script, use it.
			scode = possible_script_codes[1]
		else
			-- 3. Check if any of the possible scripts for the language have non-default values for template_hyphens[]
			--    or display_hyphens[]. If so, we need to do script detection on the text. If not, just use "Latn",
			--    which may not be technically correct but produces the right results because Latn has all default
			--    values for template_hyphens[] and display_hyphens[].
			local may_have_nondefault_hyphen = false
			for _, script_code in ipairs(possible_script_codes) do
				script_code = canonicalize_script_code(script_code)
				if template_hyphens[script_code] or display_hyphens[script_code] then
					may_have_nondefault_hyphen = true
					break
				end
			end
			if not may_have_nondefault_hyphen then
				scode = "Latn"
			else
				scode = lang:findBestScript(text):getCode()
			end
		end
	end
	scode = canonicalize_script_code(scode)
	local template_hyphen = template_hyphens[scode] or "-"
	local lookup_hyphen = lookup_hyphens[scode] or "-"
	local display_hyphen = display_hyphens[scode] or default_display_hyphen
	return scode, template_hyphen, display_hyphen, lookup_hyphen
end
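--[=[
The final step of the function above is a per-script table lookup with a fallback to the regular hyphen. A minimal
standalone sketch follows. The table contents are assumptions drawn from the prose at the top of the file (maqqef for
Hebrew, tatweel for Arabic, blank display hyphen for East Asian scripts), with `Hani` chosen as an illustrative East
Asian script code; the real tables may hold other entries or function values. All `_sketch` names are hypothetical.

```lua
-- Illustrative per-script tables; not the module's actual data.
local lookup_hyphens_sketch = {
	Hebr = "־", -- maqqef
	Arab = "ـ", -- tatweel
}
local display_hyphens_sketch = {
	Arab = "ـ", -- simplification: the real display hyphen also depends on the template hyphen used
	Hani = "",  -- East Asian scripts display no hyphen
}

local function hyphens_for_sketch(scode)
	-- Convert fa-Arab, ur-Arab etc. to Arab before indexing the tables.
	scode = scode:gsub("^.*%-", "")
	-- An empty string is truthy in Lua, so a blank display hyphen is not replaced by the default.
	local display_hyphen = display_hyphens_sketch[scode] or "-"
	local lookup_hyphen = lookup_hyphens_sketch[scode] or "-"
	return scode, display_hyphen, lookup_hyphen
end

local arab_code, arab_display, arab_lookup = hyphens_for_sketch("fa-Arab")
local _, hani_display, hani_lookup = hyphens_for_sketch("Hani")
local _, latn_display, latn_lookup = hyphens_for_sketch("Latn")
```
]=]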
--[=[
Given a template affix `term` and an affix type `affix_type`, change the relevant template hyphen(s) in the affix to
the display or lookup hyphen specified in `new_hyphen`, or add them if they are missing. `new_hyphen` can be a string,
specifying a fixed hyphen, or a function of two arguments (the script code `scode` and the discovered template hyphen,
or nil if no relevant template hyphen is present). `thyph_re` is a Lua pattern (which must be enclosed in parens) that
matches the possible template hyphens. Note that not all template hyphens present in the affix are changed, but only
the "relevant" ones (e.g. for a prefix, a relevant template hyphen is one coming at the end of the affix).
]=]
local function reconstruct_term_per_hyphens(term, affix_type, scode, thyph_re, new_hyphen)
	local function get_hyphen(hyph)
		if type(new_hyphen) == "string" then
			return new_hyphen
		end
		return new_hyphen(scode, hyph)
	end

	if not affix_type then
		return term
	elseif affix_type == "circumfix" then
		local before, before_hyphen, after_hyphen, after = rmatch(term, "^(.*)" .. thyph_re .. " " .. thyph_re
			.. "(.*)$")
		if not before or ulen(term) <= 3 then
			-- Unlike with other types of affixes, don't try to add hyphens in the middle of the term to convert it to
			-- a circumfix. Also, if the term is just hyphen + space + hyphen, return it.
			return term
		end
		return before .. get_hyphen(before_hyphen) .. " " .. get_hyphen(after_hyphen) .. after
	elseif affix_type == "infix" or affix_type == "interfix" then
		local before_hyphen, middle, after_hyphen = rmatch(term, "^" .. thyph_re .. "(.*)" .. thyph_re .. "$")
		if before_hyphen and ulen(term) <= 1 then
			-- If the term is just a hyphen, return it.
			return term
		end
		return get_hyphen(before_hyphen) .. (middle or term) .. get_hyphen(after_hyphen)
	elseif affix_type == "prefix" then
		local middle, after_hyphen = rmatch(term, "^(.*)" .. thyph_re .. "$")
		if middle and ulen(term) <= 1 then
			-- If the term is just a hyphen, return it.
			return term
		end
		return (middle or term) .. get_hyphen(after_hyphen)
	elseif affix_type == "suffix" then
		local before_hyphen, middle = rmatch(term, "^" .. thyph_re .. "(.*)$")
		if before_hyphen and ulen(term) <= 1 then
			-- If the term is just a hyphen, return it.
			return term
		end
		return get_hyphen(before_hyphen) .. (middle or term)
	else
		error(("Internal error: Unrecognized affix type '%s'"):format(affix_type))
	end
end
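--[=[
To make the hyphen replacement concrete, here is a reduced standalone sketch that handles only prefixes and suffixes,
only a fixed string for the new hyphen, and only ASCII input (it uses `string.match` and `#` where the function above
uses Unicode-aware helpers). `rehyphenate_sketch` is an illustrative name and is not part of this module.

```lua
local function rehyphenate_sketch(term, affix_type, thyph_re, new_hyphen)
	if affix_type == "prefix" then
		-- Capture everything before a trailing template hyphen, if there is one.
		local middle = term:match("^(.*)" .. thyph_re .. "$")
		if middle and #term <= 1 then
			-- The term is just a hyphen; leave it alone.
			return term
		end
		return (middle or term) .. new_hyphen
	elseif affix_type == "suffix" then
		-- Capture everything after a leading template hyphen, if there is one.
		local _, middle = term:match("^" .. thyph_re .. "(.*)$")
		if middle and #term <= 1 then
			return term
		end
		return new_hyphen .. (middle or term)
	end
	return term
end

local thyph_re = "([%-])"
local added = rehyphenate_sketch("un", "prefix", thyph_re, "-")       -- hyphen missing: add it
local blanked = rehyphenate_sketch("un-", "prefix", thyph_re, "")     -- blank display hyphen: strip it
local suffixed = rehyphenate_sketch("ness", "suffix", thyph_re, "-")  -- hyphen added at the front
local bare = rehyphenate_sketch("-", "suffix", thyph_re, "")          -- a lone hyphen is returned as-is
```

The blank-hyphen case corresponds to scripts whose display hyphen is an empty string, where the template hyphen is
effectively removed.
]=]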
--[=[
Look up a mapping from a given affix variant to the canonical form used in categories and links. The lookup tables are
language-specific according to `lang`, and may be ID-specific according to `affix_id`. The affixes as they appear in the
lookup tables (both the variant and the canonical form) are in "lookup affix" format (approximately speaking, they use a
regular hyphen for most scripts, but a tatweel for Arabic-script entries and a maqqef for Hebrew-script entries), but
the passed-in `affix` param is in "template affix" format (which differs from the lookup affix for Arabic-script
entries, because more types of hyphens are allowed in template affixes; see the comments at the top of the file). The
remaining parameters to this function are used to convert from template affixes to lookup affixes; see the
reconstruct_term_per_hyphens() function above.
If the affix contains brackets, no lookup is done. Otherwise, a two-stage process is used, first looking up the affix
directly and then stripping diacritics and looking it up again. The reason for this is documented above in the comments
at the top of the file (specifically, the comments describing lookup affixes).
The value of a mapping can either be a string (do the mapping regardless of affix ID) or a table indexed by affix ID
(where the special value `false` indicates no affix ID). The values of entries in this table can also be strings, or
tables with keys `affix` and `id` (again, use `false` to indicate no ID). This allows an affix mapping to map from one
ID to another (for example, this is used in English to map the [[an-]] prefix with no ID to the [[a-]] prefix with the
ID 'not').
]=]
local function lookup_affix_mapping(affix, affix_type, lang, scode, thyph_re, lookup_hyph, affix_id)
	local function do_lookup(affix)
		-- Ensure that the affix uses lookup hyphens regardless of whether it used a different type of hyphens before
		-- or no hyphens.
		local lookup_affix = reconstruct_term_per_hyphens(affix, affix_type, scode, thyph_re, lookup_hyph)
		local function do_lookup_for_langcode(langcode)
			if export.langs_with_lang_specific_data[langcode] then
				local langdata = mw.loadData(export.affix_lang_data_module_prefix .. langcode)
				if langdata.affix_mappings then
					local mapping = langdata.affix_mappings[lookup_affix]
					if mapping then
						if type(mapping) == "table" then
							mapping = mapping[affix_id or false]
							if mapping then
								return mapping
							end
						else
							return mapping
						end
					end
				end
			end
		end

		-- If `lang` is an etymology-only language, look for a mapping both for it and its full parent.
		local langcode = lang:getCode()
		local mapping = do_lookup_for_langcode(langcode)
		if mapping then
			return mapping
		end
		local full_langcode = lang:getFullCode()
		if full_langcode ~= langcode then
			mapping = do_lookup_for_langcode(full_langcode)
			if mapping then
				return mapping
			end
		end
		return nil
	end

	if affix:find("%[%[") then
		return nil
	end

	-- Double parens because makeEntryName() returns multiple values. Yuck.
	return do_lookup(affix) or do_lookup((lang:makeEntryName(affix))) or nil
end
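--[=[
The shape of a mapping value (a plain string, or a table indexed by affix ID with `false` meaning "no ID") can be
illustrated with a small standalone sketch. The table below is hypothetical: its two entries are modeled on the
examples mentioned in this file's comments (Finnish [[-käs]] to [[-kas]], English [[an-]] to [[a-]] with ID 'not'),
not copied from any real data module, and `resolve_mapping_sketch` is an illustrative name.

```lua
-- Hypothetical mapping table; real data lives in per-language data modules.
local affix_mappings_sketch = {
	-- String value: map regardless of affix ID.
	["-käs"] = "-kas",
	-- Table value: indexed by affix ID, with `false` standing for "no ID given".
	["an-"] = {
		[false] = {affix = "a-", id = "not"},
	},
}

local function resolve_mapping_sketch(mappings, lookup_affix, affix_id)
	local mapping = mappings[lookup_affix]
	if type(mapping) == "table" then
		-- `nil` cannot be a table key, which is why `false` is used for the no-ID case.
		mapping = mapping[affix_id or false]
	end
	return mapping
end

local plain = resolve_mapping_sketch(affix_mappings_sketch, "-käs")
local by_id = resolve_mapping_sketch(affix_mappings_sketch, "an-")
local wrong_id = resolve_mapping_sketch(affix_mappings_sketch, "an-", "some other id")
local unmapped = resolve_mapping_sketch(affix_mappings_sketch, "un-")
```

When no mapping is found the result is nil, and the caller falls back to the term as given.
]=]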
--[==[
For a given template term in a given language (see the definition of "template affix" near the top of the file),
possibly in an explicitly specified script `sc` (but usually nil), return the term's affix type ({"prefix"}, {"infix"},
{"suffix"}, {"circumfix"} or {nil} for non-affix) along with the corresponding link and display affixes (see definitions
near the top of the file); also the corresponding lookup affix (if `return_lookup_affix` is specified). The term passed
in should already have any fragment (after the # sign) parsed off of it. Four values are returned: `affix_type`,
`link_term`, `display_term` and `lookup_term`. The affix type can be passed in instead of autodetected (pass in {false}
if the term is not an affix); in this case, the template term need not have any attached hyphens, and the appropriate
hyphens will be added in the appropriate places. If `do_affix_mapping` is specified, look up the affix in the
lang-specific affix mappings, as described in the comment at the top of the file; otherwise, the link and display terms
will always be the same. (They will be the same in any case if the template term has a bracketed link in it or is not
an affix.) If `return_lookup_affix` is given, the fourth return value contains the term with appropriate lookup hyphens
in the appropriate places; otherwise, it is the same as the display term. (This functionality is used in
[[Module:category tree/poscatboiler/data/affixes and compounds]] to convert link affixes into lookup affixes so that
they can be looked up in the affix mapping tables.)
]==]
local function parse_term_for_affixes(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id)
	if not term then
		return nil, nil, nil, nil
	end

	if term:find("^%^") then
		-- If term begins with ^, it's not an affix no matter what. Strip off the ^ and return "no affix".
		term = usub(term, 2)
		return nil, term, term, term
	end

	-- Remove an asterisk if the morpheme is reconstructed and add it back at the end.
	local reconstructed = ""
	if term:find("^%*") then
		reconstructed = "*"
		term = term:gsub("^%*", "")
	end

	local scode, thyph, dhyph, lhyph = detect_script_and_hyphens(term, lang, sc)
	thyph = "([" .. thyph .. "])"

	if affix_type == nil then
		if rfind(term, thyph .. " " .. thyph) then
			affix_type = "circumfix"
		else
			local has_beginning_hyphen = rfind(term, "^" .. thyph)
			local has_ending_hyphen = rfind(term, thyph .. "$")
			if has_beginning_hyphen and has_ending_hyphen then
				affix_type = "infix"
			elseif has_ending_hyphen then
				affix_type = "prefix"
			elseif has_beginning_hyphen then
				affix_type = "suffix"
			end
		end
	end

	local link_term, display_term, lookup_term
	if affix_type then
		display_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, dhyph)
		if do_affix_mapping then
			link_term = lookup_affix_mapping(term, affix_type, lang, scode, thyph, lhyph, affix_id)
			-- The return value of lookup_affix_mapping() may be an affix mapping with lookup hyphens if a mapping
			-- was found, otherwise nil if a mapping was not found. We need to convert to display hyphens in
			-- either case, but in the latter case we can reuse the display term, which has already been converted.
			if link_term then
				link_term = reconstruct_term_per_hyphens(link_term, affix_type, scode, thyph, dhyph)
			else
				link_term = display_term
			end
		else
			link_term = display_term
		end
		if return_lookup_affix then
			lookup_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, lhyph)
		else
			lookup_term = display_term
		end
	else
		link_term = term
		display_term = term
		lookup_term = term
	end

	link_term = reconstructed .. link_term
	display_term = reconstructed .. display_term
	lookup_term = reconstructed .. lookup_term

	return affix_type, link_term, display_term, lookup_term
end
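--[=[
The autodetection branch of the function above (taken when `affix_type` is nil) can be sketched in isolation. This
standalone version assumes the template hyphen is always the regular ASCII hyphen and ignores script detection and
affix mapping; `detect_affix_type_sketch` is an illustrative name, not part of this module.

```lua
local function detect_affix_type_sketch(term)
	if term:find("^%^") then
		-- A leading ^ marks the term as "not an affix" no matter what.
		return nil
	end
	-- Ignore a leading asterisk on reconstructed morphemes.
	term = term:gsub("^%*", "")
	if term:find("%- %-") then
		-- hyphen + space + hyphen in the middle: circumfix.
		return "circumfix"
	end
	local has_beginning_hyphen = term:find("^%-") ~= nil
	local has_ending_hyphen = term:find("%-$") ~= nil
	if has_beginning_hyphen and has_ending_hyphen then
		return "infix"
	elseif has_ending_hyphen then
		return "prefix"
	elseif has_beginning_hyphen then
		return "suffix"
	end
	return nil
end

local detected = {
	prefix = detect_affix_type_sketch("un-"),
	suffix = detect_affix_type_sketch("-ness"),
	infix = detect_affix_type_sketch("-o-"),
	circumfix = detect_affix_type_sketch("ge- -t"),
	reconstructed = detect_affix_type_sketch("*-az"),
}
```

A term with hyphens on both sides is reported as an infix here; as the comments in `show_affix()` note, interfixes
cannot be told apart from infixes by appearance alone.
]=]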
--[==[
Add a hyphen to a term in the appropriate place, based on the specified affix type, stripping off any existing hyphens
in that place. For example, if `affix_type` == {"prefix"}, we'll add a hyphen onto the end if it's not already there (or
is of the wrong type). Three values are returned: the link term, display term and lookup term. This function is a thin
wrapper around `parse_term_for_affixes`; see the comments above that function for more information. Note that this
function is exposed externally because it is called by [[Module:category tree/poscatboiler/data/affixes and compounds]];
see the comment in `parse_term_for_affixes` for more information.
]==]
function export.make_affix(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id)
	if not (affix_type == "prefix" or affix_type == "suffix" or affix_type == "circumfix" or affix_type == "infix" or
		affix_type == "interfix") then
		error("Internal error: Invalid affix type " .. (affix_type or "(nil)"))
	end

	local _, link_term, display_term, lookup_term = parse_term_for_affixes(term, lang, sc, affix_type,
		do_affix_mapping, return_lookup_affix, affix_id)
	return link_term, display_term, lookup_term
end
-----------------------------------------------------------------------------------------
-- Main entry points --
-----------------------------------------------------------------------------------------
--[==[
Implementation of {{tl|affix}} and {{tl|surface analysis}}. `data` contains all the information describing the affixes to
be displayed, and contains the following:
* `.lang` ('''required'''): Overall language object. Different from term-specific language objects (see `.parts` below).
* `.sc`: Overall script object (usually omitted). Different from term-specific script objects.
* `.parts` ('''required'''): List of objects describing the affixes to show. The general format of each object is as would
be passed to `full_link()`, except that the `.lang` field should be missing unless the term is of a language
different from the overall `.lang` value (in such a case, the language name is shown along with the term and
an additional "derived from" category is added). '''WARNING''': The data in `.parts` will be destructively
modified.
* `.pos`: Overall part of speech (used in categories, defaults to {"terms"}). Different from term-specific part of speech.
* `.sort_key`: Overall sort key. Normally omitted except e.g. in Japanese.
* `.type`: Type of compound, if the parts in `.parts` describe a compound. Strictly optional, and if supplied, the
compound type is displayed before the parts (normally capitalized, unless `.nocap` is given).
* `.nocap`: Don't capitalize the first letter of text displayed before the parts (relevant only if `.type` or
`.surface_analysis` is given).
* `.notext`: Don't display any text before the parts (relevant only if `.type` or `.surface_analysis` is given).
* `.nocat`: Disable all categorization.
* `.lit`: Overall literal definition. Different from term-specific literal definitions.
* `.force_cat`: Always display categories, even on userspace pages.
* `.surface_analysis`: Implement {{tl|surface analysis}}; adds `By surface analysis, ` before the parts.
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`.
]==]
function export.show_affix(data)
	data.pos = data.pos or default_pos
	data.pos = pluralize(data.pos)

	local text_sections, categories, borrowing_type =
		process_etymology_type(data.type, data.surface_analysis or data.nocap, data.notext, #data.parts > 0)
	data.borrowing_type = borrowing_type

	-- Process each part
	local parts_formatted = {}
	local whole_words = 0
	local is_affix_or_compound = false

	-- Canonicalize and generate links for all the parts first; then do categorization in a separate step, because when
	-- processing the first part for categorization, we may access the second part and need it already canonicalized.
	for i, part in ipairs_with_gaps(data.parts) do
		part = part or {}
		data.parts[i] = part
		canonicalize_part(part, data.lang, data.sc)

		-- Determine affix type and get link and display terms (see text at top of file). Store them in the part
		-- (in fields that won't clash with fields used by full_link() in [[Module:links]] or link_term()), so they
		-- can be used in the loop below when categorizing.
		part.affix_type, part.affix_link_term, part.affix_display_term = parse_term_for_affixes(part.term,
			part.lang, part.sc, nil, not part.alt, nil, part.id)

		-- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with inline
		-- modifiers. The intention in either case is not to link the term.
		part.term = ine(part.affix_link_term)
		-- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being
		-- redundant alt text.
		part.alt = part.alt or (part.affix_display_term ~= part.affix_link_term and part.affix_display_term) or nil

		-- Make a link for the part.
		table.insert(parts_formatted, export.link_term(part, data))
	end

	-- Now do categorization.
	for i, part in ipairs_with_gaps(data.parts) do
		local affix_type = part.affix_type
		if affix_type then
			is_affix_or_compound = true
			-- We cannot distinguish interfixes from infixes by appearance. Prefer interfixes; infixes will need to
			-- use {{infix}}.
			if affix_type == "infix" then
				affix_type = "interfix"
			end

			-- Make a sort key. For the first part, use the second part as the sort key; the intention is that if the
			-- term has a prefix, sorting by the prefix won't be very useful so we sort by what follows, which is
			-- presumably the root.
			local part_sort_base = nil
			local part_sort = part.sort or data.sort_key

			if i == 1 and data.parts[2] and data.parts[2].term then
				local part2 = data.parts[2]
				-- If the second-part link term is empty, the user requested an unlinked term; avoid a wikitext error
				-- by using the alt value if available.
				part_sort_base = ine(part2.affix_link_term) or ine(part2.alt)
				if part_sort_base then
					part_sort_base = make_entry_name_no_links(part2.lang, part_sort_base)
				end
			end
			if part.pos and rfind(part.pos, "patronym") then
				table.insert(categories, {cat = "patronymics", sort_key = part_sort, sort_base = part_sort_base})
			end
			if data.pos ~= "terms" and part.pos and rfind(part.pos, "diminutive") then
				table.insert(categories, {cat = "diminutive " .. data.pos, sort_key = part_sort,
					sort_base = part_sort_base})
			end

			-- Don't add a '*fixed with' category if the link term is empty or is in a different language.
			if ine(part.affix_link_term) and not part.part_lang then
				table.insert(categories, {cat = data.pos .. " " .. affix_type .. "ed with " ..
					make_entry_name_no_links(part.lang, part.affix_link_term) ..
					(part.id and " (" .. part.id .. ")" or ""),
					sort_key = part_sort, sort_base = part_sort_base})
			end
		else
			whole_words = whole_words + 1

			if whole_words == 2 then
				is_affix_or_compound = true
				table.insert(categories, "compound " .. data.pos)
			end
		end
	end

	-- Make sure there was either an affix or a compound (two or more regular terms).
	if not is_affix_or_compound then
		error("The parameters did not include any affixes, and the term is not a compound. Please provide at least one affix.")
	end

	if data.surface_analysis then
		local text = "by " .. glossary_link("surface analysis") .. ", "
		if not data.nocap then
			text = ucfirst(text)
		end

		table.insert(text_sections, 1, text)
	end

	table.insert(text_sections, export.join_formatted_parts {data = data, parts_formatted = parts_formatted,
		categories = categories})
	return table.concat(text_sections)
end
function export.show_surface_analysis(data)
	data.surface_analysis = true
	return export.show_affix(data)
end
--[==[
Implementation of {{tl|compound}}.
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`.
]==]
function export.show_compound(data)
	data.pos = data.pos or default_pos
	data.pos = pluralize(data.pos)

	local text_sections, categories, borrowing_type =
		process_etymology_type(data.type, data.nocap, data.notext, #data.parts > 0)
	data.borrowing_type = borrowing_type

	local parts_formatted = {}
	table.insert(categories, "compound " .. data.pos)

	-- Make links out of all the parts
	local whole_words = 0
	for i, part in ipairs(data.parts) do
		canonicalize_part(part, data.lang, data.sc)
		-- Determine affix type and get link and display terms (see text at top of file).
		local affix_type, link_term, display_term = parse_term_for_affixes(part.term, part.lang, part.sc,
			nil, not part.alt, nil, part.id)

		-- If the term is an infix, recognize it as such (which means e.g. that we will display the term without
		-- hyphens for East Asian languages). Otherwise, ignore the fact that it looks like an affix and display as
		-- specified in the template (but pay attention to the detected affix type for certain tracking purposes).
		if affix_type == "infix" then
			-- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with
			-- inline modifiers. The intention in either case is not to link the term. Don't add a '*fixed with'
			-- category in this case, or if the term is in a different language.
			-- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being
			-- redundant alt text.
			if link_term and link_term ~= "" and not part.part_lang then
				table.insert(categories, {cat = data.pos .. " interfixed with " .. make_entry_name_no_links(part.lang,
					link_term), sort_key = part.sort or data.sort_key})
			end
			part.term = link_term ~= "" and link_term or nil
			part.alt = part.alt or (display_term ~= link_term and display_term) or nil
		else
			if affix_type then
				local langcode = data.lang:getCode()
				-- If `data.lang` is an etymology-only language, track both using its code and its full parent's code.
				track {affix_type, affix_type .. "/lang/" .. langcode}
				local full_langcode = data.lang:getFullCode()
				if langcode ~= full_langcode then
					track(affix_type .. "/lang/" .. full_langcode)
				end
			else
				whole_words = whole_words + 1
			end
		end
		table.insert(parts_formatted, export.link_term(part, data))
	end

	if whole_words == 1 then
		track("one whole word")
	elseif whole_words == 0 then
		track("looks like confix")
	end

	table.insert(text_sections, export.join_formatted_parts {data = data, parts_formatted = parts_formatted,
		categories = categories})
	return table.concat(text_sections)
end
--[==[
Implementation of {{tl|blend}}, {{tl|univerbation}} and similar "compound-like" templates.
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`.
]==]
function export.show_compound_like(data)
	local parts_formatted = {}
	local categories = {}

	if data.cat then
		table.insert(categories, data.cat)
	end

	-- Make links out of all the parts
	for i, part in ipairs(data.parts) do
		canonicalize_part(part, data.lang, data.sc)
		table.insert(parts_formatted, export.link_term(part, data))
	end

	local text_sections = {}
	if data.text then
		table.insert(text_sections, data.text)
	end
	if #data.parts > 0 and data.oftext then
		table.insert(text_sections, " ")
		table.insert(text_sections, data.oftext)
		table.insert(text_sections, " ")
	end
	table.insert(text_sections, export.join_formatted_parts {data = data, parts_formatted = parts_formatted,
		categories = categories})
	return table.concat(text_sections)
end
--[==[
Make `part` (a structure holding information on an affix part) into an affix of type `affix_type`, and apply any
relevant affix mappings. For example, if the desired affix type is "suffix", this will (in general) add a hyphen onto
the beginning of the term, alt, tr and ts components of the part if not already present. The hyphen that's added is the
"display hyphen" (see above) and may be script-specific. (In the case of East Asian scripts, the display hyphen is an
empty string whereas the template hyphen is the regular hyphen, meaning that any regular hyphen at the beginning of the
part will be effectively removed.) `lang` and `sc` hold overall language and script objects.
Note that this also applies any language-specific affix mappings, so that e.g. if the language is Finnish and the user
specified [[-käs]] in the affix and didn't specify an `.alt` value, `part.term` will contain [[-kas]] and `part.alt` will
contain [[-käs]].
This function is used by the "legacy" templates ({{tl|prefix}}, {{tl|suffix}}, {{tl|confix}}, etc.) where the nature of
the affix is specified by the template itself rather than auto-determined from the affix, as is the case with
{{tl|affix}}.
'''WARNING''': This destructively modifies `part`.
]==]
local function make_part_into_affix(part, lang, sc, affix_type)
	canonicalize_part(part, lang, sc)
	local link_term, display_term = export.make_affix(part.term, part.lang, part.sc, affix_type, not part.alt, nil, part.id)
	part.term = link_term
	-- When we don't specify `do_affix_mapping` to make_affix(), link and display terms (first and second retvals of
	-- make_affix()) are the same.
	-- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being
	-- redundant alt text.
	part.alt = part.alt and export.make_affix(part.alt, part.lang, part.sc, affix_type) or (display_term ~= link_term and display_term) or nil
	local Latn = require(scripts_module).getByCode("Latn")
	part.tr = export.make_affix(part.tr, part.lang, Latn, affix_type)
	part.ts = export.make_affix(part.ts, part.lang, Latn, affix_type)
end
local function track_wrong_affix_type(template, part, expected_affix_type)
	if part then
		local affix_type = parse_term_for_affixes(part.term, part.lang, part.sc)
		if affix_type ~= expected_affix_type then
			local part_name = expected_affix_type or "base"
			local langcode = part.lang:getCode()
			local full_langcode = part.lang:getFullCode()
			require("Module:debug/track") {
				template,
				template .. "/" .. part_name,
				template .. "/" .. part_name .. "/" .. (affix_type or "none"),
				template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. langcode
			}
			-- If `part.lang` is an etymology-only language, track both using its code and its full parent's code.
			if full_langcode ~= langcode then
				require("Module:debug/track")(
					template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. full_langcode
				)
			end
		end
	end
end
local function insert_affix_category(categories, pos, affix_type, part, sort_key, sort_base)
	-- Don't add a '*fixed with' category if the link term is empty or is in a different language.
	if part.term and not part.part_lang then
		local cat = pos .. " " .. affix_type .. "ed with " .. make_entry_name_no_links(part.lang, part.term) ..
			(part.id and " (" .. part.id .. ")" or "")
		if sort_key or sort_base then
			table.insert(categories, {cat = cat, sort_key = sort_key, sort_base = sort_base})
		else
			table.insert(categories, cat)
		end
	end
end
--[==[
Implementation of {{tl|circumfix}}.
'''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`.
]==]
function export.show_circumfix(data)
	data.pos = data.pos or default_pos
	data.pos = pluralize(data.pos)

	canonicalize_part(data.base, data.lang, data.sc)

	-- Hyphenate the affixes and apply any affix mappings.
	make_part_into_affix(data.prefix, data.lang, data.sc, "prefix")
	make_part_into_affix(data.suffix, data.lang, data.sc, "suffix")

	track_wrong_affix_type("circumfix", data.prefix, "prefix")
	track_wrong_affix_type("circumfix", data.base, nil)
	track_wrong_affix_type("circumfix", data.suffix, "suffix")

	-- Create circumfix term.
	local circumfix = nil

	if data.prefix.term and data.suffix.term then
		circumfix = data.prefix.term .. " " .. data.suffix.term
		data.prefix.alt = data.prefix.alt or data.prefix.term
		data.suffix.alt = data.suffix.alt or data.suffix.term
		data.prefix.term = circumfix
		data.suffix.term = circumfix
	end

	-- Make links out of all the parts.
	local parts_formatted = {}
	local categories = {}
	local sort_base
	if data.base.term then
		sort_base = make_entry_name_no_links(data.base.lang, data.base.term)
	end

	table.insert(parts_formatted, export.link_term(data.prefix, data))
	table.insert(parts_formatted, export.link_term(data.base, data))
	table.insert(parts_formatted, export.link_term(data.suffix, data))

	-- Insert the categories, but don't add a '*fixed with' category if the link term is in a different language.
	if not data.prefix.part_lang then
		table.insert(categories, {cat = data.pos .. " circumfixed with " .. make_entry_name_no_links(data.prefix.lang,
			circumfix), sort_key = data.sort_key, sort_base = sort_base})
	end

	return export.join_formatted_parts {data = data, parts_formatted = parts_formatted, categories = categories}
end
--[==[
Implementation of {{tl|confix}}.
'''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`.
]==]
function export.show_confix(data)
	data.pos = data.pos or default_pos
	data.pos = pluralize(data.pos)

	canonicalize_part(data.base, data.lang, data.sc)

	-- Hyphenate the affixes and apply any affix mappings.
	make_part_into_affix(data.prefix, data.lang, data.sc, "prefix")
	make_part_into_affix(data.suffix, data.lang, data.sc, "suffix")

	track_wrong_affix_type("confix", data.prefix, "prefix")
	track_wrong_affix_type("confix", data.base, nil)
	track_wrong_affix_type("confix", data.suffix, "suffix")

	-- Make links out of all the parts.
	local parts_formatted = {}
	local prefix_sort_base
	if data.base and data.base.term then
		prefix_sort_base = make_entry_name_no_links(data.base.lang, data.base.term)
	elseif data.suffix.term then
		prefix_sort_base = make_entry_name_no_links(data.suffix.lang, data.suffix.term)
	end

	-- Insert the categories and parts.
	local categories = {}
	table.insert(parts_formatted, export.link_term(data.prefix, data))
	insert_affix_category(categories, data.pos, "prefix", data.prefix, data.sort_key, prefix_sort_base)

	if data.base then
		table.insert(parts_formatted, export.link_term(data.base, data))
	end

	table.insert(parts_formatted, export.link_term(data.suffix, data))
	-- FIXME, should we be specifying a sort base here?
	insert_affix_category(categories, data.pos, "suffix", data.suffix)

	return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
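-- Usage sketch for show_confix (illustrative comment only; nothing here is executed). It lists only the
-- fields that the function body above reads. The `lang` object and the example terms are assumptions made
-- for illustration, and the helper functions called above may expect further fields on each part.
--
--   local result = export.show_confix {
--       lang = lang,               -- assumed: language object supplied by the caller
--       pos = "adjective",         -- optional; defaults to default_pos and is then pluralized
--       sort_key = nil,            -- optional; passed on to the prefix category only
--       prefix = {term = "be"},
--       base = {term = "dew"},     -- optional; leave out when there is no base
--       suffix = {term = "ed"},
--   }
--
-- Parts are linked in the order prefix, base, suffix. The prefix category is sorted under the base term, or
-- under the suffix term when there is no base term; the suffix category gets no sort key or sort base.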
--[==[
Implementation of {{tl|infix}}.
'''WARNING''': This destructively modifies `data` as well as the structures in `.base` and `.infix`.
]==]
function export.show_infix(data)
	data.pos = data.pos or default_pos
	data.pos = pluralize(data.pos)

	canonicalize_part(data.base, data.lang, data.sc)

	-- Hyphenate the affixes and apply any affix mappings.
	make_part_into_affix(data.infix, data.lang, data.sc, "infix")

	track_wrong_affix_type("infix", data.base, nil)
	track_wrong_affix_type("infix", data.infix, "infix")

	-- Make links out of all the parts.
	local parts_formatted = {}
	local categories = {}

	table.insert(parts_formatted, export.link_term(data.base, data))
	table.insert(parts_formatted, export.link_term(data.infix, data))

	-- Insert the categories.
	-- FIXME, should we be specifying a sort base here?
	insert_affix_category(categories, data.pos, "infix", data.infix)

	return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
--[==[
Implementation of {{tl|prefix}}.
'''WARNING''': This destructively modifies both `data` and the structures within `.prefixes`, as well as `.base`.
]==]
function export.show_prefix(data)
	data.pos = data.pos or default_pos
	data.pos = pluralize(data.pos)

	canonicalize_part(data.base, data.lang, data.sc)

	-- Hyphenate the affixes and apply any affix mappings.
	for i, prefix in ipairs(data.prefixes) do
		make_part_into_affix(prefix, data.lang, data.sc, "prefix")
	end

	for i, prefix in ipairs(data.prefixes) do
		track_wrong_affix_type("prefix", prefix, "prefix")
	end

	track_wrong_affix_type("prefix", data.base, nil)

	-- Make links out of all the parts.
	local parts_formatted = {}
	local first_sort_base = nil
	local categories = {}

	if data.prefixes[2] then
		first_sort_base = ine(data.prefixes[2].term) or ine(data.prefixes[2].alt)
		if first_sort_base then
			first_sort_base = make_entry_name_no_links(data.prefixes[2].lang, first_sort_base)
		end
	elseif data.base then
		first_sort_base = ine(data.base.term) or ine(data.base.alt)
		if first_sort_base then
			first_sort_base = make_entry_name_no_links(data.base.lang, first_sort_base)
		end
	end

	for i, prefix in ipairs(data.prefixes) do
		table.insert(parts_formatted, export.link_term(prefix, data))
		insert_affix_category(categories, data.pos, "prefix", prefix, data.sort_key, i == 1 and first_sort_base or nil)
	end

	if data.base then
		table.insert(parts_formatted, export.link_term(data.base, data))
	else
		table.insert(parts_formatted, "")
	end

	return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
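-- Sketch of the sort-base rule in show_prefix above (illustrative comment only; A, B and C are placeholder
-- parts, not real entries). Only the first prefix's category ever receives a sort base; it is taken from the
-- `.term` (or, failing that, the `.alt`) of the part that follows the first prefix:
--
--   prefixes = {A, B}, base = C   -> category for A: sort base from B; category for B: none
--   prefixes = {A},    base = C   -> category for A: sort base from C
--   prefixes = {A},    base = nil -> category for A: no sort base
--
-- When there is no base, an empty string is appended to `parts_formatted` in its place.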
--[==[
Implementation of {{tl|suffix}}.
'''WARNING''': This destructively modifies both `data` and the structures within `.suffixes`, as well as `.base`.
]==]
function export.show_suffix(data)
	local categories = {}

	data.pos = data.pos or default_pos
	data.pos = pluralize(data.pos)

	canonicalize_part(data.base, data.lang, data.sc)

	-- Hyphenate the affixes and apply any affix mappings.
	for i, suffix in ipairs(data.suffixes) do
		make_part_into_affix(suffix, data.lang, data.sc, "suffix")
	end

	track_wrong_affix_type("suffix", data.base, nil)
	for i, suffix in ipairs(data.suffixes) do
		track_wrong_affix_type("suffix", suffix, "suffix")
	end

	-- Make links out of all the parts.
	local parts_formatted = {}

	if data.base then
		table.insert(parts_formatted, export.link_term(data.base, data))
	else
		table.insert(parts_formatted, "")
	end

	for i, suffix in ipairs(data.suffixes) do
		table.insert(parts_formatted, export.link_term(suffix, data))
	end

	-- Insert the categories.
	for i, suffix in ipairs(data.suffixes) do
		-- FIXME, should we be specifying a sort base here?
		insert_affix_category(categories, data.pos, "suffix", suffix)

		if suffix.pos and rfind(suffix.pos, "patronym") then
			table.insert(categories, "patronymics")
		end
	end

	return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
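-- Usage sketch for show_suffix (illustrative comment only; nothing here is executed). The `lang` object and
-- the example terms are assumptions; only the fields read by the function body above are shown.
--
--   local result = export.show_suffix {
--       lang = lang,                                     -- assumed: language object supplied by the caller
--       base = {term = "John"},                          -- optional; "" is inserted in its place when absent
--       suffixes = {{term = "son", pos = "patronymic"}},
--   }
--
-- Each suffix gets its own suffix category without a sort key or sort base. Because the suffix's `pos`
-- matches "patronym", the plain category string "patronymics" is added as well.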
return export