Agglutinative Languages
For languages like Turkish, where words are formed by chaining morphemes, Pāṇini offers a specific architecture based on Morpheme Extraction.
Linguistic Flexibility
The distinction between "agglutinative" and "fusional" / "inflectional" languages is often a spectrum rather than a binary choice. Pāṇini's morpheme extraction system can be implemented for any language where your analysis requires tokenizing internal word structures—even as a borderline case for languages like Armenian.
🏗 Agglutination Model
The Pāṇini model maps surface allomorphs to archiphonemic "base forms" (e.g., -lar and -ler to lAr) using a static inventory.
graph TD
Word["Surface Word: 'kediler'"] --> Split[Segmentation]
Split --> Root["Root: 'kedi'"]
Split --> Suffix["Suffix: 'ler'"]
subgraph Inventory ["Morpheme Inventory"]
M1["base_form: 'lAr'"] --> F1["Function: Plural"]
M1 --> A1["Applies to: Noun, Verb"]
M2["base_form: 'DA'"] --> F2["Function: Locative"]
end
Suffix -- "Map allomorph" --> M1
Root -- "Lemmatize" --> Lemma["Lemma: 'kedi'"]
M1 --> Result[GrammaticalFunction]
Lemma --> Result
Sample Segmentation Output (Turkish):
{
"word": "kediler",
"lemma": "kedi",
"morphemes": [
{
"base_form": "lAr",
"function": { "agreement": { "person": "third", "number": "plural" } }
}
]
}
🛠 Implementation
1. GrammaticalFunction Enum
Unlike fusion languages, agglutinative languages define a GrammaticalFunction enum that describes the role of each morpheme.
#[derive(
Debug,
Clone,
PartialEq,
Serialize,
Deserialize,
schemars::JsonSchema,
panini_macro::AggregableFields,
panini_macro::GrammaticalFunctionCatalog,
)]
#[serde(tag = "category", rename_all = "snake_case")]
pub enum TurkishGrammaticalFunction {
Case { value: TurkishCase },
Tense { value: TurkishTense },
Polarity { value: TurkishPolarity },
Agreement { person: Person, number: BinaryNumber },
Possessive { person: Person, number: BinaryNumber },
Derivation { value: TurkishDerivation },
}
#[derive(GrammaticalFunctionCatalog)] generates curated-pivot handles for each function category:
TurkishGrammaticalFunction::PIVOT_CASE;
TurkishGrammaticalFunction::PIVOT_TENSE;
TurkishGrammaticalFunction::PIVOT_POLARITY;
For single-field variants, the pivot is closed when the field type implements ClosedValues and open when it is a String. For multi-field variants such as Agreement { person, number }, the generated pivot key represents that function category and extracts a combined value only from matching Agreement morphemes.
2. Morpheme Inventory
Define a static list of MorphemeDefinition. These morphemes map archiphonemes to grammatical functions.
type P = TurkishMorphologyPosTag; // Generated by MorphologyInfo
type F = TurkishGrammaticalFunction;
static TURKISH_MORPHEMES: &[MorphemeDefinition<F, P>] = &[
// Plural suffix: "lAr" is the archiphoneme (handles -lar and -ler)
MorphemeDefinition {
base_form: "lAr",
functions: &[F::Agreement { person: Person::Third, number: BinaryNumber::Plural }],
applies_to: &[P::Noun, P::Verb, P::ProperNoun]
},
// Case suffix: "(y)A" (handles -a and -e with linking consonant)
MorphemeDefinition {
base_form: "(y)A",
functions: &[F::Case { value: TurkishCase::Dative }],
applies_to: &[P::Noun, P::Pronoun]
},
];
🤖 Interaction with the LLM
To help the AI know which morphemes to extract, Pāṇini dynamically injects your inventory into the prompt via extra_extraction_directives().
Dynamic Directives
By using extra_extraction_directives(), you ensure that the LLM only uses base_forms that your code can technically recognize.
🪄 Post-processing and Enrichment
Sometimes the LLM response needs additional verification or static label injection. Use post_process_extraction to modify the result after parsing.
impl LinguisticDefinition for Turkish {
// ...
const MORPHEME_PIVOTS: &'static [PivotField<Self::GrammaticalFunction>] = &[
TurkishGrammaticalFunction::PIVOT_POLARITY,
TurkishGrammaticalFunction::PIVOT_TENSE,
TurkishGrammaticalFunction::PIVOT_CASE,
];
fn post_process_extraction(
&self,
segmentation: &mut Option<Vec<WordSegmentation<Self::GrammaticalFunction>>>,
) -> Result<(), String> {
if let Some(segs) = segmentation {
for seg in segs {
// Example: Automatically enrich morphemes
for morph in &mut seg.morphemes {
if morph.base_form == "CI" {
// Agentive suffix "-cı/-ci/-cu/-cü"
// Custom enrichment logic here...
}
}
}
}
Ok(())
}
}
Morpheme pivots are opt-in. Non-agglutinative languages normally keep type GrammaticalFunction = () and do not declare MORPHEME_PIVOTS.