API Documentation for mlconjug3

API Reference for the classes in mlconjug.py

mlconjug3 Main module.

This module provides an easy-to-use interface for conjugating verbs using machine learning models. It includes pre-trained models for French, English, Spanish, Italian, Portuguese and Romanian verbs, as well as interfaces for training custom models and conjugating verbs in multiple languages.

The main class of the module is Conjugator, which provides the conjugate() method for conjugating verbs. The class also manages the Verbiste data set and provides an interface with the scikit-learn pipeline. The class can be initialized with a specific language and a custom model, otherwise the default language is French and the pre-trained French conjugation pipeline is used.

The module also includes helper classes for managing verb data, such as VerbInfo and Verb, as well as utility functions for feature extraction and evaluation.

class mlconjug3.mlconjug.Conjugator(language='fr', model=None)[source]

Bases: object

This is the main class of the project.
The class manages the Verbiste data set and provides an interface with the scikit-learn pipeline.
If no parameters are provided, the default language is set to French and the pre-trained French conjugation pipeline is used.
The class defines the method conjugate(verbs, subject), which is the main method of the module.
Parameters:
  • language – string. Language of the conjugator. The default language is ‘fr’ for French.

  • model – mlconjug3.Model or scikit-learn Pipeline or Classifier implementing the fit() and predict() methods. A user-provided pipeline, for users who have trained their own.

Variables:
  • language – string. Language of the conjugator.

  • model – mlconjug3.Model or scikit-learn Pipeline or Classifier implementing the fit() and predict() methods.

  • conjug_manager – Verbiste object.

conjugate(verbs, subject='abbrev')[source]

Conjugate multiple verbs using multi-processing.

Parameters:
  • verbs – list of strings or string. Verbs to conjugate.

  • subject – string. Toggles abbreviated or full pronouns. The default value is ‘abbrev’. Select ‘pronoun’ for full pronouns.

Return verbs:

list of Verb objects or None.

_conjugate(verb, subject='abbrev')[source]
This is the main method of this class.
It first checks to see if the verb is in Verbiste.
If it is not, and a pre-trained scikit-learn pipeline has been supplied, the method then calls the pipeline to predict the conjugation class of the provided verb.
Returns a Verb object or None.
Parameters:
  • verb – string. Verb to conjugate.

  • subject – string. Toggles abbreviated or full pronouns. The default value is ‘abbrev’. Select ‘pronoun’ for full pronouns.

Return verb:

Verb object or None.
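
The lookup-then-predict order described for this method can be sketched in a few lines. This is an illustration of the control flow only, not the library's code: resolve_template, known_verbs and guess_by_ending are hypothetical names, and the template strings are sample values.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve_template(
    verb: str,
    known: Dict[str, str],
    predict: Optional[Callable[[str], str]] = None,
) -> Optional[Tuple[str, bool]]:
    """Return (template, predicted) for `verb`, or None if it cannot be resolved."""
    if verb in known:            # step 1: the verb is in the data set
        return known[verb], False
    if predict is None:          # step 2: no model was supplied, give up
        return None
    return predict(verb), True   # step 3: ask the model for a conjugation class


# Hypothetical sample data and a trivial stand-in for a trained pipeline.
known_verbs = {"aimer": "aim:er", "finir": "fin:ir"}


def guess_by_ending(verb: str) -> str:
    return "aim:er" if verb.endswith("er") else "fin:ir"


print(resolve_template("aimer", known_verbs))
print(resolve_template("googler", known_verbs, guess_by_ending))
```

The boolean in the result plays the role of the predicted flag carried by Verb objects.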

set_model(model)[source]

Assigns the provided pre-trained scikit-learn pipeline to be able to conjugate unknown verbs.

Parameters:

model – scikit-learn Classifier or Pipeline.

Raises:

ValueError.

API Reference for the classes in verbs/verbs.py

This module defines the VerbInfo class and the Verb, VerbFr, VerbEn, VerbEs, VerbIt, VerbPt and VerbRo classes for representing verb conjugation information.

The VerbInfo class defines the structure for storing information about a verb, including its infinitive form, lexical root, and ending pattern template.

The Verb class represents a verb with information from a VerbInfo object, a dictionary of conjugation information, and options for subject pronoun format and whether or not the conjugation information was predicted by a model. The class also has methods for iterating through the conjugated forms and loading pronoun conjugations.

class mlconjug3.verbs.verbs.VerbInfo(infinitive, root, template)[source]

Bases: object

This class defines the Verbiste verb information structure.

Parameters:
  • infinitive – string. Infinitive form of the verb.

  • root – string. Lexical root of the verb.

  • template – string. Name of the verb ending pattern.

Variables:
  • infinitive – string. Infinitive form of the verb.

  • root – string. Lexical root of the verb.

  • template – string. Name of the verb ending pattern.
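
To make the relationship between the three fields concrete, here is a minimal sketch. It assumes Verbiste-style template names such as "aim:er", where the part after the colon is the ending shared by all verbs of the class. VerbInfoSketch and split_by_template are hypothetical names, not the library's classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbInfoSketch:
    infinitive: str
    root: str
    template: str


def split_by_template(infinitive: str, template: str) -> VerbInfoSketch:
    """Derive the root by removing the template's ending from the infinitive.

    Assumes `template` contains a colon, as in "aim:er".
    """
    ending = template.split(":", 1)[1]
    root = infinitive[: len(infinitive) - len(ending)] if ending else infinitive
    return VerbInfoSketch(infinitive, root, template)


info = split_by_template("parler", "aim:er")
print(info)
```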

class mlconjug3.verbs.verbs.VerbMeta(name, bases, namespace, /, **kwargs)[source]

Bases: ABCMeta

This is a metaclass for creating verb classes. It contains the following abstract methods:
  • __init__ – Initializes the verb class with verb information, conjugation information, subject (default is ‘abbrev’) and a flag for whether the verb is predicted or not.

  • __getitem__ – Allows for indexing of the verb class.

  • __setitem__ – Allows for setting values of the verb class through indexing.

  • __contains__ – Allows for checking if a key is present in the verb class.

  • __iter__ – Allows for iteration over the verb class.

  • language – An abstract property that should be implemented to return the language of the verb class.

  • iterate – An abstract method that should be implemented to iterate over all forms of the verb.

  • load_conjug – An abstract method that should be implemented to load conjugation information for the verb.

  • conjugate – An abstract method that should be implemented to conjugate the verb based on the subject and tense provided.

class mlconjug3.verbs.verbs.Verb(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: object

This class defines the Verb Object.

Parameters:
  • verb_info – VerbInfo Object.

  • conjug_info – OrderedDict.

  • subject – string. Toggles abbreviated or full pronouns. The default value is ‘abbrev’. Select ‘pronoun’ for full pronouns.

  • predicted – bool. Indicates if the conjugation information was predicted by the model or retrieved from the dataset.

Variables:
  • verb_info – VerbInfo Object.

  • conjug_info – OrderedDict.

  • confidence_score – float. Confidence score of the prediction accuracy.

  • subject – string. Either ‘abbrev’ or ‘pronoun’.

  • predicted – bool. Indicates if the conjugation information was predicted by the model or retrieved from the dataset.

iterate()[source]

Iterates over all conjugated forms and yields each one as a tuple.

Return conjugated_forms:

generator. Lazy generator of conjugated forms.
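
A lazy generator of this kind might look like the sketch below. It assumes conjug_info is a nested mapping of mood, then tense, then either a person-to-form mapping or a single form; that layout, the sample data and the name iterate_forms are assumptions made for illustration.

```python
from typing import Dict, Iterator, Tuple

ConjugInfo = Dict[str, Dict[str, object]]


def iterate_forms(conjug_info: ConjugInfo) -> Iterator[Tuple[str, ...]]:
    """Yield one tuple per conjugated form, walking moods and tenses in order."""
    for mood, tenses in conjug_info.items():
        for tense, persons in tenses.items():
            if isinstance(persons, dict):
                # Personal forms: one tuple per person.
                for person, form in persons.items():
                    yield mood, tense, person, form
            else:
                # Impersonal forms (for example the infinitive): a single tuple.
                yield mood, tense, persons


sample = {
    "Indicatif": {"Présent": {"je": "parle", "tu": "parles"}},
    "Infinitif": {"Infinitif Présent": "parler"},
}
forms = list(iterate_forms(sample))
print(forms)
```

Because the function is a generator, nothing is computed until the caller iterates, which keeps memory use flat for verbs with many forms.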

_load_conjug(subject='abbrev')[source]
Populates the inflected forms of the verb.
This is the generic version of this method.
It does not add personal pronouns to the conjugated forms.
This method can handle any new language if the conjugation structure conforms to the Verbiste XML Schema.

conjugate_person(key, persons_dict, term)[source]

Creates the conjugated form of the person specified by the key argument.

Parameters:
  • key – string.

  • persons_dict – OrderedDict

  • term – string.

Returns:

None.

class mlconjug3.verbs.verbs.VerbFr(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: Verb

This class defines the French Verb Object.

_load_conjug(subject)[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.

class mlconjug3.verbs.verbs.VerbEn(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: Verb

This class defines the English Verb Object.

_load_conjug(subject)[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.

class mlconjug3.verbs.verbs.VerbEs(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: Verb

This class defines the Spanish Verb Object.

_load_conjug(subject)[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.

class mlconjug3.verbs.verbs.VerbIt(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: Verb

This class defines the Italian Verb Object.

_load_conjug(subject)[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.

class mlconjug3.verbs.verbs.VerbPt(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: Verb

This class defines the Portuguese Verb Object.

_load_conjug(subject)[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.

class mlconjug3.verbs.verbs.VerbRo(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: Verb

This class defines the Romanian Verb Object.

_load_conjug(subject)[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.

API Reference for the classes in PyVerbiste/PyVerbiste.py

PyVerbiste.

This module contains the code for the class Verbiste. More information about mlconjug3 at https://pypi.org/project/mlconjug3/

The conjugation data conforms to the XML schema defined by Verbiste. More information on Verbiste at https://perso.b2b2c.ca/~sarrazip/dev/conjug_manager.html

class mlconjug3.PyVerbiste.PyVerbiste.Verbiste(language='default')[source]

Bases: ConjugManager

This is the class handling the Verbiste xml files.

Parameters:

language – string. The language of the conjugator. The default value is fr for French. The allowed values are: fr, en, es, it, pt, ro.

Variables:
  • language – Language of the conjugator.

  • verbs – Dictionary where the keys are verbs and the values are conjugation patterns.

  • conjugations – Dictionary where the keys are conjugation patterns and the values are inflected forms.

  • _allowed_endings – set. A set containing the allowed endings of verbs in the target language.

  • templates – list of strings. List of the conjugation patterns.

_load_verbs(verbs_file)[source]

Loads and parses the verbs from the XML file.

Parameters:

verbs_file – string or path object. Path to the verbs xml file.

_parse_verbs(file)[source]

Parses the XML file.

Parameters:

file – FileObject. XML file containing the verbs.

Return verb_templates:

OrderedDict. An OrderedDict containing the verb and its template for all verbs in the file.
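
As an illustration of this parsing step, the sketch below reads a small in-memory document with the standard library. The element names (v for an entry, i for the infinitive, t for the template) follow the layout commonly used by Verbiste verb files and are an assumption here, as are the function name parse_verbs and the sample data. The sketch maps each verb directly to its template name, which may be simpler than what the library stores.

```python
import io
import xml.etree.ElementTree as ET
from collections import OrderedDict


def parse_verbs(file) -> "OrderedDict[str, str]":
    """Map each infinitive in the XML file to its template name."""
    verb_templates = OrderedDict()
    for entry in ET.parse(file).getroot().findall("v"):
        infinitive = entry.findtext("i")
        template = entry.findtext("t")
        if infinitive and template:  # skip incomplete entries
            verb_templates[infinitive] = template
    return verb_templates


# Hypothetical two-verb document standing in for a real verbs file.
sample_xml = io.StringIO(
    "<verbs-fr>"
    "<v><i>aimer</i><t>aim:er</t></v>"
    "<v><i>finir</i><t>fin:ir</t></v>"
    "</verbs-fr>"
)
verbs = parse_verbs(sample_xml)
print(verbs)
```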

_load_conjugations(conjugations_file)[source]

Loads and parses the conjugations from the XML file.

Parameters:

conjugations_file – string or path object. Path to the conjugation xml file.

_detect_allowed_endings()
Detects the allowed endings for verbs in the supported languages.
All the supported languages except for English restrict the form a verb can take.
As English is much more productive and varied in the morphology of its verbs, any word is allowed as a verb.
Return allowed_endings:

set. A set containing the allowed endings of verbs in the target language.

_parse_conjugations(file)[source]

Parses the XML file.

Parameters:

file – FileObject. XML file containing the conjugation templates.

Return conjugations:

OrderedDict. An OrderedDict containing all the conjugation templates in the file.

get_conjug_info(template)

Gets conjugation information corresponding to the given template.

Parameters:

template – string. Name of the verb ending pattern.

Return inflected_forms:

OrderedDict or None. OrderedDict containing the conjugated suffixes of the template.

get_verb_info(verb)

Gets verb information and returns a VerbInfo instance.

Parameters:

verb – string. Verb to conjugate.

Return VerbInfo:

VerbInfo object or None.

is_valid_verb(verb)
Checks if the verb is a valid verb in the given language.
English words are always treated as possible verbs.
Verbs in other languages are filtered by their endings.
Parameters:

verb – string. The verb to conjugate.

Return is_allowed:

bool.

True if the verb is a valid verb in the language. False otherwise.

static _load_tense(tense)[source]

Loads and parses the inflected forms of the tense from the XML file.

Parameters:

tense – list of xml tags containing inflected forms. The list of inflected forms for the current tense being processed.

Return inflected_forms:

list. List of inflected forms.

API Reference for the classes in conjug_manager/conjug_manager.py

ConjugManager.

This module declares the code for the class ConjugManager.

More information about mlconjug3 at https://pypi.org/project/mlconjug3/

The conjugation data conforms to the JSON schema defined by mlconjug3.

class mlconjug3.conjug_manager.conjug_manager.ConjugManager(language='default')[source]

Bases: object

This is the class handling the mlconjug3 JSON files.

Parameters:

language – string. The language of the conjugator. The default value is fr for French. The allowed values are: fr, en, es, it, pt, ro.

Variables:
  • language – Language of the conjugator.

  • verbs – Dictionary where the keys are verbs and the values are conjugation patterns.

  • conjugations – Dictionary where the keys are conjugation patterns and the values are inflected forms.

  • templates – list of strings representing the conjugation templates.

  • _allowed_endings – set containing the allowed endings of verbs in the target language.

_load_verbs(verbs_file)[source]

Loads and parses the verbs from the JSON file.

Parameters:

verbs_file – string or path object. Path to the verbs json file.
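
A loader that accepts either a string or a path object, as the parameter description says, could be as small as the sketch below. The JSON layout shown (verb mapped to its template and root) is a hypothetical example, not the schema shipped with the library, and load_verbs is an illustrative name.

```python
import json
import tempfile
from pathlib import Path


def load_verbs(verbs_file) -> dict:
    """Read a verbs JSON file given as a str or a path-like object."""
    with open(verbs_file, encoding="utf-8") as handle:
        return json.load(handle)


# Write a tiny hypothetical data file, then load it both ways.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "verbs-demo.json"
    path.write_text(
        json.dumps({"aimer": {"template": "aim:er", "root": "aim"}}),
        encoding="utf-8",
    )
    verbs = load_verbs(path)      # a path object
    same = load_verbs(str(path))  # a plain string path

print(verbs)
```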

_load_conjugations(conjugations_file)[source]

Loads and parses the conjugations from the JSON file.

Parameters:

conjugations_file – string or path object. Path to the conjugation json file.

_detect_allowed_endings()[source]
Detects the allowed endings for verbs in the supported languages.
All the supported languages except for English restrict the form a verb can take.
As English is much more productive and varied in the morphology of its verbs, any word is allowed as a verb.
Return allowed_endings:

set. A set containing the allowed endings of verbs in the target language.

is_valid_verb(verb)[source]
Checks if the verb is a valid verb in the given language.
English words are always treated as possible verbs.
Verbs in other languages are filtered by their endings.
Parameters:

verb – string. The verb to conjugate.

Return is_allowed:

bool.

True if the verb is a valid verb in the language. False otherwise.
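
The ending-based filter described above can be sketched as follows. The ending sets are hypothetical and far smaller than what a real language needs; they are not taken from the library. English is modelled as "no restriction", matching the description.

```python
from typing import Optional, Set

# Hypothetical ending sets for illustration only.
DEMO_ENDINGS = {
    "fr": {"er", "ir", "re"},
    "es": {"ar", "er", "ir"},
}


def detect_allowed_endings(language: str) -> Optional[Set[str]]:
    """Return the ending set for `language`, or None when any word is allowed."""
    if language == "en":
        return None
    return DEMO_ENDINGS.get(language, set())


def is_valid_verb(verb: str, language: str) -> bool:
    """Accept every English word; filter other languages by their endings."""
    endings = detect_allowed_endings(language)
    if endings is None:
        return True
    return verb.endswith(tuple(endings))


print(is_valid_verb("google", "en"))
print(is_valid_verb("parler", "fr"))
print(is_valid_verb("ordinateur", "fr"))
```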

get_verb_info(verb)[source]

Gets verb information and returns a VerbInfo instance.

Parameters:

verb – string. Verb to conjugate.

Return VerbInfo:

VerbInfo object or None.

get_conjug_info(template)[source]

Gets conjugation information corresponding to the given template.

Parameters:

template – string. Name of the verb ending pattern.

Return inflected_forms:

OrderedDict or None. OrderedDict containing the conjugated suffixes of the template.

API Reference for the classes in dataset/dataset.py

This module contains the DataSet class, which holds and manages the data set for conjugating verbs.

It defines helper methods for managing Machine Learning tasks like constructing a training and testing set.

class mlconjug3.dataset.dataset.DataSet(verbs_dict)[source]

Bases: object

This class holds and manages the data set.
Defines helper methods for managing Machine Learning tasks like constructing a training and testing set.
Parameters:

verbs_dict – A dictionary of verbs and their corresponding conjugation class.

Variables:
  • verbs_dict – A dictionary of verbs and their corresponding conjugation class.

  • verbs – A list of all the verbs in the data set.

  • templates – A list of all the templates in the data set.

  • verbs_list – A list of all the verbs in the data set, shuffled randomly.

  • templates_list – A list of the template index of each verb in the shuffled verbs_list.

  • dict_conjug – A dictionary where the keys are conjugation templates and the values are the verbs that belong to that template.

  • min_threshold – The minimum number of verbs in a conjugation class for it to be split into a training and testing set.

  • split_proportion – The proportion of the data set that should be used as the training set.

  • train_input – A list of the verbs in the training set.

  • train_labels – A list of the template index of each verb in the training set.

  • test_input – A list of the verbs in the testing set.

  • test_labels – A list of the template index of each verb in the testing set.

construct_dict_conjug()[source]
Populates the dictionary containing the conjugation templates.
Populates the lists containing the verbs and their templates.
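
The bookkeeping performed by this method can be pictured with the sketch below: group verbs by template, then record each verb alongside the index of its template. The sample verbs_dict is hypothetical, the random shuffle of verbs_list is left out so the result is deterministic, and sorting the template names is a choice made for this sketch only.

```python
from collections import defaultdict

# Hypothetical sample: verb -> conjugation class (template).
verbs_dict = {"aimer": "aim:er", "parler": "aim:er", "finir": "fin:ir"}

# Template -> list of verbs that follow it.
dict_conjug = defaultdict(list)
for verb, template in verbs_dict.items():
    dict_conjug[template].append(verb)

# Parallel lists: each verb and the index of its template.
templates = sorted(dict_conjug)
verbs_list = list(verbs_dict)
templates_list = [templates.index(verbs_dict[verb]) for verb in verbs_list]

print(dict(dict_conjug))
print(templates_list)
```
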
split_data(threshold=8, proportion=0.5)[source]

Splits the data into a training and a testing set.

Parameters:
  • threshold – int. Minimum size of conjugation class to be split.

  • proportion – float. Proportion of samples in the training set. Must be between 0 and 1.

Raises:

ValueError.
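
A per-class split driven by threshold and proportion might be sketched like this. Two details are assumptions rather than documented behaviour: classes smaller than the threshold are kept entirely in the training set, and the accepted range for proportion is taken as greater than 0 and at most 1. The seed parameter is added only to make the sketch reproducible.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (verb, template)


def split_data(
    dict_conjug: Dict[str, List[str]],
    threshold: int = 8,
    proportion: float = 0.5,
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair]]:
    """Split each conjugation class into training and testing pairs."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be greater than 0 and at most 1")
    rng = random.Random(seed)
    train: List[Pair] = []
    test: List[Pair] = []
    for template, verbs in dict_conjug.items():
        verbs = list(verbs)
        rng.shuffle(verbs)
        if len(verbs) < threshold:
            # Class too small to split: keep all of it for training (assumption).
            train += [(verb, template) for verb in verbs]
            continue
        cut = int(len(verbs) * proportion)
        train += [(verb, template) for verb in verbs[:cut]]
        test += [(verb, template) for verb in verbs[cut:]]
    return train, test


demo = {"big": [f"verb{i}" for i in range(10)], "small": ["x", "y"]}
train, test = split_data(demo, threshold=8, proportion=0.5)
print(len(train), len(test))
```

Splitting class by class, rather than over the whole list, keeps rare conjugation classes from disappearing from the training set.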

API Reference for the classes in feature_extractor/feature_extractor.py

This module declares the feature extractors for verbs.

A custom vectorizer optimized for extracting verb features, including n-grams of verb endings and beginnings, verb length, number of vowels and consonants, and ratio of vowels to consonants.

mlconjug3.feature_extractor.feature_extractor.extract_verb_features(verb, lang, ngram_range)[source]
Custom Vectorizer optimized for extracting verb features.
As in Indo-European languages verbs are inflected by adding a morphological suffix, the vectorizer extracts verb endings and produces a vector representation of the verb with binary features.
To enhance the results of the feature extraction, several other features have been included.
The full set of features is the verb’s ending n-grams, starting n-grams, length of the verb, number of vowels, number of consonants and the ratio of vowels over consonants.
Parameters:
  • verb – string. Verb to vectorize.

  • lang – string. Language to analyze.

  • ngram_range – tuple. The range of the ngram sliding window.

Return features:

list. List of the most salient features of the verb for the task of finding its conjugation class.
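
The feature list described above can be approximated with the standard library alone. This is a simplified sketch, not the library's function: the feature labels (END=, START= and so on), the vowel set and the name extract_features are assumptions, and the lang argument is omitted because a single vowel set is used for every language.

```python
from typing import List, Tuple

VOWELS = set("aeiouyàâäéèêëîïôöùûü")  # hypothetical vowel set


def extract_features(verb: str, ngram_range: Tuple[int, int] = (2, 3)) -> List[str]:
    """Build string features from a verb's endings, beginnings and letter counts."""
    low, high = ngram_range
    features = []
    for n in range(low, high + 1):
        if n <= len(verb):
            features.append(f"END={verb[-n:]}")    # ending n-gram
            features.append(f"START={verb[:n]}")   # starting n-gram
    vowels = sum(1 for ch in verb if ch in VOWELS)
    consonants = sum(1 for ch in verb if ch.isalpha() and ch not in VOWELS)
    features.append(f"LEN={len(verb)}")
    features.append(f"VOW={vowels}")
    features.append(f"CONS={consonants}")
    if consonants:  # avoid dividing by zero for all-vowel input
        features.append(f"RATIO={vowels / consonants:.2f}")
    return features


features = extract_features("parler")
print(features)
```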

API Reference for the classes in models/models.py

This module declares the Model class.

It provides a Model class that wraps around scikit-learn’s pipeline, and offers a simple train, predict, and evaluate interface for training conjugation models. The Model class also provides default values for the vectorizer, feature selector and classifier, which work well for many languages and can be overridden as needed.

class mlconjug3.models.models.Model(vectorizer=None, feature_selector=None, classifier=None, language=None)[source]

Bases: object

This class manages the scikit-learn pipeline.
The Pipeline includes a feature vectorizer, a feature selector and a classifier.
If any of the vectorizer, feature selector or classifier is not supplied at instance declaration, the __init__ method will provide good default values that get more than 92% prediction accuracy.
Parameters:
  • vectorizer – scikit-learn Vectorizer.

  • feature_selector – scikit-learn Classifier with a fit_transform() method

  • classifier – scikit-learn Classifier with a predict() method

  • language – Language of the corpus of verbs to be analyzed.

Variables:
  • pipeline – scikit-learn Pipeline Object.

  • language – Language of the corpus of verbs to be analyzed.

train(samples, labels)[source]

Trains the pipeline on the supplied samples and labels.

Parameters:
  • samples – list. List of verbs.

  • labels – list. List of verb templates.

predict(verbs)[source]

Predicts the conjugation class of the provided list of verbs.

Parameters:

verbs – list. List of verbs.

Return predictions:

list. List of predicted conjugation groups.
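
To show the shape of the train and predict interface without depending on scikit-learn, the sketch below swaps the real pipeline for a deliberately naive technique: it remembers the most frequent template for each verb ending. SuffixModel is a hypothetical toy, it is not how Model works internally, and predict() must only be called after train().

```python
from collections import Counter, defaultdict
from typing import List


class SuffixModel:
    """Toy stand-in for Model: predicts the most common template per ending."""

    def __init__(self, suffix_length: int = 2):
        self.suffix_length = suffix_length
        self._by_suffix = defaultdict(Counter)
        self._overall = Counter()

    def train(self, samples: List[str], labels: List[str]) -> None:
        """Count how often each template occurs for each verb ending."""
        for verb, template in zip(samples, labels):
            self._by_suffix[verb[-self.suffix_length:]][template] += 1
            self._overall[template] += 1

    def predict(self, verbs: List[str]) -> List[str]:
        """Return one template per verb, falling back to the overall majority."""
        predictions = []
        for verb in verbs:
            counts = self._by_suffix.get(verb[-self.suffix_length:]) or self._overall
            predictions.append(counts.most_common(1)[0][0])
        return predictions


model = SuffixModel()
model.train(["aimer", "parler", "finir"], ["aim:er", "aim:er", "fin:ir"])
predictions = model.predict(["googler", "blanchir"])
print(predictions)
```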

API Reference for the classes in utils/model_trainer.py

This module provides the ConjugatorTrainer class, a tool for training, evaluating, and saving models for conjugating verbs in different languages.

The ConjugatorTrainer class allows the user to train a model for a specific language using the mlconjug3 library. The user can also evaluate the model’s performance and save the trained model for later use.

class mlconjug3.utils.model_trainer.ConjugatorTrainer(lang, output_folder, split_proportion, dataset, model)[source]

Bases: object

Initialize a ConjugatorTrainer instance.

Parameters:
  • lang – str. The language for which the model will be trained.

  • output_folder – str. The directory where the trained model will be saved.

  • split_proportion – float. The proportion of the data set to use for training.

  • dataset – class. The DataSet class from the mlconjug3 library.

  • model – obj. The model to be trained.

Variables:
  • lang – Language of the conjugator.

  • output_folder – Output folder for the trained model.

  • split_proportion – Proportion of the data set to use for training.

  • dataset – DataSet class from the mlconjug3 library.

  • model – Model to be trained.

  • conjugator – mlconjug3 Conjugator instance.

train()[source]

Train the model using the specified parameters.

predict()[source]

Make predictions using the trained model.

Return predictions:

list. A list of predictions for the conjugated verbs.

evaluate()[source]

Evaluate the performance of the model’s predictions.

Prints the score of the model, with the number of misses out of the total number of entries.
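
The kind of report described here, misses out of total entries, can be computed as in the sketch below. The function name score, the sample labels and the output wording are assumptions; the library's own message format may differ.

```python
from typing import List, Tuple


def score(predictions: List[str], labels: List[str]) -> Tuple[int, int, float]:
    """Return (misses, total, accuracy) for predictions against true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    misses = sum(1 for predicted, true in zip(predictions, labels) if predicted != true)
    total = len(labels)
    accuracy = (total - misses) / total if total else 0.0
    return misses, total, accuracy


misses, total, accuracy = score(
    ["aim:er", "fin:ir", "aim:er"],  # hypothetical predictions
    ["aim:er", "fin:ir", "fin:ir"],  # hypothetical true labels
)
print(f"{misses} misses out of {total} entries ({accuracy:.1%})")
```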

save()[source]

Save the trained conjugator model to the specified output folder.