Usage

Command Line Interface

Example of using mlconjug3 through a remote ssh connection:

To see a list of mlconjug3’s commands type ‘mlconjug3 -h’ from the command line:

  $ mlconjug3 -h
  Usage: mlconjug3 [OPTIONS] [VERBS]...

  Examples of how to use mlconjug3 from the terminal

  To conjugate a verb in English, abbreviated subject format : $ mlconjug3 -l
  en -s abbrev 'have'

  To conjugate multiple verbs in French, full subject format : $ mlconjug3 -l
  fr -s pronoun 'aimer' 'être' 'aller'

  To conjugate a verb in Spanish, full subject format and save the conjugation
  table in a json file: $ mlconjug3 -l es -s pronoun -f json 'hablar' -o
  'conjugation_table.json'

  To conjugate multiple verbs in Italian, abbreviated subject format and save
  the conjugation table in a csv file: $ mlconjug3 -l it -s abbrev -f json
  'parlare' 'avere' 'essere' -o 'conjugation_table.json'

  Examples of how to use mlconjug3 from the terminal with a config file:

  To use a config file in your home directory:
  $ mlconjug3 -c have

  To use a specific config file:
  $ mlconjug3 -c /path/to/config.toml have be eat

  To use a specific config file and override some of the settings:
  $ mlconjug3 -c /path/to/config.toml -l en -s pronoun -o conjugation_table.json -f json have fly

Options:
  -l, --language TEXT     The language for the conjugation pipeline. The
                          values can be 'fr', 'en', 'es', 'it', 'pt' or 'ro'.
                          The default value is fr.
  -o, --output TEXT       Path of the filename for storing the conjugation
                          tables.
  -s, --subject TEXT      The subject format type for the conjugated forms.
                          The values can be 'abbrev' or 'pronoun'. The default
                          value is 'abbrev'.
  -f, --file_format TEXT  The output format for storing the conjugation
                          tables. The values can be 'json', 'csv'. The default
                          value is 'json'.
  -c, --config FILE       Path of the configuration file for specifying
                          language, subject, output file name and format, as
                          well as theme settings for the conjugation table
                          columns. Supported file formats: toml, yaml
  -h, --help              Show this message and exit.

Note

The default language is French.

When called without specifying a language, the library will try to conjugate the verb in French.

To conjugate a verb in English, abbreviated subject format:

$ mlconjug3 -l en -s abbrev 'have'

To conjugate multiple verbs in French, full subject format:

$ mlconjug3 -l fr -s pronoun 'aimer' 'être' 'aller'

To conjugate a verb in Spanish, full subject format and save the conjugation table in a json file:

$ mlconjug3 -l es -s pronoun -f json 'hablar' -o 'conjugation_table.json'

To conjugate multiple verbs in Italian, abbreviated subject format and save the conjugation table in a csv file:

$ mlconjug3 -l it -s abbrev -f csv 'parlare' 'avere' 'essere' -o 'conjugation_table.csv'

Examples of how to use mlconjug3 from the terminal with a config file:

To use a config file in your home directory:

$ mlconjug3 -c hablar

To use a specific config file:

$ mlconjug3 -c /path/to/config.toml manger parler

To use a specific config file and override some of the settings:

$ mlconjug3 -c /path/to/config.toml -l en -s pronoun -o conjugation_table.json -f json have
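When the command line is driven from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of mlconjug3: build_command is a hypothetical helper that uses only the options listed in the help text above, and it leaves out any option that was not requested so that defaults or a config file still apply.

```python
import shlex


def build_command(verbs, language=None, subject=None,
                  file_format=None, output=None, config=None):
    """Hypothetical helper: assemble an mlconjug3 argument list.

    Only the documented options (-c, -l, -s, -f, -o) are used.
    Options left as None are omitted from the command.
    """
    command = ["mlconjug3"]
    for flag, value in (("-c", config), ("-l", language), ("-s", subject),
                        ("-f", file_format), ("-o", output)):
        if value is not None:
            command += [flag, value]
    return command + list(verbs)


command = build_command(
    ["parlare", "avere", "essere"],
    language="it", subject="abbrev",
    file_format="csv", output="conjugation_table.csv",
)
# Show the command as it would be typed in a shell.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where mlconjug3 is installed.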

Using Configuration Files

mlconjug3 allows you to specify various settings in a configuration file so that you don’t have to type them at the command line. The file can be in either TOML or YAML format. mlconjug3 automatically checks whether a configuration file is located in a directory called /mlconjug3/ in your home folder. You can also pass the path to your configuration file with the ‘-c’ option.

Here is an example of a config.toml file:

language = "en"
subject = "abbrev"
output = "conjugation_table.json"
file_format = "json"

[theme]
header_style = "bold #0D47A1"
mood_style = "bold #F9A825"
tense_style = "bold bright_magenta"
person_style = "bold cyan"
conjugation_style = "bold #4CAF50"

And here is an example of a config.yaml file:

language: fr
subject: pronoun
output: conjugation_table.json
file_format: json

theme:
  header_style: bold blue
  mood_style: bold yellow
  tense_style: bold green
  person_style: bold bright_cyan
  conjugation_style: bold bright_magenta

Use mlconjug3 in your own code

This library provides an easy-to-use interface for conjugating verbs using machine learning models. It includes pre-trained models for French, English, Spanish, Italian, Portuguese and Romanian verbs, as well as interfaces for training custom models and conjugating verbs in multiple languages.

The main class of the library is Conjugator, which provides the conjugate() method for conjugating verbs. The class also manages the Verbiste data set and provides an interface with the scikit-learn pipeline. The class can be initialized with a specific language and a custom model, otherwise the default language is French and the pre-trained French conjugation pipeline is used.

The library mlconjug3 also includes helper classes for managing verb data, such as VerbInfo and Verb, as well as utility functions for feature extraction and evaluation.

Using the Conjugator class:

To use the Conjugator class, you need to first import the class in your code.

from mlconjug3 import Conjugator

# initialize the conjugator
conjugator = Conjugator()

# conjugate the verb "parler"
verb = conjugator.conjugate("parler")

# print all the conjugated forms as a list of tuples.
print(verb.iterate())

The class Verb and its children adhere to the Python Data Model and can be accessed like a dictionary. This way you can conveniently access parts of the conjugation either in the form Verb[mood][tense][person] or the form Verb[(mood, tense, person)].

Using the form Verb[mood][tense][person] to access the conjugated forms:

# get the conjugation for the indicative mood, present tense, first person singular
print(verb["Indicatif"]["Présent"]["1s"])

# get the conjugation for the indicative mood, present tense
print(verb["Indicatif"]["Présent"])

# get the conjugation for the indicative mood
print(verb["Indicatif"])

Using the form Verb[(mood, tense, person)] to access the conjugated forms:

# get the conjugation for the indicative mood, present tense, first person singular
print(verb["Indicatif", "Présent", "1s"])

# get the conjugation for the indicative mood, present tense
print(verb["Indicatif", "Présent"])

# get the conjugation for the indicative mood
print(verb["Indicatif"])

You can check if a conjugated form is present in the verb:

# check if the form "je parle" is in the conjugated forms. Prints True.
print("je parle" in verb)

# check if the form "tu parles" is in the conjugated forms. Prints True.
print("tu parles" in verb)

# check if the form "parlent" is in the conjugated forms. Prints True.
print("parlent" in verb)

# check that the form "tu manges" is not in the conjugated forms. Prints True.
print("tu manges" not in verb)
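To make the dictionary-style access and the membership checks more concrete, here is a minimal sketch of how a class can support both indexing forms through the Python Data Model. ToyVerb is a hypothetical stand-in written for illustration, with hand-typed data; it is not mlconjug3's Verb class, and its membership test uses exact matching, which may differ from the library's behaviour.

```python
class ToyVerb:
    """Hypothetical stand-in for a conjugated verb (not mlconjug3's Verb)."""

    def __init__(self, conjug_info):
        # Nested mapping: mood -> tense -> person -> conjugated form.
        self.conjug_info = conjug_info

    def __getitem__(self, key):
        # A tuple key walks one nesting level per element;
        # a plain key is treated as a one-element tuple.
        keys = key if isinstance(key, tuple) else (key,)
        node = self.conjug_info
        for part in keys:
            node = node[part]
        return node

    def iterate(self):
        """Return every form as a (mood, tense, person, form) tuple."""
        return [
            (mood, tense, person, form)
            for mood, tenses in self.conjug_info.items()
            for tense, persons in tenses.items()
            for person, form in persons.items()
        ]

    def __contains__(self, form):
        # Exact match against the stored forms.
        return any(item[-1] == form for item in self.iterate())


toy = ToyVerb({"Indicatif": {"Présent": {
    "1s": "je parle", "2s": "tu parles", "3p": "ils parlent"}}})

print(toy["Indicatif"]["Présent"]["1s"])   # chained access
print(toy["Indicatif", "Présent", "1s"])   # tuple access
print("tu parles" in toy)
print("tu manges" not in toy)
```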

You can also access the conjugated forms through the attribute conjug_info:

# print all the conjugations for the indicative mood
print(verb.conjug_info["Indicatif"])

# print the conjugation for the indicative mood, present tense, first person singular
print(verb.conjug_info["Indicatif"]["Présent"]["1s"])

# print the conjugation for the indicative mood, present tense
print(verb.conjug_info["Indicatif"]["Présent"])

# print the conjugation for the indicative mood
print(verb.conjug_info["Indicatif"])

Providing a pre-trained model

You can provide your own trained model to the Conjugator class if you have trained a model using the ConjugatorTrainer class. To do this, pass the trained model object as the second argument to the Conjugator class.

For example, if you have trained a French conjugation model and saved it to the file “my_french_model.pickle”, you can load this model and use it with the Conjugator class as follows:

import joblib
from mlconjug3 import Conjugator

# load the trained model from file
my_french_model = joblib.load("my_french_model.pickle")

# create an instance of the Conjugator class with the custom model
conjugator = Conjugator(language='fr', model=my_french_model)

# conjugate a verb
conjugations = conjugator.conjugate("aimer")

Note that the Conjugator class expects the model object to have a structure similar to that of the default model, with the following methods and properties.

The model should have:

a fit() method for training the model on a dataset
a predict() method for making predictions on new data
a ‘__classes__’ property that returns an array of the class labels

As long as your custom model has these properties and methods, it should be compatible with the Conjugator class.
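Before handing a custom object to Conjugator, it can be useful to confirm that it exposes the expected methods. The sketch below shows one way to do that check; missing_model_methods and DummyModel are hypothetical names introduced here and are not part of mlconjug3. Only the fit() and predict() methods are checked, since the exact name of the class-labels property should be confirmed against your installed version.

```python
def missing_model_methods(model, required=("fit", "predict")):
    """Hypothetical helper: list required methods the model lacks."""
    return [name for name in required
            if not callable(getattr(model, name, None))]


class DummyModel:
    """Toy object with the two methods, used only to exercise the check."""

    def fit(self, samples, labels):
        return self

    def predict(self, samples):
        # Returns a placeholder label for every sample.
        return ["placeholder" for _ in samples]


print(missing_model_methods(DummyModel()))  # nothing missing
print(missing_model_methods(object()))      # both methods missing
```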

To use mlconjug3 in a project and train a new model:

The following sample script demonstrates how to train your own model using the mlconjug3 library. The script uses the ConjugatorTrainer class, which wraps the scikit-learn classifier, feature selector and vectorizer into a single object, making it easy to train, predict and evaluate the model.

The script starts by importing the necessary modules and setting the parameters for the model.

The parameters are:

lang: the language of the conjugator. The default language is ‘fr’ for French.
output_folder: the location where the trained model will be saved.
split_proportion: the proportion of the data that will be used for training. The remaining data will be used for testing.
dataset: the dataset object which contains the data for the model.
model: the model object which wraps the classifier, feature selector and vectorizer.
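To illustrate what split_proportion means, the following sketch divides a collection into a training part and a testing part. It is a simplified illustration only: split_by_proportion is a hypothetical function, and the way mlconjug3's DataSet actually splits its data may differ (for example, it may balance verb templates across the two parts).

```python
import random


def split_by_proportion(items, proportion, seed=42):
    """Hypothetical helper: shuffle, then cut the items at `proportion`."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * proportion)
    return shuffled[:cut], shuffled[cut:]


train, test = split_by_proportion(range(10), 0.8)
print(len(train), len(test))
```

With a proportion of 0.8, eight of the ten items go to training and the remaining two to testing.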

Once the parameters are set, the script creates an instance of the ConjugatorTrainer class, passing the parameters as keyword arguments.

The script then calls the train() method on the ConjugatorTrainer object to train the model. This step may take a while, depending on the size of the dataset and the complexity of the model.

Once the model is trained, the script calls the predict() method to make predictions on the test data.

It then calls the evaluate() method to evaluate the model’s performance.

Finally, the script saves the model to the specified output folder.

It is important to note that this script uses the default parameters for the model, and these may not be optimal for your specific use case. We recommend experimenting with different parameters and evaluating the model’s performance to find the best configuration for your use case.

"""
Script to train a new French Conjugator model
"""
import mlconjug3
from mlconjug3.feature_extractor import extract_verb_features
from functools import partial

lang = "fr"

params = {'lang': lang,
          'output_folder': "models",
          'split_proportion': 0.8,
          'dataset': mlconjug3.DataSet(mlconjug3.Verbiste(lang).verbs),
          'model': mlconjug3.Model(
              language=lang,
              vectorizer=mlconjug3.CountVectorizer(analyzer=partial(extract_verb_features, lang=lang, ngram_range=(2, 7)),
                                         binary=True, lowercase=False),
              feature_selector=mlconjug3.SelectFromModel(mlconjug3.LinearSVC(penalty = "l1", max_iter = 12000, dual = False, verbose = 0)),
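              # NOTE: scikit-learn 1.1 renamed the loss "log" to "log_loss" and 1.3 removed the old name; adjust to your installed version.
              classifier=mlconjug3.SGDClassifier(loss = "log", penalty = "elasticnet", l1_ratio = 0.15, max_iter = 40000, alpha = 1e-5, verbose = 0)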
          )
         }

ct = mlconjug3.utils.ConjugatorTrainer(**params)

print("training model...")
ct.train()
print("model has been trained.")

ct.predict()

print("evaluating model")
ct.evaluate()

print("saving model")
ct.save()

Alternatively, you can load the model parameters from a YAML file using PyYAML, Hydra or any other library.

Here is an example of a yaml file to store the model settings:

# config.yaml

language: fr

output_folder: models

split_proportion: 0.8

vectorizer:
    type: mlconjug3.CountVectorizer
    kwargs:
        analyzer:
            type: functools.partial
            kwargs:
                func: mlconjug3.feature_extractor.extract_verb_features
                lang: fr
                ngram_range: [2, 7]
        binary: true
        lowercase: false

feature_selector:
    type: mlconjug3.SelectFromModel
    kwargs:
        estimator:
            type: mlconjug3.LinearSVC
            kwargs:
                penalty: l1
                max_iter: 12000
                dual: false
                verbose: 0

classifier:
    type: mlconjug3.SGDClassifier
    kwargs:
        loss: log
        penalty: elasticnet
        l1_ratio: 0.15
        max_iter: 40000
        alpha: 1e-5
        verbose: 0
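A file laid out like this still has to be turned into Python objects after it is parsed. The sketch below shows one possible way to do that for the "type" plus "kwargs" layout; it is an assumption about how such a file could be consumed, not a loader shipped with mlconjug3. The functions resolve and instantiate are hypothetical, and the example specs use standard library callables so that the block runs without mlconjug3 or a YAML parser. Note that functools.partial takes its function as a positional argument, so the "func" entry is handled separately.

```python
import functools
import importlib


def resolve(dotted_path):
    """Import the object named by a dotted path such as 'functools.partial'."""
    module_name, _, attribute = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attribute)


def instantiate(spec):
    """Recursively build objects from {'type': ..., 'kwargs': ...} mappings.

    Anything that is not such a mapping is returned unchanged.
    """
    if not (isinstance(spec, dict) and "type" in spec):
        return spec
    factory = resolve(spec["type"])
    kwargs = {name: instantiate(value)
              for name, value in spec.get("kwargs", {}).items()}
    args = []
    # functools.partial only accepts its function positionally.
    if factory is functools.partial and "func" in kwargs:
        args.append(resolve(kwargs.pop("func")))
    return factory(*args, **kwargs)


# Stand-in specs with the same shape as the entries in the YAML file.
parse_binary = instantiate({
    "type": "functools.partial",
    "kwargs": {"func": "builtins.int", "base": 2},
})
sort_by_length = instantiate({
    "type": "functools.partial",
    "kwargs": {
        "func": "builtins.sorted",
        "key": {"type": "functools.partial",
                "kwargs": {"func": "builtins.len"}},
    },
})

print(parse_binary("101"))
print(sort_by_length(["ccc", "a", "bb"]))
```

With the file above, the parsed "vectorizer", "feature_selector" and "classifier" entries could each be passed to instantiate() in the same way, provided mlconjug3 is installed.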

In conclusion, the mlconjug3 library provides a simple and flexible interface for conjugating verbs using machine learning models, with support for multiple languages and the ability to train custom models.

The main class of the library is the Conjugator, which can be used to conjugate verbs in the supported languages using the pre-trained models, or custom models trained using the ConjugatorTrainer class.