Hebrew OT Number Morphology

Analysis of the 6,801 number tokens (class_='num') in the MACULA Hebrew WLC dataset. Covers cardinal and ordinal numbers, the gender-polarity rule, state distribution, book/genre distribution, and the numbers that dominate census and chronological texts.

Key pedagogical point (BBH Ch11): Hebrew cardinal numbers 3–10 follow a reverse-gender rule. The form without the final ה counts feminine nouns (שָׁלֹשׁ), and the form with ה counts masculine nouns (שְׁלֹשָׁה). Numbers 1–2 agree normally; the tens (20, 30, … 90) are invariable.
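A minimal sketch of the reverse-gender rule as a lookup table, for checking it by hand. The names `ABSOLUTE_FORMS`, `cardinal_for_noun`, and `consonants` are hypothetical and not part of `bible_grammar`; the forms are the standard absolute-state cardinals 1–10. One wrinkle the sketch makes visible: the form of 8 used with feminine nouns, שְׁמֹנֶה, also ends in the letter ה, so "with ה" is best read as "with the -āh ending".

```python
import unicodedata

# Hypothetical lookup (not part of bible_grammar): absolute-state cardinals 1-10.
# Each value is (form used with masculine nouns, form used with feminine nouns).
ABSOLUTE_FORMS = {
    1: ("אֶחָד", "אַחַת"),
    2: ("שְׁנַיִם", "שְׁתַּיִם"),
    3: ("שְׁלֹשָׁה", "שָׁלֹשׁ"),
    4: ("אַרְבָּעָה", "אַרְבַּע"),
    5: ("חֲמִשָּׁה", "חָמֵשׁ"),
    6: ("שִׁשָּׁה", "שֵׁשׁ"),
    7: ("שִׁבְעָה", "שֶׁבַע"),
    8: ("שְׁמֹנָה", "שְׁמֹנֶה"),
    9: ("תִּשְׁעָה", "תֵּשַׁע"),
    10: ("עֲשָׂרָה", "עֶשֶׂר"),
}


def cardinal_for_noun(n: int, noun_gender: str) -> str:
    """Return the absolute cardinal n that counts a noun of the given gender."""
    if noun_gender not in ("masculine", "feminine"):
        raise ValueError(f"unexpected gender: {noun_gender!r}")
    masc_noun_form, fem_noun_form = ABSOLUTE_FORMS[n]
    return masc_noun_form if noun_gender == "masculine" else fem_noun_form


def consonants(word: str) -> str:
    """Strip vowel points and accents (Unicode category Mn), keeping the letters."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# For 3-10 the form that counts MASCULINE nouns carries the final ה.
for n in range(3, 11):
    form = cardinal_for_noun(n, "masculine")
    print(n, form, consonants(form).endswith("ה"))
```

For example, `cardinal_for_noun(7, "masculine")` returns שִׁבְעָה, the form that counts a masculine noun such as בָּנִים.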

Data source: MACULA Hebrew WLC (macula-hebrew/ submodule), filtered to class_='num' and lang='H'.
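The filter described above amounts to a boolean mask over a word-level table. This is a minimal pandas sketch, assuming one row per word with `class_` and `lang` columns; `filter_hebrew_numbers` is a hypothetical helper, the rows are invented, and the loader behind `ot_number_data()` may work differently. The `'A'` value stands in for a non-Hebrew (Aramaic) token.

```python
import pandas as pd


def filter_hebrew_numbers(words: pd.DataFrame) -> pd.DataFrame:
    """Keep Hebrew-language tokens whose word class is 'num'.

    Assumes one row per word with 'class_' and 'lang' columns; the exact
    loader used by bible_grammar may differ.
    """
    mask = (words["class_"] == "num") & (words["lang"] == "H")
    return words.loc[mask].reset_index(drop=True)


# Invented rows for illustration only (not real MACULA data).
toy = pd.DataFrame({
    "lemma":  ["בֵּן", "שְׁנַיִם", "חַד", "שֶׁבַע"],
    "class_": ["noun", "num", "num", "num"],
    "lang":   ["H", "H", "A", "H"],
})
print(filter_hebrew_numbers(toy))
```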

# @title Colab setup (runs only on Google Colab)
import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    import subprocess, os
    # Clone the repo so all source and data paths work
    if not os.path.isdir("/content/berean-bible-bots"):
        subprocess.run(
            ["git", "clone", "--depth", "1",
             "https://github.com/dnovick/berean-bible-bots.git",
             "/content/berean-bible-bots"],
            check=True,
        )
    os.chdir("/content/berean-bible-bots")
    sys.path.insert(0, "/content/berean-bible-bots/src")
    # Install Python dependencies
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-q", "-r",
         "binder/requirements.txt"],
        check=True,
    )
    # Download processed data files (~295 MB, one-time)
    subprocess.run(["bash", "binder/postBuild"], check=True)
    print("Colab environment ready.")

import sys
sys.path.insert(0, '../../../src')

from bible_grammar import (
    ot_number_data, ot_number_frequency, ot_top_number_lemmas,
    ot_number_gender_profile, ot_number_state_profile,
    ot_number_book_distribution, ot_number_genre_profile,
    ot_number_polarity_table,
    print_ot_number_overview, print_ot_number_frequency,
    print_ot_number_gender, print_ot_number_state,
    print_ot_number_book_distribution, print_ot_number_genre_profile,
    print_ot_number_polarity,
    ot_number_frequency_chart, ot_number_genre_chart, ot_number_book_chart,
    CARDINALS_1_10,
)
import pandas as pd

1. Overview

print_ot_number_overview()

df = ot_number_data()
print("Sample of number tokens:")
df[['text','lemma','strong_h','gloss','gender','state','number','book']].head(10)

2. Frequency: Most Common Number Lemmas

print_ot_number_frequency(20)

# אֶחָד (one) is the most frequent number by a large margin
# שְׁנַיִם (two) and the common cardinals 3-7 follow
ot_number_frequency_chart(15)
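The frequency table is essentially a `value_counts()` over lemmas plus a percentage column. A minimal sketch under that assumption; `lemma_frequency` is a hypothetical helper and the six tokens are invented, not corpus counts.

```python
import pandas as pd


def lemma_frequency(tokens: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Count tokens per lemma and add each lemma's share of all tokens (percent)."""
    counts = tokens["lemma"].value_counts()  # sorted by count, highest first
    table = pd.DataFrame({
        "count": counts,
        "pct": (counts / counts.sum() * 100).round(1),
    })
    return table.head(n)


# Toy data: 6 invented tokens, not real frequencies.
toy = pd.DataFrame({"lemma": ["אֶחָד", "אֶחָד", "אֶחָד", "שְׁנַיִם", "שְׁנַיִם", "שֶׁבַע"]})
print(lemma_frequency(toy, n=3))
```

Passing `ot_number_data()` instead of `toy` should give comparable output, provided that DataFrame exposes a `lemma` column as the sample cell suggests.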

3. Gender-Polarity Rule (Cardinals 1–10)

This is the most distinctive feature of Hebrew numbers. For cardinals 3–10, the masculine-looking form (no feminine ending) is used with feminine nouns, and the feminine-looking form (-ה ending) is used with masculine nouns. This is the opposite of ordinary adjective agreement, where a masculine noun takes a masculine form.

Number	With masc. noun	With fem. noun
three	שְׁלֹשָׁה בָּנִים "three sons" (form with ה)	שָׁלֹשׁ נָשִׁים "three women" (form without ה)

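Numbers 1–2 agree normally. In 11–19 the ten agrees with the noun (עָשָׂר with masculine nouns, עֶשְׂרֵה with feminine nouns), while the unit keeps its own behavior: normal agreement in 11–12, reverse gender in 13–19. The tens from 20 upward are invariable.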
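The agreement rules for 1–99 can be restated as a small decision function. This is a hypothetical helper that only encodes the rules described here, plus the standard point that in a compound such as 23 the unit still follows its 1–10 rule; it reads no data, and values outside 1–99 are rejected.

```python
def agreement_pattern(n: int) -> str:
    """Return the gender-agreement rule that governs cardinal n (1-99).

    Hypothetical helper: it restates the textbook rules and reads no data.
    """
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1-99")
    if n <= 2:
        return "normal agreement"
    if n <= 10:
        return "reverse gender (polarity)"
    if n <= 19:
        # The ten (עָשָׂר / עֶשְׂרֵה) agrees with the noun; the unit keeps its own rule.
        unit_rule = "normal agreement" if n <= 12 else "polarity"
        return f"unit: {unit_rule}; ten: agrees with noun"
    if n % 10 == 0:
        return "invariable"
    # e.g. 23: the ten is invariable, the unit 3 follows the 1-10 rule.
    return "tens invariable; unit follows the 1-10 rule"


for n in (1, 5, 11, 14, 20, 23):
    print(n, "->", agreement_pattern(n))
```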

print_ot_number_polarity()

# Raw polarity table for custom analysis
ot_number_polarity_table()
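To build a lemma-by-gender table from raw tokens yourself, `pd.crosstab` does the counting and `normalize="index"` turns each row into proportions. A sketch on invented rows, assuming token-level `lemma` and `gender` columns like those shown in the sample cell; it is not necessarily how `ot_number_polarity_table()` is computed.

```python
import pandas as pd

# Invented tokens: each numeral's own lemma and grammatical gender.
toy = pd.DataFrame({
    "lemma":  ["שָׁלֹשׁ", "שָׁלֹשׁ", "שָׁלֹשׁ", "שֶׁבַע", "שֶׁבַע"],
    "gender": ["masculine", "masculine", "feminine", "feminine", "masculine"],
})

# Counts per lemma x gender, then each row converted to proportions.
counts = pd.crosstab(toy["lemma"], toy["gender"])
shares = pd.crosstab(toy["lemma"], toy["gender"], normalize="index").round(2)
print(counts)
print(shares)
```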

4. Gender and State Distribution

# 'both' = the token can be read as either gender depending on context
print_ot_number_gender()

# State: absolute vs. construct. A number in the construct state typically
# stands directly before the noun it counts, e.g. שְׁלֹשֶׁת יָמִים ("three days").
print_ot_number_state()
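To see which lemmas lean toward the construct state, average a boolean `state == 'construct'` flag per lemma. A minimal sketch; `construct_share` is a hypothetical helper, the rows are invented, and the `'construct'` label matches the filter used in the construct-state cell.

```python
import pandas as pd


def construct_share(tokens: pd.DataFrame) -> pd.Series:
    """Fraction of each lemma's tokens whose 'state' is 'construct', highest first."""
    is_construct = tokens["state"].eq("construct")
    return is_construct.groupby(tokens["lemma"]).mean().sort_values(ascending=False)


# Invented rows for illustration only.
toy = pd.DataFrame({
    "lemma": ["שְׁנַיִם", "שְׁנַיִם", "שְׁנַיִם", "שְׁנַיִם", "מֵאָה", "מֵאָה"],
    "state": ["construct", "construct", "construct", "absolute", "absolute", "absolute"],
})
print(construct_share(toy))
```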

# A number in the construct state typically stands before the noun it counts,
# as in שְׁנֵי בָנִים ("two sons"). Compound numbers such as עֶשְׂרִים וּשְׁנַיִם
# ("twenty-two") are joined with ו rather than by a construct chain.
df = ot_number_data()
construct = df[df['state'] == 'construct']
print(f"Construct-state number tokens: {len(construct)}")
print("\nTop construct lemmas:")
print(construct['lemma'].value_counts().head(10).to_string())

5. Book Distribution: Census and Chronological Texts

print_ot_number_book_distribution()

# Numbers dominates — it contains the two census counts (chs. 1-4, 26)
# 1 Chronicles is dense with genealogical numbers
# Ezekiel: temple vision dimensions (chs. 40-48)
ot_number_book_chart(20)
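The book-chart comments attribute much of Ezekiel's count to the temple vision (chs. 40-48). One way to check a claim like that is to measure the share of a book's number tokens that fall in a chapter range. This sketch assumes hypothetical `book` and `chapter` columns and an `'Ezk'` book code, neither of which is confirmed by the dataset description; the rows are invented.

```python
import pandas as pd


def chapter_range_share(tokens: pd.DataFrame, book: str, first: int, last: int) -> float:
    """Share of a book's tokens that fall in chapters first..last (inclusive)."""
    in_book = tokens[tokens["book"] == book]
    if in_book.empty:
        return 0.0
    in_range = in_book["chapter"].between(first, last)  # inclusive on both ends by default
    return float(in_range.mean())


# Invented rows: 'Ezk' and the 'chapter' column are illustrative assumptions.
toy = pd.DataFrame({
    "book":    ["Ezk", "Ezk", "Ezk", "Ezk", "Gen"],
    "chapter": [4, 40, 41, 45, 5],
})
print(chapter_range_share(toy, "Ezk", 40, 48))
```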

6. Genre Profile

print_ot_number_genre_profile()

# Historical books dominate (~48% of number tokens), driven by temple-construction
# measurements in Kings/Chronicles and by census and genealogy data. Wisdom books
# contribute only ~2.4%: poetry and proverbs rarely cite quantities.
ot_number_genre_chart()
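Genre shares come from mapping each book to a genre and counting. The mapping below is partial and hypothetical (book codes other than `'Gen'`, and all genre labels, are assumptions), so the grouping behind `ot_number_genre_profile()` may differ; the rows are invented.

```python
import pandas as pd

# Hypothetical, partial book-to-genre mapping (codes and labels are illustrative).
GENRE = {"Gen": "Pentateuch", "Num": "Pentateuch", "1Ki": "Historical",
         "1Ch": "Historical", "Pro": "Wisdom"}


def genre_shares(tokens: pd.DataFrame) -> pd.Series:
    """Percent of tokens per genre; books missing from GENRE are labeled 'Other'."""
    genre = tokens["book"].map(GENRE).fillna("Other")
    return (genre.value_counts(normalize=True) * 100).round(1)


toy = pd.DataFrame({"book": ["Num", "Num", "1Ki", "1Ch", "Pro", "Jon"]})
print(genre_shares(toy))
```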

7. Ad-hoc Queries

# Numbers in Genesis
gen_nums = ot_number_data(book='Gen')
print(f"Genesis number tokens: {len(gen_nums)}")
print(gen_nums['lemma'].value_counts().head(10).to_string())

# Ordinal numbers only (second, third, fourth...). Hebrew has distinct ordinal
# forms only up to "tenth"; higher ordinals reuse the cardinal, so this gloss
# filter finds 2nd-10th only.
df = ot_number_data()
ordinals = df[df['gloss'].str.contains('second|third|fourth|fifth|sixth|seventh|eighth|ninth|tenth', na=False, case=False)]
print(f"Ordinal-gloss number tokens: {len(ordinals)}")
print(ordinals[['text','lemma','gloss','book']].head(10).to_string())