Hebrew OT Number Morphology
Analysis of the 6,801 number tokens (class_='num') in the MACULA Hebrew WLC dataset.
Covers cardinal and ordinal numbers, the gender-polarity rule, state distribution,
book/genre distribution, and the numbers that dominate census and chronological texts.
Key pedagogical point (BBH Ch. 11): Hebrew cardinal numbers 3–10 follow a reverse-gender rule: the form without ה counts feminine nouns, and the form with ה counts masculine nouns. Numbers 1–2 agree normally; the tens (20, 30, …) are invariable.
Data source: MACULA Hebrew WLC (macula-hebrew/ submodule), filtered to class_='num' and lang='H'.
import sys
sys.path.insert(0, '../../../src')
from bible_grammar import (
ot_number_data, ot_number_frequency, ot_top_number_lemmas,
ot_number_gender_profile, ot_number_state_profile,
ot_number_book_distribution, ot_number_genre_profile,
ot_number_polarity_table,
print_ot_number_overview, print_ot_number_frequency,
print_ot_number_gender, print_ot_number_state,
print_ot_number_book_distribution, print_ot_number_genre_profile,
print_ot_number_polarity,
ot_number_frequency_chart, ot_number_genre_chart, ot_number_book_chart,
CARDINALS_1_10,
)
import pandas as pd
1. Overview
print_ot_number_overview()
df = ot_number_data()
print("Sample of number tokens:")
df[['text','lemma','strong_h','gloss','gender','state','number','book']].head(10)
2. Frequency — Most Common Number Lemmas
print_ot_number_frequency(20)
# אֶחָד (one) is the most frequent number by a large margin
# שְׁנַיִם (two) and the common cardinals 3-7 follow
ot_number_frequency_chart(15)
3. Gender-Polarity Rule (Cardinals 1–10)
This is the most distinctive feature of Hebrew numbers. For cardinals 3–10, the masculine form (no feminine ending) is used with feminine nouns, and the feminine form (-ה ending) is used with masculine nouns — the opposite of what English speakers expect.
| Number | With masc. noun | With fem. noun |
|---|---|---|
| three | שְׁלֹשָׁה בָּנִים (form with ה, feminine in form) | שָׁלֹשׁ נָשִׁים (form without ה, masculine in form) |
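Numbers 1–2 agree with their noun in the ordinary way. Numbers 11–19 are compounds whose "ten" element agrees with the noun (עָשָׂר with masculine nouns, עֶשְׂרֵה with feminine nouns). The tens (20, 30, …, 90) are invariable.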
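The rule for 1–10 can be written down as a small decision function. This is only a sketch of the textbook rule as stated above; `cardinal_form` is a hypothetical helper, not part of `bible_grammar`, and it describes the expected form rather than looking anything up in the MACULA data.

```python
# Hypothetical helper for illustration; it is not part of bible_grammar.
def cardinal_form(value: int, noun_gender: str) -> str:
    """Describe which form of a cardinal (1-10) accompanies a noun of the given gender."""
    if noun_gender not in ("masculine", "feminine"):
        raise ValueError("noun_gender must be 'masculine' or 'feminine'")
    if value in (1, 2):
        # 1 and 2 agree with the noun in the ordinary way
        return f"{noun_gender} form (ordinary agreement)"
    if 3 <= value <= 10:
        # 3-10 reverse the pattern: the form ending in he counts masculine nouns
        if noun_gender == "masculine":
            return "form with he-ending (polarity)"
        return "form without he-ending (polarity)"
    raise ValueError("this sketch covers cardinals 1-10 only")


for value, gender in [(1, "feminine"), (3, "masculine"), (3, "feminine")]:
    print(value, gender, "->", cardinal_form(value, gender))
```

Returning a description instead of a Hebrew string keeps the sketch independent of any particular lemma list; a fuller version could map the description onto the forms in `CARDINALS_1_10`, whose structure is not shown here.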
print_ot_number_polarity()
# Raw polarity table for custom analysis
ot_number_polarity_table()
4. Gender and State Distribution
print_ot_number_gender()
# 'both' = the token can be read as either gender depending on context
print_ot_number_state()
# Construct state in numbers: a cardinal often stands in construct before the
# noun it counts, e.g. שְׁלֹשֶׁת יָמִים (three days) or שְׁנֵי בָנִים (two sons)
df_construct = ot_number_data()
construct = df_construct[df_construct['state'] == 'construct']
print(f"Construct-state number tokens: {len(construct)}")
print("\nTop construct lemmas:")
print(construct['lemma'].value_counts().head(10).to_string())
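Raw construct counts favour the most frequent lemmas, so it can help to look at the share of each lemma's tokens that are construct. The sketch below works on plain records of the kind `ot_number_data().to_dict("records")` would produce, assuming the `lemma` and `state` columns used above; the rows are invented and the lemmas are transliterated placeholders, not real counts.

```python
from collections import Counter

# Toy records standing in for ot_number_data().to_dict("records").
# The lemmas are transliterated placeholders and the counts are invented.
rows = [
    {"lemma": "shenayim", "state": "construct"},
    {"lemma": "shenayim", "state": "construct"},
    {"lemma": "shenayim", "state": "absolute"},
    {"lemma": "shalosh", "state": "construct"},
    {"lemma": "shalosh", "state": "absolute"},
    {"lemma": "esrim", "state": "absolute"},
]

# Tokens per lemma, and construct tokens per lemma
totals = Counter(r["lemma"] for r in rows)
constructs = Counter(r["lemma"] for r in rows if r["state"] == "construct")

# Fraction of each lemma's tokens that are in the construct state
construct_share = {lemma: constructs[lemma] / n for lemma, n in totals.items()}

# Print lemmas from highest to lowest construct share
for lemma, share in sorted(construct_share.items(), key=lambda kv: -kv[1]):
    print(f"{lemma:10s} {share:.2f}")
```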
5. Book Distribution — Census and Chronological Texts
print_ot_number_book_distribution()
# Numbers dominates — it contains the two census counts (chs. 1-4, 26)
# 1 Chronicles is dense with genealogical numbers
# Ezekiel: temple vision dimensions (chs. 40-48)
ot_number_book_chart(20)
6. Genre Profile
print_ot_number_genre_profile()
# Historical books dominate (~48%) due to Kings/Chronicles temple construction
# and census/genealogy data. Wisdom books have almost no numbers (~2.4%) —
# poetry and proverbs rarely cite quantities.
ot_number_genre_chart()
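For readers who want to see how a genre percentage of this kind can be derived, here is a minimal sketch: map each token's book to a genre, count, and divide by the total. The `GENRE_OF` mapping, the book codes other than `Gen`, and the token list are all assumptions made up for the example; the grouping that `ot_number_genre_profile` actually uses may differ.

```python
from collections import Counter

# Hypothetical book-to-genre mapping; the library's own grouping may differ.
GENRE_OF = {
    "Gen": "Torah",
    "Num": "Torah",
    "1Ki": "Historical",
    "1Ch": "Historical",
    "Pro": "Wisdom",
    "Ezk": "Prophets",
}

# Invented data: one book code per number token (20 tokens in total).
token_books = ["Num"] * 4 + ["Gen"] * 2 + ["1Ki"] * 3 + ["1Ch"] * 5 + ["Pro"] * 1 + ["Ezk"] * 5

# Count tokens per genre, then convert the counts to percentages of all tokens
genre_counts = Counter(GENRE_OF[book] for book in token_books)
total = sum(genre_counts.values())
genre_pct = {genre: 100 * n / total for genre, n in genre_counts.items()}

for genre, pct in sorted(genre_pct.items(), key=lambda kv: -kv[1]):
    print(f"{genre:12s} {pct:5.1f}%")
```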
7. Ad-hoc Queries
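# Show the helper's docstring (a bare attribute expression mid-cell displays nothing)
print(print_ot_number_frequency.__doc__)
# Numbers in Genesis (ot_number_data is already imported at the top of the notebook)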
gen_nums = ot_number_data(book='Gen')
print(f"Genesis number tokens: {len(gen_nums)}")
print(gen_nums['lemma'].value_counts().head(10).to_string())
# Ordinal numbers only (second, third, fourth...)
df = ot_number_data()
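# Heuristic: this matches substrings of the English gloss, so any gloss that merely
# contains one of these words (e.g. a fraction glossed "a tenth") would match as well
ordinal_pattern = 'second|third|fourth|fifth|sixth|seventh|eighth|ninth|tenth'
ordinals = df[df['gloss'].str.contains(ordinal_pattern, na=False, case=False)]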
print(f"Ordinal-gloss number tokens: {len(ordinals)}")
print(ordinals[['text','lemma','gloss','book']].head(10).to_string())
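Because the ordinal filter is a substring match on glosses, it may also pick up plural fraction glosses such as "two-thirds". One possible tightening, shown here on an invented gloss list rather than on the real data, is to require word boundaries around each ordinal word. The names `ORDINAL_WORDS` and `ORDINAL_RE` are illustrative only.

```python
import re

ORDINAL_WORDS = [
    "second", "third", "fourth", "fifth", "sixth",
    "seventh", "eighth", "ninth", "tenth",
]

# \b on both sides means "thirds" or "tenths" no longer count as matches
ORDINAL_RE = re.compile(r"\b(?:" + "|".join(ORDINAL_WORDS) + r")\b", re.IGNORECASE)

# Invented glosses for illustration; real MACULA glosses may look different.
glosses = ["third", "two-thirds", "the seventh", "seven", "Second"]

matched = [g for g in glosses if ORDINAL_RE.search(g)]
print(matched)
```

Word boundaries do not help with a singular fraction gloss such as "a tenth", which still matches, so the result remains a heuristic. The same pattern string could be passed to `Series.str.contains` with `case=False` in place of the plain alternation.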