Hebrew OT Semantic Domain Analysis
Analysis of the MARBLE SDBH semantic domain annotations in the MACULA Hebrew WLC dataset.
Each annotated Hebrew token carries a coredomain code (one or more space-separated numbers)
drawn from a set of ~170 thematic categories developed by the United Bible Societies
MARBLE (Multilingual Aligned Rich Biblical Encoding) project.
Two annotation columns:
- coredomain: contextual semantic category (e.g. Covenant/berith, Speech/Utterance, Royalty/King); ~158,245 tokens tagged (~33.8% of the Hebrew OT)
- lexdomain: hierarchical lexical domain code (Objects/Events/Referents/Markers); ~240,782 tokens tagged (~51.4% of the Hebrew OT)
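A single token can carry several space-separated codes, so any count over the coredomain column has to split the string first. The sketch below is a minimal illustration of that step, not the bible_grammar implementation. It shows how a coverage percentage like the ones above could be computed. It assumes each token is a plain dict with a coredomain string, where an empty string or None means the token is untagged. The rows and codes are invented.

```python
# Minimal sketch: toy token rows with invented coredomain values.
# Real MACULA rows have many more fields; "" or None means untagged.
tokens = [
    {"lemma": "berit", "coredomain": "046"},
    {"lemma": "hesed", "coredomain": "061 046"},
    {"lemma": "et", "coredomain": ""},
    {"lemma": "melek", "coredomain": None},
]


def parse_codes(raw):
    """Split a space-separated coredomain string into a list of codes."""
    return raw.split() if raw else []


def coverage(rows, column="coredomain"):
    """Fraction of rows carrying at least one code in `column`."""
    if not rows:
        return 0.0
    tagged = sum(1 for row in rows if parse_codes(row.get(column)))
    return tagged / len(rows)


print(parse_codes("061 046"))  # ['061', '046']
print(f"{coverage(tokens):.1%}")  # 50.0%
```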
Key questions this notebook answers:
- What are the most common semantic categories in the OT?
- Which books have the richest theological vocabulary (Deity, Covenant, Worship)?
- How does a theological concept like 'Covenant' distribute across the canon?
- What are the key vocabulary items in each semantic domain?
- How do genre profiles compare across Torah, Wisdom, and Prophets?
Data source: MACULA Hebrew WLC (macula-hebrew/ submodule), MARBLE SDBH word-sense data.
In [ ]:
import sys
sys.path.insert(0, '../../../src')
from bible_grammar import (
ot_domain_data, ot_domain_frequency, ot_top_domain_lemmas,
ot_domain_book_distribution, ot_domain_genre_profile,
ot_domain_comparison, ot_coredomain_profile, ot_theology_profile,
print_ot_domain_overview, print_ot_domain_frequency,
print_ot_top_lemmas, print_ot_domain_book_distribution,
print_ot_domain_genre_profile, print_ot_domain_comparison,
print_ot_theology_profile,
ot_domain_frequency_chart, ot_domain_book_chart,
ot_domain_genre_chart, ot_domain_heatmap,
COREDOMAIN_NAMES, THEOLOGY_COREDOMAINS,
)
import pandas as pd
1. Overview: Coverage and Categories
In [ ]:
print_ot_domain_overview()
In [ ]:
# Show the first 30 coredomain categories (drop .head(30) to list them all)
cat_df = pd.DataFrame([
    {'code': k, 'name': v}
    for k, v in sorted(COREDOMAIN_NAMES.items(), key=lambda x: int(x[0]))
])
print(f"Total coredomain categories: {len(cat_df)}")
cat_df.head(30)
2. Frequency: Most Common Semantic Domains
In [ ]:
# Top 25 domains across the entire OT
print_ot_domain_frequency(top_n=25)
In [ ]:
# Note: object-marker (et), pronouns, and common verbs dominate by raw count.
# Theological domains (Covenant/berith, Royalty, Speech) rank highly when
# grammatical function words are filtered.
ot_domain_frequency_chart(top_n=25)
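The raw ranking depends on two choices that the notebook does not spell out. The first is how a token with several codes is counted. The second is which codes are excluded before ranking. This sketch makes one assumption for each: a multi-code token adds one to every distinct code it carries, and an exclusion set removes codes before ranking. The data are invented. The code "999" is a hypothetical stand-in for a function-word category, not a real SDBH assignment.

```python
from collections import Counter

# Invented coredomain strings, one per tagged token.
coredomains = ["041", "041 050", "046", "041", "050", "999"]

# Hypothetical stand-in for a function-word category to exclude.
EXCLUDE = {"999"}


def domain_frequency(values, exclude=frozenset(), top_n=None):
    """Rank codes; a multi-code token adds one to each distinct code."""
    counts = Counter()
    for raw in values:
        for code in set((raw or "").split()):
            if code not in exclude:
                counts[code] += 1
    return counts.most_common(top_n)


print(domain_frequency(coredomains))
print(domain_frequency(coredomains, exclude=EXCLUDE, top_n=2))  # [('041', 3), ('050', 2)]
```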
3. Theological Domain Profiles
Twelve pre-built theological clusters group related coredomain codes. Clusters can overlap; for example, Land/Earth (097) belongs to both Creation and Exile/Return.
| Cluster | Domains |
|---|---|
| Divinity | God/Deity (050), Divine Name/LORD (055) |
| Covenant | Covenant/berith (046), Faithfulness/Hesed (061), Oath/Vow (118) |
| Worship | Altar/Sacrifice (149), Music/Psalm (111), Praise/Thanks (130), Prayer (131) |
| Priesthood | Priesthood (132), Ritual Uncleanness (147), Washing (122), Holiness (079) |
| Prophecy | Prophecy/Oracle (134), Speech/Utterance (041), Signs/Wonders (159) |
| Wisdom | Wisdom/Knowledge (199), Plans/Counsel (124), Deceit/Falsehood (184) |
| Kingship | Royalty/King (015, 170, 095), Throne/Kingdom (148) |
| Salvation | Salvation/Rescue (152), Refuge (047), Memory/Remembrance (143) |
| Creation | Creation (028), Land/Earth (097, 185) |
| Warfare | Battle/Warfare (188), Violence/Sword (006), Enemy (044), Siege (158) |
| Justice/Law | Statutes/Law (098), Sin/Blood (145), Ethics/Good-Evil (017) |
| Exile/Return | Exile (070), Idols (085), Land/Earth (097) |
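A cluster profile can be sketched as a mapping from each cluster name to a set of codes. A token counts toward a cluster when its codes intersect that set. The CLUSTERS dict below copies three rows of the table. The real structure of THEOLOGY_COREDOMAINS is not shown in this notebook, so treat this layout as an assumption. Because clusters overlap, one token can count toward several clusters, so cluster totals should not be summed into a token count.

```python
from collections import Counter

# Three rows of the table above as a plain mapping (structure assumed).
CLUSTERS = {
    "Covenant": {"046", "061", "118"},
    "Creation": {"028", "097", "185"},
    "Exile/Return": {"070", "085", "097"},
}


def cluster_profile(coredomains, clusters):
    """Count tokens having at least one code in each cluster.

    A token counts once per cluster even if several of its codes fall
    in it, but it can count toward more than one cluster.
    """
    profile = Counter()
    for raw in coredomains:
        codes = set((raw or "").split())
        for name, members in clusters.items():
            if codes & members:
                profile[name] += 1
    return profile


# Invented sample; "999" is a dummy code outside every cluster.
toy = ["046", "046 061", "097", "070", "999"]
print(cluster_profile(toy, CLUSTERS))
```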
In [ ]:
print_ot_theology_profile('Covenant')
In [ ]:
print_ot_theology_profile('Worship')
In [ ]:
print_ot_theology_profile('Divinity')
4. Domain Vocabulary: Top Lemmas per Domain
In [ ]:
# Covenant/berith (046)
print_ot_top_lemmas('Covenant/berith', top_n=15)
In [ ]:
# Speech/Utterance (041) — אָמַר dominates
print_ot_top_lemmas('Speech/Utterance', top_n=15)
In [ ]:
# Wisdom/Knowledge (199) — חָכְמָה, דַּעַת, בִּינָה
print_ot_top_lemmas('Wisdom/Knowledge', top_n=15)
In [ ]:
# Prophecy/Oracle (134) — נְבוּאָה, נָבִיא, נְאֻם
print_ot_top_lemmas('Prophecy/Oracle', top_n=15)
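Each top-lemma listing is essentially a filtered frequency count. You keep the tokens whose coredomain includes the target code, then count their lemmas. The sketch below takes a numeric code directly, whereas print_ot_top_lemmas takes a domain name. The library presumably resolves names through COREDOMAIN_NAMES, but that is an assumption. Lemmas are transliterated, and the (lemma, coredomain) pairs are invented.

```python
from collections import Counter

# Invented (lemma, coredomain) pairs with transliterated lemmas.
rows = [
    ("amar", "041"), ("dabar", "041"), ("amar", "041"),
    ("amar", "041 134"), ("navi", "134"), ("neum", "134"),
]


def top_domain_lemmas(rows, code, top_n=15):
    """Rank lemmas by how many of their tokens carry `code`."""
    counts = Counter(
        lemma for lemma, raw in rows if code in (raw or "").split()
    )
    return counts.most_common(top_n)


print(top_domain_lemmas(rows, "041", top_n=2))  # [('amar', 3), ('dabar', 1)]
```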
5. Book Distribution: Where Does Each Domain Concentrate?
In [ ]:
# Covenant vocabulary — where is it most dense?
print_ot_domain_book_distribution('Covenant/berith')
In [ ]:
ot_domain_book_chart('Covenant/berith')
In [ ]:
# Wisdom vocabulary — Proverbs, Job, Ecclesiastes dominate
print_ot_domain_book_distribution('Wisdom/Knowledge')
In [ ]:
ot_domain_book_chart('Wisdom/Knowledge')
In [ ]:
# Warfare vocabulary
print_ot_domain_book_distribution('Battle/Warfare')
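Raw counts favor long books, so "most dense" is better read as a rate. This notebook does not say whether print_ot_domain_book_distribution normalizes by book length. The sketch below shows one common choice, hits per 1,000 tokens, on invented data.

```python
from collections import Counter

# Invented token stream: (book, coredomain string); "" means untagged.
tokens = [
    ("Gen", "046"), ("Gen", ""), ("Gen", ""), ("Gen", ""),
    ("Deu", "046"), ("Deu", "046 061"), ("Deu", ""), ("Deu", ""),
    ("Rut", ""), ("Rut", ""),
]


def book_distribution(tokens, code, per=1000):
    """Return {book: (raw hits, hits per `per` tokens of that book)}.

    Dividing by book length keeps long books from ranking first merely
    because they contain more words.
    """
    totals = Counter(book for book, _ in tokens)
    hits = Counter(book for book, raw in tokens if code in (raw or "").split())
    return {book: (hits[book], hits[book] / n * per) for book, n in totals.items()}


for book, (raw, rate) in book_distribution(tokens, "046").items():
    print(f"{book}: {raw} hits, {rate:.0f} per 1,000 tokens")
```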
6. Genre Profile
In [ ]:
# Covenant — mostly Torah and Prophets
print_ot_domain_genre_profile('Covenant/berith')
ot_domain_genre_chart('Covenant/berith')
In [ ]:
# Wisdom: expected to concentrate in the Wisdom books
print_ot_domain_genre_profile('Wisdom/Knowledge')
ot_domain_genre_chart('Wisdom/Knowledge')
In [ ]:
# Prayer — compare across genres
print_ot_domain_genre_profile('Prayer')
ot_domain_genre_chart('Prayer')
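A genre profile groups books into genres and asks what share of a domain's tokens falls in each genre. The GENRE mapping below is hypothetical and deliberately partial. The grouping that ot_domain_genre_profile actually uses is not shown in this notebook. Books missing from the map fall into "Other".

```python
from collections import Counter

# Hypothetical, deliberately partial book-to-genre map.
GENRE = {
    "Gen": "Torah", "Deu": "Torah",
    "Job": "Wisdom", "Pro": "Wisdom",
    "Isa": "Prophets", "Jer": "Prophets",
}


def genre_profile(tokens, code, genre_of=GENRE):
    """Share of a domain's tokens in each genre; shares sum to 1."""
    counts = Counter(
        genre_of.get(book, "Other")
        for book, raw in tokens
        if code in (raw or "").split()
    )
    total = sum(counts.values())
    return {genre: n / total for genre, n in counts.items()} if total else {}


# Invented tokens: (book, coredomain string).
toy = [("Pro", "199"), ("Pro", "199"), ("Job", "199"), ("Exo", "199"), ("Deu", "046")]
print(genre_profile(toy, "199"))  # {'Wisdom': 0.75, 'Other': 0.25}
```

This share-of-domain measure shows where a domain's tokens occur. It does not control for genre size, however, and genres differ greatly in length. Dividing each genre's count by that genre's total token count would be a useful complement.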
7. Book Comparison: Semantic Domain Fingerprints
In [ ]:
# Compare Deuteronomy, Isaiah, Psalms, and Proverbs domain profiles
print_ot_domain_comparison(['Deu', 'Isa', 'Psa', 'Pro'], top_n=15)
In [ ]:
# Heatmap visualization
ot_domain_heatmap(['Deu', 'Isa', 'Psa', 'Pro'], top_n=15)
In [ ]:
# Pentateuch comparison
print_ot_domain_comparison(['Gen', 'Exo', 'Lev', 'Num', 'Deu'], top_n=12)
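Comparing books side by side, as the comparison table and heatmap do, needs a book-by-domain matrix on a common scale. This sketch builds one from invented tokens. Each cell is the fraction of a book's tagged tokens that carry a given code. Both the normalization and the column selection are assumptions, not a description of ot_domain_heatmap; the columns are the top_n codes pooled across the chosen books. A row can sum to more than 1 because a token may carry several codes.

```python
from collections import Counter


def domain_matrix(tokens, books, top_n=15):
    """Return (columns, {book: [share per column]}) for the given books.

    Each value is the fraction of a book's tagged tokens that carry the
    column's code, so books of different length share one scale.
    """
    code_counts = {b: Counter() for b in books}
    tagged = Counter()
    for book, raw in tokens:
        codes = set((raw or "").split())
        if book in code_counts and codes:
            code_counts[book].update(codes)
            tagged[book] += 1
    # Columns: the top_n codes pooled across the selected books.
    pooled = Counter()
    for counts in code_counts.values():
        pooled.update(counts)
    columns = [code for code, _ in pooled.most_common(top_n)]
    matrix = {
        b: [code_counts[b][c] / tagged[b] if tagged[b] else 0.0 for c in columns]
        for b in books
    }
    return columns, matrix


# Invented tokens: (book, coredomain string).
toy = [
    ("Deu", "046"), ("Deu", "098"), ("Deu", "046 061"), ("Deu", ""),
    ("Pro", "199"), ("Pro", "199"), ("Pro", "017"), ("Pro", "046"),
]
cols, m = domain_matrix(toy, ["Deu", "Pro"], top_n=3)
print(cols)
for book, row in m.items():
    print(book, [round(v, 2) for v in row])
```

A plotting library could render this matrix as a heatmap, with one row per book and one column per code.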
8. Per-Book Theological Domain Profile
In [ ]:
# Deuteronomy — law, covenant, and ethical vocabulary
print_ot_domain_frequency('Deu', top_n=20)
In [ ]:
# Psalms — praise, prayer, lament vocabulary
print_ot_domain_frequency('Psa', top_n=20)
In [ ]:
# Isaiah — prophecy, salvation, holiness
print_ot_domain_frequency('Isa', top_n=20)
In [ ]:
# Proverbs — wisdom, ethics, behavior
print_ot_domain_frequency('Pro', top_n=20)
9. Ad-hoc Queries
In [ ]:
# All tokens tagged with Covenant/berith (046)
df = ot_domain_data('Covenant/berith')
print(f"Covenant domain tokens: {len(df)}")
df[['text', 'lemma', 'gloss', 'coredomain', 'book', 'chapter', 'verse']].head(10)
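Once tokens are filtered to a domain, a natural follow-up is to find the verses where that domain clusters. This sketch assumes rows with some of the same columns as the preview above (coredomain, book, chapter, verse). All values are invented. The has_code helper is hypothetical. It splits the coredomain string and tests for whole codes rather than substrings, so it does not depend on every code having the same length.

```python
from collections import Counter

# Invented rows mirroring some columns of the preview above.
rows = [
    {"lemma": "berit", "coredomain": "046", "book": "Gen", "chapter": 17, "verse": 7},
    {"lemma": "berit", "coredomain": "046", "book": "Gen", "chapter": 17, "verse": 7},
    {"lemma": "hesed", "coredomain": "061 046", "book": "Deu", "chapter": 7, "verse": 9},
    {"lemma": "et", "coredomain": None, "book": "Deu", "chapter": 7, "verse": 9},
]


def has_code(raw, code):
    """Whole-code membership test on a space-separated coredomain string."""
    return code in (raw or "").split()


def verses_with_code(rows, code):
    """Count matching tokens per (book, chapter, verse), densest first."""
    hits = Counter(
        (r["book"], r["chapter"], r["verse"])
        for r in rows
        if has_code(r["coredomain"], code)
    )
    return hits.most_common()


print(verses_with_code(rows, "046"))  # [(('Gen', 17, 7), 2), (('Deu', 7, 9), 1)]
```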
In [ ]:
# Covenant vocabulary in Jeremiah specifically
print_ot_theology_profile('Covenant', book='Jer')
In [ ]:
# Exile vocabulary
print_ot_theology_profile('Exile/Return')