Hebrew OT Information Structure

Information structure describes how a clause packages information as given vs. new, topic vs. comment, and foregrounded vs. backgrounded. This notebook quantifies several proxy metrics that are reliably computable from MACULA morphological data:

Metric	What it measures
Parataxis ratio	Proportion of verses whose first token is a wayyiqtol form or the conjunction waw
Hypotaxis/1k	Subordinating connectives (כִּי, אֲשֶׁר, לְמַעַן, etc.) per 1,000 tokens
Fronted ratio	Proportion of verses whose first token has a non-verb-initial clause type
Nominal clause %	Verses containing no verbal form, as % of all verses (proxy for verbless, copula-free clauses)
Inf. construct/1k	Infinitive constructs per 1,000 tokens (proxy for subordinate purpose/time clauses)

Limitation: Full topic/focus analysis requires syntactic annotation beyond what MACULA provides. These metrics are approximations.
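
As a rough illustration of how two of these proxies can be computed, the sketch below builds a tiny hand-made token table and derives a parataxis ratio and a hypotaxis density from it. The column names (`book`, `chapter`, `verse`, `lemma`, `type_`) mirror those used later in this notebook. The token rows, the `SUBORDINATORS` set, and the helper `toy_metrics` are illustrative assumptions, not the `bible_grammar` implementation.

```python
import pandas as pd

# Lemma constants defined once so the table and the lookups share identical strings.
WAW = "וְ"
KI = "כִּי"

# Hypothetical subordinator set for this sketch only; the library's own list may differ.
SUBORDINATORS = {KI, "אֲשֶׁר", "לְמַעַן"}

# Toy token table: one row per token, in reading order (not real MACULA rows).
tokens = pd.DataFrame(
    [
        ("Gen", 1, 1, "בְּ", "preposition"),
        ("Gen", 1, 1, "ברא", "qatal"),
        ("Gen", 1, 2, WAW, "conjunction"),
        ("Gen", 1, 2, "היה", "qatal"),
        ("Gen", 1, 3, "אמר", "wayyiqtol"),
        ("Gen", 1, 3, KI, "conjunction"),
    ],
    columns=["book", "chapter", "verse", "lemma", "type_"],
)


def toy_metrics(df):
    """Return the parataxis ratio (per verse) and hypotaxis density (per 1,000 tokens)."""
    # First token of every verse
    firsts = df.groupby(["book", "chapter", "verse"], sort=False).first()
    # A verse counts once if it opens with a wayyiqtol form or with waw
    starts_paratactic = (firsts["type_"] == "wayyiqtol") | (firsts["lemma"] == WAW)
    n_sub = df["lemma"].isin(SUBORDINATORS).sum()
    return {
        "parataxis_ratio": float(starts_paratactic.mean()),
        "hypotaxis_per1k": float(n_sub / max(len(df), 1) * 1000),
    }


metrics = toy_metrics(tokens)
print(metrics)
```

The ratio is normalized by verses and the density by tokens, which is why the two numbers are not directly comparable across metrics, only across books.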

References:

Heimerdinger, Topic, Focus and Foreground in Ancient Hebrew Narratives (1999)
Longacre, Joseph: A Story of Divine Providence (1989)
IBHS Chapters 7–8 (clause types and hierarchy)

# @title Colab setup (runs only on Google Colab)
import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    import subprocess, os
    # Clone the repo so all source and data paths work
    if not os.path.isdir("/content/berean-bible-bots"):
        subprocess.run(
            ["git", "clone", "--depth", "1",
             "https://github.com/dnovick/berean-bible-bots.git",
             "/content/berean-bible-bots"],
            check=True,
        )
    os.chdir("/content/berean-bible-bots")
    sys.path.insert(0, "/content/berean-bible-bots/src")
    # Install Python dependencies
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-q", "-r",
         "binder/requirements.txt"],
        check=True,
    )
    # Download processed data files (~295 MB, one-time)
    subprocess.run(["bash", "binder/postBuild"], check=True)
    print("Colab environment ready.")

import sys
sys.path.insert(0, '../../../src')

from bible_grammar import (
    ot_information_profile, ot_clause_linking_comparison,
    print_ot_information_profile, print_ot_clause_linking_comparison,
    ot_clause_linking_chart,
)
import pandas as pd

1. Overview: Single Book Profile

# Genesis: highly paratactic narrative
print_ot_information_profile('Gen')

# Deuteronomy: legal/homiletical, should be more hypotactic
print_ot_information_profile('Deu')

# Psalms: poetry, high nominal clause rate expected
print_ot_information_profile('Psa')

2. Parataxis vs. Hypotaxis: Narrative vs. Law vs. Poetry

Hebrew narrative (Genesis, Kings) is predominantly paratactic: clauses are chained with waw-consecutive (wayyiqtol) forms. Legal texts (Leviticus, Deuteronomy) and poetry (Psalms, Job) are expected to use more subordination.

# Torah comparison
print_ot_clause_linking_comparison(['Gen', 'Exo', 'Lev', 'Num', 'Deu'])

ot_clause_linking_chart(['Gen', 'Exo', 'Lev', 'Num', 'Deu'])

# Genre comparison: narrative / poetry and wisdom / prophecy
genre_sample = ['Gen', '1Sa', '2Ki', 'Psa', 'Pro', 'Job', 'Isa', 'Jer', 'Eze']
print_ot_clause_linking_comparison(genre_sample)

ot_clause_linking_chart(genre_sample)

3. Fronted Elements: Topic and Focus

Non-verb-initial clauses signal that a constituent has been fronted (pre-posed) for topic or focus. A high fronted ratio suggests contrastive or emphatic discourse. Wisdom and poetic books (Proverbs, Psalms) and legal texts tend to have more fronted nominal elements than fast-moving narrative.
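
To make the fronted-ratio idea concrete, here is a minimal sketch on a toy table. It assumes one row per token with a `type_` label, as in the rest of this notebook. The `NON_VERB_INITIAL` set below is a stand-in chosen for illustration; the labels in the library's own `_NON_VERB_INITIAL_TYPES` may differ.

```python
import pandas as pd

# Assumed labels for this sketch only; the library's list may differ.
NON_VERB_INITIAL = {"noun", "pronoun", "proper noun", "adjective"}

# Toy data: four verses, two tokens each, in reading order.
toy = pd.DataFrame(
    {
        "verse": [1, 1, 2, 2, 3, 3, 4, 4],
        "type_": [
            "noun", "qatal",                 # verse 1: noun first (fronted)
            "wayyiqtol", "noun",             # verse 2: verb first
            "pronoun", "participle active",  # verse 3: pronoun first (fronted)
            "yiqtol", "noun",                # verse 4: verb first
        ],
    }
)

# Label of the first token in each verse
first_types = toy.groupby("verse")["type_"].first()

# Share of verses that open with a non-verbal element
fronted_ratio = float(first_types.isin(NON_VERB_INITIAL).mean())
print(fronted_ratio)
```

Because only the first token of each verse is inspected, fronting inside a later clause of the same verse is not counted, which is one reason to treat the metric as a proxy.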

df = ot_clause_linking_comparison(genre_sample)
df[['total_verses', 'fronted_ratio', 'nominal_clause_pct']].sort_values(
    'fronted_ratio', ascending=False
)

4. Nominal Clauses: Verbless Sentences

Biblical Hebrew regularly omits the copula ("to be") in present-tense assertions. Wisdom, praise (Psalms), and instructional genres have high nominal-clause rates; narrative has low rates.
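
The verse-level proxy for nominal clauses can be sketched as follows: a verse is counted as nominal when none of its tokens carries a verbal label. The `VERBAL_TYPES` set is copied from the `verbal_types` set used in section 5 of this notebook; the toy rows are invented for illustration.

```python
import pandas as pd

# Verbal labels, copied from the verbal_types set in section 5 of this notebook.
VERBAL_TYPES = {
    "wayyiqtol", "qatal", "yiqtol", "imperative", "cohortative",
    "jussive", "participle active", "participle passive",
    "infinitive construct", "infinitive absolute",
}

# Toy data: verses 1 and 3 contain no verbal form, verse 2 contains a qatal.
toy = pd.DataFrame(
    {
        "verse": [1, 1, 2, 2, 3],
        "type_": ["pronoun", "noun", "noun", "qatal", "adjective"],
    }
)

# True for every verse in which at least one token is verbal
has_verb = toy.groupby("verse")["type_"].apply(
    lambda s: s.isin(VERBAL_TYPES).any()
)

# Verses without any verbal token, as a percentage of all verses
nominal_pct = float((~has_verb).mean() * 100)
print(round(nominal_pct, 2))
```

Note that this operates on whole verses: a verse containing one verbless clause and one verbal clause is counted as verbal, so the percentage is likely a lower bound on the true share of verbless clauses.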

all_books = [
    'Gen', 'Exo', 'Lev', 'Num', 'Deu', 'Jos', 'Jdg',
    '1Sa', '2Sa', '1Ki', '2Ki', 'Job', 'Psa', 'Pro',
    'Ecc', 'Isa', 'Jer', 'Eze', 'Dan', 'Amo', 'Jon'
]
df_all = ot_clause_linking_comparison(all_books)
df_all[['total_verses', 'parataxis_ratio', 'nominal_clause_pct', 'hypotaxis_per1k']].sort_values(
    'nominal_clause_pct', ascending=False
).head(15)

5. Isaiah 1–39 vs. 40–66: Clause-Linking Structure

Isaiah 40–66 (Deutero-Isaiah) is known for its elevated style. Does it also differ from Isaiah 1–39 (Proto-Isaiah) in its parataxis and hypotaxis metrics?

# ot_information_profile is already imported above and is not needed in this cell
from bible_grammar import load_syntax_ot as load_ot_data

isa_df = load_ot_data()
isa_h = isa_df[(isa_df['book'] == 'Isa') & (isa_df['lang'] == 'H')]

# Split Isaiah into chapters 1–39 and 40–66 so each half can be profiled separately
isa1_39 = isa_h[isa_h['chapter'] <= 39].copy()
isa40_66 = isa_h[isa_h['chapter'] >= 40].copy()

def _profile_raw(df, label):
    """Compute the clause-linking proxies for an arbitrary token slice (not a whole book)."""
    from bible_grammar.discourse.information_structure import (
        _SUBORDINATING_LEMMAS, _NON_VERB_INITIAL_TYPES
    )
    total = len(df)
    verse_firsts = df.groupby(['book', 'chapter', 'verse']).first().reset_index()
    total_verses = len(verse_firsts)
    # Count each verse once, even if its first token matches both conditions
    parataxis = int(
        ((verse_firsts['type_'] == 'wayyiqtol') |
         verse_firsts['lemma'].isin({'וְ', 'וּ', 'וַ'})).sum()
    )
    subordinating = df['lemma'].isin(_SUBORDINATING_LEMMAS).sum()
    fronted = int(verse_firsts['type_'].isin(_NON_VERB_INITIAL_TYPES).sum())
    verbal_types = {'wayyiqtol', 'qatal', 'yiqtol', 'imperative', 'cohortative',
                    'jussive', 'participle active', 'participle passive',
                    'infinitive construct', 'infinitive absolute'}
    verse_has_verb = df.groupby(['book', 'chapter', 'verse'])['type_'].apply(
        lambda s: s.isin(verbal_types).any()
    )
    nominal = int((~verse_has_verb).sum())
    return {
        'label': label, 'total_verses': total_verses,
        'parataxis_ratio': round(parataxis / max(total_verses, 1), 4),
        'hypotaxis_per1k': round(subordinating / max(total, 1) * 1000, 2),  # guard against an empty slice
        'fronted_ratio': round(fronted / max(total_verses, 1), 4),
        'nominal_pct': round(nominal / max(total_verses, 1) * 100, 2),
    }

comparison = pd.DataFrame([
    _profile_raw(isa1_39, 'Isa 1–39'),
    _profile_raw(isa40_66, 'Isa 40–66'),
]).set_index('label')
comparison

6. Pentateuch: Law vs. Narrative Sections

Compare all five Torah books. Leviticus and Deuteronomy should show lower parataxis (less narrative) and higher hypotaxis (more subordinate clauses in legal formulas).

df_torah = ot_clause_linking_comparison(['Gen', 'Exo', 'Lev', 'Num', 'Deu'])
df_torah[['total_verses', 'parataxis_ratio', 'hypotaxis_per1k', 'nominal_clause_pct', 'inf_construct_per1k']]

7. Pedagogical Note: LXX Translation Tendencies

The LXX often renders Hebrew parataxis (waw-consecutive) as Greek hypotaxis (participle constructions, subordinate clauses). The high parataxis ratios in narrative books explain why direct translation into Greek required significant syntactic restructuring.

# Books with highest parataxis ratio — these are the ones most affected by LXX restructuring
df_all.sort_values('parataxis_ratio', ascending=False).head(10)[[
    'total_verses', 'parataxis_ratio', 'hypotaxis_per1k'
]]