OT Speaker Attribution & Discourse Particles

Identifies who speaks in the Hebrew Bible using MACULA Hebrew subjref links on speech-verb tokens, and analyzes key Hebrew discourse particles.

Speech verbs tracked: אָמַר (say), דָּבַר (speak), קָרָא (call/proclaim), עָנָה (answer), צָוָה (command), שָׁלַח (send), נָאַם (declare/oracle formula).

Discourse particles analyzed: הִנֵּה (presentative), כִּי (connective/causal), וְ (connective), לָכֵן (consequence), עַתָּה (temporal), גַּם (additive), אַךְ (restrictive).

Sections:

1. OT Speaker Attribution (print_speaker_summary)
2. Divine Speech by Book (print_divine_speech_by_book)
3. Who Speaks in a Book (who_speaks)
4. Divine Speech Verse References (divine_speech_verses)
5. Generate Speaker Report
6. Discourse Particle Tagging
7. Particle Summary by Book
8. Cross-Book כִּי Comparison

# @title Colab setup (runs only on Google Colab)
import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    import subprocess, os
    # Clone the repo so all source and data paths work
    if not os.path.isdir("/content/berean-bible-bots"):
        subprocess.run(
            ["git", "clone", "--depth", "1",
             "https://github.com/dnovick/berean-bible-bots.git",
             "/content/berean-bible-bots"],
            check=True,
        )
    os.chdir("/content/berean-bible-bots")
    sys.path.insert(0, "/content/berean-bible-bots/src")
    # Install Python dependencies
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-q", "-r",
         "binder/requirements.txt"],
        check=True,
    )
    # Download processed data files (~295 MB, one-time)
    subprocess.run(["bash", "binder/postBuild"], check=True)
    print("Colab environment ready.")

import sys
sys.path.insert(0, '../../../src')

import warnings
warnings.filterwarnings('ignore')

import pandas as pd
pd.set_option('display.max_rows', 60)
pd.set_option('display.max_columns', 20)
pd.set_option('display.width', 120)

print('Ready.')

1. OT Speaker Attribution

The ot_speaker module identifies who speaks in the Hebrew Bible using MACULA Hebrew subjref links on speech-verb tokens. This answers: what proportion of each OT book is direct divine speech? Who dominates dialogue in Job, Genesis, Jeremiah? What speech verbs does YHWH use in Isaiah vs Deuteronomy?
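
The attribution idea can be sketched with toy data. This is only an illustration of following a subjref link from a speech verb to its subject; the field names (`id`, `strongs`, `subjref`) and the function `attribute_speakers` are hypothetical and are not taken from the ot_speaker module.

```python
from collections import Counter

# Toy tokens. Field names are illustrative assumptions, not the module's schema.
tokens = [
    {"id": "w1", "strongs": "H0559", "subjref": "w2"},  # speech verb, subject is w2
    {"id": "w2", "strongs": "H3068", "subjref": None},  # YHWH
    {"id": "w3", "strongs": "H1696", "subjref": "w4"},  # speech verb, subject is w4
    {"id": "w4", "strongs": "H4872", "subjref": None},  # Moses
    {"id": "w5", "strongs": "H0559", "subjref": None},  # speech verb with no subject link
]

SPEECH_VERBS = {"H0559", "H1696"}  # a small subset, for the sketch only


def attribute_speakers(tokens):
    """Count speech verbs per speaker by following each verb's subjref link."""
    by_id = {t["id"]: t for t in tokens}
    counts = Counter()
    for t in tokens:
        if t["strongs"] not in SPEECH_VERBS:
            continue
        subject = by_id.get(t["subjref"])
        if subject is not None:  # verbs without a resolvable subject are skipped
            counts[subject["strongs"]] += 1
    return counts


speaker_counts = attribute_speakers(tokens)
print(speaker_counts)
```

Speech verbs without a resolvable subject link are skipped here; how the real module treats them is not stated in this notebook.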

from bible_grammar import print_speaker_summary

# What does YHWH+Elohim say in Isaiah?
print_speaker_summary(['H3068', 'H0430'], books=['Isa'], label='YHWH+Elohim')

2. Divine Speech by Book

Per-book divine speech percentage across the entire OT. Lamentations (~31.8%) and Psalms (~25.9%) have the highest ratios; Leviticus (~22.0%) reflects dense legal/priestly speech. Historical narratives tend to have lower percentages.
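
One plausible way to arrive at a per-book figure is shown below. It assumes the percentage is the share of attributed speech verbs whose subject is a divine name; the module may define it differently. The data are made up and `divine_speech_share` is a hypothetical helper.

```python
from collections import Counter

DIVINE_STRONGS = {"H3068", "H0430"}  # YHWH, Elohim (the IDs used in this notebook)

# Toy input: one (book, speaker Strong's ID) pair per attributed speech verb.
attributed = [
    ("Gen", "H0430"), ("Gen", "H0085"), ("Gen", "H0085"), ("Gen", "H0085"),
    ("Deu", "H3068"), ("Deu", "H3068"), ("Deu", "H3068"), ("Deu", "H4872"),
]


def divine_speech_share(pairs):
    """Return {book: percent of attributed speech verbs with a divine subject}."""
    total = Counter(book for book, _ in pairs)
    divine = Counter(book for book, speaker in pairs if speaker in DIVINE_STRONGS)
    return {book: round(100 * divine[book] / total[book], 1) for book in total}


shares = divine_speech_share(attributed)
print(shares)
```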

from bible_grammar import print_divine_speech_by_book

print_divine_speech_by_book(min_count=3)

3. Who Speaks in a Book

Character dialogue breakdown for individual books. Job has a distinctive multi-voice structure: Job dominates (47 speech tokens), with Elihu, God, and the three friends as secondary voices. Genesis dialogue is dominated by YHWH/Elohim and the patriarchs.

from bible_grammar import who_speaks

# Who speaks in Job? — character dialogue breakdown
print('=== Who speaks in Job ===')
print(who_speaks('Job').to_string(index=False))

# Who speaks in Genesis?
print('=== Who speaks in Genesis ===')
print(who_speaks('Gen', top_n=15).to_string(index=False))

4. Divine Speech Verse References

Retrieve all verse references where YHWH speaks in a given book. These can be used for sermon preparation, course illustration, or targeted syntactic study of divine speech patterns.

from bible_grammar import divine_speech_verses

# All refs where YHWH speaks in Jeremiah
refs = divine_speech_verses('Jer')
print(f'Jeremiah: {len(refs)} YHWH speech refs')
for r in refs[:10]:
    print(f'  {r}')

5. Generate Speaker Report

Generates a full Markdown report for a given speaker in a given book. Includes top speech verbs, book distribution, and cross-testament context.

from bible_grammar import speaker_report

# Generate full Markdown report for YHWH speech in Isaiah
report = speaker_report(
    ['H3068', 'H0430'], books=['Isa'], label='YHWH+Elohim',
    output_dir='../../../output/reports/ot/lexicon'
)
print(f'Report: {report}')

6. Discourse Particle Tagging

Seven key Hebrew discourse particles, classified by function using MACULA's English gloss:

Particle	Label	Functions detected
הִנֵּה	presentative	attention-getter ('behold/look')
כִּי	connective	causal / content / adversative / conditional / asseverative / temporal
וְ	connective	sequential / adversative / logical / emphatic / temporal
לָכֵן	consequence	'therefore / so'
עַתָּה	temporal	discourse 'now' (logical pivot)
גַּם	additive	'also / even' (emphasis)
אַךְ	restrictive	'only / surely / but'
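
Gloss-based classification of the kind summarized in the table above might look roughly like this for כִּי. The gloss-to-function mapping and the name `classify_ki` are assumptions made for illustration; the module's actual rules are not shown in this notebook and are likely more detailed.

```python
# Hypothetical mapping from an English gloss to a discourse function for ki.
KI_GLOSS_TO_FUNCTION = {
    "because": "causal",
    "for": "causal",
    "that": "content",
    "but": "adversative",
    "if": "conditional",
    "when": "temporal",
    "surely": "asseverative",
    "indeed": "asseverative",
}


def classify_ki(gloss):
    """Map an English gloss to a function label, or 'unclassified' if unknown."""
    key = gloss.strip().lower()
    return KI_GLOSS_TO_FUNCTION.get(key, "unclassified")


for gloss in ["because", " That ", "if", "whoever"]:
    print(repr(gloss), "->", classify_ki(gloss))
```

A plain lookup like this cannot separate senses that share a gloss (for example, "for" can be causal or asseverative), so any real classifier would need more context than the gloss alone.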

from bible_grammar import print_discourse_particles, print_particle_summary

# Isaiah 40: three hinne + ki content/causal/temporal clauses
print_discourse_particles('Isa', 40)

7. Particle Summary by Book

In Genesis, כִּי is 55% causal and 29% content. Deuteronomy has more conditional כִּי (legal protasis) and more לָכֵן consequence markers, reflecting its instructional/covenantal genre.
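
Percentages of this kind are within-book shares of a particle's occurrences. A minimal sketch of the arithmetic follows, using made-up counts rather than real corpus figures; `to_percentages` is a hypothetical helper, not part of bible_grammar.

```python
def to_percentages(counts):
    """Convert raw function counts into percentages of the book total."""
    total = sum(counts.values())
    if total == 0:
        return {function: 0.0 for function in counts}
    return {function: round(100 * n / total, 1) for function, n in counts.items()}


# Made-up counts for illustration only (not real corpus figures).
toy_counts = {"causal": 12, "content": 5, "conditional": 3}
toy_percentages = to_percentages(toy_counts)
print(toy_percentages)
```

Normalizing this way matters when comparing books of very different length, where raw counts alone would mostly reflect book size.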

# Genesis ki sense breakdown
print_particle_summary('Gen')

# Deuteronomy: more ki conditional (legal protasis) and laken consequence
print_particle_summary('Deu')

8. Cross-Book כִּי Comparison

The multi-functional כִּי is one of the most important words in Biblical Hebrew discourse analysis. Its distribution across causal, content, adversative, conditional, and temporal functions varies significantly by genre.

from bible_grammar import discourse_particle_summary

books = ['Gen', 'Deu', 'Isa', 'Psa', 'Job']
frames = []
for b in books:
    df = discourse_particle_summary(b)
    # Filter to ki (כִּי)
    ki = df[df['particle_label'] == '\u05db\u05bc\u05b4\u05d9']  # כִּי
    ki = ki.copy()
    ki['book'] = b
    frames.append(ki)

combined = pd.concat(frames, ignore_index=True)
pivot = combined.pivot_table(
    index='discourse_function', columns='book', values='count',
    aggfunc='sum', fill_value=0,  # sum counts; the default aggfunc is 'mean'
)
print('=== ki function distribution by book ===')
print(pivot.to_string())
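
One caveat about the filter in the cell above: the escape sequence spells כִּי as kaf, dagesh, hiriq, yod, while Unicode canonical ordering places the hiriq before the dagesh. If the labels returned by discourse_particle_summary are stored in normalized form (an assumption; this notebook does not say), a raw string comparison would silently match nothing. The sketch below shows the difference and a normalization-based comparison; `same_hebrew` is a hypothetical helper.

```python
import unicodedata

as_typed = "\u05db\u05bc\u05b4\u05d9"   # kaf, dagesh, hiriq, yod (order used in the filter)
canonical = "\u05db\u05b4\u05bc\u05d9"  # kaf, hiriq, dagesh, yod (canonical mark order)


def same_hebrew(a, b):
    """Compare two pointed Hebrew strings after Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(as_typed == canonical)             # raw comparison of the two spellings
print(same_hebrew(as_typed, canonical))  # comparison after normalization
```

If this turns out to matter for the data, normalizing both the column values and the search string before filtering would make the comparison independent of mark order.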