Export & Profiles — Generating Shareable Reports

This notebook documents two complementary reporting tools:

export.py — Takes any analysis result and writes it as a styled self-contained HTML report or CSV file. Handles word studies, genre comparisons, divine names, and semantic profiles.
profiles.py — Generates standardized one-page statistical summaries for any Bible book: word count, vocabulary richness, POS distribution, verb breakdown, top lemmas, hapax count.

These modules are the primary way to produce output that can be shared with others without requiring them to run Python.

# @title Colab setup (runs only on Google Colab)
import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    import subprocess, os
    # Clone the repo so all source and data paths work
    if not os.path.isdir("/content/berean-bible-bots"):
        subprocess.run(
            ["git", "clone", "--depth", "1",
             "https://github.com/dnovick/berean-bible-bots.git",
             "/content/berean-bible-bots"],
            check=True,
        )
    os.chdir("/content/berean-bible-bots")
    sys.path.insert(0, "/content/berean-bible-bots/src")
    # Install Python dependencies
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-q", "-r",
         "binder/requirements.txt"],
        check=True,
    )
    # Download processed data files (~295 MB, one-time)
    subprocess.run(["bash", "binder/postBuild"], check=True)
    print("Colab environment ready.")

1. Overview

export.py

Output directories:

output/exports/csv/ — raw CSV files, one per DataFrame
output/exports/html/ — self-contained HTML reports (inline CSS, embedded charts as data URIs)

The HTML reports are fully standalone: they embed all styles and charts inline, so they can be shared as a single file with no external dependencies.

High-level exporters call lower-level analysis modules internally and produce both CSV and HTML output together. The low-level helper export_html_page() accepts an arbitrary list of section dicts and can be used to build custom reports.
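
Embedding a chart as a data URI amounts to base64-encoding the PNG bytes and placing the result in an img tag's src attribute. The following is a minimal sketch of that standard technique, not export.py's own code; `png_to_data_uri` and `img_tag_for` are hypothetical helper names.

```python
import base64
from pathlib import Path


def png_to_data_uri(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data URI for an img tag's src attribute."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def img_tag_for(png_path: Path) -> str:
    """Read a PNG file from disk and return an <img> tag with the image inlined."""
    return f'<img src="{png_to_data_uri(png_path.read_bytes())}" alt="{png_path.stem}">'


# The 8-byte PNG file signature is enough to show the encoding
print(png_to_data_uri(b"\x89PNG\r\n\x1a\n"))  # data:image/png;base64,iVBORw0KGgo=
```

Because the image bytes travel inside the HTML, the report stays a single shareable file, at the cost of roughly a one-third size increase from base64.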

profiles.py

Profiles are statistical summaries of a single Bible book. They compare the book against the corpus average for its testament (OT or NT) and report the delta. Profiles can be printed to stdout, returned as a raw dict, or saved as a Markdown file.

Profiles are saved under output/reports/ot/survey/ (OT) or output/reports/nt/survey/ (NT).
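
The per-POS delta is the book's percentage minus the testament baseline's percentage. A minimal sketch, assuming both sides are plain dicts mapping POS labels to percentages; `pos_delta` is a hypothetical stand-in for the logic in profiles.py, and the numbers are toy values rather than corpus figures.

```python
def pos_delta(book_pct: dict, baseline_pct: dict) -> dict:
    """Return book minus baseline, in percentage points, for every POS in either dict.

    A POS missing from one side counts as 0.0% (an assumption of this sketch).
    """
    keys = sorted(set(book_pct) | set(baseline_pct))
    return {k: round(book_pct.get(k, 0.0) - baseline_pct.get(k, 0.0), 2) for k in keys}


# Toy percentages for illustration only (not real corpus figures)
book = {"Noun": 40.0, "Verb": 25.0, "Particle": 10.0}
baseline = {"Noun": 38.5, "Verb": 27.0, "Adjective": 5.0}
print(pos_delta(book, baseline))
# {'Adjective': -5.0, 'Noun': 1.5, 'Particle': 10.0, 'Verb': -2.0}
```

A positive value means the book uses that part of speech more often than its testament as a whole.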

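import sys
# Make the repository's src/ directory importable (path is relative to this notebook's folder;
# on Colab the setup cell has already added the correct path)
sys.path.insert(0, '../../src')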

2. Per-Book Language Profile

book_profile() returns a dict with the following keys:

| Key | Content |
|---|---|
| `book_id`, `book_name`, `testament` | Identification |
| `canonical_order`, `chapter_count` | Position info |
| `total_words` | Token count |
| `unique_strongs` | Distinct lexical lemmas |
| `hapax_count` | Lemmas appearing exactly once in this book |
| `ttr` | Type-token ratio (unique lemmas / total words) |
| `pos_distribution` | POS percentages (top 10) |
| `verb_detail` | OT: stem distribution; NT: tense/voice/mood |
| `top_lemmas` | Top 20 most frequent Strong's numbers |
| `baseline_delta` | Difference from corpus average per POS |
| `baseline` | Full corpus-average stats for this testament |

print_profile() formats the dict for stdout — useful for quick inspection.

save_profile_report() writes a Markdown file. The default path is output/reports/<ot|nt>/survey/<book_id>_profile.md.

batch_profiles() generates reports for multiple books at once. It accepts a testament filter ('OT' or 'NT') or an explicit book_ids list.
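
The `total_words`, `unique_strongs`, `hapax_count`, and `ttr` fields follow directly from the definitions in the key table. A minimal sketch, assuming a book is given as a flat list of Strong's numbers with one entry per token; `lexical_stats` is a hypothetical helper, and the token list is a toy example rather than real book data.

```python
from collections import Counter


def lexical_stats(strongs_tokens: list) -> dict:
    """Compute token count, distinct lemmas, hapax count, and type-token ratio."""
    counts = Counter(strongs_tokens)
    total = len(strongs_tokens)
    unique = len(counts)
    return {
        "total_words": total,
        "unique_strongs": unique,
        # A hapax here is a lemma that occurs exactly once in the input
        "hapax_count": sum(1 for n in counts.values() if n == 1),
        # Guard against an empty input to avoid dividing by zero
        "ttr": unique / total if total else 0.0,
    }


tokens = ["H430", "H1254", "H430", "H776", "H8064", "H430"]  # toy token list
print(lexical_stats(tokens))
# {'total_words': 6, 'unique_strongs': 4, 'hapax_count': 3, 'ttr': 0.6666666666666666}
```

Note that raw TTR falls as texts get longer, so it is most meaningful when comparing books of similar length.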

from bible_grammar.reporting.profiles import book_profile, print_profile, save_profile_report, batch_profiles

# Genesis: OT example — word count, hapax, stem breakdown
print_profile('Gen')

# Romans: NT example — tense/voice breakdown
print_profile('Rom')

# Job: expected to show the highest hapax count of any OT book
print_profile('Job')

# Raw dict access — useful for programmatic use
profile = book_profile('Isa')
profile

# Compare hapax counts across a few books
for book in ['Gen', 'Exo', 'Job', 'Psa', 'Isa', 'Dan']:
    p = book_profile(book)
    print(f"{book:<5} words={p['total_words']:>6,}  hapax={p['hapax_count']:>4}  ttr={p['ttr']:.3f}")

# Save a Markdown profile report for Genesis
path = save_profile_report('Gen')
print('Saved:', path)

# Batch: generate profiles for four books
# paths = batch_profiles(book_ids=['Gen', 'Exo', 'Rom', 'Rev'])
# for p in paths:
#     print(p.name)

# Batch all NT books (generates ~27 Markdown files)
# paths = batch_profiles(testament='NT')
# print(f'Generated {len(paths)} NT profiles')

3. CSV Export

export_csv(df, slug) writes any pandas DataFrame to output/exports/csv/<slug>.csv. An optional subdir parameter puts the file in a subdirectory under the CSV root.

The function returns the Path object pointing to the saved file.
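
The path rule can be sketched with pathlib and the standard csv module. This hypothetical `export_csv_sketch` takes a list of row dicts instead of a DataFrame and exposes the output root as a parameter so it can write to a temporary directory; it illustrates the documented `<slug>.csv` and `subdir` behavior and is not the module's implementation.

```python
import csv
import tempfile
from pathlib import Path
from typing import Optional


def export_csv_sketch(rows: list, slug: str, subdir: Optional[str] = None,
                      root: Path = Path("output/exports/csv")) -> Path:
    """Write row dicts to <root>/<slug>.csv, or <root>/<subdir>/<slug>.csv, and return the path."""
    out_dir = root / subdir if subdir else root
    out_dir.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    path = out_dir / f"{slug}.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        # Column order follows the keys of the first row
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]) if rows else [])
        writer.writeheader()
        writer.writerows(rows)
    return path


# Write into a throwaway directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    p = export_csv_sketch([{"book_id": "Gen", "value": 1}], "demo",  # placeholder row
                          subdir="verb-stems", root=Path(tmp))
    print(p.relative_to(tmp).as_posix())  # verb-stems/demo.csv
```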

from bible_grammar.reporting.export import export_csv

# Build a simple analysis result and export it
from bible_grammar.core.db import load
import pandas as pd

df = load()

# Count Qal verbs by book
qal_counts = (
    df[(df['source'] == 'TAHOT') & (df['stem'] == 'Qal')]
    .groupby('book_id')
    .size()
    .reset_index(name='qal_count')
    .sort_values('qal_count', ascending=False)
)
qal_counts.head(5)

# Export the result as CSV
path = export_csv(qal_counts, 'qal-counts-by-book')
print('Saved CSV:', path)

# Export to a subdirectory
path2 = export_csv(qal_counts, 'qal-counts-by-book', subdir='verb-stems')
print('Saved CSV:', path2)

4. HTML Export — Word Study Report

export_word_study(strongs) produces a complete word study report as both HTML and CSV. It internally calls several analysis modules:

wordstudy.word_study() — distribution by book, example verses, LXX equivalents, NT trajectory
collocation.collocations() — PMI/G² collocate statistics
morph_chart.morph_distribution() — morphological form breakdown
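
For reference, both collocation measures can be computed from a 2×2 contingency table of counts: o11 (node and collocate together), o12 (node without collocate), o21 (collocate without node), and o22 (neither). This sketch uses the textbook definitions of pointwise mutual information (log base 2) and the log-likelihood ratio G²; how collocation.collocations() defines its co-occurrence window and counts is not documented here, so the inputs are assumptions.

```python
import math


def pmi(o11: int, o12: int, o21: int, o22: int) -> float:
    """Pointwise mutual information, in bits; requires o11 > 0."""
    n = o11 + o12 + o21 + o22
    node_total = o11 + o12  # contexts containing the node word
    coll_total = o11 + o21  # contexts containing the collocate
    return math.log2(o11 * n / (node_total * coll_total))


def g2(o11: int, o12: int, o21: int, o22: int) -> float:
    """Log-likelihood ratio G² = 2 * sum(O * ln(O / E)) over the 2x2 table."""
    n = o11 + o12 + o21 + o22
    row_totals = (o11 + o12, o21 + o22)
    col_totals = (o11 + o21, o12 + o22)
    observed = ((o11, o12), (o21, o22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes 0 (limit of x * ln x as x -> 0)
                expected = row_totals[i] * col_totals[j] / n
                total += o * math.log(o / expected)
    return 2 * total


# Toy counts: the pair co-occurs 10 times, against 0.5 expected under independence
print(round(pmi(10, 90, 40, 9860), 3))  # 4.322
print(round(g2(10, 90, 40, 9860), 2))
```

PMI rewards rare pairs strongly, while G² also weighs how much evidence there is, which is why collocation tables often report both.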

export_word_study() returns a dict of output paths:

{
    'html':           Path,  # full report
    'csv_by_book':    Path,  # distribution table
    'csv_morphology': Path,  # morphology table (may be None)
    'csv_collocates': Path,  # collocates table (may be None)
}

Works for both Hebrew (H-prefix) and Greek (G-prefix) Strong's numbers.
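
Because `csv_morphology` and `csv_collocates` may be None, code that consumes the result should skip missing entries. A minimal sketch; `corpus_for` and `existing_outputs` are hypothetical helpers, and the example dict uses placeholder paths, not the file names export_word_study() actually writes.

```python
from pathlib import Path
from typing import Dict, Optional


def corpus_for(strongs: str) -> str:
    """Classify a Strong's number by its prefix: H = Hebrew, G = Greek."""
    prefix = strongs[:1].upper()
    if prefix == "H":
        return "Hebrew"
    if prefix == "G":
        return "Greek"
    raise ValueError(f"expected an H- or G-prefixed Strong's number, got {strongs!r}")


def existing_outputs(result: Dict[str, Optional[Path]]) -> Dict[str, Path]:
    """Keep only the outputs that were actually written (value is not None)."""
    return {key: path for key, path in result.items() if path is not None}


# Placeholder result with the documented keys; the paths are not real export names
result = {
    "html": Path("placeholder.html"),
    "csv_by_book": Path("placeholder_by_book.csv"),
    "csv_morphology": None,
    "csv_collocates": None,
}
print(corpus_for("H7965"), sorted(existing_outputs(result)))  # Hebrew ['csv_by_book', 'html']
```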

from bible_grammar.reporting.export import export_word_study

# Word study for H7965 (shalom)
result = export_word_study('H7965')
print('HTML:', result['html'])
print('CSV (by book):', result['csv_by_book'])

# Word study for G3056 (logos)
result_grk = export_word_study('G3056')
print('HTML:', result_grk['html'])

5. HTML Export — Other Analysis Types

Each high-level exporter follows the same pattern: it calls underlying analysis modules, generates charts, writes CSV companions, and produces a standalone HTML report. All return a dict containing at least an 'html' key.

export_divine_names(corpora) — Frequency and distribution of divine names (YHWH, Elohim, Adonai, etc.) across OT, LXX, and NT. Generates stacked bar charts and heatmaps.

export_genre_compare(corpus) — Morphological patterns across canonical literary sections. OT features: verb stem, conjugation, POS. NT features: tense, voice, mood, POS. Includes heatmaps per feature.

export_semantic_profile(strongs) — Combines all word study data with LXX translation consistency analysis into a single comprehensive profile.

export_all() — Runs all exporters in sequence. Slow (several minutes) but produces a complete set of reports. Accepts an optional word_studies list to override the default set.

from bible_grammar.reporting.export import (
    export_divine_names,
    export_genre_compare,
    export_semantic_profile,
    export_all,
)

# Divine names report — OT, LXX, and NT
r = export_divine_names()
print('HTML:', r['html'])

# Genre comparison — OT only
r = export_genre_compare('OT')
print('HTML:', r['html'])

# Genre comparison — NT only
r = export_genre_compare('NT')
print('HTML:', r['html'])

# Semantic profile for H7965 (shalom)
r = export_semantic_profile('H7965')
print('HTML:', r['html'])

# export_all() regenerates every standard report — slow, run deliberately
# results = export_all()
# for category, paths in results.items():
#     print(f'{category}: {len(paths)} files')

6. HTML Export — Low-Level Helper

export_html_page() is the engine behind all high-level exporters. It accepts a list of section dicts and assembles them into a standalone HTML page with consistent styling.

Signature:

export_html_page(
    sections: list[dict],
    title: str,
    slug: str,
    *,
    subtitle: str = '',
    source_note: str = 'STEPBible TAHOT/TAGNT/TALXX (CC BY 4.0, Tyndale House Cambridge)',
) -> Path

Each dict in sections may have:

'heading' — <h2> section title (also used for the table of contents if there are 3+ sections)
'subheading' — <h3> sub-title
'text' — paragraph of prose
'df' — DataFrame to render as a styled HTML table
'pct_cols' — list of column names to format as percentages
'chart' — path to a PNG file to embed as a data URI
'html' — raw HTML fragment to insert verbatim

The output file is written to output/exports/html/<slug>.html.
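
To make the section-dict contract concrete, here is a much-simplified renderer. It is a hypothetical sketch, not export_html_page() itself: it supports 'heading', 'subheading', 'text', and 'html', uses a 'rows' list of dicts as a stand-in for the 'df' key, omits charts, and follows the documented rule of adding a table of contents only for 3+ sections.

```python
import html


def render_table(rows: list, pct_cols: list) -> str:
    """Render row dicts as an HTML table, formatting pct_cols values as percentages."""
    if not rows:
        return "<table></table>"
    cols = list(rows[0])
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in cols)
    body_rows = []
    for row in rows:
        cells = []
        for c in cols:
            value = row.get(c, "")
            # Percentage columns are assumed to hold numbers already scaled to 0-100
            text = f"{value:.1f}%" if c in pct_cols else str(value)
            cells.append(f"<td>{html.escape(text)}</td>")
        body_rows.append("<tr>" + "".join(cells) + "</tr>")
    return f"<table><thead><tr>{head}</tr></thead><tbody>{''.join(body_rows)}</tbody></table>"


def render_sections(sections: list, title: str) -> str:
    """Assemble section dicts into a minimal standalone HTML page."""
    parts = [f"<h1>{html.escape(title)}</h1>"]
    headings = [s["heading"] for s in sections if "heading" in s]
    # Table of contents only when there are 3 or more sections
    if len(sections) >= 3 and headings:
        items = "".join(
            f'<li><a href="#sec-{i}">{html.escape(h)}</a></li>' for i, h in enumerate(headings)
        )
        parts.append(f"<nav><ul>{items}</ul></nav>")
    anchor = 0
    for sec in sections:
        if "heading" in sec:
            parts.append(f'<h2 id="sec-{anchor}">{html.escape(sec["heading"])}</h2>')
            anchor += 1
        if "subheading" in sec:
            parts.append(f"<h3>{html.escape(sec['subheading'])}</h3>")
        if "text" in sec:
            parts.append(f"<p>{html.escape(sec['text'])}</p>")
        if "rows" in sec:  # hypothetical stand-in for the 'df' key
            parts.append(render_table(sec["rows"], sec.get("pct_cols", [])))
        if "html" in sec:
            parts.append(sec["html"])  # raw fragment, inserted verbatim
    body = "\n".join(parts)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title>'
        "<style>table{border-collapse:collapse} td,th{padding:4px}</style></head>\n"
        f"<body>\n{body}\n</body></html>"
    )


page = render_sections(
    [
        {"heading": "Intro", "text": "Counts & percentages"},
        {"heading": "Table", "rows": [{"Stem": "Qal", "%": 50.0}], "pct_cols": ["%"]},  # placeholder values
        {"heading": "Notes", "html": "<em>raw fragment</em>"},
    ],
    title="Demo",
)
print("<nav>" in page, "50.0%" in page)  # True True
```

The sketch escapes every field except 'html', since only that key is documented as being inserted verbatim.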

from bible_grammar.core.db import load
from bible_grammar.reporting.export import export_html_page
import pandas as pd

# Example: build a custom HTML report from any DataFrame
df = load()

stem_counts = (
    df[df['source'] == 'TAHOT']
    .groupby('stem')
    .size()
    .reset_index(name='count')
    .sort_values('count', ascending=False)
)
# Compute percentages over all stems first, then keep only the ten most frequent
stem_counts['pct'] = (stem_counts['count'] / stem_counts['count'].sum() * 100).round(1)
stem_counts = stem_counts.head(10)

path = export_html_page(
    sections=[
        {
            'heading': 'OT Verb Stem Totals',
            'text': 'Count of OT verb tokens by Hebrew binyan (ten most frequent stems shown).',
            'df': stem_counts.rename(columns={'stem': 'Stem', 'count': 'Count', 'pct': '%'}),
            'pct_cols': ['%'],
        }
    ],
    title='Hebrew Verb Stem Overview',
    slug='ot-verb-stem-overview',
    subtitle='Top 10 binyanim · full OT corpus',
)
print('Saved:', path)

7. Quick Reference

# ── profiles.py ───────────────────────────────────────────────────────────────
from bible_grammar.reporting.profiles import book_profile, print_profile, save_profile_report, batch_profiles

book_profile('Gen')                   # -> dict with full stats
print_profile('Gen')                  # formatted stdout summary
save_profile_report('Gen')            # -> Path (Markdown file)
save_profile_report('Gen', 'my.md')   # custom output path
batch_profiles(testament='OT')        # all OT books -> list[Path]
batch_profiles(testament='NT')        # all NT books
batch_profiles(book_ids=['Gen', 'Rom'])  # explicit list

# ── export.py — CSV ───────────────────────────────────────────────────────────
from bible_grammar.reporting.export import export_csv

export_csv(df, 'my-analysis')          # -> Path: output/exports/csv/my-analysis.csv
export_csv(df, 'my-analysis', subdir='word-studies')  # subdirectory

# ── export.py — HTML ──────────────────────────────────────────────────────────
from bible_grammar.reporting.export import (
    export_word_study,
    export_genre_compare,
    export_divine_names,
    export_semantic_profile,
    export_html_page,
    export_all,
)

export_word_study('H7965')             # shalom — HTML + CSV
export_word_study('G3056')             # logos

export_genre_compare('OT')             # OT genre heatmaps
export_genre_compare('NT')             # NT genre heatmaps

export_divine_names()                  # OT + LXX + NT (default)
export_divine_names(corpora=['OT'])    # OT only

export_semantic_profile('H7965')       # full semantic profile

export_all()                           # regenerate all standard reports (slow)
export_all(word_studies=['H7965', 'G26'])  # override word study list

export_html_page(                      # low-level custom page builder
    sections=[{'heading': ..., 'df': ..., 'chart': ..., 'html': ..., 'pct_cols': [...]}],
    title='My Report',
    slug='my-report',
)  # -> Path: output/exports/html/my-report.html