Export & Profiles: Generating Shareable Reports
This notebook documents two complementary reporting tools:
- export.py: takes any analysis result and writes it as a styled, self-contained HTML report or a CSV file. Handles word studies, genre comparisons, divine names, and semantic profiles.
- profiles.py: generates a standardized one-page statistical summary for any Bible book, covering word count, vocabulary richness, POS distribution, verb breakdown, top lemmas, and hapax count.
These modules are the primary way to produce output that can be shared with others without requiring them to run Python.
1. Overview
export.py
Output directories:
- output/exports/csv/: raw CSV files, one per DataFrame
- output/exports/html/: self-contained HTML reports (inline CSS, charts embedded as data URIs)
The HTML reports are fully standalone: they embed all styles and charts inline, so they can be shared as a single file with no external dependencies.
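To make the "embedded charts as data URIs" idea concrete, here is a minimal sketch of how a PNG can be inlined into an HTML image tag using only the standard library. The helper names `png_data_uri` and `img_tag` are hypothetical and are not taken from export.py; the module may do this differently.

```python
import base64


def png_data_uri(png_bytes: bytes) -> str:
    """Return a data URI that carries the PNG bytes inline (hypothetical helper)."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def img_tag(png_bytes: bytes, alt: str = "chart") -> str:
    """Build an <img> tag whose src holds the image itself, so no file is linked."""
    return f'<img alt="{alt}" src="{png_data_uri(png_bytes)}">'


# The 8-byte PNG signature stands in for a real chart file in this sketch.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
tag = img_tag(PNG_SIGNATURE)
print(tag)
```

Because the image travels inside the `src` attribute, the resulting HTML file can be emailed or archived on its own.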
High-level exporters call lower-level analysis modules internally and produce both CSV and HTML output together. The low-level helper export_html_page() accepts an arbitrary list of section dicts and can be used to build custom reports.
profiles.py
Profiles are statistical summaries of a single Bible book. They compare the book against the corpus average for its testament (OT or NT) and report the delta. Profiles can be printed to stdout, returned as a raw dict, or saved as a Markdown file.
Profiles are saved under output/reports/ot/survey/ (OT) or output/reports/nt/survey/ (NT).
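The comparison against the testament baseline can be pictured as a per-POS subtraction. The sketch below assumes the delta is "book percentage minus baseline percentage"; that sign convention, the function name `pos_delta`, and the numbers are all illustrative assumptions, not values or code from profiles.py.

```python
def pos_delta(book_pct: dict[str, float], baseline_pct: dict[str, float]) -> dict[str, float]:
    """Book percentage minus baseline percentage for every POS tag in either input."""
    tags = sorted(set(book_pct) | set(baseline_pct))
    return {t: round(book_pct.get(t, 0.0) - baseline_pct.get(t, 0.0), 2) for t in tags}


# Invented percentages for illustration only, not real corpus figures.
book = {"Noun": 30.0, "Verb": 25.5, "Prep": 12.0}
baseline = {"Noun": 28.0, "Verb": 27.0, "Conj": 10.0}

delta = pos_delta(book, baseline)
for tag, diff in delta.items():
    print(f"{tag:<5} {diff:+.2f}")
```

A positive value means the book uses that part of speech more than the testament average under this convention; a tag missing from one side is treated as 0%.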
import sys
sys.path.insert(0, '../../src')
2. Per-Book Language Profile
book_profile() returns a dict with the following keys:
| Key | Content |
|---|---|
| book_id, book_name, testament | Identification |
| canonical_order, chapter_count | Position info |
| total_words | Token count |
| unique_strongs | Distinct lexical lemmas |
| hapax_count | Lemmas appearing exactly once in this book |
| ttr | Type-token ratio (unique lemmas / total words) |
| pos_distribution | POS percentages (top 10) |
| verb_detail | OT: stem distribution; NT: tense/voice/mood |
| top_lemmas | Top 20 most frequent Strong's numbers |
| baseline_delta | Difference from corpus average per POS |
| baseline | Full corpus-average stats for this testament |
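The lexical keys in the table follow directly from their definitions. As a rough sketch (the function `lexical_stats` is hypothetical and only mirrors the definitions above, not the actual profiles.py code), they could be derived from a list of lemma identifiers like this:

```python
from collections import Counter


def lexical_stats(lemmas: list[str]) -> dict:
    """Compute token count, distinct lemmas, hapax count, and type-token ratio."""
    counts = Counter(lemmas)
    total = len(lemmas)
    return {
        "total_words": total,
        "unique_strongs": len(counts),
        # A hapax here is a lemma that occurs exactly once in the input.
        "hapax_count": sum(1 for n in counts.values() if n == 1),
        "ttr": len(counts) / total if total else 0.0,
    }


# A tiny made-up token sequence of Strong's-style identifiers.
tokens = ["H430", "H1254", "H430", "H8064", "H776", "H776"]
stats = lexical_stats(tokens)
print(stats)
```

Note that TTR falls as a text grows longer, so comparing it between books of very different lengths needs some care.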
print_profile() formats the dict for stdout — useful for quick inspection.
save_profile_report() writes a Markdown file. The default path is output/reports/<ot|nt>/survey/<book_id>_profile.md.
batch_profiles() generates reports for multiple books at once. It accepts a testament filter ('OT' or 'NT') or an explicit book_ids list.
from bible_grammar.profiles import book_profile, print_profile, save_profile_report, batch_profiles
# Genesis: OT example — word count, hapax, stem breakdown
print_profile('Gen')
# Romans: NT example — tense/voice breakdown
print_profile('Rom')
# Job: expected to show the highest hapax count of any OT book
print_profile('Job')
# Raw dict access — useful for programmatic use
profile = book_profile('Isa')
profile
# Compare hapax counts across a few books
for book in ['Gen', 'Exo', 'Job', 'Psa', 'Isa', 'Dan']:
    p = book_profile(book)
    print(f"{book:<5} words={p['total_words']:>6,} hapax={p['hapax_count']:>4} ttr={p['ttr']:.3f}")
# Save a Markdown profile report for Genesis
path = save_profile_report('Gen')
print('Saved:', path)
# Batch: generate profiles for four books
# paths = batch_profiles(book_ids=['Gen', 'Exo', 'Rom', 'Rev'])
# for p in paths:
#     print(p.name)
# Batch all NT books (generates ~27 Markdown files)
# paths = batch_profiles(testament='NT')
# print(f'Generated {len(paths)} NT profiles')
3. CSV Export
export_csv(df, slug) writes any pandas DataFrame to output/exports/csv/<slug>.csv. An optional subdir parameter puts the file in a subdirectory under the CSV root.
The function returns the Path object pointing to the saved file.
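The path layout described above can be sketched with pathlib. The helper `csv_target` is a hypothetical illustration of how slug and subdir might combine; it only resolves the path and does not write anything, and the real export_csv() may differ.

```python
from pathlib import Path
from typing import Optional

CSV_ROOT = Path("output/exports/csv")  # CSV root directory named in the text above


def csv_target(slug: str, subdir: Optional[str] = None, root: Path = CSV_ROOT) -> Path:
    """Resolve where <slug>.csv would land, optionally inside a subdirectory."""
    folder = root / subdir if subdir else root
    return folder / f"{slug}.csv"


plain = csv_target("qal-counts-by-book")
nested = csv_target("qal-counts-by-book", subdir="verb-stems")
print(plain.as_posix())
print(nested.as_posix())
```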
from bible_grammar.export import export_csv
# Build a simple analysis result and export it
from bible_grammar.db import load
import pandas as pd
df = load()
# Count Qal verbs by book
qal_counts = (
    df[(df['source'] == 'TAHOT') & (df['stem'] == 'Qal')]
    .groupby('book_id')
    .size()
    .reset_index(name='qal_count')
    .sort_values('qal_count', ascending=False)
)
qal_counts.head(5)
# Export the result as CSV
path = export_csv(qal_counts, 'qal-counts-by-book')
print('Saved CSV:', path)
# Export to a subdirectory
path2 = export_csv(qal_counts, 'qal-counts-by-book', subdir='verb-stems')
print('Saved CSV:', path2)
4. HTML Export: Word Study Report
export_word_study(strongs) produces a complete word study report as both HTML and CSV. It internally calls several analysis modules:
- wordstudy.word_study(): distribution by book, example verses, LXX equivalents, NT trajectory
- collocation.collocations(): PMI/G² collocate statistics
- morph_chart.morph_distribution(): morphological form breakdown
Returns a dict:
{
    'html': Path,            # full report
    'csv_by_book': Path,     # distribution table
    'csv_morphology': Path,  # morphology table (may be None)
    'csv_collocates': Path,  # collocates table (may be None)
}
Works for both Hebrew (H-prefix) and Greek (G-prefix) Strong's numbers.
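Since the language is signaled by the first letter of the Strong's number, a caller can check input before starting a long export. The function below is a hypothetical validation sketch, not part of export.py:

```python
def strongs_language(strongs: str) -> str:
    """Classify a Strong's number by its prefix: H for Hebrew, G for Greek."""
    prefix = strongs[:1].upper()
    if prefix == "H":
        return "Hebrew"
    if prefix == "G":
        return "Greek"
    raise ValueError(f"Unrecognized Strong's number: {strongs!r}")


for number in ("H7965", "G3056"):
    print(number, "->", strongs_language(number))
```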
from bible_grammar.export import export_word_study
# Word study for H7965 (shalom)
result = export_word_study('H7965')
print('HTML:', result['html'])
print('CSV (by book):', result['csv_by_book'])
# Word study for G3056 (logos)
result_grk = export_word_study('G3056')
print('HTML:', result_grk['html'])
5. HTML Export: Other Analysis Types
Each high-level exporter follows the same pattern: it calls underlying analysis modules, generates charts, writes CSV companions, and produces a standalone HTML report. All return a dict containing at least an 'html' key.
export_divine_names(corpora) — Frequency and distribution of divine names (YHWH, Elohim, Adonai, etc.) across OT, LXX, and NT. Generates stacked bar charts and heatmaps.
export_genre_compare(corpus) — Morphological patterns across canonical literary sections. OT features: verb stem, conjugation, POS. NT features: tense, voice, mood, POS. Includes heatmaps per feature.
export_semantic_profile(strongs) — Combines all word study data with LXX translation consistency analysis into a single comprehensive profile.
export_all() — Runs all exporters in sequence. Slow (several minutes) but produces a complete set of reports. Accepts an optional word_studies list to override the default set.
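Because each single exporter returns a dict with at least an 'html' key, and the word study result shown earlier may hold None for some CSV entries, a small helper can list what was actually written. This is a sketch under those assumptions; `written_files` is a hypothetical name and the paths in the example dict are placeholders rather than real output.

```python
from pathlib import Path


def written_files(result: dict) -> list[Path]:
    """Return every non-None path from a single exporter's result dict."""
    if "html" not in result:
        raise KeyError("expected an 'html' entry in the exporter result")
    return [Path(p) for p in result.values() if p is not None]


# Placeholder result shaped like the dicts described above.
example = {
    "html": Path("output/exports/html/example.html"),
    "csv_by_book": Path("output/exports/csv/example-by-book.csv"),
    "csv_morphology": None,
}
files = written_files(example)
for f in files:
    print(f.as_posix())
```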
from bible_grammar.export import (
    export_divine_names,
    export_genre_compare,
    export_semantic_profile,
    export_all,
)
# Divine names report — OT, LXX, and NT
r = export_divine_names()
print('HTML:', r['html'])
# Genre comparison — OT only
r = export_genre_compare('OT')
print('HTML:', r['html'])
# Genre comparison — NT only
r = export_genre_compare('NT')
print('HTML:', r['html'])
# Semantic profile for H7965 (shalom)
r = export_semantic_profile('H7965')
print('HTML:', r['html'])
# export_all() regenerates every standard report — slow, run deliberately
# results = export_all()
# for category, paths in results.items():
#     print(f'{category}: {len(paths)} files')
6. HTML Export: Low-Level Helper
export_html_page() is the engine behind all high-level exporters. It accepts a list of section dicts and assembles them into a standalone HTML page with consistent styling.
Signature:
export_html_page(
    sections: list[dict],
    title: str,
    slug: str,
    *,
    subtitle: str = '',
    source_note: str = 'STEPBible TAHOT/TAGNT/TALXX (CC BY 4.0, Tyndale House Cambridge)',
) -> Path
Each dict in sections may have:
- 'heading': <h2> section title (also used for the table of contents if there are 3+ sections)
- 'subheading': <h3> sub-title
- 'text': paragraph of prose
- 'df': DataFrame to render as a styled HTML table
- 'pct_cols': list of column names to format as percentages
- 'chart': path to a PNG file to embed as a data URI
- 'html': raw HTML fragment to insert verbatim
The output file is written to output/exports/html/<slug>.html.
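To illustrate how a section dict might map to markup, here is a simplified sketch covering only the text-like keys ('heading', 'subheading', 'text', 'html'). It is not the real export_html_page() implementation: `render_section` and `needs_toc` are hypothetical, and the 'df', 'pct_cols', and 'chart' keys are left out.

```python
from html import escape


def render_section(section: dict) -> str:
    """Turn one section dict into an HTML fragment (text-like keys only)."""
    parts = []
    if "heading" in section:
        parts.append(f"<h2>{escape(section['heading'])}</h2>")
    if "subheading" in section:
        parts.append(f"<h3>{escape(section['subheading'])}</h3>")
    if "text" in section:
        parts.append(f"<p>{escape(section['text'])}</p>")
    if "html" in section:
        parts.append(section["html"])  # raw fragment, inserted verbatim
    return "\n".join(parts)


def needs_toc(sections: list[dict]) -> bool:
    """Mirror the rule stated above: a table of contents appears with 3+ sections."""
    return len(sections) >= 3


page = render_section({"heading": "Verbs & Stems", "text": "Counts by stem.", "html": "<hr>"})
print(page)
```

Escaping the prose keys while passing 'html' through untouched is one reasonable reading of "insert verbatim"; it keeps stray ampersands or angle brackets in headings from breaking the page.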
from bible_grammar.export import export_html_page
from bible_grammar.db import load
import pandas as pd
# Example: build a custom HTML report from any DataFrame
df = load()
# Ten most frequent OT verb stems
stem_counts = (
    df[df['source'] == 'TAHOT']
    .groupby('stem')
    .size()
    .reset_index(name='count')
    .sort_values('count', ascending=False)
    .head(10)
)
# Note: pct is each stem's share of the ten stems kept above, not of all OT verbs
stem_counts['pct'] = (stem_counts['count'] / stem_counts['count'].sum() * 100).round(1)
path = export_html_page(
    sections=[
        {
            'heading': 'OT Verb Stem Totals',
            'text': 'Token counts for the ten most frequent Hebrew verb stems (binyanim) in the OT.',
            'df': stem_counts.rename(columns={'stem': 'Stem', 'count': 'Count', 'pct': '%'}),
            'pct_cols': ['%'],
        }
    ],
    title='Hebrew Verb Stem Overview',
    slug='ot-verb-stem-overview',
    subtitle='Ten most frequent binyanim · full OT corpus',
)
print('Saved:', path)
7. Quick Reference
# ── profiles.py ───────────────────────────────────────────────────────────────
from bible_grammar.profiles import book_profile, print_profile, save_profile_report, batch_profiles
book_profile('Gen') # -> dict with full stats
print_profile('Gen') # formatted stdout summary
save_profile_report('Gen') # -> Path (Markdown file)
save_profile_report('Gen', 'my.md') # custom output path
batch_profiles(testament='OT') # all OT books -> list[Path]
batch_profiles(testament='NT') # all NT books
batch_profiles(book_ids=['Gen', 'Rom']) # explicit list
# ── export.py — CSV ───────────────────────────────────────────────────────────
from bible_grammar.export import export_csv
export_csv(df, 'my-analysis') # -> Path: output/exports/csv/my-analysis.csv
export_csv(df, 'my-analysis', subdir='word-studies') # subdirectory
# ── export.py — HTML ──────────────────────────────────────────────────────────
from bible_grammar.export import (
    export_word_study,
    export_genre_compare,
    export_divine_names,
    export_semantic_profile,
    export_html_page,
    export_all,
)
export_word_study('H7965') # shalom — HTML + CSV
export_word_study('G3056') # logos
export_genre_compare('OT') # OT genre heatmaps
export_genre_compare('NT') # NT genre heatmaps
export_divine_names() # OT + LXX + NT (default)
export_divine_names(corpora=['OT']) # OT only
export_semantic_profile('H7965') # full semantic profile
export_all() # regenerate all standard reports (slow)
export_all(word_studies=['H7965', 'G26']) # override word study list
export_html_page(  # low-level custom page builder
    sections=[{'heading': ..., 'df': ..., 'chart': ..., 'html': ..., 'pct_cols': [...]}],
    title='My Report',
    slug='my-report',
)  # -> Path: output/exports/html/my-report.html