Getting Started with Berean Bible Bots
Welcome! This notebook introduces the Berean Bible Bots analysis toolkit and shows you how to work with it — whether you are running it here in Google Colab, in a local Jupyter environment, or just reading along on the website.
No prior Python experience is required to follow this notebook. If you can read a recipe, you can run a notebook.
Running interactively? Click the Open in Colab badge at the top of this page to open the notebook in Google Colab, then run the Colab setup cell (Section 2 below) before anything else. Every notebook in this collection has the same badge — just click it to launch any notebook directly in Colab.
1. How to Use a Jupyter Notebook
A notebook is made of cells. Each cell is either:
- Markdown (like this one) — explanatory text, headings, and tables. Read it; no action needed.
- Code (grey background) — Python code you can run.
Running a cell
| Environment | How to run the selected cell |
|---|---|
| Google Colab | Click the ▶ play button to the left of the cell, or press Shift + Enter |
| Jupyter Lab / Notebook | Press Shift + Enter, or click Run in the toolbar |
Shift + Enter runs the current cell and moves to the next one — it's the fastest way to step through a notebook.
Order matters
Cells share state. If cell 5 uses a variable defined in cell 3, you must run cell 3 first. The safest habit: use Runtime → Run all (Colab) or Kernel → Restart and Run All (Jupyter) to execute the whole notebook top-to-bottom.
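To make the point concrete, here is a minimal sketch that imitates running cells out of order inside a single block. The variable name `total_words` is made up for illustration and is not part of the toolkit.

```python
ran_out_of_order = False

# "Cell 5" run too early: it uses a name that no cell has defined yet.
try:
    print(total_words)
except NameError as err:
    ran_out_of_order = True
    print(f"Out of order: {err}")

# "Cell 3": defines the name.
total_words = 100

# "Cell 5" again: now the name exists, so the cell works.
print(total_words)
```

If you see a `NameError` in a real notebook, the usual fix is the same: run the earlier cells first, or run the whole notebook from the top.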
When a cell is running
In Colab, you'll see a spinning circle to the left. In Jupyter, the bracket shows [*]. Wait until it shows a number (e.g. [3]) before moving on.
Output appears below the cell
Tables, charts, and printed text all appear directly beneath the cell that produced them. Scroll down if you don't see output immediately.
# @title Colab setup (runs only on Google Colab)
import sys
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    import subprocess, os
    # Clone the repo so all source and data paths work
    if not os.path.isdir("/content/berean-bible-bots"):
        subprocess.run(
            ["git", "clone", "--depth", "1",
             "https://github.com/dnovick/berean-bible-bots.git",
             "/content/berean-bible-bots"],
            check=True,
        )
    os.chdir("/content/berean-bible-bots")
    sys.path.insert(0, "/content/berean-bible-bots/src")
    # Install Python dependencies
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-q", "-r",
         "binder/requirements.txt"],
        check=True,
    )
    # Download processed data files (~295 MB, one-time)
    subprocess.run(["bash", "binder/postBuild"], check=True)
    print("Colab environment ready.")
2. Setting Up the Environment (Colab)
The cell above handles all setup automatically when running in Google Colab:
- Clones the repository into /content/berean-bible-bots
- Installs the required Python packages
- Downloads the processed data files (~295 MB) from bereanbiblebots.com/data/
This takes 2–4 minutes the first time. Once it finishes (you'll see Colab environment ready.), the remaining cells need no further setup.
Running locally? Skip this cell — if you followed the setup instructions in the Notebooks index, your environment is already ready.
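If you are unsure which environment you are in, a quick check like the one below can help. It is only a sketch: `describe_environment` is a hypothetical helper, not part of the toolkit, and it uses nothing beyond the Python standard library.

```python
import importlib.util
import sys

def describe_environment():
    """Return a short status report for the current Python session."""
    return {
        # True only when the notebook is running inside Google Colab
        "in_colab": "google.colab" in sys.modules,
        # True when Python can currently find the bible_grammar package
        "bible_grammar_importable": importlib.util.find_spec("bible_grammar") is not None,
        "python_version": sys.version.split()[0],
    }

status = describe_environment()
for key, value in status.items():
    print(f"{key}: {value}")
```

In a local run, `bible_grammar_importable` may well report False at this point, since the source directory is only added to the import path in the next cell.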
# Standard local import path (ignored in Colab — already set above)
import sys
if 'google.colab' not in sys.modules:
    sys.path.insert(0, '../../../src')
import pandas as pd
from bible_grammar.query import query, reload
reload()
print('bible_grammar loaded successfully.')
3. The query() Function
query() is the primary entry point for the entire dataset. Called with no arguments, it returns every word in the dataset (Hebrew OT, Aramaic OT, and Greek NT) as a pandas DataFrame.
A DataFrame is a table: rows are individual word tokens, columns are morphological fields. If you have ever used Excel, the mental model is identical.
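If the table idea is new to you, this small sketch builds a three-row DataFrame by hand so you can see its shape before loading the real data. The rows are made up for illustration; only the column names echo fields used later in this notebook.

```python
import pandas as pd

# Made-up rows for illustration; real rows come from query().
toy = pd.DataFrame(
    [
        {"book": "Gen", "chapter": 1, "verse": 1, "part_of_speech": "verb"},
        {"book": "Gen", "chapter": 1, "verse": 1, "part_of_speech": "noun"},
        {"book": "Jhn", "chapter": 1, "verse": 1, "part_of_speech": "noun"},
    ]
)

print(toy.shape)             # (number of rows, number of columns)
print(toy.columns.tolist())  # the column names
print(toy[toy["book"] == "Gen"])  # keep only the rows where book is "Gen"
```

The real dataset works the same way, just with far more rows and columns.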
# Load the full dataset and show the first few rows
df = query()
print(f'Total word tokens: {len(df):,}')
df.head()
# What columns (fields) are available?
print(df.columns.tolist())
4. Filtering by Book, Chapter, and Verse
query() accepts keyword filters. Book codes follow the standard three-letter
abbreviation used throughout the dataset (e.g. Gen, Psa, Mat, Rom).
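For readers curious about what a keyword filter amounts to, the sketch below applies the same idea to a toy table. `filter_tokens` is a hypothetical stand-in written for this illustration; it is not the toolkit's implementation of query(), which may work differently.

```python
import pandas as pd

def filter_tokens(df, **filters):
    """Keep only the rows where every named column equals the given value."""
    mask = pd.Series(True, index=df.index)
    for column, value in filters.items():
        if column not in df.columns:
            raise KeyError(f"Unknown column: {column!r}")
        # Combine conditions with AND: a row must pass every filter
        mask &= df[column] == value
    return df[mask]

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "book": ["Gen", "Gen", "Gen", "Jhn"],
    "chapter": [1, 1, 2, 1],
    "verse": [1, 2, 1, 1],
})

subset = filter_tokens(toy, book="Gen", chapter=1)
print(subset)
```

Each extra keyword narrows the result further, which is why adding `verse=1` to a book and chapter filter returns a single verse.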
# Genesis 1:1 — the first verse of the Hebrew Bible
gen_1_1 = query(book='Gen', chapter=1, verse=1)
gen_1_1[['word', 'transliteration', 'translation', 'part_of_speech']]
# John 1:1 — the opening of the Gospel of John
jhn_1_1 = query(book='Jhn', chapter=1, verse=1)
jhn_1_1[['word', 'transliteration', 'translation', 'part_of_speech']]
# All of Romans — how many words?
romans = query(book='Rom')
print(f'Romans word count: {len(romans):,}')
5. Filtering by Morphology
You can pass any column value as a filter. The most useful morphological
fields for Hebrew are part_of_speech, stem, and conjugation.
For Greek: part_of_speech, tense, voice, mood, case_.
Hebrew stems include: Qal, Niphal, Piel, Pual, Hiphil, Hophal, Hithpael.
Hebrew conjugations include: perfect, imperfect, imperative, infinitive construct,
infinitive absolute, participle active, participle passive, wayyiqtol.
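A misspelled stem name is an easy mistake to make. The sketch below checks a name against the stems listed above before you pass it to a filter. `normalize_stem` is a hypothetical helper, and the list may not be exhaustive. Whether query() matches values case-sensitively is not stated here, so the sketch assumes exact spelling matters.

```python
# Stem names exactly as listed in this section (possibly not exhaustive).
HEBREW_STEMS = ("Qal", "Niphal", "Piel", "Pual", "Hiphil", "Hophal", "Hithpael")

def normalize_stem(name):
    """Return the listed spelling of a stem, ignoring case and outer spaces."""
    wanted = name.strip().lower()
    for stem in HEBREW_STEMS:
        if stem.lower() == wanted:
            return stem
    raise ValueError(f"{name!r} is not one of: {', '.join(HEBREW_STEMS)}")

print(normalize_stem("niphal"))

try:
    normalize_stem("Nifal")
except ValueError as err:
    print(err)
```

The returned value can then be used as the `stem=` argument in the examples that follow.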
# All Niphal perfect verbs in the Old Testament
niphal_perf = query(part_of_speech='verb', stem='Niphal', conjugation='perfect')
print(f'Niphal perfect tokens (whole OT): {len(niphal_perf):,}')
niphal_perf[['ref', 'word', 'translation', 'root']].head(10)
# Niphal perfects in Genesis only
nip_gen = query(book='Gen', part_of_speech='verb', stem='Niphal', conjugation='perfect')
print(f'Niphal perfect tokens in Genesis: {len(nip_gen):,}')
nip_gen[['ref', 'word', 'translation', 'root']]
# Greek: aorist passive indicative verbs in Romans
aor_pass = query(book='Rom', part_of_speech='verb', tense='aorist', voice='passive', mood='indicative')
print(f'Aorist passive indicative in Romans: {len(aor_pass):,}')
aor_pass[['ref', 'word', 'translation']].head(10)
6. Counting and Summarizing
Once you have a filtered DataFrame, standard pandas operations let you count and group the results. Two patterns cover most use cases: grouping and counting with groupby(...).size(), and narrowing to a set of books with isin(...).
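Before running these on the full dataset, it may help to see the group-and-count pattern on a handful of rows. The data below is made up for illustration; only the column name `stem` comes from this notebook.

```python
import pandas as pd

# Six made-up rows, one per verb token.
toy = pd.DataFrame({"stem": ["Qal", "Qal", "Niphal", "Qal", "Hiphil", "Niphal"]})

# Group rows by stem, count each group, and put the largest first.
counts = toy.groupby("stem").size().sort_values(ascending=False)
print(counts.to_string())

# The same counts expressed as a percentage of all rows.
shares = (counts / counts.sum() * 100).round(1)
print(shares.to_string())
```

Swapping `"stem"` for another column, such as `"book"` or `"root"`, gives the other summaries shown in this section.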
# How many Niphal perfect verbs are in each book of the OT?
niphal_perf = query(part_of_speech='verb', stem='Niphal', conjugation='perfect')
by_book = niphal_perf.groupby('book').size().sort_values(ascending=False)
print(by_book.head(10).to_string())
# Verb stem distribution across the entire Torah (Gen–Deu)
torah_books = ['Gen', 'Exo', 'Lev', 'Num', 'Deu']
torah_verbs = query(part_of_speech='verb')
torah_verbs = torah_verbs[torah_verbs['book'].isin(torah_books)]
stem_dist = torah_verbs.groupby('stem').size().sort_values(ascending=False)
print(stem_dist.to_string())
# Most frequent roots in the Qal perfect across the OT
qal_perf = query(part_of_speech='verb', stem='Qal', conjugation='perfect')
top_roots = qal_perf.groupby('root').size().sort_values(ascending=False).head(15)
print(top_roots.to_string())
7. Generating Charts
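Most analysis modules include chart-generating functions that save a chart and return its file path; the saved image can then be displayed inline in Colab or Jupyter with IPython.display.Image. You can also chart any query() result directly with matplotlib.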
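The save-then-display pattern can be sketched as follows. `save_bar_chart` is a hypothetical helper written for this illustration, not one of the toolkit's chart functions, and the counts are made up.

```python
import os
import tempfile

import matplotlib.pyplot as plt
from IPython.display import Image

def save_bar_chart(counts, title, out_dir):
    """Draw a bar chart from a {label: count} dict, save it as a PNG, return the path."""
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.bar(list(counts.keys()), list(counts.values()), color="steelblue")
    ax.set_title(title)
    ax.set_ylabel("Token count")
    fig.tight_layout()
    path = os.path.join(out_dir, "demo_chart.png")
    fig.savefig(path, dpi=100)
    plt.close(fig)  # close the figure so the notebook does not draw it a second time
    return path

# Made-up counts, for illustration only.
chart_path = save_bar_chart({"A": 3, "B": 2, "C": 1}, "Demo chart", tempfile.mkdtemp())
print(chart_path)

# As the last line of a notebook cell, this shows the saved PNG inline.
Image(filename=chart_path)
```

Returning a path rather than a figure means the same image can be reused later, for example in a report, without redrawing it.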
import matplotlib.pyplot as plt
from IPython.display import display
# Quick bar chart — verb stem distribution across the OT
ot_verbs = query(part_of_speech='verb', source='hebrew')
stem_counts = ot_verbs.groupby('stem').size().sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(10, 4))
stem_counts.plot(kind='bar', ax=ax, color='steelblue')
ax.set_title('Hebrew Verb Stem Distribution — Whole OT')
ax.set_xlabel('Stem')
ax.set_ylabel('Token count')
ax.tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
8. Exploring the Analysis Modules
Beyond query(), the toolkit includes higher-level modules that pre-build
common analyses. Here is a quick tour:
| Module | Import | What it does |
|---|---|---|
| bible_grammar.qal | from bible_grammar.qal import ... | Qal verb statistics, conjugation profiles, top roots |
| bible_grammar.niphal | from bible_grammar.niphal import ... | Niphal verb statistics and charts |
| bible_grammar.hiphil | from bible_grammar.hiphil import ... | Hiphil verb statistics and charts |
| bible_grammar.nt_discourse | from bible_grammar.nt_discourse import ... | Greek discourse particles (καί, δέ, ὅτι, …) |
| bible_grammar.nt_verbs | from bible_grammar.nt_verbs import ... | Greek verb tense/voice/mood profiles |
| bible_grammar.query | from bible_grammar.query import query | Low-level filtered access to the whole dataset |
Each module exposes print_* functions for quick terminal-style summaries
and *_chart() functions that save and return a PNG path.
The best way to discover what's available is to browse the other notebooks in this collection — each one focuses on a specific area of the grammar.
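You can also ask Python which helpers a module offers. The sketch below groups a module's functions by the print_* and *_chart naming conventions described above. `list_helpers` is a hypothetical helper, and the stand-in module with made-up function names exists only so the example runs anywhere.

```python
import types

def list_helpers(module):
    """Group a module's public callables by the naming conventions above."""
    names = [
        name for name in dir(module)
        if not name.startswith("_") and callable(getattr(module, name))
    ]
    return {
        "print": sorted(n for n in names if n.startswith("print_")),
        "chart": sorted(n for n in names if n.endswith("_chart")),
    }

# Stand-in module with made-up function names.
demo = types.ModuleType("demo_module")
demo.print_demo_overview = lambda: None
demo.demo_stem_chart = lambda: None
demo.other_helper = lambda: None

helpers = list_helpers(demo)
print(helpers)
```

With the toolkit loaded, the same call should work on a real module, for example `list_helpers(bible_grammar.niphal)` after `import bible_grammar.niphal`.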
# Example: Niphal verb overview using the high-level module
from bible_grammar.niphal import print_niphal_overview
print_niphal_overview()
# Example: καί semantic function profile across the GNT
from bible_grammar import nt_kai_profile
nt_kai_profile()
9. A Note on AI-Generated Content
Every analysis, chart, exercise, and piece of content in this project was generated by AI (Claude, Anthropic) under human direction. The data itself comes from carefully curated open-source datasets (STEPBible, MACULA), but the code, analysis, and interpretation layers were written by AI.
Errors will exist. Treat every result as a starting point, not a final answer. Verify significant findings against your Hebrew, Aramaic, and Greek texts directly, and against trusted grammars and lexicons.
"These were more noble than those in Thessalonica, in that they received the word with all readiness of mind, and searched the scriptures daily, whether those things were so." — Acts 17:11 (KJV)
Happy studying!