Getting Started¶
This guide covers installation, parsing your first document, generating quiz questions, and launching the web UI.
Installation¶
pip install quiz-gen
Verify the installation:
quiz-gen --version
Development installation¶
git clone https://github.com/yauheniya-ai/quiz-gen.git
cd quiz-gen
pip install -e ".[dev]"
Parsing a document¶
Using the CLI¶
Parse a EUR-Lex regulation directly from its URL:
quiz-gen https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139
This downloads the document, parses it, and saves two files to data/processed/:
32018R1139_chunks.json— all content units with metadata32018R1139_toc.json— hierarchical table of contents
Parse a local HTML file:
quiz-gen data/raw/regulation.html
Specify an output directory and print the table of contents:
quiz-gen data/raw/regulation.html --output data/processed --print-toc
Using Python¶
from quiz_gen import EURLexParser
# Parse from URL
url = "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"
parser = EURLexParser(url=url)
chunks, toc = parser.parse()
print(f"Title: {toc['title']}")
print(f"Total chunks: {len(chunks)}")
# Save results
parser.save_chunks("data/processed/chunks.json")
parser.save_toc("data/processed/toc.json")
Parsing a local file¶
from quiz_gen import EURLexParser
with open("data/raw/regulation.html", "r", encoding="utf-8") as f:
html_content = f.read()
parser = EURLexParser(html_content=html_content)
chunks, toc = parser.parse()
Understanding the output¶
Chunks¶
Each document is split into RegulationChunk objects representing logical units:
{
"section_type": "article",
"number": "1",
"title": "Article 1 - Subject matter and objectives",
"content": "1. The principal objective of this Regulation...",
"hierarchy_path": ["Enacting Terms", "CHAPTER I - PRINCIPLES", "Article 1..."],
"metadata": {"id": "art_1", "subtitle": "Subject matter and objectives"}
}
section_type value |
Description |
|---|---|
title |
Document title |
citation |
Combined citations block |
recital |
Individual recital |
article |
Article (main content unit) |
annex |
Annex |
concluding_formulas |
Signatures and adoption info |
Table of contents¶
A dictionary with a title key and a sections list describing the full document hierarchy, including chapter, section, and article nesting.
Filtering chunks¶
from quiz_gen import EURLexParser, SectionType
parser = EURLexParser(url=url)
chunks, toc = parser.parse()
articles = [c for c in chunks if c.section_type == SectionType.ARTICLE]
recitals = [c for c in chunks if c.section_type == SectionType.RECITAL]
print(f"Articles: {len(articles)}, Recitals: {len(recitals)}")
# Find a specific article by number
article_5 = next(
c for c in chunks
if c.section_type == SectionType.ARTICLE and c.number == "5"
)
print(article_5.content[:200])
Generating quiz questions¶
Quiz generation requires at least one AI provider API key. Set the relevant environment variable or create a .env file in the project root:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
Basic usage¶
from quiz_gen import EURLexParser
from quiz_gen.agents.workflow import QuizGenerationWorkflow
from quiz_gen.agents.config import AgentConfig
# Parse a document
parser = EURLexParser(url="https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139")
chunks, _ = parser.parse()
# Configure and run the workflow (reads keys from environment)
config = AgentConfig()
workflow = QuizGenerationWorkflow(config)
# Generate questions for one article
result = workflow.run(chunks[50].to_dict())
print(f"Judge decision: {result['judge_decision']}")
for q in result["final_questions"]:
print(f"\n[{q['focus']}] {q['question']}")
for letter, text in q["options"].items():
print(f" {letter}. {text}")
print(f" Correct: {q['correct_answer']}")
Configuring providers¶
Providers and models are independently configurable per agent:
config = AgentConfig(
conceptual_provider="openai",
conceptual_model="gpt-4o",
practical_provider="anthropic",
practical_model="claude-sonnet-4-20250514",
validator_provider="openai",
validator_model="gpt-4o",
refiner_provider="openai",
refiner_model="gpt-4o",
judge_provider="anthropic",
judge_model="claude-sonnet-4-20250514",
)
Supported provider values: openai, anthropic, google, mistral, cohere.
See Agents for the full configuration reference.
Using the web UI¶
The web UI is built into the package. Launch it with:
quiz-gen --ui
The browser opens automatically at http://localhost:8000. Additional options:
# Custom port
quiz-gen --ui --port 9000
# Skip automatic browser opening
quiz-gen --ui --no-browser
# Bind to localhost only
quiz-gen --ui --host 127.0.0.1
# Set server log level
quiz-gen --ui --log-level info
The UI supports document parsing by URL or file upload, TOC navigation, chunk preview, and quiz generation with configurable AI providers.
Common patterns¶
Batch processing multiple documents¶
from pathlib import Path
from quiz_gen import EURLexParser
for path in Path("data/raw").glob("*.html"):
with open(path, encoding="utf-8") as f:
html = f.read()
parser = EURLexParser(html_content=html)
chunks, toc = parser.parse()
parser.save_chunks(f"data/processed/{path.stem}_chunks.json")
parser.save_toc(f"data/processed/{path.stem}_toc.json")
print(f"Processed {path.name}: {len(chunks)} chunks")
Preview TOC without saving¶
quiz-gen --print-toc --no-save regulation.html
Or in Python:
parser.print_toc()
Search chunk content¶
# Find all articles mentioning "safety"
safety_articles = [
c for c in chunks
if c.section_type == SectionType.ARTICLE
and "safety" in c.content.lower()
]
for article in safety_articles:
print(f"Article {article.number}: {article.title}")
Next steps¶
- Parsers — full EUR-Lex parser reference and output structure
- Agents — multi-agent pipeline architecture and configuration
- CLI — complete CLI reference
- API Reference — full class and method documentation
- Examples — complete working examples