Getting Started

This guide covers installation, parsing your first document, generating quiz questions, and launching the web UI.

Installation

pip install quiz-gen

Verify the installation:

quiz-gen --version

Development installation

git clone https://github.com/yauheniya-ai/quiz-gen.git
cd quiz-gen
pip install -e ".[dev]"

Parsing a document

Using the CLI

Parse a EUR-Lex regulation directly from its URL:

quiz-gen https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139

This downloads the document, parses it, and saves two files to data/processed/:

  • 32018R1139_chunks.json — all content units with metadata
  • 32018R1139_toc.json — hierarchical table of contents
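
Once both files exist, they can be inspected with the standard library alone. The sketch below assumes only that each file is valid JSON. The helper names `load_json` and `describe` are illustrative and not part of quiz-gen, and since the top-level shape of the files is not specified here, the code reports it rather than assuming it.

```python
import json
from pathlib import Path


def load_json(path):
    """Read a UTF-8 JSON file and return the decoded Python object."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def describe(data):
    """Return a one-line summary of the top-level JSON value."""
    if isinstance(data, list):
        return f"list with {len(data)} items"
    if isinstance(data, dict):
        return f"dict with keys: {', '.join(sorted(data))}"
    return type(data).__name__


if __name__ == "__main__":
    # File names follow the CLI example above.
    for name in ("32018R1139_chunks.json", "32018R1139_toc.json"):
        path = Path("data/processed") / name
        if path.exists():
            print(f"{name}: {describe(load_json(path))}")
        else:
            print(f"{name}: not found (run the CLI command first)")
```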

Parse a local HTML file:

quiz-gen data/raw/regulation.html

Specify an output directory and print the table of contents:

quiz-gen data/raw/regulation.html --output data/processed --print-toc

Using Python

from quiz_gen import EURLexParser

# Parse from URL
url = "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"
parser = EURLexParser(url=url)
chunks, toc = parser.parse()

print(f"Title: {toc['title']}")
print(f"Total chunks: {len(chunks)}")

# Save results
parser.save_chunks("data/processed/chunks.json")
parser.save_toc("data/processed/toc.json")

Parsing a local file

from quiz_gen import EURLexParser

with open("data/raw/regulation.html", "r", encoding="utf-8") as f:
    html_content = f.read()

parser = EURLexParser(html_content=html_content)
chunks, toc = parser.parse()

Understanding the output

Chunks

Each document is split into RegulationChunk objects, one per logical unit. Serialized to JSON, a chunk looks like this:

{
  "section_type": "article",
  "number": "1",
  "title": "Article 1 - Subject matter and objectives",
  "content": "1. The principal objective of this Regulation...",
  "hierarchy_path": ["Enacting Terms", "CHAPTER I - PRINCIPLES", "Article 1..."],
  "metadata": {"id": "art_1", "subtitle": "Subject matter and objectives"}
}

section_type value     Description
-------------------    ----------------------------
title                  Document title
citation               Combined citations block
recital                Individual recital
article                Article (main content unit)
annex                  Annex
concluding_formulas    Signatures and adoption info
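
To see how a document breaks down across these section types, the chunks can be tallied by their `section_type` field. This is a minimal sketch that assumes chunks are available as dictionaries, for example via `to_dict()` or loaded from the saved JSON. The function name and the sample data are illustrative and are not real parser output.

```python
from collections import Counter


def count_by_section_type(chunks):
    """Count chunk dictionaries per section_type value."""
    return Counter(chunk["section_type"] for chunk in chunks)


# Hand-written sample in the shape of the chunk example, not real parser output.
sample = [
    {"section_type": "title", "number": None},
    {"section_type": "recital", "number": "1"},
    {"section_type": "recital", "number": "2"},
    {"section_type": "article", "number": "1"},
]

for section_type, count in count_by_section_type(sample).most_common():
    print(f"{section_type}: {count}")
```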

Table of contents

A dictionary with a title key and a sections list describing the full document hierarchy, including chapter, section, and article nesting.
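
A nested structure like this is usually walked recursively. The sketch below is a guess at the shape rather than the documented schema: it assumes each entry in `sections` is a dictionary with a `title`, and that nested entries live under a key called `children`. That key name is hypothetical, so it is passed as a parameter and should be adjusted to match the real output.

```python
def iter_toc(sections, child_key="children", depth=0):
    """Yield (depth, title) pairs for every entry, depth-first."""
    for section in sections:
        yield depth, section.get("title", "")
        yield from iter_toc(section.get(child_key, []), child_key, depth + 1)


# Hand-written sample; "children" is an assumed key name, not confirmed by the docs.
toc = {
    "title": "Example Regulation",
    "sections": [
        {
            "title": "CHAPTER I - PRINCIPLES",
            "children": [{"title": "Article 1 - Subject matter and objectives"}],
        },
    ],
}

print(toc["title"])
for depth, title in iter_toc(toc["sections"]):
    print("  " * (depth + 1) + title)
```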

Filtering chunks

from quiz_gen import EURLexParser, SectionType

url = "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"
parser = EURLexParser(url=url)
chunks, toc = parser.parse()

articles = [c for c in chunks if c.section_type == SectionType.ARTICLE]
recitals = [c for c in chunks if c.section_type == SectionType.RECITAL]
print(f"Articles: {len(articles)}, Recitals: {len(recitals)}")

# Find a specific article by number (None if the document has no such article)
article_5 = next(
    (
        c for c in chunks
        if c.section_type == SectionType.ARTICLE and c.number == "5"
    ),
    None,
)
if article_5 is not None:
    print(article_5.content[:200])

Generating quiz questions

Quiz generation requires at least one AI provider API key. Set the relevant environment variable or create a .env file in the project root:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
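
Before starting a generation run, it can help to confirm that at least one key is visible to the process. The sketch below checks only the process environment and does not read a .env file. The function name `configured_keys` is illustrative, and the variable names are the two from the example above.

```python
import os

# Variable names taken from the example above; extend for other providers as needed.
KEY_NAMES = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_keys(names=KEY_NAMES, env=None):
    """Return the names that are set to a non-empty value in the environment."""
    env = os.environ if env is None else env
    return [name for name in names if env.get(name)]


if __name__ == "__main__":
    found = configured_keys()
    if found:
        print("Keys found:", ", ".join(found))
    else:
        print("No provider API key found in the environment.")
```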

Basic usage

from quiz_gen import EURLexParser
from quiz_gen.agents.workflow import QuizGenerationWorkflow
from quiz_gen.agents.config import AgentConfig

# Parse a document
parser = EURLexParser(url="https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139")
chunks, _ = parser.parse()

# Configure and run the workflow (reads keys from environment)
config = AgentConfig()
workflow = QuizGenerationWorkflow(config)

# Generate questions for one chunk (index 50 is an arbitrary pick and must exist in the document)
result = workflow.run(chunks[50].to_dict())

print(f"Judge decision: {result['judge_decision']}")
for q in result["final_questions"]:
    print(f"\n[{q['focus']}] {q['question']}")
    for letter, text in q["options"].items():
        print(f"  {letter}. {text}")
    print(f"  Correct: {q['correct_answer']}")
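
The printing loop above can be factored into a small helper that returns text instead of printing it, which makes the output easier to save or test. This sketch relies only on the keys used in the example (`focus`, `question`, `options`, `correct_answer`) and assumes `correct_answer` holds one of the option letters. The helper name and the sample question are placeholders, not real workflow output.

```python
def format_question(q):
    """Render one question dictionary as plain text, marking the correct option."""
    lines = [f"[{q['focus']}] {q['question']}"]
    for letter, text in q["options"].items():
        # Assumption: correct_answer is an option letter such as "A".
        marker = "*" if letter == q["correct_answer"] else " "
        lines.append(f" {marker} {letter}. {text}")
    return "\n".join(lines)


# Placeholder data in the shape used by the example, not real workflow output.
sample = {
    "focus": "conceptual",
    "question": "Placeholder question?",
    "options": {"A": "First option", "B": "Second option"},
    "correct_answer": "A",
}

print(format_question(sample))
```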

Configuring providers

Providers and models are independently configurable per agent:

config = AgentConfig(
    conceptual_provider="openai",
    conceptual_model="gpt-4o",
    practical_provider="anthropic",
    practical_model="claude-sonnet-4-20250514",
    validator_provider="openai",
    validator_model="gpt-4o",
    refiner_provider="openai",
    refiner_model="gpt-4o",
    judge_provider="anthropic",
    judge_model="claude-sonnet-4-20250514",
)

Supported provider values: openai, anthropic, google, mistral, cohere.

See Agents for the full configuration reference.
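
When every agent should use the same provider and model, the ten keyword arguments can be generated instead of typed out. The sketch below builds them from the parameter names and provider values shown above. The helper `uniform_agent_kwargs` is hypothetical and not part of quiz-gen, and whether `AgentConfig` performs its own validation is not covered here.

```python
# Agent roles and provider values as they appear in the configuration example.
ROLES = ("conceptual", "practical", "validator", "refiner", "judge")
SUPPORTED_PROVIDERS = {"openai", "anthropic", "google", "mistral", "cohere"}


def uniform_agent_kwargs(provider, model):
    """Build keyword arguments that assign one provider and model to every agent."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {provider!r}")
    kwargs = {}
    for role in ROLES:
        kwargs[f"{role}_provider"] = provider
        kwargs[f"{role}_model"] = model
    return kwargs


kwargs = uniform_agent_kwargs("openai", "gpt-4o")
print(sorted(kwargs))
# With quiz-gen installed, these could be passed on as AgentConfig(**kwargs).
```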

Using the web UI

The web UI is built into the package. Launch it with:

quiz-gen --ui

The browser opens automatically at http://localhost:8000. Additional options:

# Custom port
quiz-gen --ui --port 9000

# Skip automatic browser opening
quiz-gen --ui --no-browser

# Bind to localhost only
quiz-gen --ui --host 127.0.0.1

# Set server log level
quiz-gen --ui --log-level info

The UI supports document parsing by URL or file upload, TOC navigation, chunk preview, and quiz generation with configurable AI providers.

Common patterns

Batch processing multiple documents

from pathlib import Path
from quiz_gen import EURLexParser

for path in Path("data/raw").glob("*.html"):
    with open(path, encoding="utf-8") as f:
        html = f.read()
    parser = EURLexParser(html_content=html)
    chunks, toc = parser.parse()
    parser.save_chunks(f"data/processed/{path.stem}_chunks.json")
    parser.save_toc(f"data/processed/{path.stem}_toc.json")
    print(f"Processed {path.name}: {len(chunks)} chunks")

Preview TOC without saving

quiz-gen --print-toc --no-save regulation.html

Or in Python, on a parser that has already run parse():

parser.print_toc()

Search chunk content

# Find all articles mentioning "safety"
# (chunks and SectionType come from the "Filtering chunks" example)
safety_articles = [
    c for c in chunks
    if c.section_type == SectionType.ARTICLE
    and "safety" in c.content.lower()
]

for article in safety_articles:
    print(f"Article {article.number}: {article.title}")
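
A plain substring test also matches longer words that contain the term. Where that matters, a whole-word, case-insensitive match is one alternative. The sketch below uses the standard `re` module and relies only on each chunk having a `content` attribute. The helper `find_chunks` and the stand-in objects are illustrative.

```python
import re
from types import SimpleNamespace


def find_chunks(chunks, term):
    """Return chunks whose content contains `term` as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [c for c in chunks if pattern.search(c.content)]


# Stand-in objects with a .content attribute; real chunks come from parser.parse().
sample = [
    SimpleNamespace(number="1", content="Rules on aviation safety."),
    SimpleNamespace(number="2", content="Unsafety is not a match."),
]

print([c.number for c in find_chunks(sample, "safety")])
```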

Next steps

  • Parsers — full EUR-Lex parser reference and output structure
  • Agents — multi-agent pipeline architecture and configuration
  • CLI — complete CLI reference
  • API Reference — full class and method documentation
  • Examples — complete working examples