Command-Line Interface¶

The Quiz-Gen CLI parses EUR-Lex documents from a URL or a local HTML file and extracts their structured content as JSON.

Installation¶

The CLI is automatically available after installing the package:

pip install quiz-gen

Verify installation:

quiz-gen --version

Basic Usage¶

Quick Start¶

Parse a document from URL:

quiz-gen "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"

Parse a local HTML file:

quiz-gen data/documents/regulation.html

Command Syntax¶

quiz-gen [OPTIONS] INPUT

Arguments:

- INPUT - URL or file path to EUR-Lex HTML document (required)

Options Reference¶

Input/Output Options¶

`INPUT` (required)¶

The source document to parse. Can be:

- URL: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139
- Local file: data/documents/regulation.html or /absolute/path/to/file.html

# Parse from URL (quoted so the shell does not treat "?" as a glob character)
quiz-gen "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"

# Parse from relative path
quiz-gen data/documents/regulation.html

# Parse from absolute path
quiz-gen /Users/username/Documents/regulation.html

`-o, --output DIRECTORY`¶

Output directory for generated JSON files.

Default: data/processed

# Save to custom directory
quiz-gen --output results regulation.html

# Save to absolute path
quiz-gen --output /Users/username/output regulation.html

# Save to current directory
quiz-gen --output . regulation.html

The directory will be created automatically if it doesn't exist.

`--chunks FILENAME`¶

Custom filename for chunks JSON output.

Default: <document-id>_chunks.json

# Custom chunks filename
quiz-gen --chunks my_articles.json regulation.html

# Will create: data/processed/my_articles.json

When not specified, the filename is generated from:

- URL: CELEX number (e.g., 32018R1139_chunks.json)
- File: File stem (e.g., regulation_chunks.json)
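The naming rule above can be approximated in a few lines of Python, for example to predict output paths in a script before running the CLI. This is a sketch of the documented rule only, not the tool's actual implementation; the function names `document_id` and `default_filenames` are hypothetical, and the `_toc.json` suffix follows the default documented for `--toc`.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlparse


def document_id(source: str) -> str:
    """Approximate the documented rule: CELEX number for URLs, file stem for paths."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # EUR-Lex URLs carry the identifier as ?uri=CELEX:<number>
        for value in parse_qs(parsed.query).get("uri", []):
            prefix, _, number = value.partition(":")
            if prefix.upper() == "CELEX" and number:
                return number
        raise ValueError(f"no CELEX number found in URL: {source}")
    # Anything that is not an http(s) URL is treated as a file path.
    return PurePosixPath(source).stem


def default_filenames(source: str) -> tuple[str, str]:
    """Return the (chunks, toc) filenames expected for a given input."""
    doc_id = document_id(source)
    return f"{doc_id}_chunks.json", f"{doc_id}_toc.json"
```

Joining these names with the `--output` directory gives the paths a follow-up step would read.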

`--toc FILENAME`¶

Custom filename for table of contents JSON output.

Default: <document-id>_toc.json

# Custom TOC filename
quiz-gen --toc my_structure.json regulation.html

# Will create: data/processed/my_structure.json

`--no-save`¶

Parse document and show statistics but don't save any files.

# Preview parsing results without saving
quiz-gen --no-save regulation.html

Useful for:

- Testing document compatibility
- Previewing chunk counts
- Checking document structure

Display Options¶

`--print-toc`¶

Print formatted table of contents to console after parsing.

# Show TOC in console
quiz-gen --print-toc regulation.html

Output example:

Regulation (EU) 2018/1139
├── Preamble
│   ├── Citation
│   └── Recitals (88)
├── Enacting Terms
│   ├── CHAPTER I - GENERAL PROVISIONS
│   │   ├── Article 1 - Subject matter and scope
│   │   └── Article 2 - Definitions
...

Can be combined with --no-save to only display TOC:

quiz-gen --print-toc --no-save regulation.html

`--verbose`¶

Enable detailed output showing parsing progress and errors.

quiz-gen --verbose regulation.html

Output includes:

- Document fetching/reading status
- Parsing progress for each section
- Detailed error messages with stack traces
- File saving confirmations

Example output:

Fetching document from URL: https://eur-lex.europa.eu/...
Parsing document...

 Successfully parsed document
  Title: Regulation (EU) 2018/1139 of the European Parliament...
  Total chunks: 242
    article: 141
    recital: 88
    annex: 10
    citation: 1
    concluding_formulas: 1
    title: 1

 Files saved to: data/processed

When an error occurs in verbose mode, a full stack trace is printed to help diagnose the problem.
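If a script needs the chunk statistics as data, the console summary can be parsed. The sketch below assumes the output keeps the layout of the example above (a "Total chunks" line followed by one "type: count" line per chunk type); that layout is taken from the example, not from a documented contract, and `parse_chunk_stats` is a hypothetical helper name.

```python
import re

# Sample text copied from the example output above.
SAMPLE_OUTPUT = """\
 Successfully parsed document
  Title: Regulation (EU) 2018/1139 of the European Parliament...
  Total chunks: 242
    article: 141
    recital: 88
    annex: 10
    citation: 1
    concluding_formulas: 1
    title: 1
"""


def parse_chunk_stats(output: str) -> tuple[int, dict[str, int]]:
    """Return (total, per-type counts) from the console summary."""
    total = None
    counts: dict[str, int] = {}
    for line in output.splitlines():
        stripped = line.strip()
        if total is None:
            # Skip everything until the "Total chunks" line.
            match = re.fullmatch(r"Total chunks: (\d+)", stripped)
            if match:
                total = int(match.group(1))
            continue
        match = re.fullmatch(r"([a-z_]+): (\d+)", stripped)
        if match is None:
            break  # a blank line or other text ends the statistics block
        counts[match.group(1)] = int(match.group(2))
    if total is None:
        raise ValueError("no 'Total chunks' line found")
    return total, counts


total, counts = parse_chunk_stats(SAMPLE_OUTPUT)
```

For anything beyond a quick check, reading the saved chunks JSON is likely more robust than parsing console text.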

`-v, --version`¶

Display version information and exit.

quiz-gen --version

Output:

quiz-gen 0.5.2

`-h, --help`¶

Show help message with all options and examples.

quiz-gen --help

Web UI Options¶

The following options are only relevant when using --ui.

`--ui`¶

Launch the built-in web UI served by uvicorn. Opens the browser automatically.

quiz-gen --ui

`--host HOST`¶

Host address to bind the UI server to.

Default: 0.0.0.0 (all interfaces)

# Bind to localhost only
quiz-gen --ui --host 127.0.0.1

`--port PORT`¶

Port for the UI server.

Default: 8000

quiz-gen --ui --port 9000

`--reload`¶

Enable auto-reload for development (restarts the server on code changes).

quiz-gen --ui --reload

`--no-browser`¶

Prevent the CLI from automatically opening a browser tab.

quiz-gen --ui --no-browser

`--log-level LEVEL`¶

Log verbosity for the uvicorn server.

Default: warning

Choices: debug, info, warning, error

quiz-gen --ui --log-level info

Examples¶

Basic Parsing¶

Parse a regulation from EUR-Lex:

quiz-gen "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"

Output:

- data/processed/32018R1139_chunks.json
- data/processed/32018R1139_toc.json

Custom Output Location¶

Save to specific directory:

quiz-gen --output regulations/easa regulation.html

Output:

- regulations/easa/regulation_chunks.json
- regulations/easa/regulation_toc.json

Custom Filenames¶

Use custom names for both output files:

quiz-gen --output data \
  --chunks easa_articles.json \
  --toc easa_structure.json \
  regulation.html

Output:

- data/easa_articles.json
- data/easa_structure.json

Preview Without Saving¶

Check document structure before saving:

quiz-gen --print-toc --no-save regulation.html

Shows full TOC in console without creating files.

Batch Processing¶

Process multiple documents:

# Using a shell loop
for file in data/documents/*.html; do
  quiz-gen --output data/processed "$file"
done

Or with custom naming:

#!/bin/bash
for file in data/documents/*.html; do
  base=$(basename "$file" .html)
  quiz-gen --output data/processed \
    --chunks "${base}_articles.json" \
    --toc "${base}_toc.json" \
    "$file"
done
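The same batch with custom naming can be driven from Python. The sketch below only builds the argument lists, using the flags documented above, so the plan can be inspected before anything runs; `plan_batch` is a hypothetical helper name.

```python
from pathlib import Path


def plan_batch(html_files, output_dir="data/processed"):
    """Build one quiz-gen argument list per input file, named after its stem."""
    commands = []
    for html in sorted(Path(p) for p in html_files):
        base = html.stem  # same role as $(basename "$file" .html) in the shell loop
        commands.append([
            "quiz-gen",
            "--output", output_dir,
            "--chunks", f"{base}_articles.json",
            "--toc", f"{base}_toc.json",
            str(html),
        ])
    return commands
```

Each list can be passed to `subprocess.run(cmd)`; checking `returncode` per file keeps one failing document from hiding the rest.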

Pipeline Integration¶

Use in data processing pipelines:

# Download, parse, then remove the temporary file
# (-f makes curl exit non-zero on HTTP errors, so the chain stops early)
curl -fsS "https://eur-lex.europa.eu/...uri=CELEX:32018R1139" > temp.html && \
  quiz-gen --output . temp.html && \
  rm temp.html

Check if parsing succeeded:

quiz-gen regulation.html
if [ $? -eq 0 ]; then
  echo "Parsing successful"
  # Continue processing...
else
  echo "Parsing failed"
  exit 1
fi

Verbose Mode for Debugging¶

Get detailed output for troubleshooting:

quiz-gen --verbose --print-toc regulation.html

Shows:

- Detailed parsing progress
- Chunk type counts
- Full TOC structure
- Error stack traces (if any)

Current Directory Output¶

Save files in current working directory:

quiz-gen --output . regulation.html

Output Files¶

Chunks JSON¶

Contains all document content split into logical chunks.

Filename pattern: <document-id>_chunks.json

Structure:

[
  {
    "section_type": "title",
    "number": null,
    "title": "Regulation (EU) 2018/1139",
    "subtitle": null,
    "content": "Regulation (EU) 2018/1139 of the European Parliament...",
    "navigation_id": "title",
    "hierarchy_path": []
  },
  {
    "section_type": "article",
    "number": "1",
    "title": "Subject matter and scope",
    "subtitle": null,
    "content": "This Regulation lays down common rules...",
    "navigation_id": "art_1",
    "hierarchy_path": ["Enacting Terms", "CHAPTER I", "Article 1"]
  }
]
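Because the chunks file is a flat list of objects, it is straightforward to work with in Python. The sketch below uses in-memory sample data shaped like the structure above; the helper names `count_by_type` and `find_chunk` are hypothetical.

```python
from collections import Counter

# Sample data shaped like the structure above (content shortened).
# A real file could be loaded with json.load() instead.
chunks = [
    {
        "section_type": "title",
        "number": None,
        "title": "Regulation (EU) 2018/1139",
        "subtitle": None,
        "content": "Regulation (EU) 2018/1139 of the European Parliament...",
        "navigation_id": "title",
        "hierarchy_path": [],
    },
    {
        "section_type": "article",
        "number": "1",
        "title": "Subject matter and scope",
        "subtitle": None,
        "content": "This Regulation lays down common rules...",
        "navigation_id": "art_1",
        "hierarchy_path": ["Enacting Terms", "CHAPTER I", "Article 1"],
    },
]


def count_by_type(chunks):
    """Count chunks per section_type."""
    return Counter(chunk["section_type"] for chunk in chunks)


def find_chunk(chunks, navigation_id):
    """Return the first chunk with the given navigation_id, or None."""
    return next((c for c in chunks if c["navigation_id"] == navigation_id), None)
```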

Table of Contents JSON¶

Hierarchical navigation structure of the document.

Filename pattern: <document-id>_toc.json

Structure:

{
  "title": "Regulation (EU) 2018/1139",
  "hierarchy": {
    "Preamble": {
      "Citation": {
        "id": "cit_1",
        "type": "citation"
      },
      "Recital 1": {
        "id": "rec_1",
        "type": "recital"
      }
    },
    "Enacting Terms": {
      "CHAPTER I - GENERAL PROVISIONS": {
        "Article 1": {
          "id": "art_1",
          "type": "article"
        }
      }
    }
  }
}
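The nested hierarchy can be flattened with a short recursive walk. The sketch below assumes, based only on the example above, that a leaf is a mapping with string `id` and `type` fields and that every other mapping is a container; `iter_toc_entries` is a hypothetical helper name.

```python
# Sample data copied from the structure above.
toc = {
    "title": "Regulation (EU) 2018/1139",
    "hierarchy": {
        "Preamble": {
            "Citation": {"id": "cit_1", "type": "citation"},
            "Recital 1": {"id": "rec_1", "type": "recital"},
        },
        "Enacting Terms": {
            "CHAPTER I - GENERAL PROVISIONS": {
                "Article 1": {"id": "art_1", "type": "article"},
            },
        },
    },
}


def iter_toc_entries(node, path=()):
    """Yield (path, id, type) for every leaf entry, depth first."""
    for name, child in node.items():
        if not isinstance(child, dict):
            continue
        # Assumption: leaves carry string "id" and "type" fields.
        if isinstance(child.get("id"), str) and isinstance(child.get("type"), str):
            yield path + (name,), child["id"], child["type"]
        else:
            yield from iter_toc_entries(child, path + (name,))


entries = list(iter_toc_entries(toc["hierarchy"]))
```

Each entry's `id` matches the `navigation_id` of a chunk in the example data, which suggests the two files can be joined on that value.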

Exit Codes¶

The CLI returns standard exit codes:

0 - Success: Document parsed and files saved
1 - Error: Parsing failed or invalid input

Use in scripts:

if quiz-gen regulation.html; then
  echo "Success"
else
  echo "Failed with exit code $?"
fi

Error Handling¶

Common Errors¶

File Not Found¶

Error: File not found: data/regulation.html

Solution: Check file path is correct and file exists.

Invalid URL¶

Error: Invalid URL or empty document

Solutions:

- Verify URL is correct and accessible
- Check internet connection
- Try downloading HTML and parsing locally

Permission Denied¶

Error: [Errno 13] Permission denied: 'data/processed'

Solutions:

# Create directory manually
mkdir -p data/processed

# Or use writable location
quiz-gen --output ~/Documents regulation.html

Parse Errors¶

Error: Failed to parse document structure

Solution: Use --verbose to see detailed error:

quiz-gen --verbose regulation.html

Debugging Tips¶

1. Use verbose mode to see what's happening:
   quiz-gen --verbose regulation.html
2. Preview without saving to test parsing:
   quiz-gen --no-save regulation.html
3. Check document structure with TOC:
   quiz-gen --print-toc --no-save regulation.html
4. Test with a known document to verify installation:
   quiz-gen "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"

Performance¶

Typical Processing Times¶

Document Size	Articles	Processing Time
Small (< 50 articles)	< 50	< 5 seconds
Medium (50-150 articles)	50-150	5-15 seconds
Large (> 150 articles)	> 150	15-30 seconds

Times include:

- Document download/reading
- HTML parsing
- Content extraction
- Text cleaning
- JSON serialization

Memory Usage¶

Memory usage scales with document size:

Small documents: < 50 MB
Medium documents: 50-100 MB
Large documents: 100-200 MB

The parser processes documents in memory, so ensure adequate RAM for large documents.

Network Performance¶

For URL parsing:

- Download time depends on internet speed
- EUR-Lex documents are typically 200 KB - 2 MB
- Use local files for batch processing to avoid network overhead

Integration¶

Python Scripts¶

import subprocess
import sys

result = subprocess.run(
    ["quiz-gen", "--output", "data", "regulation.html"],
    capture_output=True,
    text=True
)

if result.returncode == 0:
    print("Success:", result.stdout)
else:
    print("Error:", result.stderr, file=sys.stderr)

Makefiles¶

# Recipe lines must start with a tab character, not spaces
.PHONY: parse-all parse-verbose

parse-all:
	@for file in data/raw/*.html; do \
		quiz-gen --output data/processed "$$file"; \
	done

parse-verbose:
	quiz-gen --verbose --print-toc "$(FILE)"

CI/CD Pipelines¶

# GitHub Actions example
- name: Parse EUR-Lex documents
  run: |
    pip install quiz-gen
    quiz-gen --output artifacts regulation.html

- name: Upload results
  uses: actions/upload-artifact@v4
  with:
    name: parsed-documents
    path: artifacts/*.json

Advanced Usage¶

Environment Variables¶

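The CLI does not read environment variables itself, but you can pass shell variables as arguments:

OUTPUT_DIR="data/processed"
VERBOSE_FLAG="--verbose"

# OUTPUT_DIR is quoted so paths with spaces work;
# VERBOSE_FLAG is left unquoted so an empty value expands to nothing
quiz-gen $VERBOSE_FLAG --output "$OUTPUT_DIR" regulation.html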

Process Substitution¶

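Parse from curl output (requires a shell with process substitution, such as bash or zsh). The input is then a file-descriptor path rather than a named file, so consider setting --chunks and --toc explicitly to get meaningful filenames: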

quiz-gen <(curl -s "https://eur-lex.europa.eu/...uri=CELEX:32018R1139")

JSON Processing¶

Filter the console statistics with grep:

# Show the article count from the statistics
quiz-gen --no-save regulation.html 2>&1 | grep "article:"

Or process the saved JSON with jq:

quiz-gen regulation.html
jq '[.[] | select(.section_type == "article")] | length' \
  data/processed/regulation_chunks.json

Parallel Processing¶

Process multiple documents in parallel:

# GNU parallel
parallel quiz-gen --output data/processed ::: data/raw/*.html

# xargs (macOS/Linux); NUL-delimited so paths with spaces are handled
printf '%s\0' data/raw/*.html | xargs -0 -n 1 -P 4 quiz-gen --output data/processed
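Where GNU parallel or xargs is not available, a thread pool in Python gives a similar result, since each worker thread only waits on an external quiz-gen process. This is a minimal sketch; `run_quiz_gen` and `parse_in_parallel` are hypothetical helper names, and only the documented `--output` flag is used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_quiz_gen(path, output_dir="data/processed"):
    """Run quiz-gen on one file; True when it exits with code 0."""
    result = subprocess.run(
        ["quiz-gen", "--output", output_dir, str(path)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def parse_in_parallel(paths, run_one=run_quiz_gen, workers=4):
    """Run run_one over all paths with a bounded pool; map each path to its result."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        results = list(pool.map(run_one, paths))
    return dict(zip(paths, results))
```

The `run_one` parameter makes the function easy to dry-run with a stub before pointing it at real documents, and the returned mapping shows which files need a second look.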

Best Practices¶

File Organization¶

Recommended directory structure:

project/
├── data/
│   ├── raw/              # Original HTML files
│   ├── processed/        # Parsed JSON output (default)
│   └── documents/
│       └── html/         # Downloaded documents
├── scripts/
│   └── parse_all.sh      # Batch processing scripts
└── results/              # Final analysis output

Naming Conventions¶

Use consistent naming for outputs:

# Good: includes document ID
quiz-gen --chunks 2018_1139_content.json regulation.html

# Better: includes date and version
quiz-gen --chunks 2018_1139_v1_20260118_content.json regulation.html

Error Handling in Scripts¶

#!/bin/bash
set -e  # Abort on unexpected errors; quiz-gen failures are handled by the if below

for file in data/raw/*.html; do
  if ! quiz-gen --output data/processed "$file"; then
    echo "Failed to parse: $file" >> errors.log
  fi
done

Version Pinning¶

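For reproducible environments, pin the version you have tested (the version number below is illustrative; check yours with quiz-gen --version):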

# requirements.txt
quiz-gen==0.1.1

pip install -r requirements.txt

Troubleshooting¶

CLI Not Found¶

quiz-gen: command not found