Command-Line Interface¶
The Quiz-Gen CLI parses EUR-Lex documents and extracts their structured content from the command line.
Installation¶
The CLI is automatically available after installing the package:
pip install quiz-gen
Verify installation:
quiz-gen --version
Basic Usage¶
Quick Start¶
Parse a document from a URL (quote it so the shell does not treat ? as a wildcard):
quiz-gen "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"
Parse a local HTML file:
quiz-gen data/documents/regulation.html
Command Syntax¶
quiz-gen [OPTIONS] INPUT
Arguments:
- INPUT - URL or file path to EUR-Lex HTML document (required)
Options Reference¶
Input/Output Options¶
INPUT (required)¶
The source document to parse. Can be:
- URL: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139
- Local file: data/documents/regulation.html or /absolute/path/to/file.html
# Parse from URL
quiz-gen "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"
# Parse from relative path
quiz-gen data/documents/regulation.html
# Parse from absolute path
quiz-gen /Users/username/Documents/regulation.html
-o, --output DIRECTORY¶
Output directory for generated JSON files.
Default: data/processed
# Save to custom directory
quiz-gen --output results regulation.html
# Save to absolute path
quiz-gen --output /Users/username/output regulation.html
# Save to current directory
quiz-gen --output . regulation.html
The directory will be created automatically if it doesn't exist.
--chunks FILENAME¶
Custom filename for chunks JSON output.
Default: <document-id>_chunks.json
# Custom chunks filename
quiz-gen --chunks my_articles.json regulation.html
# Will create: data/processed/my_articles.json
When not specified, the filename is generated from:
- URL: CELEX number (e.g., 32018R1139_chunks.json)
- File: File stem (e.g., regulation_chunks.json)
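The naming rule above can be sketched in Python. This is an illustrative reconstruction, not the package's actual implementation: the helper name `document_id` and the way the `uri` query parameter is read are assumptions.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def document_id(source: str) -> str:
    """Guess the ID used for default output filenames (hypothetical helper)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # EUR-Lex URLs carry the CELEX number in the `uri` query parameter,
        # e.g. ?uri=CELEX:32018R1139
        for value in parse_qs(parsed.query).get("uri", []):
            if value.startswith("CELEX:"):
                return value[len("CELEX:"):]
        raise ValueError(f"No CELEX number found in URL: {source}")
    # Local files: use the file stem (name without directory or extension)
    return Path(source).stem


url = "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"
print(document_id(url) + "_chunks.json")
print(document_id("data/documents/regulation.html") + "_chunks.json")
```

The two printed names match the examples in the list above.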
--toc FILENAME¶
Custom filename for table of contents JSON output.
Default: <document-id>_toc.json
# Custom TOC filename
quiz-gen --toc my_structure.json regulation.html
# Will create: data/processed/my_structure.json
--no-save¶
Parse document and show statistics but don't save any files.
# Preview parsing results without saving
quiz-gen --no-save regulation.html
Useful for:
- Testing document compatibility
- Previewing chunk counts
- Checking document structure
Display Options¶
--print-toc¶
Print formatted table of contents to console after parsing.
# Show TOC in console
quiz-gen --print-toc regulation.html
Output example:
Regulation (EU) 2018/1139
├── Preamble
│   ├── Citation
│   └── Recitals (88)
├── Enacting Terms
│   ├── CHAPTER I - GENERAL PROVISIONS
│   │   ├── Article 1 - Subject matter and scope
│   │   └── Article 2 - Definitions
...
Can be combined with --no-save to only display TOC:
quiz-gen --print-toc --no-save regulation.html
--verbose¶
Enable detailed output showing parsing progress and errors.
quiz-gen --verbose regulation.html
Output includes:
- Document fetching/reading status
- Parsing progress for each section
- Detailed error messages with stack traces
- File saving confirmations
Example output:
Fetching document from URL: https://eur-lex.europa.eu/...
Parsing document...
Successfully parsed document
Title: Regulation (EU) 2018/1139 of the European Parliament...
Total chunks: 242
article: 141
recital: 88
annex: 10
citation: 1
concluding_formulas: 1
title: 1
Files saved to: data/processed
When an error occurs in verbose mode, a full stack trace is printed to help diagnose the problem.
-v, --version¶
Display version information and exit.
quiz-gen --version
Output:
quiz-gen 0.5.2
-h, --help¶
Show help message with all options and examples.
quiz-gen --help
Web UI Options¶
The following options are only relevant when using --ui.
--ui¶
Launch the built-in web UI served by uvicorn. Opens the browser automatically.
quiz-gen --ui
--host HOST¶
Host address to bind the UI server to.
Default: 0.0.0.0 (all interfaces)
# Bind to localhost only
quiz-gen --ui --host 127.0.0.1
--port PORT¶
Port for the UI server.
Default: 8000
quiz-gen --ui --port 9000
--reload¶
Enable auto-reload for development (restarts the server on code changes).
quiz-gen --ui --reload
--no-browser¶
Prevent the CLI from automatically opening a browser tab.
quiz-gen --ui --no-browser
--log-level LEVEL¶
Log verbosity for the uvicorn server.
Default: warning
Choices: debug, info, warning, error
quiz-gen --ui --log-level info
Examples¶
Basic Parsing¶
Parse a regulation from EUR-Lex:
quiz-gen "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"
Output:
- data/processed/32018R1139_chunks.json
- data/processed/32018R1139_toc.json
Custom Output Location¶
Save to specific directory:
quiz-gen --output regulations/easa regulation.html
Output:
- regulations/easa/regulation_chunks.json
- regulations/easa/regulation_toc.json
Custom Filenames¶
Use custom names for both output files:
quiz-gen --output data \
    --chunks easa_articles.json \
    --toc easa_structure.json \
    regulation.html
Output:
- data/easa_articles.json
- data/easa_structure.json
Preview Without Saving¶
Check document structure before saving:
quiz-gen --print-toc --no-save regulation.html
Shows full TOC in console without creating files.
Batch Processing¶
Process multiple documents:
# Using a shell loop
for file in data/documents/*.html; do
    quiz-gen --output data/processed "$file"
done
Or with custom naming:
#!/bin/bash
for file in data/documents/*.html; do
    base=$(basename "$file" .html)
    quiz-gen --output data/processed \
        --chunks "${base}_articles.json" \
        --toc "${base}_toc.json" \
        "$file"
done
Pipeline Integration¶
Use in data processing pipelines:
# Download, parse, and extract articles
curl -s "https://eur-lex.europa.eu/...uri=CELEX:32018R1139" > temp.html && \
quiz-gen --output . temp.html && \
rm temp.html
Check if parsing succeeded:
quiz-gen regulation.html
if [ $? -eq 0 ]; then
    echo "Parsing successful"
    # Continue processing...
else
    echo "Parsing failed"
    exit 1
fi
Verbose Mode for Debugging¶
Get detailed output for troubleshooting:
quiz-gen --verbose --print-toc regulation.html
Shows:
- Detailed parsing progress
- Chunk type counts
- Full TOC structure
- Error stack traces (if any)
Current Directory Output¶
Save files in current working directory:
quiz-gen --output . regulation.html
Output Files¶
Chunks JSON¶
Contains all document content split into logical chunks.
Filename pattern: <document-id>_chunks.json
Structure:
[
  {
    "section_type": "title",
    "number": null,
    "title": "Regulation (EU) 2018/1139",
    "subtitle": null,
    "content": "Regulation (EU) 2018/1139 of the European Parliament...",
    "navigation_id": "title",
    "hierarchy_path": []
  },
  {
    "section_type": "article",
    "number": "1",
    "title": "Subject matter and scope",
    "subtitle": null,
    "content": "This Regulation lays down common rules...",
    "navigation_id": "art_1",
    "hierarchy_path": ["Enacting Terms", "CHAPTER I", "Article 1"]
  }
]
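Because the chunks file is a flat JSON array, it can be summarized with a few lines of Python. The sketch below assumes only the field names shown in the structure above; the helper names `count_by_type` and `find_chunk` are illustrative, and the inline sample stands in for a real output file.

```python
import json
from collections import Counter

# Inline sample mirroring the structure shown above. In practice, read the
# saved file instead, e.g. data/processed/regulation_chunks.json.
sample = """
[
  {"section_type": "title", "number": null, "navigation_id": "title",
   "hierarchy_path": []},
  {"section_type": "article", "number": "1", "navigation_id": "art_1",
   "hierarchy_path": ["Enacting Terms", "CHAPTER I", "Article 1"]}
]
"""


def count_by_type(chunks):
    """Count chunks per section_type (article, recital, ...)."""
    return Counter(chunk["section_type"] for chunk in chunks)


def find_chunk(chunks, navigation_id):
    """Return the first chunk with the given navigation_id, or None."""
    return next((c for c in chunks if c["navigation_id"] == navigation_id), None)


chunks = json.loads(sample)
print(count_by_type(chunks))
print(find_chunk(chunks, "art_1")["hierarchy_path"])
```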
Table of Contents JSON¶
Hierarchical navigation structure of the document.
Filename pattern: <document-id>_toc.json
Structure:
{
  "title": "Regulation (EU) 2018/1139",
  "hierarchy": {
    "Preamble": {
      "Citation": {
        "id": "cit_1",
        "type": "citation"
      },
      "Recital 1": {
        "id": "rec_1",
        "type": "recital"
      }
    },
    "Enacting Terms": {
      "CHAPTER I - GENERAL PROVISIONS": {
        "Article 1": {
          "id": "art_1",
          "type": "article"
        }
      }
    }
  }
}
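The nested hierarchy can be flattened with a short recursive walk. This sketch assumes that leaf entries are exactly the objects carrying both `id` and `type`, as in the example above; real files may contain other node shapes. The function name `iter_entries` is illustrative.

```python
import json

# Inline sample copied from the structure shown above.
toc = json.loads("""
{
  "title": "Regulation (EU) 2018/1139",
  "hierarchy": {
    "Preamble": {
      "Citation": {"id": "cit_1", "type": "citation"},
      "Recital 1": {"id": "rec_1", "type": "recital"}
    },
    "Enacting Terms": {
      "CHAPTER I - GENERAL PROVISIONS": {
        "Article 1": {"id": "art_1", "type": "article"}
      }
    }
  }
}
""")


def iter_entries(node, path=()):
    """Yield (path, id, type) for every leaf entry in the hierarchy."""
    for name, child in node.items():
        if not isinstance(child, dict):
            continue  # skip anything that is not a nested object
        if "id" in child and "type" in child:
            yield path + (name,), child["id"], child["type"]
        else:
            # Intermediate level (e.g. a chapter): descend further
            yield from iter_entries(child, path + (name,))


entries = list(iter_entries(toc["hierarchy"]))
for path, entry_id, entry_type in entries:
    print(" > ".join(path), entry_id, entry_type)
```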
Exit Codes¶
The CLI returns standard exit codes:
- 0 - Success: Document parsed and files saved
- 1 - Error: Parsing failed or invalid input
Use in scripts:
if quiz-gen regulation.html; then
    echo "Success"
else
    echo "Failed with exit code $?"
fi
Error Handling¶
Common Errors¶
File Not Found¶
Error: File not found: data/regulation.html
Solution: Check that the file path is correct and the file exists.
Invalid URL¶
Error: Invalid URL or empty document
Solutions:
- Verify URL is correct and accessible
- Check internet connection
- Try downloading HTML and parsing locally
Permission Denied¶
Error: [Errno 13] Permission denied: 'data/processed'
Solutions:
# Create directory manually
mkdir -p data/processed
# Or use writable location
quiz-gen --output ~/Documents regulation.html
Parse Errors¶
Error: Failed to parse document structure
Solution: Use --verbose to see detailed error:
quiz-gen --verbose regulation.html
Debugging Tips¶
- Use verbose mode to see what's happening:
quiz-gen --verbose regulation.html
- Preview without saving to test parsing:
quiz-gen --no-save regulation.html
- Check document structure with TOC:
quiz-gen --print-toc --no-save regulation.html
- Test with a known document to verify installation:
quiz-gen "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32018R1139"
Performance¶
Typical Processing Times¶
| Document Size | Articles | Processing Time |
|---|---|---|
| Small (< 50 articles) | < 50 | < 5 seconds |
| Medium (50-150 articles) | 50-150 | 5-15 seconds |
| Large (> 150 articles) | > 150 | 15-30 seconds |
Times include:
- Document download/reading
- HTML parsing
- Content extraction
- Text cleaning
- JSON serialization
Memory Usage¶
Memory usage scales with document size:
- Small documents: < 50 MB
- Medium documents: 50-100 MB
- Large documents: 100-200 MB
The parser processes documents in memory, so ensure adequate RAM for large documents.
Network Performance¶
For URL parsing:
- Download time depends on internet speed
- EUR-Lex documents are typically 200 KB - 2 MB
- Use local files for batch processing to avoid network overhead
Integration¶
Python Scripts¶
import subprocess
import sys
# Run the CLI and capture its output as text
result = subprocess.run(
    ["quiz-gen", "--output", "data", "regulation.html"],
    capture_output=True,
    text=True,
)
if result.returncode == 0:
    print("Success:", result.stdout)
else:
    print("Error:", result.stderr, file=sys.stderr)
Makefiles¶
.PHONY: parse-all parse-verbose
parse-all:
	@for file in data/raw/*.html; do \
		quiz-gen --output data/processed "$$file"; \
	done
parse-verbose:
	quiz-gen --verbose --print-toc $(FILE)
CI/CD Pipelines¶
# GitHub Actions example
- name: Parse EUR-Lex documents
  run: |
    pip install quiz-gen
    quiz-gen --output artifacts regulation.html
- name: Upload results
  uses: actions/upload-artifact@v4
  with:
    name: parsed-documents
    path: artifacts/*.json
Advanced Usage¶
Environment Variables¶
While not directly supported, you can use shell variables:
OUTPUT_DIR="data/processed"
VERBOSE_FLAG="--verbose"
quiz-gen $VERBOSE_FLAG --output "$OUTPUT_DIR" regulation.html
Process Substitution¶
Parse from curl output:
quiz-gen <(curl -s "https://eur-lex.europa.eu/...uri=CELEX:32018R1139")
JSON Processing¶
Filter the console statistics with grep:
# Show the total article count
quiz-gen --no-save regulation.html 2>&1 | grep "article:"
Or process the saved JSON with jq:
quiz-gen regulation.html
jq '[.[] | select(.section_type == "article")] | length' \
data/processed/regulation_chunks.json
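For a per-chapter breakdown of the saved chunks, the `hierarchy_path` field can be used. The sketch below assumes the chapter label is the second element of `hierarchy_path`, as in the sample article chunk shown under Output Files; documents with parts, titles, or sections may nest differently. The sample data (including CHAPTER II and Articles 2 and 3) is made up for illustration.

```python
from collections import Counter


def articles_per_chapter(chunks):
    """Count article chunks per chapter label taken from hierarchy_path[1]."""
    counts = Counter()
    for chunk in chunks:
        path = chunk.get("hierarchy_path", [])
        if chunk.get("section_type") == "article" and len(path) >= 2:
            counts[path[1]] += 1
    return counts


# Made-up sample in the shape of the chunks JSON
sample_chunks = [
    {"section_type": "title", "hierarchy_path": []},
    {"section_type": "article",
     "hierarchy_path": ["Enacting Terms", "CHAPTER I", "Article 1"]},
    {"section_type": "article",
     "hierarchy_path": ["Enacting Terms", "CHAPTER I", "Article 2"]},
    {"section_type": "article",
     "hierarchy_path": ["Enacting Terms", "CHAPTER II", "Article 3"]},
]

per_chapter = articles_per_chapter(sample_chunks)
for chapter, count in per_chapter.items():
    print(f"{chapter}: {count}")
```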
Parallel Processing¶
Process multiple documents in parallel:
# GNU parallel
parallel quiz-gen --output data/processed ::: data/raw/*.html
# xargs (macOS/Linux); NUL-delimited so paths with spaces are handled
printf '%s\0' data/raw/*.html | xargs -0 -n 1 -P 4 quiz-gen --output data/processed
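The same fan-out can be driven from Python when the exit code of each run needs to be collected. This is a minimal sketch, assuming `quiz-gen` is on the PATH and behaves as described under Exit Codes; the function names and the `command` parameter (which makes the runner swappable) are illustrative, not part of the package.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(file, command):
    """Run the command on one file and return (file, exit code)."""
    result = subprocess.run([*command, str(file)], capture_output=True, text=True)
    return str(file), result.returncode


def parse_parallel(files, command=("quiz-gen", "--output", "data/processed"), workers=4):
    """Run the command on every file using a small thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads are sufficient here: each worker just waits on a subprocess
        return dict(pool.map(lambda f: run_one(f, command), files))


def report(codes):
    """Print every file whose run returned a non-zero exit code."""
    for file, code in codes.items():
        if code != 0:
            print(f"Failed to parse: {file} (exit code {code})", file=sys.stderr)


# Example (requires quiz-gen to be installed):
# from pathlib import Path
# report(parse_parallel(sorted(Path("data/raw").glob("*.html"))))
```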
Best Practices¶
File Organization¶
Recommended directory structure:
project/
├── data/
│   ├── raw/              # Original HTML files
│   ├── processed/        # Parsed JSON output (default)
│   └── documents/
│       └── html/         # Downloaded documents
├── scripts/
│   └── parse_all.sh      # Batch processing scripts
└── results/              # Final analysis output
Naming Conventions¶
Use consistent naming for outputs:
# Good: includes document ID
quiz-gen --chunks 2018_1139_content.json regulation.html
# Better: includes date and version
quiz-gen --chunks 2018_1139_v1_20260118_content.json regulation.html
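When filenames are generated by a script, the convention above can be produced by a small helper. This is a sketch only; `output_name` and its parameters are hypothetical names, and the result is meant to be passed to `--chunks` or `--toc`.

```python
from datetime import date


def output_name(doc_id, version, when, kind="content"):
    """Build '<doc_id>_v<version>_<YYYYMMDD>_<kind>.json'."""
    return f"{doc_id}_v{version}_{when:%Y%m%d}_{kind}.json"


print(output_name("2018_1139", 1, date(2026, 1, 18)))
# prints: 2018_1139_v1_20260118_content.json
```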
Error Handling in Scripts¶
#!/bin/bash
set -e # Exit on error
for file in data/raw/*.html; do
    if ! quiz-gen --output data/processed "$file"; then
        echo "Failed to parse: $file" >> errors.log
    fi
done
Version Pinning¶
For reproducible environments:
# requirements.txt
quiz-gen==0.5.2
pip install -r requirements.txt
Troubleshooting¶
CLI Not Found¶
quiz-gen: command not found
Solutions:
- Verify installation:
pip list | grep quiz-gen
- Reinstall:
pip install --force-reinstall quiz-gen
- Check PATH:
which quiz-gen
python -m quiz_gen.cli --version
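The same checks can be scripted in Python, which helps when the shell and the Python environment disagree. This sketch uses only the standard library; the module name `quiz_gen` is taken from the `python -m quiz_gen.cli` command above, and `diagnose` is an illustrative helper name.

```python
import importlib.util
import shutil


def diagnose(command="quiz-gen", module="quiz_gen"):
    """Report whether the executable is on PATH and the module is importable."""
    return {
        "executable": shutil.which(command),  # full path, or None if not on PATH
        "module_installed": importlib.util.find_spec(module) is not None,
    }


print(diagnose())
```

If `module_installed` is true but `executable` is `None`, the package is installed in this interpreter's environment but its scripts directory is probably not on PATH.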
Wrong Version¶
# Check current version
quiz-gen --version
# Upgrade to latest
pip install --upgrade quiz-gen
Import Errors¶
If you see module import errors, reinstall dependencies:
pip install --upgrade beautifulsoup4 lxml requests
Related Documentation¶
- Getting Started - Installation and first steps
- Parsers - Detailed parser documentation
- API Reference - Python API documentation
- Examples - Advanced usage examples
Support¶
For CLI-specific issues:
- Check this documentation
- Run with --verbose for detailed errors
- Report issues at GitHub Issue Tracker
- Include:
  - Command used
  - Complete error message
  - Output from quiz-gen --version
  - Sample document (if possible)