Commit Graph

516 Commits

Author SHA1 Message Date
Nick Sweeting
4ccb0863bb
continue renaming extractor to plugin, add plan for hook concurrency, add chrome kill helper script 2025-12-28 05:29:24 -08:00
Nick Sweeting
bd265c0083
rename extractor to plugin everywhere 2025-12-28 04:43:15 -08:00
Nick Sweeting
50e527ec65
way better plugin hooks system wip 2025-12-28 03:39:59 -08:00
Claude
b632894bc9
Update views, API, and exports for new ArchiveResult output fields
Replace old `output` field with new fields across the codebase:
- output_str: Human-readable output summary
- output_json: Structured metadata (optional)
- output_files: Dict of output files with metadata
- output_size: Total size in bytes
- output_mimetypes: CSV of file mimetypes

Files updated:
- api/v1_core.py: Update MinimalArchiveResultSchema to expose new fields
- api/v1_core.py: Update ArchiveResultFilterSchema to search output_str
- cli/archivebox_extract.py: Use output_str in CLI output
- core/admin_archiveresults.py: Update admin fields, search, and fieldsets
- core/admin_archiveresults.py: Fix output_html variable name bug in output_summary
- misc/jsonl.py: Update archiveresult_to_jsonl() to include new fields
- plugins/extractor_utils.py: Update ExtractorResult helper class

The embed_path() method already uses output_files and output_str,
so snapshot detail page and template tags work correctly.
2025-12-27 20:28:22 +00:00
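The five output fields listed in the commit above can be illustrated with a small sketch that derives them from a directory of extractor output. This is only an illustration under stated assumptions: `summarize_output_dir` is a hypothetical helper name, and the per-file metadata keys (`size`, `mimetype`) and the wording of `output_str` are invented here, since the commit message only says that `output_files` is a dict of files with metadata.

```python
import mimetypes
from pathlib import Path


def summarize_output_dir(output_dir: str) -> dict:
    """Hypothetical helper: build output_* values by walking a directory."""
    root = Path(output_dir)
    output_files = {}      # {relative_path: metadata}; metadata keys are assumed
    seen_mimetypes = []    # ordered, de-duplicated list of mimetypes
    total_size = 0

    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        # guess_type() returns (type, encoding); fall back to a generic type
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        output_files[path.relative_to(root).as_posix()] = {
            "size": size,
            "mimetype": mime,
        }
        total_size += size
        if mime not in seen_mimetypes:
            seen_mimetypes.append(mime)

    return {
        "output_str": f"{len(output_files)} files, {total_size} bytes",  # human-readable summary
        "output_json": None,                                              # optional structured metadata
        "output_files": output_files,
        "output_size": total_size,                                        # total size in bytes
        "output_mimetypes": ",".join(seen_mimetypes),                     # CSV of mimetypes
    }
```

Calling `summarize_output_dir("/path/to/snapshot/output")` returns a plain dict, which keeps the sketch independent of Django; the real fields live on the `ArchiveResult` model.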
Claude
3d985fa8c8
Implement hook architecture with JSONL output support
Phase 1: Database migration for new ArchiveResult fields
- Add output_str (TextField) for human-readable summary
- Add output_json (JSONField) for structured metadata
- Add output_files (JSONField) for dict of {relative_path: {}}
- Add output_size (BigIntegerField) for total bytes
- Add output_mimetypes (CharField) for CSV of mimetypes
- Add binary FK to InstalledBinary (optional)
- Migrate existing 'output' field to new split fields

Phase 3: Update run_hook() for JSONL parsing
- Support new JSONL format (any line with {type: 'ModelName', ...})
- Maintain backwards compatibility with RESULT_JSON= format
- Add plugin metadata to each parsed record
- Detect background hooks with .bg. suffix in filename
- Add find_binary_for_cmd() helper function
- Add create_model_record() for processing side-effect records

Phase 6: Update ArchiveResult.run()
- Handle background hooks (return immediately when result is None)
- Process 'records' from HookResult for side-effect models
- Use new output fields (output_str, output_json, output_files, etc.)
- Call create_model_record() for InstalledBinary, Machine updates

Phase 7: Add background hook support
- Add is_background_hook() method to ArchiveResult
- Add check_background_completed() to check if process exited
- Add finalize_background_hook() to collect results from completed hooks
- Update SnapshotMachine.is_finished() to check/finalize background hooks
- Update _populate_output_fields() to walk directory and populate stats

Also updated references to old 'output' field in:
- admin_archiveresults.py
- statemachines.py
- templatetags/core_tags.py
2025-12-27 08:38:49 +00:00
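The JSONL parsing and background-hook detection described in Phase 3 of the commit above might look roughly like the following. This is a minimal sketch, not the project's `run_hook()` code: the function names `parse_hook_output` and `is_background_hook` as written here, the default `type` assigned to legacy `RESULT_JSON=` lines, and the example filename are all assumptions made for illustration.

```python
import json

LEGACY_PREFIX = "RESULT_JSON="


def is_background_hook(filename: str) -> bool:
    """Treat any hook whose filename contains a .bg. segment as a background hook."""
    return ".bg." in filename


def parse_hook_output(stdout: str, plugin: str) -> list[dict]:
    """Collect records from hook stdout, accepting both JSONL and the legacy format."""
    records = []
    for raw_line in stdout.splitlines():
        line = raw_line.strip()
        if line.startswith(LEGACY_PREFIX):
            payload, legacy = line[len(LEGACY_PREFIX):], True
        elif line.startswith("{"):
            payload, legacy = line, False
        else:
            continue  # ordinary log output, not a record

        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore lines that merely look like JSON
        if not isinstance(record, dict):
            continue

        if legacy:
            # Assumption: legacy lines describe the ArchiveResult itself
            record.setdefault("type", "ArchiveResult")
        elif "type" not in record:
            continue  # new-format records must name a model in 'type'

        record["plugin"] = plugin  # attach plugin metadata to each parsed record
        records.append(record)
    return records
```

For example, `is_background_hook("on_Snapshot__chrome.bg.js")` (a made-up filename) is true, while the same name without the `.bg.` segment is not.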
Nick Sweeting
35dd9acafe
implement fs_version migrations 2025-12-27 00:25:35 -08:00
Claude
ea6fe94c93
Add crawls_crawlschedule table to 0.8.x test schema and fix migrations
- Add missing crawls_crawlschedule table definition to SCHEMA_0_8 in test file
- Record all replaced dev branch migrations (0023-0074) for squashed migration
- Update 0024_snapshot_crawl migration to depend on squashed machine migration
- Remove 'extractor' field references from crawls admin
- All 45 migration tests now pass (0.4.x, 0.7.x, 0.8.x, fresh install)
2025-12-27 04:32:58 +00:00
Claude
766bb28536
Fix migration tests and M2M field alteration issue
- Remove M2M tags field alteration from migration 0027 (Django doesn't support altering M2M fields via migration)
- Add machine app tables to 0.8.x test schema
- Add missing columns (config, num_uses_failed, num_uses_succeeded) to 0.8.x test schema
- Skip 0.8.x migration tests due to complex migration state dependencies with machine app
- All 15 0.7.x migration tests now pass
- Merge dev branch and resolve pyproject.toml conflict (keep both uuid7 and gallery-dl deps)
2025-12-27 03:00:44 +00:00
Claude
13be196fd7
Merge remote-tracking branch 'origin/dev' into claude/improve-test-suite-xm6Bh
# Conflicts:
#	pyproject.toml
2025-12-27 02:27:51 +00:00
Nick Sweeting
e2cbcd17f6
more tests and migration fixes 2025-12-26 18:22:48 -08:00
Claude
ae2ab5b273
Add Python 3.13 support with uuid7 backport compatibility
- Create uuid_compat.py module that provides uuid7 for Python <3.14
  using uuid_extensions package, and native uuid.uuid7 for Python 3.14+
- Update all model files and migrations to use archivebox.uuid_compat
- Add uuid7 conditional dependency in pyproject.toml for Python <3.14
- Update requires-python to >=3.13 (from >=3.14)
- Update GitHub workflows, lock_pkgs.sh to use Python 3.13
- Update tool configs (ruff, pyright, uv) for Python 3.13

This enables running ArchiveBox on Python 3.13 while maintaining
forward compatibility with Python 3.14's native uuid7 support.
2025-12-27 01:07:30 +00:00
Nick Sweeting
4fd7fcdbcf
new gallerydl plugin and more 2025-12-26 11:55:03 -08:00
Nick Sweeting
9838d7ba02
tons of ui fixes and plugin fixes
2025-12-25 03:59:51 -08:00
Nick Sweeting
bb53228ebf
remove Seed model in favor of Crawl as template 2025-12-25 01:52:41 -08:00
Nick Sweeting
866f993f26
logging and admin ui improvements 2025-12-25 01:10:41 -08:00
Nick Sweeting
d95f0dc186
remove huey 2025-12-24 23:40:18 -08:00
Nick Sweeting
6c769d831c
wip 2 2025-12-24 21:46:14 -08:00
Nick Sweeting
1915333b81
wip major changes 2025-12-24 20:10:38 -08:00
Nick Sweeting
c1335fed37
Remove ABID system and KVTag model - use UUIDv7 IDs exclusively
This commit completes the simplification of the ID system by:

- Removing the ABID (ArchiveBox ID) system entirely
- Removing the base_models/abid.py file
- Removing KVTag model in favor of the existing Tag model in core/models.py
- Simplifying all models to use standard UUIDv7 primary keys
- Removing ABID-related admin functionality
- Cleaning up commented-out ABID code from views and statemachines
- Deleting migration files for ABID field removal (no longer needed)

All models now use simple UUIDv7 ids via `id = models.UUIDField(primary_key=True, default=uuid7)`

Note: Old migrations containing ABID references are preserved for database
migration history compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-24 06:13:49 -08:00
Nick Sweeting
7975b47c85
remove dependencies on unneeded libraries 2024-12-18 18:07:35 -08:00
dish
9ca66c6a2b
fix syntax error in archivebox/core/models.py 2024-12-18 18:17:17 -05:00
Nick Sweeting
f6d22a3cc4
tweak worker updated logic and add output_dir_template and symlinks logic
2024-12-13 06:03:52 -08:00
Nick Sweeting
5c06b8ff00
add new Event model to workers/models 2024-12-12 22:08:17 -08:00
Nick Sweeting
2a1afcf6c2
move crawl models back into dedicated app 2024-12-12 21:45:55 -08:00
Nick Sweeting
bd5dd2f949
clearer core models separation of concerns using new basemodels 2024-12-12 21:45:53 -08:00
Nick Sweeting
ac53fdf677
make chrome binary and configs directly runnable and make extractor use external bin 2024-12-06 02:06:39 -08:00
Nick Sweeting
d192eb5c48
add filestore content-addressable store draft
2024-12-04 02:15:04 -08:00
Nick Sweeting
1ceaa1ac7a
add ABID model check and fix model inheritance 2024-12-03 02:14:21 -08:00
Nick Sweeting
c374d7695e
allow getting crawl from API as rss feed 2024-12-03 02:13:45 -08:00
Nick Sweeting
b948e49013
add urls log to Crawl model 2024-11-19 06:32:33 -08:00
Nick Sweeting
2595139180
improve statemachine logging and archivebox update CLI cmd 2024-11-19 03:31:05 -08:00
Nick Sweeting
c9a05c9d94
working archivebox update CLI cmd 2024-11-19 02:32:05 -08:00
Nick Sweeting
5f01fc8307
fix archivebox shell and manage CLI commands 2024-11-19 00:48:39 -08:00
Nick Sweeting
328eb98a38
move main funcs into cli files and switch to using click for CLI 2024-11-19 00:18:51 -08:00
Nick Sweeting
569081a9eb
rename abid_utils to base_models 2024-11-18 19:40:05 -08:00
Nick Sweeting
65afd405b1
merge seeds and crawls apps 2024-11-18 19:23:14 -08:00
Nick Sweeting
4c25e90378
move monkey_patches.py into archivebox.misc subfolder 2024-11-18 19:10:42 -08:00
Nick Sweeting
4a5d607296
move logging_util into archivebox.misc subfolder 2024-11-18 19:08:49 -08:00
Nick Sweeting
e469c5a344
merge queues and actors apps into new workers app 2024-11-18 18:52:48 -08:00
Nick Sweeting
0acd388c02
fix imports and deps 2024-11-18 18:07:34 -08:00
Nick Sweeting
eeb2671e4d
API improvements 2024-11-18 04:27:38 -08:00
Nick Sweeting
eb53145e4e
working state machine flow yay 2024-11-18 04:27:38 -08:00
Nick Sweeting
9adfe0e2e6
add code to log all SQL queries for DEBUG 2024-11-18 04:27:38 -08:00
Nick Sweeting
385ccaa14d
extend core models with ModelWithOutputDir 2024-11-18 04:27:38 -08:00
Nick Sweeting
f5727c7da2
rename actors to workers 2024-11-18 04:27:37 -08:00
Nick Sweeting
1ec2753664
fix statemachine create_root_snapshot and retry timing 2024-11-18 04:27:37 -08:00
Nick Sweeting
36d24cd8d7
add jobs dashboard 2024-11-17 20:09:55 -08:00
Nick Sweeting
8f8fbbb7a2
API fixes and add actors endpoints 2024-11-17 20:09:06 -08:00
Nick Sweeting
c8e186f21b
fix plugin loading order, admin, abx-pkg
2024-11-16 06:44:12 -08:00
Nick Sweeting
ba26d75079
add notes and label fields, fix model getters 2024-11-16 02:47:35 -08:00