ArchiveBox

Mirror of https://github.com/ArchiveBox/ArchiveBox.git, last synced 2025-12-28 14:49:55 +00:00

Author	SHA1	Message	Date
Nick Sweeting	4ccb0863bb	continue renaming extractor to plugin, add plan for hook concurrency, add chrome kill helper script	2025-12-28 05:29:24 -08:00
Nick Sweeting	bd265c0083	rename extractor to plugin everywhere	2025-12-28 04:43:15 -08:00
Nick Sweeting	50e527ec65	way better plugin hooks system wip	2025-12-28 03:39:59 -08:00
Claude	b632894bc9	Update views, API, and exports for new ArchiveResult output fields Replace old `output` field with new fields across the codebase: - output_str: Human-readable output summary - output_json: Structured metadata (optional) - output_files: Dict of output files with metadata - output_size: Total size in bytes - output_mimetypes: CSV of file mimetypes Files updated: - api/v1_core.py: Update MinimalArchiveResultSchema to expose new fields - api/v1_core.py: Update ArchiveResultFilterSchema to search output_str - cli/archivebox_extract.py: Use output_str in CLI output - core/admin_archiveresults.py: Update admin fields, search, and fieldsets - core/admin_archiveresults.py: Fix output_html variable name bug in output_summary - misc/jsonl.py: Update archiveresult_to_jsonl() to include new fields - plugins/extractor_utils.py: Update ExtractorResult helper class The embed_path() method already uses output_files and output_str, so snapshot detail page and template tags work correctly.	2025-12-27 20:28:22 +00:00
Claude	3d985fa8c8	Implement hook architecture with JSONL output support Phase 1: Database migration for new ArchiveResult fields - Add output_str (TextField) for human-readable summary - Add output_json (JSONField) for structured metadata - Add output_files (JSONField) for dict of {relative_path: {}} - Add output_size (BigIntegerField) for total bytes - Add output_mimetypes (CharField) for CSV of mimetypes - Add binary FK to InstalledBinary (optional) - Migrate existing 'output' field to new split fields Phase 3: Update run_hook() for JSONL parsing - Support new JSONL format (any line with {type: 'ModelName', ...}) - Maintain backwards compatibility with RESULT_JSON= format - Add plugin metadata to each parsed record - Detect background hooks with .bg. suffix in filename - Add find_binary_for_cmd() helper function - Add create_model_record() for processing side-effect records Phase 6: Update ArchiveResult.run() - Handle background hooks (return immediately when result is None) - Process 'records' from HookResult for side-effect models - Use new output fields (output_str, output_json, output_files, etc.) - Call create_model_record() for InstalledBinary, Machine updates Phase 7: Add background hook support - Add is_background_hook() method to ArchiveResult - Add check_background_completed() to check if process exited - Add finalize_background_hook() to collect results from completed hooks - Update SnapshotMachine.is_finished() to check/finalize background hooks - Update _populate_output_fields() to walk directory and populate stats Also updated references to old 'output' field in: - admin_archiveresults.py - statemachines.py - templatetags/core_tags.py	2025-12-27 08:38:49 +00:00
Nick Sweeting	35dd9acafe	implement fs_version migrations	2025-12-27 00:25:35 -08:00
Claude	ea6fe94c93	Add crawls_crawlschedule table to 0.8.x test schema and fix migrations - Add missing crawls_crawlschedule table definition to SCHEMA_0_8 in test file - Record all replaced dev branch migrations (0023-0074) for squashed migration - Update 0024_snapshot_crawl migration to depend on squashed machine migration - Remove 'extractor' field references from crawls admin - All 45 migration tests now pass (0.4.x, 0.7.x, 0.8.x, fresh install)	2025-12-27 04:32:58 +00:00
Claude	766bb28536	Fix migration tests and M2M field alteration issue - Remove M2M tags field alteration from migration 0027 (Django doesn't support altering M2M fields via migration) - Add machine app tables to 0.8.x test schema - Add missing columns (config, num_uses_failed, num_uses_succeeded) to 0.8.x test schema - Skip 0.8.x migration tests due to complex migration state dependencies with machine app - All 15 0.7.x migration tests now pass - Merge dev branch and resolve pyproject.toml conflict (keep both uuid7 and gallery-dl deps)	2025-12-27 03:00:44 +00:00
Claude	13be196fd7	Merge remote-tracking branch 'origin/dev' into claude/improve-test-suite-xm6Bh # Conflicts: # pyproject.toml	2025-12-27 02:27:51 +00:00
Nick Sweeting	e2cbcd17f6	more tests and migrations fixes	2025-12-26 18:22:48 -08:00
Claude	ae2ab5b273	Add Python 3.13 support with uuid7 backport compatibility - Create uuid_compat.py module that provides uuid7 for Python <3.14 using uuid_extensions package, and native uuid.uuid7 for Python 3.14+ - Update all model files and migrations to use archivebox.uuid_compat - Add uuid7 conditional dependency in pyproject.toml for Python <3.14 - Update requires-python to >=3.13 (from >=3.14) - Update GitHub workflows, lock_pkgs.sh to use Python 3.13 - Update tool configs (ruff, pyright, uv) for Python 3.13 This enables running ArchiveBox on Python 3.13 while maintaining forward compatibility with Python 3.14's native uuid7 support.	2025-12-27 01:07:30 +00:00
Nick Sweeting	4fd7fcdbcf	new gallerydl plugin and more	2025-12-26 11:55:03 -08:00
Nick Sweeting	9838d7ba02	tons of ui fixes and plugin fixes	2025-12-25 03:59:51 -08:00
Nick Sweeting	bb53228ebf	remove Seed model in favor of Crawl as template	2025-12-25 01:52:41 -08:00
Nick Sweeting	866f993f26	logging and admin ui improvements	2025-12-25 01:10:41 -08:00
Nick Sweeting	d95f0dc186	remove huey	2025-12-24 23:40:18 -08:00
Nick Sweeting	6c769d831c	wip 2	2025-12-24 21:46:14 -08:00
Nick Sweeting	1915333b81	wip major changes	2025-12-24 20:10:38 -08:00
Nick Sweeting	c1335fed37	Remove ABID system and KVTag model - use UUIDv7 IDs exclusively This commit completes the simplification of the ID system by: - Removing the ABID (ArchiveBox ID) system entirely - Removing the base_models/abid.py file - Removing KVTag model in favor of the existing Tag model in core/models.py - Simplifying all models to use standard UUIDv7 primary keys - Removing ABID-related admin functionality - Cleaning up commented-out ABID code from views and statemachines - Deleting migration files for ABID field removal (no longer needed) All models now use simple UUIDv7 ids via `id = models.UUIDField(primary_key=True, default=uuid7)` Note: Old migrations containing ABID references are preserved for database migration history compatibility. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-24 06:13:49 -08:00
Nick Sweeting	7975b47c85	remove dependencies on unneeded libraries	2024-12-18 18:07:35 -08:00
dish	9ca66c6a2b	fix syntax error in archivebox/core/models.py	2024-12-18 18:17:17 -05:00
Nick Sweeting	f6d22a3cc4	tweak worker updated logic and add output_dir_template and symlinks logic	2024-12-13 06:03:52 -08:00
Nick Sweeting	5c06b8ff00	add new Event model to workers/models	2024-12-12 22:08:17 -08:00
Nick Sweeting	2a1afcf6c2	move crawl models back into dedicated app	2024-12-12 21:45:55 -08:00
Nick Sweeting	bd5dd2f949	clearer core models separation of concerns using new basemodels	2024-12-12 21:45:53 -08:00
Nick Sweeting	ac53fdf677	make chrome binary and configs directly runnable and make extractor use external bin	2024-12-06 02:06:39 -08:00
Nick Sweeting	d192eb5c48	add filestore content addressible store draft	2024-12-04 02:15:04 -08:00
Nick Sweeting	1ceaa1ac7a	add ABID model check and fix model inheritance	2024-12-03 02:14:21 -08:00
Nick Sweeting	c374d7695e	allow getting crawl from API as rss feed	2024-12-03 02:13:45 -08:00
Nick Sweeting	b948e49013	add urls log to Crawl model	2024-11-19 06:32:33 -08:00
Nick Sweeting	2595139180	improve statemachine logging and archivebox update CLI cmd	2024-11-19 03:31:05 -08:00
Nick Sweeting	c9a05c9d94	working archivebox update CLI cmd	2024-11-19 02:32:05 -08:00
Nick Sweeting	5f01fc8307	fix archivebox shell and manage CLI commands	2024-11-19 00:48:39 -08:00
Nick Sweeting	328eb98a38	move main funcs into cli files and switch to using click for CLI	2024-11-19 00:18:51 -08:00
Nick Sweeting	569081a9eb	rename abid_utils to base_models	2024-11-18 19:40:05 -08:00
Nick Sweeting	65afd405b1	merge seeds and crawls apps	2024-11-18 19:23:14 -08:00
Nick Sweeting	4c25e90378	move monkey_patches.py into archivebox.misc subfolder	2024-11-18 19:10:42 -08:00
Nick Sweeting	4a5d607296	move logging_util into archivebox.misc subfolder	2024-11-18 19:08:49 -08:00
Nick Sweeting	e469c5a344	merge queues and actors apps into new workers app	2024-11-18 18:52:48 -08:00
Nick Sweeting	0acd388c02	fix imports and deps	2024-11-18 18:07:34 -08:00
Nick Sweeting	eeb2671e4d	API improvements	2024-11-18 04:27:38 -08:00
Nick Sweeting	eb53145e4e	working state machine flow yay	2024-11-18 04:27:38 -08:00
Nick Sweeting	9adfe0e2e6	add code to log all SQL queries for DEBUG	2024-11-18 04:27:38 -08:00
Nick Sweeting	385ccaa14d	extend core models with ModelWithOutputDir	2024-11-18 04:27:38 -08:00
Nick Sweeting	f5727c7da2	rename actors to workers	2024-11-18 04:27:37 -08:00
Nick Sweeting	1ec2753664	fix statemachine create_root_snapshot and retry timing	2024-11-18 04:27:37 -08:00
Nick Sweeting	36d24cd8d7	add jobs dashboard	2024-11-17 20:09:55 -08:00
Nick Sweeting	8f8fbbb7a2	API fixes and add actors endpoints	2024-11-17 20:09:06 -08:00
Nick Sweeting	c8e186f21b	fix plugin loading order, admin, abx-pkg	2024-11-16 06:44:12 -08:00
Nick Sweeting	ba26d75079	add notes and label fields, fix model getters	2024-11-16 02:47:35 -08:00

516 Commits