Commit Graph

42 Commits

Author SHA1 Message Date
Andras Bacsai
6d47d24169 Fix standalone database "restarting" status flickering and add restart tracking
- Fix status flickering: Track databases in active/transient states (restarting, starting, created, paused) not just running
- Add isActiveOrTransient() helper to distinguish between active states and terminal states (exited, dead)
- Add safeguard: Protect updateNotFoundDatabaseStatus() from marking as exited when containers collection is empty
- Add restart_count tracking: New migration adds restart_count, last_restart_at, last_restart_type to all standalone database tables
- Update 8 database models with $casts for new restart tracking fields
- Update GetContainersStatus to extract RestartCount from Docker and update database models
- Reset restart tracking when database exits completely

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-17 16:25:41 +01:00
Andras Bacsai
0efa4af5c3 Optimize PushServerUpdateJob performance with batch updates and async jobs
- Eager load service applications and databases to eliminate N+1 queries
- Replace individual model updates with batch database updates for applications, previews, and services
- Move connectProxyToNetworks to async ConnectProxyToNetworksJob to avoid blocking status updates
- Optimize Server.databases() and applications() methods with efficient database queries
- Use flatMap for cleaner collection transformations

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-15 14:06:32 +01:00
Andras Bacsai
66e81d6d96 Fix container status display: preserve "Restarting" for applications and sub-resources
Add preserveRestarting parameter to ContainerStatusAggregator to allow applications
and service sub-resources to display "Restarting" status instead of being marked as
"Degraded". This gives better visibility into container restart behavior.

- Update ContainerStatusAggregator to accept preserveRestarting parameter (defaults to false)
- Update GetContainersStatus to use preserveRestarting: true for applications and service sub-resources
- Update PushServerUpdateJob to use preserveRestarting: true for applications and service sub-resources
- Add comprehensive documentation explaining the parameter behavior and when to use it

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 08:23:35 +01:00
Andras Bacsai
e0dc12678b
fix: comprehensive SERVICE_URL/SERVICE_FQDN handling improvements and queue reliability fixes (#7275) 2025-11-24 11:47:11 +01:00
Andras Bacsai
ac9eca3c05 fix: don't show health status for exited containers
Exited containers don't run health checks, so showing "(unhealthy)" is
misleading. This fix ensures exited status displays without health
suffixes across all monitoring systems (SSH, Sentinel, services, etc.)
and at the UI layer for backward compatibility with existing data.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 09:09:37 +01:00
Andras Bacsai
85b73a8c00 fix: initialize Collection properties to handle queue deserialization edge cases 2025-11-21 12:25:25 +01:00
Andras Bacsai
ae6eef3cdb feat(tests): add comprehensive tests for ContainerStatusAggregator and serverStatus accessor
- Introduced tests for ContainerStatusAggregator to validate status aggregation logic across various container states.
- Implemented tests to ensure serverStatus accessor correctly checks server infrastructure health without being affected by container status.
- Updated ExcludeFromHealthCheckTest to verify excluded status handling in various components.
- Removed obsolete PushServerUpdateJobStatusAggregationTest as its functionality is covered elsewhere.
- Updated version number for sentinel to 0.0.17 in versions.json.
2025-11-20 17:31:07 +01:00
Andras Bacsai
14bba8ba86 fix: correct Sentinel default health status and remove debug logging
This commit addresses container status reporting issues and removes debug logging:

**Primary Fix:**
- Changed PushServerUpdateJob to default to 'unknown' instead of 'unhealthy' when health_status field is missing from Sentinel data
- This ensures containers WITHOUT healthcheck defined are correctly reported as "unknown" not "unhealthy"
- Matches SSH path behavior (GetContainersStatus) which already defaulted to 'unknown'

**Service Multi-Container Aggregation:**
- Implemented service container status aggregation (same pattern as applications)
- Added serviceContainerStatuses collection to both Sentinel and SSH paths
- Services now aggregate status using priority: unhealthy > unknown > healthy
- Prevents race conditions where last-processed container would win

**Debug Logging Cleanup:**
- Removed all [STATUS-DEBUG] logging statements (25 total)
- Removed all ray() debugging calls (3 total)
- Removed proof_unknown_preserved and health_status_was_null debug fields
- Code is now production-ready

**Test Coverage:**
- Added 2 new tests for Sentinel default health status behavior
- Added 5 new tests for service aggregation in SSH path
- All 16 tests pass (66 assertions)

**Note:** The root cause was identified as Sentinel (Go binary) also defaulting to "unhealthy". That will need a separate fix in the Sentinel codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 11:10:34 +01:00
Andras Bacsai
747a48b933 debug: add detailed Sentinel container processing logging
Added comprehensive logging to track why applicationContainerStatuses
collection is empty in PushServerUpdateJob.

## Logging Added

### 1. Raw Sentinel Data (line 113-118)
**Logs**: Complete container data received from Sentinel
**Purpose**: See exactly what Sentinel is sending
**Data**: Container count and full container array with all labels

### 2. Container Processing Loop (line 157-163)
**Logs**: Every container as it's being processed
**Purpose**: Track which containers enter the processing loop
**Data**: Container name, status, all labels, coolify.managed flag

### 3. Skipped Containers - Not Managed (line 165-171)
**Logs**: Containers without coolify.managed label
**Purpose**: Identify containers being filtered out early
**Data**: Container name

### 4. Successful Container Addition (line 193-198)
**Logs**: When container is successfully added to applicationContainerStatuses
**Purpose**: Confirm containers ARE being processed
**Data**: Application ID, container name, container status

### 5. Missing com.docker.compose.service Label (line 200-206)
**Logs**: Containers skipped due to missing com.docker.compose.service
**Purpose**: Identify the most likely root cause
**Data**: Container name, application ID, all labels

## Why This Matters

User reported applicationContainerStatuses is empty (`[]`) even though
Sentinel is pushing updates. This logging will reveal:

1. Is Sentinel sending containers at all?
2. Are containers filtered by coolify.managed check?
3. Is com.docker.compose.service label missing? (most likely)
4. What labels IS Sentinel actually sending?

## Expected Findings

Based on investigation, the issue is likely:
- Sentinel is NOT sending com.docker.compose.service in labels
- Or Sentinel uses a different label format/name
- Containers pass all other checks but fail on line 190-206

## Next Steps

After logs appear, we'll see exactly which filter is blocking containers
and can fix the root cause (likely need to extract com.docker.compose.service
from container name or use a different label source).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 08:34:42 +01:00
Andras Bacsai
d2d9c1b2bc debug: add comprehensive status change logging
Added detailed debug logging to all status update paths to help
diagnose why "unhealthy" status appears in the UI.

## Logging Added

### 1. PushServerUpdateJob (Sentinel updates)
**Location**: Lines 303-315
**Logs**: Status changes from Sentinel push updates
**Data tracked**:
- Old vs new status
- Container statuses that led to aggregation
- Status flags (hasRunning, hasUnhealthy, hasUnknown)

### 2. GetContainersStatus (SSH updates)
**Location**: Lines 441-449, 346-354, 358-365
**Logs**: Status changes from SSH-based checks
**Scenarios**:
- Normal status aggregation
- Recently restarted containers (kept as degraded)
- Applications not running (set to exited)
**Data tracked**:
- Old vs new status
- Container statuses
- Restart count and timing
- Whether containers exist

### 3. Application Model Status Accessor
**Location**: Lines 706-712, 726-732
**Logs**: When status is set without explicit health information
**Issue**: Highlights cases where health defaults to "unhealthy"
**Data tracked**:
- Raw value passed to setter
- Final result after default applied

## How to Use

### Enable Debug Logging
Edit `.env` or `config/logging.php` to set log level to debug:
```
LOG_LEVEL=debug
```

### Monitor Logs
```bash
tail -f storage/logs/laravel.log | grep STATUS-DEBUG
```

### Log Format
All logs use `[STATUS-DEBUG]` prefix for easy filtering:
```
[2025-11-19 13:00:00] local.DEBUG: [STATUS-DEBUG] Sentinel status change
{
  "source": "PushServerUpdateJob",
  "app_id": 123,
  "app_name": "my-app",
  "old_status": "running:unknown",
  "new_status": "running:healthy",
  "container_statuses": [...],
  "flags": {...}
}
```

## What to Look For

1. **Default to unhealthy**: Check Application model accessor logs
2. **Status flipping**: Compare timestamps between Sentinel and SSH updates
3. **Incorrect aggregation**: Check flags and container_statuses
4. **Stale database values**: Check if old_status persists across multiple logs

## Next Steps

After gathering logs, we can:
1. Identify the exact source of "unhealthy" status
2. Determine if it's a default issue, aggregation bug, or timing problem
3. Apply targeted fix based on evidence

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:52:08 +01:00
Andras Bacsai
6b62847a11 fix: preserve unknown health status in Sentinel updates (PushServerUpdateJob)
## Problem
Services with "running (unknown)" status were periodically changing
to "running (healthy)" every ~30 seconds when Sentinel pushed updates.
This was confusing for users and inconsistent with SSH-based status checks.

## Root Cause
`PushServerUpdateJob::aggregateMultiContainerStatuses()` was missing
logic to track "unknown" health state. It only tracked "unhealthy" and
defaulted everything else to "healthy".

When Sentinel pushed updates with "running (unknown)" containers:
- The job saw `hasRunning = true` and `hasUnhealthy = false`
- It incorrectly returned "running (healthy)" instead of "running (unknown)"

## Solution
Updated `PushServerUpdateJob` to match the logic in `GetContainersStatus`:

1. Added `$hasUnknown` tracking variable
2. Check for "unknown" in status strings (alongside "unhealthy")
3. Implement 3-way priority: unhealthy > unknown > healthy

This ensures consistency between:
- SSH-based updates (`GetContainersStatus`)
- Sentinel-based updates (`PushServerUpdateJob`)
- UI display logic

## Changes
- **app/Jobs/PushServerUpdateJob.php**: Added unknown status tracking
- **tests/Unit/PushServerUpdateJobStatusAggregationTest.php**: New comprehensive tests
- **tests/Unit/ExcludeFromHealthCheckTest.php**: Updated to match current implementation

## Testing
All 31 status-related unit tests passing:
- 18 tests in ContainerHealthStatusTest
- 8 tests in ExcludeFromHealthCheckTest (updated)
- 6 tests in PushServerUpdateJobStatusAggregationTest (new)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:40:58 +01:00
Andras Bacsai
08d257535a fix(docker): enhance container status aggregation for multi-container applications, including exclusion handling based on docker-compose configuration
Some checks are pending
Staging Build / amd64 (push) Waiting to run
Staging Build / aarch64 (push) Waiting to run
Staging Build / merge-manifest (push) Blocked by required conditions
2025-09-13 20:32:15 +02:00
Andras Bacsai
0f5c988658 fix(horizon): add silenced jobs 2025-07-12 14:44:32 +02:00
Andras Bacsai
24688b2ad8 fix(jobs): update middleware to use expireAfter for WithoutOverlapping in multiple job classes 2025-07-01 10:50:27 +02:00
Andras Bacsai
f9a0ca2ca6 refactor(proxy): update StartProxy calls to use named parameter for async option 2025-06-16 13:13:01 +02:00
Andras Bacsai
ddcb14500d refactor(proxy-status): refactored how the proxy status is handled on the UI and on the backend
feat(cloudflare): improved cloudflare tunnel automated installation
2025-06-06 14:47:54 +02:00
Andras Bacsai
97ec579910 refactor(push-server-update): enhance application preview handling by incorporating pull request IDs and adding status update protections 2025-06-04 10:03:36 +02:00
Andras Bacsai
9883cef26d refactor(jobs): update middleware to include job-specific identifiers for WithoutOverlapping 2025-05-29 17:31:43 +02:00
Andras Bacsai
0369909408 fix(PushServerUpdateJob): add null checks before updating application and database statuses 2025-05-29 10:47:26 +02:00
Andras Bacsai
c6278a06ba refactor(jobs): unify middleware configuration to prevent job release after expiration for DockerCleanupJob and PushServerUpdateJob 2025-05-07 14:42:42 +02:00
Andras Bacsai
b78f2cccff refactor(jobs): update WithoutOverlapping middleware to use expireAfter for better queue management 2025-04-18 09:52:32 +02:00
Andras Bacsai
b09f0043d1 fix: restrict jobs on cloud
fix: restrict sentinel endpoint
2025-01-10 11:54:45 +01:00
Andras Bacsai
7dc65dfd79 fix: make sure important jobs/actions are running on high prio queue
Some checks are pending
Staging Build / amd64 (push) Waiting to run
Staging Build / aarch64 (push) Waiting to run
Staging Build / merge-manifest (push) Blocked by required conditions
2024-11-22 11:16:01 +01:00
Andras Bacsai
275edb6c1f put a few things on high queue 2024-11-06 12:33:56 +01:00
Lucas Michot
8e1444eaa7 Get rid of many useless blank lines 2024-10-31 17:44:01 +01:00
Andras Bacsai
96ca72fcdb refactor server view (phuuu) 2024-10-30 20:03:30 +01:00
Lucas Michot
5b6e466e0c Remove some useless catch blocks 2024-10-28 14:37:00 +01:00
Lucas Michot
d557a22b91 Remove all ray() calls 2024-10-28 13:51:23 +01:00
Andras Bacsai
8c96ab52d7 feat: notification rate limiter
Some checks are pending
Staging Build / amd64 (push) Waiting to run
Staging Build / aarch64 (push) Waiting to run
Staging Build / merge-manifest (push) Blocked by required conditions
fix: limit server up / down notification limits
2024-10-25 15:13:23 +02:00
Andras Bacsai
621e063bf1 Refactor PushServerUpdateJob to implement ShouldBeEncrypted interface 2024-10-24 15:16:00 +02:00
Andras Bacsai
ac768e5313 feat: limit storage check emails
feat: sentinel should send storage usage
2024-10-22 14:01:36 +02:00
Andras Bacsai
537630acc6 Refactor PushServerUpdateJob to handle container restart notifications 2024-10-22 11:42:24 +02:00
Andras Bacsai
d7efe8a6d1 fix: no sentinel for swarm yet 2024-10-22 11:29:43 +02:00
Andras Bacsai
4c95647b96 feat: cleanup sentinel on server deletion
fix: Sentinel should not be enabled on build servers
2024-10-17 11:21:43 +02:00
Andras Bacsai
2702fbc284 Refactor logging in PushServerUpdateJob, Application, and SentinelSeeder
Some checks failed
Staging Build / amd64 (push) Has been cancelled
Staging Build / aarch64 (push) Has been cancelled
Staging Build / merge-manifest (push) Has been cancelled
2024-10-15 17:03:50 +02:00
Andras Bacsai
d446cd4f31 sentinel updates 2024-10-15 13:39:19 +02:00
Andras Bacsai
81db57002b Refactor PushServerUpdateJob to handle multiple servers, previews, and emails
Some checks are pending
Staging Build / amd64 (push) Waiting to run
Staging Build / aarch64 (push) Waiting to run
Staging Build / merge-manifest (push) Blocked by required conditions
2024-10-14 22:53:16 +02:00
Andras Bacsai
fdeb9353be chore: Update project service configuration view 2024-10-14 19:45:03 +02:00
Andras Bacsai
1f72321681 fix: sentinel 2024-10-14 18:04:36 +02:00
Andras Bacsai
8a2c9f3d44 updates sentinel 2024-10-14 17:54:29 +02:00
Andras Bacsai
b2e515f770 sentinel 2024-10-14 13:32:36 +02:00
Andras Bacsai
1f193d465d sentinel updates 2024-10-14 12:07:37 +02:00