Commit Graph

188 Commits

Author SHA1 Message Date
Erik Johnston
1bddd25a85
Port Clock functions to use Duration class (#19229)
This changes the arguments in clock functions to be `Duration` and
converts call sites and constants into `Duration`. There are still some
more functions around that should be converted (e.g.
`timeout_deferred`), but we leave that to another PR.

We also changes `.as_secs()` to return a float, as the rounding broke
things subtly. The only reason to keep it (its the same as
`timedelta.total_seconds()`) is for symmetry with `as_millis()`.

Follows on from https://github.com/element-hq/synapse/pull/19223
2025-12-01 13:55:06 +00:00
Andrew Ferrazzutti
fcac7e0282
Write union types as X | Y where possible (#19111)
aka PEP 604, added in Python 3.10
2025-11-06 14:02:33 -06:00
Andrew Ferrazzutti
fc244bb592
Use type hinting generics in standard collections (#19046)
aka PEP 585, added in Python 3.9

 - https://peps.python.org/pep-0585/
 - https://docs.astral.sh/ruff/rules/non-pep585-annotation/
2025-10-22 16:48:19 -05:00
Andrew Morgan
ad8dcc2119
Remove internal ReplicationUploadKeysForUserRestServlet (#18988) 2025-09-30 11:12:14 +01:00
Richard van der Hoff
7ec5e60671
Introduce EventPersistencePair type (#18857)
`Tuple[EventBase, EventContext]` is everywhere and I keep misspelling
it. Let's just define a type for it.
2025-08-26 10:15:03 +01:00
Andrew Morgan
664f0e8938 Merge branch 'release-v1.135' into develop 2025-07-30 14:04:29 +01:00
reivilibre
a2ba909ded
Remove obsolete /send_event replication endpoint. (#18730)
Fixes: #18441

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
2025-07-30 12:30:40 +01:00
Eric Eastwood
f13a136396
Refactor Gauge metrics to be homeserver-scoped (#18725)
Bulk refactor `Gauge` metrics to be homeserver-scoped. We also add lints
to make sure that new `Gauge` metrics don't sneak in without using the
`server_name` label (`SERVER_NAME_LABEL`).

Part of https://github.com/element-hq/synapse/issues/18592



### Testing strategy

 1. Add the `metrics` listener in your `homeserver.yaml`
    ```yaml
    listeners:
      # This is just showing how to configure metrics either way
      #
      # `http` `metrics` resource
      - port: 9322
        type: http
        bind_addresses: ['127.0.0.1']
        resources:
          - names: [metrics]
            compress: false
      # `metrics` listener
      - port: 9323
        type: metrics
        bind_addresses: ['127.0.0.1']
    ```
1. Start the homeserver: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Fetch `http://localhost:9322/_synapse/metrics` and/or
`http://localhost:9323/metrics`
1. Observe response includes the TODO metrics with the `server_name`
label

### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct (run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
2025-07-29 10:37:59 -05:00
Eric Eastwood
2c236be058
Refactor Counter metrics to be homeserver-scoped (#18656)
Bulk refactor `Counter` metrics to be homeserver-scoped. We also add
lints to make sure that new `Counter` metrics don't sneak in without
using the `server_name` label (`SERVER_NAME_LABEL`).

All of the "Fill in" commits are just bulk refactor.

Part of https://github.com/element-hq/synapse/issues/18592



### Testing strategy

 1. Add the `metrics` listener in your `homeserver.yaml`
    ```yaml
    listeners:
      # This is just showing how to configure metrics either way
      #
      # `http` `metrics` resource
      - port: 9322
        type: http
        bind_addresses: ['127.0.0.1']
        resources:
          - names: [metrics]
            compress: false
      # `metrics` listener
      - port: 9323
        type: metrics
        bind_addresses: ['127.0.0.1']
    ```
1. Start the homeserver: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Fetch `http://localhost:9322/_synapse/metrics` and/or
`http://localhost:9323/metrics`
1. Observe response includes the `synapse_user_registrations_total`,
`synapse_http_server_response_count_total`, etc metrics with the
`server_name` label
2025-07-25 14:58:47 -05:00
Quentin Gliech
61e79a4cdf
Fix deactivation running off the main process (#18716)
Best reviewed commit by commit.

With the new dedicated MAS API
(https://github.com/element-hq/synapse/pull/18520), it's possible that
deactivation starts off the main process, which was not possible because
of a few calls.

I basically looked at everything that the deactivation handler was
doing, reviewed whether it could run on workers or not, and find a
workaround when possible

---------

Co-authored-by: Eric Eastwood <erice@element.io>
2025-07-24 08:43:58 +00:00
Quentin Gliech
5ea2cf2484
Move device changes off the main process (#18581)
The main goal of this PR is to handle device list changes onto multiple
writers, off the main process, so that we can have logins happening
whilst Synapse is rolling-restarting.

This is quite an intrusive change, so I would advise to review this
commit by commit; I tried to keep the history as clean as possible.

There are a few things to consider:

- the `device_list_key` in stream tokens becomes a
`MultiWriterStreamToken`, which has a few implications in sync and on
the storage layer
- we had a split between `DeviceHandler` and `DeviceWorkerHandler` for
master vs. worker process. I've kept this split, but making it rather
writer vs. non-writer worker, using method overrides for doing
replication calls when needed
- there are a few operations that need to happen on a single worker at a
time. Instead of using cross-worker locks, for now I made them run on
the first writer on the list

---------

Co-authored-by: Eric Eastwood <erice@element.io>
2025-07-18 09:06:14 +02:00
Eric Eastwood
88785dbaeb
Refactor cache metrics to be homeserver-scoped (#18604)
(add `server_name` label to cache metrics).

Part of https://github.com/element-hq/synapse/issues/18592
2025-07-16 16:04:57 -05:00
Eric Eastwood
fc10a5ee29
Refactor Measure block metrics to be homeserver-scoped (v2) (#18601)
Refactor `Measure` block metrics to be homeserver-scoped (add
`server_name` label to block metrics).

Part of https://github.com/element-hq/synapse/issues/18592


### Testing strategy

#### See behavior of previous `metrics` listener

 1. Add the `metrics` listener in your `homeserver.yaml`
    ```yaml
    listeners:
      - port: 9323
        type: metrics
        bind_addresses: ['127.0.0.1']
    ```
1. Start the homeserver: `poetry run synapse_homeserver --config-path
homeserver.yaml`
 1. Fetch `http://localhost:9323/metrics`
1. Observe response includes the block metrics
(`synapse_util_metrics_block_count`,
`synapse_util_metrics_block_in_flight`, etc)


#### See behavior of the `http` `metrics` resource

1. Add the `metrics` resource to a new or existing `http` listeners in
your `homeserver.yaml`
    ```yaml
    listeners:
      - port: 9322
        type: http
        bind_addresses: ['127.0.0.1']
        resources:
          - names: [metrics]
            compress: false
    ```
1. Start the homeserver: `poetry run synapse_homeserver --config-path
homeserver.yaml`
1. Fetch `http://localhost:9322/_synapse/metrics` (it's just a `GET`
request so you can even do in the browser)
1. Observe response includes the block metrics
(`synapse_util_metrics_block_count`,
`synapse_util_metrics_block_in_flight`, etc)
2025-07-15 15:55:23 -05:00
Quentin Gliech
28c9ed3ccb
Remove unnecessary replication calls (#18564)
This should be reviewed commit by commit.

Nowadays it's trivial to propagate cache invalidations, which means we
can move some things off the main process, and not go through HTTP
replication.

`ReplicationGetQueryRestServlet` appeared to be unused, and was very
weird, as it was being called if the current instance is the main one…
to RPC to the main one (if no instance is set on a replication client,
it makes it to the main process)

The other two handlers could be relatively trivially moved to any
workers, moving some methods to the worker store.

**I've intentionally not removed the replication servlets yet** so that
it's safe to rollout, and will do another PR that clean those up to
remove on the N+1 version
2025-07-11 08:47:54 +00:00
Quentin Gliech
1dc29563c1
Move registrations off the main worker (#18552)
This is mainly moving a few store methods around. Note that this doesn't
yet remove the replication servlet to avoid breaking during rollout.
2025-07-10 13:13:27 +00:00
dependabot[bot]
9d43bec326
Bump ruff from 0.7.3 to 0.11.10 (#18451)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2025-05-20 15:23:30 +01:00
Andrew Ferrazzutti
006251a5d0
Add missing license header (#17799)
Co-authored-by: Erik Johnston <erik@matrix.org>
2024-10-08 12:01:44 +01:00
Andrew Morgan
316d635906
Fix NAME attribute of ReplicationRemovePusherRestServlet (#17779) 2024-10-04 09:53:35 +01:00
Andrew Ferrazzutti
5173741c71
Support MSC4140: Delayed events (Futures) (#17326)
Some checks are pending
Build release artifacts / Build sdist (push) Waiting to run
Build release artifacts / Attach assets to release (push) Blocked by required conditions
Tests / changes (push) Waiting to run
Tests / check-sampleconfig (push) Blocked by required conditions
Tests / check-schema-delta (push) Blocked by required conditions
Tests / check-lockfile (push) Waiting to run
Tests / lint (push) Blocked by required conditions
Tests / Typechecking (push) Blocked by required conditions
Tests / lint-crlf (push) Waiting to run
Tests / lint-newsfile (push) Waiting to run
Tests / lint-pydantic (push) Blocked by required conditions
Tests / lint-clippy (push) Blocked by required conditions
Tests / lint-clippy-nightly (push) Blocked by required conditions
Tests / lint-rustfmt (push) Blocked by required conditions
Tests / lint-readme (push) Blocked by required conditions
Tests / linting-done (push) Blocked by required conditions
Tests / calculate-test-jobs (push) Blocked by required conditions
Tests / trial (push) Blocked by required conditions
Tests / trial-olddeps (push) Blocked by required conditions
Tests / trial-pypy (all, pypy-3.8) (push) Blocked by required conditions
Tests / sytest (push) Blocked by required conditions
Tests / export-data (push) Blocked by required conditions
Tests / portdb (11, 3.8) (push) Blocked by required conditions
Tests / portdb (15, 3.11) (push) Blocked by required conditions
Tests / complement (monolith, Postgres) (push) Blocked by required conditions
Tests / complement (monolith, SQLite) (push) Blocked by required conditions
Tests / complement (workers, Postgres) (push) Blocked by required conditions
Tests / cargo-test (push) Blocked by required conditions
Tests / cargo-bench (push) Blocked by required conditions
Tests / tests-done (push) Blocked by required conditions
2024-09-23 13:33:48 +01:00
Quentin Gliech
7d52ce7d4b
Format files with Ruff (#17643)
Some checks are pending
Build release artifacts / Build sdist (push) Waiting to run
Build release artifacts / Attach assets to release (push) Blocked by required conditions
Tests / changes (push) Waiting to run
Tests / check-sampleconfig (push) Blocked by required conditions
Tests / check-schema-delta (push) Blocked by required conditions
Tests / check-lockfile (push) Waiting to run
Tests / lint (push) Blocked by required conditions
Tests / Typechecking (push) Blocked by required conditions
Tests / lint-crlf (push) Waiting to run
Tests / lint-newsfile (push) Waiting to run
Tests / lint-pydantic (push) Blocked by required conditions
Tests / lint-clippy (push) Blocked by required conditions
Tests / lint-clippy-nightly (push) Blocked by required conditions
Tests / lint-rustfmt (push) Blocked by required conditions
Tests / lint-readme (push) Blocked by required conditions
Tests / linting-done (push) Blocked by required conditions
Tests / calculate-test-jobs (push) Blocked by required conditions
Tests / trial (push) Blocked by required conditions
Tests / trial-olddeps (push) Blocked by required conditions
Tests / trial-pypy (all, pypy-3.8) (push) Blocked by required conditions
Tests / sytest (push) Blocked by required conditions
Tests / export-data (push) Blocked by required conditions
Tests / portdb (11, 3.8) (push) Blocked by required conditions
Tests / portdb (15, 3.11) (push) Blocked by required conditions
Tests / complement (monolith, Postgres) (push) Blocked by required conditions
Tests / complement (monolith, SQLite) (push) Blocked by required conditions
Tests / complement (workers, Postgres) (push) Blocked by required conditions
Tests / cargo-test (push) Blocked by required conditions
Tests / cargo-bench (push) Blocked by required conditions
Tests / tests-done (push) Blocked by required conditions
I thought ruff check would also format, but it doesn't.

This runs ruff format in CI and dev scripts. The first commit is just a
run of `ruff format .` in the root directory.
2024-09-02 12:39:04 +01:00
Erik Johnston
ea6bfae0fc
Add support for moving /push_rules off of main process (#17037) 2024-03-28 15:44:07 +00:00
dependabot[bot]
1e68b56a62
Bump black from 23.10.1 to 24.2.0 (#16936) 2024-03-13 16:46:44 +00:00
Erik Johnston
23740eaa3d
Correctly mention previous copyright (#16820)
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.
2024-01-23 11:26:48 +00:00
Patrick Cloke
8e1e62c9e0 Update license headers 2023-11-21 15:29:58 -05:00
Erik Johnston
8c63e93286
Fix HTTP repl response to use minimum token (#16578) 2023-10-30 12:27:14 +00:00
Erik Johnston
8f35f8148e
Fix bug where a new writer advances their token too quickly (#16473)
* Fix bug where a new writer advances their token too quickly

When starting a new writer (for e.g. persisting events), the
`MultiWriterIdGenerator` doesn't have a minimum token for it as there
are no rows matching that new writer in the DB.

This results in the the first stream ID it acquired being announced as
persisted *before* it actually finishes persisting, if another writer
gets and persists a subsequent stream ID. This is due to the logic of
setting the minimum persisted position to the minimum known position of
across all writers, and the new writer starts off not being considered.

* Fix sending out POSITIONs when our token advances without update

Broke in #14820

* For replication HTTP requests, only wait for minimal position
2023-10-23 16:57:30 +01:00
Richard van der Hoff
109882230c
Clean up logging on event persister endpoints (#16488) 2023-10-14 17:57:27 +01:00
Erik Johnston
1cd410a783
Recheck if remote device is cached before requesting it (#16252)
This fixes a bug where we could get stuck re-requesting the device over
replication again and again.
2023-09-07 12:45:43 +00:00
David Robertson
e9eb26e3af
Cache device resync requests over replication (#16241) 2023-09-04 11:57:59 +01:00
Patrick Cloke
40901af5e0
Pass the device ID around in the presence handler (#16171)
Refactoring to pass the device ID (in addition to the user ID) through
the presence handler (specifically the `user_syncing`, `set_state`,
and `bump_presence_active_time` methods and their replication
versions).
2023-08-28 13:08:49 -04:00
Patrick Cloke
1bf143699c
Combine logic about not overriding BUSY presence. (#16170)
Simplify some of the presence code by reducing duplicated code between
worker & non-worker modes.

The main change is to push some of the logic from `user_syncing` into
`set_state`. This is done by passing whether the user is setting the presence
via a `/sync` with a new `is_sync` flag to `set_state`. If this is `true` some
additional logic is performed:

* Don't override `busy` presence.
* Update the `last_user_sync_ts`.
* Never update the status message.
2023-08-28 11:03:23 -04:00
Shay
68b2611783
Clarify comment on key uploads over replication (#16016) 2023-07-27 15:08:46 -07:00
Jason Little
1df0221bda
Use a custom scheme & the worker name for replication requests. (#15578)
All the information needed is already in the `instance_map`, so
use that instead of passing the hostname / IP & port manually
for each replication request.

This consolidates logic for future improvements of using e.g.
UNIX sockets for workers.
2023-05-23 09:05:30 -04:00
Jason Little
e4f545c452
Remove worker_replication_* settings (#15491)
* Add master to the instance_map as part of Complement, have ReplicationEndpoint look at instance_map for master.

* Fix typo in drive by.

* Remove unnecessary worker_replication_* bits from unit tests and add master to instance_map(hopefully in the right place)

* Several updates:

1. Switch from master to main for naming the main process in the instance_map. Add useful constants for easier adjustment of names in the future.
2. Add backwards compatibility for worker_replication_* to allow time to transition to new style. Make sure to prioritize declaring main directly on the instance_map.
3. Clean up old comments/commented out code.
4. Adjust unit tests to match with new code.
5. Adjust Complement setup infrastructure to only add main to the instance_map if workers are used and remove now unused options from the worker.yaml template.

* Initial Docs upload

* Changelog

* Missed some commented out code that can go now

* Remove TODO comment that no longer holds true.

* Fix links in docs

* More docs

* Remove debug logging

* Apply suggestions from code review

Co-authored-by: reivilibre <olivier@librepush.net>

* Apply suggestions from code review

Co-authored-by: reivilibre <olivier@librepush.net>

* Update version to latest, include completeish before/after examples in upgrade notes.

* Fix up and docs too

---------

Co-authored-by: reivilibre <olivier@librepush.net>
2023-05-11 11:30:56 +01:00
Jason Little
d3bd03559b
HTTP Replication Client (#15470)
Separate out a HTTP client for replication in preparation for
also supporting using UNIX sockets. The major difference from
the base class is that this does not use treq to handle HTTP
requests.
2023-05-09 14:25:20 -04:00
Alok Kumar Singh
197fbb123b
Remove legacy code of single user device resync api (#15418)
* Removed single-user resync usage and updated it to use multi-user counterpart

Signed-off-by: Alok Kumar Singh alokaks601@gmail.com
2023-04-21 12:06:39 +01:00
David Robertson
1bc9985eb7
Have replication clients remove _INT_STREAM_POS (#15309)
* Have replication clients remove _INT_STREAM_POS

Suppose worker A makes an internal http request from worker B. B may
make changes that A later learns about over replication. We want A's
request to block until it has seen those changes—mainly to ensure A's
caches are invalidated promptly. This helps provide read-after-write
consistency, eliminating entire categories of races and test flakes.

To implement this, B includes a top-level field `_INT_STREAM_POS` in its
response JSON. Roughly speaking, the field's value tells A what to wait
for. But we weren't removing that internal field before A's request
completed!

Introduced in https://github.com/matrix-org/synapse/pull/14820.
Fixes #15308.

* Changelog
2023-03-22 12:53:55 +00:00
Dirk Klimpel
ecbe0ddbe7
Add support for knocking to workers. (#15133) 2023-03-02 12:59:53 -05:00
dependabot[bot]
9bb2eac719
Bump black from 22.12.0 to 23.1.0 (#15103) 2023-02-22 15:29:09 -05:00
Erik Johnston
c78c67c5a9
Fix bug in replication where response is cached (#15024) 2023-02-08 16:41:55 +00:00
Erik Johnston
0ec12a3753
Reduce max time we wait for stream positions (#14881)
Now that we wait for stream positions whenever we do a HTTP replication
hit, we need to be less brutal in the case where we do timeout (as we
have bugs around this).
2023-01-20 21:04:33 +00:00
Erik Johnston
9187fd940e
Wait for streams to catch up when processing HTTP replication. (#14820)
This should hopefully mitigate a class of races where data gets out of
sync due a HTTP replication request racing with the replication streams.
2023-01-18 19:35:29 +00:00
reivilibre
ba4ea7d13f
Batch up replication requests to request the resyncing of remote users's devices. (#14716) 2023-01-10 11:17:59 +00:00
Andrew Morgan
c4456114e1
Add experimental support for MSC3391: deleting account data (#14714) 2023-01-01 03:40:46 +00:00
Patrick Cloke
6d47b7e325
Add a type hint for get_device_handler() and fix incorrect types. (#14055)
This was the last untyped handler from the HomeServer object. Since
it was being treated as Any (and thus unchecked) it was being used
incorrectly in a few places.
2022-11-22 14:08:04 -05:00
realtyem
c15e9a0edb
Remove need for worker_main_http_uri setting to use /keys/upload. (#14400) 2022-11-16 22:16:25 +00:00
Patrick Cloke
d8cc86eff4
Remove redundant types from comments. (#14412)
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.

Also adds some missing type hints which were simple to fill in.
2022-11-16 15:25:24 +00:00
Tuomas Ojamies
b5ab2c428a
Support using SSL on worker endpoints. (#14128)
* Fix missing SSL support in worker endpoints.

* Add changelog

* SSL for Replication endpoint

* Remove unit test change

* Refactor listener creation to reduce duplicated code

* Fix the logger message

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Update synapse/app/_base.py

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>

* Add config documentation for new TLS option

Co-authored-by: Tuomas Ojamies <tojamies@palantir.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
2022-11-15 12:55:00 +00:00
Brendan Abolivier
422cff7df6
Fallback if 'approved' isn't included in a registration replication request (#14135) 2022-10-11 14:41:06 +02:00
Brendan Abolivier
be76cd8200
Allow admins to require a manual approval process before new accounts can be used (using MSC3866) (#13556) 2022-09-29 15:23:24 +02:00