dcausse (David Causse)
User

User Details

User Since
Jun 9 2015, 9:03 AM (484 w, 6 d)
Availability
Available
IRC Nick
dcausse
LDAP User
DCausse
MediaWiki User
DCausse (WMF) [ Global Accounts ]

Recent Activity

Yesterday

dcausse moved T333373: The WDQS streaming updater should support connecting to kafka with SSL from In Progress to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Sep 23, 4:21 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse updated the task description for T373195: Migrate Search Platform-owned helm charts to Calico Network Policies.
Mon, Sep 23, 1:31 PM · Patch-For-Review, Data-Platform-SRE (2024.09.06 - 2024.09.27), Prod-Kubernetes, Kubernetes, serviceops
dcausse claimed T333373: The WDQS streaming updater should support connecting to kafka with SSL.
Mon, Sep 23, 12:44 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse moved T333373: The WDQS streaming updater should support connecting to kafka with SSL from Incoming to In Progress on the Discovery-Search (Current work) board.
Mon, Sep 23, 12:43 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a project to T333373: The WDQS streaming updater should support connecting to kafka with SSL: Discovery-Search (Current work).
Mon, Sep 23, 12:43 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a comment to T374729: Use kafka-main-[eqiad|codfw].external-services.svc.cluster.local to discover kafka brokers in kafka client running in k8s.

Restarts are all done. You may give it another try @dcausse

Mon, Sep 23, 12:36 PM · Prod-Kubernetes, Kubernetes, Discovery-Search (Current work), serviceops
dcausse moved T373459: SUP: set up alerting for page_change_weighted_tags ingestion from In Progress to Needs review on the Discovery-Search (Current work) board.
Mon, Sep 23, 9:15 AM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Growth-Team

Fri, Sep 20

dcausse added a comment to T374987: "Account autocreation denied for CirrusSearch Streaming Updater by ClosedWikiProvider".

But also why is CirrusSearch updating on closed wikis anyway? There shouldn't be anything to update.

Closed wikis are still being served and search is available there, CirrusSearch is still scanning some documents to do some automatic cleanup which I suspect is the source of these API requests.

I wonder, should we turn that off for closed wikis? We are already doing filtering based on sitematrix there for private wikis in some cases, we could similarly exclude closed wikis.

Fri, Sep 20, 6:06 AM · Discovery-Search (Current work), Patch-For-Review, NetworkSession, CirrusSearch

Wed, Sep 18

dcausse closed T374914: SpecialMediaSearch sometimes fails with "Search is currently too busy" as Invalid.

Was looking at a dashboard that already filtered non-mediasearch poolcounter errors

Wed, Sep 18, 5:39 PM · CirrusSearch, Structured-Data-Backlog, Discovery-Search, SDAW-MediaSearch
dcausse claimed T373391: Create wdqs-main and wdqs-scholarly specific test queries.
Wed, Sep 18, 8:54 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Discovery-Search (Current work), Wikidata
dcausse claimed T373459: SUP: set up alerting for page_change_weighted_tags ingestion.
Wed, Sep 18, 8:53 AM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Growth-Team
dcausse added a comment to T374987: "Account autocreation denied for CirrusSearch Streaming Updater by ClosedWikiProvider".

But also why is CirrusSearch updating on closed wikis anyway? There shouldn't be anything to update.

Closed wikis are still being served and search is available there, CirrusSearch is still scanning some documents to do some automatic cleanup which I suspect is the source of these API requests.

Wed, Sep 18, 7:22 AM · Discovery-Search (Current work), Patch-For-Review, NetworkSession, CirrusSearch

Tue, Sep 17

dcausse added a comment to T374729: Use kafka-main-[eqiad|codfw].external-services.svc.cluster.local to discover kafka brokers in kafka client running in k8s.

In the RFC I read

In some cases, the URI is specified as an IP address rather than a
hostname. In this case, the iPAddress subjectAltName must be present
in the certificate and must exactly match the IP in the URI.
Tue, Sep 17, 7:45 PM · Prod-Kubernetes, Kubernetes, Discovery-Search (Current work), serviceops
dcausse reopened T374729: Use kafka-main-[eqiad|codfw].external-services.svc.cluster.local to discover kafka brokers in kafka client running in k8s as "Open".

Thanks for looking into this!
Now failing with javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching kafka-main-eqiad.external-services.svc.cluster.local found (with use_all_dns_ips), and now it seems that it wants to validate the hostname passed via bootstrap.servers...
I'll investigate more to see if there are other options; if we fail to work around this, do you think it would be acceptable to add kafka-main-eqiad.external-services.svc.cluster.local as a valid alternative name in the cert?
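
For illustration only, here is a minimal sketch of the kind of check behind the "No subject alternative DNS name matching" error. It is not the JDK's or Kafka's implementation; the function name, the simplified wildcard rule, and the SAN entry broker1.example.org are assumptions made for the example.

```python
def hostname_matches_san(hostname, dns_sans):
    """Return True if hostname matches one of the certificate's DNS SAN entries.

    Simplified rule (assumption): a SAN matches when it is identical to the
    hostname, or when its leftmost label is "*" and all other labels are equal.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for san in dns_sans:
        san_labels = san.lower().rstrip(".").split(".")
        if len(san_labels) != len(host_labels):
            continue
        if san_labels[0] == "*":
            if san_labels[1:] == host_labels[1:]:
                return True
        elif san_labels == host_labels:
            return True
    return False


# Name passed via bootstrap.servers (taken from the task title).
bootstrap_name = "kafka-main-eqiad.external-services.svc.cluster.local"

# Hypothetical SAN list of a broker certificate (placeholder host name).
sans_before = ["broker1.example.org"]
sans_after = sans_before + [bootstrap_name]

print(hostname_matches_san(bootstrap_name, sans_before))  # handshake would be rejected
print(hostname_matches_san(bootstrap_name, sans_after))   # name is now accepted
```

Under these assumptions, adding the service name as an alternative name is what makes the check pass.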

Tue, Sep 17, 1:52 PM · Prod-Kubernetes, Kubernetes, Discovery-Search (Current work), serviceops
dcausse added a comment to T374916: Port Categories lag / ping checks to Prometheus/Alertmanager.

Not sure that check_categories.py --ping is necessary; imo it could be dropped, as it should already be covered by some other sensors.

Tue, Sep 17, 1:01 PM · Patch-For-Review, Data-Platform-SRE (2024.09.06 - 2024.09.27), Wikidata, Discovery-Search, Wikidata-Query-Service, SRE Observability (FY2024/2025-Q1), Observability-Alerting
dcausse added a comment to T374916: Port Categories lag / ping checks to Prometheus/Alertmanager.

We could perhaps adapt modules/query_service/files/monitor/prometheus-blazegraph-exporter.py to take care of running this query by possibly re-using the same gauge blazegraph_lastupdated but adapting the query depending on the namespace it's running.

Tue, Sep 17, 12:59 PM · Patch-For-Review, Data-Platform-SRE (2024.09.06 - 2024.09.27), Wikidata, Discovery-Search, Wikidata-Query-Service, SRE Observability (FY2024/2025-Q1), Observability-Alerting
dcausse created T374944: Refactor RdfClient so that blazegraph specific ResponseHandler implementation can be changed.
Tue, Sep 17, 12:26 PM · Wikidata, Wikidata-Query-Service
dcausse created T374939: Write a client that consumes the RDF update stream from https://stream.wikimedia.org/ and update a triple store.
Tue, Sep 17, 12:14 PM · Wikidata, Wikidata-Query-Service
dcausse created T374921: Configure https://stream.wikimedia.org to expose rdf-streaming-updater.mutation.
Tue, Sep 17, 10:04 AM · Event-Platform, Data-Engineering, Discovery-Search (Current work), Wikidata
dcausse created T374919: Adapt the rdf-streaming-updater flink job to use wikimedia-eventutilities-flink.
Tue, Sep 17, 9:52 AM · Discovery-Search (Current work), Wikidata
dcausse created T374918: Define a schema for the rdf-streaming-updater mutation stream.
Tue, Sep 17, 9:48 AM · Discovery-Search (Current work), Wikidata
dcausse created T374914: SpecialMediaSearch sometimes fails with "Search is currently too busy".
Tue, Sep 17, 8:59 AM · CirrusSearch, Structured-Data-Backlog, Discovery-Search, SDAW-MediaSearch

Mon, Sep 16

dcausse moved T373812: Internal federation sometimes fail with HttpConnectionOverHTTP from In Progress to Needs review on the Discovery-Search (Current work) board.
Mon, Sep 16, 8:03 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata
dcausse awarded T374016: Consider separating wdqs-categories from the rest of the wdqs stack a Love token.
Mon, Sep 16, 2:05 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Wikidata, Wikidata-Query-Service
dcausse added a comment to T370665: Handle Late-Arrived Events from Gobblin into Airflow triggered Refine.

I think the next steps of this analysis are to investigate why we have late events at specific hours (my gut tells me to look at Event-gate glitches), and why some datasets have those late events as well (hint: 6 out of 7 are mediawiki generated events).

Unsure if related but we recently found that some MW requests might last for several hours (T374662), so depending on how the event is created it's possible that late-events are created by MW:

I believe that this might possibly lead to late events being sent by MW.

Mon, Sep 16, 8:36 AM · Data-Engineering (Q1 2024 July 1st - September 30th)
dcausse added a comment to T330525: Migrate Wikidata off of Blazegraph.

@dcausse , thinking of your comment from office hours this month: this is the sort of hopefully-separable work that I imagine finding a grapher in residence to work on ;)

Mon, Sep 16, 7:13 AM · Wikidata, Wikidata-Query-Service

Fri, Sep 13

dcausse updated subscribers of T374729: Use kafka-main-[eqiad|codfw].external-services.svc.cluster.local to discover kafka brokers in kafka client running in k8s.
Fri, Sep 13, 4:27 PM · Prod-Kubernetes, Kubernetes, Discovery-Search (Current work), serviceops
dcausse created T374729: Use kafka-main-[eqiad|codfw].external-services.svc.cluster.local to discover kafka brokers in kafka client running in k8s.
Fri, Sep 13, 4:23 PM · Prod-Kubernetes, Kubernetes, Discovery-Search (Current work), serviceops
dcausse added a comment to T374628: Investigate why rdf-streaming-updater is unable to recover after replacing kafka-main@codfw nodes.

The current hypothesis is that the problem happens right after the node comes online and is advertised by the cluster as usable, but that node is not yet allowed by the egress rule. The kafka-client is then too confused and the job enters a crash loop. Other jobs seem to be more tolerant to this setup. To be precise, the search job also suffered some blips during the process, impacting the search update lag, but I think this is totally acceptable.

Fri, Sep 13, 10:25 AM · Discovery-Search (Current work), Wikidata

Thu, Sep 12

dcausse added a comment to T331127: phantom redirects lingering in incategory searches after page moves.

I think we still have issues with this: looking closer, it appears that MW is flagging this event as a page edit rather than a page move, which misleads CirrusSearch into thinking that it's a "normal" edit: P69074.
@Ottomata is this expected? Is it possible that page moves between namespaces are not properly identified by the page_change stream?

Thu, Sep 12, 4:58 PM · MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), MW-1.40-notes (1.40.0-wmf.25; 2023-02-27), Discovery-Search (Current work), CirrusSearch
dcausse updated subscribers of T331127: phantom redirects lingering in incategory searches after page moves.

After having not seen this problem recur for several weeks since the last batch I posted about was cleared, there's been a new isolated case recurring again, with User:Citrivescence/Asher Perlman still not dropping from https://en.wikipedia.org/w/index.php?search=incategory%3A%22Living_people%22&title=Special%3ASearch&profile=advanced&fulltext=1&ns2=1&ns3=1&ns118=1&ns119=1 after being moved to mainspace five days ago. Given the "two weeks to work through all possible pages" comment above, I suppose it may take a while, but I'm just reporting it so that it's known.

Thu, Sep 12, 4:30 PM · MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), MW-1.40-notes (1.40.0-wmf.25; 2023-02-27), Discovery-Search (Current work), CirrusSearch
dcausse updated the title for P69074 page_change event for a page move from draft to NS_MAIN from page move between to page_change event for a page move from draft to NS_MAIN .
Thu, Sep 12, 4:28 PM
dcausse created P69074 page_change event for a page move from draft to NS_MAIN .
Thu, Sep 12, 4:28 PM
dcausse created T374637: Decide how to make datasets owned by analytics-search-users also readable by analytics-privatedata-users.
Thu, Sep 12, 3:58 PM · Discovery-Search (Current work), Data-Engineering
dcausse created T374628: Investigate why rdf-streaming-updater is unable to recover after replacing kafka-main@codfw nodes.
Thu, Sep 12, 2:55 PM · Discovery-Search (Current work), Wikidata

Wed, Sep 11

dcausse moved T373791: Transfer a sane journal (subgraph:main) to wdqs2021 from wdqs2022 from Needs Review to Done on the Data-Platform-SRE (2024.09.06 - 2024.09.27) board.

@bking everything looks good, thanks!

Wed, Sep 11, 5:48 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Wikidata, Wikidata-Query-Service

Tue, Sep 10

dcausse claimed T373812: Internal federation sometimes fail with HttpConnectionOverHTTP.
Tue, Sep 10, 1:25 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata
dcausse moved T371401: Adapt search ranking for mul language code from In Progress to To Be Deployed on the Discovery-Search (Current work) board.

I think that all patches have been merged, most of them deployed except one which should get deployed via the train tomorrow for group1 (re-enable fine-tuning per language).

Tue, Sep 10, 7:21 AM · MW-1.43-notes (1.43.0-wmf.23; 2024-09-17), Discovery-Search (Current work), CirrusSearch, Wikidata

Mon, Sep 9

dcausse updated the task description for T373391: Create wdqs-main and wdqs-scholarly specific test queries.
Mon, Sep 9, 4:41 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Discovery-Search (Current work), Wikidata
dcausse added a comment to T368067: Post-creation work for btmwiki.

Untagging WDQS as I believe there are no issues with WDQS, I think the query does not work because https://www.wikidata.org/wiki/Q126787117 was created without a trailing slash for the P856 property.

Mon, Sep 9, 4:09 PM · Countervandalism-Network, Content-Transform-Team, Wiki-Setup
dcausse removed a project from T368067: Post-creation work for btmwiki: Wikidata-Query-Service.

Untagging WDQS as I believe there are no issues with WDQS, I think the query does not work because https://www.wikidata.org/wiki/Q126787117 was created without a trailing slash for the P856 property.

Mon, Sep 9, 4:08 PM · Countervandalism-Network, Content-Transform-Team, Wiki-Setup
dcausse updated the task description for T374341: [SPIKE] how can we support Spark producer/consumers in Event Platform.
Mon, Sep 9, 12:54 PM · Dumps 2.0, Data-Engineering, Event-Platform
dcausse updated the task description for T374341: [SPIKE] how can we support Spark producer/consumers in Event Platform.
Mon, Sep 9, 10:03 AM · Dumps 2.0, Data-Engineering, Event-Platform
dcausse created T374335: The SUP producer should ship private wiki update events to a separate stream.
Mon, Sep 9, 8:04 AM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch

Thu, Sep 5

dcausse awarded T371359: Migrate Wikitech's Jobqueue a Love token.
Thu, Sep 5, 3:16 PM · Patch-For-Review, wikitech.wikimedia.org, MW-on-K8s, serviceops
dcausse added a comment to T271776: Allow limiting lexeme searches by language.

the data seems to be indexed so it might be trivial to implement these keywords, moving to needs triage to raise visibility.

Thu, Sep 5, 9:39 AM · CirrusSearch, Discovery-Search, Wikidata, Wikidata Lexicographical data
dcausse updated the task description for T374009: Investigate EQIAD WDQS graph split host alerts.
Thu, Sep 5, 6:51 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27)

Wed, Sep 4

dcausse moved T271776: Allow limiting lexeme searches by language from Wikibase Search to needs triage on the Discovery-Search board.

the data seems to be indexed so it might be trivial to implement these keywords, moving to needs triage to raise visibility.

Wed, Sep 4, 4:03 PM · CirrusSearch, Discovery-Search, Wikidata, Wikidata Lexicographical data
dcausse closed T373086: PHP Warning: Stats: Cannot associate label keys with label values: Not all initialized labels have an assigned value. as Resolved.

Seems to be fixed, tentatively closing

Wed, Sep 4, 9:14 AM · MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), Discovery-Search, CirrusSearch, Data-Engineering (Q1 2024 July 1st - September 30th), Event-Platform, Wikimedia-production-error
dcausse closed T373086: PHP Warning: Stats: Cannot associate label keys with label values: Not all initialized labels have an assigned value., a subtask of T373640: 1.43.0-wmf.21 deployment blockers, as Resolved.
Wed, Sep 4, 9:11 AM · Release-Engineering-Team (Priority Backlog 📥), Release, Train Deployments
dcausse claimed T373086: PHP Warning: Stats: Cannot associate label keys with label values: Not all initialized labels have an assigned value..
Wed, Sep 4, 8:40 AM · MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), Discovery-Search, CirrusSearch, Data-Engineering (Q1 2024 July 1st - September 30th), Event-Platform, Wikimedia-production-error
dcausse added a comment to T373086: PHP Warning: Stats: Cannot associate label keys with label values: Not all initialized labels have an assigned value..

Looks like the spike is only related to CirrusSearch?

Wed, Sep 4, 8:39 AM · MW-1.43-notes (1.43.0-wmf.21; 2024-09-03), Discovery-Search, CirrusSearch, Data-Engineering (Q1 2024 July 1st - September 30th), Event-Platform, Wikimedia-production-error
dcausse added a comment to T373791: Transfer a sane journal (subgraph:main) to wdqs2021 from wdqs2022.

Unfortunately wdqs2021 is still consuming from the wrong topic after the transfer.
Looking closer it appears that the service definition for the wdqs-updater is duplicated in two locations:

  • /etc/systemd/system/wdqs-updater.service containing the wrong topic codfw.rdf-streaming-updater.mutation
  • /lib/systemd/system/wdqs-updater.service with the right topic codfw.rdf-streaming-updater.mutation-main
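
As a rough sketch of why the duplicated definition matters: systemd gives unit files under /etc/systemd/system precedence over same-named files under /lib/systemd/system, so the copy with the wrong topic is the one that takes effect. The helper below is hypothetical and ignores drop-in directories and other search paths.

```python
# Simplified unit search path, highest precedence first (drop-ins omitted).
UNIT_SEARCH_PATH = [
    "/etc/systemd/system",
    "/run/systemd/system",
    "/lib/systemd/system",
]


def effective_unit(unit_name, existing_files):
    """Return the path of the unit file that wins, or None if none exists."""
    for directory in UNIT_SEARCH_PATH:
        candidate = f"{directory}/{unit_name}"
        if candidate in existing_files:
            return candidate
    return None


# The two locations reported in the comment above.
files = {
    "/etc/systemd/system/wdqs-updater.service",
    "/lib/systemd/system/wdqs-updater.service",
}
print(effective_unit("wdqs-updater.service", files))
```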
Wed, Sep 4, 8:33 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Wikidata, Wikidata-Query-Service

Tue, Sep 3

dcausse added a comment to T371401: Adapt search ranking for mul language code.

https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1060433 appears to be not-needed in the end, I don't see where we use these manually tuned profiles

After some review it turns out the code that used the language-tuned profiles was lost as part of splitting WikibaseCirrusSearch out of the Wikibase repo. All the related machinery still exists and it would be pretty easy to add it back in now, but I wonder if we should be doing testing of some sort to verify those profiles are better than the defaults we've been using. They were at the time, but user behaviour can change in the 5 years that have passed.

Tue, Sep 3, 12:14 PM · MW-1.43-notes (1.43.0-wmf.23; 2024-09-17), Discovery-Search (Current work), CirrusSearch, Wikidata
dcausse moved T372030: Index statements in commons media datatype for haswbstatements from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Tue, Sep 3, 12:10 PM · MW-1.43-notes (1.43.0-wmf.22; 2024-09-10), Discovery-Search (Current work), CirrusSearch, Wikidata
dcausse moved T371929: Index all statements (without value) for all datatypes for haswbstatement from Incoming to To Be Deployed on the Discovery-Search (Current work) board.
Tue, Sep 3, 12:10 PM · Discovery-Search (Current work), Wikidata, SDC General, CirrusSearch
dcausse added a comment to T372030: Index statements in commons media datatype for haswbstatements.

My understanding is that https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseCirrusSearch/+/1064792 is going to fix both this ticket and T371929.
From this ticket description it is not entirely clear if the ask is also to index the full P18 statements or just the flag that indicates the use of this property. For the former I'm with Erik: this possibly adds a lot of new tokens that might be particularly hard to search for (untokenized URLs) and thus probably not very useful.

Tue, Sep 3, 10:01 AM · MW-1.43-notes (1.43.0-wmf.22; 2024-09-10), Discovery-Search (Current work), CirrusSearch, Wikidata
dcausse assigned T371929: Index all statements (without value) for all datatypes for haswbstatement to EBernhardson.

Should be resolved with https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseCirrusSearch/+/1064792

Tue, Sep 3, 9:59 AM · Discovery-Search (Current work), Wikidata, SDC General, CirrusSearch

Mon, Sep 2

dcausse added a subtask for T364363: [Epic] Productionize federated wdqs graph-split endpoints: T373812: Internal federation sometimes fail with HttpConnectionOverHTTP.
Mon, Sep 2, 2:25 PM · Data-Platform-SRE, Discovery-Search, Epic, Wikidata-Query-Service, Wikidata
dcausse added a parent task for T373812: Internal federation sometimes fail with HttpConnectionOverHTTP: T364363: [Epic] Productionize federated wdqs graph-split endpoints.
Mon, Sep 2, 2:25 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata
dcausse created T373812: Internal federation sometimes fail with HttpConnectionOverHTTP.
Mon, Sep 2, 2:24 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata
dcausse added a subtask for T364363: [Epic] Productionize federated wdqs graph-split endpoints: T373791: Transfer a sane journal (subgraph:main) to wdqs2021 from wdqs2022.
Mon, Sep 2, 9:49 AM · Data-Platform-SRE, Discovery-Search, Epic, Wikidata-Query-Service, Wikidata
dcausse added a parent task for T373791: Transfer a sane journal (subgraph:main) to wdqs2021 from wdqs2022: T364363: [Epic] Productionize federated wdqs graph-split endpoints.
Mon, Sep 2, 9:49 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Wikidata, Wikidata-Query-Service
dcausse created T373791: Transfer a sane journal (subgraph:main) to wdqs2021 from wdqs2022.
Mon, Sep 2, 9:48 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Wikidata, Wikidata-Query-Service
dcausse added a comment to T369808: The Commons search "deepcategory" operator often does not work (Deep category query returned too many categories).

Interesting, I thought as well that the 1k limits would apply to nested bool queries (which is probably one reason it was set to 256 initially). It means that we can probably safely bump the limit to 1k without even nesting bool queries. I'm not clear why it has such an impact when getting past 2.5k and I have no clue if a terms query would perform significantly better, it's less costly for sure since there's no need to analyze & rewrite the query, we could probably test this as well to see the impact?
So perhaps we can at least bump to 1k right now with a simple config change and ponder what to do next based on some testing of the terms query? If the terms query does not show a significant gain compared to nested bool queries we might just use this?
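
To make the two options concrete, here is a hedged sketch of the query bodies being compared, written as plain Python dictionaries. The field name "category", the use of match clauses inside the bool query, and the max_clauses default of 1024 are assumptions for illustration; this is not the query CirrusSearch actually builds.

```python
def bool_should_query(field, values, max_clauses=1024):
    """One analyzed match clause per category, combined with bool/should."""
    values = list(values)
    if len(values) > max_clauses:
        raise ValueError(
            f"{len(values)} clauses exceed the limit of {max_clauses}"
        )
    return {
        "bool": {
            "should": [{"match": {field: value}} for value in values],
            "minimum_should_match": 1,
        }
    }


def terms_query(field, values):
    """A single terms query: values are matched as-is, without analysis."""
    return {"terms": {field: list(values)}}


categories = [f"Category {i}" for i in range(2000)]

# The terms query carries all values in one clause.
print(len(terms_query("category", categories)["terms"]["category"]))

# The bool query hits the assumed clause limit for the same input.
try:
    bool_should_query("category", categories)
except ValueError as error:
    print(error)
```

Timing both shapes against a test index would be one way to run the comparison suggested above.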

Mon, Sep 2, 8:14 AM · MW-1.43-notes (1.43.0-wmf.22; 2024-09-10), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Commons
dcausse created T373778: NetworkSession and AbuseFilter may be spammy.
Mon, Sep 2, 7:27 AM · MW-1.43-notes (1.43.0-wmf.22; 2024-09-10), Discovery-Search (Current work), CirrusSearch, AbuseFilter, NetworkSession

Aug 9 2024

dcausse added a comment to T372128: cirrus-reindex-orchestrator seems to miss the last required backfill.

I was able to trigger the backfill for wikidatawiki_content by running another re-index for an unrelated wiki on both eqiad and codfw. It seems to me that there's an early stop when all the re-indexes are done; it should perhaps double check that no remaining backfills are needed before quitting?

Aug 9 2024, 7:47 AM · Discovery-Search (Current work), CirrusSearch
dcausse created T372128: cirrus-reindex-orchestrator seems to miss the last required backfill.
Aug 9 2024, 7:34 AM · Discovery-Search (Current work), CirrusSearch
dcausse created P67255 cirrus-reindex-orchestrator output on two consecutive runs for testwikidatawiki and wikidatawiki.
Aug 9 2024, 7:27 AM
dcausse added a comment to T371401: Adapt search ranking for mul language code.

Current status:

Aug 9 2024, 6:58 AM · MW-1.43-notes (1.43.0-wmf.23; 2024-09-17), Discovery-Search (Current work), CirrusSearch, Wikidata
dcausse moved T371352: Search for haslabel:mul does not show results on wikidata.org from Ready for Dev -- SWE to Needs Reporting on the Discovery-Search (Current work) board.

This should work now, we had to re-index to wikidata

Aug 9 2024, 6:32 AM · Discovery-Search (Current work), Wikidata, Wikidata Dev Team

Aug 8 2024

dcausse triaged T371746: Tilde Tilde Tilde not found by search as Low priority.

~ is definitely a confusing character for search and is heavily overloaded:

  • used to force entering the search results page when used as a prefix
  • considered as a punctuation and ignored by many text analysis components
  • used to trigger fuzziness word~
  • used to control the phrase slop in "foo bar"~2
  • used to perform a phrase search on stems "foos bars"~
  • has some restrictions on page titles (impossible to create a page named ~~~)
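
The syntactic roles listed above can be illustrated with a small sketch. This is not CirrusSearch's query parser: the function name, the role labels, and the regular expressions are assumptions that only cover the forms named in the list.

```python
import re


def tilde_roles(query):
    """Return the roles that "~" appears to play in a search query."""
    roles = []
    if query.startswith("~"):
        roles.append("force search results page")
    if re.search(r'"[^"]+"~\d+', query):
        roles.append("phrase slop")
    if re.search(r'"[^"]+"~(?!\d)', query):
        roles.append("phrase search on stems")
    if re.search(r"\w~", query):
        roles.append("fuzziness")
    return roles


for example in ["~~~", "word~", '"foo bar"~2', '"foos bars"~']:
    print(example, tilde_roles(example))
```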
Aug 8 2024, 1:51 PM · Discovery-Search, CirrusSearch

Aug 7 2024

dcausse added a comment to T371401: Adapt search ranking for mul language code.

Regarding fallbacks, WikibaseCirrusSearch relies on \Wikibase\Lib\TermLanguageFallbackChain::getFetchLanguageCodes. The order in which these languages are returned is quite important, as the weight attributed to a match is inversely proportional to the position of its language in this array.
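
A minimal sketch of position-based weighting follows. The exact formula used by WikibaseCirrusSearch is not stated here, so weight = 1 / (position + 1) is an assumption, as are the function name and the example language codes.

```python
def fallback_weights(language_codes):
    """Map each language code to a weight that shrinks with its position."""
    return {
        code: 1.0 / (position + 1)
        for position, code in enumerate(language_codes)
    }


# Hypothetical fallback chain, ordered from most to least preferred.
weights = fallback_weights(["fr", "mul", "en"])
for code, weight in weights.items():
    print(code, weight)
```

With such a scheme, moving a language earlier in the returned array directly raises the score of matches in that language.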

Aug 7 2024, 1:41 PM · MW-1.43-notes (1.43.0-wmf.23; 2024-09-17), Discovery-Search (Current work), CirrusSearch, Wikidata
dcausse added a comment to T371401: Adapt search ranking for mul language code.

The procedure should be:

Aug 7 2024, 1:30 PM · MW-1.43-notes (1.43.0-wmf.23; 2024-09-17), Discovery-Search (Current work), CirrusSearch, Wikidata
dcausse added a comment to T371401: Adapt search ranking for mul language code.

The mul labels and descriptions (can we have mul descriptions?) are currently not indexed, which explains to some degree why search is behaving poorly on these items. We'll index those and see how it performs; tuning search might come as a separate step.
If I'm not mistaken mul is considered a fallback for all languages so it should always be queried.

Aug 7 2024, 7:54 AM · MW-1.43-notes (1.43.0-wmf.23; 2024-09-17), Discovery-Search (Current work), CirrusSearch, Wikidata

Aug 6 2024

dcausse moved T328330: Create SLI / SLO on Search update lag from In Progress to Needs review on the Discovery-Search (Current work) board.

Dashboard is up at https://grafana.wikimedia.org/d/8xDerelVz/search-update-lag-slo?orgId=1
A patch to grafana-grizzly is uploaded at https://gerrit.wikimedia.org/r/c/operations/grafana-grizzly/+/1060150 (but not sure how to move forward with it)

Aug 6 2024, 5:27 PM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Patch-For-Review, Discovery-Search (Current work)
dcausse added a comment to T371746: Tilde Tilde Tilde not found by search.

Why doesn't whatever logic allows "Double tilde" to be found work here then, though? I was assuming that was found via the redirect "~~", which you seem to be implying should have the same problem.

Aug 6 2024, 4:53 PM · Discovery-Search, CirrusSearch
dcausse added a comment to T371746: Tilde Tilde Tilde not found by search.

I created the redirect "~~ (album)". It does show up in the completion index for "~~~" (after 5 irrelevant results), but does not show up in the search index. Odd.

Aug 6 2024, 4:00 PM · Discovery-Search, CirrusSearch
dcausse added a comment to T370754: Import WDQS subgraphs to production nodes.

Unchecked the prerequisite regarding kafka topics, the split graph hosts are currently configured to consume from the full graph topic, the reload should not start (probably be stopped/restarted on wdqs1021) before https://gerrit.wikimedia.org/r/c/operations/puppet/+/1060049 is merged and applied to the corresponding hosts.

Aug 6 2024, 10:36 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Discovery-Search (Current work), Wikidata
dcausse updated the task description for T370754: Import WDQS subgraphs to production nodes.
Aug 6 2024, 10:33 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Discovery-Search (Current work), Wikidata
dcausse added a comment to T371871: Very high maxlag on Wikidata due to WDQS lag.

As part of the work to expose two new endpoints serving the split graph (T364363) we are configuring new wdqs hosts to run blazegraph.
The way maxlag is propagated from WDQS to mediawiki is by measuring the most lagged wdqs host that is online.
In order to know what wdqs hosts are online we measure the number of queries that it serves.
We also run some "monitoring queries" internally to measure the health of the system and in order for these monitoring queries to not interfere with this system we flag such internal user-agents so that they're ignored.
This is where we made a mistake, meaning that a monitoring user-agent was not properly flagged as internal and caused a new host (not yet fully loaded) to be considered online and thus taken into account by maxlag.
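
The mechanism described above can be sketched as follows. The data layout, host names, and user-agent strings are hypothetical; only the logic (a host counts as online when it serves non-internal queries, and maxlag is the lag of the most lagged online host) comes from the description.

```python
def wdqs_maxlag(hosts, internal_user_agents):
    """Return the lag of the most lagged host that is considered online.

    hosts maps a host name to {"lag": seconds, "queries": {user_agent: count}}.
    A host is online if it served at least one query from a user-agent that
    is not flagged as internal.
    """
    online_lags = []
    for info in hosts.values():
        external_queries = sum(
            count
            for user_agent, count in info["queries"].items()
            if user_agent not in internal_user_agents
        )
        if external_queries > 0:
            online_lags.append(info["lag"])
    return max(online_lags, default=0)


hosts = {
    "wdqs-serving": {"lag": 12, "queries": {"some-bot/1.0": 500}},
    # New host, not yet fully loaded, only hit by a monitoring query.
    "wdqs-loading": {"lag": 86400, "queries": {"monitoring-check": 3}},
}

print(wdqs_maxlag(hosts, internal_user_agents={"monitoring-check"}))  # flagged
print(wdqs_maxlag(hosts, internal_user_agents=set()))  # flag missing
```

In the second call the loading host is wrongly treated as online, which is the failure mode described in the comment.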

Aug 6 2024, 9:31 AM · Wikidata-Query-Service, Wikidata

Aug 5 2024

dcausse added a comment to T371746: Tilde Tilde Tilde not found by search.

I was unaware of the meaning of "~" as a search operator. Is that documented somewhere?

Aug 5 2024, 4:47 PM · Discovery-Search, CirrusSearch
dcausse added a comment to T371746: Tilde Tilde Tilde not found by search.

Does this mean that if I were to create a redirect "~~ (album)" -> Tilde Tilde Tilde for example then it would be found by that search. That's hacky but tolerable.

Aug 5 2024, 4:43 PM · Discovery-Search, CirrusSearch
dcausse added a comment to T371746: Tilde Tilde Tilde not found by search.

Search queries prefixed with ~ have a special meaning for Special:Search: the prefix instructs the UI to go to Special:Search rather than the article page if it exists. It's the reason why ~~ is found when searching ~~~.
This is sadly not the sole reason why it's not found: ~ characters are likely ignored in the fulltext search index, so search can only rely on titles or redirects to find ~~. Given that there's no way to add such titles or redirects with ~~~, I don't see an easy way to solve this issue because search needs to pull this data from somewhere.

Aug 5 2024, 4:35 PM · Discovery-Search, CirrusSearch
dcausse added a project to T371464: Investigation: uniqueness of statement IDs within an entity: Wikidata-Query-Service.
Aug 5 2024, 9:16 AM · Wikidata-Query-Service, Wikidata, Wikidata Dev Team
dcausse merged T371786: Duplicated statement ids in some wikidata entities into T371464: Investigation: uniqueness of statement IDs within an entity.
Aug 5 2024, 9:16 AM · Wikidata-Query-Service, Wikidata, Wikidata Dev Team
dcausse merged task T371786: Duplicated statement ids in some wikidata entities into T371464: Investigation: uniqueness of statement IDs within an entity.
Aug 5 2024, 9:15 AM · Wikidata, Wikidata-Query-Service
dcausse updated the task description for T371464: Investigation: uniqueness of statement IDs within an entity.
Aug 5 2024, 9:14 AM · Wikidata-Query-Service, Wikidata, Wikidata Dev Team
dcausse updated the task description for T370754: Import WDQS subgraphs to production nodes.
Aug 5 2024, 8:48 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Discovery-Search (Current work), Wikidata
dcausse updated the task description for T370754: Import WDQS subgraphs to production nodes.
Aug 5 2024, 8:47 AM · Data-Platform-SRE (2024.09.06 - 2024.09.27), Discovery-Search (Current work), Wikidata
dcausse created T371786: Duplicated statement ids in some wikidata entities.
Aug 5 2024, 8:34 AM · Wikidata, Wikidata-Query-Service
dcausse added a comment to T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities.

I doubt the same situation is possible on Wikidata, since we disallow moving items, properties or lexemes, and the moving seems to be a crucial part of how the history was split here.

It seems it is, the same problem was reported on Wikidata a few days ago - https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem#Non-unique_statement_id_in_Q85046372

Aug 5 2024, 8:17 AM · Wikidata, Wikidata-Query-Service, WikibaseMediaInfo, Structured-Data-Backlog
dcausse merged T317530: MediaInfo does seem to allow entities to share same statement IDs into T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities.
Aug 5 2024, 7:05 AM · Wikidata, Wikidata-Query-Service, WikibaseMediaInfo, Structured-Data-Backlog
dcausse merged task T317530: MediaInfo does seem to allow entities to share same statement IDs into T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities.
Aug 5 2024, 7:03 AM · Wikidata-Query-Service, Wikidata, Structured-Data-Backlog, Commons, SDC General
dcausse added a comment to T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities.

Is this the same as T317530?

Aug 5 2024, 7:02 AM · Wikidata, Wikidata-Query-Service, WikibaseMediaInfo, Structured-Data-Backlog

Jul 31 2024

dcausse merged T371459: DispatchChangeVisibilityNotificationJobTest failures with wmf-quibble-vendor-mysql-php74 into T371460: Build failures from Wikibase\Repo\Tests\ChangeModification\DispatchChangeVisibilityNotificationJobTest::testHandle and Wikibase\Lib\Tests\Store\Sql\SqlChangeStoreTest::testSaveChange_insert.
Jul 31 2024, 9:06 AM · MediaModeration, MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), Wikidata Dev Team (Wikidata.org Slice), MediaWiki-extensions-WikibaseRepository, Wikidata, ci-test-error (WMF-deployed Build Failure)
dcausse added a project to T371460: Build failures from Wikibase\Repo\Tests\ChangeModification\DispatchChangeVisibilityNotificationJobTest::testHandle and Wikibase\Lib\Tests\Store\Sql\SqlChangeStoreTest::testSaveChange_insert: MediaWiki-extensions-WikibaseRepository.
Jul 31 2024, 9:05 AM · MediaModeration, MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), Wikidata Dev Team (Wikidata.org Slice), MediaWiki-extensions-WikibaseRepository, Wikidata, ci-test-error (WMF-deployed Build Failure)
dcausse merged task T371459: DispatchChangeVisibilityNotificationJobTest failures with wmf-quibble-vendor-mysql-php74 into T371460: Build failures from Wikibase\Repo\Tests\ChangeModification\DispatchChangeVisibilityNotificationJobTest::testHandle and Wikibase\Lib\Tests\Store\Sql\SqlChangeStoreTest::testSaveChange_insert.
Jul 31 2024, 9:05 AM · Wikidata, MediaWiki-extensions-WikibaseRepository, ci-test-error
dcausse created T371459: DispatchChangeVisibilityNotificationJobTest failures with wmf-quibble-vendor-mysql-php74.
Jul 31 2024, 9:01 AM · Wikidata, MediaWiki-extensions-WikibaseRepository, ci-test-error
dcausse moved T371129: Extension:CirrusSearch not propagating tracing headers from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Jul 31 2024, 7:53 AM · MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), Discovery-Search (Current work), Observability-Tracing, CirrusSearch