Mvolz (Marielle Volz)
User

User Details

User Since
Oct 15 2014, 9:50 PM (526 w, 3 d)
Availability
Available
IRC Nick
Mvolz
LDAP User
Mvolz
MediaWiki User
Mvolz (WMF) [ Global Accounts ]

mvolz@wikimedia.org

Recent Activity

Fri, Nov 15

Mvolz updated the task description for T364779: Migrate node-based services in production to node20.
Fri, Nov 15, 7:18 PM · Platform Engineering, Recommendation-API, Wikifeeds, Push-Notification-Service, Mobile-Content-Service, Maps (Kartotherian), EventStreams, Citoid, Proton, ChangeProp

Thu, Nov 14

Mvolz added a comment to T369547: Introduce a generic unsupported format error message within Citoid UI.

Thanks @EAkinloose, that's really helpful. It looks like the Citoid service might be returning 404 on beta cluster. I'm seeing a 404 error from the REST API:

https://en.wikipedia.beta.wmflabs.org/api/rest_v1/data/citation/mediawiki/https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2F9%2F98%2FColoring_page_for_Wikipedia_Day_2019_in_NYC.pdf?action=query&format=json

That means we never see a 415 error, and therefore never see the more specific error text. @Mvolz is this an expected issue on beta cluster?
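As a rough illustration of how the request URL quoted in this comment is put together, the sketch below percent-encodes the target URL into a single path segment. The prefix and the `mediawiki` format segment are copied from that URL; the function name `citation_url` is hypothetical, and this is not taken from the Citoid or REST API source.

```python
from urllib.parse import quote

# Prefix copied from the beta-cluster URL quoted in the comment (an example, not a spec).
REST_PREFIX = "https://en.wikipedia.beta.wmflabs.org/api/rest_v1/data/citation"


def citation_url(target: str, fmt: str = "mediawiki") -> str:
    """Return the REST URL for `target`, encoded as one path segment."""
    # safe="" makes quote() encode ':' and '/' as well, so the target URL
    # cannot be mistaken for additional path segments.
    return f"{REST_PREFIX}/{fmt}/{quote(target, safe='')}"


pdf = ("https://upload.wikimedia.org/wikipedia/commons/9/98/"
       "Coloring_page_for_Wikipedia_Day_2019_in_NYC.pdf")
print(citation_url(pdf))
```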

Thu, Nov 14, 9:36 PM · Editing QA, Goal, Design, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz added a comment to T346624: citoid service doesn't work on beta .

It's working now (though outdated.)

Thu, Nov 14, 9:33 PM · Citoid, Beta-Cluster-Infrastructure
Mvolz closed T346624: citoid service doesn't work on beta as Resolved.
Thu, Nov 14, 9:33 PM · Citoid, Beta-Cluster-Infrastructure
Mvolz added a comment to T371323: [SPIKE] Earn verification from Cloudfront (Amazon) for Citoid.

Example get response:

Thu, Nov 14, 12:41 PM · Citoid, Editing-team (Kanban Board)
Mvolz added a comment to T371323: [SPIKE] Earn verification from Cloudfront (Amazon) for Citoid.

WSJ.com is one of the publishers we can't cite because of CloudFront. I'm not sure if there are others.

Thu, Nov 14, 12:17 PM · Citoid, Editing-team (Kanban Board)

Wed, Nov 13

Mvolz added a comment to T369547: Introduce a generic unsupported format error message within Citoid UI.

Resolving as deployed (Went out to group 1 and 2 Tuesday.)

Wed, Nov 13, 5:48 PM · Editing QA, Goal, Design, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz closed T369547: Introduce a generic unsupported format error message within Citoid UI, a subtask of T364594: Revise Citoid error message to be more specific, as Resolved.
Wed, Nov 13, 5:47 PM · MW-1.44-notes (1.44.0-wmf.2; 2024-11-05), VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz closed T369547: Introduce a generic unsupported format error message within Citoid UI as Resolved.
Wed, Nov 13, 5:47 PM · Editing QA, Goal, Design, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz updated the task description for T374558: Migrate Zotero Server repo from gerrit to GitLab.
Wed, Nov 13, 5:38 PM · Citoid, GitLab (Pipeline Services Migration🐤), Editing-team

Tue, Nov 12

Mvolz added a comment to T370118: Register Citoid as a "friendly bot" (or alternatively verified bot) with Cloudflare.

Any news?

Tue, Nov 12, 4:30 PM · serviceops, Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Mon, Nov 11

Mvolz added a comment to T378461: Investigate Wikipedia bot/userscript usage of citoid impacting www.pro-football-reference.com .

I think both have been resolved/addressed. @Mvolz do you think this task can be resolved?

Mon, Nov 11, 1:56 PM · User-aborrero, cloud-services-team, Citoid

Sun, Nov 10

Mvolz added a comment to T314942: Pywikibot client to load ISBN related data into Wikidata.

Which graph would these data go into? The main graph, or the scholarly metadata one?

Sun, Nov 10, 11:58 AM · Patch-For-Review, WikiCite, Pywikibot, Pywikibot-Scripts, Wikimania-Hackathon-2022

Thu, Nov 7

Mvolz updated the task description for T349118: Migrate node-based services in production to node18.
Thu, Nov 7, 11:12 AM · Content-Transform-Team, Platform Engineering, Trust and Safety Product Team (Engineering), Patch-For-Review, Essential-Work, Page Content Service, MediaWiki-Engineering, [DEPRECATED] wdwb-tech, Wikidata, Citoid, Wikidata-Termbox, Wikimedia-Portals, Data-Engineering, serviceops

Wed, Nov 6

Mvolz renamed T362873: NPR blocking Citoid user-agent string, causing timeouts. from Citing NPR article with Citoid times out in production but not locally to NPR blocking Citoid user-agent string, causing timeouts. .
Wed, Nov 6, 11:42 AM · VisualEditor, VisualEditor-MediaWiki-References, Web2Cit, Citoid

Tue, Nov 5

Mvolz added a comment to T370702: Calculate rate at which URL requests fail and succeed.

@ppelberg we only keep the last 3 months of data. For the 6 months, should I download now, wait until we collect the next 3 months' worth of data, and compile it, or just do 3 months?

Tue, Nov 5, 11:31 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz updated subscribers of T362873: NPR blocking Citoid user-agent string, causing timeouts. .

Now that we have better tooling we can examine this in more depth: NPR requests are actually timing out and responding with 504. This would be slightly odd for a deliberate block, where you'd more likely get a 403 and a relatively fast response, whereas these requests are genuinely timing out.

Tue, Nov 5, 11:08 AM · VisualEditor, VisualEditor-MediaWiki-References, Web2Cit, Citoid
Mvolz added a subtask for T362379: Several major news websites (NYT, NPR, Reuters...) block citoid : T362873: NPR blocking Citoid user-agent string, causing timeouts. .
Tue, Nov 5, 11:05 AM · Patch-For-Review, Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz added a parent task for T362873: NPR blocking Citoid user-agent string, causing timeouts. : T362379: Several major news websites (NYT, NPR, Reuters...) block citoid .
Tue, Nov 5, 11:05 AM · VisualEditor, VisualEditor-MediaWiki-References, Web2Cit, Citoid
Mvolz renamed T362873: NPR blocking Citoid user-agent string, causing timeouts. from Citing NPR article with Citoid does not work to Citing NPR article with Citoid times out in production but not locally.
Tue, Nov 5, 11:02 AM · VisualEditor, VisualEditor-MediaWiki-References, Web2Cit, Citoid

Fri, Nov 1

Mvolz added a comment to T378686: refill: review citoid usage.

I see in the other thread you don't have Turnilo access, sorry. The user-agent is 'reFill/2 (http://en.wikipedia.org/wiki/User:Zhaofeng_Li/reFill)'

Fri, Nov 1, 10:40 AM · Tool-refill
Mvolz updated subscribers of T378686: refill: review citoid usage.

There is certainly nothing in the service log that indicates that reFill was called 30,000 times in any 12 hours.

Fri, Nov 1, 10:36 AM · Tool-refill
Mvolz updated the task description for T378461: Investigate Wikipedia bot/userscript usage of citoid impacting www.pro-football-reference.com .
Fri, Nov 1, 10:34 AM · User-aborrero, cloud-services-team, Citoid

Wed, Oct 30

Mvolz added a comment to T369547: Introduce a generic unsupported format error message within Citoid UI.

I find this message kind of confusing because the statements aren't explicitly connected.

"We couldn't make a citation for you because the resource is in a format we don't support" sounds clearer to me.

Great spot, @Mvolz . What do you think about the proposed version below?

Separately: what page do you think should be linked from the format we don't support portion of the message?

We couldn't create a citation for you.<br>
The resource is in a [[TBD | format we don't support]]. Create a citation manually below.
Wed, Oct 30, 4:28 PM · Editing QA, Goal, Design, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Tue, Oct 29

Mvolz added a comment to T374488: Reconsider Citoid and Zotero using different user agents.

Should we trial this for a month or something? I don't like it for the long term but maybe worth seeing if there's an impact?

Tue, Oct 29, 12:13 PM · Editing-team, Citoid
Mvolz updated the task description for T378461: Investigate Wikipedia bot/userscript usage of citoid impacting www.pro-football-reference.com .
Tue, Oct 29, 12:09 PM · User-aborrero, cloud-services-team, Citoid
Mvolz created T378461: Investigate Wikipedia bot/userscript usage of citoid impacting www.pro-football-reference.com .
Tue, Oct 29, 11:53 AM · User-aborrero, cloud-services-team, Citoid
Mvolz triaged T378460: Update Zotero translators as Medium priority.
Tue, Oct 29, 11:39 AM · Citoid
Mvolz created T378460: Update Zotero translators.
Tue, Oct 29, 11:35 AM · Citoid

Thu, Oct 24

Mvolz closed T377828: Create panel on citoid dashboard to identify domains which may block us, a subtask of T368802: Identify patterns in Citoid requests/traffic , as Declined.
Thu, Oct 24, 12:06 PM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz closed T377828: Create panel on citoid dashboard to identify domains which may block us as Declined.
Thu, Oct 24, 12:06 PM · observability, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Wed, Oct 23

Mvolz updated the task description for T364779: Migrate node-based services in production to node20.
Wed, Oct 23, 11:17 AM · Platform Engineering, Recommendation-API, Wikifeeds, Push-Notification-Service, Mobile-Content-Service, Maps (Kartotherian), EventStreams, Citoid, Proton, ChangeProp

Tue, Oct 22

Mvolz added a comment to T377828: Create panel on citoid dashboard to identify domains which may block us.

"Count Percentage" isn't a calculation that OpenSearch returns in its response. It's not even a percent of all hits returned by the query. Count percentage is calculated client-side by dividing each leaf aggregation doc_count by the sum of all branch root aggregations doc_count. This makes it heavily influenced by the size parameter.

I don't think it's possible in the current implementation. OpenSearch Dashboards would need a new feature to do it.
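To make the "Count Percentage" calculation described above concrete, here is a minimal sketch. The bucket layout (`key`, `doc_count`, `leaves`) is a simplified, hypothetical stand-in for an aggregation response, not the exact OpenSearch format, and `count_percentages` is an illustrative name.

```python
def count_percentages(root_buckets, size):
    """Percentage per (root, leaf) pair, computed the way the comment describes.

    Each leaf doc_count is divided by the sum of the doc_counts of the root
    buckets that survive the `size` cut, so the denominator depends on `size`.
    """
    kept = sorted(root_buckets, key=lambda b: b["doc_count"], reverse=True)[:size]
    total = sum(b["doc_count"] for b in kept)
    return {
        (b["key"], leaf["key"]): 100 * leaf["doc_count"] / total
        for b in kept
        for leaf in b["leaves"]
    }


# Hypothetical data: status codes as root buckets, hostnames as leaves.
buckets = [
    {"key": "404", "doc_count": 60,
     "leaves": [{"key": "a.example", "doc_count": 40},
                {"key": "b.example", "doc_count": 20}]},
    {"key": "403", "doc_count": 30,
     "leaves": [{"key": "a.example", "doc_count": 30}]},
    {"key": "504", "doc_count": 10,
     "leaves": [{"key": "c.example", "doc_count": 10}]},
]

wide = count_percentages(buckets, size=3)    # denominator: 100
narrow = count_percentages(buckets, size=1)  # denominator: 60
```

With the same underlying data, the ("404", "a.example") row reads 40% at `size=3` and about 66.7% at `size=1`, which is the sensitivity to `size` the comment points out.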

Tue, Oct 22, 2:51 PM · observability, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz renamed T377828: Create panel on citoid dashboard to identify domains which may block us from Create panel on citoid dashboard to identify domains which block us to Create panel on citoid dashboard to identify domains which may block us.
Tue, Oct 22, 11:35 AM · observability, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz added a comment to T377828: Create panel on citoid dashboard to identify domains which may block us.

So something like: https://logstash.wikimedia.org/app/visualize#/edit/a2d1a480-473d-11ef-8912-517dff44673b?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-1M,to:now))&_a=(filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'logstash-*',key:outgoingReqResult.hostname.keyword,negate:!t,params:(query:en.wikipedia.org),type:phrase),query:(match_phrase:(outgoingReqResult.hostname.keyword:en.wikipedia.org)))),linked:!f,query:(language:kuery,query:''),uiState:(),vis:(aggs:!((enabled:!t,id:'1',params:(),schema:metric,type:count),(enabled:!t,id:'3',params:(exclude:'200',field:outgoingReqResult.status.keyword,include:'',missingBucket:!f,missingBucketLabel:Missing,order:desc,orderBy:'1',otherBucket:!f,otherBucketLabel:Errors,size:5),schema:bucket,type:terms),(enabled:!t,id:'2',params:(field:outgoingReqResult.hostname.keyword,missingBucket:!f,missingBucketLabel:Missing,order:desc,orderBy:'1',otherBucket:!f,otherBucketLabel:Other,size:100),schema:bucket,type:terms)),params:(perPage:10,percentageCol:Count,showMetricsAtAllLevels:!f,showPartialRows:!f,showTotal:!f,totalFunc:sum),title:'Top%20errors%20by%20requested%20domain',type:table))

Tue, Oct 22, 11:35 AM · observability, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz created T377828: Create panel on citoid dashboard to identify domains which may block us.
Tue, Oct 22, 11:29 AM · observability, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Oct 17 2024

Mvolz added a comment to T349118: Migrate node-based services in production to node18.

What service are you using this for?

https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams_HTTP_Service
T361769: Migrate and re-deploy eventstreams using service-utils

Unfortunately we need [...] Open API spec generation.

OpenAPI spec generation sounds great. File a task...and help implement it!?

I don't believe service-runner or service-template-node ever supported this though, did it? FWIW, EventStreams also generates some spec, but it is manual, and not handled by a lib.

It was part of the list of things "not included" by service-utils so I assumed it was service-runner, but yeah now that I look that's in service-template-node, not service-runner: https://github.com/wikimedia/service-template-node/blob/main/lib/swagger-ui.js

Unfortunately we need [...] workers

Curious, why do you need workers? Because WMF services are deployed on k8s now, we decided to rely on k8s pod replicas and k8s routing, instead of node.js worker / clustering.

Oct 17 2024, 10:53 AM · Content-Transform-Team, Platform Engineering, Trust and Safety Product Team (Engineering), Patch-For-Review, Essential-Work, Page Content Service, MediaWiki-Engineering, [DEPRECATED] wdwb-tech, Wikidata, Citoid, Wikidata-Termbox, Wikimedia-Portals, Data-Engineering, serviceops

Oct 16 2024

Mvolz added a comment to T362379: Several major news websites (NYT, NPR, Reuters...) block citoid .

Another user report: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(miscellaneous)#Adding_Hindustan_Times_sources

Oct 16 2024, 10:23 AM · Patch-For-Review, Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Oct 9 2024

Mvolz added a comment to T349118: Migrate node-based services in production to node18.

FYI, service-utils (a replacement for service-runner) is nearing its first production deployment. In case you want to wait for it instead of solving service-runner problems :) cc @tchin

Oct 9 2024, 12:56 PM · Content-Transform-Team, Platform Engineering, Trust and Safety Product Team (Engineering), Patch-For-Review, Essential-Work, Page Content Service, MediaWiki-Engineering, [DEPRECATED] wdwb-tech, Wikidata, Citoid, Wikidata-Termbox, Wikimedia-Portals, Data-Engineering, serviceops
Mvolz added a comment to T374274: Make automatic citation tool use {{Cite arxiv}} instead of {{Cite journal}} for arxiv..

I think your best bet to accomplish something like this is a userscript/gadget.

Oct 9 2024, 12:38 PM · Citoid

Oct 8 2024

Mvolz updated subscribers of T363292: Usage statistics for automatic reference generation with Citoid.

Quoting from T372438 thread:

Looking at how it was done...

automatic-generate-fail-searchResults: The request made it all the way through the reliability check, tried to build the templates from the citoid results, and failed at that point (probably because there were no results, but could also include any other errors in building the templates)
automatic-generate-fail-network: The promise from the mw.Api request was rejected - without any testing for further details, so I think this is a consolidation of any possible API errors + any network errors that interrupted the request

I think these events are misleadingly labelled, then.

automatic-generate-fail-network is more like no search results from the api. When we don't have results, it returns a 404 (or a 415 if we don't have results because it's a pdf). This likely is the majority of cases and the numbers roughly match the api numbers. Only rarely would it be some sort of connection issue.

You don't get to network success unless there are search results. So the automatic-generate-fail-searchResults thing means there were definitely search results from the API. However, in some rare cases even if there are results from the API, we fail to build a template. One reason might be if the template listed for it doesn't have any template data. Then we can't build the template, so it fails.
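The reading of the two event names given above can be summarised in a small sketch. This is an interpretation of the comment, not the extension's actual code; the function name and the `"success"` label are hypothetical.

```python
def classify_attempt(api_status: int, template_built: bool) -> str:
    """Map one citation attempt to the event name it would be counted under."""
    if api_status != 200:
        # Per the comment: a 404 (no results) or 415 (e.g. a PDF) from the API
        # rejects the request, so it is counted as a "network" failure.
        return "automatic-generate-fail-network"
    if not template_built:
        # The API returned results, but no template could be built from them,
        # e.g. because the target template has no TemplateData.
        return "automatic-generate-fail-searchResults"
    return "success"  # hypothetical label for the non-failure case


for status, built in [(404, False), (415, False), (200, False), (200, True)]:
    print(status, built, classify_attempt(status, built))
```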

Oct 8 2024, 10:13 AM · MW-1.44-notes (1.44.0-wmf.3; 2024-11-12), WMDE-TechWish-Sprint-2024-10-16, WMDE-TechWish-Sprint-2024-10-02, WMDE-TechWish-Sprint-2024-05-29, MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), WMDE-TechWish-Sprint-2024-05-08, WMDE-TechWish-Sprint-2024-04-24, WMDE-References-FocusArea

Oct 7 2024

Mvolz added a comment to T372438: Reassess status of major sites that have been blocking Citoid.

Which is to say: if citoid is returning an API error when we encounter a non-200 status code, all of that should get swept into the network category.

Oct 7 2024, 11:01 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Oct 1 2024

Mvolz added a comment to T372438: Reassess status of major sites that have been blocking Citoid.
  1. How do the following metrics compare in the two weeks before and after 3 August 2024?
Oct 1 2024, 11:02 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz updated the task description for T372438: Reassess status of major sites that have been blocking Citoid.
Oct 1 2024, 10:18 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz updated the task description for T372438: Reassess status of major sites that have been blocking Citoid.
Oct 1 2024, 10:02 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz added a comment to T372438: Reassess status of major sites that have been blocking Citoid.

403 responses:

Oct 1 2024, 10:01 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz added a comment to T372438: Reassess status of major sites that have been blocking Citoid.

Of the times when people open Citoid and do NOT follow through to insert a reference, what reasons explain why this might be the case?

Overall proportion of failed citoid editing sessions by reason:

| Reason | July 2024 | August 2024 |
| Automatic citation generation failed | 19.1% | 19.3% |
| * Network error while attempting to retrieve the necessary information | 18.4% | 18.7% |
| * No search results were found | 0.7% | 0.6% |
| Session aborted | 80.9% | 80.7% |

As found in T368988#10123059, about 19% of all citoid editing sessions that failed to insert a reference were due to an automatic citation generation failure. Of these sessions, the majority of citation generation failures were due to a network error while retrieving citation info. Network errors account for 18.7% of all citoid sessions that failed and 97% of all citoid sessions that failed due to an automatic citation generation failure.
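The 97% figure follows from the two August percentages quoted in this comment. A short worked check, using only those numbers (the helper name `share_within` is illustrative):

```python
def share_within(part_pct: float, whole_pct: float) -> float:
    """Express `part_pct` as a percentage of `whole_pct`."""
    return 100 * part_pct / whole_pct


# August 2024: network errors (18.7% of failed sessions) as a share of all
# automatic citation generation failures (19.3% of failed sessions).
august = share_within(18.7, 19.3)
print(round(august))  # 97
```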

There were no observed changes in failure rates before or after the decrease in Citoid traffic on August 3rd.

Per platform proportion of failed citoid editing sessions by reason:
Per platform results are very similar to the overall results. The majority of citation generation failures on both desktop and mobile were due to a network error. 97% of all citation generation failures on desktop were due to a network error and 95% of all citation generation failures on mobile were due to a network error.

Time Series Chart
I further investigated the daily number of Citoid editing events and confirmed there were no sudden changes in the number of automatic citation generation failure events right around August 3rd. The number of daily citoid sessions that failed to insert a reference due to a network failure to retrieve results has remained stable before and after the change in Citoid traffic.

citoid_failure_events_daily.jpg (540×960 px, 38 KB)

From this, it still appears that the 3 August 2024 change had no impact on overall user success or failure rates using Citoid on desktop or mobile.

cc @ppelberg

Oct 1 2024, 9:47 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Sep 26 2024

Mvolz added a comment to T361728: SwaggerProbeHasFailures for citoid (due to Zotero failures) after upgrading to node 18.

This was mostly specifically related to our probe, i.e. only requests to wikipedia.org were timing out; @akosiaris determined this was due to it only being available over IPv4 and node 18 preferring IPv6.
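To illustrate why address-family preference matters here, the sketch below reorders resolver results so that IPv4 entries are tried first. It is a Python illustration of the general idea only; it does not reproduce the actual fix applied to the service, and the addresses are documentation-range placeholders.

```python
import socket


def ipv4_first(addrinfos):
    """Return getaddrinfo-style tuples with IPv4 entries ahead of the rest.

    sorted() is stable, so the resolver's order is kept within each family.
    """
    return sorted(addrinfos, key=lambda info: info[0] != socket.AF_INET)


# Hypothetical resolver output: (family, type, proto, canonname, sockaddr).
resolved = [
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:db8::1", 443, 0, 0)),
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.1", 443)),
]
ordered = ipv4_first(resolved)
```

A client that connects to the first entry would then reach the IPv4 address rather than waiting on an IPv6 route that is unavailable.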

Sep 26 2024, 12:20 PM · serviceops-radar, Citoid
Mvolz updated the task description for T349118: Migrate node-based services in production to node18.
Sep 26 2024, 12:14 PM · Content-Transform-Team, Platform Engineering, Trust and Safety Product Team (Engineering), Patch-For-Review, Essential-Work, Page Content Service, MediaWiki-Engineering, [DEPRECATED] wdwb-tech, Wikidata, Citoid, Wikidata-Termbox, Wikimedia-Portals, Data-Engineering, serviceops

Sep 24 2024

Mvolz added a comment to T374831: Start adding on-wiki mappings for archive_url and archive_date.

For Tech News:

[technical] "When Citoid generates a reference based on an archive.org URL, we currently have no way of populating the archive-url and archive-date parameters in the citation template. The archive.org URL is considered the "original" URL, not as an archive link. The Editing Team is working on fixing this. We are asking communities to preemptively add mappings for these parameters to the citoid map within the TemplateData for each citation template."

Sep 24 2024, 9:40 AM · User-notice, Editing-team (Kanban Board), Internet-Archive, VisualEditor, Citoid

Sep 19 2024

Mvolz added a comment to T374831: Start adding on-wiki mappings for archive_url and archive_date.

Though we might get complaints, e.g. if we add a link when we're blocked, we're de facto marking nytimes links as dead, because that's the default if we don't include the parameter... but I don't see an easy way around this anyway, because if we're blocked we don't know for sure by code (we could assume 415 is blocked and 404 is not available, but so many different codes are used here).

Sep 19 2024, 11:35 AM · User-notice, Editing-team (Kanban Board), Internet-Archive, VisualEditor, Citoid
Mvolz updated the task description for T374831: Start adding on-wiki mappings for archive_url and archive_date.
Sep 19 2024, 11:31 AM · User-notice, Editing-team (Kanban Board), Internet-Archive, VisualEditor, Citoid
Mvolz added a comment to T374831: Start adding on-wiki mappings for archive_url and archive_date.

@Mvolz I'm not sure how urlStatus would work given the value type is different per wiki:

  • on en.wiki it is a one of various strings (dead/live/deviated/unfit...)
  • on fr.wiki it is a boolean/datetime (brisé le broken since = 4 juin 2018 or oui).
  • on es.wiki it is a boolean (urlmuerta = [|no])
Sep 19 2024, 11:30 AM · User-notice, Editing-team (Kanban Board), Internet-Archive, VisualEditor, Citoid
Mvolz updated subscribers of T344736: Migrate Citoid/Zotero Pipeline Repos to GitLab.

@dchan @zoe before we actually start using the zotero repo in gitlab we need to set up ci to build the docker image. @dchan would this be something you could do?

Sep 19 2024, 10:23 AM · GitLab (Pipeline Services Migration🐤), Editing-team
Mvolz updated the task description for T344736: Migrate Citoid/Zotero Pipeline Repos to GitLab.
Sep 19 2024, 10:20 AM · GitLab (Pipeline Services Migration🐤), Editing-team
Mvolz updated the task description for T344736: Migrate Citoid/Zotero Pipeline Repos to GitLab.
Sep 19 2024, 10:20 AM · GitLab (Pipeline Services Migration🐤), Editing-team
Mvolz updated subscribers of T344736: Migrate Citoid/Zotero Pipeline Repos to GitLab.
Sep 19 2024, 10:18 AM · GitLab (Pipeline Services Migration🐤), Editing-team

Aug 27 2024

Mvolz added a comment to T370809: Add response header logging to citoid and whitelist headers that indicate anti-bot challenges.

I was thinking of having an allow or denylist to only log certain headers, and possibly simply logging if cf-mitigated is present and what its value is. That way, when we're looking at metrics we can start to segment by whether we hit a web application firewall or are having a site-specific issue.
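A minimal sketch of the allowlist idea, assuming a plain dictionary of response headers. Apart from `cf-mitigated`, which the comment names, the header names in the allowlist and the function names are hypothetical, and this is not the Citoid patch itself.

```python
import json

# Hypothetical allowlist; only cf-mitigated comes from the discussion above.
LOGGED_HEADERS = frozenset({"cf-mitigated", "server", "content-type"})


def headers_to_log(headers, allowlist=LOGGED_HEADERS):
    """Serialise only the allowlisted headers, with lower-cased names."""
    picked = {k.lower(): v for k, v in headers.items() if k.lower() in allowlist}
    return json.dumps(picked, sort_keys=True)


def hit_waf(headers):
    """True when a cf-mitigated header is present, whatever its value."""
    return any(name.lower() == "cf-mitigated" for name in headers)


# Example response headers (values are illustrative).
response_headers = {"CF-Mitigated": "challenge", "Set-Cookie": "session=abc"}
print(headers_to_log(response_headers))
```

Logging a boolean such as `hit_waf` alongside the filtered headers would allow metrics to be segmented by whether a web application firewall was involved, while keeping cookies and other sensitive headers out of the logs.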

Aug 27 2024, 4:47 PM · Patch-For-Review, Editing-team (Kanban Board), Citoid
Mvolz added a comment to T370809: Add response header logging to citoid and whitelist headers that indicate anti-bot challenges.

No, we still need to actually log these. I wasn't sure who was working on this but I just started a PR... right now I'm just JSON.stringifying the headers, but were there particular headers we wanted? Or do we just stringify the whole thing?

Aug 27 2024, 1:07 PM · Patch-For-Review, Editing-team (Kanban Board), Citoid

Aug 21 2024

Mvolz updated the task description for T349118: Migrate node-based services in production to node18.
Aug 21 2024, 9:39 AM · Content-Transform-Team, Platform Engineering, Trust and Safety Product Team (Engineering), Patch-For-Review, Essential-Work, Page Content Service, MediaWiki-Engineering, [DEPRECATED] wdwb-tech, Wikidata, Citoid, Wikidata-Termbox, Wikimedia-Portals, Data-Engineering, serviceops
Mvolz added a comment to T349118: Migrate node-based services in production to node18.

FYI Citoid upgrade is blocked by dependency on service-runner: https://github.com/wikimedia/service-runner/pull/251

Aug 21 2024, 9:39 AM · Content-Transform-Team, Platform Engineering, Trust and Safety Product Team (Engineering), Patch-For-Review, Essential-Work, Page Content Service, MediaWiki-Engineering, [DEPRECATED] wdwb-tech, Wikidata, Citoid, Wikidata-Termbox, Wikimedia-Portals, Data-Engineering, serviceops
Mvolz added a subtask for T349118: Migrate node-based services in production to node18: T361728: SwaggerProbeHasFailures for citoid (due to Zotero failures) after upgrading to node 18.
Aug 21 2024, 9:34 AM · Content-Transform-Team, Platform Engineering, Trust and Safety Product Team (Engineering), Patch-For-Review, Essential-Work, Page Content Service, MediaWiki-Engineering, [DEPRECATED] wdwb-tech, Wikidata, Citoid, Wikidata-Termbox, Wikimedia-Portals, Data-Engineering, serviceops
Mvolz added a parent task for T361728: SwaggerProbeHasFailures for citoid (due to Zotero failures) after upgrading to node 18: T349118: Migrate node-based services in production to node18.
Aug 21 2024, 9:34 AM · serviceops-radar, Citoid

Aug 14 2024

Mvolz added a comment to T370702: Calculate rate at which URL requests fail and succeed.

I've added some stats here.

Wonderful.

I'm looking into getting some longer term data from metrics.

Excellent.

To get the above stats, I've added some new panels to the dashboard:

"Outgoing requests by status" in the lower right hand corner reports the percentage which are 200, so this is our "success" rate, and the failure rate is just 100% minus that.

outgoing.png (433×950 px, 32 KB)

Unfortunately this counts some pdfs as "succeeding", because some pdfs return a 200 to us, so for the above data I've filtered out the pdfs in a hacky way by excluding *.pdf on the request url.
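The success-rate calculation described in this comment can be sketched as follows, assuming each outgoing request is available as a (url, status) pair. The sample data and the function name are hypothetical; the dashboard itself does this with a filter rather than code.

```python
def success_rate(requests, exclude_suffix=".pdf"):
    """Percentage of outgoing requests answered with 200, skipping excluded URLs.

    Returns None when nothing is left after filtering.
    """
    kept = [
        (url, status)
        for url, status in requests
        if not url.lower().endswith(exclude_suffix)
    ]
    if not kept:
        return None
    ok = sum(1 for _, status in kept if status == 200)
    return 100 * ok / len(kept)


# Hypothetical sample of outgoing requests.
sample = [
    ("https://a.example/article", 200),
    ("https://a.example/blocked", 403),
    ("https://a.example/paper.pdf", 200),  # excluded: a PDF that "succeeds"
    ("https://b.example/story", 200),
]
rate = success_rate(sample)
failure_rate = 100 - rate
```

Matching on the URL suffix is as approximate here as it is on the dashboard: a PDF served from a URL that does not end in .pdf would still be counted.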

@mvols: 2 questions in response:

  1. What field did you filter out *.pdf from? [i]
  2. With .pdf approximately excluded, are there any other media types that we ought to try to exclude such that the metrics Outgoing requests by status return are specific to URLs? [ii]

And hey! Thank you for sharing how you arrived at these metrics. You doing so helps equip me with the know-how I need to use this dashboard to ask and answer questions of Citoid independently.


i. This is the interface I'm assuming you used to create the filter you described above:

image.png (764×1 px, 109 KB)

ii. One thought: maybe the effort required to do this is not worthwhile considering how minimal we assume the traffic to be for formats like .mov?

Aug 14 2024, 11:50 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz added a comment to T369547: Introduce a generic unsupported format error message within Citoid UI.

I find this message kind of confusing because the statements aren't explicitly connected.

Aug 14 2024, 11:34 AM · Editing QA, Goal, Design, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Aug 6 2024

Mvolz added a comment to T370702: Calculate rate at which URL requests fail and succeed.

I've added some stats here.

Aug 6 2024, 1:28 PM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Aug 5 2024

Mvolz updated the task description for T370702: Calculate rate at which URL requests fail and succeed.
Aug 5 2024, 8:52 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz updated the task description for T370702: Calculate rate at which URL requests fail and succeed.
Aug 5 2024, 8:15 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz updated the task description for T370702: Calculate rate at which URL requests fail and succeed.
Aug 5 2024, 7:54 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz updated the task description for T370702: Calculate rate at which URL requests fail and succeed.
Aug 5 2024, 7:53 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz updated the task description for T370702: Calculate rate at which URL requests fail and succeed.
Aug 5 2024, 7:52 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Aug 1 2024

Mvolz claimed T370702: Calculate rate at which URL requests fail and succeed.
Aug 1 2024, 10:30 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Jul 28 2024

Mvolz updated subscribers of T370702: Calculate rate at which URL requests fail and succeed.
Jul 28 2024, 7:38 AM · Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz added a comment to T370809: Add response header logging to citoid and whitelist headers that indicate anti-bot challenges.

We can start logging response headers after https://gerrit.wikimedia.org/r/c/mediawiki/services/citoid/+/1056571 is merged, as this change makes the get response (and consequently the headers) available. (currently still wip... hopefully almost done though!)

Jul 28 2024, 7:37 AM · Patch-For-Review, Editing-team (Kanban Board), Citoid

Jul 21 2024

Mvolz added a comment to T95388: Try to find link in archive.org when direct scraping fails.

I've implemented this.

Jul 21 2024, 7:48 AM · Editing-team (Kanban Board), Patch-For-Review, Internet-Archive, VisualEditor, Citoid
Mvolz added a comment to T115224: On URL submission, look up the archived page in the Internet Archive's index and add to the return data.

I would like to emphasize that the availability API has seen improvements since 2017. Is it possible you can redo your investigation a bit?

Jul 21 2024, 7:45 AM · Patch-Needs-Improvement, Internet-Archive, VisualEditor-MediaWiki-References, VisualEditor, Citoid

Jul 17 2024

Mvolz added a comment to T370263: Review the entire flow of interaction between VisualEditor Citoid extension, the Citoid service and Zotero.

Something to note: preq library will automatically retry requests in certain situations so potentially every preq request is two.
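To show how an automatic retry can double outgoing traffic, here is a generic sketch. It is not preq's implementation: the retry condition (status 500 or above) and the single retry are assumptions chosen for illustration, and `send` is a hypothetical callable returning a status code.

```python
def send_with_retry(send, retries=1, should_retry=lambda status: status >= 500):
    """Call `send` until it returns a non-retryable status or retries run out.

    Returns (final_status, attempts) so the number of real requests is visible.
    """
    attempts = 0
    status = None
    while attempts <= retries:
        status = send()
        attempts += 1
        if not should_retry(status):
            break
    return status, attempts


print(send_with_retry(lambda: 200))  # one request is enough
print(send_with_retry(lambda: 504))  # the failing request is sent twice
```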

Jul 17 2024, 5:41 PM · Citoid

Jul 13 2024

Mvolz added a comment to T369928: Add caching of citoid results in the extension.

Is this for purely client-side cache (in-memory, just affecting people who repeatedly click "generate" in a single session)? If so it seems harmless, but I can't imagine it'd have much impact.
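A minimal sketch of the purely in-memory, per-session cache the question describes. The class name and the `fetch` callable are hypothetical; the real extension is JavaScript, so this only illustrates the behaviour being asked about.

```python
class CitationCache:
    """Remember results per URL for the lifetime of one editing session."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that performs the real lookup
        self._results = {}    # url -> cached result

    def get(self, url):
        # Only the first request for a given URL reaches the service.
        if url not in self._results:
            self._results[url] = self._fetch(url)
        return self._results[url]


calls = []
cache = CitationCache(lambda url: calls.append(url) or {"url": url})
first = cache.get("https://example.org/a")
second = cache.get("https://example.org/a")  # served from memory
```

As the comment suggests, this only helps someone who generates a citation for the same URL more than once in a session, so the effect on overall traffic is likely small.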

Jul 13 2024, 4:41 PM · Editing-team (Kanban Board), VisualEditor, Citoid

Jul 12 2024

Mvolz created T369928: Add caching of citoid results in the extension.
Jul 12 2024, 6:47 PM · Editing-team (Kanban Board), VisualEditor, Citoid

Jul 10 2024

Mvolz added a comment to T365583: Return 415 Media Type not Supported errors for pdfs and other types of unsupported formats in the citoid back end. .
In T365583#9946857, @Mvolz wrote...

@Mvolz, three questions in response to this update...

1) What work would be involved to, as you described, "...detect content-type in citoid first."?

A patch in the backend, but I would do this after 2) is resolved.

2) How – if at all – does allowing, "...restbase/hyperswitch to correctly pass through the error itself." impact our ability to detect the content-type that caused the error to be activated and subsequently, offer people feedback specific to the content-type they're trying to cite?

Jul 10 2024, 5:38 AM · Platform Engineering, VisualEditor, Editing-team (Kanban Board), VisualEditor-MediaWiki-References, Citoid

Jul 9 2024

Mvolz added a comment to T369547: Introduce a generic unsupported format error message within Citoid UI.

We can do this now, but without naming the content type: we return 415, but the response does not say which content type was rejected.
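A minimal sketch of what a 415 response that names the rejected type could look like. The supported-type list, the function names, and the error body shape (including the `contentType` key) are assumptions for illustration, not Citoid's actual behaviour.

```python
# Hypothetical list of media types the scraper can handle.
SUPPORTED_TYPES = ("text/html", "application/xhtml+xml")


def media_type(content_type_header):
    """Strip parameters such as charset from a Content-Type header value."""
    return content_type_header.split(";", 1)[0].strip().lower()


def check_content_type(content_type_header):
    """Return None if the type is acceptable, else a 415 error body
    that includes the rejected media type."""
    if content_type_header is None:
        return None  # no header: let the scraper try anyway
    rejected = media_type(content_type_header)
    if rejected in SUPPORTED_TYPES:
        return None
    return {
        "status": 415,
        "error": "Unsupported media type",
        "contentType": rejected,
    }
```

With the rejected type in the body, a UI could choose between a generic unsupported-format message and a more specific one, for example for PDFs.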

Jul 9 2024, 1:59 PM · Editing QA, Goal, Design, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz added a comment to T364594: Revise Citoid error message to be more specific.

Can we create a link to the "Manual tab"? If so, the message should invite users to fill in a template.

Jul 9 2024, 1:59 PM · MW-1.44-notes (1.44.0-wmf.2; 2024-11-05), VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid

Jul 7 2024

Mvolz updated subscribers of T368980: Enable people to generate and save citations to a personal sandbox.

I think @Esanders mentioned earlier in that task that we cannot check a user's browser for caches of other sites they've visited, as it violates our privacy policy.

Jul 7 2024, 2:28 PM · VisualEditor, Citoid

Jul 5 2024

Mvolz added a comment to T367452: Reduce Citoid HTTP request volume by using HTTP HEAD instead of HTTP GET.

I had a look at our error percentage from when we deployed this and it doesn't seem to have done much for citation success rate in any particular direction. (New panel alert!)

Jul 5 2024, 11:26 AM · Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz added a comment to T367452: Reduce Citoid HTTP request volume by using HTTP HEAD instead of HTTP GET.

This caused T368971.

Jul 5 2024, 10:25 AM · Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz renamed T368971: https://www.clydeships.co.uk URL gets replaced by google.com.hk URL due to redirect given in response to HEAD requests from https://www.clydeships.co.uk URL gets replaced by google.com.hk URL to https://www.clydeships.co.uk URL gets replaced by google.com.hk URL due to redirect given in response to HEAD requests.
Jul 5 2024, 10:22 AM · Citoid
Mvolz added a comment to T368971: https://www.clydeships.co.uk URL gets replaced by google.com.hk URL due to redirect given in response to HEAD requests.

I've looked into this more, and it seems to be a direct consequence of switching from GET requests to HEAD requests in order to follow redirects here: T367452

Jul 5 2024, 10:22 AM · Citoid

Jul 4 2024

Mvolz added a comment to T368971: https://www.clydeships.co.uk URL gets replaced by google.com.hk URL due to redirect given in response to HEAD requests.

After a bit of poking around, it appears the underlying issue is with the website, which (depending on user-agent) will send a redirect to google.com.hk. It has the same behavior in curl:

$ curl https://www.clydeships.co.uk/view.php?ref=10727
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://www.google.com.hk//?ref=10727">here</a>.</p>
</body></html>

I'm not sure exactly what user-agents cause it to have issues, though. Zotero can fetch the website just fine, and the text "Custom user agent" causes it to send back the actual page content. Maybe Citoid is explicitly deny-listed?
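One possible guard, sketched under assumptions: before replacing the submitted URL with a redirect target, check whether the redirect leaves the original host. The function names and the policy of treating cross-host redirects as suspect are illustrative, not how Citoid currently behaves.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def redirect_target(request_url, status, headers):
    """Return the absolute redirect target, or None if not a redirect."""
    if status not in REDIRECT_STATUSES:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    if not location:
        return None
    # Location may be relative; resolve it against the request URL.
    return urljoin(request_url, location)


def is_cross_host_redirect(request_url, status, headers):
    """True if the response redirects to a different hostname."""
    target = redirect_target(request_url, status, headers)
    if target is None:
        return False
    return urlsplit(target).hostname != urlsplit(request_url).hostname
```

Applied to the response above, the redirect to www.google.com.hk would be flagged as cross-host, so the caller could keep the original URL or retry with a different request method instead of following it.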

Jul 4 2024, 9:57 PM · Citoid

Jul 2 2024

Mvolz added a comment to T365583: Return 415 Media Type not Supported errors for pdfs and other types of unsupported formats in the citoid back end. .

I'm not sure how long it will take to fix the hyperswitch issue. We might need to switch from RESTBase to the API gateway first.

Jul 2 2024, 7:32 PM · Platform Engineering, VisualEditor, Editing-team (Kanban Board), VisualEditor-MediaWiki-References, Citoid
Mvolz added a comment to T366093: Change Citoid user agent to use same pattern as Zotero.

No QA needed.

Jul 2 2024, 7:29 PM · Citoid, Editing-team (Kanban Board)
Mvolz closed T366093: Change Citoid user agent to use same pattern as Zotero, a subtask of T362379: Several major news websites (NYT, NPR, Reuters...) block citoid , as Resolved.
Jul 2 2024, 7:28 PM · Patch-For-Review, Goal, VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid
Mvolz closed T366093: Change Citoid user agent to use same pattern as Zotero as Resolved.
Jul 2 2024, 7:28 PM · Citoid, Editing-team (Kanban Board)

Jul 1 2024

Mvolz added a comment to T367870: Index OutgoingReqResult fields .

Okay, so if I redeploy right before midnight, we should all be good?

That can work, but note that if you do change the field type, you will lose the ability to query or filter that field while the field exists as multiple types. This conflict could take 90 days to resolve.

Jul 1 2024, 10:31 AM · Observability-Logging, VisualEditor, Editing-team (Kanban Board), VisualEditor-MediaWiki-References, Citoid

Jun 28 2024

Mvolz added a comment to T367870: Index OutgoingReqResult fields .

This week I tried to deploy a change that would have changed outgoingReqResult.error from a JSON obj to a string.

A change like that would cause OpenSearch to throw:

mapper_parsing_exception: object mapping for [error] tried to parse field [error] as object, but found a concrete value

All logs that did not match the previous schema would be dropped until index rollover (midnight each day).
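One way to avoid this kind of mapping conflict is to keep the field's JSON type stable at the logging call site. A minimal sketch, assuming the field stays an object; the function name and the `message` key are hypothetical, not the actual outgoingReqResult schema.

```python
def normalize_error(error):
    """Always emit `error` as an object (or None), never a bare string,
    so the indexed field keeps a single type."""
    if error is None:
        return None
    if isinstance(error, dict):
        return error
    # Wrap strings, exceptions, and other concrete values in an object.
    return {"message": str(error)}
```

For example, `normalize_error("timeout")` yields `{"message": "timeout"}`, which still parses as an object under the existing mapping.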

Jun 28 2024, 8:17 PM · Observability-Logging, VisualEditor, Editing-team (Kanban Board), VisualEditor-MediaWiki-References, Citoid
Mvolz added a comment to T367870: Index OutgoingReqResult fields .

Indexed fields list is refreshed and outgoingReqResult.* is now searchable and aggregatable.

Please reach out regarding ECS. We're happy to help folks navigate the migration. :)

Jun 28 2024, 5:23 PM · Observability-Logging, VisualEditor, Editing-team (Kanban Board), VisualEditor-MediaWiki-References, Citoid
Mvolz added a comment to T361728: SwaggerProbeHasFailures for citoid (due to Zotero failures) after upgrading to node 18.

For zotero I'm afraid we can't add additional logging. AIUI the software does not have any proper logging options. But I think it might also be more valuable to check on the citoid side, as that is clearly where things change with the update.
From what I can see in the logs from citoid and the tls terminator of zotero, these are the requests failing: https://logstash.wikimedia.org/goto/208ad22ca41ab54070c889a59297b369
I would suggest checking which requests are actually sent to zotero and then seeing how they differ between nodejs versions.

Jun 28 2024, 2:16 PM · serviceops-radar, Citoid

Jun 26 2024

Mvolz renamed T361728: SwaggerProbeHasFailures for citoid (due to Zotero failures) after upgrading to node 18 from SwaggerProbeHasFailures for citoid (due to Zotero failures) since last deployment to SwaggerProbeHasFailures for citoid (due to Zotero failures) after upgrading to node 18.
Jun 26 2024, 11:36 AM · serviceops-radar, Citoid
Mvolz renamed T361728: SwaggerProbeHasFailures for citoid (due to Zotero failures) after upgrading to node 18 from SwaggerProbeHasFailures for citoid since last deployment to SwaggerProbeHasFailures for citoid (due to Zotero failures) since last deployment.
Jun 26 2024, 11:34 AM · serviceops-radar, Citoid
Mvolz updated subscribers of T361728: SwaggerProbeHasFailures for citoid (due to Zotero failures) after upgrading to node 18.

Tried redeploy on just codfw with the upgraded chart, it did Not Go Well.

Jun 26 2024, 11:33 AM · serviceops-radar, Citoid

Jun 24 2024

Mvolz added a comment to T367194: Citoid/Zotero: Create rate limiting configurable on a per site basis.
Jun 24 2024, 1:39 PM · VisualEditor-MediaWiki-References, Editing-team (Kanban Board), VisualEditor, Citoid