Steps to replicate the issue (include links if applicable):
- Searching Commons for the full book title "ꦥꦼꦥꦼꦛꦶꦏ꧀ꦏꦤ꧀ꦱꦏꦶꦁꦥꦿꦗꦚ꧀ꦗꦶꦪꦤ꧀ꦭꦩꦶ" works and finds the PDF.
- Searching for a partial title, such as "ꦥꦼꦥꦼꦛꦶꦏ꧀ꦏꦤ꧀ꦱꦏꦶꦁꦥꦿꦗꦚ꧀ꦗꦶꦪꦤ꧀", "ꦥꦼꦥꦼꦛꦶꦏ꧀ꦏꦤ꧀ꦱꦏꦶꦁ", "ꦥꦼꦥꦼꦛꦶꦏ꧀", or any other part of the full title, does not work.
What happens?:
- This was first reported 10 years ago as T46350, which was marked as won't fix because at that time the migration from Lucene to CirrusSearch was under way.
- I identified the cause as the scriptio continua nature of Javanese script (there is no word delimiter).
- The CirrusSearch ticket T58505 was closed as resolved.
What should have happened instead?:
Search on Commons and the other projects should be able to find a title from parts of the full title.
Software version (skip for WMF-hosted wikis like Wikipedia):
Other information (browser name/version, screenshots, etc.):
A bit of background on the way the script is written:
- Scriptio continua is a hassle to display on the web because there is no obvious line-break opportunity. In projects such as Wikisource, if this is not handled properly, it breaks the page view of transcribed documents (the page becomes too wide).
- The way we handle this, including on certain keyboards, is to automatically insert a ZWS (zero-width space) after certain characters (e.g. comma, period, etc.); I can give you the full list. Line breaking therefore still works, except in very rare cases where none of those characters occur (so no ZWS is auto-inserted). Note that a ZWS is not always equivalent to a Latin space.
- AFAIK ZWS is not supported in page titles (e.g. when I upload books whose Javanese-script titles contain ZWS), so none of the titles in the jv wiki projects contain ZWS, and CirrusSearch therefore has no way to know where the words are delimited.
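
To illustrate the points above, here is a minimal Python sketch of why a delimiter-based tokenizer cannot match part of the title. It is not CirrusSearch's actual analyzer; `tokenize` and `token_match` are hypothetical helpers, and the position of the ZWS in `title_with_zws` is an assumption chosen only to line up with one of the partial queries in the steps to replicate.

```python
import re

ZWS = "\u200b"  # ZERO WIDTH SPACE


def tokenize(text: str) -> list[str]:
    """Split on whitespace and ZWS (a simplified stand-in for a search analyzer)."""
    return [tok for tok in re.split(r"[\s\u200b]+", text) if tok]


def token_match(query: str, title: str) -> bool:
    """True if the query is exactly one of the title's tokens."""
    return query in tokenize(title)


# The partial query from the report, and the remainder of the title.
first_part = "ꦥꦼꦥꦼꦛꦶꦏ꧀ꦏꦤ꧀ꦱꦏꦶꦁ"
rest = "ꦥꦿꦗꦚ꧀ꦗꦶꦪꦤ꧀ꦭꦩꦶ"

# Title as stored today: no delimiter anywhere, so it is a single token.
full_title = first_part + rest

# Hypothetical title with a ZWS at an assumed boundary (titles cannot hold ZWS today).
title_with_zws = first_part + ZWS + rest

print(len(tokenize(full_title)))                # number of tokens without a delimiter
print(token_match(full_title, full_title))      # full-title search
print(token_match(first_part, full_title))      # partial search, no delimiter
print(token_match(first_part, title_with_zws))  # partial search, with ZWS
```

In this sketch the undelimited title is a single token, so only the full title matches it; the partial query matches only once a delimiter is present. Since titles cannot contain ZWS, a fix would presumably have to come from the search side (for example, an analyzer that segments Javanese text without relying on delimiters) rather than from the title text.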