Mapping creation fails when wgCirrusSearchPhraseSuggestUseText is activated
Closed, Resolved (Public)

Description

I have been using CirrusSearch from the beginning, but I am having major trouble with MW 1.28.0.
Any help would be appreciated ;-) Maybe this is because of PHP 7.0, or maybe because of Elasticsearch 2.4.3?

This happens when I set up a new index:

MediaWiki 1.28.0
PHP 7.0.13-0ubuntu0.16.04.1 (apache2handler)
MySQL 5.7.16-0ubuntu0.16.04.1
Elasticsearch 2.4.3

CirrusSearch Extension : snapshot of version c23ae6a of the CirrusSearch extension for MediaWiki REL1_28
Elastica Extension: snapshot of version 0959e38 of the Elastica extension for MediaWiki REL1_28

php updateSearchIndexConfig.php

content index...
        Fetching Elasticsearch version...2.4.3...ok
        Scanning available plugins...
                head
        Inferring index identifier...wiki_mw-test_content_first
        Picking analyzer...german
                Validating number of shards...ok
                Validating replica range...ok
                Validating shard allocation settings...done
                Validating max shards per node...ok
        Validating analyzers...ok
PHP Warning:  Illegal string offset 'analyzer' in /var/www/mw-test/extensions/CirrusSearch/includes/Search/TextIndexField.php on line 153
PHP Warning:  array_merge(): Argument #2 is not an array in /var/www/mw-test/extensions/CirrusSearch/includes/Search/TextIndexField.php on line 157
        Validating mappings...
                Validating mapping...different...failed!
Couldn't update existing mappings. You may need to reindex.
Here is elasticsearch's error message: mapper_parsing_exception: illegal field [s], only fields can be specified inside fields

Event Timeline

After updating the CirrusSearch extension to "master" I now get this, so this looks promising. But still, the download for mw 1.28 should work, shouldn't it?

php updateSearchIndexConfig.php
content index...
        Fetching Elasticsearch version...2.4.3...ok
        Scanning available plugins...
                head
        Inferring index identifier...wiki_mw-test_content_first
        Picking analyzer...german
                Validating number of shards...ok
                Validating replica range...ok
                Validating shard allocation settings...done
                Validating max shards per node...ok
        Validating analyzers...cannot correct
This script encountered an index difference that requires that the index be
copied, indexed to, and then the old index removed. Re-run this script with the
--reindexAndRemoveOk --indexIdentifier=now parameters to do this.

@SmartK yes, this is definitely a bug. I'm trying to reproduce it to understand what's wrong: is it just a mismatch from when the 1.28 branch of Cirrus was cut, or something else? I have no idea yet...

So the problem is still the same with the "master" version of the extension. When I run the suggested command I get this:

php updateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier=now

content index...
        Fetching Elasticsearch version...2.4.3...ok
        Scanning available plugins...
                head
        Setting index identifier...wiki_mw-test_content_1484669599
        Picking analyzer...german
        Creating index...ok
                Validating number of shards...ok
                Validating replica range...ok
                Validating shard allocation settings...done
                Validating max shards per node...ok
        Validating analyzers...ok
PHP Warning:  Illegal string offset 'analyzer' in /var/www/mw-test/extensions/CirrusSearch/includes/Search/TextIndexField.php on line 154
PHP Warning:  array_merge(): Argument #2 is not an array in /var/www/mw-test/extensions/CirrusSearch/includes/Search/TextIndexField.php on line 158
        Validating mappings...
                Validating mapping...different...failed!
Couldn't update existing mappings. You may need to reindex.
Here is elasticsearch's error message: mapper_parsing_exception: illegal field [s], only fields can be specified inside fields

I tried with PHP 7 but could not reproduce it either. @SmartK, would it be possible to add these lines of code to the TextIndexField.php file (the lines between the DEBUG comments):

                foreach ( $extra as $extraField ) {
/// DEBUG START
                        if ( !is_array( $extraField ) ) {
                                throw new \Exception( 'Bug: ' . print_r( $extra, true ) );
                        }
/// DEBUG END
                        $extraName = $extraField[ 'analyzer' ];

Then please run the script again and paste the full output here. Thanks!

ok, here you go....

php updateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier=now


content index...
        Fetching Elasticsearch version...2.4.4...ok
        Scanning available plugins...
                head
        Setting index identifier...wiki_mw-test_content_1484750021
        Picking analyzer...german
        Creating index...ok
                Validating number of shards...ok
                Validating replica range...ok
                Validating shard allocation settings...done
                Validating max shards per node...ok
        Validating analyzers...ok
[6f905bad89408c49a5c39a55] [no req]   Exception from line 156 of /var/www/mw-test/extensions/CirrusSearch/includes/Search/TextIndexField.php: Bug: Array
(
    [analyzer] => suggest
    [0] => Array
        (
            [analyzer] => word_prefix
            [search_analyzer] => plain_search
            [index_options] => docs
        )

)

Backtrace:
#0 /var/www/mw-test/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php(138): CirrusSearch\Search\TextIndexField->getMapping(CirrusSearch)
#1 /var/www/mw-test/extensions/CirrusSearch/includes/Maintenance/MappingConfigBuilder.php(191): CirrusSearch\Maintenance\MappingConfigBuilder->getDefaultFields(integer)
#2 /var/www/mw-test/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(524): CirrusSearch\Maintenance\MappingConfigBuilder->buildConfig(integer)
#3 /var/www/mw-test/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(399): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->getMappingConfig()
#4 /var/www/mw-test/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php(263): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->validateMapping()
#5 /var/www/mw-test/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(58): CirrusSearch\Maintenance\UpdateOneSearchIndexConfig->execute()
#6 /var/www/mw-test/maintenance/doMaintenance.php(111): CirrusSearch\Maintenance\UpdateSearchIndexConfig->execute()
#7 /var/www/mw-test/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php(65): require_once(string)
#8 {main}

@SmartK awesome, thanks, I can now reproduce it.
The problem is caused by the config option $wgCirrusSearchPhraseSuggestUseText, which is most likely set to true in your config.
The workaround is to set it to false while we fix this bug.
Sorry about that; this option is never used in our context and our test suite failed to catch it...
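
For readers following along, here is a minimal sketch of why the loop trips on the array printed by the debug exception above. It is not the CirrusSearch code and not the eventual fix: the function name splitExtraFields and the variable names are hypothetical, and the array is simply retyped from the pasted output. The first element of that array is the plain string "suggest" rather than an array, so indexing it with 'analyzer' is what PHP reports as "Illegal string offset". On PHP 7.0 such a read appears to fall back to offset 0, which would give "s" from "suggest" and would match the "illegal field [s]" message; treat that as an inference, not something confirmed in this task.

```php
<?php
// Retyped from the debug output in this task: one stray scalar entry
// ('analyzer' => 'suggest') sits next to one well-formed field definition.
$extra = [
    'analyzer' => 'suggest',
    [
        'analyzer' => 'word_prefix',
        'search_analyzer' => 'plain_search',
        'index_options' => 'docs',
    ],
];

/**
 * Hypothetical helper (not part of CirrusSearch): separate the entries
 * that are arrays carrying an 'analyzer' key from everything else, so the
 * malformed entries can be reported instead of being indexed like arrays.
 *
 * @param array $extra list of extra field definitions
 * @return array [ analyzer names of valid entries, malformed entries by key ]
 */
function splitExtraFields( array $extra ): array {
    $valid = [];
    $malformed = [];
    foreach ( $extra as $key => $extraField ) {
        if ( is_array( $extraField ) && isset( $extraField['analyzer'] ) ) {
            $valid[] = $extraField['analyzer'];
        } else {
            // A string here is what would trigger "Illegal string offset"
            // if it were accessed as $extraField['analyzer'].
            $malformed[$key] = $extraField;
        }
    }
    return [ $valid, $malformed ];
}

list( $valid, $malformed ) = splitExtraFields( $extra );
```

With the array above, $valid holds only the word_prefix analyzer and $malformed holds the stray 'analyzer' => 'suggest' entry, which is the same mismatch the DEBUG check exposed.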

dcausse renamed this task from mapper_parsing_exception with mw 1.28.0 to Mapping create fails when wgCirrusSearchPhraseSuggestUseText is activated. · Jan 18 2017, 3:01 PM
dcausse renamed this task from Mapping create fails when wgCirrusSearchPhraseSuggestUseText is activated to Mapping creation fails when wgCirrusSearchPhraseSuggestUseText is activated.
debt triaged this task as Medium priority. · Jan 19 2017, 11:10 PM
debt moved this task from needs triage to This Quarter on the Discovery-Search board.
debt subscribed.

Since there is a work-around, moving to this quarter.

Thank you @dcausse ! This does work!!!
Can I activate it after initializing CirrusSearch or do I have to keep it disabled for now?

I normally have these lines active in LocalSettings.php

$wgCirrusSearchPhraseUseText = true;
$wgCirrusSearchPrefixSearchStartsWithAnyWord = true;
$wgCirrusSearchPhraseSuggestUseText = true;
$wgCirrusSearchAllowLeadingWildcard = true;

$wgCirrusSearchPhraseSuggestUseText is only read when a new index is created; activating it afterwards will sadly have no effect. The purpose of this option is to increase recall on "did you mean" suggestions: it uses the text content to build its dictionary for typo correction. Without it only titles and redirects are used, so your index should be smaller and faster, but fewer typos will be detected.
I'd suggest keeping it disabled until a bugfix is available.

Concerning other options:

  • $wgCirrusSearchPhraseUseText: I don't know this one; it may be an old config variable that has since been removed. I think you can remove it.
  • $wgCirrusSearchPrefixSearchStartsWithAnyWord: This one is also read at index time; you can keep it as it is unless you have found issues.
  • $wgCirrusSearchAllowLeadingWildcard: This one is used at query time to prevent running slow queries, but if it works for you it's fine to keep it.

To sum up, your config should look like:

$wgCirrusSearchPrefixSearchStartsWithAnyWord = true;
// FIXME: switch to true when cirrus is deployed on this wiki with https://phabricator.wikimedia.org/T155489 fixed
$wgCirrusSearchPhraseSuggestUseText = false;
$wgCirrusSearchAllowLeadingWildcard = true;

Any news on a fix for this option, $wgCirrusSearchPhraseSuggestUseText?

Would be highly appreciated!

Hi @dcausse - do you have a sense of how long this bug fix would take to complete or level of effort?

Change 365593 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Fix mapping bug when using text for DYM suggestions

https://gerrit.wikimedia.org/r/365593

Sorry, this bug fell through the cracks. It's a one-liner; fixed.

You are the best. Thank you a lot!!!
To clarify, can I use:

$wgCirrusSearchPhraseSuggestUseText = true;

or do I have to use it in a different way now?

@SmartK yes, no need to change anything in your config. Also, I think you can easily cherry-pick the fix (it is very small) onto your wiki installation if you don't want to wait for it to be released.

Thank you, I can wait.... ;-)
So I will change

$wgCirrusSearchPhraseSuggestUseText = false; 
to
$wgCirrusSearchPhraseSuggestUseText = true;

again and re-index the whole thing....

ATM I have trouble with MediaWiki 1.29.0 and Elasticsearch 5.5, but that is a different problem. I will wait a few weeks; maybe this gets "fixed" as well.

Thank you again!

Change 365593 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Fix mapping bug when using text for DYM suggestions

https://gerrit.wikimedia.org/r/365593

Change 365927 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@REL1_29] Fix mapping bug when using text for DYM suggestions

https://gerrit.wikimedia.org/r/365927

Change 365927 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@REL1_29] Fix mapping bug when using text for DYM suggestions

https://gerrit.wikimedia.org/r/365927