ArchCom RFC Meeting W30: authenticated key-value store (2016-07-27, #wikimedia-office)

Hosted by daniel on Jul 27 2016, 9:00 PM - 10:00 PM.

Description

Agenda

Location: #wikimedia-office IRC channel
Topic: T128602: Create and deploy an extension that implements an authenticated key-value store
Meeting type: Problem definition

Meeting summary

LINK: https://phabricator.wikimedia.org/E237 Phab event link (robla, 21:01:38)
LINK: https://phabricator.wikimedia.org/T128602 authenticated key-value store RFC (robla, 21:02:09)
Question discussed: does the store need to be authenticated? (robla, 21:05:38)
conversation turned to discussion of use cases for Mobile App (robla, 21:08:47)
Discussion turned to authentication possibilities, and then to using user_props (robla, 21:24:36)
LINK: https://www.mediawiki.org/wiki/Manual:User_properties_table (Marybelle, 21:26:03)
LINK: https://remotestorage.io/ (DanielK_WMDE__, 21:27:11)
DanielK_WMDE__ makes the case for using remoteStorage IETF draft implementation; discussion about that will likely continue in the RFC (robla, 21:37:08)
ACTION: ArchCom needs to bump the priority on a watchlist specific RFC (robla, 21:56:18)
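
As a rough illustration of the user_props idea raised in the discussion: keys carrying a reserved prefix live next to ordinary preferences but are left out of the bundle sent with every page view, and are fetched one at a time instead. The sketch below is in Python for brevity. The class name, the underscore prefix constant, the in-memory dict and the 65,535-byte cap are assumptions made for illustration; none of it is MediaWiki code or a design agreed in the meeting.

```python
from typing import Dict, Optional

# Hypothetical marker: keys starting with this prefix are treated as
# app-private data and are never bundled into page views.
PRIVATE_PREFIX = "_"

# Assumed per-value cap, roughly the column limit quoted in the log.
MAX_VALUE_BYTES = 65535


class UserProps:
    """In-memory stand-in for a per-user property table."""

    def __init__(self) -> None:
        self._rows: Dict[int, Dict[str, str]] = {}

    def set(self, user_id: int, key: str, value: str) -> None:
        # Reject values that would not fit in the assumed column size.
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            raise ValueError("value exceeds the assumed column limit")
        self._rows.setdefault(user_id, {})[key] = value

    def get(self, user_id: int, key: str) -> Optional[str]:
        # Individual lookup works for both ordinary and private keys.
        return self._rows.get(user_id, {}).get(key)

    def page_view_bundle(self, user_id: int) -> Dict[str, str]:
        # Only ordinary preferences are loaded as a bundle; prefixed
        # keys are filtered out so large app data is not sent each time.
        return {
            key: value
            for key, value in self._rows.get(user_id, {}).items()
            if not key.startswith(PRIVATE_PREFIX)
        }


store = UserProps()
store.set(1, "skin", "vector")
store.set(1, "_readinglist/favorites", "[101, 202]")
bundle = store.page_view_bundle(1)
favorites = store.get(1, "_readinglist/favorites")
```

Here `bundle` contains only the `skin` preference, while the reading list stays retrievable by key. As noted in the log, every existing consumer of user_props would still need to be checked against such a filter.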

People present (lines said)

brion (78)
tgr (37)
Marybelle (36)
robla (31)
DanielK_WMDE__ (30)
anomie (28)
SMalyshev (25)
gwicke (22)
dbrant (14)
niedzielski (12)
Scott_WUaS (7)
wm-labs-meetbot` (3)
TimStarling (3)
mdholloway (2)
Pchelolo (1)

Full log

P3589 ArchCom-RFC-2016W30-irc-E237.txt

1	21:00:40 <robla> #startmeeting ArchCom 2016W30: authenticated key-value store
2	21:00:40 <wm-labs-meetbot`> Meeting started Wed Jul 27 21:00:40 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
3	21:00:40 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
4	21:00:40 <wm-labs-meetbot`> The meeting name has been set to 'archcom_2016w30__authenticated_key_value_store'
5	21:01:01 <robla> #topic Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
6	21:01:25 <Marybelle> tgr: When you say "large amounts of global data which are needed infrequently", what do you mean specifically?
7	21:01:38 <robla> #link https://phabricator.wikimedia.org/E237 Phab event link
8	21:02:09 <robla> #link https://phabricator.wikimedia.org/T128602 authenticated key-value store RFC
9	21:02:13 <tgr> the specific use case that resulted in this RfC was reading lists
10	21:02:42 <tgr> ie lists of favorite articles which are synchronized across devices
11	21:03:03 <Marybelle> > While "reading lists" is one of the reasons the app teams want this, this isn't an implementation of page lists.
12	21:03:05 <robla> dbrant and anomie: thanks for your work on this!
13	21:03:07 <Marybelle> This task makes no sense to me.
14	21:03:28 <robla> I think maybe we can start with the list of questions in the task
15	21:03:37 <Marybelle> If you want a private list of favorite articles, that sounds like a watchlist.
16	21:03:41 <SMalyshev> question: does the store itself needs to be authenticated? I mean, we store sessions relying just on long random ID to be secure. Can't we use another long random ID to secure prefs?
17	21:04:06 <Scott_WUaS> Hi All!
18	21:04:16 <niedzielski> o/
19	21:04:18 <brion> SMalyshev: by default you'd do it behind the API which is authenticated
20	21:04:23 <anomie> SMalyshev: Sessions are that way because of the bootstrapping problem: you have to start somewhere.
21	21:04:32 <brion> No need to invent new auth methods
22	21:04:47 <robla> alright, I guess we'll start with SMalyshev 's question, and then move to "Should this be implemented as a MediaWiki action API endpoint or a restbase service?"
23	21:04:51 <mdholloway> SMalyshev: at least for the apps' use case, the reading lists should be private and therefore we'd need authentication
24	21:04:54 <mdholloway> hi all, btw
25	21:04:55 <tgr> SMalyshev: not sure what you mean by the store itself being authenticated
26	21:05:06 <SMalyshev> brion: well, yes, if it's mediawiki API then sure. but understand it's one of the options?
27	21:05:10 <Marybelle> robla: I think focusing on implementation before defining use-cases is silly.
28	21:05:11 <tgr> the API should authenticate, just like sessions do
29	21:05:38 <robla> #info Question discussed: does the store need to be authenticated?
30	21:05:56 <brion> Marybelle: the implementation of the store? Or the client side app feature that uses it?
31	21:06:06 <anomie> If it's in the action API, it'll be authenticated just like anything else. If it's in restbase, I think that has support for the same sort of thing (but I don't know details).
32	21:06:16 <tgr> how that's represented in the backend is an implementation details that is IMO low relevance compared to other factors for choosing backends
33	21:06:20 <brion> The app is a thing that already exists and already has a reading list feature.
34	21:06:23 <Marybelle> brion: The use-cases of having another authenticated store.
35	21:06:43 <Marybelle> If this is for private lists of articles, that sounds like a watchlist.
36	21:07:02 <brion> Marybelle ok. The use case is to store per user private data on the server and be able to retrieve and update it, with no versioning.
37	21:07:23 <Marybelle> We do that already in several places. :-)
38	21:07:47 <Marybelle> I'm not sure making yet another private place is needed.
39	21:07:55 <brion> Marybelle: yes, in ways that don't match this use case as already replied on the bug.
40	21:08:08 <Marybelle> Okay. I guess I'm the only person not getting it.
41	21:08:09 <brion> Either we can change one of those methods to match, or make a new one.
42	21:08:13 <tgr> note that the apps already store global authenticate private data, but they are currently using user_props for that, which is not terrible but suboptimal
43	21:08:13 <dbrant> Marybelle: this is a sort of generalization of watchlists, where the user can have multiple lists, each with its own attributes (name, description, whether it's saved for offline reading, etc)
44	21:08:26 <tgr> so reading lists are not in fact the only current use case
45	21:08:47 <robla> #info conversation turned to discussion of use cases for Mobile App
46	21:08:56 <niedzielski> tgr: we currently use user prefs to store theme info. we'd probably migrate that to this new store
47	21:09:02 <Marybelle> brion: Right. I'd really like to avoid yet another special thing to maintain indefinitely.
48	21:09:05 <niedzielski> (just as an example)
49	21:09:21 <brion> nod keep em simple when possible :)
50	21:09:22 <SMalyshev> reading https://phabricator.wikimedia.org/T128602#2499662 it looks like the delta vs. user_properties is mostly size/performance concerns?
51	21:09:30 <Marybelle> niedzielski: You think MediaWiki's database should store a user's app theme preference?
52	21:09:32 <gwicke> dbrant: do you see this to eventually covering collection (book) functionality as well?
53	21:10:22 <tgr> SMalyshev: and keeping complexity down
54	21:10:36 <tgr> user prefs need to be loaded as a bundle, for perf reasons
55	21:10:50 <niedzielski> Marybelle: what i mean to say is that we currently use user options to store a theme preference. if we had a general purpose key / value store, that would be a more appropriate spot
56	21:10:52 <tgr> reading lists would have to be loaded one by one, again for perf reasons
57	21:11:02 <anomie> SMalyshev: On the back end, anyway, although there's a question of whether it should be cross-wiki or if users should just pick one "central" wiki. On the front end it'll support fetching individual keys instead of having to fetch the whole thing at once, and possibly some other bits.
58	21:11:14 <Marybelle> niedzielski: I'm not sure I get why a mobile app/client gets to use MediaWiki's database to store its preferences.
59	21:11:18 <tgr> mixing the two would probably result in more awkward code than having different services for them
60	21:11:25 <Marybelle> That seems weird.
61	21:11:28 <DanielK_WMDE__> from the RFC, it seems to me that one premise was "re-doing watchlists is hard, let's just do a key/value store, that's easy".
62	21:11:30 <DanielK_WMDE__> But perhaps it's not that easy to do right, and so we should perhaps re-do watchlists instead? After all, we discussed global watchlists last week! Perhaps we'll be doing that anyway...
63	21:11:52 <niedzielski> Marybelle: do you mean why store preferences on a server vs on device? or mediawiki specifically?
64	21:11:57 <dbrant> Marybelle: the MediaWiki db already stores plenty of user settings. Whether these settings apply to the desktop browsing experience or the app experience shouldn't matter.
65	21:12:00 <brion> Marybelle: because it's an app for the site which you log in with your site credentials, and it's that or invent a new storage service ?
66	21:12:15 <DanielK_WMDE__> We have several needs that drive changes to watchlists: global watchlists, multiple watchlists, automatic expiry...
67	21:12:25 <anomie> DanielK_WMDE__: Part of the idea was prototyping their reading lists on top of a basic key-value store, instead of designing something then having to redo it when they find out they need different behavior.
68	21:13:01 <SMalyshev> DanielK_WMDE__: I think it's bigger than just watchlists, watchlists would be one usecase for this?
69	21:13:24 <tgr> DanielK_WMDE__: actually the original proposal was to do the k-v store as a temporary solution that makes sense on its own as well and then eventually migrate to a dedicated lists API based on some sort of lists concept in core
70	21:13:35 <brion> Right thisll be used for the other user data that's currently shoehorned into userjs prefs as I understand?
71	21:13:36 <tgr> we might have given up on that by now, not sure
72	21:13:39 <Marybelle> brion: I guess I think about non-Wikimedia Foundation apps. Would those clients also be using MediaWiki's/Wikimedia's database to store their user preferences and data?
73	21:13:52 <robla> anomie: having a flexible solution sounds really nice. what if we find out it didn't work the way we wanted it to? How does this not become tech debt?
74	21:13:53 <dbrant> gwicke: i wouldn't see why not; do you mean something like "turn my reading list into a pdf"?
75	21:13:55 <brion> Marybelle: sure, why not?
76	21:14:02 <anomie> DanielK_WMDE__: As for watchlists in particular, the apps want the ability to have multiple lists that each aren't limited to a single wiki, and extra metadata, but not (I think?) actually the recentchanges-filtering functionality watchlists have.
77	21:14:05 <DanielK_WMDE__> SMalyshev: the driving use case is (named) reading lists (aka bookmarks). which is very similar to watchlists. a k/v store could be used to cover this to some degree, as long as the lists don't become very large.
78	21:14:13 <gwicke> dbrant: yes
79	21:14:13 <Marybelle> It seems outside the scope and responsibility of MediaWiki a bit.
80	21:14:57 <brion> Marybelle: no more than watchlist and user prefs for the web UI surely
81	21:15:00 <anomie> robla: Are you asking about the key-value store, or a specialized reading-list service? If the former, that's the nice thing about a simple, generic key-value service.
82	21:15:05 <DanielK_WMDE__> anomie: cross-wiki watchlists are a lot of fun, as discussed here last week (or was it the week before)?
83	21:15:28 <Marybelle> brion: MediaWiki the application using the MediaWiki database isn't so crazy. Any random client application using the MediaWiki database seems a lot zanier.
84	21:15:29 <tgr> DanielK_WMDE__: apart from lists in core being a big and long project, I think it would be much more sane to go into it after we have a good understanding of the use cases, and a key-value store is great for prototyping
85	21:15:32 <brion> Those can all be implemented separately and could invent separate places to store their data, but I think it would not be super practical
86	21:16:03 <brion> Depends how narrowly or broadly you view MediaWiki IMO
87	21:16:15 <DanielK_WMDE__> i see two questions here, and we should perhaps pick one to discuss. a) do we want/need a generic k/v store, what needs does it address, what features does it need? and b) how do we best implement (or prototype) reading lists for mobile?
88	21:16:20 <brion> And how narrowly or broadly you view Wikipedia as a site or product or place
89	21:16:22 <DanielK_WMDE__> which of the two should we discuss?
90	21:16:27 <tgr> Gather tried to build its own API from the start and maintain it across use-case changes, and pretty much ended up with a key-value store (JSON blobs in an SQL table) with lots of cruft around it
91	21:16:33 <robla> anomie: more the latter. I'll simplify to 3 options: 1) wild success 2) questionable success 3) obvious failure. outcome 2 is where tech debt accrues
92	21:16:42 <gwicke> one issue with a generic key-value store without schema enforcement is that any client side app could write any kind of blob to any key
93	21:16:45 <anomie> And really, one of the open questions here is whether the key-value store should actually be in MediaWiki (action API) or should be a separate service for WMF app use (restbase).
94	21:17:04 <gwicke> this would put the burden of schema checking / validation squarely on the client
95	21:17:13 <Marybelle> brion: Do other big sites let client applications use their databases for arbitrary private data? Like Twitter and Facebook and friends?
96	21:17:51 <anomie> gwicke: You say that like it's a disadvantage. It could as well be an advantage.
97	21:17:57 <brion> Marybelle: are they platforms for sharing free knowledge?
98	21:18:02 <brion> :)
99	21:18:11 <gwicke> anomie: it's a trade-off
100	21:18:21 <brion> Anyway, we already can store tons of arbitrary data as you point out Marybelle
101	21:18:37 <brion> The question is can we do it in a way that's efficient and meets the needs of users
102	21:18:41 <Marybelle> I mean, if I were making a regular bookmark application, I wouldn't expect MediaWiki to be my back-end off-hand.
103	21:18:45 <gwicke> there is the related issue of schema migrations
104	21:18:58 <anomie> robla: Yeah, a specialized reading-list service would certainly have the danger of falling into #2. That's why I personally don't want to build one, at least not without decent planning to make it more likely to hit #1.
105	21:18:59 <SMalyshev> given that you can just create a wiki page and dump the data there I don't think it changes a lot
106	21:19:06 <gwicke> and format versioning
107	21:19:20 <brion> Schemas are out of scope for now, imo
108	21:19:32 <Scott_WUaS> (gwicke: can you please clarify what "collection (book) functionality" is? Thanks)
109	21:19:54 <anomie> Scott_WUaS: I'm guessing https://www.mediawiki.org/wiki/Extension:Collection
110	21:19:58 <gwicke> brion: they are necessarily in scope, the question is just where you handle them
111	21:20:03 <brion> Marybelle: maybe, or you might store it in one of the several places in the application servers user database that it makes available for that sort of thing
112	21:20:04 <Marybelle> brion: This use-case seems a bit against sharing free knowledge, if these are per-user and private, FWIW.
113	21:20:15 <Scott_WUaS> anomie: thnx
114	21:20:20 <gwicke> if you say it's out of scope on the server, then that implicitly means that they will need to be handled on the client
115	21:20:32 <Marybelle> Or a separate server.
116	21:20:34 <DanielK_WMDE__> would it be an option to just expose an existing K/V system to the public (with the necessary auth in place)?
117	21:20:39 <brion> gwicke: yes it's explicitly in the clients sphere of responsibility
118	21:21:00 <Marybelle> DanielK_WMDE__: Existing like Redis or something?
119	21:21:06 <anomie> gwicke: Once you start shoving schemas and stuff into it, it's no longer a generic key-value store. The client is free to implement a schema on top of a generic key-value store if it wants, which makes the store itself more flexible.
120	21:21:15 <gwicke> with multiple clients, this might be tricky to support
121	21:21:19 <brion> DanielK_WMDE__: if we can query individual items and not send them to every view, user props would work.
122	21:21:26 <SMalyshev> DanielK_WMDE__: I think that would be one of the solutions. If we have a suitable one
123	21:21:27 <DanielK_WMDE__> Marybelle: yes. though redis explicitly says it's designed to be accessed by trusted clients only (i just checked) http://redis.io/topics/security
124	21:21:31 <brion> That's basically the difference
125	21:21:46 <robla> Marybelle is not likely to be convinced about use cases this hour, but other folks seem more interested in talking implementation, so let's focus on implementation
126	21:21:53 <brion> :)
127	21:21:58 <SMalyshev> DanielK_WMDE__: it doesn't have to be redis directly, can be redis (or other non-auth k/v) behind Mediawiki API front
128	21:22:03 <tgr> anomie: I guess something similar to how EventLogging schemas are handled could be done, I doubt it's worth the effort though
129	21:22:05 <Marybelle> I asked on the task about just using a separate key/namespace in user_props and just filtering.
130	21:22:14 <Marybelle> You don't have to send every user option on every page load.
131	21:22:17 <Marybelle> I'm not sure why we do.
132	21:22:17 <DanielK_WMDE__> brion: ok, new plan: hack user props that keys starting with an underscore will be skipped when writing props into jsconfig.
133	21:22:20 <SMalyshev> DanielK_WMDE__: which supports only API like "give me my data", not "give me her data"
134	21:22:21 <brion> Marybelle might work
135	21:22:29 <Marybelle> DanielK_WMDE__: +1
136	21:22:47 <brion> :)
137	21:23:05 <anomie> DanielK_WMDE__: Disadvantage: every existing user of user_props has to be updated to deal with the filtering, and we have to make sure the additional data doesn't have negative performance impact on the existing uses.
138	21:23:10 <DanielK_WMDE__> filtering user_props by prefix seems the simplest solution by far...
139	21:23:21 <brion> Are there any low level probs with how that's stored?
140	21:23:21 <niedzielski> would user options be able to hold thousands of arbitrary keys ok?
141	21:23:30 <DanielK_WMDE__> anomie: yes. existing uses need to be surveyed
142	21:23:35 <tgr> see https://phabricator.wikimedia.org/T128602#2499662 for a list of problems with user_props
143	21:23:39 <SMalyshev> DanielK_WMDE__: if there's separate keys for up_property that would work I think
144	21:24:01 <gwicke> setting up key-value storage is fairly easy, I would say quite a bit easier than handling format versioning & migrations correctly
145	21:24:08 <anomie> Also, jcrespo didn't like the idea of putting it in the main database much in https://phabricator.wikimedia.org/T128602#2476545
146	21:24:10 <brion> Ok there's a byte limit I think on those, is that a problem?
147	21:24:16 <brion> Could be lifted with a schema tweak
148	21:24:17 <SMalyshev> anomie: does existing API right now just dumps all keys for the user, or it chooses specific ones?
149	21:24:32 <brion> Ah yes, and we did have req to move to separate db cluster
150	21:24:36 <robla> #info Discussion turned to authentication possibilities, and then to using user_props
151	21:24:38 <brion> Which is easy to do per table iirc
152	21:24:45 <Marybelle> tgr: First three bullets seem trivially solvable.
153	21:24:46 <anomie> SMalyshev: The existing API query only supports fetching all data for the user, unless I'm completely mistaken.
154	21:25:01 <DanielK_WMDE__> hm, i just found this: https://remotestorage.io/
155	21:25:02 <dbrant> brion: what's the limit, roughly?
156	21:25:03 <brion> Yeah API needs enhancement for query
157	21:25:07 <niedzielski> maybe it would be easier to drop a new generic key value store if the feature is unpopular than to clear user options
158	21:25:15 <DanielK_WMDE__> no idea if it's good, but it seems worth a look.
159	21:25:16 <brion> dbrant: 65534 bytes iirc
160	21:25:17 <SMalyshev> anomie: right, but that's only one API. I don't think it should be too hard to make this API skip certain keys in DB?
161	21:25:29 <brion> Should be a matter of changing column type
162	21:25:54 <gwicke> niedzielski: if there are clear patterns in how keys are structured, or there is only a single use case using this service, yes
163	21:25:56 <SMalyshev> anomie: also, if main DB is bad, we could make it two-stage - store opaque id in main db, store actual data in better storage
164	21:26:03 <Marybelle> https://www.mediawiki.org/wiki/Manual:User_properties_table
165	21:26:12 <anomie> SMalyshev: Then you're really making things complicated.
166	21:26:30 <SMalyshev> it's not that complicated I think... just one more call
167	21:26:31 <anomie> At that point, why not just use the actual-data storage?
168	21:26:33 <Marybelle> How is a filter complicated?
169	21:26:35 <anomie> directly
170	21:26:41 <SMalyshev> anomie: because auth, etc.
171	21:26:54 <tgr> again, if the goal is keeping things simple then having two separate systems with a very simple mode of operation seems better than having one that tries to be the mix of the two
172	21:26:56 <SMalyshev> I think maintaining two auth systems in sync is worse
173	21:26:58 <anomie> SMalyshev: No, not direct access to the backend.
174	21:27:00 <DanielK_WMDE__> Huh. interesting. https://remotestorage.io/ is sponsored by the Wau Holland Stiftung? that indicates it's not industry bullshit. doesn't tell me if it's any good for our use case, but
175	21:27:11 <DanielK_WMDE__> #link https://remotestorage.io/
176	21:27:22 <brion> Ooh that's neat
177	21:27:30 <anomie> SMalyshev: But the frontend directly accessing the better-storage backend instead of looking up in user_properties then in the better-storage.
178	21:27:32 * brion bookmarks for later
179	21:27:33 <SMalyshev> anomie: that means two APIs instead of one, otherwise the same. I'd rather have users learn one API :)
180	21:28:05 <anomie> SMalyshev: Ah, the Gather approach?
181	21:28:19 <TimStarling> my vote is to just add a table
182	21:28:35 <gwicke> data like reading lists would likely be interesting to multiple applications; I would be surprised if the app would end up being the only user
183	21:28:46 <TimStarling> avoid joins so that you can hack up some cross-server thing later if need be, query groups or something
184	21:29:01 * anomie is actually more interested in "BagOStuff vs some custom abstraction" over "where exactly do we hide the database table"
185	21:29:06 <TimStarling> this whole project seems like something that could be done in hardly any more time than it takes to have this meeting
186	21:29:16 <brion> Hehe
187	21:29:32 <gwicke> simple key-value storage, sure
188	21:29:34 <SMalyshev> defining what to do is often longer than doing it... it's normal :)
189	21:29:42 <gwicke> but that's hardly a solution
190	21:29:47 <dbrant> +1 to gwicke -- i can easily see mobile web and/or desktop surfacing reading lists for logged-in users.
191	21:29:50 <robla> anomie: should we try to answer that question now?
192	21:29:52 <Scott_WUaS> :)
193	21:30:25 <DanielK_WMDE__> Is this spec in line with what we need? https://datatracker.ietf.org/doc/draft-dejong-remotestorage/
194	21:30:29 <brion> That might be a more interesting question for the watchlist cross wiki magic future discussion :)
195	21:30:39 <anomie> robla: Wouldn't hurt. Although "action API vs restbase" is an even more interesting question, since it's the difference between me writing it and me not ;)
196	21:30:49 <DanielK_WMDE__> if so, perhaps we can rely on some ready-to-go solution.
197	21:31:07 <brion> I just don't want mobile apps to get bogged down in the meantime while we bikeshed reading lists
198	21:31:17 <gwicke> dbrant: once we have multiple clients, it might make more sense to provide an API that gives a bit more guarantees than just "it's a blob of bytes"
199	21:31:25 <SMalyshev> DanielK_WMDE__: does it have TLDR description?
200	21:31:47 <DanielK_WMDE__> SMalyshev: https://remotestorage.io/
201	21:32:48 <gwicke> updating the schema when all your schema handling is implemented in $n clients sounds hard
202	21:32:58 <SMalyshev> DanielK_WMDE__: this seems to be client-side storage?
203	21:33:26 <tgr> DanielK_WMDE__: how would auth work?
204	21:33:30 <SMalyshev> or the picture is misleading
205	21:33:41 <DanielK_WMDE__> SMalyshev: no, it's not.
206	21:33:50 <tgr> DanielK_WMDE__: "If <auth-dialog> is a URL, the user can supply their credentials for accessing the account (how, is out of scope)
207	21:33:58 <tgr> - not so promising
208	21:33:59 <DanielK_WMDE__> SMalyshev: the client picks its storage provider. local is one option. all providers implement the same protocol
209	21:34:00 <anomie> DanielK_WMDE__: At a glance, that looks like it would serve the purpose. It wouldn't be able to be done in the context of the action API, but that's not a requirement (assuming it's not me writing it, anyway).
210	21:34:00 <Marybelle> Is 65,535 bytes really insufficient?
211	21:34:25 <brion> Long names add up and it's hard to make guarantees :)
212	21:34:25 <DanielK_WMDE__> tgr: they seem to use oauth
213	21:34:29 <Marybelle> That sounds like a lot of page IDs.
214	21:34:43 <brion> 64k would likely be enough in practice until it breaks on some extreme case
215	21:34:43 <robla> SMalyshev: it looks like the storage is OAuth protected server storage (that basically provides similar constraints to most client storage solutions)
216	21:35:00 <tgr> forcing all apps to go through an extra oauth authorization screen might be suboptimal
217	21:35:14 <DanielK_WMDE__> using an actual standard to implement this would be nice.
218	21:35:27 <DanielK_WMDE__> furthering the idea of "clients pick where they store their data" is even nicer
219	21:35:29 <robla> DanielK_WMDE__ agreed
220	21:35:32 <tgr> anyway, that protocol seems like it might be a better solution but evaluating it within this IRC meeting is not realistic
221	21:35:49 <brion> Well ideally you'd probably want the same login for login and storage in a case like this
222	21:35:52 <DanielK_WMDE__> even if we implement this ourselves, perhaps we should implement the protocol defined there.
223	21:36:20 <DanielK_WMDE__> tgr: i agree. i'll link to it from the ticket
224	21:36:23 <brion> I am curious about it and it smells useful for unofficial apps and such, do bring it up in future :)
225	21:36:24 <tgr> and worth mentioning again that building a k-v store that relies on the action API is super simple
226	21:36:27 <dbrant> Marybelle: brion: for reference, the current record holder for the most pages in reading lists in our app is over 7000.
227	21:36:50 <tgr> bulding a new API that should match a draft protocol with OAuth 2 and whatnot is no
228	21:36:52 <niedzielski> i don't think 64k would be too small personally. it would encourage favoring distinct keys instead of json blobs
229	21:36:54 <tgr> not
230	21:37:08 <robla> #info DanielK_WMDE__ makes the case for using remoteStorage IETF draft implementation; discussion about that will likely continue in the RFC
231	21:37:12 <brion> 7000 dang! Might fit in int IDs but maybe not titles yeah :)
232	21:37:38 <brion> Anyway that bits changeable if we need it
233	21:38:01 <brion> Easy to bump the column type up one
234	21:38:32 <SMalyshev> I understand that if we use IETF one we'd need a client for remoteStorage protocol and a backend. which should rely on some kind of k'v storage inside mediawiki
235	21:39:07 <SMalyshev> so we're kind of back to sq. 1? maybe with more standard client API
236	21:39:20 <robla> so...my understanding is that the Mobile Apps team plans to prototype something this quarter; should they use one of our existing tech, or should they explore something new?
237	21:39:38 <brion> So we have some interest in that protocol; and some talk about tweaking user props with a modified MW API method, and some talk about just adding a table still. Any other current main alternatives?
238	21:39:56 <brion> :)
239	21:40:11 <tgr> restbase I suppose?
240	21:40:15 * robla notes that the "IETF one" is a draft that hasn't made it to "Proposed Standard", and that submitting an IETF draft is trivial
241	21:40:28 <SMalyshev> tgr: does restbase authenticate?
242	21:41:35 <Pchelolo> SMalyshev: restbase can authenticate
243	21:42:02 <brion> Ok so that's another possibility yes :)
244	21:42:24 <brion> Though I think gwicke would prefer restbase services to be better specified for their data schemas ?
245	21:42:41 <brion> And this is still a young feature likely to change in details
246	21:42:46 <Marybelle> dbrant: How common is a reading list of 7000 pages? Just one user? A dozen users?
247	21:42:52 <dbrant> robla: to be clear, this isn't a blocker for our current goals this quarter. it would simply make our reading lists become a "complete" feature, the way we intended it to be (technically, two quarters ago :) ).
248	21:42:55 <brion> Eventually hopefully merging into fancy watchlists
249	21:43:25 <robla> dbrant: thanks for the clarification!
250	21:43:34 <brion> That helps us set timelines :)
251	21:43:38 <Marybelle> brion: I get pretty frustrated at how bad watchlists are, especially when I see people basically working around them instead of investing resources to fix them. :-/
252	21:43:51 <brion> Understandable!
253	21:44:13 <brion> I think we need to bring some more focus on that I agree
254	21:44:46 <gwicke> added a comment re versioning & API stability at https://phabricator.wikimedia.org/T128602#2500338
255	21:44:48 <Marybelle> For the general idea of private reading lists, it's kind of maddening that watchlists won't work.
256	21:44:54 <Marybelle> Sigh.
257	21:45:07 * anomie wouldn't mind working on watchlists, but there are currently enough other cooks in that kitchen and that's not the problem at hand here.
258	21:45:10 * robla tries to remember what ori and Steven Walling were pushing for a few years ago on the watchlist front
259	21:45:14 <DanielK_WMDE__> SMalyshev: the point was that client libraries and backend implementations exist, we wouldn't have to write them (if the current ones are good - which i don't know)
260	21:45:39 <SMalyshev> DanielK_WMDE__: I don't think backend which we need (i.e. with mediawiki auth) exists?
261	21:45:42 <DanielK_WMDE__> (sorry for jumping back to this, don't let me distract you)
262	21:45:53 <brion> dbrant: does the enhanced user pref model sound like a good short term for you as a sync mechanism? I think we all still agree fancier watchlist integration would be great for future
263	21:45:55 <DanielK_WMDE__> SMalyshev: backends with oath exist
264	21:46:25 <tgr> with a page lists API it really helps if you have a good idea what features you'll need exactly
265	21:46:37 <dbrant> Marybelle: there are over 100 users with at least 1000 pages in lists. Over 4000 users with at least 100 pages, etc...
266	21:46:42 <tgr> that's one place where Gather dug itself into the ground
267	21:46:55 <tgr> a k-v store is ideal for prototyping
268	21:47:21 <SMalyshev> DanielK_WMDE__: yeah but oauth against what? we need something on mediawiki side to do actual r/w... even if oauth plugs seamlessly there. Maybe I just don't understand yet what that API does :)
269	21:47:24 <tgr> for reading lists, and for a number of (non-list-related) future features I'd imagine
270	21:47:39 <dbrant> brion: i think that can definitely work, as long as it can handle a large number of keys.
271	21:48:07 <brion> Large number of keys should work with that model yes. You'd need a bulk lookup though too?
272	21:48:12 <tgr> DanielK_WMDE__: note that this seems to be using OAuth 2 which MediaWiki does not support
273	21:48:18 <anomie> dbrant, brion: "as long as it can handle a large number of keys" is a good question. Again, https://phabricator.wikimedia.org/T128602#2476545
274	21:48:22 <DanielK_WMDE__> tgr: hm, right...
275	21:48:24 <brion> Heh
276	21:48:37 <tgr> probably not a huge undertaking to fix but way larger than the one proposed in this RfC
277	21:48:44 <Marybelle> robla: https://www.mediawiki.org/wiki/Requests_for_comment/Support_for_user-specific_page_lists_in_core
278	21:49:18 <robla> ah, that's the one, thanks Marybelle
279	21:49:25 <dbrant> brion: right, lookup too.
280	21:50:16 <dbrant> but then, if we're talking about a short term solution, we can limit things on the client end, too.
281	21:50:32 <Scott_WUaS> (anomie: just noticed that you're mentioned in this BBC article "Meet the 'bots' that edit Wikipedia" - http://www.bbc.com/news/magazine-18892510 :)
282	21:51:26 <tgr> one thing that hasn't been discussed is how much effort it would take to prevent abuse / how afraid we are it would happen
283	21:51:38 <brion> dbrant: ok let's maybe model a couple variants. Large blob, vs row per title? Then confirm whether they make sense on a tweaked user props table, and decide whether to look more at the alternatives?
284	21:51:53 <tgr> pirates using it for movie distribution or whatnot
285	21:52:22 <gwicke> a quota can take care of that
286	21:52:36 <dbrant> brion: +1 niedzielski: what do you think of that? &
287	21:52:45 <brion> tgr: good question. There's little we can do to prevent use of user props as a file sharing or DoS space usage against us, other than "it's inconvenient and there are probably easier ways to abuse the system"
288	21:53:01 <niedzielski> brion dbrant: one of the problems we consider with row vs blob were race conditions between clients. user options didn't seem well designed to handle that
289	21:53:07 <robla> tgr: the security considerations section of https://datatracker.ietf.org/doc/draft-dejong-remotestorage/?include_text=1 looks like a good start on a list
290	21:53:13 <brion> Changing API might make it a bit easier to abuse but not much
291	21:53:49 <niedzielski> brion dbrant: for example in the blob scenario, if two clients try to update the same list, the last client wins. there's also bandwidth concerns for the 7000 title person
292	21:53:53 <brion> niedzielski: yeah, you'd have to detect conflicts through some other means like a signal value
293	21:54:10 <brion> Goes smoother with smaller bits, but that complicates the filtering
294	21:54:15 <tgr> brion: we could do all kinds of usage tracking, user agent filtering etc
295	21:54:19 <gwicke> conflict resolution is somewhat orthogonal from storage strategy
296	21:54:30 <tgr> but building it is little effort and those things probably aren't
297	21:54:33 <brion> Ok we're low on time :)
298	21:54:48 <brion> robla: shall we plan next steps?
299	21:55:06 * robla ponders what that would be
300	21:55:25 <gwicke> overall, I'm honestly sceptical about the value of using a generic key-blob storage service for use cases like reading lists
301	21:55:39 <brion> I think we want to bump pruoity on the more watchlist specific rfc!
302	21:55:44 <gwicke> if the use case is so well defined, then I think it deserves a real API
303	21:55:47 <brion> Priority
304	21:56:00 <anomie> gwicke: "if"
305	21:56:02 <brion> gwicke: agreed, medium to long term
306	21:56:18 <robla> #action ArchCom needs to bump the priority on a watchlist specific RFC
307	21:56:30 <Scott_WUaS> "if"
308	21:56:35 <niedzielski> brion dbrant: IIRC, we also had concerns with a list of page IDs vs a page ID with a list of lists. i don't think we came up with a great way to handle that and had to use the list title as the ID
309	21:56:58 <brion> Short term: I'll follow up with dbrant and niedzielski on using user prefs modified and see if that still makes sense
310	21:57:29 <brion> And anyone else want to do more research on general user data storage with that protocol?
311	21:57:45 <brion> It sounds potentially very useful for unofficial third-party tools and such
312	21:57:52 <dbrant> brion: sounds good
313	21:57:55 <anomie> brion: I'd suggest to run it by jcrespo too
314	21:58:02 <brion> Ah yes good
315	21:58:21 <robla> brion: I should probably take some of those action items from you, but yes, this all looks good
316	21:58:28 <brion> Hehe ok
317	21:58:30 <robla> (thank you for spelling this out!)
318	21:58:45 <brion> :)
319	21:59:09 <robla> anomie: dbrant - any last comments questions before we close this out?
320	21:59:18 <brion> Good discussion folks!
321	21:59:28 <Scott_WUaS> Yes!
322	21:59:29 <dbrant> robla: nope, really glad to see this moving forward
323	21:59:29 * anomie has no more comments at the moment
324	21:59:40 <robla> great discussion indeed....thanks everyone!
325	21:59:43 <niedzielski> \o
326	21:59:45 <brion> Woohoo
327	21:59:48 <robla> o/
328	21:59:57 <brion> Ok I gotta run, catch y'all later
329	22:00:02 <robla> #endmeeting
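
The blob scenario raised in the log (two clients update the same list and the last one wins) and the "signal value" and quota ideas mentioned in response can be sketched roughly as follows. This is only an illustration under stated assumptions: the `UserStore` class, its method names, the version counter used as the signal value, and the 64-byte quota are all hypothetical, and nothing here reflects MediaWiki's user properties table or any existing API.

```python
class ConflictError(Exception):
    """The stored value changed since the client last read it."""


class QuotaError(Exception):
    """The write would push the user over their storage quota."""


class UserStore:
    """In-memory stand-in for a per-user key-value store (hypothetical API)."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self._data = {}  # (user, key) -> (version, value)

    def get(self, user, key):
        """Return (version, value); a missing key reads as (0, None)."""
        return self._data.get((user, key), (0, None))

    def _usage(self, user, skip_key):
        """Bytes stored for this user, not counting the key being replaced."""
        return sum(
            len(value.encode("utf-8"))
            for (u, k), (_, value) in self._data.items()
            if u == user and k != skip_key
        )

    def put(self, user, key, value, expected_version):
        """Write only if the caller saw the latest version and quota allows."""
        current_version, _ = self.get(user, key)
        if current_version != expected_version:
            raise ConflictError(
                f"expected v{expected_version}, store has v{current_version}"
            )
        if self._usage(user, key) + len(value.encode("utf-8")) > self.quota_bytes:
            raise QuotaError("user quota exceeded")
        self._data[(user, key)] = (current_version + 1, value)
        return current_version + 1


store = UserStore(quota_bytes=64)
v1 = store.put("alice", "readinglist", '["Apple"]', expected_version=0)

# Two clients both read the same version, then both try to write.
version_seen, _ = store.get("alice", "readinglist")
store.put("alice", "readinglist", '["Apple","Zebra"]', expected_version=version_seen)
try:
    store.put("alice", "readinglist", '["Apple","Mango"]', expected_version=version_seen)
    second_write = "accepted"
except ConflictError:
    second_write = "rejected"
```

With a check like this the second client gets an explicit conflict instead of silently overwriting the first; what it does next (re-read and merge, or ask the user) is a separate decision, in line with the remark in the log that conflict resolution is somewhat orthogonal to storage strategy.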

Other meetings

Architecture meetings
13:00 PT ArchCom Planning Meetings	upcoming	all since 2016-03-30
14:00 PT ArchCom-RFC Meetings	upcoming	all since 2015-09-09

Event Series: This event is an instance of E66: ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office), and repeats every week.

Details

Invitees

TechCom
• RobLa-WMF
JanZerebecki
MarkAHershberger
daniel

Event Timeline

• RobLa-WMF mentioned this in T128602: RFC: Backend for synchronized data from Wikipedia mobile apps. Jul 21 2016, 10:04 PM

• RobLa-WMF updated the event description.

• RobLa-WMF renamed this event from ArchCom RFC Meeting: <topic TBD> (<see "Starts" field>, #wikimedia-office) to ArchCom RFC Meeting: Create and deploy an extension that implements an authenticated key-value store (2016-07-27, #wikimedia-office). Jul 21 2016, 10:10 PM

• RobLa-WMF renamed this event from ArchCom RFC Meeting: Create and deploy an extension that implements an authenticated key-value store (2016-07-27, #wikimedia-office) to ArchCom RFC Meeting W30: Create and deploy an extension that implements an authenticated key-value store (2016-07-27, #wikimedia-office). Jul 22 2016, 12:13 AM

• RobLa-WMF updated the event description. Jul 27 2016, 5:14 AM

15:00:03 <wm-labs-meetbot`> Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-07-27-21.00.html
15:00:03 <wm-labs-meetbot`> Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-07-27-21.00.txt
15:00:03 <wm-labs-meetbot`> Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-07-27-21.00.wiki
15:00:03 <wm-labs-meetbot`> Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-07-27-21.00.log.html

• RobLa-WMF mentioned this in P3589 ArchCom-RFC-2016W30-irc-E237.txt. Jul 27 2016, 10:04 PM

• RobLa-WMF updated the event description. Jul 27 2016, 10:13 PM

• RobLa-WMF renamed this event from ArchCom RFC Meeting W30: Create and deploy an extension that implements an authenticated key-value store (2016-07-27, #wikimedia-office) to ArchCom RFC Meeting W30: authenticated key-value store (2016-07-27, #wikimedia-office). Jul 27 2016, 10:40 PM

For the specific use case of reading lists (this is not for the general case; everything you mentioned regarding a generic store solution still stands):

watchlist (
  user int,
  namespace int,
  title varchar(255),
  notificationtimestamp varchar(14)
)

|
|
(digievolves into)
|
|
v

watchlists (
  list_id int PK,
  user_id int, -- can be global user id
  list_type enum ('default watchlist for backwards compatibility', 'mobile bookmark stuff that may be (?) also viewable from desktop', 'any new list type we can think of in the future (e.g. articles you like)', 'user defined list'),
  list_wiki enum ('global', 'enwiki', ...),
  list_name varchar(255)
)

watchlist_items ( -- or watchlist_titles
  list_id int,
  wiki enum('enwiki', ...), -- only for cross-wiki lists, if needed
  namespace int,
  title varchar(255),
  notificationtimestamp varchar(14)
)

======================
Get all titles for a mobile list (sort of):
======================

SELECT wiki, namespace, title
FROM watchlist_items wli
JOIN watchlists wl ON wl.list_id = wli.list_id
WHERE wl.user_id = $user AND wl.list_type = 'mobile bookmark stuff that may be (?) also viewable from desktop'
ORDER BY namespace, title;

(Ignore the types such as enum; we do not really want to use that type, but it is my way of drafting and making myself understood easily.) All current code for watchlists (API, etc.) can apply to watchlist_items; only the multiplexing and the new functions for watchlists are needed. We already handle lists of 100,000 items for some bots.

If this is global, watchlist_items can be T126641, and we fix 2 RFCs at once (yes, I know it doesn't cover *ALL* of this nor all of that). The more people working on similar features, the better, right?

These global lists can be on x1 now, and future needs (integration with local watchlists, potentially allowing several user watchlists, etc.) can be handled later.
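
To make the draft tables and the "all titles for a mobile list" query concrete, here is a minimal runnable sketch. The assumptions are mine, not the comment's: SQLite (via Python's standard `sqlite3` module) stands in for the production database, the enum columns become plain TEXT, and the short `list_type` values, the helper name `titles_for_list_type`, and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the real database and the draft's enum
# columns are plain TEXT. The short list_type values are hypothetical.
SCHEMA = """
CREATE TABLE watchlists (
    list_id   INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL,  -- can be a global user id
    list_type TEXT NOT NULL,     -- e.g. 'default', 'mobile_bookmarks'
    list_wiki TEXT NOT NULL,     -- 'global', 'enwiki', ...
    list_name TEXT NOT NULL
);
CREATE TABLE watchlist_items (
    list_id   INTEGER NOT NULL REFERENCES watchlists(list_id),
    wiki      TEXT,              -- only for cross-wiki lists, if needed
    namespace INTEGER NOT NULL,
    title     TEXT NOT NULL,
    notificationtimestamp TEXT
);
"""


def titles_for_list_type(conn, user_id, list_type):
    """Return (wiki, namespace, title) rows for one user's lists of a type."""
    return conn.execute(
        """
        SELECT wli.wiki, wli.namespace, wli.title
        FROM watchlist_items wli
        JOIN watchlists wl ON wl.list_id = wli.list_id
        WHERE wl.user_id = ? AND wl.list_type = ?
        ORDER BY wli.namespace, wli.title
        """,
        (user_id, list_type),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO watchlists VALUES (1, 42, 'mobile_bookmarks', 'global', 'To read')"
)
conn.execute(
    "INSERT INTO watchlists VALUES (2, 42, 'default', 'enwiki', 'Watchlist')"
)
conn.executemany(
    "INSERT INTO watchlist_items VALUES (?, ?, ?, ?, NULL)",
    [(1, "enwiki", 0, "Zebra"), (1, "enwiki", 0, "Apple"), (2, "enwiki", 1, "Talk_page")],
)

rows = titles_for_list_type(conn, 42, "mobile_bookmarks")
print(rows)  # [('enwiki', 0, 'Apple'), ('enwiki', 0, 'Zebra')]
```

The existing default watchlist becomes just one more row in `watchlists`, so the same query serves both it and any new list type by changing the `list_type` argument.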

Scott_WUaS subscribed. Jul 28 2016, 2:32 PM

• Mholloway subscribed. Jul 28 2016, 3:54 PM

• RobLa-WMF mentioned this in T146749: Consider using Phabricator Calendar events to schedule Wikimedia Developer Summit sessions. Oct 6 2016, 5:57 AM

daniel renamed this event from ArchCom RFC Meeting W30: authenticated key-value store (2016-07-27, #wikimedia-office) to ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office). Nov 21 2016, 6:11 PM

daniel changed the host of this event from RobLa-WMF to daniel.

daniel updated the event description.

daniel updated the event description. Dec 9 2016, 7:42 AM

daniel renamed this event from ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office) to ArchCom RFC Meeting W30: authenticated key-value store (2016-07-27, #wikimedia-office).