

[Data Platform] Install a Prometheus connector for Presto, pointed at thanos-query
Open, Medium, Public

Description

Docs: Prometheus connector for Presto

If we did this, we could make Prometheus time series available via the Presto CLI and wmfdata-python in Jupyter, and we could also make this data source available in Superset with very little extra effort.

We could then make dashboards correlating any of the Data Lake sources and Prometheus metrics, which I think would be quite powerful.

Another potentially very powerful use case is finding the time intervals when some condition held in the Prometheus metrics, and then using those intervals to run detailed queries against webrequest.
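A minimal Python sketch of that workflow follows. It assumes the Prometheus samples have already been fetched as `(UTC timestamp, value)` pairs at a fixed step, and it assumes the target table is partitioned by hourly `year`/`month`/`day`/`hour` columns (our assumption about webrequest's layout, not something stated in this task). The function names and the threshold used in the test are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional, Tuple

Sample = Tuple[datetime, float]
Interval = Tuple[datetime, datetime]


def find_intervals(samples: Iterable[Sample],
                   condition: Callable[[float], bool],
                   step: timedelta) -> List[Interval]:
    """Merge consecutive matching samples into half-open [start, end) intervals.

    Each sample is assumed to cover `step` of time starting at its timestamp.
    """
    intervals: List[Interval] = []
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    for ts, value in sorted(samples):
        if condition(value):
            if start is None:
                start = ts
            elif ts > end:  # missing samples: close the interval, start a new one
                intervals.append((start, end))
                start = ts
            end = ts + step
        elif start is not None:
            intervals.append((start, end))
            start = None
    if start is not None:
        intervals.append((start, end))
    return intervals


def hour_partitions(interval: Interval) -> List[datetime]:
    """Return the start of every UTC hour that overlaps the interval."""
    start, end = interval
    hour = start.replace(minute=0, second=0, microsecond=0)
    hours = []
    while hour < end:
        hours.append(hour)
        hour += timedelta(hours=1)
    return hours


def webrequest_predicate(intervals: List[Interval]) -> str:
    """Build a WHERE clause over hypothetical year/month/day/hour partition columns."""
    hours = sorted({h for iv in intervals for h in hour_partitions(iv)})
    clauses = [
        f"(year = {h.year} AND month = {h.month} AND day = {h.day} AND hour = {h.hour})"
        for h in hours
    ]
    return " OR ".join(clauses) if clauses else "FALSE"
```

The predicate only prunes partitions; a real query would probably also filter on the request timestamp to trim the partial hours at each end of an interval.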

Event Timeline

BTullis edited subscribers, added: odimitrijevic, nshahquinn-wmf, JAllemandou and 5 others; removed: Aklapper.

Thanks @CDanis. For reference, there was some initial discussion about this functionality here on Slack.

I agree that this cross-correlation of data sources could be very useful to a number of different stakeholders, not least the SRE team.

The presto catalogs are defined in hieradata, both for the coordinator and worker roles.

So for the test cluster it's here for the coordinator and here for the workers. The prod cluster settings are similarly located.
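Whatever the Puppet module renders from hieradata ends up as a catalog properties file that Presto reads from its `etc/catalog/` directory, where the file name becomes the catalog name. The sketch below only shows what a Prometheus catalog file might contain. The property names are as we read them in the Presto Prometheus connector documentation and should be checked against the deployed Presto version; the thanos-query URL and the duration values are hypothetical placeholders, not the real WMF endpoint or tuned settings.

```python
from typing import Dict


def render_catalog_properties(props: Dict[str, str]) -> str:
    """Render a Presto catalog .properties file body, one key=value per line."""
    return "".join(f"{key}={value}\n" for key, value in sorted(props.items()))


# Hypothetical values: placeholder URL and untuned durations.
PROMETHEUS_CATALOG = {
    "connector.name": "prometheus",
    "prometheus.uri": "https://thanos-query.example.org",
    "prometheus.query.chunk.size.duration": "1d",
    "prometheus.max.query.range.duration": "21d",
    "prometheus.cache.ttl": "30s",
}

if __name__ == "__main__":
    # e.g. the content of a hypothetical etc/catalog/prometheus.properties
    print(render_catalog_properties(PROMETHEUS_CATALOG), end="")
```

Keeping the values in one mapping mirrors how the hieradata already holds per-role catalog settings, so the same entry could presumably be added for both the coordinator and the workers.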

I can't see any governance or technical reason why we wouldn't want to add this Prometheus catalog to Presto, but I think that we should still seek approval before carrying out any work.

Perhaps @odimitrijevic would be the person to approve any integration. I've added a few other subscribers to invite their opinions and hopefully make sure that we're not barking up the wrong tree.

Also, I believe that @nshahquinn-wmf knows the most about wmfdata-python, and so may be best placed to say whether Prometheus support would be easy or difficult to add to this library.

This looks super interesting, moving to radar for when we need to help out.

Adding this to our radar as well, to keep an eye on it when we start querying.

Gehel triaged this task as Medium priority. Oct 11 2023, 8:40 AM
Gehel moved this task from Incoming to Misc on the Data-Platform-SRE board.


From the Presto client point of view, is it correct that Prometheus would be accessed exactly the same as the Data Lake as now, just with a different catalog argument? If so, Wmfdata wouldn't need any changes to support that as presto.run already has a catalog argument (it currently defaults to "analytics-hive").
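If that is right, querying Prometheus from Python would mostly mean building a different SQL string. As we understand the Presto connector docs, each metric appears as a table in the `default` schema with `labels`, `timestamp` and `value` columns. The helper below is a hypothetical sketch under that assumption; the catalog name `prometheus` is also an assumption, since the real name would come from the catalog file.

```python
import re
from datetime import datetime

# Prometheus metric names: [a-zA-Z_:][a-zA-Z0-9_:]*
_METRIC_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def prometheus_metric_query(metric: str, start: datetime, end: datetime,
                            catalog: str = "prometheus",
                            schema: str = "default") -> str:
    """Build a Presto query for one metric over [start, end).

    Identifiers are double-quoted because `default` and `timestamp` are SQL
    keywords and recording-rule names may contain colons.
    """
    if not _METRIC_NAME.match(metric):
        raise ValueError(f"not a valid Prometheus metric name: {metric!r}")
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    return (
        'SELECT labels, "timestamp", value '
        f'FROM "{catalog}"."{schema}"."{metric}" '
        f"WHERE \"timestamp\" >= from_iso8601_timestamp('{start.isoformat()}') "
        f"AND \"timestamp\" < from_iso8601_timestamp('{end.isoformat()}')"
    )
```

The resulting string would then be passed to `presto.run`. Because the table name is fully qualified, the `catalog` argument described above may not even need to change, although we haven't verified wmfdata's exact behaviour here.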

Ahoelzl renamed this task from Install a Prometheus connector for Presto, pointed at thanos-query to [Platform] Install a Prometheus connector for Presto, pointed at thanos-query. Oct 20 2023, 4:56 PM
Ahoelzl renamed this task from [Platform] Install a Prometheus connector for Presto, pointed at thanos-query to [Data Platform] Install a Prometheus connector for Presto, pointed at thanos-query. Oct 20 2023, 5:15 PM

@Ahoelzl why was this moved to "Radar (External Teams)" column? Per @BTullis's post, I think this was awaiting DE approval before DE would work on it...?

@CDanis I believe people currently want this kind of work to be planned more strategically and prioritized appropriately. I agree this would be very useful!

I'm compiling user stories based on the recent interviews about data usage at WMF, and I see you did call out this need! So it will get a user story, which @VirginiaPoundstone and other PMs will be looking at along with many others.

However, I agree Radar isn't quite the right place for this. I'll move it back to backlog and we can re-groom it.