[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Run3 PromptReco] 390 MB in cut/expression parser #46493

Open
makortel opened this issue Oct 23, 2024 · 17 comments
Open

[Run3 PromptReco] 390 MB in cut/expression parser #46493

makortel opened this issue Oct 23, 2024 · 17 comments

Comments

@makortel
Copy link
Contributor

From #46040 (comment) the cut/expression parser uses 390 MB (constant cost, i.e. gets amortized across threads). The top 10 modules with the largest contributions are

  • 103 MB via ObjectSelectorBase<SingleElementCollectionSelector<edm::View<reco::Muon>, ...> (link)
  • 93 MB via ObjectSelectorBase<SingleElementCollectionSelector<std::vector<pat::GenericParticle, ...> (link)
  • 90 MB via BXVectorSimpleFlatTableProducer<l1t::EGamma> (link)
  • 26 MB via SimpleFlatTableProducerBase<reco::BeamSpot, reco::BeamSpot> (link)
  • 17 MB via TopSingleLeptonDQM (link
  • 12 MB via TopMonitor (link)
  • 11 MB via VersionedIdProducer<edm::Ptr<reco::Photon> (link)
  • 8 MB via RecoTauPiZeroProducer (link)
  • 5 MB via SimpleFlatTableProducer<pat::Jet> (link)
  • 5 MB via SimpleFlatTableProducer<pat::Electron> (link)

Addressing already the top 3 would save about 286 MB.

@cmsbuild
Copy link
Contributor
cmsbuild commented Oct 23, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

A new Issue was created by @makortel.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

assign reconstruction, xpog, dqm

@cmsbuild
Copy link
Contributor

New categories assigned: reconstruction,xpog,dqm

@antoniovagnerini,@ftorrresd,@hqucms,@jfernan2,@mandrenguyen,@nothingface0,@rvenditti,@syuvivida,@tjavaid you have been requested to review this Pull request/Issue and eventually sign? Thanks

@slava77
Copy link
Contributor
slava77 commented Oct 23, 2024

are you rediscovering that the first piece of code calling to the boost::spirit brings most of the cost?

@makortel
Copy link
Contributor Author
  • 103 MB via ObjectSelectorBase<SingleElementCollectionSelector<edm::View<reco::Muon>, ...> (link)

I believe this corresponds to MuonSelector

typedef SingleObjectSelector<edm::View<reco::Muon>, StringCutObjectSelector<reco::Muon>, reco::MuonCollection>
MuonSelector;
DEFINE_FWK_MODULE(MuonSelector);

The configuration defines 25 of them, of which 14 are not constructed in the C++ (i.e. these modules are not part of any Task/Sequence/Path/EndPath that is associated with the Schedule)

ALCARECOSiPixelCalSingleMuonTightGoodMuons
ALCARECOSiPixelCalSingleMuonTightRelCombIsoMuons
ALCARECOTkAlDiMuonGoodMuons
ALCARECOTkAlDiMuonRelCombIsoMuons
ALCARECOTkAlJpsiMuMuGoodMuons (not constructed)
ALCARECOTkAlMuonIsolatedGoodMuons
ALCARECOTkAlMuonIsolatedRelCombIsoMuons
ALCARECOTkAlUpsilonMuMuGoodMuons (not constructed)
ALCARECOTkAlUpsilonMuMuPAGoodMuons (not constructed)
ALCARECOTkAlUpsilonMuMuRelCombIsoMuons (not constructed)
ALCARECOTkAlZMuMuGoodMuons
ALCARECOTkAlZMuMuPAGoodMuons (not constructed)
ALCARECOTkAlZMuMuRelCombIsoMuons
ZmmgLeadingMuons (not constructed)
ZmmgTrailingMuons (not constructed)
gemDQMStaMuons
gemDQMTightGlbMuons
muonSelection (not constructed)
muonSelectorForEMu (not constructed)
muonSelectorForZMM (not constructed)
muonSelectorForZMMPA (not constructed)
muonsPt10
selectedMuons (not constructed)
selectedMuonsIso (not constructed)
tightMuonsForPbPbZMuSkim (not constructed)

Looking next the actual cut strings in the 11 modules that are constructed

ALCARECOSiPixelCalSingleMuonTightGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOSiPixelCalSingleMuonTightRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
ALCARECOTkAlDiMuonGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOTkAlDiMuonRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
ALCARECOTkAlMuonIsolatedGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOTkAlMuonIsolatedRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
ALCARECOTkAlZMuMuGoodMuons isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &globalTrack.normalizedChi2 < 20.
ALCARECOTkAlZMuMuRelCombIsoMuons (isolationR03().sumPt + isolationR03().emEt + isolationR03().hadEt)/pt < 0.15
gemDQMStaMuons isStandAloneMuon&& outerTrack.isNonnull
gemDQMTightGlbMuons isGlobalMuon&& globalTrack.isNonnull&& passed(\'CutBasedIdTight\')
muonsPt10 isGlobalMuon &isTrackerMuon &numberOfMatches > 1 &globalTrack.hitPattern.numberOfValidMuonHits > 0 &abs(eta) < 2.5 &pt > 10

There is a lot of repetition: one group of 4 modules and another group of 4 modules have exactly the same cut string.

Would it be feasible to replace these with modules that express the structure of the selection in C++ code instead?

@makortel
Copy link
Contributor Author

are you rediscovering that the first piece of code calling to the boost::spirit brings most of the cost?

Possibly. I had a vague recollection of such "constant cost" of this infrastructure, but it's source wasn't evident from the profile (maybe I just didn't dig deeply enough into Cling's call stack).

I'd guess the StringObjectFunction<l1t::EGamma, false> in BXVectorSimpleFlatTableProducer<l1t::EGamma> would be a good candidate for that "constant cost" as it takes 90 MB, while StringCutObjectSelector<l1t::EGamma, false> in the same module takes 448 B.

There is nevertheless potential for ~200 MB reduction by addressing the "next largest" consumers.

@makortel
Copy link
Contributor Author

assign alca

@cmsbuild
Copy link
Contributor

New categories assigned: alca

@atpathak,@consuegs,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Copy link
Contributor Author

@cms-sw/alca-l2 By the way, of the 8 ALCARECO* MuonSelectors listed above,

  • ALCARECOSiPixelCalSingleMuonTightGoodMuons, ALCARECOTkAlDiMuonGoodMuons, ALCARECOTkAlMuonIsolatedGoodMuons, ALCARECOTkAlZMuMuGoodMuons have identical configuration
  • ALCARECOSiPixelCalSingleMuonTightRelCombIsoMuons, ALCARECOTkAlDiMuonRelCombIsoMuons, ALCARECOTkAlMuonIsolatedRelCombIsoMuons, ALCARECOTkAlZMuMuRelCombIsoMuons have otherwise the same configuration, except their src parameter refer to the aforementioned 4 modules (i.e. if they would replaced with a common module, these 4 could be replaced with a common module as well)

This duplication has both memory and runtime cost.

@makortel
Copy link
Contributor Author
makortel commented Oct 23, 2024
  • 93 MB via ObjectSelectorBase<SingleElementCollectionSelector<std::vector<pat::GenericParticle, ...> (link)

I believe this one corresponds to PATGenericParticleSelector

typedef SingleObjectSelector<std::vector<GenericParticle>, StringCutObjectSelector<GenericParticle>>
PATGenericParticleSelector;

The configuration contains 2 such modules, of 1 is not constructed because it is not in any Task/Sequence/Path/EndPath associated to the Schedule

looseIsoMuonsForPbPbZMuSkim (not constructed)
looseIsoMuonsForZMuSkim

The cut string is same in both (userIsolation(\'pat::TrackIso\')/pt)<0.4 that looks straightforward to be potentially replaced with a C++-coded selection structure.

Although if this is impacted with the "first call to boost::spirit has a constant cost", it might not be that important (but looks simple nevertheless).

@makortel
Copy link
Contributor Author

assign DPGAnalysis/Skims

@makortel
Copy link
Contributor Author

type performance-improvements

@cmsbuild
Copy link
Contributor

New categories assigned: pdmv

@AdrianoDee,@kskovpen,@miquork,@sunilUIET you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Copy link
Contributor Author
  • 90 MB via BXVectorSimpleFlatTableProducer<l1t::EGamma> (link)

This is SimpleTriggerL1EGFlatTableProducer

typedef BXVectorSimpleFlatTableProducer<l1t::EGamma> SimpleTriggerL1EGFlatTableProducer;

that uses

    cut = cms.string('pt>=10'),
    ...
            expr = cms.string('eta'),
            ...
            expr = cms.string('hwIso()'),
            ...
            expr = cms.string('phi'),
            ...
            expr = cms.string('pt'),

that look like much simpler mapping from strings to member functions than full expression parser could be feasible.

@makortel
Copy link
Contributor Author
  • 26 MB via SimpleFlatTableProducerBase<reco::BeamSpot, reco::BeamSpot> (link)

I believe this is SimpleBeamspotFlatTableProducer

typedef EventSingletonSimpleFlatTableProducer<reco::BeamSpot> SimpleBeamspotFlatTableProducer;

The configuration has 2 modules of this type, of which only 1 is actually used in the job

simpleBeamspotFlatTableProducer (not constructed)
beamSpotTable

The beamSpotTable accesses

            expr = cms.string('sigmaZ()'),
            ...
            expr = cms.string('sigmaZ0Error()'),
            ...
            expr = cms.string('type()'),
            ...
            expr = cms.string('position().z()'),
            ...
            expr = cms.string('z0Error()'),

that also looks like a much simpler mapping from strings to member functions than the full expression parser could be feasible.

@mmusich
Copy link
Contributor
mmusich commented Oct 31, 2024

concerning #46493 (comment)

Would it be feasible to replace these with modules that express the structure of the selection in C++ code instead?

[...]

This duplication has both memory and runtime cost.

I had a go at it at #46574, comments are welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants