Slow typechecking on nested TypedDict with union members #17231

Open
julienp opened this issue May 10, 2024 · 7 comments
julienp commented May 10, 2024

Bug Report

For Pulumi we are looking into generating types using TypedDict to model cloud APIs. For example for Kubernetes we have something representing a Deployment.

class DeploymentArgsDict(TypedDict):
  api_version: NotRequired[Input[str]]
  kind: NotRequired[Input[str]]
  metadata: NotRequired[Input['ObjectMetaArgsDict']]
  ...

Pulumi has a notion of inputs and outputs, and the Input type used in the above example looks like this:

from typing import Awaitable, Generic, TypeVar, Union

T = TypeVar("T")

class Output(Generic[T]):
    pass

Input = Union[T, Awaitable[T], Output[T]]

Output does a lot of things, but for the purposes of this repro all that matters is that it's a generic type.
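To make the shape of the alias concrete, here is a minimal sketch that subscripts `Input` and inspects the result with `typing.get_args`. The `Output` class is the same empty stand-in used in this repro, not Pulumi's real implementation.

```python
from typing import Awaitable, Generic, TypeVar, Union, get_args

T = TypeVar("T")


class Output(Generic[T]):
    """Empty stand-in for Pulumi's Output; only its generic-ness matters here."""


# Generic alias: subscripting it substitutes T in every union member.
Input = Union[T, Awaitable[T], Output[T]]

# Each field annotated with Input[str] carries three alternatives.
members = get_args(Input[str])
print(members)
```

Every `Input[...]` field in a TypedDict therefore contributes three union members, and a nested field such as `Input['ObjectMetaArgsDict']` puts a whole TypedDict inside each of them.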

The K8S types can nest pretty deeply, and I suspect a combination of having nested literals along with the Union via the Input type is causing slowness here.

Example:

d: DeploymentArgsDict = {
    "metadata": {
        "name": "nginx",
    },
    "spec": {
        "selector":{
            "match_labels": {}
        },
        "replicas": 1,
        "template": {
            "metadata": {
                "labels": {}
            },
            "spec": {
                "containers": [{
                    "name": "nginx",
                    "image": "nginx"
                }]
            }
        }
    }
}

If I drop Awaitable[T] from the union to reduce it to two members, typechecking completes in 2 seconds. With it present, it takes 40 seconds.

This is a simplified example, and the actual code has another union layered on top. In that case we run out of memory.

To Reproduce

I have created a repro here https://github.com/julienp/typeddict-performance
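For experimenting with nesting depth separately from the linked repository, a synthetic module can be generated. This is only a sketch: `make_source`, `make_literal`, the `LevelN` class names and the `fN` field names are all hypothetical, `total=False` is used in place of `NotRequired`, and I have not verified that the generated code slows mypy down to the same degree as the real Kubernetes types.

```python
def make_literal(depth: int) -> str:
    """Nested dict literal that matches Level0 .. Level{depth}."""
    if depth == 0:
        return '{"name": "x"}'
    return '{"f0": ' + make_literal(depth - 1) + "}"


def make_source(depth: int, fields_per_level: int = 3) -> str:
    """Source of a module with TypedDicts Level0 .. Level{depth}, each nesting the next."""
    lines = [
        "from typing import Awaitable, Generic, TypedDict, TypeVar, Union",
        "",
        "T = TypeVar('T')",
        "",
        "class Output(Generic[T]):",
        "    pass",
        "",
        "Input = Union[T, Awaitable[T], Output[T]]",
        "",
    ]
    for level in range(depth + 1):
        lines.append(f"class Level{level}(TypedDict, total=False):")
        if level == depth:
            # Innermost TypedDict: a single plain field.
            lines.append("    name: Input[str]")
        else:
            # Every field refers to the next level through the Input union.
            for j in range(fields_per_level):
                lines.append(f"    f{j}: Input['Level{level + 1}']")
        lines.append("")
    # A nested literal assigned to the outermost type, as in the example above.
    lines.append(f"d: Level0 = {make_literal(depth)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_source(depth=3))
```

Writing the output to a file and timing `mypy` on it for increasing `depth`, with and without `Awaitable[T]` in the union, gives a rough way to compare the two configurations.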

Expected Behavior

It takes a second or two to typecheck.

Actual Behavior

It takes ~40 seconds on my machine.

Your Environment

  • Mypy version used: 1.10
  • Mypy command-line flags: none
  • Mypy configuration options from mypy.ini (and other config files): none
  • Python version used: 3.12.2

julienp added the "bug" (mypy got something wrong) label May 10, 2024
Author
julienp commented May 13, 2024

I ran some more tests with a larger set of types, and it looks like the issue might be memory related. I am seeing Python max out the memory on my system, causing heavy swapping, while the process sits at 100% CPU, probably GCing constantly.

@justinvp

Any idea what the root cause could be, how we could work around it, or even how we could help contribute a fix?

We'd like to improve Pulumi's Python SDKs by supporting TypedDict, but this performance issue means we'd have to work around it for Mypy users, likely by conditionally typing these as untyped dictionaries for Mypy, which is rather unfortunate.

MYPY = False  # mypy treats a variable named MYPY as true while type checking

if not MYPY:
    class DeploymentArgsDict(TypedDict):
        api_version: NotRequired[Input[str]]
        kind: NotRequired[Input[str]]
        metadata: NotRequired[Input['ObjectMetaArgsDict']]
        ...
else:
    DeploymentArgsDict: TypeAlias = Mapping[str, Any]

east825 commented Sep 25, 2024

Because of the number of Input unions used in type annotations there and the huge number of TypedDicts involved, some of the resulting fully expanded TypedDicts are humongous. For instance, DeploymentArgsDict depends on the declarations of all the following TypedDicts, and the resulting complete type contains 27M+ (!) internal types. It has no self-references or recursively defined TypedDicts, though, as far as I can tell. PyCharm inference also suffers from this. I'm wondering how Pyright approaches such TypedDict trees.

DeploymentArgsDict
 ObjectMetaArgsDict
  ManagedFieldsEntryArgsDict
  OwnerReferenceArgsDict
 DeploymentSpecArgsDict
  LabelSelectorArgsDict
   LabelSelectorRequirementArgsDict
  PodTemplateSpecArgsDict
   ObjectMetaArgsDict
    ManagedFieldsEntryArgsDict
    OwnerReferenceArgsDict
   PodSpecArgsDict
    ContainerArgsDict
     EnvVarArgsDict
      EnvVarSourceArgsDict
       ConfigMapKeySelectorArgsDict
       ObjectFieldSelectorArgsDict
       ResourceFieldSelectorArgsDict
       SecretKeySelectorArgsDict
     EnvFromSourceArgsDict
      ConfigMapEnvSourceArgsDict
      SecretEnvSourceArgsDict
     LifecycleArgsDict
      LifecycleHandlerArgsDict
       ExecActionArgsDict
       HTTPGetActionArgsDict
        HTTPHeaderArgsDict
       SleepActionArgsDict
       TCPSocketActionArgsDict
      LifecycleHandlerArgsDict
       ExecActionArgsDict
       HTTPGetActionArgsDict
        HTTPHeaderArgsDict
       SleepActionArgsDict
       TCPSocketActionArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     ContainerPortArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     ContainerResizePolicyArgsDict
     ResourceRequirementsArgsDict
      ResourceClaimArgsDict
     SecurityContextArgsDict
      AppArmorProfileArgsDict
      CapabilitiesArgsDict
      SELinuxOptionsArgsDict
      SeccompProfileArgsDict
      WindowsSecurityContextOptionsArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     VolumeDeviceArgsDict
     VolumeMountArgsDict
    AffinityArgsDict
     NodeAffinityArgsDict
      PreferredSchedulingTermArgsDict
       NodeSelectorTermArgsDict
        NodeSelectorRequirementArgsDict
        NodeSelectorRequirementArgsDict
      NodeSelectorArgsDict
       NodeSelectorTermArgsDict
        NodeSelectorRequirementArgsDict
        NodeSelectorRequirementArgsDict
     PodAffinityArgsDict
      WeightedPodAffinityTermArgsDict
       PodAffinityTermArgsDict
        LabelSelectorArgsDict
         LabelSelectorRequirementArgsDict
        LabelSelectorArgsDict
         LabelSelectorRequirementArgsDict
      PodAffinityTermArgsDict
       LabelSelectorArgsDict
        LabelSelectorRequirementArgsDict
       LabelSelectorArgsDict
        LabelSelectorRequirementArgsDict
     PodAntiAffinityArgsDict
      WeightedPodAffinityTermArgsDict
       PodAffinityTermArgsDict
        LabelSelectorArgsDict
         LabelSelectorRequirementArgsDict
        LabelSelectorArgsDict
         LabelSelectorRequirementArgsDict
      PodAffinityTermArgsDict
       LabelSelectorArgsDict
        LabelSelectorRequirementArgsDict
       LabelSelectorArgsDict
        LabelSelectorRequirementArgsDict
    PodDNSConfigArgsDict
     PodDNSConfigOptionArgsDict
    EphemeralContainerArgsDict
     EnvVarArgsDict
      EnvVarSourceArgsDict
       ConfigMapKeySelectorArgsDict
       ObjectFieldSelectorArgsDict
       ResourceFieldSelectorArgsDict
       SecretKeySelectorArgsDict
     EnvFromSourceArgsDict
      ConfigMapEnvSourceArgsDict
      SecretEnvSourceArgsDict
     LifecycleArgsDict
      LifecycleHandlerArgsDict
       ExecActionArgsDict
       HTTPGetActionArgsDict
        HTTPHeaderArgsDict
       SleepActionArgsDict
       TCPSocketActionArgsDict
      LifecycleHandlerArgsDict
       ExecActionArgsDict
       HTTPGetActionArgsDict
        HTTPHeaderArgsDict
       SleepActionArgsDict
       TCPSocketActionArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     ContainerPortArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     ContainerResizePolicyArgsDict
     ResourceRequirementsArgsDict
      ResourceClaimArgsDict
     SecurityContextArgsDict
      AppArmorProfileArgsDict
      CapabilitiesArgsDict
      SELinuxOptionsArgsDict
      SeccompProfileArgsDict
      WindowsSecurityContextOptionsArgsDict
     ProbeArgsDict
      ExecActionArgsDict
      GRPCActionArgsDict
      HTTPGetActionArgsDict
       HTTPHeaderArgsDict
      TCPSocketActionArgsDict
     VolumeDeviceArgsDict
     VolumeMountArgsDict
    HostAliasArgsDict
  DeploymentStrategyArgsDict
   RollingUpdateDeploymentArgsDict
 DeploymentStatusArgsDict
  DeploymentConditionArgsDict
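
The gap between the number of distinct TypedDicts and the size of the fully expanded tree can be illustrated on a small subset of the tree above. This is a rough sketch, not how mypy or PyCharm count internal types: `REFERENCES`, `expanded_size` and `distinct_size` are hypothetical names, and only the three ProbeArgsDict references of ContainerArgsDict are modelled.

```python
from functools import lru_cache

# Subset of the tree above. Each key lists the TypedDicts its fields refer to,
# with repeats kept: ContainerArgsDict refers to ProbeArgsDict three times.
REFERENCES = {
    "ContainerArgsDict": ["ProbeArgsDict", "ProbeArgsDict", "ProbeArgsDict"],
    "ProbeArgsDict": [
        "ExecActionArgsDict",
        "GRPCActionArgsDict",
        "HTTPGetActionArgsDict",
        "TCPSocketActionArgsDict",
    ],
    "HTTPGetActionArgsDict": ["HTTPHeaderArgsDict"],
    "ExecActionArgsDict": [],
    "GRPCActionArgsDict": [],
    "TCPSocketActionArgsDict": [],
    "HTTPHeaderArgsDict": [],
}


@lru_cache(maxsize=None)
def expanded_size(name: str) -> int:
    """Nodes in the tree if every reference gets its own copy of the target."""
    return 1 + sum(expanded_size(child) for child in REFERENCES[name])


def distinct_size(name: str) -> int:
    """Number of distinct TypedDicts reachable from `name`."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(REFERENCES[current])
    return len(seen)


print(expanded_size("ContainerArgsDict"))  # 19 nodes when expanded as a tree
print(distinct_size("ContainerArgsDict"))  # 7 distinct TypedDicts
```

Even in this tiny subset the expanded tree is almost three times larger than the set of distinct classes, and the ratio grows with every additional level of nesting and every repeated reference.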

@erictraut

> I'm wondering how Pyright approaches such TypedDict trees.

Does mypy internally expand all of these TypedDict definitions? If so, I'm curious why. Pyright internally builds one object for each class. There are only 438 of them in the code sample, which isn't that many. Each internal object refers to the other objects as needed. It doesn't do any expansion.

east825 commented Sep 26, 2024

To be clear, I didn't check how Mypy internally represents such types. In PyCharm, we represent TypedDicts as dict[str, UnionOfTypesOfAllFields] for some type checks, and constructing this union of all field types (recursively) while simultaneously expanding type aliases leads to this combinatorial explosion. But since memory problems were reported here as well, I guess the root cause might be somewhat similar.
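
As a back-of-the-envelope illustration of why expanding aliases recursively blows up, consider a toy model in which every TypedDict has the same number of nested fields and every field wraps the next level in a union alias. This is an assumption made for the sake of arithmetic, not a description of PyCharm's or mypy's actual algorithm; `expanded_count` and its parameters are hypothetical.

```python
def expanded_count(depth: int, fields: int, union_members: int) -> int:
    """Type nodes produced if each union member gets its own copy of the nested type."""
    if depth == 0:
        return 1  # innermost TypedDict, counted as a single node
    # One node for this level, plus one copy of the next level
    # per field and per union member.
    return 1 + fields * union_members * expanded_count(depth - 1, fields, union_members)


if __name__ == "__main__":
    for depth in range(1, 5):
        two = expanded_count(depth, fields=2, union_members=2)
        three = expanded_count(depth, fields=2, union_members=3)
        print(depth, two, three)
```

In this model the count grows by a factor of `fields * union_members` per level, so going from two to three union members changes the base of the exponent rather than adding a constant cost.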

Collaborator
JukkaL commented Sep 27, 2024

There seem to be some easy improvements we can make to speed up the handling of nested TypedDicts. I don't think there's any deep reason why they'd have to be this slow. I'll look into this -- if it's easy enough, the next mypy release (to be out in a week or two) could include some optimizations.

Collaborator
JukkaL commented Sep 27, 2024

#17842 fixes some bottlenecks.

JukkaL added a commit that referenced this issue Sep 27, 2024
If TypedDict A has multiple items that refer to TypedDict B, don't
duplicate the types representing B during type expansion (or generally
when translating types). If TypedDicts are deeply nested, this could
result in a lot of redundant type objects.

Example where this could matter (assume B is a big TypedDict):

```
class B(TypedDict):
    ...

class A(TypedDict):
    a: B
    b: B
    c: B
    ...
    z: B

```

Also deduplicate large unions. It's common to have aliases that are
defined as large unions, and again we want to avoid duplicating these
unions.

This may help with #17231, but this fix may not be sufficient.
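
The idea described in the commit message, reusing one translated copy per distinct input type, can be sketched with a memoized traversal. This is not the code from the referenced commit: `TypedDictType`, `translate` and the `cache` parameter are hypothetical names used only to show the technique.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class TypedDictType:
    """Toy type object: a name plus items that refer to other type objects."""
    name: str
    items: Dict[str, "TypedDictType"] = field(default_factory=dict)


def translate(
    t: TypedDictType, cache: Optional[Dict[int, TypedDictType]] = None
) -> TypedDictType:
    """Copy a type graph, producing exactly one copy per distinct input object."""
    if cache is None:
        cache = {}
    key = id(t)
    if key in cache:
        # Already translated: reuse the existing copy instead of duplicating it.
        return cache[key]
    copy = TypedDictType(t.name)
    cache[key] = copy
    for item_name, item_type in t.items.items():
        copy.items[item_name] = translate(item_type, cache)
    return copy


# A has several items that all refer to the same B, as in the example above.
B = TypedDictType("B")
A = TypedDictType("A", {"a": B, "b": B, "c": B})
A2 = translate(A)
print(A2.items["a"] is A2.items["b"])  # True: B was translated once and shared
```

Without the cache, each of A's items would receive its own copy of B, and the duplication would compound at every level of nesting.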

6 participants