import logging
import warnings
from copy import deepcopy
from typing import Any, Callable, Collection, Dict, List, Mapping, Optional, Union, overload

import torch
import torch.nn as nn
from torch import optim
from torch.distributed._shard.sharded_tensor import ShardedTensor
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

__all__: List[str] = []

logger = logging.getLogger(__name__)


class _NamedOptimizer(optim.Optimizer):
    """
    ``_NamedOptimizer`` takes a dict of parameters and exposes ``state_dict`` by
    parameter key. We replace the original key (a number) in an optim with the
    fully qualified name (FQN) string. Users can initialize the optim as they
    initialize a PyTorch optim; the only difference is that they also need to
    pass in the FQN of each parameter.

    Args:
        named_parameters (Mapping[str, Union[torch.Tensor, ShardedTensor]]):
            Mapping from FQN to parameter.
        optimizer_class (optim.Optimizer):
            The class of optimizer to instantiate.
        param_groups (Collection[Mapping[str, Any]]):
            ``param_groups`` to pass to the optimizer if specified.
            The keys of the inner map need to be FQNs.
            Default: None
        module (nn.Module): the module whose parameters are to be updated
            by the optimizer.
        args: arguments to pass to the optimizer constructor.
        kwargs: arguments to pass to the optimizer constructor.

    Example::
        >>> # xdoctest: +SKIP("distributed")
        >>> from torch import optim
        >>> from torch.distributed.optim import _NamedOptimizer
        >>>
        >>> # Define the named optimizer.
        >>> m = Model(...)
        >>> named_optim = _NamedOptimizer(m.named_parameters(), optim.SGD)
        >>> # Forward pass + backward pass.
        >>> named_optim.step()
        >>> ...
        >>> # Calling state_dict on the named optimizer returns an FQN-keyed state_dict.
        >>> named_optim.state_dict()

    Warning: This API is still in development and subject to change.

    TODO: Add tutorial for _NamedOptimizer.
    TODO: Add documentation in the docstring for the public attributes
          like self.param_groups and self.named_parameters.
    """

    def __init__(
        self,
        named_parameters: Mapping[str, Union[torch.Tensor, ShardedTensor]],
        optimizer_class: optim.Optimizer,
        param_groups: Optional[Collection[Mapping[str, Any]]] = None,
        module: Optional[nn.Module] = None,
        *args,
        **kwargs,
    ) -> None:
        torch._C._log_api_usage_once("torch.distributed.optim._NamedOptimizer")
        self.param_groups = param_groups
        self._param_groups_check()
        self.named_parameters = dict(named_parameters)
        # If no param_groups are passed, optimize all named parameters of the module.
        params_for_optimizer = (
            self.named_parameters.values() if param_groups is None else param_groups
        )
        self._optimizer = optimizer_class(params_for_optimizer, *args, **kwargs)
        self.module = module
        if param_groups is None:
            self.ordered_param_keys = list(self.named_parameters.keys())
        else:
            warnings.warn(
                "Since we pass in param_groups, we will use param_groups to "
                "initialize the optimizer, not all parameters of the module."
            )
            param_to_key = {param: key for key, param in self.named_parameters.items()}
            ordered_param_keys = []
            for group in param_groups:
                for param in group["params"]:
                    if param not in param_to_key:
                        raise ValueError(
                            f"Expect param name {param} found in param group but is missing."
                        )
                    ordered_param_keys.append(param_to_key[param])
            self.ordered_param_keys = ordered_param_keys
        # Mirror the param_groups actually built by the wrapped optimizer.
        self.param_groups = self._optimizer.param_groups

    def _param_groups_check(self):
        if self.param_groups is not None:
            for param_group in self.param_groups:
                assert isinstance(param_group, dict), "param group must be a dict"
                assert "params" in param_group, "param group must contain key params"
                params = param_group["params"]
                if isinstance(params, torch.Tensor):
                    params = [params]
                params = list(params)
                for param in params:
                    if not isinstance(param, torch.Tensor):
                        raise TypeError(
                            "optimizer can only optimize Tensors, "
                            "but one of the params is " + torch.typename(param)
                        )
                param_group["params"] = params

    def state_dict(self) -> Dict[str, Any]:
        """
        Return the ``state_dict`` of the optimizer. Instead of using numbers to index
        parameters, we use each parameter's fully qualified name (FQN) as the key.
        """
        state_dict = self._optimizer.state_dict()
        param_groups = state_dict["param_groups"]

        # Re-key the per-parameter state from the optimizer's integer index to the FQN.
        ret_state = {
            self.ordered_param_keys[st_key]: state_val
            for st_key, state_val in state_dict["state"].items()
        }

        ret_groups = []
        for group in param_groups:
            param_keys = []
            for param in group["params"]:
                param_keys.append(self.ordered_param_keys[param])
            ret_group = {"params": sorted(param_keys)}
            for k, v in group.items():
                if k != "params":
                    ret_group[k] = deepcopy(v)
            ret_groups.append(ret_group)

        return self._post_state_dict({"state": ret_state, "param_groups": ret_groups})

    @overload
    def step(self, closure: None = ...) -> None:
        ...

    @overload
    def step(self, closure: Callable[[], float]) -> float:
        ...

    def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
        """
        Performs a single optimization step.

        This will call :meth:`torch.optim.Optimizer.step` on the wrapped
        optimizer.
        """
        return self._optimizer.step(closure=closure)

    @property
    def state(self) -> Mapping[torch.Tensor, Any]:
        return self._optimizer.state

    def load_state_dict(self, state_dict: Mapping[str, Any]) -> None:
        """
        This public function defines the default behavior to load a state_dict
        for ``_NamedOptimizer``.

        Sample Code
        ```
            my_model = MyModule()
            optimizer = _NamedOptimizer(my_model.named_parameters(), Adagrad)
            ...

            optim_state_dict = optimizer.state_dict()
            ...
            ...

            optimizer.load_state_dict(optim_state_dict)
            ...
        ```
        Args:
            state_dict (Dict[str, Any]) : A ``state_dict`` to load into the optimizer.
                Note that this state dict update is performed in place.

        .. note:: PyTorch is using lazy init to initialize the optim states.
            So it is possible that there is no optim state when the user calls
            ``load_state_dict`` and for ``_NamedOptimizer`` we make it stricter
            that users can only call ``load_state_dict`` after the state is initialized.
            By doing this, we can validate the optim ``state_dict`` to be loaded.
        """
        new_state_dict = self._optimizer.state_dict()
        state_dict = self._pre_load_state_dict(state_dict)
        state = state_dict["state"]
        new_state = new_state_dict["state"]
        if len(new_state) == 0:
            raise ValueError(
                "Expects the optim to be initialized before load but found not initialized."
            )

        for idx, param_key in enumerate(self.ordered_param_keys):
            # Not every parameter is guaranteed to have optim state (e.g. conditional training).
            if param_key not in state.keys():
                continue
            if len(state[param_key]) != len(new_state[idx]):
                raise ValueError(
                    f"Expects equal length as {len(new_state[idx])} for parameter "
                    f"{param_key} but found: {len(state[param_key])}"
                )
            # Iterate through all optimizer states for this parameter.
            for state_key, state_val in new_state[idx].items():
                if state_key not in state[param_key]:
                    raise ValueError(
                        f"Expects state {state_key} for parameter {param_key} but not found."
                    )

                src_state_val = state[param_key][state_key]
                if isinstance(state_val, ShardedTensor):
                    # Copy sharded optimizer state shard by shard.
                    assert isinstance(src_state_val, ShardedTensor)
                    num_shards = len(state_val.local_shards())
                    num_new_shards = len(src_state_val.local_shards())
                    if num_shards != num_new_shards:
                        raise ValueError(
                            f"Expects equal number of shards as {num_new_shards} "
                            f"but found {num_shards} for {param_key}/{state_key}"
                        )
                    for shard, src_shard in zip(
                        state_val.local_shards(), src_state_val.local_shards()
                    ):
                        shard.tensor.detach().copy_(src_shard.tensor)
                elif isinstance(state_val, torch.Tensor):
                    assert isinstance(src_state_val, torch.Tensor)
                    state_val.detach().copy_(src_state_val)
                else:
                    new_state[idx][state_key] = deepcopy(src_state_val)

        # Load the param_groups of the state_dict.
        src_param_groups = state_dict["param_groups"]
        new_param_groups = new_state_dict["param_groups"]

        src_group_map = {}
        for group in src_param_groups:
            param_keys = []
            for param_key in group["params"]:
                param_keys.append(param_key)
            src_group_map[_gen_param_group_key(param_keys)] = group
        new_group_map = {}
        for new_group in new_param_groups:
            param_keys = []
            for param_key in new_group["params"]:
                param_keys.append(self.ordered_param_keys[param_key])
            new_group_map[_gen_param_group_key(param_keys)] = new_group
        for group_key, new_group in new_group_map.items():
            # Skip groups that are missing from the incoming state_dict.
            if group_key not in src_group_map:
                continue
            src_group = src_group_map[group_key]
            if len(src_group) != len(new_group):
                raise ValueError(
                    f"Expects equal param_group size as {len(new_group)} for group "
                    f"{group_key} but found {len(src_group)}."
                )
            for k in src_group:
                if k not in new_group:
                    raise ValueError(
                        f"Expects group key {k} to be in group {group_key} in `state_dict` but is missing."
                    )
                if k != "params":
                    new_group[k] = deepcopy(src_group[k])

        self._optimizer.load_state_dict(new_state_dict)

    def add_param_group(self, param_group: Mapping[str, Any]) -> None:
        """
        Add a param group to the :class:`_NamedOptimizer`'s ``param_groups``.

        Warning: This API is still in development and subject to change.
        """
        assert isinstance(param_group, dict), "param group must be a dict"

        params = param_group["params"]
        if isinstance(params, torch.Tensor):
            param_group["params"] = [params]
        else:
            param_group["params"] = list(params)

        param_to_key = {param: key for key, param in self.named_parameters.items()}
        for param in param_group["params"]:
            if param not in param_to_key:
                raise ValueError("some parameters are not in the module")
            self.ordered_param_keys.append(param_to_key[param])

        self._optimizer.add_param_group(param_group)
        # Keep the exposed param_groups in sync with the wrapped optimizer.
        self.param_groups = self._optimizer.param_groups

    def init_state(self) -> None:
        """
        Runs a dummy optimizer step, which allows us to initialize optimizer state
        because we do lazy init for most optimizers.

        This allows doing in-place loading of optimizer state from a checkpoint.
        """
        for param in self.named_parameters.values():
            if param.requires_grad:
                t = torch.zeros_like(param)
                param.grad = torch.autograd.Variable(t)
        # Calling ``step`` will load the initial state for optimizer states.
        self.step(closure=None)

    def _pre_load_state_dict(self, state_dict) -> Dict[str, Any]:
        # For an FSDP-wrapped module, convert the FQN-keyed optim state_dict into the
        # form that the wrapped optimizer on this rank can load.
        if isinstance(self.module, FSDP):
            return FSDP.optim_state_dict_to_load(
                optim_state_dict=state_dict,
                model=self.module,
                optim=self._optimizer,
                is_named_optimizer=True,
            )
        return state_dict

    def _post_state_dict(self, state_dict) -> Dict[str, Any]:
        # For an FSDP-wrapped module, let FSDP post-process the FQN-keyed state_dict.
        if isinstance(self.module, FSDP):
            FSDP.optim_state_dict(self.module, self._optimizer, state_dict)
        return state_dict


def _gen_param_group_key(param_keys: List[str]) -> str:
    """
    Concatenate all param keys as a unique identifier for one param group.
    """
    return "/".join(sorted(param_keys))