Diffusion Generative Models
Yiming Liu, Yisong Li
NVIDIA

Stable Diffusion Training Optimization

Agenda
❖ Overview of optimization ideas
❖ Perf breakdown
❖ Details of optimization approaches

Stable Diffusion Training Pipeline (v1.4)
[Figure: pipeline diagram. Components: Text Encoder; Image Information Creator (Unet + Scheduler); Image Decoder. Labels: Text embedding; Training dataset for VAE encoder.]
Training: Text encoder FP + VAE encoder FP + unet FP & BP

Pokemon Dataset
[Figure: text_to_image finetune results for the prompts "yoda" and "ironman"]

Stable Diffusion Training Pipeline
Nsys timeline

module              Time (ms)   ratio
vae.encoder fwd        33.173    7.60%
text_encoder fwd       13.946    3.20%
Unet fwd               81.677   18.72%
Unet bwd              125.197   28.69%
clip_grad_norm         32.124    7.36%
update                 96.457   22.11%
ema_unet               37.659    8.63%
other                  16.109    3.69%
total                 436.342  100.00%

[Figure: Nsys timelines for TF32 and FP16]

Stable Diffusion Training Pipeline

                          batch_size  avg time/iter  throughput  memory    speedup  Comments
Baseline (tf32)                    8         0.9865      8.1098  69479MiB   1.0000  Default config of HuggingFace training recipe
Baseline (fp16)                    8         0.8375      9.5520  71695MiB   1.1778  Default config of HuggingFace training recipe
dataloader                         8         0.6983     11.4564  71695MiB   1.4127  The dataloader becomes a bottleneck when using multiple GPUs, DeepSpeed
fused layer + fused adam           8         0.6319     12.6600  66949MiB   1.5611  Fused Linear, MLP, LayerNorm and Adam
flash attn                         8         0.4428     18.0670  25481MiB   2.2278  Leveraging FlashAttention against the PyTorch standard attention implementation
                                  48         2.0363     23.5727  72979MiB   2.9067
Multistream + Fused EMA           48         2.0209     23.7511  72929MiB   2.9287  We see bubbles between iterations caused by EMA, which introduces frequent malloc and free behavior
Zero2                             56         2.3302     24.0327  72439MiB   2.9634  ZeRO2 can he...
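
The throughput and speedup columns of the optimization table appear to follow directly from the batch size and the average time per iteration: throughput is batch_size divided by avg time/iter, and speedup is that throughput relative to the TF32 baseline. The sketch below recomputes both from the table's values as a sanity check. The names `rows`, `throughput`, and `speedup` are illustrative and do not come from the slides, and the units (seconds per iteration, samples per second) are an assumption, since the slides do not state them.

```python
# Rows copied from the optimization table: (label, batch_size, avg_time_per_iter).
# Assumption: avg time/iter is in seconds, so throughput is samples per second.
rows = [
    ("Baseline (tf32)", 8, 0.9865),
    ("Baseline (fp16)", 8, 0.8375),
    ("dataloader", 8, 0.6983),
    ("fused layer + fused adam", 8, 0.6319),
    ("flash attn", 8, 0.4428),
    ("(unlabeled row, batch 48)", 48, 2.0363),
    ("Multistream + Fused EMA", 48, 2.0209),
    ("Zero2", 56, 2.3302),
]


def throughput(batch_size: int, avg_time: float) -> float:
    """Samples processed per unit time for one configuration."""
    return batch_size / avg_time


def speedup(batch_size: int, avg_time: float,
            base_batch: int = 8, base_time: float = 0.9865) -> float:
    """Throughput relative to the TF32 baseline row (hypothetical helper)."""
    return throughput(batch_size, avg_time) / throughput(base_batch, base_time)


if __name__ == "__main__":
    for label, batch, t in rows:
        print(f"{label:28s} throughput={throughput(batch, t):8.4f} "
              f"speedup={speedup(batch, t):6.4f}")
```

Normalizing by throughput rather than by time per iteration is what lets the batch-48 and batch-56 rows be compared against the batch-8 baseline: their per-iteration time is higher, but each iteration processes more samples.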