
Fine-tune large language models for a healthtech use case on Amazon SageMaker | Amazon Web Services


In 2021, the pharmaceutical industry generated $550 billion in US revenue. Pharmaceutical companies sell a variety of different, often novel, drugs on the market, where sometimes unintended but serious adverse events can occur.

These events can be reported anywhere, from hospitals or at home, and must be responsibly and efficiently monitored. Traditional adverse event processing is made challenging by the increasing amount of health data and the associated costs. Overall, $384 billion is projected as the cost of pharmacovigilance activities to the overall healthcare industry by 2022. To support overarching pharmacovigilance activities, our pharmaceutical customers want to use the power of machine learning (ML) to automate the adverse event detection from various data sources, such as social media feeds, phone calls, emails, and handwritten notes, and trigger appropriate actions.

In this post, we show how to develop an ML-driven solution using Amazon SageMaker for detecting adverse events, using the publicly available Adverse Drug Reaction Dataset on Hugging Face. In this solution, we fine-tune a variety of models on Hugging Face that were pre-trained on medical data, and we use the BioBERT model, which was pre-trained on the Pubmed dataset and performed the best among those tried.

We implemented the solution using the AWS Cloud Development Kit (AWS CDK). However, we don't cover the specifics of building the solution in this post. For more information on the implementation of this solution, refer to Build a system for catching adverse events in real-time using Amazon SageMaker and Amazon QuickSight.

This post delves into several key areas, providing a comprehensive exploration of the following topics:

  • The data challenges encountered by AWS Professional Services
  • The landscape and application of large language models (LLMs):
    • Transformers, BERT, and GPT
    • Hugging Face
  • The fine-tuned LLM solution and its components:
    • Data preparation
    • Model training

Data challenges

Data skew is often an issue when it comes to classification tasks. Ideally, you would like to have a balanced dataset, and this use case is no exception.

We address this skew with generative AI models (Falcon-7B and Falcon-40B), which were prompted to generate event samples based on five examples from the training set in order to increase the semantic diversity and increase the sample size of labeled adverse events. It's advantageous to use the Falcon models here because, unlike some other LLMs on Hugging Face, Falcon publishes the training dataset it uses, so you can be sure that none of your test set examples are contained within the Falcon training set and avoid data contamination.
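The exact prompting code isn't included in this post, so the following is a minimal sketch of the few-shot augmentation idea in Python. The tiiuae/falcon-7b-instruct checkpoint, the prompt wording, and the seed examples are all illustrative assumptions, not the ones actually used.

# Sketch: few-shot synthetic adverse event generation with Falcon.
# The checkpoint, prompt wording, and seed examples below are hypothetical.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",
    trust_remote_code=True,
    device_map="auto",
)

# Five labeled adverse event examples drawn from the training split (hypothetical)
seed_examples = [
    "Patient developed a severe rash after starting drug X.",
    "Reports dizziness and nausea within hours of the first dose.",
    "Hospitalized for elevated liver enzymes linked to the medication.",
    "Experienced anaphylaxis shortly after the injection.",
    "Complains of persistent headaches since beginning treatment.",
]

prompt = (
    "Below are five reports of adverse drug events:\n"
    + "\n".join(f"- {ex}" for ex in seed_examples)
    + "\nWrite one new, different adverse drug event report:\n-"
)

outputs = generator(prompt, max_new_tokens=60, do_sample=True, temperature=0.9)
synthetic_sample = outputs[0]["generated_text"][len(prompt):].strip()
print(synthetic_sample)  # candidate row to append with the label Adverse_Event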

The other data challenge for healthcare customers is HIPAA compliance requirements. Encryption at rest and in transit has to be incorporated into the solution to meet these requirements.
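One way to incorporate both is at the training job level: SageMaker estimators accept AWS KMS keys for the training volume and the output artifacts (encryption at rest) and a flag for inter-container traffic encryption (encryption in transit). The sketch below uses placeholder values for the role, KMS key ARNs, and training script.

# Sketch: encryption at rest (KMS) and in transit for a SageMaker training job.
# The role, KMS key ARNs, and entry point are placeholders.
from sagemaker.pytorch import PyTorch

encrypted_estimator = PyTorch(
    entry_point="train.py",  # hypothetical training script
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.p3dn.24xlarge",
    framework_version="1.13",
    py_version="py39",
    volume_kms_key="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY",  # encrypts the training volume
    output_kms_key="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY",  # encrypts model artifacts in S3
    encrypt_inter_container_traffic=True,  # encrypts traffic between training containers
)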

Transformers, BERT, and GPT

The transformer architecture is a neural network architecture that is used for natural language processing (NLP) tasks. It was first introduced in the paper "Attention Is All You Need" by Vaswani et al. (2017). The transformer architecture is based on the attention mechanism, which allows the model to learn long-range dependencies between words. Transformers, as laid out in the original paper, consist of two main components: the encoder and the decoder. The encoder takes the input sequence as input and produces a sequence of hidden states. The decoder then takes these hidden states as input and produces the output sequence. The attention mechanism is used in both the encoder and the decoder, and it allows the model to attend to specific words in the input sequence when generating the output sequence. This lets the model learn long-range dependencies between words, which is essential for many NLP tasks, such as machine translation and text summarization.
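To make the attention mechanism concrete, here is a minimal scaled dot-product attention function in NumPy; the toy shapes and inputs are illustrative only.

# Minimal scaled dot-product attention, as described in "Attention Is All You Need".
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d_k); returns attended values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise word-to-word affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V  # each position gets a weighted mix of value vectors

# Toy example: 3 "words", each represented by a 4-dimensional vector
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)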

One of the most popular and useful transformer architectures, Bidirectional Encoder Representations from Transformers (BERT), is a language representation model that was introduced in 2018. BERT is trained on sequences in which some of the words in a sentence are masked, and it has to fill in those words by taking into account both the words before and after them. BERT can be fine-tuned for a variety of NLP tasks, including question answering, natural language inference, and sentiment analysis.
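A quick way to see this masked-word objective in action is the Hugging Face fill-mask pipeline. The bert-base-uncased checkpoint and the example sentence below are chosen purely for illustration and are not part of the solution.

# BERT fills in a masked token using context on both sides.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The patient developed a severe [MASK] after taking the drug."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")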

Another popular transformer architecture that has taken the world by storm is the Generative Pre-trained Transformer (GPT). The first GPT model was introduced in 2018 by OpenAI. It works by being trained to strictly predict the next word in a sequence, aware only of the context before the word. GPT models are trained on a massive dataset of text and code, and can be fine-tuned for a range of NLP tasks, including text generation, question answering, and summarization.
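For contrast, a GPT-style model generates text strictly left to right. This small sketch uses the open gpt2 checkpoint, again only as an illustration.

# GPT-style models predict the next token from left context only.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")
out = generate("The patient reported that the medication", max_new_tokens=20)
print(out[0]["generated_text"])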

In general, BERT is better at tasks that require a deeper understanding of the context of words, whereas GPT is better suited for tasks that require generating text.

Hugging Face

Hugging Face is an artificial intelligence company that focuses on NLP. It provides a platform with tools and resources that enable developers to build, train, and deploy ML models focused on NLP tasks. One of Hugging Face's key offerings is its library, Transformers, which includes pre-trained models that can be fine-tuned for a variety of language tasks such as text classification, translation, summarization, and question answering.
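As a brief illustration of the library's interface, loading a pre-trained checkpoint with a two-class classification head takes only a few lines. The checkpoint shown is one of those evaluated later in this post; the example sentence is hypothetical, and the classification head is randomly initialized until fine-tuned.

# Load a pre-trained checkpoint with a binary classification head.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "monologg/biobert_v1.1_pubmed"  # evaluated later in this post
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,  # Not_AE vs. Adverse_Event; head is untrained until fine-tuning
)

inputs = tokenizer("Severe rash after the first dose.", return_tensors="pt")
logits = model(**inputs).logits  # raw scores for the two classes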

Hugging Face integrates seamlessly with SageMaker, which is a fully managed service that enables developers and data scientists to build, train, and deploy ML models at scale. This synergy benefits users by providing a robust and scalable infrastructure to handle NLP tasks with the state-of-the-art models that Hugging Face offers, combined with the powerful and flexible ML services from AWS. You can also access Hugging Face models directly from Amazon SageMaker JumpStart, making it convenient to start with pre-built solutions.

Solution overview

We used the Hugging Face Transformers library to fine-tune transformer models on SageMaker for the task of adverse event classification. The training job is built using the SageMaker PyTorch estimator. SageMaker JumpStart also has some complementary integrations with Hugging Face that make it straightforward to implement. In this section, we describe the major steps involved in data preparation and model training.
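The estimator definition itself isn't shown in this post; the sketch below shows how the bert_estimator referenced by the tuning code later on might be constructed. The entry point script, role, and framework versions are assumptions.

# Sketch of the SageMaker PyTorch estimator behind the HPO job shown later.
# The entry_point, role, and framework versions are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch

bert_estimator = PyTorch(
    entry_point="train.py",  # hypothetical fine-tuning script
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.p3dn.24xlarge",  # instance type reported in this post
    framework_version="1.13",
    py_version="py39",
    hyperparameters={"epochs": 10},  # each training job ran through 10 epochs
    metric_definitions=[{"Name": "f1", "Regex": "f1: ([0-9.]+).*$"}],
)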

Data preparation

We used the Adverse Drug Reaction Data (ade_corpus_v2) within the Hugging Face dataset with an 80/20 training/test split; a short loading sketch follows the list below. The required data structure for our model training and inference has two columns:

  • One column for the text content as model input data.
  • Another column for the label class. We have two possible classes for a text: Not_AE and Adverse_Event.
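A minimal sketch of this preparation with the Hugging Face datasets library follows; the configuration name and label mapping are assumptions based on the public dataset card.

# Sketch: load ade_corpus_v2 and create an 80/20 train/test split.
# The config name and 0/1 label mapping are assumptions from the dataset card.
from datasets import load_dataset

ds = load_dataset("ade_corpus_v2", "Ade_corpus_v2_classification")["train"]
ds = ds.train_test_split(test_size=0.2, seed=42)  # 80/20 split

label_names = ["Not_AE", "Adverse_Event"]
example = ds["train"][0]
print(example["text"], "->", label_names[example["label"]])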

Model training and experimentation

To efficiently explore the space of possible Hugging Face models to fine-tune on our combined data of adverse events, we constructed a SageMaker hyperparameter optimization (HPO) job and passed in different Hugging Face models as a hyperparameter, along with other important hyperparameters such as training batch size, sequence length, and learning rate. The training jobs used an ml.p3dn.24xlarge instance and took an average of 30 minutes per job with that instance type. Training metrics were captured through the Amazon SageMaker Experiments tool, and each training job ran through 10 epochs.

We specify the following in our code:

  • Training batch size – The number of samples that are processed together before the model weights are updated
  • Sequence length – The maximum length of the input sequence that BERT can process
  • Learning rate – How quickly the model updates its weights during training
  • Models – Hugging Face pre-trained models
# We use the SageMaker Hyperparameter Tuner
import sagemaker
from sagemaker.tuner import IntegerParameter, ContinuousParameter, CategoricalParameter

tuning_job_name = 'ade-hpo'

# Define exploration boundaries
hyperparameter_ranges = {
    'learning_rate': ContinuousParameter(5e-6, 5e-4),
    'max_seq_length': CategoricalParameter(['16', '32', '64', '128', '256']),
    'train_batch_size': CategoricalParameter(['16', '32', '64', '128', '256']),
    'model_name': CategoricalParameter([
        "emilyalsentzer/Bio_ClinicalBERT",
        "dmis-lab/biobert-base-cased-v1.2",
        "monologg/biobert_v1.1_pubmed",
        "pritamdeka/BioBert-PubMed200kRCT",
        "saidhr20/pubmed-biobert-text-classification",
    ])
}

# Create the tuner and launch the tuning job, maximizing the F1 metric
Optimizer = sagemaker.tuner.HyperparameterTuner(
    estimator=bert_estimator,
    hyperparameter_ranges=hyperparameter_ranges,
    base_tuning_job_name=tuning_job_name,
    objective_type='Maximize',
    objective_metric_name='f1',
    metric_definitions=[{'Name': 'f1', 'Regex': "f1: ([0-9.]+).*$"}],
    max_jobs=40,
    max_parallel_jobs=4,
)

Optimizer.fit({'training': inputs_data}, wait=False)

Results

The best performing model in our use case was the monologg/biobert_v1.1_pubmed model hosted on Hugging Face, which is a version of the BERT architecture that has been pre-trained on the Pubmed dataset, consisting of 19,717 scientific publications. Pre-training BERT on this dataset gives the model extra expertise when it comes to identifying context around medically related scientific terms. This boosts the model's performance for the adverse event detection task, because it has been pre-trained on medically specific syntax that shows up often in our dataset.

The following table summarizes our evaluation metrics.

Model                                                         Precision   Recall   F1
Base BERT                                                     0.87        0.95     0.91
BioBERT                                                       0.89        0.95     0.92
BioBERT with HPO                                              0.89        0.96     0.929
BioBERT with HPO and synthetically generated adverse events   0.90        0.96     0.933

Although these are relatively small and incremental improvements over the base BERT model, they nevertheless demonstrate some viable strategies to improve model performance through these methods. Synthetic data generation with Falcon seems to hold a lot of promise and potential for performance improvements, especially as these generative AI models get better over time.

Clean up

To avoid incurring future charges, delete any resources created, such as the model and the model endpoints you created, with the following code:

# Delete resources
model_predictor.delete_model()
model_predictor.delete_endpoint()

Conclusion

Many pharmaceutical companies today would like to automate the process of identifying adverse events from their customer interactions in a systematic way in order to help improve customer safety and outcomes. As we showed in this post, the fine-tuned LLM BioBERT with synthetically generated adverse events added to the data classifies adverse events with high F1 scores, and can be used to build a HIPAA-compliant solution for our customers.

As always, AWS welcomes your feedback. Please leave your thoughts and questions in the comments section.


About the authors

Zack Peterson is a data scientist at AWS Professional Services. He has been hands-on delivering machine learning solutions to customers for many years and has a master's degree in Economics.

Dr. Adewale Akinfaderin is a senior data scientist in Healthcare and Life Sciences at AWS. His expertise is in reproducible and end-to-end AI/ML methods, practical implementations, and helping global healthcare customers formulate and develop scalable solutions to interdisciplinary problems. He has two graduate degrees in Physics and a doctorate degree in Engineering.

Ekta Walia Bhullar, PhD, is a senior AI/ML consultant with the AWS Healthcare and Life Sciences (HCLS) Professional Services business unit. She has extensive experience in the application of AI/ML within the healthcare domain, especially in radiology. Outside of work, when not discussing AI in radiology, she likes to run and hike.

Han Man is a Senior Data Science & Machine Learning Manager with AWS Professional Services based in San Diego, CA. He has a PhD in Engineering from Northwestern University and has several years of experience as a management consultant advising clients in manufacturing, financial services, and energy. Today, he is passionately working with key customers from a variety of industry verticals to develop and implement ML and generative AI solutions on AWS.
