@@ -437,8 +437,8 @@ A user joins a teleconference via a web-based video conferencing application at
her desk since no meeting room in her office is available. During the
teleconference, she does not wish that her room and people in the background are
visible. To protect the privacy of the other people and the surroundings, the
- application runs a machine learning model such as [[DeepLabv3+]] or
- [[MaskR-CNN]] to semantically split an image into segments and replaces
+ application runs a machine learning model such as [[DeepLabv3+]], [[MaskR-CNN]]
+ or [[SegAny]] to semantically split an image into segments and replaces
segments that represent other people and background with another picture.

### Skeleton Detection ### {#usecase-skeleton-detection}
@@ -490,6 +490,20 @@ For better accessibility, a web-based presentation application provides
automatic image captioning by running a machine learning model such as
[[im2txt]] which predicts explanatory words of the presentation slides.

+ ### Text-to-image ### {#usecase-text-to-image}
+
+ Images are a core part of modern web experiences. The ability to generate images
+ from text input in a privacy-preserving manner enables visual personalization and
+ adaptation of web applications and content. For example, a web application can
+ take as input a natural language description on the web page, or a description
+ provided by the user in a text prompt, and produce an image matching that
+ description. This text-to-image use case, enabled by the latent diffusion model
+ architecture [[LDM]], forms the basis for additional use cases such as inpainting,
+ where a portion of an existing image on the web page is selectively modified using
+ newly generated content, and its converse, outpainting, where an original image is
+ extended beyond its original dimensions by filling the empty space with generated
+ content.
+
### Machine Translation ### {#usecase-translation}

Multiple people from various countries are talking via a web-based real-time
@@ -520,6 +534,29 @@ noise suppression using Recurrent Neural Network such as [[RNNoise]] for
suppressing background dynamic noise like baby cry or dog barking to improve
audio experiences in video conferences.

+ ### Speech Recognition ### {#usecase-speech-recognition}
+
+ Speech recognition, also known as speech-to-text, enables the recognition and
+ conversion of spoken language into text. Example applications of speech
+ recognition include transcription, automatic translation, multimodal interaction,
+ real-time captioning, and virtual assistants. Speech recognition improves the
+ accessibility of auditory content and makes it possible to interact with such
+ content in textual form in a privacy-preserving manner. Common examples include
+ watching videos or participating in online meetings with real-time captioning.
+ Models such as [[Whisper]] approach human-level accuracy and robustness and are
+ well positioned to improve the accessibility of such use cases.
+
+ ### Text Generation ### {#usecase-text-generation}
+
+ Various text generation use cases are enabled by large language models (LLMs),
+ which can perform tasks that require a general ability to predict the next item
+ in a text sequence. This class of models can translate text, answer questions
+ based on a text input, summarize a larger body of text, or generate text output
+ from a textual input. LLMs achieve better performance than older models based on
+ RNN, CNN, or LSTM architectures and can further improve the performance of many
+ other use cases discussed in this section. Examples of LLMs include [[t5-small]],
+ [[m2m100_418M]], [[gpt2]], and [[llama-2-7b]].
+
### Detecting fake video ### {#usecase-detecting-fake-video}

A user is exposed to realistic fake videos generated by ‘deepfake’ on the web.
@@ -6524,6 +6561,25 @@ Thanks to Dwayne Robinson for his work investigating and providing recommendatio
],
"date": "January 2018"
},
+ "SegAny": {
+ "href": "https://arxiv.org/abs/2304.02643",
+ "title": "Segment Anything",
+ "authors": [
+ "Alexander Kirillov",
+ "Alex Berg",
+ "Chloe Rolland",
+ "Eric Mintun",
+ "Hanzi Mao",
+ "Laura Gustafson",
+ "Nikhila Ravi",
+ "Piotr Dollar",
+ "Ross Girshick",
+ "Spencer Whitehead",
+ "Wan-Yen Lo",
+ "Tete Xiao"
+ ],
+ "date": "April 2023"
+ },
"PoseNet": {
"href": "https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5",
"title": "Real-time Human Pose Estimation in the Browser with TensorFlow.js",
@@ -6601,6 +6657,18 @@ Thanks to Dwayne Robinson for his work investigating and providing recommendatio
],
"date": "September 2016"
},
+ "LDM": {
+ "href": "https://arxiv.org/abs/2112.10752",
+ "title": "High-Resolution Image Synthesis with Latent Diffusion Models",
+ "authors": [
+ "Robin Rombach",
+ "Andreas Blattmann",
+ "Dominik Lorenz",
+ "Patrick Esser",
+ "Björn Ommer"
+ ],
+ "date": "April 2022"
+ },
"GNMT": {
"href": "https://github.com/tensorflow/nmt",
"title": "Neural Machine Translation (seq2seq) Tutorial",
@@ -6674,6 +6742,19 @@ Thanks to Dwayne Robinson for his work investigating and providing recommendatio
],
"date": "September 2017"
},
+ "Whisper": {
+ "href": "https://arxiv.org/abs/2212.04356",
+ "title": "Robust Speech Recognition via Large-Scale Weak Supervision",
+ "authors": [
+ "Alec Radford",
+ "Jong Wook Kim",
+ "Tao Xu",
+ "Greg Brockman",
+ "Christine McLeavey",
+ "Ilya Sutskever"
+ ],
+ "date": "December 2022"
+ },
"GRU": {
"href": "https://arxiv.org/pdf/1406.1078.pdf",
"title": "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation",
@@ -6766,12 +6847,133 @@ Thanks to Dwayne Robinson for his work investigating and providing recommendatio
],
"date": "November 2019"
},
- "POWERFUL-FEATURES": {
- "href": "https://w3c.github.io/webappsec-secure-contexts/",
- "title": "Secure Contexts",
+ "t5-small": {
+ "href": "https://jmlr.org/papers/volume21/20-074/20-074.pdf",
+ "title": "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer",
+ "authors": [
+ "Colin Raffel",
+ "Noam Shazeer",
+ "Adam Roberts",
+ "Katherine Lee",
+ "Sharan Narang",
+ "Michael Matena",
+ "Yanqi Zhou",
+ "Wei Li",
+ "Peter J. Liu"
+ ],
+ "date": "June 2020"
+ },
+ "m2m100_418M": {
+ "href": "https://arxiv.org/abs/2010.11125",
+ "title": "Beyond English-Centric Multilingual Machine Translation",
"authors": [
- "Mike West"
- ]
+ "Angela Fan",
+ "Shruti Bhosale",
+ "Holger Schwenk",
+ "Zhiyi Ma",
+ "Ahmed El-Kishky",
+ "Siddharth Goyal",
+ "Mandeep Baines",
+ "Onur Celebi",
+ "Guillaume Wenzek",
+ "Vishrav Chaudhary",
+ "Naman Goyal",
+ "Tom Birch",
+ "Vitaliy Liptchinsky",
+ "Sergey Edunov",
+ "Edouard Grave",
+ "Michael Auli",
+ "Armand Joulin"
+ ],
+ "date": "October 2020"
+ },
+ "gpt2": {
+ "href": "https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf",
+ "title": "Language Models are Unsupervised Multitask Learners",
+ "authors": [
+ "Alec Radford",
+ "Jeffrey Wu",
+ "Rewon Child",
+ "David Luan",
+ "Dario Amodei",
+ "Ilya Sutskever"
+ ],
+ "date": "February 2019"
+ },
+ "llama-2-7b": {
+ "href": "https://arxiv.org/abs/2307.09288",
+ "title": "Llama 2: Open Foundation and Fine-Tuned Chat Models",
+ "authors": [
+ "Hugo Touvron",
+ "Louis Martin",
+ "Kevin Stone",
+ "Peter Albert",
+ "Amjad Almahairi",
+ "Yasmine Babaei",
+ "Nikolay Bashlykov",
+ "Soumya Batra",
+ "Prajjwal Bhargava",
+ "Shruti Bhosale",
+ "Dan Bikel",
+ "Lukas Blecher",
+ "Cristian Canton Ferrer",
+ "Moya Chen",
+ "Guillem Cucurull",
+ "David Esiobu",
+ "Jude Fernandes",
+ "Jeremy Fu",
+ "Wenyin Fu",
+ "Brian Fuller",
+ "Cynthia Gao",
+ "Vedanuj Goswami",
+ "Naman Goyal",
+ "Anthony Hartshorn",
+ "Saghar Hosseini",
+ "Rui Hou",
+ "Hakan Inan",
+ "Marcin Kardas",
+ "Viktor Kerkez",
+ "Madian Khabsa",
+ "Isabel Kloumann",
+ "Artem Korenev",
+ "Punit Singh Koura",
+ "Marie-Anne Lachaux",
+ "Thibaut Lavril",
+ "Jenya Lee",
+ "Diana Liskovich",
+ "Yinghai Lu",
+ "Yuning Mao",
+ "Xavier Martinet",
+ "Todor Mihaylov",
+ "Pushkar Mishra",
+ "Igor Molybog",
+ "Yixin Nie",
+ "Andrew Poulton",
+ "Jeremy Reizenstein",
+ "Rashi Rungta",
+ "Kalyan Saladi",
+ "Alan Schelten",
+ "Ruan Silva",
+ "Eric Michael Smith",
+ "Ranjan Subramanian",
+ "Xiaoqing Ellen Tan",
+ "Binh Tang",
+ "Ross Taylor",
+ "Adina Williams",
+ "Jian Xiang Kuan",
+ "Puxin Xu",
+ "Zheng Yan",
+ "Iliyan Zarov",
+ "Yuchen Zhang",
+ "Angela Fan",
+ "Melanie Kambadur",
+ "Sharan Narang",
+ "Aurelien Rodriguez",
+ "Robert Stojnic",
+ "Sergey Edunov",
+ "Thomas Scialom"
+ ],
+ "date": "July 2023"
}
}
</pre>