羽山数据 (Yushan Data): compliant, authoritative, and secure. Data technology empowers industrial upgrading. Yushan Data practices the compliant, market-oriented circulation of data elements, providing enterprise digitalization solutions for the finance, insurance, HR, security, internet, and other industries.

  • How Big Data and AI Can Help Tackle Fake News and Mis/Disinformation

    Published: 2021-05-19


    Fake news and disinformation have become a global threat to information integrity and are driving distrust towards individuals, communities and governments worldwide. We are overwhelmed with disinformation on a daily basis through news reports, images, videos, and memes.

    Twisting facts to further an agenda is not a new problem. However, the explosive growth of social media, combined with the emerging power of artificial intelligence to generate content, has added new dimensions to the problem and greatly magnified it, resulting in the current “fake news” epidemic and information crisis.

    It’s clear that human fact checkers working by themselves cannot keep pace with the sheer volume of misinformation being shared every day. Many have therefore turned to advanced artificial intelligence for effective solutions to combat problematic content at scale, but this is not without its own challenges.

    Linguistic cues such as word patterns, syntax constructs, and readability features need to be modeled to reliably discriminate between human and machine-generated content. State-of-the-art natural language processing (NLP) techniques are required for representing words and documents to effectively capture the contextual meaning of words.
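    As a toy illustration of such linguistic cues, the sketch below computes a few hand-rolled stylometric features (type-token ratio, average word length, average sentence length) with only the Python standard library. All names are hypothetical; a real detector would feed much richer NLP features into a trained classifier.

```python
import re

def linguistic_cues(text: str) -> dict:
    """Extract a few simple stylometric features of the kind used to
    separate human- from machine-generated text (illustrative only)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1
    return {
        # lexical diversity: repetitive text scores lower
        "type_token_ratio": len(set(words)) / n_words,
        # average word length, a crude readability proxy
        "avg_word_len": sum(len(w) for w in words) / n_words,
        # average sentence length in words
        "avg_sent_len": n_words / max(len(sentences), 1),
    }

cues = linguistic_cues("The quick brown fox jumps over the lazy dog. It runs fast.")
```

    In practice such features would be one small part of a pipeline alongside contextual word and document embeddings.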

    Furthermore, knowledge graphs and advanced graph NLP algorithms are required to better model the interplay between different aspects of textual content and to map the underlying themes in a document onto higher-level abstractions.
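    The graph idea can be sketched in miniature: link entities that co-occur in the same document, then treat connected components as crude higher-level "themes". This is a toy stand-in assuming pre-extracted entity sets per document (all names and data are hypothetical), not a real knowledge-graph pipeline.

```python
from collections import defaultdict

def entity_graph(docs):
    """Link entities that co-occur in the same document: a minimal
    stand-in for a knowledge graph over textual content."""
    adj = defaultdict(set)
    for entities in docs:
        for a in entities:
            for b in entities:
                if a != b:
                    adj[a].add(b)
    return adj

def themes(adj):
    """Connected components of the entity graph as crude 'themes'."""
    seen, comps = set(), []
    for node in list(adj):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            comp.add(n)
            stack.extend(adj.get(n, ()))
        comps.append(comp)
    return comps

# Hypothetical per-document entity sets
docs = [{"vaccine", "5G"}, {"5G", "tower"}, {"election", "ballot"}]
groups = themes(entity_graph(docs))
```

    Real systems use typed edges, entity linking, and graph neural networks rather than plain co-occurrence, but the abstraction step (documents to entities to themes) is the same.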

    In the case of visual content, advances in photo editing and video manipulation tools have made it significantly easier to create fake imagery and videos. However, automatic identification of manipulated visual content at scale is challenging and computationally expensive. It requires cutting edge compute infrastructure and implementation of state-of-the-art computer vision, speech recognition and multimedia analysis to comprehensively model the visual artifacts at various levels to understand numerous aspects such as pixel and region level inconsistencies, plagiarism, splicing, and spectrogram analytics.
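    One of the pixel/region-level inconsistency checks mentioned above can be caricatured as a block-wise variance map: a spliced-in patch often carries noise statistics that differ from the rest of the image. The sketch below is a deliberately minimal illustration on a synthetic 8x8 grayscale "image" represented as plain nested lists (no imaging library; all values made up).

```python
def block_variance_map(pixels, block=4):
    """Split a grayscale image (2-D list) into blocks and compute each
    block's intensity variance. A pasted-in region with different noise
    characteristics stands out in the map (toy illustration)."""
    h, w = len(pixels), len(pixels[0])
    vmap = []
    for by in range(0, h - block + 1, block):
        row = []
        for bx in range(0, w - block + 1, block):
            vals = [pixels[y][x]
                    for y in range(by, by + block)
                    for x in range(bx, bx + block)]
            mean = sum(vals) / len(vals)
            row.append(sum((v - mean) ** 2 for v in vals) / len(vals))
        vmap.append(row)
    return vmap

# A flat image with one "pasted" noisy patch in the bottom-right corner
img = [[10] * 8 for _ in range(8)]
for y in range(4, 8):
    for x in range(4, 8):
        img[y][x] = 10 + ((x + y) % 2) * 40  # checkerboard-noise patch
vmap = block_variance_map(img, block=4)
```

    Production forensics systems look at far subtler signals (sensor noise residuals, compression artifacts, spectrogram analysis), which is what makes doing this at scale computationally expensive.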

    In addition, the popularity of generative adversarial networks (GANs), and the high accessibility of tools that implement them, have made it significantly easier to generate deceptive multimedia that mimics the verbal and physiological actions of real individuals.

    Countering the generation and spread of deceptive multimedia requires advanced AI models that are effective at synthetic multimedia detection as well as generation. The self-learning side of this type of AI, through consistent re-training, requires massive-scale multimedia and cutting-edge compute power to improve the automated solutions for visual content understanding and verification.

    However, important recent advances have been made which can alleviate some of these challenges.

    Advances in big data processing and sampling offer clever and reliable ways to extract smaller, yet representative, data samples that encompass all the critical patterns and signals the AI needs to extract powerful insights, but with greatly reduced computational demands.
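    As a minimal sketch of such representative sampling, the snippet below draws a stratified subsample that preserves per-group proportions, so a rare but critical class (here a hypothetical "fake" minority) survives the downsampling. Function and field names are illustrative, not from the original article.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, frac, seed=0):
    """Draw a smaller sample that preserves per-group proportions,
    so rare-but-critical patterns are not lost (illustrative sketch)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    sample = []
    for items in groups.values():
        k = max(1, round(len(items) * frac))  # keep at least one per group
        sample.extend(rng.sample(items, k))
    return sample

# Hypothetical dataset: 10% "fake", 90% "real"
data = [{"label": "fake"}] * 10 + [{"label": "real"}] * 90
subset = stratified_sample(data, key=lambda r: r["label"], frac=0.1)
```

    Real systems use far more sophisticated schemes (reservoir sampling over streams, importance sampling), but the payoff is the same: the model sees the critical signals at a fraction of the compute cost.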

    Model compression and knowledge distillation strategies have shown that the AI model complexity, size and inference costs can also be significantly reduced whilst retaining the same level of accuracy as the original model. 
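    Knowledge distillation itself can be shown in a few lines: the small student model is trained to match the large teacher's temperature-softened output distribution (Hinton-style soft targets). The sketch below computes that soft-target cross-entropy with the standard library only; the logits are made up for illustration.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the softened teacher and student
    distributions: minimizing it pulls the student toward the
    teacher's 'dark knowledge' about class similarities."""
    p = softmax(teacher_logits, T)   # soft teacher targets
    q = softmax(student_logits, T)   # student predictions
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

loss = distillation_loss([1.0, 0.5, 0.1], [2.0, 1.0, 0.2])
```

    The loss is minimized exactly when the student reproduces the teacher's softened distribution, which is what lets a much smaller model retain the original's accuracy at a fraction of the inference cost.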

    These breakthroughs, along with machine learning techniques such as few-shot learning, have massively reduced the compute engine costs on cloud infrastructures, thereby making AI based big data analytics affordable for solving real world problems such as misinformation. 

    However, AI on its own can only do so much. The most accurate AI models can only be maintained through reinforcement and training by human intelligence and expertise. And while AI is reliable at extracting advanced insights about misinformation, it needs to be paired with human analysts and domain experts (human-in-the-loop AI) to transform those insights into something highly interpretable and actionable.

    In addition, mitigating the risks and damages caused by the viral spread of mis- and disinformation requires enforcement of timely, proactive countermeasures, such as dissemination of credible, verified information and analytical reporting into different aspects of a mis/disinformation narrative, e.g. key actors and campaign origins. This is only possible with extended (human + AI) intelligence that can optimally harness the power of big data, human-in-the-loop AI, and advanced computing.

    Humans and AI are both equally responsible for the problem of misinformation. To solve it, we need to change human behavior to suit our new roles as big-information consumers. We need to realize the importance of information authenticity along with our information needs. This is a gradual process but until then, AI can lessen the risks and act as a catalyst for change.


    Original author: Dr. Anil Bandhakavi

    Reposted from: insideBIGDATA (insidebigdata.com)

    Original article: https://insidebigdata.com/2021/05/13/how-can-big-data-and-ai-help-to-tackle-fake-news-and-misdisinformation/


This column collects and quotes selected publicly published articles on the data-services industry from across the internet, drawing on a wide range of sources. Quoted articles represent only their authors' views and do not represent the official position of 羽山数据 (Yushan Data).

Readers are invited to report any infringing, non-compliant, or otherwise inappropriate content; once verified, the platform will take it down immediately. Reporting hotline: 400-110-8298