Lights, Camera, Algorithm: Gen AI in the Media Industry

By Graham Webber, BSG Business Consultant

From the printing press to virtual reality, the media industry has consistently driven technological innovation. The advent of generative AI, particularly large language models (LLMs), diffusion models, and multimodal AI, marks a new era of transformation. These technologies are redefining content creation, distribution, and consumption, offering unprecedented opportunities and challenges for media professionals. 

The Art of the Possible

Large Language Models (LLMs) in Content Creation 

Large language models such as OpenAI's GPT series have transformed, and will continue to transform, content creation. These models generate human-like text, helping media companies automate tasks such as ideation and drafting outlines and scripts. Their ability to interpret text opens up uses that were previously out of reach for AI, including script translation, synopsis generation, character sketches, scene descriptions and key point tagging. The speed of this automation also makes personalised content creation feasible: for example, emphasising a sub-theme such as climate change in a children's film to increase engagement among a targeted audience. 

Diffusion Models and Visual Media 

Diffusion models such as DALL·E are most prominently used for image generation. They can produce everything from illustrations and animations to complex visual effects in film, television and games. Diffusion models can generate realistic landscapes, characters and other imagery, reducing the need for costly and time-consuming physical sets or manual CGI work in films. This accelerates production timelines and allows for greater creative freedom, as artists can quickly iterate on different visual ideas. Integrating diffusion models into media production also opens new possibilities for personalised content, such as visuals customised for individual viewers. 

Multimodal Models: The Future of AI in Media 

Multimodal AI models work by combining data from different modalities, such as text, images, video and audio. One of the most exciting applications of multimodal AI is interactive storytelling.

For example, a multimodal AI could generate a story with accompanying visuals and sound effects that respond to the audience's choices, blurring the line between viewing and playing. This technology has the potential to revolutionise video games, virtual reality experiences, and even traditional media like films and books, making them more interactive and engaging.

Multimodal AI can also enhance content moderation and accessibility, detecting inappropriate content across media types and generating alternative formats, including descriptive audio for images and subtitles for videos, thereby improving inclusivity. 

Back to the Present

Where are we today? 

Only a few months ago it may have felt as if we were on the cusp of artificial general intelligence, and media was certainly riding that wave. Now the novelty is wearing off and expectations are settling into something more realistic. For every mind-blowing video of AI generating a breathtaking piece of content, there is another showing a hilarious failure. A personal favourite is AI-generated gymnastics; if you have not seen it yet, do yourself a favour and head over to YouTube (after you have finished this article, of course).  

Moving forward from where we stand 

We can still achieve remarkable things with the technology available today; the key is to add structure and guide the process. There are two main approaches to this: 

The first is to create agents with specific tasks and orchestrate them logically, so that together they can solve complex problems while the overall system stays under our control. A basic example is first asking the LLM to create a plan to solve the problem, then asking it to carry out each step, and finally asking it to reflect on the solution.
The second is to build that structure into the generative model itself, either by fine-tuning it or by adding control networks that condition its output on additional inputs.

A great example of the second approach is the YouDream project, which generates well-structured 3D assets from text prompts by first generating a ‘skeleton’ for the asset and then using it as a control around which the asset is built. This kind of approach is likely to become mainstream in video generation, as it would help reduce the impossible movements and extra limbs we see in AI-generated gymnastics. 
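The first approach described above (plan, then carry out each step, then reflect) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production agent framework: the model is abstracted as any function that maps a prompt string to a response string, and the names `plan`, `execute`, `reflect`, `run_agent` and the canned `fake_llm` are hypothetical, chosen so the sketch runs offline. In practice `fake_llm` would be replaced by a call to a real model API.

```python
from typing import Callable, Dict, List

# Assumption for illustration: any function that takes a prompt and returns text
# can act as the "LLM". A real system would call a model provider's API here.
LLM = Callable[[str], str]


def plan(llm: LLM, task: str) -> List[str]:
    """Step 1: ask the model for a plan and split it into one step per line."""
    response = llm(f"Create a step-by-step plan, one step per line, for: {task}")
    return [line.strip() for line in response.splitlines() if line.strip()]


def execute(llm: LLM, task: str, steps: List[str]) -> List[str]:
    """Step 2: ask the model to carry out each step, passing earlier results as context."""
    results: List[str] = []
    for step in steps:
        done_so_far = "\n".join(results)
        results.append(llm(f"Task: {task}\nDone so far:\n{done_so_far}\nNow do: {step}"))
    return results


def reflect(llm: LLM, task: str, results: List[str]) -> str:
    """Step 3: ask the model to review the combined output for problems."""
    draft = "\n".join(results)
    return llm(f"Task: {task}\nDraft:\n{draft}\nReview the draft and list any problems.")


def run_agent(llm: LLM, task: str) -> Dict[str, object]:
    """Orchestrate plan -> execute -> reflect and return every intermediate output."""
    steps = plan(llm, task)
    results = execute(llm, task, steps)
    review = reflect(llm, task, results)
    return {"steps": steps, "results": results, "review": review}


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in model with canned replies, so the sketch runs offline."""
    if prompt.startswith("Create a step-by-step plan"):
        return "Outline the scenes\nWrite the dialogue\nAdd stage directions"
    if "Now do: " in prompt:
        # Echo back the step being requested, as if the model had completed it.
        return "Completed: " + prompt.rsplit("Now do: ", 1)[1]
    return "Review: no problems found"


if __name__ == "__main__":
    output = run_agent(fake_llm, "a short scene for a children's film")
    for step, result in zip(output["steps"], output["results"]):
        print(step, "->", result)
    print(output["review"])
```

Keeping each stage as a separate call is the point of the design: the plan, the intermediate results and the review are all visible, so a person (or another agent) can inspect or correct the process between steps rather than relying on a single opaque prompt.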

Challenges and Ethical Considerations 

Technology has steadily democratised content creation, allowing individuals to achieve production quality and audience engagement once exclusive to major media companies. Platforms like YouTube and Patreon have already empowered creators to distribute their work independently and sustainably.

AI is the latest step forward, enabling independent creators to compete directly with traditional media. As more creators adopt these tools, media companies risk losing market share to independent voices who deliver compelling content directly to audiences, free from traditional media’s overhead and bureaucracy. 

The advantages that generative AI brings are not only impressive but also growing. However, there are significant challenges, particularly around ethics and regulation, that need to be addressed through best practice and good governance. The potential for AI-generated content to spread misinformation or reinforce biases is a major concern.  

The content generated by these systems raises questions about intellectual property and ownership. As AI systems become more capable of creating content, determining the rights of creators, companies, and the AI itself will become increasingly complex. Legal frameworks need to evolve to address these new challenges, balancing innovation with the protection of human creators. 

Wrapping up

Generative AI, through LLMs, diffusion models, and multimodal systems, is poised to reshape the media industry in profound ways. From automating content creation to enabling new forms of storytelling and enhancing visual production, these technologies offer exciting possibilities.

However, their integration must be approached with care, ensuring that ethical considerations and the human element remain at the forefront of media production. As the industry continues to evolve, the collaboration between human creativity and AI will likely define the future of media, offering audiences richer, more diverse, and more personalised content experiences. 

A PROACTIVE FORCE FOR POSITIVE CHANGE

BSG helps organisations to navigate the complexities of AI strategy formulation and implementation - considering all the relevant lenses to ensure a holistic approach. Get in touch with us if you'd like to unpack this further.