LLMs and generative AI were unavoidable appsec topics this year. Here’s a recap of some relevant articles and associated interviews.
Background
- What Is ChatGPT Doing…and Why Does It Work? — Stephen Wolfram Writings
- What is AI? - MIT Technology Review
- Everyone Is Judging AI by These Tests. But Experts Say They’re Close to Meaningless – The Markup
Prompt injection & manipulating models
- ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs – it was fun to see ASCII art appear as an attack vector (see the sketch after this list)
- HiddenLayer Research - Prompt Injection Attacks on LLMs – towards a shared language for describing attack techniques and failure modes
- Challenges in Red Teaming AI Systems - Anthropic
- Exploring Large Language Models: Local LLM CTF & Lab - Bishop Fox – have fun with a CTF
- Prompt Airlines – more fun from Wiz
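To make the ASCII art vector concrete, here’s a minimal sketch of the ArtPrompt-style masking idea. It assumes pyfiglet for rendering (the paper uses its own fonts and placement schemes), and the model call is left as a hypothetical stand-in; the point is simply that the trigger word never appears as a literal string in the prompt, so naive keyword filters have nothing to match.

```python
# Minimal sketch of the ArtPrompt-style masking idea – not the paper's tooling.
# Assumes: pip install pyfiglet. The model call is a hypothetical stand-in.
import pyfiglet

def mask_word_as_ascii_art(word: str) -> str:
    """Render a word as ASCII art so it never appears as a literal token."""
    return pyfiglet.figlet_format(word, font="standard")

def build_prompt(word: str) -> str:
    # A naive keyword filter scanning this prompt finds no banned string;
    # the model is asked to reconstruct the word from the art itself.
    art = mask_word_as_ascii_art(word)
    return (
        "The ASCII art below spells a single word. Decode it, then use that "
        "word in place of [MASK] in my earlier request:\n\n" + art
    )

if __name__ == "__main__":
    print(build_prompt("EXAMPLE"))  # a send_to_model(...) call would go here
```

Benign placeholder aside, this is why keyword-level guardrails alone make for a weak defense.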
Finding flaws & augmenting appsec
- GitHub - google/oss-fuzz-gen – leveraging LLMs to guide fuzzers. This is probably one of the most appealing and impactful uses I’ve seen (see the harness sketch after this list)
- No, LLM Agents Cannot Autonomously “Hack” Websites – a practitioner’s observations on recent research, plus this follow-up article
- Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models – promises of potential, but remains quite basic
- Using AI for Offensive Security - CSA – rather high level, with more optimism about models actually reasoning (as opposed to just being really sophisticated non-deterministic pattern matchers)
- DARPA awards $14 million to semifinal winners of AI code review competition
- Deconstructing the AI Cyber Challenge (AIxCC)
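For a concrete sense of what oss-fuzz-gen aims to produce, here’s a rough, language-swapped sketch of an LLM-drafted fuzz harness. The real project generates harnesses for OSS-Fuzz projects (largely C/C++); this version uses Python’s atheris and targets the stdlib json parser purely as an illustration – the target choice and workflow are assumptions, not taken from the repo.

```python
# Illustrative atheris harness of the kind an LLM might draft – an assumed
# example, not output from oss-fuzz-gen itself. Requires: pip install atheris
import sys
import atheris

# Instrument imports so coverage feedback reaches the fuzzer.
with atheris.instrument_imports():
    import json

def TestOneInput(data: bytes) -> None:
    fdp = atheris.FuzzedDataProvider(data)
    text = fdp.ConsumeUnicodeNoSurrogates(4096)
    try:
        json.loads(text)
    except json.JSONDecodeError:
        pass  # Malformed input is expected; any other uncaught error is a finding.

if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Run it with `python harness.py`; the fuzzer mutates inputs until it triggers a crash or an unexpected exception.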
Episode 284 (segment 1)
Caleb Sima demystified some of the hype around AI and pointed out how a lot of its security needs match the mundane maintenance of building software. We didn’t get into defining all the different types of AIs, but we did identify the need for more focus on identity and authenticity in a world where LLMs craft user-like content.
Episode 284 (segment 2)
Keith Hoodlet stopped by to talk about his first-place finish in the DoD’s inaugural AI Bias bug bounty program. He showed how manipulating prompts leads to unintentional and undesired outcomes. Keith also explained how he needed to start fresh in terms of techniques since there are no deep resources on how to conduct these kinds of tests.
Be sure to check these out for my variants on the “walks into a bar” joke.
The AI conversations continued with Sandy Dunn, who shared how the OWASP Top 10 for LLMs came about and how it continues to evolve. We talked about why this Top 10 has a mix of items specific to LLMs and items that are indistinguishable from securing any other type of software. It reinforced a lot of the ideas that we had talked about with Caleb the week before.
Stuart McClure walked through the implications of trusting AI and LLMs to find flaws and fix code. The fixing part is compelling – as long as that fix preserves the app’s intended behavior. He explained how LLMs combined with agents and RAG have the potential to assist developers in writing secure code.
Allie Mellen pointed out where LLMs might help with reporting and summarizing knowledge, but also where they fall short of basic security practices. LLMs won’t magically create an asset inventory, nor will they have context about your environment or your approach to risk. She also noted where AI has been present for years already – we just call it machine learning as applied to things like fraud detection and behavioral analysis.
Subscribe to ASW to find these episodes and more!
• • •
October was the month when tales of terror became timely and the days took a fearful turn towards Halloween.
I love Halloween and horror movies. A favorite recent series is “The Edge of Sleep” (which originated as a podcast). The found footage genre is near and dear to my heart, so I also have to recommend “Deadstream” as another recent-ish favorite.
We started a new month with an old friend. Simon Bennetts returned, along with Ori Bendet, to talk about ZAP’s new collaboration with Checkmarx.
We first talked about building ZAP and its community with Simon over a year ago in episode 254. Then he and Mark Curphey stopped by in April to talk about finding sustainable funding for the project. It’s great to see ZAP now have long-term support and, as Simon explained, how that support will create new opportunities for ZAP to expand its features.
Then Kalyani Pawar joined as a new co-host! We celebrated episode 303 by having the three of us talk about striking appsec fear in three words – like, “written in Perl” or “cybersecurity awareness month”…
There was plenty of news to cover, from how many vulns legacy code can hold to how many parsers you can pack into a package. As always, John Kinsella added his insights on secure defaults, isolating resources, and wrangling repos.
Scott Piper shared some advice on how to ratchet up security within an org’s environment, why securing clouds (and creating those guardrails) remains complex, and some tips on tracking down shadow clouds.
Creating guardrails within clouds has become a favored appsec design pattern that increases security without sacrificing development speed – when done well.
Despite all those clouds, he shed lots of light on strategies for enacting change that makes secure defaults better for everyone!
Adrian Sanabria stopped by for our almost-Halloween episode.
The two of us talked about some appsec lessons inspired by the slow transition to IPv6, fun hardware hacking stories, and my hypothesis that, on a CPU-cycle-per-CPU-cycle basis, fuzzing will outshine LLMs at finding flaws.
It was also nice for Adrian to stop by since I’ll be out for a few episodes in November and he’ll be stepping in.
We won’t have to change a thing. Just think of ASW as Adrian Sanabria Weekly…
Subscribe to ASW to find these episodes and more! Also check out the September 2024 recap.
• • •
September was bookended by news-heavy segments, with some security awareness and bot defenses squeezed in between.
Our first episode of the month gave us a chance to catch up on a backlog of news articles. We talked about the engineering decisions that go into paying down tech debt – particularly when and why. Then some lessons learned in implementing SSO. Refactoring into Rust has been a repeated topic, but this time I used a vuln in Rust-based code to talk about expectations of behavior for an API, and John found an example of refactoring into…OCaml (!?).
Dustin Lehr walked us through why an OWASP Dev Day was canceled and some constructive steps to make outreach and engagement for developers more successful. One thing I’d love to see is more appsec appearances at developer conferences. We also talked about where the impact of security awareness can be most effective, such as targeting architects and frameworks.
Next, David Holmes joined us in a sponsored interview about the interconnected challenges of securing APIs and swatting away bots. We talked about the impacts of both, with a highlight on how bots target where the value lies within an app, why that’s closely related to business logic, and why it’s so important to use threat models to identify weaknesses in business logic. After all, such attacks rarely rely on the obviously unnatural payloads of SQL injection and cross-site scripting.
Technically, the final episode of September was recorded in October, but that feels like the kind of redirect appropriate for an episode number matching an HTTP status code. This time around Farshad Abasi joined me to talk about cars, CUPS, cloud native checklists, and password composition.
Subscribe to ASW to find these episodes and more! Also check out the August 2024 recap.
• • •