Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swift: New query for Missing Regular Expression Anchor #14639

Merged
merged 18 commits into from
Nov 2, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions shared/regex/codeql/regex/MissingRegExpAnchor.qll
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,11 @@ import HostnameRegexp as HostnameShared
signature module MissingRegExpAnchorSig<
RegexTreeViewSig TreeImpl, HostnameShared::HostnameRegexpSig<TreeImpl> Specific>
{
/**
* Holds if this regular expression is used in a 'replacement' operation, such
* as replacing all matches of the regular expression in the input string
* with another string.
*/
predicate isUsedAsReplace(Specific::RegExpPatternSource pattern);

/** Gets a string representation of an end anchor from a regular expression. */
Expand Down
13 changes: 13 additions & 0 deletions swift/ql/lib/codeql/swift/regex/Regex.qll
Original file line number Diff line number Diff line change
Expand Up @@ -330,6 +330,13 @@ abstract class RegexEval extends CallExpr {
*/
DataFlow::Node getAnOptionsInput() { none() }

/**
* Holds if this regular expression evaluation is a 'replacement' operation,
* such as replacing all matches of the regular expression in the input
* string with another string.
*/
abstract predicate isUsedAsReplace();

/**
* Gets a regular expression value that is evaluated here (if any can be identified).
*/
Expand Down Expand Up @@ -416,6 +423,10 @@ private class AlwaysRegexEval extends RegexEval {
override DataFlow::Node getRegexInputNode() { result = regexInput }

override DataFlow::Node getStringInputNode() { result = stringInput }

override predicate isUsedAsReplace() {
this.getStaticTarget().getName().matches(["replac%", "stringByReplac%", "trim%"])
}
}

/**
Expand Down Expand Up @@ -508,4 +519,6 @@ private class NSStringCompareOptionsRegexEval extends RegexEval instanceof NSStr
override DataFlow::Node getAnOptionsInput() {
result = this.(NSStringCompareOptionsPotentialRegexEval).getAnOptionsInput()
}

override predicate isUsedAsReplace() { this.getStaticTarget().getName().matches("replac%") }
}
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ private import codeql.regex.HostnameRegexp as Shared
/**
* An implementation of the signature that allows the Hostname analysis to run.
*/
private module Impl implements Shared::HostnameRegexpSig<TreeImpl> {
module Impl implements Shared::HostnameRegexpSig<TreeImpl> {
class DataFlowNode = DataFlow::Node;

class RegExpPatternSource = Regex::RegexPatternSource;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
category: newQuery
---

* Added a nw query "Missing regular expression anchor" (`swift/missing-regexp-anchor`) for Swift. This query detects regular expressions without anchors that can be vulnerable to bypassing.
76 changes: 76 additions & 0 deletions swift/ql/src/queries/Security/CWE-020/MissingRegexAnchor.qhelp
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>

<overview>
<p>

Sanitizing untrusted input with regular expressions is a
common technique, but malicious actors may be able to embed one of the
allowed patterns in an unexpected location. To prevent this,
you should use anchors in your regular expressions,
such as <code>^</code> or <code>$</code>.

</p>

<p>

Even if the matching is not done in a security-critical
context, it may still cause undesirable behavior when the regular
expression accidentally matches.

</p>
</overview>

<recommendation>
<p>

Use anchors to ensure that regular expressions match at
the expected locations.

</p>
</recommendation>

<example>

<p>

The following example code attempts to check that a URL redirection
will reach the <code>example.com</code> domain, and not
a malicious site:

</p>

<sample src="MissingRegexAnchorBad.swift"/>

<p>

However, this regular expression check can be easily bypassed,
and a malicious actor could embed
<code>http://www.example.com/</code> in the query
string component of a malicious site. For example,
<code>http://evil-example.net/?x=http://www.example.com/</code>.
Instead, you should use anchors in the regular expression check:

</p>

<sample src="MissingRegexAnchorGood.swift"/>

<p>

If you need to write a regular expression to match
multiple hosts, you should include an anchor for all of the
alternatives. For example, the regular expression
<code>/^www\.example\.com|beta\.example\.com/</code> will match the host
<code>evil.beta.example.com</code>, because the regular expression is parsed
as <code>/(^www\.example\.com)|(beta\.example\.com)/</code>.

</p>
</example>

<references>
<li>OWASP: <a href="https://www.owasp.org/index.php/Server_Side_Request_Forgery">SSRF</a></li>
<li>OWASP: <a href="https://cheatsheetseries.owasp.org/cheatsheets/Unvalidated_Redirects_and_Forwards_Cheat_Sheet.html">XSS Unvalidated Redirects and Forwards Cheat Sheet</a>.</li>
</references>
</qhelp>
45 changes: 45 additions & 0 deletions swift/ql/src/queries/Security/CWE-020/MissingRegexAnchor.ql
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
/**
* @name Missing regular expression anchor
* @description Regular expressions without anchors can be vulnerable to bypassing.
* @kind problem
* @problem.severity warning
* @security-severity 7.8
* @precision high
* @id swift/missing-regexp-anchor
* @tags correctness
* security
* external/cwe/cwe-020
*/

private import swift
private import codeql.swift.dataflow.DataFlow
private import codeql.swift.regex.Regex
private import codeql.swift.regex.RegexTreeView::RegexTreeView as TreeImpl
private import codeql.swift.security.regex.HostnameRegex as HostnameRegex
private import codeql.regex.MissingRegExpAnchor as MissingRegExpAnchor

private module Impl implements
MissingRegExpAnchor::MissingRegExpAnchorSig<TreeImpl, HostnameRegex::Impl>
{
predicate isUsedAsReplace(RegexPatternSource pattern) {
exists(RegexEval eval |
eval.getARegex() = pattern.asExpr() and
eval.isUsedAsReplace()
)
}

string getEndAnchorText() { result = "$" }
}

import MissingRegExpAnchor::Make<TreeImpl, HostnameRegex::Impl, Impl>

from DataFlow::Node node, string msg
where
isUnanchoredHostnameRegExp(node, msg)
or
isSemiAnchoredHostnameRegExp(node, msg)
or
hasMisleadingAnchorPrecedence(node, msg)
or
isLineAnchoredHostnameRegExp(node, msg)
select node, msg
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
func handleUrl(_ urlString: String) {
// get the 'url=' parameter from the URL
let components = URLComponents(string: urlString)
let redirectParam = components?.queryItems?.first(where: { $0.name == "url" })

// check we trust the host
let regex = try Regex(#"https?://www\.example\.com"#) // BAD: the host of `url` may be controlled by an attacker
if let match = redirectParam?.value?.firstMatch(of: regex) {
// ... trust the URL ...
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
func handleUrl(_ urlString: String) {
// get the 'url=' parameter from the URL
let components = URLComponents(string: urlString)
let redirectParam = components?.queryItems?.first(where: { $0.name == "url" })

// check we trust the host
let regex = try Regex(#"^https?://www\.example\.com"#) // GOOD: the host of `url` can not be controlled by an attacker
if let match = redirectParam?.value?.firstMatch(of: regex) {
// ... trust the URL ...
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
| SemiAnchoredRegex.swift:50:16:50:16 | ^a\|b | Misleading operator precedence. The subexpression '^a' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:53:16:53:16 | ^a\|b\|c | Misleading operator precedence. The subexpression '^a' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:59:16:59:16 | ^a\|(b) | Misleading operator precedence. The subexpression '^a' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:61:16:61:16 | ^(a)\|(b) | Misleading operator precedence. The subexpression '^(a)' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:63:16:63:16 | a\|b$ | Misleading operator precedence. The subexpression 'b$' is anchored at the end, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:66:16:66:16 | a\|b\|c$ | Misleading operator precedence. The subexpression 'c$' is anchored at the end, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:72:16:72:16 | (a)\|b$ | Misleading operator precedence. The subexpression 'b$' is anchored at the end, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:74:16:74:16 | (a)\|(b)$ | Misleading operator precedence. The subexpression '(b)$' is anchored at the end, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:76:16:76:16 | ^good.com\|better.com | Misleading operator precedence. The subexpression '^good.com' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:76:16:76:16 | ^good.com\|better.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| SemiAnchoredRegex.swift:77:16:77:16 | ^good\\.com\|better\\.com | Misleading operator precedence. The subexpression '^good\\.com' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:77:16:77:16 | ^good\\.com\|better\\.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| SemiAnchoredRegex.swift:78:16:78:16 | ^good\\\\.com\|better\\\\.com | Misleading operator precedence. The subexpression '^good\\\\.com' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:79:16:79:16 | ^good\\\\\\.com\|better\\\\\\.com | Misleading operator precedence. The subexpression '^good\\\\\\.com' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:80:16:80:16 | ^good\\\\\\\\.com\|better\\\\\\\\.com | Misleading operator precedence. The subexpression '^good\\\\\\\\.com' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:82:16:82:16 | ^foo\|bar\|baz$ | Misleading operator precedence. The subexpression '^foo' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:82:16:82:16 | ^foo\|bar\|baz$ | Misleading operator precedence. The subexpression 'baz$' is anchored at the end, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:89:16:89:16 | (\\.xxx)\|(\\.yyy)\|(\\.zzz)$ | Misleading operator precedence. The subexpression '(\\.zzz)$' is anchored at the end, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:90:16:90:16 | (^left\|right\|center)\\sbottom$ | Misleading operator precedence. The subexpression '^left' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:91:16:91:16 | \\.xxx\|\\.yyy\|\\.zzz$ | Misleading operator precedence. The subexpression '\\.zzz$' is anchored at the end, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:92:16:92:16 | \\.xxx\|\\.yyy\|\\.zzz$ | Misleading operator precedence. The subexpression '\\.zzz$' is anchored at the end, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:93:16:93:16 | \\.xxx\|\\.yyy\|zzz$ | Misleading operator precedence. The subexpression 'zzz$' is anchored at the end, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:94:16:94:16 | ^([A-Z]\|xxx[XY]$) | Misleading operator precedence. The subexpression 'xxx[XY]$' is anchored at the end, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:95:16:95:16 | ^(xxx yyy zzz)\|(xxx yyy) | Misleading operator precedence. The subexpression '^(xxx yyy zzz)' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:96:16:96:16 | ^(xxx yyy zzz)\|(xxx yyy)\|(1st( xxx)? yyy)\|xxx\|1st | Misleading operator precedence. The subexpression '^(xxx yyy zzz)' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:97:16:97:16 | ^(xxx:)\|(yyy:)\|(zzz:) | Misleading operator precedence. The subexpression '^(xxx:)' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:98:16:98:16 | ^(xxx?:)\|(yyy:zzz\\/) | Misleading operator precedence. The subexpression '^(xxx?:)' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:99:16:99:16 | ^@media\|@page | Misleading operator precedence. The subexpression '^@media' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:100:16:100:16 | ^\\s*(xxx?\|yyy\|zzz):\|xxx:yyy | Misleading operator precedence. The subexpression '^\\s*(xxx?\|yyy\|zzz):' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:101:16:101:16 | ^click\|mouse\|touch | Misleading operator precedence. The subexpression '^click' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:102:16:102:16 | ^http://good\\.com\|http://better\\.com | Misleading operator precedence. The subexpression '^http://good\\.com' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:102:16:102:16 | ^http://good\\.com\|http://better\\.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| SemiAnchoredRegex.swift:103:16:103:16 | ^https?://good\\.com\|https?://better\\.com | Misleading operator precedence. The subexpression '^https?://good\\.com' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:103:16:103:16 | ^https?://good\\.com\|https?://better\\.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| SemiAnchoredRegex.swift:104:16:104:16 | ^mouse\|touch\|click\|contextmenu\|drop\|dragover\|dragend | Misleading operator precedence. The subexpression '^mouse' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:105:16:105:16 | ^xxx:\|yyy: | Misleading operator precedence. The subexpression '^xxx:' is anchored at the beginning, but the other parts of this regular expression are not |
| SemiAnchoredRegex.swift:106:16:106:16 | _xxx\|_yyy\|_zzz$ | Misleading operator precedence. The subexpression '_zzz$' is anchored at the end, but the other parts of this regular expression are not |
| UnanchoredUrlRegex.swift:62:39:62:39 | https?://good.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| UnanchoredUrlRegex.swift:63:39:63:39 | https?://good.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| UnanchoredUrlRegex.swift:64:39:64:39 | ^https?://good.com | This hostname pattern may match any domain name, as it is missing a '$' or '/' at the end. |
| UnanchoredUrlRegex.swift:65:39:65:39 | (^https?://good1.com)\|(^https?://good2.com) | This hostname pattern may match any domain name, as it is missing a '$' or '/' at the end. |
| UnanchoredUrlRegex.swift:66:39:66:39 | (https?://good.com)\|(^https?://goodie.com) | This hostname pattern may match any domain name, as it is missing a '$' or '/' at the end. |
| UnanchoredUrlRegex.swift:66:39:66:39 | (https?://good.com)\|(^https?://goodie.com) | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| UnanchoredUrlRegex.swift:68:39:68:39 | https?:\\/\\/good.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| UnanchoredUrlRegex.swift:69:39:69:39 | https?://good.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| UnanchoredUrlRegex.swift:71:46:71:46 | https?://good.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| UnanchoredUrlRegex.swift:78:39:78:39 | https?://good.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| UnanchoredUrlRegex.swift:79:39:79:39 | https?://good.com:8080 | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| UnanchoredUrlRegex.swift:82:3:82:3 | https?://good.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| UnanchoredUrlRegex.swift:83:3:83:3 | https?:\\/\\/good.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| UnanchoredUrlRegex.swift:84:3:84:3 | ^https?://good.com | This hostname pattern may match any domain name, as it is missing a '$' or '/' at the end. |
| UnanchoredUrlRegex.swift:95:39:95:39 | https?:\\/\\/good.com\\/([0-9]+) | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| UnanchoredUrlRegex.swift:101:39:101:39 | example\\.com\|whatever | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| test.swift:56:16:56:16 | ^http://example.com | This hostname pattern may match any domain name, as it is missing a '$' or '/' at the end. |
| test.swift:59:16:59:16 | ^http://test\\.example.com | This hostname pattern may match any domain name, as it is missing a '$' or '/' at the end. |
| test.swift:69:16:69:16 | ^(.+\\.(?:example-a\|example-b)\\.com)/ | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| test.swift:76:16:76:16 | ^(example.dev\|example.com) | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| test.swift:77:16:77:16 | ^protos?://(localhost\|.+.example.net\|.+.example-a.com\|.+.example-b.com\|.+.example.internal) | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| test.swift:81:16:81:16 | ^(foo.example\\.com\|whatever)$ | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
| test.swift:84:16:84:16 | test.example.com | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
queries/Security/CWE-020/MissingRegexAnchor.ql
Loading