Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write path normalization without array allocations #60812

Merged
merged 5 commits into from
Jan 9, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
112 changes: 106 additions & 6 deletions src/compiler/path.ts
Original file line number Diff line number Diff line change
Expand Up @@ -624,28 +624,128 @@ export function getNormalizedPathComponents(path: string, currentDirectory: stri
}

/** @internal */
export function getNormalizedAbsolutePath(fileName: string, currentDirectory: string | undefined): string {
return getPathFromPathComponents(getNormalizedPathComponents(fileName, currentDirectory));
export function getNormalizedAbsolutePath(path: string, currentDirectory: string | undefined): string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if currentDirectory is almost always normalized.

let rootLength = getRootLength(path);
if (rootLength === 0 && currentDirectory) {
path = combinePaths(currentDirectory, path);
rootLength = getRootLength(path);
}
else {
// combinePaths normalizes slashes, so not necessary in the other branch
path = normalizeSlashes(path);
}

const simpleNormalized = simpleNormalizePath(path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simpleNormalizePath also calls normalizeSlashes on the first line of the function, which is redundant. I'd suggest you remove the line from simpleNormalizePath and just ensure its other callers normalize slashes before calling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, that was my intention, good catch!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmarks I shared already reflect that; I just made a copy/paste error updating this PR.

if (simpleNormalized !== undefined) {
return simpleNormalized.length > rootLength ? removeTrailingDirectorySeparator(simpleNormalized) : simpleNormalized;
}

const length = path.length;
const root = path.substring(0, rootLength);
// `normalized` is only initialized once `path` is determined to be non-normalized
let normalized;
let index = rootLength;
let segmentStart = index;
let normalizedUpTo = index;
let seenNonDotDotSegment = rootLength !== 0;
while (index < length) {
// At beginning of segment
segmentStart = index;
let ch = path.charCodeAt(index);
while (ch === CharacterCodes.slash && index + 1 < length) {
index++;
ch = path.charCodeAt(index);
}
if (index > segmentStart) {
// Seen superfluous separator
normalized ??= path.substring(0, segmentStart - 1);
segmentStart = index;
}
// Past any superfluous separators
let segmentEnd = path.indexOf(directorySeparator, index + 1);
if (segmentEnd === -1) {
segmentEnd = length;
}
const segmentLength = segmentEnd - segmentStart;
if (segmentLength === 1 && path.charCodeAt(index) === CharacterCodes.dot) {
// "." segment (skip)
normalized ??= path.substring(0, normalizedUpTo);
}
else if (segmentLength === 2 && path.charCodeAt(index) === CharacterCodes.dot && path.charCodeAt(index + 1) === CharacterCodes.dot) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: If the previous segmentLength was 1, but the segment was not ., this performs an unnecessary recheck of segmentLength === 2.

// ".." segment
if (!seenNonDotDotSegment) {
if (normalized !== undefined) {
normalized += normalized.length === rootLength ? ".." : "/..";
}
else {
normalizedUpTo = index + 2;
}
}
else if (normalized === undefined) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in this branch it is technically possible for you to avoid extra substrings by adjusting normalizedUpTo instead of initializing normalized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that it matters since you'd still need to allocate a substring when returning.

if (normalizedUpTo - 2 >= 0) {
normalized = path.substring(0, Math.max(rootLength, path.lastIndexOf(directorySeparator, normalizedUpTo - 2)));
}
else {
normalized = path.substring(0, normalizedUpTo);
}
}
else {
const lastSlash = normalized.lastIndexOf(directorySeparator);
if (lastSlash !== -1) {
normalized = normalized.substring(0, Math.max(rootLength, lastSlash));
}
else {
normalized = root;
}
if (normalized.length === rootLength) {
seenNonDotDotSegment = rootLength !== 0;
}
}
}
else if (normalized !== undefined) {
if (normalized.length !== rootLength) {
normalized += directorySeparator;
}
seenNonDotDotSegment = true;
normalized += path.substring(segmentStart, segmentEnd);
}
else {
seenNonDotDotSegment = true;
normalizedUpTo = segmentEnd;
}
index = segmentEnd + 1;
}
return normalized ?? (length > rootLength ? removeTrailingDirectorySeparator(path) : path);
}

/** @internal */
export function normalizePath(path: string): string {
path = normalizeSlashes(path);
let normalized = simpleNormalizePath(path);
if (normalized !== undefined) {
return normalized;
}
normalized = getNormalizedAbsolutePath(path, "");
return normalized && hasTrailingDirectorySeparator(path) ? ensureTrailingDirectorySeparator(normalized) : normalized;
andrewbranch marked this conversation as resolved.
Show resolved Hide resolved
}

function simpleNormalizePath(path: string): string | undefined {
// Most paths don't require normalization
if (!relativePathSegmentRegExp.test(path)) {
return path;
}
// Some paths only require cleanup of `/./` or leading `./`
const simplified = path.replace(/\/\.\//g, "/").replace(/^\.\//, "");
let simplified = path.replace(/\/\.\//g, "/");
if (simplified.startsWith("./")) {
simplified = simplified.slice(2);
}
if (simplified !== path) {
path = simplified;
if (!relativePathSegmentRegExp.test(path)) {
return path;
}
}
// Other paths require full normalization
const normalized = getPathFromPathComponents(reducePathComponents(getPathComponents(path)));
return normalized && hasTrailingDirectorySeparator(path) ? ensureTrailingDirectorySeparator(normalized) : normalized;
return undefined;
}

function getPathWithoutRoot(pathComponents: readonly string[]) {
Expand Down
15 changes: 15 additions & 0 deletions src/testRunner/unittests/paths.ts
Original file line number Diff line number Diff line change
Expand Up @@ -317,9 +317,24 @@ describe("unittests:: core paths", () => {
assert.strictEqual(ts.getNormalizedAbsolutePath("", ""), "");
assert.strictEqual(ts.getNormalizedAbsolutePath(".", ""), "");
assert.strictEqual(ts.getNormalizedAbsolutePath("./", ""), "");
assert.strictEqual(ts.getNormalizedAbsolutePath("./a", ""), "a");
// Strangely, these do not normalize to the empty string.
assert.strictEqual(ts.getNormalizedAbsolutePath("..", ""), "..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../", ""), "..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../..", ""), "../..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../../", ""), "../..");
assert.strictEqual(ts.getNormalizedAbsolutePath("./..", ""), "..");
assert.strictEqual(ts.getNormalizedAbsolutePath("../../a/..", ""), "../..");

// More .. segments
assert.strictEqual(ts.getNormalizedAbsolutePath("src/ts/foo/../../../bar/bar.ts", ""), "bar/bar.ts");
assert.strictEqual(ts.getNormalizedAbsolutePath("src/ts/foo/../../..", ""), "");
// not a real URL root!
assert.strictEqual(ts.getNormalizedAbsolutePath("file:/Users/matb/projects/san/../../../../../../typings/@epic/Core.d.ts", ""), "../typings/@epic/Core.d.ts");
// the root is `file://Users/`
assert.strictEqual(ts.getNormalizedAbsolutePath("file://Users/matb/projects/san/../../../../../../typings/@epic/Core.d.ts", ""), "file://Users/typings/@epic/Core.d.ts");
// this is real
assert.strictEqual(ts.getNormalizedAbsolutePath("file:///Users/matb/projects/san/../../../../../../typings/@epic/Core.d.ts", ""), "file:///typings/@epic/Core.d.ts");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk if we want to preserve specific examples in these, but that might be fine.

Maybe we should have tests on untitled:?


// Interaction between relative paths and currentDirectory.
assert.strictEqual(ts.getNormalizedAbsolutePath("", "/home"), "/home");
Expand Down
Loading