Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache DocBlocks #608

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 72 additions & 0 deletions src/CachingDocBlockFactory.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
<?php
declare(strict_types=1);

namespace LanguageServer;

use Microsoft\PhpParser\Node;
use phpDocumentor\Reflection\DocBlock;
use phpDocumentor\Reflection\DocBlockFactory;
use phpDocumentor\Reflection\Types;

/**
* Caches DocBlocks by node start position and file URI.
*/
class CachingDocBlockFactory
{
/**
* Maps file + node start positions to DocBlocks.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the pre-tolerant parser version I used to save the docblock instance on the Node itself so they would get naturally get cached and garbage collected when the document gets garbage collected. The CachingDocBlockFactory instance is saved in the DefinitionResolver, which is a long-living singleton-like object. Why not save this map on PhpDocument instances? That way it is not a cache that needs to be cleared on seemingly unrelated events like new messages, it is simply a memoization store to remember lazily computed docblocks. It might actually be just as fast to eagerly compute them all because we need to do that anyway to get the hover contents for the index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a lot of sense. Perhaps the name resolution cache from the Scope thing could also live in PhpDocument then?

*/
private $cache = [];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add an @var annotation


/**
* @var DocBlockFactory
*/
private $docBlockFactory;


public function __construct()
{
$this->docBlockFactory = DocBlockFactory::createInstance();
}

/**
* @return DocBlock|null
*/
public function getDocBlock(Node $node)
{
$cacheKey = $node->getStart() . ':' . $node->getUri();
if (array_key_exists($cacheKey, $this->cache)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is a rather hot path, so using
If(isset(...) || array_key_exists(...))
Could speed it up further

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - I could try how that affects the performance, but I'm not too optimistic. I think that forming the cache key dominates the performance impact of the function.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why only isset is not sufficient? $cache doesn't have a type declaration unfortunately but I assume it would not contain null values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has nulls at least for fields where there is no comment text, but the cost for those fields is probably not big, since the comment is never parsed. But there was the following comment in the code, which is the primary reason I added this - I have not verified if that is still the case though:

try {
    // create() throws when it thinks the doc comment has invalid fields.
    // For example, a @see tag that is followed by something that doesn't look like a valid fqsen will throw.
    return $this->docBlockFactory->create($text, $context);
} catch (\InvalidArgumentException $e) {
    return null;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could use a NullObject instead of null in this cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with the NullObject pattern - so just define class NullObject {} and store those in the array, without NullObjects ever leaving the cache class? I'm not sure if that would make the code clearer here, since there would have to be a check for instanceof NullObject when reading the cache.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A NullObject would only use unneeded memory and put pressure on the GC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not with

class NullObject {
    public static function getInstance(): self {
        static $instance;
        return $instance ?? ($instance = new self);
    }
}

However, I'm still not convinced this would make the code clearer, especially since nullness only needs to be checked in one place.

return $this->cache[$cacheKey];
}
$text = $node->getDocCommentText();
return $this->cache[$cacheKey] = $text === null ? null : $this->createDocBlockFromNodeAndText($node, $text);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard to read, could you set parenthesis?

}

public function clearCache()
{
$this->cache = [];
}

/**
* @return DocBlock|null
*/
private function createDocBlockFromNodeAndText(Node $node, string $text)
{
list($namespaceImportTable,,) = $node->getImportTablesForCurrentScope();
$namespaceImportTable = array_map('strval', $namespaceImportTable);
$namespaceDefinition = $node->getNamespaceDefinition();
if ($namespaceDefinition !== null && $namespaceDefinition->name !== null) {
$namespaceName = (string)$namespaceDefinition->name->getNamespacedName();
} else {
$namespaceName = 'global';
}
$context = new Types\Context($namespaceName, $namespaceImportTable);
try {
// create() throws when it thinks the doc comment has invalid fields.
// For example, a @see tag that is followed by something that doesn't look like a valid fqsen will throw.
return $this->docBlockFactory->create($text, $context);
} catch (\InvalidArgumentException $e) {
return null;
}
}
}
63 changes: 16 additions & 47 deletions src/DefinitionResolver.php
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,7 @@
use Microsoft\PhpParser;
use Microsoft\PhpParser\Node;
use Microsoft\PhpParser\FunctionLike;
use phpDocumentor\Reflection\{
DocBlock, DocBlockFactory, Fqsen, Type, TypeResolver, Types
};
use phpDocumentor\Reflection\{DocBlock, Fqsen, Type, TypeResolver, Types};

class DefinitionResolver
{
Expand All @@ -29,11 +27,11 @@ class DefinitionResolver
private $typeResolver;

/**
* Parses Doc Block comments given the DocBlock text and import tables at a position.
* Parses and caches Doc Block comments given Node.
*
* @var DocBlockFactory
* @var CachingDocBlockFactory
*/
private $docBlockFactory;
private $cachingDocBlockFactory;

/**
* Creates SignatureInformation
Expand All @@ -49,7 +47,7 @@ public function __construct(ReadableIndex $index)
{
$this->index = $index;
$this->typeResolver = new TypeResolver;
$this->docBlockFactory = DocBlockFactory::createInstance();
$this->cachingDocBlockFactory = new CachingDocBlockFactory;
$this->signatureInformationFactory = new SignatureInformationFactory($this);
}

Expand Down Expand Up @@ -114,14 +112,14 @@ public function getDocumentationFromNode($node)
$variableName = $node->getName();

$functionLikeDeclaration = ParserHelpers\getFunctionLikeDeclarationFromParameter($node);
$docBlock = $this->getDocBlock($functionLikeDeclaration);
$docBlock = $this->cachingDocBlockFactory->getDocBlock($functionLikeDeclaration);

$parameterDocBlockTag = $this->tryGetDocBlockTagForParameter($docBlock, $variableName);
return $parameterDocBlockTag !== null ? $parameterDocBlockTag->getDescription()->render() : null;
}

// For everything else, get the doc block summary corresponding to the current node.
$docBlock = $this->getDocBlock($node);
$docBlock = $this->cachingDocBlockFactory->getDocBlock($node);
if ($docBlock !== null) {
// check whether we have a description, when true, add a new paragraph
// with the description
Expand All @@ -136,40 +134,6 @@ public function getDocumentationFromNode($node)
return null;
}

/**
* Gets Doc Block with resolved names for a Node
*
* @param Node $node
* @return DocBlock|null
*/
private function getDocBlock(Node $node)
{
// TODO make more efficient (caching, ensure import table is in right format to begin with)
$docCommentText = $node->getDocCommentText();
if ($docCommentText !== null) {
list($namespaceImportTable,,) = $node->getImportTablesForCurrentScope();
foreach ($namespaceImportTable as $alias => $name) {
$namespaceImportTable[$alias] = (string)$name;
}
$namespaceDefinition = $node->getNamespaceDefinition();
if ($namespaceDefinition !== null && $namespaceDefinition->name !== null) {
$namespaceName = (string)$namespaceDefinition->name->getNamespacedName();
} else {
$namespaceName = 'global';
}
$context = new Types\Context($namespaceName, $namespaceImportTable);

try {
// create() throws when it thinks the doc comment has invalid fields.
// For example, a @see tag that is followed by something that doesn't look like a valid fqsen will throw.
return $this->docBlockFactory->create($docCommentText, $context);
} catch (\InvalidArgumentException $e) {
return null;
}
}
return null;
}

/**
* Create a Definition for a definition node
*
Expand Down Expand Up @@ -346,6 +310,11 @@ public function resolveReferenceNodeToFqn(Node $node)
return null;
}

public function clearCache()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a docblock.

{
$this->cachingDocBlockFactory->clearCache();
}

private function resolveQualifiedNameNodeToFqn(Node\QualifiedName $node)
{
$parent = $node->parent;
Expand Down Expand Up @@ -1080,7 +1049,7 @@ public function getTypeFromNode($node)
// function foo($a)
$functionLikeDeclaration = ParserHelpers\getFunctionLikeDeclarationFromParameter($node);
$variableName = $node->getName();
$docBlock = $this->getDocBlock($functionLikeDeclaration);
$docBlock = $this->cachingDocBlockFactory->getDocBlock($functionLikeDeclaration);

$parameterDocBlockTag = $this->tryGetDocBlockTagForParameter($docBlock, $variableName);
if ($parameterDocBlockTag !== null && ($type = $parameterDocBlockTag->getType())) {
Expand Down Expand Up @@ -1117,7 +1086,7 @@ public function getTypeFromNode($node)
// 3. TODO: infer from return statements
if ($node instanceof PhpParser\FunctionLike) {
// Functions/methods
$docBlock = $this->getDocBlock($node);
$docBlock = $this->cachingDocBlockFactory->getDocBlock($node);
if (
$docBlock !== null
&& !empty($returnTags = $docBlock->getTagsByName('return'))
Expand Down Expand Up @@ -1185,7 +1154,7 @@ public function getTypeFromNode($node)
// Property, constant or variable
// Use @var tag
if (
($docBlock = $this->getDocBlock($declarationNode))
($docBlock = $this->cachingDocBlockFactory->getDocBlock($declarationNode))
&& !empty($varTags = $docBlock->getTagsByName('var'))
&& ($type = $varTags[0]->getType())
) {
Expand Down Expand Up @@ -1302,7 +1271,7 @@ public static function getDefinedFqn($node)
// namespace A\B;
// const FOO = 5; A\B\FOO
// class C {
// const $a, $b = 4 A\B\C::$a(), A\B\C::$b
// const $a, $b = 4 A\B\C::$a, A\B\C::$b
// }
if (($constDeclaration = ParserHelpers\tryGetConstOrClassConstDeclaration($node)) !== null) {
if ($constDeclaration instanceof Node\Statement\ConstDeclaration) {
Expand Down
4 changes: 4 additions & 0 deletions src/LanguageServer.php
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,10 @@ public function __construct(ProtocolReader $reader, ProtocolWriter $writer)
$e
);
}

// When a request is processed, clear the caches of definition resolver as not to leak memory.
$this->definitionResolver->clearCache();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the cache needs to be cleared after every single protocol message. This is a very hot code path that is even hit in the middle of indexing. Could you explain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent was to

  • clear the cache when some user action causes the cache to be filled to avoid memory leaks, and
  • not have to worry about invalidating the cache on user edits.

You're right, the code does not fulfill that intent. I also did not realize that it could be hit in the middle of indexing - I imagine that this might cause some weird behavior if user edit is performed on a file that is being parsed.

Copy link
Owner

@felixfbecker felixfbecker Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't happen while a file is parsing, but whenever control is yielded back to the event loop (definitely between indexing files) or getting file contents.

The problem I have is that a cache is not useful if it is constantly cleared when it doesn't have to. For example, moving the mouse over a document can trigger thousands of hover requests within milliseconds. All of these would cause the whole cache to be cleared every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - it currently only speeds things up when the same docblock is parsed multiple times on the same request. If it's moved into PhpDocument, this issue could also be resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixfbecker - I started to move this to PhpDocument and found that it was rather difficult to pass PhpDocument to all functions requiring access to docblocks.

Should I

  • Pass PhpDocument as argument individually to all DefinitionResolver functions requiring it, even if it requires adding the same argument to many places.
  • Have a per-document DefinitionResolver, containing a reference to PhpDocument
  • Have PhpDocumentFactory in DefinitionResolver, access the document through the URI of a node. This would require some changes though, since PhpDocument parses in constructor, so it would not be accessible via PhpDocumentFactory during the initial parse.
  • Something else?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Maybe create a dedicated class for getting docblocks that caches them? The only constraint is that the memory needs to be disposed when a PhpDocument is disposed.


// Only send a Response for a Request
// Notifications do not send Responses
if (AdvancedJsonRpc\Request::isRequest($msg->body)) {
Expand Down
2 changes: 2 additions & 0 deletions src/PhpDocument.php
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,8 @@ public function updateContent(string $content)
}

$this->sourceFileNode = $treeAnalyzer->getSourceFileNode();

$this->definitionResolver->clearCache();
}

/**
Expand Down