Parser combinator framework consumes an unnecessary amount of memory #319

Open
scabug opened this issue Oct 12, 2012 · 2 comments

scabug commented Oct 12, 2012

Since all Readers provided by the parser combinator framework use PagedSeq internally, using these parsers on large files seems impossible, because PagedSeq does not release elements that have already been parsed.

For example, consider parsing a 1 GB file from which you need only a portion of the information (you may want to skip headers, comments, etc.). PagedSeq will hold on to the whole 1 GB until parsing finishes and the GC can step in.

Example code:

import collection.immutable.PagedSeq
import util.parsing.combinator._
import util.parsing.input._

// virtual file reader (simulates a ~400 MB file)
def in = new java.io.Reader {
  var buffersRead = 0
  // fills the requested range of cbuf with 't' for the first 100000 calls, then signals end of input
  def read(cbuf: Array[Char], offset: Int, l: Int) = {
    if (buffersRead < 100000) {
      (offset until offset + l).foreach(cbuf(_) = 't')
      buffersRead += 1
      l
    } else -1
  }
  def close(): Unit = {}
}

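def parser = new RegexParsers {
  var gcCountdown = 0
  def tt = new Parser[Char] {
    def apply(in: Input) = {
      // request a GC every 10000 steps so that plain uncollected garbage does not skew the measurement
      gcCountdown += 1
      if (gcCountdown > 10000) {
        System.gc()
        gcCountdown = 0
      }
      if (in.atEnd)
        Failure("", in)
      else
        // keep one character, then skip ahead 1024 characters
        Success(in.first, in.drop(1024))
    }
  }
  // parses the whole input and returns the number of characters kept
  def go(in: java.io.Reader) = parseAll(tt.*, in).get.size
}
println(parser.go(in))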

If you look at memory usage with a tool such as jvisualvm, you will notice that running this process consumes about 800 MB of RAM just to extract about 400 KB worth of characters (the parser keeps one character out of every 1024).
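
For comparison, here is a minimal sketch of a buffer that releases input once the consumer has moved past it, which is the behaviour the report is asking for. It is not part of the parser combinator library and is not a patch for PagedSeq: `WindowedChars`, `discardBefore`, and `Sampling.sampleEvery` are hypothetical names made up for illustration, and the sketch assumes the consumer never needs to backtrack behind the point it has discarded.

```scala
import java.io.Reader

// Hypothetical helper (not part of scala.util.parsing): buffers characters
// from a Reader, but lets the caller release everything before an offset.
final class WindowedChars(source: Reader, chunk: Int = 4096) {
  private val buf = new StringBuilder
  private var base = 0L   // absolute offset of the first buffered character
  private var eof = false

  // Read from the source until absolute offset `pos` is buffered, or until EOF.
  private def fillTo(pos: Long): Unit = {
    val tmp = new Array[Char](chunk)
    while (!eof && pos >= base + buf.length) {
      val n = source.read(tmp, 0, chunk)
      if (n < 0) eof = true else buf.appendAll(tmp, 0, n)
    }
  }

  // Character at absolute offset `pos`, or None past the end of the input.
  def charAt(pos: Long): Option[Char] = {
    require(pos >= base, "offset was already discarded")
    fillTo(pos)
    val i = (pos - base).toInt
    if (i < buf.length) Some(buf.charAt(i)) else None
  }

  // Drop all buffered characters that lie before absolute offset `pos`.
  def discardBefore(pos: Long): Unit = {
    val n = math.min((pos - base).max(0L), buf.length.toLong).toInt
    buf.delete(0, n)
    base += n
  }

  // Number of characters currently held in memory.
  def buffered: Int = buf.length
}

object Sampling {
  // Mirrors the report's scenario: look at one character every `step`
  // characters and discard what has been passed. Returns the number of
  // characters seen and the peak number of characters buffered at once.
  def sampleEvery(source: Reader, step: Int): (Int, Int) = {
    val chars = new WindowedChars(source)
    var pos = 0L
    var count = 0
    var peak = 0
    while (chars.charAt(pos).isDefined) {
      peak = math.max(peak, chars.buffered)
      count += 1
      pos += step
      chars.discardBefore(pos)
    }
    (count, peak)
  }
}
```

Calling `Sampling.sampleEvery(in, 1024)` with the virtual reader from the example would walk the same input while holding at most one chunk at a time. Backtracking combinators need to rewind, so a real fix would have to decide when discarding is safe; this sketch leaves that question open.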

scabug commented Oct 12, 2012

Imported From: https://issues.scala-lang.org/browse/SI-6520?orig=1
Reporter: Platon Pronko (rogach)
Affected Versions: 2.9.2

scabug commented Jul 10, 2013

@adriaanm said:
Unassigning and rescheduling to M6 as previous deadline was missed.

scabug closed this as completed Jul 17, 2015
SethTisue transferred this issue from scala/bug Nov 19, 2020
scala deleted a comment from scabug Nov 19, 2020
SethTisue reopened this Nov 19, 2020