Parser combinator framework consumes unnecessary amount of memory #319

scabug · 2012-10-12T04:58:23Z

Since all Readers provided by the parser combinator framework use PagedSeq internally, using these parsers on large files seems impossible, because PagedSeq does not release elements that have already been parsed.

For example, consider parsing a 1GB file from which you need only a portion of the information (you may want to skip headers, comments, etc.). PagedSeq will hold on to the whole 1GB until parsing finishes and the GC can step in.

Example code:

import collection.immutable.PagedSeq
import util.parsing.combinator._
import util.parsing.input._

// virtual file reader (simulates ~400Mb file)
def in = new java.io.Reader {
  var buffersRead = 0
  def read(cbuf: Array[Char], offset: Int, l: Int) = {
    if (buffersRead < 100000) {
      (0 until cbuf.size).foreach(cbuf(_) = 't')
      buffersRead += 1
      cbuf.size
    } else -1
  }
  def close(): Unit = {}
}

def parser = new RegexParsers {
  var gcCountdown = 0
  def tt = new Parser[Char] {
    def apply(in: Input) = {
      gcCountdown += 1
      if (gcCountdown > 10000) {
        System.gc()
        gcCountdown = 0
      }
      if (in.atEnd)
        Failure("", in)
      else
        Success(in.first, in.drop(1024))
    }
  }
  def go(in: java.io.Reader) = parseAll(tt.*, in).get.size
}
println(parser.go(in))

If you look at memory usage with a tool such as jvisualvm, you will notice that running this process consumes about 800Mb of RAM just to parse 400kb worth of characters.
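
The numbers above are consistent with a rough back-of-the-envelope estimate. The sketch below is only an illustration, not a measurement: it assumes that PagedSeq stores its data in pages of 4096 chars (so each call to `read` in the example fills one page) and that a JVM `char` occupies 2 bytes. The object and value names are made up for this estimate.

```scala
// Rough estimate of retained memory for the example above.
// Assumption: PagedSeq pages hold 4096 chars each, and none are released.
object MemoryEstimate {
  val pageSizeChars: Long = 4096L   // assumed PagedSeq page size
  val buffersRead: Long   = 100000L // limit used by the simulated reader
  val bytesPerChar: Long  = 2L      // a JVM char is a 16-bit value
  val dropPerStep: Long   = 1024L   // tt consumes 1 char, then drops 1024

  // Total characters produced by the simulated reader.
  val totalChars: Long = buffersRead * pageSizeChars
  // Bytes kept alive if every page stays reachable.
  val retainedBytes: Long = totalChars * bytesPerChar
  // Number of characters the parser actually returns.
  val parsedElements: Long = totalChars / dropPerStep

  def main(args: Array[String]): Unit = {
    println("chars read:      " + totalChars)
    println("bytes retained:  " + retainedBytes)
    println("elements parsed: " + parsedElements)
  }
}
```

Under these assumptions the reader produces 409,600,000 chars, which is roughly 819 million bytes if every page stays reachable, while the parser returns only 400,000 characters.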

scabug · 2012-10-12T04:58:23Z

Imported From: https://issues.scala-lang.org/browse/SI-6520?orig=1
Reporter: Platon Pronko (rogach)
Affected Versions: 2.9.2

scabug · 2013-07-10T22:20:08Z

@adriaanm said:
Unassigning and rescheduling to M6 as previous deadline was missed.

scabug closed this as completed Jul 17, 2015

SethTisue transferred this issue from scala/bug Nov 19, 2020

scala deleted a comment from scabug Nov 19, 2020

SethTisue reopened this Nov 19, 2020
