v0.15: Experimental new CSV and Geographic integrations and many other fixes

@Jolanrensen Jolanrensen released this 09 Dec 14:13
· 1 commit to 0.15.0 since this release

This release contains several new features, tons of fixes, and two exciting new experimental integrations:

  • Experimental new CSV parser based on Deephaven-CSV. See below for more information.
  • Experimental new GeoDataFrame class for working with geographical data (from GeoJson/Shapefile) and plotting it with Kandy. See below for more information.
  • Full BigInteger support:
    Just as we support BigDecimal numbers, DataFrame now also supports BigInteger in parsing, converting, statistics, column arithmetic, etc.
  • Custom SQL Database registration (read user guide)
  • Improved parsing:
    Parsing and converting String columns to other types is now faster.
    We added String -> Char parsing.
    We also introduce the new experimental ParserOptions.useFastDoubleParser setting, which uses FastDoubleParser for faster and more flexible Double parsing.
  • We continue improving our Compiler Plugin with every release. See below for more information.
  • See this notebook for some more information about the changes.

New Experimental CSV integration

DataFrame's CSV parsing has been based on Apache Commons CSV from the beginning. While this has been sufficient for most applications, it had some issues: it could run out of memory, its performance was limited, and our API lacked clarity, documentation, and completeness.

For DataFrame 0.15, we introduce a new, separate package, org.jetbrains.kotlinx:dataframe-csv, which tries to solve all these issues at once. It's based on Deephaven-CSV, which makes it faster and more memory-efficient. And since we built it from the ground up, we made sure the API was complete, predictable, and carefully documented.

To try it yourself, explicitly add the dependency org.jetbrains.kotlinx:dataframe-csv to your project. In notebooks, you can add enableExperimentalCsv=true to the %use magic, like %use dataframe(enableExperimentalCsv=true).
Use the new DataFrame.readCsv()/DataFrame.readTsv()/DataFrame.readDelim() functions instead of the old DataFrame.readCSV() ones (note the difference in capitalization).
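
As a minimal sketch of the dependency setup described above, a Gradle Kotlin DSL build script could look like the following. The artifact coordinates come from these notes. The version 0.15.0, the presence of the core dataframe artifact, and the use of Maven Central are assumptions, so adjust them to match your project.

```kotlin
// build.gradle.kts (sketch; versions and repository are assumptions)
repositories {
    mavenCentral()
}

dependencies {
    // Core DataFrame library, assumed to be in the project already
    implementation("org.jetbrains.kotlinx:dataframe:0.15.0")

    // Experimental CSV module; it is not pulled in automatically,
    // so it has to be declared explicitly
    implementation("org.jetbrains.kotlinx:dataframe-csv:0.15.0")
}
```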

We happily await your feedback!

New Experimental Geo integration

Kandy v0.8 introduces geo-plotting which allows you to visualize geospatial/geographical data using the awesome Kandy DSL. To make working with this geographical data (from GeoJson/Shapefile) easier, we happily accepted the GeoDataFrame PR from the Kandy team.

To try it yourself, explicitly add the dependency org.jetbrains.kotlinx:dataframe-geo to your project, or add enableExperimentalGeo=true to the %use magic in your notebook (with the repository maven("https://repo.osgeo.org/repository/release")). Then use GeoDataFrame.readGeoJson() or GeoDataFrame.readShapeFile() to get started!
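
For a Gradle project, the setup above might be sketched as follows. The artifact coordinates and the OSGeo repository URL come from these notes. The version 0.15.0 and the need to declare the OSGeo repository in a Gradle build (and not only in notebooks) are assumptions.

```kotlin
// build.gradle.kts (sketch; version is an assumption)
repositories {
    mavenCentral()
    // Repository named in these notes; assumed to be needed here
    // so that the geo module's transitive dependencies resolve
    maven("https://repo.osgeo.org/repository/release")
}

dependencies {
    // Experimental geo module providing GeoDataFrame
    implementation("org.jetbrains.kotlinx:dataframe-geo:0.15.0")
}
```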

Features

Compiler Plugin

  • [Compiler plugin] Lower frontend generated implicit receivers by @koperagen in #869
  • Generate valid code in transform(call) when interpret(call) fails by @koperagen in #907
  • [Compiler plugin] Support dataFrameOf(Pair<String, List) by @koperagen in #908
  • [Compiler plugin] Add a mechanism to handle function calls to stdlib that can appear as df api arguments by @koperagen in #914
  • [Compiler plugin] Generate ColumnName annotations on frontend for all names that contain illegal characters by @koperagen in #913
  • Revert insertGenericTreeImpl by @koperagen in #923
  • [Compiler plugin] Propagate nullability in toDataFrame tree conversion by @koperagen in #942
  • Add castTo(Function) overload for workflows that use compiler plugin by @koperagen in #948
  • [Compiler plugin] Setup call transformer pipeline to handle (...) -> DataRow functions by @koperagen in #918
  • Compiler plugin read improvements by @koperagen in #949
  • [Compiler plugin] Support valueCounts by @koperagen in #951

Fixes

Docs and Examples

New Contributors

Full Changelog: v0.14.2...v0.15.0